
Computer Science


Thread replies: 7
Thread images: 1

File: 2016-01-12-185924_752x799_scrot.png (107KB, 752x799px)
How does one get a good understanding of preprocessing data before starting to think about neural network architecture, etc?

Is there a checklist or something? I guess there's imputation if needed, converting categorical to numerical, then I look for correlations (correlation matrix) and maybe mutual information (to catch non-linear dependence), but what else? Is there a complete guide for this?
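
For reference, a rough sketch of the steps listed above (imputation, categorical to numerical, linear correlation, mutual information), using only the Python standard library. The function names and the toy data are made up for illustration, and the mutual information estimate is a crude equal-width histogram version; in practice a data-frame or ML library would normally handle all of this.

```python
import math
from collections import Counter
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Map a categorical column to one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def pearson(x, y):
    """Pearson correlation coefficient; only captures linear dependence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_information(x, y, bins=4):
    """Histogram estimate of mutual information (in nats) for two numeric columns."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # avoid dividing by zero for constant columns
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over observed cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


if __name__ == "__main__":
    # Toy data (hypothetical): y depends on x, but not linearly.
    x = [-3, -2, -1, 0, 1, 2, 3]
    y = [a * a for a in x]
    print("imputed:", impute_median([1.0, None, 3.0]))
    print("one-hot:", one_hot(["a", "b", "a"]))
    print("pearson:", pearson(x, y))
    print("mutual information:", mutual_information(x, y))
```

The toy data is picked so that the linear correlation vanishes while the mutual information estimate stays positive, which is the reason to check both.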

Also, computer science general
>>
>http://blog.kaggle.com/2016/01/04/how-much-did-it-rain-ii-winners-interview-1st-place-pupa-aka-aaron-sim/

>mfw random physics guy jumps into ML and gets #1
>>
>>7779060
fuck NNs, Bayesian program learning BTFO deep learning: http://science.sciencemag.org/content/350/6266/1332.full
>>
>>7779090
nice paywall
>>
>If I were to take one point away from this contest, it is that the days of manually constructing features from data are almost over. The machines will win. I experienced this in the Plankton classification contest where the monumental effort that my teammate and I put into extracting image features was eclipsed within minutes by even the shallowest of CNNs.
>>
>>7779060
That basically means you have to learn the field you are trying to do learning on.

>>7779172
People in general don't bother reading it if it's behind a paywall. Also, the machines won't win if you don't have a method of selecting relevant training data. Any machine learning method can fail if you train it on the wrong data. Manually selected features could be used to disqualify the worst training data so it doesn't ruin the network.
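
A minimal sketch of that last idea, assuming the hand-picked feature is a single number per sample. The median absolute deviation rule, the 3.5 threshold, and all names here are assumptions chosen for illustration, not something taken from the linked interview.

```python
from statistics import median


def filter_by_feature(samples, feature, max_dev=3.5):
    """Keep samples whose hand-crafted feature lies within max_dev
    median absolute deviations (MAD) of the median feature value."""
    scores = [feature(s) for s in samples]
    med = median(scores)
    mad = median(abs(v - med) for v in scores)
    if mad == 0:
        # No spread to measure against, so nothing can be called an outlier.
        return list(samples)
    return [s for s, v in zip(samples, scores) if abs(v - med) / mad <= max_dev]


def mean_reading(sample):
    """Hypothetical hand-crafted feature: average value of a sample."""
    return sum(sample) / len(sample)


if __name__ == "__main__":
    # Toy training set (made up): the last sample is obviously off-scale.
    training = [[10, 12], [11, 13], [9, 11], [10, 10], [500, 700]]
    print(filter_by_feature(training, mean_reading))
```

The filter runs before training, so whatever model comes afterwards never sees the disqualified samples.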
>>
>>7780343
>not being part of a group that provides access to all papers you want
This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.