How does one get a good understanding of data preprocessing before starting to think about neural network architecture, etc.?
Is there a checklist or something? I guess there's imputation if needed, converting categorical variables to numerical, then I look for correlations (correlation matrix) and maybe mutual information (to check for non-linear dependence), but what else? Is there a complete guide for this?
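For what it's worth, the steps listed above (imputation, categorical to numerical, correlation, mutual information) can be sketched roughly like this. It is only a minimal sketch, not a complete guide: the toy columns `x`, `noise` and `site`, the median imputation, and the "missing" category level are all assumptions made up for illustration. It uses pandas plus scikit-learn's `mutual_info_regression`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values and one-hot encode non-numeric columns."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)
    # Median imputation is an assumption; mean or model-based imputation also works.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Treat a missing category as its own level.
    out[cat_cols] = out[cat_cols].fillna("missing")
    return pd.get_dummies(out, columns=list(cat_cols), dtype=float)


# Hypothetical toy data: the target depends on x non-linearly (y ~ x^2).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3, 3, n)
df = pd.DataFrame({
    "x": x,
    "noise": rng.normal(0, 1, n),
    "site": rng.choice(["a", "b", "c"], n),
})
y = x**2 + rng.normal(0, 0.1, n)
df.loc[df.index % 25 == 0, "x"] = np.nan  # knock out some values to impute

X = preprocess(df)

# Linear (Pearson) correlation of each feature with the target.
corr = X.corrwith(pd.Series(y))
# Mutual information also picks up non-linear dependence.
mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)

print(pd.DataFrame({"pearson": corr, "mutual_info": mi}))
```

In this toy setup `x` should show almost no linear correlation with the target but clearly more mutual information than `noise`, which is the case the correlation matrix alone would miss.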
Also, computer science general
>http://blog.kaggle.com/2016/01/04/how-much-did-it-rain-ii-winners-interview-1st-place-pupa-aka-aaron-sim/
>mfw random physics guy jumps into ML and gets #1
>>7779060
fuck NNs, bayesian program learning BTFO deep learning: http://science.sciencemag.org/content/350/6266/1332.full
>>7779090
nice paywall
>If I were to take one point away from this contest, it is that the days of manually constructing features from data are almost over. The machines will win. I experienced this in the Plankton classification contest where the monumental effort that my teammate and I put into extracting image features was eclipsed within minutes by even the shallowest of CNNs.
>>7779060
That basically means you have to learn the field you are trying to do learning on.
>>7779172
People in general don't bother reading it if it's behind a paywall. Also, the machines won't win if you have no method of selecting relevant training data. Any machine learning method can fail if you train it on the wrong data. Manually selected features could be used to filter out the worst training data before it ruins the network.
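A rough sketch of what that filtering could look like, assuming two hand-written sanity checks: the fraction of missing readings per row, and a plausible range for the target. The function name `filter_training_rows` and both thresholds are hypothetical; in practice the limits would come from knowing the domain.

```python
import numpy as np


def filter_training_rows(X, y, max_missing_frac=0.5, y_bounds=(0.0, 100.0)):
    """Keep only rows that pass two hand-written sanity checks."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Hand-crafted feature 1: fraction of missing readings in each row.
    missing_frac = np.isnan(X).mean(axis=1)
    # Hand-crafted feature 2: is the target inside a plausible range?
    in_range = (y >= y_bounds[0]) & (y <= y_bounds[1])
    keep = (missing_frac <= max_missing_frac) & in_range
    return X[keep], y[keep], keep


# Hypothetical example: row 1 is entirely missing, row 3 has an implausible target.
X_raw = [[1.0, 2.0], [np.nan, np.nan], [3.0, np.nan], [4.0, 5.0]]
y_raw = [10.0, 20.0, 30.0, 5000.0]
X_clean, y_clean, keep = filter_training_rows(X_raw, y_raw)
print(keep)
```

The model only ever sees `X_clean` and `y_clean`; the manual features are used purely as a gate on the training set, not as network inputs.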
>>7780343
>not being part of a group that provides access to all papers you want