
Data Transformation/Normalization


Thread replies: 6
Thread images: 1

File: Normalization.png (31KB, 300x226px)
Good morning,

I have a question about "normalizing" data. By that I mean applying a non-linear transformation (such as a log function) so that the shape of the data's distribution more closely resembles a normal distribution.

My question is, why are we allowed to do that? Wouldn't the process bias the results? Isn't this basically a new data set altogether?

I realize that the results are usually reported after applying the inverse transformation, but I still don't understand why it's valid to analyze the data in its "morphed" state.

If anyone could point me in the right direction that would be great.

Thanks.
>>
>>8630610
Monotone transformation
>>
>>8630610
Monotonic transformation
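
To unpack the one-liner a bit: a strictly increasing transformation such as log keeps every observation in the same order, so rank-based quantities like the median survive the round trip, while the mean does not. A minimal sketch of that, using a made-up sample (the numbers and the `ranks` helper are purely illustrative assumptions):

```python
import math
import statistics

# Hypothetical right-skewed sample (odd length, so the median is an actual data point).
data = [1.2, 0.4, 15.0, 3.3, 2.1, 48.0, 0.9]

# Strictly increasing transformation applied to every observation.
logged = [math.log(x) for x in data]


def ranks(values):
    """Return the rank (0 = smallest) of each value, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


# Order is preserved: every observation keeps its rank after the log.
same_ranks = ranks(data) == ranks(logged)

# The median commutes with the transformation (up to floating-point rounding).
median_back = math.exp(statistics.median(logged))

# The mean does not: back-transforming the mean of the logs gives the
# geometric mean, which is smaller than the arithmetic mean here.
mean_back = math.exp(statistics.mean(logged))

print("ranks preserved:", same_ranks)
print("median:", statistics.median(data), "back-transformed:", median_back)
print("mean:", statistics.mean(data), "back-transformed:", mean_back)
```

So the transformed data is not an arbitrary new data set: the ordering information is intact, but which summary you are estimating (median or geometric mean versus arithmetic mean) does change.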
>>
>>8630610
Gin Tonic transformation
>>
>>8630610
Whether you're allowed to apply such a transformation depends on what type of test you're going to run on the data.

Parametric statistical tests make assumptions about the distribution of the data on which the test is run. So if the data violate these assumptions, then *not* applying a transformation to make sure that the data conform to the assumptions would be bad practice. In essence, your transformation allows you to make a valid statistical inference, which would otherwise not be possible.

Moreover, you need to know whether the transformation you're using is itself a valid one. For example, if you are comparing two distributions with a t-test, you can't just go and reduce the standard deviation of one distribution but not the other, because that would again invalidate the statistical inference that you're trying to make. The general rule is that the transformation must preserve the statistical properties of the data that you want to make inferences about, while removing the other properties that violate the test's assumptions.
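
A rough sketch of what this looks like in practice, assuming two simulated log-normal groups (the seed, sample sizes, and parameters are arbitrary choices for illustration, and `skewness` and `welch_t` are hand-rolled helpers rather than library functions). The same log transform is applied to both groups, and the comparison is then made on the log scale:

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the simulation is repeatable

# Two hypothetical right-skewed groups; group B is shifted upward on the log scale.
group_a = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
group_b = [random.lognormvariate(0.5, 1.0) for _ in range(500)]


def skewness(values):
    """Sample skewness based on population moments: m3 / m2**1.5."""
    m = statistics.mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def welch_t(x, y):
    """Welch's t statistic for mean(y) - mean(x), allowing unequal variances."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se


# Apply the SAME transformation to both groups, never to just one of them.
log_a = [math.log(v) for v in group_a]
log_b = [math.log(v) for v in group_b]

skew_raw = skewness(group_a)
skew_log = skewness(log_a)
t_log = welch_t(log_a, log_b)

print("skewness before/after log:", skew_raw, skew_log)
print("Welch t on the log scale:", t_log)
```

Note that the test on the log scale is about a difference in mean log values (equivalently, a ratio of geometric means), so that is the quantity the inference applies to, not the raw arithmetic means.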
>>
>>8630610
Besides satisfying the normality assumption for regression residuals (which can be relaxed by using other generalized linear models), transformations also facilitate the fitting and interpretation of exponential and power-law relationships. Putting covariates on a common scale is also helpful for machine learning and MCMC.
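
To illustrate both points: taking logs of a power law y = a * x^b gives the straight line log y = log a + b * log x, which ordinary least squares can fit directly. The sketch below uses noise-free, made-up data (the values of `a_true` and `b_true` and the `standardize` helper are assumptions for the example), and also shows z-scoring as one common way to put a covariate on a common scale:

```python
import math
import statistics

# Hypothetical noise-free power law y = a * x**b, with a and b chosen for illustration.
a_true, b_true = 2.5, 1.7
xs = [float(i) for i in range(1, 11)]
ys = [a_true * x ** b_true for x in xs]

# Taking logs of both sides gives a straight line: log y = log a + b * log x.
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

# Ordinary least squares for a line, written out by hand.
mx, my = statistics.mean(lx), statistics.mean(ly)
slope = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
intercept = my - slope * mx

b_hat = slope                # the exponent is the slope on the log-log scale
a_hat = math.exp(intercept)  # back-transform the intercept to get the prefactor


def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1 (z-scores)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


z = standardize(xs)

print("estimated a, b:", a_hat, b_hat)
print("z-score mean, stdev:", statistics.mean(z), statistics.stdev(z))
```

Standardizing is a linear rescaling, so unlike the log it does not change the shape of the distribution; it only changes the units.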



This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.