wat do please be gentle

Thread replies: 45
Thread images: 2

File: Untitled2.png (14KB, 633x607px)
wat do

please be gentle
>>
>>8930570
x$AGE goes into every field
>>
>>8930570
What are bullshit points?
>>
>>8930579

It's data from a sample of 39; the y axis is hours worked and the x axis is age. I'm trying to fit a linear regression model, but these two blue points (highlighted like shit) are messing up my line. The red line is far better, but do I have any right to simply remove those blue points?
>>
>>8930579
oldfags and newfags, obviously
>>
>>8930585
They're outliers. Note that you removed outliers in your report, use the good fit, and move on with your life.
>>
This >>8930597
They're outliers of the independent variable, too, so just say that you reduced your sample space to subjects between thirty years and sixty years or whatever. These data wouldn't be sufficient for interpolating to anything outside of that range in the first place.
>>
>>8930619
Meant to say thirty-five through forty-five for the reduced sample space.
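
A rough sketch of what "reduce the sample space and refit" could look like. The numbers are made up for illustration (they are NOT the data from the plot), and the helper names `fit_line` and `restrict` are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def restrict(xs, ys, lo, hi):
    """Keep only the (x, y) pairs whose x lies in [lo, hi]."""
    kept = [(x, y) for x, y in zip(xs, ys) if lo <= x <= hi]
    return [x for x, _ in kept], [y for _, y in kept]


# Made-up numbers for illustration only, not the thread's data.
ages = [22, 23, 38, 39, 40, 41, 42, 43, 44, 45, 58, 59]
hours = [20, 22, 50, 47, 45, 44, 40, 38, 36, 33, 21, 19]

print("all points: intercept=%.2f slope=%.2f" % fit_line(ages, hours))
mid_ages, mid_hours = restrict(ages, hours, 35, 45)
print("ages 35-45: intercept=%.2f slope=%.2f" % fit_line(mid_ages, mid_hours))
```

Whatever range you pick, state it in the report; the fitted line says nothing about ages outside it.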
>>
>>8930585
Why did you not get any people aged 25-35 or 45-55?
>>
>>8930626
On purpose. Professor modified the data to rustle our jimmies thoroughly
>tfw being tested on things you haven't been taught
>>
>>8930570
The linear regression is crap either way. It's clearly random what hours the middle-aged work.
>>
Use Chauvenet's criterion, faggot
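
For anyone who hasn't seen it: a minimal sketch of Chauvenet's criterion on a single list of numbers, assuming they are roughly normally distributed. In a regression you would presumably feed it the residuals; that is my assumption, not something stated in the thread. The function name and the sample values are made up:

```python
import math


def chauvenet_outliers(values):
    """Return the indices of values rejected by Chauvenet's criterion.

    Assumes the values are roughly normally distributed.
    """
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return []  # all values identical, nothing to reject
    rejected = []
    for i, v in enumerate(values):
        z = abs(v - mean) / sd
        # Two-sided normal probability of a deviation at least this large.
        p = math.erfc(z / math.sqrt(2))
        # Reject if fewer than half a point is expected this far out.
        if n * p < 0.5:
            rejected.append(i)
    return rejected


# Made-up numbers for illustration only.
sample = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 25.0]
print(chauvenet_outliers(sample))
```

Only the last value (index 9) gets flagged in this sample.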
>>
>>8930570
check their influence on the model using Cook's distance and remove them if it's above the cutoff 4p/N, where N is the number of observations and p is the number of parameters in the model, which seems to be two here: one slope and one intercept.
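
A hedged sketch of that check for a straight-line fit, in plain Python. The default cutoff is the 4p/N from this post; 4/N is another commonly quoted rule of thumb, so the cutoff is left as a parameter. Function names and data are made up for illustration:

```python
def cooks_distances(xs, ys):
    """Cook's distance of every point for the fit y = a + b*x."""
    n = len(xs)
    p = 2  # parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        # Leverage of this point in simple linear regression.
        h = 1 / n + (x - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * s2) * h / (1 - h) ** 2)
    return distances


def influential(xs, ys, cutoff=None):
    """Indices whose Cook's distance exceeds the cutoff (default 4p/N, p = 2)."""
    if cutoff is None:
        cutoff = 4 * 2 / len(xs)
    return [i for i, d in enumerate(cooks_distances(xs, ys)) if d > cutoff]


# Made-up numbers for illustration only: nine points on a line plus one
# point far out in x that does not follow it.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(influential(xs, ys))
```

Here only the far-out point (index 9) is flagged, under either cutoff.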
>>
>>8930636
>>8930642
Thank you

I also heard about some rule that you are allowed to remove at most 5% of your data if the data is anomalous
>>
>>8930628
Good teacher.
What do you think the distribution might have looked like with a better sample?
>>
>>8930570
Even without the bullshit it looks wrong.
>>
>>8930667
Completely random desu. It's not linear but either way the points are bullshit
>>8930679
Yeah, but you got to prove it
>>
>>8930570
Blue line is a better fit, you shouldn't just remove points that weren't the result of an error
>>
>>8930685
>Blue line is a better fit
literally opposite of definition
>>
>>8930570

Why would you even do a regression on this?
>>
>>8930731
to prove it's not linear.
>>
>>8930739

What's not linear?
>>
>>8930739
What if it is linear?

People could be working slightly longer hours as they get older with maximum variability around 40.
>>
Ransac
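
A bare-bones sketch of the RANSAC idea for a line, assuming you pick an inlier threshold yourself: sample two points, draw the line through them, count the points within the threshold, keep the largest such set, then refit on it by least squares. Names, defaults and data are all made up for illustration:

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ransac_line(xs, ys, threshold, iterations=200, seed=0):
    """Fit y = a + b*x by RANSAC. Returns (intercept, slope, inlier_indices)."""
    rng = random.Random(seed)
    n = len(xs)
    best_inliers = []
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue  # vertical line, cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        inliers = [k for k in range(n)
                   if abs(ys[k] - (intercept + slope * xs[k])) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit by least squares on the largest consensus set found.
    intercept, slope = fit_line([xs[k] for k in best_inliers],
                                [ys[k] for k in best_inliers])
    return intercept, slope, best_inliers


# Made-up numbers: ten points on y = 2x + 1 plus two wild ones.
xs = list(range(10)) + [3, 7]
ys = [2 * x + 1 for x in range(10)] + [30, -20]
intercept, slope, inliers = ransac_line(xs, ys, threshold=0.5)
print(intercept, slope, inliers)
```

Note that RANSAC only works if most of the points actually follow some line; it does nothing for a cloud with no trend.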
>>
>>8930745
The model
>>8930746
Well I went on to prove it's linear, got my assumptions violated
This is a problem for GLM
>>
>>8930753

Don't you have to assume it's linear before you do a linear regression?
>>
>>8930755
That's what i just said you doughnut
>>
>>8930756

so you assume it is linear just to prove that it isn't?
>>
>>8930761
Yup
>>
>>8930762

But if you proved it's linear and your assumptions were violated, doesn't that mean your assumption was that it was non-linear?
>>
Your data is garbage, dude. If you leave those points in it's garbage, if you take them out it's garbage.

What that graph says is that Age is a terrible predictor for Hours.
>>
>>8930767
I did not prove it's linear. I assumed the model is correct, but the assumptions OF the model were violated. Thus not linear

>>8930768
What about their correlation? It increases substantially if we remove those data points. I agree the model sucks, but there is obviously a relation there.
>>
>>8930774
>but there is obviously a relation there.

If every data point was the same age, let's say 40, what do you think the line of best fit would look like?
>>
>>8930784
But they go from 38 to 45. That's 7 years
I get your point, but still, what's with the correlation? It's -0.5
>>
>>8930648
There are no rules for removing data, and 5% seems excessive to me. It's mostly an eyeball call anyway.

>>8930774
A linear regression will always produce a line of best fit, and a correlation statistic will always be found. That doesn't mean either of those are meaningful. Remove those data points and show us the residuals and line of best fit with just the central cluster and we'll see how it looks.
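
To make that concrete, a small sketch that computes the correlation and the residuals for the central cluster only. The numbers are invented for illustration and the function names are hypothetical; plug in the real 38-45 points instead:

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up numbers for illustration only, not the thread's data.
ages = [38, 39, 40, 41, 42, 43, 44, 45]
hours = [50, 30, 45, 52, 28, 41, 35, 33]
print("r =", round(pearson_r(ages, hours), 3))
print("residuals:", [round(e, 1) for e in residuals(ages, hours)])
```

You get a slope and an r no matter what you feed in; whether they mean anything is what the residuals are for. If they are as large as the spread of the hours themselves, the line explains almost nothing.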
>>
>approximating random points with arbitrarily chosen line that you think looks prettiest
is statistics really considered maths?
>>
>>8930949
>Arbitrarily chosen
>What is least squares
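
For reference, a sketch of what least squares pins down for a straight line y = a + bx (the symbols a, b and N are my notation, not the thread's):

[eqn]\min_{a,b} \sum_{i=1}^{N} (y_i - a - b x_i)^2 \quad\Rightarrow\quad b = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}[/eqn]

So the line is not picked by eye: it is the unique minimizer whenever the x values are not all identical. If every point had the same age, the denominator would be zero and the slope would be undefined.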
>>
The data makes no sense. Why would a 39 yo work significantly shorter hours than a 42 yo?
>>
[eqn]Y=\lambda f.\ (\lambda x.\ f(xx))(\lambda x.\ f(xx))[/eqn]
>>
>>8930793
i think that was an attempt to reference 1/20 ______ (data points, what have you..) will be statistically significant


>>>thr0wiing out datA is bad scINCe
>>
>>8930793
if the linear line does not fit that's fine...an f-value and a p-value should explain why that is >>8930570
>>
>>8931120
They won't explain that at all, and no, it's not fine if the line doesn't fit well. That means you shouldn't settle on that model.
>>
>>8932630
he mentioned he has to do a linear regression on this data

>this obviously isn't being used as a real analysis
>>
>>8930577
underrated
>>
>>8930570
Without those outliers the data is pretty much meaningless.
>"Wow there's a fairly even distribution of hours worked in the age group 35-45!"
And that red line is utter bullshit, implying that the data suggests that the amount of hours the average man works drops to 0 at just past the age of 45, even when you have a fat cloud of outliers that clearly disproves that. My prediction is that the professor will tear you a new one if you drop the so-called outliers. Especially since it's hard to tell but it looks like almost half your points are in those two clouds of outliers.


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.