wat do
please be gentle
>>8930570
x$AGE goes into every field
>>8930570
What are bullshit points?
>>8930579
It's data from a sample of 39; the y axis is hours worked and the x axis is age. I'm trying to fit a linear regression model, but these two groups of blue points (highlighted like shit) are messing up my line. The red line is far better, but do I have any right to simply remove those blue points?
>>8930579
oldfags and newfags, obviously
>>8930585
They're outliers. Note that you removed outliers in your report, use the good fit, and move on with your life.
This >>8930597
They're outliers of the independent variable, too, so just say that you reduced your sample space to subjects between thirty years and sixty years or whatever. These data wouldn't be sufficient for extrapolating to anything outside of that range in the first place.
>>8930619
Meant to say thirty-five through forty-five for the reduced sample space.
>>8930585
Why did you not get any people aged 25-35 or 45-55?
>>8930626
On purpose. Professor modified the data to rustle our jimmies thoroughly
>tfw being tested on things you haven't been taught
>>8930570
The linear regression is crap either way. It's clearly random what hours the middle-aged work.
Use Chauvenet's criterion, faggot
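For anyone who hasn't seen it: Chauvenet's criterion rejects a value when, assuming the data are normal, you'd expect fewer than half a point that far from the mean in a sample of this size. A minimal sketch, assuming you apply it to one column of numbers (ages, or residuals from a fit). The function name `chauvenet_reject` and the sample values are made up for illustration, not taken from OP's data.

```python
import math
import numpy as np

def chauvenet_reject(values):
    """Return a boolean mask, True where Chauvenet's criterion rejects the value."""
    v = np.asarray(values, dtype=float)
    n = v.size
    # Distance from the mean in units of the sample standard deviation.
    z = np.abs(v - v.mean()) / v.std(ddof=1)
    # Two-sided tail probability of a standard normal beyond |z|.
    tail = np.array([math.erfc(zi / math.sqrt(2.0)) for zi in z])
    # Reject when the expected count of points this extreme is below 0.5.
    return n * tail < 0.5

# Invented measurements: nine values near 10 and one stray value.
sample = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 20.0]
print(chauvenet_reject(sample))
```

It assumes roughly normal data and is meant for isolated stray points, so it may well not flag whole clusters of points.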
>>8930570
Check their influence on the model using Cook's distance, and remove them if their distance is above the cutoff of 4p/N, where N is the number of observations and p is the number of parameters you're fitting, which seems to be two here: one slope and one intercept.
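A rough sketch of that check with plain numpy, assuming a one-predictor fit (intercept plus slope). The ages and hours below are invented, since the actual 39 points aren't posted, and `cooks_distance` is just a name picked for this example. The threshold is the 4p/N from the post; 4/N is the rule of thumb you'll see quoted more often, so treat the cutoff as a judgment call.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance of every point for the fit y ~ intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])          # design matrix
    p = X.shape[1]                                # fitted parameters (2 here)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    mse = resid @ resid / (n - p)
    return resid**2 / (p * mse) * h / (1.0 - h) ** 2

# Invented data: a middle-aged cluster plus a few points far out in age.
rng = np.random.default_rng(0)
age = np.concatenate([rng.uniform(38, 45, 30), [22.0, 24.0, 60.0]])
hours = np.concatenate([rng.uniform(30, 50, 30), [55.0, 60.0, 20.0]])

d = cooks_distance(age, hours)
n, p = age.size, 2
flagged = np.flatnonzero(d > 4 * p / n)           # cutoff from the post above
print("flagged indices:", flagged)
```

Whatever gets flagged is only a candidate for a closer look; the number alone doesn't tell you the point is an error.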
>>8930628
Good teacher.
What do you think the distribution might have looked like with a better sample?
>>8930570
Even without the bullshit it looks wrong.
>>8930570
Blue line is a better fit, you shouldn't just remove points that weren't the result of an error
>>8930685
>Blue line is a better fit
literally opposite of definition
>>8930570
Why would you even do a regression on this?
>>8930731
to prove it's not linear.
>>8930739
What's not linear?
>>8930739
What if it is linear?
People could be working slightly longer hours as they get older with maximum variability around 40.
Ransac
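In case the one-word answer isn't enough: RANSAC repeatedly fits a line through a random pair of points, counts how many points land within some tolerance of it, keeps the line with the most inliers, and refits on those. A minimal numpy sketch under those assumptions; `ransac_line`, the tolerance, the iteration count, and the toy data are all invented for illustration.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, threshold=1.0, seed=0):
    """Fit y ~ slope * x + intercept with a basic RANSAC loop."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(x.size, size=2, replace=False)
        if x[i] == x[j]:
            continue                              # vertical line, skip it
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Points whose vertical distance to the candidate is within tolerance.
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no usable pair of points was sampled")
    # Final least-squares refit on the largest consensus set.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return slope, intercept, best_inliers

# Toy data: points on y = 2x + 1 with two corrupted values.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[3], y[7] = 50.0, -40.0
slope, intercept, inliers = ransac_line(x, y)
print(slope, intercept, inliers)
```

The tolerance decides what counts as an outlier, so this moves the eyeball call into a parameter instead of removing it.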
>>8930753
Don't you have to assume it's linear before you do a linear regression?
>>8930755
That's what i just said you doughnut
>>8930756
so you assume it is linear just to prove that it isn't?
>>8930761
Yup
>>8930762
But if you proved it's linear and your assumptions were violated, doesn't that mean your assumption was that it was non-linear?
Your data is garbage, dude. If you leave those points in it's garbage, if you take them out it's garbage.
What that graph says is that Age is a terrible predictor for Hours.
>>8930774
>but there is obviously a relation there.
If every data point was the same age, let's say 40, what do you think the line of best fit would look like?
>>8930784
But they go from 38 to 45. That's 7 years
I get your point, but still, what's with the correlation? It's -0.5.
>>8930648
There are no rules for removing data, and 5% seems excessive to me. It's mostly an eyeball call anyway.
>>8930774
A linear regression will always produce a line of best fit, and a correlation statistic will always be found. That doesn't mean either of those are meaningful. Remove those data points and show us the residuals and line of best fit with just the central cluster and we'll see how it looks.
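To make that concrete, here is a small sketch of fitting the line, pulling out the residuals, and computing the correlation with numpy. The data are invented (hours drawn at random with no dependence on age), and `fit_line` is a made-up helper name. You still get a slope and an r out of it, which is the point being made above.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line plus residuals and Pearson correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)        # minimizes squared residuals
    resid = y - (slope * x + intercept)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, resid, r

# Invented central cluster: hours are random and unrelated to age.
rng = np.random.default_rng(1)
age = rng.uniform(38, 45, 30)
hours = rng.uniform(30, 50, 30)

slope, intercept, resid, r = fit_line(age, hours)
print("slope:", slope, "intercept:", intercept, "r:", r)
# If the residual spread is about as large as the spread of hours itself,
# the line explains next to nothing.
print("residual std:", resid.std(ddof=2), "hours std:", hours.std(ddof=1))
```

Plotting `resid` against `age` is the usual next step: any visible pattern or funnel shape means the straight-line model is a poor description.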
>approximating random points with arbitrarily chosen line that you think looks prettiest
is statistics really considered maths?
>>8930949
>Arbitrarily chosen
>What is least squares
The data makes no sense. Why would a 39 yo work significantly shorter hours than a 42 yo?
[eqn]Y=\lambda f.\ (\lambda x.\ f(xx))(\lambda x.\ f(xx))[/eqn]
>>8930793
i think that was an attempt to reference 1/20 ______ (data points, what have you..) will be statistically significant
>>>thr0wiing out datA is bad scINCe
>>8931120
They won't explain that at all, and no, it's not fine if the line doesn't fit well. That means you shouldn't settle on that model.
>>8932630
he mentioned he has to do a linear regression on this data
>this obviously isn't being used as a real analysis
>>8930577
underrated
>>8930570
Without those outliers the data is pretty much meaningless.
>"Wow there's a fairly even distribution of hours worked in the age group 35-45!"
And that red line is utter bullshit, implying that the data suggests that the amount of hours the average man works drops to 0 at just past the age of 45, even when you have a fat cloud of outliers that clearly disproves that. My prediction is that the professor will tear you a new one if you drop the so-called outliers. Especially since it's hard to tell but it looks like almost half your points are in those two clouds of outliers.