When we test the significance of a model, we also test whether the regression coefficients in that model are simultaneously statistically different from 0.
This is correct, right? My friend tells me it's not.
>>8222407
He tells me it tests if R-squared is statistically significant from 0. Is this true?
>>8222418
Sure
>>8222460
LEWD
>>8222468
Report you and sage
>When we test the significance of a model, we also test whether the regression coefficients in that model are simultaneously statistically different from 0.
Assuming the usual classical linear regression (OLS) model, this is correct. The null hypothesis of the overall F-test is that all the slope coefficients are simultaneously 0, which is equivalent to [math]y = \mathrm{constant}[/math], i.e. the model has no explanatory power for y.
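To make that null concrete, here's a minimal numpy sketch (made-up data, my own variable names, not any particular textbook's notation): under H0 all slopes are 0, so the restricted fit is just [math]\bar{y}[/math], and the overall F-statistic compares the two residual sums of squares.

```python
import numpy as np

# Hypothetical data: intercept 1, true slope 2 on x1, true slope 0 on x2.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# Restricted model under H0 (all slopes = 0): y = constant, fitted by ybar.
rss_full = np.sum((y - y_hat) ** 2)
rss_restricted = np.sum((y - y.mean()) ** 2)  # = total sum of squares

p = X.shape[1]                                # parameters incl. intercept
F = ((rss_restricted - rss_full) / (p - 1)) / (rss_full / (n - p))
print(F)
```

Since the slope on x1 really is nonzero here, the F-statistic comes out large; with all true slopes 0 it would hover around 1.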
>>8222425
>He tells me it tests if R-squared is statistically significant from 0.
Strictly speaking no -- although the usual F-statistic for testing model significance is an increasing function of R-squared, [math]F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}[/math] (n observations, p parameters including the intercept), so it is true that a model with low R-squared will tend to produce a low F-statistic as well.
As far as I know there is no hypothesis test for R-squared besides a normal approximation using the central limit theorem (which is only valid for large samples anyway), which is one of the reasons statisticians tend to downplay this statistic. The main reason is that the sample R-squared is biased, i.e. it overestimates the population R-squared on average
(http://blog.minitab.com/blog/adventures-in-statistics/r-squared-shrinkage-and-power-and-sample-size-guidelines-for-regression-analysis).
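The algebraic link between F and R-squared is easy to check numerically. A sketch on simulated data (everything here is made up for illustration): compute F directly from the sums of squares, then again from R-squared, and confirm they agree.

```python
import numpy as np

# Simulated regression with an intercept and two covariates.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
rss = resid @ resid                       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
r2 = 1 - rss / tss

p = X.shape[1]
# F computed two ways: directly, and via the monotone transform of R^2.
F_direct = ((tss - rss) / (p - 1)) / (rss / (n - p))
F_from_r2 = (r2 / (1 - r2)) * (n - p) / (p - 1)
print(F_direct, F_from_r2)
```

The two values coincide up to floating-point error, which is exactly why low R-squared and low F go hand in hand.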
>>8222475
>report you
No rules are being broken here and everything was relevant to OP's thread until you arrived.
>sage
Give up, this post is bumping the thread to the top anyway.
>>8222482
post more brat
>>8222486
What does this post have to do with science?
>>8222482
Alright, so assuming OLS it's correct.
Under the classical linear model, [math]y \sim N(X\beta, \sigma^2I)[/math] has a multivariate normal distribution (conditional on X).
Hence the OLS estimator [math]\hat{\beta} := (X^TX)^{-1}X^Ty \sim N\left(\beta, \sigma^2(X^TX)^{-1}\right)[/math] is an unbiased estimator of [math]\beta[/math]. Moreover, it is the "best" linear unbiased estimator (the Gauss-Markov theorem; under normality it also attains the Cramer-Rao bound) -- for any other such estimator with variance [math]A[/math], [math]A - \sigma^2(X^TX)^{-1}[/math] is positive semidefinite.
By construction, the residuals [math]r = y - X\hat{\beta}[/math] are jointly normally distributed and uncorrelated with [math]\hat{\beta}[/math], hence they are statistically independent. Thus we are justified in forming the sum of squared residuals [math]SSR = r^Tr[/math] and the explained sum of squares [math]ESS = \sum_i (\hat{y}_i - \bar{y})^2[/math]. Both [math]SSR/\sigma^2[/math] and (under the null) [math]ESS/\sigma^2[/math] are chi-squared distributed and independent, and when we take their ratio the unknown [math]\sigma^2[/math] cancels out, leaving the F-statistic (multiplied by a constant factor that adjusts for degrees of freedom), which is used to test [math]\beta = 0[/math].
The R-squared [math]ESS/(SSR+ESS)[/math] is a very intuitive measure of model fit, and since the unknown [math]\sigma^2[/math] cancels in the ratio, the sample version is directly computable from the data. What we cannot observe is the population R-squared, and the sample R-squared is known to be a systematic overestimate of it (see >>8222482).
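A quick numerical sanity check of the pieces above (numpy only, simulated data, my own names): the residuals are orthogonal to the columns of X by construction, and with an intercept in the model the total sum of squares decomposes as TSS = ESS + SSR, which is what makes R-squared a ratio between 0 and 1.

```python
import numpy as np

# Simulated classical linear model with an intercept.
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
r = y - X @ beta_hat                          # residuals

# Residuals are orthogonal to the column space of X by construction.
print(np.max(np.abs(X.T @ r)))

# Sum-of-squares decomposition: TSS = ESS + SSR (model has an intercept).
ssr = r @ r
ess = np.sum((X @ beta_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = ess / (ess + ssr)
print(tss, ess + ssr, r2)
```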
>>8222566
Errata: to be clear, the distribution of the OLS estimator is [math]\hat{\beta} \sim N\left(\beta, \sigma^2(X^TX)^{-1}\right)[/math] -- the mean is [math]\beta[/math] itself, not [math]X\beta[/math].
>>8222617
Thanks anon-kun!