Posts tagged “null hypothesis”

Financial Statistics (7) – Analysis of Variance and the F-Test

October 4th, 2010

-Eric Bank

Analysis of variance (ANOVA) is used to determine how useful an independent variable X is at explaining variation of the dependent variable Y. For this article, we’ll confine our discussion to linear regressions with a single independent variable, although ANOVA is also appropriate for multi-variable regressions. Recall that we have examined linear regressions and the meaning of the slope coefficient. The F-test is used within ANOVA to see whether the slope coefficient (b1) is equal to zero. That is, we test the null hypothesis H0 that b1 = 0 against the alternative hypothesis H1 that b1 ≠ 0. If we cannot reject H0, then we have no evidence that X is a good predictor of Y.

To calculate the F-statistic, we need the following items of data:

  • the number of observations n
  • the number of parameters (intercept and slope coefficient) = 2
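  • the sum of squared errors SSE = $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i$ is the value of Y that the regression predicts for observation i
  • the total variation in Y that is explained by the regression, known as the regression sum of squares RSS = $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$, where $\bar{Y}$ is the mean of Y
  • total variation TSS = SSE + RSS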

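The F-statistic is the ratio of the mean regression sum of squares to the mean squared error. With a single independent variable, the regression has one degree of freedom, so the mean regression sum of squares is simply RSS. The mean squared error is calculated by dividing SSE by n – 2, the degrees of freedom (observations less parameters). Thus, the F-statistic equals RSS / (SSE / (n – 2)).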
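The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `f_statistic` and the sample data are made up for this example, and the slope and intercept are fitted by ordinary least squares using the standard library alone.

```python
def f_statistic(x, y):
    """Return (SSE, RSS, F) for a least-squares regression of y on a single x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least-squares slope (b1) and intercept (b0).
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]

    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    rss = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation

    # F = RSS / (SSE / (n - 2)); undefined if the fit is perfect (SSE = 0).
    f = rss / (sse / (n - 2))
    return sse, rss, f


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sse, rss, f = f_statistic(x, y)
print(f"SSE = {sse:.4f}, RSS = {rss:.4f}, F = {f:.2f}")
```

To decide whether to reject H0, the resulting F would still need to be compared with a critical F-value for 1 and n – 2 degrees of freedom, which this sketch does not compute.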

To clarify, suppose H0 were true, and that X did nothing to predict the value of Y. In that case, the predicted value of Y for each Xi would equal the mean value of the dependent variable. But if this were true, then RSS = zero, and the F-statistic = zero. The higher the value of F, the stronger the evidence that X helps predict Y.

In a previous blog, we examined the t-statistic. Note that F is equal to the square of t for a single-variable linear regression. In the next blog, we will look at prediction intervals.
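The link between F and t can be checked with a short derivation. It relies on two standard facts for a single-variable regression that are assumed here rather than proven: the squared sample correlation satisfies r² = RSS / TSS, and the t-statistic can be written as t = r√(n − 2) / √(1 − r²).

```latex
\begin{aligned}
F &= \frac{RSS}{SSE/(n-2)}
   = (n-2)\,\frac{RSS/TSS}{SSE/TSS} \\
  &= (n-2)\,\frac{r^2}{1-r^2}
   \qquad \text{since } TSS = SSE + RSS \text{ implies } SSE/TSS = 1 - r^2 \\
  &= \left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)^{2}
   = t^2
\end{aligned}
```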

Financial Statistics (4) – Testing Correlations for Significance: the t-Test

September 20th, 2010


- Eric Bank

Now that we have examined correlation and linear regression, we need to understand whether a correlation describes a real relationship or is just the result of chance. Only real relationships are predictive. Another way of saying this is that we want to test the null hypothesis (H0) that a correlation coefficient ϱ in the population is equal to zero (ϱ = 0), versus the alternative hypothesis (H1) that it is significantly different from zero (ϱ ≠ 0).

Since we are testing whether the correlation is not zero (i.e. significantly bigger or smaller than zero), we need to perform a two-tailed test. We assume that the variables (X and Y) are normally distributed – this permits us to perform a t-test:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where the sample correlation r is an estimate of the population correlation ϱ, and n is the sample size. If H0 is true, this test statistic follows a t-distribution with n – 2 degrees of freedom. We use n – 2 instead of n for the degrees of freedom to account for the two parameters estimated from the sample. If the absolute value of the calculated t-value exceeds the critical t-value for the degrees of freedom, then H0 can be rejected. By the way, you can look up the critical t-value in a table at the back of any statistics book. Note that as n increases, the absolute value of the critical t-value decreases: it’s easier to reject the null hypothesis with a larger sample size. Also note that the numerator of the t-test increases with increasing n, meaning that for a given r you get larger values of t for larger samples. The bottom line is that the likelihood of failing to reject a false H0 decreases with sample size.
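To see the effect of sample size, here is a minimal Python sketch that holds the sample correlation fixed and varies n. The function name `correlation_t_stat` and the value r = 0.30 are illustrative assumptions, not figures from this article.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for testing H0: population correlation equals zero."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r = 0.30  # hypothetical sample correlation, held fixed
for n in (10, 30, 100, 1000):
    print(f"n = {n:4d}: t = {correlation_t_stat(r, n):.3f}")
```

The same correlation yields a larger t as n grows, so a modest correlation that is not significant in a small sample can become significant in a large one.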

When we perform a t-test, we need to specify a level of statistical significance. For example, if we choose the 0.05 level of significance, we accept a 5-in-100 chance of rejecting H0 when it is in fact true. All things being equal, a lower level of significance (say, 0.01 instead of 0.05) produces a higher critical t-value: it becomes harder to reject H0, but you have more confidence in the predictive value of a correlation that passes the test.
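The relationship between the significance level and the critical value can be sketched with Python's standard library. Note the assumption: this uses the standard normal distribution as a stand-in for the t-distribution, which is only reasonable when the sample is large (many degrees of freedom). For small samples the true critical t-values are larger, and a t-table should be used instead.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha):
    """Large-sample (normal) approximation to the two-tailed critical t-value."""
    # Split alpha evenly between the two tails of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: critical value = {two_tailed_critical_value(alpha):.3f}")
```

At the 0.05 level this gives roughly 1.96, and the critical value rises as the significance level is lowered.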

Let’s work a numerical example[1]. We determine that the sample correlation r between monthly returns on long-term U.S. government bonds and 30-day T-bills was 0.1119 over 924 months of observations. Is this value of r high enough to reject the hypothesis that returns on the bonds were uncorrelated with returns on the T-bills? For the 0.05 level of significance, the critical t-value is 1.96, and we can plug the values into the t-test:

$$t_{\text{actual}} = \frac{0.1119\sqrt{924-2}}{\sqrt{1-0.1119^2}} = 3.4193 > 1.96 = t_{\text{critical}}$$

Thus, in this example we are able to reject the null hypothesis, and say that there is a statistically significant correlation between returns on government bonds and returns on T-bills.
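The arithmetic in this example can be reproduced with a short Python sketch. The values of r, n and the critical t-value come from the example above; the variable names are just for illustration.

```python
import math

r = 0.1119         # sample correlation from the example
n = 924            # months of observations
t_critical = 1.96  # two-tailed critical value at the 0.05 level

# t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_actual = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed test: compare the absolute value with the critical value.
reject_h0 = abs(t_actual) > t_critical

print(f"t = {t_actual:.4f}, reject H0: {reject_h0}")
```

Even though the correlation itself is small, the large sample pushes t well beyond the critical value.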

We want next to assess the strength of a relationship between an independent and a dependent variable as determined by a linear regression. We will examine this test in our next blog using a statistic called the standard error of estimate.


[1] Quantitative Methods for Investment Analysis, Second Edition, by Richard A. DeFusco, CFA, Dennis W. McLeavey, CFA, Jerald E. Pinto, CFA, and David E. Runkle, 294-295.
