- Eric Bank
Having examined correlation and linear regression, we now need to understand whether a correlation describes a real relationship or is just the result of chance. Only real relationships are predictive. Put another way, we want to test the null hypothesis (H0) that the correlation coefficient ϱ in the population is equal to zero (ϱ = 0) against the alternative hypothesis (H1) that it is different from zero (ϱ ≠ 0).
Since we are testing whether the correlation differs from zero in either direction (i.e. is significantly larger or smaller than zero), we need to perform a two-tailed test. We assume that the variables (X and Y) are normally distributed – this permits us to perform a t-test:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} $$
where the sample correlation r is an estimate of the population correlation ϱ, and n is the sample size. If H0 is true, the test statistic follows a t-distribution with n – 2 degrees of freedom. We use n – 2 rather than n because two degrees of freedom are used up in estimating the means of X and Y from the sample. If the absolute value of the calculated t-value exceeds the critical t-value for those degrees of freedom, then H0 can be rejected. You can look up the critical t-value in a table at the back of any statistics book. Note that as n increases, the critical t-value decreases: it’s easier to reject the null hypothesis with a larger sample size. Also note that the numerator of the test statistic increases with n, meaning that for a given r you get larger values of t for larger samples. The bottom line is that the likelihood of failing to reject a false H0 decreases with sample size.
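To make the effect of sample size concrete, here is a minimal Python sketch of the test statistic described above. The function name `correlation_t_statistic` and the sample sizes in the loop are illustrative choices, not something taken from the cited text.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t-statistic for H0: rho = 0, given sample correlation r and sample size n."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that n - 2 degrees of freedom is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hold r fixed and let the sample grow: the statistic grows with sqrt(n - 2).
for n in (30, 100, 924):
    print(n, round(correlation_t_statistic(0.1119, n), 4))
```

With r held fixed, only the numerator changes with n, so the same modest correlation can be insignificant in a small sample and significant in a large one.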
When we perform a t-test, we need to specify a level of statistical significance. For example, if we choose the 0.05 level of significance, we accept a 5% chance of rejecting H0 when it is actually true. All else being equal, a lower level of significance produces a higher critical t-value: it becomes harder to reject H0, but a rejection gives you more confidence that the correlation is real.
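The link between the significance level and the critical value can be sketched in Python. Note the assumption: the standard library has no t-distribution, so this uses the standard normal distribution as a large-sample stand-in for the t-distribution. That is only reasonable when the degrees of freedom are large; for small samples, use a t-table instead. The function name `approx_two_tailed_critical_value` is made up for this illustration.

```python
from statistics import NormalDist


def approx_two_tailed_critical_value(alpha: float) -> float:
    """Large-sample (normal) approximation to the two-tailed critical t-value."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # A two-tailed test places alpha / 2 in each tail of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Lower significance levels push the critical value further from zero.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(approx_two_tailed_critical_value(alpha), 2))
```

At the 0.05 level this approximation gives about 1.96, and the value rises as the significance level is lowered.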
Let’s work a numerical example[1]. We determine that the sample correlation r between monthly returns on long-term U.S. government bonds and 30-day T-bills was 0.1119 over 924 months of observations. Is this value of r high enough to reject the hypothesis that returns on the bonds were uncorrelated with returns on the T-bills? For the 0.05 level of significance and 922 degrees of freedom, the critical t-value is approximately 1.96, and we can plug the values into the t-test:
$$ t_{\text{actual}} = \frac{0.1119\sqrt{924-2}}{\sqrt{1-0.1119^{2}}} = 3.4193 > 1.96 = t_{\text{critical}} $$
Thus, in this example we are able to reject the null hypothesis at the 0.05 level and conclude that returns on long-term government bonds and returns on T-bills are correlated.
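The whole example can be reproduced in a few lines of Python. This is a sketch rather than anything from the cited text: the names `check_zero_correlation` and `CorrelationTestResult` are made up, and both the critical value and the p-value rely on a normal approximation to the t-distribution, which is only reasonable here because the sample is large (922 degrees of freedom).

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class CorrelationTestResult:
    t_statistic: float
    critical_value: float
    p_value: float
    reject_h0: bool


def check_zero_correlation(r: float, n: int, alpha: float = 0.05) -> CorrelationTestResult:
    """Two-tailed test of H0: rho = 0, using a large-sample normal approximation."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    std_normal = NormalDist()
    # Approximate critical value and two-sided p-value (assumes large n).
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t)))
    return CorrelationTestResult(t, critical, p_value, abs(t) > critical)


# Bond and T-bill figures from the example: r = 0.1119 over 924 months.
result = check_zero_correlation(r=0.1119, n=924)
print(f"t = {result.t_statistic:.4f}, critical = {result.critical_value:.2f}, "
      f"reject H0: {result.reject_h0}")
```

Running the same function with a much smaller n and the same r is a quick way to see that this correlation would not be significant in a short sample.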
Next, we want to assess the strength of the relationship between an independent and a dependent variable as determined by a linear regression. We will examine this in our next blog post, using a statistic called the standard error of estimate.
[1] Quantitative Methods for Investment Analysis, Second Edition, by Richard A. DeFusco, CFA, Dennis W. McLeavey, CFA, Jerald E. Pinto, CFA, and David E. Runkle, pp. 294-295.
