- Eric Bank
As we pointed out in our discussion of the standard error of estimate, it would be nice to know how well the independent variable X explains variation in the dependent variable Y. To calculate the fraction of the total variation in the dependent variable that is explained by the independent variable, one uses the coefficient of determination (R²).
There are two ways to calculate R². The easier method, which applies to a linear regression with a single independent variable, is to square the correlation coefficient. Recall from a previous blog that the correlation coefficient, r, is equal to the covariance of the two variables divided by the product of their standard deviations, s_x and s_y. (We pointed out that covariance measures the extent to which two variables, X and Y, change together.) The formula for the correlation coefficient is:
r = Cov(X, Y) / (s_x s_y)
Squaring r gives us R², the coefficient of determination. However, this shortcut works only when there is a single independent variable (X); with more than one, there is no single r to square.
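The squared-correlation shortcut can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names and the five data points are made up for the example, and the sample (n - 1) versions of covariance and standard deviation are assumed.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    return sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of a variable's covariance with itself."""
    return sqrt(sample_covariance(values, values))

def correlation(xs, ys):
    """r = Cov(X, Y) / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Made-up illustrative data, not taken from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_squared = r ** 2  # coefficient of determination for a single X

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

For this toy data set the covariance works out to 1.5 and the product of the standard deviations to about 1.936, so r is about 0.775 and R² is 0.6.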
The alternative calculation of R², which also works with multiple independent variables, starts from the following decomposition:
Total variation = Unexplained variation + Explained variation
Since R² stands for the fraction of the total variation that is explained by a linear regression, we get this solution:
R² = Explained Variation / Total Variation = 1 - (Unexplained Variation / Total Variation)
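The decomposition can be checked numerically. The sketch below assumes an ordinary least-squares fit with an intercept (the setting in which total variation splits cleanly into explained plus unexplained) and uses a single made-up X for brevity; the function names are hypothetical, and `variation_breakdown` only needs actual and predicted Y values, so it would apply equally to predictions from a regression with several independent variables.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single independent variable."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    slope = (sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
             / sum((x - x_avg) ** 2 for x in xs))
    intercept = y_avg - slope * x_avg
    return intercept, slope

def variation_breakdown(actual, predicted):
    """Return (total, unexplained, explained) sums of squares."""
    y_avg = sum(actual) / len(actual)
    total = sum((y - y_avg) ** 2 for y in actual)
    unexplained = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    explained = sum((p - y_avg) ** 2 for p in predicted)
    return total, unexplained, explained

# Made-up illustrative data, not taken from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]

total, unexplained, explained = variation_breakdown(y, predicted)
r_squared = 1 - unexplained / total

print(f"total = {total:.4f}, unexplained = {unexplained:.4f}, explained = {explained:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Here the total variation is 6.0, of which 2.4 is unexplained and 3.6 explained, so both forms of the formula give R² = 0.6.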
There is one more alternative for calculating R². Linear regression packages typically report a statistic called multiple R, which is the correlation between the actual Y values and the predicted Y values. R² is the square of multiple R.
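To illustrate the multiple R route, the sketch below correlates actual Y values with predicted Y values and squares the result. The data are made up, and the predictions are assumed to come from the least-squares line Y = 2.2 + 0.6X fitted to that same data; the `correlation` helper is written out by hand so the example does not depend on any particular regression package.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_avg = sum(a) / n
    b_avg = sum(b) / n
    cross = sum((ai - a_avg) * (bi - b_avg) for ai, bi in zip(a, b))
    a_ss = sum((ai - a_avg) ** 2 for ai in a)
    b_ss = sum((bi - b_avg) ** 2 for bi in b)
    return cross / sqrt(a_ss * b_ss)

# Made-up illustrative data, not taken from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
actual_y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Assumed fitted model: the least-squares line for the data above.
predicted_y = [2.2 + 0.6 * xi for xi in x]

multiple_r = correlation(actual_y, predicted_y)
r_squared = multiple_r ** 2

print(f"multiple R = {multiple_r:.4f}, R^2 = {r_squared:.4f}")
```

With one independent variable, multiple R is simply the absolute value of r, so this agrees with squaring the correlation between X and Y directly. With several independent variables, only the list of predicted values changes.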
As an example, let’s take the results from a hypothetical linear regression of inflation rate on money supply growth rate for several different countries over a particular period of time. We calculate the following results:
- the total variation, which is the sum of the squared deviations (Y_i - Y_avg)², is 0.001598
- the unexplained variation is 0.000386
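The value for R² is therefore (0.001598 - 0.000386) / 0.001598 ≈ 0.7584. In other words, money supply growth explains roughly 76% of the variation in inflation across these countries.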
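The arithmetic in this example can be reproduced directly from the two figures quoted above. The variable names are chosen for illustration only, and the result depends on the rounded inputs as printed.

```python
# Figures quoted in the example above.
total_variation = 0.001598
unexplained_variation = 0.000386

# Explained variation is whatever part of the total is not left unexplained.
explained_variation = total_variation - unexplained_variation

# Both forms of the formula give the same answer.
r_squared = explained_variation / total_variation
r_squared_alt = 1 - unexplained_variation / total_variation

# Multiple R is the square root of R^2.
multiple_r = r_squared ** 0.5

print(f"explained variation = {explained_variation:.6f}")
print(f"R^2 = {r_squared:.4f}")
print(f"multiple R = {multiple_r:.4f}")
```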
Now, when you inspect the output of a linear regression, you’ll understand the reported R² statistic and can judge how much of the variation in Y the model actually explains.
We are making great progress with our review of elementary financial statistics. Next time, we’ll look at analysis of variance (ANOVA) and the F-test.