- Eric Bank
We continue our review of elementary statistical concepts that are commonly used in the financial industry (for example, by prime brokerages, hedge funds, and financial analysts). Recall from a recent article that the formula for a linear regression is:
$Y_i = b_0 + b_1 X_i + \varepsilon_i$ for $i = 1, \dots, n$
where:
$Y_i$ is the ith value of the dependent variable
$b_0$ is the y-intercept
$b_1$ is the slope coefficient
$X_i$ is the ith value of the independent variable
$\varepsilon_i$ is the ith value of the error term
$i$ is the index of a particular observation
$n$ is the number of observations
Unfortunately, we do not have access to the population values of $b_0$ and $b_1$, so we are forced to estimate them with $\hat{b}_0$ and $\hat{b}_1$. This is one cause of uncertainty in the predicted value of $Y_i$. The second cause of uncertainty is the error term $\varepsilon_i$, which is the difference between the observed value of $Y_i$ and the value given by the regression line. These two uncertainties raise the question: How confident are we about the forecast results? To answer this question, we calculate a prediction interval, which is an estimated interval into which future observations will fall, with a given probability, in light of past observations.
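To make the estimation step concrete, here is a minimal Python sketch that computes least-squares estimates of the intercept and slope, along with the standard error of the estimate, from a small sample. The data values are made up for illustration and the function name `fit_simple_regression` is hypothetical; neither comes from the referenced example.

```python
import math

def fit_simple_regression(xs, ys):
    """Return (b0_hat, b1_hat, s) for the least-squares line through (xs, ys).

    s is the standard error of the estimate, using n - 2 degrees of freedom
    because two coefficients (intercept and slope) are estimated.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of X and sum of cross-deviations of X and Y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1_hat = sxy / sxx
    b0_hat = y_mean - b1_hat * x_mean
    # Sum of squared residuals around the fitted line
    sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b0_hat, b1_hat, s

# Made-up sample, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b0_hat, b1_hat, s = fit_simple_regression(xs, ys)
print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}, s = {s:.4f}")
```

Because the estimates depend on the particular sample drawn, a different sample would give somewhat different coefficients, which is the first source of uncertainty described above.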
For example, if we forecasted that sales for ABC Corporation would grow by 8 percent this year, our prediction would be more meaningful if we could also say we were 95 percent confident that sales growth would fall in the interval from 7 percent to 9 percent. Growth outside the 7 percent to 9 percent range would then be unlikely, and observing it would cast doubt on the forecast.
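We can compute prediction intervals using our old friend, the standard error of the estimate $s$. The estimated variance of the prediction error, $s_f^2$, is the square of the standard error of the estimate scaled up by a factor that reflects the uncertainty in the estimated coefficients. It can be calculated using this formula:
$$s_f^2 = s^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\, s_x^2} \right]$$
Note that $s_x^2$ is the variance of the independent variable $X$, $\bar{X}$ is its mean, and the $X$ in the numerator is the value of the independent variable at which we are forecasting.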
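As a rough illustration, the variance of the prediction error can be wrapped in a small Python function. This is a sketch that assumes the single-regressor form used in the article's numerical example; the function name and the input values below are hypothetical.

```python
def prediction_error_variance(s, n, x, x_mean, var_x):
    """Estimated variance of the prediction error for a one-variable regression.

    Assumed form: s_f^2 = s^2 * [1 + 1/n + (x - x_mean)^2 / ((n - 1) * var_x)]
      s       standard error of the estimate
      n       number of observations
      x       value of the independent variable being forecast at
      x_mean  sample mean of the independent variable
      var_x   sample variance of the independent variable
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    return s ** 2 * (1 + 1 / n + (x - x_mean) ** 2 / ((n - 1) * var_x))

# Hypothetical inputs: the variance is smallest at the mean of X and grows
# as the forecast point moves away from it.
at_mean = prediction_error_variance(s=1.0, n=10, x=5.0, x_mean=5.0, var_x=4.0)
far_out = prediction_error_variance(s=1.0, n=10, x=8.0, x_mean=5.0, var_x=4.0)
print(at_mean, far_out)
```

The comparison shows why forecasts made far from the mean of the independent variable carry wider prediction intervals than forecasts made near it.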
After you calculate the variance of the prediction error, you choose a significance level $\alpha$, say 0.05. We then apply another old friend, the t-statistic: the critical value $t_c$ for the forecast interval can be looked up in the back of any statistics textbook. With $(1 - \alpha) = 0.95$, we can compute the 95 percent prediction interval for $Y$ around the predicted value $\hat{Y}$ as
$\hat{Y} \pm t_c s_f$
Let’s take a numerical example[i] as follows:
1) Assume the estimated regression equation gives a forecast of $\hat{Y} = 1.3478 + 30.0169(0.10) = 4.3495$ at $X = 0.10$; the standard error of the estimate is $s = 0.7422$; the mean value of $X$ is $\bar{X} = 0.0647$; the variance of $X$ is $s_x^2 = 0.004641$; the number of observations is $n = 9$; and the number of estimated coefficients (the y-intercept and the slope) is 2.
2) Assume we are interested in the 95% prediction interval.
3) Compute the variance of the prediction error:
$s_f^2 = 0.7422^2 \left[ 1 + \frac{1}{9} + \frac{(0.10 - 0.0647)^2}{(9 - 1)(0.004641)} \right] = 0.630556$
4) Take the square root of the variance of the prediction error $s_f^2$, giving the standard deviation of the forecast error $s_f = (0.630556)^{1/2} = 0.7941$.
5) The degrees of freedom = (observations n – number of coefficients) = (9 – 2) = 7. From the back of a statistics book, the two-tailed critical t-statistic for 7 degrees of freedom at the 95% confidence level is $t_c = 2.365$.
6) We compute the prediction interval for the 95% level of confidence. It is equal to the following:
$\hat{Y} \pm t_c s_f = 4.3495 - 2.365(0.7941)$ to $4.3495 + 2.365(0.7941)$, that is, 2.4715 to 6.2275.
From this example, we are 95% confident that the dependent variable will take a value between 2.4715 and 6.2275, the prediction interval, when $X = 0.10$.
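The six steps above can be checked with a short Python script. This is only a sketch: the inputs are the figures quoted in the example, the critical value 2.365 is entered by hand from a t-table rather than computed, and the variable names are my own.

```python
import math

# Inputs quoted in the numerical example
b0_hat, b1_hat = 1.3478, 30.0169   # estimated intercept and slope
x_new = 0.10                       # value of X being forecast at
s = 0.7422                         # standard error of the estimate
x_mean = 0.0647                    # mean of X
var_x = 0.004641                   # variance of X
n = 9                              # number of observations
t_crit = 2.365                     # t-table value: 7 degrees of freedom, 95% level

# Step 1: point forecast
y_hat = b0_hat + b1_hat * x_new

# Step 3: variance of the prediction error
sf2 = s ** 2 * (1 + 1 / n + (x_new - x_mean) ** 2 / ((n - 1) * var_x))

# Step 4: standard deviation of the forecast error
sf = math.sqrt(sf2)

# Step 6: prediction interval around the point forecast
lower = y_hat - t_crit * sf
upper = y_hat + t_crit * sf

print(f"forecast = {y_hat:.4f}")
print(f"sf^2 = {sf2:.6f}, sf = {sf:.4f}")
print(f"95% prediction interval: {lower:.4f} to {upper:.4f}")
```

Run as written, the script should reproduce the interval of roughly 2.47 to 6.23 found by hand; small differences in the last decimal place can arise because the hand calculation rounds at each intermediate step.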
Well, we have now reviewed the basic concepts pertaining to single-variable linear regressions. We’ll pick up our voyage through financial statistics next time by examining multiple regressions.
[i] DeFusco, McLeavey, Pinto and Runkle, “Quantitative Methods for Investment Analysis, Second Edition”, pages 323-324.