Posts tagged “maximum value”

Financial Statistics (8) – Prediction Intervals

October 7th, 2010

- Eric Bank

Prediction interval

We continue our review of elementary statistical concepts that are commonly used in the financial industry (e.g., by prime brokerages, hedge funds, and financial analysts). Recall from a recent article that the formula for a linear regression is:

$Y_i = b_0 + b_1 X_i + \varepsilon_i$ for $i = 1, \dots, n$

where:

$Y_i$ is the ith value of the dependent variable

$b_0$ is the y-intercept

$b_1$ is the slope coefficient

$X_i$ is the ith value of the independent variable

$\varepsilon_i$ is the ith value of an error term

$i$ is the index of a particular observation

$n$ is the maximum value of $i$ (the number of observations)

Unfortunately, we do not have access to the population values of $b_0$ and $b_1$, so we are forced to estimate them with $\hat{b}_0$ and $\hat{b}_1$. This is one source of uncertainty in the predicted value of $Y_i$. The second source is the error term $\varepsilon_i$, which is the difference between the value of $Y_i$ given by the regression line and its true value. These two uncertainties raise the question: how confident are we in the forecast? To answer it, we calculate a prediction interval, which is an estimated interval into which future observations will fall, with a given probability, in light of past observations.

For example, if we forecasted that sales for ABC Corporation would grow by 8 percent this year, our prediction would be more meaningful if we were 95 percent confident that sales growth would fall in the interval from 7 percent to 9 percent. An actual growth figure outside the 7% to 9% range would then cast doubt on the forecast.

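We can compute prediction intervals using our old friend, the standard error of the estimate $s$. The variance of the prediction error, $s_f^2$, is built from the square of the standard error of the estimate and can be calculated using this formula:

$$s_f^2 = s^2\left[1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\,s_x^2}\right]$$

Note that $s_x^2$ is the variance of the independent variable X, $\bar{X}$ is the mean of X, and X is the value of the independent variable used for the forecast.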

After you calculate the variance of the prediction error, you choose a significance level α, say 0.05. We then apply another old friend, the t-statistic: the critical value $t_c$ for the forecast interval can be looked up in the back of any statistics textbook. With (1 - α) = 0.95, we can compute the 95 percent prediction interval around the forecast $\hat{Y}$ as

$\hat{Y} \pm t_c s_f$

Let’s take a numerical example[i] as follows:

1)     Assume a linear regression equation that, for X = 0.10, gives the forecast $\hat{Y}$ = 1.3478 + 30.0169(0.10) = 4.3495; the standard error of the estimate s = 0.7422; the mean value of X = 0.0647; the variance of X, $s_x^2$ = 0.004641; the number of observations n = 9; and the number of coefficients (the y-intercept and the slope) = 2.

2)     Assume we are interested in the 95% prediction interval.

3)     Compute the variance of the prediction error:

$s_f^2 = 0.7422^2 \left[1 + \frac{1}{9} + \frac{(0.10 - 0.0647)^2}{(9 - 1)(0.004641)}\right] = 0.630556$

4)     Take the square root of the variance of the prediction error $s_f^2$, giving the standard deviation of the forecast error $s_f = (0.630556)^{1/2} = 0.7941$.

5)     The degrees of freedom = (observations n - number of coefficients) = (9 - 2) = 7. From the back of a statistics book, the two-tailed critical t-statistic for 7 degrees of freedom at the 95% confidence level is $t_c$ = 2.365.

6)     We compute the prediction interval for the 95% level of confidence. It is equal to the following:

$\hat{Y} \pm t_c s_f$ = 4.3495 - 2.365(0.7941) to 4.3495 + 2.365(0.7941) = 2.4715 to 6.2275.

From this example, we are 95% confident that the value of the dependent variable will fall between 2.4715 and 6.2275, the prediction interval.
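The arithmetic in the steps above can be checked with a few lines of Python. This is only a sketch: the function name `prediction_interval` and its parameter names are made up for illustration, and the critical value 2.365 is simply typed in from the t-table rather than computed.

```python
import math

def prediction_interval(y_hat, s, n, x, x_mean, x_var, t_crit):
    """Return (lower, upper, s_f) for a single-variable regression forecast.

    Hypothetical helper: all inputs are the summary statistics quoted in the
    worked example, not values estimated from raw data.
    """
    # Variance of the prediction error:
    # s_f^2 = s^2 * [1 + 1/n + (x - x_mean)^2 / ((n - 1) * x_var)]
    s_f_sq = s ** 2 * (1 + 1 / n + (x - x_mean) ** 2 / ((n - 1) * x_var))
    s_f = math.sqrt(s_f_sq)
    half_width = t_crit * s_f
    return y_hat - half_width, y_hat + half_width, s_f

# Numbers from the worked example; t_crit = 2.365 is the table value for 7 df.
y_hat = 1.3478 + 30.0169 * 0.10
lower, upper, s_f = prediction_interval(
    y_hat, s=0.7422, n=9, x=0.10, x_mean=0.0647, x_var=0.004641, t_crit=2.365
)
print(f"forecast = {y_hat:.4f}, s_f = {s_f:.4f}")
print(f"95% prediction interval: {lower:.4f} to {upper:.4f}")
```

Changing `x` to a value farther from `x_mean` widens the interval, because the last term inside the brackets grows with the squared distance from the mean.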

Well, we have now reviewed the basic concepts pertaining to single-variable linear regressions. We’ll pick up our voyage through financial statistics next time by examining multiple regressions.


[i] DeFusco, McLeavey, Pinto and Runkle, “Quantitative Methods for Investment Analysis, Second Edition”, pages 323-324.

Financial Statistics (2) – Linear Regression: Definition

September 9th, 2010

A linear regression is a statistical method that helps one understand the relationship between two (or more) variables.  It does this in three ways:

  1. It uses one variable to predict the value of another variable
  2. It tests hypotheses concerning the relationship between two variables
  3. It quantifies the strength of the relationship between two variables

As we did in our discussion of linear correlation, we will denote two variables as X and Y; X is the independent variable, Y the dependent one.  A linear regression assumes that there is a linear relationship between X and Y, and is given by the following formula:

$Y_i = b_0 + b_1 X_i + \varepsilon_i$ for $i = 1, \dots, n$

where:

$Y_i$ is the ith value of the dependent variable

$b_0$ is the y-intercept

$b_1$ is the slope coefficient

$X_i$ is the ith value of the independent variable

$\varepsilon_i$ is the ith value of an error term

$i$ is the index of a particular observation

$n$ is the maximum value of $i$ (the number of observations)

In English, the value of the dependent variable $Y_i$ is equal to {the value of the dependent variable when the independent variable’s value is zero ($b_0$)} plus {the product of the slope $b_1$ and the independent variable $X_i$} plus {some error term $\varepsilon_i$}. The error term is that part of $Y_i$ that is not explained by $X_i$. We call $b_0$ and $b_1$ the regression coefficients.

When we speak about the relationship between two variables, we think in terms of many contemporaneous observations (a cross-sectional series) or observations over a period of time (a time-series). Observations are indexed by values 1 to n.  For example, you may be interested in the effect in various countries of money supply (Xi where i refers to a particular country) on the country’s inflation rate (Yi) – that would be a cross-sectional analysis.  Conversely, you would use a time-series analysis to test the money supply/inflation rate relationship in one country over a period of time.

A perfect linear regression would be one where all of the error terms equaled zero. This would indicate that all changes to Y were accounted for by changes to X. For instance, if I eat every cookie handed to me, then there would be no error values when I plot cookies offered versus cookies consumed. In this case, the regression line’s y-intercept would be zero and the slope would equal 1; all actual data values would be points directly on the regression line. Thus, if you offered me 3 cookies, I’d eat 3 cookies. Obviously this example is unrealistic when the number of cookies offered rises above some critical value, say 3 dozen in my case.

A more realistic case is one that plots a straight regression line through the data in which the errors are minimized – the best fit. In real life, we are interested in imperfect correlations, so we need a method to achieve the best fit, which we define as the regression line that minimizes the sum of the squared vertical distances (deviations) between observations and the regression line. This method is called the linear least-squares method. Nifty, but how do we calculate the best fit?

To achieve the best-fitting regression line, we need to find the slope $b_1$ and y-intercept $b_0$ that produce the minimum sum of the squared errors. (We square the errors, which are simply the vertical deviations from the regression line, because we don’t want positive and negative values to cancel each other out.) How do we find these magic regression coefficients? We need to make estimates, which we call the fitted parameters, according to the following formula:

$$\min_{\hat{b}_0,\,\hat{b}_1} \sum_{i=1}^{n} \left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2$$

The funny little hat (^) above b0 and b1 designates that the regression coefficients are estimated. We are summing, for all index values of i, the squares of the following difference: the actual value of the dependent variable minus the predicted value of the dependent variable. When this sum (the sum of the squared error terms) is minimized, we have a best-fit regression line. The actual method of calculating this minimum is complicated, and we leave it to a computer spreadsheet or math package to do the nitty-gritty work.
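For readers who want to see the minimization in action, here is a minimal Python sketch for the single-variable case. It assumes the standard closed-form least-squares solution (slope = sum of cross-deviations divided by sum of squared X deviations, intercept = mean of Y minus slope times mean of X); the data points and the function names `fit_line` and `sum_squared_errors` are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates (b0_hat, b1_hat) for one regressor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1_hat = sxy / sxx
    b0_hat = y_mean - b1_hat * x_mean
    return b0_hat, b1_hat

def sum_squared_errors(xs, ys, b0, b1):
    """Sum over i of (actual Y_i minus predicted Y_i) squared."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up observations, roughly on a line with slope 2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_hat, b1_hat = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, b0_hat, b1_hat)
print(f"b0_hat = {b0_hat:.4f}, b1_hat = {b1_hat:.4f}, SSE = {best_sse:.4f}")

# Nudging either coefficient away from the fitted value should not lower the SSE
nudged_sse = sum_squared_errors(xs, ys, b0_hat + 0.1, b1_hat - 0.05)
print(f"SSE after nudging the coefficients = {nudged_sse:.4f}")
```

A spreadsheet or statistics package does essentially the same job, and also handles the multiple-variable case, where the closed form is less convenient to write by hand.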

A note about the slope coefficient b1: when a linear regression contains a single independent variable, the slope coefficient is equal to the following:

$b_1 = \mathrm{Cov}(Y, X) / \mathrm{Var}(X) = \mathrm{Cov}(Y, X) / (s_x s_x)$, where $s$ = standard deviation

which is the covariance of Y and X divided by the variance of X. Alert readers will recall from the previous post that this formula is very similar to that for the correlation coefficient (r). The difference here is that the denominator, the variance of X, is equivalent to the square of the standard deviation of X ($s_x$). For the correlation coefficient, the denominator is the product of the standard deviations of X and Y:

$r = \mathrm{Cov}(Y, X) / (s_x s_y)$

Conceptually, one can see that the coefficients are very similar – they both give a scale to the covariance of the two variables.
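The similarity is easy to see numerically. The sketch below uses invented data and a hypothetical helper `sample_covariance`; it computes the slope and the correlation coefficient from the same covariance, changing only the denominator.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance of two equal-length series (divides by n - 1)."""
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    cross = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

# Made-up observations for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

cov_yx = sample_covariance(xs, ys)
s_x = statistics.stdev(xs)  # sample standard deviation of X
s_y = statistics.stdev(ys)  # sample standard deviation of Y

slope = cov_yx / (s_x * s_x)  # b1 = Cov(Y, X) / Var(X)
r = cov_yx / (s_x * s_y)      # r  = Cov(Y, X) / (s_x * s_y)

print(f"slope b1 = {slope:.4f}")
print(f"correlation r = {r:.4f}")
```

Because the numerators are identical, the two quantities are tied together by $b_1 = r \cdot s_y / s_x$: the slope is the correlation rescaled into the units of Y per unit of X.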

Next time, we will address the assumptions one makes in order to calculate a proper linear regression.
