Last time, we defined linear regression and explained the relevant equations. We’ll continue today with a look at the assumptions underlying the proper use of linear regression, limitations on the interpretation of linear regression results, and uses of correlation analysis for financial and economic forecasting.
Assumptions Underlying Linear Regression
1) There must be more data points than there are parameters to estimate. For the two-variable examples we have been discussing, which estimate an intercept and a slope, this requires at least 3 data points. For multi-variable regressions, the number of data points must always exceed the number of variables; otherwise the coefficients cannot be uniquely determined. A related problem is the dreaded multicollinearity, in which regressors are highly correlated with one another; it results in coefficient estimates that may change haphazardly in response to small changes in the model or the data.
2) The regressors (the independent variables on the X-axis that predict the value of the dependent variable on the Y-axis) must be free from measurement error.
3) Some estimation methods assume that observations (more precisely, their error terms) are not strongly correlated with each other, although there are techniques to handle this occurrence.
4) It is preferred that the error terms (ε) all have the same mean (conventionally zero) and the same standard deviation. In that case, the probability distributions of the Y-values all have the same standard deviation, regardless of the associated X-values, a condition called homoscedasticity. Unequal standard deviations in the error terms, called heteroscedasticity, are allowed but decrease the accuracy of certain parameter-estimation methods.
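Assumptions 1 and 4 above lend themselves to a quick numerical check. The following is a minimal sketch using NumPy, not a formal diagnostic: the helper names (`design_matrix`, `residual_degrees_of_freedom`, `residual_spread_by_half`) are hypothetical, and the data are synthetic, generated with noise that is assumed to grow with X so that the heteroscedastic case is visible.

```python
import numpy as np

def design_matrix(x):
    """Return the regression design matrix: an intercept column plus the regressors."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # a single regressor becomes one column
    return np.column_stack([np.ones(x.shape[0]), x])

def residual_degrees_of_freedom(x):
    """Observations minus estimated parameters (the intercept counts as one)."""
    X = design_matrix(x)
    return X.shape[0] - X.shape[1]

def residual_spread_by_half(x, y):
    """Fit y = a + b*x by least squares, then return the residual standard
    deviation for the lower and upper halves of the x-values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = design_matrix(x)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    order = np.argsort(x)
    half = len(x) // 2
    low = resid[order[:half]].std(ddof=1)
    high = resid[order[half:]].std(ddof=1)
    return float(low), float(high)

# Assumption 1: with two points and two parameters, no observations are
# left over to estimate the error; a third point leaves one.
print(residual_degrees_of_freedom([1.0, 2.0]))
print(residual_degrees_of_freedom([1.0, 2.0, 3.0]))

# Assumption 4: synthetic data whose noise scale equals x (an assumed pattern).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, x)
low, high = residual_spread_by_half(x, y)
print(f"residual std, low x: {low:.2f}; high x: {high:.2f}")
```

A markedly larger residual spread in one half than in the other hints at heteroscedasticity. Comparing two halves is only a rough screen chosen here for brevity; formal tests exist for this purpose.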
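Limitations of Linear Regression
1) There may be a strong nonlinear relation among the variables that is not detected by a linear regression. An example would be a quadratic relationship: if the data are spread symmetrically around the turning point of the curve, the linear correlation can be close to zero even though Y is completely determined by X.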
2) Outliers (a few observations with values far away from all others) can compromise the accuracy of a linear regression. Judgment is required to know whether to include or exclude outliers.
3) Correlation does not imply causality. Furthermore, spurious correlations can imply a relationship between variables when in fact none exists. There are three causes of spurious correlations:
- correlation between two variables that arises by chance in a particular set of observations
- correlation created by a calculation that mingles each of two variables with a third (for example, dividing both by the same third variable)
- correlation between two variables created not from a direct relation between them but from their relation to a third variable
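The three limitations above can each be reproduced with a few lines of NumPy. This is a sketch on made-up data, not an analysis of real observations: the helpers `pearson_r` and `residualize` are hypothetical names, and the third-variable example assumes both series are driven by one common factor `z` plus independent noise.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D series."""
    return float(np.corrcoef(a, b)[0, 1])

def residualize(v, z):
    """Remove the linear effect of z from v (residuals of regressing v on z)."""
    Z = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# 1) Quadratic relation, symmetric about zero: y is fully determined by x,
#    yet the linear correlation is essentially zero.
x = np.linspace(-3.0, 3.0, 61)
r_quadratic = pearson_r(x, x ** 2)

# 2) One outlier can manufacture a strong correlation.
x_small = np.arange(10, dtype=float)
y_small = np.array([1.0, -1.0] * 5)               # no real trend
r_without_outlier = pearson_r(x_small, y_small)
r_with_outlier = pearson_r(np.append(x_small, 50.0), np.append(y_small, 50.0))

# 3) Two series related only through a third variable z.
rng = np.random.default_rng(1)
z = rng.normal(size=2000)
a = z + 0.5 * rng.normal(size=2000)
b = z + 0.5 * rng.normal(size=2000)
r_raw = pearson_r(a, b)
r_after_removing_z = pearson_r(residualize(a, z), residualize(b, z))

print(f"quadratic: {r_quadratic:.3f}")
print(f"outlier: {r_without_outlier:.3f} -> {r_with_outlier:.3f}")
print(f"third variable: {r_raw:.3f} -> {r_after_removing_z:.3f}")
```

Correlating the residuals after regressing each series on `z` is one common way to ask how much relationship remains once the third variable is accounted for. It presumes the third variable has been identified and measured, which is often the hard part in practice.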
Examples of Uses for Correlation Analysis
1) Evaluating the accuracy of economic forecasts by running a linear regression between forecast and actual economic results. For example, the outlook for inflation may be forecast by changes in the consumer price index. How accurate would such a forecast be? One could do a linear regression between forecast and actual inflation rates. The higher the correlation, the more useful the forecast.
2) Measuring portfolio manager performance against a specific benchmark, such as the S&P 500. Style analysis is used to choose a benchmark appropriate to the portfolio choices of a specific portfolio manager. If two styles show a very high correlation to each other, there may be no justification for differentiating the two styles. For example, if small-cap growth and small-cap value had a correlation near 1, then a single small-cap benchmark would serve just as well.
3) Optimizing the amounts that currency traders allocate to each currency. By using a correlation matrix, one can see the cross-correlations between any pair of currencies. This information can help a currency trader decide how to hedge currency risks by selecting currencies with low correlation coefficients relative to currencies that dominate a portfolio.
4) Portfolio managers who seek to diversify risks across different asset classes need to know how the returns of each asset class correlate to the returns of other classes. In this way, the manager can determine if an investment in a particular asset class actually provides a sufficient increment of diversification.
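Two of the calculations described in this list, comparing forecasts with outcomes and reading pairwise correlations across holdings, can be sketched as follows. Everything here is illustrative: the series are synthetic, the names `asset_a`, `asset_b`, and `asset_c` are placeholders rather than real instruments, and `evaluate_forecast` and `least_correlated` are hypothetical helpers.

```python
import numpy as np

def evaluate_forecast(forecast, actual):
    """Regress actual values on forecasts; return slope, intercept, correlation."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    slope, intercept = np.polyfit(forecast, actual, 1)
    r = np.corrcoef(forecast, actual)[0, 1]
    return float(slope), float(intercept), float(r)

def least_correlated(corr, names, anchor):
    """Name of the series with the weakest absolute correlation to `anchor`."""
    i = names.index(anchor)
    strength = np.abs(corr[i]).copy()
    strength[i] = np.inf  # exclude the anchor itself
    return names[int(np.argmin(strength))]

rng = np.random.default_rng(42)

# Synthetic inflation rates (percent) and a noisy forecast of them.
actual_inflation = rng.normal(2.5, 1.0, 40)
forecast_inflation = actual_inflation + rng.normal(0.0, 0.5, 40)
slope, intercept, r = evaluate_forecast(forecast_inflation, actual_inflation)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")

# Synthetic returns: asset_a and asset_b share a common driver,
# asset_c is generated independently of both.
n = 500
common = rng.normal(0.0, 0.01, n)
names = ["asset_a", "asset_b", "asset_c"]
returns = np.vstack([
    common + rng.normal(0.0, 0.003, n),
    common + rng.normal(0.0, 0.003, n),
    rng.normal(0.0, 0.01, n),
])
corr = np.corrcoef(returns)  # each row of `returns` is one series
print(np.round(corr, 2))
print(least_correlated(corr, names, "asset_a"))
```

In a portfolio dominated by `asset_a`, the series returned by `least_correlated` is the candidate that adds the most diversification by this measure alone. A real allocation decision would also weigh expected returns, volatility, and how stable the correlations are over time.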
It is important to know whether apparent relationships among variables are caused by chance or are a reflection of the real world. Therefore, it is important to know how to test the significance of a correlation coefficient. We’ll tackle this subject next time.