Many people who work at financial institutions, such as prime brokerages and hedge funds, have had formal financial training, including the use of statistics and other quantitative methods. Today we are launching a series of blogs that cover these important topics at a straightforward, accessible level. We’ll assume you have had some exposure to the subject matter (for instance, you are familiar with terms like population and sample) and that you can handle simple algebra.
Statistics play a key role in financial modeling, so we’ll begin by looking at linear correlations and linear regressions.
The reasons for employing statistical methods are data analysis and prediction. Data can be organized and presented in many ways. One of the most popular presentations is a scatter plot, in which two series of observations are plotted on an x-y coordinate graph. For each data pair (that is, two simultaneous observations), the appropriate point is shown on the graph as the intersection of the x and y values. For instance, if we place money-supply growth on the x-axis and inflation rate on the y-axis, we can plot a series of unconnected points that indicate some kind of relationship between the two data series.
To indicate how closely two data series are related, we use a measure of their linear association, the correlation coefficient (r). The values that r can have range from -1 (perfect negative correlation) through zero (no linear correlation) to +1 (perfect positive correlation). To calculate the r of a data sample, we must first understand another statistic: sample covariance.
Covariance measures the extent to which two variables (X, Y) change together. It is given by the following equation:

$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

where:
$n$ is the number of data pairs
$i$ is a particular value from 1 to $n$
$X_i$ is the ith X variable, $Y_i$ is the ith Y variable
$\bar{X}$ and $\bar{Y}$ are the mean X and Y values, respectively
In English, this states that the sample covariance is the average value of the product of the deviations of observations on two random variables from their sample means. Dividing by (n – 1) instead of n ensures that the sample covariance is an unbiased estimate of the population covariance.
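The definition above translates almost line for line into code. Here is a minimal Python sketch (the function name and example data are my own, not from the original post):

```python
def sample_covariance(xs, ys):
    """Average product of deviations from the sample means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Example: Y moves in lockstep with X, so the covariance is positive.
print(sample_covariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 5.0
```

Note the division by n - 1 rather than n, matching the unbiasedness point made above.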
To show the relationship between covariance and r, we note that if we take the covariance of X with itself, we have calculated the variance of X. Variance (denoted by the symbol $s^2$) is a measure of how far values deviate from their mean, and is given by the following equation:

$$s_X^2 = \operatorname{Cov}(X, X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
This is the variance of X, a measure of X’s dispersion around its mean. Standard deviation ($s_X$) is the positive square root of variance:

$$s_X = \sqrt{s_X^2}$$
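We can verify the claim that the covariance of X with itself equals the variance of X with a quick check in Python (made-up data, hypothetical function name):

```python
def sample_covariance(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
# Sample variance computed directly from its own definition.
variance = sum((x - sum(xs) / len(xs)) ** 2 for x in xs) / (len(xs) - 1)

print(sample_covariance(xs, xs) == variance)  # True: Cov(X, X) is Var(X)
```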
Now we have all of the elements in place to calculate the sample correlation coefficient:

$$r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}$$
Thus, the correlation coefficient, r, is equal to the covariance of the two variables divided by the product of their standard deviations. Think of it as the covariance normalized for the dispersion of each variable.
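Putting the pieces together, r can be computed by normalizing the covariance by the two standard deviations. A self-contained Python sketch (names and data are illustrative):

```python
import math

def sample_covariance(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    # Cov(X, X) is the variance of X; its square root is the standard deviation.
    return math.sqrt(sample_covariance(xs, xs))

def correlation(xs, ys):
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))  # close to 1.0 (perfect positive)
print(correlation(xs, [10, 8, 6, 4, 2]))  # close to -1.0 (perfect negative)
```

Because the covariance is divided by the product of the standard deviations, the result is dimensionless and confined to the range [-1, +1].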
For the correlation coefficient, it is assumed that the means and variances of X and Y, as well as Cov(X, Y), are finite and constant. Note that r refers solely to linear associations between X and Y, that is, relationships involving no exponents greater than 1.
A value of r equal to, say, 0.9 would indicate a strong linear relationship between X and Y, but not necessarily any causal relationship between the two variables. A classic example of spurious correlation is the one between vocabulary and height: one may infer that the real relationship has something to do with age, which drives both.
Forecasters use correlations to analyze trends and changes in trends. For instance, a change in the consumer price index (CPI) is correlated with a change in the inflation rate. So whenever a new CPI figure is released, economists revise their forecasts for inflation, which in turn affect interest rates and bond prices. When dealing with more than two variables, a correlation matrix is used to sort out the various linear relationships among the variables.
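A correlation matrix is simply the pairwise r for every combination of series, with 1s on the diagonal (each series is perfectly correlated with itself). A minimal sketch, reusing the correlation formula above on made-up data:

```python
import math

def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

def correlation_matrix(series):
    """Pairwise r for a list of equal-length series."""
    return [[correlation(a, b) for b in series] for a in series]

# Three illustrative series; the resulting matrix is symmetric.
data = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
for row in correlation_matrix(data):
    print([round(r, 2) for r in row])
```

In practice, analysts inspect the off-diagonal entries of such a matrix to spot which variable pairs move together.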
Next time out, we’ll tackle linear regressions.