Correlation measures the linear relationship between two variables. By measuring and relating the variance of each variable, correlation indicates the strength of that relationship. Put another way, correlation answers the question: How well does variable A (the independent variable) explain variable B (the dependent variable)?

The Formula for Correlation

Correlation combines several important and related statistical concepts, namely variance, covariance, and standard deviation. Variance is the dispersion of a variable around its mean, standard deviation is the square root of variance, and covariance measures how two variables move together.

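The formula is:

$$\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$

where Cov(x, y) is the covariance of the two variables, and σ_x and σ_y are their respective standard deviations.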

Because correlation assesses the linear relationship between two variables, what is really required is to measure how much covariance those two variables share, and then to scale that covariance by the product of each variable's individual standard deviation. This scaling is what confines correlation to the range -1 to +1.
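To make the formula concrete, here is a minimal Python sketch that computes correlation exactly as described: the covariance of the two series divided by the product of their standard deviations. It assumes two equal-length lists of numbers; the function name `pearson_correlation` is hypothetical, chosen for illustration.

```python
import math


def pearson_correlation(x: list[float], y: list[float]) -> float:
    """Correlation = Cov(x, y) / (sd_x * sd_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance: average co-movement around the means (divide by n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations: square roots of the sample variances
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


# Two series that move in perfect lockstep
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

The lockstep example returns 1.0, up to floating-point rounding. Using n - 1 (sample) or n (population) makes no difference to the result, because the same divisor appears in the numerator and in both standard deviations and cancels out.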

Common Mistakes with Correlation

The single most common mistake is assuming that a correlation approaching +/- 1 is statistically significant. A reading approaching +/- 1 does increase the chances of actual statistical significance, but without further testing it's impossible to know. Testing a correlation for significance is not straightforward, for a number of reasons. A critical assumption of correlation is that the observations are independent and that the relationship between the variables is linear. Ideally, you would test these assumptions to determine whether a correlation calculation is appropriate at all.
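One standard way to check whether a sample correlation r differs from zero is the t statistic t = r√(n - 2) / √(1 - r²), compared against a Student's t distribution with n - 2 degrees of freedom. The Python sketch below computes it; it assumes independent observations and roughly bivariate-normal data, the function name is hypothetical, and this is one common test among several rather than the only approach.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: the population correlation is zero.

    Under H0, with independent and roughly bivariate-normal observations,
    this follows a Student's t distribution with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# The same r = 0.8, measured on a small and a larger sample
for n in (5, 30):
    print(n, round(correlation_t_statistic(0.8, n), 2))
```

With n = 5 the statistic is about 2.31, below the two-sided 5% critical value of roughly 3.18 for 3 degrees of freedom, so an r of 0.8 is not significant at that level. With n = 30 it is about 7.06, well above the critical value of roughly 2.05 for 28 degrees of freedom. The same correlation reading can therefore be significant or not depending on sample size.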

The second most common mistake is forgetting to normalize the data into a common unit. If you are calculating a correlation between two betas, the units are already normalized: beta itself is the unit. However, if you want to correlate stocks, it's critical to normalize them into percentage returns rather than share price changes. This mistake happens all too frequently, even among investment professionals.
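A minimal Python sketch of the normalization step: converting prices into simple percentage returns before correlating them. The prices are hypothetical, made up for illustration: a $500 stock and a $20 stock that both rise 1% and then fall 2%. The function names are also hypothetical.

```python
def percent_returns(prices: list[float]) -> list[float]:
    """Simple period-over-period returns: (P_t / P_{t-1}) - 1."""
    return [(curr / prev) - 1.0 for prev, curr in zip(prices, prices[1:])]


def dollar_changes(prices: list[float]) -> list[float]:
    """Raw share price changes: P_t - P_{t-1} (the unit mistake to avoid)."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices: both stocks rise 1%, then fall 2%
expensive = [500.0, 505.0, 494.9]
cheap = [20.0, 20.2, 19.796]

print("dollar changes:", dollar_changes(expensive), dollar_changes(cheap))
print("percent returns:", percent_returns(expensive), percent_returns(cheap))
```

The dollar changes of the two stocks differ by a factor of 25, while their percentage returns match up to floating-point rounding, which puts both securities on the same scale.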

For stock price correlation, you are essentially asking two questions: What is the return over a certain number of periods, and how does that return correlate with another security's return over the same periods? This is also why correlating stock prices is difficult: two securities might have a high correlation when returns are measured as daily percentage changes over the past 52 weeks, but a low correlation when they are measured as monthly changes over the same 52 weeks. Which one is "better"? There is no perfect answer; it depends on the purpose of the test.
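A minimal Python sketch of how the sampling interval changes the return series fed into a correlation. It assumes a list of daily closing prices; the `step` parameter and the function name are hypothetical, and treating every 21st trading day as one month is a rough approximation rather than calendar-accurate resampling. The prices below are made up for illustration.

```python
def returns_at_interval(prices: list[float], step: int) -> list[float]:
    """Percentage returns from every `step`-th price (step=1 gives per-period returns)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    sampled = prices[::step]  # keep the 1st, (1 + step)th, ... observations
    return [(curr / prev) - 1.0 for prev, curr in zip(sampled, sampled[1:])]


# Hypothetical daily closes; step=3 stands in for a longer interval here.
# With a year of daily data, step=21 would roughly approximate monthly returns.
daily_prices = [100.0, 101.5, 100.8, 102.2, 103.0, 101.9, 104.1]
print("every day:", returns_at_interval(daily_prices, 1))
print("every 3rd day:", returns_at_interval(daily_prices, 3))
```

Running the same correlation calculation on the daily and on the longer-interval series for two securities can give quite different answers. The longer interval also leaves far fewer observations: 52 weeks of monthly returns yields only about a dozen data points.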

Finding Correlation in Excel

There are several methods for calculating correlation in Excel.

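The simplest way is to take two data sets and use the built-in CORREL function, which takes the two ranges as its arguments, for example =CORREL(A2:A53, B2:B53) (the cell ranges here are illustrative).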

This is a convenient way to calculate the correlation between just two data sets. But what if you want to create a correlation matrix across a range of data sets? To do this, you need Excel's Data Analysis tool, which is part of the Analysis ToolPak add-in (you may need to enable the add-in first). The tool can be found in the Data tab, under Analyze; from its list of analysis tools, choose Correlation.

Select the table of returns as the input range. In this case our columns have titles, so check the box "Labels in first row" so that Excel treats them as titles rather than data. Then choose whether to place the output on the same sheet or on a new sheet.

Once you press Enter, Excel generates the correlation matrix automatically. You can then add some text and conditional formatting to clean up the result.
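To cross-check the matrix Excel produces, the same pairwise calculation can be sketched in Python. This minimal sketch assumes each column of returns is held as a list under its column name; the function names and the return figures are hypothetical, made up for illustration. It prints only the lower triangle, mirroring the layout of Excel's Correlation output.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; the n - 1 divisors cancel, so raw sums suffice."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation for every ordered pair of columns, keyed by (row, column)."""
    names = list(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical monthly percentage returns, for illustration only
returns = {
    "Stock A": [0.02, -0.01, 0.03, 0.01, -0.02],
    "Stock B": [0.01, -0.02, 0.02, 0.02, -0.01],
    "Stock C": [-0.01, 0.02, -0.02, 0.00, 0.01],
}
matrix = correlation_matrix(returns)

# Print the lower triangle, with 1.00 on the diagonal
names = list(returns)
for i, row in enumerate(names):
    cells = [f"{matrix[(row, col)]:.2f}" for col in names[: i + 1]]
    print(row.ljust(8), " ".join(cells))
```

Because correlation is symmetric, the upper triangle would only repeat the lower one, which is why showing a single triangle loses no information.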