Originally developed by the statistician Karl Pearson, the correlation coefficient, also known as Pearson’s r, is meant to identify the degree of linear dependence between two variables. In securities analysis, it can be utilized to measure the similarity between two stocks’ price movements over time. The correlation coefficient has a range between -1.0 and 1.0, where a result of 1 exhibit perfect correlation between two variables, a result of -1 exhibits perfect negative correlation, and 0 reveals no correlation. Two similar stocks in the same industry experiencing similar growth will have a higher correlation than stocks in variant industries with different levels of growth. For more on the correlation coefficient, check out this link.
By building a matrix, we can measure the correlation of a set of securities against all members of that set.
We’ll start by creating and populating a price table. Please run the attached script (prices.sql). The table stores the 8 fictitious stocks shown below:
Ticker | Description |
CAR1 | Automotive Company |
DRG1 | Pharmaceutical Company |
GLD1 | Gold Mining Company |
OIL1 | Energy Company |
RTL1 | Retail Consumer Goods Company |
TCH1 | Technology Company 1 |
TCH2 | Technology Company 2 |
TCH3 | Technology Company 3 |
Once the price table is built and populated, we can build a scalar function to calculate the correlation coefficient between two stocks(see below). This function accepts 4 variables: @TickerA and @TickerB represent the two stocks for which we’re measuring correlation, while @DateStart and @DateEnd reflect the date range over which we’re making the measurement. Two assumptions made when using this function is that there are no null prices for either security throughout the specified range, and the specified range is sufficiently large(30 days or greater). A smaller date range would require a slightly different calculation.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[fnGetCorrelation]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT')) DROP FUNCTION [dbo].[fnGetCorrelation] GO CREATE FUNCTION [dbo].[fnGetCorrelation] ( @TickerA NVARCHAR(10), @TickerB NVARCHAR(10), @DateStart DATETIME, @DateEnd DATETIME ) RETURNS NUMERIC(22,6) AS BEGIN DECLARE @MeanA NUMERIC(22,6), @MeanB NUMERIC(22,6), @Corr NUMERIC(22,6) --Get Mean Value for Ticker A SELECT @MeanA = AVG(Price) FROM Prices WHERE ticker = @TickerA AND Date >= @DateStart AND Date <= ISNULL(@DateEnd, GETDATE()) GROUP BY ticker --Get Mean Value for Ticker B SELECT @MeanB = AVG(Price) FROM Prices WHERE ticker = @TickerB AND Date >= @DateStart AND Date <= ISNULL(@DateEnd, GETDATE()) GROUP BY ticker --Calculate Correlation Coefficient SELECT @Corr =(AVG((prcA.Price-@MeanA)*(prcB.Price-@MeanB)))/(STDEVP(prcA.Price)*STDEVP(prcB.Price))--, FROM Prices prcA JOIN Prices prcB ON prcA.Date = prcB.Date WHERE prcA.ticker = @TickerA AND prcB.TICKER = @TickerB AND prcA.Date >= @dateStart AND prcA.Date <= @dateend RETURN(@Corr) END GO
Let’s test out the function as follows:
select dbo.fnGetCorrelation('CAR1', 'TCH1','1/1/2011','3/31/2011') AS Corr
Results:
Corr |
-0.338718 |
The above reveals that the price movements of stocks CAR1 and TCH1, with a correlation of roughly -.34, are very much dissimilar.
Now that we’ve got a function capable of calculating correlation, let’s build the matrix. We’ll need to create a stored procedure that accepts a table-valued parameter(TVP) to load a set of tickers(check out this link for more info on TVPs) to build the matrix. But before building the procedure, we’ll first need to create a table type and subseqent TVP:
IF EXISTS (SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'TickerListType' AND ss.name = N'dbo') DROP TYPE [dbo].[TickerListType] GO CREATE TYPE [dbo].[TickerListType] AS TABLE ( [ticker] [nvarchar](30) NOT NULL, PRIMARY KEY CLUSTERED ( [ticker] ASC )WITH (IGNORE_DUP_KEY = OFF) ) GO DECLARE @Tickers AS TickerListType /* Add data to the table variable. */ INSERT INTO @Tickers (ticker) VALUES ('CAR1'), ('DRG1'), ('GLD1') SELECT * FROM @Tickers
We’ll now build the stored procedure which will utilize a CROSS JOIN to fire off the fn_GetCorrelation function against all possible two-security combinations very easily.
The procedure will then perform a dynamic pivot and a self CROSS JOIN to calculate the correlation for all possible security combinations and lay them out in a matrix(please note that I used Andras Belokosztolszki’s fantastic dynamic pivot method, which is explained here). The dynamic pivot is required in order to accommodate a variable number of input securities as columns in the output:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[uspGetCorrMatrix]') AND type in (N'P', N'PC')) DROP PROCEDURE [dbo].[uspGetCorrMatrix] GO CREATE PROC [dbo].[uspGetCorrMatrix] @Tickers TickerListType READONLY, @DateStart NVARCHAR(10), @DateEnd NVARCHAR(10) AS DECLARE @Sql NVARCHAR(MAX), @Columns NVARCHAR(MAX) SELECT @Columns = COALESCE(@Columns + ',','') + QUOTENAME(TICKER) FROM (SELECT DISTINCT ticker FROM @tickers) AS Columns ORDER BY Columns.ticker --select @Columns --select * FROM @Tickers set @SQL = ' SELECT CorrMatrix, ' + @Columns + ' FROM ( select CorrMatrix = a.TICKER, Ticker2 = b.TICKER, Corr = dbo.fnGetCorrelation(a.TICKER, b.TICKER, ''' + @DateStart + ''', ''' + @DateEnd + ''') from @Tick a CROSS JOIN @Tick b ) AS SourceTable PIVOT ( SUM(Corr) FOR Ticker2 IN (' + @Columns +') ) AS PivotTable ' EXEC sp_executesql @SQL, N'@tick TickerListType readonly', @Tickers GO
Let’s now fire off the procedure by first populating a TVP with a set of tickers:
DECLARE @tick AS TickerListType INSERT INTO @tick (TICKER) VALUES ('CAR1'), ('DRG1'), ('GLD1'), ('OIL1'), ('RTL1') EXEC uspGetCorrMatrix @Tick, @DateStart = '01/01/2011', @DateEnd = '03/31/2011';
The output is as follows:
CorrMatrix | CAR1 | DRG1 | GLD1 | OIL1 | RTL1 |
CAR1 | 1 | -0.83086 | 0.491672 | -0.77953 | 0.684447 |
DRG1 | -0.83086 | 1 | -0.52497 | 0.721084 | -0.61818 |
GLD1 | 0.491672 | -0.52497 | 1 | -0.1153 | 0.335694 |
OIL1 | -0.77953 | 0.721084 | -0.1153 | 1 | -0.37631 |
RTL1 | 0.684447 | -0.61818 | 0.335694 | -0.37631 | 1 |
Please use caution in utilizing this code against a large number of tickers over a long period as the query cost increases exponentially as we add additional tickers.