September 13, 2006 at 7:04 am
Hi
Who knows an elegant way to calculate the correlation coefficient of two series (columns) in T-SQL. It seems there is no aggregate function as there is one for the standard deviation.
Many thanks in advance
Urs
September 13, 2006 at 7:36 am
b/c ther'es no standard function for this, u'd have to write your own function according to a well-known algorhithm. I'm sure it'll be elegant enough
September 13, 2006 at 7:05 pm
If the columns a VARCHAR or the like, you can use SOUNDEX (won't work on numbers).
--Jeff Moden
Change is inevitable... Change for the better is not.
September 13, 2006 at 9:15 pm
There two Logs in the SQL Server math functions so that is a place to start float data type is required but you can see in the code sample you can convert to varchar. Hope this helps.
http://msdn2.microsoft.com/en-us/library/ms177516.aspx
Kind regards,
Gift Peddie
September 14, 2006 at 1:43 am
Implemented from the formula for the sample correlation coefficient on
http://en.wikipedia.org/wiki/Correlation (I'm assuming you want the linear
correlation coefficient).
Table called "data", columns "x" and "y":
select (count(*) * sum(x * y) - sum(x) * sum(y)) /
(sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) *
sqrt(count(*) * sum(y * y) - sum(y) * sum(y)))
from data
I've tried it with some sample data, and the results look "reasonable" so
I think I've got the implementation right. It is a little numerically unstable
though - I got a value just a shade over 1 for a perfect linear fit.
Tony
September 18, 2006 at 12:46 am
Thank you very much to all of you. I implemented Tony's formula and I got exact the result what I was looking for. I will perform some more tests to check the stability for my secific data sets.
Urs
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply