April 28, 2006 at 12:31 pm
I have the following SQL for calculating the correlation coefficient. The issue I am having is that if the data doesn't vary then I will get 0 for the denominator, so I need to check and just return NULL as the coefficient is undefined. This is not too slow, but if there is a more efficient way to do the case I would like to know it, as when I have to do this calculation over and over any little improvement will help.
SELECT
(COUNT(*)*SUM(x.Value*y.Value)-SUM(x.value)*SUM(y.value))/case(
SQRT(COUNT(*)*SUM(SQUARE(x.value))-SQUARE(SUM(x.value)))*
SQRT(COUNT(*)*SUM(SQUARE(y.value))-SQUARE(SUM(y.value)))) when 0 then null else (
SQRT(COUNT(*)*SUM(SQUARE(x.value))-SQUARE(SUM(x.value)))*
SQRT(COUNT(*)*SUM(SQUARE(y.value))-SQUARE(SUM(y.value)))) end
correlation
FROM response x
JOIN response y ON x.RespondentNumber=y.RespondentNumber
WHERE
x.SurveyID=@SurveyID
AND y.SurveyID=@SurveyID
and x.SurveyQuestionId = @SATQuestionID
and y.SurveyQuestionId = @CurrentQuestion
April 28, 2006 at 1:29 pm
Here's a link that discusses three different way to avoid the Division by zero error:
http://www.sql-server-helper.com/error-messages/msg-8134.aspx
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply