Is the formula for STDEV defined anywhere?

  • Hi,

    We are using STDEV in some stored procedures in SQL 2000.  For our test plan, we have a defined set of data and results that have been calculated using Excel 2000.  We are seeing some differences that we can't account for and given that there is more than one accepted way of calculating standard deviations, I was wondering if the discrepancies are down to Excel 2000 and SQL 2000 doing it in slightly different ways. 

    The Excel help is useful enough to give a definition of the formula it uses for STDEV, but I haven't been able to find the same information for SQL 2000 in the BOL. 

    Anyone know where I might find the formula?

    Thanks!

     

     

  • Sure, there are myriads of source out there.

    Personally I like this one:

    http://mathworld.wolfram.com

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Thanks, but what I actually meant was does anyone know which STDEV formula SQL 2000 uses?

  • Oh, I see.

    But are you concerned about the difference on the 14th decimal place anyway?

    When you play with the example given in BOL and recalculate this in Excel you'll get the following results:

    SQL Server: 4,7731890108535753

    Excel: 4,7731890108535800

    You see that Excel "rounds up" to the 14th decimal place.

    Now what happens in SQL Server? I'm quite sure you won't find anywhere the exact implementation of the formula. However I guess something very similar to this happens internally:

    USE pubs

    DECLARE @a DECIMAL(18,10)

    DECLARE @b-2 DECIMAL(18,10)

    SELECT @a= SUM(royalty)/(COUNT(royalty)*1.0) FROM titles

    SELECT @b-2=SUM(SQUARE(royalty-@a)) FROM titles

    SELECT SQRT(@b/(COUNT(royalty)-1)) FROM titles

    -----------------------------------------------------

    4.7731890108535753

    (1 row(s) affected)

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Oh my, probably easier is to simply look at the execution plan

    StmtText

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    |--Compute Scalar(DEFINE[Expr1002]=sqrt(If (If ([Expr1003]-1>0) then (([Expr1005]-square([Expr1004])/Convert([Expr1003]))/Convert(([Expr1003]-1))) else NULL=NULL) then If ([Expr1003]-1>0) then (([Expr1005]-square([Expr1004])/Convert([Expr1003]))/Conver

    |--Stream Aggregate(DEFINE[Expr1003]=COUNT_BIG([titles].[royalty]), [Expr1004]=SUM(Convert([titles].[royalty])), [Expr1005]=SUM(square(Convert([titles].[royalty])))))

    |--Clustered Index Scan(OBJECT[pubs].[dbo].[titles].[UPKCL_titleidind]))

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Thanks Frank, this information is very useful!  I was trying to ascertain if the differences in the results we are calculating in our app and the expected results in our test plan could have been due to different standard deviation methodologies. 

    From what you have told me, it looks as if there is probably a bug in our code that we need to look into.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply