Standard Deviation in SSAS

  • I've got OLTP data that includes a duration column in seconds.

    My measures are

    Avg of duration

    min of duration

    max of duration

    I have a time dimension

    Year,

    Quarter,

    Month,

    Day,

    Hour

    I have a screens dimension

    Screen Area

    Screen Name

    Screen Version

    What I'm trying to figure out how to do is twofold.

    At every level above the lowest one I'd like to create a measure that calculate the standard deviation of the duration measure (so in the case of screens, I'd like to calculate it for screen area, and screen name) at the lowest level I'd like to calculate how many std deviations out the data is.

    i.e. if the standard deviation of a particular screen name is 10 with a median of 50 at the screen version level if a screen version had an average duration of 20 it would be 3 standard deviations from the average.

    I can honestly say I'm a newbie at SSAS... I get the basics, I've designed a star schema with good key relationships to connect my dimensions to my facts table, but when it comes to calculations I just don't get it.

    any tips, pointers, suggestions?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Can anybody at least point me to a tutorial on creating calculated dimensions?

    So far I've got a calculated dimension that looks like this.

    IIF(ISEMPTY([Measures].[Screen Views Count]),0,

    ((

    ([Measures].[Average Loadtime]*[Measures].[Average Loadtime])*[Measures].[Screen Views Count]) -

    ([Measures].[Average Loadtime]*[Measures].[Average Loadtime])

    ) /

    (([Measures].[Screen Views Count]*([Measures].[Screen Views Count]-1)))^(.5))

    This should calculate the Std deviation adjusted for population bias, but all I get back are #VALUE!

    Which to me means I'm doing division by zero... except that if I'm using IIF(ISEMPTY, I shouldn't get division by zero.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Hey Mark,

    Not sure I follow where/why you want 'calculated dimensions'? This link [/url]is from someone who is doing something very similar to what I'e done previously using the STDEV function within MDX. Is this close to what you're wanting to do (or perhaps way off base)?

    Oh, and on re-reading, remember that 'empty' != 0, so checking for empty is good (ie you'll ignore empty cell intersections) but you *won't* ignore zero's 🙁 You could use a filter to remove these from the cellset.

    Steve.

  • Hey Steve,

    I've looked at creating MDX queries, but my problem is dimensions.

    If I create a query that will calculate the standard deviation, as I understand it, I have to specify which dimensions, etc I want to use when calculating the standard deviation.

    What I want is a dynamic measure that will give me the standard deviation for the value (load times) regardless of which dimensions the user is working with.

    That's part one.

    Then I need to figure out how to reference that from a higher level in a drill down.

    So for instance, if the standard deviation for a year was 10 and the average for the year was 50, when I drilled down to the quarter level, if a given quarter had an average of 20, then it would be 3 standard deviations from the yearly mean.

    Similarly if I were to drill down another level into the per month, if the average for the month above was 20 and the standard deviation for that month was 3, if a given month had an average of 32, it would be 4 standard deviations from the mean for the quarter.

    etc... ad naseum down to hourly averages compared to a day's standard deviation if the user drills that far.

    As for the divide by zero's issue. I'm dividing by the count of records. If ISEMPTY is false, then I by default should have at least a 1 in there.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Hey Mark,

    For the resolution of current level value compared to parent, you could try using something along the lines of (very rough pseudo code)

    (([Measures].[AvgScreenViews]) - (Axis(0).CurrentMember.Parent, [Measures].[AvgScreenViews])) / (Axis(0).CurrentMember.Parent, [Measures].[StdDevs])

    So basically using the Axis function to reference whats currently (in this case) on columns, 'reaching up' to it's parent to get the parents avg plus stdev and then using these in a calc with the current members value.

    The issues i see with this are:

    - what will happen when the current member is the top level (ie year).

    - what will happen if you change the Axis to reference 1 (ie rows) but the user works only with columns?

    You could fix the first one (sortof) by checking for the CurrentMember level, if it's top level, then 'do something else'.

    Not quite sure how to fix the second, other than inform you users that this measure will only work when they utilize rows for the analytical elements (eg year, screen etc).

    Steve.

  • Hmmm which brings to the thought...

    What if they use columns and rows?

    Again... is there a really good tutorial on doing these sorts of things?

    I've looked at the Microsoft tutorial on this stuff, but it just doesn't seem to go deep enough.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Ok so I now am calculating Standard Deviations...

    My problem was my dimension I was using for the count of them had a different name in my cube, but on the formula page it was showing an older name... is there a refresh button I'm missing somewhere?

    Anyways... this generates a standard devation.

    IIF(ISEMPTY([Measures].[Count of Instances]),0,IIF((([Measures].[Count of Instances]*([Measures].[Count of Instances]-1))) = 0,0,

    ((([Measures].[Average Loadtime]*[Measures].[Average Loadtime]*[Measures].[Count of Instances]) -

    ([Measures].[Average Loadtime]*[Measures].[Average Loadtime])

    ) /

    ([Measures].[Count of Instances]*([Measures].[Count of Instances]-1)))^(.5)))

    The problem I have now is it also sticks in 0's for empty cells... Which means I have a lot of extra bits of data in my cube. i.e. if I break it down by software versions and have 10 versions in the dimensino table, but only 1 version with data, the other 9 show up as 0.

    How can I specify that I want to return an empty cell instead of a 0? EMPTY is apparently not the same thing as sticking in a NULL... Is NULL the proper keyword to use here?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Ok... now to figure out how to reference a higher level...

    So I want to get the absolute value of the difference between a lower level's average and the higher level's average, then divide it by the higher level's standard deviation to figure out how many standard deviations from the roll-up average my lower level's average is.

    IIF(ISEMPTY([Measures].[Count of Instances]),0,ABS(([Measures].CurrentMember.Parent.[Average Loadtime])-([Measures].CurrentMember.[Average Loadtime]))/([Measures].CurrentMember.Parent.[Standard Deviation]))

    I would think this would work, but it throws me an error that there's a problem involving the [Average Loadtime]....

    Google is not my friend... so any tips on how I can reference a higher level in the cube's measures to do calculations based on it at a lower level?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Ok I misread what you had there...

    So based on the Axis(0).currentmember.parent approach I tried this

    (ABS(([Measures].[Average Loadtime])-(Axis(1).CurrentMember.Parent,[Measures].[Average Loadtime]))/(Axis(1).CurrentMember.Parent,[Measures].[Standard Deviation]))

    I get back

    the current member function expects a hierarchy expression for the 0 argument

    any more ideas?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • I've also been google searching a couple other problems.

    So I wanted the Average duration for the cube... at each level.

    What I noticed was that AverageofChildren was working fine until it hit the totals... which it then summed the averages... which confused me.

    so I changed the cube to instead sum the durations, and since I was totalling the number of data instances, I then specified to calculate the average by dividing the sums by the counts of instances.

    And I've manage to create a calculation of the number of standard deviations from the next level's average a given lower level average is... but it only works for one heirarchy... still trying to figure out how to make that calculated measure work for any combination of heirarchies/dimensions.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Hey Mark,

    Just catching up on your progress, sounds like you're getting close. A long time ago we had tried to do something similar (i think it was back on AS2K even) where we wanted to create a measure that worked across any dimension, but relied on the members of said dimension. At that time, we couldn't find a solid way of doing it. As you pointed out, using Axis requires you to 1) pick an axis and 2) have your user put something on *that* axis. If you re-read those requirements, what you end up with is something that sounds much more like a calculation performed/created by the user rather than pre-built. At the time, this is what we went with - basically saying that because at the time of query generation, I know what the rows/cols are, then I can, at that same time, create a calculation that works for that given combination.

    Let us all know if you can/do come up with a solid approach as I'm sure this is a situation that a lot of people hit a lot of the time.

    Steve.

  • I was until I realized that my formula to calculate standard deviation was wrong.

    SELECT STDDEV(duration) FROM TABLE

    yeilded something entirely different from the stddev I was calculating with my formula, so dump the formula and try to use the built in MDX STDDEV or STDDEVP.

    Which seems to work if I use it to calculate the STDDEV of my calculated average, but it doesn't go down to the lowest level....

    i.e. because it's calculating STDDEV on the average, when it hits the lowest member of the Hierarchy, I get a -1.#IND result, the parents of said child however calculate STDDEV's

    If I create a new measure that's the duration and set it to have no aggregates, and try to calculate the STDDEV of that, I just get -1.#IND everywhere.

    to say the least, this is getting very frustrating.

    Every article google give me, I read... but they seem to veer away from calculated measures... which if I'm doing statistical calculations means I need. At a min... Mean (or at least an approximation), Min, Max, STDDEV, # of STDDEV's from the mean, etc..

    I can get the first three... but coming up with something I trust as the stddev is killing me.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply