Cummulative percents

  • I have code to create cummulative percents. Unfortunately, it is running extrememly slowly. Here is the code:

    declare @table as table (name varchar(15), sales numeric (5,0))

    Insert into @table (name, sales)

    Values ('John', 10),

    ('Jennifer', 15),

    ('Stella', 20),

    ('Sophia', 40),

    ('Greg',50),

    ('Jeff', 20);

    SELECT a1.name, a1.sales, SUM(a2.sales)/(SELECT SUM(sales) from @table) as PctToTotal

    FROM @table a1, @table a2

    WHERE a1.sales <= a2.sales or (a1.sales=a2.sales and a1.name =a2.name)

    group by a1.name, a1.sales

    order by a1.sales desc, a1.name desc

    This runs fine for small tables, but when I switched to actual work tables (~ 500,000 rows) it is way too slow. Going on several hours now. Anyone know a better/faster formula?

    Thank you, Amy

  • Test your code, with this variation. Instead of using a TABLE VARIABLE (@Table) use a Temp Table (#Table)

    Look at the QOD posted on 10/03/2011 at

    http://www.sqlservercentral.com/Forums/Post.aspx?SessionID=mqk3w345ndswzpnnc0a4njj4

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • bitbucket-25253 (10/18/2011)


    Test your code, with this variation. Instead of using a TABLE VARIABLE (@Table) use a Temp Table (#Table)

    Ron, That code is just for example on the forum. The actual code is using a temp table (#table).

  • Jeff Moden has a very good article on exactly this type of problem.

    Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

  • Amy.G (10/18/2011)


    Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

    Did you read the discussion as well? I think there may have been some code changes implemented during the discussion. I haven't read the article lately, so I'm not sure if the rewrite has been rewritten again.

  • Amy.G (10/18/2011)


    Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

    Amy... Lynn is correct. There have been some improvements to the Quirky Update code mostly for added safety. If you'd post the code you ended up with, I'd be happy to do a double check on it for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I did see the changes, and hopefully incorporated them. I should go back and read it again though; I was so excited it worked, but realize I don't fully understand it, like what does MAXDOP mean?

    But here is the code I used:

    --create clustered index for cum counts

    CREATE CLUSTERED INDEX IXC_LED_Data

    ON #led (ordering_value, instance_count);

    --===== Supress the auto-display of rowcounts for speed an appearance

    SET NOCOUNT ON

    --===== Declare the working variables

    DECLARE @PrevInstance int

    DECLARE @RunningPercent numeric(20,15)

    --===== Update the running total and running percent for this row using the "Quirky

    -- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the

    -- order of the clustered index.

    UPDATE #led

    SET @RunningPercent = RunningPercent = CASE

    WHEN Instance_count = THEN @RunningPercent + Percents

    ELSE Percents

    END,

    @PrevInstance = Instance_count

    FROM #led WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    GO

  • Amy.G (10/19/2011)


    I did see the changes, and hopefully incorporated them. I should go back and read it again though; I was so excited it worked, but realize I don't fully understand it, like what does MAXDOP mean?

    But here is the code I used:

    --create clustered index for cum counts

    CREATE CLUSTERED INDEX IXC_LED_Data

    ON #led (ordering_value, instance_count);

    --===== Supress the auto-display of rowcounts for speed an appearance

    SET NOCOUNT ON

    --===== Declare the working variables

    DECLARE @PrevInstance int

    DECLARE @RunningPercent numeric(20,15)

    --===== Update the running total and running percent for this row using the "Quirky

    -- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the

    -- order of the clustered index.

    UPDATE #led

    SET @RunningPercent = RunningPercent = CASE

    WHEN Instance_count = THEN @RunningPercent + Percents

    ELSE Percents

    END,

    @PrevInstance = Instance_count

    FROM #led WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    GO

    Apologies for the very late reply. I lost track of this thread.

    You've not incorporated the new "safety counter" but, since you're doing this in a Temp Table, you won't have a problem with this code because you've followed all the other rules.

    I invite you to lookup OPTION(MAXDOP 1) in Books Online for additional clarification but, as stated in the article, it prevents parallelism from occuring which could destroy the serial nature of the update. MAXDOP stands for "MAX Degree Of Parallelism".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply