Cummulative percents

Question

Post reply

Cummulative percents

Amy.G

SSCrazy

Points: 2305
More actions
October 18, 2011 at 7:34 am

#261550

I have code to create cummulative percents. Unfortunately, it is running extrememly slowly. Here is the code:
declare @table as table (name varchar(15), sales numeric (5,0))
Insert into @table (name, sales)
Values ('John', 10),
('Jennifer', 15),
('Stella', 20),
('Sophia', 40),
('Greg',50),
('Jeff', 20);
SELECT a1.name, a1.sales, SUM(a2.sales)/(SELECT SUM(sales) from @table) as PctToTotal
FROM @table a1, @table a2
WHERE a1.sales <= a2.sales or (a1.sales=a2.sales and a1.name =a2.name)
group by a1.name, a1.sales
order by a1.sales desc, a1.name desc
This runs fine for small tables, but when I switched to actual work tables (~ 500,000 rows) it is way too slow. Going on several hours now. Anyone know a better/faster formula?
Thank you, Amy

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ron McCullough SSC Guru Points: 63877 More actions · Answer 1

Test your code, with this variation. Instead of using a TABLE VARIABLE (@Table) use a Temp Table (#Table)

Look at the QOD posted on 10/03/2011 at

http://www.sqlservercentral.com/Forums/Post.aspx?SessionID=mqk3w345ndswzpnnc0a4njj4

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Amy.G SSCrazy Points: 2305 More actions · Answer 2

bitbucket-25253 (10/18/2011)
Test your code, with this variation. Instead of using a TABLE VARIABLE (@Table) use a Temp Table (#Table)

Ron, That code is just for example on the forum. The actual code is using a temp table (#table).

drew.allen SSC Guru Points: 77004 More actions · Answer 3

Jeff Moden has a very good article on exactly this type of problem.

Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Amy.G SSCrazy Points: 2305 More actions · Answer 4

Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

Amy.G (10/18/2011)
Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

Did you read the discussion as well? I think there may have been some code changes implemented during the discussion. I haven't read the article lately, so I'm not sure if the rewrite has been rewritten again.

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 6

Amy.G (10/18/2011)
Read the article and re-wrote the code. I now know I was trying to do a "triangular join". The performance went from nearly a full day (it never did finish, I put it out of its misery) to 10 seconds. 😀 I'm not sure if you can beat that for performance. Thank you!!!

Amy... Lynn is correct. There have been some improvements to the Quirky Update code mostly for added safety. If you'd post the code you ended up with, I'd be happy to do a double check on it for you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Amy.G SSCrazy Points: 2305 More actions · Answer 7

I did see the changes, and hopefully incorporated them. I should go back and read it again though; I was so excited it worked, but realize I don't fully understand it, like what does MAXDOP mean?

But here is the code I used:

--create clustered index for cum counts

CREATE CLUSTERED INDEX IXC_LED_Data

ON #led (ordering_value, instance_count);

--===== Supress the auto-display of rowcounts for speed an appearance

SET NOCOUNT ON

--===== Declare the working variables

DECLARE @PrevInstance int

DECLARE @RunningPercent numeric(20,15)

--===== Update the running total and running percent for this row using the "Quirky

-- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the

-- order of the clustered index.

UPDATE #led

SET @RunningPercent = RunningPercent = CASE

WHEN Instance_count = THEN @RunningPercent + Percents

ELSE Percents

END,

@PrevInstance = Instance_count

FROM #led WITH (TABLOCKX)

OPTION (MAXDOP 1)

GO

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 8

Amy.G (10/19/2011)
I did see the changes, and hopefully incorporated them. I should go back and read it again though; I was so excited it worked, but realize I don't fully understand it, like what does MAXDOP mean?
But here is the code I used:
--create clustered index for cum counts
CREATE CLUSTERED INDEX IXC_LED_Data
ON #led (ordering_value, instance_count);
--===== Supress the auto-display of rowcounts for speed an appearance
SET NOCOUNT ON
--===== Declare the working variables
DECLARE @PrevInstance int
DECLARE @RunningPercent numeric(20,15)
--===== Update the running total and running percent for this row using the "Quirky
-- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the
-- order of the clustered index.
UPDATE #led
SET @RunningPercent = RunningPercent = CASE
WHEN Instance_count = THEN @RunningPercent + Percents
ELSE Percents
END,
@PrevInstance = Instance_count
FROM #led WITH (TABLOCKX)
OPTION (MAXDOP 1)
GO

Apologies for the very late reply. I lost track of this thread.

You've not incorporated the new "safety counter" but, since you're doing this in a Temp Table, you won't have a problem with this code because you've followed all the other rules.

I invite you to lookup OPTION(MAXDOP 1) in Books Online for additional clarification but, as stated in the article, it prevents parallelism from occuring which could destroy the serial nature of the update. MAXDOP stands for "MAX Degree Of Parallelism".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)