December 8, 2008 at 12:18 pm
Kris (12/5/2008)
Is it save to assume that both Pivot and Cross Tabs preform better than tabulating the data from a view that inner joins the table on itself for each quarter?
Jeff Moden (12/8/2008)
It's never safe to assume anything of this nature. Only testing with the appropriate amount of data will prove things.
Indeed, assumptions are not safe.
I tried the following query, probably not the best way of doing it but I'm still a newbie to performance tuning
I added an index (that the execution plan suggested) on the 1million record set it seemed to preform better than the pivot but worse than the cross tab. I must stress however that I am a newbie to performance tuning.
CREATE NONCLUSTERED INDEX [ind_allcolumns]
ON [dbo].[CrossTabTest1Mill] ([Quarter])
INCLUDE ([Company],[Amount],[Quantity],[Year])
SELECT yearly.Company,
yearly.Year,
q1.q1Amt, q1.q1Qty,
q2.q2Amt, q2.q2Qty,
q3.q3Amt, q3.q3Qty,
q4.q4Amt, q4.q4Qty,
yearly.TotalAmount,
yearly.TotalQty
FROM (SELECT Company, Year , SUM(Amount) AS TotalAmount,SUM(Quantity) AS TotalQty
FROM CrossTabTest1Mill GROUP BY Company,Year) yearly inner join
(SELECT Company, Year ,Quarter,SUM(Amount) AS q1Amt,SUM(Quantity) AS q1Qty
FROM CrossTabTest1Mill where Quarter = 1
GROUP BY Company,Year,Quarter) q1 on yearly.Company = q1.Company and yearly.Year = q1.Year
inner join
(SELECT Company, Year ,Quarter,SUM(Amount) AS q2Amt,SUM(Quantity) AS q2Qty
FROM CrossTabTest1Mill where Quarter = 2
GROUP BY Company,Year,Quarter) q2 on yearly.Company = q2.Company and yearly.Year = q2.Year
inner join
(SELECT Company, Year ,Quarter,SUM(Amount) AS q3Amt,SUM(Quantity) AS q3Qty
FROM CrossTabTest1Mill where Quarter = 3
GROUP BY Company,Year,Quarter) q3 on yearly.Company = q3.Company and yearly.Year = q3.Year
inner join
(SELECT Company, Year ,Quarter,SUM(Amount) AS q4Amt,SUM(Quantity) AS q4Qty
FROM CrossTabTest1Mill where Quarter = 4
GROUP BY Company,Year,Quarter) q4 on yearly.Company = q4.Company and yearly.Year = q4.Year
ORDER BY yearly.Company, yearly.Year
Do you have any advice on increasing the performance of this query?
Kris (12/5/2008)
I'm working on a survey database, the questions needs to be fairly dynamic. The survey question set is up to 500 questions and at least 20% of which will change. I was looking at making a participant table, a question table and an answer table.The answer table would be an EAV table with a participant id, question id, and then an answer.
The initial survey count will be about 300, however they are talking about the program increasing to something like 1000-2000 and with dreams of hitting 20,000.
At 300 surveys that would yield 150,000 rows.
At 1500 surveys that would yield 750,000 rows
At 20,000 surveys would yield 10,000,000 rows
For the most part(95% of questions) the data needs to be extracted to insert, edit, and view a single survey. However for the reports they would want to view a survey_id along side all 500 columns(for each question) for each survey ids. (so they could put it in a spreadsheet and analyze it)
In your article, the queries seemed to preform rather well on my machine, but it is only tabulated to 12 columns wide for 80 rows, do you think it would still preform as well if you went to 50 columns wide or even 500?
Jeff Moden (12/8/2008)
Again... I don't believe the survey database is correctly designed if you need to drag this into a spreadsheet. You should be able to get the survey ids for each answered question without loading it into a spreadsheet and you should be able to do all sorts of analysis without loading it into a spreadsheet.If people insist on loading this data into a spreadsheet, perhaps Part 2 of this series would be better suited...
[font="Arial Black"](click here) Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs[/font][/url]
The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)
P.S. I have already read part 2 of your series and I plan on using it to develop the stored procedures for this project.
December 8, 2008 at 12:28 pm
Kris (12/8/2008)
Do you have any advice on increasing the performance of this query?
Heh... yeah... use the CrossTab code instead. 😉
The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)
Ok... whatever... I think they're a bit whacked for doing it that way, but ok. They're not easily going to get 500 columns if this is for SQL Server 2000 but can be done easily in 2k5.
Recommend that if you have further questions on this particular subject, that you start a new thread in the appropriate forum so that more folks than just me will reply and to keep from dilluting this particulart thread too much..
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2008 at 6:44 pm
Kris (12/8/2008)
The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)
Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 8, 2008 at 7:01 pm
rbarryyoung (12/8/2008)
Kris (12/8/2008)
The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.
Heh... now, now, Barry... we all know that a single user is smarter than all the DBA's on these forums... 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2008 at 8:50 pm
But this is more than bad design, it's just completely unnecessary. Excel adds nothing to this process, except limitations and overhead.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 9, 2008 at 4:57 am
Absolutely agree... but it won't change their mind...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 9, 2008 at 12:23 pm
rbarryyoung (12/8/2008)
But this is more than bad design, it's just completely unnecessary. Excel adds nothing to this process, except limitations and overhead.
... And potential data loss.
December 11, 2008 at 11:02 am
Jeff Moden (12/8/2008)
rbarryyoung (12/8/2008)
Kris (12/8/2008)
The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.
Heh... now, now, Barry... we all know that a single user is smarter than all the DBA's on these forums... 😛
pretty much, it wasn't my decision, I simply have to make a program to fit within their protocol. With reporting services, the excel format is easy, so long as I can make the right query. Which makes this cross tab article very useful.
As far as the excel format being bad, that's something they'll believe once they see it. Then it'll be an easy fix on my side.
Anyways, great article Jeff very helpful
December 11, 2008 at 11:23 am
Its pretty hard to understand why your SAS and SPSS users would want to wait for SQL Server to export to Execl, then import from Excel when they could just import straight from SQL Server.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 11, 2008 at 8:00 pm
Kris (12/11/2008)
pretty much, it wasn't my decision, I simply have to make a program to fit within their protocol. With reporting services, the excel format is easy, so long as I can make the right query. Which makes this cross tab article very useful.As far as the excel format being bad, that's something they'll believe once they see it. Then it'll be an easy fix on my side.
Anyways, great article Jeff very helpful
Ah... now I get it... it's one of my favorite sayings... "Give them the opportunity to fail". They won't really know or care if their method is wrong or flawed until they've tried it and until they try it, they're not gonna believe you. So, you have to give them the opportunity to try it and see. Well done and very patient of you. 😉
And, thanks for the compliment, Kris. If you run into any problems on this, post back and I'll try to answer your questions.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 12, 2009 at 6:45 am
Hey Jeff!
Well presented and thought-provoking article, as millions of others have already said 😉
I do love a performance challenge though, and the figures showing that PIVOT was slower than the old CASE trick got me interested. The below code is not pretty nor particularly readable (or even finished!) but it does illustrate a point. Using a nested PIVOT on the one million row example used 3,473 logical reads (one scan of the index - the same as the winning "normal" cross-tab, and slightly less CPU than the winning "Pre-Agg Cross-Tab with CTE".
Cheers,
Paul
;WITH
PartialAgg AS
(
SELECT[Company], [Year], [Quarter] AS Q, [Quarter] AS Q2, SUM(Amount) AS Amt, SUM(Quantity) AS Qty
FROM#sometable3 as PA
GROUPBY
[Company], [Year], [Quarter]
),
PivotOne AS
(
SELECT[Company], [Year], Q2 AS
,
ISNULL([1],0) AS Q1Amt,ISNULL([2],0) AS Q2Amt,ISNULL([3],0) AS Q3Amt,ISNULL([4],0) AS Q4Amt, qty
FROMPartialAgg
PIVOT(SUM(Amt) FOR
IN ([1], [2], [3], [4])) AS pvt
),
PivotTwo AS
(
SELECT[Company], [Year],
SUM(Q1Amt) AS Q1Amt, SUM(Q2Amt) AS Q2Amt, SUM(Q3Amt) AS Q3Amt, SUM(Q4Amt) AS Q4Amt,
SUM(ISNULL([1],0)) AS Q1Qty, SUM(ISNULL([2],0)) AS Q2Qty, SUM(ISNULL([3],0)) AS Q3Qty, SUM(ISNULL([4],0)) AS Q4Qty
FROMPivotOne
PIVOT(Sum(Qty) FOR
IN ([1], [2], [3], [4])) AS pvt
GROUPBY
[Company], [Year]
)
SELECT[Company], [Year], Q1Amt, Q1Qty, Q2Amt, Q2Qty, Q3Amt, Q3Qty, Q4Amt, Q4Qty,
Q1Amt + Q2Amt + Q3Amt +Q4Amt AS TotalAmt,
Q1Qty + Q2Qty + Q3Qty + Q4Qty AS TotalQty
FROMPivotTwo
ORDERBY
[Company], [Year];
April 12, 2009 at 7:31 am
Outstanding... thanks for the post and the code, Paul... I'll check it out. That's what I love about forums. Lots of folks with lots of ideas.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 12, 2009 at 8:14 am
Jeff Moden (4/12/2009)
Outstanding... thanks for the post and the code, Paul... I'll check it out. That's what I love about forums. Lots of folks with lots of ideas.
Just to repeat what I posted elsewhere, please don't look upon this a good idea in practice - it's an ugly hack to get around the double-PIVOT join, and to take advantage of the fact that the plan has to evaluate all CASE statements for all rows (the hack presented does lots more work but on many fewer rows).
This is a curiosity - that's all - until PIVOT is extended to do multiple aggregations, anyway :-).
April 12, 2009 at 7:25 pm
Thanks for your concern. If I'm not mistaken, PIVOT can do multiple aggregations... I just gotta look back in my notes because I don't use PIVOT for all the reasons I stated in the article. I know I have code somewhere that shows how to do multiple aggregations using PIVOT.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 3:14 pm
Jeff Moden (4/12/2009)
Thanks for your concern. If I'm not mistaken, PIVOT can do multiple aggregations... I just gotta look back in my notes because I don't use PIVOT for all the reasons I stated in the article. I know I have code somewhere that shows how to do multiple aggregations using PIVOT.
Jeff,
Did you find that code at all? And did you get chance to check out the nested PIVOT? If so, what did you think?
Oh, and have you enabled CLR yet and learnt C#? :laugh:
Cheers,
Paul
Viewing 15 posts - 121 through 135 (of 243 total)
You must be logged in to reply to this topic. Login to reply