Cross Tabs and Pivots, Part 1 – Converting Rows to Columns

  • Kris (12/5/2008)


    Is it save to assume that both Pivot and Cross Tabs preform better than tabulating the data from a view that inner joins the table on itself for each quarter?

    Jeff Moden (12/8/2008)


    It's never safe to assume anything of this nature. Only testing with the appropriate amount of data will prove things.

    Indeed, assumptions are not safe.

    I tried the following query, probably not the best way of doing it but I'm still a newbie to performance tuning

    I added an index (that the execution plan suggested) on the 1million record set it seemed to preform better than the pivot but worse than the cross tab. I must stress however that I am a newbie to performance tuning.

    CREATE NONCLUSTERED INDEX [ind_allcolumns]

    ON [dbo].[CrossTabTest1Mill] ([Quarter])

    INCLUDE ([Company],[Amount],[Quantity],[Year])

    SELECT yearly.Company,

    yearly.Year,

    q1.q1Amt, q1.q1Qty,

    q2.q2Amt, q2.q2Qty,

    q3.q3Amt, q3.q3Qty,

    q4.q4Amt, q4.q4Qty,

    yearly.TotalAmount,

    yearly.TotalQty

    FROM (SELECT Company, Year , SUM(Amount) AS TotalAmount,SUM(Quantity) AS TotalQty

    FROM CrossTabTest1Mill GROUP BY Company,Year) yearly inner join

    (SELECT Company, Year ,Quarter,SUM(Amount) AS q1Amt,SUM(Quantity) AS q1Qty

    FROM CrossTabTest1Mill where Quarter = 1

    GROUP BY Company,Year,Quarter) q1 on yearly.Company = q1.Company and yearly.Year = q1.Year

    inner join

    (SELECT Company, Year ,Quarter,SUM(Amount) AS q2Amt,SUM(Quantity) AS q2Qty

    FROM CrossTabTest1Mill where Quarter = 2

    GROUP BY Company,Year,Quarter) q2 on yearly.Company = q2.Company and yearly.Year = q2.Year

    inner join

    (SELECT Company, Year ,Quarter,SUM(Amount) AS q3Amt,SUM(Quantity) AS q3Qty

    FROM CrossTabTest1Mill where Quarter = 3

    GROUP BY Company,Year,Quarter) q3 on yearly.Company = q3.Company and yearly.Year = q3.Year

    inner join

    (SELECT Company, Year ,Quarter,SUM(Amount) AS q4Amt,SUM(Quantity) AS q4Qty

    FROM CrossTabTest1Mill where Quarter = 4

    GROUP BY Company,Year,Quarter) q4 on yearly.Company = q4.Company and yearly.Year = q4.Year

    ORDER BY yearly.Company, yearly.Year

    Do you have any advice on increasing the performance of this query?

    Kris (12/5/2008)


    I'm working on a survey database, the questions needs to be fairly dynamic. The survey question set is up to 500 questions and at least 20% of which will change. I was looking at making a participant table, a question table and an answer table.

    The answer table would be an EAV table with a participant id, question id, and then an answer.

    The initial survey count will be about 300, however they are talking about the program increasing to something like 1000-2000 and with dreams of hitting 20,000.

    At 300 surveys that would yield 150,000 rows.

    At 1500 surveys that would yield 750,000 rows

    At 20,000 surveys would yield 10,000,000 rows

    For the most part(95% of questions) the data needs to be extracted to insert, edit, and view a single survey. However for the reports they would want to view a survey_id along side all 500 columns(for each question) for each survey ids. (so they could put it in a spreadsheet and analyze it)

    In your article, the queries seemed to preform rather well on my machine, but it is only tabulated to 12 columns wide for 80 rows, do you think it would still preform as well if you went to 50 columns wide or even 500?

    Jeff Moden (12/8/2008)


    Again... I don't believe the survey database is correctly designed if you need to drag this into a spreadsheet. You should be able to get the survey ids for each answered question without loading it into a spreadsheet and you should be able to do all sorts of analysis without loading it into a spreadsheet.

    If people insist on loading this data into a spreadsheet, perhaps Part 2 of this series would be better suited...

    [font="Arial Black"](click here) Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs[/font][/url]

    The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)

    P.S. I have already read part 2 of your series and I plan on using it to develop the stored procedures for this project.

  • Kris (12/8/2008)


    Do you have any advice on increasing the performance of this query?

    Heh... yeah... use the CrossTab code instead. 😉

    The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)

    Ok... whatever... I think they're a bit whacked for doing it that way, but ok. They're not easily going to get 500 columns if this is for SQL Server 2000 but can be done easily in 2k5.

    Recommend that if you have further questions on this particular subject, that you start a new thread in the appropriate forum so that more folks than just me will reply and to keep from dilluting this particulart thread too much..

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Kris (12/8/2008)


    The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)

    Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (12/8/2008)


    Kris (12/8/2008)


    The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)

    Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.

    Heh... now, now, Barry... we all know that a single user is smarter than all the DBA's on these forums... 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • But this is more than bad design, it's just completely unnecessary. Excel adds nothing to this process, except limitations and overhead.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Absolutely agree... but it won't change their mind...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rbarryyoung (12/8/2008)


    But this is more than bad design, it's just completely unnecessary. Excel adds nothing to this process, except limitations and overhead.

    ... And potential data loss.

  • Jeff Moden (12/8/2008)


    rbarryyoung (12/8/2008)


    Kris (12/8/2008)


    The reason it must dragged into a spread sheet is because researcher want to be able to extract the data from the DB and perform statistical analysis on it themselves with SAS or SPSS. (this is a requirement put on me that I have little control over)

    Good heavens, you should not be dragging data into Excel for SAS or SPSS to use. Both of these should be able to get the data directly from SQL Server. Excel is definitely the weak link in that chain.

    Heh... now, now, Barry... we all know that a single user is smarter than all the DBA's on these forums... 😛

    pretty much, it wasn't my decision, I simply have to make a program to fit within their protocol. With reporting services, the excel format is easy, so long as I can make the right query. Which makes this cross tab article very useful.

    As far as the excel format being bad, that's something they'll believe once they see it. Then it'll be an easy fix on my side.

    Anyways, great article Jeff very helpful

  • Its pretty hard to understand why your SAS and SPSS users would want to wait for SQL Server to export to Execl, then import from Excel when they could just import straight from SQL Server.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Kris (12/11/2008)


    pretty much, it wasn't my decision, I simply have to make a program to fit within their protocol. With reporting services, the excel format is easy, so long as I can make the right query. Which makes this cross tab article very useful.

    As far as the excel format being bad, that's something they'll believe once they see it. Then it'll be an easy fix on my side.

    Anyways, great article Jeff very helpful

    Ah... now I get it... it's one of my favorite sayings... "Give them the opportunity to fail". They won't really know or care if their method is wrong or flawed until they've tried it and until they try it, they're not gonna believe you. So, you have to give them the opportunity to try it and see. Well done and very patient of you. 😉

    And, thanks for the compliment, Kris. If you run into any problems on this, post back and I'll try to answer your questions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff!

    Well presented and thought-provoking article, as millions of others have already said 😉

    I do love a performance challenge though, and the figures showing that PIVOT was slower than the old CASE trick got me interested. The below code is not pretty nor particularly readable (or even finished!) but it does illustrate a point. Using a nested PIVOT on the one million row example used 3,473 logical reads (one scan of the index - the same as the winning "normal" cross-tab, and slightly less CPU than the winning "Pre-Agg Cross-Tab with CTE".

    Cheers,

    Paul

    ;WITH

    PartialAgg AS

    (

    SELECT[Company], [Year], [Quarter] AS Q, [Quarter] AS Q2, SUM(Amount) AS Amt, SUM(Quantity) AS Qty

    FROM#sometable3 as PA

    GROUPBY

    [Company], [Year], [Quarter]

    ),

    PivotOne AS

    (

    SELECT[Company], [Year], Q2 AS

    ,

    ISNULL([1],0) AS Q1Amt,ISNULL([2],0) AS Q2Amt,ISNULL([3],0) AS Q3Amt,ISNULL([4],0) AS Q4Amt, qty

    FROMPartialAgg

    PIVOT(SUM(Amt) FOR

    IN ([1], [2], [3], [4])) AS pvt

    ),

    PivotTwo AS

    (

    SELECT[Company], [Year],

    SUM(Q1Amt) AS Q1Amt, SUM(Q2Amt) AS Q2Amt, SUM(Q3Amt) AS Q3Amt, SUM(Q4Amt) AS Q4Amt,

    SUM(ISNULL([1],0)) AS Q1Qty, SUM(ISNULL([2],0)) AS Q2Qty, SUM(ISNULL([3],0)) AS Q3Qty, SUM(ISNULL([4],0)) AS Q4Qty

    FROMPivotOne

    PIVOT(Sum(Qty) FOR

    IN ([1], [2], [3], [4])) AS pvt

    GROUPBY

    [Company], [Year]

    )

    SELECT[Company], [Year], Q1Amt, Q1Qty, Q2Amt, Q2Qty, Q3Amt, Q3Qty, Q4Amt, Q4Qty,

    Q1Amt + Q2Amt + Q3Amt +Q4Amt AS TotalAmt,

    Q1Qty + Q2Qty + Q3Qty + Q4Qty AS TotalQty

    FROMPivotTwo

    ORDERBY

    [Company], [Year];

  • Outstanding... thanks for the post and the code, Paul... I'll check it out. That's what I love about forums. Lots of folks with lots of ideas.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/12/2009)


    Outstanding... thanks for the post and the code, Paul... I'll check it out. That's what I love about forums. Lots of folks with lots of ideas.

    Just to repeat what I posted elsewhere, please don't look upon this a good idea in practice - it's an ugly hack to get around the double-PIVOT join, and to take advantage of the fact that the plan has to evaluate all CASE statements for all rows (the hack presented does lots more work but on many fewer rows).

    This is a curiosity - that's all - until PIVOT is extended to do multiple aggregations, anyway :-).

  • Thanks for your concern. If I'm not mistaken, PIVOT can do multiple aggregations... I just gotta look back in my notes because I don't use PIVOT for all the reasons I stated in the article. I know I have code somewhere that shows how to do multiple aggregations using PIVOT.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/12/2009)


    Thanks for your concern. If I'm not mistaken, PIVOT can do multiple aggregations... I just gotta look back in my notes because I don't use PIVOT for all the reasons I stated in the article. I know I have code somewhere that shows how to do multiple aggregations using PIVOT.

    Jeff,

    Did you find that code at all? And did you get chance to check out the nested PIVOT? If so, what did you think?

    Oh, and have you enabled CLR yet and learnt C#? :laugh:

    Cheers,

    Paul

Viewing 15 posts - 121 through 135 (of 243 total)

You must be logged in to reply to this topic. Login to reply