Cross Tabs and Pivots, Part 1 – Converting Rows to Columns

  • I realize I'm getting to the party late (you know - after Billy spiked the punch and got expelled for mooning the principal), but - isn't the PIVOT syntax available in 2005 following the standard established by ANSI? I vaguely remember reading that the incredibly obtuse syntax was yet another product from "the committee" (who seems hell-bent on "expressiveness", but with no regard for keeping the syntax something a Human might actually wrap their head around).

    But otherwise - all of the various comments as to the limitations fall pretty much in line with my own observations. The thing that kills me the most is that even though it seems to often boil down to the same code in the exec plan as the "old" way of doing things, PIVOT seems largely impervious to using ANY indexing. I remember trying to build an index specifiically for its use, and it would either not use it at all (even a tailor-made one with exactly what it would need), or the performance wouldn't budge at all....

    All in all - a reasonably frustrating feature....

    (By the way - nice way to describe the topic Jeff!).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (8/24/2008)


    I realize I'm getting to the party late (you know - after Billy spiked the punch and got expelled for mooning the principal), but - isn't the PIVOT syntax available in 2005 following the standard established by ANSI? I vaguely remember reading that the incredibly obtuse syntax was yet another product from "the committee" (who seems hell-bent on "expressiveness", but with no regard for keeping the syntax something a Human might actually wrap their head around).

    You know, I've been thinking the same thing throughout this discussion, Matt, but I haven't gotten around to checking it out for sure.

    *sigh*, Just one more reason why the committee needs real users on it instead of mostly vendors and academics.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Heh... the thing I absolutely "love" about the idea of ANSI standards is... if we all do it their way, there will be no need for competition amongst the makers of RDBMS's to make anything better... open source or not.

    Thank goodness for Non-Ansi code and the renegades that design it because if Pivot is an example of what ANSI code will make all code look like, I'll go back to writing custom apps on text files. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/24/2008)


    Heh... the thing I absolutely "love" about the idea of ANSI standards is... if we all do it their way, there will be no need for competition amongst the makers of RDBMS's to make anything better... open source or not.

    Well, actually, there's nothing in the standard that prevents them from adding their own stuff. But, yeah, we sure wouldn't want them forced to compete on things like "Performance" and "Features". It's much better to have them spending their time designing commands with incompatible syntax whose only real purpose is to make it difficult for us change DBMS products.

    Thank goodness for Non-Ansi code and the renegades that design it ...

    Yes, thank-goodness, for those brave and selfless renegades, like IBM, Oracle and Microsoft, always out there taking such crazy risks, and only thinking of our best interests.

    😛

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • LMAO! You must love them almost as much as me! 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You know it brother! 😀

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • That's why I keep coming back to this site. To read articles like this one, written for us mere mortals. Thank you Jeff!

  • You bet, Mishaluba! Thanks for the compliment! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/23/2008)


    If you want to suppress NULL's caused by missing data in a Pivot, there's no other way to do it than in the final Select. That's because you don't have Null's in the data until the final Select. Here's the proof... data is the same as before but has been "wounded" so it's missing all the data for a given Year, Company, and Quarter. Cross Tab takes it in stride no problem. Pivot, even though ALL data sources that can take COALESCE, still produces NULL's in the output because it's a position of data missing...

    Aye, but there is another way. The set of data you are providing to the pivot, can be forced to populate zeros or empty strings as the case may need to be in order to eliminate NULLs from the output, such as in the Cartesian product I mentioned.

    However, just because there *is* another way, doesn't mean it is worth using. So, I realize that this is not performance feasible in all cases... and I would not use the PIVOT in the current form anyway... so I guess the point is effectively mute.

  • dphillips (8/25/2008)


    Aye, but there is another way. The set of data you are providing to the pivot, can be forced to populate zeros or empty strings as the case may need to be in order to eliminate NULLs from the output, such as in the Cartesian product I mentioned.

    However, just because there *is* another way, doesn't mean it is worth using. So, I realize that this is not performance feasible in all cases... and I would not use the PIVOT in the current form anyway... so I guess the point is effectively mute.

    Heh... agreed. But, would you mind posting the code for the Pivot to work in such a fashion? I've got about 35 more emails to wade through and could use your help 'cause I'm just not thinking how it would work in a Pivot situation to preven nulls just now. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/25/2008)


    Heh... agreed. But, would you mind posting the code for the Pivot to work in such a fashion?

    Jeff Moden (8/25/2008)


    Heh... agreed. But, would you mind posting the code for the Pivot to work in such a fashion?

    First, my PIVOT experience to this point is in other products, the most notable of which is MS Access. I don't have time to put the code together. Once the concept is understood, it is fairly easy to envision. Unfortunately, it is very wordy to explain; of moderate difficulty. It is a multi-layer query, broken into the following 4 basic steps or "layers":

    1) Cartesian Product to put all possible headers to all possible row data. Remember also that this Cartesian Product is very thin set of data, such as the ID column only against either the single column key values or text headers that will be created. We are not trying to bring in the world of data here; narrow is best and speediest (especially with Cartesian). This produces a vertical result of all row combinations between the two sources.

    2) Populate the data that matches. This is where code can get lengthy, as there can be many joins at this level, depending on implentation and need. Remember, again, narrow is best, do not try to bring in the extra row headers yet. Use the Key ID and the cartesianed-header key as joins to the values (or sums) that match.

    3) Populate missing spots as desired with zeros, zero-length strings, or other desired default value. Bring in the world of data now, like the label or description of the row keys.

    4) Pivot... or Crosstab. You can alternately bring in the world of row headers and summary columns here. Depends on the PIVOT engine used and whether it will make use of indexes and keys.

    No Nulls to deal with. The value data is fully populated before it even meets the PIVOT or CrossTab. In many cases I have found this to work faster than handling NULLs during a PIVOT because the indexes get used to better effect, and the source queries are extremely trim until it gets to the PIVOT or CrossTab.

    OK... I will tread out some sample code... but I am keeping this very simplistic. The reader should be able to discern where the subqueries meet, and can be inserted to make an actual working sample for their own need. I am also intentionally leaving off the final PIVOT as that should be self explanatory once you derive the first three layers. This is bare-bones. Essentially, we are merely building the derived table, as follows:

    --Layer1

    (SELECT

    DataSource.DataSourceID

    , Days120.DayNum

    FROM

    DataSource

    , Days120

    ) AS Layer1

    --Layer2

    (SELECT

    Layer1.DataSourceID

    , Layer1.DayNum

    , SUM(MyData.Data) AS MyValue

    FROM

    Layer1

    LEFT JOIN MyData ON ((MyData.DataSourceID = Layer1.DataSourceID)

    AND (MyData.DayNum = Layer1.DayNum))

    GROUP BY

    Layer1.DataSourceID

    , Layer1.DayNum

    ) AS Layer2

    --Layer3 NULL NUKE

    (SELECT

    Layer2.DataSourceID

    , Layer2.DayNum

    , ISNULL(Layer2.MyValue,0) AS MyValue

    FROM

    Layer2

    ) AS Layer3

    --Layer4 PIVOT or CrossTab

    --(not demonstrated... but straight-forward from here)

    This type of sample is what can be the guts of a Curve Model, with no missing (read NULL) holes. No matter when a datasource drops on the timeline, you can stack multiple DataSources against each other this way even when they do not run in the same timeframe. However, this model can also be used for non time-linear data.

    HTH.

  • Thanks for the code and the write up... it's exactly what I was afraid of though. In your example just for layer #1, if you have a million rows of data, imagine the internal rowset that will be generated by the full Cartesian product... 120 million rows. The methods you're talking about may work on small tables, but I don't believe to fair well in the face of scalability.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/26/2008)


    Thanks for the code and the write up... it's exactly what I was afraid of though. In your example just for layer #1, if you have a million rows of data, imagine the internal rowset that will be generated by the full Cartesian product... 120 million rows. The methods you're talking about may work on small tables, but I don't believe to fair well in the face of scalability.

    Not small tables... selective subsets. The size of the source tables matters not. Having intelligent indexes, and the size of the Cartesian product does. It works very well with COUNT and SUM data. Such models are only valid within specific groupings and time frames. Also keeping the number of columns in the vertical to a minimum is key; only 3 or 4, depending how many PIVOTed values one desires to display. In this example, it was 3, all numeric.

    It can be used well to good effect in specific needs. Can it be done other ways? Sure. On tremendously large datasets, I'd be shipping the data to a data warehouse where specific implementations can be made.

    All of this aside, however, was to show that one can pull a PIVOT without NULLS, and is only for specific situations.

  • dphillips (8/26/2008)


    All of this aside, however, was to show that one can pull a PIVOT without NULLS, and is only for specific situations.

    I should be rich 😀

    www.rac4sql.net

    www.beyondsql.blogspot.com

  • rog pike (8/27/2008)


    dphillips (8/26/2008)


    All of this aside, however, was to show that one can pull a PIVOT without NULLS, and is only for specific situations.

    I should be rich 😀

    www.rac4sql.net

    www.beyondsql.blogspot.com

    Heh... You still could be... convince Microsoft of what we already know... the Pivot function sucks and get them to license or outright buy what you've done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 76 through 90 (of 243 total)

You must be logged in to reply to this topic. Login to reply