Trying to UnPivot without using multiple selects unioned together

  • CirquedeSQLeil (10/28/2009)


    I really like the PointlessAlias as the table alias Emperor PaulPatine.

    :laugh: Thanks Jason. Good to know that other people find the syntax frustrating at times too.

    I don't think I have ever pivoted the result of an unpivot before. Don't suppose I ever will again, but you never know 🙂

  • Why not? It doesn't appear to cost too much, if anything to Pivot an Unpivot. I might start doing it just for the heck of it to confuse some of our developers. 😛

    I can't wait to see the time and i/o stats of the various approaches when run against volumes of rows. I'm gradually coming to accept the similarity of the execution plans though. I wonder if, under the covers, an UNPIVOT is really just a cross join like Peter's solution.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob, if you're going to write an article about this, this one may interest you as well. In an attempt to reverse engineer the query plan of your solution I came up with this:

    select

    rowID, column_nm, old, new

    from

    @Update u

    cross apply

    (

    select

    'column1_new' column_nm,

    cast(u.column1_old as sql_variant) old,

    cast(u.column1_new as sql_variant) new

    union all

    select

    'column2_new' column_nm,

    cast(u.column2_old as sql_variant) old,

    cast(u.column2_new as sql_variant) new

    union all

    select

    'column3_new' column_nm,

    cast(u.column3_old as sql_variant) old,

    cast(u.column3_new as sql_variant) new

    ) n

    order by

    1, 2

    Again, the query plan is almost identical to my cross join solution, except that the conversions are now done within the Constant Scan, so the query plan looks even more compact. What I like about this is that it requires less coding (no case expressions) when you have to extend this to more columns.

    Good luck with your article.

    Peter

  • Paul White (10/29/2009)


    CirquedeSQLeil (10/28/2009)


    I really like the PointlessAlias as the table alias Emperor PaulPatine.

    :laugh: Thanks Jason. Good to know that other people find the syntax frustrating at times too.

    I don't think I have ever pivoted the result of an unpivot before. Don't suppose I ever will again, but you never know 🙂

    Nice correction on the Emperor/Senator join = Emperator.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Peter's latest submission is quite the trick. Though not significantly different, it does have improvement in execution plan and client statistics over the others.

    Again, surprising is the amazing similarity in plans and io stats between all of these queries.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hence, my earlier comment 🙂

    I wonder if, under the covers, an UNPIVOT is really just a cross join like Peter's solution.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Peter's latest is awesome! :w00t:

    If I'm allowed a little 2008-syntax, this produces the same plan, but with even more compact SQL:

    select U.rowid, CA.column_nm, CA.old, CA.new

    from @Update U

    cross apply

    ( values

    ('column1', convert(sql_variant, u.Column1_old), convert(sql_variant, u.Column1_new)),

    ('column2', u.Column2_old, u.Column2_new),

    ('column3', u.Column3_old, u.Column3_new)

    ) CA (column_nm, old, new)

    order by U.rowid, CA.column_nm;

  • select U.rowid, CA.column_nm, CA.old, CA.new

    from @Update U

    cross apply

    ( values

    ('column1', convert(sql_variant, u.Column1_old), convert(sql_variant, u.Column1_new)),

    ('column2', u.Column2_old, u.Column2_new),

    ('column3', u.Column3_old, u.Column3_new)

    ) CA (column_nm, old, new)

    order by U.rowid, CA.column_nm;

    Very nice Paul. Never realized you could use the VALUES clause as a derived table in SQL 2008. And I looked up BOL to find anything about the implicit conversions, but I couldn't find anything. Any way, it seems to work that way. Good to know.

    Peter

  • Emperor Paulpatine, that is truly elegant to behold.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Peter Brinkhaus (10/29/2009)


    ...I looked up BOL to find anything about the implicit conversions, but I couldn't find anything. Any way, it seems to work that way. Good to know.

    The rules are the same as for UNION - see Guidelines for Using Union:

    Books Online


    When different data types are combined in a UNION operation, they are converted by using the rules of data type precedence. In the previous example, the int values are converted to float, because float has a higher precedence than int. For more information, see Data Type Precedence (Transact-SQL).

    SQL_VARIANT happens to have the second-highest precedence (after UDTs).

  • Bob Hovious 24601 (10/29/2009)


    Emperor Paulpatine, that is truly elegant to behold.

    Wait til Barry gets hold of it and on-the-fly compresses it down to 20 bytes :laugh:

    http://beyondrelational.com/blogs/tc/archive/2009/06/22/tsql-challenge-7-solution-by-barry-young.aspx

  • Okay, I've finished my performance testing. See attached zip file for the code used, as well as the execution plans generated and the trace file outputs.

    Test was:

    Create a 1,000,000 row test table, with 30 pairs of columns.

    Run Bob's "Hybrid" Cross-Join / Unpivot solution.

    Run Peter's Cross-Join.

    Run Paul's first solution (UnPivot / RePivot).

    The winner, by a landslide, is Peter's solution. It consistently ran in 1/3 the time of the second place, which was Bob's.

    Sorry Paul, but your solution was a very distant 3rd. I let it run overnight, and it wasn't finished in the morning. And tempdb had sucked up all remaining space on the drive, and was very busy swapping out with the page file.

    I then ran the following tests:

    Test 1: rerun test, but stopped Paul's after 30 minutes. See SSCTest1.trc. Tempdb was up to 30gb at this point.

    After seeing the effect of Paul's solution on tempdb, I decided to measure what effect the resizing of tempdb had on the performance of the other two routines.

    After a clean boot, a "select name, size from tempdb.sys.database_files" has tempDev/tempLog of 1024/64.

    After creating the million row #update table, they were 52048/528

    After running Bob's Hybrid method, they were 263168/8720.

    After running Peter's Cross-Join method, they were 263168/8720 (the same as after Bob's).

    see SSCTest2.trc

    Seeing that Peter's didn't have to resize tempdb, I then rebooted, and ran the tests again, but running Peter's Cross-Join method before Bobs:

    After the reboot, the tempDev/tempLog were 1024/64.

    After creating the million row #update table, they were 52048/528

    After running Peter's Cross-Join method, they were 217488/7208

    After running Bob's Hybrid method, they were 263168/8720.

    In this test, Peter's Cross-Join took about 1 minute longer to run, while Bob's Hybrid method took about 30 seconds less to run.

    see SSCTest3.trc

    In all of these tests, Peter's Cross-Tab solution had about 9x the reads and fewer writes, but it always ran in about 1/3 the time. The percentage of writes varies by whether it was run before or after Bob's Hybrid solution.

    Finally, I wanted to get a comparison of Paul's method, and to see it's execution plan. I set the test to 1000 rows, and ran again. Results are in SSCTest4.trc.

    Even with this small number of rows, the CPU and Duration was 20x the Cross-Join method; the reads were 1000x, and the writes were 12,000x. Peter's Cross-Join was finished in 237ms, while Bob's Hybrid was finished in 1050ms. The UnPivot/RePivot took 5656ms.

    When I tried this again with 10,000 rows, I had to abort it again... it was taking just too long. It clearly does not scale uniformly.

    All execution plans show table scans of the #Update table (expected).

    So, in summary:

    On the million row test, the Hybrid solution ran consistently in just short of 11 minutes; while the pure cross-join method ran in just over 4 minutes.

    With only 1,000 rows, Paul's Unpivot/RePivot solution took longer than the Cross-Join on one million rows. It also required extensive tempdb usage.

    In one of the earlier posts, there was mention of the time to write the code. They all took about the same. Though for all, I let SQL write a lot of it for me, then cut-and-paste.

    Note: the test was performed on my 4gb x64 laptop, w/ SQL 2008. You might not be able to read the trace files if you're not using SQL 2008 Profiler... but you should be able to query them with fn_trace_gettable.

    Edit: added attachment (wonder where it went the first time...)

    Edit2: I forgot to mention that in all of my tests, I had SSMS set to discard results.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Bob Hovious 24601 (10/29/2009)


    I wonder if, under the covers, an UNPIVOT is really just a cross join like Peter's solution.

    I don't think so... it doesn't perform like one.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • No matter, Wayne. I'm just delighted to see Peter's solution win out.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Wayne, thanks for the extensive test results. However I don't see the attachments in your post. I'm quite curious about the execution plans on a millions input rows, because the execution plans of all the solutions posted in this thread looked nearly identical. I expected only a little difference in CPU time, but not that difference in use of tempdb. Can you post the attachment again. Thanks. Great job.

    Peter

Viewing 15 posts - 16 through 30 (of 50 total)

You must be logged in to reply to this topic. Login to reply