Pivoting name value pairs

  • COldCoffee (4/19/2010)


    Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't add to the query's cost... so I guess the query performs equally with Jeff's, but it's got to be tested with a lot of rows.

    I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/19/2010)


    COldCoffee (4/19/2010)


    Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't add to the query's cost... so I guess the query performs equally with Jeff's, but it's got to be tested with a lot of rows.

    I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".

    Jeff, I actually ran both yours and mine in the same window and saw a 50% query cost apiece... but there was one scalar expression component in my code which accounted for only 0%. Still, putting them to the test with more than a million rows will reveal the piths and myths of both queries 🙂 I would depend on your testing rather than mine, so I'm waiting for your test results 🙂

  • Heh... I always get left holding the test bag... starting to feel like a cleaning lady. 😛

    Like I said, the Execution Plan lies. You should never rely on it to make a final decision on which code to use for performance. You should only use it to figure out what's being used and where potential trouble spots may be.

    Here's the test code including some duration measurements... as people have grown to expect, I tested on a million rows...

    --===== Do some presets to clean up the display
        SET ANSI_WARNINGS OFF;
        SET NOCOUNT ON;

    --===== Conditionally drop and recreate the test tables to make reruns easier
         IF OBJECT_ID('TempDB..#Transaction') IS NOT NULL DROP TABLE #Transaction;
         IF OBJECT_ID('TempDB..#NameValue1')  IS NOT NULL DROP TABLE #NameValue1;
         IF OBJECT_ID('TempDB..#NameValue2')  IS NOT NULL DROP TABLE #NameValue2;

     CREATE TABLE #Transaction
            (
            [id]      [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
            [tran_id] [int] NOT NULL,
            [row_id]  [int] NOT NULL,
            [name]    [nvarchar](50) NULL,
            [value]   [nvarchar](50) NULL
            )
    ;
     CREATE TABLE #NameValue1 --For first test
            (
            [id]      [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
            [tran_id] [int] NOT NULL,
            [name1]   [nvarchar](50) NULL,
            [value1]  [nvarchar](50) NULL,
            [name2]   [nvarchar](50) NULL,
            [value2]  [nvarchar](50) NULL,
            [name3]   [nvarchar](50) NULL,
            [value3]  [nvarchar](50) NULL
            )
    ;
     CREATE TABLE #NameValue2 --For second test
            (
            [id]      [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
            [tran_id] [int] NOT NULL,
            [name1]   [nvarchar](50) NULL,
            [value1]  [nvarchar](50) NULL,
            [name2]   [nvarchar](50) NULL,
            [value2]  [nvarchar](50) NULL,
            [name3]   [nvarchar](50) NULL,
            [value3]  [nvarchar](50) NULL
            )
    ;
    --===== Create the test data. Since we don't really care what the name
         -- and value are for this test, varying lengths of NEWID() values were
         -- used as a quick substitute. Each Tran_ID gets 3 rows with Row_ID
         -- values of 1, 2, and 3.
     INSERT INTO #Transaction
            ([Tran_ID], [Row_ID], [Name], [Value])
     SELECT TOP (1000000)
            ((ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)/3+1)*10 AS [Tran_ID],
             (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)%3+1     AS [Row_ID],
            LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36)                 AS [Name],
            LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36)                 AS [Value]
       FROM Master.sys.All_Columns ac1,
            Master.sys.All_Columns ac2
    ;
    --===== Begin measuring CPU and Duration times
        SET STATISTICS TIME ON;

    --===== ********** COldCoffee's Concatenate and Split solution **********
      PRINT '********** COldCoffee''s Concatenate and Split solution **********'
     INSERT INTO #NameValue1
            ([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
     SELECT [tran_id],
            SUBSTRING([1],1,(CHARINDEX(';',[1])-1))                                    AS name1,
            SUBSTRING([1],(CHARINDEX(';',[1])+1),DATALENGTH([1])-(CHARINDEX(';',[1]))) AS value1,
            SUBSTRING([2],1,(CHARINDEX(';',[2])-1))                                    AS name2,
            SUBSTRING([2],(CHARINDEX(';',[2])+1),DATALENGTH([2])-(CHARINDEX(';',[2]))) AS value2,
            SUBSTRING([3],1,(CHARINDEX(';',[3])-1))                                    AS name3,
            SUBSTRING([3],(CHARINDEX(';',[3])+1),DATALENGTH([3])-(CHARINDEX(';',[3]))) AS value3
       FROM (SELECT [tran_id], [row_id], [NAME]+';'+[value] AS name_value FROM #Transaction) FOR_PIVOT
      PIVOT (MAX(name_value) FOR [row_id] IN ([1], [2], [3])) PIVOT_CONTROL
      ORDER BY tran_id -- Included ORDER BY clause
    ;
    --===== ********** Jeff's Cross Tab solution **********
      PRINT '********** Jeff''s Cross Tab solution **********'
     INSERT INTO #NameValue2
            ([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
     SELECT [tran_id],
            MAX(CASE WHEN [row_id] = 1 THEN [name]  END) AS [name1],
            MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
            MAX(CASE WHEN [row_id] = 2 THEN [name]  END) AS [name2],
            MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
            MAX(CASE WHEN [row_id] = 3 THEN [name]  END) AS [name3],
            MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
       FROM #Transaction
      GROUP BY [tran_id]
      ORDER BY [tran_id]
    ;
    --===== Stop measuring time and return warnings to normal
        SET STATISTICS TIME OFF;
        SET ANSI_WARNINGS ON;

    And here are the results on my 8-year-old but trusted desktop box...

    ********** COldCoffee's Concatenate and Split solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 37813 ms,  elapsed time = 42957 ms.

    ********** Jeff's Cross Tab solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 10625 ms,  elapsed time = 18448 ms.
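
    One extra check that isn't part of the script above, but is worth running before trusting the timings (a minimal sketch that reuses the two result tables the script creates): confirm that both methods produced the same pivoted rows.

    --===== Sanity check (not in the original script): both result tables should
         -- hold the same pivoted rows. The IDENTITY [id] column is excluded;
         -- repeat with the two table names swapped for a full two-way check.
     SELECT [tran_id],[name1],[value1],[name2],[value2],[name3],[value3] FROM #NameValue1
     EXCEPT
     SELECT [tran_id],[name1],[value1],[name2],[value2],[name3],[value3] FROM #NameValue2;
    -- An empty result set means the two solutions agree.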

    --Jeff Moden



  • Another approach using PIVOT:

    -- Pivot the names and the values separately, then join the two results.
    -- Note that this reads #Transaction twice (once per PIVOT).
    SELECT  N.tran_id,
            N.name1, V.value1,
            N.name2, V.value2,
            N.name3, V.value3
    FROM   (
            SELECT  P.tran_id,
                    name1 = P.[1],
                    name2 = P.[2],
                    name3 = P.[3]
            FROM   (SELECT tran_id, name, row_id FROM #Transaction) T
            PIVOT  (MAX(name) FOR row_id IN ([1],[2],[3])) P
           ) N
    JOIN   (
            SELECT  P.tran_id,
                    value1 = P.[1],
                    value2 = P.[2],
                    value3 = P.[3]
            FROM   (SELECT tran_id, value, row_id FROM #Transaction) T
            PIVOT  (MAX(value) FOR row_id IN ([1],[2],[3])) P
           ) V
      ON   N.tran_id = V.tran_id;

  • And a nested PIVOT:

    -- Pivot the names first, then pivot the values in an outer PIVOT.
    -- row_id2 is just a copy of row_id: the first PIVOT consumes row_id,
    -- so the second PIVOT needs its own column to pivot on.
    SELECT  P.tran_id,
            name1  = MAX(P.name1),
            value1 = MAX(P.[1]),
            name2  = MAX(P.name2),
            value2 = MAX(P.[2]),
            name3  = MAX(P.name3),
            value3 = MAX(P.[3])
    FROM   (
            SELECT  P.tran_id,
                    name1 = P.[1],
                    name2 = P.[2],
                    name3 = P.[3],
                    P.row_id2,
                    P.value
            FROM   (SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2 FROM #Transaction) T
            PIVOT  (MAX(name) FOR row_id IN ([1],[2],[3])) P
           ) SQ
    PIVOT  (MAX(value) FOR row_id2 IN ([1],[2],[3])) P
    GROUP BY
            P.tran_id;

  • Neither of those will be quite as fast as Jeff's method.
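
    For anyone who wants to check that on their own box, either variant can be dropped into Jeff's timing harness as an extra timed section. A sketch only, not part of the original harness; it assumes the #Transaction table and the SET STATISTICS TIME ON block from Jeff's script are still in place:

    --===== ********** PIVOT-join variant (added for comparison) **********
      PRINT '********** PIVOT-join variant **********';
     SELECT N.tran_id, N.name1, V.value1, N.name2, V.value2, N.name3, V.value3
       FROM (SELECT tran_id, name1 = [1], name2 = [2], name3 = [3]
               FROM (SELECT tran_id, name, row_id FROM #Transaction) T
              PIVOT (MAX(name) FOR row_id IN ([1],[2],[3])) P) N
       JOIN (SELECT tran_id, value1 = [1], value2 = [2], value3 = [3]
               FROM (SELECT tran_id, value, row_id FROM #Transaction) T
              PIVOT (MAX(value) FOR row_id IN ([1],[2],[3])) P) V
         ON N.tran_id = V.tran_id;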

  • Here are the results from my PC.

    ********** COldCoffee's Concatenate and Split solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 21157 ms,  elapsed time = 17163 ms.

    ********** Jeff's Cross Tab solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 13750 ms,  elapsed time = 11205 ms.

    karthik

  • Paul White NZ (4/20/2010)


    And a nested PIVOT:

    SELECT  P.tran_id,
            name1  = MAX(P.name1),
            value1 = MAX(P.[1]),
            name2  = MAX(P.name2),
            value2 = MAX(P.[2]),
            name3  = MAX(P.name3),
            value3 = MAX(P.[3])
    FROM   (
            SELECT  P.tran_id,
                    name1 = P.[1],
                    name2 = P.[2],
                    name3 = P.[3],
                    P.row_id2,
                    P.value
            FROM   (SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2 FROM #Transaction) T
            PIVOT  (MAX(name) FOR row_id IN ([1],[2],[3])) P
           ) SQ
    PIVOT  (MAX(value) FOR row_id2 IN ([1],[2],[3])) P
    GROUP BY
            P.tran_id;

    When I executed the query, I got the result set below...

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

     SQL Server Execution Times:
       CPU time = 34408 ms,  elapsed time = 14404 ms.

    karthik

  • Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly like yours, but I feared that two PIVOTs and a join would overload the CPU, so I didn't post the code 🙁

    Unfortunately, those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...

  • COldCoffee (4/20/2010)


    Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly like yours, but I feared that two PIVOTs and a join would overload the CPU, so I didn't post the code 🙁

    It's good to try different things. It's all a bit of a side issue, since the traditional pivot is smaller, neater, and faster anyway!

    Unfortunately, those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...

    A few seconds faster, even, in elapsed time. String manipulation in T-SQL is pretty sucky.
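
    To make that concrete, this is roughly the extra per-cell work the concatenate-and-split approach pays for every pivoted name/value pair. An illustration only, using a made-up literal rather than the test data:

    -- Illustration only: the split work done for each pivoted 'name;value' cell.
    -- @pair is a made-up literal standing in for one concatenated cell.
    DECLARE @pair nvarchar(101) = N'SomeName;SomeValue';
    SELECT  SUBSTRING(@pair, 1, CHARINDEX(';', @pair) - 1)        AS [name],
            SUBSTRING(@pair, CHARINDEX(';', @pair) + 1,
                      DATALENGTH(@pair) - CHARINDEX(';', @pair))  AS [value];
    -- The cross tab never builds or splits the string at all, which is part of
    -- where the CPU difference comes from.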

  • Paul White NZ (4/20/2010)


    COldCoffee (4/20/2010)


    Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly like yours, but I feared that two PIVOTs and a join would overload the CPU, so I didn't post the code 🙁

    It's good to try different things. It's all a bit of a side issue, since the traditional pivot is smaller, neater, and faster anyway!

    COld

    I agree with Paul, it's all about trying different solutions to the same problem.

    I guess my next question would be: since in this case the traditional pivot method is smaller, neater, and faster, are there any particular cases you can think of where using the PIVOT operator would be more appropriate?

    BTW, Jeff, thanks for setting up the test case... you tote a mean test bag :-)

  • karthikeyan-444867 (4/20/2010)


    Here are the results from my PC.

    ********** COldCoffee's Concatenate and Split solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 21157 ms,  elapsed time = 17163 ms.

    ********** Jeff's Cross Tab solution **********
     SQL Server Execution Times:
       CPU time = 0 ms,  elapsed time = 0 ms.
     SQL Server Execution Times:
       CPU time = 13750 ms,  elapsed time = 11205 ms.

    Thanks, Karthik. It's always good to see how things work on more than one machine.

    {edit} Since the CPU times on both of those are greater than their respective Duration, it's also obvious that both pieces of code took advantage of some parallelism.
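
    A quick way to confirm the parallelism (a sketch only, not part of the test script; it reuses the #Transaction table built earlier): rerun the cross tab with a serial plan forced and compare the times.

    --===== Rerun the cross tab serially; with one thread, elapsed time
         -- should come back greater than or equal to CPU time.
        SET STATISTICS TIME ON;
     SELECT [tran_id],
            MAX(CASE WHEN [row_id] = 1 THEN [name]  END) AS [name1],
            MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
            MAX(CASE WHEN [row_id] = 2 THEN [name]  END) AS [name2],
            MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
            MAX(CASE WHEN [row_id] = 3 THEN [name]  END) AS [name3],
            MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
       FROM #Transaction
      GROUP BY [tran_id]
     OPTION (MAXDOP 1);   -- force a serial plan
        SET STATISTICS TIME OFF;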

    --Jeff Moden



  • pYak (4/20/2010)


    I agree with Paul, it's all about trying different solutions to the same problem.

    Absolutely true. "A Developer must not guess... A Developer must KNOW." 😀 That's why I carry a test bag. :hehe:

    BTW, Jeff, thanks for setting up the test case... you tote a mean test bag :-)

    Heh... thanks, pYak. I aim to please. I sometimes miss but I'm always aiming. 😛

    --Jeff Moden



  • Jeff, you really are a wonderful character... and to Paul and Karthik, thanks for providing new views and test results. You guys are simply great. Thanks, all 🙂

  • pYak (4/20/2010)


    are there any particular cases that you can think of where using the PIVOT operator would be more appropriate?

    Not really, no. PIVOT is shorthand for a very similar CASE construction; look closely at the execution plans and you'll see the close similarity. PIVOT is not as flexible, though, and is often slightly slower.
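
    To see that equivalence, here is the same three-column pivot of the [name] values written both ways against the #Transaction test table from Jeff's script. A minimal sketch:

    -- PIVOT form
    SELECT  tran_id, [1] AS name1, [2] AS name2, [3] AS name3
      FROM (SELECT tran_id, row_id, name FROM #Transaction) src
     PIVOT (MAX(name) FOR row_id IN ([1],[2],[3])) p;

    -- Equivalent CASE cross tab; compare the two execution plans side by side
    SELECT  tran_id,
            MAX(CASE WHEN row_id = 1 THEN name END) AS name1,
            MAX(CASE WHEN row_id = 2 THEN name END) AS name2,
            MAX(CASE WHEN row_id = 3 THEN name END) AS name3
      FROM #Transaction
     GROUP BY tran_id;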

