April 19, 2010 at 9:02 am
COldCoffee (4/19/2010)
Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't cost the query anything... so I guess the query performs equally with Jeff's, but it has to be tested with a lot of rows...
I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2010 at 11:17 am
Jeff Moden (4/19/2010)
COldCoffee (4/19/2010)
Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't cost the query anything... so I guess the query performs equally with Jeff's, but it has to be tested with a lot of rows...
I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".
Jeff, I actually ran both yours and mine in the same window and saw a 50% query cost apiece... but there was one scalar expression component in my code which accounted for only 0%. Still, a test with > a million rows will reveal the piths and myths of both queries... :-) I would depend on your testing rather than mine, so I'm waiting for your test results 🙂
April 19, 2010 at 9:59 pm
Heh... I always get left holding the test bag... starting to feel like a cleaning lady. 😛
Like I said, the Execution Plan lies. You should never rely on it to make a final decision on which code to use for performance. You should only use it to figure out what's being used and where potential trouble spots may be.
Here's the test code including some duration measurements... as people have grown to expect, I tested on a million rows...
--===== Do some presets to clean up the display
SET ANSI_WARNINGS OFF;
SET NOCOUNT ON;
--===== Conditionally drop and recreate the test tables to make reruns easier
IF OBJECT_ID('TempDB..#Transaction') IS NOT NULL DROP TABLE #Transaction;
IF OBJECT_ID('TempDB..#NameValue1') IS NOT NULL DROP TABLE #NameValue1;
IF OBJECT_ID('TempDB..#NameValue2') IS NOT NULL DROP TABLE #NameValue2;
CREATE TABLE #Transaction
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[row_id] [int] NOT NULL,
[name] [nvarchar](50) NULL,
[value] [nvarchar](50) NULL
)
;
CREATE TABLE #NameValue1 --For first test
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[name1] [nvarchar](50) NULL,
[value1] [nvarchar](50) NULL,
[name2] [nvarchar](50) NULL,
[value2] [nvarchar](50) NULL,
[name3] [nvarchar](50) NULL,
[value3] [nvarchar](50) NULL
)
;
CREATE TABLE #NameValue2 --For second test
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[name1] [nvarchar](50) NULL,
[value1] [nvarchar](50) NULL,
[name2] [nvarchar](50) NULL,
[value2] [nvarchar](50) NULL,
[name3] [nvarchar](50) NULL,
[value3] [nvarchar](50) NULL
)
;
--===== Create the test data. Since we don't really care what the name
-- and value are for this test, varying lengths of NEWID() values were
-- used as a quick substitute.
INSERT INTO #Transaction
([Tran_ID], [Row_ID], [Name], [Value])
SELECT TOP (1000000)
((ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)/3+1)*10 AS [Tran_ID],
(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)%3+1 AS [Row_ID],
LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36) AS [Name],
LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36) AS [Value]
FROM Master.sys.All_Columns ac1,
Master.sys.All_Columns ac2
;
--===== Begin measuring CPU and Duration times
SET STATISTICS TIME ON;
--===== ********** COldCoffee's Concatenate and Split solution **********
PRINT '********** COldCoffee''s Concatenate and Split solution **********'
INSERT INTO #NameValue1
([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT
[tran_id],
SUBSTRING([1],1,(CHARINDEX(';',[1])-1)) name1,
SUBSTRING([1],(CHARINDEX(';',[1])+1),DATALENGTH([1])-(CHARINDEX(';',[1]))) value1,
SUBSTRING([2],1,(CHARINDEX(';',[2])-1)) name2,
SUBSTRING([2],(CHARINDEX(';',[2])+1),DATALENGTH([2])-(CHARINDEX(';',[2]))) value2,
SUBSTRING([3],1,(CHARINDEX(';',[3])-1)) name3,
SUBSTRING([3],(CHARINDEX(';',[3])+1),DATALENGTH([3])-(CHARINDEX(';',[3]))) value3
FROM
(SELECT [tran_id], [row_id] , [NAME]+';'+[value] name_value FROM #Transaction) FOR_PIVOT
PIVOT
(MAX(name_value) FOR [row_id] IN ([1], [2], [3])) PIVOT_CONTROL
ORDER BY tran_id -- Included ORDER BY clause
;
--===== ********** Jeff's Cross Tab solution **********
PRINT '********** Jeff''s Cross Tab solution **********'
INSERT INTO #NameValue2
([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT [tran_id],
MAX(CASE WHEN [row_id] = 1 THEN [name] END) AS [name1],
MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
MAX(CASE WHEN [row_id] = 2 THEN [name] END) AS [name2],
MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
MAX(CASE WHEN [row_id] = 3 THEN [name] END) AS [name3],
MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
FROM #Transaction
GROUP BY [tran_id]
ORDER BY [tran_id]
;
--===== Stop measuring time and return warnings to normal
SET STATISTICS TIME OFF;
SET ANSI_WARNINGS ON;
And here are the results on my 8-year-old but trusted desktop box...
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 37813 ms, elapsed time = 42957 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 10625 ms, elapsed time = 18448 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 1:17 am
Another approach using PIVOT:
SELECT N.tran_id,
N.name1,
V.value1,
N.name2,
V.value2,
N.name3,
V.value3
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3]
FROM (
SELECT tran_id, name, row_id
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) N
JOIN (
SELECT P.tran_id,
value1 = P.[1],
value2 = P.[2],
value3 = P.[3]
FROM (
SELECT tran_id, value, row_id
FROM #Transaction
) T
PIVOT (
MAX(value)
FOR row_id IN ([1],[2],[3])
) P
) V
ON N.tran_id = V.tran_id;
April 20, 2010 at 1:26 am
And a nested PIVOT:
SELECT P.tran_id,
name1 = MAX(P.name1),
value1 = MAX(P.[1]),
name2 = MAX(P.name2),
value2 = MAX(P.[2]),
name3 = MAX(P.name3),
value3 = MAX(P.[3])
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3],
P.row_id2,
P.value
FROM (
SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) SQ
PIVOT (
MAX(value)
FOR row_id2 IN ([1],[2],[3])
) P
GROUP BY
P.tran_id;
April 20, 2010 at 1:27 am
Neither of those will be quite as fast as Jeff's method.
April 20, 2010 at 1:35 am
Here are the results from my PC.
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 21157 ms, elapsed time = 17163 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 13750 ms, elapsed time = 11205 ms.
karthik
April 20, 2010 at 1:39 am
Paul White NZ (4/20/2010)
And a nested PIVOT:
SELECT P.tran_id,
name1 = MAX(P.name1),
value1 = MAX(P.[1]),
name2 = MAX(P.name2),
value2 = MAX(P.[2]),
name3 = MAX(P.name3),
value3 = MAX(P.[3])
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3],
P.row_id2,
P.value
FROM (
SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) SQ
PIVOT (
MAX(value)
FOR row_id2 IN ([1],[2],[3])
) P
GROUP BY
P.tran_id;
When I executed the query, I got the result set below...
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 34408 ms, elapsed time = 14404 ms.
karthik
April 20, 2010 at 5:48 am
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
Unfortunately those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...
April 20, 2010 at 5:54 am
COldCoffee (4/20/2010)
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
It's good to try different things. It's all a bit of a side issue anyway, since the traditional pivot is smaller, neater, and faster!
Unfortunately those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...
A few seconds faster, even, going by elapsed time. String manipulation in T-SQL is pretty sucky.
April 20, 2010 at 6:21 am
Paul White NZ (4/20/2010)
COldCoffee (4/20/2010)
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
It's good to try different things. It's all a bit of a side issue anyway, since the traditional pivot is smaller, neater, and faster!
COld
I agree with Paul, it's all about trying different solutions for the same problem.
I guess my next question would be: since in this case the traditional pivot method is "smaller, neater and faster", are there any particular cases that you can think of where using the PIVOT operator would be more appropriate?
BTW Jeff, thanks for setting up the test case... You tote a mean test bag :-)
April 20, 2010 at 7:12 am
karthikeyan-444867 (4/20/2010)
Here are the results from my PC.
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 21157 ms, elapsed time = 17163 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 13750 ms, elapsed time = 11205 ms.
Thanks, Karthik. It's always good to see how things work on more than one machine.
{edit} Since the CPU times on both of those are greater than their respective durations, it's also obvious that both pieces of code took advantage of some parallelism.
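For anyone who wants to double check that, here's a minimal sketch (it assumes the #Transaction and #NameValue2 tables from the test harness above are still present): rerun the Cross Tab insert with a serial plan forced and compare the times. With MAXDOP 1 in place, CPU time should come out roughly equal to, or a little less than, the elapsed time.
--===== Sketch only: rerun the Cross Tab insert serially for comparison
TRUNCATE TABLE #NameValue2;
SET STATISTICS TIME ON;
INSERT INTO #NameValue2
       ([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT [tran_id],
       MAX(CASE WHEN [row_id] = 1 THEN [name]  END) AS [name1],
       MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
       MAX(CASE WHEN [row_id] = 2 THEN [name]  END) AS [name2],
       MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
       MAX(CASE WHEN [row_id] = 3 THEN [name]  END) AS [name3],
       MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
  FROM #Transaction
 GROUP BY [tran_id]
 ORDER BY [tran_id]
OPTION (MAXDOP 1) --Force a serial plan so CPU and elapsed time can be compared directly
;
SET STATISTICS TIME OFF;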
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 7:24 am
pYak (4/20/2010)
I agree with Paul, it's all about trying different solutions for the same problem.
Absolutely true. "A Developer must not guess... A Developer must KNOW." 😀 That's why I carry a test bag. :hehe:
BTW Jeff, thanks for setting up the test case... You tote a mean test bag :-)
Heh... thanks, pYak. I aim to please. I sometimes miss but I'm always aiming. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 10:10 am
Jeff, you really are a wonderful character... and Paul and Karthik, thanks for providing new views and test results... you guys are simply great... thanks all 🙂
April 20, 2010 at 5:37 pm
pYak (4/20/2010)
are there any particular cases that you can think of where using the PIVOT operator would be more appropriate?
Not really, no. PIVOT is shorthand for a very similar CASE construction; look closely at the execution plans and you'll see the close similarity. PIVOT is not as flexible, though, and is often slightly slower.
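To make that concrete, here's a minimal side-by-side sketch; it assumes the #Transaction test table from Jeff's harness earlier in the thread is still around. Both queries return the same rows, one written with PIVOT and one with the CASE-based Cross Tab.
-- PIVOT form
SELECT P.tran_id,
       name1 = P.[1],
       name2 = P.[2],
       name3 = P.[3]
FROM (SELECT tran_id, row_id, name FROM #Transaction) T
PIVOT (MAX(name) FOR row_id IN ([1],[2],[3])) P;
-- Equivalent CASE (Cross Tab) form
SELECT tran_id,
       name1 = MAX(CASE WHEN row_id = 1 THEN name END),
       name2 = MAX(CASE WHEN row_id = 2 THEN name END),
       name3 = MAX(CASE WHEN row_id = 3 THEN name END)
FROM #Transaction
GROUP BY tran_id;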