April 19, 2010 at 9:02 am
COldCoffee (4/19/2010)
Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't cost the query anything... so I guess the query performs equally with Jeff's, but it has to be tested with a lot of rows...
I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2010 at 11:17 am
Jeff Moden (4/19/2010)
COldCoffee (4/19/2010)
Yes Karthik, I could also see a slight change in the execution plan, but that extra component didn't cost the query anything... so I guess the query performs equally with Jeff's, but it has to be tested with a lot of rows...
I was going to suggest that the Estimated and Actual Execution Plans can really throw you a curve ball when it comes to "cost" and "batch cost".
Jeff, I actually ran both yours and mine in the same window and saw a 50% query cost apiece... but there was one scalar expression component in my code which accounted for only 0%. Still, a test with > a million rows will reveal the piths and myths of both queries... :-) I would depend on your testing rather than mine, so I'm waiting for your test results 🙂
April 19, 2010 at 9:59 pm
Heh... I always get left holding the test bag... starting to feel like a cleaning lady. 😛
Like I said, the Execution Plan lies. You should never rely on it to make a final decision on which code to use for performance. You should only use it to figure out what's being used and where potential trouble spots may be.
Here's the test code including some duration measurements... as people have grown to expect, I tested on a million rows...
--===== Do some presets to clean up the display
SET ANSI_WARNINGS OFF;
SET NOCOUNT ON;
--===== Conditionally drop and recreate the test tables to make reruns easier
IF OBJECT_ID('TempDB..#Transaction') IS NOT NULL DROP TABLE #Transaction;
IF OBJECT_ID('TempDB..#NameValue1') IS NOT NULL DROP TABLE #NameValue1;
IF OBJECT_ID('TempDB..#NameValue2') IS NOT NULL DROP TABLE #NameValue2;
CREATE TABLE #Transaction
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[row_id] [int] NOT NULL,
[name] [nvarchar](50) NULL,
[value] [nvarchar](50) NULL
)
;
CREATE TABLE #NameValue1 --For first test
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[name1] [nvarchar](50) NULL,
[value1] [nvarchar](50) NULL,
[name2] [nvarchar](50) NULL,
[value2] [nvarchar](50) NULL,
[name3] [nvarchar](50) NULL,
[value3] [nvarchar](50) NULL
)
;
CREATE TABLE #NameValue2 --For second test
(
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[tran_id] [int] NOT NULL,
[name1] [nvarchar](50) NULL,
[value1] [nvarchar](50) NULL,
[name2] [nvarchar](50) NULL,
[value2] [nvarchar](50) NULL,
[name3] [nvarchar](50) NULL,
[value3] [nvarchar](50) NULL
)
;
--===== Create the test data. Since we don't really care what the name
-- and value are for this test, varying lengths of NEWID() values were
-- used as a quick substitute.
INSERT INTO #Transaction
([Tran_ID], [Row_ID], [Name], [Value])
SELECT TOP (1000000)
((ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)/3+1)*10 AS [Tran_ID],
(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)%3+1 AS [Row_ID],
LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36) AS [Name],
LEFT(NEWID(),ABS(CHECKSUM(NEWID()))%36) AS [Value]
FROM Master.sys.All_Columns ac1,
Master.sys.All_Columns ac2
;
--===== Begin measuring CPU and Duration times
SET STATISTICS TIME ON;
--===== ********** COldCoffee's Concatenate and Split solution **********
PRINT '********** COldCoffee''s Concatenate and Split solution **********'
INSERT INTO #NameValue1
([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT
[tran_id],
SUBSTRING([1],1,(CHARINDEX(';',[1])-1)) name1,
SUBSTRING([1],(CHARINDEX(';',[1])+1),DATALENGTH([1])-(CHARINDEX(';',[1]))) value1,
SUBSTRING([2],1,(CHARINDEX(';',[2])-1)) name2,
SUBSTRING([2],(CHARINDEX(';',[2])+1),DATALENGTH([2])-(CHARINDEX(';',[2]))) value2,
SUBSTRING([3],1,(CHARINDEX(';',[3])-1)) name3,
SUBSTRING([3],(CHARINDEX(';',[3])+1),DATALENGTH([3])-(CHARINDEX(';',[3]))) value3
FROM
(SELECT [tran_id], [row_id] , [NAME]+';'+[value] name_value FROM #Transaction) FOR_PIVOT
PIVOT
(MAX(name_value) FOR [row_id] IN ([1], [2], [3])) PIVOT_CONTROL
ORDER BY tran_id -- Included ORDER BY clause
;
--===== ********** Jeff's Cross Tab solution **********
PRINT '********** Jeff''s Cross Tab solution **********'
INSERT INTO #NameValue2
([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT [tran_id],
MAX(CASE WHEN [row_id] = 1 THEN [name] END) AS [name1],
MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
MAX(CASE WHEN [row_id] = 2 THEN [name] END) AS [name2],
MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
MAX(CASE WHEN [row_id] = 3 THEN [name] END) AS [name3],
MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
FROM #Transaction
GROUP BY [tran_id]
ORDER BY [tran_id]
;
--===== Stop measuring time and return warnings to normal
SET STATISTICS TIME OFF;
SET ANSI_WARNINGS ON;
And here are the results on my 8-year-old but trusted desktop box...
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 37813 ms, elapsed time = 42957 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 10625 ms, elapsed time = 18448 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 1:17 am
Another approach using PIVOT:
SELECT N.tran_id,
N.name1,
V.value1,
N.name2,
V.value2,
N.name3,
V.value3
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3]
FROM (
SELECT tran_id, name, row_id
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) N
JOIN (
SELECT P.tran_id,
value1 = P.[1],
value2 = P.[2],
value3 = P.[3]
FROM (
SELECT tran_id, value, row_id
FROM #Transaction
) T
PIVOT (
MAX(value)
FOR row_id IN ([1],[2],[3])
) P
) V
ON N.tran_id = V.tran_id;
April 20, 2010 at 1:26 am
And a nested PIVOT:
SELECT P.tran_id,
name1 = MAX(P.name1),
value1 = MAX(P.[1]),
name2 = MAX(P.name2),
value2 = MAX(P.[2]),
name3 = MAX(P.name3),
value3 = MAX(P.[3])
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3],
P.row_id2,
P.value
FROM (
SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) SQ
PIVOT (
MAX(value)
FOR row_id2 IN ([1],[2],[3])
) P
GROUP BY
P.tran_id;
April 20, 2010 at 1:27 am
Neither of those will be quite as fast as Jeff's method.
April 20, 2010 at 1:35 am
Here are the results from my PC.
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 21157 ms, elapsed time = 17163 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 13750 ms, elapsed time = 11205 ms.
karthik
April 20, 2010 at 1:39 am
Paul White NZ (4/20/2010)
And a nested PIVOT:
SELECT P.tran_id,
name1 = MAX(P.name1),
value1 = MAX(P.[1]),
name2 = MAX(P.name2),
value2 = MAX(P.[2]),
name3 = MAX(P.name3),
value3 = MAX(P.[3])
FROM (
SELECT P.tran_id,
name1 = P.[1],
name2 = P.[2],
name3 = P.[3],
P.row_id2,
P.value
FROM (
SELECT tran_id, name, row_id, value, 1 * row_id AS row_id2
FROM #Transaction
) T
PIVOT (
MAX(name)
FOR row_id IN ([1],[2],[3])
) P
) SQ
PIVOT (
MAX(value)
FOR row_id2 IN ([1],[2],[3])
) P
GROUP BY
P.tran_id;
When I executed the query, I got the result set below...
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 34408 ms, elapsed time = 14404 ms.
karthik
April 20, 2010 at 5:48 am
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
Unfortunately those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...
April 20, 2010 at 5:54 am
COldCoffee (4/20/2010)
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
It's good to try different things. It's all a bit of a side issue anyway, since the traditional pivot is smaller, neater, and faster!
Unfortunately those solutions seem to be on par with the concatenate-and-split solution... hmmm... interesting...
A few seconds faster, even, going by elapsed time. String manipulation in T-SQL is pretty sucky.
April 20, 2010 at 6:21 am
Paul White NZ (4/20/2010)
COldCoffee (4/20/2010)
Paul, I initially coded both a pivot-join and a nested-pivot solution, almost exactly the same as yours, but I feared that 2 PIVOTs and a join would overload the CPU, so I didn't post the code 🙁
It's good to try different things. It's all a bit of a side issue anyway, since the traditional pivot is smaller, neater, and faster!
COld
I agree with Paul, it's all about trying different solutions for the same problem.
I guess my next question would be: since in this case the traditional pivot method is "smaller, neater and faster", are there any particular cases that you can think of where using the PIVOT operator would be more appropriate?
BTW Jeff, thanks for setting up the test case... You tote a mean test bag :-)
April 20, 2010 at 7:12 am
karthikeyan-444867 (4/20/2010)
Here are the results from my PC.
********** COldCoffee's Concatenate and Split solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 21157 ms, elapsed time = 17163 ms.
********** Jeff's Cross Tab solution **********
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 13750 ms, elapsed time = 11205 ms.
Thanks, Karthik. It's always good to see how things work on more than one machine.
{edit} Since the CPU times on both of those are greater than their respective durations, it's also obvious that both pieces of code took advantage of some parallelism.
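For anyone who wants to double check that, here's a minimal sketch (it assumes the #Transaction and #NameValue2 tables from the test harness above are still present): rerun the Cross Tab insert with a serial plan forced and compare the times. With MAXDOP 1 in place, CPU time should come out roughly equal to, or a little less than, the elapsed time.
--===== Sketch only: rerun the Cross Tab insert serially for comparison
TRUNCATE TABLE #NameValue2;
SET STATISTICS TIME ON;
INSERT INTO #NameValue2
       ([tran_id],[name1],[value1],[name2],[value2],[name3],[value3])
SELECT [tran_id],
       MAX(CASE WHEN [row_id] = 1 THEN [name]  END) AS [name1],
       MAX(CASE WHEN [row_id] = 1 THEN [value] END) AS [value1],
       MAX(CASE WHEN [row_id] = 2 THEN [name]  END) AS [name2],
       MAX(CASE WHEN [row_id] = 2 THEN [value] END) AS [value2],
       MAX(CASE WHEN [row_id] = 3 THEN [name]  END) AS [name3],
       MAX(CASE WHEN [row_id] = 3 THEN [value] END) AS [value3]
  FROM #Transaction
 GROUP BY [tran_id]
 ORDER BY [tran_id]
OPTION (MAXDOP 1) --Force a serial plan so CPU and elapsed time can be compared directly
;
SET STATISTICS TIME OFF;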
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 7:24 am
pYak (4/20/2010)
I agree with Paul, it's all about trying different solutions for the same problem.
Absolutely true. "A Developer must not guess... A Developer must KNOW." 😀 That's why I carry a test bag. :hehe:
BTW Jeff, thanks for setting up the test case... You tote a mean test bag :-)
Heh... thanks, pYak. I aim to please. I sometimes miss but I'm always aiming. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2010 at 10:10 am
Jeff, you really are a wonderful character... and Paul and Karthik, thanks for providing new views and test results... you guys are simply great... thanks all 🙂
April 20, 2010 at 5:37 pm
pYak (4/20/2010)
are there any particular cases that you can think of where using the PIVOT operator would be more appropriate?
Not really, no. PIVOT is shorthand for a very similar CASE construction; look closely at the execution plans and you'll see the close similarity. PIVOT is not as flexible, though, and is often slightly slower.
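To make that concrete, here's a minimal side-by-side sketch; it assumes the #Transaction test table from Jeff's harness earlier in the thread is still around. Both queries return the same rows, one written with PIVOT and one with the CASE-based Cross Tab.
-- PIVOT form
SELECT P.tran_id,
       name1 = P.[1],
       name2 = P.[2],
       name3 = P.[3]
FROM (SELECT tran_id, row_id, name FROM #Transaction) T
PIVOT (MAX(name) FOR row_id IN ([1],[2],[3])) P;
-- Equivalent CASE (Cross Tab) form
SELECT tran_id,
       name1 = MAX(CASE WHEN row_id = 1 THEN name END),
       name2 = MAX(CASE WHEN row_id = 2 THEN name END),
       name3 = MAX(CASE WHEN row_id = 3 THEN name END)
FROM #Transaction
GROUP BY tran_id;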