June 25, 2009 at 5:06 pm
Hi Chris,
First of all, thank you for taking the time to get the data together and post it here!
Great job! :Wow:
When looking at the numbers in your table I started scratching my head. I'm bald by now. :crazy:
It looks very strange that you'd get only a 20% performance gain going from no index to an index on Acct.
Would you mind posting the actual execution plans for both scenarios? Note: please make sure to include the actual, not the estimated, execution plan.
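In case it helps, one way to capture the actual plan from a query window (just a suggestion - any method that returns the actual plan is fine):

-- Returns the actual execution plan (as XML) after the query runs
SET STATISTICS XML ON;
-- ... run the query under test here ...
SET STATISTICS XML OFF;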
I'm not at the point where I can follow Paul's conclusion (that PIVOT is slightly more efficient) yet, since the total numbers don't make sense to me. Not yet. It looks like one of the performance gurus around here should have a look at it. Maybe we're overlooking the obvious...
But one of the first questions from the gurus - aside from the execution plans - would be for the table definition together with any index definitions (besides the ones that are part of the "challenge").
Do you think you'll find the time to provide that information? That would really be great!
Note: it always keeps me from sleeping well when there are issues around that I don't understand at all and that contradict my current understanding - in this case I don't understand why there's so little difference between no index and the variety of tested indexes...
June 26, 2009 at 7:34 pm
lmu92 (6/22/2009)
You don't need the subselect:
SELECT  ACCT_DEBTOR,
        MAX(CASE WHEN OCCURRENCE = 1 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber1,
        MAX(CASE WHEN OCCURRENCE = 2 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber2,
        MAX(CASE WHEN OCCURRENCE = 3 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber3
FROM    Post_File082_Landline_No
GROUP BY ACCT_DEBTOR
I've had great success with a little trick that Peter Larson taught me a while back. I actually incorporated his method into one of my articles. Peter calls it "pre-aggregation". In the presence of the correct indexes, the method will usually beat PIVOT.
[font="Courier New"] SELECT preagg.Acct_Debtor,
MAX(CASE WHEN preagg.Occurance=1 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber1,
MAX(CASE WHEN preagg.Occurance=2 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber2,
MAX(CASE WHEN preagg.Occurance=3 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber3
FROM
(--==== Pre-aggregate the data. This will obviously work much better with the correct index
SELECT Acct_Debtor, Occurance, MAX(LandLine_Contact_No) AS Max_LandLine_Contact_No
FROM dbo.Post_File082_Landline_No
GROUP BY Acct_Debtor, Occurance
)preagg
GROUP BY preagg.Acct_Debtor[/font]
I don't know what your indexes are, but I would think that the following index would be beneficial to the task...
[font="Courier New"] CREATE INDEX IX_Post_File082_Landline_No_Cover
ON dbo.Post_File082_Landline_No (Acct_Debtor, Occurance)
INCLUDE (LandLine_Contact_No)[/font]
Of course, I don't have your data to test with so I don't guarantee the code but it should be fairly close.
_____________________________________________________________________________________________________________________________
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2009 at 7:43 pm
Paul White (6/25/2009)
Cool stuff Chris. Of course it leaves me wondering about the hash aggregate versus stream aggregate thing, but hey. So PIVOT can be slightly more efficient - if you can be bothered to fight the syntax, and if the query you need suits it.
😎
Heh... not if you do it right. 😉
http://www.sqlservercentral.com/articles/T-SQL/63681/
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2009 at 7:56 pm
Heh... I almost forgot... you can get a tiny bit more speed out of it if you move the pre-aggregation from the derived table to a CTE like this...
[font="Courier New"];WITH
ctePreAgg AS
(--==== Pre-aggregate the data. This will obviously work much better with the correct index
SELECT Acct_Debtor, Occurance, MAX(LandLine_Contact_No) AS Max_LandLine_Contact_No
FROM dbo.Post_File082_Landline_No
GROUP BY Acct_Debtor, Occurance
)
SELECT preagg.Acct_Debtor,
MAX(CASE WHEN preagg.Occurance=1 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber1,
MAX(CASE WHEN preagg.Occurance=2 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber2,
MAX(CASE WHEN preagg.Occurance=3 THEN preagg.Max_LandLine_Contact_No ELSE NULL END) AS LandLineNumber3
FROM ctePreAgg AS preagg
GROUP BY preagg.Acct_Debtor[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2009 at 1:01 am
Here is a test rig to try the various methods on the traditional one million rows.
(By the way, moving a sub-query like this to a common table expression generally results in an identical plan - there is no speed-up. It does look better though - at least it does to me.)
-- Everyone has one of these
USE tempdb;
GO
-- Don't print x row(s) affected messages
SET NOCOUNT ON;
GO
-- Conditional drops
IF OBJECT_ID(N'#CaseVersion', N'P') IS NOT NULL DROP PROCEDURE #CaseVersion;
IF OBJECT_ID(N'#PreAggVersion', N'P') IS NOT NULL DROP PROCEDURE #PreAggVersion;
IF OBJECT_ID(N'#PivotVersion', N'P') IS NOT NULL DROP PROCEDURE #PivotVersion;
IF OBJECT_ID(N'dbo.ClientContact', N'U') IS NOT NULL DROP TABLE dbo.ClientContact;
GO
-- Test table
CREATE TABLE dbo.ClientContact
(
    client_id   VARCHAR(10) NOT NULL,
    sequence_id INTEGER     NOT NULL,
    telephone   VARCHAR(10) NOT NULL,
CONSTRAINT [PK dbo.ClientContact client_id, sequence_id]
PRIMARY KEY CLUSTERED (client_id, sequence_id)
);
GO
-- Add 999,999 test rows
-- (this may take 30 seconds or so)
WITH Numbers (client_id)
AS (
    SELECT TOP (333333)
            'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C2.[object_id]), 9)
    FROM    master.sys.columns C1, master.sys.columns C2
)
INSERT  dbo.ClientContact
        (client_id, sequence_id, telephone)
SELECT  client_id,
        Sequence.id,
        Telephone.tel
FROM    Numbers
CROSS APPLY (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) Sequence (id)
CROSS APPLY (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);
GO
-- CASE-based version
CREATE PROCEDURE #CaseVersion AS
BEGIN
SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;
    SELECT  client_id,
            MAX(CASE WHEN sequence_id = 1 THEN telephone ELSE NULL END) AS LandLineNumber1,
            MAX(CASE WHEN sequence_id = 2 THEN telephone ELSE NULL END) AS LandLineNumber2,
            MAX(CASE WHEN sequence_id = 3 THEN telephone ELSE NULL END) AS LandLineNumber3
    INTO    #DumpTable
    FROM    dbo.ClientContact
    GROUP BY client_id
    ORDER BY client_id
    OPTION  (MAXDOP 1);
END;
GO
-- Pre-aggregated CASE version
CREATE PROCEDURE #PreAggVersion AS
BEGIN
SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;
    SELECT  preagg.client_id,
            MAX(CASE WHEN preagg.sequence_id = 1 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber1,
            MAX(CASE WHEN preagg.sequence_id = 2 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber2,
            MAX(CASE WHEN preagg.sequence_id = 3 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber3
    INTO    #DumpTable
    FROM    (--==== Pre-aggregate the data. This will obviously work much better with the correct index
             SELECT client_id, sequence_id, MAX(telephone) AS max_telephone
             FROM   dbo.ClientContact
             GROUP BY client_id, sequence_id
            ) preagg
    GROUP BY preagg.client_id
    ORDER BY preagg.client_id
    OPTION  (MAXDOP 1);
END;
GO
-- PIVOT version
CREATE PROCEDURE #PivotVersion AS
BEGIN
SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;
    SELECT  PVT.client_id,
            PVT.[1] AS [LandLineNumber1],
            PVT.[2] AS [LandLineNumber2],
            PVT.[3] AS [LandLineNumber3]
    INTO    #DumpTable
    FROM    (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact) CC
    PIVOT   (MAX(telephone) FOR sequence_id IN ([1],[2],[3])) PVT
    OPTION  (MAXDOP 1);
END;
GO
-- Show actual execution plans with row counts, in text form
SET STATISTICS PROFILE ON;
EXECUTE #CaseVersion;
EXECUTE #PreAggVersion;
EXECUTE #PivotVersion;
GO
SET STATISTICS PROFILE OFF;
GO
-- Clear cached plans
DBCC FREESYSTEMCACHE('Object Plans');
WAITFOR DELAY '00:00:05';
GO
-- Run each procedure five times
EXECUTE #CaseVersion;
GO 5
EXECUTE #PreAggVersion;
GO 5
EXECUTE #PivotVersion;
GO 5
-- Results summary
SELECT  [text],
        execution_count,
        min_worker_time / 1000 AS min_worker_time_ms,
        max_worker_time / 1000 AS max_worker_time_ms,
        total_worker_time / execution_count / 1000 AS avg_worker_time_ms,
        total_elapsed_time / execution_count / 1000 AS avg_elapsed_time_ms,
        plan_generation_num,
        query_plan
FROM    sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text ([sql_handle])
CROSS APPLY sys.dm_exec_query_plan ([plan_handle])
WHERE   [text] LIKE '--%#%Version%'
AND     [text] NOT LIKE '-- Results summary%'
ORDER BY creation_time ASC;
GO
June 27, 2009 at 10:04 am
Hi Paul,
thanx for the test scenario!
When I ran your scenario (1 mill rows, 333333 client_ids) on my laptop I got approx. 1.1sec for all three versions (CASE, aggregate and PIVOT).
I extended it to 9 mill rows, 333333 client_ids, ending up with 7.4sec for CASE and aggregate and 7.3sec for PIVOT (I'd still call this a tie between the three...)
After that I reviewed the table provided by Chris. (http://www.sqlservercentral.com/Forums/Attachment3400.aspx)
I stumbled over the number of Accounts (9mill) vs. the total number of rows (12mill).
Based on those numbers I think the pre-aggregation Jeff's example uses most probably won't reduce the number of rows to be processed significantly. It's still a great way to go in a scenario where you'd have several rows with identical Acct_Debtor and Occurance. Unfortunately, it doesn't look like the data distribution in this scenario is like that.
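A quick way to check that hunch - just a sketch, assuming the table and column names from earlier in the thread - is to look at the average number of rows per (Acct_Debtor, Occurance) pair; if it comes back close to 1, the inner GROUP BY of the pre-agg method can't remove much:

-- Average rows per (Acct_Debtor, Occurance) pair.
-- A value near 1.0 means pre-aggregation barely reduces the row count.
SELECT AVG(rows_per_pair * 1.0) AS avg_rows_per_pair
FROM (
      SELECT COUNT(*) AS rows_per_pair
      FROM   dbo.Post_File082_Landline_No
      GROUP BY Acct_Debtor, Occurance
     ) d;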
I then repeated your scenario with 9 mill rows and 3 mill client_ids, and it took 12sec for CASE and aggregate and 11.8sec for PIVOT, which is still 10 times faster compared to what Chris is getting.
I'm not sure how the data are distributed over Acct_Debtor and Occurance, but I assume we have one row per client_id with sequence_id = 1.
If that's the case, it may help to eliminate those 9mill rows from the pivot and just work with the remaining rows (code can be included in your test scenario):
CREATE PROCEDURE #seq1Version AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;
    WITH seq1
    AS (
        SELECT  client_id AS client_id,
                telephone AS LandLineNumber1
        FROM    dbo.ClientContact
        WHERE   sequence_id = 1
    ),
    seqNot1
    AS (
        SELECT  client_id AS client_id,
                MAX(CASE WHEN sequence_id = 2 THEN telephone ELSE NULL END) AS LandLineNumber2,
                MAX(CASE WHEN sequence_id = 3 THEN telephone ELSE NULL END) AS LandLineNumber3
        FROM    dbo.ClientContact
        WHERE   sequence_id > 1
        GROUP BY client_id
    )
    SELECT  seq1.client_id AS client_id,
            seq1.LandLineNumber1 AS LandLineNumber1,
            seqNot1.LandLineNumber2 AS LandLineNumber2,
            seqNot1.LandLineNumber3 AS LandLineNumber3
    INTO    #DumpTable -- dump to a temp table like the other test procedures
    FROM    seq1
    LEFT OUTER JOIN seqNot1 ON seq1.client_id = seqNot1.client_id
    OPTION  (MAXDOP 1);
END;
When I ran this with 1mill rows and a data distribution of 900k client_id with seq_id=1 and 100k client_id with seq_id=2 I got 1.7sec for the three versions you evaluated and 0.9sec for the version above.
-- generate 1mill rows with 900k rows with sequence_id=1 and 100k rows with sequence_id=2
;WITH Numbers (client_id)
AS (
    SELECT TOP (900000)
            'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C2.[object_id]), 9)
    FROM    master.sys.columns C1, master.sys.columns C2
)
INSERT  dbo.ClientContact
        (client_id, sequence_id, telephone)
SELECT  client_id,
        1,
        Telephone.tel
FROM    Numbers
CROSS APPLY (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);

;WITH Numbers (client_id)
AS (
    SELECT TOP (100000)
            'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C3.[object_id]), 9)
    FROM    master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
)
INSERT  dbo.ClientContact
        (client_id, sequence_id, telephone)
SELECT  client_id,
        2,
        Telephone.tel
FROM    Numbers
CROSS APPLY (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);
So, I think if it's possible to move a large number of rows from the pivot to a separate subquery, the performance should improve.
Edit: Note: the solution above will only be faster under a special data distribution (in this scenario: more than 65% of the rows can be moved to the subquery). It should not be considered a general solution.
June 27, 2009 at 8:40 pm
Hey Lutz,
My results were the same as yours; and I agree with your assessment.
Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods.
PIVOT is a bit of an odd duck: I don't find the syntax particularly intuitive, and it struggles with multiple aggregations, but it can produce more compact code, and it produces the same basic plan as the CASE code (for basic tasks), leading to comparable performance in many instances.
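To illustrate the 'struggles with multiple aggregations' point: a single PIVOT operator only accepts one aggregate, whereas the CASE cross-tab happily mixes different aggregates in one pass. A quick sketch against the dbo.ClientContact test table above:

-- Two different aggregates in one scan - a single PIVOT can't express this.
SELECT  client_id,
        MAX(CASE WHEN sequence_id = 1 THEN telephone ELSE NULL END) AS LandLineNumber1,
        MAX(CASE WHEN sequence_id = 2 THEN telephone ELSE NULL END) AS LandLineNumber2,
        COUNT(CASE WHEN sequence_id > 1 THEN 1 ELSE NULL END)       AS AdditionalNumbers
FROM    dbo.ClientContact
GROUP BY client_id;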
When I get a free moment I might take another look at some of the more complex examples floating about on this site, to see if the 'pre-aggregation' can be incorporated into the PIVOT method too. I use the quotes there since, in the plans I have seen so far, the aggregation still follows the evaluation of the CASE expression for each row (in the Compute Scalar), despite the way the query is written.
Paul
June 27, 2009 at 8:57 pm
Paul White (6/27/2009)
Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods.
I'll probably stick to using the Cross-Tab code (one method or the other depending on the data, but it will usually be the pre-agg method) for a couple of reasons...
1. Since it's a bit faster in the other cases, I don't want to get into the habit of writing PIVOTs.
2. Since it's a bit faster in the other cases, I don't want other people to read my code and use a PIVOT I may have written for one of those other cases.
3. Admittedly a personal preference, but I prefer the readability of Cross-Tabs over that of PIVOTs even for the simple stuff.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 28, 2009 at 1:21 am
Jeff Moden (6/27/2009)
I'll probably stick to using the Cross-Tab code
Me too. I'm really quite anti-PIVOT myself (as I have said at least once earlier on this thread) but I do like to give it a fair go whenever I come across it...and also, I didn't want to not post my test rig results just because they didn't prove what I wanted 🙂
I'm mostly against PIVOT because it is such a missed opportunity (in terms of functionality and performance), has a clumsy syntax (despite Books Online's protestations to the contrary!) and the SUM...CASE construction which we have all used for many years does the job better anyway.
As an example of my willingness to give PIVOT a 'fair go' take another look at my post in the discussion of your first article on this subject. That shows how to do a nested PIVOT (rather than a join) to achieve the same performance as the 'pre-aggregated' cross-tab. I don't make a big thing about that since the syntax is just so horrible.
Also, I have just noticed that 2008 has broken the code I posted there. As written it produces:
[font="Courier New"]Msg 8624, Level 16, State 21, Line x
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.
[/font]
...which is just fantastic :hehe:
I think this is a bug - since the PIVOT consumes not just the column you would expect, but also any columns aliased to it. If you have a copy of 2008 lying around, the following illustrates this 'bug':
DECLARE @T TABLE (A INT NOT NULL, B INT NOT NULL);

INSERT  @T (A, B)
SELECT  1, 1 UNION ALL
SELECT  2, 2 UNION ALL
SELECT  3, 3;

WITH    Base AS
        (SELECT A, A AS A2, B FROM @T)
SELECT  A2, [1], [2], [3]
FROM    Base
PIVOT   (SUM(B) FOR A IN ([1],[2],[3])) AS pvt;
Substituting CONVERT(INT, A) AS A2 for 'A AS A2' makes it work again :sigh:
In fact, any dumb no-op like 1 * A AS A2 also works.
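For completeness, here is the repro again with the no-op applied (same table variable, same batch as above):

-- Same query - the 1 * A no-op breaks the alias/column tie,
-- so the PIVOT no longer consumes A2 and the plan compiles:
WITH    Base AS
        (SELECT A, 1 * A AS A2, B FROM @T)
SELECT  A2, [1], [2], [3]
FROM    Base
PIVOT   (SUM(B) FOR A IN ([1],[2],[3])) AS pvt;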
Optimizers eh?
To 'unbreak' my nested-PIVOT code for 2008, change '[Quarter] AS Q2' to '1 * [Quarter] AS Q2', or similar.
Paul
June 28, 2009 at 2:25 am
Paul White (6/27/2009)
Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...Paul
But how about the method I posted above (let's call it "Sub-Cross-Tab")? With the data distribution as shown the three options you compared take twice as long.
This thread and Jeff's last post lead me to the conclusion that I'll stop playing around with PIVOT. It has just been proven that a database cannot be turned into Excel (where pivot works fine) 🙂
Edit1: One more reason to stop using it: PIVOT cannot be used for a dynamic number of columns. A dynamic Cross Tab can do it.
Edit2: Have to partially retract my first edit note since it's simply not true. I'm sorry. :blush:
June 28, 2009 at 4:12 am
lmu92 (6/28/2009)
Paul White (6/27/2009)
Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...Paul
But how about the method I posted above (let's call it "Sub-Cross-Tab")?
That's why I stressed 'in this case, and with this data distribution' 😛
lmu92 (6/28/2009)
With the data distribution as shown the three options you compared take twice as long.
Well yes - but then we aren't comparing apples with apples anymore. Your method is specifically optimized for the data distribution you set up - and it is faster simply because it doesn't do any aggregation at all for the 900K rows with sequence_id = 1. The number of rows which do go through the stream aggregate (100K instead of 1M) drops the cost enough so that introducing the extra table scan and merge join still results in a cheaper overall plan.
The PIVOT can be optimized the same way, and runs 50ms faster on my laptop:
CREATE PROCEDURE #PivotVersion2 AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;
    WITH seq1
    AS (
        SELECT  client_id AS client_id,
                telephone AS LandLineNumber1
        FROM    dbo.ClientContact
        WHERE   sequence_id = 1
    ),
    seqNot1
    AS (
        SELECT  PVT.client_id,
                PVT.[1] AS [LandLineNumber1],
                PVT.[2] AS [LandLineNumber2],
                PVT.[3] AS [LandLineNumber3]
        FROM    (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact WHERE sequence_id > 1) CC
        PIVOT   (MAX(telephone) FOR sequence_id IN ([1],[2],[3])) PVT
    )
    SELECT  seq1.client_id AS client_id,
            seq1.LandLineNumber1 AS LandLineNumber1,
            seqNot1.LandLineNumber2 AS LandLineNumber2,
            seqNot1.LandLineNumber3 AS LandLineNumber3
    INTO    #DumpTable
    FROM    seq1
    LEFT OUTER JOIN seqNot1 ON seq1.client_id = seqNot1.client_id
    ORDER BY client_id
    OPTION  (MAXDOP 1);
END;
lmu92 (6/28/2009)
This thread and Jeff's last post lead me to the conclusion that I'll stop playing around with PIVOT. It has just been proven that a database cannot be turned into Excel (where pivot works fine) 🙂 Edit: One more reason to stop using it: PIVOT cannot be used for a dynamic number of columns. A dynamic Cross Tab can do it.
Jeff, could you let me have the number of your handrail-installation team please? 😀
PIVOT can be used with dynamic SQL, just the same as the CASE...SUM idea.
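For example - a minimal dynamic-PIVOT sketch against the test table, building the column list from the data first (just a sketch, not production code):

-- Build the IN (...) list from the distinct sequence_id values,
-- then run the PIVOT through dynamic SQL.
DECLARE @cols NVARCHAR(MAX),
        @sql  NVARCHAR(MAX);

SELECT  @cols = STUFF(
        (SELECT DISTINCT ',' + QUOTENAME(CONVERT(VARCHAR(12), sequence_id))
         FROM   dbo.ClientContact
         FOR XML PATH('')), 1, 1, '');

SET     @sql = N'SELECT PVT.client_id, ' + @cols + N'
        FROM (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact) CC
        PIVOT (MAX(telephone) FOR sequence_id IN (' + @cols + N')) PVT;';

EXECUTE (@sql);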
Paul
June 28, 2009 at 4:32 am
Paul White (6/28/2009)
... Well yes - but then we aren't comparing apples with apples anymore. Your method is specifically optimized for the data distribution you set up - and it is faster simply because it doesn't do any aggregation at all for the 900K rows with sequence_id = 1. The number of rows which do go through the stream aggregate (100K instead of 1M) drops the cost enough so that introducing the extra table scan and merge join still results in a cheaper overall plan.
You're definitely right: the solution I came up with is faster only under a special kind of data distribution. The break-even is at around 65% of the rows (if you can eliminate more than 65% of your rows, you'll start being faster than the standard pivot methods). My interpretation of the data Chris provided in one of his posts gave me the idea of eliminating the largest number of rows before even starting the pivot.
So it is a solution tailored to the special request of the OP in this thread, not a general solution. Sorry if I left that impression.
My goal was to give Chris another option to play with to get his special case resolved. 😉
I'll edit my original post to clarify.
June 28, 2009 at 4:44 am
Paul White (6/28/2009)
lmu92 (6/28/2009)
Paul White (6/27/2009)
Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...Paul
But how about the method I posted above (let's call it "Sub-Cross-Tab")?
That's why I stressed 'in this case, and with this data distribution' 😛
I might have been a little confused about what you were referring to as "this case, and with this data distribution".
I thought you were talking about the scenario Chris described. But after reviewing your posts it becomes clear that you were talking about the test scenario you set up. Sorry for the misinterpretation and the confusion I caused. :Whistling:
June 28, 2009 at 4:44 am
lmu92 (6/28/2009)
You're definitely right: the solution I came up with is faster only under a special kind of data distribution. The break-even is at around 65% of the rows (if you can eliminate more than 65% of your rows, you'll start being faster than the standard pivot methods). My interpretation of the data Chris provided in one of his posts gave me the idea of eliminating the largest number of rows before even starting the pivot. So it is a solution tailored to the special request of the OP in this thread, not a general solution. Sorry if I left that impression.
Hey Lutz,
I make allowances for the NEAFLs :w00t: 😀 :w00t:
It's all good. I just had to respond with the equivalent PIVOT code - for balance 😉
Paul
June 28, 2009 at 5:13 am
What makes me wonder, though, is what Chris (the OP who started this thread in the first place) ended up using to tune his query... The subject he was talking about got a little out of focus, I think.
I hope we didn't scare him away...
@chris-2: Are you still with us??
Edit: Btw: Congrats, Paul for being "TenCenturies" now!! :Wow: Well deserved!