How to transpose from Rows to Columns without using PIVOT

  • Hi Chris,

    first of all thank you for taking the time to get the data together and post it here!

    Great job! :Wow:

    When looking at the numbers in your table I started scratching my head. I'm a baldhead by now. :crazy:

It looks very strange that you'd get only a 20% performance gain from no index to using an index on Acct.

    Would you mind posting the actual execution plan for both scenarios? Note: Please make sure to include the actual and not the estimated execution plan.

I'm not yet at the point where I can follow Paul's conclusion (PIVOT is slightly more efficient), since the total numbers don't make sense to me yet. It looks like one of the performance gurus around here should have a look at it. Maybe we're overlooking the obvious...

But one of the first questions from the gurus - aside from the execution plans - would be the table definition together with any index definitions (besides the ones that are part of the "challenge").

Do you think you'll find the time to provide that information? That would really be great!

Note: It always prevents me from sleeping well when there are issues around that I don't understand at all and that contradict my current understanding - in this case I don't understand why there's so little difference between no index and the variety of tested indexes...



    Lutz
    A pessimist is an optimist with experience.


  • lmu92 (6/22/2009)


    You don't need the subselect:

SELECT ACCT_DEBTOR,
       MAX(CASE WHEN OCCURRENCE = 1 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber1,
       MAX(CASE WHEN OCCURRENCE = 2 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber2,
       MAX(CASE WHEN OCCURRENCE = 3 THEN LANDLINE_CONTACT_NO ELSE NULL END) AS LandLineNumber3
FROM Post_File082_Landline_No
GROUP BY ACCT_DEBTOR

    I've had great success with a little trick that Peter Larson taught me a while back. I actually incorporated his method into one of my articles. Peter calls it "pre-aggregation". In the presence of the correct indexes, the method will usually beat PIVOT.

    [font="Courier New"] SELECT preagg.Acct_Debtor

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber1,

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber2,

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber3

       FROM

            (--==== Pre-aggregate the data.  This will obviously work much better with the correct index

             SELECT Acct_DebtorOccuranceMAX(LandLine_Contact_NoAS Max_LandLine_Contact_No

               FROM dbo.Post_File082_Landline_No

              GROUP BY Acct_DebtorOccurance

            )preagg

      GROUP BY preagg.Acct_Debtor[/font]

    I don't know what your indexes are, but I would think that the following index would be beneficial to the task...

    [font="Courier New"] CREATE INDEX IX_Post_File082_Landline_No_Cover

         ON dbo.Post_File082_Landline_No (Acct_DebtorOccurance)

    INCLUDE (LandLine_Contact_No)[/font]

    Of course, I don't have your data to test with so I don't guarantee the code but it should be fairly close.

    _____________________________________________________________________________________________________________________________

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (6/25/2009)


Cool stuff Chris. Of course it leaves me wondering about the hash aggregate versus stream aggregate thing, but hey.

    So PIVOT can be slightly more efficient - if you can be bothered to fight the syntax, and if the query you need suits it.

    😎

    Heh... not if you do it right. 😉

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    --Jeff Moden



  • Heh... I almost forgot... you can get a tiny bit more speed out of it if you move the pre-aggregation from the derived table to a CTE like this...

    [font="Courier New"];WITH 

    ctePreAgg AS

    (--==== Pre-aggregate the data.  This will obviously work much better with the correct index

     SELECT Acct_DebtorOccuranceMAX(LandLine_Contact_NoAS Max_LandLine_Contact_No

       FROM dbo.Post_File082_Landline_No

      GROUP BY Acct_DebtorOccurance

    )

     SELECT preagg.Acct_Debtor

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber1,

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber2,

            MAX(CASE WHEN preagg.Occurance=THEN preagg.Max_LandLine_Contact_No ELSE NULL ENDAS LandLineNumber3

       FROM ctePreAgg AS preagg

      GROUP BY preagg.Acct_Debtor[/font]

    --Jeff Moden



  • Here is a test rig to try the various methods on the traditional one million rows.

(By the way, moving a sub-query like this to a common table expression generally results in an identical plan - there is no speed-up. It does look better, though - at least it does to me. A quick way to check this on your own system is sketched just after the rig.)

-- Everyone has one of these
USE tempdb;
GO

-- Don't print x row(s) affected messages
SET NOCOUNT ON;
GO

-- Conditional drops
IF OBJECT_ID(N'#CaseVersion', N'P') IS NOT NULL DROP PROCEDURE #CaseVersion;
IF OBJECT_ID(N'#PreAggVersion', N'P') IS NOT NULL DROP PROCEDURE #PreAggVersion;
IF OBJECT_ID(N'#PivotVersion', N'P') IS NOT NULL DROP PROCEDURE #PivotVersion;
IF OBJECT_ID(N'dbo.ClientContact', N'U') IS NOT NULL DROP TABLE dbo.ClientContact;
GO

-- Test table
CREATE TABLE dbo.ClientContact
(
    client_id   VARCHAR(10) NOT NULL,
    sequence_id INTEGER     NOT NULL,
    telephone   VARCHAR(10) NOT NULL,
    CONSTRAINT [PK dbo.ClientContact client_id, sequence_id]
        PRIMARY KEY CLUSTERED (client_id, sequence_id)
);
GO

-- Add 999,999 test rows
-- (this may take 30 seconds or so)
WITH Numbers (client_id)
AS  (
    SELECT TOP (333333)
           'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C2.[object_id]), 9)
    FROM   master.sys.columns C1, master.sys.columns C2
    )
INSERT dbo.ClientContact
       (client_id, sequence_id, telephone)
SELECT client_id,
       Sequence.id,
       Telephone.tel
FROM   Numbers
CROSS
APPLY  (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) Sequence (id)
CROSS
APPLY  (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);
GO

-- CASE-based version
CREATE PROCEDURE #CaseVersion AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;

    SELECT client_id,
           MAX(CASE WHEN sequence_id = 1 THEN telephone ELSE NULL END) AS LandLineNumber1,
           MAX(CASE WHEN sequence_id = 2 THEN telephone ELSE NULL END) AS LandLineNumber2,
           MAX(CASE WHEN sequence_id = 3 THEN telephone ELSE NULL END) AS LandLineNumber3
    INTO   #DumpTable
    FROM   dbo.ClientContact
    GROUP BY client_id
    ORDER BY client_id
    OPTION (MAXDOP 1);
END;
GO

-- Pre-aggregated CASE version
CREATE PROCEDURE #PreAggVersion AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;

    SELECT preagg.client_id,
           MAX(CASE WHEN preagg.sequence_id = 1 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber1,
           MAX(CASE WHEN preagg.sequence_id = 2 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber2,
           MAX(CASE WHEN preagg.sequence_id = 3 THEN preagg.max_telephone ELSE NULL END) AS LandLineNumber3
    INTO   #DumpTable
    FROM   (--==== Pre-aggregate the data. This will obviously work much better with the correct index
            SELECT client_id, sequence_id, MAX(telephone) AS max_telephone
            FROM   dbo.ClientContact
            GROUP BY client_id, sequence_id
           ) preagg
    GROUP BY preagg.client_id
    ORDER BY preagg.client_id
    OPTION (MAXDOP 1);
END;
GO

-- PIVOT version
CREATE PROCEDURE #PivotVersion AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;

    SELECT PVT.client_id,
           PVT.[1] AS [LandLineNumber1],
           PVT.[2] AS [LandLineNumber2],
           PVT.[3] AS [LandLineNumber3]
    INTO   #DumpTable
    FROM   (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact) CC
    PIVOT  (MAX(telephone) FOR sequence_id IN ([1],[2],[3])) PVT
    OPTION (MAXDOP 1);
END;
GO

-- Show actual execution plans with row counts, in text form
SET STATISTICS PROFILE ON;
EXECUTE #CaseVersion;
EXECUTE #PreAggVersion;
EXECUTE #PivotVersion;
GO
SET STATISTICS PROFILE OFF;
GO

-- Clear cached plans
DBCC FREESYSTEMCACHE('Object Plans');
WAITFOR DELAY '00:00:05';
GO

-- Run each procedure five times
EXECUTE #CaseVersion;
GO 5
EXECUTE #PreAggVersion;
GO 5
EXECUTE #PivotVersion;
GO 5

-- Results summary
SELECT [text],
       execution_count,
       min_worker_time / 1000 AS min_worker_time_ms,
       max_worker_time / 1000 AS max_worker_time_ms,
       total_worker_time / execution_count / 1000 AS avg_worker_time_ms,
       total_elapsed_time / execution_count / 1000 AS avg_elapsed_time_ms,
       plan_generation_num,
       query_plan
FROM   sys.dm_exec_query_stats
CROSS
APPLY  sys.dm_exec_sql_text ([sql_handle])
CROSS
APPLY  sys.dm_exec_query_plan ([plan_handle])
WHERE  [text] LIKE '--%#%Version%'
  AND  [text] NOT LIKE '-- Results summary%'
ORDER BY creation_time ASC;
GO
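The identical-plan remark at the top is easy to verify for yourself. A minimal sketch, assuming SQL Server 2008 (query_plan_hash is not available in 2005): after running both the derived-table and the CTE flavours of the pre-agg query, identical plan hashes here mean the optimizer produced the same plan shape for both.

-- Compare plan hashes for the derived-table and CTE versions of the query
SELECT st.[text],
       qs.query_hash,
       qs.query_plan_hash
FROM   sys.dm_exec_query_stats qs
CROSS
APPLY  sys.dm_exec_sql_text (qs.[sql_handle]) st
WHERE  st.[text] LIKE '%Max_LandLine_Contact_No%'
  AND  st.[text] NOT LIKE '-- Compare plan hashes%';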

  • Hi Paul,

    thanx for the test scenario!

When I ran your scenario (1 million rows, 333,333 client_ids) on my laptop I got approx. 1.1 sec for all three versions (CASE, pre-aggregate and PIVOT).

I extended it to 9 million rows, 333,333 client_ids, ending up with 7.4 sec for CASE and pre-aggregate and 7.3 sec for PIVOT (I'd still call this a tie between the three...).

After that I reviewed the table provided by Chris. (http://www.sqlservercentral.com/Forums/Attachment3400.aspx)

I stumbled over the number of accounts (9 million) vs. the total number of rows (12 million).

Based on those numbers I think the pre-aggregation Jeff's example uses most probably won't reduce the number of rows to be processed significantly. It's still a great approach in a scenario where you'd have several rows with identical Acct_Debtor and Occurance; unfortunately, it doesn't look like the data in this scenario are distributed that way.
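A quick way to check that on the actual table - just a sketch, assuming the table and column names (including the Occurance spelling) used earlier in the thread - is to compare the total row count against the number of distinct (Acct_Debtor, Occurance) pairs:

-- If rows_per_pair comes back close to 1.0, pre-aggregation can't shrink the input much
SELECT SUM(cnt)                  AS total_rows,
       COUNT(*)                  AS distinct_pairs,
       SUM(cnt) * 1.0 / COUNT(*) AS rows_per_pair
FROM  (SELECT Acct_Debtor, Occurance, COUNT(*) AS cnt
       FROM   dbo.Post_File082_Landline_No
       GROUP BY Acct_Debtor, Occurance) d;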

I then repeated your scenario with 9 million rows and 3 million client_ids, and it took 12 sec for CASE and pre-aggregate and 11.8 sec for PIVOT - which is still 10 times faster than what Chris is getting.

I'm not sure how the data are distributed over Acct_Debtor and Occurance, but I assume we have one row per client_id with sequence_id = 1.

If that's the case, it may help to eliminate those 9 million rows from the pivot and just work with the remaining rows (the code can be included in your test scenario):

CREATE PROCEDURE #seq1Version AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;

    WITH seq1
    AS  (
        SELECT client_id AS client_id,
               telephone AS LandLineNumber1
        FROM   dbo.ClientContact
        WHERE  sequence_id = 1
        )
    ,   seqNot1
    AS  (
        SELECT client_id AS client_id,
               MAX(CASE WHEN sequence_id = 2 THEN telephone ELSE NULL END) AS LandLineNumber2,
               MAX(CASE WHEN sequence_id = 3 THEN telephone ELSE NULL END) AS LandLineNumber3
        FROM   dbo.ClientContact
        WHERE  sequence_id > 1
        GROUP BY client_id
        )
    SELECT seq1.client_id AS client_id,
           seq1.LandLineNumber1 AS LandLineNumber1,
           seqNot1.LandLineNumber2 AS LandLineNumber2,
           seqNot1.LandLineNumber3 AS LandLineNumber3
    INTO   #DumpTable  -- implied by the conditional drop above, to match the other test procedures
    FROM   seq1
    LEFT OUTER JOIN seqNot1 ON seq1.client_id = seqNot1.client_id;
END;

When I ran this with 1 million rows distributed as 900K client_ids with sequence_id = 1 and 100K client_ids with sequence_id = 2, I got 1.7 sec for the three versions you evaluated and 0.9 sec for the version above.

-- Generate 1 million rows: 900K rows with sequence_id = 1 and 100K rows with sequence_id = 2
;WITH Numbers (client_id)
AS  (
    SELECT TOP (900000)
           'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C2.[object_id]), 9)
    FROM   master.sys.columns C1, master.sys.columns C2
    )
INSERT dbo.ClientContact
       (client_id, sequence_id, telephone)
SELECT client_id,
       1,
       Telephone.tel
FROM   Numbers
CROSS
APPLY  (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);

;WITH Numbers (client_id)
AS  (
    SELECT TOP (100000)
           'A' + RIGHT(1000000000 + ROW_NUMBER() OVER (ORDER BY C3.[object_id]), 9)
    FROM   master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
    )
INSERT dbo.ClientContact
       (client_id, sequence_id, telephone)
SELECT client_id,
       2,
       Telephone.tel
FROM   Numbers
CROSS
APPLY  (SELECT RIGHT(10000000000 + CONVERT(INT, RAND(CHECKSUM(NEWID())) * 123456789), 10)) Telephone (tel);

    So, I think if it's possible to move a large number of rows from the pivot to a separate subquery, the performance should improve.

Edit: Note: the solution above will only be faster under a special data distribution (in this scenario: more than 65% of the rows can be moved to the subquery). It should not be considered a general solution.
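To tell whether a given table is on the right side of that break-even, a quick sketch against the test rig's table - just measure the share of rows the subquery would take out of the pivot:

-- Above roughly 65%, the Sub-Cross-Tab approach should start to win
SELECT 100.0 * SUM(CASE WHEN sequence_id = 1 THEN 1 ELSE 0 END) / COUNT(*) AS pct_movable
FROM   dbo.ClientContact;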



    Lutz

  • Hey Lutz,

    My results were the same as yours; and I agree with your assessment.

    Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods.

PIVOT is a bit of an odd duck: I don't find the syntax particularly intuitive and it struggles with multiple aggregations (an example follows below), but it can produce more compact code, and it produces the same basic plan as the CASE code (for basic tasks), leading to comparable performance in many instances.
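For what it's worth, the multiple-aggregations point is easy to demonstrate: each PIVOT operator accepts exactly one aggregate, while a cross-tab can mix several in a single pass over the table. A sketch against the test rig's table:

-- Two different aggregates per client in one scan - not expressible in a single PIVOT
SELECT client_id,
       MAX(CASE WHEN sequence_id = 1 THEN telephone END) AS LandLineNumber1,
       COUNT(*)                                          AS ContactRows
FROM   dbo.ClientContact
GROUP BY client_id;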

    When I get a free moment I might take another look at some of the more complex examples there are floating about on this site, to see if the 'pre-aggregation' can be incorporated in the PIVOT method too. I use the quotes there since, in the plans I have seen so far, the aggregation still follows the evaluation of the CASE statement for each row (in the Compute Scalar) despite the way the query is written.

    Paul

  • Paul White (6/27/2009)


    Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods.

I'll probably stick to using the Cross-Tab code (one method or the other depending on the data, but it will usually be the pre-agg method) for a couple of reasons...

    1. Since it's a bit faster in the other cases, I don't want to get into the habit of writing PIVOTs.

    2. Since it's a bit faster in the other cases, I don't want other people to read my code and use a PIVOT I may have written for one of those other cases.

    3. Admittedly a personal preference, but I prefer the readability of Cross-Tabs over that of PIVOTs even for the simple stuff.

    --Jeff Moden



  • Jeff Moden (6/27/2009)


    I'll probably stick to using the Cross-Tab code

Me too. I'm really quite anti-PIVOT myself (as I have said at least once earlier in this thread) but I do like to give it a fair go whenever I come across it... and also, I didn't want to not post my test rig results just because they didn't prove what I wanted 🙂

    I'm mostly against PIVOT because it is such a missed opportunity (in terms of functionality and performance), has a clumsy syntax (despite Books Online's protestations to the contrary!) and the SUM...CASE construction which we have all used for many years does the job better anyway.

As an example of my willingness to give PIVOT a 'fair go', take another look at my post in the discussion of your first article on this subject. That shows how to do a nested PIVOT (rather than a join) to achieve the same performance as the 'pre-aggregated' cross-tab. I don't make a big thing about it since the syntax is just so horrible.

Also, I have just noticed that 2008 breaks the code I posted there. As written, it produces:

Msg 8624, Level 16, State 21, Line x
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

    ...which is just fantastic :hehe:

    I think this is a bug - since the PIVOT consumes not just the column you would expect, but also any columns aliased to it. If you have a copy of 2008 lying around, the following illustrates this 'bug':

DECLARE @T TABLE (A INT NOT NULL, B INT NOT NULL);

INSERT @T (A, B)
SELECT 1, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 3;

WITH Base AS
    (SELECT A, A AS A2, B FROM @T)
SELECT A2, [1], [2], [3]
FROM Base
PIVOT (SUM(B) FOR A IN ([1],[2],[3])) AS pvt;

    Substituting CONVERT(INT, A) AS A2 for 'A AS A2' makes it work again :sigh:

In fact, any dumb no-op like 1 * A AS A2 also works.
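So, for anyone following along, the repro above runs on 2008 once the no-op is applied:

DECLARE @T TABLE (A INT NOT NULL, B INT NOT NULL);

INSERT @T (A, B)
SELECT 1, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 3;

WITH Base AS
    (SELECT A, 1 * A AS A2, B FROM @T)  -- the no-op stops PIVOT consuming the aliased column
SELECT A2, [1], [2], [3]
FROM Base
PIVOT (SUM(B) FOR A IN ([1],[2],[3])) AS pvt;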

    Optimizers eh?

To 'unbreak' my nested-PIVOT code for 2008, change '[Quarter] AS Q2' to '1 * [Quarter] AS Q2', or similar.

    Paul

  • Paul White (6/27/2009)


    Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...

    Paul

But how about the method I posted above (let's call it "Sub-Cross-Tab")? With the data distribution as shown, the three options you compared take twice as long.

This thread and Jeff's last post lead me to the conclusion that I'll stop playing around with PIVOT. It has just been proven that a database cannot be turned into Excel (where pivot works fine) 🙂

Edit1: One more reason to stop using it: PIVOT cannot be used for a dynamic number of columns. Dynamic Cross Tabs can.

    Edit2: Have to partially retract my first edit note since it's simply not true. I'm sorry. :blush:



    Lutz

  • lmu92 (6/28/2009)


    Paul White (6/27/2009)


    Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...

    Paul

    But how about the method I posted above (let's call it "Sub-Cross-Tab")?

    That's why I stressed 'in this case, and with this data distribution' 😛

    lmu92 (6/28/2009)


    With the data distribution as shown the three options you compared take twice as long.

    Well yes - but then we aren't comparing apples with apples anymore. Your method is specifically optimized for the data distribution you set up - and it is faster simply because it doesn't do any aggregation at all for the 900K rows with sequence_id = 1. The number of rows which do go through the stream aggregate (100K instead of 1M) drops the cost enough so that introducing the extra table scan and merge join still results in a cheaper overall plan.

    The PIVOT can be optimized the same way, and runs 50ms faster on my laptop:

CREATE PROCEDURE #PivotVersion2 AS
BEGIN
    SET NOCOUNT ON;
    IF OBJECT_ID(N'#DumpTable', N'U') IS NOT NULL DROP TABLE #DumpTable;

    WITH seq1
    AS  (
        SELECT client_id AS client_id,
               telephone AS LandLineNumber1
        FROM   dbo.ClientContact
        WHERE  sequence_id = 1
        )
    ,   seqNot1
    AS  (
        SELECT PVT.client_id,
               PVT.[1] AS [LandLineNumber1],
               PVT.[2] AS [LandLineNumber2],
               PVT.[3] AS [LandLineNumber3]
        FROM   (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact WHERE sequence_id > 1) CC
        PIVOT  (MAX(telephone) FOR sequence_id IN ([1],[2],[3])) PVT
        )
    SELECT seq1.client_id AS client_id,
           seq1.LandLineNumber1 AS LandLineNumber1,
           seqNot1.LandLineNumber2 AS LandLineNumber2,
           seqNot1.LandLineNumber3 AS LandLineNumber3
    INTO   #DumpTable
    FROM   seq1
    LEFT OUTER JOIN seqNot1
           ON seq1.client_id = seqNot1.client_id
    ORDER BY client_id
    OPTION (MAXDOP 1);
END;

    lmu92 (6/28/2009)


This thread and Jeff's last post lead me to the conclusion that I'll stop playing around with PIVOT. It has just been proven that a database cannot be turned into Excel (where pivot works fine) 🙂

Edit: One more reason to stop using it: PIVOT cannot be used for a dynamic number of columns. Dynamic Cross Tabs can.

    Jeff, could you let me have the number of your handrail-installation team please? 😀

    PIVOT can be used with dynamic SQL, just the same as the CASE...SUM idea.
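For completeness, a minimal sketch of what that looks like against the test rig's table - none of this code is from the thread, it's just an illustration: the IN list is built from the data at run time and the statement executed via sp_executesql.

DECLARE @cols NVARCHAR(MAX),
        @sql  NVARCHAR(MAX);

-- Build the IN list - e.g. [1],[2],[3] - from whatever sequence_ids actually exist
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(CONVERT(VARCHAR(10), sequence_id))
                      FROM  (SELECT DISTINCT sequence_id FROM dbo.ClientContact) d
                      ORDER BY sequence_id
                      FOR XML PATH('')), 1, 1, '');

SET @sql = N'SELECT client_id, ' + @cols + N'
             FROM  (SELECT client_id, sequence_id, telephone FROM dbo.ClientContact) CC
             PIVOT (MAX(telephone) FOR sequence_id IN (' + @cols + N')) PVT;';

EXECUTE sp_executesql @sql;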

    Paul

  • Paul White (6/28/2009)


    ... Well yes - but then we aren't comparing apples with apples anymore. Your method is specifically optimized for the data distribution you set up - and it is faster simply because it doesn't do any aggregation at all for the 900K rows with sequence_id = 1. The number of rows which do go through the stream aggregate (100K instead of 1M) drops the cost enough so that introducing the extra table scan and merge join still results in a cheaper overall plan.

You're definitely right: the solution I came up with is faster only under a special kind of data distribution. The break-even point is at around 65% of the rows (if you can eliminate more than 65% of your rows, you'll start being faster than the standard pivot methods). My interpretation of the data Chris provided in one of his posts gave me the idea of eliminating the largest number of rows before even starting the pivot.

So it is a solution tailored to the special request of the OP in this thread, not a general one. Sorry if I've left that impression.

    My goal was to give Chris another option to play with to get his special case resolved. 😉

    I'll edit my original post to clarify.



    Lutz

  • Paul White (6/28/2009)


    lmu92 (6/28/2009)


    Paul White (6/27/2009)


    Clearly, in this case, and with this data distribution, there is nothing to choose between the three methods. ...

    Paul

    But how about the method I posted above (let's call it "Sub-Cross-Tab")?

    That's why I stressed 'in this case, and with this data distribution' 😛

I might have been a little confused about what you were referring to as "this case, and with this data distribution".

    I thought you were talking about the scenario Chris described. But after reviewing your posts it becomes clear that you were talking about the test scenario you set up. Sorry for the misinterpretation and the confusion I caused. :Whistling:



    Lutz

  • lmu92 (6/28/2009)


You're definitely right: the solution I came up with is faster only under a special kind of data distribution. The break-even point is at around 65% of the rows (if you can eliminate more than 65% of your rows, you'll start being faster than the standard pivot methods). My interpretation of the data Chris provided in one of his posts gave me the idea of eliminating the largest number of rows before even starting the pivot.

So it is a solution tailored to the special request of the OP in this thread, not a general one. Sorry if I've left that impression.

    Hey Lutz,

    I make allowances for the NEAFLs :w00t: 😀 :w00t:

    It's all good. I just had to respond with the equivalent PIVOT code - for balance 😉

    Paul

What makes me wonder, though, is what Chris (the OP who started this thread in the first place) ended up using to tune his query... The subject he originally asked about drifted a little out of focus, I think.

    I hope we didn't scare him away...

    @chris-2: Are you still with us??

Edit: Btw: Congrats, Paul, on being "TenCenturies" now!! :Wow: Well deserved!



    Lutz
