CTE performance

  • BTW... the code will suck for performance because we tried to build in the code for the Tally table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This will run like greased lightning because we're NOT calculating the Tally table on the fly...

    WITH

    ctePreNumber AS

    ( --=== Add a row number to each "a" so we can repivot after the split

    SELECT ROW_NUMBER() OVER (ORDER BY a.a) AS RowNum,

    a.a AS StringToSplit

    FROM #a AS a

    )

    ,

    cteSplit AS

    ( --=== Obviously, this does the split

    SELECT pn.RowNum,

    SUBSTRING(pn.StringToSplit,N+1,CHARINDEX('.',pn.StringToSplit,N+1)-N-1) AS SplitData,

    ROW_NUMBER() OVER (PARTITION BY pn.StringToSplit ORDER BY N) AS Level

    FROM ctePreNumber AS pn

    CROSS JOIN dbo.Tally t

    -- ( --=== This makes for an inline Tally table

    -- SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N

    -- FROM Master.sys.All_Columns

    -- ) t

    WHERE t.N < LEN(pn.StringToSplit)

    AND SUBSTRING(pn.StringToSplit, t.N, 1) = '.'

    ) --=== This is a CrossTab to replace the Pivot. CrossTabs are faster than Pivots

    SELECT MAX(CASE WHEN Level = 1 THEN SplitData ELSE NULL END) AS Column1,

    MAX(CASE WHEN Level = 2 THEN SplitData ELSE NULL END) AS Column2,

    MAX(CASE WHEN Level = 3 THEN SplitData ELSE NULL END) AS Column3,

    MAX(CASE WHEN Level = 4 THEN SplitData ELSE NULL END) AS Column4,

    MAX(CASE WHEN Level = 5 THEN SplitData ELSE NULL END) AS Column5

    FROM cteSplit

    GROUP BY RowNum

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, thank you so much, it did work with #a I'll try with #Hie too. I'll keep you posted on it. Once I am done with the deployment I'll continue with this approach to make it generic enough for N levels.

    B/w, I'll say your 'Hi' to ' our friend ' πŸ˜‰ πŸ˜›

    Thanks

    ~ Kazim Raza

  • kazim.raza (5/9/2010)


    Thank you, thank you so much, it did work with #a I'll try with #Hie too. I'll keep you posted on it. Once I am done with the deployment I'll continue with this approach to make it generic enough for N levels.

    B/w, I'll say your 'Hi' to ' our friend ' πŸ˜‰ πŸ˜›

    Thanks

    ~ Kazim Raza

    Heh... you're welcome. Ask him/her if they like pork chops.

    Now, before you get carried away... there's a simple little bit of dynamic SQL that will carry this out "N" levels. Take a look at the following article... you may have to store the output of the CTE's (NOT the final select that does the pivot) in a temp table and then build the final select dynamically. You also see why I like CrossTabs a whole lot more than Pivot. Here's the article

    http://www.sqlservercentral.com/articles/Crosstab/65048/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh yeah... almost forgot... it would be the easiest to build a permanent Tally table for this one so we don't have to figure out how to keep the Table Spool that showed up away. With the Tally table, we have lines on the execution plan that are only 15 wide for the #a example. Using the code you previously used or even one of Itzik's CTE Tally "tables" will generate a Table Spool with all the rows generated by either just because of the nature of the join here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Um; I had dynamic SQL in my mind after a glimpse of this query; however I'll go through the article and get back to you on this. I'll go catch a nap real quick before I go to work in couple hours.

    Thanks again mate; really appreciate.

    ~ Kazim Raza

  • It ain't working on #HIE πŸ™ either its too much of data or what.. it keeps executing freezing my notebook :S

  • kazim.raza (5/9/2010)


    It ain't working on #HIE πŸ™ either its too much of data or what.. it keeps executing freezing my notebook :S

    Are you using a real Tally table like I suggested?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is what I have done; plugged the tally from the snippet for #a;

    ;WITH

    ctePreNumber AS

    ( --=== Add a row number to each "a" so we can repivot after the split

    SELECT ROW_NUMBER() OVER (ORDER BY a.FullyQualifiedName) AS RowNum,

    a.FullyQualifiedName AS StringToSplit

    FROM #Hie AS a

    where LEN(a.FullyQualifiedName) > 0

    )

    ,

    cteSplit AS

    ( --=== Obviously, this does the split

    SELECT pn.RowNum,

    SUBSTRING(pn.StringToSplit,N+1,CHARINDEX('.',pn.StringToSplit,N+1)-N-1) AS SplitData,

    ROW_NUMBER() OVER (PARTITION BY pn.StringToSplit ORDER BY N) AS Level

    FROM ctePreNumber AS pn,

    ( --=== This makes for an inline Tally table

    select top 1000 row_number() over (order by id) N

    from syscolumns

    ) t

    WHERE t.N < LEN(pn.StringToSplit)

    AND SUBSTRING(pn.StringToSplit, t.N, 1) = '.'

    ) --=== This is a CrossTab to replace the Pivot. CrossTabs are faster than Pivots

    SELECT MAX(CASE WHEN Level = 1 THEN SplitData ELSE NULL END) AS Column1,

    MAX(CASE WHEN Level = 2 THEN SplitData ELSE NULL END) AS Column2,

    MAX(CASE WHEN Level = 3 THEN SplitData ELSE NULL END) AS Column3,

    MAX(CASE WHEN Level = 4 THEN SplitData ELSE NULL END) AS Column4,

    MAX(CASE WHEN Level = 5 THEN SplitData ELSE NULL END) AS Column5

    FROM cteSplit

    GROUP BY RowNum

    ;

    another thing; If I add a RowNum by identity or row_number in #Hie itself it would eliminate the first CTE and I can pass it directly to the 2nd one; please suggest?

    Thanks,

    ~ Kazim Raza

  • Jeff Moden (5/8/2010)


    <snip>

    Fifth... do a SELECT/INTO to build the temp table instead of building it ahead of time. It will be much faster because it won't log each row.

    <snip>

    Is that really not logged? Or is it not logged only if the database is in in the Bulk-logged Recovery Model?

    I always thought it was ONLY if the DB was in Bulk-logged mode. Perhaps I was wrong or I'm thinking about an older version of SQL?

    On a personal note, I always considered that form a bit lazy. But, if it is not logged then it might be worth using. πŸ™‚

  • Lamprey13 (5/10/2010)


    Jeff Moden (5/8/2010)


    <snip>

    Fifth... [font="Arial Black"]do a SELECT/INTO to build the temp table [/font]instead of building it ahead of time. It will be much faster because it won't log each row.

    <snip>

    Is that really not logged? Or is it not logged only if the database is in in the Bulk-logged Recovery Model?

    I always thought it was ONLY if the DB was in Bulk-logged mode. Perhaps I was wrong or I'm thinking about an older version of SQL?

    On a personal note, I always considered that form a bit lazy. But, if it is not logged then it might be worth using. πŸ™‚

    TempDB is in the SIMPLE recovery mode. πŸ˜‰ And lazy is good... especially when it's as fast as it is.

    SELECT/INTO is also faster even in a FULL recovery mode database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • kazim.raza (5/10/2010)


    This is what I have done; plugged the tally from the snippet for #a;

    ;WITH

    ctePreNumber AS

    ( --=== Add a row number to each "a" so we can repivot after the split

    SELECT ROW_NUMBER() OVER (ORDER BY a.FullyQualifiedName) AS RowNum,

    a.FullyQualifiedName AS StringToSplit

    FROM #Hie AS a

    where LEN(a.FullyQualifiedName) > 0

    )

    ,

    cteSplit AS

    ( --=== Obviously, this does the split

    SELECT pn.RowNum,

    SUBSTRING(pn.StringToSplit,N+1,CHARINDEX('.',pn.StringToSplit,N+1)-N-1) AS SplitData,

    ROW_NUMBER() OVER (PARTITION BY pn.StringToSplit ORDER BY N) AS Level

    FROM ctePreNumber AS pn,

    ( --=== This makes for an inline Tally table

    select top 1000 row_number() over (order by id) N

    from syscolumns

    ) t

    WHERE t.N < LEN(pn.StringToSplit)

    AND SUBSTRING(pn.StringToSplit, t.N, 1) = '.'

    ) --=== This is a CrossTab to replace the Pivot. CrossTabs are faster than Pivots

    SELECT MAX(CASE WHEN Level = 1 THEN SplitData ELSE NULL END) AS Column1,

    MAX(CASE WHEN Level = 2 THEN SplitData ELSE NULL END) AS Column2,

    MAX(CASE WHEN Level = 3 THEN SplitData ELSE NULL END) AS Column3,

    MAX(CASE WHEN Level = 4 THEN SplitData ELSE NULL END) AS Column4,

    MAX(CASE WHEN Level = 5 THEN SplitData ELSE NULL END) AS Column5

    FROM cteSplit

    GROUP BY RowNum

    ;

    another thing; If I add a RowNum by identity or row_number in #Hie itself it would eliminate the first CTE and I can pass it directly to the 2nd one; please suggest?

    Thanks,

    ~ Kazim Raza

    You're using the "on-the-fly" method instead of the permanent Tally table I recommended with code previously. Look for the post where I start the post by saying "This will run like greased lightning because we're NOT calculating the Tally table on the fly..." and the other post where I say "BTW... the code will suck for performance because we tried to build in the code for the Tally table."

    And, you are correct. The row number in the #HEI table eliminates the need for building a separate row number.

    I'll take a closer look at this tonight after I get home from work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/10/2010)


    Lamprey13 (5/10/2010)


    Jeff Moden (5/8/2010)


    <snip>

    Fifth... [font="Arial Black"]do a SELECT/INTO to build the temp table [/font]instead of building it ahead of time. It will be much faster because it won't log each row.

    <snip>

    Is that really not logged? Or is it not logged only if the database is in in the Bulk-logged Recovery Model?

    I always thought it was ONLY if the DB was in Bulk-logged mode. Perhaps I was wrong or I'm thinking about an older version of SQL?

    On a personal note, I always considered that form a bit lazy. But, if it is not logged then it might be worth using. πŸ™‚

    TempDB is in the SIMPLE recovery mode. πŸ˜‰ And lazy is good... especially when it's as fast as it is.

    SELECT/INTO is also faster even in a FULL recovery mode database.

    That's why I asked because TempDB is in SIMPLE not Bulk-logged. And my recolection is that only in a bulk-logged recovery model does sql perform that operation non-logged.

    I'll have to do some investigation.. πŸ™‚ </end thread hijack>

    EDIT: I found a snippet on MSDN (wierd how much you can find when you search for it.. :-D)

    FROM MSDN:

    The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT… INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged.

  • Ah... sorry, Lamprey... I misunderstood what you were asking. I didn't realize that you didn't know that both SIMPLE and BULK LOGGED modes allow for minimal logging. I thought you missed the fact that we were doing the SELECT INTO into TempDB and not a FULL recovery database. My apologies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff; I added RowNum in #Hie itself and removed the first CTE, works well within just under 2 minutes, Thumbs up SIR; just one last thing, you did mention it earlier though... Duplicate OrgEntityIDs go crazy

    πŸ™

    Level1 Level2 Level3 Level4 Level5 EntityID

    HONDA NULL NULL HONDA 07HONDA 07F78985

    NULL HONDAHONDA 07NULL NULL 78985

    How can I fix this?

    ;with cteSplit AS

    ( --=== Obviously, this does the split

    SELECT pn.*,

    SUBSTRING(pn.StringToSplit,N+1,CHARINDEX('.',pn.StringToSplit,N+1)-N-1) AS SplitData,

    ROW_NUMBER() OVER (PARTITION BY pn.StringToSplit ORDER BY N) AS Level

    FROM (select #Hie.EntityName, #Hie.FullyQualifiedName, #Hie.FullyQualifiedLevel, #Hie.Level OrgLevle,

    #Hie.OrgEntityID, #Hie.OrgEntityType, #Hie.RowNum, FullyQualifiedName AS StringToSplit from #Hie /*where RowNum < 101*/) AS pn

    CROSS JOIN tempdb.dbo.Tally t

    WHERE t.N < LEN(pn.StringToSplit)

    AND SUBSTRING(pn.StringToSplit, t.N, 1) = '.'

    ) --=== This is a CrossTab to replace the Pivot. CrossTabs are faster than Pivots

    SELECT MAX(CASE WHEN Level = 1 THEN SplitData ELSE NULL END) AS Level1,

    MAX(CASE WHEN Level = 2 THEN SplitData ELSE NULL END) AS Level2,

    MAX(CASE WHEN Level = 3 THEN SplitData ELSE NULL END) AS Level3,

    MAX(CASE WHEN Level = 4 THEN SplitData ELSE NULL END) AS Level4,

    MAX(CASE WHEN Level = 5 THEN SplitData ELSE NULL END) AS Level5

    ,OrgEntityID--, FullyQualifiedName, OrgEntityType

    FROM cteSplit

    GROUP BY RowNum, OrgEntityID--, FullyQualifiedName, OrgEntityID, OrgEntityType

    order by OrgEntityID

    ;

    thanks again... I can't thank you enough for this!

    ~Kazim Raza

Viewing 15 posts - 31 through 45 (of 64 total)

You must be logged in to reply to this topic. Login to reply