Performance issue with tally solution

  • Paul White (5/12/2009)


    Do you get parallelism with Jeff's solution? I would think that would help a lot.

    T-SQL UDFs always generate a serial plan.

    I've run into that before, and had to create a temporary table with the results of the UDF to get things to run fast enough. (Even a CTE against the UDF wasn't enough.)

    I'll get back to you later on Jeff's solution after I have had a chance to do the testing.

  • UMG, did you try the cursor?

    Greets

    Flo

  • >> Do you get parallelism with Jeff's solution? I would think that would help a lot.

    >> T-SQL UDFs always generate a serial plan.

    Is that for SELECTs or just for modification queries?? Got any reference for that statement please?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • UMG Developer (5/12/2009)


    Jeff,

    Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)

    Not sure what you mean. Does that mean you tried it and it worked? Also, yes... please post your code especially if you used the ".VALUE trick" because I didn't have to use it. Good thing, too, because I don't actually know what the heck you're talking about. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David Burrows (5/12/2009)


    Nice one Jeff 😎

    Thanks, David. I didn't understand the problem until I saw the test data (funny how important that is :-D). Thought we could do it with Bulk Insert based on the verbal description but the test data flash burned that idea.

    They could use the code in the CTE to build a Temp Table and index it specifically for "order" of the split but it should be fairly fast as a CTE, as well. Uses Index Seeks and all. Pick your favorite splitter for use below that.

    If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • UMG Developer (5/12/2009)


    Jeff Moden (5/11/2009)


    Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.

    Jeff,

    Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)

    Also... that does seem to be a bit faster than stopping it after 2 hours and some. Why not just use the rollup code I used in my solution and pick whatever splitter you want (Flo's "Cursor" solution is nasty fast) instead of fighting with the machine on the other solution where you have to move and delete things?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (5/12/2009)


    >> T-SQL UDFs always generate a serial plan.

    Is that for SELECTs or just for modification queries?? Got any reference for that statement please?

    All T-SQL UDFs under all circumstances 🙂

    Sure. See page 35 in the excellent: http://blogs.msdn.com/craigfr/archive/2007/04/17/parallel-query-execution-presentation.aspx

    The first comment from Craig reflects the change he made to the presentation following my feedback to him.

  • Jeff Moden (5/12/2009)


    If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:

    I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉

    Embed a non-printable character to blow this one up.

  • Jeff Moden (5/12/2009)If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you.

    When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Florian Reischl (5/12/2009)


    Here is the cursor I actually used in my tests. I compared it with all others but it seems to be the fastest.

    No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) and I killed it after 4 hours with this code:

    [font="Courier New"]DECLARE @Delimiter CHAR(1);

    SELECT @Delimiter = '|';

    SELECT c.Tar_ID,

        ca.*

    INTO CommentLinesFromUDF

    FROM Comments c

       CROSS APPLY dbo.ufn_SplitString_Cursor_VM8(c.Comment, @Delimiter) ca;

    [/font]

    Did I do something wrong in calling it? Is there maybe a different way I should call the function?

    (I did modified the function slightly by adding an identity column to the output table so that I got the line number, but that shouldn't have made it take that long should it have?)

  • Paul White (5/12/2009)


    I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness.

    Embed a non-printable character to blow this one up.

    RBarryYoung (5/12/2009)


    When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.

    Ooooh... THAT problem. Didn't know what it was actually called and didn't know .VALUE would fix it (with the overhead mentioned). Thanks guys.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/11/2009)


    Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.

    No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) for just under 21 minutes to create the 9,177,458 rows. (Not counting the FOR XML combining since I created a table of the comments all rolled together for use in multiple tests.)

    The .VALUE trick is to deal with special characters like &, in the source text, here is the roll up code I ended up using:

    [font="Courier New"]DECLARE @Delimiter CHAR(1);

    SELECT @Delimiter = '|';

    SELECT

       t1.CommentID,

       (SELECT

           CASE CommentCont

               WHEN 'N' THEN @Delimiter

               ELSE ''

           END + CAST(t2.CommentPart AS VARCHAR(MAX))

       FROM dbo.Comments t2

       WHERE t2.CommentID = t1.CommentID   --- must match GROUP BY below

       ORDER BY t2.CommentPartNum

           FOR XML PATH(''), TYPE).value('.[1]','varchar(max)') + @Delimiter AS Comment

    INTO dbo.cteComments

    FROM dbo.Comments t1

    GROUP BY t1.CommentID;

    [/font]

    I re-did your example a little bit to get the parallelism to kick in and it ran in just over 14 minutes to create the 9,177,458 rows:

    [font="Courier New"]DECLARE @Delimiter CHAR(1);

    SELECT @Delimiter = '|';

    SELECT

       TAR_ID,

       ROW_NUMBER() OVER (PARTITION BY TAR_ID ORDER BY t.n) AS Line,

       SUBSTRING(s.Comment, t.N +1, CHARINDEX(@delimiter, s.Comment, t.N +1) -t.N -1) AS CommentLine

    INTO dbo.CommentLines3

    FROM dbo.cteComments s

       CROSS JOIN dbo.Tally t

    WHERE t.N < LEN(s.Comment)

       AND SUBSTRING(s.Comment, t.N, 1) = @delimiter;

    [/font]

    So I think in my case we have a winner and it is the Tally table, unless someone has something else I can try. (I know I don't have CLR on the server, so I can't try that solution with my large data set.)

  • UMG,

    So is the parallel tally fast enough for you? Seems fairly reasonable to me.

    Try: exec sp_configure 'clr_enabled', 1; reconfigure;

    ...some day 🙂

    It's that hard.

    Paul

  • Paul White (5/12/2009)


    Jeff Moden (5/12/2009)


    If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:

    I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉

    Embed a non-printable character to blow this one up.

    Paul,

    Non-printable characters blow up which method up? They appear to work just fine with the method I used. (For CR, LF, and TAB anyhow.)

  • Paul White (5/12/2009)


    UMG,

    So is the parallel tally fast enough for you? Seems fairly reasonable to me.

    Try: exec sp_configure 'clr_enabled', 1; reconfigure;

    ...some day 🙂

    It's that hard.

    Yes, I think the tally solution is really fast... But nothing can ever be too fast can it?

    I never said that technically it was hard to turn CLR on, but I don't get to just turn things on...

Viewing 15 posts - 451 through 465 (of 522 total)

You must be logged in to reply to this topic. Login to reply