Performance issue with tally solution

  • Hi Bob,

    I did a few more tests (beyond the earlier table) and on my systems, the double-tally takes less than half the CPU time of the tally+charindex on long runs (e.g. 146 seconds vs. 321 seconds for 600,000 input rows). So your method scales better than the tally+charindex.

    The break-even point on my system is about 25,000 input records.

    Interestingly, I found that below about 10,000 records the double-tally was roughly linear in CPU time per row at about 0.9 ms/row, and above that it was linear with a slope of about 0.3 ms/row. Tally+charindex was roughly linear at 0.5 ms/row all the way.

    Looking at the actual execution plans, it seems that in the high range SQL Server adds an extra sort of the input and a table spool for the double-tally; apparently this speeds it up! Tally+charindex seems to use the same execution plan throughout.

    I assume Flo's system is using the second execution plan at lower input sizes.

    Definitely an 'It Depends'. 🙂

    I've attached the execution plans for 8000 and 30000 rows input.

    Derek

  • Bob Hovious (4/23/2009)


    Thanks, Flo 🙂

    Derek, please don't feel you have to apologize. That needed to be tested. As usual, the answer is "It depends." The point is to understand what works best in which situations, and what the limitations are. That's why I warned everyone that the double-barreled solution was inflexible because it counted on a single character being the delimiter. That said, I see a lot of work with long strings consisting of lots of short elements separated by single character delimiters. Very seldom do I have to parse Moby Dick 😉

    I think calling the double-barreled solution 'inflexible' is probably harsh. It may just need a little more work to get it to handle multi-character delimiters.

    Of course, my tests show that, on my system and probably everyone else's, you need to match the method to the expected input.

    If I needed to parse 5,000-record inputs, I should use the tally+charindex method. If I needed to parse 500,000 records, I should use the double-tally!

    I must find time to test the CLR variants...

    Derek

  • Derek Dongray (4/23/2009)


    Hi Bob,

    I did a few more tests (beyond the earlier table) and on my systems, the double-tally takes less than half the CPU time of the tally+charindex on long runs (e.g. 146 seconds vs. 321 seconds for 600,000 input rows). So your method scales better than the tally+charindex.

    Not exactly what my findings were. Like you said, it definitely depends on the data. It's slow going for me because I've got so much other stuff going on, but I'll eventually get to publishing what I've found in the testing I've done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider (a rough sketch of a generator for such data follows the list):

    1. The number of rows in the table

    2. The average size of the data in the column to be parsed

    3. The average number of delimited strings to be parsed per row

    4. The average size of the strings being parsed (really the inverse of #3 above).
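
    To make that concrete, a minimal generator sketch might look like the following; dbo.SplitTest, its column names, and the hard-coded values are purely illustrative placeholders, not the actual scripts used in this thread.

    -- Minimal sketch of a test-data generator (placeholder names, not the thread's real scripts).
    -- The knobs map to points 1, 3 and 4 above; point 2 (column size) follows from 3 x 4.
    DECLARE @Rows INT, @ElementsPerRow INT, @ElementSize INT;
    SELECT @Rows = 10000,          -- 1. number of rows in the table
           @ElementsPerRow = 20,   -- 3. delimited strings per row
           @ElementSize = 8;       -- 4. size of each element

    IF OBJECT_ID('dbo.SplitTest', 'U') IS NOT NULL
        DROP TABLE dbo.SplitTest;

    SELECT TOP (@Rows)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum,
           -- builds "xxxxxxxx,xxxxxxxx,...": @ElementsPerRow elements of @ElementSize characters
           STUFF(REPLICATE(CAST(',' + REPLICATE('x', @ElementSize) AS VARCHAR(MAX)),
                           @ElementsPerRow), 1, 1, '') AS CSV
    INTO   dbo.SplitTest
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2;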

    I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

    I'd look at setting up a test DB here at work on our development server if everything could be put into a consolidated zip file. The server I have is a dual quad-core x64 Intel system with 8 GB of RAM.

    I'm just having problems finding everything on this thread to put it all together.

  • Bob Hovious (4/23/2009)


    With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:

    1. The number of rows in the table

    2. The average size of the data in the column to be parsed

    3. The average number of delimited strings to be parsed per row

    4. The average size of the strings being parsed (really the inverse of #3 above).

    I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?

    I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.

    This is the first example I've found where SQL Server changes the plan based simply on the number of rows input.

    Derek

  • Derek Dongray (4/24/2009)


    Bob Hovious (4/23/2009)


    With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:

    1. The number of rows in the table

    2. The average size of the data in the column to be parsed

    3. The average number of delimited strings to be parsed per row

    4. The average size of the strings being parsed (really the inverse of #3 above).

    I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?

    I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.

    This is the first example I've found where SQL Server changes the plan based simply on the number of rows input.

    I have seen one other situation where SQL Server changes the plan drastically - to the extent that a query can go from taking 10-15 minutes to complete down to seconds....

    All by the addition of a TOP clause, but not just any old TOP - oh no!

    I do quite a bit of work on a system for a customer where there are millions of records in various (fairly) normalised tables, so you are frequently joining two or more tables, each containing many millions of rows.

    It has happened on many occasions that a query takes minutes rather than seconds to complete.

    In these cases, you can normally increase the speed of the query by adding a TOP clause. However, you will not see the speed increase for just any TOP clause, and it is not something I have ever managed to work out a calculation for, but there will be a "magic" number which will effect this change.

    An example might be that you have a query that returns only three rows, but the following speeds it up dramatically:

    SELECT TOP 10125 FROM blah blah blah

    while this has no effect on speed at all:

    SELECT TOP 10124 FROM blah blah blah

    Now that is a discussion topic that could run and run!

    Anyone fancy trying it on the string splitting?
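
    As a starting point, something along these lines could be tried - completely untested, and assuming a dbo.Tally numbers table plus a dbo.SplitTest(RowNum, CSV) test table; the TOP value is an arbitrary guess to experiment with, not a calculated "magic" number.

    -- Untested sketch: a classic tally+charindex split wrapped in a generous TOP,
    -- purely to see whether the row goal changes the plan.  Assumes dbo.Tally has
    -- enough numbers to cover LEN(CSV) + 2.
    SELECT TOP (2000000)   -- experiment with this value; it must exceed the real row count
           s.RowNum,
           SUBSTRING(',' + s.CSV + ',', t.N + 1,
                     CHARINDEX(',', ',' + s.CSV + ',', t.N + 1) - t.N - 1) AS Item
    FROM   dbo.SplitTest s
    JOIN   dbo.Tally t
      ON   t.N < LEN(',' + s.CSV + ',')
     AND   SUBSTRING(',' + s.CSV + ',', t.N, 1) = ',';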

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Hi

    New information from the splitter front... 🙂

    First, I have to apologize for all those "great" CLR test results... Since I wanted to reduce the test to the real split work, I used a COUNT(*) instead of an INSERT INTO. This was okay to get the line count and ensure that the splitting was done, but because the table-valued CLR split functions work in two steps (first the real splitting, second the transfer of the data back to the SQL Server engine), the test results were not correct. As I already posted somewhere back in the medieval times of this thread, it seems that SQL Server uses reflection to call the methods for data access (lousy implementation!). The performance goes down the toilet as soon as I actually insert the data into a table variable. There are still some business cases where the CLR solutions are faster, but not by very much... As long as this is the case and the lack of MARS support isn't fixed, CLR is better reserved for functions which handle more complex string manipulations.
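
    To make the difference concrete, the two measurement patterns look roughly like this; dbo.clrSplit and its item column are just placeholders for whichever table-valued CLR splitter is being timed.

    -- Pattern 1: what I measured before.  COUNT(*) only counts the split items,
    -- so the cost of streaming every item back into the engine is largely hidden.
    SELECT COUNT(*)
    FROM dbo.JBMTest s
    CROSS APPLY dbo.clrSplit(s.CSV, ',') split;

    -- Pattern 2: materialize the items, as a real caller would.  This includes the
    -- per-row transfer from the CLR function back to SQL Server.
    DECLARE @result TABLE (rowNum INT, item VARCHAR(8000));

    INSERT INTO @result (rowNum, item)
    SELECT s.rowNum, split.item
    FROM dbo.JBMTest s
    CROSS APPLY dbo.clrSplit(s.CSV, ',') split;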

    @Lynn

    There are currently two different test systems. The first is the complete test system for all functions posted in this thread. The second is focused on the new tally solutions. I attached two files:

    "TallyTests.txt"

    This is a stand-alone SQL script for the tally tests.

    "AllPerformanceTests.zip"

    This is the complete performance test environment. It contains the following components:

    • DDL.sql" which creates all TSQL functions and a table called PerfMon and two procedures I use for my performance tests.
    • "ClrProject.zip" which is the Visual Studio project containing all CLR functions. It is a VS2008 C# project but you can also copy all functions into a VS2005 project.
    • "PerformanceTests.sql" is the script which can be used to execute all functions. It also creates the source test tables if not exist.

    If something is missing let me know 😉

    @Bob, Jeff and the rest of the tally masters

    This is getting a bit esoteric, but as Bob said, "if one tally is fast, two are faster"... If my UNION ALL dual-carburetor tally is fast and Bob's double-barreled tally solution is faster, a double-barreled dual-carburetor tally is even faster :-D. The source becomes quite large just to split strings, but it is 10% faster and it handles the first and last element:

    PRINT '----=========================================================='
    PRINT '---- Double-barreled dual-carburetor Tally'

    DECLARE @delimiter CHAR(1)
    SELECT @delimiter = ','

    --SET STATISTICS IO ON
    SET STATISTICS TIME ON;

    SELECT rowNum, item
    INTO result
    FROM dbo.JBMTest s
    CROSS APPLY (
        SELECT
            SUBSTRING(s.CSV, 1, Z) item
        FROM (SELECT s.CSV) s2
        CROSS APPLY (
            SELECT TOP(1)
                ISNULL(NULLIF(t1.N, LEN(s.CSV) + 1), LEN(s.CSV)) AS Z
            FROM dbo.Tally t1
            WHERE
                SUBSTRING(s.CSV,t1.N,1) = @delimiter
                OR (t1.N = LEN(s.CSV) + 1)
            ORDER BY t1.N
        ) ca2
        UNION ALL
        SELECT
            SUBSTRING(s.CSV, t2.O, Z) item
        FROM dbo.Tally2 t2
        CROSS APPLY (
            SELECT TOP(1)
                ISNULL(NULLIF(t1.N - t2.O, LEN(s.CSV) + 1 - t2.O), LEN(s.CSV)) AS Z
            FROM dbo.Tally t1
            WHERE t1.N > t2.N
                AND (SUBSTRING(s.CSV,t1.N,1) = @delimiter
                    OR (t1.N = LEN(s.CSV) + 1))
            ORDER BY t1.N
        ) ca2
        WHERE t2.N <= LEN(s.CSV)
            AND SUBSTRING(s.CSV,t2.N,1) = @delimiter
    ) l

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    GO

    Test results in combination with the other tally solutions:

    ----==========================================================
    ---- UNION ALL solution
    SQL Server Execution Times:
       CPU time = 749 ms, elapsed time = 827 ms.
    (801000 row(s) affected)

    ----==========================================================
    ---- Traditional Tally solution
    SQL Server Execution Times:
       CPU time = 2434 ms, elapsed time = 2326 ms.
    (799000 row(s) affected)

    ----==========================================================
    ---- Bob Hovious double-barreled Tally
    SQL Server Execution Times:
       CPU time = 640 ms, elapsed time = 701 ms.
    (799000 row(s) affected)

    ----==========================================================
    ---- Double-barreled dual-carburetor Tally
    SQL Server Execution Times:
       CPU time = 577 ms, elapsed time = 564 ms.
    (801000 row(s) affected)

    The Article

    Sorry that the article is still pending, but it is quite hard to write an article while all the test results keep changing... I didn't forget, I just have to finish all the tests.

    Greets

    Flo

  • Hi mister.magoo

    mister.magoo (4/24/2009)


    In these cases, you can normally increase the speed of the query by adding a TOP clause. However, you will not see the speed increase for just any TOP clause, and it is not something I have ever managed to work out a calculation for, but there will be a "magic" number which will effect this change.

    Thanks for your feedback! You are absolutely correct. There are many criteria SQL Server uses to decide these "magic" things. I even ran into many such cases in my tests for this thread!

    Anyone fancy trying it on the string splitting?

    I already saw these effects with my UNION ALL tally solution.

    Greets

    Flo

  • Florian Reischl (4/25/2009)



    Awesome work Flo. Thank you for taking the time to put everything together like this as well. I'll look at setting up a test DB here at home and on my dev server at work. The string splitting is definitely something I am interested in, as we are working on a project that requires it, and the developer has so far implemented a tally version based on the previous work done on SSC prior to this thread. If we can find a more performant solution, that is a bonus.

  • Hey Flo!

    These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.

    The QO can tell from the string statistics on the CSV column that there is only one distinct row, so it uses a lazy table spool to hold the split results from the first row, in the expectation that all following rows will be the same. If they are the same, the entire result set is served from the results from the first row (the table spool). If not, additional splitting is done.

    In the case where all rows are unique, the QO dispenses with the table spool optimization, and the split takes ~20s on my laptop, up from ~1s.

    To see the difference, may I suggest you re-run a test with CSV = REPLACE(@CSV, 'Part-', 10000 + ROW_NUMBER() OVER (ORDER BY (SELECT 1))) instead of CSV = @CSV in the script which creates dbo.JBMTest?
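
    In context, that change might look something like this - only a sketch, where the surrounding SELECT ... INTO stands in for whatever the real dbo.JBMTest creation script does, and the explicit CAST is just there to make the string conversion obvious.

    -- Sketch: make every CSV value distinct so the optimizer cannot satisfy the whole
    -- split from one lazily-spooled row.  @CSV and the row source are placeholders.
    IF OBJECT_ID('dbo.JBMTest', 'U') IS NOT NULL
        DROP TABLE dbo.JBMTest;

    DECLARE @CSV VARCHAR(8000);
    SELECT @CSV = 'Part-001,Part-002,Part-003,Part-004,Part-005';   -- example template only

    SELECT TOP (1000)
           ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rowNum,
           -- was: CSV = @CSV   (every row identical)
           CSV = REPLACE(@CSV, 'Part-',
                         CAST(10000 + ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS VARCHAR(10)))
    INTO   dbo.JBMTest
    FROM   sys.all_columns;

    With every CSV value distinct, the spool can no longer be reused, so the timings reflect splitting each row individually.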

    The previous routines (including the CLR routines) split every row individually.

    Naturally, if the requirement is to split the same string many, many times, one should of course use the tally routines :laugh:

    I am sorry it took me so long to notice this.

    Cheers,

    Paul

    edited for typo - and I previewed and everything... :sigh:

    First, I have to apologize for all those "great" CLR test results... Since I wanted to reduce the test to the real split work, I used a COUNT(*) instead of an INSERT INTO. This was okay to get the line count and ensure that the splitting was done, but because the table-valued CLR split functions work in two steps (first the real splitting, second the transfer of the data back to the SQL Server engine), the test results were not correct.

    I agree that it makes a difference to the CLR results, but it is not a huge difference. It certainly seems to put the while loop even more in the lead, though I have yet to test the double-barreled tally technique. Sorry that I haven't published my test harness, but I keep refining and changing it! Also, it is getting a bit long!

    Best wishes,
    Phil Factor

  • Phil Factor (4/25/2009)


    It certainly seems to put the while loop even more in the lead...

    Which one, Phil? :unsure:

      [dbo].[fnParseList]

      [dbo].[ufn_SplitLines_Cursor]

      [dbo].[WhileSplit]

    All seem to feature a while loop - those are from Flo's DDL.sql file.

    Paul

  • Hey Paul!

    Paul White (4/25/2009)


    Hey Flo!

    These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.

    The QO can tell from the string statistics on the CSV column that there is only one distinct row, so it uses a lazy table spool to hold the split results from the first row, in the expectation that all following rows will be the same. If they are the same, the entire result set is served from the results from the first row (the table spool). If not, additional splitting is done.

    In the case where all rows are unique, the QO dispenses with the table spool optimization, and the split takes ~20s on my laptop, up from ~1s.

    To see the difference, may I suggest you re-run a test with CSV = REPLACE(@CSV, 'Part-', 10000 + ROW_NUMBER() OVER (ORDER BY (SELECT 1))) instead of CSV = @CSV in the script which creates dbo.JBMTest?

    The previous routines (including the CLR routines) split every row individually.

    Naturally, if the requirement is to split the same string many, many times, one should of course use the tally routines :laugh:

    *ARGH!*

    This starts to make me crazy... :crazy:

    Thank you for this very important information! New test results are awful:

    ----==========================================================
    ---- UNION ALL solution
    (0 row(s) affected)
    SQL Server Execution Times:
       CPU time = 6318 ms, elapsed time = 6255 ms.
    (801000 row(s) affected)

    ----==========================================================
    ---- Traditional Tally solution
    SQL Server Execution Times:
       CPU time = 5554 ms, elapsed time = 5506 ms.
    (799000 row(s) affected)

    ----==========================================================
    ---- Bob Hovious double-barreled Tally
    SQL Server Execution Times:
       CPU time = 17222 ms, elapsed time = 16751 ms.
    (799000 row(s) affected)

    ----==========================================================
    ---- Double-barreled dual-carburetor Tally
    SQL Server Execution Times:
       CPU time = 17269 ms, elapsed time = 16855 ms.
    (801000 row(s) affected)

    I really never thought that string splitting could become a problem like this...

    Thanks again!

    Flo

  • Hi Phil

    Phil Factor (4/25/2009)


    I agree that it makes a difference to the CLR results, but it is not a huge difference.

    Sorry, but I disagree. I just tried some of the functions with and without data access:

    Without data access

    Sorting   Description                     Duration(ms)  CpuTime  LogicalReads  PhysicalReads
    --------  ------------------------------  ------------  -------  ------------  -------------
    Large     CLR tvf (Paul White) solution             93       94          7322              0
    Large     CLR RegEx solution                       130      140          7322              0
    Large     CLR Chars solution                       346      359          7322              0
    Large     CLR XML Solution                         853      702          4276             58
    Many      CLR tvf (Paul White) solution            200      234          6929              0
    Many      CLR RegEx solution                       236      234          6929              0
    Many      CLR XML Solution                        1056      577          5213             65
    Many      CLR Chars solution                      1173     1248          6929              0
    MobyDick  CLR Chars solution                       190      124          7534             10
    Monster   CLR Chars solution                        33       16          2464              0
    Monster   CLR tvf (Paul White) solution            163       31          2464             35
    Short     CLR tvf (Paul White) solution             33       31           180              0
    Short     CLR RegEx solution                        36       32           180              0
    Short     CLR Chars solution                       126      171           180              0
    Short     CLR XML Solution                         206      141           188              8

    With data access

    Sorting   Description                     Duration(ms)  CpuTime  LogicalReads  PhysicalReads
    --------  ------------------------------  ------------  -------  ------------  -------------
    Large     CLR RegEx solution                      1096     1139        399756              0
    Large     CLR tvf (Paul White) solution           1106     1045        399728              0
    Large     CLR XML Solution                        1146     1186         91872              0
    Large     CLR Chars solution                      1330     1357        401046              0
    Many      CLR RegEx solution                      3266     3292       1045886              0
    Many      CLR tvf (Paul White) solution           3276     3307       1044632              0
    Many      CLR XML Solution                        3443     3448        296491              0
    Many      CLR Chars solution                      4083     4057       1044563              0
    MobyDick  CLR Chars solution                       456      499        160488              0
    Monster   CLR Chars solution                       170      156         66444              0
    Monster   CLR tvf (Paul White) solution            170      156         67130              0
    Short     CLR tvf (Paul White) solution            390      405        138450              0
    Short     CLR RegEx solution                       396      405        138672              0
    Short     CLR XML Solution                         440      436         32935              0
    Short     CLR Chars solution                       473      468        138923              0

    I tried the same thing directly in a C# application for fun. It is even faster to connect to the server, pull all the data to the client, and split it all there...

    Greets

    Flo
