April 23, 2009 at 8:21 am
Hi Bob,
I did a few more tests (beyond the earlier table) and on my systems, the double-tally achieves better than half the CPU time on long runs (e.g. 146 seconds vs. 321 seconds for 600,000 input rows). So your method scales better than the tally+charindex.
The breakpoint on my system is about 25,000 input records.
Interestingly, I found that below about 10,000 records the double-tally was roughly linear in CPU time per row at about 0.9 ms/row, and above that it was linear with a slope of about 0.3 ms/row. Tally+charindex was roughly linear at 0.5 ms/row all the way.
Looking at the actual execution plans, it seems that in the high range SQL Server adds an extra sort of the input and a table spool for the double-tally; apparently this speeds it up! Tally+charindex seems to use the same execution plan at all times.
I assume Flo's system is using the second execution plan at lower input sizes.
Definitely an 'It Depends'. 🙂
I've attached the execution plans for 8000 and 30000 rows input.
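Something along these lines is enough to reproduce the comparison (an untested sketch; dbo.Split8KTally is just a placeholder for whichever splitter is being measured, and dbo.JBMTest is the thread's test table):
SET STATISTICS TIME ON;

-- The 8,000-row case
SELECT s.rowNum, split.Item
FROM (SELECT TOP (8000) rowNum, CSV FROM dbo.JBMTest ORDER BY rowNum) s
CROSS APPLY dbo.Split8KTally(s.CSV, ',') split
OPTION (RECOMPILE);  -- force a fresh plan for this row count

-- The 30,000-row case
SELECT s.rowNum, split.Item
FROM (SELECT TOP (30000) rowNum, CSV FROM dbo.JBMTest ORDER BY rowNum) s
CROSS APPLY dbo.Split8KTally(s.CSV, ',') split
OPTION (RECOMPILE);

SET STATISTICS TIME OFF;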
Derek
April 23, 2009 at 8:31 am
Bob Hovious (4/23/2009)
Thanks, Flo 🙂
Derek, please don't feel you have to apologize. That needed to be tested. As usual, the answer is "It depends." The point is to understand what works best in which situations, and what the limitations are. That's why I warned everyone that the double-barreled solution was inflexible because it counted on a single character being the delimiter. That said, I see a lot of work with long strings consisting of lots of short elements separated by single-character delimiters. Very seldom do I have to parse Moby Dick 😉
I think calling the double-barreled solution 'inflexible' is probably harsh. It may just need a little more work to get it to handle multi-character delimiters.
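Something like the following is the sort of extra work I mean: an untested sketch only, assuming the usual dbo.Tally table of integers N starting at 1 that covers the longest string, a made-up function name, and a delimiter whose occurrences don't overlap.
CREATE FUNCTION dbo.SplitMultiChar (@Input VARCHAR(8000), @Delim VARCHAR(10))
RETURNS TABLE
AS
RETURN
WITH Starts AS (
    -- An element starts at position 1 and just past every delimiter occurrence.
    SELECT StartPos = 1
    UNION ALL
    SELECT t.N + DATALENGTH(@Delim)
    FROM dbo.Tally t
    WHERE t.N <= DATALENGTH(@Input)
      AND SUBSTRING(@Input, t.N, DATALENGTH(@Delim)) = @Delim
)
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY s.StartPos),
       Item = SUBSTRING(@Input, s.StartPos,
                        ISNULL(NULLIF(CHARINDEX(@Delim, @Input, s.StartPos), 0),
                               DATALENGTH(@Input) + 1) - s.StartPos)
FROM Starts s;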
Of course, my tests show that, on my system and probably everyone else's, you need to match the method to the expected input.
If I needed to parse 5,000-record inputs, I should use the tally+charindex method. If I needed to parse 500,000 records, I should use the double-tally!
I must find time to test the CLR variants...
Derek
April 23, 2009 at 8:56 am
Derek Dongray (4/23/2009)
Hi Bob, I did a few more tests (beyond the earlier table) and on my systems, the double-tally achieves better than half the CPU time on long runs (e.g. 146 seconds vs. 321 seconds for 600,000 input rows). So your method scales better than the tally+charindex.
Not exactly what my findings were. Like you said, it definitely depends on the data. It's slow going for me because I've got so much other stuff going on, but I'll eventually get to publishing what I've found in the testing I've done.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2009 at 9:06 am
With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:
1. The number of rows in the table
2. The average size of the data in the column to be parsed
3. The average number of delimited strings to be parsed per row
4. The average size of the strings being parsed (really the inverse of #3 above).
I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?
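For example, a quick way to generate data where each of those variables can be dialled independently might look like this (just a sketch; the table name dbo.SplitTestData and the sizes are placeholders, not anyone's actual harness):
-- Generate @Rows rows of CSV data, each with @ElementsPerRow comma-delimited
-- elements of @ElementLen characters, so row count, column size, elements per
-- row and element size can all be varied independently.
DECLARE @Rows           INT,
        @ElementsPerRow INT,
        @ElementLen     INT;
SELECT  @Rows = 10000, @ElementsPerRow = 10, @ElementLen = 8;

IF OBJECT_ID('dbo.SplitTestData') IS NOT NULL DROP TABLE dbo.SplitTestData;

WITH Numbers AS (
    SELECT TOP (@Rows) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master.sys.all_columns ac1
    CROSS JOIN master.sys.all_columns ac2
)
SELECT  rowNum = n.N,
        -- a unique first element (the row number) plus fixed-length filler elements
        CSV = CONVERT(VARCHAR(10), n.N)
              + REPLICATE(',' + REPLICATE('X', @ElementLen), @ElementsPerRow - 1)
INTO dbo.SplitTestData
FROM Numbers n;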
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 23, 2009 at 9:27 am
I'd look at setting up a test DB here at work on our development server if everything could be put into a consolidated zip file. The server I have is a dual quad-core x64 Intel system with 8 GB of RAM.
I'm just having problems finding everything on this thread to put it all together.
April 24, 2009 at 4:15 am
Bob Hovious (4/23/2009)
With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:
1. The number of rows in the table
2. The average size of the data in the column to be parsed
3. The average number of delimited strings to be parsed per row
4. The average size of the strings being parsed (really the inverse of #3 above).
I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?
I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.
This is the first example I've found where SQL Server changes the plan based simply on the number of input rows.
Derek
April 24, 2009 at 5:50 am
Derek Dongray (4/24/2009)
Bob Hovious (4/23/2009)
With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:
1. The number of rows in the table
2. The average size of the data in the column to be parsed
3. The average number of delimited strings to be parsed per row
4. The average size of the strings being parsed (really the inverse of #3 above).
I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?
I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.
This is the first example I've found where SQL Server changes the plan based simply on the number of input rows.
I have seen one other situation where SQL Server changes the plan drastically - to the extent that a query can go from taking 10-15 minutes to complete down to mere seconds...
All by the addition of a TOP clause, but not just any old TOP - oh no!
I do quite a bit of work on a system for a customer where there are millions of records in various (fairly) normalised tables, so you are frequently joining two or more tables, each containing many millions of rows.
It has happened on many occasions that a query takes minutes rather than seconds to complete.
In these cases, you can normally increase the speed of the query by adding a TOP clause. However, you will not see the speed increase for just any TOP clause, and it is not something I have ever managed to work out a calculation for, but there will be a "magic" number which will effect this change.
An example might be that you have a query that returns only three rows, but the following speeds it up dramatically:
SELECT TOP (10125) * FROM blah blah blah
while this has no effect on speed at all:
SELECT TOP (10124) * FROM blah blah blah
Now that is a discussion topic that could run and run!
Anyone fancy trying it on the string splitting?
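For anyone who wants to try it, something along these lines would do: a sketch only, using the thread's dbo.JBMTest and dbo.Tally names, a plain tally+charindex split, and an arbitrary row limit to vary while watching the plan and timings.
SET STATISTICS TIME ON;

-- Vary the TOP value (or remove it) and compare the actual execution plans.
SELECT TOP (123456)
       s.rowNum,
       Item = SUBSTRING(s.CSV, t.N,
                        ISNULL(NULLIF(CHARINDEX(',', s.CSV, t.N), 0), LEN(s.CSV) + 1) - t.N)
FROM dbo.JBMTest s
JOIN dbo.Tally t
  ON t.N <= LEN(s.CSV)
 AND SUBSTRING(',' + s.CSV, t.N, 1) = ','
ORDER BY s.rowNum, t.N;

SET STATISTICS TIME OFF;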
MM
April 25, 2009 at 5:48 am
Hi
New information from the splitter front... 🙂
First, I have to apologize for all those "great" CLR test results... When I tried to reduce the tests to just the real split work, I used a COUNT(*) instead of an INSERT INTO. That was okay for getting the line count and making sure the splitting was done, but because the table-valued CLR split functions work in two steps (first the real splitting, then transferring the data back to SQL Server), the test results were not correct. As I already posted somewhere back in the medieval times of this thread, it seems that SQL Server uses reflection to call the methods for data access (lousy implementation!). The performance goes down the toilet as soon as I insert the data into a table variable. There are still some business cases where the CLR solutions are faster, but not by very much... As long as this is the case, and the lack of MARS support isn't fixed, CLR is better reserved for functions that handle more complex string manipulations.
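To show the difference, compare the two ways of consuming the function's output (a sketch only; dbo.SplitCLR stands in for any CLR table-valued splitter and dbo.JBMTest is the test table):
SET STATISTICS TIME ON;

-- 1) COUNT(*) forces the split but never streams the rows into a table,
--    so most of the CLR-to-engine transfer cost stays hidden.
SELECT COUNT(*)
FROM dbo.JBMTest s
CROSS APPLY dbo.SplitCLR(s.CSV, ',') split;

-- 2) Materializing the rows includes the transfer, which is where the
--    extra cost shows up.
DECLARE @Result TABLE (rowNum INT, Item VARCHAR(8000));

INSERT INTO @Result (rowNum, Item)
SELECT s.rowNum, split.Item
FROM dbo.JBMTest s
CROSS APPLY dbo.SplitCLR(s.CSV, ',') split;

SET STATISTICS TIME OFF;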
@Lynn
There are currently two different test systems. The first is the complete test system for all functions posted in this thread. The second is focused on the new tally solutions. I attached two files:
"TallyTests.txt"
A standalone SQL script for the tally tests.
"AllPerformanceTests.zip"
The complete performance test environment. It contains the following components:
- "DDL.sql", which creates all T-SQL functions, a table called PerfMon, and two procedures I use for my performance tests.
- "ClrProject.zip", which is the Visual Studio project containing all CLR functions. It is a VS2008 C# project, but you can also copy all functions into a VS2005 project.
- "PerformanceTests.sql", the script which can be used to execute all functions. It also creates the source test tables if they don't exist.
If something is missing let me know 😉
@Bob, Jeff and the rest of the tally masters
This becomes a bit esoteric, but as Bob said, "if one tally is fast, two are faster"... If my UNION ALL dual-carburetor tally is fast and Bob's double-barreled tally solution is faster, then a double-barreled dual-carburetor tally is even faster :-D. The source becomes quite huge for a simple string-splitting requirement, but it is about 10% faster and it handles both the first and last elements:
PRINT '----=========================================================='
PRINT '---- Double-barreled dual-carburetor Tally'
DECLARE @delimiter CHAR(1)
SELECT @delimiter = ','
--SET STATISTICS IO ON
SET STATISTICS TIME ON;
SELECT rowNum, item
INTO result
FROM dbo.JBMTest s
CROSS APPLY (
SELECT
SUBSTRING(s.CSV, 1, Z) item
FROM (SELECT s.CSV) s2
CROSS APPLY (
SELECT TOP(1)
ISNULL(NULLIF(t1.N, LEN(s.CSV) + 1), LEN(s.CSV)) AS Z
FROM dbo.Tally t1
WHERE
SUBSTRING(s.CSV,t1.N,1) = @delimiter
OR (t1.N = LEN(s.CSV) + 1)
ORDER BY t1.N
) ca2
UNION ALL
SELECT
SUBSTRING(s.CSV, t2.O, Z) item
FROM dbo.Tally2 t2
CROSS APPLY (
SELECT TOP(1)
ISNULL(NULLIF(t1.N - t2.O, LEN(s.CSV) + 1 - t2.O), LEN(s.CSV)) AS Z
FROM dbo.Tally t1
WHERE t1.N > t2.N
AND (SUBSTRING(s.CSV,t1.N,1) = @delimiter
OR (t1.N = LEN(s.CSV) + 1))
ORDER BY t1.N
) ca2
WHERE t2.N <= LEN(s.CSV)
AND SUBSTRING(s.CSV,t2.N,1) = @delimiter
) l
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
Test results in combination with the other tally solutions
----==========================================================
---- UNION ALL solution
SQL Server Execution Times:
CPU time = 749 ms, elapsed time = 827 ms.
(801000 row(s) affected)
----==========================================================
---- Traditional Tally solution
SQL Server Execution Times:
CPU time = 2434 ms, elapsed time = 2326 ms.
(799000 row(s) affected)
----==========================================================
---- Bob Hovious double-barreled Tally
SQL Server Execution Times:
CPU time = 640 ms, elapsed time = 701 ms.
(799000 row(s) affected)
----==========================================================
---- Double-barreled dual-carburetor Tally
SQL Server Execution Times:
CPU time = 577 ms, elapsed time = 564 ms.
(801000 row(s) affected)
The Article
Sorry that the article is still pending, but it is quite hard to write an article while all the test results keep changing... I didn't forget, I just have to finish all the tests.
Greets
Flo
April 25, 2009 at 5:53 am
Hi mister.magoo
mister.magoo (4/24/2009)
In these cases, you can normally increase the speed of the query by adding a TOP clause. However, you will not see the speed increase for just any TOP clause, and it is not something I have ever managed to work out a calculation for, but there will be a "magic" number which will effect this change.
Thanks for your feedback! You are absolutely correct. There are many criteria SQL Server uses to decide some "magic" things. I even ran into many of these cases in my tests for this thread...!
Anyone fancy trying it on the string splitting?
I already had these effects with my UNION ALL tally solution.
Greets
Flo
April 25, 2009 at 6:51 am
Florian Reischl (4/25/2009)
Hi
New information from the splitter front... 🙂
First, I have to apologize for all those "great" CLR test results... When I tried to reduce the tests to just the real split work, I used a COUNT(*) instead of an INSERT INTO. That was okay for getting the line count and making sure the splitting was done, but because the table-valued CLR split functions work in two steps (first the real splitting, then transferring the data back to SQL Server), the test results were not correct. As I already posted somewhere back in the medieval times of this thread, it seems that SQL Server uses reflection to call the methods for data access (lousy implementation!). The performance goes down the toilet as soon as I insert the data into a table variable. There are still some business cases where the CLR solutions are faster, but not by very much... As long as this is the case, and the lack of MARS support isn't fixed, CLR is better reserved for functions that handle more complex string manipulations.
@Lynn
There are currently two different test systems. The first is the complete test system for all functions posted in this thread. The second is focused on the new tally solutions. I attached two files:
"TallyTests.txt"
A standalone SQL script for the tally tests.
"AllPerformanceTests.zip"
The complete performance test environment. It contains the following components:
- "DDL.sql", which creates all T-SQL functions, a table called PerfMon, and two procedures I use for my performance tests.
- "ClrProject.zip", which is the Visual Studio project containing all CLR functions. It is a VS2008 C# project, but you can also copy all functions into a VS2005 project.
- "PerformanceTests.sql", the script which can be used to execute all functions. It also creates the source test tables if they don't exist.
If something is missing let me know 😉
Awesome work, Flo. Thank you for taking the time to put everything together like this as well. I'll look at setting up a test DB here at home and on my dev server at work. The string splitting is definitely something I am interested in, as we are working on a project that requires it, and the developer has so far implemented a tally version based on the previous work done on SSC prior to this thread. If we can find a more performant solution, that is a bonus.
April 25, 2009 at 7:50 am
Hey Flo!
These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.
The QO can tell from the string statistics on the CSV column that there is only one distinct row, so it uses a lazy table spool to hold the split results from the first row, in the expectation that all following rows will be the same. If they are the same, the entire result set is served from the results from the first row (the table spool). If not, additional splitting is done.
In the case where all rows are unique, the QO dispenses with the table spool optimization, and the split takes ~20s on my laptop, up from ~1s.
To see the difference, may I suggest you re-run a test with:
CSV = REPLACE(@CSV, 'Part-', 10000 + ROW_NUMBER() OVER (ORDER BY (SELECT 1)))
instead of CSV = @CSV in the script which creates dbo.JBMTest?
The previous routines (including the CLR routines) split every row individually.
Naturally, if the requirement is to split the same string many, many times, one should of course use the tally routines :laugh:
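To make the suggested change concrete, rebuilding the test table along these lines gives every row a distinct CSV value (a sketch only; the row count and the element list are placeholders, not the thread's actual generation script):
-- Rebuild the test table so that every CSV value is distinct, which defeats
-- the single-distinct-row / lazy table spool shortcut described above.
IF OBJECT_ID('dbo.JBMTest') IS NOT NULL DROP TABLE dbo.JBMTest;

DECLARE @CSV VARCHAR(8000);
SELECT @CSV = 'Part-01,Part-02,Part-03,Part-04,Part-05,Part-06,Part-07,Part-08';

SELECT  n.rowNum,
        -- embedding the row number makes every CSV value unique
        CSV = REPLACE(@CSV, 'Part-', CONVERT(VARCHAR(10), 10000 + n.rowNum))
INTO dbo.JBMTest
FROM (SELECT TOP (10000)
             rowNum = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2) n;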
I am sorry it took me so long to notice this.
Cheers,
Paul
edited for typo - and I previewed and everything... :sigh:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 25, 2009 at 8:03 am
First, I have to apologize for all those "great" CLR test results... When I tried to reduce the tests to just the real split work, I used a COUNT(*) instead of an INSERT INTO. That was okay for getting the line count and making sure the splitting was done, but because the table-valued CLR split functions work in two steps (first the real splitting, then transferring the data back to SQL Server), the test results were not correct.
I agree that it makes a difference to the CLR results, but it is not a huge difference. It certainly seems to put the while loop even more in the lead, though I have yet to test the double-barreled tally technique. Sorry that I haven't published my test harness, but I keep refining and changing it! Also, it is getting a bit long!
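For anyone who hasn't followed the earlier posts, the while-loop style being referred to is roughly this (a generic sketch with made-up names, not one of the specific functions in Flo's DDL.sql):
CREATE FUNCTION dbo.WhileSplitSketch (@Input VARCHAR(8000), @Delim CHAR(1))
RETURNS @Result TABLE (ItemNumber INT IDENTITY(1,1), Item VARCHAR(8000))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT;
    SELECT @Pos = 1;

    WHILE @Pos <= LEN(@Input) + 1
    BEGIN
        -- find the next delimiter; treat "not found" as end of string
        SELECT @NextPos = CHARINDEX(@Delim, @Input, @Pos);
        IF @NextPos = 0 SELECT @NextPos = LEN(@Input) + 1;

        INSERT INTO @Result (Item)
        SELECT SUBSTRING(@Input, @Pos, @NextPos - @Pos);

        SELECT @Pos = @NextPos + 1;
    END;

    RETURN;
END;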
Best wishes,
Phil Factor
April 25, 2009 at 8:16 am
Phil Factor (4/25/2009)
It certainly seems to put the while loop even more in the lead...
Which one, Phil? :unsure:
[dbo].[ufn_SplitLines_Cursor]
[dbo].[WhileSplit]
All seem to feature a while loop - those are from Flo's DDL.sql file.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 25, 2009 at 8:24 am
Hey Paul!
Paul White (4/25/2009)
Hey Flo!
These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.
The QO can tell from the string statistics on the CSV column that there is only one distinct row, so it uses a lazy table spool to hold the split results from the first row, in the expectation that all following rows will be the same. If they are the same, the entire result set is served from the results from the first row (the table spool). If not, additional splitting is done.
In the case where all rows are unique, the QO dispenses with the table spool optimization, and the split takes ~20s on my laptop, up from ~1s.
To see the difference, may I suggest you re-run a test with:
CSV = REPLACE(@CSV, 'Part-', 10000 + ROW_NUMBER() OVER (ORDER BY (SELECT 1)))
instead of CSV = @CSV in the script which creates dbo.JBMTest?
The previous routines (including the CLR routines) split every row individually.
Naturally, if the requirement is to split the same string many, many times, one should of course use the tally routines :laugh:
*ARGH!*
This is starting to make me crazy... :crazy:
Thank you for this very important information! New test results are awful:
----==========================================================
---- UNION ALL solution
(0 row(s) affected)
SQL Server Execution Times:
CPU time = 6318 ms, elapsed time = 6255 ms.
(801000 row(s) affected)
----==========================================================
---- Traditional Tally solution
SQL Server Execution Times:
CPU time = 5554 ms, elapsed time = 5506 ms.
(799000 row(s) affected)
----==========================================================
---- Bob Hovious double-barreled Tally
SQL Server Execution Times:
CPU time = 17222 ms, elapsed time = 16751 ms.
(799000 row(s) affected)
----==========================================================
---- Double-barreled dual-carburetor Tally
SQL Server Execution Times:
CPU time = 17269 ms, elapsed time = 16855 ms.
(801000 row(s) affected)
I really never thought that string splitting could become a problem like this...
Thanks again!
Flo
April 25, 2009 at 8:43 am
Hi Phil
Phil Factor (4/25/2009)
I agree that it makes a difference to the CLR results, but it is not a huge difference.
Sorry, but I disagree. I just tried some of the functions with and without data access:
Without Data access
Sorting Description Duration(ms) CpuTime LogicalReads PhysicalReads
-------- ----------------------------- ------------ ----------- ------------ -------------
Large CLR tvf (Paul White) solution 93 94 7322 0
Large CLR RegEx solution 130 140 7322 0
Large CLR Chars solution 346 359 7322 0
Large CLR XML Solution 853 702 4276 58
Many CLR tvf (Paul White) solution 200 234 6929 0
Many CLR RegEx solution 236 234 6929 0
Many CLR XML Solution 1056 577 5213 65
Many CLR Chars solution 1173 1248 6929 0
MobyDick CLR Chars solution 190 124 7534 10
Monster CLR Chars solution 33 16 2464 0
Monster CLR tvf (Paul White) solution 163 31 2464 35
Short CLR tvf (Paul White) solution 33 31 180 0
Short CLR RegEx solution 36 32 180 0
Short CLR Chars solution 126 171 180 0
Short CLR XML Solution 206 141 188 8
With data access
Sorting Description Duration(ms) CpuTime LogicalReads PhysicalReads
-------- ----------------------------- ------------ ----------- ------------ -------------
Large CLR RegEx solution 1096 1139 399756 0
Large CLR tvf (Paul White) solution 1106 1045 399728 0
Large CLR XML Solution 1146 1186 91872 0
Large CLR Chars solution 1330 1357 401046 0
Many CLR RegEx solution 3266 3292 1045886 0
Many CLR tvf (Paul White) solution 3276 3307 1044632 0
Many CLR XML Solution 3443 3448 296491 0
Many CLR Chars solution 4083 4057 1044563 0
MobyDick CLR Chars solution 456 499 160488 0
Monster CLR Chars solution 170 156 66444 0
Monster CLR tvf (Paul White) solution 170 156 67130 0
Short CLR tvf (Paul White) solution 390 405 138450 0
Short CLR RegEx solution 396 405 138672 0
Short CLR XML Solution 440 436 32935 0
Short CLR Chars solution 473 468 138923 0
I tried the same thing directly in a C# application for fun. It is even faster to connect to the server, pull all the data to the client, and split it there...
Greets
Flo