Performance issue with tally solution

Question

Performance issue with tally solution

Viewing 15 posts - 166 through 180 (of 522 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

Jeff Moden (4/21/2009)
It's not the presence of TOP nor ORDER BY. It's the presence of UNION ALL. In the testing I did with Flo, I actually created a situation where the first SELECT in the UNION ALL returned 0 rows and it knocked an 800 column split on 1000 rows on a VARCHAR(MAX) down from 44 seconds to only 6. It somehow forces the optimizer to, as Flo says, use a working table that wouldn't normally be utilized to keep from building 799,000 rows using the Tally table. Instead, it only reads 800 rows from the Tally table and spawns the rest of the rows...

Jeff,

My apologies. If you wondered what the heck I was on about - let me explain: Both parts of the script are headed 'Traditional Tally Solution'. In my excitement to try it out, I assumed that the second one was the new and improved version. That was the one I was referring to - oops!

The giveaway was the fact that there is no UNION ALL in it - so I though "what is Jeff talking about UNION ALL for?" when I saw the response. So sorry, serves me right for rushing it. Throw meat portions at me or lash me with noodles - whatever the currently applicable response is.

With some trepidation then, I ran the correct version - the one with the UNION ALL.

It's all cool (really!), but it still suffers with the Lazy Spool which generated 801,000 intermediate rows on the run I just did (admittedly with a 100K tally table). The QO seems to assume that the Lazy Spool will never rewind - in fact it does for every row in JBMTest. I wonder if it would produce a different plan if it could know that?

It does however

The Holy Grail - which I have singularly failed to find - is a plan which (in order of desirability):

doesn't generate a spool (table or index)
eliminates the inequality predicate
calls charindex & substring once per found delimeter

Removing the inequality (if it's even possible) would allow a hash or merge join, or even some kind of streaming aggregate operation rather than a loop join with table spool.

Calling charindex/substring once per delimeter would mean I could never use the "brute force" tag again 🙂

I attach the actual execution plan (sqlplan and text form) in case it differs from yours.

Really sorry about my false start there...:blush:

Paul

P.S. Couldn't upload the .sqlplan file directly - so renamed it as Jeff-UNION-ALL.txt

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 2

Flo and Jeff, I just had to try.

This beast features double-barelled carburetors.... errrr tally tables.... one with two columns, and nested CROSS APPLYs.

Can you help me tune her up a bit?

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

GO

USE tempdb

GO

IF (OBJECT_ID('dbo.Tally') IS NOT NULL) drop table dbo.tally

IF (OBJECT_ID('dbo.Tally2') IS NOT NULL) drop table dbo.tally2

IF (OBJECT_ID('dbo.Result1') IS NOT NULL) drop table dbo.result1

CREATE TABLE dbo.Tally (N INT NOT NULL, PRIMARY KEY CLUSTERED (N))

INSERT INTO dbo.Tally

SELECT TOP 11000

ROW_NUMBER() OVER (ORDER BY c1.column_id)

FROM master.sys.all_columns c1

CROSS JOIN master.sys.all_columns c2

CREATE TABLE dbo.Tally2 (N INT NOT NULL, O INT NOT NULL, PRIMARY KEY CLUSTERED (N))

INSERT INTO dbo.Tally2

SELECT TOP 11000

ROW_NUMBER() OVER (ORDER BY c1.column_id),ROW_NUMBER() OVER (ORDER BY c1.column_id)+1

FROM master.sys.all_columns c1

CROSS JOIN master.sys.all_columns c2

--=====================================================================================

set statistics time on;

set statistics io on;

select rowNum, item

into result1

from jbmTest

cross apply(select substring(csv,t2.O,Z) as item

from tally2 t2

cross apply (select min(t1.N)- t2.O as Z

from Tally t1

where t1.N > t2.N

and substring(csv,t1.N,1) = ',') ca2

where t2.N < len(csv) and substring(csv,t2.N,1) = ','

) ca1

set statistics time off;

set statistics io off;

I got a 69 Chevy with a 396

Fuelie heads and a Hurst on the floor....

- Bruce Springsteen

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Jeff Moden SSC Guru Points: 1004475 More actions · Answer 3

Bob Hovious (4/21/2009)
Flo and Jeff, I just had to try.
This beast features double-barelled carburetors.... errrr tally tables.... one with two columns, and nested CROSS APPLYs.
Can you help me tune her up a bit?

How well does it run with no tuneup?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 4

Bob Hovious (4/21/2009)
Can you help me tune her up a bit?

Hey Bob,

I *was* going to suggest rewriting it in CLR...?

:laugh:

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 5

Jeff: Just a hair slower than you guys... on my laptop at least.

Hard to isolate because of other processes, but it seems to be about 5-8% slower on CPU. It's all about the I/O. This scans the tally table 800 times.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 6

Hey Paul, this thread may be the motivator for me to finally learn that awful note (pun) of a language. But it also convinces me that CLR isn't a magic bullet. A mediocre CLR routine might not be the fastest technique either. 🙂

As always, it depends.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 7

Something to ponder.

This thread has also made me broaden my horizons about tally tables. Why should there be just one flavor? If we are going to parse 8k fields, why not make one that uses smallint for N and only enough rows to do the parser? If we always want to start with the column right after the separator appears (and we know the separator will be only 1 position long), why not precalculate that?

I even thought about a table with all possible combinations of starting (X) and ending (Y) postions, and the length between them. This would be used to set up a substring from X to min(Y) where both X and Y = ',' but it would take tens of millions of rows. It's almost 1:30, so I'm not going to explore that tonight, I'm just throwing out exhausted ramblings in hopes of turning the light bulb on for someone else.

As always, it's been a pleasure and an education.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Paul White SSC Guru Points: 150468 More actions · Answer 8

Bob Hovious (4/22/2009)
But it also convinces me that CLR isn't a magic bullet. A mediocre CLR routine might not be the fastest technique either. 🙂

Absolutely. I have been spending most of my time playing with T-SQL to find a better solution. CLR is rarely my first choice; and then only when T-SQL doesn't have the the required functionality (or if it is obviously a non-starter).

Customised tally tables are an interesting thought - though I might consider creating one within the query (using the normal tricks) in some siutations. You guys should rename this site to SS:ID - SQL Server: It Depends.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

peter-757102 SSCertifiable Points: 6877 More actions · Answer 9

Sorry to have missed this topic, I will spend some time the next week to take a good look at all data and knowledge displayed in this thread. Right now I am quite busy and were it not for a link that Jeff made in another thread I would not even found this one!

Thanks for that Jeff, and stay tuned I am going to like next weeks exercise for sure 🙂

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 10

Paul, I am drawing on the experience that oftentimes a precalculated date or calendar table produces faster solutions than doing DATEADD,DATEDIFF on the fly. It would seem the same principle should apply. We'll see.

Jeff,

the presence of UNION ALL

Maybe this is the tune-up I needed. Back to the garage.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 11

Bob Hovious (4/21/2009)
Flo and Jeff, I just had to try.
This beast features double-barelled carburetors.... errrr tally tables.... one with two columns, and nested CROSS APPLYs.
Can you help me tune her up a bit?
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO
USE tempdb
GO
IF (OBJECT_ID('dbo.Tally') IS NOT NULL) drop table dbo.tally
IF (OBJECT_ID('dbo.Tally2') IS NOT NULL) drop table dbo.tally2
IF (OBJECT_ID('dbo.Result1') IS NOT NULL) drop table dbo.result1
CREATE TABLE dbo.Tally (N INT NOT NULL, PRIMARY KEY CLUSTERED (N))
INSERT INTO dbo.Tally
SELECT TOP 11000
ROW_NUMBER() OVER (ORDER BY c1.column_id)
FROM master.sys.all_columns c1
CROSS JOIN master.sys.all_columns c2
CREATE TABLE dbo.Tally2 (N INT NOT NULL, O INT NOT NULL, PRIMARY KEY CLUSTERED (N))
INSERT INTO dbo.Tally2
SELECT TOP 11000
ROW_NUMBER() OVER (ORDER BY c1.column_id),ROW_NUMBER() OVER (ORDER BY c1.column_id)+1
FROM master.sys.all_columns c1
CROSS JOIN master.sys.all_columns c2
--=====================================================================================
set statistics time on;
set statistics io on;
select rowNum, item
into result1
from jbmTest
cross apply(select substring(csv,t2.O,Z) as item
from tally2 t2
cross apply (select min(t1.N)- t2.O as Z
from Tally t1
where t1.N > t2.N
and substring(csv,t1.N,1) = ',') ca2
where t2.N < len(csv) and substring(csv,t2.N,1) = ','
) ca1
set statistics time off;
set statistics io off;
I got a 69 Chevy with a 396
Fuelie heads and a Hurst on the floor....
- Bruce Springsteen

Hi Bob!

Tune this great solution?? You are kidding, aren't you? 🙂

Your double-barelled Tally even beats the UNION ALL function! I tried to replace the Tally2 with a second join to the traditional Tally but the performance decreases for ten percent. I tried to add an unique index on N including O to maybe avoid one step; with and without index hint. I tried to include a UNION... Every character I changed in your awesome function degrades its performance.

I just have to congratulate to the next version of a spooky tally. Why the hell is a CROSS APPLY with a tally scan :alien: faster than CHARINDEX??

Here my performance results:

----==========================================================

---- UNION ALL solution

SQL Server Execution Times:

CPU time = 748 ms, elapsed time = 741 ms.

(801000 row(s) affected)

----==========================================================

---- Traditional Tally solution

SQL Server Execution Times:

CPU time = 2294 ms, elapsed time = 2350 ms.

(799000 row(s) affected)

----==========================================================

---- Bob Hovious double-barelled Tally

SQL Server Execution Times:

CPU time = 624 ms, elapsed time = 665 ms.

(799000 row(s) affected)

Thanks again!

Flo

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 12

Paul White (4/21/2009)
Jeff Moden (4/21/2009)
It's not the presence of TOP nor ORDER BY. It's the presence of UNION ALL. In the testing I did with Flo, I actually created a situation where the first SELECT in the UNION ALL returned 0 rows and it knocked an 800 column split on 1000 rows on a VARCHAR(MAX) down from 44 seconds to only 6. It somehow forces the optimizer to, as Flo says, use a working table that wouldn't normally be utilized to keep from building 799,000 rows using the Tally table. Instead, it only reads 800 rows from the Tally table and spawns the rest of the rows...
Jeff,
My apologies. If you wondered what the heck I was on about - let me explain: Both parts of the script are headed 'Traditional Tally Solution'. In my excitement to try it out, I assumed that the second one was the new and improved version. That was the one I was referring to - oops!
The giveaway was the fact that there is no UNION ALL in it - so I though "what is Jeff talking about UNION ALL for?" when I saw the response. So sorry, serves me right for rushing it. Throw meat portions at me or lash me with noodles - whatever the currently applicable response is.

All blame on me...! My first error was twice the same PRINT for both functions. The second, greater, error was that I forgot to note the UNION ALL which Jeff already told me. So all the meat and noodles on me...

I don't try to join the execution plan discussion because I know my limits.

Thanks to you, both!

Flo

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

A very minor point for Number/Tally tables is to create the clustered index after population and to specify 100% fill factor. This will ensure minimal IO for subsequent use. This is negligible for 11000 row table but for larger ones becomes increasingly important I think.

That double cross-apply is wicked. I am still trying to wrap my mind around how it actually works. :w00t:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 14

Hey Guru: I borrowed Flo's code to create the tally table instead of doing it the way Jeff taught me, because I wanted to keep an apples-to-apples comparison against the results he was reporting when I tested on my laptop. Thanks for pointing out something that is important to any tally table application. 🙂

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

RBarryYoung SSC Guru Points: 143329 More actions · Answer 15

Florian Reischl (4/22/2009)
Bob Hovious (4/21/2009)
Flo and Jeff, I just had to try.
This beast features double-barelled carburetors.... errrr tally tables.... one with two columns, and nested CROSS APPLYs.
Can you help me tune her up a bit?
...
Hi Bob!
Tune this great solution?? You are kidding, aren't you? 🙂
Your double-barelled Tally even beats the UNION ALL function! I tried to replace the Tally2 with a second join to the traditional Tally but the performance decreases for ten percent. I tried to add an unique index on N including O to maybe avoid one step; with and without index hint. I tried to include a UNION... Every character I changed in your awesome function degrades its performance.
I just have to congratulate to the next version of a spooky tally. Why the hell is a CROSS APPLY with a tally scan :alien: faster than CHARINDEX??
Here my performance results:
----==========================================================
---- UNION ALL solution
SQL Server Execution Times:
CPU time = 748 ms, elapsed time = 741 ms.
(801000 row(s) affected)
----==========================================================
---- Traditional Tally solution
SQL Server Execution Times:
CPU time = 2294 ms, elapsed time = 2350 ms.
(799000 row(s) affected)
----==========================================================
---- Bob Hovious double-barelled Tally
SQL Server Execution Times:
CPU time = 624 ms, elapsed time = 665 ms.
(799000 row(s) affected)
Thanks again!
Flo

!!!

Those are some freakin' incredible numbers!

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]