Performance issue with tally solution

Question

Performance issue with tally solution

Viewing 15 posts - 451 through 465 (of 522 total)

You must be logged in to reply to this topic. Login to reply

UMG Developer SSChampion Points: 13482 More actions · Answer 1

Paul White (5/12/2009)
Do you get parallelism with Jeff's solution? I would think that would help a lot.
T-SQL UDFs always generate a serial plan.

I've run into that before, and had to create a temporary table with the results of the UDF to get things to run fast enough. (Even a CTE against the UDF wasn't enough.)

I'll get back to you later on Jeff's solution after I have had a chance to do the testing.

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 2

Florian Reischl

SSC-Dedicated

Points: 37301

May 12, 2009 at 10:24 am

#993113

UMG, did you try the cursor?

Greets

Flo

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

>> Do you get parallelism with Jeff's solution? I would think that would help a lot.

>> T-SQL UDFs always generate a serial plan.

Is that for SELECTs or just for modification queries?? Got any reference for that statement please?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 4

UMG Developer (5/12/2009)

Jeff,
Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)

Not sure what you mean. Does that mean you tried it and it worked? Also, yes... please post your code especially if you used the ".VALUE trick" because I didn't have to use it. Good thing, too, because I don't actually know what the heck you're talking about. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 5

David Burrows (5/12/2009)
Nice one Jeff 😎

Thanks, David. I didn't understand the problem until I saw the test data (funny how important that is :-D). Thought we could do it with Bulk Insert based on the verbal description but the test data flash burned that idea.

They could use the code in the CTE to build a Temp Table and index it specifically for "order" of the split but it should be fairly fast as a CTE, as well. Uses Index Seeks and all. Pick your favorite splitter for use below that.

If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 6

UMG Developer (5/12/2009)
Jeff Moden (5/11/2009)
Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.
Jeff,
Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)

Also... that does seem to be a bit faster than stopping it after 2 hours and some. Why not just use the rollup code I used in my solution and pick whatever splitter you want (Flo's "Cursor" solution is nasty fast) instead of fighting with the machine on the other solution where you have to move and delete things?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150467 More actions · Answer 7

TheSQLGuru (5/12/2009)
>> T-SQL UDFs always generate a serial plan.
Is that for SELECTs or just for modification queries?? Got any reference for that statement please?

All T-SQL UDFs under all circumstances 🙂

Sure. See page 35 in the excellent: http://blogs.msdn.com/craigfr/archive/2007/04/17/parallel-query-execution-presentation.aspx

The first comment from Craig reflects the change he made to the presentation following my feedback to him.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Paul White SSC Guru Points: 150467 More actions · Answer 8

Jeff Moden (5/12/2009)
If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:

I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉

Embed a non-printable character to blow this one up.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

RBarryYoung SSC Guru Points: 143329 More actions · Answer 9

Jeff Moden (5/12/2009)If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you.

When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

UMG Developer SSChampion Points: 13482 More actions · Answer 10

Florian Reischl (5/12/2009)
Here is the cursor I actually used in my tests. I compared it with all others but it seems to be the fastest.

No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) and I killed it after 4 hours with this code:

[font="Courier New"]DECLARE @Delimiter CHAR(1);

SELECT @Delimiter = '|';

SELECT c.Tar_ID,

ca.*

INTO CommentLinesFromUDF

FROM Comments c

CROSS APPLY dbo.ufn_SplitString_Cursor_VM8(c.Comment, @Delimiter) ca;

[/font]

Did I do something wrong in calling it? Is there maybe a different way I should call the function?

(I did modified the function slightly by adding an identity column to the output table so that I got the line number, but that shouldn't have made it take that long should it have?)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 11

Paul White (5/12/2009)
I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness.
Embed a non-printable character to blow this one up.

RBarryYoung (5/12/2009)
When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.

Ooooh... THAT problem. Didn't know what it was actually called and didn't know .VALUE would fix it (with the overhead mentioned). Thanks guys.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

UMG Developer SSChampion Points: 13482 More actions · Answer 12

Jeff Moden (5/11/2009)
Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.

No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) for just under 21 minutes to create the 9,177,458 rows. (Not counting the FOR XML combining since I created a table of the comments all rolled together for use in multiple tests.)

The .VALUE trick is to deal with special characters like &, in the source text, here is the roll up code I ended up using:

[font="Courier New"]DECLARE @Delimiter CHAR(1);

SELECT @Delimiter = '|';

SELECT

t1.CommentID,

(SELECT

CASE CommentCont

WHEN 'N' THEN @Delimiter

ELSE ''

END + CAST(t2.CommentPart AS VARCHAR(MAX))

FROM dbo.Comments t2

WHERE t2.CommentID = t1.CommentID --- must match GROUP BY below

ORDER BY t2.CommentPartNum

FOR XML PATH(''), TYPE).value('.[1]','varchar(max)') + @Delimiter AS Comment

INTO dbo.cteComments

FROM dbo.Comments t1

GROUP BY t1.CommentID;

[/font]

I re-did your example a little bit to get the parallelism to kick in and it ran in just over 14 minutes to create the 9,177,458 rows:

[font="Courier New"]DECLARE @Delimiter CHAR(1);

SELECT @Delimiter = '|';

SELECT

TAR_ID,

ROW_NUMBER() OVER (PARTITION BY TAR_ID ORDER BY t.n) AS Line,

SUBSTRING(s.Comment, t.N +1, CHARINDEX(@delimiter, s.Comment, t.N +1) -t.N -1) AS CommentLine

INTO dbo.CommentLines3

FROM dbo.cteComments s

CROSS JOIN dbo.Tally t

WHERE t.N < LEN(s.Comment)

AND SUBSTRING(s.Comment, t.N, 1) = @delimiter;

[/font]

So I think in my case we have a winner and it is the Tally table, unless someone has something else I can try. (I know I don't have CLR on the server, so I can't try that solution with my large data set.)

Paul White SSC Guru Points: 150467 More actions · Answer 13

UMG,

So is the parallel tally fast enough for you? Seems fairly reasonable to me.

Try: exec sp_configure 'clr_enabled', 1; reconfigure;

...some day 🙂

It's that hard.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

UMG Developer SSChampion Points: 13482 More actions · Answer 14

Paul White (5/12/2009)
Jeff Moden (5/12/2009)
If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:
I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉
Embed a non-printable character to blow this one up.

Paul,

Non-printable characters blow up which method up? They appear to work just fine with the method I used. (For CR, LF, and TAB anyhow.)

UMG Developer SSChampion Points: 13482 More actions · Answer 15

Paul White (5/12/2009)
UMG,
So is the parallel tally fast enough for you? Seems fairly reasonable to me.
Try: exec sp_configure 'clr_enabled', 1; reconfigure;
...some day 🙂
It's that hard.

Yes, I think the tally solution is really fast... But nothing can ever be too fast can it?

I never said that technically it was hard to turn CLR on, but I don't get to just turn things on...