May 12, 2009 at 9:59 am
Paul White (5/12/2009)
Do you get parallelism with Jeff's solution? I would think that would help a lot.T-SQL UDFs always generate a serial plan.
I've run into that before, and had to create a temporary table with the results of the UDF to get things to run fast enough. (Even a CTE against the UDF wasn't enough.)
I'll get back to you later on Jeff's solution after I have had a chance to do the testing.
May 12, 2009 at 10:24 am
UMG, did you try the cursor?
Greets
Flo
May 12, 2009 at 12:34 pm
>> Do you get parallelism with Jeff's solution? I would think that would help a lot.
>> T-SQL UDFs always generate a serial plan.
Is that for SELECTs or just for modification queries?? Got any reference for that statement please?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 12, 2009 at 6:18 pm
UMG Developer (5/12/2009)
Jeff,
Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)
Not sure what you mean. Does that mean you tried it and it worked? Also, yes... please post your code especially if you used the ".VALUE trick" because I didn't have to use it. Good thing, too, because I don't actually know what the heck you're talking about. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2009 at 6:25 pm
David Burrows (5/12/2009)
Nice one Jeff 😎
Thanks, David. I didn't understand the problem until I saw the test data (funny how important that is :-D). Thought we could do it with Bulk Insert based on the verbal description but the test data flash burned that idea.
They could use the code in the CTE to build a Temp Table and index it specifically for "order" of the split but it should be fairly fast as a CTE, as well. Uses Index Seeks and all. Pick your favorite splitter for use below that.
If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2009 at 6:28 pm
UMG Developer (5/12/2009)
Jeff Moden (5/11/2009)
Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.Jeff,
Now you've essentially got my roll it up and then split it method that takes ~2 minutes for the roll up, and ~16 minutes for the split. I do have to use the .VALUE trick with the FOR XML to avoid the entitization. (I know I should have posted my code.)
Also... that does seem to be a bit faster than stopping it after 2 hours and some. Why not just use the rollup code I used in my solution and pick whatever splitter you want (Flo's "Cursor" solution is nasty fast) instead of fighting with the machine on the other solution where you have to move and delete things?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2009 at 7:14 pm
TheSQLGuru (5/12/2009)
>> T-SQL UDFs always generate a serial plan.Is that for SELECTs or just for modification queries?? Got any reference for that statement please?
All T-SQL UDFs under all circumstances 🙂
Sure. See page 35 in the excellent: http://blogs.msdn.com/craigfr/archive/2007/04/17/parallel-query-execution-presentation.aspx
The first comment from Craig reflects the change he made to the presentation following my feedback to him.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 12, 2009 at 7:16 pm
Jeff Moden (5/12/2009)
If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:
I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉
Embed a non-printable character to blow this one up.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 12, 2009 at 7:24 pm
Jeff Moden (5/12/2009)If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you.
When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 12, 2009 at 9:38 pm
Florian Reischl (5/12/2009)
Here is the cursor I actually used in my tests. I compared it with all others but it seems to be the fastest.
No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) and I killed it after 4 hours with this code:
[font="Courier New"]DECLARE @Delimiter CHAR(1);
SELECT @Delimiter = '|';
SELECT c.Tar_ID,
ca.*
INTO CommentLinesFromUDF
FROM Comments c
CROSS APPLY dbo.ufn_SplitString_Cursor_VM8(c.Comment, @Delimiter) ca;
[/font]
Did I do something wrong in calling it? Is there maybe a different way I should call the function?
(I did modified the function slightly by adding an identity column to the output table so that I got the line number, but that shouldn't have made it take that long should it have?)
May 12, 2009 at 9:48 pm
Paul White (5/12/2009)
I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness.Embed a non-printable character to blow this one up.
RBarryYoung (5/12/2009)
When you are using FOR XML to do string concatention, there is a problem called "Entitization" where it turns certain characters like ">" into "& gt;". The ".VALUE" technique is one of several ways way to undo this, but adds 20-30% overhead.
Ooooh... THAT problem. Didn't know what it was actually called and didn't know .VALUE would fix it (with the overhead mentioned). Thanks guys.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2009 at 9:58 pm
Jeff Moden (5/11/2009)
Once you recognize the "pattern", you don't need all the ancillary conditional data moves and deletes. The "pattern" is that if it's NOT a continuation line, add a leading delimiter to the concatenation by CommentID in order by the CommentPartNum.
No parallelism, processor stayed at 13% (100% on 1 of the 8 cores) for just under 21 minutes to create the 9,177,458 rows. (Not counting the FOR XML combining since I created a table of the comments all rolled together for use in multiple tests.)
The .VALUE trick is to deal with special characters like &, in the source text, here is the roll up code I ended up using:
[font="Courier New"]DECLARE @Delimiter CHAR(1);
SELECT @Delimiter = '|';
SELECT
t1.CommentID,
(SELECT
CASE CommentCont
WHEN 'N' THEN @Delimiter
ELSE ''
END + CAST(t2.CommentPart AS VARCHAR(MAX))
FROM dbo.Comments t2
WHERE t2.CommentID = t1.CommentID --- must match GROUP BY below
ORDER BY t2.CommentPartNum
FOR XML PATH(''), TYPE).value('.[1]','varchar(max)') + @Delimiter AS Comment
INTO dbo.cteComments
FROM dbo.Comments t1
GROUP BY t1.CommentID;
[/font]
I re-did your example a little bit to get the parallelism to kick in and it ran in just over 14 minutes to create the 9,177,458 rows:
[font="Courier New"]DECLARE @Delimiter CHAR(1);
SELECT @Delimiter = '|';
SELECT
TAR_ID,
ROW_NUMBER() OVER (PARTITION BY TAR_ID ORDER BY t.n) AS Line,
SUBSTRING(s.Comment, t.N +1, CHARINDEX(@delimiter, s.Comment, t.N +1) -t.N -1) AS CommentLine
INTO dbo.CommentLines3
FROM dbo.cteComments s
CROSS JOIN dbo.Tally t
WHERE t.N < LEN(s.Comment)
AND SUBSTRING(s.Comment, t.N, 1) = @delimiter;
[/font]
So I think in my case we have a winner and it is the Tally table, unless someone has something else I can try. (I know I don't have CLR on the server, so I can't try that solution with my large data set.)
May 12, 2009 at 10:06 pm
UMG,
So is the parallel tally fast enough for you? Seems fairly reasonable to me.
Try: exec sp_configure 'clr_enabled', 1; reconfigure;
...some day 🙂
It's that hard.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 12, 2009 at 10:09 pm
Paul White (5/12/2009)
Jeff Moden (5/12/2009)
If anyone knows what the ".VALUE trick" that UMG was talking about is, I'm all ears, thank you. :w00t:I believe he is referring to the (...FOR XML PATH('')).value('./[1]') nastiness. 😉
Embed a non-printable character to blow this one up.
Paul,
Non-printable characters blow up which method up? They appear to work just fine with the method I used. (For CR, LF, and TAB anyhow.)
May 12, 2009 at 10:14 pm
Paul White (5/12/2009)
UMG,So is the parallel tally fast enough for you? Seems fairly reasonable to me.
Try: exec sp_configure 'clr_enabled', 1; reconfigure;
...some day 🙂
It's that hard.
Yes, I think the tally solution is really fast... But nothing can ever be too fast can it?
I never said that technically it was hard to turn CLR on, but I don't get to just turn things on...
Viewing 15 posts - 451 through 465 (of 522 total)
You must be logged in to reply to this topic. Login to reply