April 13, 2009 at 6:29 pm
Bruce W Cassidy (4/13/2009)
P.S. Phil Factor's blog post makes interesting reading, particularly for this little gem:Jeff Moden
P.S. Heh... Always say "Never" to a RBAR While loop... if the While loop beats the set based method, there's something else wrong. 😉:w00t:
[/font]
Heh... and the stone throwing from the band wagon begins. 😉 Don't forget, I was the one that first said that Flo's beat the Tally table, so be nice. :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 6:33 pm
Jeff Moden (4/13/2009)
Heh... and the stone throwing from the band wagon begins. 😉 Don't forget, I was the one that first said that Flo's beat the Tally table, so be nice. :hehe:
[font="Verdana"]Whoa! No stones being thrown here! In fact, quite the opposite. I think it's brilliant that you're chipping in and helping disprove your own theory (I should add "in this case"), and that's a clear indication of your professionalism. 😀
Besides, the real answer in this case is CLR, as it's still far faster than the while loop (or cursor). So in a way, your earlier statement is still true.
[/font]
April 13, 2009 at 7:05 pm
Bruce W Cassidy (4/13/2009)
Jeff Moden (4/13/2009)
Heh... and the stone throwing from the band wagon begins. 😉 Don't forget, I was the one that first said that Flo's beat the Tally table, so be nice. :hehe:[font="Verdana"]Whoa! No stones being thrown here! In fact, quite the opposite. I think it's brilliant that you're chipping in and helping disprove your own theory (I should add "in this case"), and that's a clear indication of your professionalism. 😀
Besides, the real answer in this case is CLR, as it's still far faster than the while loop (or cursor). So in a way, your earlier statement is still true.
[/font]
Thanks, Bruce... and sorry I took it the wrong way. Guess I'm a little sensitive to see my favorite baby get the tar beat out of it by what used to be considered RBAR, not once, but twice. If you include the fact that a While Loop worked better on Phil's old machine on the Moby Dick split, that makes 3 times. BWAA-HAA! Looks like I've gotta change my sig and my avatar, again. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 7:44 pm
Jeff Moden (4/13/2009)
Thanks, Bruce... and sorry I took it the wrong way.
[font="Verdana"]No problem.
I think it's yet another example of using the right tool for the job. It's just that the right tool in this case is not a tally table. Considering lots of us have been recommending the use of CLR for things like text handling, I'm just thrilled to see the numbers to back that advice up! (Because frankly, I haven't had time to learn C# in depth yet, so I haven't felt my own code was indicative of how well CLR can perform.)[/font]
April 13, 2009 at 8:59 pm
Bruce W Cassidy (4/13/2009)
Jeff Moden (4/13/2009)
Thanks, Bruce... and sorry I took it the wrong way.[font="Verdana"]No problem.
I think it's yet another example of using the right tool for the job. It's just that the right tool in this case is not a tally table. Considering lots of us have been recommending the use of CLR for things like text handling, I'm just thrilled to see the numbers to back that advice up! (Because frankly, I haven't had time to learn C# in depth yet, so I haven't felt my own code was indicative of how well CLR can perform.)[/font]
Yep... I agree... it's the first time I've actually seen someone post numbers on an apples-to-apples test with a CLR on such a thing. Since I don't write C# at all, had to depend on others and, up to now, there were no takers even though I've been asking.
Heh... like I said... looks like I'm gonna have to learn C#. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
April 14, 2009 at 2:00 am
Or a faster split UDF?
--- -> Peso solution
PRINT 'Start Peso solution'
SELECT @now = GETDATE()
--- Split text into lines
INSERT INTO @result
SELECT l.data
FROM @source s
CROSS APPLY dbo.fnParseList(CHAR(13), s.definition) l
--- Results
SELECT @duration = DATEDIFF(MILLISECOND, @now, GETDATE())
SELECT @count = COUNT(*) FROM @result
PRINT 'Milliseconds: ' + CONVERT(VARCHAR(10), @duration) + ' | Lines: ' + CONVERT(VARCHAR(10), @count)
--- <- Peso solution
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=76033
N 56°04'39.16"
E 12°55'05.25"
April 14, 2009 at 3:00 am
looks like I'm gonna have to learn C#.
Watch out that your halo does not slip too far Jeff 😉
Far away is close at hand in the images of elsewhere.
Anon.
April 14, 2009 at 5:06 am
Peso (4/14/2009)
Or a faster split UDF?
PRINT 'Start Peso solution'
...blah blah blah...
Awesome. You have saved the day! Well done sir! :ermm:
I can't wait for the version of your solution that actually does something similar to the requirements.
The input string is NVARCHAR(MAX), not VARCHAR(8000).
The delimiter is CR + LF : NCHAR(2), not a CHAR.
Aside from that, awesome, as I say.
Thanks also for not bothering to post performance figures for us to compare.
I would edit your TVF to match the requirements and get the performance figures for myself, but it's late and I can't be bothered, frankly. Maybe tomorrow.
:laugh:
Paul
April 14, 2009 at 5:54 am
No problem Mr White.
CREATE FUNCTION dbo.fnParseList
(
@Text NVARCHAR(MAX)
)
RETURNS @Result TABLE (RowID SMALLINT IDENTITY(1, 1) PRIMARY KEY, Data NVARCHAR(MAX))
AS
BEGIN
DECLARE@NextPos INT,
@LastPos INT,
@Delimiter NCHAR(2)
SELECT@Delimiter = NCHAR(13) + NCHAR(10),
@NextPos = CHARINDEX(@Delimiter, @Text, 1),
@LastPos = 0
WHILE @NextPos > 0
BEGIN
INSERT@Result
(
Data
)
SELECTSUBSTRING(@Text, @LastPos + 1, @NextPos - @LastPos - 1)
SELECT@LastPos = @NextPos,
@NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1)
END
IF SCOPE_IDENTITY() > 0
INSERT@Result
(
Data
)
SELECTSUBSTRING(@Text, @LastPos + 1, DATALENGTH(@Text) - @LastPos)
RETURN
END
N 56°04'39.16"
E 12°55'05.25"
April 14, 2009 at 7:42 am
We really haven't got to the bottom of what was causing the original problem.
As I'm a bit stupid, and read it too quickly, I hadn't realized that the original code was using the stored procedure data in master for its data. (Doh!) When I finally realized that we had the test data, I ran some tests with my own hot-shot generic split-table-function routine. It took almost exactly the same length of time as Florian's usp_print_lines procedure. Peso's was faster by a third. When I then looked at the test harness, I realised that it just isn't comparing like-with- like and we're jumping to conclusions about what is causing the problem with the tally solution. Why not re-write the test harness so that both strategies use the CROSS APPLY with a table-function, for example, or if both strategies use the outer WHILE loop and the INSERT....EXECUTE.
Until we can control for all other things that effect the performance we are merely guessing.
Until you can prove me wrong, I shall claim that, with this test-data, a good sensible set-based split routine out-performs the RBAR WHILE-loop by almost ten-fold, but then I may have made a mistake somewhere!
Best wishes,
Phil Factor
April 14, 2009 at 8:27 am
David Burrows (4/14/2009)
looks like I'm gonna have to learn C#.
Watch out that your halo does not slip too far Jeff 😉
Heh... Not to worry. I'll wear the tin hat over it so it doesn't slip.
I see lots and lots of good code... I've seen some of it before like Barry's and Peter's. I'm going to have to revisit it all mostly because I've gotta test things for myself. You know the old saying... "Trust... but verify". 😛
I do wish they'd fix these bloody code windows, though... it's a huge pain to reformat what you copy and paste. It's definitely NOT a WYSIWYG thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 14, 2009 at 8:31 am
Phil Factor (4/14/2009)
I ran some tests with my own hot-shot generic split-table-function routine.
Phil, could you post that code here so we have all these fine split functions on one wonderful thread? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 14, 2009 at 8:42 am
[font="Courier New"]--================================================
-- Multi-statement Table-valued Function that splits an NVARCHAR(MAX)
-- into rows depending on a delimiter
--================================================
IF OBJECT_ID (N'splitStringToTable') IS NOT NULL
DROP FUNCTION splitStringToTable
GO
CREATE FUNCTION splitStringToTable(@String NVARCHAR(MAX), @Delimiter VARCHAR(255))
RETURNS @strings TABLE
(
line_no INT IDENTITY(1,1),
theIndex INT DEFAULT 0,
previousIndex INT DEFAULT 0,
TheLine VARCHAR(255) DEFAULT '')
AS
-- body of the function
BEGIN
DECLARE @TheIndex INT, @Previousindex INT,@LenDelimiter INT
SELECT @TheIndex=-1, @LenDelimiter=LEN(@Delimiter)
--create the string table unfilled but the right length
INSERT INTO @strings(TheIndex) SELECT 0 FROM numbers
WHERE number <=(LEN(@String)
-LEN(REPLACE(@String,@Delimiter,'')))/LEN(@Delimiter)+1
--and populate the table
UPDATE @strings
SET @PreviousIndex=PreviousIndex=@TheIndex,
@TheIndex=theIndex=CASE WHEN @PreviousIndex<>0
THEN CHARINDEX(@Delimiter,@String,@PreviousIndex+1)ELSE 0 END,
TheLine=CASE WHEN @TheIndex>0 THEN
LEFT(SUBSTRING(@String,@PreviousIndex+@LenDelimiter,
@TheIndex-@previousindex-@LenDelimiter),255)
WHEN @PreviousIndex>0 THEN
LEFT(SUBSTRING(@String,@PreviousIndex+@LenDelimiter,
LEN(@String)-@previousindex),255)
ELSE '' END
RETURN
END
GO
DECLARE @mehitabelsExtensivePast VARCHAR(MAX)
SELECT @mehitabelsExtensivePast=
'
i have been
used something fierce in my time but
i am no bum sport archy
i am a free spirit archy i
look on myself as being
quite a romantic character oh the
queens i have been and the
swell feeds i have ate
a cockroach which you are
and a poet which you used to be
archy couldn t understand
my feelings at having come
down to this i have
had bids to elegant feeds where poets
and cockroaches would
neither one be mentioned without a
laugh archy i have had
adventures but i
have never been an adventuress'
SELECT * FROM dbo.splitStringToTable(@mehitabelsExtensivePast,'
')
[/font]
Best wishes,
Phil Factor
April 14, 2009 at 5:04 pm
Hi everybody
Sorry for my late answer but I had a busy day at work...
Thank you for all your investigations and your time!!!
I've not been lazy. I did many tests for different cases and added each single of your posted split functions to my test environment. I'm not sure if I covered all business cases, but I did my best...
I tested three different cases for each function:
* First the initial tested master.sys.sql_modules. Property, very large test especially some of them.
* Second have been also larger data but with much more lines and shorter text in a single line.
* Third test was Lynn's idea. A table with shorter text (NVARCHAR(4000))
Here my final test results
SourceType Description Lines Duration AverageTextLength MaxTextLength
---------- ----------------------------------------------------- ------- ----------- -------------------- --------------------
Large CLR tvf (Paul White) solution 28545 580 5076 73908
Large CLR XML Solution 28545 823 5076 73908
Large Cursor solution 28145 2000 5076 73908
Large Peso solution 28545 2143 5076 73908
Large Phil Factor solution 28577 2703 5076 73908
Large Tally-function 4 solution (has same bug as the rest) 28145 3200 5076 73908
Large Tally-function 3 solution (correct line-splitting) 28545 3290 5076 73908
Large Tally solution 28145 3870 5076 73908
Many CLR tvf (Paul White) solution 157586 1626 3903 12672
Many CLR XML Solution 157586 2590 3903 12672
Many Tally solution 156410 4356 3903 12672
Many Tally-function 4 solution (has same bug as the rest) 156410 5823 3903 12672
Many Tally-function 3 solution (correct line-splitting) 157586 6030 3903 12672
Many Cursor solution 155288 6716 3903 12672
Many Peso solution 157562 7193 3903 12672
Many Phil Factor solution 157677 11163 3903 12672
Short CLR tvf (Paul White) solution 44106 226 2526 3994
Short Tally solution 43306 310 2526 3994
Short CLR XML Solution 44106 473 2526 3994
Short Cursor solution 43306 1080 2526 3994
Short Peso solution 44106 1083 2526 3994
Short Phil Factor solution 44197 1123 2526 3994
Short Tally-function 4 solution (has same bug as the rest) 43306 1750 2526 3994
Short Tally-function 3 solution (correct line-splitting) 44106 1813 2526 3994
I'm still wondering why my Tally solution does not return the correct counts of lines but I will investigate.
Resume
* The winner of the split contest is Pauls CLR table valued function!
* The tally table does a very good work on smaler strings! It even beats my XML based scalar CLR function.
* The XML based CLR function seems also a good approach but creates too much overhead
* The other solutions are also all very good if we keep in mind that we speak about MBs to be splited!
I'm afraid that I can't post the complete script because of the formatting of this page. Firefox takes over 350mb RAM and IE7 just blows with some script failures. I attached the complete test script as text file.
Thanks again to all of you!
Flo
Edit: Reformatted the test results
Viewing 15 posts - 46 through 60 (of 522 total)
You must be logged in to reply to this topic. Login to reply