December 18, 2012 at 11:10 pm
Let's try a few of these solutions in a small test harness to see what we can see, shall we?
CREATE TABLE #Strings (ID INT IDENTITY, MyString VARCHAR(8000))
;WITH Tally (n) AS (
SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #Strings
SELECT CAST((
SELECT CAST(n AS VARCHAR(5)) + ';' + CAST(40000+n AS VARCHAR(6)) + '|'
FROM Tally
FOR XML PATH('')) AS VARCHAR(8000))
FROM Tally
DECLARE @Holder1 VARCHAR(8000), @Holder2 VARCHAR(8000)
PRINT 'DelimitedSplit8K (Lynn Pettis)'
SET STATISTICS TIME ON
select
@Holder1=max(case ds2.ItemNumber when 1 then ds2.Item else '' end), --as field1,
@Holder2=max(case ds2.ItemNumber when 2 then ds2.Item else '' end) --as field2
from
#Strings
cross apply dbo.DelimitedSplit8K(MyString,'|') ds1
cross apply dbo.DelimitedSplit8K(ds1.Item,';') ds2
where
ds1.Item <> ''
group by
ID, ds1.ItemNumber;
SET STATISTICS TIME OFF
PRINT 'PatternSplitCM (Dwain.C)'
SET STATISTICS TIME ON
SELECT @Holder1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,@Holder2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM #Strings
CROSS APPLY PatternSplitCM(MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ID, ItemNumber/4
--OPTION (MAXDOP 1)
SET STATISTICS TIME OFF
PRINT 'DelimitedSplit8K (Steven Willis)'
SET STATISTICS TIME ON
SELECT
@Holder1=(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,';') a WHERE ItemNumber = 1) --AS Col1
,@Holder2=(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,';') a WHERE ItemNumber = 2) --AS Col2
FROM
(
SELECT
ID, dsk1.Item
FROM #Strings a
CROSS APPLY
dbo.DelimitedSplit8K(MyString,'|') AS dsk1
) r1
SET STATISTICS TIME OFF
DROP TABLE #Strings
Results:
DelimitedSplit8K (Lynn Pettis)
SQL Server Execution Times:
CPU time = 280 ms, elapsed time = 164 ms.
PatternSplitCM (Dwain.C)
SQL Server Execution Times:
CPU time = 141 ms, elapsed time = 13946 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
DelimitedSplit8K (Steven Willis)
SQL Server Execution Times:
CPU time = 531 ms, elapsed time = 703 ms.
Lynn clearly wins on elapsed time, but PatternSplitCM wins on CPU time. Not sure why its elapsed time is trailing the pack by such a wide margin though.
I didn't include the XML solution because it looked a bit complicated to fit into the harness but please feel free to give it a try.
Also I'm not 100% sure I adapted Steven's solution to the table correctly as it was returning some rows with NULL values.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 18, 2012 at 11:18 pm
mtassin (12/17/2012)
murthyvs (12/16/2012)
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!
The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle
100,000,000 columns of data for searching for delimitters.
Yes, I noticed that. I created a E5 and changed 10,000 to 100,000.
December 18, 2012 at 11:34 pm
Results:
DelimitedSplit8K (Lynn Pettis)
SQL Server Execution Times:
CPU time = 280 ms, elapsed time = 133 ms.
PatternSplitCM (Dwain.C)
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 11653 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
DelimitedSplit8K (Steven Willis)
SQL Server Execution Times:
CPU time = 468 ms, elapsed time = 717 ms.
Lynn clearly wins on elapsed time, but PatternSplitCM smokes all contenders on CPU time. Not sure why its elapsed time is trailing the pack by such a wide margin though.
I didn't include the XML solution because it looked a bit complicated to fit into the harness but please feel free to give it a try.
Thanks for all your help and suggestions. I finally chose DelimitedSplit8K - Lynn Pettis method. I tried all methods but for some magical reason my stored proc really runs fast/goes well with DelimitedSplit8K- Lynn Pettis method. For now, I am constraining the user to 250 records (each of the fields can have up to 10 characters) until I figure a way to import the excel file into sql server from my web application (right now the user just copy-pastes the two columns from excel file into a multiline text box).
December 19, 2012 at 5:21 pm
dwain.c (12/18/2012)
Also I'm not 100% sure I adapted Steven's solution to the table correctly as it was returning some rows with NULL values.
The answer to this is the delimiter appearing at the end of the test strings. Remove it and I think Steven's code will work exactly like the others. I wouldn't expect the timings to be changed much though.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 19, 2012 at 7:05 pm
Hi, I had a feeling that I had forgotten something important about the XML method...then I remembered...
My revised code (suitable for the test harness):
PRINT 'XML (Mister Magoo)'
SET STATISTICS TIME ON
SELECT -- XML (Mister Magoo)
@Holder1= nd.value('./@a','varchar(200)') ,
@Holder2= nd.value('./@b','varchar(200)')
FROM (
SELECT CAST(
'<c a="'+
REPLACE(
REPLACE(
MyString,
'|',
'"/><c a="'
),
';',
'" b="'
)
+'"/>' AS XML
).query('.') FROM #Strings
FOR XML PATH(''),TYPE
) AS src(nds)
CROSS APPLY nds.nodes('c') AS x(nd)
WHERE nd.exist('./@b')=1
SET STATISTICS TIME OFF
edit:removed unnecessary comment :doze:
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 19, 2012 at 7:11 pm
"Oh, Magoo! You've done it again!"
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 19, 2012 at 7:13 pm
dwain.c (12/19/2012)
"Oh, Magoo! You've done it again!"
Maybe 😉 I am just testing other data patterns...
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 19, 2012 at 7:19 pm
mister.magoo (12/19/2012)
dwain.c (12/19/2012)
"Oh, Magoo! You've done it again!"Maybe 😉 I am just testing other data patterns...
You know the really funny thing. I made the discovery of that approach independently about a year ago while starting out learning about and playing with XML. Found that it wasn't unique after a Google search. But the thing was, I never figured it was going to be very good in the performance department for some reason. So I never actually used it for anything.
Guess I'll need to rethink that.
How many people do you think will actually remember Mr. Magoo's catch phrase?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 19, 2012 at 7:28 pm
Dwain,
Performance wise, the main "speed ups" in this version are the use of our old friend FOR XML PATH('') to bolt everything into one XML fragment (means only handling one XML fragment rather than one per row) and the seemingly innocuous ".query('.')"
See This Brad Schulz Blog Post for more on this technique (not a lot more, but extra validation of it's efficacy)...
I also chose to use a single node per pair of values instead of the row/col approach I had previously - although I haven't tested whether this gave a speed boost or not - however this would not work so well if the data had more than two values per row.
edit:spelling
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 19, 2012 at 7:52 pm
And the nice thing is this method works nicely on varchar(MAX)...
I used 2000 rows of 2000 pairs of values to test :
Duration_AvgCpu_AvgReads_AvgWrites_AvgRowCounts_Avg
106994.000000985614196918 36806 4000000
Of course, I couldn't compare this to the other methods because they don't handle MAX in their current forms, but I did test them at just 200 rows/200 pairs per row and they were getting very slow.
I think 107 seconds to split out 4 million rows is quite good though 😀
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 19, 2012 at 8:36 pm
mister.magoo (12/19/2012)
And the nice thing is this method works nicely on varchar(MAX)...I used 2000 rows of 2000 pairs of values to test :
Duration_AvgCpu_AvgReads_AvgWrites_AvgRowCounts_Avg
106994.000000985614196918 36806 4000000
Of course, I couldn't compare this to the other methods because they don't handle MAX in their current forms, but I did test them at just 200 rows/200 pairs per row and they were getting very slow.
I think 107 seconds to split out 4 million rows is quite good though 😀
Very interesting and thanks for the Brad Schultz link.
I have a vague recollection of some recent discussion in Jeff's DelimitedSplit8K article that may have also touched on this approach. That discussion thread is pretty huge but it would be in the last 2-3 pages of it I think. Can't recall if it was suggesting to use this for VARCHAR(MAX) or not though. Maybe something else to investigate or maybe even something you'd like to add to the discussion.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 19, 2012 at 9:07 pm
mister.magoo (12/19/2012)
Hi, I had a feeling that I had forgotten something important about the XML method...then I remembered...
N-i-i-i-i-i-c-c-c-c-e-e-e ! ! ! ! !
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2012 at 10:14 pm
I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2012 at 10:17 pm
Jeff Moden (12/19/2012)
I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).
I would be really interested to hear if you've got any ideas why PatternSplitCM is doing reasonably well on the CPU side but sucking so badly on elapsed time.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 20, 2012 at 1:34 am
Jeff Moden (12/19/2012)
I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).
Having read recently about the problems with using set statistics time on, I used a server side trace to measure when I was testing.
I also saw some strange results, mostly from Lynn's solution which, for some reason would sometimes have terrible elapsed time, but I took that to be environmental as it was seemingly random.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 31 through 45 (of 57 total)
You must be logged in to reply to this topic. Login to reply