July 1, 2009 at 5:03 pm
If my math is correct, you can (in theory) split a list of consecutive values from 1 to over 125.000.000 (assuming plain numbers, e.g. 125000000 instead of 125.000.000, separated by commas, no blanks).
Reason: VARCHAR(MAX) can store up to 2^31-1 bytes, which is just a little over 2 GB.
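If you want to sanity-check that estimate yourself, a rough script along these lines (purely illustrative; the exact cutoff isn't the point) totals up the characters needed for a comma-delimited list of 1..N and compares it to the VARCHAR(MAX) limit:
DECLARE @N BIGINT, @Low BIGINT, @High BIGINT, @Digits INT, @Total BIGINT
 SELECT @N = 125000000, @Low = 1, @Digits = 1, @Total = 0

  WHILE @Low <= @N
  BEGIN
        -- numbers with @Digits digits run from @Low to (@Low * 10 - 1), capped at @N
        SET @High  = CASE WHEN @Low * 10 - 1 < @N THEN @Low * 10 - 1 ELSE @N END
        SET @Total = @Total + (@High - @Low + 1) * (@Digits + 1) -- digits plus a trailing comma
     SELECT @Low = @Low * 10, @Digits = @Digits + 1
  END

 SELECT @Total - 1    AS TotalChars,      -- no comma after the last value
        2147483647    AS VarcharMaxBytes  -- 2^31 - 1
Run it with larger @N values and you can see roughly where the 2 GB ceiling kicks in.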
So, you should be fine... 🙂
I just wouldn't try to split a file (edit: string) of 2 GB at all, regardless of the method available... ;-)
Edit2: ... but if I had to, Lynn's function would be one of the very last ones remaining to try it with...
If you have any doubt I'd recommend you test it with the max number you expect (let's say 1 to 100) and then test it with 10 times more data (e.g. 1 to 1000).
Don't forget to compare it to the WHILE loop though... I promise, you'd be amazed!
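And just so it's clear what that baseline looks like, a typical WHILE-loop splitter (a generic sketch, not anyone's production code from this thread) goes something like this:
DECLARE @List VARCHAR(8000), @Delim CHAR(1), @Pos INT
DECLARE @Results TABLE (ItemID INT IDENTITY(1,1), Item VARCHAR(8000))

 SELECT @List = '10,20,30,40,50', @Delim = ','

  WHILE LEN(@List) > 0
  BEGIN
        SET @Pos  = CHARINDEX(@Delim, @List + @Delim)       -- treat the end of the string as a delimiter
     INSERT INTO @Results (Item) VALUES (LEFT(@List, @Pos - 1))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))  -- keep everything after the delimiter
  END

 SELECT ItemID, Item FROM @Results
One INSERT per element is exactly why this approach falls behind as the string grows.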
July 1, 2009 at 5:08 pm
Digs (7/1/2009)
..Try It.. I am, but my TEST data set only allows 30 or so selections..
Are you saying you are not sure if it can handle 100+ selections????
Please don't be cryptic, thanks :hehe:
I'm not being cryptic. I'm telling you to test it out. It doesn't matter what your test table has in it. Just create a delimited string and see how it splits it.
I know what my function can do, I wrote it and tested it.
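If it helps, a quick smoke test along those lines might look like the snippet below. It borrows the string-building trick Jeff uses later in this thread and assumes the dbo.DelimitedSplit(@string, @delimiter) signature shown there; adjust the call to match your copy of the function:
-- Build a 200-element comma-delimited string of random numbers...
DECLARE @Parameter VARCHAR(8000)
 SELECT TOP 200
        @Parameter = ISNULL(@Parameter + ',', '')
                   + CAST(ABS(CHECKSUM(NEWID())) % 100000 AS VARCHAR(10))
   FROM Master.sys.All_Columns

-- ...confirm how many elements it contains...
 SELECT LEN(@Parameter) - LEN(REPLACE(@Parameter, ',', '')) + 1 AS ElementCount

-- ...and see how the splitter handles it
 SELECT * FROM dbo.DelimitedSplit(@Parameter, ',')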
July 1, 2009 at 10:01 pm
rijiboy (7/1/2009)
Lynn, you are absolutely right, it will not, but the original question was not about speed 😀
Riz
BWAA-HAA!!!! The original question IS always about speed, especially when they specifically ask how to avoid the loop. 😉 Why would you tell someone how to write slow code on purpose? :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
July 1, 2009 at 11:22 pm
Lynn,
Was it you who first came up with the "Base 10" modification of Itzik's cteTally, or someone else? Folks have had so many races on the subject that I don't remember.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2009 at 12:20 am
Jeff,
Pretty sure I'm the first one on SSC to use the Base 10. I just thought it reduced the number of cross joins needed to generate a large number of values. Not really sure which is actually faster, as I never really compared them side by side.
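For anyone following along, the "Base 10" idea is a cascaded-CTE tally where each level CROSS JOINs ten-row sets instead of two-row sets, so far fewer levels are needed. A rough sketch of the shape (just an illustration of the idea, not necessarily Lynn's exact code):
WITH E1(N) AS ( -- 10 rows
     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10^2 = 100 rows
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10^4 = 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10^8 = 100,000,000 rows (only as many as needed are generated)
cteTally(N) AS (
     SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
       FROM E8
)
SELECT N FROM cteTally
A Base 2 version starts from two-row sets and needs several more cascaded cross joins to reach the same row counts, which is the reduction Lynn is describing.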
July 2, 2009 at 5:52 am
Lynn Pettis (7/2/2009)
Jeff, pretty sure I'm the first one on SSC to use the Base 10. I just thought it reduced the number of cross joins needed to generate a large number of values. Not really sure which is actually faster, as I never really compared them side by side.
I haven't finished my testing on it either, but it appears that you may have hit a sweet spot. On my humble 7-year-old 1.8GHz desktop, it's about two tenths of a second faster on a million rows than Itzik's Base 2 version. Base 8 has comparable speed to your Base 10, and Base 16 is slower, running at about the same speed as the Base 2 version. Like I said, I've got some more testing to do, but it looks like you hit a sweet spot on it.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2009 at 7:30 am
Jeff Moden (7/2/2009)
Lynn Pettis (7/2/2009)
Jeff, pretty sure I'm the first one on SSC to use the Base 10. I just thought it reduced the number of cross joins needed to generate a large number of values. Not really sure which is actually faster, as I never really compared them side by side.
I haven't finished my testing on it either, but it appears that you may have hit a sweet spot. On my humble 7-year-old 1.8GHz desktop, it's about two tenths of a second faster on a million rows than Itzik's Base 2 version. Base 8 has comparable speed to your Base 10, and Base 16 is slower, running at about the same speed as the Base 2 version. Like I said, I've got some more testing to do, but it looks like you hit a sweet spot on it.
Jeff,
First, I want to thank you for taking the time to actually test the various methods. It is something that I should perhaps do more often. Now this may seem heretical, but sometimes I just go with my gut feeling on things. Could just be me and how I started in this field as a computer operator, but there are times things just feel right.
July 2, 2009 at 9:45 am
Lynn Pettis (7/2/2009)
sometimes I just go with my gut feeling on things
Heh... welcome to the club. "Use the force, Luke."
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2009 at 10:05 am
WOW.
Lynn, that is some awesome code. Very clever and very fast 🙂
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices
Numbers / Tally Tables
July 2, 2009 at 10:10 am
Christopher Stobbs (7/2/2009)
WOW. Lynn, that is some awesome code. Very clever and very fast 🙂
Thanks, but there may be better out there. I'm still reviewing an article that goes into this even more. Which reminds me, I really need to finish that up shortly and get it back to Flo.
July 3, 2009 at 7:49 pm
Dang it... I thought you were really on to something, Lynn. Still, a very cool slice of code you rendered that will smoke any WHILE loop. Consider that Itzik's fine code still won't beat the Tally table for speed. Unfortunately, as fast as the Base 10 version is, it doesn't either.
DECLARE @Select VARCHAR(8000)

 SELECT TOP 1000
        @Select = ISNULL(@Select+',','')
                + CAST(ABS(CHECKSUM(NEWID()))%1000000 AS VARCHAR(10))
   FROM Master.sys.All_Columns ac1
  CROSS JOIN Master.sys.All_Columns ac2

 SELECT @Select, LEN(@Select), RIGHT(@Select,20)

SET STATISTICS TIME ON

--===== Do the split using a Tally table and return the index like the splitter function
 SELECT ROW_NUMBER() OVER (ORDER BY t.N) AS ItemID,
        SUBSTRING(',' + @Select, N+1, CHARINDEX(',', @Select+',', N)-N) AS Item
   FROM dbo.Tally t
  WHERE N <= LEN(',' + @Select)
    AND SUBSTRING(',' + @Select, t.N, 1) = ','

--===== Do the split using a CTE splitter in a function
 SELECT * FROM dbo.DelimitedSplit(@Select,',')

SET STATISTICS TIME OFF
(1000 row(s) affected)
SQL Server Execution Times: (Tally table)
CPU time = 0 ms, elapsed time = 290 ms.
(1000 row(s) affected)
SQL Server Execution Times: (Base 10 CTE function)
CPU time = 62 ms, elapsed time = 180 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 3, 2009 at 7:50 pm
And, yes, Digs... that test shows that a 1000 element parameter was split in less than a heartbeat for both methods. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 3, 2009 at 9:14 pm
@jeff - would you mind trying out this little piece of code and letting me know how fast it is on your "computer in a deathbox"? :-D
Also, would you compare the XML code in section 3 of the article (linked below) and see how fast the Tally table does on your computer taking a 31,000 element comma-delimited string apart? (When I use the Tally table method (that you used above) on that code, I'm only getting 2,000 records returned even though my tally table has 30,000 numbers...??? ) Thanks!
--===== Do the split using XML
declare @MyXML XML
declare @MyStr varchar(max) -- needed since I get an error about exceeding 8000 characters in the replace statement below
set @MyStr = @Select
set @MyXML = '<row><id>' + replace(@MyStr, ',', '</id></row><row><id>') + '</id></row>'

select ID = x.data.value('id[1]', 'int')
  from @MyXML.nodes('/row') AS x(data)
Yes, I know I've used SQL 2005 code, and this is a SQL 2000 forum. I just want to get a comparison, since every other method was used above. Edit: including Lynn's, which is 2005 code also.
BTW, the XML method runs in 20-30ms on my computer... but then, the tally table method runs consistently at 3ms. I've used the XML method on some strings with > 30,000 elements and still get excellent response. In my experience, the larger the string gets, the better XML runs compared to other methods.
For more information on this XML method, please see my article Using XML to Enhance the Performance of String Manipulations
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 3, 2009 at 9:54 pm
Okay, I pushed my tally table from 30,000 to 1,000,000 records (which also explains the 2,000 records earlier: 30,000 tally numbers / (14-char acct# + comma) only covers 2,000 elements)..... 😀
I get these results when running the code in my article (31,465 account numbers from an AdventureWorks table, code copied below), with each method run 5 times:
XML: 1312/1139/1163/1099/1097 ms.
Tally: 1297/1352/1333/1285/1254 ms.
Delta: 15/213/170/186/157
Tally table method "won" once, XML the other four times.
Like I said previously, in my experience, XML wins when dealing with larger strings.
I'd still like to see what kind of results Jeff gets on this test on his old box...
Here's the code I ran:
use AdventureWorks
GO
-- create comma-delimited string with XML PATH('') method
declare @CSV varchar(max)
select @CSV = (select ',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV)-1)

set statistics time on

print '---- XML ----'
-- convert the CSV string into a valid XML string
declare @MyXMLData XML
set @MyXMLData = '<Rows><Row><AccountNumber>' +
                 replace(@CSV, ',', '</AccountNumber></Row><Row><AccountNumber>') +
                 '</AccountNumber></Row></Rows>'

select x.item.value('AccountNumber[1]','nvarchar(15)')
  from @MyXMLData.nodes('/Rows/Row') AS x(item)

print '---- TALLY ----'
SELECT ROW_NUMBER() OVER (ORDER BY t.Number) AS ItemID,
       SUBSTRING(',' + @CSV, Number+1, CHARINDEX(',', @CSV+',', Number)-Number) AS Item
  FROM IMS.dbo.Numbers t
 WHERE Number <= LEN(',' + @CSV)
   AND SUBSTRING(',' + @CSV, t.Number, 1) = ','

set statistics time off
GO 5
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 3, 2009 at 10:09 pm
And, just to be complete, I ran this test again including Lynn's CTE method (DelimitedSplit):
XML: 1571/1100/1350/1134/1130 ms.
Tally: 1429/1313/1306/1317/1288 ms.
CTE: 6283/6221/6292/6445/6569 ms.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes