SQL brain teaser...

  • If my math is correct you can (in theory) split a list of consecutive values from 1 to over 125,000,000 (assuming plain numbers, e.g. 125000000 instead of 125.000.000, separated by commas, no blanks).

    Reason: VARCHAR(MAX) can store up to 2^31-1 bytes, which is just a little over 2 GB.
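    That arithmetic can be sanity-checked by bucketing the values by digit count (my own sketch, not part of the original post; the VALUES row constructor assumes SQL Server 2008 or later):

    ```sql
    --===== Bytes needed for '1,2,3,...,125000000': digits per value plus one comma each
    SELECT TotalBytes = SUM(CAST(Cnt AS BIGINT) * (Digits + 1))
      FROM (VALUES (9,1), (90,2), (900,3), (9000,4), (90000,5),
                   (900000,6), (9000000,7), (90000000,8), (25000001,9)
           ) AS v(Cnt, Digits)
    -- roughly 1.14 billion bytes, comfortably under the 2^31-1 (~2.1 billion byte) limit
    ```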

    So, you should be fine... 🙂

    I just wouldn't try to split a file (edit: string) of 2 GB at all, regardless of the method available... ;-)

    Edit2: ... but if I had to, Lynn's function would be one of the very last ones remaining to try it with...

    If you have any doubt I'd recommend you test it with the max number you expect (let's say 1 to 100) and then test it with 10 times more data (e.g. 1 to 1000).

    Don't forget to compare it to the WHILE loop though... I promise, you'd be amazed!
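    For anyone who wants to see what's being raced against, here is a minimal sketch of the classic WHILE-loop splitter (my illustration, not code from this thread):

    ```sql
    --===== Split a comma-delimited list one element at a time (the slow, RBAR way)
    DECLARE @List VARCHAR(8000), @Pos INT
    DECLARE @Items TABLE (Item VARCHAR(10))
    SET @List = '1,2,3,4,5' + ','           -- trailing comma simplifies the loop
    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(',', @List)
        INSERT INTO @Items (Item) VALUES (SUBSTRING(@List, 1, @Pos - 1))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))  -- one pass per element
    END
    SELECT Item FROM @Items
    ```

    Every element costs an INSERT and two string scans, which is why it falls so far behind the set-based methods as the list grows.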



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Digs (7/1/2009)


    ..Try It..

    I am, but my TEST data set only allows 30 or so selections.

    Are you saying you are not sure if it can handle 100+ selections????

    Please don't be cryptic, thanks :hehe:

    I'm not being cryptic. I'm telling you to test it out. It doesn't matter what your test table has in it. Just create a delimited string and see how it splits it.

    I know what my function can do, I wrote it and tested it.

  • rijiboy (7/1/2009)


    Lynn,

    you are absolutely right, it will not, but the original question was not about speed 😀

    Riz

    BWAA-HAA!!!! The original question IS always about speed, especially when they specifically ask how to avoid the loop. 😉 Why would you tell someone how to write slow code on purpose? :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn,

    Was it you that first came up with the "Base 10" modification on Itzik's cteTally, or someone else? Folks have had so many races on the subject that I don't remember.

    --Jeff Moden



  • Jeff,

    Pretty sure I'm the first one on SSC to use the Base 10. I just thought it reduced the number of cross joins needed to generate a large number of values. Not really sure which is actually faster, as I never really compared them side by side.
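    The cross-join reduction Lynn describes can be sketched like this (my illustration of the general base-10 cascade, not necessarily Lynn's exact code):

    ```sql
    --===== Base 10 cascading CTE: three cross joins reach 10^8 rows,
    --      where a base-2 cascade needs many more levels of doubling
    WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                   SELECT 1 UNION ALL SELECT 1),                -- 10 rows
         E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),         -- 10^2 = 100 rows
         E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),         -- 10^4 = 10,000 rows
         E8(N) AS (SELECT 1 FROM E4 a CROSS JOIN E4 b)          -- 10^8 = 100,000,000 rows
    SELECT TOP (1000000)
           N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM E8
    ```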

  • Lynn Pettis (7/2/2009)


    Jeff,

    Pretty sure I'm the first one SSC to use the Base 10. I just thought it reduced the number of cross joins to generate a large number of values. Not really sure which is actually faster as I never really compared them side by side.

    I haven't finished my testing on it either, but it appears that you may have hit a sweet spot. On my humble 7-year-old 1.8 GHz desktop, it's about two tenths of a second faster on a million rows than Itzik's Base 2 version. Base 8 has comparable speed to your Base 10, and Base 16 is slower, at about the same speed as the Base 2 version. Like I said, I've got some more testing to do, but it looks like you hit a sweet spot.

    --Jeff Moden



  • Jeff Moden (7/2/2009)


    Lynn Pettis (7/2/2009)


    Jeff,

    Pretty sure I'm the first one SSC to use the Base 10. I just thought it reduced the number of cross joins to generate a large number of values. Not really sure which is actually faster as I never really compared them side by side.

    I haven't finished my testing on it either but it appears that you may have hit a sweet spot. On my humble 7 year old 1.8Ghz desktop, it's about 2 tenths of a second faster on a million rows than Itzek's Base 2 version. Base 8 has comparable speed to your base 10 and Base 16 is slower at about the same speed as the Base 2 version. Like I said, I've got some more testing to do but it looks like you hit a sweet spot on it.

    Jeff,

    First, I want to thank you for taking the time to actually test the various methods. It is something that I should perhaps do more often. Now this may seem heretical, but sometimes I just go with my gut feeling on things. Could just be me and how I started in this field as a computer operator, but there are times things just feel right.

  • Lynn Pettis (7/2/2009)


    sometimes I just go with my gut feeling on things

    Heh... welcome to the club. "Use the force, Luke."

    --Jeff Moden



  • WOW.

    Lynn that is some awesome code. Very clever and very fast 🙂

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices
    Numbers / Tally Tables

    SQL-4-Life
  • Christopher Stobbs (7/2/2009)


    WOW.

    Lynn that is some awesome code. Very clever and very fast 🙂

    Thanks, but there may be better out there. I'm still reviewing an article that goes into this even more. Which reminds me, I really need to finish that up shortly and get it back to Flo.

  • Dang it... I thought you were really on to something, Lynn. Still, it's a very cool slice of code you rendered and it will smoke any WHILE loop. Consider, though, that Itzik's fine code still won't beat the Tally table for speed. Unfortunately, as fast as the Base 10 version is, it doesn't either.


    --===== Setup a test variable with 1,000 random "ID's" to be split
    DECLARE @Select VARCHAR(8000)
     SELECT TOP 1000
            @Select = ISNULL(@Select+',','')
                    + CAST(ABS(CHECKSUM(NEWID()))%1000000 AS VARCHAR(10))
       FROM Master.sys.All_Columns ac1
      CROSS JOIN Master.sys.All_Columns ac2
     SELECT @Select, LEN(@Select), RIGHT(@Select,20)

    SET STATISTICS TIME ON

    --===== Do the split using a Tally table and return the index like the splitter function
     SELECT ROW_NUMBER() OVER (ORDER BY t.N) AS ItemID,
            SUBSTRING(',' + @Select, t.N+1, CHARINDEX(',', @Select+',', t.N)-t.N) AS Item
       FROM dbo.Tally t
      WHERE t.N <= LEN(',' + @Select)
        AND SUBSTRING(',' + @Select, t.N, 1) = ','

    --===== Do the split using a CTE splitter in a function
     SELECT * FROM dbo.DelimitedSplit(@Select,',')

    SET STATISTICS TIME OFF


    Here are the results from my 7 year old box...


    (1 row(s) affected)

    (1000 row(s) affected)
    SQL Server Execution Times: (Tally table)
       CPU time = 0 ms, elapsed time = 290 ms.

    (1000 row(s) affected)
    SQL Server Execution Times: (Base 10 CTE function)
       CPU time = 62 ms, elapsed time = 180 ms.

    --Jeff Moden



  • And, yes, Digs... that test shows that a 1000 element parameter was split in less than a heartbeat for both methods. 😉

    --Jeff Moden



  • @jeff - would you mind trying out this little piece of code and letting me know how fast it is on your "computer in a deathbox"? :-D

    Also, would you compare the XML code in section 3 of the article (linked below) and see how fast the Tally table does on your computer taking a 31,000 element comma-delimited string apart? (When I use the Tally table method (that you used above) on that code, I'm only getting 2,000 records returned even though my tally table has 30,000 numbers...??? ) Thanks!

    --===== Do the split using XML
    DECLARE @MyXML XML
    DECLARE @MyStr VARCHAR(MAX) -- needed since I get an error about exceeding 8000 characters in the REPLACE statement below
    SET @MyStr = @Select
    SET @MyXML = '<row><id>' + REPLACE(@MyStr, ',', '</id></row><row><id>') + '</id></row>'
    SELECT ID = x.data.value('id[1]', 'int')
      FROM @MyXML.nodes('/row') AS x(data)

    Yes, I know I've used SQL 2005 code, and this is a SQL 2000 forum. I just want to get a comparison, since every other method was used above. Edit: including Lynn's, which is 2005 code also.

    BTW, the XML method runs in 20-30ms on my computer... but then, the tally table method runs consistently at 3ms. I've used the XML method on some strings with > 30,000 elements and still get excellent response. In my experience, the larger the string gets, the better XML runs compared to other methods.

    For more information on this XML method, please see my article Using XML to Enhance the Performance of String Manipulations

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Okay, I pushed my tally table from 30,000 to 1,000,000 records. (That also explains the earlier mystery: 30,000 tally numbers / (14-char acct# + comma) = only 2,000 elements covered.) 😀

    I get these results when running the code in my article (31,465 account numbers from an AdventureWorks table; code copied below), running each test 5 times:

    XML: 1312/1139/1163/1099/1097 ms.

    Tally: 1297/1352/1333/1285/1254 ms.

    Delta: 15/213/170/186/157

    Tally table method "won" once, XML the other four times.

    Like I said previously, in my experience, XML wins when dealing with larger strings.

    I'd still like to see what kind of results Jeff gets on this test on his old box...

    Here's the code I ran:

    USE AdventureWorks
    GO

    -- create comma-delimited string with the XML PATH('') method
    DECLARE @CSV VARCHAR(MAX)
    SELECT @CSV = (SELECT ',' + AccountNumber FROM Sales.SalesOrderHeader FOR XML PATH(''))
    SET @CSV = SUBSTRING(@CSV, 2, LEN(@CSV)-1)

    SET STATISTICS TIME ON

    PRINT '---- XML ----'
    -- convert the CSV string into a valid XML string
    DECLARE @MyXMLData XML
    SET @MyXMLData = '<Rows><Row><AccountNumber>' +
                     REPLACE(@CSV, ',', '</AccountNumber></Row><Row><AccountNumber>') +
                     '</AccountNumber></Row></Rows>'
    SELECT x.item.value('AccountNumber[1]','nvarchar(15)')
      FROM @MyXMLData.nodes('/Rows/Row') AS x(item)

    PRINT '---- TALLY ----'
    SELECT ROW_NUMBER() OVER (ORDER BY t.Number) AS ItemID,
           SUBSTRING(',' + @CSV, t.Number+1, CHARINDEX(',', @CSV+',', t.Number)-t.Number) AS Item
      FROM IMS.dbo.Numbers t
     WHERE t.Number <= LEN(',' + @CSV)
       AND SUBSTRING(',' + @CSV, t.Number, 1) = ','

    SET STATISTICS TIME OFF
    GO 5

    Wayne

  • And, just to be complete, I ran this test again including Lynn's CTE method (DelimitedSplit):

    XML: 1571/1100/1350/1134/1130

    Tally: 1429/1313/1306/1317/1288

    CTE: 6283/6221/6292/6445/6569

    Wayne

Viewing 15 posts - 16 through 30 (of 70 total)
