SQL brain teaser...

  • If my math is correct you can (in theory) split a list of consecutive values from 1 to over 125,000,000 (assuming plain numbers, e.g. 125000000 instead of 125.000.000, separated by commas, no blanks).

    Reason: VARCHAR(MAX) can store up to 2^31-1 bytes, which is just a little over 2 GB.
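    That arithmetic can be sanity-checked by bucketing the values by digit count (my own sketch, not part of the original post; the VALUES row constructor assumes SQL Server 2008 or later):

    ```sql
    --===== Bytes needed for '1,2,3,...,125000000': digits per value plus one comma each
    SELECT TotalBytes = SUM(CAST(Cnt AS BIGINT) * (Digits + 1))
      FROM (VALUES (9,1), (90,2), (900,3), (9000,4), (90000,5),
                   (900000,6), (9000000,7), (90000000,8), (25000001,9)
           ) AS v(Cnt, Digits)
    -- roughly 1.14 billion bytes, comfortably under the 2^31-1 (~2.1 billion byte) limit
    ```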

    So, you should be fine... 🙂

    I just wouldn't try to split a file (edit: string) of 2 GB at all, regardless of the method available... ;-)

    Edit2: ... but if I had to, Lynn's function would be one of the very last ones remaining to try it with...

    If you have any doubt I'd recommend you test it with the max number you expect (let's say 1 to 100) and then test it with 10 times more data (e.g. 1 to 1000).

    Don't forget to compare it to the WHILE loop though... I promise, you'd be amazed!
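    For anyone who wants to see what's being raced against, here is a minimal sketch of the classic WHILE-loop splitter (my illustration, not code from this thread):

    ```sql
    --===== Split a comma-delimited list one element at a time (the slow, RBAR way)
    DECLARE @List VARCHAR(8000), @Pos INT
    DECLARE @Items TABLE (Item VARCHAR(10))
    SET @List = '1,2,3,4,5' + ','           -- trailing comma simplifies the loop
    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(',', @List)
        INSERT INTO @Items (Item) VALUES (SUBSTRING(@List, 1, @Pos - 1))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))  -- one pass per element
    END
    SELECT Item FROM @Items
    ```

    Every element costs an INSERT and two string scans, which is why it falls so far behind the set-based methods as the list grows.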



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Digs (7/1/2009)


    ..Try It..

    I am, but my TEST data set only allows 30 or so selections.

    Are you saying you are not sure if it can handle 100+ selections????

    Please don't be cryptic, thanks :hehe:

    I'm not being cryptic. I'm telling you to test it out. It doesn't matter what your test table has in it. Just create a delimited string and see how it splits it.

    I know what my function can do, I wrote it and tested it.

  • rijiboy (7/1/2009)


    Lynn,

    you are absolutely right, it will not, but the original question was not about speed 😀

    Riz

    BWAA-HAA!!!! The original question IS always about speed, especially when they specifically ask how to avoid the loop. 😉 Why would you tell someone how to write slow code on purpose? :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn,

    Was it you that first came up with the "Base 10" modification on Itzik's cteTally, or someone else? Folks have had so many races on the subject that I don't remember.

    --Jeff Moden



  • Jeff,

    Pretty sure I'm the first one on SSC to use the Base 10. I just thought it reduced the number of cross joins needed to generate a large number of values. Not really sure which is actually faster, as I never really compared them side by side.
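    The cross-join reduction Lynn describes can be sketched like this (my illustration of the general base-10 cascade, not necessarily Lynn's exact code):

    ```sql
    --===== Base 10 cascading CTE: three cross joins reach 10^8 rows,
    --      where a base-2 cascade needs many more levels of doubling
    WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                   SELECT 1 UNION ALL SELECT 1),                -- 10 rows
         E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),         -- 10^2 = 100 rows
         E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),         -- 10^4 = 10,000 rows
         E8(N) AS (SELECT 1 FROM E4 a CROSS JOIN E4 b)          -- 10^8 = 100,000,000 rows
    SELECT TOP (1000000)
           N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM E8
    ```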

  • Lynn Pettis (7/2/2009)


    Jeff,

    Pretty sure I'm the first one SSC to use the Base 10. I just thought it reduced the number of cross joins to generate a large number of values. Not really sure which is actually faster as I never really compared them side by side.

    I haven't finished my testing on it either, but it appears that you may have hit a sweet spot. On my humble 7-year-old 1.8 GHz desktop, it's about two tenths of a second faster on a million rows than Itzik's Base 2 version. Base 8 has comparable speed to your Base 10, and Base 16 is slower, at about the same speed as the Base 2 version. Like I said, I've got some more testing to do, but it looks like you hit a sweet spot.

    --Jeff Moden



  • Jeff Moden (7/2/2009)


    Lynn Pettis (7/2/2009)


    Jeff,

    Pretty sure I'm the first one SSC to use the Base 10. I just thought it reduced the number of cross joins to generate a large number of values. Not really sure which is actually faster as I never really compared them side by side.

    I haven't finished my testing on it either but it appears that you may have hit a sweet spot. On my humble 7 year old 1.8Ghz desktop, it's about 2 tenths of a second faster on a million rows than Itzek's Base 2 version. Base 8 has comparable speed to your base 10 and Base 16 is slower at about the same speed as the Base 2 version. Like I said, I've got some more testing to do but it looks like you hit a sweet spot on it.

    Jeff,

    First, I want to thank you for taking the time to actually test the various methods. It is something that I should perhaps do more often. Now this may seem heretical, but sometimes I just go with my gut feeling on things. Could just be me and how I started in this field as a computer operator, but there are times things just feel right.

  • Lynn Pettis (7/2/2009)


    sometimes I just go with my gut feeling on things

    Heh... welcome to the club. "Use the force, Luke."

    --Jeff Moden



  • WOW.

    Lynn that is some awesome code. Very clever and very fast 🙂

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices
    Numbers / Tally Tables

    SQL-4-Life
  • Christopher Stobbs (7/2/2009)


    WOW.

    Lynn that is some awesome code. Very clever and very fast 🙂

    Thanks, but there may be better out there. I'm still reviewing an article that goes into this even more. Which reminds me, I really need to finish that up shortly and get it back to Flo.

  • Dang it... I thought you were really on to something, Lynn. Still, it's a very cool slice of code you rendered and it will smoke any WHILE loop. Consider, though, that Itzik's fine code still won't beat the Tally table for speed. Unfortunately, as fast as the Base 10 version is, it doesn't either.


    --===== Setup a test variable with 1,000 random "ID's" to be split
    DECLARE @Select VARCHAR(8000)
     SELECT TOP 1000
            @Select = ISNULL(@Select+',','')
                    + CAST(ABS(CHECKSUM(NEWID()))%1000000 AS VARCHAR(10))
       FROM Master.sys.All_Columns ac1
      CROSS JOIN Master.sys.All_Columns ac2
     SELECT @Select, LEN(@Select), RIGHT(@Select,20)

    SET STATISTICS TIME ON

    --===== Do the split using a Tally table and return the index like the splitter function
     SELECT ROW_NUMBER() OVER (ORDER BY t.N) AS ItemID,
            SUBSTRING(',' + @Select, t.N+1, CHARINDEX(',', @Select+',', t.N)-t.N) AS Item
       FROM dbo.Tally t
      WHERE t.N <= LEN(',' + @Select)
        AND SUBSTRING(',' + @Select, t.N, 1) = ','

    --===== Do the split using a CTE splitter in a function
     SELECT * FROM dbo.DelimitedSplit(@Select,',')

    SET STATISTICS TIME OFF


    Here are the results from my 7 year old box...


    (1 row(s) affected)

    (1000 row(s) affected)
    SQL Server Execution Times: (Tally table)
       CPU time = 0 ms, elapsed time = 290 ms.

    (1000 row(s) affected)
    SQL Server Execution Times: (Base 10 CTE function)
       CPU time = 62 ms, elapsed time = 180 ms.

    --Jeff Moden



  • And, yes, Digs... that test shows that a 1000 element parameter was split in less than a heartbeat for both methods. 😉

    --Jeff Moden



  • @jeff - would you mind trying out this little piece of code and letting me know how fast it is on your "computer in a deathbox"? :-D

    Also, would you compare the XML code in section 3 of the article (linked below) and see how fast the Tally table does on your computer taking a 31,000 element comma-delimited string apart? (When I use the Tally table method (that you used above) on that code, I'm only getting 2,000 records returned even though my tally table has 30,000 numbers...??? ) Thanks!

    --===== Do the split using XML
    DECLARE @MyXML XML
    DECLARE @MyStr VARCHAR(MAX) -- needed since I get an error about exceeding 8000 characters in the REPLACE statement below
    SET @MyStr = @Select
    SET @MyXML = '<row><id>' + REPLACE(@MyStr, ',', '</id></row><row><id>') + '</id></row>'
    SELECT ID = x.data.value('id[1]', 'int')
      FROM @MyXML.nodes('/row') AS x(data)

    Yes, I know I've used SQL 2005 code, and this is a SQL 2000 forum. I just want to get a comparison, since every other method was used above. Edit: including Lynn's, which is 2005 code also.

    BTW, the XML method runs in 20-30ms on my computer... but then, the tally table method runs consistently at 3ms. I've used the XML method on some strings with > 30,000 elements and still get excellent response. In my experience, the larger the string gets, the better XML runs compared to other methods.

    For more information on this XML method, please see my article Using XML to Enhance the Performance of String Manipulations

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Okay, I pushed my tally table from 30,000 to 1,000,000 records. (That also explains the earlier mystery: 30,000 tally numbers / (14-char acct# + comma) = only 2,000 elements covered.) 😀

    I get these results when running the code in my article (31,465 account numbers from an AdventureWorks table; code copied below), running each test 5 times:

    XML: 1312/1139/1163/1099/1097 ms.

    Tally: 1297/1352/1333/1285/1254 ms.

    Delta: 15/213/170/186/157

    Tally table method "won" once, XML the other four times.

    Like I said previously, in my experience, XML wins when dealing with larger strings.

    I'd still like to see what kind of results Jeff gets on this test on his old box...

    Here's the code I ran:

    USE AdventureWorks
    GO

    -- create comma-delimited string with the XML PATH('') method
    DECLARE @CSV VARCHAR(MAX)
    SELECT @CSV = (SELECT ',' + AccountNumber FROM Sales.SalesOrderHeader FOR XML PATH(''))
    SET @CSV = SUBSTRING(@CSV, 2, LEN(@CSV)-1)

    SET STATISTICS TIME ON

    PRINT '---- XML ----'
    -- convert the CSV string into a valid XML string
    DECLARE @MyXMLData XML
    SET @MyXMLData = '<Rows><Row><AccountNumber>' +
                     REPLACE(@CSV, ',', '</AccountNumber></Row><Row><AccountNumber>') +
                     '</AccountNumber></Row></Rows>'
    SELECT x.item.value('AccountNumber[1]','nvarchar(15)')
      FROM @MyXMLData.nodes('/Rows/Row') AS x(item)

    PRINT '---- TALLY ----'
    SELECT ROW_NUMBER() OVER (ORDER BY t.Number) AS ItemID,
           SUBSTRING(',' + @CSV, t.Number+1, CHARINDEX(',', @CSV+',', t.Number)-t.Number) AS Item
      FROM IMS.dbo.Numbers t
     WHERE t.Number <= LEN(',' + @CSV)
       AND SUBSTRING(',' + @CSV, t.Number, 1) = ','

    SET STATISTICS TIME OFF
    GO 5

    Wayne

  • And, just to be complete, I ran this test again including Lynn's CTE method (DelimitedSplit):

    XML: 1571/1100/1350/1134/1130

    Tally: 1429/1313/1306/1317/1288

    CTE: 6283/6221/6292/6445/6569

    Wayne

Viewing 15 posts - 16 through 30 (of 70 total)
