SQL Split query

  • Mario Garcia (7/6/2009)


    I have used a function to parse data. It can be altered some to fit your needs.

    You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.

    Here is a post with one example of an alternative method as well.

  • Lynn Pettis (7/6/2009)


    Mario Garcia (7/6/2009)


    I have used a function to parse data. It can be altered some to fit your needs.

    You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.

    Here is a post with one example of an alternative method as well.

    I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (7/6/2009)


    Lynn Pettis (7/6/2009)


    Mario Garcia (7/6/2009)


    I have used a function to parse data. It can be altered some to fit your needs.

    You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.

    Here is a post with one example of an alternative method as well.

    I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.

    I concur with Wayne on this as well. I just didn't say it as my previous post was the third attempt at getting a post to work. Not sure, but something is screwy here at work as I don't have these issues at home.

  • Here is a 48-page long topic about a various attempts to write a good and decent split function

    http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx


    N 56°04'39.16"
    E 12°55'05.25"

  • sanjeev40084 (7/6/2009)


    see if this works:

    http://sqlservercode.blogspot.com/2005/09/split-comma-delimited-string-fast.html%5B/quote%5D

    Heh... that blog contains the same method we've already talked about in this thread.... except it's usually just called a "Numbers" or "Tally" table instead of "NumbersPivot".

    Since many demonstrate it's usage but few explain how it actually works, please see the following article to find out...

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I found this a while back. I'm sure you can google the coder's name and see what terms it's released under.

    CREATE FUNCTION [dbo].[fn_SplitAndRemovePunctuation](@s VARCHAR(8000))

    returns @out TABLE (wrd VARCHAR(8000))

    AS

    BEGIN

    /**********************************************************

    Written By : yousef ekhtiari

    Email :yousef_ekhtiari@Hotmail.com

    Create Date : 24 May 2007

    Modified Date :

    Description : Extracting words of a string with removal

    of unwanted punctuations

    USAGE:

    select * from dbo.ufnSplitWords('This is a test.')

    select * from dbo.ufnSplitWords('This,, is a test')

    select * from dbo.ufnSplitWords('Do you need more example?')

    **********************************************************/

    DECLARE @pos INT,

    @Tmp VARCHAR(8000),

    @wrd VARCHAR(50)

    DECLARE @UnwantedPunctuations TABLE (punc CHAR(1))

    /*keep unwanted punctuations in a table.

    you may customise these punctuation as you wish */

    INSERT @UnwantedPunctuations

    SELECT '.'

    UNION ALL

    SELECT ','

    UNION ALL

    SELECT '?'

    UNION ALL

    SELECT ':'

    UNION ALL

    SELECT '-'

    --Removing unwanted punctuations from the input string

    UPDATE @UnwantedPunctuations

    SET @s-2 =REPLACE(@s,punc,' ')

    SET @s-2 =ltrim(rtrim(@s))+' '

    WHILE LEN(@s)>0

    BEGIN

    SET @pos=CHARINDEX(' ',@s,2)

    SET @wrd=ltrim(LEFT(@s,@pos))

    --Removing unwanted punctuations

    UPDATE @UnwantedPunctuations

    SET @wrd=REPLACE(@wrd,punc,'')

    INSERT @out VALUES

    (

    rtrim(@wrd)

    )

    --Take out the last word

    SET @s-2=ltrim ( stuff(@s,1,@pos ,'') )

    END

    RETURN

    END

  • Sorry... wrong thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... dang it... I had the right thread after all. That code above is slow. Where all the other methods take something less than 16 ms to split out 534 elements in a single VARCHAR(8000), the one with the While Loop and the bad character delimiter takes 250 ms cpu and over a 500 ms in duration.

    Here's the run results...

    [font="Courier New"]---- XML ----

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 10 ms.

    (534 row(s) affected)

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 182 ms.

    ---- XML2 ----

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 6 ms.

    (534 row(s) affected)

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 130 ms.

    ---- TALLY ----

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (534 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 12 ms.

    ---- other ----

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (534 row(s) affected)

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 547 ms.

    [/font]

    I suppose you can try to justify it by saying it'll never get used for something like that... right up until the time someone needs it for something like that. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?

    Thanks (and always curious about what Jeff does in the name of performance;-)),

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Sorry, post appeared to hang and accidentally double-posted.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Yeah, I noticed that it wasn't the most efficient SQL I've seen. 🙂 Thanks for the info.

  • WayneS (7/6/2009)


    Jeff,

    I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?

    Thanks (and always curious about what Jeff does in the name of performance;-)),

    The XML2 method doesn't always perform better than the XML method. I threw in XML3 in the code below, as well, but it doesn't always perform better, either. The Tally table, on the other hand, always performs better on up to 8k. If you want a real kick, try the side bar I mention in the very last comment... Surprise! 😉

    [font="Courier New"]--===== Declare local variables

    DECLARE @CSV       VARCHAR(8000),

            @xCSV      VARCHAR(MAX),  -- To allow expansion of XML in tests

            @MyXMLData XML            -- Holds xml string version 

    --===== Create space-delimted string with XML PATH('') method

     SELECT @CSV  STUFF(

                         (SELECT ' ' AccountNumber FROM AdventureWorks.Sales.SalesOrderHeader FOR XML PATH(''))

                    ,1,1,''),

            @CSV  REPLACE(@CSV,'-',''), --Remove dashes just to simplify the tests

            @xCSV @CSV --Copy of @CSV for expansion of XML in tests

    --===== Start the timers and do the tests...

        SET STATISTICS TIME ON

    --===== Traditional XML split (verbosity makes it a bit slower than it should be)

      PRINT '---- XML ----'

         -- Convert the CSV string into a valid XML string (it's mostly case sensitive, too)

        SET @MyXMLData '<Rows><Row><AccountNumber>'

                       REPLACE(@xCSV,' ','</AccountNumber></Row><Row><AccountNumber>')

                       + '</AccountNumber></Row></Rows>'

         -- Do the split (most of this is case sensitive as well

     SELECT x.item.value('AccountNumber[1]','NVARCHAR(15)'AS SplitData

       FROM @MyXMLData.nodes('/Rows/Row'AS x(Item)

    --===== Another XML split but usually a tiny bit faster because it's less verbose, yet still easy to read

      PRINT '---- XML2 ----'

         -- Convert the CSV string into a valid XML string (again, mostly case sensitive)

        SET @MyXMLData '<T><R><C>'

                       REPLACE(@xcsv,' ','</C></R><R><C>')

                       + '</C></R></T>'

         -- Do the split (most of this is case sensitive as well

     SELECT a.item.value('C[1]','NVARCHAR(15)'AS SplitData

       FROM @MyXMLData.nodes('/T/R')AS a(item)

    --===== Another XML split but usually a bit faster still because it's much less verbose, yet still easy to read.

      PRINT '---- XML3 ----'

         -- Convert the CSV string into a valid XML string (again, mostly case sensitive)

        SET @MyXMLData '<X>'

                       REPLACE(@xCSV,' ','</X><X>')

                       + '</X>'

         -- Do the split (most of this is case sensitive as well

     SELECT x.item.value('.','NVARCHAR(15)'AS SplitData

       FROM @MyXMLData.nodes('X'AS x(item)

    --===== Traditional Tally table split doesn't require expansion into VARCHAR(MAX).

         -- It also handles special characters like "&" which are a bit tough on XML.

         -- Further, for splitting up to VARCHAR(8000), it's faster than the other methods.

      PRINT '---- TALLY ----'

     SELECT SUBSTRING(' ' @CSVN+1CHARINDEX(' '@CSV+' 'N)-NAS SplitData

       FROM dbo.Tally

      WHERE <= LEN(' ' @CSV)

        AND SUBSTRING(' ' @CSVN1' '

    --===== End of testing

        SET STATISTICS TIME OFF

    --===== Sidebar... if you want to see how bad XML can be, try inserting the

         -- result set into a table[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/8/2009)


    If you want a real kick, try the side bar I mention in the very last comment... Surprise! 😉

    Well, after testing it, I don't know what to say. I'm astonished at the huge difference. Jeez...

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply