Processing Comma Separated Lists

  • Ok Evan, let's go way back to your original request... you have data that looks like this...

    code_list_id

    code_list_text

    1

    'Alpha,Bravo,Charlie,Delta,Echo'

    2

    'India,Foxtrot,Golf'

    ...

    ..

    ..

    .

    100

    'X-ray,Yankee,Zulu'

    ...and you want it to look like this...

    code_list_id

    code_1

    code_2

    code_3

    code_4

    code_5

    1

    'Alpha'

    'Bravo'

    'Charlie'

    ' Delta'

    'Echo'

    2

    'India'

    'Foxtrot'

    'Golf'

    NULL

    NULL

    ...

    .

    .

    .

    .

    .

    ..

    .

    .

    .

    .

    .

    100

    'X-ray'

    'Yankee'

    'Zulu'

    NULL

    NULL

    First, I think that Serqiy is correct... you don't really need it like that... you just need to be able to query it as if it were like that.  Don't worry... I'm gonna give it to you both ways...

    In order to be able to do this, we need the help of a special little multi-purpose tool known as a "Tally" or "Numbers" table... I like "Tally" because it just sounds cooler   All it is is a table containing a single column of well indexed sequential numbers starting at "1" and will help you do some remarkable things that most folks would have to use a cursor for.  And, it runs a heck of a lot faster than a cursor does, too.  Here's how to make one... make it a permanent part of your database 'cause it's worth it... and notice that we don't use a WHILE loop or cursor to make the Tally table, either...

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates when you need it

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    Alright... just for the rest of the folks who might want to try this one, we need a bit of test data...

    --===== Create a test table to hold some test data for the demo

     CREATE TABLE Code_Lists

            (

            Code_List_ID INT,

            Code_List_Text VARCHAR(8000)

            )      

     INSERT INTO Code_Lists (Code_List_ID,Code_List_Text)

     SELECT 1,'Alpha,Bravo,Charlie,Delta,Echo' UNION ALL

     SELECT 2,'India,Foxtrot,Golf'  UNION ALL

     SELECT 3,'Delta,Echo'  UNION ALL

     SELECT 4,'Alpha'  UNION ALL

     SELECT 5,' ' UNION ALL

     SELECT 6,'X-ray,Yankee,Zulu' 

     Ok... now the hard part... about now, most folks resort to some form of RBAR (pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row) either by using a WHILE loop or a Cursor.  The logical choice would be to make a SPLIT function but even that is a form of RBAR.  Nope, we're gonna do like Serqiy suggested (to start with)... we're gonna normalize the data (first) and we're going to do it with what most folks avoid... a good healthy cross-join that works pretty darned fast because of the criteria...

    --===== Do the split with a count

     SELECT Code_List_ID,

            CASE

                WHEN LEN(h.Code_List_Text)=0

                THEN NULL

                ELSE SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)

            END AS Code_List_Text,

            CASE

                WHEN LEN(h.Code_List_Text)=0

                THEN 1

                ELSE t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1

            END AS Position

       FROM dbo.Tally t,

            dbo.Code_Lists h

      WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','

        AND t.N < DATALENGTH(','+h.Code_List_Text+',')

    Try THAT with a cursor... this will easily do 10,000 rows in a little over 1 second.  Yeah, there's probably a faster way to do it but I'm being a bit lazy.   Here's what the result set looks like...

    Code_List_ID

    Code_List_Text

    Position

    1

    Alpha

    1

    1

    Bravo

    2

    1

    Charlie

    3

    1

    Delta

    4

    1

    Echo

    5

    2

    India

    1

    2

    Foxtrot

    2

    2

    Golf

    3

    3

    Delta

    1

    3

    Echo

    2

    4

    Alpha

    1

    5

    NULL

    1

    6

    X-ray

    1

    6

    Yankee

    2

    6

    Zulu

    3

    Notice that the result set contains a position number which is correct for the position in the CSV string from which it was plucked.   Now, I'm not sure why, but lot's of folks don't like "long skinny" tables where row information is split up... so, let's put it all back together with another chunk of SQL prestidigitation known as a "cross-tab" (it's in Books Online if you want to read about it)... for this, we'll also use the results of the previous query as if it were a table or view... that's called a "Derived Table" and some folks actually prefer to think of them as inline views.  Here we go...

     SELECT d.Code_List_ID,

            MAX(CASE WHEN d.Position = 1 THEN d.Code_List_Text ELSE NULL END) AS Code_1,

            MAX(CASE WHEN d.Position = 2 THEN d.Code_List_Text ELSE NULL END) AS Code_2,

            MAX(CASE WHEN d.Position = 3 THEN d.Code_List_Text ELSE NULL END) AS Code_3,

            MAX(CASE WHEN d.Position = 4 THEN d.Code_List_Text ELSE NULL END) AS Code_4,

            MAX(CASE WHEN d.Position = 5 THEN d.Code_List_Text ELSE NULL END) AS Code_5

       FROM (--==== Derived table "d" splits the code list by Code_List_ID and provides a position for each item in Code_List_Text

             SELECT Code_List_ID,

                    CASE

                        WHEN LEN(h.Code_List_Text)=0

                        THEN NULL

                        ELSE SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)

                    END AS Code_List_Text,

                    CASE

                        WHEN LEN(h.Code_List_Text)=0

                        THEN 1

                        ELSE t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1

                    END AS Position

               FROM dbo.Tally t,

                    dbo.Code_Lists h

              WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','

                AND t.N < DATALENGTH(','+h.Code_List_Text+',')

            ) d

      GROUP BY d.Code_List_ID

    And, how do you use THAT in a query... you either convert THAT to a view and then query the view (WARNING WARNING WARNING!!!!   DO NOT USE A VIEW FOR MORE THAN ABOUT 10K rows for this unless you make an indexed view!), insert the results into a new table and query the new table (even if it's a temp table), or use the query as another derived table!

    Lemme know how it works for you... and yes... I agree... legacy hardware is a pain in the patootie but its still fun to mess with 

    _____________________________________________________________________________________________________________________________________

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    there is no need for a cursor if you use my function:

    SELECT code_list_id, dbo.List_CharLong(code_list_text, ',')

    FROM CodesTable

    It returns same set of data.

    _____________
    Code for TallyGenerator

  • Hi Jeff

    Your solution using the "Tally" and derived position tables worked beautifully...

    I got exactly what I needed.

    That was some mighty fine codin' Mr. Moden.

    Your assistance is greatly appreciated.

    Cheers

    Evan

  • I know, Serqiy... and I agree... I was just trying to avoid the UDF just for fun.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Evan...  appreciate the feedback.  Just remember that with these types of things (cross tab), the whole result set must materialize before criteria will work on it (which is why these really stink for views)...

    Also, I just went back and reread a post I missed... since your black box never spits out less than 2 items, we can get rid of the CASE statements in the derived table to make this a wee-bit faster...

     SELECT d.Code_List_ID,

            MAX(CASE WHEN d.Position = 1 THEN d.Code_List_Text ELSE NULL END) AS Code_1,

            MAX(CASE WHEN d.Position = 2 THEN d.Code_List_Text ELSE NULL END) AS Code_2,

            MAX(CASE WHEN d.Position = 3 THEN d.Code_List_Text ELSE NULL END) AS Code_3,

            MAX(CASE WHEN d.Position = 4 THEN d.Code_List_Text ELSE NULL END) AS Code_4,

            MAX(CASE WHEN d.Position = 5 THEN d.Code_List_Text ELSE NULL END) AS Code_5

       FROM (--==== Derived table "d" splits the code list by Code_List_ID

                 -- and provides a position for each item in Code_List_Text

             SELECT Code_List_ID,

                    SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)

                    AS Code_List_Text,

                    t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1

                    AS Position

               FROM dbo.Tally t,

                    dbo.Code_Lists h

              WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','

                AND t.N < DATALENGTH(','+h.Code_List_Text+',')

            ) d

      GROUP BY d.Code_List_ID

    ___________________________________________________________________________________________________________________

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff, for taking time to produce an excellent solution. I was going to take time to use Burton Roberts' 2001 function to solve it. I have been using it extensively since I found it some time ago. It is very similar to Sergiy's function. But your solution works just fine. I'm including it below.

    CREATE FUNCTION udf_split(@sText varchar(8000), @sDelim varchar(20) = ' ')

    RETURNS @retArray TABLE (idx smallint Primary Key, value varchar(8000))

    AS

    /***********************************************************************************

    udf_split

    Emulate VB's Split() function and parameter arrays .

    Taken from Burton Roberts, SQL Server Magazine, July 2001.

    Renamed from fn_Split to udf_split in order to maintain local naming conventions.

    Sample call: SELECT * FROM dbo.udf_Split(@string, '.')

    -----------------------------------------------------------------------------------

    CHANGE HISTORY

    -----------------------------------------------------------------------------------

    DATE WHO COMMENT

    ---------- ----------- ------------------------------------------------------------

    05/02/2006 Carl Start.

    ***********************************************************************************/

    BEGIN

    DECLARE @idx smallint,

    @value varchar(8000),

    @bcontinue bit,

    @iStrike smallint,

    @iDelimlength tinyint

    IF @sDelim = 'Space'

    BEGIN

    SET @sDelim = ' '

    END

    SET @idx = 0

    SET @sText = LTrim(RTrim(@sText))

    SET @iDelimlength = DATALENGTH(@sDelim)

    SET @bcontinue = 1

    IF NOT ((@iDelimlength = 0) or (@sDelim = 'Empty'))

    BEGIN

    WHILE @bcontinue = 1

    BEGIN

    --If you can find the delimiter in the text, retrieve the first element and

    --insert it with its index into the return table.

    IF CHARINDEX(@sDelim, @sText)>0

    BEGIN

    SET @value = SUBSTRING(@sText,1, CHARINDEX(@sDelim,@sText)-1)

    BEGIN

    INSERT @retArray (idx, value)

    VALUES (@idx, @value)

    END

    --Trim the element and its delimiter from the front of the string.

    --Increment the index and loop.

    SET @iStrike = DATALENGTH(@value) + @iDelimlength

    SET @idx = @idx + 1

    SET @sText = LTrim(Right(@sText,DATALENGTH(@sText) - @iStrike))

    END

    ELSE

    BEGIN

    --If you can’t find the delimiter in the text, @sText is the last value in

    --@retArray.

    SET @value = @sText

    BEGIN

    INSERT @retArray (idx, value)

    VALUES (@idx, @value)

    END

    --Exit the WHILE loop.

    SET @bcontinue = 0

    END

    END

    END

    ELSE

    BEGIN

    WHILE @bcontinue=1

    BEGIN

    --If the delimiter is an empty string, check for remaining text

    --instead of a delimiter. Insert the first character into the

    --retArray table. Trim the character from the front of the string.

    --Increment the index and loop.

    IF DATALENGTH(@sText)>1

    BEGIN

    SET @value = SUBSTRING(@sText,1,1)

    BEGIN

    INSERT @retArray (idx, value)

    VALUES (@idx, @value)

    END

    SET @idx = @idx+1

    SET @sText = SUBSTRING(@sText,2,DATALENGTH(@sText)-1)

    END

    ELSE

    BEGIN

    --One character remains.

    --Insert the character, and exit the WHILE loop.

    INSERT @retArray (idx, value)

    VALUES (@idx, @sText)

    SET @bcontinue = 0

    END

    END

    END

    RETURN

    END

  • Sorry,

    Was reviewing old PM's and came across this link.  Here's a split function that has no loops... the details, as always, are in the comments...

    drop FUNCTION dbo.fnTSplit

    go

    CREATE FUNCTION dbo.fnTSplit

    /***************************************************************************************************

     Purpose:

     Split a delimited string into a table variable including a column that identifies the ordinal of

     each value that has been split out of the input string.  Just about ANY size delimiter may be used.

     Programmer Notes:

     1. Blank values return NULLs in the result set even when they are at the beginning or end of the

        input string.

     2. Empty input string returns result set with 1 value of NULL.

     3. NULL input string returns result set with 1 value of NULL.

     4. Function raises an error if the length of the input string is greater than 7,998 characters.

     Typical Performance:

        20 items in a single string: < 1 millisecond

     1,373 items in a single string: 30-46 milliseconds

     

     Example Usage:

     SELECT * FROM dbo.fnTSplit

     Revision History:

     Rev 00 - 05/11/2007 - Jeff Moden

            - Initial creation, unit test, and performance test for SQLServerCentral.Com

    ***************************************************************************************************/

    --==================================================================================================

    --      Declare the I/O variables

    --==================================================================================================

            (

            @DelimitedList VARCHAR(8000), --Delimited string of values to be split

            @Delimiter     VARCHAR(8000)  --The delimiter to split the values on

            )

    RETURNS @Result TABLE                 --The return table

            (

            Ordinal INT IDENTITY(1,1),    --The ordinal postion of the split value

            String  VARCHAR(8000)         --The split value

            )

         AS

      BEGIN

    --==================================================================================================

    --      Declare local variables

    --==================================================================================================

    DECLARE @MyDelim CHAR(1)

    --==================================================================================================

    --      Test the input variable for max length.  Force error if test fails because cannot use a

    --      RAISERROR in a function.

    --==================================================================================================

         IF LEN(@DelimitedList) > 7998 --Need room for two delimiters (8000-2=7998)

            INSERT INTO @Result (STRING)

            SELECT STR(1/0) --Forces an error (divide by zero)

    --==================================================================================================

    --      Presets

    --==================================================================================================

    --===== Use an "invisible" character that will replace the given delimiter(s)

        SET @MyDelim = CHAR(1)

    --===== Replace the delimiters in the input string, no matter how long they may be, with the single

         -- character "invisible" delimiter we just created above.  Then, wrap the string in delimiters.

        SET @DelimitedList = @MyDelim+ISNULL(REPLACE(@DelimitedList,@Delimiter,@MyDelim),'')+@MyDelim

    --==================================================================================================

    --      Split the modified input string into the result table using the Tally table to create a

    --      set-based "loop".  The WITH (NOLOCK) nearly doubles the speed.

    --==================================================================================================

     INSERT INTO @Result (String)

     SELECT NULLIF(SUBSTRING(@DelimitedList,N+1,CHARINDEX(@MyDelim,@DelimitedList,N+1)-N-1),'')

       FROM dbo.Tally WITH (NOLOCK)

      WHERE N < LEN(@DelimitedList)

        AND SUBSTRING(@DelimitedList,N,1)=@MyDelim

     RETURN

        END

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply