Processing Comma Separated Lists

  • Hi Everyone

    I have a legacy table called code_lists that contains rows of comma separated strings.

    structure:

    code_list_id int , code_list_test varchar(200)

    data:

    code_list_id code_list_text
    1'Alpha,Bravo,Charlie,Delta,Echo'
    2'India,Foxtrot,Golf'
    .....
    .. .
    100'X-ray,Yankee,Zulu'

    What I need to do is retrieve the data like so:

    code_list_idcode_1code_2code_3code_4code_5
    1'Alpha''Bravo''Charlie'' Delta' 'Echo'
    2'India''Foxtrot''Golf'NULLNULL
    ........
    .......
    100'X-ray''Yankee''Zulu'NULLNULL

    I thougt the only way to do this was to brute force it and process every line individually using T-SQL string manipulation inside a cursor within a stored proc.

    Are there any other clever SQL tricks that might make this easier? Can the breaking up of comma separated lists be done in a single SELECT? (I'm not sure it can be done, that is why I'm asking...)

    Thanks

    Evan

  • The way you want it is not any better then it is now.

    What if 6th code will appear one day?

    It should be:

    1 'Alpha'

    1 'Bravo'

    1 'Charlie'

    1 'Delta'

    1 'Echo'

    2 'India'

    2 'Foxtrot'

    2 'Golf'

    ...

    _____________
    Code for TallyGenerator

  • To get this you need to use UDF.

    One I've posted some time ago is here:

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=310562&p=4

    _____________
    Code for TallyGenerator

  • Sergiy

    5 Codes to a list is a hard limit imposed in the legacy system. It won't be added to as I don't think anyone knows how any more...

    I don't want to normalise the data into multiple rows.

    Doing it the way you suggest will return (up to) five rows for each successful join by code_list_id on the code_list table when I only want one (but with the values in separate columns).

    Thanks for thinking about it anyway.

    Cheers

    Evan

  • You are mixing RDMS database with Excel spreadsheet.

    I understand your desire to keep table in the way suitable for human reading, but it's not a human who will need actually to read it.

    Can you provide a query to select all items having code "Bravo" assigned to it?

    Or a query to remove code "Bravo" from items having it assigned?

    Probably it would not matter that you need to perform scan 5 times in order to do that, but only if your project will never exceed Excel scale.

    _____________
    Code for TallyGenerator

  • Segiy

    You are correct in some ways. However the typical query is more along the lines of :

    select the code_list_ids that have 'Bravo' as the second code item and 'Alpha' as the

    fifth code item (but not 'Delta' as the first item).

    The problem is that the position of the item in the list is significant and without splitting them into separate columns numbered 1 to 5 I can't easily tell the position of the item. The actual item codes are long-ish character strings (Base64 representations of chunks of binary data). I have used actual words like 'Alpha', 'Bravo' etc in my example only to make it easier to read.

    Thanks for thinking about it anyway.

    Cheers

    Evan

  • So, if you gonna need to remove code "Bravo" from item 1 what's gonna be the result?

    NULL in Code2?

    Or Code3 will become Code2?

    _____________
    Code for TallyGenerator

  • Evan,

    You mentioned "PL/SQL"... as you know, the SQL extensions between Oracle and SQL Server are quite different.  Are you looking for an Oracle (PL/SQL) solution or an SQL Server (T-SQL) solution?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Old habits die hard and the fingers get ahead of the brain. I meant to type "T-SQL" of course...

    Thanks

    Evan

  • Evan,

    so, is it true what you said that code "Bravo" may appear only as Code_2 and "Charlie" only as Code_3 and you not gonn search for "Charlie" in Code_2 column after "Bravo" has ben deleted?

    _____________
    Code for TallyGenerator

  • Sergiy

    Yes. The black-box machine that spits out the code lists always produces between 2 and 5 items in a code_list_text value. They are always "stuffed left". There are never blank fields in mid-string. These code lists are not edited after creation, they are only ever searched for code list item "hits" by item position. It's a "one way" thing. The information is always turning over by having new items added. I just have to monitor the table and pick up the codes as they appear.

    I know you are trying to set me on the good path to structural "correctness" but I can't change the behaviour of the black box. I'm just looking at simplifying the task I have in front of me.

    I can write the code in a trigger using loops and lots of CHARINDEX() and LEFT() function calls. I was just hoping that somone might know a trick that would be simpler.

    I know MySQL has an GROUP_CONCAT() function that allows for producing a comma delimited string of values for a group and I thought that there may be something in T-SQL that was similar (in a kind of reverse fashion e.g. LIST_STRIP(column, delimiter)). I searched all the documentation I could find but had no joy. If there isn't a trick method then that's fine, I'll use the T-SQL 'loop and cut' method.

    Thanks

    Evan

  • So, if code "Bravo" is not there this "obe way" "black box" will put "Charlie" into Code_2 column, and to select items having code "Charlie" you need to search in column Code_2, not Code_3.

    Right?

    _____________
    Code for TallyGenerator

  • Kind of....

    If "Charlie" is put into Code_2, then that code line is not returned when you search for records with "Charlie" in Code_3. That is a perfectly valid situation.

  • So, return to the question:

    what kind of query you gonna need to run to find items having code "Charlie" assigned?

    _____________
    Code for TallyGenerator

  • It's a moot point because the item value without a positional modifier is meaningless.

    As a purely academic exercise to answer you question:

    SELECT code_list_id FROM code_list

    WHERE (code_1 = 'Charlie' ) OR (code_2 = 'Charlie' ) OR (code_3 = 'Charlie' ) OR (code_4 = 'Charlie' ) OR (code_5 = 'Charlie' )

    This is not a meaningful query that would ever be required in this application, but as an academic exercise it can easily be done.

    There is not really anything productive that can come out of pursuing this. I will code the list component stripping in T-SQL using a WHILE loop and leave it at that.

    Thanks for your contributions.

    Cheers

    Evan

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply