Parse delimited data in column to multiple columns

  • I'm working on a sales commission report that will show commissions for up to 5 sales reps for each invoice. The invoice detail table contains separate columns for the commission rates payable to each rep, but for some reason the sale srep IDs are combined into one column. The salesrep column may contain null, a single sales rep id, or up to five slaes rep IDs separated by the '~' character.

    So I'd like to parse the rep IDs from a single column (salesreplist) in my invoice detail table (below) to multiple columns (RepID1, RepID2, RepID3, RepID4,RepID5) in a temp table so I can more easily calculate the commission amounts for each invoice and sales rep.

    Here is my table:

    CREATE TABLE invcdtl(

    invoicenum int,

    salesreplist [text] NULL,

    reprate1 int NULL,

    reprate2 int NULL,

    reprate3 int NULL,

    reprate4 int NULL,

    reprate5 int NULL,

    )

    Here is some sample data:

    1 A 0 0 0 0 0

    2 0 0 0 0 0

    3 I~~~~ 15 0 0 0 0

    4 A~B 5 5 0 0 0

    5 I~F~T~K~G 5 5 2 2 2

    As you can see, some records have trailing delimiters but some don't. This may be a result of the application's behavior when multiple reps are entered then removed from an invoice. One thing for sure is that when there are multiple reps, the IDs are always separated by '~'

    Can anyone suggest a solution?

  • Your sample data doesn't match your sample table. See the URL in my signature to get a better answer quicker...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Is it possible to change the design of this table? This design is weak. You would be better off if your data was normalized rather than this single table. Another alternative might be to store the rep and rate data as xml. This schema might be "convenient" for developers to write to, but as you see in your present question this is not a good storage method when you want to get the data OUT.

  • On top of that, the SalesRepList column is TEXT instead of VARCHAR... it could probably be converted to VARCHAR quite easily.

    Once converted, you can easily split the column with a Tally table split... see the piece of code titled "One Final "Split" Trick with the Tally Table" in the following article...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    {edit}... I repaired the link above...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Using what was given including the recommendation of converting TEXT data type to VARCHAR you might want to try this:

    You will have to insert your own code to update the appropriate table field, but that should not be too difficult

    DECLARE @position INT

    DECLARE @string VARCHAR(20)

    DECLARE @tilde INT

    DECLARE @Char AS VARCHAR(5)

    DECLARE @First AS INT

    DECLARE @Rep AS VARCHAR(5)

    -- Initialize the current position and the string variables.

    SET @position = 1

    SET @First = 1

    SET @Rep = ''

    SET @string = 'IX~F~T~K~G' --This is the content of the table field salesreplist with one sales rep composed of 2 characters just to make sure the routine could handle properly

    --SET @string = '~~F~T~G' -- just a test case

    WHILE @position <= DATALENGTH(@string)

    BEGIN

    SET @tilde = (SELECT ASCII(SUBSTRING(@string, @position, 1)))

    IF @tilde <> 126

    BEGIN

    SET @Char = (SELECT CHAR(ASCII(SUBSTRING(@string, @Position, (@position - @First) +1))))

    SET @Rep = @Rep + @Char

    END

    ELSE

    BEGIN

    --PRINT 'Rep found*' + @Rep + '*' --only for testing

    SET @First = @position

    SET @Rep = ''

    END

    IF @position = DATALENGTH(@string)

    BEGIN

    IF @position - DATALENGTH(@string) > 0

    BEGIN

    SET @Char = (SELECT CHAR(ASCII(SUBSTRING(@string, @Position, (@position - DATALENGTH(@string))) )))

    SET @Rep = @Rep + @Char

    END

    --PRINT 'Last Rep found*' + @Rep + '*' --only for testing

    END

    SET @position = @position + 1

    END

    Knowing Jeff Moden's skill I strongly recommend you follow his suggestion on the use of a Tally Table.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Jeff Moden,

    Jeff tried to send you a PM, but was not allowed to do so said something about you have too many of them.

    Just to let you know I clicked multiple times on the link for the Tally table solution and kept getting a 403 error - "The website declined to show this webpage"

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • BitBucket... I've got 100 PM's that I really want to keep. Steve hasn't come up with a way to download them, yet. Send me an email...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry folks... I've repaired the link that BitBucket was talking about and I've also included it here...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thanks for the Forum Etiquette article. Here is my revised table and test data based on your recommendations. By the way, since I don't have any control over the design of the actual table in production, normalization is not an option.

    I appreciate your time. Any additional suggestions are welcome.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL

    DROP TABLE #mytable

    --===== Create the test table with

    CREATE TABLE #mytable

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    SalesRepList TEXT,

    RepRate1 DECIMAL(6,2),

    RepRate2 DECIMAL(6,2),

    RepRate3 DECIMAL(6,2),

    RepRate4 DECIMAL(6,2),

    RepRate5 DECIMAL(6,2)

    )

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT #mytable ON

    --===== Insert the test data into the test table

    INSERT INTO #mytable

    (ID, SalesRepList, RepRate1, RepRate2, RepRate3, RepRate4, RepRate5 )

    SELECT '12534','CK~AUTOSOL',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12535','AUTOSOL~TS',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12538','TS',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12543','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12554','AUTOSOL',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12555','IBSNE~~~~',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12559','AUTOSOL~CK',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12561','AUTOSOL~~~~',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12612','AUTOSOL',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12623','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12625','TS',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12633','AUTOSOL',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12634','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12648','TS',5.00,0.00,0.00,0.00,0.00

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT #mytable ON

    --removed final UNION ALL

  • Thanks... now we're cooking... I'm on my way to work... I'll take a look at this tonight.

    Hmmm... Glad this is SQL Server 2k5... makes it a lot easier to solve the ragged right split...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi all,

    I have a partial solution that should be able to be extended into a complete one. I just don't have the time right now to finish it. Someone out there probably can - perhaps even the original poster...

    DECLARE @INVCDTL TABLE (

    ID INT,

    SalesRepList TEXT,

    RepRate1 DECIMAL(6,2),

    RepRate2 DECIMAL(6,2),

    RepRate3 DECIMAL(6,2),

    RepRate4 DECIMAL(6,2),

    RepRate5 DECIMAL(6,2)

    )

    INSERT INTO @INVCDTL (ID, SalesRepList, RepRate1, RepRate2, RepRate3, RepRate4, RepRate5)

    SELECT '12534','CK~AUTOSOL',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12535','AUTOSOL~TS',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12538','TS',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12543','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12554','AUTOSOL',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12555','IBSNE~~~~',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12559','AUTOSOL~CK',5.00,5.00,0.00,0.00,0.00 UNION ALL

    SELECT '12561','AUTOSOL~~~~',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12612','AUTOSOL',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12623','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12625','TS',0.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12633','AUTOSOL',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12634','CK',10.00,0.00,0.00,0.00,0.00 UNION ALL

    SELECT '12648','TS',5.00,0.00,0.00,0.00,0.00;

    WITH NO_EXTRA_TILDES AS (

    SELECT ID, REPLACE(REPLACE(REPLACE(CAST(SalesRepList AS varchar(10)),'~~~~',''),'~~~',''),'~~','') AS SalesRepList,

    RepRate1, RepRate2, RepRate3, RepRate4, RepRate5

    FROM @INVCDTL

    ),

    REP_ONE AS (

    SELECT ID, REPLACE(

    CASE CHARINDEX('~',SalesRepList)

    WHEN 0 THEN SalesRepList

    ELSE SUBSTRING(SalesRepList,1,CHARINDEX('~',SalesRepList))

    END,'~','') AS Rep1ID

    FROM NO_EXTRA_TILDES

    ),

    REP_TWO AS (

    SELECT A.ID, REPLACE(

    CASE CHARINDEX('~',SalesRepList,LEN(Rep1ID)+2)

    WHEN 0 THEN SUBSTRING(SalesRepList,LEN(Rep1ID)+1,LEN(SalesRepList)-LEN(Rep1ID))

    ELSE SUBSTRING(SalesRepList,LEN(Rep1ID)+1,CHARINDEX('~',SalesRepList,LEN(Rep1ID)+1))

    END,'~','') AS Rep2ID

    FROM NO_EXTRA_TILDES AS A INNER JOIN REP_ONE AS B

    ON A.ID=B.ID

    ),

    COMBINED AS (

    SELECT A.ID, Rep1ID, Rep2ID

    FROM REP_ONE AS A INNER JOIN REP_TWO AS B

    ON A.ID=B.ID

    )

    SELECT COMBINED.*, SalesRepList

    FROM COMBINED INNER JOIN NO_EXTRA_TILDES

    ON COMBINED.ID=NO_EXTRA_TILDES.ID

    The idea is to start by eliminating all excess tildes, and then split off each rep's ID, one at a time, from the string using just length, substring, and charindex. I have the first two complete. Any takers?

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I have copied queries directly from Jeff Moden's article at:

    http://www.sqlservercentral.com/articles/TSQL/62867

    and modified them to suit this purpose.

    Using the test data you posted (not repeated here)...

    First, create a Tally table, as in the article:

    IF OBJECT_ID('dbo.Tally') IS NOT NULL

    DROP TABLE dbo.Tally

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    Here's the query, a modified version of the delimited-string splitter from the Tally Table article:

    --Create a table to store the results in

    DECLARE @Commissions TABLE

    (

    SaleIDINT,

    CommissionOrder INT,

    RepCode VARCHAR(8000),

    Commission DECIMAL(6,2)

    )

    ; WITH SalesReps (SalesID, RepCode, RepOrder) AS

    (

    SELECT M.ID,

    SUBSTRING('~' + CAST(M.SalesRepList AS VARCHAR(8000)) + '~', T.N+1, CHARINDEX('~','~' + CAST(M.SalesRepList AS VARCHAR(8000)) + '~',T.N+1)-(T.N+1)),

    ROW_NUMBER() OVER (PARTITION BY M.ID ORDER BY T.N)

    FROM dbo.Tally T

    CROSS JOIN #mytable M

    WHERE T.N < DATALENGTH(M.SalesRepList)

    AND SUBSTRING('~' + CAST(M.SalesRepList AS VARCHAR(8000)) + '~',T.N,1) = '~'

    AND SUBSTRING('~' + CAST(M.SalesRepList AS VARCHAR(8000)) + '~',T.N+1,1) <> '~'

    )

    INSERT INTO @Commissions

    (SaleId, CommissionOrder, RepCode, Commission)

    SELECT R.SalesID, R.RepOrder, R.RepCode,

    CASE R.RepOrder

    WHEN 1 THEN M.RepRate1

    WHEN 2 THEN M.RepRate2

    WHEN 3 THEN M.RepRate3

    WHEN 4 THEN M.RepRate4

    WHEN 5 THEN M.RepRate5

    ELSE 0

    END

    FROM SalesReps R

    JOIN #mytable M ON M.ID = R.SalesID

    SELECT * FROM @Commissions

  • Stephanie Giovannini (6/16/2008)


    Outstanding job, Stephanie! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks to all for your responses!

    Once my head stops spinning I may have a follow up question but for now I need to get a bit more comfortable with CTE.

    I studied Jeff's "Tally" article and appreciate the performance benefits. One concern I have is that the database I'm reporting on is part of an ERP system (Vantage), over which I have no control. I suppose I could ask our DBA to run a script to create the Tally table, but then each time we migrated to a new version of Vantage (which typically entails running a database conversion), we'd have to also remember to re-create the Tally table in the new database with the converted data. Sounds like a small price to pay, I know -- I guess I'm looking for feedback on how people manage this kind of customization in a tightly controlled ERP environment.

    Thanks again.

    Ahmet

  • Easy - put the Tally table in another database not so tightly managed. you don't HAVE to put it in the database that changes all of the time. As long as it's in a local database on the same server, you should be able to take advantage of it.

    Some on here even advocate having a dedicated "utility" database, with goodies such as this in there.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply