Splitting Data from One Column into 5 Columns

  • I need help to create a script that would perform text parsing from a table. I have a text (words separated by the symbol '&' (symbol ampersand). I need to extract all words from the text as columns.

    -The column name is 'Parameters' ---> [Source Table].[Parameters]

    -The Parameter field contains text like this all togther: (Below is 2 sample rows)

    Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008

    Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674

    There is a symbol '&' (symbol ampersand) between each category, all in 1 column.

    My goal: To extract just the TEXT after the '=' like this:

    Result should look like:

    Subject Category Status IPPDM_Count Well_Count

    ---------------------------------------------------------------

    Drill Drill Header NO 94356 84353

    Zone Zone Header YES 94356 94356

    THANKS!

  • Just checking to be sure... do all the rows have exactly the same number of "fields" and all the rows are always in the same basic format?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/1/2008)


    Just checking to be sure... do all the rows have exactly the same number of "fields" and all the rows are always in the same basic format?

    since that looks like an URL fragment, I'll bet no. but, i'll also bet that an xml parsing solution can do this quickly and (relatively) simply.;)

  • Thanks, Antonio...

    While I agree that the XML concatenation method is very fast, I've found that the XLM parsing method is a bit slow compared to a Tally Table or cteTally split.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sounds like a challenge to me! winner buys a round of beer for everyone?

  • lalonacademy (4/1/2008)


    I need help to create a script that would perform text parsing from a table.

    I have a text (words separated by the symbol '&' (symbol ampersand). I need to extract all words from the text as columns.

    -The column name is 'Parameters' ---> [Source Table].[Parameters]

    -The Parameter field contains text like this all togther: (Below is 2 sample rows)

    Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008

    Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674

    There is a symbol '&' (symbol ampersand) between each category, all in 1 column.

    My goal: To extract just the TEXT after the '=' like this:

    Result should look like:

    Subject Category Status IPPDM_Count Well_Count

    ---------------------------------------------------------------

    Drill Drill Header NO 94356 84353

    Zone Zone Header YES 94356 94356

    Ok... first step... you have to add one of the most valuable tools available to your data base... a Tally table.

    Sure, you can make a CTE to do something similar, but even that won't be as fast as a Tally table. Please visit the

    following URL to learn what it is and how to make one...

    http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    Once you have that bad boy in place, then you can start doing some very high speed magic... like splitting your data

    into columns... here's some test data (please see the URL in my signature for how we'd like to see example data) and

    the code to split it... lemme know how it works for you...

    --===== Create a test table to hold the test data. This is NOT part of the solution

    CREATE TABLE #SourceTable (Parameters VARCHAR(8000))

    INSERT INTO #SourceTable

    (Parameters)

    SELECT 'Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008' UNION ALL

    SELECT 'Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674'

    --===== Split the data into an EAV (Entity and Value) cte and reassemble as rows containing the desired columns

    ;WITH

    cteFirstSplit AS

    (--==== Splits data on the "&" character

    SELECT RowNum = DENSE_RANK() OVER (ORDER BY Parameters),

    ColInfo = SUBSTRING('&'+h.Parameters+'&', t.N+1, CHARINDEX('&', '&'+h.Parameters+'&', t.N+1)-t.N-1)

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case Parameters is NULL

    #SourceTable h

    ON SUBSTRING('&'+h.Parameters, t.N, 1) = '&'

    AND t.N < LEN('&'+h.Parameters)

    )

    ,

    cteSecondSplit AS

    (--==== This splits the split-out ColInfo at the "=" sign

    SELECT RowNum,

    ColName = LEFT(ColInfo,CHARINDEX('=',ColInfo)-1),

    ColVal = SUBSTRING(Colinfo,CHARINDEX('=',ColInfo)+1,400)

    FROM cteFirstSplit

    )

    SELECT MAX(CASE WHEN ColName = 'Subject' THEN ColVal ELSE NULL END) AS Subject,

    MAX(CASE WHEN ColName = 'Category' THEN ColVal ELSE NULL END) AS Category,

    MAX(CASE WHEN ColName = 'Status' THEN ColVal ELSE NULL END) AS Status,

    MAX(CASE WHEN ColName = 'IPPDM_Count' THEN ColVal ELSE NULL END) AS IPPDM_Count,

    MAX(CASE WHEN ColName = 'Well_Count' THEN ColVal ELSE NULL END) AS Well_Count

    FROM cteSecondSplit

    GROUP BY RowNum

    ORDER BY RowNum

    Of course, you'll need to change the "#SourceTable " table name in the first CTE to the actual name of your table.

    Again, please visit the URL in my signature line... help us help you in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • antonio.collins (4/1/2008)


    sounds like a challenge to me! winner buys a round of beer for everyone?

    Sure, I'm good for a virtual beer and pretzels... Million rows ok with you? 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here's my entry for the virtual beer...

    1 million rows... half are the first row and half are the second row that the Op posted. My goal was to split the first 5 columns out just like the Op asked for and then cross-tab them into rows... like the OP asked for. To keep differences in video cards from coming into play, the final result is inserted into a temp table. Times do not include the time to make the data. Since the OP did not post a PK, I took the liberty of making an IDENTITY column and put the PK on that.

    My run times and box description is at the end of the code...

    --===== Create and populate a million row test table to hold the test data. This is NOT part of the solution

    drop table #SourceTable

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    u.Parameters

    INTO #SourceTable

    FROM (--===== Same two lines of data repeated

    SELECT CAST('Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008' AS VARCHAR(8000)) AS Parameters

    UNION ALL

    SELECT 'Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674') u,

    Master.sys.All_Columns ac1,

    Master.sys.All_Columns ac2

    ALTER TABLE #SourceTable

    ADD PRIMARY KEY CLUSTERED (RowNum)

    drop table #Results

    SET ANSI_WARNINGS OFF

    SET NOCOUNT ON

    SET STATISTICS TIME ON

    --===== Split the data into an EAV (Entity and Value) cte and reassemble as rows containing the desired columns

    ;WITH

    cteFirstSplit AS

    (--==== Splits data on the "&" character

    SELECT h.RowNum,

    ColNum = (ROW_NUMBER() OVER (ORDER BY Parameters)-1)%5,

    ColVal = SUBSTRING(h.Parameters, t.N+1, CHARINDEX('&', h.Parameters, t.N+1)-t.N-1)

    FROM dbo.Tally t WITH (NOLOCK)

    RIGHT OUTER JOIN --Necessary in case Parameters is NULL

    #SourceTable h

    ON SUBSTRING(h.Parameters, t.N, 1) = '='

    AND t.N < CHARINDEX('Zone_',h.Parameters)

    )

    SELECT RowNum,

    MAX(CASE WHEN ColNum = 0 THEN ColVal ELSE NULL END) AS Subject,

    MAX(CASE WHEN ColNum = 1 THEN ColVal ELSE NULL END) AS Category,

    MAX(CASE WHEN ColNum = 2 THEN ColVal ELSE NULL END) AS Status,

    MAX(CASE WHEN ColNum = 3 THEN ColVal ELSE NULL END) AS IPPDM_Count,

    MAX(CASE WHEN ColNum = 4 THEN ColVal ELSE NULL END) AS Well_Count

    INTO #Results

    FROM cteFirstSplit

    GROUP BY RowNum

    SET STATISTICS TIME OFF

    /**************************************************************

    6 year old single P5 1.8 GHz, IDI Hard Drive, 1GB ram

    SQL Server 2005 Developer's Edition sp2 with no cume updates.

    SQL Server Execution Times:

    1,000 rows: CPU time = 172 ms, elapsed time = 225 ms.

    10,000 rows: CPU time = 1531 ms, elapsed time = 1619 ms.

    100,000 rows: CPU time = 16875 ms, elapsed time = 21959 ms.

    1,000,000 rows: CPU time = 185921 ms, elapsed time = 268232 ms.

    ***************************************************************/

    Looking forward to your entry 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • lalonacademy,

    I'm not sure how you got the data into the table, but consider this... BULK INSERT with a format file to handle your strange delimiters will load 5.1 MILLION 20 column rows in 60 seconds flat... That means that 1 million rows will take about 12 seconds to import instead of going through all this in-column split stuff.

    You might want to consider changing your load procedure. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow! I don't know where to start. Thanks for taking the time. Actually this is one of the field in a table. It's from the [reportserver].[ExecutionLog table].[Parameters]. The PK would be the key in that table. The parameters column contains the text in that format. Basically, I'm trying to query the [ExecutionLog] table and split the TEXT in the [Parameter] colum into several fields. This isn't from an URL. There will be thousands of rows in that table, some rows might have NULL values. I'm only looking to EXTRACT first 5 values. See the table in the REPORTSERVER database.

    Result should look like:

    Subject Category Status IPPDM_Count Well_Count

    ---------------------------------------------------------------

    Drill Drill Header NO 94356 84353

    Zone Zone Header YES 94356 94356

    Cheers!!

  • Where you want to start is the last code I posted... it will do the job.

    The whole thing is that I wonder how the data got into the parameters column like it is... did it come from a file?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I'm NOT loading this into a table. I'm reporting on the executionLog table in reportserver database. I'll be quering the table to CREATE a DATASET in that format. I'm only parsing the [parameter] field. I'll be using the PK from the [ExecutionLog] table. Thanks!

    Query Result should look like:

    Subject Category Status IPPDM_Count Well_Count

    ---------------------------------------------------------------

    Drill Drill Header NO 94356 84353

    Zone Zone Header YES 94356 94356

  • Would you please read what I said more carefully? The code I wrote does your split the way you want it... the load stuff was just a suggestion to do things better.

    And read the comments in the code... the first part is just to build test data... that's not part of the solution...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    The Parameter is coming from the .RDL file SSRS 2005. All the parameters from the .rdl file logged into this field. Thanks I'll work on it in the morning.

  • Thanks...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply