Parsing a cell into multiple rows

  • I agree with Dave's logic. You have to manually write a program to parse this file and then store the required text from file into a table.

    i did parsing of complete HTML file in past where i have to extract information and then store it in table, do some database processing on info and then store the resultant into HTML page at its desired location. This is complete db programming. It took me a while to think and work around logic.

    Best of luck mate.

  • wdillon (5/5/2008)


    Hi Jeff

    Hum, you are right, it's pretty messed up. I'll attach a zip file of the orginal and see if that works.

    Thanks

    Bill

    Thanks... that's a lot better.

    Suggestion on using b an /b are good but not 100%... the marks are inconsistent. Fortunately, they can be made consistent... and then we can hit this with a "simple" 3 dimensional split.

    I'm working on it... just to confirm, you ARE using SQL Server 2005 for sure, right?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Like I said, the data needs to be made more consistent... and the code below does that. About all you need to do is change the path to the file.

    Fun exercise... thanks...

    [font="Courier New"]

    --===== Import the data into a temporary working table

         IF OBJECT_ID('TempDB..#RawData','U') IS NOT NULL

            DROP TABLE #RawData

    CREATE TABLE #RawData (RawData VARCHAR(8000))

       BULK INSERT #RawData

       FROM 'C:\Temp\EastLake\EastLake.csv' --SQL Server needs to be able to "see" where the file is at

       WITH (DATAFILETYPE = 'WIDECHAR',     --It IS a unicode file

             MAXERRORS    = 1000000)        --Forget errors... load everything

    --===== Precondition the raw data to make it consistent

         -- These are order sensitive

    UPDATE #RawData SET RawData = REPLACE(RawData,'  ',' ')

    UPDATE #RawData SET RawData = REPLACE(RawData,'  ',' ')

    UPDATE #RawData SET RawData = REPLACE(RawData,' Check In:','<b>Check In:')

    UPDATE #RawData SET RawData = REPLACE(RawData,' <','<')

    UPDATE #RawData SET RawData = REPLACE(RawData,': ',':</b>')

    UPDATE #RawData SET RawData = REPLACE(RawData,'<br><b>Total Cost:','<b>Total Cost:')

    UPDATE #RawData SET RawData = REPLACE(RawData,'> ','>')

    UPDATE #RawData SET RawData = REPLACE(RawData,'<br /><br /><b>','<br><b>')

    UPDATE #RawData SET RawData = REPLACE(RawData,'<br />','<b>')

    UPDATE #RawData SET RawData = REPLACE(RawData,' Tax:','<b>Tax:')

    UPDATE #RawData SET RawData = REPLACE(RawData,'Total Cost:','Total Price:')

    UPDATE #RawData SET RawData = REPLACE(RawData,'<b>','~')

    UPDATE #RawData SET RawData = REPLACE(RawData,'</b>','~')

    UPDATE #RawData SET RawData = REPLACE(RawData,'<br>','|')

    UPDATE #RawData SET RawData = REPLACE(RawData,'~~|~','~')

    --===== Do the "impossible" split

    ;WITH

    cteSplit1 AS

    (--==== This splits reservations that have more than one Item per line

    SELECT ReservationID,

            ItemNumber = ROW_NUMBER() OVER (PARTITION BY ReservationID ORDER BY ReservationID),

            Split1

       FROM (

             SELECT ReservationID = CAST(LEFT(RawData,CHARINDEX(',',rd.RawData)-1) AS INT),

                    Split1 = SUBSTRING('|'+rd.RawData+'|',t.N+1,CHARINDEX('|','|'+rd.RawData+'|',t.N+1)-t.N-1)

               FROM #RawData rd,

                    dbo.Tally t

              WHERE t.N < LEN('|'+rd.RawData+'|')      

                AND t.N > CHARINDEX(',',rd.RawData)      

                AND SUBSTRING('|'+rd.RawData+'|',t.N,1) = '|'

            )d

    )

    ,

    cteSplit2 AS

    (--==== This splits the individual Items from each line with a couple of "helper" columns.

         -- SubItem let's us join back to the cte to get matching pairs

    SELECT ReservationID,

            ItemNumber,

            SubItem = ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber),

            HalfItem = (ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber)-1)%2,

            Split2 = SUBSTRING(s1.Split1+'~',t.N+1,CHARINDEX('~',s1.Split1+'~',t.N+1)-t.N-1)

       FROM cteSplit1 s1,

            dbo.Tally t

      WHERE t.N < LEN(s1.Split1+'~')      

        AND SUBSTRING(s1.Split1+'~',t.N,1) = '~'

    )

    --===== This reassembles the data into what we'd expect as a table

    SELECT c.ReservationID,

            c.ItemNumber,

            [Service Type] = MAX(CASE c.Split2  

                                 WHEN 'Cabin:'   THEN 'Cabin'

                                 WHEN 'Dock:'    THEN 'Dock'

                                 WHEN 'Pontoon:' THEN 'Pontoon'

                                 WHEN 'RV Site:' THEN 'RV Site'

                                 END),

            [Service Name] = MAX(CASE c.Split2  

                                 WHEN 'Cabin:'   THEN v.Split2

                                 WHEN 'Dock:'    THEN v.Split2

                                 WHEN 'Pontoon:' THEN v.Split2

                                 WHEN 'RV Site:' THEN v.Split2

                                 END),

            [Check In]     = MAX(CASE

                                 WHEN c.Split2 = 'Check In:'

                                  AND ISDATE(v.Split2) = 1

                                 THEN CAST(v.Split2 AS DATETIME)

                                 END),

            [Check OUT]    = MAX(CASE

                                  WHEN c.Split2 = 'Check Out:'

                                  AND ISDATE(v.Split2) = 1

                                 THEN CAST(v.Split2 AS DATETIME)

                                 END),

            Price          = MAX(CASE

                                 WHEN c.Split2 = 'Price:'

                                 THEN CAST(v.Split2 AS MONEY)

                                 END),

            Tax            = MAX(CASE

                                 WHEN c.Split2 = 'Tax:'

                                 THEN CAST(v.Split2 AS MONEY)

                                 END),

            Total          = MAX(CASE

                                 WHEN c.Split2 = 'Total:'

                                 THEN CAST(v.Split2 AS MONEY)

                                 END),

            [Total Price]  = MAX(CASE

                                 WHEN c.Split2 = 'Total Price:'

                                 THEN CAST(v.Split2 AS MONEY)

                                 END)

       FROM cteSplit2 c --This provides the column name part of paired values

      INNER JOIN

            cteSplit2 v --This provides the value part of paired values

         ON c.ReservationID = v.ReservationID

        AND c.ItemNumber    = v.ItemNumber

        AND c.SubItem+1     = v.SubItem --Value always follows the column name

      WHERE c.HalfItem      = 0         --Identifies Column name part of paired values

      GROUP BY c.ReservationID,

               c.ItemNumber

      ORDER BY c.ReservationID,

               c.ItemNumber

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff:

    A lot of fun ideas there!!!! Thank you very very much. Yes SQL 2005 for the question. I'll be working on this in about 2 hours and see where it goes.

    Bill

  • Hi Jeff:

    So, I'm playing with this now and it complains about a "Tally" object. When searching to find out more about this "Tally" object, I'm finding "Karthik, if you want to know how it works, I have an article coming out on May 7th (just 4 more days... :hehe called "The "Numbers" or "Tally" Table: What it is and how it replaces a loop."

    Smiles!

    So is there anything I can do quick and dirty for this?

    Thanks

    Bill

  • create table dbo.Tally (

    Number int identity (0,1) primary key,

    Junk bit)

    go

    insert into dbo.Tally(junk)

    select top 10001 null

    from sys.all_objects

    cross join sys.all_objects

    go

    alter table dbo.Tally

    drop column junk

    That should do it for you.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work

    Bill

  • If I understand what the point of the original question was correctly, all you want to do is returen a table from a a column with values. I had to do comma delimited one before and created a user defined function for it as follows...

    ALTER FUNCTION [dbo].[UDF_CharCommaSeparatedListToTable]

    (

    @CommaSeparatedList VARCHAR(8000)

    )

    RETURNS @ParsedTableValue TABLE (TableValue VARCHAR(1000))

    AS

    BEGIN

    DECLARE @TableValue VARCHAR(1000)

    DECLARE @Pos INT

    SET @CommaSeparatedList = LTRIM(RTRIM(@CommaSeparatedList))+ ','

    SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)

    IF REPLACE(@CommaSeparatedList, ',', '') <> ''

    BEGIN

    WHILE @Pos > 0

    BEGIN

    SET @TableValue = LTRIM(RTRIM(LEFT(@CommaSeparatedList, @Pos - 1)))

    IF @TableValue <> ''

    BEGIN

    INSERT INTO @ParsedTableValue (TableValue)

    VALUES (RTRIM(@TableValue))

    END

    SET @CommaSeparatedList = RIGHT(@CommaSeparatedList,

    LEN(@CommaSeparatedList) - @Pos)

    SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)

    END

    END

    RETURN

    END

  • wdillon (5/6/2008)


    Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work

    Bill

    Jeff rarely sleeps.

    Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (5/6/2008)


    wdillon (5/6/2008)


    Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work

    Bill

    Jeff rarely sleeps.

    Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

    Must ... resist ... temptation .... to ... just ... spam ... heck .... out ... of ... forums ... at ... 4 AM...

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Matt Miller (5/6/2008)


    Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

    So THAT's what it is! Here I was trying to cut back on coffee and all! 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • wdillon (5/6/2008)


    Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work

    Bill

    Awesome! Man, thanks for the compliment. I'm pretty well embarrassed that I forgot to give you the link for the Tally table. Better late than never I suppose… here 'tis…

    http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    ...and that's not the article that comes out tomorrow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Thanks for the intelligent and neat solution of this big problem.

    I was thinking of a scenario where the number of records in the input file is huge say a million, would this script be suitable in terms of performance.

    I think it wouldn't be as we have to first increase the tally table to 1 million rows and then do a cartesan join with input file records which is again a million.

    What could be possible ways in order to handle this kind of situation? I can think of writing my own program which would parse the sequential file and extract the required data.

    Any advices to tackle this situation.

    Thanks

    Anam

  • If it's truly a "file" and not just some passed parameter, I'd import it with Bulk Insert or BCP, instead. Dunno about others but I've been able to import 5.1 million 20 column rows in 60 seconds flat using a format file and Bulk Insert.

    The big problem with this one is the record format is not consistent. Some rows have more than 1 "Unit" in it and the units are not all laid out the same way. If it were me and I was going to have to content with importing a million rows on a regular basis, I'd get with the folks providing the data and, armed with my favorite bucket of porkchops, beat them into submission of regular importable data. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Along a similar vein I have an issue that I'm trying to figure out. I have a fairly simple table...

    create table TEST (

    a int identity ( 1,1 ),

    s varchar(2000) )

    insert TEST values ( 'ABC|DEF|GHI|JKL' )

    insert TEST values ( 'MNO|PQR|STU|VWX' )

    insert TEST values ( 'YZ|123|456|789' )

    I took Jeff's tally table and parse function and added a second column to the returned data, just a sequential integer.

    (The 2k in the name refers to the fact that the tally table has only 2 thousand entries. I also created a 10K and my unbounded 100M.)

    create FUNCTION f_StringToTable_2K_With_Index

    (@s-2varchar(max),

    @deliminatorchar(1) )

    RETURNS @t table ( c varchar(255),

    idxsmallint )

    AS

    BEGIN

    DECLARE @ismallint

    SET @s-2 = @deliminator + @s-2 + @deliminator

    INSERT INTO @t (c, idx)

    SELECT SUBSTRING(@s, N+1, CHARINDEX(@deliminator, @s-2, N+1)-N-1) , ROW_NUMBER() OVER (ORDER BY N)

    FROM dbo.Tally_2k

    WHERE N < LEN( @s-2 ) AND SUBSTRING(@s,N,1) = @deliminator

    RETURN

    END

    What I would like to figure out is how I can feed this table into a select statement and get output like:

    [font="Courier New"]a c idx

    --- --- ---

    1 ABC 1

    1 DEF 2

    1 GHI 3

    1 JKL 4

    2 MNO 1

    2 PQR 2

    2 STU 3

    2 VWX 4

    3 YZ 1

    3 123 2

    3 456 3

    3 789 4[/font]

    I'm sure there's an elegant way to do this, I'm just not seeing it. Lack of sleep from a teething 1.5 year old, there isn't enough coffee around here.

    Your time and consideration is greatly appreciated.

    Honor Super Omnia-
    Jason Miller

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply