Help needed to decode comma delimited string and insert into tables

  • {edit} I found a bug in my code after I posted it, so I took it down. I'm trying to fix the bug right now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JJH82 (2/27/2011)


    yes, I can create or use stored procedures on the server.

    :crazy:but i have not .Net programming experience

    OK. I was suggesting a .NET stored procedure or function, so that may not fly for you.

    Until Jeff comes up with something to significantly improve the TSQL processing, one other option would be to get a .NET programmer to write a little bit of code to pre-process the files into an easier form for SQL Server to bulk load.

  • Ok... I've not tested every eventuality, but this works with the test data provided by other posters.

    First, here's that test data.

    CREATE TABLE dbo.CLOB

    (

    Row_ID INT IDENTITY(1,1),

    Parse_Clob NVARCHAR(4000)

    )

    INSERT INTO dbo.CLOB

    SELECT ',2 Column_Name "Col1" "Col2" , 2 New_Column_Value "A" "B", 2 Old_Column_Value "a" "b"'

    UNION ALL

    SELECT ',2 Field_Name "F1" "F2", 2 New_Field_Name "F1" "F2", 2 Old_Field_Name "f1" "f2" ,1 Column_Name "Col3" ,1 New_Column_Value "C", 1 Old_Column_Value "c"'

    UNION ALL

    SELECT ',1 Field_Name "F3", 1 New_Field_Name "F3", 1 Old_Field_Name "f3" ,1 Column_Name "Col6" ,1 New_Column_Value "1<", 1 Old_Column_Value "2&"'

    ;

    Next, the solution. It uses a thing called a "Tally Table". If you don't know what that is, how to build one, or how it replaces certain types of While Loops, please read the following article...

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    WITH

    cteFirstSplit AS

    (

    SELECT Row_ID,

    Section = ROW_NUMBER() OVER (PARTITION BY Row_ID ORDER BY t.N)-1,

    WorkingString = RTRIM(SUBSTRING(Parse_Clob,t.N+1,ISNULL(NULLIF(CHARINDEX(',',Parse_Clob,t.N+1),0),DATALENGTH(Parse_Clob)+1)-t.N-1))

    FROM dbo.Tally t

    CROSS JOIN dbo.Clob c

    WHERE t.N BETWEEN 1 AND LEN(Parse_Clob)

    AND SUBSTRING(Parse_Clob, t.N, 1) = ','

    ),

    cteFirstClean AS

    (

    SELECT Row_ID,

    Section,

    Field_Type = CASE WHEN CHARINDEX('Field',WorkingString)>0 THEN 'Field' ELSE 'Column' END,

    WorkingString = REPLACE(SUBSTRING(WorkingString, CHARINDEX('"',WorkingString), 4000),'" "','"')

    FROM cteFirstSplit

    ),

    cteFinalSplit AS

    (

    SELECT Row_ID,

    Section = Section,

    Element = ROW_NUMBER() OVER (PARTITION BY Row_ID, Field_Type, Section ORDER BY Field_Type, Section, t.N)-1,

    Field_Type,

    SplitString = RTRIM(SUBSTRING(WorkingString,t.N+1,ISNULL(NULLIF(CHARINDEX('"',WorkingString,t.N+1),0),DATALENGTH(WorkingString)+1)-t.N-1))

    FROM dbo.Tally t

    CROSS JOIN cteFirstClean

    WHERE t.N BETWEEN 1 AND LEN(WorkingString)-1

    AND SUBSTRING(WorkingString, t.N, 1) = '"'

    )

    --==== Now, reassemble

    SELECT Row_ID,

    Field_Type = MAX(CASE WHEN Section % 3 = 0 THEN Field_Type ELSE '' END),

    Field_Name = MAX(CASE WHEN Section % 3 = 0 THEN SplitString ELSE '' END),

    New_Value = MAX(CASE WHEN Section % 3 = 1 THEN SplitString ELSE '' END),

    Old_Value = MAX(CASE WHEN Section % 3 = 2 THEN SplitString ELSE '' END)

    FROM cteFinalSplit

    GROUP BY Row_ID, Field_Type, Element

    ORDER BY Row_ID, Field_Type, Element

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A couple of tweaks to Jeff's code, mostly for correctness, not speed. It will be interesting to see how the Tally code performs.

    ;

    WITH cteFirstSplit

    AS (

    SELECT Row_ID,

    Section =

    ROW_NUMBER() OVER

    (

    PARTITION BY C.Row_ID

    ORDER BY T.N

    ) - 1,

    WorkingString =

    LTRIM(RTRIM(SUBSTRING(C.Parse_Clob, T.N+1,

    ISNULL(NULLIF(CHARINDEX(N',', C.Parse_Clob, T.N + 1), 0),

    DATALENGTH(C.Parse_Clob) + 1) - T.N-1)))

    FROM dbo.Tally AS T

    JOIN dbo.Clob AS C

    ON T.N >= 1

    AND T.N <= DATALENGTH(C.Parse_Clob) / DATALENGTH(LEFT(C.Parse_Clob, 1))

    WHERE SUBSTRING(C.Parse_Clob, T.N, 1) = N','

    ),

    cteFirstClean

    AS (

    SELECT S1.Row_ID,

    S1.Section,

    Field_Type =

    CASE SUBSTRING(S1.WorkingString, 1 + CHARINDEX(SPACE(1), S1.WorkingString), 10)

    WHEN N'New_Field_' THEN N'Field'

    WHEN N'Old_Field_' THEN N'Field'

    WHEN N'Field_Name' THEN N'Field'

    WHEN N'Old_Column' THEN N'Column'

    WHEN N'New_Column' THEN N'Column'

    WHEN N'Column_Nam' THEN N'Column'

    END,

    WorkingString =

    LTRIM(REPLACE(SUBSTRING(S1.WorkingString,

    CHARINDEX(NCHAR(34), S1.WorkingString), 2147483647),

    NCHAR(34) + SPACE(1) + NCHAR(34), NCHAR(34)))

    FROM cteFirstSplit AS S1

    ), cteFinalSplit

    AS (

    SELECT Row_ID,

    Section = Section,

    Element = ROW_NUMBER() OVER

    (

    PARTITION BY FC.Row_ID, FC.Field_Type, FC.Section

    ORDER BY FC.Field_Type, FC.Section, T.N

    ) - 1,

    Field_Type,

    SplitString =

    RTRIM(SUBSTRING(FC.WorkingString, T.N + 1,

    ISNULL(NULLIF(CHARINDEX(NCHAR(34), FC.WorkingString, T.N + 1), 0),

    DATALENGTH(FC.WorkingString) + 1) - T.N - 1))

    FROM dbo.Tally AS T

    JOIN cteFirstClean AS FC

    ON T.N >= 1

    AND T.N <= (DATALENGTH(FC.WorkingString) / DATALENGTH(LEFT(FC.WorkingString, 1))) - 1

    WHERE SUBSTRING(FC.WorkingString, T.N, 1) = NCHAR(34)

    )

    SELECT FS.Row_ID,

    Field_Type = MAX(CASE WHEN FS.Section % 3 = 0 THEN FS.Field_Type ELSE SPACE(0) END),

    Field_Name = MAX(CASE WHEN Section % 3 = 0 THEN FS.SplitString ELSE SPACE(0) END),

    New_Value = MAX(CASE WHEN Section % 3 = 1 THEN FS.SplitString ELSE SPACE(0) END),

    Old_Value = MAX(CASE WHEN Section % 3 = 2 THEN FS.SplitString ELSE SPACE(0) END)

    FROM cteFinalSplit AS FS

    GROUP BY

    FS.Row_ID,

    FS.Field_Type,

    FS.Element

    ORDER BY

    FS.Row_ID,

    FS.Field_Type,

    FS.Element

    ;

    Paul

  • SQLkiwi (2/28/2011)


    A couple of tweaks to Jeff's code, mostly for correctness

    I was in a bit of a hurry... what tweaks did you make?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/28/2011)


    SQLkiwi (2/28/2011)


    A couple of tweaks to Jeff's code, mostly for correctness

    I was in a bit of a hurry... what tweaks did you make?

    Never mind. I see. My code worked but wouldn't suffer many data changes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/28/2011)


    Never mind. I see. My code worked but wouldn't suffer many data changes.

    Yes, only minor changes to some very nice code. My main worry is that the sorts and string splitting operations won't scale well. We'll see if the real data does have 100,000 groups in a single line as was mentioned earlier...

  • Thanks, Paul.

    I agree... in particular, I saw that 70% sort on the second use of the Tally Table and believe it would go away if we simply wrote the results of the cteFirstClean out to an interim table to do the final split on.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/28/2011)


    I agree... in particular, I saw that 70% sort on the second use of the Tally Table and believe it would go away if we simply wrote the results of the cteFirstClean out to an interim table to do the final split on.

    I doubt it would go away - the sorting for the window aggregates has to be performed at some point (creating or inserting into an index on the intermediate table would involve a sort too).

    The intermediate result set might also be quite large, so the memory grant required by the sort could cause problems too. To over-simplify a little, as the sort grows, so does the memory grant. As soon as the server is unable to meet the memory grant request, the sort will either physically spill to tempdb disk, or the query will wait on RESOURCE_SEMAPHORE. Either way, not good.

  • thanks guys, sorry for replying so late, i am trying them out today

    🙂

  • Great jobs guys, and the execution time is excellent, it reduced processing time from 3 hr45min to 2 mins. Thank you for all your input!:-):-D

Viewing 11 posts - 31 through 40 (of 40 total)

You must be logged in to reply to this topic. Login to reply