February 27, 2011 at 10:32 pm
{edit} I found a bug in my code after I posted it, so I took it down. I'm trying to fix the bug right now.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 10:53 pm
JJH82 (2/27/2011)
Yes, I can create or use stored procedures on the server, but I have no .NET programming experience.
OK. I was suggesting a .NET stored procedure or function, so that may not fly for you.
Until Jeff comes up with something to significantly improve the T-SQL processing, one other option would be to get a .NET programmer to write a small piece of code that pre-processes the files into a form SQL Server can bulk load more easily.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2011 at 11:12 pm
Ok... I've not tested every eventuality, but this works with the test data provided by other posters.
First, here's that test data.
CREATE TABLE dbo.CLOB
(
Row_ID INT IDENTITY(1,1),
Parse_Clob NVARCHAR(4000)
)
INSERT INTO dbo.CLOB
SELECT ',2 Column_Name "Col1" "Col2" , 2 New_Column_Value "A" "B", 2 Old_Column_Value "a" "b"'
UNION ALL
SELECT ',2 Field_Name "F1" "F2", 2 New_Field_Name "F1" "F2", 2 Old_Field_Name "f1" "f2" ,1 Column_Name "Col3" ,1 New_Column_Value "C", 1 Old_Column_Value "c"'
UNION ALL
SELECT ',1 Field_Name "F3", 1 New_Field_Name "F3", 1 Old_Field_Name "f3" ,1 Column_Name "Col6" ,1 New_Column_Value "1<", 1 Old_Column_Value "2&"'
;
Next, the solution. It uses a thing called a "Tally Table". If you don't know what that is, how to build one, or how it replaces certain types of While Loops, please read the following article...
http://www.sqlservercentral.com/articles/T-SQL/62867/
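The splitter below assumes a numbers table named dbo.Tally already exists. As a minimal sketch only (the 11,000-row size is an arbitrary choice here; the linked article covers the full treatment), one way to build it is:
--==== Illustrative only: an 11,000-row Tally table, enough for strings up to 11,000 characters
SELECT TOP (11000)
       IDENTITY(INT, 1, 1) AS N
INTO   dbo.Tally
FROM   master.sys.all_columns AS ac1
CROSS JOIN master.sys.all_columns AS ac2;

ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N);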
WITH
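--==== Split each row into its comma-delimited sections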
cteFirstSplit AS
(
SELECT Row_ID,
Section = ROW_NUMBER() OVER (PARTITION BY Row_ID ORDER BY t.N)-1,
WorkingString = RTRIM(SUBSTRING(Parse_Clob,t.N+1,ISNULL(NULLIF(CHARINDEX(',',Parse_Clob,t.N+1),0),DATALENGTH(Parse_Clob)+1)-t.N-1))
FROM dbo.Tally t
CROSS JOIN dbo.Clob c
WHERE t.N BETWEEN 1 AND LEN(Parse_Clob)
AND SUBSTRING(Parse_Clob, t.N, 1) = ','
),
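--==== Decide whether each section is a Field or Column entry and reduce it to just the quoted values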
cteFirstClean AS
(
SELECT Row_ID,
Section,
Field_Type = CASE WHEN CHARINDEX('Field',WorkingString)>0 THEN 'Field' ELSE 'Column' END,
WorkingString = REPLACE(SUBSTRING(WorkingString, CHARINDEX('"',WorkingString), 4000),'" "','"')
FROM cteFirstSplit
),
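--==== Split each cleaned section at the double quotes into individual elements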
cteFinalSplit AS
(
SELECT Row_ID,
Section = Section,
Element = ROW_NUMBER() OVER (PARTITION BY Row_ID, Field_Type, Section ORDER BY Field_Type, Section, t.N)-1,
Field_Type,
SplitString = RTRIM(SUBSTRING(WorkingString,t.N+1,ISNULL(NULLIF(CHARINDEX('"',WorkingString,t.N+1),0),DATALENGTH(WorkingString)+1)-t.N-1))
FROM dbo.Tally t
CROSS JOIN cteFirstClean
WHERE t.N BETWEEN 1 AND LEN(WorkingString)-1
AND SUBSTRING(WorkingString, t.N, 1) = '"'
)
--==== Now, reassemble
SELECT Row_ID,
Field_Type = MAX(CASE WHEN Section % 3 = 0 THEN Field_Type ELSE '' END),
Field_Name = MAX(CASE WHEN Section % 3 = 0 THEN SplitString ELSE '' END),
New_Value = MAX(CASE WHEN Section % 3 = 1 THEN SplitString ELSE '' END),
Old_Value = MAX(CASE WHEN Section % 3 = 2 THEN SplitString ELSE '' END)
FROM cteFinalSplit
GROUP BY Row_ID, Field_Type, Element
ORDER BY Row_ID, Field_Type, Element
;
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2011 at 12:21 am
A couple of tweaks to Jeff's code, mostly for correctness, not speed. It will be interesting to see how the Tally code performs.
;
WITH cteFirstSplit
AS (
SELECT Row_ID,
Section =
ROW_NUMBER() OVER
(
PARTITION BY C.Row_ID
ORDER BY T.N
) - 1,
WorkingString =
LTRIM(RTRIM(SUBSTRING(C.Parse_Clob, T.N+1,
ISNULL(NULLIF(CHARINDEX(N',', C.Parse_Clob, T.N + 1), 0),
DATALENGTH(C.Parse_Clob) + 1) - T.N-1)))
FROM dbo.Tally AS T
JOIN dbo.Clob AS C
ON T.N >= 1
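-- length in characters (total bytes divided by bytes per character), so this works for both VARCHAR and NVARCHAR data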
AND T.N <= DATALENGTH(C.Parse_Clob) / DATALENGTH(LEFT(C.Parse_Clob, 1))
WHERE SUBSTRING(C.Parse_Clob, T.N, 1) = N','
),
cteFirstClean
AS (
SELECT S1.Row_ID,
S1.Section,
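-- classify each section by the ten characters that follow the leading count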
Field_Type =
CASE SUBSTRING(S1.WorkingString, 1 + CHARINDEX(SPACE(1), S1.WorkingString), 10)
WHEN N'New_Field_' THEN N'Field'
WHEN N'Old_Field_' THEN N'Field'
WHEN N'Field_Name' THEN N'Field'
WHEN N'Old_Column' THEN N'Column'
WHEN N'New_Column' THEN N'Column'
WHEN N'Column_Nam' THEN N'Column'
END,
WorkingString =
LTRIM(REPLACE(SUBSTRING(S1.WorkingString,
CHARINDEX(NCHAR(34), S1.WorkingString), 2147483647),
NCHAR(34) + SPACE(1) + NCHAR(34), NCHAR(34)))
FROM cteFirstSplit AS S1
), cteFinalSplit
AS (
SELECT Row_ID,
Section = Section,
Element = ROW_NUMBER() OVER
(
PARTITION BY FC.Row_ID, FC.Field_Type, FC.Section
ORDER BY FC.Field_Type, FC.Section, T.N
) - 1,
Field_Type,
SplitString =
RTRIM(SUBSTRING(FC.WorkingString, T.N + 1,
ISNULL(NULLIF(CHARINDEX(NCHAR(34), FC.WorkingString, T.N + 1), 0),
DATALENGTH(FC.WorkingString) + 1) - T.N - 1))
FROM dbo.Tally AS T
JOIN cteFirstClean AS FC
ON T.N >= 1
AND T.N <= (DATALENGTH(FC.WorkingString) / DATALENGTH(LEFT(FC.WorkingString, 1))) - 1
WHERE SUBSTRING(FC.WorkingString, T.N, 1) = NCHAR(34)
)
SELECT FS.Row_ID,
Field_Type = MAX(CASE WHEN FS.Section % 3 = 0 THEN FS.Field_Type ELSE SPACE(0) END),
Field_Name = MAX(CASE WHEN FS.Section % 3 = 0 THEN FS.SplitString ELSE SPACE(0) END),
New_Value = MAX(CASE WHEN FS.Section % 3 = 1 THEN FS.SplitString ELSE SPACE(0) END),
Old_Value = MAX(CASE WHEN FS.Section % 3 = 2 THEN FS.SplitString ELSE SPACE(0) END)
FROM cteFinalSplit AS FS
GROUP BY
FS.Row_ID,
FS.Field_Type,
FS.Element
ORDER BY
FS.Row_ID,
FS.Field_Type,
FS.Element
;
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 28, 2011 at 5:51 am
SQLkiwi (2/28/2011)
A couple of tweaks to Jeff's code, mostly for correctness
I was in a bit of a hurry... what tweaks did you make?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2011 at 5:54 am
Jeff Moden (2/28/2011)
SQLkiwi (2/28/2011)
A couple of tweaks to Jeff's code, mostly for correctness
I was in a bit of a hurry... what tweaks did you make?
Never mind. I see. My code worked, but it wouldn't have tolerated many variations in the data.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2011 at 5:59 am
Jeff Moden (2/28/2011)
Never mind. I see. My code worked, but it wouldn't have tolerated many variations in the data.
Yes, only minor changes to some very nice code. My main worry is that the sorts and string splitting operations won't scale well. We'll see if the real data does have 100,000 groups in a single line as was mentioned earlier...
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 28, 2011 at 6:55 am
Thanks, Paul.
I agree... in particular, I saw that 70% sort on the second use of the Tally table, and I believe it would go away if we simply wrote the results of cteFirstClean out to an interim table and did the final split on that.
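As a minimal sketch of that interim-table idea (the table shape is assumed from cteFirstClean; names and sizes are illustrative):
--==== Sketch only: persist the cleaned sections once, then run the final split against the temp table
IF OBJECT_ID('tempdb..#FirstClean') IS NOT NULL
    DROP TABLE #FirstClean;

CREATE TABLE #FirstClean
(
    Row_ID        INT,
    Section       BIGINT,
    Field_Type    NVARCHAR(10),
    WorkingString NVARCHAR(4000)
);

--==== Populate it with the SELECT that currently forms cteFirstClean, e.g.
-- INSERT INTO #FirstClean (Row_ID, Section, Field_Type, WorkingString)
-- SELECT Row_ID, Section, Field_Type, WorkingString FROM cteFirstClean;

--==== cteFinalSplit and the final pivot would then read FROM #FirstClean instead of referencing the CTE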
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2011 at 7:08 am
Jeff Moden (2/28/2011)
I agree... in particular, I saw that 70% sort on the second use of the Tally table, and I believe it would go away if we simply wrote the results of cteFirstClean out to an interim table and did the final split on that.
I doubt it would go away - the sorting for the window aggregates has to be performed at some point (creating or inserting into an index on the intermediate table would involve a sort too).
The intermediate result set might also be quite large, so the memory grant required by the sort could cause problems too. To over-simplify a little, as the sort grows, so does the memory grant. As soon as the server is unable to meet the memory grant request, the sort will either physically spill to tempdb disk, or the query will wait on RESOURCE_SEMAPHORE. Either way, not good.
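For anyone wanting to keep an eye on that, here is a minimal sketch (the column selection is illustrative) of checking outstanding grants with sys.dm_exec_query_memory_grants; a NULL grant_time means the request is still waiting on RESOURCE_SEMAPHORE:
-- Illustrative only: show current memory grant requests, largest first
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       grant_time,      -- NULL = still waiting on RESOURCE_SEMAPHORE
       wait_time_ms
FROM   sys.dm_exec_query_memory_grants
ORDER  BY requested_memory_kb DESC;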
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 1, 2011 at 2:20 pm
Thanks, guys. Sorry for replying so late; I am trying them out today.
🙂
March 8, 2011 at 3:01 pm
Great job, guys, and the execution time is excellent: it reduced processing time from 3 hours 45 minutes to 2 minutes. Thank you for all your input! :-) :-D