Need Help to split Single String data and insert into many columns.

Question

Need Help to split Single String data and insert into many columns.

raymond.dba

Valued Member

Points: 50
More actions
October 7, 2009 at 9:43 am

#138550

Dear All,
I am working on a project for loading data from flat file source into a table using SSIS package. This SSIS ETL configuration is successfully done. There are thousands of records loaded very fast into the 'staging tables' of MS SQL Server 2008 database.
Now my new task is that I have to split some string data from this 'staging table' and insert into many columns along with the other existing columns of this 'staging table' into the "FINAL TABLE'.
For example, the 'staging table' contains 12 columns. I have to split the data of 'Column2' into two parts, and I have to split the data of the 'Column8' into six parts, and then insert the data into the 'final table' which has 18 columns. The strings of 'Column2' do not have any delimiter, we have to count the characters. But the strings of 'Column8' has two types of delimiter such as * and # respectively.
The 18 columns of 'FINAL TABLE' are the remaining 10 columns of the 'staging table' plus 8 NEW COLUMNS which are created from splitting the data of 'Column2' and 'Column8' of the 'staging table'.
Now I am in the middle of no where.
Please help me.
Thanks in advance.
Raymond

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 1

Please read the first article I reference below in my signature block regarding asking for assistance. To really help you it would help to have the DDL for the tables, some sample data, and expected results. If you follow the instructions in the article you will also get TESTED code in return.

matt6288 SSCertifiable Points: 5058 More actions · Answer 2

Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.

GSquared SSC Guru Points: 260824 More actions · Answer 3

From your description, it should be relatively easy to use the Substring function to split out those columns.

For example, something like this will handle column 2:

create table #T (

ID int identity primary key,

Col1 varchar(50),

Col2 varchar(50));

insert into #T (Col1, Col2)

select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';

select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)

from #T;

I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Piotr.Rodak SSCrazy Eights Points: 9376 More actions · Answer 4

Wouldn't it be easier to load the data back to SSIS pack and process them properly there? Or process the data and load them in their final state to the database.

Regards

Piotr

...and your only reply is slàinte mhath

raymond.dba Valued Member Points: 50 More actions · Answer 5

Many thanks to all.

Dear GSquared,

The delimiters in 'Column8' are not in sequence. And it does not return the same number of columns either. It is very complex.

The minimum string length of 'Column8' is 9 characters, and maximum is 125 characters. For example, let's take a look at the following -

Data1:

1AA#ffjddxmv*2CC#jkdfgjkgdkdjdfgdererndndffdndfgndfgd

Data2:

3DD#klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh

Data3:

7HH#dnkjgrgdffnfgjknfnnj16456+6*#jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn

Now let's take a look at how the output is going to look like (please follow the comma , delimiters) -

Data1:

"1","AA","ffjddxmv","2","CC","jkdfgjkgdkdjdfgdererndndffdndfgndfgd"

Data2:

"3","DD","klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh"

Data3:

"7","HH","dnkjgrgdffnfgjknfnnj16456+6",",","jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn"

Thanks again and Best Regards.

Raymond

GSquared (10/7/2009)
From your description, it should be relatively easy to use the Substring function to split out those columns.
For example, something like this will handle column 2:
create table #T (
ID int identity primary key,
Col1 varchar(50),
Col2 varchar(50));
insert into #T (Col1, Col2)
select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';
select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)
from #T;
I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?

GSquared SSC Guru Points: 260824 More actions · Answer 6

From that, I'm extrapolating that an asterisk in the field indicates that either the next character is a column, or that the next number is a column. Can you clarify the rule on that?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

michaelwelcome SSC-Addicted Points: 471 More actions · Answer 7

It appears that this can be acomplshed with the use of two user defined database functions. The data is composed of two parts seperated by a "*". Each of the substrings can have three parts. The first two appear to be fixed lengths of 1 and 2, if they exist. There is a '#' seperator followed by a variable length string. The 125 just represents the max string length you indicated.

The first function ParseFunction1 would take the entire string @InString and the column number to return (1-6) @ColNum. Find the charindex of the "*" and store in a variable @DelimiterIndex. Then just call function 2 as follows:

select @DelimiterIndex = charindex(@InString, '*')

select @ResultString = case

when @DelimiterIndex < 1 and @ColNum in (1, 2, 3) then dbo.ParseFunction2(@InString, @ColNum) -- only one substring

when @DelimiterIndex < 1 and @ColNum in (4, 5, 6) then null -- Second substring is all null

when @ColNum = 1 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 1)

when @ColNum = 2 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 2)

when @ColNum = 3 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 3)

when @ColNum = 4 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 1)

when @ColNum = 5 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 2)

when @ColNum = 6 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 3)

end

ParseFunction2 would then take its substring and select the right peice in a similar way. I am not sure substring likes 0 lengths so you might need a special case when @DelimiterIndex is 0 or 1. Something like:

select @DelimiterIndex = charindex(@instring, '#')

select @ResultString = case

when @DelimiterIndex < 1 then null -- All three columns are null

when @ColNum = 1 and @DelimiterIndex = 1 then null

when @ColNum = 1 then substring(@Instring, 1,1)

when @ColNum = 2 and @DelimiterIndex < 3 then null

when @ColNum = 2 then substring(@Instring, 2,2)

when @ColNum = 3 then substring(@Instring, @DelimiterIndex+1,125)

end

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

Matt Wilhoite (10/7/2009)
Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.

MID() is in VB. Did you mean SUBSTRING()?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]