October 7, 2009 at 9:43 am
Dear All,
I am working on a project for loading data from flat file source into a table using SSIS package. This SSIS ETL configuration is successfully done. There are thousands of records loaded very fast into the 'staging tables' of MS SQL Server 2008 database.
Now my new task is that I have to split some string data from this 'staging table' and insert into many columns along with the other existing columns of this 'staging table' into the "FINAL TABLE'.
For example, the 'staging table' contains 12 columns. I have to split the data of 'Column2' into two parts, and I have to split the data of the 'Column8' into six parts, and then insert the data into the 'final table' which has 18 columns. The strings of 'Column2' do not have any delimiter, we have to count the characters. But the strings of 'Column8' has two types of delimiter such as * and # respectively.
The 18 columns of 'FINAL TABLE' are the remaining 10 columns of the 'staging table' plus 8 NEW COLUMNS which are created from splitting the data of 'Column2' and 'Column8' of the 'staging table'.
Now I am in the middle of no where.
Please help me.
Thanks in advance.
Raymond
October 7, 2009 at 9:49 am
Please read the first article I reference below in my signature block regarding asking for assistance. To really help you it would help to have the DDL for the tables, some sample data, and expected results. If you follow the instructions in the article you will also get TESTED code in return.
October 7, 2009 at 9:51 am
Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.
October 7, 2009 at 9:59 am
From your description, it should be relatively easy to use the Substring function to split out those columns.
For example, something like this will handle column 2:
create table #T (
ID int identity primary key,
Col1 varchar(50),
Col2 varchar(50));
insert into #T (Col1, Col2)
select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';
select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)
from #T;
I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 7, 2009 at 10:19 am
Wouldn't it be easier to load the data back to SSIS pack and process them properly there? Or process the data and load them in their final state to the database.
Regards
Piotr
...and your only reply is slàinte mhath
October 7, 2009 at 10:28 am
Many thanks to all.
Dear GSquared,
The delimiters in 'Column8' are not in sequence. And it does not return the same number of columns either. It is very complex.
The minimum string length of 'Column8' is 9 characters, and maximum is 125 characters. For example, let's take a look at the following -
Data1:
1AA#ffjddxmv*2CC#jkdfgjkgdkdjdfgdererndndffdndfgndfgd
Data2:
3DD#klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh
Data3:
7HH#dnkjgrgdffnfgjknfnnj16456+6*#jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn
Now let's take a look at how the output is going to look like (please follow the comma , delimiters) -
Data1:
"1","AA","ffjddxmv","2","CC","jkdfgjkgdkdjdfgdererndndffdndfgndfgd"
Data2:
"3","DD","klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh"
Data3:
"7","HH","dnkjgrgdffnfgjknfnnj16456+6",",","jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn"
Thanks again and Best Regards.
Raymond
GSquared (10/7/2009)
From your description, it should be relatively easy to use the Substring function to split out those columns.For example, something like this will handle column 2:
create table #T (
ID int identity primary key,
Col1 varchar(50),
Col2 varchar(50));
insert into #T (Col1, Col2)
select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';
select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)
from #T;
I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?
October 8, 2009 at 7:14 am
From that, I'm extrapolating that an asterisk in the field indicates that either the next character is a column, or that the next number is a column. Can you clarify the rule on that?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 8, 2009 at 11:08 pm
It appears that this can be acomplshed with the use of two user defined database functions. The data is composed of two parts seperated by a "*". Each of the substrings can have three parts. The first two appear to be fixed lengths of 1 and 2, if they exist. There is a '#' seperator followed by a variable length string. The 125 just represents the max string length you indicated.
The first function ParseFunction1 would take the entire string @InString and the column number to return (1-6) @ColNum. Find the charindex of the "*" and store in a variable @DelimiterIndex. Then just call function 2 as follows:
select @DelimiterIndex = charindex(@InString, '*')
select @ResultString = case
when @DelimiterIndex < 1 and @ColNum in (1, 2, 3) then dbo.ParseFunction2(@InString, @ColNum) -- only one substring
when @DelimiterIndex < 1 and @ColNum in (4, 5, 6) then null -- Second substring is all null
when @ColNum = 1 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 1)
when @ColNum = 2 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 2)
when @ColNum = 3 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 3)
when @ColNum = 4 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 1)
when @ColNum = 5 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 2)
when @ColNum = 6 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 3)
end
ParseFunction2 would then take its substring and select the right peice in a similar way. I am not sure substring likes 0 lengths so you might need a special case when @DelimiterIndex is 0 or 1. Something like:
select @DelimiterIndex = charindex(@instring, '#')
select @ResultString = case
when @DelimiterIndex < 1 then null -- All three columns are null
when @ColNum = 1 and @DelimiterIndex = 1 then null
when @ColNum = 1 then substring(@Instring, 1,1)
when @ColNum = 2 and @DelimiterIndex < 3 then null
when @ColNum = 2 then substring(@Instring, 2,2)
when @ColNum = 3 then substring(@Instring, @DelimiterIndex+1,125)
end
October 8, 2009 at 11:37 pm
Matt Wilhoite (10/7/2009)
Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.
MID() is in VB. Did you mean SUBSTRING()?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply