July 24, 2009 at 5:00 pm
Hi There,
I have a SSIS package which joins a flat file and 3 more database tables.
From Merge Join Output, If my data is like
ColA ColB colC ColD
123 4 xyz 12
123 4 xyz 12
23456A 5 abc 10
23456A 5 abc 10
23456A 5 abc 10
23456A 5 abc 10
Then I need to Replace ColB with values like
ColA ColB colC ColD
123 1 xyz 12
123 2 xyz 12
23456A 1 abc 10
23456A 2 abc 10
23456A 3 abc 10
23456A 4 abc 10
ColB should be auto incremented based on ColA. If COlA has value 123 two times then ColB should be 1 for first row and 2 for the second row.
If you look at the value of COlA "23456A" has been repeate for 4 times so my ColB should be 1 for first row n 2 for second rown and 3 for third row and 4 for fourth row.
I know how to achieve this one using T-SQL. However It is in between of the SSIS package so I can't write T-SQL.
IF I write T-SQL the query should be like this
select ColA,ColC, COlD , Row_Number() over (partition by colA order by ColA) as COlB from sometable
will give me the result....Can some one please help me how to get this one using SSIS. These four columns works as Primary key to my destination table. It was working fire about a week ago when there were no duplicates. Now I found some duplicate rows which will throw me an error primary key violation. So, I need to replace colB with Incrementing nmber...
Plz Help me... I really appriciate any suggestion
THans In Advance
July 25, 2009 at 4:03 am
not an expert on SSIS, but as you identified,the row_number function can do what you want, for sure...maybe stick everything in a table and do the row_number in a Script Task?
working code:
CREATE TABLE #Example(
ColA varchar(50),
ColB int,
ColC varchar(50),
ColD int )
INSERT INTO #Example
SELECT '123', 4,'xyz', 12 UNION ALL
SELECT '123', 4,' xyz', 12 UNION ALL
SELECT '23456A', 5,' abc', 10 UNION ALL
SELECT '23456A', 5,' abc', 10 UNION ALL
SELECT '23456A', 5,' abc', 10 UNION ALL
SELECT '23456A', 5,' abc', 10
Select
ColA,
row_number() OVER (partition by ColA Order by ColA) As ColB,
ColC,
ColD from #Example
--Results:
/*
ColA ColB ColC ColD
------- ----- ----- -----
123 1 xyz 12
123 2 xyz 12
23456A 1 abc 10
23456A 2 abc 10
23456A 3 abc 10
23456A 4 abc 10
*/
Lowell
July 26, 2009 at 6:42 pm
I did a Google search and quickly came up with a number of web pages that tell you how to do this. There is a third-party transformation, but that doesn't have the flexibility that you need. The other pages I looked at tell how to do this with a script task.
http://support.microsoft.com/kb/908460
http://weblogs.sqlteam.com/jamesn/archive/2008/02/13/60509.aspx
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
July 26, 2009 at 7:05 pm
Use a Sort Transformation to first make sure your rows are sorted properly (ie, sorted by ColA).
Then use a Script Component to "reassign" ColB.
In the script component you can create a variable then holds the value for the previous ColA value. Also create a variable that is a counter (initially 1).
Check if ColA for the current row is the same as ColA for the previous row. If so, then increment the counter by 1 assign the counter value to ColB. Otherwise if the Current Row ColA is not equal to the Previous Row ColA then set the Counter = 1 and assign the counter value to ColB.
I can post an actual code example tomorrow, when I am in the office.
Hope this is what you are looking for.
July 27, 2009 at 8:18 am
Choose ColA and ColB as Input Columns
Heres a Script Component code example:
Public Class ScriptMain
Inherits UserComponent
Dim Counter As Integer = 1
Dim LastColA As String = ""
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If LastColA Row.ColA.ToString Then
Counter = 1
End If
Row.ColB = Counter
LastColA = Row.ColA.ToString
Counter += 1
End Sub
End Class
July 27, 2009 at 8:30 am
I really appriciate your help
Thanks,
Raj
August 31, 2009 at 8:59 am
Please elaborate on how you get the variable populated for the previouse row "LastColA".
August 31, 2009 at 9:35 am
The Input0_ProcessInputRow sub is called for every row in the input data.
As long as the input data is in the required order, this allows the LastColA variable to be populated (and remain in memory, because it is defined outside of the sub and therefore remains in scope) every time a row is processed. It is merely holding what was in the previous row to be read - it does not go and seek anything.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
August 31, 2009 at 4:42 pm
Thanks for the reply. I'm no developer and therefore steer clear of the script tasks and components, but in this case I need a solution fast. I have created a sample package for this example, and using just the script pasted, my output is:
ColA,ColB,ColC,ColD
123,1,xyz,12
123,1,xyz,12
12346B,1,dfe,13
12346B,1,dfe,13
23456A,1,abc,10
23456A,1,abc,10
23456A,1,abc,10
23456A,1,abc,10
ColB is set to 1 always,I therefore still need to get LastColA variable populated. I can google how to populate a variable via the script component, but do I do this in a new component task prior to this one or can this be performed in the start of this script. Any sample code is much appreaciated.
September 1, 2009 at 3:15 am
Based on the sample data and code, I cannot see why it would not work for you, as you appear to be doing the right things.
Or, as you are in a hurry, perhaps you should consider use of the Row_Number() solution, as exemplified by Lowell, which can be achieved directly in SQL Server, to remove some of the SSIS complexities.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply