How to generate a sequence for a particular group of rows in SSIS

  • Hi There,

    I have a SSIS package which joins a flat file and 3 more database tables.

    From Merge Join Output, If my data is like

    ColA ColB colC ColD

    123 4 xyz 12

    123 4 xyz 12

    23456A 5 abc 10

    23456A 5 abc 10

    23456A 5 abc 10

    23456A 5 abc 10

    Then I need to Replace ColB with values like

    ColA ColB colC ColD

    123 1 xyz 12

    123 2 xyz 12

    23456A 1 abc 10

    23456A 2 abc 10

    23456A 3 abc 10

    23456A 4 abc 10

    ColB should be auto incremented based on ColA. If COlA has value 123 two times then ColB should be 1 for first row and 2 for the second row.

    If you look at the value of COlA "23456A" has been repeate for 4 times so my ColB should be 1 for first row n 2 for second rown and 3 for third row and 4 for fourth row.

    I know how to achieve this one using T-SQL. However It is in between of the SSIS package so I can't write T-SQL.

    IF I write T-SQL the query should be like this

    select ColA,ColC, COlD , Row_Number() over (partition by colA order by ColA) as COlB from sometable

    will give me the result....Can some one please help me how to get this one using SSIS. These four columns works as Primary key to my destination table. It was working fire about a week ago when there were no duplicates. Now I found some duplicate rows which will throw me an error primary key violation. So, I need to replace colB with Incrementing nmber...

    Plz Help me... I really appriciate any suggestion

    THans In Advance

  • not an expert on SSIS, but as you identified,the row_number function can do what you want, for sure...maybe stick everything in a table and do the row_number in a Script Task?

    working code:

    CREATE TABLE #Example(

    ColA varchar(50),

    ColB int,

    ColC varchar(50),

    ColD int )

    INSERT INTO #Example

    SELECT '123', 4,'xyz', 12 UNION ALL

    SELECT '123', 4,' xyz', 12 UNION ALL

    SELECT '23456A', 5,' abc', 10 UNION ALL

    SELECT '23456A', 5,' abc', 10 UNION ALL

    SELECT '23456A', 5,' abc', 10 UNION ALL

    SELECT '23456A', 5,' abc', 10

    Select

    ColA,

    row_number() OVER (partition by ColA Order by ColA) As ColB,

    ColC,

    ColD from #Example

    --Results:

    /*

    ColA ColB ColC ColD

    ------- ----- ----- -----

    123 1 xyz 12

    123 2 xyz 12

    23456A 1 abc 10

    23456A 2 abc 10

    23456A 3 abc 10

    23456A 4 abc 10

    */

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I did a Google search and quickly came up with a number of web pages that tell you how to do this. There is a third-party transformation, but that doesn't have the flexibility that you need. The other pages I looked at tell how to do this with a script task.

    http://support.microsoft.com/kb/908460

    http://weblogs.sqlteam.com/jamesn/archive/2008/02/13/60509.aspx

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Use a Sort Transformation to first make sure your rows are sorted properly (ie, sorted by ColA).

    Then use a Script Component to "reassign" ColB.

    In the script component you can create a variable then holds the value for the previous ColA value. Also create a variable that is a counter (initially 1).

    Check if ColA for the current row is the same as ColA for the previous row. If so, then increment the counter by 1 assign the counter value to ColB. Otherwise if the Current Row ColA is not equal to the Previous Row ColA then set the Counter = 1 and assign the counter value to ColB.

    I can post an actual code example tomorrow, when I am in the office.

    Hope this is what you are looking for.

  • Choose ColA and ColB as Input Columns

    Heres a Script Component code example:

    Public Class ScriptMain

    Inherits UserComponent

    Dim Counter As Integer = 1

    Dim LastColA As String = ""

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If LastColA Row.ColA.ToString Then

    Counter = 1

    End If

    Row.ColB = Counter

    LastColA = Row.ColA.ToString

    Counter += 1

    End Sub

    End Class

  • I really appriciate your help

    Thanks,

    Raj

  • Please elaborate on how you get the variable populated for the previouse row "LastColA".

  • The Input0_ProcessInputRow sub is called for every row in the input data.

    As long as the input data is in the required order, this allows the LastColA variable to be populated (and remain in memory, because it is defined outside of the sub and therefore remains in scope) every time a row is processed. It is merely holding what was in the previous row to be read - it does not go and seek anything.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks for the reply. I'm no developer and therefore steer clear of the script tasks and components, but in this case I need a solution fast. I have created a sample package for this example, and using just the script pasted, my output is:

    ColA,ColB,ColC,ColD

    123,1,xyz,12

    123,1,xyz,12

    12346B,1,dfe,13

    12346B,1,dfe,13

    23456A,1,abc,10

    23456A,1,abc,10

    23456A,1,abc,10

    23456A,1,abc,10

    ColB is set to 1 always,I therefore still need to get LastColA variable populated. I can google how to populate a variable via the script component, but do I do this in a new component task prior to this one or can this be performed in the start of this script. Any sample code is much appreaciated.

  • Based on the sample data and code, I cannot see why it would not work for you, as you appear to be doing the right things.

    Or, as you are in a hurry, perhaps you should consider use of the Row_Number() solution, as exemplified by Lowell, which can be achieved directly in SQL Server, to remove some of the SSIS complexities.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply