What would the expression be for a running sequential count

  • Using Derived column I am creating a new column called journal_line_nbr

    The journal line number is a sequential number starting with 1 and advancing to the next number for each row of a journal entry.

    What is the syntax/expression for this in Derived column? None of the functions appear to have this.

  • I think you will need to create a package variable that you increment in order to do this as SSIS processes row by row not by the set. This is really something you should do in your source statement if at all possible. So if SQL Server 2005 is your source you could use the Row_Number function in your source statement.

  • Hi Jack,

    My source is an Excel file going into a SQL 2005 table. So will the variable function still work?

  • You can use a simple script component to number the rows as well. This assumes that the input table is "nicknamed" input0, and that the script component outputs a derived column called RID (which you would add manually to the output components of the script).

    The script looks like:

    Imports System

    Imports System.Data

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain

    Inherits UserComponent

    Private _RowNumber As Integer = 1

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Row.RID = _RowNumber

    _RowNumber = _RowNumber + 1

    '

    ' Add your code here

    '

    End Sub

    End Class

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Just want to clear this up as i am still learning SSIS.

    So after my Derived Column transformation task I want to connect a script component transformation task?

    The code you provide indicated 'enter your code here' would I have to define the columns again?

    Sorry if these questions are trival...

  • No - you wouldn't have to add anything else. Just like a derived column does, this script component would "add" a new column to the list of available items. (The 'add your code here' is the default text you get when you go into a script component, which I neglected to remove.)

    Like you said - just wire up the output from the derived column to the script component (i.e. drag the green arrow from the derived column component onto the script component.)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I was just going to go where Matt took you. To explain what Matt is saying, you should replace your Derived Column with a script component. If I understand your needs I correctly I would do the following:

    1. Create 2 variables scoped to the dataflow. journal_line_nbr as int32 and LastJournalEntry typed to the Unique ID of the journal entry

    2. Add a Script Component selecting a type of Transformation

    3. On the Input Columns tab of the script component select the JournalEntry column

    4. On the Inputs ans Output tab of the script component, Expand Output 0, Select Output Columns, and Click the Add Column button to add the journal_line_nbr column to the output using Int as the datatype

    5. On the script tab of the component enter your 2 variables in the ReadWrite variable line separated by a comma (variable1, variable2)

    6. Click Design Script and create a script like this:

    [font="Courier New"]Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' this checks to see if we are in the same journal entry

        ' Row.journal_entry is the current journal entry

        ' Variables.LastJournalEntry is the variable we defined

        If Row.journal_entry = Variables.LastJournalEntry Then

            ' increment the journal line number variable

            Variables.journal_line_nbr += 1

        Else

            ' new journal entry so reset journal_line_nbr variable

            Variables.journal_line_nbr = 1

            ' set the LastJournalEntry variable to current row

            Variables.LastJournalEntry = Row.journal_entry

        End If

        ' Row.journal_line_nbr is the output column created so

        ' set it to the Variable

        Row.journal_line_nbr = Variables.journal_line_nbr

    End Sub

    [/font]

    Now your output from the script will contain all the columns in your excel sheet + the journal_line_nbr.

  • Jack/Matt,

    I really do appreciate this, I am understanding how this works I have two questions:

    1. Do I need to create a blank column in Derived column task for the journal number? I think the answer is yes otherwise the script transformation task cannot transform the column (this is just to clarify and for my learning)

    2. Jack on your example you provide two variables the number and the journal entry (like the journal ID or identifier) I am assuming you placed this variable in here so that it could 'loop' through journal IDs?

    If I am correct, then what would the modified script look like if the output is to go to one journal ID/entry only?

  • 1. Define them as outputs in the script component. After adding the script component, right, go into Edit, then inputs and outputs, and add the output names you want. then use them in the code as Jack described.

    2. Essentially - Jack's code example allows for two variables (one being the sheet and one being the row), and mine just deals with one (just a row number).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • jsheldon (5/23/2008)


    1. Do I need to create a blank column in Derived column task for the journal number? I think the answer is yes otherwise the script transformation task cannot transform the column (this is just to clarify and for my learning)

    If all you are creating is a new column for journal_line_nbr then you do not need a derived column task. The script component transformation task as I outlined creates the column.

    jsheldon (5/23/2008)


    2. Jack on your example you provide two variables the number and the journal entry (like the journal ID or identifier) I am assuming you placed this variable in here so that it could 'loop' through journal IDs?

    I had assumed that you had data like this:

    journal entry data

    1 test1

    1 test2

    2 test1

    3 test1

    3 test2

    3 test3

    and you wanted a result like this:

    journal entry data journal line nbr

    1 test1 1

    1 test2 2

    2 test1 1

    3 test1 1

    3 test2 2

    3 test3 3

    This is what my code would return.

    jsheldon (5/23/2008)


    If I am correct, then what would the modified script look like if the output is to go to one journal ID/entry only?

    I think this may be what you are looking for:

    [font="Courier New"]Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' increment the journal line number variable

        Variables.journal_line_nbr += 1

        

        ' Row.journal_line_nbr is the output column created so

        ' set it to the Variable

        Row.journal_line_nbr = Variables.journal_line_nbr

    End Sub

    [/font]

    If all you want is a sequential number for each row in the excel sheet why not add an identity column to your destination table? It's a lot simpler than what you are trying to do in code.

  • If your package is inserting the data into a staging area why not just add an identity column to the staging table for row number? You might also just insert into a transient table that can be truncated each time. Not sure what the requirements are but this might prove to perform better than adding a script task...


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • Hi Jack,

    Ok I am getting this thanks..one more question my script if I gloss over it states 'journal_line_nbr is not a member of 'ScriptComponent_....'

    To answer your question, why not make an identity column on the destination table. That would make life too easy 😉

    The data is a telephone activity going into a transaction table of an ERP system. If I add or even modify the table schema then the loads and postings will not function. I don't mess with third-party vendor databases...

  • To answer your question, why not make an identity column on the destination table. That would make life too easy 😉

    The data is a telephone activity going into a transaction table of an ERP system. If I add or even modify the table schema then the loads and postings will not function. I don't mess with third-party vendor databases...

    Right but do you not write the data to a staging area first? If not then yeah use a script task or derived column but if you are writing it to a staging area I would just modify the staging table for this purpose...

    Good Luck!


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • The staging area is the table it is going to...bad ERP desgin what can I say.

    So my raw source goes to project_jrnl_load_trans and then the process moves it to the project_jrnl_line table

    Let me ask this, within my excel source data task can I add an identity column? That would be much easier than this script task.

  • jsheldon (5/23/2008)


    Let me ask this, within my excel source data task can I add an identity column? That would be much easier than this script task.

    If your configured properly then yes you can...here is what I did locally to test it:

    1.) Added an OLE DB Source to my data flow.

    2.) Selected a SQL 2005 DB Server for my connection manager.

    3.) Chose 'SQL Command' as my Data access mode.

    4.) Used the RowNumber() function combined with the OPENROWSET() connection method to compose a query which pulled all the data in from the Excel sheet adding in the RowNumber.

    --In order for this to work your SQL 2005 Surface Area Configuration must allow the OPENROWSET() connection method.

    --You'll most like want to define this SQL Statement dynamically through a package configuration or some other method.

    Here is the query:

    SELECT

    ROW_NUMBER() OVER (order by [VendorAccountNo]) AS [RowNumber]

    ,[VendorAccountNo]

    ,[LocalAccountNo]

    ,[CompanyCd]

    ,[CUSIP]

    FROM

    OPENROWSET('Microsoft.Jet.OLEDB.4.0','Excel 8.0;Database=C:\temp\MySheet.xls;HDR=0; IMEX=1;','SELECT * FROM [Sheet1$]')

    Hope this helps!


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply