Split a flat file and load in two different tables

  • Hi All, i am Krish and i am new to this forum as well as SSIS. I am in learning stage and i have scenario like this:

    I have 5 customer files(Flat) in my C:\ drive. My file has data like this

    Cust_US(File name) 1790(No.of records)

    111023 EDW123 james blake TX US

    111024 EDW124 Andy Roddick LA US

    111024 EDW125 Serena Williams CA US

    My all 5 files structure is same. First row will be name of the file and record count.

    now i want to load my first record in one table that is Audit table and the other data into my Customer dimension table. I am using Foreach Loop to load without first record. I was succeded doing that. I was stuck at splittin the file and load it into two different tables.

    I have no idea how to identify first recod, becuase the first row structure and rest of the data structure is different.

    I am still working on this scenario trying to identify the ways.

    Can someone help me in this scenario. I really appreciate someones help here

    Thanks in advance.

    Thanks

    Krish

  • If you've got a source file with dissimilar structure from one line of data to the next, you're most likely going to have to use a Script Component to address this. You can configure the Script Component as a source and, using the System.IO namespace, read through the file and send the data to one of 2 (or more, if necessary) outputs.

    If you're not familiar with the Script Component, feel free to post back and I'll work up a few examples for you.

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • do you know the data transformation 'conditional split' ?

  • The conditional split alone won't get you there. The number and types of columns are different within the file.

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Hi Tim, thanks for the reply. I did not use script component so far. I tried with conditional transformation seems like that does not work for my scenario. Can you please post few examples how to use script component for this scenario. Mean time i will try to work on script component.

    Thanks

    Krish

  • Hi, i do know we have Conditional split transformation. I tried with that but i wasn't successed.

    As per Tim's suggession i am trying using script component to acheive my scenario.

    I defined two outputs using Script component, one for Audit and another for Customer and i am trying to connect these two outputs to two OLE DB destinations.

    Everything looks good, but when i run the Package, i see all the data going through either one output.

    Here is the script i used(I copied it from another place)

    Imports System

    Imports System.Data

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain

    Inherits UserComponent

    Public Overrides Sub input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Row.Col1 = Tokenise(Row.Column0, ",", 1)

    Row.Col2 = Tokenise(Row.Column0, ",", 2)

    End Sub

    Public Overrides Sub input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Row.Col3 = Tokenise(Row.Column0, ",", 3)

    Row.Col4 = Tokenise(Row.Column0, ",", 4)

    Row.Col5 = Tokenise(Row.Column0, ",", 5)

    Row.Col6 = Tokenise(Row.Column0, ",", 6)

    End Sub

    Private Function Tokenise(ByVal input As String, ByVal delimiter As String, ByVal token As Integer) As String

    Dim tokenArray As String()

    tokenArray = input.Split(delimiter.ToCharArray) 'Split the string by the delimiter

    If tokenArray.Length < token Then 'Protect against a request for a token that doesn't exist

    Return ""

    Else

    Return tokenArray(token - 1)

    End If

    End Function

    End Class

    The script returs error message like we Multiple definitions of Public Overrides.

    Can someone help here..

  • Krish,

    I'm finishing up an article for publication on this site to demonstrate an easy way to handle this. I'll send you a preview copy later tonight or tomorrow which should get you there.

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Thanks a lot Tim. I really appreciate your help on this.

  • Hi Tim, added to above reply i tried another way to do this but i wasn't succeded. Here is the way i tried

    My falt file looks like this(for refference)

    File_Name = Cust_US

    ----------------------

    Cust_US(File name) 1790(No.of records)

    111023 EDW123 james blake TX US

    111024 EDW124 Andy Roddick LA US

    111024 EDW125 Serena Williams CA US

    I defined a data flow in my foreach Loop

    In the data flow i defined my flat file connection

    Then i added a conditional split transormation

    in conditional split i wrote an expression like Substring(Column1,1,2,) == 'Cu" (this is case1)

    Then i defined my OLE DB destination for above case1 and i mapped it to Audit table.

    When i run the package Audit table loaded successfully with 5 records(Becauase i have 5 files).

    Then for the second condition in the Conditaionl split tranformation i defined a script component to defined the columns. Idefined new columns in Advanced table in Script component like this

    col1

    col2

    col3

    col4

    col5

    col6

    I wrote the script to identify all the newly defined column in Script page and i mapped to my target Customer table.

    But when i run the package, it was failing becuase it is still taking Cust_US as first record.

    Hope this makes claear to you and everyone.

    Thanks

  • What you'll need to do is to skip the conditional split altogether; you'll use the Script component to separate the record types.

    First, create your Script Component as a "Source". Next, and this is most important, is to create a second output in the list of Outputs in the settings for the Script Component. You'll see that there is already an output (probably named "Output 0" in the list; click "Add Output" and add a new one to the list. You can name these whatever you want to make your script component easier to use. You'll also see when you unfold the properties for each Output that you can define the columns for each. Go ahead and configure the columns for your 2 different output types; this will be fairly self explanatory.

    Once you've got your outputs configured, open the script editor. You'll need to open the file using the StreamReader object (be sure to include the Imports System.IO directive). You can use the StreamReader object to read the file one line at a time using the ReadLine() method. You can test each line in the file, and send it to the appropriate output.

    To add a line to either of the outputs, call the AddRow() method first. For example, if you use the default output name of "Output 0", you'll call Output0Buffer.AddRow() to add a new row to the output. Then, use Output0Buffer. with each column name. Note that you'll do this for each line in the file, handling each line in turn and sending it to the appropriate output.

    I know this is a lot of information to throw at you, particularly if you're not yet comfortable with the Script Component. Feel free to post back if any of this needs clarification. I'll be publishing an article shortly on this topic, which will be complete with screenshots and examples.

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Tim, thanks for the reply. To get clear idea, i am waiting for your article. Can you please post it whenever you are ready. Mean time i will try the way you said.

    Thanks

  • Hi tim, good morning! Can we use same flat file source multiple times for Header record and for detail records? And i want to use this in two different dataflows connected each other. One data flow loads only one record i.e. Header record(with a conditional split) and another data flow loads detailed records(Selected in source file connection properties as skip first row).

    Can i folllow the above method using Foreach loop container??

    Thanks

    Raj

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply