SSIS - Multiple flat files as source

  • Hi,

    We need to iterate through multiple flat files with different metadata and load them into different tables

    i.e. the file names will be:

    Test20110225, Test20110227, Test20110228 - These files contain 3 columns.

    TestToday20110225, TestToday20110227, TestToday20110228 - These files contain 6 columns.

    TestTomorrow20110225, TestTomorrow20110227, TestTomorrow20110228 - These files contain 8 columns.

    All these files will be placed in a single folder. I need to load each file, in sequential order, into its corresponding table,

    i.e.

    Test***.txt -> Tabletest

    TestToday***.txt -> TabletestToday

    TestTomorrow***.txt -> TabletestTomorrow

    I am using a Foreach Loop container with a Data Flow task inside it. In the control flow I have used a Multiple Flat Files connection manager, but I am running into problems because the files do not all have the same metadata.

    Can you please suggest the best approach?

    Thanks,

    Naru

  • You need to group the files that have the same structure together and loop through them, then have a different data flow for each set of files that share a structure.
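    For example, each Foreach Loop can be given its own file mask in the Foreach File enumerator. A rough sketch (the masks are assumptions based on the file names above; note that a plain Test*.txt mask would also match the other two groups, which is why the leading date digit is included):

        Foreach Loop 1 -> Files: Test2*.txt        -> data flow for the 3-column files
        Foreach Loop 2 -> Files: TestToday*.txt    -> data flow for the 6-column files
        Foreach Loop 3 -> Files: TestTomorrow*.txt -> data flow for the 8-column files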

  • Hi,

    Thanks for your suggestion. I just want to confirm whether the approach below is possible in SSIS.

    For example: we have 5 flat files with different metadata that need to be loaded into 5 different tables. All 5 files exist in a single folder.

    Could we have a single Data Flow task and achieve what we need using a Conditional Split?

    If there are 100 flat files, each with different metadata, do we need 100 Data Flow tasks to load the 100 different tables?

    Please let me know. I am trying to find out whether there is any other way of doing it. If the only way is to use 100 Data Flow tasks for 100 flat files with different metadata, I shall implement it.

    Thanks,

    Naru

  • Yes, that is the correct approach: loop through files that have the same structure.

    For files with a different structure you will need a separate data flow. This may mean 100 data flows.

    It is possible to build dynamic data flows, but in my opinion the amount of hassle involved in creating them is not worth it.

  • Thanks a lot for your suggestion. I agree with you.

  • Hi,

    I am using a Foreach Loop container with 3 Data Flow tasks inside it. The file structure is:

    --------------------------------

    Test20110225, Test20110227, Test20110228 - These files contain 3 columns.

    TestToday20110225, TestToday20110227, TestToday20110228 - These files contain 6 columns.

    TestTomorrow20110225, TestTomorrow20110227, TestTomorrow20110228 - These files contain 8 columns.

    ------------------------------

    The first Data Flow task executes fine, loading the rows from the file Test20110225. But then, instead of Test20110227 being read by the first Data Flow task, the loop moves on to the 2nd Data Flow task, and the result is an error.

    Please advise: do I need to use precedence constraints, or 3 Foreach Loops with 3 Data Flow tasks for the 3 different flat file structures?

    Thanks,

    Naru

  • If the files have a different format or a different number of columns, then you will need a separate data flow for each.

  • Narendra-274001 (4/5/2011)


    Please advise: do I need to use precedence constraints, or 3 Foreach Loops with 3 Data Flow tasks for the 3 different flat file structures?

    All three of those will work. Which you choose will depend on your exact requirements and which techniques you feel most comfortable using. You will be supporting (or at least documenting) this yourself, after all. If you have only three files and the file names never change, then I would probably go for three separate data flows rather than fiddle about setting up loops or conditional precedence constraints.
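    If you do go down the conditional precedence constraint route, here is a rough sketch (the names are assumptions: it supposes a single loop mapping the current file name into a variable called User::FileName, with the constraints set to evaluate an expression and running from a task at the top of the loop to each data flow):

        To the 3-column data flow:
            FINDSTRING(@[User::FileName], "TestToday", 1) == 0 && FINDSTRING(@[User::FileName], "TestTomorrow", 1) == 0
        To the 6-column data flow:
            FINDSTRING(@[User::FileName], "TestToday", 1) > 0
        To the 8-column data flow:
            FINDSTRING(@[User::FileName], "TestTomorrow", 1) > 0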

    John

  • The Foreach Loop will pick up the files in the sequence below:

    Test20110225

    Test20110227

    Test20110228

    TestToday20110225

    TestToday20110227

    TestToday20110228

    TestTomorrow20110225

    TestTomorrow20110227

    TestTomorrow20110228

    Data Flow task 1 - files 1, 2, 3

    Data Flow task 2 - files 4, 5, 6

    Data Flow task 3 - files 7, 8, 9

    When I execute the package, the first Data Flow task processes the 1st file, and then the 2nd Data Flow task processes the 2nd file.

    What I expect is that the first Data Flow task should process files 1, 2, 3 and the 2nd Data Flow task should process files 4, 5, 6.

    Can you please let me know the best way to handle this? Perhaps using a Conditional Split, or....

  • Conditional split is not necessary. You just need to make sure your loop is configured so that after it has processed one file, it moves on to the next, and after it has processed all of them, it exits with success. Without seeing your code, it's difficult to provide any more advice than that.

    John

  • Hi John,

    I have configured the Foreach Loop to load all the .txt files from the folder. I am trying to understand the behaviour of the Foreach Loop: the first file will pass through the first Data Flow task, but what happens for the 2nd file?

    In other words, if it processes the first file through the first Data Flow task, what happens next? Will it process the 2nd file through the first Data Flow task or the 2nd Data Flow task?

    I wouldn't mind even if it processed all the files through the first Data Flow task and then all the files through the 2nd.

    Thanks,

    Naru.

  • If you've set up your loops correctly, it will load each text file for the same table one by one. As I said before, I can't tell you what's going to happen because I haven't seen your code.

    John

  • Hi John,

    The flow is:

    In the Foreach Loop container I have simply specified *.txt files from a specific folder. Inside the Foreach Loop I have placed 3 Data Flow tasks.

    I am getting errors in both scenarios. First scenario: I have enabled only the first Data Flow task; the remaining 2 Data Flow tasks are disabled.

    The first Data Flow task is mapped to handle only the files having 3 columns, i.e. the files below:

    Test20110225, Test20110227, Test20110228

    The Data Flow task executes fine until the above 3 files are loaded, and gives an error when the next file comes into the loop, i.e. the files TestToday20110225, TestToday20110227, TestToday20110228.

    Error description: An error occurred while processing file "TestToday20110218.txt"

    Because the file's metadata doesn't match Data Flow task 1, it raises an error.

    How can we tell the package to send only the first 3 files (Test20110225, Test20110227, Test20110228) to Data Flow task 1, and to execute Data Flow task 2 when the next 3 files come into the loop?

    Thanks for your patience.

    Naru

  • Naru

    You need to make sure that the collection for each loop only contains the files you want to be processed by the dataflow within that loop. One way to do this is to have a table called Files, with columns FilePath, FileName, DataFlowNo. In each of your loops, you can select from the Files table using a different DataFlowNo in your WHERE clause.
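    Something along these lines (a sketch only; the table, column names and paths are just examples):

        CREATE TABLE Files (
            FilePath   VARCHAR(260),
            FileName   VARCHAR(128),
            DataFlowNo INT
        );

        -- Tag each file with the number of the data flow that should process it
        INSERT INTO Files VALUES ('C:\Source\Test20110225.txt', 'Test20110225.txt', 1);
        INSERT INTO Files VALUES ('C:\Source\TestToday20110225.txt', 'TestToday20110225.txt', 2);
        INSERT INTO Files VALUES ('C:\Source\TestTomorrow20110225.txt', 'TestTomorrow20110225.txt', 3);

        -- The loop that feeds data flow 1 builds its collection from:
        SELECT FilePath FROM Files WHERE DataFlowNo = 1;

    You would run the SELECT in an Execute SQL task, store the result set in an object variable, and loop over it with a Foreach ADO enumerator.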

    John

  • Hi John,

    It is said "You need to make sure that the collection for each loop only contains the files you want to be processed by the dataflow within that loop" - -

    As of now i am considering all the .txt files in the for each loop. for ex: if there are 9 files.

    Each 3 are having same kind of metadata. Therefore i have considered 3 dataflow tasks in the for each loop.

    My first doubt is - From the above case, i.e having different metadata for each set of files, Is it a good idea of considering different dataflow tasks with in the single for each loop.

    THe for each loop is retreiving all the required files. All we need to do is loop them through different dataflow tasks and based on the data flow task we need to store in Db.

    In the Data Flow tasks I have used a Flat File connection manager and assigned an expression to it (i.e. the file name).
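    Something like this, with the Foreach Loop mapping the current file into a variable such as User::FileName, set on the connection manager's Expressions page:

        ConnectionString = @[User::FileName]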

    Now I have done a small test. I have disabled all the Data Flow tasks except one, so only 1 Data Flow task exists in the loop.

    In that Data Flow task I have included a Flat File connection that is compatible with 1 set of files. When a file from another set falls into the loop, it generates an error.

    Is this ever going to work, i.e. 1 Data Flow task within a Foreach Loop when the files have different metadata?

    Thanks,

    Naru.
