April 5, 2011 at 3:48 am
Hi,
We need to iterate through multiple flat files and those of different metadata and need to load into different tables
i.e. the file names will be:
Test20110225, Test20110227, Test20110228 - these files contain 3 columns.
TestToday20110225, TestToday20110227, TestToday20110228 - these files contain 6 columns.
TestTomorrow20110225, TestTomorrow20110227, TestTomorrow20110228 - these files contain 8 columns.
All these files will be placed in a single folder. I need to load each file, in sequential order, into its corresponding table,
i.e.
Test***.txt -> Tabletest
TestToday***.txt -> TabletestToday
TestTomorrow***.txt -> TabletestTomorrow
I am using a Foreach Loop container with a data flow task inside it. In the control flow I have used a Multiple Flat Files connection manager, but I am running into problems because the files do not all have the same metadata.
Can you please suggest the best approach?
Thanks,
Naru
April 5, 2011 at 4:20 am
You need to group the files that have the same structure together and loop through them, then have a different data flow for each set of files that share a structure.
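Outside SSIS, that grouping step can be sketched like this (a minimal Python illustration; the file names are the ones from the question, and the rule assumed here is that the prefix before the 8-digit date decides which group, and therefore which data flow, a file belongs to):

```python
import re
from collections import defaultdict

def group_files_by_prefix(filenames):
    """Group flat-file names by the prefix before the trailing
    8-digit date, so each group can feed its own loop / data flow."""
    groups = defaultdict(list)
    for name in filenames:
        m = re.match(r"(.+?)(\d{8})\.txt$", name)
        if m:
            groups[m.group(1)].append(name)
    return dict(groups)

files = ["Test20110225.txt", "Test20110227.txt",
         "TestToday20110225.txt", "TestTomorrow20110225.txt"]
print(group_files_by_prefix(files))
# {'Test': ['Test20110225.txt', 'Test20110227.txt'], 'TestToday': ['TestToday20110225.txt'], 'TestTomorrow': ['TestTomorrow20110225.txt']}
```

In SSIS terms, each group would get its own Foreach Loop with a file mask such as `Test2*.txt` or `TestToday*.txt`, each feeding its own data flow.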
April 5, 2011 at 5:29 am
Hi,
Thanks for your suggestion. I just want to make sure whether the approach below is possible in SSIS.
For example: we have 5 flat files with different metadata that need to be loaded into 5 different tables. All 5 files exist in a single folder.
Can we have a single data flow task and, using a Conditional Split, achieve what we need?
If there are 100 flat files with different metadata, do we need 100 data flow tasks to load them into 100 different tables?
Please let me know. I am trying to find out whether there is any other way of doing it. If the only way is to go with 100 data flow tasks for 100 flat files of different metadata, I shall implement it.
Thanks,
Naru
April 5, 2011 at 5:54 am
Yes, that is the correct approach: loop through the files that have the same structure.
For files with a different structure you will need a separate data flow. This may mean 100 data flows.
It is possible to have dynamic data flows, but in my opinion the amount of hassle in creating them is not worth it.
April 5, 2011 at 6:09 am
Thanks a lot for your suggestion. I agree with you.
April 5, 2011 at 6:39 am
Hi,
I am using a Foreach Loop container with 3 data flow tasks inside it. The file structure is
--------------------------------
Test20110225, Test20110227, Test20110228 - These files contain 3 columns.
TestToday20110225, TestToday20110227, TestToday20110228 - These files contain 6 columns.
TestTomorrow20110225, TestTomorrow20110227, TestTomorrow20110228 - These files contain 8 columns.
------------------------------
The first data flow task executes fine, loading the rows from the file Test20110225. But then, instead of passing Test20110227 to the first data flow task, the loop sends it to the second data flow task, which results in an error.
Please suggest: do I need to go for precedence constraints, or 3 foreach loops with 3 data flow tasks for the 3 different flat file formats?
Thanks,
Naru
April 5, 2011 at 6:41 am
If the files have a different format or a different number of columns, then you will need a separate data flow for each.
April 5, 2011 at 6:51 am
Narendra-274001 (4/5/2011)
Please suggest: do I need to go for precedence constraints, or 3 foreach loops with 3 data flow tasks for the 3 different flat file formats?
All three of those will work. Which you choose will depend on your exact requirements and which techniques you feel most comfortable using. You will be supporting (or at least documenting) this yourself, after all. If you have only three files and the file names never change, then I would probably go for three separate data flows rather than fiddle about setting up loops or conditional precedence constraints.
John
April 5, 2011 at 7:09 am
The foreach loop will pick up the files in the sequence below:
Test20110225
Test20110227
Test20110228
TestToday20110225
TestToday20110227
TestToday20110228
TestTomorrow20110225
TestTomorrow20110227
TestTomorrow20110228
Data flow task 1 - files 1, 2, 3
Data flow task 2 - files 4, 5, 6
Data flow task 3 - files 7, 8, 9
When I execute the package, the first data flow task processes the 1st file, and then the second data flow task processes the 2nd file.
What I expect is that the first data flow task should process files 1, 2 and 3, and the 2nd data flow task files 4, 5 and 6.
Can you please let me know the best way to handle this? Perhaps using a Conditional Split, or...
April 5, 2011 at 7:18 am
Conditional split is not necessary. You just need to make sure your loop is configured so that after it has processed one file, it moves on to the next, and after it has processed all of them, it exits with success. Without seeing your code, it's difficult to provide any more advice than that.
John
April 5, 2011 at 7:48 am
Hi John,
I have configured the foreach loop to load all the .txt files from the folder, and I am trying to understand its behaviour. The first file goes through the first data flow task; will the second file go through the first data flow task or the second one?
My doubt is: if it processes the first file through the first data flow task, what happens next? Will it process the 2nd file through the first data flow task or the 2nd one?
I wouldn't mind if it processed all the files through the first data flow task and then all the files through the 2nd data flow task...
Thanks,
Naru.
April 5, 2011 at 7:52 am
If you've set up your loops correctly, it will load each text file for the same table one by one. As I said before, I can't tell you what's going to happen because I haven't seen your code.
John
April 5, 2011 at 8:15 am
Hi John,
The flow is - -
In the foreach loop container I have simply told it to pick up *.txt files from a specific folder. Inside the foreach loop I have 3 data flow tasks.
I am getting errors in both scenarios. First scenario: I enabled only the first data flow task and disabled the remaining 2.
The first data flow task is mapped to handle only the files with 3 columns, i.e. the files below:
Test20110225, Test20110227, Test20110228
The data flow task executes fine until those 3 files are loaded, and then errors when the next file comes into the loop, i.e. the files TestToday20110225, TestToday20110227, TestToday20110228.
Error description: An error occurred while processing file "TestToday20110218.txt"
Because that file's metadata doesn't match data flow task 1, it raises the error.
How can we tell the loop to send only the first 3 files (Test20110225, Test20110227, Test20110228) to data flow task 1, and to execute data flow task 2 when the next 3 files come into the loop?
Thanks for your patience.
Naru
April 5, 2011 at 8:24 am
Naru
You need to make sure that the collection for each loop only contains the files you want to be processed by the dataflow within that loop. One way to do this is to have a table called Files, with columns FilePath, FileName, DataFlowNo. In each of your loops, you can select from the Files table using a different DataFlowNo in your WHERE clause.
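As an illustration only (column names as John describes them; the sample rows and paths are invented), the per-loop filtering is equivalent to this:

```python
# Stand-in rows for the Files table: (FilePath, FileName, DataFlowNo).
FILES = [
    (r"C:\data", "Test20110225.txt", 1),
    (r"C:\data", "Test20110227.txt", 1),
    (r"C:\data", "TestToday20110225.txt", 2),
    (r"C:\data", "TestTomorrow20110225.txt", 3),
]

def files_for_dataflow(rows, dataflow_no):
    """Equivalent of: SELECT FileName FROM Files WHERE DataFlowNo = ?"""
    return [name for _path, name, n in rows if n == dataflow_no]

print(files_for_dataflow(FILES, 1))
# ['Test20110225.txt', 'Test20110227.txt']
```

In the package, each Foreach Loop would run an Execute SQL task with that WHERE clause, feed the result set into the loop's collection, and so only ever hand its own data flow files of the matching structure.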
John
April 5, 2011 at 9:30 am
Hi John,
You said "You need to make sure that the collection for each loop only contains the files you want to be processed by the dataflow within that loop".
As of now I am picking up all the .txt files in the foreach loop. For example, there are 9 files,
and each set of 3 has the same metadata. That is why I have 3 data flow tasks inside the foreach loop.
My first doubt: given that each set of files has different metadata, is it a good idea to have several data flow tasks within a single foreach loop?
The foreach loop retrieves all the required files. All we need to do is route them through the different data flow tasks and, depending on the data flow task, store the data in the database.
In each data flow task I have a flat file connection manager with an expression assigned to it (i.e. the file name).
Now I have done a small test. I disabled all the data flow tasks except one, so only 1 data flow task exists in the loop.
Its flat file connection is compatible with 1 set of files; when the other set of files comes into the loop, it generates an error.
Is this ever going to work, i.e. 1 data flow task within a foreach loop handling files with different metadata?
Thanks,
Naru.