May 30, 2018 at 5:47 am
I am trying to import multiple CSV files; after they undergo some transformations, they should be written out to a single CSV file. Currently my problem is that the last file to be imported ends up overwriting the output file. Instead, all imported files should go through the transformations and then be appended to the output file.
Please advise.
May 30, 2018 at 7:05 am
sanaan.sheikh - Wednesday, May 30, 2018 5:47 AM: "I am trying to import multiple CSV files... Please advise."
I think you need to be clearer about what you want and what is set up at the moment. I don't really understand how an import would overwrite an output file.
Do you use variables as (part of) the filenames? Are the correct variables used in each task? Have you set the values of these variables correctly? Are the values of these variables modified during the process?
For testing purposes:
- what if you set hard-coded filenames?
- what if you change all the filenames in the package (and on the file system accordingly)?
May 30, 2018 at 7:36 am
What is currently set up is the following.
Data flow task:
1. Import the file from the source (all files sit in a single folder).
2. Derived Column transformations.
3. File written to the destination location.
Control flow:
- A Foreach Loop container is used, containing the following elements:
1. Data flow task as outlined above.
2. File System Task - imported files are archived one by one
There are two connection managers.
1. Flat file connection manager for the import of files (data flow task step 1).
Here the file name is specified via an expression that uses a variable for the file name. Unsurprisingly, this variable is called 'FileName' and its value has been left blank. The variable is used in the Foreach Loop container's 'Variable Mappings'; I have left the index at the default value of 0.
The Foreach Loop container points to the folder containing the CSV files to be imported. It will import all files that end in .csv.
2. The second connection manager is the flat file destination.
A single file is created in the destination folder, but it only ever contains the data from the last CSV file imported in step 1 of the data flow task.
The files are correctly being imported one by one and archived. However, I want the results of each imported file, after it has been transformed (the Derived Column step), to be appended to the final output file.
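Expressed outside SSIS for clarity, this is roughly the behaviour I am after (a minimal C# sketch; the paths and the trim transform are hypothetical stand-ins for my actual setup):

```csharp
// Hypothetical stand-in for the package: loop the folder, transform each file,
// APPEND its rows to one combined output, then archive the source file.
using System.IO;
using System.Linq;

class CombineCsvFiles
{
    static void Main()
    {
        const string sourceDir  = @"C:\Import";           // folder the Foreach Loop enumerates (assumed)
        const string archiveDir = @"C:\Import\Archive";   // File System Task target (assumed)
        const string outputFile = @"C:\Export\combined.csv";

        foreach (string file in Directory.EnumerateFiles(sourceDir, "*.csv"))
        {
            // Stand-in for the Derived Column step: trim every field.
            var transformed = File.ReadLines(file)
                .Select(line => string.Join(",", line.Split(',').Select(f => f.Trim())));

            // Append, not overwrite, so every file's rows accumulate in the output.
            File.AppendAllLines(outputFile, transformed);

            // Stand-in for the File System Task: archive the processed file.
            File.Move(file, Path.Combine(archiveDir, Path.GetFileName(file)));
        }
    }
}
```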
May 30, 2018 at 8:03 am
Do you have a "MERGE", "MERGE JOIN" or "UNION" task in the Data Flow? You'll need to combine the derived output of each file into a single set before writing to the destination.
May 30, 2018 at 8:14 am
I was thinking along these lines. Which transformation do you suggest: Merge, Merge Join or Union All?
And should it be placed after the Derived Column transformations in the data flow task, i.e. after step 2?
May 30, 2018 at 9:20 am
Which one to use depends on your needs, but reading your post I'd guess a Union All fits well.
The position depends on the process. You can first union all the imported data and derive afterwards, or you can import each file, apply the derive step per file, and then union only the derived results. The first option transforms the complete set in one step; the latter gives you the possibility to import/transform in parallel.
You'll need to test both yourself for usability and performance.
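If it helps to see the two orderings side by side, here is a minimal C# sketch (Derive() is a hypothetical stand-in for your Derived Column step):

```csharp
using System.Collections.Generic;
using System.Linq;

static class UnionOrderings
{
    // Hypothetical stand-in for the Derived Column transformation.
    static IEnumerable<string> Derive(IEnumerable<string> rows) =>
        rows.Select(r => r.ToUpperInvariant());

    // Option 1: union all imported data first, then derive over the combined set once.
    static IEnumerable<string> UnionThenDerive(IEnumerable<IEnumerable<string>> files) =>
        Derive(files.SelectMany(rows => rows));

    // Option 2: derive per file (so imports can run in parallel), then union
    // only the derived results.
    static IEnumerable<string> DeriveThenUnion(IEnumerable<IEnumerable<string>> files) =>
        files.SelectMany(rows => Derive(rows));

    static void Main()
    {
        var files = new[] { new[] { "a1", "a2" }, new[] { "b1" } };
        // Both orderings produce the same rows here; they differ in where the work runs.
        System.Console.WriteLine(string.Join(",", UnionThenDerive(files))); // A1,A2,B1
        System.Console.WriteLine(string.Join(",", DeriveThenUnion(files))); // A1,A2,B1
    }
}
```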
May 30, 2018 at 9:45 am
Turns out it was a lot simpler - always the case!
In the data flow task, the flat file destination had the 'Overwrite data in the file' check box ticked. Deselecting this did the trick.
Thanks for your time and help anyway 🙂
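For anyone who hits the same issue: that check box is effectively the difference between truncating the output file on every loop iteration and appending to it. A minimal C# sketch of the distinction (names are illustrative only):

```csharp
using System.IO;

class OverwriteVsAppend
{
    // With overwrite = true the file is truncated on every call, so after a loop
    // only the LAST file's rows survive -- exactly the symptom above.
    // With overwrite = false each call appends, so all files accumulate.
    static void WriteRows(string path, string[] rows, bool overwrite)
    {
        using (var writer = new StreamWriter(path, append: !overwrite))
        {
            foreach (string row in rows)
                writer.WriteLine(row);
        }
    }

    static void Main()
    {
        WriteRows("out.csv", new[] { "file1,row1" }, overwrite: true);
        WriteRows("out.csv", new[] { "file2,row1" }, overwrite: true);
        // out.csv now holds only "file2,row1" -- the behaviour the ticked box caused.
    }
}
```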