Best direction for this

  • Hello everyone,

I'm creating an SSIS package and I was just looking for some advice on the best direction to proceed with what I'm trying to do. Basically, I have a data flow task to get a fixed-width file. I need to query that file and split it into two subsets based on a date in the file being greater than today.

    The first subset is the "good records" so all the results in this subset will end up in the final extract which is a comma delimited file.

    There are some good records in the second subset which will need to be added to the first subset. To find these good records, I'll need to check for matches against another (third subset) comma-delimited text file, based on an ID present in both files. Any record in the second subset whose ID matches one in the third subset gets added to the first subset. A comma-delimited text file is then created from the first subset after all this is done.

    Now, I know I can import all three subsets into three temp tables in SQL Server, perform the matches between the second and third subsets, insert the "good" matches into table 1 (first subset), and output the result. My question is: is this the best practice for doing this? Or should I be doing this a different way (possibly using some of the other tasks in SSIS)?
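
    The temp-table approach described above can be sketched in miniature with an in-memory SQLite database. This is only an illustration of the set-based match-and-insert logic; the table and column names (stage1, stage2, lookup_file, record_id) are invented, not from the actual package.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # stage1 = "good" records, stage2 = "questionable" records,
    # lookup_file = IDs from the third comma-delimited file
    cur.execute("CREATE TABLE stage1 (record_id TEXT, payload TEXT)")
    cur.execute("CREATE TABLE stage2 (record_id TEXT, payload TEXT)")
    cur.execute("CREATE TABLE lookup_file (record_id TEXT)")

    cur.executemany("INSERT INTO stage1 VALUES (?, ?)", [("1", "a")])
    cur.executemany("INSERT INTO stage2 VALUES (?, ?)", [("2", "b"), ("3", "c")])
    cur.executemany("INSERT INTO lookup_file VALUES (?)", [("2",)])

    # Insert the matched "questionable" records into the good set
    cur.execute("""
        INSERT INTO stage1
        SELECT s2.record_id, s2.payload
        FROM stage2 AS s2
        JOIN lookup_file AS lf ON lf.record_id = s2.record_id
    """)
    rows = cur.execute("SELECT record_id FROM stage1 ORDER BY record_id").fetchall()
    print(rows)  # [('1', ), ('2', )] -- record 3 had no match and is dropped
    ```

    The same INSERT...SELECT with a JOIN is what you would run against the real temp tables in SQL Server.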

    Thanks,

    Strick

  • Your description sounds like a pretty good workflow.

    1) Use a source component to get the initial data set from the fixed width file

    2) use a conditional split transformation based on the date field to create the "good" and "questionable" records

    3) use another conditional split on the "questionable" output to separate it into "good" and "bad" records

    4) use a union all from this "good" output to the other "good" output

    5) use a destination component to build the final text file

    You may get better performance doing this in some tables in SQL Server if you can optimize indexing and use set-based operations, but you may also find that doing everything in-memory in the SSIS package will out-perform doing this in a database. So, some testing is the best thing to do. If it is regularly less than 100,000 records, I would lean toward doing it entirely in SSIS, because the workflow will tend to be pretty maintenance-friendly and everything would be in one place.
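
    To make the five steps concrete, here is a rough in-memory sketch of the same flow in plain Python. The fixed-width layout (ID in columns 0-4, date in columns 5-12 as YYYYMMDD) is an assumption made up for the example, not the actual file spec.

    ```python
    import csv
    import io
    from datetime import date, datetime

    def split_and_merge(fixed_width_lines, lookup_ids, today):
        good, questionable = [], []
        for line in fixed_width_lines:                  # step 1: source
            rec_id = line[0:5].strip()
            rec_date = datetime.strptime(line[5:13], "%Y%m%d").date()
            # step 2: conditional split on the date field
            (good if rec_date > today else questionable).append((rec_id, rec_date))
        # step 3: second split -- keep questionable records whose ID matches
        matched = [rec for rec in questionable if rec[0] in lookup_ids]
        # step 4: union all of the two "good" outputs
        return good + matched

    records = ["00001" + "20991231", "00002" + "20000101", "00003" + "20000101"]
    result = split_and_merge(records, lookup_ids={"00002"}, today=date(2024, 1, 1))

    out = io.StringIO()                                 # step 5: destination
    csv.writer(out).writerows(result)
    print([rec[0] for rec in result])  # ['00001', '00002']
    ```

    In the real package, step 3 would be the split/lookup against the third file and step 5 would be a flat-file destination component rather than a StringIO buffer.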

    Thanks so much, this is great! One question, though. The second subset (the questionable subset) needs to be compared to a third subset, which is a completely different comma-delimited file. This determines which records from subset 2 will go into subset 1 (the final extract). Can I compare two different subsets looking for the ID? The matches will be added to subset 1 from subset 2.

    Thanks,

    Strick
