import data from one flat source file to two tables

  • I need to import some flat files data to SQl database tables.

    I created data flow task to do this.

    But one of our flat source file have some columns of two tables in my sql database, so I want to split the flat file data into two, I need to select distinc from 3 columns into table A, select distinct the other columns into table B. Currently I implement this doing a data flow task first from all columns in a source file to a staging table, then I add another sql task using two insert into....select distinct ... statements into two different tables from the staging table,

    is this the efficient way to implement this, or is there better way to do it?

    Thanks

  • Just a thought, but you could just load the data from the file, multicast it to two streams, sort each stream on the required keys removing duplicates, and send to the two destinations. I don't quite recall if there's a problem with sorting two streams of a multicast in different ways.

    In theory the above idea saves you writing and reading twice, but sorts can be costly. In fact it might be better just to read the flat file twice only getting the required columns each time.

    YMMV, test if you like.

  • sam.dahl (11/20/2011)


    Just a thought, but you could just load the data from the file, multicast it to two streams, sort each stream on the required keys removing duplicates, and send to the two destinations. I don't quite recall if there's a problem with sorting two streams of a multicast in different ways.

    In theory the above idea saves you writing and reading twice, but sorts can be costly. In fact it might be better just to read the flat file twice only getting the required columns each time.

    YMMV, test if you like.

    Thanks, I learned from you to do two streams from multicast, although the sorting is expensive.

    The second method you mentioned is to read the flat file twice, but if using that way, some columns have duplicated records.

    Is there any problem the way I do it from my previous post?

    Thanks

  • Hmm, I assumed a select distinct query would be possible on a flat file source in the 'read twice' scenario.

    Is there any problem the way I do it from my previous post?

    Not at all, if you are getting the required performance for your dataset then staging is fine.

  • Thank you!

    Also I cannot find a way to use select distict in flat file source, is there a way to do that?

    Thanks

  • No, after checking, I don't believe there is a way to do it. I was getting my etl tools confused. It would still require sorting to remove duplicates.

  • Thanks a lot

  • I think your original solution solves the problem well.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply