Bulk Insert vs. Data Flow Task?

  • Yes, I'm kind of a noob with SSIS. We frequently move hundreds of thousands of records from Text files to Raw tables, and ultimatley to production. After I get the Raw data into a Database table, I use a Data Flow Task to push it to a Staging table.

    What does Bulk Insert get me? Just throwing it out there to see what comes back.

  • CptCrusty1 (4/23/2013)


    Yes, I'm kind of a noob with SSIS. We frequently move hundreds of thousands of records from Text files to Raw tables, and ultimatley to production. After I get the Raw data into a Database table, I use a Data Flow Task to push it to a Staging table.

    What does Bulk Insert get me? Just throwing it out there to see what comes back.

    Bulk Insert allows you to quickly insert rows into a SQL table from an external source, without resorting to SSIS. I can't see how it would fit well into what you are doing at the moment.

    Your data flow sounds odd, by the way.

    Text File --> Raw File --> database table --> Staging table

    Why would you do this? What happens to data in the staging table?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Hey Phil!

    Flow is as follows:

    Text File --> Raw Table --> Staging Table --> Production Staging Table (Different VM) --> Production Table.

    A lot, if not all of the Transformations happen in the Staging Table. As a general rule, I don't think any CRUD/MACD should happen once the data is pushed to the production environment.

  • CptCrusty1 (4/23/2013)


    Hey Phil!

    Flow is as follows:

    Text File --> Raw Table --> Staging Table --> Production Staging Table (Different VM) --> Production Table.

    A lot, if not all of the Transformations happen in the Staging Table. As a general rule, I don't think any CRUD/MACD should happen once the data is pushed to the production environment.

    I'm sure that you have your reasons, but that looks like one-too-many staging tables to me.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil,

    I would tend to agree with you, the the practice has been in place for a while now. The main reason is that they are using VM instances of SQL Server. All the new data is brought in on Server "A" and is physically seperated from Server "B". Once all the Crunching has taken place on "A's" Staging table. The data is then pushed via Data Flow Task to the Production Server "B". The Staging table there is then compared to the final landing table. Since the possibility exists for duplicates, the tables are compared at a row level and only new stuff gets through. Yes, it's a couple extra steps, but they're conservative with this and it works.

    Server "A" is located in house. Server "B" is located off site in a Co-location as part of the Disaster Recovery plan.

    Make Sense?

    Crusty.

  • I knew that you'd have your reasons - sounds robust and safe to me.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply