Fast data transfer ideas

  • Hi all,

    I'm designing a data transfer process, which is the last piece of our data warehouse ETL. The intent is to move tables from our "back room" build server to the "front room" end user environment. I'm trying to meet two goals:

    1) Fast data transfer -- utilizing a bulk copy or fast load operation

    2) Minimal downtime -- as little impact to users as possible

    I'm thinking SSIS is the best choice for this, since it natively does bulk transfers, and the "transfer objects" task can handle multiple tables without needing individual data flows. To minimize downtime, I'd like to transfer the data into differently named destination tables, then use a rename/drop process to shuffle the tables around. For example:

    a) transfer dim_date to end user database as "dim_date_new"

    b) on end user database, rename dim_date to "dim_date_old"

    c) rename dim_date_new to dim_date

    d) drop dim_date_old
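
    Roughly, I picture steps b) through d) looking something like the T-SQL below (names as in the example above; I'm assuming nothing else is holding locks on dim_date at the moment of the swap):

    -- Sketch of the destination-side swap; table names follow the example above.
    BEGIN TRAN;
    EXEC sp_rename 'dbo.dim_date', 'dim_date_old';   -- b) move the current table out of the way
    EXEC sp_rename 'dbo.dim_date_new', 'dim_date';   -- c) the freshly loaded table takes its place
    COMMIT;
    DROP TABLE dbo.dim_date_old;                     -- d) clean up the old copy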

    This idea isn't new; it's been discussed many times in these forums. But I can't find a way to actually implement this -- I can't find a way to change object names or schemas in the transfer objects task, nor with a "roll your own" coded solution using the SMO Transfer object.

    Does anyone have examples of how they've implemented a very efficient data transfer operation like this? Thanks!

    Your problem is going to be that a data flow requires the field names and types of both the source and destination to be defined prior to compilation, so you would have to write a specific Data Flow Task for each table (or potential table).

    If you need to move data, then I think BCP will probably be your friend, but you need to be careful about indexes and batch sizes.
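
    For illustration only, a load from a native-format file (exported with bcp ... out -n) might look roughly like this -- the path, table, index name and batch size are all placeholders, and bcp in with -b and -h "TABLOCK" gives you the same knobs from the command line:

    -- Disable nonclustered indexes so they are not maintained row by row, then rebuild after the load.
    ALTER INDEX IX_dim_date_example ON dbo.dim_date DISABLE;   -- placeholder index name

    BULK INSERT dbo.dim_date
    FROM '\\buildserver\transfer\dim_date.dat'                 -- placeholder path
    WITH (DATAFILETYPE = 'native', BATCHSIZE = 50000, TABLOCK);

    ALTER INDEX IX_dim_date_example ON dbo.dim_date REBUILD;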

    RedGate Toolbelt has a tool to extract a single table from a backup without restoring the entire backup. I have never used it and don't know whether it can be scripted, but it may resolve the issue.

    It wasn't clear from your post whether you meant moving data from the DEV server to PROD, or from one PROD system (back office) to another PROD system (data warehouse); in the latter case, slowly changing dimensions may be a better bet.

  • aaron.reese (9/16/2013)


    Your problem is going to be that a data flow requires the field names and types of both the source and destination to be defined prior to compilation, so you would have to write a specific Data Flow Task for each table (or potential table).

    If you need to move data, then I think BCP will probably be your friend, but you need to be careful about indexes and batch sizes.

    RedGate Toolbelt has a tool to extract a single table from a backup without restoring the entire backup. I have never used it and don't know whether it can be scripted, but it may resolve the issue.

    It wasn't clear from your post whether you meant moving data from the DEV server to PROD, or from one PROD system (back office) to another PROD system (data warehouse); in the latter case, slowly changing dimensions may be a better bet.

    Thanks for the thoughts! You're correct about the mapping required for data flow tasks, which is why I'd like to use the transfer objects task instead. But I'm not sure I'll ever find a way to rename objects during the transfer or change the destination schema -- I can't find either in the SSIS or SMO documentation.

    So again, has anyone implemented something like this?

  • I gave up on this. I think the SMO transfer class (and therefore the SSIS transfer objects task) simply doesn't allow for changing certain object properties. My workaround was to first rename the table at the source, then run the transfer/swap, then rename the source table back. This works for me only because the source table is otherwise unused during the transfer, so the rename shouldn't disrupt any other processes.
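
    In case it's useful to anyone, the wrapper around the transfer is just a pair of renames on the source (names follow the earlier dim_date example):

    -- Before the transfer: expose the source table under the name the destination should receive.
    EXEC sp_rename 'dbo.dim_date', 'dim_date_new';

    -- ... run the transfer objects task / SMO transfer here ...

    -- After the transfer: put the source name back.
    EXEC sp_rename 'dbo.dim_date_new', 'dim_date';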

    If anyone has better ideas, I'd love to hear them! I've always found efficient bulk table-to-table transfers to be much more difficult than they should be. Most solutions (bcp, bulk insert, openrowset) require writing to and reading from an intermediate file, which hardly seems like the most efficient approach.

  • BowlOfCereal (9/17/2013)


    I gave up on this. I think the SMO transfer class (and therefore the SSIS transfer objects task) simply doesn't allow for changing certain object properties. My workaround was to first rename the table at the source, then run the transfer/swap, then rename the source table back. This works for me only because the source table is otherwise unused during the transfer, so the rename shouldn't disrupt any other processes.

    If anyone has better ideas, I'd love to hear them! I've always found efficient bulk table-to-table transfers to be much more difficult than they should be. Most solutions (bcp, bulk insert, openrowset) require writing to and reading from an intermediate file, which hardly seems like the most efficient approach.

    One fairly standard approach is as follows (note: 'ChangeDate' refers to the maximum of DateCreated and DateModified and is a datetime2 value):

    1) Set @ChangeDate = max(ChangeDate) from your target table.

    2) Select rows from source where ChangeDate >= @ChangeDate in a data flow

    3) Fire these rows into a truncated 'staging' table

    4) Fire a stored proc to MERGE from the staging table into the target

    An alternative to (3) and (4) would be

    3a) Use a Conditional Split to divide the data flow into updates and inserts.

    4a) Fire the inserts into the target table directly.

    4b) Fire the updates into a truncated staging table

    4c) Run a proc to update the target table from the staging table
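
    The step 4 proc is just a MERGE keyed on the dimension's key; something like this, with table and column names invented purely for illustration:

    -- Rough shape only; dim_customer / staging.dim_customer and the columns are made-up names.
    CREATE PROCEDURE dbo.Merge_dim_customer
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE dbo.dim_customer AS tgt
        USING staging.dim_customer AS src
            ON tgt.customer_key = src.customer_key
        WHEN MATCHED AND src.ChangeDate > tgt.ChangeDate THEN
            UPDATE SET tgt.customer_name = src.customer_name,
                       tgt.ChangeDate    = src.ChangeDate
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (customer_key, customer_name, ChangeDate)
            VALUES (src.customer_key, src.customer_name, src.ChangeDate);
    END;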

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (9/18/2013)


    One fairly standard approach is as follows (note: 'ChangeDate' refers to the maximum of DateCreated and DateModified and is a datetime2 value):

    1) Set @ChangeDate = max(ChangeDate) from your target table.

    2) Select rows from source where ChangeDate >= @ChangeDate in a data flow

    3) Fire these rows into a truncated 'staging' table

    4) Fire a stored proc to MERGE from the staging table into the target

    An alternative to (3) and (4) would be

    3a) Use a Conditional Split to divide the data flow into updates and inserts.

    4a) Fire the inserts into the target table directly.

    4b) Fire the updates into a truncated staging table

    4c) Run a proc to update the target table from the staging table

    Thanks for the reply. My concern isn't change capture; my concern is efficiently moving that data - in your example, from the staging table into the target. We are moving hundreds of GB of data, so I need to find an approach which allows for bulk/minimally-logged loading. If there's a way to do that within a stored proc, I need to learn it!
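
    From what I've read, an INSERT ... SELECT with a TABLOCK hint on the target can be minimally logged under the simple or bulk-logged recovery model (at least into a heap or an empty clustered index; trace flag 610 reportedly extends it to indexed targets), which would keep the whole load inside a proc. Something like this, with placeholder table names:

    -- Sketch only: bulk-optimized load from staging into the target, no intermediate file.
    -- Minimal logging depends on the recovery model, the TABLOCK hint and the state/indexes of the target.
    INSERT INTO dbo.fact_sales WITH (TABLOCK)
    SELECT *
    FROM staging.fact_sales;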

    Also, as I mentioned, I'm trying to avoid constructing and maintaining data flows for every individual object (40+ constantly changing dimension and fact tables). All that brought me to the SSIS transfer task / SMO transfer class, which works well enough for what it does.
