Compare two like tables on different servers then update missing data

  • I am looking to implement an SSIS package that will go out to our ERP database server, grab specific data from the Item table then compare that data with the Item table in a different database on the same server.

    As we receive items, they are entered into our ERP system. I have the select statement to obtain the desired query results. The homegrown application that we are currently using requires that the data in the Item table be updated based on what is in the ERP database. I want to update the homegrown database's item table to match that of the ERP database.

    For example:

    Server 1 (ERP) contains the ERP database

    Server 2 (Homegrown) contains the database used by the homegrown system.

    Using a select statement we grab the desired data from Server 1. That data is then compared to what is in the database on Server 2. If are ItemNumbers missing in the database on Server 2. Insert those missing rows.

    So my real questions are..

    a) Can SSIS do this?

    b) What is the best way to handle this? Should I dump the selected data to a temp table on Server 2 then do the compare?

    Thanks!

  • SSIS can do this. Performance will depend on how many columns your are comparing? Are you just trying to do a join on keys or is it a column by column comparison?

    James Phillips
    Sr. Consultant
    Pragmatic Works

  • So there are 5 columns that will need to be inserted if the ItemNumber is missing, I am however only concerned with comparing the ItemNumber column between tables. Because the item numbers never repeat we use it as a natural key in the homegrown application.

    So..

    select ItemNumber, ItemGroupCode, Description, ItemOwner, ItemGroup

    from ERP

    Then..

    Compare just the item number column to the homegrown db's item table.

    If..

    ItemNumber is not in the homegrown db's item table. Insert all data that was selected from the ERP database's item table. (see select above)

    Thank you!

  • So you never need to update the columns, or are assuming they never change?

    What you described is a basic SSIS data flow. If you read the first 3 articles in Andy Leonard's Stairway[/url] on this site you'll be able to do it.

    There may be some settings to tweak based on how many rows you are talking about. And I'd strongly recommend you look at updating of rows as well.

    Or since you are using the same ID field as your natural key, you could even forego all the looking up, and just truncate and reload the table.

  • My recommendation would be to stage the data on Server 2. It would then be very easy to execute a merge (or insert/update) statement on your target table. You can of course use SSIS to control this process very effectively.

    Although possible to do this in flight with SSIS (without staging the data), I would not recommend it as you would have no reference of the data should something go wrong with your process.

  • Nevyn,

    The column names will not change, I will read that Stairways article and give it a shot, thank you for the link. I guess my first thought was to do everything on the fly. Perform a union all statement and then insert the missing rows. I wasn't thinking of using a staging area before the insert.

    Martin,

    Thank you for your input, would you suggest an additional database for staging or just a table that matches the ddl criteria?

    Thanks guys!

  • cstg85 (2/4/2015)


    Nevyn,

    The column names will not change, I will read that Stairways article and give it a shot, thank you for the link. I guess my first thought was to do everything on the fly. Perform a union all statement and then insert the missing rows. I wasn't thinking of using a staging area before the insert.

    Martin,

    Thank you for your input, would you suggest an additional database for staging or just a table that matches the ddl criteria?

    Thanks guys!

    I typically like to have a separate staging database, because you really don't need a backup of the data...only the ddl of your staging tables are important to backup. I also make use of schema's to denote the source system, and use the same table names as the source for easy reference.

  • cstg85 (2/4/2015)


    Nevyn,

    The column names will not change, I will read that Stairways article and give it a shot, thank you for the link. I guess my first thought was to do everything on the fly. Perform a union all statement and then insert the missing rows. I wasn't thinking of using a staging area before the insert.

    Martin,

    Thank you for your input, would you suggest an additional database for staging or just a table that matches the ddl criteria?

    Thanks guys!

    I didn't mean the column names, I meant their values.

    Specifically the values for ItemGroupCode, Description, ItemOwner, ItemGroup corresponding to a specific ItemNumber. If the description of an item changes in the ERP system you want it to change in your custom system too, right?

  • I didn't mean the column names, I meant their values.

    Specifically the values for ItemGroupCode, Description, ItemOwner, ItemGroup corresponding to a specific ItemNumber. If the description of an item changes in the ERP system you want it to change in your custom system too, right?

    Sorry for the misunderstanding, the item numbers would never change, however there is a chance for a description / item group codes to be updated in the future. I would like them to be updated in the system.

    I typically like to have a separate staging database, because you really don't need a backup of the data...only the ddl of your staging tables are important to backup. I also make use of schema's to denote the source system, and use the same table names as the source for easy reference.

    Would you suggest truncating the tables once everything is in the destination database?

    Thanks!

  • cstg85 (2/4/2015)


    I typically like to have a separate staging database, because you really don't need a backup of the data...only the ddl of your staging tables are important to backup. I also make use of schema's to denote the source system, and use the same table names as the source for easy reference.

    Would you suggest truncating the tables once everything is in the destination database?

    Thanks!

    Not unless the tables are really large and you need the space. I find it very helpful to have the staging data available until the next load, in case I need it for reference.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply