Merging data from Database2 into Database1

  • Hi all,

    Just wondered if anyone has attempted this in some shape or form.  Basically just want to merge the data from database 2 (source) into database 1 (target) where there is already data in the target database that must remain referentially in-tact.  Database2 is basically a duplicate of Database1's structure.  Only the data is different.

    The problem is where Table1 has a foreign key reference into Table2 (in both databases). This is something like as follows (this is just to demonstrate the relatioships and is far from what we have in the tables):

    Database 1

    Table1

    ID       Name           FKID

    ----------------------------------------------

    1        T.Jones        22

    2        J.Doh          25

    3        D.Howser       26

    4        J.Smith        22

    Table2

    ID       Favourite_Show

    -----------------------------------------------

    22       Chris Moyles

    25       Jo Wiley

    26       Scott Mills 

    Database 2

    Table1

    ID       Name           FKID

    ----------------------------------------------

    1        J.R.Hartley    11

    2        T.Mallet       11

    3        J.R.Ewing      13

    7        F.Le Grand     12

    Table2

    ID       Favourite_Show

    -----------------------------------------------

    11       Ken Bruce

    12       Terry Wogan

    13       Steve Wright 

    The ID column in Table1 is an identity.  Table2 is not.

    There are a lot of tables that have references to Table1.  They basically have the same relationship as Table1 to Table2 (1-to-1).  There are around 50 tables with this relationship.

    Some tables can have over 500,000 records in.  The database size is around 1GB although i don't need to copy across some of the data.

    I thought it would be simple enough to create a DTS to do this and perform the following main actions:

    1.  Copy Table1 data into a Temp Table (and offsetting the ID on Insert) using Execute SQL Task.

    2.  Offset the ID's in Database1 (Target) by a specified increment usign Execute SQL Task.

    3.  Copy the data across from Database2 (Source) using DTS Object Transfer - which would effectively leave that dataset unchanged (all ID's would remain the same).

    It seems like step 2 is hanging or taking over 30minutes to execute.

    Anyone got any thoughts or suggestions on this?

    Andez

     

     

     

  • Forgot to mention that I copied the SQL from Step to into Query Analyzer which took 20 minutes to execute.

  • Without seeing the code it's hard to know what is causing the delays. If there's lots of data, especially not indexed, it could be slow.

    I'm not sure I'd do a specified increment. You could run into issues and if you don't have a unqiue key, you could get duplicates. Rather I'd use some algorithm to find the new IDs that are free. Or find a range and insret them all at once.

  • Also forgot to mention that we would remove ALL FK References before executing the SQL to do the update.  The FK Constraints would then be recreated at the end of the merge process.

    The SQL to offset the table and it's references was basically executing the following (for the above example):

    DECLARE @intIDIncrement INT

    SET @intIDIncrement = 100

    -- Offset the primary table

    CREATE TABLE dbo.Tmp_Table2 (

            ID INT IDENTITY( 1, 1 ),

            Favourite_Show VARCHAR( 50 ) ) ON [PRIMARY]

    SET IDENTITY_INSERT dbo.Tmp_Table2 ON

    IF EXISTS(SELECT * FROM dbo.Table2)

      EXEC('INSERT INTO dbo.Tmp_Table2 (ID, Favourite_Show)

      SELECT ID + ' + @intIDIncrement + ', Favourite_Show FROM dbo.Table2 TABLOCKX')

    SET IDENTITY_INSERT dbo.Tmp_BM_Unit OFF

    DROP TABLE Table2

    EXECUTE sp_rename N'dbo.Tmp_Table2', N'Table2', 'OBJECT'

    -- Update any other table references

    UPDATE Table1

    SET FKID = FKID + @intIDIncrement

    UPDATE Table3

    SET FKID = FKID + @intIDIncrement

    ... And so on ...

    UPDATE Table50

    SET FKID = FKID + @intIDIncrement

    All in all there are a total of 5,110,000 records in all of the tables that require updating using this approach.

    All the tables have been indexed to make the system as efficient as possible.

  • I'm struggling with the bit where Scott Mills is in the Favourite_Show table, but anyway...

    Try copying D2.T2 into D1, then doing a lookup on D1.T2 against D2.T2 to see what the new ID is and storing that in a temp table against the old ID.

    Then update D2.T1 by setting the FKID to the new ID as stored in the temp table.  Then copy D2.T1 into D1 and it will be pointing to the correct data.

  • I haven't tested this, but does this make sense?

    1 - Do a Select with a left join from db1.table1 to db1.table2 (selecting all fields except the identity field INTO a new table (tablex)

    2 - Do the same as above for dB2.

    3 - Do a Select * and UNION on both tables (tablex and tabley) INTO a new Table.

    4- Update the new table with an identity field OR add the identity field in step 3 as a new field.

    Does my logic work?

  • Thanks for the contributions...

    It sounds like it would work off the top of my head although due to the massive amouts of tables that are referenced by the main table (about 50 in all) could be a tad long winded to write the SQL code for.

    I have been working on a little application for a while now that will generate some necessary scripts for things i need to do during this upgrade process (hopefully someday I can rewrite and merge into a single app that will make millions!! - not).  It generates the create and drop FK Constraints for the tables I tell it too.

    At the start of the process, it drops the constraints then updates the necessary tables by updating the PK with a hard-code offset in the SQL (again using my app, i could search through all tables that directly reference the PK column in the main table either directly on indirectly with a nifty search).  The app would then generate the necessary code for me to perform this update.

    At the end of the process, the FK constraints are then recreated with the generated script from my application.  All works well.  I decided to go with performing the update on the database with the least amount of data in it.  There are two main tables that reference lots of other tables (60 in total).  It takes around 20 minutes to execute the steps in a DTS package.

    Once the database has been prepared like this, it is a simple case of copying the table data across (using Copy SQL Server Objects Task in DTS).  All this is fine on SQL2000 but not done too much in 2005 as yet!

    Probs may seem like a funny way to do it, but seems logical and resonable in my way of thinking.

    Andez

     

     

     

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply