Subset Data Extraction: Pull or Destructive method?

  •  

    Need to develop strategy of generating a subset of data from 3 production databases to reduce test runs.  This includes at minimum drop / recreating table data and constraints. There are two methods being analyzed.  Looking for feedback / comments on these methods or possibly any other recommendations:

    Method Summary

    Pull Method

     

    Restore to db1_prod, db2_prod, db3_prod

    Create db1_test, db2_test, db3_test

    Drop / create tables on *_test

    Pull data (BCP and Filtered data tables) from prod to test

    Add constraints

    Add other objects (sp, view, functions, etc.)

     

    Pros:    Will always have a full production set to extract from.  Could be used for other development efforts

                Quick re-generation of test db unless more current data is needed

                No need to manage disk space / shrink DB prior to subset

                Provides flexibility for future daily replication of prod. Db

               

    Cons:    Slower performance on subset extraction

                Additional overhead in creating objects and transferring static data

                Not easily support full run 

     

    Destructive Method

     

    Restore to db1_prod, db2_prod, db3_prod

    Using same prod db, Select into temp, drop existing table, rename temp to existing table

    Add table attributes via SQLDMO

    Shrink DB

     

    Pros:    Supports full run if needed.

    Eliminates overhead of transferring static data / creating objects and dependencies outside of tables

                More portable as only 3 DB's required for subset run

                Current environment configuration better supports this process. 

     

    Cons:    May limit parallel development in that two full sets of DB's can't be supported.

                Slow performance in transfer to baseline.bak due to shrink DB.

    A bit of detail, but Any comments would be appreciated.

  • Is you production database a 24/7/365 item?

    Are your production and test databases on separate boxes?

    An approach I have taken in the past is to have a copy database with no constraints, triggers, indices etc.

    1. TRUNCATE all tables in the copy database.

    2. Filter copy data from production database to copy database. Make sure that you have recovery mode set to BULK LOGGED.

    3. Back up copy

    4. Restore on test (assuming that you have transferred all logins.

    5. Run script to build constraints, indices, triggers etc on test.

    This means that the copying of the actual data is very fast because the data you are inserting is aleady pristine as the production app has enforced data integrity.

  • How frequently do you need to refresh the data ??

    I use transactional & snapshot replication to get subsets of production databases to other databases. Then users can query the subsets on another server without impacting production.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply