Subset Data Extraction from PROD DB set

  • Need to develop strategy of generating a subset of data from 3 production databases to reduce test runs.  This includes at minimum drop / recreating table data and constraints. There are two methods being analyzed.  Looking for feedback / comments on these methods or possibly any other recommendations:

    Method Summary

    Pull Method

     

    Restore to db1_prod, db2_prod, db3_prod

    Create db1_test, db2_test, db3_test

    Drop / create tables on *_test

    Pull data (BCP and Filtered data tables) from prod to test

    Add constraints

    Add other objects (sp, view, functions, etc.)

     

    Pros:    Will always have a full production set to extract from.  Could be used for other development efforts

                Quick re-generation of test db unless more current data is needed

                No need to manage disk space / shrink DB prior to subset

                Provides flexibility for future daily replication of prod. Db

               

    Cons:    Slower performance on subset extraction

                Additional overhead in creating objects and transferring static data

                Not easily support full run 

     

    Destructive Method

     

    Restore to db1_prod, db2_prod, db3_prod

    Using same prod db, Select into temp, drop existing table, rename temp to existing table

    Add table attributes via SQLDMO

    Shrink DB

     

    Pros:    Supports full run if needed.

    Eliminates overhead of transferring static data / creating objects and dependencies outside of tables

                More portable as only 3 DB's required for subset run

                Current environment configuration better supports this process. 

     

    Cons:    May limit parallel development in that two full sets of DB's can't be supported.

                Slow performance in transfer to baseline.bak due to shrink DB.

    A bit of detail, but Any comments would be appreciated.

  • Well, FWIW, I don't like your "Destructive Method" because you are creating your test tables in the Production database(s).

    The "Pull Method" is better, but you still are using the production instance.  The purist in me says you should do all testing in a separate database, ideally on a separate server. 

    What we do in our company is to take a SQL backup from the production database, and restore the database in a test database on a test server(!), so we can do testing there...  You don't wanna touch the production environment, if possible!

    You might be able to do this, provided your prod. databases are small enough to allow backups (and then restores) to be performed in a reasonable timeframe.  Once you have the databases restored on your test box, you can DROP the tables you don't need, or DELETE the rows in tables you don't need. You might also need to re-create database user accounts in the test env., if appropriate. 

    Just a thought.

    - john

     

  • Before I read the rest of your reply I don't think I was clear enough.  This is all being performed in development.  The first step is to restore production backup to development environment.  Then I go from there. 

     

     

  • OK, given that you are doing everything in dev, you're cool.

    Question: In your "Destructive" method you list the step:

    "Using same prod db, Select into temp, drop existing table, rename temp to existing table"

    Why go thru this extra work renaming tables?  Is there some reason you can't test against the same table names in the dev. env. that the prod. env. uses? 

  • There were a couple of reasons:

    It helps clear constraints on a table

    "Select into" is quicker then "delete from" or "insert int" for that matter.

    Then use SQL DMO to add table attributes once data is loaded.

  • OK.

    If you're that concerned about "Select into" vs "delete from" or "insert into" performance, then it sounds like you need another strategy. 

    I suggest you create a multi-step SQL Agent job to do all your development processing (that restores the production backups, and performs all associated processing).  Have the job run each night at 3 AM or whenever you need it-- so that when you come into work all your tables are restored/populated/ready to go with the data you need.

    That approach (partly) takes the time-processing out of the equation, and will permit you to use "DELETE FROM" and "INSERT INTO". Further, the constraints will already be there--you won't have to re-create them!  

    That's my recommendation. Good luck!

    - john

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply