Using copies of prod database on development servers

  • For many years various people would copy a backup of a production database to a development server for restore/testing. For many reasons there is talk of either ending that practice or requiring the dev sql servers to meet production security standards. This will start a big fight, but I'm wondering how the community sees this and how you handle it.

    Making the data in the prod copy anonymous would be a huge effort. There is a method for creating an empty copy of a prod database but that is useless for performance testing.

  • Prod data does not belong in development. There is no problem having the data scrubbed and used in dev - except for effort in scrubbing it.

    I recommend developing a process to scrub that data.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I don't fully agree with Cirq..

    I think a fundamental problem with many dev and test environments is limited data, prod data may not belong, but prod quantities DO. However with that said there is certain data that might be considered protected or restricted that needs to be scrubbed out. I think Cirq recommended developing a process for this, I concur. But do everything possible to keep the data quantities, I have seen MANY posts here about how a query performs differently from Dev to prod, and how the query was tuned using a limited set of data and not it has 10 times as much and it performs like crap. Something that could have been prevented or at least mitigated by having a production or near production level of dat.

    I work in an environment that has fairly strong requirements for this. When we move DB backups into non-prod environments the first question asked is whether the DB contains restricted data, if it does then it has to be scrubbed before I as the DBA can allow the developers at it.

    CEWII

  • Elliott W (2/24/2010)


    I don't fully agree with Cirq..

    I think a fundamental problem with many dev and test environments is limited data, prod data may not belong, but prod quantities DO. However with that said there is certain data that might be considered protected or restricted that needs to be scrubbed out. I think Cirq recommended developing a process for this, I concur. But do everything possible to keep the data quantities, I have seen MANY posts here about how a query performs differently from Dev to prod, and how the query was tuned using a limited set of data and not it has 10 times as much and it performs like crap. Something that could have been prevented or at least mitigated by having a production or near production level of dat.

    I work in an environment that has fairly strong requirements for this. When we move DB backups into non-prod environments the first question asked is whether the DB contains restricted data, if it does then it has to be scrubbed before I as the DBA can allow the developers at it.

    CEWII

    Thanks for clarifying. I was speaking solely to the data and not the quantity of data. Quantity of data needs to be able to mimicked so proper testing can be achieved. If you take the entire production data, scrub out the necessary data (make certain data generic), then you should have a good dev database to test with.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (2/24/2010)


    Elliott W (2/24/2010)


    I don't fully agree with Cirq..

    I think a fundamental problem with many dev and test environments is limited data, prod data may not belong, but prod quantities DO. However with that said there is certain data that might be considered protected or restricted that needs to be scrubbed out. I think Cirq recommended developing a process for this, I concur. But do everything possible to keep the data quantities, I have seen MANY posts here about how a query performs differently from Dev to prod, and how the query was tuned using a limited set of data and not it has 10 times as much and it performs like crap. Something that could have been prevented or at least mitigated by having a production or near production level of dat.

    I work in an environment that has fairly strong requirements for this. When we move DB backups into non-prod environments the first question asked is whether the DB contains restricted data, if it does then it has to be scrubbed before I as the DBA can allow the developers at it.

    CEWII

    Thanks for clarifying. I was speaking solely to the data and not the quantity of data. Quantity of data needs to be able to mimicked so proper testing can be achieved. If you take the entire production data, scrub out the necessary data (make certain data generic), then you should have a good dev database to test with.

    There still is something to be said to run actual performance/load testing against actual data, and not scrubbed data. Things like security, data patterns, statistics, etc... can all be very different if you "sanitize the data". There's also something to be said for not running load testing on the "dev" environment. I don't know about you guys, but our devs are rather busy, so "dev" is often reasonably messy (unfinished code, etc...), so you get bad results.

    So - I would DEVELOP against a largely blank or dummied up data set, and - run the load testing/validation against a clean, secured environment (something like a "model office" type environment). It also has the nice side effect of allowing you to vet out your production rollout routines, AND is not polluted with DEV junk, so it should be a reasonably accurate reflection of production (if it's not an EXACT copy).

    Of course- that's an extra investment in hardware but worth it if you have

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (2/24/2010)


    There still is something to be said to run actual performance/load testing against actual data, and not scrubbed data. Things like security, data patterns, statistics, etc... can all be very different if you "sanitize the data". There's also something to be said for not running load testing on the "dev" environment. I don't know about you guys, but our devs are rather busy, so "dev" is often reasonably messy (unfinished code, etc...), so you get bad results.

    So - I would DEVELOP against a largely blank or dummied up data set, and - run the load testing/validation against a clean, secured environment (something like a "model office" type environment). It also has the nice side effect of allowing you to vet out your production rollout routines, AND is not polluted with DEV junk, so it should be a reasonably accurate reflection of production (if it's not an EXACT copy).

    Of course- that's an extra investment in hardware but worth it if you have

    Yes, Dev is messy, but frequent refreshes should help with this. A lot depends on your data, but in a given DB there really shouldn't be that much actual restricted data. I sometimes do local dev on a DB that has only lookup data, but any performance numbers here are useless. I agree that there should be at least 3 levels from dev to prod. Them being Dev, Test, Prod, at least..

    CEWII

  • there is talk of either ending that practice or requiring the dev sql servers to meet production security standards. This will start a big fight

    Are you concerned about the "big fight", some developer's egos being bruised. My suggestion is ALL servers will met production security standards. Inform those connected with development/QA to suck it up, if not find another job. Harsh yes ... but practical yes ...

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • We've had a situation for years where development priorities outweighed everything else. Now with more audits ( such as SAS70) and clients asking questions there is a movement to tighten things up. Certainly a good idea. Problem is, with tight deadlines facing DEV now, and established patterns of doing business, can you turn on a dime?

    If DEV is forced to find a new way of doing things, the short answer is deadlines will get pushed. It's all over my pay grade. Just wondering what other shops are doing.

  • Grumbling may happen. People will adjust - even if it takes a little longer to adjust than wanted. Deadlines may get pushed, but security and quality should carry more weight.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • There is no one answer, it is horses for courses.

    We deploy a solution at client sites and typically have 4 environments:

    1) Virtual server with sampling of data. This environment is used by our customer support team and for integration purposes with our internal source code control, etc.

    2) Dev - used for developing client environment and testing ETL. If the data set is small, it would be the full data set but not updated as regularly as the QA or prod dataset.

    3) QA - For UAT, training and performance testing with full set of data albeit on slower h/w

    4) Prod - Full data set on the appropriate h/w.

  • bitbucket-25253 (2/24/2010)


    My suggestion is ALL servers will met production security standards. Inform those connected with development/QA to suck it up, if not find another job. Harsh yes ... but practical yes ...

    I think you need to explain what you mean by meeting production standards on ALL servers.

    CEWII

  • Indianrock (2/24/2010)


    For many years various people would copy a backup of a production database to a development server for restore/testing. For many reasons there is talk of either ending that practice or requiring the dev sql servers to meet production security standards. This will start a big fight, but I'm wondering how the community sees this and how you handle it.

    Making the data in the prod copy anonymous would be a huge effort. There is a method for creating an empty copy of a prod database but that is useless for performance testing.

    If copying files across PROD and DEV is the constraint for your question then,

    here is easiest way do doing regular refreshes for DEV environment, speak to network team they can open a port in firewall to see prodbackfile share only one way, then you don't have to copy files to DEV environment you can refresh DEV environment from PROD network path, believe me it is very easy. Set up in firewall is also very simple they (network team) have set up access list and add DEV SQL servers to this group.

    EnjoY!

    EnjoY!
  • Historically, the prod backup share has been visible to certain people who can restore copies of the prod database to dev sql servers. This weekend we start a new situation where native sql backups will no longer be taken. Systems will be running Commvault and doing sql backups to Commvault's own disk device in commvault proprietary format so they will do all restores from now on. ( which also creates frustration and questions for the DEV team ). Commvault can only restore to boxes where their sql Idata agent is installed ( ~ $4,000 per server ).

    It sounds like DEV will have to come up with a way to make the prod data anonymous, although updating names and addresses to ( Smith?) would be a big data operation taking quite some time on a database with 16 million records already and growing all the time.

  • Indianrock (2/25/2010)


    Historically, the prod backup share has been visible to certain people who can restore copies of the prod database to dev sql servers. This weekend we start a new situation where native sql backups will no longer be taken. Systems will be running Commvault and doing sql backups to Commvault's own disk device in commvault proprietary format so they will do all restores from now on. ( which also creates frustration and questions for the DEV team ). Commvault can only restore to boxes where their sql Idata agent is installed ( ~ $4,000 per server ).

    It sounds like DEV will have to come up with a way to make the prod data anonymous, although updating names and addresses to ( Smith?) would be a big data operation taking quite some time on a database with 16 million records already and growing all the time.

    It doesn't need to be entirely anonymous. The items that should be scrubbed would be protected personal information such as government ids or numbers, credit card numbers and you would probably want to scrub email addresses as well. Why email addresses? You don't want to send emails from dev during some testing to 16 million customers - that could be a bit embarrassing.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Heh... my recommendation is that you deny the Developers of any information about the production server. Don't provide scrubbed data or any data at all. That way, you can't actually hold them to a development schedule and you can justify your IT budget with all the rework you'll need to do when you finally do promote code to production. It's a Win-Win situation. :-):-D:-P;-):w00t::cool::hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply