non-prod database data scrambling

  • We would need to restore our data from production to training environment. To protect the data we would like to scramble some data like address, lastname, etc.  I know there is a data masking thing, but what we would like to do is still showing data but it is scrambled data.  We want them to also work without breaking table relations. Any suggestions about good approaches to do data scrambling?

    Thanks

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I think RedGate has got some nice tools to handle that kind of operation

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I agree with Johan, you want to take a look at Data Masker. It's probably the single most sophisticated tool for getting this done, and it will absolutely keep relations in place in your database.

    It's possible to write your own scripts for this, but it's a ton of work and they'll be hard to maintain.

    DISCLOSURE: I work for Redgate.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you both.

    The product is good, but it is expensive. Not sure if we can get approval to have budget for that.

    Any other recommendations?

    Thanks

    • This reply was modified 2 years, 11 months ago by  sqlfriend.
  • You’re into writing your own queries really.

    So it’s starting with a good data dictionary.  What tables have what columns and what data in those columns needs to be masked.

    Then your into developing the queries to scramble the data and how you want the scrambling to occur.

    Then your into testing

    Then your into the constant iterative process of fixing, scrambling, testing.

    Then you have new data/new tables and having to rinse and repeat the whole process.

     

    Now ask yourself, all that time and effort is going to cost money.  Ok it’s remuneration costs so it will fly under the radar, but if that takes you 6 months to complete then that’s 6 months that could be spent else where on a solution which is making money rather than scrambling non prod.

    Compare the wage cost to lost revenue to buying off the shelf and which is cheaper now.

    You will be surprised usually in these things that you should of just got a 3rd party all in from the start as the quickest and long term cheaper solution.

     

    If Red-Gate is to much look at ApexSQL they have a masking tool also.

    But do you really need the whole production database in  none prod?  Could you get away with using a blank schema and using a data generator instead to create meaningful data instead of going through all the pain.

  • sqlfriend wrote:

    Thank you both.

    The product is good, but it is expensive. Not sure if we can get approval to have budget for that.

    Any other recommendations?

    Thanks

    It'll be cheaper that writing your own and more of a guarantee that things work properly.  It's the old pay now or pay dearly later type of thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks all.

    For answering this question:  But do you really need the whole production database in  none prod?  Could you get away with using a blank schema and using a data generator instead to create meaningful data instead of going through all the pain.

    Our database is a pretty complicated enterpirse database which has many tables. we do want data to make sense, and this is for our training enviroment, it has to have some features and data in our real business concept, and we just need to scramble some columns. Data generator may not satisfy the need, because it is too random data.

  • A second thought I do have license of Redgate of data generator as in the toolbelt bundle. Never used this tool, but maybe it can be used to replace columns' data by generating new data according to rules. I will take a look into it.

    Thanks much

  • sqlfriend wrote:

    Our database is a pretty complicated enterpirse database which has many tables. we do want data to make sense, and this is for our training enviroment, it has to have some features and data in our real business concept, and we just need to scramble some columns. Data generator may not satisfy the need, because it is too random data.

    I totally agree.  Nothing is more odd than real data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply