January 25, 2022 at 12:22 am
We would need to restore our data from production to training environment. To protect the data we would like to scramble some data like address, lastname, etc. I know there is a data masking thing, but what we would like to do is still showing data but it is scrambled data. We want them to also work without breaking table relations. Any suggestions about good approaches to do data scrambling?
Thanks
January 26, 2022 at 1:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
January 26, 2022 at 8:03 am
I think RedGate has got some nice tools to handle that kind of operation
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
January 26, 2022 at 2:15 pm
I agree with Johan, you want to take a look at Data Masker. It's probably the single most sophisticated tool for getting this done, and it will absolutely keep relations in place in your database.
It's possible to write your own scripts for this, but it's a ton of work and they'll be hard to maintain.
DISCLOSURE: I work for Redgate.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 26, 2022 at 7:20 pm
You’re into writing your own queries really.
So it’s starting with a good data dictionary. What tables have what columns and what data in those columns needs to be masked.
Then your into developing the queries to scramble the data and how you want the scrambling to occur.
Then your into testing
Then your into the constant iterative process of fixing, scrambling, testing.
Then you have new data/new tables and having to rinse and repeat the whole process.
Now ask yourself, all that time and effort is going to cost money. Ok it’s remuneration costs so it will fly under the radar, but if that takes you 6 months to complete then that’s 6 months that could be spent else where on a solution which is making money rather than scrambling non prod.
Compare the wage cost to lost revenue to buying off the shelf and which is cheaper now.
You will be surprised usually in these things that you should of just got a 3rd party all in from the start as the quickest and long term cheaper solution.
If Red-Gate is to much look at ApexSQL they have a masking tool also.
But do you really need the whole production database in none prod? Could you get away with using a blank schema and using a data generator instead to create meaningful data instead of going through all the pain.
January 26, 2022 at 9:57 pm
Thank you both.
The product is good, but it is expensive. Not sure if we can get approval to have budget for that.
Any other recommendations?
Thanks
It'll be cheaper that writing your own and more of a guarantee that things work properly. It's the old pay now or pay dearly later type of thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 26, 2022 at 11:22 pm
Thanks all.
For answering this question: But do you really need the whole production database in none prod? Could you get away with using a blank schema and using a data generator instead to create meaningful data instead of going through all the pain.
Our database is a pretty complicated enterpirse database which has many tables. we do want data to make sense, and this is for our training enviroment, it has to have some features and data in our real business concept, and we just need to scramble some columns. Data generator may not satisfy the need, because it is too random data.
January 26, 2022 at 11:39 pm
A second thought I do have license of Redgate of data generator as in the toolbelt bundle. Never used this tool, but maybe it can be used to replace columns' data by generating new data according to rules. I will take a look into it.
Thanks much
January 27, 2022 at 1:17 am
Our database is a pretty complicated enterpirse database which has many tables. we do want data to make sense, and this is for our training enviroment, it has to have some features and data in our real business concept, and we just need to scramble some columns. Data generator may not satisfy the need, because it is too random data.
I totally agree. Nothing is more odd than real data.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply