May 15, 2005 at 3:30 am
Hi,
Our production database holds people's names, addresses and credit card numbers. When we copy the production data back to dev and QA, I would like to run a datawash script that re-arranges all this info, so that there is no real, complete data. For compliance we have put a simple change all member names, addresses to this single value, but this makes it hard to develop and troubleshoot because they are all the same.
What I was thinking that the script/s would replace a frst name with a random first name from the same field, replace surname with random surname, replace address line 1 etc etc
Has anyone done something like this before (I'm thinking this would take a few days to nut out, develop and test)? Does someone have a better datawash script?
Thanks in advance.
Steve
May 16, 2005 at 8:56 am
there is surprisingly little available. See:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=23&messageid=143718
May 16, 2005 at 2:50 pm
I've done this for several different reasons... but it is basically the same...munge existing data to be unreleated to real client data.
i've got a database of animal names, colors, and city names with views and functions to be used to update client data with meaningless stuff; the trick is you can't use a random or newid function directly from another function...you have to use a view to bypass the non-deterministic rule:
try this:
create view rand_CityName as
select top 1 poname from master.dbo.zipcodes order by newid()
CREATE VIEW Rand_Number
AS SELECT RAND() AS r
create function fn__randname()
returns varchar(128)
begin
declare @return varchar(128)
select top 1 @return= poname from rand_Cityname
return @return
end
create function fn__randnumber
(
@MaxVal int
)
returns int
begin
declare @return int
select top 1 @return = r * @MaxVal from Rand_Number
return @return
end
select dbo.fn__randname()
select dbo.fn__randnumber(4)
select dbo.fn__randnumber(98989)
update gmhome5 set name=dbo.fn__randname()
munging an address would be something like
update sometable set addr1= convert(varchar,dbo.fn__randnumber(9999)) + ' ' + dbo.fn__randcolor() + ' Street'
so you get something like '4593 Aquamarine Street'
the real work is determining which columns in your database need to be masked with fake data.
HTH.
Lowell
May 16, 2005 at 4:55 pm
See this for a scrambler utility:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=123939#bm177681
You can probably get an idea of how to modify it to do what you need.
Maybe you'll get inspired to add a randomization option to it. ( GUID() works well for simulating random #s)
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply