October 27, 2004 at 10:26 am
Hi,
I have to put together a demonstration database for sales and marketing. The two main objectives of this excercise are to keep the contents of the db as realistic as possible and to also maintain the privacy rights of the individuals represented in the production db, which will be the source of the demo db. Obviously I'd like to have it scripted so that I can produce these outputs on an ad hoc basis.
I'd like to hash all numeric identifiers and shuffle the names of people and some places. E.g. (Tom, Smith), (Jan, Young) becomes (Tom, Young), (Jan, Smith). After a brief search, I turned up very little on techniques for doing this, but it must be something that others deal with on a fairly regular basis. Is there a common hash function in t-sql? What about a procedure for shuffling a list of names? Ideally I'd also like to test the new db to ensure that the resulting
November 1, 2004 at 8:00 am
This was removed by the editor as SPAM
November 1, 2004 at 11:23 am
CHECKSUM() and BINARY_CHECKSUM() functions are built in to T-SQL for hashing...
As for your other requirement, I've never seen any code to do that. In a database I worked on with similar concerns, we created a big list of fake names and randomly replaced the real names in the database with the fake ones...
--
Adam Machanic
whoisactive
November 3, 2004 at 9:35 am
Thanks for the response.
I've checked into the checksum functions, but they result in too many collisions to be useful. In retrospect, hashing is probably not the way to go. I guess I will use a simpler transformation and be done with it. I'm just not sure if a simple transformation fits the 'due dilligence' test. Perhaps a bijective, non-linear transformation will suffice.
As far as using a list of fake names goes, it begs the question of actually generating the list in the first place.
I could certainly write some java code using Collections.shuffle(List) to get the results I want, but that would entail at least three quarters of a million update statements shoved across a jdbc connection - not exactly pretty.
In any case, thanks for the help.
November 3, 2004 at 9:53 am
Create a table for names, using your favorite (or least favorite) celebrities. Ask everyone in your office to contribute some names. Put it all in a table.
Then simply run:
UPDATE YourTable
SET Name =
(SELECT TOP 1 Name
FROM Names
ORDER BY NEWID())
And you're done.
--
Adam Machanic
whoisactive
November 5, 2004 at 12:46 pm
Hi, I had to make a bunch of fake data as well. I took the most popular firstnames and lastnames from the census folks and just crossed them then I created bogus SSN and Phone Numbers.
Here is a Comma delimited file with the top 1,000,000 Fake names,Social Security Numbers and Phone numbers if you care to use it
http://69.128.23.252/downloads/person.zip
HTH
Tal McMahon
November 5, 2004 at 2:03 pm
Thanks, Tal! I'm sure that will be very useful. Are you hosting that? Do you mind if I link to it from my blog?
--
Adam Machanic
whoisactive
November 8, 2004 at 7:03 am
November 8, 2004 at 8:48 am
Nah, probably my only readers are my mother and girlfriend
--
Adam Machanic
whoisactive
November 9, 2004 at 4:15 am
That times are gone!
http://www.insidesql.de/component/option,com_bookmarks/Itemid,42/catid,1/
( if you don't mind, Adam)
You need to scroll down a bit.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
November 9, 2004 at 7:17 am
Quite the contrary, Frank. I really appreciate it. Thank you!
--
Adam Machanic
whoisactive
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply