demo db

  • Hi,

    I have to put together a demonstration database for sales and marketing. The two main objectives of this excercise are to keep the contents of the db as realistic as possible and to also maintain the privacy rights of the individuals represented in the production db, which will be the source of the demo db. Obviously I'd like to have it scripted so that I can produce these outputs on an ad hoc basis.

    I'd like to hash all numeric identifiers and shuffle the names of people and some places. E.g. (Tom, Smith), (Jan, Young) becomes (Tom, Young), (Jan, Smith). After a brief search, I turned up very little on techniques for doing this, but it must be something that others deal with on a fairly regular basis. Is there a common hash function in t-sql? What about a procedure for shuffling a list of names? Ideally I'd also like to test the new db to ensure that the resulting triples don't match any production ones, but that may not be a reasonable expectation. Thanks in advance.

  • This was removed by the editor as SPAM

  • CHECKSUM() and BINARY_CHECKSUM() functions are built in to T-SQL for hashing...

    As for your other requirement, I've never seen any code to do that. In a database I worked on with similar concerns, we created a big list of fake names and randomly replaced the real names in the database with the fake ones...

    --
    Adam Machanic
    whoisactive

  • Thanks for the response.

    I've checked into the checksum functions, but they result in too many collisions to be useful. In retrospect, hashing is probably not the way to go. I guess I will use a simpler transformation and be done with it. I'm just not sure if a simple transformation fits the 'due dilligence' test. Perhaps a bijective, non-linear transformation will suffice.

    As far as using a list of fake names goes, it begs the question of actually generating the list in the first place.

    I could certainly write some java code using Collections.shuffle(List) to get the results I want, but that would entail at least three quarters of a million update statements shoved across a jdbc connection - not exactly pretty.

    In any case, thanks for the help.

  • Create a table for names, using your favorite (or least favorite) celebrities. Ask everyone in your office to contribute some names. Put it all in a table.

    Then simply run:

    UPDATE YourTable

    SET Name =

    (SELECT TOP 1 Name

    FROM Names

    ORDER BY NEWID())

    And you're done.

    --
    Adam Machanic
    whoisactive

  •  

    Hi,  I had to make a bunch of fake data as well.  I took the most popular firstnames and lastnames from the census folks and just crossed them then I created bogus SSN and Phone Numbers.

    Here is a Comma delimited file with the top 1,000,000 Fake names,Social Security Numbers and Phone numbers  if you care to use it 

    http://69.128.23.252/downloads/person.zip

    HTH

    Tal McMahon

     


    Kindest Regards,

    Tal Mcmahon

  • Thanks, Tal! I'm sure that will be very useful. Are you hosting that? Do you mind if I link to it from my blog?

    --
    Adam Machanic
    whoisactive

  • Hey,

    It is an extra IP at work. it is only a half of a t1.  I would think it is ok to list unless you have a million readers that hit me all at once.

     

    tal 


    Kindest Regards,

    Tal Mcmahon

  • Nah, probably my only readers are my mother and girlfriend

    --
    Adam Machanic
    whoisactive

  • That times are gone!

    http://www.insidesql.de/component/option,com_bookmarks/Itemid,42/catid,1/

    ( if you don't mind, Adam)

    You need to scroll down a bit.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Quite the contrary, Frank. I really appreciate it. Thank you!

    --
    Adam Machanic
    whoisactive

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply