any tips on anonymising data

  • Hi guys,

    I am being asked to restore a live database to a training environment. The database contains patient confidential records and I would like to keep all of the exisiting data but change the first names, surnames and addresses (1st lines and postcodes will do) but keep the data meaningful.

    does anyone have any tip or tricks to do this efficiently.

    I am thinking of taking distinct counts of the first name, last name and sex and then using this with a random number generator to seed a table function and replacing the real data with the results from a table function that takes the sex and returns a name in proportion to the distinct counts.

    Doing the same with addresses is a bit harder as you need to split the address line into its component parts and possibly keep postcodes with post towns.

  • aaron.reese (3/30/2010)


    Hi guys,

    I am being asked to restore a live database to a training environment. The database contains patient confidential records and I would like to keep all of the exisiting data but change the first names, surnames and addresses (1st lines and postcodes will do) but keep the data meaningful.

    does anyone have any tip or tricks to do this efficiently.

    I am thinking of taking distinct counts of the first name, last name and sex and then using this with a random number generator to seed a table function and replacing the real data with the results from a table function that takes the sex and returns a name in proportion to the distinct counts.

    Doing the same with addresses is a bit harder as you need to split the address line into its component parts and possibly keep postcodes with post towns.

    Just replace the names with a length of NextID.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Script the structure and then use a tool like this:

    http://www.red-gate.com/products/SQL_Data_Generator/index.htm

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply