March 11, 2011 at 3:15 am
Hello,
I have an question about how to unpersonalize a SQL database
The reason is I work in an hospital and we want to develop an new application on an SQL database, but in this database are patients with all their information like medication and other things
I don’t want to give the development team a copy of this database without atliest changing the name of the patient but there are a lot of patients
Is there a way to do this automatically ?
Greetings
March 11, 2011 at 3:28 am
I've had to do this a few times, and couldn't find any tools to help (I haven't looked recently, so there may be some now).
In the end I wrote a few functions to scramble names, dates of birth etc and then created a script to do the scrambling, table by table, something like this...
UPDATE patient set PatientName = dbo.ScrambleName(PatientName),
PatientDOB = dbo.ScrambleDate(PatientDOB) etc etc
I don’t want to give the development team a copy of this database...
It may not be a question of WANT, there are quite probably laws that say you CAN'T give the development team a copy, without scrambling the data.
March 11, 2011 at 3:33 am
Something like this?
WITH PatientNos AS (
SELECT ROW_NUMBER () OVER (ORDER BY NEWID()) AS PNo
, *
FROM Patients
)
UPDATE PatientNos
SET PatientName = 'Patient' + CAST (PNo as varchar(7))
John
March 14, 2011 at 8:54 am
There may be a lot more than name you need to scramble. SSN. MRN. DOB. When you're done there shouldn't be any way to track back to the actual patient. I wasn't part of the project so I don't have the code but we pulled a client's DB so we had an internal system for testing that was full of data and had to scramble all patient identifying information. Instead of doing something linear like John posted we went for completely random names and numbers. We ended up with long, incomprehensible names but it worked.
March 16, 2011 at 1:51 am
Thanx for your replays
I started with a script that replaces the name, address, and a couple of other fields to other information
So address is than address +IDnumber of the record.
So thanx for your help :w00t:
Greetings.
March 17, 2011 at 8:55 am
That might work for address but if you're planning on doing name, SSN and MRN like that it may not meet the requirements to be considered anonymous. Are you planning on using that method for those too or a different one?
March 18, 2011 at 1:12 am
Yes, i do this for all the fields where there is only a trace of personal information
March 18, 2011 at 8:34 am
You may have other, less-obvious considerations. I work at a law school, and sometimes I have to be careful not give out lists of sanitized students and their courses and grades for a term, if the assemblage of courses could be used to identify a particular student. You need to know your business and what information (other than the obvious) could be used to identify an individual.
Rich
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply