June 8, 2009 at 10:36 am
I have a table of customer name/address info that I want to use for demo purposes, but I don't want the actual names/addresses to be seen. I want to do something like what graphic designers do when "greeking" text, so the data is unrecognizable. Is there a command in SQL to do this?
June 8, 2009 at 12:07 pm
Depending on how many tables you need to change, you could use REPLACE and just replace all 'a' with 'o', and so on, or you could take a look at Red Gate's SQL Data Generator. I think they have a trial for it...
Just remember that this will actually change your data, so perhaps do it to a dev copy or something so you don't kill your production data.
-Luke.
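A minimal sketch of the REPLACE idea, run against a scratch copy. The table `#ReplaceDemo`, its columns, and the sample row are made up for illustration, and the letter swaps are arbitrary. Note that whether 'A' is swapped along with 'a' depends on the column's collation, and that a simple letter swap can leave names guessable.

```sql
-- Hypothetical scratch table standing in for a dev copy of the customer table
CREATE TABLE #ReplaceDemo (CustomerName varchar(50), Address1 varchar(100));
INSERT INTO #ReplaceDemo (CustomerName, Address1) VALUES ('Jane Smith', '12 Maple Street');

-- Nest one REPLACE per letter swap; add more pairs as needed
UPDATE #ReplaceDemo
SET CustomerName = REPLACE(REPLACE(REPLACE(CustomerName, 'a', 'o'), 'e', 'u'), 's', 'z'),
    Address1     = REPLACE(REPLACE(REPLACE(Address1,     'a', 'o'), 'e', 'u'), 's', 'z');

SELECT CustomerName, Address1 FROM #ReplaceDemo;

DROP TABLE #ReplaceDemo;
```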
June 8, 2009 at 12:10 pm
Heh... just replace the names with NEWID().
--Jeff Moden
Change is inevitable... Change for the better is not.
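A rough sketch of the NEWID() approach, assuming a hypothetical scratch table `#NewIdDemo` with a name column wide enough for the 36-character GUID text. NEWID() is evaluated once per row, so every row gets a different value, including rows that originally held the same name.

```sql
-- Hypothetical scratch table; in practice this would be a dev copy of the real table
CREATE TABLE #NewIdDemo (CustomerID int, CustomerName varchar(50));
INSERT INTO #NewIdDemo (CustomerID, CustomerName) VALUES (1, 'Jane Smith');
INSERT INTO #NewIdDemo (CustomerID, CustomerName) VALUES (2, 'John Doe');

-- Overwrite each name with the text form of a freshly generated GUID
UPDATE #NewIdDemo
SET CustomerName = CONVERT(varchar(36), NEWID());

SELECT CustomerID, CustomerName FROM #NewIdDemo;

DROP TABLE #NewIdDemo;
```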
June 9, 2009 at 3:46 pm
Personally, I like to use replicate('*',len(company)) which obviously will replace everything with a string of *'s the length of the data...
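As an illustration of the REPLICATE approach, here is a small sketch against a hypothetical scratch table `#MaskDemo`; only the `company` column name comes from the post. LEN ignores trailing spaces, and a NULL company stays NULL.

```sql
-- Hypothetical scratch table with a company column, as in the post
CREATE TABLE #MaskDemo (company varchar(50));
INSERT INTO #MaskDemo (company) VALUES ('Acme Widgets');
INSERT INTO #MaskDemo (company) VALUES ('Initech');

-- Replace each value with asterisks of the same length
UPDATE #MaskDemo
SET company = REPLICATE('*', LEN(company));

SELECT company FROM #MaskDemo;

DROP TABLE #MaskDemo;
```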
June 10, 2009 at 8:46 am
Using Jeff's idea of newid() to generate a random ID, here's something that I developed for our company's use. It de-identifies data for HIPAA purposes, similar to what you're talking about, and can shift dates around too, so you don't have to worry about birthdates etc.
Generates the code you need to create your test table(s). You'll probably want to change the dynamic SQL part to look for your particular column names and/or change the verbiage so it doesn't call everyone a member/provider, that was specific to our line of work.
hth,
Jon
/*
Title: 'Generate De-identified Test Data.sql'
Description: HIPAA legislation requires that unless you 'need to know' data to perform your job,
you should not have access to it.
But what if you need help on querying your data?
The purpose of this script is to demonstrate how to generate test data
so that help can be sought enterprise-wide, from users who may not have
access to the actual data source.
Values entered into the Upper and Lower Bound variables will change dates randomly,
although the same values for both can be used to retain date information if necessary
Here are the steps to use this script (marked as such below):
STEP ONE - initialize variables as needed to name tables, limit results and move dates around
STEP TWO - enter your query, adding 'INTO ##myTemp' as noted and fully naming tables
STEP THREE - Execute this script and copy/paste first the Message tab,
then the Results tab into your post/email
STEP FOUR - Save a copy of the results in case you want to tie the new values back to the original
The 'iRow' in the generated table matches the row in your data table
Known flaws: n/a
Author: Jon Crawford
Healthcare Data Analyst
Revision History:
Date        Changes
----------  ------------------------
12/12/2008  'initial implementation'
*/
SET NOCOUNT ON
USE TempDB -- DO NOT CHANGE THIS
-- =================================
-- DECLARE variables
-- =================================
DECLARE @shiftYearsLower int,-- lower bound of random years added to dates
@shiftYearsUpper int,-- upper bound of random years added to dates
@shiftMonthsLower int,-- lower bound of random months added to dates
@shiftMonthsUpper int,-- upper bound of random months added to dates
@shiftDaysLower int,-- lower bound of random days added to dates
@shiftDaysUpper int,-- upper bound of random days added to dates
@sq char(1), -- holds the value of a single quote to simplify our dynamic SQL
@limit varchar(25), -- holds the number of rows you want to generate
@columns varchar(8000), -- variable to hold the columns that we want
@createTableSQL varchar(8000), -- variable to hold the dynamic SQL that will generate our table
@mySelectList varchar(8000), -- variable to hold the dynamic SQL that represents your SELECT in the real world
@myInsertList varchar(8000), -- @myInsertList holds the dynamic SQL for the INSERT statements you need
-- if your inserts are too long for this,
-- then my advice is to rethink your inserts, you're passing test data. Make it fit.
@myTableData varchar(8000), -- holds the data that you will be copy/pasting
@myTempTableName varchar(255) -- holds the name of your current output table (set to #temp)
-- ===============================================
-- Initialize variables - DON'T CHANGE THESE ONES
-- ===============================================
-- we have to set all to Empty String ('') because we're concatenating each variable onto itself,
-- and if we don't, the starting value is NULL, and NULL + anything is NULL, which wrecks the whole thing
SET @columns = ''
SET @createTableSQL = ''
SET @mySelectList = ''
SET @myInsertList = ''
SET @myTableData = ''
SET @sq = ''''
-- ======================================================================================
-- STEP ONE: Initialize variables - CHANGE THESE AS NEEDED
-- ======================================================================================
-- change @myTempTableName to whatever you want your table name to be that you will be posting
-- *** NOTE *** you'll have to run this script once for each temp table that you want to generate
SET @myTempTableName = '#temp'
-- set @limit to 'TOP x' where x is the number of rows you want to generate
-- or set to '' if you don't want to limit the results
-- (but remember your statements need to fit within an 8000 character variable)
SET @limit = 'TOP 10'
-- the following shift days/months/years in all date fields either a set number or a random number in a boundary
SET @shiftYearsLower = 10 -- lower bound of random years added to dates
SET @shiftYearsUpper = 10 -- upper bound of random years added to dates
SET @shiftMonthsLower = 1 -- lower bound of random months added to dates
SET @shiftMonthsUpper = 1 -- upper bound of random months added to dates
SET @shiftDaysLower = 0 -- lower bound of random days added to dates
SET @shiftDaysUpper = 0 -- upper bound of random days added to dates
-- =========================
-- END OF STEP ONE
-- =========================
IF OBJECT_ID('Tempdb..##myTemp') IS NOT NULL BEGIN DROP TABLE ##myTemp END
-- ======================================================================================
-- STEP TWO:
-- Add your SELECT statement below, this is a sample based on member and claim startdate
-- that shows both the original and modified data
-- SELECT INTO a global temp table (allows for complex joins, calculated fields, etc),
-- so you'll want to alias all columns that are modified in any way (e.g. column AS alias)
-- ======================================================================================
SELECT TOP 10 rtrim(member.fullname) AS member,
--'member'+convert(varchar(25),abs(checksum(member.fullname))) AS testMember,
rtrim(provider.fullname) AS provider,
--'provider'+convert(varchar(25),abs(checksum(provider.fullname))) AS testProvider,
rtrim(claim.claimid) AS claimid,
--10000000000+abs(checksum(claim.claimid)) AS testClaim,
claim.startdate--,
-- dateadd(yyyy,abs(checksum(claim.startdate))%(@shiftYearsUpper - @shiftYearsLower+1)+@shiftYearsLower,
-- dateadd(mm,abs(checksum(claim.startdate))%(@shiftMonthsUpper - @shiftMonthsLower+1)+@shiftMonthsLower,
-- dateadd(dd,abs(checksum(claim.startdate))%(@shiftDaysUpper-@shiftDaysLower+1)+@shiftDaysLower,claim.startdate)
-- )
-- ) AS testClaimStartdate
-- ======================
-- Add this to your query after your SELECT clause
INTO ##myTemp
-- FULLY NAME YOUR TABLES below along with aliases
-- e.g. JOIN QNXT_PLANDATA_xx.dbo.tablename tablename ON blah blah blah
-- ^database author^ ^table ^alias ^JOIN criteria
-- ======================
FROM myDB.dbo.member member
JOIN myDB.dbo.claim claim on member.memid = claim.memid
JOIN myDB.dbo.provider provider ON claim.provid = provider.provid
WHERE provider.fullname = 'Doe, John'
ORDER BY member.memid
-- =========================
-- END OF STEP TWO
-- =========================
-- ====================================================================
-- STEP THREE: Now Execute the query for each temp table and
--copy/paste from the Messages tab first and then
-- the Results tab into your post/email
-- ====================================================================
-- =======================================================================================================
-- STEP FOUR: Save the data from this as a reference in case you want to tie back to the original
-- =======================================================================================================
-- GENERATE INSERT STATEMENTS TO HELP OTHERS POPULATE #temp WITH TEST DATA
SELECT @columns = @columns + char(9) + cols.name + ' ' + systypes.name
+ CASE WHEN systypes.name IN ('char','varchar') THEN '('+convert(varchar(4),cols.length)+')' ELSE '' END
+ ',' + char(13),
@mySelectList = @mySelectList + cols.name + ', ',
@myInsertList = @myInsertList + @sq + ',' + @sq + char(13)
+
CASE
WHEN systypes.name IN ('char','varchar','datetime','smalldatetime')
THEN '+' +@sq+@sq+@sq+@sq+ ' + '
+CASE
WHEN cols.name IN ('memid','carriermemid','fullname','enrollid','member')
THEN
@sq+'member'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '
WHEN cols.name = 'ssn'
THEN 'convert(varchar(9),100000000+abs(checksum('+objs.name+'.'+cols.name+','+@sq+@sq+'))%900000000) + ' -- modulo keeps the result at 9 digits and avoids int overflow
WHEN cols.name IN ('provid','affiliateid','affiliationid','fedid','provider')
THEN
@sq+'provider'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '
WHEN cols.name IN ('claimid','claim','encounter')
THEN 'convert(varchar(11),10000000000+abs(checksum('+objs.name+'.'+cols.name+','+@sq+@sq+'))) + '
WHEN systypes.name IN ('datetime','smalldatetime')
THEN 'convert(varchar,'+objs.name+'.'+cols.name+',20) + '
ELSE
'convert(varchar(25),ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')) + '
END
+ @sq+@sq+@sq+@sq
ELSE '+ CONVERT(varchar(255),ISNULL(' + objs.name+'.'+cols.name+',0))'
END
+ ' + '
FROM dbo.sysobjects AS objs
INNER JOIN dbo.syscolumns AS cols ON cols.id = objs.id
INNER JOIN dbo.systypes systypes ON cols.xtype = systypes.xusertype
WHERE objs.name+'.'+cols.name IN (
SELECT objs.name + '.'+ cols.name
FROM sysobjects AS objs
INNER JOIN syscolumns AS cols
ON cols.id = objs.id
WHERE objs.name = '##myTemp'
)
-- Create temp table generation code
SET @createTableSQL = '
IF object_id(''Tempdb..'+@myTempTableName+''') IS NOT NULL
BEGIN DROP TABLE '+@myTempTableName+' END
CREATE TABLE '+@myTempTableName+'
(iRow int identity(1,1), -- identity column for primary key
' + substring(@columns,1,len(@columns)-2) + ')
--===== Add a Primary Key to maximize performance
IF OBJECT_ID(''Tempdb..'+@myTempTableName+''') IS NOT NULL
BEGIN
ALTER TABLE '+@myTempTableName+'
ADD CONSTRAINT PK_'+@myTempTableName+'_iRow
PRIMARY KEY CLUSTERED (iRow)
WITH FILLFACTOR = 100
END'
PRINT @createTableSQL -- Created code will show up in Messages tab of QA, copy/paste into your email/posted question
PRINT '-- Insert test data into table' + char(13) + '-------------------------------------------------------' + char(13)
-- Created code will show up in Results tab of QA, copy/paste into your email/posted question
SET @myTableData = 'SELECT ' + @limit + ' ' + @sq + 'INSERT INTO '+@myTempTableName+' (' + substring(@mySelectList,1,len(@mySelectList)-1) + ') '
+ 'VALUES (' + @sq + substring(@myInsertList,4,len(@myInsertList)) + @sq +')' + @sq
+ ' AS [Copy and Paste these AFTER the text from the Messages tab]'
+ char(13)+ ' FROM ##myTemp'
--PRINT @myTableData -- uncommenting this will show you what the dynamic SQL generated from the above looks like
EXEC(@myTableData) -- execute the dynamic SQL we just generated, to create the INSERT statements
SELECT @myTempTableName + ': SAVE THIS RESULT, ' AS [Example of how the data is changed],
'the iRow of the ' + @myTempTableName + ' table' AS [.],
' corresponds to the row of this table' AS [..]
SELECT * FROM ##myTemp
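To see what the commented-out test columns in STEP TWO do in isolation, here is a cut-down sketch. The table `#ClaimDemo` and its sample rows are invented for illustration; the expressions follow the script's commented-out lines, with only the year shift shown. Because CHECKSUM is deterministic, the same input always maps to the same fake value, so the shift is repeatable rather than truly random, and CHECKSUM is a simple hash, not a security feature.

```sql
DECLARE @shiftYearsLower int, @shiftYearsUpper int;
SET @shiftYearsLower = 1;   -- lower bound of years added to dates
SET @shiftYearsUpper = 3;   -- upper bound of years added to dates

-- Hypothetical sample data standing in for the member/claim join
CREATE TABLE #ClaimDemo (fullname varchar(50), startdate datetime);
INSERT INTO #ClaimDemo (fullname, startdate) VALUES ('Doe, Jane', '20080115');
INSERT INTO #ClaimDemo (fullname, startdate) VALUES ('Roe, Richard', '20080302');

SELECT fullname,
       -- replace the name with 'member' plus a hash-derived number
       'member' + CONVERT(varchar(25), ABS(CHECKSUM(fullname))) AS testMember,
       startdate,
       -- shift the date by a hash-derived number of years within the bounds
       DATEADD(yyyy,
               ABS(CHECKSUM(startdate)) % (@shiftYearsUpper - @shiftYearsLower + 1) + @shiftYearsLower,
               startdate) AS testStartdate
FROM #ClaimDemo;

DROP TABLE #ClaimDemo;
```

Setting both bounds to the same value adds a fixed number of years to every date, which keeps the intervals between dates intact.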
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."