July 10, 2012 at 9:19 am
Greetings all, I am fairly new to SQL Server and new to these forums as I have only posted a couple times. I apologize if this seems a bit elementary to some of you advanced SQLers out there. The code I have listed below is a function that is called in a larger procedure for obfuscating data. The larger procedure reads in values from a "control table", which tells the procedure which columns to obfuscate, and then updates the various tables based on the values it reads from the control table. It basically reads the values from the control table and then curses (using a cursor) for the specified column, when it finds it it, it runs the character_scramble function on the column and updates the column with that value.
This is a function that I have developed that is called in another procedure. This code takes a string and scrambles it, example: Johnson --> nojhson. The purpose of this is to obfuscate the data because it has been deemed "sensitive". It runs fine in its current state, the only problem is that there are a couple datetime columns that need to be obfuscated and the character_scramble function does not recognize them. I need to figure out a way to read in the datetime objects and scramble them as well. Either that or generate random dates and return them to the procedure, and I'm not quite sure how to accomplish either of those. One note, performance is not a huge issue here because this is being done on a relatively small database. Thanks in advance for the help!
--drop view dbo.vwRandom;
-- Create user defined function
create function [dbo].[character_scramble]
(
@originalVal varchar(max)
)
returns varchar(max)
as
begin
declare @newVal varchar(max);
declare @origLen int;
declare @currLen int;
declare @loopCt int;
declare @random int;
set @newVal = '';
set @origLen = datalength(@originalVal);
set @currLen = @origLen;
set @loopCt = 1;
-- Loop through the characters passed
while @loopCt <= @origLen
begin
set @currLen = datalength(@originalVal);
select @random = convert(int,(((1) - @currLen) * RandomValue + @currLen))
from dbo.vwRandom;
set @newVal = @newVal + substring(@originalVal,@random,1);
set @originalVal = Replace(@originalVal,SUBSTRING(@originalVal,@random,1),'');
set @LoopCt = @LoopCt + 1;
end
-- Returns new value
return lower(@newVal);
end
GO
July 10, 2012 at 10:46 am
How about something like this?
CREATE VIEW dbo.Rnd
AS
SELECT NEWID() AS R ;
GO
ALTER FUNCTION dbo.character_scramble
(@OriginalVal_in VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @ScrambledVal_out VARCHAR(MAX) ;
WITH NumbersSeed(Seed)
AS (SELECT Seed
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Val (Seed)),
Numbers(Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM NumbersSeed AS S1
CROSS JOIN NumbersSeed AS S2
CROSS JOIN NumbersSeed AS S3
CROSS JOIN NumbersSeed AS S4
CROSS JOIN NumbersSeed AS S5
CROSS JOIN NumbersSeed AS S6)
SELECT @ScrambledVal_out = CASE WHEN ISDATE(@OriginalVal_in) = 1 THEN (SELECT TOP (1)
CAST(DATEADD(DAY, number, 0) AS VARCHAR(100))
FROM Numbers
WHERE Number <= 100000
ORDER BY (SELECT R
FROM dbo.Rnd))
ELSE ( SELECT SUBSTRING(@OriginalVal_in, Number, 1)
FROM Numbers
WHERE Number <= LEN(@OriginalVal_in)
ORDER BY (SELECT R
FROM dbo.Rnd)
FOR XML PATH(''),
TYPE).value('.[1]', 'varchar(max)')
END ;
RETURN @ScrambledVal_out ;
END ;
GO
SELECT dbo.character_scramble('bob') ;
SELECT dbo.character_scramble('supercalifragilisticexpialodocious') ;
SELECT dbo.character_scramble(GETDATE()) ;
Can't use NewID() in a UDF, but putting it in a View and then querying the View works just fine. (Old trick I learned when I needed GetDate() in a UDF about 10 years ago.)
It uses a Numbers table (CTE in this case) to parse out the string, then NewID to randomize the sequence, then For XML to put the string back together again.
For dates, it selects a random date between Jan 1900 and 100,000 days after that (November of year 4637).
Does that do what you need?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 10, 2012 at 1:38 pm
Thank you for your response. I ran it and it seems to work great thanks! ..... That being said, it doesn't seem to be "jiving" with my procedure that the character_scramble function is called in.
One more question. When I call the function in my procedure I call it through the use of dynamic sql like this: @sql = ' dba.dbo.character_scramble(' + @column_name + ')'. In this function call @column_name is a parameter pulled from the "control table" (mentioned in the earlier post). The @column_name is a string value, but as you can see it is stored in a variable. My procedure isn't working correctly when I call the character_scramble function this way. Will this version of the character_scramble work in a situation like I am using it in, with the use of dynamic SQL to call the function?
July 10, 2012 at 2:20 pm
Just curious... Why are you giving access to sensitive information to people who don't need it. If it is scrambled, why not just deny permission to the column? Or, use SQL Server encryption? Seems that would make life easier and actually keep the data safe instead of "maybe" safe.
Jared
CE - Microsoft
July 10, 2012 at 2:32 pm
Jared,
The data is being used in a production environment and the purpose of this data obfuscation is just to prevent any sensitive data from being seen by people that shouldn't see it. This is the type of procedure that was specifically requested from me, encryption is not what they were aiming for. What the procedure does is it curses through all the relevant tables and columns in a specified database and obfuscates them to a scrambled or random value. The reason that I was given for doing the procedure like this was that through the use of a control table this data could be obfuscated on many different databases/schemas/tables/columns in one fell swoop through the use of this procedure (by calling the character_scramble within the procedure and obfuscating a @column_name pulled from the control table).
Thanks for the responses!!!
July 10, 2012 at 2:36 pm
blampe (7/10/2012)
Jared,The data is being used in a production environment and the purpose of this data obfuscation is just to prevent any sensitive data from being seen by people that shouldn't see it. This is the type of procedure that was specifically requested from me, encryption is not what they were aiming for. What the procedure does is it curses through all the relevant tables and columns in a specified database and obfuscates them to a scrambled or random value. The reason that I was given for doing the procedure like this was that through the use of a control table this data could be obfuscated on many different databases/schemas/tables/columns in one fell swoop through the use of this procedure (by calling the character_scramble within the procedure and obfuscating a @column_name pulled from the control table).
Thanks for the responses!!!
Interesting. To each his own. I guess it just doesn't make sense to me to return data to people who shouldn't see it. Why go through all of the trouble of obfuscating it or even returning it at all when it can't be used anyway. Just my 2 cents...
Jared
CE - Microsoft
July 10, 2012 at 3:12 pm
Jared, believe me you are not the only person that is asking that same question. If I had to guess, I would say that this is something relatively low priority that the summer intern (me) can handle. This is the specific type of proc my supervisor requested, like you said, "To each his own."
Also, I want to retract my previous statement about "performance not being an issue." This thing has been running for almost an hour and it is still not finished!!! Is there a way to get rid of those cross joins? I would say that that's robbing a great deal of my performance right there.
July 10, 2012 at 3:58 pm
I wonder how much of a rocket scientist you need to be to figure out that "nojhson" is really "Johnson". 😉
This type of "obfuscation" will not protect data and should not be used, ever.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 11, 2012 at 7:20 am
Jeff I think this is just being used within the company to protect some of the more sensitive client information from employees within the company that may have "devious agendas". My company deals with a lot of personal health information and while anyone can probably figure out that nohnjos = Johnson it would be pretty difficult to determine that the social security no 987-65-1234 is actually 497-85-2163. While it may not be as effective on the names and other string values, on numbers and dates it works rather well.
Again, I am new to SQL Server and just trying to come through on a procedure that my boss handed down to me. It is incredibly slow right now and any performance ideas to speed it up would be greatly appreciated. Thanks!
CREATE FUNCTION [dbo].[character_scramble]
(@OriginalVal_in VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @ScrambledVal_out VARCHAR(MAX) ;
WITH NumbersSeed(Seed)
AS (SELECT Seed
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Val (Seed)),
Numbers(Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM NumbersSeed AS S1
CROSS JOIN NumbersSeed AS S2
CROSS JOIN NumbersSeed AS S3
CROSS JOIN NumbersSeed AS S4
CROSS JOIN NumbersSeed AS S5
CROSS JOIN NumbersSeed AS S6)
SELECT @ScrambledVal_out = CASE WHEN ISDATE(@OriginalVal_in) = 1 THEN (SELECT TOP (1)
CAST(DATEADD(DAY, number, 0) AS VARCHAR(100))
FROM Numbers
WHERE Number <= 100000
ORDER BY (SELECT R
FROM dbo.Rnd))
ELSE ( SELECT SUBSTRING(@OriginalVal_in, Number, 1)
FROM Numbers
WHERE Number <= LEN(@OriginalVal_in)
ORDER BY (SELECT R
FROM dbo.Rnd)
FOR XML PATH(''),
TYPE).value('.[1]', 'varchar(max)')
END ;
RETURN @ScrambledVal_out ;
END ;
GO
July 11, 2012 at 7:27 am
The problem is that it is a cursor and a function 🙂 Can you give us the stored proc that runs? This will help us to help you. The reality is that this should be done
1. Without a cursor if at all possible (99 times out of 100 it is possible).
2. Without a function; i.e. the manipulation should be built into the stored proc or query.
Get us that info, some ddl for a sample table, and sample data.
Jared
CE - Microsoft
July 11, 2012 at 7:52 am
Jared I know you have commented on one of my other posts about this same procedure so we will see if this code looks familiar, but here is the code for the phi_scrub procedure. I know that cursors are incredibly inefficient, but I believe we had decided in a previous forum on this proc that, in this case, cursors were warranted.
As far as the "control tables" (mentioned in an earlier post) go, there are two control tables that drive this procedure. One is called encrypt_table and the other is called encrypt_table_columns. encrypt_table has 5 columns where table information is stored, they are: table_id, database_name, sch_name, table_name, and active (a bit field determining whether the table is active or not). In the encrypt_table_columns table there are 3 columns, they are: table_id, column_name, and active (again a bit field determining whether the column is active or not). The cursors then curse through all the tables in a specified database and update the specific column specified based on whether or not they are active. There is also an audit flag that was added to this proc where when the procedure is called you specify either 1 or 0 if you want it to carry out the update. If 1 is specified the proc will just print out a report, i.e. "25 rows available for update in column @column_name of table @database_name.@sch_name.table_name".
I'm open to any solutions to my performance issues, thanks again for the responses!
CREATE TABLE [dbo].[encrypt_table_columns](
[table_id] [int] NULL,
[column_name] [varchar](50) NULL,
[active] [bit] NULL
)
CREATE TABLE [dbo].[encrypt_table](
[table_id] [int] IDENTITY(1,1) NOT NULL,
[database_name] [varchar](50) NULL,
[sch_name] [varchar](50) NULL,
[table_name] [varchar](50) NULL,
[active] [bit] NULL
)
create procedure [dbo].[phi_scrub](@audit char(1)) --audit must be passed as 1 for procedure to update the specified table
as
declare @table_id int
declare @db_name varchar(128)
declare @schema_name varchar(128)
declare @table_name varchar(128)
declare @STR varchar(4000)
declare @aud_flag char(1)
declare @aud_str varchar(4000)
declare @row_no int
set @aud_flag = @audit
declare table_cur cursor read_only for
select table_id, database_name, sch_name, table_name
from DBA.dbo.encrypt_table
where active = 1
open table_cur
fetch next from table_cur into @table_id, @db_name, @schema_name, @table_name
if @@fetch_status > -1
while @@fetch_status = 0
begin
--start the update statement
set @STR = 'Update ' + @db_name + '.' + @schema_name + '.' + @table_name + ' SET '
declare @column_name varchar(50)
declare @ctr int
declare col_cur cursor read_only for
select column_name
from DBA.dbo.encrypt_table_columns
where table_id = @table_id and active = 1
--init ctr
set @ctr = 1
open col_cur
fetch next from col_cur into @column_name
if @@fetch_status > -1
while @@fetch_status = 0
begin
set @STR = @STR + @column_name + ' = dba.dbo.character_scramble(' + @column_name + '), '
fetch next from col_cur into @column_name
end
else
print 'empty column cursor! active value must be 1'
close col_cur
deallocate col_Cur
--get rid of the last comma to fix syntax issue
set @STR = substring(@str,1,Len(@str)-1)
--include a cr/lf to make print statement readable
if(@aud_flag = 1)
begin
select @aud_str = 'select count(*) from ' + @db_name + '.' + @schema_name + '.' + @table_name
declare @value table (RCount int)
insert into @value
execute (@aud_str)
select @row_no = Rcount from @value
print convert(varchar(10),@row_no) + ' rows available for update in column ' + @column_name + ' of table ' + @schema_name + '.' + @table_name
end
else
begin
Print @STR
EXEC (@str)
Print convert(varchar(10),@@rowcount) + ' rows updated for column ' + @column_name + ' of table ' + @schema_name + '.' + @table_name
end
fetch next from table_cur into @table_id, @db_name, @schema_name, @table_name
end
else
print 'empty table cursor! active value must be 1'
close table_cur
deallocate table_Cur
GO
July 12, 2012 at 6:21 am
I can see a use for this kind of thing in generating data for a dev or QA environment. Makes it so you can parallel patterns and volume of production data, without exposing production (sensitive) data to people who aren't bonded for it and that kind of thing.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply