Character Scramble function won't accept datetime datatype

  • Greetings all, I am fairly new to SQL Server and new to these forums as I have only posted a couple times. I apologize if this seems a bit elementary to some of you advanced SQLers out there. The code I have listed below is a function that is called in a larger procedure for obfuscating data. The larger procedure reads in values from a "control table", which tells the procedure which columns to obfuscate, and then updates the various tables based on the values it reads from the control table. It basically reads the values from the control table and then curses (using a cursor) for the specified column, when it finds it it, it runs the character_scramble function on the column and updates the column with that value.

    This is a function that I have developed that is called in another procedure. This code takes a string and scrambles it, example: Johnson --> nojhson. The purpose of this is to obfuscate the data because it has been deemed "sensitive". It runs fine in its current state, the only problem is that there are a couple datetime columns that need to be obfuscated and the character_scramble function does not recognize them. I need to figure out a way to read in the datetime objects and scramble them as well. Either that or generate random dates and return them to the procedure, and I'm not quite sure how to accomplish either of those. One note, performance is not a huge issue here because this is being done on a relatively small database. Thanks in advance for the help!

    --drop view dbo.vwRandom;

    -- Create user defined function

    create function [dbo].[character_scramble]

    (

    @originalVal varchar(max)

    )

    returns varchar(max)

    as

    begin

    declare @newVal varchar(max);

    declare @origLen int;

    declare @currLen int;

    declare @loopCt int;

    declare @random int;

    set @newVal = '';

    set @origLen = datalength(@originalVal);

    set @currLen = @origLen;

    set @loopCt = 1;

    -- Loop through the characters passed

    while @loopCt <= @origLen

    begin

    set @currLen = datalength(@originalVal);

    select @random = convert(int,(((1) - @currLen) * RandomValue + @currLen))

    from dbo.vwRandom;

    set @newVal = @newVal + substring(@originalVal,@random,1);

    set @originalVal = Replace(@originalVal,SUBSTRING(@originalVal,@random,1),'');

    set @LoopCt = @LoopCt + 1;

    end

    -- Returns new value

    return lower(@newVal);

    end

    GO

  • How about something like this?

    CREATE VIEW dbo.Rnd

    AS

    SELECT NEWID() AS R ;

    GO

    ALTER FUNCTION dbo.character_scramble

    (@OriginalVal_in VARCHAR(MAX))

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ScrambledVal_out VARCHAR(MAX) ;

    WITH NumbersSeed(Seed)

    AS (SELECT Seed

    FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Val (Seed)),

    Numbers(Number)

    AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)

    FROM NumbersSeed AS S1

    CROSS JOIN NumbersSeed AS S2

    CROSS JOIN NumbersSeed AS S3

    CROSS JOIN NumbersSeed AS S4

    CROSS JOIN NumbersSeed AS S5

    CROSS JOIN NumbersSeed AS S6)

    SELECT @ScrambledVal_out = CASE WHEN ISDATE(@OriginalVal_in) = 1 THEN (SELECT TOP (1)

    CAST(DATEADD(DAY, number, 0) AS VARCHAR(100))

    FROM Numbers

    WHERE Number <= 100000

    ORDER BY (SELECT R

    FROM dbo.Rnd))

    ELSE ( SELECT SUBSTRING(@OriginalVal_in, Number, 1)

    FROM Numbers

    WHERE Number <= LEN(@OriginalVal_in)

    ORDER BY (SELECT R

    FROM dbo.Rnd)

    FOR XML PATH(''),

    TYPE).value('.[1]', 'varchar(max)')

    END ;

    RETURN @ScrambledVal_out ;

    END ;

    GO

    SELECT dbo.character_scramble('bob') ;

    SELECT dbo.character_scramble('supercalifragilisticexpialodocious') ;

    SELECT dbo.character_scramble(GETDATE()) ;

    Can't use NewID() in a UDF, but putting it in a View and then querying the View works just fine. (Old trick I learned when I needed GetDate() in a UDF about 10 years ago.)

    It uses a Numbers table (CTE in this case) to parse out the string, then NewID to randomize the sequence, then For XML to put the string back together again.

    For dates, it selects a random date between Jan 1900 and 100,000 days after that (November of year 4637).

    Does that do what you need?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you for your response. I ran it and it seems to work great thanks! ..... That being said, it doesn't seem to be "jiving" with my procedure that the character_scramble function is called in.

    One more question. When I call the function in my procedure I call it through the use of dynamic sql like this: @sql = ' dba.dbo.character_scramble(' + @column_name + ')'. In this function call @column_name is a parameter pulled from the "control table" (mentioned in the earlier post). The @column_name is a string value, but as you can see it is stored in a variable. My procedure isn't working correctly when I call the character_scramble function this way. Will this version of the character_scramble work in a situation like I am using it in, with the use of dynamic SQL to call the function?

  • Just curious... Why are you giving access to sensitive information to people who don't need it. If it is scrambled, why not just deny permission to the column? Or, use SQL Server encryption? Seems that would make life easier and actually keep the data safe instead of "maybe" safe.

    Jared
    CE - Microsoft

  • Jared,

    The data is being used in a production environment and the purpose of this data obfuscation is just to prevent any sensitive data from being seen by people that shouldn't see it. This is the type of procedure that was specifically requested from me, encryption is not what they were aiming for. What the procedure does is it curses through all the relevant tables and columns in a specified database and obfuscates them to a scrambled or random value. The reason that I was given for doing the procedure like this was that through the use of a control table this data could be obfuscated on many different databases/schemas/tables/columns in one fell swoop through the use of this procedure (by calling the character_scramble within the procedure and obfuscating a @column_name pulled from the control table).

    Thanks for the responses!!!

  • blampe (7/10/2012)


    Jared,

    The data is being used in a production environment and the purpose of this data obfuscation is just to prevent any sensitive data from being seen by people that shouldn't see it. This is the type of procedure that was specifically requested from me, encryption is not what they were aiming for. What the procedure does is it curses through all the relevant tables and columns in a specified database and obfuscates them to a scrambled or random value. The reason that I was given for doing the procedure like this was that through the use of a control table this data could be obfuscated on many different databases/schemas/tables/columns in one fell swoop through the use of this procedure (by calling the character_scramble within the procedure and obfuscating a @column_name pulled from the control table).

    Thanks for the responses!!!

    Interesting. To each his own. I guess it just doesn't make sense to me to return data to people who shouldn't see it. Why go through all of the trouble of obfuscating it or even returning it at all when it can't be used anyway. Just my 2 cents...

    Jared
    CE - Microsoft

  • Jared, believe me you are not the only person that is asking that same question. If I had to guess, I would say that this is something relatively low priority that the summer intern (me) can handle. This is the specific type of proc my supervisor requested, like you said, "To each his own."

    Also, I want to retract my previous statement about "performance not being an issue." This thing has been running for almost an hour and it is still not finished!!! Is there a way to get rid of those cross joins? I would say that that's robbing a great deal of my performance right there.

  • I wonder how much of a rocket scientist you need to be to figure out that "nojhson" is really "Johnson". 😉

    This type of "obfuscation" will not protect data and should not be used, ever.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff I think this is just being used within the company to protect some of the more sensitive client information from employees within the company that may have "devious agendas". My company deals with a lot of personal health information and while anyone can probably figure out that nohnjos = Johnson it would be pretty difficult to determine that the social security no 987-65-1234 is actually 497-85-2163. While it may not be as effective on the names and other string values, on numbers and dates it works rather well.

    Again, I am new to SQL Server and just trying to come through on a procedure that my boss handed down to me. It is incredibly slow right now and any performance ideas to speed it up would be greatly appreciated. Thanks!

    CREATE FUNCTION [dbo].[character_scramble]

    (@OriginalVal_in VARCHAR(MAX))

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ScrambledVal_out VARCHAR(MAX) ;

    WITH NumbersSeed(Seed)

    AS (SELECT Seed

    FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Val (Seed)),

    Numbers(Number)

    AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)

    FROM NumbersSeed AS S1

    CROSS JOIN NumbersSeed AS S2

    CROSS JOIN NumbersSeed AS S3

    CROSS JOIN NumbersSeed AS S4

    CROSS JOIN NumbersSeed AS S5

    CROSS JOIN NumbersSeed AS S6)

    SELECT @ScrambledVal_out = CASE WHEN ISDATE(@OriginalVal_in) = 1 THEN (SELECT TOP (1)

    CAST(DATEADD(DAY, number, 0) AS VARCHAR(100))

    FROM Numbers

    WHERE Number <= 100000

    ORDER BY (SELECT R

    FROM dbo.Rnd))

    ELSE ( SELECT SUBSTRING(@OriginalVal_in, Number, 1)

    FROM Numbers

    WHERE Number <= LEN(@OriginalVal_in)

    ORDER BY (SELECT R

    FROM dbo.Rnd)

    FOR XML PATH(''),

    TYPE).value('.[1]', 'varchar(max)')

    END ;

    RETURN @ScrambledVal_out ;

    END ;

    GO

  • The problem is that it is a cursor and a function 🙂 Can you give us the stored proc that runs? This will help us to help you. The reality is that this should be done

    1. Without a cursor if at all possible (99 times out of 100 it is possible).

    2. Without a function; i.e. the manipulation should be built into the stored proc or query.

    Get us that info, some ddl for a sample table, and sample data.

    Jared
    CE - Microsoft

  • Jared I know you have commented on one of my other posts about this same procedure so we will see if this code looks familiar, but here is the code for the phi_scrub procedure. I know that cursors are incredibly inefficient, but I believe we had decided in a previous forum on this proc that, in this case, cursors were warranted.

    As far as the "control tables" (mentioned in an earlier post) go, there are two control tables that drive this procedure. One is called encrypt_table and the other is called encrypt_table_columns. encrypt_table has 5 columns where table information is stored, they are: table_id, database_name, sch_name, table_name, and active (a bit field determining whether the table is active or not). In the encrypt_table_columns table there are 3 columns, they are: table_id, column_name, and active (again a bit field determining whether the column is active or not). The cursors then curse through all the tables in a specified database and update the specific column specified based on whether or not they are active. There is also an audit flag that was added to this proc where when the procedure is called you specify either 1 or 0 if you want it to carry out the update. If 1 is specified the proc will just print out a report, i.e. "25 rows available for update in column @column_name of table @database_name.@sch_name.table_name".

    I'm open to any solutions to my performance issues, thanks again for the responses!

    CREATE TABLE [dbo].[encrypt_table_columns](

    [table_id] [int] NULL,

    [column_name] [varchar](50) NULL,

    [active] [bit] NULL

    )

    CREATE TABLE [dbo].[encrypt_table](

    [table_id] [int] IDENTITY(1,1) NOT NULL,

    [database_name] [varchar](50) NULL,

    [sch_name] [varchar](50) NULL,

    [table_name] [varchar](50) NULL,

    [active] [bit] NULL

    )

    create procedure [dbo].[phi_scrub](@audit char(1)) --audit must be passed as 1 for procedure to update the specified table

    as

    declare @table_id int

    declare @db_name varchar(128)

    declare @schema_name varchar(128)

    declare @table_name varchar(128)

    declare @STR varchar(4000)

    declare @aud_flag char(1)

    declare @aud_str varchar(4000)

    declare @row_no int

    set @aud_flag = @audit

    declare table_cur cursor read_only for

    select table_id, database_name, sch_name, table_name

    from DBA.dbo.encrypt_table

    where active = 1

    open table_cur

    fetch next from table_cur into @table_id, @db_name, @schema_name, @table_name

    if @@fetch_status > -1

    while @@fetch_status = 0

    begin

    --start the update statement

    set @STR = 'Update ' + @db_name + '.' + @schema_name + '.' + @table_name + ' SET '

    declare @column_name varchar(50)

    declare @ctr int

    declare col_cur cursor read_only for

    select column_name

    from DBA.dbo.encrypt_table_columns

    where table_id = @table_id and active = 1

    --init ctr

    set @ctr = 1

    open col_cur

    fetch next from col_cur into @column_name

    if @@fetch_status > -1

    while @@fetch_status = 0

    begin

    set @STR = @STR + @column_name + ' = dba.dbo.character_scramble(' + @column_name + '), '

    fetch next from col_cur into @column_name

    end

    else

    print 'empty column cursor! active value must be 1'

    close col_cur

    deallocate col_Cur

    --get rid of the last comma to fix syntax issue

    set @STR = substring(@str,1,Len(@str)-1)

    --include a cr/lf to make print statement readable

    set @STR = @STR + char(13)

    if(@aud_flag = 1)

    begin

    select @aud_str = 'select count(*) from ' + @db_name + '.' + @schema_name + '.' + @table_name

    declare @value table (RCount int)

    insert into @value

    execute (@aud_str)

    select @row_no = Rcount from @value

    print convert(varchar(10),@row_no) + ' rows available for update in column ' + @column_name + ' of table ' + @schema_name + '.' + @table_name

    end

    else

    begin

    Print @STR

    EXEC (@str)

    Print convert(varchar(10),@@rowcount) + ' rows updated for column ' + @column_name + ' of table ' + @schema_name + '.' + @table_name

    end

    fetch next from table_cur into @table_id, @db_name, @schema_name, @table_name

    end

    else

    print 'empty table cursor! active value must be 1'

    close table_cur

    deallocate table_Cur

    GO

  • I can see a use for this kind of thing in generating data for a dev or QA environment. Makes it so you can parallel patterns and volume of production data, without exposing production (sensitive) data to people who aren't bonded for it and that kind of thing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply