Find all columns that might contain SSN numbers

  • First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!

    :cool:

    "There are no problems! Only solutions that have yet to be discovered!"

  • bopeavy (11/29/2012)


    First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!

    There is no way to write a script or something for this. You are just going to have to look in every table and see what is there. You could maybe look at searching all columns in all tables but even then it is impossible to know if the values are in fact SSNs.

    Here is a VERY nasty script I wrote years ago for this type of searching. I will stress NOT to run this in production. It is hideously slow, consider that it looks at every single column and row in every single table it is going to be slow no matter what you do. I would bet that with a little tweaking you could use this to help.

    declare @table_name varchar(2000)

    declare @sSQL nvarchar(4000)

    declare @result varchar(20)

    declare @column_name varchar(2000)

    declare @SearchVal varchar(200)

    set @SearchVal = '%your search val here%'

    declare @ColName varchar (250)

    set @ColName = '%use this if you want to limit to a naming convention on the columns to search (i.e. email)%'

    declare SearchList cursor for

    select distinct so.name,sc.name from syscolumns sc

    inner join sysobjects so on sc.id = so.id

    where sc.name like @ColName

    and so.type = 'U'

    open SearchList

    fetch next from SearchList into @table_name, @column_name

    while(@@fetch_status = 0)

    begin

    select @sSQL = 'if exists (select ''' + @table_name + ''' as TableName,' + @column_name + ' from ' + @table_name + ' where ' + @column_name + ' like ''' + @SearchVal + ''') select ''' + @table_name + ''' as TableName,' + @column_name + ' from ' + @table_name + ' where ' + @column_name + ' like ''' + @SearchVal + ''''

    exec sp_executesql @sSQL

    --select @ssql

    fetch next from SearchList into @table_name, @column_name

    end

    close SearchList

    deallocate SearchList

    Another long time poster Lowell has posted the following multiple times on this site. His does a few things differently than mine.

    The same caveat of performance extends with his.

    CREATE PROCEDURE sp_UGLYSEARCH

    /*

    --Purpose: to search every string column in a databasefor a specific word

    --returns sql statement as a string which idnetifies the matching table

    -- or when the optional parameter is used, the sql statement for the specific matching column.

    --usage:

    -- EXEC sp_UGLYSEARCH 'Provisional'

    -- EXEC sp_UGLYSEARCH 'TEST'

    -- creates one SQL for each table that actually has a match for the searched value i.e.

    -- SELECT * FROM [ACACTSCR] WHERE [DESCRIP] LIKE '%TEST%' OR [TITLE] LIKE '%TEST%'

    --optional parameter SEARCHBYCOLUMN

    -- EXEC sp_UGLYSEARCH 'TEST',1

    -- creates one SQL for each Column that actually has a match for the searched value i.e.

    -- SELECT * FROM [dbo].[ACACTSCR] WHERE [DESCRIP] LIKE '%TEST%'

    -- SELECT * FROM [dbo].[ACACTSCR] WHERE [TITLE] LIKE '%TEST%'

    */

    @SEARCHSTRING VARCHAR(50),

    @SEARCHBYCOLUMN INT = 0

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE @SQL VARCHAR(max),

    @SCHEMANAME VARCHAR(100),

    @TABLENAME VARCHAR(100),

    @COLUMNNAME VARCHAR(100),

    @COLZ VARCHAR(max)

    CREATE TABLE #RESULTS(SCHEMANAME VARCHAR(100), TBLNAME VARCHAR(100),COLNAME VARCHAR(100),SQL VARCHAR(max))

    SELECT

    SCHEMA_NAME(schema_id) AS SCHEMANAME,

    objz.name AS TBLNAME,

    colz.name AS COLNAME,

    TYPE_NAME(colz.user_type_id) AS DATATYPE

    INTO #TEMP

    FROM sys.objects objz

    INNER JOIN sys.columns colz ON objz.object_id = colz.object_id

    WHERE objz.type='U'

    AND TYPE_NAME(colz.user_type_id) IN ('VARCHAR','NVARCHAR','CHAR','NCHAR')

    AND colz.max_length >= LEN(@SEARCHSTRING) --smart: don't search varchar(1) columns for 'TEST' 4xmpl

    ORDER BY TBLNAME,COLNAME

    IF @SEARCHBYCOLUMN = 0

    BEGIN

    DECLARE C1 CURSOR FOR

    SELECT SCHEMANAME,TBLNAME,COLNAME FROM #TEMP ORDER BY SCHEMANAME,TBLNAME,COLNAME

    OPEN C1

    FETCH NEXT FROM C1 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME

    WHILE @@FETCH_STATUS <> -1

    BEGIN

    SET @COLZ=''

    SELECT @COLZ = @COLZ + QUOTENAME(COLNAME) + ' LIKE ''%' + @SEARCHSTRING + '%'' OR ' FROM #TEMP WHERE TBLNAME=@TABLENAME

    --@COLZ has a trailing 'OR ' which must be removed

    SET @COLZ = SUBSTRING(@COLZ,1,LEN(@COLZ) -3)

    --PRINT @COLZ

    SET @SQL = 'IF EXISTS(SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + @COLZ + ') INSERT INTO #RESULTS(TBLNAME,COLNAME,SQL) VALUES(''' + @TABLENAME + ''',''-'','' SELECT * FROM ' + QUOTENAME(@TABLENAME) + ' WHERE ' + REPLACE(@COLZ,'''','''''') + ''') ;'

    --PRINT @SQL

    EXEC (@SQL)

    FETCH NEXT FROM C1 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME

    END

    CLOSE C1

    DEALLOCATE C1

    END

    ELSE --@SEARCHBYCOLUMN <> 0

    BEGIN

    DECLARE C2 CURSOR FOR

    SELECT SCHEMANAME,TBLNAME,COLNAME FROM #TEMP ORDER BY SCHEMANAME,TBLNAME,COLNAME

    OPEN C2

    FETCH NEXT FROM C2 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME

    WHILE @@FETCH_STATUS <> -1

    BEGIN

    SET @SQL = 'IF EXISTS(SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + QUOTENAME(@COLUMNNAME) + ' LIKE ''%' + @SEARCHSTRING + '%'')

    INSERT INTO #RESULTS(SCHEMANAME,TBLNAME,COLNAME,SQL) VALUES(''' + @SCHEMANAME + ''',''' + @TABLENAME + ''',''' + @COLUMNNAME + ''',''

    SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + QUOTENAME(@COLUMNNAME) + ' LIKE ''''%' + @SEARCHSTRING + '%'''' '') ;'

    PRINT @SQL

    EXEC (@SQL)

    FETCH NEXT FROM C2 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME

    END

    CLOSE C2

    DEALLOCATE C2

    END --@SEARCHBYCOLUMN <> 0

    SELECT * FROM #RESULTS ORDER BY TBLNAME,COLNAME

    END --PROC

    GO

    SELECT * FROM #RESULTS ORDER BY TBLNAME,COLNAME

    END --PROC

    GO

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Grab paddle, proceed up creek.

    There's really no way to do what you're looking for, primarily because of the part that's frustrating you, the lack of a dedicated pattern.

    You could, in theory, search every column in every table for anything fitting a pattern and then use that to reduce your search (like '___-__-____' for example) but that's pretty heavy handed and would require a lot of dynamic SQL coding.

    My personal approach would be slightly different. Grab a dev copy of the Vendor app, and trace the db for proc and adhoc SQL calls. Then have someone familiar with the app go to everywhere that a SSN can be plugged in and use the exact same one, then search the trace for that pattern to locate all the ways it can be dropped off.

    Not pretty, but less painful then an end to end schema review.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • ugg that is tough; you might even have columns that don't store the the number as a varchar without dashes 9and leave the presentation layer to dash-ify it, or even as a 9 digit integer.

    i'd stick with a top down approach, searching column names and look for SSN,EIN,Taxpayer, etc and try to narrow it down that way, instead of looking at the data first.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I'd wait to check numeric columns, because it's gonna be even more difficult to determine if the values are SSNs.

    As to [n][var]char, you do have some restrictions that help:

    1) column length must be at least 9 (and maybe no more than, say, 30 ??)

    2) pattern should be ######### | ###-##-#### | ###-######(??) [where # is 0-9, of course]

    3) the SSN is the only (or at least leading) value in the column (??)

    4) I'd suggest when you check a table, to check ALL columns in that table at once. I don't see the point of checking them separately (unless it was a limited number and they were all indexed, but that's gonna be difficult to determine dynamically too). It shouldn't be too difficult to generate that code. Of course it could run a while, depending on the size of your tables.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • This looks strangely familiar to a product I ran across some years back. Unfortunately I can't remember the name of it. Too long ago or I'm getting old or something.

    I think I may have some documentation or some such in my archives but I'm not sure what to search on.

    Perhaps if you gave us the name of the product and vendor, I might be able to dig something up.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • bopeavy (11/29/2012)


    First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!

    I appreciate you trying to fix this severe violation of multiple privacy laws but I believe my next step would be to advise the vendor that you're going to report them to the SEC and Social Security Administration for storing SSNs and EIDs in plain text... and then do it. If you want, tell me who the vendor is and what the product is an I'll report the buggers for ya! Vendors like this have to be stopped cold.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply