November 29, 2012 at 12:34 pm
First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!
November 29, 2012 at 12:44 pm
bopeavy (11/29/2012)
First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!
There is no way to write a script or something for this. You are just going to have to look in every table and see what is there. You could maybe look at searching all columns in all tables but even then it is impossible to know if the values are in fact SSNs.
Here is a VERY nasty script I wrote years ago for this type of searching. I will stress NOT to run this in production. It is hideously slow, consider that it looks at every single column and row in every single table it is going to be slow no matter what you do. I would bet that with a little tweaking you could use this to help.
declare @table_name varchar(2000)
declare @sSQL nvarchar(4000)
declare @result varchar(20)
declare @column_name varchar(2000)
declare @SearchVal varchar(200)
set @SearchVal = '%your search val here%'
declare @ColName varchar (250)
set @ColName = '%use this if you want to limit to a naming convention on the columns to search (i.e. email)%'
declare SearchList cursor for
select distinct so.name,sc.name from syscolumns sc
inner join sysobjects so on sc.id = so.id
where sc.name like @ColName
and so.type = 'U'
open SearchList
fetch next from SearchList into @table_name, @column_name
while(@@fetch_status = 0)
begin
select @sSQL = 'if exists (select ''' + @table_name + ''' as TableName,' + @column_name + ' from ' + @table_name + ' where ' + @column_name + ' like ''' + @SearchVal + ''') select ''' + @table_name + ''' as TableName,' + @column_name + ' from ' + @table_name + ' where ' + @column_name + ' like ''' + @SearchVal + ''''
exec sp_executesql @sSQL
--select @ssql
fetch next from SearchList into @table_name, @column_name
end
close SearchList
deallocate SearchList
Another long time poster Lowell has posted the following multiple times on this site. His does a few things differently than mine.
The same caveat of performance extends with his.
CREATE PROCEDURE sp_UGLYSEARCH
/*
--Purpose: to search every string column in a databasefor a specific word
--returns sql statement as a string which idnetifies the matching table
-- or when the optional parameter is used, the sql statement for the specific matching column.
--usage:
-- EXEC sp_UGLYSEARCH 'Provisional'
-- EXEC sp_UGLYSEARCH 'TEST'
-- creates one SQL for each table that actually has a match for the searched value i.e.
-- SELECT * FROM [ACACTSCR] WHERE [DESCRIP] LIKE '%TEST%' OR [TITLE] LIKE '%TEST%'
--optional parameter SEARCHBYCOLUMN
-- EXEC sp_UGLYSEARCH 'TEST',1
-- creates one SQL for each Column that actually has a match for the searched value i.e.
-- SELECT * FROM [dbo].[ACACTSCR] WHERE [DESCRIP] LIKE '%TEST%'
-- SELECT * FROM [dbo].[ACACTSCR] WHERE [TITLE] LIKE '%TEST%'
*/
@SEARCHSTRING VARCHAR(50),
@SEARCHBYCOLUMN INT = 0
AS
BEGIN
SET NOCOUNT ON
DECLARE @sql VARCHAR(max),
@SCHEMANAME VARCHAR(100),
@TABLENAME VARCHAR(100),
@COLUMNNAME VARCHAR(100),
@COLZ VARCHAR(max)
CREATE TABLE #RESULTS(SCHEMANAME VARCHAR(100), TBLNAME VARCHAR(100),COLNAME VARCHAR(100),SQL VARCHAR(max))
SELECT
SCHEMA_NAME(schema_id) AS SCHEMANAME,
objz.name AS TBLNAME,
colz.name AS COLNAME,
TYPE_NAME(colz.user_type_id) AS DATATYPE
INTO #TEMP
FROM sys.objects objz
INNER JOIN sys.columns colz ON objz.object_id = colz.object_id
WHERE objz.type='U'
AND TYPE_NAME(colz.user_type_id) IN ('VARCHAR','NVARCHAR','CHAR','NCHAR')
AND colz.max_length >= LEN(@SEARCHSTRING) --smart: don't search varchar(1) columns for 'TEST' 4xmpl
ORDER BY TBLNAME,COLNAME
IF @SEARCHBYCOLUMN = 0
BEGIN
DECLARE C1 CURSOR FOR
SELECT SCHEMANAME,TBLNAME,COLNAME FROM #TEMP ORDER BY SCHEMANAME,TBLNAME,COLNAME
OPEN C1
FETCH NEXT FROM C1 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME
WHILE @@FETCH_STATUS <> -1
BEGIN
SET @COLZ=''
SELECT @COLZ = @COLZ + QUOTENAME(COLNAME) + ' LIKE ''%' + @SEARCHSTRING + '%'' OR ' FROM #TEMP WHERE TBLNAME=@TABLENAME
--@COLZ has a trailing 'OR ' which must be removed
SET @COLZ = SUBSTRING(@COLZ,1,LEN(@COLZ) -3)
--PRINT @COLZ
SET @sql = 'IF EXISTS(SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + @COLZ + ') INSERT INTO #RESULTS(TBLNAME,COLNAME,SQL) VALUES(''' + @TABLENAME + ''',''-'','' SELECT * FROM ' + QUOTENAME(@TABLENAME) + ' WHERE ' + REPLACE(@COLZ,'''','''''') + ''') ;'
--PRINT @sql
EXEC (@SQL)
FETCH NEXT FROM C1 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME
END
CLOSE C1
DEALLOCATE C1
END
ELSE --@SEARCHBYCOLUMN <> 0
BEGIN
DECLARE C2 CURSOR FOR
SELECT SCHEMANAME,TBLNAME,COLNAME FROM #TEMP ORDER BY SCHEMANAME,TBLNAME,COLNAME
OPEN C2
FETCH NEXT FROM C2 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME
WHILE @@FETCH_STATUS <> -1
BEGIN
SET @sql = 'IF EXISTS(SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + QUOTENAME(@COLUMNNAME) + ' LIKE ''%' + @SEARCHSTRING + '%'')
INSERT INTO #RESULTS(SCHEMANAME,TBLNAME,COLNAME,SQL) VALUES(''' + @SCHEMANAME + ''',''' + @TABLENAME + ''',''' + @COLUMNNAME + ''',''
SELECT * FROM ' + QUOTENAME(@SCHEMANAME) + '.' + QUOTENAME(@TABLENAME) + ' WHERE ' + QUOTENAME(@COLUMNNAME) + ' LIKE ''''%' + @SEARCHSTRING + '%'''' '') ;'
PRINT @sql
EXEC (@SQL)
FETCH NEXT FROM C2 INTO @SCHEMANAME,@TABLENAME,@COLUMNNAME
END
CLOSE C2
DEALLOCATE C2
END --@SEARCHBYCOLUMN <> 0
SELECT * FROM #RESULTS ORDER BY TBLNAME,COLNAME
END --PROC
GO
SELECT * FROM #RESULTS ORDER BY TBLNAME,COLNAME
END --PROC
GO
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
November 29, 2012 at 12:46 pm
Grab paddle, proceed up creek.
There's really no way to do what you're looking for, primarily because of the part that's frustrating you, the lack of a dedicated pattern.
You could, in theory, search every column in every table for anything fitting a pattern and then use that to reduce your search (like '___-__-____' for example) but that's pretty heavy handed and would require a lot of dynamic SQL coding.
My personal approach would be slightly different. Grab a dev copy of the Vendor app, and trace the db for proc and adhoc SQL calls. Then have someone familiar with the app go to everywhere that a SSN can be plugged in and use the exact same one, then search the trace for that pattern to locate all the ways it can be dropped off.
Not pretty, but less painful then an end to end schema review.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 29, 2012 at 1:27 pm
ugg that is tough; you might even have columns that don't store the the number as a varchar without dashes 9and leave the presentation layer to dash-ify it, or even as a 9 digit integer.
i'd stick with a top down approach, searching column names and look for SSN,EIN,Taxpayer, etc and try to narrow it down that way, instead of looking at the data first.
Lowell
November 29, 2012 at 3:04 pm
I'd wait to check numeric columns, because it's gonna be even more difficult to determine if the values are SSNs.
As to [n][var]char, you do have some restrictions that help:
1) column length must be at least 9 (and maybe no more than, say, 30 ??)
2) pattern should be ######### | ###-##-#### | ###-######(??) [where # is 0-9, of course]
3) the SSN is the only (or at least leading) value in the column (??)
4) I'd suggest when you check a table, to check ALL columns in that table at once. I don't see the point of checking them separately (unless it was a limited number and they were all indexed, but that's gonna be difficult to determine dynamically too). It shouldn't be too difficult to generate that code. Of course it could run a while, depending on the size of your tables.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 30, 2012 at 2:18 am
This looks strangely familiar to a product I ran across some years back. Unfortunately I can't remember the name of it. Too long ago or I'm getting old or something.
I think I may have some documentation or some such in my archives but I'm not sure what to search on.
Perhaps if you gave us the name of the product and vendor, I might be able to dig something up.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
November 30, 2012 at 9:12 am
bopeavy (11/29/2012)
First I'm dealing with a vender product who does not know their own product. With that said i have to try and scramble this type of info no problem. My problem is finding all of those columns in every table that might contain one. Now there is no method to how they store them xxx-xx-xxxx, xxxxxxxxx, xxx-xxxxxx and so on it also allows nulls. So i am a bit stupped at finding a way to locate all the columns. They also store as a number or varchar/nvarchar. I hope i have explained enough if not let me know i will try and give more information, Thanks in advance!
I appreciate you trying to fix this severe violation of multiple privacy laws but I believe my next step would be to advise the vendor that you're going to report them to the SEC and Social Security Administration for storing SSNs and EIDs in plain text... and then do it. If you want, tell me who the vendor is and what the product is an I'll report the buggers for ya! Vendors like this have to be stopped cold.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply