Is there a better way to write this query?

  • Management wants a report of columns that are personal data, so that we can make sure we are protecting that data from misuse. I need to find columns within the entire database that might possibly contain personal data.

    I have a working query, but is there a better way to write this?

    SELECT 'DOB' AS NPI, o.name AS TableName, c.name AS ColumnName FROM sys.columns c JOIN sys.objects o ON c.object_ID = o.object_id WHERE o.type = 'u' AND (c.name LIKE '%DOB%' OR c.name like '%birth%')

    UNION

    SELECT 'Email' AS NPI, o.name AS TableName, c.name AS ColumnName FROM sys.columns c JOIN sys.objects o ON c.object_ID = o.object_id WHERE o.type = 'u' AND Replace(c.name,'_','') LIKE '%email%'

    UNION

    SELECT 'Address' AS NPI, o.name AS TableName, c.name AS ColumnName FROM sys.columns c JOIN sys.objects o ON c.object_ID = o.object_id WHERE o.type = 'u' AND c.name like '%address%' AND Replace(c.name,'_','') not like '%emailaddress%'

    UNION

    SELECT 'Zip' AS NPI, o.name AS TableName, c.name AS ColumnName FROM sys.columns c JOIN sys.objects o ON c.object_ID = o.object_id WHERE o.type = 'u' AND c.name LIKE '%ZIP%' AND c.name not LIKE '%suffix%' and c.name not LIKE '%sfx%'

    UNION

    SELECT 'LastName' AS NPI, o.name AS TableName, c.name AS ColumnName FROM sys.columns c JOIN sys.objects o ON c.object_ID = o.object_id WHERE o.type = 'u' AND (Replace(c.name,'_','') LIKE '%lastname%' or Replace(c.name,'_','') like '%lname%')

    BTW - Yes, I realize I do not have SSN in my check. We do not collect SSN.

  • 😉

    Kindly used sql code tag for writing any sql query.

    And also post same data for batter view.

  • Marcia Q (3/26/2010)


    Management wants a report of columns that are personal data, so that we can make sure we are protecting that data from misuse. I need to find columns within the entire database that might possibly contain personal data.

    I have a working query, but is there a better way to write this?

    ...

    Well, here's the query that I would write:

    To all Application managers and Development Leads:

    Please be aware that as part of GloboCorp's Data Protection Initiative,

    we require the identification (application name, database, table and column

    name) of all production database columns that may hold personal user,

    employee, customer or vendor information. It is of highest important that

    we receive this information as soon as possible as without it we cannot

    guarantee the protection of this personal information within your application

    and it may even put GloboCorp in jeopardy of violating the applicable legal

    regulations and potentially subject to corresponding penalties.

    sincerely, Your database Team

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • A bit tongue in cheek perhaps, but my point is serious: database administration should not be doing something like this own their own and independent of the Applications or the business lines. Their is just no way that any number of uninformed SQL Queries can reliably determine this without guidance (or really good documentation) from the applications. You will miss things and then you could well be liable...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I agree... something like an SSN could be couched in a column called GlobalReferenceNumber, etc. There's too much of a chance of missing something. Make the creators and users of the data identify the personal data by server, instance, DB, table, and column name. They should also identify files that they may be importing, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Vijay - ah yes, I should have used sql coding. My apologies. Nothing special with the data for this since these queries just run against the system tables. It can be run against AdventureWorks or any handy db.

    Jeff/Barry - thanks for the warning. There is waaay more analysis/effort going into the project than what I wrote, including work by application and business teams. Sorry to mislead, but I was just trying to keep my post short. This is not an enterprise wide activity. All of our other databases have already been thoroughly analyzed (ad nauseum), fought over, documented, locked down, etc. We will be acquiring this particular db from outside channels soon, and I'm just trying to help the app team with their work.

    All I really wanted to know is if there is a better and/or simpler way to write my query.

    BTW Barry, I got a kick out of your first post.

  • Heh, thanks Marcia. Sometimes we forget that there's more than on kind of query. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply