Mining with Privacy

  • Mining with Privacy

    This is an interesting idea about how data mining can grow in the future and continue to be very useful while still allowing us to maintain some privacy. Instead of anonymizing data by removing names or some limited amount of information, the data is encrypted.

    The example given is one where data submitted to the federal HUD offices was hashed into an encrypted form, so that mining of the data as a whole could take place while the individual records stayed encrypted and could be recovered only if a need was shown.
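
    As a rough illustration of that idea (not the actual HUD scheme, which the example does not describe in detail), here is a minimal Python sketch. It assumes identifiers are replaced with keyed HMAC tokens and that recovery happens through a separately guarded lookup table. The key, the `pseudonymize` helper, the `escrow` table and the sample records are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key for the demo; a real key would live in a key store, not in source.
SECRET_KEY = b"demo-key-held-by-the-data-owner"

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed, deterministic token that stands in for an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Made-up sample records: (identifier, program)
records = [
    ("alice", "housing-voucher"),
    ("bob", "housing-voucher"),
    ("alice", "rental-assistance"),
]

escrow = {}   # token -> identifier, kept separately under strict access control
shared = []   # what the analyst receives: tokens only, no names

for identifier, program in records:
    token = pseudonymize(identifier)
    escrow[token] = identifier
    shared.append((token, program))

# Mining still works on tokens: count how many records each pseudonymous person has.
records_per_person = Counter(token for token, _ in shared)

# Recovery is only possible for someone who is allowed to read the escrow table.
most_active_token, _ = records_per_person.most_common(1)[0]
recovered_identity = escrow[most_active_token]
```

    The same person always maps to the same token, so grouping and counting still work, but the analyst never sees a name unless someone with access to the escrow table looks it up.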

    I think that's a good way to balance the needs of law enforcement and other agencies with the privacy of individuals. And this is probably a good law to pass for corporations as well, maybe even mandating one-way hashes for data purchased from external sources for mining.

    However, there still need to be data controls around who can see and use the data, especially for decryption operations. It will surely place a larger burden on DBAs to ensure that the data is not only well protected while it's stored in a database, but also properly encrypted when it's sent to someone else.

    Maybe this is a good place to implement those new CLR functions and stored procedures I keep reading about 🙂

    Steve Jones

  • It is an interesting idea. The problem with it is that unless different fields like names, social security numbers and some random salt are hashed together, then some idiots out there will simply hash only the SSNs through SHA1 or MD5 or something like that (ROT-13?), and it'll be a simple dictionary search for whoever comes across the data to match hashed values with the values used to generate the hashes.
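
    To make the dictionary search concrete, here is a small Python sketch. The SSNs are made up (the 000 area number is never issued), the search space is kept tiny for the demo, and the helper names are hypothetical. For the protected case it assumes a secret key applied with HMAC, which is a swapped-in choice: a salt that is stored next to the data would not stop someone who has the data from repeating the same search.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Optional

def naive_hash(ssn: str) -> str:
    """Hash only the SSN with SHA-1: the weak approach described above."""
    return hashlib.sha1(ssn.encode("ascii")).hexdigest()

def keyed_hash(ssn: str, name: str, key: bytes) -> str:
    """Hash name and SSN together under a secret key (HMAC-SHA256)."""
    message = f"{name}|{ssn}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def dictionary_attack(
    leaked: str,
    candidates: Iterable[str],
    hash_fn: Callable[[str], str],
) -> Optional[str]:
    """Hash every candidate and return the one that matches, if any."""
    for candidate in candidates:
        if hash_fn(candidate) == leaked:
            return candidate
    return None

# Tiny, made-up search space: 10,000 fake SSNs sharing one prefix.
candidates = [f"000-00-{n:04d}" for n in range(10000)]

# Case 1: SSN hashed alone. Anyone holding the hash can enumerate the space.
leaked = naive_hash("000-00-1234")
recovered = dictionary_attack(leaked, candidates, naive_hash)

# Case 2: name + SSN under a secret key. An attacker guessing a wrong key finds nothing.
real_key = b"secret-key-known-only-to-the-owner"  # hypothetical demo key
protected = keyed_hash("000-00-1234", "Jane Doe", real_key)
blocked = dictionary_attack(
    protected,
    candidates,
    lambda ssn: keyed_hash(ssn, "Jane Doe", b"attacker-guess"),
)
```

    The real SSN space is only about a billion values, so the first case scales up easily on ordinary hardware. In the second case everything depends on the key staying secret.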

    But where do you draw the line between what is personally identifying data (i.e., name, name+SSN) and what is non-identifying aggregating information?

    At some point, commercial or other "legitimate" interests will argue for hashing away as little data as possible, or for hiding it so simplistically that it is relatively trivial to unlock.

    What would probably be more beneficial is if Congress could (yeah, right) pass a law stating that ownership of personal data lies with the person it identifies, with some serious recourse against companies that are irresponsible or negligent with it, much like the laws that ensure the money we put in banks is really ours, however much bankers and investors would like it not to be. It would be nice if we deposited our identifying information (and could thus withdraw it), so that we were granting the information collector the privilege of holding it, not the other way around.

    But that's probably asking a little too much, especially if Experian et al. had anything to say about it.

  • When we have massive government agencies that silently ignore legal restrictions when they are inconvenient, I'm not sure that this will be more than feel-good window dressing.

     

    ...

    -- FORTRAN manual for Xerox Computers --

  • Jay has a good point. As much as I dislike companies digging around in what I do, or sharing my shopping or browsing habits, I far prefer that to the govt doing the same.

    Honestly, the worst a company is out for is the ability to make money. That means either using the information to try to sell me something or, in the absolute worst case, stealing an idea from me. However, if I'm negligent enough to make my idea visible before I'm ready, that's in part my own fault.

    The govt, on the other hand, scares me... they aren't concerned with money so much as with staying in power. And so long as there are politicians, I don't trust them not to get backed into a corner and end up mining information in a desperate search to incriminate whatever side doesn't happen to agree with whoever is cornered. After all, I think all of us have inadvertently violated at least a few of those unenforced silly laws that are sitting out there on the books. A govt with too much information could easily turn the act of disagreeing into a nightmare for 90-some percent of the populace.

    yeh, yeh, I read 1984 one time too many

    </rant>

  • As long as the data can be decrypted, it will be abused or hacked by someone. The only safe way to store it is with identifiers removed. Personal info should be stored only when it is critical, because only then is the risk merited.

    How many places have you worked where security was compromised due to ignorance or budget concerns? For me, it's almost every place that I have worked. Given that, I have no confidence that I can trust my personal information with just about anyone, and particularly not with the government, which farms out data operations to slimy ChoicePoint.


    Karen Gayda
    MCP, MCSD, MCDBA

    gaydaware.com
