Masking Data

  • Another tool to look at. They offer a 30-day eval and it looks to be a promising product. I played with it today, and developing the rule sets is straightforward. I'm not sure about their pricing.

    http://www.datamasker.com/

  • Most security products I know are looking for tens of thousands of dollars. A little outrageous from my perspective because the risk isn't high to most people.

  • If you deal with sensitive data, the cost could be huge if a breach or release of that data occurs. The problem is that companies don't want to pay for protection, but wish they had when it does happen. Here are some others I have found.

    http://www.dataguise.com/products.php

    http://www.datamasking.com/

    http://www.applimation.com/products-secure.asp?gclid=CMOp8cbpkZoCFSMeDQod_XxmFg

  • HIPAA = Health Insurance Portability and Accountability Act of 1996

    Under "Wrongful Disclosure of Individually Identifiable Health Information," Section 1177 states that a person who knowingly discloses individually identifiable health information to another person shall be fined not more than $50,000, imprisoned not more than 1 year, or both.

    I thought it was worth a couple of weeks' work to avoid that should my laptop get stolen out of my car! Especially since criminal penalties (should someone decide that I personally gained from getting robbed!!) go up to $250,000 and ten years.

    We shred reports from our obscured database if it's not obvious that it's obscured data, just to prevent misunderstandings. (Several columns have "patient info here" and all phone numbers are 555-1234 in the obscured db.)

  • Camouflage Software also offers a free trial version for evaluation purposes. It features our data discovery and data masking modules.

    Leveraging this evaluation software will allow you to quickly get a sense of where sensitive data resides within the databases you manage, often finding sensitive data where you least expect it. The free evaluation version also highlights how easily this discovered data can be masked to ensure it is protected from unwanted disclosure.

    Visit http://www.datamasking.com to download!

  • Almost every place I have worked:

    - production data is not allowed outside of the production environment

    - pre-production test environment is regarded as a live environment

    - other test environments (dev, integration testing, etc.) have their own sets of data, which are usually benign

    I would say that on most of the projects I have worked on, there would be legal ramifications for using a client's live data. The Data Protection Act would make it illegal for you to divulge personal data to a third party without consent.

    Over and above that, when I have worked in a pre-sales environment, there is usually a "demo" database which has been specifically crafted to have realistic data that does not relate to anyone.

  • CAGreensfelder (4/28/2009)


    HIPAA = Health Insurance Portability and Accountability Act of 1996

    Under "Wrongful Disclosure of Individually Identifiable Health Information," Section 1177 states that a person who knowingly discloses individually identifiable health information to another person shall be fined not more than $50,000, imprisoned not more than 1 year, or both.

    I thought it was worth a couple of weeks' work to avoid that should my laptop get stolen out of my car! Especially since criminal penalties (should someone decide that I personally gained from getting robbed!!) go up to $250,000 and ten years.

    We shred reports from our obscured database if it's not obvious that it's obscured data, just to prevent misunderstandings. (Several columns have "patient info here" and all phone numbers are 555-1234 in the obscured db.)

    I might be misreading your post, but are you saying you actually have a copy of production data on your laptop?

  • It depends on the industry and the type of application. My projects are data-centric, mostly DWH. Once you get into integration across multiple systems, you need to put massive effort into scrambling if you need to do more than just simple masking of certain business attributes / business keys. A relatively small number of profiling attributes might be sufficient to uniquely identify most customers. However, if you start scrambling transactions, it gets increasingly difficult to verify any more complex processing for derived data (aggregates or otherwise). Add to this integration across systems - you need the same scrambling applied to business keys in multiple systems - and you have a pretty big mess. So you usually need very strict and broad regulatory requirements or similar to justify the effort.
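
    One way to get the same scrambling applied to a business key in several systems is a keyed hash, so that each system can mask independently and the keys still join afterwards. The following is only a sketch under assumptions: the secret, the row layouts, and the function name `scramble_key` are hypothetical, and truncating the digest trades readability for a higher collision risk.

```python
import hashlib
import hmac

# Hypothetical secret. In practice it would be kept out of the code and
# out of the lower environments, otherwise the mapping can be recomputed.
SECRET = b"example-only-secret"


def scramble_key(business_key: str, secret: bytes = SECRET, length: int = 12) -> str:
    """Map a business key to a stable pseudonym using HMAC-SHA256."""
    # Normalise first so that formatting differences between systems
    # (case, stray whitespace) do not produce different pseudonyms.
    normalized = business_key.strip().upper()
    digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Two hypothetical source systems that share a customer key.
crm_rows = [{"customer_no": "C-1001", "segment": "Retail"}]
billing_rows = [{"customer_no": "c-1001 ", "amount": 250}]

masked_crm = [{**r, "customer_no": scramble_key(r["customer_no"])} for r in crm_rows]
masked_billing = [{**r, "customer_no": scramble_key(r["customer_no"])} for r in billing_rows]

# The masked keys still join across the two systems.
print(masked_crm[0]["customer_no"] == masked_billing[0]["customer_no"])
```

    This only covers the keys. It does nothing about the profiling attributes mentioned above, which can still identify a customer on their own.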

  • Some 20+ years ago, I was a member of a team developing a CRM system. I was tasked with producing test data for it. We had no production data (and using prod data is a big no-no here in the UK) so I wrote a program to synthesise the test data. It was a lot of nested loops with prime number loop counts, and it generated a database of representative size (which was one of the requirements).

    The advantage was that although you didn't know what results you would get for a particular query, you could accurately predict the size of the result set.

    It didn't take long to write, but it took rather longer to run, as you might imagine.
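
    The original program is not shown, so the following is only a guess at its general shape: nested loops with prime loop counts, where the size of a result set can be worked out from the counts alone. The column names and the specific primes are assumptions.

```python
from itertools import count

# Prime loop counts, as described in the post. The values are assumptions.
REGIONS = 3
PRODUCTS = 5
STATUSES = 7
COPIES = 11


def generate_rows():
    """Build one row per combination of loop values, repeated COPIES times."""
    ids = count(1)
    rows = []
    for region in range(REGIONS):
        for product in range(PRODUCTS):
            for status in range(STATUSES):
                for _ in range(COPIES):
                    rows.append(
                        {
                            "id": next(ids),
                            "region": f"Region {region}",
                            "product": f"Product {product}",
                            "status": f"Status {status}",
                        }
                    )
    return rows


rows = generate_rows()

# Fixing region and status leaves the product and copy loops free, so the
# result set size is PRODUCTS * COPIES regardless of which values are chosen.
matching = [r for r in rows if r["region"] == "Region 1" and r["status"] == "Status 4"]
expected = PRODUCTS * COPIES
print(len(rows), len(matching), expected)
```

    Scaling the counts up gives a database of representative size, at the cost of run time, which matches the experience described above.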

  • We do have SQL Data Generator, which is useful, but more often for masking production data we edit fields to read something like [tablename] [surrogate_key]; e.g. Bates' Farm might become Customer 123 at Address Line 456 or whatever. Simple but adequate for demoing.
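
    As a rough illustration of the [tablename] [surrogate_key] approach, here is a minimal sketch on in-memory rows. The table layouts, column names, and the helper `mask_column` are hypothetical; against a real database this would more likely be an UPDATE statement per column.

```python
def mask_column(rows, column, label, key_column):
    """Return copies of rows with `column` replaced by '<label> <surrogate key>'."""
    return [{**row, column: f"{label} {row[key_column]}"} for row in rows]


# Hypothetical sample rows standing in for production tables.
customers = [
    {"customer_id": 123, "name": "Bates' Farm"},
    {"customer_id": 124, "name": "Hill Dairy"},
]
addresses = [
    {"address_id": 456, "customer_id": 123, "line1": "1 Mill Lane"},
]

masked_customers = mask_column(customers, "name", "Customer", "customer_id")
masked_addresses = mask_column(addresses, "line1", "Address Line", "address_id")

print(masked_customers[0]["name"], "at", masked_addresses[0]["line1"])
```

    Because the replacement is derived from the surrogate key, joins between the tables keep working and each masked value stays unique.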

  • I do know of at least one freeware data obfuscation suite: http://www.wintestgear.com/products/MSSQLDataMask/MSSQLDataMask.html.

    So if you need such a tool (especially if it's for a one-off), this one is definitely worth a look, as it doesn't have any of the silly limitations which make many of the 'free evaluation' versions of commercial tools unusable in practice, even for relatively small databases.

    I have no connection whatever with the author of this; it's just a tool I've used in the past which did all I required of it.

  • We still have this same old problem today.

  • CAGreensfelder (3/7/2008)


    We work with medical data so we CANNOT have copies on our local machines for development work.

    I used to work at a small software developer making software for behavioral healthcare. Very little money there, but as CAGreensfelder said, you don't mess around with HIPAA. We also invested quite a bit of time in scripts that would obfuscate sensitive data but still keep a somewhat representative distribution (I won't repeat the reasons already cited why you want the data to still be representative). Because we would frequently work on copies of customer databases to troubleshoot problems (as with most business software, everyone had piles of customization that caused unique problems), it was pretty much a requirement that we could give the customers this script to run on a backup database before letting us look at it.

    I'm glad I don't have to work with those requirements anymore. It's great that there are third party products, but I wonder if anyone has real-world experience with them, and whether they help that much? I can't see getting around having to do a lot of manual work... in the real world, too many oblique column names, too many free text "note" fields that people can tuck sensitive information into. I suppose you can at least generalize lists of fake names/numbers/addresses, and good software might autodetect some fields, but you probably still have to tweak it.
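
    For the simple case of a name column, one way to obfuscate while keeping the frequency distribution is to map each distinct real value to one fake value. This is a sketch under assumptions, not the script described above: the fake name pool and the function `substitute_consistently` are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical pool of replacement surnames.
FAKE_SURNAMES = ["Alder", "Birch", "Cedar", "Dogwood", "Elm", "Fir"]


def substitute_consistently(values, fake_pool, seed=0):
    """Replace each distinct value with one fake value, keeping frequencies."""
    distinct = sorted(set(values))
    if len(distinct) > len(fake_pool):
        raise ValueError("fake pool is smaller than the number of distinct values")
    # Draw one unique replacement per distinct original value.
    rng = random.Random(seed)
    replacements = rng.sample(fake_pool, len(distinct))
    mapping = dict(zip(distinct, replacements))
    return [mapping[v] for v in values]


surnames = ["Smith", "Jones", "Smith", "Patel", "Smith", "Jones"]
masked = substitute_consistently(surnames, FAKE_SURNAMES)

# Same shape of distribution: one value three times, one twice, one once.
print(sorted(Counter(masked).values()))
```

    Preserving frequencies also means a very common or very rare value may still be guessable, and nothing here touches the free-text "note" fields, which still need manual attention.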

  • Steve Jones - SSC Editor (4/27/2009)


    Most security products I know are looking for tens of thousands of dollars. A little outrageous from my perspective because the risk isn't high to most people.

    Honestly, I suspect that is exactly WHY it is so expensive. Very few people are going to pay for this, so it's a niche product. But the handful of people who DO need it are probably going to need it badly; a lot of businesses, especially those covered by HIPAA, would be put out of business by a significant breach.

    Unfortunately, as others have said, it quickly becomes cheaper just to hire someone to write your own obfuscation script. Even if it takes them weeks (which it probably won't).

    I suppose for some large companies you're basically buying a scapegoat if something goes wrong? If you are sufficiently large and profitable, it is easier to shift liability to an outside entity than to trust (or burden) your own people. This might not indicate a healthy business, but there's plenty of money to be made on the unhealthy practices of others.

  • If you're going to do something like copy production data down to lower (i.e., Dev and QA) environments, then you need a formalized process that is approved and scoped by executive management to start with (what data, when it is refreshed, how it is masked, and who has access), architected with privacy as the primary requirement, and then both QA and executive management should sign off on the end result.

    If you don't have a formalized QA re-load process, then developers will do it in an ad-hoc fashion as needed, and not even a 3rd party masking tool can mitigate all the risks when that happens.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

