Anonymisation Confusion

  • Comments posted to this topic are about the item Anonymisation Confusion

  • Thanks to big data, the cat's out of the bag. There is virtually no such thing as anonymous data in usable information, only different degrees.

    Anonymized medical data in Australia

    https://www.zdnet.com/article/re-identification-possible-with-australian-de-identified-medicare-and-pbs-open-data/

    De-anonymizing data in the 2012 presidential election (a long article, written back when deep analysis of people's voting habits was 'cool'):

    https://www.technologyreview.com/s/509026/how-obamas-team-used-big-data-to-rally-voters/

    One brief excerpt:

    Davidsen began negotiating to have research firms repackage their data in a form that would permit the campaign to access the individual histories without violating the cable providers’ privacy standards. Under a $350,000 deal she worked out with one company, Rentrak, the campaign provided a list of persuadable voters and their addresses, derived from its microtargeting models, and the company looked for them in the cable providers’ billing files. When a record matched, Rentrak would issue it a unique household ID that identified viewing data from a single set-top box but masked any personally identifiable information.

    ...campaign had created its own television ratings system, a kind of Nielsen in which the only viewers who mattered were those not yet fully committed to a presidential candidate

    ...

    -- FORTRAN manual for Xerox Computers --

  • jay-h - Wednesday, May 9, 2018 6:41 AM

    Thanks to big data, the cat's out of the bag. There is virtually no such thing as anonymous data in usable information, only different degrees. ...

    "Anonymous" just means that no-one has yet managed to link dataset A (Anonymous) with dataset B (Because we know who you are now)! Can you have useful data that doesn't relate directly to reality? And if it relates to reality in any specific sense, it can almost certainly be de-anonymised at some point, once you have enough data to correlate with it.
    ... I also meant to say... I absolutely agree with you.
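    The "link dataset A with dataset B" point can be made concrete with a toy linkage attack. This is a hypothetical sketch: the records, the field names, and the choice of postcode, birth year, and sex as the columns shared by both datasets are all invented for illustration. It joins an "anonymous" dataset to an identified one on those shared columns and treats any record that matches exactly one person as re-identified.

```python
from collections import defaultdict

# Hypothetical toy data, invented purely for illustration.
# Dataset A: "anonymous" records (names removed, other attributes kept).
DATASET_A = [
    {"postcode": "2000", "birth_year": 1961, "sex": "M", "diagnosis": "angina"},
    {"postcode": "2000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1982, "sex": "F", "diagnosis": "diabetes"},
]

# Dataset B: an identified dataset (e.g. some public list) sharing some attributes.
DATASET_B = [
    {"name": "Alice", "postcode": "2000", "birth_year": 1975, "sex": "F"},
    {"name": "Bob", "postcode": "2000", "birth_year": 1961, "sex": "M"},
    {"name": "Dave", "postcode": "2000", "birth_year": 1961, "sex": "M"},
    {"name": "Erin", "postcode": "3000", "birth_year": 1982, "sex": "F"},
]

# Columns assumed to appear in both datasets (an assumption for this sketch).
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link_key(record):
    """Build the join key from the shared columns."""
    return tuple(record[column] for column in QUASI_IDENTIFIERS)


def link(anonymous, identified):
    """Return {name: diagnosis} for anonymous records matching exactly one person."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[link_key(person)].append(person["name"])

    reidentified = {}
    for record in anonymous:
        candidates = names_by_key.get(link_key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            reidentified[candidates[0]] = record["diagnosis"]
    return reidentified


if __name__ == "__main__":
    print(link(DATASET_A, DATASET_B))
```

    Here the angina record stays ambiguous because two people share its postcode, birth year, and sex, while the other two records are re-identified. Adding another shared column to the key can only split those ambiguous groups further, never merge them, which is one way to read "only different degrees".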

  • jay-h - Wednesday, May 9, 2018 6:41 AM

    Thanks to big data, the cat's out of the bag. There is virtually no such thing as anonymous data in usable information, only different degrees. ...

    ...campaign had created its own television ratings system, a kind of Nielsen in which the only viewers who mattered were those not yet fully committed to a presidential candidate

    The worst part of it isn't that the data could ultimately sway the outcome of an election. I believe that using data analytics to identify a reliable list of "persuadable voters" is simply b--- s---, regardless of the nature or extent of the data. Instead, what concerns me is that this vast hoard of data will be left improperly secured by technically incompetent snake oil salesmen, and then breached by third-party hackers who will leverage it to commit identity or financial fraud.
    https://www.upguard.com/breaches/the-rnc-files

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I agree with you, Steve, at least in principle. And it's one we've recently adopted here. But we're also experiencing a lot of pushback from the users. The first big pushback involves a third-party app we've got. We paid dearly for this app. It has some serious UI/UX issues, one of which is that, for the most part, it doesn't allow users to change anything they save. Of course, since humans are using it, they're going to make mistakes, and it's then incumbent upon a colleague and me to fix the myriad errors entered by users. To do this we have to run one of about 40 different SQL scripts the vendor provided. (Why they didn't bother to fix the app instead of supplying SQL scripts to fix errors is beyond me.) And the supervisor of users requires us to restore production to test so we can first run one of the SQL scripts in test and he can verify that it's OK. Then we do the same thing in production. But there is no way he will work with anonymized data! He has made that abundantly clear and forced the issue. So the best we can do is delete all of the data out of test after this laborious task is completed.

    The second thing is getting users to test changes we make in applications we're writing or enhancements we're making to existing applications. I know from experience, since adopting the idea of anonymizing data, that you-know-what will freeze over before any user will run and test changes. Thus, no feedback. Or I should say, none until deadlines pressure us to release the changes. Then you won't believe how loudly users scream because they don't like what they see. It's a losing situation for us. Honestly, I don't like this at all. I'd like to know if anyone knows of a way out of this path to frustration.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Rod at work - Wednesday, May 9, 2018 10:50 AM

    And the supervisor of users requires us ...

    The second thing is ...

    There's nothing you can do if someone refuses to change and they are in charge. Ultimately I hope that they're proven to be right and no data gets lost.

    What I might do, for both these cases, is get them to agree that there are xx types of data that represent our business. Often we have 10, 20, or 50 "transactions" with our data that are repeated millions of times in a database, meaning there are certain cases we need to ensure the system works with. If you can do that, then anonymize or replace everything and inject those data rows into the test system. That way the dev knows to look for certain accounts to test with, but it's not real data.
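    A minimal sketch of that approach, assuming the data can be handled as simple rows: replace the identifying columns in every copied production row, then append a handful of hand-written rows with recognisable account numbers, one per business case the users agreed on. The column names, the ANON-/TEST- prefixes, and the scenarios below are all hypothetical.

```python
# Hypothetical hand-written rows, one per business case the users signed off on.
# Testers know to look for these TEST- accounts.
KNOWN_SCENARIOS = [
    {"account": "TEST-0001", "name": "Standard order", "email": "standard@example.invalid", "balance": 100.00},
    {"account": "TEST-0002", "name": "Refund", "email": "refund@example.invalid", "balance": -25.50},
    {"account": "TEST-0003", "name": "Zero balance", "email": "zero@example.invalid", "balance": 0.00},
]


def anonymize(rows):
    """Replace the (hypothetical) identifying columns with placeholders; keep the rest."""
    cleaned = []
    for i, row in enumerate(rows, start=1):
        clean = dict(row)  # copy so the source rows are left untouched
        clean["account"] = f"ANON-{i:06d}"
        clean["name"] = f"Customer {i:06d}"
        clean["email"] = f"customer{i:06d}@example.invalid"
        cleaned.append(clean)
    return cleaned


def build_test_set(production_rows):
    """Anonymized copy of production plus the known scenario rows."""
    return anonymize(production_rows) + [dict(row) for row in KNOWN_SCENARIOS]


if __name__ == "__main__":
    production = [
        {"account": "A-991", "name": "Jane Citizen", "email": "jane@example.com", "balance": 42.0},
    ]
    for row in build_test_set(production):
        print(row)
```

    The non-identifying columns (the balance here) are kept so the copied rows still look like real business data; whether that is safe depends on whether those columns can themselves be linked back to a person.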

  • I work in the health/well-being sector. The first rumblings I heard about anonymising data were about three years ago. What I shall call a healthcare provider stored results of tests on one computer. The IT police then decided the data should be anonymous, so the department changed everybody's name to a reference code. Then, on a laptop, they put a lookup database to convert a person to a reference code and vice versa. To me the amusing thing was that both systems had weak passwords that many here would have cracked in under half an hour.

    Part of the problem with health care is that the focus is on the condition and it sometimes forgets the patient. I worry that anonymising data too much could jeopardise outcomes. I have been a cardiac patient for over a year and am awaiting surgery. Having been to numerous clinics, had many tests, and read a number of books on the subject (Dr Google has far too much inaccurate information and misinformation), it is clear that the person is very relevant. The person's age, BMI, cholesterol, whether they smoke, pre-existing conditions, etc. are all totally relevant. Yet elsewhere I have heard of those in charge wanting to delete some pre-existing conditions to increase anonymity; doing this could result in data mining producing nonsense. This in turn could affect outcomes. To me a much more important issue in health care is security!
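    The setup described above (test results keyed by reference code on one computer, the name-to-code lookup on a laptop) is pseudonymisation rather than anonymisation: whoever holds the lookup can reverse it. A minimal sketch, assuming random codes and an in-memory mapping (the class name and the REF- prefix are hypothetical), shows why the lookup needs at least as much protection as the results themselves.

```python
import secrets


class PseudonymRegistry:
    """Hypothetical name <-> reference-code lookup.

    Anyone holding this mapping can undo the anonymisation, so it needs
    protecting at least as well as the results it points to.
    """

    def __init__(self):
        self._code_by_name = {}
        self._name_by_code = {}

    def code_for(self, name):
        """Return the existing code for a person, or issue a new random one."""
        if name not in self._code_by_name:
            code = "REF-" + secrets.token_hex(4).upper()
            while code in self._name_by_code:  # retry on the unlikely collision
                code = "REF-" + secrets.token_hex(4).upper()
            self._code_by_name[name] = code
            self._name_by_code[code] = name
        return self._code_by_name[name]

    def name_for(self, code):
        """Reverse lookup: reference code back to the person."""
        return self._name_by_code[code]


if __name__ == "__main__":
    registry = PseudonymRegistry()  # plays the role of the "laptop" lookup
    code = registry.code_for("Jane Citizen")
    results = [{"patient": code, "test": "ECG"}]  # plays the role of the results computer
    print(results)
    print(registry.name_for(results[0]["patient"]))  # the code reverses trivially
```

    Random codes are used here so the codes themselves reveal nothing about the patient; the weak point is the lookup, which in the post was guarded only by a weak password.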

  • Steve Jones - SSC Editor - Wednesday, May 9, 2018 11:02 AM

    What I might do, for both these cases, is get them to agree that there are xx types of data that represent our business. ...

    Those are good ideas, Steve. Thanks.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Eric M Russell - Wednesday, May 9, 2018 10:05 AM

    I believe that using data analytics to identify a reliable list of "persuadable voters" is simply b--- s---, regardless of the nature or extent of the data. ...

    Totally agree. If people are persuadable by a couple of adverts on Facebook or their mobile phone, is that something we can really do much about? Gullibility is not something we can really legislate against. I live in hope that people understand that if a decision is important, they need to corroborate the evidence, and that it's a good idea not to take it for granted that someone trying to sell you something is telling the truth (which is why I like forums; they are about as neutral as you can get).

    cloudydatablog.net

  • mjh 45389 - Thursday, May 10, 2018 5:49 AM

    ... Yet elsewhere I have heard of those in charge wanting to delete some pre-existing conditions to increase anonymity ... To me a much more important issue in health care is security!

    Again, agreed. I believe simple processes (even with rich data) are the way forward for security, and I tend to think that deleting data is often counterproductive.

    cloudydatablog.net

  • Dalkeith - Monday, May 14, 2018 8:30 AM

    Gullibility is not something we can really legislate against. ...

    Agreed, absolutely. If people are that gullible, they will also believe what is printed in tabloid newspapers (which always have an agenda)...

  • Dalkeith - Monday, May 14, 2018 8:30 AM

    If people are persuadable by a couple of adverts on Facebook or their mobile phone, is that something we can really do much about? ...

    What the "Russians" did with the US election in 2016 was really just a sloppy and much too obvious version of what the DNC, RNC, and corporations have been doing for years in a more methodical and effective way. I don't think any of that stuff influenced the outcome of the election, but what it did do is expose gaping holes in how our IT, social media, and election processes work. If the folks behind the "hack" were really just digital privacy activists who framed the Russian government, seeking to influence the outcome of digital protection legislation here in the US, then hats off to them. Mission accomplished, and bravo.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Monday, May 14, 2018 9:00 AM

    What the "Russians" did with the US election in 2016 was really just a sloppy and much too obvious version of what the DNC, RNC and corporations have been doing for years in a more methodical and effective way. I don't think any of that stuff influenced the outcome of the election, ...

    Considering that the DNC spent over $1B, and the 'Russians' spent at most in the thousands (and apparently they played both sides of the fence), I find it really hard to believe they were a major player.

    ...

    -- FORTRAN manual for Xerox Computers --

  • jay-h - Monday, May 14, 2018 9:19 AM

    Considering that the DNC spent over $1B, and the 'Russians' spent at most in the thousands (and apparently they played both sides of the fence), I find it really hard to believe they were a major player.

    I think that going forward, boilerplate political ad campaigns on social media will just get filtered out by user preference settings. Maybe the aftermath of 2016 will help level the playing field for third-party and independent political candidates who don't have as much money to work with. For example, it costs practically nothing for a candidate to record a no-frills YouTube video where they speak directly to the audience, and that has the potential to be far more influential than even the best professionally produced ads. In 2020, could someone get elected president of the United States on a $10 million budget? Maybe, if they are clever about it, methodical, and know how to push the right issues and psychological buttons.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • jay-h - Monday, May 14, 2018 9:19 AM

    Considering that the DNC spent over $1B, and the 'Russians' spent at most in the thousands (and apparently they played both sides of the fence), I find it really hard to believe they were a major player.

    I doubt this too, but don't ignore the fact that much of that $1B was spent on traditional media, and there was tremendous amplification of non-traditional, online, social media content at much lower cost.

    What's scary isn't that there was a material difference in the end result, but that there could be one as better techniques are used to influence people through new media.
