The Social Impact of Data

  • Comments posted to this topic are about the item The Social Impact of Data

  • "The problem is that humans that program the systems might have some bias. Maybe more disconcerting is that the data used to train systems is likely biased as well."

    Very well said.  This has been my point for years regarding what is commonly called 'Artificial Intelligence'.

    Please pardon my use of terms here, this is not intended to be a SOCIAL analogy but a PHYSIOLOGICAL analogy.  Looking at data is always affected by color blindness.  Two people look at the same data and see entirely different things.

     

     

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • It's incredibly concerning. The most worrying thing is that, having gotten onto some list through bad data matching or a poor algorithm, it's often incredibly hard to get off, because the reason they used that methodology in the first place is that the agency involved is too underfunded and struggling to use human resources to make the link. This makes it doubly hard to get any sensible recourse.

    I read 'Weapons of Math Destruction' a while back, and the book is replete with examples of the bad effects of poor use of algorithms. The market is the great weak point - the book makes the point that nowadays if someone tries to steal your identity, this often lands in your lap and generates an enormous amount of paperwork for you - even though you did nothing, were not involved in any way, and it was the bank's verification mechanisms that came up short. The banks have found that they can save a lot of money and time by simply making this your problem.

    Another very scary angle I heard discussed on a podcast was just how hard it is to get an algorithm to work properly, and the governance implications of this. They discussed Amazon's use of a hire-screening algorithm, which was at first found to be very sexist - the company had hired fewer women, so the algorithm assumed that being a man was a characteristic that would give a better chance of being hired and optimised for that. Amazon was a mature enough company that they recognised this, and spent a lot of money retraining the algorithm several times, eventually giving up and scrapping the whole program. The podcast then made the point that a lot of organisations aren't as deep pocketed or mature as Amazon. Often, after investing a huge amount of time and money creating and training an algorithm, a failure would spell the end of one or more people's careers at that organisation, so there is a huge incentive to sweep any problems under the rug and soldier on. This is exacerbated by the fact that very specific analysis is required to understand failures or weaknesses in the algorithm, something that may not be part of the organisation's culture, particularly given staff turnover over time.

    There clearly need to be special governance and budgeting requirements around the use of algorithms to counter these very dangerous incentives to underestimate effort and overestimate results.

    • This reply was modified 4 years, 5 months ago by  ben.mcintyre.
  • What I've found is that "Data" is always correct.  "Information" derived from the data is not.  And even if the "Data" is skewed, it is still correct but incomplete.

    I love one of my Grandmother's old sayings... "Figures can be made to lie... and liars figure".

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    What I've found is that "Data" is always correct.  "Information" derived from the data is not.  And even if the "Data" is skewed, it is still correct but incomplete.

    I love one of my Grandmother's old sayings... "Figures can be made to lie... and liars figure".

    Jeff, I have to disagree.  I've had experiences that convince me otherwise.  About a year ago my single credit card account was somehow compromised in a central Indiana city on a day I was travelling across Nebraska, which was evident from other charges made at various geographical points.  Essentially, the 'data' indicated I was in two states at the same time, thus obviously invalid.  The compromised data consisted of nearly $1500.00 of charges over about three days before the card company fortunately detected the problem, handled it, and made good on every cent of the fraud.

    That underlying 'data' was presented as 'information' on my statement.  In this situation the 'information' was technically correct in its content, while the underlying 'data' was not valid.  While it correctly recorded a transaction, the content of the transaction was not accurate.

    Our version of the saying was "Figures don't lie but liars figure".  Unfortunately, that is not always the whole picture.

    Seems to me that the real world is that:

    Data may or may not be valid.  Information based on invalid data can NEVER be valid.

     

     

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • skeleton567 wrote:

    Jeff, I have to disagree.  I've had experiences that convince me otherwise.  About a year ago my single credit card account was somehow compromised in a central Indiana city on a day I was travelling across Nebraska, which was evident from other charges made at various geographical points.  Essentially, the 'data' indicated I was in two states at the same time, thus obviously invalid.  The compromised data consisted of nearly $1500.00 of charges over about three days before the card company fortunately detected the problem, handled it, and made good on every cent of the fraud.

    That underlying 'data' was presented as 'information' on my statement.  In this situation the 'information' was technically correct in its content, while the underlying 'data' was not valid.  While it correctly recorded a transaction, the content of the transaction was not accurate.

    Our version of the saying was "Figures don't lie but liars figure".  Unfortunately, that is not always the whole picture.

    Seems to me that the real world is that:

    Data may or may not be valid.  Information based on invalid data can NEVER be valid.

    In the case of your credit card, it actually proves my point.  The "data" was not only absolutely correct, but fairly well obvious.  It said that your credit card was used in Indiana and the rest of the data said that's not normal.  The person using your credit card was a liar.  The people with the data did actually interpret it correctly and so saved you a problem.

    To be sure, though, when humans get involved with interpretation of "BI", it easily becomes a train wreck.   I worked for a large DOD company way back when PCs were a new thing.  I interfaced some very large mainframe data with the PCs on demand to be able to produce some rather important reports that were due each month.  It wasn't my job to analyze the data but you can't help but do so when you have to check the data for correctness.

    I noticed some things in the data and ended up making a summary of the whole business plan... it basically said that it was a "going out of business" plan and that we'd have to lay off about half the people in two years and some number of months.  To make a really long story much shorter, I took my findings up the chain of command all the way to the GM and was told, every step of the way, that "it wasn't your concern" and "you don't have a degree" and "you're not an accountant".

    In spite of all the "company-wide town halls" where they told us how well things were going, they had to lay off half the people in the very month I predicted.

    Some people actually died because of it.  Either they could no longer afford their own medication or someone that lived with them couldn't.  It was totally preventable but liars were figuring just to make themselves look good.

    I've seen it in a lot of other places, as well, and it has driven me to the point where I've deemed "BI" to be mostly an oxymoron.  As with your credit card problem, there are exceptions.

    --Jeff Moden



  • It's not always a wreck. Credit card fraud detection is an example that has gotten very, very good. Not always, not everywhere, and not 100%, but this is a place that has improved.

    There are other examples, plenty of them, where AI and data get really good results. There are also plenty of examples where it goes wrong. Same with BI.

    My point is more that we need to be careful here, and do a better job of analyzing our data, transparently disclosing how it is used, and reacting when there are problems. I don't know that I completely trust private enterprise to do this, nor government. We need some collaboration.

  • Steve Jones - SSC Editor wrote:

    My point is more that we need to be careful here, and do a better job of analyzing our data, transparently disclosing how it is used, and reacting when there are problems. I don't know that I completely trust private enterprise to do this, nor government. We need some collaboration.

    Critical to the needed collaboration is management's willingness to make needed corrections when we discover and report problems.  Collaboration needs to include willingness to correct problems AND to correct historical data where possible, or at least indicate that it may not be entirely accurate.  Preferably this would happen before resulting lawsuits force things.

     

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • ben.mcintyre wrote:

    It's incredibly concerning. The most worrying thing is that, having gotten onto some list through bad data matching or a poor algorithm, it's often incredibly hard to get off, because the reason they used that methodology in the first place is that the agency involved is too underfunded and struggling to use human resources to make the link. This makes it doubly hard to get any sensible recourse.

    I would venture to say that the reason it is so difficult to get off some lists is actually due to the purposeful design of the algorithm.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )
