Measure Ethical Data Analysis by Outcomes, Not Intent

Note: co-authored with Christine Assaf, originally published in the now-defunct PASS Blog.

In January 2020, Robert Williams was arrested on his front lawn in a suburb of Detroit, Mich., after police scanned state driver’s license photos and matched his face to grainy surveillance camera footage. He was held in police custody for 30 hours on suspicion of stealing five luxury watches months earlier. He was innocent. Amid the resulting lawsuit and public scrutiny, the Detroit Chief of Police admitted in a public meeting in June 2020 that the department’s facial recognition software would misidentify suspects 95-97% of the time.

A 2019 NIST study found larger error rates for darker-skinned samples across 99 different facial recognition software providers, including those used in the Michigan State Police software package.

In a 2017 MIT study, leading facial recognition technologies provided by Microsoft, Face++, and IBM were put to the test on front-facing, portrait-style headshots. Their accuracy was lowest when evaluating darker-skinned faces. For example, in the case of IBM’s Watson, there was a 34.4 percentage point difference in error rate between lighter males and darker females, while 93.6% of Azure Face API’s gender identification failures were on darker-skinned subjects.
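The two kinds of figures quoted above, a gap in error rate between groups and the share of all failures that fall on one group, are both simple to compute once predictions are broken out by subgroup. The following is a minimal sketch in Python, not the method used in the study. The function names, the group labels, and the toy records are all hypothetical and made up purely for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: fraction of that group's records predicted wrongly}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def error_share_by_group(records):
    """Return {group: fraction of ALL errors that fall on that group}."""
    errors = defaultdict(int)
    for group, actual, predicted in records:
        if predicted != actual:
            errors[group] += 1
    total_errors = sum(errors.values())
    if total_errors == 0:
        return {}
    return {group: count / total_errors for group, count in errors.items()}


# Hypothetical toy data, NOT real results: (group, actual label, predicted label)
toy_records = (
    [("group_a", "F", "F")] * 9 + [("group_a", "F", "M")] * 1
    + [("group_b", "F", "F")] * 6 + [("group_b", "F", "M")] * 4
)

rates = error_rates_by_group(toy_records)
# Gap between the groups' error rates, in percentage points
gap_points = (rates["group_b"] - rates["group_a"]) * 100
shares = error_share_by_group(toy_records)

print(rates, round(gap_points, 1), shares)
```

A single overall accuracy number would hide both statistics. In the toy data the combined error rate is 25%, which says nothing about the fact that one group carries most of the failures.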

The reasons for this embarrassing inaccuracy are still being investigated, and speculation ranges from photo contrast levels to unrepresentative data sets used to train the models. Whatever the cause, it is the inaccurate outcomes that are the cause for concern. The potential for a civil rights nightmare is clear and present.

Robert Williams, the Michigan man arrested based on a faulty match, was black. In many Western countries, including the United States, communities of color are more heavily scrutinized by the criminal justice system. There is identifiable discrimination in criminal justice outcomes, including ethnic and gender biases in sentencing, especially when judges have discretion.

Data collection and analysis without regard to the potential for disparate outcomes may reinforce and institutionalize a society’s discriminatory history. Appropriately, Microsoft, IBM, and Amazon announced in June 2020 that they would no longer license their facial recognition platforms for use by law enforcement.

Ethical outcomes don’t care about your intentions. Systems can only be judged by their impact, not by intentions vulnerable to revisionism and disconnected from outcomes. Algorithms, as we in the data community well understand, are not infallible, incorruptible oracles. We know better.

When we presented on this topic in May 2020 at a virtual conference, we were asked somewhat incredulously if we favored regulation of software development.

Frankly, our answer is yes.

Engineers who make bridges need a regulated licensing structure around a Professional Engineer stamp, because we don’t want bridges collapsing. Doctors need board certifications to make sure people get evidence-based medical care. Given failures ranging from the Therac-25 software bug to the Boeing 737 MAX, perhaps we as data professionals specifically, and software developers generally, do need some regulation. Or, at least, ethical commitments to practice.

Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence and a professor at the University of Washington, has proposed just that: a modified version of a medical doctor’s Hippocratic Oath for artificial intelligence practitioners, with this statement at its heart: “I will consider the impact of my work on fairness both in perpetuating historical biases, which is caused by the blind extrapolation from past data to future predictions, and in creating new conditions that increase economic or other inequality.”

There is considerable need for this type of commitment in our quickly evolving use of modern datasets. Colin Allen, a researcher in Cognitive Science and History at the University of Pittsburgh, summarized the need nearly a decade ago: “Just as we can envisage machines with increasing degrees of autonomy from human oversight, we can envisage machines whose controls involve increasing degrees of sensitivity to things that matter ethically. Not perfect machines, to be sure, but better.”
