How Can We Reveal Bias in Computer Algorithms?

A legal scholar and a computer scientist explored how to limit machine learning biases.

Many of us may take for granted that we can create social media profiles using our own names. But two years ago Facebook made headlines when a number of Native American users, including Dana Lone Hill and Lance Browneyes, reportedly were forced to edit their names to gain access. Others apparently faced suspension from the social media platform under suspicion that they had provided fake names.

Facebook has since changed its name permission policies and, presumably, also changed the algorithm underlying the suspensions. During a recent workshop at the University of Pennsylvania Law School, computer scientist Sorelle Friedler highlighted the Facebook controversy as an example of how bias can creep into computer code in ways that designers do not foresee when writing it.

Such encoded biases become an even greater concern when government authorities turn to algorithms for assistance. In March, Friedler, an assistant professor at Haverford College, joined Andrew Selbst, a lawyer and currently a Postdoctoral Scholar at Data & Society Research Institute, to discuss at the Penn Law workshop how legal and scientific tools could be used to overcome transparency and accountability concerns associated with government use of computer algorithms.

Professor Cary Coglianese moderates discussion among the panelists.

Selbst framed the conversation around how authorities could think about regulating machine-learning systems that generate inscrutable decisions, that is, decisions that lack an intuitive rationale. A simple requirement that government provide an explanation does not suffice when dealing with inscrutable systems, Selbst argued. Instead, transparency rules for decisions based on machine learning should require the agency to address three questions: What happened in a given individual case? How does the algorithm’s underlying logic produce its decisions? Why were particular normative choices and assumptions built into the computer model?

To help show how government could make machine learning more accountable, Selbst pointed to private-sector machine learning practices that are already subject to regulations partially meeting his three-part test for an adequate explanation. For example, two federal laws and accompanying regulations seek to prevent discrimination and provide consumers with greater transparency in credit decisions. The policies accomplish these goals by requiring financial institutions to explain to each denied customer why the credit request was rejected. Selbst explained that requiring adverse action notices amounts to answering the “what” part of his framework. The notices, which must include the key reasons for denial, such as insufficient income or a missing record of address, inform customers what happened in their individual cases.

New regulations in the European Union (EU), meanwhile, answer the “how” part of the explanation framework, Selbst said. In an effort to shed light on automated decision-making, the revised EU General Data Protection Regulation requires the company or agency using machine learning to provide affected individuals with “meaningful information about the logic” underlying the algorithm. The new regulations will become effective in May 2018, but Selbst indicated that the EU’s “meaningful information” phrasing would help improve transparency and reveal bias if applied to U.S. agencies’ algorithmic decisions.

Andrew Selbst discusses transparency and accountability concerns with government reliance on machine learning.

Friedler, the computer scientist, explored technological approaches to auditing machine learning systems for bias. She explained that agencies could check algorithms for unintended, biased outcomes by creating a second version of their code and running it on the same inputs to test whether it produces similar outputs. Alternatively, engineers could audit machine-learning systems to understand how much the resulting decisions rely on a single variable. For example, removing income as a factor in a credit decision would reveal how important an applicant’s earnings are to the resulting credit score.
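The single-variable audit described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the applicant data, the `approve` scoring rule, and the choice to neutralize a variable by replacing it with its average value rather than deleting it outright. It is not the method presented at the workshop, only one simple way to estimate how much decisions depend on one input.

```python
from statistics import mean

# Hypothetical applicants: (income in $1000s, years of credit history).
APPLICANTS = [(20, 1), (35, 4), (50, 2), (65, 10), (80, 6), (95, 3)]


def approve(income, history_years):
    """Hypothetical stand-in for the credit model under audit."""
    score = 0.6 * income + 4.0 * history_years
    return score >= 50


def decision_change_rate(applicants, model, column):
    """Share of decisions that flip when one input is replaced by its mean."""
    baseline = [model(*row) for row in applicants]
    neutral = mean(row[column] for row in applicants)
    flipped = 0
    for row, original in zip(applicants, baseline):
        altered = list(row)
        altered[column] = neutral  # neutralize this variable only
        if model(*altered) != original:
            flipped += 1
    return flipped / len(applicants)


income_influence = decision_change_rate(APPLICANTS, approve, column=0)
history_influence = decision_change_rate(APPLICANTS, approve, column=1)
print(f"Decisions changed without income:  {income_influence:.0%}")
print(f"Decisions changed without history: {history_influence:.0%}")
```

In this toy data set, neutralizing income changes a third of the decisions while neutralizing credit history changes none, which suggests the hypothetical model leans mostly on income.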

Another method would allow auditors to uncover indirect influence of a certain variable included in an algorithm, Friedler continued. Sometimes biased outcomes result from indirect—or proxy—variables, she explained, making the indirect audit an important tool for regulators who must consider whether policies disproportionately impact members of protected classes.

Friedler turned to the private sector for an example. When it launched same-day delivery last year, Amazon discovered how strongly customers’ zip codes were tied to their race in some U.S. cities. As a result, the e-commerce company’s decision to roll out the expedited delivery option in a limited number of zip codes meant that majority-African American neighborhoods were excluded from the service in cities like Boston, Chicago, and New York. Auditing algorithms for the influence of proxy variables can help reveal and prevent biased outcomes, Friedler concluded.
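A rough sketch of what a proxy-variable check might look like follows. The records, the zip code labels `Z1` and `Z2`, the group labels, and the rollout rule are all hypothetical, and the two measures used here (service rate by group, and how often a zip code's majority group matches a resident's group) are simple stand-ins rather than the auditing technique Friedler described. The point is only that a decision rule that never looks at group membership can still treat groups differently when one of its inputs predicts that membership.

```python
from collections import Counter, defaultdict

# Hypothetical records: (zip_code, group). The group label is seen only by
# the audit, never by the rollout decision itself.
RESIDENTS = [
    ("Z1", "A"), ("Z1", "A"), ("Z1", "A"), ("Z1", "B"),
    ("Z2", "B"), ("Z2", "B"), ("Z2", "B"), ("Z2", "A"),
]

SERVED_ZIPS = {"Z1"}  # hypothetical rollout based on zip code alone


def service_rate_by_group(residents, served_zips):
    """Fraction of each group living in a zip code that receives service."""
    served = Counter()
    total = Counter()
    for zip_code, group in residents:
        total[group] += 1
        if zip_code in served_zips:
            served[group] += 1
    return {group: served[group] / total[group] for group in total}


def proxy_accuracy(residents):
    """How often guessing a zip code's majority group identifies a resident."""
    by_zip = defaultdict(Counter)
    for zip_code, group in residents:
        by_zip[zip_code][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
    return correct / len(residents)


rates = service_rate_by_group(RESIDENTS, SERVED_ZIPS)
zip_predicts_group = proxy_accuracy(RESIDENTS)
print("Service rate by group:", rates)
print("Zip code identifies group:", zip_predicts_group)
```

Here the zip code identifies a resident's group three times out of four, and the zip-based rollout serves 75 percent of group A but only 25 percent of group B, even though group membership was never an input.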

Friedler and Selbst agreed that machine learning holds great promise to improve government decision-making in many policy areas. But both also emphasized that policymakers must remain vigilant about detecting real, even if unintended, bias that can result from machine learning. Both computer science and legal tools can aid in this effort.

The workshop was the sixth installment of the seven-part Optimizing Government series, which was supported by the Fels Policy Research Initiative. Cary Coglianese, a professor at Penn Law and the Director of the Penn Program on Regulation, moderated the discussion.

This essay is part of a seven-part series, entitled Optimizing Government.
