Excerpted from: Sandra G. Mayson, Bias In, Bias Out, 128 Yale Law Journal 2218 (June, 2019) (284 Footnotes)


“There's software used across the country to predict future criminals. And it's biased against blacks.” So proclaimed an exposé by the news outlet ProPublica in the summer of 2016. The story focused on a particular algorithmic tool, COMPAS, but its ambition and effect was to stir alarm about the ascendance of algorithmic crime prediction overall.

The ProPublica story, Machine Bias, was emblematic of broader trends. The age of algorithms is upon us. Automated prediction programs now make decisions that affect every aspect of our lives. Soon such programs will drive our cars, but for now they shape advertising, credit lending, hiring, policing--just about any governmental or commercial activity that has some predictive component. There is reason for this shift. Algorithmic prediction is profoundly more efficient, and often more accurate, than is human judgment. It eliminates the irrational biases that skew so much of our decision-making. But it has become abundantly clear that machines too can discriminate. Algorithmic prediction has the potential to perpetuate or amplify social inequality, all while maintaining the veneer of high-tech objectivity.

Nowhere is the concern with algorithmic bias more acute than in criminal justice. Over the last five years, criminal justice risk assessment has spread rapidly. In this context, “risk assessment” is shorthand for the actuarial measurement of some defined risk, usually the risk that the person assessed will commit future crime. The concern with future crime is not new; police, prosecutors, judges, probation officers, and parole officers have long been tasked with making subjective determinations of dangerousness. The recent shift is from subjective to actuarial assessment. With the rise of big data and bipartisan ambitions to be smart on crime, algorithmic risk assessment has taken the criminal justice system by storm. It is the linchpin of the bail-reform movement; the cutting edge of policing; and increasingly used in charging, sentencing, and allocating supervision resources.

This development has sparked profound concern about the racial impact of risk assessment. Given that algorithmic crime prediction tends to rely on factors heavily correlated with race, it appears poised to entrench the inexcusable racial disparity so characteristic of our justice system, and to dignify the cultural trope of black criminality with the gloss of science.

Thankfully, we have reached a moment in which the prospect of exacerbating racial disparity in criminal justice is widely understood to be unacceptable. And so, in this context as elsewhere, the prospect of algorithmic discrimination has generated calls for interventions in the predictive process to ensure racial equity. Yet this raises the difficult question of what racial equity looks like. The challenge is that there are many possible metrics of racial equity in statistical prediction, and some of them are mutually exclusive. The law provides no useful guidance about which to prioritize. In the void, data scientists are exploring different statistical measures of equality and different technical methods to achieve them. Legal scholars have also begun to weigh in. Outside the ivory tower, this debate is happening in courts, city-council chambers, and community meetings. The stakes are real. Criminal justice institutions must decide whether to adopt risk-assessment tools and, if so, what measure of equality to demand that those tools fulfill. They are making these decisions even as this Article goes to print.

Among racial-justice advocates engaged in the debate, a few common themes have emerged. The first is a demand that race, and factors that correlate heavily with race, be excluded as input variables for prediction. The second is a call for “algorithmic affirmative action” to equalize adverse predictions across racial lines. To the extent that scholars have grappled with the necessity of prioritizing a particular equality measure, they have mostly urged stakeholders to demand equality in the false-positive and false-negative rates for each racial group, or in the overall rate of adverse predictions across groups (“statistical parity”). Lastly, critics argue that, if algorithmic risk assessment cannot be made meaningfully race neutral, the criminal justice system must reject algorithmic methods altogether.
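
The equality measures named above can be made concrete with a small calculation. What follows is a minimal Python sketch, not drawn from the Article: the `Record` type, the `group_metrics` function, the group labels "A" and "B", and every count in `RECORDS` are hypothetical and invented for illustration. It assumes each group contains at least one person with the outcome and one without.

```python
from collections import namedtuple

# Hypothetical record: group label, whether the tool flagged the person as
# high risk, and whether the predicted outcome (e.g. rearrest) occurred.
Record = namedtuple("Record", ["group", "flagged", "outcome"])


def build(group, flagged, outcome, count):
    """Return `count` identical hypothetical records."""
    return [Record(group, flagged, outcome)] * count


# Invented counts, ten people per group. Not real data.
RECORDS = (
    build("A", True, True, 3) + build("A", False, True, 1)
    + build("A", True, False, 2) + build("A", False, False, 4)
    + build("B", True, True, 1) + build("B", False, True, 1)
    + build("B", True, False, 1) + build("B", False, False, 7)
)


def group_metrics(records, group):
    """Compute three candidate equality measures for one group."""
    rows = [r for r in records if r.group == group]
    negatives = sum(1 for r in rows if not r.outcome)
    positives = sum(1 for r in rows if r.outcome)
    false_pos = sum(1 for r in rows if r.flagged and not r.outcome)
    false_neg = sum(1 for r in rows if not r.flagged and r.outcome)
    flagged = sum(1 for r in rows if r.flagged)
    return {
        # Share of people without the outcome who were flagged anyway.
        "false_positive_rate": false_pos / negatives,
        # Share of people with the outcome who were not flagged.
        "false_negative_rate": false_neg / positives,
        # Share of the whole group flagged ("statistical parity" compares this).
        "adverse_prediction_rate": flagged / len(rows),
    }


if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, group_metrics(RECORDS, name))
```

On these invented counts, group A has the higher false-positive rate and the higher rate of adverse predictions, while group B has the higher false-negative rate. The measures need not move together, which is why a choice among them has to be made.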

This Article contends that these three strategies--colorblindness, efforts to equalize predictive outputs by race, and the rejection of algorithmic methods--are at best inadequate, and at worst counterproductive, because they ignore the real source of the problem: the nature of prediction itself. All prediction functions like a mirror. Its premise is that we can learn from the past because, absent intervention, the future will repeat it. Individual traits that correlated with crime commission in the past will correlate with crime commission in the future. Predictive analysis, in effect, holds a mirror to the past. It distills patterns in past data and interprets them as projections of the future. Algorithmic prediction produces a precise reflection of digital data. Subjective prediction produces a cloudy reflection of anecdotal data. But the nature of the analysis is the same. To predict the future under status quo conditions is simply to project history forward.

Given the nature of prediction, a racially unequal past will necessarily produce racially unequal outputs. To adapt a computer-science idiom, “bias in, bias out.” To be more specific, if the thing that we undertake to predict--say arrest--happened more frequently to black people than to white people in the past data, then a predictive analysis will project it to happen more frequently to black people than to white people in the future. The predicted event, called the target variable, is thus the key to racial disparity in prediction.

The strategies for racial equity that currently dominate the conversation amount to distorting the predictive mirror or tossing it out. Consider input data. If the thing we have undertaken to predict happens more frequently to people of color, an accurate algorithm will predict it more frequently for people of color. Limiting input data cannot eliminate the disparity without compromising the predictive tool. The same is true of algorithmic affirmative action to equalize outputs. Some calls for such interventions are motivated by the well-founded belief that, because of racially disparate law enforcement patterns, arrest rates are racially distorted relative to offending rates for any given category of crime. But unless we know actual offending rates (which we generally do not), reconfiguring the data or algorithm to reflect a statistical scenario we prefer merely distorts the predictive mirror, so that it reflects neither the data nor any demonstrable reality. Along similar lines, calls to equalize adverse predictions across racial lines require an algorithm that forsakes the statistical risk assessment of individuals in favor of risk sorting within racial groups. And wholesale rejection of algorithmic methods rejects the predictive mirror directly.
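
The point about input data can be illustrated with a toy calculation. The Python sketch below is an assumption-laden illustration, not the Article's method or any real tool: the group labels "A" and "B", the single input `has_prior_arrest`, the function names, and all counts in `HISTORY` are hypothetical. The "model" is simply the past rearrest frequency for each value of the input, and it never sees group membership.

```python
from collections import defaultdict


def make_rows(group, has_prior_arrest, rearrested, count):
    """Return `count` identical hypothetical past records."""
    return [(group, has_prior_arrest, rearrested)] * count


# Invented past data, 100 people per group. Not real data.
# Group A: 60 people with a prior arrest, 40 without.
# Group B: 20 people with a prior arrest, 80 without.
HISTORY = (
    make_rows("A", True, True, 30) + make_rows("A", True, False, 30)
    + make_rows("A", False, True, 8) + make_rows("A", False, False, 32)
    + make_rows("B", True, True, 10) + make_rows("B", True, False, 10)
    + make_rows("B", False, True, 16) + make_rows("B", False, False, 64)
)


def fit_rates(history):
    """Past rearrest frequency per input value; group is ignored."""
    counts = defaultdict(lambda: [0, 0])  # input value -> [rearrested, total]
    for _group, prior, rearrested in history:
        counts[prior][0] += int(rearrested)
        counts[prior][1] += 1
    return {value: hits / total for value, (hits, total) in counts.items()}


def mean_predicted_risk(history, rates):
    """Average predicted risk within each group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of risk, count]
    for group, prior, _rearrested in history:
        totals[group][0] += rates[prior]
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


def overall_rate(history):
    """Prediction left once every input is dropped: one rate for everyone."""
    return sum(int(rearrested) for _g, _p, rearrested in history) / len(history)


if __name__ == "__main__":
    rates = fit_rates(HISTORY)
    print("risk by input value:", rates)
    print("mean predicted risk by group:", mean_predicted_risk(HISTORY, rates))
    print("single rate with no inputs:", overall_rate(HISTORY))
```

In this invented data the group-blind model still assigns a higher average risk to group A, because the outcome occurred more often for group A in the past and the remaining input tracks that difference. Dropping the input as well leaves one rate for everyone, which removes the disparity only by removing the tool's ability to distinguish anyone from anyone else.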

This Article's normative claim is that neither distorting the predictive mirror nor tossing it out is the right path forward. If the image in the predictive mirror is jarring, bending it to our liking does not solve the problem. Nor does rejecting algorithmic methods, because there is every reason to expect that subjective prediction entails an equal degree of racial inequality. To reject algorithms in favor of judicial risk assessment is to discard the precise mirror for the cloudy one. It does not eliminate disparity; it merely turns a blind eye.

Actuarial risk assessment, in other words, has revealed the racial inequality inherent in all crime prediction in a racially unequal world, forcing us to confront a much deeper problem than the dangers of a new technology. In making the mechanics of prediction transparent, algorithmic methods have exposed the disparities endemic to all criminal justice risk assessment, subjective and actuarial alike. Tweaking an algorithm or its input data, or even rejecting actuarial methods, will not redress the racial disparities in crime or arrest risk in a racially stratified world.

The inequality exposed by algorithmic risk assessment should instead galvanize a more fundamental rethinking of the way in which the criminal justice system understands and responds to risk. To start, we should be more thoughtful about what we want to learn from the past, and more honest about what we can learn from it. If the risk that really matters is the risk of serious crime, but we have no access to data that fairly represent the incidence of it, then there is no basis for predicting serious crime at all. Nor is it acceptable to resort to predicting some other event, like “any arrest,” that happens to be easier to measure. This lesson has profound implications for all forms of criminal justice risk assessment, both actuarial and subjective.

If the data fairly represent the incidence of serious crime, however, the place to redress racial disparity is not in the measurement of risk, but in the response to it. Risk assessment must reflect the past; it need not dictate the future. The default response to risk could be supportive rather than coercive. In the long term, a supportive response to risk would help to redress the conditions that produce risk in the first place. In the short term, it would mitigate the disparate racial impact of prediction. Counterintuitively, algorithmic assessment could play a valuable role in a system that targets the risky for support rather than for restraint.

This Article makes three core contributions. The first is explanatory. Thus far, the computer-science and statistical literature on algorithmic fairness and the legal literature on criminal justice risk assessment have largely evolved on separate tracks. The Article offers an accessible taxonomy of potential measures of equality in prediction, synthesizing recent work in computer science with legal equality constructs. The second contribution is a descriptive analysis of practical and conceptual problems with strategies to redress predictive inequality that are aimed at algorithmic methods per se, given that all prediction replicates the past. The Article's third contribution is the normative argument that meaningful change will require a more fundamental rethinking of the role of risk in criminal justice.

Although this Article is about criminal justice risk assessment, it also offers a window onto the broader conversation about algorithmic fairness, which is itself a microcosm of perennial debates about the nature of equality. Through a focused case study, the Article aims to contribute to the larger literatures on algorithmic fairness and on competing conceptions of equality in law. The Article's Conclusion draws out some of these larger connections.

A few caveats are in order. First, the Article focuses on racial disparity in prediction, severed from the messy realities of implementation. Megan Stevenson has shown that the vagaries of implementation may affect the treatment of justice-involved people more than a risk-assessment algorithm itself. Still, risk-assessment tools are meant to guide decision-making. To the extent they do, disparities in classification will translate into disparities in outcomes. For that reason, and for the purpose of clarity, this Article focuses on disparities in classification alone.

The second caveat is that this Article speaks of race in the crass terminology of “black” and “white.” This language reduces a deeply fraught and complex social phenomenon to an artificial binary. The Article uses this language in part by necessity, to explain competing metrics of equality with as much clarity as possible, and in part to recognize that the criminal justice system itself tends to deploy this reductive schema. The reader may judge whether this approach is warranted.

It is important to note, though, that much of the Article's analysis generalizes to other minority groups. Although the criminal-legal apparatus has inflicted unique harm on African Americans over the past two hundred years, the data that generate predictions may also include disparities with respect to other groups, and this data will in turn produce predictive inequality. The manifold equality metrics presented in Section I.C apply to any intergroup comparison, as do the trade-offs among them. And there is every reason to be concerned about predictive disparities for other marginalized populations. Melissa Hamilton has recently shown that the very same prediction data set that ProPublica analyzed for black/white disparities manifests even greater disparities between Hispanic and white defendants. As the debate on equality in algorithmic prediction evolves, the analysis here is meant to serve as a template with broader applications.

The Article proceeds in four Parts.

Part I chronicles the recent scholarly and public debate over risk assessment and racial inequality, using the ProPublica saga and a stylized example to illustrate why race-neutral prediction is impossible. It concludes with a taxonomy of potential metrics of predictive equality.

Part II lays out the Article's central conception of prediction as a mirror. For clarity of analysis, it draws an important distinction between two possible sources of racial disparity in prediction: racial distortions in past-crime data relative to crime rates, and a difference in crime rates by race.

Accounting for both, Part III explains why the prescriptions for racial equity that currently dominate the debate will not solve the problem.

Part IV argues for a broader rethinking of the role of risk in criminal justice. The Conclusion draws out implications for other predictive arenas.

[. . .]

On June 6, 2018, the Pennsylvania Commission on Sentencing held a public hearing in Philadelphia on the newly proposed Pennsylvania Risk Assessment Tool for sentencing. The room was packed. One by one, community members walked to the lectern and delivered impassioned pleas against adoption of the tool. They argued that reliance on criminal-history factors would have disparate impact, and that the likelihood of arrest is an artifact of racially skewed law enforcement rather than a meaningful measure of risk. Several speakers wondered why the system is so fixated on risk--the prospect of failure--in the first place. Instead, they argued, it should direct its efforts to improving people's prospects for success.

The speakers at that meeting offered a profound critique--of all state coercion on the basis of risk. Some of their concerns were indeed specific to algorithmic methods and to the proposed Pennsylvania tool. But the deepest concerns of the community, the sources of its deepest outrage, applied equally to the subjective risk assessment that already pervades the criminal justice system.

Algorithmic methods have revealed the racial inequality that inheres in all forms of risk assessment, actuarial and subjective alike. Neither colorblindness, nor algorithmic affirmative action, nor outright rejection of actuarial methods will solve the underlying problem. As long as crime and arrest rates are unequal across racial lines, any method of assessing crime or arrest risk will produce racial disparity. The only way to redress the racial inequality inherent in prediction in a racially unequal world is to rethink the way in which contemporary criminal justice systems conceive of and respond to risk.

The analysis of racial inequality in criminal justice risk assessment also serves as a case study for broader questions of algorithmic fairness. The important distinction between the two possible sources of intergroup disparity in prediction--distortions in the data versus differential base rates of the event of concern--applies in any predictive context, as does the taxonomy of equality metrics. But the types of distortions that affect the data or algorithmic process will differ by context. So too will the analysis of what equality metric(s) it makes sense to prioritize. This is because the right equality metric depends on the relevant basis for the action at issue. When an algorithm's very purpose is to accurately communicate statistical risk under status quo conditions, statistical risk is the only relevant basis for its action, such that two people who pose the same statistical risk must be treated alike. But in other contexts, algorithms might have other purposes. Algorithms used to allocate loans, housing, or educational opportunity might have distributional goals. Algorithms that drive internet search engines might be programmed to maximize the credibility of top results or minimize representational harms. Algorithms used to calculate lost-earnings damages in wrongful-death suits should perhaps have objectives other than reflecting status quo earning patterns. Not all algorithms, in other words, should faithfully mirror the past.

Given the frenzied uptake of criminal justice risk assessment and the furious resistance it has engendered, the present moment is crucial. The next few years will likely set the course of criminal justice risk assessment for decades to come. To demand race neutrality of tools that can only function by reflecting a racially unequal past is to demand the impossible. To reject algorithms in favor of subjective prediction is to discard the clear mirror for a cloudy one. The only sustainable path to predictive equity is a long-term effort to eliminate the social inequality that the predictive mirror reflects. That path should include a radical revision of how the criminal justice system understands and responds to crime risk. There is an opportunity now, with risk assessment and race in the public eye, to take it.

Assistant Professor of Law, University of Georgia School of Law.