excerpted from: P. Jeffrey Brantingham, The Logic of Data Bias and its Impact on Place-based Predictive Policing, 15 Ohio State Journal of Criminal Law 473 (Spring 2018)
Predictive policing refers to a three-part process: (1) data of one or more types are ingested; (2) algorithmic methods use the ingested data to forecast the occurrence of crime in some domain of interest; and (3) police use the forecasts to inform strategic and tactical decisions in the field. A primary goal of predictive policing is to reduce uncertainty so that police can approach the allocation of resources in an optimal manner. The theory is that an optimal allocation of police resources has a better chance of disrupting opportunities for crime before they happen.
Although simple in principle, there are many subtle questions that surround each part of the predictive policing process. What types of data go into prediction? What are the biases associated with these data? How do the algorithmic methods work? Are algorithmic methods for crime forecasting better than existing practice? How do police actually use predictions in the field? Are outcomes from the use of predictions in the field unequal and/or unconstitutional? Each of these questions, and many more that could be listed, deserves careful scrutiny, with the understanding that the answers should shape how (and whether) predictive policing is deployed in the future.
This paper takes up one particular question surrounding the origin of biases in police data and how such biases may be expected to percolate through forecasting algorithms to impact police action. I specifically look at place-based predictive policing, where algorithmic methods ingest data on the time, location, and type of past crimes and deliver forecasts for where and when crime is most likely to occur in narrow space-time windows. The principal question is whether data biases, when filtered through algorithmic place-based policing, should be expected to lead predictions to produce under- or over-policing for a given community.
There is voluminous evidence that policing practice is not immune from bias. Racial bias has been documented in the targeting of vehicles and pedestrians for stops, issuing traffic citations, drug enforcement and arrests, use of force, and even the decision about whether to fire a weapon in training simulators. How exactly explicit and implicit biases operate to produce such outcomes is difficult to disentangle, but there is no doubt that such unequal outcomes exist.
Given this empirical record, there is real and justified concern that algorithmic methods for predictive policing, rather than helping the situation, will only serve to exacerbate bias and amplify unequal outcomes. That the exacerbation of bias is possible has been demonstrated in simulations that take up a hypothetical case of predictive policing using drug arrest data from Oakland, California. The core idea in that work is that if people of color are stopped and arrested disproportionately for drug crimes relative to actual prevalence, and if those arrests are the basis for forecasts, then predictions will lead to more disproportionate stops and arrests. Unequal outcomes will grow and, not surprisingly, subsequent arrests would be consistently confirmed by predictions. The present paper takes a step back from this very specific example to ask more fundamental and general questions about how implicit bias impacts crime event data.
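The runaway dynamic that Lum and Isaac describe can be illustrated with a deliberately minimal simulation. The sketch below is a toy model, not their actual analysis: it assumes two areas with identical true crime rates, a historical record seeded with a modest bias toward one area, a winner-take-all "forecast" that sends patrols wherever past arrests are highest, and the simplification that only patrolled crime is recorded. The function name and all numeric parameters (`seed_bias`, `arrests_per_day`) are hypothetical.

```python
def biased_feedback_loop(days=100, seed_bias=(60, 40), arrests_per_day=5):
    """Toy model of the arrest-forecast feedback loop.

    Two areas have identical true crime rates, but the historical
    record (seed_bias) over-represents area 0. Each day a naive
    "forecast" flags the area with the most recorded arrests, all
    patrols go there, and only patrolled crime is observed -- so the
    initial recording bias is confirmed and amplified, never corrected.
    """
    arrests = list(seed_bias)
    for _ in range(days):
        hotspot = 0 if arrests[0] >= arrests[1] else 1  # the "forecast"
        arrests[hotspot] += arrests_per_day  # detections follow patrols
    return arrests
```

Starting from a 60/40 recorded split over equal true crime rates, a hundred iterations leave area 1 untouched at 40 recorded arrests while area 0 climbs to 560: the recorded disparity grows without bound even though the underlying crime rates never differ, and each day's arrests appear to confirm the forecast.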
The remainder of this paper proceeds as follows. First, I examine the origin of data biases from a logical standpoint. I take as a starting point the assumptions that explicit and implicit biases do exist and that these biases act against the interests of individuals whom the police contact if those individuals represent a particular social group. Second, I discuss in general terms the expected impact of these data biases on risk assessments. Third, I provide a theoretical exploration of the impact of data biases on place-based predictive policing of the type tested in Los Angeles. The analysis relies on simulation methods rather than analysis of real-world data. I conclude with a discussion of limitations and possible avenues for future research.
. . .
Beyond issues of the general prevalence of opportunities for bias to operate, we need to know something about its magnitude. It is perhaps convenient to argue the extremes: that implicit bias has the maximum impact on each and every crime tied to a victim or suspect of a targeted group (the anti-police stance), or that it does not exist and therefore impacts no events (the pro-police stance). It seems far more reasonable to assume that implicit bias operates not at the extremes but heterogeneously in both space and time. Of course, this makes the task of assessing the impact of implicit bias on police data much more challenging. The conclusion is that we need to work hard to figure out how to detect and correct for biases in police data, rather than rejecting such data out of hand or accepting them without further thought.
The goal of the present work was to start the process of mapping the fundamental ways in which implicit biases can impact police data and percolate through to algorithmic predictive policing programs. While this is a reasonable first step, numerous limitations must be highlighted. First, implicit bias was framed simplistically, without any reference to detailed experimental work in psychology and sociology. Future work should seek to ground the simplifying assumptions in this rich source of evidence.
Second, blanket concepts of crime type downgrading and upgrading were taken as the only avenue by which implicit bias might operate to impact crime event data. It is possible that the spatial and temporal features of crime events might also be impacted in some way, for example, through biased variation in the accuracy with which such information is collected. It is also possible that implicit biases tied to more complex aspects of crime investigation, including attribution of motive, might influence primitive data about the events. More work is needed to try to understand whether such complex processes are at play.
Third, there are significant limitations to the simulation experiments presented here. These focused on random, uniform downgrading or upgrading of crimes; that is, any one crime has an equal probability of being impacted by bias. This experimental approach is not particularly realistic. The results might be quite different if downgrading or upgrading were to act preferentially on clusters of events, for example, if one crime were more likely to be downgraded or upgraded when it closely follows another crime associated with the same targeted social group. This is akin to the Lum and Isaac mechanism, where previous arrests are more likely to lead to future arrests. However, the advantage of starting with the simpler mechanism is that it provides a basis for mapping out fundamental mechanisms. Indeed, the results here should translate readily into initial mathematical propositions about how bias impacts data, a first step toward building algorithms that can better handle, or perhaps even correct for, such biases. Future work can look to more complex bias mechanisms, but without these simple first steps there is little hope of managing them.
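The uniform mechanism just described can be made concrete in a few lines. The sketch below is illustrative only, with hypothetical labels (`"serious"`/`"minor"` crime types, a `"targeted"` group tag): every serious crime recorded against the targeted group is independently relabeled with the same probability `p`, regardless of where or when it occurred, which is precisely the "equal probability" assumption of the simulation experiments. Upgrading would be the symmetric operation in the other direction.

```python
import random

def downgrade_uniform(events, p, rng=None):
    """Uniform random downgrading: each "serious" crime recorded
    against the targeted group has the same independent probability p
    of being relabeled "minor", irrespective of its time or location.
    Each event is a (crime_type, group) pair; only the recorded crime
    type is altered, never the event itself.
    """
    rng = rng or random.Random()
    out = []
    for crime_type, group in events:
        if crime_type == "serious" and group == "targeted" and rng.random() < p:
            crime_type = "minor"  # bias alters only the recorded label
        out.append((crime_type, group))
    return out
```

A clustered variant, in the direction suggested above, would replace the fixed `p` with a probability that rises when a recent nearby event involving the same group was also downgraded, coupling the bias to the spatial and temporal structure of the data.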
P. Jeffrey Brantingham is Professor of Anthropology at UCLA.