# Statistical Controversy on Estimating Racial Bias in the Criminal Justice System

## Notes

Gelman reacts to a recent controversy between two research teams over a paper by Knox et al. and a critique by Gaebler et al. Both papers discuss causal-inference methods for detecting racial bias in the criminal justice system.

Knox et al. argue that the datasets commonly used for these questions, such as administrative records of police encounters, mask racially biased policing because of selection bias: you don't observe the people the police did not stop. Gaebler et al. counter that the situation is not that bad. You can use statistical methods to detect bias at specific steps in the process (e.g., sentencing), though they concede that you cannot learn the sum total of bias in the system this way.

Gelman suggests that Knox et al. have overstated their case but are right to point out the selection bias problem. Any causal effect identified must be defined carefully and with the limitations of the data in mind. Standard regression-based inferences can estimate causal effects, but those effects don't tell the whole story.

## Highlights

Knox et al. make the argument that analyses using administrative data are implicitly conditioning on a post-treatment variable because they subset on whether you were stopped or not. To use the words in their title, adjusting for or conditioning on administrative records can mask racially biased policing. Knox et al. point out that this bias can be viewed as an example of conditioning on intermediate outcomes. It’s a well known principle in causal inference that you can’t give regression coefficients a direct causal interpretation if you’re conditioning on intermediate outcomes ([View Highlight](https://read.readwise.io/read/01j5tm5nxmgm1xt9t57e9qjghv))

---

And, as Knox et al. point out, there’s an additional challenge for criminal justice research because of missing data: “police administrative records lack information on civilians police observe but do not investigate.” ([View Highlight](https://read.readwise.io/read/01j5tm6k1f01pxhv6n0zrtjqcw))

---

From a substantive point of view, the message that I take from Knox et al. is to be careful with what might seem to be kosher regression analyses purporting to estimate the extent of discrimination. I would not want to read Knox et al. as saying that regression methods can’t work here. You have to be aware of what you are conditioning on, but if you interpret the results carefully, you should be able to estimate a causal effect of one stage in the process. The concern is that it can be easy for the most important effects to be hidden in the data setup.

Gaebler et al. discuss similar issues. However, they disagree with Knox et al. on technical grounds. Gaebler et al. argue that there exist situations in which you may be able to estimate causal effects of discrimination or perception of race just fine, even when conditioning on variables that are affected by race. It’s just that these won’t capture all aspects of discrimination in the system; they will capture the discrimination inherent solely at that point in the process (for example, when a prosecutor makes a decision about charging). ([View Highlight](https://read.readwise.io/read/01j5tm857rv96106dmghw324nx))

---

For a simple example, suppose you have stone-cold evidence of racial bias in sentencing. That’s still conditional on who gets arrested, charged, and prosecuted. So it doesn’t capture all potential sources of racial bias, not by a long shot. Or, maybe you find no racial bias in sentencing. That doesn’t mean that total racial bias is zero; it just means that you don’t find anything at that stage in the process. ([View Highlight](https://read.readwise.io/read/01j5tm8n6vndarr2prss2vj7rt))

---

Gaebler et al. give an example where the estimand corresponds to what one would measure in a randomized controlled trial where the stated race of arrested individuals was randomly masked on the police narratives that prosecutors use when making their decisions. With this setup, it is possible to estimate causal racial bias in a particular part of the system. ([View Highlight](https://read.readwise.io/read/01j5tm9y506vay8s6gpkqj2dyz))

---

One thing I especially like about the Gaebler et al. article is that they move beyond the question of racial discrimination to the more relevant, I think, issue of disparate outcomes: “much of the literature has framed discrimination in terms of causal effects of race on behavior, but other conceptions of discrimination, such as disparate impact, are equally important for assessing and reforming practices.” ([View Highlight](https://read.readwise.io/read/01j5tmaemhjrfhyqb6y7f27bjy))

---

The relevant point to make, and hence the point I will extract from Knox et al., is that, in this particular example of arrests, the problem of selection is, in the words of Morrissey, really serious. But you can use standard methods of causal inference to estimate causal effects here, as long as you’re careful in interpreting the results and don’t take the estimate of discrimination in one part of the system as representing the entirety of racial bias in the whole process. ([View Highlight](https://read.readwise.io/read/01j5tmgcc246ft8qrrdahvnw2m))

---

From both papers I draw the same substantive conclusion, which is that simple, or even not-so-simple regressions of outcome on race and pre-treatment predictors can give misleading results if you’re trying to understand the possibility of racial discrimination in the criminal justice system without thinking carefully about these issues. ([View Highlight](https://read.readwise.io/read/01j5tmh8bmmnxm39jr67fk0pr1))

---

My read of this particular dispute is that Knox et al. were trying to prove something that is not quite correct. The correct statement is that, even when standard regression-based inferences allow you to estimate a causal effect of race at some stage in the criminal justice process, this causal effect is conditional on everything that came before, and so a focus on any particular causal effect will not catch other biases in the system. The incorrect statement is that you can’t estimate causal effects in standard regression-based inferences. These local causal effects don’t tell the full story, but they’re still causal effects, and that’s the technical point that Gaebler et al. are making. ([View Highlight](https://read.readwise.io/read/01j5tmja25c6yfygvhsb4nztfk))

---
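A toy simulation may help make the two positions concrete. This is my own sketch, not anything from Gelman, Knox et al., or Gaebler et al.: the function `simulate`, the latent "suspicion" score `u`, the stop thresholds, and the size of `force_bias` are all invented for illustration. It assumes officers stop minority civilians at a lower suspicion threshold (biased selection into the records) and also use force slightly more often against them at the same level of suspicion (bias at one stage). It further assumes `u` is recorded for stopped civilians, which real administrative data may not offer.

```python
import random


def simulate(n=400_000, force_bias=0.05, seed=0):
    """Simulate police encounters with biased stops and biased use of force.

    All parameters are hypothetical. Returns three minority-minus-white gaps
    in the rate of force.
    """
    rng = random.Random(seed)
    # Per race: [civilians, stops, force, stops with u >= 0.8, force with u >= 0.8]
    counts = {0: [0, 0, 0, 0, 0], 1: [0, 0, 0, 0, 0]}
    for _ in range(n):
        minority = rng.randint(0, 1)
        u = rng.random()  # latent suspicion, same distribution in both groups
        c = counts[minority]
        c[0] += 1
        # Biased selection: minority civilians are stopped at lower suspicion.
        threshold = 0.5 if minority else 0.8
        if u <= threshold:
            continue  # not stopped, so never appears in administrative records
        # Stage-specific bias: extra probability of force at equal suspicion.
        force = rng.random() < 0.1 + 0.4 * u + force_bias * minority
        c[1] += 1
        c[2] += force
        if u >= 0.8:  # region where both groups get stopped
            c[3] += 1
            c[4] += force
    m, w = counts[1], counts[0]
    return {
        # Raw comparison among everyone who shows up in the stop records.
        "naive": m[2] / m[1] - w[2] / w[1],
        # Comparison among stops with comparable recorded suspicion.
        "adjusted": m[4] / m[3] - w[4] / w[3],
        # Force per civilian, stopped or not (needs data the records lack).
        "total": m[2] / m[0] - w[2] / w[0],
    }


if __name__ == "__main__":
    for name, gap in simulate().items():
        print(f"{name:>8}: {gap:+.3f}")
```

Under these made-up parameters the expected naive gap among stopped civilians is about -0.01, because stopped minority civilians are on average less suspicious than stopped white civilians, which hides the bias (the Knox et al. concern). Comparing stops with similar recorded suspicion recovers the stage-specific effect of about 0.05 (the Gaebler et al. point), yet the per-civilian gap of about 0.13 is larger than either, since most of the disparity comes from who gets stopped in the first place.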