> *This note was originally drafted as a brief essay response for a course in my MSDS program.*

Racial bias in the criminal justice system is a subject of interest across policing, policy, and the legal system. Increasingly, data science and statistical modeling are playing a role in the application of police force, in decisions about whom to try and for what, in sentencing, and in parole hearings. The datasets and statistical methods used in these applications have important real-world consequences for individuals, and some argue that they may be enforcing racial segregation and contributing to differential outcomes based on race.

Statisticians and data scientists debate the methodologies used to draw causal inferences about race and outcomes. One issue stems from the data itself. The data available for these analyses are drawn from administrative records such as police encounters and arrest records. As Gaebler et al. [^1] point out in their paper, these data do not include people who were not stopped by police. Thus it is impossible to say from these records alone whether racial bias played a role in the initial encounter, and any such bias would distort the inferences made about subsequent stages of the criminal justice process. This is an example of the fundamental problem of causal inference, namely that the counterfactual cannot be observed. What would have happened if a different group of people were stopped? What if those people were of a different racial composition? What would have happened to this group if they had not been stopped by police?

The assumptions underlying this debate include statistical assumptions (e.g., are the errors normally distributed?) and assumptions about how race works in the criminal justice system. As Lily Hu points out in her blog post [^2], the researcher's assumptions about race cannot be disentangled from their decisions about the appropriate statistical methods.
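To make the missing-data problem described above concrete, here is a minimal simulation sketch. It is not the method used by Gaebler et al.; every name and number in it (`naive_arrest_gap`, the stop thresholds, the 0.10 arrest bias, the uniform "suspicion" score) is a hypothetical chosen purely for illustration. It assumes that police stop anyone whose unobserved suspicion score exceeds a threshold, and that only stopped people appear in the records.

```python
import random


def naive_arrest_gap(minority_stop_threshold, n=200_000, seed=0):
    """Arrest-rate gap (minority minus white) computed from stopped people only.

    All quantities are hypothetical and exist only to illustrate selection
    into the records; they are not estimates from any real dataset.
    """
    rng = random.Random(seed)
    white_stop_threshold = 0.6  # assumed: whites are stopped only if suspicion > 0.6
    true_arrest_bias = 0.10     # assumed: extra arrest probability for minorities,
                                # holding suspicion fixed

    stops = [0, 0]    # index 0 = white, 1 = minority
    arrests = [0, 0]
    for _ in range(n):
        minority = 1 if rng.random() < 0.5 else 0
        suspicion = rng.random()  # unobserved by the analyst
        threshold = minority_stop_threshold if minority else white_stop_threshold
        if suspicion <= threshold:
            continue  # never stopped, so never enters the administrative records
        p_arrest = 0.5 * suspicion + true_arrest_bias * minority
        stops[minority] += 1
        if rng.random() < p_arrest:
            arrests[minority] += 1

    return arrests[1] / stops[1] - arrests[0] / stops[0]


if __name__ == "__main__":
    # Same threshold for both groups: no bias at the stop stage.
    print("naive gap, equal stop thresholds:", round(naive_arrest_gap(0.6), 3))
    # Lower threshold for minorities: bias at the stop stage.
    print("naive gap, lower minority threshold:", round(naive_arrest_gap(0.3), 3))
```

Under these assumptions, the expected gap is 0.10 when both groups face the same stop threshold, matching the bias built into the arrest stage. With the lower minority threshold, the stopped minority group has lower average suspicion, so the expected gap shrinks to 0.025 even though the arrest-stage bias is unchanged. The analyst sees only the stopped population in either world, and the records alone do not reveal which stop rule generated them.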
While some of the statistical assumptions can be checked against the data, other assumptions are simply normative judgements about the way the world works, often judgements that are intertwined with the very questions of racial justice that are being evaluated. These assumptions are not falsifiable in the traditional sense.

In a [[lit/readwise/articles/Statistical Controversy on Estimating Racial Bias in the Criminal Justice System|blog post]], Andrew Gelman throws some cold water on the debate, pointing out that all statistical analysis suffers from limitations in data and requires many assumptions, both statistical and otherwise. Any causal inferences drawn from the data should be clearly defined and should acknowledge the limitations of the data and the analysis. Lily Hu, a PhD candidate at Harvard, [[lit/readwise/articles/Law, Liberation, and Causal Inference|offers her take]]: rather than discounting all statistical approaches to questions of race and criminal justice, or to any other question of social progress, we can use statistical analyses to further our understanding, but the assumptions underlying the models and the final picture they paint should both be subject to empirical and political scrutiny.

[^1]: https://5harad.com/papers/post-treatment-bias.pdf
[^2]: https://lpeproject.org/lpe-manifesto/