When Machines Can Be Judge, Jury, And Executioner: Justice In The Age Of Artificial Intelligence

Author: Katherine B. Forrest
Publisher: World Scientific Publishing, 2021. 160 pages.
Reviewer: Stephen E. Henderson | January 2022

“In today’s criminal justice system…artificial intelligence (AI) is everywhere: it guides carceral sentences, bail decisions, like[li]hoods of arrest, and even the applications of autonomous military weapons. Put simply, AI is essential to how we, as an American society, dispense justice (p. xiii).”

There is much in Katherine Forrest’s claim—and thus in her new book—that is accurate and pressing. Forrest adds her voice to the many who have critiqued contemporary algorithmic criminal justice, and her seven years as a federal judge and decades of other experience make her perspective an important one. Many of her claims find support in kindred writings, such as her call for greater transparency, especially when private companies try to hide algorithmic details for reasons of greater profit. A for-profit motive is a fine thing in a private company, but it is anathema to our ideals of public trial. Algorithms are playing an increasingly dominant role in criminal justice, including in our systems of pretrial detention and sentencing. And as we criminal justice scholars routinely argue, there is much that is rather deeply wrong in that criminal justice.

But the relation between those two things (algorithms on the one hand and our systems of criminal justice on the other) is complicated, and it most certainly does not run in any single direction. Just as often as numbers and formulae are driving the show (a right concern of Forrest’s), a terrible dearth of both leaves judges meting out sentences that, in the words of Ohio Supreme Court Justice Michael Donnelly, “have more to do with the proclivities of the judge you’re assigned to, rather than the rule of law.” Moreover, most of the algorithms we currently use, and even most of those we are contemplating using, are ‘intelligent’ in only the crudest sense. They constitute ‘artificial intelligence’ only if we treat every algorithm run by, or developed with the assistance of, a computer as AI, and that is hardly the kind of careful, precise definition that criminal justice deserves. A calculator is a machine that we most certainly want judges using; a truly intelligent machine is something we humans have so far entirely failed to create; and the spectrum between is filled with innumerable variations, each of which must be carefully, scientifically evaluated in the particular context of its use.

This brief review situates Forrest’s claims in these two regards. First, we must always compare apples to apples. We ought not compare a particular system of algorithmic justice to some elysian ideal, when the practical question is whether to replace and/or supplement a currently biased and logically-flawed system with that algorithmic counterpart. After all, the most potently opaque form of ‘intelligence’ we know is that we term human—we humans go so far as routine, affirmative deception—and that truth calls for a healthy dose of skepticism and humility when it comes to claims of human superiority. Comparisons must be, then, apples to apples. Second, when we speak of ‘artificial intelligence,’ we ought to speak carefully, in a scientifically precise manner. We will get nowhere good if we diverge into autonomous weapons when trying to decide, say, whether we ought to run certain historic facts about an arrestee through a formula as an aid to deciding whether she is likely to appear as required for trial. The same if we fail to understand the very science upon which any particular algorithm runs. We must use science for science.

First, apples. How just is our current system of criminal justice? We need to restate the question, because we of course have fifty-one such systems in America, and few can hope to match the resources of the federal system in which Forrest served. So, how just are our current systems of American criminal justice? Well, they are awful. No, they are not the worst we have seen: I would immediately pick them over living as a Jew in Hitler’s Germany or Stalin’s Russia, as a Uyghur living in contemporary China, or as a Black person living in the 1860s American South (or, alas, in the 1940s American South). Humans are collectively capable of astounding evil, and contemporary American criminal justice is far from the worst. That’s good. But when we have the highest incarceration rates in the world, and when those rates are strongly skewed by race and socioeconomic status, that’s very bad. Given such systemic injustice, we ought to measure every potential change against that current reality. We ought not be satisfied with a Pareto-superior move, of course; we ought never be satisfied with the mess that will always be human ‘justice.’ But we ought not allow the perfect to be the enemy of the good.

So, while Forrest has a great deal of confidence in her discernment—able to explain why she chose “13 months and not 12 or 14” (p. xiii), each of her “over 500 criminal sentences…incorporating [her] personal view of a ‘just’ result” (p. xviii)—the right question is not whether, say, an Arnold Foundation algorithm beats out a Katherine Forrest. The question is whether that algorithm is more legitimate than what is currently in place systemwide, looking to norms like accuracy, fairness (including treating meaningfully like cases alike), and efficiency. And if the alternative is a legion of individual judges exercising substantial discretion, well, there is a library of social science literature tending to show how idiosyncratic and biased that will be. Throughout her book, Forrest hammers away at the claim that current predictive criminal justice algorithms have “a 30% rate of error” (p. xv; see also pp. xxi, 8, 13, 47, 57, 64, 87, 109). That sounds damning. Yet it’s better than, say, a 50% error rate, and Forrest never articulates just what the human judicial failure rate might be. Chapter 4 intriguingly promises to “put AI up to the mirror of the human brain” (p. 29), but unfortunately never makes it there, consisting of no neuroscience or psychological experiment, but rather solely of personal reflection. Chapter 6 acknowledges “how little information we have regarding human accuracy” but doesn’t go where that now-recognized problem logically leads (p. 55).

Similarly, Forrest bemoans a lack of effort in achieving “national consensus as to the value of [machine learning] weightings” used in criminal justice software (p. 16), yet she sees no problem of similar magnitude in there being no judicial vetting or training on what justice demands (pp. xviii–xx). She presents as unproblematic that some judges see drug crimes as “victimless” while others do not (p. 33), and the same goes for judges disagreeing on the basic purposes of imprisonment (pp. 36–37). What is good for the goose should be good for the gander: if we want fairness as equality in computer algorithms, we ought to want it in human judicial decisions as well. Again, apples to apples.

And that brings us to our second theme, which is scientific precision in evaluating algorithmic justice. How heavily do our systems of criminal justice depend upon AI? It necessarily depends upon how one defines that term. Forrest begins her book with this: “The same technology that allows Amazon to deliver our packages at warp speeds, Google to personalize our search results, and Facebook to tailor our newsfeeds, is now also used to govern the lives and liberties of millions of people” (p. xiii). It is true that all of these things rely upon solid-state circuits running algorithms. But that’s like asserting that we use the same technology to pick our music (or even to heat our food) as we use in criminal justice. Yes, Spotify, our microwaves, and the computers used by attorneys, judges, and executive officials all rely upon the same basic technologies. ‘No vacuum tubes here!’ But that is, of course, remarkably unhelpful. So, if we are to make progress, we are going to need to dig into the particular science much, much more, as Forrest’s text later seeks to do, though with only partial success. Some AI is explainable, meaning humans can understand the algorithm it runs. Some AI is opaque, but it can be robustly tested for accuracy and bias. Some AI is ‘AI’ only in the most rudimentary ‘my-microwave-is-smart’ sense: it might be an algorithm running a regression analysis of the kind that mathematicians have been doing on paper since at least the early nineteenth century. When we call all such things ‘AI,’ we simply muck everything up.
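To make the 'regression on paper' point concrete, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. The function name `fit_line` and the toy numbers are illustrative assumptions, not anything drawn from Forrest's book or from any deployed criminal justice tool. The point is only that the whole 'model' is two numbers that anyone can inspect.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy numbers, purely for illustration (not real data).
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"prediction = {slope:.2f} * x + {intercept:.2f}")
```

Nothing here is hidden: the arithmetic could be done by hand, and the fitted line can be checked against every data point that produced it.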

A quick example should help illustrate the point. In Chapter 2, Forrest analogizes an AI algorithm to a cooking recipe. Unfortunately, the analogy goes off the rails with this: “An AI algorithm…could go on to consider all possible consumable items and/or all possible methods that one might be able to use to make the same end product, and then relay those options to the chef” (p. 13). That’s true only if the chef had infinite patience and so didn’t mind waiting forever to cook dinner, since the entire reason that we use clever algorithms is that such brute-force calculations and explanations are demonstrably impossible. If computer science is not your bailiwick, you might see it like this: Chess is a game with only 32 pieces and 64 board locations, yet the number of possible chess games might be something like 10¹²⁰ (the so-called ‘Shannon number’), which is vastly more than the number of atoms believed to exist in the observable universe. Exponential complexity is simply a beast. So, Forrest’s analogy has a science problem. (For another fun example of exponential growth, one can Google ‘rice and chess.’)
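For readers who prefer arithmetic to analogy, the following short sketch works through both examples. The 10⁸⁰ figure for atoms in the observable universe is an assumed, commonly cited order of magnitude, and the checking speed is a purely hypothetical number chosen to be generous. Neither comes from Forrest's book.

```python
# Rice and chess: one grain on the first square, doubling on each of 64 squares.
grains_per_square = [2 ** k for k in range(64)]
total_grains = sum(grains_per_square)  # same as 2**64 - 1

SHANNON_ESTIMATE = 10 ** 120  # rough estimate of the number of possible chess games
ATOMS_ESTIMATE = 10 ** 80     # assumption: commonly cited order of magnitude

# Hypothetical brute-force machine: a billion billion games checked per second.
CHECKS_PER_SECOND = 10 ** 18
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_needed = SHANNON_ESTIMATE // (CHECKS_PER_SECOND * SECONDS_PER_YEAR)

print("grains on the board:", total_grains)
print("games per atom:", SHANNON_ESTIMATE // ATOMS_ESTIMATE)
print("digits in years needed:", len(str(years_needed)))
```

Even granting every atom its own impossibly fast computer, the enumeration would not finish, which is why practical algorithms prune, sample, or approximate rather than 'consider all possible' options.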

Similarly, while Forrest is quite right to recognize that certain types of machine learning have a “black-box” problem (p. xxi), even those sorts of algorithms can generally be robustly tested for accuracy and bias prior to any implementation, and that testing can be explained to anyone. More importantly, there are several different types of machine learning, and to lump them together gets us nowhere good. Again, an example might help. One type is ‘reinforcement learning,’ which is, for example, useful in solving the ‘multi-armed bandit’ dilemma: Imagine a gambler wishes to determine which of a casino’s slot machines to play in order to maximize her payout, not knowing ex ante the payout probability of any of them. Thanks to the work of some clever mathematicians, anyone with a basic understanding of statistics and programming can run simple algorithms by which the ‘gambler’—here the code itself—will ‘learn’ from each slot-machine pull—did it pay out well or no?—and over time be guided to the better-payout machines in a reasonably efficient manner, at least better than random chance. Such computer code is ‘machine learning,’ because it learns as it goes. But it is hardly complicated, sinister, or opaque. So, it does no good to label things as ‘artificial intelligence’ or ‘machine learning’ and leave it there—meaningful discussion requires the weeds of what particular technology is being used and precisely how.
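A minimal sketch of one such algorithm may help show how little mystery is involved. It uses the 'epsilon-greedy' rule, which is only one of several standard approaches to the bandit problem and is chosen here for simplicity. The payout probabilities, the number of pulls, and the function name `run_bandit` are all illustrative assumptions.

```python
import random


def run_bandit(payout_probs, pulls=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play over slot machines with hidden payout probabilities."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n_arms = len(payout_probs)
    counts = [0] * n_arms              # how often each machine was played
    estimates = [0.0] * n_arms         # running average payout per machine
    total_reward = 0

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: occasionally try a machine at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: play the machine that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The 'casino' pays 1 with the machine's hidden probability, else 0.
        reward = 1 if rng.random() < payout_probs[arm] else 0

        # 'Learning' is just updating a running average.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


# Hypothetical machines; the gambler does not get to see these numbers.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_guess = max(range(len(estimates)), key=lambda a: estimates[a])
print("pulls per machine:", counts)
print("machine the code settled on:", best_guess)
```

The entire 'learning' step is a running average plus an occasional random try, and every number the code relies on (`counts`, `estimates`) can be printed and audited after the fact.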

In sum, if we focus on Pareto-superior moves, not allowing the perfect to be the enemy of the good, and on understanding the particular mathematics at issue in each instance, we can harness machines to improve our systems of criminal justice. And, lord knows, we certainly need that.

Stephen E. Henderson, Judge Haskell A. Holloman Professor of Law, University of Oklahoma