Imagine for a moment that you’ve been convicted of a crime, and are awaiting sentencing. The prosecutor hands a computer-generated analysis to the judge that declares, based on a secret analysis performed by a complex algorithm, that you should receive the harshest possible sentence, since according to the algorithm you are highly likely to commit future crimes. Your attorney, hoping to rebut this conclusion, asks how the report was prepared, but the judge rules that neither you nor your attorney are entitled to know anything about its preparation, only the results. The judge then proceeds to impose the maximum sentence, based on this secret calculation.
If that sounds like something out of a dystopian science fiction novel, well, it’s going on right now in several states throughout this country.
Jed Rakoff is a federal district judge for the Southern District of New York. A former federal prosecutor appointed to the bench in 1996, Rakoff has presided over some of the most significant white-collar crime cases in this country. He is generally recognized as one of the leading authorities on securities and criminal law, and as a regular contributor to the New York Review of Books, he often writes about novel and emergent criminal justice issues.
His latest essay addresses the increasingly widespread use by criminal prosecutors of artificial-intelligence-based (AI) computer programs or algorithms to support sentencing recommendations for convicted criminal defendants. These programs, using a variety of controversial sociological theories and methods, are primarily used to assess recidivism (the propensity of a defendant to commit future crimes), and they are often given heavy weight by judges in determining the length of the sentence to be imposed. They also factor into decisions about setting bail or bond limits. The consideration of potential recidivism is based on the theory of “incapacitation”: the idea that criminal sentencing should serve the dual purpose of punishing a defendant and preventing that defendant from committing future crimes, in order to protect society.
Rakoff finds the use of these predictive algorithms troubling for a number of reasons, not the least of which are their demonstrated error rates and propensity for inherent racial bias. He notes that the theories on which they purportedly analyze a person’s propensity to commit future crimes are often untested, unreliable, and otherwise questionable. However, his most recent essay for the NYRB, titled “Sentenced by Algorithm” and reviewing former district judge Katherine Forrest’s When Machines Can Be Judge, Jury, and Executioner, raises even more disturbing questions about the introduction of artificial intelligence technology into our criminal justice system.
Is it fair for a judge to increase a defendant’s prison time on the basis of an algorithmic score that predicts the likelihood that he will commit future crimes? Many states now say yes, even when the algorithms they use for this purpose have a high error rate, a secret design, and a demonstrable racial bias.
One of the basic concerns about the use of these programs is their fundamental fairness to criminal defendants. In the past, when a prosecutor wanted to emphasize, for purposes of sentencing, that a convicted defendant might commit future crimes, he or she would rely primarily upon that defendant’s past criminal record, his demonstrations of remorse (or lack thereof) for the crime committed, his demeanor, the testimony of various witnesses as to his character, and, possibly most importantly, his potential for rehabilitation under a less stringent sentencing regimen. Obviously, a public defender would also make these considerations paramount in support of clemency for his client.
But the introduction of a quasi-scientific basis upon which to determine a defendant’s propensity to commit future crimes (crimes that have yet to occur, if they occur at all) threatens to undermine the human element a judge typically uses in making such determinations. The fact that such computer-driven assessments carry an imprimatur of infallibility and certainty is undoubtedly part of their attractiveness to judges beleaguered by crowded dockets and heavy time constraints. Nor are judges immune to the fact that such tools can effectively provide cover for close or questionable sentencing decisions; for judges subject to the political constraints of reelection, that factor alone may unduly influence their reliance on them.
These are serious enough concerns. However, according to Rakoff, the biggest problem with these algorithms is that they don’t actually work.
Studies suggest they have an error rate of between 30 and 40 percent, mostly in the form of wrong predictions that defendants will commit more crimes in the future. In other words, out of every ten defendants who these algorithms predict will recidivate, three to four will not. To be sure, no one knows if judges who don’t use such programs are any better at predicting recidivism (though one study, mentioned below, finds that even a random sample of laypeople is as good as the most frequently used algorithm). But the use of such programs supplies a scientific façade to these assessments that the large error rate belies.
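The arithmetic in that passage can be made concrete with a minimal Python sketch. The function name and the group of ten are hypothetical, chosen only to mirror the 30 to 40 percent range quoted above; nothing here is drawn from an actual study or from COMPAS itself.

```python
def wrongly_flagged(flagged: int, error_rate: float) -> int:
    """Estimate how many flagged defendants would not actually reoffend.

    Both arguments are illustrative inputs, not figures from any study.
    """
    if flagged < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("flagged must be >= 0 and error_rate within [0, 1]")
    return round(flagged * error_rate)


if __name__ == "__main__":
    # The 30% and 40% bounds come from the passage quoted above.
    for rate in (0.30, 0.40):
        count = wrongly_flagged(10, rate)
        print(f"error rate {rate:.0%}: {count} of 10 flagged defendants would not reoffend")
```

Scaled up, the same calculation means that in a hypothetical group of 1,000 people flagged as likely recidivists, between 300 and 400 would be facing a harsher sentence on the strength of a wrong prediction.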
As Rakoff notes, the most common of these AI computer algorithms employed to detect potential recidivism is called COMPAS, produced by a private company called Northpointe, which does business as Equivant. The COMPAS product is currently being used in several states, including New York, California, and Florida. In Wisconsin the legal merits of COMPAS were addressed in what Rakoff describes as “perhaps the leading case” evaluating the use of such algorithms in criminal prosecutions, State v. Loomis.
In that case, a unanimous Wisconsin Supreme Court denied an appeal by Mr. Loomis, a defendant who had entered into a plea bargain for two nonviolent offenses but contended that his sentence was still excessive, primarily as a result of a pre-sentencing report submitted by the prosecution that relied, in part, upon COMPAS’s assessment of his likely recidivism. Loomis argued that because the company’s algorithm was classified as a “trade secret,” he had inadequate means to evaluate its reliability in order to rebut its conclusions.
Somewhat perversely, the court denied Loomis’ appeal, reasoning that even if he didn’t have access to its means of preparation, he had an opportunity to rebut the COMPAS findings with evidence of his own. Further, the court apparently felt content with the admonition that the COMPAS results should simply be viewed by the court as one of several guidelines regarding an individual’s threat to public safety, and not the primary factor in determining the severity of the sentence. As Rakoff drily observes, as a practical matter that distinction is preposterous:
If a sentencing judge, unaware of how unreliable COMPAS really is, is told that this “evidence-based” instrument has scored the defendant as a high recidivism risk, it is unrealistic to suppose that she will not give substantial weight to that score in determining how much of the defendant’s sentence should be weighted toward incapacitation.
Worse, the court actually acknowledged that the algorithm had demonstrated systematic racial bias in its past assessments. Rakoff quotes from the court’s opinion:
A recent analysis of COMPAS’s recidivism scores based upon data from 10,000 criminal defendants in Broward County, Florida, concluded that black defendants “were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism.” Likewise, white defendants were more likely than black defendants to be incorrectly flagged as low risk.
Meanwhile, according to Rakoff, the company itself has disclosed validation studies which “show an error rate of between 29 and 37 percent in predicting future violent behavior and an error rate of between 27 and 31 percent in predicting future nonviolent recidivism.” In other words, as Rakoff notes, the software is potentially wrong “about one-third of the time.”
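The two kinds of mistake described in the court’s opinion (wrongly flagging someone as high risk, and wrongly flagging someone as low risk) are distinct from the overall error rate. The following Python sketch shows how they can be computed separately for each group. All counts, group labels, and names are invented for illustration; they are not the Broward County data and say nothing about how COMPAS works internally.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Hypothetical prediction outcomes for one group of defendants."""
    flagged_reoffended: int   # flagged high risk, did reoffend (correct)
    flagged_did_not: int      # flagged high risk, did not reoffend (false positive)
    cleared_reoffended: int   # flagged low risk, did reoffend (false negative)
    cleared_did_not: int      # flagged low risk, did not reoffend (correct)

    def false_positive_rate(self) -> float:
        # Of those who did not reoffend, the share wrongly flagged high risk.
        return self.flagged_did_not / (self.flagged_did_not + self.cleared_did_not)

    def false_negative_rate(self) -> float:
        # Of those who did reoffend, the share wrongly flagged low risk.
        return self.cleared_reoffended / (self.cleared_reoffended + self.flagged_reoffended)

    def error_rate(self) -> float:
        # Share of all predictions that turned out wrong.
        wrong = self.flagged_did_not + self.cleared_reoffended
        total = wrong + self.flagged_reoffended + self.cleared_did_not
        return wrong / total


# Invented counts, 1,000 defendants per group.
group_a = Outcomes(300, 200, 150, 350)
group_b = Outcomes(150, 100, 250, 500)

if __name__ == "__main__":
    for name, g in (("group A", group_a), ("group B", group_b)):
        print(f"{name}: error {g.error_rate():.1%}, "
              f"false positives {g.false_positive_rate():.1%}, "
              f"false negatives {g.false_negative_rate():.1%}")
```

With these invented counts, both groups have the same overall error rate of 35 percent, yet group A is wrongly flagged as high risk more than twice as often as group B, while group B is far more often wrongly flagged as low risk. A single headline error rate of “about one-third” can therefore conceal exactly the kind of disparity the court described.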
Whether COMPAS actually miscategorizes Black defendants as likely recidivists at a higher rate than white defendants has been a matter of dispute. ProPublica released its own analysis in 2016 (the one referenced by Rakoff), based on a database of over 10,000 criminal defendants in Broward County, Florida, and found systematic “mis-flagging” of Black defendants as potential future criminals. Northpointe, which produces COMPAS, disputed that analysis, and ProPublica responded to Northpointe’s rebuttal. In 2018 an analysis in the Washington Post concluded that because Northpointe refused to release its algorithm, claiming it was proprietary, it was impossible to determine whether the COMPAS product demonstrated unfair bias.
But that fact should be disqualifying in and of itself. The court’s seeming approval of the COMPAS program despite its known error rate and despite the fact that the company refuses to provide specifics of its algorithm to Mr. Loomis or others is probably the most disturbing aspect of this decision. It suggests that the court will essentially sanction any trial judge’s deference to this purported scientific evidence without being required to delve to any useful extent into the exact methodology or reliability underlying that evidence. As Rakoff notes, in the context of a sentencing hearing there is no requirement under current law that an algorithm such as COMPAS be subjected to more rigorous scrutiny, such as that required of expert witnesses or evidence during an actual trial.
In a civil case, allowing potentially unreliable evidence to be considered could make the difference between a fair or unfair verdict for money damages. But in the criminal context that distinction can literally obliterate years of a person’s life.
Rakoff blames the use of analytical products like COMPAS on the encouragement by the National Center for State Courts to make the sentencing process more “data-driven,” and he opines that the entire practice of basing the severity of criminal sentences on “incapacitation,” i.e., the prevention of future crimes, should be re-evaluated. Specifically, Rakoff believes the focus should be on rehabilitating criminal defendants rather than trying to prevent crimes that have never been committed in the first place. Absent such a transformation in criminal jurisprudence, which seems unlikely, Rakoff believes that as products like COMPAS become more ubiquitous, judges will become more reliant on them, with the end result being more emphasis on preventing future crimes (through more severe sentencing) than on reforming criminals through rehabilitative programs that don’t involve incarceration.
One point that Rakoff also might have raised is that although these AI algorithms are intended to assist judges in determining appropriate sentencing, they are primarily a tool wielded by prosecutors. The vast majority of criminal defendants (and the vast majority of public defenders) do not have the resources or wherewithal to challenge the results of these assessments, particularly if the underlying datasets and algorithms remain secret. Even if such data are disclosed, the forensic analysis needed to evaluate their credibility would cost more than most defendants are able to pay.
The employment of this technology thus reinforces the disparity between the power of the state and that of the individual, a disparity that seems to have simply been accepted out of expedience. Notably, Rakoff also cites a study conducted by researchers at Dartmouth College which determined that although COMPAS may draw on an estimated 137 factors to evaluate a person’s potential to commit future crimes, the same predictive accuracy can be achieved using only two factors: a person’s age and criminal history, which judges are presumably capable of assessing without the assistance of artificial intelligence.
Beyond the Orwellian prospect of having the course of one’s future dependent on an unknowable, secret algorithm, the adoption of COMPAS and products like it highlights the unsettling intersection between the very human issues of criminal justice and the inherently inhuman aspects of technology. And while that route may seem easier or more convenient for judges and prosecutors, it isn’t necessarily the one we ought to be following.