Abstract
Algorithms for predicting recidivism are commonly used to assess a criminal defendant's likelihood of committing a crime. These predictions are used in pretrial, parole, and sentencing decisions. Proponents of these systems argue that big data and advanced machine learning make these analyses more accurate and less biased than humans. We show, however, that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise. We further show that a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 features.
Year: 2018 PMID: 29376122 PMCID: PMC5777393 DOI: 10.1126/sciadv.aao5580
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Table 1. Human versus COMPAS algorithmic predictions from 1000 defendants.
Overall accuracy is specified as percent correct, AUC-ROC, and criterion sensitivity (d′) and bias (β). See also Fig. 1.

| | Human (no race) | Human (race) | COMPAS |
|---|---|---|---|
| Accuracy (overall) | 67.0% | 66.5% | 65.2% |
| AUC-ROC (overall) | 0.71 | 0.71 | 0.70 |
| d′/β (overall) | 0.86/1.02 | 0.83/1.03 | 0.77/1.08 |
| Accuracy (black) | 68.2% | 66.2% | 64.9% |
| Accuracy (white) | 67.6% | 67.6% | 65.7% |
| False positive (black) | 37.1% | 40.0% | 40.4% |
| False positive (white) | 27.2% | 26.2% | 25.4% |
| False negative (black) | 29.2% | 30.1% | 30.9% |
| False negative (white) | 40.3% | 42.1% | 47.9% |
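
The per-group error metrics in Table 1 can be illustrated with a short Python sketch. It assumes that "false positive" means the share of non-recidivists who were predicted to recidivate and "false negative" the share of recidivists who were predicted not to, each computed separately within a group, and that d′ and β follow the standard equal-variance signal-detection formulas d′ = z(H) − z(F) and β = exp((z(F)² − z(H)²)/2), where H is the hit rate, F the false-alarm rate, and z the inverse standard normal CDF. The function names and toy labels are hypothetical; the paper's own computation is not shown here.

```python
from math import exp
from statistics import NormalDist


def rates(y_true, y_pred):
    """Accuracy, false-positive rate and false-negative rate for 0/1 labels.

    y_true[i] = 1 if defendant i recidivated; y_pred[i] = 1 if predicted to recidivate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if fp + tn else nan,  # non-recidivists wrongly flagged
        "fnr": fn / (fn + tp) if fn + tp else nan,  # recidivists wrongly cleared
    }


def rates_by_group(y_true, y_pred, groups):
    """Apply rates() separately within each group label (e.g. race)."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return out


def dprime_beta(hit_rate, false_alarm_rate):
    """Equal-variance signal-detection sensitivity d' and bias beta.

    Both rates must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    z = NormalDist().inv_cdf
    zh, zf = z(hit_rate), z(false_alarm_rate)
    return zh - zf, exp((zf ** 2 - zh ** 2) / 2)


# Hypothetical toy labels, used only to exercise the functions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = rates(y_true, y_pred)
per_group = rates_by_group(y_true, y_pred, groups)
# Hit rate is 1 - FNR; false-alarm rate is the FPR
d_prime, beta = dprime_beta(1 - overall["fnr"], overall["fpr"])
```

With these toy labels both groups reach 75% accuracy yet have opposite error profiles (group "a" has no false positives, group "b" has no false negatives), the same kind of asymmetry the table shows between black and white defendants despite similar accuracies.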
Fig. 1. Human (no-race condition) versus COMPAS algorithmic predictions (see also Table 1).
Table 2. Algorithmic predictions from 7214 defendants.
(A) Logistic regression with 7 features (LR7), (B) logistic regression with 2 features (LR2), (C) a nonlinear SVM with 7 features (NL-SVM), and (D) the commercial COMPAS software with 137 features (COMPAS). The results in columns (A), (B), and (C) are test-set averages over 1000 random 80%/20% training/testing splits. The values in square brackets are 95% confidence intervals: bootstrapped for columns (A), (B), and (C) and binomial for column (D).

| | LR7 (A) | LR2 (B) | NL-SVM (C) | COMPAS (D) |
|---|---|---|---|---|
| Accuracy (overall) | 66.6% [64.4, 68.9] | 66.8% [64.3, 69.2] | 65.2% [63.0, 67.2] | 65.4% [64.3, 66.5] |
| Accuracy (black) | 66.7% [63.6, 69.6] | 66.7% [63.5, 69.2] | 64.3% [61.1, 67.7] | 63.8% [62.2, 65.4] |
| Accuracy (white) | 66.0% [62.6, 69.6] | 66.4% [62.6, 70.1] | 65.3% [61.4, 69.0] | 67.0% [65.1, 68.9] |
| False positive (black) | 42.9% [37.7, 48.0] | 45.6% [39.9, 51.1] | 31.6% [26.4, 36.7] | 44.8% [42.7, 46.9] |
| False positive (white) | 25.3% [20.1, 30.2] | 25.3% [20.6, 30.5] | 20.5% [16.1, 25.0] | 23.5% [20.7, 26.5] |
| False negative (black) | 24.2% [20.1, 28.2] | 21.6% [17.5, 25.9] | 39.6% [34.2, 45.0] | 28.0% [25.7, 30.3] |
| False negative (white) | 47.3% [40.8, 54.0] | 46.1% [40.0, 52.7] | 56.6% [50.3, 63.5] | 47.7% [45.2, 50.2] |
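
The evaluation protocol described in the Table 2 caption (averaging a learned model over 1000 random 80%/20% splits, with a bootstrapped interval for the learned models and a binomial interval for COMPAS) could be sketched in Python as follows. Several parts are assumptions, not the authors' code: logistic regression is fit by plain batch gradient descent rather than any particular library, the data are synthetic stand-ins for two features (called age and priors here purely for illustration), the interval for the learned model is taken as the 2.5th and 97.5th percentiles of the per-split accuracies, and the binomial interval uses the normal (Wald) approximation.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Logistic regression fit by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def standardize(train, test):
    # Scale features with the TRAINING mean/stdev only, so the test split stays unseen
    cols = list(zip(*train))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

    return scale(train), scale(test)


def repeated_split_accuracy(X, y, n_splits=1000, train_frac=0.8, seed=0):
    """Mean test accuracy over random train/test splits and a 95% percentile interval."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        tr, te = idx[:cut], idx[cut:]
        Xtr, Xte = standardize([X[i] for i in tr], [X[i] for i in te])
        w, b = fit_logistic(Xtr, [y[i] for i in tr])
        correct = 0
        for xi, i in zip(Xte, te):
            # Linear score >= 0 is the same as predicted probability >= 0.5
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
            correct += pred == y[i]
        accs.append(correct / len(te))
    q = statistics.quantiles(accs, n=40, method="inclusive")  # q[0]: 2.5th, q[-1]: 97.5th
    return statistics.fmean(accs), (q[0], q[-1])


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) 95% binomial confidence interval for a proportion."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def make_synthetic(n, seed=1):
    """HYPOTHETICAL stand-in data with two features (age, prior count), not real records."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age, priors = rng.uniform(18, 70), rng.randint(0, 15)
        p = sigmoid(-0.05 * (age - 35) + 0.25 * (priors - 3))
        X.append([age, priors])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_synthetic(300)
mean_acc, (lo, hi) = repeated_split_accuracy(X, y, n_splits=20)  # 20 splits keeps the demo fast
compas_lo, compas_hi = wald_interval(0.654, 7214)  # COMPAS overall accuracy, n from the caption
```

Plugging the overall COMPAS accuracy (65.4%) and n = 7214 into `wald_interval` gives roughly [64.3, 66.5], which agrees with column (D); the caption does not say which binomial interval was used, so this is a consistency check rather than a confirmation. The demo runs 20 splits instead of 1000 only to keep it fast, and its accuracy on synthetic data says nothing about the real results.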