Ravi B Parikh, Christopher Manz, Corey Chivers, Susan Harkness Regli, Jennifer Braun, Michael E Draugelis, Lynn M Schuchter, Lawrence N Shulman, Amol S Navathe, Mitesh S Patel, Nina R O'Connor.
Abstract
Importance: Machine learning algorithms could identify patients with cancer who are at risk of short-term mortality. However, it is unclear how different machine learning algorithms compare and whether they could prompt clinicians to have timely conversations about treatment and end-of-life preferences.
Year: 2019 PMID: 31651973 PMCID: PMC6822091 DOI: 10.1001/jamanetworkopen.2019.15997
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Patient Characteristics, Stratified by Death Status Within 6 Months of the Index Encounter
| Characteristic | Alive at 6 mo (n = 25 460), No. (%) | Died at 6 mo (n = 1065), No. (%) |
|---|---|---|
| Age, mean (95% CI), y | 61.3 (61.1-61.5) | 67.3 (66.5-68.0) |
| Race/ethnicity | | |
| White | 18 920 (74.3) | 767 (72.0) |
| Black | 4163 (16.4) | 191 (17.9) |
| Asian | 535 (2.1) | 16 (1.5) |
| Hispanic, white | 346 (1.4) | 14 (1.3) |
| Hispanic, black | 96 (0.4) | 3 (0.3) |
| East Indian | 83 (0.3) | 1 (0.1) |
| Pacific Islander | 38 (0.1) | 2 (0.2) |
| American Indian | 28 (0.1) | 2 (0.2) |
| Other | 584 (2.3) | 30 (2.8) |
| Unknown | 659 (2.6) | 39 (3.7) |
| Women | 15 922 (62.5) | 500 (47.0) |
| Selected comorbidities | | |
| Hypertension | 8600 (33.8) | 472 (44.3) |
| Renal failure | 1891 (7.4) | 151 (14.2) |
| COPD | 3631 (14.3) | 227 (21.3) |
| Congestive heart failure | 1536 (6.0) | 141 (13.2) |
| Fluid and electrolyte disorders | 4526 (17.8) | 417 (39.2) |
| Most recent laboratory values, mean (95% CI) | | |
| Hemoglobin, g/dL | 12.2 (12.1-12.2) | 11.0 (10.9-11.1) |
| Platelets, ×10³/μL | 227.1 (226.1-228.1) | 229.8 (222.4-237.3) |
| White blood cells, /μL | 7.0 (6.9-7.1) | 8.0 (7.6-8.4) |
| Creatinine, mg/dL | 0.95 (0.93-0.98) | 1.03 (0.98-1.08) |
| Total calcium, mg/dL | 9.3 (9.3-9.3) | 9.2 (9.1-9.2) |
| ALT, U/L | 20.0 (19.7-20.2) | 26.7 (24.3-29.0) |
| Total bilirubin, mg/dL | 0.55 (0.55-0.56) | 0.83 (0.70-0.97) |
| Alkaline phosphatase, U/L | 77.1 (76.6-77.7) | 122.3 (114.5-130.0) |
| Albumin, g/dL | 4.0 (4.0-4.0) | 3.7 (3.6-3.7) |
Abbreviations: ALT, alanine aminotransferase; COPD, chronic obstructive pulmonary disease.
SI conversion factors: To convert hemoglobin to g/L, multiply by 10.0; platelet count to ×10⁹/L, multiply by 1.0; white blood cell count to ×10⁹/L, multiply by 0.001; creatinine to μmol/L, multiply by 76.25; total calcium to mmol/L, multiply by 0.25; ALT to μkat/L, multiply by 0.0167; total bilirubin to μmol/L, multiply by 17.104; alkaline phosphatase to μkat/L, multiply by 0.0167; and albumin to g/L, multiply by 10.
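As a minimal sketch of how the conversion factors in the note above are applied, the snippet below converts a few of the mean laboratory values from the table to SI units. It covers only a subset of the listed factors, and the names `SI_FACTORS` and `to_si` are hypothetical helpers introduced for illustration, not part of the study.

```python
# Conventional-to-SI conversion factors, copied from the table note above
# (a subset only). Each entry is (multiplier, SI unit).
SI_FACTORS = {
    "hemoglobin": (10.0, "g/L"),             # from g/dL
    "total_calcium": (0.25, "mmol/L"),       # from mg/dL
    "alt": (0.0167, "μkat/L"),               # from U/L
    "total_bilirubin": (17.104, "μmol/L"),   # from mg/dL
    "alkaline_phosphatase": (0.0167, "μkat/L"),  # from U/L
    "albumin": (10.0, "g/L"),                # from g/dL
}


def to_si(analyte, value):
    """Multiply a conventional-unit value by its factor; return (value, unit)."""
    factor, unit = SI_FACTORS[analyte]
    return value * factor, unit


# Mean values for patients alive at 6 months, taken from the table above.
for analyte, conventional in [("hemoglobin", 12.2), ("albumin", 4.0), ("total_calcium", 9.3)]:
    si_value, unit = to_si(analyte, conventional)
    print(f"{analyte}: {conventional} -> {si_value:.3f} {unit}")
```

Keeping the factor and the target unit together in one lookup makes it harder to apply a factor to the wrong analyte.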
Performance Metrics of Machine Learning Models
| Algorithm | Positive Predictive Value | AUC | Accuracy | Specificity |
|---|---|---|---|---|
| Random forest | 0.513 | 0.88 | 0.96 | 0.99 |
| Gradient boosting classifier | 0.494 | 0.87 | 0.96 | 0.99 |
| Logistic regression | 0.447 | 0.86 | 0.95 | 0.99 |
Abbreviation: AUC, area under the receiver operating characteristic curve.
Positive predictive value, accuracy, and specificity were determined by setting the alert rate in the test set for each algorithm to 0.02. At this prespecified alert rate, the 6-month mortality risk threshold was 0.27 for the random forest model; 0.15 for the gradient boosting model; and 0.33 for the logistic regression model.
Coprimary performance metric.
Refers to the best-performing model(s) for each performance metric.
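The footnote above says that positive predictive value, accuracy, and specificity were computed after fixing the alert rate at 0.02, which in turn determines each model's risk threshold. The following is a minimal sketch of that idea, not the authors' code: the function names are hypothetical, and the scores and outcomes are synthetic data generated only to make the example runnable.

```python
import random


def threshold_for_alert_rate(scores, alert_rate):
    """Return the score cutoff at which roughly `alert_rate` of patients are flagged."""
    n_alerts = max(1, round(alert_rate * len(scores)))
    ranked = sorted(scores, reverse=True)
    # The n-th highest score becomes the threshold (flag if score >= threshold).
    return ranked[n_alerts - 1]


def metrics_at_threshold(scores, died, threshold):
    """Confusion-matrix metrics when patients with score >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, died):
        flagged = score >= threshold
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return {
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(scores),
        "alert_rate": (tp + fp) / len(scores),
    }


# Synthetic cohort (assumption): about 4% mortality, with higher predicted
# risk on average among patients who died. These are NOT the study's data.
rng = random.Random(0)
died = [rng.random() < 0.04 for _ in range(10_000)]
scores = [min(1.0, max(0.0, rng.gauss(0.35 if d else 0.05, 0.1))) for d in died]

threshold = threshold_for_alert_rate(scores, alert_rate=0.02)
print(threshold, metrics_at_threshold(scores, died, threshold))
```

Because the threshold is derived from the ranking of predicted risks, models with differently calibrated scores can end up with quite different thresholds at the same alert rate, as the three thresholds reported in the footnote illustrate.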
Figure. Observed 180-Day Survival for Random Forest Model
The risk threshold for the random forest model was determined by setting the alert rate to 0.02, which corresponds to a predicted 180-day mortality risk of 27%. Shaded areas indicate 95% CIs.