Roland Albert A Romero, Mariefel Nicole Y Deypalan, Suchit Mehrotra, John Titus Jungao, Natalie E Sheils, Elisabetta Manduchi, Jason H Moore.
Abstract
OBJECTIVES: To ascertain and compare the performance of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.
Keywords: AutoML; Automated machine learning; Class imbalance; Healthcare; Machine learning; Medical claims
Year: 2022 PMID: 35883154 PMCID: PMC9327416 DOI: 10.1186/s13040-022-00300-2
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 4.079
Definitions for flagging disease outcomes and their respective prevalences in the final cohort. Abbreviations used: Chronic Kidney Disease (CKD), Type 2 Diabetes (T2D), Inflammatory Bowel Disease (IBD), Rheumatoid Arthritis (RA), International Classification of Diseases, Tenth Revision (ICD-10)
| Disease | ICD-10 Code | Definition | Prevalence | Number of cases |
|---|---|---|---|---|
| Lung Cancer | C34 | Two lung cancer claims at least 30 days apart, no history of any cancer | 0.053% | 6,539 |
| Rheumatoid Arthritis (RA) | M05, M06 (Except M064) | At least one RA claim* | 0.10% | 12,174 |
| Prostate Cancer | C61 | Two prostate cancer claims at least 30 days apart, no history of any cancer | 0.12% | 14,925 |
| Type 2 Diabetes (T2D) | E11 | Two T2D claims at least 30 days apart | 0.59% | 73,540 |
| Inflammatory Bowel Disease (IBD) | K51, K52 | Two IBD claims at least one day apart | 0.32% | 39,502 |
| Chronic Kidney Disease (CKD) | N18 | Two CKD claims at least 30 days apart | 0.63% | 78,786 |
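The "two claims at least N days apart" definitions above can be sketched as a claims-level filter. This is an illustrative sketch, not the authors' code; the `claims` DataFrame and its column names (`member_id`, `icd10`, `date`) are assumptions:

```python
# Hedged sketch: flag members with two claims for a diagnosis at least
# `min_days` apart, per the cohort definitions in the table above.
import pandas as pd

def flag_two_claims_apart(claims: pd.DataFrame, code_prefix: str,
                          min_days: int = 30) -> pd.Series:
    """Return member_ids whose earliest and latest matching claims are >= min_days apart."""
    hits = claims[claims["icd10"].str.startswith(code_prefix)]
    # Span in days between a member's first and last matching claim
    span = hits.groupby("member_id")["date"].agg(lambda d: (d.max() - d.min()).days)
    return span[span >= min_days].index.to_series(name="member_id")

# Toy example: member 1 meets the T2D (E11) definition, member 2 does not
claims = pd.DataFrame({
    "member_id": [1, 1, 2, 2, 3],
    "icd10": ["E11.9", "E11.65", "E11.9", "E11.9", "C34.1"],
    "date": pd.to_datetime(["2018-01-01", "2018-03-01",
                            "2018-01-01", "2018-01-15", "2018-02-01"]),
})
flagged = flag_two_claims_apart(claims, "E11")
```

The "no history of any cancer" exclusions for lung and prostate cancer would be an additional filter on prior claims, omitted here for brevity.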
Time periods for creating feature flags
| Time window | Start date | End date |
|---|---|---|
| 1 | Oct 1 2018 | Dec 31 2018 |
| 2 | Jul 1 2018 | Sep 30 2018 |
| 3 | Jan 1 2018 | Jun 30 2018 |
| 4 | Jan 1 2016 | Dec 31 2017 |
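The four windows above can be turned into per-member binary features. A minimal sketch, assuming a flag simply records whether the member had any claim in the window (the actual feature construction may differ):

```python
# Illustrative sketch: one 0/1 flag per time window, using the four
# windows from the table above.
import pandas as pd

WINDOWS = {  # window id -> (start, end), inclusive
    1: ("2018-10-01", "2018-12-31"),
    2: ("2018-07-01", "2018-09-30"),
    3: ("2018-01-01", "2018-06-30"),
    4: ("2016-01-01", "2017-12-31"),
}

def window_flags(claims: pd.DataFrame) -> pd.DataFrame:
    """One row per member, one 0/1 column per time window with any claim."""
    out = {}
    for w, (start, end) in WINDOWS.items():
        mask = claims["date"].between(start, end)
        out[f"window_{w}"] = claims.loc[mask].groupby("member_id").size()
    # Members absent from a window get NaN counts -> flag 0
    return pd.DataFrame(out).notna().astype(int)

# Toy example: member 1 has a claim in window 1, member 2 in window 4
claims = pd.DataFrame({
    "member_id": [1, 2],
    "date": pd.to_datetime(["2018-11-01", "2016-05-01"]),
})
flags = window_flags(claims)
```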
Fig. 1 Flowchart of the framework for benchmarking AutoML tools, adapted from Gijsbers et al.
Fig. 2 ROC AUC performance of different AutoML models trained for various disease outcomes, from stratified bootstrap samples. Median values are indicated by diamond markers and 95% CIs by lines
Fig. 3 AUCPR performance of different AutoML models trained for various disease outcomes, from stratified bootstrap samples. Median values are indicated by diamond markers and 95% CIs by lines
Median ROC AUC scores for different AutoML models, scaled by the median random forest performance. The best-performing model for each disease is indicated in bold
| Metric: ROC AUC | Lung Cancer | Prostate Cancer | Rheumatoid Arthritis | Type 2 Diabetes | IBD | CKD |
|---|---|---|---|---|---|---|
| AutoSklearn (Average Precision) | 1.107 | 1.091 | 1.072 | 1.081 | 1.022 | 1.039 |
| AutoSklearn (Balanced Accuracy) | 1.124 | 1.097 | 1.082 | 1.069 | 1.042 | 1.034 |
| AutoSklearn (ROC AUC) | 1.152 | 1.091 | 1.041 | | | |
| H2O (AUC) | 1.107 | 1.107 | 1.042 | 1.042 | | |
| H2O (AUCPR) | 1.104 | 1.106 | 1.042 | | | |
| Random Forest | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| TPOT (Average Precision) | 1.144 | 1.053 | 1.055 | 1.056 | 1.037 | 1.012 |
| TPOT (Balanced Accuracy) | 1.071 | 1.013 | 1.058 | 1.003 | 1.032 | 1.002 |
| TPOT (ROC AUC) | 1.128 | 1.103 | 1.084 | 1.075 | 1.040 | 1.013 |
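The scaling used in these tables (each model's median bootstrap score divided by the random forest median, so random forest maps to 1.000) can be sketched as follows; this is our own reconstruction from the captions, not the authors' code:

```python
# Sketch: scale each model's median bootstrap score by the baseline
# (random forest) median, as in the tables above.
import numpy as np

def scale_to_baseline(scores: dict, baseline: str = "Random Forest") -> dict:
    """Map each model to median(model scores) / median(baseline scores)."""
    base = np.median(scores[baseline])
    return {m: round(float(np.median(s) / base), 3) for m, s in scores.items()}

# Toy bootstrap scores (illustrative values only)
scores = {
    "Random Forest": [0.70, 0.72, 0.71],
    "TPOT (ROC AUC)": [0.80, 0.79, 0.81],
}
scaled = scale_to_baseline(scores)  # baseline scales to 1.0 by construction
```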
Median AUCPR scores for different AutoML models, scaled by the median random forest performance. The best-performing model for each disease is indicated in bold
| Metric: Average Precision | Lung Cancer | Prostate Cancer | Rheumatoid Arthritis | Type 2 Diabetes | IBD | CKD |
|---|---|---|---|---|---|---|
| AutoSklearn (Average Precision) | 1.957 | 1.787 | 1.471 | 1.675 | 1.212 | 1.395 |
| AutoSklearn (Balanced Accuracy) | 2.043 | 1.260 | 1.647 | 1.608 | 1.259 | 1.300 |
| AutoSklearn (ROC AUC) | 1.870 | 2.102 | 1.353 | 1.592 | 1.235 | 1.337 |
| H2O (AUC) | 3.217 | 1.750 | | | | |
| H2O (AUCPR) | 1.961 | 1.420 | | | | |
| TPOT (Average Precision) | 2.565 | 1.346 | 1.147 | 1.650 | 1.200 | 1.160 |
| TPOT (Balanced Accuracy) | 0.696 | 0.457 | 1.324 | 1.033 | 0.976 | 0.984 |
| TPOT (ROC AUC) | 1.522 | 1.087 | 1.559 | 1.650 | 1.129 | 1.074 |
| Random Forest | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Fig. 4 Receiver operating characteristic (ROC) curves of models trained to predict different diseases. ROC curves are generated from prediction scores on the full validation set (N = 12,125,832)
Fig. 5 Average savings per person at different cut-off thresholds for the H2O (AUROC) model, under varying test costs. True positive costs are set at $84,000 and false negative costs at $300,000; false positive costs comprise only the test cost
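The cost model in Fig. 5 can be expressed as simple arithmetic. This is a hedged sketch of how such savings might be computed from the caption's figures, with the baseline (screening no one, so every true case incurs the $300,000 false-negative cost) being our assumption, not stated in the source:

```python
# Sketch of a per-person savings calculation at a score threshold, using
# the cost structure from the Fig. 5 caption: $84,000 per treated true
# positive, $300,000 per missed case, test cost only for false positives.
import numpy as np

def average_savings(y_true, y_score, threshold, test_cost,
                    tp_cost=84_000, fn_cost=300_000):
    """Average per-person savings vs. a no-screening baseline (assumed)."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_score) >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    # Flagged members are tested; detected cases are treated at tp_cost
    cost_with_model = tp * (tp_cost + test_cost) + fn * fn_cost + fp * test_cost
    cost_without = np.sum(y_true == 1) * fn_cost  # no screening: all cases missed
    return (cost_without - cost_with_model) / len(y_true)

savings = average_savings([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1],
                          threshold=0.5, test_cost=100)
```

Raising the threshold trades fewer false-positive test costs against more missed $300,000 cases, which is the trade-off the figure sweeps over.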