Matt Docherty1, Stephane A Regnier2, Gorana Capkun2, Maria-Magdalena Balp2, Qin Ye1, Nico Janssens2, Andreas Tietz2, Jürgen Löffler2, Jennifer Cai3, Marcos C Pedrosa2, Jörn M Schattenberg4.
Abstract
OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML).
Keywords: NAFLD; NASH; artificial intelligence; machine learning; non-alcoholic fatty liver disease
Year: 2021 PMID: 33684933 PMCID: PMC8200272 DOI: 10.1093/jamia/ocab003
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
NIDDK feature rank
| Feature | Rank | Relative Feature Importance |
|---|---|---|
| HbA1c | 1 | 100% |
| AST (units/L) | 2 | 86% |
| ALT (units/L) | 3 | 75% |
| Total protein (g/dl) | 4 | 71% |
| AST/ALT | 5 | 69% |
| BMI (kg/m2) | 6 | 66% |
| Triglycerides (mg/dl) | 7 | 64% |
| Height (cm) | 8 | 61% |
| Platelets (cell/μl) | 9 | 58% |
| WBC (1000 cells/μl) | 10 | 55% |
| Hematocrit (%) | 11 | 49% |
| Albumin (g/dl) | 12 | 42% |
| Hypertension | 13 | 16% |
| Gender | 14 | 12% |
Abbreviations: Hemoglobin A1C (HbA1c), alanine transaminase (ALT), aspartate transaminase (AST), white blood cell count (WBC), body mass index (BMI).
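The "Relative Feature Importance" column reads as if each score were scaled against the top-ranked feature (HbA1c = 100%). The record does not state the normalization, so the sketch below is an assumption about how such a column could be produced. The raw scores are hypothetical values chosen only so that the first three percentages match the table, and `relative_importance` is an illustrative helper name.

```python
def relative_importance(raw):
    """Rank features and scale raw importance scores so the largest is 100."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, rank, round(100 * score / top))
        for rank, (name, score) in enumerate(ranked, start=1)
    ]


# Hypothetical raw scores (NOT from the paper), for illustration only.
raw_scores = {"AST": 0.1032, "HbA1c": 0.120, "ALT": 0.090}
table = relative_importance(raw_scores)

for name, rank, pct in table:
    print(f"| {name} | {rank} | {pct}% |")
```

Scaling to the top feature keeps the ordering intact and makes the column easy to read, but the percentages do not sum to 100 and should not be read as shares of total importance.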
NIDDK feature values
| Laboratory test | NASH (N = 453) | Non-NASH (N = 251) | T-Test (P value) |
|---|---|---|---|
| HbA1c (%) | 6.3 ± 1.4 | 5.8 ± 1.1 | <.01 |
| AST (units/L) | 67.1 ± 44.3 | 44.9 ± 29.9 | <.01 |
| ALT (units/L) | 88.7 ± 60.2 | 57.3 ± 41.3 | <.01 |
| Total Protein (g/dl) | 7.4 ± 0.5 | 7.1 ± 0.6 | <.01 |
| AST/ALT | 0.8 ± 0.3 | 0.9 ± 0.5 | .13 |
| BMI (kg/m2) | 34.0 ± 5.4 | 33.4 ± 5.8 | .16 |
| Triglycerides (mg/dl) | 189 ± 112 | 155 ± 86 | <.01 |
| Height (cm) | 167 ± 9 | 169 ± 9 | .01 |
| Platelets (cell/µl) | 233 877 ± 65 296 | 239 905 ± 70 449 | .24 |
| WBC (1000 cells/µl) | 7.1 ± 1.8 | 6.7 ± 1.7 | .01 |
| Hematocrit (%) | 41.8 ± 3.6 | 42.0 ± 3.7 | .45 |
| Albumin (g/dl) | 4.3 ± 0.4 | 4.2 ± 0.4 | <.01 |
Abbreviations: Hemoglobin A1C (HbA1c), alanine transaminase (ALT), aspartate transaminase (AST), white blood cell count (WBC), body mass index (BMI).
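The t-test column can be roughly checked from the summary statistics alone. The sketch below assumes an unequal-variance (Welch) t-test and approximates the two-sided P value with the normal distribution, which is reasonable at these sample sizes. The record does not say which t-test variant was used, and the means and SDs are rounded, so the results will only approximate the reported values. `welch_t` is an illustrative helper name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, p) for a two-sample Welch t-test from summary statistics.

    The two-sided p-value uses a normal approximation to the t distribution,
    which is an assumption that holds well only for large samples.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / standard_error
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Values taken from the table: NASH (N = 453) vs. non-NASH (N = 251).
t_hba1c, p_hba1c = welch_t(6.3, 1.4, 453, 5.8, 1.1, 251)
t_hct, p_hct = welch_t(41.8, 3.6, 453, 42.0, 3.7, 251)

print(f"HbA1c:      t = {t_hba1c:.2f}, p = {p_hba1c:.2g}")
print(f"Hematocrit: t = {t_hct:.2f}, p = {p_hct:.2f}")
```

Under these assumptions HbA1c comes out clearly significant and hematocrit clearly not, in line with the reported <.01 and .45.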
Model performance based on number of features (14 and 5 features) and method used (NIDDK test dataset)
| Performance | Logistic Regression | CART | Random Forest | XGBoost |
|---|---|---|---|---|
| 14-Feature Model | | | | |
| AUC | 77% | 72% | 82% | 82% |
| Accuracy | 73% | 70% | 75% | 75% |
| Precision | 79% | 76% | 80% | 81% |
| Sensitivity | 79% | 78% | 82% | 81% |
| 5-Feature Model | | | | |
| AUC | 75% | 73% | 78% | 79% |
| Accuracy | 73% | 69% | 70% | 74% |
| Precision | 78% | 77% | 77% | 80% |
| Sensitivity | 80% | 75% | 77% | 80% |
Abbreviations: Classification and regression trees (CART), area under the curve (AUC).
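For reference, accuracy, precision, and sensitivity in the table above follow the standard confusion-matrix definitions. The sketch below computes them, together with specificity and F1, from raw counts. The counts are hypothetical (the record does not give a confusion matrix) and were chosen only so that precision and sensitivity both come out at 81%, as in the XGBoost 14-feature column.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)          # share of predicted NASH that is NASH
    sensitivity = tp / (tp + fn)        # share of true NASH that is detected
    specificity = tn / (tn + fp)        # share of non-NASH correctly ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts (NOT from the paper), for illustration only.
metrics = classification_metrics(tp=81, fp=19, tn=37, fn=19)

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Because F1 is the harmonic mean of precision and sensitivity, equal values of 81% for both necessarily give an F1 of 81%.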
Figure 1. Model performance in NASH prediction using NIDDK data. Area under the curve (AUC), false positive rate (FPR).
Figure 2. Model performance in NASH prediction using Optum data. Area under the curve (AUC), false positive rate (FPR).
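The AUC reported in these figures can be read as the probability that a randomly chosen NASH patient receives a higher predicted score than a randomly chosen non-NASH patient. The sketch below computes it that way, by pairwise comparison with ties counted as one half. This is a generic illustration, not the authors' evaluation code, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = NASH, 0 = non-NASH) and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples. It is fine for a sketch, but a rank-based implementation is preferable for large cohorts.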
Comparison of reported model performance
| Performance | | | | | | |
|---|---|---|---|---|---|---|
| Task | NASH vs. non-NASH | NAFLD vs. non-NAFLD | NAFLD vs. non-NAFLD | NASH vs. non-NASH | NASH vs. Healthy | NASH vs. NAFL vs. Healthy |
| Cohort | NAFLD | Taiwanese high-tech workers | T2DM and non-T2DM at high risk | Obese with NAFLD | NAFLD and Healthy | Greek NAFLD and Healthy |
| Number of features | 14 | 8 | 9 | 5 | 23 | 29 |
| AUC | 82% | – | 82% | 70% | 88% | 95% |
| Accuracy | 75% | 87% | 74% | – | 80% | 88% |
| Precision | 81% | – | – | – | 81% | – |
| Sensitivity | 81% | 90% | 74% | – | 77% | 89% |
| Specificity | 66% | 81% | 73% | – | – | 94% |
| F1 Score | 81% | 70% | 79% |
Abbreviation: Type 2 Diabetes Mellitus (T2DM).
Notes: 14-feature model on NIDDK data; male data; model 3 in IMI DIRECT; 29-lipid non-linear SVM OvR model with healthy >27.5 BMI.