Masaki Kawakami, Shigehiro Karashima, Kento Morita, Hayato Tada, Hirofumi Okada, Daisuke Aono, Mitsuhiro Kometani, Akihiro Nomura, Masashi Demura, Kenji Furukawa, Takashi Yoneda, Hidetaka Nambo, Masa-Aki Kawashiri.
Abstract
Background: Atrial fibrillation (AF) is the most common arrhythmia and is associated with an increased risk of thromboembolic stroke and heart failure. Although various prediction models for AF risk have been developed using machine learning, their output cannot be accurately explained to doctors and patients. Therefore, we developed an explainable model with high interpretability and accuracy that accounts for the non-linear effects of clinical characteristics on AF incidence.
Keywords: Atrial fibrillation; General population; Generalized additive model; Machine learning; Prediction
Year: 2021; PMID: 35178483; PMCID: PMC8811230; DOI: 10.1253/circrep.CR-21-0151
Source DB: PubMed; Journal: Circ Rep; ISSN: 2434-0790
Figure 1. Process used to develop the machine learning model. The Kanazawa Medical Association (KMA) database containing 4,386 subjects with atrial fibrillation (AF) and 133,535 subjects with a normal electrocardiogram (ECG) was analyzed. The training dataset comprised subjects from 2009 to 2017, whereas the test dataset comprised subjects for whom data were obtained in 2018. After under-sampling and bootstrap sampling, the dataset was split into a training subset and an out-of-bag (OOB) subset. A generalized additive model (GAM) or generalized linear model (GLM) was constructed using the training data, and the importance score was calculated using the OOB data. In all, 100 small models were developed as an ensemble, and their generalization performance was evaluated using the test dataset.
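The sampling steps in Figure 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes under-sampling means drawing as many majority-class subjects as there are minority-class subjects, and that the OOB subset consists of the balanced subjects never drawn into the bootstrap sample. The function names and the toy label vector are hypothetical.

```python
import random

def undersample_and_bootstrap(labels, rng):
    """Return (train_idx, oob_idx) for one small model.

    labels: list of 0/1 values (1 = AF, 0 = normal ECG).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Under-sampling (assumed rule): keep as many majority-class
    # subjects as there are minority-class subjects.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    # Bootstrap: draw with replacement from the balanced pool.
    train_idx = [rng.choice(balanced) for _ in balanced]
    # Out-of-bag: balanced subjects that were never drawn.
    drawn = set(train_idx)
    oob_idx = [i for i in balanced if i not in drawn]
    return train_idx, oob_idx

def build_ensemble_splits(labels, n_models=100, seed=0):
    """One (train, OOB) split per small model in the ensemble."""
    rng = random.Random(seed)
    return [undersample_and_bootstrap(labels, rng) for _ in range(n_models)]

# Toy data: 20 AF subjects and 200 subjects with a normal ECG.
labels = [1] * 20 + [0] * 200
splits = build_ensemble_splits(labels)
```

Each of the 100 splits would then be used to fit one GAM or GLM on `train_idx` and to score feature importance on `oob_idx`.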
Table 1. Baseline Characteristics of Subjects in the Atrial Fibrillation (AF) and Normal Electrocardiogram (ECG) Groups in the Training and Test Datasets
| Variable | Training (n=137,921): AF | Training (n=137,921): normal ECG | Test (n=35,407): AF | Test (n=35,407): normal ECG |
|---|---|---|---|---|
| Age (years) | 80±8** | 70±12 | 79±8** | 71±11 |
| Male (%) | 59.9 | 31.2 | 63.0 | 32.1 |
| BMI (kg/m²) | 23.1±3.6** | 22.6±3.4 | 23.6±3.6** | 22.8±3.4 |
| WC (cm) | 84.7±9.8** | 82.8±9.7 | 86.1±10.3** | 83.4±9.6 |
| SBP (mmHg) | 126±16** | 127±16 | 125±16** | 127±16 |
| DBP (mmHg) | 72±11** | 73±10 | 73±11 | 73±10 |
| WBC (/μL) | 5,506±2,027** | 5,384±1,727 | 5,290±1,429 | 5,264±1,449 |
| RBC (×10⁴/μL) | 430±57 | 431±46 | 437±52 | 439±44 |
| Hemoglobin (g/dL) | 13.3±1.9** | 13.1±1.5 | 13.5±1.7** | 13.4±1.4 |
| Hematocrit (%) | 40.3±5.1** | 39.7±4.0 | 40.9±4.7** | 40.5±3.8 |
| PLT (×10⁴/μL) | 18.9±5.5** | 22.5±5.9 | 19.8±5.4** | 23.3±5.7 |
| Cr (mg/dL) | 1.0±0.5** | 0.7±0.3 | 1.0±0.4** | 0.8±0.2 |
| eGFR (mL/min/1.73 m²) | 58.3±17.6** | 71.5±17.2 | 56.3±15.7** | 67.5±15.3 |
| UA (mg/dL) | 5.8±1.5** | 4.9±1.3 | 5.7±1.5** | 5.0±1.3 |
| AST (U/L) | 26.3±12.4** | 23.8±12.7 | 25.5±10.1** | 23.7±10.8 |
| ALT (U/L) | 20.5±12.5* | 20.4±14.5 | 20.1±11.0 | 20.7±13.0 |
| γ-GTP (U/L) | 52.7±82.5** | 32.6±50.8 | 45.9±81.6** | 31.9±46.1 |
| HbA1c (NGSP; %) | 5.7±0.7** | 5.5±0.6 | 6.0±0.7** | 5.8±0.6 |
| PG (mg/dL) | 112±40** | 102±30 | 112±37** | 101±27 |
| TC (mg/dL) | 180±35** | 201±34 | 180±32** | 202±34 |
| LDL-C (mg/dL) | 104±29** | 118±30 | 104±28** | 117±29 |
| HDL-C (mg/dL) | 54±16** | 60±15 | 56±14** | 62±16 |
| TG (mg/dL) | 114±73** | 119±74 | 107±60** | 119±76 |
| Urinary protein (%) | | | | |
| (−)/(±) | 62.6/16.3 | 80.5/11.9 | 65.4/16.9 | 81.8/11.3 |
| (+)/(2+)/(3+) | 12.6/6.5/0.5 | 5.1/1.9/0.1 | 11.2/3.8/1.6 | 4.7/1.4/0.4 |
| Urinary glucose (%) | | | | |
| (−)/(±) | 86.7/4.1 | 93.3/1.9 | 87.5/2.9 | 93.6/1.6 |
| (+)/(2+)/(3+) | 3.2/4.3/0.4 | 1.8/2.1/0.3 | 3.7/2.0/2.8 | 1.7/1.1/1.6 |
| Urine occult blood (%) | | | | |
| (−)/(±) | 60.0/18.9 | 66.4/16.7 | 57.7/20.4 | 65.0/17.1 |
| (+)/(2+)/(3+) | 11.6/7.7/0.4 | 9.5/6.4/0.4 | 12.1/6.0/2.6 | 10.1/5.6/1.8 |
| Past hypertension (%) | 59.3 | 38.8 | 63.6 | 41.4 |
| Past diabetes (%) | 16.9 | 10.7 | 18.4 | 11.2 |
| Past dyslipidemia (%) | 19.5 | 24.6 | 26.9 | 29.9 |
| Past stroke (%) | 23.4 | 6.6 | 19.8 | 5.5 |
| Past CAD (%) | 63.0 | 8.7 | 63.9 | 7.6 |
| Past CKD (%) | 2.9 | 0.8 | 3.1 | 1.1 |
| Past anemia (%) | 15.9 | 17.6 | 15.2 | 15.6 |
| Exam anemia (%) | 1.1 | 0.6 | 1.2 | 0.3 |
| Exam jaundice (%) | 0.1 | 0.0 | 0.0 | 0.0 |
| Exam arrhythmia (%) | 57.3 | 0.3 | 57.6 | 0.3 |
| Exam heart murmur (%) | 6.2 | 1.0 | 3.5 | 0.8 |
| Exam crackles (%) | 0.8 | 0.5 | 1.2 | 0.3 |
| Exam hepatomegaly (%) | 0.3 | 0.1 | 0.3 | 0.1 |
| Exam edema (%) | 9.4 | 2.3 | 8.2 | 2.0 |
| Exam cervical tumors (%) | 0.4 | 1.0 | 1.0 | 1.0 |
| Exam neuropathy (%) | 2.9 | 1.4 | 1.7 | 0.8 |
| Exam malnutrition (%) | 0.6 | 0.3 | 0.5 | 0.1 |
| Exam other (%) | 8.7 | 4.7 | 5.8 | 3.0 |
Data are given as the mean±SD or as percentages. *P<0.05, **P<0.001 compared with the normal ECG group. ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; CAD, coronary artery disease; CKD, chronic kidney disease; Cr, creatinine; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; Exam, physical examination; γ-GTP, γ-glutamyl transpeptidase; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; NGSP, National Glycohemoglobin Standardization Program; Past, past history; PG, plasma glucose; PLT, platelet count; RBC, red blood cell count; SBP, systolic blood pressure; TC, total cholesterol; TG, triglycerides; UA, uric acid; WBC, white blood cell count; WC, waist circumference.
Table 2. Feature Importance Ranking and Scores Calculated on the Basis of Area Under the Curve (AUC), Sensitivity, and Specificity
| Ranking | Feature (by AUC) | Score | Feature (by sensitivity) | Score | Feature (by specificity) | Score |
|---|---|---|---|---|---|---|
| 1 | Exam arrhythmia | 0.429 | Exam arrhythmia | 0.149 | Exam arrhythmia | 0.454 |
| 2 | Past CAD | 0.177 | Past CAD | 0.121 | Past CAD | 0.144 |
| 3 | Age | 0.073 | Age | 0.088 | Hematocrit | 0.051 |
| 4 | Hematocrit | 0.054 | Hematocrit | 0.041 | Age | 0.034 |
| 5 | γ-GTP | 0.030 | γ-GTP | 0.035 | Hemoglobin | 0.029 |
| 6 | Cr | 0.029 | Cr | 0.032 | ALT | 0.024 |
| 7 | Hemoglobin | 0.028 | Hemoglobin | 0.024 | SBP | 0.023 |
| 8 | SBP | 0.021 | HbA1c | 0.023 | Cr | 0.022 |
| 9 | ALT | 0.018 | UA | 0.020 | γ-GTP | 0.021 |
| 10 | HbA1c | 0.012 | TC | 0.019 | Exam no symptoms | 0.016 |
| 11 | TG | 0.011 | SBP | 0.018 | TG | 0.014 |
| 12 | TC | 0.010 | Past stroke | 0.018 | DBP | 0.011 |
| 13 | UA | 0.010 | AST | 0.016 | eGFR | 0.010 |
| 14 | AST | 0.009 | PLT | 0.016 | TC | 0.010 |
| 15 | DBP | 0.008 | HDL-C | 0.016 | AST | 0.009 |
| 16 | eGFR | 0.008 | eGFR | 0.015 | UA | 0.008 |
| 17 | Past stroke | 0.007 | UP (1+, 2+, 3+) | 0.015 | Sex | 0.008 |
| 18 | RBC | 0.006 | ALT | 0.015 | WC | 0.008 |
| 19 | PLT | 0.006 | RBC | 0.014 | LDL-C | 0.008 |
| 20 | UP (1+, 2+, 3+) | 0.006 | BMI | 0.014 | HbA1c | 0.007 |
| 21 | Exam no symptoms | 0.005 | Past hypertension | 0.014 | RBC | 0.006 |
| 22 | HDL-C | 0.005 | DBP | 0.013 | PLT | 0.006 |
| 23 | BMI | 0.004 | UG | 0.013 | UP (1+, 2+, 3+) | 0.006 |
| 24 | UP(−) | 0.004 | UP(±) | 0.013 | PG | 0.006 |
| 25 | LDL-C | 0.004 | TG | 0.013 | UP(−) | 0.006 |
| 26 | Past dyslipidemia | 0.004 | Past dyslipidemia | 0.012 | Past stroke | 0.005 |
| 27 | WC | 0.003 | UP(−) | 0.012 | Past diabetes | 0.005 |
| 28 | UP(±) | 0.003 | UOB | 0.012 | Past dyslipidemia | 0.005 |
| 29 | Sex | 0.003 | LDL-C | 0.011 | UOB | 0.004 |
| 30 | PG | 0.002 | WBC | 0.011 | UP(±) | 0.004 |
| 31 | UOB | 0.002 | Past anemia | 0.011 | BMI | 0.004 |
| 32 | Past diabetes | 0.002 | Exam anemia | 0.011 | HDL-C | 0.004 |
| 33 | UG | 0.001 | Exam cervical | 0.011 | Exam edema | 0.003 |
| 34 | Past hypertension | 0.001 | Exam crackles | 0.011 | Exam heart murmur | 0.002 |
| 35 | Exam edema | 0.001 | Exam malnutrition | 0.011 | WBC | 0.002 |
| 36 | Exam heart murmur | <0.001 | Exam jaundice | 0.011 | Exam others | 0.002 |
| 37 | Exam anemia | <0.001 | Exam neuropathy | 0.011 | Exam crackles | 0.002 |
| 38 | Exam crackles | <0.001 | Exam hepatomegaly | 0.011 | Exam anemia | 0.002 |
| 39 | WBC | <0.001 | Exam heart murmur | 0.011 | Exam jaundice | 0.002 |
| 40 | Past anemia | <0.001 | Past CKD | 0.010 | Exam neuropathy | 0.002 |
| 41 | Exam jaundice | <0.001 | Past diabetes | 0.010 | Exam malnutrition | 0.002 |
| 42 | Exam neuropathy | <0.001 | Exam edema | 0.010 | Past anemia | 0.002 |
| 43 | Exam others | <0.001 | Exam others | 0.010 | Exam cervical tumors | 0.002 |
| 44 | Exam malnutrition | <0.001 | Sex | 0.009 | Past CKD | 0.002 |
| 45 | Exam cervical tumors | <0.001 | WC | 0.009 | Exam hepatomegaly | 0.002 |
| 46 | Past CKD | <0.001 | PG | 0.009 | UG | 0.002 |
| 47 | Exam hepatomegaly | <0.001 | Exam no symptoms | <0.001 | Past hypertension | <0.001 |
UG, urinary glucose; UOB, urine occult blood; UP, urinary protein. Other abbreviations as in Table 1.
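One common way to obtain an AUC-based importance score on held-out data is permutation importance: shuffle one feature column and record how much the AUC drops. The record does not state which scoring rule the authors used, so the sketch below is an assumption for illustration only; `auc`, `permutation_importance`, and the toy data are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Drop in AUC when each feature column is shuffled (assumed scoring rule)."""
    rng = random.Random(seed)
    base = auc([predict(r) for r in rows], labels)
    drops = []
    for j in range(n_features):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        # Rebuild every row with feature j replaced by its shuffled value.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
        drops.append(base - auc([predict(r) for r in shuffled], labels))
    return drops

# Toy held-out data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.9, 5.0], [0.8, 1.0], [0.7, 3.0], [0.3, 4.0], [0.2, 2.0], [0.1, 6.0]]
labels = [1, 1, 1, 0, 0, 0]
predict = lambda r: r[0]  # stand-in model that only uses feature 0
drops = permutation_importance(predict, rows, labels, n_features=2)
```

Because the stand-in model ignores feature 1, shuffling it leaves the AUC unchanged, whereas shuffling feature 0 can only lower it. Replacing `auc` with a sensitivity or specificity function would give the other two rankings in the same way.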
Figure 2. The area under the receiver operating characteristic curve (AUC), with sensitivity and specificity, for the prediction of atrial fibrillation (AF) with the generalized additive model (GAM; blue line) and the generalized linear model (GLM; orange line).
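For readers less familiar with the three metrics in Figure 2, the sketch below shows how they relate: sensitivity and specificity are computed at one decision threshold, and sweeping the threshold traces the ROC curve whose area is the AUC. The scores and labels are made-up toy values, not study data, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called AF."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1.0 - spec, sens))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy predicted risks for 3 AF subjects (label 1) and 3 normal ECG subjects (label 0).
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = roc_auc(scores, labels)
```

With these toy values, two of three AF subjects and two of three normal ECG subjects are classified correctly at a threshold of 0.5, and the area works out to 8/9.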
Figure 3. The probability of atrial fibrillation (AF), as determined by the generalized additive model (GAM; red lines) or generalized linear model (GLM; blue lines), according to 9 clinical variables: (A) arrhythmia on physical examination (Exam), (B) past coronary artery disease (CAD), (C) age, (D) hematocrit, (E) γ-glutamyl transpeptidase (γ-GTP), (F) creatinine, (G) hemoglobin, (H) systolic blood pressure (SBP), and (I) HbA1c. Arrhythmia on Exam (A) and past CAD (B) are regarded as binary variables, with trends shown as the mean±SD. The remaining parameters (C–I) are regarded as continuous variables, with trends indicated by solid lines. The histograms show the data distribution for each parameter (lighter shading indicates subjects with a normal electrocardiogram [ECG] and darker shading indicates AF subjects). The left axes show the AF risk transformed by the sigmoid function, and the right axes show the data distribution. The closer the AF risk is to 1, the higher the likelihood of subclinical AF; conversely, the closer the risk is to 0, the less likely it is that the parameter is associated with AF. Trends in ranges containing few data points may be unreliable. In the GAM, suspicion of arrhythmia on Exam or a medical history of CAD significantly increases the probability of AF compared with no arrhythmia on Exam and no medical history of CAD, respectively. The probability of AF increased with increasing age and hematocrit. The relationship between the probability of AF and γ-GTP, creatinine, and HbA1c values was almost parallel between the GAM and GLM, whereas there was a steady downward trend in the relationship between AF probability and increasing hemoglobin and SBP.
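The difference between the two curves in each panel can be illustrated with a small sketch. In a GLM the log-odds of AF change linearly with a variable, whereas in a GAM they follow a flexible shape function; in both cases the sigmoid maps log-odds to a risk between 0 and 1. The code assumes a piecewise-linear shape function as a simple stand-in for the smooth terms a real GAM would fit, and every coefficient and knot value is invented for illustration rather than taken from the study.

```python
import math

def sigmoid(z):
    """Map log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def glm_risk(x, intercept, slope):
    """GLM: log-odds are a straight line in x."""
    return sigmoid(intercept + slope * x)

def shape(x, knots):
    """Piecewise-linear shape function through (x, log-odds) knots.

    A stand-in for a fitted smooth term; values outside the knot
    range are held constant at the nearest knot.
    """
    if x <= knots[0][0]:
        return knots[0][1]
    if x >= knots[-1][0]:
        return knots[-1][1]
    for (x0, f0), (x1, f1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return f0 + (f1 - f0) * (x - x0) / (x1 - x0)

def gam_risk(x, intercept, knots):
    """GAM: log-odds follow the shape function f(x)."""
    return sigmoid(intercept + shape(x, knots))

# Hypothetical age effect: risk rises slowly until 60 and faster afterwards.
age_knots = [(40, -2.0), (60, -1.0), (80, 1.5)]
risk_gam_70 = gam_risk(70, 0.0, age_knots)
risk_glm_70 = glm_risk(70, -5.0, 0.07)
```

Plotting `gam_risk` and `glm_risk` over a range of ages would give one bent curve and one smooth sigmoid, analogous to the red and blue lines in panels C to I.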