Shigeo Muro, Masato Ishida, Yoshiharu Horie, Wataru Takeuchi, Shunki Nakagawa, Hideyuki Ban, Tohru Nakagawa, Tetsuhisa Kitamura.
Abstract
BACKGROUND: Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk factor. However, not all long-term smokers develop COPD, meaning that other risk factors exist.
Keywords: Gradient Boosting Decision Tree; airflow limitation; chronic obstructive pulmonary disease; logistic regression; medical check-up
Year: 2021 PMID: 34255684 PMCID: PMC8293159 DOI: 10.2196/24796
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Flow diagram of the study. COPD: chronic obstructive pulmonary disease.
Subject characteristics stratified by chronic obstructive pulmonary disease status.
| Characteristic | Non-COPD<sup>a</sup> (n=23,326) | COPD (n=1489) | P value |
| Age (years), mean (SD) | 42 (9.1) | 48 (9.3) | <.001 |
| Female, n (%) | 3841 (16.5%) | 58 (3.9%) | <.001 |
| Smoking status, n (%) | | | <.001 |
| Current smoker | 10,632 (45.6%) | 1021 (68.6%) | |
| Exsmoker | 3534 (15.2%) | 202 (13.6%) | |
| Nonsmoker | 9153 (39.3%) | 266 (17.9%) | |
| Unknown/missing | 7 (0.0%) | 0 (0.0%) | |
| BMI (kg/m²), mean (SD) | 23 (3.2) | 22 (2.7) | <.001 |
| Prebronchodilator FEV1<sup>b</sup> | 3.4 (0.7) | 3.1 (0.6) | <.001 |
| Prebronchodilator FVC<sup>c</sup> | 4.1 (0.8) | 4.2 (0.8) | <.001 |
| Prebronchodilator FEV1/FVC | 83.7 (5.4) | 74.9 (5.1) | <.001 |
| Arrhythmia | 107 (0.5%) | 16 (1.1%) | .003 |
| Duodenal ulcer | 158 (0.7%) | 19 (1.3%) | .02 |
| Colorectal polyp | 43 (0.2%) | 13 (0.9%) | <.001 |
| Angina | 56 (0.2%) | 10 (0.7%) | .006 |
| Stomach ulcer | 180 (0.8%) | 29 (1.9%) | <.001 |
| Kidney disease | 77 (0.3%) | 12 (0.8%) | .01 |
| Bulla, bleb | 108 (0.5%) | 31 (2.1%) | <.001 |
| Moderate emphysema | 18 (0.1%) | 13 (0.9%) | <.001 |
| Mild emphysema | 96 (0.4%) | 27 (1.8%) | <.001 |
| Calcification of left anterior descending coronary artery | 128 (0.5%) | 16 (1.1%) | .02 |
| Chronic inflammation | 342 (1.5%) | 43 (2.9%) | <.001 |
| Albumin (U/L) | 4.4 (0.2) | 4.3 (0.2) | <.001 |
| Alanine aminotransferase (U/L) | 209.5 (53.7) | 215.2 (54.6) | <.001 |
| Aspartate aminotransferase (U/L) | 26.4 (14.8) | 24.3 (12.8) | <.001 |
| Blood urea nitrogen (mg/dL) | 14.1 (3.2) | 14.7 (3.3) | <.001 |
| Cholinesterase (U/L) | 320.8 (60.1) | 307.8 (58.8) | <.001 |
| Estimated glomerular filtration rate (mL/min/1.73 m²) | 83.6 (14.6) | 80.2 (14.1) | <.001 |
| Eosinophil count (cells/mm³) | 183.2 (124.5) | 195.4 (125.9) | <.001 |
| Gamma-glutamyl transferase (U/L) | 42.7 (34.4) | 45.8 (34.5) | <.001 |
| Hemoglobin (g/dL) | 14.7 (1.4) | 14.9 (1.1) | <.001 |
| Hemoglobin A1c (%) | 5.3 (0.7) | 5.4 (0.7) | <.001 |
| Hematocrit (%) | 44.0 (3.6) | 44.6 (3.1) | <.001 |
| MCH<sup>d</sup> (pg) | 30.5 (1.8) | 31.1 (1.7) | <.001 |
| MCHC<sup>e</sup> (g/L) | 33.4 (1.0) | 33.3 (0.8) | <.001 |
| MCV<sup>f</sup> (fL) | 91.3 (4.6) | 93.4 (4.4) | <.001 |
| WBC<sup>g</sup> count (×10² cells/µL) | 58.8 (15.0) | 62.8 (15.5) | <.001 |
<sup>a</sup>COPD: chronic obstructive pulmonary disease.
<sup>b</sup>FEV1: forced expiratory volume in 1 second.
<sup>c</sup>FVC: forced vital capacity.
<sup>d</sup>MCH: mean corpuscular hemoglobin.
<sup>e</sup>MCHC: mean corpuscular hemoglobin concentration.
<sup>f</sup>MCV: mean corpuscular volume.
<sup>g</sup>WBC: white blood cell.
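The smoking-status rows report column percentages (the share of each group with a given smoking status). As a reading aid, the same counts can be turned into the share of each smoking stratum that was classified as COPD. The sketch below uses only the counts from the table; the names `counts` and `copd_share` are illustrative, and the resulting shares are crude, unadjusted proportions within this sample rather than anything the authors report.

```python
# Counts copied from the subject-characteristics table: (non-COPD, COPD).
# The 7 subjects with unknown/missing smoking status are left out.
counts = {
    "Current smoker": (10632, 1021),
    "Exsmoker": (3534, 202),
    "Nonsmoker": (9153, 266),
}


def copd_share(non_copd: int, copd: int) -> float:
    """Fraction of a smoking stratum classified as COPD (crude, unadjusted)."""
    return copd / (non_copd + copd)


shares = {status: copd_share(*pair) for status, pair in counts.items()}

for status, share in shares.items():
    print(f"{status}: {share:.1%}")
```

Under these counts, the crude share is highest among current smokers and lowest among nonsmokers, yet nonsmokers still account for 266 of the 1489 COPD cases, which is consistent with the abstract's point that risk factors other than smoking exist.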
Figure 2. Diagnostic age for chronic obstructive pulmonary disease (COPD) according to smoking status.
Comparison of performance of the Gradient Boosting Decision Tree machine learning (XGBoost) and logistic regression models.
| Variable | XGBoost<sup>a</sup>: training, mean (SE) | XGBoost: test, mean | Logistic regression: training, mean (SE) | Logistic regression: test, mean |
| Positive predictive value | 0.505 (0.099) | 0.362 | 0.441 (0.110) | 0.285 |
| AUC<sup>b</sup> | 0.956 (0.015) | 0.898 | 0.943 (0.022) | 0.892 |
| Accuracy | 0.917 (0.032) | 0.918 | 0.884 (0.049) | 0.883 |
| Sensitivity | 0.845 (0.021) | 0.877 | 0.874 (0.039) | 0.901 |
| Specificity | 0.960 (0.016) | 0.919 | 0.946 (0.025) | 0.882 |
| F-measure | 0.370 (0.107) | 0.513 | 0.306 (0.110) | 0.434 |
<sup>a</sup>XGBoost: Gradient Boosting Decision Tree machine learning.
<sup>b</sup>AUC: area under the receiver operating characteristic curve.
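The test-set F-measure can be cross-checked against the other rows of the table. The sketch below assumes that "F-measure" means the usual F1 score, the harmonic mean of positive predictive value (precision) and sensitivity (recall); this excerpt does not state the definition, so that is an assumption. The function name `f_measure` is illustrative.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of precision (PPV) and recall (sensitivity)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Test-set values copied from the performance table.
xgb_test = f_measure(ppv=0.362, sensitivity=0.877)
lr_test = f_measure(ppv=0.285, sensitivity=0.901)

print(f"XGBoost: {xgb_test:.3f}, logistic regression: {lr_test:.3f}")
```

Under that assumption, both results agree with the reported test values (0.513 and 0.434) to within about 0.001, a gap that rounding of the inputs can explain. The training columns are means with standard errors, so the same identity need not hold for them.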
Importance of each predictor in the XGBoost model.
| Variable | Importance value |
| Forced expiratory volume in 1 second/forced vital capacity | 0.2824 |
| Smoking status | 0.0329 |
| Allergic symptoms (yes/no) | 0.0303 |
| Symptom: cough (yes/no) | 0.0294 |
| Smoking: pack-years | 0.0222 |
| Hemoglobin A1c | 0.0197 |
| Albumin | 0.0195 |
| Mean corpuscular volume | 0.0177 |
| %Vital capacity | 0.0165 |
| %Forced expiratory volume in 1 second | 0.0164 |
| Treatment with an antidiabetic drug (yes/no) | 0.0162 |
| Allergic disease (yes/no) | 0.0146 |
| Hematocrit | 0.0144 |
| Urinary red blood cells | 0.0143 |
| Hemoglobin | 0.0138 |
| Age | 0.0128 |
| Smoking duration | 0.0127 |
| High density lipoprotein cholesterol | 0.0123 |
| Mean corpuscular hemoglobin concentration | 0.0122 |
| Total protein | 0.0118 |
| BMI | 0.0118 |
| Number of eosinophils | 0.0115 |
| Mean corpuscular hemoglobin | 0.0114 |
| Serum white blood cells | 0.0111 |
| Fasting blood sugar | 0.0110 |
| Serum alanine aminotransferase | 0.0108 |
| Pulse rate | 0.0108 |
| Forced expiratory volume in 1 second | 0.0107 |
| Urinary white blood cells | 0.0104 |
| Diastolic blood pressure | 0.0103 |
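To get a sense of how strongly the ranking is dominated by its first entry, the top few importance values can be compared directly. The sketch below copies the five largest values from the table; the shortened labels and variable names are illustrative. This excerpt does not say which importance type (for example gain or split count) the values represent, so the ratio is only a relative comparison within this table.

```python
# Five largest importance values copied from the table above (labels shortened).
importance = {
    "FEV1/FVC": 0.2824,
    "Smoking status": 0.0329,
    "Allergic symptoms": 0.0303,
    "Symptom: cough": 0.0294,
    "Smoking pack-years": 0.0222,
}

# Rank predictors from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
(top_name, top_value), (second_name, second_value) = ranked[0], ranked[1]

# How many times larger the top value is than the runner-up.
ratio = top_value / second_value
print(f"{top_name} is {ratio:.1f}x {second_name}")
```

The spirometric ratio is several times more important than any other listed predictor, while the remaining predictors differ from one another only modestly.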