Lijue Liu (1,2), Shiyang Tan (1), Yi Li (1,2), Jingmin Luo (3), Wei Zhang (3), Shihao Li (1).
Abstract
BACKGROUND: Aortic dissection (AD) is a rare and particularly dangerous cardiovascular disease with complex and diverse symptoms and signs. In its early stage, the rates of misdiagnosis and missed diagnosis are relatively high. This study aimed to use machine learning to build a fast and accurate screening model that takes only patients' routine examination data as input to produce a prediction.
Keywords: Aortic dissection (AD); early screening; ensemble learning; machine learning
Year: 2020 PMID: 33437777 PMCID: PMC7791246 DOI: 10.21037/atm-20-1475
Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839
Figure 1. The structure of the XGBF model. In the algorithm, P denotes the positive (minority-class) dataset, i.e., the patient set; N denotes the negative (majority-class) dataset, i.e., the non-patient set; and T denotes the number of XGBoost classifiers. The training data fed into each XGBoost classifier consisted of undersampled majority-class data and oversampled minority-class data. The minority class was oversampled in two ways, duplication and SMOTE (Synthetic Minority Over-sampling Technique), which strengthened the minority samples so that the model could extract more information from them. The majority class was undersampled so that the two classes were balanced. Finally, the weak subclassifiers were combined by ensemble methods to achieve better classification results.
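As a rough illustration of how one training bag in this figure could be assembled, the following Python sketch oversamples the minority class by duplication plus SMOTE-style interpolation and then undersamples the majority class to the same size. The helper names (`smote_point`, `make_balanced_bag`) and the parameters `n_dup`, `n_smote`, `k` and `seed` are hypothetical; the paper's own parameters (m, t, n, k) are not defined in this excerpt, so no mapping to them is implied. XGBoost training and the final ensemble step are omitted.

```python
import random

def smote_point(minority, k, rng):
    # HYPOTHETICAL helper: one SMOTE-style synthetic sample.
    # Pick a random minority sample, rank the other minority samples by
    # squared Euclidean distance, and interpolate towards one of the k nearest.
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [x for j, x in enumerate(minority) if j != i]
    others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position on the segment base -> neighbour, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

def make_balanced_bag(pos, neg, n_dup, n_smote, k=5, seed=0):
    # HYPOTHETICAL helper: training data for ONE base classifier.
    rng = random.Random(seed)
    # Oversample the minority class by duplication and by SMOTE interpolation.
    duplicates = [list(rng.choice(pos)) for _ in range(n_dup)]
    synthetic = [smote_point(pos, k, rng) for _ in range(n_smote)]
    minority = [list(p) for p in pos] + duplicates + synthetic
    # Undersample the majority class (without replacement) to the same size.
    majority = [list(x) for x in rng.sample(neg, min(len(minority), len(neg)))]
    X = minority + majority
    y = [1] * len(minority) + [0] * len(majority)
    return X, y

# Toy data: 4 minority points in the unit square, 40 majority points far away.
pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
neg = [(float(v), float(v)) for v in range(10, 50)]
# One bag per base classifier; T = 3 here only for illustration.
bags = [make_balanced_bag(pos, neg, n_dup=4, n_smote=4, k=2, seed=t) for t in range(3)]
for X, y in bags:
    print(len(X), sum(y))  # each bag: 12 minority + 12 majority samples
```

In a full pipeline, each of the T bags would train its own XGBoost classifier, and their outputs would then be combined; averaging predicted probabilities is one common choice, but that detail is an assumption here.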
Confusion matrix
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | TP (True positive) | FN (False negative) |
| Actual negative class | FP (False positive) | TN (True negative) |
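The two metrics reported throughout this study follow directly from the cells of this matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal Python sketch (the function name is illustrative), applied to the XGBF confusion matrix reported for this study:

```python
def sensitivity_specificity(tp, fn, fp, tn):
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn)
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Cell values taken from the XGBF confusion matrix in this study.
sens, spec = sensitivity_specificity(tp=92, fn=23, fp=1535, tn=5952)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
# prints: sensitivity=80.0% specificity=79.5%
```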
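Statistical comparison of features between AD and non-AD patients (χ² test for categorical features, reported as n (%); t-test for continuous features, reported as mean ± SD)
| Features | AD patients (N=802) | Non-AD patients (N=52,411) | χ²/t | P |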
|---|---|---|---|---|
| Age | 55.57±12.90 | 62.56±13.06 | 15.03 | <0.001 |
| Sex | 574 (71.57%) | 29,994 (57.23%) | 66.47 | <0.001 |
| Chest pain | 206 (25.69%) | 9,460 (18.05%) | 30.99 | <0.001 |
| Stomachache | 66 (8.23%) | 2,996 (5.72%) | 9.2 | 0.002 |
| Heart disease | 63 (7.86%) | 6,106 (11.65%) | 11.1 | 0.001 |
| Dizziness and headache | 62 (7.73%) | 7,803 (14.89%) | 32.13 | <0.001 |
| Aortic valve area murmur | 23 (2.87%) | 377 (0.72%) | 48.88 | <0.001 |
| Family history of hypertension | 92 (11.47%) | 4,798 (9.15%) | 5.08 | 0.024 |
| Chest trauma history | 11 (1.37%) | 206 (0.39%) | 18.62 | <0.001 |
| Smoking and duration | 10.22±14.39 | 7.34±13.88 | −5.63 | <0.001 |
| Hypertension | 530 (66.08%) | 31,571 (60.24%) | 11.29 | 0.001 |
| Hypertension and duration | 6.01±6.47 | 6.10±7.09 | 0.36 | 0.72 |
| Diabetes | 88 (10.97%) | 11,910 (22.72%) | 62.47 | <0.001 |
| Diabetes and duration | 0.85±2.87 | 1.82±3.83 | 9.4 | <0.001 |
| Heart rate | 81.74±13.87 | 78.73±14.20 | −6.1 | <0.001 |
| Systolic pressure | 142.41±26.71 | 136.86±21.90 | −5.85 | <0.001 |
| Diastolic pressure | 83.20±16.59 | 80.46±13.01 | −4.66 | <0.001 |
| HGB | 119.76±21.57 | 119.95±22.47 | 0.24 | 0.814 |
| NEUT | 7.16±4.08 | 4.79±3.47 | −16.35 | <0.001 |
| NEUT% | 72.83±10.79 | 65.30±12.09 | −19.59 | <0.001 |
| LYMPH% | 16.94±9.10 | 24.22±10.44 | 22.43 | <0.001 |
| LYMPH | 1.36±0.60 | 1.57±2.03 | 9.07 | <0.001 |
| MCV | 91.84±6.82 | 92.10±7.17 | 1.09 | 0.275 |
| MPV | 8.93±1.39 | 9.36±1.58 | 8.6 | <0.001 |
| TP | 64.58±7.06 | 65.43±8.04 | 3.41 | 0.001 |
| ALB | 37.08±5.67 | 38.61±6.26 | 7.6 | <0.001 |
| GLO | 27.57±5.19 | 26.94±5.32 | −3.39 | 0.001 |
| A/G | 1.40±0.36 | 1.49±0.37 | 6.77 | <0.001 |
| TBIL | 16.19±21.62 | 13.20±26.81 | −3.86 | <0.001 |
| DBIL | 6.65±11.52 | 5.39±13.53 | −3.07 | 0.002 |
| TBA | 6.22±13.32 | 7.55±15.04 | 2.49 | 0.013 |
| ALT | 66.50±296.27 | 32.47±108.73 | −3.25 | 0.001 |
| AST | 85.34±510.27 | 36.33±155.39 | −2.72 | 0.007 |
| CRE | 136.87±156.07 | 138.98±213.75 | 0.28 | 0.781 |
| GSP | 2.25±0.62 | 2.03±0.73 | −9.79 | <0.001 |
| CHO | 4.33±0.43 | 4.37±0.55 | 2.19 | 0.029 |
| HDL | 1.12±0.17 | 1.12±0.17 | 0.28 | 0.782 |
| LDL | 2.60±0.35 | 2.63±0.46 | 1.98 | 0.048 |
| LDH | 322.03±684.10 | 236.51±283.48 | −3.54 | <0.001 |
| CK | 538.04±5272.64 | 162.57±567.45 | −2.02 | 0.044 |
| CKMB | 35.93±299.32 | 19.33±33.08 | −1.57 | 0.117 |
| MB | 72.69±84.95 | 57.60±59.02 | −5.01 | <0.001 |
| K | 3.83±0.56 | 3.97±0.52 | 7.52 | <0.001 |
| Na | 139.37±4.28 | 140.71±3.79 | 8.77 | <0.001 |
| Cl | 101.08±4.95 | 102.59±4.62 | 8.56 | <0.001 |
| CO2 | 23.14±3.21 | 23.20±3.65 | 0.51 | 0.609 |
| AG | 15.23±3.68 | 14.95±3.35 | −2.38 | 0.017 |
| Ca | 2.16±0.16 | 2.21±0.18 | 8.88 | <0.001 |
| P | 1.19±0.39 | 1.19±0.34 | 0.55 | 0.581 |
| Mg | 0.90±0.13 | 0.89±0.13 | −2.38 | 0.017 |
| ESR | 34.94±10.81 | 35.40±13.92 | 1.17 | 0.241 |
| PT% | 99.83±18.59 | 106.62±17.36 | 10.28 | <0.001 |
| INR | 1.06±0.39 | 1.01±0.28 | −3.91 | <0.001 |
| APTT | 37.66±11.29 | 35.54±9.68 | −5.29 | <0.001 |
| FIB | 4.44±1.81 | 3.77±1.22 | −10.45 | <0.001 |
| D-Dimer | 1.37±1.94 | 0.97±1.27 | −5.49 | <0.001 |
| PLG | 252.01±24.57 | 255.86±27.68 | 4.4 | <0.001 |
| TT | 18.92±14.17 | 19.11±12.91 | 0.4 | 0.691 |
| PT | 13.57±4.39 | 13.02±3.06 | −3.58 | <0.001 |
| ATAG | 271.19±17.23 | 271.18±21.52 | −0.01 | 0.99 |
| FT3 | 3.91±0.44 | 3.96±1.15 | 3.33 | 0.001 |
| TSH | 3.35±2.51 | 3.41±3.74 | 0.672 | 0.502 |
HGB, hemoglobin; NEUT, neutrophils; LYMPH, lymphocytes; MCV, mean corpuscular volume; MPV, mean platelet volume; TP, total protein; ALB, albumin; GLO, globulin; A/G, ALB/GLO ratio; TBIL, total bilirubin; DBIL, direct bilirubin; TBA, total bile acid; ALT, alanine aminotransferase; AST, aspartate aminotransferase; CRE, creatinine; GSP, glycosylated serum protein; CHO, cholesterol; HDL, high-density lipoprotein; LDL, low-density lipoprotein; LDH, lactate dehydrogenase; CK, creatine kinase; CKMB, creatine kinase-MB; MB, myoglobin; AG, anion gap; ESR, erythrocyte sedimentation rate; PT, prothrombin time; INR, international normalised ratio; APTT, activated partial thromboplastin time; FIB, fibrinogen; PLG, plasminogen; TT, thrombin time; ATAG, antithrombin III antigen; FT3, free triiodothyronine; TSH, thyroid-stimulating hormone.
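The test statistics in this table can be approximately recomputed from the reported summaries. The Python sketch below assumes a pooled-variance (Student's) two-sample t-test for continuous features and a Pearson χ² test without continuity correction for binary features; neither choice is stated in this excerpt, and the function names are illustrative. The t statistics in the table are signed as non-AD minus AD (Age is positive although AD patients are younger), so the non-AD group is passed first. Because the inputs are rounded, the results are close to, not identical with, the reported values.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    # Student's two-sample t statistic with pooled variance, from summary stats.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

def chi2_2x2(a, n1, c, n2):
    # Pearson chi-square (no continuity correction) for a 2x2 table:
    # a of n1 subjects in group 1 and c of n2 in group 2 have the attribute.
    observed = [[a, n1 - a], [c, n2 - c]]
    total = n1 + n2
    row = [n1, n2]
    col = [a + c, total - a - c]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Age: non-AD (62.56 ± 13.06, N=52,411) vs AD (55.57 ± 12.90, N=802); reported t = 15.03.
t_age = pooled_t(62.56, 13.06, 52411, 55.57, 12.90, 802)
# Sex: 574 of 802 AD vs 29,994 of 52,411 non-AD; reported chi-square = 66.47.
chi_sex = chi2_2x2(574, 802, 29994, 52411)
print(round(t_age, 2), round(chi_sex, 2))
```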
Confusion matrix for seven-fold cross-validation using AdaBoostᵃ
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | 18 | 97 |
| Actual negative class | 15 | 7,472 |
ᵃ In AdaBoost, the number of iterations is 400.
Confusion matrix for seven-fold cross-validation using XGBoostᵃ
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | 18 | 97 |
| Actual negative class | 6 | 7,481 |
ᵃ In XGBoost, the depth parameter is set to 4.
Confusion matrix for seven-fold cross-validation using SmoteBaggingᵃ
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | 89 | 26 |
| Actual negative class | 1,557 | 5,930 |
ᵃ In SmoteBagging, the number of base classifiers is 100.
Confusion matrix for seven-fold cross-validation using EasyEnsembleᵃ
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | 89 | 26 |
| Actual negative class | 1,550 | 5,937 |
ᵃ In EasyEnsemble, the number of base classifiers is 40.
Confusion matrix for seven-fold cross-validation using XGBFᵃ
| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | 92 | 23 |
| Actual negative class | 1,535 | 5,952 |
ᵃ In XGBF, the number of XGBoost classifiers (T) is 40, m is 1.5, t is 2, n is 2, and k is 5.
Seven-fold cross-validation averages for the five compared methods
| Method | Sensitivity | Specificity | Time (s) |
|---|---|---|---|
| AdaBoost | 16.1% | 99.8% | 9 |
| XGBoost | 15.7% | 99.9% | 1 |
| SmoteBagging | 78.0% | 79.2% | 1,873 |
| EasyEnsemble | 77.8% | 79.3% | 98 |
| XGBF | 80.5% | 79.5% | 117 |
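Each confusion matrix in this section sums to 115 AD and 7,487 non-AD cases, roughly one-seventh of the 802 AD and 52,411 non-AD patients. The Python sketch below shows the fold sizes an even stratified seven-fold split would produce, assuming each class is divided as equally as possible; the actual fold-assignment procedure is not described in this excerpt, and the function name is illustrative.

```python
def stratified_fold_sizes(n_pos, n_neg, k=7):
    # Split each class as evenly as possible across k folds:
    # the first (n % k) folds receive one extra sample.
    def split(n):
        q, r = divmod(n, k)
        return [q + 1 if i < r else q for i in range(k)]
    return list(zip(split(n_pos), split(n_neg)))

folds = stratified_fold_sizes(802, 52411)
for i, (p, n) in enumerate(folds, 1):
    print(f"fold {i}: {p} AD, {n} non-AD")
```

Under this assumption, each test fold holds 115 or 114 AD cases and 7,488 or 7,487 non-AD cases, so a 115/7,487 fold like the ones tabulated does occur.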