Da Huo, Bo Kou, Zhili Zhou, Ming Lv.
Abstract
Aortic dissection is one of the most clinically challenging and life-threatening cardiovascular diseases, with high morbidity and mortality. It requires fast diagnosis and timely therapy: any delay or misdiagnosis can have severe consequences for patients and raise mortality further. To help physicians identify potential dissection among all misdiagnosed patients, this paper describes a data mining method for classifying and predicting aortic dissection patients in the early-diagnosis phase. Models were built with several machine learning algorithms, and all were trained and tested on the patient dataset with cross validation. Among them, the Bayesian Network model performed best, with a precision of 84.55% and an Area Under the Curve (AUC) of 0.857. On this basis, the Bayesian Network model can support physicians in the early diagnosis of aortic dissection in clinical practice. Beyond this study, more data from diverse regions and on internal pathology may be crucial for building a universal model with broader predictive power.
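The abstract states that the models were evaluated with cross validation but does not give the fold count or the splitting scheme. The following is a minimal sketch of stratified k-fold evaluation, assuming 10 folds and using a placeholder majority-class learner; the function names (`stratified_folds`, `cross_validate`, `fit_majority`, `predict_majority`) are hypothetical and are not the study's implementation. The class sizes (330 positive, 162 negative) are the totals implied by the nominal-attribute table.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives a share of it.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return accuracy pooled over the k held-out folds."""
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i]
                       for i in held_out)
    return correct / len(labels)

# Placeholder learner (assumption): always predicts the most common
# training label. A real study would plug in an actual classifier here.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def predict_majority(model, x):
    return model

# Class sizes taken from the table totals: 255 + 75 positive, 116 + 46 negative.
labels = ["P"] * 330 + ["N"] * 162
features = [None] * len(labels)  # the placeholder learner ignores features
accuracy = cross_validate(features, labels, fit_majority, predict_majority)
print(f"{accuracy:.4f}")  # 0.6707, i.e. 330 / 492
```

Any learner that exposes a `fit`/`predict` pair of this shape could replace the placeholder; the pooled score would then estimate out-of-sample performance rather than the class prior.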
Year: 2019 PMID: 30804372 PMCID: PMC6389887 DOI: 10.1038/s41598-019-39066-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Stacked bar chart of selected nominal patient attributes. (A) Nominal attributes of aortic dissection positive patients; (B) nominal attributes of aortic dissection negative patients. Different colours within one column represent the possible values of that attribute only. The same colour in different columns does not imply any connection or the same value interval.
Patient characteristics for selected nominal attributes.
| Attribute | No. Positive | % Positive | No. Negative | % Negative |
|---|---|---|---|---|
| Male | 255 | 77.3% | 116 | 71.6% |
| Female | 75 | 22.7% | 46 | 28.4% |
| | | | | |
| Unmarried | 8 | 2.4% | 4 | 2.5% |
| Married | 314 | 95.2% | 157 | 96.9% |
| Widowed | 6 | 1.8% | 1 | 0.6% |
| Divorced | 2 | 0.6% | 0 | 0% |
| | | | | |
| Emergency | 168 | 50.9% | 40 | 24.7% |
| Outpatient | 159 | 48.2% | 120 | 74.1% |
| Transferred-in | 3 | 0.9% | 2 | 1.2% |
Figure 2. Grouped box chart of selected numerical patient attributes. Y: aortic dissection positive patients. N: aortic dissection negative patients. For each group, the boxes are, from left to right, CHOL, TG, Lp(a), HDL-C, APOB, APOE, WBC, LYMPH%, MONO#, RBC and RDW-CV.
Patient characteristics for selected numeric attributes. Each attribute is shown separately by class label. P: aortic dissection positive; N: aortic dissection negative. The onset-day air temperature interval and weather condition, shown in Figure 1, were initially included to see whether the natural environment is strongly connected with an aortic dissection attack.
| Variable | Class | Minimum | Maximum | Mean | STD |
|---|---|---|---|---|---|
| Age, year | P | 20 | 87 | 54.0 | 13.1 |
| | N | 21 | 87 | 57.6 | 14.4 |
| Male, n (%) | P | N/A | N/A | 255 (77.04%) | N/A |
| | N | N/A | N/A | 116 (71.60%) | N/A |
| Low Air Temp, °C | P | −12 | 28 | 9.3 | 9.2 |
| | N | −5 | 27 | 12.3 | 8.8 |
| High Air Temp, °C | P | −2 | 37 | 18.4 | 9.9 |
| | N | 1 | 38 | 21.3 | 9.8 |
| | | | | | |
| CHOL, mmol/L | P | 1.95 | 7.34 | 3.46 | 0.83 |
| | N | 2.14 | 7.59 | 3.57 | 1.02 |
| TG, mmol/L | P | 0.25 | 7.40 | 1.28 | 0.77 |
| | N | 0.40 | 8.84 | 1.53 | 0.95 |
| Lp(a), mg/L | P | 20.00 | 887.30 | 166.67 | 127.01 |
| | N | 20.60 | 660.00 | 194.41 | 144.66 |
| HDL-C, mmol/L | P | 0.32 | 1.95 | 0.97 | 0.27 |
| | N | 0.43 | 1.92 | 0.90 | 0.23 |
| APOB, g/L | P | 0.28 | 1.55 | 0.67 | 0.18 |
| | N | 0.32 | 1.50 | 0.71 | 0.23 |
| APOE, mg/L | P | 11.00 | 115.50 | 33.39 | 12.56 |
| | N | 13.20 | 139.80 | 34.99 | 14.30 |
| | | | | | |
| WBC, 10^9/L | P | 2.05 | 26.16 | 10.12 | 4.14 |
| | N | 2.00 | 24.14 | 7.24 | 3.08 |
| LYMPH%, % | P | 2.00 | 57.10 | 14.89 | 9.38 |
| | N | 1.80 | 54.51 | 21.09 | 11.12 |
| MONO#, 10^9/L | P | 0.00 | 4.60 | 0.62 | 0.41 |
| | N | 1.83 | 6.22 | 4.05 | 0.83 |
| RBC, 10^12/L | P | 2.38 | 6.08 | 4.17 | 0.71 |
| | N | 0.01 | 1.50 | 0.43 | 0.23 |
| RDW-CV, % | P | 11.40 | 21.80 | 13.37 | 1.11 |
| | N | 11.70 | 23.00 | 13.87 | 1.60 |
Figure 3. ROC (Receiver Operating Characteristic) curves showing the performance of models built with different machine learning algorithms. BN: Bayesian Network; NB: Naïve Bayes; J48: Java implementation of the C4.5 decision tree algorithm; SMO: Sequential Minimal Optimization. The Bayesian Network classifier achieved the largest area under the ROC curve (AUC), 0.8571.
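For readers unfamiliar with the AUC values quoted for these curves, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(scores, labels, positive="P"):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))

# Toy example (made-up scores): 5 of the 6 positive/negative pairs are
# ranked correctly, so the AUC is 5/6.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3]
toy_labels = ["P", "P", "P", "N", "N"]
print(round(roc_auc(toy_scores, toy_labels), 4))  # 0.8333
```

A model that assigns every patient the same score wins no pair outright and ties all of them, which gives an AUC of 0.5, the level of an uninformative classifier.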
Model performance on emergency patients.
| Classifier | Precision | AUC |
|---|---|---|
| Benchmark | 80.77% | 0.4952 |
| Bayesian Network | 86.54% | 0.8138 |
| Naïve Bayes | 86.53% | 0.8057 |
| SMO | 84.13% | 0.6256 |
| J48 | 81.73% | 0.5905 |
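The source does not say how the Benchmark row was obtained. As a hedged observation, its 80.77% figure coincides with the score of a majority-class rule on the emergency admissions listed in the nominal-attribute table (168 positive, 40 negative), and its AUC of about 0.5 is what a rule with no ranking ability would give. The short check below reproduces that arithmetic; `majority_baseline_accuracy` is a hypothetical helper name, and the interpretation is an inference rather than a statement from the paper.

```python
# Counts from the Emergency row of the nominal-attribute table.
emergency_positive = 168
emergency_negative = 40

def majority_baseline_accuracy(n_positive, n_negative):
    """Share of cases labelled correctly by always predicting the larger class."""
    total = n_positive + n_negative
    return max(n_positive, n_negative) / total

baseline = majority_baseline_accuracy(emergency_positive, emergency_negative)
print(f"{baseline:.2%}")  # 80.77%
```

Read this way, the classifiers in the table should be judged by how far they exceed 80.77%, not by the raw percentage alone.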