Andreas Leha1, Kristian Hellenkamp2, Bernhard Unsöld3, Sitali Mushemi-Blake4, Ajay M Shah4, Gerd Hasenfuß2,5, Tim Seidler2,5.
Abstract
BACKGROUND: Machine learning (ML) is a powerful tool for identifying and structuring several informative variables for predictive tasks. Here, we investigated how ML algorithms may assist in echocardiographic pulmonary hypertension (PH) prediction, where current guidelines recommend integrating several echocardiographic parameters.
Year: 2019 PMID: 31652290 PMCID: PMC6814224 DOI: 10.1371/journal.pone.0224453
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Characterization of patients.
| parameter | level | noPH | PH | p value |
|---|---|---|---|---|
| n | | 22 | 68 | |
| age (years) | | | | |
| | mean ± sd | 54 ± 19 | 68 ± 14 | |
| | median (min; max) | 55 (21; 83) | 74 (22; 88) | |
| | missing | 0 | 0 | |
| sex | | | | 1.0 |
| | m | 8 (50.0%) | 32 (47.1%) | |
| | w | 8 (50.0%) | 36 (52.9%) | |
| | missing | 6 | 0 | |
| BMI (kg/m²) | | | | 0.15 |
| | mean ± sd | 25 ± 3.6 | 26 ± 3.9 | |
| | median (min; max) | 26 (18; 29) | 26 (19; 38) | |
| | missing | 7 | 2 | |
| BSA (m²) | | | | 0.3 |
| | mean ± sd | 1.9 ± 0.26 | 1.9 ± 0.23 | |
| | median (min; max) | 2 (1.4; 2.3) | 1.9 (1.4; 2.5) | |
| | missing | 7 | 2 | |
| IVSd (mm) | | | | 0.7 |
| | mean ± sd | 12 ± 3.3 | 12 ± 2.7 | |
| | median (min; max) | 12 (8; 22) | 12 (7; 21) | |
| | missing | 6 | 3 | |
| LVEDD (mm) | | | | 0.25 |
| | mean ± sd | 46 ± 8.2 | 49 ± 12 | |
| | median (min; max) | 46 (29; 62) | 47 (26; 78) | |
| | missing | 6 | 3 | |
| PW (mm) | | | | 0.81 |
| | mean ± sd | 12 ± 3.2 | 12 ± 2.8 | |
| | median (min; max) | 12 (7; 22) | 12 (8; 24) | |
| | missing | 6 | 3 | |
| LAD (mm) | | | | 0.59 |
| | mean ± sd | 41 ± 9.7 | 43 ± 10 | |
| | median (min; max) | 40 (29; 55) | 46 (23; 68) | |
| | missing | 16 | 35 | |
| EF (%) | | | | 0.12 |
| | mean ± sd | 52 ± 17 | 46 ± 14 | |
| | median (min; max) | 55 (10; 80) | 53 (10; 66) | |
| | missing | 0 | 3 | |
| RVD1 (mm) | | | | |
| | mean ± sd | 41 ± 7.6 | 48 ± 8.7 | |
| | median (min; max) | 42 (25; 51) | 49 (26; 66) | |
| | missing | 10 | 18 | |
| RVD2 (mm) | | | | |
| | mean ± sd | 31 ± 4.8 | 37 ± 9.6 | |
| | median (min; max) | 31 (22; 38) | 36 (18; 63) | |
| | missing | 10 | 18 | |
| RVD3 (mm) | | | | 0.36 |
| | mean ± sd | 72 ± 9.7 | 75 ± 12 | |
| | median (min; max) | 72 (58; 90) | 74 (48; 100) | |
| | missing | 10 | 21 | |
| RVenlargement (0/1) | | | | 0.72 |
| | 0 | 4 (33.3%) | 13 (26.0%) | |
| | 1 | 8 (66.7%) | 37 (74.0%) | |
| | missing | 10 | 18 | |
| TAPSE (mm) | | | | |
| | mean ± sd | 21 ± 5.2 | 17 ± 4.6 | |
| | median (min; max) | 22 (12; 33) | 17 (8; 29) | |
| | missing | 6 | 11 | |
| RAPestimated (mmHg) | | | | |
| | mean ± sd | 5.4 ± 3.2 | 8.9 ± 5.4 | |
| | median (min; max) | 3 (3; 15) | 8 (3; 15) | |
| | missing | 0 | 0 | |
| RAP≥15 (0/1) | | | | |
| | 0 | 21 (95.5%) | 41 (60.3%) | |
| | 1 | 1 (4.5%) | 27 (39.7%) | |
| | missing | 0 | 0 | |
| AS (0/1/2/3) | | | | 0.73 |
| | mean ± sd | 0.47 ± 0.99 | 0.33 ± 0.81 | |
| | median (min; max) | 0 (0; 3) | 0 (0; 3) | |
| | missing | 7 | 2 | |
| AR (0/1/2/3) | | | | 0.38 |
| | mean ± sd | 0.4 ± 0.74 | 0.39 ± 0.6 | |
| | median (min; max) | 0 (0; 2) | 0 (0; 2) | |
| | missing | 7 | 2 | |
| MS (0/1/2/3) | | | | 0.32 |
| | mean ± sd | 0 ± 0 | 0.17 ± 0.45 | |
| | median (min; max) | 0 (0; 0) | 0 (0; 2) | |
| | missing | 7 | 2 | |
| MR (0/1/2/3) | | | | 0.85 |
| | mean ± sd | 0.87 ± 0.99 | 1.1 ± 0.98 | |
| | median (min; max) | 1 (0; 3) | 1 (0; 3) | |
| | missing | 7 | 1 | |
| TR (0/1/2/3) | | | | 0.21 |
| | mean ± sd | 1.3 ± 1.1 | 1.7 ± 0.89 | |
| | median (min; max) | 1 (0; 3) | 2 (0; 3) | |
| | missing | 7 | 2 | |
| TRVmax (m/s) | | | | |
| | mean ± sd | 2.7 ± 0.6 | 3.4 ± 0.84 | |
| | median (min; max) | 2.6 (1.8; 3.9) | 3.3 (1.6; 5.5) | |
| | missing | 1 | 2 | |
| TRVm (m/s) | | | | |
| | mean ± sd | 1.9 ± 0.43 | 2.5 ± 0.59 | |
| | median (min; max) | 1.9 (1.3; 2.9) | 2.4 (1.3; 3.9) | |
| | missing | 1 | 2 | |
| PVAT (ms) | | | | 0.1 |
| | mean ± sd | 102 ± 26 | 87 ± 21 | |
| | median (min; max) | 100 (60; 150) | 83 (56; 141) | |
| | missing | 11 | 33 | |
| TRPm (mmHg) | | | | |
| | mean ± sd | 15 ± 6.8 | 26 ± 12 | |
| | median (min; max) | 14 (6.5; 32) | 23 (6.6; 61) | |
| | missing | 1 | 2 | |
| WHO classification | | | | |
| | 0: no PH | 21 (95.5%) | 0 (0.0%) | |
| | 1: PAH | 0 (0.0%) | 6 (8.8%) | |
| | 2: due to LH-Disease | 1 (4.5%) | 48 (70.6%) | |
| | 3: due to lung disease | 0 (0.0%) | 2 (2.9%) | |
| | 4: CTEPH | 0 (0.0%) | 0 (0.0%) | |
| | 5: unknown / multifactorial | 0 (0.0%) | 12 (17.6%) | |
| Incident Case | | | | |
| | 0: known PH or pre-evaluated patient | 0 (0.0%) | 13 (19.1%) | |
| | 1: new evaluation | 22 (100.0%) | 55 (80.9%) | |
The cohort of 90 available patients comprised 68 patients with PH confirmed by right heart catheterization (RHC) and 22 patients without PH. This table shows descriptive values for four basic characteristics (age, sex, BMI, and body surface area (BSA)), for 23 echocardiographic measurements, and for the WHO classification. The last column contains p values from comparisons between the two patient subgroups; the t test and the χ² test were used as appropriate.
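As an illustration of the χ² comparison named above, the following minimal sketch computes a p value for a 2×2 table of counts. The function name `chi2_2x2_pvalue` is hypothetical, and the continuity correction (Yates) is an assumption: the source does not state whether one was applied or which software was used.

```python
import math


def chi2_2x2_pvalue(table, yates=True):
    """Pearson chi-square test for a 2x2 table of counts.

    `table` is [[a, b], [c, d]]. The Yates continuity correction is an
    assumption of this sketch; the source does not specify it.
    All row and column totals must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(obs - expected)
            if yates:
                # Shrink each deviation by at most 0.5, never below zero.
                diff -= min(0.5, diff)
            chi2 += diff ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Sex counts from the table: rows are m / w, columns are noPH / PH.
p_sex = chi2_2x2_pvalue([[8, 32], [8, 36]])
```

With the sex counts, every observed cell lies within 0.5 of its expected value, so the corrected statistic is zero and `p_sex` equals 1.0. Other rows of the table need not be reproduced by this particular variant of the test.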
Fig 1. Data overview.
(A) Heatmap of the 27 variables (columns) across the 90 patients (rows). The variables are studentized. Both patients and variables are re-ordered by hierarchical clustering. The color bar at the right shows patients with PH (pink) and without PH (blue). (B)-(D) Results from a factor analysis for mixed data (FAMD). (B) The first two dimensions explaining the largest parts of the variance in the data. Each dot represents one patient, where patients with PH are shown in pink, patients without confirmed PH are shown in blue. (C) Contribution of each of the variables to the first two dimensions of the FAMD. (D) Percentage of variance for the first five dimensions in the FAMD.
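Studentizing a variable, as done for the heatmap, can be sketched as follows. This is an illustrative helper (the name `studentize` and the handling of missing values as `None` are assumptions, not the authors' code): each observed value is centred on the column mean and divided by the sample standard deviation, and missing entries stay missing.

```python
from statistics import mean, stdev


def studentize(column):
    """Scale one variable to mean 0 and standard deviation 1.

    Missing values (None) are ignored when estimating mean and sd
    and are passed through unchanged. Needs at least two observed values.
    """
    observed = [v for v in column if v is not None]
    m = mean(observed)
    s = stdev(observed)
    if s == 0:
        # A constant variable carries no spread; map observed values to 0.
        return [None if v is None else 0.0 for v in column]
    return [None if v is None else (v - m) / s for v in column]
```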
Prediction performance measures.
| method | AUC | ACC | sensitivity | specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| Aduen et al. | 0.87 [0.78; 0.96] | 0.85 | 0.86 | 0.86 | 0.93 | 0.65 |
| random forest of regression trees with Aduen et al. | 0.89 [0.81; 0.98] | 0.84 | 0.95 | 0.52 | 0.87 | 0.87 |
| random forest of regression trees | 0.87 [0.78; 0.96] | 0.83 | 0.89 | 0.67 | 0.89 | 0.66 |
| random forest of classification trees | 0.85 [0.75; 0.95] | 0.85 | 0.9 | 0.67 | 0.9 | 0.7 |
| lasso penalized logistic regression | 0.78 [0.67; 0.89] | 0.8 | 0.93 | 0.4 | 0.83 | 0.65 |
| boosted C5.0 | 0.80 [0.68; 0.92] | 0.82 | 0.9 | 0.58 | 0.87 | 0.64 |
| SVM | 0.83 [0.73; 0.93] | 0.84 | 0.95 | 0.49 | 0.85 | 0.76 |
Prediction performance was assessed using a 10 times repeated 3-fold cross-validation (CV) and measured by the area under the ROC curve (AUC). The AUC column gives the value for each ML algorithm under consideration and for the established method by Aduen et al., together with the 95% confidence interval according to DeLong. Accuracy (ACC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were additionally evaluated at the Youden index.
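The measures in this table can be illustrated with a small sketch. It assumes binary labels (1 = PH, 0 = no PH) and a continuous prediction score where higher means more likely PH; the function names and the toy data are hypothetical and not taken from the study. The AUC is computed as the Mann-Whitney statistic, and the Youden index selects the cut-off that maximizes sensitivity + specificity − 1.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(scores, labels, threshold):
    """Confusion-matrix measures when predicting positive for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "acc": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(scores, labels):
    """Cut-off maximizing J = sensitivity + specificity - 1 (first one on ties)."""
    best_t, best_j = None, -2.0
    for t in sorted(set(scores)):
        m = metrics_at(scores, labels, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy data, invented for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
cutoff = youden_threshold(toy_scores, toy_labels)
summary = metrics_at(toy_scores, toy_labels, cutoff)
```

Because the cut-off is chosen to balance sensitivity and specificity, PPV and NPV at that point still depend on the proportion of PH cases in the cohort.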
Fig 2. Classification performance.
Random forest of regression trees shows performance comparable to the best of several established PH prediction methods by Aduen et al. (A) Area under the ROC curve (AUC) for all methods with estimated 95% confidence intervals. (B) Consensus ROC curves of the five machine learning algorithms under consideration as well as the ROC curve of the method by Aduen et al. (light blue).
Fig 3. Best performing Machine Learning Method: Random forest of regression trees.
The random forest of regression trees performed best among the 5 machine learning methods under consideration and achieved performance levels comparable to the prediction by Aduen et al., the best of several established prediction methods. (A) Invasively measured PAPm (y-axis) in comparison to the predictions (x-axis). Displayed are predictions by a random forest of regression trees (blue), predictions by the combination of the random forest of regression trees and the method of Aduen et al. (purple), and predictions by the method of Aduen et al. (pink). The lines show a linear fit with confidence bands (gray shades). The plot shows the predictions from the first repetition of the CV. The text annotation gives Pearson's correlation coefficients with 95% confidence intervals. For the ML method these are average values across all CV repeats. (B) Variable importance for the random forest of regression trees.
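The correlation annotation in panel (A) can be illustrated with a short sketch of Pearson's coefficient and an approximate 95% confidence interval. The interval here uses the Fisher z-transformation, which is an assumption: the source does not state how its intervals were obtained. Function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for r via the Fisher z-transformation.

    Requires n > 3 and |r| < 1. The default z_crit corresponds to 95%.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For a comparison such as measured versus predicted PAPm, `x` would hold the predictions and `y` the invasive measurements for the same patients.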
Fig 4. Overview procedure and main results.
The data set comprises measurements of 68 patients with confirmed PH and 22 patients without PH. Four socio-demographic and 21 echocardiographic variables were measured. Six patients were dropped due to their high degree of missingness. As a reference, the formula by Aduen et al. was evaluated. Five ML methods were applied and evaluated using a 10 times repeated 3-fold CV scheme. Two ML methods required imputation as a pre-processing step within each fold of the CV. The predictions of the random forest of regression trees were additionally combined with the predictions by Aduen et al.
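The evaluation scheme described above, repeated k-fold CV with imputation performed inside each fold, can be sketched as follows. This is a minimal illustration under stated assumptions: the folds are stratified by class and missing values are filled with column means, neither of which is specified in the source, and all function names are hypothetical. The point it shows is that the imputation values are estimated from the training part of each fold only, so no information from the held-out patients leaks into pre-processing.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_mean_imputer(rows):
    """Column means over observed (non-None) values of the training rows."""
    means = []
    for c in range(len(rows[0])):
        observed = [r[c] for r in rows if r[c] is not None]
        # Fallback 0.0 if a column is entirely missing in the training part.
        means.append(mean(observed) if observed else 0.0)
    return means


def apply_imputer(rows, means):
    """Replace None by the training-set mean of the same column."""
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def repeated_cv_splits(X, y, k=3, repeats=10, seed=0):
    """Yield imputed train/test data for every fold of every repetition."""
    rng = random.Random(seed)
    for rep in range(repeats):
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            means = fit_mean_imputer([X[i] for i in train_idx])
            X_train = apply_imputer([X[i] for i in train_idx], means)
            X_test = apply_imputer([X[i] for i in test_idx], means)
            yield rep, f, train_idx, test_idx, X_train, X_test
```

With `k=3` and `repeats=10` the generator yields 30 train/test pairs; a model would be fitted on each `X_train` and scored on the matching `X_test`.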