Jingyi Zhang, Huolan Zhu, Yongkai Chen, Chenguang Yang, Huimin Cheng, Yi Li, Wenxuan Zhong, Fang Wang.
Abstract
BACKGROUND: Extensive clinical evidence suggests that preventive screening for coronary heart disease (CHD) at an early stage can greatly reduce mortality. We use 64 two-dimensional speckle tracking echocardiography (2D-STE) features and seven clinical features to predict whether a subject has CHD.
Keywords: Classification; Coronary heart disease; Ensemble learning; Machine learning; Speckle tracking echocardiography
Year: 2021 PMID: 34116660 PMCID: PMC8196502 DOI: 10.1186/s12911-021-01535-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Features chosen to be predictors in CHD prediction model
| Feature | Sub-feature | Dimension |
|---|---|---|
| Longitudinal strain (mid-layer) | Peak systolic strain (PSS) | 17 segments |
| Longitudinal strain (mid-layer) | Rate of systolic strain (SSR) | 17 segments |
| Longitudinal strain (mid-layer) | Time-to-peak (TP) | 17 segments |
| Global strain (GS), radial | Mitral valve level (MV) | 3 layers (ENDO/MID/EPI) |
| Global strain (GS), radial | Papillary muscle level (PM) | 3 layers |
| Global strain (GS), radial | Apical level (AP) | 3 layers |
| Global longitudinal peak strain (GLPS) | | 3 layers (ENDO/MID/EPI) |
| Peak standard deviation (PSD) | | |
| Age (integer) | | |
| Gender (M/F) | | |
| Hypertension (Y/N) | | |
| Diabetes (Y/N) | | |
| Hyperlipemia (Y/N) | | |
| Smoke (Y/N) | | |
| Family history (Y/N) | | |
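As a quick check, the 2D-STE rows of the table above add up to the 64 features quoted in the abstract, and the clinical rows to seven. A minimal sketch of that bookkeeping follows; the dictionary keys are illustrative names, not identifiers used by the study.

```python
# Feature bookkeeping for the predictor table; all key names are illustrative only.
SEGMENTS = 17  # segments per longitudinal strain measure
LAYERS = 3     # ENDO / MID / EPI

ste_features = {
    "PSS": SEGMENTS,    # peak systolic strain
    "SSR": SEGMENTS,    # rate of systolic strain
    "TP": SEGMENTS,     # time-to-peak
    "GS_MV": LAYERS,    # global strain, mitral valve level
    "GS_PM": LAYERS,    # global strain, papillary muscle level
    "GS_AP": LAYERS,    # global strain, apical level
    "GLPS": LAYERS,     # global longitudinal peak strain
    "PSD": 1,           # peak standard deviation
}
clinical_features = [
    "age", "gender", "hypertension", "diabetes",
    "hyperlipemia", "smoke", "family_history",
]

n_ste = sum(ste_features.values())
n_clinical = len(clinical_features)
print(n_ste, n_clinical)  # 64 7
```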
Summary of clinical characteristics of the subjects
| CHD positive ( | CHD negative ( | |
|---|---|---|
| Age (years) | ||
| BMI ( | ||
| DBP | ||
| SBP | ||
| Heart Rate | ||
| Male | ||
| Hypertension | ||
| Diabetes | ||
| Hyperlipemia | ||
| Smoke | ||
| Family history |
Fig. 1 Flowchart of the two-step stacking method. The testing set of size 64, named “Testing”, is used to evaluate the proposed method. The validation set of size 72, named “Validation 0”, is used to train the second-step stacking weights in Eq. (4). The remaining set of size 288 is randomly divided into a first-step training set of size 230 and a first-step validation set of size 58, which are used to train the 14 individual classifiers and the first-step stacking weights in Eq. (3). This random division is repeated 10 times, giving “Training 1” through “Training 10” and “Validation 1” through “Validation 10”
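The data partition described in this caption can be sketched as follows. This is only an illustration of the set sizes: the total of 424 is inferred from 64 + 72 + 288, the function name and the seed are hypothetical, and the assumption that “Testing” and “Validation 0” are also drawn at random is not stated in the caption.

```python
import random

def split_indices(n_total=424, n_test=64, n_val0=72, n_train1=230,
                  n_repeats=10, seed=0):
    """Return (testing, validation0, first_step) index lists.

    first_step is a list of n_repeats (training_k, validation_k) pairs,
    each a fresh random division of the remaining indices.
    """
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)  # assumption: held-out sets are drawn at random

    testing = idx[:n_test]
    validation0 = idx[n_test:n_test + n_val0]
    rest = idx[n_test + n_val0:]  # 288 indices with the default sizes

    first_step = []
    for _ in range(n_repeats):
        shuffled = rest[:]
        rng.shuffle(shuffled)
        first_step.append((shuffled[:n_train1], shuffled[n_train1:]))
    return testing, validation0, first_step

testing, validation0, first_step = split_indices()
print(len(testing), len(validation0), len(first_step))  # 64 72 10
```

Each of the 10 pairs would be used to fit the individual classifiers and the first-step weights, while `validation0` is reserved for the second-step weights and `testing` for the final evaluation.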
p Values for the two-sample t-test of 2D-STE features
Longitudinal strain
| GLPS Epi | GLPS Mid | GLPS Endo | PSS | SSR | TP | PSD |
|---|---|---|---|---|---|---|
| .024 | .049 | .076 | .024 | .041 | .179 | .731 |
Radial strain
| SAX-AP Epi | SAX-AP Mid | SAX-AP Endo | SAX-PM Epi | SAX-PM Mid | SAX-PM Endo | SAX-MV Epi | SAX-MV Mid | SAX-MV Endo |
|---|---|---|---|---|---|---|---|---|
| .982 | .952 | .598 | .663 | .654 | .682 | .247 | .175 | .516 |
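For readers who want to see how a two-sample p value of this kind is obtained, here is a minimal sketch using only the Python standard library. It assumes Welch's unequal-variance form of the t-test, which the table does not specify, and the sample values are made up for illustration; they are not study data.

```python
import math
from statistics import mean, variance

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Uses Simpson's rule on the t density over [0, |t|]; steps must be even.
    """
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite df, two-sided p value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)

# Made-up strain values for illustration only (not study data).
chd_pos = [-15.2, -16.8, -14.1, -17.5, -15.9, -13.7]
chd_neg = [-19.4, -20.1, -18.2, -21.0, -19.8, -18.9]
t, df, p = welch_t_test(chd_pos, chd_neg)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.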
Fig. 2 Correlations among features. a Correlation matrix of global longitudinal strains and radial strains of apical level, papillary muscle level and mitral valve level. b Correlation matrix of 17 segments on PSS, SSR and TP
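A correlation matrix like the ones in this figure can be computed in a few lines. The sketch below assumes Pearson correlation, which the caption does not state, and uses synthetic data in place of the 12 global strain features (3 GLPS layers plus 9 radial values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 subjects x 12 global strain features (not study data).
X = rng.normal(size=(100, 12))
# Make the first two columns strongly correlated, as neighbouring layers might be.
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)

# rowvar=False treats columns as variables, giving a 12 x 12 matrix.
corr = np.corrcoef(X, rowvar=False)
print(corr.shape)  # (12, 12)
```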
Fig. 3 Scree plot of PCA on peak systolic strain, systolic strain rate and time-to-peak
Fig. 4 a Heatmaps of contributions of the 17 segments to the first three PCs of peak systolic strain (PSS), systolic strain rate (SSR) and time-to-peak (TP). Columns from left to right show the first to third PC; the top, middle and bottom rows show PSS, SSR and TP respectively. b Bullseye plot of the AHA 17-segment model
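The quantities behind a scree plot and a heatmap of segment contributions are the explained variance ratios and the loadings of the leading principal components. A minimal sketch via SVD is given below; it centers the data but does not rescale it (whether the study standardized the features is not stated here), and the input is synthetic.

```python
import numpy as np

def pca(X, n_components=3):
    """PCA via SVD of the column-centered data.

    Returns (explained variance ratio of every PC,
             loadings of the first n_components PCs, one row per PC).
    """
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)
    return var / var.sum(), vt[:n_components]

rng = np.random.default_rng(0)
pss = rng.normal(size=(200, 17))  # synthetic stand-in for 17-segment PSS values

ratio, loadings = pca(pss)
print(ratio.shape, loadings.shape)  # (17,) (3, 17)
```

Plotting `ratio` against the component index gives the scree plot, and each row of `loadings` holds the weight of the 17 segments in one principal component.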
Mean testing accuracy of individual classification models over 50 replicates, with the standard deviation in brackets
| Model | Accuracy |
|---|---|
| logistic regression | |
| penalized logistic regression | |
| cumulative probability model | |
| random forest | |
| weighted subspace random forest | |
| SVM with class weight | |
| SVM with polynomial kernel | |
| SVM with radial kernel | |
| K-nearest neighbor | |
| LDA | |
| sparse LDA | |
| naive Bayes | |
| Bayes generalized linear model | |
| Gaussian process with polynomial kernel | |
| Gaussian process with radial kernel | |
| neural network | |
| monotone multi-layer perceptron neural network | |
| model average neural network | |
| stochastic gradient boosting |
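Entries in the "mean (standard deviation)" format used by this table can be produced from the per-replicate accuracies as sketched below. The sketch assumes the sample standard deviation (n - 1 denominator), which the table does not specify, and the three accuracies are made-up values.

```python
from statistics import mean, stdev

def summarize(accuracies, digits=3):
    """Format replicate accuracies as 'mean (sd)' using the sample standard deviation."""
    return f"{mean(accuracies):.{digits}f} ({stdev(accuracies):.{digits}f})"

example = [0.80, 0.82, 0.84]  # made-up replicate accuracies
print(summarize(example))  # 0.820 (0.020)
```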
Mean testing accuracy and AUC of ensemble learning methods over 50 replicates, with the standard deviation in brackets
| Model | Accuracy | AUC |
|---|---|---|
| Two-step stacking (14 models) | ||
| Two-step stacking (3 models) | 0.822 (0.030) | |
| Traditional stacking (14 models) | 0.854 (0.034) | |
| Traditional stacking (3 models) | 0.798 (0.037) | |
| Weighted voting (14 models) | 0.751 (0.040) | |
| Weighted voting (3 models) | 0.728 (0.037) | |
| Two-step stacking with GLPS only | 0.674 (0.047) |
Best results are bolded
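Of the ensemble methods compared above, weighted voting is the simplest to illustrate. The sketch below is a generic version: how the study chose its weights is not given in this section, so using each model's validation accuracy as its weight is an assumption, and the function names are hypothetical.

```python
def weighted_vote_score(predictions, weights):
    """Weighted share of positive (1) votes among the models, in [0, 1]."""
    total = sum(weights)
    positive = sum(w for p, w in zip(predictions, weights) if p == 1)
    return positive / total

def weighted_vote(predictions, weights, threshold=0.5):
    """Predict 1 when the weighted share of positive votes reaches the threshold."""
    return 1 if weighted_vote_score(predictions, weights) >= threshold else 0

# Assumed weights: validation accuracies of three hypothetical models.
weights = [0.8, 0.7, 0.6]
print(weighted_vote([1, 0, 0], weights))  # 0
print(weighted_vote([1, 1, 0], weights))  # 1
```

The continuous score from `weighted_vote_score` can also be thresholded at other values, which is what an ROC analysis of a voting ensemble requires.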
Fig. 5 ROC curves of (1) the ensemble learning methods on 14 individual models, (2) the ensemble learning methods on the three “best-perform” models, and (3) the three “best-perform” individual models. The ensemble learning methods include two-step stacking, traditional stacking, and weighted voting. Purple lines represent the individual models, black lines represent two-step stacking, blue lines represent traditional stacking, and red lines represent weighted voting. Solid lines represent ensembles of 14 models and dashed lines represent ensembles of 3 models
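An ROC curve and its AUC can be computed from predicted scores and true labels as sketched below. This is a generic threshold sweep with trapezoidal integration, not the study's plotting code, and the labels and scores are made up.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) from sweeping a threshold over the scores.

    labels are 0/1; tied scores are processed together so they form one point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]               # made-up ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # made-up predicted probabilities
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```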