Sarv Priya¹, Tanya Aggarwal², Caitlin Ward³, Girish Bathla⁴, Mathews Jacob⁵, Alicia Gerke⁶, Eric A Hoffman⁴,⁷, Prashant Nagpal⁴
Abstract
Side experiments are performed on radiomics models to improve their reproducibility. We measure the impact of myocardial masks, radiomic side experiments and the data augmentation for information transfer (DAFIT) approach on differentiating patients with and without pulmonary hypertension (PH) using cardiac MRI (CMRI) derived radiomics. Features were extracted from left ventricle (LV) and right ventricle (RV) myocardial masks on CMRI in 82 patients (42 with PH and 40 controls). Several side study experiments were evaluated: the original data without and with intraclass correlation (ICC) feature filtering, and the DAFIT approach without and with ICC feature filtering. Multiple machine learning and feature selection strategies were evaluated. The primary analysis included all PH patients; a subgroup analysis included only PH patients with preserved left ventricular ejection fraction (LVEF ≥ 50%). In both the primary and subgroup analyses, the DAFIT approach without feature filtering was the highest performer (AUC 0.957-0.958). ICC approaches performed poorly compared with the DAFIT approach. Combined LV and RV masks outperformed either mask alone. The top-performing models varied across approaches (AUC 0.862-0.958). The DAFIT approach with features from combined LV and RV masks provides superior performance, whereas feature-filtering approaches perform poorly. Model performance varies with the combination of feature selection method and model.
Year: 2021 PMID: 34135418 PMCID: PMC8209219 DOI: 10.1038/s41598-021-92155-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Top three models selected for the entire group (all PH patients versus controls). Values are areas under the receiver operating characteristic curve (AUC).
| Approach | Mask | Model | Feature selection | Mean AUC | SD | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| Original | LV mask | rf | Full | 0.921 | 0.064 | 0.922 | 0.750 | 1.000 |
| Original | LV mask | rf | Corr | 0.916 | 0.069 | 0.938 | 0.792 | 1.000 |
| Original | Combined mask | rf | Full | 0.913 | 0.054 | 0.922 | 0.806 | 1.000 |
| ICC2 | Combined mask | nnet | Corr | 0.905 | 0.059 | 0.906 | 0.750 | 0.984 |
| ICC2 | Combined mask | nnet | Full | 0.904 | 0.079 | 0.917 | 0.708 | 1.000 |
| ICC2 | Combined mask | mlp | Full | 0.904 | 0.063 | 0.906 | 0.750 | 1.000 |
| ICC3 | Combined mask | ridge | Full | 0.895 | 0.065 | 0.891 | 0.734 | 0.984 |
| ICC3 | Combined mask | nnet | Full | 0.885 | 0.079 | 0.903 | 0.688 | 1.000 |
| ICC3 | Combined mask | enet | Full | 0.873 | 0.074 | 0.875 | 0.734 | 0.984 |
| DAFIT | Combined mask | svmPoly | pca | 0.958 | 0.033 | 0.960 | 0.846 | 1.000 |
| DAFIT | Combined mask | svmPoly | Full | 0.957 | 0.043 | 0.967 | 0.860 | 1.000 |
| DAFIT | Combined mask | svmRad | Full | 0.939 | 0.034 | 0.949 | 0.853 | 0.988 |
| DAFIT Filt2 | Combined mask | svmRad | Corr | 0.945 | 0.047 | 0.945 | 0.824 | 1.000 |
| DAFIT Filt2 | Combined mask | svmPoly | Full | 0.943 | 0.051 | 0.956 | 0.816 | 1.000 |
| DAFIT Filt2 | Combined mask | svmRad | Full | 0.941 | 0.053 | 0.945 | 0.783 | 1.000 |
| DAFIT Filt3 | Combined mask | svmRad | Full | 0.920 | 0.060 | 0.934 | 0.702 | 0.978 |
| DAFIT Filt3 | Combined mask | nnet | Corr | 0.903 | 0.073 | 0.919 | 0.625 | 0.985 |
| DAFIT Filt3 | Combined mask | svmRad | Corr | 0.902 | 0.075 | 0.926 | 0.607 | 0.989 |
LV left ventricle, RV right ventricle, combined combined RV and LV masks, original original data without inclusion of side experiments, ICC2 features with excellent intraclass correlation from first two extractions, ICC3 features with excellent intraclass correlation from all three extractions, DAFIT synthetic data creation using main and side study data, DAFIT Filt2 combining DAFIT with feature filtering from ICC2, DAFIT Filt3 combining DAFIT with feature filtering from ICC3, rf random forest, nnet neural network, mlp multilayer perceptron, enet elastic net, svmPoly support vector machine (SVM) with a polynomial kernel, svmRad SVM with a radial kernel, full full feature set, corr high correlation filter, pca principal component analysis.
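Each row above summarizes a distribution of AUC values (mean, SD, median, minimum, maximum) rather than a single number. The following is a minimal sketch of how such a summary could be produced, assuming one AUC is computed per held-out cross-validation fold; the exact resampling scheme is not given in this section, and the fold predictions below are hypothetical. The AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen PH case scores higher than a randomly chosen control.

```python
import statistics
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Positives (label 1) stand for PH, negatives (label 0) for controls;
    tied scores count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs: Sequence[float]) -> Dict[str, float]:
    """Summary statistics matching the table columns."""
    return {
        "mean": statistics.mean(fold_aucs),
        "sd": statistics.stdev(fold_aucs),  # sample SD; needs at least 2 folds
        "median": statistics.median(fold_aucs),
        "min": min(fold_aucs),
        "max": max(fold_aucs),
    }


# Hypothetical held-out (labels, predicted scores) for three folds.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.7, 0.4, 0.6, 0.2]),
    ([1, 0, 1, 0], [0.6, 0.6, 0.9, 0.1]),
]
fold_aucs = [auc(y, s) for y, s in folds]
print(fold_aucs)
print(summarize(fold_aucs))
```

Reporting the spread across folds, not just the mean, is what lets the tables show that two models with similar mean AUC can differ considerably in stability (compare the SD and Min columns).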
Top three models selected for the PH subgroup (PH patients with preserved ejection fraction versus controls). Values are areas under the receiver operating characteristic curve (AUC).
| Approach | Mask | Model | Feature selection | Mean AUC | SD | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| Original | RV mask | nnet | Lincomb | 0.885 | 0.097 | 0.906 | 0.688 | 1.000 |
| Original | Combined mask | rf | Full | 0.878 | 0.092 | 0.906 | 0.609 | 1.000 |
| Original | RV mask | ridge | Lincomb | 0.876 | 0.107 | 0.906 | 0.531 | 1.000 |
| ICC2 | Combined mask | gbrm | Full | 0.808 | 0.151 | 0.844 | 0.313 | 1.000 |
| ICC2 | Combined mask | rf | Full | 0.798 | 0.118 | 0.797 | 0.594 | 1.000 |
| ICC2 | RV mask | rf | Full | 0.794 | 0.140 | 0.813 | 0.375 | 1.000 |
| ICC3 | Combined mask | nnet | Full | 0.815 | 0.119 | 0.813 | 0.563 | 1.000 |
| ICC3 | Combined mask | mlp | Full | 0.800 | 0.144 | 0.844 | 0.500 | 1.000 |
| ICC3 | Combined mask | lasso | Full | 0.785 | 0.161 | 0.750 | 0.500 | 1.000 |
| DAFIT | Combined mask | svmPoly | Full | 0.957 | 0.039 | 0.969 | 0.859 | 1.000 |
| DAFIT | Combined mask | svmPoly | pca | 0.947 | 0.036 | 0.945 | 0.891 | 1.000 |
| DAFIT | Combined mask | svmRad | Full | 0.926 | 0.043 | 0.930 | 0.836 | 0.984 |
| DAFIT Filt2 | Combined mask | svmPoly | Corr | 0.908 | 0.095 | 0.930 | 0.617 | 1.000 |
| DAFIT Filt2 | Combined mask | svmPoly | Full | 0.903 | 0.098 | 0.914 | 0.609 | 1.000 |
| DAFIT Filt2 | Combined mask | svmRad | Full | 0.890 | 0.088 | 0.906 | 0.617 | 1.000 |
| DAFIT Filt3 | Combined mask | svmRad | Full | 0.887 | 0.100 | 0.906 | 0.656 | 0.992 |
| DAFIT Filt3 | Combined mask | svmRad | Corr | 0.881 | 0.089 | 0.906 | 0.648 | 1.000 |
| DAFIT Filt3 | Combined mask | linear | Corr | 0.863 | 0.082 | 0.875 | 0.703 | 0.992 |
LV left ventricle, RV right ventricle, combined combined RV and LV masks, original original data without inclusion of side experiments, ICC2 features with excellent intraclass correlation from first two extractions, ICC3 features with excellent intraclass correlation from all three extractions, DAFIT synthetic data creation using main and side study data, DAFIT Filt2 combining DAFIT with feature filtering from ICC2, DAFIT Filt3 combining DAFIT with feature filtering from ICC3, rf random forest, nnet neural network, gbrm gradient boost regression model, mlp multilayer perceptron, enet elastic net, lasso least absolute shrinkage and selection operator, svmPoly support vector machine (SVM) with a polynomial kernel, svmRad SVM with a radial kernel, full full feature set, corr high correlation filter, pca principal component analysis, lincomb linear combinations filter.
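The ICC2 and ICC3 approaches keep only features with "excellent" intraclass correlation across repeated extractions (two or three extractions, respectively). This section states neither the ICC form nor the numeric cutoff, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) and a cutoff of 0.9, a commonly used threshold for excellent reliability. The helper names `icc_2_1` and `filter_by_icc` and the example features are hypothetical. Each feature is given as a patients-by-extractions matrix, so the same code handles two columns (ICC2) or three (ICC3).

```python
from statistics import fmean
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a two-way ANOVA.

    Rows are patients, columns are repeated extractions of one feature.
    """
    n = len(ratings)     # patients
    k = len(ratings[0])  # extractions
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def filter_by_icc(features: Dict[str, List[List[float]]],
                  threshold: float = 0.9) -> List[str]:
    """Names of features whose ICC across extractions exceeds `threshold`."""
    return [name for name, r in features.items() if icc_2_1(r) > threshold]


# Hypothetical features, each measured twice on four patients.
features = {
    "stable": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.1]],
    "noisy": [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]],
}
print({name: round(icc_2_1(r), 3) for name, r in features.items()})
print(filter_by_icc(features))
```

With this toy data, the "noisy" feature has an ICC of 0 and is dropped, while the "stable" feature is kept.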
Figure 1. Model predictive performance for the primary analysis (all patients with PH and controls). (a) Box-and-whisker plot and (b) ROC curves for the approaches analyzed in the primary analysis. The DAFIT approach without filtering shows the smallest standard deviation (a) and the highest area under the curve (b).
Figure 2. Model predictive performance for the subgroup analysis (PH patients with preserved ejection fraction and controls). (a) Box-and-whisker plot and (b) ROC curves for the approaches analyzed in the PH subgroup analysis. The DAFIT approach without filtering shows the smallest standard deviation (a) and the highest area under the curve (b).
Figure 3. Area under the curve for the primary analysis (all patients with PH and controls). This figure shows the mean area under the curve for multiple feature selection and model combinations using all approaches for the primary analysis.
Figure 4. Area under the curve for the PH subgroup analysis (PH patients with preserved ejection fraction and controls). This figure shows the mean area under the curve for multiple feature selection and model combinations using all approaches for the PH subgroup analysis.
Figure 5. Effect of confounding variables. Of all the confounders evaluated, age explains the largest amount of deviance in the classifications. The machine learning predictions also explain a large portion of the deviance that is not explained by the confounders. Htn hypertension, BMI body mass index, BSA body surface area, DAFIT synthetic data creation using main and side study data, DAFIT Filt2 combining DAFIT with feature filtering from ICC2, DAFIT Filt3 combining DAFIT with feature filtering from ICC3.
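Figure 5 attributes deviance in the PH/control classification to confounders and to the machine learning predictions. The caption does not say how the deviance was partitioned, so the sketch below assumes a common approach: nested logistic regressions (intercept only, then plus a confounder, then plus the model prediction) where each added term's share is the drop in deviance divided by the null deviance. The logistic fit is a small Newton-Raphson loop written for illustration, and the variables `age_decades` and `ml_pred` are hypothetical toy data, not study data.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for a small system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _probs(beta: List[float], rows: List[List[float]]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]


def logistic_deviance(y: Sequence[int], X: Sequence[Sequence[float]],
                      iters: int = 50) -> float:
    """Fit a logistic regression with intercept by Newton-Raphson; return -2 log L."""
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = _probs(beta, rows)
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows)) for j in range(p)]
        info = [[sum(mi * (1 - mi) * r[i] * r[j] for mi, r in zip(mu, rows))
                 for j in range(p)] for i in range(p)]  # X^T W X
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    mu = _probs(beta, rows)
    return -2.0 * sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
                      for yi, mi in zip(y, mu))


# Hypothetical data: 5 controls (0) then 5 PH cases (1).
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
age_decades = [4.0, 7.0, 4.0, 7.0, 5.5, 5.0, 7.0, 4.5, 6.5, 6.0]
ml_pred = [0.2, 0.2, 0.8, 0.5, 0.4, 0.3, 0.9, 0.6, 0.7, 0.6]

d_null = logistic_deviance(y, [[] for _ in y])
d_conf = logistic_deviance(y, [[a] for a in age_decades])
d_full = logistic_deviance(y, [[a, p] for a, p in zip(age_decades, ml_pred)])
print("share explained by confounder:", (d_null - d_conf) / d_null)
print("additional share explained by prediction:", (d_conf - d_full) / d_null)
```

Because the models are nested, each added term can only lower the deviance; the second share corresponds to what the caption describes as deviance explained by the predictions beyond the confounders.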
Figure 6. Study workflow. This figure depicts the overall workflow of the study, showing radiomics feature extraction from the original and multiple side studies, cross-validation and modelling.
Figure 7. DAFIT (data augmentation with information transfer approach). This figure shows the creation of the synthetic augmented data set using the DAFIT approach. Symbols in the figure denote the original data from the main study, the mean and the variance, respectively, of the paired differences calculated from the repeated extractions in the side study, and the augmented data.
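The caption says the augmented data are built from the main-study data and the mean and variance of paired differences between repeated side-study extractions, but not the exact generative rule. The sketch below assumes one plausible rule: each synthetic row is an original row plus per-feature Gaussian noise whose mean and variance equal those paired-difference statistics, and each synthetic row keeps its source patient's label. The function names, the number of copies and the data are hypothetical.

```python
import random
from statistics import fmean, variance
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def difference_stats(ext1: Matrix, ext2: Matrix) -> List[Tuple[float, float]]:
    """Per-feature (mean, variance) of paired differences ext2 - ext1.

    Rows are side-study patients, columns are features.
    """
    stats = []
    for j in range(len(ext1[0])):
        diffs = [b[j] - a[j] for a, b in zip(ext1, ext2)]
        stats.append((fmean(diffs), variance(diffs)))  # sample variance, needs >= 2 rows
    return stats


def dafit_augment(X: Matrix, y: Sequence[int], stats: List[Tuple[float, float]],
                  copies: int, seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Original rows plus `copies` perturbed versions of every row.

    ASSUMPTION: the noise for feature j is Gaussian with the side-study
    mean and variance of the paired differences for that feature.
    """
    rng = random.Random(seed)
    X_aug = [list(row) for row in X]
    y_aug = list(y)
    for _ in range(copies):
        for row, label in zip(X, y):
            X_aug.append([x + rng.gauss(m, v ** 0.5) for x, (m, v) in zip(row, stats)])
            y_aug.append(label)  # synthetic row inherits its source label
    return X_aug, y_aug


# Hypothetical side study: two extractions of 2 features on 4 patients.
side1 = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0]]
side2 = [[1.1, 10.0], [2.0, 12.5], [2.9, 11.0], [4.2, 12.5]]
stats = difference_stats(side1, side2)

# Hypothetical main study: one control and one PH patient.
X = [[1.5, 11.0], [3.5, 12.0]]
y = [0, 1]
X_aug, y_aug = dafit_augment(X, y, stats, copies=3)
print(stats)
print(len(X_aug), y_aug)
```

Drawing the noise from measured extraction-to-extraction variability, rather than an arbitrary distribution, is the idea of transferring information from the side study; whether DAFIT also retains the original rows is an assumption here.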
Summary of the feature selection and modeling approaches for each analysis (DAFIT Filt analyses were only fit to the combined masks).
| Approach | Potential features (LV mask) | Potential features (RV mask) | Lincomb used | Corr thresholds | PCA thresholds |
|---|---|---|---|---|---|
| Original | 348 | 348 | Yes | 0.6/0.5 | 0.9/0.85 |
| ICC2 | 22 | 46 | No | 0.9/0.9 | 0.9/0.9 |
| ICC3 | 8 | 24 | No | – | – |
| DAFIT | 348 | 348 | Yes | 0.6/0.4 | 0.9/0.85 |
| DAFIT Filt2 | 22 | 46 | No | –/0.8 | –/0.85 |
| DAFIT Filt3 | 8 | 24 | No | –/0.8 | –/0.85 |
Original original data without inclusion of side experiments, ICC2 features with excellent intraclass correlation from first two extractions, ICC3 features with excellent intraclass correlation from all three extractions, DAFIT synthetic data creation using main and side study data, DAFIT Filt2 combining DAFIT with feature filtering from ICC2, DAFIT Filt3 combining DAFIT with feature filtering from ICC3, full full feature set, corr high correlation filter, pca principal component analysis, lincomb linear combinations filter, LV left ventricle, RV right ventricle.
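The "Corr thresholds" column refers to a high-correlation filter. The sketch below shows one simple greedy version: features are visited in order, and a feature is kept only if its absolute Pearson correlation with every already-kept feature is at most the threshold (for example 0.6). This is an illustration only; the exact removal rule used in the study is not described here and may differ (some implementations drop the feature with the highest mean absolute correlation instead). The function names and data are hypothetical, and constant features are assumed to have been removed beforehand, since their correlation is undefined.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for constant inputs."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def correlation_filter(features: Dict[str, Sequence[float]],
                       threshold: float) -> List[str]:
    """Greedily keep features whose |r| with every kept feature is <= threshold."""
    kept: List[str] = []
    for name, values in features.items():  # dicts preserve insertion order
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical radiomic features over five patients.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly collinear with "a"
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],   # r = -0.3 with "a"
}
print(correlation_filter(features, threshold=0.6))
```

Lowering the threshold makes the filter more aggressive: at 0.2, feature "c" is dropped as well, leaving only "a".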