Wei Guo, Dejun She, Zhen Xing, Xiang Lin, Feng Wang, Yang Song, Dairong Cao.
Abstract
Objectives: The performance of multiparametric MRI-based radiomics models for predicting H3 K27M mutant status in diffuse midline glioma (DMG) has not been thoroughly evaluated. The optimal combination of multiparametric MRI and machine learning techniques remains undetermined. We compared the performance of various radiomics models across different MRI sequences and different machine learning techniques.
Keywords: H3 K27M mutant; diffuse midline glioma; machine learning; multiparametric MRI; radiomics
Year: 2022 PMID: 35311083 PMCID: PMC8928064 DOI: 10.3389/fonc.2022.796583
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. The flowchart of the presented study. (A) Multiparametric MRI data collection, image pre-processing, tumor segmentation, and radiomics feature extraction. (B) Machine learning and model performance analysis.
Baseline characteristics of the training and test groups.
| Characteristics | All (n = 102) | Training (n = 72) | Test (n = 30) | p-value |
|---|---|---|---|---|
| Age (years) | 41.19 ± 20.64 | 41.63 ± 20.80 | 40.13 ± 20.57 | 0.649 |
| Gender (%) | | | | 0.711 |
| Male | 64 (62.75%) | 46 (63.89%) | 18 (60.00%) | |
| Female | 38 (37.25%) | 26 (36.11%) | 12 (40.00%) | |
| H3 K27M mutant status (%) | | | | 0.977 |
| Mutant | 27 (26.47%) | 19 (26.39%) | 8 (26.67%) | |
| Wild type | 75 (73.53%) | 53 (73.61%) | 22 (73.33%) | |
A p-value <0.05 indicates a statistically significant difference in a variable between the training and test sets. Continuous variables are described as mean ± SD. Categorical variables are presented as counts, with percentages in parentheses.
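The source does not name the test behind the categorical p-values. As a minimal sketch, assuming an uncorrected Pearson chi-square test on the 2×2 counts from the table, the gender and H3 K27M comparisons come out at about 0.711 and 0.977. The function name `chi_square_2x2` is illustrative, not taken from the study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function reduces to erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: training set, test set. Counts are taken from the baseline table.
gender = [[46, 26], [18, 12]]  # columns: male, female
h3k27m = [[19, 53], [8, 22]]   # columns: mutant, wild type

_, p_gender = chi_square_2x2(gender)
_, p_mutant = chi_square_2x2(h3k27m)
print(f"gender p = {p_gender:.3f}")
print(f"H3 K27M p = {p_mutant:.3f}")
```

The age comparison is left out because it needs the per-patient values, which the table does not provide.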
Figure 2. The machine learning pipelines and performance of the top-five-performing models of different sequences. The color of the lines indicates the performance of the models in the test set.
Figure 3. Box-and-whisker plots illustrate the top-five-performing area under the curve (AUC) values of different sequences.
The performance of the top-one-performing models.
| Sequence | Machine learning technique | Dataset | AUC | 95% CI | ACC | SEN | SPE | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|
| T2WI | Min–max_PCA_RFE_AB | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.915 | 0.769–1.000 | 0.900 | 0.750 | 0.955 | 0.857 | 0.913 |
| T1WI | Z-score_PCC_KW_AE | Training | 0.767 | 0.631–0.890 | 0.694 | 0.842 | 0.642 | 0.457 | 0.919 |
| | | Test | 0.881 | 0.733–0.984 | 0.700 | 1.000 | 0.591 | 1.000 | 0.471 |
| FLAIR | Mean_PCC_Relief_AB | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.875 | 0.722–0.984 | 0.833 | 0.625 | 0.909 | 0.870 | 0.714 |
| CE-T1WI | Min–max_PCC_Relief_LR | Training | 0.780 | 0.669–0.883 | 0.653 | 0.947 | 0.547 | 0.429 | 0.967 |
| | | Test | 0.881 | 0.733–0.984 | 0.800 | 1.000 | 0.727 | 1.000 | 0.571 |
| ADC | Min–max_PCC_RFE_RF | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.886 | 0.718–1.000 | 0.700 | 0.000 | 0.955 | 0.724 | 0.000 |
| SWI | Mean_PCC_RFE_DT | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.869 | 0.694–0.979 | 0.867 | 0.875 | 0.864 | 0.950 | 0.700 |
| CBV | Mean_PCA_Relief_AE | Training | 0.640 | 0.490–0.779 | 0.736 | 0.474 | 0.830 | 0.500 | 0.815 |
| | | Test | 0.807 | 0.585–0.980 | 0.700 | 0.875 | 0.636 | 0.933 | 0.467 |
| CBF | Z-score_PCA_RFE_LR | Training | 0.924 | 0.844–0.983 | 0.875 | 0.842 | 0.887 | 0.727 | 0.940 |
| | | Test | 0.926 | 0.814–1.000 | 0.833 | 0.875 | 0.818 | 0.947 | 0.636 |
| T2WI+CE-T1WI | Mean_PCA_ANOVA_SVM | Training | 0.964 | 0.919–0.995 | 0.944 | 0.895 | 0.962 | 0.895 | 0.962 |
| | | Test | 0.909 | 0.769–1.000 | 0.800 | 0.500 | 0.909 | 0.833 | 0.667 |
| T2WI+CE-T1WI+ADC | Z-score_PCA_RFE_AE | Training | 0.965 | 0.920–1.000 | 0.931 | 0.842 | 0.962 | 0.889 | 0.944 |
| | | Test | 0.869 | 0.727–0.976 | 0.733 | 0.625 | 0.773 | 0.850 | 0.500 |
| T2WI+CE-T1WI+SWI | Z-score_PCA_RFE_LDA | Training | 0.963 | 0.905–1.000 | 0.958 | 0.947 | 0.962 | 0.900 | 0.981 |
| | | Test | 0.898 | 0.761–0.988 | 0.800 | 0.750 | 0.818 | 0.900 | 0.600 |
| T2WI+CE-T1WI+CBF | Z-score_PCA_RFE_RF | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.932 | 0.824–1.000 | 0.733 | 0.125 | 0.955 | 0.750 | 0.500 |
| T2WI+CE-T1WI+ADC+SWI | Min–max_PCA_KW_SVM | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.892 | 0.728–0.994 | 0.867 | 0.750 | 0.909 | 0.909 | 0.750 |
| T2WI+CE-T1WI+ADC+CBF | Mean_PCA_Relief_LR | Training | 0.555 | 0.379–0.733 | 0.736 | 0.421 | 0.849 | 0.500 | 0.804 |
| | | Test | 0.881 | 0.701–1.000 | 0.900 | 0.750 | 0.955 | 0.913 | 0.857 |
| T2WI+CE-T1WI+SWI+CBF | Min–max_PCA_RFE_LDA | Training | 0.888 | 0.805–0.957 | 0.806 | 0.895 | 0.774 | 0.586 | 0.954 |
| | | Test | 0.955 | 0.854–1.000 | 0.767 | 1.000 | 0.682 | 1.000 | 0.533 |
| cMRI | Min–max_PCC_Relief_LR | Training | 0.833 | 0.731–0.933 | 0.778 | 0.842 | 0.755 | 0.552 | 0.930 |
| | | Test | 0.921 | 0.778–1.000 | 0.733 | 1.000 | 0.636 | 1.000 | 0.500 |
| aMRI | Mean_PCA_Relief_AB | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.915 | 0.800–0.993 | 0.800 | 0.875 | 0.773 | 0.944 | 0.583 |
| ALL | Z-score_PCA_KW_RF | Training | 1.000 | 1.000–1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Test | 0.969 | 0.904–1.000 | 0.767 | 0.125 | 1.000 | 0.759 | 1.000 |
Machine learning technique was expressed as “feature matrix normalization_dimensionality reduction_feature selector_classifier”.
T2WI, T2-weighted imaging; T1WI, T1-weighted imaging; FLAIR, fluid-attenuated inversion recovery; CE-T1WI, contrast-enhanced T1WI; ADC, apparent diffusion coefficient; SWI, susceptibility-weighted imaging; CBV, cerebral blood volume; CBF, cerebral blood flow; cMRI, model developed with all of the conventional MRI; aMRI, model developed with all of the advanced MRI; ALL, model developed with all of the eight sequences; PCC, Pearson’s correlation coefficient; PCA, principal component analysis; RFE, recursive feature elimination; KW, Kruskal–Wallis; LR, logistic regression; LDA, linear discriminant analysis; SVM, support vector machine; AE, auto-encoder; DT, decision tree; RF, random forest; AB, AdaBoost; AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value.
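For readers who want to relate the ACC, SEN, SPE, PPV, and NPV columns to one another, the sketch below computes them from confusion-matrix counts using the standard definitions. The counts are an assumption: they are inferred from the T2WI test row (SEN 0.750, SPE 0.955) together with the 8 mutant and 22 wild-type cases in the test set, and are not reported in the source. The helper `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    Here "positive" means H3 K27M mutant and "negative" means wild type.
    """
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # accuracy
        "SEN": tp / (tp + fn),                   # sensitivity
        "SPE": tn / (tn + fp),                   # specificity
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
    }


# Assumed counts for the T2WI test row: 6 of 8 mutants and 21 of 22
# wild-type cases classified correctly (inferred, not stated in the source).
t2wi_test = diagnostic_metrics(tp=6, fn=2, tn=21, fp=1)
for name, value in t2wi_test.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, all five values agree with the T2WI test row to three decimals.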
Figure 4. The receiver operating characteristic curves of the top-one-performing models of different sequences in the training (A, C) and test sets (B, D).
Results of DeLong’s test of the best models with different classifiers.
| Sequence | Highest AUC: classifier | Highest AUC: value | Lowest AUC: classifier | Lowest AUC: value | p (best vs. worst classifier) | p (ALL best vs. sequence best) | p (ALL best vs. sequence worst) |
|---|---|---|---|---|---|---|---|
| T2WI | AB | 0.915 | RF | 0.847 | 0.4851 | 0.4508 | 0.0527 |
| T1WI | AB | 0.875 | AB | 0.815 | 0.5602 | 0.1908 | 0.0621 |
| FLAIR | AE | 0.881 | LR | 0.767 | 0.2334 | 0.1940 | |
| CE-T1WI | LR | 0.881 | DT | 0.744 | 0.2568 | 0.1720 | |
| ADC | RF | 0.886 | SVM | 0.727 | | 0.1837 | |
| SWI | DT | 0.869 | AB | 0.761 | 0.2292 | 0.1834 | |
| CBV | AE | 0.807 | DT | 0.722 | 0.3579 | 0.1108 | |
| CBF | LR | 0.926 | DT | 0.761 | | 0.4729 | |
| T2WI+CE-T1WI | SVM | 0.909 | DT | 0.790 | 0.2944 | 0.3316 | 0.0756 |
| T2WI+CE-T1WI+ADC | AE | 0.869 | DT | 0.807 | 0.4912 | 0.1319 | 0.0867 |
| T2WI+CE-T1WI+SWI | LDA | 0.898 | DT | 0.847 | 0.5049 | 0.1872 | 0.5532 |
| T2WI+CE-T1WI+CBF | RF | 0.932 | SVM | 0.847 | 0.3043 | 0.5163 | 0.6795 |
| T2WI+CE-T1WI+ADC+SWI | SVM | 0.892 | DT | 0.790 | 0.1657 | 0.2710 | 1.0000 |
| T2WI+CE-T1WI+ADC+CBF | LR | 0.881 | DT | 0.824 | | 0.3280 | 0.0584 |
| T2WI+CE-T1WI+SWI+CBF | LDA | 0.955 | DT | 0.807 | 0.1121 | 0.7520 | 0.8889 |
| cMRI | LR | 0.921 | DT | 0.841 | 0.2765 | 0.4145 | |
| aMRI | AB | 0.915 | DT | 0.784 | 0.1715 | 0.3243 | 0.0618 |
| ALL | RF | 0.969 | DT | 0.790 | 0.0640 | – | – |
Bold type indicates p < 0.05.
T2WI, T2-weighted imaging; T1WI, T1-weighted imaging; FLAIR, fluid-attenuated inversion recovery; CE-T1WI, contrast-enhanced T1WI; ADC, apparent diffusion coefficient; SWI, susceptibility-weighted imaging; CBV, cerebral blood volume; CBF, cerebral blood flow; cMRI, model developed with all of the conventional MRI; aMRI, model developed with all of the advanced MRI; ALL, model developed with all of the eight sequences; LR, logistic regression; LDA, linear discriminant analysis; SVM, support vector machine; AE, auto-encoder; DT, decision tree; RF, random forest; AB, AdaBoost; AUC, area under the curve.
p-Value is for the comparison between the best and worst classifiers in the same sequence.
p-Value is for the comparison of the best classifiers between all sequence-based models (ALL) and other sequence-based models.
p-Value is for the comparison between the best classifier of all sequence-based models (ALL) and the worst classifier of other sequence-based models.
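DeLong's test compares two correlated AUCs computed on the same cases. The following is a minimal pure-Python sketch of the standard two-sided formulation, not the authors' implementation. The labels and scores are hypothetical toy data, and the function names are illustrative.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 on ties, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Return the AUC and the DeLong components V10 (per positive case)
    and V01 (per negative case). Both components average to the AUC."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(u, mean_u, v, mean_v):
    """Sample covariance of two equally long sequences with known means."""
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two models scored on the same cases.

    Needs at least two positive and two negative cases.
    Returns (auc_a, auc_b, p_value); p_value is NaN when the estimated
    variance of the AUC difference is zero.
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos_idx],
                                      [scores_a[i] for i in neg_idx])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos_idx],
                                      [scores_b[i] for i in neg_idx])
    # Entries of the covariance matrix S = S10 / m + S01 / n.
    s_aa = _cov(v10_a, auc_a, v10_a, auc_a) / m + _cov(v01_a, auc_a, v01_a, auc_a) / n
    s_bb = _cov(v10_b, auc_b, v10_b, auc_b) / m + _cov(v01_b, auc_b, v01_b, auc_b) / n
    s_ab = _cov(v10_a, auc_a, v10_b, auc_b) / m + _cov(v01_a, auc_a, v01_b, auc_b) / n
    variance = s_aa + s_bb - 2.0 * s_ab
    if variance <= 0.0:
        return auc_a, auc_b, float("nan")
    z = (auc_a - auc_b) / math.sqrt(variance)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical toy data: 1 = mutant, 0 = wild type.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.1]
model_b = [0.9, 0.2, 0.3, 0.1]
auc_a, auc_b, p_value = delong_test(labels, model_a, model_b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, p = {p_value:.4f}")
```

With a test set of 30 cases, as here, the variance estimate is noisy, which is consistent with large AUC gaps in the table failing to reach p < 0.05.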
Figure 5. The optimal performance across different sequences and classifiers.
Figure 6. The percentage of machine learning techniques in 90 top-five-performing models of different model categories. “Conventional MRI only” represents models developed only with conventional MRI sequences; “Advanced MRI only” for models only with advanced MRI sequences; “Mixed MRI” for models with both conventional and advanced MRI sequences; “Single sequences” for models with one sequence; “Combined sequences” for models with at least two sequences; and “All sequences” for models of all sequence sets.
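The kind of tally shown in Figure 6 can be sketched by splitting each technique name into its four underscore-separated stages and counting how often each component appears. The sketch below is an illustration only: it uses the 18 top-one-performing models from the performance table rather than the 90 top-five-performing models behind the figure, so its percentages are not those of Figure 6. The names `STAGES`, `parse_technique`, and `stage_shares` are hypothetical.

```python
from collections import Counter

# Stage order follows the naming convention given under the performance table.
STAGES = ("normalization", "dimensionality_reduction", "feature_selector", "classifier")

# Top-one-performing technique per sequence set, copied from the performance table.
TOP_ONE = {
    "T2WI": "Min–max_PCA_RFE_AB",
    "T1WI": "Z-score_PCC_KW_AE",
    "FLAIR": "Mean_PCC_Relief_AB",
    "CE-T1WI": "Min–max_PCC_Relief_LR",
    "ADC": "Min–max_PCC_RFE_RF",
    "SWI": "Mean_PCC_RFE_DT",
    "CBV": "Mean_PCA_Relief_AE",
    "CBF": "Z-score_PCA_RFE_LR",
    "T2WI+CE-T1WI": "Mean_PCA_ANOVA_SVM",
    "T2WI+CE-T1WI+ADC": "Z-score_PCA_RFE_AE",
    "T2WI+CE-T1WI+SWI": "Z-score_PCA_RFE_LDA",
    "T2WI+CE-T1WI+CBF": "Z-score_PCA_RFE_RF",
    "T2WI+CE-T1WI+ADC+SWI": "Min–max_PCA_KW_SVM",
    "T2WI+CE-T1WI+ADC+CBF": "Mean_PCA_Relief_LR",
    "T2WI+CE-T1WI+SWI+CBF": "Min–max_PCA_RFE_LDA",
    "cMRI": "Min–max_PCC_Relief_LR",
    "aMRI": "Mean_PCA_Relief_AB",
    "ALL": "Z-score_PCA_KW_RF",
}


def parse_technique(name):
    """Split a technique name into its four pipeline stages."""
    parts = name.split("_")
    if len(parts) != len(STAGES):
        raise ValueError(f"expected {len(STAGES)} stages in {name!r}")
    return dict(zip(STAGES, parts))


def stage_shares(names, stage):
    """Percentage of models using each component at the given stage."""
    counts = Counter(parse_technique(name)[stage] for name in names)
    total = sum(counts.values())
    return {component: 100.0 * count / total
            for component, count in counts.most_common()}


classifier_shares = stage_shares(TOP_ONE.values(), "classifier")
for component, share in classifier_shares.items():
    print(f"{component}: {share:.1f}%")
```

Passing `"normalization"`, `"dimensionality_reduction"`, or `"feature_selector"` as the stage gives the corresponding breakdown for the other pipeline steps.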