Ryan C Bahar1, Sara Merkaj1,2, Gabriel I Cassinelli Petersen1, Niklas Tillmanns1, Harry Subramanian1, Waverly Rose Brim1, Tal Zeevi1, Lawrence Staib1, Eve Kazarian1, MingDe Lin1,3, Khaled Bousabarah4, Anita J Huttner5, Andrej Pala2, Seyedmehdi Payabvash1, Jana Ivanidze6, Jin Cui1, Ajay Malhotra1, Mariam S Aboian1.
Abstract
Objectives: To systematically review, assess the reporting quality of, and discuss improvement opportunities for studies describing machine learning (ML) models for glioma grade prediction.
Keywords: artificial intelligence; deep learning; glioma; machine learning; systematic review
Year: 2022 PMID: 35530302 PMCID: PMC9076130 DOI: 10.3389/fonc.2022.856231
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1 PRISMA flow diagram of study search strategy. ML, machine learning; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Figure 2 Number of studies published per year from 1995 to 2020.
Figure 3 (A) Number of studies by first author’s country of affiliation and respective continent. (B) Number of studies by country (or countries) of data acquisition.
Figure 4 Classification systems used across studies for defining HGG vs. LGG across grades 1-4. HGG, high-grade gliomas; LGG, low-grade gliomas.
Mean (± standard deviation) aggregate performance metrics across studies.
| | Accuracy (n=82) | AUC (n=48) | Sensitivity (n=55) | Specificity (n=51) | Positive Predictive Value (n=12) | Negative Predictive Value (n=6) | F1 Score (n=7) |
|---|---|---|---|---|---|---|---|
| Mean ± SD | 0.89 ± 0.09 | 0.92 ± 0.07 | 0.89 ± 0.09 | 0.88 ± 0.11 | 0.90 ± 0.09 | 0.82 ± 0.08 | 0.89 ± 0.11 |
| Range | 0.53-1.00 | 0.73-1.00 | 0.63-1.00 | 0.55-1.00 | 0.68-1.00 | 0.73-0.94 | 0.67-0.98 |
n, number of studies reporting metric.
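The aggregate statistics above (mean ± standard deviation, with range) can be reproduced from per-study values; a minimal sketch using the Python standard library, with hypothetical accuracy values rather than the actual extracted data:

```python
import statistics

# Hypothetical best-model accuracies from six studies (illustration only,
# not the values extracted in this review)
accuracies = [0.89, 0.95, 0.78, 0.99, 0.85, 0.91]

mean = statistics.mean(accuracies)
sd = statistics.stdev(accuracies)  # sample standard deviation
lo, hi = min(accuracies), max(accuracies)

print(f"{mean:.2f} \u00b1 {sd:.2f} ({lo:.2f}-{hi:.2f})")
```

Using the sample (n-1) standard deviation is the conventional choice when the studies are treated as a sample of the literature rather than the full population.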
Figure 5 Prediction accuracy of most common algorithm types, measured in the best performing algorithm of each study. Circle at mean. Error bars indicate standard deviation.
Characteristics of the 10 studies reporting the highest accuracy results for their best performing models, including: glioma grade classification task, dataset source and size, ratio of high- to low-grade gliomas, validation technique, imaging sequences used in prediction, feature types used in prediction, best performing algorithm (based on accuracy results), and performance metrics.
| Paper | Glioma Grade Classification Task | Dataset | HGG : LGG Ratio | Validation Technique | Imaging Sequences | Features | Best Algorithm | Performance Metrics |
|---|---|---|---|---|---|---|---|---|
| Hedyehzadeh et al. (2020) | 2/3 vs. 4 | TCIA (n=461 patients) | 1.3:1 (262 HGG, 199 LGG in total set) | Internal (4-fold cross-validation) | T1, T1CE, T2, FLAIR | Texture | Support Vector Machine | Accuracy = 1.00 Sensitivity = 1.00 Specificity = 1.00 |
| BashirGonbadi and Khotanlou (2019) | 1/2 vs. 3/4 | BraTS (n=285 patients) | 2.8:1 (210 HGG, 75 LGG in total set) | Internal (Holdout, 15% of dataset) | T1, T1CE, T2, FLAIR | Deep learning extracted | Convolutional Neural Network | Accuracy = 0.9918 |
| Polly et al. (2018) | HGG vs. LGG (unclear) | BraTS (n=160 images) | 1:1 (50 HGG, 50 LGG in testing set) | Unspecified | T2 | First-order, Shape, Texture | Support Vector Machine | Accuracy = 0.99 Sensitivity = 1.00 Specificity = 0.9803 |
| De Looze et al. (2018) | HGG vs. LGG (unclear) | Single center hospital (n=381 patients) | Unclear | Internal (5-fold cross-validation) | T1, T1CE, T2, FLAIR, Diffusion | Qualitative | Random Forest | Accuracy = 0.99 AUC = 0.99 Sensitivity = 1.00 Specificity = 0.92 |
| Sharif et al. (2020) | HGG vs. LGG (unclear) | BraTS (n=30 patients) | 2.3:1 (7 HGG, 3 LGG in testing set) | Internal (Holdout, 10-fold cross-validation) | T1, T1CE, T2, FLAIR | Deep learning extracted | Convolutional Neural Network | Accuracy = 0.987 |
| Muneer et al. (2019) | 1 vs. 2 vs. 3 vs. 4 | Single center hospital (n=20 patients) | 1.3:1.6:1:1.5 (39 grade 1, 51 grade 2, 31 grade 3, 47 grade 4 images in testing set) | Internal | T2 | Deep learning extracted | VGG19 (Deep Convolutional Neural Network) | Accuracy = 0.9825 Sensitivity = 0.9272 Specificity = 0.9813 Positive Predictive Value = 0.9471 F1 Score = 0.9371 |
| Dandil and Bicer (2020) | 1/2 vs. 3 vs. 4 vs. meningioma | INTERPRET (n=179 patients) | Unclear | Unspecified | MR Spectroscopy (Time of Echo 20ms and 136ms) | First-order, Shape and size, Texture | Long Short-Term Memory (Neural Network) | Accuracy = 0.982 AUC = 0.9936 Sensitivity = 1.00 Specificity = 0.9753 |
| Tian et al. (2018) | 2 vs. 3/4 | Single center hospital (n=153 patients) | 2.6:1 (111 HGG, 42 LGG in total set) | Internal (10-fold cross-validation) | T1, T1CE, T2, Diffusion, Perfusion (3D Arterial Spin Labeling) | Texture | Support Vector Machine | Accuracy = 0.981 AUC = 0.992 Sensitivity = 0.987 Specificity = 0.974 |
| Lo et al. (2019) | 2 vs. 3 vs. 4 | TCIA (n=130 patients) | 1:1.4:1.9 (30 grade 2, 43 grade 3, and 57 grade 4 in total set) | Internal (10-fold cross-validation) | T1CE | Deep learning extracted | Deep Convolutional Neural Network | Accuracy = 0.979 AUC = 0.9991 |
| Kumar et al. (2020) | 1/2 vs. 3/4 | BraTS (n=285 patients) | 2.8:1 (210 HGG, 75 LGG in total set) | Internal (5-fold cross-validation) | T1, T1CE, T2, (T2W)-FLAIR | First-order, Shape, Texture | Random Forest | Accuracy = 0.9754 AUC = 0.9748 Sensitivity = 0.9762 Specificity = 0.9733 F1 Score = 0.983 |
Testing or validation metrics are reported when available, otherwise training metrics are reported. HGG, high-grade gliomas; LGG, low-grade gliomas; ML, machine learning; PRISMA-DTA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy; T1CE, T1-weighted contrast-enhanced; TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.
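Every performance metric reported in the tables above derives from a binary confusion matrix. A minimal sketch of those definitions, with HGG treated as the positive class and hypothetical counts (not taken from any listed study):

```python
# Hypothetical confusion-matrix counts for an HGG-vs-LGG classifier
tp, fp, tn, fn = 90, 5, 45, 10  # HGG = positive class

accuracy    = (tp + tn) / (tp + fp + tn + fn)
sensitivity = tp / (tp + fn)                 # recall for HGG
specificity = tn / (tn + fp)
ppv         = tp / (tp + fp)                 # positive predictive value
npv         = tn / (tn + fn)                 # negative predictive value
f1          = 2 * ppv * sensitivity / (ppv + sensitivity)

print(f"Accuracy={accuracy:.3f} Sensitivity={sensitivity:.3f} "
      f"Specificity={specificity:.3f} PPV={ppv:.3f} NPV={npv:.3f} F1={f1:.3f}")
```

Note that with the imbalanced HGG:LGG ratios common in these datasets (e.g. 2.8:1 in BraTS), accuracy can look strong even when specificity for LGG is modest, which is one reason the review tabulates several metrics rather than accuracy alone.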
Figure 6 TRIPOD adherence of machine learning glioma grade prediction studies. Adherence rate for individual items represents the percent of studies scoring a point for that item: 1 – title. 2 – abstract. 3a – background. 3b – objectives. 4a – study design. 4b – study dates. 5a – study setting. 5b – eligibility criteria. 6a – outcome assessment. 6b – blinding assessment of outcome. 7a – predictor assessment. 7b – blinding assessment of predictors. 8 – sample size justification. 9 – missing data. 10a – predictor handling. 10b – model type, model-building, and internal validation. 10d – model performance. 13a – participant flow and outcomes. 13b – participant demographics and missing data. 14a – model development (participants and outcomes). 15a – full model specification. 15b – using the model. 16 – model performance. 18 – study limitations. 19b – results interpretation. 20 – clinical use and research implications. 22 – funding. Overall – mean TRIPOD adherence rate of all studies. TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.