Kaining Sheng, Cecilie Mørck Offersen, Jon Middleton, Jonathan Frederik Carlsen, Thomas Clement Truelsen, Akshay Pai, Jacob Johansen, Michael Bachmann Nielsen.
Abstract
We conducted a systematic review of the current status of machine learning (ML) algorithms' ability to identify multiple brain diseases, and we evaluated their applicability for improving existing scan acquisition and interpretation workflows. PubMed Medline, Ovid Embase, Scopus, Web of Science, and IEEE Xplore literature databases were searched for relevant studies published between January 2017 and February 2022. The quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. The applicability of ML algorithms for successful workflow improvement was qualitatively assessed based on the satisfaction of three clinical requirements. A total of 19 studies were included for qualitative synthesis. The included studies performed classification tasks (n = 12) and segmentation tasks (n = 7). For classification algorithms, the area under the receiver operating characteristic curve (AUC) ranged from 0.765 to 0.997, while accuracy, sensitivity, and specificity ranged from 80% to 100%, 72% to 100%, and 65% to 100%, respectively. For segmentation algorithms, the Dice coefficient ranged from 0.300 to 0.912. No studies satisfied all clinical requirements for successful workflow improvements due to key limitations pertaining to study design, study data, reference standards, and performance reporting. Standardized reporting guidelines tailored for ML in radiology, prospective study designs, and multi-site testing could help address these limitations.
Keywords: artificial intelligence; brain MRI; brain diseases; machine learning; workflow
Year: 2022 PMID: 36010228 PMCID: PMC9406456 DOI: 10.3390/diagnostics12081878
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Inclusion and exclusion criteria.
| Inclusion Criteria: | Exclusion Criteria: |
|---|---|
| Studies focusing on abnormal brain diseases that included either brain infarct, hemorrhage, or tumor on brain MRI | Studies focusing on tasks not relevant for identification of brain diseases |
| Studies developing algorithms tested on a dataset that was separate from the training dataset | Studies focusing on identification of a single brain disease only |
| Peer-reviewed studies in English | Studies focusing on development of ML for specialized MR sequences (e.g., MR elastography, functional MRI) or other imaging modalities (e.g., SPECT, PET, CT, US) |
| | Studies with primarily non-adult populations |
| | Editorials, case series, letters, conference proceedings, reviews, and inaccessible papers |
Figure 1. PRISMA workflow: 5688 records were screened, and 19 studies were included for qualitative review.
Study population and data characteristics.
| Author | Data Source | No. Patients | Training Data | Validation Data | Testing Data | Disease Distribution in Data | MR Sequences | MR Field Strength |
|---|---|---|---|---|---|---|---|---|
| Ahmadi et al., 2021 | Private + | 1200 images | 1120 | N/A | 80 | 12.5% normals | 2D single slice of: | 1.5 T |
| Baur et al., 2021 | Private | 259 patients | 100 | 18 | 141 | 42% normal used for unsupervised training | Ax T2-FLAIR | 1.5 T |
| Duong et al., 2019 | Private | 387 patients | 295 | N/A | 92 | Normal and 19 different abnormalities incl. MS, high grade glioma, and vascular (acute or subacute ischemia) | Ax T2-FLAIR | 1.5 T |
| Fayaz et al., 2021 | Harvard Medical School Whole Brain Atlas | 4100 images | 2870 | N/A | 1230 | 50% normal | 2D single slice of: | 1.5 T |
| Felipe Fattori Alves et al., 2020 | Private | 67 patients | 50 | N/A | 17 | 45% inflammatory lesion (incl. MS, vasculitis, toxoplasmosis, pyogenic and septic-embolic brain abscess, etc.) | Ax T1 & T1 + C | 1.5 T |
| Gauriau et al., 2021 | Private | 10,770 patients | 7795 | 473 | 2502 | Normal and 8+ different abnormalities including infarct, hemorrhage, neoplasm, demyelination, and infections | Ax T2-FLAIR | 1.5 T |
| Gilanie et al., 2018 | Harvard Medical School Whole Brain Atlas | 4589 images | 3029 | N/A | 1560 | 11% normal | 2D single slices of: | 1.5 T |
| Han et al., 2020 | OASIS-3 | 1162 patients | 543 | N/A | 619 | 47% normals used for unsupervised training | Ax T1 & Ax T1 + c | 1.5 T |
| Hu et al., 2020 | BRATS 2019 | 459 patients | 317 | N/A | 142 | 84% glioma (HGG, LGG) | Ax T1 & T1 + C | 1.5 T |
| Kamnitsas et al., 2017 | Private | 509 patients | 348 | N/A | 161 | 75% tumor (high grade glioma, low grade glioma) | Ax or Sag T1 & T1 + C | 1.5 T |
| Kim et al., 2021 | BRATS 2019 | 259 patients | 239 | N/A | 26 | 36% normal | 2D slices of | 1.5 T |
| Lu et al., 2021 | Private | 7134 patients | * 5002 | 1061 | 1071 | 13% acute/subacute stroke | Axial T2-FLAIR | 1.5 T |
| Lu, Lu et Zhang., 2019 | Harvard Medical School Whole Brain Atlas | 291 images | 204 | N/A | 87 | 39% normal | 2D single slice of: | 1.5 T |
| Nael et al., 2021 | Private | 13,215 patients | 9845 | 1248 | 2122 | 17% normal | Ax or Sag T1 & T1 + C | 1.5 T |
| Nayak et al., 2020 | Harvard Medical School Whole Brain Atlas & | 275 images | 165 | N/A | 110 | 20% normal | 2D single slice of: | 1.5 T |
| Nayak et al., 2020 | Harvard Medical School | 200 images | 120 | N/A | 80 | 20% normal | 2D single slice of: | 1.5 T |
| Pereira et al., 2019 | BRATS 2013 | 471 patients | 358 | 10 | 103 | 89% tumor (high grade glioma, low grade glioma) | Ax T1 & T1 + C | 1.5 T |
| Rauschecker et al., 2020 | Private | 178 patients | 86 | N/A | 92 | 19 different abnormalities incl. MS, high grade glioma, and vascular (acute or subacute ischemia) | Ax T1 + C | 1.5 T |
| Wood et al., 2022 | Private | 71,206 patients | 53,409 | 9425 | 7372 | Normal and 90+ different abnormalities including vascular disease, neoplasms, demyelination, and atrophy | Ax T2-FLAIR | 1.5 T |
Abbreviations: WMH = White Matter Hyperintensity Challenge; BRATS = Brain Tumor Segmentation Challenge Data; HGG = High Grade Glioma; LGG = Low Grade Glioma; ISLES = Ischemic Stroke Lesion Segmentation; OASIS-3 = Open Access Series of Imaging Studies; Ax. = axial; Private = local in-house dataset not publicly available. * Train/test distribution is only reported for the subpart of the study performing case-level classification of stroke/not stroke.
Disease frequencies in the study populations also varied, with diverse samples of cerebrovascular disease, brain tumors, inflammatory disease, neurodegenerative diseases, neuroinfectious diseases, dementia, traumatic brain injury, and various less significant pathologies. All study populations had samples of brain tumors, while 11 (58%) studies had samples of normal scans, brain infarcts, hemorrhages, and tumors. The brain pathologies were identified on MR sequences found in typical routine acquisitions, with T1, T2, and T2-FLAIR being the most prevalent. Eight studies (42%) used only a single scan sequence, either T2 or T2-FLAIR, for disease inference.
(a) Performance results of binary classification algorithms. (b) Performance results of multiclass classification algorithms.
| Author | Aim of Algorithm | Type of Algorithm | Ground Truth | Testing Strategy | AUC | Acc (%) | Sens (%) | Spec (%) | F1 (%) | PPV (%) | NPV (%) | Workflow Applicability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (a) | | | | | | | | | | | | |
| Fayaz et al., 2021 | Binary classification of normal and abnormal | CNN + DWT | Expert labels | Train-test split | 0.997 | N/A | 99.7 | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Felipe Fattori Alves et al., 2020 | Binary classification of inflammatory lesions and brain tumors | RF | Expert delineation | Train-test split | * 0.906 | * 82.7 | * 91.2 | N/A | * 87.5 | N/A | N/A | (A) Reflecting clinical practice: NS |
| Gauriau et al., 2021 | Binary classification of normal and abnormal | CNN | Radiological report | Train-test split incl. external test set | 0.800 | N/A | 77.0 | 65.0 | 78.0 | N/A | N/A | (A) Reflecting clinical practice: S |
| Gilanie et al., 2018 | Binary classification of normal and abnormal | Gabor filter | Expert labels | Train-test split | 0.970 | 96.5 | 98.0 | 92.0 | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Lu et al., 2021 | Binary classification of stroke/non-stroke patients | CNN + Gating attention mechanism ranking of multi-contrast MRI | Expert labels | Train-test split | ** 0.881 | N/A | N/A | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Lu, Lu et Zhang., 2019 | Binary classification of normal and abnormal | CNN + transfer learning | Expert labels | Train-test split | N/A | 100.0 | 100.0 | 100.0 | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Wood et al., 2022 | Binary classification of normal and abnormal | Ensemble | NLP labelled radiological report | Train-test split incl. external test set | 0.948 | N/A | 91.9 | 84.2 | 92.3 | N/A | N/A | (A) Reflecting clinical practice: NS |
| (b) | | | | | | | | | | | | |
| Han et al., 2020 | Multiple binary classification of normal/clinical dementia (Dem), normal/brain metastasis (BM), and normal/various diseases (VD) incl. small infarct and hemorrhage. | Unsupervised GAN + | Expert label | Train-test split | Dem: 0.765 | N/A | N/A | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Nael et al., 2021 | Multiple binary classification of normal/any abnormalities (abn), infarct (inf)/non-infarct, hemorrhage (hem)/non-hemorrhage, and mass effect (ME)/non-mass effect | CNN | Radiological report | Train-test split incl. external test set | Abn: 0.880 | Abn: 80.0 | Abn: 80.0 | Abn: 80.0 | N/A | Abn: 94.0 | Abn: 48.0 | (A) Reflecting clinical practice: NS |
| Nayak et al., 2020 | Multiclass classification of normal, stroke, tumor, infectious, degenerative | CNN | Expert labels | Train-test split | N/A | *** 97.5 | N/A | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Nayak et al., 2020 | Multiclass classification of normal, stroke, tumor, infectious, degenerative | CNN + | Expert labels | Train-test split | N/A | 93.8 | N/A | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
| Rauschecker et al., 2020 | Multiclass classification of 19 brain diseases incl. multiple sclerosis (MS), high grade glioma, and vascular infarct defined as correctly classified within top 3 differential diagnosis | CNN + | Expert labels | Train-test split | 0.920 | 91.0 | N/A | N/A | N/A | N/A | N/A | (A) Reflecting clinical practice: NS |
(a) Abbreviations: AUC = area under the receiver operating characteristic curve; Acc = accuracy; Sens = sensitivity; Spec = specificity; PPV = positive predictive value; NPV = negative predictive value; S = satisfied; NS = Not Satisfied (see Section 2); CNN = convolutional neural network; VAE = variational auto-encoder; ELM = extreme learning machine; GAN = generative adversarial network; RF = Random forest; SVM = Support vector machine; k-NN = k-Nearest Neighbor; DWT = Deep Wavelet Transform. * Metrics are reported for a random forest classifier on only T1 images. ** Results are reported for the subpart of the study performing case-level classification of stroke/not stroke. *** Metrics are reported for the largest data subset available in the study.
(b) Abbreviations: ELM = extreme learning machine; GAN = generative adversarial network.
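The classification metrics abbreviated above can be derived from a confusion matrix and a set of prediction scores. The following is a minimal sketch in plain Python, not code from any of the reviewed studies; the function names (`binary_metrics`, `auc`) and the toy labels are illustrative assumptions, with 1 taken to mean abnormal and 0 normal.

```python
def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def binary_metrics(y_true, y_pred):
    """Compute Acc, Sens, Spec, PPV, NPV, and F1 from binary labels (1 = abnormal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None  # undefined when one class is absent
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 4 abnormal and 6 normal scans.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]
    print(binary_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on disease prevalence in the test set, which is one reason the disease distribution of each dataset matters when comparing studies.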
Performance results of segmentation algorithms.
| Author | Aim of Algorithm | Type of Algorithm | Ground Truth | Testing Strategy | DSC | Sens (%) | Spec (%) | PPV (%) | NPV (%) | Workflow Applicability |
|---|---|---|---|---|---|---|---|---|---|---|
| Ahmadi et al., 2021 | Multiclass segmentation incl. neoplasm and neurodegenerative disease | CNN | Synthetic labels via robust PCA | Train-test split | 0.912 | 99.9 | 99.8 | N/A | N/A | (A) Reflecting clinical practice: NS |
| Baur et al., 2021 | Multiclass segmentation of normal, MS, glioblastoma (GBM), glioma, microangiopathy (MA), and WMH | Unsupervised VAE | Radiological report | Train-test split | MS: 0.650 | MS: 62.0 | N/A | MS: 67.0 | N/A | (A) Reflecting clinical practice: NS |
| Duong et al., 2019 | Multiclass segmentation of 19+ different abnormalities incl. MS, high grade glioma, and infarcts | CNN | Expert image delineation | Train-test split | 0.789 | 76.7 | 99.9 | 76.9 | 99.0 | (A) Reflecting clinical practice: NS |
| Hu et al., 2020 | Multiclass segmentation of infarct and glioma | CNN | Expert image delineation | Train-test split | Infarct: 0.300 | Infarct: 43.0 | Infarct: N/A | Infarct: 35.0 | N/A | (A) Reflecting clinical practice: NS |
| Kamnitsas et al., 2017 | Multiclass | Ensemble CNN | Expert image delineation | Train-test split | Infarct: 0.590 | Infarct: 60.0 | N/A | Infarct: 68.0 | N/A | (A) Reflecting clinical practice: NS |
| Kim et al., 2021 | Multiclass segmentation of infarct and glioma | Unsupervised VAE | Expert image delineation | Train-test split | Infarct: 0.278 | Infarct: 42.9 | N/A | Infarct: 20.5 | N/A | (A) Reflecting clinical practice: NS |
| Pereira et al., 2019 | Multiclass segmentation of infarct incl. penumbra and glioma | CNN | Expert image delineation | Train-test split | Infarct: 0.340 | Infarct: 55.0 [25; 85] | N/A | Infarct: 36.0 | N/A | (A) Reflecting clinical practice: NS |
Abbreviations: VAE = variational auto-encoder; CNN = convolutional neural network; DSC = Dice-score coefficient; Sens = sensitivity; Spec = specificity; PPV = positive predictive value; NPV = negative predictive value; S = satisfied; NS = Not Satisfied; PCA = principal component analysis; MS = multiple sclerosis; WMH = white matter hyperintensities.
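The DSC reported for the segmentation algorithms measures the overlap between a predicted lesion mask and the reference mask as twice the size of their intersection divided by the sum of their sizes. The sketch below illustrates this on flattened binary masks in plain Python; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the reviewed studies.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as lesion in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * intersection / total


if __name__ == "__main__":
    # Toy one-dimensional masks (hypothetical); real masks would be 2D or 3D arrays.
    predicted = [0, 1, 1, 1, 0, 0]
    reference = [0, 0, 1, 1, 1, 0]
    print(dice_coefficient(predicted, reference))
```

At the voxel level the Dice coefficient is identical to the F1 score, 2TP / (2TP + FP + FN), so a low DSC can coexist with very high specificity when lesions occupy only a small fraction of the image.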
Presentation of risk of bias/concern for applicability analysis results.
| Source | Risk of Bias: Patient | Risk of Bias: Index Test | Risk of Bias: Reference Test | Risk of Bias: Flow and Timing | Concern for Applicability: Patient | Concern for Applicability: Index Test | Concern for Applicability: Reference Test |
|---|---|---|---|---|---|---|---|
| Ahmadi et al., 2021 | | | | | | | |
| Baur et al., 2021 | | | | | | | |
| Duong et al., 2019 | | | | | | | |
| Fayaz et al., 2021 | | | | | | | |
| Felipe Fattori Alves et al., 2020 | | | | | | | |
| Gauriau et al., 2021 | | | | | | | |
| Gilanie et al., 2018 | | | | | | | |
| Han et al., 2020 | | | | | | | |
| Hu et al., 2020 | | | | | | | |
| Kamnitsas et al., 2017 | | | | | | | |
| Kim et al., 2021 | | | | | | | |
| Lu et al., 2021 | | | | | | | |
| Lu, Lu et Zhang., 2019 | | | | | | | |
| Nael et al., 2021 | | | | | | | |
| Nayak et al., 2020 | | | | | | | |
| Nayak et al., 2020 | | | | | | | |
| Pereira et al., 2019 | | | | | | | |
| Rauschecker et al., 2020 | | | | | | | |
| Wood et al., 2022 | | | | | | | |
Summary of risk of bias and concern for applicability for all included studies. Each domain was rated as low, high, or unclear risk of bias or concern for applicability.
Figure 2. Summary of risk of bias and concern for the applicability of included studies.