Siddhesh Thakur, Jimit Doshi, Sarthak Pati, Saima Rathore, Chiharu Sako, Michel Bilello, Sung Min Ha, Gaurav Shukla, Adam Flanders, Aikaterini Kotrotsou, Mikhail Milchenko, Spencer Liem, Gregory S Alexander, Joseph Lombardo, Joshua D Palmer, Pamela LaMontagne, Arash Nazeri, Sanjay Talbar, Uday Kulkarni, Daniel Marcus, Rivka Colen, Christos Davatzikos, Guray Erus, Spyridon Bakas.
Abstract
Brain extraction, or skull-stripping, is an essential pre-processing step in neuro-imaging that has a direct impact on the quality of all subsequent processing and analysis steps. It is also a key requirement in multi-institutional collaborations to comply with privacy-preserving regulations. Existing automated methods, including Deep Learning (DL) based methods that have obtained state-of-the-art results in recent years, have primarily targeted brain extraction without considering pathologically-affected brains. Accordingly, they perform sub-optimally when applied to magnetic resonance imaging (MRI) brain scans with apparent pathologies, such as brain tumors. Furthermore, existing methods focus on using only T1-weighted MRI scans, even though multi-parametric MRI (mpMRI) scans are routinely acquired for patients with suspected brain tumors. In this study, we present a comprehensive performance evaluation of recent deep learning architectures for brain extraction, training models on mpMRI scans of pathologically-affected brains, with a particular focus on seeking a practically-applicable approach with a low computational footprint that generalizes across multiple institutions, further facilitating collaborations. We identified a large retrospective multi-institutional dataset of n=3340 mpMRI brain tumor scans, with manually-inspected and approved gold-standard segmentations, acquired during standard clinical practice under varying acquisition protocols, both from private institutional data and public (TCIA) collections. To facilitate optimal utilization of the rich mpMRI data, we further introduce and evaluate a novel "modality-agnostic training" technique that can be applied using any available modality, without the need for model retraining. Our results indicate that the modality-agnostic approach obtains accurate results, providing a generic and practical tool for brain extraction on scans with brain tumors.
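The record does not include the authors' code, but the gist of modality-agnostic training (feeding the network one randomly chosen modality per training sample, so a single one-channel model can strip skulls from whichever modality is available at inference) can be sketched as below. This is a minimal PyTorch sketch; the `subjects` layout, the modality keys, and the `ModalityAgnosticDataset` name are hypothetical illustrations, not the paper's implementation.

```python
import random
import torch
from torch.utils.data import Dataset

class ModalityAgnosticDataset(Dataset):
    """Wraps a co-registered mpMRI dataset; each __getitem__ returns ONE
    randomly chosen modality as a single-channel input, so a single
    1-channel model learns to perform brain extraction on any modality."""

    def __init__(self, subjects):
        # subjects: list of dicts, e.g. {"T1": arr, "T1Gd": arr, "T2": arr,
        # "FLAIR": arr, "mask": arr} of NumPy arrays (hypothetical layout)
        self.subjects = subjects

    def __len__(self):
        return len(self.subjects)

    def __getitem__(self, idx):
        s = self.subjects[idx]
        # pick one available modality at random for this sample
        modality = random.choice([k for k in s if k != "mask"])
        image = torch.from_numpy(s[modality]).unsqueeze(0).float()  # 1xDxHxW
        mask = torch.from_numpy(s["mask"]).long()
        return image, mask
```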
Keywords: Brain Extraction; Brain tumor; Deep learning; Evaluation; Glioblastoma; Glioma; Skull-stripping; TCIA
Year: 2020 PMID: 32603860 PMCID: PMC7597856 DOI: 10.1016/j.neuroimage.2020.117081
Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 7.400
Fig. 1. Example 2D tri-planar sections of mpMRI brain tumor scans after applying the Brain Extraction Tool (BET) (Smith, 2002), followed by manual revisions.
Summary of data included in our comparative evaluations.
| Dataset | # Subjects | # mpMRI Scans |
|---|---|---|
| UPenn | 453 | 1,812 |
| TJU | 152 | 608 |
| MDA | 25 | 100 |
| TCGA-GBM | 82 | 328 |
| TCGA-LGG | 93 | 372 |
Fig. 2. Example 2D tri-planar sections of T1Gd MRI scans from the UPenn, MDA, and TJU datasets, one dataset per row, illustrating the high variability between datasets. Note the lower resolution of the TJU scans, which accentuates the resampling interpolation.
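As context for the resampling interpolation noted in the caption, below is a generic way to bring heterogeneous clinical scans onto a common isotropic grid. This is a minimal SimpleITK sketch assuming linear interpolation to 1 mm isotropic spacing; it is not the paper's documented pre-processing pipeline, and `resample_isotropic` is a hypothetical helper.

```python
import SimpleITK as sitk

def resample_isotropic(image, spacing_mm=1.0):
    """Linearly resample a scan to isotropic voxels (hypothetical helper)."""
    in_spacing = image.GetSpacing()
    in_size = image.GetSize()
    # new grid size that preserves the physical field of view
    out_size = [int(round(sz * sp / spacing_mm))
                for sz, sp in zip(in_size, in_spacing)]
    return sitk.Resample(
        image, out_size, sitk.Transform(),    # identity transform
        sitk.sitkLinear,                      # linear interpolation
        image.GetOrigin(), (spacing_mm,) * 3, # isotropic output spacing
        image.GetDirection(), 0.0,            # background value
        image.GetPixelID())

# usage: scan = resample_isotropic(sitk.ReadImage("t1gd.nii.gz"))
```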
Fig. 3. Overview of the complete framework applied in this study, leading to results for further analyses.
Fig. 4. Residual connection in the encoder/decoder block of our 3D-Res-U-Net implementation.
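The residual connection in Fig. 4 can be illustrated with a short block. A minimal PyTorch sketch, assuming 3D convolutions, instance normalization, LeakyReLU activations, and a 1x1x1 projection on the shortcut when channel counts differ; the exact layer choices of the paper's 3D-Res-U-Net are not specified in this record.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Encoder/decoder block with an identity-style shortcut, per Fig. 4."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.norm1 = nn.InstanceNorm3d(out_ch)
        self.norm2 = nn.InstanceNorm3d(out_ch)
        self.act = nn.LeakyReLU(inplace=True)
        # 1x1x1 projection so the shortcut matches the output channel count
        self.skip = (nn.Conv3d(in_ch, out_ch, kernel_size=1)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        shortcut = self.skip(x)
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        # residual addition before the final activation
        return self.act(out + shortcut)
```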
Time to converge during training. Reported times are in hours; the number in parentheses is the number of epochs each model was trained for before being applied for inference.
| Training setup | DeepMedic | 3D-U-Net | 3D-Res-U-Net | 2D-ResInc | FCN |
|---|---|---|---|---|---|
| T1-T1 | 18 (35) | 6 (25) | 6 (25) | 150.5 (96) | 3 (25) |
| T2-T2 | 26 (35) | 6 (25) | 6 (25) | 71.5 (38) | 3 (25) |
| T1Gd-T1Gd | 24 (35) | 6 (25) | 6 (25) | 151.7 (100) | 3 (25) |
| Flair-Flair | 18 (35) | 6 (25) | 6 (25) | 147.7 (94) | 3 (25) |
| Multi-2 | 36 (35) | 8 (25) | 8 (25) | 157.3 (91) | 4 (25) |
| Multi-4 | 45 (35) | 8 (25) | 8 (25) | 113.6 (56) | 4 (25) |
| M-A | 62 (45) | 25 (25) | 25 (25) | 321.8 (54) | 17 (25) |
Fig. 5. Quantitative evaluation of various DL network architectures compared with BET and FreeSurfer, using the T1 MRI brain tumor scans.
Fig. 6. Quantitative (average Dice) evaluation of various DL network architectures. From top to bottom, the rows show results on data from (a) UPenn, (b) TJU, and (c) MDA. The evaluated models include training on individual modalities and their ensemble using majority voting, as well as multi-modality training.
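The majority-voting ensemble referenced in the caption admits a compact expression: each per-modality model emits a binary brain mask, and a voxel is kept only if a strict majority of the models labels it as brain. A minimal NumPy sketch (`majority_vote` is a hypothetical helper):

```python
import numpy as np

def majority_vote(masks):
    """Fuse binary brain masks from several models; a voxel survives only
    if a strict majority of the models votes for it."""
    votes = np.stack(masks).astype(np.int32).sum(axis=0)
    return (2 * votes > len(masks)).astype(np.uint8)
```

Note that with an even number of models (e.g., four single-modality networks), a strict majority means a 2-2 tie is counted as background.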
Fig. 7. Quantitative (average Hausdorff) evaluation of various DL network architectures. From top to bottom, the rows show results on data from (a) UPenn, (b) TJU, and (c) MDA. The evaluated models include training on individual modalities and their ensemble using majority voting, as well as multi-modality training.
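The two metrics reported across Figs. 5-9 have standard definitions: Dice = 2|A∩B|/(|A|+|B|), and Hausdorff95 is the 95th percentile of symmetric surface-to-surface distances. A minimal NumPy/SciPy sketch, assuming isotropic voxels (otherwise pass the voxel spacing via the `sampling` argument of `distance_transform_edt`):

```python
import numpy as np
from scipy import ndimage

def dice(a, b):
    """Dice = 2|A∩B| / (|A|+|B|) for binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff95(a, b):
    """95th percentile of symmetric surface distances (in voxel units)."""
    a, b = a.astype(bool), b.astype(bool)
    surf_a = a ^ ndimage.binary_erosion(a)  # boundary voxels of a
    surf_b = b ^ ndimage.binary_erosion(b)  # boundary voxels of b
    dt_a = ndimage.distance_transform_edt(~surf_a)  # distance to a's surface
    dt_b = ndimage.distance_transform_edt(~surf_b)  # distance to b's surface
    return np.percentile(np.concatenate([dt_b[surf_a], dt_a[surf_b]]), 95)
```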
Fig. 8. Evaluation results (Dice) for the selected 3D-Res-U-Net and DeepMedic under the modality-agnostic training process. The best-performing models from Fig. 6 are also included for comparison purposes.
Fig. 9. Evaluation results (Hausdorff95) for the selected 3D-Res-U-Net and DeepMedic under the modality-agnostic training process. The best-performing models from Fig. 7 are also included for comparison purposes.
Fig. 10. Mean Dice of model inference on publicly-available multi-institutional data from TCIA. Diverse data contribute to performance improvements. The “M-A” training process performs comparably to “T1-T1”.
Fig. 11. Quantitative (average Dice and Hausdorff) evaluation of the various DL network architectures tested on unseen defaced data from an independent institution (WashU). The evaluated models include training on individual modalities of the UPenn dataset and their ensemble using majority voting, as well as multi-modality training.
Fig. 12. Evaluation results of the selected 3D-Res-U-Net under the modality-agnostic training process, tested on unseen defaced data from an independent institution (WashU). Average Dice and Hausdorff95 metrics are shown in the left and right columns, respectively. The “T1-T1” and “Multi-4” models are also included for comparison purposes.