Michael Rebsamen, Urspeter Knecht, Mauricio Reyes, Roland Wiest, Raphael Meier, Richard McKinley.
Abstract
It is a general assumption in deep learning that more training data leads to better performance, and that models will learn to generalize well across heterogeneous input data as long as that variety is represented in the training set. Segmentation of brain tumors is a well-investigated topic in medical image computing, owing primarily to the availability of a large, publicly available dataset arising from the long-running yearly Multimodal Brain Tumor Segmentation (BraTS) challenge. Research efforts and publications addressing this dataset focus predominantly on technical improvements of model architectures and less on properties of the underlying data. Using the dataset and the method ranked third in the BraTS 2018 challenge, we performed experiments to examine the impact of tumor type on segmentation performance. We propose to stratify the training dataset into high-grade glioma (HGG) and low-grade glioma (LGG) subjects and to train two separate models. Although this stratification yielded only minor gains in overall mean Dice scores, case-wise rankings of individual subjects revealed statistically significant improvements. Compared to a baseline model trained on both HGG and LGG cases, the two separately trained models performed better in 64.9% of cases (p < 0.0001) for the tumor core. An analysis of subjects that did not profit from stratified training revealed that the missegmented cases either had poor image quality or were clinically particularly challenging (e.g., underrepresented subtypes such as IDH1-mutant tumors), underlining the importance of such latent variables in the context of tumor segmentation. In summary, we found that training segmentation models on the BraTS 2018 dataset stratified according to tumor type leads to a significant increase in segmentation performance.
Furthermore, we demonstrated that this gain in segmentation performance is evident in the case-wise ranking of individual subjects but not in summary statistics. We conclude that it may be useful to treat the segmentation of brain tumors of different types or grades as separate tasks, rather than developing one tool to segment them all. Consequently, making tumor type information available for the test data should be considered, which could lead to a more clinically relevant BraTS competition.
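The stratified strategy proposed above amounts to partitioning the training subjects by tumor grade, training one model per stratum, and routing each test case to the model for its grade, which is why the grade must be known at test time. The sketch below shows only that bookkeeping, under stated assumptions: subjects are hypothetical `(subject_id, grade)` pairs, and `train_fn` is a hypothetical stand-in for a real training routine (the third-ranked BraTS 2018 method itself is not reproduced here).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical subject record: (subject_id, grade), grade being "HGG" or "LGG".
Subject = Tuple[str, str]
Model = Callable[[str], object]  # hypothetical: maps a subject ID to a segmentation


def stratify(subjects: List[Subject]) -> Dict[str, List[str]]:
    """Split subject IDs into one training list per tumor grade."""
    strata: Dict[str, List[str]] = defaultdict(list)
    for subject_id, grade in subjects:
        strata[grade].append(subject_id)
    return dict(strata)


def train_stratified(
    subjects: List[Subject], train_fn: Callable[[List[str]], Model]
) -> Dict[str, Model]:
    """Train one model per grade; train_fn is a placeholder for real training."""
    return {grade: train_fn(ids) for grade, ids in stratify(subjects).items()}


def segment(subject_id: str, grade: str, models: Dict[str, Model]) -> object:
    """Route a test case to the model of its grade (grade must be known at test time)."""
    return models[grade](subject_id)
```

A baseline in this framing would simply call `train_fn` once on all subject IDs, regardless of grade.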
Keywords: automatic segmentation; brain tumors; data stratification; deep learning; magnetic resonance imaging; training strategy
Year: 2019 PMID: 31749678 PMCID: PMC6848279 DOI: 10.3389/fnins.2019.01182
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. Summary statistics for the segmentation of the three tumor compartments, shown as Tukey boxplots. p-values indicate statistically significant (p < 0.05) improvements determined by a one-sided Wilcoxon signed-rank test.
Ratio (%) of subjects performing better than the baseline.
| LGG vs. Baseline | 41.7 | 0.877 | 49.3 | 0.454 | 54.7 | 0.208 |
| HGG vs. Baseline | 46.7 | 0.877 | ||||
| HGG/LGG vs. Baseline | 54.6 | 0.127 | 48.8 | 0.725 | ||
Statistical significance is determined by a one-sided Wilcoxon signed rank test. Bold numbers indicate statistically significant (p < 0.05) results. CE: contrast enhancing.
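The ratios in the table above are the share of subjects for which a model scores higher than the baseline. The minimal sketch below computes that percentage, assuming per-subject Dice lists in matching subject order and counting ties as "not better" (the source does not specify how ties were handled). The helper `combined_scores` is hypothetical and reflects one reading of the "HGG/LGG vs. Baseline" row: each subject is scored by the model trained on its own grade.

```python
from typing import List, Sequence


def percent_better(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Percentage of subjects whose candidate score strictly exceeds the baseline score."""
    if len(candidate) != len(baseline) or not candidate:
        raise ValueError("need two non-empty score lists of equal length")
    better = sum(c > b for c, b in zip(candidate, baseline))  # ties count as not better
    return 100.0 * better / len(candidate)


def combined_scores(
    grades: Sequence[str], hgg_scores: Sequence[float], lgg_scores: Sequence[float]
) -> List[float]:
    """Hypothetical: per subject, take the score of the model matching its grade."""
    return [h if g == "HGG" else l for g, h, l in zip(grades, hgg_scores, lgg_scores)]
```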
Figure 2. Absolute change in the Dice coefficient of the tumor core for each subject. Positive changes were observed for subjects to the right of the dashed vertical line.
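A per-subject change plot like Figure 2 can be prepared by computing each subject's difference in Dice (new model minus baseline), sorting ascending, and placing a divider before the first strictly positive change. This is a sketch under that assumption about how the figure is ordered; the function names are illustrative and do not come from the source.

```python
from typing import List, Sequence, Tuple


def sorted_changes(
    new: Sequence[float], old: Sequence[float], ids: Sequence[str]
) -> List[Tuple[str, float]]:
    """(subject_id, new - old) pairs sorted by ascending change."""
    return sorted(((i, n - o) for i, n, o in zip(ids, new, old)), key=lambda t: t[1])


def first_positive_index(changes: Sequence[Tuple[str, float]]) -> int:
    """Position of the first strictly positive change, i.e. where the divider sits."""
    for k, (_, delta) in enumerate(changes):
        if delta > 0:
            return k
    return len(changes)
```

Subjects from the returned index onward are those that improved; the count of such subjects divided by the total gives the ratio reported in the table.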
Figure 3. Performance of the HGG-only model for the tumor core (y-axis) versus its agreement with the baseline model (x-axis). Labeled subjects were visually reviewed. Colors indicate the originating center (2013, CBICA, TCIA01-08).
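Both axes of Figure 3 can be expressed with the Dice coefficient, $D(A, B) = 2|A \cap B| / (|A| + |B|)$. The caption does not define "agreement"; the sketch below assumes it is the Dice overlap between the HGG-only and baseline tumor-core masks. Masks are represented as sets of voxel coordinates to stay dependency-free, and returning 1.0 when both masks are empty is a convention chosen here, not taken from the source.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice overlap 2|A & B| / (|A| + |B|); 1.0 if both masks are empty (assumed convention)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * len(a & b) / total


def review_point(
    pred_hgg: Set[Voxel], pred_baseline: Set[Voxel], ground_truth: Set[Voxel]
) -> Tuple[float, float]:
    """(x, y) for one subject: agreement with the baseline, performance against ground truth."""
    return dice(pred_hgg, pred_baseline), dice(pred_hgg, ground_truth)
```

Subjects with low values on either axis are natural candidates for the kind of visual review described in the caption.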
Performance of selected cases for the two models.
| Brats18_2013_11_1 | 1 | 0.14 | 0.45 | 0.90 | 0.17 | 0.61 | 0.89 |
| Brats18_2013_21_1 | 3 | 0.80 | 0.83 | 0.94 | 0.76 | 0.68 | 0.94 |
| Brats18_2013_25_1 | 3 | 0.18 | 0.10 | 0.90 | 0.25 | 0.06 | 0.90 |
| Brats18_CBICA_ABN_1 | 2 | 0.84 | 0.41 | 0.82 | 0.79 | 0.84 | 0.83 |
| Brats18_CBICA_ATF_1 | 3 | 0.69 | 0.73 | 0.69 | 0.65 | 0.51 | 0.63 |
| Brats18_CBICA_AXJ_1 | 2 | 0.79 | 0.35 | 0.90 | 0.79 | 0.58 | 0.90 |
| Brats18_CBICA_BHB_1 | 2 | 0.00 | 0.00 | 0.15 | 0.00 | 0.00 | 0.24 |
| Brats18_CBICA_BHK_1 | 2 | 0.01 | 0.00 | 0.05 | 0.25 | 0.05 | 0.24 |
| Brats18_TCIA01_221_1 | 2 | 0.76 | 0.88 | 0.95 | 0.48 | 0.82 | 0.95 |
| Brats18_TCIA01_411_1 | 1 | 0.07 | 0.24 | 0.71 | 0.23 | 0.48 | 0.64 |
| Brats18_TCIA01_425_1 | – | 0.26 | 0.58 | 0.75 | 0.68 | 0.76 | 0.78 |
| Brats18_TCIA02_171_1 | 2 | 0.89 | 0.47 | 0.95 | 0.89 | 0.90 | 0.95 |
| Brats18_TCIA04_343_1 | 2 | 0.69 | 0.73 | 0.74 | 0.59 | 0.61 | 0.66 |
| Brats18_TCIA05_277_1 | 3 | 0.42 | 0.37 | 0.85 | 0.56 | 0.55 | 0.90 |
| Brats18_TCIA05_444_1 | 3 | 0.39 | 0.96 | 0.94 | 0.54 | 0.30 | 0.89 |
| Brats18_TCIA06_409_1 | – | 0.52 | 0.53 | 0.89 | 0.50 | 0.53 | 0.86 |
| Brats18_TCIA08_113_1 | 1 | 0.91 | 0.36 | 0.97 | 0.79 | 0.52 | 0.92 |
| Brats18_TCIA08_406_1 | 1 | 0.65 | 0.63 | 0.88 | 0.68 | 0.77 | 0.90 |
Assessment after a qualitative review with a neuroradiologist: 1 = issue with input image quality, 2 = possible problem with the ground truth, 3 = special phenotype. GT: ground truth; CE: contrast enhancing.
Figure 4. Brats18_2013_21_1.
Figure 5. Brats18_2013_25_1.
Figure 6. Brats18_CBICA_AXJ_1.
Figure 7. Brats18_CBICA_BHB_1.
Figure 8. Brats18_TCIA01_221_1.
Figure 9. Brats18_TCIA01_425_1.
Figure 10. Brats18_TCIA05_444_1.