Literature DB >> 31920609

Multi-Disease Segmentation of Gliomas and White Matter Hyperintensities in the BraTS Data Using a 3D Convolutional Neural Network.

Jeffrey D Rudie1,2, David A Weiss3, Rachit Saluja4, Andreas M Rauschecker2, Jiancong Wang1, Leo Sugrue2, Spyridon Bakas1,5,6, John B Colby2.   

Abstract

An important challenge in segmenting real-world biomedical imaging data is the presence of multiple disease processes within individual subjects. Most adults above age 60 exhibit a variable degree of small vessel ischemic disease, as well as chronic infarcts, which will manifest as white matter hyperintensities (WMH) on brain MRIs. Subjects diagnosed with gliomas will also typically exhibit some degree of abnormal T2 signal due to WMH, rather than just due to tumor. We sought to develop a fully automated algorithm to distinguish and quantify these distinct disease processes within individual subjects' brain MRIs. To address this multi-disease problem, we trained a 3D U-Net to distinguish between abnormal signal arising from tumors vs. WMH in the 3D multi-parametric MRI (mpMRI, i.e., native T1-weighted, T1-post-contrast, T2, T2-FLAIR) scans of the International Brain Tumor Segmentation (BraTS) 2018 dataset (n training = 285, n validation = 66). Our trained neuroradiologist manually annotated WMH on the BraTS training subjects, finding that 69% of subjects had WMH. Our 3D U-Net model had a 4-channel 3D input patch (80 × 80 × 80) from mpMRI, four encoding and decoding layers, and an output of either four [background, active tumor (AT), necrotic core (NCR), peritumoral edematous/infiltrated tissue (ED)] or five classes (adding WMH as the fifth class). For both the four- and five-class output models, the median Dice for whole tumor (WT) extent (i.e., union of AT, ED, NCR) was 0.92 in both training and validation sets. Notably, the five-class model achieved significantly (p = 0.002) lower/better Hausdorff distances for WT extent in the training subjects. There was strong positive correlation between manually segmented and predicted volumes for WT (r = 0.96) and WMH (r = 0.89). Larger lesion volumes were positively correlated with higher/better Dice scores for WT (r = 0.33), WMH (r = 0.34), and across all lesions (r = 0.89) on a log(10) transformed scale. While the median Dice for WMH was 0.42 across training subjects with WMH, the median Dice was 0.62 for those with at least 5 cm3 of WMH. We anticipate the development of computational algorithms that are able to model multiple diseases within a single subject will be a critical step toward translating and integrating artificial intelligence systems into the heterogeneous real-world clinical workflow.
Copyright © 2019 Rudie, Weiss, Saluja, Rauschecker, Wang, Sugrue, Bakas and Colby.

Entities:  

Keywords:  convolutional neural network; deep learning; glioblastoma; multi-disease classification; radiology; segmentation; white matter hyperintensities

Year:  2019        PMID: 31920609      PMCID: PMC6933520          DOI: 10.3389/fncom.2019.00084

Source DB:  PubMed          Journal:  Front Comput Neurosci        ISSN: 1662-5188            Impact factor:   2.380


Introduction

A significant challenge in the deployment of advanced computational methods into typical clinical workflows is the vast heterogeneity of disease processes, which are present both between individuals (inter-subject heterogeneity) and within individuals (intra-subject heterogeneity). Most adults over the age of 60 have a variable degree of abnormal signal on brain MRIs due to age-related changes manifesting as white matter hyperintensities (WMH), which are typically secondary to small vessel ischemic disease (SVID) and chronic infarcts that can be found in subjects with vascular risk factors and clinical histories of stroke and dementia (Wardlaw et al., 2015). These lesions can confound automated detection and segmentation of other disease processes, including brain tumors, which also result in abnormal signal in T2-weighted (T2) and T2 Fluid-attenuated inversion recovery (T2-FLAIR) MRI scans secondary to neoplastic processes and associated edema/inflammation. We sought to address this challenge of intra-individual heterogeneity by leveraging (i) the dataset of the International Multimodal Brain Tumor Segmentation (BraTS) 2018 challenge (Menze et al., 2015; Bakas et al., 2017b, 2019) (ii) expert radiologist expertise, and (iii) three-dimensional (3D) convolutional neural networks (CNNs). Advances in the field of segmentation and radiomics within neuro-oncology have been supported by data made available through The Cancer Imaging Archive (TCIA; Clark et al., 2013). Since 2012, the BraTS challenge has further curated TCIA glioma multi-parametric MRI (mpMRI) scans, segmentation of tumor sub-regions, and survival data in a public dataset and sponsored a yearly challenge to improve performance of automated segmentation and prognostication methods (Menze et al., 2015; Bakas et al., 2017b, 2019). Similar to BraTS, there have been large efforts for improving automatic segmentation of WMH (Griffanti et al., 2016; Habes et al., 2016), which include the MICCAI 2017 WMH competition (Li et al., 2018; Kuijf et al., 2019), as well as stroke lesions, through the Ischemic Stroke Lesion Segmentation Challenge (ISLES; Winzeck et al., 2018). Deep learning (DL) approaches for biomedical image segmentation are now established as superior to the previous generation of atlas-based and hand-engineered feature approaches (Fletcher-Heath et al., 2001; Gooya et al., 2012), as demonstrated by their performance in recent image segmentation challenges (Chang, 2017; Kamnitsas et al., 2017; Li et al., 2018; Bakas et al., 2019; Myronenko, 2019). Deep learning relies on hierarchically organized layers to process increasingly complex intermediate feature maps and utilizes the gradient of the error in predictions with regard to the units of each layer to update model weights, known as “back-propagation.” In visual tasks, this allows for the identification of lower- and intermediate-level image information (feature maps) to maximize classification performance based on annotated datasets (LeCun et al., 2015; Chartrand et al., 2017; Hassabis et al., 2017). Typically, CNNs, a class of feed-forward neural networks, have been used for image-based problems, achieving super-human performance in the ImageNet challenge (Deng et al., 2009. Krizhevsky et al., 2012). The U-Net architecture (Ronneberger et al., 2015; Cicek et al., 2016; Milletari et al., 2016) describes a CNN with an encoding convolutional arm and corresponding decoding [de]convolutional arm has been shown to be particularly useful for 3D biomedical image segmentation through its semantic- and voxel-wise approach, such as for segmentation of abnormal T2-FLAIR signal across a range of diseases (Duong et al., 2019). Several prior machine learning approaches have been used to model inter-subject disease heterogeneity, such as distinguishing on an individual subject basis between primary CNS lymphoma and glioblastoma (Wang et al., 2011), or between different types of brain metastases (Kniep et al., 2018). There is evidence that these approaches may be superior to human radiologists (Suh et al., 2018), yet little work has been done to address intra-subject lesion heterogeneity. Notably, one recent study used CNNs to distinguish between WMH due to SVID versus stroke, finding that training a CNN to explicitly distinguish between these diseases allowed for improved correlation between SVID burden and relevant clinical variables (Guerrero et al., 2018). Although a large body of work has detailed methodological approaches to improve segmentation methods for brain tumors, to the best of our knowledge no prior studies have addressed intra-subject disease heterogeneity in the BraTS dataset. Although the task of distinguishing between different diseases within an individual is typically performed subconsciously by humans, distinguishing between different diseases could be challenging for an automated system if it were not specifically designed and trained to perform such a task. When provided with enough labeled training data, image-based machine learning methods have shown success in identifying patterns that are imperceptible to humans. These include GBM subtypes related to specific genetic mutations (i.e., radiogenomics; Bakas et al., 2017a; Korfiatis et al., 2017; Akbari et al., 2018; Chang et al., 2018; Rathore et al., 2018), or imaging subtypes that are predictive of clinical outcomes (Rathore et al., 2018). Therefore, we sought to train a 3D U-Net model to distinguish between abnormal radiographic signals arising from brain glioma versus WMH in individual subjects, in the mpMRI data of the BraTS 2018 challenge. We hypothesized that this would (1) allow for automatic differentiation of different disease processes, and (2) improve overall accuracy of segmentation of brain tumor extent of disease, particularly in subjects with a large amount of abnormal signal due to WMH.

Materials and Methods

Data

We utilized the publicly available data of the BraTS 2018 challenge that describe a multi-institutional collection of pre-operative mpMRI brain scans of 351 subjects (ntraining = 285, and nvalidation = 66) diagnosed with high-grade (glioblastoma) and lower-grade gliomas. The mpMRI scans comprise native T1-weighted (T1), post-contrast T1-weighted (T1PC), T2, and T2-FLAIR scans. Pre-processing of the provided images included re-orientation to LPS (left-posterior-superior) coordinate system, co-registration to the same T1 anatomic template (Rohlfing et al., 2010), resampling to isotropic 1 mm3 voxel resolution and skull-stripping as detailed in Bakas et al., 2019. Manual expert segmentation of the BraTS dataset delineated three tumor sub-regions: (1) Necrotic core (NCR), (2) active tumor (enhancing tissue; AT), and (3) peritumoral edematous/infiltrated tissue (ED). The whole tumor extent (WT) was considered the union of all these three classes.

Manual Annotation of WMH

In order to define the new tissue class of abnormal signal relating to WMH in the BraTS training subjects, a neuroradiologist (JR; neuroradiology fellow with extensive segmentation experience) defined manually segmentation masks of WMH using ITK-SNAP (Yushkevich et al., 2006). WMH were considered to be abnormal signal due to SVID, chronic infarcts, and/or any periventricular abnormal signal contralateral to the tumor. Examples of these new two class segmentations of the BraTS 2018 dataset are shown in Figure 1.
FIGURE 1

Revised BraTS 2018 training segmentations including annotations of WMH. (A) Sample revised segmentation of abnormal signal due to WT (red) and WMH (green) on T2-FLAIR, T2, T1, and T1PC axial slices. (B) Three additional example revised segmentation maps for tumor and WMH overlaid on T2-FLAIR axial slices.

Revised BraTS 2018 training segmentations including annotations of WMH. (A) Sample revised segmentation of abnormal signal due to WT (red) and WMH (green) on T2-FLAIR, T2, T1, and T1PC axial slices. (B) Three additional example revised segmentation maps for tumor and WMH overlaid on T2-FLAIR axial slices.

U-Net Architecture

We adapted the 3D U-Net architecture (Cicek et al., 2016; Milletari et al., 2016) for voxelwise image segmentation. Our encoder-decoder type fully convolutional deep neural network consists of (1) an encoder limb (with successive blocks of convolution and downsampling encoding progressively deeper/higher-order spatial features), (2) a decoder limb (with a set of blocks – symmetric to those of the encoder limb – of upsampling and convolution, eventually mapping this encoded feature set back onto the input space), and (3) an introduced novel so-called skip connections (whereby outputs of encoding layers are concatenated with inputs to corresponding decoding layers) in order to improve spatial localization over previous generations of fully convolutional networks (3D Res-U-Net; Milletari et al., 2016). Our adaptations from the prototypical U-Net architecture included: 4 channel input data (T1, T1PC, T2, T2-FLAIR), 4 or 5 class output data (background = 0, NCR = 1, ED = 2, AT = 4, WMH = 3), with 3D convolutions, and no voxelwise weighting of input label masks. Training patch size was 80 × 80 × 80 voxels (mm), and inference was conducted in the whole image. We zero padded the provided images to increase its size from 240 × 240 × 155 voxels to 240 × 240 × 160 voxels, and hence being divisible by the training input patch size (80 × 80 × 80). Training patch centerpoints were randomly sampled from within the lesion (90%) or from within the whole brain (10%). Train-time data augmentation was performed with random left-right flipping, and constrained affine warps (maximum rotation 45°, maximum scale ±25%, maximum shear ±0.1). Core convolutional blocks included two nodes each of 3D convolution (3 × 3 × 3 kernel, stride = 1, zero padded), rectified linear unit activation, and batch normalization. Four encoding/decoding levels were used, with 32 convolutional filters (channels) in the base/outermost level, and channel number increased by a factor of two at each level (Figure 2).
FIGURE 2

Multiclass input and multiclass output U-Net schematic. Our U-Net has 4-channel input accepting 3D patches from mpMRI with four encoding and decoding layers, and either a four-class output (background, AT, NCR, and ED) or a five-class output (adding WMH as the fifth class).

Multiclass input and multiclass output U-Net schematic. Our U-Net has 4-channel input accepting 3D patches from mpMRI with four encoding and decoding layers, and either a four-class output (background, AT, NCR, and ED) or a five-class output (adding WMH as the fifth class). The network was trained on an NVIDIA Titan Xp GPU (12GB), using the Xavier initialization scheme, Adam optimization algorithm (Kingma and Lei Ba, 2015; initial learning rate 1e–4), and 2nd order polynomial learning rate decay over 600 epochs. Training time was approximately 4.5 h. 10-fold internal cross validation on the training set was used for hyperparameter optimization and intrinsic estimation of generalization performance during training. For inference on the validation set, the model was retrained 10 times independently on the entire training set (n = 285), and model predictions were averaged. We trained models using this architecture twice; once with the four tissue classes originally annotated in the BraTS dataset, and again with the manual WMH segmentations added as a fifth class. All of the code has been made publicly available at https://github.com/johncolby/svid_paper.

Performance Metrics

Tissue segmentation performance was evaluated with the Dice metric (2×TP/(2×TP + FP + FN); TP = true positive; FP = false positive; FN = false negative; Dice, 1945) for the tumor segmentation in both models, as well as for WMH in the five-class model. In addition, the 95th percentile of the Hausdorff distance (Hausdorff95) was used as a performance evaluation metric, to evaluate the distance between the centers of the predicted and the expert 3D segmentations. The metrics for the four tissue classes and the Hausdorff95 distance were measured by submitting our segmentations to the online BraTS evaluation portal[1] (Davatzikos et al., 2018).

Further Exploration of U-Net Results

In order to further interrogate the performance of our proposed model, we performed correlations between manually segmented and predicted volumes for WT and WMH, as well as Bland Altman plots to assess agreement between the two measures of tissue volumes for both WT and WMH. For the evaluation of WMH, we performed correlations among the 196 cases that contained at least 100 mm3 of WMH. To better understand what could affect performance, we also evaluated correlations between total lesion volumes and Dice scores.

Results

Manual WMH Segmentations

Of the manually revised BraTS training data (285 subjects), we found 196 (68.8%) with at least 100 mm3 of WMH, 109 (38.4%) with at least 1000 mm3 (1 cm3) of WMH, 32 (11.2%) with at least 5000 mm3 (5 cm3) of WMH, and 17 (5.8%) with at least 10000 mm3 (10 cm3) of WMH. The manual WMH segmentations have been made available for public use at https://github.com/johncolby/svid_paper.

Segmentation Performance

The performance metrics for the training (10-fold cross validation) and validation subjects (final model) for each of the tissue classes in the four- and five-class models are shown in Table 1. We achieved a median Dice of 0.92 for WT in both the four- and five-class models, in both the training (p = 0.52; 10-fold cross validation) and validation datasets (p = 0.94). Segmentation performance on AT and tumor core (the union of AT and NCR) were also not significantly different between the four- and five-class models. There were no significant differences between tumor segmentation performance for high- or low-grade gliomas in the training set (p = 0.45).
TABLE 1

Performance metrics of four- and five-class models applied to the BraTS 2018 Training and Validation datasets.

Training (n = 285)
Validation (n = 66)
Performance metric4 Class model5 Class modelp value4 Class model5 Class Modelp value
Dice (Whole Tumor)0.92 (0.87-0.95)0.92 (0.87-0.94)0.520.92 (0.89-0.95)0.92 (0.90-0.95)0.94
Dice (Enhancing Tumor)0.82 (0.68-0.88)0.82 (0.68-0.88)0.760.87 (0.82-0.91)0.87 (0.81-0.91)0.99
Dice (Tumor Core)0.88 (0.75-0.93)0.89 (0.77-0.93)0.750.91 (0.81-0.95)0.91 (0.80-0.95)0.97
Dice (WMH)N/A0.42 (0.25-0.55)N/AN/AN/AN/A
Hausdorff Distance (WT)3.5 (2.2-9.0)3.0 (2.2-4.9)0.0023.1 (2.0-4.5)3 (2.0-4.4)0.84
Performance metrics of four- and five-class models applied to the BraTS 2018 Training and Validation datasets. The median Hausdorff95 distance in the training data was significantly lower (p = 0.002; two tailed t-test) in the five-class model (3.0, interquartile range 2.2–9.0) than the traditional four-class model (3.5, interquartile range 2.2–4.9). Example training cases where the Hausdorff95 distance were much better in the five-class model are shown in Figure 3 with predicted segmentations for AT, NCR, ED, and WMH for both the four and five-class models. However, the Hausdorff95 distance was not significantly different in the validation data (p = 0.84). Example validation cases with greater than 5 cm3 of WMH are shown in Figure 4, with predicted segmentations for AT, NCR, ED, and WMH.
FIGURE 3

Example segmentations in training subjects with smaller (better) Hausdorff distance metrics in the model with WMH. Axial T2-FLAIR slices of four example training cases are shown in the first row. The ground truth segmentations are overlaid in the second row (background, AT, NCR, and ED). The predicted tumor segmentations overlaid from the four-class model (background, AT, NCR, and ED) and the five-class model (background, AT, NCR, ED, and WMH) are shown in the third and fourth rows, respectively. The red arrows indicate multiple WMH distal to the tumor that were incorrectly classified in the four-class model as either ED or NCR.

FIGURE 4

Example predicted segmentations in validation subjects with greater than 5 cm3 of WMH. Axial T2-FLAIR slices of four example validation cases are shown in the first row. The predicted tumor segmentations overlaid from the four-class model (background, AT, NCR, and ED) and the five-class model (background, AT, NCR, ED, and WMH) are shown in the second and third row, respectively.

Example segmentations in training subjects with smaller (better) Hausdorff distance metrics in the model with WMH. Axial T2-FLAIR slices of four example training cases are shown in the first row. The ground truth segmentations are overlaid in the second row (background, AT, NCR, and ED). The predicted tumor segmentations overlaid from the four-class model (background, AT, NCR, and ED) and the five-class model (background, AT, NCR, ED, and WMH) are shown in the third and fourth rows, respectively. The red arrows indicate multiple WMH distal to the tumor that were incorrectly classified in the four-class model as either ED or NCR. Example predicted segmentations in validation subjects with greater than 5 cm3 of WMH. Axial T2-FLAIR slices of four example validation cases are shown in the first row. The predicted tumor segmentations overlaid from the four-class model (background, AT, NCR, and ED) and the five-class model (background, AT, NCR, ED, and WMH) are shown in the second and third row, respectively. We achieved a median Dice of 0.42 in the 189 subjects with WMH of at least 100 mm3. Median Dice for WMH in subjects with at least 1000 mm3 (1 cm3), 5000 mm3 (5 cm3) and 10000 mm3 (10 cm3) of WMH was 0.52, 0.62, and 0.67, respectively.

Correlation Between Predicted Lesion Volumes and Manual Segmented Volumes

Within the training dataset there was a strong correlation between manually segmented WT volume and predicted WT volume (Pearson r = 0.96, p < 0.0001; Figure 5A). There was also a strong correlation between manually segmented WMH volume and predicted WMH volume (Pearson r = 0.89, p < 0.0001; Figure 5B). Bland-Altman plots assessing agreement between manual and predicted volume for WT and WMH are shown in Figures 5C,D.
FIGURE 5

Relationship between manually segmented volume and U-Net predicted volume. (A) Pearson correlation between manually segmented tumor volume and U-Net predicted tumor volume. (B) Spearman ranked correlation between manually segmented WMH volume and U-Net predicted WMH volume. (C) Bland–Altman plot for WT manually segmented volume and U-Net predicted volume. (D) Bland–Altman plot for WMH manually segmented volume and U-Net predicted volume. The dotted lines in panels (C,D) mark the bounds of the 95% confidence interval of the bias.

Relationship between manually segmented volume and U-Net predicted volume. (A) Pearson correlation between manually segmented tumor volume and U-Net predicted tumor volume. (B) Spearman ranked correlation between manually segmented WMH volume and U-Net predicted WMH volume. (C) Bland–Altman plot for WT manually segmented volume and U-Net predicted volume. (D) Bland–Altman plot for WMH manually segmented volume and U-Net predicted volume. The dotted lines in panels (C,D) mark the bounds of the 95% confidence interval of the bias.

Correlations Between Lesion Volumes and Dice

Within the training dataset there was a significant correlation between manually segmented WT volumes and WT Dice scores (Pearson r = 0.33, p < 0.0001; Figure 6A) and between manually segmented WMH volumes and WMH Dice scores (Pearson r = 0.34; p < 0.0001; Figure 6B). When combining WT and WMH, there was a stronger correlation between lesion volumes and Dice scores (Pearson r = 0.68; p < 0.0001; Figure 6C), which was even stronger when the volumes were transformed to a logarithmic (log(10)) scale (Pearson r = 0.89; p < 0.0001; Figure 6D). There was no significant relationship between WMH volume and WT Dice scores (Pearson r = −0.05, p = 0.42).
FIGURE 6

Relationship between Dice score and lesion volume. (A) Scatter plot and Pearson correlation between WT volume and WT Dice score. (B) Scatter plot and Pearson correlation between WMH volumes and WMH Dice scores. (C) Scatter plot and Pearson correlation between volumes and Dice scores for both WMH and WT. (D) Scatter plot and Pearson correlation between log(10) transformed volumes and Dice scores for both WMH and WT.

Relationship between Dice score and lesion volume. (A) Scatter plot and Pearson correlation between WT volume and WT Dice score. (B) Scatter plot and Pearson correlation between WMH volumes and WMH Dice scores. (C) Scatter plot and Pearson correlation between volumes and Dice scores for both WMH and WT. (D) Scatter plot and Pearson correlation between log(10) transformed volumes and Dice scores for both WMH and WT.

Discussion

Advanced computational methods are poised to improve diagnostic and treatment methods for patients diagnosed with glioma (Davatzikos et al., 2019; Rudie et al., 2019). However, a critical challenge facing the eventual deployment of artificial intelligence systems into daily clinical practice is disease heterogeneity within subjects. In this study, we utilized the BraTS 2018 dataset and expert-revised WMH segmentations to train a state-of-the-art CNN to successfully distinguish and quantify abnormal signal due to WMH as a distinct tissue class from glioma tissue sub-regions. We used a 3D CNN (U-Net architecture; Cicek et al., 2016; Milletari et al., 2016) for multiclass tissue segmentation with performance at the top 10% of the BraTS 2018 leaderboard (Bakas et al., 2019; noting that we did not participate in the official competition). U-Nets have been particularly adept at medical image segmentation, due to their ability to convert feature maps obtained during convolutions into a vector and from that vector reconstruct a segmentation, which reduces distortion by preserving the structural integrity. To our knowledge this is the first study to distinguish intra-subject lesion heterogeneity in the BraTS dataset, noting that Guerrero et al. (2018) previously used a U-Net architecture to distinguish chronic infarcts from WMH due to SVID. Although we hypothesized that adding WMH as a tissue class could improve tumor segmentation performance, we did not find a significant difference between tumor segmentation overlap (Dice) in the model that incorporated WMH as an additional class. Incorporating WMH as a distinct fifth-class did significantly (p = 0.002; two tailed t-test) improve the Hausdorff (95th percentile) distance metric within the training sample. As the Hausdorff95 distance reflects the center of the lesion, and WMH are often far from the tumor, poorer Hausdorff95 distance in the four-class model was likely due to false positive segmentations of WMH as tumor as demonstrated in Figure 3. However, upon reviewing validation cases with larger amounts of predicted WMH (Figure 4), it appeared that the original four-class model, although not explicitly trained to model WMH, mostly learned to implicitly ignore most WMH, likely due to spatial characteristics of the WMH being distant from the primary tumors and in characteristic locations and shapes. It is possible that the addition of WMH as an additional class could degrade segmentation performance of ED that was relatively distal to the center of tumor, thus the benefits of reducing distal WMH false positives in the five-class model may have be counterbalanced by increasing false negatives. As evidenced by the BraTS leaderboard, a Dice of ∼0.90 is considered excellent and has previously been shown to be at a level similar to inter-rater reliability for BraTS (Visser et al., 2019). As demonstrated in Figure 6, we found that lesion volume was an important predictor of Dice scores for both WT (Figure 6A) and WMH (Figure 6B). When evaluating both WT and WMH (Figures 6C,D), we found that the majority of the variance in Dice scores was explained by lesion volume, particularly when transformed to a logarithmic scale (Figure 6D). Thus, poorer performance for WMH in our data appear to largely be driven by smaller lesion sizes. This is consistent with prior literature that has also shown positive correlations between lesion volumes and Dice scores (Winzeck et al., 2018; Duong et al., 2019). Although our reported Dice scores for WMH appear relatively low (0.42), it should be noted that average volume of WMH in the MICCAI 2017 dataset was 16.9 cm3 (Kuijf et al., 2019). When looking at cases with larger volumes of WMH (>10 cm3) the average Dice score (0.67) was more similar to those reported in the 2017 MICCAI WMH dataset (0.70–0.80; Kuijf et al., 2019). A further explanation for reduced segmentation performance of smaller lesions may be lower inter-rater reliability, such as what has been reported in multiple sclerosis (Dice ∼0.60; Egger et al., 2017). A limitation of the current study is that there is only a single expert annotation for both the BraTS dataset and the WMH, thus the contribution of inter-rater reliability could not be assessed. In the future we also plan to improve detection of smaller lesions by using different neural network architectures, such as two-stage detectors (Girshick et al., 2014), or implementing different loss functions, such a focal loss (Lin et al., 2018; Abraham and Khan, 2019). As artificial intelligence tools start to become integrated with clinical workflows for more precise quantitative assessments of disease burden, it will be necessary to distinguish, quantify and longitudinally assess a variety of disease processes, in order to assist with more accurate and efficient clinical decision-making. Explicitly tackling intra-subject disease heterogeneity by training models to perform these tasks should help translate these advanced computational methods into clinical practice.

Data Availability Statement

The datasets generated for this study can be found in the BraTS 2018 dataset https://www.med.upenn.edu/sbia/brats2018/data.html. The manual WMH segmentations and code used in this manuscript have been made available for public use at https://github.com/johncolby/svid_paper.

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

JR and JC performed the analysis and prepared the initial manuscript. DW, RS, and JW performed some of the analyses. AR, LS, and SB helped to direct the research. All authors helped to revise the manuscript.

Conflict of Interest

DW and RS are consultants for the company Galileo CDS and receive consultant fees for their work, which is not directly related to this manuscript. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  32 in total

1.  Primary central nervous system lymphoma and atypical glioblastoma: Differentiation using radiomics approach.

Authors:  Hie Bum Suh; Yoon Seong Choi; Sohi Bae; Sung Soo Ahn; Jong Hee Chang; Seok-Gu Kang; Eui Hyun Kim; Se Hoon Kim; Seung-Koo Lee
Journal:  Eur Radiol       Date:  2018-04-06       Impact factor: 5.315

2.  White matter hyperintensities and imaging patterns of brain ageing in the general population.

Authors:  Mohamad Habes; Guray Erus; Jon B Toledo; Tianhao Zhang; Nick Bryan; Lenore J Launer; Yves Rosseel; Deborah Janowitz; Jimit Doshi; Sandra Van der Auwera; Bettina von Sarnowski; Katrin Hegenscheid; Norbert Hosten; Georg Homuth; Henry Völzke; Ulf Schminke; Wolfgang Hoffmann; Hans J Grabe; Christos Davatzikos
Journal:  Brain       Date:  2016-02-24       Impact factor: 13.501

Review 3.  Deep learning.

Authors:  Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal:  Nature       Date:  2015-05-28       Impact factor: 49.962

4.  Focal Loss for Dense Object Detection.

Authors:  Tsung-Yi Lin; Priya Goyal; Ross Girshick; Kaiming He; Piotr Dollar
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2018-07-23       Impact factor: 6.226

5.  Fully convolutional network ensembles for white matter hyperintensities segmentation in MR images.

Authors:  Hongwei Li; Gongfa Jiang; Jianguo Zhang; Ruixuan Wang; Zhaolei Wang; Wei-Shi Zheng; Bjoern Menze
Journal:  Neuroimage       Date:  2018-08-18       Impact factor: 6.556

6.  In Vivo Detection of EGFRvIII in Glioblastoma via Perfusion Magnetic Resonance Imaging Signature Consistent with Deep Peritumoral Infiltration: The φ-Index.

Authors:  Spyridon Bakas; Hamed Akbari; Jared Pisapia; Maria Martinez-Lage; Martin Rozycki; Saima Rathore; Nadia Dahmane; Donald M O'Rourke; Christos Davatzikos
Journal:  Clin Cancer Res       Date:  2017-04-20       Impact factor: 12.531

7.  Residual Deep Convolutional Neural Network Predicts MGMT Methylation Status.

Authors:  Panagiotis Korfiatis; Timothy L Kline; Daniel H Lachance; Ian F Parney; Jan C Buckner; Bradley J Erickson
Journal:  J Digit Imaging       Date:  2017-10       Impact factor: 4.056

8.  The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors:  Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal:  J Digit Imaging       Date:  2013-12       Impact factor: 4.056

9.  Differentiation between glioblastomas, solitary brain metastases, and primary cerebral lymphomas using diffusion tensor and dynamic susceptibility contrast-enhanced MR imaging.

Authors:  S Wang; S Kim; S Chawla; R L Wolf; D E Knipp; A Vossough; D M O'Rourke; K D Judy; H Poptani; E R Melhem
Journal:  AJNR Am J Neuroradiol       Date:  2011-02-17       Impact factor: 3.825

Review 10.  What are white matter hyperintensities made of? Relevance to vascular cognitive impairment.

Authors:  Joanna M Wardlaw; Maria C Valdés Hernández; Susana Muñoz-Maniega
Journal:  J Am Heart Assoc       Date:  2015-06-23       Impact factor: 5.501

View more
  8 in total

1.  Artificial Intelligence System Approaching Neuroradiologist-level Differential Diagnosis Accuracy at Brain MRI.

Authors:  Andreas M Rauschecker; Jeffrey D Rudie; Long Xie; Jiancong Wang; Michael Tran Duong; Emmanuel J Botzolakis; Asha M Kovalovich; John Egan; Tessa C Cook; R Nick Bryan; Ilya M Nasrallah; Suyash Mohan; James C Gee
Journal:  Radiology       Date:  2020-04-07       Impact factor: 11.105

2.  Interinstitutional Portability of a Deep Learning Brain MRI Lesion Segmentation Algorithm.

Authors:  Andreas M Rauschecker; Tyler J Gleason; Pierre Nedelec; Michael Tran Duong; David A Weiss; Evan Calabrese; John B Colby; Leo P Sugrue; Jeffrey D Rudie; Christopher P Hess
Journal:  Radiol Artif Intell       Date:  2021-11-10

3.  Longitudinal Assessment of Posttreatment Diffuse Glioma Tissue Volumes with Three-dimensional Convolutional Neural Networks.

Authors:  Jeffrey D Rudie; Evan Calabrese; Rachit Saluja; David Weiss; John B Colby; Soonmee Cha; Christopher P Hess; Andreas M Rauschecker; Leo P Sugrue; Javier E Villanueva-Meyer
Journal:  Radiol Artif Intell       Date:  2022-08-03

4.  Brain MRI Deep Learning and Bayesian Inference System Augments Radiology Resident Performance.

Authors:  Jeffrey D Rudie; Jeffrey Duda; Michael Tran Duong; Po-Hao Chen; Long Xie; Robert Kurtz; Jeffrey B Ware; Joshua Choi; Raghav R Mattay; Emmanuel J Botzolakis; James C Gee; R Nick Bryan; Tessa S Cook; Suyash Mohan; Ilya M Nasrallah; Andreas M Rauschecker
Journal:  J Digit Imaging       Date:  2021-06-15       Impact factor: 4.903

5.  Trends in Development of Novel Machine Learning Methods for the Identification of Gliomas in Datasets That Include Non-Glioma Images: A Systematic Review.

Authors:  Harry Subramanian; Rahul Dey; Waverly Rose Brim; Niklas Tillmanns; Gabriel Cassinelli Petersen; Alexandria Brackett; Amit Mahajan; Michele Johnson; Ajay Malhotra; Mariam Aboian
Journal:  Front Oncol       Date:  2021-12-23       Impact factor: 6.244

6.  Automated detection and quantification of brain metastases on clinical MRI data using artificial neural networks.

Authors:  Irada Pflüger; Tassilo Wald; Fabian Isensee; Marianne Schell; Hagen Meredig; Kai Schlamp; Denise Bernhardt; Gianluca Brugnara; Claus Peter Heußel; Juergen Debus; Wolfgang Wick; Martin Bendszus; Klaus H Maier-Hein; Philipp Vollmuth
Journal:  Neurooncol Adv       Date:  2022-08-23

7.  Three-dimensional U-Net Convolutional Neural Network for Detection and Segmentation of Intracranial Metastases.

Authors:  Jeffrey D Rudie; David A Weiss; John B Colby; Andreas M Rauschecker; Benjamin Laguna; Steve Braunstein; Leo P Sugrue; Christopher P Hess; Javier E Villanueva-Meyer
Journal:  Radiol Artif Intell       Date:  2021-03-10

8.  Deep learning-enabled analysis reveals distinct neuronal phenotypes induced by aging and cold-shock.

Authors:  Sahand Saberi-Bosari; Kevin B Flores; Adriana San-Miguel
Journal:  BMC Biol       Date:  2020-09-23       Impact factor: 7.431

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.