Khaled Bousabarah, Maximilian Ruge, Julia-Sarita Brand, Mauritius Hoevels, Daniel Rueß, Jan Borggrefe, Nils Große Hokamp, Veerle Visser-Vandewalle, David Maintz, Harald Treuer, Martin Kocher.
Abstract
INTRODUCTION: Deep learning-based algorithms have demonstrated strong performance in the segmentation of medical images. We collected a dataset of multiparametric MRI and contour data acquired for use in radiosurgery to evaluate the performance of deep convolutional neural networks (DCNN) in the automatic segmentation of brain metastases (BM).
Keywords: Brain metastasis; Deep learning; Magnetic resonance imaging; Segmentation; Stereotactic radiosurgery
Year: 2020 PMID: 32312276 PMCID: PMC7171921 DOI: 10.1186/s13014-020-01514-6
Source DB: PubMed Journal: Radiat Oncol ISSN: 1748-717X Impact factor: 3.481
Fig. 1 Architecture of the trained U-Net. All convolutions used filters with a kernel size of 3. Before each convolution, instance normalization and the activation function (leaky ReLU) were applied to the input. The residual block contained two such convolutions. Downsampling in the encoding layer was realized using a convolution with a stride of 2. In the output layers, the sigmoid function was applied to the DCNN's output. For the moU-Net, two intermediate output layers were added (dashed red lines). The original contour data was then used to compute the cost function
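The ordering of operations described in this caption can be illustrated with a minimal 1D sketch in plain Python. It is not the authors' implementation: the actual network operates on image volumes with learned multi-channel filters, and the leaky ReLU slope (0.01), the zero padding, the identity skip connection, the example kernels, and all function names are assumptions made for illustration.

```python
import math

def instance_norm(x, eps=1e-5):
    """Normalize one channel of one sample to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def leaky_relu(x, slope=0.01):  # the slope is an assumed value
    return [v if v > 0 else slope * v for v in x]

def conv3(x, kernel, stride=1):
    """1D convolution with kernel size 3 and zero padding of 1 (assumed)."""
    padded = [0.0] + list(x) + [0.0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(0, len(x), stride)
    ]

def pre_activated_conv(x, kernel, stride=1):
    """Instance norm and leaky ReLU are applied before the convolution."""
    return conv3(leaky_relu(instance_norm(x)), kernel, stride)

def residual_block(x, kernel_a, kernel_b):
    """Two pre-activated convolutions plus an identity skip connection."""
    y = pre_activated_conv(x, kernel_a)
    y = pre_activated_conv(y, kernel_b)
    return [a + b for a, b in zip(x, y)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical input signal and kernels, for illustration only
signal = [0.0, 1.0, 4.0, 2.0, -3.0, 0.5, 1.5, -1.0]
features = residual_block(signal, [0.2, 0.5, 0.2], [-0.1, 0.3, 0.1])
downsampled = pre_activated_conv(features, [0.25, 0.5, 0.25], stride=2)
probabilities = [sigmoid(v) for v in downsampled]
```

The residual block preserves the input length, while the stride-2 convolution halves it, which mirrors how resolution is reduced along the encoding path.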
Characteristics of the patient cohort enrolled in this study
| | Train | Test |
|---|---|---|
| No. of Patients | 469 | 40 |
| Gender (Female / Male) | 244 / 225 | 26 / 14 |
| Mean Age | 61 | 62 |
| No. of Metastases | 1149 | 83 |
| Metastases < 0.4 ml (No. of Patients) | 524 (257) | 47 (24) |
| Median Lesion Size | 0.31 ml | 0.47 ml |
| IQR | (0.09–1.32 ml) | (0.14–1.82 ml) |
| Mean Lesion Size | 1.29 ml | 1.92 ml |
| Primary tumor (No. of Patients) | | |
| Lung | 255 | 20 |
| Melanoma | 85 | 9 |
| Breast | 48 | 4 |
| Other | 66 | 6 |
| Mixed | 9 | 0 |
| CUP | 6 | 1 |
| Year (No. of Patients) | | |
| 2013 | 13 | 1 |
| 2014 | 78 | 2 |
| 2015 | 98 | 6 |
| 2016 | 97 | 14 |
| 2017 | 97 | 8 |
| 2018 | 63 | 6 |
| 2019 | 23 | 3 |
| MRI scanner (No. of Patients) | | |
| Ingenia (3.0 T) | 372 | 37 |
| Ingenia (1.5 T) | 2 | 0 |
| Achieva (3.0 T) | 93 | 3 |
| Intera (1.5 T) | 2 | 0 |
Performance of the algorithms by network type and type of ensemble building. SUM: summation, MV: majority voting
| DCNN Type | Ensemble Method | Sensitivity | Precision | F1-Score | Sensitivity Small BM | AFPR | Mean DSC | DSC per Lesion |
|---|---|---|---|---|---|---|---|---|
| moU-Net | SUM | 0.71 | 0.89 | 0.79 | 0.51 | 0.18 | 0.71 | 0.74 |
| cU-Net | SUM | 0.71 | 0.94 | 0.81 | 0.51 | 0.1 | 0.7 | 0.73 |
| sU-Net | SUM | 0.53 | 0.85 | 0.65 | 0.68 | 0.2 | 0.27 | 0.61 |
| 0.83 | 0.82 | 0.35 | 0.7 | |||||
| moU-Net | MV | 0.65 | 0.96 | 0.78 | 0.43 | 0.05 | 0.71 | 0.73 |
| cU-Net | MV | 0.63 | 1 | 0.77 | 0.4 | 0 | 0.69 | 0.73 |
| sU-Net | MV | 0.43 | 0.95 | 0.59 | 0.62 | 0.05 | 0.21 | 0.52 |
| 0.77 | 0.64 | 0.71 |
DSC: Dice similarity coefficient; AFPR: average false positive rate. The F1-score combines sensitivity and precision into a single metric by calculating their harmonic mean, in order to find the most balanced model
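As a quick consistency check, the reported F1-scores can be recomputed as the harmonic mean of the sensitivity and precision columns. The sketch below uses only the fully labeled rows of the table; the function name and the dictionary layout are illustrative choices, not part of the study.

```python
def f1_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)

# (sensitivity, precision, reported F1-score) taken from the table above
rows = {
    ("moU-Net", "SUM"): (0.71, 0.89, 0.79),
    ("cU-Net", "SUM"): (0.71, 0.94, 0.81),
    ("sU-Net", "SUM"): (0.53, 0.85, 0.65),
    ("moU-Net", "MV"): (0.65, 0.96, 0.78),
    ("cU-Net", "MV"): (0.63, 1.00, 0.77),
    ("sU-Net", "MV"): (0.43, 0.95, 0.59),
}

recomputed = {
    key: round(f1_score(sens, prec), 2) for key, (sens, prec, _) in rows.items()
}
```

Rounded to two decimals, the recomputed values agree with the F1-Score column for all six rows.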
Fig. 2 Sensitivity and specificity of the developed networks plotted against the minimum volume of the considered target lesions. Dashed lines depict the four quartiles (Q) of the measured volumes of target lesions in the test data (Q1 = 0.06 ml, Q2 = 0.29 ml, Q3 = 1.29 ml, Q4 = 8.05 ml). The largest drop in both sensitivity and specificity is observed for lesions smaller than 0.06 ml. At this threshold, the sensitivities and specificities are 0.97/0.92 and 0.92/1.00 for NetSUM and NetMV, respectively
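The kind of curve shown in this figure can be sketched by recomputing sensitivity over only those lesions whose volume reaches a given minimum. The lesion list below is hypothetical and not taken from the study; only the thresholds reuse the quartile values quoted in the caption. The second curve in the figure is not covered here, since it would additionally require the false positive segmentations.

```python
def sensitivity_above(lesions, min_volume_ml):
    """Sensitivity over lesions whose volume is at least min_volume_ml."""
    considered = [hit for volume, hit in lesions if volume >= min_volume_ml]
    if not considered:
        return None  # no lesion reaches the threshold
    return sum(considered) / len(considered)

# Hypothetical (volume in ml, detected?) pairs, not data from the study
lesions = [
    (0.03, False), (0.05, False), (0.08, True), (0.29, True),
    (0.50, False), (1.29, True), (3.00, True), (8.05, True),
]

curve = {t: sensitivity_above(lesions, t) for t in (0.0, 0.06, 0.29, 1.29)}
```

In this toy data the two smallest lesions are missed, so raising the minimum volume from 0 to 0.06 ml produces the largest jump in sensitivity.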
Fig. 3 a: Bland-Altman plot visualizing agreement between manually delineated ground truth and automatic segmentation by DCNN per patient. The middle horizontal line is drawn at the mean difference (0.15 ml) between both measurements, and the lines below and above it at the limits of agreement (95% CI). b: Volume predicted by DCNN plotted against manual segmentation. The concordance correlation coefficient (CCC), which measures deviation from the diagonal line depicting perfect agreement between both volumes, was 0.87
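Both agreement measures named in this caption can be computed in a few lines. The sketch below assumes the conventional definitions: limits of agreement as the mean difference plus or minus 1.96 standard deviations, and Lin's concordance correlation coefficient. The sign convention of the difference (manual minus predicted) and the per-patient volumes are hypothetical, not values from the study.

```python
import math

def bland_altman(a, b):
    """Return the mean difference and the 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd

def concordance_cc(a, b):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return 2 * cov / (var_a + var_b + (mean_a - mean_b) ** 2)

# Hypothetical per-patient volumes in ml, for illustration only
manual = [0.5, 1.2, 3.0, 0.1, 7.5]
predicted = [0.4, 1.0, 3.3, 0.1, 6.9]

mean_diff, lower, upper = bland_altman(manual, predicted)
ccc = concordance_cc(manual, predicted)
```

Unlike the Pearson correlation, the CCC is reduced by any systematic offset or scale difference between the two measurements, which is why it reaches 1 only when all points lie on the diagonal.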
Fig. 4 Samples of T1c images (if not otherwise specified) containing ground truth segmentations (blue lines) and segmentations by DCNN (purple lines). a: Randomly selected samples of detected lesions. The number in the bottom left of each image is the percentage of segmentations with similar quality of segmentation as measured by DSC. b: Samples of undetected lesions. Atypical BM: Largest undetected lesion with minor contrast uptake in the rim. Wrong T1c: Second-largest undetected lesion, where T1c images came from a different study than T2 and FLAIR images. Small BM: Randomly selected samples of undetected lesions
Fig. 5 Results per lesion for all algorithms (cU-Net, moU-Net, sU-Net and their combination) and ensemble building through summation and majority voting. A lesion in the test set (40 patients, 83 lesions) was considered detected if it overlapped with a segmentation produced by the respective algorithm. The degree of overlap, and thus the quality of the segmentation, was assessed using the Dice similarity coefficient (DSC). The dashed blue line is the threshold at which a lesion was defined as small (< 0.4 ml) and thus used to train the sU-Net
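The two ensemble strategies and the overlap-based evaluation described in this caption can be sketched on a toy 1D example. The exact rules used in the study are not given here, so the sketch assumes that summation averages the network outputs before thresholding at 0.5, and that majority voting thresholds each output first and then requires more than half of the networks to agree. All names and values are hypothetical.

```python
def ensemble_sum(prob_maps, threshold=0.5):
    """Average the network outputs voxel-wise, then binarize (assumed rule)."""
    n = len(prob_maps)
    return [sum(p) / n >= threshold for p in zip(*prob_maps)]

def ensemble_majority(prob_maps, threshold=0.5):
    """Binarize each output, then keep voxels with a strict majority."""
    n = len(prob_maps)
    return [sum(v >= threshold for v in p) > n / 2 for p in zip(*prob_maps)]

def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks."""
    overlap = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * overlap / total if total else 1.0

def is_detected(lesion_mask, prediction):
    """A lesion counts as detected if any predicted voxel overlaps it."""
    return any(a and b for a, b in zip(lesion_mask, prediction))

# Hypothetical outputs of three networks on six voxels
probs = [
    [0.90, 0.80, 0.40, 0.10, 0.00, 0.00],
    [0.80, 0.30, 0.30, 0.10, 0.00, 0.00],
    [0.70, 0.45, 0.90, 0.20, 0.00, 0.00],
]
lesion_a = [True, True, False, False, False, False]
lesion_b = [False, False, False, False, True, True]

sum_mask = ensemble_sum(probs)
mv_mask = ensemble_majority(probs)
dsc_sum = dice(lesion_a, sum_mask)
dsc_mv = dice(lesion_a, mv_mask)
```

In this toy case, summation keeps voxels that a single confident network supports, whereas majority voting discards them, so the summed mask is larger. That behavior is in line with the pattern in the results table, where summation yields higher sensitivity and majority voting higher precision.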
Comparison of deep learning based segmentation studies for brain metastases (adapted from Dikici et al. [7]). AFPR: average false positive rate
| Study | Patients | Multiparametric MRI | Dedicated Test Set | Mean BM volume (mm³) | Median BM volume (mm³) | Sensitivity | AFPR |
|---|---|---|---|---|---|---|---|
| Losch et al. [ | 490 | no | yes | NA | NA | 0.83 | 7.7 |
| Charron et al. [ | 182 | yes | yes | 2400 | 500 | 0.93 | 7.8 |
| Grøvik et al. [ | 156 | yes | no | NA | NA | 0.83 | 8.3 |
| Dikici et al. [7] | 158 | no | no | 159.6 | 50.4 | 0.9 | 9.12 |
| Present paper | 509 | yes | yes | 1290 (train), 1920 (test) | 310 (train), 470 (test) | 0.77–0.82 | 0.08–0.35 |