Sergi Valverde, Mostafa Salem, Mariano Cabezas, Deborah Pareto, Joan C Vilanova, Lluís Ramió-Torrentà, Àlex Rovira, Joaquim Salvi, Arnau Oliver, Xavier Lladó.
Abstract
In recent years, several convolutional neural network (CNN) methods have been proposed for the automated white matter lesion segmentation of multiple sclerosis (MS) patient images, owing to their superior performance compared with other state-of-the-art methods. However, the accuracy of CNN methods tends to decrease significantly when they are evaluated on image domains different from those used for training, which demonstrates the lack of adaptability of CNNs to unseen imaging data. In this study, we analyzed the effect of intensity domain adaptation on our recently proposed CNN-based MS lesion segmentation method. Given a source model trained on two public MS datasets, we investigated the transferability of the CNN model when applied to other MRI scanners and protocols, evaluating the minimum number of annotated images needed from the new domain and the minimum number of layers that must be re-trained to obtain comparable accuracy. Our analysis comprised MS patient data from both a clinical center and the public ISBI2015 challenge database, which permitted us to compare the domain adaptation capability of our model to that of other state-of-the-art methods. In both datasets, our results showed the effectiveness of the proposed model in adapting previously acquired knowledge to new image domains, even when a reduced number of training samples was available in the target dataset. For the ISBI2015 challenge, our one-shot domain adaptation model trained using only a single case showed a performance similar to that of other CNN methods that were fully trained using the entire available training set, yielding performance comparable to that of a human expert rater. We believe that our experiments will encourage the MS community to incorporate the proposed model in different clinical settings with reduced amounts of annotated data. This approach could be meaningful not only in terms of the accuracy in delineating MS lesions but also in the related reductions in time and economic costs derived from manual lesion labeling.
Keywords: Automatic lesion segmentation; Brain; Convolutional neural networks; MRI; Multiple sclerosis
Year: 2018 PMID: 30555005 PMCID: PMC6413299 DOI: 10.1016/j.nicl.2018.101638
Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881
Fig. 1. Eleven-layer CNN model architecture trained using multi-sequence 3D image patches (FLAIR and T1-w) that are 11 × 11 × 11 in size. Compared to the original implementation in Valverde et al. (2017), we double the number of convolutional layers (3DCONV) before each of the two max-pooling layers (MP) and we add two additional fully connected layers of sizes 128 (FC2) and 64 (FC3) before the softmax layer.
Fig. 2. Supervised intensity domain adaptation framework. From the 11-layer CNN source model trained on two public MS datasets (see Subsection 2.2), we transfer the model knowledge to an unseen target image domain. Domain adaptation is performed via 3 possible configurations: re-training the last FC layer (FC3), the last two FC layers (FC2 + FC3) or all FC layers (FC1 + FC2 + FC3) using images and labels from the target intensity domain. In all of the configurations, the layers that are not re-trained are depicted in gray.
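The three configurations can be sketched as a choice of which FC layers are left trainable. This is only an illustration under stated assumptions: the helper names `layers_to_retrain` and `trainable_flags` are hypothetical and are not taken from the authors' code; only the layer names FC1, FC2 and FC3 come from the captions.

```python
# Hypothetical helpers: an illustration of the three re-training
# configurations (FC3, FC2 + FC3, FC1 + FC2 + FC3), not the authors' code.
FC_LAYERS = ["FC1", "FC2", "FC3"]  # layer names as used in the captions


def layers_to_retrain(n_layers):
    """Return the last `n_layers` FC layers, i.e. those closest to the softmax."""
    if n_layers not in (1, 2, 3):
        raise ValueError("n_layers must be 1, 2 or 3")
    return FC_LAYERS[-n_layers:]


def trainable_flags(n_layers):
    """Map each FC layer to True (re-trained) or False (frozen)."""
    retrained = set(layers_to_retrain(n_layers))
    return {name: name in retrained for name in FC_LAYERS}


for n in (1, 2, 3):
    print(n, trainable_flags(n))
```

In a deep learning framework, the same flags would typically be applied by disabling gradient updates for every layer marked `False`, together with all convolutional layers.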
Training parameters for each of the CNN models used. When training the source model (see Subsection 2.2), all of the network layers are optimized from scratch. On the target models, only the last FC layer (FC3), the last two FC layers (FC2 + FC3) or all FC layers (FC1 + FC2 + FC3) are optimized, which significantly reduces the number of training parameters.
| Model | Trained layers | Trained parameters |
|---|---|---|
| Source | All (11 layers) | 470,466 |
| Target 3 layers | FC1 + FC2 + FC3 | 172,928 |
| Target 2 layers | FC2 + FC3 | 41,344 |
| Target 1 layer | FC3 | 8,320 |
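To make the reduction concrete, the short sketch below computes which share of the source model's parameters each target configuration re-trains. The counts are copied from the table above; the function and variable names are illustrative assumptions.

```python
# Parameter counts copied from the table above; names are illustrative only.
SOURCE_PARAMS = 470_466
TARGET_PARAMS = {
    "FC3": 8_320,
    "FC2 + FC3": 41_344,
    "FC1 + FC2 + FC3": 172_928,
}


def retrained_percentage(n_params, total=SOURCE_PARAMS):
    """Percentage of the source model's parameters that are re-trained."""
    return 100.0 * n_params / total


def frozen_params(n_params, total=SOURCE_PARAMS):
    """Number of parameters kept fixed from the source model."""
    return total - n_params


for layers, n_params in TARGET_PARAMS.items():
    print(f"{layers}: {retrained_percentage(n_params):.1f}% re-trained, "
          f"{frozen_params(n_params):,} parameters frozen")
```

Under these counts, re-training only FC3 touches roughly 1.8% of the network, FC2 + FC3 roughly 8.8%, and all three FC layers roughly 36.8%.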
Clinical MS dataset: DSC, sensitivity and precision coefficients for each of the models re-trained using a single case with varying degrees of lesion load. For comparison, the obtained values for SLS (Roura et al., 2015), LST (Schmidt et al., 2012) and the same cascaded CNN method fully trained using the 30 available training cases (Valverde et al., 2017) are also shown. For each coefficient, the reported values are the mean (standard deviation) when evaluated on the 30 testing cases.
| Lesion vol (num lesions) | DSC | Sensitivity | Precision |
|---|---|---|---|
| 1 layer (FC3) | | | |
| 0.5 ml (9 lesions) | 0.30 (0.19) | 0.44 (0.23) | 0.49 (0.30) |
| 1.2 ml (11 lesions) | 0.39 (0.19) | 0.44 (0.19) | 0.67 (0.23) |
| 3.1 ml (17 lesions) | 0.38 (0.22) | 0.46 (0.20) | 0.54 (0.25) |
| 8.3 ml (90 lesions) | 0.44 (0.17) | 0.58 (0.19) | 0.58 (0.26) |
| 18 ml (78 lesions) | 0.47 (0.18) | 0.59 (0.18) | 0.58 (0.23) |
| 2 layers (FC2 + FC3) | | | |
| 0.5 ml (9 lesions) | 0.30 (0.17) | 0.52 (0.23) | 0.54 (0.28) |
| 1.2 ml (11 lesions) | 0.39 (0.18) | 0.49 (0.21) | 0.72 (0.29) |
| 3.1 ml (17 lesions) | 0.36 (0.22) | 0.42 (0.20) | 0.54 (0.27) |
| 8.3 ml (90 lesions) | 0.45 (0.15) | 0.55 (0.18) | 0.66 (0.24) |
| 18 ml (78 lesions) | 0.44 (0.19) | 0.62 (0.20) | 0.52 (0.25) |
| 3 layers (FC1 + FC2 + FC3) | | | |
| 0.5 ml (9 lesions) | 0.28 (0.17) | 0.48 (0.22) | 0.48 (0.28) |
| 1.2 ml (11 lesions) | 0.38 (0.17) | 0.52 (0.22) | 0.72 (0.26) |
| 3.1 ml (17 lesions) | 0.38 (0.21) | 0.46 (0.21) | 0.55 (0.25) |
| 8.3 ml (90 lesions) | 0.44 (0.17) | 0.61 (0.17) | 0.57 (0.26) |
| 18 ml (78 lesions) | 0.45 (0.18) | 0.60 (0.21) | 0.55 (0.23) |
| Source (0 lesions) | 0.23 (0.22) | 0.42 (0.43) | 0.45 (0.34) |
| SLS | 0.25 (0.17) | 0.34 (0.25) | 0.51 (0.30) |
| LST | 0.28 (0.23) | 0.31 (0.21) | 0.59 (0.27) |
| CNN | 0.53 (0.16) | 0.60 (0.21) | 0.75 (0.21) |
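For reference, the three coefficients can be computed from a pair of binary masks as in the sketch below. It assumes the standard voxel-wise definitions (DSC = 2TP / (2TP + FP + FN), sensitivity = TP / (TP + FN), precision = TP / (TP + FP)) and the sample standard deviation; the paper's exact evaluation, which may for example count lesions rather than voxels, is not specified in this section. Function names and the toy masks are made up for illustration.

```python
from statistics import mean, stdev


def overlap_scores(pred, ref):
    """Return (DSC, sensitivity, precision) for two flat binary masks."""
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if r and not p)
    # Returning 0.0 for empty denominators is a convention of this sketch.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return dsc, sensitivity, precision


def summarize(values):
    """Format per-case scores as 'mean (standard deviation)', as in the table."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"


# Toy example: two tiny flattened masks (made-up data, not from the paper).
cases = [
    ([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]),
    ([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0]),
]
dsc_per_case = [overlap_scores(pred, ref)[0] for pred, ref in cases]
print(summarize(dsc_per_case))
```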
Fig. 3. Effect of the number of re-trained FC layers and training images on the DSC, sensitivity and precision coefficients when evaluated on the clinical MS dataset. The represented value for each configuration is computed as the mean DSC, sensitivity and precision scores over the 30 testing images. For comparison, the obtained values for the lesion segmentation methods SLS (Roura et al., 2015) (× pink line), LST (Schmidt et al., 2012) (+ cyan line) and the same cascaded CNN method fully trained using all of the available training data (Valverde et al., 2017) (− black line) are shown.
ISBI dataset: DSC, sensitivity and precision coefficients for each of the models re-trained using a single case of the training dataset against the silver masks. For comparison, the obtained values for the same source CNN method without domain adaptation (see Subsection 2.2) are also shown. For each coefficient, the reported values are the mean (standard deviation) when evaluated on the 61 testing images.
| Training case (lesion vol, num lesions) | DSC | Sensitivity | Precision |
|---|---|---|---|
| 1 layer (FC3) | | | |
| ISBI01 (17.4 ml, 29 lesions) | 0.56 (0.14) | 0.80 (0.11) | 0.62 (0.07) |
| ISBI02 (26.8 ml, 45 lesions) | 0.51 (0.21) | 0.83 (0.13) | 0.55 (0.07) |
| ISBI03 (5.9 ml, 26 lesions) | 0.65 (0.11) | 0.60 (0.17) | 0.80 (0.14) |
| ISBI04 (2.3 ml, 20 lesions) | 0.33 (0.12) | 0.41 (0.16) | 0.81 (0.14) |
| ISBI05 (4.3 ml, 22 lesions) | 0.54 (0.11) | 0.56 (0.16) | 0.84 (0.12) |
| 2 layers (FC2 + FC3) | | | |
| ISBI01 (17.4 ml, 29 lesions) | 0.56 (0.14) | 0.74 (0.11) | 0.59 (0.06) |
| ISBI02 (26.8 ml, 45 lesions) | 0.53 (0.21) | 0.87 (0.11) | 0.56 (0.06) |
| ISBI03 (5.9 ml, 26 lesions) | 0.65 (0.11) | 0.66 (0.15) | 0.79 (0.13) |
| ISBI04 (2.3 ml, 20 lesions) | 0.47 (0.12) | 0.48 (0.18) | 0.83 (0.11) |
| ISBI05 (4.3 ml, 22 lesions) | 0.56 (0.11) | 0.54 (0.16) | 0.82 (0.13) |
| 3 layers (FC1 + FC2 + FC3) | | | |
| ISBI01 (17.4 ml, 29 lesions) | 0.66 (0.10) | 0.73 (0.11) | 0.78 (0.10) |
| ISBI02 (26.8 ml, 45 lesions) | 0.69 (0.13) | 0.70 (0.18) | 0.77 (0.10) |
| ISBI03 (5.9 ml, 26 lesions) | 0.65 (0.11) | 0.63 (0.13) | 0.79 (0.14) |
| ISBI04 (2.3 ml, 20 lesions) | 0.47 (0.14) | 0.40 (0.16) | 0.84 (0.08) |
| ISBI05 (4.3 ml, 22 lesions) | 0.46 (0.12) | 0.46 (0.17) | 0.87 (0.13) |
| Source (no domain adaptation) | 0.33 (0.12) | 0.40 (0.16) | 0.72 (0.14) |
Fig. 4. Output segmentation masks for the first image of the ISBI testing set. (A) FLAIR and (B) T1-w input images. (C) Silver mask obtained from the same CNN method fully trained on the entire training dataset (Valverde et al., 2017). The other panels show the output masks for the one-shot domain adaptation model in which only the last FC layer was re-trained, using the images (D) ISBI01 (17.4 ml), (E) ISBI02 (26.8 ml), (F) ISBI03 (5.9 ml), (G) ISBI04 (2.3 ml), and (H) ISBI05 (4.3 ml). The blue regions depict the lesion voxels that overlap between the silver mask and each of the models. The red and green regions depict false-positive and false-negative lesion voxels, respectively, with respect to the silver masks.
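The colour coding described in the caption can be reproduced with a small voxel-wise comparison. The sketch below is an assumption about how such an overlay could be built, not the authors' plotting code; the function name `overlay_colors` and the use of `None` for background are illustrative choices.

```python
# Colour coding taken from the caption; everything else is illustrative.
TRUE_POSITIVE = "blue"    # lesion in both the model output and the silver mask
FALSE_POSITIVE = "red"    # lesion in the model output only
FALSE_NEGATIVE = "green"  # lesion in the silver mask only


def overlay_colors(pred, silver):
    """Assign a colour (or None for background) to each voxel of two flat binary masks."""
    colors = []
    for p, s in zip(pred, silver):
        if p and s:
            colors.append(TRUE_POSITIVE)
        elif p:
            colors.append(FALSE_POSITIVE)
        elif s:
            colors.append(FALSE_NEGATIVE)
        else:
            colors.append(None)
    return colors


# Made-up four-voxel example covering every case.
print(overlay_colors([1, 1, 0, 0], [1, 0, 1, 0]))
```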
ISBI challenge: DSC, sensitivity, precision and overall score coefficients for the best one-shot domain adaptation model (ISBI02 with 3 layers) after submitting the segmentation masks for blind evaluation. The obtained results are compared with those of different top-ranked participant strategies and of the same model fully trained on all the available data. For each method, the reported values are extracted from the challenge results board and are the mean (standard deviation) when evaluated on the 61 testing images. The performance of the methods with an overall score ≥ 90 is considered to be similar to human performance.
| Method | DSC | Sensitivity | Precision | Score |
|---|---|---|---|---|
| | 0.63 (0.14) | 0.54 (0.19) | 0.84 (0.10) | 92.07 |
| | 0.66 (0.11) | 0.67 (0.20) | 0.71 (0.16) | 91.52 |
| | 0.64 (0.12) | 0.57 (0.17) | 0.79 (0.15) | 91.44 |
| | 0.63 (0.14) | 0.55 (0.18) | 0.80 (0.15) | 91.26 |
| | 0.52 (--) | -- (--) | 0.86 (--) | 90.48 |
| | 0.60 (0.13) | 0.55 (0.17) | 0.73 (0.18) | 89.81 |
| | 0.55 (0.14) | 0.47 (0.15) | 0.73 (0.20) | 88.74 |
| | 0.55 (0.19) | 0.54 (0.15) | 0.70 (0.29) | 88.46 |
| | 0.57 (0.13) | 0.57 (0.18) | 0.61 (0.16) | 87.71 |
| | 0.52 (0.14) | 0.46 (0.15) | 0.66 (0.18) | 86.44 |
| Full train | 0.63 (0.13) | 0.55 (0.16) | 0.79 (0.14) | 91.33 |
| One-shot (3 layers, 26.8 ml) | 0.58 (0.16) | 0.48 (0.19) | 0.84 (0.13) | 90.32 |
Obtained results for Roy et al. (2018) were extracted from the related publication.