Mohsen Ghafoorian, Nico Karssemeijer, Tom Heskes, Inge W M van Uden, Clara I Sanchez, Geert Litjens, Frank-Erik de Leeuw, Bram van Ginneken, Elena Marchiori, Bram Platel.
Abstract
The anatomical location of imaging features is of crucial importance for accurate diagnosis in many medical tasks. Convolutional neural networks (CNNs) have had huge successes in computer vision, but they lack the natural ability to incorporate the anatomical location in their decision-making process, hindering success in some medical image analysis tasks. In this paper, to integrate the anatomical location information into the network, we propose several deep CNN architectures that consider multi-scale patches or take explicit location features while training. We apply and compare the proposed architectures for segmentation of white matter hyperintensities in brain MR images on a large dataset. As a result, we observe that the CNNs that incorporate location information substantially outperform a conventional segmentation method with handcrafted features as well as CNNs that do not integrate location information. On a test set of 50 scans, the best configuration of our networks obtained a Dice score of 0.792, compared to 0.805 for an independent human observer. Performance levels of the machine and the independent human observer were not statistically significantly different (p-value = 0.06).
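The Dice score quoted in the abstract measures the overlap between a predicted mask and a reference mask as twice the size of their intersection divided by the sum of their sizes. A minimal sketch follows, assuming binary masks stored as flat sequences of 0/1; the function name `dice_score`, the toy masks, and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
def dice_score(pred, ref):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D "masks"; real masks would be 3-D voxel arrays, flattened.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_score(predicted, reference))  # 0.75
```

Here three voxels are marked in both masks and each mask marks four voxels, so the score is 2 × 3 / (4 + 4) = 0.75.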
Year: 2017 PMID: 28698556 PMCID: PMC5505987 DOI: 10.1038/s41598-017-05300-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A pattern is observable in the WMH occurrence probability map.
MR imaging protocol specification for the T1 and FLAIR modalities.
| Modality | TR/TE/TI | Flip angle | Voxel size | Interslice gap |
|---|---|---|---|---|
| T1 | 2250/3.68/850 ms | 15° | 1.0 × 1.0 × 1.0 mm | 0 mm |
| FLAIR | 9000/84/2200 ms | 15° | 1.2 × 1.0 × 5.0 mm | 1 mm |
Figure 2. An example of negative (top row) and positive (bottom row) samples at three scales (from left to right): 32 × 32, 64 × 64 and 128 × 128 on the FLAIR image. The two larger scales are downsampled to 32 × 32.
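The multi-scale sampling described in this caption can be sketched as follows: three patches of 32 × 32, 64 × 64 and 128 × 128 pixels are cut around the same center, and the two larger ones are reduced to 32 × 32. This is only an illustration under stated assumptions. The function name `multiscale_patches`, the zero padding at the image border, and block averaging as the downsampling method are not specified in the source.

```python
import numpy as np


def multiscale_patches(image, row, col, sizes=(32, 64, 128), out_size=32):
    """Return one out_size x out_size patch per scale, centered on (row, col)."""
    half_max = max(sizes) // 2
    # Zero-pad so patches near the border stay in bounds (padding mode is an assumption).
    padded = np.pad(image, half_max, mode="constant")
    r, c = row + half_max, col + half_max
    patches = []
    for size in sizes:
        half = size // 2
        patch = padded[r - half:r + half, c - half:c + half]
        factor = size // out_size
        # Downsample by averaging non-overlapping factor x factor blocks
        # (the source does not state the interpolation; this is an assumption).
        patch = patch.reshape(out_size, factor, out_size, factor).mean(axis=(1, 3))
        patches.append(patch)
    return np.stack(patches)


# Random array standing in for a 2-D FLAIR slice.
flair_slice = np.random.default_rng(0).random((256, 256))
patches = multiscale_patches(flair_slice, row=100, col=120)
print(patches.shape)  # (3, 32, 32)
```

All three outputs share the same spatial size, so they can be fed to streams with identical input shapes while covering progressively more context.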
Figure 3. Patch preparation process and the different proposed CNN architectures. The links between sets of convolutional layers represent a weight-sharing policy among the streams.
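To make the weight-sharing idea in this caption concrete, the toy sketch below applies one set of weights to all three scale streams and then appends explicit location features to the fused feature vector. It is not the authors' architecture: a single linear map with ReLU stands in for the convolutional layers, and the names `stream_features`, `shared_weights` and `location`, the feature sizes, and the location values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def stream_features(patch, weights):
    """Toy stand-in for a convolutional stream: linear map followed by ReLU."""
    return np.maximum(weights @ patch.ravel(), 0.0)


# One weight matrix reused by all three scale streams (weight sharing).
shared_weights = rng.normal(size=(16, 32 * 32)) * 0.01

patches = rng.random((3, 32, 32))      # three scales of one sample, each 32 x 32
location = np.array([0.4, 0.6, 0.5])   # hypothetical normalized location features

# Every stream uses the same weights; outputs are fused by concatenation,
# and the location features are appended to the fused vector.
features = [stream_features(p, shared_weights) for p in patches]
fused = np.concatenate(features + [location])
print(fused.shape)  # (51,)
```

With independent weights, each stream would instead hold its own matrix, tripling the number of stream parameters in this toy setting.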
Performance comparison of different CNN architectures based on validation set A and test set Dice score considering observer 1 and observer 2 as the reference standard.
| Method | Without location: validation set A | Without location: test set Dice (obs1) | Without location: test set Dice (obs2) | With location: validation set A | With location: test set Dice (obs1) | With location: test set Dice (obs2) |
|---|---|---|---|---|---|---|
| SS | 0.9939 | 0.731 | 0.729 | 0.9972 | 0.781 | 0.778 |
| MSEF | 0.9947 | 0.762 | 0.752 | 0.9966 | 0.777 | 0.769 |
| MSIW | 0.9966 | 0.778 | 0.768 | 0.9972 | 0.795 | 0.787 |
| MSWS | 0.9965 | 0.773 | 0.760 | 0.9973 | 0.792 | 0.783 |
A performance comparison between the conventional method, the MSWS + Loc architecture, and the human observers.
| Method | Dice (obs1) | Dice (obs2) |
|---|---|---|
| Conventional | 0.716 | 0.699 |
| MSWS + Loc | 0.792 | 0.783 |
| observer 1 | — | 0.805 |
| observer 2 | 0.805 | — |
Statistical significance tests for pairwise comparison of the methods' Dice scores. $p_{ij}$ (row i, column j) indicates the p-value for the null hypothesis that method i is better than method j.
| Method | MSWS | SS + Loc | MSWS + Loc | Ind. Obs. |
|---|---|---|---|---|
| SS | <0.01 | <0.01 | <0.01 | <0.01 |
| MSWS | — | <0.01 | <0.01 | <0.01 |
| SS + Loc | — | — | 0.03 | 0.03 |
| MSWS + Loc | — | — | — | 0.06 |
A performance comparison of the single-scale architecture with different possible locations to add the spatial location information. Abbreviations: last convolutional layer (LCL), first fully connected layer (FFCL), second fully connected layer (SFCL).
| Method | Validation set | Test set Dice |
|---|---|---|
| LCL | 0.9964 | 0.763 |
| FFCL | 0.9971 | 0.781 |
| SFCL | 0.9967 | 0.778 |
Figure 4. Integration of spatial location information fills the gap between the performance of a normal CNN and that of a human observer. (a) An ROC comparison of different CNN methods, a conventional segmentation method and the independent human observer, considering observer 1 as the reference standard. (b) A comparison of different methods on Dice score as a function of the binary masking threshold. The light shades around the curves indicate 95% confidence intervals obtained by bootstrapping on patients.
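The confidence intervals mentioned in this caption come from bootstrapping on patients. A minimal sketch of a percentile bootstrap is given below, assuming one score per patient and the mean as the summary statistic; the paper may aggregate differently. The function name `bootstrap_ci`, the number of resamples, and the example scores are illustrative and are not data from the study.

```python
import random


def bootstrap_ci(per_patient_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean score, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(per_patient_scores)
    means = []
    for _ in range(n_resamples):
        # Draw n patients with replacement and record the mean of their scores.
        sample = [per_patient_scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up per-patient Dice values, for illustration only (not from the paper).
scores = [0.81, 0.77, 0.69, 0.85, 0.74, 0.79, 0.82, 0.71, 0.80, 0.76]
low, high = bootstrap_ci(scores)
print(low, high)
```

Resampling whole patients rather than individual voxels keeps the voxels of one scan together, which respects the correlation within a patient.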
Figure 5. Test Dice as a function of training set size.
Test Dice as a function of training set size.
| Training set size | 23 | 47 | 94 | 189 | 378 |
|---|---|---|---|---|---|
| Test set Dice | 0.753 | 0.756 | 0.770 | 0.788 | 0.792 |
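Since each training set size in the table is roughly double the previous one, the change in test Dice per doubling can be read off with a little arithmetic. The snippet below only restates the table values; the variable names are illustrative.

```python
sizes = [23, 47, 94, 189, 378]
dice = [0.753, 0.756, 0.770, 0.788, 0.792]

# Gain in test Dice between consecutive training set sizes (each about a doubling).
gains = [round(b - a, 3) for a, b in zip(dice, dice[1:])]
for (small, large), gain in zip(zip(sizes, sizes[1:]), gains):
    print(f"{small} -> {large} scans: +{gain:.3f}")
```

The largest gains fall between 47 and 189 scans (+0.014 and +0.018), while the final doubling to 378 scans adds only 0.004.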
Figure 6. Two sample cases of segmentation improvement by adding location information to the network. (a) FLAIR images without annotations. (b) Segmentation by human observer 1. (c) Segmentation by the SS method. (d) Segmentation by the MSWS + Loc method.
Figure 7. Gliosis around the lacunes is a prevalent type of false positive segmentation. (a) FLAIR images without annotations. (b) Segmentation by human observer 1. (c) Segmentation by human observer 2. (d) Segmentation by the MSWS + Loc method.
Figure 8. A sample case with a small lesion missed by the two human observers. (a) FLAIR image without annotations. (b) Segmentation by human observer 1. (c) Segmentation by human observer 2. (d) Segmentation by the MSWS + Loc method.