| Literature DB >> 32570943 |
Muhammad Arsalan, Na Rae Baek, Muhammad Owais, Tahir Mahmood, Kang Ryoung Park.
Abstract
Ophthalmological analysis plays a vital role in the diagnosis of various eye diseases, such as glaucoma, retinitis pigmentosa (RP), and diabetic and hypertensive retinopathy. RP is a genetic retinal disorder that leads to progressive vision degeneration and initially causes night blindness. Currently, the most commonly applied method for diagnosing retinal diseases is optical coherence tomography (OCT)-based disease analysis. In contrast, fundus imaging-based disease diagnosis is considered a low-cost diagnostic solution for retinal diseases. This study focuses on the detection of RP from the fundus image, which is a crucial task because of the low quality of fundus images and non-cooperative image acquisition conditions. Automatic detection of pigment signs in fundus images can help ophthalmologists and medical practitioners in diagnosing and analyzing RP disorders. To accurately segment pigment signs for diagnostic purposes, we present an automatic RP segmentation network (RPS-Net), which is a specifically designed deep learning-based semantic segmentation network to accurately detect and segment the pigment signs with fewer trainable parameters. Compared with the conventional deep learning methods, the proposed method applies a feature enhancement policy through multiple dense connections between the convolutional layers, which enables the network to discriminate between normal and diseased eyes, and accurately segment the diseased area from the background. Because pigment spots can be very small and consist of very few pixels, the RPS-Net provides fine segmentation, even in the case of degraded images, by importing high-frequency information from the preceding layers through concatenation inside and outside the encoder-decoder. To evaluate the proposed RPS-Net, experiments were performed based on 4-fold cross-validation using the publicly available Retinal Images for Pigment Signs (RIPS) dataset for detection and segmentation of retinal pigments. 
Experimental results show that RPS-Net achieved superior segmentation performance for RP diagnosis compared with state-of-the-art methods.
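The abstract's 4-fold cross-validation protocol can be sketched in a few lines. This is a generic illustration of fold splitting, not the authors' released code, and the image identifiers are hypothetical stand-ins for the RIPS dataset.

```python
# Generic k-fold cross-validation split, sketching the 4-fold evaluation
# protocol described in the abstract (not the authors' released code).

def k_fold_splits(items, k=4):
    """Partition items into k folds; yield (train, test) lists per fold."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

# Hypothetical image identifiers standing in for the RIPS dataset.
images = [f"rips_{n:03d}" for n in range(120)]
for train, test in k_fold_splits(images, k=4):
    assert len(train) + len(test) == len(images)
    assert not set(train) & set(test)  # folds are disjoint
```

Each image appears in exactly one test fold, so every sample is used for both training and evaluation across the four runs.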
Keywords: RPS-Net; deep learning; retinal disease; retinitis pigmentosa; semantic segmentation
Year: 2020 PMID: 32570943 PMCID: PMC7349531 DOI: 10.3390/s20123454
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of the available methods and RPS-Net for retinal pigment segmentation.
| Type | Methods | Strength | Limitation |
|---|---|---|---|
| Handcrafted local feature-based methods | Das et al. [ | Uses simple image processing schemes. | Preprocessing is required. |
| | Ravichandran et al. [ | Watershed transform gives better region approximation. | The handcrafted feature-based method's performance depends on preprocessing by CLAHE. |
| Learned/deep-feature-based methods | Brancati et al. [ | Simple machine learning classifiers are used; AdaBoost produces fewer false negatives. | The classification accuracy of the classifier depends on preprocessing such as denoising and shade correction. |
| | Brancati et al. (modified U-Net) [ | Subsequently improved segmentation performance with a modified U-Net model, with a 15% improvement in F-measure compared to [ | The method yields more false negatives (reflected in its sensitivity) compared to [ |
| | RPS-Net (proposed) | Utilizes deep concatenation inside the encoder and decoder, and encoder-to-decoder (external), for immediate feature transfer and enhancement, with a substantial reduction in false negatives. | Training a fully convolutional network requires a large amount of data, obtained through augmentation. |
Figure 1Schematic diagram of retinitis pigmentosa segmentation network (RPS-Net) deep-feature concatenation.
Figure 2Proposed RPS-Net architecture for retinal pigment segmentation.
Figure 3 Sample RP fundus images and ground truths from the RIPS dataset: (a) original images, (b) G1, and (c) G2.
Architectural differences between RPS-Net and existing deep learning models.
| Method | Other Architectures | RPS-Net |
|---|---|---|
| SegNet [ | Collectively, the network has 26 convolutional layers. | A total of 16 convolutional layers (3 × 3) are used in the encoder and decoder, with concatenation in each dense block. |
| | No feature reuse policy is employed. | Dense connectivity in both the encoder and decoder for feature empowerment. |
| | The first two dense blocks have two convolutional layers, whereas the others have three. | Each dense block has two convolutional layers. |
| | The convolutional layer with 512 depth is utilized twice in the network. | The convolutional layer with 512 depth is used once each in the encoder and decoder. |
| OR-Skip-Net [ | No feature reuse policy is implemented for internal convolutional blocks. | Internal dense connectivity for both the encoder and decoder. |
| | Only external residual skip paths are used. | Both internal and external dense paths are used via concatenation. |
| | No bottleneck layers are used. | Bottleneck layers are employed in each dense block. |
| | A total of four residual connections are used. | Overall, 20 dense connections are used internally and externally. |
| Vess-Net [ | Based on residual connectivity. | Based on dense connectivity. |
| | Feature empowerment from the first convolutional layer is missed; there is no internal or external residual connection for it. | Each layer is densely connected. |
| | No bottleneck layer is used. | Bottleneck layers are employed in each dense block. |
| | Collectively, 10 residual paths. | Overall, 20 dense connections are used internally and externally. |
| U-Net [ | Overall, 23 convolutional layers are employed. | A total of 16 convolutional layers (3 × 3) are used in the encoder and decoder, with concatenation in each dense block. |
| | Up-convolutions are used in the expansion part to upsample the features. | Up-convolutions are not used. |
| | Based on residual and dense connectivity. | Based on dense connectivity. |
| | Convolution with 1024 depth is used between the encoder and decoder. | 1024-depth convolutions are omitted to reduce the number of parameters. |
| | A cropping layer is employed for borders. | Cropping is not required; pooling indices keep the image size the same. |
| Modified U-Net [ | Overall, 3 blocks are used in each of the encoder and decoder. | Overall, 4 blocks are used in each of the encoder and decoder. |
| | Up-convolutions are used for upsampling. | Unpooling layers are used to upsample. |
| | Deep feature concatenation is used only encoder-to-decoder. | Feature concatenation is used both inside the encoder/decoder and encoder-to-decoder. |
| | The number of filters ranges from 32 to 128. | The number of filters ranges from 64 to 512. |
| Dense-U-Net [ | A total of 4 dense blocks are used inside the encoder, with 6, 12, 36, and 24 convolutional layers per block, respectively. | A total of 16 convolutional layers for the overall network, with two convolutional layers in each block. |
| | Average pooling is used in each encoder block. | Max pooling is used in each encoder block. |
| | Five up-convolutions are used in the decoder for upsampling. | Four unpooling layers are used in the decoder for upsampling. |
| H-Dense-U-Net [ | Combines 2-D Dense-U-Net and 3-D Dense-U-Net for voxel-wise prediction. | Used for pixel-wise prediction. |
| | A total of 4 dense blocks are used inside the encoder, with 3, 4, 12, and 8 3-D convolutional layers fused with the 2-D Dense-U-Net. | A total of 16 2-D convolutional layers for the overall network. |
| | Designed for 3-D volumetric features. | Designed for 2-D image features. |
| | Utilizes a 3-D average pooling layer in each 3-D dense block. | Uses a 2-D max pooling layer in each encoder dense block. |
| U-Net++ [ | The external dense path contains dense convolutional blocks. | No convolutional layer is used in the external dense path. |
| | There is a pyramid-type structure of dense convolutional blocks between the encoder and decoder. | Direct flat dense paths are used. |
| | Individual dense blocks in the dense path also have their own dense skip connections. | No convolutions are used in the dense skip path. |
RPS-Net encoder with deep-feature concatenation and the individual feature map size of each block (DB, EC, EDP, Ecat, and Pool indicate dense block, encoder convolution, external dense path, encoder concatenation, and pooling layer, respectively). A layer marked with “^^” has batch normalization and ReLU layers associated with it. The table assumes an input image size of 300 × 400 × 3.
| Block | Name/Size | Number of Filters | Output Feature Map Size |
|---|---|---|---|
| DB-1 | EC1-A ^^/3 × 3 × 3 | 64 | 300 × 400 × 64 |
| | EC1-B/3 × 3 × 64 | 64 | 300 × 400 × 64 |
| | Ecat-1 (EC1-A * EC1-B) | - | 300 × 400 × 128 |
| | Bneck-1^^/1 × 1 × 128 | 64 | 300 × 400 × 64 |
| | Pool-1 | - | 150 × 200 × 64 |
| DB-2 | EC2-A ^^/3 × 3 × 64 | 128 | 150 × 200 × 128 |
| | EC2-B/3 × 3 × 128 | 128 | 150 × 200 × 128 |
| | Ecat-2 (EC2-A * EC2-B) | - | 150 × 200 × 256 |
| | Bneck-2^^/1 × 1 × 256 | 128 | 150 × 200 × 128 |
| | Pool-2 | - | 75 × 100 × 128 |
| DB-3 | EC3-A ^^/3 × 3 × 128 | 256 | 75 × 100 × 256 |
| | EC3-B/3 × 3 × 256 | 256 | 75 × 100 × 256 |
| | Ecat-3 (EC3-A * EC3-B) | - | 75 × 100 × 512 |
| | Bneck-3^^/1 × 1 × 512 | 256 | 75 × 100 × 256 |
| | Pool-3 | - | 37 × 50 × 256 |
| DB-4 | EC4-A ^^/3 × 3 × 256 | 512 | 37 × 50 × 512 |
| | EC4-B/3 × 3 × 512 | 512 | 37 × 50 × 512 |
| | Ecat-4 (EC4-A * EC4-B) | - | 37 × 50 × 1024 |
| | Bneck-4^^/1 × 1 × 1024 | 512 | 37 × 50 × 512 |
| | Pool-4 | - | 18 × 25 × 512 |
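Under the table's conventions (same-padding 3 × 3 convolutions, channel-wise concatenation of the two block convolutions, a 1 × 1 bottleneck restoring the block depth, and 2 × 2 pooling with floor division), the encoder's output sizes can be reproduced with a short shape calculation. This is an illustrative sketch, not the network itself.

```python
# Sketch: propagate (H, W, C) through the four RPS-Net encoder dense blocks
# as specified in the table (same-padding 3x3 convs keep H and W, Ecat
# doubles the channels, the 1x1 bottleneck restores the block depth, and
# 2x2 pooling halves H and W with floor division). Shape math only.

def encoder_shapes(h, w, c, block_depths=(64, 128, 256, 512)):
    trace = []
    for depth in block_depths:
        c = depth              # EC-A and EC-B: same-padding convs set depth
        cat = 2 * c            # Ecat: concatenate EC-A and EC-B outputs
        c = depth              # Bneck: 1x1 conv restores the block depth
        h, w = h // 2, w // 2  # Pool: 2x2 pooling with floor division
        trace.append((h, w, c, cat))
    return (h, w, c), trace

final, trace = encoder_shapes(300, 400, 3)
print(final)  # (18, 25, 512), matching Pool-4 in the table
```

Note that the odd spatial sizes (75 → 37 → 18) match the table only under floor division, which is how the 300 × 400 input reaches 18 × 25 at Pool-4.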
RPS-Net decoder with deep-feature concatenation and the individual feature map size of each block (DB, DC, EDP, Dcat, and Unpool indicate dense block, decoder convolution, external dense path, decoder concatenation, and unpooling layer, respectively). A layer marked with “^^” has batch normalization and ReLU layers associated with it. The table assumes an input image size of 300 × 400 × 3.
| Block | Name/Size | Number of Filters | Output Feature Map Size |
|---|---|---|---|
| DB-4 | Unpool-4 | - | 37 × 50 × 512 |
| | DC4-B ^^/3 × 3 × 512 | 512 | 37 × 50 × 512 |
| | DC4-A/3 × 3 × 512 | 256 | 37 × 50 × 256 |
| | Dcat-4 (DC4-B * DC4-A * EC4-A) | - | 37 × 50 × 1280 |
| | Bneck-5^^/1 × 1 × 1280 | 256 | 37 × 50 × 256 |
| DB-3 | Unpool-3 | - | 75 × 100 × 256 |
| | DC3-B ^^/3 × 3 × 256 | 256 | 75 × 100 × 256 |
| | DC3-A/3 × 3 × 256 | 128 | 75 × 100 × 128 |
| | Dcat-3 (DC3-B * DC3-A * EC3-A) | - | 75 × 100 × 640 |
| | Bneck-6^^/1 × 1 × 640 | 128 | 75 × 100 × 128 |
| DB-2 | Unpool-2 | - | 150 × 200 × 128 |
| | DC2-B ^^/3 × 3 × 128 | 128 | 150 × 200 × 128 |
| | DC2-A/3 × 3 × 128 | 64 | 150 × 200 × 64 |
| | Dcat-2 (DC2-B * DC2-A * EC2-A) | - | 150 × 200 × 320 |
| | Bneck-7^^/1 × 1 × 320 | 64 | 150 × 200 × 64 |
| DB-1 | Unpool-1 | - | 300 × 400 × 64 |
| | DC1-B ^^/3 × 3 × 64 | 64 | 300 × 400 × 64 |
| | DC1-A/3 × 3 × 64 | 2 | 300 × 400 × 2 |
| | Dcat-1 (DC1-B * DC1-A * EC1-A) | - | 300 × 400 × 130 |
| | Bneck-8^^/1 × 1 × 130 | 2 | 300 × 400 × 2 |
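The decoder concatenation depths in the table follow directly from summing the channel counts of the three concatenated feature maps. The following sketch checks those sums; it is an arithmetic illustration, not the network itself.

```python
# Sketch: verify the Dcat-* concatenation depths in the decoder table.
# Each Dcat concatenates DC*-B, DC*-A, and the corresponding encoder
# feature map EC*-A along the channel axis.

dcat_inputs = {
    "Dcat-4": (512, 256, 512),  # DC4-B, DC4-A, EC4-A
    "Dcat-3": (256, 128, 256),  # DC3-B, DC3-A, EC3-A
    "Dcat-2": (128, 64, 128),   # DC2-B, DC2-A, EC2-A
    "Dcat-1": (64, 2, 64),      # DC1-B, DC1-A, EC1-A
}
dcat_depths = {name: sum(chs) for name, chs in dcat_inputs.items()}
print(dcat_depths)
# {'Dcat-4': 1280, 'Dcat-3': 640, 'Dcat-2': 320, 'Dcat-1': 130}
```

The small final depth of Dcat-1 (130 = 64 + 2 + 64) reflects that DC1-A already reduces the features to the two output classes (pigment vs. background).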
Figure 4Illustration of the data augmentation process used to generate artificial images to train RPS-Net; Hflip and Vflip represent the horizontal flip and vertical flip, respectively.
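The flip-based augmentation in Figure 4 can be sketched in a few lines; this is a generic illustration of horizontal/vertical flipping, not the authors' pipeline, and images are represented as nested lists of pixel values.

```python
# Sketch of flip-based augmentation (Figure 4): each image yields its
# horizontal flip (Hflip), vertical flip (Vflip), and their combination.
# Images are represented as nested lists (rows of pixels).

def hflip(img):
    return [row[::-1] for row in img]  # mirror left-right

def vflip(img):
    return img[::-1]                   # mirror top-bottom

def augment(img):
    """Return the original image plus its three flipped variants."""
    return [img, hflip(img), vflip(img), vflip(hflip(img))]

img = [[1, 2],
       [3, 4]]
assert len(augment(img)) == 4  # 4x more training images per original
```

The same flips must be applied to the ground-truth masks so that pigment labels stay aligned with the transformed images.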
Figure 5Examples of RPS-Net results for pigment sign segmentation for the Retinal Images for Pigment Signs (RIPS) dataset: (a) original retinal image; (b) ground-truth mask G1; (c) ground-truth mask G2; (d) predicted retinal pigment mask by RPS-Net, where FP is indicated in green, FN in red, and TP in blue.
Figure 6ROC curves for RPS-Net based on the ground-truth masks by (a) the first expert G1 and (b) the second expert G2.
Accuracies of retinal pigment sign segmentation by RPS-Net for the RIPS dataset based on the ground-truth mask by the first expert G1 (unit: %).
| Type | Method | Sen | Spe | P | F | Acc |
|---|---|---|---|---|---|---|
| Handcrafted local feature-based methods | * Ravichandran et al. [ | 72.0 | 97.0 | - | 62.0 | 96.0 |
| Learned/deep-feature-based methods | Random Forest [ | 58.26 | 99.46 | 46.18 | 47.93 | 99.14 |
| AdaBoost M1 [ | 64.29 | 99.30 | 42.45 | 46.76 | 99.01 | |
| U-Net 48 × 48 [ | 55.70 | 99.40 | 48.00 | 50.60 | 99.00 | |
| U-Net 72 × 72 [ | 62.60 | 99.30 | 46.50 | 52.80 | 99.00 | |
| U-Net 96 × 96 [ | 55.20 | 99.60 | 56.10 | 55.10 | 99.20 | |
| RPS-Net (proposed method) | 80.54 | 99.60 | 54.05 | 61.54 | 99.52 |
Accuracies of retinal pigment sign segmentation by RPS-Net for the RIPS dataset based on the ground-truth mask by the second expert G2 (unit: %).
| Type | Method | Sen | Spe | P | F | Acc |
|---|---|---|---|---|---|---|
| Learned/deep-feature-based methods | Random Forest [ | 56.20 | 99.48 | 50.49 | 49.29 | 99.11 |
| AdaBoost M1 [ | 61.76 | 99.33 | 46.29 | 48.30 | 98.99 | |
| RPS-Net (proposed method) | 78.09 | 99.62 | 56.84 | 62.62 | 99.51 |
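The measures reported in the tables (Sen, Spe, P, F, Acc) are the standard confusion-matrix metrics computed over pixels. A minimal sketch, using hypothetical counts rather than the paper's data:

```python
# Standard pixel-wise segmentation metrics used in the tables above:
# sensitivity (Sen), specificity (Spe), precision (P), F-measure (F),
# and accuracy (Acc), computed from confusion-matrix counts.

def segmentation_metrics(tp, fp, fn, tn):
    sen = tp / (tp + fn)                   # recall of pigment pixels
    spe = tn / (tn + fp)                   # recall of background pixels
    p = tp / (tp + fp)                     # precision
    f = 2 * p * sen / (p + sen)            # F-measure (harmonic mean)
    acc = (tp + tn) / (tp + fp + fn + tn)  # overall pixel accuracy
    return {"Sen": sen, "Spe": spe, "P": p, "F": f, "Acc": acc}

# Hypothetical counts for illustration only (not from the RIPS experiments).
print(segmentation_metrics(tp=80, fp=20, fn=20, tn=9880))
```

Because pigment pixels are a tiny fraction of each fundus image, Spe and Acc are near-saturated for all methods; Sen, P, and F are the discriminating measures, which is why the tables emphasize them.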
Figure 7Sample image for retinal pigment sign detection, count, and size analysis. (a) Original image; (b) detected pigment spots with sizes.
Figure 8Sample image for retinal pigment sign detection, count, and size analysis. (a) Original image; (b) detected pigment spots with sizes.