Literature DB >> 36112649

Dermoscopic image segmentation based on Pyramid Residual Attention Module.

Yun Jiang1, Tongtong Cheng1, Jinkun Dong1, Jing Liang1, Yuan Zhang1, Xin Lin1, Huixia Yao1.   

Abstract

We propose a stacked convolutional neural network incorporating a novel and efficient pyramid residual attention (PRA) module for the automatic segmentation of dermoscopic images. Precise segmentation is a significant and challenging step for computer-aided diagnosis in skin lesion diagnosis and treatment. The proposed PRA has the following characteristics: First, it combines three widely used structures. The pyramid structure extracts feature information of the lesion area at different scales, the residual connection ensures the efficiency of model training, and the attention mechanism screens for effective feature maps. Thanks to the PRA, our network can still obtain precise boundary information that distinguishes healthy skin from diseased areas even for blurred lesion regions. Second, the PRA can increase the segmentation ability of a single module for lesion regions through efficient stacking. Third, we incorporate the encoder-decoder idea into the architecture of the overall network. Compared with traditional networks, we divide the segmentation procedure into three levels and construct the pyramid residual attention network (PRAN). The shallow layer mainly processes spatial information, the middle layer refines both spatial and semantic information, and the deep layer intensively learns semantic information. The basic module of PRAN is the PRA, which is sufficient to ensure the efficiency of the three-layer architecture. We extensively evaluate our method on the ISIC2017 and ISIC2018 datasets. The experimental results demonstrate that PRAN obtains segmentation performance that matches or exceeds state-of-the-art deep learning models under the same experimental conditions.

Entities:  

Mesh:

Year:  2022        PMID: 36112649      PMCID: PMC9481037          DOI: 10.1371/journal.pone.0267380

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


1. Introduction

Skin cancer is one of the most common cancers, with two to three million new cases occurring worldwide each year. According to the American Cancer Society, there were 108,420 new cases of skin cancer and 11,480 deaths in 2020, year-on-year increases of 12.37% and 58.78% [1]. The lethality of skin cancer can be reduced by early diagnosis: even malignant melanoma has a cure rate of 95% in the early stages of the disease, but once it worsens, the survival rate of patients is only 15% [2]. Because malignant melanoma is concealed and has a long latency, the early diagnosis of skin lesions is indispensable. Dermoscopy provides the physician with high-resolution images of abnormal parts of the patient's epidermis; by segmenting these images manually, the physician can further identify the symptoms and reduce the probability of misdiagnosis [3]. However, accurate segmentation of these images remains a challenging task. As shown in Fig 1, dermoscopic images suffer from low contrast, a non-uniform appearance, and a large number of interfering factors (such as hair, artifacts, and artificial markers) that obstruct the diagnostic process [4]. Computer-aided diagnosis (CAD) technology plays an essential role in addressing such challenges. After the lesion area is accurately delineated from the normal skin area by an automated system, physicians can identify the diagnostic area directly and improve the accuracy of the diagnosis [5].
Fig 1

(a) Background differences between skin and lesions with poor contrast. (b) Irregularity and variable size of lesions. (c) Images of lesions are often accompanied by a large amount of noise. The green line represents the Ground Truth and the yellow line the segmentation by our model.

In recent years, deep learning methods have been used increasingly in computer vision, natural language processing, and medical image processing. Convolutional neural networks (CNNs) have achieved good results on many medical image segmentation tasks. Compared with traditional manual segmentation methods, CNNs learn the feature distribution of a dataset and adjust their parameters from the images alone, yielding faster, better, and more stable segmentation results. PSPNet [6], with its pyramid pooling module (PPM), fuses features from multiple receptive fields and improves the model's ability to learn from data with complex and diverse feature types. Wei et al. [7] proposed placing the PPM on each skip connection of U-Net [8] to improve segmentation, using a pixel attention mechanism to weight the results of the pyramid and increase the receptive field of the PPM.

More recently, attention mechanisms have emerged that can effectively extract global information from an image. They provide references for local information extraction, focus on the distinctive parts of the image, and discard irrelevant information, thus yielding more accurate segmentation results. As one of the first segmentation models to combine attention with medical image processing, Attention U-Net [9] applied a spatial-attention-like mechanism to control the information flowing through skip connections, passing relevant information while suppressing the transmission of irrelevant information. The SE-block proposed in Squeeze-and-Excitation networks [10] ranks the channels of feature maps via global average pooling and thereby assigns weights to feature channels more effectively. CA-Net [11] and ADAM [12] incorporated bidirectional, channel, spatial, and multi-scale attention mechanisms into the encoder-decoder structure; these combinations of attention mechanisms aim to capture spatial information in the shallow layers, thereby enriching the underlying semantic information. CA-Net placed attention modules on the up-sampling paths and skip connections of U-Net, while the adaptive bidirectional attention module proposed in ADAM supplements the shallow extraction results with attention information from the bottom level.

Although the above attention mechanisms can improve the segmentation performance of neural networks to varying degrees, they mostly share the following limitations:

1. A single convolution kernel has limited ability to learn diverse features, and the size of the receptive field has an important impact on the feature extraction process. For feature-diverse datasets, the feature extraction ability of the model can be further enhanced by a multi-scale feature fusion mechanism.

2. The importance of the convolution layers at each stage is ignored during the design of the network framework. It is reasonable to adopt different convolution strategies when confronted with semantic information of different sizes and densities; deeper sub-networks can be assigned to the shallow stage, where information is most comprehensive, to extract it sufficiently.

3. Classic networks do not screen the information during feature extraction.
They ignore that different convolution kernels in a feature extraction module extract different amounts of information. The module responsible for extracting information should be supervised and managed accordingly, so as to prioritize the update of important convolution kernels and reduce unnecessary training cost.

To address the above shortcomings, we propose a novel pyramid residual attention (PRA) module and assemble it into a network, PRAN (Pyramid Residual Attention Network), with good segmentation effectiveness. Compared with existing work, PRAN uses the PRA module, consisting of feature pyramids, residual units, and attention, as the base unit of the network to ensure the stability and robustness of the model. Our work can be summarized in the following points:

1. We propose a novel pyramid residual attention (PRA) module. It consists of several basic convolution structures widely used in current neural networks. The module automatically extracts features of different sizes from the original image and has good feature extraction ability for skin cancer images.

2. Differing from traditional U-Net-based networks, we propose a network composed of combined PRA modules, PRAN. We divide the traditional encoder-decoder structure into three stages, each of which learns feature maps of a different size to enhance the encoder-decoder. This design largely alleviates the information loss that U-shaped networks suffer from excessive sampling.

3. Our PRAN achieves good segmentation results on both the ISIC2017 and ISIC2018 datasets.

The rest of this paper is arranged as follows: Section 2 introduces related work. Section 3 introduces the proposed module and network. Section 4 presents the experimental details. Section 5 gives our conclusion.

2. Related works

In this section, we introduce related work on skin lesion segmentation in four areas: attention mechanisms, residual and pyramid attention networks, loss functions, and preprocessing methods.

2.1. Attention mechanisms

Starting from U-Net and FCN [13], many attention mechanisms have been proposed, such as channel attention [14], spatial attention [15], multi-scale attention [16], cross attention [17], and bidirectional attention [18]. Atrous convolution and pyramid pooling structures have also been used to improve the model's ability to process complex objects. He et al. [19] fused multi-input RefineNet with the residual structure. Wang et al. [20] proposed equipping each small convolution kernel in a spatial pyramid with attention. Shahin et al. [21] combined pyramid pooling with U-shaped networks by placing the pyramid pooling structure on the skip connections to learn spatial information from the encoder. Kaul et al. [22] designed FocusNet, which combines channel attention with two stacked U-Nets and introduces the idea of coarse and fine segmentation. Ding et al. [23] proposed fusing a depth-aware gated fusion mechanism with U-Net, using the gating mechanism to enhance the extraction of the encoder. Xie et al. [24] reconciled two encoder-decoder structures with an attention-graph structure, using graph-based attention to process the result of the coarse segmentation and deliver it as input to the fine segmentation. Gu et al. [11] also combined U-Net with several kinds of attention, adding improved spatial attention on the skip connections, channel attention on the up-sampling paths, and multi-scale attention at the end of the multiple outputs. Subsequently, a new adaptive dual attention mechanism [12] was applied to the skip connections of U-Net to capture context effectively. Sarker et al. [25] proposed a multi-input encoder-decoder network, and Lu et al. [26] contributed a co-attention mechanism that enhances the model's ability to capture long-range information. In addition, a multi-branch encoder-decoder network [27] applied channel attention, spatial attention, and a multi-scale fusion mechanism to extract features from images at each level.

2.2. Residual and pyramid attention networks

The pyramid structure and the residual structure complement attention in different ways; all of them can improve the learning ability of a model to varying degrees. Mei et al. [28] proposed a pyramid attention structure that combines convolution kernels of different sizes with scale-agnostic attention to learn the global feature information of an image. Fu et al. [29] designed a residual pyramid attention network for CT image segmentation by combining an inception-like module with the SE-block and an attention block consisting of a small encoder-decoder, then combining the two to form a feature extraction module with a large number of parameters. Chae et al. [30] implemented a residual U-Net combined with the SE-block for wound region segmentation, adding a modified SE-block to the skip connections of U-Net to raise the efficiency of shallow information transfer. SAR-U-Net [31] also embedded a combination of the SE-block and pyramid pooling in Res-UNet, increasing the feature capture ability of the downsampling path. In [32], residual units and atrous convolution were taken as the basic convolution unit, heightening the model's ability to learn features. Flaute et al. [33] proposed a residual channel attention network for sampling and recovery of super-resolution images, which preserves the integrity of features learned by the encoder well. Punn et al. [34] designed a residual spatial cross-attention-guided inception U-Net model that fuses shallow and deep semantic information and improves the extraction capability of a single convolutional block with the inception [35] structure.

2.3. Loss functions and preprocessing methods

Other work starts from the loss function and preprocessing: the former uses a specific loss function to make the model easier to converge, and the latter improves dataset quality. Yuan et al. [36] first used the Jaccard distance as a loss function to effectively handle the imbalance between background and focal areas. Subsequently, Zhang et al. [37] proposed a Kappa loss based on the Kappa coefficient, which takes into account all pixels of the segmentation result and thus improves the accuracy of the model's predictions. Abhishek et al. [38] designed the MCC loss function based on the Matthews correlation coefficient, which, like Kappa loss, takes into account the individual values of the evaluation confusion matrix. Sarker et al. [39] proposed a loss function combining cross-entropy loss and Euclidean distance. Lu et al. [40] implemented the Shrinkage loss to balance the amount of data across classes. Saha et al. [41] designed a color enhancement strategy to enrich and expand the dataset by decomposing images into layers of different hues. Li et al. [42] used a deep learning method for hair removal as a preprocessing step, which significantly enhances the model's segmentation. Abhishek et al. [43] fused HSV, RGB, and grayscale images to obtain shadow-attenuated inputs, providing better conditions for accurate segmentation. Since the U-shaped network suffers from excessive information loss, we use the PRA module as the backbone of the network and exploit the advantages of the encoder-decoder idea to form a combined network. In addition, drawing on the advantages of existing work, we use the Dice loss as the loss function so that the model learns the features of skin lesion images more effectively and achieves better, more stable results.

3. Methods

In this section, we introduce PRAN (Pyramid Residual Attention Network), a three-layer encoder-decoder network for skin lesion image segmentation, and the PRA module, the basic unit of PRAN.

3.1. Structure of the Pyramid Residual Attention Network (PRAN)

U-shaped networks and other U-Net-based neural networks have insufficient ability to extract feature maps of different sizes: using a single kind of convolution kernel to extract semantic and spatial information of different densities leads to some loss of feature information. We therefore design PRAN around the pyramid residual attention module. PRAN consists of three layers: a shallow layer, a middle layer, and a deep layer. Incorporating the idea of the encoder-decoder structure, each layer consists of a combination of separate PRA modules, and feature extraction is performed only within the current layer. Up-sampling and down-sampling fuse information across layers, while skip-connection operations reprocess information within the same layer. PRAN is illustrated in Fig 2, and the parameter distribution is presented in Table 1.
Fig 2

Pyramid residual attention network.

Table 1

List of parameters of the PRA modules at each position in the model, where ( ) represents the four convolution kernels of sizes (1, 3, 5, 7) with the same number of channels.

Location            | Pyramid     | Attention | Skip connection | Post process
Left shallow layer  | 8×( )×( )   | 32×3×3    | 9×3×3           | 64×3×3
Right shallow layer | 16×( )×( )  | 160×3×3   | 64×3×3          | 32×3×3
Left middle layer   | 32×( )×( )  | 128×3×3   | 64×3×3          | 64×3×3
Right middle layer  | 64×( )×( )  | 448×3×3   | 64×3×3          | 64×3×3
Deep layer          | 128×( )×( ) | 512×3×3   | 128×3×3         | 32×3×3
The shallow layer uses a combined unit of two PRA modules to process input at the original image resolution. The feature output weighted by the first attention mechanism forms a skip connection, which is merged with the result of the second information extraction. We then use the channel attention module to weight the features again and fuse them with the information from the middle layer to obtain the final output. The residual unit of the module consists of two simple two-layer 3×3 convolutions with group normalization (GN) [44]; it retains the initial information while extracting features, to be added to the features from the lower layers for the final segmentation.

The input of the middle layer comes from the feature pyramid of the shallow layer. After a max-pooling layer, the feature maps are 1/4 the size of the original image, with reduced spatial information and condensed semantic information. The feature maps in this layer are processed in the same way as in the shallow layer, and the output of the middle layer is obtained by supplementing it with information extracted from the deep layer.

There is only one PRA module in the deep layer, yet its role is crucial. After two rounds of feature pyramid extraction and down-sampling, the resulting feature maps are only 1/16 the size of the original maps and contain dense semantic information. The information extraction mechanism at this bottom layer allows hidden features to be fully learned, complementing the information at the middle layer and thus enhancing the learning capability of the model.

3.2. Pyramid residual attention module

We propose a streamlined and effective composite module called the pyramid residual attention (PRA) module. To explain the module's composition, we illustrate the concept of feature learning with a simple convolutional neural network. Fig 3 shows the flow of the PRA module for processing feature maps. Taking the deep layer of PRAN as an example, the size of the input image and the output feature map is 16×20. First, the input feature maps are processed by 32 convolution kernels of size 3×3. At the same time, the input feature maps are compressed by a 2×2 pooling layer with stride 2, concentrating the semantic information and discarding less useful spatial information. The pooled feature maps are then fed into a feature pyramid to learn feature information over receptive fields of different sizes. The resulting multi-receptive-field feature set is used as the input to the channel attention, which derives an information weight for each channel from the outputs of the different convolution blocks through a pooling layer; this focuses attention by enhancing the input to varying degrees. Finally, after the size is recovered by a deconvolution layer, the feature maps are connected with the output of the residual block to obtain the output of the module.
Fig 3

Pyramid Residual Attention Module.

3.2.1. Feature pyramids

In PRA, the feature pyramids are feature extractors that incorporate different kinds of convolution kernels, since features of different scales can be learned efficiently by kernels of different sizes. We apply four convolution kernels of size 1×1, 3×3, 5×5, and 7×7 to process the input in parallel. Multiple different convolution kernels can learn sufficient knowledge from different receptive fields [45], and this variety adds more semantic and spatial information to the results than a single 3×3 convolution kernel. The stride of each convolution kernel is set to 1, which does not change the spatial size of the feature maps, so the features of the image are substantially preserved for subsequent learning.
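To make the pyramid concrete, here is a minimal PyTorch sketch of such a parallel multi-kernel extractor; the class name, channel arguments, and concatenation-based fusion are our illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Parallel 1x1/3x3/5x5/7x7 convolutions with stride 1, as described in 3.2.1.
    'Same' padding keeps the spatial size of the feature maps unchanged."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=k, stride=1, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x):
        # Each branch sees a different receptive field; stack the results
        # along the channel dimension for the attention stage that follows.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```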

3.2.2. Channel attention

Traditional convolutional neural networks lack a feature screening mechanism for deep network features. Bahdanau et al. [19] proposed an attention mechanism for this situation in natural language processing, which can effectively alleviate such problems. Since skin cancer lesion areas have irregular boundaries, these important edge features need to be learned adequately at different feature sizes, and channel attention serves as a supervision unit that screens features to reduce redundancy. The original SE-block [10] uses only global average pooling to extract channel features, which loses spatial information seriously; we therefore use max pooling to supplement it. Specifically, as in Eqs (1)-(4): we define x as the module input and y as the module output, with x ∈ R^{H×W×C}; P_avg denotes the global average pooling function and P_max the global max pooling function, with P_avg(x) ∈ R^{1×1×C} and P_max(x) ∈ R^{1×1×C}; k1, k2, b1, b2 denote the weights and biases of the two linear layers; and ⊗ denotes per-channel multiplication. F_c(i, j) denotes the value at spatial position (i, j) of channel c of the feature map. The feature maps are first passed through the global average pooling layer and the global max pooling layer to obtain the important information of each channel. The sigmoid function then produces the distribution of weights over channels, where the weights depend on both the pooled inputs and the adjustment of the linear layers. The obtained sequence of feature-map importances is channel-wise multiplied with the input and finally summed with the original input.
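A minimal sketch of this channel attention follows. The paper specifies both pooling branches, the two linear layers (k1, b1, k2, b2), the sigmoid, and the final multiply-and-sum; how exactly the two pooled branches are fused is not spelled out, so the CBAM-style shared bottleneck below is our assumption, as is the reduction ratio.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention with max pooling added (section 3.2.2)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # P_avg: R^{HxWxC} -> R^{1x1xC}
        self.gmp = nn.AdaptiveMaxPool2d(1)   # P_max: R^{HxWxC} -> R^{1x1xC}
        self.mlp = nn.Sequential(            # two linear layers (k1, b1), (k2, b2)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        # Sigmoid turns the pooled statistics into per-channel weights w.
        w = torch.sigmoid(self.mlp(self.gap(x)) + self.mlp(self.gmp(x)))
        # Weighted features are summed with the original input, per 3.2.2.
        return x + x * w
```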

3.2.3. Residuals mechanism

As shown in Fig 3, the two branches of the residual structure [46] in our PRA extract feature maps at different scales, while the skip-connection part passes the input directly backward. The convolution and group normalization [44] reinforce the module's learning on small batches of images. The result of the other branch comes from the feature pyramids and channel attention. This residual structure can be understood as a mapping, as in Eq (5), and there is another such mapping in the PRA module; the residual structure used in this paper is shown in Eq (6). For the feature extraction part of PRA, if the input feature x is already optimal, backward gradient propagation in the residual structure can drive the residual branch toward zero, thus speeding up the training of the model. The shortcut ensures that the model improves steadily during training: when the shallow parameters of the model reach a particular optimum, the deep parameters update on top of that and do not move in a worse direction. The inclusion of the residual structure not only improves the performance of the model but also speeds up training and prevents vanishing gradients [39].
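As a rough illustration of how these pieces fit together, the sketch below implements the GN residual unit from 3.1 and one possible PRA composition following the flow of Fig 3 (pool, pyramid, attention, deconvolution, plus the residual branch). It reuses the FeaturePyramid and ChannelAttention sketches above; the channel sizes and group count are assumptions, not the exact values of Table 1.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Two 3x3 convolutions with group normalization and an identity shortcut,
    i.e. H(x) = F(x) + x. Assumes channels is divisible by the group count."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(groups, channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(groups, channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)

class PRAModule(nn.Module):
    """Pool -> feature pyramid -> channel attention -> deconvolution,
    added to a parallel residual branch (the flow of Fig 3)."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.residual = ResidualUnit(in_ch)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.pyramid = FeaturePyramid(in_ch, branch_ch)    # 4 parallel kernels
        self.attn = ChannelAttention(4 * branch_ch)
        self.up = nn.ConvTranspose2d(4 * branch_ch, in_ch, kernel_size=2, stride=2)

    def forward(self, x):
        # Down-sampled branch: pyramid features weighted by channel attention,
        # then restored to the input resolution by the deconvolution.
        y = self.up(self.attn(self.pyramid(self.pool(x))))
        return y + self.residual(x)
```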

3.2.4. Loss function

Considering that the lesion areas of dermoscopy images are irregular and noisy, we choose Dice loss [47] as the loss function to ensure the stability of model training. Since the Dice coefficient is an important evaluation index, using Dice loss directly increases the model's Dice coefficient while improving its learning ability. As shown in Eq (7), L_Dice = 1 − 2|P ∩ G| / (|P| + |G|), where P refers to the output of the network and G represents the label of the image. In our experiments, Dice loss is better than IoU loss [36] and cross-entropy loss at maintaining the model's generalization.
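For reference, a minimal sketch of Eq (7) as Dice loss is commonly implemented, with a small smoothing term for numerical stability (the paper does not state its smoothing constant):

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Dice loss: 1 - 2|P ∩ G| / (|P| + |G|).
    pred holds sigmoid probabilities, target a binary mask; shape N x 1 x H x W."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    denom = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()
```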

4. Experimental results

All our methods were implemented in the PyTorch framework, using Adaptive Moment Estimation (Adam) for gradient descent, with the following hyperparameters: weight decay 1e-6, batch size 8, and 300 training iterations. The experiments were run on a Quadro RTX 6000 GPU (24 GB) with a stepped learning rate initialized to 0.3, changed to 1e-4 after 30 iterations and to 1e-6 after 270 iterations. To verify the validity of the model, we conducted experiments in the following three ways: the model was trained and tested on ISIC2017, and the results were compared with recent advanced models, with details presented in Section 4.3; we trained the PRA module on the ISIC2018 dataset and demonstrated its effectiveness; and we implemented PRAN, performed ablation experiments on the test set, and compared our model with recent state-of-the-art models. To strengthen the validation, we replicated the following models: U-Net, CA-Net, ATTU-Net, DenseASPP, and CE-Net; the results are presented in Tables 4 and 5.
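The stated optimizer and stepped schedule can be sketched as follows; `make_optimizer` is an illustrative helper rather than the authors' code, and treating the stated "iterations" as scheduler steps is our assumption.

```python
import torch
import torch.nn as nn

def make_optimizer(model: nn.Module):
    """Adam with weight decay 1e-6 and a stepped learning rate:
    0.3 initially, 1e-4 after step 30, 1e-6 after step 270."""
    opt = torch.optim.Adam(model.parameters(), lr=0.3, weight_decay=1e-6)

    def lr_at(step: int) -> float:
        if step < 30:
            return 0.3
        return 1e-4 if step < 270 else 1e-6

    # LambdaLR multiplies the base lr (0.3) by the returned factor.
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda s: lr_at(s) / 0.3)
    return opt, sched
```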

4.1. Data sets and evaluation metrics

For skin lesion segmentation, PRAN was evaluated on the ISIC Challenge 2017 and 2018 datasets. The ISIC2017 dataset [48] contains a total of 2750 images, of which 2000, 150, and 600 are used as the training, validation, and test sets respectively. The ISIC2018 dataset [49] provides only a complete training set totaling 2594 images, so we randomly divided it into three parts: 1816 images for training, 256 for validation, and the remaining 518 for testing. Since the ISIC2018 dataset contains the data from ISIC2017, we introduce it only briefly here. The images in the datasets are RGB images of varying sizes, from 720×540 to 6708×4439, and must be normalized before being input to the model. We normalize all images to a size of 320×256 and augment the dataset by flipping horizontally and vertically, cropping randomly, and rotating by -90∼90 degrees, as shown in Fig 4.
Fig 4

Image display using data augmentation: (a) original image, (b) normalized and augmented image.
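A sketch of the described resizing and augmentation using torchvision; the random-crop parameters are assumptions, and for segmentation the same geometric transforms must of course be applied to the ground-truth mask (e.g., by transforming image and mask jointly).

```python
import torchvision.transforms as T

# torchvision expects (height, width), so 320x256 becomes (256, 320).
train_transform = T.Compose([
    T.RandomResizedCrop((256, 320), scale=(0.8, 1.0)),  # resize + random crop
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=90),                       # rotate within -90..90
    T.ToTensor(),
])
```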

We used DC (Dice coefficient), MIOU, ACC (accuracy), SP (specificity), and SE (sensitivity) as the metrics to evaluate the performance of the model. Among them, ACC is the ratio of correctly predicted pixels to all pixels. Let G represent the pixels of the positive and negative classes in the labels, P the positive and negative classes in the prediction results, and ∩ the set intersection operation; the Dice coefficient (DC) and MIOU are computed as in Eqs (8) and (9).
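A minimal sketch of DC (Eq 8) and IoU (Eq 9) for a binary mask; MIOU additionally averages the IoU over the foreground and background classes.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray):
    """DC = 2|P ∩ G| / (|P| + |G|); IoU = |P ∩ G| / |P ∪ G|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dc = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return dc, iou
```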

4.2. Ablation experiments for module validation

We tested the performance of the network and verified the rationality of the PRA module through ablation experiments. The effectiveness of individual PRA modules was verified on the ISIC2018 dataset using the same experimental environment and hyperparameters, and the feature learning capability of the modules is shown by the Precision-Recall (PR) and receiver operating characteristic (ROC) curves in Fig 5. As shown in Table 2, the effect of the feature pyramid, attention mechanism, and residual module on PRA is observed by replacing each component with a 3×3 convolution kernel. Replacing the attention mechanism (PRA-CA, Channel Attention) causes the performance of the feature extraction module to decrease by approximately 14%, while replacing the pyramid (PRA-PM, Pyramid Module) degrades the performance by only 1%. The areas under the ROC and Precision-Recall curves show that the combination of the three components enhances the sensitivity and specificity of the model to different degrees, which further justifies the module design. Table 3 shows the test results of combinations of PRA modules on the ISIC2018 dataset. We find that a single PRA module achieves significant performance improvement after horizontal and vertical stacking: horizontal stacking improves the learning ability of the modules, and vertical stacking further learns features at different scales, thereby enhancing the generalization ability of the model for lesion segmentation of complex skin cancer images.
Fig 5

Precision-Recall curve and ROC curve for the effectiveness of each component of the PRA module.

Table 2

Experimental results for the ablation of each component of the PRA module.

PM, CA, and Res stand for Pyramid Module, Channel Attention, and Residual Unit, respectively; "—" indicates that the component is replaced with a 3×3 convolution kernel. Bold data indicates the maximum value for each indicator.

Network            | DC(%) | MIOU(%) | SP(%) | ACC(%) | SE(%) | Para(kb)
The PRA—PM         | 86.80 | 78.16   | 90.83 | 91.36  | 91.89 | 150.17
The PRA—CA         | 75.92 | 64.12   | 82.60 | 83.02  | 83.46 | 102.69
The PRA—Res        | 84.93 | 75.69   | 89.58 | 90.08  | 90.60 | 86.26
The PRA (proposed) | 87.86 | 79.37   | 91.48 | 92.01  | 92.56 | 122.45
Table 3

Validation of PRA modules combined in the horizontal and vertical directions.

Network                    | DC(%) | MIOU(%) | SP(%) | ACC(%) | SE(%) | Para(M)
The PRA                    | 87.86 | 79.37   | 91.48 | 92.01  | 92.56 | 0.1
The Shallow of PRA Network | 91.75 | 85.58   | 94.13 | 94.85  | 95.58 | 1.1
PRAN (proposed)            | 93.37 | 87.94   | 94.98 | 95.56  | 96.15 | 21.6

As shown in Fig 6, when segmenting different types of skin lesion areas, a single PRA module can often capture only a limited amount of information, and noise still interferes with the determination of the lesion area, as shown in Fig 6d. Deepening the model horizontally removes part of the noise, and deepening the model vertically by assembling the PRA modules into PRAN greatly improves the noise removal ability of the model, as shown in Fig 6c, whose segmentation result is also closer to the Ground Truth. Different combinations of PRA modules yield different improvements in the ROC and Precision-Recall curves.
Fig 6

The segmentation results of the proposed PRA module and PRAN test on the ISIC2018 dataset.

Here, (a) is one of the images: the green line represents Ground Truth and the yellow line the segmentation result of PRAN. (b) shows the Ground Truth label map; (c), (d), and (e) are the segmentation results of PRAN, the horizontal combination of two PRA modules, and a single PRA module, respectively.

As shown in Fig 7, horizontal stacking improves sensitivity and specificity more obviously, while the vertical three-layer construction further increases the accuracy and recall of the model. Expanding the depth and width of the network also increases the number of parameters, which is worthwhile given the resulting performance. In trial experiments we continued deepening the network in the horizontal direction but found no apparent improvement, so we turned to expanding the stages of the model, fixed the number of layers at three, and ultimately achieved good segmentation results.
Fig 7

Plot of PR curves for the combined ablation experiment, where PRA is a single module, the shallow layer of PRAN is the horizontally combined version, and PRAN is the complete network after assembling the PRA modules vertically.

Fig 8 shows the loss curves for training and validation. Our proposed PRA module reaches a good loss level within 50 iterations of training and validation and shows a slight, stable decline in training loss over the last 250 iterations. Moreover, PRAN has a clear advantage over U-Net in loss reduction and is more stable.
Fig 8

The Plot of train and valid loss curves for the combined ablation experiment.

PRA is a single module, the shallow layer of PRAN is the horizontally combined version, and PRAN is the complete network after assembling the PRA modules vertically.

4.3. Comparison with state-of-the-art methods on the ISIC2017 dataset

We find that training on the ISIC2017 dataset produces inflated performance during validation: the generalization metrics on the test set decrease by 4%∼5% compared to the validation set. From another perspective, this reflects the feature learning ability of the model. The performance of our model is given in Table 4. Comparing with existing models, we find that segmentation of the lesion region on the ISIC2017 dataset is related to the robustness of the model, as seen in Fig 9. Pure channel and spatial attention, as in CA-Net and DS-Net [50], brought more limited improvements on this dataset, whereas models such as CE-Net [51] and ADAM achieved good performance gains by enhancing the feature extraction mechanism or combining attentions. As shown in Fig 9, our model segments different types of lesions satisfactorily with minimal interference from noise compared to the other models. Since there is hardly any visual distinction between the lesion area and the background, the segmentation results are represented by highlighting the color of the lesion area: we use pink as the base color, and after increasing the pink hue, the differences between lesions and skin become visible in the results. With this visualization, the physician can recognize the boundaries of the lesion area more easily and see how the darker parts of the lesion are highlighted in the segmented area.
Table 4

Comparison of PRAN with existing models on the ISIC2017 dataset. Bold data in the table indicates the highest metric in the column.

Network              | Year | DC(%)      | MIOU(%)     | SP(%)      | ACC(%)     | SE(%)      | Para(M)
U-Net [8]            | 2015 | 84.39±8.42 | 77.30±11.29 | 90.06±5.94 | 90.65±5.96 | 91.28±5.98 | 31
Attention U-Net [9]  | 2018 | 86.96±6.98 | 80.37±9.80  | 91.20±6.17 | 91.80±6.19 | 92.43±6.23 | 33.3
DenseASPP [52]       | 2018 | 87.27±7.81 | 80.45±7.81  | 91.71±5.31 | 92.32±5.30 | 92.96±5.31 | 33.8
CE-Net [51]          | 2019 | 88.73±7.21 | 81.23±9.98  | 92.16±5.37 | 92.81±5.36 | 93.49±5.36 | 24
CA-Net [11]          | 2020 | 88.17±7.81 | 80.24±10.75 | 91.72±5.76 | 92.37±5.78 | 93.29±5.81 | 2.8
Abhishek et al. [43] | 2020 | 83.86±7.80 | 75.1±10.75  | 95.16±3.7  | 92.20±4.5  | 87.06±7.7  | -
DAGAN [57]           | 2020 | 85.9       | 77.1        | 97.6       | 93.5       | 83.5       | -
EUnet-DGF [23]       | 2020 | 87.89      | 80.37       | -          | 94.77      | -          | -
DS-Net [50]          | 2020 | 87.5       | 77.5        | -          | 95.5       | -          | -
Scale-Att-ASPP [7]   | 2020 | 87.81      | 80.28       | 93.20      | 93.16      | 86.97      | -
FC-DPN [53]          | 2020 | 84.56      | 76.34       | 93.71      | 98.65      | 83.82      | -
PRAN (proposed)      | 2021 | 89.47±6.31 | 82.14±9.20  | 92.51±4.65 | 93.17±4.62 | 93.86±4.61 | 21.6
Fig 9

Visualization results of segmentation on ISIC2017, the discolored area represents the segmentation result of the model or label, and the darker part of the segmented lesion area indicates the part of the original image where the skin color is obvious.

4.4. Comparison with state-of-the-art methods on the ISIC2018 dataset

The results of the proposed model are compared with current optimal models in Table 5, from which the strengths and weaknesses of each model can be observed; the performance is further demonstrated by the visualization of segmentation results in Fig 10. The comparison on the ISIC2018 dataset shows that the validation set can affect the training process of the model in some cases. On the validation sets, the results obtained on the two datasets do not differ significantly in any metric; however, the final test results show that the validation set of the ISIC2018 dataset was more credible. The proposed PRAN achieved good results in this comparison. In segmenting skin lesion images, our model must segment the high-contrast central region of the lesion and make a reliable guess at the general shape of the lesion; even when image contrast is low, our model still ensures that the lesion area lies within the segmented region.

Through the above ablation experiments and comparative studies, PRAN achieves good segmentation results on skin cancer images with diverse backgrounds and oddly shaped lesions. Owing to the effective mechanism of the PRA module, it is capable of sufficiently learning the knowledge in the feature maps. The difference from previous networks is that the encoder-decoder idea is no longer limited to the overall network design but is integrated into the basic blocks. Given the inherent deficiency of insufficient feature extraction in U-shaped networks, improving U-Net with various attention mechanisms alone is not enough; it is worthwhile to further develop the application of the encoder-decoder idea. The basis of the PRA module combination continues the deep learning observation that networks with more parameters tend to be more powerful, which is why PRAN obtains performance improvements.
Table 5

Comparison table of segmentation performance on the ISIC2018 dataset.

Network             | Year | DC(%)      | MIOU(%)    | SP(%)      | ACC(%)     | SE(%)      | Para(M)
U-Net [8]           | 2015 | 88.43±5.15 | 82.25±7.56 | 93.05±3.33 | 93.59±3.33 | 94.15±3.33 | 31
SLSDeep [39]        | 2018 | 89.4       | 83.2       | 93.3       | 93.7       | 90.4       | -
Attention U-Net [9] | 2018 | 90.95±5.11 | 85.51±7.51 | 94.45±3.12 | 95.01±3.12 | 95.58±3.13 | 33.3
DenseASPP [52]      | 2018 | 91.40±5.44 | 84.88±7.87 | 93.60±3.17 | 94.16±3.18 | 94.74±3.19 | 33.8
BCDU-Net [54]       | 2019 | 82.24      | -          | 97.86      | 95.60      | 80.07      | -
CE-Net [51]         | 2019 | 89.8±5.39  | 83.1±7.82  | 93.4±3.21  | 94.1±3.21  | 93.4±3.22  | 24
DA-Net [55]         | 2019 | 89.6       | 82.8       | 93.7       | 94.0       | 93.7       | -
ADAM [12]           | 2019 | 90.8       | 84.4       | 94.1       | 94.7       | 94.2       | -
CA-Net [11]         | 2020 | 92.68±5.78 | 87.10±8.48 | 94.76±3.71 | 95.33±3.70 | 95.92±3.73 | 2.7
DAGAN [57]          | 2020 | 88.5       | 82.4       | 91.1       | 92.9       | 95.3       | -
DEDB [56]           | 2021 | 90.00      | 83.30      | 97.00      | 96.95      | 96.50      | -
PRAN (proposed)     | 2021 | 93.37±4.50 | 87.94±6.68 | 94.98±2.98 | 95.56±2.99 | 96.15±2.97 | 21.6
Fig 10

Visualization results of segmentation on ISIC2018, the discolored part represents the segmentation result of the lesion area on the original image, and the darker part of the segmented lesion area indicates the part of the original image where the skin color is obvious.

4.5. The generalization ability

To evaluate the generalization ability of the proposed method, we assessed it on another challenge, using the KvasirSEG [58] dataset. Under the experimental setting above, we resized the images of different sizes to 384×384. Since the KvasirSEG dataset consists of 1000 images, we randomly shuffled them and took 500 as the training set, 100 as the validation set, and 400 as the test set, using the same data augmentation as for the ISIC datasets. Fig 11 and Table 6 show the experimental results and specific metrics. They show that adding attention can affect the segmentation performance of the model and in some cases even significantly increases its learning burden; for example, CA-Net was not as effective as expected. Replacing the bottom module of U-Net with PRA (PRAU) improves model performance, and the segmentation improvement of PRAU shown in the figure is significantly greater than that of several other models.
Fig 11

Visualization results of segmentation on KvasirSEG dataset.

The discolored part represents the segmentation result of the lesion area on the original image. The darker part of the segmented lesion area indicates the part of the original image where the skin color is clear.

Table 6

Comparison test table of generalizability of the proposed model on KvasirSEG dataset.

Bold data in the table indicates the highest metric in the column.

Network             | Year | DC(%)      | MIOU(%)    | SP(%)      | ACC(%)     | SE(%)      | Para(M)
U-Net [8]           | 2015 | 89.40±5.33 | 82.12±6.64 | 94.01±3.48 | 94.66±3.47 | 95.94±3.46 | 31
Attention U-Net [9] | 2018 | 89.16±5.71 | 81.81±8.30 | 93.75±3.45 | 94.40±3.44 | 95.08±3.44 | 33.3
DenseASPP [52]      | 2018 | 89.63±6.64 | 82.76±9.33 | 94.25±3.84 | 94.91±3.84 | 95.59±3.85 | 33.8
CE-Net [51]         | 2019 | 89.95±5.72 | 83.07±8.32 | 94.35±3.24 | 95.01±3.21 | 95.69±3.20 | 24
ADAM [12]           | 2019 | 88.22      | 81.37      | 96.68      | 98.28      | 91.04      | -
CA-Net [11]         | 2020 | 88.55±6.19 | 80.99±8.65 | 93.57±3.77 | 94.22±3.76 | 94.89±3.76 | 2.7
PRAU (proposed)     | 2021 | 90.88±5.39 | 84.41±8.01 | 94.73±3.46 | 95.40±3.43 | 96.09±3.41 | 32.7

5. Conclusion

In this paper, we propose a novel Pyramid Residual Attention Network (PRAN), which shows the advantages of the encoder-decoder idea from a new perspective, and apply it successfully to the skin lesion image segmentation task. The PRA module effectively extracts the information of the feature map, alleviating to a certain extent the information confusion caused by the feature pyramid. Starting from the PRA module, we further design a new mechanism to address the inadequate feature learning of traditional neural networks. Using channel attention as the supervision mechanism of the feature pyramid, the model ensures accurate segmentation of dermoscopy images with distinctive features and determines the approximate area of image lesions under high interference and low contrast. Importantly, our model achieves satisfactory results on both the ISIC2017 and ISIC2018 datasets. The base module leaves room for further improvement and remains of research interest for other types of edge segmentation tasks.
PONE-D-21-32574
Dermoscopic Image Segmentation Based on Pyramid Residual Attention Module.
PLOS ONE Dear Dr. Cheng, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jan 08 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Sen Xiang Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. 3. Thank you for stating the following financial disclosure: “No.” At this time, please address the following queries: a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” c) If any authors received a salary from any of your funders, please state which authors and which funders. d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. 
If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 5. Please ensure that you refer to Figure 5 in your text as, if accepted, production will need this reference to link the reader to the figure. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: N/A Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: No ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: No ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors propose a new pyramidal residual attention(PRA) Network for dermoscopic image segmentation. The authors validated the method on two public available datasets - ISIC 2017 and 2018. Experiments showed the proposed method outperforms all state of the art methods. Ablation study also confirmed the contribution of each module. Concerns: 1. Some arguments are not supported, especially the limitation on attention mechanism (Line 48). These limitations are not easy to understand and not well supported. The authors either need to demonstrate these limitations, or at least add reference describing these limitations. 2. Claims, such as "the PRA Network is more interpretable." and "... to ensure the stability and robustness of the model" (Line 69) are not supported by the experiment. The authors need to add experiment/visualization to support these claims. 
Otherwise, the authors may want to remove these unsupported claims. 3. The authors did not provide variance of the reported results, given the DC/mIoU improvement over the CA-Net [11] is small (0.7%). The authors may want to add variance measurement to these scores, or use cross validation. 4. There are many typos and grammar errors in the manuscript. The authors need to greatly improve their writings. These are not limited to the following: - Line 21, "nature language processing" -> "natural language processing" - Line 153, capital M - Line 354, "Table reftable4" - Line 61-62, the sentence is difficult to understand. Reviewer #2: This paper proposes a novel feature extraction module PRA(Pyramid Residual Attention) for dermoscopic image segmentation. In detail, PRA consists of a residual module, an attention module, and a pyramid module. PRA makes use of the denoising function of the encoder-decoder structure and combines it with a multi-scale feature pyramid. The channel attention is used to monitor the feature extraction process of the pyramid to ensure the efficiency of the PRA. Experiments on ISIC2017 and ISIC2018 show the efficiency of the proposed model. There are major weaknesses in this paper that should be solved: 1. Figure 2 and Figure 3 are so messy that the reader cannot understand the architecture of the proposed module. Please use a more concise figure to illustrate the PRA network. 2. There are four identical convolution modules above each feature pyramid, and I hope the authors explain what they do. 3. There are many errors in the paper, please check the paper in detail by the authors. For example, there are missing spaces at the end of the sentences in lines 217 and 241. 4. All experiments in this paper were done on the ISIC dataset. Please conduct the same experiments on other datasets to verify the generalization ability of the proposed model. The explanation for the experiment is inadequate. More analysis should be added to demonstrate the core idea of the whole paper. 5. Some missing key references about segmentation, and residual attention [1,2,3,4] as these methods have been widely used in these works. [1] Deep Object Tracking with Shrinkage Loss, ieee tpami [2] See more, know more: Unsupervised video object segmentation with co-attention siamese networks, cvpr [3] Zero-shot video object segmentation with co-attention siamese networks, ieee tpami [4] Segmenting Objects from Relational Visual Data, ieee tpami 6. The method proposed in this paper is for dermoscopic images. What is the difference between the proposed model and the common image segmentation method? Is the proposed model effective for everyday image segmentation? Reviewer #3: 1. The language needs to be further enhanced. There are still many difficult sentences and case errors in the current manuscript. 2. The author proposes a new attentional mechanism for the skin lesions segmentation. Some existing studies have been carried out on attentional network segmentation, such as "Lightweight attention convolutional neural network for retinal vessel image segmentation ". The novelty of the attentional mechanism proposed by the authors requires further illustration. Reviewer #4: This manuscript proposes a Pyramid Residual Attention Module and the corresponding Pyramid Residual Attention network. My major concern is the novelty and presentation. i) The Pyramid Attention mechanism is an existing method that is proposed in [1]. In this work, a module named Pyramid Residual Attention is proposed. 
However, after going through section 3.2, I find that the proposed method is just a combination of Residual and pyramid attention. Unfortunately, the original work [1] used the residual block as well. ii) One of the targeting challenges in this manuscript is s the multi-scale problem. However, there is an existing work that uses pyramid attention for multi-scale image fusion [2]. It was published on Apr 2021 which is earlier than the submission date. iii) What is the feature pyramid exactly? The output of the Pyramid residual attention? More justifications are required. iv) This work focuses on the segmentation and the attention mechanism, why does GAN matter? I cannot see anything proposed in this work is related to the generative model. v) Important baseline is missing. Check [3]. Moreover, existing models that use pyramid attention should be compared. For example, [2]. It is not designed for Dermoscopic Image Segmentation but is still worth doing as dermoscopic image belongs to the medical image. vi) The results are not statistically significant which lacks the std etc. vii) Section 3 should be "Methods" or "Methodology". Authors need to double-check the whole manuscript to fix the typos. [1]Mei, Y., Fan, Y., Zhang, Y., Yu, J., Zhou, Y., Liu, D., Fu, Y., Huang, T.S. and Shi, H., 2020. Pyramid attention networks for image restoration. arXiv preprint arXiv:2004.13824. [2]Fu, J., Li, W., Du, J. and Huang, Y., 2021. A multiscale residual pyramid attention network for medical image fusion. Biomedical Signal Processing and Control, 66, p.102488. [3] Abhishek, K., Hamarneh, G. and Drew, M.S., 2020. Illumination-based transformations improve skin lesion segmentation in dermoscopic images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 728-729). ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Hao Tang Reviewer #2: No Reviewer #3: No Reviewer #4: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 3 Jan 2022 [Answers for Reviewer 1’ Comments] First of all, we would lie to thank you for all that you commented for our paper. Point 1: Some arguments are not supported, especially the limitation on attention mechanism (Line 48). These limitations are not easy to understand and not well supported. 
Response 1: CA-Net placed attention modules on the up-sampling and skip-connection paths of U-Net to greatly improve the performance of the network. An adaptive bi-directional attention module was proposed in ADAM, which supplements the shallow extraction results with attention information from the bottom level, thereby processing information from different layers effectively. Although the above attention mechanisms can improve the segmentation performance of neural networks to different degrees, most of them share the following limitations:
1. A single convolutional kernel has a limited ability to learn diverse features, and the size of the receptive field has a large impact on the feature extraction process. A multi-scale feature fusion mechanism should be used to enhance the feature extraction ability of the model on feature-diverse datasets.
2. The importance of the convolutional layers at each stage has been ignored during the design of network frameworks. It is reasonable to adopt different convolutional strategies when confronted with semantic information of different sizes and densities. Deeper sub-networks could be assigned to the shallow stages of convolution, which carry comprehensive information, to extract it sufficiently.
3. The feature extraction process in classic networks does not screen the information, ignoring the varying amounts of information extracted by the convolutional kernels in the feature extraction module. The module responsible for extracting information should be supervised and managed accordingly, so as to strengthen the update iterations of important convolutional kernels and reduce unnecessary training cost.
We are grateful for this suggestion. In this sentence, we intended to express that attention mechanisms can alleviate these problems of conventional CNNs, but that the three problems still exist.

Point 2: Claims such as "the PRA Network is more interpretable" and "... to ensure the stability and robustness of the model" (Line 69) are not supported by the experiments. The authors need to add experiments/visualizations to support these claims. Otherwise, the authors may want to remove these unsupported claims.

Response 2: As shown in Fig 8, the loss decline curves for training and validation, our proposed PRA model reaches a good loss level within 50 iterations during training and validation, and shows a small, stable decline in training loss over the last 250 iterations. Moreover, our PRA Network has a clear advantage over U-Net in terms of loss reduction and is more stable. We are grateful for this suggestion. In this regard, we decided to delete these insufficiently rigorous sentences, because we lack the experiments needed to prove interpretability. To validate the stability of our model, we supply the loss records for training and validation.

Point 3: The authors did not provide the variance of the reported results, given that the DC/mIoU improvement over CA-Net [11] is small (0.7%). The authors may want to add variance measurements to these scores, or use cross-validation.

Response 3:

| Network | Year | DC | mIoU | SP | ACC | SE | Params (M) |
| U-Net [8] | 2015 | 84.39±8.42 | 77.30±11.29 | 90.06±5.94 | 90.65±5.96 | 91.28±5.98 | 31 |
| ATTU-Net [9] | 2018 | 86.96±6.98 | 80.37±9.80 | 91.20±6.17 | 91.80±6.19 | 92.43±6.23 | 33.3 |
| DenseASPP [53] | 2018 | 87.27±7.81 | 80.45±7.81 | 91.71±5.31 | 92.32±5.30 | 92.96±5.31 | 33.8 |
| CE-Net [52] | 2019 | 88.73±7.21 | 81.23±9.98 | 92.16±5.37 | 92.81±5.36 | 93.49±5.36 | 24 |
| CA-Net [11] | 2020 | 88.17±7.81 | 80.24±10.75 | 91.72±5.76 | 92.37±5.78 | 93.29±5.81 | 2.8 |
| DAGAN [58] | 2020 | 85.9 | 77.1 | 97.6 | 93.5 | 83.5 | - |
| EUnet-DGF [23] | 2020 | 87.89 | 80.37 | - | 94.77 | - | - |
| DS-Net [51] | 2020 | 87.5 | 77.5 | - | 95.5 | - | - |
| Scale-Att-ASPP [7] | 2020 | 87.81 | 80.28 | 93.20 | 93.16 | 86.97 | - |
| FC-DPN [54] | 2020 | 84.56 | 76.34 | 93.71 | 98.65 | 83.82 | - |
| PRA Network (proposed) | 2021 | 89.47±6.31 | 82.14±9.20 | 92.51±4.65 | 93.17±4.62 | 93.86±4.61 | 21.6 |

We are grateful for this suggestion. In this regard, we added the standard deviations to our result tables. Improvements on this segmentation task are not large across models: a model that easily segments lesions in high-contrast images may still perform badly on low-contrast ones. So although our model's scores in the table are only slightly higher than those of the other models, it performs well when applied to the task.
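To illustrate how such per-image mean ± standard deviation figures can be computed, here is a minimal NumPy sketch. It is our own illustration, not the authors' evaluation script; the metric functions and the random placeholder masks are hypothetical stand-ins for real predictions and ground truth.

```python
# Illustrative only: per-image Dice / mIoU with mean +/- std over a test set.
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def miou_binary(pred, gt, eps=1e-7):
    """Intersection over union for the lesion (foreground) class."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

rng = np.random.default_rng(0)
test_pairs = [(rng.random((192, 256)) > 0.5,    # predicted mask (placeholder)
               rng.random((192, 256)) > 0.5)    # ground-truth mask (placeholder)
              for _ in range(10)]

dc = np.array([dice(p, g) for p, g in test_pairs]) * 100
iou = np.array([miou_binary(p, g) for p, g in test_pairs]) * 100
print(f"DC = {dc.mean():.2f}±{dc.std():.2f}, mIoU = {iou.mean():.2f}±{iou.std():.2f}")
```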
Point 4: There are many typos and grammar errors in the manuscript. The authors need to greatly improve their writing.

Response 4: We are grateful for this suggestion. In this regard, we carefully checked and corrected the mistakes. Thanks for your comments.

[Answers for Reviewer 2's Comments]

First of all, we would like to thank you for all of your comments on our paper.

Point 1: Figure 2 and Figure 3 are so messy that the reader cannot understand the architecture of the proposed module. Please use a more concise figure to illustrate the PRA network.

Response 1: We are grateful for this suggestion. In this regard, we changed Figure 2 accordingly. Figure 3 is designed to present the process by which the PRA accepts input features and produces its output; the details are narrated in Section 3.2. We propose a streamlined and effective synthesis module called the pyramid residual attention (PRA) module. To better explain how the module is composed, we illustrate it using the feature-learning concepts of a simple convolutional neural network. Fig 3 shows how the PRA module processes feature maps. Taking the deep level of the network as an example, the size of the input image and of the output feature map is 16×20. First, the features are processed on the basis of the original map using 32 convolution kernels of size 3×3, giving this level's information extraction result. At the same time, the input feature maps are compressed by one down-sampling step, using a 2×2 pooling layer with a stride of 2, which concentrates the semantic information and discards less useful spatial information. The output is fed into a feature pyramid to learn feature information from receptive fields of different sizes. The feature sets from the multiple receptive fields are used as the input to the channel attention. The channel attention takes the results of the different convolutional blocks and derives an information weight for each channel through a pooling layer; these weights enhance the input to different degrees and focus the attention of the model. Finally, after the size is recovered by the deconvolution layer, the feature maps are combined with the output of the residual block to obtain the output of the module.
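A minimal PyTorch sketch of the data flow just described follows. This is a hedged reconstruction, not the authors' implementation: all module and variable names are ours, and details the response does not state (the 1×1 fusion convolution, the SE-style reduction factor) are assumptions.

```python
# Hedged sketch of the PRA flow described above (PyTorch).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> per-channel weights in (0, 1)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).unsqueeze(-1).unsqueeze(-1)
        return x * w                                  # screen feature maps channel-wise

class PRAModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Conv2d(channels, channels, 3, padding=1)   # this level's 3x3 extraction
        self.down = nn.MaxPool2d(2, stride=2)                         # 2x2 pooling, stride 2
        self.pyramid = nn.ModuleList(                                 # parallel 1/3/5/7 kernels
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5, 7)])
        self.attn = ChannelAttention(4 * channels)
        self.fuse = nn.Conv2d(4 * channels, channels, 1)              # back to `channels` maps (assumed)
        self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2) # deconv recovers the size

    def forward(self, x):
        res = self.residual(x)                                    # residual branch
        y = self.down(x)                                          # concentrate semantics
        y = torch.cat([conv(y) for conv in self.pyramid], dim=1)  # multi-receptive-field features
        y = self.fuse(self.attn(y))                               # weight channels, then fuse
        return self.up(y) + res                                   # deconv output joins the residual

x = torch.randn(1, 32, 16, 20)      # e.g. deep-level feature maps of size 16x20
print(PRAModule(32)(x).shape)       # -> torch.Size([1, 32, 16, 20])
```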
Point 2: There are four identical convolution modules above each feature pyramid, and I hope the authors explain what they do.

Response 2: We are grateful for this suggestion. The four convolutions form our feature pyramid. They are used to learn different knowledge through different convolutions, because each convolution has its own receptive field. The details are as follows. Feature pyramids are feature extractors that incorporate different kinds of convolutional kernels. Because the convolution operation is pixel-by-pixel, features of different sizes can be learned efficiently by these kernels. Four convolutional kernels of 1×1, 3×3, 5×5, and 7×7 are used to process the input in parallel. Multiple different convolutional kernels are able to learn different knowledge from different receptive fields [45], adding more semantic and spatial information to the result of the original single 3×3 convolutional kernel. The stride of each convolution kernel is set to 1, which does not change the size of the image during feature extraction, so that the features of the image are substantially preserved and remain available for subsequent learning.
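As a stand-alone illustration of this response, the four-branch pyramid can be sketched as below. The module name and the choice to concatenate branches along the channel axis are our assumptions; the paper's channel attention would then weight the concatenated maps.

```python
# Minimal sketch of the four-kernel feature pyramid; names are hypothetical.
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Parallel 1x1, 3x3, 5x5, 7x7 convolutions with stride 1.
    Padding of k // 2 keeps the spatial size unchanged, so every branch
    sees the same map through a different receptive field."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2)
             for k in (1, 3, 5, 7)])

    def forward(self, x):
        # Concatenating along channels keeps each branch's answer separate,
        # so later channel attention can weight the branches individually.
        return torch.cat([b(x) for b in self.branches], dim=1)

fmap = torch.randn(1, 32, 16, 20)
print(FeaturePyramid(32, 32)(fmap).shape)   # -> torch.Size([1, 128, 16, 20])
```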
Point 3: There are many errors in the paper; please check the paper in detail. For example, there are missing spaces at the end of the sentences in lines 217 and 241.

Response 3: We are grateful for this suggestion. In this regard, we carefully checked and corrected the mistakes.

Point 4: All experiments in this paper were done on the ISIC dataset. Please conduct the same experiments on other datasets to verify the generalization ability of the proposed model. The explanation of the experiments is inadequate. More analysis should be added to demonstrate the core idea of the whole paper.

Response 4: We are grateful for this suggestion. In this regard, we added a generalization experiment on the Kvasir-SEG dataset to validate our model. The details follow.

| Network | Year | DC | mIoU | SP | ACC | SE | Params (M) |
| U-Net [8] | 2015 | 89.40±5.33 | 82.12±6.64 | 94.01±3.48 | 94.66±3.47 | 95.94±3.46 | 31 |
| AttU-Net [9] | 2018 | 89.16±5.71 | 81.81±8.30 | 93.75±3.45 | 94.40±3.44 | 95.08±3.44 | 33.3 |
| DenseASPP [53] | 2018 | 89.63±6.64 | 82.76±9.33 | 94.25±3.84 | 94.91±3.84 | 95.59±3.85 | 33.8 |
| CE-Net [53] | 2019 | 89.95±5.72 | 83.07±8.32 | 94.35±3.24 | 95.01±3.21 | 95.69±3.20 | 24 |
| ADAM [12] | 2019 | 88.22 | 81.37 | 96.68 | 98.28 | 91.04 | - |
| CA-Net [11] | 2020 | 88.55±6.19 | 80.99±8.65 | 93.57±3.77 | 94.22±3.76 | 94.89±3.76 | 2.7 |
| PRAU (proposed) | 2021 | 90.88±5.39 | 84.41±8.01 | 94.73±3.46 | 95.40±3.43 | 96.09±3.41 | 32 |

We use the Kvasir-SEG dataset to evaluate our model. In the above experimental setting, we resize the images of different sizes to 384×384. The Kvasir-SEG dataset consists of 1000 images; we randomly shuffle them and take 500 as the training set, 100 as the validation set, and the remaining 400 as the test set, using the same data augmentation as for the datasets above. The qualitative effect is shown in the figure and the specific metrics in the table. The experiments show that the addition of attention affects the segmentation performance of the model, and in some cases even increases its learning burden significantly. For example, CA-Net, after substituting attention, does not perform as well as it should. Combining PRA with U-Net, i.e., using the PRA module to replace the bottom module of U-Net, improves the model's performance. The improvement in segmentation brought by PRAU can be clearly seen in the figure, and it is significantly larger than for the other models.
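The split protocol described in this response could be sketched as follows; the file names and the random seed are placeholders of ours, not the authors' preprocessing code.

```python
# Hypothetical sketch of the Kvasir-SEG protocol above: shuffle the 1000
# images, split 500 / 100 / 400, then resize everything to 384x384.
import random

image_ids = [f"img_{i:04d}.jpg" for i in range(1000)]   # placeholder file names
random.seed(42)                                         # seed is our choice
random.shuffle(image_ids)                               # "randomly shuffle them"

train_ids = image_ids[:500]
val_ids = image_ids[500:600]
test_ids = image_ids[600:]
assert (len(train_ids), len(val_ids), len(test_ids)) == (500, 100, 400)
# Each image and its mask would then be resized to 384x384 before augmentation.
```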
Point 5: Some key references about segmentation and residual attention are missing [1,2,3,4], as these methods have been widely used in those works.

Response 5: We are grateful for this suggestion. In this regard, the following papers are cited as a supplement:
1. Lu et al. [26] proposed a co-attention mechanism that enhances the model's ability to capture long-range information.
2. Lu et al. [40] proposed the shrinkage loss function to balance the amounts of data in the different classes.
Thanks for your comments.

[Answers for Reviewer 3's Comments]

First of all, we would like to thank you for all of your comments on our paper.

Point 1: The language needs to be further enhanced. There are still many difficult sentences and case errors in the current manuscript.

Response 1: We are grateful for this suggestion. In this regard, we carefully checked and corrected the mistakes.

Point 2: The author proposes a new attention mechanism for skin lesion segmentation. Some existing studies have been carried out on attention-based segmentation networks, such as "Lightweight attention convolutional neural network for retinal vessel image segmentation". The novelty of the attention mechanism proposed by the authors requires further illustration.

Response 2: We are grateful for this suggestion. In this regard, we interpret the mechanism as follows. In this paper, we introduce a larger base block that can be stacked with copies of itself. Doing this requires the base block to have good learning ability, so we combine the pyramid, attention, and residual structures into this small module. Fortunately, this idea achieves good results on the skin lesion segmentation task. The PRA is therefore not merely an attention mechanism but a base block. Thanks for your comments.

[Answers for Reviewer 4's Comments]

First of all, we would like to thank you for all of your comments on our paper.

Point 1: The pyramid attention mechanism is an existing method that was proposed in [1]. In this work, a module named Pyramid Residual Attention is proposed. However, after going through Section 3.2, I find that the proposed method is just a combination of residual blocks and pyramid attention. Unfortunately, the original work [1] used the residual block as well.

Response 1: We are grateful for this suggestion. In this regard, we carefully read paper [1] and found that its model combines different convolution kernels with scale-agnostic attention; the pyramid reduces the feature size and, importantly, its pyramid attention consists of a pyramid plus an SE-block-like component. That design is the same as in [7] Wei Z, Shi F, Song H, et al. Attentive boundary aware network for multi-scale skin lesion segmentation with adversarial training. Multimedia Tools and Applications, 2020, 79(37): 27115-27136. By contrast, the PRA in our work is not only an attention mechanism; it is also the base block of the PRA Network. We revised the related work accordingly:

2.2 Residual and Pyramid Attention Networks (RPAN)

The pyramid structure and the residual structure are similar to attention in a variety of ways: all of them can improve the learning ability of a model to varying degrees. Mei et al. [28] proposed a pyramid attention structure for image restoration tasks, combining convolution kernels of different sizes with scale-agnostic attention to learn the global feature information of an image. Fu et al. [29] proposed a residual pyramid attention network for CT image segmentation, combining an Inception-like module with an SE-block and an attention block consisting of a small encoder-decoder, and then combining the two into a feature extraction module with a large number of parameters. Chae et al. [30] proposed a residual U-Net combined with an SE-block for wound region segmentation, in which a modified SE-block added to the skip-connection part of U-Net improves the shallow information transfer efficiency of the model. SAR-U-Net [31] embedded a combination of the SE-block and pyramid pooling in Res-UNet; this composition improves the feature-capture ability of the down-sampling path in Res-UNet. Shah et al. [32] proposed using atrous convolution and the residual structure to enhance U-Net models, taking residual and atrous convolution as the base convolution unit to improve the model's ability to learn features. Flaute et al. [33] proposed a residual channel attention network for sampling and recovery of super-resolution images, which preserves well the integrity of the features learned by the encoder. Punn et al. [34] proposed a residual spatial cross-attention-guided Inception U-Net model that fuses shallow and deep semantic information and improves the extraction capability of a single convolutional block with the Inception [35] structure, improving the model's basic feature learning.

Point 2: One of the targeted challenges in this manuscript is the multi-scale problem. However, there is an existing work that uses pyramid attention for multi-scale image fusion [2]. It was published in April 2021, which is earlier than the submission date.

Response 2: We are grateful for this suggestion. In this regard, we carefully read this paper and found that the overlap may be a coincidence of naming. The model in paper [2] consists of pyramid attention and residual attention; the two blocks are simply combined front-to-back, which yields the name residual pyramid attention.

Point 3: What is the feature pyramid exactly? The output of the pyramid residual attention? More justification is required.

Response 3: We are grateful for this suggestion. In this regard, we elaborate on the details of the feature pyramid. It is a convolution module consisting of four convolution kernels of size 1×1, 3×3, 5×5, and 7×7. These convolution kernels can learn different knowledge from the same feature map, so the pyramid acts like different perceptions of one feature map. The output of the pyramid residual attention is a weighted feature map, composed of the original input and the feature map passed through the pyramid and the attention mechanism. The weighted feature map is then screened by the layers that come after the PRA.

Point 4: This work focuses on segmentation and the attention mechanism, so why does GAN matter? I cannot see how anything proposed in this work is related to generative models.

Response 4: We are grateful for this suggestion. In this regard, we changed Section 2.2 to "Residual and Pyramid Attention Networks (RPAN)".

Point 5: An important baseline is missing; check [3]. Moreover, existing models that use pyramid attention should be compared, for example [2]. It is not designed for dermoscopic image segmentation, but the comparison is still worth doing, as dermoscopic images belong to medical images.

Response 5: We are grateful for this suggestion. In this regard, we added the result of paper [3] to our table.

Point 6: The results are not statistically significant; the standard deviations etc. are lacking.
Response 6: We are grateful for this suggestion. In this regard, we added the standard deviations to our result tables.

| Network | Year | DC | mIoU | SP | ACC | SE | Params (M) |
| U-Net [8] | 2015 | 84.39±8.42 | 77.30±11.29 | 90.06±5.94 | 90.65±5.96 | 91.28±5.98 | 31 |
| ATTU-Net [9] | 2018 | 86.96±6.98 | 80.37±9.80 | 91.20±6.17 | 91.80±6.19 | 92.43±6.23 | 33.3 |
| DenseASPP [53] | 2018 | 87.27±7.81 | 80.45±7.81 | 91.71±5.31 | 92.32±5.30 | 92.96±5.31 | 33.8 |
| CE-Net [52] | 2019 | 88.73±7.21 | 81.23±9.98 | 92.16±5.37 | 92.81±5.36 | 93.49±5.36 | 24 |
| CA-Net [11] | 2020 | 88.17±7.81 | 80.24±10.75 | 91.72±5.76 | 92.37±5.78 | 93.29±5.81 | 2.8 |
| DAGAN [58] | 2020 | 85.9 | 77.1 | 97.6 | 93.5 | 83.5 | - |
| EUnet-DGF [23] | 2020 | 87.89 | 80.37 | - | 94.77 | - | - |
| DS-Net [51] | 2020 | 87.5 | 77.5 | - | 95.5 | - | - |
| Scale-Att-ASPP [7] | 2020 | 87.81 | 80.28 | 93.20 | 93.16 | 86.97 | - |
| FC-DPN [54] | 2020 | 84.56 | 76.34 | 93.71 | 98.65 | 83.82 | - |
| PRA Network (proposed) | 2021 | 89.47±6.31 | 82.14±9.20 | 92.51±4.65 | 93.17±4.62 | 93.86±4.61 | 21.6 |

Point 7: Section 3 should be "Methods" or "Methodology". The authors need to double-check the whole manuscript to fix the typos.

Response 7: We are grateful for this suggestion. In this regard, we carefully checked and corrected the mistakes.

[1] Mei, Y., Fan, Y., Zhang, Y., Yu, J., Zhou, Y., Liu, D., Fu, Y., Huang, T.S. and Shi, H., 2020. Pyramid attention networks for image restoration. arXiv preprint arXiv:2004.13824.
[2] Fu, J., Li, W., Du, J. and Huang, Y., 2021. A multiscale residual pyramid attention network for medical image fusion. Biomedical Signal Processing and Control, 66, p.102488.
[3] Abhishek, K., Hamarneh, G. and Drew, M.S., 2020. Illumination-based transformations improve skin lesion segmentation in dermoscopic images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 728-729).

Thanks for your comments.

Submitted filename: Answers for Reviewer 2.docx

15 Feb 2022
PONE-D-21-32574R1
Dermoscopic Image Segmentation Based on Pyramid Residual Attention Module.
PLOS ONE

Dear Dr. Cheng,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please follow the reviewers' comments and improve the writing and the abstract.

Please submit your revised manuscript by Apr 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Sen Xiang
Academic Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Partly
Reviewer #2: Yes
Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: N/A
Reviewer #2: Yes
Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes

**********

6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The authors have done a good job in addressing my comments. There are still some grammar mistakes, and the authors should further improve their writing, especially the abstract.
Reviewer #2: The revised manuscript has addressed all my concerns. I have no further questions about the current version.
Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
15 Mar 2022

[Answers for Reviewer 1's Comments]

First of all, we would like to thank you for all of your comments on our paper.

Point 1: The authors have done a good job in addressing my comments. There are still some grammar mistakes, and the authors should further improve their writing, especially the abstract.

Response 1: These are the improvements to our paper. First, the abstract:

"The diversity and complexity of skin lesion images are the challenges of dermoscopic image segmentation. Convolutional neural networks (CNNs) have played an essential role in addressing them in recent years. In this paper, to segment this kind of image better, a novel feature extraction module, PRA (Pyramid Residual Attention), is designed from a new perspective. PRA consists of a residual module, an attention module, and a pyramid module. We use the denoising function of the encoder-decoder structure and combine it with a multi-scale feature pyramid. The channel attention monitors the feature extraction process of the pyramid to ensure the efficiency of the PRA. The PRA Network is composed of shallow, middle, and deep layers. Both the shallow and middle layers consist of two PRA modules, and the deep layer consists of a single one. The shallow, middle, and deep layers of the PRA Network are linked by down-sampling and up-sampling to improve the segmentation effect. The network is subjected to ablation experiments on the ISIC 2018 dataset to verify the effectiveness of the PRA module, and achieves better results on all metrics. Our method achieved Dice coefficient (DC) scores of 93.37% and 89.47%, and mIoU scores of up to 87.94% and 82.14%, for the segmentation tasks on the ISIC 2018 and ISIC 2017 datasets respectively. The PRA Network is more effective than existing advanced methods."

We corrected the mistakes and trimmed the surplus wording in the abstract.

Second, the details:
1. "The dermoscopy is designed to provide the physician with high-resolution images of abnormal parts of the patient's epidermis." is changed to "The dermoscopy provides the physician with high-resolution images of abnormal parts of the patient's epidermis." ...

We have used Grammarly to check all sentences for mistakes and corrected them one by one, although several uses of passive and objective voice remained confusing. We try our best to improve our writing and become better. We are grateful for this suggestion; in this regard, we carefully checked and corrected the mistakes. Thanks for your comments.

Submitted filename: The Answers for Reviewer 1.docx

23 Mar 2022
PONE-D-21-32574R2
Dermoscopic Image Segmentation Based on Pyramid Residual Attention Module.
PLOS ONE

Dear Dr. Cheng,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 07 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,
Sen Xiang
Academic Editor
PLOS ONE

Additional Editor Comments: Please improve the language as the reviewer suggested.
4 Apr 2022

[Answers for the Academic Editor's Comments]

Point 1: Please improve the language as the reviewer suggested.

Response 1: We have tried our best to improve our paper, using Grammarly to guarantee grammatical correctness. To normalize the paper's style, we imitated the rhetoric of well-written papers. Thanks for your understanding and support. There are several changes in our paper compared to the last version. Firstly, the abstract is changed from:

"The diversity and complexity of skin lesion images are the challenges of dermoscopic image segmentation. Convolutional neural networks (CNNs) have played an essential role in addressing them in recent years. In this paper, to segment this kind of image better, a novel feature extraction module, PRA (Pyramid Residual Attention), is designed from a new perspective. PRA consists of a residual module, an attention module, and a pyramid module. We use the denoising function of the encoder-decoder structure and combine it with a multi-scale feature pyramid. The channel attention monitors the feature extraction process of the pyramid to ensure the efficiency of the PRA. The PRA Network is composed of shallow, middle, and deep layers. Both the shallow and middle layers consist of two PRA modules, and the deep layer consists of a single one. The shallow, middle, and deep layers of the PRA Network are linked by down-sampling and up-sampling to improve the segmentation effect. The network is subjected to ablation experiments on the ISIC 2018 dataset to verify the effectiveness of the PRA module, and achieves better results on all metrics. Our method achieved Dice coefficient (DC) scores of 93.37% and 89.47%, and mIoU scores of up to 87.94% and 82.14%, for the segmentation tasks on the ISIC 2018 and ISIC 2017 datasets respectively. The PRA Network is more effective than existing advanced methods."

to:

"We propose a stacked convolutional neural network incorporating a novel and efficient pyramid residual attention (PRA) module for the task of automatic segmentation of dermoscopic images. Precise segmentation is a significant and challenging step for computer-aided diagnosis technology in skin lesion diagnosis and treatment. The proposed PRA has the following characteristics: Firstly, we concentrate on three widely used modules in the PRA. The purpose of the pyramid structure is to extract the feature information of the lesion area at different scales, the residual connection is meant to ensure the efficiency of model training, and the attention mechanism is used to screen effective feature maps. Thanks to the PRA, our network can still obtain precise boundary information that distinguishes healthy skin from diseased areas for blurred lesion areas. Secondly, the proposed PRA can increase the segmentation ability of a single module for lesion regions through efficient stacking. Finally, we incorporate the idea of encoder-decoder into the architecture of the overall network. Compared with traditional networks, we divide the segmentation procedure into three levels and construct the pyramid residual attention network (PRAN). The shallow layer mainly processes spatial information, the middle layer refines both spatial and semantic information, and the deep layer intensively learns semantic information. The basic module of PRAN is the PRA, which is enough to ensure the efficiency of the three-layer architecture network. We extensively evaluate our method on the ISIC2017 and ISIC2018 datasets.
The experimental results demonstrate that PRAN can obtain segmentation performance comparable to state-of-the-art deep learning models under the same experimental environment and conditions."

Secondly, the capitalization and voice problems at the beginnings of sentences have all been fixed; because these corrections are numerous, they are highlighted in the tracked-changes version. Thanks for your comments.

Submitted filename: answers for editors.docx

8 Apr 2022

Dermoscopic Image Segmentation Based on Pyramid Residual Attention Module.
PONE-D-21-32574R3

Dear Dr. Cheng,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Sen Xiang
Academic Editor
PLOS ONE

8 Jul 2022

PONE-D-21-32574R3
Dermoscopic Image Segmentation Based on Pyramid Residual Attention Module

Dear Dr. Cheng:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Sen Xiang
Academic Editor
PLOS ONE
Related articles: 14 in total

1.  A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification.

Authors:  Yutong Xie; Jianpeng Zhang; Yong Xia; Chunhua Shen
Journal:  IEEE Trans Med Imaging       Date:  2020-02-10       Impact factor: 10.048

2.  CE-Net: Context Encoder Network for 2D Medical Image Segmentation.

Authors:  Zaiwang Gu; Jun Cheng; Huazhu Fu; Kang Zhou; Huaying Hao; Yitian Zhao; Tianyang Zhang; Shenghua Gao; Jiang Liu
Journal:  IEEE Trans Med Imaging       Date:  2019-03-07       Impact factor: 10.048

3.  [Review] A survey on deep learning in medical image analysis.

Authors:  Geert Litjens; Thijs Kooi; Babak Ehteshami Bejnordi; Arnaud Arindra Adiyoso Setio; Francesco Ciompi; Mohsen Ghafoorian; Jeroen A W M van der Laak; Bram van Ginneken; Clara I Sánchez
Journal:  Med Image Anal       Date:  2017-07-26       Impact factor: 8.545

4.  DSNet: Joint Semantic Learning for Object Detection in Inclement Weather Conditions.

Authors:  Shih-Chia Huang; Trung-Hieu Le; Da-Wei Jaw
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2021-07-01       Impact factor: 6.226

5.  CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation.

Authors:  Ran Gu; Guotai Wang; Tao Song; Rui Huang; Michael Aertsen; Jan Deprest; Sebastien Ourselin; Tom Vercauteren; Shaoting Zhang
Journal:  IEEE Trans Med Imaging       Date:  2021-02-02       Impact factor: 10.048

6.  Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks.

Authors:  Xiankai Lu; Wenguan Wang; Jianbing Shen; David Crandall; Jiebo Luo
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2022-03-04       Impact factor: 6.226

7.  Automatic Skin Lesion Segmentation Using Deep Fully Convolutional Networks With Jaccard Distance.

Authors:  Yading Yuan; Ming Chao; Yeh-Chi Lo
Journal:  IEEE Trans Med Imaging       Date:  2017-04-18       Impact factor: 10.048

8.  Automatic skin lesion segmentation based on FC-DPN.

Authors:  Pufang Shan; Yiding Wang; Chong Fu; Wei Song; Junxin Chen
Journal:  Comput Biol Med       Date:  2020-07-17       Impact factor: 4.589

9.  SAR-U-Net: Squeeze-and-excitation block and atrous spatial pyramid pooling based residual U-Net for automatic liver segmentation in Computed Tomography.

Authors:  Jinke Wang; Peiqing Lv; Haiying Wang; Changfa Shi
Journal:  Comput Methods Programs Biomed       Date:  2021-07-06       Impact factor: 5.428

10.  Computer-aided diagnosis of psoriasis skin images with HOS, texture and color features: A first comparative study of its kind.

Authors:  Vimal K Shrivastava; Narendra D Londhe; Rajendra S Sonawane; Jasjit S Suri
Journal:  Comput Methods Programs Biomed       Date:  2016-01-20       Impact factor: 5.428

