
Deep structured residual encoder-decoder network with a novel loss function for nuclei segmentation of kidney and breast histopathology images.

Amit Kumar Chanchal1, Shyam Lal1, Jyoti Kini2.   

Abstract

To improve the diagnosis and treatment of cancer, automatic segmentation of haematoxylin and eosin (H&E) stained cell nuclei from histopathology images is the first step in digital pathology. The proposed deep structured residual encoder-decoder network (DSREDN) focuses on two aspects: first, it effectively utilizes residual connections throughout the network and provides a wide and deep encoder-decoder path, which captures relevant context and more localized features. Second, the vanished boundaries of detected nuclei are addressed by an efficient loss function that trains the proposed model better and reduces false predictions, which are undesirable especially in healthcare applications. The proposed architecture was experimented on three publicly available H&E stained histopathological datasets: (I) Kidney (RCC), (II) Triple Negative Breast Cancer (TNBC), and (III) MoNuSeg-2018. We considered F1-score, Aggregated Jaccard Index (AJI), the total number of parameters, and FLOPs (floating point operations), which are the most commonly preferred performance metrics for comparing nuclei segmentation. The evaluated scores indicate that the proposed architecture achieves a considerable margin over five state-of-the-art deep learning models on the three histopathology datasets. Visual segmentation results show that the proposed DSREDN model segments nuclear regions more accurately than the state-of-the-art methods.
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.


Keywords:  Histopathology images; Kidney cancer diagnosis and prognosis; Nuclei segmentation; Residual learning

Year:  2022        PMID: 35125928      PMCID: PMC8809220          DOI: 10.1007/s11042-021-11873-1

Source DB:  PubMed          Journal:  Multimed Tools Appl        ISSN: 1380-7501            Impact factor:   2.577


Introduction

Recent research trends show that deep learning frameworks perform very well for segmentation, detection, and other computer vision tasks. The last decade has seen the advancement of new computation systems, proper strategies for handling overfitting when training very deep networks, and many other changes favourable to deep learning. Segmentation of haematoxylin and eosin (H&E) stained histopathology images is the primary prerequisite in digital pathology. The preparation of histopathology slides is discussed by Slaoui M et al. in [27] in the following steps: (I) tissue collection, (II) fixation, (III) embedding, (IV) sectioning, (V) de-paraffining, (VI) staining, and (VII) digitizing the slide by whole slide imaging (WSI). Tissue collection methods include fine needle aspiration, biopsy needle, excisional biopsy, etc. A larger biopsy carries more information than a small needle biopsy because it preserves the wider cellular context of the tissue. Fixation of the tissue is needed for chemical and physical stabilization. Embedding gives the tissue a particular shape so that it can be easily cut by the machines. Sectioning converts the three-dimensional tissue information into many thin slides of two-dimensional information. Removing paraffin from the sectioned tissue is important; without de-paraffining, portions of the tissue may look blurry. Staining of the tissue slides is required because unstained tissue is nearly transparent under bright-field microscopy. The most widely used stains for histopathology images are hematoxylin and eosin. Segmentation methods can be categorized into traditional or handcrafted feature extraction techniques and CNN-based deep learning approaches.
Traditional segmentation methods are mostly based on similarity-based approaches, discontinuity-based approaches, watershed techniques, active contour methods and their variants, superpixel methods, and clustering-based methods. The similarity-based approach discussed by Gonzalez R C et al. in [8] is based on local thresholding, global thresholding, adaptive thresholding, Otsu's thresholding, region growing, and region splitting and merging, all of which try to group and segment similar pixels. For image histograms with flat valleys, the similarity-based approach does not work well, and a wrong choice of threshold may result in over-segmentation or under-segmentation. The discontinuity-based approach tries to segment pixels that are isolated in some manner, such as points, lines, and edges, and is a mask-processing-based approach; it requires different operators at different stages. Cousty J et al. proposed a watershed segmentation method in [4] based on split, merge, and marker-controlled watershed; the boundaries detected by the watershed method depend on cell complexity. Song T et al. proposed active contour segmentation in [28], where intensity information and local edge information are considered for the detection of object boundaries. The superpixel segmentation method used by Albayrak A et al. in [1] is based on clusters of connected pixels having identical features; it considers the color and coordinate information of neighboring pixels. This technique provides better regional information but is not very effective for cell segmentation. Clustering-based segmentation, proposed by Win K Y et al. in [37], groups pixels based on their similarity. In recent research, most authors report that segmentation techniques based on deep convolutional neural networks perform far better than conventional segmentation approaches. A concise review of CNN-based approaches is presented in Section 2.
Deep learning segmentation methods also face many challenges, which fall under the following aspects. Due to large variations in tissue appearance and a wide spectrum of tissue classes and sub-classes, recognition is difficult. Segmentation of complex, overlapped, and vanishing boundaries is not an easy task. Preparation of ground truth for supervised learning is also a big challenge: supervision by experienced pathologists is necessary, since prediction accuracy depends entirely on the annotated ground truth. For complex histopathology images, conventional methods suffer from either over-segmentation or under-segmentation. The proposed approach focuses on separating overlapped and vanished nuclear regions in histopathology images. To address these challenges, our contributions in this paper are as follows. To strengthen the multi-level intermediate features, the proposed DSREDN model effectively utilizes the strength of residual learning. Through empirical evidence and careful experimentation and analysis, we propose a novel loss function. Visual results and performance metrics indicate that our loss function trains the model better and segments nuclear regions more accurately than the state-of-the-art methods. The manuscript is organised as follows: a brief introduction to segmentation and its associated challenges is given in Section 1. A concise review of CNN-based approaches is presented in Section 2. Sections 3 and 4 cover a detailed analysis of the proposed method and its implementation with benchmark datasets. Experimental results and discussions are presented in Section 5, and finally, we conclude our work in Section 6.

Related work

Most CNN architectures for the cell segmentation task consist of an encoder-decoder path for feature extraction. Much recent research exploits opportunities such as improved training strategies, handling of overfitting, better optimization methods, and many other techniques to obtain better prediction accuracy. However, although many authors report very efficient results, an accurate and efficient segmentation algorithm remains open-ended research due to the complexity of histopathology images. One of the significant contributions, UNet by Ronneberger et al. in [26], provided a very good direction and a dramatic breakthrough in the field of biomedical image segmentation. UNet is a symmetric encoder-decoder convolutional network with a large number of feature channels that allow features to propagate to the higher layers of a deep network. It applies repeated (3 x 3) convolution kernels followed by ReLU activation, (2 x 2) max-pooling and (2 x 2) up-sampling with a stride of 2, and a (1 x 1) convolution followed by sigmoid activation at the final layer, for a total of 23 layers. In [36], Veit A et al. found through their experiments that if a network contains a collection of paths, shorter paths are sufficient during training and a very deep path is not required. These multiple paths do not strongly depend on each other, and their smooth correlation through multiple valid paths increases the performance of the network. In [22], Milletari F et al. proposed an encoder-decoder convolutional network for three-dimensional data utilizing Dice loss as the loss function; their empirical evaluation achieved better performance on a strongly imbalanced dataset. In [24], Nogues I et al. proposed an architecture for the detection of lymph nodes using two fully nested supervised convolutional networks and a structured conditional random field optimization strategy.
Degradation of information in deeper networks was addressed by Kaiming He et al. in [9] by introducing the deep residual network, which is easier to train and optimize. The residual connection is realized by skipping one or more layers to restore the flow of information in a deep network. For segmentation and detection of histological objects, Chen H et al. in [5] introduced a contour-aware model that extracts multi-level information under auxiliary supervision. In [10], Huang G et al. proposed a convolutional network that strengthens the overall flow of the feature map by feeding in both the preceding layer's output and the original input. Their experiments also indicate that, due to the integration of identity mappings, the model learns more compact features and the vanishing gradient problem is reduced. With an imbalanced dataset, predictions are biased towards high precision and low recall, which is not tolerable, especially in the medical field. This problem was addressed by Salehi S M et al. in [29], who trained a deep network effectively even on a highly imbalanced dataset, in a setting where a false negative prediction is far more dangerous than a false positive. The behavior of loss functions such as weighted cross-entropy and Dice loss at different learning rates was examined by Sudre C H et al. in [30] on medical images; their experiments found that as the level of imbalance increases, overlap-measure-based loss functions become more effective. SegNet, an encoder-decoder architecture by Badrinarayanan V et al. in [3], is very efficient in terms of memory and time for semantic segmentation of road and indoor scenes. SegNet's decoder upsamples its lower-resolution input using the pool indices transferred from its encoder, producing sparse feature maps. To accurately segment near-boundary regions, Zhou S et al. in [38] used a residual network with a dilated convolution block.
They utilize many hierarchical blocks in parallel to retrieve meaningful semantic information. To handle class imbalance and reduce false-negative predictions in healthcare, Hashemi S R in [11] proposed a 3D densely connected CNN with a Tversky-index-based asymmetric similarity loss that trains the network to the lowest surface distance. The complex-boundary segmentation problem was addressed by Naylor P et al. in [25] by formulating a loss function based on intra-nuclear distance; their encoder-decoder model outperformed FCN, FCN+PP, Mask R-CNN, U-Net, and U-Net+PP in experiments on the TNBC and MoNuSeg datasets. Meaningful extensions of the standard encoder-decoder include an additional attention gate module by Schlemper J et al. in [31], and attention as well as residual mechanisms by Lal S et al. in [20], where the network is trained to suppress irrelevant features while highlighting meaningful ones. For road scene segmentation, Malekijoo A et al. in [23] utilized an autoencoder-based model applying convolution, deconvolution, and pyramid pooling to reinforce local features. For the segmentation of microscopic, MR, and CT images, an encoder-decoder architecture by Zhou S et al. in [39] added meaningful connections to precisely locate complex boundaries. For the segmentation of nuclei in pathology images, the model of Lal S et al. [21] consists of adaptive color deconvolution, multiscale thresholding followed by morphological operations, and other post-processing steps. For the segmentation of medical images, a novel loss function by Karimi D et al. in [16] estimated the Hausdorff distance using a morphological operation method, a distance transformation method, and circularly convolved kernels of different radii; utilizing these Hausdorff-distance reduction methods, they trained CNNs on various microscopy images and compared their results with commonly used loss functions. Hanif M S et al. in [12] proposed a competitive residual network built by stacking multiple residual units, called a wide network; their study concluded that such a wide network performs better than a deep, thin network. Chanchal A K et al. and Aatresh A A et al. in [2, 6] used separable convolution pyramid pooling and dimension-wise pyramid pooling for nuclei segmentation tasks. A summary of state-of-the-art DL techniques useful for biomedical image segmentation is presented in Table 1.
Table 1

Summary of state-of-the-art DL techniques used for segmentation of medical images

Author | Application | Dataset | Modalities | Method | Toolbox | Activation Function | Loss Function | Optimizer | Performance Criteria
------ | ----------- | ------- | ---------- | ------ | ------- | ------------------- | ------------- | --------- | --------------------
Ronneberger et al. [26] | Cell segmentation | ISBI-2012 | Light microscopic images | 2D-CNN, repeated application of max-pooling and up-convolution | Caffe | ReLU and Sigmoid | Cross-entropy | SGD | IOU = 92% for PhC-U373 and 77% for DIC-HeLa
Milletari F et al. [22] | Clinical | PROMISE 2012 | MRI | Volumetric convolution, feature channels double at each stage | Caffe | PReLU and Softmax | Dice loss | Standard back-propagation | Avg. Dice = 86.9%
Nogues et al. [24] | Lymph node cluster segmentation | MICCAI 2015 | CT | Holistically-nested neural network, structured optimization | Caffe | ReLU and Sigmoid | Cross-entropy and auxiliary loss | SGD | Mean Dice coefficient = 82.1% ± 9.6%
He et al. [9] | Image recognition | ImageNet, CIFAR-10 | 1000 classes and 10 classes (colored images) | Deep residual learning, identity mapping by shortcuts | Caffe | Softmax | Cross-entropy | SGD | Error = 3.57%, 6.43%
Chen et al. [5] | Gland segmentation | Warwick-QU | Histopathology | 2D-CNN, auxiliary supervision, transfer learning | Caffe | Softmax | Formulated loss based on per-pixel classification | Standard back-propagation | F1 score = 91% and 77%
Huang et al. [10] | Classification of natural images | CIFAR-10, 100, SVHN, and ImageNet | Colored images | Feature maps of all preceding layers are used as inputs | Keras, TensorFlow | ReLU, Softmax | Categorical cross-entropy | SGD | Error = 5.19%, 19.64%, 1.59%
Salehi et al. [29] | Training with unbalanced data | MICCAI 2008 | MRI | 3D fully convolutional network based on the UNet architecture | Nvidia GeForce GTX1080 GPU | ReLU, Softmax | Tversky | Adam | DSC = 56.42%, F2 score = 57.32%
Sudre et al. [30] | Training with unbalanced data | BRATS | MRI | 2D and 3D deep learning framework | TensorFlow | ReLU and Sigmoid | Dice | SGD | Dice loss was found to be more robust than the other loss functions
Badrinarayanan et al. [3] | Indoor and road segmentation | CamVid, SUN RGB-D | RGB | Fully convolutional encoder-decoder | Caffe | ReLU and Softmax | Cross-entropy | SGD | Mean IoU = 60.10, 31.84
Hashemi et al. [11] | Medical image segmentation | MSSEG-2016, ISBI | MRI | 3D fully convolutional, 3D densely connected architecture | Nvidia GeForce GTX1080 GPU | ReLU and Softmax | Asymmetric similarity | Adam | Average Dice = 69.9%, 65.74%
Naylor et al. [25] | Segmentation of nuclei | TNBC, TCGA | Histopathology | Fully convolutional network | Python 3, TensorFlow | ReLU and Softmax | Regression loss | Adam | AJI = 55.98%, F1 = 78.63%
Schlemper et al. [31] | Medical image analysis | TCIA Pancreas, multi-class abdominal | CT | Focuses on target structures by employing attention gates | PyTorch | ReLU and Sigmoid | Dice | SGD | Dice score (TCIA) = 82%, (multi-class abdominal) = 84%
Lal et al. [20] | Liver cancer analysis | KMC liver | Histopathology | Employed residual block, attention mechanism, and joint loss function | TensorFlow 2.0, Keras | ReLU and Sigmoid | Formulated joint loss (Dice and Jaccard) | Adam | F1 = 83.59% and JI = 72.06%
Malekijoo et al. [23] | Indoor and road segmentation | CamVid | RGB | Convolution-deconvolution, pyramid pooling | Python 3, TensorFlow 1.4 | ReLU and Softmax | Cross-entropy | Adam | Mean IoU = 48.90%, accuracy = 88.49%
Zhou et al. [39] | Low contrast medical image segmentation | Pelvic CT, brain tumor, nuclei segmentation | CT, MR, and microscopic | Multiscale dense connections, high-resolution pathways | Caffe | ReLU and Sigmoid | Difficulty-guided cross-entropy | Adam | Dice ratio = 95%, 90%, and F1 score = 91.1 ± 10.2
Lal et al. [21] | Nuclei segmentation | Gold-standard, KMC | Histopathology | Adaptive colour de-convolution, multilevel thresholding | Python 3 | — | — | — | Pr, Re, F1 = 94.6%, 89.6%, 91.2%; Pr, Re, F1 = 78.7%, 76.6%, 77.2%
Karimi et al. [16] | Medical image segmentation | TRUS, PROMISE12, 3D CT-Liver, and 3D CT-Pancreas | Ultrasound, MR, and CT | Distance transform, morphological erosion, spherical convolution kernels | Python 3.6, TensorFlow 1.2 | Softmax | Hausdorff distance | Adam | DSC = 95%, 87%, 94%, 78%
Kumar et al. [17] | Clinical | Multiple organs | Histopathology | A CNN that produces a ternary map | PyTorch, NVIDIA Tesla | ReLU and Softmax | Cross-entropy | Standard back-propagation | AJI = 50.83%, F1 score = 82.67%
Chanchal et al. [6] | Clinical | TNBC, Kidney, MoNuSeg | Histopathology | Separable convolution pyramid pooling in encoder-decoder network | Keras, TensorFlow, Python 3 | ReLU and Sigmoid | BCE | Adam | AJI = 70%, 86%, 67%; F1 score = 82%, 92%, 80%
Aatresh et al. [2] | Clinical | Kidney, TNBC | Histopathology | Attention-based encoder-decoder, ASPP, dimension-wise convolutions | PyTorch | ReLU and Sigmoid | Global | Adam | AJI = 87%, 70%; F1 score = 93%, 82%
Chanchal et al. [7] | Clinical | TNBC, Kidney, MoNuSeg | Histopathology | High-resolution encoder-decoder path, an ASPP at bottleneck | Keras, TensorFlow, Python 3 | ReLU and Sigmoid | BCE | Adam | AJI = 73%, 94%, 72%; F1 score = 84%, 97%, 83%

Proposed architecture

For the segmentation of microscopy images, an encoder-decoder architecture is best suited: an encoder with regular convolution layers and max-pooling layers captures the context in the image very effectively, while the decoder path produces the output by gradually applying up-sampling and collecting relevant features from the encoder, enabling precise localization. Each stage on the encoder side of the DSREDN network shown in Fig. 1 accepts input of flexible size; we apply regular (3 x 3) 2D standard convolution, batch normalization, and max-pooling. To avoid saturation problems and loss of information while going deeper into the network, we restore lower-layer information by creating an additional path parallel to the main path of the network. These two paths are not strongly correlated with each other, which avoids vanishing gradient problems. For each filter size, the encoder side of the DSREDN network consists of three convolution layers in parallel with a single convolution path that focuses on passing more contextual features through the network. Since the effectiveness of the decoder path in generating the final output depends on the collection of contextual features from the encoder side, we use a slightly different path on the decoder side for optimal processing of the collected features. By this procedure, the DSREDN network becomes wide and deep instead of thin and deep. The DSREDN network is trained with RGB images of size (512 x 512 x 3).
The five stages of the encoder path have five different filter sizes, and the corresponding decoder path consists of: (a) 2D convolution with kernel size (3 x 3) and ReLU activation; (b) a high-resolution layer; (c) a (2 x 2) max-pooling layer in the encoder path to reduce the spatial size of the image, with a corresponding (2 x 2) up-sampling layer on the decoder side that collects contextual features from the encoder side via a concatenation operation; (d) at the final stage, a (1 x 1) convolution with sigmoid activation that maps the size (512 x 512 x 16) to (512 x 512 x 1).
Fig. 1

Proposed deep structured residual encoder-decoder network

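The encoder stage described above can be sketched in Keras/TensorFlow (the paper's toolchain). This is a minimal illustration, not the paper's exact layer configuration: the filter count, layer order, and the choice of addition for merging the two paths are assumptions. Three stacked (3 x 3) convolutions form the main path, a single parallel convolution carries the stage input forward as the residual path, and the merged features are max-pooled.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_stage(x, filters):
    # Main path: three 3x3 conv + batch-norm + ReLU blocks.
    main = x
    for _ in range(3):
        main = layers.Conv2D(filters, 3, padding="same")(main)
        main = layers.BatchNormalization()(main)
        main = layers.Activation("relu")(main)
    # Parallel residual path: a single 3x3 convolution of the stage input,
    # restoring lower-layer information alongside the main path.
    shortcut = layers.Conv2D(filters, 3, padding="same")(x)
    merged = layers.Add()([main, shortcut])
    # Return the pooled output for the next stage and the pre-pool feature
    # map, which the decoder would collect via concatenation.
    return layers.MaxPooling2D(2)(merged), merged

inputs = tf.keras.Input(shape=(512, 512, 3))
pooled, skip = encoder_stage(inputs, 16)
model = tf.keras.Model(inputs, [pooled, skip])
```

Stacking five such stages with increasing filter counts would yield the wide-and-deep encoder the text describes; the pooled tensor halves the spatial size at each stage while the skip tensor keeps full resolution for the decoder.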

Standard convolution layer

Convolution layers have a set of kernels to extract the features, and the weights of those kernels are learned automatically. The input is an RGB image of size (512 x 512 x 3). For the input x(j, k) and filter h(j, k), the normal 2D convolution result y(j, k) is expressed in (1). For an image of size (M x M x C), applying N kernels of size (K x K x C) with stride S and padding P yields an output of spatial size ((M − K + 2P)/S + 1), as in (2). For unit stride and no padding, the resulting feature size is (M − K + 1), as in (3).
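The output-size relation of Eqs. (2) and (3) can be checked with a one-line helper (the function name is ours, for illustration):

```python
def conv_output_size(m, k, s=1, p=0):
    """Spatial size of the feature map produced by a (k x k) kernel on an
    (m x m) input with stride s and padding p: floor((m - k + 2p)/s) + 1."""
    return (m - k + 2 * p) // s + 1

conv_output_size(512, 3, s=1, p=1)  # "same" padding preserves the 512 size
conv_output_size(512, 3, s=1, p=0)  # no padding shrinks the map to 510
```

This is why the (3 x 3) convolutions in the network use "same" padding: the (512 x 512) spatial size is reduced only by the (2 x 2) max-pooling layers, not by the convolutions.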

High resolution layer

The proposed DSREDN architecture has a high-resolution encoder and a more effective decoder, shown in Fig. 2. The mathematical expression of the flow of features is shown below; it can be seen that these two paths are not strongly dependent on each other, and their smooth correlation through multiple valid paths increases the performance of the network and reduces the vanishing gradient problem. This wide network strengthens the overall extracted feature map by feeding in both the preceding layer's output and the original input. Our experiments also indicate that, due to the integration of this wide network, our model gains additional discriminative capability and is able to retrieve more compact features compared to other existing deep models.
Fig. 2

High resolution encoder path (Left) and decoder path (Right)

Aggregation of features for the high-resolution encoder path is as follows:

Aggregation of features for the high-resolution decoder path can be expressed as:

Activation function

If the dataset is linearly separable, a linear activation function does well, but real-world datasets are rarely linearly separable. With a linear activation, the output of the neural network is a linear function of its inputs, which does not allow the model to learn complex relations. A nonlinear activation function can approximate almost any nonlinear relation and provides very good prediction results; nonlinearity helps the model adapt and generalize to a variety of data. The Rectified Linear Unit (ReLU) is the most popular activation function in deep learning models. We use the ReLU activation function in all hidden layers of the DSREDN model due to its computational simplicity, the representational sparsity it induces, and its piecewise-linear behavior. The ReLU activation function and its derivative are expressed mathematically by (4) and (5). At the final stage, a (1 x 1) convolution with sigmoid activation maps the size (512 x 512 x 16) to (512 x 512 x 1).
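ReLU and its derivative, Eqs. (4) and (5), amount to two one-liners in numpy:

```python
import numpy as np

def relu(x):
    # Eq. (4): ReLU(x) = max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Eq. (5): the derivative is 1 for x > 0 and 0 for x < 0;
    # the (sub)gradient at x = 0 is conventionally taken as 0 here.
    return (x > 0).astype(float)
```

The zero derivative for negative inputs is what produces the representational sparsity mentioned above: units with negative pre-activations output exactly zero and receive no gradient.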

Pooling layer

Pooling is used to make the architecture scale-, rotation-, and location-invariant. A pooling operation on a (4 x 4) input with step size S = 2 and kernel K = (2 x 2) is shown in Fig. 3.
Fig. 3

Max Pooling, step size S = 2 kernel K = 2

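The max-pooling operation of Fig. 3 can be reproduced with a naive numpy loop (a didactic sketch; real frameworks use vectorized implementations):

```python
import numpy as np

def max_pool(x, k=2, s=2):
    # Slide a (k x k) window over the 2D map with stride s and keep the
    # maximum in each window, as in Fig. 3.
    h = (x.shape[0] - k) // s + 1
    w = (x.shape[1] - k) // s + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

x = np.arange(16).reshape(4, 4)
max_pool(x)  # the (4 x 4) map reduces to a (2 x 2) map of window maxima
```

With K = 2 and S = 2 each spatial dimension is halved, which is exactly how the (512 x 512) input shrinks stage by stage along the encoder path.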

Batch normalization

A deep network has a large number of layers, with much activity between them. It has been observed that if the input at each pre-activation follows a single distribution, ideally Gaussian or mean-centered, then a very deep network is easier to train. A constant, small change at an early layer can have a significant effect on a later layer; in that case, due to the internal covariate shift shown in Fig. 4, a very deep network is difficult to train. To avoid such problems we apply batch normalization, introduced by Ioffe S et al. in [13], so that all inputs coming from the previous layer are guaranteed to follow the same distribution and convergence becomes faster. Batch normalization is an additional layer and also acts as a regularizer.
Fig. 4

Internal co-variate shift of batches


Training and implementation

We used a Google Colab notebook (with a 12 GB Google GPU), TensorFlow 2.0, and the Keras API for simulation of all models. The most important part of any optimization method in deep learning is the gradient, which is used to update the weights; Equation (6) is the general weight update equation. We used the Adam (Adaptive Moment Estimation) optimizer. Adam [19] integrates the nice features of the RMSProp and Stochastic Gradient Descent with Momentum (SGD + Momentum) algorithms, and its final update equation is a combination of the two. Adam is one of the fastest optimization approaches in recent research. The following are the significant controlling parameters that we fine-tuned: (a) learning rate = 0.0001 for training all models on the three datasets; (b) weight decay constant, where both the learning rate and the weight decay constant were selected by trial and error; (c) batch size = 4, since a larger batch size allows computations to be parallelized to a greater degree but leads to poor generalization; (d) the size of the convolution filter, chosen as per available resources. We considered the F1-score used by Lal S et al. and Aatresh A A et al. in [2, 21], the Aggregated Jaccard Index (AJI) used by Naylor P et al. in [25], the total number of parameters, and FLOPs (floating point operations), which are the most commonly preferred performance metrics for comparing nuclei segmentation.
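The Adam combination of momentum and RMSProp mentioned above can be written out as a single numpy update step (standard Adam with default β1, β2, and the paper's learning rate of 0.0001; the function name is ours):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum-style first moment (m) and RMSProp-style second moment (v),
    # both bias-corrected by the step count t before the weight update.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# One step from w = 1.0 with gradient 0.5: the bias-corrected moments make
# the very first update close to a full learning-rate step.
w, m, v = adam_step(1.0, 0.5, m=0.0, v=0.0, t=1)
```

In practice this is what `tf.keras.optimizers.Adam(learning_rate=1e-4)` performs per parameter at each training step.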

Datasets

Kidney dataset: The kidney dataset was prepared by Irshad H et al. [14]. It consists of 730 H&E images (400 x 400 pixels) and their corresponding ground truth. Triple negative breast cancer (TNBC) dataset: The TNBC dataset was prepared by Naylor P et al. [25]. It consists of 50 H&E images (512 x 512 pixels) of breast tissue. We performed data augmentation such as horizontal flips, vertical flips, and rotations so that we have a sufficient number of training images. MoNuSeg-2018 dataset: This dataset was prepared by Kumar N et al. [17] and includes seven different organs: colon, breast, kidney, liver, stomach, prostate, and bladder. It consists of 44 images (1000 x 1000 pixels). We performed patch processing of the images to obtain inputs of dimension (512 x 512 pixels), and applied data augmentation techniques to enlarge the training set.
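The flip-and-rotation augmentation described above can be sketched with numpy (a minimal illustration; the paper does not specify its exact augmentation set, so the six transforms below are an assumption):

```python
import numpy as np

def augment(img):
    # Original image plus horizontal flip, vertical flip, and
    # 90/180/270-degree rotations: a 6x enlargement of the training set.
    return [img,
            np.fliplr(img),   # horizontal flip
            np.flipud(img),   # vertical flip
            np.rot90(img, 1),
            np.rot90(img, 2),
            np.rot90(img, 3)]
```

For square inputs such as the (512 x 512) TNBC images, every transform preserves the image shape, so augmented images feed directly into the same network without resizing.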

Loss function

A deep learning model learns by means of a loss function, which is driven by a calculation of error. This error is the difference between the actual and predicted values in a regression problem, and between the actual and predicted distributions in a classification problem. With the help of a suitable optimization algorithm, minimizing the loss function reduces the error. The following are the most preferred loss functions for binary segmentation.

Binary cross-entropy

Binary cross-entropy is a combination of sigmoid activation and cross-entropy, as discussed by Chanchal et al. in [6]. Suppose we have a countable set of symbols X = {x1, x2, ..., xn}, and Y = {y1, y2, ..., yn} is the discrete probability distribution representing the probability of occurrence of those symbols, with yi = p(xi). According to the concept of entropy, the minimum number of bits required to represent the i-th symbol is log(1/yi). Considering the entire distribution Y, the optimal number of bits per transmission through a channel is known as the entropy; it is simply the expected number of bits per encoding, as shown in (7). If we instead encode the symbols according to a second distribution Ŷ, the expected encoding length is larger, and this quantity defines the cross-entropy between Y and Ŷ; if the distributions Y and Ŷ are equal, entropy and cross-entropy coincide. For binary cross-entropy we consider two classes, C1 and C2: (a) y represents the ground truth label for class C1 and S1 the sigmoid score for class C1; (b) (1 − y) represents the ground truth label for class C2 and S2 = (1 − S1) the sigmoid score for class C2. Binary cross-entropy (BCE), used by Ronneberger et al. in [26], is defined by (8) and (9).
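Written out for a pixel-wise label map, the BCE of Eqs. (8)-(9) is a short numpy function (the clipping constant is a common numerical convention, not part of the definition):

```python
import numpy as np

def bce(y, s, eps=1e-7):
    # Binary cross-entropy: y is the ground-truth label map (0 or 1),
    # s the sigmoid score for the nucleus class. Scores are clipped so
    # that log(0) never occurs.
    s = np.clip(s, eps, 1 - eps)
    return float(np.mean(-(y * np.log(s) + (1 - y) * np.log(1 - s))))
```

For a single positive pixel scored at 0.5 the loss is log 2 ≈ 0.693 nats, and it grows without bound as the score approaches the wrong class, which is what makes confident false predictions expensive.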

Weighted binary cross-entropy

For skewed data, the weighted binary cross-entropy (WBCE) loss function sometimes performs better; normally, higher weights are assigned to the minority class. The purpose of class weights is to modify the loss function so that errors on the minority class are penalized more heavily. It is a way of passing weights into the binary cross-entropy loss function, used by Jadon S et al. and Sugino T et al. in [15, 35]. Equation (10) describes the weighted binary cross-entropy loss function, where β assigns weight to the more relevant objects, and y and ŷ represent the ground truth and the predicted result, respectively.

Dice loss

For binary segmentation with class imbalance, Dice loss is the most preferred overlap measure, used by Milletari F et al. and Sudre C H et al. in [22, 30]. There is little difference between the loss calculated with intersection over union (IoU) and with Dice. Equations (11), (12), and (13) describe the Dice loss function; the Dice coefficient is calculated as the harmonic mean of precision and recall.
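The soft Dice loss of Eqs. (11)-(13) in numpy form (the smoothing constant is a common convention to keep the loss defined for empty masks, not a value reported in the paper):

```python
import numpy as np

def dice_loss(y, p, smooth=1.0):
    # Soft Dice loss: 1 - 2|Y ∩ P| / (|Y| + |P|), computed on the
    # ground-truth map y and the predicted probability map p.
    inter = np.sum(y * p)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(y) + np.sum(p) + smooth)
```

Because the loss is driven by the overlap ratio rather than a per-pixel average, the abundant background pixels cannot dominate it, which is why overlap measures cope better with class imbalance.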

Proposed loss function

The proposed loss function builds on the idea of a dynamically scaled cross-entropy loss: a pixel-based loss that automatically assigns more weight to the portion of interest (the object region) and down-weights the portions of less interest, with the help of a hyper-parameter α and a coefficient γ.

Dynamically scaled cross-entropy loss

For a highly imbalanced dataset, the dynamically scaled cross-entropy loss predicts with better accuracy. It automatically assigns more weight to the region of interest (the object region) and down-weights the regions of less interest. The dynamically scaled cross-entropy loss can be derived from the BCE loss, as by Jadon S et al. and Sugino T et al. in [15, 35], and is expressed in (14), (15), (16), and (17). To make the notation more convenient, let p_t denote the predicted probability of the true class; BCE can then be written concisely as −log(p_t). A modulating factor (1 − p_t)^γ with γ > 0, which down-weights the background and assigns more weight to the object region, together with a hyper-parameter α with 0 < α < 1, makes this the dynamically scaled cross-entropy loss.
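The dynamically scaled (focal-style) cross-entropy of Eq. (17) can be sketched in numpy. The α and γ values below are illustrative choices within the ranges the text states (0 < α < 1, γ > 0), not the paper's tuned values:

```python
import numpy as np

def scaled_ce(y, p, alpha=0.25, gamma=2.0, eps=1e-7):
    # p_t is the predicted probability of the true class at each pixel;
    # the factor alpha * (1 - p_t)^gamma down-weights easy (well-classified,
    # mostly background) pixels and keeps the focus on hard object pixels.
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    return float(np.mean(-alpha * (1 - p_t) ** gamma * np.log(p_t)))
```

With γ = 2, a pixel predicted at p_t = 0.9 contributes roughly two orders of magnitude less than one predicted at p_t = 0.5, so abundant easy background pixels stop dominating the gradient.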

Approximation of dice loss

This loss is derived from the Dice coefficient by assigning different weights to false positives and false negatives with the help of a β coefficient; mathematically, it can be expressed as in (18). The proposed loss function is calculated based on the concept of the dynamically scaled cross-entropy loss in (17). Segmentation results indicate that the dynamically scaled Dice loss shown in (19) trains any encoder-decoder model better on histopathological data. The superiority of the proposed loss function, in terms of the F1-score/AJI of two benchmark models, is shown in Table 2. DI indicates the Dice index; the values of α and γ range over 0 < α < 1 and 1 < γ < 3, and the β value can be used to trade off false negatives against false positives.
Table 2

Superiority of proposed loss function F1/AJI with benchmark model

Loss          | UNet (Kidney) | UNet (TNBC)   | UNet (MoNuSeg) | Proposed (Kidney) | Proposed (TNBC) | Proposed (MoNuSeg)
BCE           | 0.8537/0.7489 | 0.7324/0.6559 | 0.7278/0.6029  | 0.9422/0.8994     | 0.8008/0.6698   | 0.7854/0.6675
Dice          | 0.8699/0.7631 | 0.7470/0.6690 | 0.7186/0.5991  | 0.9479/0.9044     | 0.8100/0.6828   | 0.7927/0.6607
WBCE          | 0.8408/0.7376 | 0.7360/0.6585 | 0.7168/0.5938  | 0.9460/0.9115     | 0.7930/0.6588   | 0.7919/0.6597
Focal         | 0.8622/0.7563 | 0.7228/0.6467 | 0.7399/0.6088  | 0.9346/0.8990     | 0.8175/0.6928   | 0.7940/0.6619
Tversky       | 0.8793/0.7713 | 0.7573/0.6782 | 0.7423/0.6149  | 0.9286/0.9040     | 0.8346/0.7182   | 0.7810/0.6513
BCE+Dice      | 0.8479/0.7329 | 0.7250/0.6493 | 0.7321/0.6065  | 0.9472/0.9010     | 0.8087/0.6796   | 0.7950/0.6637
Proposed Loss | 0.8878/0.7732 | 0.7653/0.6854 | 0.7460/0.6179  | 0.9579/0.9261     | 0.8517/0.7415   | 0.8065/0.6795

Values in bold indicate the highest performance score of the proposed loss function among other comparable loss functions

The training and validation curves shown in Figs. 5, 6, and 7 indicate that the proposed loss function represents the validation data better. Ideally the training and validation curves should stay close to each other, and for optimal bias and variance they should nearly coincide; a model whose two curves remain close is robust at test time. A large gap between the curves indicates the model performs very well on training data but not nearly as well on validation data. With the proposed loss, our model generalizes much better across different types of data.
Fig. 5

Accuracy and loss plots (Kidney Dataset) (a) Accuracy plot with proposed loss function (b) Accuracy plot with BCE loss function (c) Loss plot with proposed loss function (d) Loss plot with BCE loss function

Fig. 6

Accuracy and loss plot (TNBC Dataset) (a) Accuracy plot with proposed loss function (b) Accuracy plot with BCE loss function (c) loss plot with proposed loss function (d) Loss plot with BCE loss function

Fig. 7

Accuracy and loss plot (MoNuSeg Dataset) (a) accuracy plot with proposed loss function (b) accuracy plot with BCE loss function (c) loss plot with proposed loss function (d) loss plot with BCE loss function


Results and discussion

We compared our proposed model with five other CNN models that are benchmarks in biomedical image segmentation. We express our simulation results in terms of the F1-score, used by Lal S et al. and Aatresh A A et al. in [2, 21], and the Aggregated Jaccard Index (AJI), used by Naylor P et al. in [25]. The F1-score is the harmonic mean of precision and recall and is the most widely used measure of retrieval quality. AJI is a connected-component-based metric that improves on the pixel-based global Jaccard index; a higher AJI indicates a better segmentation model. Table 3 compares the DSREDN model with five benchmark models on the three histopathological datasets, and Table 4 compares the computational complexity of the proposed DSREDN architecture with the other segmentation architectures. Performance is measured by F1-score, AJI, and the total number of trainable parameters, which reflects training time and complexity. For visualization of results we considered six sample test images (two from each histopathological dataset), presented in Figs. 8, 9, and 10.
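A pixel-level F1 computation of the kind described can be sketched as follows (the object-level AJI additionally requires matching predicted and ground-truth connected components, and is omitted here for brevity):

```python
import numpy as np

def f1_score_binary(y_true, y_pred, eps=1e-7):
    # Pixel-level F1: harmonic mean of precision and recall on binary masks.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2.0 * precision * recall / (precision + recall + eps)
```

Identical masks give an F1 of 1; missing half the foreground with no false positives gives precision 1, recall 0.5, and F1 = 2/3.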
Table 3

Performance comparison of different models with three datasets

Model                 | Kidney F1 | Kidney AJI | TNBC F1 | TNBC AJI | MoNuSeg F1 | MoNuSeg AJI
Unet [26] (2015)      | 0.8537    | 0.7489     | 0.7324  | 0.6559   | 0.7278     | 0.6029
SegNet [3] (2017)     | 0.8972    | 0.8304     | 0.7685  | 0.6434   | 0.7879     | 0.6514
Dist [25] (2019)      | 0.8992    | 0.8272     | 0.7516  | 0.6727   | 0.7795     | 0.6467
Att. UNet [31] (2019) | 0.9135    | 0.8590     | 0.7216  | 0.6194   | 0.7735     | 0.6315
HMEDN [39] (2020)     | 0.9262    | 0.8676     | 0.8124  | 0.7017   | 0.7865     | 0.6522
Proposed Model        | 0.9579    | 0.9261     | 0.8517  | 0.7415   | 0.8065     | 0.6795

Values in bold indicate the highest performance score of the proposed model among other comparable models

Table 4

Computational complexity comparison of different models

Model                 | Parameters (millions) | FLOPs (millions)
Unet [26] (2015)      | 31.3                  | 62.7
SegNet [3] (2017)     | 18.8                  | 47
Dist [25] (2019)      | 7.7                   | 15.5
Att. UNet [31] (2019) | 31.9                  | 63.7
HMEDN [39] (2020)     | 0.20                  | 0.45
Proposed Model        | 4.3                   | 8.17

Values in bold indicate the best value among the compared models

Fig. 8

Comparison of predicted nuclear regions of five state-of-the-art models on kidney images

Fig. 9

Comparison of predicted nuclear regions of five state-of-the-art models on TNBC images

Fig. 10

Comparison of predicted nuclear regions of five state-of-the-art models on MoNuSeg images


U-Net prediction

The U-Net architecture applies repeated 3x3 unpadded convolutions (23 convolution layers in total), 2x2 max-pooling for downsampling, and 2x2 upsampling, with each convolution followed by ReLU activation; a final 1x1 convolution is followed by sigmoid activation. U-Net clearly identifies 51 of 57 nuclei in image-1 and 45 of 53 in image-2, so around 10% to 15% of nuclei are not clearly identified: some appear in clustered form and some are not predicted at all. Two nuclei are clustered in image-1 and six in image-2. Four and six additional ducts are predicted in image-1 and image-2 respectively, which is undesirable. The architecture follows a similar prediction pattern in the other four images.
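The effect of an unpadded ("valid") 3x3 convolution, each application of which shrinks the feature map by two pixels per dimension, can be illustrated with a toy NumPy implementation (illustrative, not the paper's code):

```python
import numpy as np

def conv3x3_valid(x, k):
    # One 3x3 unpadded ("valid") convolution: the output map is 2 pixels
    # smaller in each dimension, which is why U-Net crops its skip features.
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out
```

For example, a 5x5 input produces a 3x3 output; after U-Net's repeated convolutions the accumulated shrinkage is why its output segmentation map is smaller than the input tile.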

SegNet prediction

SegNet is an encoder-decoder architecture that is very efficient in memory and time for semantic segmentation of road and indoor scenes. Its decoder upsamples its lower-resolution input using the pooling indices transferred from the corresponding encoder, generating sparse feature maps. The encoder has 13 convolution layers similar to VGG-16, each followed by batch normalization and ReLU activation, with 2x2 max-pooling, and there is a corresponding 13-layer decoder. SegNet's decoder differs slightly in that it uses the max-pooling indices to upsample the features and then convolves them with a trainable filter bank. Visual segmentation by SegNet shows that 80-85% of nuclei are clearly detected, fewer than with U-Net. Overlapped nuclei are fewer than with U-Net, and no undesirable objects are detected by this model. Partially detected nuclei increase compared to U-Net (three in image-1 and two in image-2), and the number of nuclei missed entirely is the highest among the compared models.
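SegNet's index-based unpooling can be illustrated in NumPy with a toy 2x2 pooling on a single 2D map (the real network operates on batched multi-channel tensors):

```python
import numpy as np

def max_pool_with_indices(x):
    # 2x2 max-pool on a single 2D map; also return the flat index of each max.
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            k = int(np.argmax(window))
            pooled[i, j] = window.flat[k]
            indices[i, j] = (2 * i + k // 2) * w + (2 * j + k % 2)
    return pooled, indices

def max_unpool(pooled, indices, shape):
    # Place each pooled value back at its remembered position; the rest stays
    # zero, giving the sparse upsampled map the SegNet decoder convolves over.
    out = np.zeros(shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out
```

Storing only the indices, rather than full encoder feature maps, is what makes SegNet memory-efficient compared to U-Net's concatenated skip connections.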

Attention U-Net prediction

Attention U-Net is a meaningful extension of the standard U-Net that incorporates an additional unit, the attention gate, trained to suppress irrelevant features while highlighting meaningful ones, thereby strengthening the decoder. Through its attention coefficients the architecture enlarges the effective receptive field, which is key to better semantic segmentation, and the attention module can easily be integrated into other segmentation methods. The number of clearly identified nuclei is similar to the SegNet model, but partially detected nuclei are the most numerous here, and almost 5% of nuclei are not predicted by this model.
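The additive attention gate can be sketched as follows; `wx`, `wg`, and `psi` stand in for the 1x1-convolution weights of the real gate and are plain matrices in this simplified version:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, wx, wg, psi):
    # Additive attention gate: skip features x are rescaled by coefficients
    # computed from x and the coarser gating signal g. Because the
    # coefficients lie in (0, 1), irrelevant skip features are suppressed.
    q = np.maximum(x @ wx + g @ wg, 0.0)  # ReLU(W_x x + W_g g)
    alpha = sigmoid(q @ psi)              # attention coefficients in (0, 1)
    return x * alpha
```

Since every coefficient is below 1, the gated output never exceeds the input skip features in magnitude; training learns which regions to keep close to their original scale.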

Dist prediction

Dist addresses the complex boundary-related segmentation problem by formulating a loss function based on intra-nuclear distance. The authors compare their model, DIST, with FCN, FCN+PP, Mask R-CNN, U-Net, and U-Net+PP on the triple-negative breast cancer dataset and on other datasets spanning seven different organs. In this architecture all nuclei are identified either partially or completely, and 82-88% of nuclei are clearly detected. The problem of segmenting clustered nuclei is partly, but not completely, solved. Partially detected nuclei are fewer than with SegNet, U-Net, and Attention U-Net.

HMEDN prediction

HMEDN is an encoder-decoder architecture for segmenting microscopic, MR, and CT images that adds meaningful connections to precisely locate complex boundaries. A dense connection path with dilated convolution blocks, guided by a modified binary cross-entropy, accurately detects vanishing boundaries in blurry images. Prediction accuracy is slightly improved over the Dist algorithm, but with that improvement the number of false detections also increases: four additional objects are found in image-1 and two undesirable objects in image-2.

Proposed DSREDN prediction

The strength of our model is that 90% of the nuclei are clearly identified; the rest are either clustered or partially detected. Out of 57 and 53 nuclei in image-1 and image-2, 55 and 48 are clearly detected respectively. Fewer additional, undesirable ducts are detected by the proposed model, and partially detected nuclei are very few in number. The predicted image has a morphology closer to the ground truth. The problem of overlapped nuclei is partly, but not completely, solved.

Clinical significance

Fewer undesirable detections mean fewer false positives, and the proposed model produces the fewest. Nuclei that go undetected are false negatives, which are never desirable, especially in health care. A predicted morphology closer to the ground truth image means those images have high clinical and diagnostic value. For clinical purposes, the proposed DSREDN model outperforms the five most recent state-of-the-art models.

Limitations

Reported results on the different datasets show a lack of generalizability in segmenting nuclear regions from histopathology images. This stems mainly from histopathology slide preparation and the corresponding ground-truth annotation, since the clinical significance of the predicted output depends heavily on the prepared slide and its semantic pixel-wise labelling. Another limitation of this study is that the segmented boundaries are not fine enough and remain sub-optimal for clinical use. The problems of partially detected nuclei, overlapped nuclei, and false positives were found to scale with cell complexity.

Conclusion

This paper proposed a CNN-based architecture, the deep structured residual encoder-decoder network (DSREDN), that addresses two major concerns in automatic nuclei segmentation. The first is identifying nuclei in histopathology images with a widely varying appearance and a large number of artifacts; this is addressed by a powerful two-path encoder-decoder with greater discriminative capability that retrieves relevant and compact textural information. The implemented network effectively leverages both residual learning and the encoder-decoder design by incorporating wide and deep network paths that strengthen the intermediate features. The second major issue, segmenting nuclei with complex or vanishing boundaries, is addressed by an efficient loss function developed through careful experimentation and analysis. We used the most widely preferred performance metrics, F1-score and AJI, in experiments on three publicly available H&E stained histopathological datasets. The obtained quality metrics and predicted nuclear regions of the proposed framework were better than those of the state-of-the-art models. Although the proposed model produced excellent results, the feature space may be enriched further by incorporating a high-performance feature-extraction module, and the method can be generalized to more image modalities. This study performs binary segmentation of histopathology images, so only nuclear regions are segmented; in the future these nuclear regions could be graded into sub-types. Innovative applications of different image modalities were reported by Shoeibi A et al. in [32, 33], where generative adversarial networks (GANs), recurrent neural networks (RNNs), autoencoders (AEs), convolutional neural networks (CNNs), deep neural networks (DNNs), and other hybrid networks were developed for automated detection of COVID-19 and multiple sclerosis. In [18, 34], Khodatars M et al. and Sadeghi D et al. illustrated the applicability of deep learning to the diagnosis of autism spectrum disorder and schizophrenia. These examples highlight how rapidly the field of computer-aided diagnosis is evolving and how many applications remain unexplored.