
Residual Learning: A New Paradigm to Improve Deep Learning-Based Segmentation of the Left Ventricle in Magnetic Resonance Imaging Cardiac Images.

Maral Zarvani¹, Sara Saberi¹, Reza Azmi¹, Seyed Vahab Shojaedini².

Abstract

BACKGROUND: Recently, magnetic resonance imaging (MRI) has become a useful tool for the early detection of heart failure. A vital step of this process is a valid measurement of the left ventricle's properties, which depends heavily on the accurate segmentation of the heart in the captured images. Although various schemes have been tested for this segmentation, the most recently proposed methods use deep learning to estimate the extent of the left ventricle in cardiac MRI images. Deep learning methods can yield better results than their classical alternatives, but the gradient vanishing and exploding problems may hamper their efficiency in accurately segmenting the left ventricle in cardiac MRI images.
METHODS: In this article, a concept called residual learning is used to make deep learning schemes more robust against the gradient vanishing problem. For this purpose, the Residual Network of Residual Network (i.e., Residual of Residual) substructure is used inside the main deep learning architecture (e.g., Unet), which provides better detection indexes.
RESULTS AND CONCLUSION: The proposed method and its alternatives were evaluated on the Sunnybrook Cardiac Data, a reliable dataset for left ventricle segmentation. The results show that the detection parameters improve by at least 5%, 3.5%, 8.1%, and 11.4% compared to the deep alternatives in terms of the Jaccard, Dice, precision, and false-positive rate indexes, respectively. These improvements came with only a negligible reduction (approximately 1%) in the recall parameter. Overall, the proposed method can be used as a suitable tool for more accurate detection of the left ventricle in MRI images.


Keywords: Deep learning; left ventricle; magnetic resonance imaging; residual learning; semantic segmentation

Year: 2021 PMID: 34466395 PMCID: PMC8382035 DOI: 10.4103/jmss.JMSS_38_20

Source DB: PubMed Journal: J Med Signals Sens ISSN: 2228-7477

Introduction

The cardiovascular system is one of the essential systems of the body, through which oxygenated blood is pumped into the arteries. In congestive heart failure, the ventricles may not drain fully, which increases the pressure in the adjacent atria and veins. Since conditions that affect cardiac function, especially that of the left ventricle, are practically treatable and reversible, measuring left ventricular volume is particularly important in cardiovascular monitoring and treatment. Diagnostic devices for evaluating cardiac function are generally divided into invasive and noninvasive types; the noninvasive devices are more common in practice. Angiographic computed tomography (CT) and magnetic resonance imaging (MRI) are the most common noninvasive heart monitoring methods, although angiographic MRI is more attractive owing to its better temporal resolution.[1] Furthermore, MRI angiography is more suitable in atrial fibrillation, where CT imaging is usually avoided because of the irregular and high heart rate and the risk of high radiation doses. For accurate measurement of functional heart behavior in angiographic MRI, the heart should be distinguished from the adjacent organs in the captured chest images. Manual and automatic image segmentation are the two main approaches to this separation. Manual segmentation (i.e., visual analysis) is time-consuming and depends entirely on the expertise of the technician; therefore, automated segmentation has become the preferred approach in recent years. However, the automatic segmentation of angiographic MRI images also suffers from some problems that hamper its performance.
The most important limiting factors are: (i) cardiac MRI images vary significantly in both gray levels and structural shapes, (ii) the gray levels of images may also differ between MRI scans, (iii) blood flow, together with respiratory motion, may cause considerable image blurring, (iv) the shape of the ventricle varies both between patients and over time, and finally, (v) the low contrast of angiographic MRI images may degrade the performance of automated techniques for distinguishing the left ventricle.[2] Several studies have tried to solve these problems and improve the performance of automated left ventricle detection algorithms in angiographic MRI images.[3,4] In general, cardiac MRI segmentation methods can be divided into the following five categories:[5]

Thresholding based models

In the earliest methods, thresholding was combined with several complementary image processing methods to extract the region of the left ventricle.[6] In such schemes, the locations of the endocardial surfaces are first approximated by intensity thresholding. Then, each approximated surface point is replaced with the nearest point of locally maximum gradient magnitude in order to fit a cylinder. Because pixel brightness varies significantly owing to the limiting factors mentioned above, the strong dependence on the threshold is the most significant weakness of these methods.
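The threshold dependence described above can be illustrated with a minimal sketch. It covers only the initial intensity-thresholding step, not the gradient-based surface refinement, and the image values, the threshold of 100, and the function name `threshold_segment` are hypothetical choices made for illustration.

```python
import numpy as np

def threshold_segment(image, threshold):
    """Return a binary mask: 1 where the pixel intensity exceeds the threshold."""
    return (image > threshold).astype(np.uint8)

# Hypothetical 4x4 image with a bright 2x2 region standing in for the blood pool.
image = np.array([
    [10, 12, 11, 10],
    [12, 200, 210, 11],
    [11, 205, 198, 12],
    [10, 11, 12, 10],
], dtype=float)

# With a suitable threshold the bright region is recovered.
mask = threshold_segment(image, threshold=100)

# The same threshold fails after a uniform brightness change
# (for example, a different scanner or acquisition setting).
dim_mask = threshold_segment(image * 0.4, threshold=100)

print(mask.sum(), dim_mask.sum())
```

In this toy case a fixed threshold finds all four bright pixels in the first image and none in the dimmed copy, which is the kind of sensitivity the paragraph above refers to.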

Deformable models

Deformable models are widely used in medical image segmentation, especially for detecting the borders of the heart and segmenting the left and right ventricles. This approach combines geometry, physics, and estimation theory; its basic idea is to assume an initial shape for the object and then minimize an energy function. The initial shape is formulated from prior knowledge such as location, shape, and size. One of the most popular schemes in this group is the active contour algorithm. Unfortunately, these algorithms have several problems, such as sensitivity to artifacts and noise and dependence on the initial information.[7-9]

Learning appearance and shape

These statistical algorithms, which have been widely used for left ventricle detection, use training datasets that include the main image and its variations.[10,11] The primary strategy is to minimize an energy function that takes into account parameters such as edge, texture, and elasticity. The performance depends strongly on the training data, especially the manual training contours, which leads to low generalization power and reduced efficiency in real applications.[12]

Atlas-based schemes

These algorithms and their derivatives, which are capable of 3-D and 4-D heart segmentation,[13,14] receive a manually segmented image as the base image (Atlas), compare it to other images, and then measure their similarities. A mapping between the new image and the Atlas is computed for the segmentation of the new image. Then the transforms on the new image are applied to the Atlas to obtain the final segmentation.[5,15] These algorithms may not be accurate for new images, as they suffer from weak generalization power.

Deep learning

Compared to other methods, deep neural networks have been widely applied to various medical image processing tasks, including heart segmentation, thanks to their superior features.[4] In some studies, Deep Belief Networks have been used as a nonrigid classifier for heart segmentation in medical images.[16] A disadvantage of this approach is the lack of a dynamic model. In other studies, the heart was classified using Convolutional Neural Networks (CNNs).[17] In these schemes, the spatial information lost in local processing may be recovered by applying a fully connected CNN layer. The Unet architecture was also introduced for heart segmentation in some studies.[18] The main difference between Unet and fully convolutional networks is the symmetry of the Unet architecture together with the concatenation operation added in the decoder path, as explained in section 2-1. Subsequently, numerous articles proposed improvements to Unet, mainly focusing on three areas: (i) changing the encoder network to extract more abstract features, (ii) modifying the up-sampling strategy, and (iii) changing the skip connections. For example, Unet++ was introduced in [18]. In Unet, data on the skip connection path pass directly from the encoder to the decoder, whereas in Unet++ they pass through a series of Conv-blocks that transfer the feature maps.[19] Other researchers exploited the advantages of Feature Pyramid Networks, including their flexibility and robustness, to propose the MFP-Unet model for semantic segmentation of the left ventricle.[20] Leclerc et al.[17] performed left ventricle segmentation with a Unet network whose training data were annotated using a Kalman filter.
Furthermore, methods based on detecting the Region of Interest using deep neural networks were proposed in other studies.[21] Finally, in some recent studies, magnetic resonance (MR) sequences of several subjects were investigated using a Deep Multitask Relationship Learning Network, which contains a CNN for feature extraction and two parallel recurrent networks for dynamic temporal modeling of cardiac sequences. In these techniques, Bayesian-based multitask learning was sometimes used for estimating left ventricle indices, together with a softmax classifier for cardiac phase detection.[22] Yan et al.[23] reported that the accuracy of LV spatial segmentation improved as a result of (1) modifying the Unet encoder with additional temporal coherence features in cardiac images and (2) using dilated convolution layers. A more comprehensive classification of deep learning methods for cardiac segmentation is presented in [24], including Recurrent Neural Networks, Generative Adversarial Networks, Auto-Encoders, and 3-D segmentation networks. Zhang et al. considered both temporal and spatial properties using ResNet-based LSTM networks, obtaining a model that is more robust to image inhomogeneity.[25]

Proposed Method

As the main idea of this research was to improve the performance of Unet in heart segmentation by using the concept of residual learning, the Unet and ResNet-Unet structures are described first in this section. The proposed structure, based on Residual of Residual (ROR)-Unet, is then explained.

Unet

The procedure of Unet involves two steps. In the first step, feature maps are extracted using convolution layers, followed by Max Pooling. These maps contain the full context of the input image pixels; therefore, they are saved for use in the next step, as shown in Figure 1. In the second step, the stored feature maps are concatenated with the decoder feature maps obtained by zero-padding, convolution, and up-sampling. This scheme may improve the recovery of spatial resolution at the network output.[26]
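The two steps can be traced on array shapes with a minimal NumPy sketch. It omits the learned convolutions entirely and only shows pooling, up-sampling, and the skip-connection concatenation; the channel count of 8 and the helper names `max_pool_2x2` and `upsample_2x` are assumptions for illustration, not part of the described implementation.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an array of shape (H, W, C); H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour 2x up-sampling on an array of shape (H, W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)

# Step 1 (encoding): the feature maps are saved, then pooled to half resolution.
encoder_features = rng.random((256, 256, 8))
pooled = max_pool_2x2(encoder_features)

# Step 2 (decoding): up-sample back and concatenate with the saved maps
# along the channel axis, as the skip connection does.
decoder_features = upsample_2x(pooled)
merged = np.concatenate([encoder_features, decoder_features], axis=-1)

print(pooled.shape, decoder_features.shape, merged.shape)
```

The concatenated tensor has the spatial size of the saved encoder maps and the sum of both channel counts, which is how the decoder regains the spatial detail that pooling discarded.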

Figure 1

Schematic representation of the Unet architecture

A notable property of this structure is that the skip connections restore spatial features lost during the first step (encoding), which matters because spatial information is crucial in semantic segmentation. Better representation and abstraction in the encoding step may lead to better network performance in the second step (i.e., decoding) and improve the semantic segmentation task. For this purpose, it is possible to use pretrained networks such as VGG[20] as the encoder of the Unet architecture. The ResNet-Unet architecture is discussed in the next section.

ResNet-Unet

Although increasing the depth of neural networks leads to higher classification performance,[20,21,27] it may also cause the gradient vanishing problem during back-propagation. ResNet networks may partially alleviate this problem by performing identity mapping, which skips one or more layers. With this technique, the gradient is reinforced in the deeper layers, and the convergence speed increases somewhat. The composite structure, called ResNet-Unet, is shown in Figure 2 and includes two types of blocks: the identity-block and the convolutional-block. The identity block has no convolution layer in its shortcut, and its output has the same dimension as its input.[28] As shown in Figure 3, both blocks consist of two 3 × 3 convolution layers, followed by ReLU and batch normalization layers.

Figure 2

ResNet-Unet architecture, in which the input image is zero-padded and then passed through a 7 × 7 convolution layer with stride 2 and 64 feature channels, followed by a batch normalization layer and the nonlinear ReLU function. Then, a conv-block and a number of identity blocks are applied in each branch. The internal architecture is illustrated in Figure 3. The decoding part is the same as in Unet

Figure 3

(a) The Conv-Block architecture (b) The architecture of Identity-Block

As demonstrated in Figure 3, four long skip connections transfer copies of the feature maps from the encoding path, which includes identity and convolutional blocks, into the decoding path. Furthermore, an up-sampling layer is used in the decoder, in which the feature maps are transformed into high-resolution images. Finally, the output is concatenated with the feature maps of the corresponding encoding path. Equations (1) and (2) briefly describe the logic of the identity and convolutional blocks, where $f$ represents the nonlinear ReLU function, $F_1$ and $F_2$ denote the identity mapping and residual mapping functions, respectively, and $O_i$ and $O_c$ are the outputs of the identity and convolutional blocks, respectively.

$$O_i(x) = f(F_1(x) + x) \quad (1)$$

$$O_c(x) = f(F_1(x) + F_2(x)) \quad (2)$$
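Equations (1) and (2) can be sketched numerically as follows. This is only an illustration under strong simplifying assumptions: each mapping is replaced by a single matrix product on a feature vector, whereas the blocks described above use stacks of 3 × 3 convolutions with batch normalization. The names `identity_block`, `conv_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    """The nonlinear function f in Equations (1) and (2)."""
    return np.maximum(x, 0.0)

def identity_block(x, w1):
    """Equation (1): O_i(x) = f(F1(x) + x).

    The shortcut carries x unchanged, so w1 must be square
    (the output dimension equals the input dimension).
    """
    return relu(w1 @ x + x)

def conv_block(x, w1, w2):
    """Equation (2): O_c(x) = f(F1(x) + F2(x)).

    The second mapping acts on the shortcut, so the output
    dimension may differ from the input dimension.
    """
    return relu(w1 @ x + w2 @ x)

x = np.array([1.0, -2.0, 3.0])

# With a zero main path, the identity block reduces to f(x).
out_identity = identity_block(x, np.zeros((3, 3)))

# The conv block can change the feature dimension (3 -> 2 here).
rng = np.random.default_rng(0)
out_conv = conv_block(x, rng.normal(size=(2, 3)), rng.normal(size=(2, 3)))

print(out_identity, out_conv.shape)
```

The zero-weight case shows why the shortcut helps: even if the main path contributes nothing, the input still reaches the output, so the signal (and its gradient) is not lost.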

Residual of residual-Unet

Since ResNet-Unet only partially reduces the gradient vanishing problem, this study uses the Residual of Residual (ROR) structure as a type of residual learning to improve the performance of Unet. In the ROR structure, level-wise shortcut connections are added to the network, which may provide information exchange between the branches (each containing a series of identity-blocks and Conv-blocks). This scheme may improve learning ability, thanks to the reduction of the gradient vanishing problem. As shown in Figure 4, the proposed ROR-Unet structure has an encoding path similar to that of ResNet-Unet. This path consists of identity-blocks and Conv-blocks; the difference is that shortcut connections are added level by level over the residual blocks. The number of shortcut levels is a hyper-parameter that can significantly affect the results and should therefore be determined through multiple experiments. The proposed structure uses three shortcut levels, as described below:

Figure 4

The encoder of Residual of Residual-Unet consists of 3 branches. Each branch contains 1 conv-block and 15 identity-blocks, which have the same number of feature channels. Each branch contains a Level-1 shortcut consisting of one 1 × 1 convolution layer. A Level-0 shortcut, consisting of one 1 × 1 convolution layer, maps the features from the start to the end of the branches. After each max pooling layer, the size of the feature maps is halved. Z1, Z2, Z3, and Z4 are the outputs of the branches, which are transported to the decoder layer and concatenated with the output of the transposed convolution layer

Level-0 (so-called root shortcut): This shortcut maps the result of applying a 3 × 3 convolution and zero padding to the input image onto the end of the encoding path. This is done using a 1 × 1 convolution.

Level-1: At this level, the ResNet architecture is divided into three branches based on the total number of identity-blocks and conv-blocks. Each branch consists of one convolutional-block and more than ten identity-blocks. The convolution layers in all three shortcuts have the same structure (e.g., 1 × 1 kernel size with stride one and same padding). The only difference is the number of feature channels, which is 64, 128, and 256 from the beginning to the end, respectively.

Level-2: These shortcuts are the original ones in the ResNet structure and are used in the identity and convolution blocks.

Equations (3-8) show how the result of each of the mentioned branches is calculated. If Z1, Z2, Z3, and Z4 denote the outputs of the branches as shown in equations (3)-(6), then equations (7)-(8) illustrate that they are transported into the decoder layer and concatenated with the output of the transposed convolution layer (i.e., the deconvolution block). The deconvolution block applies a 3 × 3 convolution layer, followed by batch normalization, ReLU, and 2 × 2 up-sampling layers, to its input data. Each up-sampling layer doubles the size of the feature maps and halves the number of feature channels correspondingly.

$$Z_1 = X \quad (3)$$

$$Z_2 = f(O^{16}_{i1}(O_{c1}(X))) + G(X) = f(O^{16}_{i1}(f(F_1(X) + F_2(X)))) + G(X) \quad (4)$$

$$Z_3 = f(O^{16}_{i2}(O_{c2}(Z_2))) + G(Z_2) \quad (5)$$

$$Z_4 = f(O^{16}_{i3}(O_{c3}(Z_3))) + G(Z_3) + G(X) \quad (6)$$

In the above equations, $X$ is the result of applying consecutive zero padding and convolutional layers to the input image. Furthermore, $O_{c1}$ and $O_{i1}$ represent the outputs of the convolutional-block and identity-block in the first branch. $G(\cdot)$ is defined as a residual mapping function for each input, and $f(\cdot)$ denotes the ReLU activation function. Finally, $O^{16}_{i}$ denotes the result obtained from 16 identity blocks. By strengthening the residual learning concept, the proposed ROR method mitigates the gradient vanishing and exploding problems. This makes it possible to deepen the network by adding more branches and layers, which yields more abstract features and ultimately increases segmentation accuracy in comparison with Unet and ResNet-Unet.
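The way the shortcut levels combine in equations (3)-(6) can be sketched with plain callables. This is a toy illustration, not the described network: the branches and shortcut maps are passed in as arbitrary functions, all tensors keep the same shape (no pooling or channel change), and a separate map is used for each occurrence of G, which is an assumption since the equations use a single symbol. The name `ror_encoder` and its parameters are hypothetical.

```python
import numpy as np

def relu(x):
    """The activation function f in equations (3)-(6)."""
    return np.maximum(x, 0.0)

def ror_encoder(x, branches, level1_shortcuts, root_shortcut):
    """Combine three branches with Level-1 and Level-0 shortcuts.

    branches:         three callables, each standing in for a conv-block
                      followed by the chain of identity-blocks
    level1_shortcuts: three callables, the per-branch shortcut maps G
    root_shortcut:    one callable, the Level-0 (root) shortcut map G
    """
    z1 = x                                                  # equation (3)
    z2 = relu(branches[0](z1)) + level1_shortcuts[0](z1)    # equation (4)
    z3 = relu(branches[1](z2)) + level1_shortcuts[1](z2)    # equation (5)
    z4 = (relu(branches[2](z3)) + level1_shortcuts[2](z3)
          + root_shortcut(x))                               # equation (6)
    return z1, z2, z3, z4

# Toy setting: every branch outputs negative values, so the ReLU zeroes it,
# and only the shortcut paths carry information forward.
dead_branch = lambda v: -v
double = lambda v: 2.0 * v
identity = lambda v: v

x = np.ones(3)
z1, z2, z3, z4 = ror_encoder(x, [dead_branch] * 3, [double] * 3, identity)
print(z2, z3, z4)
```

Even with all three branches contributing zero after the activation, the outputs remain nonzero because the Level-1 and Level-0 shortcuts bypass the branches, which is the mechanism the text credits for reducing gradient vanishing.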

Experiments and Results

The proposed algorithm was applied to a set of cardiac MRI images known as the Sunnybrook Cardiac Data[29] to evaluate its performance. This dataset, also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, includes 805 images from 45 cine-MRI studies, all of the same size (256 × 256 pixels). The captured images may be classified into four categories: healthy (N), hypertrophy (HYP), heart failure with infarction (HF-I), and heart failure without infarction (HF-NI).[29] The data set was split into five subsets (i.e., k1 to k5) of equal size (20% each) to allow a 5-fold cross-validation strategy. Table 1 shows the number of samples of each category for k1 to k5, together with the gender, age, and contrast statistics of each category.

Table 1

Specifications of train and test data

Categories	N	HYP	HF-I	HF-NI
Number of samples
k1 Train	113	143	195	193
k1 Test	29	41	50	41
k2 Train	116	144	200	184
k2 Test	26	40	45	50
k3 Train	113	151	191	189
k3 Test	29	33	54	45
k4 Train	103	142	205	194
k4 Test	39	42	40	40
k5 Train	123	156	189	176
k5 Test	19	28	56	58
Number of subjects by gender
Male	6	7	11	8
Female	3	5	1	4
Average age	60	57	61	64
Average contrast (%)	34.46	36.30	35.91	44.62

N: Healthy, HYP: Hypertrophy, HF-I: Heart failure with infarction, HF-NI: Heart failure without infarction
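The fold sizes in Table 1 (644 training and 161 test images per fold) are what an equal five-way split of 805 images gives. The following sketch reproduces only those sizes; the random image-level shuffling, the seed, and the function name `five_fold_indices` are assumptions, since the text does not state how images were assigned to folds.

```python
import numpy as np

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of a k-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)        # assumed: random image-level shuffle
    folds = np.array_split(order, n_folds)    # k nearly equal parts
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

splits = list(five_fold_indices(805))
for k, (train, test) in enumerate(splits, start=1):
    print(f"k{k}: train={len(train)} test={len(test)}")
```

Each image appears in exactly one test fold, so every image is used for evaluation once across the five runs.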

The proposed method was implemented on a testbed prepared using the Keras framework on a computer equipped with a GeForce GTX 1070 Ti with 8 GB RAM. Furthermore, two other methods, Unet[30] and ResNet-Unet,[31,32] were applied to the same data for comparison with the proposed algorithm. However, different parameters were used for each model. The specifications of the best structure found for each model are reported in Table 2.

Table 2

Specifications of the optimal structure for examined models

Model	Unet	ResNet-Unet	ROR-Unet
Learning parameters
Input size	256×256	256×256	256×256
Convolution layers^#	10	57	110
Trainable parameters (million)^#	4.46	16.34	63.46
Drop out	0	0	0
Level of shortcuts	-	1	3
Max epoch	500	250	250
Mini batch size	150	150	150
Optimizer	Adadelta	Adadelta	Adadelta
Initial weights	-	ImageNet	ImageNet
Layers parameters
Feature map order	3->64 ->128 ->256 ->512 ->256 ->128 ->64 ->2	3->64 ->256 ->512 ->1024 ->512 ->256 ->128 ->64 ->2	3->32 ->64 ->128 ->64 ->32 ->2
Activation function	All layers except the last are ReLU; the last layer is softmax	All layers except the last are ReLU; the last layer is softmax	All layers except the last are ReLU; the last layer is softmax
Kernel size of convolution layers	All convolution layers are 3×3	Branch 1: first convolution 7×7, others 3×3; Branch 2: all convolutions 3×3; Branch 3: all convolutions 3×3	Branch 1: convolution 1 in each block 3×3, convolution 2 in each block 3×3; Branch 2: convolution 1 in each block 7×7, convolution 2 in each block 5×5; Branch 3: convolution 1 in each block 9×9, convolution 2 in each block 11×11

ROR: Residual of residual

The images were first processed by experts to obtain a ground truth for comparison with the automatic methods. For better understanding, some results from the proposed and alternative schemes are shown graphically in this section; the complete statistics of the test results are discussed in the next section. Figure 5 shows some original heart MRI images, whereas Figure 6 shows their manual segmentation results (i.e., ground truth). Figure 7 shows the results obtained using the proposed method. The left ventricle regions detected in these images are strikingly similar to the ground truth. Less well matched results, obtained with the Unet and ResNet-Unet algorithms, are shown in Figures 8 and 9, respectively. These show that the proposed and alternative methods may perform differently, especially when the left ventricle has regions with different visual properties or when the contrast between the left ventricle and other parts of the heart is poor. For example, Figure 9a-i shows that the ResNet-Unet method misidentified some additional components and labeled them as the left ventricle. A similar, more pronounced effect was observed in Figure 8a-i, where two or three additional segments were extracted as the left ventricle. These failures occurred in the alternative algorithms, while the proposed method identified the left ventricle without them, as shown in Figure 7.

Figure 5

Some original heart magnetic resonance imaging images

Figure 6

The ground truth belonging to images shown in Figure 5

Figure 7

The results of applying the proposed method on images shown in Figure 5

Figure 8

The results of applying the Unet method on images shown in Figure 5

Figure 9

The results of applying the ResNet-Unet method on images shown in Figure 5

Another instructive example of how the proposed method was superior to its alternatives can be observed by comparing Figures 7c, 8c and 9c. Figure 7c shows that the proposed method extracted a left ventricle similar to that in Figure 6c. However, the alternative methods failed to detect the left ventricle accurately and therefore could not segment it properly. It should be noted that the results obtained by the proposed and alternative methods were not always very different. In some cases, the examined approaches achieved similar results. Examples are shown in Figures 7e, g and 9e, g and, to some extent, in Figure 8e and g, all showing that the examined methods achieved acceptable results in the segmentation of the left ventricle. The examples mentioned above highlight that the examined algorithms delivered different performances for different types of heart images, as fully quantified in the next section.

Discussion

To define the evaluation parameters, let AS and MS denote the results of the automatic and ground truth segmentations, respectively. As mentioned in section 3, both AS and MS are binary images in which the pixels that belong to the left ventricle have label 1 and all others have label 0. To perform a more convincing evaluation, five common metrics were used in this study. These parameters have frequently been applied in semantic segmentation, especially in medical image segmentation.[33] They are:
True positive rate (reported as Recall in Table 3) measures the rate of pixels labeled as left ventricle by both the automatic algorithm and the ground truth.
False positive rate (FPR), also known as Fall-Out, shows the rate of pixels labeled as left ventricle by the automatic algorithm that were not indicated as left ventricle by the ground truth.
Precision measures the ratio of pixels correctly labeled as left ventricle to all pixels predicted as left ventricle by the automatic algorithm.
The Intersection-Over-Union (IoU), also known as the Jaccard Index, and the Dice coefficient[34] are two standard parameters for measuring the ability of a heart segmentation algorithm.[35]
The 5-fold cross-validation strategy was used for the evaluation of all methods. As shown in Table 3, the proposed method outperforms Unet in terms of Jaccard-index, precision, Dice coefficient, and FPR.
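As a reference point, the five metrics can be computed from two binary masks using their textbook confusion-matrix definitions, as sketched below. These are standard formulas and are an assumption here: they are not necessarily the exact normalizations used to produce Table 3, in particular for FPR. The function name `segmentation_metrics` is hypothetical, and the sketch assumes both masks contain foreground and background pixels so that no denominator is zero.

```python
import numpy as np

def segmentation_metrics(auto_seg, manual_seg):
    """Compute standard overlap metrics between binary masks AS and MS."""
    a = np.asarray(auto_seg).astype(bool)     # AS: automatic segmentation
    m = np.asarray(manual_seg).astype(bool)   # MS: ground truth segmentation
    tp = np.sum(a & m)     # left ventricle in both
    fp = np.sum(a & ~m)    # left ventricle only in AS
    fn = np.sum(~a & m)    # left ventricle only in MS
    tn = np.sum(~a & ~m)   # background in both
    return {
        "recall": tp / (tp + fn),             # true positive rate
        "fpr": fp / (fp + tn),                # false positive rate (fall-out)
        "precision": tp / (tp + fp),
        "jaccard": tp / (tp + fp + fn),       # intersection over union
        "dice": 2 * tp / (2 * tp + fp + fn),
    }

# Tiny hypothetical example: one pixel in each confusion-matrix cell.
auto_mask = np.array([[1, 1], [0, 0]])
manual_mask = np.array([[1, 0], [1, 0]])
metrics = segmentation_metrics(auto_mask, manual_mask)
print(metrics)
```

With these definitions the Dice coefficient and the Jaccard index of a single image are related by Dice = 2J / (1 + J), so they rank segmentations of one image identically.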

Table 3

Comparison of the evaluation indexes obtained for examined algorithms

Method	Jaccard-index	Dice	Precision	FPR	Recall
Unet
k1	0.808	0.884	0.840	0.248	0.941
k2	0.799	0.876	0.831	0.260	0.930
k3	0.818	0.894	0.837	0.262	0.958
k4	0.811	0.881	0.875	0.155	0.906
k5	0.793	0.867	0.826	0.246	0.940
Mean	0.806	0.880	0.842	0.234	0.940
ResNet-Unet
k1	0.812	0.889	0.826	0.271	0.964
k2	0.811	0.888	0.842	0.284	0.956
k3	0.820	0.894	0.831	0.268	0.968
k4	0.8261	0.900	0.845	0.244	0.964
k5	0.811	0.885	0.845	0.244	0.951
Mean	0.816	0.891	0.838	0.262	0.960
ROR-Unet
k1	0.864	0.922	0.931	0.083	0.949
k2	0.879	0.936	0.926	0.12	0.948
k3	0.871	0.938	0.928	0.097	0.944
k4	0.857	0.913	0.917	0.187	0.939
k5	0.863	0.921	0.914	0.115	0.945
Mean	0.866	0.926	0.923	0.120	0.945

ROR: Residual of residual, FPR: False-positive rate
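The mean rows of Table 3 can be compared directly by taking differences. The sketch below computes absolute differences in percentage points from the rounded means in the table; whether the percentages quoted in the text were obtained this way, or from unrounded values, is not stated, so the outputs may differ slightly from the quoted figures. The dictionary layout and the function name `percentage_point_gain` are hypothetical.

```python
# Mean values copied from Table 3.
means = {
    "Unet":        {"jaccard": 0.806, "dice": 0.880, "precision": 0.842, "fpr": 0.234, "recall": 0.940},
    "ResNet-Unet": {"jaccard": 0.816, "dice": 0.891, "precision": 0.838, "fpr": 0.262, "recall": 0.960},
    "ROR-Unet":    {"jaccard": 0.866, "dice": 0.926, "precision": 0.923, "fpr": 0.120, "recall": 0.945},
}

def percentage_point_gain(metric, baseline, proposed="ROR-Unet"):
    """Absolute difference (proposed minus baseline) in percentage points."""
    return round(100 * (means[proposed][metric] - means[baseline][metric]), 1)

for metric in ("jaccard", "dice", "precision", "fpr", "recall"):
    gains = {b: percentage_point_gain(metric, b) for b in ("ResNet-Unet", "Unet")}
    print(metric, gains)
```

A negative value for FPR means fewer false detections for ROR-Unet, while a negative value for recall means the baseline detects a slightly larger share of the true left ventricle pixels.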

As shown in Table 3, the proposed structure outperformed its alternatives on most of the examined parameters. The average Jaccard-index obtained with the proposed structure is better than those obtained by ResNet-Unet and Unet by 5.1% and 6.1%, respectively. A similar trend is observed for the Dice parameter, whose average value for the proposed scheme was 3% and 4.1% higher than those of ResNet-Unet and Unet, respectively. The superiority of the proposed structure over its alternatives is more pronounced in terms of the precision and FPR indexes. The table shows that the precision of the proposed method was 8.6% and 8.2% higher than that of its alternatives. In the same manner, the FPR index demonstrates that the ROR-Unet structure produced fewer false detection zones, by 14.3% and 11.5%, than ResNet-Unet and Unet, respectively. Despite these substantial advantages, the result was somewhat different in terms of the Recall parameter. The last column of Table 3 indicates that the best Recall among the three examined models was obtained by the ResNet-Unet structure, whose detection rate was 3.6% better than that of the proposed structure. The proposed method and Unet, by contrast, showed almost the same recall. However, the results analyzed above indicate that the benefit of the proposed method in the other four parameters far outweighs its weakness in the Recall parameter. Consequently, the proposed method can still be considered more successful than its alternatives in detecting the boundary of the left ventricle. Several factors can explain the efficiency of the proposed method over other deep learning-based techniques.
In the decoder of Unet, the feature maps generated by the encoder are up-sampled so that they ultimately produce a high-resolution output image with the same size as the input image. This step aims to determine the accurate location of the object in the input image (also called object localization). It requires semantic coherence between object pixels, that is, high-quality spatial features extracted from the original image. Therefore, higher-quality encoder feature maps result in better segmentation masks. Owing to the low number of convolution layers, the feature maps generated in the encoder of the original Unet and transferred to the decoder are not of high quality; thus, the quality of the resulting pixel-wise segmentation mask is low. The residual network structure makes it possible to add more convolution layers, thanks to the residual shortcuts. These shortcuts reinforce the feature maps in the deeper layers and effectively mitigate the gradient vanishing problem. In the proposed model, adding different levels of shortcuts to the ResNet structure gives better control of the gradient vanishing problem and consequently allows more convolution layers to be added. Furthermore, in the encoder path, these shortcuts strengthen the feature maps of the deeper layers by aggregating the feature maps of the shallow layers.

Conclusion

In this study, a new method was introduced to improve the performance of deep neural networks for detecting the left ventricle in MRI images. We extended the residual network by adding some branches and used it as the Unet encoder in the proposed ROR-Unet structure to overcome some challenges of the deep learning paradigm, mainly gradient vanishing and exploding. To evaluate the performance of the proposed algorithm, real cardiac MRI images were examined in parallel with two existing methods (i.e., Unet and ResNet-Unet). The results were interpreted based on five well-known detection evaluation indexes: the Jaccard, Dice, Precision, FPR, and Recall parameters. The results demonstrated a clear superiority of the proposed method over its closest alternative (between 3.5% and 11% for the first four parameters) in accurately estimating the left ventricle boundaries. The results also showed that these improvements were achieved without a significant drop in the fifth parameter (i.e., Recall). Based on the results achieved in this study, it may be concluded that the proposed method can be used as a suitable alternative tool for determining the left ventricle region in MRI images, which is helpful for the early detection of heart failure.

Financial support and sponsorship

None.

Conflicts of interest

There are no conflicts of interest.
