Lei Li1, Fuping Wu2, Guang Yang3, Lingchao Xu4, Tom Wong5, Raad Mohiaddin3, David Firmin3, Jennifer Keegan3, Xiahai Zhuang6. 1. School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Data Science, Fudan University, Shanghai, China. 2. School of Data Science, Fudan University, Shanghai, China; Dept of Statistics, School of Management, Fudan University, Shanghai, China. 3. National Heart and Lung Institute, Imperial College London, London, UK; Cardiovascular Research Center, Royal Brompton Hospital, London, UK. 4. School of NAOCE, Shanghai Jiao Tong University, Shanghai, China. 5. Cardiovascular Research Center, Royal Brompton Hospital, London, UK. 6. School of Data Science, Fudan University, Shanghai, China; Fudan-Xinzailing Joint Research Center for Big Data, Fudan University, Shanghai, China. Electronic address: zxh@fudan.edu.cn.
Abstract
Late gadolinium enhancement magnetic resonance imaging (LGE MRI) appears to be a promising alternative for scar assessment in patients with atrial fibrillation (AF). Automating the quantification and analysis of atrial scars can be challenging due to the low image quality. In this work, we propose a fully automated method based on the graph-cuts framework, where the potentials of the graph are learned on a surface mesh of the left atrium (LA) using a multi-scale convolutional neural network (MS-CNN). For validation, we have included fifty-eight images with manual delineations. MS-CNN, which can efficiently incorporate both the local and global texture information of the images, has been shown to evidently improve the segmentation accuracy of the proposed graph-cuts based method. The segmentation could be further improved when the contribution between the t-link and n-link weights of the graph is balanced. The proposed method achieves a mean accuracy of 0.856 ± 0.033 and mean Dice score of 0.702 ± 0.071 for LA scar quantification. Compared to the conventional methods, which are based on the manual delineation of LA for initialization, our method is fully automatic and has demonstrated significantly better Dice score and accuracy (p < 0.01). The method is promising and can be potentially useful in diagnosis and prognosis of AF.
Late gadolinium enhancement magnetic resonance imaging (LGE MRI) appears to be a promising alternative for scar assessment in patients with atrial fibrillation (AF). Automating the quantification and analysis of atrial scars can be challenging due to the low image quality. In this work, we propose a fully automated method based on the graph-cuts framework, where the potentials of the graph are learned on a surface mesh of the left atrium (LA) using a multi-scale convolutional neural network (MS-CNN). For validation, we have included fifty-eight images with manual delineations. MS-CNN, which can efficiently incorporate both the local and global texture information of the images, has been shown to evidently improve the segmentation accuracy of the proposed graph-cuts based method. The segmentation could be further improved when the contribution between the t-link and n-link weights of the graph is balanced. The proposed method achieves a mean accuracy of 0.856 ± 0.033 and mean Dice score of 0.702 ± 0.071 for LA scar quantification. Compared to the conventional methods, which are based on the manual delineation of LA for initialization, our method is fully automatic and has demonstrated significantly better Dice score and accuracy (p < 0.01). The method is promising and can be potentially useful in diagnosis and prognosis of AF.
Atrial fibrillation (AF) is the most common arrhythmia observed in clinical practice, occurring in up to 1% of the population and rising fast with advancing age (Chugh et al., 2013). Radiofrequency catheter ablation using the pulmonary vein (PV) isolation technique has emerged as one of the most common methods for the treatment of AFpatients (Wilber, Pappone, Neuzil, De Paola, Marchlinski, Natale, Macle, Daoud, Calkins, Hall, et al., 2010, Calkins, Kuck, Cappato, Camm, Chen, Crijns, Jr, Davies, Dimarco, Edgerton, 2012). Quantification of atrial scars is potentially beneficial in selecting candidates and guiding ablation treatment. Late gadolinium enhancement magnetic resonance imaging (LGE MRI) is a promising technique to visualize and quantify the atrial scars (Vergara and Marrouche, 2011). Many clinical studies mainly focus on the location and extent of scarring areas of the left atrium (LA) myocardium (McGann, Kholmovski, Oakes, Blauer, Daccarett, Segerson, Airey, Akoum, Fish, Badger, et al., 2008, Vergara, Vijayakumar, Kholmovski, Blauer, Guttman, Gloschat, Payne, Vij, Akoum, Daccarett, et al., 2011, Badger, Daccarett, Akoum, Adjei-Poku, Burgon, Haslam, Kalvaitis, Kuppahally, Vergara, McMullen, et al., 2010).Automatic delineation of scars from LGE MRI is still challenging due to various reasons. First, the image quality of LGE MRI could be poor. Second, the prior model of scars is hard to construct on account of the various LA shapes, the thin wall (mean thickness of 1.89 ± 0.48 mm reported by Beinart et al., 2011), the surrounding enhanced regions and the complex patterns of scars in AFpatients. Fig. 1 illustrates and explains the challenges in more details. To the best of our knowledge, little work has been reported in the literature to achieve the fully automatic quantification of LA scars from LGE MRI.
Fig. 1
The challenges of automatic delineation of scars from LGE MRI: (a) two typical LGE MRIs with poor quality; (b) thin atrial walls highlighted using bright white color in the figure; (c) surrounding enhanced regions pointed out by the arrows, where (1) and (2) respectively indicate the enhanced walls of descending and ascending aorta, (3) denotes the enhanced walls of right atrium.
The challenges of automatic delineation of scars from LGE MRI: (a) two typical LGE MRIs with poor quality; (b) thin atrial walls highlighted using bright white color in the figure; (c) surrounding enhanced regions pointed out by the arrows, where (1) and (2) respectively indicate the enhanced walls of descending and ascending aorta, (3) denotes the enhanced walls of right atrium.The most widespread methods for atrial scar segmentation are mainly based on thresholding (Badger, Daccarett, Akoum, Adjei-Poku, Burgon, Haslam, Kalvaitis, Kuppahally, Vergara, McMullen, et al., 2010, Karim, Housden, Balasubramaniam, Chen, Perry, Uddin, Al-Beyatti, Palkhi, Acheampong, Obom, et al., 2013). Pontecorboli et al. (2016) provided an overall review of scar segmentation using various threshold techniques. For these methods, an appropriate threshold value is decisive, but setting this value can be subjective, eventually limiting the applicability and reproducibility. Perry et al. (2012) proposed to use k-means clustering to classify the normal and fibrosis tissue from manually segmented LA walls. Karim et al. (2014) combined the scar intensity priors and Gaussian mixture model (GMM) to construct a cost function for scar segmentation, which was achieved by an optimization using the graph-cuts framework. Yang et al. (2018) employed the super-pixel method and support vector machine (SVM) to segment the atrial scars.Most of the reported methods rely on manual segmentation of the LA or LA walls to provide an accurate initialization. In ISBI 2012 challenge (Karim et al., 2013), manual segmentation of LA was provided. There was large variance in terms of segmentation accuracy, especially for the pre-ablation cases, and the teams using manually delineated LA walls generally obtained much better performance than those using fully automatic approaches in the challenge. Their benchmark study emphasizes the importance of an accurate initialization.For LA segmentation, Ravanelli et al. (2014) proposed a method using threshold for initialization, followed by the 3D fast marching for segmentation. They required manual correction from the clinicians to achieve reliable performance. Tao et al. (2016) combined LGE MRI with another MRI sequence with better anatomical information to segment the LA, and the combined segmentation achieved better results than the method solely using LGE MRI. Xiong et al. (2018) proposed a dual fully convolutional neural network for LA segmentation from LGE MRI with promising results. Later, they organized a LA segmentation challenge in MICCAI 2018 (Zhao and Xiong, 2018). For LA wall segmentation, Veni et al. (2017) proposed an algorithm named ShapeCut, combining a shape-based system and the graph-cuts approach to make a Bayesian dual surface estimation. Ji et al. (2018) applied the advanced two-layer level set with a soft distance constraint for dual surface segmentation of LA and left ventricle (LV) walls. Their method was 2D-based and required a manual initialization of the endocardial boundaries. Karim et al. (2018) provided a benchmark dataset for LA wall segmentation. For CT data, three algorithms were evaluated, including the marker-controlled geodesic active contours, level-set, and blood pool mesh vertex normal traversal method. For MRI data, the level-set, region growing, and watershed algorithms were studied.Recently, deep-learning based algorithms have been successfully applied to the LV myocardial segmentation. Xu et al. (2018) proposed an end-to-end framework, named as OP-RNN, to segment myocardial infarction from cine cardiac MRI without contrast agents. Moccia et al. (2019) employed a fully convolutional neural network to segment scars of LV, assisted by an initialization of manually segmented myocardium. Lau et al. (2018) utilized a chained generative adversarial network, referred to as ScarGAN, to simulate scar tissues on LGE MRI of healthy patients for data augmentation.In summary, in previous studies atrial scar quantification relies on an accurate segmentation of the LA or LA walls for initialization, but automating this segmentation is still an open question. In this work, we propose a fully automatic method for LA scar quantification and analysis, without the requirement of an accurate LA segmentation.Firstly, we propose to perform scar quantification on a surface, onto which the LA endocardium is projected. We neglect the thickness of LA walls, because the clinical studies are generally performed by projecting the scars onto the LA endocardial surface for visualization (Peters, Wylie, Hauser, Kissinger, Botnar, Essebag, Josephson, Manning, 2007, Knowles, Caulfield, Cooklin, Rinaldi, Gill, Bostock, Razavi, Schaeffter, Rhode, 2010, Ravanelli, dal Piaz, Centonze, Casagranda, Marini, Del Greco, Karim, Rhode, Valentini, 2014). In this framework, we represent the surface using a graph, and formulate the classification as an energy minimization problem which can be solved by graph-cuts. We further propose to explicitly learn the edge weights of the graph, i.e., n-link and t-link potentials (Boykov and Jolly, 2001). This is achieved by a convolutional neural network (CNN), which learns features from the images (Krizhevsky et al., 2012). Here, we do not directly compute these weights solely based on the intensity similarity, as the conventional graph-cuts methods do. This is because the enhancement patterns in LGE MRI are complex and can vary greatly across different patients, leading to inconsistent intensity patterns. Also, currently the automatic methods could have a few millimeters under or over segmentation, leading to the estimated endocardial surface being misaligned to the ground truth. The proposed CNN scheme can exploit both image features and spatial context by means of neighborhood information, to provide more accurate estimation of the graph weights. We finally obtain the classification based on the graph-cuts framework, which mitigates the effect of misalignments of the endocardial surface due to automatic LA segmentation errors. Note that the graph-cuts in this work was not embedded into the network for an end-to-end training, which could be computational inefficient (please c.f. Section 4 for details).Furthermore, we propose to employ the multi-scale patch (MSP) strategy (Zhuang and Shen, 2016), and combine it with the CNN for graph potential learning. This is because distinguishing scars can be challenging solely based on local texture information, particularly in the area where the LA wall is surrounded by other enhanced regions, such as the fibrosis of mitral valve, aortic wall and right atrial (RA) wall. The MSP method is developed from the image scale space theory, which can handle information at different levels within a limited window and has been widely applied to the tasks of feature extraction, detection and image matching (Lorensen and Cline, 1987). The MSP strategy can incorporate both the local fine texture features and the global structural information into the CNN architecture. We refer to such MSP-based CNN as multi-scale CNN, i.e. MS-CNN. In addition, the MSPs are extracted with random offsets along the perpendicular direction of the LA endocardial surface, simulating the misalignments between the automatically segmented LA surface and the ground truth. Therefore, such patches not only can model the multi-scale texture patterns of the images, but also can further improve the robustness of the proposed method against the LA segmentation errors.The remainder of the paper is organized as follows: the detailed framework of the proposed algorithm is presented in Section 2. Section 3 presents the experiments and results. Discussion and conclusion are given in Section 4.
Method
Fig. 2 provides an overview of the proposed framework. First, we use a well-developed multi-atlas whole heart segmentation (MA-WHS) to obtain an initial segmentation of the LA (see Section 2.1). Then, we project the LA endocardium to generate a surface mesh, where the quantification is performed (see Section 2.2). The labeling of scars is achieved by optimizing a cost function based on the graph-cuts framework (see Section 2.3), whose potentials for edge weights are explicitly learned by the proposed MS-CNN (see Section 2.4). Note that the graph-cuts based classification is performed on the surface mesh. This can avoid the challenging segmentation of thin LA wall and can also greatly reduce the computational cost. At the same time, both the texture and anatomical features of the LA myocardium can be adequately extracted by employing the MSP strategy. Thus, the features of the nodes in the graph are represented by a set of MSPs, and the potentials are learned and predicted by the MS-CNN.
Fig. 2
Flowchart of the proposed framework for LA scar quantification and analysis.
Flowchart of the proposed framework for LA scar quantification and analysis.
Initialization of atrial endocardium and pulmonary veins
We use MA-WHS, which is based on multi-atlas segmentation (MAS), to obtain the geometrical information of the LA. This is because the LGE MRI covers the whole heart, and MA-WHS has been well developed and applied in recent years (Zhuang, Shen, 2016, Yang, Zhuang, Khan, Haldar, Nyktari, Li, Wage, Ye, Slabaugh, Mohiaddin, et al., 2018). MAS algorithm segments an unknown target image by propagating and fusing the labels from multiple annotated atlases using registration. As the LGE MRI could have relatively poor image quality, we first apply MA-WHS on the anatomical MRI (Ana-MRI), and then propagate the segmentation using affine registration from the Ana-MRI to LGE MRI. The Ana-MRI is normally acquired in the same MRI examination as LGE MRI, using the b-SSFP sequence, which generates higher quality images for atlas-based segmentation.Having finished the WHS for LA and PV delineation, the marching cubes algorithm (Lorensen and Cline, 1987) is then used to obtain a surface mesh of the LA endocardium which excludes the mitral valve. Note that the LA segmentation is generally reliable, but still contains errors leading to misalignments between the extracted surface mesh and the ground truth. For example, the mean Dice score of our MA-WHS for LA is 0.898 ± 0.044 (please c.f. Section 3.3 for details). However, the effect of inaccurate LA segmentation can be minimized by using the projection strategy and the MS-CNN learning coupled with the randomly shifted MSP sampling strategy. The reader is referred to Fig. 3 for illustration and following methodology sections for details.
Fig. 3
Pipeline of the projection and patch extraction phases: (a) target LGE MRI; (b) WHS result of LGE MRI propagated from Ana-MRI; (c) extracted LA and PV from WHS; (d) LA endocardial surface after projection; (e) patch along the normal direction of the LA endocardial surface; (f) patches with random shift in the training phase.
Pipeline of the projection and patch extraction phases: (a) target LGE MRI; (b) WHS result of LGE MRI propagated from Ana-MRI; (c) extracted LA and PV from WHS; (d) LA endocardial surface after projection; (e) patch along the normal direction of the LA endocardial surface; (f) patches with random shift in the training phase.
Projection of the atrial endocardium
We project the LA endocardium onto a surface mesh, and then the atrial scars can be classified on a graphical surface. This is because the clinical demands for scar quantification in AFpatients mainly concern the location and extent of scarring areas (Ravanelli et al., 2014). Williams et al. (2017) proposed a method to simultaneously represent multiple parameters on a surface model based on the template of an average LA mesh. By projection, both the errors due to LA wall thickness and misregistration of the WHS can be mitigated. At the same time, the computational complexity of the algorithm can be reduced dramatically.The endocardial surface is generated from the volumetric binary segmentation result of the LA cavity using the marching cubes algorithm (Lorensen and Cline, 1987). The resolution of the surface mesh is denser than the resolution of the image, which protects the small scars. The projection from the LA geometry to the surface mesh can preserve the geodesic distances between two nodes. This equidistant projection is required due to the definition of n-link weights in the proposed graph-cuts framework. In this formulation, each vertex on the surface, i.e., node of the graph, should include a profile that represents the texture information of the corresponding location in the LGE MRI. Here, we represent this profile using MSPs, which can incorporate both global structural features and local texture information.
Graph formulation for scar segmentation
Classification and quantification of scars on the LA surface can be formulated as an energy minimization problem solved via graph-cuts. The weights of the graph come from two parts, i.e., the regional term E and the boundary term E (Boykov and Jolly, 2001). The regional term encodes the intensity distributions of different classes, and the boundary term maintains the continuity between neighbors.Let denotes a graph, where indicates the set of graph nodes, and is the set of edges. Here, the weights of edges connecting graph nodes to the terminals are known as t-link weight, and the weights of edges connecting neighboring nodes are referred to as n-link weight (Boykov and Jolly, 2001). The two terminals respectively denote the scars and normal myocardium in our problem, analogous to the foreground and background of the general image segmentation task. Let be the label assigned to x, and be the label vector that defines a segmentation. The segmentation energy is defined as follows,where and are respectively the t-link and n-link weight, and λ is a balancing parameter.In conventional graph-based segmentation, the regional term is generally obtained by optimizing based on a manual defined initial model. For example, Boykov and Jolly (2001) manually selected a number of seed points to construct such model, referred to as graph cuts method, and Rother et al. (2004) manually defined a bounding box for interactive segmentation, known as GrabCut approach. The boundary term in these works was normally defined according to the dissimilarity of intensity and distance between two connected nodes. Veni et al. (2017) designed a regional term based on a generative image model incorporating both local and global shape priors. The boundary term was defined for regularizing the smoothness of the estimated surface, i.e., minimizing the squared difference of the offsets between neighboring vertices. Lu et al. (2017) estimated a regional term combining three maps, including a probability map, a thresholding map and a local appearance map. The boundary term they defined was related to the intensity difference and distance of two connected nodes.In this work, we propose to directly learn and predict the t/n-link potentials for the regional and boundary terms. This is different from the conventional means, where the profile of a graph node is commonly represented by the intensity of a single pixel or its local texture, which consists of limited information. In contrast, we combine the profile representation of graph nodes with the MSP strategy, and learn the potentials using the proposed MS-CNN. Fig. 4 illustrates the flowchart of constructing the graph.
Fig. 4
Construction of the graph and the explicit learning of the graph potentials by MS-CNN (MSPs are integrated and represented using red cuboid in this work). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Construction of the graph and the explicit learning of the graph potentials by MS-CNN (MSPs are integrated and represented using red cuboid in this work). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Explicit learning of graph potentials using MS-CNN
Fig. 4 illustrates the computation of graph potentials for the graph-cuts based classification of LA scars.
Multi-scale patch and patch extraction
We propose to extract MSPs from LGE MRI to represent the profile of the graph node, and to feed the MS-CNN for training and prediction. MSP can represent different levels of structural information at a location in an image, with low scale capturing local fine details and high scale providing global structural information of the image (Zhuang and Shen, 2016).Each graph node x has its associated MSPs, denoted as where N indicates the number of scales. These patches are extracted from the corresponding volumetric region in the LGE MRI, by back projecting the node to the position in the image. They are elongate-shaped and are defined along the normal direction of the LA endocardial surface, as Fig. 3(e) shows, and their local orientations are maximally aligned to the common world coordinate system of the LGE MRI. The multi-scale strategy is implemented by adjusting the sample spacing to generate patches with different scales, corresponding to different resolutions of the LGE MRI. We employ parallel convolutional pathways for multi-scale processing, to feed the different scale information of images to the neural network simultaneously, as Fig. 5(a) shows.
Fig. 5
The hierarchical architecture of the networks: (a) T-NET: the input is the MSPs of the target node, and the output is the predicted t-link weight; (b) N-NET: the input is the patch pair of two neighbor nodes, and the output is the predicted n-link weight; (c) Patch-NET: the input is a patch, and the output is the patch features. Note that the diagram takes 13 × 13 × 17 patch as an example.
The hierarchical architecture of the networks: (a) T-NET: the input is the MSPs of the target node, and the output is the predicted t-link weight; (b) N-NET: the input is the patch pair of two neighbor nodes, and the output is the predicted n-link weight; (c) Patch-NET: the input is a patch, and the output is the patch features. Note that the diagram takes 13 × 13 × 17 patch as an example.
Multi-scale convolutional neural network
We have two neural networks, i.e., T-NET and N-NET. T-NET learns and predicts the t-link potentials, i.e., the probabilities of a node belonging to scars and normal walls respectively, as Fig. 5(a) shows. N-NET calculates the n-link potential between two connected nodes, as Fig. 5(b) shows. The sub-network for extracting patch features in T-NET and N-NET is referred to as Patch-NET, as Fig. 5(c) shows.For training of the t-link potentials, we define a sample for each node of a graph constructed from LGE MRI. The sample is composed of the MSPs associated to the node x, and its label probability L generated from the ground truth label. As Fig. 5(a) shows, the training data of T-NET can be represented as i.e., N nodes with corresponding labels. Thus, the T-NET can be parameterized by θ as follows,where and is the estimated t-link weight.For training of the n-link, we define a sample for each pair of two neighboring nodes {x}, consisting of three elements, i.e., (1) the pair of the two sets of MSPs associated with the two nodes, i.e. (2) the geodesic distance between them, denoted as d, which is computed using the length of the shortest path along the edges connecting them, and (3) their ground truth label similarity M, defined as .As Fig. 5(b) shows, the training data of N-NET can be represented as . The distance d is viewed as an additional similarity feature, namely the labeling of two nodes can be more similar if they are closer. To this end, we design a sub-network, denoted as to extract high-level and dense features, i.e. . We then obtain a new feature vector from and as follows,Each element of can be considered as a similarity metric in the feature space. Finally, we combine and d, and feed them to another sub-network for computing the label similarity, i.e., the n-link weight. Thus, the N-NET can be parameterized by θ as follows,where is the estimated n-link weight.As Fig. 5(c) shows, in Patch-NET the convolution layers with kernel size 3 × 3 × 3 use 1 pixel stride and 1 pixel width zero padding, and the max pooling layers use a stride equal to the pooling size, i.e., 2. The number of parameters is about 7.27M in the Patch-NET, 21.83M in the T-NET, and 21.82M in the N-NET.
Training and testing strategy
In the training phase, we use weighted sampling to mitigate the problem of class imbalance in the training set, where the number of the nodes belonging to normal myocardium in a subject could be tens or even hundreds times more than that of scars. Hence, instead of extracting the patches of all nodes for training, we first count the total number of the nodes on scars and scar boundaries, and then randomly select the similar number of nodes from the background to deal with the imbalanced training problem. In addition, we add a random shift, along with the normal direction, to the center of the MSPs, to mitigate the effects from the inaccurate delineation of the LA boundaries due to over or under segmentation. This is illustrated in Fig. 3(d)–(f). This shift should be large enough to overcome potential segmentation errors, while at the same time be small enough to avoid being too distant and cannot capture the texture profile of the LA wall. We propose to assign this random value in a given range, i.e. to a node in the training phase, where γ is the shift value, - and + represent being inside and outside of the LA blood cavity, respectively. Note that this random shift strategy is not needed in the testing phase.In the testing phase, one can compute the t-link and n-link potentials of the graph, and the classification of scars on the LA surface can be achieved by embedding these estimated weights into the graph-cuts framework, i.e.,and,Note that the two normalized t-link weights of a node, respectively indicating the potentials to the foreground and background, can also be viewed as the probabilities of this node belonging to scars or normal tissues.
Experiments and results
Data acquisition and experimental setup
We collected fifty-eight post-ablation LGE MRI data from patients with longstanding persistent AF for experiments. Transverse navigator-gated 3D LGE MRI was performed on a 1.5T Siemens Magnetom Avanto scanner (Siemens Medical Systems, Erlangen, Germany), which used an inversion prepared segmented gradient echo sequence (TE/TR 2.2 ms/5.2 ms) 15 min after gadolinium administration. The LGE MRI data were acquired at resolution of (1.4-1.5) × (1.4-1.5) × 4 mm, and reconstructed to (0.7-0.75) × (0.7-0.75) × 2 mm. For each patient, prior to contrast agent administration, coronal navigator-gated 3D b-SSFP (TE/TR 1 ms/2.3 ms) data were acquired, with acquisition resolution of (1.6-1.8) × (1.6-1.8) × 3.2 mm, and reconstructed to (0.8-0.9) × (0.8-0.9) × 1.6 mm. Both LGE MRI and b-SSFP data were acquired during free breathing with respiratory motion control (Keegan et al., 2014).The available data were randomly divided into two sets, one for training (31 images) and the other for testing (27 images). T-NET was trained using stochastic gradient descent optimizer, with following hyper-parameters: momentum = 0.9, batch size = 50, weight decay = number of epochs = 15. The learning rate was initially set to 0.01, and had a stepped decay rate of 0.8 every 1000 iterations. The same configuration was used for N-NET except that its number of epochs was 10.We first evaluated the accuracy of automatic segmentation of LA in Section 3.3. Then, we performed four parameter studies to verify the effects of the parameters and explore their optimal values. In Section 3.4.1, we investigated the influence of different patch sizes to the proposed framework using the single-scale CNN, and then compared the results with that of MS-CNN. In Section 3.4.2, we studied the proposed method with different values of the balancing parameter λ. Section 3.4.3 and 3.4.4 present the studies of random shift and multi-scale learning, respectively. The optimal parameters concluded from these studies were used for the proposed method, in comparisons with other methods, in Section 3.5. Finally, Section 3.6 reports the performance of the proposed method and results of the inter-observer study.
Gold standard and evaluation
All the LGE MRIs were manually segmented by an experienced physicist specialized in cardiac MRI, to label the enhanced atrial scarring regions, which are considered as ground truth in this work. To assess the scar classification results, we generated the ground truth reference by projecting the manually segmented scars onto the LA surface. With regard to different initializations, i.e., the manual (abbreviated as LA) and automatic (abbreviated as LA) delineation of LA, two different ground truths, respectively referred to as GT and GT, were generated for evaluation. In the comparison studies in Section 3.5, the fully automatic methods were evaluated using GT, while the semi-automatic algorithms based on LA were evaluated using GT.For evaluation, we computed the statistical measures, Dice score of scars, referred to as Dice (scar), and the generalized Dice score, denoted as GDice. The statistical measures include accuracy, sensitivity and specificity. GDice is a weighted Dice score by evaluating the segmentation of all labels (Crum, Camara, Hill, 2006, Zhuang, 2013), and is formulated as follows,where and indicate the segmentation results of label k from the automatic method and manual delineation, respectively, and N is the number of labels. All the metrics are computed on the projected LA surface.
Automatic segmentation of LA and correlation analysis
To obtain an initialization of LA for scar segmentation, we developed the MA-WHS method using 30 b-SSFP MRI atlases. The 30 high resolution atlases were constructed from the Left Atrial Segmentation Challenge (STACOM 2013) (Tobon-Gomez et al., 2015). The manual delineation of LA was regarded as the gold standard for this experiment. The MA-WHS results of Ana-MRI were mapped to LGE MRI from the same subject, and then generated the initial LA labels. The average Dice score of this LA segmentation to the manual delineation was 0.898 ± 0.044.To analyze the relation between the LA segmentation error and the scar quantification accuracy by the proposed method, we plotted these two values for each of the 27 test subjects as two dimension scatter points in Fig. 6. One can see that the plot shows little direct relationship between them. We further performed linear regression, Pearson correlation and Spearman’s rank correlation. The R2 and Pearson coefficient were respectively 0.0199 and 0.1412, indicating low linear correlation between Dice (scar) and Dice (LA); and the rank correlation coefficient was 0.0110, meaning hardly monotonic relationship between them either. To conclude, the result illustrates the low correlation between the scar quantification accuracy and the LA segmentation accuracy by the proposed method.
Fig. 6
Scatter point plot for analyzing the correlation between the LA segmentation performance and scar quantification accuracy, both indicated by Dice scores. The Pearson coefficient and Spearman’s rank coefficient are respectively 0.1412 and 0.0110.
Scatter point plot for analyzing the correlation between the LA segmentation performance and scar quantification accuracy, both indicated by Dice scores. The Pearson coefficient and Spearman’s rank coefficient are respectively 0.1412 and 0.0110.
Parameter studies
Study of the patch sizes
We used one-scale MSP, namely only the original image (scale 0) was used and the CNN was a single-scale network, for studying the proposed method with different sizes of patches. The patch sizes ranged from 7 × 7 × 11 to 17 × 17 × 21 voxel, where the voxel size is 1 × 1 × 1 mm. Then, we implemented the three-scale MSP and CNN with patch size 13 × 13 × 17 voxel, for comparisons with the single-scale CNNs. The balancing parameter λ in this study was set to 0.6, and the random shift range was set to half of the patch length.Fig. 7(a) shows that the average Dice score increases dramatically at first with respect to the increased sizes of patches, then starts to converge after the patch size reaching 13 × 13 × 17 voxel. This is reasonable, as the larger size is used, the richer intensity profile is included for feature training and detection. However, the increase of patch size generally requires more complex networks, either more kernels or more convolutional layers, which increases computation load and memory requirements. This also rationalizes our proposal to use MSP and MS-CNN. As Fig. 7(a) presents, our MS-CNN obviously increases the accuracy of the classification results, thanks to the usage of the MSP strategy which incorporates both local and global information of the images. In the following experiments, we adopted this three-scale setting (except for Section 3.4.4) and patch size of 13 × 13 × 17 voxel.
Fig. 7
Dice scores of the proposed method with different parameterizations: (a) performance against different patch sizes (λ=0.6); (b) performance against different values of the balancing parameter λ to weight the t-link and n-link terms in the graph-cuts framework; (c) performance against different random shift ranges ; (d) performance against different numbers of scales N.
Dice scores of the proposed method with different parameterizations: (a) performance against different patch sizes (λ=0.6); (b) performance against different values of the balancing parameter λ to weight the t-link and n-link terms in the graph-cuts framework; (c) performance against different random shift ranges ; (d) performance against different numbers of scales N.
Study of the balancing parameter
In this study, we compared the results of the proposed scheme using different values for the balancing parameter λ, λ ∈ [0, ∞), to demonstrate the effect of graph-cuts. Here, we set the values ranging from 0 to 2. The patch strategy was as follows, number of scales was three, patch size was 13 × 13 × 17 voxel, and the random shift range was set to a maximum of 8 mm.Fig. 7(b) presents the results. One can see that the best performance in terms of Dice score is obtained when λ is set to 0.4. This indicates that the inter-node relation (n-link) is important, and the weighting between the t-link and n-link terms should be balanced to achieve optimal performance. In the following experiments, λ was set to 0.4 for the proposed method.
Study of the random shift range
To demonstrate the effect of random shift, we compared the performance of the proposed method with different random shift ranges for . Here, we set ranging from 0 to 12 mm. The patch size was 13 × 13 × 17 voxel, and λ was set to 0.4.Fig. 7(c) provides the results of this study. The best Dice score is obtained when is set to 8 mm, i.e., half of the patch length in the long-axis direction, and the performance of the proposed method deteriorates when the shift range becomes larger than 8 mm. This is rational, because the shift range should cover all the potential misalignments of the constructed surface to the ground truth. When the random shift range is greater than 8 mm, the patch may not cover the regions which include the important features for training and classification.
Study of the scales
To study the effect of multi-scale learning, we compared the results using different numbers of scales, i.e. . The patch size of MSP was set to 13 × 13 × 17 voxel, λ was set to 0.4, and the random shift range was set to a maximum of 8 mm.Fig. 7(d) presents the mean Dice scores of the method. This study demonstrates that the effectiveness of the multi-scale learning. It indicates that the more scales we used the better accuracy we obtained. It should be noted that when we tried to use more scales, the training session failed, due to the limited computation capacity of our computer.
Comparison with other methods
In this study, we implemented nine segmentation approaches, including the proposed method, for comparisons. Here, LA indicates the methods adopt the manual segmentation of LA for initialization, and LA denotes the methods employ the automatic segmentation from the MA-WHS approach described in Section 2.1.Table 1 presents all the quantitative results of the nine methods, and Fig. 8 provides their boxplots of Dice scores of scars. The proposed learning graph-cuts method, i.e. obtained evidently better scar segmentation (Dice of scars) than the conventional methods based on LA. It also performed statistically better than all the other eight methods in terms of Dice scores of scars (p < 0.01). Compared to the conventional methods, performed better than the two threshold-based methods (p < 0.001), and achieved similar results to the two MGMM-based methods (p > 0.1). Note that has a slightly better Dice (scar) than but without statistical significance (), even though the former is based on automatic segmentation of LA and the latter uses manual segmentations. When combined with the random shift strategy, and obtained evidently and statistical better Dice (scar) than the other methods (p < 0.01). For them, is generally better, but the gain is marginal, due to the fact that the graph-cuts is considered as a built-in smoothness constraint to generate less patchy results. In this study, did not obtain the best figures in sensitivity or specificity metrics. Sensitivity measures the proportion of actual scarring regions that are correctly identified, and specificity measures the proportion of actual normal wall regions that are correctly segmented. One can see the misleading of these two metrics in evaluating the performance of a method from Table 1, where and achieved the best specificity or sensitivity, though their performance was actually poor in our visual assessment.
Table 1
Summary of the quantitative evaluation results. GDice denotes the generalized Dice score. Here, the asterisk (*) in column Dice (scar) indicates the methods obtained statistically poorer (p < 0.01) results compared to the proposed . The p value of the Dice (scar) between 0 and 0 is 0.225.
Method
Accuracy
Sensitivity
Specificity
Dice (scar)
GDice
LAM+2SD
0.809 ± 0.074
0.168 ± 0.067
0.994 ± 0.005
0.275 ± 0.091*
0.758 ± 0.098
LAM+Otsu
0.763 ± 0.188
0.346 ± 0.214
0.880 ± 0.289
0.396 ± 0.090*
0.726 ± 0.207
LAM+MGMM
0.708 ± 0.160
0.781 ± 0.127
0.690 ± 0.236
0.545 ± 0.101*
0.716 ± 0.190
LAM+MGMM+GC
0.716 ± 0.162
0.799 ± 0.124
0.694 ± 0.240
0.562 ± 0.102*
0.721 ± 0.192
LAM+U-Net
0.832 ± 0.046
0.540 ± 0.149
0.920 ± 0.035
0.568 ± 0.083*
0.826 ± 0.052
LAM+MS-CNN0
0.798 ± 0.051
0.775 ± 0.099
0.805 ± 0.078
0.615 ± 0.083*
0.811 ± 0.047
LAauto+MS-CNN0
0.806 ± 0.052
0.743 ± 0.126
0.824 ± 0.088
0.631 ± 0.080*
0.814 ± 0.047
LAauto+MS-CNN
0.846 ± 0.032
0.786 ± 0.118
0.886 ± 0.057
0.692 ± 0.069*
0.851 ± 0.030
LAauto+LearnGC
0.856 ± 0.033
0.773 ± 0.132
0.883 ± 0.058
0.702 ± 0.071
0.859 ± 0.031
Fig. 8
Boxplots of the Dice scores of scar segmentation by the nine methods.
: This is one of the most widespread thresholding methods to detect atrial scars. It calculates a specific number of standard deviation (SD) above a reference value. The reference value is generally set to the mean intensity from the blood pool or LA wall. It is however generally patient-specific and slice-specific, and different numbers of SD have been used (Karim et al., 2013). In our study, we obtained the optimal performance by setting the threshold value to 2 SD above the mean intensity of LA walls. Here, we constructed the LA wall from a manual segmentation of the LA with a morphological dilation, which was also used for the following experiments when the LA wall was needed from LA.: This method uses the Otsu algorithm (Otsu, 1979) for automatic thresholding of the scarring tissues from the LA wall obtained from LA.: This method adopts the multi-component Gaussian mixture model (MGMM) for scar segmentation from the LA wall (Liu et al., 2017). MGMM can deal with the intensity heterogeneity of myocardium caused by the infarcts, and has been proven to be effective in myocardium segmentation.: This method further regularizes the spatial continuity using the graph-cuts framework, based on the result of MGMM. Here, we defined the boundary weight using the intensity difference between neighboring points, and the regional weight was computed from the posterior probability map of scars generated from MGMM.: This method employs a 2D U-Net architecture (Ronneberger et al., 2015) for scar segmentation. The U-Net is trained using a stochastic gradient descent optimizer, with following hyper-parameters: batch size=25, weight decay= number of epochs=100. The learning rate is initially set to 0.01, and has a stepped decay rate of 0.95 every 1000 iterations. The input image is a 2D slice of LGE MRI cropped into 128 × 128 centered on LA.: This learning based method only uses the two t-link weights estimated from T-NET to classify scars. The two weights, i.e. respectively linked to the foreground scar and background normal tissue, are normalized and considered as the posterior probability of the two labels. Here, both training data and test data were initialized using manually segmented LA, so the random shift in the training phase was set to zero, i.e. γ=0.: This method uses the estimated t-link weights from T-NET, similar to to classify scars. However, the LA here was automatically segmented using MA-WHS. For comparisons with here we also set the random shift to zero (γ=0).: Similarly, this method uses the estimated t-link weights from T-NET to classify scars, and the LA was automatically segmented using MA-WHS. However, in the training phase we set the random shift accordingly based on the parameter study in Section 3.4.3.: This is the proposed method in which the LA was initialized by MA-WHS and the weights of the graph were learned and predicted using MS-CNN. Here, the balancing parameter λ was set to 0.4. Noted that when
becomes .Summary of the quantitative evaluation results. GDice denotes the generalized Dice score. Here, the asterisk (*) in column Dice (scar) indicates the methods obtained statistically poorer (p < 0.01) results compared to the proposed . The p value of the Dice (scar) between 0 and 0 is 0.225.Boxplots of the Dice scores of scar segmentation by the nine methods.In addition, we chose a representative case, the median in terms of Dice (scar) from the test set by the proposed . Fig. 9 visualizes the 3D results by the nine methods. One can observe that the 3D visualization agrees well with the quantitative analysis result using Dice (scar). Though the manually segmented scars in LA and LA are projected onto two different reference surfaces, GT and GT visually appear similar when we compare the location and extent of scars. Both the two threshold algorithms, 2SD and Otsu, tended to under estimate (segment) the scars, though Otsu generally performed better. The results of and were acceptable, but the accuracy and automation needed improving. performed reasonably well, but it tended to misclassify the scars with small areas, especially around the PV. The learning-based methods, from
and to improved the performance when the new methodologies were introduced. Particularly, further reduced the noise and patchy segmentation results, and it obtained full automation and best Dice score of scar quantification.
Fig. 9
3D visualization of the LA scar classification results using the nine methods. This is the median case selected from the test set in terms of Dice score of scars by the proposed method. The scarring areas are red-colored on the LA surface mesh, which can be constructed either from LA (LA surface in white) or from LA (LA surface in light yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
3D visualization of the LA scar classification results using the nine methods. This is the median case selected from the test set in terms of Dice score of scars by the proposed method. The scarring areas are red-colored on the LA surface mesh, which can be constructed either from LA (LA surface in white) or from LA (LA surface in light yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Performance of the proposed method and inter-observer study
This study analyzes the performance of the proposed method in detail. To provide a reference for the quantitative evaluation metrics, we conducted a study of inter-observation variation from two manual delineations. We randomly selected ten cases from the available data, and asked two experts to manually label the scars separately. For each case, the two labelling results of scars were projected onto the LA surface. The Dice (scar), generalized Dice, and accuracy of inter-observer variation were respectively 0.695 ± 0.049, 0.868 ± 0.027 and 0.867 ± 0.026.Table 1 summarizes the quantitative evaluation results of the proposed method, i.e. . The average Dice of scar is 0.702 ± 0.071, which is comparable to the inter-observer variation (0.695 ± 0.049), and the difference is not significant (p=0.7783). This conclusion also applies when we compare them using accuracy and GDice evaluation metrics. We have repeated this experiment another 3 times, by randomly selecting 31 subjects for training and the remaining 27 for test. The mean Dice scores of the three experiments are respectively 0.703 ± 0.082, 0.695 ± 0.094 and 0.698 ± 0.083, which are generally stable.Fig. 10 provides 2D visualization of the axial view from three examples. These three cases were the first quarter, median and third quarter cases from the test set in terms of Dice (scar) by the proposed method. This illustrates that the method could provide promising performance for localizing and quantifying atrial scars of LA. In the median and third quarter cases, we highlight the errors, particularly due to the enhanced adjacent regions, pointed out by arrow (1), (2) and (3). These mis-classifications, representing the main challenges of this task, contributed to the major errors of scar quantification by the proposed method. Another type of error was caused by the misalignments of the automatic LA segmentation, as arrow (4) pointed out. This happened in some local areas where the errors occurred because of the different shapes of LA after reconstruction from the automatic segmentation. One can also see that even when it existed large LA segmentation errors, indicated by arrow (5) in Fig. 10, the proposed method still could identify the scars at the corresponding location of the projected surface. This is mainly attributed to the effective training of the MS-CNN, which assigns random shifts along the perpendicular direction of the surface when extracting the training patches. The multi-scale learning also contributes to the less demanding of accuracy from the automatic LA segmentation, thus enables to achieve fully automated LA scar quantification.
Fig. 10
Axial view of the images, the ground truth scar segmentation and the results by the proposed method. The red and white color labels represent the scar and normal wall, respectively. Arrow (1), (2) and (3) indicate the major classification errors of the proposed method caused by the surrounding enhanced regions, respectively from the right atrium wall, ascending aorta wall and descending aorta wall; arrow (4) shows an error from the misalignment between the automatic LA segmentation and the ground truth; arrow (5) illustrates that the proposed method can still perform well, even though the automatic LA segmentation contains obvious errors. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Axial view of the images, the ground truth scar segmentation and the results by the proposed method. The red and white color labels represent the scar and normal wall, respectively. Arrow (1), (2) and (3) indicate the major classification errors of the proposed method caused by the surrounding enhanced regions, respectively from the right atrium wall, ascending aorta wall and descending aorta wall; arrow (4) shows an error from the misalignment between the automatic LA segmentation and the ground truth; arrow (5) illustrates that the proposed method can still perform well, even though the automatic LA segmentation contains obvious errors. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Computational time and computer systems
The pre-processing includes LA segmentation and patch extraction. The LA segmentation, based on a multi-atlas segmentation scheme, took about 18.22 min using a Lenovo D30 Workstation with 32 cores to parallel run the image registration (Zhuang and Shen, 2016). The patch extraction took approximately 1.2 min for one training image and 4.1 min for one test image. Note that the random sample strategy in training phase reduced the time for patch extraction. The inference of the T-NET and N-NET, with a patch size of 13 × 13 × 17 pixel, required about 47 seconds to process one test image, and the average computation time for the graph-cut algorithm was about 12 seconds.
Discussion and conclusion
In this work, we have proposed a fully automatic framework for segmentation and quantification of LA scars. Two major methodological contributions have been introduced. One is the formulation of quantifying the LA scarring based on a surface mesh. The classification and quantification are achieved via the surface projection and graph-cuts framework. The other is the adoption of the multi-scale learning combined with CNN, i.e. MS-CNN. The multi-scale learning is implemented using the MSP strategy, which extracts the features from both the local and global intensity profiles of LGE MRI. The MS-CNN learns both the label probability of each node and the relations between connected nodes in the graph. The surface projection in the proposed framework avoids the difficulty of providing an accurate and demanding LA wall segmentation, and the multi-scale patch-based learning, with the random shift training strategy, further mitigates the effect of less accurate LA initialization from a fully automatic approach, as demonstrated in Section 3.4.3. We employed fifty-eight images with manual delineation for experiments. The proposed method performs better when the size of extracted patches increases, but the performance converges when the size is larger than a certain value (see Section 3.4.1). The multi-scaling learning further improves the performance compared to the method with single-scale learning, as demonstrated in Section 3.4.4. Finally, the proposed learning graph-cuts based method demonstrates evidently better performance compared to the conventional approaches, and the mean accuracy and Dice (scar) for quantifying LA scars are respectively 0.856 and 0.702, which are comparable to those of inter-observer variation (accuracy=0.867, Dice =0.695).Table 2 summarizes the related works from literature. Perry et al. (2012) evaluated their method on a dataset consisting of 34 images. The mean Dice score was 0.807 ± 0.106, and the inter-observation Dice was 0.786 ± 0.072. Their method required an accurate initialization of LA walls from manual segmentation, followed by a k-mean classification. Karim et al. (2014) employed GMM to model the enhancement of scar region, and used the graph-cuts method to consider neighbouring regions. This method used LA segmentation for initialization, which was achieved from a semi-automatic method with manual correction. They evaluated the method using numerical phantoms as well as using 15 in vivo images. They obtained more than 0.8 Dice scores on the two datasets. Ravanelli et al. (2014) adopted a threshold based approach, where the normalized voxel intensity (NVI) of LA walls was applied. The threshold value, was assigned according to previous studies and visual validation by experts, base on which they used a 2-D skeletonization algorithm to quantify the atrial fibrosis. The authors evaluated both the fully automatic method and the semi-automatic approach with manual correction. The mean Dice scores of LA scar quantification increased from 0.60 ± 0.21 to 0.85 ± 0.07 when the manual correction was included. Wu et al. (2018) proposed a fully automatic method for LA fibrosis quantification. They formulated the joint distribution of images based on the multivariate mixture model, and optimized model parameters using the iterated conditional mode algorithm. They tested the method on 36 cases and reported a mean Dice score of 0.556 ± 0.187 and average accuracy of 0.809 ± 0.150. Chen et al. (2018a) developed a multi-view two-task recursive attention model for simultaneous segmentation of LA and scars. The mean Dice score of LA segmentation was 0.908 ± 0.031, which was similar to the result (Dice=0.898 ± 0.044) from our study, though their average Dice score of scar quantification was 0.776 ± 0.146. Yang et al. (2018) employed the super-pixel algorithm and SVM to segment the scars on 37 subjects. They obtained 0.790 ± 0.050 Dice score, 0.87 segmentation accuracy, 0.89 sensitivity and 0.79 specificity by using the leave-one-out cross-validation strategy. This study yielded better Dice score than ours in this work, but there was no evident difference in terms of the accuracy, sensitivity and specificity between these two works. It should be noted that among these six works, only one, i.e., Perry et al. (2012), reported the details of inter-observer variation. Also note that it can be difficult to pursue an objective cross-study comparison due to the difference of datasets, initialization methods, and evaluation metrics.
Table 2
Overview of previous methods for scar quantification and segmentation in LA. Abbreviations: segmentation (seg); inter-observer variation in terms of Dice (Inter-ob); Society of Photo-Optical Instrumentation Engineers (SPIE), IEEE Journal of Translational Engineering in Health and Medicine (TEHM), IEEE transactions on medical imaging (TMI), Medical physics (MP), Medical Image Computing and Computer-Assisted Intervention (MICCAI).
Work
No. subjects
LA (wall) seg
Scar seg method
Result (Dice)
Inter-ob
Perry et al. (2012), SPIE
34
manual
k-means
0.807 ± 0.106
0.786 ± 0.072
Karim et al. (2014), TEHM
15
semi-auto
GMM + Graph-cuts
>0.8
N/A
Ravanelli et al. (2014), TMI
10
semi-auto
NVI + Manual correction
0.850 ± 0.070
N/A
10
auto
NVI
0.600 ± 0.210
N/A
Wu et al. (2018), MICCAI
36
auto
Multivariate mixture model
0.556 ± 0.187
N/A
Chen et al. (2018a), MICCAI
100
auto
Dilated attention network
0.776 ± 0.146
N/A
Yang et al. (2018), MP
37
auto
Super-pixels + SVM
0.790 ± 0.050
N/A
Overview of previous methods for scar quantification and segmentation in LA. Abbreviations: segmentation (seg); inter-observer variation in terms of Dice (Inter-ob); Society of Photo-Optical Instrumentation Engineers (SPIE), IEEE Journal of Translational Engineering in Health and Medicine (TEHM), IEEE transactions on medical imaging (TMI), Medical physics (MP), Medical Image Computing and Computer-Assisted Intervention (MICCAI).One of the challenges of LA scar quantification is to distinguish artifacts from the boundary regions, such as from the RA wall and aorta wall, as we discussed above and showed in Figs. 1 and 10. Conventionally, providing accurate LA walls is the crucial step (Karim, Housden, Balasubramaniam, Chen, Perry, Uddin, Al-Beyatti, Palkhi, Acheampong, Obom, et al., 2013, Perry, Morris, Burgon, McGann, MacLeod, Cates, 2012). In this work, we propose to use multi-scale deep learning technology, with specifically designed training strategy, to tackle this challenge. However, due to the limited training data, the errors caused by this problem could still happen. Secondly, the quantification of scars in our work is performed on the surface mesh projected from the LA endocardium. Karim et al. (2018) discussed the importance of wall thickness, particularly considering the potential that the ectopic activity can prevail in scars that are non-transmural. However, they also emphasized that the relationship between the AF and the changes in wall thickness was not clear, and the thickness was difficult to measure based on current MRI data. In clinical practice, the location and extent of scarring areas are considered to have greater clinical significance, which is however arduous to represent and to perform quantitative cross-subject comparisons. In the future work, visual assessment will be considered. Besides, a limitation of this work is that the gold standard was constructed from the manual segmentation of only one physicist. In the future, we can combine the delineations from multiple experts to obtain an average and consensus gold standard.A major limitation of this work is the lack of an end-to-end training scheme. Specifically, the framework is split into two sub-tasks, i.e., the MS-CNN and graph-cut, resulting in offline post-processing. Conditional random field (CRF), a probabilistic graphical model, has been broadly used in semantic segmentation to remove isolated false positives and to improve the localization of object boundaries. Chen et al. (2018b) employed a fully connected CRF as a post-processing step of deep CNN to capture fine edge details. Kamnitsas et al. (2017) used a 3D fully connected CRF network for post-processing to refine the output of a segmentation network. Zheng et al. (2015) achieve an end-to-end CRF network, known as CRFasRNN, by formulating the inference of CRF as recurrent neural networks. Additionally, the graph convolution network (Kipf and Welling, 2016) and PointNet (Qi et al., 2017) can cooperate with the neighbor information of nodes in an end-to-end style. In these studies, only the low dimensional features in the corresponding positions of the nodes are used. By contrast, the features associated with the nodes in the proposed LearnGC framework come from the image patches. The dimension of the patches could be thousands, e.g. the typical size of the patch is 13 × 13 × 17, whose dimension is 2873. Therefore, an end-to-end training scheme of our method, with integral optimization based on the whole graph, could be infeasible in practice, due to its expensive time and space complexity. A detailed discussion of this issue can be found in the supplementary material document associated with this manuscript, and an efficient end-to-end network is considered as our future work.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Catalina Tobon-Gomez; Arjan J Geers; Jochen Peters; Jurgen Weese; Karen Pinto; Rashed Karim; Mohammed Ammar; Abdelaziz Daoudi; Jan Margeta; Zulma Sandoval; Birgit Stender; Maria A Zuluaga; Julian Betancur; Nicholas Ayache; Mohammed Amine Chikh; Jean-Louis Dillenseger; B Michael Kelm; Said Mahmoudi; Sebastien Ourselin; Alexander Schlaefer; Tobias Schaeffter; Reza Razavi; Kawal S Rhode Journal: IEEE Trans Med Imaging Date: 2015-02-03 Impact factor: 10.048
Authors: Hugh Calkins; Karl Heinz Kuck; Riccardo Cappato; Josep Brugada; A John Camm; Shih-Ann Chen; Harry J G Crijns; Ralph J Damiano; D Wyn Davies; John DiMarco; James Edgerton; Kenneth Ellenbogen; Michael D Ezekowitz; David E Haines; Michel Haissaguerre; Gerhard Hindricks; Yoshito Iesaka; Warren Jackman; José Jalife; Pierre Jais; Jonathan Kalman; David Keane; Young-Hoon Kim; Paulus Kirchhof; George Klein; Hans Kottkamp; Koichiro Kumagai; Bruce D Lindsay; Moussa Mansour; Francis E Marchlinski; Patrick M McCarthy; J Lluis Mont; Fred Morady; Koonlawee Nademanee; Hiroshi Nakagawa; Andrea Natale; Stanley Nattel; Douglas L Packer; Carlo Pappone; Eric Prystowsky; Antonio Raviele; Vivek Reddy; Jeremy N Ruskin; Richard J Shemin; Hsuan-Ming Tsao; David Wilber Journal: Heart Rhythm Date: 2012-03-01 Impact factor: 6.343
Authors: Zhaohan Xiong; Vadim V Fedorov; Xiaohang Fu; Elizabeth Cheng; Rob Macleod; Jichao Zhao Journal: IEEE Trans Med Imaging Date: 2019-02 Impact factor: 10.048
Authors: Daniel Perry; Alan Morris; Nathan Burgon; Christopher McGann; Robert Macleod; Joshua Cates Journal: Proc SPIE Int Soc Opt Eng Date: 2012-02-23
Authors: Sumeet S Chugh; Rasmus Havmoeller; Kumar Narayanan; David Singh; Michiel Rienstra; Emelia J Benjamin; Richard F Gillum; Young-Hoon Kim; John H McAnulty; Zhi-Jie Zheng; Mohammad H Forouzanfar; Mohsen Naghavi; George A Mensah; Majid Ezzati; Christopher J L Murray Journal: Circulation Date: 2013-12-17 Impact factor: 29.690
Authors: David J Wilber; Carlo Pappone; Petr Neuzil; Angelo De Paola; Frank Marchlinski; Andrea Natale; Laurent Macle; Emile G Daoud; Hugh Calkins; Burr Hall; Vivek Reddy; Giuseppe Augello; Matthew R Reynolds; Chandan Vinekar; Christine Y Liu; Scott M Berry; Donald A Berry Journal: JAMA Date: 2010-01-27 Impact factor: 56.272
Authors: Jennifer Keegan; Permi Jhooti; Sonya V Babu-Narayan; Peter Drivas; Sabine Ernst; David N Firmin Journal: Magn Reson Med Date: 2014-03 Impact factor: 4.668
Authors: Giuseppe Muscogiuri; Valentina Volpato; Riccardo Cau; Mattia Chiesa; Luca Saba; Marco Guglielmo; Alberto Senatieri; Gregorio Chierchia; Gianluca Pontone; Serena Dell'Aversana; U Joseph Schoepf; Mason G Andrews; Paolo Basile; Andrea Igoren Guaricci; Paolo Marra; Denisa Muraru; Luigi P Badano; Sandro Sironi Journal: Heliyon Date: 2022-10-05