Xiahai Zhuang, Lei Li, Christian Payer, Darko Štern, Martin Urschler, Mattias P Heinrich, Julien Oster, Chunliang Wang, Örjan Smedby, Cheng Bian, Xin Yang, Pheng-Ann Heng, Aliasghar Mortazi, Ulas Bagci, Guanyu Yang, Chenchen Sun, Gaetan Galisot, Jean-Yves Ramel, Thierry Brouard, Qianqian Tong, Weixin Si, Xiangyun Liao, Guodong Zeng, Zenglin Shi, Guoyan Zheng, Chengjia Wang, Tom MacGillivray, David Newby, Kawal Rhode, Sebastien Ourselin, Raad Mohiaddin, Jennifer Keegan, David Firmin, Guang Yang.
Abstract
Knowledge of whole heart anatomy is a prerequisite for many clinical applications. Whole heart segmentation (WHS), which delineates substructures of the heart, can be very valuable for modeling and analysis of the anatomy and functions of the heart. However, automating this segmentation can be challenging due to the large variation of the heart shape, and different image qualities of the clinical data. To achieve this goal, an initial set of training data is generally needed for constructing priors or for training. Furthermore, it is difficult to perform comparisons between different methods, largely due to differences in the datasets and evaluation metrics used. This manuscript presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, in conjunction with MICCAI 2017. The challenge provided 120 three-dimensional cardiac images covering the whole heart, including 60 CT and 60 MRI volumes, all acquired in clinical environments with manual delineation. Ten algorithms for CT data and eleven algorithms for MRI data, submitted from twelve groups, have been evaluated. The results showed that the performance of CT WHS was generally better than that of MRI WHS. The segmentation of the substructures for different categories of patients could present different levels of challenge due to the difference in imaging and variations of heart shapes. The deep learning (DL)-based methods demonstrated great potential, though several of them reported poor results in the blinded evaluation. Their performance could vary greatly across different network structures and training strategies. The conventional algorithms, mainly based on multi-atlas segmentation, demonstrated good performance, though the accuracy and computational efficiency could be limited. 
The challenge, including provision of the annotated training data and the blinded evaluation for submitted algorithms on the test data, continues as an ongoing benchmarking resource via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/).
Keywords: Benchmark; Challenge; Multi-modality; Whole Heart Segmentation
Year: 2019 PMID: 31446280 PMCID: PMC6839613 DOI: 10.1016/j.media.2019.101537
Source DB: PubMed Journal: Med Image Anal ISSN: 1361-8415 Impact factor: 8.545
Fig. 1. Examples of cardiac images and WHS results: (a) displays the three orthogonal views of a cardiac CT image and its corresponding WHS result, (b) shows example cardiac MRI data and the WHS result. LV: left ventricle; RV: right ventricle; LA: left atrium; RA: right atrium; Myo: myocardium of LV; AO: ascending aorta; PA: pulmonary artery.
Summary of previous WHS methods for multi-modality images. PIS: patch-based interactive segmentation; FIMH: International Conference on Functional Imaging and Modeling of the Heart; MICCAI: International Conference on Medical Image Computing and Computer-assisted Intervention; MedPhys: Medical Physics; MedIA: Medical Image Analysis; RadiotherOncol: Radiotherapy and Oncology.
| Reference | Data | Method | Runtime | Dice |
|---|---|---|---|---|
| | 8 CT, 23 MRI | MAS | 60 min, 30 min | 0.89 ± 0.04, 0.91 ± 0.03 |
| | 30 CT | MAS | 13.2 min | 0.92 ± 0.02 |
| | 20 MRI | PIS + Active learning | N/A | N/A |
| | 20 CT, 20 MRI | Multi-modality MAS | 12.58 min | 0.90 ± 0.03 |
| | 31 CT | MAS | 10 min | N/A |
| | 14 CT | Gaussian filter-based | N/A | N/A |
Summary of submitted methods.
| Teams | Tasks | Key elements in methods | Teams | Tasks | Key elements in methods |
|---|---|---|---|---|---|
| GUT | CT, MRI | Two-step CNN, combined with anatomical label configurations. | UOL | MRI | MAS and discrete registration, to adapt to large shape variations. |
| KTH | CT, MRI | Multi-view U-Nets combining hierarchical shape prior. | CUHK1 | CT, MRI | 3D fully convolutional network (FCN) with the gradient flow optimization and Dice loss function. |
| SEU | CT | Conventional MAS-based method. | CUHK2 | CT, MRI | Hybrid loss guided FCN. |
| UCF | CT, MRI | Multi-object multi-planar CNN with an adaptive fusion method. | UT | CT, MRI | Local probabilistic atlases coupled with a topological graph. |
| SIAT | CT, MRI | 3D U-Net that learns multi-modality features. | UB2 | MRI | Multi-scale fully convolutional Dense-Nets. |
| UB1 | CT, MRI | Dilated residual networks. | UOE | CT, MRI | Two-stage concatenated U-Net. |
Teams that submitted results after the challenge deadline are indicated with an asterisk (*).
Fig. 2. Multi-atlas registration and label fusion with regularization proposed by Heinrich and Oster (2017).
Fig. 3. A schematic illustration of the method developed by Yang et al. (2017c). Digits represent the number of feature volumes in each layer. Volume with dotted line is for concatenation.
Results of the ten evaluated algorithms on CT dataset.
| Teams | Dice | Jaccard | SD (mm) | HD (mm) | DL/MAS |
|---|---|---|---|---|---|
| GUT | | | | | DL |
| KTH | 0.894 ± 0.030 | 0.810 ± 0.048 | 1.387 ± 0.516 | 31.146 ± 13.203 | DL |
| CUHK1 | 0.890 ± 0.049 | 0.805 ± 0.074 | 1.432 ± 0.590 | 29.006 ± 15.804 | DL |
| CUHK2 | 0.886 ± 0.047 | 0.798 ± 0.072 | 1.681 ± 0.593 | 41.974 ± 16.287 | DL |
| UCF | 0.879 ± 0.079 | 0.792 ± 0.106 | 1.538 ± 1.006 | 28.481 ± 11.434 | DL |
| SEU | 0.879 ± 0.023 | 0.784 ± 0.036 | 1.705 ± 0.399 | 34.129 ± 12.528 | MAS |
| SIAT | 0.849 ± 0.061 | 0.742 ± 0.086 | 1.925 ± 0.924 | 44.880 ± 16.084 | DL |
| UT | 0.838 ± 0.152 | 0.742 ± 0.161 | 4.812 ± 13.604 | 34.634 ± 12.351 | MAS |
| UB1* | 0.887 ± 0.030 | 0.798 ± 0.048 | 1.443 ± 0.302 | 55.426 ± 10.924 | DL |
| UOE* | 0.806 ± 0.159 | 0.697 ± 0.166 | 4.197 ± 7.780 | 51.922 ± 17.482 | DL |
| Average | 0.859 ± 0.108 | 0.763 ± 0.118 | 3.259 ± 9.748 | 34.382 ± 12.468 | MAS |
| | 0.875 ± 0.083 | 0.784 ± 0.010 | 1.840 ± 2.963 | 38.510 ± 17.890 | DL |
| | 0.872 ± 0.087 | 0.780 ± 0.102 | 2.124 ± 5.133 | 37.684 ± 17.026 | ALL |
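
The Dice and Jaccard columns above are overlap measures between a predicted and a manual label map. The following is a minimal sketch of the standard definitions, not the challenge's evaluation code, which is not given here. The function name `dice_and_jaccard`, the flat-list representation of a volume, and the convention for a label that is absent from both volumes are assumptions made for illustration.

```python
def dice_and_jaccard(pred, truth, label):
    """Dice and Jaccard for one label over two equally sized, flattened label maps."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    pred_n = sum(1 for p in pred if p == label)      # voxels predicted as `label`
    truth_n = sum(1 for t in truth if t == label)    # voxels manually labelled `label`
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = pred_n + truth_n - overlap
    if union == 0:
        # Assumption: a label missing from both volumes counts as perfect agreement.
        return 1.0, 1.0
    dice = 2.0 * overlap / (pred_n + truth_n)
    jaccard = overlap / union
    return dice, jaccard


# Toy flattened volumes with labels 0 (background), 1 and 2.
pred = [0, 1, 1, 1, 2, 2, 0, 0]
truth = [0, 1, 1, 2, 2, 2, 0, 0]
dice, jaccard = dice_and_jaccard(pred, truth, label=1)
print(f"Dice = {dice:.3f}, Jaccard = {jaccard:.3f}")
```

For a single case the two scores are tied by J = D / (2 - D), so they rank individual segmentations identically. The mean ± standard deviation values in the table are taken over cases, so the same identity need not hold exactly between the reported means.
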
Results of the eleven evaluated algorithms on MRI dataset.
| Teams | Dice | Jaccard | SD (mm) | HD (mm) | DL/MAS |
|---|---|---|---|---|---|
| UOL | 0.870 ± 0.035 | 0.772 ± 0.054 | 1.700 ± 0.649 | | MAS |
| GUT | 0.863 ± 0.043 | 0.762 ± 0.064 | 1.890 ± 0.781 | 30.227 ± 14.046 | DL |
| KTH | 0.855 ± 0.069 | 0.753 ± 0.094 | 1.963 ± 1.012 | 30.201 ± 13.216 | DL |
| UCF | 0.818 ± 0.096 | 0.701 ± 0.118 | 3.040 ± 3.097 | 40.092 ± 21.119 | DL |
| UT | 0.817 ± 0.059 | 0.695 ± 0.081 | 2.420 ± 0.925 | 30.938 ± 12.190 | MAS |
| CUHK2 | 0.810 ± 0.071 | 0.687 ± 0.091 | 2.385 ± 0.944 | 33.101 ± 13.804 | DL |
| CUHK1 | 0.783 ± 0.097 | 0.653 ± 0.117 | 3.233 ± 1.783 | 44.837 ± 15.658 | DL |
| SIAT | 0.674 ± 0.182 | 0.532 ± 0.178 | 9.776 ± 6.366 | 92.889 ± 18.001 | DL |
| UB2* | | | | 28.995 ± 13.030 | DL |
| UB1* | 0.869 ± 0.058 | 0.773 ± 0.079 | 1.757 ± 0.814 | 30.018 ± 14.156 | DL |
| UOE* | 0.832 ± 0.081 | 0.720 ± 0.105 | 2.472 ± 1.892 | 41.465 ± 16.758 | DL |
| Average | 0.844 ± 0.047 | 0.734 ± 0.072 | 2.060 ± 0.876 | 29.737 ± 12.771 | MAS |
| | 0.820 ± 0.107 | 0.707 ± 0.127 | 3.127 ± 3.640 | 41.314 ± 24.711 | DL |
| | 0.824 ± 0.102 | 0.711 ± 0.125 | 2.933 ± 3.339 | 39.209 ± 23.435 | ALL |
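
The SD and HD columns are distance-based measures in millimetres. The sketch below assumes that SD denotes a symmetric average surface-to-surface distance and HD the symmetric Hausdorff distance, computed between two sets of surface points whose coordinates are already in mm. The exact definitions used by the challenge are not stated in this excerpt, and the function names are hypothetical. The brute-force nearest-point search is only suitable for small point sets.

```python
import math


def directed_distances(points_a, points_b):
    """For each point in points_a, the distance to its nearest point in points_b."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def surface_metrics(surface_a, surface_b):
    """Return (mean symmetric surface distance, symmetric Hausdorff distance)."""
    a_to_b = directed_distances(surface_a, surface_b)
    b_to_a = directed_distances(surface_b, surface_a)
    # Hausdorff distance: the largest nearest-neighbour distance in either direction.
    hausdorff = max(max(a_to_b), max(b_to_a))
    # Mean surface distance: average of all nearest-neighbour distances, both directions.
    mean_sd = (sum(a_to_b) + sum(b_to_a)) / (len(a_to_b) + len(b_to_a))
    return mean_sd, hausdorff


# Two toy surfaces (coordinates in mm) that differ in one point.
surface_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
surface_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)]
mean_sd, hausdorff = surface_metrics(surface_a, surface_b)
print(f"SD = {mean_sd:.3f} mm, HD = {hausdorff:.3f} mm")
```

Because HD is a maximum, a single stray point dominates it while SD barely moves. This is one plausible reading of rows where HD is large even though SD is below 2 mm.
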
Fig. 4. Boxplot of Dice scores of the whole heart segmentation on CT dataset by the ten methods.
Fig. 5. Boxplot of Dice scores of the whole heart segmentation on MRI dataset by the eleven methods.
Fig. 6. 3D visualization of the WHS results of the median and worst cases in the CT test dataset by the ten evaluated methods. The color bar indicates the correspondence of substructures. Note that the colors of Myo and LV in 3D visualization do not look exactly the same as the keys in the color bar, due to the 50% transparency setting for Myo rendering and the addition effect from two colors (LV and 50% Myo) for LV rendering, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. 3D visualization of the WHS results of the median and worst cases in the MRI test dataset by the eleven evaluated methods.
Details on the average run time and computer systems used for the evaluated methods. T: average run time; Proc: whether the average run time includes the pre- and post-processing of the images for the DL-based methods.
| Teams | T (MRI) | T (CT) | Proc | GPU | CPU and RAM | Programming language |
|---|---|---|---|---|---|---|
| GUT | 21 s | 104 s | Y | GTX TITAN X; 12GB | Intel i7-4820K; 32GB | Python, C++ |
| UOL | N/A | N/A | N/A | N/A | N/A | N/A |
| KTH | 7 min | 5 min | Y | GTX1080; 8GB | Intel Xeon E5 1620; 32GB | Python, C++ |
| CUHK1 | 68.55 s | 87.38 s | N | TITAN X (PASCAL); 12GB | Intel i5-6500; 16GB | Python + TensorFlow |
| SEU | N/A | 20 min | N/A | N/A | Intel 7900X; 16GB | Python + Elastix |
| CUHK2 | 66.03 s | 89.79 s | N | TITAN X (PASCAL); 12GB | Intel i5-6500; 16GB | Python + TensorFlow |
| UCF | 17 s | 50 s | N | TITAN XP; 12GB | Intel Xeon E5-2630 v3; N/A | Python + TensorFlow |
| UT | 14 min | 21 min | N/A | N/A | Intel Core i7-4600; 16GB | C++, Cli |
| SIAT | 7 s | 11 s | N | GTX TITAN X; 12GB | Intel Core i5-7640X; 32GB | Python |
| UB2* | 30 s | N/A | N | GTX 1080 Ti; 11GB | Intel(R) i7; 32GB | Python + TensorFlow |
| UB1* | 28 s | 23 s | N | GTX 1080 Ti; 11GB | Intel(R) i7; 32GB | Python + TensorFlow |
| UOE* | 0.11 s | 0.22 s | N | Tesla K80; 24GB | Intel Xeon E5-2686 v4; 64GB | Python + TensorFlow |
The inter-observer (Inter-Ob) and intra-observer (Intra-Ob) variabilities of the MRI segmentation in Dice scores (%).
| | LV | Myo | RV | LA | RA | AO | PA | WHS |
|---|---|---|---|---|---|---|---|---|
| Inter-Ob | 93.7 ± 1.33 | 81.1 ± 2.90 | 90.1 ± 1.96 | 83.7 ± 4.58 | 85.8 ± 3.10 | 87.6 ± 5.24 | 76.3 ± 14.34 | 87.8 ± 1.36 |
| Intra-Ob | 94.2 ± 0.84 | 83.9 ± 1.23 | 91.2 ± 2.59 | 86.8 ± 3.23 | 87.2 ± 2.48 | 91.1 ± 1.65 | 82.6 ± 3.77 | 89.5 ± 1.03 |
The performance of each substructure and WHS on different pathologies of the MRI in Dice scores (%).
| | LV | Myo | RV | LA | RA | AO | PA | WHS |
|---|---|---|---|---|---|---|---|---|
| AF | 80.4 ± 17.9 | 71.8 ± 13.4 | 71.5 ± 15.8 | 84.4 ± 9.7 | 84.7 ± 10.1 | 76.5 ± 18.3 | 71.4 ± 20.7 | 79.0 ± 10.3 |
| CHD | 85.5 ± 16.6 | 69.6 ± 18.1 | 87.5 ± 11.3 | 78.2 ± 18.4 | 83.5 ± 11.3 | 80.9 ± 10.3 | 67.7 ± 23.4 | 81.7 ± 12.9 |
| | 91.2 ± 7.7 | 79.6 ± 8.2 | 88.6 ± 7.7 | 81.9 ± 1.14 | 81.2 ± 15.7 | 83.4 ± 13.4 | 73.4 ± 15.5 | 85.3 ± 7.2 |
Summary of the DL-based methods. The abbreviations are as follows. Dim: dimension; MS: multi-stage; E-D: encoder-decoder CNN; MM-train: trained on multi-modality datasets.
| Teams | Dim | MS | Network | Prior | Pre-train | MM-train |
|---|---|---|---|---|---|---|
| GUT | 3D | Y | U-Net | N | N | N |
| KTH | 2D | Y | U-Net | Y | N | N |
| CUHK1 | 3D | N | FCN | N | Y | N |
| CUHK2 | 3D | N | FCN | N | Y | N |
| UCF | 2D | N | E-D | N | N | N |
| SIAT | 3D | Y | U-Net | N | N | Y |
| UB2* | 3D | N | E-D | N | N | N |
| UB1* | 3D | N | FCN | N | N | N |
| UOE* | 3D | Y | U-Net | N | N | N |
Summary of the advantages and limitations of the twelve evaluated methods.
| Method | Strengths | Limitations |
|---|---|---|
| GUT | - Combining localization and segmentation CNNs to reduce the requirements of memory and computation time. | - Based on an automatically localized landmark in the center of the heart, the cropping of a fixed physical size ROI is required for segmentation. |
| UOL | - The discrete registration can capture large shape variations across scans. | - Only tested on the MRI data. |
| KTH | - Combining shape context information with orthogonal U-Nets for more consistent segmentation in 3-D views. | - Potential for overfitting because the U-Nets rely heavily on the shape context channels. |
| CUHK1 | - Pre-trained 3-D Network provides good initialization and reduces overfitting. | - The introduced hyperparameters need to be determined empirically. |
| UCF | - Multi-planar information reinforces the segmentation along the three orthogonal planes. | - The softmax function in the last layer could cause information loss due to class normalization. |
| CUHK2 | - Coupling the 3-D FCN with transfer learning and deep supervision mechanism to tackle potential training difficulties caused by overfitting and vanishing gradient. | - Relatively poor performance in MRI WHS. |
| SEU | - Three-step multi-atlas image registration method is lightweight for computing resources. | - Only tested on the CT data. |
| UT | - The proposed incremental segmentation method is based on local atlases and allows users to perform partial and incremental segmentation. | - The registration of MRI atlas can be inaccurate, and the evaluated segmentation accuracy is low. |
| SIAT | - Combining a 3-D U-Net with an ROI detection to alleviate the impact of surrounding tissues and reduce the computational complexity. | - Poor segmentation performance, particularly for MRI data. |
| UB1* | - The focal loss and Dice loss are well encapsulated into a complementary learning objective to segment both hard and easy classes. | - Late submission of the WHS results. |
| UB2* | - Multi-scale context and multi-scale deep supervision are employed to enhance feature learning and to alleviate the potential gradient vanishing problem during training. | - Late submission of the WHS results. |
| UOE* | - The proposed two-stage U-Net framework can directly segment the images with their original resolution. | - Late submission of the WHS results. |
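
The UB1* entry above describes a learning objective that combines a focal loss with a Dice loss. The sketch below illustrates one common way of combining the two as a weighted sum, in plain Python on per-voxel class probabilities. It is not the team's implementation: the function names, the focusing parameter `gamma`, the weight `dice_weight`, and the soft Dice formulation are all assumptions chosen for illustration.

```python
import math


def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss; probs[i][c] is the predicted probability of class c at voxel i."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = min(max(p[y], eps), 1.0)  # probability assigned to the true class
        # Well-classified voxels (p_t near 1) are down-weighted by (1 - p_t) ** gamma.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)


def soft_dice_loss(probs, labels, num_classes):
    """One minus the soft Dice score, averaged over classes."""
    eps = 1e-7
    loss = 0.0
    for c in range(num_classes):
        overlap = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        truth_mass = sum(1 for y in labels if y == c)
        loss += 1.0 - (2.0 * overlap + eps) / (pred_mass + truth_mass + eps)
    return loss / num_classes


def combined_loss(probs, labels, num_classes, gamma=2.0, dice_weight=1.0):
    """Hypothetical weighted sum of the two terms."""
    return focal_loss(probs, labels, gamma) + dice_weight * soft_dice_loss(
        probs, labels, num_classes
    )


labels = [0, 1]
confident = [[0.9, 0.1], [0.1, 0.9]]
uncertain = [[0.6, 0.4], [0.3, 0.7]]
print(combined_loss(confident, labels, num_classes=2))
print(combined_loss(uncertain, labels, num_classes=2))
```

In this formulation the focal term concentrates on hard voxels, while the Dice term is normalised per class and is therefore less sensitive to class imbalance between small and large substructures.
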
Summary of the previous challenges related to cardiac segmentation from the MICCAI society.
| Organizers/reference | Year | Data | Target | Pathology |
|---|---|---|---|---|
| | 2009 | 45 cine MRI | LV | hypertrophy, infarction |
| | 2011 | 200 cine MRI | LV | myocardial infarction |
| | 2013 | 60 MRI | LA scar | atrial fibrillation |
| | 2012 | 48 cine MRI | RV | congenital heart disease |
| | 2013 | 30 CT + 30 MRI | LA | atrial fibrillation |
| | 2016 | 10 CT + 10 MRI | LA wall | atrial fibrillation |
| | 2016 | 20 MRI | Blood pool, Myo | congenital heart disease |
| | 2017 | 150 cine MRI | Ventricles | infarction, dilated/hypertrophic cardiomyopathy, abnormal RV |
| | 2018 | 150 LGE-MRI | LA | atrial fibrillation |
| | 2019 | 45 multi-modal MRI | Ventricles | cardiomyopathy |