Chen Chen1, Wenjia Bai2,3, Rhodri H Davies4,5, Anish N Bhuva4,5, Charlotte H Manisty4,5, Joao B Augusto4,5, James C Moon4,5, Nay Aung5,6, Aaron M Lee5,6, Mihir M Sanghvi5,6, Kenneth Fung5,6, Jose Miguel Paiva5,6, Steffen E Petersen5,6, Elena Lukaschuk7, Stefan K Piechnik7, Stefan Neubauer7, Daniel Rueckert1.
Abstract
Background: Convolutional neural network (CNN) based segmentation methods provide an efficient and automated way for clinicians to assess the structure and function of the heart in cardiac MR images. While CNNs can generally perform the segmentation tasks with high accuracy when training and test images come from the same domain (e.g., same scanner or site), their performance often degrades dramatically on images from different scanners or clinical sites.
Keywords: artificial intelligence; cardiac MR image segmentation; cardiac image analysis; deep learning; model generalization; neural network
Year: 2020 PMID: 32714943 PMCID: PMC7344224 DOI: 10.3389/fcvm.2020.00105
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
Related work that applies CNN-based CMR image segmentation models across multiple datasets.
| Tran ( | Yes | Yes | LV/MYO/RV separately | <200 |
| Bai et al. ( | Yes | Yes | LV+MYO+RV | <100 |
| Khened et al. ( | Yes | No | MYO | <200 |
| Our work | Yes | No | LV+MYO+RV | 699 |
General descriptions of the three datasets.
| UKBB | 4,875 | General population | 1 | 1.5 T, Aera, Siemens (100%) | in-plane resolution: 1.8 mm²/pixel; |
| ACDC | 100 | Without cardiac disease (20%); | 1 | 1.5 T, Aera, Siemens (67%) | in-plane resolution: 1.34–1.68 mm²/pixel; |
| BSCMR-AS | 599 | Aortic stenosis | 6 | 1.5 T, Ingenia, Philips (5.2%); | in-plane resolution: 0.78–2.3 mm²; |
Figure 1. (A) Overview of the network structure. Conv, convolutional layer; BN, batch normalization; ReLU, rectified linear unit. The U-Net takes a batch of N 2D CMR images as input at each iteration and learns multi-scale features through a series of convolutional layers and max-pooling operations. These features are then combined through upsampling and convolutional layers from coarse to fine scales to generate pixel-wise predictions for the four classes (background, LV, MYO, RV) on each slice. (B) Image pre-processing during training and testing.
Comparison results of segmentation performance between a baseline method and the proposed method across three test sets.
| Method | Training set | UKBB: LV | UKBB: MYO | UKBB: RV | ACDC: LV | ACDC: MYO | ACDC: RV | BSCMR-AS: LV | BSCMR-AS: MYO* |
| Bai et al. ( | UKBB ( | 0.94 (0.04) | 0.88 (0.03) | 0.90 (0.05) | 0.81 (0.22) | 0.70 (0.20) | 0.68 (0.31) | 0.82 (0.21) | 0.74 (0.17) |
| Ours | UKBB ( | 0.94 (0.04) | 0.88 (0.03) | 0.90 (0.05) | 0.90 (0.10) | 0.81 (0.07) | 0.82 (0.13) | 0.89 (0.09) | 0.83 (0.07) |
Both methods were trained using the same UKBB training set and evaluated on three test sets. Numbers listed in the table are the means and standard deviations (in parentheses) of the Dice scores.
*The myocardium segmentation performance on the BSCMR-AS set was evaluated only on ED frames because annotations are lacking at ES frames, whereas the performance on the other two datasets was evaluated on both ED and ES frames. For simplicity, Dice scores for the myocardium on the BSCMR-AS set in the following tables were calculated in the same way without further note.
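The Dice scores reported in these tables measure the overlap between a predicted and a manual segmentation, 2|A∩B| / (|A| + |B|), computed per structure. A minimal sketch of that computation follows; it is not the authors' evaluation code. The label numbering (0 = background, 1 = LV, 2 = MYO, 3 = RV), the function names, and the toy label maps are assumptions made for illustration.

```python
# Assumed label convention (not stated in the source):
# 0 = background, 1 = LV, 2 = MYO, 3 = RV.
LABELS = {"LV": 1, "MYO": 2, "RV": 3}


def dice(pred, truth, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label on flat label maps."""
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of pixels")
    in_pred = sum(1 for p in pred if p == label)
    in_truth = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if in_pred + in_truth == 0:
        return None  # structure absent from both maps: Dice is undefined
    return 2.0 * overlap / (in_pred + in_truth)


def dice_per_structure(pred, truth):
    """Return one Dice score per structure (LV, MYO, RV)."""
    return {name: dice(pred, truth, label) for name, label in LABELS.items()}


# Toy 2x4 slice, flattened row by row (made-up values).
truth = [0, 1, 1, 2, 0, 3, 3, 2]
pred = [0, 1, 2, 2, 0, 3, 0, 2]
scores = dice_per_structure(pred, truth)
```

Returning `None` for a structure that is absent from both maps is one possible convention; it keeps unannotated structures (such as the RV on BSCMR-AS) out of any downstream mean instead of counting them as perfect or failed.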
Figure 2. Boxplots of the average Dice scores between the results of our previous work (3) and the results of the proposed method on the three datasets. For simplicity, we calculate the average Dice score over the three structures (LV, MYO, RV) for each image in the three datasets. The boxplots in orange are the results of the proposed method whereas the boxplots in blue are the results of the previous work. The green dashed line in each boxplot shows the mean value of the Dice scores for the segmentation results on one dataset.
Cross-dataset segmentation performances of four different network architectures.
| Network | Parameters | UKBB: LV | UKBB: MYO | UKBB: RV | ACDC: LV | ACDC: MYO | ACDC: RV | BSCMR-AS: LV | BSCMR-AS: MYO |
| FCN-16 | 0.98 million | 0.92 (0.04) | 0.84 (0.04) | 0.88 (0.05) | 0.80 (0.20) | 0.67 (0.19) | 0.68 (0.27) | 0.84 (0.14) | 0.77 (0.11) |
| FCN-64 | 15.6 million | 0.94 (0.04) | 0.87 (0.03) | 0.89 (0.05) | 0.87 (0.12) | 0.78 (0.11) | 0.77 (0.17) | 0.85 (0.12) | 0.79 (0.10) |
| UNet-16 | 0.84 million | 0.92 (0.04) | 0.83 (0.04) | 0.87 (0.05) | 0.87 (0.12) | 0.66 (0.14) | 0.67 (0.22) | 0.85 (0.11) | 0.73 (0.11) |
| Ours (UNet-64) | 13.4 million | ||||||||
All the networks have been trained using the same UKBB training set with the proposed data normalization and augmentation strategy for 1,000 epochs. Results listed in the table are the means and standard deviation of the Dice scores evaluated on the three sets. Numbers in red denote mean Dice scores below 0.70, whereas numbers in the bold font style denote the highest mean Dice scores among the results of the four networks.
Cross-dataset segmentation performances of U-Nets with different training configurations.
| ✓ | ✓ | ✓ | ✓ | 0.923 (0.041) | 0.847 (0.038) | 0.878 (0.048) | 0.873 (0.101) | 0.744 (0.104) | 0.750 (0.187) | 0.851 (0.113) | 0.783 (0.095) |
| ✗ | ✓ | ✓ | ✓ | 0.916 (0.046) | 0.836 (0.041) | 0.864 (0.053) | 0.811 (0.179) | 0.614 (0.186) | 0.575 (0.270) | 0.798 (0.172) | 0.673 (0.162) |
| ✓ | ✗ | ✓ | ✓ | 0.922 (0.042) | 0.848 (0.038) | 0.878 (0.050) | 0.869 (0.117) | 0.733 (0.117) | 0.722 (0.210) | 0.853 (0.118) | 0.784 (0.093) |
| ✓ | ✓ | ✗ | ✓ | 0.924 (0.041) | 0.849 (0.037) | 0.881 (0.049) | 0.858 (0.115) | 0.705 (0.142) | 0.681 (0.266) | 0.862 (0.110) | 0.779 (0.092) |
| ✓ | ✓ | ✓ | ✗ | 0.921 (0.047) | 0.845 (0.039) | 0.876 (0.050) | 0.785 (0.188) | 0.640 (0.187) | 0.596 (0.279) | 0.834 (0.148) | 0.752 (0.125) |
All experiments were performed with the standard U-Net architecture: UNet-64. Each U-Net was trained using the same UKBB training set for 200 epochs to save computation. Statistics listed in the table are the means and standard deviation of the Dice scores evaluated on the three sets. Numbers in red are those mean Dice scores below 0.70.
Segmentation performance of the UKBB model across different scanners.
| BSCMR-AS | Manufacturers | Philips | 142 | 0.89 (0.07) | 0.85 (0.04) | – |
| BSCMR-AS | Manufacturers | Siemens | 457 | 0.88 (0.10) | 0.83 (0.08) | – |
| BSCMR-AS | Magnetic field strengths | 1.5 T | 517 | 0.88 (0.09) | 0.83 (0.09) | – |
| BSCMR-AS | Magnetic field strengths | 3 T | 82 | 0.88 (0.09) | 0.84 (0.09) | – |
| ACDC | Magnetic field strengths | 1.5 T | 65 | 0.89 (0.09) | 0.81 (0.06) | 0.80 (0.09) |
| ACDC | Magnetic field strengths | 3 T | 29 | 0.91 (0.06) | 0.82 (0.05) | 0.80 (0.08) |
Tests were performed on the BSCMR-AS dataset and ACDC dataset. This table presents the mean and standard deviation (numbers in the brackets) of the Dice score.
Segmentation performance of the UKBB model across different sites.
| ACDC | Site A | 100 | 0.91 (0.07) | 0.81 (0.08) | 0.82 (0.11) |
| BSCMR-AS | Site B | 28 | 0.88 (0.09) | 0.83 (0.04) | – |
| BSCMR-AS | Site C | 74 | 0.88 (0.09) | 0.83 (0.04) | – |
| BSCMR-AS | Site D | 150 | 0.89 (0.07) | 0.85 (0.04) | – |
| BSCMR-AS | Site E | 122 | 0.86 (0.11) | 0.81 (0.08) | – |
| BSCMR-AS | Site F | 64 | 0.88 (0.09) | 0.84 (0.08) | – |
| BSCMR-AS | Site G | 160 | 0.89 (0.09) | 0.85 (0.08) | – |
This table presents the mean and the standard deviation (numbers in the brackets) of Dice scores for each site.
Segmentation performance of the UKBB model across the five groups of pathological cases and normal cases (NOR).
| ACDC | NOR | 20 | 0.91 (0.05) | 0.83 (0.04) | 0.85 (0.14) |
| ACDC | DCM | 20 | 0.94 (0.04) | 0.81 (0.05) | 0.82 (0.11) |
| ACDC | HCM | 20 | 0.84 (0.12) | 0.84 (0.03) | 0.84 (0.08) |
| ACDC | MINF | 20 | 0.92 (0.05) | 0.81 (0.04) | 0.78 (0.13) |
| ACDC | ARV | 20 | 0.86 (0.13) | 0.74 (0.11) | 0.79 (0.16) |
| BSCMR-AS | AS | 599 | 0.88 (0.09) | 0.83 (0.07) | – |
This table presents the mean and standard deviation of the Dice score. Red numbers are those mean Dice scores below 0.80.
Figure 3. Visualization of good segmentation examples selected from three patient groups. NOR (without cardiac disease), DCM (dilated cardiomyopathy), AS (aortic stenosis). Row 1: Ground truth (manual annotations); row 2: results predicted by the UKBB model. Each block contains a slice from the ED frame and the corresponding ES slice for the same subject. This figure shows that the UKBB model produced satisfactory segmentation results not only on healthy subjects but also on the DCM and AS cases with abnormal cardiac morphology. The AS example in this figure is a patient with aortic stenosis who previously had a myocardial infarction. Note that this AS case is from the BSCMR-AS dataset, where the MYO and RV on ES frames were not annotated by experts.
Figure 4. Examples of the worst cases that have pathological deformations. Row 1: Ground truth; row 2: results predicted by the UKBB model. HCM, hypertrophic cardiomyopathy; MINF, myocardial infarction with altered left ventricular ejection fraction; ARV, abnormal right ventricle. Column 1 shows that the UKBB model underestimates the myocardium in patients with HCM. Column 2 shows that the model struggles to predict the cardiac structure when certain sections of the myocardium are extremely thin. Column 3 shows a failure case where an extremely large right ventricle is shown in the image. All these images are from the ACDC dataset.
Figure 5. Examples of the worst segmentation results found on challenging slices. Left: Image, middle: ground truth (GT), right: prediction from the UKBB model. (A) Failure to predict LV when the apical slice has a very small LV. (B) LV segmentation missing on the basal slice (ES frame). This sample is from the BSCMR-AS dataset where only the LV endocardial annotation is available. (C) Failure to recognize the LV due to a stripe of high-intensity noise around the cardiac chambers in this 1.5 T image. This sample is an ES frame image from the BSCMR-AS dataset. (D) Failure to estimate the LV structure when unexpected strong dark artifacts disrupt the shape of the LV in this 3 T image. Note that this image is an ED frame image from the BSCMR-AS dataset where RV was not annotated by experts.
Figure 6. Agreement of clinical measurements from automatic and manual segmentations. (A-Z) Bland-Altman plots (automatic minus manual) for the three datasets. In each Bland-Altman plot, the x-axis denotes the average of the two measurements whereas the y-axis denotes the difference between them. The solid line in red denotes the mean difference (bias) and the two dashed lines in green denote ±1.96 standard deviations from the mean. The title of each plot shows the mean difference (MD) and its standard deviation (SD) for each pair of measurements. FCN, the automatic method in our previous work (3); LV/RV, left/right ventricle; EDV/ESV, end-diastolic/systolic volume; LVM, left ventricular mass.
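The quantities drawn in a Bland-Altman plot can be sketched as follows: per-subject averages (x-axis), differences automatic minus manual (y-axis), the mean difference (bias), and limits at ±1.96 standard deviations. This is an illustrative sketch only; the function name and the sample volumes are made up, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def bland_altman(automatic, manual):
    """Return the points and summary lines of a Bland-Altman plot."""
    if len(automatic) != len(manual) or len(automatic) < 2:
        raise ValueError("need two paired lists with at least two subjects")
    # y-axis: automatic minus manual; x-axis: average of the two measurements.
    diffs = [a - m for a, m in zip(automatic, manual)]
    averages = [(a + m) / 2 for a, m in zip(automatic, manual)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator); an assumption
    return {
        "averages": averages,
        "diffs": diffs,
        "bias": bias,
        "sd": sd,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Made-up volumes in mL for four subjects, for illustration only.
auto_volumes = [150.0, 142.0, 171.0, 128.0]
manual_volumes = [148.0, 145.0, 168.0, 130.0]
result = bland_altman(auto_volumes, manual_volumes)
```

A bias near zero with narrow limits indicates that the automatic measurements neither systematically over- nor underestimate the manual ones.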
Spearman's rank correlation coefficients of clinical parameters derived from the automatic measurements and the manual measurements on the three sets.
| Automatic vs. Manual | UKBB ( | 0.97 | 0.91 | 0.93 | 0.96 | 0.91 |
| Automatic vs. Manual | ACDC ( | 0.97 | 0.94 | 0.96 | 0.79 | 0.83 |
| Automatic vs. Manual | BSCMR-AS ( | 0.94 | 0.92 | 0.92 | – | – |
All segmentations are produced by the U-Net trained with the UKBB training set.
Each coefficient reported in this table has a P-value below 0.0001.
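Spearman's rank correlation coefficient is the Pearson correlation computed on the ranks of the two measurement series, with tied values sharing their average rank. The sketch below shows that calculation with the standard library only; the function names are illustrative, and the P-value reported in the table is not computed here.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Monotone but non-linear relation (made-up values): ranks agree perfectly.
rho = spearman([1, 2, 3, 4], [1, 4, 9, 100])
```

Because only ranks enter the calculation, the coefficient reflects whether automatic and manual measurements order the subjects the same way, independent of any constant offset or scaling between them.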