| Literature DB >> 35363415 |
Jing Wang1, Shuyu Wang1, Wei Liang2, Nan Zhang3, Yan Zhang4.
Abstract
PURPOSE: Accurate segmentation of cardiac structures on coronary CT angiography (CCTA) images is crucial for morphological analysis, measurement, and functional evaluation. In this study, we achieve accurate automatic segmentation of cardiac structures on CCTA images by adopting an innovative deep learning method based on a visual attention mechanism and a transformer network, and we discuss its practical application value.
Keywords: coronary CT angiography; deep learning; self-attention; transformers; visual attention mechanism
Year: 2022 PMID: 35363415 PMCID: PMC9121042 DOI: 10.1002/acm2.13597
Source DB: PubMed Journal: J Appl Clin Med Phys ISSN: 1526-9914 Impact factor: 2.243
FIGURE 1 Manual segmentation of cardiac structures. (a) Most structures with a large density difference from their surroundings can be segmented with an automatic threshold method. (b) In the same slice, structures whose density differs little from their surroundings can be segmented by manual anchoring. (c) Structures are drawn slice by slice until all cardiac structures are marked. (d) The segmentation results can be displayed in a 3D visualization to check the accuracy of the manual segmentation
FIGURE 2 Overall architecture of the proposed visual saliency and transformer (VST) network. We use input1 and input2 as a dual input for tissue contrast: (a) a convolutional neural network (CNN) encoder extracts multi‐scale features and feeds the embedded tokens to the transformer; (b) a transformer encoder models long‐range dependencies; (c) a CNN decoder produces the segmentation
FIGURE 3 Comparison of the original coronary CT angiography (CCTA) image, the labeled image, and the saliency map. Adding the saliency map improved the tissue contrast of the organs in the original CCTA image and captured more boundary information. The sharper CCTA image boundaries played an important role in the subsequent segmentation
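The excerpt does not spell out how the saliency map itself is computed, but the idea of boundary-emphasizing saliency can be illustrated with a crude stand-in: a per-pixel gradient-magnitude map. Everything below (the function name, the toy slice) is an illustrative assumption, not the paper's method.

```python
# Illustrative sketch only: the paper's exact visual-saliency computation is not
# given in this excerpt. A gradient-magnitude map is one simple way to
# highlight tissue boundaries in a 2D slice.

def gradient_saliency(img):
    """Crude boundary-saliency map: per-pixel gradient magnitude
    (central differences, clamped at the borders), normalized to [0, 1]."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            sal[y][x] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in sal) or 1.0  # avoid division by zero
    return [[v / peak for v in row] for row in sal]

# Toy "slice": a bright vertical band (an organ) on a dark background.
slice_ = [[100.0 if 2 <= x <= 5 else 0.0 for x in range(8)] for y in range(8)]
sal = gradient_saliency(slice_)
# Saliency is high at the band edges and zero in the homogeneous interior.
```

On this toy input the interior of the band gets saliency 0 while its edges get the maximum value 1.0, mirroring how a saliency channel can emphasize boundaries that raw intensities leave faint.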
Ablation study on model scaling (Dice scores)
| Depth | Number of heads | LVM | LV | LA | RV | RA | AO | Average |
|---|---|---|---|---|---|---|---|---|
| 6 | 8 | 0.83 | 0.94 | 0.86 | 0.89 | 0.90 | 0.91 | 0.89 |
| 8 | 8 | 0.87 | 0.95 | 0.90 | 0.93 | 0.91 | 0.95 | 0.92 |
| 8 | 12 | 0.87 | 0.92 | 0.90 | 0.92 | 0.92 | 0.92 | 0.90 |
| 12 | 8 | 0.84 | 0.93 | 0.91 | 0.88 | 0.89 | 0.93 | 0.89 |
Abbreviations: AO, aortic; LA, left atrial; LV, left ventricular; LVM, left ventricular myocardium; RA, right atrial; RV, right ventricular.
The DSC scores of cardiac structures
| Structure | DSC ± SD |
|---|---|
| Left ventricular myocardium | 0.87 ± 0.31 |
| Left ventricle | 0.94 ± 0.22 |
| Left atrium | 0.90 ± 0.28 |
| Right ventricle | 0.92 ± 0.23 |
| Right atrium | 0.91 ± 0.34 |
| Aorta | 0.96 ± 0.14 |
| Average | 0.92 ± 0.25 |
Abbreviations: DSC, Dice similarity coefficient; SD, standard deviation.
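The DSC values above follow the standard definition, which can be sketched for binary masks. Representing a mask as a set of voxel coordinates is our illustrative choice here, not the paper's implementation.

```python
# Hedged sketch of the Dice similarity coefficient (DSC) used in the tables:
# DSC = 2|A ∩ B| / (|A| + |B|), where A and B are binary masks,
# here represented as sets of voxel coordinates (an illustrative choice).

def dice(mask_a, mask_b):
    """Return the DSC of two voxel-coordinate sets; 1.0 is perfect overlap."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly by convention
    inter = len(mask_a & mask_b)
    return 2.0 * inter / (len(mask_a) + len(mask_b))

# Toy example: a 4-voxel "manual" mask vs. a 4-voxel "automatic" mask
# sharing 3 voxels.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
auto = {(0, 1), (1, 0), (1, 1), (2, 1)}
print(dice(manual, auto))  # 0.75
```

With 3 shared voxels out of 4 + 4, the DSC is 2·3/8 = 0.75; identical masks score 1.0, which is the upper bound the per-structure scores in the table approach.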
Volume statistics results of manual and automatic segmentation of cardiac structures
| Structure | Manual range (ml) | Manual M (Q1, Q3) | Auto range (ml) | Auto M (Q1, Q3) | Z | p |
|---|---|---|---|---|---|---|
| LVM | 65.4–128.8 | 86.1 (74.7, 111.4) | 66.2–129.0 | 86.8 (75.0, 111.8) | −2.084 | 0.037 |
| LV | 45.1–173.5 | 104.3 (94.2, 152.2) | 44.4–172.0 | 104.9 (94.2, 154.7) | −0.784 | 0.433 |
| LA | 38.4–98.9 | 61.4 (45.6, 84.1) | 39.0–100.4 | 61.6 (46.8, 84.8) | −2.041 | 0.041 |
| RV | 62.9–198.9 | 121.7 (101.4, 171.4) | 63.3–200.2 | 122.4 (103.8, 171.3) | −1.256 | 0.209 |
| RA | 51.1–114.2 | 67.0 (59.9, 85.5) | 52.6–114.6 | 67.7 (59.3, 87.2) | −1.021 | 0.307 |
| AO | 21.4–48.8 | 29.9 (26.4, 42.4) | 22.1–48.2 | 29.7 (26.3, 42.8) | −0.315 | 0.753 |
Abbreviations: AO, aortic; LA, left atrial; LV, left ventricular; LVM, left ventricular myocardium; RA, right atrial; RV, right ventricular.
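Volumes like those in the table are typically obtained by multiplying a mask's voxel count by the physical voxel volume. A minimal sketch follows; the voxel spacing used below is an assumed example, not a value reported in the paper.

```python
# Hedged sketch: deriving a segmented structure's volume in millilitres from
# its voxel count and the scan's voxel spacing. The spacing here is an
# assumed example; 1 ml = 1000 mm^3.

def segmented_volume_ml(voxel_count, spacing_mm):
    """Volume (ml) of a mask with `voxel_count` voxels and
    (x, y, z) spacing in millimetres."""
    sx, sy, sz = spacing_mm
    return voxel_count * sx * sy * sz / 1000.0

# Hypothetical mask of 400,000 voxels at an assumed 0.5 x 0.5 x 1.0 mm spacing:
vol = segmented_volume_ml(400_000, (0.5, 0.5, 1.0))
print(vol)  # 100.0 (ml), in the range of the ventricular volumes above
```

Paired manual/automatic volumes computed this way per patient are what a rank-based paired test (yielding Z and p values like those tabulated) would then compare.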
Results for each network (Dice similarity coefficient and Hausdorff distance)
| Network | LVM | LV | LA | RV | RA | AO | Average | HD (mm) |
|---|---|---|---|---|---|---|---|---|
| Multi‐planar FCNs (2D) | 0.85 | 0.90 | 0.91 | 0.88 | 0.83 | 0.90 | 0.88 | 24.4 ± 11.4 |
| A pipeline of two FCNs (2D) | **0.88** | 0.91 | **0.92** | 0.90 | 0.88 | 0.93 | 0.90 | 25.2 ± 10.8 |
| Multi‐view UNet (2.5D) | 0.87 | 0.93 | 0.90 | 0.83 | 0.88 | – | – | 31.1 ± 13.2 |
| Faster RCNN and U‐net (3D) | 0.82 | 0.87 | 0.83 | 0.90 | 0.84 | 0.91 | 0.86 | – |
| 3D FCN (3D) | 0.81 | 0.90 | 0.79 | 0.81 | 0.85 | 0.72 | 0.81 | 29.0 ± 15.8 |
| 3D deeply supervised U‐Net (3D) | 0.84 | 0.89 | 0.89 | 0.81 | 0.81 | 0.87 | 0.85 | 44.9 ± 16.1 |
| R‐CNN based on SqueezeNet (3D) | 0.85 | 0.81 | 0.87 | 0.91 | 0.88 | – | – | – |
| Ours (VST) | 0.87 | **0.94** | 0.90 | **0.92** | **0.91** | **0.96** | **0.92** | **7.2 ± 2.1** |
Abbreviations: AO, aortic; CNN, convolutional neural networks; RCNN, region convolutional neural networks; DSC, Dice similarity coefficient; FCNs, fully convolutional networks; HD, Hausdorff distance; LA, left atrial; LV, left ventricular; LVM, left ventricular myocardium; RA, right atrial; RV, right ventricular. The bold values in the table represent the best performance of each column.
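The HD column measures the largest boundary disagreement between two segmentations, so lower is better (VST's 7.2 mm vs. 24–45 mm for the baselines). A direct O(n·m) sketch over point sets follows; the toy surfaces are illustrative, not data from the paper.

```python
# Hedged sketch of the (symmetric) Hausdorff distance used in the HD column:
# the largest distance from any point of one surface to its nearest point on
# the other. A direct O(n*m) version over coordinate tuples.

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    def dist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    def directed(p, q):
        # worst-case nearest-neighbour distance from p into q
        return max(min(dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))

# Toy boundary point sets (illustrative): identical except for one point
# displaced by 3 units.
surface_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
surface_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]
print(hausdorff(surface_a, surface_b))  # 3.0
```

Because HD takes a maximum, a single badly placed boundary point dominates the score, which is why it complements the overlap-based DSC in the comparison above.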
Performance evaluation of the proposed architecture and its ablated variants (DSC)
| Model | LVM | LV | LA | RV | RA | AO | Average |
|---|---|---|---|---|---|---|---|
| U‐Net | 0.86 | 0.91 | 0.82 | 0.84 | 0.81 | 0.84 | 0.85 |
| Multi‐U‐Net | 0.87 | 0.92 | 0.74 | 0.86 | 0.90 | 0.85 | 0.86 |
| Multi‐U‐Net+transformer | 0.87 | 0.92 | 0.86 | 0.87 | 0.89 | 0.93 | 0.89 |
| VST (multi‐U‐Net+transformer+dual‐input) | 0.87 | 0.94 | 0.90 | 0.92 | 0.91 | 0.96 | 0.92 |
Abbreviations: AO, aortic; DSC, Dice similarity coefficient; LA, left atrial; LV, left ventricular; LVM, left ventricular myocardium; RA, right atrial; RV, right ventricular; VST, visual saliency and transformer.
FIGURE 4 Three-dimensional visualization of the manually segmented image, the visual saliency and transformer (VST) automatically segmented image, and their overlap. The first column shows the manual segmentation, the second the automatic segmentation, and the third the overlay. In the overlay, dark red marks pixels found only by manual segmentation, while gray-green marks pixels found only by automatic segmentation. Overall, the two segmentations overlap closely