Stanislav Shimovolos, Andrey Shushko, Mikhail Belyaev, Boris Shirokikh.
Abstract
Deep learning methods provide significant assistance in analyzing coronavirus disease (COVID-19) in chest computed tomography (CT) images, including identification, severity assessment, and segmentation. Whereas earlier methods addressed the lack of data and specific annotations, the current goal is to build a robust algorithm for clinical use, given a larger pool of available data. With larger datasets, the domain shift problem arises, affecting the performance of methods on unseen data. One critical source of domain shift in CT images is the difference in the reconstruction kernels used to generate images from the raw data (sinograms). In this paper, we show a decrease in COVID-19 segmentation quality for a model trained on smooth and tested on sharp reconstruction kernels. Furthermore, we compare several domain adaptation approaches to tackle the problem, such as task-specific augmentation and unsupervised adversarial learning. Finally, we propose an unsupervised adaptation method, called F-Consistency, that outperforms the previous approaches. Our method exploits a set of unlabeled CT image pairs that differ only in the reconstruction kernel within each pair. It enforces the similarity of the network's hidden representations (feature maps) by minimizing the mean squared error (MSE) between paired feature maps. Our method achieves a 0.64 Dice Score on the test dataset with unseen sharp kernels, compared to the 0.56 Dice Score of the baseline model. Moreover, F-Consistency achieves a 0.80 Dice Score between predictions on the paired images, which almost doubles the baseline score of 0.46 and surpasses the other methods. We also show that F-Consistency generalizes better to unseen kernels, and to images without COVID-19 lesions, than the other methods trained on unlabeled data.
Keywords: COVID-19 segmentation; chest computed tomography; convolutional neural network; domain adaptation
Year: 2022 PMID: 36135401 PMCID: PMC9503667 DOI: 10.3390/jimaging8090234
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Schematic representation of the proposed method, F-Consistency (5), and its competitors, P-Consistency (4) and DANN (6). All methods build upon the same U-Net architecture, which we train to segment the COVID-19 binary mask (1) from the axial slices of chest CT images (2). The adaptation methods use unlabeled paired data (3) to improve the model performance on the target domain. The flow of the paired data, and how each method uses it, is shown in green.
Table 1. Summary of the segmentation datasets. Effective size is the number of annotated images after appropriate filtering.
| Dataset | Source | Effective size | Kernels | Annotations | Split |
|---|---|---|---|---|---|
| COVID-train | Mosmed-1110 | 50 | unknown | COVID-19 mask | 5-fold |
| | MIDRC | 112 | B/L/BONE/ | COVID-19 mask | |
| COVID-test | Medseg-9 | 9 | unknown | COVID-19 mask, | hold-out |
Table 2. Summary of the datasets with paired images.
| Dataset | Kernel Pair (Smooth/Sharp) | Training Pairs | Testing Pairs |
|---|---|---|---|
| Paired-public [ | FC07/FC55 | 22 | 0 |
| | FC07/FC51 | 98 | 0 |
| Paired-private | FC07/FC55 | 60 | 20 |
| | FC07/FC51 | 30 | 11 |
| | SOFT/LUNG | 30 | 10 |
| | STANDARD/LUNG | 30 | 10 |
Table 3. Comparison of all considered methods. The adaptation methods are trained using all training kernel pairs of the Paired-private dataset. F-Cons and P-Cons stand for F-Consistency and P-Consistency, respectively, where F-Consistency is our proposed method. All results are Dice Scores in the format mean ± std calculated from 5-fold cross-validation. We highlight the best score in every column in bold.
| Method | COVID-train | COVID-test | Paired-private consistency: FC07/55 | Paired-private consistency: FC07/51 | Paired-private consistency: SOFT/LUNG | Paired-private consistency: STAND/LUNG | Paired-private consistency: Mean |
|---|---|---|---|---|---|---|---|
| Baseline | | | | | | | |
| FBPAug | | | | | | | |
| DANN | | | | | | | |
| P-Cons | | | | | | | |
| F-Cons | | | | | | | |
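The two kinds of numbers in the table above are both Dice Scores: the segmentation columns compare a prediction with the ground-truth mask, and the consistency columns compare the predictions on the two images of a kernel pair. A minimal sketch of both follows, assuming binary masks flattened into sequences of 0/1 values. The function names and the convention for two empty masks are illustrative assumptions, not taken from the paper.

```python
def dice_score(pred, target):
    """Dice Score between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def consistency_score(pred_smooth, pred_sharp):
    """Dice Score between predictions on the smooth and sharp image of a pair.

    No ground truth is needed, so this can be computed on unlabeled pairs.
    """
    return dice_score(pred_smooth, pred_sharp)
```

For example, `consistency_score(mask_a, mask_b)` returns 1.0 when a model predicts the same mask for both reconstructions of a scan, and drops toward 0 as the two predictions diverge.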
Figure 2. Examples of axial CT slices from the COVID-test dataset with the corresponding predictions and ground truth annotations. The three columns, denoted (A–C), contain three unique slices. The top row shows the contours of the ground truth and the baseline prediction. The bottom row shows the contours of the adaptation methods' predictions. DANN and F-Consistency correspond to DANN and F-Cons from Table 3, respectively.
Figure 3. Examples of CT slices from the Paired-private dataset with the corresponding predictions on the paired images. The four doublets, denoted (A–D), contain corresponding slices from the smooth and sharp images. Doublets B and D are coronal and sagittal slices, respectively. Every slice shows the predictions of the four methods named in the legend.
Table 4. Comparison of DANN, P-Consistency, and F-Consistency in generalizing to the previously unseen SOFT, STANDARD, and LUNG kernels. The number in brackets next to each method is the number of kernel pairs of the Paired-private dataset it is trained with, e.g., DANN (4) corresponds to DANN in Table 3. All results are Dice Scores in the format mean ± std calculated from 5-fold cross-validation.
| Method | COVID-train | COVID-test | Paired-private consistency: FC07/55 | Paired-private consistency: FC07/51 | Paired-private consistency: SOFT/LUNG | Paired-private consistency: STAND/LUNG | Paired-private consistency: Mean |
|---|---|---|---|---|---|---|---|
| Baseline | | | | | | | |
| FBPAug | | | | | | | |
| DANN (4) | | | | | | | |
| DANN (2) | | | | | | | |
| P-Cons (4) | | | | | | | |
| P-Cons (2) | | | | | | | |
| F-Cons (4) | | | | | | | |
| F-Cons (2) | | | | | | | |
Table 5. Comparison of all adaptation methods from Table 3, except FBPAug, trained on the Paired-public dataset. All results are Dice Scores in the format mean ± std calculated from 5-fold cross-validation. We highlight the consistency scores near or below the Baseline level in italic. The best consistency scores are highlighted in bold.
| Method | COVID-train | COVID-test | Paired-private consistency: FC07/55 | Paired-private consistency: FC07/51 | Paired-private consistency: LUNG/SOFT | Paired-private consistency: LUNG/STAND | Paired-private consistency: Mean |
|---|---|---|---|---|---|---|---|
| Baseline | | | | | | | |
| DANN | | | | | | | |
| P-Cons | | | | | | | |
| F-Cons | | | | | | | |
Figure 4. Trade-off between the segmentation quality and consistency scores induced by the regularization parameter (Section 2.4). The blue line corresponds to Dice Scores calculated on the COVID-train dataset. The orange line corresponds to the consistency scores calculated on the Paired-private dataset. The shaded areas correspond to the standard deviation along the Y-axis.
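The trade-off in Figure 4 can be read against the abstract's description of F-Consistency, which minimizes the MSE between the feature maps of paired images. The sketch below shows one plausible way a regularization parameter could weight that term against the supervised segmentation loss. The additive form, the function names, and the `weight` argument are assumptions for illustration; the exact formulation is given in Section 2.4 of the paper, which is not reproduced here. Feature maps are assumed to be flattened into equal-length sequences of floats.

```python
def mse(features_a, features_b):
    """Mean squared error between two flattened feature maps."""
    if len(features_a) != len(features_b):
        raise ValueError("feature maps must have the same number of elements")
    if not features_a:
        raise ValueError("feature maps must not be empty")
    squared = ((a - b) ** 2 for a, b in zip(features_a, features_b))
    return sum(squared) / len(features_a)


def f_consistency_objective(seg_loss, feats_smooth, feats_sharp, weight):
    """Assumed combined objective: supervised loss plus weighted feature MSE.

    seg_loss     -- segmentation loss on a labeled batch (already computed)
    feats_smooth -- feature map of the smooth-kernel image of an unlabeled pair
    feats_sharp  -- feature map of the sharp-kernel image of the same pair
    weight       -- hypothetical name for the regularization parameter
    """
    return seg_loss + weight * mse(feats_smooth, feats_sharp)
```

Under this assumed form, a larger `weight` pushes the paired feature maps closer together, which would favor consistency, while a smaller one leaves the model closer to the supervised baseline.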