Yixin Wang1, Yao Zhang1, Yang Liu1, Jiang Tian2, Cheng Zhong2, Zhongchao Shi2, Yang Zhang3, Zhiqiang He4. 1. Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; AI Lab, Lenovo Research, Beijing, China. 2. AI Lab, Lenovo Research, Beijing, China. 3. Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Lenovo Corporate Research & Development, Lenovo Ltd., Beijing, China. Electronic address: zhangyang20@lenovo.com. 4. Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Lenovo Corporate Research & Development, Lenovo Ltd., Beijing, China. Electronic address: hezq@lenovo.com.
Abstract
BACKGROUND AND OBJECTIVE: Coronavirus disease 2019 (COVID-19) is a highly contagious disease spreading all around the world. Deep learning has been adopted as an effective technique to aid COVID-19 detection and segmentation from computed tomography (CT) images. The major challenge lies in the inadequacy of public COVID-19 datasets. Recently, transfer learning has become a widely used technique that leverages the knowledge gained while solving one problem and applies it to a different but related problem. However, it remains unclear whether various non-COVID19 lung lesions could contribute to segmenting COVID-19 infection areas and how to better conduct this transfer procedure. This paper provides a way to understand the transferability of non-COVID19 lung lesions and a better strategy to train a robust deep learning model for COVID-19 infection segmentation. METHODS: Based on a publicly available COVID-19 CT dataset and three public non-COVID19 datasets, we evaluate four transfer learning methods using 3D U-Net as a standard encoder-decoder method. i) We introduce the multi-task learning method to get a multi-lesion pre-trained model for COVID-19 infection. ii) We propose and compare four transfer learning strategies with various performance gains and training time costs. Our proposed Hybrid-encoder Learning strategy introduces a Dedicated-encoder and an Adapted-encoder to extract COVID-19 infection features and general lung lesion features, respectively. An attention-based Selective Fusion unit is designed for dynamic feature selection and aggregation. RESULTS: Experiments show that, trained with limited data, the proposed Hybrid-encoder strategy based on the multi-lesion pre-trained model achieves a mean DSC, NSD, Sensitivity, F1-score, Accuracy and MCC of 0.704, 0.735, 0.682, 0.707, 0.994 and 0.716, respectively, with better generalization and lower over-fitting risks for segmenting COVID-19 infection.
CONCLUSIONS: The results reveal the benefits of transferring knowledge from non-COVID19 lung lesions, and learning from multiple lung lesion datasets can extract more general features, leading to accurate and robust pre-trained models. We further show the capability of the encoder to learn feature representations of lung lesions, which improves segmentation accuracy and facilitates training convergence. In addition, our proposed Hybrid-encoder learning method incorporates transferred lung lesion features from non-COVID19 datasets effectively and achieves significant improvement. These findings promote new insights into transfer learning for COVID-19 CT image segmentation, which can also be further generalized to other medical tasks.
Introduction
In December 2019, coronavirus disease 2019 (COVID-19) broke out and has since become a global challenge. The virus spread rapidly, affecting countries worldwide. As of 30 July 2020, 16,341,920 confirmed cases of COVID-19 had been reported in over 216 countries and territories, resulting in 650,805 deaths. The disease has been declared a Public Health Emergency of International Concern by the World Health Organization (WHO).
The gold standard for screening COVID-19 patients is the real-time reverse transcription-polymerase chain reaction (rRT-PCR) [1]. However, it takes from a few hours to 2 days to report results and requires repeated tests [2], [3]. Moreover, this gold standard has been shown to have a high false negative rate due to practical issues in sample collection and transportation as well as the performance of testing kits. In contrast, computed tomography (CT) can not only detect most of the cases that test positive by RT-PCR, but also detect many more cases, especially patients in the early stage [4], [5]. CT images have also shown a strong ability to capture ground-glass opacities and bilateral patchy shadows, which are typical CT features in affected patients [6], [7], [8]. Thus, chest CT has been adopted as a major diagnostic modality to confirm positive COVID-19 cases, to support the diagnosis of COVID-19 and to distinguish patterns and features in patients. Given that traditional CT image analysis methods are time-consuming and laborious, it is of great importance to develop artificial intelligence (AI) systems to aid COVID-19 diagnosis [9].
Segmentation in CT slices is a key component of the diagnostic work-up for patients with COVID-19 infection in clinical practice [10]. It provides more detailed information related to the pathology, and is better suited to the quantitative measurement of lesion size and of the extent or severity of lung involvement, which may have prognostic implications.
Therefore, many recent works focus on better methods for segmenting COVID-19 infections from CT images [4], [11], [12], [13], [14], [15]. Recently, deep learning with CNNs has shown significant performance improvements in the automatic detection and extraction of essential features from CT images related to the diagnosis of COVID-19. Although deep learning has made great progress in medical image segmentation [16], [17], it remains a challenging task [18] in the field of COVID-19 lung infection, as existing public datasets on COVID-19 are relatively small and weakly labelled. Thus, training deep networks from scratch on inadequate data for a task as specific as COVID-19 infection segmentation may lead to over-fitting and poor generalization.
Transfer learning is an effective method to solve this problem, which helps to leverage knowledge and latent features from other datasets and avoids over-parameterization. In transfer learning, successful deep learning models such as ResNet, DenseNet and GoogLeNet have been trained on large datasets such as ImageNet. These pre-trained models have shown impressive performance on natural image downstream tasks, and have even been used recently in skin disease diagnosis from photographs [19], [20], [21], [22], [23]. However, there exist no large-scale annotated medical image datasets, as data acquisition is difficult and high-quality annotations are expensive. Recent research [24] shows that transfer learning from natural image datasets to medical tasks produces very limited performance gain. In particular, even with large medical datasets for pre-training, transfer learning for COVID-19 infection segmentation is still much more difficult: 1) The shape, texture and position of COVID-19 infections vary widely. 2) Existing large medical CT datasets differ in domain from COVID-19 datasets.
Thus, being similar in domain, a model pre-trained on lung lesions may share more knowledge with COVID-19 infection and learn general-purpose visual representations of lung lesions.
Evidence shows that larger datasets are necessarily better for pre-training and the diversity of datasets is extremely important [25]. In the medical domain, medical datasets used for pre-training, especially chest CT datasets, tend to be more homogeneous than non-medical data and data from other medical areas. Thus, the CT imaging manifestations of non-COVID19 lung lesions may be beneficial for COVID-19 segmentation. Existing work has shown that multi-task training that simply fuses different lesions with COVID-19 affects the model's representation ability for COVID-19 infection segmentation [26]. Therefore, with limited COVID-19 datasets: 1) Do these non-COVID19 lesions help, and to what extent can they contribute to COVID-19 infection segmentation? 2) How can a better pre-trained model be trained on these non-COVID19 datasets for transfer learning? 3) In what manner can pre-trained models fitted on non-COVID19 lesions be effectively transferred to COVID-19?
In this paper, we aim to answer the above questions, which are significant for COVID-19 segmentation. To the best of our knowledge, this is the first study to explore the transferability of non-COVID19 datasets for COVID-19 CT image segmentation.
Our contributions are as follows.
1) We experimentally assess the extent to which non-COVID19 lesions contribute to COVID-19 infection segmentation. We find that, despite the disparity between non-COVID19 lung lesion images and COVID-19 infection images, pre-training on large-scale, well-annotated lung lesions can still be transferred to benefit COVID-19 recognition and segmentation.
2) We conduct extensive experiments using various non-COVID19 lung lesion datasets. Although pre-training on a single non-COVID19 dataset is unstable across different transfer strategies, learning from different non-COVID19 lesions demonstrates promising performance, since such a multi-task learning process can share knowledge from different related tasks and discover common and general representations of lung lesions.
3) We design four different transfer learning strategies with various performance gains and training time costs: Continual Learning, Body Fine-tuning, Pre-trained Lesion Representations and Hybrid-encoder Learning. The Hybrid-encoder strategy effectively combines both non-COVID19 and COVID-19 features and shows the best performance. We further conclude that it is possible to freeze the encoder and train only the decoder on COVID-19 tasks during fine-tuning, which provides significant performance gains and fast convergence.
Background
In this section, we review the recent related research on COVID-19 CT images, COVID-19 segmentation and transfer learning for COVID-19.
Research on COVID-19 CT Images
CT is widely used in the screening and diagnosis of COVID-19, since ground-glass opacities and bilateral patchy shadows are the most relevant imaging features in pneumonia associated with infections. Recent research on CT-based diagnosis of COVID-19 has shown strong performance. Compared with traditional CT image processing, artificial intelligence (AI) serves as a core technology, enabling a more accurate and efficient solution. In [27], machine learning-based CT radiomics models such as random forest and logistic regression showed feasibility and accuracy for predicting the hospital stay of affected patients. Machine learning was also adopted by Tang et al. [28] to realize automatic severity assessment (non-severe or severe) of COVID-19 based on chest CT images, and to explore the severity-related features from the resulting assessment model. Gozes et al. [29] presented a system that utilized robust 2D and 3D deep learning models, modifying and adapting existing AI models and combining them with clinical understanding. Huang et al. [30] used an automated deep learning method on serial CT scans to monitor disease progression and quantitatively characterize the temporal evolution of COVID-19.
Research on COVID-19 Segmentation
Segmentation is an essential step in AI-based image processing and analysis [10]. In particular, segmenting the regions of interest (ROIs) of COVID-19 infections is crucial and helps doctors make further assessment and quantification. However, manual contouring of these infections is time-consuming and tedious. Although plenty of methods have been explored for COVID-19 diagnosis and classification, there are very few works on the segmentation of COVID-19 infection because of the great annotation challenges.
Shan et al. [13] developed a DL-based system for automatic segmentation and quantification of infection regions and adopted a human-in-the-loop (HITL) strategy to accelerate the manual delineation of CT images for training. Zheng et al. [15] designed a weakly-supervised deep learning algorithm to investigate the potential of a deep learning-based model for automatic COVID-19 detection on chest CT volumes using weak patient-level labels. Based on semi-supervised learning, Fan et al. [12] presented a COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) for CT slices based on randomly selected propagation. Yan et al. [14] introduced a feature variation block which adaptively adjusts the global properties of the features for segmenting COVID-19 infection.
Research on COVID-19 Transfer Learning
In transfer learning, deep models are first trained on large datasets such as ImageNet, and these pre-trained models are then fine-tuned on different downstream tasks. Several studies [25,31,32] have investigated transfer learning methodologies for deep neural networks applied to medical image analysis tasks. Many works used networks pre-trained on natural images to extract features, followed by another classifier [33,34]. Carneiro et al. [34] replaced the fully connected layers of a pre-trained model with a new logistic layer and trained only the appended layer, yielding promising results for the classification of unregistered multi-view mammograms. Other studies fine-tuned layers of the pre-trained networks to adapt the learned features to the target domain. In [35], CNNs were pre-trained as a feature generator for chest pathology identification. Gao et al. [36] fine-tuned all the layers of a pre-trained CNN to classify interstitial lung diseases. Ghafoorian et al. [37] trained a CNN on legacy MR images of the brain and evaluated the performance of the domain-adapted network on the same task with images from a different domain.
Due to the limited labeled COVID-19 data, several transfer learning methods have been applied to address this problem. Chouhan et al. [38] proposed an ensemble approach to transfer learning using models pre-trained on ImageNet. The researchers used five different pre-trained models and analyzed their performance on chest X-ray images. In [39], several deep CNNs were employed for automatic COVID-19 infection detection from X-ray images through parameter tuning. Majeed et al. [40] presented a critical analysis of 12 off-the-shelf CNN models and proposed a simple CNN architecture with a small number of parameters that performed well in distinguishing COVID-19 from normal X-rays. By combining three different models fine-tuned on 3 datasets, Misra et al. [41] designed a multi-channel ensemble transfer learning method based on ResNet-18, so that the model could extract more relevant features for each class and identify COVID-19 features more accurately from X-ray images.
Even though the above studies on transfer learning present inspiring achievements for COVID-19 research, they have several limitations: 1) They focus only on ensembles of existing CNNs and ignore the contribution of various datasets for pre-training. 2) They are limited to X-ray datasets and are dedicated only to COVID-19 detection and classification. 3) They lack an in-depth study of the transferability of different transfer manners related to COVID-19 infections. Our work addresses the much more difficult task of semantic segmentation in COVID-19 CT images. We explore transferability from the perspective of transferring knowledge from various non-COVID19 lung lesions. Moreover, we investigate better transfer methods to assist COVID-19 segmentation.
Methodology
In this section, we briefly describe our backbone network in 3.1, then introduce the multi-task learning method used to obtain a multi-lesion pre-trained model in 3.2. We then describe in detail the four transfer strategies employed in our work in 3.3-3.5. An overall comparison of the four strategies is presented in Table 1.
Table 1
General comparison of the four transfer learning strategies.
Encoder-Decoder Network
The U-Net is a commonly used network for medical semantic segmentation. It has a U-shaped structure with an encoding and a decoding signal path. The encoder serves as a contracting path that captures semantic contextual image features. The decoder is a symmetric expanding path that recovers spatial information. The two paths are connected by skip connections at each level, which recombine the decoder features with essential high-resolution features from the encoding path. In this work, to better explore transferability for COVID-19 segmentation, we build a strong 3D U-Net as the baseline following nnU-Net [42], which has surpassed most existing approaches on 23 public datasets in segmentation tasks. Instead of using a complex architecture, nnU-Net builds directly on the original U-Net architecture and automatically adapts itself to the specifics of the COVID-19 dataset. Therefore, it is much more convenient to re-implement and to adopt as a basis for exploring transferability. The original batch normalization and ReLU are replaced by instance normalization and leaky ReLU. In addition, a deep supervision loss [43] is aggregated to obtain multi-level deep supervision and facilitate the training process.
Pre-trained Multi-lesion Learning
Multi-task learning is an effective method to share knowledge among different related tasks. For segmentation, the performance on each task depends heavily on the similarity among the tasks. Due to the large domain distance among the existing non-COVID19 lung lesion datasets, fusing these lesions to train a multi-task segmentation model tends to underperform on each single task. However, multi-task training can exploit shared knowledge, which is essential for learning general-purpose visual representations of lung lesions. Therefore, besides separately training segmentation models on each non-COVID19 dataset as pre-trained models for transfer learning, we provide a multi-lesion model that learns from multiple lung lesions. Compared with learning from separate tasks, this multi-task strategy leads to a more robust pre-trained model across all lesion tasks and benefits the downstream COVID-19 task.
Continual Learning
Continual Learning (CL) aims to learn from an endless stream of tasks [44]. It is built on the idea of learning continuously and adaptively about the external world, enabling the autonomous incremental learning of more complex skills and knowledge. This paradigm is capable of learning consecutive tasks without forgetting how to perform previously trained tasks. This is challenging because training tends to lose knowledge from previous tasks as information relevant to the current task is incorporated. To avoid this, we adopt a training schedule to pre-train the model.
While pre-training on the upstream task, the model is trained with a high initial learning rate, allowing the network to obtain optimal weights. The initial learning rate is set to 0.01 and decays throughout the training process following the 'poly' learning rate policy. When the model has been trained to convergence, the learning rate has become a much smaller value, which is then used as the initial learning rate when training the downstream COVID-19 task to prevent significant changes in the network parameters.
In the second, downstream phase on COVID-19 infections, with the weights of the pre-trained model and the small learning rate as a starting point, the model is trained following the same decay policy. In this way, the learning rate keeps decreasing, so that the weights obtained after training on non-COVID19 lesion tasks tend to follow their earlier training trajectory while being slightly updated by the current COVID-19 infection task. This continual learning strategy is expected to smoothly update the prediction model to take into account new tasks and data distributions (COVID-19) while still re-using and retaining useful knowledge and skills in the pre-trained model (non-COVID19).
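The two-phase schedule described above can be illustrated with a small sketch. The 'poly' policy is taken here to be the common form lr = initial_lr * (1 - epoch / max_epochs) ** exponent; the exponent of 0.9 and the epoch counts are assumptions made for illustration and are not stated in the text.

```python
def poly_lr(initial_lr: float, epoch: int, max_epochs: int, exponent: float = 0.9) -> float:
    """Polynomial ('poly') decay: initial_lr * (1 - epoch / max_epochs) ** exponent."""
    if not 0 <= epoch < max_epochs:
        raise ValueError("epoch must satisfy 0 <= epoch < max_epochs")
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


# Phase 1: upstream pre-training on non-COVID19 lesions, starting from 0.01.
UPSTREAM_EPOCHS = 1000  # hypothetical schedule length
upstream_lrs = [poly_lr(0.01, e, UPSTREAM_EPOCHS) for e in range(UPSTREAM_EPOCHS)]

# Phase 2: downstream COVID-19 training restarts the same policy from the
# small learning rate reached at the end of pre-training.
DOWNSTREAM_EPOCHS = 1000  # hypothetical schedule length
downstream_initial_lr = upstream_lrs[-1]
downstream_lrs = [
    poly_lr(downstream_initial_lr, e, DOWNSTREAM_EPOCHS)
    for e in range(DOWNSTREAM_EPOCHS)
]
```

Under these assumed settings the downstream phase starts several orders of magnitude below 0.01, which is what keeps the pre-trained weights from changing abruptly.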
Fine-tuning
Fine-tuning is the most standard strategy to transfer knowledge from domains where data are abundant. In general, it is conducted by copying the weights from a pre-trained network and tuning them on the downstream task. Recent work [45] shows that fine-tuning can achieve better performance on small datasets. It is of additional interest to assess the contributions of the encoder and the decoder to learning COVID-19 knowledge. While progressive reinitialization demonstrates the incremental effect of each layer, it is unnecessary to probe the extent of localized reinitialization because our encoder-decoder network is not very deep. Therefore, we adopt two strategies for tuning a non-COVID19 lesion model on the COVID-19 downstream task.
Body Fine-tuning
Due to the large domain difference between the pre-training tasks and the downstream task, the safest fine-tuning method is Body Fine-tuning, in which all parameters of the pre-trained model are used as initial values for training. When we train COVID-19 infection networks, all of the parameters are initialized from the models pre-trained on non-COVID19 lesions. In this strategy, the parameter updates depend largely on the COVID-19 infection training process itself. It is therefore a conservative fine-tuning approach in which the COVID-19 infection training task is not affected too much by the upstream pre-trained non-COVID19 models.
Pre-trained Lesion Representations
Our segmentation model is an encoder-decoder architecture, and the encoder serves as a series of convolution operations that encode image features into context representations. These representations are trained on large related datasets from upstream tasks and fed as features to downstream ones. In natural language processing tasks, features extracted from the internal representations of sequence language models are encoded as pre-trained text representations [44,45]. In this fine-tuning strategy, we aim to learn general lung lesion features, which we call Pre-trained Lesion Representations. We first train models on non-COVID19 lung lesion datasets. The encoders of these models are capable of encoding lesion features. In other words, the encoders' parameters can exhibit some transferability. Thus, in the subsequent fine-tuning process on the COVID-19 dataset, we preserve and freeze them while fine-tuning and re-training only the decoding parts.
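The difference between the two fine-tuning strategies comes down to which parameters remain trainable. The following is a minimal, framework-free sketch of that selection; the parameter names and the `encoder.` / `decoder.` prefixes are hypothetical and only stand in for the real network.

```python
from typing import Dict, Iterable, List


def split_trainable(param_names: Iterable[str], frozen_prefixes: Iterable[str]) -> Dict[str, List[str]]:
    """Partition parameter names into frozen and trainable groups by name prefix."""
    prefixes = tuple(frozen_prefixes)
    groups: Dict[str, List[str]] = {"frozen": [], "trainable": []}
    for name in param_names:
        # str.startswith accepts a tuple; an empty tuple never matches.
        key = "frozen" if name.startswith(prefixes) else "trainable"
        groups[key].append(name)
    return groups


# Hypothetical parameter names of an encoder-decoder segmentation network.
PARAMS = [
    "encoder.stage0.conv.weight",
    "encoder.stage1.conv.weight",
    "decoder.stage0.conv.weight",
    "decoder.out.weight",
]

# Body Fine-tuning: all pre-trained weights are used as initial values and updated.
body_finetuning = split_trainable(PARAMS, frozen_prefixes=[])

# Pre-trained Lesion Representations: the encoder is frozen, the decoder is re-trained.
lesion_representations = split_trainable(PARAMS, frozen_prefixes=["encoder."])
```

In a PyTorch implementation, the frozen group would typically be handled by setting `requires_grad = False` on those parameters and passing only the trainable ones to the optimizer.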
Hybrid-encoder Learning
When applying the above fine-tuning strategies, we face two challenges: 1) Body Fine-tuning easily falls into over-fitting because the downstream COVID-19 infection dataset is much smaller. 2) Using the encoder of pre-trained models to capture feature representations is unstable, because the label spaces and losses of the upstream and downstream tasks inevitably differ. For example, although they are all lung lesions, they differ in appearance and shape. Therefore, we present a new transfer learning strategy for COVID-19, which incorporates three key properties:
1) It leverages the transferred knowledge from non-COVID19 lung lesions.
2) It gains stable performance improvement in both training from scratch and transfer learning methods.
3) It shows no obvious increase in training time.
To achieve these properties, we propose a Hybrid-encoder architecture. As shown in Fig. 1, we enhance the standard U-Net by equipping it with two encoders of the same architecture: a Dedicated-encoder and an Adapted-encoder. Furthermore, an attention-based Selective Fusion unit is developed to aggregate information from both encoders by determining two sets of learnable weights.
Fig. 1
An overview of Hybrid-encoder transfer learning strategy. The U-Net network consists of a parallel encoder and a shared decoder. Features from Dedicated-encoder and Adapted-encoder are aggregated via a Selective Fusion unit. For simplicity the figure shows 2D images.
Dedicated-encoder
The Dedicated-encoder is a task-specific feature extractor that focuses on segmenting COVID-19 infection, with all of its parameters reinitialized. These parameters are all COVID-19-specific and are continually updated during training.
Adapted-encoder
The Adapted-encoder is an auxiliary feature extractor that aims to learn general lung lesion features. Based on the Pre-trained Multi-lesion Learning in 3.2, we pre-train the encoder of our 3D U-Net to obtain dense representations of general lung lesions. When transferred to the target COVID-19 task, this pre-trained encoder serves as the Adapted-encoder, with its pre-trained parameters $\theta_a$ completely frozen.
Given a COVID-19 input volume $x$ as the sample, it is passed through the two parallel encoders. Each encoder follows the same 3D U-Net design, which consists of a number of stacked convolution layers and pooling layers. More specifically, let $d^{l}$ and $a^{l}$ be the outputs of layer $l$ of the Dedicated-encoder and the Adapted-encoder, respectively. These can be obtained from the outputs of the previous layer, $d^{l-1}$ and $a^{l-1}$, by a mapping $\mathcal{M}$:
$$d^{l} = \mathcal{M}(d^{l-1}) = \sigma\big(\mathrm{IN}(W_d^{l} * d^{l-1})\big), \qquad a^{l} = \mathcal{M}(a^{l-1}) = \sigma\big(\mathrm{IN}(W_a^{l} * a^{l-1})\big),$$
where $W$ represents the weight matrix, $*$ denotes the convolution operation, and $\mathrm{IN}$ and $\sigma$ represent instance normalization and leaky ReLU, respectively.
At the end of the encoding phase, the Dedicated-encoder and Adapted-encoder representations with $C$ channels, containing rich semantic information, are learned separately from the COVID-19 and non-COVID19 datasets. They are denoted as $F_d$ and $F_a$, and the two encoders are parameterized by $\theta_d$ and $\theta_a$, respectively.
Selective Fusion
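Inspired by [46], we design a Selective Fusion operation to combine and aggregate the information from both encoders into a global and comprehensive representation for the decoding phase. To achieve this, we first fuse the Dedicated-encoder output $F_d$ and the Adapted-encoder output $F_a$ using element-wise summation:
$$F = F_d + F_a.$$
We then apply global average pooling (gp) to shrink $F$ through its 3D spatial dimensions $D \times H \times W$. The $c$-th element of the channel-wise statistics $s \in \mathbb{R}^{C}$ is calculated by:
$$s_c = \mathrm{gp}(F_c) = \frac{1}{D \times H \times W} \sum_{i=1}^{D} \sum_{j=1}^{H} \sum_{k=1}^{W} F_c(i, j, k).$$
The output $s$ can be interpreted as a set of integrated local descriptors that provide selection guidance. To exploit channel-wise dependencies, we apply a fully connected layer (fc) with weights $W_{fc}$ to reduce the dimension to $C/r$ with a reduction ratio $r$:
$$z = \mathrm{fc}(s) = W_{fc}\, s, \qquad W_{fc} \in \mathbb{R}^{\frac{C}{r} \times C}.$$
To adaptively select information from the two encoders, we use a softmax operation to obtain soft attention across channels:
$$\alpha_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad \beta_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}},$$
where $\alpha$ and $\beta$ represent the soft attention vectors for $F_d$ and $F_a$, respectively, and $A_c$ and $B_c$ are the $c$-th rows of the two sets of learnable weights $A, B \in \mathbb{R}^{C \times \frac{C}{r}}$. By applying these soft attentions using Eqn. 6, features from the two parallel encoders are dynamically selected as the incorporated COVID-19 feature representation and then fed into the decoding part.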
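The fusion steps described above (summation, global average pooling, fully connected reduction, softmax across the two branches) can be traced in a small pure-Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: each feature map is flattened to a list of voxel values per channel, the projections `a` and `b` follow the selective-kernel style of attention, and the final step is assumed to be a channel-wise weighted sum of the two encoder outputs. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def selective_fusion(f_d: Matrix, f_a: Matrix, w: Matrix, a: Matrix, b: Matrix) -> Tuple[Matrix, Vector, Vector]:
    """Fuse two C-channel feature maps, each given as C lists of voxel values."""
    # 1) Element-wise summation of the two encoder outputs.
    fused = [[d + x for d, x in zip(ch_d, ch_a)] for ch_d, ch_a in zip(f_d, f_a)]
    # 2) Global average pooling over the flattened spatial positions.
    s = [sum(ch) / len(ch) for ch in fused]
    # 3) Fully connected reduction from C to C / r dimensions.
    z = matvec(w, s)
    # 4) Softmax over the two branches, computed independently per channel.
    alpha: Vector = []
    beta: Vector = []
    for u, v in zip(matvec(a, z), matvec(b, z)):
        m = max(u, v)  # subtract the maximum for numerical stability
        eu, ev = math.exp(u - m), math.exp(v - m)
        alpha.append(eu / (eu + ev))
        beta.append(ev / (eu + ev))
    # 5) Assumed aggregation: channel-wise weighted sum of both encoders.
    out = [
        [al * d + be * x for d, x in zip(ch_d, ch_a)]
        for al, be, ch_d, ch_a in zip(alpha, beta, f_d, f_a)
    ]
    return out, alpha, beta


# Toy example: C = 2 channels, 4 voxels per channel, reduction ratio r = 2.
F_D = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
F_A = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
W_FC = [[0.5, 0.5]]      # shape (C / r) x C
A = [[1.0], [0.0]]       # shape C x (C / r)
B = [[0.0], [0.0]]       # shape C x (C / r)
OUT, ALPHA, BETA = selective_fusion(F_D, F_A, W_FC, A, B)
```

Because the two attention weights of a channel sum to one, each output channel is a convex combination of the corresponding Dedicated-encoder and Adapted-encoder channels.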
Experiments
In this section, we will describe in detail the datasets, experimental setup and results of our investigation.
Dataset Introduction
COVID-19 Dataset
This dataset is released by the Coronacases Initiative and Radiopaedia¹. It is a publicly available COVID-19 volume dataset which contains 20 COVID-19 CT scans. In the work of [26,47], the left lung, right lung, and infections are well-labelled by two radiologists and verified by an experienced radiologist. Thus, with over 1800 annotated slices, this dataset serves as the downstream COVID-19 infection segmentation dataset for transfer learning in our work.
Non-COVID19 Lung Lesion Datasets
In order to better explore the transferability from various non-COVID19 lung lesions to COVID-19 infections, the following conditions need to be satisfied among these datasets: 1) The sizes of the different lesion datasets are similar. 2) The shape, size and location of the different lesion areas are relatively distinct. Therefore, in this paper, we introduce three public datasets.
MSD Lung Tumor. This dataset was used in a crowd-sourced challenge of generalized semantic segmentation algorithms called the Medical Segmentation Decathlon (MSD), held at MICCAI 2018². It includes patients with non-small cell lung cancer from Stanford University (Palo Alto, CA, USA), publicly available through TCIA and previously utilized to create a radiogenomic signature [48,49,50]. The tumor regions are denoted by an expert thoracic radiologist on a representative CT cross section using OsiriX [51]. 63 3D CT scans with corresponding tumor segmentation masks are utilized in our paper.
StructSeg Lung Cancer. The StructSeg organ dataset is a collection of 3D organ CT scans along with their segmentation ground truth from the 2019 MICCAI challenge³. This dataset contains two types of cancer, nasopharyngeal cancer and lung cancer. We adopt the gross target volume segmentation of lung cancer from 50 patients. Each CT scan is annotated by an experienced oncologist and verified by another one.
NSCLC Pleural Effusion. This dataset is developed from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects⁴. 78 cases with pleural effusion are selected with their segmentation masks.
Experimental Settings
All the experiments are implemented in PyTorch and trained on an NVIDIA Tesla V100 32GB GPU. For fair comparison, we follow the settings of the COVID-19 dataset benchmarks in [26].
For the COVID-19 dataset, we use 5-fold cross validation based on a pre-defined dataset split file. Each fold contains 4 scans (20%) for training and 16 (80%) for testing. Training on less data is more suitable for exploring the contribution of transfer learning.
For the non-COVID19 lung lesion datasets, we randomly select 80% of the data for training and the remaining 20% for validation. Pre-trained models based on these non-COVID19 lesions are all trained from scratch with random initial parameters using the 3D U-Net network. Due to the limited number of different lesion cases, we perform data pre-processing following nnU-Net [42]. The input patch size is set as 50192 and batch size as 2, which should be chosen carefully [52]. A stochastic gradient descent optimizer with an initial learning rate of 0.01 and a Nesterov momentum of 0.99 is used for non-COVID19 pre-training. The reduction ratio is set as 16, following [46]. We adopt the sum of Dice loss and cross-entropy loss as the loss function.
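The loss described above, a sum of Dice loss and cross-entropy loss, can be illustrated on a flattened binary foreground/background problem. This is a simplified sketch under assumptions: it uses a soft Dice term on foreground probabilities and a binary cross-entropy term with equal weights, and it ignores deep supervision; the smoothing constants are illustrative choices.

```python
import math
from typing import Sequence


def soft_dice_loss(probs: Sequence[float], target: Sequence[int], eps: float = 1e-5) -> float:
    """1 - soft Dice between predicted foreground probabilities and a binary target."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy_loss(probs: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_ce_loss(probs: Sequence[float], target: Sequence[int]) -> float:
    """Unweighted sum of the soft Dice loss and the cross-entropy loss."""
    return soft_dice_loss(probs, target) + cross_entropy_loss(probs, target)


TARGET = [1, 0, 1, 0]
loss_perfect = dice_ce_loss([1.0, 0.0, 1.0, 0.0], TARGET)
loss_good = dice_ce_loss([0.9, 0.1, 0.9, 0.1], TARGET)
loss_uncertain = dice_ce_loss([0.5, 0.5, 0.5, 0.5], TARGET)
```

The Dice term responds to region overlap and is less sensitive to the large background, while the cross-entropy term provides a voxel-wise signal; combining them is a common choice for highly unbalanced segmentation targets.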
Evaluation Metrics
Diagnostic evaluation is often used in clinical practice for disease diagnosis, patient follow-up and efficacy monitoring. Whether the results of a diagnostic evaluation are accurate, reliable and practical largely determines whether a reasonable medical decision can be made. In this work, we use six evaluation metrics to explore transferability.

Dice similarity coefficient (DSC) measures the volumetric overlap between segmentation results and annotations. It is computed by:

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

where A is the set of foreground voxels in the annotation and B is the corresponding set of foreground voxels in the segmentation result.

Normalized surface distance (NSD) [53] serves as a distance-based measurement to assess performance. It is computed by:

$$\mathrm{NSD}(A, B) = \frac{|\partial A \cap B_{\partial B}^{(\tau)}| + |\partial B \cap B_{\partial A}^{(\tau)}|}{|\partial A| + |\partial B|}$$

where $\partial A$ and $\partial B$ are the surfaces of the ground-truth and segmentation masks, and $B_{\partial A}^{(\tau)}$ and $B_{\partial B}^{(\tau)}$ represent the borders of the ground-truth and segmentation masks, which use a threshold $\tau$ to tolerate the inter-rater variability of the annotators. A fixed value of $\tau$ is used for COVID-19 infection. In contrast to the DSC, which measures the overlap of volumes, the NSD measures the overlap of two surfaces.

We also consider four other evaluation metrics. Accuracy denotes the rate of correct predictions over both positive and negative instances. Sensitivity is the percentage of positive instances that are correctly identified as positive. F1-score is the harmonic mean of Precision and Sensitivity, which makes it an effective and comprehensive measure. MCC (Matthews correlation coefficient) is a more reliable statistical rate that produces a high score only if the prediction obtains good results in all four confusion matrix categories (true positive, false positive, true negative and false negative). Therefore, MCC is well suited to our experiments on highly unbalanced binary label classes.
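As a worked illustration of the overlap-based and confusion-matrix metrics described above, the following minimal sketch computes DSC, Sensitivity, F1-score, Accuracy and MCC for one flattened binary mask. The helper names and the toy mask are hypothetical and the paper's evaluation code may differ; NSD is not covered because it requires surface extraction.

```python
import math

def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN over two equally long sequences of 0/1 labels."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def segmentation_metrics(pred, truth):
    """Return the voxel-wise metrics for one binary segmentation."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # DSC = 2|A ∩ B| / (|A| + |B|), written with confusion-matrix counts.
    dsc = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 1.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"dsc": dsc, "sensitivity": sensitivity, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}

# Hypothetical flattened mask: 1000 voxels, of which 20 are infected.
truth = [1] * 20 + [0] * 980
# The prediction finds 14 infected voxels, misses 6 and adds 4 false positives.
pred = [1] * 14 + [0] * 6 + [1] * 4 + [0] * 976

metrics = segmentation_metrics(pred, truth)
```

In this toy case Accuracy is 0.99 while DSC is about 0.74, which shows why Accuracy alone says little when infected voxels are rare. On a single binary mask, DSC and F1-score coincide algebraically.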
Analysis of Different Non-COVID19 Pre-trained Models
All four pre-trained models (MSD, StructSeg, NSCLC, Multi-lesion) are used to investigate how well transfer learning carries over to the COVID-19 dataset. From the extensive experiments in Table 2, we observe that different pre-trained models show different transferability. The best scores for each transfer learning strategy are highlighted, along with a detailed comparison across the 5 validation folds.
Table 2
Results of 5-fold cross validation of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in red font.
Single-lesion Pre-trained Model
Among the pre-trained models on a single lesion (MSD, StructSeg, NSCLC), the MSD tumor pre-trained model improves segmentation over training on the COVID-19 dataset from scratch by at most 3.0% DSC and 3.2% NSD. Meanwhile, the StructSeg and NSCLC lung lesion pre-trained models are unstable across transfer strategies. As shown in Table 2, with the Continual Learning and Body Fine-tuning strategies, the StructSeg and NSCLC pre-trained models achieve promising improvements in most folds, but they obtain lower performance with the Pre-trained Lesion Representations strategy. The rationale is that there is a large domain distance between these two lesions and COVID-19 infection, and the Pre-trained Lesion Representations strategy relies entirely on encoded non-COVID19 lesion features. Thus, the transferability of a single-lesion model depends largely on its domain difference from COVID-19 infection.
Multi-lesion Pre-trained Model
As shown in Table 2, across all transfer learning strategies the multi-lesion pre-trained model not only achieves high DSC and NSD values, but is also the most stable of all pre-trained models, with average DSC values of 0.696, 0.696, 0.693 and 0.704 for Continual Learning, Body Fine-tuning, Pre-trained Lesion Representations and Hybrid-encoder Learning, respectively. This shows the robustness of a multi-lesion pre-trained model used for transfer learning to COVID-19 infection. Table 3 further verifies this finding using Sensitivity, F1-score, Accuracy and MCC for the different transfer strategies. Transferring from the multi-lesion pre-trained model outperforms training from scratch on all of these evaluation metrics under all strategies. Notably, with Pre-trained Lesion Representations, where the encoder of the pre-trained model is entirely frozen and serves as a non-COVID19 lesion feature extractor, the multi-lesion pre-trained model still performs well. This confirms the effectiveness of multi-task training on multiple lung lesions, which generates more robust and general-purpose representations that help COVID-19 infection tasks.

Fig. 2 (a)-(e) shows examples of segmentation results for the above pre-trained models. Compared with training from scratch, pre-training on a single non-COVID19 lesion dataset recovers the large structures of COVID-19 infection more accurately but is unsatisfactorily unstable. In contrast, pre-training on the multi-lesion model yields high precision and smooth boundaries that resemble the manual annotations.
Table 3
Results of average Sensitivity, F1-score, Accuracy and MCC of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in red font.
Fig. 2
Visual comparison of COVID-19 infection segmentation results. Training from scratch means using no pre-trained models. MSD, StructSeg, NSCLC and Multi-lesion mean using pre-trained models from corresponding non-COVID19 lesion datasets.
Analysis of Different Transfer Learning Strategies
Based on the above conclusion that the multi-lesion pre-trained model brings consistently higher accuracy and robustness, we further analyze the performance of the different transfer strategies adopted in this paper.

Table 4 shows the results of COVID-19 infection segmentation using the same multi-lesion pre-trained model under different transfer strategies. These results suggest that all the transfer strategies improve on training from scratch in average DSC, NSD and Sensitivity. In fold 2 in particular, the largest improvement exceeds 6.4% DSC and 6.6% NSD. The Continual Learning and Body Fine-tuning strategies obtain similarly promising results, both improving segmentation by 2.3% DSC, 1.8% NSD and 3.4% Sensitivity on average.
Table 4
Results of 5-fold cross validation of different transfer learning strategies based on Multi-lesion pre-trained model. Time represents the training time for an epoch. The best results are shown in red font.
An interesting observation is that, compared with Continual Learning and Body Fine-tuning, where all the parameters are updated on the COVID-19 dataset, the Pre-trained Lesion Representations strategy still achieves competitive performance with an entirely frozen encoder and the inherited weights of pre-trained non-COVID19 models. It improves the DSC from 0.673 to 0.693, the NSD from 0.700 to 0.716 and the Sensitivity from 0.643 to 0.662. In terms of training cost, Table 4 shows the training time (per epoch) for each transfer strategy. The Pre-trained Lesion Representations strategy takes just 148s per epoch on average, much less than training from scratch and the other transfer strategies. Because the encoder is frozen, this strategy cuts the number of parameters that need to be updated by nearly half. Thus, it is a promising strategy for COVID-19 transfer learning when the aim is to save training cost and converge quickly.

Table 4 also shows that our proposed Hybrid-encoder transfer learning strategy exhibits significantly better segmentation performance than the other strategies using the multi-lesion pre-trained model. It improves the average DSC from 0.673 to 0.704 and the NSD from 0.700 to 0.735, which is also the best result among all the pre-trained models in Table 2. In terms of Sensitivity, F1-score, Accuracy and MCC in Table 3, the proposed Hybrid-encoder learning outperforms the other strategies and raises the values to 0.6818, 0.7069, 0.9943 and 0.7162, respectively. The transfer ability of the Hybrid-encoder is also confirmed by Fig. 3. Compared with training from scratch, Hybrid-encoder learning yields segmentation results with more accurate boundaries in Fig. 3 (b)(c)(e) and identifies some minor COVID-19 infection areas in Fig. 3 (a)(d)(f). The success of the proposed Hybrid-encoder learning strategy is owed to the parallel encoders, where both COVID-19 and non-COVID19 lesions are used to encode feature representations, leading to better generalization and lower over-fitting risks.
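To make the idea of fusing features from two parallel encoders concrete, the sketch below shows one possible attention-based fusion of two feature maps in NumPy. It is an assumption-laden illustration in the style of selective-kernel attention, not the authors' Selective Fusion unit: the function `selective_fusion`, the weight matrices and the toy shapes are all hypothetical, and only the reduction ratio of 16 is taken from the experimental settings.

```python
import numpy as np

def selective_fusion(feat_dedicated, feat_adapted, w_reduce, w_dedicated, w_adapted):
    """Fuse two (C, D, H, W) feature maps with per-channel attention weights."""
    # 1) Combine both branches and squeeze the spatial dimensions.
    summed = feat_dedicated + feat_adapted
    squeezed = summed.mean(axis=(1, 2, 3))                # shape (C,)

    # 2) Bottleneck with C // r hidden units, followed by ReLU.
    hidden = np.maximum(w_reduce @ squeezed, 0.0)         # shape (C // r,)

    # 3) One logit vector per branch, softmax across the two branches.
    logits = np.stack([w_dedicated @ hidden, w_adapted @ hidden])  # (2, C)
    logits -= logits.max(axis=0, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)

    # 4) Channel-wise weighted sum of the two branches.
    fused = (weights[0][:, None, None, None] * feat_dedicated
             + weights[1][:, None, None, None] * feat_adapted)
    return fused, weights

rng = np.random.default_rng(0)
channels, reduction = 32, 16                  # reduction ratio 16, as in the settings
f_d = rng.standard_normal((channels, 4, 8, 8))   # stand-in for Dedicated-encoder features
f_a = rng.standard_normal((channels, 4, 8, 8))   # stand-in for Adapted-encoder features
w_r = rng.standard_normal((channels // reduction, channels)) * 0.1
w_d = rng.standard_normal((channels, channels // reduction)) * 0.1
w_a = rng.standard_normal((channels, channels // reduction)) * 0.1

fused, weights = selective_fusion(f_d, f_a, w_r, w_d, w_a)
```

Because the two attention weights of each channel sum to one, every fused channel is a convex combination of the corresponding channels from the two encoders, so the unit can lean on whichever branch is more informative for that channel.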
Fig. 3
Visual comparison of COVID-19 infection segmentation results between training from scratch and proposed Hybrid-encoder transfer learning strategy based on multi-lesion pre-trained model.
Discussion
A valuable finding is that a multi-lesion pre-trained model can make the best of multiple lung lesion representations and improve the generalization and robustness of pre-trained models. The rationale for transferring from non-COVID19 lesions to COVID-19 lies in the similarity of their features and textures in CT images. Instead of starting the learning process from scratch, the CNN first learns how to extract features during pre-training on a large and diverse multi-lesion dataset, so that its parameters acquire appropriate values. Since the two datasets share common features, the model can pre-learn shared knowledge about the shape, color and edge of lung lesions. When a new dataset (i.e. the limited COVID-19 infection data) is given, the pre-trained CNN can start from patterns it has already learned and then focus on the specific concept of COVID-19 infection while training on the downstream task. It is worth recognizing that the CT appearances of these different lung lesions share some similarity. Thus, with more kinds of lung lesion datasets incorporated into pre-training, we could achieve better performance. This exploration is an important contribution, enabling more research on transfer learning to COVID-19 infection from the perspective of using non-COVID19 lung lesions.

Moreover, this paper examines a series of transfer learning strategies, including Continual Learning, Body Fine-tuning, Pre-trained Lesion Representations and the proposed Hybrid-encoder Learning. We observe segmentation improvement in all performance metrics. The Pre-trained Lesion Representations strategy, with its frozen encoder, also enhances performance. This gives more insight into the significant transferability exhibited by the encoding parameters. Though the encoding layers do not contain any explicit knowledge of COVID-19 infection, their parameters still enable the optimizer to reach higher performance during fine-tuning. The rationale is that the encoded multi-lesion representations contain richer, higher-level information about the medically relevant lung lesions observed in CT images. By combining the multi-lesion representations with COVID-19 infection features, the proposed Hybrid-encoder achieves significant improvement. These observations are important not only for COVID-19 transfer learning but also for the general medical domain, because feature reuse from pre-training on out-of-domain datasets brings significant improvement in task performance and training convergence.
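The frozen-encoder behaviour discussed above can be illustrated with a deliberately small NumPy toy, in which inherited encoder weights stay fixed and only a decoder is optimized on a small target dataset. Everything here is hypothetical (random stand-in weights, a linear decoder and a regression loss), so it shows only the mechanics of freezing, not the 3D U-Net setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: an inherited encoder (frozen) and a freshly initialized decoder.
w_encoder = rng.standard_normal((16, 8))   # inherited weights, never updated
w_decoder = np.zeros(8)                    # the only parameters being trained
w_encoder_before = w_encoder.copy()

# Hypothetical small target dataset: 20 samples with 16 input features each.
x = rng.standard_normal((20, 16))
y = rng.standard_normal(20)

def encode(inputs):
    """ReLU features produced by the frozen encoder."""
    return np.maximum(inputs @ w_encoder, 0.0)

def mse(w_dec):
    """Mean squared error of the decoder on the target dataset."""
    return float(np.mean((encode(x) @ w_dec - y) ** 2))

# With a frozen encoder the features can be computed once and reused.
features = encode(x)

# Step size 1/L, where L is the Lipschitz constant of the gradient of this
# least-squares loss; it guarantees that the loss does not increase.
lipschitz = 2.0 / len(x) * np.linalg.norm(features, 2) ** 2
step = 1.0 / lipschitz

loss_before = mse(w_decoder)
for _ in range(200):
    residual = features @ w_decoder - y
    grad_decoder = 2.0 / len(x) * features.T @ residual
    w_decoder = w_decoder - step * grad_decoder   # the encoder is never touched
loss_after = mse(w_decoder)

trainable = w_decoder.size
total = w_encoder.size + w_decoder.size
```

Only the decoder parameters receive gradient updates, so both the number of updated parameters and the cost of each step shrink, which is the mechanism behind the shorter per-epoch time reported for this strategy.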
Conclusion
In this paper, we investigate transferability in COVID-19 CT segmentation. We present a set of experiments to better understand how different non-COVID19 lung lesions influence the performance of COVID-19 infection segmentation and how their transfer ability varies under different transfer learning strategies. Our results reveal clear benefits of pre-training on non-COVID19 lung lesion datasets when public labelled COVID-19 datasets are inadequate to train a robust deep learning model. Among all the strategies, our proposed Hybrid-encoder Learning method based on the multi-lesion pre-trained model effectively uses transferred non-COVID19 lung lesion knowledge and gains significant improvement.

Future research directions include using a wider variety of non-COVID19 lung lesion datasets and investigating better transfer learning methods, so that non-COVID19 lung lesions can be used effectively to improve the quality of COVID-19 infection segmentation in the absence of sufficient high-quality COVID-19 datasets.
Declaration of Competing Interest
The authors have no conflict of interest to disclose.