Robust weakly supervised learning for COVID-19 recognition using multi-center CT images.

Qinghao Ye^1,2, Yuan Gao^3,4, Weiping Ding^5, Zhangming Niu^4, Chengjia Wang^6, Yinghui Jiang^1,7, Minhao Wang^1,7, Evandro Fei Fang^8, Wade Menpes-Smith^4, Jun Xia^9, Guang Yang^10,11.

Abstract

The world is currently experiencing an ongoing pandemic of an infectious disease named coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Computed tomography (CT) plays an important role in assessing the severity of the infection and can also be used to identify symptomatic and asymptomatic COVID-19 carriers. With the surge in the cumulative number of COVID-19 patients, radiologists are increasingly strained by examining CT scans manually. An automated 3D CT scan recognition tool is therefore in high demand, since manual analysis is time-consuming for radiologists and their fatigue can cause misjudgment. However, because CT scanners in different hospitals have different technical specifications, the appearance of CT images can differ significantly, causing many automated image recognition approaches to fail. The multi-domain shift problem in multi-center and multi-scanner studies is therefore nontrivial, and solving it is crucial for dependable recognition and for reproducible and objective diagnosis and prognosis. In this paper, we propose a COVID-19 CT scan recognition model, the coronavirus information fusion and diagnosis network (CIFD-Net), that efficiently handles the multi-domain shift problem via a new robust weakly supervised learning paradigm. Our model reliably and efficiently handles the differing appearance of CT scan images while attaining higher accuracy than other state-of-the-art methods.

Keywords: COVID-19; Medical image analysis; Multi-domain shift; Multicenter data processing; Weakly supervised learning

Year: 2021 PMID: 34934410 PMCID: PMC8667427 DOI: 10.1016/j.asoc.2021.108291

Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725

Introduction

The coronavirus disease (COVID-19) pandemic is spreading rapidly all over the world. The number of infections is growing exponentially in different regions, which has triggered great health concerns in the international community. One of the effective diagnostic methods confirmed by the World Health Organization is viral nucleic acid detection using the reverse transcription polymerase chain reaction (RT-PCR) test [1]. However, the RT-PCR test is not sufficiently sensitive in some cases, which may hinder the early identification and treatment of presumptive patients. As a non-invasive imaging technique, computed tomography (CT) can detect characteristics commonly manifested in COVID-19 infected lungs, e.g., bilateral patchy shadows or ground glass opacity (GGO). Hence, CT may serve as an important tool for the early pre-screening and diagnosis of COVID-19 patients. Quantitative imaging biomarkers extracted from CT images can also provide crucial prognostic value. Recently, deep learning based methods have been developed for chest X-ray/CT data analysis and classification [2], [3], [4], and these approaches can achieve state-of-the-art performance in X-ray/CT image diagnosis and prognosis. Nevertheless, most COVID-19 CT scan datasets only contain CT volumes (sets of CT slices) with patient-level annotations, i.e., class labels indicating whether the patient is infected or not. Per-slice labels are lacking because annotating each slice is labor-intensive and time-consuming for radiologists: it has been reported that an experienced radiologist needs about 21.5 min [5] to analyze and label one whole CT volume.
Consequently, convolutional neural network (CNN) based deep learning models trained on CT slices with only patient-level labels can perform poorly, because some of the slice annotations are incorrect (e.g., non-lesion slices from a positive patient are falsely labeled as positive), which makes the training data noisy. Another challenge when applying deep learning methods to medical image recognition is data distribution shift (a.k.a. multi-domain shift). Data distribution shift refers to the phenomenon that the same object or organ imaged under different scenarios (e.g., different machine vendors and sequence parameters) can yield vastly different data distributions. As a result, models trained under empirical risk minimization (ERM) [6] may fail to generalize, because ERM assumes that training and testing data are sampled from the same or similar distributions and domains. In the data distribution shift scenario, this assumption is violated.
When a neural network is trained on images from one domain and tested on another domain (i.e., a distinct imaging scenario), recognition performance often degrades dramatically. Fig. 1 shows CT images collected from different hospitals. Although they all depict slices of the lung, the CT data obtained from different hospitals are visually different, because each hospital uses different protocols and scanner parameters when acquiring patient images. The multi-domain shift problem in multi-center and multi-scanner studies is therefore nontrivial, and solving it is crucial for dependable recognition, which in turn is critical for reproducible diagnosis and prognosis.

Fig. 1

(a) Samples of CT images taken from five different hospitals and (b) the histograms of these CT images. The brightness levels of images from Hospital A and Hospital D are clearly distinct. Moreover, the contrast of the data from the China Consortium of Chest CT Image Investigation (CC-CCII) dataset differs considerably from that of CT images acquired at the other hospitals. The bottom-right panel shows the distributions of the images from different hospitals after normalization; these distributions still differ. Notably, such scanner-dependent differences do not prevent human radiologists from classifying the images correctly, whereas deep learning based automated methods may fail to generalize across CT images acquired at different hospitals.

To cope with these issues, in this work we train our model at both the patient level and the image level using information from multiple domains. In particular, we exploit the sequential information within the CT volume when predicting whether a patient tests COVID-19 positive. To preserve this sequential information, we divide a lung CT volume into consecutive sections running from the upper lobe to the inferior lobe. As illustrated in Fig. 2, our method aggregates these sections into a representation of the patient. When aggregating the sections, we apply multiple instance learning with a $k$-max selection strategy to the images in each section. The $k$-max selection lets our model filter out uncertain and noisy images, which helps it make accurate predictions. Moreover, multiple instance learning encourages our model to mine confident candidates for training and testing [7] while modeling the joint distribution of the sections of a patient rather than a single image, which is beneficial for prediction on unseen centers.

Fig. 2

The architecture of our proposed CIFD-Net. The section probability $p_{i,r}^c$ denotes the probability of section $r$, and the patient probability $P_i^c$ represents the probability that the patient tests COVID-19 positive. The noise transition $T^c$ maps the probability of the true label to the probability of the noisy label, and $\phi(\cdot)$ is a feature embedding function. ResNet-50 [55] is adopted as the backbone network.

In summary, our contributions are three-fold:
(1) We propose a weakly supervised learning based multi-domain information fusion framework for automated COVID-19 diagnosis from multi-center and multi-scanner CT scans that requires only patient-level annotations for training.
(2) We propose a novel noisy label correction technique that propagates patient-level predictions to individual slices and accurately identifies COVID-19 infected slices.
(3) We develop a slice aggregation module to alleviate the data distribution shift problem, which is essential for deploying the developed model in real-world scenarios.
Validated on the China Consortium of Chest CT Image Investigation (CC-CCII) [8] benchmark dataset, our proposed coronavirus information fusion and diagnosis network achieves superior performance compared to state-of-the-art models at both the patient level and the image level.

Related work

Before the COVID-19 pandemic, a large number of deep learning based methods had been proposed for lung cancer CT image analysis. This research area has seen great achievements, culminating in many end-to-end pipelines for lung cancer diagnosis, classification, treatment planning, and prognostic evaluation [9], [10], [11], [12], [13], [14], [15]. Deep learning approaches have also been developed for the treatment of interstitial lung disease (ILD) [16], [17], [18], [19]. In CT scans of COVID-19 patients, image characteristics such as ground glass opacity and/or consolidation are akin to those observed in CT scans of lung cancer and ILD patients. Therefore, insights from research on both lung cancer and ILD are valuable when designing COVID-19 detection algorithms using CT images, and they translate clearly to COVID-19 studies.

CNNs for visual recognition.

Convolutional neural networks (CNNs) have been widely used in medical diagnosis systems [3], [20], [21]. Recently, many COVID-19 recognition algorithms based on artificial intelligence, especially CNNs, have been proposed. Comprehensive reviews of artificial intelligence assisted COVID-19 detection and diagnosis can be found elsewhere [22], [23], [24], [25], [26]; here we only summarize the most relevant studies. Jin et al. [27] developed a combined segmentation-classification model for COVID-19 diagnosis. They tested several pre-trained segmentation models, e.g., fully convolutional network (FCN-8s), U-Net, V-Net, and 3D U-Net++, as well as classification models such as dual path network (DPN-92), Inception-v3, residual network (ResNet-50), and attention ResNet-50, among which the combination of 3D U-Net++ and ResNet-50 achieved the best performance. However, because it was unclear which layers were pre-trained and which were re-trained, the reproducibility of this study is uncertain. Wang and Wong [3] proposed COVID-Net, which stacks multiple convolutional blocks with dilated convolutions to recognize chest X-ray images. Li et al. [2] used patient-level labels and applied a max-pooling strategy over features extracted by the CNN from a set of slices to make the prediction. In addition, Ouyang et al. [4] deployed a 3D CNN built with the residual learning mechanism, which incorporated the depth information of the CT volumes. Shan et al. [28] proposed a human-in-the-loop strategy for infection region quantification, in which a modified V-Net incorporating bottleneck building blocks was developed to reduce training costs. In the human-in-the-loop training procedure, the network output segmentations that radiologists then corrected manually, and the corrected data were fed back to re-train the network iteratively. More recently, Hu et al. [29] proposed a weakly supervised multi-scale learning framework for COVID-19 classification and lesion detection, which demonstrated promising results, although its performance may be hindered by the noise in patient-level labels. For automatic prognostication of COVID-19 patients, Huang et al. [30] developed a two-step segmentation model that first extracted the lung and lobe regions and then segmented pneumonia. Each step used a separate U-Net, and at least two follow-up scans were analyzed for each patient. The authors found significant differences in lung opacification percentage between the initial and the first follow-up scans, but not between the first and the second follow-up scans. Although these findings are intriguing, the study has been criticized for lacking information essential to reproducibility [31]. Although the aforementioned studies and many others have shown promising results [1], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], two major issues can prevent the widespread deployment of these methods: (1) most previously proposed approaches relied on heavily annotated ground truth, e.g., annotations of the infected areas and slice-based labels, and (2) they fail under domain shift on multi-center and multi-scanner datasets, so poor reproducibility has always been a concern.

Multiple instance learning.

Multiple instance learning (MIL) is a weakly supervised learning problem that has been studied in several settings, including weakly supervised object localization [7], video anomaly detection [43], and weakly supervised image segmentation [44]. In the MIL framework, a bag is defined as a set of instances, here image slices. Positive bags are assumed to contain at least one instance from a certain category, whereas negative bags contain no instances from that category. It is therefore natural to treat the classification of CT volumes, which contain multiple CT slices, as a MIL problem. Several methods have been proposed to solve the MIL problem. For example, Oquab et al. [45] trained a CNN with a max-pooling MIL strategy to classify objects. However, some MIL pooling strategies, such as max-pooling and mean-pooling, often lead to insufficient and unstable training because of vanishing gradients. To address this, Ilse et al. [46] combined a gated attention mechanism with the MIL strategy for medical image classification, but their method could not predict instance labels accurately. Chen et al. [47] developed a stylized generative method to transfer knowledge from MRI to CT in an unsupervised manner. Xia et al. [48] used uncertainties along different volume angles to measure the importance of predicted labels. Chen et al. [49] modeled the intra-consistency between two domains to align the feature distributions. However, these methods require training on both the source and the target domain, so they cannot handle unseen domains. Our method addresses these limitations.

Domain adaptation.

Domain adaptation refers to techniques that aim to improve the performance of machine learning tasks, e.g., classification, detection, and segmentation, when the model is trained on data from a source domain only but tested on related samples from a shifted target domain. Some approaches also use domain adaptation to help learn feature representations. Hoffman et al. [50] proposed a method that learned the difference between classification and detection tasks and transferred this knowledge from the classifier to detectors using weakly annotated data. In addition, MIL has been incorporated to learn the feature representation and the classifier [51]. Mahmood et al. [52] applied hue, saturation, contrast, and brightness transformations to RGB images to change the color and texture of the source-domain images. Existing domain adaptation methods tend to use strongly annotated source-domain data to improve recognition performance, whereas our method is weakly supervised: it requires neither instance-level annotations nor auxiliary strongly annotated data for recognition.

Proposed method

In this section, we introduce the proposed coronavirus information fusion and diagnosis network (CIFD-Net) and its key modules, namely an explainable classification module (ECM), a slice aggregation module (SAM), and a slice noisy correction module (SNCM), as illustrated in Fig. 2. The ECM integrates the generation of class activation maps (CAMs) into the forward propagation of CIFD-Net, enabling CAM generation during both training and testing and thereby providing explainable results for the model's predictions. Instead of training on image-level (slice-wise) labels, which require substantial manual labeling effort, we propose the SAM to train on patient-level labels. To model the joint probability of the slices of each patient, we divide the slices into several consecutive sections of equal length. We then compute the probability of each section with a $k$-max selection strategy, which ignores slices with large uncertainty and thus reduces noise when modeling the joint probability at the patient level. By modeling the joint probability, our model pays more attention to the distribution of the affected sections, leading to better generalization across multiple domains. Moreover, to improve image-level accuracy, we further propose the SNCM, which models the transition between the true label and the noisy label, since patient-level labels are noisy with respect to slice-wise labels.

Problem formulation

The ultimate goal of our model is to diagnose whether a patient tests positive or negative given a 3D volumetric CT lung scan. Let $X_i = \{x_{i,1}, x_{i,2}, \dots, x_{i,N_i}\}$ denote the lung CT volume of patient $i$ with $N_i$ CT slices, where $x_{i,j}$ is a 2D CT slice image. Let $y_i \in \{0, 1\}$ denote whether the patient tests COVID-19 positive: $y_i = 1$ if the patient has COVID-19, and $y_i = 0$ if the patient is not infected. During the training stage, only patient-level labels are available, and the number of CT lung slices $N_i$ can vary significantly across patients.

Explainable classification module

The prediction process of a CNN is largely a black box. Several techniques [53], [54] have been proposed to shed light on how a CNN makes its predictions and how it obtains its remarkable localization ability without any supervision from localization maps. To provide an explainable auxiliary diagnosis tool for radiologists, we employ class activation mapping (CAM) [53], which generates localization maps for the prediction from the output of backbone networks, e.g., ResNet [55], VGG [56], and GoogLeNet [57]. However, generating a CAM is a two-step process: the backbone network is first trained on the dataset, and the weights of its final fully connected layer are then used to compute a weighted sum of the feature maps of the last convolutional layer. Suppose $f_d \in \mathbb{R}^{H \times W}$ is the $d$-th feature map, with height $H$ and width $W$, from the last convolutional layer, and $\mathbf{w} \in \mathbb{R}^{C \times D}$ is the weight of the last fully connected layer, where $C$ is the number of classes and $D$ is the number of feature maps from the last convolutional layer. The class score of class $c$ can then be calculated by

$$S^c = \sum_{d=1}^{D} w_d^c \, \frac{1}{HW} \sum_{x=1}^{H} \sum_{y=1}^{W} f_d(x, y). \tag{1}$$

The localization map for class $c$ proposed in [53] is defined by

$$M^c(x, y) = \sum_{d=1}^{D} w_d^c \, f_d(x, y), \tag{2}$$

and the object localization maps can be visualized via $M^c$. Although CAM is a useful way to locate the relevant region, it requires a post-processing step. In our method, we plug the generation of the CAM into the network so that only one forward pass is needed. Instead of applying global average pooling directly after the last convolutional layer, we replace the fully connected layer with a 1 × 1 convolutional layer with a stride of 1, placed before the global average pooling operation. Suppose the weight of this convolutional layer is $\boldsymbol{\theta} \in \mathbb{R}^{C \times D}$, which has the same mathematical form as the weight $\mathbf{w}$ of the fully connected layer. We then rewrite Eq. (1) as

$$S^c = \frac{1}{HW} \sum_{x=1}^{H} \sum_{y=1}^{W} \sum_{d=1}^{D} \theta_d^c \, f_d(x, y), \tag{3}$$

which yields the same output as Eq. (1) for $\boldsymbol{\theta} = \mathbf{w}$, because the channel-weighted sum and the spatial average are both linear and can be applied in either order. Thus, the modified CAM for class $c$ is computed as

$$\hat{M}^c(x, y) = \sum_{d=1}^{D} \theta_d^c \, f_d(x, y). \tag{4}$$

The modified activation map can accurately indicate the importance of the activations in CT images and locate the infected areas of COVID-19 patients, providing explainable and reliable results for prediction. Regions with higher activation scores contribute more to the prediction. The modified activation map can also offer auxiliary diagnostic information for radiologists. The differences between the original CAM and our ECM strategy are illustrated in Fig. 3.
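The claim that a 1 × 1 convolution placed before global average pooling reproduces the class scores of Eq. (1) can be checked numerically. The sketch below uses plain Python lists instead of tensors; the function names are hypothetical and this is not the authors' implementation. It computes the scores both ways, with the convolution weights set equal to the fully connected weights, and the one-pass version also returns the per-class activation maps.

```python
import random


def gap(fmap):
    """Global average pooling of one H x W map given as a list of rows."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)


def cam_pipeline_scores(features, weights):
    """Original CAM order: GAP on each feature map, then a fully connected layer.

    features: D maps of size H x W; weights: C x D matrix (row c holds w_d^c).
    Returns one class score per class, as in Eq. (1).
    """
    pooled = [gap(f) for f in features]
    return [sum(w_d * p_d for w_d, p_d in zip(w_c, pooled)) for w_c in weights]


def ecm_forward(features, theta):
    """ECM order: a 1x1, stride-1 convolution first, then GAP.

    The 1x1 convolution is a per-pixel weighted sum over the D channels, so
    its C output channels are already class activation maps; pooling them
    yields the class scores in the same forward pass.
    """
    h, w = len(features[0]), len(features[0][0])
    cams = []
    for theta_c in theta:
        cam = [[sum(t_d * f[x][y] for t_d, f in zip(theta_c, features))
                for y in range(w)]
               for x in range(h)]
        cams.append(cam)
    return [gap(cam) for cam in cams], cams


# Usage with random feature maps; theta is set equal to the FC weights.
rng = random.Random(0)
D, H, W, C = 4, 3, 5, 2
features = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(D)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(C)]

scores_two_step = cam_pipeline_scores(features, weights)
scores_one_pass, cams = ecm_forward(features, weights)
max_diff = max(abs(a - b) for a, b in zip(scores_two_step, scores_one_pass))
```

Because pooling and the channel-weighted sum are both linear, swapping their order leaves the scores unchanged up to floating-point rounding; the difference is that the intermediate C-channel map, which is the CAM, becomes available without a second pass.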

Fig. 3

(a) The workflow of the class activation mapping (CAM) scheme and (b) the proposed explainable classification module (ECM). Our ECM generates the CAM in a single forward pass, whereas the original method proposed by Zhou et al. [53] needs a post-processing step to generate the CAM. $f_d$ is the $d$-th feature map from the backbone network; $\mathbf{w}$ and $\boldsymbol{\theta}$ are the weights of the fully connected layer and the convolutional layer, respectively; $M^c$ and $\hat{M}^c$ are the class activation maps for class $c$; and $S^c$ is the class score for class $c$.

Slice aggregation module

In some mild COVID-19 cases, only part of the CT volume may be infected, and the lesions are often quite small. If we simply labeled every slice of a positive patient as COVID-19 positive and trained a classifier with these image-level labels, the labels would be noisy and the results poor. To overcome this problem, we propose the SAM, which uses the joint distribution of the slices to model the probability that a patient is COVID-19 positive or negative. We assume that lesions are contiguous and only affect adjacent slices; consequently, we adopt a section based strategy. The intuition behind this strategy is that it maps directly onto multiple instance learning (MIL) [58]. In MIL, samples are grouped into bags labeled either positive or negative: a positive bag contains at least one positive instance, whereas a negative bag contains only negative instances. In our problem, only bag labels (patient annotations) are provided, and sections are treated as the instances in the corresponding bags. Given patient $i$ with $N_i$ CT slices, we divide these slices into $R_i$ disjoint sections $\{\mathcal{S}_{i,1}, \dots, \mathcal{S}_{i,R_i}\}$, each containing an equal number of consecutive CT slices, where the number of sections $R_i$ is determined by $N_i$ and the section length $L$, an empirically chosen parameter. The probability $P_i^c$ that patient $i$ belongs to class $c$ is then modeled as the joint probability of the section probabilities $p_{i,r}^c$, where $p_{i,r}^c$ is the probability that the $r$-th section belongs to class $c$. Instead of averaging the slice probabilities within a section, we use a $k$-max selection for each class to compute the section probability, because some slices contain few infected regions, which can confound the prediction. In this $k$-max selection, $s_{i,r,(t)}^c$ denotes the $t$-th highest class score among the slices in the $r$-th section for class $c$ ($t = 1, \dots, k$), and the sigmoid function $\sigma(\cdot)$ maps the selected scores to the section probability $p_{i,r}^c$.
We then use the patient-level annotations as the ground truth during training, and the classification loss is computed between the patient probabilities $P_i^c$ and these labels.
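The slice aggregation idea can be sketched in plain Python under explicit assumptions, since the exact section, $k$-max, and patient-probability equations are not reproduced in this text: leftover slices form a shorter final section, the section probability is taken as the sigmoid of the mean of the k highest slice scores, and the patient probability uses a noisy-OR over sections. Noisy-OR is a standard MIL aggregation substituted here for illustration; it is not necessarily the authors' joint model. All function names are hypothetical.

```python
import math


def split_into_sections(slice_scores, section_len):
    """Split per-slice scores (ordered from the upper to the inferior lobe) into
    consecutive, disjoint sections of `section_len` slices.

    Assumption: leftover slices form a final, shorter section.
    """
    return [slice_scores[i:i + section_len]
            for i in range(0, len(slice_scores), section_len)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def kmax_section_prob(section_scores, k):
    """Assumed k-max form: sigmoid of the mean of the k highest slice scores.

    Lower-scoring slices in the section are ignored entirely.
    """
    top = sorted(section_scores, reverse=True)[:k]
    return sigmoid(sum(top) / len(top))


def patient_prob(slice_scores, section_len, k):
    """Assumed noisy-OR aggregation over sections (MIL: a bag is positive if
    at least one instance is), treating sections as independent."""
    section_probs = [kmax_section_prob(s, k)
                     for s in split_into_sections(slice_scores, section_len)]
    p_all_negative = 1.0
    for p in section_probs:
        p_all_negative *= 1.0 - p
    return 1.0 - p_all_negative, section_probs


def bce(p, y, eps=1e-7):
    """Binary cross-entropy between a probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical slice logits for a 10-slice volume with a lesion in the middle.
scores = [-4.0, -3.5, -3.0, -2.0, 3.0, 4.0, 2.5, -3.0, -4.0, -5.0]
p_pos, p_sections = patient_prob(scores, section_len=3, k=2)
loss_pos = bce(p_pos, 1)

# A volume whose slices all look negative.
p_neg, _ = patient_prob([-6.0] * 10, section_len=3, k=2)
loss_neg = bce(p_neg, 0)
```

One caveat of the noisy-OR choice: it can only increase as sections are added, so long, entirely negative volumes accumulate small amounts of positive probability.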

Slice noisy correction module

To further alleviate the negative impact of image-level label noise, we propose the SNCM, which is loosely inspired by [59], to model the hidden relationship between the noisy label and the true label. Let $P(y \mid x)$ denote the true posterior distribution given an image $x$. The distribution of the noisy label $\tilde{y}$ can then be modeled by marginalizing over the true label:

$$P(\tilde{y} \mid x) = \sum_{y} P(\tilde{y} \mid y, x)\, P(y \mid x).$$

We estimate the noise transition $T_{ab}^c$ for class $c$, where $a, b \in \{0, 1\}$ denote the true and the noisy status, respectively, using a nonlinear mapping function $h(\cdot)$ and trainable parameters $\mathbf{u}_{ab}^c$ and $\beta_{ab}^c$ for class $c$ between status $a$ and status $b$. The transition score $T_{ab}^c$ can be regarded as the score of the transition from the true label to the noisy label with respect to class $c$. The estimated noisy-label probability for class $c$ is then obtained by applying $T^c$ to the predicted true-label probabilities, following the marginalization above. Finally, for patient $i$, the noisy classification loss is computed from these estimated noisy probabilities. Combining Eqs. (8) and (12), we obtain the total loss function to be optimized, in which a hyper-parameter $\lambda$ balances the loss terms. During training, these loss functions are optimized iteratively. Together with the SAM, this yields a unified end-to-end deep neural network architecture for COVID-19 diagnosis. The whole training procedure is summarized in Algorithm 1.
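A per-class, two-state noise transition can be sketched in plain Python as follows. The parameterization (a softmax-normalized linear function of an embedding `h`, with hypothetical weights `u` and biases `beta`) and the additive total loss are assumptions made for illustration; only the marginalization over the true label follows directly from the text. The value of `lam` is illustrative.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(h, u, beta):
    """Hypothetical 2x2 noise transition for one class.

    h: embedding of the image (list of floats).
    u[a][b], beta[a][b]: weight vector and bias for true status a -> noisy status b.
    Each row is softmax-normalized, so T[a][b] acts as P(noisy = b | true = a, x).
    """
    T = []
    for a in range(2):
        logits = [sum(ui * hi for ui, hi in zip(u[a][b], h)) + beta[a][b]
                  for b in range(2)]
        T.append(softmax(logits))
    return T


def noisy_label_prob(p_true_pos, T):
    """Marginalize over the true status: P(noisy = b | x) = sum_a T[a][b] * P(true = a | x)."""
    p_true = [1.0 - p_true_pos, p_true_pos]
    return [sum(T[a][b] * p_true[a] for a in range(2)) for b in range(2)]


def total_loss(loss_cls, loss_noisy, lam):
    """Assumed additive objective; lam balances the two terms."""
    return loss_cls + lam * loss_noisy


# Usage with illustrative numbers.
h = [0.2, -0.1, 0.4]
u = [[[0.1, 0.0, -0.2], [0.0, 0.1, 0.0]],
     [[0.0, -0.1, 0.0], [0.2, 0.0, 0.1]]]
beta = [[2.0, -2.0], [-2.0, 2.0]]  # biases favor keeping the label unchanged
T = transition_matrix(h, u, beta)
p_noisy = noisy_label_prob(0.7, T)  # 0.7: model's true-positive probability for one slice
y_slice = 1  # patient-level label propagated to the slice; may be wrong
loss_noisy = -math.log(p_noisy[y_slice])
loss = total_loss(loss_cls=0.3, loss_noisy=loss_noisy, lam=1e-4)
```

Fitting the loss against the noisy probability, rather than the raw prediction, lets the transition matrix absorb systematic label errors so that the underlying prediction can stay closer to the true slice label.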

Experiments and discussions

In this section, we validate the effectiveness of our method and quantify the results. We first present statistics of the datasets and describe the implementation details and experimental settings, followed by the results, the ablation studies, and further discussion of the qualitative and quantitative results.

Datasets

To verify the effectiveness of the proposed model on data from an independent hospital, we train on data from several hospitals and then test the model on an independent dataset. The datasets used in our study are summarized in Table 1. We collected CT datasets from four different local hospitals and anonymized the data by removing all patient identity information. In total, there are 804 CT scan volumes with 45,167 CT images; 380 volumes are COVID-19 positive and the other 424 are negative. All COVID-19 positive cases were confirmed by RT-PCR tests. We train on the cross-domain datasets collected from hospitals A, B, C, and D and test on the public CC-CCII dataset [8]. The CC-CCII dataset consists of 2,034 3D CT volumes with 130,511 CT images, acquired with CT scanners from a different manufacturer and therefore representing another image domain.

Table 1

The number of CT samples per class used for training, collected at four different hospitals (A, B, C, and D). Details of the CC-CCII dataset, which was used for independent testing, are also listed. The ratio of positive to negative samples is approximately 1:1 in the training set and 2:1 in the test set.

Dataset	Patients (total)	Patients (positive)	Patients (negative)	CT images (total)	CT images (positive)	CT images (negative)	Subset
Hospital A	424	0	424	24,670	0	24,670	Train
Hospital B	58	58	0	5,512	5,512	0	Train
Hospital C	17	17	0	2,611	2,611	0	Train
Hospital D	305	305	0	12,374	12,374	0	Train

CC-CCII [8]	2,034	1,320	714	130,511	84,629	45,882	Test
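The totals and class ratios quoted in the text can be recomputed from Table 1. The short Python check below uses only numbers copied from the table.

```python
# Per-dataset counts from Table 1:
# (positive patients, negative patients, positive images, negative images)
train = {
    "Hospital A": (0, 424, 0, 24670),
    "Hospital B": (58, 0, 5512, 0),
    "Hospital C": (17, 0, 2611, 0),
    "Hospital D": (305, 0, 12374, 0),
}
test = {"CC-CCII": (1320, 714, 84629, 45882)}


def totals(split):
    """Column-wise sums over all datasets in a split."""
    return [sum(col) for col in zip(*split.values())]


train_pp, train_pn, train_ip, train_in = totals(train)
test_pp, test_pn, test_ip, test_in = totals(test)

train_patients = train_pp + train_pn  # 804 volumes
train_images = train_ip + train_in    # 45,167 images
train_ratio = train_pp / train_pn     # about 0.90, i.e. roughly 1:1
test_ratio = test_pp / test_pn        # about 1.85, i.e. roughly 2:1
```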

Data standardization and pre-processing

Following the protocol used in [8], we first normalized the images and then used the U-Net segmentation network [60] to segment the CT images. After that, we randomly cropped a rectangular region whose aspect ratio and area were randomly sampled from fixed ranges, and resized the region to 224 × 224. Meanwhile, we randomly flipped the input volumes horizontally with a probability of 0.5. The input data are CT volumes composed of consecutive CT slices.
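The crop step can be sketched as follows. It follows the common random-resized-crop recipe (relative area drawn uniformly, aspect ratio drawn log-uniformly); the default ranges below are the torchvision defaults and serve only as placeholders, because the intervals used in the paper are not given in this text. The function name is hypothetical.

```python
import math
import random


def sample_crop_box(height, width, area_range=(0.08, 1.0),
                    ratio_range=(3 / 4, 4 / 3), attempts=10, rng=random):
    """Return (top, left, crop_h, crop_w) for a random rectangular crop.

    The relative area is drawn uniformly from area_range and the aspect ratio
    (width / height) log-uniformly from ratio_range. Both ranges are
    placeholders. Falls back to the full image if no valid box is found.
    """
    total = height * width
    log_lo, log_hi = math.log(ratio_range[0]), math.log(ratio_range[1])
    for _ in range(attempts):
        area = total * rng.uniform(*area_range)
        ratio = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    return 0, 0, height, width


# Usage: sample one box for a 512 x 512 slice; the region would then be resized to 224 x 224.
rng = random.Random(42)
top, left, crop_h, crop_w = sample_crop_box(512, 512, rng=rng)
```

In a volume-level pipeline it would be natural (an assumption, not stated in the text) to sample one box per volume and apply it to every slice, so that adjacent slices stay spatially aligned.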

Implementation details

We use ResNet-50 [55] pre-trained on ImageNet [61] as the backbone network. For data augmentation, we apply random horizontal flipping to the input CT volume in the spatial dimension: each image in a CT volume is flipped horizontally with a probability of 0.5. The images are then resized to 224 × 224. In addition, brightness and contrast are randomly changed within the range [0.9, 1.1]. The dropout rate is set to 0.7, the loss-balancing hyper-parameter $\lambda$ is set to 0.0001, and the weight decay coefficient is set to $10^{-5}$. During both training and testing, fixed values of $k$ and the section length are used to compute the patient probability. We train our model using the Adam optimizer [62]; training is terminated after 4000 iterations with a batch size of 10. All experiments were conducted on a workstation with 4 NVIDIA Tesla V100 GPUs using PyTorch.
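The per-slice augmentation described above can be sketched in plain Python. The flip probability and the [0.9, 1.1] jitter range follow the text; the precise brightness and contrast formulas are assumptions modeled on the usual definitions (brightness multiplies intensities, contrast scales deviations from the slice mean). Resizing is omitted, and the helper names are hypothetical.

```python
import random


def hflip(image):
    """Mirror a 2D slice (list of rows) from left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Assumed brightness jitter: multiply every intensity by `factor`."""
    return [[v * factor for v in row] for row in image]


def adjust_contrast(image, factor):
    """Assumed contrast jitter: scale deviations from the slice mean by `factor`."""
    n = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / n
    return [[mean + factor * (v - mean) for v in row] for row in image]


def augment_volume(volume, rng=random, flip_p=0.5, jitter=(0.9, 1.1)):
    """Augment each slice of a CT volume independently.

    Each slice is flipped horizontally with probability `flip_p`, then its
    brightness and contrast are scaled by factors drawn uniformly from `jitter`.
    """
    augmented = []
    for image in volume:
        if rng.random() < flip_p:
            image = hflip(image)
        image = adjust_brightness(image, rng.uniform(*jitter))
        image = adjust_contrast(image, rng.uniform(*jitter))
        augmented.append(image)
    return augmented


# Usage: a toy volume of 5 slices, each 3 x 4 pixels.
volume = [[[float(4 * r + c) for c in range(4)] for r in range(3)] for _ in range(5)]
augmented = augment_volume(volume, rng=random.Random(0))
```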

Quantitative results

We reimplemented four state-of-the-art methods [2], [3], [4], [55] and compared them with ours on COVID-19 CT classification. The results are shown in Table 2. For image-level supervision, COVID-Net [3] and ResNet-50 [55] use the patient-level annotations as image annotations. Unlike the methods of Wang and Wong [3] and He et al. [55], VB-Net [4] adopts a 3D residual convolutional neural network (3D-ResNet) trained on CT volumes with patient labels. COVNet [2] is also trained with patient-level labels: it feeds a patient-specific set of CT images into a 2D ResNet and simply aggregates the image-level feature descriptors with a max-pooling operator.

Table 2

Comparison results of our CIFD-Net method vs. state-of-the-art architectures on the CC-CCII dataset.

Annotation	Method	Patient Acc. (%)	Precision (%)	Sensitivity (%)	Specificity (%)	F1-score (%)	AUC (%)
Patient-level	ResNet-50 [55]	53.70±0.02∗∗	61.42±0.08∗∗	77.13±0.10∗∗	10.37±0.17∗∗	68.38±0.05∗∗	46.30±0.10∗∗
	COVID-Net [3]	53.62±0.03∗∗	61.35±0.01∗∗	77.18±0.05∗∗	10.06±0.25∗	68.36±0.02∗∗	44.53±0.18∗
	COVNet [2]	67.64±0.04∗∗	76.03±0.08∗	73.17±0.07∗∗	57.34±0.18∗	74.57±0.04∗∗	66.13±0.15∗
	VB-Net [4]	76.75±0.04∗	85.25±0.10∗∗	77.61±0.07∗∗	75.22±0.19∗	81.25±0.05∗∗	89.48±0.16∗
	CIFD-Net (Ours)	89.25±0.02∗∗	89.98±0.13∗	93.86±0.06∗∗	80.67±0.13∗	91.91±0.07∗∗	93.22±0.06∗∗

Image-level	ResNet-50 [55]	67.29±0.04∗∗	68.23±0.06∗∗	92.95±0.05∗∗	20.40±0.16∗∗	78.71±0.02∗∗	53.43±0.11∗∗
	COVID-Net [3]	64.83±0.08∗	66.28±0.07∗	93.18±0.02∗∗	12.48±0.04∗∗	77.46±0.03∗∗	51.47±0.09∗∗
	COVNet [2]	70.79±0.03∗∗	83.09±0.07∗	68.95±0.11∗∗	74.10±0.08∗∗	75.37±0.05∗∗	73.08±0.07∗∗
	CIFD-Net (Ours)	84.83±0.02∗∗	91.19±0.03∗∗	84.74±0.07∗∗	84.99±0.11∗∗	87.86±0.04∗∗	89.63±0.08∗∗

* and ** indicate two levels of statistical significance ($p$-value).

Fig. 4

Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) of different models trained using patient-level annotation (a) and image-level annotation (b) on the CC-CCII dataset.

From Table 2, several observations can be made. CIFD-Net outperforms most of the competing models by a large margin on the independent testing dataset, which can be attributed to its handling of the multi-domain shift problem. For patient-level classification, our model outperforms the other methods by at least 12.5% in accuracy. Our model also yields the best performance in image-level classification, outperforming COVNet [2] by 14.0% in accuracy. In addition, ROC analysis and AUC results are used to quantify classification performance: CIFD-Net achieves a higher AUC than the other state-of-the-art methods with both patient-level and image-level annotation. It is also worth noting that, at the patient level, our method significantly outperforms the other methods by at least 16.3% in sensitivity, which is an important indicator for diagnosing COVID-19 positive cases. Models trained on patient-level labels, such as [2], [4] and ours, perform substantially better than those trained on image-level labels, i.e., [3], [55], especially in patient-level accuracy. This suggests that image-level label noise is non-trivial and can cause these models to overfit. Moreover, models trained at the image level may rely on learning image textures [63], which are highly discriminative between domains.
As a consequence, these models are prone to overfitting and to texture-biased predictions, which may explain why methods such as those proposed in [3], [55] generalize poorly to unseen domains. Although the methods proposed by Li et al. [2] and Ouyang et al. [4] were also trained on patient-level labels, our proposed CIFD-Net is superior to them, especially for patient-level classification. The method of Li et al. [2] performed the worst, possibly because it was trained on CT slices randomly selected from each 3D volume, which may impede the encoding of lesions that often extend across adjacent slices. In contrast, Ouyang et al. [4] preserved the sequential information among CT slices by training on whole CT volumes. Our model also takes the full 3D volume into account, but, unlike [2], [4], it preserves the sequential information by dividing the volume into sections. VB-Net achieves better performance than COVNet, likely because it is trained with stronger supervision: in addition to the image-level classifier, it employs an auxiliary pixel-wise classifier trained with pixel-level infection annotations (i.e., infection segmentation masks). In comparison, our proposed model achieves better overall classification performance than VB-Net using weak supervision only. We carried out ROC analysis and used the AUC to quantify classification performance, as shown in Fig. 4. Fig. 4(a) shows that the models trained only on image-level annotations (i.e., ResNet-50 and COVID-Net) are unreliable, since their AUCs are below 50%, i.e., worse than random guessing. Overall, CIFD-Net remains the best-performing algorithm, with an AUC of 93.22%. Notably, the overall results at the patient level are higher than those at the image level.
This may be related to our observation in the classification experiments that CT slices containing few lesions are hard to classify.
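
The AUC comparison above can be made concrete with a minimal sketch. It computes AUC through its rank interpretation (the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting one half), which is why an AUC below 50% means a model ranks cases worse than random guessing; it also computes sensitivity and specificity at a decision threshold. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions, not the evaluation code or data used in this study.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via its rank interpretation (normalised Mann-Whitney U statistic).

    Equals the probability that a random positive case is scored higher than a
    random negative case; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical data): 1 = COVID-19 positive, 0 = negative.
y = [1, 1, 0, 0]
good = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(y, good))                   # 0.75
print(roc_auc(y, [1 - s for s in good]))  # 0.25: inverted ranking, worse than chance
print(sensitivity_specificity(y, good))   # (0.5, 0.5)
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation gives the same value more efficiently.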

Fig. 4

Receiver Operating Characteristic (ROC) curves and area under ROC curves (AUC) of different models trained using patient-level annotation (a) and image-level annotation (b) on the CC-CCII dataset.

To examine the influence of the different loss terms, we conducted ablation studies on the proposed model; the results are reported in Table 3. As the table shows, the model with the SNCM outperforms the model without it at the patient level (89.23% vs. 83.97%). More importantly, the SNCM substantially improves image-level prediction, raising image accuracy by 6.2%. However, when only the SNCM is used, the model is still biased toward predicting CT images as negative, because the SNCM only requires the model to correct CT images wrongly labeled as COVID-19 positive, which imposes a strong prior on the training procedure.

Table 3

Accuracy (%) of all the cases where each proposed component is applied.

Exp.	ResNet-50	Lcls	Lnoisy	Patient Acc. (%)	Image Acc. (%)
1	√			53.72	67.31
2	√	√		83.97	78.60
3	√		√	35.10	35.16
4	√	√	√	89.23	84.83
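
The improvements quoted in the ablation discussion are percentage-point differences between rows of Table 3. The short sketch below reproduces that arithmetic from the tabulated accuracies. It assumes, as the discussion implies, that Exp. 2 and Exp. 4 are the models without and with the SNCM, i.e., that the Lnoisy term corresponds to the SNCM; the helper name `gain` is hypothetical.

```python
from typing import Dict, Tuple

# (patient accuracy %, image accuracy %) per experiment, copied from Table 3.
TABLE3: Dict[int, Tuple[float, float]] = {
    1: (53.72, 67.31),  # ResNet-50 only
    2: (83.97, 78.60),  # ResNet-50 + Lcls
    3: (35.10, 35.16),  # ResNet-50 + Lnoisy
    4: (89.23, 84.83),  # ResNet-50 + Lcls + Lnoisy
}


def gain(base: int, other: int) -> Tuple[float, float]:
    """Percentage-point gain of experiment `other` over experiment `base`."""
    (pat_b, img_b), (pat_o, img_o) = TABLE3[base], TABLE3[other]
    return round(pat_o - pat_b, 2), round(img_o - img_b, 2)


print(gain(2, 4))  # (5.26, 6.23): adding Lnoisy on top of Lcls
print(gain(1, 2))  # (30.25, 11.29): adding Lcls to the plain backbone
```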

Furthermore, we examined the sensitivity of our model to the choice of the hyper-parameters and . Fig. 5 shows how the patient-level and image-level accuracies change as the hyper-parameter is tuned. If is too large, the model becomes biased and its performance drops significantly, since it controls the regularization terms in model training. The best results are obtained when . In addition, for the hyper-parameter , the performance degrades dramatically when is either too large or too small. If is too large, the uncertainty of each section increases and causes noisy predictions. Conversely, if is too small (e.g., ), some important slice information is neglected, which leads to inaccurate results (see Fig. 6).
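
The section-size trade-off described above concerns how a CT volume is partitioned into ordered, contiguous sections of slices. The sketch below shows one such partition, assuming a fixed section size with a shorter final section; the function name `split_into_sections` and the handling of the remainder are illustrative assumptions, since the exact partitioning rule and the selected value are not given here. Larger sections pool more slices into each unit (and so are more likely to mix infected and non-infected slices), while smaller sections give more, finer-grained units.

```python
from typing import List


def split_into_sections(num_slices: int, section_size: int) -> List[range]:
    """Partition slice indices 0..num_slices-1 into ordered, contiguous sections.

    The final section is shorter when num_slices is not a multiple of section_size.
    """
    if section_size < 1:
        raise ValueError("section_size must be >= 1")
    return [range(start, min(start + section_size, num_slices))
            for start in range(0, num_slices, section_size)]


# Hypothetical 10-slice volume: larger sections pool more slices into each unit.
for size in (1, 4, 10):
    print(size, [list(r) for r in split_into_sections(10, size)])
```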

Fig. 5

Variations in classification results by changing the hyper-parameter . The light dash line represents the case when . It shows that our model achieves the best performance with .

Fig. 6

Variations in classification results by changing the hyper-parameter . Our model achieves the best performance with with section size .

Qualitative results

For the qualitative study, we used the trained models (e.g., ResNet-50, COVNet, and others) to visualize CAMs and bounding boxes on the test set. Fig. 7 presents the CAMs visualized using our ECM. The model trained at the slice level (ResNet-50) tends to miss the lesions and focus on non-infected regions, which also explains its inaccurate and unreliable diagnostic decisions and limits its usefulness to radiologists. Models trained at the patient level, such as COVNet, occasionally detect some of the lesions but mostly fail to estimate their extent reliably. In contrast, our model is precise both in localizing lesions and in estimating the extent of the infected areas.
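
For reference, the standard class activation mapping (CAM) of Zhou et al. computes the map for a target class as a weighted sum of the last convolutional layer's feature maps, M(i, j) = Σ_k w_k f_k(i, j), where w_k are the final fully connected layer's weights for that class. The minimal pure-Python sketch below implements that standard formulation with min-max normalization for display. It is not the ECM used here, whose details are not described in this section, and the function names and toy feature maps are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(feature_maps: List[Matrix], class_weights: List[float]) -> Matrix:
    """Standard CAM: M(i, j) = sum_k w_k * f_k(i, j) for one target class.

    feature_maps[k] is the k-th channel of the last convolutional layer (H x W);
    class_weights[k] is the fully connected weight linking channel k to the class.
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("expected one class weight per feature map")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam


def min_max_normalize(cam: Matrix) -> Matrix:
    """Rescale activations to [0, 1] for display as a heat map."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


# Hypothetical 2-channel, 2 x 2 feature maps.
f1 = [[1.0, 0.0], [0.0, 0.0]]
f2 = [[0.0, 0.0], [0.0, 2.0]]
print(min_max_normalize(class_activation_map([f1, f2], [2.0, 0.5])))
# [[1.0, 0.0], [0.0, 0.5]]
```

In practice the CAM is computed at feature-map resolution and upsampled to the CT slice size before being overlaid as a heat map.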

Fig. 7

Visualization of the CAMs and bounding boxes generated by different methods on the CC-CCII dataset. The region with a deeper red color indicates discriminative regions for the prediction by the model. is the probability for being predicted as COVID-19 positive.

Moreover, based on the CAMs, we extracted bounding boxes for each method. CIFD-Net yields more accurate bounding boxes on the salient parts of the CAMs (Fig. 7) than the other methods, which indicates that our method is better suited to auxiliary diagnosis. For instance, in diffuse cases (Fig. 7, rows 1 to 4), CIFD-Net produces more accurate saliency maps than ResNet-50 and COVNet, with fewer false positives and false negatives, and therefore more precise localization (bounding boxes). For lesions distributed peripherally and subpleurally, both CIFD-Net and COVNet perform better than ResNet-50 (Fig. 7, rows 5 and 6). However, CIFD-Net is more sensitive to infected regions that are not obvious in the images (Fig. 7, row 7). In addition, we visualize the infection probability of the lung sections for each patient and sample CT slices from the corresponding sections. As illustrated in Fig. 8, the red curve depicts the infection probability along the lung sections, while the blue curve depicts the non-infection probability of each section. Overall, the infected lung sections are adjacent to each other and the transition between sections is smooth. We also found that our model robustly localizes the infected lung sections regardless of the scale and type of the lesions. For example, for patient A, Section 2, our model still identifies the infected section even though it contains only a very small peripheral lesion (ground-glass opacity, GGO). When no apparent lesions are detected, the predicted probability stays around 0.5, for instance, in Section 1 for patient B and Section 2 for patient C.
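
Bounding boxes can be derived from a CAM in several ways; a common simple heuristic thresholds the map at a fraction of its peak and encloses the surviving pixels. The sketch below assumes that heuristic, with a relative threshold of 0.5 and a single box around all pixels above it (Zhou et al. instead keep the largest connected component above 20% of the maximum). The procedure actually used for Fig. 7 is not specified in this section, and `cam_to_bbox` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def cam_to_bbox(cam: List[List[float]], rel_threshold: float = 0.5
                ) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (row_min, col_min, row_max, col_max), inclusive, around all
    pixels with activation >= rel_threshold * peak activation.

    Returns None when the map has no positive activation or no pixel passes.
    """
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return None
    cut = rel_threshold * peak
    hits = [(i, j) for i, row in enumerate(cam) for j, v in enumerate(row) if v >= cut]
    if not hits:
        return None
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical normalised 4 x 4 CAM with one salient blob.
cam = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.6, 0.0],
    [0.0, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(cam_to_bbox(cam))  # (1, 1, 2, 2)
```

A single enclosing box is the simplest choice; with several separated lesions, a connected-component pass would yield one box per lesion instead.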

Fig. 8

Visualizations of infected/non-infected probabilities of each section for each patient. The -axis of the plot is the section index of the patient. The sub-figures to the right of the probability plot are images sampled from the sections listed above. (a) The first few sections are recognized as COVID-19 positive with high probability; toward the last few sections, no obvious lesions are found, so the positive probability drops drastically. (b) The probabilities of the first three sections are close to 0.5, indicating uncertainty for these sections. (c) In the last few sections, lesions gradually appear in the left and right lower lobes, together with an increased infection probability.

Discussions

Our proposed CIFD-Net sequentially aggregates image-level features within a CT volume to alleviate the multi-domain shift problem. This turns out to be very effective: we have demonstrated that CIFD-Net generalizes better to unseen data domains than other state-of-the-art works. This may be attributed to (1) the -max selection strategy: when optimizing the joint probability, only the top- probabilities within each section are considered, so confounding images are excluded, which can make the prediction more robust; and (2) our loss function, which is designed to model the joint probability of the patient rather than of individual image slices. Compared with naive models, e.g., a plain ResNet-50 trained on single image slices, our model is less likely to overfit to varied image styles and appearances (e.g., assorted textures and contrasts), because it takes into account the relationship between sections and the correlation between images within each section. In addition, we integrated a novel slice noise correction module (SNCM) into CIFD-Net, which adds regularization to the optimization. We argue that this not only boosts image-level classification performance but also leads to more precise localization of lesions. However, since we trained CIFD-Net under the assumption that CT slices are consecutive and lung segments (sections) are ordered, the slice aggregation in the SAM may have difficulty handling disordered CT slices, which may result in less accurate classification.
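
The two mechanisms credited above, top-k selection within each section and a patient-level joint probability over sections, can be illustrated with a small sketch. It scores each contiguous section by the mean of its k highest slice probabilities, so low-scoring confounding slices are ignored, and combines section scores with a noisy-OR, P(positive) = 1 − ∏(1 − s_i), which treats the patient as positive if any section is infected and assumes independence between sections. Both the averaging and the noisy-OR are assumptions chosen for illustration, not CIFD-Net's actual formulation, and the function names and probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def section_scores(slice_probs: Sequence[float], section_size: int, k: int) -> List[float]:
    """Score each contiguous section by the mean of its top-k slice probabilities.

    Slices outside the top-k of their section (e.g. confounding slices with low
    probability) do not influence the section score.
    """
    if section_size < 1 or k < 1:
        raise ValueError("section_size and k must be >= 1")
    scores = []
    for start in range(0, len(slice_probs), section_size):
        section = sorted(slice_probs[start:start + section_size], reverse=True)
        top = section[:k]  # a short final section may hold fewer than k slices
        scores.append(sum(top) / len(top))
    return scores


def patient_probability(scores: Sequence[float]) -> float:
    """Noisy-OR over sections: positive if any section is infected (independence assumed)."""
    return 1.0 - math.prod(1.0 - s for s in scores)


# Hypothetical slice-level probabilities for a 6-slice volume, 2 sections of 3 slices.
probs = [0.9, 0.8, 0.1, 0.2, 0.1, 0.0]
scores = section_scores(probs, section_size=3, k=2)
print(scores)                       # roughly [0.85, 0.15]
print(patient_probability(scores))  # roughly 0.8725
```

Because the sections are contiguous and ordered, shuffling the slices would change which slices share a section, which mirrors the limitation noted above for disordered CT slices.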

Conclusion

In this study, we have proposed a robust COVID-19 recognition model named CIFD-Net, which exploits the ECM to assist radiologists with auxiliary diagnosis. To handle volumetric information, the model adopts the SAM to combine different sections and thereby model the joint probability that the patient is COVID-19 positive. In addition, we extended CIFD-Net with the SNCM to predict single CT slices without any image-level annotations. To evaluate the prediction performance of the proposed model, we conducted comprehensive experiments on publicly available CT datasets. The experimental results verify the superiority of our model over other state-of-the-art methods and show that it addresses the multi-domain shift problem efficiently and effectively.

CRediT authorship contribution statement

Qinghao Ye: Conceived and designed the study, Literature search, Data analysis, Data interpretation, Contributed to the tables and figures, Writing of the report, Writing – review & editing. Yuan Gao: Literature search, Data analysis, Contributed to the tables and figures, Writing of the report. Weiping Ding: Literature search, Contributed to the tables and figures, Writing of the report. Zhangming Niu: Literature search. Chengjia Wang: Literature search. Yinghui Jiang: Literature search, Data analysis. Minhao Wang: Literature search, Data analysis. Evandro Fei Fang: Literature search. Wade Menpes-Smith: Literature search. Jun Xia: Conceived and designed the study, Data collection, Data analysis, Data interpretation, Contributed to the tables and figures, Writing of the report. Guang Yang: Conceived and designed the study, Data analysis, Data interpretation, Contributed to the tables and figures, Writing of the report, Writing – review & editing.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: QY, YJ, and MW are employed by Hangzhou Ocean’s Smart Boya Co., Ltd., China. YG, ZN, and WS are employed by Aladdin Healthcare Technologies, Ltd., UK. YJ and MW are employed by Mind Rank Ltd, Hong Kong. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
