Chunmei He1, Lanqing Zheng1, Taifeng Tan1, Xianjun Fan1, Zhengchun Ye2. 1. School of Computer Science, School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan 411105, China. 2. School of Mechanical Engineering, Xiangtan University, Xiangtan, Hunan 411105, China.
Abstract
The outbreak of COVID-19 threatens the safety of all human beings. Rapid and accurate diagnosis of patients is an effective way to prevent the rapid spread of COVID-19. Current computer-aided diagnosis of COVID-19 requires extensive labeled data for training, which increases human and material resource costs. Domain adaptation (DA), a promising existing approach, can transfer knowledge from richly labeled pneumonia datasets to COVID-19 diagnosis and classification. However, due to the differences in feature distribution and task semantics between pneumonia and COVID-19, negative transfer may reduce performance in diagnosing COVID-19 and pneumonia. Furthermore, in practice the training data is usually mixed with many noise samples, which poses new challenges for domain adaptation. As a kind of domain adaptation, partial domain adaptation (PDA) can avoid outlier samples in the source domain and achieve good classification performance in the target domain. However, existing PDA methods all learn a single feature representation, which captures only local information about the inputs and ignores other important information in the samples. Therefore, multi-attention representation network partial domain adaptation (MARPDA) is proposed in this paper to overcome these shortcomings of PDA. In MARPDA, we construct multiple representation networks with attention to acquire the image representation and effectively learn knowledge from different feature spaces. We design a sample-weighted strategy to achieve partial data transfer and address the negative transfer caused by noise data during training. MARPDA adapts to complex application scenarios and learns fine-grained image features from multiple representations. We apply the model to classify pneumonia and COVID-19, respectively, and evaluate it both qualitatively and quantitatively.
The experimental results show that our classification accuracy is higher than that of the existing state-of-the-art methods. The stability and reliability of the proposed method are validated by the confusion matrix and the performance curve experiments. In summary, our method performs better at diagnosing COVID-19 than the existing state-of-the-art methods.
The COVID-19 pandemic is seriously threatening global public health. A key challenge in the fight against COVID-19 is to diagnose patients in a large population, provide the necessary medical treatment and prevent further spread of the virus. The growing number of available X-ray images [1] provides a basis for computer-aided diagnosis. Diagnosis of COVID-19 based on chest X-ray images is a standard image classification problem.

In recent years, deep learning techniques have shown promising results on radiological tasks by automatically analyzing medical images. With the outbreak of COVID-19, many scholars have sought to perform classification tasks on the collected datasets. Most methods train network weights on a small number of annotated datasets. Existing supervised learning methods have achieved a great deal in COVID-19 diagnosis, but they often require large amounts of labeled data to train models [2]. Unfortunately, collecting a large amount of annotated data usually requires substantial human and financial resources, and it is impractical for physicians to label large numbers of COVID-19 samples. DA [3] can effectively address this problem. DA transfers knowledge from an existing domain with a large amount of labeled data (source domain) to another domain with unlabeled data (target domain), completing the learning task in the target domain and compensating for its lack of labeled samples. According to the relation between the label spaces of the source and target domains, DA is divided into closed-set DA, PDA and open-set DA. In closed-set DA, the label spaces of the source domain and target domain are the same. In practice, however, the target domain label space is often a subset of the source domain label space; this kind of DA is PDA. In open-set DA, the label spaces of the source domain and target domain are different, and the intersection of the two label spaces is not empty. PDA is a special case of open-set DA. 
The problem setting of PDA is more common in practice, and in this paper we focus on PDA. In real scenarios, the source domain contains more or less noise data, whose presence affects the learning process. In this paper, we group the noise data of the source domain into a separate class, called the noise class. In this case, the number of target domain classes is smaller than the number of source domain classes, which fits the problem hypothesis of PDA. In general, PDA needs to face the following key issues: (1) target domain data without labels; (2) differences in feature distribution between the source domain and target domain; (3) interference of noise class data in the source domain. The first two problems are generally addressed in DA, while the third is specific to PDA.

During the learning process, PDA methods transfer part of the source domain knowledge to the target domain and realize cross-domain and cross-task learning. The adaptive process of PDA is explained in Fig. 1. Although the existing PDA methods achieve good results, both high-level semantic information and high-level characteristics of images are particularly important in medical diagnosis because of the particularity of medical images. Therefore, single-representation learning struggles to perform well in medical diagnosis problems. Inspired by fine-grained segmentation of images, we design a multi-feature representation network with an attention mechanism in PDA. Learning features from multiple representations can effectively improve the classification performance of PDA. Therefore, how to extract better medical data characteristics and realize knowledge transfer is particularly important.
Fig. 1
Adaptive process of partial domain adaptation, where (a) is the effect of applying traditional domain adaptation methods to partial sets, and (b) is the effect of applying partial domain adaptation methods. In (a), the outlier class samples are misaligned with the target samples (dashed line part); in (b), rejecting the outlier classes allows the corresponding class samples of the source and target domains to be aligned correctly (dashed area).
Our goal is to train a deep model on the labeled source domain, transfer to the target domain and make it perform well on the target domain. Our method is illustrated in Fig. 2. MARPDA attempts to achieve partial knowledge transfer in a sample-weighted strategy and carry out cross-domain knowledge transfer through adversarial learning. To sum up, this paper makes the following contributions.
Fig. 2
The overview of MARPDA.
(1) We propose a multi-attention representation network for partial domain adaptation (MARPDA) to effectively solve the problem of cross-domain distribution differences and label space mismatch.
(2) We design a new and more realistic application scenario. The model learns knowledge on the source domain and transfers it to the target domain.
(3) Extensive experiments on public COVID-19 chest X-ray image classification datasets demonstrate that MARPDA outperforms several state-of-the-art baselines.
Related work
Radiological diagnosis is a convenient medical technique for patients suspected of COVID-19 who urgently need a diagnosis. X-ray scans are widely used and give radiologists a compelling means of diagnosing COVID-19. Many researchers have proposed COVID-19 detection methods based on image analysis. A detailed study is conducted in [4] to illustrate the importance of early detection and management of COVID-19 patients. In most cases, deep learning based approaches are applied and achieve very promising detection accuracy for COVID-19. In [5], a model framework is designed to assist radiologists in the automatic diagnosis of COVID-19 in X-ray images. In [6], a method is proposed to identify COVID-19 patients with multi-task deep learning networks. Although these methods achieve remarkable results, they rely on large labeled datasets. The medical field often lacks labeled samples, and labeling them is difficult and expensive. As an emerging technology, transfer learning can effectively deal with these problems.

Transfer learning (TL) [7] is widely applied in multiple fields, such as image recognition [8]. As a branch of transfer learning, DA [3] adapts knowledge between two related but different domains. Most existing domain adaptation methods for images [9] seek to alleviate domain differences by adding adaptive layers to match the higher-order moments [10] of the distributions, or by designing domain discriminators to learn domain-invariant features [11]. Many deep learning and machine learning methods have been proposed for COVID-19 medical image analysis. In [12], the authors propose a semi-supervised method for COVID-19 diagnostic tasks. In [13], a cross-domain and cross-task learning strategy is proposed to achieve knowledge transfer. In [14], the authors learn to diagnose COVID-19 from pneumonia cases through unsupervised meta-learning. 
These methods take into account the data distribution and task differences between pneumonia and COVID-19 and propose specific countermeasures. However, in addition to cross-domain distribution differences and cross-task differences, we also need to consider the noise data in the training data, a problem ignored by the existing methods.

Learning knowledge transfer in noisy environments adds challenges to the existing learning tasks. The proposal of PDA [15], [16] inspires us to face this challenge. The setting of PDA is closer to reality than that of DA, as it assumes that the target label set is a subset of the source domain label set. Traditional DA methods cause negative transfer in this setting because of the label set mismatch. PDA, in contrast, realizes cross-domain partial knowledge transfer by matching the data that share the same labels across the two domains. Among existing PDA methods, partial adversarial domain adaptation [17] implements PDA in a source-domain class-weighted way, while the example transfer network [18] uses a sample-weighted strategy. The existing methods consider only a single feature representation of the input sample, so some important information is neglected. We propose a multi-attention representation network for PDA to achieve COVID-19 diagnosis. The multi-attention representation network learns features in medical data from multiple perspectives, captures the raw data more comprehensively and improves local feature learning, whereas the previous methods learn from only a single structure. In addition, we leverage adversarial learning to reduce cross-domain distribution differences and combine it with a sample-weighted strategy for cross-task learning. Experimentally, our approach achieves strong results in transfer tasks for the diagnosis of pneumonia and COVID-19.
Proposed method
In this section, the problem definition is presented first. Then we introduce the proposed method, MARPDA.
Problem definition
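According to the setting of PDA, there is a source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ of $n_s$ labeled examples drawn from $|\mathcal{C}_s|$ classes and a target domain $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$ of $n_t$ unlabeled examples drawn from $|\mathcal{C}_t|$ classes. The two domains share an identical feature space, while the source domain label space $\mathcal{C}_s$ subsumes the target domain label space $\mathcal{C}_t$, i.e., $\mathcal{C}_s \supset \mathcal{C}_t$. Data in the source domain is sampled from distribution $p$ and data in the target domain is sampled from distribution $q$, with $p \neq q$.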
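As a small illustration of how the label-space relation decides the DA setting, the following sketch names the setting implied by two label sets. The function name `da_setting` and the integer class labels are illustrative assumptions, not part of the original method.

```python
def da_setting(source_labels, target_labels):
    """Name the DA setting implied by two label sets (taxonomy used in this paper)."""
    cs, ct = set(source_labels), set(target_labels)
    if cs == ct:
        return "closed-set DA"
    if ct < cs:  # target label space is a proper subset of the source label space
        return "partial DA"
    if cs & ct:  # label spaces differ but overlap
        return "open-set DA"
    return "disjoint label spaces"


# Source classes: negative (0), positive (1), noise (2); target classes: negative, positive.
print(da_setting({0, 1, 2}, {0, 1}))  # partial DA
```

The partial case is tested before the open-set case because, under the taxonomy above, a subset relation also satisfies the open-set condition of a non-empty intersection.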
We propose a deep PDA method for diagnosing COVID-19, namely multi-attention representation network partial domain adaptation (MARPDA). The overview of MARPDA is shown in Fig. 2. The proposed MARPDA contains two main processes: (1) feature extraction; (2) partial features alignment. Note that the representations learned from the individual substructures are finally concatenated together as the input to partial features alignment.

(1) Feature extraction

Feature extraction in MARPDA includes two parts: Resnet and the multi-attention representation networks. We use the Resnet network to obtain the initial features, and the multi-attention representation networks then carry out further feature extraction. Next, we focus on the multi-attention representation networks and their execution process.

Some recent deep transfer methods [16], [18] use the activation of the global average pooling layer as the image representation and then align the distributions of this single representation. It is known that such a single-representation adaptation approach may miss or ignore some significant information that could improve performance [19], [20]. Thus it is necessary to learn multiple domain-invariant representations by minimizing the discrepancy between the distributions of multiple representations.

To learn multiple different domain-invariant representations, we design a mixed structure (MAR) composed of multiple substructures to extract multiple representations from low-pixel images. As the intuitive example in Fig. 3 shows, MAR has multiple substructures $h_1(\cdot), \ldots, h_n(\cdot)$ ($n$ is the number of substructures), which are different from each other. With the MAR replacing the global average pooling, multiple representations $h_1(R), \ldots, h_n(R)$ can be obtained, where $R$ represents the output of the Resnet. Compared with the single representation, the multiple representations can cover more information. 
Hence, aligning the distributions of the multiple representations with more information can achieve better performance.
Fig. 3
Substructure diagram, where A represents the Attention module.
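The substructure-plus-attention idea of Fig. 3 can be sketched as follows, in plain Python on nested lists. This is a minimal sketch under stated assumptions, not the authors' implementation: the two substructures (`identity`, `row_smooth`) are hypothetical stand-ins for the real convolutional branches, and the attention module is assumed to be a sigmoid gate on the global max response of each channel.

```python
import math


def channel_attention(fmap):
    """Scale each channel by a sigmoid gate of its global max response (assumed form)."""
    attended = []
    for channel in fmap:  # channel is an H x W list of rows
        peak = max(max(row) for row in channel)
        gate = 1.0 / (1.0 + math.exp(-peak))
        attended.append([[value * gate for value in row] for row in channel])
    return attended


def global_avg_pool(fmap):
    """Collapse every H x W channel to its mean, giving one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def identity(fmap):
    """Hypothetical substructure 1: pass the backbone output through unchanged."""
    return fmap


def row_smooth(fmap):
    """Hypothetical substructure 2: average each value with its right-hand neighbour."""
    return [[[(row[j] + row[min(j + 1, len(row) - 1)]) / 2 for j in range(len(row))]
             for row in ch] for ch in fmap]


def mar(fmap, substructures):
    """Concatenate the attended, pooled output of every substructure."""
    representation = []
    for h in substructures:
        representation.extend(global_avg_pool(channel_attention(h(fmap))))
    return representation


# Toy backbone output R with 2 channels of size 2 x 2.
R = [[[0.0, 1.0], [2.0, 3.0]],
     [[-1.0, 0.0], [0.5, 0.5]]]
z = mar(R, [identity, row_smooth])
print(len(z))  # 4 = 2 substructures x 2 channels
```

The concatenated vector `z` plays the role of the multi-representation feature that is passed on to the classifier and to partial features alignment.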
The fully connected layer recombines the multiple representations, and the softmax layer outputs the predicted labels. Finally, the neural network f(x) with MAR is reformulated as
where $x^s$ and $x^t$ refer to the inputs of the source and target domain, respectively.

Different from previous single-representation adaptation networks, deep transfer networks with MAR can learn multiple domain-invariant representations. The MAR is a multi-representation extractor, and the multiple domain-invariant representations can cover more information. It is worth noting that the MAR can be implemented in most feed-forward models: one simply replaces the last average pooling layer with MAR.

To improve the efficiency of knowledge transfer, we design an attention mechanism for multi-representation learning, as shown in Fig. 3. In a CNN, the max pooling layer obtains the maximal response over local regions. Max pooling can therefore focus on the important information in the features, which helps improve the efficiency of DA. So we apply a max pooling layer on the different substructures to obtain the attention map of each channel and multiply this attention map with the output. This helps the model classify samples at a fine granularity.

(2) Partial features alignment

Thanks to the effectiveness of generative adversarial networks (GAN) [21], many domain adaptation approaches [11] employ a domain adversarial network to extract domain-transferable features, and these methods achieve impressive performance in DA. The adversarial network in MARPDA consists of two parts: a feature extractor (Resnet50 with MAR) and a discriminator. They reduce inter-domain distribution differences by playing a minimax game with the following formula. Since the existence of the source-outlier domain seriously misleads knowledge transfer between the source and target domains, our goal is to identify and distinguish the source-shared domain and the source-outlier domain and then filter out samples in the source-outlier domain. 
Finally, the shared-class samples of the source and target domains are aligned, i.e., partial features alignment.

To address this problem, we seek to filter out outlier class samples using an auxiliary classifier trained on the source domain. Combining the auxiliary classifier and the adversarial network enables partial features alignment. The input of the auxiliary classifier is the representation of all samples in the source domain, and the output is a weight ranging between 0 and 1. The specific network structure is shown in Fig. 4. The input passes through a three-layer fully connected network to output a vector of dimension $|\mathcal{C}_s|$, each element of which represents the probability that the sample belongs to the corresponding category $c$. We finally introduce the leaky-softmax activation in the network. The leaky-softmax activation has the property that the element-sum of its outputs is smaller than 1. Because the auxiliary classifier is trained on the source domain, the source examples have a higher probability of being classified as a specific source class $c$, while the target examples have smaller logits and uncertain predictions. Therefore, the element-sum of the leaky-softmax activation outputs is closer to 1 for source examples and closer to 0 for target examples.
Fig. 4
Network structure of the auxiliary classifier.
When the source domain and the target domain are confused, the shared class samples in the source domain are closer to the target domain data. In contrast, private class samples in the source domain are relatively far from the target domain data, and their distribution is relatively less affected. So the output of the auxiliary classifier for a source domain shared class sample is closer to 0 and smaller than that for a source domain outlier class sample. Thus the auxiliary discriminator is more likely to give a private class sample a larger value and a shared class sample a smaller value. The formula is as follows.

For the auxiliary classifier, we constrain it with two different loss functions, so that it can efficiently identify the origin of each source domain sample. The first constraint function is a classification loss trained on the existing source domain data and labels, and the second constraint function is a second classification loss trained on source and target domain data.

We aim to reduce the contribution of private class samples, so we reverse the output of the auxiliary classifier on the source domain. This yields larger weights for shared class samples in the source domain and weights close to 0 for private class samples. The weight is computed as follows.
Eq. (8) is a weight vector, and each component represents the weight of one source domain sample. We combine the sample weights with adversarial learning to achieve partial features alignment as follows, where the entropy loss on the target data measures prediction uncertainty.
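To make the weighting and the entropy term concrete, here is a minimal sketch under stated assumptions: the weight is taken as one minus the element-sum of the auxiliary classifier's leaky-softmax outputs (the "reversal" described above), and the weights are rescaled to a batch mean of 1, as done in ETN [18]. The rescaling step and the function names are assumptions.

```python
import math


def source_weights(leaky_sums):
    """w_i = 1 - (element-sum of the auxiliary output), rescaled to a batch mean of 1."""
    raw = [1.0 - s for s in leaky_sums]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def entropy(probs, eps=1e-12):
    """Shannon entropy of a predicted class distribution; low means a confident prediction."""
    return -sum(p * math.log(p + eps) for p in probs)


# Two shared-class samples (small auxiliary output) and one private-class sample.
weights = source_weights([0.2, 0.3, 0.9])
print(weights[2] < weights[0])  # True: the private-class sample is down-weighted

# Minimizing entropy on target predictions pushes them away from the uniform case.
print(entropy([0.99, 0.01]) < entropy([0.5, 0.5]))  # True
```

In a training loop, each source sample's contribution to the adversarial and classification losses would be multiplied by its weight, while the entropy term would be averaged over the target batch.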
Experimental results
In this section, we present a number of domain adaptation tasks and experiments to evaluate the performance of our method. First, we introduce the experimental settings. Then the experimental results are evaluated on multiple domain adaptation tasks and compared with existing state-of-the-art methods. In addition, we analyze our experimental results using both figures and tables. The details are shown below.
Experimental setting
(1) Datasets

The dataset is obtained from the open dataset [1], which contains chest X-ray images for COVID-19, pneumonia and normal cases. Some of the data are shown in Fig. 5. We treat COVID-19 and pneumonia as two separate domains and divide the normal chest X-ray images between the two domains. Each domain then contains two categories: 0 (negative) and 1 (positive). In addition, to simulate real application scenarios, for each domain we randomly choose part of the data from each class and corrupt it with Gaussian noise. These data are combined into a separate class: noise. The statistics of the two domains are summarized in Table 1.
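The noise-class construction can be sketched as follows. This is a hedged illustration only: the sampled fraction, the noise standard deviation `sigma`, the pixel range [0, 1] and the function names are assumptions, since the text does not specify them.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def build_noise_class(images_by_label, fraction=0.5, sigma=0.2, noise_label=2, seed=0):
    """Draw a fraction of each class, corrupt it, and relabel it as the noise class."""
    rng = random.Random(seed)
    noisy = []
    for label in sorted(images_by_label):
        images = images_by_label[label]
        chosen = rng.sample(images, int(len(images) * fraction))
        noisy.extend((add_gaussian_noise(img, sigma, rng), noise_label) for img in chosen)
    return noisy


# Toy grayscale "images": four negatives (label 0) and two positives (label 1).
toy = {0: [[[0.2, 0.4], [0.6, 0.8]]] * 4,
       1: [[[0.9, 0.1], [0.5, 0.5]]] * 2}
noise_samples = build_noise_class(toy)
print(len(noise_samples))  # 3 = 2 drawn from class 0 + 1 drawn from class 1
```

Adding the returned samples to the source domain turns a two-class source label space into a three-class one, which is what creates the partial setting.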
Fig. 5
Partial examples of dataset, (a) examples of COVID-19, (b) examples of pneumonia.
Table 1
Statistics for the dataset.
| Domain | #Normal | #Pneumonia | #COVID-19 | #noise | #Total |
| --- | --- | --- | --- | --- | --- |
| Pneumonia | 841 | 895 | – | 1000 | 2736 |
| COVID-19 | 671 | – | 219 | 780 | 1670 |
The following experiments validate the impact of noise data on the knowledge transfer process and show that our method is effective at excluding noise and promoting partial knowledge transfer.

During training, the source domain has labels and the target domain has none. We add the noise class to the source domain, so the number of source domain classes changes from two to three, which is more realistic. Therefore, there are serious domain differences and task differences in the adaptation task. In addition, the class imbalance is also severe, as shown in Table 1. Given the above issues, the diagnostic task of COVID-19 is very challenging.

(2) Compared methods

We compare the proposed method with the following four methods, as shown in Table 2. In Source only, the model is trained only on source domain data, without transfer learning, and directly predicts sample labels on the unlabeled target domain. DANN [11] is a representative method in unsupervised domain adaptation (UDA); it trains the model directly on the source and target domains in an adversarial way. PADA [17] and ETN [18] are representative methods in PDA. The former aligns the data distributions of the same classes in the source domain and the target domain in a class-weighted way, while the latter aligns them in a sample-weighted way.
Table 2
Comparison methods.
| Method | Method name | Source paper |
| --- | --- | --- |
| Source only | Residual Net | Deep residual learning for image recognition |
| DANN [11] | Domain-Adversarial Neural Networks | Domain-adversarial training of neural networks |
| PADA [17] | Partial Adversarial Networks | Partial adversarial domain adaptation |
| ETN [18] | Example Transfer Network | Learning to transfer examples for partial domain adaptation |
(3) Implementation details

For a fair comparison, we adopt a Resnet-50 [22] model, pre-trained on ImageNet [23], as the backbone of all methods. For MARPDA, the details of the network are as follows.

Network architectures: we implement the feature extractor with the multi-representation network structure, and the details are shown in Fig. 3. The classifiers and discriminators are implemented as multi-layer fully connected networks.

Parameter settings: in the training process, we use an SGD optimizer with a learning rate of 0.003 to train the whole network. The batch size for each domain is set to 16.

Evaluation method: we use F1 score (%), Recall (%), Precision (%) and Accuracy (%) as our measures. The details are shown in the Appendix.
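For reference, the four measures can be computed from a binary confusion matrix as sketched below. The standard definitions are assumed here (the paper defers its exact definitions to the Appendix), and the counts in the example are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, Recall, F1 score and Accuracy (all in %) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {name: 100.0 * value for name, value in
            [("precision", precision), ("recall", recall),
             ("f1", f1), ("accuracy", accuracy)]}


def f1_from_pr(precision, recall):
    """F1 score as the harmonic mean of Precision and Recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 true positives, 10 false positives, 5 false negatives, 95 true negatives.
scores = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(round(scores["accuracy"], 2))  # 92.5
```

As a consistency check, `f1_from_pr(89.46, 93.20)` gives approximately 91.29, which agrees with the F1 score of 91.30 reported for MARPDA in Table 3 up to rounding.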
Evaluation of the task from COVID-19 to pneumonia diagnosis under noise (C-P)
In this section, we set up the knowledge transfer task from COVID-19 to pneumonia with a noisy category and evaluate the effectiveness of our approach on this task. We also compare with other methods, including Source only, DANN [11], PADA [17] and ETN [18]. In this task, the source domain data is COVID-19. This domain contains three categories, 0, 1 and 2, which are the negative, positive and noise classes, respectively. The target domain data is pneumonia, which contains two categories, 0 and 1, which are negative and positive, respectively. The main challenges in this task are inter-domain category imbalance, data imbalance and inter-domain differences. The compared results are shown in Table 3, from which we draw the following analysis.
Table 3
The compared results from COVID-19 to pneumonia.
| Method | F1 score (%) | Recall (%) | Precision (%) | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Source-only | 86.14 | 86.35 | 85.93 | 83.06 |
| DANN [11] | 38.28 | 25.62 | 75.64 | 53.13 |
| PADA [17] | 87.5 | 87.0 | 88.0 | 86.06 |
| ETN [18] | 90.61 | 88.67 | 92.65 | 88.58 |
| Ours | 91.30 | 93.20 | 89.46 | 89.06 |
(1) Source-only scores lower than PADA, ETN and MARPDA. Because Source-only is applied directly to the target data after training on the source domain, its results are affected by inter-domain differences. It is therefore unwise to transfer a network model directly without domain adaptation. (2) The F1 score, Recall, Precision and Accuracy of DANN are far lower than those of the PDA methods and Source-only. This is because DANN is a shallow convolutional network whose learning ability is much lower than that of deep networks, whereas Source-only is a standard Resnet50 network, a representative deep network with strong learning ability. Moreover, the knowledge that DANN learns from COVID-19 with a noisy class is difficult to match correctly to pneumonia, so its diagnosis is ineffective. (3) Compared with PADA and ETN, our method (MARPDA) performs better on most metrics. Although our Precision is about 3 percentage points lower than that of ETN, our F1 score and Recall are 91.30% and 93.20%, respectively. Our Recall is about 4.5 percentage points above ETN and about 6 above PADA, and our F1 score is about 0.7 points above ETN and about 4 above PADA. So our method has the advantage in overall performance.
Evaluation of the task from pneumonia to COVID-19 diagnosis under noise (P-C)
In this section, we set up the knowledge transfer task from pneumonia to COVID-19 with a noisy category and evaluate the effectiveness of our approach on this task. We also compare with the existing state-of-the-art methods, including Source only, DANN [11], PADA [17] and ETN [18]. In this task, the source domain data is pneumonia. This domain contains three categories, 0, 1 and 2, which are the negative, positive and noise classes, respectively. The target domain data is COVID-19. This domain contains two categories, 0 and 1, which are negative and positive, respectively. The main challenges in this task are inter-domain category imbalance, data imbalance and inter-domain differences. The compared results from pneumonia to COVID-19 are shown in Table 4, from which we draw the following analysis.
Table 4
The compared results from pneumonia to COVID-19 with noise.
| Method | F1 score (%) | Recall (%) | Precision (%) | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Source-only | 87.75 | 85.15 | 90.53 | 94.27 |
| DANN [11] | 40.78 | 40.68 | 83.8 | 78.87 |
| PADA [17] | 92.94 | 93.15 | 92.73 | 96.18 |
| ETN [18] | 93.67 | 91.13 | 96.35 | 96.14 |
| Ours | 96.52 | 95.85 | 97.2 | 97.98 |
(1) Unlike PADA, ETN and MARPDA, Source-only learns only on the source domain. This knowledge is difficult to transfer to the target domain because of inter-domain differences, which leads to its poorer performance on the evaluation metrics; (2) DANN is a traditional DA method, and its scores are lower than those of the other deep learning methods. This is mainly due to the limited learning capacity of shallow networks and the negative transfer of traditional DA: during adaptation, the private class samples disturb the normal adaptation process; (3) In addition, we compare with the existing state-of-the-art PDA methods. As can be seen from Table 4, the F1 scores of the PDA methods exceed 90% and their classification accuracy exceeds 95%, much higher than those of the traditional DA method, which reflects the advantages of PDA. Our method has an F1 score of 96.52%, about 3 percentage points higher than the existing PDA methods (2.85 above ETN and 3.58 above PADA). This indicates that our method has good overall performance in COVID-19 diagnosis.
Evaluation of the task from pneumonia to COVID-19 diagnosis without noise
In this section, we set up the transfer task from pneumonia to COVID-19 without noise and evaluate the performance of our method under noise-free conditions. This corresponds to the traditional UDA experiments. The source and target domain data have the same label sets, with only two categories: negative and positive. In this case, we focus more on the diagnosis when transferring from pneumonia to COVID-19. We again compare with the other methods. The compared results are shown in Table 5.
Table 5
The compared results from pneumonia to COVID-19 without noise.
| Method | F1 score (%) | Recall (%) | Precision (%) |
| --- | --- | --- | --- |
| Source-only | 90.91 | 92.17 | 89.69 |
| DANN [11] | 78.72 | 90.83 | 69.48 |
| PADA [17] | 93.10 | 91.80 | 94.00 |
| ETN [18] | 96.80 | 97.70 | 95.96 |
| Ours | 97.07 | 98.62 | 95.56 |
MARPDA achieves an F1 score of 97.07% and a Recall of 98.62%, both higher than those of the existing state-of-the-art methods. On the Precision metric, it is slightly lower than ETN. Higher Recall means that positive patients are identified more reliably, which has more practical significance than Precision. From Table 5, we can see that the DA method has certain advantages on the Recall metric in this task. Note that DANN, as a representative of shallow networks, has a relatively low learning ability, so its diagnostic performance is inferior to that of deep networks. However, compared with the previous PDA experiments, DANN achieves a much higher F1 score and Recall here than in Tables 3 and 4. This shows that DANN has some advantages when the categories of the two domains are balanced. From this comparison, we can see that noisy classes do affect the adaptation performance of traditional DA. PADA and ETN, as representative PDA methods, obtain F1 scores of 93.10% and 96.80%, respectively. Our F1 score is 3.97 points and 0.27 points above theirs, respectively. Overall, our method still maintains high accuracy on routine diagnostic tasks.
Ablation studies
To verify the effectiveness of the different components of MARPDA, we carry out ablation experiments in which we remove the substructure and the attention mechanism, respectively. The results are shown in Table 6. In Table 6, wo_substructure denotes the model without the substructure, and only_substructure denotes the model with the substructure but without the attention module. The complete MARPDA contains both the substructure and the attention module, so substructure+attention represents MARPDA.
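Table 6
Results of the ablation experiments.
| Task | Component | F1 score (%) | Recall (%) | Precision (%) |
|---|---|---|---|---|
| COVID-19->pneumonia | wo_substructure | 90.61 | 88.66 | 92.65 |
| COVID-19->pneumonia | only_substructure | 91.3 | 90.58 | 92.10 |
| COVID-19->pneumonia | substructure+attention | 91.3 | 93.20 | 89.45 |
| pneumonia->COVID-19 | wo_substructure | 93.67 | 91.13 | 96.35 |
| pneumonia->COVID-19 | only_substructure | 94.34 | 94.34 | 94.34 |
| pneumonia->COVID-19 | substructure+attention | 96.52 | 95.85 | 97.20 |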
According to Table 6, we can make the following analysis. (1) The diagnostic effect is worst after the substructure is removed. MARPDA without the substructure learns only a single feature representation, which easily ignores part of the important information. Therefore, its F1 score is the lowest on both tasks. (2) With the substructure only, the model learns sample features by integrating multiple representations, which further improves diagnostic performance. The model takes advantage of multi-representation learning and also improves the classification accuracy by about 2 percentage points. (3) The substructure with the attention mechanism pays more attention to data characteristics and improves the learning ability of the model. The complete MARPDA, with both the substructure and the attention module, performs strongly on the F1 score, Recall and Precision metrics. In the COVID-19->pneumonia task, its Precision is lower than that of the substructure-only variant, but this is compensated for by higher Recall, so more positive patients are diagnosed. In the pneumonia->COVID-19 task, the contribution of each component is reflected more clearly.
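To make the component contributions in Table 6 easier to read, the following sketch computes the change in each metric between two ablation variants. The dictionary layout and the helper name `metric_deltas` are illustrative assumptions; the numbers are copied from Table 6.

```python
# Values copied from Table 6 as (F1, Recall, Precision), all in percent.
table6 = {
    "COVID-19->pneumonia": {
        "wo_substructure": (90.61, 88.66, 92.65),
        "only_substructure": (91.3, 90.58, 92.10),
        "substructure+attention": (91.3, 93.20, 89.45),
    },
    "pneumonia->COVID-19": {
        "wo_substructure": (93.67, 91.13, 96.35),
        "only_substructure": (94.34, 94.34, 94.34),
        "substructure+attention": (96.52, 95.85, 97.20),
    },
}

METRICS = ("F1", "Recall", "Precision")


def metric_deltas(task, baseline, variant):
    """Return variant minus baseline for each metric, in percentage points."""
    base = table6[task][baseline]
    var = table6[task][variant]
    return {name: round(v - b, 2) for name, b, v in zip(METRICS, base, var)}


# Compare the complete model against the variant without the substructure.
for task in table6:
    deltas = metric_deltas(task, "wo_substructure", "substructure+attention")
    print(task, deltas)
```

A negative delta marks a metric on which the variant is worse than the baseline, which is the case for Precision in the COVID-19->pneumonia task.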
Performance curves and confusion matrix
In this section, we show the performance curves of MARPDA on different tasks. Fig. 6(a) shows the test accuracy on the pneumonia-to-COVID-19 transfer task, and Fig. 6(b) shows the test accuracy on the COVID-19-to-pneumonia transfer task. As can be seen from the figures, our method converges quickly and its performance is relatively stable.
Fig. 6
Test accuracy plot for MARPDA.
We also give the confusion matrices of our method on different transfer tasks. Fig. 7 shows the confusion matrices of MARPDA trained on the pneumonia-to-COVID-19 transfer without a noise class, where Fig. 7(a) is the confusion matrix for diagnosing pneumonia in the source domain and Fig. 7(b) is the confusion matrix for diagnosing COVID-19 in the target domain. From Figs. 7 and 8, we can see that MARPDA can efficiently detect pneumonia and COVID-19. Fig. 7(a) shows that both negative and positive patients are accurately distinguished, with a classification accuracy of 100%. Fig. 7(b) shows that most patients are correctly identified.
Fig. 7
Confusion matrices of MARPDA from pneumonia to COVID-19 without a noise class: (a) confusion matrix for diagnosing pneumonia directly in the source domain by MARPDA; (b) confusion matrix for diagnosing COVID-19 in the target domain by MARPDA.
Fig. 8
Confusion matrices of MARPDA from pneumonia to COVID-19 with a noisy class: (a) confusion matrix for diagnosing pneumonia directly in the source domain with noise; (b) confusion matrix for diagnosing COVID-19 in the target domain with noise.
In addition to diagnosing patients in noise-free settings, we diagnose patients in the presence of a noisy class. Fig. 8 shows the confusion matrices of MARPDA from pneumonia to COVID-19 with a noisy class. Fig. 8(a) is the confusion matrix for diagnosing pneumonia in the source domain with noise; it can be seen that the model identifies patient data and noisy data almost perfectly. Fig. 8(b) shows the confusion matrix for diagnosing COVID-19 (the target domain, which lacks the noise class) after learning on pneumonia (the source domain). As seen from the figure, images of both classes are correctly classified into the first two categories. Classification accuracy is not affected by the noise class in pneumonia.
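The metrics reported in this paper (Precision, Recall, F1 score and Accuracy) can all be derived from a binary confusion matrix such as those in Fig. 7. The following is a minimal sketch of that computation. The counts used in the example are hypothetical placeholders, not values read from the figures, and the function name `binary_metrics` is an illustrative choice.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the four cells of a 2x2 confusion matrix.

    tp: positive patients predicted positive
    fn: positive patients predicted negative (missed diagnoses)
    fp: negative patients predicted positive (false alarms)
    tn: negative patients predicted negative
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not taken from Fig. 7 or Fig. 8).
example = binary_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because Recall depends only on `tp` and `fn`, a model with few missed positive patients scores high on Recall even when some negative patients are flagged, which is why the two metrics can move in opposite directions.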
Results analysis and discussion
In the above experiments, we validate the effectiveness of our method through quantitative analysis and visualization. MARPDA shows different advantages on different tasks.
In the quantitative analysis, we implement the task of mutual transfer between pneumonia and COVID-19 in noisy and noise-free cases, respectively. We measure the performance of our method on F1 score, Recall, Precision and Accuracy. In both the COVID-19-to-pneumonia and the pneumonia-to-COVID-19 transfer tasks, our method achieves better F1 score and Recall than the existing methods. On Recall in particular, our approach is nearly 4 percentage points higher. In the absence of the noise class, our method still diagnoses better than the existing methods, which is mainly reflected in the F1 score and Recall metrics.
In the qualitative analysis, we verify the effectiveness of our method with performance curves and confusion matrices. The performance curves show that our method is relatively stable and robust during training. In a confusion matrix, the darker the diagonal squares are, the more samples are correctly classified. The confusion matrices output by our model reflect its high classification performance in both the training phase and the test phase.
Overall, traditional DA methods struggle to identify samples accurately under different conditions. Although the existing classical PDA methods have certain advantages in identifying images, there is still room for improvement. MARPDA achieves higher F1 scores than the existing advanced methods in both COVID-19 diagnosis and pneumonia diagnosis. Our method is applicable not only to data with noisy classes but also to noise-free settings, which indicates a certain degree of generalization.
Conclusion
In this study, we propose an end-to-end network, multi-attention representation network partial domain adaptation (MARPDA), for COVID-19 diagnosis. With the rapid spread of COVID-19, rapid diagnosis using deep learning networks has become a trend. However, diagnosing COVID-19 with deep learning networks requires many labeled samples, which are difficult and costly to obtain in practice. Domain adaptation is an effective technique for addressing this problem. Research on traditional closed-set domain adaptation is limited to the case where the source and target domains share the same label space. In more practical settings, the source domain may contain noise-class samples, which can strongly affect knowledge transfer. For this setting, we propose MARPDA, a new deep learning method and a new application scenario for COVID-19 and pneumonia diagnosis. Its goal is to transfer partial knowledge from a label-rich source domain to an unlabeled target domain. Specifically, we minimize the domain discrepancy by aligning the feature distributions of the two domains via domain adversarial learning. We design a multi-representation network with an attention mechanism to better adapt to the properties of medical images. Our method can fully extract the global information in the data from different angles and from multiple receptive fields. Unlike previous single-structure feature extractors, which ignore some important information, the proposed method achieves fine-grained diagnosis. In the new application scenario, we propose a deep partial adaptation network to address the negative transfer caused by noisy data. Extensive experiments demonstrate the effectiveness and superiority of MARPDA. In addition, our method can be applied in traditional scenarios. Overall, our method has a certain generalization ability.
In the future, we can apply MARPDA to COVID-19 diagnosis based on CT images and extend it to segmentation tasks for fine-grained COVID-19 diagnosis.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.