Multi-branch fusion auxiliary learning for the detection of pneumonia from chest X-ray images.

Jia Liu¹, Jing Qi¹, Wei Chen², Yongjian Nian³.

Abstract

Lung infections caused by bacteria and viruses are infectious and require timely screening and isolation, and different types of pneumonia require different treatment plans. Therefore, finding a rapid and accurate screening method for lung infections is critical. To achieve this goal, we propose a multi-branch fusion auxiliary learning (MBFAL) method for pneumonia detection from chest X-ray (CXR) images. The MBFAL method performs two tasks through a double-branch network. The first task is to recognize the absence of pneumonia (normal), COVID-19, other viral pneumonia and bacterial pneumonia from CXR images, and the second task is to recognize the three types of pneumonia from CXR images. The latter task assists the learning of the former task to achieve better recognition. During auxiliary parameter updating, the feature maps of different branches are fused after sample screening through label information to enhance the model's ability to recognize cases of pneumonia without impairing its ability to recognize normal cases. Experiments show that MBFAL achieves an average classification accuracy of 95.61%. The single-class accuracy for normal, COVID-19, other viral pneumonia and bacterial pneumonia was 98.70%, 99.10%, 96.60% and 96.80%, respectively, and the recall was 97.20%, 98.60%, 96.10% and 89.20%, respectively. Compared with the baseline model and with models constructed using each of the above methods separately, MBFAL achieved better results for the rapid screening of pneumonia.

Keywords: Auxiliary learning; Deep learning; Feature fusion; Multi-task learning; Pneumonia

Year: 2022 PMID: 35779478 PMCID: PMC9212341 DOI: 10.1016/j.compbiomed.2022.105732

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 6.698

Introduction

Pneumonia is an acute respiratory tract infection that leads to the production of pus in the alveoli and restricts the oxygen intake of patients. Infectious agents include viruses, bacteria and fungi. Viral pneumonia and bacterial pneumonia are infectious and are the two most common types of pneumonia [1], [2]. Pneumonia caused by a viral infection previously accounted for approximately 30% of all cases. COVID-19 rapidly became a global epidemic after its emergence [3], [4], with a cumulative total of 437.37 million infections and 5.97 million deaths as of March 1, 2022 [5]. For infectious diseases, rapid screening and isolation are the keys to controlling the spread of the epidemic, and different types of infections correspond to different clinical treatment plans. Early detection of pathogens and timely targeted treatment are also conducive to the recovery of patients. The gold standard for the diagnosis of different types of pneumonia is a pathogen culture, but the culture time is long, the false negative rate is high, there are strict requirements related to the equipment, environment and operator, and it is difficult to distinguish whether the pathogen is a bacterium or a virus [6]. The gold standard for COVID-19 screening is reverse transcriptase-polymerase chain reaction (RT-PCR), which takes 1 to 3 h to process and has poor real-time performance and low sensitivity to early infection [7], [8]. Fang et al. [9] showed experimentally that RT-PCR achieved a sensitivity of only 71% for detecting early COVID-19 infection. Therefore, medical imaging is also an important tool for the auxiliary screening of pneumonia. Among the various medical imaging methods, computerized tomography (CT) and chest X-ray (CXR) imaging are the most effective. CT is more sensitive than CXR for screening diseases, but its imaging quality is affected by the radiation dose, and there are risks of cross-infection during the imaging process.
The portability of CXR imaging devices makes it possible to scan a patient in each isolation ward in approximately 15 s, reducing the risk of cross-infection. In addition, CXR is low cost, simple to operate, and has a wide range of applications [10], [11], [12], [13]. CXR images of different types of pulmonary infections have distinguishable characteristics, but as shown in Fig. 1, viral pneumonia and bacterial pneumonia are both inflammatory diseases, and their imaging findings usually have similarities [14]. For pneumonia screening through CXR images, radiologists need a great deal of time to make a diagnosis. Moreover, a diagnosis is prone to error, and it is difficult to provide a definite explanation for some features that are easily confused on the images [15]. When artificial intelligence technology is used to assist manual diagnosis, the CXR images of patients are diagnosed automatically and then confirmed by doctors; this process results in a more rapid and accurate approach to pneumonia screening [16].

Fig. 1

Examples of CXR images of different types of pneumonia. a. COVID-19, b. Other viral pneumonia, and c. Bacterial pneumonia. There are some similarities among the features of the three images.

Convolutional neural networks (CNNs) have been used to process a variety of medical images, including X-rays, through transfer learning [17], [18]. In CXR images, the signs of COVID-19 are similar to those of other viral pneumonia and bacterial pneumonia to a certain extent. It is also important to identify COVID-19, other viral pneumonia and bacterial pneumonia separately during pneumonia screening [19]. Therefore, the screening process of pneumonia involves the identification of multiple categories. Traditional deep learning methods usually regard the identification of multiple categories as a multi-classification task and use a vanilla network, i.e., a single-task network [20]. For pneumonia screening, normal and abnormal samples can be easily distinguished. How to make full use of the subtle differences among the image features of different pneumonia types to improve overall identification remains an open research question. Inspired by the prior-attention residual learning (PARL) architecture proposed by Wang et al. [19] and by multi-task learning (MTL) and auxiliary learning (AL) strategies [21], [22], we propose a multi-branch fusion auxiliary learning (MBFAL) strategy to identify normal, COVID-19, other viral pneumonia, and bacterial pneumonia samples from CXR images:

1. The multi-task learning strategy was achieved by using a multi-branch network. To improve the performance of the model on the validation dataset as much as possible, the network was designed with a primary task branch for identifying the four categories and an auxiliary task branch for identifying the different types of pneumonia. The auxiliary task branch consists of a detection branch that distinguishes COVID-19 from other pneumonia and a subclassification branch that distinguishes other viral pneumonia from bacterial pneumonia.
2. The PARL architecture, which focuses the model on the more easily identifiable region through attention map transmission between the two branches, was applied to the auxiliary task, and the feature maps from different branches in the auxiliary task were fused by 1 × 1 convolution [23] to enhance the features of different pneumonia types and improve the pneumonia recognition ability of the model [19].
3. When the auxiliary parameters were being updated, the label information was used to screen the pneumonia samples, and the feature maps of the auxiliary branch and primary branch were fused. Then, the final identification result was obtained through the classifier, and the auxiliary parameters were updated through implicit differentiation optimization [24]. Through the influence of the auxiliary parameters on the primary parameters, the pneumonia recognition ability of the model could be improved while damaging its ability to recognize normal samples as little as possible.
4. The primary task loss and the auxiliary task loss were nonlinearly combined through a network that combines all losses to learn the deep relationship between the two tasks [21].

The structure of this paper is as follows. In the second section, related work on the methods involved is summarized. In the third section, the proposed MBFAL method is introduced. In the fourth section, the performance of the method is tested and compared with that of other methods. In the fifth section, the experimental results are analyzed, and some of the limitations of the current work are explained. In the sixth section, the research is summarized.

Related work

Multi-task learning [25] is used to obtain the outputs of multiple tasks simultaneously through a single network, and it improves the performance of the model on each task by exploiting mutual information between tasks [20]. Auxiliary learning is a branch of multi-task learning that is used to improve the prediction or generalization ability of the model on the primary task through one or more auxiliary tasks [26]. The main difference between the two approaches is that multi-task learning must consider the performance of the model on all tasks, while auxiliary learning only needs to consider the performance of the model on the primary tasks. The key problems of auxiliary learning are how to design the auxiliary tasks and how to integrate the losses of the tasks so as to prevent negative transfer from the auxiliary task to the primary task and maximize the positive effect [27], [28].

The multi-task model is easily affected by the weight allocation among different tasks. Kendall et al. [29] proposed weighting the task losses by their homoscedastic uncertainty. Chen et al. [30] proposed the GradNorm algorithm to balance the weight of each task by dynamically adjusting the gradient. This method requires access to the internal gradients of the network, which is difficult to implement. Liu et al. [22] proposed the dynamic weight average (DWA) algorithm based on GradNorm. DWA only needs the loss value of each task: the rate of change of each task's loss is calculated, and the losses are weighted and averaged over time to dynamically balance the importance of each task. These methods are based on the assumption that all tasks are equally important, so they cannot achieve the purpose of auxiliary learning. To prevent negative transfer from the auxiliary tasks to the primary tasks, Du et al. [26] proposed determining whether an auxiliary task would work against the optimization of the primary task by calculating the cosine similarity between the loss gradients of the auxiliary tasks and the primary tasks, and then weighting them accordingly. Lin et al. [31] proposed that the dot product between the loss gradient of the auxiliary task and the loss gradient of the primary task could be used to determine whether the auxiliary task was helping the primary task to reduce its loss. Both approaches also require access to gradients and do not consider the relationships between auxiliary tasks. Navon et al. [21] argued that linear weighting could only learn shallow relationships between tasks, and that a simple multi-layer perceptron (MLP) with a nonlinear activation function could adaptively perform a nonlinear fusion of losses, thus capturing a deeper relationship between auxiliary tasks and primary tasks. In our study, the method proposed by Navon et al. [21] was chosen to fuse losses.

The design of auxiliary tasks mainly depends on the completeness of prior knowledge. Liu et al. [29] proposed that, when it is uncertain which auxiliary task is effective, the primary network can be used to obtain the predicted values of the primary task and the auxiliary task from one input, and an additional network can then learn the auxiliary task labels, represented as probabilities, so that an auxiliary task is learned adaptively without manual design. These researchers optimized on the training data, which may lead to auxiliary degradation [21]. For our four-category task, the difficulty lies in the recognition of different pneumonia types, so the auxiliary task was designed as the three-category task of recognizing COVID-19, other viral pneumonia and bacterial pneumonia.
Auxiliary learning strategies involve a bi-level optimization problem in parameter updating, that is, one optimization problem is used as a constraint to solve another optimization problem [32]. Bi-level optimization involves the implicit function theorem (IFT), which requires calculating the inverse Hessian matrix with respect to the weights of the neural network. However, in modern neural networks, the large number of parameters makes it difficult to calculate the inverse Hessian matrix directly, so a method is needed to approximate it [33]. Rajeswaran et al. [34] used the conjugate gradient (CG) method to approximate the inverse Hessian matrix. Inspired by the unrolled differentiation algorithm [35], Lorraine et al. [24] used the Neumann series and the Jacobian matrix to approximate this matrix. Experiments show that this method can be applied to large neural networks and is more stable than the CG method.

Because datasets are difficult to acquire, pneumonia detection based on CXRs usually applies a network pre-trained on natural images through transfer learning and then makes some targeted improvements [36]. Teja et al. [16] directly used the VGG-16 network pre-trained on the ImageNet dataset to classify COVID-19, non-COVID-19 and pneumonia patients, achieving 92.5% classification accuracy. Wang et al. [37] developed COVID-Net based on ResNet for the three-class task of non-infection, non-COVID-19 infection, and COVID-19 infection, and released a large, publicly available and constantly updated dataset called COVIDx. Al-Waisy et al. [38] used ResNet34 and HRNet to detect COVID-19 and then fused these networks at the decision level to obtain the final classification results. However, these researchers only addressed the binary classification of normal versus abnormal. Li et al. [39] proposed COVID-GATNet based on DenseNet and the Graph Attention Network (GAT), which improved the classification accuracy by 1 percentage point compared with that of COVID-Net.
Karthik et al. [40] performed lung segmentation through a pre-trained network and then completed classification using the proposed shuffled residual CNN. Methods that segment first and then classify cannot be applied end to end. In addition, there are some methods based on feature extraction [41], [42], in which image features such as texture are usually extracted manually and neural networks are then used for classification. These methods add manual operations, and their results are no better than those obtained using neural networks directly.

Among attention mechanisms, the most representative spatial soft attention mechanisms that do not involve reinforcement learning are deformable convolutional networks [43] and self-attention mechanisms [44]. Lin et al. [45] proposed AANet based on these two attention mechanisms to identify normal, COVID-19 and other pneumonia. Zhang et al. [46] proposed obtaining the spatial probability distribution of the output feature maps of each set of convolutions with the SoftMax function and then superimposing it on the original map through residual connections to generate an attention map, so that the network can focus on the lesion area. Inspired by this, Wang et al. [19] proposed the prior-attention residual learning (PARL) framework for detecting COVID-19 from CT scans. Through multi-task learning, attention was passed between different tasks to further focus the network on the pneumonia lesion area. The prediction results of each task are concatenated, and the final classification results of normal, COVID-19 and other viral pneumonia are then output through a linear layer to improve the classification accuracy. However, this approach sacrifices the model's ability to recognize normal samples to a certain extent.
Previous approaches have mostly focused on the binary task of distinguishing normal from abnormal or on the three-class task that adds other pneumonia, where the other pneumonia is mostly other viral pneumonia, bacterial pneumonia, or a mixture of the two. The CXR imaging features of other viral pneumonia and bacterial pneumonia are difficult to distinguish, so it is more meaningful to identify the two as separate categories. In our approach, the auxiliary branches are designed as PARL structures, and attention maps are transmitted from the branch for COVID-19 versus other pneumonia to the branch for other viral pneumonia versus bacterial pneumonia, so that the network focuses on the regions that differ between the three branches.
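As a rough illustration of the DWA weighting rule summarized earlier in this section, the following Python sketch computes task weights from the loss values of the two preceding epochs only. The function name `dwa_weights`, the temperature default and the example losses are assumptions made for illustration and are not taken from this paper.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic weight average sketch.

    prev_losses[k] and prev_prev_losses[k] are the losses of task k at the
    two preceding epochs. A task whose loss is falling more slowly gets a
    larger weight. The weights sum to the number of tasks.
    """
    # Rate of change of each task loss (values below 1 mean the loss fell).
    ratios = [cur / old for cur, old in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    return [num_tasks * e / total for e in exps]


# Hypothetical losses: task 0 has stalled, task 1 has halved its loss.
weights = dwa_weights([1.0, 0.5], [1.0, 1.0])
print(weights)
```

Because the rule only needs scalar loss values, it avoids the access to internal gradients that GradNorm requires, but it still treats all tasks symmetrically.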

Method

Brief introduction

The purpose of our study was to classify CXR images into four categories: normal, COVID-19, other viral pneumonia, and bacterial pneumonia. It is usually easy to distinguish normal results from abnormal results (COVID-19 pneumonia, other viral pneumonia, and bacterial pneumonia), so the auxiliary task is designed to identify the three types of pneumonia individually. The proposed MBFAL method is built on ResNet34 and ResNet18 pre-trained on the ImageNet dataset. The primary branch uses the structure of ResNet34, and the auxiliary branch uses the last residual block group of two ResNet18 networks and the NDDR [23] feature fusion layer. The two branches share the feature maps before the last residual block group. The residual structure makes it easy to build and modify a network of arbitrary depth, and the layer depth was chosen to achieve the best performance with the fewest parameters. The two residual block groups of the auxiliary branch are used to distinguish COVID-19 from other pneumonia, and other viral pneumonia from bacterial pneumonia, respectively. The PARL strategy is applied: the attention map of the output feature map of the former is obtained through the SoftMax layer and then superimposed on the feature map of the latter to enhance the classification ability of the model. Finally, through label recognition, feature map fusion is carried out without changing the COVID-19 feature maps.

To fit the validation dataset as well as possible, the auxiliary branch is optimized every certain number of iterations using a small set of auxiliary data split from the training set [21]. In this process, the feature maps of the primary branch and the auxiliary branch are fused, with label recognition used to leave the normal samples unchanged. Because the data are input in batches, the samples are checked one by one at the feature layer of the auxiliary branch to determine whether they are normal samples; if so, they do not pass through the feature layer of the auxiliary branch. The purpose of this process is to prevent the auxiliary branch from interfering with the features of the normal samples and to maintain the model's ability to distinguish normal from abnormal samples. However, the identification process of the primary network does not involve label-based selection. The respective losses of the two tasks are fused through the Loss Combine Net. The overall process is shown in Fig. 2.

Fig. 2

The overall flow chart of the proposed method. Here, F1 and F2 are the output feature maps of the two branches within the auxiliary branch. If the input sample belongs to the COVID-19 category, F1 is concatenated with itself in the second dimension (feature channel dimension); otherwise, F1 is concatenated with F2, and then auxiliary features are obtained through the feature fusion layer formed by the 1 × 1 convolution. PF and AF are the primary features and the auxiliary features, and the predicted value is obtained by corresponding classifier respectively. The primary parameters are updated by loss backpropagation after the losses of the two branches are calculated and fused. When updating the auxiliary parameters, PF and AF are fused on the premise of keeping the normal sample feature maps unchanged; and then using the fusion feature for primary task classification to improve the model performance. Only the primary branch is required for testing.
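The label-based fusion rule described in the Fig. 2 caption can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: feature maps are nested lists shaped [channel][row][column], the helper names and the 1 × 1 convolution weights are hypothetical, and the COVID-19 label value 1 follows the category numbering used in Table 2.

```python
COVID_LABEL = 1  # category number for COVID-19 (see Table 2)


def concat_channels(a, b):
    """Concatenate two [C][H][W] feature maps along the channel axis."""
    return a + b


def conv1x1(feature, weights):
    """1 x 1 convolution: a per-pixel weighted sum over input channels.

    feature is [C_in][H][W]; weights is [C_out][C_in] (no bias term).
    """
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [
                sum(w * feature[c][y][x] for c, w in enumerate(row))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for row in weights
    ]


def auxiliary_features(f1, f2, label, weights):
    """Fuse F1 with itself for COVID-19 samples, otherwise F1 with F2."""
    partner = f1 if label == COVID_LABEL else f2
    return conv1x1(concat_channels(f1, partner), weights)


# Hypothetical single-channel 1 x 2 feature maps and averaging weights.
f1 = [[[1.0, 2.0]]]
f2 = [[[10.0, 20.0]]]
weights = [[0.5, 0.5]]
print(auxiliary_features(f1, f2, 1, weights))  # COVID-19 sample
print(auxiliary_features(f1, f2, 3, weights))  # bacterial sample
```

Concatenating F1 with itself keeps the input width of the fusion layer fixed, so the same 1 × 1 convolution can serve both cases without F2 influencing the COVID-19 features.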

Network structure

In our study, two networks were used: a multi-branch convolutional neural network for classification, and a multi-layer perceptron (MLP) for the fusion of primary and auxiliary losses during training. The primary and auxiliary parameters are updated separately by two optimizers.

Classification network

MBFAL is a multi-task structure that is realized through a multi-branch network, whose branches we call the primary branch and the auxiliary branch. Auxiliary tasks are realized through a branch of the primary network, which can be regarded as a feature extractor for pneumonia samples. The extracted features are then fused with the features of the primary network before the classifier. The primary branch is ResNet34, and the auxiliary branch is a PARL structure composed of the last residual block group of two ResNet18 networks, as shown in Fig. 3. The lower part of the network structure is the primary branch, and its parameters are called the primary parameters $W$; the upper part of the network structure is the auxiliary branch, and its parameters are called the auxiliary parameters Part 1, $\phi_1$. The output of each branch is a two-dimensional array containing scores for each category:

$$\hat{y}_p = f_p(x_s; W), \qquad \hat{y}_a = f_a(x_s; \phi_1)$$

where $f_p$ represents the primary branch and $f_a$ represents the auxiliary branch, and $x_s$ are the shared features between tasks obtained through the shared part of the network, as shown in Fig. 2.

Fig. 3

The predicted value of the network is mapped to the category probability distribution by SoftMax, and then Focal Loss [47] is used to calculate the loss value between the predicted value and the true label. The primary branch and the auxiliary branch yield the primary loss and the auxiliary loss, respectively, which are concatenated. Then, the loss combine network strategy proposed by Navon et al. [21] is used for the loss fusion. This network is a simple MLP. By introducing a nonlinear activation function and multiple mappings between linear layers to capture the deep relationship between losses, more meaningful fusion results can be obtained:

$$\mathcal{L} = g\left([\ell_p, \ell_a]; \phi_2\right)$$

where $\ell_p$ and $\ell_a$ represent the primary loss value and the auxiliary loss value, respectively. These loss values are concatenated in the first dimension and then fused into a single loss value through the Loss Combine Net $g$. Finally, $\phi_2$ represents the parameters of the Loss Combine Net. The combined loss process is shown in Fig. 4. The parameters of the Loss Combine Net are called auxiliary parameters Part 2, $\phi_2$; concatenated with the auxiliary parameters Part 1, $\phi_1$, they form the auxiliary parameters $\phi$, which are optimized as a whole.

Fig. 4

The loss fusion process and the structure of the Loss Combine Net. Softplus is a nonlinear activation function that introduces nonlinearity into a traditional MLP, making the loss weighting nonlinear. As a result, this process is more likely to find optimal weighting results than simple linear weighting. There is also a residual structure in the Loss Combine Net, in which the primary loss is added to the fusion loss to emphasize the primary task and prevent the network from favoring the auxiliary task.

Fig. 3 shows the MBFAL structure. The primary network structure has the primary parameters $W$. The remaining part forms the auxiliary network structure, and its parameters are called auxiliary parameters Part 1, $\phi_1$.
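The loss pipeline described in this section (SoftMax, Focal Loss, then a small Softplus MLP with a residual connection to the primary loss) can be sketched in plain Python as follows. This is only an illustration under assumptions: the focusing parameter `gamma=2.0`, the single hidden layer, and all weight values are hypothetical, since the text does not give the actual layer sizes or hyperparameters.

```python
import math


def softmax(scores):
    """Map raw class scores to a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(scores, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(scores)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def combine_losses(primary, auxiliary, w1, b1, w2, b2):
    """One-hidden-layer Softplus MLP over [primary, auxiliary].

    The primary loss is added back to the MLP output (residual
    connection) so that the primary task stays emphasized.
    """
    hidden = [
        softplus(row[0] * primary + row[1] * auxiliary + bias)
        for row, bias in zip(w1, b1)
    ]
    fused = sum(w * h for w, h in zip(w2, hidden)) + b2
    return primary + fused


# Hypothetical scores for the four-class primary task and the
# three-class auxiliary task, and hypothetical MLP weights.
loss_p = focal_loss([2.0, 0.5, 0.1, -1.0], target=0)
loss_a = focal_loss([0.3, 1.2, -0.4], target=1)
total = combine_losses(loss_p, loss_a, [[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0], [0.1, 0.1], 0.0)
print(loss_p, loss_a, total)
```

With all MLP output weights at zero the combined loss reduces to the primary loss, which is one way to see how the residual term anchors training to the primary task.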

Updating parameters

The model has two sets of parameters: the primary parameters $W$ and the auxiliary parameters $\phi$. The performance of the model is influenced by both simultaneously. Network parameter updating depends on the gradient of the loss value with respect to the parameters. To fit the validation dataset as well as possible, the training data are further divided into a training set and an auxiliary set, and the two datasets are input into the network separately to obtain two loss values, which are called the training loss and the auxiliary validation loss. Then, the two losses are used to calculate the gradients of the primary parameters and the auxiliary parameters, respectively, to update them. During this process, there is a bi-level optimization problem when the auxiliary validation loss is used to update the auxiliary parameters. That is, when the auxiliary validation loss is differentiated with respect to the auxiliary parameters, the auxiliary validation loss is affected by both the primary parameters and the auxiliary parameters. In addition, the primary parameters are affected by the auxiliary parameters. As a result, there is an implicit function relationship between the primary parameters and the auxiliary parameters:

$$\frac{d\mathcal{L}_V}{d\phi} = \frac{\partial \mathcal{L}_V}{\partial \phi} + \frac{\partial \mathcal{L}_V}{\partial W^*}\,\frac{\partial W^*}{\partial \phi}$$

where $\mathcal{L}_V$ is the auxiliary validation loss and $W^*$ is the optimal primary parameter, that is, the current primary parameter when the auxiliary parameter is updated. $\partial \mathcal{L}_V / \partial \phi$ and $\partial \mathcal{L}_V / \partial W^*$ can be easily obtained by direct differentiation of the loss with respect to the parameters. The key point is how to calculate $\partial W^* / \partial \phi$: this requires the inverse-Hessian matrix of the loss with respect to the primary parameters [24], which is impractical to compute exactly in convolutional neural networks but can be approximated by the Neumann series [48]:

$$\frac{\partial W^*}{\partial \phi} = -\left[\frac{\partial^2 \mathcal{L}_T}{\partial W\,\partial W^{\top}}\right]^{-1} \frac{\partial^2 \mathcal{L}_T}{\partial W\,\partial \phi^{\top}}$$

$$\left[\frac{\partial^2 \mathcal{L}_T}{\partial W\,\partial W^{\top}}\right]^{-1} \approx \sum_{j=0}^{i}\left[I - \frac{\partial^2 \mathcal{L}_T}{\partial W\,\partial W^{\top}}\right]^{j} \tag{7}$$

where $\mathcal{L}_T$ is the training loss, $\left[\partial^2 \mathcal{L}_T / \partial W\,\partial W^{\top}\right]^{-1}$ is the inverse-Hessian matrix of the primary parameter gradient, Eq. (7) represents the process of approximating the inverse-Hessian matrix with the Neumann series, and $i$ is the order of the Neumann series.
The overall process of network training is as follows. The data are divided into a training set, an auxiliary set and a validation set. The training set is fed into the classification network to obtain the predicted values of the primary task and the auxiliary task; the respective loss values are calculated and fed into the Loss Combine Net to obtain the fusion loss value. In each iteration, the primary parameters are updated by stochastic gradient descent through direct backpropagation. The auxiliary set is fed into the network every 30 iterations to obtain the auxiliary validation loss and the training loss of the current iteration. The primary parameters are still updated directly, whereas the auxiliary parameters are updated manually from the gradient obtained by calculating the derivative of each part step by step. This process is repeated until the loss value converges.
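The Neumann-series approximation used when updating the auxiliary parameters can be sketched on a tiny explicit matrix as follows. This is an illustrative sketch, not the authors' code: a real network would use Hessian-vector products from automatic differentiation rather than a stored Hessian, and the function names and the scaling factor `alpha` are assumptions.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def neumann_inverse_hvp(hessian, vec, order, alpha=1.0):
    """Approximate H^{-1} v by alpha * sum_{j=0}^{order} (I - alpha*H)^j v.

    Only matrix-vector products are needed, so H^{-1} is never formed.
    """
    term = list(vec)   # (I - alpha*H)^0 v
    total = list(vec)  # running sum of the series
    for _ in range(order):
        hv = mat_vec(hessian, term)
        term = [t - alpha * h for t, h in zip(term, hv)]
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


# Hypothetical 2 x 2 Hessian whose exact inverse maps [1, 1] to [2, 2].
hessian = [[0.5, 0.0], [0.0, 0.5]]
for order in (1, 2, 3, 4):
    print(order, neumann_inverse_hvp(hessian, [1.0, 1.0], order))
```

The series only converges when the spectral radius of I − αH is below 1, and each extra order costs one more Hessian-vector product, which matches the trade-off between accuracy and training time that a low-order truncation is meant to balance.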

Experiments

We tested the model using a four-category dataset collected from Kaggle, and compared its performance with the auxiliary learning strategy using a separate network to generate meaningless auxiliary task labels (AL) [21], the multi-branched prior-attention residual learning (PARL) strategy [19], and the ResNet18 and ResNet34 baseline models.

Materials

The data used in this study were from two public datasets published by Kaggle [49], [50], [51], with a total of 21,057 cases, including 11,768 normal samples, 3674 COVID-19 samples, 2838 other viral pneumonia cases, and 2777 bacterial pneumonia cases. The specific data distribution is shown in Table 1.

Table 1

The data distribution.

	Normal	COVID-19	Other viral	Bacterial
COVID-19_Radiography_Dataset	10,192	3616	1345	–
CoronaHack -Chest X-ray-Dataset	1576	58	1493	2777

Many of the normal samples carried a unique marker that the other categories lacked, which would interfere with the judgment of the model, and part of the COVID-19 data were CT images. After screening, the experimental data consisted of a total of 12,531 samples, including 3251 normal samples, 3665 COVID-19 samples, 2838 other viral samples, and 2777 bacterial samples. First, the data were divided into a training set and a validation set at a ratio of 9:1, and then one batch of data (128 samples in the experiment) was taken from the training set as the auxiliary set. No auxiliary set was split off for the other methods. The specific distribution is shown in Table 2.

Table 2

Experimental data distribution. The numeric value indicates the category number in the experiment.

	Normal (0)	COVID-19 (1)	Other viral (2)	Bacterial (3)
All	3251	3665	2838	2777
Train (0.9)	2893	3261	2526	2471
Auxiliary	33	38	29	28
Validation (0.1)	325	366	283	278
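The split described above (9:1 training to validation, then one batch of 128 samples held out as the auxiliary set) might look like the following sketch. The function name, the fixed seed and the purely random, non-stratified shuffle are assumptions; the text does not state how the authors shuffled the data, so the per-class counts in Table 2 will not be reproduced exactly.

```python
import random


def split_dataset(samples, val_fraction=0.1, aux_size=128, seed=0):
    """Return (training, auxiliary, validation) lists.

    The validation set is taken first, then one batch of aux_size
    samples is removed from the remaining training data.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    validation = shuffled[:n_val]
    remaining = shuffled[n_val:]
    auxiliary = remaining[:aux_size]
    training = remaining[aux_size:]
    return training, auxiliary, validation


# Sample identifiers stand in for the 12,531 screened images.
train, aux, val = split_dataset(range(12531))
print(len(train), len(aux), len(val))
```

Keeping the auxiliary set disjoint from the training set is what lets the auxiliary validation loss act as a proxy for generalization when the auxiliary parameters are updated.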

Training

All CXR images were resized to 224 × 224 pixels before being sent to the network, and the ResNet18, ResNet34, auxiliary learning strategy, PARL network and MBFAL method were trained. Training ran for 100 epochs with a batch size of 128, giving a total of 9400 iterations in our method and in the auxiliary learning strategy (auxiliary parameters updated every 30 iterations) and 9300 iterations in PARL, with an initial learning rate of 0.01. The primary parameters and auxiliary parameters were updated by two SGD optimizers with a cosine annealing learning rate scheduler, so that the learning rate followed a cosine cycle during training. In the methods based on auxiliary learning, the auxiliary set is input into the network every 30 iterations, and the auxiliary parameters are then updated by an implicit differentiation optimization strategy.
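The training schedule can be illustrated with a short sketch of the standard cosine annealing formula and the every-30-iterations auxiliary update. The initial learning rate of 0.01 and the interval of 30 come from the text; the minimum learning rate of 0, a single cosine cycle over all iterations, and the 1-based step counting are assumptions.

```python
import math

BASE_LR = 0.01     # initial learning rate stated in the text
AUX_INTERVAL = 30  # auxiliary parameters are updated every 30 iterations


def cosine_annealing_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Learning rate after `step` of `total_steps` iterations (one cycle)."""
    cosine = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def is_auxiliary_step(step, interval=AUX_INTERVAL):
    """Assumed convention: steps count from 1; every 30th one triggers an update."""
    return step > 0 and step % interval == 0


total_steps = 9400  # iteration count reported for the proposed method
for step in (0, 2350, 4700, 7050, 9400):
    print(step, cosine_annealing_lr(step, total_steps))
print([s for s in range(1, 100) if is_auxiliary_step(s)])
```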

Results

During the process of updating the auxiliary parameters, the proposed MBFAL applies the Neumann series to approximate the inverse-Hessian matrix of the gradient of the network. In our study, a Neumann series of reasonable finite order is used for the approximation, since the infinite-order Neumann series cannot be computed in practice. In general, a low-order Neumann series is chosen because of the constraint on computational cost. Table 3 shows the classification accuracy of the model and the average training time per epoch under different orders of the Neumann series. The computational cost clearly increases with the order. When the order of the Neumann series increases from one to two, the classification accuracy improves from 94.96% to 95.61%; however, when the order continues to increase, the classification accuracy of the model does not improve. Therefore, considering both classification performance and computational cost, the second-order Neumann series is selected to approximate the inverse-Hessian matrix for the proposed MBFAL.

Table 3

Comparison of model performance under different orders of the Neumann series.

	First-order	Second-order	Third-order	Fourth-order
Accuracy	94.96%	95.61%	95.23%	95.21%
Time (per epoch)	76.57 s	79.44 s	84.18 s	87.13 s
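The trade-off in Table 3 follows from how a truncated Neumann series works: each additional order costs one more Hessian-vector product and removes part of the remaining approximation error. The sketch below illustrates this on a hand-made 2 × 2 matrix. The function names, the toy matrix, the step size `alpha` and the convention that `order` counts the terms added after the zeroth one are all assumptions for illustration; the paper does not give its implementation.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def neumann_inverse_hvp(hessian, vector, order, alpha):
    """Approximate H^{-1} v by alpha * sum_{k=0}^{order} (I - alpha*H)^k v.

    The series converges when the spectral radius of (I - alpha*H) is below 1.
    """
    term = list(vector)    # (I - alpha*H)^0 v
    total = list(vector)
    for _ in range(order):
        hv = mat_vec(hessian, term)
        term = [t - alpha * h for t, h in zip(term, hv)]  # apply (I - alpha*H)
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


# Toy symmetric positive-definite "Hessian" (hypothetical, not from the paper).
hessian = [[2.0, 0.5], [0.5, 1.0]]
vector = [1.0, 1.0]
exact = [2.0 / 7.0, 6.0 / 7.0]   # H^{-1} v worked out by hand for this example

errors = []
for order in range(1, 5):
    approx = neumann_inverse_hvp(hessian, vector, order, alpha=0.4)
    err = math.sqrt(sum((a - e) ** 2 for a, e in zip(approx, exact)))
    errors.append(err)
    print("order", order, "approx", approx, "error", err)
```

In this toy case the error shrinks with every added order. In training, a more accurate inverse-Hessian estimate need not translate into higher accuracy, which is consistent with the flat results beyond the second order in Table 3.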

The performance of each model was evaluated by the total accuracy (T_acc) and, for each single class, the accuracy (Acc), recall rate (Rec), specificity (Spec), precision (Prec) and F1-score (F1). The results are shown in Table 4. The indicators are calculated as follows:

$$T_{acc} = \frac{1}{N}\sum_{i} TP_i, \qquad Acc_i = \frac{TP_i + TN_i}{N}, \qquad Rec_i = \frac{TP_i}{TP_i + FN_i},$$

$$Spec_i = \frac{TN_i}{TN_i + FP_i}, \qquad Prec_i = \frac{TP_i}{TP_i + FP_i}, \qquad F1_i = \frac{2 \cdot Prec_i \cdot Rec_i}{Prec_i + Rec_i},$$

where $N$ represents the total number of samples and $i$ represents the $i$th class. True positive ($TP_i$) is the number of class $i$ samples that are correctly predicted to be class $i$, true negative ($TN_i$) is the number of non-class-$i$ samples that are not predicted to be class $i$, false positive ($FP_i$) is the number of non-class-$i$ samples predicted to be class $i$, and false negative ($FN_i$) is the number of class $i$ samples not predicted to be class $i$.
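The definitions of TP, TN, FP and FN above translate directly into a computation over a confusion matrix. The following sketch uses the standard one-vs-rest forms of these metrics; the function name `per_class_metrics` and the toy confusion matrix are hypothetical and are not the paper's data.

```python
def per_class_metrics(confusion):
    """Compute total accuracy and one-vs-rest metrics per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = {}
    for c in range(n_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                # class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp   # other classes, predicted c
        tn = total - tp - fn - fp
        rec = tp / (tp + fn)
        prec = tp / (tp + fp)
        results[c] = {
            "acc": (tp + tn) / total,
            "rec": rec,
            "spec": tn / (tn + fp),
            "prec": prec,
            "f1": 2 * prec * rec / (prec + rec),
        }
    t_acc = sum(confusion[c][c] for c in range(n_classes)) / total
    return t_acc, results


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
toy_confusion = [
    [90, 2, 5, 3],
    [1, 95, 2, 2],
    [4, 1, 85, 10],
    [2, 1, 12, 85],
]

t_acc, metrics = per_class_metrics(toy_confusion)
print("total accuracy:", t_acc)
for cls, values in metrics.items():
    print(cls, values)
```

Because true negatives dominate in a four-class problem, single-class accuracy and specificity tend to be high even when recall or precision for that class is noticeably lower, which is the pattern seen for the two non-COVID pneumonia classes in Table 4.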

Table 4

Comparison of experimental results.

		ResNet50	ResNet101	ResNet18	ResNet34	AL	PARL	Our method
Total accuracy		92.73	92.92	92.41	92.65	95.21	94.24	95.61
Normal	Accuracy	97.50	97.90	97.70	98.40	98.70	98.40	98.70
	Recall	93.20	94.50	93.50	95.40	97.50	96.90	97.20
	Specificity	99.00	99.10	99.10	99.50	99.10	98.90	99.20
	Precision	97.10	97.50	97.40	98.40	97.50	96.90	97.80
	F1-score	95.10	96.00	95.40	96.90	97.50	96.90	97.50

COVID-19	Accuracy	98.20	98.60	98.10	98.70	99.40	99.00	99.10
	Recall	98.60	98.90	98.40	99.50	98.90	98.60	98.60
	Specificity	98.00	98.50	98.00	98.40	99.50	99.20	99.30
	Precision	97.10	97.50	97.40	98.40	97.50	96.90	97.80
	F1-score	96.90	97.70	96.80	97.90	98.90	98.30	98.50

Other viral	Accuracy	94.90	94.70	94.60	94.20	96.10	95.60	96.60
	Recall	94.70	95.10	94.30	94.00	94.70	88.70	96.10
	Specificity	94.90	94.60	94.60	94.30	96.50	97.60	96.70
	Precision	84.50	83.80	83.70	82.90	88.70	91.60	89.50
	F1-score	89.30	89.10	88.70	88.10	91.60	90.10	92.70

Bacterial	Accuracy	94.90	94.60	94.50	93.90	96.20	95.40	96.80
	Recall	82.40	81.30	81.30	79.10	88.10	91.00	89.20
	Specificity	98.50	98.50	98.30	98.20	98.60	96.70	99.00
	Precision	93.90	93.80	93.00	92.40	94.60	88.80	96.10
	F1-score	87.80	87.10	86.80	85.20	91.20	89.90	92.50

The deeper ResNet models (ResNet50 and ResNet101) showed no significant difference in recognition performance compared with the shallower models (ResNet18 and ResNet34). Experiments showed that the ResNet18 and ResNet34 structure used in MBFAL kept the number of parameters small while achieving relatively optimal performance. The total accuracy of MBFAL was better than that of the other models. Compared with PARL, MBFAL better maintained the ability to identify normal samples and obtained the best pneumonia identification ability. The overall classification accuracy was improved by 0.4–3.3 percentage points compared with other models. A model with a higher recall rate is better at avoiding false negatives, and a model with higher precision is better at avoiding false positives; the F1-score balances these two metrics. A confusion matrix was used to show the difference in recognition results more intuitively, as shown in Fig. 5, and the accuracy curve during training is shown in Fig. 6.

Fig. 5

Confusion matrices for five models.

Fig. 6

Training accuracy and validation accuracy. The convergence of the methods based on auxiliary learning is faster and better than the other three models, while MBFAL is slightly better than the simple auxiliary learning.

To measure the stability of model performance and exclude accidental cases, ten-fold cross validation was performed. The average accuracy was 95.85%, with a confidence interval of [95.18–96.53], indicating that the performance of MBFAL on the experimental dataset has a certain degree of stability. The ROC curves for the entire dataset are shown in Fig. 7. Ablation studies were performed on the auxiliary learning network and the PARL network, and each of them individually performed better than the baseline models. MBFAL, which combines these two networks, improved performance further. To enhance the interpretability of the model, the Grad-CAM++ [52] method was used to draw class activation maps (CAMs) of the model, as shown in Fig. 8.
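The paper does not state how the interval around the cross-validation mean was computed. One common choice, sketched below as an assumption, is a two-sided 95% Student-t interval over the ten fold accuracies; the reported interval is roughly symmetric about the reported mean, which is compatible with an interval of this form. The fold values in the sketch are hypothetical and are not the paper's per-fold results.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 folds).
T_CRIT_95_DF9 = 2.262


def mean_and_ci(fold_accuracies, t_crit=T_CRIT_95_DF9):
    """Return the mean and a symmetric t-based interval around it."""
    mean = statistics.mean(fold_accuracies)
    sem = statistics.stdev(fold_accuracies) / math.sqrt(len(fold_accuracies))
    half_width = t_crit * sem
    return mean, (mean - half_width, mean + half_width)


# Hypothetical fold accuracies in percent, for illustration only.
folds = [94.0, 95.0, 96.0, 97.0, 95.5, 96.5, 94.5, 95.0, 96.0, 95.5]

mean, (low, high) = mean_and_ci(folds)
print("mean:", mean)
print("interval:", low, high)
```

A narrow interval from this calculation indicates that accuracy varies little from fold to fold, which is the sense of stability referred to above.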

Fig. 7

ROC curves of all data obtained by ten-fold cross validation. The AUC values of Normal, COVID-19, Other Viral and Bacterial were 0.998, 0.999, 0.989 and 0.989, respectively.

Fig. 8

MBFAL CAM. The model determines the sample category by the red area in the CAM.

Since most studies in the literature focus on the three-class identification task of normal, COVID-19 and other pneumonia, we treated other viral pneumonia and bacterial pneumonia as the same category for a three-class experiment. The overall accuracy of the model was 99.04%, which was better than that of other methods. The performance of the model is shown in Table 5.

Table 5

Comparison of the three-class classification results with those of previous studies.

Methods	Images	Total	Other pneumonia
		Accuracy (%)	Sensitivity (%)	Precision (%)	F1-score (%)
PARL [19]	CT	89.67	95.50	82.68	88.63
COVID-Net [37]	CXR	93.33	94.00	91.30	92.60
COVID-GATNet [39]	CXR	94.33	95.10	91.30	93.10
DRE-Net [53]	CT	92.59	92.59	86.21	92.86
AANet [45]	CXR	95.00	93.00	93.00	93.00
Our Method	CXR	99.04	99.65	99.30	99.47
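Treating other viral pneumonia and bacterial pneumonia as one category removes the confusions between those two classes from the error count, which is one reason three-class accuracy is higher than four-class accuracy. The sketch below shows this by collapsing a four-class confusion matrix. The function name `merge_classes` and the toy matrix are hypothetical, and the paper does not state whether the three-class model was retrained or obtained by merging predictions in this way.

```python
def merge_classes(confusion, groups):
    """Collapse a confusion matrix.

    groups[k] lists the original class indices that form new class k.
    """
    k = len(groups)
    merged = [[0] * k for _ in range(k)]
    for new_i, rows in enumerate(groups):
        for new_j, cols in enumerate(groups):
            merged[new_i][new_j] = sum(confusion[r][c] for r in rows for c in cols)
    return merged


def total_accuracy(confusion):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


# Hypothetical 4-class matrix: normal, COVID-19, other viral, bacterial.
toy_confusion = [
    [90, 2, 5, 3],
    [1, 95, 2, 2],
    [4, 1, 85, 10],
    [2, 1, 12, 85],
]

# New classes: normal, COVID-19, other pneumonia (other viral + bacterial).
merged = merge_classes(toy_confusion, [[0], [1], [2, 3]])
acc4 = total_accuracy(toy_confusion)
acc3 = total_accuracy(merged)
print(merged)
print("four-class accuracy:", acc4, "three-class accuracy:", acc3)
```

In the toy matrix, the samples swapped between the two merged classes become correct predictions, so the three-class accuracy can only stay the same or rise relative to the four-class figure.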


Discussion

Our method is based on the strategy of auxiliary learning with validation set fitting using an auxiliary dataset, combined with the PARL structure and a feature fusion strategy, to classify CXR images as normal, COVID-19, other viral pneumonia or bacterial pneumonia. Applied on its own, the PARL architecture obtained better results than the baseline models. Attention maps between branches made the model focus on the lesion area and enhanced the sensitivity of lesion type classification, but placing too much emphasis on pneumonia samples without processing any normal samples inhibits the model’s ability to recognize normal samples to a certain extent. Using auxiliary learning alone to fit the validation set can greatly improve the overall recognition performance of the model, but the recognition performance for the three types of pneumonia is not optimal. Therefore, the PARL architecture was applied to the auxiliary task to enhance the primary branch’s ability to recognize pneumonia samples during training. At the same time, in the auxiliary parameter update stage, the feature fusion strategy, in which samples are selected through their labels, further improves the model’s ability to recognize pneumonia while interfering with normal samples as little as possible. The experimental results in Table 4 show that this strategy achieves a normal sample recognition level comparable to that of auxiliary learning and obtains the best pneumonia recognition ability. Applying this method to the three-class recognition task also gives better results than some other studies. Label information is used only for auxiliary parameter updates because the sample category cannot be known in advance in practical applications. In fact, the model used a single ResNet34 network in the test phase.
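The label-based sample screening described above can be pictured as a mask over the batch: features from the auxiliary branch are fused into the primary branch only for pneumonia samples, so normal samples pass through unchanged. The sketch below is a loose illustration, not the authors' implementation. The function name `fuse_screened`, the label encoding, the use of flat feature vectors and element-wise addition as the fusion operation are all assumptions.

```python
NORMAL = 0  # hypothetical label encoding: 0 = normal, 1..3 = pneumonia types


def fuse_screened(primary_feats, auxiliary_feats, labels, normal_label=NORMAL):
    """Fuse auxiliary features into primary features for pneumonia samples only.

    Fusion is element-wise addition here, an assumption made for illustration.
    """
    fused = []
    for primary, auxiliary, label in zip(primary_feats, auxiliary_feats, labels):
        if label == normal_label:
            fused.append(list(primary))   # normal sample: left untouched
        else:
            fused.append([p + a for p, a in zip(primary, auxiliary)])
    return fused


# Two toy samples with two features each: one normal, one pneumonia.
primary = [[1.0, 2.0], [3.0, 4.0]]
auxiliary = [[10.0, 10.0], [0.5, 0.5]]
labels = [0, 2]

fused = fuse_screened(primary, auxiliary, labels)
print(fused)
```

Since the screening depends on the ground-truth label, a step of this kind can only run during training, which matches the statement that the test-phase model is a single ResNet34 network.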
The proposed training strategy inevitably biases the model toward pneumonia samples, and the signs of early-stage disease are not obvious on CXR images; therefore, the strategy inevitably loses some ability to recognize normal samples. In addition, in the public dataset collected for this study, the samples of other viral pneumonia and bacterial pneumonia are relatively few, and most of them are from children. Although the other two categories also include images from children, the proportion is relatively small. Therefore, the model tends to separate these two types of pneumonia from the other categories but easily confuses them with each other. The next step should be to collect as much data as possible and to further consider the implications of the age distribution of the data. In practical application, the processed image may be affected by uncertainties and inaccuracies, so fuzzy image preprocessing should be carried out before applying the proposed algorithm for recognition [54]. The fuzzy image preprocessor based on geometric information proposed by Versace et al. [55] has a low computational cost, so it can be combined with the proposed method to support real-time application.

Conclusion

In this paper, we proposed the MBFAL strategy, which combines an auxiliary learning strategy with the PARL architecture and a feature fusion strategy to improve the model’s ability to recognize pneumonia types while preserving, as far as possible, its ability to distinguish normal from abnormal images, in order to recognize multiple types of pneumonia from CXR images. The network used ResNet18 and ResNet34 as its architecture. In practical applications, only part of ResNet34 is needed; the rest is a training strategy that assists learning. Ablation studies showed that MBFAL is effective and gives better results than other methods for the same task in other studies, so it can be used as a rapid and effective auxiliary screening tool for pneumonia. In future work, we should consider collecting more data from more sources or exploring more effective data preprocessing techniques to improve the generalization ability of the model.

CRediT authorship contribution statement

Jia Liu: Conceptualization, Methodology, Software, Formal Analysis, Visualization, Writing – original draft. Jing Qi: Data curation, Resources. Wei Chen: Project administration, Validation. Yongjian Nian: Supervision, Funding acquisition, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

27 in total

1. Prior-Attention Residual Learning for More Discriminative COVID-19 Screening in CT Images.

Authors: Jun Wang; Yiming Bao; Yaofeng Wen; Hongbing Lu; Hu Luo; Yunfei Xiang; Xiaoming Li; Chen Liu; Dahong Qian
Journal: IEEE Trans Med Imaging Date: 2020-08 Impact factor: 10.048

2. Viral Pneumonia Screening on Chest X-Rays Using Confidence-Aware Anomaly Detection.

Authors: Jianpeng Zhang; Yutong Xie; Guansong Pang; Zhibin Liao; Johan Verjans; Wenxing Li; Zongji Sun; Jian He; Yi Li; Chunhua Shen; Yong Xia
Journal: IEEE Trans Med Imaging Date: 2021-03-02 Impact factor: 10.048

3. Multi-Task Learning for Dense Prediction Tasks: A Survey.

Authors: Simon Vandenhende; Stamatios Georgoulis; Wouter Van Gansbeke; Marc Proesmans; Dengxin Dai; Luc Van Gool
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2022-06-03 Impact factor: 6.226

4. Deep Learning Enables Accurate Diagnosis of Novel Coronavirus (COVID-19) With CT Images.

Authors: Ying Song; Shuangjia Zheng; Liang Li; Xiang Zhang; Xiaodong Zhang; Ziwang Huang; Jianwen Chen; Ruixuan Wang; Huiying Zhao; Yutian Chong; Jun Shen; Yunfei Zha; Yuedong Yang
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2021-12-08 Impact factor: 3.710

5. Diagnostic accuracy of C-reactive protein and procalcitonin in suspected community-acquired pneumonia adults visiting emergency department and having a systematic thoracic CT scan.

Authors: Josselin Le Bel; Pierre Hausfater; Camille Chenevier-Gobeaux; François-Xavier Blanc; Mikhael Benjoar; Cécile Ficko; Patrick Ray; Christophe Choquet; Xavier Duval; Yann-Erick Claessens
Journal: Crit Care Date: 2015-10-16 Impact factor: 9.097

6. Chest CT Findings in 2019 Novel Coronavirus (2019-nCoV) Infections from Wuhan, China: Key Points for the Radiologist.

Authors: Jeffrey P Kanne
Journal: Radiology Date: 2020-02-04 Impact factor: 11.105

7. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR.

Authors: Yicheng Fang; Huangqi Zhang; Jicheng Xie; Minjie Lin; Lingjun Ying; Peipei Pang; Wenbin Ji
Journal: Radiology Date: 2020-02-19 Impact factor: 11.105

8. Imaging Profile of the COVID-19 Infection: Radiologic Findings and Literature Review.

Authors: Ming-Yen Ng; Elaine Y P Lee; Jin Yang; Fangfang Yang; Xia Li; Hongxia Wang; Macy Mei-Sze Lui; Christine Shing-Yen Lo; Barry Leung; Pek-Lan Khong; Christopher Kim-Ming Hui; Kwok-Yung Yuen; Michael D Kuo
Journal: Radiol Cardiothorac Imaging Date: 2020-02-13

9. Learning distinctive filters for COVID-19 detection from chest X-ray using shuffled residual CNN.

Authors: R Karthik; R Menaka; Hariharan M
Journal: Appl Soft Comput Date: 2020-09-23 Impact factor: 6.725