
A Novel Weighted Consensus Machine Learning Model for COVID-19 Infection Classification Using CT Scan Images.

Rohit Kumar Bondugula¹, Siba K Udgata¹, Nitin Sai Bommi¹.

Abstract

As COVID-19 has spread rapidly, detecting the infection from radiology and radiography images is probably one of the quickest ways to diagnose patients. Many researchers have found it necessary to use chest X-ray and chest computed tomography (CT) imaging to diagnose COVID-19 infection. In this paper, our objective is to minimize false negatives and false positives in the detection process. Reducing the number of false negatives limits community spread of the COVID-19 pandemic, and reducing false positives helps people avoid mental trauma and wasteful expenses. This paper proposes a novel weighted consensus model that minimizes the number of false negatives and false positives without compromising accuracy. In the proposed model, the accuracies of the individual classification models are normalized to obtain model weights. Because different models may predict different classes for the same image, the normalized weights of the models predicting each class are summed, and the class decision is made by comparing this sum with a predefined threshold value. We used traditional machine learning classification algorithms, namely Logistic Regression, Support Vector Machine, k-Nearest Neighbours, Decision Tree, and Random Forest, for the experimental evaluation of the weighted consensus model. We predicted three classes (positive, negative and non-informative), which provides better insight into the condition. The proposed model performs as well as the existing state-of-the-art technique in terms of accuracy (99.64%) and reduces false negatives and false positives. © King Fahd University of Petroleum & Minerals 2021.


Keywords: COVID-19; Chest CT scan; Machine learning; Weighted consensus model

Year: 2021 PMID: 34367873 PMCID: PMC8327899 DOI: 10.1007/s13369-021-05879-y

Source DB: PubMed Journal: Arab J Sci Eng ISSN: 2191-4281 Impact factor: 2.807

Introduction

The World Health Organization (WHO) has declared the novel coronavirus disease (COVID-19) a pandemic, raising public health concerns around the world. As of 17 March 2021, COVID-19 had been linked to 123.87 million confirmed cases and 2.72 million deaths [1]. COVID-19 is widespread and highly contagious: it is transmitted directly from infected people through close contact, and indirectly through the air, surfaces, and surroundings that infected people come into contact with [2]. The disease causes viral pneumonia, leading to acute respiratory problems and lesions in the lungs. It also causes a variety of symptoms such as fever, dry cough, headache, tiredness, loss of taste and smell, and dyspnea [2-4]. The spread of COVID-19 is worsened by the fact that many infected people are asymptomatic [3]. Therefore, quickly diagnosing infected people and quarantining them is crucial to curb the spread of the disease. The pandemic affects billions of people socially, economically, and medically, causing dramatic changes in social relationships and educational environments. Doctors are responsible for many patients and have few resources, so the burden on them is heavy. We can help ease this burden by developing a model that predicts whether a person is potentially positive or negative [5-8]. The healthcare industry is looking for advanced technologies that can monitor, detect, and diagnose infection and quickly control the spread of the COVID-19 pandemic. The Internet of Medical Things (IoMT) is one such technology: it supports crowd screening, tracking, notification, and detection of the virus, and it helps control the spread through contact tracing and by alerting healthcare authorities [5]. In today's medical practice, there are two primary types of diagnosis.
The first is real-time RT-PCR performed on a nasopharyngeal swab. The second is imaging, where CT scans outperform chest X-rays; studies report that chest CT is faster and more sensitive than PCR [9]. COVID-19 is usually diagnosed with RT-PCR and serological testing [10]. However, these tests are difficult to conduct where resources and qualified staff are lacking, particularly in later-affected regions (e.g., Africa and Latin America). Furthermore, the sensitivity of PCR can be low [9, 11, 12]. Alternative methods for diagnosing COVID-19 infection quickly are therefore urgently needed. In the absence of a vaccine, detecting the disease at an early stage and immediately quarantining the person is vital to stop its spread. The Chinese government announced that the diagnosis of the infection can be confirmed through RT-PCR [9]. However, RT-PCR is slow and suffers from a high false-negative rate [9, 13–16]. In the present pandemic, the low sensitivity of RT-PCR is not always acceptable: when the test fails to detect an infection, the patient may not be treated on time and may spread the infection to healthy people. Clinical reports of infected people show bilateral changes in chest X-ray and chest CT images [13]. Hence, chest CT and X-ray images are used as an alternative tool for detecting COVID-19 infection because of their high sensitivity [3]. The main objective of this paper is to classify COVID-19 patients from chest CT images so that either false negatives or false positives are minimized, depending on the requirement. In the proposed work, we use machine learning classification models to detect COVID-19 infection from CT scans.
We propose a novel weighted consensus model in which each image is passed through several models and the final class is decided by predefined rules; the rules can be tuned to the current situation, for example to minimize false negatives and thereby reduce community spread. The remainder of the paper is organized as follows: Sect. 2 reviews the literature on COVID-19 classification. Sect. 3 describes the proposed classification methodology. Sect. 4 explains the experimental setup in detail, followed by the results and discussion in Sect. 5. Finally, Sect. 6 presents the conclusions and future scope of the work.

Literature review

Recently, many researchers have studied the imaging patterns and lesions on chest CT and chest X-ray images for detecting COVID-19 [17-26]. Artificial intelligence applied to radiology imaging of COVID-19 can help diagnose the disease accurately and on time [27]. Fang et al. [24] studied the sensitivity of chest CT and RT-PCR. Xie et al. [25] reported that, in a sample of 167 patients, RT-PCR gave false-negative results in about 3% of the cases. Based on analysis of the patients' symptoms and travel history, the sensitivity of chest CT for detecting COVID-19 infection is higher than that of RT-PCR [25]. Clinical reports of infected people show bilateral changes in CT images [13]; therefore, chest CT is used to diagnose the disease because of its high sensitivity [3]. Yu-Dong Zhang et al. [28] proposed the DenseNet-OTLS method, which outperformed most state-of-the-art approaches in COVID-19 diagnosis. The COVID-Net model [29] was developed to detect COVID-19 positive cases from chest radiography images and achieves 80% sensitivity. Kermany et al. [30] applied a ConvNet model to chest X-rays and obtained a training accuracy of 95.21% and a validation accuracy of 95.31%. Xu et al. [13] employed a CNN model that differentiates COVID-19 pneumonia from viral pneumonia with a maximum accuracy of 86.7%. Wang et al. [14] analyzed the radiographic changes in CT images of infected patients and developed a model based on a modified Inception transfer learning technique, with an accuracy of 89.5%. The features extracted from the CT images are used for early diagnosis; this method diagnoses faster and performs better than Xu's model [13]. Qianqian Ni et al. [31] used a deep learning approach to identify COVID-19 pneumonia in chest CT images. Rajaraman et al. [32] used a weakly labeled data augmentation strategy on COVID-19 chest X-ray images.
Novitarci DCR et al. [33] combined SVM and CNN to detect COVID-19 from chest X-ray images. Khalifa et al. [34] used generative adversarial networks with transfer learning models such as ResNet18, AlexNet, and SqueezeNet to classify COVID-19. Ozturk et al. [35] applied DarkNet to chest X-ray images for binary and multi-class classification, with accuracies of 98.08% and 87%, respectively. Narin et al. [16] proposed DCNN-based transfer learning models, Inception-ResNetV2, InceptionV3, and ResNet50, for diagnosing COVID-19 from chest X-ray images. ResNet50 gave an accuracy of 98%, the best result reported so far for chest X-rays [13, 14]. Yu-Dong Zhang et al. [36] proposed a novel deep learning model that diagnoses COVID-19 on chest CT with a sensitivity of 93.28%, a specificity of 94.00%, and an accuracy of 93.64%. Zhang [37] also proposed a seven-layer CNN-based diagnosis model that is effective in detecting COVID-19 in chest CT images, achieving a sensitivity of 94.44%, a specificity of 93.63%, and an accuracy of 94.03%. Maior et al. [38] analyzed chest X-ray images combined from six open datasets to identify infected patients while distinguishing COVID-19 and pneumonia from 'no-findings' images. Saba et al. [39] proposed six models for tissue characterization and classification of COVID-19 with pneumonia and reported improved results. Qian Lie et al. [40] combined an image preprocessing technique for anomaly detection with supervised deep learning models for chest CT-based COVID-19 diagnosis. Menendez et al. [41] developed a web application, COVID-19 TRAINING, for training and diagnosis with COVID-19 chest X-rays. Guangyu Guo et al. [42] proposed IE-Net to eliminate the influence of varied image dimensions when diagnosing COVID-19 cases.
IE-Net achieves 92.79% recall, 94.80% accuracy, 92.97% precision, and 94.93% AUC in distinguishing COVID-19 from non-COVID-19 cases. Umit Budak et al. [43] proposed a SegNet-based network that uses an attention gate mechanism for automatic segmentation of COVID-19 lesions in CT images, achieving a sensitivity of 92.73%, a specificity of 99.51%, and a Dice score of 89.61%. In the VSBN model, Wang et al. [44] used a novel VGG-style base network as the backbone and a convolutional block attention module as the attention module; the model's per-class sensitivity, accuracy, and F1 score were all above 95%. This review shows that chest X-ray and CT images can be used for early diagnosis of COVID-19 [45]. Therefore, in this paper, machine learning models are used to classify COVID-19 patients from CT images.

Research gaps in the existing literature

Although many researchers have contributed significantly to this domain, some gaps remain. Discussions with medical practitioners and front-line healthcare workers highlighted the following shortcomings in the literature: (1) most work focuses on maximizing the accuracy of the proposed method; accuracy is a crucial performance measure, but it cannot be the only one; (2) only a few works consider training and testing time and try to reduce classification time without compromising accuracy; (3) existing works are not tuned to the changing pattern of COVID-19 spread; (4) no existing work focuses on minimizing false negatives or false positives without compromising the accuracy of the model; and (5) medical practitioners do not readily accept the existing models, because they care less about statistical accuracy than about false negatives or false positives, depending on the situation.

Contributions of the present work

After a thorough review of existing works and identification of their gaps, we designed a more acceptable and realistic model to address them. The main contributions are as follows: (1) we introduce a novel weighted consensus model (WCM) that lowers the number of false negatives and false positives while maintaining accuracy; (2) the model combines the best-performing base models with a consensus algorithm to enhance accuracy; (3) the WCM can also work with limited data samples, since data augmentation techniques can be applied; (4) the model is designed according to medical practitioners' requirements and is therefore expected to be more acceptable to them; and (5) the model can minimize false negatives or false positives without greatly increasing the other, by adaptively fine-tuning the threshold values used in the consensus decision.

Proposed method

We used traditional machine learning classification algorithms to classify the images. The five popular algorithms used are described below.

Logistic regression

Logistic regression applies the logistic function to a linear combination of the input features, producing an output in the range [0, 1] that is interpreted as the estimated probability of class membership. It is an extension of linear regression with bounded output and is widely used to separate two classes linearly.
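As a minimal illustration (not the authors' code), the logistic function and its usual multiclass generalization, the softmax, can be sketched with NumPy. The weights, bias and scores are made-up values, and whether the three-class logistic regression in this work used a multinomial (softmax) or a one-vs-rest formulation is not stated, so the softmax part is an assumption.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(scores):
    """Multiclass generalization: turns a vector of class scores into probabilities."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()


# Binary case: a linear score w.x + b passed through the sigmoid (made-up values)
w, b = np.array([0.8, -0.5]), 0.1
x = np.array([1.0, 2.0])
p_positive = sigmoid(w @ x + b)         # estimated P(class = 1 | x)

# Three-class case (e.g. nCT, NiCT, pCT): one linear score per class (made-up values)
class_scores = np.array([2.0, 0.5, -1.0])
probs = softmax(class_scores)           # non-negative and sums to 1
```

The highest-probability class is the prediction; with a single positive/negative decision, thresholding `p_positive` at 0.5 is the usual default.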

Support vector machine

SVM's goal is to find a hyperplane in n-dimensional feature space that separates the different classes. Many hyperplanes may separate the classes; SVM chooses the one that maximizes the margin, that is, the distance between the hyperplane and the nearest data points.
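The margin that SVM maximizes can be made concrete with a small NumPy sketch. It only evaluates the margin of two hand-picked candidate hyperplanes on toy points, both of which separate the first two points from the last two; an actual SVM (for example scikit-learn's `SVC`) solves an optimization problem to find the maximum-margin hyperplane, and the kernel used in this work is not stated.

```python
import numpy as np


def distance_to_hyperplane(X, w, b):
    """Geometric distance of each row of X to the hyperplane w.x + b = 0."""
    return np.abs(X @ w + b) / np.linalg.norm(w)


def margin(X, w, b):
    """Distance from the hyperplane to the closest data point (what SVM maximizes)."""
    return distance_to_hyperplane(X, w, b).min()


# Toy points: the first two belong to one class, the last two to the other
X = np.array([[1.0, 1.0], [2.0, 2.5], [4.0, 4.0], [5.0, 5.5]])

w1, b1 = np.array([1.0, 1.0]), -6.0    # candidate 1: x1 + x2 = 6
w2, b2 = np.array([1.0, 0.0]), -3.0    # candidate 2: x1 = 3

m1 = margin(X, w1, b1)   # about 1.06
m2 = margin(X, w2, b2)   # 1.0
```

Both candidates classify the toy points correctly, but SVM would prefer candidate 1 because its nearest point is farther away.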

K-nearest neighbour

k-nearest neighbours (kNN) is a non-parametric algorithm that stores the training data and, for each test sample, computes its distance to every stored sample. The model then assigns the most frequent class (the mode) among the k nearest samples. When the training set is very large, this becomes computationally expensive, because the distance between the test sample and every training sample must be computed; this is reflected in the long testing time of kNN in Table 3.
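A from-scratch sketch of the kNN rule described above, using NumPy and Euclidean distance. The distance metric and the value of k used in this work are not stated, so both are assumptions. The broadcasted distance matrix makes the cost explicit: every test sample is compared with every training sample.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test row by the mode of the labels of its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest samples
    neighbour_labels = y_train[nearest]       # shape (n_test, k)
    # Mode of the neighbours' labels (ties go to the smallest label)
    return np.array([np.bincount(row).argmax() for row in neighbour_labels])


X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])

predictions = knn_predict(X_train, y_train, X_test, k=3)
```

The intermediate array has shape (n_test, n_train, n_features), so for flattened 150 × 150 images a practical implementation would process the test set in batches or use a library class such as scikit-learn's `KNeighborsClassifier`.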

Decision tree

A decision tree makes decisions using a tree-like structure in which each internal node tests a condition on an input feature and each leaf assigns a class, so the model consists only of conditional statements. It is prone to over-fitting because it can keep adding conditions until every training input is classified.
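To illustrate both points, the following sketch (scikit-learn on synthetic data, not the authors' setup) fits an unrestricted decision tree to random labels: it memorizes the training set perfectly yet scores near chance on new data, which is over-fitting in its purest form. `export_text` prints a shallow tree as the nested conditions it consists of.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = rng.integers(0, 3, size=200)     # random labels: nothing real to learn
X_new = rng.normal(size=(200, 5))
y_new = rng.integers(0, 3, size=200)

tree = DecisionTreeClassifier(random_state=0)   # no depth limit
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)   # every training sample is memorized
new_acc = tree.score(X_new, y_new)         # close to chance (about 1/3) on unseen data

# The fitted model is literally a set of nested if/else tests on features
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
text = export_text(shallow, feature_names=[f"x{i}" for i in range(5)])
print(text)
```

Limiting depth or leaf size is the usual remedy for a single tree; random forest, described next, takes a different route.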

Random forest

Because a single decision tree is prone to over-fitting, random forest improves generalization by constructing multiple decision trees, each trained on a random subset of the data, and taking the mode of their predictions as the output class.
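A minimal sketch of this idea, assuming bootstrap sampling and random feature subsets at each split (the usual random forest recipe) and hard majority voting as described above. It is built from scikit-learn's `DecisionTreeClassifier`; the function names `fit_forest` and `predict_forest` are hypothetical. Note that scikit-learn's own `RandomForestClassifier` averages the trees' predicted probabilities instead of taking the mode of their hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample, with random feature subsets per split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, n, size=n)   # sample n rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Mode of the trees' hard predictions for each sample."""
    votes = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
trees = fit_forest(X[:450], y[:450])
pred = predict_forest(trees, X[450:])
accuracy = np.mean(pred == y[450:])
```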

Proposed novel weighted consensus method

To ensure reliable and robust predictions, we used five models as the base. By analogy with consulting a second or third doctor and weighing the opinions of the different doctors, we consult five highly accurate models (five doctors) and use the weight of each prediction to declare the final output. Each image is first passed through all five models, and each model's prediction is saved. The accuracies of all the models are then summed, and the weight of each model is obtained by dividing its accuracy by this total (normalizing the weights), so that the weights sum to 1. A model with higher accuracy therefore carries more weight. Once the weights and the individual predictions are available, we sum, for each class, the normalized weights of the models that predict that class, and choose the class with the maximum total weight. Because this rule simply selects the class with the maximum total weight, it does not favour any single class. The pseudo-code for this procedure is given in Algorithm 1. To control the number of FPs and FNs more precisely, we set a threshold value and decide whether the image belongs to a particular target class by comparing that class's total weight with the threshold. The higher the threshold, the more individual models must predict the target class for the image to be assigned to it. This ensures that even if one model mispredicts an image, the weights of the other models are still taken into account. In this mode we focus on a single target class: if its total weight is below the threshold, we declare that the image does not belong to that class, and we then apply the maximum-weight rule of Algorithm 1 to the other two classes to obtain the final output. The pseudo-code is given in Algorithm 2.
In Algorithm 1, if two classes receive the same total weight, the tie is broken by class precedence. Because a false negative for a COVID-positive patient is more dangerous than a false positive, priority is given to the COVID-positive class (class 2). The flow of data through the models is shown in Fig. 1.
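The following Python sketch shows one way Algorithms 1 and 2 could be implemented from the description above; the pseudo-code itself is not reproduced here, so this is a reconstruction rather than the authors' code. Class codes follow Table 5 (0 = negative, 1 = non-informative, 2 = positive). The helper names, the use of `>=` for "reaching" the threshold, and the tie-break order between the negative and non-informative classes (only the priority of the positive class is stated) are assumptions.

```python
import numpy as np

NEG, NIC, POS = 0, 1, 2           # class codes as in Table 5
PRECEDENCE = (POS, NEG, NIC)      # positive first (stated); NEG before NIC is assumed


def normalize_weights(accuracies):
    """Divide each model's accuracy by the total so that the weights sum to 1."""
    acc = np.asarray(accuracies, dtype=float)
    return acc / acc.sum()


def class_scores(predictions, weights, n_classes=3):
    """Sum the normalized weights of the models that predict each class."""
    scores = np.zeros(n_classes)
    for pred, w in zip(predictions, weights):
        scores[pred] += w
    return scores


def max_weight_class(scores, precedence=PRECEDENCE):
    """Algorithm 1 (sketch): class with the largest total weight; ties go by precedence."""
    best = np.max(scores)
    for c in precedence:
        if np.isclose(scores[c], best):
            return c


def threshold_class(scores, target, threshold, precedence=PRECEDENCE):
    """Algorithm 2 (sketch): accept `target` only if its total weight reaches
    `threshold`; otherwise apply the maximum-weight rule to the other classes."""
    if scores[target] >= threshold:
        return target
    rest = np.array(scores, dtype=float)
    rest[target] = -np.inf            # exclude the rejected target class
    return max_weight_class(rest, precedence)


# Accuracies (%) from Table 3, in the order LR, SVM, kNN, DT, RF
weights = normalize_weights([99.594, 99.594, 99.391, 97.105, 99.594])
# Hypothetical votes: LR and SVM say positive, kNN and DT negative, RF non-informative
scores = class_scores([POS, POS, NEG, NEG, NIC], weights)
```

With these hypothetical votes, the positive class gets the largest total weight (about 0.402) and wins under Algorithm 1. With the positive class as the target and a threshold of 0.5, however, 0.402 falls short, so the maximum-weight rule reassigns the image to the negative class (about 0.397).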

Fig. 1

Block diagram of the weighted consensus model

Experiment design

Dataset description

The HUST-19 benchmark CT scan dataset [46] was used in our experiment. Its authors manually labeled 19685 CT slices into three categories: non-informative CT (NiCT) images, in which the lung parenchyma was not captured for any decision; positive CT (pCT) images, in which imaging features associated with COVID-19 pneumonia could be unambiguously discerned; and negative CT (nCT) images, in which imaging features in both lungs were irrelevant to COVID-19 pneumonia. We used all 4001 pCT, 9979 nCT, and 5705 NiCT images in our experiments. The number of image samples used in this work is compared with the base paper in Table 1.

Table 1

Data statistics of the base paper and our data

Class	Base paper	Our data
Positive	4001	4001
Negative	9979	9979
Non-informative	5705	5705
Total	19685	19685

The distribution of the data is visualized in Fig. 2, and some sample images are shown in Fig. 3.

Fig. 2

Distribution of proportion of classes in the dataset

Fig. 3

Visualization of positive, negative and non-informative classes of the COVID-19 CT scan

Each image is loaded and resized to 150 × 150 pixels to speed up training. Loading images at a higher resolution would sharply increase the computational cost, whereas at a lower resolution the models may fail to capture essential features in the images; this size balances computational cost against accuracy. Loaded with three channels, each resized image contains 150 × 150 × 3 = 67,500 values. Because the CT images carry no colour information, we loaded them as grayscale images with a single channel, which reduces this to 22,500 values per image, saving memory and speeding up computation. The traditional machine learning models expect a 2D input array (samples × features), so each image is flattened into a vector before training.
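A NumPy-only sketch of this preprocessing for a single image, assuming it has already been decoded into an array. The grayscale conversion uses the ITU-R BT.601 luma weights and the resize uses nearest-neighbour sampling; both are assumptions, since the library and interpolation method used in this work are not stated (in practice an image library such as OpenCV or Pillow would typically handle loading and resizing). Random arrays stand in for real CT slices.

```python
import numpy as np

TARGET_SIZE = (150, 150)
LUMA = np.array([0.299, 0.587, 0.114])   # ITU-R BT.601 weights (assumption)


def to_grayscale(img):
    """Collapse an H x W x 3 colour array to H x W; 2D input is returned as float."""
    img = img.astype(np.float64)
    return img if img.ndim == 2 else img[..., :3] @ LUMA


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2D array to `size` (rows, cols)."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]


def preprocess(img):
    """Grayscale, resize to 150 x 150, then flatten to one feature vector."""
    return resize_nearest(to_grayscale(img)).ravel()


rng = np.random.default_rng(0)
fake_slices = [rng.integers(0, 256, size=(512, 512, 3)).astype(np.uint8) for _ in range(4)]
X = np.stack([preprocess(s) for s in fake_slices])   # shape (n_images, 22500)
```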

Encoding the dataset labels

The dataset contains three classes: pCT (positive), nCT (negative), and NiCT (non-informative). Since the models cannot operate on textual labels, the labels are encoded as numerical values. The particular integer assigned to each class is arbitrary, because the label is the dependent (target) variable rather than an input feature. In total, the dataset contains 19685 samples, of which 4001 are labeled positive, 9979 negative, and 5705 non-informative.
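A minimal sketch with an explicit mapping chosen to match the class indices used in Table 5 (0 = negative, 1 = non-informative, 2 = positive); how the labels were actually encoded in this work is not stated. An explicit dictionary is used because scikit-learn's `LabelEncoder` sorts the class names, and since uppercase letters sort before lowercase ones it would assign NiCT = 0 and nCT = 1, which differs from Table 5.

```python
import numpy as np

# Explicit mapping chosen to match the class indices reported in Table 5
LABEL_TO_INDEX = {"nCT": 0, "NiCT": 1, "pCT": 2}
INDEX_TO_LABEL = {i: name for name, i in LABEL_TO_INDEX.items()}


def encode(labels):
    """Map textual class names to integer codes."""
    return np.array([LABEL_TO_INDEX[name] for name in labels])


def decode(indices):
    """Map integer codes back to class names."""
    return [INDEX_TO_LABEL[int(i)] for i in indices]


y = encode(["pCT", "nCT", "NiCT", "nCT"])
class_counts = {"nCT": 9979, "NiCT": 5705, "pCT": 4001}   # from the dataset description
```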

Splitting the dataset for training and testing

The preprocessed dataset is split into training and test sets in the ratio 9:1, as shown in Table 2. With nearly 20,000 images, a test set of roughly 2,000 images is large enough to evaluate performance. After the split, the test set is left unmodified and used to evaluate all the models.
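A sketch of a 9:1 split with scikit-learn's `train_test_split`, using placeholder feature arrays with the class counts of the dataset. With `test_size=0.1`, scikit-learn rounds the test size up, which gives exactly the 17716/1969 split of Table 2. The `random_state` value is arbitrary, and whether the split was stratified is not stated; passing `stratify=labels` would preserve the class proportions in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 19685
labels = np.array([0] * 9979 + [1] * 5705 + [2] * 4001)   # nCT, NiCT, pCT counts
features = np.zeros((n_samples, 4))   # placeholder for the flattened images

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.1, random_state=42
)
```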

Table 2

Dataset samples after splitting into training and testing

Training set	Testing set
17716	1969


Data normalization

Images are made up of matrices of pixel values. Colour images have a separate array of pixel values for each colour channel (red, green, and blue), whereas grayscale images have a single matrix. Pixel values are usually unsigned integers in the range 0 to 255. Although raw pixel values can be fed directly to models, this can cause problems such as slow training; preprocessing them before modelling, for example by scaling them to the range 0-1, can be quite beneficial. We therefore applied min-max normalization to all the pixels in the image, which brings all the pixel data to a common scale and speeds up training.
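A minimal sketch of min-max normalization in NumPy. Whether the minimum and maximum were taken per image or over the whole dataset is not stated; this version works per image. Dividing by 255 is a common alternative when the full 0 to 255 range is assumed.

```python
import numpy as np


def min_max_normalize(img):
    """Rescale pixel values linearly so the minimum maps to 0 and the maximum to 1."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: avoid division by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)
scaled = min_max_normalize(pixels)
```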

Experimental results and analysis

To evaluate the performance of a classification model, several metrics are used, such as classification accuracy, precision, F1-score, sensitivity, and specificity. We computed the metrics at different thresholds to observe the percentage of correctly classified samples in each class and to select a suitable threshold. Performance is evaluated for each class using the metrics defined in Eqs. 1–6, where TP (true positive): the actual class is positive and the model predicts positive; FP (false positive): the actual class is negative but the model predicts positive; TN (true negative): the actual class is negative and the model predicts negative; FN (false negative): the actual class is positive but the model predicts negative. The models are evaluated on 1969 test images. Since medical images are being classified, accuracy alone is not sufficient, so we recorded the results over a wide range of metrics.
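For reference, the standard definitions of the metrics named above, computed per class in a one-vs-rest fashion, are given below. The original Eqs. 1–6 are not reproduced here, so these are the textbook forms rather than a verified copy of the authors' equations or numbering.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Sensitivity (Recall)} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
\end{aligned}
```

For example, at threshold 0.5 Table 9 reports 0 FPs and 2 FNs for pCT among the 425 positive test images, so TP = 423 and TN = 1969 - 425 = 1544. This gives a sensitivity of 423/425 ≈ 0.995 and a specificity of 1544/1544 = 1, consistent with Table 4.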

Execution time for training and testing

The models are trained and tested on an Intel Core i7-8700 CPU @ 3.20 GHz (12 logical cores) with 16 GB of RAM. The times taken to train on 17716 images and to test on 1969 images are shown in Table 3. The proposed model relies on the five base models, so its training time is the sum of their training times (about 2926 s for the values in Table 3).
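One way such timings could be recorded is sketched below with Python's `time.perf_counter` and two scikit-learn models on small synthetic data. The helper `time_models` is hypothetical, and the timing procedure used in this work is not described.

```python
import time

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def time_models(models, X_train, y_train, X_test):
    """Return per-model (training seconds, testing seconds) from a wall-clock timer."""
    timings = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_s = time.perf_counter() - start
        start = time.perf_counter()
        model.predict(X_test)
        pred_s = time.perf_counter() - start
        timings[name] = (fit_s, pred_s)
    return timings


rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(300, 20)), rng.integers(0, 3, size=300)
X_te = rng.normal(size=(50, 20))

timings = time_models(
    {"Logistic regression": LogisticRegression(max_iter=1000),
     "Decision tree": DecisionTreeClassifier(random_state=0)},
    X_tr, y_tr, X_te,
)
# The consensus model's training cost is the sum of the base models' training times
consensus_train_s = sum(fit_s for fit_s, _ in timings.values())
```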

Table 3

The percentage of weightage of each Machine Learning algorithm and accuracy of all the models with training time and testing time

Models	Weightage (%)	Accuracy (%)	Training time (s)	Testing time (s)
Logistic regression	20.109	99.594	1569.287	0.135
SVM	20.109	99.594	983.467	146.367
kNN	20.068	99.391	37.807	689.590
Decision tree	19.606	97.105	283.367	0.169
Random forest	20.109	99.594	52.387	0.141
Weighted consensus	–	99.645	–	0.413 (per sample)


Model weightage and accuracy

As proposed, the models’ accuracies are normalized and weights are calculated. The weights, along with accuracies, are shown in Table 3. The distribution of weights can be visually seen in Fig. 4.
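The normalization can be checked directly against Table 3: dividing each accuracy by the sum of all five (495.278) reproduces the reported weights to three decimal places.

```python
# Test accuracies (%) of the five base models, taken from Table 3
accuracies = {
    "Logistic regression": 99.594,
    "SVM": 99.594,
    "kNN": 99.391,
    "Decision tree": 97.105,
    "Random forest": 99.594,
}

total = sum(accuracies.values())
weights = {name: acc / total for name, acc in accuracies.items()}   # sum to 1

for name, w in weights.items():
    print(f"{name:20s} {100 * w:.3f}%")
```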

Fig. 4

Block diagram of the models’ weights

The overall accuracy of the weighted consensus model (99.645%) is higher than that of every individual model. This suggests that even if one model mispredicts a test sample, the other models collectively contribute enough weight to the correct class to classify the sample correctly.

Sensitivity and specificity analysis of results

Sensitivity is the proportion of actual positives that are correctly identified (i.e., the proportion of people who have the disease and are correctly identified as having it). Specificity, on the other hand, is the proportion of actual negatives that are correctly identified. The sensitivity and specificity values were recorded at thresholds ranging from 0.1 to 0.9 (Table 4). The optimal threshold differs between classes. If the base models were used alone, a default threshold of 0.5 might have been chosen; treating the threshold as a tunable hyperparameter gives fine control over the model and lets it be adapted to different circumstances. The optimal threshold was found to be 0.6 for nCT, 0.4 for NiCT, and 0.5 for pCT (the choice of optimum depends partly on one's perspective).
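Per-class sensitivity and specificity are obtained one-vs-rest from a multiclass confusion matrix. The sketch below uses a made-up confusion matrix whose row totals match the test-set supports (982, 562 and 425); it is not the paper's result.

```python
import numpy as np


def one_vs_rest_rates(cm):
    """Per-class sensitivity and specificity from a square confusion matrix.

    cm[i, j] counts test samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp      # true class i, predicted as something else
    fp = cm.sum(axis=0) - tp      # predicted as class i, true class is something else
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical matrix (rows/columns: 0 = nCT, 1 = NiCT, 2 = pCT), NOT the paper's results
cm = [[978, 3, 1],
      [2, 560, 0],
      [1, 1, 423]]
sensitivity, specificity = one_vs_rest_rates(cm)
```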

Table 4

Sensitivity (C1) and specificity (C2) values of the weighted consensus model for the three classes at different threshold values on the CT scan image dataset

Threshold	nCT sensitivity (C1)	nCT specificity (C2)	NiCT sensitivity (C1)	NiCT specificity (C2)	pCT sensitivity (C1)	pCT specificity (C2)
0.1	1	0.97	1	0.98	0.99	0.99
0.2	1	0.99	1	0.99	0.99	0.99
0.3	0.99	0.99	1	0.99	0.99	0.99
0.4	0.99	0.99	1	0.99	0.99	0.99
0.5	0.99	0.99	0.99	0.99	0.99	1
0.6	0.99	1	0.99	0.99	0.99	1
0.7	0.99	1	0.99	0.99	0.98	1
0.8	0.99	1	0.98	0.99	0.98	1
0.9	0.97	1	0.96	0.99	0.96	1


Classification analysis reports

The overall performance report containing different basic metrics like precision, recall, F1-score, support is shown in Table 5. The table gives a clear picture of the performance of the individual models, which forms the basis for choosing them in the proposed weighted consensus model. The three classes 0, 1 and 2 correspond to negative, non-informative and positive classes, respectively.

Table 5

Performance report of the Machine learning models for classification of CT scan images

Models	Class	Precision	Recall	F1 score	Support
Logistic regression	0	1.00	1.00	1.00	982
	1	0.99	1.00	0.99	562
	2	1.00	0.99	1.00	425
	Accuracy	–	–	1.00	1969
	Macro avg	1.00	1.00	1.00	1969
	weighted avg	1.00	1.00	1.00	1969
SVM	0	1.00	0.99	1.00	982
	1	0.99	1.00	0.99	562
	2	1.00	1.00	1.00	425
	Accuracy	–	–	1.00	1969
	Macro avg	1.00	1.00	1.00	1969
	weighted avg	1.00	1.00	1.00	1969
k-NN	0	1.00	0.99	1.00	982
	1	0.98	0.99	0.99	562
	2	1.00	1.00	1.00	425
	Accuracy	–	–	0.99	1969
	Macro avg	0.99	0.99	0.99	1969
	weighted avg	0.99	0.99	0.99	1969
Decision tree	0	0.98	0.98	0.98	982
	1	0.97	0.98	0.97	562
	2	0.97	0.97	0.97	425
	Accuracy	–	–	0.97	1969
	Macro avg	0.97	0.97	0.97	1969
	weighted avg	0.97	0.97	0.97	1969
Random forest	0	0.99	1.00	1.00	982
	1	0.99	0.99	0.99	562
	2	1.00	0.99	0.99	425
	Accuracy	–	–	0.99	1969
	Macro avg	1.00	0.99	0.99	1969
	weighted avg	0.99	0.99	0.99	1969
Weighted consensus	0	1.00	1.00	1.00	982
	1	0.99	1.00	0.99	562
	2	1.00	1.00	1.00	425
	Accuracy	–	–	1.00	1969
	Macro avg	1.00	1.00	1.00	1969
	weighted avg	1.00	1.00	1.00	1969

We used scikit-learn's classification report to generate these results. By default, scikit-learn rounds values to two decimal places. We achieved accuracies of about 99.6% with Logistic Regression, SVM and Random Forest, so after rounding, many entries in the reports appear as 1.00. Support is the number of test samples that actually belong to each class: 982 negative, 562 non-informative, and 425 positive, totalling 1969.
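The effect of rounding can be reproduced on synthetic labels: an accuracy of 0.998 is printed as 1.00 with the default `digits=2`, while `digits=4` or `output_dict=True` exposes the unrounded values.

```python
from sklearn.metrics import accuracy_score, classification_report

y_true = [0] * 250 + [1] * 250
y_pred = [0] * 249 + [1] + [1] * 250   # one mistake in 500 samples
acc = accuracy_score(y_true, y_pred)    # 499 / 500

report_2dp = classification_report(y_true, y_pred)                 # default digits=2
report_4dp = classification_report(y_true, y_pred, digits=4)       # more decimals
exact = classification_report(y_true, y_pred, output_dict=True)    # unrounded values
print(f"{acc:.2f}")
```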

Confusion matrices of the classification results

The confusion matrices of the individual models on the test images are shown in Fig. 5. We adaptively chose threshold values to obtain the best confusion matrix for each model, using the optimum sensitivity and specificity values in Table 4.

Fig. 5

Confusion matrix analysis of COVID-19 chest CT scan

False positives and false negatives observed during classifications

The main goal of this work is to reduce the number of FPs or FNs while taking the trade-off between them into account. We therefore computed the number of FNs and FPs at each threshold to show the efficacy of the proposed model. Tables 7, 8 and 9 show the FPs and FNs of the different classes predicted by the weighted consensus model.

Table 7

False negatives and False positives of nCT CT Scan

Threshold	False positives	False negatives
0.1	23	0
0.2	8	0
0.3	6	2
0.4	4	3
0.5	2	4
0.6	0	6
0.7	0	7
0.8	0	8
0.9	0	28

Table 8

False negatives and False positives of NiCT CT Scan

Threshold	False positives	False negatives
0.1	24	0
0.2	10	0
0.3	9	0
0.4	8	0
0.5	5	1
0.6	4	3
0.7	3	4
0.8	1	7
0.9	1	20

Table 9

False negatives and False positives of pCT CT Scan

Threshold	False positives	False negatives
0.1	14	1
0.2	2	1
0.3	1	1
0.4	1	1
0.5	0	2
0.6	0	4
0.7	0	5
0.8	0	5
0.9	0	13

As shown in Table 7 for the nCT class, the number of FPs decreases as the threshold increases, because an image is assigned to the class only if the total weight of the models predicting that class reaches the threshold; this reduces the chance of falsely predicting the class. Conversely, the number of FNs increases as the threshold increases. The optimal threshold can therefore be chosen by comparing the FPs and FNs at different thresholds, depending on the situation.
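A sketch of how such counts could be produced for one class, assuming they are computed one-vs-rest: an image counts as a prediction of the target class only when the target's total weight reaches the threshold. The function name and the per-image scores are hypothetical, not taken from the paper.

```python
import numpy as np


def fp_fn_for_class(scores, y_true, target, threshold):
    """Count FPs and FNs for one target class under a threshold on its total weight.

    scores: array (n_samples, n_classes) of summed normalized model weights.
    """
    predicted_target = scores[:, target] >= threshold
    actual_target = np.asarray(y_true) == target
    fp = int(np.sum(predicted_target & ~actual_target))
    fn = int(np.sum(~predicted_target & actual_target))
    return fp, fn


# Hypothetical summed weights for four test images (columns: nCT, NiCT, pCT)
scores = np.array([
    [0.80, 0.00, 0.20],   # true nCT, strong agreement
    [0.40, 0.20, 0.40],   # true pCT, models split
    [0.60, 0.40, 0.00],   # true NiCT, most models say nCT
    [0.00, 0.20, 0.80],   # true pCT
])
y_true = [0, 2, 1, 2]

for t in (0.1, 0.5, 0.9):
    print(t, fp_fn_for_class(scores, y_true, target=2, threshold=t))
```

As in Tables 7–9, raising the threshold trades FPs for FNs: here the pCT counts move from (1, 0) at 0.1 to (0, 1) at 0.5 and (0, 2) at 0.9.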

Results and discussions

In this paper, we propose a new weighted consensus model based on five machine learning classifiers, namely Logistic Regression, SVM, kNN, Decision Tree, and Random Forest, to predict classes accurately while reducing false positives and false negatives. On the CT scan dataset, the model can be tuned to predict at different thresholds for the three classes nCT, NiCT, and pCT. For the nCT class (Table 7), as the threshold increases, the number of FPs decreases and, in contrast, the number of FNs increases. The effect of the threshold on the NiCT and pCT classes is shown in Tables 8 and 9, respectively, and similar behaviour is observed in both cases. In all three classes, the number of FPs decreases and the number of FNs increases as the threshold increases. At a threshold of 0.5, the total FP + FN count is at its minimum (or tied for the minimum) in all three classes. We considered this threshold value for the NiCT and pCT classes, and since it gives low FP and FN counts for all three classes of the CT scan dataset, we conclude that a threshold of 0.5 can be chosen for this study. The sensitivity and specificity of the proposed algorithm for all three classes at different thresholds are reported in Table 4. Apart from accuracy, log-loss is one of the most widely used metrics for evaluating classification models, as it heavily penalizes confident wrong predictions. The log-loss of the weighted consensus model on the test set is 0.1227, well below the value of ln 3 ≈ 1.099 obtained by assigning equal probability to each of the three classes.
This low log-loss underlines the robustness and reliability of the proposed model. With an accuracy of 99.645% and a prediction time of 0.413 seconds per sample, the model is robust and promising, and it can be deployed for near-instant predictions. For image-based prediction, HUST-19 [46] achieved an AUC of 0.994 in distinguishing NiCT images from pCT and nCT images, and an AUC of 0.991 in predicting pCT images. As shown in Table 6, the proposed weighted consensus model achieves higher AUC scores for every one-vs-rest class for which HUST-19 reports a value.
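
To make the log-loss metric concrete, here is a minimal sketch of the standard multiclass log-loss: the mean negative log-probability that the model assigns to the true class. The function name, the `eps` clipping value, and the example probabilities are hypothetical; the paper does not state how its value of 0.1227 was computed, and `sklearn.metrics.log_loss` is a library alternative.

```python
import math
from typing import Sequence


def multiclass_log_loss(y_true: Sequence[int],
                        probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class.

    Each row of `probs` is assumed to already sum to 1; the true-class
    probability is clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("y_true and probs must be non-empty and of equal length")
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical probabilities over (nCT, NiCT, pCT); not data from the paper.
    y = [0, 2, 1]
    confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]
    print(multiclass_log_loss(y, confident))   # roughly 0.23: every true class gets high probability
    # One confidently wrong row dominates the average.
    one_wrong = [[0.01, 0.98, 0.01], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]
    print(multiclass_log_loss(y, one_wrong))   # roughly 1.73
```

The second call shows why the metric punishes wrong predictions: a single row giving the true class probability 0.01 contributes about 4.6 on its own, which raises the average loss from about 0.23 to about 1.73.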

Table 6

One-vs-rest AUC for the three classes: HUST-19 (base paper [46]) versus the proposed weighted consensus model (WCM)

Class	HUST-19 (AUC)	WCM (AUC)
Positive versus (negative and non-informative)	0.991	0.9976
Negative versus (positive and non-informative)	—	0.9970
Non-informative versus (positive and negative)	0.994	0.9973
Average	—	0.9973
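
The one-vs-rest AUCs in Table 6 treat each class in turn as positive and the other two classes together as negative. A minimal sketch of that computation follows, using the rank (Mann-Whitney) form of the ROC AUC. The helper names and example scores are hypothetical, and `sklearn.metrics.roc_auc_score` with `multi_class="ovr"` and `average="macro"` is a library route to the same quantities. The unweighted mean of the three WCM values, (0.9976 + 0.9970 + 0.9973) / 3 = 0.9973, agrees with the Average row.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                    n_classes: int) -> List[float]:
    """AUC of each class against the union of all remaining classes."""
    return [binary_auc([row[c] for row in probs], [y == c for y in y_true])
            for c in range(n_classes)]


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean of the per-class AUCs."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical labels (0 = pCT, 1 = nCT, 2 = NiCT) and class scores.
    y = [0, 0, 1, 1, 2, 2]
    probs = [
        [0.8, 0.1, 0.1], [0.15, 0.75, 0.1],
        [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7], [0.3, 0.1, 0.6],
    ]
    per_class = one_vs_rest_auc(y, probs, 3)
    print(per_class, macro_average(per_class))
    # The WCM rows of Table 6 average to the reported 0.9973.
    print(round(macro_average([0.9976, 0.9970, 0.9973]), 4))
```

With the toy scores, the misranked second image lowers the AUC of its own class (pCT) to 0.75 and the nCT class to 0.875, giving a macro average of 0.875.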

The base paper [46] used HUST-19 to predict whether an image is COVID-19 positive, negative, or non-informative, with an AUC of 0.919. In comparison, the weighted consensus model achieved an accuracy of 0.996 and an average AUC of 0.997. Under these experimental conditions, the proposed weighted consensus algorithm therefore outperforms the reported HUST-19 results and provides more reliable predictions.

Conclusions and future scope

This paper presented a weighted consensus model for classifying and identifying possible COVID-19 infection from CT scan images with high accuracy. The proposed model performs as well as the existing best methods in terms of accuracy, and it is also fast because we normalized the images. The proposed method can minimize false negatives or false positives, depending on the requirements: minimizing false negatives helps control the spread of infection, while minimizing false positives, once the situation improves, reduces patients' mental trauma. In the future, we want to extend this model with continuous and periodic feedback to improve its efficiency and robustness. We are also collecting data locally from hospitals and will retrain the model on it to improve accuracy and robustness and make it better suited to local conditions. Besides, we plan to apply the proposed model to other diseases and develop it into a more general model.
