Deming least square regressed feature selection and Gaussian neuro-fuzzy multi-layered data classifier for early COVID prediction.

Rathnamma V Mydukuri¹, Suresh Kallam², Rizwan Patan³, Fadi Al-Turjman⁴, Manikandan Ramachandran⁵.

Abstract

Coronavirus disease (COVID-19) is a harmful disease caused by the new SARS-CoV-2 virus. Its symptoms include cold, cough, fever, and difficulty in breathing. COVID-19 has affected many countries, and its spread across the world has put humanity at risk. Because the growing number of cases places stress on administrators and health professionals, different techniques have been introduced to predict the presence of coronavirus disease in patients. However, these techniques did not improve prediction accuracy or minimize prediction time. To address these problems, the least square regressive Gaussian neuro-fuzzy multi-layered data classification (LSRGNFM-LDC) technique is introduced in this article. The LSRGNFM-LDC technique performs efficient COVID prediction with better accuracy and less time consumption through feature selection and classification. Preprocessing first eliminates unwanted data from the input features, which reduces the time complexity. Next, the Deming Least Square Regressive Feature Selection process selects the most relevant features by identifying the line of best fit. After feature selection, the Gaussian neuro-fuzzy classifier in the LSRGNFM-LDC technique classifies the data with the help of fuzzy if-then rules. Finally, the fuzzy if-then rules classify the patient data as lower risk, medium risk, or higher risk with higher accuracy and less time consumption. Experimental evaluation is performed on the Novel Corona Virus 2019 Dataset using metrics such as prediction accuracy, prediction time, and error rate. The results show that the LSRGNFM-LDC technique improves accuracy and reduces time consumption and error rate compared with existing works on COVID prediction.

Keywords: classification; coronavirus disease; feature selection; fuzzy technique

Year: 2021 PMID: 34230740 PMCID: PMC8250320 DOI: 10.1111/exsy.12694

Source DB: PubMed Journal: Expert Syst ISSN: 0266-4720 Impact factor: 2.812

INTRODUCTION

A new coronavirus called SARS-CoV-2 has infected many people worldwide. The COVID-19 pandemic has resulted in more than 193,825 deaths in the last few months. Many researchers have introduced techniques to predict coronavirus infection before a patient's condition worsens. A long short-term memory (LSTM) method was introduced in Tomar and Gupta (2020) for COVID-19 prediction. Its predictions of the number of positive cases and the number of recovered cases were accurate within a certain range. Though the prediction accuracy was improved, time consumption was not minimized by the data-driven LSTM method. A new fuzzy assisted system was designed in Adwibowo (2020) and Srivastava et al. (2020) to assess dental care safety with respect to patient and environmental conditions. The fuzzy assisted system produced its estimate from patient body temperature, travel history, ventilation rate, and disinfection frequency. But the error rate was not minimized by the fuzzy assisted system. The BAT model was introduced in Huang et al. (2020), using the technique for order preference by similarity to ideal solution (TOPSIS) to determine the best return date. However, the space complexity was not minimized by the BAT model. A numerical model was introduced in Sujath et al. (2020) for predicting the spread of COVID-19. Linear regression and a multilayer perceptron were used to anticipate the epidemiological pattern of COVID-19 cases in India. But the prediction accuracy was not improved by the numerical model. The development of the COVID-19 pandemic in different countries was modelled in Bhardwaj (2020) and Karmore et al. (2020) using the logistic model. The designed model carried out the regression analysis by least-square fitting. But the computational complexity was not minimized. A multiple ensemble neural network model with fuzzy response aggregation was introduced in Melin et al. (2020) for COVID-19 time series. The ensemble neural networks comprised different modules to perform the prediction under diverse conditions. However, the prediction accuracy was not improved by the designed model. Non-pharmaceutical interventions (NPIs) were studied in Lahiri et al. (2020) with a prediction model for the prevention of the SARS-CoV-2 pandemic in the Indian population. But the prediction time was not minimized. Machine learning models were introduced in Rustam et al. (2020) to forecast the number of upcoming patients affected by COVID-19. Regression, LASSO, SVM, and exponential smoothing were used for the forecast. However, the space complexity problem during prediction remained unaddressed. An early prediction of the 2019-nCoV epidemic was carried out in Zhong et al. (2020) using a mathematical model and epidemiological data from China. But the computational cost was not minimized during the COVID prediction. A hybrid artificial-intelligence (AI) model was introduced in Zheng et al. (2020) and Nagasubramanian et al. (2020) for COVID-19 prediction. A susceptible model was introduced to decide the infection rate for determining the transmission law. Though the complexity was minimized, the prediction accuracy was not improved by the AI model. The problems identified in the existing prediction methods are lower prediction accuracy, higher prediction time, higher computational cost, higher space complexity, higher error rate, and higher computational complexity. To solve these issues, the least square regressive Gaussian neuro-fuzzy multi-layered data classification (LSRGNFM-LDC) technique is introduced.

Motivation

Coronavirus disease (COVID-19) is a dangerous disease caused by the SARS-CoV-2 virus. Recently, several research works have applied machine learning models to COVID prediction. But the time consumption was not minimized. In conventional methods, error rate and space complexity were not minimized. In many existing methods, prediction accuracy was not improved. Motivated by this, the LSRGNFM-LDC technique uses Deming regression and a neuro-fuzzy classifier for the prediction process. The proposed Deming regression process is applied to select the most relevant features with minimal time consumption. The neuro-fuzzy classifier is then applied to categorize the patient data by risk level through fuzzy if-then rules with higher accuracy and less time consumption.

Contributions

The contributions of this work are listed below. To perform efficient COVID prediction with better accuracy and less time consumption than state-of-the-art works, the LSRGNFM-LDC technique is introduced. The LSRGNFM-LDC technique is carried out through two processes, namely feature selection and classification. To select the most relevant features with minimal time consumption compared with state-of-the-art works, the Deming least square regressive feature selection (DLSRFS) process is introduced in the LSRGNFM-LDC technique. The DLSRFS process identifies the line of best fit. To improve the prediction accuracy with minimal time consumption compared with state-of-the-art works, the Gaussian neuro-fuzzy classifier is introduced in the LSRGNFM-LDC technique. The data classification process categorizes the patient data with the help of fuzzy if-then rules. With the help of these rules, the patient data is predicted as lower-risk, medium-risk, or higher-risk patient data with higher accuracy and less time consumption. The paper is organized into five sections. In Section 2, the proposed LSRGNFM-LDC technique is explained with an architectural diagram. Section 3 explains the related works on COVID prediction techniques. In Section 4, the experimental settings are presented with a detailed result analysis for four different parameters. The conclusion is given in Section 5.

RELATED WORKS

A new approach was introduced in Marmarelis (2020) and Waheed et al. (2020) based on data-guided detection of infection waves described by Riccati equations with estimated parameters. Applied to COVID-19 daily time-series data, the approach decomposes the time course of the epidemic. An artificial-intelligence method was introduced in Alazab et al. (2020) based on a deep convolutional neural network (CNN) to identify COVID-19 patients using real-world datasets. But the computational cost was not minimized by the artificial-intelligence method. A new approach was introduced in Abdel-Basset et al. (2020b). The conditioner path takes pairs of CT images and their labels as input and produces a relevant representation, which is passed to the segmentation path to segment new images. To support effective collaboration between the two paths, an adaptive recombination and recalibration (RR) module was proposed that permits intensive information exchange between the paths with a trivial increase in computational complexity. A new framework was introduced in Abdel-Basset et al. (2020a) that uses heart rate and sleep data from wearable devices to estimate the epidemic development of COVID-19 in different cities. However, the computational complexity was not minimized. An objective approach was introduced in Petropoulos and Makridakis (2020) to forecast COVID-19. Assuming the data were reliable, the forecasts indicated a continued increase in confirmed COVID-19 cases with sizable associated uncertainty. But the computational cost was not minimized by the objective approach. A nonlinear machine learning method was introduced in Kavadi et al. (2020) for global pandemic prediction of COVID-19. A Progressive Partial Derivative Linear Regression model was introduced to identify the best parameters efficiently. However, the prediction accuracy was not improved by the designed method. A new methodology was introduced in Fokas et al. (2020) for predicting the time evolution of the number of individuals in each country infected with SARS-CoV-2. But the prediction time was not minimized by the designed methodology. An online forecasting mechanism was introduced in Abdulmajeed et al. (2020) to stream data from the Nigeria Center for Disease Control. The designed mechanism also updated the ensemble model to provide COVID-19 updates. However, the error rate was not minimized by the online forecasting mechanism. An innovative educational technique was introduced in DeFilippis et al. (2020) to provide experiential learning. FITs were supported in the early stage of the COVID-19 pandemic, and the challenges in the United States were anticipated. But the space complexity was not minimized by the innovative educational technique. An AI platform was introduced in Ke et al. (2020) to find drugs with anti-coronavirus activity. Though the error rate was minimized, the time consumption for prediction was not minimized by the AI platform. New technology was introduced in Vaishya et al. (2020) to identify clusters of cases and to forecast the virus by collecting and examining all previous data. But the computational cost was not minimized. A new hybrid approach (HSMA_WOA) was introduced in Abdel-Basset et al. (2020c), based on SMA and WOA, to resolve the image segmentation problem (ISP) by identifying the optimal threshold values. The slime mould algorithm (SMA), a novel meta-heuristic, was combined with the whale optimization algorithm (WOA) to maximize Kapur's entropy. The designed algorithm was not applied to flow shop scheduling, DNA fragment assembly, or parameter estimation of photovoltaic solar cells. A novel IoT-based decision-making model was developed by Abdel-Basset et al. (2020c) and Al-Turjman and Deebak (2020) to find and examine patients with type-2 diabetes. The decision-making model was built on type-2 neutrosophic numbers with the VIKOR method. A decision support system was introduced for the precise prediction of type-2 diabetes risk for patients. But the accuracy was not improved. Data analytics and visualization were introduced in Abdel-Basset et al. (2019) to examine malignant tumors and find weak spots of the tumor. However, the work failed to generate a greater positive impact and influence. A MapReduce framework and fusion algorithm were introduced in Chang (2018a, 2018b) for medical imaging simulations. But more varieties of gene simulations were not considered. Disruptive technologies were introduced in Abdel-Basset et al. (2020a) to analyze and restrict the spread of COVID-19. However, the space complexity was not reduced. A Susceptible-Infected-Recovered (SIR) model was introduced in Tutsoy et al. (2020) for the evaluation of COVID-19 casualties. But the error rate was not reduced. Mathematical and computational models were developed in Torrealba-Rodriguez et al. (2020) for the estimation of COVID-19 cases. However, they failed to reduce the prediction time. A Gaussian prediction model was introduced in Karaçuha et al. (2020) for forecasting the short-term future of the pandemic. However, the computational complexity was not minimized.

METHODOLOGY

The coronavirus disease 2019 (COVID-19) pandemic has resulted in a proliferation of clinical prediction models for diagnosis, disease severity estimation, and prognosis. COVID-19, caused by a β-coronavirus, was reported in December 2019 in Wuhan city, China. On March 11, 2020, COVID-19 was announced as a public health crisis of international concern by the World Health Organization (WHO). As of August 17, 2020, the Times of India website reported that India's overall coronavirus case count had crossed 26 lakh, with 50,921 deaths. Table 1 lists the detailed description of the variables used in this paper.

TABLE 1

Description of the variables

Variable	Description
P_i = P_1, P_2, ..., P_n	Patient data
Ft_j = Ft_1, Ft_2, ..., Ft_m	Features of the patient data
n	Number of patient data
m	Number of features
e_j	Error value
v_j	Ratio of the error variances
I_0	Intercept
s_1	Slope
\hat{Ft}_j and \hat{a}_j	Estimates of the true values of “Ft_j” and “a_j”
In(t)	Input layer output
Pa_i	Patient data with the most relevant feature
weight₀	Initial weight allocated at the input layer
b	Bias
Hidden(t)	Hidden layer result
weight_ih	Weight allocated between the input layer and the hidden layer
Ou(t)	Output layer
weight_ho	Weight allocated between the hidden layer and the output layer
Prediction_Acc	Prediction accuracy
Prediction_Time	Prediction time
Error_Rate	Error rate
Space_Complexity	Space complexity

The coronavirus is transmitted directly through coughing, sneezing, contact, and respiratory droplets. Patients are infectious before the onset of clinical symptoms, and contagiousness lasts up to three weeks after recovery. Patients with mild symptoms were also identified as infective. But existing research failed to predict COVID with higher accuracy and less time consumption. To address these problems, the LSRGNFM-LDC technique is introduced. The main objective of the LSRGNFM-LDC technique is to perform efficient COVID prediction with better accuracy and less time consumption. The LSRGNFM-LDC technique comprises two processes, namely feature selection and classification. First, preprocessing removes the unwanted data from the Novel Corona Virus 2019 dataset, which minimizes the time complexity. Next, the DLSRFS process is used in the LSRGNFM-LDC technique to choose the most relevant features. The data classification is then carried out by applying the neuro-fuzzy classifier to the relevant features to perform the prediction. The architecture diagram of the LSRGNFM-LDC technique is illustrated in Figure 1.

FIGURE 1

Architecture diagram of least square regressive Gaussian neuro‐fuzzy multi‐layered data classification technique

Figure 1 shows the architecture diagram of the LSRGNFM-LDC technique. Initially, the patient data is collected from the input database. Preprocessing eliminates the unwanted data, which lowers the time complexity. After preprocessing, the regression process selects the most relevant features by identifying the line of best fit. After the feature selection process, the fuzzy rules are constructed to perform the classification process, which classifies the patient data using the relevant features. In this way, efficient COVID data prediction is carried out with higher accuracy and less time consumption. A brief description of the feature selection and classification processes is given in the sub-sections below.

Deming least square regressive feature selection process

The DLSRFS process is introduced in the LSRGNFM-LDC technique to select the most relevant features from the dataset. The DLSRFS algorithm is an errors-in-variables model. It identifies the line of best fit for performing the relevant feature selection. Unlike ordinary linear regression, the DLSRFS process accounts for errors on both axes. The DLSRFS process determines the maximum likelihood estimate of the errors-in-variables model. The steps involved in the DLSRFS process are illustrated in Figure 2.

FIGURE 2

Deming least square regressive feature selection process

Figure 2 illustrates the step-by-step process of Deming least square regressive feature selection. Initially, a large number of patient data is collected from the Novel Corona Virus 2019 Dataset. Next, the DLSRFS process identifies the line of best fit, which is then used to choose the relevant features for COVID disease prediction. The patient data are denoted as “P_i = P_1, P_2, ..., P_n” with features “Ft_j = Ft_1, Ft_2, ..., Ft_m”, where “n” denotes the number of patient data and “m” denotes the number of features in each dataset. The DLSRFS process treats the available data (a_j, Ft_j) as measured readings of true values that lie on the regression line, as given in Equations (1) and (2). In (1) and (2), “e_j” denotes the error value and “v_j” denotes the ratio of the error variances. Both values are independent of each other. The intercept “I_0” and the slope “s_1” are determined by Equation (3), in which “\hat{Ft}_j” and “\hat{a}_j” denote the estimates of the true values of “Ft_j” and “a_j” respectively. With the help of this equation, the DLSRFS process identifies the best fit line for every input feature and chooses the most relevant features to effectively predict the COVID disease. To achieve better outcomes, the DLSRFS process minimizes the weighted sum of squared residuals. As a result, the DLSRFS process accurately selects the relevant features for disease prediction. The algorithmic process of DLSRFS is given below.

Algorithm 1: Deming Least Square Regressive Feature Selection
Input: Features ‘Ft_1, Ft_2, Ft_3, ..., Ft_m’ of patient data
Output: Most relevant features, selected with minimal time consumption
Step 1: Begin
Step 2: For each patient data with features ‘Ft_j’
Step 3: Apply Deming least square regression analysis
Step 4: Find the best fit line
Step 5: Select the most relevant features for prediction
Step 6: End For
Step 7: End

Algorithm 1 shows the step-by-step process of the DLSRFS algorithm. With this algorithmic process, the DLSRFS algorithm reduces the time consumed for relevant feature selection from the input Novel Corona Virus 2019 Dataset compared with conventional works. Initially, the patient data is taken from the Novel Corona Virus 2019 dataset as input. Then, Deming least square regression analysis is used to find the best fit line. Finally, the DLSRFS algorithm chooses the most relevant features for COVID disease prediction.
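The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch: the source equations (1) to (3) are not reproduced here, so the closed-form slope below is the standard textbook Deming regression estimate, and the function names (`deming_fit`, `weighted_residual`, `rank_features`), the toy data, and the choice to rank features by their weighted residual are all assumptions rather than the authors' implementation.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Standard Deming regression of y on x (errors allowed in both).

    delta is the assumed ratio of the y-error variance to the x-error variance.
    Returns (intercept, slope).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: slope is undefined")
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope

def weighted_residual(x, y, delta=1.0):
    """Weighted sum of squared residuals at the Deming optimum (up to a scale factor)."""
    intercept, slope = deming_fit(x, y, delta)
    raw = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return delta * raw / (slope ** 2 + delta)

def rank_features(features, target, delta=1.0):
    """Hypothetical selection rule: smaller residual means a better line fit."""
    scores = {name: weighted_residual(col, target, delta)
              for name, col in features.items()}
    return sorted(scores, key=scores.get)

# Toy data (invented for illustration): ft_1 lies exactly on a line, ft_2 does not.
target = [3.0, 5.0, 7.0, 9.0, 11.0]
features = {
    "ft_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ft_2": [2.0, 1.0, 4.0, 3.0, 5.0],
}
ranking = rank_features(features, target)
print(ranking)
```

Because the residuals depend on the scale of each feature, a real pipeline would probably standardize the features before comparing them; that step is omitted here to keep the sketch short.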

Gaussian neuro‐fuzzy multi‐layered data classification

The Gaussian neuro-fuzzy multi-layered data classification (GNFMLDC) process is carried out to classify the data points with higher accuracy and less time consumption. The GNFMLDC process combines two components, namely fuzzy logic and a neural network. The fuzzy logic includes three operations, namely fuzzification, inference, and defuzzification, as shown in Figure 3.

FIGURE 3

Fuzzy‐based operation

From Figure 3, the GNFMLDC process initially collects the input “I(t)” and determines the required performance. In the fuzzification operation, the inputs are converted to fuzzy sets. The goal of the inference operation is the efficient conversion of fuzzy input to fuzzy output using if-then rules. The GNFMLDC process is designed with a set of rules, termed fuzzy rules, which are expressed as conditional statements. Finally, the defuzzification operation converts the fuzzy set to the output. A neuro-fuzzy system is defined as a fuzzy system with a learning algorithm that determines fuzzy sets and fuzzy rules by processing the features of the input data. The learning process operates on local information and modifies the underlying fuzzy system while taking its semantic properties into account. A neuro-fuzzy system is treated as a multi-layer (i.e., three-layer) feedforward neural network. The first layer is the input layer with the input variables, the hidden layer represents the fuzzy rules, and the third layer holds the output variables. Initially, the most relevant features of the input data are given to the input layer of the LSRGNFM-LDC technique, as expressed in Equation (4). In Equation (4), “In(t)” denotes the input layer output, “Pa_i” denotes the patient data with the most relevant features, “weight_0” represents the initial weight allocated at the input layer, and “b” symbolizes the bias. Then, the input layer result is transmitted to the hidden layer. In that layer, a neuro-fuzzy system is used with the fuzzy rules and fuzzy sets. Fuzzy sets are trained as the connection weights and bias. Each fuzzy set is described by a membership function, which assigns the elements of the fuzzy set a degree of membership in continuous or discrete form. A Gaussian membership function is a generalization of the characteristic function of a crisp subset. Fuzzy if-then rules are employed to take the correct decision based on the input data, following the if-then condition described in Equation (5). In GNFMLDC, four input membership functions and one output membership function with three levels are employed. The four inputs are age, body temperature, travel history, and chronic disease. The three output levels are lower risk, medium risk, and higher risk. After considering the membership functions, the fuzzy rules are formulated as Equations (6), (7), and (8), in which the risk level is determined as lower risk, medium risk, or higher risk using the if-then condition. The output Gaussian membership function is shown in Figure 4.

FIGURE 4

Output Gaussian membership function of fuzzy set

The values of the output Gaussian membership function vary between 0 and 1 and indicate the degree to which a member belongs to the fuzzy set. A value less than 0.25 represents a lower risk, a value of 0.5 denotes a medium risk, and a value greater than 0.5 represents a higher risk. The defuzzification process converts the fuzzy sets into output results. The hidden layer output is given by Equation (9), in which “Hidden(t)” denotes the hidden layer result and “weight_ih” denotes the weight allocated between the input layer and the hidden layer. After that, the hidden layer results are transmitted to the output layer. The output of the GNFMLDC classifier gives the classification result for each input patient data. The result of the output layer is formulated in Equation (10), in which “Ou(t)” denotes the output layer result and “weight_ho” represents the weight allocated between the hidden layer and the output layer. In this way, the patient data is classified into three types (i.e., lower risk, medium risk, and higher risk) for performing the COVID prediction. The algorithmic description of the GNFMLDC process is given below.

Algorithm 2: Gaussian Neuro-Fuzzy Multi-Layered Data Classification
Input: Patient data with the most relevant features
Output: Improved prediction accuracy with minimal time consumption
Step 1: Begin
Step 2: For each input patient data ‘Pa_i’
Step 3: Perform the fuzzification process
Step 4: Analyze the fuzzy inference function
Step 5: Perform the defuzzification process
Step 6: Obtain the classification result as lower-risk, medium-risk, or higher-risk patient data
Step 7: End for
Step 8: End

Algorithm 2 explains the step-by-step process of the GNFMLDC algorithm. Initially, the relevant features of each patient data are collected as input. After that, a fuzzification operation converts the input into fuzzy sets. The inference operation converts the fuzzy input to fuzzy output through the if-then rules and is used to minimize the prediction time. Finally, the defuzzification operation converts the fuzzy set to the output. With the help of these rules, the patient data is predicted as lower-risk, medium-risk, or higher-risk patient data with maximum accuracy and minimum time consumption. With this algorithmic process, the LSRGNFM-LDC technique attains better COVID prediction performance through the classification process than the traditional works.
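The fuzzification, inference, and defuzzification steps of Algorithm 2 can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the membership centres and widths, the four rules, the rule output levels (0.1, 0.5, 0.9), and the weighted-average defuzzification are hypothetical and are not taken from the paper. Only the four inputs, the Gaussian membership shape, and the 0.25 and 0.5 thresholds come from the text; scores between 0.25 and 0.5 are treated as medium risk here, which the text does not state explicitly.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership degree of x, a value in (0, 1]."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def risk_score(age, temperature_c, travelled, chronic):
    """Return a crisp risk score in [0, 1] from the four inputs."""
    # Fuzzification (hypothetical centres and widths).
    age_high = gaussian(min(age, 80.0), 80.0, 20.0)
    fever = gaussian(min(temperature_c, 39.5), 39.5, 1.0)
    normal_temp = gaussian(temperature_c, 36.8, 0.6)
    travel = 1.0 if travelled else 0.0
    illness = 1.0 if chronic else 0.0

    # Inference: each rule is (firing strength, output level); AND = min, OR = max.
    rules = [
        (min(fever, travel), 0.9),        # IF fever AND travel history THEN higher risk
        (min(age_high, illness), 0.9),    # IF older AND chronic disease THEN higher risk
        (max(fever, age_high), 0.5),      # IF fever OR older THEN medium risk
        (min(normal_temp, 1.0 - travel, 1.0 - illness), 0.1),  # otherwise lower risk
    ]

    # Defuzzification: weighted average of the rule output levels.
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.5  # no rule fired; fall back to the middle of the scale
    return sum(strength * level for strength, level in rules) / total

def classify(score):
    """Map the crisp score to the three risk levels using the thresholds in the text."""
    if score < 0.25:
        return "lower risk"
    if score > 0.5:
        return "higher risk"
    return "medium risk"

# Invented example patients: (age, body temperature in Celsius, travel history, chronic disease).
patients = [(25, 36.8, False, False), (55, 38.0, False, False), (78, 39.4, True, True)]
labels = [classify(risk_score(*p)) for p in patients]
print(labels)
```

In a trained neuro-fuzzy system the membership parameters and rule weights would be learned from data rather than fixed by hand as they are in this sketch.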

EXPERIMENTAL SETTINGS

The proposed LSRGNFM-LDC technique and the existing methods, namely data-driven LSTM Tomar and Gupta (2020), fuzzy assisted system Adwibowo (2020), and disruptive technologies Abdel-Basset et al. (2020c), are implemented in the Java language for experimental evaluation. The experiment is conducted using the Novel Corona Virus 2019 Dataset taken from Kaggle. The URL of the dataset is given in Kaggle.com (2020). The dataset has daily level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. It is time-series data, and the number of cases on a given day is the cumulative number. The dataset comprises eight files and includes 1608 columns in total: 1502 integer columns, 63 string columns, 18 UUID columns, and 25 other columns. Among the eight files, the COVID19_open_line_list file is used for conducting the experiments. It comprises 44 features and 13,174 instances. The features are ID, age, sex, city, country, province, and so on. Among these features, the relevant features are selected to perform the classification for COVID prediction. The experimental result of the proposed LSRGNFM-LDC technique is compared against three conventional methods, namely data-driven LSTM, fuzzy assisted system, and disruptive technologies. For the experiments, the number of patient data is varied from 100 to 1000. In total, ten runs are carried out for the evaluation of every parameter. With these patient data, the classification is performed to improve the performance of COVID-19 prediction. The attributes of this dataset are described in Table 2.

TABLE 2

Attributes description

Attribute	Description
S. no	Serial number
Observation date	Date of the observation in MM/DD/YYYY
Province/state	Province or state of the observation
Country/region	Country of observation
Last update	Time in UTC at which the row is updated for the given province or country.
Confirmed	Cumulative number of confirmed cases till that date
Deaths	Cumulative number of deaths till that date
Recovered	Cumulative number of recovered cases till that date

The LSRGNFM-LDC technique is compared with the existing techniques on the following parameters: prediction accuracy, prediction time, error rate, and space complexity.

Impact on prediction accuracy

Prediction accuracy is defined as the ratio of the number of patient data whose risk level is correctly predicted by the classification process to the total number of patient data taken. Consequently, the prediction accuracy is formulated as

Prediction_Acc = (number of patient data correctly predicted / n) × 100 (11)

From (11), the prediction accuracy is calculated. The prediction accuracy is measured as a percentage (%). Table 3 reports the prediction accuracy for different numbers of patient data. To determine the prediction accuracy, the proposed technique and the three existing methods Tomar and Gupta (2020), Adwibowo (2020), and Abdel-Basset et al. (2020a) are implemented in the Java language with the Novel Corona Virus 2019 Dataset. When the number of patient data is 300, the proposed LSRGNFM-LDC technique correctly predicts 280 patient data, whereas data-driven LSTM Tomar and Gupta (2020), fuzzy assisted system Adwibowo (2020), and disruptive technologies Abdel-Basset et al. (2020c) correctly predict 258, 269, and 264 patient data respectively. Therefore, the prediction accuracy attained by the proposed LSRGNFM-LDC technique is 93%, and the prediction accuracy attained by data-driven LSTM Tomar and Gupta (2020), fuzzy assisted system Adwibowo (2020), and disruptive technologies Abdel-Basset et al. (2020c) is 86%, 90%, and 88% respectively. From the result, the prediction accuracy of the proposed LSRGNFM-LDC technique is higher than that of the other works by Tomar and Gupta (2020), Adwibowo (2020), and Abdel-Basset et al. (2020a).
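As a quick check of the arithmetic in this paragraph, the following Python sketch recomputes the accuracy for the 300-record case from the counts of correctly predicted patient data quoted above. The function name `prediction_accuracy` and the rounding to whole percentages are assumptions made for illustration.

```python
def prediction_accuracy(correct, total):
    """Percentage of patient data whose risk level was predicted correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# Correctly predicted patient data out of 300, as quoted in the text.
correct_counts = {
    "LSRGNFM-LDC": 280,
    "Data-driven LSTM": 258,
    "Fuzzy assisted system": 269,
    "Disruptive technologies": 264,
}

accuracy = {name: round(prediction_accuracy(count, 300))
            for name, count in correct_counts.items()}
for name, value in accuracy.items():
    print(f"{name}: {value}%")
```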

TABLE 3

Tabulation for prediction accuracy

Number of patient data	Data‐driven LSTM (%)	Fuzzy assisted system (%)	Disruptive technologies (%)	LSRGNFM‐LDC technique (%)
100	80	85	83	94
200	85	90	87	95
300	86	90	88	93
400	88	91	90	95
500	89	92	91	96
600	88	90	89	95
700	87	89	88	96
800	89	90	89	96
900	89	91	90	95
1000	91	91	92	95

Abbreviations: LSRGNFM‐LDC, least square regressive Gaussian neuro‐fuzzy multi‐layered data classification; LSTM, long short‐term memory.

The proposed LSRGNFM-LDC technique performs accurate COVID prediction as the number of patient data increases. The LSRGNFM-LDC technique improves the prediction accuracy by selecting the most relevant features using Deming regression, which finds the line of best fit for the relevant feature selection. Then, the Gaussian neuro-fuzzy classifier classifies the patient data by risk level through fuzzy if-then rules. The patient data is categorized into three types, namely lower risk, medium risk, and higher risk, for performing the COVID prediction. This helps the LSRGNFM-LDC technique to improve the prediction accuracy. Therefore, the prediction accuracy of the proposed LSRGNFM-LDC technique is improved by 9% when compared to Tomar and Gupta (2020), 5% when compared to Adwibowo (2020), and 7% when compared to Abdel-Basset et al. (2020a).
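One way such average improvements could be derived from Table 3 is sketched below in Python. The averaging method, a mean of the per-row relative gains, is an assumption, since the text does not state how the percentages were obtained, so the computed values need not match the rounded figures quoted above exactly. The function name `mean_relative_gain` is likewise hypothetical.

```python
# Prediction accuracy (%) from Table 3 for 100, 200, ..., 1000 patient data.
table3 = {
    "Data-driven LSTM": [80, 85, 86, 88, 89, 88, 87, 89, 89, 91],
    "Fuzzy assisted system": [85, 90, 90, 91, 92, 90, 89, 90, 91, 91],
    "Disruptive technologies": [83, 87, 88, 90, 91, 89, 88, 89, 90, 92],
}
proposed = [94, 95, 93, 95, 96, 95, 96, 96, 95, 95]  # LSRGNFM-LDC technique

def mean_relative_gain(baseline, ours):
    """Average, over the rows, of the percentage gain of `ours` over `baseline`."""
    gains = [100.0 * (o - b) / b for b, o in zip(baseline, ours)]
    return sum(gains) / len(gains)

improvement = {name: mean_relative_gain(values, proposed)
               for name, values in table3.items()}
for name, gain in improvement.items():
    print(f"vs {name}: {gain:.2f}% higher accuracy on average")
```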

Impact on prediction time

Prediction time is defined as the amount of time consumed for classifying the patient data as lower risk, medium risk, or higher risk. It is the product of the number of patient data and the amount of time consumed for predicting one patient data. Therefore, the prediction time is calculated as

$$\text{Prediction time} = \text{Number of patient data} \times \text{Time}(\text{predicting one patient data}) \qquad (12)$$

From (12), the prediction time is determined and measured in milliseconds (ms). The obtained experimental values are plotted in Figure 5.

FIGURE 5

Measurement of prediction time

Figure 5 portrays the prediction time of four methods, namely the data‐driven LSTM of Tomar and Gupta (2020), the Fuzzy assisted system of Adwibowo (2020), the Disruptive technologies of Abdel‐Basset et al. (2020c), and the LSRGNFM‐LDC technique, for different numbers of patient data. To determine the COVID prediction time, the proposed technique and the three existing methods of Tomar and Gupta (2020), Adwibowo (2020), and Abdel‐Basset et al. (2020a) are implemented in the Java language on the Novel Corona Virus 2019 Dataset. When the number of patient data is 900, the proposed LSRGNFM‐LDC technique consumes 14.85 ms of prediction time, whereas the data‐driven LSTM of Tomar and Gupta (2020), the Fuzzy assisted system of Adwibowo (2020), and the Disruptive technologies of Abdel‐Basset et al. (2020a) consume 26.1, 23.4, and 20.9 ms, respectively. The prediction time of the proposed LSRGNFM‐LDC technique is therefore clearly lower than that of the works by Tomar and Gupta (2020), Adwibowo (2020), and Abdel‐Basset et al. (2020a). As shown in Figure 5, the prediction time of all the methods increases as the number of patient data increases. In the figure, the green and violet bars denote the prediction time of the Fuzzy assisted system and the Disruptive technologies, and the remaining two bars denote the data‐driven LSTM and the LSRGNFM‐LDC technique. The proposed LSRGNFM‐LDC technique achieves faster COVID prediction as the number of patient data increases. It significantly decreases the prediction time by applying the DLSRFS process, which selects the most relevant features by determining the line of best fit.
After that, the Gaussian neuro‐fuzzy classifier in the LSRGNFM‐LDC technique performs the data classification with the help of fuzzy if‐then rules, categorizing the patient data as lower risk level, medium risk level, or higher risk level. In this way, the LSRGNFM‐LDC technique minimizes the prediction time. As a result, the prediction time of the proposed LSRGNFM‐LDC technique is reduced by 41% when compared to Tomar and Gupta (2020), 34% when compared to Adwibowo (2020), Kolhar et al. (2020), and 25% when compared to Abdel‐Basset et al. (2020a).
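To make the product definition of prediction time concrete, the following Python sketch recovers the per-record time implied by the 900-record figures and the relative reduction at that single data size. The helper names `prediction_time` and `percent_reduction` are illustrative assumptions; the millisecond values are those quoted in the text.

```python
import math


def prediction_time(num_records: int, time_per_record_ms: float) -> float:
    """Total prediction time: number of records times time per record."""
    return num_records * time_per_record_ms


def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of the proposed value against a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Prediction times (ms) quoted in the text for 900 patient data.
baseline_ms_at_900 = {
    "Data-driven LSTM": 26.1,
    "Fuzzy assisted system": 23.4,
    "Disruptive technologies": 20.9,
}
proposed_ms_at_900 = 14.85

# Per-record time implied by the product definition (about 0.0165 ms).
per_record_ms = proposed_ms_at_900 / 900
assert math.isclose(prediction_time(900, per_record_ms), proposed_ms_at_900)

reductions = {
    name: percent_reduction(ms, proposed_ms_at_900)
    for name, ms in baseline_ms_at_900.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% less prediction time at 900 records")
```

These are single-point reductions at 900 records, so they need not equal the overall 41, 34, and 25% figures, which summarize all data sizes.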

Impact on the error rate

The error rate is defined as the ratio of the number of patient data that are incorrectly predicted by the classification process to the total number of patient data taken. As a result, the error rate is determined as

$$\text{Error rate} = \frac{\text{Number of patient data incorrectly predicted}}{\text{Total number of patient data}} \times 100 \qquad (13)$$

From (13), the error rate is determined and measured as a percentage (%). The error rates of the four methods, namely the data‐driven LSTM of Tomar and Gupta (2020), the fuzzy assisted system of Adwibowo (2020), the disruptive technologies of Abdel‐Basset et al. (2020c), and the LSRGNFM‐LDC technique, are compared in Table 4. For a better assessment, the number of patient data taken as input ranges over 100, 200, 300, …, 1000. For the same number of input data, four error rate results, one per method, are obtained as shown in Table 4. Consider the case of 100 patient data. Among the 100 patient data, six are incorrectly predicted by the LSRGNFM‐LDC technique, whereas 20, 15, and 17 patient data are incorrectly predicted by the data‐driven LSTM of Tomar and Gupta (2020), the fuzzy assisted system of Adwibowo (2020), and the disruptive technologies of Abdel‐Basset et al. (2020c), respectively. Therefore, the error rate obtained by the LSRGNFM‐LDC technique, the data‐driven LSTM of Tomar and Gupta (2020), the fuzzy assisted system of Adwibowo (2020), and the disruptive technologies of Abdel‐Basset et al. (2020a) is 6, 20, 15, and 17%, respectively. This comparison confirms that the proposed technique attains a lower error rate than the existing methods.

TABLE 4

Tabulation for error rate

Number of patient data	Data‐driven LSTM (%)	Fuzzy assisted system (%)	Disruptive technologies (%)	LSRGNFM‐LDC technique (%)
100	20	15	17	6
200	15	10	13	5
300	14	10	12	7
400	13	9	10	5
500	11	8	9	4
600	12	10	11	5
700	13	11	12	4
800	11	10	11	4
900	11	9	10	5
1000	9	9	8	5

Abbreviations: LSRGNFM‐LDC, least square regressive Gaussian neuro‐fuzzy multi‐layered data classification; LSTM, long short‐term memory.

Table 4 presents the error rates of the four prediction techniques with respect to the number of patient data collected from the COVID database. As shown in Table 4, the error rate of every method varies with the number of patient data, and for every data size the error rate of the LSRGNFM‐LDC technique is lower than that of the existing works. The reason for this improvement is the application of feature selection and classification techniques for predicting the COVID patient data. The LSRGNFM‐LDC technique uses Deming regression and a neuro‐fuzzy classifier for performing the prediction process. The DLSRFS process selects the most relevant features for disease prediction by discovering the line of best fit. After that, the neuro‐fuzzy classifier categorizes the patient data by risk level through fuzzy if‐then rules, which classify the patient data into lower risk, medium risk, and higher risk. In this manner, the proposed technique reduces the error rate during the COVID prediction. The comparative analysis indicates that the error rate is reduced by 60, 50, and 55% when compared to the three existing classification algorithms of Tomar and Gupta (2020), Adwibowo (2020), and Abdel‐Basset et al. (2020a), respectively.
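The reported error rate reductions can be reproduced from Table 4 with a short Python sketch. It assumes, since the text does not say, that the reduction is computed for each row and then averaged over the ten data sizes; `mean_relative_reduction` is a hypothetical helper name.

```python
from statistics import mean

# Error rate (%) from Table 4, for 100, 200, ..., 1000 patient data.
baseline_error = {
    "Data-driven LSTM":        [20, 15, 14, 13, 11, 12, 13, 11, 11, 9],
    "Fuzzy assisted system":   [15, 10, 10, 9, 8, 10, 11, 10, 9, 9],
    "Disruptive technologies": [17, 13, 12, 10, 9, 11, 12, 11, 10, 8],
}
proposed_error = [6, 5, 7, 5, 4, 5, 4, 4, 5, 5]


def mean_relative_reduction(baseline, proposed):
    """Average per-row reduction of the error rate against a baseline, in percent.

    Assumption: row-wise averaging; the source does not spell out its rule.
    """
    return mean(100.0 * (b - p) / b for b, p in zip(baseline, proposed))


error_reductions = {
    name: mean_relative_reduction(rates, proposed_error)
    for name, rates in baseline_error.items()
}

for name, value in error_reductions.items():
    print(f"{name}: error rate reduced by {value:.1f}% on average")
```

Rounded to the nearest integer, the three averages are 60, 50, and 55%, matching the values stated in the comparative analysis.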

Impact on space complexity

Space complexity is defined as the amount of memory space required for performing COVID prediction. It is the product of the number of patient data and the space required for storing one patient data. It is given by

$$\text{Space complexity} = \text{Number of patient data} \times \text{Memory}(\text{storing one patient data}) \qquad (14)$$

From (14), the space complexity is computed and measured in megabytes (MB). The obtained space complexity values are plotted in Figure 6.

FIGURE 6

Measurement of space complexity

Figure 6 illustrates the space complexity of the four prediction techniques with respect to the number of patient data taken from the COVID database. The number of patient data considered as input ranges over 100, 200, 300, …, 1000. Consider the case of 500 patient data. With the LSRGNFM‐LDC technique, the memory space consumed by one patient data is 0.0185 MB, whereas 0.031, 0.028, and 0.0416 MB of memory space are consumed for one patient data prediction by the data‐driven LSTM of Tomar and Gupta (2020), the fuzzy assisted system of Adwibowo (2020), and the disruptive technologies of Abdel‐Basset et al. (2020c), respectively. Therefore, the space complexity attained by the LSRGNFM‐LDC technique, the data‐driven LSTM of Tomar and Gupta (2020), the fuzzy assisted system of Adwibowo (2020), and the disruptive technologies of Abdel‐Basset et al. (2020c) is 10.5, 23.5, 18, and 20.8 MB, respectively. In the figure, the green and violet bars denote the space complexity of the fuzzy assisted system and the disruptive technologies, and the remaining two bars denote the data‐driven LSTM and the LSRGNFM‐LDC technique. From the obtained values, the space complexity of the LSRGNFM‐LDC technique is lower than that of the conventional works. This is because the feature selection process is applied during the COVID prediction. The LSRGNFM‐LDC technique uses the DLSRFS process to eliminate the irrelevant features and to choose the most relevant features by finding the line of best fit, which minimizes the space consumption during the COVID prediction.
The result analysis shows that space complexity is reduced by 57, 42, and 51% when compared to the three existing classification algorithms of Tomar and Gupta (2020), Adwibowo (2020), and Abdel‐Basset et al. (2020c), respectively.
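The product definition of space complexity can be illustrated in Python. The sketch first applies it to the per-record and total figures quoted for the disruptive technologies method at 500 patient data, then computes the relative saving of the proposed technique from the totals quoted at that size. The helper names are illustrative assumptions.

```python
import math


def space_complexity_mb(num_records: int, mb_per_record: float) -> float:
    """Total memory: number of records times memory per record (MB)."""
    return num_records * mb_per_record


def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of the proposed value against a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Disruptive technologies at 500 records: 0.0416 MB per record, 20.8 MB in total.
disruptive_total_mb = space_complexity_mb(500, 0.0416)
assert math.isclose(disruptive_total_mb, 20.8)

# Total space (MB) quoted in the text for 500 patient data.
baseline_mb_at_500 = {
    "Data-driven LSTM": 23.5,
    "Fuzzy assisted system": 18.0,
    "Disruptive technologies": 20.8,
}
proposed_mb_at_500 = 10.5

space_savings = {
    name: percent_reduction(mb, proposed_mb_at_500)
    for name, mb in baseline_mb_at_500.items()
}
for name, value in space_savings.items():
    print(f"{name}: {value:.1f}% less memory at 500 records")
```

These savings describe the 500-record case only; the 57, 42, and 51% figures summarize all data sizes.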

CONCLUSION AND FUTURE SCOPE

A new technique termed LSRGNFM‐LDC is introduced for performing COVID prediction with better accuracy and lower time consumption. Preprocessing is applied to each input feature to remove the irrelevant data and thereby reduce the time consumption. The DLSRFS process in the LSRGNFM‐LDC technique selects the most relevant features by identifying the line of best fit. The neuro‐fuzzy classifier then uses fuzzy if‐then rules to perform the prediction, classifying the patient data into lower risk, medium risk, and higher risk with higher accuracy and lower time consumption. An extensive experimental evaluation is performed with the COVID database. The quantitative results show higher prediction accuracy and lower prediction time, error rate, and space complexity when compared to other related works. In future, the proposed work will be extended with the Guided Whale Optimization Algorithm (Guided WOA) based on Stochastic Fractal Search to select the relevant features, and a weakly supervised deep learning model will be applied to classify the patient data.
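For readers unfamiliar with the line-of-best-fit step, the following Python sketch shows a textbook Deming regression fit for a single feature against a target. It is not the authors' DLSRFS implementation: the function name `deming_fit`, the default variance ratio `delta = 1.0` (orthogonal regression), and the toy data are all assumptions, and how the fitted line is turned into a feature relevance score is not specified here.

```python
import math
from statistics import mean


def deming_fit(x, y, delta=1.0):
    """Fit y = intercept + slope * x when both x and y carry measurement error.

    delta is the assumed ratio of the y-error variance to the x-error variance;
    delta = 1.0 corresponds to orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v + 1.0 for v in x]
slope, intercept = deming_fit(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Unlike ordinary least squares, this fit treats errors in both variables, which is the property that distinguishes Deming regression as a basis for the line of best fit.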

CONFLICT OF INTEREST

The authors declare there is no conflict of interest.
