
A novel hybrid fuzzy time series model for prediction of COVID-19 infected cases and deaths in India.

Abstract

The world is under stress from the unforeseen COVID-19 pandemic. The daily growth in confirmed COVID-19 cases puts people everywhere at high risk and places pressure on health professionals to bring the outbreak under control as soon as possible. It is therefore necessary to predict the number of upcoming cases in order to prepare plans of action and medical set-ups. This manuscript proposes a hybrid fuzzy time series model for predicting upcoming COVID-19 infected cases and deaths in India using a modified fuzzy C-means clustering technique. The proposed model has two phases. In phase I, the modified fuzzy C-means clustering technique forms basic intervals from the cluster centroids; in phase II, these intervals are upgraded into sub-intervals. The model is tested against available COVID-19 data, and its performance is measured by mean square error, root mean square error and average forecasting error rate. The novelty of the proposed model lies in predicting COVID-19 infected cases and deaths for the next 31 days. In addition, the approximate number of isolation beds and ICUs required is estimated. The projections of the model are intended to give decision makers a basis for protection plans during the COVID-19 pandemic.

Keywords: COVID-19; Clustering; Fuzzy C-means; Fuzzy time series; Pandemic

Year: 2021 PMID: 34253340 PMCID: PMC8259256 DOI: 10.1016/j.isatra.2021.07.003

Source DB: PubMed Journal: ISA Trans ISSN: 0019-0578 Impact factor: 5.911

Introduction

Currently, a novel virus from the coronavirus family, named COVID-19 by the World Health Organization (WHO), has spread all over the world. The number of COVID-19 infections has grown rapidly in most countries from December 2019 onward, which is why WHO declared it a pandemic. The spread of the virus is assumed to have originated in Wuhan, a city in China, in mid-December 2019. The first case of COVID-19 in India was reported on 30 January 2020. After that, COVID-19 spread rapidly to almost all parts of India within a few months, and the number of infected cases and deaths per month increased accordingly, a drastic change in the growth of the novel coronavirus. Unfortunately, a vaccine for the prevention of COVID-19 has not yet been developed in India or the rest of the world. Thus, the mobility rate of COVID-19 is at a peak in India. Hence, it becomes necessary to predict the number of upcoming COVID-19 infected cases and deaths so that plans of action and medical set-ups can be prepared in advance.

For forecasting purposes, several mathematical models have been developed in the past few months. Ded and Majumdar [1] proposed a time series model for analysing incidence trends and also obtained the reproductive number of COVID-19. Mandal et al. [2] analysed COVID-19 in India through restrictions on travel from other countries and the impact of quarantine, using a simple mathematical model. Bhardwaj [3] proposed a regression-based predictive model for the total number of infections due to COVID-19 at the end of the outbreak. Bhola et al. [4] discussed a predictive model for the future projection of this pandemic in India. Chatterjee et al. [5] proposed a compartmental SEIR model for assessing the impact on healthcare and the number of infections due to the COVID-19 epidemic in India. 
Chakarborty and Ghosh [6] built a real-time forecasting method for predicting future COVID-19 cases and obtained case fatality rates for different countries. Roy and Kar [7] studied the impact of natural factors such as summer and humidity on COVID-19 by analysing the data available at the time. Mondal and Ghosh [8] analysed the exponential growth of COVID-19 cases in India relative to other countries and sketched its future course. Mandal et al. [9] proposed a mathematical model that introduces a quarantine class and other government measures to control the spread of COVID-19, and forecast its trends in three states of India. Marimuthu et al. [10] estimated the number of infected cases of COVID-19 in Delhi using an SEIR model. Scheiner et al. [11] developed a mathematical model called the death kinetic law using an SEIR model and compared it with another approach, the infection-to-death delay rule. El Koufi et al. [12] took an SIR epidemic model with non-linear incidence and vertical rates from a deterministic to a stochastic framework. Al-Qaness et al. [13] proposed a forecasting model to estimate confirmed cases of corona in China based on previous cases. Lalwani et al. [14] proposed a 3-phase SIRD model for determining the lockdown period for highly affected regions and predicted the lockdown period for India and Italy. Patrikar et al. [15] presented a modified SEIR model to forecast the estimated cases of novel coronavirus in India. Salgotra et al. [16] developed a genetic programming-based prediction model to estimate confirmed cases and deaths of COVID-19 across three states as well as India as a whole. Pai et al. [17] predicted the number of active cases of COVID-19 in India by considering the impact of lockdown along with the inflation of active cases. Kuniya [18] applied the SEIR compartmental model to predict the epidemic peak of COVID-19 in Japan. Tiwari et al. [19] built a time series forecasting model to predict the COVID-19 epidemic in India. Tomar and Gupta [20] proposed a data-driven model, predicted the active cases in India for the upcoming days and measured the impact of lockdown on COVID-19. Krishna and Prakash [21] presented a model to estimate the mobility of COVID-19. Sarkar et al. [22] presented the SARIIqSq model to forecast the scenario of COVID-19 in India. Malavika et al. [23] presented short-term predictions using a logistic growth model and also used an SIR model to forecast the peak time, active cases and the effect of lockdown in India. Kaushik et al. [24] reviewed different terminology related to COVID-19 in India, such as clinical presentation, treatment and virology. Ambikapathy and Krishnamurthy [25] proposed a model for assessing the impact of lockdown on the spread of COVID-19 in India. Acharya and Porwal [26] presented an ecological study that uses a percentile ranking method to obtain domain-specific and overall vulnerability, and reported the number of active cases of coronavirus in 9 districts of India. Cooper et al. [27] proposed an SIR model to examine the effectiveness of COVID-19 spreading in the community. Singhal et al. [28] proposed two mathematical models, with and without parameters, for investigating the trend of COVID-19 and gave some predictions for the upcoming days. Ranjan [29] analysed the outbreak of COVID-19 in India using an epidemiological model for long-term and short-term predictions. Poddar et al. [30] proposed a model to study the spreading rate, death rate and recovery rate of COVID-19 and its prediction in India. Vasantha and Patil [31] presented an overview of the development of different models of COVID-19 in India. Pandey et al. [32] used regression and an SEIR model to forecast the outbreak of COVID-19 in India. 
Alkahtani and Alzaid [33] presented a model of COVID-19 based on a fractional differential operator and solved the system of equations with a numerical method using Lagrange polynomials. Arora et al. [34] presented deep learning-based models for forecasting the number of positive cases of COVID-19 for union territories and 32 states of India. Sahoo and Sapra [35] gave a data-driven epidemic model for the analysis of COVID-19 in India based on real COVID-19 data. Çakan [36] presented an SEIR epidemic model that considers the impact of health and analysed the global and local stability of this model. Giri et al. [37] proposed a neural network model that introduces the lockdown condition to show the infection risk of COVID-19.

Clustering is an approach in which grouped data are separated into smaller groups based on similarity measures. In the literature, many researchers have used clustering techniques with fuzzy time series (FTS) for forecasting purposes. Song and Chissom [38], [39] proposed a time series model based on fuzzy set theory. Huarng [40] showed that the accuracy of a forecasting model can be improved by changing the length of the intervals. Li et al. [41] developed a forecasting algorithm for FTS based on fuzzy C-means (FCM) clustering. Qui et al. [42] proposed a high order FTS model based on fuzzy logical relationships (FLR) and automatic clustering. Sang et al. [43] presented an FTS forecasting model based on the IFCM clustering technique. Kumar et al. [44] proposed two distance metrics and developed two clustering algorithms, AMFCM and EMFCM. Zhang et al. [45] developed an FTS forecasting model based on time series clustering and multiple linear regression. However, the FCM clustering technique uses the Euclidean distance (ED), which easily gets stuck in a noisy environment and does not yield good results. In the present manuscript, basic FCM is therefore modified with an exponential function so that it tolerates noisy data before it is used with the FTS technique.

In addition, most researchers use mathematical modelling, such as SEIR and SIR techniques, rather than soft computing techniques for the prediction of COVID-19. Some of these models forecast the effect of COVID-19 in the upcoming days or weeks with a higher error rate. This encouraged us to think outside the box and develop a novel hybrid fuzzy time series model (ANHFTS), based on a modified FCM clustering technique, for predicting COVID-19 infected cases and deaths in India for the coming 31 days. The primary reason for proposing this model is that the approach is more capable than classical techniques and more durable than existing predictive models. It can also predict COVID-19 infected cases and deaths over both the short term and the long term with small error. The proposed model has two phases. In phase I, the modified fuzzy C-means (MFCM) clustering technique forms intervals from the centroids; in phase II, these intervals are upgraded into sub-intervals, which are then used to predict the approximate number of infections and deaths in India.

The main contributions of this manuscript are as follows. A hybrid forecasting model based on the FTS technique is developed. The FCM clustering technique is modified with an exponential function to tolerate noisy data. The model can predict the approximate number of COVID-19 infected cases and deaths for trained and untrained data in India. The approximate number of isolation beds and ICUs required in India up to 31 August 2020 is estimated. The proposed ANHFTS model is considered an unsupervised learning process.

The rest of the manuscript is organized as follows. Basic preliminaries of FCM and FTS are introduced in Section 2. Section 3 presents the notation and MFCM with its necessary conditions. Section 4 describes the proposed model. Section 5 contains the performance measures used to assess the proposed model. Section 6 applies the proposed algorithm to two examples and predicts COVID-19 infected cases and deaths in India for the next 31 days, up to 31 August 2020. Section 7 concludes this work.

Background information

This section reviews the basics of the FCM and FTS techniques used in the proposed model.

Fuzzy C-means clustering technique

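FCM is a well-known soft clustering technique that partitions the historical data so that each data point can belong to more than one cluster with a distinct membership value. Ruspini [46], [47] formed clusters using the concept of fuzzy set theory. Later, Bezdek [48] improved the clustering process following the FCM formulation by Dunn [49]. The main objective of FCM is to minimize the following objective function subject to its necessary conditions:

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2} \qquad (1)$$

$$v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \qquad (2)$$

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)} \right]^{-1} \qquad (3)$$

where $x_i$ is the $i$th data point, $u_{ij}$ represents its membership value in the $j$th cluster, $\lVert \cdot \rVert$ denotes the Euclidean distance, $c$ is the number of clusters and $v_j$ is the $j$th cluster centroid. Eqs. (2), (3) are the necessary conditions that minimize Eq. (1). Here, $m > 1$ is the fuzzifier and $n$ is the number of data points.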
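The alternating use of the two necessary conditions can be illustrated with a minimal sketch of standard FCM. It is not the authors' code: the function name `fcm`, the fuzzifier default `m=2.0`, the stopping tolerance and the random initialisation are all assumptions made for illustration.

```python
import numpy as np

def fcm(data, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard fuzzy C-means; returns (centroids, memberships)."""
    x = np.asarray(data, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat a flat list as n points in 1-D
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)       # memberships of each point sum to 1
    for _ in range(max_iter):
        w = u ** m
        v = (w.T @ x) / w.sum(axis=0)[:, None]            # centroid update
        d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                          # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)      # membership update
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return v, u

# Hypothetical one-dimensional data with two well-separated groups.
centroids, memberships = fcm([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], c=2)
```

Each iteration recomputes the centroids from the current memberships and then the memberships from the new centroids, stopping once the memberships change by less than `tol`.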

Some basic definitions

This section contains the definitions used throughout the manuscript. FTS was first defined by Song and Chissom. The basic definitions of FTS, time-variant and time-invariant FTS, FLR and FLR group (FLRG) are briefly reviewed as follows.

Fuzzy Time Series

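Let $f_i(t)$ $(i = 1, 2, \ldots)$ be a fuzzy set defined on a universe of discourse $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$, a subset of $\mathbb{R}$. Then the set of $f_i(t)$ is denoted by $F(t)$ and is called an FTS defined on $Y(t)$ [38].

First Order Model

Suppose $F(t)$ is formed by $F(t-1)$. Then the fuzzy relation can be expressed as $F(t) = F(t-1) \circ R(t, t-1)$, where $\circ$ represents max–min composition and $R(t, t-1)$ is the fuzzy relationship between $F(t-1)$ and $F(t)$. Then $F(t)$ is called a first order model [39].

Time-variant and Time-invariant

If the relation $R(t, t-1)$ of $F(t)$ does not depend on $t$, that is, $R(t, t-1) = R(t-1, t-2)$ for all $t$, then $F(t)$ is known as a time-invariant FTS; otherwise it is called time-variant [50].

Fuzzy Logical Relationship

If $F(t-1) = I_i$ and $F(t) = I_j$, then the relationship between $F(t-1)$ and $F(t)$ is known as an FLR and can be expressed as $I_i \rightarrow I_j$, where $I_i$ and $I_j$ are the previous and current states of the FLR [51].

Fuzzy Logical Relationship Group

Assume that $I_i \rightarrow I_{j_1}, I_i \rightarrow I_{j_2}, \ldots, I_i \rightarrow I_{j_n}$ are FLRs. Then these FLRs can be grouped to form an FLRG as $I_i \rightarrow I_{j_1}, I_{j_2}, \ldots, I_{j_n}$ [40].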
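The FLR and FLRG definitions can be made concrete with a short sketch that builds first order relationships from a fuzzified sequence and groups them by their previous state. The helper names `build_flrs` and `build_flrgs` and the label sequence are hypothetical, chosen only to illustrate the definitions.

```python
def build_flrs(labels):
    """First order FLRs: (previous state, current state) for consecutive labels."""
    return list(zip(labels[:-1], labels[1:]))

def build_flrgs(flrs):
    """Group FLRs by previous state, listing each current state once."""
    groups = {}
    for prev, curr in flrs:
        rhs = groups.setdefault(prev, [])
        if curr not in rhs:
            rhs.append(curr)
    return groups

# Hypothetical fuzzified series: one linguistic label per time step.
labels = ["I1", "I2", "I2", "I3", "I2", "I3"]
flrs = build_flrs(labels)
flrgs = build_flrgs(flrs)

for prev, rhs in flrgs.items():
    print(prev, "->", ", ".join(rhs))
```

Here the repeated relationship from `I2` to `I3` appears only once in the group of `I2`, which is the usual convention when FLRs are collected into groups.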

Problem formulation and modified fuzzy c-means

Notation

The various notations used throughout this article are as:

Problem statement

At present, the whole of India faces a pandemic in the form of COVID-19, and the number of confirmed cases is growing rapidly day by day. It is therefore crucial to predict the number of upcoming infected cases and deaths in India, so that health professionals can prepare for the situation and keep the COVID-19 pandemic under control. To address this problem, a novel hybrid predictive model has been developed using fuzzy time series based on MFCM.

Modified fuzzy C-means

Among clustering techniques, basic FCM [48] is frequently used by researchers because it is easy to implement. In a noisy environment, however, it easily gets stuck and does not produce the desired output. To overcome this problem, basic FCM is modified by introducing a negative exponential variable for better robustness; the resulting objective function of MFCM is Eq. (4), whose terms are defined using the dimension of the data and the individual entries of each data point. The necessary conditions for the minimization of Eq. (4) are derived in the following sub-sections.

Derivation for cluster centroid

One necessary condition for minimizing the objective function concerns the cluster centroids. It is derived first for the first cluster centroid: Eq. (5) is differentiated with respect to that centroid while all other variables are held constant. The same steps apply to any other cluster, which gives the general form of the cluster centroid for MFCM, Eq. (6).

Derivation for membership value

The other necessary condition for the minimization of Eq. (4) concerns the membership values. The Lagrangian function for Eq. (4) is formed as Eq. (7). To obtain the necessary condition, Eq. (7) is differentiated with respect to the membership value while the other variables are held constant, which gives Eq. (8). Differentiating again, this time with respect to the Lagrange multiplier, gives Eq. (9). Combining Eqs. (8) and (9) yields the value of the multiplier, and substituting it back gives the general form of the membership value.

Description of the proposed ANHFTS model MFCM based approach

Cases of COVID-19 are growing rapidly in India, which makes it difficult to obtain information on the approximate number of infected cases and deaths. The main concerns of the government are how to control the disease and how to predict upcoming confirmed cases of COVID-19, so that public health and economic decisions can be prepared in advance on the basis of a mathematical model. The present manuscript proposes a novel model that predicts COVID-19 infected cases and deaths using the FTS technique based on MFCM clustering, to support advance preparation during the ongoing pandemic. The model predicts the cases in two phases:

Phase I: Form basic intervals from the cluster centroids obtained by the MFCM clustering technique.

Phase II: Upgrade the basic intervals into sub-intervals to forecast more accurately, and predict the upcoming infected cases and deaths in India.

Phase I. In this phase, basic intervals are formed from the cluster centroids obtained during MFCM clustering. To start MFCM, the number of clusters must first be known. To obtain it, the universe of discourse is defined as $X = [D_{\min} - D_1, D_{\max} + D_2]$, where $D_1, D_2$ are randomly chosen positive numbers and $D_{\min}, D_{\max}$ are the minimum and maximum values of the collected historical data set, and $X$ is partitioned into intervals of equal length. Instead of applying basic FCM to obtain the centroids, FCM is modified to get better results. With MFCM, the centroids are obtained after assigning the membership values randomly. The basic intervals are then formed from the obtained centroids. Phase I is set out step by step as algorithm A.

Phase II. In this phase, the basic intervals are upgraded by the FTS technique to forecast more accurate values. First, the number of elements of the historical data that belong to each basic interval is obtained. Each basic interval is then partitioned into that many sub-intervals of equal length, and the process is repeated until all basic intervals have been partitioned. Next, only those sub-intervals that contain historical data are selected. A linguistic variable is defined for each selected sub-interval and allocated to the historical data to fuzzify it. When the FLR is non-empty, the fuzzified time series is defuzzified with Eq. (11), which uses the mid-points of the upgraded intervals; in it, n is the total number of sub-intervals, t represents time, and each sub-interval enters through its mid-point. If the FLR is empty (untrained data), the predicted values are obtained with Eq. (12), which relates the current predicted value at time t to the previous predicted values; here r is the average rate of increment and decrement over all forecasted values obtained by Eq. (11), and h is a small number between 0 and 0.5 that damps the effect of large increments or decrements in r. Phase II is set out step by step as algorithm B. Fig. 1 shows the flow chart of the proposed ANHFTS model, containing algorithm A and algorithm B.
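The sub-interval step of Phase II can be sketched as follows, assuming (as the equal widths in Table 1 suggest) that a basic interval is split into as many equal-length sub-intervals as it holds data points and that only occupied sub-intervals are kept. The function name `subdivide`, the dictionary layout and the boundary handling are illustrative choices, not part of algorithm B as published.

```python
def subdivide(lower, upper, values):
    """Split [lower, upper] into k equal parts, k = number of values inside,
    and keep only the parts that contain at least one value."""
    inside = [v for v in values if lower <= v <= upper]
    k = len(inside)
    if k == 0:
        return []
    width = (upper - lower) / k
    selected = []
    for j in range(k):
        lo = lower + j * width
        hi = lower + (j + 1) * width
        last = j == k - 1
        # Half-open parts, except that the last part also includes its upper end.
        members = [v for v in inside if lo <= v < hi or (last and v == upper)]
        if members:
            selected.append({"interval": (lo, hi),
                             "members": members,
                             "mid": (lo + hi) / 2})
    return selected

# First basic interval for April 2020 as implied by rows b1 to b4 of Table 1.
rows = subdivide(1531.3185, 4069.0947, [2059, 2545, 3105, 3684])
for row in rows:
    print(row["interval"], row["members"], round(row["mid"], 4))
```

Under the same assumption, calling `subdivide(15735.8339, 21446.1679, [17305, 18544, 20081, 21373])` keeps only two of the four parts, and their mid-points agree with the values listed for b19 and b20 in Table 1.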

Fig. 1

Flow chart of proposed ANHFTS model.

Performance measure

The performance of the proposed ANHFTS model is evaluated with three measures: mean square error (MSE), root mean square error (RMSE) and average forecasting error rate (AFER).

Mean square error

The mean square error is the average of the squared differences between forecasted and actual values [52]. It is formulated in Eq. (13); the lower its value, the better the forecast.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2 \qquad (13)$$

where $A_i$ and $F_i$ are the actual and forecasted values and n is the total number of data points.

Root mean square error

Root mean square error measures how much the forecasted values differ from the actual values [53]. A small RMSE indicates better forecasting; it is calculated using Eq. (14).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2} \qquad (14)$$

where $A_i$ and $F_i$ are the actual and forecasted values and n is the total number of data points.

Average forecasting error rate

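Average forecasting error rate [54] is the percentage error that reflects the absolute difference between the actual and forecasted values at each point of time, and it is defined in Eq. (15).

$$\mathrm{AFER} = \frac{1}{n} \sum_{i=1}^{n} \frac{\lvert A_i - F_i \rvert}{A_i} \times 100\% \qquad (15)$$

where $A_i$ and $F_i$ are the actual and forecasted values and n is the total number of data points.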
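The three measures can be computed with a few lines of Python. This is a minimal sketch using the usual definitions of MSE, RMSE and AFER; the function names and the sample numbers are made up for illustration and are not data from the study.

```python
import math

def mse(actual, forecast):
    """Mean of squared differences between actual and forecasted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Square root of the mean square error."""
    return math.sqrt(mse(actual, forecast))

def afer(actual, forecast):
    """Average forecasting error rate, in percent (actual values must be non-zero)."""
    total = sum(abs(a - f) / a for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical actual and forecasted values.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mse(actual, forecast), rmse(actual, forecast), afer(actual, forecast))
```

Because RMSE is the square root of MSE, the two rows reported for each month in the result tables can be checked against each other, for example `math.sqrt(191360.0033)` against the April RMSE of 437.4471.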

Experimental results and analysis

This section first shows that MFCM tolerates noisy data. The proposed ANHFTS model is then applied to two examples to assess its performance, and the upcoming COVID-19 infected cases and deaths in India are predicted up to the end of August 2020.

Noisy environment effect

FCM is a well-known clustering technique, but its results are easily affected by a noisy environment, whereas clustering should be insensitive to noise. FCM is therefore modified by introducing a variable in negative exponential form to overcome this problem. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a universe of discourse with $n$ data points defined on $\mathbb{R}$. Minimizing $\sum_{i=1}^{n} (x_i - v)^2$ with respect to $v$ gives the estimate

$$\hat{v} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (16)$$

Let {5, 3, 4.7, 6, 5.6, 5.1, 4.4, 5.3, 7, 4} be an artificial data set [55] to be tested by the least-squares procedure. By Eq. (16), the estimated values of $v$ before and after adding the noisy value 30 to the artificial data set are 5 and 7.0833, which shows that noisy data strongly affect the minimizer. When the same data are applied to MFCM, the minimizer obtained by Eq. (6) is 4.9951 before and 4.9259 after adding the noisy value. Hence, MFCM tolerates a noisy environment.
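The contrast between a least-squares centre and an exponentially weighted one can be tried out with the sketch below. The least-squares estimate is simply the sample mean. The second estimator is only a stand-in: it minimises the sum of 1 - exp(-beta (x - v)^2) by fixed-point iteration, with beta set to the inverse sample variance, which is an assumption and not the MFCM centroid of Eq. (6). With the ten values exactly as listed above, the sample mean works out to 5.01 and, with 30 appended, to about 7.28, so the sketch does not try to reproduce the quoted figures digit for digit.

```python
import math

def least_squares_center(xs):
    """Minimiser of the sum of squared deviations: the sample mean."""
    return sum(xs) / len(xs)

def exp_weighted_center(xs, iters=200):
    """Fixed-point minimiser of sum(1 - exp(-beta * (x - v)**2)).
    beta = 1 / sample variance is an assumed choice for illustration."""
    n = len(xs)
    mean = sum(xs) / n
    beta = n / sum((x - mean) ** 2 for x in xs)
    v = sorted(xs)[n // 2]                  # start near the middle of the data
    for _ in range(iters):
        w = [math.exp(-beta * (x - v) ** 2) for x in xs]
        v = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return v

clean = [5, 3, 4.7, 6, 5.6, 5.1, 4.4, 5.3, 7, 4]
noisy = clean + [30]

print(least_squares_center(clean), least_squares_center(noisy))
print(exp_weighted_center(clean), exp_weighted_center(noisy))
```

The outlier receives an almost zero weight in the exponential estimator, so its centre moves far less than the mean does, which is the qualitative behaviour the section describes.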

Implementation of proposed ANHFTS model

The proposed ANHFTS model has been implemented to predict future COVID-19 infected cases and deaths in India. Before making predictions, the model is tested against the available data on infected cases and deaths due to COVID-19 from April to July 2020 to check its efficiency. For predicting the number of infected cases of COVID-19 in India, the data from 1 April to 30 July 2020 are considered. These epidemic data were taken from a Government authorized portal [56]. For simplicity, the data are divided into four monthly groups (April, May, June and July), and the proposed ANHFTS model is tested on them month by month.

First, the ANHFTS model is applied to the COVID-19 data for April 2020. Let X be the universe of discourse containing the number of infected persons for April 2020. The minimum and maximum values of the data set are 2059 and 34 866, denoted by $D_{\min}$ and $D_{\max}$ respectively. For simplicity, the values of $D_1$ and $D_2$ are randomly taken as 59 and 2134 respectively, so the universe of discourse is $X = [2000, 37\,000]$. X is partitioned into a randomly chosen number of 7 equal-length intervals, [2000, 7000], [7000, 12 000], ..., [32 000, 37 000], so 7 clusters are formed according to the present model. The membership values are then initialized and the average of the data is obtained as 14975.1. Steps 5 and 6 of algorithm A are applied until the termination condition is satisfied; this iterative MFCM process updates the membership values and the centroids successively. After the termination condition is satisfied, the centroids are arranged in increasing order, and step 8 of algorithm A gives the basic intervals. The first basic interval, which Table 1 shows spanning 1531.3185 to 4069.0947, contains four elements: 2059, 2545, 3105 and 3684. This interval is therefore partitioned into 4 sub-intervals using step 1 of algorithm B. The same procedure is repeated for the remaining intervals, and the sub-intervals that contain the given data are selected. The results are shown in Table 1.
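The set-up for April 2020 can be reproduced numerically with a short sketch, assuming the universe of discourse is built in the usual FTS way, from the data minimum minus one margin to the data maximum plus another. The function names and the parameter names `d1` and `d2` are illustrative.

```python
def universe_of_discourse(values, d1, d2):
    """Lower and upper bounds: data minimum minus d1, data maximum plus d2."""
    return min(values) - d1, max(values) + d2

def equal_partition(lower, upper, k):
    """Split [lower, upper] into k intervals of equal length."""
    width = (upper - lower) / k
    return [(lower + i * width, lower + (i + 1) * width) for i in range(k)]

# April 2020: minimum 2059, maximum 34866, margins 59 and 2134, 7 intervals.
lower, upper = universe_of_discourse([2059, 34866], d1=59, d2=2134)
intervals = equal_partition(lower, upper, 7)
print(lower, upper)
print(intervals)
```

With these inputs the universe runs from 2000 to 37000 and each of the seven intervals has length 5000; the number of intervals then fixes the number of clusters passed to MFCM.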

Table 1

Selected sub-intervals with their mid-points for April 2020.

Variables	Sub-intervals	Corresponding elements	Mid-points
b1	[1531.3185, 2165.7626]	2059	1848.5405
b2	[2165.7626, 2800.2066]	2545	2482.9846
b3	[2800.2066, 3434.6507]	3105	3117.4286
b4	[3434.6507, 4069.0947]	3684	3751.8727
b5	[4069.0947, 4660.3387]	4293	4364.7167
b6	[4660.3387, 5251.5828]	4777	4955.9608
b7	[5251.5828, 5842.8268]	5350	5547.2048
b8	[5842.8268, 6434.0709]	5915	6138.4488
b9	[6434.0709, 7025.3149]	6728	6729.6929
b10	[7025.3149, 7986.8055]	7599	7506.0602
b11	[7986.8055, 8948.2961]	8453	8467.5508
b12	[8948.2961, 9909.7866]	9211	9429.0413
b13	[9909.7866, 10871.2772]	10 454	10 390.5319
b14	[10871.2772, 11844.1885]	11 485	11 387.7329
b15	[11844.1885, 12817.0999]	12 371	12 330.6442
b16	[12817.0999, 13790.0112]	13 432	13 303.5556
b17	[13790.0112, 14762.9226]	14 354	14 276.4669
b18	[14762.9226, 15735.8339]	15 725	15 249.3782
b19	[15735.8339, 18591.0009]	17 305, 18 544	17 877.2092
b20	[18591.0009, 21446.1679]	20 081, 21 373	20 732.3762
b21	[21446.1679, 23115.8187]	23 040	22 280.9933
b22	[23115.8187, 24785.4696]	24 448	23 950.6441
b23	[24785.4696, 26455.1204]	26 283	25 620.2950
b24	[26455.1204, 28124.7712]	27 890	27 289.9458
b25	[28124.7712, 29961.8690]	29 458	29 043.3201
b26	[29961.8690, 31798.9668]	31 360	30 880.4179
b27	[31798.9668, 33636.0646]	33 065	32 717.5157
b28	[33636.0646, 35473.1624]	34 866	34 554.6135

After applying step 3 of algorithm B, the linguistic variables for the 28 sub-intervals are defined, with each term pairing a sub-interval with its membership value in the corresponding linguistic variable. These linguistic variables are then allocated to the historical data according to the sub-intervals to which the data belong, and the first order FLRs and FLRGs are formed from the resulting sequence. After applying the remaining steps of the proposed ANHFTS model, the forecasted COVID-19 infected cases for April 2020 are obtained together with their linguistic variables, as shown in Table 2.

Table 2

Forecasted infected cases of COVID-19 for the month of April, May, June and July 2020 in India.

Date	April 2020		May 2020		June 2020		July 2020
	Linguistic variable	Forecasted infected cases of COVID-19	Linguistic variable	Forecasted infected cases of COVID-19	Linguistic variable	Forecasted infected cases of COVID-19	Linguistic variable	Forecasted infected cases of COVID-19
1	I1	2021	I1	37 471	I1	198 485	I1	609 538
2	I2	2399	I2	39 560	I2	205 184	I2	626 775
3	I3	3052	I3	42 829	I3	215 456	I3	652 376
4	I4	3694	I4	46 155	I4	225 730	I4	675 653
5	I5	4317	I5	49 544	I5	236 013	I5	696 495
6	I6	4920	I6	53 001	I6	246 307	I6	716 497
7	I7	5516	I7	56 603	I7	256 604	I7	737 468
8	I8	6110	I8	60 366	I8	266 959	I8	761 458
9	I9	6742	I9	64 105	I9	277 496	I9	788 599
10	I10	7503	I10	67 583	I10	288 226	I10	817 035
11	I11	8413	I11	70 780	I11	299 023	I11	846 076
12	I12	9380	I12	73 879	I12	309 907	I12	875 750
13	I13	10 347	I13	77 148	I13	321 065	I13	905 893
14	I14	11 318	I14	80 959	I14	332 511	I14	936 820
15	I15	12 292	I15	85 348	I15	344 057	I15	968 566
16	I16	13 268	I16	90 050	I16	355 730	I16	1 000 985
17	I17	14 243	I17	95 092	I17	367 800	I17	1 034 623
18	I18	15 556	I18	100 500	I18	380 285	I18	1 069 536
19	I19	17 724	I19	106 148	I19	392 915	I19	1 105 449
20	I19	17 724	I20	112 150	I20	406 281	I20	1 143 113
21	I20	20 275	I21	118 533	I21	421 951	I21	1 182 607
22	I20	20 275	I22	125 090	I22	440 045	I22	1 225 814
23	I21	22 253	I23	131 770	I23	458 627	I23	1 278 650
24	I22	23 892	I24	138 581	I24	476 037	I24	1 339 135
25	I23	25 566	I25	145 484	I25	492 216	I25	1 395 061
26	I24	27 257	I26	152 533	I26	507 968	I26	1 442 334
27	I25	29 009	I27	159 739	I27	524 454	I27	1 486 467
28	I26	30 826	I28	167 056	I28	543 236	I28	1 530 135
29	I27	32 666	I29	174 555	I29	564 426	I29	1 573 321
30	I28	33 920	I30	182 246	I30	579 347	I30	1 602 320
31	–	–	I31	187 506	–	–	–	–

MSE	191 360.0033		837 093.6694		7 095 004.0344		47 860 278.3793
RMSE	437.4471		914.9282		2663.6449		6918.1123
AFER (%)	2.0093		0.6061		0.4802		0.5561

The whole process is repeated to forecast the COVID-19 infected cases for May, June and July 2020. The number of clusters formed for these months is 10, 8 and 10 respectively. After applying the steps of the proposed model, the forecasted results for May, June and July 2020 are obtained together with their linguistic variables and shown in Table 2. From Table 2, the calculated AFER (%) values of 2.0093, 0.6061, 0.4802 and 0.5561 for April, May, June and July 2020 respectively are very small. The forecasted numbers of infected persons obtained by the proposed model are therefore very close to the actual values. Hence, the proposed model is well trained and suitable for future prediction of the novel coronavirus. Fig. 2 gives the graphical representation of Table 2: Fig. 2(a), (b), (c) and (d) show the forecasted and actual COVID-19 infected cases for April, May, June and July 2020 respectively. In this figure, the forecasted values correspond to the data used for training, and the actual values are the official data on infected cases up to the end of July 2020 in India. The graphs show that the forecasted COVID-19 infected cases closely match the available official data.

Fig. 2

Graphical representations of forecasted and actual infected cases of COVID-19 from April to July 2020 in India. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

From Table 2, it can be concluded that the spread of COVID-19 in India is increasing day by day. Step 7 of algorithm B is now applied to calculate the average rate of increment of COVID-19 for July 2020. The average rate of increment for July is 0.03451, and the randomly generated value of h is 0.007. The health ministry of India is currently under considerable stress due to the COVID-19 virus, so it is necessary to predict the number of COVID-19 cases in the upcoming days in order to take protective measures against a worst-case situation. The proposed ANHFTS model can also predict the approximate number of upcoming new infected cases. The predicted infected cases for August 2020 are shown in Table 3. The results in Table 3 show that there could be approximately 3 659 185 COVID-19 infected cases by the end of August 2020. The predicted results of Table 3 are also depicted graphically in Fig. 3.

Table 3

Predicted COVID-19 infected cases and deaths for upcoming month August 2020 in India.

Date	Predicted COVID-19 infected cases	Predicted COVID-19 deaths
31 July	1 644 409	36 259
1 August	1 687 561	36 939
2 August	1 731 761	37 631
3 August	1 777 008	38 334
4 August	1 823 443	39 052
5 August	1 871 093	39 782
6 August	1 919 990	40 526
7 August	1 970 164	41 284
8 August	2 021 649	42 056
9 August	2 074 479	42 843
10 August	2 128 691	43 645
11 August	2 184 319	44 461
12 August	2 241 400	45 293
13 August	2 299 973	46 140
14 August	2 360 077	47 003
15 August	2 421 752	47 882
16 August	2 485 038	48 778
17 August	2 549 978	49 690
18 August	2 616 616	50 620
19 August	2 684 994	51 567
20 August	2 755 160	52 531
21 August	2 827 159	53 514
22 August	2 901 039	54 515
23 August	2 976 850	55 535
24 August	3 054 643	56 574
25 August	3 134 468	57 632
26 August	3 216 379	58 710
27 August	3 300 431	59 808
28 August	3 386 680	60 927
29 August	3 475 182	62 067
30 August	3 565 997	63 228
31 August	3 659 185	64 410

Fig. 3

Graphical representations of predicted COVID-19 infected cases in August 2020 in India.

In this example, the approximate number of deaths due to COVID-19 is predicted for August 2020. The available data on deaths due to COVID-19 for the period April–July 2020 were collected from a Government authorized portal [56]. Before predicting COVID-19 deaths in India for August 2020, the proposed model is tested against the official data on deaths from COVID-19 in India. The proposed ANHFTS model is applied to the official data month by month, starting with the universe of discourse for April 2020. According to step 1 of algorithm A, the estimated number of clusters for April 2020 is 10. After applying the remaining steps of the proposed ANHFTS model, the forecasted COVID-19 deaths for April 2020 are obtained together with their linguistic variables, as shown in Table 4. The universes of discourse for May, June and July 2020 are formed in the same way. By step 1 of algorithm A, the estimated number of clusters for these months is 10, 8 and 10 respectively. The forecasted COVID-19 deaths in India for May, June and July 2020, with their linguistic variables, are shown in Table 4.

Table 4

Forecasted COVID-19 deaths for the month of April, May, June and July 2020 in India.

Date	April 2020		May 2020		June 2020		July 2020
	Linguistic variable	Forecasted COVID-19 deaths	Linguistic variable	Forecasted COVID-19 deaths	Linguistic variable	Forecasted COVID-19 deaths	Linguistic variable	Forecasted COVID-19 deaths
1	I1	53	I1	1257	I1	5594	I1	17 874
2	I2	63	I2	1347	I2	5774	I2	18 210
3	I3	82	I3	1477	I3	6051	I3	18 720
4	I4	101	I4	1588	I4	6327	I4	19 232
5	I5	120	I5	1693	I5	6603	I5	19 744
6	I6	141	I6	1802	I6	6880	I6	20 260
7	I7	162	I7	1914	I7	7156	I7	20 781
8	I8	189	I8	2026	I8	7447	I8	21 309
9	I9	231	I9	2131	I9	7782	I9	21 825
10	I9	231	I10	2229	I10	8166	I10	22 298
11	I10	275	I11	2325	I11	8581	I11	22 725
12	I11	314	I12	2421	I12	9041	I12	23 137
13	I12	357	I13	2521	I12	9041	I13	23 605
14	I13	395	I14	2626	I13	9550	I14	24 244
15	I14	424	I15	2731	I14	10 299	I15	25 029
16	I15	450	I16	2842	I15	11 358	I16	25 776
17	I16	477	I17	2968	I16	12 204	I17	26 417
18	I17	510	I18	3111	I17	12 734	I18	27 034
19	I18	548	I19	3260	I17	12 734	I19	27 685
20	I19	589	I20	3412	I18	13 247	I20	28 371
21	I20	630	I21	3567	I19	13 741	I21	29 078
22	I21	672	I22	3724	I20	14 218	I22	29 811
23	I22	717	I23	3885	I21	14 658	I23	30 570
24	I23	771	I24	4051	I22	15 061	I24	31 335
25	I24	832	I25	4222	I23	15 450	I25	32 091
26	I25	891	I26	4402	I24	15 839	I26	32 838
27	I26	943	I27	4593	I25	16 228	I27	33 583
28	I27	998	I28	4790	I26	16 617	I28	34 332
29	I28	1064	I29	4994	I27	17 005	I29	35 085
30	I29	1117	I30	5205	I28	17 266	I30	35 593
31	–	–	I31	5350	–	–	–	–

MSE	126.2271		1203.2048		27 223.6109		17 023.0030

RMSE	11.2351		34.6872		164.9958		130.4722

AFER (%)	2.2989		1.0395		1.0560		0.4196
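The three performance measures reported at the foot of Table 4 can be sketched as follows. Their formulas are not restated in this section, so the definitions below are the ones commonly used in the fuzzy time series literature and should be read as an assumption: MSE is the mean squared difference, RMSE its square root, and AFER the mean absolute relative error in percent. The function name `forecast_errors` is hypothetical, and the inputs are the six July death counts listed in Table 5, so the output will not match the full-month July column of Table 4.

```python
import math

def forecast_errors(actual, forecast):
    """Return (MSE, RMSE, AFER in percent) for paired actual/forecast values.

    Assumed (standard) definitions, not taken from the paper's equations.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    rmse = math.sqrt(mse)
    afer = 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n
    return mse, rmse, afer

# Six sampled July 2020 dates from Table 5 (deaths): official vs forecasted.
official_deaths = [19701, 22144, 24929, 28099, 32121, 35769]
forecast_deaths = [19744, 22298, 25029, 28371, 32091, 35593]

mse, rmse, afer = forecast_errors(official_deaths, forecast_deaths)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  AFER={afer:.4f}%")
```

Because only six dates are used here, the result is a spot check of the definitions rather than a reproduction of the monthly figures.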

It can be observed from Table 4 that the values of the performance measure AFER, 2.2989%, 1.0395%, 1.0560% and 0.4196% for April, May, June and July 2020 respectively, are close to 0. Therefore, the forecasted numbers of COVID-19 deaths obtained by the proposed model differ only slightly from the actual values. The forecasted and actual numbers of deaths in India versus date for April, May, June and July 2020 are depicted in Fig. 4(a), (b), (c) and (d) respectively. These graphs show that the forecasted COVID-19 deaths obtained by the proposed model closely match the available official data.

Fig. 4

Graphical representations of forecasted and actual deaths due to COVID-19 from April to July 2020 in India.

Using the forecasted COVID-19 deaths obtained by the proposed model in Table 4, the average rate of increment in COVID-19 deaths for July is 0.02441 and the randomly generated value of is 0.005. Predicting the number of COVID-19 deaths is also necessary for taking protective measures. The predicted deaths due to the novel virus for August 2020 are shown in Table 3 and are also represented in Fig. 5. The results in Table 3 show that there could be approximately 64 410 deaths due to COVID-19 by the end of August 2020. These predicted values may differ from the official data that will be available at the end of August 2020, because of growing public awareness and the daily upgrading of health infrastructure in response to the novel coronavirus.

Fig. 5

Graphical representations of predicted COVID-19 deaths in August 2020 in India.

In Examples 6.2.1 and 6.2.2, errors are calculated month-wise. To check the accuracy of the COVID-19 infected cases and deaths forecasted for India by the proposed ANHFTS model, the error percentage is also calculated on a daily basis, at intervals of 5 days, for July 2020. The forecasted values of COVID-19 infected cases and deaths obtained by the proposed model, the official data and the calculated error percentages are shown in Table 5. The table shows that the errors between the two data sets are minor (about 1% or less in absolute value) for both infected cases and deaths. Hence, the proposed model can be applied to predict the approximate infected cases and deaths of COVID-19 for the upcoming month of August 2020 with small errors.

Table 5

Comparison of official data and forecasted data of infected cases as well as deaths for July 2020 in India.

Date	Official data	Forecasted value	Error percentage
Infected cases
05-07-2020	697 846	696 495	0.1936
10-07-2020	822 604	817 035	0.6770
15-07-2020	970 169	968 566	0.1652
20-07-2020	1 154 913	1 143 113	1.0217
25-07-2020	1 387 087	1 395 061	−0.5749
30-07-2020	1 612 354	1 602 320	0.6223

Deaths

05-07-2020	19 701	19 744	−0.2183
10-07-2020	22 144	22 298	−0.6954
15-07-2020	24 929	25 029	−0.4011
20-07-2020	28 099	28 371	−0.9680
25-07-2020	32 121	32 091	0.0934
30-07-2020	35 769	35 593	0.4920
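The error percentages in Table 5 are consistent with the signed relative error (official − forecasted) / official × 100, so a negative value means the model over-forecast. A minimal sketch of that calculation is given below; the formula is inferred from the tabulated values rather than stated in the text, and the function name `error_percentage` is hypothetical.

```python
def error_percentage(official, forecast):
    """Signed relative error in percent; negative means over-forecast."""
    return 100.0 * (official - forecast) / official

# (date, official, forecasted) rows copied from Table 5.
infected_rows = [
    ("05-07-2020", 697846, 696495),
    ("10-07-2020", 822604, 817035),
    ("25-07-2020", 1387087, 1395061),
]
death_rows = [
    ("05-07-2020", 19701, 19744),
    ("30-07-2020", 35769, 35593),
]

for label, rows in (("Infected cases", infected_rows), ("Deaths", death_rows)):
    print(label)
    for date, official, forecast in rows:
        print(f"  {date}  {error_percentage(official, forecast):+.4f}%")
```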


Estimation of the approximate number of isolation beds and ICUs

The recovery rate from COVID-19 in India was 64% [57] of the total infected cases on 30 July 2020. If this recovery rate is assumed to remain constant until 31 August 2020, then according to the proposed ANHFTS model the expected number of recovered people will be approximately 2 341 879 by 31 August 2020. The active cases may require hospitalization, quarantine and, in emergencies, ICUs. The number of active cases is calculated by Eq. (17), which gives approximately 1 252 896 active cases at the end of August 2020. According to the recent report of the Ministry of Health and Family Welfare (MoHFW), a total of 944 170 isolation beds, 31 258 ICUs and 114 638 oxygen-supported beds are available to fight COVID-19, across 930 COVID hospitals, 2362 COVID health centres, 10 341 quarantine centres and 7195 COVID centres [58]. At present, India has prevented many infected cases and deaths from COVID-19 owing to public awareness and advanced health infrastructure. However, the number of COVID-19 infected cases continues to grow. Therefore, it is essential to increase the number of isolation beds and the number of ICUs or ventilators to fight the COVID-19 pandemic. It is assumed that only 2% to 5% [59], [60] of the active cases are critical and require ventilators. So, the estimated number of ventilators required for the treatment of COVID-19 infection will be approximately 25 058 to 62 645, and the remaining active cases, approximately 1 190 252 to 1 227 838, should be hospitalized or quarantined. Therefore, India will require approximately 12.5 lakh beds for infected persons and 65 thousand ICUs for critically infected persons at the end of August 2020. The Indian government should impose a strict lockdown to break the chain of new cases. Also, the Indian people should follow the guidelines of the Health Ministry and adopt protective measures such as hand washing, wearing a mask, using sanitizer and social distancing.
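The arithmetic behind these estimates can be retraced with a short sketch. Eq. (17) is not reproduced in this section, so it is taken here to be A_t = C_t − R_t − D_t (active = confirmed − recovered − deaths), which is an assumption, although it reproduces the reported 1 252 896. Rounding fractional people up is a second assumption, chosen because it matches the reported recovered and ventilator figures. All variable names are hypothetical.

```python
import math

# Inputs quoted in the text (Table 3 predictions for 31 August 2020).
predicted_cases = 3_659_185      # C_t
predicted_deaths = 64_410        # D_t
recovery_rate_pct = 64           # assumed constant through August [57]
critical_low_pct, critical_high_pct = 2, 5   # share needing ventilators [59], [60]

# Recovered cases R_t, rounded up (assumption, see lead-in).
recovered = math.ceil(predicted_cases * recovery_rate_pct / 100)

# Assumed form of Eq. (17): A_t = C_t - R_t - D_t.
active = predicted_cases - recovered - predicted_deaths

ventilators_low = math.ceil(active * critical_low_pct / 100)
ventilators_high = math.ceil(active * critical_high_pct / 100)

print("recovered:", recovered)
print("active:", active)
print("ventilators:", ventilators_low, "to", ventilators_high)
```

Comparing `active` with the 944 170 isolation beds and `ventilators_high` with the 31 258 ICUs quoted from MoHFW gives the shortfall that motivates the 12.5 lakh bed and 65 thousand ICU figures.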

Analysis of variance

The results obtained by the proposed ANHFTS model and the official data of COVID-19 infected cases and deaths in India for July 2020 were compared by one-way ANOVA, carried out in MINITAB 19. The results of the analysis of variance are shown in Table 6. For COVID-19 infected cases, the ANOVA between the ANHFTS model and the official data gives an F-value of 0.0003 and a P-value of 0.9855, which is far greater than the 0.05 significance level corresponding to the 95% confidence level. Similarly, for the number of COVID-19 deaths in India in July 2020, the P-value (0.9865) is far greater than 0.05. Hence, the mean values of the proposed ANHFTS model and the official data do not differ significantly at the 5% level of significance for infected cases or for deaths in India, which supports the accuracy of the proposed model.

Table 6

ANOVA analysis of COVID-19 infected cases and deaths in July 2020.

Source	DF	Adj. SS	Adj. MS	F-value	P-value
ANHFTS model versus official data for infected cases in India
Between-group	1	4.17E+07	4.17E+07	0.0003	0.9855
Within-group	10	1.20E+12	1.20E+11
Total	11	1.20E+12

ANHFTS model versus official data for deaths in India

Between-group	1	1.10E+04	1.10E+04	0.0003	0.9865
Within-group	10	3.65E+08	3.65E+07
Total	11	3.65E+08
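The sums of squares in Table 6 can be retraced by hand. The sketch below assumes the ANOVA was run on the six sampled July dates of Table 5 (two groups of six values, which is consistent with the 11 total degrees of freedom); this is an inference from the tables, not something the text states. It uses only the Python standard library and therefore reports the F-value but not the P-value, which would require the F distribution. The function name `one_way_anova` is hypothetical.

```python
def one_way_anova(*groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for the groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

# Table 5, infected cases: official data vs ANHFTS forecasts.
official_cases = [697846, 822604, 970169, 1154913, 1387087, 1612354]
forecast_cases = [696495, 817035, 968566, 1143113, 1395061, 1602320]

# Table 5, deaths: official data vs ANHFTS forecasts.
official_deaths = [19701, 22144, 24929, 28099, 32121, 35769]
forecast_deaths = [19744, 22298, 25029, 28371, 32091, 35593]

cases_result = one_way_anova(official_cases, forecast_cases)
deaths_result = one_way_anova(official_deaths, forecast_deaths)

for name, (ssb, ssw, dfb, dfw, f) in (("cases", cases_result), ("deaths", deaths_result)):
    print(f"{name}: SSB={ssb:.3e} SSW={ssw:.3e} df=({dfb},{dfw}) F={f:.4f}")
```

An F-value this close to zero means the between-group variation is negligible compared with the day-to-day variation within each series, which is why the P-value is close to 1.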


Conclusion

The daily growth in COVID-19 cases has put people across the whole world at high risk. It has therefore become necessary to control the outbreak of COVID-19 and to forecast the infected cases and deaths in the upcoming days so that the necessary plans can be executed. To this end, this study presents a hybrid predictive FTS model based on the MFCM clustering technique. The main purpose of this article is to develop an effective model for estimating the number of COVID-19 infected cases and deaths in India for the next 31 days. The proposed ANHFTS model is tested against the available COVID-19 data for India, and its performance is measured by MSE, RMSE and AFER. Tables 2 and 4 show that the proposed ANHFTS model is capable of predicting infections and deaths. According to Table 3, the total number of infected cases and deaths in India at the end of August 2020 will be approximately 3 659 185 and 64 410 respectively. The proposed model also predicts the requirement of isolation beds and ICUs to deal with COVID-19 in the coming days. Its output suggests that approximately 12.5 lakh beds for infected persons and 65 thousand ICUs for critically infected persons will be required at the end of August 2020 in India. Thus, the proposed model could be important for government and health-care decision-makers in making protection plans during this pandemic. The Indian government has substantially controlled COVID-19, but it has to plan a strict strategy against the increase in COVID-19 cases in India and reduce the spread of the virus significantly; otherwise the disease will affect a large part of India's population. If there is no break in the spread of COVID-19, the figure of 3 659 185 may grow much larger, possibly up to a crore, in the upcoming months.
The developed ANHFTS model may also enable the calculation of important parameters such as infection rates and death rates, which will help in gaining a more accurate grasp of the transmission trend of a COVID-19-type disease if one occurs in future. In recent years, several membership functions have been developed. Each membership grade has advantages as well as disadvantages. It is impossible to develop a general framework for a membership function because each model has its own limitations and characteristics. In the context of each application, some membership functions have proved more appropriate than others. However, the choice of a general membership grade is still a subject of research. In a future study, we will extend the present work and study the impact of different membership functions on the predictive results.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

$v_i$	$i$th data point
$\mu_{ki}$	Membership value of the $i$th data point in the $k$th cluster
$C_k$	$k$th cluster centroid
$c$	Number of clusters, $2 \le c \le n-1$
$m$	Fuzzy index
$J(v_i, C_k)$	Objective function
$F_t$	Forecasted value at any time $t$
$I_i$	$i$th linguistic variable
$AV_i$	$i$th actual value
$r$	Rate of increment or decrement
$C_t$	Number of confirmed cases at any time $t$
$A_t$	Number of active cases at any time $t$
$R_t$	Number of recovered cases at any time $t$
$D_t$	Number of deaths at any time $t$
MFCM	Modified fuzzy C-means
ANHFTS	A novel hybrid fuzzy time series

Algorithm A	To make basic interval by using MFCM clustering technique
Step 1:	Partition the universe of discourse X into several equal length intervals for obtaining the numbers of clusters c.
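Step 2:	Input the fuzzy index $m$ and the fuzzy stopping criterion $\epsilon$; here $\epsilon = 0.0001$ and $m = 2$.
Step 3:	Randomly initialize the membership value for each historical data point such that $\sum_{k=1}^{c} \mu_{ki}^{t} = 1$, where $t$ is the iteration number.
Step 4:	Calculate the values of $a$ and $u_i$ by the following formulas: $a = \left| \sum_{i=1}^{n} v_i / n \right|$ and $u_i = \left| \sum_{j=1}^{d} m_{ij} \right|$.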
Step 5:	Calculate the cluster centroid by .
Step 6:	Update the membership value by .
Step 7:	If $\| \mu_{ki}^{t+1} - \mu_{ki}^{t} \| \le \epsilon$, then go to the next step; otherwise go to step 5.
Step 8:	Calculate the basic intervals with the help of the centroids by using the following steps:
Step 8.1:	$UB_k = \frac{C_k + C_{k+1}}{2}$; $LB_{k+1} = UB_k$; $k = 1, 2, \ldots, c-1$, where $UB$ and $LB$ are the upper bound and lower bound respectively.
Step 8.2:	$LB_1 = 2C_1 - LB_2$, $UB_c = 2C_c - LB_c$.
Step 9:	End.
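Step 8 of Algorithm A, which turns cluster centroids into basic intervals, can be sketched as follows. Interior boundaries are the midpoints of adjacent centroids, and the two outer bounds mirror the nearest boundary about the first and last centroid. The function name `basic_intervals` and the example centroid values are hypothetical, and sorting the centroids first is an assumption; the clustering itself (steps 1 to 7) is not reproduced here.

```python
def basic_intervals(centroids):
    """Build [LB_k, UB_k] intervals from cluster centroids (Algorithm A, step 8)."""
    c = sorted(centroids)  # assumption: centroids are processed in ascending order
    if len(c) < 2:
        raise ValueError("at least two centroids are required")

    # Step 8.1: UB_k = (C_k + C_{k+1}) / 2 and LB_{k+1} = UB_k for k = 1..c-1.
    upper = [(c[k] + c[k + 1]) / 2 for k in range(len(c) - 1)]
    lower = [None] + list(upper)

    # Step 8.2: LB_1 = 2*C_1 - LB_2 and UB_c = 2*C_c - LB_c.
    lower[0] = 2 * c[0] - lower[1]
    upper.append(2 * c[-1] - lower[-1])

    return list(zip(lower, upper))

# Hypothetical centroids, for illustration only.
intervals = basic_intervals([10, 20, 40])
print(intervals)
```

Each centroid sits inside its own interval, and the intervals tile the range without gaps because every lower bound equals the previous upper bound.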

Algorithm B	To make forecasting by using FTS
Step 1:	The basic intervals obtained by algorithm A are partitioned into sub-intervals according to the number of elements $y_i$ that belong to them.
Step 2:	Select those sub-intervals $b_i$ to which historical data belong and calculate the mid-point $m_i$ of each sub-interval.
Step 3:	Linguistic variables are defined for each selected sub-interval obtained in step 2 as $I_i = \sum_{j=1}^{n} b_i \alpha_{ij}$, $i \in N$, where $\alpha_{ij} = 1$ if $i = j$; $0.5$ if $j = i-1$ or $j = i+1$; and $0$ otherwise.
Step 4:	Allocate the linguistic variable to all historical data according to the belonging of data to their respective sub-interval.
Step 5:	Create first order FLR and FLRG from step 4.
Step 6:	Defuzzify the historical data i.e., calculate the forecasted value by using Eq. (11), if FLR is non-empty.
Step 7:	Calculate the average rate of increment or decrement r of defuzzified value obtained in previous step; 0≤r≤1.
Step 8:	Determine the predicted value by Eq. (12), if FLR is empty.
Step 9:	End.
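Step 5 of Algorithm B can be illustrated with a short sketch that builds first-order fuzzy logical relationships (FLRs) and groups them into fuzzy logical relationship groups (FLRGs). It follows the common Chen-style convention in which repeated right-hand sides are kept only once per group; whether the proposed model uses exactly this convention is an assumption. The function names and the label sequence are hypothetical, and Eq. (11) and Eq. (12) are not reproduced.

```python
def first_order_flrs(labels):
    """Pair each linguistic label with its successor: I(t-1) -> I(t)."""
    return list(zip(labels, labels[1:]))

def flr_groups(flrs):
    """Group FLRs by left-hand side, keeping each right-hand side once."""
    groups = {}
    for lhs, rhs in flrs:
        rhs_list = groups.setdefault(lhs, [])
        if rhs not in rhs_list:
            rhs_list.append(rhs)
    return groups

# Hypothetical sequence of linguistic variables assigned to daily data (step 4).
labels = ["I1", "I2", "I2", "I3", "I2", "I4"]

flrs = first_order_flrs(labels)
groups = flr_groups(flrs)

for lhs, rhs_list in groups.items():
    print(f"{lhs} -> {', '.join(rhs_list)}")

# The last label has no successor, so its group is empty; this is the
# situation in which step 8 (prediction) applies instead of step 6.
print("I4 has a group:", "I4" in groups)
```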
