Vaibhav Vaishnav1, Jayashri Vajpai2. 1. Department of Electrical Engineering, Indian Institute of Technology, Jodhpur - 342037, India. 2. Department of Electrical Engineering, M.B.M. Engineering College, Jodhpur - 342011, India.
Abstract
Ever since the outbreak of novel coronavirus in December 2019, lockdown has been identified as the only effective measure across the world to stop the community spread of this pandemic. India implemented a complete shutdown across the nation from March 25, 2020 as lockdown I and went on to extend it by giving timely partial relaxations in the form of lockdown II, III & IV. This paper statistically analyses the impact of relaxation during Lockdown III and IV on coronavirus disease (COVID) spread in India using the Group Method of Data Handling (GMDH) to forecast the number of active cases using time series analysis and hence the required medical infrastructure for the period of next six months. The Group Method of Data Handling is a novel self organized data mining technique with data driven adaptive learning capability which grasps the auto correlative relations between the samples and gives a high forecasting accuracy irrespective of the length and stochasticity of a time series. The GMDH model has been first validated and standardized by forecasting the number of active and confirmed cases during lockdown III-IV with an accuracy of 2.58% and 2.00% respectively. Thereafter, the number of active cases has been forecasted for the rest of 2020 to predict the impact of lockdown relaxation on spread of COVID-19 and indicate preparatory measures necessary to counter it.
Human civilizations have been periodically challenged by the onset of infectious diseases. In the realm of infectious diseases, a pandemic is the worst case scenario. The latest in the series of pandemics has been caused by the family of coronaviruses. Coronaviruses are pleomorphic, single stranded ribonucleic acid (RNA) viruses. The name derives from the crown like appearance produced by the club shaped projections that stud the viral envelope. The "novel" coronavirus is a new strain that has not been previously identified in humans. The 21st century saw its first coronavirus pandemic in 2002 in the form of Severe Acute Respiratory Syndrome (SARS-CoV), followed by Middle East Respiratory Syndrome (MERS-CoV) [1]. Today the world is fighting another pandemic known as Coronavirus disease 2019, abbreviated as COVID-19. The initial cases of COVID-19 were reported on 8 December 2019 in Wuhan, Hubei province, China. Cases were reported after exposure to the local Huanan (South China) seafood market, which sells a variety of wild animals, suggesting that the zoonotic coronavirus crossed the barrier from animal to human at this market [2]. COVID-19 is caused by the virus termed 2019-nCoV (Novel Coronavirus 2019, 2020) by the World Health Organization (WHO) and SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) by the International Committee on Taxonomy of Viruses. The COVID-19 virus is categorized by WHO as a β-CoV of group 2B [3]. The genome of this virus has been identified and it resembles that of SARS-CoV (80% similarity) and MERS-CoV (50% similarity) [4,5]. As of 30/06/2020, the world has registered 1,01,85,374 confirmed cases and 5,03,862 deaths due to COVID-19. With nearly 25% of the world's total cases, the USA has been the most affected country, followed by Brazil, Russia and India. The first confirmed case of novel coronavirus in India was reported on 30 January 2020, in the state of Kerala.
As of today, India has reported 5,66,840 confirmed cases and 16,893 deaths due to COVID-19 [6,7].
The coronavirus spreads through sneezing, cough droplets and contact. The virus tends to enter the body through the mouth, nose and eyes [8]. It is speculated that the virus may infect a person within a radius of about 6 ft (1.8 m). The virus can survive for about 2 hours to a few days in sneezing and cough droplets lying on a surface or on the ground. Studies have shown that the infection can spread through fomites, but they are not the major source of infection. The virus has been detected in the stools of patients, but no infection via stool has been reported. Similar to SARS-CoV, nCoV infects cells of the respiratory tract through the angiotensin-converting enzyme 2 (ACE2) receptors [9]. A proteolytic cleavage occurs in the SARS-CoV S protein at position (S2′), mediating membrane fusion and viral infectivity [10]. Infection is likely to arise if a person comes in direct or indirect contact with any of the body fluids of an infected person.
The entire clinical picture of the disease is not fully known; however, the symptoms vary from mild to severe. The risk is greater at the extremes of age and in patients having other health problems such as lung diseases, diabetes, heart diseases and cancer. The common signs of infection are fatigue, muscle pain, sneezing, sore throat, dry cough, high fever and respiratory problems, with some severe cases leading to pneumonia, serious respiratory syndrome, kidney failure and death. The incubation period as reported by the World Health Organization is between 2 and 10 days [11].
Prevention and management are the most important aspects of controlling the spread of COVID-19. This calls for collective efforts by the public and the government.
Simple steps like avoiding sneezing and coughing in public places, covering the mouth and nose with a mask while sneezing and coughing, and frequent cleaning of hands with soap and alcohol based sanitizers are essential. It is advised to avoid interaction with persons showing symptoms of respiratory problems such as sneezing, coughing and breathing difficulty.
Different nations have imposed lockdowns of different durations and natures as per the severity and number of cases in their country. The Government of India, too, has introduced social distancing as a precaution to avoid the possibility of a large-scale population movement that can promote the community spread of the disease. The purpose of these initiatives is the restriction of social interaction in workplaces, schools and other public spheres, except for essential public services such as hospitals. In the absence of a vaccine, social distancing has been identified as the most commonly used prevention and control strategy [12]. The Indian government implemented a nationwide lockdown on 25th March 2020 to slow the spread of COVID-19. It was the world's largest lockdown and sent 1.3 billion people into isolation. After 40 days and one extension, owing to the declining economy, the government decided to give some relaxations from 3 May 2020. Since the beginning of lockdown 3.0, the country has seen an unprecedented growth in confirmed cases. This paper analyses the effect of the opening or relaxation of lockdown on the spread of novel coronavirus in India using time series analysis and forecasts its nationwide quantitative spread for the next six months, thereby giving an extra edge to the medical fraternity in combating COVID-19. Although the mortality rate in India today is 2.98%, with the world's 4th highest number of cases and 2nd largest population, the next few months are very critical in deciding the overall long term impact COVID-19 will have on the Indian population and economy.
This paper is organized as follows: Section 2 gives a review of time series forecasting, Section 3 elaborates the GMDH technique and algorithm, Section 4 discusses the application and results and, finally, Section 5 presents the conclusion.
Time series forecasting
A time series is a set of quantitative observations on a variable of interest arranged in sequential order. Over the years, time series analysis has been used to study the statistical properties of data and to propose a suitable mathematical model for the data generating process so as to forecast the future values of the time series. Time series forecasting has been used to forecast future values of numerous economic, demographic, climatic, financial and industrial variables. Monthly electrical peak load demands, daily minimum temperatures, yearly population growth, weekly industrial emissions and hourly manufactured units are all examples of time series.
The idea of stochasticity in time series analysis was first introduced in the early 20th century by Yule [13] and Kolmogorov [14], who proposed that every time series can be considered as a stochastic process. This idea became the foundation of the first autoregressive and moving average models by researchers like Slutsky, Walker, Yaglom and Yule. In 1970, Box & Jenkins integrated the existing knowledge and proposed the historic Autoregressive Integrated Moving Average (ARIMA) model [15], which became a stepping stone for the application of modern time series analysis and forecasting in various areas of science. Since then, univariate ARIMA, multivariate (Vector) ARIMA, Seasonal ARIMA (SARIMA), Autoregressive Moving Average with Exogenous Inputs (ARMAX), exponential smoothing, multiple regression, etc. have been used for time series forecasting in all domains of quantitative measurement.
Forecasting in the field of epidemics aims at estimating the size and impact of an infectious disease in the near future. The above mentioned time series models have been used from time to time to predict the impact of various infectious diseases across the world. Teng et al. [16] have used an ARIMA (0,1,3) model for dynamic forecasting of worldwide Zika virus outbreaks in November 2016. Similarly, Li et al.
[17] have applied piecewise exponential smoothing on logarithmically transformed data to predict the epidemiological trend of measles in Shandong Province, China for the year 2005. Linear and Poisson regression have been used by Pelat et al. [18] for retrospective detection of pneumonia and influenza mortality, and prospective surveillance of diarrhoea in France from 1968 to 1999. With the corona outbreak, researchers throughout the world have used ARIMA and exponential smoothing models to forecast the effects of COVID-19 across countries like China, USA, Italy, India, Canada, France, South Korea and UK for different forecasting horizons [19], [20], [21], [22], [23], [24].
Apart from all these methods, the most commonly used epidemiological mathematical model for disease modeling has been the SIR model [25], where the abbreviation stands for the number of susceptible, infected and recovered individuals in the total population. It is used for prediction of the infected population given the values of the transmission rate and recovery rate of the disease in the population. The SIR and modified SIR models have been used to predict the cycle of almost all epidemics, be it Ebola virus in Africa [26], measles in Britain [27], smallpox in Bangladesh [28], H1N1 in Japan [29], influenza in Hong Kong [30] or COVID-19 in different countries across the world [31], [32], [33].
However, all the above mentioned time series models are parametric in nature, i.e., one needs to specify a fixed set of parameters which describe the relationship between input and output variables and which are estimated from time series realizations. Hence, the data generating process is hidden and the model structure has to be prespecified. Moreover, most of these models are accurate only for linear and stationary time series. Although most diseases form seasonal time series, predicting the spread of a disease that has never been encountered before is a challenging task.
The time series associated with many diseases are highly dynamic in nature and cannot be approximated accurately by using the traditional epidemiological and statistical models mentioned before. Therefore, it is necessary to use advanced computing models such as neural networks for their modeling. Neural networks impose fewer prior assumptions on the data generating process and are more robust and tolerant to non linearity in the forecasting data. Hence, with their strong adaptive learning capability, neural networks have been widely used to model various disease time series which exhibit complex nonlinear patterns.
Zhang et al. [34] have shown that neural networks outperform the SARIMA model in forecasting typhoid fever incidence in Guangxi province of China for the year 2010. Chakraborty et al. [35] have forecasted the dengue epidemic for the San Juan and Iquitos regions using a hybrid neural network ARIMA model to capture both linearity and non linearity in the time series. Zhu et al. [36] have proposed a novel deep neural network to forecast the influenza outbreak in Guangzhou, China for the year 2018. Owing to its exponential growth, the novel coronavirus outbreak has also been predicted using various versions of neural networks in the past three months. Chemmula & Zhang [37] have made predictions for Canada using long short term memory neural nets while Huang et al. [38] have used a deep convolutional neural network for forecasting confirmed cases in China. Uhlig et al. [39] have combined epidemiological and neural network approaches to prepare an online dashboard for forecasting and providing COVID-19 prognoses for all countries of Europe and South East Asia.
However, despite the strong nonlinear mapping ability of neural networks, researchers have been able to reduce but not completely remove their requirement for prior information about the system under investigation.
Despite their combination with other optimization techniques in hybrid networks, their black box nature remains an important limitation that cannot be ignored. Also, in all the above examples where neural networks have been used for forecasting, the time series are sufficiently long, with high correlation between data samples, for optimum training of the network. This is not the case with COVID-19.
In India, the virus had marked its presence by the end of March 2020 and the associated time series is short compared to the datasets of previous pandemic outbreaks. Therefore, to model the growth and effects of COVID-19, a highly self organized model is required that is competent enough to decode the nonlinear trend within data even from a short time series. GMDH is one such nonparametric nonlinear model which automatically extracts knowledge from data samples and trains itself without any prior knowledge about the system. It is an advanced neural network with a nonlinear optimization process which is capable of predicting real, dynamic and chaotic time series without loss of forecasting accuracy.
The authors of this paper have already used GMDH for problems ranging from forecasting the number of monthly airline passengers (a benchmark linear time series in forecasting literature) [40] to forecasting the monthly peak electrical load demand for India's largest state, Rajasthan (a real time non linear time series) [41]. In both cases, GMDH has performed better than the reported references. Apart from this, over the past decade GMDH has been used for forecasting wind speed [42], reservoir water levels [43], daily traffic flow [44], stock indices [45], significant wave height [46], turbidity [47], industry market demand [48], cash demand in ATMs [49], local vehicle population [50] and even oil prices [51].
In the field of disease forecasting, GMDH has recently been used to predict the number of patients with lower respiratory disease due to air pollution [52] and the total number of knee and hip replacements in arthritis patients [53], but it has not yet been used to predict the size of an epidemic. This paper proposes, for the first time, the use of GMDH to predict the growth of a pandemic like COVID-19, after explaining the algorithm in the next section.
GMDH technique
GMDH is an inductive self organizing technique proposed by the Ukrainian scientist A. G. Ivakhnenko [54] which identifies the internal structure of non linear systems by extracting knowledge from data samples. The GMDH network uses polynomials to model the mathematical relationship between multiple inputs and a single output. It is an advanced version of the perceptron [55] where the total number of layers and the number of nodes in each layer of the network are not prespecified but are automatically decided as the calculation proceeds and the network evolves. The layers and respective nodes in each layer are linked by a quadratic transfer function. The weights of these transfer functions are calculated by solving Gauss normal equations for a group of inputs at a time rather than randomly searching amongst all inputs, hence the name Group Method of Data Handling. The passage of each node to the next layer is determined by the survival of the fittest criterion, until the final optimized model with minimum error is achieved. GMDH uses the non linear Kolmogorov-Gabor [56] polynomial as the output equation for network development, given as follows:
$$y = a_0 + \sum_{i=1}^{N} a_i x_i + \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} x_i x_j + \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} a_{ijk} x_i x_j x_k + \cdots \tag{1}$$
where $y$ denotes the output, $a_0, a_i, a_{ij}, a_{ijk}, \ldots$ denote the polynomial weights and $x_1, x_2, x_3, \ldots, x_N$ denote the input variables. The above equation is linear in the weights and non linear in the variables $x$. The detailed algorithm is explained in the next sub section.
The GMDH algorithm
Step I: The first step consists of modeling the data structure, where the dataset is divided into a training set and a checking set. The training set is used for estimating the weights of the polynomial transfer functions whereas the checking set is used for selection of the fittest nodes in a respective layer. Different ratios of training to checking set observations can be used to examine the potential of the algorithm on different datasets with different statistical properties.
Step II: With polynomials as partial functions, each GMDH network is made up of a variable number of hidden layers, with each layer having a variable number of nodes, the method of selection of which is described in the next two steps. Two inputs are fed at each node and undergo a quadratic transfer function as per the Ivakhnenko [54] polynomial:
$$y = a + b\,x_1 + c\,x_2 + d\,x_1^2 + e\,x_2^2 + f\,x_1 x_2 \tag{2}$$
where $a, b, c, d, e, f$ are the coefficients of the polynomial for the pair of input variables $x_1, x_2$. If $n$ is the number of input variables at a layer, then the number of nodes in that layer is given by ${}^{n}C_{2} = n(n-1)/2$ and so is the number of Ivakhnenko polynomials for that layer. The number of nodes that enter the next layer is governed by the regularity criterion as described in Step IV. Hence, groups of many lower order polynomials are used for successive approximation at each layer rather than one higher order polynomial with all the input variables and terms of all powers.
Step III: The regression coefficients of the node transfer function are determined using the training set and the least squares method, i.e., such that the sum of squared differences between the actual output $y_i$ present in the training set and the predicted output $y_i^*$ is minimum for each pair of input variables [57].
Step IV: The outputs of the polynomial transfer functions in a layer serve as inputs to the next layer, but the number of variables to be passed on to the next layer is determined by using a regularity criterion and the checking set. The regularity criterion $R$ measures the mean squared error between the predicted and actual value for each node (but this time using the checking set). If the value of $R$ is less than a threshold value, then the node output is passed as an input to the next layer; otherwise it is eliminated. This self selection procedure is analogous to Darwin's theory of evolution. The regularity criterion is given as follows:
$$R = \frac{1}{N_c}\sum_{i=1}^{N_c}\left(y_i - y_i^*\right)^2 \tag{3}$$
where $N_c$ is the number of observations in the checking set.
Step V: After selecting the fittest node outputs for the next layer, the value of $R_{min}$ for each layer is recorded. This process is repeated until the GMDH model begins to show over fitting, i.e., until the value of $R_{min}$ for a layer is greater than the value of $R_{min}$ for the previous layer. Hence, the polynomial with the least value of $R$ is chosen as the best polynomial and the model so formed as the optimal model, as shown in Fig. 1.
Fig. 1
Variation of regularity criterion for optimum fit.
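The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names (`design_matrix`, `fit_layer`, `fit_gmdh`) and the synthetic data are hypothetical, and the selection rule keeps the `keep` nodes with the smallest regularity criterion in each layer instead of comparing R with a threshold.

```python
import itertools
import numpy as np

def design_matrix(x1, x2):
    """Columns 1, x1, x2, x1^2, x2^2, x1*x2 of the two-input quadratic polynomial."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def fit_layer(X_train, y_train, X_check, y_check, keep=3):
    """Fit one node per input pair; return the `keep` nodes with the lowest R."""
    nodes = []
    for i, j in itertools.combinations(range(X_train.shape[1]), 2):
        A = design_matrix(X_train[:, i], X_train[:, j])
        # Least squares on the training set (Step III).
        coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        # Regularity criterion: mean squared error on the checking set (Step IV).
        pred = design_matrix(X_check[:, i], X_check[:, j]) @ coeffs
        R = float(np.mean((y_check - pred) ** 2))
        nodes.append((R, (i, j), coeffs))
    nodes.sort(key=lambda node: node[0])
    return nodes[:keep]  # assumption: top-k selection instead of a threshold on R

def node_outputs(X, nodes):
    """Outputs of the selected nodes, used as inputs to the next layer."""
    return np.column_stack(
        [design_matrix(X[:, i], X[:, j]) @ c for _, (i, j), c in nodes]
    )

def fit_gmdh(X_train, y_train, X_check, y_check, keep=3, max_layers=5):
    """Add layers until the layer minimum of R stops decreasing (Step V)."""
    best_R, layers = np.inf, []
    for _ in range(max_layers):
        if X_train.shape[1] < 2:
            break
        nodes = fit_layer(X_train, y_train, X_check, y_check, keep)
        if nodes[0][0] >= best_R:  # over fitting: R_min no longer improves
            break
        best_R = nodes[0][0]
        layers.append(nodes)
        X_train, X_check = node_outputs(X_train, nodes), node_outputs(X_check, nodes)
    return layers, best_R

# Synthetic example: the target is an exact quadratic in inputs 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
layers, best_R = fit_gmdh(X[:40], y[:40], X[40:], y[40:])  # Step I: 40/20 split
print(len(layers), layers[0][0][1], best_R)
```

Because the synthetic target is itself a quadratic in two of the four inputs, the fittest first-layer node is the one fed by that pair, and its checking error is close to machine precision.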
Implementation of GMDH algorithm
A simplified GMDH model is shown in Fig. 2. Let x1, x2, x3, x4 be the four inputs. For 4 inputs, the total number of pairs fed to the first layer is 4C2 = 6. For the six input pairs (x1, x2), (x1, x3), (x1, x4), (x2, x3), (x2, x4), (x3, x4), let the outputs of the polynomial transfer functions (as per Eq. (2)) be y11, y12, y13, y14, y15 and y16 respectively. As described in step IV, only those outputs which satisfy the regularity criterion are passed as inputs to the next layer. Assuming that y11, y13 and y15 pass the regularity criterion (as indicated by dark circles), they become inputs for the second layer. Now, for 3 inputs, 3C2 = 3 pairs are generated, i.e. (y11, y13), (y11, y15) and (y13, y15) respectively. The outputs of the second layer are given by y21, y22 and y23. Assuming that y23 fails the regularity criterion, y21 and y22 are passed on to the third layer as inputs, whose output y31 is determined as the best polynomial. Here, 4 inputs have been taken for simplicity. The same exercise can be done for any number of inputs until only one output passes the regularity criterion. The total number of layers in this network is three, with 6, 3 and 1 nodes respectively, which have been determined automatically using the regularity criterion as the algorithm proceeds. It is also important to note that with four inputs (x1, x2, x3, x4), the complete fourth order polynomial with terms of all powers would have had a total of 70 terms. Hence, determining a fourth order polynomial fit would have involved a simultaneous estimation of 70 parameters whereas GMDH involves calculation of only 6 parameters at a time as per the Ivakhnenko polynomial. This saves computation time and makes GMDH preferable over other techniques for solving large dimensional problems when the data sequence is comparatively short. The next section discusses the application of this GMDH algorithm to COVID-19 forecasting.
Fig. 2
GMDH network architecture for four inputs.
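The counts quoted in this example can be checked with binomial coefficients. The short sketch below uses Python's standard `math.comb`; the helper names `nodes_in_layer` and `full_polynomial_terms` are hypothetical and introduced only for illustration. A complete polynomial of degree d in n variables has C(n + d, d) terms, which gives 70 for n = d = 4 and 6 for the two-input quadratic of Eq. (2).

```python
from math import comb

def nodes_in_layer(n_inputs):
    """Number of input pairs, i.e. nodes, in a GMDH layer: nC2."""
    return comb(n_inputs, 2)

def full_polynomial_terms(n_vars, degree):
    """Number of terms of a complete polynomial of the given degree."""
    return comb(n_vars + degree, degree)

print(nodes_in_layer(4))            # 6 nodes in the first layer
print(nodes_in_layer(3))            # 3 nodes in the second layer
print(nodes_in_layer(2))            # 1 node in the third layer
print(full_polynomial_terms(4, 4))  # 70 parameters estimated at once
print(full_polynomial_terms(2, 2))  # 6 parameters per Ivakhnenko polynomial
```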
Application & results
Ever since the outbreak of the global COVID-19 pandemic, medical researchers throughout the world have been working on the development of a vaccine, but to date COVID-19 remains incurable and social distancing has been identified as the only preventive measure. As of 30/06/2020, India has reported 5,66,840 confirmed novel coronavirus cases with 3,34,822 recoveries and 16,893 deaths, as shown in Fig. 3 [58]. India is home to the world's second largest population and has a significantly higher population density than USA, Brazil, Russia, UK and Italy, the countries most adversely affected by coronavirus. Countries like USA, UK and Italy reported the highest number of deaths due to COVID-19 [7], in spite of having the world's best medical infrastructure. Learning from this challenging situation in most of the developed nations, India took a proactive decision amongst the South East Asian countries even when the roots of the pandemic were not so deep in the region. To break the infection chain in its very early stages, the Government of India announced a complete lockdown across the nation for 21 days from 25 March 2020 to 14 April 2020. It was historical in the sense that 1.3 billion people were home isolated, with a complete shutdown of all social activities except the essential services. With an increasing number of positive cases, the government further extended the lockdown from 15 April 2020 to 3 May 2020. Notably, the confirmed cases rose from 562 at the beginning of lockdown I to 39,980 by 3 May 2020. Giving importance to the high recovery rate and the deteriorating economy, the government decided to impose Lockdown III from 4 May 2020 to 17 May 2020 with suitable relaxations, which was further extended till 31 May 2020 as lockdown IV. Section 4.1 describes how the time series of Lockdown III-IV is statistically different from that of lockdown I-II and how the GMDH model accurately predicts the number of active and confirmed cases during Lockdown III-IV in spite of these differences.
Fig. 3
Total confirmed, recovered and deceased COVID-19 cases growth in India.
Effect of relaxation in lockdown III-IV
After bearing a 40 day complete shutdown of all social and economic activities, the government announced lockdown III with some relaxation in norms. These norms granted some allowances to the public, like permitting them to leave their homes between 7 AM and 7 PM and use vehicles with 50% occupancy. Public and private sector offices were also allowed to resume functioning with one third of their employees, and so were shops and industries, while strictly following the norms of social distancing. All the districts were divided into red, orange, green and containment zones as per the number of cases in the region. The government imposed huge fines for not wearing masks in public places and for not following social distancing guidelines, and even enforced a night curfew throughout the country, but a significant increase in positive cases was still observed during lockdown III-IV. The first COVID-19 case was reported in India on 30th January 2020 in Kerala. Of the 1,82,143 confirmed positive cases reported till 31st May 2020, around 50% (91,216) were reported within the mere 14 days of lockdown IV, whereas it took 40 days for 21.67% (39,488) of these to be reported during lockdown I and II, as shown in Fig. 4. As shown in Table 1, the daily average number of reported cases also increased more than fourfold from lockdown II to lockdown IV. Along with the increase in average recovery rate, the decrease in average growth rate of daily cases was also an important parameter for the government in partially removing the lockdown. As visible from Table 1, the average growth rate decreased to 5.09% (lockdown IV) from 15.73% (lockdown I), but the percentage decrease in growth rate in going from lockdown II to III was merely 17.80% whereas it was 53.11% in going from lockdown I to II. Apart from the relaxation in isolation norms, there were two more important reasons for the sudden increase in the number of positive cases: increased testing capacity and migration of laborers. As per the daily bulletin published by the Indian Council of Medical Research (ICMR) [59], the total samples tested during lockdown III were almost equivalent to the total samples tested during the first two lockdowns, as shown in Fig. 5 and Table 1. Although the tests performed per million population in India are still far fewer than those performed by the countries above India in the tally of total confirmed cases, the testing capabilities have been strengthened by setting up more labs during the lockdown period and revising testing norms.
Fig. 4
Distribution of total cases reported in four lockdowns.
Table 1
Impact of lockdown I, II, III & IV.
| Parameters | Lockdown I (25/03/20 – 14/04/20), 21 days | Lockdown II (15/04/20 – 03/05/20), 19 days | Lockdown III (04/05/20 – 17/05/20), 14 days | Lockdown IV (18/05/20 – 31/05/20), 14 days |
|---|---|---|---|---|
| Average daily cases | 470 | 1559 | 3639 | 6515 |
| Average growth rate (%) | 15.72 | 7.37 | 6.05 | 5.09 |
| Total cases reported | 9871 | 29,617 | 50,947 | 91,216 |
| Total samples tested | 2,22,199 | 8,01,557 | 11,81,192 | 15,09,385 |
| Total recoveries | 1013 | 9596 | 23,477 | 52,874 |
| Average recovery rate (%) | 7.47 | 19.63 | 31.32 | 41.93 |
| Total deaths | 332 | 962 | 1571 | 2292 |
| Average death rate (%) | 2.62 | 3.22 | 3.28 | 2.95 |
| Average doubling time (days) | 4.68 | 8.86 | 11.28 | 13.64 |
| Average growth rate of doubling time (%) | 3.02 | 3.61 | 0.67 | 1.13 |
Fig. 5
Total number of samples tested from Lockdown I to Lockdown IV.
Further, the mass movement of internal migrants across the nation also emerged as an important cause of virus transmission. As per a report published by the World Bank [60], around 40 million internal migrants were affected by the lockdown. After the first lockdown, with a complete halt on transportation services, an increasingly large number of migrants started heading towards their native places on foot. Hence, the government decided to run special trains and buses to regulate the interstate transfer of migrants from 1st May 2020 onwards. As per the PIB bulletin of 28th May 2020, a total of 3543 'Shramik Special' trains took 48 lakh migrants back to their home states from different parts of the country in the span of 26 days (01/05/2020 to 26/05/2020) during lockdown III-IV [61]. The migrant inflow resulted in the infection spreading to rural areas and to districts which were earlier in the green zone. All these reasons resulted in tremendous growth in the number of cases post 3rd May 2020. As visible from Fig. 6, till the end of lockdown IV, the highest spike in the number of daily reported cases since the outbreak was observed on 31st May 2020, with 8380 cases reported in a single day.
Fig. 6
Number of daily reported cases from Lockdown I to Lockdown IV.
The last parameter used to assess the effect of relaxation in lockdown is the doubling time, which has been used worldwide by statisticians to predict the growth of coronavirus. Doubling time is the time taken by the number of infections to double from a given day. The doubling time for India on 24/03/2020, before imposition of the first lockdown, was 3.4 days. It has been quoted at several places that at this rate, without lockdown, India would have surpassed 100,000 cases by the end of April, a mark it actually reached on 19/05/2020. Undoubtedly, as per Table 1, the average doubling time increased over the four editions of lockdown, but it can be easily noticed from Fig. 7 that during lockdown I and II the doubling time showed a very steady growth whereas during lockdown III it remained nearly constant at around 11 days [62]. Also, it is important to note that the average growth rate of doubling time decreased drastically from 3.61% to 0.67% in going from lockdown II to lockdown III, when restrictions were partially removed. This analysis concludes the factual description of COVID-19 growth in India due to relaxation in lockdown III-IV. After four stages of lockdown, the government has finally started to unlock the nation in phases from 01/06/2020. The night curfew has been reduced to 10 PM to 5 AM, travel restrictions have been lifted and economic activities have been allowed in the entire nation except for containment zones. All the time series used in this section (total confirmed cases, total deaths, total recovered cases and daily reported cases) have been formulated using data available from the daily COVID-19 bulletin published by the Press Information Bureau (PIB) of India [63], the daily situation reports published by the World Health Organization [64] and the website of the Ministry of Health and Family Welfare, Government of India [65].
Fig. 7
Variation of doubling time from Lockdown I to Lockdown IV.
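The doubling time discussed above can be estimated from two cumulative counts. The following is a minimal Python sketch, assuming exponential growth between the two observations, so that T_d = t · ln 2 / ln(N_end / N_start). This estimator and the function name `doubling_time` are assumptions made for illustration; the exact procedure behind the values quoted in the text is not specified here. The two counts used in the example are the confirmed cases listed in Table 3 for 04/05/2020 and 16/05/2020.

```python
import math


def doubling_time(count_start, count_end, days_elapsed):
    """Estimate the doubling time (in days) from two cumulative counts.

    Assumes exponential growth between the two observations:
    T_d = days_elapsed * ln(2) / ln(count_end / count_start).
    """
    if count_start <= 0 or days_elapsed <= 0:
        raise ValueError("count_start and days_elapsed must be positive")
    if count_end <= count_start:
        raise ValueError("count_end must exceed count_start")
    return days_elapsed * math.log(2) / math.log(count_end / count_start)


# Confirmed cases from Table 3: 42,533 on 04/05/2020 and 85,940 on 16/05/2020,
# i.e. 12 days apart.
td_lockdown3 = doubling_time(42_533, 85_940, days_elapsed=12)
print(f"Estimated doubling time: {td_lockdown3:.1f} days")
```

For these two dates the estimate comes out at roughly 12 days, which is of the same order as the nearly constant doubling time of around 11 days noted for lockdown III.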
Active cases and GMDH model validation
Although various online COVID dashboards and the media primarily report the confirmed, recovered and deceased cases, the number of active cases is the most important parameter to analyze the impact of coronavirus and the quality of healthcare services in a country. Active cases are defined as the number of cases which remain after subtracting the recovered and deceased patients from the total confirmed positive cases on a given date. Although India has crossed the 5,00,000 mark in terms of total confirmed COVID-19 cases, with a recovery rate of 59.06% as on 30th June 2020, it has managed to keep the number of active cases in check by imposing lockdown in the early stages of infection. With 5,66,840 total confirmed cases, India had a total of 2,15,125 active cases on that date. The number of active cases helps the government to mark containment zones and monitor the growth of the virus in a particular region. The active cases time series has been formulated using the time series of confirmed, recovered and deceased cases.
As explained in subsection 4.1, lockdown III-IV were quite different from lockdown I and II in terms of both impact and statistics. The complete dataset for total active cases in India till lockdown IV has therefore been divided into two sets, time series 1 and time series 2. Time series 1 contains 94 observations of total active cases from 31/01/2020 to 03/05/2020 (lockdown I-II), whereas time series 2 consists of 28 observations from 04/05/2020 to 31/05/2020 (lockdown III-IV). As mentioned before, GMDH is a highly self organized data mining technique which extracts the correlation between data samples without any prior knowledge about the time series and enhances the forecasting accuracy. The GMDH model is applied to time series 1 to forecast time series 2 so as to validate and standardize the model. Time series 1, with 94 samples, is used for fitting the model and time series 2, with 28 observations, is used as the validation set.
Three training to testing set ratios (60/40, 70/30 and 80/20) were used for making a 5-day-ahead forecast of the number of total active cases reported during lockdown III-IV. Amongst these three, the model with a training to testing ratio of 70/30 gave the most accurate results, which are shown in Table 2 and Fig. 8. The Mean Absolute Percentage Error (MAPE) has been used as the accuracy criterion. The GMDH model forecasted the active cases with a MAPE of 2.58%. Following the same division of dataset and forecasting horizon, the total number of confirmed cases has also been forecasted for lockdown III-IV with a MAPE of 2.00%, which depicts the efficiency of GMDH as a nonlinear forecaster. The results are shown in Table 3 and Fig. 9.
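The construction of the active cases series and the division of the dataset described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, the single-day counts are made-up numbers, and the way the 70/30 ratio is turned into a cut point (truncating to an integer, oldest samples used for training) is an assumption.

```python
from datetime import date


def active_cases(confirmed, recovered, deceased):
    """Active cases = total confirmed - recovered - deceased."""
    return confirmed - recovered - deceased


def n_daily_observations(start, end):
    """Number of daily observations between two dates, both inclusive."""
    return (end - start).days + 1


def chronological_split(series, train_fraction):
    """Split a series into a leading training part and a trailing testing part."""
    cut = int(len(series) * train_fraction)  # assumed rounding rule
    return series[:cut], series[cut:]


# Made-up cumulative counts for a single day, for illustration only.
example_active = active_cases(confirmed=1000, recovered=550, deceased=30)

# Date ranges quoted in the text for time series 1 and time series 2.
n_series_1 = n_daily_observations(date(2020, 1, 31), date(2020, 5, 3))
n_series_2 = n_daily_observations(date(2020, 5, 4), date(2020, 5, 31))

# Placeholder values standing in for the samples of time series 1.
placeholder_series = list(range(n_series_1))
train, test = chronological_split(placeholder_series, 0.70)

print(n_series_1, n_series_2, len(train), len(test))
```

The date arithmetic reproduces the 94 and 28 observations stated for the two time series. Under the assumed rounding rule, a 70/30 split of the 94 fitting samples gives 65 training and 29 testing samples.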
Table 2
Actual total active cases vs GMDH model forecast for Lockdown III-IV.
Date          Actual active cases    GMDH model prediction
04-05-2020    29,454                 29,802
05-05-2020    32,139                 31,310
06-05-2020    33,514                 33,384
07-05-2020    35,903                 35,744
08-05-2020    37,916                 38,416
09-05-2020    39,834                 40,749
10-05-2020    41,473                 43,219
11-05-2020    44,029                 44,768
12-05-2020    46,008                 45,515
13-05-2020    47,480                 47,523
14-05-2020    49,219                 49,838
15-05-2020    51,401                 52,074
16-05-2020    53,038                 54,314
17-05-2020    53,946                 56,359
18-05-2020    56,316                 56,438
19-05-2020    58,802                 59,300
20-05-2020    61,149                 61,470
21-05-2020    63,625                 63,638
22-05-2020    66,331                 66,482
23-05-2020    69,598                 68,357
24-05-2020    73,561                 71,437
25-05-2020    77,104                 72,267
26-05-2020    80,723                 74,184
27-05-2020    83,004                 76,651
28-05-2020    86,111                 78,793
29-05-2020    89,987                 82,595
30-05-2020    86,423                 85,568
31-05-2020    89,996                 88,620
Fig. 8
Plot of actual active cases vs GMDH forecast for lockdown III-IV.
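The accuracy criterion can be checked directly against the tabulated values. The sketch below assumes the standard definition MAPE = (100/n) Σ |A_t − F_t| / A_t, where A_t is the actual value and F_t the forecast; the function name `mape` is illustrative, and the two lists are the actual and forecasted active cases of Table 2.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / a for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Actual active cases, 04/05/2020 to 31/05/2020 (Table 2).
actual_active = [
    29454, 32139, 33514, 35903, 37916, 39834, 41473,
    44029, 46008, 47480, 49219, 51401, 53038, 53946,
    56316, 58802, 61149, 63625, 66331, 69598, 73561,
    77104, 80723, 83004, 86111, 89987, 86423, 89996,
]

# GMDH model predictions for the same dates (Table 2).
gmdh_forecast = [
    29802, 31310, 33384, 35744, 38416, 40749, 43219,
    44768, 45515, 47523, 49838, 52074, 54314, 56359,
    56438, 59300, 61470, 63638, 66482, 68357, 71437,
    72267, 74184, 76651, 78793, 82595, 85568, 88620,
]

table2_mape = mape(actual_active, gmdh_forecast)
print(f"MAPE over {len(actual_active)} days: {table2_mape:.2f}%")
```

Applied to the 28 rows of Table 2, this gives a value of about 2.6%, in line with the reported MAPE of 2.58%. Most of the error comes from the last week of May, where the forecast falls below the actual counts.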
Table 3
Actual confirmed cases vs GMDH model forecast for Lockdown III-IV.
Date          Actual confirmed cases    GMDH model prediction
04-05-2020    42,533                    42,063
05-05-2020    46,433                    44,696
06-05-2020    49,391                    47,759
07-05-2020    52,952                    50,664
08-05-2020    56,342                    53,804
09-05-2020    59,662                    57,231
10-05-2020    62,939                    60,828
11-05-2020    67,152                    64,483
12-05-2020    70,756                    67,947
13-05-2020    74,281                    71,540
14-05-2020    78,003                    76,059
15-05-2020    81,970                    80,471
16-05-2020    85,940                    84,924
17-05-2020    90,927                    89,890
18-05-2020    96,169                    97,143
19-05-2020    101,139                   101,187
20-05-2020    106,750                   106,559
21-05-2020    112,359                   114,454
22-05-2020    118,447                   120,136
23-05-2020    125,101                   125,202
24-05-2020    131,868                   132,717
25-05-2020    138,845                   138,982
26-05-2020    145,380                   146,426
27-05-2020    151,767                   153,736
28-05-2020    158,333                   161,228
29-05-2020    165,799                   168,571
30-05-2020    173,763                   176,861
31-05-2020    182,143                   183,860
Fig. 9
Plot of actual confirmed cases vs GMDH forecast for lockdown III-IV.
Forecast for number of active cases
As mentioned in subsection 4.2, the number of active cases gives the most accurate status of pandemic growth in the country. Hence, a good medium term forecast of the number of active cases can help the government in proportionally scaling up the testing facilities, arranging the medical infrastructure (personal protective equipment (PPE), ventilators, oxygen support, isolation wards, etc.) and taking important strategic and economic decisions for the near future. For example, on 23/01/2020, there was just one COVID-19 testing lab in India. As of 30/06/2020, 761 government and 288 private labs have been steadily set up as per the growth of coronavirus in different parts of the country [58].
India presently has 1039 dedicated COVID Hospitals and 2398 dedicated COVID Health Centers with 3,15,758 isolation beds, 34,479 ICU beds and 1,28,589 oxygen supported beds. Moreover, 8958 COVID Care Centers with 8,10,621 beds have also been operationalized to combat COVID-19 in the country. There are 21,494 ventilators available for COVID beds [66,67]. The centre has also allocated Rs. 2000 crore from PM CARES Fund Trust for supply of 50,000 ‘Made-in-India’ ventilators to government run COVID hospitals in all states [68].
As quoted by the Health Ministry on 30th May 2020 [69], 2.55% of active COVID-19 patients are in ICU, 0.48% are on ventilators and 1.96% are on oxygen support. Therefore, taking these numbers as reference and taking the active case time series data till the end of lockdown, i.e., 31st May 2020, the GMDH model has been used to forecast the number of active cases for the next seven months starting from 01/06/2020. Although the recovery rate has increased since 31st May 2020, assuming the worst case scenario and taking the figures revealed by the Health Ministry as standard, the number of required ICU beds, ventilators and oxygen support systems has been tabulated in Table 4 as per the active cases forecasted by the GMDH model. The numbers for the month of June give an extra validation to the GMDH forecasts for the next six months. For instance, the active cases forecasted by the GMDH model for 30/06/2020 were 2,13,130 (Table 4), whereas the actual number of reported active cases on 30/06/2020 was 2,15,125.
Table 4
Medical requirements as per forecasted active cases.
Date          No. of active cases (x)    ICU beds (2.55% of x)    Ventilators (0.48% of x)    Oxygen support (1.96% of x)
15/06/2020    149,238                    3806                     716                         2925
30/06/2020    213,130                    5435                     1023                        4177
15/07/2020    272,505                    6949                     1308                        5341
30/07/2020    329,801                    8410                     1583                        6464
15/08/2020    385,543                    9831                     1851                        7557
30/08/2020    439,101                    11,197                   2108                        8606
15/09/2020    497,438                    12,685                   2388                        9750
30/09/2020    552,410                    14,086                   2652                        10,827
15/10/2020    601,848                    15,347                   2889                        11,796
30/10/2020    645,527                    16,461                   3099                        12,652
15/11/2020    679,947                    17,339                   3264                        13,327
30/11/2020    712,512                    18,169                   3420                        13,965
15/12/2020    739,607                    18,860                   3550                        14,496
30/12/2020    773,558                    19,726                   3713                        15,162
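The requirement columns of Table 4 follow from applying the three percentages quoted by the Health Ministry to the forecasted active cases. A minimal Python sketch of this arithmetic is given below. The names are illustrative, and rounding to the nearest integer is an assumption that reproduces the tabulated values for the rows checked here.

```python
# Shares of active patients quoted by the Health Ministry on 30th May 2020.
ICU_PCT = 2.55
VENTILATOR_PCT = 0.48
OXYGEN_PCT = 1.96


def medical_requirements(active_cases):
    """Return (ICU beds, ventilators, oxygen supported beds) for an active case count."""
    shares = (ICU_PCT, VENTILATOR_PCT, OXYGEN_PCT)
    # Rounding to the nearest integer is an assumed convention.
    return tuple(round(active_cases * pct / 100) for pct in shares)


# Forecasted active cases for three of the dates in Table 4.
selected_forecasts = {
    "15/06/2020": 149_238,
    "30/06/2020": 213_130,
    "30/12/2020": 773_558,
}

requirements = {day: medical_requirements(x) for day, x in selected_forecasts.items()}
for day, (icu, ventilators, oxygen) in requirements.items():
    print(day, icu, ventilators, oxygen)
```

For the forecast of 7,73,558 active cases on 30/12/2020, this gives 19,726 ICU beds, 3713 ventilators and 15,162 oxygen supported beds, matching the last row of Table 4.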
With successive unlocking of the country after every lockdown and the unavailability of a vaccine, the GMDH forecasts suggest that the novel coronavirus is here to stay for a long time. However, the model predicts that the growth rate of active cases will slow down with time, which is consistent with the steady growth in recovery rate since the lockdown ended on 31/05/2020. The model suggests that the active cases in the country will reach 2 lakh by the end of June, but that it will take about 1.5 more months to reach the 4 lakh mark and about 2 more months after that to reach 6 lakh. The model, as shown in Fig. 10, suggests that there would be a total of 7,73,558 active COVID-19 cases by the end of the year 2020. The fortnightly requirements of ICU beds, ventilators and oxygen support systems have been tabulated in Table 4 as per the forecasted active cases. With limited data and the limitations of data science, it is impossible for any model to accurately predict the number of people affected by COVID-19, but a forecasted range of affected people may keep the medical fraternity and the government a step ahead in fighting the pandemic.
Fig. 10
Number of active cases as predicted by GMDH model.
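The milestones and the slowing growth described above can be read off the fortnightly forecasts of Table 4. The following Python sketch is one way to do so; the helper names `first_crossing` and `fortnightly_increments` are hypothetical, and since only the tabulated dates are available, each milestone is located only to the nearest tabulated fortnight.

```python
# Forecasted active cases from Table 4 (date, number of active cases).
table4_forecast = [
    ("15/06/2020", 149_238), ("30/06/2020", 213_130),
    ("15/07/2020", 272_505), ("30/07/2020", 329_801),
    ("15/08/2020", 385_543), ("30/08/2020", 439_101),
    ("15/09/2020", 497_438), ("30/09/2020", 552_410),
    ("15/10/2020", 601_848), ("30/10/2020", 645_527),
    ("15/11/2020", 679_947), ("30/11/2020", 712_512),
    ("15/12/2020", 739_607), ("30/12/2020", 773_558),
]


def first_crossing(series, threshold):
    """Return the first tabulated date whose forecast is at or above threshold."""
    for day, value in series:
        if value >= threshold:
            return day
    return None  # threshold not reached within the forecast horizon


def fortnightly_increments(series):
    """Return the change in forecasted active cases between consecutive dates."""
    return [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]


for lakh in (2, 4, 6):
    print(f"{lakh} lakh reached by:", first_crossing(table4_forecast, lakh * 100_000))

increments = fortnightly_increments(table4_forecast)
print("Fortnightly increments:", increments)
```

With the Table 4 values, the first tabulated dates at or above 2, 4 and 6 lakh are 30/06/2020, 30/08/2020 and 15/10/2020 respectively. The fortnightly increments fall from about 64 thousand at the start of the horizon to about 34 thousand at its end, although not monotonically.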
Conclusion
Several epidemiological and soft computing models have been used by researchers all over the world to forecast the quantitative growth of novel coronavirus across different regions, but the short length and nonlinearity exhibited by the time series of COVID cases have been a persistent challenge. In this paper, an applied soft computing technique has been used to predict the number of active COVID-19 patients and ventilator requirements in India for the upcoming months of 2020 by considering the available data published up to 30/06/2020 by the Ministry of Health, Government of India. The country has shown considerable growth in recovery rate till now, and the death rate is also low as compared to the most severely affected nations worldwide, but it has entered the top five nations in terms of number of confirmed cases and is presently at an alarming fourth position worldwide. Therefore, with the world's second largest population and a rising number of cases, these GMDH predictions can guide the healthcare system in terms of preparedness and management of apparatus, particularly ventilators. The sequence of lockdowns has prevented the growth from becoming exponential, but it has not been able to bring a decrease in the number of daily reported cases. Every lockdown has witnessed a successive rise in the average number of daily reported cases, and 19,458 cases were reported on 29/06/2020. Hence, these predictions are very important as the country is getting ready to fully unlock in phases.
Apart from forecasting active cases, the effect of initial relaxations by the government and the impact of partial unlocking of the nation during lockdown III and lockdown IV have also been covered in this paper. An in-depth numerical analysis has been carried out in terms of standard COVID parameters like doubling time, average daily reported cases, average growth rate, total samples tested, recoveries and deaths to elaborate the impact of lockdown III and IV in increasing the COVID spread as compared to lockdown I and II. The degree of statistical variation between the time series of the four lockdowns has been explained, and their forecasting has then been used to standardise the GMDH model. The model's accuracy in matching the actual number of cases for lockdown III-IV supports its reliability for the active cases forecasted for the next six months.
The lockdown certainly gave preparation time to the government for boosting the medical infrastructure and limiting the growth rate of the virus, but no economy, whether developing or developed, can afford a lockdown for a long time. Lockdown V, which is better called unlock I, was certainly a necessary event, but the unavoidable risk to which it has exposed the citizens of the nation cannot be overlooked. Applied soft computing and these active case forecasts are one of the many ways in which one can assist the government to combat this risk. Moreover, USA and China have reported success in conducting preliminary trials for a coronavirus vaccine, but even if they succeed in trials on larger populations in the next stages, which hopefully they will, it will take a minimum of six months to one year for the vaccine to reach markets. Hence, a prediction over a period of six months has been made using GMDH in terms of active cases. The present trend and the predictions both suggest that the novel coronavirus is a problem we will have to live with for a longer period of time, and today prevention is not just better than cure, it is the only measure.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Tereza Brenda Clementino de Freitas; Rafaella Cristina Tavares Belo; Sabrina Mércia Dos Santos Siebra; André de Macêdo Medeiros; Teresinha Silva de Brito; Sonia Elizabeth Lopez Carrillo; Israel Junior Borges do Nascimento; Sidnei Miyoshi Sakamoto; Maiara de Moraes Journal: Nepal J Epidemiol Date: 2022-06-30