Machine learning techniques to detect and forecast the daily total COVID-19 infected and deaths cases under different lockdown types.

Tanzila Saba¹, Ibrahim Abunadi¹, Mirza Naveed Shahzad², Amjad Rehman Khan¹.

Abstract

COVID-19 has impacted the world in many ways, including loss of lives, economic downturn, and social isolation. The disease is caused by SARS-CoV-2, a highly infectious virus. Every country tried to control the spread of COVID-19 by imposing different types of lockdown. There is therefore an urgent need to forecast the daily confirmed infected cases and deaths under different types of lockdown, so that the most appropriate lockdown strategy can be selected to control the intensity of the pandemic and reduce the burden on hospitals. Three types of lockdown (partial, herd, and complete) are currently imposed in different countries. In this study, three countries from each lockdown type were studied by applying time-series and machine learning models, namely random forests, K-nearest neighbors, SVM, decision trees (DTs), polynomial regression, Holt winter, ARIMA, and SARIMA, to forecast the daily confirmed infected cases and deaths due to COVID-19. The models' accuracy and effectiveness were evaluated using three error-based performance criteria. No single forecasting model could capture the trends of all data sets, owing to the varying nature of the data sets and lockdown types. The three top-ranked models were used to predict the confirmed infected cases and deaths; the best-performing models were also adopted for out-of-sample prediction and obtained results very close to the actual cumulative infected cases and deaths due to COVID-19. This study proposes promising models for forecasting and identifies the best lockdown strategy to mitigate the casualties of COVID-19.

Keywords: COVID-19; healthcare; lockdown; lungs infection; machine learning models; public health; time series

Year: 2021 PMID: 33522669 PMCID: PMC8014446 DOI: 10.1002/jemt.23702

Source DB: PubMed Journal: Microsc Res Tech ISSN: 1059-910X Impact factor: 2.893

INTRODUCTION

Lung infection caused by the coronavirus outbreak that began at the end of 2019 has affected more than 57 million people around the globe, with more than 1.36 million deaths. The outbreak originated in Wuhan, in China's Hubei province, and spread so quickly throughout the world that the WHO declared it a pandemic shortly after. Infectious disease transmission is a complex process that takes place from human to human. In addition to its other symptoms, this infection has highly destructive effects on the lungs, causing a breakdown of the respiratory system that may lead to death (Khan et al., 2021). Every country tried to handle the pandemic through a different strategy: partial or complete lockdown to stop the spread of the virus, or herd immunity to build sufficient resistance among the population to overcome the infection through the body's immune system. Hence, there is a need for predictive models that analyze and assess the mechanism by which infectious diseases propagate and that can accurately predict their future patterns for humanity's welfare. The basic objective of these models is to classify the behavior of affected cases to minimize the harm caused by a coronavirus (Rehman, Sadad, Saba, Hussain, & Tariq, 2021b). Machine learning techniques play a significant role in infection detection and prediction (Perveen et al., 2020; Yousaf et al., 2020). Trained techniques can process big data at high speed to find infection cases and trends and to warn decision-makers (Sadad, Munir, Saba, & Hussain, 2018; Saba, Haseeb, et al., 2020; Saba, Mohamed, et al., 2020; Ullah et al., 2019). In their review study, Long and Ehrenfeld (2020) claimed that prediction through artificial intelligence methods might reduce the effects of this pandemic crisis.
Accordingly, several automatic infection-detection classifiers and forecasting models with different scopes are reported in the literature (Saba, Bokhari, Sharif, Yasmin, & Raza, 2018; Saba, Khan, et al., 2019; Mashood Nasir, et al., 2020). Forecasting methods can be divided into statistical, machine learning (ML), and deep learning methods (Mughal, Muhammad, Sharif, Rehman, & Saba, 2018; Mughal, Sharif, Muhammad, & Saba, 2018; Phetchanchai, Selamat, Saba, & Rehman, 2010; Saba, Rehman, & AlGhamdi, 2017). One recently proposed machine learning solution used an infection size-aware random forest (iSARF) to detect the infection size and the affected lung areas. Multilayer perceptron (MLP) and adaptive network-based fuzzy inference system (ANFIS) models have been used to estimate and forecast dynamic variance behaviors. A hybrid of support vector regression (SVR) and ARIMA was proposed to estimate the number of infected persons in a country from the verified cases (Al-Ameen et al., 2015; Sadad et al., 2021). Parbat and Chakraborty (2020) used an RBF kernel model to forecast daily cases, recoveries, and deaths. Deep learning approaches also play a crucial role in detecting infection and forecasting trends in large outbreak data, which has helped to limit coronavirus spread through early warning (Rehman, Saba, Ayesha & Tariq, 2021c). COVID-19 data are time series, and the use of sequential models to handle their complexity is broadly supported. Bandyopadhyay and Dutta (2020) proposed recurrent neural network (RNN) and long short-term memory (LSTM) networks to predict confirmed, negative, and death cases of COVID-19. Huang, Chen, Ma, and Kuo (2020) estimated the total reported cases of COVID-19 using a deep learning (DL)-based convolutional neural network (CNN). However, the main issue with deep learning approaches is that training requires a huge amount of labeled data, which is hard to obtain for a particular community, and the resulting models cannot be generalized because the nature of the infection differs around the globe.
In this context, the main aim of this research is to explore and compare the capacity of time-series analysis and machine learning models to predict the daily cumulative confirmed infected cases (CIC) and deaths under different types of lockdown. The main contributions of the research are: (a) determining the infected and death cases due to COVID-19; (b) concluding which type of lockdown was more effective than the others and under which strategy the predictions were best; and (c) predicting possible arrangements and revisions of the lockdown policy in certain cases. The rest of the paper is organized as follows. Section 2 briefly describes the publicly available data sets. Section 3 presents the materials and methods. Section 4 discusses the performance evaluation criteria, Section 5 presents the results and discussion, and Section 6 concludes the research.

DATA SETS DESCRIPTION

Standard data sets are important for training classifiers and comparing results with state-of-the-art techniques (Lung, Salam, Rehman, Rahim, & Saba, 2014; Rad, Rahim, Rehman, & Saba, 2016). To protect as much of their population as possible from COVID-19, countries implemented various policies. They imposed different lockdown types (partial, herd, and complete) to reduce people's social activities and movements and so create social distancing. In this study, three countries from each lockdown type were considered for the prediction of incidences. Time-series data sets of cumulative confirmed cases and cumulative deaths due to COVID-19 were collected from https://github.com/CSSEGISandData for nine countries, namely India, Iran, Hubei (China), Iceland, Sweden, Netherland, Russia, Bulgaria, and Greece, for the period from January 22, 2020, to September 30, 2020. Figure 1 highlights the nine selected countries and the type of lockdown in each. To compare the cases in a standardized way, the cumulative confirmed cases per million and deaths per million of each country are plotted in Figures 2 and 3. All analyses, however, were performed using the cumulative CIC and deaths.
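The per-million scaling used for Figures 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the numbers are hypothetical and are not taken from the data set.

```python
def per_million(cumulative_counts, population):
    """Scale cumulative counts to counts per one million inhabitants."""
    return [count * 1_000_000 / population for count in cumulative_counts]


# Hypothetical values for illustration only.
example_cases = [120, 450, 900]
example_population = 2_000_000
scaled = per_million(example_cases, example_population)
```

Dividing by population makes curves from countries of very different size comparable on one axis, which is why the figures use it while the models are fitted to the raw cumulative counts.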

FIGURE 1

The nine selected countries, three each from the partial, herd, and complete lockdown types

FIGURE 2

Day by day number of cumulative confirmed cases per million population in each of the nine countries

FIGURE 3

Day by day number of cumulative deaths per million population in each of the nine countries

MATERIALS AND METHODS

Machine learning and time-series techniques have proven effective in predicting and controlling several issues such as infectious diseases, floods, earthquakes, and so forth (Nodehi et al., 2014; Rehman, 2020). This section describes the machine learning models applied in this study to predict the daily cumulative CIC and deaths due to COVID-19. The framework of the study is presented in Figure 4.

FIGURE 4

Flowchart to forecast the daily cumulative infected and death cases due to COVID‐19

Random forests

The random forest (RF) is a supervised, robust, tree-based machine learning technique that is useful for classification and regression (Rehman et al., 2020). Its benefits include prediction speed, suitability for high-dimensional problems, and the ability to handle missing data and outliers. It builds a "forest" of many trees, which helps to obtain unbiased estimates. RF solves complex problems and maintains its accuracy even when the data are few and contain missing values (Breiman, 2001). It combines the outputs of multiple DTs, each trained on a random sample drawn from the original data set, to produce the final prediction; this method is called bootstrap aggregation, or bagging (Rehman et al., 2021a). The purpose of combining bagging with random feature selection is to reduce the correlation between trees without increasing the variance too much (Kuznetsova, Westenberg, Buchin, Dinkla, & van den Elzen, 2014). The bootstrapping step, which randomly draws multiple samples with replacement from the original data set, improves the accuracy of the prediction. Ribeiro, Da Silva, Mariani, and dos Santos Coelho (2020) used RF to forecast confirmed cases of COVID-19 in Brazil.
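The bagging step can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each tree is reduced to a one-split regression "stump" on a single feature, so the random feature selection of a full random forest is omitted, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the overall mean
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_bagged(forest, x):
    """Average the predictions of the individual trees."""
    return mean(predict_stump(stump, x) for stump in forest)
```

Averaging over trees fitted to different bootstrap samples is what smooths out the instability of any single tree; a library implementation would add deeper trees and per-split feature sampling.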

K‐nearest neighbors

The K-nearest neighbors (KNN) technique is a nonparametric method used for regression and classification (Jamal, Hazim Alkawaz, Rehman, & Saba, 2017). It is an instance-based learner. In time-series analysis, KNN searches the given data set for the k past sequences of values most similar to the current one (the nearest neighbors), measured by Euclidean distance; here these are the nearest past COVID-19 values. The neighbors are the past cases with the smallest distance to the new case. In the present study, the CIC and deaths are forecasted with KNN, as da Silva, Ribeiro, Mariani, and dos Santos Coelho (2020) also did for COVID-19 cases.
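A minimal sketch of KNN forecasting on a univariate series is given below. It assumes fixed-length windows of past values as the feature vectors and a plain average of the neighbors' next values as the forecast; the window length, k, and the function name are illustrative choices, not settings reported in the study.

```python
import math


def knn_forecast(series, window=3, k=2):
    """One-step-ahead forecast: average what followed the k past windows
    closest (in Euclidean distance) to the most recent window."""
    query = series[-window:]
    candidates = []
    # Every past window that has a known next value.
    for start in range(len(series) - window):
        past = series[start:start + window]
        target = series[start + window]
        candidates.append((math.dist(past, query), target))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)
```

Because the forecast is an average of previously observed values, this raw-level variant cannot predict beyond the historical maximum; for a steadily rising cumulative series, one option is to apply it to daily differences instead.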

Support vector regression

The support vector machine (SVM) is a machine learning technique used for regression, classification, and time-series prediction. It has excellent generalization ability and is appropriate even for small data sets (Khan et al., 2019). Support vector regression (SVR) follows the same principle as SVM but with a continuous dependent variable instead of a categorical one. SVR uses kernels that map the data from a low-dimensional to a high-dimensional space, where the structure of the data is easier to capture. In the classification setting, the surfaces that separate the classes in the higher-dimensional space are called hyperplanes (Nobel, 2006). SVR training algorithms are mostly offline, but online algorithms are often used to automatically track time-varying changes and time-lag characteristics of the system model (Rehman, 2021). Online SVR algorithms have drawbacks, for example when the set of margin support vectors is empty, and their training is slow. The training time depends on the kernel function. The four popular kernels are linear, radial basis, polynomial, and sigmoid. The linear kernel is used for linearly separable distributions, the polynomial kernel for polynomially separable distributions, the radial kernel for circularly separable distributions, and the sigmoid kernel for special distributions. Parbat and Chakraborty (2020) also predicted the confirmed, recovered, and death cases by employing the SVR model.
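The four kernels named above can be written out in their commonly used forms. The sketch below is illustrative only; the parameter names (gamma, coef0, degree) and their defaults are assumptions borrowed from common library conventions, not values reported in the study.

```python
import math


def linear_kernel(u, v):
    """Plain dot product."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <u, v> + coef0) ** degree."""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2); depends only on the distance between points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * linear_kernel(u, v) + coef0)
```

Each function returns a similarity score for two input vectors; the SVR optimizer only ever sees these scores, which is how the choice of kernel changes the shape of the fitted curve without changing the training algorithm.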

ARIMA (p, d, q)

ARIMA stands for auto-regressive integrated moving average and is used for modeling and forecasting in time-series analysis. It deals with non-stationary time-series data by making it stationary. In ARIMA (p, d, q), the order p is the number of lagged values of the series that appear as independent variables, d is the order of differencing required to make the non-stationary series stationary, and q is the order of the moving average, that is, the number of lagged forecast errors that also appear as independent variables. The values of p and q are varied until the most suitable ARIMA model for modeling and prediction is found. In many studies, the ARIMA model has been employed and has given good forecasts of COVID-19 confirmed, recovered, and death cases for many countries.
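The role of the order d can be illustrated with a short sketch of differencing and its inverse. The helper names and the toy series are hypothetical; fitting the autoregressive and moving-average parts is left to a statistics library and is not shown.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undifference_once(last_value, diffs):
    """Rebuild levels from forecasted differences, starting at the last observed level."""
    levels = []
    current = last_value
    for delta in diffs:
        current += delta
        levels.append(current)
    return levels


# Toy cumulative series with accelerating growth (illustrative values only).
cumulative = [1, 4, 9, 16, 25, 36]
first_diff = difference(cumulative, d=1)    # daily increments
second_diff = difference(cumulative, d=2)   # change in the daily increments
```

In this toy case the second difference is constant, which is the kind of series a model with d = 2 (the order that appears in the ARIMA (3, 2, 2) models reported later) treats as stationary. Forecasts made on the differenced scale are turned back into cumulative counts by applying `undifference_once` d times.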

SARIMA (p, d, q)(P, D, Q)

The seasonal autoregressive integrated moving average (SARIMA) model is the seasonal extension of the ARIMA model (Szeto, Ghosh, Basu, & Mahony, 2009). The order (p, d, q) is the same as in the ARIMA model. The order P is the number of seasonal lag values, D is the order of seasonal differencing needed to make the series stationary, Q is the order of the seasonal moving average, and s is the length of the seasonal pattern. Comparative studies have shown that SARIMA models perform better than a simple ARIMA model if seasonality is present in the data (Chung & Rosalion, 2001). The seasonal pattern in a data series is identified from the autocorrelation function (ACF) and partial ACF (PACF) of the time series before analysis with SARIMA (Szeto et al., 2009).

Decision trees

DTs are supervised learning methods used for both classification and regression (Saba, 2021). They predict the value of the dependent variable by learning simple decision rules inferred from the features of the data (Khan et al., 2020). A DT algorithm starts from the root node and passes through multiple internal (split) nodes until it reaches a leaf. At each internal node of a binary tree, a test sends the data point either to the right or to the left child node, and this continues until the point ends up on the appropriate leaf. When the learning process is completed, the algorithm can be tested on unseen test data (Waheed, Alkawaz, Rehman, Almazyad, & Saba, 2016). Chi-squared Automatic Interaction Detection, Classification and Regression Trees, C4.5, and C5.0 are the most commonly used tree methods (Tso & Yau, 2007). DTs produce a model whose results can be represented as interpretable rules or logic statements, and they give clear information on the factors that are significant for classification and/or prediction.

Holt-Winters model

The Holt-Winters model, described by Chatfield and Yar (1988), forecasts future values from the past values of the series itself. It is beneficial for short-term prediction and contains three components: level, trend, and seasonal. It has additive and multiplicative forms, and the difference between them depends on the nature of the seasonal component. If the seasonal variation is roughly constant throughout the series, the additive model applies; if the variation changes in proportion to the level of the series, the multiplicative model applies.
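The additive form can be sketched with the standard level, trend, and seasonal update equations. This is a minimal illustration, not the fitted model from the study: the smoothing parameters alpha, beta, and gamma, the season length m, and the simple initialization from the first two seasons are all assumptions made for the example.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns `horizon` forecasts after the end of `series`.

    This simple initialization needs at least two full seasons (2 * m points).
    """
    first = series[:m]
    second = series[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)   # average per-step change
    seasonal = [y - level for y in first]          # one entry per position in the season

    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonal[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = gamma * (y - prev_level - prev_trend) + (1 - gamma) * s_prev

    n = len(series)
    # Forecast h steps ahead: extrapolate level and trend, add the matching seasonal term.
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, horizon + 1)]
```

In the multiplicative form the seasonal term scales the level instead of being added to it. In practice the smoothing parameters are usually chosen by minimizing the in-sample forecast error rather than set by hand.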

Polynomial regression

Regression based on the relationship between the dependent variable (y) and the powers of the independent variable (x) up to the rth degree is called polynomial regression. It fits a nonlinear relationship between the values of x and the corresponding conditional mean of y, denoted E(y | x). It has been used for nonlinear phenomena such as the distribution of isotopes in lake sediments, the growth rate of tissues, and the progression of disease epidemics (Sun, Liu, Zhou, & Li, 2014).
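A least-squares polynomial fit can be sketched in plain Python by solving the normal equations. The function names are hypothetical and the approach is chosen for readability only; for large x values or high degrees the normal equations become ill-conditioned, and a library routine such as `numpy.polyfit` is the more robust choice.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., br] of y = b0 + b1*x + ... + br*x**r.

    Requires at least degree + 1 distinct x values.
    """
    size = degree + 1
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]


def predict_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(b * x ** i for i, b in enumerate(coeffs))
```

For a time series, x is typically the day index, so predicting a future day means evaluating the polynomial outside the fitted range, where high-degree fits can diverge quickly.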

Gradient boosting regressor

Gradient boosting regressor (GBR) is a machine learning algorithm adopted for prediction and/or classification. It uses gradient descent to minimize a loss function and produces a prediction model in the form of an ensemble of weak prediction models. It has three elements: a loss function, a weak learner, and an additive model. The loss function is the quantity to be optimized, the weak learner makes the predictions, and the additive model adds weak learners one at a time to minimize the loss function. DTs of a fixed size are used as the weak learners. DTs can handle mixed types of data and can model complex functions. The advantages of GBR include high predictive power, support for different loss functions, and robustness to outliers in the output space.
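The three elements can be made concrete with a small sketch for the squared-error loss, where the negative gradient is simply the residual. It is an illustration under stated assumptions, not the configuration used in the study: the weak learner is a one-split regression stump on a single feature, and the number of rounds and the learning rate are arbitrary.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Weak learner: one-split regression tree (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbr(xs, ys, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean, then add stumps fitted to the residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_gbr(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Each round corrects a fraction of the remaining error, so the training error shrinks gradually; the learning rate trades the number of rounds needed against the risk of overfitting.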

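PERFORMANCE EVALUATION METRICS

The analysis was carried out in Python, and performance was assessed with three well-recognized error measures: mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE). In their standard form they are computed as

$$\mathrm{MAPE} = \frac{100}{n}\sum_{j=1}^{n}\left|\frac{A_j - P_j}{A_j}\right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|A_j - P_j\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(A_j - P_j\right)^{2}}$$

where n is the number of observations, and A_j and P_j are the jth actual and predicted values, respectively.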
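The three error measures can be sketched in plain Python using their standard definitions. The function names are illustrative, and MAPE is assumed to be expressed in percent.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

MAE and RMSE are in the units of the series, so they are not comparable across countries with very different case counts, whereas MAPE is scale-free. For the same set of forecasts RMSE is never smaller than MAE, which gives a quick sanity check on tabulated values.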

RESULTS AND DISCUSSION

The performance of the six machine learning and three time-series methods discussed in the Materials and Methods section was evaluated for predicting the cumulative CIC and deaths. The purpose of the predictive modeling was to better predict COVID-19 cases in the selected countries with respect to lockdown type. The daily time-series data sets of COVID-19 were split into training and testing sets with a ratio of 95%:5%, as Li and Chan (2017) and Azuaje (2003) obtained much better results with this ratio. The partition of the data sets into training and testing sets is presented in Figure 5. Every technique was applied to the training set to obtain the best-fitted model, which was then validated on the testing set; the results are tabulated. Nine models, ranging from simple to complex and from time-series to machine learning, were used to find the best forecasting model. The cumulative CIC and deaths were forecasted for the nine countries and three lockdown types. In total, we produced forecasts 54 times for daily data, but present graphically only the three best models per country for CIC and deaths, identified on the basis of the smallest MAPE, MAE, and RMSE.
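A chronological 95%:5% split can be sketched as follows. The function name and the rounding convention (truncating the training length to an integer) are assumptions; the study does not state how the boundary day was chosen.

```python
def chronological_split(series, train_fraction=0.95):
    """Split a time series without shuffling: earliest part trains, latest part tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# January 22 to September 30, 2020 spans 253 daily observations.
days = list(range(253))
train, test = chronological_split(days)
```

Under this convention the 253 daily observations give 240 training days and 13 test days. Keeping the order intact matters for time series, because a shuffled split would let the model see values from after the dates it is asked to predict.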

FIGURE 5

Legend for Figures 6, 7, 8

FIGURE 6

Prediction with three best models of the daily number of cumulative confirmed infected cases and deaths due to COVID‐19 in Bulgaria, Greece, and Russia, where partial lockdown was imposed

FIGURE 7

Prediction with three best models of the daily number of cumulative confirmed infected cases and deaths due to COVID‐19 in Hubei (China), Iran, and India, where complete lockdown was imposed

FIGURE 8

Prediction with three best models of the daily number of cumulative confirmed infected cases and deaths due to COVID‐19 in Iceland, Netherland, and Sweden, where herd lockdown was imposed

Forecasting models use past data to predict new values. This paper implemented the polynomial regression (PR), SVR, DT, GBR, RF, KNN, Holt-Winters, ARIMA, and SARIMA models on COVID-19 data from January 22, 2020, to September 30, 2020, to predict the COVID-19 trend in the selected countries. All models were trained on the training sets. Applying the trained models to the testing sets gave the MAPE, MAE, and RMSE values shown in Tables 1, 2, and 3 for the daily cumulative CIC and deaths of all countries and lockdown types. The best model is selected by the MAPE, MAE, and RMSE criteria: for a given country, the model whose errors are closest to zero is preferred. For all partial-lockdown countries, the Holt-Winters, ARIMA, and SARIMA models were the best under these criteria for forecasting the daily cumulative CIC and deaths from past data, and their values are set in bold in Table 1.

TABLE 1

Forecasting accuracy measures for the cumulative confirmed cases and deaths in the countries where partial lockdown was imposed

Country	Bulgaria			Greece			Russia
Models	MAPE	MAE	RMSE	MAPE	MAE	RMSE	MAPE	MAE	RMSE
Daily cumulative number of confirmed cases
PR	15.82	3,166.78	3,746.56	15.4	2,651.9	3,072.3	0.94	10,767.6	11,687.8
SVR	33.94	6,680.51	6,743.44	8.45	1,414	1,447.4	5.91	67,351.6	74,295.3
DT	5.32	1,066.69	1,260.38	12.6	2,152.6	2,453.6	3.94	44,993.5	51,984.9
GBR	5.36	1,073.51	1,266.16	12.6	2,157.9	2,458.2	3.98	45,449.1	52,379.7
RF	5.69	1,137.62	1,320.95	13.5	2,305.5	2,588.8	4.17	47,625.9	54,279.3
KNN	5.72	1,143.69	1,326.18	13.7	2,332.1	2,612.5	4.19	47,827	54,455.8
Holt winter	0.85	171.40	223.02	2.98	509.82	573.4	0.51	5,940.53	7,962.28
ARIMA	1.06	215.20	297.55	0.43	71.2	82.98	0.33	3,906.5	5,547.78
SARIMA	0.86	175.45	253.65	0.34	55.44	69.12	0.37	4,299.09	6,094.87
Daily cumulative number of deaths
PR	1.16	9.36	13.53	1.91	7.08	8.18	2.1	423.86	483.27
SVR	23.57	185.48	188.88	2.91	10.9	12.5	51.4	10,224.3	10,314.9
DT	4.31	34.38	40.77	9.81	36.6	42.2	4.25	855.23	973.59
GBR	4.35	34.66	41.01	9.85	36.7	42.3	4.29	862.96	980.39
RF	4.78	38.03	43.89	10.7	39.7	44.9	4.56	917.51	1,028.73
KNN	4.95	39.38	45.07	11.1	41.1	46.2	4.61	926.73	1,036.96
Holt winter	0.39	3.12	3.99	0.56	2.08	2.62	0.41	82.7	91.38
ARIMA	1.94	15.29	16.19	1.50	5.58	6.34	0.10	21.84	27.75
SARIMA	1.41	11.13	12.22	1.87	6.96	7.99	0.06	13.76	15.76

In the published table, the values of the three best-performing models are set in bold.

TABLE 2

Forecasting accuracy measures for the cumulative confirmed cases and deaths in the countries where complete lockdown was imposed

Country	Hubei (China)			Iran			India
Models	MAPE	MAE	RMSE	MAPE	MAE	RMSE	MAPE	MAE	RMSE
Daily cumulative number of confirmed cases
PR	32.32	22,854.48	25,854.48	5.78	25,669.04	29,622.75	2.55	154,437.44	193,090.93
SVR	0.02	10.80	10.89	17.34	75,997.33	77,097.60	81.43	4,736,407.58	4,743,009.46
DT	0.00	0.00	0.00	5.21	23,112.84	26,477.11	10.08	601,536.69	677,737.28
GBR	0.007	0.27	0.52	5.26	23,316.20	26,654.81	10.10	602,666.34	678,740.12
RF	0.00	0.00	0.00	5.51	24,420.93	27,626.36	10.85	646,264.90	717,732.36
KNN	0.00	0.00	0.00	5.53	24,520.34	27,714.27	10.91	649,748.69	720,870.85
Holt winter	0.0002	0.018	0.019	0.29	1,329.28	1,640.21	0.44	26,939.18	33,528.18
ARIMA	0.40	273.73	335.69	0.75	3,355.12	4,079.23	1.61	97,391.75	119,710.00
SARIMA	0.01	10.8	0.007	0.81	3,638.89	4,365.41	1.32	79,729.27	98,717.20
Daily cumulative number of deaths
PR	37.40	1,687.79	1,981.23	4.88	1,245.73	1,534.01	0.93	868.14	908.25
SVR	0.06	2.81	2.81	34.55	8,675.00	8,740.14	6.78	6,325.34	6,625.19
DT	0.00	0.00	0.00	4.84	1,230.38	1,412.56	8.31	7,825.23	8,810.17
GBR	0.007	0.03	0.03	4.88	1,240.93	1,421.76	8.33	7,852.31	8,834.23
RF	0.00	0.00	0.00	5.15	1,309.80	1,482.25	8.90	8,372.49	9,299.65
KNN	0.00	0.00	0.00	5.19	1,318.38	1,489.84	8.94	8,412.23	9,335.44
Holt winter	0.00002	0.00009	0.00009	0.24	60.78	65.94	0.40	387.29	504.58
ARIMA	2.87	129.65	148.38	0.12	30.28	33.31	0.55	531.91	699.18
SARIMA	0.00007	0.0006	0.0007	0.09	22.65	25.87	0.47	452.77	602.94

In the published table, the values of the three best-performing models are set in bold.

TABLE 3

Forecasting accuracy measures for the cumulative confirmed cases and deaths in the countries where herd lockdown was imposed

Country	Iceland			Netherland			Sweden
Models	MAPE	MAE	RMSE	MAPE	MAE	RMSE	MAPE	MAE	RMSE
Daily cumulative number of confirmed cases
PR	27.69	715.54	830.54	9.56	10,285.79	11,743.07	7.39	6,742.31	8,223.87
SVR	6.71	173.88	203.87	9.45	10,195.53	11,608.08	47.96	43,296.94	43,666.40
DT	11.40	293.92	332.64	14.26	15,463.84	18,014.32	2.44	2,226.53	2,701.35
GBR	11.43	294.69	333.32	14.30	15,504.66	18,049.37	2.47	2,259.54	2,728.62
RF	11.66	300.41	338.39	15.00	16,219.56	18,667.07	2.59	2,366.83	2,818.11
KNN	11.74	302.42	340.18	15.11	16,340.34	18,772.11	2.61	2,381.53	2,830.46
Holt winter	7.89	202.74	225.68	3.89	4,265.77	5,347.75	0.82	741.17	855.84
ARIMA	7.40	190.30	212.45	3.49	3,845.05	4,967.90	0.71	650.78	813.43
SARIMA	7.49	192.93	216.84	2.07	2,297.04	3,144.86	0.39	354.67	456.47
Daily cumulative number of deaths
PR	24.29	2.42	2.78	27.86	1,769.76	2,060.50	17.09	1,005.51	1,178.89
SVR	20.23	2.02	2.03	34.08	2,157.51	2,171.72	58.15	3,417.77	3,445.04
DT	0.00	0.00	0.00	0.93	59.53	75.31	0.20	11.92	14.98
GBR	0.001	0.0001	0.0001	0.95	60.99	76.47	0.23	13.89	16.59
RF	0.00	0.00	0.00	0.96	61.52	76.89	0.24	14.10	16.77
KNN	0.00	0.00	0.00	0.98	62.53	77.70	0.236	13.92	16.62
Holt winter	0.00	0.002	0.0003	0.65	41.78	54.87	0.63	37.06	41.79
ARIMA	1.20	0.12	0.13	0.60	38.59	51.96	0.57	33.83	40.15
SARIMA	2.7	0.27	0.30	0.53	34.03	47.01	0.42	24.93	27.59

In the published table, the values of the three best-performing models are set in bold.

Every data set was modeled and predicted by the nine models, but only the three best-fitted models are discussed here. For Bulgaria, Holt-Winters, ARIMA (3, 2, 2), and SARIMA (0, 2, 1)(1, 0, 1)7 were selected as the best models for forecasting the daily cumulative CIC and deaths from past data. Their MAPE values for CIC and deaths were (0.85, 1.06, 0.86) and (0.39, 1.94, 1.41), their MAE values were (171.40, 215.20, 175.45) and (3.12, 15.29, 11.13), and their RMSE values were (223.02, 297.55, 253.65) and (3.99, 16.19, 12.22), respectively. Holt-Winters had the lowest values on all three indices and was selected as the best technique for forecasting both the daily cumulative CIC and the deaths in Bulgaria.

For Greece, Holt-Winters, ARIMA (3, 2, 2), and SARIMA (0, 2, 1)(1, 0, 1)7 were also the best models for forecasting the daily CIC and deaths. Their MAPE values for CIC and deaths were (2.98, 0.43, 0.34) and (0.56, 1.50, 1.87), their MAE values were (509.82, 71.2, 55.44) and (2.08, 5.58, 6.96), and their RMSE values were (573.4, 82.98, 69.12) and (2.62, 6.34, 7.99), respectively. Overall, for Greece under partial lockdown, the best model for the confirmed cases was SARIMA (0, 2, 1)(1, 0, 1)7 and the best model for the number of deaths was Holt-Winters.

For Russia, Holt-Winters, ARIMA, and SARIMA were again ranked as the top three models for forecasting the daily confirmed infected cases and deaths. The confirmed infected cases and the deaths were most accurately modeled by ARIMA (3, 2, 2) and SARIMA (1, 2, 2)(0, 0, 2)7, respectively. These models produced the lowest values of MAPE, MAE, and RMSE: (0.33, 3,906.5, 5,547.78) for cases and (0.06, 13.76, 15.76) for deaths. The predictions of the best models are presented graphically in Figure 6 for the daily cumulative CIC and deaths due to COVID-19 in Bulgaria, Greece, and Russia, where partial lockdown was imposed.

Complete lockdown was imposed in Hubei (China), Iran, and India. The data sets of these countries were modeled by the nine techniques, and the results are reported in Table 2. According to the MAPE, MAE, and RMSE, the best three models for the Hubei (China) data sets were the DT, RF, and KNN models. These models completely captured the data trend and predicted the daily CIC and deaths with 100% accuracy from past data. Holt-Winters and SARIMA (1, 2, 1)(1, 0, 1)7 were the optimal models for forecasting the daily CIC and deaths in Iran, respectively. For the Indian data sets, Holt-Winters was the best model for forecasting both the daily CIC and the deaths. The predictions for these countries by the best three models are shown in Figure 7.

In the herd lockdown type, different models produced the optimal results for CIC and deaths in different countries' data sets. The values of the three best models are set in bold in Table 3. For the confirmed cases in Iceland, the SVR, ARIMA, and SARIMA models were the top three. The MAPE, MAE, and RMSE for daily confirmed cases of SVR, ARIMA (3, 2, 2), and SARIMA (2, 2, 2)(0, 0, 1)7 were (6.71, 7.40, 7.49), (173.88, 190.30, 192.93), and (203.87, 212.45, 216.84), respectively. The DT, RF, and KNN models modeled the number of deaths in Iceland with zero error.

In Netherland, Holt-Winters, ARIMA (3, 2, 2), and SARIMA were the top three models for forecasting the daily CIC and deaths, and SARIMA (2, 2, 2)(1, 0, 1)7 and SARIMA (1, 2, 2)(2, 0, 0)7 were the best models for CIC and deaths, respectively. The MAPE values of the three models for Netherland daily CIC and deaths were (3.89, 3.49, 2.07) and (0.65, 0.60, 0.53), the MAE values were (4,265.77, 3,845.05, 2,297.04) and (41.78, 38.59, 34.03), and the RMSE values were (5,347.75, 4,967.90, 3,144.86) and (54.87, 51.96, 47.01), respectively.

In Sweden, Holt-Winters, ARIMA (3, 2, 2), and SARIMA (0, 2, 2)(1, 0, 1)7 were selected as the best models for forecasting the daily confirmed cases, whereas DT, GBR, and KNN were selected for deaths. The MAPE, MAE, and RMSE for daily confirmed cases of Holt-Winters, ARIMA (3, 2, 2), and SARIMA (0, 2, 2)(1, 0, 1)7 were (0.82, 0.71, 0.39), (741.17, 650.78, 354.67), and (855.84, 813.43, 456.47), respectively. The MAPE, MAE, and RMSE for Sweden deaths of DT, GBR, and KNN were (0.20, 0.23, 0.236), (11.92, 13.89, 13.92), and (14.98, 16.59, 16.62), respectively. The predictions of these models for the countries where the herd lockdown strategy was imposed are presented graphically in Figure 8.

Multistep ahead forecasting

The selected models predicted the cumulative CIC and deaths accurately on the test data. To further illustrate their efficiency, they were also used for out-of-sample forecasting. Multistep-ahead forecasts were produced for October 1, 2020, to October 10, 2020, and compared with the actual values of CIC and deaths, as presented in Table 4. The comparison indicated that the proposed models fulfilled all the evaluation criteria.
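The text does not state which multistep strategy was used. One common option is the recursive strategy, sketched below: a one-step model is applied repeatedly and each forecast is appended to the history before the next step. Both function names are hypothetical, and `last_increment_model` is only a stand-in for a fitted model.

```python
def recursive_forecast(history, one_step_model, horizon):
    """Multistep forecast: predict one step, append it to the history, and repeat."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(extended)
        forecasts.append(next_value)
        extended.append(next_value)
    return forecasts


def last_increment_model(values):
    """Stand-in one-step model: continue the most recent daily increase."""
    return values[-1] + (values[-1] - values[-2])


# Toy cumulative counts (illustrative values only), forecast 3 days ahead.
ten_day_style_forecast = recursive_forecast([10, 12, 15], last_increment_model, horizon=3)
```

Because each step builds on earlier forecasts, errors can accumulate over the horizon, which is one reason to compare the forecasts with the actual values over the whole 10-day window and not only on the first day.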

TABLE 4

The averages and standard deviations of the out‐of‐sample forecasted values from October 1, 2020 to October 10, 2020, with the best selected model for each country

Lockdown types	Country	Model	Average forecasted values (October 1 to October 10)	Average actual values (October 1 to October 10)	SD of forecasted values (October 1 to October 10)	SD of actual values (October 1 to October 10)
Daily number of confirmed cases
Partial	Bulgaria	Holt winter	21,068.21	22,364.2	419.565	1,154.95
	Greece	SARIMA	20,403.24	20,454.8	1,050.33	1,072.37
	Russia	ARIMA	1,192,959	1,226,702	19,193.6	33,236.9
Complete	Hubei	DT, RF, KNN	68,139	68,139.0	0.00000	0.00000
	Iran	Holt winter	471,780.2	478,174.1	9,701.03	11,976.1
	India	Holt winter	6,874,084	6,725,805	272,790	219,685
Herd	Iceland	SVR	2,465.534	3,070.4	37.7329	240.709
	Netherland	SARIMA	125,992.2	143,655	7,024.40	14,957.7
	Sweden	SARIMA	93,869.75	95,800.3	1,116.77	1,883.34
Daily number of confirmed deaths
Partial	Bulgaria	Holt winter	851.5456	859.5	17.2964	22.6629
	Greece	Holt winter	426.1323	416.3	17.4140	14.6215
	Russia	SARIMA	21,381.99	21,528.6	425.639	506.402
Complete	Hubei	DT, RF, KNN	4,512	4,512	0.00000	0.00000
	Iran	SARIMA	27,296.69	27,319.8	578.550	660.548
	India	Holt winter	106,064.8	104,097.2	3,558.89	2,864.92
Herd	Iceland	DT, RF, KNN	10	10	0.00000	0.00000
	Netherland	SARIMA	6,341.134	6,484.6	14.2624	51.8977
	Sweden	DT	5,864	5,892.8	0.000	3.64539
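
As a worked reading of Table 4, the sketch below computes the percentage gap between the average forecasted and average actual confirmed cases for four of the countries. The numbers are copied from the table; the `percent_gap` helper is a hypothetical name, and this gap between 10‐day averages is an illustrative summary, not one of the evaluation criteria used in the study.

```python
# (average forecast, average actual) for confirmed cases, copied from Table 4.
table4_cases = {
    "Bulgaria": (21068.21, 22364.2),
    "Greece": (20403.24, 20454.8),
    "Iceland": (2465.534, 3070.4),
    "Sweden": (93869.75, 95800.3),
}

def percent_gap(forecast_avg, actual_avg):
    """Signed gap of the forecast average relative to the actual average, in percent."""
    return 100.0 * (forecast_avg - actual_avg) / actual_avg

gaps = {country: percent_gap(f, a) for country, (f, a) in table4_cases.items()}

for country, gap in gaps.items():
    print(f"{country}: {gap:+.2f}%")
```

A negative gap means the model under‐forecast the 10‐day average. The gaps vary considerably between countries, which suggests that the closeness of the forecasts to the actual values is worth checking per data set.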


CONCLUSION

In this paper, the time‐series models Holt's winter, ARIMA, and SARIMA and the machine learning approaches RF, KNN, SVR, DT, polynomial regression, and GBR were employed for modeling. Ten‐day‐ahead forecasting of COVID‐19 was performed for cumulative CIC and deaths in Bulgaria, Greece, Russia, Hubei (China), Iran, India, Iceland, Netherland, and Sweden under different lockdown policies. The MAPE, MAE, and RMSE criteria were used to evaluate the performance of the compared approaches. Based on the results obtained, no single approach can be recommended for modeling and forecasting all data sets, because the data sets exhibited different trends depending on their size, nature, and type of lockdown. The optimized model for each data set was used to forecast cases 10 days ahead and produced results very close to the actual values. Further, it was observed that the herd lockdown strategy is the best policy to control COVID‐19 cases and deaths.

ETHICAL APPROVAL

No experiments were conducted on animals or humans.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

