A novel grey model based on Susceptible Infected Recovered Model: A case study of COVD-19.

Abstract

The COVID-19 pandemic has lasted for nearly two years, and the global epidemic situation remains grim, with case numbers still growing. Accurate predictions of the epidemic are therefore needed to implement appropriate and effective prevention measures. This paper analyzes the classic Susceptible Infected Recovered (SIR) model to understand the significance of its characteristics and parameters, and uses the differential and difference information of the grey system to propose a grey prediction model based on the SIR infectious disease model. The Laplace transform is used to derive the reduction formula of the model, and the modeling steps are then obtained. The model is applied to large and small numerical cases to verify its validity for data of different orders of magnitude. Data of different lengths are also modeled and predicted to verify the robustness of the model. Finally, the new model is compared with three classical grey prediction models. The results show that the new model is significantly superior to the comparison models, indicating that it can effectively predict the COVID-19 epidemic, is applicable to countries with populations of different magnitudes, and provides stable and effective simulation and prediction for data of different lengths.

Entities: Chemical

Keywords: Buffer operator; COVID-19; Grey prediction model; Laplace transform; SIR infectious disease

Year: 2022 PMID: 35692385 PMCID: PMC9169490 DOI: 10.1016/j.physa.2022.127622

Source DB: PubMed Journal: Physica A ISSN: 0378-4371 Impact factor: 3.778

Introduction

COVID-19 emerged at the end of December 2019. Owing to its rapid spread and high infection rate, the virus swept the world in just a few months. Thanks to a correct understanding of the epidemic situation and the implementation of effective control measures, the epidemic has been brought under control in some countries that were early epidemic centers, but the global situation is still grim. According to the WHO weekly report for 19–25 July 2021 [1], the global number of new cases reported that week was over 3.8 million, an 8% increase compared to the previous week; the Region of the Americas reported the largest increase in case incidence, followed by the Western Pacific Region (30% and 25%, respectively). The number of deaths reported that week increased sharply, with over 69 000 deaths, a 21% increase compared to the previous week (see Fig. 1 for details). According to the report, the cumulative number of cases reported globally was nearly 194 million, and the cumulative number of deaths exceeded 4 million. The report noted that, if these trends continued, the cumulative number of cases reported globally could exceed 200 million within the next two weeks. In fact, by August 6, 2021, the cumulative number of cases had already exceeded 200 million (see Fig. 2).

Fig. 1

COVID-19 cases reported weekly by WHO Region, and global deaths, as of 25 July 2021 (Ref. [1]).

Fig. 2

COVID-19 cases and deaths per 100 000 population reported, 19–25 July 2021 (Ref. [1]).

The Americas and Europe reported the highest weekly incidence of cases per 100,000 population, with 123.3 and 108.3 new cases per 100,000 population, respectively, for the week 19–25 July 2021. In the same week, the Americas and South-East Asia had the highest number of deaths per 100,000 population, reporting 2.8 and 1.1 new deaths per 100,000 population, respectively. The highest numbers of new cases were reported from the United States of America (500 332 new cases; 131% increase), Brazil (324 334 new cases; 13% increase), Indonesia (289 029 new cases; 17% decrease), the United Kingdom (282 920 new cases; 5% decrease), and India (265 836 new cases; similar to the previous week).

COVID-19 is not the first global pandemic in human history. In the 14th century, the plague pandemic killed 25 million Europeans. The Spanish flu pandemic of the 20th century is the deadliest plague in history, with an estimated 700 million cases worldwide and 40 to 50 million deaths. In the 21st century, SARS, avian flu, and H1N1 also caused significant losses to many countries and even the world. Drawing lessons from history, countries are stepping up prevention and control measures in the face of a global epidemic. Reasonable and practical measures tailored to each epidemic situation can achieve epidemic control while minimizing social and economic losses and safeguarding people's lives and the operation of society. Therefore, as the global epidemic situation is still grim, it is of great significance to predict the epidemic situation of various countries more accurately and to provide a reference for governments to formulate relevant measures in advance.

Literature review

Many scholars have conducted research in the field of epidemic prediction, and the prediction methods fall mainly into three categories: 1. Machine learning models. For example, Tiwari et al. [2] used machine learning methods to predict the development trend of the COVID-19 epidemic in India by analyzing the epidemic in China. Arora et al. [3] trained recurrent neural networks on COVID-19 case data in India, selected the LSTM variant with the minimum error, and applied it to predict the epidemic in India. 2. Time series models. For example, Singh et al. [4] used discrete wavelet decomposition to decompose the death data of five countries with severe COVID-19 outbreaks into component series, and then used the ARIMA model to predict deaths in the following month. Maleki et al. [5] proposed an improved autoregressive time series model and applied it to predict the number of confirmed and recovered COVID-19 cases worldwide. 3. Combined models. For example, Yang et al. [6] proposed a dynamic SEIR model to predict the peak and scale of the epidemic in China through AI training on population migration and epidemic data. Torrealba-Rodriguez et al. [7] combined the Gompertz model and Logistic model with an Artificial Neural Network model to predict the confirmed COVID-19 cases in Mexico. In addition, Wu [8], Yousaf [9], Melin [10], and other scholars used different methods to study epidemic prediction and obtained some results, as shown in Table 1.

Table 1

Summary of references.

Description	Ref	Methods	Application
	[1]	World Health Organization: COVID-19 weekly epidemiological update.	
Machine learning model	[2]	Machine learning	India
	[3]	RNN, LSTM	India
	[8]	CNN	Oil
Time series model	[4]	ARIMA, Discrete wavelet decomposition	France, Italy, Spain, UK, USA
	[5]	TP-SMN-AR	Global
	[9]	ARIMA	Pakistan
Combined model	[6]	A dynamic SEIR model, AI	China
	[7]	Gompertz and Logistic model, ANN	Mexico
	[10]	Fuzzy clustering, Neural network	Mexico
Grey predictive model	[11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]	Literature on the development and application of the grey predictive model.	
	[25]	GM(1,1) model with rolling mechanism	Germany, Turkey, USA
	[26]	Grey fractional-order model	China
	[27]	Grey Verhulst model based on the rolling mechanism	China
	[28]	GM(1,1), NGBM(1,1), FANGBM(1,1)	Italy, Britain, USA
	[29]	A grey multivariable prediction model with quadratic polynomial	China
	[30]	A grey prediction model with modified grey action	China, Italy, Britain, Russia
	[31]	GM(1,1) model with the principle of self-memory	Incidence prediction of tuberculosis in China
	[32]	GM(1,1) model with BP model	Prediction of mumps incidence in China
	[33], [34], [35], [36], [37]	Relevant literature on calculation methods and comparative models cited in this paper.	

Different prevention and control measures need to be implemented at different stages of the epidemic, so the epidemic must be predicted quickly and effectively at each stage. The machine learning and combined models above can achieve good results in epidemic prediction, but generally need a large amount of data. Time series models show good simulation performance for many kinds of data, but prevention and control measures must remain effective for a period of time, so accurate prediction of the future trend of the epidemic is essential. Although the COVID-19 pandemic has by now accumulated a large amount of data, the epidemic is affected by various factors, such as prevention and control measures and the movement of people, and historical data accumulated over time may not apply to the current situation. A model that does not need a large amount of training data is therefore required, and the grey prediction model is such a model, with a simple structure and strong applicability.

The grey prediction model, proposed by Deng [11] in 1982, is an important part of grey theory and includes the classic univariate grey prediction model (GM(1,1) model) and the multi-variable grey prediction model (GM(1,N) model). Because it has a simple structure, is easy to learn and use, and is suitable for small samples, the grey prediction model has become a research hotspot in recent years. It has been widely used in various fields, for example, ecological environment [12], [13], traffic flow prediction [14], [15], and energy consumption prediction [16], [17], [18]. To improve the performance of the multi-variable grey prediction model, scholars have done in-depth research on the GM(1,N) model in terms of modeling mechanism, model structure, parameter optimization, and other aspects, and have obtained many achievements. For example, Xiao et al. [19] used the Box–Cox transform to optimize the constraints of the classical NGBM(1,1) model and improved the nonlinearity of the model. Liu et al. [20] proposed a general form of the fractional grey polynomial model and used it to predict the electricity consumption of China and India effectively. Yu et al. [21] introduced the elastic network, a regularization method, and the grey wolf algorithm to optimize the grey Bernoulli model. Zhu [22] derived and solved the grey model through a new derivation method based on the convolution integral, further improving the model accuracy. Duan et al. [23] reduced the dependence of the model on data types and proposed a new multivariable Verhulst model; the application showed that the model could effectively predict coal consumption. Yan et al. [24] introduced the Hausdorff derivative into the grey fractional-order model and proved the relationship between the order and the error.

Owing to its excellent and stable prediction performance for small sample data, many scholars have applied the grey prediction model to the prediction of COVID-19. For example, Zeynep [25] used a classical GM(1,1) model with the rolling mechanism to predict the number of confirmed cases in Germany, Turkey, and the United States. Liu et al. [26] used a grey fractional model to predict the COVID-19 epidemic in China. Zhao et al. [27] proposed a grey Verhulst model based on a rolling mechanism and used data series of different lengths to predict the epidemic in China. Utkucan et al. [28] used GM(1,1), NGBM(1,1), and FANGBM(1,1) to predict the number of confirmed COVID-19 cases in Italy, the United Kingdom, and the United States. Zhang et al. [29] proposed a grey multivariable prediction model with a quadratic polynomial to analyze and predict the early data of the COVID-19 epidemic in China. Luo et al. [30] proposed a new grey prediction model, optimized it with a genetic algorithm, and applied it to epidemic prediction in China, Italy, the UK, and Russia. The grey prediction model has thus achieved many effective results in the prediction of COVID-19. Scholars have also applied the grey prediction model to other epidemic diseases. For example, Guo et al. [31] introduced self-memory into the traditional grey prediction model and applied the model to predict the incidence of tuberculosis in China. Jia et al. [32] combined the GM(1,1) model and the BP neural network model to predict the incidence of mumps in China; the new model, which combines the advantages of both, performs well.

All of the above grey prediction models applied to the COVID-19 epidemic are univariate models, which take confirmed cases as the original sequence and apply them directly to the grey prediction model, so they have some limitations. First, the epidemic is related to the number of people cured and is affected by various factors such as prevention and control measures and the movement of people, so the changing trend of each sequence affects the system. In practice, the numbers of confirmed and cured cases are two related sequences, so considering only the trend of confirmed cases is not enough and may affect the validity of the model. Second, current grey prediction models basically take confirmed COVID-19 cases as the original sequence and apply them directly, ignoring the principles of infectious disease transmission and the influence of confirmed and cured cases on the entire infectious disease system; as a result, most of them are univariate. Finally, the existing grey multi-variable models take the main sequence as the main research object, while the other sequences are generally treated linearly, which may affect the description of the system and limits the development of the multivariable grey prediction model.

Based on the current research, this paper first analyzes the classical infectious disease model and establishes the SIR infectious disease differential equation according to the model characteristics. Through the relationship between differential and difference, a dynamic grey prediction model based on the SIR model is proposed, and the model is then calculated and verified on practical cases. This paper makes the following contributions:

(1) Starting from the SIR infectious disease model, the relationship between the number of confirmed cases and cured cases is studied, together with the structural characteristics of the SIR model, and a dynamic differential equation with multiple variables is established.

(2) Using the grey difference information of differentiation and difference, a novel multivariate grey prediction model based on SIR is established and solved by the Laplace transform. The properties of the model are then studied and the modeling steps are obtained.

(3) The new model contains the time derivatives of the number of diagnosed patients and the number of cured patients, establishes the dynamic relationship between the two variables, and analyzes the whole system, so that the grey prediction model is no longer dominated by a single sequence. The roles of the main sequence and the related sequences are analyzed, which extends grey system theory. A buffer operator is also introduced to preprocess the original data, which reduces the interference of irregular oscillations in the data.

(4) The new model is applied to practical cases of the UK and Cuba to verify its validity for data of different orders of magnitude, and its robustness is verified by modeling and predicting with data of different lengths. In six cases, the proposed model shows high prediction and simulation accuracy and is clearly superior to the comparison models.

The remaining sections are organized as follows: Section 3 establishes the SIR grey dynamic prediction model based on the buffer operator, and proves and discusses the properties and modeling mechanism of the model. Section 4 applies the model to practical cases. Section 5 is the conclusion.

A novel grey prediction model based on SIR

This section first introduces the classic SIR infectious disease model and establishes the dynamic differential equation model by analyzing the relationship among the three variables S, I, and R. Then, according to the grey difference information and the principle of differential and difference, the grey prediction model based on SIR (SIRGM(1,N) model) is established. Considering the volatility of the original data, the average weakening buffer operator (AWBO) [35] is used to reduce the interference in the original data, which gives the SIR grey prediction model based on the buffer operator (D-SIRGM(1,N) model). The related properties of the model are studied and the modeling process is obtained. Fig. 3 is the flow chart of the modeling ideas and methods, and Fig. 4 is the flow chart of the D-SIRGM(1,N) model.

Fig. 3

The flow chart of modeling ideas and methods.

Fig. 4

The flow chart of D-SIRGM(1,N) model.

A classic SIR infectious disease model

The SIR model of infectious diseases was proposed by Kermack and McKendrick [33] in 1932. It is the most classic and basic infectious disease model and has made a foundational contribution to the study of infectious disease dynamics. In this model, the population is divided into three categories: Susceptible (S), Infective (I), and Removed (R). An infectious disease with a short onset time and short duration can be described by a short-term model that ignores the birth rate and death rate, so that the total number of people remains unchanged in the short term. Therefore, let the total number of people be a constant for any time t: So, the differential equation of the SIR model is as follows: where represents the probability of infection of susceptible persons; represents the probability that an infected person will recover after a certain period of time, become immune to the disease, and not be re-infected. For some long-term infectious diseases, the birth and death of the population must be considered, and the mortality rate is set as , the natural birth rate and death rate are respectively , . Assuming that there is no vertical transmission, all newborns are susceptible. The following model is obtained from Eqs. (1), (2): From Eq. (3), an equation can be obtained: In an infectious disease model, the most noteworthy quantity is the development trend of the disease, namely, the change in infected cases. Therefore, the following equation can be obtained from Eq. (4): To better capture the quantitative relationship between S, I, and R, the following differential equation is given based on the structural characteristics of the SIR model in Eq. (5): In practice, it is impossible to test everyone every day to obtain completely accurate numbers of the infected and cured. Therefore, in the actual calculation, I in this model is replaced by the number of confirmed cases, R by the number of cured cases, and S is the remaining number. From Eq. (1) we obtain: In practice, infectious diseases are usually tracked country by country, and the total population of a country is usually tens of millions or hundreds of millions. According to the above formula, when the total number of people is large, S is large relative to I and R, and its change is mainly driven by I and R. If S were included, on the one hand, its value would be too large and of a different order of magnitude from the other two factors, which could strongly affect the model results; on the other hand, S is mainly determined by the other two factors, so I and R can largely represent the whole system of S, I, and R. Therefore, the above model is simplified as follows:
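As a rough illustration of the classic short-term SIR dynamics described above, the sketch below integrates the standard Kermack–McKendrick equations with a forward Euler step. It is not the paper's model: the symbols `beta` (infection rate) and `gamma` (recovery rate), the frequency-dependent incidence term `beta * S * I / N`, the function name `simulate_sir`, and all numeric values are assumptions chosen for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=100):
    """Forward-Euler integration of the classic SIR equations (assumed form):

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    Returns one (S, I, R) tuple per day, including day 0.
    """
    n = s0 + i0 + r0                      # total population, constant in the short-term model
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical population and rates, for illustration only.
trajectory = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=60)
for day in (0, 20, 40, 60):
    s, i, r = trajectory[day]
    print(f"day {day:2d}: S={s:8.1f} I={i:8.1f} R={r:8.1f}")
```

Because every person leaving S enters I and every person leaving I enters R, the sum S + I + R stays equal to the total population throughout the run, which mirrors the constant-population assumption in the text.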

Establishment of grey prediction model based on SIR

Set the original sequences of newly cured and newly diagnosed cases as follows: Assume that the observation period is , is the day of observation, and are the newly cured and newly confirmed patients on day respectively. Set as the total number of people cured during this period, as the total number of people diagnosed, so: Let the 1-AGO sequence [34] of be as follows: where Let be the adjacent mean-generating sequence [34] of : where According to Eq. (7), the following equation is obtained: Replacing the differential part in Eq. (14) with the first-order difference, when : Similarly: Therefore, the following grey prediction model can be defined: With the original sequences and 1-AGO sequences as in Definition 3.1 and the adjacent mean-generating sequences as in Definition 3.2, is the grey prediction model based on SIR (SIRGM(1,N)), and Eq. (14) is the whitening equation of this model. The related properties of SIRGM(1,N) are given by the following theorems: Set parameters as , so Then the least squares parameter estimates satisfy: Theorem 3.1 is easily proved by and least squares estimation; see Ref. [13] for details. Set as described in Theorem 3.1, then the SIRGM(1,N) model has the following conclusions: (1) The time response equation is: (2) The reducing value is: To prove Theorem 3.2, the Laplace transform and the convolution theorem are first given as lemmas.
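The sequence operations used in these definitions are standard in grey modeling and can be sketched in a few lines. The example below assumes the usual definitions: the 1-AGO sequence is the running sum of the original sequence, the adjacent mean-generating sequence averages neighbouring 1-AGO values, and the reducing value is recovered by first-order differencing (inverse AGO). The function names and the sample data are hypothetical.

```python
from itertools import accumulate


def ago(x):
    """1-AGO: cumulative sums x1(k) = x0(1) + ... + x0(k)."""
    return list(accumulate(x))


def adjacent_mean(x1):
    """Adjacent mean sequence z(k) = 0.5 * (x1(k) + x1(k-1)) for k = 2..n."""
    return [0.5 * (prev + cur) for prev, cur in zip(x1, x1[1:])]


def iago(x1):
    """Inverse AGO: recover the original series by first-order differencing."""
    return [x1[0]] + [cur - prev for prev, cur in zip(x1, x1[1:])]


new_cases = [12, 15, 9, 20, 18]      # hypothetical daily counts
x1 = ago(new_cases)
z1 = adjacent_mean(x1)
print(x1)                            # [12, 27, 36, 56, 74]
print(z1)                            # [19.5, 31.5, 46.0, 65.0]
print(iago(x1) == new_cases)         # True
```

Accumulation smooths the day-to-day randomness of the raw counts, which is why the model is fitted on the 1-AGO sequence and the fitted values are differenced back at the end.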

Laplace Transform

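For $t \ge 0$, a continuous time function $f(t)$ whose value is not zero can be transformed through the following equation: $F(s) = \mathcal{L}[f(t)] = \int_0^{\infty} f(t) e^{-st}\,dt$ into a function $F(s)$ of the complex variable $s$, where $\mathcal{L}$ is the operator, $\mathcal{L}[f(t)]$ stands for the Laplace transform of $f(t)$, and $F(s)$ is the result of the Laplace transform of $f(t)$. Conversely, the inverse Laplace transform is the inverse of the above process, and its operator is $\mathcal{L}^{-1}$.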

Convolution Theorem

For continuous time functions $f(t)$ and $g(t)$ that have Laplace transforms, denote the transforms by $F(s) = \mathcal{L}[f(t)]$ and $G(s) = \mathcal{L}[g(t)]$. The convolution of $f(t)$ and $g(t)$ is defined as: $f(t) * g(t) = \int_0^{t} f(\tau)\, g(t-\tau)\,d\tau$. The Laplace transform of $f(t) * g(t)$ then exists, and $\mathcal{L}[f(t) * g(t)] = F(s)\,G(s)$. Then $\mathcal{L}^{-1}[F(s)\,G(s)] = f(t) * g(t)$.

Let the Laplace transforms of be: From Eq. (19) we obtain: Similarly: Taking the Laplace transform of Eq. (14) and substituting Eqs. (21)–(24) gives: then Taking the inverse transform of Eq. (25) gives: where . In Eq. (26), cannot be computed directly by the inverse Laplace transform, so the convolution theorem is used. According to Lemma 3.2, set then so Substituting Eq. (27) into Eq. (26) gives: Because the data sequence is , , the initial values in the above process are replaced by . Therefore, the solution of the whitening differential Eq. (14) is: The integral term in Eq. (29) contains the discrete values , so it cannot be integrated directly; the trapezoidal formula is used for approximate calculation, so: Substituting Eq. (30) into Eq. (29) gives: Taking the initial value as , the time response formula is as follows: where ; According to Eq. (32), the reductive formula of SIRGM(1,N) is: Theorem 3.2 is proved.
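The trapezoidal step in the proof above replaces a convolution integral over sampled data with a sum over unit intervals. The sketch below shows the idea for an exponential kernel, that is, it approximates the integral over [0, k] of exp(-a (k - tau)) u(tau) d tau from samples u(0), ..., u(k). The exponential kernel, the coefficient `a`, and the function name `conv_trapezoid` are assumptions for illustration and are not taken from Eq. (29).

```python
import math


def conv_trapezoid(a, u, k):
    """Trapezoidal approximation of the convolution integral
    integral_0^k exp(-a * (k - tau)) * u(tau) d tau,
    using samples u[0], ..., u[k] taken at unit spacing."""
    total = 0.0
    for j in range(1, k + 1):
        left = math.exp(-a * (k - (j - 1))) * u[j - 1]    # integrand at tau = j - 1
        right = math.exp(-a * (k - j)) * u[j]             # integrand at tau = j
        total += 0.5 * (left + right)                     # area of one unit-width trapezoid
    return total


# Sanity check against a case with a closed form: constant input u(tau) = c
# gives c * (1 - exp(-a * k)) / a exactly.
a, c, k = 0.2, 3.0, 10
approx = conv_trapezoid(a, [c] * (k + 1), k)
exact = c * (1.0 - math.exp(-a * k)) / a
print(f"trapezoid = {approx:.4f}, exact = {exact:.4f}")
```

With daily data the step size is fixed at one, so the approximation error depends on how quickly the integrand changes within a day; for slowly varying kernels such as the one above, the two values agree to well under one percent.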

A SIR grey prediction model based on buffer operator

Let the original sequence be , and the buffer sequence be , where When is a monotonically increasing sequence, a monotonically decreasing sequence, or an oscillating sequence, D is a weakening buffer operator, and D is called the average weakening buffer operator (AWBO) [35]. The buffer operator is applied to the original sequences to obtain the buffer sequences , which yields the new SIR grey prediction model based on the buffer operator (D-SIRGM(1,N)): According to Theorem 3.2, the time response formula of D-SIRGM(1,N) is: The reductive formula is:
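The buffer operator can be sketched as follows, assuming the standard definition of the AWBO from the grey systems literature: each buffered value is the mean of the original value and all later values, so the last (newest) observation is left unchanged. The function name `awbo` and the sample data are hypothetical.

```python
def awbo(x):
    """Average weakening buffer operator (assumed standard form):
    x(k)d = (x(k) + x(k+1) + ... + x(n)) / (n - k + 1)."""
    n = len(x)
    buffered = []
    tail_sum = 0.0
    for k in range(n - 1, -1, -1):        # walk backwards, accumulating the tail sum
        tail_sum += x[k]
        buffered.append(tail_sum / (n - k))
    buffered.reverse()
    return buffered


raw = [40, 95, 60, 85, 70, 50, 80]        # hypothetical oscillating daily counts
print([round(v, 2) for v in awbo(raw)])
# [68.57, 73.33, 69.0, 71.25, 66.67, 65.0, 80.0]
```

The raw series swings between 40 and 95, while the buffered series stays between 65 and 80, and the final value is untouched. This is the smoothing effect the model relies on to reduce the influence of irregular reporting.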

Steps and flowcharts of the model

Suppose are the original buffer sequence and the prediction sequence, where the first m terms are simulated values and the last n-m terms are predicted values. Then the absolute percentage error (APE) of a single value, the mean absolute percentage error of simulation (MAPEs) over the first m terms, and the mean absolute percentage error of prediction (MAPEp) are, respectively, as follows: According to the above definitions and theorems, the calculation steps of the D-SIRGM model are obtained.
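The error measures can be computed as in the sketch below, which assumes the usual definition APE = |fitted - actual| / |actual| x 100%. One detail is an inference rather than a stated definition: the simulation average here skips the first point, whose APE is zero because the initial value is reproduced exactly, since this is what reproduces the MAPE rows reported in Table 2. The function names and sample values are hypothetical.

```python
def ape(actual, fitted):
    """Absolute percentage error of a single value, in percent."""
    return abs(fitted - actual) / abs(actual) * 100.0


def mape(actual, fitted):
    """Mean absolute percentage error over paired sequences, in percent."""
    errors = [ape(a, f) for a, f in zip(actual, fitted)]
    return sum(errors) / len(errors)


def split_mape(actual, fitted, m):
    """Return (MAPEs, MAPEp) for a series whose first m points are simulated.

    MAPEs averages points 2..m (the initial value is skipped, an assumption
    inferred from the tabulated results); MAPEp averages points m+1..n.
    """
    return mape(actual[1:m], fitted[1:m]), mape(actual[m:], fitted[m:])


actual = [50.0, 52.0, 55.0, 60.0, 64.0]   # hypothetical buffered data
fitted = [50.0, 53.0, 54.0, 63.0, 60.0]   # hypothetical model output
mapes, mapep = split_mape(actual, fitted, m=3)
print(f"MAPEs = {mapes:.4f}%, MAPEp = {mapep:.4f}%")
```

Reporting the two averages separately keeps in-sample fit and out-of-sample accuracy apart, which matters because a model can fit the modeling window closely and still predict poorly.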

Application of D-SIRGM model in COVID-19 in Cuba and the UK

This section applies the D-SIRGM(1,N) model to COVID-19 prediction. First, to verify the effectiveness of the model for data of different magnitudes, two typical countries, Cuba and the UK, are selected for numerical experiments, representing small and large numerical samples, respectively. Second, to verify that the model has stable simulation and prediction performance for both small and large samples, the data selected for each country are modeled with different lengths. Cases 1, 2, and 3 are modeled with data of two weeks, three weeks, and four weeks, respectively, and then tested with data of the following week. Finally, the results obtained by D-SIRGM(1,N) are compared with the classic multi-variable grey models GM(1,N) [35], NGM(1,N) [36], and GMVM(1,N) [37].

Data interpretation

As COVID-19 is circulating around the world, a prediction model is needed to predict and help prevent it. This model needs strong universality to adapt to the data of different countries. Therefore, one country with a small numerical sample and one with a large numerical sample are selected in this paper to verify whether the proposed model responds well to data of different levels and whether it can effectively reflect data characteristics and trend changes. Cuba has a population of 11.4 million, and the UK has a population of 66 million. Both countries enjoy social stability and sound medical systems, so the epidemic is hardly affected by other unstable factors and the data better reflect the actual trend of the epidemic. In addition, the number of newly confirmed cases in Cuba and the UK in the second half of 2020 ranged from tens to tens of thousands. Therefore, the epidemic data of these two countries in the second half of 2020 were selected as numerical cases, with data from the Global COVID-19 Big Data Platform (https://www.zq-ai.com/#/fe/xgfybigdata).

In the modeling process of a grey prediction model, the original data must be greater than zero, so zero values should be removed before modeling. In reality, however, because information is not uploaded in time or the epidemic is relatively stable, many daily case counts are zero. Therefore, when selecting the original data, periods in which epidemic information is relatively continuous are chosen to reduce the influence of zero values. In addition, because the epidemic data are reported and collected region by region, the immediacy and continuity of the information cannot be ensured, so the original data contain irregular oscillations, which affect the performance of the model. Therefore, in this paper, the average weakening buffer operator is used to smooth the sequence, and the principle of new information priority is respected, that is, the latest information remains unchanged under the action of the buffer operator, so as to effectively reduce the interference in the original data.

Application of D-SIRGM model in COVID-19 prediction in Cuba

In the case of Cuba, daily new diagnoses and new cures from 5 July to 25 September 2020 were used. The quarantine observation period is usually 14–28 days during an epidemic, so in the following numerical examples, data of 14–28 days are used for modeling in 7-day steps. Data of newly diagnosed and cured cases over two, three, and four weeks were selected as modeling data to predict the newly diagnosed cases in the following week. Then, the results of the D-SIRGM(1,N) model were compared with those of GM(1,N), NGM(1,N), and GMVM(1,N).

Case 1 Simulate two weeks and predict one week

In this case, 14 data points in Cuba from September 4 to September 17 were selected for modeling, and 7 data points from September 18 to September 25 were used for testing. The number of confirmed cases on September 24 was zero, so that day was excluded from the calculation. Through calculation, the model parameters are as follows: The time response formula is: The results were compared with the NGM(1,N), GM(1,N), and GMVM(1,N) models. The specific data are shown in Table 2.

Table 2

Simulation and prediction results of each model for two-week data in Cuba.

Day	Buffer data	D-SIRGM(1,N) value	D-SIRGM(1,N) APE (%)	NGM(1,N) value	NGM(1,N) APE (%)	GM(1,N) value	GM(1,N) APE (%)	GMVM(1,N) value	GMVM(1,N) APE (%)
9.4	58.29	58.29	0.0000	58.29	0.0000	58.29	0.0000	58.29	0.0000
9.5	56.80	57.73	1.6380	57.06	0.4639	47.44	16.4864	57.21	0.7251
9.6	57.05	58.18	1.9724	56.56	0.8708	63.08	10.5663	58.12	1.8694
9.7	58.44	59.62	2.0118	56.08	4.0376	58.70	0.4450	58.99	0.9322
9.8	61.24	60.12	1.8164	55.50	9.3650	60.15	1.7721	59.80	2.3382
9.9	62.38	61.09	2.0539	54.87	12.0282	60.66	2.7445	60.57	2.8965
9.10	64.87	62.17	4.1559	54.21	16.4255	60.99	5.9694	61.28	5.5292
9.11	63.64	62.65	1.5630	53.42	16.0583	63.51	0.2137	61.89	2.7482
9.12	61.46	63.54	3.3822	52.57	14.4738	64.57	5.0603	62.44	1.5989
9.13	63.08	64.70	2.5564	51.68	18.0822	64.78	2.6892	62.94	0.2332
9.14	63.36	65.09	2.7274	50.63	20.0901	67.72	6.8706	63.25	0.1809
9.15	66.60	65.81	1.1921	49.48	25.7005	69.52	4.3887	63.47	4.7034
9.16	69.33	68.19	1.6513	48.48	30.0808	65.58	5.4198	63.88	7.8674
9.17	68.38	69.87	2.1907	47.50	30.5316	64.34	5.8982	64.14	6.1952
MAPEs (%)			2.2240		15.2468		5.2711		2.9091
9.18	67.71	69.07	2.0025	46.16	31.8305	71.90	6.1766	63.67	5.9764
9.19	69.50	71.47	2.8362	44.95	35.3187	68.12	1.9824	63.86	8.1101
9.20	69.20	69.98	1.1282	43.27	37.4679	78.12	12.8869	62.65	9.4614
9.21	73.75	68.05	7.7300	41.05	44.3364	89.29	21.0726	60.72	17.6641
9.22	69.67	76.99	10.5151	39.96	42.6370	62.39	10.4416	63.39	9.0078
9.23	64.00	75.62	18.1619	38.39	40.0089	72.58	13.4044	61.50	3.9134
9.25	80.00	77.10	3.6280	36.78	54.0200	72.58	9.2765	60.72	24.1006
MAPEp (%)			6.5717		40.8028		10.7487		11.1763

As can be seen from Table 2, all models except NGM(1,N) perform well. The MAPEs of D-SIRGM(1,N) and GMVM(1,N) are close, at 2.2240% and 2.9091%, respectively. The MAPEp values of D-SIRGM(1,N), GM(1,N), and GMVM(1,N) are 6.5717%, 10.7487%, and 11.1763%, respectively, so D-SIRGM(1,N) is the lowest. Overall, the performance differences between these models are not pronounced in the table, so Fig. 5 is drawn to compare them in more detail.

Fig. 5b

Comparison of APE of each model for two weeks of data in Cuba.

As shown in Fig. 5a, the NGM(1,N) model shows a monotonically decreasing trend, which is contrary to the actual data, so it performs worst among the comparison models. Therefore, the results of the remaining three models are mainly discussed. The curve of GMVM(1,N) lies almost entirely below the actual data and shows a slight downward trend in the prediction part, so its prediction accuracy is relatively low. The curves of D-SIRGM(1,N) and GM(1,N) fluctuate up and down around the actual data curve, which is consistent with the changing trend of the real data. However, the fluctuation range of the GM(1,N) model is relatively large, so the simulation and prediction performance of D-SIRGM(1,N) is better.

Fig. 5a

Comparison of the results of each model for two weeks of data in Cuba.

Case 2 Simulate three weeks and predict one week

In this case, three weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. 21 data points from 5 July to 2 August in Cuba were selected for modeling, and 7 data points from 4 August to 11 August were used for testing. The results were calculated in the same way and compared with the above models. The specific results are shown in Table 3.

Table 3

Simulation and prediction results of each model for three-week data in Cuba.

Day	Buffer data	D-SIRGM(1,N) value	D-SIRGM(1,N) APE (%)	NGM(1,N) value	NGM(1,N) APE (%)	GM(1,N) value	GM(1,N) APE (%)	GMVM(1,N) value	GMVM(1,N) APE (%)
7.5	24.18	24.18	0.0000	24.18	0.0000	24.18	0.0000	24.18	0.0000
7.6	24.96	24.26	2.8074	24.12	3.3588	11.10	55.5469	24.19	3.1051
7.8	25.62	25.13	1.9037	23.33	8.9301	23.61	7.8369	25.08	2.0734
7.9	25.88	26.07	0.7192	22.48	13.1526	32.61	25.9886	26.14	1.0069
7.11	26.79	27.02	0.8364	21.55	19.5601	39.67	48.0813	27.39	2.2412
7.13	27.22	28.08	3.1545	20.57	24.4362	44.05	61.8628	28.81	5.8691
7.14	28.09	29.26	4.1556	19.53	30.4811	46.21	64.5019	30.38	8.1627
7.16	29.24	30.48	4.2587	18.42	37.0137	48.25	65.0387	32.20	10.1165
7.17	30.40	31.73	4.3868	17.21	43.3901	50.62	66.5238	34.34	12.9476
7.19	31.68	33.27	5.0094	15.96	49.6191	48.16	51.9991	36.28	14.4935
7.20	33.39	34.79	4.1843	14.62	56.2178	49.18	47.3008	38.80	16.1990
7.21	35.29	36.38	3.0762	13.17	62.6907	50.20	42.2325	41.66	18.0452
7.22	37.31	38.18	2.3145	11.63	68.8190	48.75	30.6590	44.43	19.0717
7.23	38.93	40.02	2.7959	9.98	74.3569	49.44	26.9980	47.87	22.9515
7.25	41.43	42.09	1.6042	8.24	80.1144	47.64	14.9881	51.06	23.2367
7.26	44.38	44.17	0.4776	6.35	85.7007	49.55	11.6341	55.52	25.0845
7.28	45.92	46.44	1.1425	4.32	90.5866	49.56	7.9345	59.93	30.5198
7.29	44.64	48.76	9.2470	2.13	95.2290	51.86	16.1742	65.65	47.0850
7.30	45.80	51.13	11.6381	−0.26	100.5688	55.73	21.6844	72.89	59.1391
7.31	49.89	53.56	7.3599	−2.87	105.7448	60.10	20.4596	81.55	63.4731
8.2	54.75	55.96	2.2161	−5.73	110.4641	66.92	22.2203	93.13	70.1076
MAPE (%)			3.6644		58.0217		35.4833		22.7465
8.4	57.14	58.73	2.7843	−8.78	115.3613	66.09	15.6655	101.86	78.2630
8.5	62.67	61.37	2.0710	−12.12	119.3348	73.21	16.8246	117.22	87.0482
8.7	64.00	64.73	1.1463	−15.58	124.3495	64.72	1.1240	122.12	90.8108
8.8	67.75	68.16	0.6090	−19.28	128.4598	64.28	5.1178	133.17	96.5672
8.9	72.33	72.36	0.0350	−23.09	131.9239	52.25	27.7688	132.80	83.5993
8.10	79.00	76.88	2.6890	−27.08	134.2820	45.51	42.3865	135.71	71.7873
8.11	93.00	81.57	12.2883	−31.34	133.7018	45.35	51.2410	145.41	56.3576
MAPE (%)			3.0890		126.7733		22.8754		80.6333

As shown in Table 3, D-SIRGM(1,N) has the smallest MAPEs and MAPEp, both below 5%, whereas the other three comparison models simulate and predict poorly, with both errors above 20%. The results in Table 3 are plotted in Fig. 6 for further comparison. Because the large errors of NGM(1,N) would obscure the other curves, it is omitted from Fig. 6a.
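The error measures quoted above can be reproduced from the table itself. The following is a minimal sketch, assuming the usual definitions (APE as the absolute error relative to the actual value, in percent, and MAPE as the mean of the APEs); the helper names `ape` and `mape` are illustrative and not taken from the paper. It uses the D-SIRGM(1,N) columns of Table 3, so small differences from the reported figures are expected because the tabulated values are rounded.

```python
def ape(actual: float, predicted: float) -> float:
    """Absolute percentage error (%), assuming the standard definition."""
    return abs(predicted - actual) / abs(actual) * 100.0


def mape(actuals: list[float], predictions: list[float]) -> float:
    """Mean absolute percentage error (%) over paired values."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(ape(a, p) for a, p in zip(actuals, predictions)) / len(actuals)


# Prediction week of Table 3 (8.4 to 8.11): buffer data and D-SIRGM(1,N) values.
actual_week = [57.14, 62.67, 64.00, 67.75, 72.33, 79.00, 93.00]
dsirgm_week = [58.73, 61.37, 64.73, 68.16, 72.36, 76.88, 81.57]
mape_p = mape(actual_week, dsirgm_week)

# Simulation-stage APE column of Table 3 for D-SIRGM(1,N), 7.5 to 8.2.
sim_ape = [
    0.0000, 2.8074, 1.9037, 0.7192, 0.8364, 3.1545, 4.1556,
    4.2587, 4.3868, 5.0094, 4.1843, 3.0762, 2.3145, 2.7959,
    1.6042, 0.4776, 1.1425, 9.2470, 11.6381, 7.3599, 2.2161,
]
# The first point is fitted exactly (APE = 0); averaging over the
# remaining 20 points matches the MAPEs reported in Table 3.
mape_s = sum(sim_ape[1:]) / len(sim_ape[1:])

print(f"MAPEp = {mape_p:.4f}%, MAPEs = {mape_s:.4f}%")
```

For Table 3, the reported MAPEs of 3.6644% is recovered only when the initial point is left out of the average, which suggests that the simulation error is averaged over the fitted points after the first one.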

Fig. 6a

Comparison of the results of each model for three weeks of data in Cuba.

Fig. 6a shows that the actual data rise steadily. D-SIRGM(1,N) and GMVM(1,N) both follow this upward trend, but GMVM(1,N) grows exponentially, so its curve lies above the actual data and gradually diverges from it. GM(1,N) fluctuates unstably, and its predicted values show an obvious downward trend that is inconsistent with the actual situation. In Figs. 6a and 6b, the D-SIRGM(1,N) curve is largely consistent with the actual data in both the simulation and prediction stages. Therefore, D-SIRGM(1,N) performs well in this case.

Fig. 6b

Comparison of APE of each model for three weeks of data in Cuba.

Case 3 Simulate four weeks and predict one week

In this case, four weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. 28 data points from August 16 to September 14 in Cuba were selected for modeling, and 7 data points from September 15 to September 21 were used for testing. The calculated results are compared with those of the models above, and the specific data are shown in Table 4.

Table 4

Simulation and prediction results of each model for four-week data in Cuba.

Day	Buffer	D-SIRGM(1,N)		NGM(1,N)		GM(1,N)		GMVM(1,N)
	data	Values	APE (%)	Values	APE (%)	Values	APE (%)	Values	APE (%)
8.16	54.63	54.63	0.0000	54.63	0.0000	54.63	0.0000	54.63	0.0000
8.18	53.68	53.40	0.5199	53.29	0.7109	34.17	36.3427	53.61	0.1150
8.19	53.85	53.84	0.0121	54.75	1.6692	60.32	12.0247	53.72	0.2420
8.20	54.16	54.19	0.0596	56.10	3.5886	61.86	14.2320	53.87	0.5277
8.21	53.52	54.47	1.7879	57.40	7.2508	59.14	10.5014	54.07	1.0391
8.22	52.53	54.77	4.2515	58.69	11.7249	57.58	9.6007	54.33	3.4259
8.23	53.76	55.05	2.3974	59.95	11.5198	55.92	4.0281	54.64	1.6419
8.24	54.43	55.31	1.6141	61.18	12.3970	54.96	0.9844	55.00	1.0512
8.25	54.04	55.58	2.8475	62.39	15.4664	55.05	1.8759	55.43	2.5699
8.26	54.77	55.87	2.0064	63.61	16.1488	55.52	1.3760	55.92	2.0953
8.27	55.88	56.21	0.5892	64.86	16.0648	56.67	1.4077	56.49	1.0997
8.28	57.58	56.54	1.8044	66.07	14.7315	56.50	1.8805	57.09	0.8482
8.30	55.43	56.84	2.5417	67.23	21.2787	56.08	1.1665	57.74	4.1499
8.31	55.27	57.13	3.3586	68.37	23.7017	56.21	1.6908	58.45	5.7553
9.1	55.62	57.40	3.2055	69.49	24.9380	56.20	1.0518	59.22	6.4795
9.2	55.45	57.67	3.9997	70.59	27.2963	56.36	1.6411	60.06	8.3147
9.3	56.63	58.01	2.4355	71.74	26.6816	58.39	3.1133	61.13	7.9384
9.4	56.39	58.35	3.4695	72.85	29.1994	58.00	2.8553	62.07	10.0837
9.5	54.53	58.76	7.7650	74.04	35.7840	60.43	10.8241	63.39	16.2475
9.6	54.69	59.32	8.4697	75.33	37.7379	63.36	15.8652	64.92	18.7145
9.7	56.20	59.82	6.4375	76.51	36.1464	61.83	10.0255	66.02	17.4658
9.8	59.43	60.40	1.6333	77.80	30.9104	64.77	8.9806	67.81	14.0988
9.9	60.69	61.02	0.5470	79.09	30.3071	65.59	8.0669	69.44	14.4198
9.10	63.67	61.65	3.1716	80.36	26.2243	66.03	3.7092	71.12	11.7144
9.11	62.00	62.39	0.6366	81.75	31.8611	69.48	12.0621	73.53	18.5922
9.12	59.00	63.22	7.1444	83.18	40.9778	71.02	20.3717	75.77	28.4265
9.13	60.89	64.03	5.1595	84.58	38.9127	71.40	17.2637	77.91	27.9537
9.14	61.00	65.01	6.5662	86.14	41.2195	75.91	24.4368	81.26	33.2186
MAPE (%)			3.1271		22.7574		8.7918		9.5640
9.15	65.29	66.12	1.2842	87.80	34.4849	79.03	21.0462	84.61	29.5990
9.16	69.17	66.97	3.1762	89.17	28.9190	73.13	5.7353	85.45	23.5442
9.17	67.60	67.62	0.0284	90.43	33.7735	71.33	5.5196	87.36	29.2309
9.18	66.25	68.85	3.9287	92.28	39.2906	86.04	29.8787	95.22	43.7282
9.19	69.33	69.97	0.9133	93.85	35.3599	80.41	15.9793	96.41	39.0577
9.20	68.50	72.27	5.5058	96.65	41.1004	110.74	61.6645	112.42	64.1190
9.21	86.00	78.08	9.2088	102.51	19.1972	185.09	115.2174	152.46	77.2825
MAPE (%)			3.4351		33.1608		36.4344		43.7945

Table 4 shows that the MAPEs of D-SIRGM(1,N), GM(1,N) and GMVM(1,N) are all below 10%, with D-SIRGM(1,N) the lowest. Except for D-SIRGM(1,N), the MAPEp of every model is above 30%. To better show the individual values and the overall trend, Table 4 is plotted as Fig. 7. Because the last-day values of the three comparison models are too large and would obscure the overall trend, they are not included in Fig. 7a.

Fig. 7a

Comparison of the results of each model for four weeks of data in Cuba.

Fig. 7a shows that GM(1,N) and GMVM(1,N) perform well in the simulation stage but increase sharply in the prediction stage. Fig. 7b shows that D-SIRGM(1,N) stays close to the actual data in both the simulation and prediction stages, and the actual data curve fluctuates around the D-SIRGM(1,N) curve, which indicates that D-SIRGM(1,N) fits the actual data better.

Fig. 7b

Comparison of APE of each model for four weeks of data in Cuba.

Based on the three Cuba cases, D-SIRGM(1,N) not only fits the rising and falling trends of the data and captures their fluctuations well, but also simulates and predicts stably for cases with different data volumes. With two, three, and four weeks of modeling data, the simulation error (MAPEs) is 2.2240%, 3.6644% and 3.1271%, respectively, all below 5%. Even with few samples the simulation error stays low, and the prediction error (MAPEp) becomes more stable as the sample grows (6.5717%, 3.0890%, and 3.4351%, respectively), indicating that D-SIRGM(1,N) also performs stably on small-sample data.

Application of D-SIRGM model in COVID-19 prediction in UK

In the UK cases, the daily numbers of newly confirmed and newly cured cases between 10 August and 10 November 2020 were used. Data of two, three and four weeks were selected for modeling to predict the number of newly confirmed cases in the following week. The results obtained by the D-SIRGM(1,N) model were then compared with those of GM(1,N), NGM(1,N) and GMVM(1,N).

Case 1 Simulate two weeks and predict one week

In this case, two weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. 14 data points from October 21 to November 3 in the UK were selected for modeling, and 7 data points from November 4 to November 10 were used for testing. The results generated by the model were compared with those of NGM(1,N), GM(1,N) and GMVM(1,N). The simulation and prediction results of each model for the two-week UK data are shown in Table 5.
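The split between modeling data and test data used in these cases can be sketched as follows. This is only an illustration of the windowing, not of the D-SIRGM(1,N) fit itself: the function name `split_windows` and its parameters are hypothetical, and the series is the "Buffer data" column of Table 5, split by number of observations.

```python
from typing import List, Sequence, Tuple


def split_windows(
    series: Sequence[float], n_model: int, n_test: int = 7
) -> Tuple[List[float], List[float]]:
    """Split a series into a modeling window and the test window that follows it.

    n_model is the number of observations used for modeling (14, 21 or 28
    in the cases of this section); n_test is the number of observations
    held out for testing (one week, i.e. 7).
    """
    if n_model < 1 or n_test < 1:
        raise ValueError("window lengths must be positive")
    if len(series) < n_model + n_test:
        raise ValueError("series is shorter than n_model + n_test")
    modeling = list(series[:n_model])
    testing = list(series[n_model:n_model + n_test])
    return modeling, testing


# "Buffer data" column of Table 5 (UK, 10.21 to 11.10).
uk_buffer = [
    22505.95, 22562.35, 22343.95, 22404.56, 22512.82, 22481.06, 22660.20,
    22784.36, 22775.38, 22613.58, 22571.45, 22386.90, 22438.56, 22335.63,
    22818.00, 23272.50, 22886.20, 22565.50, 22313.00, 20984.50, 21385.00,
]

modeling, testing = split_windows(uk_buffer, n_model=14)
print(len(modeling), "modeling points;", len(testing), "test points")
```

Any model under comparison would then be fitted on `modeling` only, with `testing` used solely to compute the prediction error.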

Table 5

Simulation and prediction results of each model for two-week data in UK.

Day	Buffer	D-SIRGM(1,N)		NGM(1,N)		GM(1,N)		GMVM(1,N)
	data	Values	APE (%)	Values	APE (%)	Values	APE (%)	Values	APE (%)
10.21	22505.95	22505.95	0.0000	22505.95	0.0000	22505.95	0.0000	22505.95	0.0000
10.22	22562.35	22491.02	0.3162	22451.98	0.4892	16919.31	25.0109	22369.54	0.8546
10.23	22343.95	22511.93	0.7518	24167.26	8.1602	25002.27	11.8973	22419.66	0.3389
10.24	22404.56	22571.07	0.7432	25772.47	15.0323	24567.40	9.6536	22415.22	0.0476
10.25	22512.82	22665.94	0.6801	27280.90	21.1794	24481.29	8.7437	22347.40	0.7348
10.26	22481.06	22741.39	1.1580	28658.04	27.4764	24012.88	6.8138	22228.90	1.1217
10.27	22660.20	22830.94	0.7535	29932.43	32.0925	24184.79	6.7281	22043.62	2.7210
10.28	22784.36	23003.37	0.9612	31172.64	36.8160	25785.91	13.1737	21734.32	4.6086
10.29	22775.38	23093.94	1.3987	32263.56	41.6598	24767.96	8.7488	21449.39	5.8220
10.30	22613.58	22949.52	1.4856	33065.74	46.2207	20349.71	10.0111	21374.49	5.4794
10.31	22571.45	22922.01	1.5531	33848.47	49.9614	21425.77	5.0758	20976.02	7.0684
11.1	22386.90	22773.72	1.7279	34476.69	54.0038	19437.65	13.1740	20785.78	7.1520
11.2	22438.56	22751.45	1.3944	35130.32	56.5623	21192.38	5.5537	20214.19	9.9132
11.3	22335.63	22780.17	1.9903	35786.18	60.2202	22474.71	0.6227	19565.05	12.4043
MAPE (%)			1.1472		34.6057		9.6313		4.4820
11.4	22818.00	22969.36	0.6633	36530.07	60.0932	25685.38	12.5663	18478.34	19.0186
11.5	23272.50	22591.07	2.9281	36788.19	58.0758	16197.99	30.3986	19414.35	16.5781
11.6	22886.20	22344.01	2.3691	37028.20	61.7927	16278.98	28.8699	18990.78	17.0208
11.7	22565.50	22029.57	2.3750	37185.71	64.7901	14881.90	34.0502	18860.28	16.4198
11.8	22313.00	21728.01	2.6217	37316.42	67.2407	14578.19	34.6650	18527.23	16.9666
11.9	20984.50	21643.25	3.1392	37598.71	79.1737	18222.74	13.1610	17126.01	18.3873
11.10	21385.00	21051.61	1.5590	37477.92	75.2533	9718.79	54.5532	19000.76	11.1491
MAPE (%)			2.2365		66.6314		29.7520		16.5058

As Table 5 shows, the MAPEs of D-SIRGM(1,N), GM(1,N) and GMVM(1,N) are all below 10%, with D-SIRGM(1,N) the lowest. However, only D-SIRGM(1,N) has a MAPEp below 10% (2.2365%), showing excellent prediction performance. The MAPEp of the three comparison models, NGM(1,N), GM(1,N) and GMVM(1,N), is high: 66.6314%, 29.7520% and 16.5058%, respectively. For a more intuitive comparison of the models, the results are plotted in Fig. 8.

Fig. 8a

Comparison of the results of each model for two weeks of data in the UK.

Fig. 8a shows that the actual data curve is relatively flat, while the NGM(1,N) curve lies entirely above it and trends upward. GM(1,N) fluctuates around the actual data curve but shows an obvious downward trend overall. D-SIRGM(1,N) and GMVM(1,N) are relatively close to the actual data curve, but GMVM(1,N) also trends clearly downward, whereas the D-SIRGM(1,N) curve is almost consistent with the actual data curve. Fig. 8b likewise shows that the error between D-SIRGM(1,N) and the original data is the smallest. Therefore, in this case, the simulation performance of D-SIRGM(1,N) is good, and its prediction performance is significantly better than that of the three comparison models.

Fig. 8b

Comparison of APE of each model for two weeks of data in the UK.

Case 2 Simulate three weeks and predict one week

In this case, three weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. 21 data points from October 13 to November 4 in the UK were selected for modeling, and 7 data points from November 5 to November 11 were used for testing. The calculated results are compared with those of the models above. The specific data are shown in Table 6.

Table 6

Simulation and prediction results of each model for three-week data in the UK.

Day	Buffer	D-SIRGM(1,N)		NGM(1,N)		GM(1,N)		GMVM(1,N)
	data	Values	APE (%)	Values	APE (%)	Values	APE (%)	Values	APE (%)
10.13	22525.14	22525.14	0.0000	22525.14	0.0000	22525.14	0.0000	22525.14	0.0000
10.14	22840.04	22688.55	0.6633	22725.27	0.5025	17728.73	22.3787	22646.36	0.8480
10.16	22296.27	22616.25	1.4351	24174.05	8.4220	23126.24	3.7225	22621.10	1.4569
10.17	22428.36	22591.34	0.7266	25531.64	13.8364	22169.56	1.1539	22599.58	0.7634
10.18	22710.46	22575.14	0.5959	26802.51	18.0184	22076.18	2.7929	22582.01	0.5656
10.19	22994.17	22588.08	1.7661	28002.47	21.7807	22656.21	1.4698	22569.13	1.8485
10.21	22412.55	22578.25	0.7393	29115.49	29.9071	22637.50	1.0037	22559.72	0.6567
10.22	22461.81	22550.66	0.3956	30138.58	34.1770	22334.89	0.5650	22553.29	0.4073
10.23	22249.30	22547.94	1.3422	31094.88	39.7567	22570.88	1.4453	22552.09	1.3609
10.24	22301.74	22581.16	1.2529	32004.41	43.5064	23295.47	4.4559	22557.85	1.1484
10.25	22398.28	22638.13	1.0708	32873.06	46.7660	24100.65	7.6005	22570.46	0.7687
10.26	22361.65	22658.84	1.3290	33674.56	50.5907	23965.04	7.1703	22583.20	0.9908
10.27	22522.13	22697.93	0.7806	34425.46	52.8517	24205.43	7.4740	22603.08	0.3595
10.28	22628.80	22820.18	0.8457	35169.48	55.4191	25735.30	13.7281	22641.89	0.0578
10.29	22609.36	22826.28	0.9595	35827.41	58.4628	24699.42	9.2442	22661.56	0.2309
10.30	22447.23	22607.83	0.7155	36304.41	61.7322	20408.98	9.0802	22635.43	0.8384
10.31	22394.75	22594.90	0.8937	36776.89	64.2210	21376.23	4.5480	22672.81	1.2416
11.1	22210.91	22433.43	1.0019	37154.78	67.2817	19432.93	12.5073	22667.96	2.0578
11.2	22239.80	22434.75	0.8766	37553.93	68.8591	20999.00	5.5792	22723.05	2.1729
11.3	22126.22	22449.30	1.4602	37957.58	71.5502	22074.80	0.2324	22780.09	2.9552
11.4	22522.13	22594.29	0.3204	38417.70	70.5776	24834.15	10.2656	22887.59	1.6227
MAPE (%)			0.9585		43.9110		6.3209		1.1176
11.5	22869.43	22122.30	3.2669	38575.26	68.6761	16166.89	29.3078	22720.34	0.6519
11.6	22480.33	21946.83	2.3732	38720.47	72.2415	16136.96	28.2175	22741.96	1.1638
11.7	22142.60	21698.95	2.0036	38814.80	75.2947	14837.62	32.9906	22727.08	2.6396
11.8	21847.50	21478.00	1.6913	38890.58	78.0093	14460.39	33.8122	22735.28	4.0635
11.9	20806.67	21414.49	2.9213	39032.47	87.5960	16765.67	19.4217	22835.14	9.7491
11.10	20918.00	20948.63	0.1464	38994.83	86.4176	11316.83	45.8991	22666.93	8.3609
11.11	20451.00	20750.78	1.4658	38998.98	90.6947	12574.25	38.5152	22725.06	11.1195
MAPE (%)			1.9812		79.8471		32.5949		5.3926

As Table 6 shows, the MAPEs of GM(1,N) is 6.3209%, but its MAPEp is as high as 32.5949%, so its overall performance is poor. Both the MAPEs and MAPEp of NGM(1,N) exceed 30%. D-SIRGM(1,N) and GMVM(1,N) simulate and predict best among the four models, with all of their errors at or below about 5%. The two models differ little in the simulation stage (0.9585% versus 1.1176%), while D-SIRGM(1,N) has the smaller error in the prediction stage (1.9812% versus 5.3926%). To compare the two models more intuitively, their results are plotted in Fig. 9.

Fig. 9a

Comparison of D-SIRGM(1,N) and GMVM(1,N) for three weeks of data in the UK.

As Fig. 9a shows, in the simulation stage the actual data fluctuate noticeably, and both GMVM(1,N) and D-SIRGM(1,N) lie mostly above the actual data curve. However, GMVM(1,N) is a relatively stable straight line, whereas D-SIRGM(1,N) fluctuates slightly. In the prediction stage, the actual data show an obvious downward trend. Although GMVM(1,N) fluctuates, it deviates completely from the actual data, while D-SIRGM(1,N) follows the actual data and shows the same downward trend, so its prediction is better.

Fig. 9b

Comparison of APE of each model for three weeks of data in the UK.

Case 3 Simulate four weeks and predict one week

In this case, four weeks of data are selected for modeling to predict the number of newly diagnosed cases in the following week. 28 data points from August 10 to September 15 in the UK were selected for modeling, and 7 data points from September 16 to September 22 were used for testing. The calculated results are compared with those of the models above. The specific data are shown in Table 7.

Table 7

Simulation and prediction results of each model for four-week data in the UK.

Day	Buffer	D-SIRGM(1,N)		NGM(1,N)		GM(1,N)		GMVM(1,N)
	data	Values	APE (%)	Values	APE (%)	Values	APE (%)	Values	APE (%)
8.10	2557.29	2557.29	0.0000	2557.29	0.0000	2557.29	0.0000	2557.29	0.0000
8.12	2576.56	2490.09	3.3559	2547.40	1.1318	693.34	73.0906	2537.24	1.5259
8.15	2588.97	2542.58	1.7918	2495.50	3.6105	1610.94	37.7769	2587.98	0.0383
8.16	2588.84	2597.25	0.3246	2445.13	5.5513	2369.23	8.4830	2637.58	1.8826
8.17	2638.65	2652.80	0.5364	2396.88	9.1625	3001.75	13.7612	2685.74	1.7850
8.18	2689.70	2709.96	0.7534	2350.72	12.6029	3526.57	31.1138	2732.18	1.5792
8.19	2757.41	2767.91	0.3806	2307.68	16.3101	3978.29	44.2763	2776.35	0.6866
8.20	2787.36	2827.84	1.4524	2267.80	18.6396	4360.59	56.4417	2817.91	1.0960
8.21	2846.15	2888.61	1.4921	2232.68	21.5542	4714.14	65.6324	2855.85	0.3407
8.22	2915.27	2965.95	1.7384	2183.59	25.0983	4640.95	59.1946	2898.79	0.5652
8.23	2979.08	3022.69	1.4640	2138.74	28.2080	4875.37	63.6537	2932.66	1.5581
8.24	3058.33	3081.97	0.7730	2098.34	31.3893	5089.49	66.4137	2962.16	3.1448
8.26	3148.65	3143.48	0.1644	2063.04	34.4785	5297.74	68.2544	2986.18	5.1599
8.27	3234.73	3207.04	0.8559	2034.05	37.1185	5517.40	70.5676	3003.13	7.1597
8.29	3264.57	3277.75	0.4037	2006.40	38.5403	5603.13	71.6346	3019.00	7.5222
8.30	3307.60	3363.39	1.6866	1960.86	40.7164	5198.34	57.1634	3057.20	7.5704
8.31	3390.84	3422.70	0.9395	1920.48	43.3628	5350.47	57.7917	3063.61	9.6506
9.2	3499.33	3506.26	0.1979	1856.49	46.9473	4766.05	36.1987	3116.03	10.9535
9.3	3626.65	3566.39	1.6616	1784.23	50.8021	4569.20	25.9896	3146.85	13.2298
9.5	3649.31	3629.79	0.5351	1694.33	53.5712	4128.70	13.1363	3200.68	12.2935
9.6	3639.87	3673.09	0.9127	1602.65	55.9695	4109.25	12.8955	3221.16	11.5033
9.7	3685.71	3743.56	1.5694	1472.64	60.0447	3127.48	15.1460	3342.18	9.3207
9.8	3742.77	3764.54	0.5817	1345.78	64.0433	3267.07	12.7098	3357.21	10.3014
9.9	3852.33	3790.03	1.6173	1220.82	68.3097	3378.86	12.2905	3370.11	12.5176
9.11	3955.55	3816.16	3.5238	1100.76	72.1717	3576.41	9.5850	3365.97	14.9051
9.12	4058.10	3849.03	5.1519	982.40	75.7915	3689.90	9.0732	3365.33	17.0714
9.13	3726.11	3900.36	4.6763	844.22	77.3431	3223.22	13.4965	3450.27	7.4030
9.15	3774.63	3937.84	4.3241	693.11	81.6378	2959.85	21.5856	3514.43	6.8931
MAPE (%)			1.5876		39.7817		38.0502		6.5799
9.16	3938.29	3958.63	0.5166	545.27	86.1547	3150.50	20.0032	3503.15	11.0490
9.17	4075.17	4014.08	1.4989	359.02	91.1899	2204.65	45.9003	3704.94	9.0851
9.18	4089.40	4029.11	1.4743	164.83	95.9694	2122.53	48.0967	3761.25	8.0245
9.19	4259.25	4045.68	5.0143	−41.17	100.9665	1944.76	54.3403	3841.10	9.8173
9.20	4235.67	4082.26	3.6217	−290.46	106.8575	914.77	78.4031	4124.05	2.6351
9.21	4141.00	4093.40	1.1495	−575.02	113.8860	142.11	96.5683	4380.08	5.7735
9.22	4383.00	4070.69	7.1256	−868.60	119.8174	122.55	97.2040	4473.19	2.0578
MAPE (%)			2.9144		102.1202		62.9308		6.9203

Table 7 shows that the NGM(1,N) model predicts negative numbers of newly confirmed cases, which is impossible in practice. The GM(1,N) values show a downward trend that does not match the course of the epidemic. Both models therefore perform poorly in this case. By contrast, the MAPEs of D-SIRGM(1,N) and GMVM(1,N) are both below 10%, with D-SIRGM(1,N) the lower. To better compare GMVM(1,N) and D-SIRGM(1,N), their results from Table 7 are plotted in Fig. 10.

Fig. 10a

Comparison of D-SIRGM(1,N) and GMVM(1,N) for four weeks of data in the UK.

On the whole, both models simulate and predict well. Both curves lie near the actual data curve, with D-SIRGM(1,N) the more consistent with the original data. In the simulation stage, the D-SIRGM(1,N) curve is largely consistent with the actual data curve. By contrast, the growth rate of the GMVM(1,N) curve increases over time, so it is flat in the early stage and rises sharply in the late stage. Fig. 10b shows that D-SIRGM(1,N) is closer to the trend of the actual data and has a small and stable error.

Fig. 10b

Comparison of APE of each model for four weeks of data in the UK.

Based on the three UK cases, D-SIRGM(1,N) not only closely simulates the trend and numerical characteristics of large-valued data, but also simulates and predicts well with different amounts of modeling data. With two, three and four weeks of modeling data, the simulation error (MAPEs) is 1.1472%, 0.9585% and 1.5876%, respectively, and the prediction error (MAPEp) is between about 2% and 3% (2.2365%, 1.9812% and 2.9144%). That is, regardless of the size of the modeling sample, the model simulates stably and predicts well.

Application summary

To illustrate the differences between the D-SIRGM(1,N) model and the comparison models, all the results of four models from six cases in Cuba and the UK are presented in Table 8 below.

Table 8

Simulation and prediction errors of each model in each case.

MAPE (%)	Model	Cuba			UK
		Case1	Case2	Case3	Case1	Case2	Case3
MAPEs	D-SIRGM(1,N)	2.2240	3.6644	3.1271	1.1472	0.9585	1.5876
	NGM(1,N)	15.2468	58.0217	22.7574	34.6057	43.9110	39.7817
	GM(1,N)	5.2711	35.4833	8.7918	9.6313	6.3209	38.0502
	GMVM(1,N)	2.9091	22.7465	9.5640	4.4820	1.1176	6.5799

MAPEp	D-SIRGM(1,N)	6.5717	3.0890	3.4351	2.2365	1.9812	2.9144
	NGM(1,N)	40.8028	126.7733	33.1608	66.6314	79.8471	102.1202
	GM(1,N)	10.7487	22.8754	36.4344	29.7520	32.5949	62.9308
	GMVM(1,N)	11.1763	80.6333	43.7945	16.5058	5.3926	6.9203

As Table 8 shows, the D-SIRGM(1,N) model has the smallest simulation and prediction errors in every case, outperforming the three comparison models. On the one hand, the amount of modeling data increases from Case1 to Case3, yet the D-SIRGM(1,N) model performs well throughout. Its simulation errors are all below 5%, indicating that its simulation performance is relatively stable when data of different lengths are used for modeling. On the other hand, the value ranges of Cuba and the UK differ: the Cuban values are small and the UK values are large. The D-SIRGM(1,N) model performs well for both countries, and somewhat better for the UK. This may be because fluctuations in large-valued data are more apparent, so the differential term in the model structure can capture their changes more sensitively.

The cases in Cuba and the UK lead to the following conclusions. D-SIRGM(1,N) is relatively sensitive to the overall trend of data of different orders of magnitude and captures increases, decreases, and fluctuations well. It simulates and predicts well for data with complex changes. With two, three, and four weeks of modeling data, the simulation error of every case is below 5%, meaning that D-SIRGM(1,N) has robust simulation performance. Its prediction error is below 5% in five of the six cases (the exception is 6.5717% in Cuba Case1), indicating that models built with different data volumes all predict well and that the model is effective for epidemic prediction.

During epidemic prevention and control, the closed observation period is usually 14–28 days. If the development of the epidemic can be understood more clearly during this period, relevant departments can respond in advance and take reasonable, targeted measures based on the analysis results. D-SIRGM(1,N) simulates and predicts well for data within 14–28 days, so during the prevention and control period it can use the existing data to predict the next seven days effectively. On the one hand, relevant departments can adjust their assessment of epidemic risk levels in a timely manner based on the weekly analysis results and strengthen or relax prevention accordingly, which controls costs and avoids both the waste of resources caused by excessive prevention and the aggravation of the epidemic caused by inadequate prevention. On the other hand, people can judge the current situation from the results and reduce or increase their movement appropriately, which supports control measures during the epidemic and economic recovery measures afterwards.
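The comparison across the six cases can be checked directly against Table 8. The sketch below is illustrative only: the dictionary layout and the helper `best_model_per_case` are hypothetical names, and the numbers are copied from Table 8. It finds the model with the lowest error in each case and lists the D-SIRGM(1,N) prediction errors that are not below 5%.

```python
cases = ["Cuba Case1", "Cuba Case2", "Cuba Case3",
         "UK Case1", "UK Case2", "UK Case3"]

# Errors in percent, copied from Table 8 (one value per case, in order).
table8 = {
    "MAPEs": {
        "D-SIRGM(1,N)": [2.2240, 3.6644, 3.1271, 1.1472, 0.9585, 1.5876],
        "NGM(1,N)": [15.2468, 58.0217, 22.7574, 34.6057, 43.9110, 39.7817],
        "GM(1,N)": [5.2711, 35.4833, 8.7918, 9.6313, 6.3209, 38.0502],
        "GMVM(1,N)": [2.9091, 22.7465, 9.5640, 4.4820, 1.1176, 6.5799],
    },
    "MAPEp": {
        "D-SIRGM(1,N)": [6.5717, 3.0890, 3.4351, 2.2365, 1.9812, 2.9144],
        "NGM(1,N)": [40.8028, 126.7733, 33.1608, 66.6314, 79.8471, 102.1202],
        "GM(1,N)": [10.7487, 22.8754, 36.4344, 29.7520, 32.5949, 62.9308],
        "GMVM(1,N)": [11.1763, 80.6333, 43.7945, 16.5058, 5.3926, 6.9203],
    },
}


def best_model_per_case(errors: dict[str, list[float]]) -> list[str]:
    """Return, for each case, the model with the smallest error."""
    return [min(errors, key=lambda m: errors[m][i]) for i in range(len(cases))]


best = {metric: best_model_per_case(errs) for metric, errs in table8.items()}

# D-SIRGM(1,N) prediction errors that are not below the 5% level.
not_below_5 = [
    (case, err)
    for case, err in zip(cases, table8["MAPEp"]["D-SIRGM(1,N)"])
    if err >= 5.0
]

for metric, winners in best.items():
    print(metric, dict(zip(cases, winners)))
print("MAPEp not below 5%:", not_below_5)
```

With the values in Table 8, D-SIRGM(1,N) is the lowest-error model for both measures in all six cases, and the only prediction error at or above 5% is that of Cuba Case1.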

Conclusion

COVID-19 is highly infectious and has spread globally. The epidemic is affected by many factors, and its evolution is uncertain. Historical data spanning a long period may degrade the performance of some prediction models, which limits their use. The grey prediction model, which is widely used and suited to small-sample data, has been applied to COVID-19 prediction with some success. However, most current grey prediction models are univariate and seldom consider multiple influencing factors. Moreover, they usually model the original COVID-19 data directly and seldom consider the data characteristics of infectious diseases. Therefore, starting from the infectious disease setting, this paper proposes a new grey prediction model based on the SIR infectious disease model.

By analyzing the relationships among the parameters of the classic SIR model, a dynamic SIR differential equation is established, and the grey buffer operator is introduced to obtain a SIR grey prediction model based on the buffer operator. The Laplace transform is then used to solve the model, yielding its modeling steps and procedure. Compared with other prediction methods, the proposed model retains the structural characteristics of the SIR infectious disease model, so it better represents the trend in the number of confirmed cases and the quantitative relationship between confirmed and cured cases, and is more in line with the practical setting of infectious disease prediction. Compared with the traditional grey prediction model, the proposed structure contains the differential terms of all variables, so it better reflects how changes in the relevant variables affect the whole system and tracks data fluctuations more closely in actual calculation. The new model therefore combines the structural and computational advantages of the grey prediction model with more interpretable predictions that are consistent with the dynamics of infectious diseases.

The model is applied to COVID-19 prediction in the UK and Cuba. Compared with the classical GM(1,N), NGM(1,N) and GMVM(1,N) models, the new model has the best simulation and prediction accuracy, and its performance is relatively stable across different types of data. This also suggests that the new model is applicable to both large and small countries and can be used for epidemic prediction in different countries. The model also gives effective results with two, three and four weeks of data, indicating that it can predict effectively from historical data of different lengths, which is important for countries that need to adjust epidemic prevention and control measures reasonably and in real time.

In this paper, a new grey prediction model is proposed from the background of infectious diseases, and the buffer operator is used to preprocess the data. However, predicting COVID-19 is very complicated. Although the proposed model performs well in calculation, some problems remain for further study. In the future, the model can be further optimized in data processing and parameter optimization to improve its overall accuracy. The modeling mechanism of the model will also be studied, and the advantages of the model structure and its scope of application will be analyzed theoretically. In addition, the model structure can be generalized and applied to other areas.

CRediT authorship contribution statement

Huiming Duan: Conceptualization, Methodology, Funding acquisition, Project administration, Supervision, Writing – original draft, Editing. Weige Nie: Software, Visualization, Writing – original draft, Editing, Validation, Investigation, Formal analysis, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
