
A novel grey model based on Susceptible Infected Recovered Model: A case study of COVD-19.

Huiming Duan1, Weige Nie1.   

Abstract

The COVID-19 pandemic has lasted for nearly two years, and the global epidemic situation is still grim and growing. Therefore, it is necessary to make correct predictions about the epidemic in order to implement appropriate and effective prevention measures. This paper analyzes the classic Susceptible Infected Recovered (SIR) model to understand the significance of its characteristics and parameters, and uses the differential and difference information of the grey system to put forward a grey prediction model based on the SIR infectious disease model. The Laplace transform is used to calculate the reduction formula of the model, and the modeling steps are then obtained. The model is applied to large and small numerical cases to verify its validity for data of different orders of magnitude. Meanwhile, data of different lengths are modeled and predicted to verify the robustness of the model. Finally, the new model is compared with three classical grey prediction models. The results show that the new model is significantly superior to the comparison models, indicating that it can effectively predict the COVID-19 epidemic, is applicable to countries with populations of different magnitudes, and can carry out stable and effective simulation and prediction for data of different lengths.
© 2022 Elsevier B.V. All rights reserved.

Keywords:  Buffer operator; COVID-19; Grey prediction model; Laplace transform; SIR infectious disease

Year:  2022        PMID: 35692385      PMCID: PMC9169490          DOI: 10.1016/j.physa.2022.127622

Source DB:  PubMed          Journal:  Physica A        ISSN: 0378-4371            Impact factor:   3.778


Introduction

COVID-19 emerged at the end of December 2019. Because of its rapid spread and high infection rate, the virus swept the world in just a few months. Thanks to a correct understanding of the epidemic situation and the implementation of effective control measures, the epidemic has been brought under control in some countries that were early epidemic centers, but the global situation is still grim. According to the WHO report covering 19–25 July 2021 [1], the global number of new cases reported in that week was over 3.8 million, an 8% increase compared to the previous week. The Region of the Americas reported the largest increase in case incidence compared to the previous week, followed by the Western Pacific Region (30% and 25%, respectively). The number of deaths reported in that week increased sharply to over 69 000, a 21% increase compared to the previous week (see Fig. 1 for details). According to the report, the cumulative number of cases reported globally was nearly 194 million, and the cumulative number of deaths exceeded 4 million. The report noted that if these trends continued, the cumulative number of cases reported globally could exceed 200 million within two weeks. In fact, by August 6, 2021, the cumulative number of cases had already exceeded 200 million (see Fig. 2).
Fig. 1

COVID-19 cases reported weekly by WHO Region, and global deaths, as of 25 July 2021 (Ref. [1]).

Fig. 2

COVID-19 cases and deaths per 100 000 population reported, 19–25 July 2021 (Ref. [1]).

The Americas and Europe reported the highest weekly incidence of cases per 100 000 population, with 123.3 and 108.3 new cases per 100 000 population, respectively, for the week 19–25 July 2021. In the same week, the Americas and South-East Asia had the highest number of deaths per 100 000 population, reporting 2.8 and 1.1 new deaths per 100 000 population, respectively. The highest numbers of new cases were reported from the United States of America (500 332 new cases; 131% increase), Brazil (324 334 new cases; 13% increase), Indonesia (289 029 new cases; 17% decrease), the United Kingdom (282 920 new cases; 5% decrease), and India (265 836 new cases; similar to the previous week).

COVID-19 is not the first global pandemic in human history. In the 14th century, the plague pandemic killed 25 million Europeans. The Spanish flu pandemic of the 20th century is the deadliest pandemic in history, with an estimated 700 million cases worldwide and 40 to 50 million deaths. In the 21st century, SARS, avian flu, and H1N1 also caused significant losses to many countries and even the world. Drawing lessons from history, countries are stepping up prevention and control measures in the face of a global epidemic. In response to different epidemic situations, reasonable and practical measures can be taken to achieve epidemic control while minimizing social and economic losses and keeping daily life and society running. Therefore, as the global epidemic situation is still grim, it is of great significance to predict the epidemic situation of individual countries more accurately and to give governments a reference for formulating relevant measures in advance.

Literature review

At present, many scholars have conducted research in the field of epidemic prediction, and the prediction methods fall mainly into three categories. 1. Machine learning models. For example, Tiwari et al. [2] used machine learning methods to predict the development trend of the COVID-19 epidemic in India by analyzing the epidemic in China. Arora et al. [3] trained recurrent neural networks on COVID-19 case data from India to screen out the LSTM variant with the minimum error, and applied it to predict the epidemic in India. 2. Time series models. For example, Singh et al. [4] used discrete wavelet decomposition to decompose the death data of five countries with severe COVID-19 outbreaks into component series, and then used the ARIMA model to predict deaths in the following month. Maleki et al. [5] proposed an improved autoregressive time series model and applied it to predict the number of confirmed and recovered COVID-19 cases worldwide. 3. Combined models. For example, Yang et al. [6] proposed a dynamic SEIR model to predict the peak and scale of the epidemic in China through AI training on population migration and epidemic data. Torrealba-Rodriguez et al. [7] combined the Gompertz model and Logistic model with an Artificial Neural Network model to predict the confirmed COVID-19 cases in Mexico. In addition, Wu [8], Yousaf [9], Melin [10], and other scholars used different methods to study epidemic prediction and obtained some results, as shown in Table 1.
Table 1

Summary of references.

| Description | Ref | Methods | Application |
|---|---|---|---|
| | [1] | World Health Organization: COVID-19 weekly epidemiological update | |
| Machine learning model | [2] | Machine learning | India |
| | [3] | RNN, LSTM | India |
| | [8] | CNN | Oil |
| Time series model | [4] | ARIMA, discrete wavelet decomposition | France, Italy, Spain, UK, USA |
| | [5] | TP-SMN-AR | Global |
| | [9] | ARIMA | Pakistan |
| Combined model | [6] | A dynamic SEIR model, AI | China |
| | [7] | Gompertz and Logistic model, ANN | Mexico |
| | [10] | Fuzzy clustering, neural network | Mexico |
| | [11]–[24] | Literature on the development and application of the grey predictive model | |
| Grey predictive model | [25] | GM(1,1) model with rolling mechanism | Germany, Turkey, USA |
| | [26] | Grey fractional-order model | China |
| | [27] | Grey Verhulst model based on the rolling mechanism | China |
| | [28] | GM(1,1), NGBM(1,1), FANGBM(1,1) | Italy, Britain, USA |
| | [29] | A grey multivariable prediction model with quadratic polynomial | China |
| | [30] | A grey prediction model with modified grey action | China, Italy, Britain, Russia |
| | [31] | GM(1,1) model with the principle of self-memory | Incidence prediction of tuberculosis in China |
| | [32] | GM(1,1) model with BP model | Prediction of mumps incidence in China |
| | [33]–[37] | Relevant literature on calculation methods and comparative models cited in this paper | |

Different prevention and control measures need to be implemented at different stages of the epidemic, so the epidemic must be predicted quickly and effectively at each stage. The machine learning and combined models above can achieve good results in epidemic prediction, but they generally need a large amount of data. Time series models show good simulation performance for many kinds of data, but prevention and control measures need to be effective over a period of time, so it is essential to predict the future trend of the epidemic accurately. Although the COVID-19 pandemic has by now produced a large amount of data, the epidemic is affected by factors such as prevention and control measures and the movement of people, and historical data accumulated over time may not apply to the current situation. A model that does not need a large amount of training data is therefore required, and the grey prediction model is such a model, with a simple structure and strong applicability.

The grey prediction model was proposed by Deng [11] in 1982 and is an important part of grey theory. It includes the classic univariate grey prediction model (GM(1,1)) and the multi-variable grey prediction model (GM(1,N)). Because it has a simple structure, is easy to learn and use, and suits small samples, the grey prediction model has become a research hotspot in recent years. It has been widely used in fields such as the ecological environment [12], [13], traffic flow prediction [14], [15], and energy consumption prediction [16], [17], [18]. To improve the performance of the multi-variable grey prediction model, scholars have studied the modeling mechanism, model structure, and parameter optimization of the GM(1,N) model in depth and obtained many results. For example, Xiao et al. [19] used the Box–Cox transform to optimize the constraints of the classical NGBM(1,1) model and improved the nonlinearity of the model. Liu et al. [20] proposed a general form of the fractional grey polynomial model and used it to predict the electricity consumption of China and India effectively. Yu et al. [21] introduced the elastic network, a regularization method, and the grey wolf algorithm to optimize the grey Bernoulli model. Zhu [22] derived and solved the grey model through a new derivation method based on the convolution integral, further improving the model accuracy. Duan et al. [23] reduced the dependence of the model on data types and proposed a new multivariable Verhulst model; the application showed that the model could effectively predict coal consumption. Yan et al. [24] introduced the Hausdorff derivative into the grey fractional-order model and proved the relationship between the order and the error.

Because of its excellent and stable prediction performance on small samples, many scholars have applied the grey prediction model to COVID-19. For example, Zeynep [25] used a classical GM(1,1) model with the rolling mechanism to predict the number of confirmed cases in Germany, Turkey, and the United States. Liu et al. [26] used a grey fractional model to predict the COVID-19 epidemic in China. Zhao et al. [27] proposed a grey Verhulst model based on a rolling mechanism and used data series of different lengths to predict the epidemic in China. Utkucan et al. [28] used GM(1,1), NGBM(1,1), and FANGBM(1,1) to predict the number of confirmed COVID-19 cases in Italy, the United Kingdom, and the United States. Zhang et al. [29] proposed a grey multivariable prediction model with a quadratic polynomial to analyze and predict the early data of the COVID-19 epidemic in China. Luo et al. [30] proposed a new grey prediction model, optimized it by a genetic algorithm, and applied it to epidemic prediction in China, Italy, the UK, and Russia. The grey prediction model has thus achieved many effective results in the prediction of COVID-19. Scholars have also applied the grey prediction model to other epidemic diseases. For example, Guo et al. [31] introduced self-memory into the traditional grey prediction model and applied the model to predict the incidence of tuberculosis in China. Jia et al. [32] combined the GM(1,1) model with a BP neural network model to predict the incidence of mumps in China; the new model combines the advantages of both and performs well.

All of the above grey prediction models applied to the COVID-19 epidemic are univariate grey prediction models, which take confirmed cases as the original sequence and feed them directly into the model, so they have some limitations. First, the epidemic is related to the number of people cured and is affected by factors such as prevention and control measures and the movement of people, so the changing trend of each sequence affects the system. In practice, the numbers of confirmed and cured cases are two related sequences, so considering only the trend of confirmed cases is not enough and may affect the validity of the model. Second, current grey prediction models take confirmed COVID-19 cases as the original sequence and apply them directly, ignoring the principles of infectious disease transmission and the influence of confirmed and cured cases on the entire infectious disease system; this is why most of them are univariate models. Finally, the existing grey multi-variable models take the main sequence as the main research object, while the other sequences generally enter linearly, which may affect the description of the system and limits the development of the multivariable grey prediction model.

Based on current research, this paper first analyzes the classical infectious disease model and establishes the SIR infectious disease differential equation according to the model characteristics. Through the relationship between differential and difference, a dynamic grey prediction model based on the SIR model is proposed, and the model is then calculated and verified on practical cases. The contributions of this paper are as follows:

(1) Starting from the SIR infectious disease model, the relationship between the number of confirmed cases and cured cases is studied, together with the structural characteristics of the SIR model, and a dynamic differential equation with multiple variables is established.

(2) Using the grey difference information of differentiation and difference, a novel multivariate grey prediction model based on SIR is established and solved by the Laplace transform. The properties of the model are then studied and the modeling steps are obtained.

(3) The new model contains the time derivatives of both the number of diagnosed patients and the number of cured patients, establishes the dynamic relationship between the two variables, and analyzes the whole system, so that the grey prediction model is no longer dominated by a single sequence. The roles of the main sequence and the related sequences are analyzed in essence, which extends grey system theory. At the same time, a buffer operator is introduced to preprocess the original data, which reduces the interference of irregular oscillations in the data.

(4) The new model is applied to practical cases of the UK and Cuba to verify its validity for data of different orders of magnitude. At the same time, the robustness of the model is verified by modeling and predicting with data of different lengths. In six cases, the proposed model shows high prediction and simulation accuracy, clearly superior to the comparison models.

The remaining sections are organized as follows: Section 3 establishes the SIR grey dynamic prediction model based on the buffer operator, and proves and discusses the properties and modeling mechanism of the model. Section 4 applies the model to practical cases. Section 5 is the conclusion.

A novel grey prediction model based on SIR

This section first introduces the classic SIR infectious disease model and establishes the dynamic differential equation model by analyzing the relationship among the three variables S, I, and R. Then, according to the grey difference information and the principle of differential and difference, the grey prediction model based on SIR (the SIRGM(1,N) model) is established. At the same time, considering the volatility of the original data, the average weakening buffer operator (AWBO) [35] is used to reduce the interference in the original data, giving the SIR grey prediction model based on the buffer operator (the D-SIRGM(1,N) model). The related properties of the model are studied, and the modeling process is obtained. Fig. 3 is the flow chart of the modeling ideas and methods (see also Fig. 4).
Fig. 3

The flow chart of modeling ideas and methods.

Fig. 4

The flow chart of D-SIRGM(1,N) model.


A classic SIR infectious disease model

The SIR model of infectious diseases was proposed by Kermack and McKendrick [33] in 1932. It is the most classic and basic of the infectious disease models and has made a foundational contribution to the study of infectious disease dynamics. In this model, the population is divided into three categories: Susceptible (S), Infective (I), and Removed (R). An infectious disease with a short onset time and short duration can be treated with a short-term model that ignores the birth rate and death rate, so the total number of people remains unchanged in the short term. Therefore, let the total number of people be a constant $N$ for any time $t$:

$$S(t) + I(t) + R(t) = N \qquad (1)$$

The differential equations of the SIR model are as follows:

$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I \qquad (2)$$

where $\beta$ represents the probability of infection of susceptible persons, and $\gamma$ represents the probability that an infected person recovers after a certain period of time, becomes immune to the disease, and is not re-infected.

For some long-term infectious diseases, the birth and death of the population must be considered, so the disease mortality rate and the natural birth and death rates are introduced as additional parameters. Assuming that there is no vertical transmission, all newborns are susceptible, and the extended model, Eq. (3), is obtained from Eqs. (1) and (2). Eq. (4) follows from Eq. (3). In an infectious disease model, the most noteworthy quantity is the development trend of the disease, namely the change in infected cases, which leads from Eq. (4) to Eq. (5). To better capture the quantitative relationship between S, I, and R, the differential equation Eq. (6) is given based on the structural characteristics of the SIR model in Eq. (5).

In practice, it is impossible to test everyone every day to obtain completely accurate numbers of infected and cured people. Therefore, in the actual calculation, I in this model is replaced by the number of confirmed cases, R by the number of cured cases, and S is the remaining number, which by Eq. (1) is $S = N - I - R$.

When an infectious disease occurs, countries are usually taken as the unit of analysis, and the population of a country is usually tens or hundreds of millions. According to the above formula, when the total population is large, S is large relative to I and R, and its change is mainly driven by I and R. If S were included, its value would be of a different order of magnitude from the other two variables, which could strongly distort the model results. Moreover, since S is mainly determined by the other two variables, I and R can represent the whole S, I, R system to a large extent. Therefore, the above model is simplified to Eq. (7), which involves only I and R.
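To make the behaviour of the classic short-term SIR model concrete, the following minimal sketch integrates the standard SIR equations with a simple forward Euler scheme. It assumes the textbook form dS/dt = −βSI, dI/dt = βSI − γI, dR/dt = γI with S, I, R expressed as population fractions (N = 1). The function name `simulate_sir`, the parameter values, and the step size are illustrative assumptions and are not taken from the paper.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=100):
    """Forward-Euler integration of the classic SIR model.

    Assumed form (fractions of the population, so S + I + R = 1):
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
        dR/dt =  gamma * I
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Illustrative (hypothetical) parameters: 1% initially infected.
trajectory = simulate_sir(0.99, 0.01, 0.0, beta=0.4, gamma=0.1, days=60)
for day in (0, 15, 30, 60):
    s, i, r = trajectory[day]
    print(f"day {day:2d}: S={s:.3f} I={i:.3f} R={r:.3f} total={s + i + r:.3f}")
```

Whatever leaves S enters I and whatever leaves I enters R, so the total stays constant at every step, which is the conservation property stated in Eq. (1).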

Establishment of grey prediction model based on SIR

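Set the original sequences of newly cured and newly diagnosed cases as follows. Assume that the observation period is $n$ days, that $k = 1, 2, \ldots, n$ is the day of observation, and that the $k$-th terms of the two sequences are the numbers of newly cured and newly confirmed patients on day $k$, respectively. The sums of the two sequences over the period are then the total number of people cured and the total number of people diagnosed.

For an original sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n))$, the 1-AGO sequence [34] is $X^{(1)} = (x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n))$, where

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \qquad k = 1, 2, \ldots, n.$$

The adjacent mean-generating sequence [34] of $X^{(1)}$ is $Z^{(1)} = (z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n))$, where

$$z^{(1)}(k) = \tfrac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right), \qquad k = 2, 3, \ldots, n.$$

Applying these sequences to Eq. (7) gives Eq. (14). Replacing the differential terms in Eq. (14) with first-order differences over a unit time step, for both the cured and the confirmed sequences, leads to the following definition of the grey prediction model: with the original sequences and 1-AGO sequences as in Definition 3.1 and the adjacent mean-generating sequences as in Definition 3.2, the resulting difference equation is the grey prediction model based on SIR (SIRGM(1,N)), and Eq. (14) is the whitening equation of this model.

The related properties of SIRGM(1,N) are given by the following theorems.

Theorem 3.1 states that the least-squares estimate of the model parameters satisfies the usual grey-model form $(B^{T}B)^{-1}B^{T}Y$, where $B$ and $Y$ are the data matrix and the observation vector built from the difference equation. The validity of Theorem 3.1 is easy to prove by substituting the data into the difference equation and applying least-squares estimation; see [13].

Theorem 3.2 states that, with the parameters as described in Theorem 3.1, the SIRGM(1,N) model has the following conclusions: (1) the time response equation of the 1-AGO sequence; (2) the reducing (restored) value of the original sequence.

To prove Theorem 3.2, the Laplace transform and the convolution theorem are given as lemmas.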
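The sequence operations used above can be sketched in a few lines. The block below implements the standard grey-system definitions of the 1-AGO sequence, the adjacent mean-generating sequence, and the inverse accumulation used to restore values of the original sequence. The function names and the sample data are illustrative assumptions; only the definitions themselves are standard.

```python
from itertools import accumulate


def ago(x0):
    """1-AGO: x1(k) = x0(1) + x0(2) + ... + x0(k)."""
    return list(accumulate(x0))


def adjacent_mean(x1):
    """Adjacent mean-generating sequence: z1(k) = 0.5 * (x1(k) + x1(k-1)), k >= 2."""
    return [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]


def inverse_ago(x1):
    """Restore the original sequence: x0(1) = x1(1), x0(k) = x1(k) - x1(k-1)."""
    return [x1[0]] + [x1[k] - x1[k - 1] for k in range(1, len(x1))]


# Hypothetical daily counts, used only to illustrate the operations.
daily_new_cases = [3, 5, 4, 6]
cumulative = ago(daily_new_cases)          # [3, 8, 12, 18]
background = adjacent_mean(cumulative)     # [5.5, 10.0, 15.0]
restored = inverse_ago(cumulative)         # [3, 5, 4, 6]
print(cumulative, background, restored)
```

Accumulation smooths the randomness of daily counts, which is why grey models are fitted to the 1-AGO sequence and then differenced back to the original scale.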

Laplace Transform

For $t \ge 0$, a continuous time function $f(t)$ whose value is not identically zero can be transformed through

$$F(s) = \mathcal{L}[f(t)] = \int_{0}^{\infty} f(t)\, e^{-st}\, dt$$

into a function $F(s)$ of the complex variable $s$, where $\mathcal{L}$ is the operator denoting the Laplace transform of $f(t)$, and $F(s)$ is the result of the Laplace transform of $f(t)$. Conversely, the inverse Laplace transform reverses this process, and its operator is $\mathcal{L}^{-1}$.

Convolution Theorem

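For continuous time functions $f(t)$ and $g(t)$ that admit Laplace transforms, denote their transforms by $F(s) = \mathcal{L}[f(t)]$ and $G(s) = \mathcal{L}[g(t)]$. The convolution of $f(t)$ and $g(t)$ is defined as

$$f(t) * g(t) = \int_{0}^{t} f(\tau)\, g(t - \tau)\, d\tau .$$

Then the Laplace transform of $f(t) * g(t)$ exists, and

$$\mathcal{L}[f(t) * g(t)] = F(s)\, G(s).$$

The proof of Theorem 3.2 proceeds as follows. Denote the Laplace transforms of the 1-AGO sequences appearing in Eq. (14). From Eq. (19), the transforms of the individual terms are obtained, Eqs. (21)–(24). Taking the Laplace transform of Eq. (14) and substituting Eqs. (21)–(24) gives Eq. (25), and taking the inverse transform of Eq. (25) gives Eq. (26). One term in Eq. (26) cannot be computed directly by the inverse Laplace transform, so the convolution theorem (Lemma 3.2) is used to compute it, which gives Eq. (27). Substituting Eq. (27) into Eq. (26) gives the solution in terms of an initial value. Because the data sequences are indexed from $k = 1$, the initial values in the above process are replaced by the first observations, and the solution of the whitening differential equation, Eq. (14), is Eq. (29).

The integral term in Eq. (29) contains discrete values, so it cannot be integrated directly; the trapezoidal formula is used for an approximate calculation, Eq. (30). Substituting Eq. (30) into Eq. (29) and taking the first value of the 1-AGO sequence as the initial value gives the time response formula, Eq. (32). According to Eq. (32), the reductive formula of SIRGM(1,N) is obtained by differencing adjacent values of the time response sequence. Theorem 3.2 is proved.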
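The trapezoidal step in the proof can be illustrated with a small sketch. The exact integrand of Eq. (29) is not recoverable here, so the block below assumes a generic convolution of an exponential kernel with a discretely sampled driving term, that is, the integral of exp(−a(t − τ)) f(τ) over τ from 1 to t, which is the form that typically arises when a first-order whitening equation is solved by the convolution theorem. The kernel, the name `trapezoid_convolution`, and the test values are all assumptions made for illustration.

```python
import math


def trapezoid_convolution(a, f, t):
    """Approximate the integral of exp(-a * (t - tau)) * f(tau) for tau in [1, t].

    `f` holds samples f(1), f(2), ... at indices 0, 1, ... and `t` is an
    integer >= 1. Each unit interval [tau - 1, tau] contributes the average
    of the integrand at its two end points (trapezoidal rule, step size 1).
    """
    total = 0.0
    for tau in range(2, t + 1):
        right = math.exp(-a * (t - tau)) * f[tau - 1]
        left = math.exp(-a * (t - tau + 1)) * f[tau - 2]
        total += 0.5 * (left + right)
    return total


# Sanity check with a constant driving term, where the integral is known in
# closed form: (1 - exp(-a * (t - 1))) / a.
samples = [1.0] * 10
approx = trapezoid_convolution(0.1, samples, 10)
exact = (1.0 - math.exp(-0.1 * 9)) / 0.1
print(f"trapezoid={approx:.4f} exact={exact:.4f}")
```

With unit spacing the rule is only a second-order approximation, but it needs nothing beyond the observed daily values, which is why it suits discrete epidemic data.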

A SIR grey prediction model based on buffer operator

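Let the original sequence be $X = (x(1), x(2), \ldots, x(n))$ and the buffer sequence be $XD = (x(1)d, x(2)d, \ldots, x(n)d)$, where

$$x(k)d = \frac{1}{n-k+1}\left[x(k) + x(k+1) + \cdots + x(n)\right], \qquad k = 1, 2, \ldots, n.$$

Then, whether $X$ is a monotonically increasing sequence, a monotonically decaying sequence, or an oscillating sequence, D is a weakening buffer operator, and D is called the average weakening buffer operator (AWBO) [35]. The buffer operator is applied to the original sequences to obtain the buffer sequences, and the SIRGM(1,N) model built on the buffer sequences is the new SIR grey prediction model based on the buffer operator (D-SIRGM(1,N)). According to Theorem 3.2, the time response formula and the reductive formula of D-SIRGM(1,N) have the same form as those of SIRGM(1,N), with the original sequences replaced by the buffer sequences.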
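As a small illustration of the buffer operator, the sketch below computes the AWBO sequence under its standard definition: each buffered value is the mean of the current value and all later values, so the last (newest) value is left unchanged. The function name `awbo` and the sample data are illustrative assumptions.

```python
def awbo(x):
    """Average weakening buffer operator.

    x(k)d = (x(k) + x(k+1) + ... + x(n)) / (n - k + 1), k = 1..n,
    computed with a running tail sum from the end of the sequence.
    """
    n = len(x)
    buffered = []
    tail_sum = 0.0
    for k in range(n - 1, -1, -1):
        tail_sum += x[k]
        buffered.append(tail_sum / (n - k))
    buffered.reverse()
    return buffered


# Hypothetical oscillating daily counts.
raw = [10, 20, 60]
print(awbo(raw))  # [30.0, 40.0, 60.0]
```

The buffered sequence is smoother than the raw one while its final term equals the latest observation, which matches the principle of giving priority to new information.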

Steps and flowcharts of the model

Given the buffered original sequence and the corresponding model output sequence, where the first m terms of the output are simulated values and the last n−m terms are predicted values, three accuracy measures are used: the absolute percentage error (APE) of a single value, the mean absolute percentage error of simulation (MAPEs) over the first m terms, and the mean absolute percentage error of prediction (MAPEp) over the last n−m terms. According to the above definitions and theorems, the calculation steps of the D-SIRGM model are obtained.
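The error measures can be sketched as follows. APE is taken as |fitted − actual| / actual × 100, and each MAPE is the mean of the relevant APE values. The check below uses the D-SIRGM(1,N) APE column reported in Table 2; averaging the simulation APEs without the first point (which is fitted exactly) reproduces the reported 2.2240%, so the sketch assumes that convention. The function names are illustrative and this is not the authors' code.

```python
def ape(actual, fitted):
    """Absolute percentage error of a single value, in percent."""
    return abs(fitted - actual) / actual * 100.0


def mape(ape_values):
    """Mean of a list of APE values, in percent."""
    return sum(ape_values) / len(ape_values)


# D-SIRGM(1,N) APE values (%) as reported in Table 2 for Cuba, two-week case.
simulation_ape = [0.0000, 1.6380, 1.9724, 2.0118, 1.8164, 2.0539, 4.1559,
                  1.5630, 3.3822, 2.5564, 2.7274, 1.1921, 1.6513, 2.1907]
prediction_ape = [2.0025, 2.8362, 1.1282, 7.7300, 10.5151, 18.1619, 3.6280]

# Assumption inferred from the table: the initial point is excluded from MAPEs.
mape_s = mape(simulation_ape[1:])
mape_p = mape(prediction_ape)
print(f"MAPEs={mape_s:.4f}%  MAPEp={mape_p:.4f}%")

# Single-value example: 9.5 in Table 2 (buffer 56.80, model value 57.73).
print(f"APE={ape(56.80, 57.73):.2f}%")
```

Because the tabulated values are rounded to two decimals, an APE recomputed from them can differ from the reported figure in the third decimal place.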

Application of D-SIRGM model in COVID-19 in Cuba and the UK

This section applies the D-SIRGM(1,N) model to COVID-19 prediction. First, to verify the effectiveness of the model for data of different magnitudes, two typical countries, Cuba and the UK, are selected for numerical experiments, representing small and large numerical samples, respectively. Second, to verify that the model has stable simulation and prediction performance for both short and long data series, the data selected for each country are modeled with different lengths. Cases 1, 2, and 3 are modeled with two, three, and four weeks of data, respectively, and then tested with the data of the following week. Finally, the results obtained by D-SIRGM(1,N) are compared with the classic multi-variable grey models GM(1,N) [35], NGM(1,N) [36], and GMVM(1,N) [37].

Data interpretation

As COVID-19 is circulating around the world, it is necessary to build a prediction model to predict and prevent it. Such a model needs to be general enough to adapt to the data of different countries. Therefore, one country with small case numbers and one with large case numbers are selected in this paper to verify whether the proposed model works for data of different levels and whether it can effectively reflect data characteristics and trend changes. Cuba has a population of 11.4 million, and the UK has a population of 66 million. Both countries enjoy social stability and sound medical systems, so the epidemic is hardly affected by other unstable factors and the data better reflect its actual trend. In addition, the number of newly confirmed cases in Cuba and the UK in the second half of 2020 ranged from tens to tens of thousands. Therefore, the epidemic data of these two countries in the second half of 2020 were selected as numerical cases, with data taken from the Global COVID-19 Big Data Platform (https://www.zq-ai.com/#/fe/xgfybigdata).

In the modeling process of the grey prediction model, the original data must be greater than zero, so zero values should be removed before modeling. In reality, however, because information is not always uploaded in time or because the epidemic is relatively stable, many daily case counts are zero. Therefore, periods in which the epidemic information is relatively continuous are selected as original data to reduce the influence of zero values. In addition, because the epidemic data are reported by each region in turn, the immediacy and continuity of the information cannot be ensured, so there are irregular oscillations in the original data, which affect the performance of the model. Therefore, in this paper, the average weakening buffer operator is used to make the sequence smoother. It follows the principle of new information first, that is, the latest information remains unchanged under the action of the buffer operator, which effectively reduces the interference in the original data.
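The data selection described above can be sketched as a two-step procedure: drop days with a zero count, then take the first 7 × weeks remaining observations for modeling and the next 7 for testing. This matches the cases reported later (for example, 14 modeling points and 7 test points with one zero day skipped), but the exact procedure used by the authors is not stated, so the function names and the synthetic series below are assumptions made for illustration.

```python
def drop_zeros(series):
    """Keep only (day, value) pairs with a strictly positive value."""
    return [(day, value) for day, value in series if value > 0]


def split_case(series, weeks, test_points=7):
    """Split a daily series into modeling and test parts after zero removal.

    The modeling part holds the first 7 * weeks nonzero observations and the
    test part holds the next `test_points` nonzero observations.
    """
    cleaned = drop_zeros(series)
    n_model = 7 * weeks
    if len(cleaned) < n_model + test_points:
        raise ValueError("not enough nonzero observations for this case")
    return cleaned[:n_model], cleaned[n_model:n_model + test_points]


# Synthetic series of 22 days with a zero count on day 21 (hypothetical data).
series = [(day, 0 if day == 21 else 10 + day) for day in range(1, 23)]
model_part, test_part = split_case(series, weeks=2)
print(len(model_part), len(test_part))    # 14 7
print([day for day, _ in test_part])      # [15, 16, 17, 18, 19, 20, 22]
```

Counting observations rather than calendar days keeps every case at the same sample size even when zero days are skipped.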

Application of D-SIRGM model in COVID-19 prediction in Cuba

In the case of Cuba, the daily numbers of new diagnoses and new cures from 5 July to 25 September 2020 were used. The quarantine observation period is usually 14–28 days during an epidemic, so in the following numerical examples, data covering 14–28 days are used for modeling in 7-day steps. Data on newly diagnosed and cured cases over two, three, and four weeks were selected as modeling data to predict the newly diagnosed cases in the following week. The results of the D-SIRGM(1,N) model were then compared with those of GM(1,N), NGM(1,N), and GMVM(1,N).

Case 1 Simulate two weeks and predict one week

In this case, the 14 data points in Cuba from September 4 to September 17 were selected for modeling, and the 7 data points from September 18 to September 25 were used for testing. The number of confirmed cases on September 24 was zero, so that day was not included in the calculation. The model parameters and the time response formula were obtained through calculation, and the results were compared with the NGM(1,N), GM(1,N), and GMVM(1,N) models. The specific data are shown in Table 2.
Table 2

Simulation and prediction results of each model for two-week data in Cuba.

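| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 9.4 | 58.29 | 58.29 | 0.0000 | 58.29 | 0.0000 | 58.29 | 0.0000 | 58.29 | 0.0000 |
| 9.5 | 56.80 | 57.73 | 1.6380 | 57.06 | 0.4639 | 47.44 | 16.4864 | 57.21 | 0.7251 |
| 9.6 | 57.05 | 58.18 | 1.9724 | 56.56 | 0.8708 | 63.08 | 10.5663 | 58.12 | 1.8694 |
| 9.7 | 58.44 | 59.62 | 2.0118 | 56.08 | 4.0376 | 58.70 | 0.4450 | 58.99 | 0.9322 |
| 9.8 | 61.24 | 60.12 | 1.8164 | 55.50 | 9.3650 | 60.15 | 1.7721 | 59.80 | 2.3382 |
| 9.9 | 62.38 | 61.09 | 2.0539 | 54.87 | 12.0282 | 60.66 | 2.7445 | 60.57 | 2.8965 |
| 9.10 | 64.87 | 62.17 | 4.1559 | 54.21 | 16.4255 | 60.99 | 5.9694 | 61.28 | 5.5292 |
| 9.11 | 63.64 | 62.65 | 1.5630 | 53.42 | 16.0583 | 63.51 | 0.2137 | 61.89 | 2.7482 |
| 9.12 | 61.46 | 63.54 | 3.3822 | 52.57 | 14.4738 | 64.57 | 5.0603 | 62.44 | 1.5989 |
| 9.13 | 63.08 | 64.70 | 2.5564 | 51.68 | 18.0822 | 64.78 | 2.6892 | 62.94 | 0.2332 |
| 9.14 | 63.36 | 65.09 | 2.7274 | 50.63 | 20.0901 | 67.72 | 6.8706 | 63.25 | 0.1809 |
| 9.15 | 66.60 | 65.81 | 1.1921 | 49.48 | 25.7005 | 69.52 | 4.3887 | 63.47 | 4.7034 |
| 9.16 | 69.33 | 68.19 | 1.6513 | 48.48 | 30.0808 | 65.58 | 5.4198 | 63.88 | 7.8674 |
| 9.17 | 68.38 | 69.87 | 2.1907 | 47.50 | 30.5316 | 64.34 | 5.8982 | 64.14 | 6.1952 |
| MAPEs (%) | | | 2.2240 | | 15.2468 | | 5.2711 | | 2.9091 |
| 9.18 | 67.71 | 69.07 | 2.0025 | 46.16 | 31.8305 | 71.90 | 6.1766 | 63.67 | 5.9764 |
| 9.19 | 69.50 | 71.47 | 2.8362 | 44.95 | 35.3187 | 68.12 | 1.9824 | 63.86 | 8.1101 |
| 9.20 | 69.20 | 69.98 | 1.1282 | 43.27 | 37.4679 | 78.12 | 12.8869 | 62.65 | 9.4614 |
| 9.21 | 73.75 | 68.05 | 7.7300 | 41.05 | 44.3364 | 89.29 | 21.0726 | 60.72 | 17.6641 |
| 9.22 | 69.67 | 76.99 | 10.5151 | 39.96 | 42.6370 | 62.39 | 10.4416 | 63.39 | 9.0078 |
| 9.23 | 64.00 | 75.62 | 18.1619 | 38.39 | 40.0089 | 72.58 | 13.4044 | 61.50 | 3.9134 |
| 9.25 | 80.00 | 77.10 | 3.6280 | 36.78 | 54.0200 | 72.58 | 9.2765 | 60.72 | 24.1006 |
| MAPEp (%) | | | 6.5717 | | 40.8028 | | 10.7487 | | 11.1763 |
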
As can be seen from Table 2, all models except NGM(1,N) perform well. The MAPEs of D-SIRGM(1,N) and GMVM(1,N) are close, at 2.2240% and 2.9091%, respectively. Meanwhile, the MAPEp values of these three models are about 10% or lower, with D-SIRGM(1,N) the lowest at 6.5717%. Overall, the performance difference between the models is not obvious from the table alone, so Fig. 5 is drawn to compare and analyze them in more detail.
Fig. 5b

Comparison of APE of each model for two weeks of data in Cuba.

As shown in Fig. 5a, the NGM(1,N) model shows a monotonically decreasing trend, contrary to the actual data, so it performs worst among the comparison models. The discussion therefore focuses on the remaining three models. The curve of GMVM(1,N) lies almost entirely below the actual data and shows a slight downward trend in the prediction part, so its prediction accuracy is relatively low. The curves of D-SIRGM(1,N) and GM(1,N) fluctuate around the actual data curve, consistent with the changing trend of the real data. However, the fluctuation range of the GM(1,N) model is relatively large, so the simulation and prediction performance of D-SIRGM(1,N) is better.
Fig. 5a

Comparison of the results of each model for two weeks of data in Cuba.

Case 2 Simulate three weeks and predict one week

In this case, three weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. The 21 data points from 5 July to 2 August in Cuba were selected for modeling, and the 7 data points from 4 August to 11 August were used for testing. The results were calculated in the same way and compared with the comparison models above. The specific results are shown in Table 3.
Table 3

Simulation and prediction results of each model for three-week data in Cuba.

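| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 7.5 | 24.18 | 24.18 | 0.0000 | 24.18 | 0.0000 | 24.18 | 0.0000 | 24.18 | 0.0000 |
| 7.6 | 24.96 | 24.26 | 2.8074 | 24.12 | 3.3588 | 11.10 | 55.5469 | 24.19 | 3.1051 |
| 7.8 | 25.62 | 25.13 | 1.9037 | 23.33 | 8.9301 | 23.61 | 7.8369 | 25.08 | 2.0734 |
| 7.9 | 25.88 | 26.07 | 0.7192 | 22.48 | 13.1526 | 32.61 | 25.9886 | 26.14 | 1.0069 |
| 7.11 | 26.79 | 27.02 | 0.8364 | 21.55 | 19.5601 | 39.67 | 48.0813 | 27.39 | 2.2412 |
| 7.13 | 27.22 | 28.08 | 3.1545 | 20.57 | 24.4362 | 44.05 | 61.8628 | 28.81 | 5.8691 |
| 7.14 | 28.09 | 29.26 | 4.1556 | 19.53 | 30.4811 | 46.21 | 64.5019 | 30.38 | 8.1627 |
| 7.16 | 29.24 | 30.48 | 4.2587 | 18.42 | 37.0137 | 48.25 | 65.0387 | 32.20 | 10.1165 |
| 7.17 | 30.40 | 31.73 | 4.3868 | 17.21 | 43.3901 | 50.62 | 66.5238 | 34.34 | 12.9476 |
| 7.19 | 31.68 | 33.27 | 5.0094 | 15.96 | 49.6191 | 48.16 | 51.9991 | 36.28 | 14.4935 |
| 7.20 | 33.39 | 34.79 | 4.1843 | 14.62 | 56.2178 | 49.18 | 47.3008 | 38.80 | 16.1990 |
| 7.21 | 35.29 | 36.38 | 3.0762 | 13.17 | 62.6907 | 50.20 | 42.2325 | 41.66 | 18.0452 |
| 7.22 | 37.31 | 38.18 | 2.3145 | 11.63 | 68.8190 | 48.75 | 30.6590 | 44.43 | 19.0717 |
| 7.23 | 38.93 | 40.02 | 2.7959 | 9.98 | 74.3569 | 49.44 | 26.9980 | 47.87 | 22.9515 |
| 7.25 | 41.43 | 42.09 | 1.6042 | 8.24 | 80.1144 | 47.64 | 14.9881 | 51.06 | 23.2367 |
| 7.26 | 44.38 | 44.17 | 0.4776 | 6.35 | 85.7007 | 49.55 | 11.6341 | 55.52 | 25.0845 |
| 7.28 | 45.92 | 46.44 | 1.1425 | 4.32 | 90.5866 | 49.56 | 7.9345 | 59.93 | 30.5198 |
| 7.29 | 44.64 | 48.76 | 9.2470 | 2.13 | 95.2290 | 51.86 | 16.1742 | 65.65 | 47.0850 |
| 7.30 | 45.80 | 51.13 | 11.6381 | −0.26 | 100.5688 | 55.73 | 21.6844 | 72.89 | 59.1391 |
| 7.31 | 49.89 | 53.56 | 7.3599 | −2.87 | 105.7448 | 60.10 | 20.4596 | 81.55 | 63.4731 |
| 8.2 | 54.75 | 55.96 | 2.2161 | −5.73 | 110.4641 | 66.92 | 22.2203 | 93.13 | 70.1076 |
| MAPEs (%) | | | 3.6644 | | 58.0217 | | 35.4833 | | 22.7465 |
| 8.4 | 57.14 | 58.73 | 2.7843 | −8.78 | 115.3613 | 66.09 | 15.6655 | 101.86 | 78.2630 |
| 8.5 | 62.67 | 61.37 | 2.0710 | −12.12 | 119.3348 | 73.21 | 16.8246 | 117.22 | 87.0482 |
| 8.7 | 64.00 | 64.73 | 1.1463 | −15.58 | 124.3495 | 64.72 | 1.1240 | 122.12 | 90.8108 |
| 8.8 | 67.75 | 68.16 | 0.6090 | −19.28 | 128.4598 | 64.28 | 5.1178 | 133.17 | 96.5672 |
| 8.9 | 72.33 | 72.36 | 0.0350 | −23.09 | 131.9239 | 52.25 | 27.7688 | 132.80 | 83.5993 |
| 8.10 | 79.00 | 76.88 | 2.6890 | −27.08 | 134.2820 | 45.51 | 42.3865 | 135.71 | 71.7873 |
| 8.11 | 93.00 | 81.57 | 12.2883 | −31.34 | 133.7018 | 45.35 | 51.2410 | 145.41 | 56.3576 |
| MAPEp (%) | | | 3.0890 | | 126.7733 | | 22.8754 | | 80.6333 |
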
As shown in Table 3, D-SIRGM(1,N) has the smallest MAPEs and MAPEp, both below 5%, whereas the other three comparison models have poor simulation and prediction performance, with all of their errors above 20%. The results in Table 3 are plotted in Fig. 6 for further comparison. Because the poor fit of the NGM(1,N) model would obscure the other curves, it is not drawn in Fig. 6a.
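The error measures used throughout these tables can be sketched as follows. This is a minimal illustration that assumes the standard definitions: APE is the absolute deviation of a model value from the buffer data, in percent, and MAPE is the mean APE over a stage. The function names `ape` and `mape` are illustrative and not taken from the paper. The inputs are the rounded prediction-week values of D-SIRGM(1,N) from Table 3, so the result only approximately matches the reported MAPEp of 3.0890%.

```python
def ape(actual, predicted):
    """Absolute percentage error of a single value, in percent (assumed definition)."""
    return abs(predicted - actual) / abs(actual) * 100.0


def mape(actuals, predictions):
    """Mean absolute percentage error over paired observations, in percent."""
    pairs = list(zip(actuals, predictions))
    if not pairs:
        raise ValueError("at least one observation is required")
    return sum(ape(a, p) for a, p in pairs) / len(pairs)


# Prediction week of Table 3 (Cuba, 4 to 11 August), values as printed in the table.
buffer_data = [57.14, 62.67, 64.00, 67.75, 72.33, 79.00, 93.00]
d_sirgm = [58.73, 61.37, 64.73, 68.16, 72.36, 76.88, 81.57]

mape_p = mape(buffer_data, d_sirgm)
print(f"MAPEp of D-SIRGM(1,N): {mape_p:.4f}%")
```

Small differences from the tabulated APE values are expected, because the table entries were presumably computed from unrounded model outputs.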
Fig. 6a

Comparison of the results of each model for three weeks of data in Cuba.

It can be seen from Fig. 6a that the actual data curve shows a steady upward trend. D-SIRGM(1,N) and GMVM(1,N) follow this trend, but GMVM(1,N) increases exponentially, so its curve lies above the original data curve and gradually deviates from it. GM(1,N) fluctuates unstably, and its predicted part shows an obvious downward trend, which is inconsistent with the actual situation. In Figs. 6a–6b, the D-SIRGM(1,N) curve is basically consistent with the original data curve and performs well in both the simulation and prediction stages. Therefore, D-SIRGM(1,N) has good performance in this case.

Fig. 6b

Comparison of the APE of each model for three weeks of data in Cuba.

Case 3: Simulate four weeks and predict one week

In this case, four weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. The 28 data points from August 16 to September 14 in Cuba were selected for modeling, and the 7 data points from September 15 to September 21 were used for testing. The calculated results are compared with those of the above comparison models, and the specific data are shown in Table 4.
Table 4

Simulation and prediction results of each model for four-week data in Cuba.

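| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 8.16 | 54.63 | 54.63 | 0.0000 | 54.63 | 0.0000 | 54.63 | 0.0000 | 54.63 | 0.0000 |
| 8.18 | 53.68 | 53.40 | 0.5199 | 53.29 | 0.7109 | 34.17 | 36.3427 | 53.61 | 0.1150 |
| 8.19 | 53.85 | 53.84 | 0.0121 | 54.75 | 1.6692 | 60.32 | 12.0247 | 53.72 | 0.2420 |
| 8.20 | 54.16 | 54.19 | 0.0596 | 56.10 | 3.5886 | 61.86 | 14.2320 | 53.87 | 0.5277 |
| 8.21 | 53.52 | 54.47 | 1.7879 | 57.40 | 7.2508 | 59.14 | 10.5014 | 54.07 | 1.0391 |
| 8.22 | 52.53 | 54.77 | 4.2515 | 58.69 | 11.7249 | 57.58 | 9.6007 | 54.33 | 3.4259 |
| 8.23 | 53.76 | 55.05 | 2.3974 | 59.95 | 11.5198 | 55.92 | 4.0281 | 54.64 | 1.6419 |
| 8.24 | 54.43 | 55.31 | 1.6141 | 61.18 | 12.3970 | 54.96 | 0.9844 | 55.00 | 1.0512 |
| 8.25 | 54.04 | 55.58 | 2.8475 | 62.39 | 15.4664 | 55.05 | 1.8759 | 55.43 | 2.5699 |
| 8.26 | 54.77 | 55.87 | 2.0064 | 63.61 | 16.1488 | 55.52 | 1.3760 | 55.92 | 2.0953 |
| 8.27 | 55.88 | 56.21 | 0.5892 | 64.86 | 16.0648 | 56.67 | 1.4077 | 56.49 | 1.0997 |
| 8.28 | 57.58 | 56.54 | 1.8044 | 66.07 | 14.7315 | 56.50 | 1.8805 | 57.09 | 0.8482 |
| 8.30 | 55.43 | 56.84 | 2.5417 | 67.23 | 21.2787 | 56.08 | 1.1665 | 57.74 | 4.1499 |
| 8.31 | 55.27 | 57.13 | 3.3586 | 68.37 | 23.7017 | 56.21 | 1.6908 | 58.45 | 5.7553 |
| 9.1 | 55.62 | 57.40 | 3.2055 | 69.49 | 24.9380 | 56.20 | 1.0518 | 59.22 | 6.4795 |
| 9.2 | 55.45 | 57.67 | 3.9997 | 70.59 | 27.2963 | 56.36 | 1.6411 | 60.06 | 8.3147 |
| 9.3 | 56.63 | 58.01 | 2.4355 | 71.74 | 26.6816 | 58.39 | 3.1133 | 61.13 | 7.9384 |
| 9.4 | 56.39 | 58.35 | 3.4695 | 72.85 | 29.1994 | 58.00 | 2.8553 | 62.07 | 10.0837 |
| 9.5 | 54.53 | 58.76 | 7.7650 | 74.04 | 35.7840 | 60.43 | 10.8241 | 63.39 | 16.2475 |
| 9.6 | 54.69 | 59.32 | 8.4697 | 75.33 | 37.7379 | 63.36 | 15.8652 | 64.92 | 18.7145 |
| 9.7 | 56.20 | 59.82 | 6.4375 | 76.51 | 36.1464 | 61.83 | 10.0255 | 66.02 | 17.4658 |
| 9.8 | 59.43 | 60.40 | 1.6333 | 77.80 | 30.9104 | 64.77 | 8.9806 | 67.81 | 14.0988 |
| 9.9 | 60.69 | 61.02 | 0.5470 | 79.09 | 30.3071 | 65.59 | 8.0669 | 69.44 | 14.4198 |
| 9.10 | 63.67 | 61.65 | 3.1716 | 80.36 | 26.2243 | 66.03 | 3.7092 | 71.12 | 11.7144 |
| 9.11 | 62.00 | 62.39 | 0.6366 | 81.75 | 31.8611 | 69.48 | 12.0621 | 73.53 | 18.5922 |
| 9.12 | 59.00 | 63.22 | 7.1444 | 83.18 | 40.9778 | 71.02 | 20.3717 | 75.77 | 28.4265 |
| 9.13 | 60.89 | 64.03 | 5.1595 | 84.58 | 38.9127 | 71.40 | 17.2637 | 77.91 | 27.9537 |
| 9.14 | 61.00 | 65.01 | 6.5662 | 86.14 | 41.2195 | 75.91 | 24.4368 | 81.26 | 33.2186 |
| MAPEs (%) | | | 3.1271 | | 22.7574 | | 8.7918 | | 9.5640 |
| 9.15 | 65.29 | 66.12 | 1.2842 | 87.80 | 34.4849 | 79.03 | 21.0462 | 84.61 | 29.5990 |
| 9.16 | 69.17 | 66.97 | 3.1762 | 89.17 | 28.9190 | 73.13 | 5.7353 | 85.45 | 23.5442 |
| 9.17 | 67.60 | 67.62 | 0.0284 | 90.43 | 33.7735 | 71.33 | 5.5196 | 87.36 | 29.2309 |
| 9.18 | 66.25 | 68.85 | 3.9287 | 92.28 | 39.2906 | 86.04 | 29.8787 | 95.22 | 43.7282 |
| 9.19 | 69.33 | 69.97 | 0.9133 | 93.85 | 35.3599 | 80.41 | 15.9793 | 96.41 | 39.0577 |
| 9.20 | 68.50 | 72.27 | 5.5058 | 96.65 | 41.1004 | 110.74 | 61.6645 | 112.42 | 64.1190 |
| 9.21 | 86.00 | 78.08 | 9.2088 | 102.51 | 19.1972 | 185.09 | 115.2174 | 152.46 | 77.2825 |
| MAPEp (%) | | | 3.4351 | | 33.1608 | | 36.4344 | | 43.7945 |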
It can be seen from Table 4 that the MAPEs values of D-SIRGM(1,N), GM(1,N) and GMVM(1,N) are all below 10%, with D-SIRGM(1,N) the lowest. The MAPEp values of the three comparison models are all above 30%; only that of D-SIRGM(1,N) is lower. To better observe individual data points and the overall trend, the results in Table 4 are plotted in Fig. 7. Because the last-day values of the three comparison models are so large that they obscure the overall trend, they are not included in Fig. 7a.
Fig. 7a

Comparison of the results of each model for four weeks of data in Cuba.

It can be seen from Fig. 7a that GM(1,N) and GMVM(1,N) perform well in the simulation stage but increase sharply in the prediction stage. From Fig. 7b, D-SIRGM(1,N) is close to the actual data in both the simulation and prediction stages, and the actual data curve fluctuates around the D-SIRGM(1,N) curve, which indicates that D-SIRGM(1,N) fits the actual data better.

Fig. 7b

Comparison of the APE of each model for four weeks of data in Cuba.

Based on the above three Cuba cases, it can be concluded that D-SIRGM(1,N) not only fits the rising and falling trends of the data and captures their fluctuation characteristics well, but also has stable simulation and prediction performance for cases with different data volumes. When the length of the simulated data is two, three, and four weeks, the simulation error (MAPEs) is 2.2240%, 3.6644% and 3.1271%, respectively, all below 5%. Even with a small number of samples, the simulation error remains low, and the prediction error (MAPEp) becomes more stable as the sample grows (6.5717%, 3.0890%, and 3.4351%, respectively), indicating that D-SIRGM(1,N) also performs stably on small-sample data.

Application of D-SIRGM model in COVID-19 prediction in UK

In the UK cases, daily new diagnoses and new cures between 10 August and 10 November 2020 were used. Data of two, three and four weeks were selected for modeling to predict the number of newly confirmed cases in the following week. The results obtained by the D-SIRGM(1,N) model were then compared with those of GM(1,N), NGM(1,N) and GMVM(1,N).

Case 1: Simulate two weeks and predict one week

In this case, two weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. The 14 data points from October 21 to November 3 in the UK were selected for modeling, and the 7 data points from November 4 to November 10 were used for testing. The results generated by the model were compared with those of NGM(1,N), GM(1,N) and GMVM(1,N); the simulation and prediction results of each model for the two-week data in the UK are shown in Table 5.
Table 5

Simulation and prediction results of each model for two-week data in UK.

| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10.21 | 22505.95 | 22505.95 | 0.0000 | 22505.95 | 0.0000 | 22505.95 | 0.0000 | 22505.95 | 0.0000 |
| 10.22 | 22562.35 | 22491.02 | 0.3162 | 22451.98 | 0.4892 | 16919.31 | 25.0109 | 22369.54 | 0.8546 |
| 10.23 | 22343.95 | 22511.93 | 0.7518 | 24167.26 | 8.1602 | 25002.27 | 11.8973 | 22419.66 | 0.3389 |
| 10.24 | 22404.56 | 22571.07 | 0.7432 | 25772.47 | 15.0323 | 24567.40 | 9.6536 | 22415.22 | 0.0476 |
| 10.25 | 22512.82 | 22665.94 | 0.6801 | 27280.90 | 21.1794 | 24481.29 | 8.7437 | 22347.40 | 0.7348 |
| 10.26 | 22481.06 | 22741.39 | 1.1580 | 28658.04 | 27.4764 | 24012.88 | 6.8138 | 22228.90 | 1.1217 |
| 10.27 | 22660.20 | 22830.94 | 0.7535 | 29932.43 | 32.0925 | 24184.79 | 6.7281 | 22043.62 | 2.7210 |
| 10.28 | 22784.36 | 23003.37 | 0.9612 | 31172.64 | 36.8160 | 25785.91 | 13.1737 | 21734.32 | 4.6086 |
| 10.29 | 22775.38 | 23093.94 | 1.3987 | 32263.56 | 41.6598 | 24767.96 | 8.7488 | 21449.39 | 5.8220 |
| 10.30 | 22613.58 | 22949.52 | 1.4856 | 33065.74 | 46.2207 | 20349.71 | 10.0111 | 21374.49 | 5.4794 |
| 10.31 | 22571.45 | 22922.01 | 1.5531 | 33848.47 | 49.9614 | 21425.77 | 5.0758 | 20976.02 | 7.0684 |
| 11.1 | 22386.90 | 22773.72 | 1.7279 | 34476.69 | 54.0038 | 19437.65 | 13.1740 | 20785.78 | 7.1520 |
| 11.2 | 22438.56 | 22751.45 | 1.3944 | 35130.32 | 56.5623 | 21192.38 | 5.5537 | 20214.19 | 9.9132 |
| 11.3 | 22335.63 | 22780.17 | 1.9903 | 35786.18 | 60.2202 | 22474.71 | 0.6227 | 19565.05 | 12.4043 |
| MAPEs (%) | | | 1.1472 | | 34.6057 | | 9.6313 | | 4.4820 |
| 11.4 | 22818.00 | 22969.36 | 0.6633 | 36530.07 | 60.0932 | 25685.38 | 12.5663 | 18478.34 | 19.0186 |
| 11.5 | 23272.50 | 22591.07 | 2.9281 | 36788.19 | 58.0758 | 16197.99 | 30.3986 | 19414.35 | 16.5781 |
| 11.6 | 22886.20 | 22344.01 | 2.3691 | 37028.20 | 61.7927 | 16278.98 | 28.8699 | 18990.78 | 17.0208 |
| 11.7 | 22565.50 | 22029.57 | 2.3750 | 37185.71 | 64.7901 | 14881.90 | 34.0502 | 18860.28 | 16.4198 |
| 11.8 | 22313.00 | 21728.01 | 2.6217 | 37316.42 | 67.2407 | 14578.19 | 34.6650 | 18527.23 | 16.9666 |
| 11.9 | 20984.50 | 21643.25 | 3.1392 | 37598.71 | 79.1737 | 18222.74 | 13.1610 | 17126.01 | 18.3873 |
| 11.10 | 21385.00 | 21051.61 | 1.5590 | 37477.92 | 75.2533 | 9718.79 | 54.5532 | 19000.76 | 11.1491 |
| MAPEp (%) | | | 2.2365 | | 66.6314 | | 29.7520 | | 16.5058 |
As can be seen from Table 5, the MAPEs values of D-SIRGM(1,N), GM(1,N) and GMVM(1,N) are all below 10%, and that of D-SIRGM(1,N) is the lowest. However, only the MAPEp of D-SIRGM(1,N) is below 10%, at 2.2365%, showing excellent prediction performance. The MAPEp values of the three comparison models are all high, at 66.6314%, 29.7520% and 16.5058%, respectively. For a more intuitive comparison of the models, the results are plotted in Fig. 8.
Fig. 8a

Comparison of the results of each model for two weeks of data in the UK.

It can be seen from Fig. 8a that the actual data curve is relatively flat, while the NGM(1,N) curve lies entirely above it and trends upward. GM(1,N) fluctuates around the actual data curve but shows an obvious overall downward trend. D-SIRGM(1,N) and GMVM(1,N) are relatively close to the actual data curve, but GMVM(1,N) also trends clearly downward, while the D-SIRGM(1,N) curve is almost consistent with the actual data curve. It can also be seen from Fig. 8b that the error between D-SIRGM(1,N) and the original data is the smallest. Therefore, in this case, the simulation performance of D-SIRGM(1,N) is good, and its prediction performance is significantly better than that of the three comparison models.

Fig. 8b

Comparison of the APE of each model for two weeks of data in the UK.

Case 2: Simulate three weeks and predict one week

In this case, three weeks of data are selected for modeling, and the number of newly diagnosed patients in the following week is predicted. The 21 data points from October 13 to November 4 in the UK were selected for modeling, and the 7 data points from November 5 to November 11 were used for testing. The results are compared with those of the above comparison models, and the specific data are shown in Table 6.
Table 6

Simulation and prediction results of each model for three-week data in the UK.

| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10.13 | 22525.14 | 22525.14 | 0.0000 | 22525.14 | 0.0000 | 22525.14 | 0.0000 | 22525.14 | 0.0000 |
| 10.14 | 22840.04 | 22688.55 | 0.6633 | 22725.27 | 0.5025 | 17728.73 | 22.3787 | 22646.36 | 0.8480 |
| 10.16 | 22296.27 | 22616.25 | 1.4351 | 24174.05 | 8.4220 | 23126.24 | 3.7225 | 22621.10 | 1.4569 |
| 10.17 | 22428.36 | 22591.34 | 0.7266 | 25531.64 | 13.8364 | 22169.56 | 1.1539 | 22599.58 | 0.7634 |
| 10.18 | 22710.46 | 22575.14 | 0.5959 | 26802.51 | 18.0184 | 22076.18 | 2.7929 | 22582.01 | 0.5656 |
| 10.19 | 22994.17 | 22588.08 | 1.7661 | 28002.47 | 21.7807 | 22656.21 | 1.4698 | 22569.13 | 1.8485 |
| 10.21 | 22412.55 | 22578.25 | 0.7393 | 29115.49 | 29.9071 | 22637.50 | 1.0037 | 22559.72 | 0.6567 |
| 10.22 | 22461.81 | 22550.66 | 0.3956 | 30138.58 | 34.1770 | 22334.89 | 0.5650 | 22553.29 | 0.4073 |
| 10.23 | 22249.30 | 22547.94 | 1.3422 | 31094.88 | 39.7567 | 22570.88 | 1.4453 | 22552.09 | 1.3609 |
| 10.24 | 22301.74 | 22581.16 | 1.2529 | 32004.41 | 43.5064 | 23295.47 | 4.4559 | 22557.85 | 1.1484 |
| 10.25 | 22398.28 | 22638.13 | 1.0708 | 32873.06 | 46.7660 | 24100.65 | 7.6005 | 22570.46 | 0.7687 |
| 10.26 | 22361.65 | 22658.84 | 1.3290 | 33674.56 | 50.5907 | 23965.04 | 7.1703 | 22583.20 | 0.9908 |
| 10.27 | 22522.13 | 22697.93 | 0.7806 | 34425.46 | 52.8517 | 24205.43 | 7.4740 | 22603.08 | 0.3595 |
| 10.28 | 22628.80 | 22820.18 | 0.8457 | 35169.48 | 55.4191 | 25735.30 | 13.7281 | 22641.89 | 0.0578 |
| 10.29 | 22609.36 | 22826.28 | 0.9595 | 35827.41 | 58.4628 | 24699.42 | 9.2442 | 22661.56 | 0.2309 |
| 10.30 | 22447.23 | 22607.83 | 0.7155 | 36304.41 | 61.7322 | 20408.98 | 9.0802 | 22635.43 | 0.8384 |
| 10.31 | 22394.75 | 22594.90 | 0.8937 | 36776.89 | 64.2210 | 21376.23 | 4.5480 | 22672.81 | 1.2416 |
| 11.1 | 22210.91 | 22433.43 | 1.0019 | 37154.78 | 67.2817 | 19432.93 | 12.5073 | 22667.96 | 2.0578 |
| 11.2 | 22239.80 | 22434.75 | 0.8766 | 37553.93 | 68.8591 | 20999.00 | 5.5792 | 22723.05 | 2.1729 |
| 11.2 | 22126.22 | 22449.30 | 1.4602 | 37957.58 | 71.5502 | 22074.80 | 0.2324 | 22780.09 | 2.9552 |
| 11.4 | 22522.13 | 22594.29 | 0.3204 | 38417.70 | 70.5776 | 24834.15 | 10.2656 | 22887.59 | 1.6227 |
| MAPEs (%) | | | 0.9585 | | 43.9110 | | 6.3209 | | 1.1176 |
| 11.5 | 22869.43 | 22122.30 | 3.2669 | 38575.26 | 68.6761 | 16166.89 | 29.3078 | 22720.34 | 0.6519 |
| 11.6 | 22480.33 | 21946.83 | 2.3732 | 38720.47 | 72.2415 | 16136.96 | 28.2175 | 22741.96 | 1.1638 |
| 11.7 | 22142.60 | 21698.95 | 2.0036 | 38814.80 | 75.2947 | 14837.62 | 32.9906 | 22727.08 | 2.6396 |
| 11.8 | 21847.50 | 21478.00 | 1.6913 | 38890.58 | 78.0093 | 14460.39 | 33.8122 | 22735.28 | 4.0635 |
| 11.9 | 20806.67 | 21414.49 | 2.9213 | 39032.47 | 87.5960 | 16765.67 | 19.4217 | 22835.14 | 9.7491 |
| 11.10 | 20918.00 | 20948.63 | 0.1464 | 38994.83 | 86.4176 | 11316.83 | 45.8991 | 22666.93 | 8.3609 |
| 11.11 | 20451.00 | 20750.78 | 1.4658 | 38998.98 | 90.6947 | 12574.25 | 38.5152 | 22725.06 | 11.1195 |
| MAPEp (%) | | | 1.9812 | | 79.8471 | | 32.5949 | | 5.3926 |
As can be seen from Table 6, the MAPEs of GM(1,N) is 6.3209%, but its MAPEp is as high as 32.5949%, so its overall performance is poor. Both the MAPEs and the MAPEp of NGM(1,N) exceed 30%. D-SIRGM(1,N) and GMVM(1,N) have the best simulation and prediction performance among the four models, with all of their errors at or below about 5%. There is little difference between the two models in the simulation stage, while the error of D-SIRGM(1,N) is smaller in the prediction stage. To compare the two models more intuitively, their results are plotted in Fig. 9.
Fig. 9b

Comparison of the APE of each model for three weeks of data in the UK.

As can be seen in Fig. 9a, in the simulation stage the actual data fluctuate noticeably, and both GMVM(1,N) and D-SIRGM(1,N) lie mostly above the actual data curve. However, GMVM(1,N) is a nearly straight line, whereas D-SIRGM(1,N) shows slight fluctuations. In the prediction stage, the actual data show an obvious downward trend. Although GMVM(1,N) fluctuates, it deviates completely from the actual data, while D-SIRGM(1,N) follows the actual data and shows an obvious downward trend, so its prediction effect is better.

Fig. 9a

Comparison of D-SIRGM(1,N) and GMVM(1,N) for three weeks of data in the UK.

Case 3: Simulate four weeks and predict one week

In this case, four weeks of data are selected for modeling to predict the number of newly diagnosed cases in the following week. The 28 data points from August 10 to September 15 in the UK were selected for modeling, and the 7 data points from September 16 to September 22 were used for testing. The results are compared with those of the above comparison models, and the specific data are shown in Table 7.
Table 7

Simulation and prediction results of each model for four-week data in the UK.

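| Day | Buffer data | D-SIRGM(1,N) value | D-SIRGM(1,N) APE (%) | NGM(1,N) value | NGM(1,N) APE (%) | GM(1,N) value | GM(1,N) APE (%) | GMVM(1,N) value | GMVM(1,N) APE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 8.10 | 2557.29 | 2557.29 | 0.0000 | 2557.29 | 0.0000 | 2557.29 | 0.0000 | 2557.29 | 0.0000 |
| 8.12 | 2576.56 | 2490.09 | 3.3559 | 2547.40 | 1.1318 | 693.34 | 73.0906 | 2537.24 | 1.5259 |
| 8.15 | 2588.97 | 2542.58 | 1.7918 | 2495.50 | 3.6105 | 1610.94 | 37.7769 | 2587.98 | 0.0383 |
| 8.16 | 2588.84 | 2597.25 | 0.3246 | 2445.13 | 5.5513 | 2369.23 | 8.4830 | 2637.58 | 1.8826 |
| 8.17 | 2638.65 | 2652.80 | 0.5364 | 2396.88 | 9.1625 | 3001.75 | 13.7612 | 2685.74 | 1.7850 |
| 8.18 | 2689.70 | 2709.96 | 0.7534 | 2350.72 | 12.6029 | 3526.57 | 31.1138 | 2732.18 | 1.5792 |
| 8.19 | 2757.41 | 2767.91 | 0.3806 | 2307.68 | 16.3101 | 3978.29 | 44.2763 | 2776.35 | 0.6866 |
| 8.20 | 2787.36 | 2827.84 | 1.4524 | 2267.80 | 18.6396 | 4360.59 | 56.4417 | 2817.91 | 1.0960 |
| 8.21 | 2846.15 | 2888.61 | 1.4921 | 2232.68 | 21.5542 | 4714.14 | 65.6324 | 2855.85 | 0.3407 |
| 8.22 | 2915.27 | 2965.95 | 1.7384 | 2183.59 | 25.0983 | 4640.95 | 59.1946 | 2898.79 | 0.5652 |
| 8.23 | 2979.08 | 3022.69 | 1.4640 | 2138.74 | 28.2080 | 4875.37 | 63.6537 | 2932.66 | 1.5581 |
| 8.24 | 3058.33 | 3081.97 | 0.7730 | 2098.34 | 31.3893 | 5089.49 | 66.4137 | 2962.16 | 3.1448 |
| 8.26 | 3148.65 | 3143.48 | 0.1644 | 2063.04 | 34.4785 | 5297.74 | 68.2544 | 2986.18 | 5.1599 |
| 8.27 | 3234.73 | 3207.04 | 0.8559 | 2034.05 | 37.1185 | 5517.40 | 70.5676 | 3003.13 | 7.1597 |
| 8.29 | 3264.57 | 3277.75 | 0.4037 | 2006.40 | 38.5403 | 5603.13 | 71.6346 | 3019.00 | 7.5222 |
| 8.30 | 3307.60 | 3363.39 | 1.6866 | 1960.86 | 40.7164 | 5198.34 | 57.1634 | 3057.20 | 7.5704 |
| 8.31 | 3390.84 | 3422.70 | 0.9395 | 1920.48 | 43.3628 | 5350.47 | 57.7917 | 3063.61 | 9.6506 |
| 9.2 | 3499.33 | 3506.26 | 0.1979 | 1856.49 | 46.9473 | 4766.05 | 36.1987 | 3116.03 | 10.9535 |
| 9.3 | 3626.65 | 3566.39 | 1.6616 | 1784.23 | 50.8021 | 4569.20 | 25.9896 | 3146.85 | 13.2298 |
| 9.5 | 3649.31 | 3629.79 | 0.5351 | 1694.33 | 53.5712 | 4128.70 | 13.1363 | 3200.68 | 12.2935 |
| 9.6 | 3639.87 | 3673.09 | 0.9127 | 1602.65 | 55.9695 | 4109.25 | 12.8955 | 3221.16 | 11.5033 |
| 9.7 | 3685.71 | 3743.56 | 1.5694 | 1472.64 | 60.0447 | 3127.48 | 15.1460 | 3342.18 | 9.3207 |
| 9.8 | 3742.77 | 3764.54 | 0.5817 | 1345.78 | 64.0433 | 3267.07 | 12.7098 | 3357.21 | 10.3014 |
| 9.9 | 3852.33 | 3790.03 | 1.6173 | 1220.82 | 68.3097 | 3378.86 | 12.2905 | 3370.11 | 12.5176 |
| 9.11 | 3955.55 | 3816.16 | 3.5238 | 1100.76 | 72.1717 | 3576.41 | 9.5850 | 3365.97 | 14.9051 |
| 9.12 | 4058.10 | 3849.03 | 5.1519 | 982.40 | 75.7915 | 3689.90 | 9.0732 | 3365.33 | 17.0714 |
| 9.13 | 3726.11 | 3900.36 | 4.6763 | 844.22 | 77.3431 | 3223.22 | 13.4965 | 3450.27 | 7.4030 |
| 9.15 | 3774.63 | 3937.84 | 4.3241 | 693.11 | 81.6378 | 2959.85 | 21.5856 | 3514.43 | 6.8931 |
| MAPEs (%) | | | 1.5876 | | 39.7817 | | 38.0502 | | 6.5799 |
| 9.16 | 3938.29 | 3958.63 | 0.5166 | 545.27 | 86.1547 | 3150.50 | 20.0032 | 3503.15 | 11.0490 |
| 9.17 | 4075.17 | 4014.08 | 1.4989 | 359.02 | 91.1899 | 2204.65 | 45.9003 | 3704.94 | 9.0851 |
| 9.18 | 4089.40 | 4029.11 | 1.4743 | 164.83 | 95.9694 | 2122.53 | 48.0967 | 3761.25 | 8.0245 |
| 9.19 | 4259.25 | 4045.68 | 5.0143 | −41.17 | 100.9665 | 1944.76 | 54.3403 | 3841.10 | 9.8173 |
| 9.20 | 4235.67 | 4082.26 | 3.6217 | −290.46 | 106.8575 | 914.77 | 78.4031 | 4124.05 | 2.6351 |
| 9.21 | 4141.00 | 4093.40 | 1.1495 | −575.02 | 113.8860 | 142.11 | 96.5683 | 4380.08 | 5.7735 |
| 9.22 | 4383.00 | 4070.69 | 7.1256 | −868.60 | 119.8174 | 122.55 | 97.2040 | 4473.19 | 2.0578 |
| MAPEp (%) | | | 2.9144 | | 102.1202 | | 62.9308 | | 6.9203 |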
It can be seen from Table 7 that some of the newly confirmed case counts predicted by the NGM(1,N) model are negative, which is impossible in practice. The values of the GM(1,N) model show a downward trend, which does not conform to the trend of the epidemic. Therefore, these two models perform poorly in this case. By contrast, the MAPEs values of D-SIRGM(1,N) and GMVM(1,N) are both less than 10%, and D-SIRGM(1,N) is better. To better compare GMVM(1,N) and D-SIRGM(1,N), the results of the two models in Table 7 are plotted in Fig. 10.
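The plausibility check described above can be sketched in a few lines. This is only an illustration: the helper name `negative_forecasts` is hypothetical, and the inputs are the NGM(1,N) prediction-week values as printed in Table 7. A forecast of daily new cases below zero is flagged as invalid.

```python
def negative_forecasts(days, values):
    """Return (day, value) pairs whose forecast is below zero.

    Daily new-case counts cannot be negative, so any such forecast
    is structurally invalid regardless of its APE.
    """
    return [(day, value) for day, value in zip(days, values) if value < 0]


# NGM(1,N) forecasts for the UK prediction week (Table 7), month.day labels.
days = ["9.16", "9.17", "9.18", "9.19", "9.20", "9.21", "9.22"]
ngm_values = [545.27, 359.02, 164.83, -41.17, -290.46, -575.02, -868.60]

flagged = negative_forecasts(days, ngm_values)
for day, value in flagged:
    print(f"{day}: invalid forecast {value}")
```

Four of the seven NGM(1,N) forecasts are flagged, from 9.19 onward.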
Fig. 10a

Comparison of D-SIRGM(1,N) and GMVM(1,N) for four weeks of data in the UK.

On the whole, the simulation and prediction effects of the two models are good. Both curves lie near the actual data curve, and D-SIRGM(1,N) is more consistent with the original data. In the simulation stage, the curve of D-SIRGM(1,N) is basically consistent with the actual data curve. By contrast, the growth rate of the GMVM(1,N) curve increases over time, so the curve is flat in the early stage and rises sharply in the late stage. From Fig. 10b, D-SIRGM(1,N) is closer to the changing trend of the actual data and has a small and stable error.

Fig. 10b

Comparison of the APE of each model for four weeks of data in the UK.

Based on the above three UK cases, it can be seen that D-SIRGM(1,N) not only closely simulates the variation trend and numerical characteristics of large-valued data, but also has good simulation and prediction performance with different amounts of simulated data. When the amount of simulated data is two, three and four weeks, the simulation error (MAPEs) is 1.1472%, 0.9585% and 1.5876%, respectively, and the prediction error (MAPEp) is between about 2% and 3%. That is, regardless of the size of the simulated sample, the model has stable simulation performance and a good prediction effect.

Application summary

To illustrate the differences between the D-SIRGM(1,N) model and the comparison models, all the results of four models from six cases in Cuba and the UK are presented in Table 8 below.
Table 8

Simulation and prediction errors of each model in each case.

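| MAPE | Model | Cuba Case 1 | Cuba Case 2 | Cuba Case 3 | UK Case 1 | UK Case 2 | UK Case 3 |
|---|---|---|---|---|---|---|---|
| MAPEs | D-SIRGM(1,N) | 2.2240 | 3.6644 | 3.1271 | 1.1472 | 0.9585 | 1.5876 |
| MAPEs | NGM(1,N) | 15.2468 | 58.0217 | 22.7574 | 34.6057 | 43.9110 | 39.7817 |
| MAPEs | GM(1,N) | 5.2711 | 35.4833 | 8.7918 | 9.6313 | 6.3209 | 38.0502 |
| MAPEs | GMVM(1,N) | 2.9091 | 22.7465 | 9.5640 | 4.4820 | 1.1176 | 6.5799 |
| MAPEp | D-SIRGM(1,N) | 6.5717 | 3.0890 | 3.4351 | 2.2365 | 1.9812 | 2.9144 |
| MAPEp | NGM(1,N) | 40.8028 | 126.7733 | 33.1608 | 66.6314 | 79.8471 | 102.1202 |
| MAPEp | GM(1,N) | 10.7487 | 22.8754 | 36.4344 | 29.7520 | 32.5949 | 62.9308 |
| MAPEp | GMVM(1,N) | 11.1763 | 80.6333 | 43.7945 | 16.5058 | 5.3926 | 6.9203 |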
As can be seen from Table 8, the D-SIRGM(1,N) model has the smallest simulation and prediction errors in every case, outperforming the three comparison models. On the one hand, the amount of modeling data increases gradually from Case 1 to Case 3, yet the D-SIRGM(1,N) model shows good results throughout. Its simulation errors are all below 5%, indicating that the simulation performance of the model is relatively stable when data of different lengths are used for modeling. On the other hand, the value ranges of Cuba and the UK differ: the Cuban values are small and the UK values are large. The D-SIRGM(1,N) model performs well for both countries, and somewhat better in the UK cases. This may be because the fluctuation of large-valued data is more apparent, so the differential term in the model structure can capture its change characteristics more keenly.

The following conclusions can be drawn from the above cases in Cuba and the UK. D-SIRGM(1,N) is relatively sensitive to the overall trend of data of different orders of magnitude and can better capture the increase, decrease, and fluctuation characteristics of data. It has good simulation and prediction performance for data with complex changes. For cases with two, three, and four weeks of modeling data, the simulation error of each case is less than 5%, meaning that D-SIRGM(1,N) has robust simulation performance. Its prediction error is below 5% in all but one case (6.5717% for the two-week Cuba case), indicating that the models established with different data volumes all have good prediction performance and that the model is effective in epidemic prediction.

During the epidemic prevention and control period, the closed observation period is usually 14–28 days. If the development of the epidemic can be understood more clearly during this period, relevant departments will be better able to respond in advance and take reasonable and targeted measures based on the analysis results. D-SIRGM(1,N) has good simulation and prediction performance for data within 14–28 days. Therefore, during the prevention and control period, it can make an effective prediction of the next seven days from the existing data. On the one hand, relevant departments can promptly adjust their assessment of epidemic risk levels based on the weekly analysis results and strengthen or relax the prevention levels accordingly, which controls the corresponding costs and avoids both the waste of resources caused by excessive prevention and the aggravation of the epidemic caused by inadequate prevention. On the other hand, people can judge the current situation based on the results and appropriately reduce or increase their movement, which supports control measures during the epidemic and economic recovery measures after it.
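The ranking stated for Table 8 can be reproduced with a short script. This is a sketch for checking the summary only, with the MAPE values copied from Table 8; the dictionary layout and the helper name `best_model_per_case` are illustrative assumptions, not part of the original study.

```python
CASES = ["Cuba Case 1", "Cuba Case 2", "Cuba Case 3",
         "UK Case 1", "UK Case 2", "UK Case 3"]

# Simulation errors (MAPEs, %) from Table 8, one value per case.
MAPE_S = {
    "D-SIRGM(1,N)": [2.2240, 3.6644, 3.1271, 1.1472, 0.9585, 1.5876],
    "NGM(1,N)": [15.2468, 58.0217, 22.7574, 34.6057, 43.9110, 39.7817],
    "GM(1,N)": [5.2711, 35.4833, 8.7918, 9.6313, 6.3209, 38.0502],
    "GMVM(1,N)": [2.9091, 22.7465, 9.5640, 4.4820, 1.1176, 6.5799],
}

# Prediction errors (MAPEp, %) from Table 8, one value per case.
MAPE_P = {
    "D-SIRGM(1,N)": [6.5717, 3.0890, 3.4351, 2.2365, 1.9812, 2.9144],
    "NGM(1,N)": [40.8028, 126.7733, 33.1608, 66.6314, 79.8471, 102.1202],
    "GM(1,N)": [10.7487, 22.8754, 36.4344, 29.7520, 32.5949, 62.9308],
    "GMVM(1,N)": [11.1763, 80.6333, 43.7945, 16.5058, 5.3926, 6.9203],
}


def best_model_per_case(table):
    """Map each case name to the model with the smallest error in that case."""
    return {
        case: min(table, key=lambda model: table[model][i])
        for i, case in enumerate(CASES)
    }


best_sim = best_model_per_case(MAPE_S)
best_pred = best_model_per_case(MAPE_P)
for case in CASES:
    print(f"{case}: simulation -> {best_sim[case]}, prediction -> {best_pred[case]}")

print("Largest D-SIRGM(1,N) MAPEs:", max(MAPE_S["D-SIRGM(1,N)"]))
print("Largest D-SIRGM(1,N) MAPEp:", max(MAPE_P["D-SIRGM(1,N)"]))
```

With these inputs, D-SIRGM(1,N) is selected in all six cases for both stages, and its largest errors are 3.6644% (simulation) and 6.5717% (prediction).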

Conclusion

COVID-19 is highly infectious and has spread globally. The epidemic is affected by various factors, and its evolution is uncertain. Historical data spanning a long period may degrade the performance of some prediction models, which limits their use. As a widely used class of models suited to small-sample data, grey prediction models have been applied to the prediction of COVID-19 with some success. However, most current grey prediction models are univariate and seldom consider multiple influencing factors. Moreover, they usually model the original COVID-19 data directly and seldom consider the data characteristics of infectious diseases. Therefore, starting from the background of infectious disease models, this paper proposes a new grey prediction model based on the SIR infectious disease model. By analyzing the relationship between the parameters of the classic SIR model, a dynamic SIR differential equation is established, and the grey buffer operator is introduced to put forward a SIR grey prediction model based on the buffer operator. The Laplace transform, a classical mathematical method, is used to solve the model, and the modeling steps are obtained. Compared with other prediction methods, the proposed model retains the structural characteristics of the SIR infectious disease model, so it can better represent the changing trend of the number of confirmed cases and the quantitative relationship between the numbers of confirmed and cured cases, and it is more in line with the actual background of infectious disease prediction. Compared with the traditional grey prediction model, the proposed model structure contains the differential terms of all variables, which better reflects the impact of changes in the relevant variables on the whole system and captures data fluctuations more keenly in actual calculation.

Therefore, the new model not only retains the structural and computational advantages of the grey prediction model, but also yields more interpretable prediction results that are consistent with the laws of infectious diseases. The model is applied to the prediction of COVID-19 in the UK and Cuba. Compared with the classical GM(1,N), NGM(1,N) and GMVM(1,N) models, the new model has the best simulation and prediction accuracy, and its performance is relatively stable for different types of data. The results also suggest that the new model is applicable to both large and small countries and can be applied to epidemic prediction in each country. In addition, the model obtained effective results for data of two, three and four weeks, indicating that it can make effective predictions using historical data of different lengths, which is very important for countries that need to adjust epidemic prevention and control measures reasonably and effectively in real time.

In this paper, a new grey prediction model is proposed from the background of infectious diseases, and the buffer operator is used to preprocess the data. However, the prediction of COVID-19 is a very complicated task. Although the proposed model performs well in the calculations, some problems remain for further study. In the future, the model can be further optimized in terms of data processing and parameter optimization to improve its overall accuracy. The modeling mechanism of the model will also be studied, and the advantages of the model structure and its scope of application will be analyzed theoretically. In addition, the model structure can be generalized and applied to other areas.

CRediT authorship contribution statement

Huiming Duan: Conceptualization, Methodology, Funding acquisition, Project administration, Supervision, Writing – original draft, Editing. Weige Nie: Software, Visualization, Writing – original draft, Editing, Validation, Investigation, Formal analysis, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.