
Forecasting the Number of New Coronavirus Infections Using an Improved Grey Prediction Model.

Hui Li¹, Bo Zeng¹, Jianzhou Wang¹, Hua'an Wu¹.

Abstract

BACKGROUND: Recently, a new coronavirus has been spreading rapidly from Wuhan, China. Forecasting the number of infections scientifically and effectively is of great significance for allocating medical resources and improving rescue efficiency.
METHODS: In the short term, the number of new coronavirus infections is characterized by "small data, poor information", and the grey prediction model is an effective method for prediction problems of this kind. Building on the order optimization of NHGM(1,1,k), this paper uses the particle swarm optimization algorithm to optimize the background value and obtains a new improved grey prediction model, GM(1,1|r,c,u).
RESULTS: In MATLAB simulations, the comprehensive percentage errors of GM(1,1|r,c,u), NHGM(1,1,k), UGM(1,1) and DGM(1,1) are 2.4440%, 11.7372%, 11.6882% and 59.9265%, respectively, so the new model has the best prediction performance. The new model was then used to predict the number of new coronavirus infections.
CONCLUSION: The number of new coronavirus infections in China is predicted to increase continuously over the next two weeks, reaching nearly 100 thousand. Based on the prediction results, this paper puts forward specific suggestions.

Keywords: Background value optimization; Forecasting the number of infections; Grey prediction model; New coronavirus; Particle swarm optimization

Year: 2021 PMID: 34722380 PMCID: PMC8542816 DOI: 10.18502/ijph.v50i9.7057

Source DB: PubMed Journal: Iran J Public Health ISSN: 2251-6085 Impact factor: 1.429

Introduction

The new coronavirus was first identified in cases of viral pneumonia in Wuhan, China, in 2019. On Jan 12, 2020, WHO named the virus “2019 novel coronavirus” (2019-nCoV). The virus is highly infectious and is mainly transmitted through droplets from coughing or sneezing, so the movement and gathering of people pose a major obstacle to its effective control. Although the Chinese government has taken active measures, such as fully activating the joint prevention and control mechanism and concentrating treatment, the number of infections has increased rapidly. According to data from the website of the National Health Commission of China (1), the number of cases rose from 440 on Jan 21, 2020 to 28018 on Feb 5, 2020, and the situation in China is very serious. Moreover, there were 19665 cases in Hubei Province on Feb 5, 2020. As the distribution of infections in Fig. 1 shows, six provinces had 500–1000 cases and 13 provinces had 100–500 cases.

Fig. 1:

Distribution of new coronavirus infections in China as of Feb 5, 2020 (Original)

The increasing number of new coronavirus infections challenges the allocation and scheduling of emergency medical resources (medicine and medical staff). Forecasting the number of infections scientifically and effectively is therefore of great significance for the rational allocation and scheduling of medical resources. However, because the mechanism of the new coronavirus is uncertain, infected people have a latent period and population flows are irregular, it is difficult to collect a large amount of accurate statistical data in a short period. The number of new coronavirus infections therefore has the typical characteristics of “small data, poor information”. Grey system theory, initiated by Deng Julong (2), is an effective method for studying uncertain problems with small data and poor information (3), and the grey prediction model is an important branch of it (4). Among grey prediction models, GM(1,1), the first single-variable grey prediction model in grey system theory, has been widely used in transportation (5), energy (6–8), the environment (9), tourism (10), medicine (11) and other fields. Over the past 30 years, scholars have studied GM(1,1) extensively, focusing mainly on optimizing its parameters (12,13), such as the initial conditions (14,15) and the accumulative order (16,17), and on extending the model form (18–21). These studies have broadened the application scope of single-variable grey prediction models and improved their prediction performance. However, the structure of GM(1,1) is a homogeneous exponential function, whereas many original sequences in practice follow an approximately non-homogeneous exponential pattern. Researchers have therefore studied how to improve the ability of GM(1,1) to approximate non-homogeneous exponential sequences.
On the one hand, the model has been extended by optimizing its structure so that its final restored expression contains constant terms (22–24). On the other hand, the modeling process has been optimized by modeling original sequences with exponential growth features directly (25,26). The NHGM(1,1,k) model (22) is a grey prediction model that can fit original sequences with a non-homogeneous exponential pattern. In this paper, the order of NHGM(1,1,k) is optimized to improve the flexibility and adaptability of the model and thereby its prediction performance. However, in this modeling process the background value is the mean sequence generated by consecutive neighbors, with the background value coefficient fixed at 0.5. This simplification does not in general give the best simulation performance. To further improve the model, the particle swarm optimization algorithm is used to optimize the background value: the optimal background value coefficient is obtained by minimizing the sum of squared simulation errors, which yields a new model with better simulation performance. This paper uses the new model to predict the number of new coronavirus infections in China, which has the characteristics of “small data, poor information”.

Model

This section first describes the order optimization of NHGM(1,1,k); the resulting model is called GM(1,1|r,c). Then, building on the order optimization, we introduce a new model whose background value is optimized by PSO, which we call GM(1,1|r,c,u).

Description of GM(1,1|r,c)

Definition 1 (22,27) Let the non-negative sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n))$ be the original sequence, and suppose $r \in \mathbb{R}^{+}$. The sequence $X^{(r)} = (x^{(r)}(1), x^{(r)}(2), \dots, x^{(r)}(n))$ is called the r-order accumulative generating sequence of $X^{(0)}$, and the sequence $X^{(-r)} = (x^{(-r)}(1), x^{(-r)}(2), \dots, x^{(-r)}(n))$ is called the r-order inverse accumulative generating sequence of $X^{(0)}$. The sequence $Z^{(r)} = (z^{(r)}(2), z^{(r)}(3), \dots, z^{(r)}(n))$ with $z^{(r)}(k) = 0.5\,(x^{(r)}(k) + x^{(r)}(k-1))$ is called the mean sequence generated by consecutive neighbors of $X^{(r)}$.

Definition 2 Suppose $X^{(r)}$ and $Z^{(r)}$ are as in Definition 1. The basic form of the GM(1,1|r,c) model relates $X^{(r)}$ and $Z^{(r)}$ through the parameters a, b and c; its whitenization equation (or image equation) is equation [5], its time-response expression is equation [6], and its final restored expression is equation [7].

Theorem 1 Suppose $X^{(0)}$, $X^{(r)}$ and $Z^{(r)}$ are as in Definition 1, and let $\hat{p} = (a, b, c)^{T}$ be the least squares estimate of the parameter vector, where the matrix B and the vector Y are built from $X^{(r)}$ and $Z^{(r)}$. Then the least squares estimate of the parameters of GM(1,1|r,c) satisfies $\hat{p} = (a, b, c)^{T} = (B^{T}B)^{-1}B^{T}Y$.
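Definition 1 names the r-order accumulative and inverse accumulative generating sequences, but their defining formulas are not recoverable from this text. The Python sketch below assumes the fractional-order AGO commonly used in the grey-model literature, $x^{(r)}(k) = \sum_{i=1}^{k} \binom{k-i+r-1}{k-i} x^{(0)}(i)$; the function name `frac_ago` is hypothetical. Under this assumption one routine covers both directions: r = 1 gives the ordinary cumulative sum, a negative order gives the inverse accumulation, and applying order r followed by order −r returns the original sequence.

```python
def frac_ago(x, r):
    """Hypothetical r-order accumulated generating operator (AGO).

    Assumes the common fractional AGO
        x_r(k) = sum_{i=1..k} C(k-i+r-1, k-i) * x(i),
    where C is the generalized binomial coefficient. A negative r yields
    the |r|-order inverse AGO, so frac_ago(frac_ago(x, r), -r) recovers x.
    """
    n = len(x)
    # coef[m] = C(m + r - 1, m) = prod_{j=1..m} (r + j - 1) / j
    coef = [1.0]
    for m in range(1, n):
        coef.append(coef[-1] * (r + m - 1) / m)
    # element k (0-based) is a weighted sum of x[0..k]
    return [sum(coef[k - i] * x[i] for i in range(k + 1)) for k in range(n)]


# First eight values of Table 1
cases = [440, 571, 830, 1287, 1975, 2744, 4515, 5974]
print(frac_ago(cases, 1))        # ordinary cumulative sum (1-AGO)
xr = frac_ago(cases, 0.3584)     # fractional-order accumulation
back = frac_ago(xr, -0.3584)     # inverse accumulation restores the data
print([round(v, 6) for v in back])
```

The product form of the coefficient avoids gamma-function poles at negative integers, so integer and fractional orders of either sign are handled the same way.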

The GM(1,1|r,c,u) model

In the grey prediction model, the background value is usually the mean sequence generated by consecutive neighbors, a common smoothing method that weakens the influence of extreme or abnormal values in the 1-AGO sequence on the grey action. The background value coefficient is the weight given to adjacent elements when this mean sequence is constructed, and different values of the coefficient affect the simulation and prediction performance of the model. In this section we introduce the new model GM(1,1|r,c,u), whose background value coefficient is optimized by PSO.

Definition 3 Suppose $X^{(0)}$ and $X^{(r)}$ are as in Definition 1, and let $u \in (0,1)$ be the background value coefficient, that is, the weight of the consecutive neighbors $x^{(r)}(k)$ and $x^{(r)}(k-1)$ in the background value. Then equation [8] is the basic form of GM(1,1|r,c,u), and equation [9] is its whitenization equation. The parameter values are estimated from equation [8], and the time-response expression and final restored expression of GM(1,1|r,c,u) are derived from equation [9]. Because equation [9] is identical to equation [5], the time-response expression and the final restored expression of GM(1,1|r,c,u) are the same as equations [6] and [7], respectively.

Theorem 2 Suppose $X^{(r)}$ and u are as in Definitions 1 and 3, and let $\hat{p} = (a, b, c)^{T}$ be the least squares estimate of the parameter vector. Then the least squares estimate of the parameters of GM(1,1|r,c,u) satisfies $\hat{p} = (a, b, c)^{T} = (B^{T}B)^{-1}B^{T}Y$, where the background values in matrix B now depend on u.

According to Theorem 2, matrix B contains the undetermined background value coefficient u, and the parameter estimates depend on both B and Y. Different background value coefficients yield different matrices B and therefore different parameter values. The background value coefficient u is thus an important parameter that determines $\hat{p} = (a, b, c)^{T}$ in the GM(1,1|r,c,u) model.
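How u enters the least squares estimate of Theorem 2 can be sketched in Python. Because equation [8] is not shown in this text, the sketch assumes a hypothetical basic form $x^{(r)}(k) - x^{(r)}(k-1) + a\,z^{(r)}(k) = bk + c$ and a hypothetical weighting $z^{(r)}(k) = u\,x^{(r)}(k) + (1-u)\,x^{(r)}(k-1)$, which reduces to the consecutive-neighbor mean when u = 0.5. Under these assumptions each row of B is $(-z^{(r)}(k), k, 1)$, Y holds $x^{(r)}(k) - x^{(r)}(k-1)$, and the estimate is $(B^{T}B)^{-1}B^{T}Y$. All function names are illustrative.

```python
def background(xr, u):
    """Weighted background values z(k), k = 2..n (hypothetical convention)."""
    return [u * xr[k] + (1 - u) * xr[k - 1] for k in range(1, len(xr))]


def solve_linear(m, v):
    """Solve the square system m @ p = v by Gaussian elimination with pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    p = [0.0] * n
    for i in range(n - 1, -1, -1):
        p[i] = (aug[i][n] - sum(aug[i][j] * p[j] for j in range(i + 1, n))) / aug[i][i]
    return p


def estimate_params(xr, u):
    """Least squares estimate (a, b, c) = (B^T B)^{-1} B^T Y for a given u."""
    z = background(xr, u)
    # 1-based k = 2..n maps to z[k - 2] and to xr[k - 1], xr[k - 2]
    B = [[-z[k - 2], float(k), 1.0] for k in range(2, len(xr) + 1)]
    Y = [xr[k - 1] - xr[k - 2] for k in range(2, len(xr) + 1)]
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(3)] for i in range(3)]
    BtY = [sum(row[i] * y for row, y in zip(B, Y)) for i in range(3)]
    return solve_linear(BtB, BtY)


# Synthetic check: generate data that satisfy the assumed form exactly
a, b, c, u = -0.2, 3.0, 5.0, 0.4
xr = [10.0]
for k in range(2, 11):
    xr.append((xr[-1] * (1 - a * (1 - u)) + b * k + c) / (1 + a * u))
print(estimate_params(xr, u))  # approximately (-0.2, 3.0, 5.0)
```

Changing u changes every row of B and hence the fitted (a, b, c), which is why u has to be searched for rather than fixed.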
Different values of $\hat{p}$ affect the simulation and prediction results and therefore the accuracy of the model. The optimal background value coefficient u is the value in 0 < u < 1 that minimizes the sum of squared simulation errors of the GM(1,1|r,c,u) model, that is,

$u^{*} = \arg\min_{0<u<1} \sum_{k=2}^{n} \left( \hat{x}^{(0)}(k) - x^{(0)}(k) \right)^{2}$   [11]

In equation [11], $x^{(0)}(k)$ is the original value, which is given, and $\hat{x}^{(0)}(k)$ is the simulated value, calculated from the time-response expression and the final restored expression of GM(1,1|r,c,u). Since every candidate u requires re-estimating the parameters and recomputing the simulated values, searching for the optimal coefficient is time-consuming and computationally expensive. Swarm optimization algorithms, such as particle swarm optimization and ant colony optimization, provide good solutions to complex distributed optimization problems. Particle swarm optimization (PSO) is a swarm intelligence algorithm for global optimization that simulates the foraging behavior of bird flocks. It has a simple structure and few parameters, is easy to program, and has been widely used in function optimization, neural network training, engineering and other fields (28–32). Moreover, PSO with adaptive mutation based on the variance of group fitness can effectively mitigate premature convergence and significantly improve global convergence performance. Therefore, PSO is used to optimize the background value coefficient u of the GM(1,1|r,c,u) model. The specific process is as follows.

Step1 Initialize the positions and velocities of the particles in the swarm randomly.
Step2 Set pBest to the current position of each particle and gBest to the best particle position in the initial swarm.
Step3 Compute the mean relative simulation percentage error of GM(1,1|r,c,u) when u equals pBest.
Step3.1 Calculate the r-order accumulated generating sequence $X^{(r)}$ of $X^{(0)}$.
Step3.2 Calculate the mean sequence generated by consecutive neighbors $Z^{(r)}$ with background value coefficient u.
Step3.3 Calculate matrix B, matrix Y and the parameter values $\hat{p} = (a, b, c)^{T}$.
Step3.4 Calculate the simulated values $\hat{X}^{(0)}$ of $X^{(0)}$.
Step3.5 Calculate the mean relative simulation percentage error f(pBest) of $\hat{X}^{(0)}$.
Step3.6 If |f(pBest) − f(gBest)| is less than the given convergence threshold δ, go to Step9; otherwise, go to Step4.
Step4 Perform the following calculations for all particles.
Step4.1 Update the position and velocity of the particle.
Step4.2 If the fitness of the particle is better than that of pBest, set pBest to the new position.
Step4.3 If the fitness of the particle is better than that of gBest, set gBest to the new position.
Step5 Calculate the variance of group fitness σ², then calculate f(pBest).
Step6 Calculate the mutation probability p.
Step7 Generate a random number ε ∈ [0,1]; if ε < p, perform the mutation operation; otherwise, go to Step8.
Step8 If the algorithm meets the convergence rule, go to Step9; otherwise, go to Step3.
Step9 Output gBest, which is the optimal value of u, and then output the simulated data of the GM(1,1|r,c,u) model with u = gBest.

The flow chart of the whole modeling process, based on the modeling mechanism above, is shown in Fig. 2.
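The core of Steps 1 to 4 is the standard particle swarm update. A minimal Python sketch of that loop, minimizing a one-dimensional fitness over 0 < u < 1, is given below as an illustration under stated assumptions rather than the authors' implementation: the adaptive mutation of Steps 5 to 7 is omitted, a fixed iteration count replaces the convergence test with δ, and the inertia and acceleration constants (w = 0.7, c1 = c2 = 1.5) are common textbook choices, not values from this paper. In the model, the fitness would be the simulation error of GM(1,1|r,c,u) for a given u; here a toy quadratic stands in for it.

```python
import random


def pso_minimize(f, lo=0.0, hi=1.0, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization of a scalar f on (lo, hi).

    Adaptive mutation (Steps 5-7) is omitted; a fixed number of
    iterations replaces the convergence test.
    """
    rng = random.Random(seed)
    eps = 1e-9  # keeps particles strictly inside the open interval
    # Step1: random initial positions and velocities
    pos = [rng.uniform(lo + eps, hi - eps) for _ in range(n_particles)]
    vel = [rng.uniform(-0.1, 0.1) * (hi - lo) for _ in range(n_particles)]
    # Step2: personal bests start at the initial positions
    pbest = pos[:]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            # Step4.1: velocity and position update
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo + eps), hi - eps)
            val = f(pos[i])
            # Step4.2 / Step4.3: update personal and global bests
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


# Toy stand-in for the GM(1,1|r,c,u) simulation error as a function of u
best_u, best_f = pso_minimize(lambda u: (u - 0.3) ** 2)
print(round(best_u, 4))
```

Replacing the lambda with a function that fits the model for a given u and returns the error in equation [11] would turn this loop into a search for u*.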

Fig. 2:

The flow chart of the model

Forecasting the number of new coronavirus infections

According to the website data of the National Health Commission of the People’s Republic of China (1), the cumulative numbers of confirmed infections reported each day from Jan 21, 2020 to Feb 5, 2020 are shown in Table 1. In this section, the grey prediction model GM(1,1|r,c,u) is used to predict the number of new coronavirus infections in China. The experimental steps are as follows.

Table 1:

The number of new coronavirus infections in China from Jan 21, 2020 to Feb 5, 2020

Date	Jan 21	Jan 22	Jan 23	Jan 24	Jan 25	Jan 26	Jan 27	Jan 28
Number	440	571	830	1287	1975	2744	4515	5974
Date	Jan 29	Jan 30	Jan 31	Feb 1	Feb 2	Feb 3	Feb 4	Feb 5
Number	7711	9692	11791	14380	17205	20438	24324	28018

Step1 Data segmentation

To verify the simulation and prediction performance of GM(1,1|r,c,u) at the same time, the sample data in Table 1 are divided into two groups. The first 13 data points (Jan 21 to Feb 2) are used as the original data $X^{(0)}$ to establish the model, and the last 3 data points (Feb 3 to Feb 5) are reserved to test the prediction error of the model. The modeling data are therefore

$X^{(0)} = (440, 571, 830, 1287, 1975, 2744, 4515, 5974, 7711, 9692, 11791, 14380, 17205)$
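The segmentation can be written out directly from Table 1; the short Python sketch below uses illustrative variable names.

```python
# Number of new coronavirus infections in China, Jan 21 to Feb 5, 2020 (Table 1)
cases = [440, 571, 830, 1287, 1975, 2744, 4515, 5974,
         7711, 9692, 11791, 14380, 17205, 20438, 24324, 28018]

x0 = cases[:13]       # modeling data X(0): Jan 21 to Feb 2
holdout = cases[13:]  # reserved test data: Feb 3 to Feb 5

print(len(x0), holdout)  # 13 [20438, 24324, 28018]
```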

Step2 Calculating parameters of GM(1,1|r,c,u)

Following the modeling process, the optimal order r* of GM(1,1|r,c) is first obtained by particle swarm optimization. Second, the optimal background value coefficient u* of GM(1,1|r,c,u) with r = r* is calculated in MATLAB by particle swarm optimization. Third, substituting r* and u* into the GM(1,1|r,c,u) model gives GM(1,1|r*,c,u*). On this basis, matrix B, matrix Y and the values of parameters a, b and c are obtained. The calculated parameter values are shown in Table 2.

Table 2:

The parameter values of GM(1,1|r,c,u)

Parameter	r*	u*	a	b	c
Value	−0.3584	0.4721	0.1142	204.5525	−290.5717

Step 3 Constructing GM(1,1|r,c,u)

Substituting the parameters a, b, c and r* from Table 2 into equation [6] gives the GM(1,1|r*,c,u*) model for predicting the number of new coronavirus infections in China; its final restored expression is equation [20]. From equation [20], the simulated and predicted values of GM(1,1|r*,c,u*) can be calculated, together with the residual errors and the relative simulation/prediction errors.

Step 4 Comparison of model performance

To compare the performance of GM(1,1|r*,c,u*) with other GM(1,1)-type models, GM(1,1|r*,c,u*), NHGM(1,1,k), UGM(1,1) (33) and DGM(1,1) (34) were each used to simulate the number of new coronavirus infections in China. The experimental results are shown in Table 3.

Table 3:

The simulated and predicted data with different models

Date	x⁽⁰⁾(k)	GM(1,1|r*,c,u*) x̂⁽⁰⁾(k)	GM(1,1|r*,c,u*) Δ(k) (%)	NHGM(1,1,k) x̂⁽⁰⁾(k)	NHGM(1,1,k) Δ(k) (%)	UGM(1,1) x̂⁽⁰⁾(k)	UGM(1,1) Δ(k) (%)	DGM(1,1) x̂⁽⁰⁾(k)	DGM(1,1) Δ(k) (%)
Simulated data
Jan 22	571	567	0.6255	69	87.9759	77	86.5270	1635	186.4029
Jan 23	830	830	0.0007	688	17.0875	699	15.7445	2082	150.8929
Jan 24	1287	1308	1.6398	1415	9.9574	1430	11.1050	2652	106.0349
Jan 25	1975	2034	2.9907	2268	14.8472	2288	15.8251	3377	70.9642
Jan 26	2744	3022	10.1326	3269	19.1432	3294	20.0539	4300	56.6899
Jan 27	4515	4277	5.2726	4444	1.5727	4476	0.8627	5475	21.2608
Jan 28	5974	5798	2.9404	5822	2.5367	5863	1.8533	6972	16.6986
Jan 29	7711	7582	1.6665	7440	3.5140	7492	2.8439	8877	15.1260
Jan 30	9692	9624	0.7057	9338	3.6504	9403	2.9792	11304	16.6335
Jan 31	11791	11915	1.0478	11566	1.9114	11647	1.2200	14394	22.0785
Feb 01	14380	14447	0.4689	14179	1.3948	14281	0.6873	18329	27.4629
Feb 02	17205	17214	0.0520	17247	0.2419	17373	0.9774	23340	35.6566
Average relative simulation percentage error (%)			2.2953		13.6528		13.3899		60.4918
Predicted data
Feb 03	20438	20206	1.1371	20846	1.9955	21003	2.7631	29720	45.4153
Feb 04	24324	23414	3.7416	25069	3.0645	25263	3.8618	37844	55.5845
Feb 05	28018	26830	4.2386	30026	7.1653	30265	8.0188	48190	71.9956
Average relative prediction percentage error (%)			3.0391		4.0751		4.8812		57.6651
Comprehensive percentage error of simulation and prediction (%)			2.4440		11.7372		11.6882		59.9265

Here $x^{(0)}(k)$ is the original data, $\hat{x}^{(0)}(k)$ is the data simulated by each model, and Δ(k) is the relative percentage error (RPE) of $\hat{x}^{(0)}(k)$, $\Delta(k) = \frac{|\hat{x}^{(0)}(k) - x^{(0)}(k)|}{x^{(0)}(k)} \times 100\%$. The average relative simulation and prediction percentage errors are the means of Δ(k) over the simulated points (Jan 22 to Feb 2) and the predicted points (Feb 3 to Feb 5), respectively, and the comprehensive percentage error is the mean of Δ(k) over all 15 points. To compare the models more directly, the simulation curves and the relative simulation/prediction percentage errors of the different models, based on the experimental results in Table 3, are shown in Fig. 3 and Fig. 4, respectively. The following conclusions can be drawn.
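The three summary rows of Table 3 can be reproduced from its Δ(k) column. The Python sketch below uses the GM(1,1|r*,c,u*) errors reported in Table 3 (recomputing them from the rounded simulated values would give slightly different numbers) and shows that the comprehensive percentage error is the mean of Δ(k) over all 15 points, i.e. the count-weighted average (12 × 2.2953 + 3 × 3.0391)/15 ≈ 2.4440. The helper name `rpe` is illustrative.

```python
def rpe(actual, simulated):
    """Relative percentage error Delta(k) = |x_hat - x| / x * 100."""
    return abs(simulated - actual) / actual * 100.0


# Delta(k) of GM(1,1|r*,c,u*) as reported in Table 3
sim_errors = [0.6255, 0.0007, 1.6398, 2.9907, 10.1326, 5.2726,
              2.9404, 1.6665, 0.7057, 1.0478, 0.4689, 0.0520]  # Jan 22 to Feb 02
pred_errors = [1.1371, 3.7416, 4.2386]                          # Feb 03 to Feb 05

avg_sim = sum(sim_errors) / len(sim_errors)
avg_pred = sum(pred_errors) / len(pred_errors)
# Comprehensive error: mean over all simulated and predicted points,
# equivalently the count-weighted average of avg_sim and avg_pred
comprehensive = (sum(sim_errors) + sum(pred_errors)) / (len(sim_errors) + len(pred_errors))

print(round(avg_sim, 4), round(avg_pred, 4), round(comprehensive, 4))
```

The same weighting reproduces the other columns as well, for example (12 × 13.6528 + 3 × 4.0751)/15 ≈ 11.7372 for NHGM(1,1,k).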

Fig. 3:

The analog curve with different models

Fig. 4:

The relative percentage error of simulation/prediction with different models

(i) The analog curves show that GM(1,1|r*,c,u*), NHGM(1,1,k) and UGM(1,1) are close to the original curve, while DGM(1,1) deviates from it.
(ii) The relative simulation/prediction percentage errors show that the error fluctuation of GM(1,1|r*,c,u*) is clearly the smallest, so this model is the most stable, while the error fluctuation of DGM(1,1) is the largest, so it is the least stable. NHGM(1,1,k) and UGM(1,1) follow similar trends, and their fluctuation and stability lie between those of DGM(1,1) and GM(1,1|r*,c,u*). In short, GM(1,1|r*,c,u*) has the best simulation/prediction performance and DGM(1,1) the worst.
(iii) The comprehensive percentage errors of simulation and prediction of GM(1,1|r*,c,u*), NHGM(1,1,k), UGM(1,1) and DGM(1,1) are 2.4440%, 11.7372%, 11.6882% and 59.9265%, respectively, so the comprehensive percentage error of GM(1,1|r*,c,u*) is much lower than those of the other models.
Based on these conclusions, DGM(1,1) is not suitable for predicting the number of new coronavirus infections in China, while GM(1,1|r*,c,u*), which performs best, is the best choice.

Step5 Forecasting the number of the new coronavirus infections in China

Through the comparative analysis of the above models, we choose GM(1,1|r*,c, u*) to predict the number of new coronavirus infections in China in the next two weeks (Table 4).

Table 4:

Prediction of the number of new coronavirus infections in China in the next two weeks

Date	Feb 6	Feb 7	Feb 8	Feb 9	Feb 10	Feb 11	Feb 12
Number	30447	34256	38249	42419	46760	51264	55926
Date	Feb 13	Feb 14	Feb 15	Feb 16	Feb 17	Feb 18	Feb 19
Number	60739	65697	70796	76030	81395	86886	92498

The results show that the number of new coronavirus infections in China will continue to increase over the next two weeks. The growth rate of infections may slow as prevention and control measures are strengthened, but the number of infections will keep rising in the short term, approaching 100 thousand (92498 on Feb 19).
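The point that growth may slow while cases keep rising can be checked against Table 4. The Python sketch below computes the daily increments and the day-over-day relative growth of the predicted values: the increments rise from 3809 to 5612 cases per day, while the relative growth falls from about 12.5% to about 6.5%, and the last predicted value stays below 100 thousand.

```python
# Predicted infections in China, Feb 6 to Feb 19, 2020 (Table 4)
forecast = [30447, 34256, 38249, 42419, 46760, 51264, 55926,
            60739, 65697, 70796, 76030, 81395, 86886, 92498]

# New cases per day and growth relative to the previous day's total
increments = [b - a for a, b in zip(forecast, forecast[1:])]
rel_growth = [d / a * 100 for d, a in zip(increments, forecast)]

for day, (d, g) in enumerate(zip(increments, rel_growth), start=7):
    print(f"Feb {day}: +{d} cases ({g:.2f}%)")
```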

Discussion

The prediction results indicate that the number of infections in China will continue to rise over the next two weeks. As a result, the demand for emergency medical resources (medicine and medical staff) will keep increasing in the short term. Meanwhile, the demand for protective materials will rise to unprecedented levels, creating a larger shortfall in the short term. In addition, the existing prevention and control measures have not kept pace with the growth of infections. Based on these results, we make the following suggestions. In the short term, as the demand for emergency medical resources grows, relevant enterprises and departments should promptly report production capacity, product inventory, the total number of medical staff and other data, so that the resources for each region can be planned rationally according to allocation and scheduling cost, time, demand and other factors. At the same time, more effort should be made to produce and import medical materials, and transportation should be well organized so that the materials reach the epidemic areas in time. Taking medical beds as an example, medical institutions should add beds as soon as possible to accommodate the increasing number of infected patients; when medical beds are insufficient, hotels, training centers and similar facilities could be requisitioned to provide isolation beds. In the short term, the demand for protective materials is also growing rapidly, and masks in particular face a huge shortfall. First, mask production should be expanded rapidly: enterprises that have not yet resumed work should be helped to resolve difficulties with staffing, materials, funds and other aspects, and enterprises not producing at full capacity should be encouraged to produce as much as possible. Second, import procurement should be intensified.
Finally, because masks are still sometimes used in unscientific and unreasonable ways, public guidance on how to use masks scientifically and reasonably should be strengthened. In the short term, as the number of infections increases, social control and public opinion work should also be strengthened. Specifically, first, relevant departments should release and publicize authoritative information in time. Second, relevant departments should monitor and analyze population flows in real time. Third, relevant departments should organize and safeguard transportation well.

Conclusion

The quantitative prediction results provide a basis for qualitative analysis of medical resource allocation, prevention and control measures and other short-term coping strategies, and offer useful guidance and reference value for practical work. However, the predictions are model inferences based on existing data; if the Chinese government takes strong control measures to curb the spread of the epidemic, the final number of infections may be lower than predicted.

Ethical considerations

Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.
