Yixuan Tan1, Yuan Zhang2, Xiuyuan Cheng3, Xiao-Hua Zhou4,5,6.
Abstract
A better understanding of various patterns in the coronavirus disease 2019 (COVID-19) spread in different parts of the world is crucial to its prevention and control. Motivated by the previously developed Global Epidemic and Mobility (GLEaM) model, this paper proposes a new stochastic dynamic model to depict the evolution of COVID-19. The model allows spatial and temporal heterogeneity of transmission parameters and involves transportation between regions. Based on the proposed model, this paper also designs a two-step procedure for parameter inference, which utilizes the correlation between regions through a prior distribution that imposes graph Laplacian regularization on transmission parameters. Experiments on simulated data and real-world data in China and Europe indicate that the proposed model achieves higher accuracy in predicting the newly confirmed cases than baseline models.
Year: 2022 PMID: 36198691 PMCID: PMC9534028 DOI: 10.1038/s41598-022-18775-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. A diagram illustrating the model proposed in this paper. The example includes three regions, marked by circles in blue and indexed by 1, 2, and 3. For and , on the edge (k, j) represents the similarity between regions k and j. The square nodes associated with each region denote the compartments for each region, including susceptible, exposed, hospitalized, and removed compartments. The arrows connecting square nodes denote the transition between compartments in each region, and the details can be found in “Model description”. Note that and , the transmission parameters in the three regions, are allowed to be spatially heterogeneous. The double arrows in red denote the transportation between three regions on the l-th day. For (), represents the total transportation volume from region k to region j on the l-th day.
List of notations and parameters.
| Notation | Description | Notes |
|---|---|---|
| | Total number of regions | |
| | Total number of days | |
| | Susceptible individuals in the | The same range of and |
| | Exposed individuals in the | |
| | Hospitalized individuals in the | |
| | Removed individuals in the | |
| | Total individuals in the | be constant over time |
| | Traveling volume matrix on the | |
| | Deterministic counterpart of determined by ( | Other compartments follow the same convention of notations, |
| | Accumulated confirmed cases determined by ( | |
| | Accumulated removed cases determined by ( | |
| | Accumulated confirmed cases on the | |
| | Accumulated removed cases on the | |
| | Newly confirmed cases determined by ( on the | |
| | Newly confirmed cases on the | |
| | Infection rate in the | are allowed to be time-varying |
| | Inverse of average incubation period | |
| | Inverse of average removed time in the | |
| | Set of parameters to be estimated | |
| | The matrix characterizing the proximity between regions | |
| | Total number of groups that | |
| | The | |
| | Penalty factor of Graph Laplacian regularization | |
| | Parameter reducing regularization between inter-group regions | |
| | Affinity matrix constructed by | |
| | Parameter in the prior ( | |
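The compartments and rates listed in the notation table suggest a daily update of roughly the following shape. This is a minimal sketch, not the paper's exact transition equations (those are given in “Model description”). It assumes that exposed individuals drive infection, that each daily transition is a binomial draw with probability 1 − exp(−rate), and that newly confirmed cases are the exposed-to-hospitalized flow. Migration between regions is left out, and the names `step_region`, `simulate`, `beta`, `delta`, and `gamma` are illustrative only.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def step_region(state, beta, delta, gamma, rng):
    """Advance one region by one day and return (new_state, newly_confirmed).

    state: dict of integer counts for S, E, H, R.
    beta:  infection rate in this region.
    delta: inverse of the average incubation period.
    gamma: inverse of the average removed time in this region.
    Assumptions (not taken from the source): exposed individuals are the
    infectious ones, and daily transition probabilities are 1 - exp(-rate).
    """
    S, E, H, R = state["S"], state["E"], state["H"], state["R"]
    N = S + E + H + R
    p_infect = 1.0 - math.exp(-beta * E / N) if N > 0 else 0.0
    new_E = binomial(S, p_infect, rng)                 # S -> E
    new_H = binomial(E, 1.0 - math.exp(-delta), rng)   # E -> H (assumed "confirmed")
    new_R = binomial(H, 1.0 - math.exp(-gamma), rng)   # H -> R
    new_state = {
        "S": S - new_E,
        "E": E + new_E - new_H,
        "H": H + new_H - new_R,
        "R": R + new_R,
    }
    return new_state, new_H


def simulate(states, betas, delta, gammas, days, seed=0):
    """Run all regions forward independently (no migration in this sketch)."""
    rng = random.Random(seed)
    new_cases = [[] for _ in states]
    for _ in range(days):
        for k in range(len(states)):
            states[k], confirmed = step_region(
                states[k], betas[k], delta, gammas[k], rng
            )
            new_cases[k].append(confirmed)
    return states, new_cases
```

Passing a different `beta` per region mirrors the spatial heterogeneity of the transmission parameters. Because no one moves between regions here, each region's total population stays constant.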
Models to be compared when the transportation data are available.
| Model | Migration | Heterogeneity | Prior of |
|---|---|---|---|
| 1 | | | Uniform prior |
| 2 | | | Uniform prior |
| 3 | | | Uniform prior |
| 4 | | | Uniform prior |
| 5 | | | Graph Laplacian prior |
Model 5 is the proposed model in this paper, and Models 1–4 are baseline models with different settings.
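Model 5 is the only model with a Graph Laplacian prior, so it may help to see what such a penalty computes. The sketch below is an illustration under stated assumptions. The affinity matrix is taken to be the proximity matrix with inter-group entries scaled by a factor `eps`, and the penalty is `lam` times the quadratic form of the unnormalized Laplacian. The paper's exact construction of the affinity matrix is not reproduced here, and all function and argument names are hypothetical.

```python
def affinity(proximity, groups, eps):
    """Build a symmetric affinity matrix W with a zero diagonal.

    Assumption: entries between regions in different groups are shrunk by eps.
    """
    K = len(proximity)
    return [
        [
            0.0 if k == j
            else proximity[k][j] * (1.0 if groups[k] == groups[j] else eps)
            for j in range(K)
        ]
        for k in range(K)
    ]


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    K = len(W)
    return [
        [(sum(W[k]) if k == j else 0.0) - W[k][j] for j in range(K)]
        for k in range(K)
    ]


def laplacian_penalty(beta, W, lam):
    """Return lam * beta^T L beta.

    For symmetric W this equals lam * sum over pairs k < j of
    W[k][j] * (beta[k] - beta[j]) ** 2, so similar regions are pulled
    toward similar transmission parameters.
    """
    L = laplacian(W)
    K = len(beta)
    return lam * sum(beta[k] * L[k][j] * beta[j] for k in range(K) for j in range(K))
```

A prior of this kind would typically enter the log-posterior as the negative of the penalty. Setting `lam` to zero removes the penalty term, which leaves a flat prior.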
Figure 2. True and fitted trajectories for the simulated data with four provinces. The vertical lines show the threshold of training-testing split. In Model 5, is chosen, at which the averaged validation errors over 100 replicas are minimized as shown in Supplementary Figs. S2 and S3.
Figure 3. Absolute errors of fitted trajectories of Models 1–5 for the simulated data with four provinces. The vertical lines show the threshold of training-testing split.
Training and testing errors with standard deviation of Models 1–5 for simulated data with four provinces.
| Model | M | H | GL prior | MAE | MAE | MSE | MSE |
|---|---|---|---|---|---|---|---|
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| 5 | |||||||
The formulas of errors are detailed in Sect. B (Eq. (S5)) of Supplementary Information. Remarks for Columns 1–4 and the choice of in Model 5 are the same as those in Table 3. MAE, MAE, MSE and MSE are computed for each of the 100 replicas, then the mean and standard deviation are presented in the table above.
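The per-replica aggregation described above can be sketched as follows. The paper's error formulas are those of Eq. (S5) in the Supplementary Information and are not reproduced here. The sketch substitutes plain, unweighted MAE and MSE, and it assumes the sample standard deviation. The function names are hypothetical.

```python
import statistics


def mae(true, pred):
    """Plain mean absolute error (a stand-in for the paper's Eq. (S5))."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def mse(true, pred):
    """Plain mean squared error (a stand-in for the paper's Eq. (S5))."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


def summarize(true, replica_preds, metric):
    """Compute the metric once per replica, then report mean and std dev."""
    errors = [metric(true, pred) for pred in replica_preds]
    return statistics.mean(errors), statistics.stdev(errors)
```

Calling `summarize` once on the training window and once on the testing window, for each metric, gives the four summary columns of such a table.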
Estimated with standard deviation using Models 1–5 for simulated data with four provinces.
| Model | M | H | GL prior | ||||
|---|---|---|---|---|---|---|---|
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| 5 | |||||||
Recall that the ground truth is , , , . Models in Column 1 are detailed in Table 2. Columns 2–4 indicate whether the model permits migration between provinces (for “M”), the heterogeneity of parameters (for “H”) and the Graph Laplacian prior (for “GL prior”), respectively. are inferred for each of the 100 replicas, and then the mean and standard deviation are presented in the table above. In addition, for parameter inference using Model 5 , ( respectively), at which the averaged relative validation errors over 100 replicas are minimized as shown in Supplementary Figs. S2 and S3.
Figure 4. Maps of the thirty provinces divided into three groups, colored by transmission parameters or their estimates. In all three panels, the circles on top of the panels denote the nine provinces in Group 1 (indexed by 1–9), the squares in the middle of the panels denote twelve provinces in Group 2 (indexed by 10–21), and the diamonds at the bottom of the panels denote nine provinces in Group 3 (indexed by 22–30). The provinces are colored by the ground truth in the left panel, by the averaged estimates using Model 5 in the middle panel, and by the averaged estimates using Model 1 in the right panel. The superscripts and denote the models used to obtain the estimates of . In the experiments, the traveling volumes between provinces are taken to be constant.
Figure 7. Testing and validation errors on simulated data with thirty provinces. Left: The weighted prediction errors on validation set (MAE) and testing set (MAE) respectively, plotted vs. the values of . The errors are averaged over 100 replicas of experiment. Right: Same plot of MSE error. In each plot, the blue and red horizontal lines show the values of the averaged errors when . Note that both the MAE and MSE validation errors are minimized at , which is marked by blue squares in both plots. The construction of training/validation/testing data is detailed in Sect. A.1.1 of Supplementary Information, and the formulas of computing the errors can be found in Sect. B of Supplementary Information.
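The selection rule behind this figure, picking the tuning value whose replica-averaged validation error is smallest, can be sketched as below. The candidate values and error numbers in the usage example are invented purely for illustration and are not results from the paper. The function name is hypothetical.

```python
def select_by_validation(validation_errors):
    """Pick the candidate with the smallest replica-averaged validation error.

    validation_errors: dict mapping each candidate value of the tuning
    parameter to a list of per-replica validation errors.
    Returns (best_value, averaged_errors).
    """
    averaged = {
        value: sum(errors) / len(errors)
        for value, errors in validation_errors.items()
    }
    best_value = min(averaged, key=averaged.get)
    return best_value, averaged


# Made-up numbers, for illustration only.
example = {0.0: [0.50, 0.70], 1.0: [0.40, 0.40], 10.0: [0.60, 0.50]}
best, averaged = select_by_validation(example)
```

Testing errors play no part in the selection. They are only reported afterwards at the chosen value.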
Figure 5. True and fitted trajectories for the simulated data in the three provinces chosen from the total thirty provinces. The vertical lines show the threshold of training-testing split. For Model 5, , at which the averaged validation errors over replicas are minimized (as shown in Fig. 7).
Figure 6. Absolute errors of the fitted trajectories for simulated data in the three provinces chosen from the total thirty ones. The vertical lines show the threshold of training-testing split.
Estimated with standard deviation using Models 1–5 for simulated data with thirty provinces.
| Model | M | H | GL prior | |||
|---|---|---|---|---|---|---|
| 1 | ||||||
| 2 | ||||||
| 3 | ||||||
| 4 | ||||||
| ( | ||||||
| 5( | ||||||
| ( | ||||||
The ground truth is (in ), (in ), (in ) as listed in Sect. C of Supplementary Information. Remarks for Columns 1–4 and estimates of ’s are the same as those in Table 3. In addition, for inference of parameters using Model 5, , at which the averaged validation errors over 100 replicas (computed with partition P) are minimized as shown in Fig. 7 and Supplementary Fig. S4. The partitions of Model 5 are P, , and respectively. P is the ground truth underlying graph structure, and and are mismatched partitions introduced in “Models to compare”.
Training and testing errors of Models 1–5 with standard deviation for simulated data with thirty provinces.
| Model | M | H | GL prior | MAE | MAE | MSE | MSE |
|---|---|---|---|---|---|---|---|
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| ( | |||||||
| 5 ( | |||||||
| ( |
The formulas of errors are detailed in Sect. B (Eq. (S5)) of Supplementary Information. Remarks for Columns 1–4 and presented results of errors are the same as those in Table 4. In addition, the remarks for the choice of in Model 5 and the partitions P, , and are the same as those in Table 5.
Figure 12. Testing and validation errors on real-world COVID-19 data in China. The two plots are similar to those in Figure 7, with errors computed on the real-world data in China instead of simulated data. Both the MAE and MSE validation errors are minimized at , which is marked by blue squares in both plots. and are chosen by perturbing the minimizer , have slightly larger validation errors, and are marked with yellow pentagrams and green diamonds in both plots. The construction of training/validation/testing data is detailed in Sects. A.1.2 and D of Supplementary Information.
Figure 8. True and fitted trajectories in Hubei. The orange line with circles shows the true trajectory, the blue lines with crosses show the predicted deterministic trajectories using Model 5, blue and orange scatter plots show 100 stochastic trajectories with inferred using Model 5 and sampled from the posterior distribution of respectively. In each figure, the black vertical line shows the threshold of training-testing split. For Model 5, is chosen, at which the validation errors achieve the minimum value, marked by blue squares in Fig. 12 and Supplementary Fig. S5.
Figure 9. True and fitted trajectories in Henan. The remarks for the lines and scatter plots are the same as those in Figure 8. For Model 5, , and are chosen. As shown in Fig. 12 and Supplementary Fig. S5, the validation errors are minimized at . and are obtained by perturbing the minimizer without increasing validation errors much. , and are marked by yellow pentagrams, blue squares, and green diamonds respectively in Fig. 12 and Supplementary Fig. S5.
Figure 10. True and fitted trajectories in Anhui. The remarks for the lines and scatter plots are the same as those in Fig. 8. The choice of , and has been explained in Fig. 9.
Figure 11. Estimated posterior distribution of in Hubei. The vertical black lines represent the values of corresponding using Model 5.
Estimated transmission rates in Hubei.
| | | Mean of estimated posterior | Standard deviation of estimated posterior |
|---|---|---|---|
| | 0.3566 | 0.3566 | |
| | 0.0723 | 0.0716 | 0.0076 |
is inferred using (4.4) with , at which the validation errors are minimized as shown in Fig. 12 and Supplementary Fig. S5. The posterior distribution is estimated by MCMC iterations.
Training and testing errors of Models 1–5 for real-world data in China.
| Model | M | H | GL prior | MAE | MAE | MSE | MSE |
|---|---|---|---|---|---|---|---|
| 1 | | | | 0.399 | 0.497 | 0.650 | 0.626 |
| 2 | | | | 0.388 | 0.459 | 0.645 | 0.590 |
| 3 | | | | 0.296 | 0.398 | 0.565 | 0.511 |
| 4 | | | | 0.296 | 0.400 | 0.562 | 0.514 |
| 5 | | | | 0.296 | 0.373 | 0.565 | 0.477 |
| | | | | 0.298 | 0.343 | 0.572 | 0.438 |
| | | | | 0.303 | 0.328 | 0.582 | 0.416 |
The formulas of errors are detailed in Sect. B (Eq. (S5)) of Supplementary Information. The remarks for Columns 1–4 are the same as those in Table 4. In addition, the choice of , and in Model 5 has been explained in Fig. 9. Note that the validation errors are minimized at as shown in Fig. 12.
Figure 17. Testing and validation errors on real-world COVID-19 data in Europe. The two plots are similar to those in Fig. 7, with the errors computed on the real-world data in Europe instead of simulated data. Both the MAE and MSE validation errors are minimized at , which is marked by blue squares in both plots. and are chosen by perturbing the minimizer , have slightly larger validation errors, and are marked with yellow pentagrams and green diamonds in both plots. The construction of training/validation/testing data is detailed in Sects. A.1.2 and E of Supplementary Information.
Figure 13. True and fitted trajectories in Austria. The remarks for the lines and scatter plots are the same as those in Fig. 8. The vertical lines show the threshold of training-testing split of COVID-19 data in Europe. For Model 3’, , and are chosen. As shown in Fig. 17 and Supplementary Fig. S6, the validation errors are minimized at , and and are obtained by perturbing the minimizer . , and are marked by yellow pentagrams, blue squares, and green diamonds respectively in Fig. 17 and Supplementary Fig. S6.
Figure 14. True and fitted trajectories in Germany. The remarks for the lines and scatter plots are the same as those in Fig. 8. The choice of , and has been explained in Fig. 13.
Figure 15. True and fitted trajectories in Italy. The remarks for the lines and scatter plots are the same as those in Fig. 8. The choice of , and has been explained in Fig. 13.
Figure 16. Estimated posterior distributions of in Italy. The vertical black lines represent the values of corresponding using Model 3’.
Estimated transmission rates in Italy.
| | | Mean of estimated posterior | Standard deviation of estimated posterior |
|---|---|---|---|
| | 0.1034 | 0.1035 | |
| | 0.1719 | 0.1719 | |
is inferred using (4.5) with , at which the validation errors are minimized as shown in Fig. 17 and Supplementary Fig. S6. The posterior distribution is estimated by MCMC iterations.
Training and testing errors of Models 1’–3’ for real-world data in Europe.
| Model | M | H | GL prior | MAE | MAE | MSE | MSE |
|---|---|---|---|---|---|---|---|
| 1’ | | | | 0.330 | 0.639 | 0.424 | 0.691 |
| 2’ | | | | 0.171 | 0.543 | 0.228 | 0.599 |
| 3’ ( | | | | 0.203 | 0.450 | 0.268 | 0.489 |
| | | | | 0.214 | 0.447 | 0.285 | 0.488 |
| | | | | 0.225 | 0.441 | 0.302 | 0.486 |
| 3’ ( | | | | 0.209 | 0.436 | 0.273 | 0.511 |
| | | | | 0.222 | 0.415 | 0.291 | 0.492 |
| | | | | 0.237 | 0.399 | 0.311 | 0.480 |
The formulas of errors are detailed in Sect. B (Eq. (S5)) of Supplementary Information. The remarks for Columns 2–4 are the same as those in Table 4. In addition, the choice of , and in Model 3’ has been explained in Fig. 13. Note that the validation errors are minimized at as shown in Fig. 17. The partitions in Model 3’ are P and respectively. P and are two different partitions of countries in Europe introduced in “Models to compare”.