Tao Zhang, Yue Ma, Xiong Xiao, Yun Lin, Xingyu Zhang, Fei Yin, Xiaosong Li.
Abstract
The surveillance of infectious diseases relies on identifying the dynamic relations between infectious diseases and their influencing factors. However, this identification task faces two practical challenges: small sample size and delayed effects. To address both challenges and improve the identification results, this study evaluated the performance of the dynamic Bayesian network (DBN) in infectious disease surveillance. The evaluation was conducted through two simulations. The first simulation evaluated the performance of the DBN by comparing it with the Granger causality test and the least absolute shrinkage and selection operator (LASSO) method; the second assessed how the DBN could improve the forecasting of infectious diseases. To keep both simulations as close as possible to real-world situations, their scenarios were adapted from real-world studies, and practical issues such as nonlinearity and nuisance variables were also considered. The main simulation results were: ① When the sample size was large (n = 340), the true positive rates (TPRs) of DBN (≥98%) were slightly higher than those of the Granger causality test and approximately the same as those of the LASSO method; the false positive rates (FPRs) of DBN were on average 46% lower than those of the Granger causality test and 22% lower than those of the LASSO method. ② When the sample size was small, the main problem was low TPR, which was further aggravated by nonlinearity and nuisance variables. In the worst case (i.e., small sample size, nonlinearity, and the presence of nuisance variables), the TPR of DBN declined to 43.30%. However, a similar decline was also observed for the Granger causality test and the LASSO method.
③ Sample size was important for identifying the dynamic relations among multiple variables; in this case, at least three years of weekly historical data were needed to guarantee the quality of infectious disease surveillance. ④ DBN could improve the forecasting results by reducing forecasting errors by 7%. Based on these results, DBN is recommended for improving the quality of infectious disease surveillance.
Year: 2019 PMID: 31316113 PMCID: PMC6637193 DOI: 10.1038/s41598-019-46737-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1 (a) The simulation structure of Simulation 1 in the absence of nuisance variables; (b) the simulation structure of Simulation 1 in the presence of nuisance variables.
The settings of the simulation scenarios.
| No. of Simulation | Sample Size | Mechanism | Existence of Nuisance Variables |
|---|---|---|---|
| 1 | 340 | linear | N |
| 2 | 52 | linear | N |
| 3 | 340 | nonlinear | N |
| 4 | 340 | linear | Y |
| 5 | 52 | nonlinear | N |
| 6 | 52 | linear | Y |
| 7 | 340 | nonlinear | Y |
| 8 | 52 | nonlinear | Y |
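The eight scenarios in the table appear to form a full 2 × 2 × 2 factorial design over sample size, mechanism, and nuisance variables. The following minimal Python sketch checks this and locates the worst-case scenario described in the abstract. The names `SCENARIOS`, `is_full_factorial`, and `worst_case` are illustrative assumptions and do not come from the original study.

```python
from itertools import product

# Factor levels as they appear in the scenario table.
SAMPLE_SIZES = (340, 52)
MECHANISMS = ("linear", "nonlinear")
NUISANCE = ("N", "Y")

# Scenario number -> (sample size, mechanism, nuisance variables present),
# copied row by row from the table.
SCENARIOS = {
    1: (340, "linear", "N"),
    2: (52, "linear", "N"),
    3: (340, "nonlinear", "N"),
    4: (340, "linear", "Y"),
    5: (52, "nonlinear", "N"),
    6: (52, "linear", "Y"),
    7: (340, "nonlinear", "Y"),
    8: (52, "nonlinear", "Y"),
}


def is_full_factorial(scenarios):
    """Return True if every factor combination is listed exactly once."""
    expected = set(product(SAMPLE_SIZES, MECHANISMS, NUISANCE))
    listed = list(scenarios.values())
    return len(listed) == len(expected) and set(listed) == expected


def worst_case(scenarios):
    """Return the scenario with small sample, nonlinearity and nuisance variables."""
    target = (min(SAMPLE_SIZES), "nonlinear", "Y")
    return next(number for number, setting in scenarios.items() if setting == target)


print(is_full_factorial(SCENARIOS), worst_case(SCENARIOS))
```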
Figure 2 The time plots of the real and simulated data sets.
The comparison between the real and simulated data.
| Variable | Real Data Mean | Real Data std | Simulated Data Mean | Simulated Data std | Test Statistic* | P value |
|---|---|---|---|---|---|---|
| Sunshine | 47.97 | 14.18 | 48.13 | 15.25 | | 0.8269 |
| Temperature | 13.37 | 11.46 | 13.14 | 11.79 | Z = −0.4243 | 0.6714 |
| RH | 51.15 | 13.53 | 50.42 | 13.38 | | 0.7010 |
| HFMD | 446 | 324.60 | 439.16 | 321.52 | Z = −0.1387 | 0.8897 |
*t denotes the t statistic of the paired-sample t-test, and Z denotes the Z statistic of the Wilcoxon signed-rank test for paired data.
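As a consistency check, the value that follows each Z statistic in the table can be reproduced as a two-sided P value under the standard normal distribution. The sketch below assumes the normal approximation was used for the Wilcoxon signed-rank test, which the source does not state explicitly; the function name `two_sided_p_from_z` is illustrative.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal Z statistic.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z statistics reported in the table for Temperature and HFMD.
temperature_p = two_sided_p_from_z(-0.4243)
hfmd_p = two_sided_p_from_z(-0.1387)

print(f"Temperature: P = {temperature_p:.4f}")
print(f"HFMD:        P = {hfmd_p:.4f}")
```

Both results agree with the tabulated values (0.6714 and 0.8897) to about three decimal places, which supports reading those entries as P values.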
Figure 3 The results of the dynamic Bayesian network (DBN), Granger causality test, and LASSO method applied to each scenario, where the solid lines represent the true positive rate (TPR) and the dashed lines represent the false positive rate (FPR).
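The source does not spell out how TPR and FPR were computed, so the following is a minimal sketch under the usual definitions for structure recovery: TPR is the share of true relations that a method detects, and FPR is the share of absent relations that it wrongly reports. The function `tpr_fpr` and the toy edges (variable, lag) are hypothetical and do not reproduce the structure in Figure 1.

```python
from itertools import product


def tpr_fpr(true_edges, detected_edges, candidate_edges):
    """Return (TPR, FPR) of detected edges against a known true structure.

    All arguments are iterables of hashable edge labels; candidate_edges
    is the full set of relations the method was allowed to consider.
    """
    candidates = set(candidate_edges)
    positives = set(true_edges) & candidates
    negatives = candidates - positives
    detected = set(detected_edges) & candidates

    true_positives = len(detected & positives)
    false_positives = len(detected & negatives)

    tpr = true_positives / len(positives) if positives else float("nan")
    fpr = false_positives / len(negatives) if negatives else float("nan")
    return tpr, fpr


# Hypothetical toy setting: three candidate drivers at lags of 1 and 2 weeks.
candidates = list(product(("sunshine", "temperature", "rh"), (1, 2)))
truth = [("temperature", 1), ("rh", 2)]
found = [("temperature", 1), ("sunshine", 1)]

tpr, fpr = tpr_fpr(truth, found, candidates)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

In this toy case one of the two true relations is recovered and one of the four absent relations is falsely reported.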
Figure 4 The curve of sample size and TPR (%).
Figure 5 (a) The estimated DBN, where the solid lines represented the true positive rate (TPR), and dashed lines represented the false positive rate (FPR); (b) the time plots of the real values of HFMD time series (triangles) and results of the modelling strategy in combination with DBN (solid lines).
The comparison of the two strategies*.
| Strategy | Average Fitting MAPE | Average Forecasting MAPE |
|---|---|---|
| Strategy with DBN | 10.7371% | 15.0701% |
| Strategy without DBN | 11.4175% | 21.9365% |
*The average fitting/forecasting MAPE was calculated as the mean of the fitting/forecasting MAPEs across the 5000 replicates.
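For reference, the sketch below shows the standard definition of the mean absolute percentage error (MAPE) and works through the forecasting figures in the table. It assumes the conventional MAPE formula, which the source does not restate; `mape` and the variable names are illustrative. The 7% reduction quoted in the abstract appears to correspond to the absolute difference between the two forecasting MAPEs, in percentage points.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that no
    actual value is zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Average forecasting MAPEs (in percent) taken from the table.
FORECAST_WITH_DBN = 15.0701
FORECAST_WITHOUT_DBN = 21.9365

# Absolute reduction, in percentage points.
absolute_reduction = FORECAST_WITHOUT_DBN - FORECAST_WITH_DBN
# Reduction relative to the strategy without DBN, in percent.
relative_reduction = 100.0 * absolute_reduction / FORECAST_WITHOUT_DBN

print(f"Absolute reduction: {absolute_reduction:.4f} percentage points")
print(f"Relative reduction: {relative_reduction:.1f}%")
```

The absolute reduction is about 6.87 percentage points, while the reduction relative to the strategy without DBN is roughly 31%.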