Songyu Wu¹, Wenchao Wu¹, Hetian Xue¹, Chenhao Zhao¹, Yuhan Yang¹, Kai An¹, Qing Zhen².
Abstract
Reporting on brucellosis, a relatively rare infectious disease caused by Brucella, is often delayed or incomplete in traditional disease surveillance systems in China. Internet search engine data related to brucellosis can provide an economical and efficient complement to a conventional surveillance system because people tend to seek brucellosis-related health information from Baidu, the largest search engine in China. In this study, brucellosis incidence data reported by the CDC of China and Baidu index data were gathered to evaluate the relationship between them. We applied an autoregressive integrated moving average (ARIMA) model and an ARIMA model with Baidu search index data as the external variable (ARIMAX) to predict the incidence of brucellosis. The two models based on brucellosis incidence data were then compared, and the ARIMAX model performed better in all the measurements we applied. Our results illustrate that Baidu index data can enhance the traditional surveillance system to monitor and predict brucellosis epidemics in China.
Year: 2020; PMID: 32246053; PMCID: PMC7125199; DOI: 10.1038/s41598-020-62517-7
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379
Figure 1. Trends of the actual incidence of brucellosis from January 2011 to December 2018.
Descriptive statistics of the observed data for the five keyword forms used to express brucellosis.
| Abbr. | Min. | Mean | Median | Max | Std.Dev. |
|---|---|---|---|---|---|
| B1 | 211 | 582.5 | 580.5 | 1133 | 201.2 |
| B2 | 84 | 324.9 | 305.5 | 1957 | 197.6 |
| B3 | 155 | 264.2 | 261.5 | 465 | 60.5 |
| B4 | 35 | 127.2 | 130.5 | 187 | 28.0 |
| B5 | 45 | 140.0 | 140.0 | 243 | 28.3 |
B1, B2, B3, B4, and B5 are the different keywords describing brucellosis in the Chinese language.
Figure 2. Time series of the 5 different index terms from January 2011 to December 2018.
Results of the Spearman correlation analysis between Baidu search index (BSI) data and actual brucellosis incidence. All five Baidu search indexes correlated with the time series of brucellosis incidence.
| Abbr. | rs | P-value |
|---|---|---|
| B1 | 0.491 | <0.01 |
| B2 | 0.362 | <0.01 |
| B3 | 0.426 | <0.01 |
| B4 | 0.447 | <0.01 |
| B5 | 0.501 | <0.01 |
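
A Spearman coefficient like the rs values above is the Pearson correlation of the two series after each has been replaced by its ranks. The following is a minimal sketch in plain Python, assuming average ranks for ties; the function names and the six monthly values are hypothetical and are not taken from the study's data.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical monthly values, for illustration only.
search_index = [310, 420, 580, 760, 900, 840]
incidence = [2100, 2500, 4100, 4000, 5200, 5600]
rs = spearman_rs(search_index, incidence)
print(f"rs = {rs:.3f}")
```

The P-values in the table would additionally require a significance test on rs, which this sketch does not attempt.
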
The results of cross-correlation analysis between BSI data and actual brucellosis incidence. CCF: cross-correlation function.
| Abbr. | Maximum CCF | Lag | P-value |
|---|---|---|---|
| B1 | 0.491 | 0 | <0.01 |
| B2 | 0.212 | 0 | 0.038 |
| B3 | 0.476 | 0 | <0.01 |
| B4 | 0.367 | 0 | <0.01 |
| B5 | 0.391 | 0 | <0.01 |
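
The cross-correlation function measures how strongly one series correlates with a shifted copy of another, and the lag with the largest absolute CCF indicates whether searches lead, coincide with, or trail reported cases. Below is a minimal sketch using the usual sample estimator (full-series means and variances, divisor n). The function names, the lag sign convention, and the data are assumptions for illustration, not the authors' code.

```python
from math import sqrt

def cross_correlation(x, y, lag):
    """Sample CCF between x[t] and y[t + lag].

    A positive lag pairs each x value with a later y value.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    if lag >= 0:
        pairs = zip(x[: n - lag], y[lag:])
    else:
        pairs = zip(x[-lag:], y[: n + lag])
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    return cov / (sx * sy)

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute CCF."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda k: abs(cross_correlation(x, y, k)))

# Hypothetical series, for illustration only.
searches = [1, 3, 2, 5, 4, 7, 6, 9]
cases = [2, 3, 3, 6, 4, 8, 7, 9]
for k in range(-2, 3):
    print(k, round(cross_correlation(searches, cases, k), 3))
print("best lag:", best_lag(searches, cases, 2))
```

In the table above the maximum CCF occurs at lag 0 for every keyword, meaning the search indexes and the reported incidence are most strongly associated within the same month.
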
Parameters of ARIMA (1,1,1) (0,1,1)[12].
| | Coefficient | Standard error | T | P-value |
|---|---|---|---|---|
| AR1 | 0.606 | 0.169 | 3.576 | <0.001 |
| MA1 | −0.817 | 1.150 | −7.107 | <0.001 |
| SMA1 | −0.762 | 0.209 | −3.644 | <0.001 |
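
Written with the backshift operator $B$ (where $B y_t = y_{t-1}$), an ARIMA(1,1,1)(0,1,1)[12] model for the monthly incidence $y_t$ takes the form below. The plus signs on the moving-average terms follow the convention used by R and statsmodels; that the tabulated coefficients use this convention is an assumption, since the source does not say.

```latex
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t ,
\qquad
\phi_1 = 0.606,\quad \theta_1 = -0.817,\quad \Theta_1 = -0.762
```

Here $(1 - B)$ is the first difference, $(1 - B^{12})$ is the seasonal difference at a 12-month period, $\phi_1$ is AR1, $\theta_1$ is MA1, $\Theta_1$ is SMA1, and $\varepsilon_t$ is white noise.
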
The parameters of the ARIMA model with BSI as the external variable.
| | Coefficient | Standard error | T | P-value |
|---|---|---|---|---|
| AR1 | 0.688 | 0.169 | 4.074 | <0.001 |
| MA1 | −0.892 | 0.119 | −7.524 | <0.001 |
| SMA1 | −0.572 | 0.144 | −3.972 | <0.001 |
| B1 | 3.057 | 0.815 | 3.751 | <0.001 |
| B2 | −0.536 | 0.211 | −2.530 | 0.011 |
| B4 | 8.712 | 3.202 | 2.721 | 0.007 |
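
One common way to write an ARIMA model with external variables is as a regression with ARIMA errors, which is what R's `arima` fits when `xreg` is supplied. Whether the authors used exactly this formulation, and the same (1,1,1)(0,1,1)[12] orders as the model without BSI, are assumptions here; the AR1, MA1, and SMA1 rows in the table are consistent with it. With $x_{1,t}$, $x_{2,t}$, $x_{4,t}$ denoting the B1, B2, and B4 search indexes and $B$ the backshift operator:

```latex
y_t = \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_4 x_{4,t} + \eta_t ,
\qquad
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, \eta_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Under this reading the tabulated values are $\beta_1 = 3.057$, $\beta_2 = -0.536$, $\beta_4 = 8.712$, $\phi_1 = 0.688$, $\theta_1 = -0.892$, and $\Theta_1 = -0.572$.
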
Comparison between the ARIMA and ARIMAX models.
| | Model 1 (ARIMA) | Model 2 (ARIMAX) |
|---|---|---|
| ME | −16.3709 | −3.4060 |
| MAE | 321.1026 | 292.1836 |
| MPE | −0.7645 | 0.3039 |
| MAPE | 9.02% | 8.07% |
| RMSE | 445.6888 | 399.3604 |
| AIC | 1288.43 | 1259.64 |
| Ljung-Box p | 0.9396 | 0.7705 |
ME: mean error, MAE: mean absolute error, MPE: mean percentage error, MAPE: mean absolute percentage error, RMSE: root mean square error, AIC: Akaike information criterion.
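
The error measures in the comparison table can be computed from paired actual and predicted values as sketched below. The sign convention (error = actual minus predicted) is an assumption, as are the function name and the three toy data points; the source does not state how its errors were signed.

```python
from math import sqrt

def forecast_errors(actual, predicted):
    """Return ME, MAE, MPE, MAPE, and RMSE for paired observations.

    Errors are defined as actual - predicted (assumed convention);
    MPE and MAPE are expressed in percent of the actual value.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    pct = [100 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "RMSE": sqrt(sum(e * e for e in errors) / n),
    }

# Toy values, for illustration only.
metrics = forecast_errors([100, 200, 400], [110, 190, 380])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ME and MPE keep the sign of each error, so over- and under-predictions can cancel; MAE, MAPE, and RMSE do not cancel and are therefore the usual basis for ranking models.
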
Actual brucellosis incidence in 2019 and the out-of-sample prediction for ARIMA (Model 1) and ARIMAX (Model 2).
| Time | Actual incidence | Model 1 value | Model 1 95% CI | Model 2 value | Model 2 95% CI |
|---|---|---|---|---|---|
| Jan 2019 | 2390 | 1696.793 | (735.412, 2658.175) | 1851.391 | (1007.858, 2694.924) |
| Feb 2019 | 2227 | 1833.312 | (609.734, 3056.889) | 1738.133 | (659.862, 2816.403) |
| Mar 2019 | 4021 | 3808.437 | (2430.329, 5189.545) | 4035.677 | (2823.679, 5247.675) |
| Apr 2019 | 4559 | 4406.562 | (2919.176, 5893.947) | 4927.924 | (3627.244, 6228.604) |
| May 2019 | 5238 | 5333.362 | (3759.529, 6907.195) | 5391.204 | (4025.604, 6756.804) |
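
As a quick check on the out-of-sample table, the sketch below recomputes each month's absolute error and tests whether the observed incidence lies inside each model's 95% interval. The numbers are copied from the table above; the tuple layout and the helper name are illustrative choices.

```python
# month, actual, M1 value, M1 low, M1 high, M2 value, M2 low, M2 high
rows = [
    ("Jan 2019", 2390, 1696.793, 735.412, 2658.175, 1851.391, 1007.858, 2694.924),
    ("Feb 2019", 2227, 1833.312, 609.734, 3056.889, 1738.133, 659.862, 2816.403),
    ("Mar 2019", 4021, 3808.437, 2430.329, 5189.545, 4035.677, 2823.679, 5247.675),
    ("Apr 2019", 4559, 4406.562, 2919.176, 5893.947, 4927.924, 3627.244, 6228.604),
    ("May 2019", 5238, 5333.362, 3759.529, 6907.195, 5391.204, 4025.604, 6756.804),
]

def summarize(rows, value_idx):
    """Absolute errors and interval-coverage flags for one model.

    value_idx points at the model's point forecast; the interval
    bounds are expected in the two columns that follow it.
    """
    abs_errors = [abs(r[1] - r[value_idx]) for r in rows]
    covered = [r[value_idx + 1] <= r[1] <= r[value_idx + 2] for r in rows]
    return abs_errors, covered

m1_abs, m1_cov = summarize(rows, 2)
m2_abs, m2_cov = summarize(rows, 5)

for r, e1, e2 in zip(rows, m1_abs, m2_abs):
    print(f"{r[0]}: |error| Model 1 = {e1:.1f}, Model 2 = {e2:.1f}")
print("Actual inside Model 1 interval every month:", all(m1_cov))
print("Actual inside Model 2 interval every month:", all(m2_cov))
```

For the five months tabulated, every observed value falls inside both models' 95% intervals.
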
Figure 3. The whole research process, from the selection of keywords to out-of-sample prediction.