Kang Li, Meiliang Liu, Yi Feng, Chuanyi Ning, Weidong Ou, Jia Sun, Wudi Wei, Hao Liang, Yiming Shao.
Abstract
China's reported cases of human immunodeficiency virus (HIV) infection and AIDS increased from over 50000 in 2011 to more than 130000 in 2017, while AIDS-related search indices on Baidu rose from 2.1 million to 3.7 million over the same period. In China, people seek AIDS-related knowledge from Baidu, one of the world's largest search engines. We studied the relationship between national HIV surveillance data and the Baidu index (BDI) and used it to monitor the AIDS epidemic and inform targeted intervention. After screening keywords and composing the index, we used seasonal autoregressive integrated moving average (ARIMA) modeling. The most strongly correlated search engine query data were then used in an ARIMA model with external variables (ARIMAX) for epidemic prediction. A significant correlation between monthly reported HIV/AIDS cases and the Baidu Composite Index (r = 0.845, P < 0.001) was observed in the time series plot. Compared with the ARIMA model based on AIDS surveillance data alone, the ARIMAX model with the Baidu Composite Index had the lowest Akaike information criterion (AIC, 839.42) and the most accurate prediction (mean absolute percentage error, MAPE, of 6.11%). BDI and reported HIV/AIDS cases were closely correlated and followed the same trends, both when the AIDS epidemic was increasing and when it was decreasing. Therefore, Baidu search query data may be a useful indicator for reliably monitoring and predicting the HIV/AIDS epidemic in China.
Year: 2019 PMID: 30674890 PMCID: PMC6344537 DOI: 10.1038/s41598-018-35685-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Time series of selected keyword search indices and monthly reported HIV/AIDS cases in China, 2011–2016. The figure compares the time series of the Baidu search index with the national monthly number of reported cases for four keywords: “AIDS spread,” “pimple,” “thrush,” and “Initial symptoms of HIV”. (The X-axis is in months. There are three Y-axes: the black Y-axis shows the number of monthly reported cases, the red Y-axis shows the Baidu search index of the keyword, and the blue Y-axis shows the ratio of the search index to the monthly reported cases.) BDI: Baidu search index.
Monthly Baidu search index statistics for AIDS-related keywords from January 2011 to June 2017.
| Category | Search keyword (in Chinese) | Search keyword (in English) | Search amount (mean ± SD) | Minimum | Median | Maximum |
|---|---|---|---|---|---|---|
| General | | AIDS | 282794.76±81806.11 | 105084 | 287680 | 478888 |
| | | AIDS infection | 3969.22±536.76 | 2697 | 3999 | 5600 |
| | | AIDS virus | 7474.72±1450.09 | 4704 | 7300.5 | 11130 |
| | HIV | HIV | 80804.25±36113.52 | 26208 | 81807 | 201779 |
| Epidemiology | | Gay | 70612.92±34610.10 | 31496 | 63577 | 227478 |
| | | Short version for Gay | 95150.82±27032.44 | 56880 | 89094 | 196320 |
| | MSM | MSM | 18313.65±7210.63 | 11060 | 15177 | 43740 |
| | | Skating poison | 31739.19±10564.72 | 13832 | 31031 | 48690 |
| | | Taking drugs | 44442.03±22981.59 | 19065 | 40515 | 200105 |
| | | HIV/AIDS transmission route | 50743.35±21089.82 | 15204 | 47318 | 103170 |
| | | Transmission of AIDS | 7494.35±2175.75 | 4984 | 6960 | 18445 |
| | | Spread of AIDS | 5761.00±1781.26 | 2356 | 6165 | 11563 |
| | | Street walker | 49207.99±19012.56 | 25530 | 44356 | 103571 |
| | | Underground prostitute | 8198.35±1512.05 | 6020 | 7828 | 15035 |
| | | Hotel prostitute | 12688.99±4091.40 | 6960 | 12405 | 37386 |
| | | Guesthouse prostitute | 12606.51±5170.44 | 5700 | 10703 | 26195 |
| | | Prostitute | 186255.93±79940.38 | 109709 | 164874 | 655929 |
| | | Sexual services | 63012.64±43128.26 | 6417 | 74734 | 159371 |
| | | Sauna Service | 53452.07±11478.20 | 35616 | 54507 | 110732 |
| | | Male comrades | 5889.83±2448.54 | 2940 | 5356 | 13560 |
| | | Gay website | 197372.31±303577.34 | 33420 | 57195 | 1142400 |
| | | Short version for Gay website | 41340.38±15108.44 | 16895 | 37433 | 90086 |
| | | Anal intercourse | 19588.61±6819.02 | 10013 | 17458 | 34813 |
| Diagnosis | | AIDS detection | 33018.03±9039.65 | 15428 | 31545 | 60729 |
| | | AIDS examination | 8498.89±2372.57 | 5270 | 7859 | 17453 |
| | | AIDS-testing | 3442.53±3376.01 | 217 | 2409 | 14911 |
| | | AIDS test strip | 16544.07±6882.37 | 8525 | 14264 | 34348 |
| | | Best testing time for AIDS | 2251.10±1388.90 | 60 | 2502 | 6386 |
| | | How to check AIDS | 9682.33±4949.57 | 4216 | 7730 | 26629 |
| | | AIDS self-testing | 10714.46±11162.65 | 1302 | 4852 | 43260 |
| | | AIDS testing Center | 2845.25±1247.76 | 341 | 2589 | 5460 |
| | HIV | HIV testing | 11223.13±3308.57 | 6804 | 10095 | 25482 |
| | HIV | HIV test strip | 5470.21±3148.79 | 3150 | 4559 | 18368 |
| | | AIDS incubation period | 34965.79±23678.05 | 12270 | 28222 | 155744 |
| | | AIDS window period | 29875.68±13916.84 | 11718 | 26585 | 75609 |
| AIDS Symptom | | AIDS Symptoms | 56036.86±34095.86 | 29120 | 52082 | 286564 |
| | | Initial symptoms of AIDS | 24792.63±28036.33 | 6468 | 13037 | 135690 |
| | | Symptoms in AIDS window period | 13617.51±3359.92 | 9780 | 12726 | 23580 |
| | | Early symptoms of AIDS infection | 3664.33±1262.98 | 682 | 3658 | 9672 |
| | HIV | Initial symptoms of HIV | 4479.74±749.24 | 2660 | 4485 | 6750 |
| | | Papule | 28113.04±7560.40 | 11935 | 27885 | 42160 |
| | | Thrush | 55245.08±16240.63 | 20608 | 53695 | 98220 |
| | | Snow-mouth disease | 3412.49±645.11 | 1288 | 3446 | 5301 |
| | | AIDS Diarrhea | 4242.1±690.90 | 2688 | 4402 | 5890 |
| | | AIDS low fever | 2112.93±609.71 | 961 | 2046 | 3999 |
| | | What are the symptoms of AIDS | 12427.42±3307.87 | 6665 | 12302 | 21948 |
| AIDS Treatment | | AIDS treatment | 10377.01±3175.92 | 7192 | 9989 | 30566 |
| | | Acyclovir | 41032.24±17393.68 | 15820 | 39901 | 109585 |
| | | Zidovudine | 5543.51±3237.67 | 1008 | 5730 | 24242 |
| | | Lamivudine | 15011.61±2749.59 | 9641 | 14632 | 22041 |
Internet users searched Baidu in Chinese; the English translation of each Chinese keyword is listed.
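As a rough illustration of how the summary columns in this table can be derived from one keyword's monthly search index, the sketch below uses only the Python standard library. The values in `toy_index` are hypothetical and are not data from the study, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not state which form was used.

```python
import statistics

def summarize(monthly_index):
    """Return the summary columns reported per keyword: mean, SD, min, median, max."""
    return {
        "mean": statistics.mean(monthly_index),
        # Sample SD (n - 1 denominator); assumed, not stated in the source.
        "sd": statistics.stdev(monthly_index),
        "min": min(monthly_index),
        "median": statistics.median(monthly_index),
        "max": max(monthly_index),
    }

# Hypothetical monthly search volumes for a single keyword.
toy_index = [2697, 3500, 3999, 4200, 5600, 4100]
summary = summarize(toy_index)

# Print in the same "mean±SD | min | median | max" layout as the table.
print(f"{summary['mean']:.2f}±{summary['sd']:.2f}",
      summary["min"], summary["median"], summary["max"])
```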
Correlation analysis of Baidu Search index and HIV/AIDS reported cases.
| Keyword (in English) | Coefficient | P value | Keyword (in English) | Coefficient | P value |
|---|---|---|---|---|---|
| AIDS-testing | 0.639 | <0.001 | Thrush | 0.766 | <0.001 |
| AIDS virus | 0.652 | <0.001 | Zidovudine | 0.767 | <0.001 |
| AIDS test strip | 0.654 | <0.001 | Street walker | 0.771 | <0.001 |
| AIDS | 0.662 | <0.001 | Acyclovir | 0.776 | <0.001 |
| AIDS examination | 0.665 | <0.001 | Initial symptoms of HIV | 0.792 | <0.001 |
| AIDS incubation period | 0.667 | <0.001 | How to check AIDS | 0.799 | <0.001 |
| Taking drugs | 0.700 | <0.001 | Prostitute | 0.804 | <0.001 |
| Spread of AIDS | 0.704 | <0.001 | HIV | 0.819 | <0.001 |
| AIDS window period | 0.730 | <0.001 | Papule | 0.879 | <0.001 |
| Sexual services | 0.751 | <0.001 | | | |
Correlation coefficient is calculated by Spearman’s rank method. Only the key search keywords with correlation coefficients of 0.6 or above are listed.
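The screening step described in this footnote (Spearman's rank correlation, then keeping keywords at 0.6 or above) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the keyword names and all numbers are hypothetical, and P values are not computed here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end share one rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical monthly reported cases and keyword search indices.
monthly_cases = [100, 120, 130, 150, 170, 160]
toy_indices = {
    "keyword_a": [10, 12, 13, 15, 17, 16],  # rises and falls with the cases
    "keyword_b": [5, 3, 4, 1, 2, 6],        # unrelated ordering
}

rho_by_keyword = {kw: spearman(series, monthly_cases) for kw, series in toy_indices.items()}
# Keep only keywords at or above the 0.6 threshold named in the footnote.
selected = {kw: rho for kw, rho in rho_by_keyword.items() if rho >= 0.6}
print(selected)
```

With these toy series only `keyword_a` passes the threshold, because its ordering matches that of the case counts exactly.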
Cross-correlation between monthly HIV/AIDS report cases and Baidu search index data.
| Keyword (in English) | Maximum CCF | Lag (month) | P value | Keyword (in English) | Maximum CCF | Lag (month) | P value |
|---|---|---|---|---|---|---|---|
| Papule | 0.875 | 0 | <0.001 | Zidovudine | 0.638 | 0 | <0.001 |
| Early symptoms of HIV | 0.781 | 0 | <0.001 | Acyclovir | 0.620 | 0 | <0.001 |
| Thrush | 0.777 | 0 | <0.001 | AIDS examination | 0.615 | 0 | <0.001 |
| HIV | 0.760 | 0 | <0.001 | AIDS virus | 0.585 | 0 | <0.001 |
| Sexual services | 0.746 | 0 | <0.001 | Prostitute | 0.580 | 0 | <0.001 |
| AIDS window period | 0.702 | 0 | <0.001 | AIDS-testing | 0.578 | 0 | <0.001 |
| Spread of AIDS | 0.678 | 0 | <0.001 | AIDS test strip | 0.568 | 0 | <0.001 |
| Street walker | 0.664 | 0 | <0.001 | AIDS | 0.533 | 0 | <0.001 |
| How to check AIDS | 0.657 | 0 | <0.001 | | | | |
CCF: Cross-Correlation Function.
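To make the "Maximum CCF" and "Lag (month)" columns concrete, the sketch below computes a sample cross-correlation at each lag and reports the lag where it peaks. The lag convention (a positive lag means the search index leads the cases) and the normalization by the full-series standard deviations are assumptions, and the two series are hypothetical, built so that the cases follow the search index by one month. In the table above, by contrast, every keyword peaks at lag 0.

```python
def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag].

    Uses full-series means and standard deviations and divides by n,
    a common convention assumed here for illustration.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    sd_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    total = 0.0
    for t in range(n):
        if 0 <= t + lag < n:  # skip pairs that fall outside the series
            total += (x[t] - mean_x) * (y[t + lag] - mean_y)
    return total / (n * sd_x * sd_y)

def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] with the largest cross-correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: cross_correlation(x, y, k))

# Hypothetical series: cases repeat the search index one month later.
search_index = [2, 5, 3, 8, 4, 9, 6, 7]
cases = [4, 2, 5, 3, 8, 4, 9, 6]

lag = best_lag(search_index, cases, max_lag=3)
print(lag, round(cross_correlation(search_index, cases, lag), 3))
```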
Figure 2. Time series of the Baidu Composite Index in China from 2011 to 2016. This figure displays the three-dimensional changes of the Baidu Composite Index across year and month timescales from 1 January 2011 to 31 December 2016. (The X-axis is in months; the Y-axis is in years; the Z-axis is the national Baidu Composite Index (Baidu CI).)
Figure 3. Comparison of reported HIV/AIDS cases and the five types of keywords in different provinces from 2011 to 2016. The column diagram shows the total number of reported HIV/AIDS cases for six provinces; the line graph represents the total search index of the five types of keywords in each province.
Figure 4. Search intensity and annual case counts. This figure describes the changes in annual case counts and web users' search intensity in different provinces from 2011 to 2016. The line charts represent the annual HIV/AIDS case counts (black) and Baidu search intensity (gray) for all six provinces. Pcc: Pearson correlation coefficient.
Characteristics of ARIMAX models: coefficients, standard errors, P value for coefficients and Ljung-Box test of residuals, MAPE, AIC.
| Model | Variable | Parameter | Lag | Coefficient | Standard error | P value | Ljung-Box test | AIC | MAPE |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | ARIMA | MA | 1 | 0.939 | 0.117 | <0.0001 | 0.269 | 1184.78 | 7.57% |
| | | MA | 2 | −0.283 | 0.117 | 0.0185 | | | |
| | | SAR | 12 | 0.779 | 0.098 | <0.0001 | | | |
| Model 2 | ARIMA + Baidu CI | AR | 1 | −0.644 | 0.114 | <0.0001 | 0.155 | 839.42 | 6.11% |
| | | SMA | 12 | −0.100 | 0.292 | 0.0013 | | | |
ARIMA: autoregressive integrated moving average model, ARIMAX: ARIMA with external variables, AIC: Akaike information criterion, MAPE: mean absolute percentage error, AR: autoregressive, MA: moving average, SAR: seasonal autoregressive, SMA: seasonal moving average.
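The two model-comparison metrics in this table follow standard definitions, sketched below: MAPE is the mean of |actual − predicted| / |actual| expressed as a percentage, and AIC is 2k − 2 ln L for a model with k estimated parameters and maximized likelihood L. The forecast values, log-likelihood, and parameter count are hypothetical; only the AIC and MAPE figures in `reported` are taken from the table above.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical monthly case counts and forecasts (errors of 10%, 5%, 5%).
toy_actual = [100, 200, 400]
toy_predicted = [110, 190, 380]
print(round(mape(toy_actual, toy_predicted), 2))

# Hypothetical fit: log-likelihood of -410.0 with 3 estimated parameters.
print(aic(-410.0, 3))

# AIC and MAPE values as reported in the table above.
reported = {
    "Model 1 (ARIMA)": {"aic": 1184.78, "mape": 7.57},
    "Model 2 (ARIMA + Baidu CI)": {"aic": 839.42, "mape": 6.11},
}
best_by_aic = min(reported, key=lambda name: reported[name]["aic"])
best_by_mape = min(reported, key=lambda name: reported[name]["mape"])
print(best_by_aic, best_by_mape)
```

Both criteria select Model 2, which matches the abstract's statement that the model including the Baidu Composite Index performed best.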
Figure 5. Autocorrelation check of residuals for the model, and the interrelationship diagram of the input and output sequences. The X-axis gives the number of lags in weeks, the Y-axis is the value of the correlation coefficient, and the gray zone illustrates the 95% confidence interval.