Early detection of an epidemic erythromelalgia outbreak using Baidu search data.

Yuzhou Gu1, Fengling Chen2, Tao Liu1, Xiaojuan Lv1, Zhaoming Shao2, Hualiang Lin1, Chaobin Liang2, Weilin Zeng1, Jianpeng Xiao1, Yonghui Zhang3, Cunrui Huang4, Shannon Rutherford5, Wenjun Ma1.   

Abstract

Dozens of epidemic erythromelalgia (EM) outbreaks have been reported in China since the mid-twentieth century, and the most recent occurred in Foshan City, Guangdong Province, in early 2014. This study compared the daily case counts of this outbreak from February 11 to March 3 with Baidu search data for the same period. After keyword selection, filtering and composition, the most strongly correlated lag of the EM Search Index was used for comparison and for developing a linear regression model. The study also explored the spatial distribution of epidemic EM in China during this period based on the EM Search Index. The EM Search Index at lag 2 was most strongly associated with daily case counts in Foshan (ρ = 0.863, P < 0.001). It captured an upward trend in the outbreak about one week ahead of the official report, and the linear regression analysis indicated that every increase of 1.071 in the EM Search Index reflected a rise of 1 EM case 2 days earlier. The spatial analysis found that the EM Search Index increased in the middle of Guangdong Province and in South China during the outbreak period. The EM Search Index may be a good early indicator of an epidemic EM outbreak.

Year:  2015        PMID: 26218589      PMCID: PMC4517510          DOI: 10.1038/srep12649

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Erythromelalgia (EM) is a clinical syndrome characterized by a triad of erythema, burning pain and increased temperature of the feet, the hands or both. This syndrome is rare in the western world [1,2], and there are no outbreak reports from developed countries. More than 70 epidemic EM outbreaks and over 80,000 cases have been reported in the Chinese literature since the mid-twentieth century. For example, around 10,000 and 19,000 cases were observed in Hunan and Hubei Provinces, respectively, during a serious outbreak in 1987, and more than 10,000 and 11,000 cases were recorded in Fujian and Hainan Provinces, respectively, in the 1990 outbreak [3-9]. Although the onset of epidemic EM is acute, its clinical symptoms are not very serious and usually disappear within a few days [7,8,10]. As China has not developed any traditional disease surveillance to monitor this syndrome, the true extent of epidemic EM outbreaks in China remains unknown. Most epidemic EM outbreaks in China have been reported between February and March, coinciding with a V-shaped temperature change, namely a sharp temperature decline followed by a rapid temperature rise within a few days. Previous studies have hypothesized that these large temperature fluctuations in South China are associated with epidemic EM outbreaks [6-8,11-14]. Furthermore, Liu et al. [14] recently found that a one degree Celsius increment in daily temperature might trigger an average rise of 1.22 EM cases in an epidemic EM outbreak. During February 2014, Foshan City in Guangdong Province experienced a very large temperature fluctuation, accompanied by an epidemic EM outbreak in two high schools. The cases in this outbreak were characterized by burning pain and numbness in the toes and feet. As most cases were mild, and epidemic EM is not a notifiable disease in China, it is not clear whether unreported cases occurred elsewhere in Guangdong or China during this period.
The availability and popularity of the Internet have grown greatly in recent years. As of December 2013, there were 618 million Internet users in China, accounting for about 45.8% of the national population, and the proportion in Guangdong Province was even higher [15]. At the same time, an increasing number of people, including patients and their family members, are inclined to search online for health information before seeking medical services [16,17], making it possible to monitor the health status of the population by tracking changes in the frequencies of specific search keywords. Internet search engines are now the most common tool that Internet users rely on to obtain information [18,19], and data from different search engines have been successfully used for early detection of diseases such as influenza and dengue [17,18,20-26]. Such studies suggest that Internet search data-based surveillance might be a novel way to monitor epidemic EM outbreaks in near real time. Baidu is the most popular search engine in China, with 86.7% of Internet users preferring it [19]. This wide use makes it the most representative source for analyzing Chinese online behavior [26]. Further, the search volumes of Baidu users are released daily on Baidu's Index website (http://index.baidu.com), which allows timely capture of changes in search keywords. Although more and more studies are investigating the relationship between search data and certain infectious diseases, no study has yet focused on epidemic EM. Given the lack of a traditional surveillance system, Internet surveillance is a promising approach for early detection of epidemic EM outbreaks.
The present study compared Baidu search data with case counts reported in the Foshan outbreak during the same period, in order to determine whether epidemic EM was associated with Internet search behavior and to develop an Internet search data-based surveillance method that could help detect future outbreaks of epidemic EM.

Materials and Methods

Data sources

Outbreak data

This study used daily case counts over the entire 21-day outbreak period from February 11 to March 3, 2014 in Foshan City, Guangdong Province, China. A case was defined as a student in an outbreak high school who reported an onset of pain, redness or numbness in the toes or feet with no obvious cause after February 10, 2014. Cases induced by injury were excluded. The first case was reported on February 27, and 494 cases were retrospectively confirmed by epidemiologists and clinical experts after a systematic field investigation. Daily case counts are shown in Table 1.
Table 1

Daily EM case counts during the outbreak period in Foshan City.

Date         Case count    Date         Case count    Date         Case count
2014-02-11   1             2014-02-18   17            2014-02-25   65
2014-02-12   0             2014-02-19   9             2014-02-26   71
2014-02-13   1             2014-02-20   26            2014-02-27   65
2014-02-14   5             2014-02-21   21            2014-02-28   37
2014-02-15   4             2014-02-22   14            2014-03-01   4
2014-02-16   4             2014-02-23   50            2014-03-02   3
2014-02-17   15            2014-02-24   80            2014-03-03   2
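
As a quick consistency check on Table 1, the daily counts can be loaded into a small structure and summed. The sketch below simply transcribes the table; the variable names are illustrative and not part of the study.

```python
from datetime import date, timedelta

# Daily case counts transcribed from Table 1 (2014-02-11 to 2014-03-03).
CASE_COUNTS = [
    1, 0, 1, 5, 4, 4, 15,        # Feb 11 to Feb 17
    17, 9, 26, 21, 14, 50, 80,   # Feb 18 to Feb 24
    65, 71, 65, 37, 4, 3, 2,     # Feb 25 to Mar 3
]

START = date(2014, 2, 11)

# Map each calendar day of the outbreak window to its case count.
cases_by_day = {START + timedelta(days=i): n for i, n in enumerate(CASE_COUNTS)}

total_cases = sum(cases_by_day.values())
peak_day = max(cases_by_day, key=cases_by_day.get)

print(total_cases)                       # 494, matching the confirmed total
print(peak_day, cases_by_day[peak_day])  # 2014-02-24 80
```

The total agrees with the 494 retrospectively confirmed cases reported in the text.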

Baidu search data

The Baidu Index website (http://index.baidu.com) contains search volumes for numerous keywords entered by Baidu users since June 2006. Data are available on a daily basis at the city, province and national levels. To allow for time lags between symptom onset and online searching, we collected data for 24 days, from February 11 to March 6, 2014. The search volume for the same period in 2013 was collected for comparison.
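
The relationship between the two collection windows can be checked with simple date arithmetic. This is a minimal sketch; `inclusive_days` is a hypothetical helper, and reading the extra days as the room available for lags is an interpretation of the text rather than something the study states as a formula.

```python
from datetime import date

def inclusive_days(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1

outbreak_days = inclusive_days(date(2014, 2, 11), date(2014, 3, 3))
search_days = inclusive_days(date(2014, 2, 11), date(2014, 3, 6))

# Extra search days beyond the outbreak window: the largest lag that still
# pairs every outbreak day with a search-volume observation.
max_lag = search_days - outbreak_days

print(outbreak_days, search_days, max_lag)  # 21 24 3
```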

Meteorological data

Given the hypothesized association between epidemic EM and large temperature fluctuations, we collected the daily maximum temperature in Foshan City from February 6 to March 3, 2014 (a total of 26 days), which covers the entire period of the large temperature change. Meteorological data were obtained from a free Chinese-language weather query website (http://www.tianqihoubao.com). No ethics committee approval or written consent from patients was required, since only daily count data were obtained and no information about the identity of any case was revealed.

Keyword selection and filtering

Keyword selection is the critical issue in Internet search data-based surveillance, as it directly affects the detection ability and accuracy of the surveillance method. Different people may type entirely different words when searching for the same information, especially in Chinese, where one meaning can be expressed in several ways. Consequently, different keyword choices can produce very different results. Despite the significance of this issue, there are no principles or standards for guidance [18,26,27]. Previous studies generally chose the names or clinical symptoms of target diseases as their core keywords [22,23,25,26]. As EM is a little-known disease within the lay Chinese community, the search volume for this word is too low for Baidu to calculate its search information. Therefore, we chose primary keywords that represent the major clinical characteristics of cases in 2014 and in previous outbreaks (see Supplementary Table S1). A Chinese website (http://tool.chinaz.com/baidu/words.aspx) was used to obtain further related keywords. The related keyword recommendations on the website include not only suggestions from Baidu but also terms mined from portal websites, blogs and online reports using semantic correlation analysis [26]. After entering each of the 19 primary terms, we obtained 62 related keywords (see Supplementary Table S1). However, more keywords do not necessarily lead to a better result [17,28], since some recommended keywords are not closely related to EM and could reduce the detection ability of the surveillance system. Hence, we collected the search data for Foshan City from Baidu and filtered the keywords in two steps. First, we eliminated the words irrelevant to EM and those with a search volume of zero during the outbreak period, after which 32 keywords remained (see Supplementary Table S1).
Second, Spearman's rank correlation coefficients (ρ) were calculated between daily case counts and daily search volumes for each keyword using different time lags. We removed keywords with maximum correlation coefficients below 0.4 at each time lag and those whose correlations were not statistically significant. Taking into account the number of remaining keywords, as well as the strength of correlation of the keywords that met the criteria above, we considered time lags of 0 to 3 days. The numbers of remaining keywords for the four time lags were 14, 15, 17 and 17, respectively (Table 2).
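
The second filtering step can be sketched as follows. This is an illustrative pure-Python version, not the SPSS procedure used in the study: it computes Spearman's ρ as the Pearson correlation of average ranks, pairs cases on day t with search volume on day t + lag, and applies only the 0.4 threshold. The significance test is omitted for brevity, and all function names and the toy data are assumptions.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no rank information
    return sxy / math.sqrt(sxx * syy)

def lagged_rho(cases, volumes, lag):
    """Correlate cases on day t with search volume on day t + lag."""
    return spearman_rho(cases, volumes[lag:lag + len(cases)])

def keep_keywords(cases, volumes_by_keyword, lag, threshold=0.4):
    """Keep keywords whose lagged rho reaches the threshold (no p-value check)."""
    kept = {}
    for keyword, volumes in volumes_by_keyword.items():
        rho = lagged_rho(cases, volumes, lag)
        if rho >= threshold:
            kept[keyword] = rho
    return kept

# Toy data (hypothetical): one keyword rises two days after cases, one falls.
toy_cases = [1, 2, 3, 4, 5]
toy_volumes = {
    "kw_a": [0, 0, 1, 2, 3, 4, 5, 6],
    "kw_b": [5, 5, 5, 4, 3, 2, 1, 0],
}
kept_at_lag2 = keep_keywords(toy_cases, toy_volumes, lag=2)
print(sorted(kept_at_lag2))  # ['kw_a']
```

In practice a statistics package would also supply the P value needed for the significance criterion described above.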
Table 2

Keywords under time lags of 0 to 3 days after second step filtering.

Baidu users search for information in Chinese; the corresponding English translation of each Chinese keyword is listed.

EM Search Index composition

Following selection and filtering, the remaining keywords were used to compose an EM Search Index for each time lag. The weight of each keyword was defined by the strength of its correlation coefficient [26,27]. In the weight calculation and EM Search Index composition formulae, l denotes the time lag of the search data, n is the number of keywords at each time lag, and keyword and weight represent the ith keyword and its weight.
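
One form consistent with the definitions above is a correlation-weighted sum of keyword search volumes. The block below is a hedged reconstruction, not a quotation of the published formulae: it assumes that each weight is the keyword's Spearman coefficient ρ at lag l divided by the sum of the coefficients of all n keywords at that lag, and that keyword_li stands for the daily search volume of the ith keyword.

```latex
\begin{align}
\mathrm{weight}_{li} &= \frac{\rho_{li}}{\sum_{j=1}^{n} \rho_{lj}} \\
\text{EM Search Index}_{l} &= \sum_{i=1}^{n} \mathrm{weight}_{li} \times \mathrm{keyword}_{li}
\end{align}
```

Under this assumption the weights at each lag sum to 1, so the index stays on the same scale as the underlying search volumes.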

Epidemic EM outbreak detection

In order to compare the epidemic situation with temperature change, we first plotted a line graph to depict the relationship between daily maximum temperature and daily case counts. Spearman's rank correlation coefficients were then calculated between outbreak data and the EM Search Index for each time lag of 0 to 3 days. The time lag with the largest coefficient was selected for further analysis. Based on this, we further developed a linear regression model in which EM Search Index denotes the EM Search Index at the lag with the largest correlation and β1 is the regression coefficient. The model estimates the case count l days before, based on the Baidu search data for the current day.
Though the temperature fluctuation between February and March 2014 was widespread in South China, there were no reports of epidemic EM outbreaks from other cities or provinces. Therefore, it was not clear whether EM cases occurred elsewhere during this period. We calculated the EM Search Index from February 11 to March 3, 2014 for Guangdong Province and 33 other provinces/municipalities of China. By plotting these data on maps, we aimed to roughly explore whether similar outbreaks of epidemic EM occurred in other parts of China during this period. In order to understand the influence of regional differences, Internet search data from February 11 to March 3, 2013 were collected for comparison. All analyses were performed using SPSS 19.0, and the maps were plotted with ArcGIS 9.3 (ESRI).
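
The regression step can be sketched with ordinary least squares. The model form used here, case count on day t regressed on the EM Search Index on day t + l with an intercept, is an assumption inferred from the description above; the study itself used SPSS 19.0, and the function names and toy series are hypothetical.

```python
def align_with_lag(cases, index, lag):
    """Pair cases on day t with the search index on day t + lag."""
    return index[lag:lag + len(cases)], list(cases)

def fit_simple_ols(x, y):
    """Fit y = beta0 + beta1 * x by least squares; return (beta0, beta1, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return beta0, beta1, r2

# Toy series (hypothetical): the index echoes the cases two days later.
toy_cases = [1, 2, 3, 4]
toy_index = [0, 0, 2, 4, 6, 8]

x, y = align_with_lag(toy_cases, toy_index, lag=2)
beta0, beta1, r2 = fit_simple_ols(x, y)
print(beta0, beta1, r2)
```

With real data, `x` would be the composed EM Search Index at the selected lag and `y` the daily case counts from Table 1.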

Results

A large temperature fluctuation was observed in Foshan City between February and March 2014. The daily maximum temperature suddenly dropped by about 10 °C on February 8 and continued to decline to its lowest value (6 °C) on February 13, then slowly returned to a relatively high level (Fig. 1). EM cases occurred when the temperature began to increase (Fig. 1). Spearman's rank correlation analysis showed that daily case counts were positively associated with daily maximum temperature during the temperature increase (ρ = 0.650, P = 0.001).
Figure 1

Daily case counts and daily maximum temperature.

This figure displays the pattern of temperature change in Foshan City between February and March 2014 and provides the trend in daily EM case counts within this period.

EM Search Indexes for time lags of 0 to 3 days were composed of 14, 15, 17 and 17 keywords, respectively (Table 2), and the correlation coefficients between the EM Search Indexes and outbreak data are listed in Table 3. The correlation strengthened as the lag increased, peaked at lag 2 (ρ = 0.863, P < 0.001) and declined slightly at lag 3. Therefore, the EM Search Index at lag 2 was chosen for further analysis.
Table 3

Correlation between outbreak data and EM Search Index (lags of 0 to 3 days).

                                           Lag 0     Lag 1     Lag 2     Lag 3
Spearman’s rank correlation coefficient    0.747*    0.846*    0.863*    0.843*

*P < 0.0001.
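
Selecting the lag for further analysis amounts to taking the maximum of the coefficients in Table 3. A minimal sketch, with the values transcribed from the table and an illustrative variable name:

```python
# Spearman coefficients from Table 3, keyed by lag in days.
rho_by_lag = {0: 0.747, 1: 0.846, 2: 0.863, 3: 0.843}

# The lag with the largest coefficient is carried forward.
best_lag = max(rho_by_lag, key=rho_by_lag.get)

print(best_lag, rho_by_lag[best_lag])  # 2 0.863
```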

We then plotted daily case counts and the EM Search Index at lag 2 over the outbreak period (Fig. 2). The search data closely tracked the change in daily case counts. In particular, an apparent increase in EM cases after February 20 was followed by a similar uptrend in search volume after February 21. However, the first case was not reported by the local Center for Disease Control and Prevention until February 27. This suggests that, although there was a lag between the EM outbreak and the EM Search Index, the EM Search Index could still detect the epidemic about 1 week before the outbreak was reported.
Figure 2

EM Search Index and daily case counts.

This figure describes the changes in daily EM case counts and the EM Search Index at lag 2 during the outbreak period (February 11–March 3) for Foshan City. The report date of outbreak is clearly indicated.

The coefficient (β1) for the linear regression model between outbreak data and the EM Search Index was 0.934 (P < 0.001), indicating that, during the outbreak period, every increase of 1.071 in the EM Search Index reflected a rise of 1 case 2 days before. The R² was 0.83, suggesting that the EM Search Index could explain 83% of the variation in daily case counts.
The EM Search Index from February 11 to March 3, 2014 for each city in Guangdong Province and for 34 provinces/municipalities in China was plotted on maps and contrasted with the same period of 2013 (Figs 3 and 4). As shown in Fig. 3, most cities of Guangdong Province showed low search frequencies in 2013, but a much higher EM Search Index was observed in Guangzhou, Foshan and Shenzhen in 2014. South China and East China showed a relatively high EM Search Index in 2014, with the highest values in Guangdong. In contrast, no region showed a high EM Search Index during the same period in 2013 (Fig. 4).
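
The figure of 1.071 quoted for the regression follows from the coefficient by simple inversion, assuming the model expresses case count as a linear function of the EM Search Index (an inference from the reported numbers, since the model equation is not reproduced here):

```latex
\begin{align}
\Delta\,\text{case count} &= \beta_1 \times \Delta\,\text{EM Search Index} \\
\Delta\,\text{EM Search Index} &= \frac{1}{\beta_1} = \frac{1}{0.934} \approx 1.071
\quad \text{for } \Delta\,\text{case count} = 1
\end{align}
```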
Figure 3

The spatial distribution of EM Search Index in Guangdong Province, China.

This figure depicts the spatial distribution of the EM Search Index across Guangdong Province during the 2014 outbreak period in Foshan City, with cities shaded by colour depth using ArcGIS 9.3 (ESRI). The distribution for the same period in 2013 is plotted for comparison.

Figure 4

The spatial distribution of EM Search Index in China.

This figure depicts the spatial distribution of the EM Search Index across China during the 2014 outbreak period in Foshan City, with provinces/municipalities shaded by colour depth using ArcGIS 9.3 (ESRI). The distribution for the same period in 2013 is plotted for comparison.

Discussion

Since Eysenbach et al. [20] set the important precedent for disease surveillance using Internet search data, a growing number of studies have addressed this topic. Most existing studies have focused on infectious diseases such as influenza and dengue fever [17,21-26]. This study is the first to investigate the application of Internet search data to the early detection of epidemic EM outbreaks. We compared Baidu search index counts with daily case counts of a recent epidemic EM outbreak in Foshan City, Guangdong Province, China, and found that the EM Search Index at a lag of 2 days was significantly associated with the outbreak, and that every increase of 1.071 in the EM Search Index might reflect a rise of 1 case 2 days before. These findings indicate that the onset of EM symptoms was associated with an increase, 2 days later, in Internet searches for keywords relating to the illness. Even though a 2-day lag was identified, the EM Search Index captured the sharp uptrend in daily case counts about a week ahead of the official report, because reporting by the local Center for Disease Control and Prevention was delayed. This suggests that the EM Search Index may be a good predictor for early detection of epidemic EM outbreaks. Because the public pays little attention to EM, no mass media reported on EM during this outbreak, which adds weight to the utility of Internet search data as a reflection of individuals' health concerns [29]. However, epidemic EM has mainly occurred in students living in schools, so our results are better suited to extrapolation to similar populations than to the general population [14]. EM is little known among ordinary Chinese people, and insufficient knowledge might result in more pain and more panic during an epidemic EM outbreak. An early detection system could help to facilitate the timely treatment of cases and ease public concerns about the symptoms.
Previous studies have reported that epidemic EM outbreaks are accompanied by large temperature fluctuations [6-8,11-14], and this is consistent with the outbreak that occurred in Foshan in February and March 2014. Certainly, conducting EM surveillance in schools, communities or hospitals during large temperature fluctuations is a direct way to detect EM outbreaks. However, it may be more cost-effective to monitor changes in temperature and the EM Search Index simultaneously. For example, coinciding with the sharp drop in ambient temperature followed by a temperature rise within a short period, we observed a clearly increasing trend in the EM Search Index, which could be a strong signal of an EM outbreak. Therefore, Internet search data provide an opportunity for the government or the public to detect epidemic EM outbreaks early and take timely measures. According to the Chinese literature, most epidemic EM outbreaks have coincided with large temperature changes in many provinces of South China [3-6,8,9,30-35]. For example, the 1987 outbreak affected Hubei, Henan, Hunan, Jiangxi and Zhejiang Provinces of South China [4-7,31,32], and up to six provinces of South China, including Fujian, Anhui, Guangdong, Guangxi, Guizhou and Hainan, were involved in the outbreak [3,4,8,9,30,33-36]. Thus it is possible that the 2014 EM outbreak was not limited to Foshan City, because ambient temperature fluctuations could be observed in many parts of South China between February and March. Therefore, we tried to retrospectively explore the spatial distribution of epidemic EM using the EM Search Index. Our results showed that cities in the middle of Guangdong Province and some provinces in South China had relatively high search frequencies for symptoms of EM during the outbreak period.
Some of the cities or provinces with a high EM Search Index have experienced one or more epidemic EM outbreaks according to previous studies, such as Guangzhou [13,36], Shenzhen [37] and Zhongshan [38] in Guangdong Province, and Zhejiang [31], Jiangsu [39], Henan [4], Fujian [39], Hebei [40], Hubei [4,6] and Hunan [4,5,12]. On the other hand, the whole country showed a low search frequency during the same period in 2013, when there was no large temperature fluctuation, suggesting that the elevated search activity in 2014 reflected a real epidemic. From our findings, we speculate that the cities or provinces with higher EM Search Index counts might have experienced epidemic EM outbreaks during this period in 2014, when the temperature fluctuated widely.
This study has some limitations. First, Baidu does not release search data for keywords without sufficient search volume, which might result in an underestimation of the correlation. Second, although the selected keywords captured the trend of the outbreak data very well, some keywords may still have been omitted because of the diversity of online search habits, and we had no other data for model validation. Third, a number of factors affect individual search behavior and thereby influence the sustainability of our detection model [20,23,25]. Finally, Internet access is uneven throughout China, with provincial Internet penetration ranging from 32.6% in Jiangxi Province to 75.2% in Beijing (for the 2013 Internet penetration of each province/municipality of mainland China, see Supplementary Table S2), and population sizes also differ between regions. Thus, comparisons of actual search index counts between cities or provinces should be interpreted with caution.
In conclusion, the EM Search Index based on Baidu search terms may be a good indicator for early detection of an epidemic EM outbreak, especially when combined with temperature change monitoring.

Additional Information

How to cite this article: Gu, Y. et al. Early detection of an epidemic erythromelalgia outbreak using Baidu search data. Sci. Rep. 5, 12649; doi: 10.1038/srep12649 (2015).
  11 in total

1.  Infodemiology: tracking flu-related searches on the web for syndromic surveillance.

Authors:  Gunther Eysenbach
Journal:  AMIA Annu Symp Proc       Date:  2006

2.  Using internet searches for influenza surveillance.

Authors:  Philip M Polgreen; Yiling Chen; David M Pennock; Forrest D Nelson
Journal:  Clin Infect Dis       Date:  2008-12-01       Impact factor: 9.079

3.  Patients' use of the Internet for medical information.

Authors:  Joseph A Diaz; Rebecca A Griffith; James J Ng; Steven E Reinert; Peter D Friedmann; Anne W Moulton
Journal:  J Gen Intern Med       Date:  2002-03       Impact factor: 5.128

4.  Incidence of erythromelalgia: a population-based study in Olmsted County, Minnesota.

Authors:  K B Reed; M D P Davis
Journal:  J Eur Acad Dermatol Venereol       Date:  2008-08-18       Impact factor: 6.166

5.  Prediction of dengue incidence using search query surveillance.

Authors:  Benjamin M Althouse; Yih Yng Ng; Derek A T Cummings
Journal:  PLoS Negl Trop Dis       Date:  2011-08-02

6.  Internet search limitations and pandemic influenza, Singapore.

Authors:  Alex R Cook; Mark I C Chen; Raymond Tzer Pin Lin
Journal:  Emerg Infect Dis       Date:  2010-10       Impact factor: 6.883

7.  A large temperature fluctuation may trigger an epidemic erythromelalgia outbreak in China.

Authors:  Tao Liu; Yonghui Zhang; Hualiang Lin; Xiaojuan Lv; Jianpeng Xiao; Weilin Zeng; Yuzhou Gu; Shannon Rutherford; Shilu Tong; Wenjun Ma
Journal:  Sci Rep       Date:  2015-03-30       Impact factor: 4.379

8.  Web queries as a source for syndromic surveillance.

Authors:  Anette Hulth; Gustaf Rydevik; Annika Linde
Journal:  PLoS One       Date:  2009-02-06       Impact factor: 3.240

9.  Using Google Trends for influenza surveillance in South China.

Authors:  Min Kang; Haojie Zhong; Jianfeng He; Shannon Rutherford; Fen Yang
Journal:  PLoS One       Date:  2013-01-25       Impact factor: 3.240

10.  Monitoring influenza epidemics in china with search query from baidu.

Authors:  Qingyu Yuan; Elaine O Nsoesie; Benfu Lv; Geng Peng; Rumi Chunara; John S Brownstein
Journal:  PLoS One       Date:  2013-05-30       Impact factor: 3.240
