Hyekyung Woo (1), Hyeon Sung Cho (2), Eunyoung Shim (1,3), Jong Koo Lee (4), Kihwang Lee (5), Gilyoung Song (5), Youngtae Cho (1).
1. Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea.
2. Department of Intelligent Cognitive Technology Research, Electronics and Telecommunications Research Institute, Daejeon, Korea.
3. Department of New Business, Samsung Fire and Marine Insurance, Seoul, Korea.
4. College of Medicine, Seoul National University, Seoul, Korea.
5. Mining Laboratory, Daumsoft, Seoul, Korea.
Abstract
OBJECTIVE: Social media data are a highly contextual health information source. The objective of this study was to identify Korean keywords for detecting influenza epidemics from social media data.
METHODS: We included data from Twitter and online blog posts to obtain a sufficient number of candidate indicators and to represent a larger proportion of the Korean population. We performed the following steps: initial keyword selection; generation of a keyword time series using a preprocessing approach; optimal feature selection; and model building and validation using the least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest regression (RFR).
RESULTS: A total of 15 keywords, evenly distributed across the Twitter and blog data sources, optimally detected the influenza epidemic. Estimates generated by our SVM model were highly correlated with recent influenza incidence data.
CONCLUSIONS: The basic principles underpinning our approach could be applied to other countries, languages, infectious diseases, and social media sources. Social media monitoring using our approach may support and extend the capacity of traditional surveillance systems for detecting emerging influenza.
(Disaster Med Public Health Preparedness. 2018; 12: 352-359).
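The steps listed under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the preprocessing, the matching rule, or the model settings, so the substring matching, the weekly aggregation, the penalty value `alpha`, the English toy keywords, and all numbers below are assumptions made up for illustration. The LASSO step is a hand-written coordinate descent in plain Python; the SVM and RFR models are not shown.

```python
from collections import Counter
from datetime import date
import math

def weekly_keyword_series(posts, keywords):
    """Count, per ISO week, how many posts mention each keyword (substring match)."""
    counts = {kw: Counter() for kw in keywords}
    weeks = set()
    for day, text in posts:
        iso = day.isocalendar()
        week = (iso[0], iso[1])          # (ISO year, ISO week number)
        weeks.add(week)
        for kw in keywords:
            if kw in text:
                counts[kw][week] += 1
    ordered = sorted(weeks)
    return ordered, {kw: [counts[kw][w] for w in ordered] for kw in keywords}

def standardize(values):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def pearson(a, b):
    """Pearson correlation, computed as the mean product of z-scores."""
    return sum(x * y for x, y in zip(standardize(a), standardize(b))) / len(a)

def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(columns, target, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    columns: standardized feature columns; target: centered response.
    """
    n = len(target)
    coefs = [0.0] * len(columns)
    residual = list(target)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            scale = sum(x * x for x in col) / n
            if scale == 0:
                continue
            # Correlation of column j with the residual that excludes its own fit.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha) / scale
            delta = new - coefs[j]
            if delta:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coefs[j] = new
    return coefs

# Step 1-2 (toy data): turn timestamped posts into weekly keyword counts.
posts = [
    (date(2018, 1, 1), "I have a fever and flu"),
    (date(2018, 1, 3), "flu shot today"),
    (date(2018, 1, 9), "fever again"),
]
weeks, series = weekly_keyword_series(posts, ["flu", "fever"])

# Step 3 (toy data): keep only keywords with a non-zero LASSO coefficient.
incidence = [1, 2, 4, 8, 12, 9, 5, 2]          # hypothetical weekly incidence
fever_counts = [2, 4, 8, 16, 24, 18, 10, 4]    # hypothetical, tracks incidence
movie_counts = [5, 3, 6, 4, 5, 3, 6, 4]        # hypothetical, unrelated
names = ["fever", "movie"]
coefs = lasso_coordinate_descent(
    [standardize(fever_counts), standardize(movie_counts)],
    standardize(incidence),
    alpha=0.1,
)
selected = [name for name, c in zip(names, coefs) if c != 0.0]

# Step 4 (toy data): compare a candidate series with incidence.
print(weeks, series)
print(selected, pearson(incidence, fever_counts))
```

The L1 penalty drives the coefficients of weakly related keywords to exactly zero, which is why LASSO can serve as a feature selection step before fitting a model such as an SVM or a random forest on the surviving keywords.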
Keywords:
Korea; epidemics; influenza; social media; surveillance