Sean D Young (1, 2), Elizabeth A Torrone (3), John Urata (2), Sevgi O Aral (3)
Affiliations: 1. University of California Institute for Prediction Technology, University of California, Los Angeles, CA. 2. Department of Family Medicine, University of California, Los Angeles, CA. 3. Division of STD Prevention, Centers for Disease Control and Prevention, Atlanta, GA.
Abstract
BACKGROUND: Researchers have suggested that social media and online search data might be used to monitor and predict syphilis and other sexually transmitted diseases. Because people at risk for syphilis might seek sexual health and risk-related information on the internet, we investigated associations between state-level internet search query data (e.g., Google Trends) and reported weekly syphilis cases.
METHODS: We obtained weekly counts of reported primary and secondary syphilis for 50 states from 2012 to 2014 from the US Centers for Disease Control and Prevention. We collected weekly internet search query data regarding 25 risk-related keywords from 2012 to 2014 for 50 states using Google Trends. We joined 155 weeks of Google Trends data with a 1-week lag to weekly syphilis data for a total of 7750 data points. Using the least absolute shrinkage and selection operator, we trained three linear mixed models, one per year, on the first 10 weeks of each year. We validated the 2012 and 2013 models on the following 52 weeks and the 2014 model on the following 42 weeks.
RESULTS: The models, consisting of different sets of keyword predictors for each year, accurately predicted 144 weeks of primary and secondary syphilis counts for each state, with an overall average R² of 0.9 and an overall average root mean squared error of 4.9.
CONCLUSIONS: We used Google Trends search data from the prior week to predict cases of syphilis in the following weeks for each state. Further research could explore how search data could be integrated into public health monitoring systems.
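The lag join, train/validation split, and error metrics described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study used LASSO-selected keywords in linear mixed models across 50 states, whereas this sketch swaps in a one-predictor ordinary least squares fit on a single state. The helper names (`lag_join`, `fit_line`, `rmse`, `r_squared`) are hypothetical, and the series are synthetic values invented for illustration, not study data.

```python
import math

def lag_join(search, cases, lag=1):
    """Pair each week's case count with the search index from `lag` weeks earlier."""
    return [(search[t - lag], cases[t]) for t in range(lag, len(cases))]

def fit_line(pairs):
    """Ordinary least squares for y = a + b*x.
    Assumption: a simple stand-in for the LASSO linear mixed model in the abstract."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted counts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Synthetic single-state series (made-up numbers, not study data).
search = [float(20 + (7 * t) % 13) for t in range(53)]
# Cases depend on the previous week's search index; week 0 has no predecessor.
cases = [0.0] + [2.0 + 0.5 * search[t - 1] for t in range(1, 53)]

pairs = lag_join(search, cases, lag=1)   # (lagged search index, cases) per week
train, valid = pairs[:10], pairs[10:]    # first 10 weeks train, the rest validate
intercept, slope = fit_line(train)

actual = [y for _, y in valid]
predicted = [intercept + slope * x for x, _ in valid]
print(len(pairs), len(train), len(valid))
print(rmse(actual, predicted), r_squared(actual, predicted))

# Size of the joined data set reported in the abstract: weeks x states.
total_points = 155 * 50
```

Because the synthetic cases are an exact linear function of the lagged search index, the fit recovers it almost perfectly; real surveillance data would be noisier, which is what the reported R² and root mean squared error quantify.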