John Paparrizos (1), Ryen W White (2), Eric Horvitz (1). (1) Columbia University, New York, NY; and Microsoft Research, Redmond, WA. (2) Columbia University, New York, NY; and Microsoft Research, Redmond, WA; ryenw@microsoft.com.
Abstract
INTRODUCTION: People's online activities can yield clues about their emerging health conditions. We performed an intensive study to explore the feasibility of using anonymized Web query logs to screen for the emergence of pancreatic adenocarcinoma. The methods applied statistical analyses to large-scale anonymized search logs, considering the symptom queries of millions of people, with the potential application of alerting individual searchers to the value of seeking attention from health care professionals.
METHODS: We identified searchers in logs of online search activity who issued special queries suggestive of a recent diagnosis of pancreatic adenocarcinoma. We then looked back many months before these landmark queries to examine patterns of symptoms, as expressed in searches about concerning symptoms. We built statistical classifiers that predicted the future appearance of the landmark queries from patterns of signals seen in the search logs.
RESULTS: We found that patterns of queries in search logs can predict the future appearance of queries that are highly suggestive of a diagnosis of pancreatic adenocarcinoma. Specifically, we showed that we can identify 5% to 15% of cases while preserving extremely low false-positive rates (0.00001 to 0.0001).
CONCLUSION: Search logs show the possibility of predicting a forthcoming diagnosis of pancreatic adenocarcinoma from combinations of subtle temporal signals revealed in searchers' queries.
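To make the reported operating point concrete, the following is a minimal sketch of how the share of cases identified (recall) at a capped false-positive rate can be computed from classifier scores. It is not the authors' implementation: the function names `recall_at_fpr` and `expected_false_alarms`, the toy scores, and the screened population of one million searchers are all assumptions introduced for illustration.

```python
def recall_at_fpr(scores, labels, max_fpr):
    """Highest recall achievable while keeping the false-positive rate <= max_fpr.

    scores: classifier scores (higher means more likely a future landmark query).
    labels: 1 for searchers who later issued a landmark query, 0 otherwise.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives that may be flagged without exceeding the cap.
    allowed = int(max_fpr * len(negatives))

    # Flag only scores strictly above the first negative that must stay unflagged.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")

    flagged = sum(1 for s in positives if s > threshold)
    return flagged / len(positives)


def expected_false_alarms(population, fpr):
    """Expected number of unaffected searchers flagged at a given false-positive rate."""
    return round(population * fpr)


if __name__ == "__main__":
    # Toy data (hypothetical): four positives and four negatives.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2, 0.1]
    labels = [1, 0, 1, 0, 1, 0, 0, 1]
    for cap in (0.0, 0.25, 0.5):
        print(cap, recall_at_fpr(scores, labels, cap))

    # False-positive rates from the abstract, applied to an assumed
    # population of one million screened searchers.
    for fpr in (0.00001, 0.0001):
        print(fpr, expected_false_alarms(1_000_000, fpr))
```

Under the assumed population of one million searchers, the false-positive rates quoted in the RESULTS correspond to roughly 10 to 100 false alarms, which illustrates why such a low cap matters when screening at Web scale.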