
An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.

Manabu Torii, Lanlan Yin, Thang Nguyen, Chand T Mazumdar, Hongfang Liu, David M Hartley, Noele P Nelson.

Abstract

PURPOSE: Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles.
METHODS: Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant articles and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and the number of word features. The trained classifiers were applied to 15,000 articles published over 15 days. Top-ranked articles from each classifier were pooled, and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers.
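The training and pooling steps described in METHODS can be illustrated with a minimal sketch. It assumes a whitespace tokenizer, a multinomial naïve Bayes model with add-one smoothing, and unlabeled articles treated as if they were irrelevant; none of these details are stated in the abstract, and the function names (`train_nb`, `score`, `pool_top_ranked`) and toy articles are hypothetical. Feature selection and the SVM variant are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for the word features.
    return text.lower().split()


def train_nb(relevant_docs, unlabeled_docs):
    """Fit a multinomial naive Bayes model.

    Unlabeled articles are used as a stand-in for irrelevant ones (label 0).
    """
    counts = {1: Counter(), 0: Counter()}
    for label, docs in ((1, relevant_docs), (0, unlabeled_docs)):
        for doc in docs:
            counts[label].update(tokenize(doc))
    vocab = set(counts[1]) | set(counts[0])
    n_docs = len(relevant_docs) + len(unlabeled_docs)
    prior = {1: len(relevant_docs) / n_docs, 0: len(unlabeled_docs) / n_docs}
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    return {"counts": counts, "vocab": vocab, "prior": prior, "totals": totals}


def score(model, doc):
    """Log-odds that an article is relevant (higher means more relevant)."""
    v = len(model["vocab"])
    log_odds = math.log(model["prior"][1]) - math.log(model["prior"][0])
    for word in tokenize(doc):
        if word not in model["vocab"]:
            continue  # words never seen in training carry no evidence
        p1 = (model["counts"][1][word] + 1) / (model["totals"][1] + v)
        p0 = (model["counts"][0][word] + 1) / (model["totals"][0] + v)
        log_odds += math.log(p1) - math.log(p0)
    return log_odds


def pool_top_ranked(models, articles, k):
    """Union of the top-k article indices from every model's ranking."""
    pooled = set()
    for model in models:
        ranked = sorted(
            range(len(articles)),
            key=lambda i: score(model, articles[i]),
            reverse=True,
        )
        pooled.update(ranked[:k])
    return pooled


# Toy data (hypothetical, not from the study).
relevant = [
    "cholera outbreak reported in the region",
    "new influenza cases confirmed by health ministry",
]
unlabeled = [
    "stock markets rally after earnings",
    "local team wins football match",
]
model = train_nb(relevant, unlabeled)
articles = [
    "cholera outbreak confirmed",
    "football match earnings",
    "influenza cases",
]
pooled = pool_top_ranked([model], articles, k=2)
```

In this sketch only the pooled indices would be passed to a reviewer, which mirrors the idea of labeling a small, classifier-selected subset instead of the full article stream.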
RESULTS: Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836 for the naïve Bayes and SVM classifiers, respectively. We referenced a database of disease outbreak reports to confirm that the evaluation data set produced by the pooling method indeed covered the incidents recorded in the database during the evaluation period.
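As an illustration of the metric reported in RESULTS, the sketch below computes an AUC per day and then averages across days. It uses the rank-based (Mann-Whitney) formulation of AUC, with ties counted as one half; the abstract does not say how the AUCs were computed, and the function names and toy scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random relevant article outscores a random
    irrelevant one, counting ties as 0.5 (rank-based AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one article of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def daily_average_auc(days):
    """Mean of per-day AUCs; `days` is a list of (scores, labels) pairs."""
    per_day = [auc(scores, labels) for scores, labels in days]
    return sum(per_day) / len(per_day)


# Toy two-day evaluation (hypothetical scores and reviewer labels).
day1 = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
day2 = ([0.9, 0.2], [1, 0])
average = daily_average_auc([day1, day2])
```

Averaging per-day AUCs weights each day equally regardless of how many articles it contains, which differs from computing a single AUC over all articles pooled across days.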
CONCLUSIONS: The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages.
Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

Year:  2010        PMID: 21134784      PMCID: PMC3904285          DOI: 10.1016/j.ijmedinf.2010.10.015

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


