Classifying disease outbreak reports using n-grams and semantic features.

Mike Conway, Son Doan, Ai Kawazoe, Nigel Collier.

Abstract

INTRODUCTION: This paper explores the benefits of using n-grams and semantic features for the classification of disease outbreak reports, in the context of the BioCaster disease outbreak report text mining system. A novel feature of this work is the use of a general purpose semantic tagger - the USAS tagger - to generate features.
BACKGROUND: We outline the application context for this work (the BioCaster epidemiological text mining system), before going on to describe the experimental data used in our classification experiments (the 1000 document BioCaster corpus).
FEATURE SETS: Three broad groups of features are used in this work: Named Entity based features, n-gram features, and features derived from the USAS semantic tagger.
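
As an illustration of the n-gram feature group only, the following is a minimal sketch of extracting unigram, bigram and trigram counts from a text. The function names, the lowercasing and whitespace tokenisation, and the example sentence are assumptions made for this sketch; the abstract does not describe the tokenisation actually used in BioCaster.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count every n-gram from unigrams up to max_n-grams.

    Assumption: lowercased, whitespace-tokenised text (hypothetical choice).
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


# Invented example sentence, not taken from the BioCaster corpus.
features = ngram_features("Cholera outbreak reported in the region")
```

Each key of `features` (for example `"cholera outbreak"`) would serve as one dimension of the document's feature vector.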
METHODOLOGY: Three standard machine learning algorithms - Naïve Bayes, the Support Vector Machine algorithm, and the C4.5 decision tree algorithm - were used for classifying experimental data (that is, the BioCaster corpus). Feature selection was performed using the chi-squared (χ²) feature selection algorithm. Standard text classification performance metrics - Accuracy, Precision, Recall, Specificity and F-score - are reported.
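
To make the feature selection step concrete, here is a minimal sketch of the standard chi-squared statistic for a single feature and a binary class, computed from a 2x2 contingency table of document counts. The function names and the toy counts are hypothetical, and the abstract does not say how many features were retained, so `k` is purely an illustrative parameter.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 feature/class contingency table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class that lack the feature
    d: documents outside the class that lack the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_k_features(tables, k):
    """Return the k feature names with the highest chi-squared scores."""
    ranked = sorted(tables, key=lambda name: chi_squared(*tables[name]), reverse=True)
    return ranked[:k]


# Toy counts invented for illustration (not results from the paper).
toy_tables = {
    "outbreak": (10, 0, 0, 10),  # appears only in relevant documents
    "the": (5, 5, 5, 5),         # evenly spread, so uninformative
}
selected = top_k_features(toy_tables, k=1)
```

A feature that is distributed evenly across the two classes scores zero and is dropped first, which is the intuition behind ranking features by this statistic.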
RESULTS: A feature representation composed of unigrams, bigrams, trigrams and features derived from a semantic tagger, in conjunction with the Naïve Bayes algorithm and feature selection yielded the highest classification accuracy (and F-score). This result was statistically significant compared to a baseline unigram representation and to previous work on the same task. However, it was feature selection rather than semantic tagging that contributed most to the improved performance.
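
For reference, the five metrics named in the methodology (including the accuracy and F-score cited in the results) can all be derived from confusion matrix counts. The sketch below assumes a binary task and the balanced F-score (F1); the abstract does not state which F-score weighting was used, and the function name and example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""

    def ratio(numerator, denominator):
        # Return 0.0 rather than raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        # Assumption: balanced F-score (harmonic mean of precision and recall).
        "f_score": ratio(2 * precision * recall, precision + recall),
    }


# Invented counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
```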
CONCLUSION: This study has shown that for the classification of disease outbreak reports, a combination of bag-of-words, n-grams and semantic features, in conjunction with feature selection, increases classification accuracy at a statistically significant level compared to previous work in this domain.

Year:  2009        PMID: 19447070     DOI: 10.1016/j.ijmedinf.2009.03.010

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


