Taming variability in free text: application to health surveillance.

Abstract

INTRODUCTION: Use of free text in syndromic surveillance requires managing the substantial word variation that results from use of synonyms, abbreviations, acronyms, truncations, concatenations, misspellings, and typographic errors. Failure to detect these variations results in missed cases, and traditional methods for capturing these variations require ongoing, labor-intensive maintenance.
OBJECTIVES: This paper examines the problem of word variation in chief-complaint data and explores three semi-automated approaches for addressing it.
METHODS: Approximately 6 million chief complaints from patients reporting to emergency departments at 54 hospitals were analyzed. A method of text normalization that models the similarities between words was developed to manage the linguistic variability in chief complaints. Three approaches based on this method were investigated: 1) automated correction of spelling and typographical errors; 2) use of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes to select chief complaints to mine for overlooked vocabulary; and 3) identification of overlooked vocabulary by matching words that appeared in similar contexts.
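The first approach, automated correction of spelling and typographical errors, can be illustrated with a minimal sketch. This is not the similarity model developed in the paper, which the abstract does not specify; it swaps in the generic string matcher from Python's standard-library `difflib`. The vocabulary, the `cutoff` value, and the function names are assumptions made for illustration only.

```python
import difflib

# Hypothetical reference vocabulary; the paper's actual lexicon is not
# given in the abstract.
VOCABULARY = ["diarrhea", "nausea", "vomiting", "fever", "cough"]


def normalize_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Map a token to its closest vocabulary word, or return it unchanged.

    The cutoff is an assumed similarity threshold, not a value from the paper.
    """
    token = token.lower()
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize_complaint(text):
    """Normalize every whitespace-separated token in a chief complaint."""
    return " ".join(normalize_token(t) for t in text.split())
```

Under these assumptions, `normalize_complaint("Vomitting and diarhea")` maps both misspellings to their vocabulary forms and leaves "and" untouched. A production system would also need to handle the abbreviations, truncations, and concatenations that the introduction lists, which simple string similarity does not cover.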
RESULTS: The prevalence of word errors was high. For example, such words as diarrhea, nausea, and vomiting were misspelled 11.0%-18.8% of the time. Approximately 20% of all words were abbreviations or acronyms whose use varied substantially by site. Two methods, use of ICD-9-CM codes to focus searches and the automated pairing of words by context, both retrieved relevant but previously unexpected words. Text normalization simultaneously reduced the number of false positives and false negatives in syndrome classification, compared with commonly used methods based on word stems. In approximately 25% of instances, using text normalization to detect lower respiratory syndrome would have improved the sensitivity of current word-stem approaches by approximately 10%-20%.
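The automated pairing of words by context mentioned above can be sketched as follows. This is one common way to compare contexts (co-occurrence counts within a window, compared by cosine similarity) and is an assumption, not the procedure reported in the paper. The sample complaints, the window size, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(complaints, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for text in complaints:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical chief complaints, invented for illustration.
complaints = [
    "sob and cough",
    "dyspnea and cough",
    "sob and fever",
    "dyspnea and fever",
    "ankle pain",
]
vectors = context_vectors(complaints)
similar = cosine(vectors["sob"], vectors["dyspnea"])
unrelated = cosine(vectors["sob"], vectors["ankle"])
```

In this toy data, "sob" and "dyspnea" occur in identical contexts and score much higher than "sob" and "ankle", which share none. Word pairs that score highly would be candidates for manual review as overlooked vocabulary, not automatic synonyms.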
CONCLUSIONS: Incomplete vocabulary and word errors can have a substantial impact on the retrieval performance of free-text syndromic surveillance systems. The text normalization methods described in this paper can reduce the effects of these problems.

Year: 2004  PMID: 15714636

Source DB: PubMed  Journal: MMWR Suppl  ISSN: 2380-8942

Cited

10 in total

1. Timeliness of emergency department diagnoses for syndromic surveillance.

Authors: Debbie Travers; Clifton Barnett; Amy Ising; Anna Waller
Journal: AMIA Annu Symp Proc Date: 2006

2. Evaluation of a chief complaint pre-processor for biosurveillance.

Authors: Debbie Travers; Shiying Wu; Matthew Scholer; Matt Westlake; Anna Waller; Anne-Lyne McCalla
Journal: AMIA Annu Symp Proc Date: 2007-10-11

3. Identification of misspelled words without a comprehensive dictionary using prevalence analysis.

Authors: Alexander Turchin; Julia T Chu; Maria Shubina; Jonathan S Einbinder
Journal: AMIA Annu Symp Proc Date: 2007-10-11

4. Using chief complaints for syndromic surveillance: a review of chief complaint based classifiers in North America. (Review)

Authors: Mike Conway; John N Dowling; Wendy W Chapman
Journal: J Biomed Inform Date: 2013-04-17 Impact factor: 6.317

5. Chief complaint-based performance measures: a new focus for acute care quality measurement.

Authors: Richard T Griffey; Jesse M Pines; Heather L Farley; Michael P Phelan; Christopher Beach; Jeremiah D Schuur; Arjun K Venkatesh
Journal: Ann Emerg Med Date: 2014-10-16 Impact factor: 5.721

6. A UMLS-based spell checker for natural language processing in vaccine safety.

Authors: Herman D Tolentino; Michael D Matters; Wikke Walop; Barbara Law; Wesley Tong; Fang Liu; Paul Fontelo; Katrin Kohl; Daniel C Payne
Journal: BMC Med Inform Decis Mak Date: 2007-02-12 Impact factor: 2.796

7. Injury narrative text classification using factorization model.

Authors: Lin Chen; Kirsten Vallmuur; Richi Nayak
Journal: BMC Med Inform Decis Mak Date: 2015-05-20 Impact factor: 2.796

8. Using Syndromic Surveillance to Investigate Tattoo-Related Skin Infections in New York City.

Authors: Mollie Kotzen; Jessica Sell; Robert W Mathes; Catherine Dentinger; Lillian Lee; Corinne Schiff; Don Weiss
Journal: PLoS One Date: 2015-06-15 Impact factor: 3.240

9. Using n-Grams for Syndromic Surveillance in a Turkish Emergency Department Without English Translation: A Feasibility Study.

Authors: Sylvia Halász; Philip Brown; Cem Oktay; Arif Alper Cevik; Isa Kılıçaslan; Colin Goodall; Dennis G Cochrane; Thomas R Fowler; Guy Jacobson; Simon Tse; John R Allegra
Journal: Biomed Inform Insights Date: 2013-04-25

10. Innovative uses for syndromic surveillance.

Authors: Erin K O'Connell; Guoyan Zhang; Fermin Leguen; Anthoni Llau; Edhelene Rico
Journal: Emerg Infect Dis Date: 2010-04 Impact factor: 6.883
