Taming variability in free text: application to health surveillance.

Alan R Shapiro

Abstract

INTRODUCTION: Use of free text in syndromic surveillance requires managing the substantial word variation that results from use of synonyms, abbreviations, acronyms, truncations, concatenations, misspellings, and typographic errors. Failure to detect these variations results in missed cases, and traditional methods for capturing these variations require ongoing, labor-intensive maintenance.
OBJECTIVES: This paper examines the problem of word variation in chief-complaint data and explores three semi-automated approaches for addressing it.
METHODS: Approximately 6 million chief complaints from patients reporting to emergency departments at 54 hospitals were analyzed. A method of text normalization that models the similarities between words was developed to manage the linguistic variability in chief complaints. Three approaches based on this method were investigated: 1) automated correction of spelling and typographical errors; 2) use of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes to select chief complaints to mine for overlooked vocabulary; and 3) identification of overlooked vocabulary by matching words that appeared in similar contexts.
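A minimal sketch of approach 1 (automated correction of spelling and typographical errors) follows. The abstract does not say how word similarity is modeled, so this assumes a plain edit-distance model. It uses optimal string alignment distance (insertions, deletions, substitutions, and adjacent transpositions each cost 1) against a reference vocabulary, and corpus frequency breaks ties. The names `osa_distance` and `normalize_token`, the length-dependent threshold, and the toy vocabulary counts are all hypothetical.

```python
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertion, deletion, substitution,
    and transposition of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalize_token(token: str, vocabulary: Counter, max_distance: int = 2) -> str:
    """Map a token to the closest vocabulary word within the threshold.

    Hypothetical choices: short tokens (<= 4 chars) only tolerate one edit,
    and among equally close candidates the more frequent word wins.
    Tokens with no close match are returned lowercased but otherwise unchanged.
    """
    token = token.lower()
    if token in vocabulary:
        return token
    limit = 1 if len(token) <= 4 else max_distance
    best, best_key = None, None
    for word, freq in vocabulary.items():
        dist = osa_distance(token, word)
        if dist > limit:
            continue
        key = (dist, -freq)  # smaller distance first, then higher frequency
        if best_key is None or key < best_key:
            best, best_key = word, key
    return best if best is not None else token


# Toy vocabulary with made-up frequencies (not study data).
vocab = Counter({"diarrhea": 500, "nausea": 800, "vomiting": 900,
                 "fever": 1200, "cough": 1000})
complaint = "Nasuea and vomitting x2 days"
print([normalize_token(t, vocab) for t in complaint.split()])
# → ['nausea', 'and', 'vomiting', 'x2', 'days']
```

Edit distance alone repairs misspellings and typographic errors but not abbreviations or acronyms such as "abd" or "sob", which sit far from their expansions in character space. Approaches 2 and 3 target that kind of overlooked vocabulary.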
RESULTS: The prevalence of word errors was high. For example, the words diarrhea, nausea, and vomiting were misspelled 11.0%-18.8% of the time. Approximately 20% of all words were abbreviations or acronyms whose use varied substantially by site. Two methods, using ICD-9-CM codes to focus searches and automatically pairing words by context, both retrieved relevant but previously unexpected words. Compared with commonly used methods based on word stems, text normalization simultaneously reduced the number of false positives and false negatives in syndrome classification. In approximately 25% of instances, using text normalization to detect lower respiratory syndrome would have improved the sensitivity of current word-stem approaches by approximately 10%-20%.
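Approach 3, matching words that appear in similar contexts, can be sketched with distributional similarity. This sketch assumes each word is represented by counts of its immediate neighbors across chief complaints, and that candidate pairs are ranked by cosine similarity of those count vectors. The abstract does not describe the paper's actual context model. The function names and the complaints below are hypothetical and invented for illustration, not drawn from the study data.

```python
import math
from collections import Counter, defaultdict


def context_vectors(complaints, window=1):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for text in complaints:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def similar_words(target, vectors, top_n=3):
    """Rank other words by context similarity to `target`; drop zero scores."""
    target_vec = vectors.get(target, Counter())  # .get avoids inserting a key
    scores = [(cosine(target_vec, vec), word)
              for word, vec in vectors.items() if word != target]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by word
    return [word for score, word in scores[:top_n] if score > 0]


# Invented chief complaints for illustration only.
complaints = [
    "worsening sob on exertion",
    "worsening dyspnea on exertion",
    "fever and cough",
    "abd pain and vomiting",
]
vectors = context_vectors(complaints)
print(similar_words("sob", vectors))  # → ['dyspnea', 'exertion']
```

On this toy input "dyspnea" ranks first for "sob", but "exertion" also scores above zero. Such pairings are therefore best treated as candidates for human review rather than automatic synonyms, in keeping with the semi-automated framing of the methods.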
CONCLUSIONS: Incomplete vocabulary and word errors can have a substantial impact on the retrieval performance of free-text syndromic surveillance systems. The text normalization methods described in this paper can reduce the effects of these problems.

Year:  2004        PMID: 15714636

Source DB:  PubMed          Journal:  MMWR Suppl        ISSN: 2380-8942