Identification of misspelled words without a comprehensive dictionary using prevalence analysis.

Alexander Turchin, Julia T Chu, Maria Shubina, Jonathan S Einbinder.

Abstract

Misspellings are common in medical documents and can be an obstacle to information retrieval. We evaluated an algorithm that identifies misspelled words by analyzing their prevalence in a representative body of text. We assessed the algorithm's accuracy in identifying misspellings of 200 anti-hypertensive medication names among 2,000 potentially misspelled words randomly selected from narrative medical documents. For each word, the software computed the prevalence ratio (the frequency of the potentially misspelled word divided by the frequency of the non-misspelled word) in physician notes. The software results were compared with manual assessment by an independent reviewer. The area under the ROC curve for identification of misspelled words was 0.96. At the prevalence ratio threshold with the highest F-measure (threshold 0.32768, F-measure 0.903), sensitivity, specificity, and positive predictive value were 99.25%, 89.72%, and 82.9%, respectively. Prevalence analysis can be used to identify and correct misspellings with high accuracy.
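
The prevalence ratio described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software. The word counts are made-up toy data, the function names are hypothetical, and the rule that a ratio at or below the threshold flags a misspelling is an assumption, since the abstract does not state the direction of the comparison or how candidate words are paired with medication names. The last function uses the standard definition of the F-measure as the harmonic mean of positive predictive value and sensitivity, which is consistent with the figures reported above.

```python
from collections import Counter

# Threshold reported in the abstract as giving the highest F-measure.
THRESHOLD = 0.32768


def prevalence_ratio(candidate, target, counts):
    """Frequency of the candidate word divided by the frequency of the target word."""
    target_count = counts[target]
    if target_count == 0:
        raise ValueError(f"target word {target!r} does not occur in the corpus")
    return counts[candidate] / target_count


def is_likely_misspelling(candidate, target, counts, threshold=THRESHOLD):
    # Assumption: a ratio at or below the threshold marks the candidate
    # as a misspelling of the target; the abstract does not spell this out.
    return prevalence_ratio(candidate, target, counts) <= threshold


def f_measure(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical word counts from a body of physician notes (toy data).
counts = Counter({"lisinopril": 400, "lisinipril": 3, "losartan": 250})

# A rare variant of a common word has a low ratio and is flagged.
print(is_likely_misspelling("lisinipril", "lisinopril", counts))
# A word that is common in its own right has a high ratio and is not flagged.
print(is_likely_misspelling("losartan", "lisinopril", counts))
# PPV 82.9% and sensitivity 99.25% combine to the reported F-measure.
print(round(f_measure(0.829, 0.9925), 3))
```

In this toy data the rare variant "lisinipril" has a ratio of 3/400 and is flagged, while "losartan" is frequent in its own right, has a ratio of 250/400, and is not.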

Year:  2007        PMID: 18693937      PMCID: PMC2813663     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  2 in total

1.  Spell checker for consumer language (CSpell).

Authors:  Chris J Lu; Alan R Aronson; Sonya E Shooshan; Dina Demner-Fushman
Journal:  J Am Med Inform Assoc       Date:  2019-03-01       Impact factor: 4.497

2.  Automated Misspelling Detection and Correction in Persian Clinical Text.

Authors:  Azita Yazdani; Marjan Ghazisaeedi; Nasrin Ahmadinejad; Masoumeh Giti; Habibe Amjadi; Azin Nahvijou
Journal:  J Digit Imaging       Date:  2020-06       Impact factor: 4.056
