
Spell checker for consumer language (CSpell).

Chris J Lu, Alan R Aronson, Sonya E Shooshan, Dina Demner-Fushman.

Abstract

Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.
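A word boundary infraction can be a missing space (for example "highbloodpressure") or an extra one. The abstract does not describe how CSpell's splitters work, so the following is only a minimal, hypothetical sketch of a dictionary-driven splitter for the missing-space case. It returns the segmentation with the fewest dictionary words. The lexicon, the function name `split_merged_token`, and the `max_parts` limit are illustrative assumptions, not part of CSpell.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical toy lexicon; a real checker would load a large dictionary.
LEXICON: FrozenSet[str] = frozenset(
    {"high", "blood", "pressure", "type", "two", "diabetes", "low"}
)


def split_merged_token(token: str, lexicon: FrozenSet[str] = LEXICON,
                       max_parts: int = 3) -> Optional[List[str]]:
    """Split a token with missing spaces into dictionary words.

    Returns the segmentation with the fewest words, or None when the token
    is already a known word, cannot be segmented, or needs > max_parts words.
    """
    token = token.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Fewest-word segmentation of token[start:], or None if impossible.
        if start == len(token):
            return ()
        result: Optional[Tuple[str, ...]] = None
        for end in range(start + 1, len(token) + 1):
            word = token[start:end]
            if word in lexicon:
                rest = best(end)
                if rest is not None and (result is None or 1 + len(rest) < len(result)):
                    result = (word,) + rest
        return result

    parts = best(0)
    if parts is None or not 2 <= len(parts) <= max_parts:
        return None
    return list(parts)


print(split_merged_token("highbloodpressure"))  # ['high', 'blood', 'pressure']
print(split_merged_token("blood"))              # None (already a word)
```

A practical splitter would also have to choose among competing valid splits (for example with word frequencies or context) and handle the opposite case of a spurious space; this sketch ignores both.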
Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions.
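The abstract does not give the scoring details of the 2-stage ranking. As a hedged sketch, assume stage 1 shortlists dictionary words by Levenshtein edit distance and stage 2 reranks the shortlist with a dual-embedding context score: the average of the Word2vec input (IN) vectors of the context words, dotted with the output (OUT) vector of the candidate and passed through a sigmoid, analogous to how CBOW predicts a center word from its context. The function names, the `top_k` cutoff, the edit-distance stage, and the toy vectors are illustrative assumptions, not CSpell's actual code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def dual_embedding_score(context: Sequence[str], candidate: str,
                         in_vecs: Dict[str, Vector],
                         out_vecs: Dict[str, Vector]) -> float:
    """sigmoid(mean IN vector of the context words . OUT vector of candidate)."""
    known = [in_vecs[w] for w in context if w in in_vecs]
    if not known or candidate not in out_vecs:
        return 0.0
    ctx = [sum(col) / len(known) for col in zip(*known)]
    dot = sum(c * o for c, o in zip(ctx, out_vecs[candidate]))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidates(misspelling: str, context: Sequence[str],
                    dictionary: Sequence[str],
                    in_vecs: Dict[str, Vector], out_vecs: Dict[str, Vector],
                    top_k: int = 3) -> List[str]:
    # Stage 1: shortlist the top_k dictionary words by edit distance
    # (ties broken alphabetically).
    shortlist = sorted(dictionary,
                       key=lambda w: (levenshtein(misspelling, w), w))[:top_k]
    # Stage 2: rerank the shortlist by context fit; sorted() is stable even
    # with reverse=True, so equal context scores keep their stage-1 order.
    return sorted(shortlist,
                  key=lambda w: dual_embedding_score(context, w, in_vecs, out_vecs),
                  reverse=True)


# Toy 2-d vectors, invented for illustration only.
IN = {"my": [0.1, 0.0], "hurts": [1.0, 2.0]}
OUT = {"heart": [1.0, 1.0], "hard": [-1.0, 0.5], "hat": [0.0, -1.0], "part": [0.5, -0.5]}
print(rank_candidates("hart", ["my", "hurts"], ["hard", "hat", "heart", "part"], IN, OUT))
# ['heart', 'hard', 'hat']
```

In the toy run every candidate is one edit from "hart", so stage 1 alone would return "hard" first on the alphabetical tie-break; the context score in stage 2 promotes "heart". The sigmoid does not change the ranking (it is monotonic) but keeps scores in (0, 1).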
Results: Our approach achieves F1 scores of 80.93% and 69.17% for spelling error detection and correction, respectively.
Discussion: The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system.
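To make the contrast in the Discussion concrete, here is a hedged sketch of the two context-scoring formulas. It assumes the cosine baseline compares the averaged IN vectors of the context words with the candidate's IN vector, while the dual-embedding score uses the candidate's OUT vector instead. The vectors are contrived so that the two formulas disagree; they say nothing about which performs better, which is what the reported F1 comparison measures.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def cosine_score(context: Sequence[str], cand: str,
                 in_vecs: Dict[str, Vector]) -> float:
    # Baseline: context and candidate are both represented by IN vectors.
    return cosine(mean_vector([in_vecs[w] for w in context]), in_vecs[cand])


def dual_score(context: Sequence[str], cand: str,
               in_vecs: Dict[str, Vector], out_vecs: Dict[str, Vector]) -> float:
    # Dual embedding: IN vectors for the context, OUT vector for the candidate.
    ctx = mean_vector([in_vecs[w] for w in context])
    return 1.0 / (1.0 + math.exp(-dot(ctx, out_vecs[cand])))


# Contrived 2-d vectors, chosen only so that the two scores disagree.
IN = {"chest": [1.0, 0.0], "pain": [1.0, 0.0], "heart": [0.0, 1.0], "hard": [1.0, 0.1]}
OUT = {"heart": [2.0, 0.0], "hard": [-1.0, 0.0]}
context, candidates = ["chest", "pain"], ["heart", "hard"]

print(max(candidates, key=lambda c: cosine_score(context, c, IN)))     # hard
print(max(candidates, key=lambda c: dual_score(context, c, IN, OUT)))  # heart
```

Here "hard" wins under cosine because its IN vector points the same way as the context's IN vectors, while "heart" wins under the dual-embedding score because its OUT vector does.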
Conclusion: CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.

Year: 2019; PMID: 30668712; PMCID: PMC6351975; DOI: 10.1093/jamia/ocy171

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497


Cited by (3 in total)

1. The journey to transparency, reproducibility, and replicability.

Authors: Suzanne Bakken
Journal: J Am Med Inform Assoc; Date: 2019-03-01; Impact factor: 4.497

2. Automatic classification of scanned electronic health record documents.

Authors: Heath Goodrum; Kirk Roberts; Elmer V Bernstam
Journal: Int J Med Inform; Date: 2020-10-17; Impact factor: 4.046

3. Consumer health information and question answering: helping consumers find answers to their health-related information needs.

Authors: Dina Demner-Fushman; Yassine Mrabet; Asma Ben Abacha
Journal: J Am Med Inform Assoc; Date: 2020-02-01; Impact factor: 4.497
