Creating an online dictionary of abbreviations from MEDLINE.

Jeffrey T Chang, Hinrich Schütze, Russ B Altman.

Abstract

OBJECTIVE: The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbreviations, we have developed an algorithm to match abbreviations in text with their expansions.
DESIGN: Our method uses a statistical learning algorithm, logistic regression, to score abbreviation expansions based on their resemblance to a training set of human-annotated abbreviations. We applied it to Medstract, a corpus of MEDLINE abstracts in which abbreviations and their expansions have been manually annotated. We then ran the algorithm on all abstracts in MEDLINE, creating a dictionary of biomedical abbreviations. To test the coverage of the database, we used an independently created list of abbreviations from the China Medical Tribune.
MEASUREMENTS: We measured the recall and precision of the algorithm in identifying abbreviations from the Medstract corpus. We also measured the recall when searching for abbreviations from the China Medical Tribune against the database.
RESULTS: On the Medstract corpus, our algorithm achieves up to 83% recall at 80% precision. Applying the algorithm to all of MEDLINE yielded a database of 781,632 high-scoring abbreviations. Of all the abbreviations in the list from the China Medical Tribune, 88% were in the database.
CONCLUSION: We have developed an algorithm to identify abbreviations from text. We are making this available as a public abbreviation server at http://abbreviation.stanford.edu/.
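
The scoring and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two features, the function names, and the hand-set values of BIAS and WEIGHTS are assumptions chosen for illustration, whereas the paper learns its weights from human-annotated training data. The sketch only shows how a logistic model turns features of a candidate (abbreviation, expansion) pair into a score, and how precision and recall are computed against an annotated gold standard.

```python
import math

# Hypothetical, hand-set parameters for illustration only; a real system
# would learn these from a human-annotated training set.
BIAS = -4.0
WEIGHTS = [3.0, 4.0]


def in_order_fraction(letters, sequence):
    """Fraction of `letters` found in order (greedily) within `sequence`."""
    matched = 0
    for item in sequence:
        if matched < len(letters) and item == letters[matched]:
            matched += 1
    return matched / len(letters) if letters else 0.0


def features(abbr, expansion):
    """Two illustrative features of a candidate (abbreviation, expansion) pair."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    initials = [word[0].lower() for word in expansion.split()]
    chars = list(expansion.lower())
    return [
        in_order_fraction(letters, initials),  # letters matched by word initials
        in_order_fraction(letters, chars),     # letters matched anywhere, in order
    ]


def score(abbr, expansion):
    """Logistic score in (0, 1) for a candidate pair."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(abbr, expansion)))
    return 1.0 / (1.0 + math.exp(-z))


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against annotated pairs."""
    correct = len(set(predicted) & set(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    candidates = [
        ("PCR", "polymerase chain reaction"),
        ("DNA", "deoxyribonucleic acid"),
        ("PCR", "was amplified by"),
    ]
    for abbr, expansion in candidates:
        print(f"{abbr} -> {expansion}: {score(abbr, expansion):.2f}")
```

Keeping only candidates whose score exceeds a chosen cutoff and passing them to `precision_recall` with the annotated pairs gives one point on the recall/precision trade-off; raising the cutoff generally favors precision over recall.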

Year:  2002        PMID: 12386112      PMCID: PMC349378          DOI: 10.1197/jamia.m1139

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  37 in total

1.  A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors:  Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal:  J Am Med Inform Assoc       Date:  2004-02-05       Impact factor: 4.497

2.  Neuroanatomical term generation and comparison between two terminologies.

Authors:  Prashanti R Srinivas; Daniel Gusfield; Oliver Mason; Michael Gertz; Michael Hogarth; James Stone; Edward G Jones; Fredric A Gorin
Journal:  Neuroinformatics       Date:  2003

3.  Using UMLS lexical resources to disambiguate abbreviations in clinical text.

Authors:  Youngjun Kim; John Hurdle; Stéphane M Meystre
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

4.  Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

Authors:  A M Cohen; W R Hersh; C Dubay; K Spackman
Journal:  BMC Bioinformatics       Date:  2005-04-22       Impact factor: 3.169

5.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

6.  A system for automated lexical mapping.

Authors:  Jennifer Y Sun; Yao Sun
Journal:  J Am Med Inform Assoc       Date:  2006-02-24       Impact factor: 4.497

7.  Biomedical language processing: what's beyond PubMed? (Review)

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

8.  A fault model for ontology mapping, alignment, and linking systems.

Authors:  Helen L Johnson; K Bretonnel Cohen; Lawrence Hunter
Journal:  Pac Symp Biocomput       Date:  2007

9.  Enhancing acronym/abbreviation knowledge bases with semantic information.

Authors:  Manabu Torii; Hongfang Liu
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11

10.  A study of abbreviations in clinical notes.

Authors:  Hua Xu; Peter D Stetson; Carol Friedman
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11