Synonym set extraction from the biomedical literature by lexical pattern discovery.

John McCrae, Nigel Collier.

Abstract

BACKGROUND: Although there are a large number of thesauri for the biomedical domain, many of them lack coverage of terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst [1], but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular, it is not certain which patterns are useful for capturing synonymy. The assumption of extant resources such as parsers is also a limiting factor for many languages, so it is desirable to find patterns that do not use syntactical analysis. Finally, to give a more consistent and applicable result, it is desirable to use these patterns to form synonym sets in a sound way.
RESULTS: We present a method that automatically generates regular expression patterns by expanding seed patterns in a heuristic search and then develops a feature vector based on the occurrence of term pairs in each developed pattern. This allows for a binary classification of term pairs as synonymous or non-synonymous. We then model this result as a probability graph to find synonym sets, which is equivalent to the well-studied problem of finding an optimal set cover. Our method achieved 73.2% precision and 29.7% recall, outperforming hand-made resources such as MeSH and Wikipedia.
CONCLUSION: We conclude that automatic methods can play a practical role in developing new thesauri or expanding on existing ones, and that this can be done with only a small amount of training data and no need for resources such as parsers. We also conclude that the accuracy can be improved by grouping into synonym sets.
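
As a rough illustration of the feature-vector step described under RESULTS, the sketch below counts how often a term pair occurs in each of a few regular expression patterns. It is a minimal sketch, not the authors' implementation: the pattern templates, the example sentences, and the function name pair_features are all hypothetical, and the heuristic pattern search, the classifier, and the set-cover grouping are not shown.

```python
import re

# Hypothetical pattern templates (not taken from the paper).
# "{a}" and "{b}" are placeholders for the two candidate terms.
PATTERN_TEMPLATES = [
    r"{a} \(also known as {b}\)",
    r"{a},? also called {b}",
    r"{a} \({b}\)",
    r"{a} and {b}",
]


def pair_features(term_a, term_b, sentences, templates=PATTERN_TEMPLATES):
    """Return one count per pattern: the number of sentences it matches
    when instantiated with this term pair."""
    features = []
    for template in templates:
        # Escape the terms so they are matched literally inside the pattern.
        regex = re.compile(
            template.format(a=re.escape(term_a), b=re.escape(term_b)),
            flags=re.IGNORECASE,
        )
        features.append(sum(1 for s in sentences if regex.search(s)))
    return features


# Made-up example sentences, for illustration only.
sentences = [
    "Myocardial infarction (also known as heart attack) is common.",
    "Patients with myocardial infarction, also called heart attack, were enrolled.",
    "Aspirin and ibuprofen were compared.",
]

synonym_like = pair_features("myocardial infarction", "heart attack", sentences)
coordinated = pair_features("aspirin", "ibuprofen", sentences)
print(synonym_like)  # [1, 1, 0, 0]
print(coordinated)   # [0, 0, 0, 1]
```

Vectors of this kind could then be passed to any binary classifier trained on a small set of known synonymous and non-synonymous pairs; which classifier is used, and how its output is turned into synonym sets, is left open here.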

Year:  2008        PMID: 18366721      PMCID: PMC2335115          DOI: 10.1186/1471-2105-9-159

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  3 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.

Authors:  Hong Yu; Vasileios Hatzivassiloglou; Carol Friedman; Andrey Rzhetsky; W John Wilbur
Journal:  Proc AMIA Symp       Date:  2002

3.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors:  A Bairoch; R Apweiler
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

  8 in total

1.  Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives.

Authors:  Prateek Jindal; Dan Roth
Journal:  J Am Med Inform Assoc       Date:  2012-07-10       Impact factor: 4.497

2.  Word add-in for ontology recognition: semantic enrichment of scientific literature.

Authors:  J Lynn Fink; Pablo Fernicola; Rahul Chandran; Savas Parastatidis; Alex Wade; Oscar Naim; Gregory B Quinn; Philip E Bourne
Journal:  BMC Bioinformatics       Date:  2010-02-24       Impact factor: 3.169

3.  Using a search engine-based mutually reinforcing approach to assess the semantic relatedness of biomedical terms.

Authors:  Yi-Yu Hsu; Hung-Yu Chen; Hung-Yu Kao
Journal:  PLoS One       Date:  2013-11-13       Impact factor: 3.240

4.  Quantifying the impact and extent of undocumented biomedical synonymy.

Authors:  David R Blair; Kanix Wang; Svetlozar Nestorov; James A Evans; Andrey Rzhetsky
Journal:  PLoS Comput Biol       Date:  2014-09-25       Impact factor: 4.475

5.  Expansion of medical vocabularies using distributional semantics on Japanese patient blogs.

Authors:  Magnus Ahltorp; Maria Skeppstedt; Shiho Kitajima; Aron Henriksson; Rafal Rzepka; Kenji Araki
Journal:  J Biomed Semantics       Date:  2016-09-26

6.  Learning unsupervised contextual representations for medical synonym discovery.

Authors:  Elliot Schumacher; Mark Dredze
Journal:  JAMIA Open       Date:  2019-11-04

7.  Challenges for automatically extracting molecular interactions from full-text articles.

Authors:  Tara McIntosh; James R Curran
Journal:  BMC Bioinformatics       Date:  2009-09-24       Impact factor: 3.169

8.  Synonym extraction and abbreviation expansion with ensembles of semantic spaces.

Authors:  Aron Henriksson; Hans Moen; Maria Skeppstedt; Vidas Daudaravičius; Martin Duneld
Journal:  J Biomed Semantics       Date:  2014-02-05
