Better synonyms for enriching biomedical search.

Lana Yeganova, Sun Kim, Qingyu Chen, Grigory Balasanov, W John Wilbur, Zhiyong Lu.

Abstract

OBJECTIVE: In a biomedical literature search, the link between a query and a document is often not established, because they use different terms to refer to the same concept. Distributional word embeddings are frequently used for detecting related words by computing the cosine similarity between them. However, previous research has not established either the best embedding methods for detecting synonyms among related word pairs or how effective such methods may be.
MATERIALS AND METHODS: In this study, we first create the BioSearchSyn set, a manually annotated set of synonyms, to assess and compare 3 widely used word-embedding methods (word2vec, fastText, and GloVe) in their ability to detect synonyms among related pairs of words. We demonstrate the shortcomings of the cosine similarity score between word embeddings for this task: the same scores have very different meanings for the different methods. To address the problem, we propose utilizing pool adjacent violators (PAV), an isotonic regression algorithm, to transform a cosine similarity into a probability of 2 words being synonyms.
RESULTS: Experimental results using the BioSearchSyn set as a gold standard reveal which embedding methods have the best performance in identifying synonym pairs. The BioSearchSyn set also allows converting cosine similarity scores into probabilities, which provides a uniform interpretation of the synonymy score over different methods.
CONCLUSIONS: We introduced the BioSearchSyn corpus of 1000 term pairs, which allowed us to identify the best embedding method for detecting synonymy for biomedical search. Using the proposed method, we created PubTermVariants2.0: a large, automatically extracted set of synonym pairs that have augmented PubMed searches since the spring of 2019.

Published by Oxford University Press on behalf of the American Medical Informatics Association 2020. This work is written by a US Government employee and is in the public domain in the US.
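The calibration step described in the abstract (a cosine similarity between two word embeddings, mapped to a probability of synonymy with pool adjacent violators) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity`, `pav_fit`, and `pav_predict` are hypothetical, the scores and labels are invented toy values rather than BioSearchSyn data, and pooling of tied scores and step-function lookup are assumptions made for the illustration.

```python
import bisect
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pav_fit(scores, labels):
    """Fit a non-decreasing step function from score to P(synonym).

    scores: cosine similarities of annotated word pairs (toy data here).
    labels: 1 if the pair was annotated as synonyms, else 0.
    Returns a list of (upper_score, probability) steps in ascending order.
    """
    # Pool pairs that share exactly the same score before running PAV.
    grouped = {}
    for score, label in zip(scores, labels):
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)

    # Each block holds [sum of labels, number of pairs, highest score].
    blocks = []
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, score])
        # Merge backwards while the previous block has a higher mean,
        # which would violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            merged_total, merged_count, top = blocks.pop()
            blocks[-1][0] += merged_total
            blocks[-1][1] += merged_count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def pav_predict(model, score):
    """Look up the calibrated probability for a new cosine similarity."""
    tops = [top for top, _ in model]
    index = bisect.bisect_left(tops, score)
    # Scores above every fitted step reuse the last (highest) step.
    return model[min(index, len(model) - 1)][1]


# Toy annotated pairs (invented values, for illustration only).
toy_scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
toy_model = pav_fit(toy_scores, toy_labels)
```

Fitting one such model per embedding method would put scores from different methods on a common probability scale, which is the uniform interpretation the abstract refers to. With the toy data above, `pav_predict(toy_model, 0.55)` falls in the pooled block covering the scores 0.4 to 0.6, where one of three annotated pairs is a synonym.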

Year: 2020   PMID: 33083825   PMCID: PMC7727334   DOI: 10.1093/jamia/ocaa151

Source DB: PubMed   Journal: J Am Med Inform Assoc   ISSN: 1067-5027   Impact factor: 4.497


