Semantic similarity in the biomedical domain: an evaluation across knowledge sources.

Vijay N Garla, Cynthia Brandt.

Abstract

BACKGROUND: Semantic similarity measures estimate the similarity between concepts and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge-based and distributional methods. Knowledge-based approaches use knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path-finding measures and intrinsic information content (IC) measures. Distributional measures use, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge-based path-finding methods, but more recent studies suggested that intrinsic IC based measures exceed the accuracy of distributional approaches. Previous evaluations of similarity measures in the biomedical domain are limited by their focus on the SNOMED CT ontology and their reliance on small benchmarks not powered to detect significant differences in measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources, such as the UMLS, or on larger, recently developed semantic similarity benchmarks.
RESULTS: We evaluated knowledge-based and corpus IC based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results, and all tools developed as part of this study, as open source at http://code.google.com/p/ytex. We also provide a publicly accessible web service to compute semantic similarity at http://informatics.med.yale.edu/ytex.web/.
CONCLUSIONS: Knowledge-based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge-based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should use benchmarks powered to detect significant differences in measure accuracy.
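
As a rough illustration of the two knowledge-based families named in the abstract, the sketch below computes a path-finding measure and an intrinsic IC measure on a tiny, made-up taxonomy. Everything in it is an assumption for illustration: the toy concepts, the function names, the choice of the Seco et al. intrinsic IC formula, and the Lin similarity built on it. It is not the authors' implementation and does not use SNOMED CT, MeSH, or the UMLS.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from any real knowledge source.
PARENT = {
    "cardiovascular_disease": "disease",
    "respiratory_disease": "disease",
    "myocardial_infarction": "cardiovascular_disease",
    "heart_failure": "cardiovascular_disease",
    "asthma": "respiratory_disease",
    "pneumonia": "respiratory_disease",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    ancestors_of_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in ancestors_of_b)


def path_similarity(a, b):
    """Path-finding measure: inverse of the number of nodes on the shortest path."""
    common = lcs(a, b)
    edges = ancestors(a).index(common) + ancestors(b).index(common)
    return 1.0 / (edges + 1)


def descendant_count(concept):
    """Number of concepts strictly below `concept` in the taxonomy."""
    return sum(1 for d in CONCEPTS if d != concept and concept in ancestors(d))


def intrinsic_ic(concept):
    """Intrinsic IC in the style of Seco et al.: derived from the taxonomy alone, no corpus."""
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(CONCEPTS))


def lin_similarity(a, b):
    """Lin similarity using intrinsic IC; defined as 0.0 when both concepts have zero IC."""
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * intrinsic_ic(lcs(a, b)) / denom if denom else 0.0


if __name__ == "__main__":
    for pair in [("myocardial_infarction", "heart_failure"),
                 ("myocardial_infarction", "asthma")]:
        print(pair, path_similarity(*pair), lin_similarity(*pair))
```

In this toy example both measures rank the two cardiovascular concepts as more similar than the cardiovascular and respiratory pair. The intrinsic IC measure additionally reflects how specific the shared ancestor is, and neither measure needs an external corpus.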

Year:  2012        PMID: 23046094      PMCID: PMC3533586          DOI: 10.1186/1471-2105-13-261

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169

