Vector representations of multi-word terms for semantic relatedness.

Sam Henry, Clint Cuffy, Bridget T McInnes

Abstract

This paper compares several methods for aggregating distributional context vectors into multi-word term representations, applied to the task of semantic similarity and relatedness in the biomedical domain. We compare four multi-word term aggregation methods: summation of component word vectors, mean of component word vectors, direct construction of compound term vectors using the compoundify tool, and direct construction of concept vectors using the MetaMap tool. Dimensionality reduction is critical when constructing high-quality distributional context vectors, so these baseline co-occurrence vectors are compared against dimensionality-reduced vectors created using singular value decomposition (SVD) and against word2vec word embeddings trained with the continuous bag of words (CBOW) and skip-gram models. We also identify optimal vector dimensionalities for the vectors produced by these techniques. Our results show that none of the tested multi-word term aggregation methods is statistically significantly better than any other. This allows flexibility when choosing a multi-word term aggregation method and means that expensive corpus preprocessing may be avoided. Results are reported on several standard evaluation datasets, and state-of-the-art results are achieved.
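The two simplest aggregation methods named in the abstract can be illustrated with a minimal sketch: a multi-word term vector is built by summing or averaging its component word vectors, and two terms are then compared with cosine similarity. The toy vectors, the function names, the whitespace tokenization, and the decision to skip out-of-vocabulary words are all assumptions made for illustration; this is not the authors' implementation, and the compoundify and MetaMap approaches, which build term or concept vectors directly from a preprocessed corpus, are not shown.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, for illustration only. Real vectors
# would come from co-occurrence counts, SVD-reduced co-occurrence matrices,
# or word2vec (CBOW or skip-gram) embeddings.
WORD_VECTORS: Dict[str, List[float]] = {
    "heart": [0.9, 0.1, 0.3],
    "attack": [0.2, 0.8, 0.1],
    "myocardial": [0.8, 0.2, 0.4],
    "infarction": [0.3, 0.7, 0.2],
}


def component_vectors(term: str, vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Look up the vector of each whitespace-separated word in the term.

    Out-of-vocabulary words are skipped (an assumption of this sketch).
    """
    found = [vectors[w] for w in term.lower().split() if w in vectors]
    if not found:
        raise ValueError(f"no component word of {term!r} has a vector")
    return found


def sum_vector(term: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Multi-word term vector as the element-wise sum of its word vectors."""
    return [sum(column) for column in zip(*component_vectors(term, vectors))]


def mean_vector(term: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Multi-word term vector as the element-wise mean of its word vectors."""
    parts = component_vectors(term, vectors)
    return [sum(column) / len(parts) for column in zip(*parts)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    t1, t2 = "heart attack", "myocardial infarction"
    print("sum: ", cosine(sum_vector(t1, WORD_VECTORS), sum_vector(t2, WORD_VECTORS)))
    print("mean:", cosine(mean_vector(t1, WORD_VECTORS), mean_vector(t2, WORD_VECTORS)))
```

Because the mean is the sum divided by a positive constant, these two aggregations always give identical cosine scores in this sketch; they would differ only under a measure that is sensitive to vector magnitude, such as Euclidean distance.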

Keywords: Distributional similarity; Natural language processing; Semantic similarity and relatedness

Year: 2017 PMID: 29247788 DOI: 10.1016/j.jbi.2017.12.006

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

Cited by (4 in total)

1. Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.

Authors: Steven Jiang; Weiyi Wu; Naofumi Tomita; Craig Ganoe; Saeed Hassanpour
Journal: J Biomed Inform Date: 2020-10-01 Impact factor: 6.317

2. Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

Authors: Neil R Smalheiser; Aaron M Cohen; Gary Bonifield
Journal: J Biomed Inform Date: 2019-01-14 Impact factor: 6.317

3. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook.

Authors: Natalia Grabar; Cyril Grouin
Journal: Yearb Med Inform Date: 2019-08-16

4. Indirect association and ranking hypotheses for literature based discovery.

Authors: Sam Henry; Bridget T McInnes
Journal: BMC Bioinformatics Date: 2019-08-15 Impact factor: 3.169