Junseok Park, Kwangmin Kim, Woochang Hwang, Doheon Lee.
Abstract
There have been many attempts to identify relationships among concepts corresponding to terms from biomedical information ontologies such as the Unified Medical Language System (UMLS). In particular, vector representations of such concepts, built from UMLS definition texts, are widely used to measure the relatedness between two biological concepts. However, conventional relatedness measures cover only a limited range of words, which limits the performance of these models. In this paper, we propose a concept-embedding model of a UMLS semantic relatedness measure to overcome the limitations of earlier models. We obtained context texts for biological concepts that are not defined in UMLS by using Wikipedia as an external knowledge base. Concept vector representations were then derived from the context texts of the biological concepts. The degree of relatedness between two concepts was defined as the cosine similarity between the corresponding concept vectors. Our validation showed that the method provides higher coverage and better performance than the conventional method.
Keywords: Embedding; NLP; Paragraph vector; Similarity; UMLS; Wikipedia
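The relatedness measure named in the abstract, cosine similarity between two concept vectors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cosine_relatedness` and the three-dimensional toy vectors are hypothetical, whereas the paper derives its vectors from the context texts of the concepts.

```python
import math


def cosine_relatedness(u, v):
    """Return the cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for learned concept embeddings.
concept_a = [1.0, 2.0, 0.0]
concept_b = [2.0, 4.0, 0.0]  # same direction as concept_a
concept_c = [0.0, 0.0, 3.0]  # orthogonal to concept_a

print(cosine_relatedness(concept_a, concept_b))  # close to 1: highly related
print(cosine_relatedness(concept_a, concept_c))  # 0: unrelated
```

Because the measure depends only on the angle between the vectors, two concepts score as highly related when their vectors point in similar directions, regardless of vector length.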
Year: 2019 PMID: 31009761 DOI: 10.1016/j.jbi.2019.103182
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317