
A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain.

Carlota Cardoso, Rita T Sousa, Sebastian Köhler, Catia Pesquita.

Abstract

The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in the biomedical domain, where semantic similarity can be applied to predicting protein-protein interactions, gene-disease associations and the cellular localization of proteins, among other tasks. In recent years, several knowledge graph-based semantic similarity measures have been developed, but building a gold standard data set to support their evaluation is non-trivial. We present a collection of 21 benchmark data sets that aim to circumvent the difficulties of building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity. These data sets include data from two successful biomedical ontologies, the Gene Ontology and the Human Phenotype Ontology, and explore proxy similarities calculated from protein sequence similarity, protein family similarity, protein-protein interactions and phenotype-based gene similarity. The data sets vary in size and cover four different species with different levels of annotation completeness. For each data set, we also provide semantic similarity computations with representative state-of-the-art measures. Database URL: https://github.com/liseda-lab/kgsim-benchmark.
© The Author(s) 2020. Published by Oxford University Press.
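The abstract describes evaluating knowledge graph-based semantic similarity measures against proxy similarities. The sketch below illustrates that evaluation idea on a hypothetical toy ontology: it computes an annotation-based information content (IC), two widely used measures (Resnik's measure aggregated with the best-match average, and simGIC), and the Pearson correlation of each measure with a proxy similarity. The ontology, annotations, proxy values and all function names are invented for illustration; the exact measures, IC variants and correlation statistics used for the kgsim-benchmark data sets may differ.

```python
import math
from itertools import combinations

# Hypothetical toy ontology: each term maps to its direct is-a parents.
PARENTS = {"root": set(), "A": {"root"}, "B": {"root"},
           "A1": {"A"}, "A2": {"A"}, "B1": {"B"}}

# Hypothetical annotations: entity (e.g. a protein) -> ontology terms.
ANNOTATIONS = {"p1": {"A1"}, "p2": {"A2"}, "p3": {"B1"}, "p4": {"A1", "B1"}}


def ancestors(term):
    """Return the term plus all of its ancestors (transitive is-a closure)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def extended(entity):
    """Annotation set of an entity, extended with every ancestor term."""
    return set().union(*(ancestors(t) for t in ANNOTATIONS[entity]))


def information_content():
    """IC(t) = -log(fraction of entities annotated with t or a descendant of t)."""
    n = len(ANNOTATIONS)
    counts = {t: 0 for t in PARENTS}
    for entity in ANNOTATIONS:
        for t in extended(entity):
            counts[t] += 1
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik: IC of the most informative common ancestor of two terms."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def resnik_bma(e1, e2):
    """Best-match average of Resnik term similarities between two entities."""
    a, b = ANNOTATIONS[e1], ANNOTATIONS[e2]
    best_a = [max(resnik(x, y) for y in b) for x in a]
    best_b = [max(resnik(x, y) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(a) + len(b))


def sim_gic(e1, e2):
    """simGIC: IC-weighted Jaccard index of the ancestor-extended annotation sets."""
    s1, s2 = extended(e1), extended(e2)
    union = sum(IC[t] for t in s1 | s2)
    return sum(IC[t] for t in s1 & s2) / union if union else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


if __name__ == "__main__":
    pairs = list(combinations(sorted(ANNOTATIONS), 2))
    # Hypothetical proxy similarity per entity pair (e.g. normalised sequence similarity).
    proxy = {("p1", "p2"): 0.6, ("p1", "p3"): 0.1, ("p1", "p4"): 0.8,
             ("p2", "p3"): 0.1, ("p2", "p4"): 0.4, ("p3", "p4"): 0.7}
    for name, measure in [("Resnik-BMA", resnik_bma), ("simGIC", sim_gic)]:
        scores = [measure(a, b) for a, b in pairs]
        print(name, round(pearson(scores, [proxy[p] for p in pairs]), 3))
```

In a real benchmark, the toy dictionaries would be replaced by Gene Ontology or Human Phenotype Ontology is-a edges and gene or protein annotations, and the proxy by a signal such as sequence similarity; a higher correlation is then read as the measure better capturing that proxy notion of similarity.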


Year:  2020        PMID: 33181823      PMCID: PMC7661097          DOI: 10.1093/database/baaa078

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


References: 36 in total (10 listed here)

1.  The Unified Medical Language System (UMLS): integrating biomedical terminology.

Authors:  Olivier Bodenreider
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  Simple sequence-based kernels do not predict protein-protein interactions.

Authors:  Jiantao Yu; Maozu Guo; Chris J Needham; Yangchao Huang; Lu Cai; David R Westhead
Journal:  Bioinformatics       Date:  2010-08-27       Impact factor: 6.937

3.  Kernel methods for predicting protein-protein interactions.

Authors:  Asa Ben-Hur; William Stafford Noble
Journal:  Bioinformatics       Date:  2005-06       Impact factor: 6.937

4.  Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

Authors:  Najmul Ikram; Muhammad Abdul Qadir; Muhammad Tanvir Afzal
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2017-04-18       Impact factor: 3.710

5.  An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology.

Authors:  Shobhit Jain; Gary D Bader
Journal:  BMC Bioinformatics       Date:  2010-11-15       Impact factor: 3.169

6.  OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders.

Authors:  Joanna S Amberger; Carol A Bocchini; François Schiettecatte; Alan F Scott; Ada Hamosh
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

7.  Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources.

Authors:  Sebastian Köhler; Leigh Carmody; Nicole Vasilevsky; Julius O B Jacobsen; Daniel Danis; Jean-Philippe Gourdine; Michael Gargano; Nomi L Harris; Nicolas Matentzoglu; Julie A McMurry; David Osumi-Sutherland; Valentina Cipriani; James P Balhoff; Tom Conlin; Hannah Blau; Gareth Baynam; Richard Palmer; Dylan Gratian; Hugh Dawkins; Michael Segal; Anna C Jansen; Ahmed Muaz; Willie H Chang; Jenna Bergerson; Stanley J F Laulederkind; Zafer Yüksel; Sergi Beltran; Alexandra F Freeman; Panagiotis I Sergouniotis; Daniel Durkin; Andrea L Storm; Marc Hanauer; Michael Brudno; Susan M Bello; Murat Sincan; Kayli Rageth; Matthew T Wheeler; Renske Oegema; Halima Lourghi; Maria G Della Rocca; Rachel Thompson; Francisco Castellanos; James Priest; Charlotte Cunningham-Rundles; Ayushi Hegde; Ruth C Lovering; Catherine Hajek; Annie Olry; Luigi Notarangelo; Morgan Similuk; Xingmin A Zhang; David Gómez-Andrés; Hanns Lochmüller; Hélène Dollfus; Sergio Rosenzweig; Shruti Marwaha; Ana Rath; Kathleen Sullivan; Cynthia Smith; Joshua D Milner; Dorothée Leroux; Cornelius F Boerkoel; Amy Klion; Melody C Carter; Tudor Groza; Damian Smedley; Melissa A Haendel; Chris Mungall; Peter N Robinson
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

8.  Predicting gene ontology functions from protein's regional surface structures.

Authors:  Zhi-Ping Liu; Ling-Yun Wu; Yong Wang; Luonan Chen; Xiang-Sun Zhang
Journal:  BMC Bioinformatics       Date:  2007-12-11       Impact factor: 3.169

9.  False positive reduction in protein-protein interaction predictions using gene ontology annotations.

Authors:  Mahmoud A Mahdavi; Yen-Han Lin
Journal:  BMC Bioinformatics       Date:  2007-07-23       Impact factor: 3.169

10.  A new method to measure the semantic similarity from query phenotypic abnormalities to diseases based on the human phenotype ontology.

Authors:  Xiaofeng Gong; Jianping Jiang; Zhongqu Duan; Hui Lu
Journal:  BMC Bioinformatics       Date:  2018-05-08       Impact factor: 3.169

Cited by: 2 in total

1.  HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey.

Authors:  Juan J Lastra-Díaz; Alicia Lara-Clares; Ana Garcia-Serrano
Journal:  BMC Bioinformatics       Date:  2022-01-06       Impact factor: 3.169

2.  GOntoSim: a semantic similarity measure based on LCA and common descendants.

Authors:  Amna Binte Kamran; Hammad Naveed
Journal:  Sci Rep       Date:  2022-03-09       Impact factor: 4.379
