An evidence-based lexical pattern approach for quality assurance of Gene Ontology relations.

Rashmie Abeysinghe, Yuntao Yang, Mason Bartels, W Jim Zheng, Licong Cui.

Abstract

Gene Ontology (GO) is widely used in the biological domain. It is the most comprehensive ontology providing a formal representation of gene functions (GO concepts) and the relations between them. However, unintentional quality defects (e.g. missing or erroneous relations) may exist in GO due to its large number of concepts and the complexity of its structure. Such quality defects can affect the results of GO-based analyses and applications. In this work, we introduce a novel evidence-based lexical pattern approach for quality assurance of GO relations. We leverage two layers of evidence to suggest potentially missing relations in GO. First, we use related concept pairs (i.e. existing relations) in GO to extract relationship-specific lexical patterns, which serve as the first layer of evidence for automatically suggesting potentially missing relations between unrelated concept pairs. Second, for each suggested missing relation, we identify two other existing relations that resemble the difference between the suggested missing relation and the existing relation from which it was suggested; these serve as the second layer of evidence. Applied to the 15 December 2021 release of GO, this approach suggested a total of 866 potentially missing relations. Local domain experts evaluated the entire set and identified 821 as missing relations, while the remaining 45 indicated erroneous existing relations. We submitted these findings to the GO consortium for further validation and received encouraging feedback. These results indicate that our evidence-based approach can be used to uncover both missing relations and erroneous existing relations in GO.
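The first layer of evidence (relationship-specific lexical patterns taken from existing relations and matched against unrelated concept pairs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, purely for illustration, that a "lexical pattern" is the pair of token sets that appear only in the source name and only in the target name, ignoring word order. The names `lexical_pattern` and `suggest_missing_relations`, and the toy GO-style term names, are hypothetical; the second layer of evidence is not modeled here.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical pattern type: (tokens only in source name, tokens only in target name)
Pattern = tuple[frozenset, frozenset]


def name_tokens(name: str) -> frozenset:
    """Lower-cased word tokens of a concept name (word order is ignored in this sketch)."""
    return frozenset(name.lower().split())


def lexical_pattern(source: str, target: str) -> Pattern:
    """Assumed pattern: the lexical difference between two concept names."""
    s, t = name_tokens(source), name_tokens(target)
    return (s - t, t - s)


def suggest_missing_relations(names, relations):
    """Suggest unrelated (source, target, rel_type) triples whose lexical pattern
    matches that of at least one existing relation of the same type.

    names:     iterable of concept names
    relations: iterable of existing (source, target, rel_type) triples
    Returns a dict mapping each suggestion to the existing relations supporting it.
    """
    relations = list(relations)
    by_pattern = defaultdict(list)   # (rel_type, pattern) -> supporting relations
    related_pairs = set()            # unordered pairs that are already related
    for source, target, rel_type in relations:
        by_pattern[(rel_type, lexical_pattern(source, target))].append(
            (source, target, rel_type)
        )
        related_pairs.add(frozenset((source, target)))
    rel_types = {rel_type for _, _, rel_type in relations}

    suggestions = {}
    for source, target in permutations(names, 2):
        if frozenset((source, target)) in related_pairs:
            continue  # only unrelated concept pairs are candidates
        pattern = lexical_pattern(source, target)
        if not pattern[0] and not pattern[1]:
            continue  # identical token sets carry no directional evidence
        for rel_type in rel_types:
            support = by_pattern.get((rel_type, pattern))
            if support:
                suggestions[(source, target, rel_type)] = support
    return suggestions


if __name__ == "__main__":
    # Toy input modeled on GO-style names; not data from the study.
    names = [
        "regulation of apoptotic process",
        "positive regulation of apoptotic process",
        "regulation of cell migration",
        "positive regulation of cell migration",
    ]
    existing = [
        ("positive regulation of apoptotic process",
         "regulation of apoptotic process", "is_a"),
    ]
    for suggestion, support in suggest_missing_relations(names, existing).items():
        print(suggestion, "supported by", support)
```

On the toy input, the only suggestion is `("positive regulation of cell migration", "regulation of cell migration", "is_a")`, because its token difference (`{"positive"}` versus nothing) matches the single existing relation. Keying patterns by relation type keeps the evidence relationship-specific, so a pattern learned from `is_a` pairs never suggests, say, a `part_of` relation.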
© The Author(s) 2022. Published by Oxford University Press.

Keywords:  Gene Ontology; erroneous relations; lexical patterns; missing relations; ontology quality assurance

Year:  2022        PMID: 35419584      PMCID: PMC9116247          DOI: 10.1093/bib/bbac122

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   13.994