
Metric learning on expression data for gene function prediction.

Stavros Makrodimitris, Marcel J T Reinders, Roeland C H J van Ham.

Abstract

MOTIVATION: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest.
RESULTS: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for Guilt-By-Association-based function prediction. More specifically, if two genes are both annotated with a given GO term, MLC tries to maximize their weighted co-expression; if one of them is not annotated with that term, it tries to minimize their weighted co-expression. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method performs particularly well on more specific terms, which are the most interesting. Finally, by inspecting the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel relevant conditions, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa.
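To make the idea of a sample-weighted co-expression measure concrete, the following is a minimal sketch of a weighted Pearson correlation between two expression profiles. It assumes the weighted measure takes this standard form, which the abstract does not spell out; the function name `weighted_pearson`, the toy expression values and the hand-picked weights are all hypothetical. It does not show how MLC learns the weights and is not taken from the MLC package.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of two expression profiles.

    x, y: expression values of two genes over the same samples.
    w:    non-negative per-sample weights (hypothetical here; in MLC
          such weights would be learned separately for each GO term).
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                      # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)  # weighted means
    cov = np.sum(w * (x - mx) * (y - my))  # weighted covariance
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)


# Toy profiles: the two genes co-vary in the first three samples only.
gene_a = np.array([1.0, 2.0, 3.0, 5.0, 1.0, 4.0])
gene_b = np.array([2.0, 4.0, 6.0, 1.0, 5.0, 2.0])

uniform = np.ones(6)                                   # ordinary Pearson
focused = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])     # ignore last three samples

r_unweighted = weighted_pearson(gene_a, gene_b, uniform)
r_weighted = weighted_pearson(gene_a, gene_b, focused)
print(r_unweighted, r_weighted)
```

With uniform weights the function reduces to the ordinary Pearson correlation, which is negative for these toy profiles. Placing all weight on the first three samples yields a correlation of 1, illustrating how down-weighting uninformative samples can expose co-expression that the unweighted measure hides.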
AVAILABILITY AND IMPLEMENTATION: MLC is available as a Python package at www.github.com/stamakro/MLC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.


Year:  2020        PMID: 31562759     DOI: 10.1093/bioinformatics/btz731

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


