
Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species.

Yidan Eden Sun, Heather J Zhou, Jingyi Jessica Li.

Abstract

MOTIVATION: Gene clustering is a widely used technique that has enabled computational prediction of unknown gene functions within a species. However, it remains a challenge to refine gene function prediction by leveraging evolutionarily conserved genes in another species. This challenge calls for a new computational algorithm to identify gene co-clusters in two species, so that genes in each co-cluster exhibit similar expression levels in each species and strong conservation between the species.
RESULTS: Here, we develop the bipartite tight spectral clustering (BiTSC) algorithm, which identifies gene co-clusters in two species based on gene orthology information and gene expression data. BiTSC implements a novel formulation that encodes gene orthology as a bipartite network and gene expression data as node covariates. This formulation allows BiTSC to combine the advantages of multiple unsupervised learning techniques: kernel enhancement, bipartite spectral clustering, consensus clustering, tight clustering and hierarchical clustering. As a result, BiTSC is a flexible and robust algorithm capable of identifying informative gene co-clusters without forcing all genes into co-clusters. Another advantage of BiTSC is that it does not rely on any distributional assumptions. Beyond cross-species gene co-clustering, BiTSC is also applicable as a general algorithm for identifying tight node co-clusters in any bipartite network with node covariates. We demonstrate the accuracy and robustness of BiTSC through comprehensive simulation studies. In a real data example, we use BiTSC to identify conserved gene co-clusters of Drosophila melanogaster and Caenorhabditis elegans, and we perform a series of downstream analyses to both validate BiTSC and verify the biological significance of the identified co-clusters.
AVAILABILITY AND IMPLEMENTATION: The Python package BiTSC is open-access and available at https://github.com/edensunyidan/BiTSC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
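
To make the formulation in the abstract concrete, the following is a minimal sketch of two of the listed ingredients, kernel enhancement and bipartite spectral clustering, on a toy bipartite network with node covariates. It is not the BiTSC package and does not use its API. The function names, the parameters `tau` and `gamma`, the enhancement formula `A + tau * K1 @ A @ K2`, and the toy data are all assumptions made for illustration; the consensus, tight and hierarchical clustering steps are omitted, so this sketch assigns every node to a co-cluster.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """Pairwise Gaussian similarity between the rows of X (node covariates)."""
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def enhance(A, X1, X2, tau=1.0, gamma=1.0):
    """Blend orthology edges with covariate similarity.

    ASSUMPTION: the form A + tau * K1 @ A @ K2 is illustrative only;
    the paper's kernel enhancement may be defined differently.
    """
    return A + tau * gaussian_kernel(X1, gamma) @ A @ gaussian_kernel(X2, gamma)

def spectral_embedding(W, k):
    """Embed both node sets jointly using the degree-normalized bipartite matrix."""
    d1 = np.maximum(W.sum(axis=1), 1e-12)  # weighted degrees, side 1
    d2 = np.maximum(W.sum(axis=0), 1e-12)  # weighted degrees, side 2
    Wn = W / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(Wn, full_matrices=False)
    Z1 = U[:, :k] / np.sqrt(d1)[:, None]
    Z2 = Vt[:k].T / np.sqrt(d2)[:, None]
    return np.vstack([Z1, Z2])  # side-1 rows first, then side-2 rows

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def co_cluster(A, X1, X2, k, tau=1.0, gamma=1.0):
    """Return one co-cluster label per node on each side of the network."""
    W = enhance(A, X1, X2, tau, gamma)
    labels = kmeans(spectral_embedding(W, k), k)
    m = A.shape[0]
    return labels[:m], labels[m:]

# Toy data (hypothetical): 6 genes per species, two planted co-clusters.
# A[i, j] = 1 means gene i in species 1 is orthologous to gene j in species 2.
A = np.array([[1, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 1]], dtype=float)
# Expression covariates; the two species may have different numbers of conditions.
X1 = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
X2 = np.array([[1, 1, 1], [1.1, 1, 1], [1, 1, 1.1],
               [-3, -3, -3], [-3.1, -3, -3], [-3, -3, -3.1]])

labels1, labels2 = co_cluster(A, X1, X2, k=2)
```

On this toy input, genes 0 to 2 of both species receive one label and genes 3 to 5 the other, because a co-cluster here requires both orthology edges and similar covariates within each species. Embedding the two node sets in a shared space is what lets a single k-means run assign matching labels across species.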

Year: 2021; PMID: 32814973; PMCID: PMC8599197; DOI: 10.1093/bioinformatics/btaa741

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.931


