
Comparison of sparse biclustering algorithms for gene expression datasets.

Kath Nicholls, Chris Wallace.

Abstract

MOTIVATION: Gene clustering and sample clustering are commonly used to find patterns in gene expression datasets. However, genes may cluster differently in heterogeneous samples (e.g. different tissues or disease states), whilst traditional methods assume that clusters are consistent across samples. Biclustering algorithms aim to solve this issue by performing sample clustering and gene clustering simultaneously. Existing reviews of biclustering algorithms omit a number of more recent algorithms, and base their comparisons on simplistic simulated datasets and less robust metrics, without specifically evaluating biclusters in real datasets.
RESULTS: We compared four classes of sparse biclustering algorithms on a range of simulated and real datasets. All algorithms generally struggled on simulated datasets with a large number of genes or implanted biclusters. We found that Bayesian algorithms with strict sparsity constraints had high accuracy on the simulated datasets and did not require any post-processing, but were considerably slower than other algorithm classes. We found that non-negative matrix factorisation algorithms performed poorly, but could be re-purposed for biclustering through a sparsity-inducing post-processing procedure we introduce; one such algorithm was one of the most highly ranked on real datasets. In a multi-tissue knockout mouse RNA-seq dataset, the algorithms rarely returned clusters containing samples from multiple different tissues, whilst such clusters were identified in a human dataset of more closely related cell types (sorted blood cell subsets). This highlights the need for further thought in the design and analysis of multi-tissue studies to avoid differences between tissues dominating the analysis.
AVAILABILITY: Code to run the analysis is available at https://github.com/nichollskc/biclust_comp, including wrappers for each algorithm, implementations of evaluation metrics, and code to simulate datasets and perform pre- and post-processing. The full tables of results are available at https://doi.org/10.5281/zenodo.4581206.
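
As a rough illustration of how a non-negative matrix factorisation can be turned into biclusters by a sparsity-inducing post-processing step, the sketch below factorises a small simulated matrix and then zeroes out small factor entries. It is not the procedure introduced in the paper, whose details are not given in this abstract. The thresholding rule (keep entries at least `frac` times the largest entry of each factor), the function names, the simulated data and the rows-as-samples layout are all assumptions made for the example.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF (X ~ W @ H) using multiplicative updates for squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_genes)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def threshold_factors(W, H, frac=0.5):
    """Assumed post-processing: zero entries below frac * (max of each factor)."""
    W_sparse = np.where(W >= frac * W.max(axis=0, keepdims=True), W, 0.0)
    H_sparse = np.where(H >= frac * H.max(axis=1, keepdims=True), H, 0.0)
    return W_sparse, H_sparse


def biclusters_from_factors(W_sparse, H_sparse):
    """Read one bicluster (sample indices, gene indices) off each factor."""
    return [
        (np.flatnonzero(W_sparse[:, k]), np.flatnonzero(H_sparse[k, :]))
        for k in range(W_sparse.shape[1])
    ]


# Simulated data: 30 samples x 20 genes of low-level noise,
# with one implanted bicluster (first 10 samples, first 5 genes).
rng = np.random.default_rng(1)
X = rng.random((30, 20)) * 0.1
X[:10, :5] += 5.0

W, H = nmf_multiplicative(X, rank=1)
W_sparse, H_sparse = threshold_factors(W, H)
samples, genes = biclusters_from_factors(W_sparse, H_sparse)[0]
print("samples in bicluster:", samples.tolist())
print("genes in bicluster:", genes.tolist())
```

Without the thresholding step every sample and gene has a non-zero loading on every factor, so the factors do not define a subset of samples and genes. Setting small loadings to zero is what makes each factor readable as a bicluster.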
© The Author(s) 2021. Published by Oxford University Press.

Keywords:  biclustering; clustering; gene expression; multi-tissue

Year:  2021        PMID: 33951731      PMCID: PMC8574648          DOI: 10.1093/bib/bbab140

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


