
Detecting hidden batch factors through data-adaptive adjustment for biological effects.

Haidong Yi, Ayush T Raman, Han Zhang, Genevera I Allen, Zhandong Liu.

Abstract

Motivation: Batch effects are one of the major sources of technical variation affecting measurements in high-throughput studies such as RNA sequencing. It is well established that batch effects can arise from differences in experimental platforms, laboratory conditions, sample sources and personnel. These differences can confound the outcomes of interest and lead to spurious results. A critical input for batch-correction algorithms is knowledge of the batch factors, which in many cases are unknown or inaccurate. Hence, the primary motivation of our paper is to detect hidden batch factors that can then be used in standard techniques to accurately capture the relationship between gene expression and other modeled variables of interest.
Results: We introduce a new algorithm based on data-adaptive shrinkage and semi-non-negative matrix factorization (semi-NMF) for detecting unknown batch effects. We test our algorithm on three datasets: (i) Sequencing Quality Control, (ii) Topotecan RNA-Seq and (iii) single-cell RNA sequencing (scRNA-Seq) of glioblastoma multiforme. In all three datasets, our algorithm outperforms existing batch-detection algorithms at identifying hidden batch effects. In the Topotecan study, we identified a new batch factor that had been missed by the original study, an omission that led to under-representation of differentially expressed genes. For scRNA-Seq, we demonstrate the power of our method to detect subtle batch effects.

Availability and implementation: The DASC R package is available via Bioconductor or at https://github.com/zhanglabNKU/DASC.

Contact: zhanghan@nankai.edu.cn or zhandonl@bcm.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
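The Results name semi-non-negative matrix factorization as one building block of the method. As a rough illustration of that building block only, the Python sketch below implements the generic semi-NMF multiplicative updates of Ding, Li and Jordan. It factorizes a genes × samples matrix X ≈ F Gᵀ with F unconstrained and G ≥ 0, so each row of G holds a sample's nonnegative loadings on k latent factors. This is not the DASC implementation (DASC is an R/Bioconductor package). The data-adaptive shrinkage step is omitted, X is assumed to already be a residual matrix after biological effects have been adjusted for, and the function name `semi_nmf`, the random initialization, the iteration count and the toy data are hypothetical choices.

```python
import numpy as np


def _pos(a):
    """Elementwise positive part, (|a| + a) / 2."""
    return (np.abs(a) + a) / 2.0


def _neg(a):
    """Elementwise negative part, (|a| - a) / 2, returned as nonnegative values."""
    return (np.abs(a) - a) / 2.0


def semi_nmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical semi-NMF sketch: X (genes x samples) ~ F @ G.T with G >= 0.

    Returns F (genes x k, any sign), G (samples x k, nonnegative) and the
    squared Frobenius reconstruction error recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[1]
    # Random positive start (assumption); Ding et al. suggest a k-means-based start.
    G = rng.random((n_samples, k)) + 0.1
    history = []
    for _ in range(n_iter):
        # F-step: least-squares solution for fixed G, F = X G (G^T G)^+.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # G-step: multiplicative update that keeps every entry of G nonnegative.
        numer = _pos(XtF) + G @ _neg(FtF)
        denom = _neg(XtF) + G @ _pos(FtF) + eps
        G = G * np.sqrt(numer / denom)
        history.append(float(np.linalg.norm(X - F @ G.T) ** 2))
    return F, G, history


# Toy residual matrix (simulated, not data from the paper): 40 genes x 12 samples,
# with a shift in the first 6 samples standing in for a hidden batch factor.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 12))
X[:, :6] += 2.0
F, G, history = semi_nmf(X, k=2)
labels = G.argmax(axis=1)  # each sample's dominant latent factor
```

Taking each sample's largest loading, as in `labels` above, is one simple way to turn G into candidate batch assignments; how closely such labels track real batches depends on the data and on the shrinkage step this sketch leaves out. Because the F-step is an exact least-squares solve and the G-step does not increase the residual for fixed F, the recorded error should be non-increasing across iterations.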

Year:  2018        PMID: 29617963      PMCID: PMC6454417          DOI: 10.1093/bioinformatics/btx635

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

Cited by: 6 in total

1.  Identification and Validation of Candidate Gene Module Along With Immune Cells Infiltration Patterns in Atherosclerosis Progression to Plaque Rupture via Transcriptome Analysis.

Authors:  Jing Xu; Cheng Chen; Yuejin Yang
Journal:  Front Cardiovasc Med       Date:  2022-06-22

2.  Inferring Multiple Sclerosis Stages from the Blood Transcriptome via Machine Learning.

Authors:  Massimo Acquaviva; Ramesh Menon; Marco Di Dario; Gloria Dalla Costa; Marzia Romeo; Francesca Sangalli; Bruno Colombo; Lucia Moiola; Vittorio Martinelli; Giancarlo Comi; Cinthia Farina
Journal:  Cell Rep Med       Date:  2020-07-21

3.  Identification of a Five-mRNA Signature as a Novel Potential Prognostic Biomarker for Glioblastoma by Integrative Analysis.

Authors:  Huifang Xu; Linfang Zhang; Xiujuan Xia; Wei Shao
Journal:  Front Genet       Date:  2022-07-08       Impact factor: 4.772

4.  BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm.

Authors:  Anna Papiez; Michal Marczyk; Joanna Polanska; Andrzej Polanski
Journal:  Bioinformatics       Date:  2019-06-01       Impact factor: 6.937

5.  [Review] Knowledge Generation with Rule Induction in Cancer Omics.

Authors:  Giovanni Scala; Antonio Federico; Vittorio Fortino; Dario Greco; Barbara Majello
Journal:  Int J Mol Sci       Date:  2019-12-18       Impact factor: 5.923

6.  iMOKA: k-mer based software to analyze large collections of sequencing data.

Authors:  Claudio Lorenzi; Sylvain Barriere; Jean-Philippe Villemin; Laureline Dejardin Bretones; Alban Mancheron; William Ritchie
Journal:  Genome Biol       Date:  2020-10-13       Impact factor: 13.583