An improved and explicit surrogate variable analysis procedure by coefficient adjustment.

Seunggeun Lee, Wei Sun, Fred A Wright, Fei Zou.

Abstract

Unobserved environmental, demographic and technical factors can adversely affect the estimation and testing of the effects of primary variables. Surrogate variable analysis, proposed to tackle this problem, has been widely used in genomic studies. To estimate hidden factors that are correlated with the primary variables, surrogate variable analysis performs principal component analysis either on a subset of features or on all features, but weighting each differently. However, existing approaches may fail to identify hidden factors that are strongly correlated with the primary variables, and the extra step of feature selection and weight calculation makes the theoretical investigation of surrogate variable analysis challenging. In this paper, we propose an improved surrogate variable analysis, using all measured features, that has a natural connection with restricted least squares, which allows us to study its theoretical properties. Simulation studies and real-data analysis show that the method is competitive with state-of-the-art methods.
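
The abstract does not spell out any algorithm, so the following is only a minimal sketch of the general idea behind surrogate variable analysis, not the procedure proposed in the paper. It assumes a naive variant: regress each feature on a single primary variable, then take the leading principal component of the residuals as one surrogate variable. All names (`residualize`, `top_surrogate`, the simulated `x`, `z`, `features`) are hypothetical, and the data are simulated purely for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def residualize(y, x):
    """Residuals from a simple least-squares fit of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [ai - ma for ai in a]
    cb = [bi - mb for bi in b]
    return dot(ca, cb) / (dot(ca, ca) * dot(cb, cb)) ** 0.5


def top_surrogate(features, x, iters=100):
    """Leading principal component score vector of the residual matrix.

    Uses power iteration on R R^T, where the columns of R are the
    per-feature residuals after regressing out the primary variable x.
    """
    resid = [residualize(f, x) for f in features]
    v = resid[0][:]  # start inside the span of the residuals
    norm = dot(v, v) ** 0.5
    v = [vi / norm for vi in v]
    for _ in range(iters):
        w = [0.0] * len(x)
        for r in resid:
            c = dot(r, v)
            for i, ri in enumerate(r):
                w[i] += c * ri
        norm = dot(w, w) ** 0.5
        v = [wi / norm for wi in w]
    return v


# Simulated data (an assumption for illustration only):
# x is the primary variable, z is a hidden factor correlated with x.
random.seed(0)
n, m = 40, 60
x = [float(i % 2) for i in range(n)]
z = [0.8 * xi + random.gauss(0, 1) for xi in x]
features = []
for _ in range(m):
    b, g = random.gauss(0, 1), random.gauss(0, 1)
    features.append(
        [b * xi + g * zi + random.gauss(0, 0.3) for xi, zi in zip(x, z)]
    )

sv = top_surrogate(features, x)
z_resid = residualize(z, x)

print("inner product of surrogate with x:", dot(sv, x))
print("correlation with residualized hidden factor:", corr(sv, z_resid))
```

In this sketch the estimated surrogate is orthogonal to the primary variable by construction, so it can only recover the part of the hidden factor that is uncorrelated with `x`. That is one way to see the limitation the abstract attributes to existing approaches when hidden factors are strongly correlated with the primary variables.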

Keywords:  Batch effect; High-dimensional data; Principal component analysis; Surrogate variable analysis

Year:  2017        PMID: 29430031      PMCID: PMC5627626          DOI: 10.1093/biomet/asx018

Source DB:  PubMed          Journal:  Biometrika        ISSN: 0006-3444            Impact factor:   2.445


  6 in total

1.  ESTIMATION AND INFERENCE IN METABOLOMICS WITH NON-RANDOM MISSING DATA AND LATENT FACTORS.

Authors:  Chris McKennan; Carole Ober; Dan Nicolae
Journal:  Ann Appl Stat       Date:  2020-06-29       Impact factor: 2.083

2.  Estimating and accounting for unobserved covariates in high-dimensional correlated data.

Authors:  Chris McKennan; Dan Nicolae
Journal:  J Am Stat Assoc       Date:  2020-06-30       Impact factor: 4.369

3.  Accounting for unobserved covariates with varying degrees of estimability in high-dimensional biological data.

Authors:  Chris McKennan; Dan Nicolae
Journal:  Biometrika       Date:  2019-09-16       Impact factor: 3.028

4.  Data-based RNA-seq simulations by binomial thinning.

Authors:  David Gerard
Journal:  BMC Bioinformatics       Date:  2020-05-24       Impact factor: 3.169

5.  A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing.

Authors:  Wenan Chen; Silu Zhang; Justin Williams; Bensheng Ju; Bridget Shaner; John Easton; Gang Wu; Xiang Chen
Journal:  Comput Struct Biotechnol J       Date:  2020-03-30       Impact factor: 6.155

6.  Causal Discovery in High-Dimensional Point Process Networks with Hidden Nodes.

Authors:  Xu Wang; Ali Shojaie
Journal:  Entropy (Basel)       Date:  2021-12-01       Impact factor: 2.524
