Asymptotic conditional singular value decomposition for high-dimensional genomic data.

Jeffrey T Leek.

Abstract

High-dimensional data, such as those obtained from a gene expression microarray or second-generation sequencing experiment, consist of a large number of dependent features measured on a small number of samples. One of the key problems in genomics is the identification and estimation of factors that associate with many features simultaneously. Identifying the number of factors is also important for unsupervised statistical analyses such as hierarchical clustering. A conditional factor model is the most common model for many types of genomic data, ranging from gene expression to single-nucleotide polymorphisms to methylation. Here we show that under a conditional factor model for genomic data with a fixed sample size, the right singular vectors are asymptotically consistent for the unobserved latent factors as the number of features diverges. We also propose a consistent estimator of the dimension of the underlying conditional factor model for a finite fixed sample size and an infinite number of features based on a scaled eigen-decomposition. We propose a practical approach for selecting the number of factors in real data sets, and we illustrate the utility of these results for capturing batch and other unmodeled effects in a microarray experiment using the dependence kernel approach of Leek and Storey (2008, Proceedings of the National Academy of Sciences of the United States of America 105, 18718-18723).
© 2010, The International Biometric Society.
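
The consistency claim in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's procedure: it assumes a plain factor model X = ΓG + U with Gaussian loadings, Gaussian factors, and homoscedastic Gaussian noise, and it omits the conditioning on known covariates. The function name `captured_fraction` and all parameter values are hypothetical. It measures how much of the latent factor matrix G lies in the span of the top right singular vectors of X as the number of features m grows while the sample size n stays fixed.

```python
import numpy as np

def captured_fraction(m, n=20, r=2, sigma=1.0, seed=0):
    """Fraction of the latent factors' squared norm lying in the span
    of the top-r right singular vectors of simulated data.

    Assumed toy model (not taken from the paper): X = Gamma @ G + U,
    with m features, n samples, and r latent factors.
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((r, n))          # latent factors, r x n
    Gamma = rng.standard_normal((m, r))      # feature loadings, m x r
    U = sigma * rng.standard_normal((m, n))  # homoscedastic noise, m x n
    X = Gamma @ G + U                        # observed data, m x n

    # Rows of Vt are the right singular vectors (each of length n).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_r = Vt[:r]                             # top-r right singular vectors

    # Project G onto the span of V_r and compare squared Frobenius norms.
    G_proj = (G @ V_r.T) @ V_r
    return np.linalg.norm(G_proj) ** 2 / np.linalg.norm(G) ** 2

# Sample size stays fixed at n = 20 while the number of features grows.
for m in (100, 1000, 20000):
    print(m, round(captured_fraction(m), 4))
```

A value near 1 means the right singular vectors recover the row space of G almost exactly. Under these assumptions the fraction should approach 1 as m increases, in line with the asymptotic statement above.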

Year:  2010        PMID: 20560929      PMCID: PMC3165001          DOI: 10.1111/j.1541-0420.2010.01455.x

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571


References: 12 in total

1.  Thresholding of statistical maps in functional neuroimaging using the false discovery rate.

Authors:  Christopher R Genovese; Nicole A Lazar; Thomas Nichols
Journal:  Neuroimage       Date:  2002-04       Impact factor: 6.556

2.  Principal component analysis for clustering gene expression data.

Authors:  K Y Yeung; W L Ruzzo
Journal:  Bioinformatics       Date:  2001-09       Impact factor: 6.937

3.  Mapping complex disease loci in whole-genome association studies. [Review]

Authors:  Christopher S Carlson; Michael A Eberle; Leonid Kruglyak; Deborah A Nickerson
Journal:  Nature       Date:  2004-05-27       Impact factor: 49.962

4.  A unified statistical approach for determining significant signals in images of cerebral activation.

Authors:  K J Worsley; S Marrett; P Neelin; A C Vandal; K J Friston; A C Evans
Journal:  Hum Brain Mapp       Date:  1996       Impact factor: 5.038

5.  Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors:  W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal:  Biostatistics       Date:  2006-04-21       Impact factor: 5.899

6.  Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries.

Authors:  James Inglese; Douglas S Auld; Ajit Jadhav; Ronald L Johnson; Anton Simeonov; Adam Yasgar; Wei Zheng; Christopher P Austin
Journal:  Proc Natl Acad Sci U S A       Date:  2006-07-24       Impact factor: 11.205

7.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

8.  A general framework for multiple testing dependence.

Authors:  Jeffrey T Leek; John D Storey
Journal:  Proc Natl Acad Sci U S A       Date:  2008-11-24       Impact factor: 11.205

9.  Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels.

Authors:  Richa Saxena; Benjamin F Voight; Valeriya Lyssenko; Noël P Burtt; Paul I W de Bakker; Hong Chen; Jeffrey J Roix; Sekar Kathiresan; Joel N Hirschhorn; Mark J Daly; Thomas E Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C Florez; Joanne Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi; Candace Guiducci; Anna Berglund; Joyce Carlson; Lauren Gianniny; Rachel Hackett; Liselotte Hall; Johan Holmkvist; Esa Laurila; Marketa Sjögren; Maria Sterner; Aarti Surti; Margareta Svensson; Malin Svensson; Ryan Tewhey; Brendan Blumenstiel; Melissa Parkin; Matthew Defelice; Rachel Barry; Wendy Brodeur; Jody Camarata; Nancy Chia; Mary Fava; John Gibbons; Bob Handsaker; Claire Healy; Kieu Nguyen; Casey Gates; Carrie Sougnez; Diane Gage; Marcia Nizzari; Stacey B Gabriel; Gung-Wei Chirn; Qicheng Ma; Hemang Parikh; Delwood Richardson; Darrell Ricke; Shaun Purcell
Journal:  Science       Date:  2007-04-26       Impact factor: 47.728

10.  Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment.

Authors:  Tomokazu Konishi
Journal:  BMC Bioinformatics       Date:  2004-01-13       Impact factor: 3.169

Cited by: 23 in total

1.  Reducing system noise in copy number data using principal components of self-self hybridizations.

Authors:  Yoon-ha Lee; Michael Ronemus; Jude Kendall; B Lakshmi; Anthony Leotta; Dan Levy; Diane Esposito; Vladimir Grubor; Kenny Ye; Michael Wigler; Boris Yamrom
Journal:  Proc Natl Acad Sci U S A       Date:  2011-12-29       Impact factor: 11.205

2.  CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data.

Authors:  Elana J Fertig; Jie Ding; Alexander V Favorov; Giovanni Parmigiani; Michael F Ochs
Journal:  Bioinformatics       Date:  2010-09-01       Impact factor: 6.937

3.  Significance analysis and statistical dissection of variably methylated regions.

Authors:  Andrew E Jaffe; Andrew P Feinberg; Rafael A Irizarry; Jeffrey T Leek
Journal:  Biostatistics       Date:  2011-06-17       Impact factor: 5.899

4.  Identification of epigenetic modulators in human breast cancer by integrated analysis of DNA methylation and RNA-Seq data.

Authors:  Xin Zhou; Zhibin Chen; Xiaodong Cai
Journal:  Epigenetics       Date:  2018-08-07       Impact factor: 4.528

5.  An improved and explicit surrogate variable analysis procedure by coefficient adjustment.

Authors:  Seunggeun Lee; Wei Sun; Fred A Wright; Fei Zou
Journal:  Biometrika       Date:  2017-04-21       Impact factor: 2.445

6.  svaseq: removing batch effects and other unwanted noise from sequencing data.

Authors:  Jeffrey T Leek
Journal:  Nucleic Acids Res       Date:  2014-10-07       Impact factor: 16.971

7.  Fast, Exact Bootstrap Principal Component Analysis for p > 1 million.

Authors:  Aaron Fisher; Brian Caffo; Brian Schwartz; Vadim Zipunnikov
Journal:  J Am Stat Assoc       Date:  2016-08-18       Impact factor: 5.033

8.  Differential gene expression data from the human central nervous system across Alzheimer's disease, Lewy body diseases, and the amyotrophic lateral sclerosis and frontotemporal dementia spectrum.

Authors:  Ayush Noori; Aziz M Mezlini; Bradley T Hyman; Alberto Serrano-Pozo; Sudeshna Das
Journal:  Data Brief       Date:  2021-02-11

9.  A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform.

Authors:  Joanna Zhuang; Martin Widschwendter; Andrew E Teschendorff
Journal:  BMC Bioinformatics       Date:  2012-04-24       Impact factor: 3.169

10.  A DNA methylation network interaction measure, and detection of network oncomarkers.

Authors:  Thomas E Bartlett; Sofia C Olhede; Alexey Zaikin
Journal:  PLoS One       Date:  2014-01-06       Impact factor: 3.240
