
Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies.

Seokho Lee, Michael P Epstein, Richard Duncan, Xihong Lin.

Abstract

Genome-wide association studies (GWAS) routinely apply principal component analysis (PCA) to infer population structure within a sample and thereby correct for confounding due to ancestry. GWAS implementations of PCA use tens of thousands of single-nucleotide polymorphisms (SNPs) to infer structure, even though only a small fraction of these SNPs provides useful information on ancestry. Identifying this reduced set of ancestry-informative markers (AIMs) from a GWAS has practical value; for example, researchers can genotype the AIM set to correct for potential confounding due to ancestry in follow-up studies that use custom SNP or sequencing technology. We propose a novel technique to identify AIMs from genome-wide SNP data using sparse PCA. The procedure uses penalized regression methods to identify those SNPs in a genome-wide panel that contribute significantly to the principal components, while shrinking the loadings of SNPs with negligible contributions to zero so that they drop out of the analysis. We found that sparse PCA leads to negligible loss of ancestry information compared with traditional PCA of genome-wide SNP data. We further demonstrate the value of sparse PCA for AIM selection using real data from the International HapMap Project and a genome-wide study of inflammatory bowel disease. We have implemented our approach in open-source R software for public use.
© 2012 Wiley Periodicals, Inc.
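
The idea of letting a penalty push negligible SNP loadings to exactly zero can be illustrated with a minimal sketch. This is not the authors' R implementation: it swaps in a generic rank-one scheme that alternates between unit-norm sample scores and soft-thresholded (L1-penalized) loadings. The simulated genotypes, the function names `soft_threshold` and `sparse_pc1`, and the penalty value `lam=4.0` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink entries toward zero by lam; entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_pc1(X, lam, n_iter=200, tol=1e-8):
    """Sparse loadings for the first PC of a column-centered matrix X (samples x SNPs)."""
    # Initialize from the ordinary leading singular pair, scaled like X.T @ u.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    v = s[0] * vt[0]
    for _ in range(n_iter):
        scores = X @ v
        norm = np.linalg.norm(scores)
        if norm == 0.0:  # the penalty removed every SNP
            break
        u = scores / norm  # unit-norm sample scores
        # Closed-form solution of the L1-penalized rank-one fit for fixed u.
        v_new = soft_threshold(X.T @ u, lam)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Hypothetical toy data: two populations, 300 SNPs, of which only the
# first 20 differ in allele frequency (0.1 vs 0.9); the rest share 0.5.
rng = np.random.default_rng(0)
n_per_group, n_snps, n_aims = 50, 300, 20
freqs = np.full((2, n_snps), 0.5)
freqs[0, :n_aims] = 0.1
freqs[1, :n_aims] = 0.9
labels = np.repeat([0, 1], n_per_group)
G = rng.binomial(2, freqs[labels])  # genotypes coded 0/1/2
X = G - G.mean(axis=0)              # center each SNP

v = sparse_pc1(X, lam=4.0)          # lam chosen by hand for this toy example
selected = np.flatnonzero(v)        # SNPs with nonzero loading: candidate AIMs
pc_scores = X @ v                   # ancestry scores from the reduced SNP set
print("SNPs retained:", selected.size, "of", n_snps)
```

In this toy setting the SNPs with nonzero loadings play the role of the AIM set, and `pc_scores` is computed from those SNPs alone. In practice the penalty would need to be tuned rather than fixed by hand, and further components would require deflation or a similar step, neither of which is shown here.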

Year: 2012   PMID: 22508067   PMCID: PMC3596262   DOI: 10.1002/gepi.21621

Source DB: PubMed   Journal: Genet Epidemiol   ISSN: 0741-0395   Impact factor: 2.135


