
Fast and robust ancestry prediction using principal component analysis.

Daiwei Zhang¹, Rounak Dey², Seunggeun Lee¹.

Abstract

MOTIVATION: Population stratification (PS) is a major confounder in genome-wide association studies (GWAS) and can lead to false-positive associations. To adjust for PS, principal component analysis (PCA)-based ancestry prediction has been widely used. Popular methods for predicting PC scores include simple projection (SP) based on principal component loadings and the recently developed data augmentation, decomposition and Procrustes (ADP) transformation, as implemented in LASER and TRACE. However, the PC scores predicted by SP can be biased toward zero. ADP, on the other hand, has a high computational cost because it requires running PCA separately on the augmented dataset for each study sample.
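The bias of SP can be reproduced with a small simulation. The sketch below uses NumPy and assumes a simple spiked-covariance model (one ancestry axis plus unit-variance noise) as a stand-in for standardized genotypes. The helper names `fit_reference_pca`, `simple_projection` and `simulate_spiked`, and all simulation settings, are hypothetical and are not taken from FRAPOSA. With many more variants than reference samples, the projected scores of new samples have a visibly smaller spread than the reference scores.

```python
import numpy as np

def fit_reference_pca(X_ref, k):
    """PCA of the reference panel (rows = samples, columns = variants)."""
    mean = X_ref.mean(axis=0)
    Xc = X_ref - mean
    # Thin SVD: Xc = U @ diag(s) @ Vt; the first k rows of Vt are the PC loadings.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T              # p x k
    ref_scores = Xc @ loadings       # n_ref x k reference PC scores
    return mean, loadings, ref_scores

def simple_projection(X_study, mean, loadings):
    """SP: center study samples with the reference means, then project onto the loadings."""
    return (X_study - mean) @ loadings

def simulate_spiked(n, p, spike, u, rng):
    """Hypothetical data: rows ~ N(0, I + (spike - 1) u u^T), i.e. one ancestry axis u."""
    z = rng.standard_normal(n)
    return np.outer(z, np.sqrt(spike - 1.0) * u) + rng.standard_normal((n, p))

rng = np.random.default_rng(0)
p, n_ref, n_study, spike = 5000, 500, 1000, 20.0   # p / n_ref = 10
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
X_ref = simulate_spiked(n_ref, p, spike, u, rng)
X_study = simulate_spiked(n_study, p, spike, u, rng)

mean, loadings, ref_scores = fit_reference_pca(X_ref, k=1)
sp_scores = simple_projection(X_study, mean, loadings)
sp_ratio = sp_scores[:, 0].std() / ref_scores[:, 0].std()
print(f"SD(study PC1) / SD(reference PC1) = {sp_ratio:.2f}")  # noticeably below 1
```

The exact ratio depends on the random draw, but it stays well below 1: with far more variants than reference samples, the estimated loadings fit the reference panel better than they fit new samples, so projected study scores are pulled toward zero.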
RESULTS: We propose two alternative approaches: bias-adjusted projection (AP) and online ADP (OADP). Using random matrix theory, AP asymptotically estimates and adjusts for the bias of SP. OADP uses a computationally efficient online singular value decomposition algorithm, which can greatly reduce the computational cost of ADP. Extensive simulation studies showed that these alternative approaches are unbiased and can be 16 to 16 000 times faster than ADP. We applied our approaches to the UK Biobank data of 488 366 study samples, using 2492 samples from the 1000 Genomes data as the reference. AP and OADP required 0.82 and 21 CPU hours, respectively, whereas the projected computation time of ADP was 1628 CPU hours. Furthermore, when inferring sub-European ancestry, SP clearly showed bias, unlike the proposed approaches.
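One way to realize an AP-style adjustment is to rescale the SP scores by an estimated asymptotic shrinkage factor. The sketch below assumes a simplified spiked-covariance model with known unit noise variance and γ = p/n for a reference panel of n samples and p variants. Under that model, a reference eigenvalue l above the noise bulk corresponds to a population eigenvalue λ with l = λ(1 + γ/(λ − 1)), and projected scores shrink by the factor d = (λ − 1)/(λ − 1 + γ) relative to the reference scores (about 0.66 for λ = 20, γ = 10). The sketch inverts the first relation and divides the SP scores by d. The function names and settings are hypothetical; FRAPOSA's AP estimator is more general and may differ.

```python
import numpy as np

def estimate_spike(sample_eig, gamma):
    """Solve l = lam * (1 + gamma / (lam - 1)) for the population eigenvalue lam
    (spiked model, unit noise variance assumed)."""
    if sample_eig <= (1.0 + np.sqrt(gamma)) ** 2:   # inside the noise bulk: no detectable spike
        return np.nan
    b = sample_eig + 1.0 - gamma
    return (b + np.sqrt(b * b - 4.0 * sample_eig)) / 2.0

def shrinkage_factor(lam, gamma):
    """Asymptotic ratio SD(projected study scores) / SD(reference scores)."""
    return (lam - 1.0) / (lam - 1.0 + gamma)

def adjusted_projection(X_ref, X_study, k):
    """Hypothetical AP-style sketch: simple projection, then divide each PC by its shrinkage factor."""
    n, p = X_ref.shape
    gamma = p / n
    mean = X_ref.mean(axis=0)
    Xc = X_ref - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T
    sample_eigs = s[:k] ** 2 / n                     # eigenvalues of the reference sample covariance
    lam = np.array([estimate_spike(ev, gamma) for ev in sample_eigs])
    d = shrinkage_factor(lam, gamma)
    sp_scores = (X_study - mean) @ loadings
    return sp_scores / d, Xc @ loadings, d

rng = np.random.default_rng(1)
p, n_ref, n_study, spike = 5000, 500, 1000, 20.0
u = rng.standard_normal(p)
u /= np.linalg.norm(u)

def simulate_spiked(n):
    """Hypothetical data: rows ~ N(0, I + (spike - 1) u u^T)."""
    z = rng.standard_normal(n)
    return np.outer(z, np.sqrt(spike - 1.0) * u) + rng.standard_normal((n, p))

X_ref, X_study = simulate_spiked(n_ref), simulate_spiked(n_study)
ap_scores, ref_scores, d = adjusted_projection(X_ref, X_study, k=1)
ap_ratio = ap_scores[:, 0].std() / ref_scores[:, 0].std()
print(f"shrinkage factor d = {d[0]:.2f}, SD ratio after adjustment = {ap_ratio:.2f}")
```

After the adjustment the spread of the study scores should be close to that of the reference scores, whereas the unadjusted projection would be shrunk by roughly d.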
AVAILABILITY AND IMPLEMENTATION: The OADP and AP methods, as well as SP and ADP, have been implemented in the open-source Python software FRAPOSA, available at github.com/daviddaiweizhang/fraposa. CONTACT: leeshawn@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
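The idea behind OADP, replacing the full PCA that ADP runs for every study sample with an online SVD update, can be sketched in NumPy. This sketch assumes the study sample is centered with the reference means, appends it to the thin SVD of the centered reference matrix with a standard rank-one row-append update, and then maps the augmented PC scores back onto the reference PC space with a Procrustes transformation (scaling, rotation and translation) fitted on the reference samples. It keeps the same number of PCs in both spaces. All function names are hypothetical, and this is not FRAPOSA's implementation.

```python
import numpy as np

def svd_append_row(U, s, Vt, a):
    """Thin SVD of [X; a] from the thin SVD X = U diag(s) Vt, for one new row a."""
    m = Vt @ a                          # coordinates of a in the current row space
    resid = a - Vt.T @ m                # component of a orthogonal to that space
    rho = np.linalg.norm(resid)
    q = resid / rho if rho > 1e-12 else np.zeros_like(a)
    r = s.size
    # [X; a] = blockdiag(U, 1) @ K @ [Vt; q] with a small (r+1) x (r+1) core matrix K
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(s)
    K[r, :r] = m
    K[r, r] = rho
    Uk, s_new, Vkt = np.linalg.svd(K)
    U_big = np.zeros((U.shape[0] + 1, r + 1))
    U_big[:-1, :r] = U
    U_big[-1, r] = 1.0
    return U_big @ Uk, s_new, Vkt @ np.vstack([Vt, q])

def procrustes_fit(X, Y):
    """Scale rho, orthogonal Q and shift c minimizing ||Y - (rho X Q + c)||_F."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    W, sig, Zt = np.linalg.svd(Xc.T @ Yc)
    Q = W @ Zt
    rho = sig.sum() / (Xc ** 2).sum()
    c = my - rho * mx @ Q
    return rho, Q, c

def oadp_score(U, s, Vt, ref_scores, x_centered, k):
    """Predict the top-k PC scores of one centered study sample (hypothetical sketch)."""
    U2, s2, _ = svd_append_row(U, s, Vt, x_centered)
    aug_scores = U2[:, :k] * s2[:k]     # PC scores of all n_ref + 1 rows
    rho, Q, c = procrustes_fit(aug_scores[:-1], ref_scores)
    return rho * aug_scores[-1] @ Q + c

rng = np.random.default_rng(2)
n_ref, p, k = 200, 2000, 2
X_ref = rng.standard_normal((n_ref, p))      # random stand-in data, only to check the algebra
mean = X_ref.mean(axis=0)
U, s, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
ref_scores = U[:, :k] * s[:k]                # reference PC scores

x = rng.standard_normal(p) - mean            # one study sample, centered with the reference means
U2, s2, Vt2 = svd_append_row(U, s, Vt, x)
s_direct = np.linalg.svd(np.vstack([X_ref - mean, x]), compute_uv=False)
print(np.allclose(s2, s_direct, atol=1e-8))  # expected: True
print(oadp_score(U, s, Vt, ref_scores, x, k))
```

Only the updated left singular vectors and singular values are needed for the scores. A faster version would skip forming the new right singular vectors, since that product is where most of the per-sample cost of this sketch goes.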
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2020        PMID: 32196066      PMCID: PMC7267814          DOI: 10.1093/bioinformatics/btaa152

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


References (12 in total; 10 listed):

1.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

2.  Principal component analysis of genetic data.

Authors:  David Reich; Alkes L Price; Nick Patterson
Journal:  Nat Genet       Date:  2008-05       Impact factor: 38.330

3.  Ancestry estimation and control of population stratification for sequence-based association studies.

Authors:  Chaolong Wang; Xiaowei Zhan; Jennifer Bragg-Gresham; Hyun Min Kang; Dwight Stambolian; Emily Y Chew; Kari E Branham; John Heckenlively; Robert Fulton; Richard K Wilson; Elaine R Mardis; Xihong Lin; Anand Swaroop; Sebastian Zöllner; Gonçalo R Abecasis
Journal:  Nat Genet       Date:  2014-03-16       Impact factor: 38.330

4.  Fast principal-component analysis reveals convergent evolution of ADH1B in Europe and East Asia.

Authors:  Kevin J Galinsky; Gaurav Bhatia; Po-Ru Loh; Stoyan Georgiev; Sayan Mukherjee; Nick J Patterson; Alkes L Price
Journal:  Am J Hum Genet       Date:  2016-02-25       Impact factor: 11.025

5.  Convergence and prediction of principal component scores in high-dimensional settings.

Authors:  Seunggeun Lee; Fei Zou; Fred A Wright
Journal:  Ann Stat       Date:  2010-01-01       Impact factor: 4.028

6.  Differential confounding of rare and common variants in spatially structured populations.

Authors:  Iain Mathieson; Gil McVean
Journal:  Nat Genet       Date:  2012-02-05       Impact factor: 38.330

7.  UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors:  Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal:  PLoS Med       Date:  2015-03-31       Impact factor: 11.069

8.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

9.  Identification of a rare coding variant in complement 3 associated with age-related macular degeneration.

Authors:  Xiaowei Zhan; David E Larson; Chaolong Wang; Daniel C Koboldt; Yuri V Sergeev; Robert S Fulton; Lucinda L Fulton; Catrina C Fronick; Kari E Branham; Jennifer Bragg-Gresham; Goo Jun; Youna Hu; Hyun Min Kang; Dajiang Liu; Mohammad Othman; Matthew Brooks; Rinki Ratnapriya; Alexis Boleda; Felix Grassmann; Claudia von Strachwitz; Lana M Olson; Gabriëlle H S Buitendijk; Albert Hofman; Cornelia M van Duijn; Valentina Cipriani; Anthony T Moore; Humma Shahid; Yingda Jiang; Yvette P Conley; Denise J Morgan; Ivana K Kim; Matthew P Johnson; Stuart Cantsilieris; Andrea J Richardson; Robyn H Guymer; Hongrong Luo; Hong Ouyang; Christoph Licht; Fred G Pluthero; Mindy M Zhang; Kang Zhang; Paul N Baird; John Blangero; Michael L Klein; Lindsay A Farrer; Margaret M DeAngelis; Daniel E Weeks; Michael B Gorin; John R W Yates; Caroline C W Klaver; Margaret A Pericak-Vance; Jonathan L Haines; Bernhard H F Weber; Richard K Wilson; John R Heckenlively; Emily Y Chew; Dwight Stambolian; Elaine R Mardis; Anand Swaroop; Goncalo R Abecasis
Journal:  Nat Genet       Date:  2013-09-15       Impact factor: 38.330

10.  The UK Biobank resource with deep phenotyping and genomic data.

Authors:  Clare Bycroft; Colin Freeman; Desislava Petkova; Gavin Band; Lloyd T Elliott; Kevin Sharp; Allan Motyer; Damjan Vukcevic; Olivier Delaneau; Jared O'Connell; Adrian Cortes; Samantha Welsh; Alan Young; Mark Effingham; Gil McVean; Stephen Leslie; Naomi Allen; Peter Donnelly; Jonathan Marchini
Journal:  Nature       Date:  2018-10-10       Impact factor: 49.962

