Daiwei Zhang (1), Rounak Dey (2), Seunggeun Lee (1)
(1) Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
(2) Department of Biostatistics, Harvard University, Boston, MA 02115, USA
Abstract
MOTIVATION: Population stratification (PS) is a major confounder in genome-wide association studies (GWAS) and can lead to false-positive associations. Principal component analysis (PCA)-based ancestry prediction is widely used to adjust for PS. Two popular approaches for predicting the PC scores of study samples are simple projection (SP), which is based on principal component loadings, and the more recent data augmentation, decomposition and Procrustes (ADP) transformation, as implemented in LASER and TRACE. However, the PC scores predicted by SP can be biased toward the null (shrunk toward zero). ADP, on the other hand, is computationally expensive because it requires running PCA on the augmented dataset separately for each study sample.

RESULTS: We propose two alternative approaches: bias-adjusted projection (AP) and online ADP (OADP). Using random matrix theory, AP asymptotically estimates and adjusts for the bias of SP. OADP uses a computationally efficient online singular value decomposition algorithm, which greatly reduces the computation cost of ADP. Extensive simulation studies show that these approaches are unbiased and can be 16 to 16 000 times faster than ADP. We applied our approaches to the UK Biobank data of 488 366 study samples, with 2492 samples from the 1000 Genomes data as the reference. AP and OADP required 0.82 and 21 CPU hours, respectively, whereas the projected computation time of ADP was 1628 CPU hours. Furthermore, when inferring sub-European ancestry, SP clearly showed bias, whereas the proposed approaches did not.

AVAILABILITY AND IMPLEMENTATION: The OADP and AP methods, as well as SP and ADP, have been implemented in the open-source Python software FRAPOSA, available at github.com/daviddaiweizhang/fraposa.

CONTACT: leeshawn@umich.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
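To make the contrast between SP and ADP concrete, the following is a minimal NumPy sketch on simulated data. It is not the FRAPOSA implementation: the function names, the simulation settings, and the simplification of using the same number of PCs before and after augmentation are all assumptions made for illustration. With many more features than reference samples, the SP scores of new samples should come out noticeably shrunk toward the origin relative to the reference scores, while the ADP scores should not, at the price of one full decomposition per study sample.

```python
import numpy as np

def pca_reference(X_ref, k):
    """PCA of the reference panel via SVD: feature means, loadings, PC scores."""
    mean = X_ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, Vt[:k].T, U[:, :k] * s[:k]

def simple_projection(x, mean, loadings):
    """SP: centre with the reference means, then project onto the loadings."""
    return (x - mean) @ loadings

def procrustes_map(A, B):
    """Scale rho, orthogonal R and shift c minimising ||rho * A @ R + c - B||_F."""
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - a_mean, B - b_mean
    U, s, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt                      # rotation, possibly with a reflection
    rho = s.sum() / (A0 ** 2).sum()
    c = b_mean - rho * a_mean @ R
    return rho, R, c

def adp_predict(x, X_ref, mean, ref_scores, k):
    """Simplified ADP for one study sample: augment, decompose, Procrustes."""
    aug = np.vstack([X_ref - mean, x - mean])            # augmentation
    U, s, _ = np.linalg.svd(aug, full_matrices=False)    # decomposition
    aug_scores = U[:, :k] * s[:k]
    # Align the reference rows of the augmented PCA with the original PCA,
    # then apply the same transformation to the study sample (last row).
    rho, R, c = procrustes_map(aug_scores[:-1], ref_scores)
    return rho * aug_scores[-1] @ R + c

# Simulated data (assumed settings): three populations, p >> n_ref.
rng = np.random.default_rng(0)
p, k = 2000, 2
centers = rng.normal(0.0, 0.2, size=(3, p))

def draw(n_each):
    labels = np.repeat(np.arange(3), n_each)
    return centers[labels] + rng.normal(size=(labels.size, p))

X_ref, X_study = draw(50), draw(10)
mean, loadings, ref_scores = pca_reference(X_ref, k)

sp = simple_projection(X_study, mean, loadings)
adp = np.array([adp_predict(x, X_ref, mean, ref_scores, k) for x in X_study])

def mean_norm(scores):
    """Average distance of the PC scores from the origin."""
    return np.linalg.norm(scores, axis=1).mean()

ratio_sp = mean_norm(sp) / mean_norm(ref_scores)
ratio_adp = mean_norm(adp) / mean_norm(ref_scores)
print(f"SP  study/reference spread ratio: {ratio_sp:.2f}")
print(f"ADP study/reference spread ratio: {ratio_adp:.2f}")
```

A ratio well below 1 for SP indicates shrinkage, and a ratio near 1 for ADP indicates that the study samples sit on the same scale as the reference. The loop over study samples, each with its own SVD, is the per-sample cost that the abstract says OADP reduces with an online SVD update; that update is not sketched here.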