MOTIVATION: Inference of ancestry using genetic data is motivated by applications in genetic association studies, population genetics and personal genomics. Here, we provide methods and software for improved ancestry inference using genome-wide single nucleotide polymorphism (SNP) weights from external reference panels. This approach makes it possible to leverage the rich ancestry information that is available from large external reference panels, without the administrative and computational complexities of re-analyzing the raw genotype data from the reference panel in subsequent studies.

RESULTS: We extensively validate our approach in multiple African American, Latino American and European American datasets, making use of genome-wide SNP weights derived from large reference panels, including HapMap 3 populations and 6546 European Americans from the Framingham Heart Study. We show empirically that our approach provides much greater accuracy than either the prevailing ancestry-informative marker (AIM) approach or the analysis of genome-wide target genotypes without a reference panel. For example, in an independent set of 1636 European American genome-wide association study samples, we attained prediction accuracy (R²) of 1.000 and 0.994 for the first two principal components using our method, compared with 0.418 and 0.407 using 150 published AIMs or 0.955 and 0.003 by applying principal component analysis directly to the target samples. We finally show that the higher accuracy in inferring ancestry using our method leads to more effective correction for population stratification in association studies.

AVAILABILITY: The SNPweights software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
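The general idea of reusing SNP weights from a reference panel can be illustrated with a minimal sketch. This is not the SNPweights software or its file format; it assumes a standard principal component analysis of normalized genotypes, and every function name, variable name and simulated value below is hypothetical. Weights are computed once from reference genotypes (coded 0/1/2) and then applied to target samples, so the raw reference genotypes are not needed again.

```python
import numpy as np


def snp_weights_from_reference(ref_genotypes, n_components=2):
    """Derive per-SNP weights for the top PCs of a reference panel (illustrative)."""
    ref = np.asarray(ref_genotypes, dtype=float)
    freq = ref.mean(axis=0) / 2.0            # reference allele frequencies
    keep = (freq > 0.0) & (freq < 1.0)       # drop SNPs monomorphic in the panel
    freq = freq[keep]
    scale = np.sqrt(2.0 * freq * (1.0 - freq))
    z = (ref[:, keep] - 2.0 * freq) / scale  # normalized reference genotypes
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # Dividing the SNP loadings by the singular values makes z @ weights
    # reproduce the reference samples' PC coordinates exactly.
    weights = vt[:n_components].T / s[:n_components]
    return {"keep": keep, "freq": freq, "scale": scale, "weights": weights}


def project_target(target_genotypes, model):
    """Apply stored reference frequencies and weights to new samples."""
    tgt = np.asarray(target_genotypes, dtype=float)[:, model["keep"]]
    z = (tgt - 2.0 * model["freq"]) / model["scale"]
    return z @ model["weights"]


# Simulated example (assumption): two populations with differing allele frequencies.
rng = np.random.default_rng(0)
n_snps = 300
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)

reference = np.vstack([
    rng.binomial(2, freq_a, size=(100, n_snps)),
    rng.binomial(2, freq_b, size=(100, n_snps)),
])
target = np.vstack([
    rng.binomial(2, freq_a, size=(20, n_snps)),
    rng.binomial(2, freq_b, size=(20, n_snps)),
])

model = snp_weights_from_reference(reference, n_components=2)
reference_pcs = project_target(reference, model)
target_pcs = project_target(target, model)
```

In this sketch only `model` (the kept SNPs, their reference frequencies and the weights) would have to be shared with a later study. The target samples are normalized with the reference frequencies rather than their own, which keeps them on the reference coordinate system.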