MOTIVATION: The rapid development of genotyping technology and extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream approach to gene mapping of complex human diseases. For many diseases, the most practical approach is the population-based design with unrelated individuals. Although this design offers easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can lead to both false-positive and false-negative findings and might obscure the true association signals if not appropriately corrected. METHODS: We report PHYLOSTRAT, a new method that corrects for population stratification by combining a phylogeny constructed from SNP genotypes with principal coordinates from multi-dimensional scaling (MDS) analysis. This hybrid approach efficiently captures both discrete and admixed population structures. RESULTS: Through extensive simulations, the analysis of a synthetic genome-wide association dataset created using data from the Human Genome Diversity Project, and the analysis of a lactase-height dataset, we show that our method corrects for population stratification more efficiently than several existing correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in that it requires fewer random SNPs to infer population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the ability of MDS to represent admixture, our hybrid approach can effectively capture the complex structures found in human populations. SOFTWARE AVAILABILITY: Code can be downloaded from http://people.pcbi.upenn.edu/~lswang/phylostrat/ CONTACT: mingyao@upenn.edu; iswang@upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
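As an illustration of the MDS ingredient mentioned under METHODS, the following minimal sketch computes principal coordinates from a matrix of SNP genotypes by classical MDS. It is not the PHYLOSTRAT implementation: the function name `principal_coordinates`, the use of squared Euclidean distance between 0/1/2 allele-count vectors, and the toy two-population data are all assumptions made for this example, and the phylogeny component of the method is not shown.

```python
import numpy as np


def principal_coordinates(genotypes, n_components=2):
    """Classical MDS on an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Hypothetical helper for illustration; the distance measure is an assumption.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]

    # Pairwise squared Euclidean distances between genotype vectors.
    sq_norms = np.sum(g * g, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (g @ g.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors

    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * (j @ d2 @ j)

    # Eigen-decompose the symmetric matrix B and keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)

    # Each row holds one individual's principal coordinates.
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (assumed, not from the paper): two populations whose allele
# frequencies differ at 200 simulated SNPs, 20 individuals each.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, 200))
pop_b = rng.binomial(2, freq_b, size=(20, 200))
toy_genotypes = np.vstack([pop_a, pop_b])

coords = principal_coordinates(toy_genotypes, n_components=2)
```

In an association analysis, coordinates of this kind would typically be included as covariates alongside the tested SNP; how PHYLOSTRAT combines them with the phylogeny is described in the full paper rather than in this sketch.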