Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F Seldin, Christopher I Amos.
Abstract
BACKGROUND: Identifying subpopulations within a study and inferring the intercontinental ancestry of the samples are important steps in genome-wide association studies. Two software packages are widely used to analyze substructure: Structure and Eigenstrat. Structure assigns each individual to a population using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and cannot create scores that could be used as covariates. Eigenstrat uses principal component analysis to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the eigenvectors generated by Eigenstrat are sample specific and thus cannot be generalized to other individuals.
Year: 2016 PMID: 26961892 PMCID: PMC4784403 DOI: 10.1186/s12859-016-0965-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Proportion of ancestry inference using the first and second PCA scores. The X and Y axes denote the first and second PCA scores generated by FastPop. Red, blue and green denote HapMap samples with European, African and Asian ancestry, respectively; black denotes study samples with unknown ancestry. The centroid of each population was computed from the HapMap samples. Three dark green lines connect the centroids; six additional grey lines are drawn through the centroids, perpendicular to the triangle sides. Individuals in areas 1–3 are classified as being of pure European, African and Asian origin; samples in areas 4–6 have mixed origin from two adjacent populations; samples in area 7 have mixed origin of European, African and Asian ancestry. h1, h2 and h3 denote the distances from a sample inside the triangle to the sides of the triangle; l1-l6 denote the distance between the image of the sample on the triangle sides and the population
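The triangle geometry in this caption can be illustrated with barycentric coordinates: for a point inside a triangle, the weight of each vertex equals the point's distance to the opposite side divided by the triangle's altitude from that vertex. The sketch below is only an illustration under that assumption, not FastPop's published formula. The function names are hypothetical, and clipping negative weights is a crude stand-in for the region rules (areas 1–6) that the caption describes for points outside the triangle.

```python
import numpy as np

def centroid(points):
    """Mean (PC1, PC2) position of one reference population."""
    return np.asarray(points, dtype=float).mean(axis=0)

def barycentric_proportions(sample, c_eur, c_afr, c_asn):
    """Illustrative ancestry weights of `sample` relative to three centroids.

    Assumption: proportions are taken as barycentric coordinates in the
    PC1/PC2 plane. Negative weights (points outside the triangle) are
    clipped to zero and the rest renormalised; this is a simplification,
    not the region-based rule from the figure.
    """
    # Solve  w_eur*c_eur + w_afr*c_afr + w_asn*c_asn = sample,  sum(w) = 1
    A = np.vstack([np.column_stack([c_eur, c_afr, c_asn]), np.ones(3)])
    b = np.append(np.asarray(sample, dtype=float), 1.0)
    w = np.linalg.solve(A, b)
    w = np.clip(w, 0.0, None)
    return w / w.sum()

# Toy centroids (made-up coordinates, not real PCA scores)
c_eur = centroid([[0.0, 0.0], [0.0, 0.0]])
c_afr = centroid([[1.0, 0.0]])
c_asn = centroid([[0.0, 1.0]])
inside = barycentric_proportions([0.25, 0.25], c_eur, c_afr, c_asn)
outside = barycentric_proportions([-1.0, -1.0], c_eur, c_afr, c_asn)
```

With these toy centroids, a sample at (0.25, 0.25) receives weights of one half European and one quarter each African and Asian, while a sample far beyond the European corner collapses to a pure European assignment.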
Fig. 4 Panel a displays four intercontinental populations and one mixed population in two dimensions. The tetrahedron model in panel b can be applied to the extended intercontinental analysis. European, Asian, African and Native American are four distinct populations, denoted in red, green, blue and purple, respectively. Mexican American is a mixed population and is shown in black. Each intercontinental population appears in three combinations, one for each face of the tetrahedron that contains it. First, FastPop is applied to infer ancestry on each face of the tetrahedron; then the proportions are averaged for each intercontinental population to summarize ancestry
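The face-averaging step in this caption can be sketched as follows. This is a minimal illustration, assuming per-face proportions have already been estimated; the population labels, the `summarize_faces` name and the example numbers are all hypothetical. Note that if each face's proportions sum to 1, the raw per-population averages sum to 4/3; whether and how FastPop rescales them is not stated in this record, so the sketch returns the raw averages.

```python
from itertools import combinations

POPULATIONS = ("EUR", "ASN", "AFR", "NAM")  # hypothetical labels

# The four triangular faces of the tetrahedron: every 3-subset of 4 populations
FACES = list(combinations(POPULATIONS, 3))

def summarize_faces(face_estimates):
    """Average each population's proportion over the faces that contain it.

    `face_estimates` maps a face (3-tuple of labels) to a dict of
    label -> proportion estimated on that face.
    """
    totals = {pop: 0.0 for pop in POPULATIONS}
    counts = {pop: 0 for pop in POPULATIONS}
    for face, props in face_estimates.items():
        for pop in face:
            totals[pop] += props[pop]
            counts[pop] += 1
    return {pop: totals[pop] / counts[pop] for pop in POPULATIONS if counts[pop]}

# Made-up per-face estimates for one individual
example = {
    ("EUR", "ASN", "AFR"): {"EUR": 0.6, "ASN": 0.1, "AFR": 0.3},
    ("EUR", "ASN", "NAM"): {"EUR": 0.5, "ASN": 0.1, "NAM": 0.4},
    ("EUR", "AFR", "NAM"): {"EUR": 0.4, "AFR": 0.2, "NAM": 0.4},
    ("ASN", "AFR", "NAM"): {"ASN": 0.2, "AFR": 0.3, "NAM": 0.5},
}
summary = summarize_faces(example)
```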
Fig. 2 Flow chart of intercontinental ancestry analysis using FastPop
Fig. 3 Comparison of the estimated proportion of ancestry between FastPop and Structure for 19661 individuals. The X and Y axes denote the proportion of ancestry for each individual from FastPop and Structure
Comparison of assigned ancestry using different cutoff value between FastPop and Structure
| Cutoff value | | CEU | YRI | CHB |
|---|---|---|---|---|
| Without prior population information for Hapmap samples | ||||
| 0.9 | PCA Scores | 17520/16329/16325 | 64/62/55 | 740/721/719 |
| 0.8 | PCA Scores | 18016/17928/17928 | 175/165/159 | 773/768/767 |
| 0.7 | PCA Scores | 18171/18122/18122 | 266/263/260 | 799/796/796 |
| With prior population information for Hapmap samples | ||||
| 0.9 | PCA Scores | 17510/16329/16321 | 69/62/59 | 743/721/719 |
| 0.8 | PCA Scores | 18017/17928/17928 | 174/165/159 | 774/768/768 |
| 0.7 | PCA Scores | 18167/18122/18121 | 267/263/260 | 799/796/796 |
The numbers in each cell are: number assigned by Structure / number assigned by FastPop / number assigned by both methods. The Structure analysis was conducted with and without prior population information for the HapMap samples. Structure was run under the admixture model with the parameter settings “BURNIN 10000 NUMREPS 1000 INFERALPHA 1 POPSPECIFICLAMBDA 1”. The running time was 21:16:00 without prior population information and 23:30:00 with prior population information
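The three counts per cell can be reproduced with a small tally once per-individual ancestry proportions from both methods are available. The sketch below assumes an individual is assigned to a population when its largest estimated proportion is at least the cutoff; the exact comparison rule is an assumption, and the function names and toy data are hypothetical.

```python
POPS = ("CEU", "YRI", "CHB")

def assign(proportions, cutoff):
    """Return the population whose proportion reaches the cutoff, else None."""
    pop = max(POPS, key=lambda p: proportions[p])
    return pop if proportions[pop] >= cutoff else None

def compare_assignments(structure, fastpop, cutoff):
    """Per population: (n assigned by Structure, n by FastPop, n by both)."""
    counts = {pop: [0, 0, 0] for pop in POPS}
    # Only individuals present in both result sets are compared
    for sample_id in structure.keys() & fastpop.keys():
        a = assign(structure[sample_id], cutoff)
        b = assign(fastpop[sample_id], cutoff)
        if a is not None:
            counts[a][0] += 1
        if b is not None:
            counts[b][1] += 1
        if a is not None and a == b:
            counts[a][2] += 1
    return {pop: tuple(c) for pop, c in counts.items()}

# Toy proportions for three individuals (not data from the study)
structure = {
    "s1": {"CEU": 0.95, "YRI": 0.03, "CHB": 0.02},
    "s2": {"CEU": 0.85, "YRI": 0.10, "CHB": 0.05},
    "s3": {"CEU": 0.05, "YRI": 0.92, "CHB": 0.03},
}
fastpop = {
    "s1": {"CEU": 0.91, "YRI": 0.05, "CHB": 0.04},
    "s2": {"CEU": 0.92, "YRI": 0.05, "CHB": 0.03},
    "s3": {"CEU": 0.07, "YRI": 0.88, "CHB": 0.05},
}
result = compare_assignments(structure, fastpop, cutoff=0.9)
```

Lowering the cutoff in this toy example assigns more individuals under both methods, which mirrors the trend across the 0.9, 0.8 and 0.7 rows of the table.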