Dmitry Prokopenko, Julian Hecker, Edwin Silverman, Markus M Nöthen, Matthias Schmid, Christoph Lange, Heide Loehlein Fier.
Abstract
A major caveat of association studies is that their results can be biased by population stratification. Existing methods rely either on model-based approaches such as STRUCTURE and ADMIXTURE or on principal component analysis, as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms to identify homogeneous subgroups, or communities, which can then be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.
Year: 2015 PMID: 26098940 PMCID: PMC4476755 DOI: 10.1371/journal.pone.0130708
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
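The pipeline outlined in the abstract (triads built from a pairwise similarity measure, merged into a network, then split into groups) can be illustrated with a minimal sketch. Several points here are assumptions rather than the authors' procedure: a triad is taken to be an individual plus its two most similar peers, connected components stand in for a real community detection algorithm, and the function names and toy similarity matrix are hypothetical.

```python
from collections import deque

def build_triad_network(similarity):
    """Assumed triad rule: link each individual to its two most similar peers."""
    n = len(similarity)
    edges = set()
    for i in range(n):
        # Order all other individuals by decreasing similarity to i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: similarity[i][j], reverse=True)
        a, b = others[0], others[1]
        # The triad {i, a, b} contributes three undirected edges.
        for u, v in ((i, a), (i, b), (a, b)):
            edges.add((min(u, v), max(u, v)))
    return edges

def connected_components(n, edges):
    """Breadth-first search; a simple stand-in for community detection."""
    adjacency = {node: set() for node in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return components

# Toy similarity matrix (hypothetical): two groups of three individuals,
# high similarity (0.9) within a group and low similarity (0.1) between groups.
groups = [0, 0, 0, 1, 1, 1]
similarity = [[0.9 if groups[i] == groups[j] else 0.1 for j in range(6)]
              for i in range(6)]

edges = build_triad_network(similarity)
components = connected_components(6, edges)
print(components)
```

The resulting group labels could then be one-hot encoded and added as covariates to a logistic regression model, as the abstract describes.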
Ranking procedure for individual i.
| Subject | Covariance value with i | Rank |
|---|---|---|
| 1 | 0.004918 | 4.5 |
| 2 | 0.014093 | 2 |
| 3 | 0.000124 | 6 |
| 4 | 0.028862 | 1 |
| 5 | 0.004918 | 4.5 |
| 6 | 0.012716 | 3 |
| . . . | . . . | . . . |
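The ranks in the table above can be reproduced with a short sketch. It assumes, judging from the two tied values that both receive rank 4.5, that the largest covariance gets rank 1 and that ties share the average of the positions they occupy. The function name `rank_descending` is hypothetical.

```python
def rank_descending(values):
    """Rank from largest (rank 1) to smallest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1 occupied by the tied block.
        average_rank = (pos + 1 + end + 1) / 2
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return ranks

# Covariance values of subjects 1 to 6 with individual i, taken from the table.
covariances = [0.004918, 0.014093, 0.000124, 0.028862, 0.004918, 0.012716]
ranks = rank_descending(covariances)
print(ranks)  # [4.5, 2.0, 6.0, 1.0, 4.5, 3.0]
```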
Description of the datasets used in the analysis.
| Dataset | Number of individuals | Number of subpopulations | Number of SNPs (MAF>0.05) |
|---|---|---|---|
| 1) Americans | 174 | 3 | 930369 |
| 2) Africans | 229 | 3 | 2011030 |
| 3) Asians | 279 | 3 | 716976 |
| 4) Europeans | 378 | 5 | 851672 |
Fig 13 American subpopulations.
The polygons around the nodes represent the detected communities. The node colors represent the actual labels.
Fig 45 European subpopulations.
The polygons around the nodes represent the detected communities. The node colors represent the actual labels.
Contingency table for American subpopulations. Rows correspond to detected communities, columns to actual subpopulations.
| CLM | MXL | PUR |
|---|---|---|
| 0 | 0 | 18 |
| 0 | 0 | 3 |
| 0 | 0 | 10 |
| 7 | 0 | 8 |
| 0 | 0 | 15 |
| 5 | 0 | 0 |
| 10 | 0 | 0 |
| 10 | 0 | 0 |
| 23 | 0 | 0 |
| 0 | 30 | 0 |
| 5 | 10 | 0 |
| 0 | 16 | 1 |
| 0 | 3 | 0 |
PUR: Puerto Rican; CLM: Colombian; MXL: Mexican
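One way to read a contingency table like the American one above is to check how many communities mix individuals from more than one subpopulation. The sketch below does this and also computes cluster purity (the share of individuals who belong to the majority subpopulation of their community). Purity is not a statistic reported in the source; it is used here only as an illustrative summary, and the helper names are hypothetical.

```python
# Rows are detected communities; columns are CLM, MXL, PUR (values from the table).
american = [
    [0, 0, 18], [0, 0, 3], [0, 0, 10], [7, 0, 8], [0, 0, 15],
    [5, 0, 0], [10, 0, 0], [10, 0, 0], [23, 0, 0],
    [0, 30, 0], [5, 10, 0], [0, 16, 1], [0, 3, 0],
]

def column_totals(table):
    """Number of individuals per actual subpopulation."""
    return [sum(column) for column in zip(*table)]

def mixed_communities(table):
    """Count communities that contain more than one subpopulation."""
    return sum(1 for row in table if sum(1 for cell in row if cell > 0) > 1)

def purity(table):
    """Share of individuals in the majority subpopulation of their community."""
    total = sum(sum(row) for row in table)
    return sum(max(row) for row in table) / total

totals = column_totals(american)
print(totals, sum(totals))          # per-subpopulation counts and overall size
print(mixed_communities(american))  # communities spanning several subpopulations
print(round(purity(american), 3))
```

The column totals sum to 174, which agrees with the number of individuals listed for the American dataset.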
Contingency table for European subpopulations. Rows correspond to detected communities, columns to actual subpopulations.
| CEU | FIN | GBR | IBS | TSI |
|---|---|---|---|---|
| 0 | 0 | 14 | 0 | 0 |
| 0 | 0 | 4 | 0 | 0 |
| 8 | 0 | 16 | 0 | 0 |
| 5 | 0 | 22 | 0 | 0 |
| 0 | 32 | 0 | 0 | 0 |
| 0 | 26 | 0 | 0 | 0 |
| 0 | 13 | 0 | 0 | 0 |
| 0 | 16 | 0 | 0 | 0 |
| 0 | 5 | 0 | 0 | 0 |
| 0 | 0 | 0 | 6 | 0 |
| 0 | 0 | 0 | 8 | 0 |
| 8 | 0 | 14 | 0 | 0 |
| 3 | 1 | 9 | 0 | 0 |
| 6 | 0 | 8 | 0 | 0 |
| 17 | 0 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 | 0 |
| 26 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 24 |
| 0 | 0 | 0 | 0 | 9 |
| 0 | 0 | 0 | 0 | 47 |
| 0 | 0 | 0 | 0 | 4 |
| 0 | 0 | 0 | 0 | 14 |
GBR: British in England and Scotland; FIN: Finnish; IBS: Iberian in Spain; CEU: Utah residents with Northern and Western European ancestry; TSI: Toscani in Italy
Average proportions of significant SNPs in the simulation study.
| SNP type | naive | PCA (1 or 2 components)* | PCA (10 components) | ADMIXTURE (1 or 2 ancestry estimates)* | Unconnected components | Detected communities |
|---|---|---|---|---|---|---|
| Scenario 1 | | | | | | |
| Random SNPs | 0.0007397 | 0.0000835 | 0.0000879 | 0.0000835 | 0.0000841 | 0.0001035 |
| Differentiated SNPs | 0.8471269 | 0.0000849 | 0.0000917 | 0.0000833 | 0.0000854 | 0.0001003 |
| Causal SNPs | 0.5035125 | 0.4839071 | 0.4836014 | 0.4838919 | 0.4833888 | 0.4820485 |
| Scenario 2 | | | | | | |
| Random SNPs | 0.0349635 | 0.0000852 | 0.0000926 | 0.0000851 | 0.0000829 | 0.0001029 |
| Differentiated SNPs | 1 | 0.0000889 | 0.0000979 | 0.0000892 | 0.0000772 | 0.000094 |
| Causal SNPs | 0.5024409 | 0.2571964 | 0.2585007 | 0.2573263 | 0.2545517 | 0.2562946 |
| Scenario 3 | | | | | | |
| Random SNPs | 0.0010452 | 0.0000867 | 0.0000925 | 0.0000867 | 0.0000871 | 0.0001042 |
| Differentiated SNPs | 0.9981074 | 0.0000874 | 0.0000936 | 0.0000874 | 0.0000864 | 0.0001009 |
| Causal SNPs | 0.5007428 | 0.4588232 | 0.4592651 | 0.4588511 | 0.4587604 | 0.4605026 |
| Scenario 4 | | | | | | |
| Random SNPs | 0.0006067 | 0.0000913 | 0.0000972 | 0.0000909 | 0.0006084 | 0.0001426 |
| Differentiated SNPs | 0.7514909 | 0.0000912 | 0.0000987 | 0.0000911 | 0.7527596 | 0.0130633 |
| Causal SNPs | 0.5087061 | 0.4445503 | 0.4431812 | 0.4445344 | 0.5086953 | 0.4694344 |
The values in the table are the proportions of SNPs found to be significant, averaged over 10 replications. The significance level was set to 0.0001. Results are presented for the 4 scenarios described in the section "Evaluation via simulated association studies".
* For these methods in the scenario with 3 underlying discrete subpopulations we took 2 principal components and 2 ancestry estimates, as recommended by the authors.
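For SNPs with no true effect, the proportion found significant is an empirical estimate of the type 1 error, so it should sit close to the significance level of 0.0001. The sketch below shows that calculation and the ratio of observed to nominal error for three values from the Random SNPs row in the first block of the table. The function names and the toy p-values are hypothetical.

```python
ALPHA = 0.0001  # significance level stated in the table note

def proportion_significant(p_values, alpha=ALPHA):
    """Empirical rejection rate: share of p-values below alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)

def inflation(observed_proportion, alpha=ALPHA):
    """Observed rejection rate relative to the nominal level (1.0 means no inflation)."""
    return observed_proportion / alpha

# Toy p-values (hypothetical) to show the calculation itself.
toy_rate = proportion_significant([0.00005, 0.2, 0.5, 0.9])

# Random SNPs, first block of the table: naive, PCA (1 or 2 components),
# and detected communities.
random_snps = {
    "naive": 0.0007397,
    "PCA (1 or 2 components)": 0.0000835,
    "Detected communities": 0.0001035,
}
ratios = {method: round(inflation(value), 3) for method, value in random_snps.items()}
print(toy_rate)
print(ratios)
```

A ratio well above 1, as for the naive analysis here, indicates more false positives than the nominal level allows.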
Contingency table for African subpopulations. Rows correspond to detected communities, columns to actual subpopulations.
| ASW | LWK | YRI |
|---|---|---|
| 0 | 0 | 12 |
| 0 | 0 | 11 |
| 0 | 0 | 19 |
| 0 | 0 | 16 |
| 0 | 0 | 14 |
| 0 | 0 | 15 |
| 0 | 7 | 0 |
| 0 | 19 | 0 |
| 0 | 27 | 0 |
| 0 | 34 | 0 |
| 3 | 0 | 0 |
| 3 | 0 | 0 |
| 5 | 0 | 0 |
| 25 | 0 | 0 |
| 10 | 0 | 0 |
| 9 | 0 | 0 |
YRI: Yoruba in Nigeria; LWK: Luhya in Kenya; ASW: African ancestry in Southwest US
Contingency table for Asian subpopulations. Rows correspond to detected communities, columns to actual subpopulations.
| CHB | CHS | JPT |
|---|---|---|
| 6 | 11 | 0 |
| 0 | 5 | 0 |
| 4 | 20 | 0 |
| 2 | 21 | 0 |
| 0 | 14 | 0 |
| 17 | 0 | 0 |
| 18 | 0 | 0 |
| 4 | 2 | 0 |
| 5 | 12 | 0 |
| 25 | 0 | 1 |
| 16 | 8 | 0 |
| 0 | 0 | 17 |
| 0 | 0 | 11 |
| 0 | 0 | 13 |
| 0 | 0 | 14 |
| 0 | 0 | 5 |
| 0 | 0 | 13 |
| 0 | 0 | 15 |
CHS: Southern Han Chinese; CHB: Han Chinese in Beijing; JPT: Japanese in Tokyo