Hong Gao, Katarzyna Bryc, Carlos D Bustamante.
Abstract
Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
Year: 2011 PMID: 21738600 PMCID: PMC3125185 DOI: 10.1371/journal.pone.0021014
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
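To make the criterion named in the abstract concrete, the following is a minimal sketch of how DIC could be computed from MCMC output and used to pick K. It assumes the standard textbook definitions (DIC as mean deviance plus an effective-parameter penalty, with either the pD or the variance-based pV penalty); the record above does not state which variant the authors used, and the function names `dic` and `choose_k` and all numbers are hypothetical.

```python
from statistics import mean, variance


def dic(deviance_samples, deviance_at_posterior_mean=None, variant="pD"):
    """Deviance Information Criterion from posterior deviance samples.

    variant="pD": penalty = mean deviance - deviance at the posterior mean.
    variant="pV": penalty = half the sample variance of the deviance.
    Which variant the source paper used is an assumption left open here.
    """
    d_bar = mean(deviance_samples)
    if variant == "pD":
        if deviance_at_posterior_mean is None:
            raise ValueError("pD needs the deviance at the posterior mean")
        penalty = d_bar - deviance_at_posterior_mean
    elif variant == "pV":
        penalty = variance(deviance_samples) / 2
    else:
        raise ValueError("variant must be 'pD' or 'pV'")
    return d_bar + penalty


def choose_k(dic_by_k):
    """Return the candidate K with the smallest DIC."""
    return min(dic_by_k, key=dic_by_k.get)


# Hypothetical deviance samples, for illustration only.
example_pd = dic([10.0, 12.0, 14.0], deviance_at_posterior_mean=11.0)
example_pv = dic([10.0, 12.0, 14.0], variant="pV")
best_k = choose_k({1: 30.0, 2: example_pd, 3: 15.5})
```

In this sketch the model with the lowest DIC is preferred, so `choose_k` returns the K whose fit is best after the complexity penalty.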
Figure 1. Subpopulation topology of Model Split and Model Tree for K ranging from three to five.
In Model Split, subpopulations are split from one ancestral population simultaneously, forming a star-shaped topology. In Model Tree, populations separate at different time points, forming a tree-shaped topology. The time interval between two consecutive dashed lines is 0.5, measured in generations scaled by the effective population size.
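The two topologies in the caption can be sketched as lists of backward-in-time merge events of the form (time, derived population, ancestral population). This is only an illustration: the 0.5 default for the Model Split time, the one-merge-per-interval ordering used for Model Tree, and the function names are assumptions, since the caption does not give the exact branching order.

```python
def split_events(k, split_time=0.5):
    """Model Split: populations 2..k all merge into population 1 at one time.

    This yields a star-shaped topology. split_time=0.5 is an assumed default.
    """
    return [(split_time, pop, 1) for pop in range(2, k + 1)]


def tree_events(k, interval=0.5):
    """Model Tree: one merge per time interval, going backward in time.

    Population k joins k-1 first, then k-1 joins k-2, and so on. This
    ladder-like ordering is an assumption made for illustration.
    """
    events = []
    for step, pop in enumerate(range(k, 1, -1), start=1):
        events.append((step * interval, pop, pop - 1))
    return events


star = split_events(3)
ladder = tree_events(4)
```

Events in this form map naturally onto the population-join options that coalescent simulators typically accept.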
Figure 2. Performance of DIC on one data set simulated under Model Split for each true value of K: 1, 2, 3 and 5.
Table 1. Accuracy of multiple estimators under Models Split and Tree.
| Model | Split | Split | Split | Split | Split | Tree | Tree | Tree |
| K | 1 | 2 | 3 | 4 | 5 | 3 | 4 | 5 |
| F_ST | | 0.495 | 0.502 | 0.493 | 0.492 | 0.486 | 0.507 | 0.501 |
| DIC | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
| STRUCTURE | 0.90 | 1.00 | 1.00 | 0.86 | 0.80 | 0.98 | 0.94 | 0.72 |
| STRUCTURE, F model | 0.90 | 0.98 | 0.94 | 0.82 | 0.54 | 0.90 | 0.82 | 0.62 |
| ΔK | | 1.00 | 0.94 | 0.70 | 0.64 | 0.80 | 0.86 | 0.64 |
| ΔK, F model | | 1.00 | 0.90 | 0.78 | 0.50 | 0.84 | 0.92 | 0.54 |
| Eigenanalysis, | 0.97 | 0.89 | 0.86 | 0.86 | 0.96 | 0.96 | 0.92 | 0.90 |
| Eigenanalysis, | 1.00 | 0.96 | 0.91 | 0.93 | 0.99 | 0.98 | 0.94 | 0.92 |
| Eigenanalysis, | 1.00 | 1.00 | 0.96 | 0.96 | 1.00 | 1.00 | 0.96 | 0.96 |
| Structurama, noninformative prior | 1.00 | 1.00 | 0.82 | 0.18 | 0.02 | 0.88 | 0.22 | 0.00 |
| Structurama, correct prior | 1.00 | 1.00 | 0.82 | 0.18 | 0.02 | 0.82 | 0.22 | 0.00 |
| BAPS | 1.00 | 1.00 | 1.00 | 0.82 | 1.00 | 1.00 | 1.00 | 0.96 |
Performance assessment of methods including DIC, STRUCTURE, ΔK, Eigenanalysis, Structurama and BAPS. “F_ST” is the population differentiation statistic estimated by SmartPCA [11], averaged across 50 data sets. STRUCTURE's performance is evaluated based upon both the original model and the correlated alleles or “F” model. Similarly tested is the ΔK statistic that relies on STRUCTURE. Eigenanalysis is tested at three significance levels. Structurama is assessed using both a noninformative prior and the true value of K as the starting point. BAPS is evaluated using the individual clustering mode. Blank values in the table indicate that a program did not generate a result.
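The statistic that relies on STRUCTURE output has no entry at K = 1 in the tables, which is consistent with a second-difference statistic that needs log-likelihoods at both K - 1 and K + 1. The sketch below illustrates one commonly used form of such a statistic, the absolute second difference of the mean log-likelihood divided by the standard deviation across replicate runs. It is an assumption that this matches what was run for the table; the function name `delta_k` and the numbers are hypothetical.

```python
from statistics import mean, stdev


def delta_k(loglik_runs):
    """Second-difference statistic for each interior K.

    loglik_runs maps K -> list of log-likelihoods from replicate runs
    (at least two runs per K, with non-identical values so the standard
    deviation is positive). K values lacking a neighbour on either side,
    such as K = 1, get no score.
    """
    mean_ll = {k: mean(values) for k, values in loglik_runs.items()}
    scores = {}
    for k in sorted(loglik_runs):
        if k - 1 in mean_ll and k + 1 in mean_ll:
            second_diff = abs(mean_ll[k + 1] - 2 * mean_ll[k] + mean_ll[k - 1])
            scores[k] = second_diff / stdev(loglik_runs[k])
    return scores


# Hypothetical log-likelihoods from two replicate runs per K.
runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
```

Because the score is undefined at the smallest and largest K tried, a method of this kind cannot report K = 1, which matches the blank cells in the corresponding table rows.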
Accuracy of multiple estimators under Models and .
| Model | | | | | | | | |
| K | 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 |
| F_ST | 0.392 | 0.430 | 0.452 | 0.454 | 0.191 | 0.248 | 0.263 | 0.281 |
| DIC | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| STRUCTURE | 1.00 | 0.98 | 0.94 | 0.84 | 1.00 | 1.00 | 0.96 | 0.84 |
| STRUCTURE, F model | 0.88 | 0.96 | 0.94 | 0.88 | 0.86 | 0.86 | 0.94 | 0.86 |
| ΔK | 1.00 | 0.78 | 0.94 | 0.80 | 1.00 | 0.92 | 0.76 | 0.80 |
| ΔK, F model | 1.00 | 0.84 | 0.94 | 0.88 | 1.00 | 0.96 | 0.80 | 0.92 |
| Eigenanalysis, | 0.96 | 0.84 | 0.98 | 0.96 | 1.00 | 0.86 | 0.94 | 0.98 |
| Eigenanalysis, | 0.98 | 0.94 | 1.00 | 0.96 | 1.00 | 0.98 | 0.98 | 1.00 |
| Eigenanalysis, | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Structurama, noninformative prior | 1.00 | 0.96 | 0.80 | 0.44 | 0.74 | 0.52 | 0.12 | 0.00 |
| Structurama, correct prior | 1.00 | 0.98 | 0.78 | 0.44 | 0.72 | 0.52 | 0.10 | 0.06 |
| BAPS | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
Evaluation of these methods is performed in the same manner as in Table 1.
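The accuracy entries in these tables read naturally as the fraction of replicate data sets for which a method recovers the true K. A minimal sketch of that calculation follows. Treating a run that produced no result as excluded from the denominator is an assumption, as are the function name `accuracy` and the example estimates.

```python
def accuracy(estimates, true_k):
    """Fraction of replicate data sets whose estimated K equals true_k.

    estimates holds one estimated K per replicate; None marks a replicate
    for which the program produced no result. Such replicates are skipped
    here, which is an assumption rather than the documented procedure.
    Returns None when no replicate produced a result.
    """
    usable = [k_hat for k_hat in estimates if k_hat is not None]
    if not usable:
        return None
    return sum(k_hat == true_k for k_hat in usable) / len(usable)


# Hypothetical estimates for 50 replicates simulated with true K = 3.
estimates = [3] * 49 + [4]
score = accuracy(estimates, true_k=3)
```

With 50 replicates per condition, each misclassified data set lowers the reported accuracy by 0.02.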
Accuracy of multiple estimators under Models and Inbred.
| Model | | | | | Inbred | Inbred | Inbred | Inbred | Inbred |
| K | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| F_ST | 0.048 | 0.063 | 0.069 | 0.073 | | 0.489 | 0.498 | 0.491 | 0.504 |
| DIC | 1.00 | 0.94 | 0.70 | 0.56 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 |
| STRUCTURE | 0.02 | 0.02 | 0.06 | 0.16 | 0.64 | 1.00 | 0.98 | 0.90 | 0.84 |
| STRUCTURE, F model | 0.90 | 0.98 | 1.00 | 1.00 | 0.34 | 0.36 | 0.22 | 0.20 | 0.22 |
| ΔK | 0.32 | 0.48 | 0.26 | 0.16 | | 1.00 | 0.74 | 0.80 | 0.68 |
| ΔK, F model | 0.94 | 0.96 | 0.74 | 0.64 | | 1.00 | 0.94 | 0.84 | 0.82 |
| Eigenanalysis, | 0.94 | 0.96 | 0.90 | 0.94 | 0.86 | 0.68 | 0.61 | 0.66 | 0.68 |
| Eigenanalysis, | 1.00 | 0.98 | 0.90 | 0.90 | 0.96 | 0.92 | 0.73 | 0.78 | 0.75 |
| Eigenanalysis, | 1.00 | 0.92 | 0.90 | 0.84 | 1.00 | 0.93 | 0.81 | 0.84 | 0.85 |
| Structurama, noninformative prior | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.82 | 0.24 | 0.02 |
| Structurama, correct prior | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.78 | 0.22 | 0.02 |
| BAPS | 0.64 | 0.54 | 0.22 | 0.14 | 0.74 | 1.00 | 1.00 | 1.00 | 0.98 |
Evaluation of these methods is performed in the same manner as in Table 1.
Accuracy of multiple estimators with reduced data dimensions.
| Model | Subpopulation Size = 10 | Subpopulation Size = 10 | Subpopulation Size = 10 | Subpopulation Size = 10 | Subpopulation Size = 10 | Number of Loci = 10 | Number of Loci = 10 | Number of Loci = 10 | Number of Loci = 10 | Number of Loci = 10 |
| K | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| DIC | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 0.82 | 0.42 | 0.48 |
| STRUCTURE | 0.84 | 1.00 | 0.86 | 0.60 | 0.40 | 1.00 | 0.96 | 0.86 | 0.72 | 0.18 |
| STRUCTURE, F model | 0.16 | 1.00 | 0.86 | 0.66 | 0.34 | 0.10 | 0.96 | 0.86 | 0.72 | 0.18 |
| ΔK | | 0.98 | 0.68 | 0.64 | 0.22 | | 0.94 | 0.24 | 0.06 | 0.04 |
| ΔK, F model | | 0.98 | 0.86 | 0.62 | 0.16 | | 0.94 | 0.20 | 0.10 | 0.04 |
| Eigenanalysis, | 0.90 | 0.80 | 0.82 | 0.80 | ||||||
| Eigenanalysis, | 0.96 | 0.84 | 0.88 | 0.92 | ||||||
| Eigenanalysis, | 0.20 | 0.42 | 0.66 | 0.78 | ||||||
| Structurama, noninformative prior | 1.00 | 1.00 | 0.96 | 0.38 | 0.00 | 1.00 | 0.90 | 0.40 | 0.14 | 0.00 |
| Structurama, correct prior | 1.00 | 1.00 | 0.96 | 0.38 | 0.00 | 1.00 | 0.90 | 0.38 | 0.12 | 0.00 |
| BAPS | 1.00 | 1.00 | 1.00 | 0.80 | 0.50 | 0.00 | 0.02 | 0.04 | 0.36 | 0.28 |
Evaluation of these methods is performed in the same manner as in Table 1. Data are simulated under Model Split with the size of each subpopulation reduced from 50 to 10 and the number of loci reduced from 100 to 10, respectively.
Accuracy of multiple estimators with shorter splitting time among subpopulations.
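| Model | Subpopulation Splitting Time = 0.05 | Subpopulation Splitting Time = 0.05 | Subpopulation Splitting Time = 0.05 | Subpopulation Splitting Time = 0.05 | Subpopulation Splitting Time = 0.05 |
| K | 1 | 2 | 3 | 4 | 5 |
| F_ST | | 0.090 | 0.084 | 0.093 | 0.097 |
| DIC | 1.00 | 1.00 | 0.92 | 0.60 | 0.26 |
| STRUCTURE | 0.64 | 0.78 | 0.50 | 0.54 | 0.22 |
| STRUCTURE, F model | 0.76 | 1.00 | 0.94 | 0.94 | 0.74 |
| ΔK | | 1.00 | 0.44 | 0.08 | 0.04 |
| ΔK, F model | | 0.96 | 0.78 | 0.56 | 0.42 |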
| Eigenanalysis, | 0.96 | 0.96 | 0.94 | 0.90 | 0.72 |
| Eigenanalysis, | 1.00 | 1.00 | 0.98 | 0.94 | 0.70 |
| Eigenanalysis, | 1.00 | 1.00 | 0.98 | 0.88 | 0.48 |
| Structurama, noninformative prior | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Structurama, correct prior | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| BAPS | 1.00 | 1.00 | 0.58 | 0.02 | 0.00 |
Evaluation of these methods is performed in the same manner as in Table 1. Data are simulated under Model Split with the splitting time reduced to 0.05.
Figure 3. Analysis results of data from the Human Genome Diversity Panel.
A. Estimated DIC for different values of K. B. Distruct classification bar plot of individuals from the above data set for the assumed value of K. Each vertical bar represents one individual and each color represents a different cluster.