Akio Onogi, Masanobu Nurimoto, Mitsuo Morita.
Abstract
BACKGROUND: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use.
Year: 2011 PMID: 21708038 PMCID: PMC3161044 DOI: 10.1186/1471-2105-12-263
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
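The abstract notes that a DP prior lets the number of populations and the assignment of individuals be inferred together. One common way to see why is the Chinese restaurant process view of the DP prior over partitions, sketched below. This is an illustration of the prior only, not the DPART sampler; the function name and the concentration parameter `alpha` are assumptions introduced here and do not come from the paper.

```python
import random


def sample_crp_partition(n_individuals, alpha, rng):
    """Draw one partition of individuals from a Chinese restaurant process.

    alpha is a hypothetical concentration parameter: larger values favor
    more clusters. The number of clusters is not fixed in advance.
    """
    assignments = []    # cluster label of each individual
    cluster_sizes = []  # current size of each cluster
    for _ in range(n_individuals):
        # An existing cluster is chosen with weight equal to its size;
        # a new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments


rng = random.Random(0)
partition = sample_crp_partition(50, alpha=1.0, rng=rng)
n_clusters = len(set(partition))
```

Because every draw may open a new cluster, the number of clusters K is a random outcome of the partition rather than an input, which is the property the abstract highlights.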
Programs provided in this study
| Program | Purpose | Assumption | Precedent software equivalent to the program |
|---|---|---|---|
| DPART | Inference of K and assignment of individuals | K and assignment of individuals follow the DP prior | HWLER when |
| FUM | Assignment of individuals | Allele frequencies of each population are drawn independently | STRUCTURE (no admixture and no F model) |
| FCM | Assignment of individuals | Allele frequencies of populations are correlated | STRUCTURE (no admixture and F model) |
Pairwise Fst between base populations generated by simulation method 1
| | Microsatellite | Microsatellite | Microsatellite | SNP |
|---|---|---|---|---|
| | M = 0.005 | M = 0.003 | M = 0.001 | M = 0.002 |
| Pairwise Fst | 0.0371 (± 0.0036) | 0.0610 (± 0.0049) | 0.1298 (± 0.0105) | 0.0996 (± 0.0079) |
M indicates the migration rate.
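As a rough illustration of what a pairwise Fst value measures, the sketch below computes a heterozygosity-based Fst, (H_T - H_S) / H_T, from per-locus allele frequencies of two populations. The record does not state which Fst estimator the authors used, so this particular formula, the function names, and the input layout are assumptions made for illustration only.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pairwise_fst(pop_a, pop_b):
    """Heterozygosity-based Fst between two populations.

    pop_a, pop_b: lists of loci; each locus is a list of allele
    frequencies in the same allele order for both populations.
    Sums over loci are taken before forming the ratio.
    """
    hs_total = 0.0  # mean within-population heterozygosity, summed over loci
    ht_total = 0.0  # heterozygosity of the pooled frequencies, summed over loci
    for fa, fb in zip(pop_a, pop_b):
        hs_total += 0.5 * (expected_heterozygosity(fa) + expected_heterozygosity(fb))
        pooled = [(x + y) / 2 for x, y in zip(fa, fb)]
        ht_total += expected_heterozygosity(pooled)
    if ht_total == 0.0:
        return 0.0  # no variation at any locus
    return (ht_total - hs_total) / ht_total


fst_identical = pairwise_fst([[0.5, 0.5]], [[0.5, 0.5]])
fst_fixed = pairwise_fst([[1.0, 0.0]], [[0.0, 1.0]])
fst_mild = pairwise_fst([[0.6, 0.4]], [[0.4, 0.6]])
```

Under this definition, lower migration rates are expected to produce larger Fst values, which is the direction seen across the microsatellite columns of the table.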
Evaluation of the SAMS sampler
| | | K = 2 | K = 2 | K = 4 | K = 4 | K = 8 | K = 8 |
|---|---|---|---|---|---|---|---|
| Program | Algorithm | Microsatellite | SNP | Microsatellite | SNP | Microsatellite | SNP |
| DPART | Gibbs | 0.016 (98) | 0.292 (42) | 0.228 (26) | 0.476 (4) | 0.315 (3) | 0.523 (0) |
| DPART | Gibbs + SAMS | 0.006 (100) | 0.005 (100) | 0.023 (97) | 0.020 (97) | 0.047 (89) | 0.088 (68) |
| STRUCTURAMA | Gibbs + MC3 | 0.006 (100) | 0.025 (96) | 0.039 (91) | 0.118 (67) | 0.122 (52) | 0.264 (20) |
Shown are the average partition distance between the true and inferred partitions, the number of data sets (out of 100 simulated data sets) in which the partition distance was 0.1 or less (in parentheses), and the average K values in the inferred partitions (in italics). The number of individuals in each population was 25. The number of loci was 20 for microsatellites and 100 for SNPs. The migration rate was 0.003 for microsatellites. MC3 denotes Metropolis-coupled Markov chain Monte Carlo (MCMCMC).
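The partition distance used in this table can be illustrated with a small sketch. It assumes the normalized form of the measure: the minimum number of individuals that must be removed for the two partitions to become identical, divided by the total number of individuals. The record does not spell out the exact definition, so that reading, the function name, and the brute-force matching are assumptions. The search over cluster matchings is factorial in the number of clusters and is only meant for small K.

```python
from itertools import permutations


def partition_distance(true_labels, inferred_labels):
    """Normalized partition distance between two labelings of the same individuals."""
    n = len(true_labels)
    true_ids = sorted(set(true_labels))
    inf_ids = sorted(set(inferred_labels))

    # overlap[(t, c)] = individuals in true cluster t and inferred cluster c
    overlap = {(t, c): 0 for t in true_ids for c in inf_ids}
    for t, c in zip(true_labels, inferred_labels):
        overlap[(t, c)] += 1

    # Match every cluster on the smaller side to a distinct cluster on the
    # larger side, keeping the matching that preserves the most individuals.
    if len(true_ids) <= len(inf_ids):
        rows, cols = true_ids, inf_ids
        key = lambda r, c: (r, c)
    else:
        rows, cols = inf_ids, true_ids
        key = lambda r, c: (c, r)

    best = 0
    for matched in permutations(cols, len(rows)):
        kept = sum(overlap[key(r, c)] for r, c in zip(rows, matched))
        best = max(best, kept)
    return (n - best) / n


truth = [0] * 25 + [1] * 25
merged = [0] * 50                # both populations placed in one cluster
one_off = [0] * 24 + [1] * 26    # a single individual misassigned

d_merged = partition_distance(truth, merged)
d_one_off = partition_distance(truth, one_off)
```

With this definition, cluster labels do not matter, only which individuals are grouped together, and merging two populations of 25 individuals into a single cluster yields a distance of 0.5.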
Effect of λ on the behavior of DPART (number of alleles was 5)
| Ancestral allele freq. and number of loci | {0.2, 0.2 ..., 0.2} × 30 loci | {0.8, 0.05, ..., 0.05} × 100 loci | {0.2,0.2 ...} × 30 loci + {0.8, 0.05, ...} × 30 loci |
|---|---|---|---|
| Mean major allele frequency | 0.353 ± 0.074 | 0.799 ± 0.105 | 0.575 ± 0.242 |
| | 0.028 (97) | 0.500 (0) | 0.500 (0) |
| | 0.044 (91) | 0.500 (0) | 0.236 (53) |
| | 0.130 (75) | 0.215 (57) | 0.136 (73) |
| | 0.459 (8) | 0.020 (96) | 0.370 (26) |
| Inferred (unique) | 0.034 (93) | 0.475 (5) | 0.028 (95) |
| Inferred (single) | 0.024 (99) | 0.120 (76) | 0.166 (67) |
Shown are the average partition distance, the number of data sets in which the partition distance was 0.1 or less (in parentheses), and the average K values (in italics). The number of populations was 2 and the number of individuals per population was 25. Vectors in braces indicate ancestral allele frequencies. "Mean major allele frequency" indicates the mean of the major allele frequencies in the data sets. J is the number of observed alleles. "Inferred (unique)" indicates that a unique λ value was inferred for each locus, and "Inferred (single)" indicates that a single value was inferred for all loci.
Effect of λ on the behavior of DPART (number of alleles was 2)
| Ancestral allele freq. and number of loci | {0.5, 0.5} × 50 loci | {0.8, 0.2} × 200 loci | {0.5, 0.5} × 50 loci + {0.8, 0.2} × 50 loci |
|---|---|---|---|
| Mean major allele frequency | 0.621 ± 0.088 | 0.802 ± 0.114 | 0.711 ± 0.137 |
| | 0.067 (83) | 0.500 (0) | 0.440 (12) |
| | 0.085 (73) | 0.470 (6) | 0.176 (66) |
| | 0.325 (36) | 0.050 (90) | 0.208 (59) |
| Inferred (unique) | 0.104 (70) | 0.080 (84) | 0.063 (89) |
| Inferred (single) | 0.068 (86) | 0.035 (93) | 0.073 (87) |
The number of populations was 2 and the number of individuals per population was 25.
Summary of results for the chicken data set, representing 20 breeds
| Program | λ | Number of clusters | Differences from the partition determined from breeds |
|---|---|---|---|
| HWLER | | 23 | Breed 21 was divided into two clusters (14 and 16 individuals), breed 121 was divided into four clusters (1, 1, 3, and 25 individuals), and breeds 44 and 45 shared a cluster. |
| DPART | Inferred (unique) | 23 | Same as HWLER |
| DPART | Inferred (single) | 22 | Breed 121 was divided into four clusters (1, 1, 3, and 25 individuals). Breeds 44 and 45 shared a cluster. |
| DPART | 0.05 | 23 | Same as HWLER |
| DPART | 0.5 | 20 | Breed 121 was divided into two clusters (5 and 25 individuals), breeds 44 and 45 shared a cluster, an individual in breed 5 shared a cluster with breed 50, and an individual in breed 16 shared a cluster with breed 5. |
| DPART | 1 | 17 | Breeds 5 and 6, breeds 18 and 37, and breeds 44 and 45 each shared a cluster. Three individuals in breed 102 shared a cluster with breed 33. An individual in breed 5 shared a cluster with breed 50. |
| DPART | 3 | 9 | Breeds 5, 16, 18, 21, 37, and 3402 shared a cluster. Breeds 33, 44, 45, and 51 and an individual in breed 102 shared a cluster. Breeds 13, 26, 42, and 50 and one individual each in breeds 5 and 102 shared a cluster. |
Comparison among DPART, FUM, and FCM in data sets with unbalanced sample sizes
| | Nl = 20 | Nl = 20 | Nl = 20 | Nl = 20 | Nl = 20 | Nl = 50 |
|---|---|---|---|---|---|---|
| | M = 0.003 | M = 0.003 | M = 0.003 | M = 0.003 | M = 0.001 | M = 0.003 |
| Program | N (10, 10) | N (10, 100) | N (10, 200) | N (10, 300) | N (10, 300) | N (10, 300) |
| DPART | 0.056 (83) | 0.018 (95) | 0.025 (94) | 0.023 (96) | 0.001 (100) | 0.002 (100) |
| FUM | 0.024 (96) | 0.010 (100) | 0.009 (99) | 0.095 (54) | 0.001 (100) | 0.001 (100) |
| FCM | 0.053 (89) | 0.041 (83) | 0.146 (10) | 0.190 (0) | 0.024 (84) | 0.021 (90) |
Shown are the average partition distance, the number of data sets in which the partition distance was 0.1 or less (in parentheses), and the average K values (in italics). Nl and M indicate the number of loci and the migration rate, respectively. N ( ) denotes sample sizes.
Comparison among DPART, FUM, and FCM in data sets with moderately unbalanced sample sizes
| | Nl = 10 | Nl = 10 | Nl = 10 | Nl = 10 | Nl = 10 | Nl = 20 |
|---|---|---|---|---|---|---|
| | M = 0.005 | M = 0.005 | M = 0.005 | M = 0.005 | M = 0.003 | M = 0.005 |
| Program | N (50, 50) | N (50, 100) | N (50, 200) | N (50, 300) | N (50, 300) | N (50, 300) |
| DPART | 0.125 (73) | 0.120 (66) | 0.100 (62) | 0.119 (52) | 0.030 (99) | 0.023 (100) |
| FUM | 0.073 (86) | 0.072 (85) | 0.081 (70) | 0.118 (33) | 0.023 (99) | 0.016 (100) |
| FCM | 0.074 (84) | 0.078 (80) | 0.103 (50) | 0.146 (11) | 0.039 (89) | 0.023 (99) |
Comparison among DPART, FUM, and FCM in data sets with multiple small subsets
| | Nl = 20 | Nl = 20 | Nl = 50 | Nl = 10 | Nl = 10 | Nl = 20 |
|---|---|---|---|---|---|---|
| Program | N (10, 10, 200) | N (10, 200, 200) | N (10, 10, 200) | N (50, 50, 200) | N (50, 200, 200) | N (50, 50, 200) |
| DPART | 0.112 (66) | 0.032 (92) | 0.004 (100) | 0.043 (99) | 0.040 (99) | 0.006 (100) |
| FUM | 0.435 (7) | 0.012 (99) | 0.334 (0) | 0.154 (74) | 0.035 (100) | 0.057 (89) |
| FCM | 0.496 (0) | 0.055 (83) | 0.364 (17) | 0.162 (73) | 0.041 (100) | 0.072 (86) |
The migration rate was 0.003.
Figure 1. Results obtained with each program during analysis of the bull data set. The dendrogram of DPART was generated based on co-assignment probabilities for all individual pairs. Each vertical bar in the results for FUM and FCM represents the probability that the individual was derived from each population, indicated by five colors. Each bar in the results of STRUCTURE represents the proportion of the individual's genome from each ancestral population. The bar plots were drawn with R [40]. The five clusters detected by DPART are referred to as clusters A, B, C, D, and E.
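The co-assignment probabilities behind the DPART dendrogram can be illustrated as follows. The sketch assumes they are estimated as the fraction of sampled partitions in which two individuals fall in the same cluster; the function name and the input layout (one label list per sampled partition) are assumptions for illustration, and the record does not say how the dendrogram itself was built from these values.

```python
def coassignment_matrix(sampled_partitions):
    """Estimate pairwise co-assignment probabilities from sampled partitions.

    sampled_partitions: list of label lists, one per sample, each giving
    the cluster label of every individual in a fixed order.
    """
    n = len(sampled_partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in sampled_partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(sampled_partitions)
    return [[c / total for c in row] for row in counts]


# Two toy samples over three individuals.
samples = [[0, 0, 1], [0, 1, 1]]
probs = coassignment_matrix(samples)
# One possible dissimilarity for hierarchical clustering (an assumption).
dissimilarity = [[1.0 - p for p in row] for row in probs]
```

Because the matrix depends only on whether pairs share a cluster, it is unaffected by label switching between samples, which makes it a convenient summary when the number of clusters varies from sample to sample.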
Pairwise Fst between clusters detected by DPART in the bull data set
| | B | C | D | E |
|---|---|---|---|---|
| A | 0.1085 | 0.0913 | 0.1495 | 0.0839 |
| B | | 0.0759 | 0.1792 | 0.1024 |
| C | | | 0.1145 | 0.0548 |
| D | | | | 0.0324 |