Baolin Wu, Nianjun Liu, Hongyu Zhao.
Abstract
BACKGROUND: Inference of population stratification and individual admixture from genetic markers is an integral part of studies in diverse settings, such as association mapping and evolutionary studies. Bayesian methods have been proposed for population stratification and admixture inference using multilocus genotypes and are widely used in practice. However, these Bayesian methods demand intensive computational resources and may run into convergence problems in Markov chain Monte Carlo based posterior sampling.
Year: 2006 PMID: 16792813 PMCID: PMC1550430 DOI: 10.1186/1471-2105-7-317
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Commonly used software for population structure inference
| Software | STRUCTURE | GENELAND | BAPS/BAPS2 | ADMIXMAP | PARTITION | L-POP | PSMIX |
|---|---|---|---|---|---|---|---|
| Method | Bayesian MCMC | Bayesian MCMC | Bayesian; MCMC is used when the number of populations is ≥ 9 | Bayesian MCMC | Bayesian MCMC | Latent class analysis, EM | Clustering analysis, EM |
| Features | Population structure inference | Process geo-referenced individual multilocus genetic data for population structure inference | Population structure inference. Use geographical sampling design of the individuals | Mainly for analysis of datasets that consist of trait measurements and genotype data on a sample of individuals from an admixed or stratified population | Population structure inference | Population structure inference | Population structure inference |
| Assumptions | HWE and LE between loci | HWE, LE between loci, and spatial distribution of sub-populations | HWE and LE between loci | Ancestry state is the same at all loci within a compound locus on any gamete. Mating is not assortative for admixture in the population from which the parental gametes were drawn | HWE and LE between loci. The underlying population genetic model is appropriate for out-crossing diploid organisms. | HWE and LE between loci | HWE and LE between loci |
| Input parameters | Parameters for running MCMC, parameters for ancestry model and allele frequency model, and the number of populations | In addition to genetic and spatial data, the user must provide parameters for the maximum number of populations, the way geographical information is handled and the allele frequency model | When MCMC is used, parameters for running MCMC are needed | Parameters for running MCMC, allele frequencies (the number of populations is specified here), and mating model. Disease information (an outcome variable is suggested even if the focus is on population structure). Parameters for tests and output | Parameters for running MCMC, maximum number of populations, prior parameter for allelic diversity, and prior parameter for number of populations | Number of populations, admixture option, data format options, model options, output format options, and convergence criterion | Number of populations and convergence criterion |
| Output | One file for estimates and some files for plots. Main parameter estimates are inferred ancestry of individuals and estimated allele frequencies in each population | Main parameter estimates are the number of populations, the population membership of each individual, and maps giving the population membership of each geographical pixel of a given size, used to locate genetic discontinuities between populations | Main parameter estimates are the number of populations and population membership of each individual | Individual/gamete level admixture variables. Ancestry-specific allele or haplotype frequencies. Results for association analysis and model parameters | The output file contains a list of the parameter settings followed by the sequence of observations of the Markov chain. A companion program PartitionView is provided to obtain useful information from the PARTITION output file. | Main outputs are estimates of allele frequencies, posterior class probabilities, and class-specific allele frequencies | Main parameter estimates are inferred ancestry of individuals |
| Advantages | Easy to use. Once the number of populations is given, the estimates are accurate | Easy to use. Flexible to extend. Can work with or without spatial information. Can estimate the number of populations | Easy to use. Provides a good estimate of the number of populations. When geographical sampling information is available, can improve the statistical power to detect clusters in the data | In addition to population structure inference, can perform association analysis on structured populations. Can deal with tightly linked loci using haplotypes | Easy to use. Can estimate the number of populations and calculate a Bayes factor in support of a single source population against the alternative of more than one source population. | Computationally efficient | Easy to use. Computationally efficient. Flexible to extend |
| Limitations | Computationally intensive. Can estimate the number of populations, but not reliably.# | Does not handle admixture. Computationally intensive, especially when "Falush" is used as the allele frequency model or the number of populations needs to be estimated. | Very memory intensive. When MCMC is used, becomes relatively computationally intensive. Only provides membership partition; does not handle admixture | Difficult to use. Computationally intensive. Does not estimate the number of populations * | Computationally intensive, especially when the number of populations needs to be estimated | Parameter configuration is difficult. Works adequately for discrete populations but not for admixed populations. Does not estimate the number of populations | Does not estimate the number of populations |
| Platforms | Windows, Unix/Linux | Windows/Linux/Mac (R package) | Windows | Windows, Unix/Linux. R statistical package is required | Windows | Windows (DOS), Unix/Linux | Windows/Linux/Mac (R package) |
| References | Pritchard et al. (2000), Falush et al. (2003) | Guillot et al. (2005) | Corander et al. (2003, 2004) | McKeigue et al. (2000), Hoggart et al. (2003, 2004) | Dawson and Belkhir (2001) | Purcell and Sham (2004) | Tang et al. (2005), Liu et al. (2005) |
| URL | | | | | | | |
# With the findings of Evanno et al. (2005), STRUCTURE's ability to detect the number of populations should improve greatly.
* Considering only the population structure inference functionality.
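The table lists EM-based methods whose inputs are the number of populations and a convergence criterion, and whose output is the inferred ancestry of individuals, under HWE and LE between loci. The following is a minimal sketch of how such an EM iteration could look for a generic admixture model. It is not the PSMIX or L-POP implementation: the function names (`normalize`, `em_admixture`), the genotype encoding, the random initialization, and the small pseudo-count are all assumptions made for illustration.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def em_admixture(genotypes, n_alleles, K, tol=1e-6, max_iter=500, seed=0):
    """EM for a generic admixture model (illustrative sketch only).

    Assumes HWE and LE between loci, so every allele copy is treated
    as an independent draw from one of K populations.

    genotypes[i][l] -- tuple of two allele indices for individual i, locus l
    n_alleles[l]    -- number of distinct alleles at locus l
    K               -- number of populations (fixed by the user)
    tol             -- convergence criterion on the log-likelihood gain

    Returns (q, f, loglik) where q[i][k] is the admixture proportion of
    individual i in population k and f[k][l][a] is the frequency of
    allele a at locus l in population k.
    """
    rng = random.Random(seed)
    n, L = len(genotypes), len(n_alleles)

    # Random, strictly positive starting values (hypothetical choice).
    q = [normalize([rng.random() + 0.5 for _ in range(K)]) for _ in range(n)]
    f = [[normalize([rng.random() + 0.5 for _ in range(n_alleles[l])])
          for l in range(L)] for _ in range(K)]

    prev = float("-inf")
    loglik = prev
    for _ in range(max_iter):
        # Accumulators for the M-step; the tiny pseudo-count keeps every
        # allele frequency strictly positive.
        q_sum = [[0.0] * K for _ in range(n)]
        f_sum = [[[1e-9] * n_alleles[l] for l in range(L)] for _ in range(K)]
        loglik = 0.0

        # E-step: posterior probability that each allele copy came from
        # population k, given the current q and f.
        for i in range(n):
            for l in range(L):
                for a in genotypes[i][l]:
                    w = [q[i][k] * f[k][l][a] for k in range(K)]
                    total = sum(w)
                    loglik += math.log(total)
                    for k in range(K):
                        r = w[k] / total
                        q_sum[i][k] += r
                        f_sum[k][l][a] += r

        # M-step: re-estimate q and f from the expected counts.
        q = [normalize(row) for row in q_sum]
        f = [[normalize(f_sum[k][l]) for l in range(L)] for k in range(K)]

        # loglik was evaluated at the parameters before this update.
        if loglik - prev < tol:
            break
        prev = loglik

    return q, f, loglik
```

Calling `em_admixture(genotypes, n_alleles, K=2)` returns one row of admixture proportions per individual. Population labels are arbitrary, so comparisons between runs or between programs need to allow for label switching.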
Figure 1. Comparison of estimates of individual admixture from STRUCTURE (x-axis) and PSMIX (y-axis) for the Pima and Surui data from Rosenberg et al. (2002). Only the first 50 markers were used.
Figure 2. Estimates from STRUCTURE, PSMIX and L-POP for the Pima and Surui data (the first 21 samples are Surui) from Rosenberg et al. (2002). Only the first 50 markers were used.
Figure 3. Estimates from STRUCTURE and PSMIX for the simulated data.