Alexander I Putman, Ignazio Carbone.
Abstract
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas.
This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high-throughput technologies in population genetics.
Keywords: Data analysis; microsatellite; population genetics; population structure
Year: 2014 PMID: 25540699 PMCID: PMC4267876 DOI: 10.1002/ece3.1305
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Trends in number of articles having microsatellites (MS), single-nucleotide polymorphisms (SNP), or both as a topic in the Web of Science database since 2004. Values are the number of articles with the topic (% of total articles in group).
| Year | Non-human: MS | Non-human: SNP | Non-human: Both | Human: MS | Human: SNP | Human: Both |
|---|---|---|---|---|---|---|
| 2004 | 1114 (5.2) | 248 (1.2) | 21 (0.1) | 516 (2.8) | 304 (1.7) | 40 (0.2) |
| 2005 | 1279 (5.8) | 360 (1.6) | 80 (0.4) | 558 (2.8) | 445 (2.3) | 44 (0.2) |
| 2006 | 1395 (5.8) | 393 (1.6) | 33 (0.1) | 484 (2.6) | 522 (2.8) | 47 (0.3) |
| 2007 | 1464 (5.8) | 477 (1.9) | 60 (0.2) | 443 (2.4) | 629 (3.4) | 39 (0.2) |
| 2008 | 1604 (6.3) | 526 (2.1) | 73 (0.3) | 372 (2.0) | 778 (4.1) | 47 (0.2) |
| 2009 | 1945 (7.2) | 665 (2.5) | 71 (0.3) | 363 (1.9) | 793 (4.2) | 59 (0.3) |
| 2010 | 1613 (5.8) | 782 (2.8) | 77 (0.3) | 284 (1.4) | 882 (4.5) | 41 (0.2) |
| 2011 | 1768 (6.0) | 882 (3.0) | 89 (0.3) | 318 (1.6) | 896 (4.6) | 30 (0.2) |
| 2012 | 1856 (6.1) | 1055 (3.5) | 119 (0.4) | 280 (1.4) | 792 (3.9) | 23 (0.1) |
| 2013 | 1609 (5.4) | 1059 (3.5) | 123 (0.4) | 270 (1.3) | 749 (3.7) | 22 (0.1) |
| 2014 | 975 (5.2) | 684 (3.7) | 99 (0.5) | 213 (1.7) | 362 (2.9) | 21 (0.2) |
Non-human: a total of 84 journals with non-human subject matter that have published at least 100 articles since 2004 with MS, SNP, or both as a topic.
Human: a total of 56 journals with human-related keywords in their titles (e.g., human, medicine, cancer, forensic, and pharma) that have published at least 100 articles since 2004 with MS, SNP, or both as a topic.
PLoS One is presented separately because of its very high article volume (exceeding all 84 non-human journals combined in 2013), its lack of defined subject area, and publication of an appreciable number of population genetics studies.
A total of 4403 journals that have published fewer than 100 articles since 2004 with MS, SNP, or both as a topic.
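The trend in the table is easier to see as a ratio of MS to SNP articles within each group. The following is a minimal sketch that uses the counts transcribed from the table above; the function names are illustrative and are not part of the review.

```python
# Article counts transcribed from the table above: year -> (MS, SNP, Both).
NON_HUMAN = {
    2004: (1114, 248, 21), 2005: (1279, 360, 80), 2006: (1395, 393, 33),
    2007: (1464, 477, 60), 2008: (1604, 526, 73), 2009: (1945, 665, 71),
    2010: (1613, 782, 77), 2011: (1768, 882, 89), 2012: (1856, 1055, 119),
    2013: (1609, 1059, 123), 2014: (975, 684, 99),
}
HUMAN = {
    2004: (516, 304, 40), 2005: (558, 445, 44), 2006: (484, 522, 47),
    2007: (443, 629, 39), 2008: (372, 778, 47), 2009: (363, 793, 59),
    2010: (284, 882, 41), 2011: (318, 896, 30), 2012: (280, 792, 23),
    2013: (270, 749, 22), 2014: (213, 362, 21),
}


def ms_to_snp_ratio(counts):
    """Return {year: MS articles / SNP articles} for one journal group."""
    return {year: ms / snp for year, (ms, snp, _both) in counts.items()}


def first_year_snp_exceeds_ms(counts):
    """Return the first year in which SNP articles outnumber MS articles, or None."""
    for year in sorted(counts):
        ms, snp, _both = counts[year]
        if snp > ms:
            return year
    return None


for name, counts in (("Non-human", NON_HUMAN), ("Human", HUMAN)):
    ratios = ms_to_snp_ratio(counts)
    print(name, {year: round(r, 2) for year, r in ratios.items()})
    print(name, "first year SNP > MS:", first_year_snp_exceeds_ms(counts))
```

With these counts, SNP articles overtake MS articles in the human group in 2006, whereas MS articles outnumber SNP articles in the non-human group in every year listed.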
Major assumptions of and questions relating to exploratory clustering methods and their implications in population genetic studies using microsatellites
| Method | Assumption or question | References | Related issues | References |
|---|---|---|---|---|
| In general | Qualitative (strict) group membership | Xu and Wunsch ( | Fuzzy methods allow partial group membership | Xu and Wunsch ( |
| In general | Distance measure is appropriate for data | Felsenstein ( | Microsatellite mutation model is difficult to infer and could vary among loci and be costly to incorrectly specify | This paper, introduction |
| In general | Is clustering method appropriate for sample? | – | Many clustering methods are available but have not been thoroughly tested with microsatellites and/or complex population models | Odong et al. ( |
| In general | Do clustering results accurately depict structure in data or distance matrix? | – | Many methods for cluster validation exist, but are not easily available to population geneticists, have not been evaluated with microsatellites, and are infrequently applied in population genetic studies | Xu and Wunsch ( |
| UPGMA | Structure is hierarchical | Kalinowski ( | Cannot depict nonhierarchical structure | Kalinowski ( |
| UPGMA | Constant molecular clock | Felsenstein ( | Distorts results when rate of evolution varies among samples | Felsenstein ( |
| NJ | Relaxed molecular clock | Felsenstein ( | Allows rate of evolution to vary | Felsenstein ( |
| NJ | – | – | Ties are possible when clustering tips. When individuals are closely related, this can lead to falsely high bootstrap values | Felsenstein ( |
UPGMA, unweighted pair group method with arithmetic mean; NJ, neighbor joining.
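The UPGMA rows in the table above can be illustrated with a small worked example. The sketch below is a plain-Python UPGMA (average linkage) together with a cophenetic correlation, which is one common way to ask whether a tree faithfully depicts the distance matrix. The function names, the two toy distance matrices, and the choice of cophenetic correlation as the validation measure are assumptions made for illustration; they are not taken from the review.

```python
from itertools import combinations
from math import sqrt


def upgma(labels, dist):
    """Cluster tips by UPGMA (average linkage).

    labels: list of tip names.
    dist: dict mapping frozenset({a, b}) -> distance between tips a and b.
    Returns (merges, cophenetic). merges lists (left_tips, right_tips, node_height)
    in merge order; cophenetic maps each tip pair to its tree-implied distance.
    """
    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([label]) for label in labels]
    merges, cophenetic = [], {}
    while len(clusters) > 1:
        # Ties are broken by order of appearance, which is arbitrary.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        d = average_distance(c1, c2)
        merges.append((sorted(c1), sorted(c2), d / 2))
        for a in c1:
            for b in c2:
                cophenetic[frozenset((a, b))] = d
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges, cophenetic


def cophenetic_correlation(dist, cophenetic):
    """Pearson correlation between original and tree-implied distances."""
    pairs = list(dist)
    x = [dist[p] for p in pairs]
    y = [cophenetic[p] for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def make_dist(entries):
    """Convert {(a, b): d} into the frozenset-keyed form used above."""
    return {frozenset(pair): d for pair, d in entries.items()}


tips = ["A", "B", "C", "D"]
# Hypothetical hierarchical (ultrametric) distances: (A, B) and (C, D) are sister pairs.
hierarchical = make_dist({("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
                          ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4})
# Hypothetical nonhierarchical distances: four samples spaced evenly along a line.
stepping_stone = make_dist({("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
                            ("A", "C"): 2, ("B", "D"): 2, ("A", "D"): 3})

merges_h, coph_h = upgma(tips, hierarchical)
merges_s, coph_s = upgma(tips, stepping_stone)
r_hierarchical = cophenetic_correlation(hierarchical, coph_h)
r_stepping_stone = cophenetic_correlation(stepping_stone, coph_s)
print(merges_h)
print(round(r_hierarchical, 3), round(r_stepping_stone, 3))
```

In this toy case the tree reproduces the hierarchical matrix exactly, while the evenly spaced samples are forced into two pairs and the tree-implied distances correlate only moderately with the input distances.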
Major assumptions of and questions relating to ordination analyses and their implications in population genetic studies using microsatellites
| Method | Assumption or question | References | Related issues | References |
|---|---|---|---|---|
| In general | How should results from ordination analyses be interpreted? | – | Identifying biologically important structure among results is an open question | Jombart et al. ( |
| In general | Markers are independent sources of ancestry information | Lawson et al. ( | Correlation of markers due to gametic linkage can distort ordination results and impede interpretation | Patterson et al. ( |
| In general | No missing data | – | It may not be clear how missing values for microsatellite loci should be replaced | This paper |
| In general | Data are noncompositional, relationships between variables are linear | Jombart et al. ( | Each allele at a microsatellite locus is treated as a different marker, which creates groups of compositional data | Patterson et al. ( |
| | | | Microsatellites not formally incorporated into PCA. Analysis combining %PCA with multiple co-inertia analysis may be required | Laloë et al. ( |
| | | | Little information available on performing transformations to correct for nonlinearity and/or compositional microsatellite data | This paper |
| PCA | Depicts allele frequency variance, assumes homogeneous variances among alleles | Jombart et al. ( | Allele frequencies need to be scaled, but the choice of scaling method for microsatellites may not be obvious or accessible | Jombart et al. ( |
| PCA | How do outliers influence results from PCA? | – | Outliers are likely to dominate the results and hamper interpretation of other population structure. | Serneels and Verdonck ( |
| PCoA | Depicts distance, assumes distance measure is appropriate for data | Jombart et al. ( | Microsatellite mutation model is difficult to infer and is costly to incorrectly specify | This paper, introduction |
| DAPC | Describes between-population variation only | Jombart et al. ( | Depends on assumptions of PCA and chosen clustering method | Jombart et al. ( |
PCA, principal component analysis; PCoA, principal coordinate analysis; DAPC, discriminant analysis of principal components.
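Two points from the table above, that each microsatellite allele is treated as a separate marker and that allele columns must be scaled before PCA, can be made concrete with a small encoding example. The sketch below assumes diploid genotypes with no missing data; the function names, the toy genotypes, and the sqrt(p(1 - p)) scaling (a choice commonly applied to biallelic SNP data) are illustrative assumptions, not a method recommended by the review.

```python
from math import sqrt


def allele_count_table(genotypes):
    """Encode diploid genotypes as per-allele counts (0, 1, or 2).

    genotypes: list of dicts mapping locus name -> (allele, allele); no missing data.
    Returns (columns, table), where columns is a sorted list of (locus, allele) pairs.
    """
    columns = sorted({(locus, allele)
                      for ind in genotypes
                      for locus, pair in ind.items()
                      for allele in pair})
    table = [[ind[locus].count(allele) for locus, allele in columns]
             for ind in genotypes]
    return columns, table


def center_and_scale(table, ploidy=2):
    """Center each allele column and divide it by sqrt(p * (1 - p)).

    p is the allele frequency estimated from the column mean. Columns with
    zero variance (a fixed allele) are set to 0.0 rather than dividing by zero.
    """
    n = len(table)
    scaled_columns = []
    for j in range(len(table[0])):
        column = [row[j] for row in table]
        mean = sum(column) / n
        p = mean / ploidy
        sd = sqrt(p * (1 - p))
        scaled_columns.append([(x - mean) / sd if sd > 0 else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical genotypes: allele sizes in base pairs at two loci.
genotypes = [
    {"locA": (150, 152), "locB": (200, 200)},
    {"locA": (150, 150), "locB": (200, 204)},
    {"locA": (152, 154), "locB": (204, 204)},
    {"locA": (154, 154), "locB": (200, 204)},
]
columns, table = allele_count_table(genotypes)
scaled = center_and_scale(table)

print(columns)
for row in table:
    # The counts for all alleles at one locus always sum to the ploidy,
    # which is what makes the columns compositional.
    print(row, "locA total:", sum(row[:3]), "locB total:", sum(row[3:]))
```

Because the allele columns at a locus always sum to the ploidy, they are not independent variables, which is the compositional-data concern raised in the table. The resulting `scaled` matrix is what would be passed to a PCA routine.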
Major assumptions of and questions relating to descriptive statistics and their implications in population genetic studies using microsatellites
| Method | Assumption or question | References | Related issues | References |
|---|---|---|---|---|
| In general | Diploid genome | Dufresne et al. ( | For haploids or especially polyploids, options (in terms of both statistics and packages) are fewer | Dufresne et al. ( |
| Biallelic markers¹ under infinite allele model of mutation. Often estimated using heterozygosity | Holsinger and Weir ( | Hedrick ( | ||
| Requires use of unbiased estimators of heterozygosity. Cannot compare subpopulations or loci with different levels of gene diversity | Meirmans and Hedrick ( | |||
| Infinite island model of population structure | Meirmans and Hedrick ( | Violation of migration assumptions require | Weir and Hill ( | |
| Correlation of allele frequencies among populations can cause overestimation using most methods | Fu et al. ( | |||
| Stepwise mutation model (SMM) | Chakraborty and Nei ( | Likely confounded by deviations from SMM. Levels of diversity and structure in the sample likely influence relative performance of | Balloux et al. ( | |
| Parameters including allele size are associated with high variance; should be estimated using analysis of molecular variance (AMOVA) | Michalakis and Excoffier ( | |||
| Depends only on allelic differentiation | Jost ( | Can be sensitive to markers with high mutation rates | Leng and Zhang ( | |
| In general | Which statistics should be employed? | – | Recommended to report as many as possible with microsatellites and ensure clarity in distinguishing parameters and estimators, such as for | Heller and Siegismund ( |
| In general | How should | – | For any parameter: Microsatellites may substantially underestimate population structure, and interpretation has been described as “dangerous.” | Balloux and Lugon-Moulin ( |
¹ GST and some implementations of θ can be used for multiallelic markers.
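The dependence of heterozygosity-based statistics on within-population gene diversity, noted in the table above, can be shown with a short calculation at a single multiallelic locus. The sketch below computes GST, Hedrick's standardized G'ST, and Jost's D from allele frequencies. It is a minimal illustration that assumes equally weighted subpopulations and known (plug-in) allele frequencies, so it ignores the sampling corrections that unbiased estimators provide; the function names and example frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def differentiation(subpops):
    """Plug-in differentiation statistics for one locus.

    subpops: list of {allele: frequency} dicts, one per subpopulation, equally weighted.
    Assumes at least two subpopulations and a polymorphic pooled sample (H_T > 0).
    """
    k = len(subpops)
    h_s = sum(gene_diversity(f) for f in subpops) / k          # mean within-subpopulation diversity
    alleles = set().union(*subpops)
    pooled = {a: sum(f.get(a, 0.0) for f in subpops) / k for a in alleles}
    h_t = gene_diversity(pooled)                               # total diversity
    g_st = (h_t - h_s) / h_t
    g_st_std = g_st * (k - 1 + h_s) / ((k - 1) * (1 - h_s))    # Hedrick's G'ST
    jost_d = (h_t - h_s) / (1 - h_s) * k / (k - 1)             # Jost's D
    return {"H_S": h_s, "H_T": h_t, "GST": g_st, "G'ST": g_st_std, "D": jost_d}


# Hypothetical highly variable locus: two subpopulations share no alleles,
# and each carries ten equally frequent alleles.
pop1 = {size: 0.1 for size in range(100, 120, 2)}
pop2 = {size: 0.1 for size in range(120, 140, 2)}
high_diversity = differentiation([pop1, pop2])

# Hypothetical low-diversity locus: each subpopulation is fixed for a different allele.
low_diversity = differentiation([{100: 1.0}, {102: 1.0}])

for name, result in (("high diversity", high_diversity), ("low diversity", low_diversity)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In both constructed cases the subpopulations share no alleles, yet GST is about 0.05 at the highly variable locus and 1 at the fixed locus, while G'ST and D equal 1 in both. This is why values of GST cannot be compared directly across loci or subpopulations with different levels of gene diversity.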
Major assumptions of and questions relating to model-based clustering and their implications in population genetic studies using microsatellites
| Method | Assumption or question | References | Related issues | References |
|---|---|---|---|---|
| In general | Individual runs are stochastic and may settle on local optima | Gilbert et al. ( | Ensure that a sufficient number of steps and runs have been performed | Gilbert et al. ( |
| BAPS and STRUCTURE | Hardy–Weinberg equilibrium within populations | Pritchard et al. ( | No inbreeding; if suspected, use InStruct | Gao et al. ( |
| Individuals are not related by direct descent; related individuals should be removed prior to analysis | Anderson and Dunham ( | |||
| BAPS | Gametic linkage equilibrium within populations. Tight (BAPS) or loose (STRUCTURE) linkage allowed for one model only | Falush et al. ( | Use of linkage model requires data be haploid, or phased data from a diploid or tetraploid | Corander and Tang ( |
| STRUCTURE | Sufficient number of markers should be unlinked. Phasing optional for diploids, required for polyploids | Falush et al. ( | ||
| BAPS and STRUCTURE | Is the information content of the dataset sufficiently high? | – | Population structure: Incomplete lineage sorting may confound inference when using as many as 50 microsatellite loci | Orozco-terWengel et al. ( |
| | | | Admixture: If few microsatellite loci are used, the sample should include a significant number of pure individuals | Pritchard et al. ( |
| STRUCTURE and NewHybrids | Hybrid identification: reliable only for many loci (>24–50), especially when differentiation is low | Vähä and Primmer ( | ||
| STRUCTURE | Recessive allele model: null alleles are due to polymorphism | Falush et al. ( | Not appropriate for data that are missing due to experimental error | Falush et al. ( |
| STRUCTURE | Prior population | Falush et al. ( | Using standardized values, performance of STRUCTURE and BAPS declines at standardized | Latch et al. ( |
| BAPS | Two models that incorporate population information a priori are available | Corander et al. ( |
BAPS, Bayesian analysis of population structure.
The two prior population models, LOCPRIOR and USEPOPINFO, should be used for weak and strong structure, respectively.
Major assumptions of and questions relating to inference of migration or population size, and their implications in population genetic studies using microsatellites
| Method | Assumption or question | References | Related issues | References |
|---|---|---|---|---|
| Migration | ||||
| BayesAss | Gametic linkage equilibrium within populations | Wilson and Rannala ( | Detects shifts in gametic linkage equilibrium to estimate recent migration | Wilson and Rannala ( |
| BayesAss | Low migration rate; immigrants comprise less than one-third of population | Faubet et al. ( | When assumptions are violated, inferences using microsatellites may be accurate only for low migration rates (<0.01) and high differentiation ( | Faubet et al. ( |
| BayesAss | Migration and drift are constant during past few generations | Faubet et al. ( | ||
| GENECLASS2 | Detection of first generation migrants only. Assumes sexual reproduction | Paetkau et al. ( | – | – |
| Population size | ||||
| BOTTLENECK | Infinite allele model, stepwise mutation model, or two-phase mutation model | Piry et al. ( | Reports on sensitivity to mutation model conflict | Cornuet and Luikart ( |
| BOTTLENECK | Infinite allele model, stepwise mutation model, or two-phase mutation model | Piry et al. ( | Low power of test may limit their utility | Peery et al. ( |
| | Generalized stepwise mutation model | Garza and Williamson ( | Can significantly overestimate bottlenecks when mutation parameters are improperly specified | Peery et al. ( |
| | Generalized stepwise mutation model | Garza and Williamson ( | Low power of test may limit their utility | Peery et al. ( |
| MSVAR | Stepwise mutation model | Beaumont ( | Reports on sensitivity to mutation model conflict | Girod et al. ( |