Zhian N Kamvar1, Javier F Tabima1, Niklaus J Grünwald2.
Abstract
Many microbial, fungal, or oomycete populations violate assumptions for population genetic analysis because these populations are clonal, admixed, partially clonal, and/or sexual. Furthermore, few tools exist that are specifically designed for analyzing data from clonal populations, making analysis difficult and haphazard. We developed the R package poppr, which provides unique tools for analysis of data from admixed, clonal, mixed, and/or sexual populations. Currently, poppr can be used for dominant/codominant and haploid/diploid genetic data. Data can be imported from several formats, including GenAlEx formatted text files, and can be analyzed on a user-defined hierarchy that includes unlimited levels of subpopulation structure and clone censoring. New functions include calculation of Bruvo's distance for microsatellites, batch analysis of the index of association with several indices of genotypic diversity, and graphing, including dendrograms with bootstrap support and minimum spanning networks. While functions for genotypic diversity and clone censoring are specific to clonal populations, several functions found in poppr are also valuable for the analysis of any population. A manual with documentation and examples is provided. Poppr is open source and major releases are available on CRAN: http://cran.r-project.org/package=poppr. More supporting documentation and tutorials can be found under 'resources' at: http://grunwaldlab.cgrb.oregonstate.edu/.
Keywords: Bootstrap; Bruvo’s distance; Clonality; Clone correction; Genotypic diversity; Hierarchy; Index of association; Minimum spanning networks; Permutation; Population genetics
Year: 2014 PMID: 24688859 PMCID: PMC3961149 DOI: 10.7717/peerj.281
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Functions found in poppr and their short descriptions.
| Function | Description |
|---|---|
| | |
| | Provides a quick GUI to grab files for import |
| | Read |
| | Converts genind objects to |
| | |
| | Handles missing data |
| | Clone censors at a specified population hierarchy |
| | Detects and removes phylogenetically uninformative loci |
| | Subsets genind objects by population |
| | Shuffles genotypes at each locus using four different shuffling algorithms (details in |
| | Manipulates population hierarchy |
| | |
| | Produces dendrograms with bootstrap support based on Bruvo’s distance |
| | Calculates Bruvo’s distance |
| | Calculates the percent allelic dissimilarity |
| | Calculates the index of association |
| | Calculates the number of multilocus genotypes |
| | Finds all multilocus genotypes that cross populations |
| | Returns a table of populations by multilocus genotypes |
| | Returns a vector of a numeric multilocus genotype assignment for each individual |
| | Returns a diversity table by population |
| | Returns a diversity table by population for all compatible files specified |
| | |
| | Helper to determine the appropriate parameters for adjusting the grey level for msn functions |
| | Produces minimum spanning networks based on Bruvo’s distance colored by population |
| | Produces a minimum spanning network for any pairwise distance matrix related to the data |
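Several of the descriptions above (numeric multilocus genotype assignment, the population-by-genotype table, genotypes that cross populations, and clone censoring) amount to simple bookkeeping on genotype identity. The following is a minimal Python sketch of that bookkeeping, not poppr's R implementation. All function names are hypothetical, and it assumes each individual is a tuple of allele calls with no missing data.

```python
from collections import Counter

def assign_mlg(genotypes):
    """Give each individual an integer ID for its multilocus genotype (MLG).

    IDs are handed out in order of first appearance (an arbitrary choice).
    """
    ids = {}
    return [ids.setdefault(g, len(ids) + 1) for g in genotypes]

def mlg_by_population(genotypes, populations):
    """Count how often each MLG occurs in each population."""
    table = {}
    for mlg, pop in zip(assign_mlg(genotypes), populations):
        table.setdefault(pop, Counter())[mlg] += 1
    return table

def mlg_across_populations(genotypes, populations):
    """Return the MLG IDs that are found in more than one population."""
    seen = {}
    for mlg, pop in zip(assign_mlg(genotypes), populations):
        seen.setdefault(mlg, set()).add(pop)
    return sorted(m for m, pops in seen.items() if len(pops) > 1)

def clone_censor(genotypes, populations):
    """Return indices that keep one individual per MLG within each population."""
    kept, seen = [], set()
    for i, key in enumerate(zip(populations, genotypes)):
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept

# Hypothetical haploid data: five individuals, three loci, two populations.
genotypes = [(1, 2, 2), (1, 2, 2), (1, 3, 2), (1, 2, 2), (2, 2, 2)]
populations = ["p1", "p1", "p1", "p2", "p2"]
mlg_ids = assign_mlg(genotypes)
censored = clone_censor(genotypes, populations)
```

Censoring here is done within each population, which corresponds to one level of a hierarchy; censoring at a different level would only change what is passed as `populations`.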
Summary table produced by the poppr() function.
Table as it would appear in the R console, produced by the poppr() function with 999 permutations to calculate p-values for the index of association (Ia) and its standardized form (rbarD), using the Aeut data set included in poppr (Grünwald et al., 2003). The table was obtained with the following code: library(poppr); data(Aeut); poppr(Aeut, sample = 999).
| Pop | N | MLG | eMLG | SE | H | G | Hexp | E.5 | Ia | p.Ia | rbarD | p.rD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Athena | 97 | 70 | 65.981 | 1.246 | 4.063 | 42.193 | 0.986 | 0.721 | 2.906 | 0.001 | 0.072 | 0.001 |
| Mt. Vernon | 90 | 50 | 50.000 | 0.000 | 3.668 | 28.723 | 0.976 | 0.726 | 13.302 | 0.001 | 0.282 | 0.001 |
| Total | 187 | 119 | 68.453 | 2.989 | 4.558 | 68.972 | 0.991 | 0.720 | 14.371 | 0.001 | 0.271 | 0.001 |
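Notes.
N: census size
MLG: multilocus genotypes
eMLG: expected MLG based on rarefaction
SE: standard error from rarefaction
H: Shannon-Wiener Index
G: Stoddart and Taylor’s Index
Hexp: Expected Heterozygosity (Nei, 1978)
E.5: Evenness (E5)
Ia: index of association
p.Ia: p-value for Ia
rbarD: standardized index of association
p.rD: p-value for rbarD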
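The diversity columns can be computed from the counts of each multilocus genotype alone. The sketch below is an illustration in Python, not poppr's code. It uses the textbook forms of the indices (Shannon-Wiener H, Stoddart and Taylor's G as the inverse of the sum of squared frequencies, Nei's sample-size-corrected Hexp, and E5 as (G - 1)/(e^H - 1)) and the usual hypergeometric rarefaction formula for eMLG. That these forms match what poppr computes is an assumption, although they are consistent with the rows of the summary table. Function names are hypothetical.

```python
import math
from collections import Counter

def diversity(mlg_ids):
    """Diversity indices from a list of MLG IDs (one ID per individual)."""
    counts = list(Counter(mlg_ids).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two individuals")
    p = [c / n for c in counts]
    sum_sq = sum(x * x for x in p)
    h = -sum(x * math.log(x) for x in p)   # Shannon-Wiener index
    g = 1.0 / sum_sq                       # Stoddart and Taylor's index
    hexp = n / (n - 1) * (1.0 - sum_sq)    # corrected for sample size
    # E5 is undefined when there is a single MLG (H == 0).
    e5 = (g - 1.0) / (math.exp(h) - 1.0) if h > 0 else float("nan")
    return {"N": n, "MLG": len(counts), "H": h, "G": g, "Hexp": hexp, "E.5": e5}

def expected_mlg(mlg_ids, n_sub):
    """Expected number of MLGs in a random subsample of n_sub individuals."""
    counts = list(Counter(mlg_ids).values())
    n = sum(counts)
    total = math.comb(n, n_sub)
    # For each MLG: 1 minus the chance that the subsample misses it entirely.
    return sum(1.0 - math.comb(n - c, n_sub) / total for c in counts)

# Hypothetical population: five individuals, three MLGs.
stats = diversity([1, 1, 1, 2, 3])
```

As a consistency check on the formulas, inserting the Athena row's N = 97, H = 4.063 and G = 42.193 gives E5 of about 0.721 and Hexp of about 0.986, matching the table. Rarefying to a population's own size returns its observed MLG count, which agrees with Mt. Vernon's eMLG of 50 and standard error of 0.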
Permutation algorithms in poppr.
These are implemented in the calculation of p-values for the index of association (Ia) and its standardized form (rbarD), and are iterated over all loci independently.
| Method | Name | Units sampled | With replacement | Weight |
|---|---|---|---|---|
| 1 | | alleles | No | - |
| 2 | | alleles | Yes | allele frequencies |
| 3 | | alleles | Yes | equal |
| 4 | | genotypes | No | - |
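The four rows differ only in what is resampled at a locus and how. A minimal Python sketch of the four schemes follows. It is not poppr's implementation, the function names are hypothetical, and it assumes equal ploidy and no missing data. "Equal" weight in method 3 is read here as every distinct allele being equally likely, which is an interpretation of the table rather than something it states.

```python
import random

def shuffle_locus(genotypes, method, rng):
    """Resample one locus. genotypes: list of allele tuples, one per individual."""
    ploidy = len(genotypes[0])
    pool = [a for g in genotypes for a in g]   # every allele copy at this locus
    if method == 1:
        # Alleles, without replacement: permute the observed allele copies.
        alleles = pool[:]
        rng.shuffle(alleles)
    elif method == 2:
        # Alleles, with replacement, weighted by allele frequency: drawing
        # uniformly from the pool of copies gives exactly that weighting.
        alleles = rng.choices(pool, k=len(pool))
    elif method == 3:
        # Alleles, with replacement, equal weight per distinct allele (assumed).
        alleles = rng.choices(sorted(set(pool)), k=len(pool))
    elif method == 4:
        # Genotypes, without replacement: permute whole genotypes.
        shuffled = list(genotypes)
        rng.shuffle(shuffled)
        return shuffled
    else:
        raise ValueError("method must be 1, 2, 3 or 4")
    # Reassemble the resampled alleles into genotypes of the original ploidy.
    return [tuple(alleles[i * ploidy:(i + 1) * ploidy])
            for i in range(len(genotypes))]

def shuffle_all_loci(data, method, seed=None):
    """data: dict mapping locus name to its genotype list. Loci are treated independently."""
    rng = random.Random(seed)
    return {locus: shuffle_locus(g, method, rng) for locus, g in data.items()}
```

Because each locus is resampled on its own, any association between loci in the original data is broken, which is what makes the result usable as a null distribution.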
Figure 1. Multilocus genotype histogram.
Distribution of 12 multilocus genotypes from the Finland population of the H3N2 SNP data set (Jombart, 2008).
Figure 2. Linkage disequilibrium.
Visualizations of tests for linkage disequilibrium, where observed values (blue dashed lines) of Ia and rbarD are compared to histograms showing results of 999 permutations using method 1 in Table 1. Results are shown for the sexual population 5 of the nancycats data set (Jombart, 2008) (A) and for the clonal Athena population of the Aeut data set (Grünwald et al., 2003) (B).
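The test compares an observed statistic against its distribution under shuffling. The Python sketch below illustrates the idea and is not poppr's code. It uses the standard definitions Ia = V_O/V_E - 1 and rbarD = (V_O - V_E) / (2 Σ sqrt(var_j var_k)), where V_O is the variance of pairwise distances summed over loci and V_E is the sum of the per-locus variances. Three choices are assumptions made for brevity: the per-locus distance (number of alleles not shared), the shuffling of whole genotypes within each locus, and the p-value convention (hits + 1)/(permutations + 1). All names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

def locus_distance(g1, g2):
    """Number of alleles in g1 that are not matched in g2 (assumed metric)."""
    shared = sum((Counter(g1) & Counter(g2)).values())
    return len(g1) - shared

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def ia_rbard(data):
    """data: list of individuals, each a list of per-locus genotype tuples.

    Assumes at least two polymorphic loci, otherwise a denominator is zero.
    """
    n_loci = len(data[0])
    pairs = list(combinations(range(len(data)), 2))
    # d[j][k]: distance at locus j for the k-th pair of individuals
    d = [[locus_distance(data[a][j], data[b][j]) for a, b in pairs]
         for j in range(n_loci)]
    var_j = [variance(row) for row in d]
    v_obs = variance([sum(col) for col in zip(*d)])
    v_exp = sum(var_j)
    ia = v_obs / v_exp - 1.0
    denom = 2.0 * sum(math.sqrt(var_j[j] * var_j[k])
                      for j, k in combinations(range(n_loci), 2))
    return ia, (v_obs - v_exp) / denom

def permutation_test(data, n_perm=999, seed=0):
    """Observed Ia and rbarD with one-sided permutation p-values."""
    rng = random.Random(seed)
    obs_ia, obs_rd = ia_rbard(data)
    hits_ia = hits_rd = 0
    for _ in range(n_perm):
        columns = []
        for j in range(len(data[0])):
            col = [ind[j] for ind in data]
            rng.shuffle(col)               # shuffle each locus independently
            columns.append(col)
        ia, rd = ia_rbard([list(ind) for ind in zip(*columns)])
        hits_ia += ia >= obs_ia
        hits_rd += rd >= obs_rd
    return {"Ia": obs_ia, "p.Ia": (hits_ia + 1) / (n_perm + 1),
            "rbarD": obs_rd, "p.rD": (hits_rd + 1) / (n_perm + 1)}
```

Under this p-value convention the smallest value obtainable from 999 permutations is 0.001. With completely linked loci Ia equals the number of loci minus one while rbarD equals 1, which illustrates why rbarD is easier to compare across data sets with different numbers of loci.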
Figure 3. Minimum spanning network.
Example minimum spanning network using Bruvo’s distance on a simulated partially clonal data set with 50 individuals genotyped over 10 microsatellite loci produced with the software SimuPOP v.1.0.8 (Peng & Amos, 2008). Each node represents a unique multilocus genotype. Node shading (color) represents population membership, while edge widths and shading represent relatedness. Edge length is arbitrary.
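To make the two ingredients of such a network concrete, here is a minimal Python sketch, not poppr's implementation. It computes Bruvo's distance in its basic form (equal ploidy, no missing alleles): allele differences are counted in repeat units, each allele pair contributes 1 - 2^(-|x|), the allele pairing with the smallest total is used, and loci are averaged. It then builds a single minimum spanning tree with Prim's algorithm. A full minimum spanning network may also retain tied edges, which this sketch ignores. Function names, and the assumption that alleles are fragment lengths with a known repeat length per locus, are hypothetical.

```python
from itertools import permutations

def bruvo_locus(g1, g2, repeat_len):
    """Bruvo's distance at one locus for two genotypes of equal ploidy."""
    best = min(
        sum(1 - 2 ** -abs((a - b) / repeat_len) for a, b in zip(g1, perm))
        for perm in permutations(g2)      # try every pairing of alleles
    )
    return best / len(g1)

def bruvo_distance(ind1, ind2, repeat_lens):
    """Average the per-locus distances over all loci."""
    per_locus = [bruvo_locus(a, b, r) for a, b, r in zip(ind1, ind2, repeat_lens)]
    return sum(per_locus) / len(per_locus)

def minimum_spanning_tree(dist):
    """Prim's algorithm on a square distance matrix, returning (i, j, weight) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the tree to a node outside it.
        i, j = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((i, j, dist[i][j]))
        in_tree.add(j)
    return edges

# Hypothetical unique genotypes: one diploid locus with a 2 bp repeat.
nodes = [[(100, 100)], [(100, 102)], [(100, 110)]]
dist = [[bruvo_distance(a, b, [2]) for b in nodes] for a in nodes]
tree = minimum_spanning_tree(dist)
```

In a drawing like the one described in the caption, each returned edge weight would be mapped to an edge width and shade rather than to a length.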
Figure 4. Dendrogram based on genetic distance.
UPGMA tree produced from Bruvo’s distance with 1000 bootstrap replicates (node values greater than 50% are shown). Data from population 9 of the nancycats data set (Jombart, 2008).
Citation of methods and indices implemented in poppr.
| Method/Index | Citation | Function(s) in |
|---|---|---|
| Expected MLG (rarefaction) | | |
| Clone correction | | |
| Minimum Spanning Networks | | |
| Bruvo’s Distance | | |
| Bootstrapping | | |
| Neighbor Joining | | |
| UPGMA | | |
Comparison of programs that calculate the index of association (Ia).
| | | | | | |
|---|---|---|---|---|---|
| | Yes | Yes | Yes | Yes | Yes |
| | Yes | No | No | Yes | Yes |
| | Yes | Yes | Yes | No | No |
Performance comparison.
Comparison of performance on one data set of 237 individuals over nine loci. Each time point represents an average of 10 independent runs. Calculations of the index of association (Ia) are based on 100 permutations.
| | | |
|---|---|---|
| | 13.4 | 0.3 |
| | - | 58.3 |
| | 547.2 | - |