Paul C D Johnson, Daniel T Haydon.
Abstract
Microsatellite genetic marker data are exploited in a variety of fields, including forensics, gene mapping, kinship inference and population genetics. In all of these fields, inference can be thwarted by failure to quantify and account for data errors, and kinship inference in particular can benefit from separating errors into two distinct classes: allelic dropout and false alleles. Pedant is MS Windows software for estimating locus-specific maximum likelihood rates of these two classes of error. Estimation is based on comparison of duplicate error-prone genotypes: neither reference genotypes nor pedigree data are required. Other functions include: plotting of error rate estimates and confidence intervals; simulations for performing power analysis and for testing the robustness of error rate estimates to violation of the underlying assumptions; and estimation of expected heterozygosity, which is a required input. The program, documentation and source code are available from http://www.stats.gla.ac.uk/~paulj/pedant.html.
Keywords: allelic dropout; false alleles; genotyping error; maximum likelihood; microsatellites; software
Year: 2009 PMID: 20066126 PMCID: PMC2789690 DOI: 10.4137/bbi.s373
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. The Pedant interface, showing ML error rate estimates for allelic dropout and false alleles with 90% and 95% confidence regions.
Summary statistics assessing the performance of the programs MasterBayes and Pedant in estimating error rates from two simulated data sets.
| Low error rates ( | MasterBayes | ɛ1 = 0.01 | 0.0118 (0.0105, 0.0130) | 0.0111 (0.0072, 0.0160) | 17.9% |
| | | ɛ2 = 0.005 | 0.0066 (0.0058, 0.0073) | 0.0056 (0.0038, 0.0085) | 31.3% |
| | Pedant | ɛ1 = 0.01 | 0.0107 (0.0095, 0.0119) | 0.0098 (0.0061, 0.0147) | 7.0% |
| | | ɛ2 = 0.005 | 0.0041 (0.0034, 0.0048) | 0.0032 (0.000, 0.0062) | −18.3% |
| High error rates ( | MasterBayes | ɛ1 = 0.05 | 0.0526 (0.0496, 0.0556) | 0.0517 (0.0411, 0.0619) | 5.2% |
| | | ɛ2 = 0.02 | 0.0228 (0.0211, 0.0244) | 0.0221 (0.0169, 0.0287) | 13.8% |
| | Pedant | ɛ1 = 0.05 | 0.0511 (0.0483, 0.0539) | 0.0516 (0.0409, 0.0598) | 2.2% |
| | | ɛ2 = 0.02 | 0.0193 (0.0177, 0.0208) | 0.0178 (0.0140, 0.0248) | −3.7% |
For each data set, 100 duplicate genotypes were simulated in MasterBayes from 100 loci with 10 alleles per locus. Allele frequencies for each locus were generated randomly from the broken-stick distribution.
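As an illustration of the allele-frequency step, the following is a minimal sketch of the standard broken-stick construction: the unit interval is cut at k − 1 uniform random points and the segment lengths are taken as the k allele frequencies. It also computes the textbook expected heterozygosity, 1 − Σp², for each locus. The function names and the seed are hypothetical, and this is not claimed to match the MasterBayes or Pedant implementations, which the text does not describe.

```python
import random


def broken_stick_frequencies(n_alleles, rng):
    """Cut [0, 1] at n_alleles - 1 uniform points; segment lengths are the frequencies."""
    cuts = sorted(rng.random() for _ in range(n_alleles - 1))
    edges = [0.0] + cuts + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def expected_heterozygosity(freqs):
    """Textbook expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


# Settings taken from the table note: 100 loci, 10 alleles per locus.
rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
loci = [broken_stick_frequencies(10, rng) for _ in range(100)]
he = [expected_heterozygosity(f) for f in loci]

print(f"loci: {len(loci)}, alleles per locus: {len(loci[0])}")
print(f"mean expected heterozygosity: {sum(he) / len(he):.3f}")
```

Broken-stick frequencies are typically uneven, so the per-locus expected heterozygosity will generally fall below the maximum of 0.9 that ten equally frequent alleles would give.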
In MasterBayes, locus-specific error rates were estimated as the median of the relevant marginal distribution based on 100,000 MCMC iterations (after discarding 10,000 burn-in iterations). In Pedant, the ML estimates located during 10,000 search iterations were used.
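The percentages in the final column of the table appear to be the relative deviation of the first estimate column from the true error rate; the column headings did not survive, so this reading is an assumption. The sketch below recomputes that quantity from the rounded values shown in the table. Small differences from the reported percentages are to be expected, presumably because the published figures were computed from unrounded estimates. The variable and function names are hypothetical.

```python
# (data set, program, error class, true rate, first estimate column, reported %)
rows = [
    ("low", "MasterBayes", "e1", 0.01, 0.0118, 17.9),
    ("low", "MasterBayes", "e2", 0.005, 0.0066, 31.3),
    ("low", "Pedant", "e1", 0.01, 0.0107, 7.0),
    ("low", "Pedant", "e2", 0.005, 0.0041, -18.3),
    ("high", "MasterBayes", "e1", 0.05, 0.0526, 5.2),
    ("high", "MasterBayes", "e2", 0.02, 0.0228, 13.8),
    ("high", "Pedant", "e1", 0.05, 0.0511, 2.2),
    ("high", "Pedant", "e2", 0.02, 0.0193, -3.7),
]


def relative_deviation_pct(estimate, true_rate):
    """Percentage deviation of an estimate from the true (simulated) rate."""
    return 100.0 * (estimate - true_rate) / true_rate


recomputed = [relative_deviation_pct(est, true) for _, _, _, true, est, _ in rows]

for (dataset, program, err, _, _, reported), value in zip(rows, recomputed):
    print(f"{dataset:<5} {program:<12} {err}  recomputed {value:+6.1f}%  reported {reported:+6.1f}%")
```

Under this reading, positive values indicate overestimation of the simulated rate and negative values indicate underestimation.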