Suresh A Sethi, Daniel Linden, John Wenburg, Cara Lewis, Patrick Lemons, Angela Fuller, Matthew P Hare.
Abstract
Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark-recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark-recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark-recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark-recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark-recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.
Keywords: capture–recapture; genotyping error; inference; non-invasive; sample matching
Year: 2016; PMID: 28083094; PMCID: PMC5210676; DOI: 10.1098/rsos.160457
Source DB: PubMed; Journal: R Soc Open Sci; ISSN: 2054-5703; Impact factor: 2.963
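The abstract describes match calling as combining probabilities of latent genotypes with probabilities of observed genotypes. The following is a minimal sketch of that idea for biallelic markers. It is not the model evaluated in the paper. It assumes Hardy-Weinberg genotype frequencies, a symmetric per-locus error model in which a misread genotype is equally likely to be either wrong genotype, and a comparison of only two hypotheses (same individual versus unrelated). All function names are hypothetical.

```python
import math

def genotype_prior(maf):
    """Hardy-Weinberg prior over latent genotypes 0, 1, 2 (minor-allele counts)."""
    q = 1.0 - maf
    return [q * q, 2.0 * maf * q, maf * maf]

def obs_prob(observed, latent, err):
    """Assumed error model: a locus is misread with probability err and then
    lands on either of the two wrong genotypes with equal probability."""
    return 1.0 - err if observed == latent else err / 2.0

def locus_log_lr(o1, o2, maf, err):
    """Log-likelihood ratio (same individual vs. unrelated) at one locus."""
    prior = genotype_prior(maf)
    # Same individual: both observations arise from one shared latent genotype.
    same = sum(prior[g] * obs_prob(o1, g, err) * obs_prob(o2, g, err)
               for g in range(3))
    # Unrelated: each observation has its own independent latent genotype.
    marg1 = sum(prior[g] * obs_prob(o1, g, err) for g in range(3))
    marg2 = sum(prior[g] * obs_prob(o2, g, err) for g in range(3))
    if same == 0.0:
        # With err == 0, any mismatch rules out a shared latent genotype.
        return float("-inf")
    return math.log(same) - math.log(marg1 * marg2)

def match_log_lr(sample1, sample2, mafs, err):
    """Sum per-locus log-likelihood ratios, treating loci as independent."""
    return sum(locus_log_lr(a, b, p, err)
               for a, b, p in zip(sample1, sample2, mafs))
```

Under this sketch a positive total favours the same-individual hypothesis, so a single discrepant locus among many concordant ones need not overturn a match call, whereas an exact-match rule would reject the pair.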
Simulation scenarios for performance testing the error-tolerant likelihood-based match calling model.ᵃ
| marker scenarios | no. loci | no. alleles | allele frequencies | error rate (per locus) | allele specification | error rate specification |
|---|---|---|---|---|---|---|
| SNP: baseline | 32, 48, 64, 80, 96, 128 | 2 | MAF: 0.2, 0.3, 0.4 | 0.00, 0.01, 0.02, 0.05, 0.10, 0.25 | equivalent to data generation | equivalent to data generation |
| SNP: misspecified error rates | 48, 64 | 2 | MAF: 0.3 | 0.01, 0.10 | equivalent to data generation | error rates underestimated by 50% and overestimated by 50% |
| SNP: misspecified allele frequencies | 48, 64 | 2 | MAF: 0.3 | 0.01, 0.10 | MAF underestimated by 0.1 or overestimated by 0.1 | equivalent to data generation |
| SNP: inclusion of high-error loci | 64 | 2 | MAF: 0.3 | 48 loci at 0.01, 16 at 0.10; 48 loci at 0.10, 16 at 0.25 | equivalent to data generation | equivalent to data generation |
| MSAT: baseline | 5, 10, 15, 20 | 5, 10, 20 | all alleles equal, frequency = 1/(no. alleles) | (ADO, FA): (0.00, 0.00), (0.01, 0.01), (0.05, 0.02), (0.20, 0.05) | equivalent to data generation | equivalent to data generation |
| MSAT: misspecified error rates | 10, 15 | 10 | all alleles equal, frequency = 1/(no. alleles) | (ADO, FA): (0.05, 0.02), (0.20, 0.05) | equivalent to data generation | error rates underestimated by 50% and overestimated by 50% |
| MSAT: misspecified allele frequencies | 10, 15 | 10 | all alleles equal, frequency = 1/(no. alleles) | (ADO, FA): (0.05, 0.02), (0.20, 0.05) | 5 alleles at 0.15 and 5 alleles at 0.05 | equivalent to data generation |
| MSAT: inclusion of high-error loci | 15 | 10 | all alleles equal, frequency = 1/(no. alleles) | (ADO, FA): 10 loci at (0.05, 0.02), 5 loci at (0.20, 0.05) | equivalent to data generation | equivalent to data generation |
ᵃGenotyping error rates are per-locus; simulations convert per-locus error rates to per-allele rates (see Material and methods). MAF, minor allele frequency; ADO, allelic dropout; FA, false allele.
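The footnote states that per-locus error rates are converted to per-allele rates, but the conversion itself is given in the Material and methods and is not reproduced here. As an illustration only, the sketch below assumes a diploid locus whose two allele copies err independently at the same rate, so that a locus is in error when at least one allele is. Both function names are hypothetical, and the paper's conversion may differ, particularly for the ADO and FA error classes.

```python
import math

def locus_to_allele_rate(gamma):
    """Per-allele rate implied by per-locus rate gamma, assuming two allele
    copies that err independently: gamma = 1 - (1 - eps)**2."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("per-locus error rate must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - gamma)

def allele_to_locus_rate(eps):
    """Inverse conversion under the same independence assumption."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("per-allele error rate must lie in [0, 1]")
    return 1.0 - (1.0 - eps) ** 2

# Per-locus SNP error rates listed in the baseline scenario of the table.
for gamma in (0.00, 0.01, 0.02, 0.05, 0.10, 0.25):
    print(gamma, locus_to_allele_rate(gamma))
```

Under this assumption the per-allele rate is always slightly more than half of the per-locus rate, because a locus has two chances to be misread.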
Figure 1. Simulation results of the error-tolerant likelihood-based match calling model for biallelic (SNP) markers. Plots are organized by minor allele frequency (MAF) along rows and true relationship state along columns. Results are the proportion of comparisons of 10 000 pairs of simulated samples from a given relationship state, minor allele frequency, number of loci and per-locus genotyping error rate that are called a match. Pairs of genotypes simulated from the ‘same individual’ relationship represent recapture events, for which match call rates of 1.00 are perfectly accurate and rates below 1.00 indicate missed recapture events; pairs of genotypes simulated from the ‘full sibling’ or ‘unrelated’ relationship states represent samples from different individuals, for which match call rates of 0.00 are perfectly accurate and rates above 0.00 indicate false recapture events. Note that y-axis values differ across columns.
Figure 2. Simulation results of the error-tolerant likelihood-based match calling model for multiallelic (MSAT) markers. Plots are organized by the number of (equal frequency) alleles per locus, a, along rows, and true relationship state along columns. Results are the proportion of comparisons of 1000 pairs of simulated samples from a given relationship state, number of loci, number of alleles and per-locus genotyping error rate that are called a match. Pairs of genotypes simulated from the ‘same individual’ relationship represent recapture events, for which match call rates of 1.00 are perfectly accurate and rates below 1.00 indicate missed recapture events; pairs of genotypes simulated from the ‘full sibling’ or ‘unrelated’ relationship states represent samples from different individuals, for which match call rates of 0.00 are perfectly accurate and rates above 0.00 indicate false recapture events. Note that y-axis values differ across columns.
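The figure captions describe match call rates as the proportion of simulated sample pairs, drawn from a known relationship state, that are called a match. The sketch below shows one way such a simulation could be organised for biallelic loci. It is an illustration under stated assumptions, not the authors' simulation code: genotypes follow Hardy-Weinberg proportions, a misread locus is replaced by a uniformly chosen wrong genotype, and the likelihood-based caller is swapped for a deliberately crude stand-in rule (at most `max_mismatches` discrepant loci). All names are hypothetical.

```python
import random

def draw_genotype(maf, rng):
    """Minor-allele count (0, 1 or 2) from two independent allele draws."""
    return (rng.random() < maf) + (rng.random() < maf)

def draw_sibling_pair(maf, rng):
    """Two full siblings: each inherits one random allele from each parent."""
    mother = (rng.random() < maf, rng.random() < maf)
    father = (rng.random() < maf, rng.random() < maf)
    sib1 = rng.choice(mother) + rng.choice(father)
    sib2 = rng.choice(mother) + rng.choice(father)
    return sib1, sib2

def add_error(genotype, err, rng):
    """With probability err, replace the genotype by a random wrong one."""
    if rng.random() < err:
        return rng.choice([g for g in (0, 1, 2) if g != genotype])
    return genotype

def simulate_pair(relationship, n_loci, maf, err, rng):
    """Observed multilocus genotypes for one pair of samples."""
    s1, s2 = [], []
    for _ in range(n_loci):
        if relationship == "same individual":
            g1 = g2 = draw_genotype(maf, rng)
        elif relationship == "full sibling":
            g1, g2 = draw_sibling_pair(maf, rng)
        elif relationship == "unrelated":
            g1, g2 = draw_genotype(maf, rng), draw_genotype(maf, rng)
        else:
            raise ValueError(f"unknown relationship: {relationship}")
        s1.append(add_error(g1, err, rng))
        s2.append(add_error(g2, err, rng))
    return s1, s2

def match_call_rate(relationship, n_pairs, n_loci, maf, err,
                    max_mismatches, seed=0):
    """Proportion of simulated pairs called a match by the stand-in rule."""
    rng = random.Random(seed)
    calls = 0
    for _ in range(n_pairs):
        s1, s2 = simulate_pair(relationship, n_loci, maf, err, rng)
        mismatches = sum(a != b for a, b in zip(s1, s2))
        calls += mismatches <= max_mismatches
    return calls / n_pairs
```

Replacing the mismatch-count rule with a likelihood-based caller, and looping over the grid of loci, MAF and error rates, would give the kind of panel layout the captions describe.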
Match calling sensitivity analysis for SNPs.
| scenarioᵃ | match call rate (difference from exact)ᵇ | | |
|---|---|---|---|
| 0.9999 (0.0001) | |||
| 0.0109 (0.0045) | 0.9463 (0.0147) | ||
| 0.0036 (0.0011) | 0.9626 (0.0101) | ||
| 0.0132 (0.0068) | 0.9536 (0.022) | ||
| 0.0056 (0.0031) | 0.9668 (0.0143) | ||
| 1.0000 (0.0002) | |||
| 0.0470 (0.0406) | 0.9896 (0.0580) | ||
| 0.0262 (0.0237) | 0.9956 (0.0431) | ||
| high-error loci included, 48 with 1% error and 16 with 10% errorᶜ | 1.0000 (0.0002) | ||
| high-error loci included, 48 with 10% error and 16 with 25% errorᶜ | 0.9319 (0.0003) | ||
ᵃSee table 1 for additional scenario details. = vector of allele frequencies; γ = per-locus genotyping error rate.
ᵇMatch call rate discrepancies are calculated as (scenario specific match rate – match rate from exactly specified input parameters); zero differences are bolded, negative differences are italicized. See table 1 for additional scenario details.
ᶜResults are compared against a 48 SNP panel with 1% genotyping error.
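Because each cell reports a match call rate followed by its difference from the exactly specified case, the baseline rate can be recovered by subtraction. The helper below is a small sketch of that arithmetic. It assumes the value in parentheses carries its true sign, which may not hold in plain text where the italics marking negative differences are lost. The function name `parse_cell` is hypothetical.

```python
import re

# A cell looks like "0.0109 (0.0045)": rate, then difference in parentheses.
CELL = re.compile(r"^\s*(-?\d+\.\d+)\s*\(\s*(-?\d+\.\d+)\s*\)\s*$")

def parse_cell(cell):
    """Return (scenario_rate, difference, implied_exact_rate) for one cell.

    By the footnote's definition, difference = scenario rate - exact rate,
    so exact rate = scenario rate - difference.
    """
    m = CELL.match(cell)
    if m is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    rate, diff = float(m.group(1)), float(m.group(2))
    return rate, diff, rate - diff

rate, diff, exact = parse_cell("0.0109 (0.0045)")
print(f"scenario {rate:.4f}, exact {exact:.4f}")  # scenario 0.0109, exact 0.0064
```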
Match calling sensitivity analysis for MSATs.
| scenarioᵃ | match call rate (difference from exact)ᵇ | | |
|---|---|---|---|
| 0.006 (0.002) | 0.994 (0.001) | ||
| 0.003 (0.001) | 0.999 (0.001) | ||
| 0.044 (0.019) | 0.99 (0.005) | ||
| 0.011 (0.003) | 0.997 (0.000) | ||
| error rates overestimated, 10 loci, latent ADO = 0.05, FA = 0.02 | 0.010 (0.006) | 0.999 (0.006) | |
| error rates overestimated, 15 loci, latent ADO = 0.05, FA = 0.02 | 1.000 (0.002) | ||
| error rates overestimated, 10 loci, latent ADO = 0.20, FA = 0.05 | 0.090 (0.065) | 0.998 (0.013) | |
| error rates overestimated, 15 loci, latent ADO = 0.20, FA = 0.05 | 0.057 (0.049) | 1.000 (0.003) | |
| error rates underestimated, 10 loci, latent ADO = 0.05, FA = 0.02 | |||
| error rates underestimated, 15 loci, latent ADO = 0.05, FA = 0.02 | |||
| error rates underestimated, 10 loci, latent ADO = 0.20, FA = 0.05 | |||
| error rates underestimated, 15 loci, latent ADO = 0.20, FA = 0.05 | |||
| high-error loci included, 10 loci with ADO = 0.05 and FA = 0.02, 5 loci with ADO = 0.20 and FA = 0.05ᶜ | 0.999 (0.006) | ||
ᵃAll simulations specify loci with 10 alleles. See table 1 for additional scenario details. = vector of allele frequencies.
ᵇMatch call rate discrepancies are calculated as (scenario specific match rate – match rate from exactly specified input parameters); zero differences are bolded, negative differences are italicized.
ᶜResults are compared against a 10-locus panel with ADO = 0.05 and FA = 0.02.
Sample clustering results for Pacific walrus SNP multilocus genotype data.ᵃ
| no. loci | within-occasion recaptures 2013 | within-occasion recaptures 2014 | between-occasion recapturesᵇ | range in common positive PCR loci for matches | maximum number of discrepancies observed in matches |
|---|---|---|---|---|---|
| 31 (sets 1 and 2) | 231 | 340 | 8 | 27–31 | 1 |
| 32 (sets 1 and 3) | 231 | 343 | 8 | 27–32 | 1 |
| 31 (sets 2 and 3) | 237 | 347 | 8 | 27–31 | 1 |
| 47 (sets 1–3) | 231 | 336 | 8 | 43–47 | 1 |
| 63 (sets 1–4) | 231 | 336 | 8 | 58–63 | 1 |
ᵃSNPs are grouped into sets of 16; one locus in set 2 was monomorphic (a single allele observed) and was removed from the analysis. A locus-level genotyping error floor of 1% was imposed for all loci during the likelihood-based error-tolerant matching protocol (see electronic supplementary material, table S4.1 for empirical rates).
ᵇAll SNP set combinations identified the same individuals as recaptures across sampling occasions.
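The table separates within-occasion recaptures from between-occasion recaptures, following the two-stage clustering described in the abstract. A minimal sketch of that two-stage idea is given below, assuming pairwise match calls are supplied by a caller-provided function `is_match` and that clusters are formed by single linkage with a disjoint-set structure. These are illustrative assumptions: the clustering algorithm assessed in the paper is not specified here and may resolve conflicting pairwise calls differently. All names are hypothetical.

```python
from itertools import combinations

class DisjointSet:
    """Union-find over hashable sample identifiers."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def two_stage_clusters(samples, is_match):
    """samples maps sample_id -> (occasion, genotype); returns a list of sets."""
    ds = DisjointSet(samples)
    by_occasion = {}
    for sid, (occasion, _) in samples.items():
        by_occasion.setdefault(occasion, []).append(sid)

    # Stage 1: group repeat captures within each sampling occasion.
    for ids in by_occasion.values():
        for a, b in combinations(ids, 2):
            if is_match(samples[a][1], samples[b][1]):
                ds.union(a, b)

    # One representative sample per stage-1 cluster.
    reps = {}
    for sid in samples:
        reps.setdefault(ds.find(sid), sid)

    # Stage 2: link representatives across different occasions.
    for a, b in combinations(list(reps.values()), 2):
        if samples[a][0] != samples[b][0] and is_match(samples[a][1],
                                                       samples[b][1]):
            ds.union(a, b)

    clusters = {}
    for sid in samples:
        clusters.setdefault(ds.find(sid), set()).add(sid)
    return list(clusters.values())

def capture_histories(clusters, samples, occasions):
    """One 0/1 tuple per cluster: was the individual seen on each occasion?"""
    return [tuple(int(any(samples[s][0] == occ for s in cluster))
                  for occ in occasions)
            for cluster in clusters]
```

As a usage example, with exact genotype equality standing in for the match caller, samples `a` and `b` (2013) and `d` (2014) sharing one genotype would form a single individual with capture history `(1, 1)`.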