Øystein A Haaland, Kevin A Glover, Bjørghild B Seliussen, Hans J Skaug.
Abstract
BACKGROUND: The use of DNA methods for the identification and management of natural resources is gaining importance. In the future, it is likely that DNA registers will play an increasing role in this development. Microsatellite markers have been the primary tool in ecological, medical and forensic genetics for the past two decades. However, these markers are prone to genotyping errors and are difficult to calibrate between laboratories and genotyping platforms. The Norwegian minke whale DNA register (NMDR) contains individual genetic profiles at ten microsatellite loci for 6737 individuals captured in the period 1997-2008. These analyses have been conducted in four separate laboratories for nearly a decade, and offer a unique opportunity to examine genotyping errors and their consequences in an individual-based DNA register. We re-genotyped 240 samples and, for the first time, applied a mixed regression model to look at potentially confounding effects on genotyping errors.
Year: 2011 PMID: 21507252 PMCID: PMC3112247 DOI: 10.1186/1471-2156-12-36
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Empirical estimates of error rates by laboratory.
| Lab (country) | Period | Sample size | LOCUS p | LOCUS SD(p) | ALLELE p | ALLELE SD(p) | PRED |
|---|---|---|---|---|---|---|---|
| Lab 1 (Canada) | 97-02 | 116 (120) | 0.02373 | 0.00443 | 0.01422 | 0.00246 | 0.02825 |
| Lab 2 (Iceland) | 03-05 | 60 | 0.00500 | 0.00288 | 0.00333 | 0.00166 | 0.00666 |
| Lab 3 (Iceland) | 06 | 19 (20) | 0 | - | 0 | - | 0 |
| Lab 4 (Norway) | 07-08 | 39 (40) | 0.00256 | 0.00256 | 0.00128 | 0.00128 | 0.00256 |
| Total | 97-08 | 234 (240) | 0.01325 | 0.00236 | 0.00812 | 0.00131 | 0.01617 |
The column "Period" shows the catch period for which each laboratory had responsibility for the NMDR. Except for the catches from 1997-98, which were genotyped in the year 2000, all individuals were genotyped the year following capture. The column "Sample size" contains the number of successfully regenotyped individuals from each laboratory, with the total number of regenotyping attempts in parentheses. Initially 20 samples were selected from each year, but 6 samples were discarded because of amplification failure. Locus EV001 was included for all probabilities of error in this table. In the columns labeled LOCUS and ALLELE, p is the mean error rate (number of errors detected divided by the number of loci or alleles, respectively), and SD(p) is the corresponding empirical standard deviation. The column labeled PRED contains the predicted value of the per-locus p, calculated from the per-allele p using (2).
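The PRED column can be checked numerically. Equation (2) is not reproduced in this excerpt, so the sketch below assumes the simple two-allele relation p_locus = 1 - (1 - p_allele)^2, in which a locus is miscalled when at least one of its two alleles is miscalled and the two allele errors are independent. The function and variable names are illustrative only. Under that assumption the tabulated PRED values are recovered to within the rounding of the inputs.

```python
def locus_error_from_allele_error(p_allele: float) -> float:
    """Per-locus error rate implied by a per-allele error rate.

    Assumption (not confirmed by the excerpt): the two alleles at a locus
    are miscalled independently, each with probability p_allele.
    """
    return 1.0 - (1.0 - p_allele) ** 2


# (per-allele p, tabulated PRED) copied from the table above.
ALLELE_AND_PRED = {
    "Lab 1": (0.01422, 0.02825),
    "Lab 2": (0.00333, 0.00666),
    "Lab 4": (0.00128, 0.00256),
    "Total": (0.00812, 0.01617),
}

for lab, (p_allele, pred) in ALLELE_AND_PRED.items():
    implied = locus_error_from_allele_error(p_allele)
    print(f"{lab}: implied {implied:.5f}, tabulated {pred:.5f}")
```

Comparing PRED with the empirical LOCUS column then shows how far the observed per-locus rate departs from what independent allele errors would give.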
Candidate model set for NMDR study.
| Model | Fixed | Rand | σ | AIC | p | SD(p) | CV(p) | W |
|---|---|---|---|---|---|---|---|---|
| M1 | LOCUS + YEAR | IND | 1.35 | 295.7 | 0.0774 | 0.0214 | 0.28 | 5.76e-1 |
| M2 | YEAR | IND | 1.24 | 297.5 | 0.0243 | 0.0045 | 0.19 | 2.34e-1 |
| M3 | LAB + LOCUS | IND | 1.43 | 299.3 | 0.0514 | | | 9.52e-2 |
| Lab1 | | | | | 0.0823 | 0.0242 | 0.29 | |
| Lab2 | | | | | 0.0185 | 0.0088 | 0.48 | |
| Lab4 | | | | | 0.0102 | 0.0089 | 0.87 | |
| M4 | LAB | IND | 1.32 | 301.0 | 0.0162 | | | 4.07e-2 |
| Lab1 | | | | | 0.0261 | 0.0045 | 0.17 | |
| Lab2 | | | | | 0.0054 | 0.0024 | 0.44 | |
| Lab4 | | | | | 0.0029 | 0.0025 | 0.86 | |
| M5 | LOCUS + YEAR | - | - | 303.1 | 0.0768 | 0.0239 | 0.29 | 1.42e-2 |
| M6 | LOCUS + YEAR | MP:IND | 1.22 | 303.5 | 0.0771 | 0.0237 | 0.33 | 1.17e-2 |
| M7 | YEAR | - | - | 303.6 | 0.024 | 0.0046 | 0.2 | 1.11e-2 |
| M8 | YEAR | MP:IND | 1.13 | 304.0 | 0.0242 | 0.0047 | 0.2 | 9.08e-3 |
| M9 | LAB + LOCUS | - | - | 307.1 | 0.0511 | | | 1.92e-3 |
| Lab1 | | | | | 0.0822 | 0.0263 | 0.32 | |
| Lab2 | | | | | 0.0182 | 0.0118 | 0.65 | |
| Lab4 | | | | | 0.0094 | 0.0097 | 1.04 | |
| M10 | LAB + LOCUS | MP:IND | 1.31 | 307.2 | 0.0516 | | | 1.83e-3 |
| Lab1 | | | | | 0.0828 | 0.0405 | 0.49 | |
| Lab2 | | | | | 0.0193 | 0.0172 | 0.89 | |
| Lab4 | | | | | 0.0102 | 0.0131 | 1.28 | |
| M11 | LAB | - | - | 307.5 | 0.016 | | | 1.58e-3 |
| Lab1 | | | | | 0.0259 | 0.0048 | 0.19 | |
| Lab2 | | | | | 0.0056 | 0.0032 | 0.58 | |
| Lab4 | | | | | 0.0028 | 0.0031 | 1.08 | |
| M12 | LAB | MP:IND | 1.20 | 307.9 | 0.0162 | | | 1.29e-3 |
| Lab1 | | | | | 0.0261 | 0.038 | 0.15 | |
| Lab2 | | | | | 0.0056 | 0.081 | 1.44 | |
| Lab4 | | | | | 0.0028 | 0.062 | 2.15 | |
| M13 | LOCUS | IND | 1.52 | 308.3 | 0.0513 | 0.0129 | 0.25 | 1.06e-3 |
| M14 | - | IND | 1.41 | 309.9 | 0.016 | 0.0026 | 0.17 | 4.75e-4 |
| M15 | LOCUS | MP:IND | 1.50 | 318.3 | 0.0523 | 0.0057 | 0.11 | 7.12e-6 |
| M16 | - | MP:IND | 1.36 | 319.1 | 0.0162 | 0.0055 | 0.36 | 4.78e-6 |
| M17 | LOCUS | - | - | 319.5 | 0.0512 | 0.0156 | 0.3 | 3.91e-6 |
| M18 | - | - | - | 319.8 | 0.016 | 0.0030 | 0.2 | 3.37e-6 |
Model fit and estimated error rates for different covariate models, sorted according to the AIC score (column five). The column "σ" contains the standard deviations of the random effects. The values in "p" are the probabilities that an error occurs at locus GATA417 in year 2001. Whenever LAB is included in the model, weighted means and laboratory-specific error rates are given (Lab 1, Lab 2, Lab 4). "SD(p)" and "CV(p)" are the bootstrap-based standard deviations and coefficients of variation of p. The column "W" contains the Akaike weights of the models.
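The Akaike weights in column "W" can be recomputed from the AIC column alone. The sketch below uses the standard definition, in which each weight is exp(-Δ/2) normalized over the candidate set and Δ is the AIC difference to the best model. The AIC values are copied from the table as printed, to one decimal, so the recomputed weights agree with the tabulated ones only approximately. The dictionary and function names are illustrative.

```python
import math

# AIC scores copied from the table (rounded to one decimal there).
AIC = {
    "M1": 295.7, "M2": 297.5, "M3": 299.3, "M4": 301.0, "M5": 303.1,
    "M6": 303.5, "M7": 303.6, "M8": 304.0, "M9": 307.1, "M10": 307.2,
    "M11": 307.5, "M12": 307.9, "M13": 308.3, "M14": 309.9, "M15": 318.3,
    "M16": 319.1, "M17": 319.5, "M18": 319.8,
}


def akaike_weights(aic: dict) -> dict:
    """Return the Akaike weight of every model in the candidate set."""
    best = min(aic.values())
    # Relative likelihood of each model given the data.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


weights = akaike_weights(AIC)
for model in ("M1", "M2", "M3", "M4"):
    print(f"{model}: W = {weights[model]:.3f}")
```

Because the weights depend only on AIC differences, a gap of about 1.8 between M1 and M2 already gives M1 roughly 2.5 times the weight of M2.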
Estimates of (per locus) error rates for best fitting model (M1).
| Locus | p | SD(p) | CV(p) |
|---|---|---|---|
| GATA417 | 0.0774 | 0.0214 (0.0241) | 0.28 |
| | 0.0283 | 0.0112 (0.0122) | 0.41 |
| | 0.0283 | 0.0130 (0.0134) | 0.46 |
| | 0.0213 | 0.0110 (0.0112) | 0.52 |
| | 0.0141 | 0.0093 (0.0095) | 0.66 |
| | 0.0141 | 0.0101 (0.0102) | 0.71 |
| | 0.0141 | 0.0083 (0.0085) | 0.59 |
| | 0.0141 | 0.0091 (0.0092) | 0.64 |
| | 0.0071 | 0.0084 (0.0084) | 1.19 |
| EV001 | 0 | 0 | - |
The values in "p" are the probabilities that an error occurs at a particular locus in year 2001. "SD(p)" and "CV(p)" are the bootstrap-based standard deviations and coefficients of variation of p. In parentheses are the "upper limit" standard deviations (i.e., bootstrap replicates that produced warning messages were included and counted as having zero errors).
Summary of genotyping errors observed in the NMDR according to laboratory.
| Single false allele size | 7 | 3 | 0 | 2 | 0 | 0 | |||
| Double false allele size | 1 | 5 | 0 | 0 | 0 | 0 | |||
| False homozygote | 5 | 4 | 1 | 0 | 0 | 0 | |||
| False heterozygote | 0 | 2 | 0 | 0 | 0 | 1 | |||
Columns labeled "Total" contain the total number of errors of a specific kind for each laboratory. A false allele size means that an allele was erroneously sized relative to its true size, for one allele (single) or both alleles (double). "= 1 bp" denotes that the erroneously called allele is at a distance of one from the true allele, whereas "> 1 bp" means that the distance was greater than one. A false homozygote is a case where the true genotype was a heterozygote, and a false heterozygote is a case where the true genotype was a homozygote.
Multilocus error rates.
| | P(E > 0) (%) | P(E = 1) (%) | P(E > 1) (%) | P(E > 1 \| E > 0) (%) |
|---|---|---|---|---|
| Dep. | 17.0 | 13.4 | 3.60 | 21.2 |
| Ind. | 20.1 | 18.3 | 1.77 | 8.84 |
Probability of having E errors in a 10 loci profile for model M1. The top row (Dep.) assumes dependence between the loci within an individual, and the bottom row (Ind.) assumes independence. P(E > 1|E > 0) is the conditional probability of having two or more errors given that an error occurs.
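The independence row (Ind.) can be approximated directly from the ten per-locus error rates reported for M1. The sketch below assumes those rates, as rounded in the per-locus table, and assumes that errors at different loci are independent; it reads the tabulated figures as percentages. It recovers the Ind. row to within rounding. The Dep. row depends on the individual random effect and is not reproduced here. Names are illustrative.

```python
from math import prod

# Per-locus error rates for M1 (year 2001), copied from the per-locus table.
LOCUS_ERROR_RATES = [
    0.0774, 0.0283, 0.0283, 0.0213, 0.0141,
    0.0141, 0.0141, 0.0141, 0.0071, 0.0,
]


def multilocus_error_probs(rates):
    """Return P(E>0), P(E=1), P(E>1), P(E>1 | E>0) for independent loci."""
    p_none = prod(1.0 - p for p in rates)
    # Exactly one error: locus i is wrong and every other locus is right.
    p_one = sum(
        p * prod(1.0 - q for j, q in enumerate(rates) if j != i)
        for i, p in enumerate(rates)
    )
    p_any = 1.0 - p_none
    p_multi = p_any - p_one
    return p_any, p_one, p_multi, p_multi / p_any


p_any, p_one, p_multi, p_cond = multilocus_error_probs(LOCUS_ERROR_RATES)
print(f"P(E>0)={p_any:.3f}  P(E=1)={p_one:.3f}  "
      f"P(E>1)={p_multi:.4f}  P(E>1|E>0)={p_cond:.3f}")
```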
Figure 1. Probability densities for error rates.
Error rates and allele lengths.
| Parameter | Coef | SD(coef) | Largest allele length | Allele length (from population allele frequencies) | Mean error rate |
|---|---|---|---|---|---|
| GATA417 | 1.59 | 0.38 | 252 | 223 | 0.0512 |
| | 0.40 | 0.52 | 211 | 203 | 0.0186 |
| | 0.40 | 0.52 | 223 | 202 | 0.0186 |
| | 0.09 | 0.58 | 166 | 155 | 0.0140 |
| | -0.35 | 0.68 | 217 | 203 | 0.0093 |
| | -0.35 | 0.68 | 125 | 117 | 0.0093 |
| | -0.35 | 0.68 | 116 | 106 | 0.0093 |
| | -0.35 | 0.68 | 115 | 103 | 0.0093 |
| | -1.08 | 0.93 | 107 | 93 | 0.0047 |
| EV001 | - | - | 175 | 153 | 0 |
| YEAR | -0.36 | 0.11 | - | - | - |
| Intercept | -4.74 | 0.45 | - | - | - |
| VAR(IND) | 1.81 | 0.94 | - | - | - |
The second column contains the values of the coefficients in M1 (Table 2), and the third their standard deviations. The fourth column is the length of the largest allele, and the fifth is an allele length calculated using the population allele frequencies. The last column is the mean error rate at the locus. EV001 was not included in the analyses and therefore has no coefficient in M1. Its correct value would have been -∞.
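Coefficients of this kind are usually read on the log-odds scale. The sketch below assumes a logit link and an individual random effect of zero, neither of which is stated in this excerpt. The coding of the YEAR covariate is also not given, so `year_value` is a hypothetical input and the printed probabilities are not expected to match the tabulated error rates. What the sketch does show is how a coefficient difference translates into an odds ratio between two loci, and why a locus with no observed errors corresponds to a coefficient of -∞.

```python
import math

# Fixed-effect estimates copied from the table above.
INTERCEPT = -4.74
YEAR_COEF = -0.36


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability (assumed logit link)."""
    return 1.0 / (1.0 + math.exp(-x))


def error_prob(locus_coef: float, year_value: float) -> float:
    """Error probability for an individual whose random effect is zero.

    year_value uses a hypothetical coding; the excerpt does not say how
    YEAR enters the model.
    """
    return inv_logit(INTERCEPT + locus_coef + YEAR_COEF * year_value)


def odds_ratio(coef_a: float, coef_b: float) -> float:
    """Odds of an error at locus a relative to locus b, all else equal."""
    return math.exp(coef_a - coef_b)


# Odds of an error at the locus with coefficient 1.59 versus one with 0.40.
print(f"odds ratio: {odds_ratio(1.59, 0.40):.2f}")
# A very negative coefficient drives the probability toward zero.
print(f"p at coef -30: {error_prob(-30.0, 0.0):.2e}")
```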