Susan E Johnston, Meri Lindqvist, Eero Niemelä, Panu Orell, Jaakko Erkinaro, Matthew P Kent, Sigbjørn Lien, Juha-Pekka Vähä, Anti Vasemägi, Craig R Primmer.
Abstract
BACKGROUND: DNA extracted from historical samples is an important resource for understanding the genetic consequences of anthropogenic influences and long-term environmental change. However, such samples generally yield DNA of lower quantity and quality, and the extent to which DNA degradation affects SNP genotyping success and allele frequency estimation is not well understood. Using a high-density SNP genotyping array, we genotyped individual and pooled DNA samples extracted from dried Atlantic salmon (Salmo salar) scales stored at room temperature for up to 35 years, and assessed genotyping success, repeatability and the accuracy of allele frequency estimation.
Year: 2013 PMID: 23819691 PMCID: PMC3716687 DOI: 10.1186/1471-2164-14-439
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Temporal variation of DNA quality in archived scale samples. We show relationships between A. total DNA concentration and year, and B. proportion of high molecular weight DNA (> 1000 bp) and year. Each point indicates an individual DNA extraction. Samples had been normalised to 50 ng/μl based on NanoDrop spectrophotometer readings before total DNA concentration and fragment sizes were measured using an Agilent 2100 Bioanalyzer.
Figure 2. Temporal variation of genotyping success in archived scale samples. We show relationships between A. sample call rate and year, and B. sample call rate and proportion of DNA > 1000 bp. Each point indicates an individual genotyping run.
Figure 3. Correlation between mean estimated allele frequencies and empirical allele frequencies within each pool. Means were calculated from nine replicates within each pool. R² is the adjusted R² value from a linear regression. N is the number of individuals included in each pool.
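The adjusted R² reported in this caption can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of the replicate-mean estimated frequency on the empirical frequency. The function name `adjusted_r2` and all values are hypothetical, and only three replicates per locus are shown for brevity (the caption describes nine).

```python
from statistics import mean


def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R^2 of an ordinary least-squares fit of y on x."""
    n = len(x)
    if n != len(y) or n <= n_predictors + 1:
        raise ValueError("need equal lengths and more points than parameters")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Penalise R^2 for the number of predictors in the model.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical data for one pool: empirical frequency per locus and
# replicate estimates per locus (illustrative values only).
empirical = [0.10, 0.25, 0.40, 0.55, 0.80]
replicates = [
    [0.12, 0.11, 0.13],
    [0.27, 0.24, 0.26],
    [0.43, 0.41, 0.39],
    [0.58, 0.56, 0.57],
    [0.79, 0.82, 0.81],
]
estimated_mean = [mean(r) for r in replicates]
print(round(adjusted_r2(empirical, estimated_mean), 4))
```

Averaging the replicates before fitting mirrors the caption's description; fitting each replicate separately would be a different analysis.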
Figure 4. Histogram of the mean difference between the empirical and estimated allele frequencies for each locus. The vertical dotted line indicates the mean of the distribution (0.0253).
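As a rough illustration of the quantity plotted here, the sketch below computes a per-locus mean difference across pools and then the mean of that distribution. It assumes the difference is taken as an absolute value, which the caption does not state. The function name, locus names and frequencies are hypothetical.

```python
from statistics import mean


def per_locus_mean_abs_diff(empirical, estimated):
    """Map each locus to the mean |empirical - estimated| over its pools.

    Both arguments are dicts: locus name -> list of frequencies, one per pool.
    Using the absolute difference is an assumption of this sketch.
    """
    result = {}
    for locus, emp in empirical.items():
        est = estimated[locus]
        if len(emp) != len(est):
            raise ValueError(f"pool count mismatch at locus {locus}")
        result[locus] = mean(abs(e - s) for e, s in zip(emp, est))
    return result


# Hypothetical frequencies for two loci across four pools.
empirical = {"snp_a": [0.10, 0.20, 0.15, 0.30], "snp_b": [0.45, 0.50, 0.40, 0.55]}
estimated = {"snp_a": [0.12, 0.17, 0.15, 0.33], "snp_b": [0.47, 0.46, 0.42, 0.55]}

diffs = per_locus_mean_abs_diff(empirical, estimated)
print(diffs)
print(mean(diffs.values()))  # mean of the per-locus distribution
```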
Results of a general linear model testing factors affecting the accuracy of allele frequency estimation
| Term | Estimate | S.E. | t-value | P-value |
|---|---|---|---|---|
| Minor Allele Frequency | 0.0211 | 0.00180 | 11.75 | < 0.001 |
| GenTrain Score | −0.0226 | 0.00196 | −11.49 | < 0.001 |
| Locus classification: Mono | −0.0149 | 0.00094 | −15.74 | < 0.001 |
| Locus classification: MSV-3 | −0.0045 | 0.00102 | −4.401 | < 0.001 |
| | | | | |
| Minor Allele Frequency | 0.0199 | 0.00352 | 5.65 | < 0.001 |
| GenTrain Score | −0.0227 | 0.00409 | −5.55 | < 0.001 |
| Locus classification: Mono | −0.0150 | 0.00185 | −8.14 | < 0.001 |
| Locus classification: MSV-3 | 0.0063 | 0.00186 | 3.37 | < 0.001 |
| | | | | |
| Minor Allele Frequency | 0.0093 | 0.00346 | 2.68 | 0.00735 |
| GenTrain Score | −0.0302 | 0.00407 | −7.43 | < 0.001 |
| Locus classification: Mono | −0.0145 | 0.00182 | −7.96 | < 0.001 |
| Locus classification: MSV-3 | −0.0035 | 0.00196 | −1.79 | 0.0730 |
The effects of minor allele frequency, GenTrain score and locus classification on the accuracy of allele frequency estimates were tested using the mean difference between empirical and estimated allele frequencies per locus. Allele frequencies were estimated from clusters of all individuals (N = 514) and from random subsets of 20 and 50 individuals. The mean difference in allele frequency was used as the dependent variable with a Gaussian error structure. S.E. is the standard error, t-value is the t statistic and P-value is the corresponding significance of the model term.
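A linear model with a Gaussian error structure is equivalent to ordinary least squares, so the kind of model described above can be sketched as follows. This is an illustration on simulated data, not the authors' analysis. The dummy coding, the unnamed `"baseline"` locus class, the function names and every simulated value are assumptions of the sketch.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_gaussian_lm(X, y):
    """Ordinary least squares: returns (estimates, standard errors, t-values)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(k)]
    return beta, se, [b / s for b, s in zip(beta, se)]


def design_row(maf, gentrain, locus_class):
    """Intercept, two covariates, and dummy variables for two locus classes."""
    return [1.0, maf, gentrain,
            1.0 if locus_class == "Mono" else 0.0,
            1.0 if locus_class == "MSV-3" else 0.0]


# Simulated loci (hypothetical values, chosen only to exercise the code).
rng = random.Random(1)
X, y = [], []
for _ in range(300):
    maf = rng.uniform(0.05, 0.5)
    gentrain = rng.uniform(0.5, 1.0)
    locus_class = rng.choice(["baseline", "Mono", "MSV-3"])
    mono_effect = -0.015 if locus_class == "Mono" else 0.0
    X.append(design_row(maf, gentrain, locus_class))
    y.append(0.04 + 0.02 * maf - 0.02 * gentrain + mono_effect + rng.gauss(0, 0.001))

terms = ["Intercept", "Minor Allele Frequency", "GenTrain Score", "Mono", "MSV-3"]
for name, b, s, t in zip(terms, *fit_gaussian_lm(X, y)):
    print(f"{name:24s} {b: .4f} {s:.5f} {t: .2f}")
```

Each printed row has the same shape as a table row (estimate, S.E., t-value). A P-value would additionally require the t distribution with n − k degrees of freedom, which this sketch leaves out.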
Figure 5. Boxplot demonstrating the accuracy of allele frequency estimation using subsets of reference individuals. Parameter estimation was carried out 100 times for each subset. A. The proportion of pooled samples for which frequency estimates could be calculated. B. The mean adjusted R² over all pools and all loci for each simulation. C. The mean difference between empirical and estimated allele frequencies over all pools and loci for each simulation.
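The repeated subsetting behind this figure can be sketched as a generic resampling loop. The sketch assumes subsets are drawn without replacement. The function `resample_metric` is hypothetical, and the toy metric (the difference between a subset's allele frequency and the full-set frequency at one simulated locus) stands in for the cluster-based parameter estimation, which is not reproduced here.

```python
import random
from statistics import median


def resample_metric(individuals, subset_size, metric, n_reps=100, seed=0):
    """Evaluate `metric` on `n_reps` random subsets drawn without replacement."""
    rng = random.Random(seed)
    return [metric(rng.sample(individuals, subset_size)) for _ in range(n_reps)]


# Toy genotypes: counts of one allele (0, 1 or 2) for 514 simulated individuals.
data_rng = random.Random(42)
genotypes = {i: data_rng.choice([0, 1, 1, 2]) for i in range(514)}
ids = list(genotypes)


def allele_freq(sample_ids):
    return sum(genotypes[i] for i in sample_ids) / (2 * len(sample_ids))


full_freq = allele_freq(ids)


def abs_error(sample_ids):
    """Toy stand-in metric: |subset frequency - full-set frequency|."""
    return abs(allele_freq(sample_ids) - full_freq)


for size in (20, 50):
    errors = resample_metric(ids, size, abs_error)
    print(size, round(median(errors), 4), round(max(errors), 4))
```

The per-size lists returned by `resample_metric` are the kind of values a boxplot would summarise.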
Figure 6. Correlation between empirical and estimated allele frequencies calculated from 20 (left) and 50 (right) individuals. Each point represents the mean allele frequency calculated from nine replicates within each of the four pools (i.e. four points per locus). Empirical allele frequencies were determined from the full dataset (N = 514). R² is the adjusted R² value from a linear regression (red line). Note that some points lie far from the regression line; these are cases where clusters in duplicated regions of the genome were placed incorrectly because of the small number of reference individuals.