Jesper Brohede, Rob Dunne, James D McKay, Garry N Hannan.
Abstract
Robust estimation of allele frequencies in pools of DNA has the potential to reduce genotyping costs and/or increase the number of individuals contributing to a study in which hundreds of thousands of genetic markers need to be genotyped in very large population sample sets, such as genome-wide association studies. To make accurate allele frequency estimates from pooled samples, a correction for unequal allele representation must be applied. We have developed the polynomial-based probe-specific correction (PPC), a novel correction algorithm for accurate estimation of allele frequencies in data from high-density microarrays. The algorithm was validated by comparing allele frequencies from a set of 10 individually genotyped DNAs with frequencies estimated from pools of these 10 DNAs using GeneChip 10K Mapping Xba 131 arrays. Our results demonstrate that using the PPC to correct for allelic biases dramatically increases the accuracy of the allele frequency estimates.
Year: 2005 PMID: 16199750 PMCID: PMC1240117 DOI: 10.1093/nar/gni142
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
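The abstract names a polynomial-based, probe-specific correction but does not spell out its form. The following is only a generic sketch of the idea, not the authors' PPC: it assumes that, for one probe, a raw relative signal has been measured for the three genotype classes (BB, AB, AA, with known allele frequencies 0, 0.5 and 1), fits a quadratic through those three calibration points, and uses it to map a pool's raw signal to a corrected frequency. The function names, the choice of a quadratic, and the calibration values are all illustrative assumptions.

```python
def fit_quadratic(points):
    """Return (a, b, c) of y = a*x**2 + b*x + c through three (x, y) points.

    Uses Lagrange interpolation; the three x values must be distinct.
    """
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    if 0 in (d0, d1, d2):
        raise ValueError("calibration signals must be distinct")
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c


def correct_signal(raw_signal, signal_bb, signal_ab, signal_aa):
    """Map a pool's raw relative signal to an allele frequency estimate.

    signal_bb, signal_ab, signal_aa are hypothetical per-probe calibration
    signals for genotypes with A-allele frequency 0, 0.5 and 1.
    """
    a, b, c = fit_quadratic(
        [(signal_bb, 0.0), (signal_ab, 0.5), (signal_aa, 1.0)]
    )
    estimate = a * raw_signal ** 2 + b * raw_signal + c
    # Clamp to the valid frequency range.
    return min(1.0, max(0.0, estimate))


# Hypothetical probe whose heterozygote signal sits at 0.6 instead of 0.5,
# i.e. the two alleles are not equally represented.
corrected = correct_signal(0.6, signal_bb=0.1, signal_ab=0.6, signal_aa=0.9)
print(round(corrected, 3))
```

With these made-up calibration values, a pool signal equal to the heterozygote signal is mapped back to a frequency of 0.5, which is the kind of bias removal the abstract refers to as correcting for unequal allele representation.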
Accuracy measures for allele frequency estimates using PPC
| Sample | Average accuracy | Largest under-estimation | Largest over-estimation | Proportion differing >10% |
|---|---|---|---|---|
| p10_rep1 | 0.048 (0.061) | −0.292 | 0.263 | 10.3 |
| p10_rep1_tech2 | 0.053 (0.069) | −0.288 | 0.284 | 14.2 |
| p10_rep1_tech3 | 0.067 (0.084) | −0.364 | 0.400 | 22.4 |
| p10_rep2 | 0.043 (0.056) | −0.244 | 0.247 | 7.6 |
| p10_rep3 | 0.062 (0.079) | −0.629 | 0.445 | 19.3 |
| p10average true replicas | 0.029 (0.038) | −0.194 | 0.206 | 1.8 |
| p10average technical replicas | 0.050 (0.065) | −0.294 | 0.294 | 12.1 |
Average accuracy is measured as specified in the text. Standard deviation in brackets.
Proportion differing >10% is the proportion of markers where the estimated allele frequency differs by more than ±10% of the deduced value.
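The columns of this table can be reproduced from paired vectors of estimated and deduced frequencies. The sketch below is written under explicit assumptions, because the definition "specified in the text" is not part of this excerpt: it takes accuracy to be the mean absolute difference between estimated and deduced frequency, takes the ±10% threshold as an absolute frequency difference of 0.10, and reports the proportion as a percentage. It also assumes each individual contributes equally to the pool when deducing the true frequency. All function names and example values are hypothetical.

```python
from statistics import mean, stdev


def deduced_frequency(a_allele_counts):
    """Pool allele frequency deduced from individual diploid genotypes.

    a_allele_counts holds 0, 1 or 2 copies of allele A per individual;
    equal DNA contribution per individual is assumed.
    """
    return sum(a_allele_counts) / (2 * len(a_allele_counts))


def accuracy_summary(estimated, deduced, threshold=0.10):
    """Summarise estimation error across markers (assumed definitions)."""
    if len(estimated) != len(deduced) or len(estimated) < 2:
        raise ValueError("need two equal-length vectors with >= 2 markers")
    diffs = [e - d for e, d in zip(estimated, deduced)]
    abs_diffs = [abs(x) for x in diffs]
    n_over = sum(1 for x in abs_diffs if x > threshold)
    return {
        "average_accuracy": mean(abs_diffs),
        "sd": stdev(abs_diffs),
        "largest_under": min(diffs),
        "largest_over": max(diffs),
        "percent_over_threshold": 100.0 * n_over / len(diffs),
    }


# Hypothetical data: four markers in one pool.
deduced = [0.10, 0.50, 0.85, 0.30]
estimated = [0.12, 0.45, 0.70, 0.33]
for key, value in accuracy_summary(estimated, deduced).items():
    print(f"{key}: {value:.3f}")
```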
r² values from allele frequency estimates in all pools of 10 individuals
| | True allele frequency | p10_rep1 | p10_rep1_tech2 | p10_rep1_tech3 | p10_rep2 | p10_rep3 | p10average true replicas | p10average technical replicas |
|---|---|---|---|---|---|---|---|---|
| True allele frequency | 1 | 0.966 | 0.960 | 0.923 | 0.948 | 0.929 | 0.978 | 0.962 |
| p10_rep1 | | 1 | 0.971 | 0.944 | 0.914 | 0.934 | 0.983 | 0.985 |
| p10_rep1_tech2 | | | 1 | 0.963 | 0.892 | 0.952 | 0.973 | 0.992 |
| p10_rep1_tech3 | | | | 1 | 0.845 | 0.926 | 0.940 | 0.983 |
| p10_rep2 | | | | | 1 | 0.865 | 0.949 | 0.895 |
| p10_rep3 | | | | | | 1 | 0.969 | 0.950 |
| p10average true replicas | | | | | | | 1 | 0.978 |
| p10average technical replicas | | | | | | | | 1 |
Note that p10average true replicas and p10average technical replicas are averages of several samples rather than independent microarrays.
True allele frequency: allele frequencies deduced from individual genotyping.
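A pairwise r² table like the one above can be built from per-marker frequency vectors, with the averaged samples formed marker by marker from their member replicas. The sketch below assumes r² is the squared Pearson correlation, which this excerpt does not state explicitly; the sample names and values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


def average_replicas(replicas):
    """Marker-wise mean across replicate frequency vectors."""
    return [sum(values) / len(values) for values in zip(*replicas)]


def r_squared_matrix(samples):
    """Upper-triangular r squared table keyed by (row, column) sample names."""
    names = list(samples)
    return {
        (a, b): r_squared(samples[a], samples[b])
        for i, a in enumerate(names)
        for b in names[i:]
    }


# Hypothetical frequencies for five markers.
samples = {
    "true": [0.10, 0.35, 0.50, 0.70, 0.90],
    "rep1": [0.14, 0.30, 0.55, 0.66, 0.93],
    "rep2": [0.08, 0.41, 0.47, 0.75, 0.85],
}
samples["average"] = average_replicas([samples["rep1"], samples["rep2"]])
for pair, value in r_squared_matrix(samples).items():
    print(pair, round(value, 3))
```

Because the averaged sample is computed from the individual replicas, its correlation with those replicas is not independent evidence, which is the caveat the note under the table makes.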
Average accuracy in different allele frequency intervals for the average replica sample p10average true replicas
| Allele frequency interval | Average accuracy | No. of SNPs |
|---|---|---|
| 0.0–0.1 | 0.019 (0.014) | 125 |
| 0.1–0.2 | 0.029 (0.024) | 345 |
| 0.2–0.3 | 0.032 (0.028) | 631 |
| 0.3–0.4 | 0.029 (0.027) | 744 |
| 0.4–0.5 | 0.027 (0.023) | 793 |
| 0.5–0.6 | 0.028 (0.023) | 805 |
| 0.6–0.7 | 0.028 (0.024) | 779 |
| 0.7–0.8 | 0.030 (0.027) | 747 |
| 0.8–0.9 | 0.032 (0.026) | 501 |
| 0.9–1.0 | 0.026 (0.019) | 235 |
Average accuracy is measured as specified in the text. Standard deviation in brackets.
No. of SNPs is the number of SNPs in the interval.
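The per-interval breakdown can be sketched by binning markers on their deduced frequency and averaging the absolute error within each bin. As before, accuracy is assumed here to be the mean absolute difference; the handling of values that fall exactly on an interval boundary (assigned to the upper interval, with 1.0 kept in the last one) is also an assumption, since the excerpt does not say. Names and data are hypothetical.

```python
from statistics import mean


def accuracy_by_interval(estimated, deduced, n_bins=10):
    """Mean absolute error and marker count per deduced-frequency interval.

    Returns a list of (lower, upper, mean_abs_error, n_markers) tuples for
    the non-empty intervals, in increasing order of frequency.
    """
    if len(estimated) != len(deduced):
        raise ValueError("vectors must have equal length")
    bins = [[] for _ in range(n_bins)]
    for est, ded in zip(estimated, deduced):
        if not 0.0 <= ded <= 1.0:
            raise ValueError("deduced frequencies must lie in [0, 1]")
        # A frequency of exactly 1.0 is placed in the last interval.
        index = min(int(ded * n_bins), n_bins - 1)
        bins[index].append(abs(est - ded))
    rows = []
    for i, errors in enumerate(bins):
        if errors:
            rows.append((i / n_bins, (i + 1) / n_bins, mean(errors), len(errors)))
    return rows


# Hypothetical data: five markers.
deduced = [0.05, 0.15, 0.15, 0.95, 1.0]
estimated = [0.07, 0.10, 0.18, 0.90, 0.98]
for lower, upper, err, count in accuracy_by_interval(estimated, deduced):
    print(f"{lower:.1f}-{upper:.1f}: {err:.3f} (n={count})")
```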
Comparisons of the accuracy using different algorithms
| Reference | Average accuracy | No. of SNPs | Largest under-estimation | Largest over-estimation | Proportion differing >10% |
|---|---|---|---|---|---|
| This article | 0.029 (0.038) | 5705 | −0.194 | 0.206 | 1.8 |
| ( | 0.053 (0.060) | 7059 | −0.395 | 0.213 | 12.6 |
| ( | 0.070 (0.091) | 8179 | −0.514 | 0.441 | 25.5 |
| ( | 0.073 (0.092) | 7633 | −0.510 | 0.469 | 26.5 |
Average accuracy is measured as specified in the text. Standard deviation in brackets.
Proportion differing >10% is the proportion of markers where the estimated allele frequency differs by more than ±10% of the deduced value.
Figure 1. The relationship between the allele frequency deduced from individual genotyping and the allele frequency estimated with (A) the PPC described here, (B) the algorithm described in (20), (C) the algorithm described in (19) and (D) the algorithm described in (9). All estimates were based on the average of three replicas as specified in the text.
Previously described accuracy measures of SNP estimates in pooled DNA using non-array based technologies
| Reference | Technology | Accuracy | No. of SNPs studied | Pool size | No. of pools studied |
|---|---|---|---|---|---|
| ( | Real-time PCR | 0.02 | 5 | 10 | 1 |
| | | 0.02 | 3 | 100 | 1 |
| ( | Real-time PCR | 0.003 | 1 | 56 | 1 |
| | | 0.005 | 1 | 86 | 1 |
| | | 0.017 | 1 | 127 | 1 |
| ( | Pyrosequencing | 0.039 | 3 | 10 | 50 |
| | | 0.026 | 3 | 20 | 25 |
| | | 0.049 | 3 | 50 | 10 |
| | | 0.047 | 3 | 100 | 5 |
| | | 0.029 | 3 | 200 | 2 |
| | | 0.067 | 3 | 479 | 1 |
| ( | Pyrosequencing | 0.011 | 9 | 188 | 1 |
| | | 0.023 | 9 | 358 | 1 |
| | | 0.022 | 9 | 381 | 1 |
| | | 0.021 | 9 | 739 | 1 |
| ( | Pyrosequencing | 0.011 | 7 | 150 | 2 |
| ( | SnaPshot | 0.015 | 5 | 96 | 1 |
| ( | SnaPshot | 0.022 | 10 | 105 | 1 |
| ( | SnaPshot | 0.023 | 15 | 111–220 | NA |
| | | 0.017 | 7 | 130–222 | NA |
| ( | dHPLC | 0.017 | 5 | 96 | 1 |
| ( | dHPLC | 0.013 | 2 | 49–402 | 20 |
| ( | dHPLC | 0.015 | 9 | 111–220 | NA |
| ( | MALDI-TOF | 0.033 | 5 | 96 | 1 |
| ( | MALDI-TOF | 0.026 | 8 | 240 | 1 |
| | | 0.027 | 8 | 120 | 2 |
| | | 0.027 | 8 | 60 | 4 |
| ( | PLACE-SSCP | 0.017 | 1 | 78 | 1 |
The table gives an overview of results from previously published papers in the DNA pooling field. Its purpose is to contrast results from microarray technology with those from other genotyping technologies that have been used with pooled DNA. In some cases the accuracy is an average over several SNPs in multiple replicas, as specified in the original references, and all data have been corrected for biased allelic representation.
Note: technical replicas of the same pool have not been included.