David W Craig, Matthew J Huentelman, Diane Hu-Lince, Victoria L Zismann, Michael C Kruer, Anne M Lee, Erik G Puffenberger, John M Pearson, Dietrich A Stephan.
Abstract
BACKGROUND: Pooling genomic DNA samples within clinical classes of disease, followed by genotyping on whole-genome SNP microarrays, allows for rapid and inexpensive genome-wide association studies. Key to the success of these studies is the accuracy of the allelic frequency calculations, the ability to identify false positives arising from assay variability, and the ability to better resolve association signals through analysis of neighbouring SNPs.
Year: 2005 PMID: 16197552 PMCID: PMC1262713 DOI: 10.1186/1471-2164-6-138
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Example of RAS statistics for three SNPs based on genotyping of 100 individuals with an average call rate of all SNPs greater than 98%. These example SNPs illustrate how SNP call reliability can vary both between SNPs and within the same SNP, as measured by RAS1 and RAS2 values. Blue spheres are BB individuals, orange triangles are AA individuals, green squares are AB individuals, and grey stars are "Not Called".
Figure 2. (A) Allele frequency differences between individual and pooled genotypes. Histogram representing the total number of SNPs at each allele frequency difference between individual and pooled samples. (B) Accuracy of predicted SNP frequencies increases for those SNPs that perform well on Mapping 10K individual assays and decreases for poorly performing SNPs. The mean and median absolute difference between the predicted allelic frequency and individually genotyped allelic frequencies are shown vs. the binned performance of SNPs on individual assays. Performance is ranked by the frequency of calls in a set of 3,000 individually genotyped samples.
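The summary in panel (B) can be illustrated with a minimal sketch. It assumes each SNP is described by a call rate from individual genotyping, an individually genotyped allele frequency, and a pool-predicted allele frequency. The function name, the tuple layout, and the equal-sized binning are assumptions made for illustration and are not taken from the paper; the example values are made up.

```python
from statistics import mean, median


def abs_diff_by_performance_bin(snps, n_bins):
    """Summarise pooled-vs-individual frequency error per performance bin.

    snps: list of (call_rate, individual_freq, pooled_freq) tuples.
    SNPs are ranked by call rate (best first) and split into n_bins
    nearly equal bins. Returns one (mean, median) absolute difference
    pair per bin. Assumes n_bins <= len(snps) so that no bin is empty.
    """
    ranked = sorted(snps, key=lambda s: s[0], reverse=True)
    results = []
    for b in range(n_bins):
        start = b * len(ranked) // n_bins
        stop = (b + 1) * len(ranked) // n_bins
        diffs = [abs(ind - pooled) for _, ind, pooled in ranked[start:stop]]
        results.append((mean(diffs), median(diffs)))
    return results


# Made-up values for illustration only (not data from the study).
example = [
    (0.99, 0.50, 0.52),
    (0.98, 0.30, 0.31),
    (0.90, 0.40, 0.48),
    (0.85, 0.20, 0.30),
]
summary = abs_diff_by_performance_bin(example, 2)
```

In this toy input the bin of better-performing SNPs has the smaller mean absolute difference, which is the pattern the caption describes.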
Inaccurate SNPs, those with the largest difference between allele frequencies genotyped individually and allele frequencies calculated from pooled DNA, can be partially predicted. Nearly 40% of the 100 most inaccurate SNPs were also either (a) one of the 500 worst performing SNPs in individual genotyping or (b) among the SNPs with the largest variability between replicates in the pool.
| 24.2% | 27.3% | 38.6% | 27.2% |
| 12.5% | 14.5% | 22.1% | 20.2% |
| 4.4% | 5.5% | 8.6% | 5.0% |
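The kind of overlap reported in this table can be computed as a simple set operation. The sketch below is illustrative only: the function name and SNP identifiers are hypothetical, and it assumes the three SNP lists (most inaccurate, worst performing in individual genotyping, most variable between pool replicates) have already been derived.

```python
def fraction_predicted(inaccurate, poor_performers, variable_in_pool):
    """Fraction of inaccurate SNPs flagged by either predictor.

    A SNP counts as predicted if it appears among the poor performers
    in individual genotyping OR among the SNPs with high variability
    between pool replicates.
    """
    flagged = set(poor_performers) | set(variable_in_pool)
    inaccurate = set(inaccurate)
    return len(inaccurate & flagged) / len(inaccurate)


# Hypothetical SNP identifiers, for illustration only.
inaccurate = ["snp1", "snp2", "snp3", "snp4", "snp5"]
poor_performers = ["snp1", "snp9"]
variable_in_pool = ["snp2", "snp8"]
overlap = fraction_predicted(inaccurate, poor_performers, variable_in_pool)
```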
Figure 3. Identification of the SIDDT locus from pooled genomic DNA by calculating the mean test-statistic for a rolling window of consecutive SNPs. The moving window was determined across the genome and the p-value was calculated from a distribution of 400 bootstraps of the original dataset. Mean window sizes of 1, 3, 5, 10, 15, and 20 are shown and the SIDDT locus is highlighted in yellow. The SIDDT disease locus is the top region for window sizes of 1, 5, 10, 15, and 20.
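A rough sketch of the rolling-window statistic with a bootstrap p-value follows. The caption does not specify the resampling scheme, so this example assumes one plausible reading: each bootstrap replicate draws `window` test statistics with replacement from the genome-wide list and averages them, and the p-value of an observed window is the share of replicates at least as large. The function names, the add-one correction, and the example values are assumptions, not the authors' implementation.

```python
import random


def moving_window_means(stats, window):
    """Mean test statistic for every run of `window` consecutive SNPs.

    `stats` must already be sorted by physical position.
    """
    return [
        sum(stats[i:i + window]) / window
        for i in range(len(stats) - window + 1)
    ]


def bootstrap_p_values(stats, window, n_boot=400, seed=0):
    """Empirical p-value for each window mean (assumed resampling scheme)."""
    rng = random.Random(seed)
    # Null distribution: means of `window` statistics drawn with replacement.
    null = [sum(rng.choices(stats, k=window)) / window for _ in range(n_boot)]
    observed = moving_window_means(stats, window)
    # Add-one correction keeps p-values strictly above zero.
    return [
        (1 + sum(m >= obs for m in null)) / (n_boot + 1)
        for obs in observed
    ]


# Made-up statistics with a cluster of high values at indices 4 to 6.
stats = [0.1, 0.2, 0.1, 0.3, 5.0, 6.0, 5.5, 0.2, 0.1, 0.3, 0.2, 0.1]
p_values = bootstrap_p_values(stats, window=3)
best_window = p_values.index(min(p_values))
```

Averaging over neighbouring SNPs rewards regions where several adjacent markers show signal, which is why a single noisy SNP is less likely to dominate at larger window sizes.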
Identification of disease locus using a moving window. SNPs were ranked by test statistics and sorted by physical position. The average was calculated for a moving window of consecutive SNPs across the genome. The region 6q22.1 was already known to contain the mutation leading to SIDDT. The rank of region 6q22.1 for various window sizes is shown in the second column. In the 3rd, 4th, and 5th columns, the top 1, 2, and 3 SNPs were removed from the 6q22.1 region to probe the sensitivity of the result to window size.
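| Window size | Rank of 6q22.1 | Rank, top 1 SNP removed | Rank, top 2 SNPs removed | Rank, top 3 SNPs removed |
|---|---|---|---|---|
| 1 | 1 | 22 | 24 | 60 |
| 2 | 11 | 11 | 19 | 11 |
| 3 | 6 | 6 | 6 | 14 |
| 4 | 1 | 1 | 1 | 2 |
| 5 | 1 | 1 | 1 | 8 |
| 6 | 2 | 2 | 2 | 3 |
| 7 | 1 | 1 | 1 | 13 |
| 8 | 1 | 1 | 1 | 3 |
| 9 | 1 | 1 | 1 | 9 |
| 10 | 1 | 1 | 1 | 3 |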
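The sensitivity analysis in this table can be sketched as follows, under several assumptions not stated in the source: SNPs lie on a single chromosome with unique positions, rank 1 is given to the largest test statistic, a window is scored by the mean rank of its SNPs, and a region's rank is that of its best-scoring overlapping window. The function names `region_rank` and `drop_top_in_region` are hypothetical, and the example data are made up.

```python
def region_rank(snps, window, region):
    """1-based rank of the best window overlapping `region`.

    snps: list of (position, test_statistic) with unique positions.
    region: (start, end) positions of the known locus, inclusive.
    Returns None if no window overlaps the region.
    """
    # Rank SNPs by test statistic: 1 = largest statistic.
    by_stat = sorted(snps, key=lambda s: s[1], reverse=True)
    snp_rank = {pos: r for r, (pos, _) in enumerate(by_stat, start=1)}
    positions = sorted(pos for pos, _ in snps)

    scores = []
    for i in range(len(positions) - window + 1):
        members = positions[i:i + window]
        mean_rank = sum(snp_rank[p] for p in members) / window
        in_region = any(region[0] <= p <= region[1] for p in members)
        scores.append((mean_rank, in_region))

    # A lower mean rank means a stronger window.
    scores.sort(key=lambda s: s[0])
    for rank, (_, in_region) in enumerate(scores, start=1):
        if in_region:
            return rank
    return None


def drop_top_in_region(snps, region, k):
    """Remove the k SNPs with the largest statistics inside `region`."""
    inside = sorted(
        (s for s in snps if region[0] <= s[0] <= region[1]),
        key=lambda s: s[1],
        reverse=True,
    )
    removed = set(inside[:k])
    return [s for s in snps if s not in removed]


# Made-up (position, statistic) pairs; the "known" region spans 4 to 6.
snps = [(1, 0.5), (2, 0.2), (3, 3.0), (4, 0.4), (5, 9.0),
        (6, 0.6), (7, 0.1), (8, 0.3), (9, 0.7), (10, 0.8)]
region = (4, 6)
rank_full = region_rank(snps, 1, region)
rank_dropped = region_rank(drop_top_in_region(snps, region, 1), 1, region)
```

With a window of 1 the toy region depends entirely on its single strongest SNP, so removing that SNP lowers its rank. This mirrors the first row of the table, where the rank falls from 1 to 22 once the top SNP is removed.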