Ramani Anantharaman, Fook Tim Chew.
Abstract
BACKGROUND: The use of pooled DNA on SNP microarrays (SNP-MaP) has been shown to be a cost-effective and rapid way to perform whole-genome association evaluations. While the accuracy of SNP-MaP was extensively evaluated on the early Affymetrix 10 k and 100 k platforms, comparably comprehensive studies on more recent platforms are fewer. In the present study, we used the data generated from the full Affymetrix 500 k SNP set together with the polynomial-based probe-specific correction (PPC) to derive allele frequency estimates. These estimates were compared to genotyping results of the same individuals on the same platform as the basis for evaluating the reliability and accuracy of pooled genotyping on these high-throughput platforms. We subsequently extended this comparison to the new SNP6.0 platform, capable of genotyping 1.8 million genetic variants.
Year: 2009 PMID: 20003400 PMCID: PMC2806376 DOI: 10.1186/1471-2156-10-82
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. Pooling strategy of overlapping sub-pools.
Comparing estimated and actual allele frequencies in sub-pools for 500 k platform.
| Pool | Correlation | MAD error | (95% CI) |
|---|---|---|---|
| Case Pool 1 | 0.981 | 0.042 | (0.0461-0.0463) |
| Case Pool 2 | 0.982 | 0.041 | (0.0451-0.0453) |
| Case Pool 3 | 0.975 | 0.049 | (0.0536-0.0540) |
| Average of Case Pools | 0.987 | 0.035 | (0.0389-0.0391) |
| Control Pool 1 | 0.977 | 0.044 | (0.0496-0.0499) |
| Control Pool 2 | 0.976 | 0.049 | (0.0526-0.0529) |
| Control Pool 3 | 0.978 | 0.046 | (0.0501-0.0504) |
| Average of Control Pools | 0.985 | 0.037 | (0.0411-0.0414) |
Actual allele frequencies obtained from individual genotyping of the 1st set of 60 case and 60 control samples were compared with estimated allele frequencies from pooled genotyping of the same 120 samples. The comparison was carried out for samples in each of the sub-pools, and for averaged allele frequencies across the 3 sub-pools.
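The comparison described above can be illustrated with a minimal sketch. It assumes that "Correlation" is a Pearson correlation and that "MAD error" is the median of the absolute differences between actual and estimated allele frequencies; the source does not expand the abbreviation here, so both are assumptions. The function names and the toy frequencies are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length lists of frequencies."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_abs_error(actual, estimated):
    """Median absolute difference (assumed meaning of 'MAD error')."""
    return median(abs(a - e) for a, e in zip(actual, estimated))


def average_pools(pools):
    """Per-SNP mean of the estimates from several sub-pools."""
    return [mean(values) for values in zip(*pools)]


# Toy data: actual frequencies for 5 SNPs and estimates from 3 sub-pools.
actual = [0.10, 0.25, 0.50, 0.75, 0.90]
pool_estimates = [
    [0.12, 0.22, 0.55, 0.70, 0.88],
    [0.08, 0.28, 0.48, 0.78, 0.93],
    [0.13, 0.24, 0.53, 0.74, 0.86],
]

averaged = average_pools(pool_estimates)
r = pearson(actual, averaged)
mad = median_abs_error(actual, averaged)
print(f"correlation={r:.3f}, MAD error={mad:.3f}")
```

Averaging the sub-pool estimates before comparison mirrors the "Average of Case Pools" and "Average of Control Pools" rows, where the averaged estimates track the actual frequencies more closely than any single sub-pool.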
Comparing estimated and known allele frequencies in pool replicates for SNP6.0 platform.
| Pool | Correlation | MAD error | (95% CI) |
|---|---|---|---|
| Case Pool Replicate 1 | 0.983 | 0.042 | (0.0435-0.0437) |
| Case Pool Replicate 2 | 0.986 | 0.038 | (0.0397-0.0399) |
| Case Pool Replicate 3 | 0.988 | 0.036 | (0.0373-0.0375) |
| Average of Case Pools | 0.989 | 0.035 | (0.0363-0.0365) |
| Control Pool Replicate 1 | 0.985 | 0.042 | (0.0423-0.0425) |
| Control Pool Replicate 2 | 0.984 | 0.041 | (0.0421-0.0423) |
| Control Pool Replicate 3 | 0.985 | 0.040 | (0.0415-0.0417) |
| Average of Control Pools | 0.988 | 0.036 | (0.0372-0.0374) |
Known allele frequencies obtained from the SNP6.0 Sample Data Set of the HapMap CHB population were compared with estimated allele frequencies from pooled genotyping of 160 case and 160 control samples. The comparison was carried out for samples in each of the replicate-pools, and for averaged allele frequencies across the 3 replicates.
Comparing accuracy of allele frequency estimates from different reference samples for 500 k platform (1).
| Actual AF compared with: | Group | Correlation | MAD error | (95% CI) |
|---|---|---|---|---|
| AF estimates from 500 k Sample Data Set | Cases | 0.909 | 0.079 | (0.1015-0.1020) |
| AF estimates from 500 k Sample Data Set | Controls | 0.907 | 0.079 | (0.1024-0.1029) |
| AF estimates from individually typed samples | Cases | 0.987 | 0.035 | (0.0389-0.0391) |
| AF estimates from individually typed samples | Controls | 0.985 | 0.037 | (0.0411-0.0414) |
Actual allele frequencies obtained from individual genotyping of the 1st set of 60 case and 60 control samples were compared with estimated allele frequencies of the same set of 120 samples. Estimated allele frequencies were separately calculated using beta values obtained from the 500 k Sample Data Set, and from the individually genotyped samples on the 500 k array.
Comparing accuracy of allele frequency estimates from different reference samples for 500 k platform (2).
| Actual AF compared with: | Group | Correlation | MAD error | (95% CI) |
|---|---|---|---|---|
| AF estimates from 500 k Sample Data Set | Cases | 0.921 | 0.072 | (0.0942-0.0946) |
| AF estimates from 500 k Sample Data Set | Controls | 0.914 | 0.077 | (0.0998-0.1002) |
| AF estimates from individually typed samples | Cases | 0.987 | 0.036 | (0.0389-0.0392) |
| AF estimates from individually typed samples | Controls | 0.985 | 0.039 | (0.0422-0.0424) |
Actual allele frequencies obtained from individual genotyping of the 1st set of 60 case and 60 control samples were compared with the estimated allele frequencies of the 2nd set of 60 case and 60 control samples. Estimated allele frequencies were separately calculated using beta values obtained from the 500 k Sample Data Set, and from the individually genotyped samples on the 500 k array.
Comparing accuracy of allele frequency estimates from different HapMap reference samples for SNP6.0 platform.
| Known AF compared with: | Group | Correlation | MAD error | (95% CI) |
|---|---|---|---|---|
| AF estimated from CEU Sample Set (90) | Cases | 0.889 | 0.097 | (0.1095-0.1100) |
| AF estimated from CEU Sample Set (90) | Controls | 0.890 | 0.097 | (0.1086-0.1090) |
| AF estimated from CHB Sample Set (45) | Cases | 0.989 | 0.035 | (0.0364-0.0365) |
| AF estimated from CHB Sample Set (45) | Controls | 0.988 | 0.037 | (0.0372-0.0374) |
| AF estimated from JPT Sample Set (45) | Cases | 0.984 | 0.040 | (0.0426-0.0428) |
| AF estimated from JPT Sample Set (45) | Controls | 0.984 | 0.041 | (0.0430-0.0432) |
| AF estimated from YRI Sample Set (90) | Cases | 0.780 | 0.125 | (0.1567-0.1574) |
| AF estimated from YRI Sample Set (90) | Controls | 0.782 | 0.124 | (0.1550-0.1557) |
| AF estimated from All Sample Sets (270) | Cases | 0.944 | 0.060 | (0.0796-0.0799) |
| AF estimated from All Sample Sets (270) | Controls | 0.945 | 0.059 | (0.0789-0.0792) |
Known allele frequencies from each of the four HapMap populations (CEU, CHB, JPT and YRI) as provided in the SNP6.0 Sample Data Set, and an average allele frequency across the four populations, were compared with the estimated allele frequencies of the 160 case and 160 control samples genotyped on the SNP6.0 platform. The estimated allele frequencies were calculated using beta values obtained from the four individual HapMap populations and their collated genotypes in the same data set.
Figure 2. Bi-plot comparing actual and estimated allele frequencies of a random selection of 10,000 SNPs.
Figure 3. Frequency distribution of absolute errors in allele frequencies between individual and pooled genotyping.
Comparing errors in estimation of allele frequencies by filtering off uncalled SNPs.
| NoCall cutoff | SNPs Analysed | Correlation | MAD error | 95% CI |
|---|---|---|---|---|
| - | 500568 (100%) | 0.989 | 0.031 | (0.0360-0.0362) |
| 45 | 500488 (99.98%) | 0.989 | 0.031 | (0.0360-0.0362) |
| 40 | 500353 (99.96%) | 0.989 | 0.031 | (0.0359-0.0362) |
| 35 | 500036 (99.89%) | 0.989 | 0.031 | (0.0359-0.0361) |
| 30 | 499222 (99.73%) | 0.989 | 0.031 | (0.0358-0.0360) |
| 25 | 497469 (99.38%) | 0.989 | 0.030 | (0.0356-0.0358) |
| 20 | 493792 (98.65%) | 0.989 | 0.030 | (0.0354-0.0356) |
| 15 | 485709 (97.03%) | 0.989 | 0.030 | (0.0351-0.0353) |
| 10 | 466758 (93.25%) | 0.990 | 0.030 | (0.0346-0.0348) |
| 5 | 415562 (83.02%) | 0.990 | 0.030 | (0.0334-0.0337) |
| 0 | 198749 (39.7%) | 0.993 | 0.028 | (0.0288-0.0291) |
Actual allele frequencies obtained from individual genotyping were compared with estimated allele frequencies from pooled genotyping at various "NoCall" cutoffs.
Comparing errors in estimation of allele frequencies by filtering off rare SNPs.
| MAF cutoff | SNPs Analysed | Correlation | MAD error | 95% CI |
|---|---|---|---|---|
| 1% | 405478 (81%) | 0.982 | 0.031 | (0.0433-0.0435) |
| 5% | 355095 (70.94%) | 0.976 | 0.032 | (0.0449-0.0452) |
| 10% | 306552 (61.24%) | 0.969 | 0.032 | (0.0449-0.0452) |
| 15% | 260485 (52.04%) | 0.961 | 0.031 | (0.0439-0.0442) |
| 20% | 218147 (43.58%) | 0.949 | 0.030 | (0.0425-0.0428) |
Actual allele frequencies obtained from individual genotyping were compared with estimated allele frequencies from pooled genotyping at various minor allele frequency cutoffs.
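The filtering step behind the table above might look like the following sketch. It assumes the minor allele frequency (MAF) is derived from the actual (individually genotyped) allele frequency and that a SNP is kept when its MAF is at or above the cutoff; the source does not state which side of the boundary is inclusive, so that choice is an assumption. The function names and toy values are hypothetical.

```python
def minor_allele_frequency(allele_freq):
    """MAF is the smaller of the allele frequency and its complement."""
    return min(allele_freq, 1.0 - allele_freq)


def filter_by_maf(actual, estimated, cutoff):
    """Keep SNP pairs whose actual MAF is at or above the cutoff (assumed rule).

    Returns the retained actual values, the retained estimates, and the
    fraction of SNPs retained.
    """
    kept = [
        (a, e)
        for a, e in zip(actual, estimated)
        if minor_allele_frequency(a) >= cutoff
    ]
    kept_actual = [a for a, _ in kept]
    kept_estimated = [e for _, e in kept]
    return kept_actual, kept_estimated, len(kept) / len(actual)


# Toy data: 6 SNPs, two of them rare (MAF 0.03) and one monomorphic.
actual = [0.00, 0.03, 0.20, 0.50, 0.97, 0.88]
estimated = [0.02, 0.06, 0.18, 0.53, 0.94, 0.85]

kept_actual, kept_estimated, fraction = filter_by_maf(actual, estimated, 0.05)
print(f"SNPs analysed: {len(kept_actual)} ({fraction:.0%})")
```

The retained pairs can then be passed to whichever correlation and error statistics are in use, which is how each row of the table would be recomputed on a progressively smaller SNP set.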
Sensitivity and specificity of estimated allele frequencies at various minor allele frequency cutoffs.
| MAF cutoff | SNPs Analysed | Specificity (median) | Specificity (95% CI) | Sensitivity (median) | Sensitivity (95% CI) |
|---|---|---|---|---|---|
| 0% | 500568 (100%) | 0.969 | (0.9535-0.9538) | 0.811 | (0.4812-0.4886) |
| 1% | 405478 (81%) | 0.958 | (0.9438-0.9441) | 0.832 | (0.6226-0.6269) |
| 5% | 355095 (70.94%) | 0.954 | (0.9405-0.9408) | 0.859 | (0.7442-0.7464) |
| 10% | 306552 (61.24%) | 0.952 | (0.9388-0.9392) | 0.879 | (0.8030-0.8046) |
| 15% | 260485 (52.04%) | 0.951 | (0.9382-0.9386) | 0.896 | (0.8412-0.8425) |
| 20% | 218147 (43.58%) | 0.951 | (0.9383-0.9387) | 0.909 | (0.8676-0.8687) |
Actual allele frequencies obtained from individual genotyping were compared with estimated allele frequencies from pooled genotyping. Sensitivity and specificity calculations were made at various minor allele frequency cutoffs.