Matthew J Huentelman, David W Craig, Albert D Shieh, Jason J Corneveaux, Diane Hu-Lince, John V Pearson, Dietrich A Stephan.
Abstract
BACKGROUND: High-throughput microarray-based single nucleotide polymorphism (SNP) genotyping has revolutionized the way genome-wide linkage scans and association analyses are performed. One of the key features of the Affymetrix GeneChip Mapping 10K Array is its automated SNP calling algorithm. The Affymetrix algorithm was trained on a database of ethnically diverse DNA samples to create SNP call zones, which are then used as static models to make genotype calls for experimental data. We describe here the application of clustering algorithms to large training datasets, which results in improved SNP call rates on the 10K GeneChip.
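The abstract does not say which clustering algorithm SNiPer uses. As a minimal sketch only, assuming plain k-means with k = 3 on each sample's two-dimensional (RAS1, RAS2) scores, and assuming that higher RAS values mean more A-allele signal, the code below shows how call zones could be learned from a training set instead of being fixed in advance. Each zone is summarized by a mean and a covariance. The names `kmeans_2d` and `train_call_zones` are hypothetical, not part of any Affymetrix software.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (RAS1, RAS2) for one sample at one SNP
Zone = Tuple[Point, Tuple[float, float, float]]  # (mean, (var_x, cov_xy, var_y))


def kmeans_2d(points: List[Point], k: int = 3,
              iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means on 2-D points; returns (labels, centers)."""
    ordered = sorted(points)
    # Deterministic start: k points spread evenly along the RAS1 ordering.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def train_call_zones(points: List[Point], eps: float = 1e-4) -> Dict[str, Zone]:
    """Fit one Gaussian-style call zone (mean + covariance) per genotype cluster."""
    labels, centers = kmeans_2d(points, k=3)
    zones: Dict[str, Zone] = {}
    # Assumption: higher RAS = more A-allele signal, so order clusters by RAS1.
    order = sorted(range(3), key=lambda c: centers[c][0])
    for genotype, c in zip(["B/B", "A/B", "A/A"], order):
        members = [p for p, lab in zip(points, labels) if lab == c]
        n = len(members)
        if n == 0:
            raise ValueError("a genotype cluster ended up empty")
        mx, my = centers[c]
        # Population covariance; eps keeps the matrix invertible for tight clusters.
        vxx = sum((x - mx) ** 2 for x, _ in members) / n + eps
        vyy = sum((y - my) ** 2 for _, y in members) / n + eps
        vxy = sum((x - mx) * (y - my) for x, y in members) / n
        zones[genotype] = ((mx, my), (vxx, vxy, vyy))
    return zones
```

Because k is fixed at 3, this sketch always produces three zones, even at SNPs where only two genotypes occur in the training data. A real implementation would have to decide how many clusters are actually present.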
Year: 2005 PMID: 16262895 PMCID: PMC1280925 DOI: 10.1186/1471-2164-6-149
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1: Percentage of "No Call" results for SNPs on the 10K GeneChip. SNP performance was investigated in 948 individuals genotyped on the 10K GeneChip® Mapping Array. SNPs were grouped by their overall percentage of "No Call" results.
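The grouping in Figure 1 amounts to computing each SNP's No Call rate across all individuals and then counting SNPs per rate band. A minimal sketch, assuming calls are stored as strings per SNP; the bin edges used in the usage example are hypothetical, since the caption does not state the bands.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Sequence

NO_CALL = "No Call"


def no_call_rate(calls: Sequence[str]) -> float:
    """Fraction of individuals whose call at one SNP is 'No Call'."""
    return sum(1 for c in calls if c == NO_CALL) / len(calls)


def bin_snps(calls_by_snp: Dict[str, Sequence[str]],
             edges: Sequence[float]) -> Counter:
    """Count SNPs per [lo, hi) band of No Call percentage; the last band is closed."""
    counts: Counter = Counter()
    for calls in calls_by_snp.values():
        pct = 100.0 * no_call_rate(calls)
        # Locate the band; clamp so a rate equal to the top edge lands in the last band.
        i = min(bisect_right(edges, pct) - 1, len(edges) - 2)
        counts[f"{edges[i]:g}-{edges[i + 1]:g}%"] += 1
    return counts
```

For example, with hypothetical edges `[0, 1, 5, 25, 100]`, a SNP called in every individual falls in the `0-1%` band and a SNP with 30 No Calls out of 100 falls in `25-100%`.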
Figure 2: Percentage of SNPs on each chromosome with "No Call" rates greater than 25%. SNPs with "No Call" rates greater than 25% were identified after processing with the MPAM (white bars) or SNiPer (black bars) algorithm. The number of these poorly performing SNPs was then divided by the total number of SNPs on the respective chromosome. The worst-performing chromosome was chromosome 19, which is also known to have the highest gene density.
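The per-chromosome statistic in Figure 2 is a simple ratio: the number of SNPs on a chromosome whose No Call rate exceeds 25%, divided by the total number of SNPs on that chromosome. A sketch, assuming each SNP is supplied as a hypothetical `(snp_id, chromosome, no_call_rate)` tuple with the rate given as a fraction:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def poor_snp_percentage(snps: Iterable[Tuple[str, str, float]],
                        threshold: float = 0.25) -> Dict[str, float]:
    """Per chromosome: 100 * (#SNPs with No Call rate > threshold) / (#SNPs)."""
    total: Dict[str, int] = defaultdict(int)
    poor: Dict[str, int] = defaultdict(int)
    for _snp_id, chrom, rate in snps:
        total[chrom] += 1
        if rate > threshold:  # strictly greater, matching "greater than 25%"
            poor[chrom] += 1
    return {chrom: 100.0 * poor[chrom] / total[chrom] for chrom in total}
```

Running it once on MPAM-derived rates and once on SNiPer-derived rates would give the two bar series per chromosome.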
Figure 3: Graphical representation of the performance of 6 example SNPs in 948 individuals. Screenshots of the call zones (ellipses) and the respective calls (solid shapes) for selected SNPs from 948 individual genotypes. Blue represents the call zone and calls for "B/B", green represents "A/B", and purple represents "A/A". Red represents individuals that produced a "No Call" for the SNP. RAS1 and RAS2 scores are shown on the x- and y-axes, respectively. Panels (a) SNP_A-1517236 and (b) SNP_A-1510986 show SNPs with tightly clustered RAS scores but inadequately trained call zones. Panel (c) illustrates an infrequently called SNP (SNP_A-1606312) with no systematic explanation. Some SNPs cluster tightly in RAS2 but have widely spread RAS1 values, as in panel (d) (SNP_A-1513739). The opposite effect is seen in panel (e) (SNP_A-1508518). Panel (f) shows a SNP that is called >99% of the time (SNP_A-1511517) in these 948 individuals.
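The ellipses in Figure 3 suggest call zones shaped like two-dimensional Gaussian contours. As an illustrative sketch, not the documented MPAM or SNiPer rule, a sample's (RAS1, RAS2) point can be assigned to the genotype whose zone is nearest in Mahalanobis distance and reported as "No Call" when it lies outside every zone. The zone format (mean plus variances and covariance) and the default cutoff of 9.21 (about the 99th percentile of a chi-square distribution with 2 degrees of freedom) are assumptions.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (RAS1, RAS2)
Zone = Tuple[Point, Tuple[float, float, float]]  # (mean, (var_x, cov_xy, var_y))


def mahalanobis_sq(point: Point, zone: Zone) -> float:
    """Squared Mahalanobis distance from a point to a zone's mean."""
    (mx, my), (vxx, vxy, vyy) = zone
    dx, dy = point[0] - mx, point[1] - my
    det = vxx * vyy - vxy * vxy
    # Closed-form inverse of the 2x2 covariance matrix, applied to (dx, dy).
    return (vyy * dx * dx - 2 * vxy * dx * dy + vxx * dy * dy) / det


def call_genotype(point: Point, zones: Dict[str, Zone],
                  cutoff: float = 9.21) -> str:
    """Nearest zone wins; points outside every zone's ellipse become 'No Call'."""
    d2 = {genotype: mahalanobis_sq(point, zone) for genotype, zone in zones.items()}
    best = min(d2, key=d2.get)
    return best if d2[best] <= cutoff else "No Call"
```

Under this sketch, a tight group of points lying between poorly placed zones, as in panels (a) and (b), is reported as "No Call" even though the points themselves are well separated, which is the failure that retraining the zones on a larger dataset targets.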
Comparison of the Affymetrix MPAM and SNiPer algorithms.
| 5.22% ± 0.03% | ------ | 99.94% |
| 0.97% ± 1.27% | 98.61% ± 0.21% | 99.80% |