Mohua Podder, William J Welch, Ruben H Zamar, Scott J Tebbutt.
Abstract
BACKGROUND: Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide, adenine (A), thymine (T), cytosine (C) or guanine (G), is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay, based on arrayed primer extension (APEX), consisting of multiple probes with signals from multiple channels for a single SNP. This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) calls single SNP genotypes by manual inspection of the APEX data, which is time-consuming and prone to user subjectivity bias.
Year: 2006 PMID: 17137502 PMCID: PMC1702553 DOI: 10.1186/1471-2105-7-521
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Examples of SNP Chart Application. Examples of SNP Charts for the SNP rs1106577 to illustrate the data structure (e.g., Table 1). Template DNA from three Coriell samples with three possible genotypes (CC, CT and TT) and one negative control (NN) are shown in four different charts. Each chart shows four-channel fluorescent intensity data (A, C, G, and T) on the vertical axes, from 12 rs1106577-specific array spots (duplicate spots for six different probes). On the horizontal axes, the 12 probe names corresponding to the 12 spots are given sequentially. The 1st and 2nd spots from the left ("LEFT C/T") refer to the left-hand APEX probe that will give either a single C (green) signal (for homozygous CC genotypes) or a T (blue) signal (for homozygous TT genotypes) or a mixture of C and T (heterozygous CT). The 3rd and 4th spots from the left ("RIGHT G/A") refer to the right-hand APEX probe that interrogates the DNA strand nucleotide complementary to that of the left-hand APEX probe, thus giving a single G (red) signal (for CC), a single A (yellow) signal (for TT), or a mixed G and A signal (for CT). From the left, spots 5 to 12, inclusive, represent allele-specific APEX probes in which a base-specific fluorescence signifies the presence of the allele. Among them, spots 5 to 8 refer to the "_1" probes corresponding to the first allele (C in the case of rs1106577) and spots 9 to 12 refer to the "_2" probes corresponding to the second allele (T). The redundancy and consistency of the data across different probes give high confidence in the assigned genotypes.
Data structure for SNP rs1106577 and DNA sample Coriell NA17102 (CC) (CC-chart in Figure 1)
| A | C | G | T | |||
| Spot 1 | APEX_LEFT | C and/or T | 732 | 258 | ||
| Spot 2 | APEX_LEFT | C and/or T | 965 | 348 | ||
| Spot 3 | APEX_RIGHT | G and/or A | 85 | 233 | ||
| Spot 4 | APEX_RIGHT | G and/or A | 104 | 269 | ||
| Spot 5 | ASO_1LEFT | T | 109 | 5284 | 80 | |
| Spot 6 | ASO_1LEFT | T | 107 | 5456 | 83 | |
| Spot 7 | ASO_2LEFT | T | 90 | 88 | 20 | |
| Spot 8 | ASO_2LEFT | T | 76 | 106 | 22 | |
| Spot 9 | ASO_1RIGHT | G | 288 | 182 | 992 | |
| Spot 10 | ASO_1RIGHT | G | 369 | 209 | 1098 | |
| Spot 11 | ASO_2RIGHT | G | 138 | 68 | 187 | |
| Spot 12 | ASO_2RIGHT | G | 151 | 68 | 193 |
Values of the explanatory variables for SNP rs1106577 and DNA sample Coriell NA17102
| APEX.L | APEX.XL | APEX.YL | 45,293 | 1,713 |
| APEX.R | APEX.XR | APEX.YR | 4,121 | 543 |
| ASO.L | ASO.XL | ASO.YL | 91,413 | 404 |
| ASO.R | ASO.XR | ASO.YR | 6,254 | 378 |
Figure 2. Example of a well-behaved SNP: rs1932819. All the classifiers give three well-separated clusters for the SNP rs1932819 [red, green, blue and black symbols denote the classes YY, XY, XX and NN, respectively].
Figure 3. Example of a critical SNP: rs1003399. For the SNP rs1003399, sample 11 is correctly classified by both ASO probes and the APEX.R probe but misclassified by the APEX.L probe, whereas for sample 20 the APEX.L probe works best [red, green, blue and black symbols denote the classes YY, XY, XX and NN, respectively].
Results from Dynamic-variable LDA
| Coriell | SIRS | 96 | 99.6% | 98.9% | 94.9% | 99.6% |
| SIRS | Coriell | 102 | 99.9% | 99.3% | 95.6% | 99.8% |
| Coriell | CV | 96 | 100.0% | 98.7% | 94.2% | 99.2% |
| SIRS | CV | 102 | 99.9% | 99.3% | 96.0% | 99.8% |
Results from Simple LDA
| Coriell | SIRS | 96 | 99.4% | 97.3% | 98.1% | 97.3% |
| SIRS | Coriell | 102 | 99.5% | 93.0% | 99.5% | 93.0% |
| Coriell | CV | 96 | 99.8% | 98.4% | 99.7% | 98.5% |
| SIRS | CV | 102 | 99.4% | 99.5% | 98.9% | 99.6% |
Applying LDA using four sets of classifiers
| ASO.L | log(ASO.XL), log(ASO.YL) |
| ASO.R | log(ASO.XR), log(ASO.YR) |
| APEX.L | log(APEX.XL), log(APEX.YL) |
| APEX.R | log(APEX.XR), log(APEX.YR) |
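As a rough illustration of the table above, the following sketch fits one two-variable LDA classifier per probe set on log-transformed (X, Y) intensity pairs and returns posterior probabilities for the four classes (XX, XY, YY, NN). It is a minimal pure-Python sketch, not the authors' implementation: the function names, the pooled-covariance formulation with empirical class priors, and the toy training intensities are all assumptions made for illustration.

```python
import math

# Probe-set names taken from the table; everything else here is hypothetical.
PROBE_SETS = ("ASO.L", "ASO.R", "APEX.L", "APEX.R")


def fit_lda(points, labels):
    """Fit a two-variable LDA model: class means, priors, pooled covariance."""
    classes = sorted(set(labels))
    n = len(points)
    means, priors = {}, {}
    for c in classes:
        members = [p for p, lab in zip(points, labels) if lab == c]
        means[c] = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
        priors[c] = len(members) / n
    # Pooled within-class covariance (shared by all classes in LDA).
    sxx = sxy = syy = 0.0
    for p, lab in zip(points, labels):
        dx, dy = p[0] - means[lab][0], p[1] - means[lab][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = n - len(classes)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return {"means": means, "priors": priors, "inverse": inverse}


def fit_probe_set(samples):
    """samples: list of (X intensity, Y intensity, genotype) for one probe set."""
    points = [(math.log(x), math.log(y)) for x, y, _ in samples]
    labels = [g for _, _, g in samples]
    return fit_lda(points, labels)


def posterior(model, x_intensity, y_intensity):
    """Posterior class probabilities for one target sample."""
    px, py = math.log(x_intensity), math.log(y_intensity)
    (a, b), (c, d) = model["inverse"]
    scores = {}
    for genotype, (mx, my) in model["means"].items():
        dx, dy = px - mx, py - my
        mahalanobis = dx * (a * dx + b * dy) + dy * (c * dx + d * dy)
        scores[genotype] = math.log(model["priors"][genotype]) - 0.5 * mahalanobis
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Toy training intensities (made up); real data would differ per probe set.
TOY_SAMPLES = [
    (5000, 100, "XX"), (6200, 140, "XX"), (4800, 90, "XX"),
    (2500, 2300, "XY"), (3100, 2900, "XY"), (2700, 3300, "XY"),
    (120, 5200, "YY"), (95, 6100, "YY"), (150, 4700, "YY"),
    (60, 70, "NN"), (85, 55, "NN"), (50, 90, "NN"),
]

# One classifier per probe set, each trained on its own (X, Y) pairs.
classifiers = {probe: fit_probe_set(TOY_SAMPLES) for probe in PROBE_SETS}
example = posterior(classifiers["ASO.L"], 5500, 110)
```

Here the same toy samples are reused for every probe set only to keep the sketch short; in practice each probe set would be trained on its own intensity pairs, so the four classifiers can disagree on a given sample.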
Posterior probabilities from four LDA classifiers
| XX | XY | YY | NN | |
| ASO.L | ||||
| ASO.R | ||||
| APEX.L | ||||
| APEX.R |
Posterior probabilities from Table 6 for SNP rs1003399 and target sample Coriell NA17111
| CC | CG | GG | NN | |
| ASO.L | <0.001 | 0.001 | 0.999 | <0.001 |
| ASO.R | <0.001 | 0.003 | 0.997 | <0.001 |
| APEX.L | <0.001 | 1.000 | <0.001 | <0.001 |
| APEX.R | <0.001 | 0.005 | 0.995 | <0.001 |
Resultant posterior probabilities from two methods
| Dynamic-variable LDA | Simple LDA | |
| CC | <0.001 | <0.001 |
| CG | 0.253 | 1.000 |
| GG | 0.746 | <0.001 |
| NN | <0.001 | <0.001 |
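To make the disagreement among the four classifiers concrete, the sketch below takes the per-classifier posteriors listed for SNP rs1003399 and sample Coriell NA17111, derives each classifier's individual call, and combines them with an unweighted mean. The unweighted mean is an assumption chosen for illustration; this excerpt does not state the combination rule used by Dynamic-variable LDA. Entries reported as "<0.001" are entered as 0.0, which is also an assumption. Under these assumptions the mean gives roughly 0.252 for CG and 0.748 for GG, close to but not identical to the 0.253 and 0.746 reported above, so the authors' rule is evidently not a plain average.

```python
GENOTYPES = ("CC", "CG", "GG", "NN")

# Posteriors copied from the table; "<0.001" is entered as 0.0 (assumption).
posteriors = {
    "ASO.L":  (0.0, 0.001, 0.999, 0.0),
    "ASO.R":  (0.0, 0.003, 0.997, 0.0),
    "APEX.L": (0.0, 1.000, 0.0, 0.0),
    "APEX.R": (0.0, 0.005, 0.995, 0.0),
}


def call(probs):
    """Return the genotype with the highest posterior probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return GENOTYPES[best]


# Individual calls: three classifiers say GG, APEX.L says CG.
per_classifier_calls = {name: call(p) for name, p in posteriors.items()}

# Hypothetical combination rule: unweighted mean across the four classifiers.
mean_posterior = tuple(
    sum(p[i] for p in posteriors.values()) / len(posteriors)
    for i in range(len(GENOTYPES))
)
combined_call = call(mean_posterior)
```

With this combination the single discordant APEX.L classifier is outvoted and the combined call is GG, matching the direction of the Dynamic-variable LDA result in the table.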