David Benovoy, Tony Kwan, Jacek Majewski.
Abstract
Hybridization-based technologies, such as microarrays, rely on precise probe-target interactions to ensure specific and accurate measurement of RNA expression. Polymorphisms present in the probe-target sequences have been shown to alter probe-hybridization affinities, leading to reduced signal intensity measurements and resulting in false-positive results. Here, we characterize this effect on exon and gene expression estimates derived from the Affymetrix Exon Array. We conducted an association analysis between the expression levels of probes, exons and transcripts and the genotypes of neighboring SNPs in 57 CEU HapMap individuals. We quantified how the effect of genotype on signal intensity depends on the number of polymorphisms within target sequences, the number of affected probes and the position of the polymorphism within each probe. The effect of SNPs is quite severe and leads to considerable false-positive rates, particularly when the analysis is performed at the exon level and aimed at detecting alternative splicing events. Finally, we propose simple solutions based on 'masking' probes that are putatively affected by polymorphisms, and show that such a strategy results in a large decrease in false-positive rates with a very modest reduction in coverage of the transcriptome.
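The abstract does not state the statistical model behind the association analysis. As a minimal sketch, assuming an additive model in which log2 expression is regressed on genotype coded as an allele count (0, 1 or 2), a single probe/SNP test could look like the following. The function name `additive_association` and the example values are hypothetical; turning the t-statistic into a p-value (t distribution with n - 2 degrees of freedom) is left out.

```python
import math
from statistics import mean


def additive_association(genotypes, expression):
    """Fit expression = a + b * dosage by ordinary least squares.

    genotypes: allele counts (0, 1 or 2) per individual (assumed coding).
    expression: log2 signal for one probe, exon or transcript, same order.
    Returns (slope b, t-statistic for b); the t-statistic has n - 2 df.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 individuals")
    g_bar, e_bar = mean(genotypes), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = e_bar - slope * g_bar
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    if rss == 0:
        # Perfect fit: the t-statistic is unbounded in the direction of the slope.
        return slope, math.copysign(math.inf, slope)
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical data: signal drops by about 1 log2 unit per copy of the allele.
slope, t_stat = additive_association([0, 0, 1, 1, 2, 2], [10.0, 10.2, 9.0, 9.2, 8.0, 8.2])
print(f"slope={slope:.2f}, t={t_stat:.1f}")
```

A probe whose target contains the SNP would be expected to show exactly this kind of genotype-dependent drop even when the transcript's true abundance is unchanged, which is the false-positive mechanism the abstract describes.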
Year: 2008 PMID: 18596082 PMCID: PMC2490733 DOI: 10.1093/nar/gkn409
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Boxplots illustrating the positional effect of SNPs within the probe target region: probe signal ratios between perfectly complementary target regions and regions with a single mismatch.
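Figure 1 groups probes by where the SNP falls within the probe. A minimal sketch of that bookkeeping follows, assuming 25-mer probes, probe and SNP coordinates on the same strand, and that each perfect/mismatch signal pair has already been formed (for example, from individuals homozygous for the matching versus the mismatching allele; this pairing is an assumption). `snp_offset` and `log2_ratio_by_position` are hypothetical helpers.

```python
import math
from collections import defaultdict
from statistics import median

PROBE_LENGTH = 25  # assumption: 25-mer probes


def snp_offset(probe_start, snp_pos):
    """1-based position of a SNP within the probe target, or None if it lies outside.

    Assumes probe_start and snp_pos share a coordinate system and strand.
    """
    offset = snp_pos - probe_start + 1
    return offset if 1 <= offset <= PROBE_LENGTH else None


def log2_ratio_by_position(records):
    """records: iterable of (position, perfect_signal, mismatch_signal).

    Returns {position: median log2(perfect / mismatch)} for each observed position.
    """
    by_position = defaultdict(list)
    for position, perfect, mismatch in records:
        if not 1 <= position <= PROBE_LENGTH:
            raise ValueError(f"position {position} is outside a {PROBE_LENGTH}-mer")
        by_position[position].append(math.log2(perfect / mismatch))
    return {pos: median(vals) for pos, vals in sorted(by_position.items())}
```

The per-position lists built here are what a boxplot like Figure 1 would summarize; the median is returned only as a compact stand-in for the full distribution.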
Comparison of association analyses with and without a SNP mask (rows: no mask; columns: SNP mask)
| No mask (rows) / SNP mask (columns) | Positive for association | Negative for association |
|---|---|---|
| Positive for association | True positive | False positive |
| Negative for association | False negative | True negative |
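The table treats the SNP-masked analysis as the reference call. A minimal sketch of that classification, assuming each test has already been reduced to a significant/non-significant call with and without the mask; `classify`, `tally` and the example calls are hypothetical.

```python
from collections import Counter


def classify(significant_no_mask: bool, significant_with_mask: bool) -> str:
    """Label one test, treating the SNP-masked call as the reference."""
    if significant_no_mask:
        return "true positive" if significant_with_mask else "false positive"
    return "false negative" if significant_with_mask else "true negative"


def tally(calls):
    """calls: iterable of (significant_no_mask, significant_with_mask) pairs."""
    return Counter(classify(no_mask, masked) for no_mask, masked in calls)


# Hypothetical calls for five probe sets.
print(tally([(True, True), (True, False), (True, False), (False, True), (False, False)]))
```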
Enrichment for probes with a polymorphic target region among the top 1% of significant associations for probes, probe sets and meta-probe sets
| Number of SNP overlaps | Probe (odds ratio) | Probe set (odds ratio) | Meta-probe set (odds ratio) |
|---|---|---|---|
| All | 16.83 | 4.30 | 2.46 |
| 1 | 16.78 | 1.98 | 1.94 |
| 2 | 19.39 | 5.02 | 2.12 |
| 3 | NA | 10.89 | 2.40 |
| 4 | NA | 15.64 | 3.00 |
| ≥5 | NA | 14.84 | 3.01 |
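The excerpt does not describe how the odds ratios were computed. A minimal sketch, assuming a standard 2×2 odds ratio that compares SNP-overlapping units in the top 1% of associations (ranked by p-value) with those in the remaining 99%; `enrichment_odds_ratio` and the example counts are hypothetical, and a significance test such as Fisher's exact test is omitted.

```python
def enrichment_odds_ratio(results, top_fraction=0.01):
    """results: list of (p_value, overlaps_snp) for every probe, probe set or meta-probe set.

    Returns (a * d) / (b * c) for the 2x2 table:
        a = top units with a SNP,  b = top units without a SNP,
        c = other units with a SNP, d = other units without a SNP.
    """
    ranked = sorted(results, key=lambda r: r[0])  # most significant first
    n_top = max(1, round(len(ranked) * top_fraction))  # size of the top fraction, rounded
    top, rest = ranked[:n_top], ranked[n_top:]
    a = sum(1 for _, snp in top if snp)
    b = len(top) - a
    c = sum(1 for _, snp in rest if snp)
    d = len(rest) - c
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: zero cell in the denominator")
    return (a * d) / (b * c)


# Hypothetical: 200 units, 2 in the top 1% (1 with a SNP), 18 of the other 198 with a SNP.
example = [(0.001, True), (0.002, False)] + [(0.5, True)] * 18 + [(0.6, False)] * 180
print(enrichment_odds_ratio(example))
```

Read this way, an odds ratio of 16.83 for probes would mean that, among probes in the top 1% of associations, the odds of a polymorphic target are roughly 17 times the odds among all other probes.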
Figure 2. ZNF37A is an example of a false positive induced by a SNP (rs176889). (A) The ZNF37A mRNA molecule, with the coding region in yellow and the 5′ and 3′ UTRs in white. The horizontal green rectangles represent the 4 probe sets that target this transcript. The red bars mark the position of SNP rs176889 in the coding sequence of this transcript. (B) Alignment of the 4 probe sequences that constitute probe set 3243183; SNP rs176889 falls within the probes highlighted by the red box. (C) Association between each of the 4 probes and the genotypes of SNP rs176889. Probe 496020 does not contain any SNP, and its association is non-significant; it is the only probe retained to estimate the expression score of probe set 3243183 after masking. (D) Probe set 3243183 is no longer a false positive after our masking procedure. (E) The same is observed at the meta-probe set level, where this gene is not significantly associated with SNP rs176889 or any other neighboring SNP (results not shown).
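The ZNF37A case illustrates the masking idea on one probe set: probes whose target contains a known SNP are dropped, and the probe set is summarized from the remaining probes. A minimal sketch, assuming inclusive genomic coordinates on a single chromosome and using the median log2 intensity as a stand-in summary (the array's actual summarization algorithm is not described in this excerpt); all names, probe IDs and values are hypothetical.

```python
import bisect
from statistics import median


def mask_probes(probes, snp_positions):
    """probes: dict probe_id -> (start, end), inclusive coordinates.

    snp_positions: SNP coordinates on the same chromosome.
    Returns the set of probe ids whose target region contains no SNP.
    """
    snps = sorted(snp_positions)
    kept = set()
    for probe_id, (start, end) in probes.items():
        i = bisect.bisect_left(snps, start)  # first SNP at or after the probe start
        if i == len(snps) or snps[i] > end:
            kept.add(probe_id)
    return kept


def probe_set_score(log2_intensities, kept):
    """Median log2 intensity over unmasked probes; None if every probe was masked."""
    values = [v for probe_id, v in log2_intensities.items() if probe_id in kept]
    return median(values) if values else None


# Hypothetical probe set of four probes with one SNP at position 120.
probes = {"p1": (100, 124), "p2": (110, 134), "p3": (130, 154), "p4": (200, 224)}
kept = mask_probes(probes, [120])
print(sorted(kept), probe_set_score({"p1": 12.0, "p2": 11.0, "p3": 8.0, "p4": 9.0}, kept))
```

Under this rule a probe set whose probes all overlap SNPs receives no score, which is one way the loss of transcriptome coverage mentioned in the abstract can arise.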
Effect of the masking procedure on results from the association analysis of probe sets and meta-probe sets
| Measure | Probe sets | Meta-probe sets |
|---|---|---|
| False positives | 446 | 9 |
| False negatives | 41 | 4 |
| True positives | 69 | 102 |
| True negatives | 13 359 | 8115 |
| False positive rate | 0.866 | 0.081 |
| False negative rate | 0.003 | 0.0005 |
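The rates in the table are reproduced by two simple ratios: false positive rate = FP / (FP + TP) and false negative rate = FN / (FN + TN). For probe sets, 446 / (446 + 69) ≈ 0.866 and 41 / (41 + 13 359) ≈ 0.003. FP / (FP + TP) is thus the share of unmasked associations that disappear after masking, a quantity more commonly called a false discovery rate. The check below uses the table's counts; the function name `masking_error_rates` is hypothetical.

```python
def masking_error_rates(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Rates that reproduce the table: FP / (FP + TP) and FN / (FN + TN)."""
    fp_rate = fp / (fp + tp)  # unmasked hits not confirmed after masking
    fn_rate = fn / (fn + tn)  # unmasked non-hits that become hits after masking
    return fp_rate, fn_rate


# Counts taken from the table: (TP, FP, FN, TN).
table = {"Probe sets": (69, 446, 41, 13359), "Meta-probe sets": (102, 9, 4, 8115)}
for label, counts in table.items():
    fp_rate, fn_rate = masking_error_rates(*counts)
    print(f"{label}: false positive rate {fp_rate:.3f}, false negative rate {fn_rate:.4f}")
```

Rounded to the table's precision, these give 0.866 and 0.003 for probe sets and 0.081 and 0.0005 for meta-probe sets.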