Adaikalavan Ramasamy, Daniah Trabzuni, J Raphael Gibbs, Allissa Dillman, Dena G Hernandez, Sampath Arepalli, Robert Walker, Colin Smith, Gigaloluwa Peter Ilori, Andrey A Shabalin, Yun Li, Andrew B Singleton, Mark R Cookson, John Hardy, Mina Ryten, Michael E Weale.
Abstract
Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (eQTL) signals with any other polymorphisms in linkage disequilibrium. We provide the most complete solution to this problem, by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels (the gene-based Illumina Human HT12 array, which uses 50-mer probes, and the exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes). We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. We find that although only a small proportion of the probes contain polymorphisms, they account for a large proportion of apparent expression QTL signals, and therefore result in many false signals being declared as real. We find that the polymorphism-in-probe problem is insufficiently controlled by previous protocols, and illustrate this using some notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets should be carefully checked in order to adequately address this issue.
Year: 2013 PMID: 23435227 PMCID: PMC3627570 DOI: 10.1093/nar/gkt069
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Studies that have provided an empirical assessment of the polymorphism-in-probe problem
| Article (PMID) | Tissues and sample size | Expression chip (probe length) | SNP set used to check SNP-in-probe (# SNPs in set) | Method of assessment and reported findings |
|---|---|---|---|---|
| Walter | Whole brain from six C57BL/6J strain mice and six DBA/2J strain mice | Affymetrix MOE430 2.0 chips (25-mer probes but only transcript-level was analysed) | NIEHS/Perlegen Mouse Resequencing Project & Mouse Phenome Database SNP Tool & Sanger resequencing (∼3.9 million SNPs) | Compared results before and after masking SNP-containing probes. |
| Nature Methods | | | | 22% false-positive rate and 12% false-negative rates (RMA) or |
| PMID: 17762873 | | | | 36% false-positive rate and 13% false-negative rates (MAS 5.0) |
| Meyers | 193 neuropathologically normal human brains (pooled regions) | Illumina Human Refseq-8 Expression (50-mer probes) | Genotyped SNPs (366 140 SNPs) | Discarded associations if probe contained a SNP |
| Nature Genetics | | | | 13% of significant |
| PMID: 17982457 | | | | 5% of significant |
| Benovoy | 57 CEU HapMap individuals, LCLs | Affymetrix Human Exon 1.0 ST (25-mer probes) | HapMap II release 21 (∼4 million SNPs) | Compared results before and after masking SNP-containing probes. |
| Nucleic Acids Research | | | | 86.6% false-positive rate and 0.3% false-negative rate (exon-level) |
| PMID: 18596082 | | | | 8.1% false-positive rate and 0.05% false-negative rate (gene-level) |
| Heinzen | 93 frontal cortex | Affymetrix Human Exon 1.0 ST (25-mer probes) | Genotyped SNPs (<550 thousand SNPs) | Discarded associations if the hit SNP was inside the probe sequence or in high LD ( |
| PLoS Biology | 80 blood cell | | | |
| PMID: 19222302 | | | | 36.6% of significant |
| Gamazon | 57 CEU HapMap individuals | Affymetrix Human Exon 1.0 ST (25-mer probes) | 1000 Genomes Pilot 1 (April 2009) + dbSNP v129 (unclear on number of SNPs) | Focused on 782 differentially spliced probesets from their previous published study and reports that ∼15% of these could be affected by novel SNPs in 1000 Genomes Pilot 1 (compared with dbSNP v129). |
| PLoS One | 56 YRI HapMap individuals, LCLs | | | |
| PMID: 20186275 | | | | |
| Stranger | 726 individuals from 8 HapMap populations, LCLs | Illumina Sentrix Human-6 Expression BeadChip version 2 (50-mer probes) | 1000 Genomes Pilot 1 (Aug 2010) with MAF > 5% (unclear on number of SNPs) | 6.5% of probes contained SNP(s) within the probe sequence while 7.4% of the significant probes (i.e. has at least one significant |
| PLoS Genetics | | | | |
| PMID: 22532805 | | | | |
| | | | | Therefore, concluded no significant enrichment. |
| Ramasamy | 130 cerebellum, 127 frontal cortex | Affymetrix Human Exon 1.0 ST (25-mer probes) | 1000 Genomes Integrated Phase 1 version 3 (March 2012) and NHLBI-ESP (∼9.3 million SNPs, ∼1 million indels) | Proportion of |
| | 301 cerebellum, 304 frontal cortex | Illumina HT12 (50-mer probes) | | 31–52.6% in FCTX and 20–46.7% in CRBL |
METHOD Indicates methodological articles that explicitly studied this problem in greater detail.
Classification of probes/probesets in both data sets using progressively more comprehensive polymorphism reference data sources
| | | Affymetrix Human Exon 1.0 ST (∼1.2 million 25-mer probes grouped into 298 k probesets) based on the UK Human Brain Expression Consortium (UKBEC, | | | | Illumina Human HT12 (43 009 50-mer probes) based on the North American Brain Expression Consortium (NABEC, | | |
|---|---|---|---|---|---|---|---|---|
| Polymorphism reference data source (restricted to autosomes and frequency > 1%) | No. of variants | No. of unique variants in probe sequence | No. of core probesets unaltered | No. of core probesets altered (%) | No. of core probesets discarded (%) | No. of unique variants in probe sequence | No. of probes unaltered | No. of probes discarded (%) |
| Illumina Infinium HumanHap550 (after QC) | 512 771 SNPs | | | | | 362 SNPs | 42 638 | 371 (0.86%) |
| Illumina Omni 1M (after QC) | 795 391 SNPs | 20 926 SNPs | 278 585 | 13 515 (4.5%) | 6260 (2.1%) | | | |
| CEU panel of HapMap release 28 (August 2010) [unrelated | 2 602 611 SNPs | 24 911 SNPs | 275 010 | 16 162 (5.4%) | 7188 (2.4%) | 1557 SNPs | 41 448 | 1561 (3.6%) |
| SNPs from European panel of 1000 Genomes Integrated Phase 1 version 3 (March 2012) ( | 9 013 135 SNPs | 50 813 SNPs | 254 277 | 28 932 (9.7%) | 15 151 (5.1%) | 5186 SNPs | 38 356 | 4653 (10.8%) |
| + SNPs info from European Americans from the NHLBI-ESP ( | 9 025 738 SNPs | 52 843 SNPs | 252 692 | 29 808 (10.0%) | 15 860 (5.3%) | 5361 SNPs | 38 243 | 4766 (11.1%) |
| + indels from European panel of 1000 Genomes Integrated Phase 1 version 3 (March 2012) ( | 9 025 738 SNPs + 927 779 indels | 52 843 SNPs + 2097 indels | 251 313 | 30 621 (10.3%) | 16 426 (5.5%) | 5361 SNPs + 332 indels | 37 993 | 5016 (11.7%) |
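The classification counted in the table above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: probes and variants are assumed to lie on a single chromosome, each probe is a closed interval, each variant is reduced to a single position (an indel would in practice need its full span), and all function names are hypothetical. The rule for probesets follows the category labels used in this record: no affected probe is 'unaltered', exactly one is 'altered', two or more is 'discarded'. A single Illumina probe is 'discarded' if it contains any polymorphism.

```python
from bisect import bisect_left


def probe_has_variant(start, end, variant_positions):
    """Return True if any variant lies in the closed interval [start, end].

    variant_positions must be sorted in ascending order (assumption).
    """
    i = bisect_left(variant_positions, start)
    return i < len(variant_positions) and variant_positions[i] <= end


def classify_probeset(probes, variant_positions):
    """Classify an exon-array probeset from its (start, end) probe intervals."""
    n_hit = sum(probe_has_variant(s, e, variant_positions) for s, e in probes)
    if n_hit == 0:
        return "unaltered"
    if n_hit == 1:
        return "altered"
    return "discarded"


def classify_probe(start, end, variant_positions):
    """Classify a single gene-array probe (no 'altered' category)."""
    hit = probe_has_variant(start, end, variant_positions)
    return "discarded" if hit else "unaltered"


# Toy coordinates, invented for illustration only.
variants = [105, 230, 232]
print(classify_probeset([(100, 124), (200, 224), (300, 324)], variants))
print(classify_probe(200, 249, variants))
```

Keeping the variant positions sorted lets each probe be checked with one binary search, which matters when millions of variants are screened against more than a million probes.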
Figure 1. The proportion of LD-resolved cis-eQTL signals discarded because of the polymorphism-in-probe sequence problem using different polymorphism reference data sources and P-value thresholds. Multiple significant associations with a probe/probeset caused by SNPs in high LD (r² ≥ 0.5) were treated as a single ‘LD-resolved’ signal. Also shown is the expected proportion that would be discarded if the rate was the same as the proportion of all probe/probesets (including ones without a cis-eQTL signal) discarded using the 1000 Genomes (March 2012) plus Exome Sequencing Project reference data source.
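One way to collapse associations into ‘LD-resolved’ signals is greedy clumping, sketched below. The caption only states the LD threshold (r² of at least 0.5), so the greedy strategy, the input layout and the name `ld_resolve` are assumptions made for illustration, not a description of the authors' procedure. For one probe or probeset, the most significant SNP is taken as the index SNP, every remaining SNP in high LD with it is absorbed into the same signal, and the process repeats on what is left.

```python
def ld_resolve(pvalues, r2, threshold=0.5):
    """Group significant SNPs for one probe into LD-resolved signals.

    pvalues: dict mapping SNP id -> association P-value.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2;
        missing pairs are treated as r^2 = 0 (assumption).
    Returns a list of (index_snp, member_snps) tuples.
    """
    remaining = sorted(pvalues, key=pvalues.get)  # most significant first
    signals = []
    while remaining:
        index = remaining[0]
        members = [
            snp for snp in remaining
            if snp == index or r2.get(frozenset((index, snp)), 0.0) >= threshold
        ]
        signals.append((index, members))
        remaining = [snp for snp in remaining if snp not in members]
    return signals


# Toy values, invented for illustration only.
pvalues = {"snpA": 1e-20, "snpB": 1e-15, "snpC": 1e-13}
r2 = {
    frozenset(("snpA", "snpB")): 0.9,
    frozenset(("snpA", "snpC")): 0.1,
    frozenset(("snpB", "snpC")): 0.2,
}
print(len(ld_resolve(pvalues, r2)))
```

In this toy case snpA and snpB form one signal and snpC forms a second, so three significant associations count as two LD-resolved signals.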
Number of LD-resolved cis-eQTLs (P < 10⁻¹²) for the two data sets, using polymorphisms (present with minor allele frequency >1% in Europeans) from the combined 1000 Genomes (March 2012) plus NHLBI Exome Sequencing Project data sources
| Affymetrix Human Exon 1.0 (25-mer probe design) based on the UK Human Brain Expression Consortium (UKBEC, | | | Illumina Human HT-12v3 (50-mer probe design) based on the North American Brain Expression Consortium (NABEC, | | |
|---|---|---|---|---|---|
| | CRBL | FCTX | | CRBL | FCTX |
| Total number of | 1275 | 705 | Total number of | 1192 | 1018 |
| Type of probeset giving rise to the | | | Type of probe giving rise to the | | |
| None of the corresponding probes contain a polymorphism (‘unaltered’) | 517 | 227 | Probe does not contain a polymorphism (‘unaltered’) | 793 | 681 |
| Only one corresponding probe contains polymorphism(s) (‘altered’) | 119 | 54 | Probe contains polymorphism(s) (‘discarded’) | 396 | 337 |
| Two or more corresponding probes contain polymorphism(s) (‘discarded’) | 639 | 424 | | | |
| Proportion of eQTLs discarded (excluding altered) = discarded / (discarded + unaltered) | 55.2% | 65.1% | Proportion of eQTLs discarded | 33.2% | 33.1% |
| Expected proportion of eQTLs to be discarded | 6.1% | 6.1% | Expected proportion of eQTLs to be discarded | 11.7% | 11.7% |
The expected proportion to be discarded is the proportion of all probe/probesets discarded (including ones without a cis-eQTL signal).
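The formula in the table, discarded / (discarded + unaltered), can be worked through as a short calculation. The sketch below uses the FCTX counts and the expected proportions copied from the table; the function names and the fold-enrichment ratio (observed divided by expected) are illustrative additions, not quantities reported in this record.

```python
def proportion_discarded(discarded: int, unaltered: int) -> float:
    """discarded / (discarded + unaltered); 'altered' probesets are left out."""
    return discarded / (discarded + unaltered)


def fold_enrichment(observed: float, expected: float) -> float:
    """Ratio of the observed discard rate to the expected discard rate."""
    return observed / expected


# FCTX counts copied from the table: (discarded, unaltered).
fctx_counts = {"Affymetrix": (424, 227), "Illumina": (337, 681)}
# Expected proportions copied from the table.
expected = {"Affymetrix": 0.061, "Illumina": 0.117}

for platform, (discarded, unaltered) in fctx_counts.items():
    observed = proportion_discarded(discarded, unaltered)
    ratio = fold_enrichment(observed, expected[platform])
    print(f"{platform}: observed {observed:.1%}, {ratio:.1f}x the expected rate")
```

Comparing the observed rate against the expected rate is what shows that probes containing polymorphisms are over-represented among apparent eQTL signals, rather than simply discarded in proportion to their frequency on the array.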
Figure 2. Illustrative examples of eQTLs with relevance to brain disorders. (A) Boxplots show the false-positive association between rs650927 genotypes and the measured expression of each of the four probes contained within the probeset 3723733 (exon 6 of MAPT) due to an SNP in the probe sequences. Two SNPs are present in the target sequence but only one is in high LD with the hit SNP. (B) Boxplots show the false-negative association between rs34725377 and probeset 3412103 (exon 8 of PRICKLE1) due to an SNP in one of the probe sequences. (C) Boxplots show the false-positive association between rs1751739 and the probe ILMN_1710903 (3′UTR of MAPT) due to a common 2-base pair deletion. The association between this SNP and ILMN_2310814, which also targets the 3′UTR of MAPT but is free of polymorphisms, is shown.