Pierre Galichon, Laurent Mesnard, Alexandre Hertig, Bénédicte Stengel, Eric Rondeau.
Abstract
Genome-wide association studies (GWAS) have become a preferred method to identify new genetic susceptibility loci. This technique aims to understand the molecular etiology of common diseases, but in many cases it has led to the identification of loci with no obvious biological relevance. Herein, we show that previously unrecognized sequence homologies have caused single-nucleotide polymorphism (SNP) microarrays to incorrectly associate a phenotype with a given locus when in fact the linkage is to another, distant locus. Using genetic differences between male and female subjects as a model to study the effect of one specific genomic region on the whole SNP microarray, we provide strong evidence that the use of standard methods for GWAS can be misleading. We suggest a new systematic quality control step in the biological interpretation of previous and future GWAS.
Year: 2012 PMID: 22362730 PMCID: PMC3367202 DOI: 10.1093/nar/gks169
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
SNPs with genotypes significantly associated with gender according to various platforms
| rsID | Chromosome | Position | Genes | SNP | Odds ratio (female/male) | P-value |
|---|---|---|---|---|---|---|
| rs4862188 | 4 | 184730519 | WWC2/CDKN2AIP | T/C | 0 | 7.775 × 10^−20 |
| rs2880301 | 13 | 18998534 | | T/C | 0 | 8.685 × 10^−20 |
| rs3883013 | 15 | 82889661 | | C/T | 0 | 8.685 × 10^−20 |
| rs3883011 | 15 | 82889398 | | G/C | 0 | 1.368 × 10^−19 |
| rs3883014 | 15 | 82889733 | | C/G | 0 | 1.527 × 10^−19 |
| rs2228276 | 19 | 63271452 | ZNF773/ZNF135 | T/C | 37.3 | 9.515 × 10^−08 |
| rs12734338 | 1 | 200736346 | | C/T | 0 | 3.57 × 10^−31 |
| rs3881953 | 1 | 200794644 | | A/G | 0 | 4.26 × 10^−31 |
| rs3817222 | 1 | 200731383 | | T/C | 0 | 1.02 × 10^−30 |
| rs12743401 | 1 | 200743271 | | C/T | 0 | 1.02 × 10^−30 |
| rs34868670 | 5 | 40273600 | PTGER4 | C/T | 0 | 1.53 × 10^−30 |
| rs2451078 | 13 | 18996289 | | G/C | 0.03111 | 8.56 × 10^−25 |
| rs1556557 | 1 | 241046639 | RSL24D1P4, LOC10012 | A/G | 0 | 6.06 × 10^−15 |
| rs3817227 | 1 | 200731465 | | G/A | 0 | 6.06 × 10^−15 |
| rs4084639 | 1 | 200776787 | | C/G | 0 | 6.06 × 10^−15 |
| rs10914658 | 1 | 33303337 | AK2A, ADC | A/G | 0 | 6.06 × 10^−15 |
| rs12734001 | 1 | 200657537 | | T/C | 0 | 6.06 × 10^−15 |
| rs12739153 | 1 | 241049487 | RSL24D1P4, LOC10012 | T/G | 0 | 6.06 × 10^−15 |
| rs12741415 | 1 | 200741397 | | A/G | 0 | 6.06 × 10^−15 |
| rs17319010 | 1 | 222156006 | ACTBP11, CIPC5 | C/A | 0 | 6.06 × 10^−15 |
| rs17802433 | 2 | 94901357 | TEKT4 | T/G | 0 | 6.06 × 10^−15 |
| rs4862188 | 4 | 184592364 | LOC100127981, CDKN2AIP | T/C | 0 | 6.06 × 10^−15 |
| rs2999200 | 13 | 18887941 | | T/C | 0 | 6.06 × 10^−15 |
| rs3883011 | 15 | 82889398 | | C/G | 0 | 6.06 × 10^−15 |
| rs3883013 | 15 | 82889661 | | C/T | 0 | 6.06 × 10^−15 |
| rs17301021 | 15 | 82613080 | | G/C | 0 | 6.06 × 10^−15 |
| rs2502344 | 1 | 241137354 | LOC100129949, LOC100420263 | A/G | 0 | 6.81 × 10^−15 |
| rs12734338 | 1 | 200736346 | | C/T | 0 | 6.81 × 10^−15 |
| rs3883014 | 15 | 82889733 | | G/C | 0 | 6.81 × 10^−15 |
| rs3881953 | 1 | 200794644 | | A/G | 0 | 7.67 × 10^−15 |
| rs1778596 | 1 | 143702635 | | A/T | 0 | 8.66 × 10^−15 |
| rs12743401 | 1 | 200743271 | | C/T | 0 | 8.66 × 10^−15 |
| rs2880301 | 13 | 18998534 | | T/C | 0 | 1.06 × 10^−14 |
| rs3847124 | 7 | 137842064 | TRIM24 | G/A | 0 | 1.53 × 10^−14 |
| rs11166266 | 1 | 99771825 | LPPR4, PALMD | T/C | 0 | 1.87 × 10^−14 |
| rs12723357 | 1 | 241185135 | LOC100129949, LOC100420263 | C/T | 0 | 1.87 × 10^−14 |
| rs3013398 | 1 | 241209589 | LOC100129949, LOC100420263 | T/C | 87 | 8.60 × 10^−14 |
| rs2390647 | 1 | 91130771 | LOC100505821, ZNF644 | C/T | ∞ | 9.65 × 10^−14 |
| rs17042395 | 3 | 16568435 | | G/A | 0.01149 | 1.09 × 10^−13 |
| rs12372818 | 13 | 46581126 | HT2RA | A/G | 0.02222 | 1.93 × 10^−13 |
| rs351881 | 20 | 62314104 | MYT1 | T/C | 0.02222 | 2.19 × 10^−13 |
| rs6820128 | 4 | 91700109 | FAM190A | A/G | 0 | 1.13 × 10^−12 |
| rs11667496 | 19 | 23750678 | RPSAP58 | G/A | 0.01266 | 1.45 × 10^−12 |
| rs4860568 | 4 | 64690977 | TECRL | A/G | 0.01299 | 2.03 × 10^−12 |
| rs9881157 | 3 | 35626953 | ARPP21 | C/A | 0.02564 | 1.92 × 10^−11 |
| rs4685345 | 3 | 16585452 | | G/C | 0.099 | 6.16 × 10^−10 |
| rs6803924 | 3 | 16592069 | | G/C | 0.09702 | 1.51 × 10^−09 |
| rs34868670 | 5 | 40273600 | PTGER4 | C/T | 0 | 3.631 × 10^−26 |
| rs4737118 | 8 | 43533172 | | G/A | 0 | 3.631 × 10^−26 |
| rs12743401 | 1 | 200743271 | | C/T | 0 | 4.603 × 10^−26 |
| rs12214551 | 6 | 2991748 | SERPINB8P1 | C/T | 0 | 5.635 × 10^−26 |
| rs36019094 | 5 | 40273131 | PTGER4 | A/C | 0 | 8.188 × 10^−26 |
| rs7808552 | 7 | 63066168 | VN1R36P, LOC100419780 | G/A | 0 | 9.278 × 10^−26 |
| rs3817222 | 1 | 200731383 | | T/C | 0 | 9.839 × 10^−26 |
| rs3994533 | 15 | 82882831 | | T/C | 0 | 9.839 × 10^−26 |
| rs2880301 | 13 | 18998534 | | T/C | 0 | 1.359 × 10^−25 |
| rs12741415 | 1 | 200741397 | | A/G | 0 | 1.763 × 10^−25 |
| rs6944297 | 7 | 63937080 | ZNF138, LOC168474 | T/G | 0 | 2.458 × 10^−25 |
| rs6836144 | 4 | 119595470 | LOC100128177, LOC100420037 | A/C | ∞ | 5.355 × 10^−25 |
| rs1556557 | 1 | 241046639 | RSL24D1P4, LOC100129949 | A/G | 0.006211 | 1.77 × 10^−24 |
| rs7039117 | 9 | 97097001 | FANCC | C/T | 0.006617 | 2.553 × 10^−23 |
| rs6917603 | 6 | 30125050 | ETF1P1, C6Orf12 | C/T | 0.0559 | 6.801 × 10^−20 |
| rs9636470 | 2 | 87947576 | LOC730268, LOC100419917 | G/A | 3.569 | 3.869 × 10^−08 |
| rs11635160 | 15 | 82607789 | | A/G | 0.2805 | 7.955 × 10^−08 |
In bold are the SNPs that also were identified in an Affymetrix 6.0 data set by directly comparing probe intensities.
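The odds ratios of 0 and ∞ in the table arise when an allele is entirely absent in one sex. A minimal sketch of how such values and the corresponding chi-square P-value can be computed is given here. It assumes a 2×2 table of allele counts by sex and a chi-square test without continuity correction; the record does not state which exact test or software was used, and the function names and counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = female, male; columns = allele 1, allele 2.
    """
    numerator, denominator = a * d, b * c
    if denominator == 0:
        # Undefined when both products vanish, infinite otherwise.
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # an empty row or column: no association can be tested
    return n * (a * d - b * c) ** 2 / margins


def chi_square_p_value_1df(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical allele counts, not taken from the paper.
female = (0, 200)  # allele 1 never called in females
male = (60, 140)   # allele 1 called in males, e.g. a signal from a Y-linked homolog

print(odds_ratio(*female, *male))
stat = chi_square_2x2(*female, *male)
print(stat, chi_square_p_value_1df(stat))
```

With these invented counts the female/male odds ratio is exactly 0, which mirrors the pattern seen for most SNPs in the table.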
Figure 1. BLAST alignment analysis of the flanking sequence of a sex-associated SNP (rs12372818 on chromosome 13). Two homologous sequences are present on the Y chromosome (and one on chromosome 3). The presence of the ‘A’ variant on chromosome Y is responsible for a higher frequency of the minor allele in males.
Figure 2. Sex-association score (chi-square statistic) versus homology score (BLAST raw score) in data set 1 (Illumina 370k). For each level of homology, mean diamonds show the mean with its 95% confidence interval. **P < 0.0001 versus no homology (0).
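The summary shown in Figure 2 (a mean and 95% confidence interval of the sex-association score for each homology level) can be sketched as follows. This is only an illustration under stated assumptions: the interval uses a normal approximation (z = 1.96), which may differ from the method behind the published figure, and the function name and input records are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def mean_ci_by_level(records, z=1.96):
    """Group (homology_level, chi_square) pairs and summarize each level.

    Returns {level: (mean, lower, upper)}, where lower/upper bound an
    approximate 95% confidence interval (normal approximation, an assumption).
    """
    groups = defaultdict(list)
    for level, score in records:
        groups[level].append(score)

    summary = {}
    for level in sorted(groups):
        scores = groups[level]
        m = mean(scores)
        if len(scores) > 1:
            half_width = z * stdev(scores) / math.sqrt(len(scores))
        else:
            half_width = math.nan  # no interval from a single observation
        summary[level] = (m, m - half_width, m + half_width)
    return summary


# Hypothetical (homology level, chi-square statistic) pairs, not from the paper.
records = [(0, 1.0), (0, 3.0), (1, 10.0), (1, 14.0)]
for level, (m, lower, upper) in mean_ci_by_level(records).items():
    print(level, m, lower, upper)
```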
Figure 3. Schematic view of the hybridization of DNA to a microarray probe. Three possibilities include theoretical hybridization, rogue hybridization with a homolog, and bulk hybridization of genomic DNA that sequesters the restriction fragment away from the probe. (1) Hybridization of the target sequence with the probe, according to theory. (2) Hybridization of a sex chromosome sequence with the probe of a homologous autosomal SNP, competing with the theoretical autosomal restriction fragment. (3) Hybridization of a sex chromosome restriction fragment with an autosomal SNP restriction fragment, competing with the microarray's oligonucleotide probe. (4) Oligonucleotide probes for sex chromosome SNPs hybridize with the same restriction fragment as probes for autosomal SNPs and are thus statistically correlated.
Example on SNP rs13269433 of convergent approaches using BLAST sequence alignment to identify interautosomal SNP homologies and a correlation test to identify interdependent SNPs
rs13269433, chromosome 8, near MFHAS1

Flanking sequence: ATATATATCAGCCAGA[T/C]GTGCCACGTGAGCCTG

| BLAST hits | Alignment | Position on chromosome X | rsID | Correlation (r) | Correlation (P-value) |
|---|---|---|---|---|---|
| Haloacid dehalogenase-like hydrolase domain-containing protein | ATATATATCAGCCA | 245274 | rs12007101 | −0.79 | 1.37 × 10^−11 |
| | | | rs5934477 | 0.74 | 5.94 × 10^−10 |
| Mastermind-like domain-containing protein 1 | TGCCACGTGAGCCT | 551801 | rs6649480 | −0.74 | 7.47 × 10^−10 |
| | | | rs9723770 | −0.78 | 1.86 × 10^−11 |
| | | | rs5925461 | −0.74 | 9.30 × 10^−10 |
| | | | rs5970516 | −0.85 | 9.54 × 10^−15 |
| | | | rs5925482 | 0.82 | 4.56 × 10^−13 |
| Kelch-like protein 13 | TATATCAGCCAGA | 777982 | rs10465428 | 0.75 | 3.37 × 10^−10 |
| | | | rs7885432 | −0.80 | 5.59 × 10^−12 |
| | | | rs2465941 | 0.80 | 4.66 × 10^−12 |
| | | | rs2106683 | 0.76 | 1.34 × 10^−10 |
| Neuroligin-4, X-linked precursor | ATATATATCAGCCA | 922956 | rs17219044 | 0.75 | 5.18 × 10^−10 |
| | | | rs36122347 | 0.76 | 2.13 × 10^−10 |
| | | | rs16983683 | −0.80 | 4.95 × 10^−12 |
| | | | rs5961738 | 0.84 | 1.75 × 10^−14 |
| | | | rs12844412 | −0.76 | 1.15 × 10^−10 |
| | | | rs7881412 | 0.75 | 4.01 × 10^−10 |
| | | | rs10127411 | 0.80 | 4.35 × 10^−12 |
| DDB1- and CUL4-associated factor 12-like protein 1 | ATATATCAGCCAGA | 1369645 | rs5929972 | −0.80 | 1.95 × 10^−12 |
| | | | rs7065014 | −0.74 | 9.58 × 10^−10 |
| | | | rs201647 | −0.77 | 9.61 × 10^−11 |
| | | | rs1601226 | 0.83 | 1.57 × 10^−13 |
| | | | rs16997689 | 0.80 | 4.09 × 10^−12 |
| PAS domain-containing protein 1 | ATATATATCAGC | 1588059 | rs16995984 | 0.78 | 2.18 × 10^−11 |
| | | | rs7051678 | 0.74 | 5.78 × 10^−10 |
| | | | rs5924663 | 0.78 | 2.56 × 10^−11 |
| Gamma-aminobutyric acid receptor subunit alpha 3 precursor | TATATATCAGCCA | 2591595 | rs7057635 | −0.75 | 4.41 × 10^−10 |
| | | | rs4446880 | 0.75 | 2.88 × 10^−10 |
| Ribose-phosphate pyrophosphokinase 2 | TATATATCAGCCA | 4384225 | rs16987131 | 0.75 | 3.69 × 10^−10 |
| Nance–Horan syndrome protein isoform 1 | CCACGTGAGCCTG | 9035432 | rs7887450 | 0.80 | 4.72 × 10^−12 |
| | | | rs6632979 | −0.76 | 1.09 × 10^−10 |
| | | | rs7473191 | 0.83 | 1.06 × 10^−13 |
| | | | rs6527811 | −0.81 | 1.14 × 10^−12 |
| Dystrophin | TATATATCAGCCA | 24534293 | rs16989676 | 0.77 | 8.99 × 10^−11 |
| | | | rs16989902 | −0.77 | 6.69 × 10^−11 |
| | | | rs1158629 | 0.75 | 2.40 × 10^−10 |
| | | | rs1356619 | 0.75 | 3.07 × 10^−10 |
| | | | rs1518519 | −0.82 | 3.63 × 10^−13 |
| | | | rs7887670 | −0.74 | 8.15 × 10^−10 |
| Melanoma-associated antigen B16 | TGCCACGTGAGCCT | 27045018 | rs6632359 | 0.80 | 1.26 × 10^−12 |
| Zinc finger protein 92 homolog | TGCCACGTGAGC | 152706248 | rs2980024 | −0.79 | 1.28 × 10^−11 |
Columns, from left to right: the gene nearest to the BLAST hit (region of homology to rs13269433 on chromosome X), the aligned sequence, its position on chromosome X, the correlated SNPs in the same region, and each SNP's correlation coefficient r and P-value.
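The two convergent checks in this table can be sketched in a few lines. The first function reports which flank of rs13269433 contains a given aligned segment, using the flanking sequence printed above. The second computes a Pearson correlation between two genotype vectors. Coding genotypes as minor-allele counts (0, 1, 2) is an assumption, since the record does not say which values were correlated. The genotype vectors and function names are hypothetical, and a plain substring test only stands in for a real BLAST search.

```python
import math

# Flanking sequence of rs13269433 as printed in the table, split at the [T/C] site.
LEFT_FLANK = "ATATATATCAGCCAGA"
RIGHT_FLANK = "GTGCCACGTGAGCCTG"


def locate_alignment(alignment):
    """Return 'left' or 'right' for the flank containing the segment, else None."""
    if alignment in LEFT_FLANK:
        return "left"
    if alignment in RIGHT_FLANK:
        return "right"
    return None


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Aligned segments copied from the table.
for segment in ("ATATATATCAGCCA", "TGCCACGTGAGCCT", "CCACGTGAGCCTG"):
    print(segment, locate_alignment(segment))

# Hypothetical genotype calls (minor-allele counts) for six subjects.
autosomal_snp = [0, 1, 2, 1, 0, 2]
x_linked_snp = [2, 1, 0, 1, 2, 0]
print(pearson_r(autosomal_snp, x_linked_snp))
```

Every aligned segment listed in the table falls entirely within one flank, so none of the listed hits spans the polymorphic site itself.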