Graham A Heap, Jennie H M Yang, Kate Downes, Barry C Healy, Karen A Hunt, Nicholas Bockett, Lude Franke, Patrick C Dubois, Charles A Mein, Richard J Dobson, Thomas J Albert, Matthew J Rodesch, David G Clayton, John A Todd, David A van Heel, Vincent Plagnol.
Abstract
Many disease-associated variants identified by genome-wide association (GWA) studies are expected to regulate gene expression. Allele-specific expression (ASE) quantifies transcription from both haplotypes using individuals heterozygous at tested SNPs. We performed deep human transcriptome-wide resequencing (RNA-seq) for ASE analysis and expression quantitative trait locus discovery. We resequenced double poly(A)-selected RNA from primary CD4(+) T cells (n = 4 individuals, both activated and untreated conditions) and developed tools for paired-end RNA-seq alignment and ASE analysis. We generated an average of 20 million uniquely mapping 45 base reads per sample. We obtained sufficient read depth to test 1371 unique transcripts for ASE. Multiple biases inflate the false discovery rate, which we estimate to be approximately 50% for random SNPs. However, after controlling for these biases and considering the subset of SNPs that pass HapMap QC, 4.6% of heterozygous SNP-sample pairs show evidence of imbalance (P < 0.001). We validated four findings by both bacterial cloning and Sanger sequencing assays. We also found convincing evidence for allelic imbalance at multiple reporter exonic SNPs in CD6 for two samples heterozygous at the multiple sclerosis-associated variant rs17824933, linking GWA findings with variation in gene expression. Finally, we show in CD4(+) T cells from a further individual that high-throughput sequencing of genomic DNA and RNA-seq following enrichment for targeted gene sequences by sequence capture methods offers an unbiased means to increase the read depth for transcripts of interest, and therefore a method to investigate the regulatory role of many disease-associated genetic variants.
Year: 2010 PMID: 19825846 PMCID: PMC2792152 DOI: 10.1093/hmg/ddp473
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Number of 45 bp quality reads (after filtering out low mapping score and clonal reads, see Materials and Methods), heterozygous SNPs and number of heterozygous and imbalanced SNPs (at P < 0.001) for each sample
| Sample | Number of quality 45 bp reads: total | Number of quality 45 bp reads: in pairs | Number of quality 45 bp reads: single | Number of loci with depth >50 at any dbSNP | | Number of loci with depth >50 at a heterozygous SNP | | Number of imbalanced loci (P < 0.001) | |
|---|---|---|---|---|---|---|---|---|---|
| Individual 1, stimulated | 15 151 170 | 15 151 170 | 0 | 2245 | 11 976 | 379 | 559 | 42 | 51 |
| Individual 1, unstimulated | 9 498 415 | 9 498 415 | 0 | 1089 | 6514 | 176 | 260 | 14 | 15 |
| Individual 2, stimulated | 35 964 198 | 24 244 978 | 11 719 220 | 4091 | 24 239 | 831 | 1348 | 80 | 114 |
| Individual 2, unstimulated | 20 722 639 | 20 722 639 | 0 | 2432 | 10 574 | 383 | 579 | 41 | 67 |
| Individual 3, stimulated | 16 336 921 | 16 336 921 | 0 | 1952 | 11 113 | 308 | 465 | 15 | 18 |
| Individual 3, unstimulated | 18 172 107 | 13 704 083 | 4 468 024 | 2030 | 12 491 | 379 | 619 | 28 | 28 |
| Individual 4, stimulated | 16 400 802 | 11 736 947 | 4 663 855 | 2067 | 12 178 | 351 | 536 | 37 | 39 |
| Individual 4, unstimulated | 15 240 509 | 10 688 635 | 4 551 874 | 2043 | 11 561 | 350 | 563 | 32 | 38 |
Figure 1. Probability of detecting an allelic imbalance at P < 0.001 as a function of the read depth at a single SNP. We considered three levels of allelic imbalance: 60/40, 67/33 and 70/30. The 67/33 scenario corresponds to a two-fold difference in expression level, i.e. an average of one cycle difference in a qPCR experiment between individuals homozygous for each of the two alleles.
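The relationship described in the Figure 1 caption can be approximated with a short power calculation. The sketch below is not the authors' code. It assumes that imbalance is called with a 1 df χ² goodness-of-fit test against a 50/50 ratio (the test named for the RNA-seq P-values elsewhere in this record) and that allele counts at a SNP are binomially distributed. The function names `chi2_sf_1df` and `detection_power` are hypothetical.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def detection_power(depth, major_fraction, alpha=0.001):
    """Probability that a 50/50 goodness-of-fit test gives P < alpha.

    Assumption: the count of the over-expressed allele is Binomial(depth,
    major_fraction) with 0 < major_fraction < 1. Every possible count k is
    enumerated and the binomial probabilities of the rejecting counts are summed.
    """
    log_p = math.log(major_fraction)
    log_q = math.log(1.0 - major_fraction)
    power = 0.0
    for k in range(depth + 1):
        # Goodness-of-fit statistic for counts (k, depth - k) versus 50/50.
        stat = (2 * k - depth) ** 2 / depth
        if chi2_sf_1df(stat) < alpha:
            # Binomial probability computed in log space to avoid overflow.
            log_prob = (
                math.lgamma(depth + 1)
                - math.lgamma(k + 1)
                - math.lgamma(depth - k + 1)
                + k * log_p
                + (depth - k) * log_q
            )
            power += math.exp(log_prob)
    return power


if __name__ == "__main__":
    # The three imbalance levels named in the caption, at a few read depths.
    for fraction in (0.60, 2.0 / 3.0, 0.70):
        for depth in (50, 100, 250):
            print(f"{fraction:.2f} {depth:4d} {detection_power(depth, fraction):.3f}")
```

Under these assumptions the power rises with both read depth and the size of the imbalance, which is the qualitative pattern the caption describes.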
Figure 2. Number of transcripts containing at least one heterozygous dbSNP with read depth 50 (black), 100 (red) and 250 (green) as a function of the number of quality 45 bp reads. Each point represents one condition/individual sample. To provide a more intuitive reference, a mean 1× read depth across human genomic DNA requires 65 million 45 bp reads.
Comparison of allelic imbalance between mRNA transcriptome resequencing (RNA-seq) and two validation assays: locus-specific bacterial cloning (C-BASE) and PeakPicker. Both validation assays tested total RNA and genomic DNA. Allele 1 indicates the allele in the reference genome. RNA-seq P-values use a χ² goodness-of-fit test for a 50–50 allelic ratio. For the C-BASE validation assay, P-values test for an equal allelic ratio in mRNA and gDNA using a 1 df χ² goodness-of-fit test.
| Sample | Gene | SNP | RNA-seq mRNA counts, allele 1 | RNA-seq mRNA counts, allele 2 | RNA-seq P-value | C-BASE total RNA counts, allele 1 | C-BASE total RNA counts, allele 2 | C-BASE gDNA counts, allele 1 | C-BASE gDNA counts, allele 2 | C-BASE P-value | PeakPicker ratio of normalized peak heights (allele 1 by allele 2), total RNA | PeakPicker ratio of normalized peak heights (allele 1 by allele 2), gDNA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual 1, stimulated | | rs2064068 (G/A) | 82 | 23 | 8.5 × 10⁻⁹ | 229 | 101 | 186 | 150 | 2.5 × 10⁻⁴ | 1.91 | 0.95 |
| Individual 4, stimulated | | rs2064068 (G/A) | 73 | 25 | 1.2 × 10⁻⁶ | 211 | 95 | 168 | 166 | 2.4 × 10⁻⁶ | 1.76 | 0.97 |
| Individual 4, stimulated | | rs10405893 (A/G) | 149 | 86 | 4 × 10⁻⁵ | 202 | 117 | 176 | 167 | 2 × 10⁻³ | 1.54 | 1.08 |
| Individual 4, stimulated | | rs1060819 (T/C) | 82 | 223 | 6.8 × 10⁻¹⁶ | 95 | 232 | 196 | 145 | 1.3 × 10⁻¹³ | 0.43 | 0.86 |
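As an illustration of the RNA-seq P-values in the table above, here is a minimal sketch of a 1 df χ² goodness-of-fit test for a 50–50 allelic ratio, applied to the mRNA read counts. It is an independent reimplementation under that stated test, not the authors' pipeline, and the function names `chi2_sf_1df` and `goodness_of_fit_p` are hypothetical. It covers only the RNA-seq column and does not attempt the C-BASE comparison against gDNA.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def goodness_of_fit_p(allele1, allele2):
    """P-value for observed allele counts against an expected 50/50 split.

    With expected counts of total/2 for each allele, the statistic
    sum((observed - expected)^2 / expected) simplifies to
    (allele1 - allele2)^2 / total.
    """
    total = allele1 + allele2
    stat = (allele1 - allele2) ** 2 / total
    return chi2_sf_1df(stat)


if __name__ == "__main__":
    # mRNA read counts for the first two rows of the table.
    for counts in ((82, 23), (73, 25)):
        print(counts, f"{goodness_of_fit_p(*counts):.2e}")
```

For the rs2064068 rows, this calculation gives values of the same order as those reported in the table; small differences in the last digit may reflect rounding.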
Figure 3. Effect of mapping parameters (number of mismatches allowed for read mapping, and/or masking dbSNPs from the reference sequence set) on the allelic ratio distribution using the PE data for individual 2 stimulated at heterozygous dbSNPs passing quality checks and with read depth >50. Red crosses indicate significant allelic imbalance (P < 0.001). The bottom-right histogram has been generated allowing up to five mismatches and mapping reads to the masked reference set. Nb, number.
Figure 4. ASE data for individual 5 following sequence capture for genomic DNA (top) and RNA (bottom) at the FAM118A positive control known eQTL transcript. Crosses mark the read depth at each heterozygous SNP (only heterozygous SNPs with read depth >50 are shown). Vertical bars show the 99.9% confidence interval for the imbalance, computed as the ratio between the most common and the least common of the two alleles for each heterozygous SNP. Red bars mark heterozygous SNPs showing significant imbalance (at P < 0.001). The horizontal bar marks the 1:1 allelic ratio.