Christopher A Odhams, Andrea Cortini, Lingyan Chen, Amy L Roberts, Ana Viñuela, Alfonso Buil, Kerrin S Small, Emmanouil T Dermitzakis, David L Morris, Timothy J Vyse, Deborah S Cunninghame Graham.
Abstract
Studies attempting to functionally interpret complex-disease susceptibility loci by GWAS and eQTL integration have predominantly employed microarrays to quantify gene expression. RNA-Seq has the potential to discover a more comprehensive set of eQTLs and illuminate the underlying molecular consequence. We examine the functional outcome of 39 variants associated with Systemic Lupus Erythematosus (SLE) through the integration of GWAS and eQTL data from the TwinsUK microarray and RNA-Seq cohort in lymphoblastoid cell lines. We use conditional analysis and a Bayesian colocalisation method to provide evidence of a shared causal variant, then compare the ability of each quantification type to detect disease-relevant eQTLs and eGenes. We discovered the greatest frequency of candidate-causal eQTLs using exon-level RNA-Seq, and identified novel SLE susceptibility genes (e.g. NADSYN1 and TCF7) that were concealed using microarrays, including four non-coding RNAs. Many of these eQTLs were found to influence the expression of several genes, supporting the notion that risk haplotypes may harbour multiple functional effects. Novel SLE-associated splicing events were identified in the T-reg restricted transcription factor, IKZF2, and other candidate genes (e.g. WDFY4) through asQTL mapping using the Geuvadis cohort. We have significantly increased our understanding of the genetic control of gene expression in SLE by maximising the leverage of RNA-Seq and performing integrative GWAS-eQTL analysis against gene, exon, and splice-junction quantifications. We conclude that to better understand the true functional consequence of regulatory variants, quantification by RNA-Seq should be performed at the exon level as a minimum, and run in parallel with gene and splice-junction level quantification.
Year: 2017 PMID: 28062664 PMCID: PMC5409091 DOI: 10.1093/hmg/ddw417
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Details of genotype-expression (eQTL) cohorts used in study
| Cohort name | TwinsUK (MuTHER) | TwinsUK (MuTHER) | TwinsUK (MuTHER) | TwinsUK (MuTHER) | Geuvadis |
|---|---|---|---|---|---|
| Total subjects | 856 | 856 | 856 | 856 | 373 |
| Ethnicity | EUR (UK) | EUR (UK) | EUR (UK) | EUR (UK) | EUR (CEU, GBR, FIN, TSI) |
| Sex | F | F | F | F | M/F |
| Age | 37–85 | 37–85 | 37–85 | 37–85 | |
| Investigation | Comparison of candidate-causal eQTL and eGene detection between microarray and RNA-Seq | Comparison of candidate-causal eQTL and eGene detection between microarray and RNA-Seq | Comparison of candidate-causal eQTL and eGene detection between microarray and RNA-Seq | Validation and comparison of LCL RNA-Seq discoveries in whole blood | Identification of asQTLs using RNA-Seq |
| Citation | | | | | |
| Expression profile type | Microarray | RNA-Seq | RNA-Seq | RNA-Seq | RNA-Seq |
| Unit of expression | Probe | Gene | Meta-exon | Meta-exon | Splice-junction |
| Cell type | LCL | LCL | LCL | Whole blood | LCL |
| Subjects used in analysis | 777 | 683 | 765 | 384 | 373 |
| Data format | Genevar (summary results) | Read-count | Summary eQTL results | Summary eQTL results | Raw sequence alignments |
| RNA platform | Illumina HT-12 V3 | Illumina HiSeq2000 | Illumina HiSeq2000 | Illumina HiSeq2000 | Illumina HiSeq2000 |
| RNA-Seq mapper | | BWA v0.5.9 (GRCh37/hg19) | BWA v0.5.9 (GRCh37/hg19) | BWA v0.5.9 (GRCh37/hg19) | GEM v1.349 (GRCh37/hg19) |
| Reference transcriptome | | GENCODE V10 | GENCODE V10 | GENCODE V10 | GENCODE V10 |
| RNA-Seq read length | | 49-bp PE | 49-bp PE | 49-bp PE | 75-bp PE |
Breakdown of genotype-expression (eQTL) cohorts used in analysis. The TwinsUK cohort in lymphoblastoid cell lines (LCLs) was used for the microarray and RNA-Seq comparison (profiled at gene and meta-exon resolution); meta-exons are described as non-redundant overlapping portions of exons generated by flattening the transcriptome annotation. All TwinsUK (MuTHER) samples used in analysis are derived from the original 856 individuals. Validation of LCL data in whole blood was carried out at meta-exon level using 384 of the 856 individuals. The Geuvadis cohort was used for asQTL identification; splice-junction quantifications were generated by Altrans (57) from the raw sequence alignments. Summary eQTL results include only the eQTL association results per test (where full genotype and expression data were not obtainable).
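The meta-exon idea in the caption above can be illustrated with a minimal sketch. It assumes one possible reading of "flattening": overlapping exons from all transcripts of a gene are merged into non-redundant intervals. The function name `flatten_exons`, the closed-interval coordinates, and the example exons are hypothetical and are not taken from the TwinsUK pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed coordinates (an assumption)


def flatten_exons(exons: List[Interval]) -> List[Interval]:
    """Merge overlapping exon intervals into non-redundant meta-exons."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous meta-exon: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap: start a new meta-exon.
            merged.append((start, end))
    return merged


# Hypothetical exons pooled from two transcripts of the same gene.
meta_exons = flatten_exons([(150, 250), (100, 200), (300, 400)])
```

Other flattening schemes split exons at every annotated boundary instead of merging them; the caption does not say which variant was used, so this merge is only an illustration.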
Figure 1. Heatmap of candidate-causal eQTLs and eGenes detected across the four expression-quantification types. Relative association P-values are shown. If a candidate-causal association (marked *) is identified in at least one quantification type, then the P-value is shown for all quantifications (no * means the association is not candidate-causal). Rows are ordered by decreasing cumulative significance. To normalize across quantification types, the relative significance of each association per column was calculated as −log2(P/Pmax), where Pmax is the most significant association per quantification type. Data used for the heatmap are found in Supplementary Materials, Tables S2, S3, S4, S8 for microarray, gene-level, exon-level, and splice-junction level eQTL analysis respectively.
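The per-column normalization described in the Figure 1 caption can be sketched as follows. This is a minimal illustration that follows the caption literally: Pmax is taken to be the smallest (most significant) P-value in a quantification type, and each value is −log2(P/Pmax). The function name `relative_significance` and the example P-values are hypothetical.

```python
import math
from typing import List


def relative_significance(p_values: List[float]) -> List[float]:
    """Return -log2(P / Pmax) for each P-value in one quantification type.

    Pmax is interpreted as the most significant (smallest) P-value in the
    column, which is an assumption based on the caption's wording.
    """
    p_max = min(p_values)
    return [-math.log2(p / p_max) for p in p_values]


# Hypothetical P-values for one column of the heatmap.
scores = relative_significance([2**-20, 2**-10, 2**-5])
```

Under this reading, the most significant association in each column scores 0 and weaker associations score below 0, so columns from different quantification types share a common reference point.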
Figure 2. Gene-level and exon-level analysis implicate NADSYN1 as a candidate-causal eGene. (A) eQTL analysis of rs3794060 reveals the risk variant [C] leads to down-regulation at the gene-level of NADSYN1. (B) Exon-level quantification leads to inference of gene-level effect being driven by expression disruption of two meta-exons of NADSYN1 (meta-exon 11 and meta-exon 12). Association P-values of rs3794060 against exon quantifications are plotted with reference to the specific exon in the collapsed-gene model of NADSYN1 (all annotated transcripts combined).
Figure 3. Non-coding candidate-causal eGenes detected by exon-level RNA-Seq. Three panels denote the eQTLs and corresponding non-coding eGenes identified from eQTL analysis against exon-level quantifications. The top panels display the signal from the GWAS association plotted as −log10(P), with the exon-level eQTL P-values for the effects showing colocalisation with the GWAS signal. The bottom panels show RNA-Seq expression from ENCODE (GM12878). (A) rs2431697 is a candidate-causal eQTL for MIR146A. (B) rs2736340 is a candidate-causal eQTL for RP11-148O21.4 and RP11-148O21.2. (C) rs3794060 is a candidate-causal eQTL for RP11-660L16.2.
Figure 4. Novel eGene IKZF2 and potential causal mechanism using splice-junction quantification. asQTL analysis of rs3768792 against splice-junction quantifications identifies IKZF2 as a candidate-causal eGene with risk variant [G] causing upregulation of the exon 6A–exon 6B junction that is unique to truncated isoform ENST00000413091. (A) GWAS association signal across the IKZF2 locus (chr2q34), tagged by rs3768792 localised in the 3′-UTR of IKZF2. asQTL association signal of rs3768792 against splice-junction quantification of exon 6A–exon 6B shows significance and colocalisation with the GWAS signal. (B) The exon 6A–exon 6B junction is unique to truncated isoform ENST00000413091. Exon 6B harbours a premature stop-codon and therefore is not translated into the full-length protein that contains the dimerization domains in exon 8. (C) Close-up of the exon 6A–exon 6B junction and association (P = 3.80 × 10^-5) with GWAS SNP rs3768792. A potential causal asQTL in near-perfect LD was identified that is located within the polypyrimidine tract of the junction and may induce splicing (rs2291241, P = 1.70 × 10^-5).
Figure 5. Identification of splicing mechanism in WDFY4. (A) Our SLE GWAS indicates WDFY4 as the candidate gene at the chr10q11.23 locus tagged by intronic variant rs2663052, as well as the missense coding variant rs7097397 in exon 31 that is in strong LD. Cis-eQTL analysis showed rs2663052 is correlated with upregulation of the exon 34A–34B junction of WDFY4 (signal is colocalised with GWAS) that is unique to the short isoform (ENST00000374161). This isoform lacks the two enzymatic WD40 domains of the full-length isoform (ENST00000325239). (B) Two potential functional mechanisms may occur when harbouring the risk haplotype that carries both risk alleles. Firstly, an Arg to Gln amino-acid substitution by rs7097397 in exon 31 that is shared by both the canonical and short isoforms of WDFY4, and secondly an upregulation of the short isoform (P = 3.31 × 10^-19) that lacks functional domains, caused by rs2663052 or correlated variants, with corresponding down-regulation of the full-length isoform (P = 3.01 × 10^-6).
Summary of novel candidate-causal eQTLs and eGenes
| eQTL | eGene | Gene Function Summary | GTEx Tissue Expression | GWAS Catalog Traits |
|---|---|---|---|---|
| rs17167273 | | Transcriptional activator involved in T-cell lymphocyte differentiation. Necessary for the survival of CD4(+) CD8(+) immature thymocytes. | LCLs, Spleen, Whole Blood | Multiple sclerosis |
| rs3768792 | IKZF2 | This gene encodes a member of the Ikaros family of zinc-finger proteins. Three members of this protein family (Ikaros, Aiolos and Helios) are hematopoietic-specific transcription factors involved in the regulation of lymphocyte development. | LCLs, Whole Blood | Eosinophil counts |
| rs10774625 | | E3 ubiquitin-protein ligase which accepts ubiquitin from an E2 ubiquitin-conjugating enzyme in the form of a thioester and then directly transfers the ubiquitin to targeted substrates. | Cerebellum, Cerebellar Hemisphere, Thyroid | Metabolite levels, HDL cholesterol, Esophageal cancer |
| rs3794060 | RP11-660L16.2 | Known antisense RNA. | – | – |
| rs3794060 | NADSYN1 | Nicotinamide adenine dinucleotide (NAD) is a coenzyme in metabolic redox reactions, a precursor for several cell signaling molecules, and a substrate for protein posttranslational modifications. | Spleen, Colon, Terminal Ileum | Vitamin D insufficiency |
| rs2431697 | MIR146A | microRNA 146a. | LCLs | – |
| rs3024505 | | Functions as a receptor for the Fc fragment of IgA and IgM. Binds IgA and IgM with high affinity and mediates their endocytosis. May function in the immune response to microbes mediated by IgA and IgM. | Kidney, Liver, Terminal Ileum | – |
| rs3024505 | | Inhibits the synthesis of a number of cytokines, including IFN-gamma, IL-2, IL-3, TNF and GM-CSF produced by activated macrophages and by helper T-cells. | LCLs, Spleen, Whole Blood | Inflammatory bowel disease, Ulcerative Colitis, Crohn's disease |
| rs3024505 | | This gene encodes a member of the IL10 family of cytokines. Overexpression of this gene leads to elevated expression of several GADD family genes, which correlates with the induction of apoptosis. | Spleen, LCLs, Whole Blood | Inflammatory bowel disease, Alzheimer's disease |
| rs2476601 | | 5'-3' exonuclease that plays a central role in telomere maintenance and protection during S-phase. Participates in the protection of telomeres against non-homologous end-joining (NHEJ)-mediated repair. | LCLs, Fibroblasts, Cerebellar Hemisphere | Rheumatoid arthritis, Type 1 diabetes autoantibodies |
| rs2476601 | | Cooperates with PTEN to modulate the kinase activity of AKT1. Its interaction with PTPRB and tyrosine phosphorylated proteins suggests that it may link receptor tyrosine phosphatase with its substrates at the plasma membrane. | Thyroid, Cerebellar Hemisphere, Lung | Rheumatoid arthritis, Type 1 diabetes autoantibodies |
| rs2663052 | WDFY4 | WDFY family member 4. | LCLs, Spleen, Whole Blood | Rheumatoid arthritis, Stroke |
| rs2736340 | | Known antisense RNA. | Spleen, LCLs, Terminal Ileum | – |
| rs2736340 | | Known antisense RNA. | Spleen, LCLs, Terminal Ileum | – |
| rs2396545 | | Has calcium-dependent phospholipid scramblase activity; scrambles phosphatidylserine, phosphatidylcholine and galactosylceramide. | Terminal Ileum, Colon, Skin | – |
| rs2289583 | | Accepts ubiquitin from the E1 complex and catalyzes its covalent attachment to other proteins. In vitro catalyzes Lys-48-linked polyubiquitination. | Colon, Esophagus, Bladder | Chronic kidney disease, Urate levels |
Candidate-causal eGenes detected by RNA-Seq that have not been documented in previous microarray analyses in LCLs and other primary immune-cell types. Gene Function Summary is taken from a combination of Entrez Gene and UniProt annotation. GTEx tissue expression reports the top three tissue types where the gene is most expressed. The top three traits from GWASs where the gene is reported as the candidate gene are also given.
Found with microarray as well, but RNA-Seq allows for detection of novel alternative-splicing mechanism.