John B Harley, Xiaoting Chen, Mario Pujato, Daniel Miller, Avery Maddox, Carmy Forney, Albert F Magnusen, Arthur Lynch, Kashish Chetal, Masashi Yukawa, Artem Barski, Nathan Salomonis, Kenneth M Kaufman, Leah C Kottyan, Matthew T Weirauch.
Abstract
Explaining the genetics of many diseases is challenging because most associations localize to incompletely characterized regulatory regions. Using new computational methods, we show that transcription factors (TFs) occupy multiple loci associated with individual complex genetic disorders. Application to 213 phenotypes and 1,544 TF binding datasets identified 2,264 relationships between hundreds of TFs and 94 phenotypes, including androgen receptor in prostate cancer and GATA3 in breast cancer. Strikingly, nearly half of systemic lupus erythematosus risk loci are occupied by the Epstein-Barr virus EBNA2 protein and many coclustering human TFs, showing gene-environment interaction. Similar EBNA2-anchored associations exist in multiple sclerosis, rheumatoid arthritis, inflammatory bowel disease, type 1 diabetes, juvenile idiopathic arthritis and celiac disease. Instances of allele-dependent DNA binding and downstream effects on gene expression at plausibly causal variants support genetic mechanisms dependent on EBNA2. Our results nominate mechanisms that operate across risk loci within disease phenotypes, suggesting new models for disease origins.
Year: 2018 PMID: 29662164 PMCID: PMC6022759 DOI: 10.1038/s41588-018-0102-3
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Intersection between autoimmune risk loci and TF binding interactions with the genome
a. Results for SLE risk loci. X-axis displays SLE-associated loci. Y-axis displays the top 25 TFs, based on RELI P-values, sorted by the number of loci. A colored box indicates that the given locus contains at least one SLE-associated variant located within a ChIP-seq peak for the given TF. The most significant ChIP-seq dataset cell type is indicated in parentheses. TFs that participate in “EBNA2 super-enhancers”[25] are colored red. The red rectangle identifies those loci and TFs that optimally cluster together (see Online Methods). Bottom panel, left: comparison of EBV-infected B cell lines (grey bars) to EBV negative B cells (white bars). The Y-axis shows the distribution of the RELI -log(Pc) values for each of the eight TFs with available data. Bars indicate mean. Error bars indicate standard deviation. Numbers indicate number of datasets. Horizontal line indicates the Pc < 10^-6 RELI significance threshold. Bottom panel, right: The top 10 TFs (based on RELI Pc-values) with data available in at least one EBV-infected B cell line (grey bars) and at least one other cell type (white bars). b–g. Results for the other six EBNA2 disorders. Full results are available in Supplementary Data Set 5.
Figure 2. Properties of EBNA2-bound autoimmune disease loci
a. Schematic of the RELI algorithm. See Online Methods for details. b. TFs intersecting autoimmune risk loci occupied by EBNA2. RELI was re-executed using EBNA2 disorder variants intersecting EBNA2 ChIP-seq peaks as input. Top TFs are indicated. NFκB subunits are shown in red. Basal transcriptional machinery proteins are shown in blue. c. Most EBNA2-occupied loci are associated with only a single EBNA2 disorder. EBNA2-bound loci were categorized by the number of EBNA2 disorders with which the given locus is associated (X-axis). d. Functional properties of EBNA2 disorder EBNA2-occupied loci. Functional importance of EBNA2-occupied loci, assessed with four criteria. In each panel, variants are segregated into two categories – common variants (left bars) and common variants associated with at least one EBNA2 disorder (right bars). Each category is divided into three types of variants (see key). The Y-axis of each plot indicates the percent of variants in each group that are, for example, eQTLs in EBV-infected B cells (top left plot). Error bars indicate the standard deviation obtained from sampling (with replacement) of 50% of the variants. Values below indicate number of variants. Horizontal bars at the top indicate sampling-derived P-values based on Welch’s one-sided t-test.
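The error bars in panel d come from repeatedly sampling 50% of the variants with replacement. A minimal Python sketch of that kind of resampling follows. The function name, the number of rounds, and the seed are assumptions made for illustration; the legend does not state how many rounds were drawn, and this is not the authors' code.

```python
import random
import statistics


def resampled_percent_sd(flags, n_rounds=1000, seed=0):
    """Return (mean, SD) of the percent of True flags across resampled subsets.

    flags    : list of booleans, one per variant (e.g. "is an eQTL").
    n_rounds : number of resampling rounds (assumed; not given in the legend).
    seed     : fixed seed so the sketch is reproducible.
    """
    if not flags:
        raise ValueError("need at least one variant")
    if n_rounds < 2:
        raise ValueError("need at least two rounds to compute a standard deviation")
    rng = random.Random(seed)
    k = max(1, len(flags) // 2)  # 50% of the variants per round
    percents = []
    for _ in range(n_rounds):
        sample = rng.choices(flags, k=k)  # sampling with replacement
        percents.append(100.0 * sum(sample) / k)
    return statistics.mean(percents), statistics.stdev(percents)
```

Calling `resampled_percent_sd(flags)` on one variant group gives the bar height estimate and the error-bar width for that group under these assumptions.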
Intersection of TF ChIP-seq datasets with multiple genetic loci of diseases and phenotypes.
| Phenotype | Cell line | TF | Number | Fraction | RR | Pc & P |
|---|---|---|---|---|---|---|
| Prostate Ca | VCaP+Dht_18hr | AR | 17 | 0.33 | 3.70 | 2.60E-07 |
| Breast Ca | MCF7+Estradiol | GATA3 | 22 | 0.36 | 3.87 | 7.45E-11 |
| MS | Mutu | EBNA2 | 44 | 0.40 | 4.66 | 6.34E-30 |
| SSc | Mutu | EBNA2 | 2 | 0.10 | - | NS |
| SSc | IB4 | EBNA2 | 1 | 0.05 | - | NS |
| SSc | GM12878 | EBNA2 | 0 | 0.00 | - | NS |
| SLE | Mutu | EBNA2 | 26 | 0.49 | 5.96 | 1.09E-25 |
| SLE | IB4 | EBNA2 | 10 | 0.19 | 7.46 | 1.09E-11 |
| SLE | GM12878 | EBNA2 | 10 | 0.19 | 8.57 | 1.94E-13 |
| SLE | IB4 | EBNA-LP | 4 | 0.08 | - | NS |
| SLE | Mutu | EBNA3C | 5 | 0.09 | - | NS |
| SLE | Raji | EBNA1 | 0 | 0.00 | - | NS |
| SLE | Akata | Zta | 0 | 0.00 | - | NS |
| SLE | Mutu | EBNA2 | 25 | 0.63 | 2.85 | 1.81E-11 |
| SLE | IB4 | EBNA2 | 10 | 0.25 | 3.61 | 2.44E-06 |
| SLE | GM12878 | EBNA2 | 10 | 0.25 | 4.97 | 1.22E-09 |
Detailed results are presented in Supplementary Data Set 3.
RELI null model limited to EBV-infected B cell line open chromatin regions (see text).
RR = ‘relative risk’. Pc = RELI Bonferroni corrected P-value. NS = Pc > 10^-6. All disease ancestries are European. Ca = cancer. MS = multiple sclerosis. SSc = systemic sclerosis. SLE = systemic lupus erythematosus.
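The last column of the table can be read as a Bonferroni-corrected P-value compared against the 10^-6 cutoff. The sketch below shows that reading in Python. The helper names and the `n_tests` parameter are hypothetical; the text here does not say how many tests RELI corrects for, so that number is left to the caller.

```python
RELI_PC_THRESHOLD = 1e-6  # cutoff given in the table legend


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


def table_call(pc):
    """Format a corrected P-value as in the table: 'NS' above the cutoff."""
    return "NS" if pc > RELI_PC_THRESHOLD else f"{pc:.2E}"
```

For example, `table_call(2.60e-07)` yields the string shown for the prostate cancer row, while any corrected value above the cutoff is reported as `NS`.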
Figure 3. Allele-dependent binding of EBNA2 to autoimmune-associated genetic variants
a. Theoretical models presenting possible allele-dependent action of EBNA2. See text for discussion. b. Allele-dependent co-binding of EBNA2 with multiple proteins. ChIP-seq datasets from EBV-infected B cell lines were examined for evidence of allele-dependent binding at heterozygotes. Datasets are sorted by the proportion of EBNA2 GM12878 allele-dependent events (MARIO ARS value > 0.40, see Online Methods) that favor the same allele (X-axis). Values (N) indicate total number of variants. c. Allele-dependent binding of EBNA2 and human proteins at the CD44 locus. Top to bottom: chromosomal band (multi-colored bar), location of EBV-infected B cell line ChIP-seq peaks for various TFs, location of rs3794102 variant, allele-dependent binding events (green bars). X-axis indicates the preferred allele, along with a value indicating the strength of the allelic behavior, calculated as one minus the ratio of the weak to strong reads (e.g., 0.5 indicates the strong allele has approximately twice the reads of the weak allele). d. Allele and EBV-dependent expression of CD44. Allelic qPCR of CD44 expression in EBV-infected and EBV negative Ramos B cells (see key). Fold-change in expression is provided relative to the C allele. Error bars represent standard deviation (n=12: three independent experiments of technical quadruplicates). P-values were calculated using a two-way ANOVA with a Tukey post-hoc test. EBV status and variant genotype were used as the two factors.
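The strength value plotted in panel c is defined as one minus the ratio of weak-allele to strong-allele reads. A small Python sketch of that formula is given below. The function name is an assumption, and this value is distinct from the MARIO ARS score, whose calculation is described in the Online Methods and is not reproduced here.

```python
def allelic_strength(strong_reads, weak_reads):
    """Strength of allelic behavior: 1 - (weak reads / strong reads)."""
    if strong_reads <= 0:
        raise ValueError("strong allele must have at least one read")
    if weak_reads < 0 or weak_reads > strong_reads:
        raise ValueError("weak_reads must lie between 0 and strong_reads")
    return 1.0 - weak_reads / strong_reads
```

With 20 strong-allele reads and 10 weak-allele reads the function returns 0.5, matching the legend's example of a strong allele with twice the reads of the weak allele.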
Allele-dependent binding of EBNA2 to autoimmune-associated genetic variants.
| Gene(s) | rs ID | ARS | Reads (Str.) | Reads (Weak) | Str. Base | Disease(s) |
|---|---|---|---|---|---|---|
|  | rs9271693# | 0.66 | 27 | 3 | C | IBD, UC, Lung cancer |
|  | rs9271588# | 0.50 | 22 | 11 | C | SjS[ |
|  | rs996032# | 0.65 | 27 | 6 | A | SLE (AS) |
|  | rs2401138 | 0.63 | 48 | 20 | C | V |
|  | rs2382818# | 0.61 | 31 | 12 | A | IBD |
|  | rs7198004 | 0.59 | 16 | 0 | G | SLE |
|  | rs998592 | 0.50 | 10 | 0 | C | SLE |
|  | rs3794102# | 0.58 | 30 | 13 | G | V |
|  | rs1465697# | 0.57 | 57 | 29 | C | MS |
|  | rs2736335 | 0.53 | 19 | 8 | A | KD, KD (AS), SLE, SLE (AS), SLE (multi) |
|  | rs3129763 | 0.52 | 11 | 0 | A | CLL, SSc |
|  | rs947474 | 0.52 | 11 | 0 | A | T1D, RA[ |
|  | rs2233287 | 0.52 | 17 | 7 | G | SSc |
|  | rs13136820 | 0.52 | 141 | 86 | T | GD |
|  | rs73318382 | 0.50 | 10 | 0 | A | SLE, SLE (AS), SLE (multi) |
|  | rs34437200 | 0.49 | 10 | 2 | A | CelD, IBD, JIA, MS |
|  | rs194749# | 0.47 | 11 | 4 | C | IBD, T1D |
|  | rs532098# | 0.41 | 24 | 15 | G | SLE |
|  | rs674313 | 0.41 | 24 | 15 | G | CLL, SSc |
|  | rs1250567 | 0.41 | 8 | 3 | T | MS |
|  | rs1738074 | 0.40 | 47 | 32 | T | CelD |
All ChIP-seq results are from Mutu cells, except for the RMI2 locus, which is from GM12878 cells. Additional data are available in Supplementary Data Set 7. Each variant was assigned to a gene (column 1) as follows. If the variant is located within the promoter (+/− 5kb) of a gene expressed in EBV-infected B cells (median RPKM of 2 or more based on GTEx[49] data), assign it to that gene (indicated with ‘*’). Otherwise, if the variant is located within a Hi-C chromatin looping region in GM12878 EBV-infected B cells[50], assign it to the closest interacting gene that is expressed in EBV-infected B cells (indicated with ‘^^’). Otherwise, if the variant is located within a Hi-C chromatin looping region in primary B cells[51], assign it to the closest interacting gene that is expressed in EBV-infected B cells (indicated with ‘^’). Otherwise, assign the variant to the nearest gene that is expressed in EBV-infected B cells. Variants marked with ‘#’ are eQTLs for the indicated gene in at least one EBV-infected B cell dataset[49,52–59]. “ARS”: Allelic Reproducibility Score. “Reads (Strong (Str.))” and “Reads (Weak)” indicate the number of ChIP-seq reads mapping to the strong and weak allele, respectively. All disease associations are taken from the original disease lists (see Supplementary Data Set 1), with the exception of two additional associations (citations provided). GWAS results are of European ancestry, except as indicated (East Asian (AS)). Disease abbreviations: MS, multiple sclerosis; IBD, inflammatory bowel disease; UC, ulcerative colitis; SLE, systemic lupus erythematosus; CLL, chronic lymphocytic leukemia; SSc, systemic sclerosis; SjS, Sjögren’s syndrome; CelD, celiac disease; V, vitiligo; KD, Kawasaki’s disease; T1D, type 1 diabetes; GD, Graves’ disease; JIA, juvenile idiopathic arthritis.
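The gene-assignment rule in this note is a four-step cascade, which can be sketched in Python as below. The `VariantContext` fields and function names are hypothetical stand-ins for lookups against promoter annotations, Hi-C loops, and expression data; the sketch assumes each field already holds only a gene that passes the expression filter (median RPKM of 2 or more).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_MEDIAN_RPKM = 2.0  # expression cutoff stated in the table note


def is_expressed(median_rpkm: float) -> bool:
    """A gene counts as expressed at a median RPKM of 2 or more."""
    return median_rpkm >= MIN_MEDIAN_RPKM


@dataclass
class VariantContext:
    # Each field is a precomputed lookup result (hypothetical), or None.
    promoter_gene: Optional[str] = None           # promoter (+/- 5 kb) hit
    gm12878_loop_gene: Optional[str] = None       # closest gene via GM12878 Hi-C loop
    primary_b_loop_gene: Optional[str] = None     # closest gene via primary B cell Hi-C loop
    nearest_expressed_gene: Optional[str] = None  # fallback: nearest expressed gene


def assign_gene(ctx: VariantContext) -> Tuple[str, str]:
    """Return (gene, marker) using the first rule that applies, in order."""
    if ctx.promoter_gene:
        return ctx.promoter_gene, "*"
    if ctx.gm12878_loop_gene:
        return ctx.gm12878_loop_gene, "^^"
    if ctx.primary_b_loop_gene:
        return ctx.primary_b_loop_gene, "^"
    if ctx.nearest_expressed_gene:
        return ctx.nearest_expressed_gene, ""
    raise ValueError("no expressed gene available for this variant")
```

The order of the `if` statements carries the priority: a promoter hit always wins over a Hi-C loop, and the nearest-gene fallback applies only when no earlier rule matches.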
Figure 4. Cell types and TFs at disease-associated loci
a. SLE variants significantly intersect H3K27ac-marked regions in EBV-infected B cells. H3K27ac ChIP-seq peaks were collected from 175 different cell lines and types. The Y-axis indicates the negative log of the RELI P-value for the intersection of SLE-associated variants with H3K27ac peaks in each dataset. b. SLE variants intersect active chromatin regions in EBV-infected B cells. Same as (a), but instead using “active chromatin” regions, which are based on combinations of histone marks[44]. c. Global view of RELI results: all diseases against all TFs. Columns and rows show the 94 phenotypes/diseases and 212 TFs with at least one significant (Pc < 10^-6) RELI result. Color indicates negative log of the RELI P-value (see key). Disease abbreviations are provided in the main text. d. TFs intersecting breast cancer loci. Intersection between disease loci and TF-bound DNA sequences, as in Figure 1. Here, however, the cluster of TFs and risk loci may instead largely operate in ductal epithelial cells, independently of EBNA2. The top 20 TFs are shown; full results are provided in Supplementary Data Set 3.