Marco Cavalli, Nicholas Baltzer, Husen M Umer, Jan Grau, Ioana Lemnian, Gang Pan, Ola Wallerman, Rapolas Spalinskas, Pelin Sahlén, Ivo Grosse, Jan Komorowski, Claes Wadelius.
Abstract
Several genome-wide association studies (GWAS) have reported variants associated with immune diseases. However, the identified variants are rarely the drivers of the associations, and the molecular mechanisms behind the genetic contributions remain poorly understood. ChIP-seq data for transcription factors (TFs) and histone modifications provide snapshots of protein-DNA interactions, allowing the identification of heterozygous SNPs that show significant allele-specific signals (AS-SNPs). AS-SNPs can change a TF binding site, resulting in altered gene regulation, and are primary candidates to explain associations observed in GWAS and expression studies. We identified 17,293 unique AS-SNPs across 7 lymphoblastoid cell lines. In this set of cell lines we interrogated 85% of common genetic variants in the population for potential regulatory effects, and we identified 237 AS-SNPs associated with immune GWAS traits and 714 associated with gene expression in B cells. To elucidate possible regulatory mechanisms, we integrated long-range 3D interaction data to identify putative target genes and motif predictions to identify TFs whose binding may be affected by AS-SNPs, yielding a collection of 173 AS-SNPs associated with gene expression and 60 associated with B cell related traits. We present a systems strategy to find functional gene regulatory variants, the TFs that bind differentially between alleles, and novel strategies to detect the regulated genes.
Year: 2019 PMID: 30804403 PMCID: PMC6389883 DOI: 10.1038/s41598-019-39633-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. AS-SNP identification in LCLs. A total of 17,293 unique AS-SNPs were identified by running our pipeline on ChIP-seq data in 7 individual lymphoblastoid cell lines or across the cell lines (7LCLs). The collection of AS-SNPs was intersected with GWAS SNPs associated with B cell related traits and with SNPs in LD with them, yielding a subset of 237 AS-SNPs associated with B cell related traits. The same approach was applied using eQTL SNPs defined in B cells and SNPs in LD with them, yielding a subset of 714 AS-SNPs associated with gene expression in B cells. Of these, 58 AS-SNPs were associated with both gene expression and B cell related traits. These subsets of AS-SNPs showing evidence of biological relevance were further intersected with HiC (contact and loop domains) and HiCap data, and overlapped with TFBS predicted using PMMs.
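The intersection step described in the Figure 1 caption can be sketched as plain set operations. This is a minimal illustration only, not the authors' pipeline: the function names, the `ld_proxies` mapping, and the SNP identifiers below are hypothetical placeholders, and the LD threshold used to define proxies is not specified here.

```python
def expand_with_ld(lead_snps, ld_proxies):
    """Return the lead SNPs plus every SNP listed as an LD proxy of a lead SNP."""
    expanded = set(lead_snps)
    for snp in lead_snps:
        expanded.update(ld_proxies.get(snp, ()))
    return expanded


def annotate_as_snps(as_snps, gwas_snps, eqtl_snps, ld_proxies):
    """Split AS-SNPs into trait-associated, expression-associated, and both."""
    as_set = set(as_snps)
    trait = as_set & expand_with_ld(gwas_snps, ld_proxies)
    expression = as_set & expand_with_ld(eqtl_snps, ld_proxies)
    return {"trait": trait, "expression": expression, "both": trait & expression}


# Toy input with placeholder identifiers (not real rsIDs).
as_snps = ["snpA", "snpB", "snpC", "snpD"]
gwas_snps = ["gwas1"]
eqtl_snps = ["eqtl1", "snpB"]
ld_proxies = {"gwas1": ["snpA", "snpB"], "eqtl1": ["snpC"]}

result = annotate_as_snps(as_snps, gwas_snps, eqtl_snps, ld_proxies)
print(sorted(result["trait"]), sorted(result["expression"]), sorted(result["both"]))
```

In this toy input, `snpB` lands in both subsets because it is an LD proxy of a GWAS SNP and is itself an eQTL SNP, which mirrors the overlap category reported in the caption.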
Number of AS-SNPs identified per cell line.
| Cell line | Total heterozygous SNPs | Total AS-SNPs | Common AS-SNPs (%) | Rare AS-SNPs (%) | Population |
|---|---|---|---|---|---|
| GM12878 | 2221835 | 5318 | 5150 (96.8%) | 168 (3.2%) | Caucasian |
| GM18505 | 2976970 | 2132 | 1936 (90.8%) | 196 (9.2%) | Yoruban |
| GM18526 | 2091055 | 1961 | 1883 (96.0%) | 78 (4.0%) | Han Chinese |
| GM18951 | 2092387 | 1224 | 1195 (97.6%) | 29 (2.4%) | Japanese |
| GM19099 | 2978401 | 1904 | 1735 (91.1%) | 169 (8.9%) | Yoruban |
| GM19238 | 2964106 | 5580 | 5112 (91.6%) | 468 (8.4%) | Yoruban |
| GM10847 | 2560997 | 1457 | 1401 (96.1%) | 56 (3.9%) | Caucasian |
| 7LCLs | | 13025 | 11892 (91.3%) | 1133 (8.7%) | |
| Tot. unique | | 17293 | 16094 (93%) | 1199 (7%) | |
Figure 2. Definition of AS-SNPs at a population level. ChIP-seq reads were aligned to personalized genomes for each cell line and summed across cell lines at heterozygous positions. The resulting allele-specific read counts were tested for statistical significance to identify AS-SNPs defined at a population level. Case 1: common AS-SNPs already observed in an individual cell line maintain the statistically significant difference in read counts between the alleles at the population level as well. Case 2: common SNPs that do not show a statistically significant difference in read counts between the two alleles in a single cell line may surface as AS-SNPs when looking for AS binding in the population.
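Case 2 can be illustrated with a small pooled-count test. The caption does not name the statistical test, so the sketch below assumes an exact two-sided binomial test against an expected allelic ratio of 0.5 and a significance threshold of 0.05; both choices, the function names, and the read counts are assumptions for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k (suitable for modest n)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def pooled_allele_test(counts_per_line, alpha=0.05):
    """Sum (ref, alt) read counts across cell lines and test the pooled counts."""
    ref = sum(r for r, _ in counts_per_line)
    alt = sum(a for _, a in counts_per_line)
    p_value = binomial_two_sided_p(ref, ref + alt)
    return ref, alt, p_value, p_value < alpha


# Hypothetical counts: the same modest imbalance in each of 7 cell lines.
counts = [(8, 4)] * 7
single_p = binomial_two_sided_p(8, 12)
ref, alt, pooled_p, significant = pooled_allele_test(counts)
print(f"single line p = {single_p:.3f}; pooled {ref}:{alt} p = {pooled_p:.4f}")
```

With these made-up counts, a single line with 8 versus 4 reads is not significant on its own, while the pooled 56 versus 28 reads falls below the assumed threshold, which is the behavior the caption describes for Case 2.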
Figure 3. UCSC Genome Browser view of the 26 AS-SNPs (green dots) in high LD with the GWAS SNP rs9272346 (yellow pin) associated with Type 1 Diabetes. Histone modification tracks for H3K4me1, H3K4me3 and H3K27ac were retrieved from the ENCODE project for the B cell line GM12878 (scaled using vertical viewing range settings), as was the RepeatMasker track, which reports DNA sequences with interspersed repeats (e.g. SINE, LINE, etc.) and low complexity DNA sequences. The bottom track shows transcription factor binding sites from ENCODE ChIP-seq data for GM12878 (G) and other B cell lines (g). The inset at the top zooms into the first intron of HLA-DQA1.
Figure 4. Schematic representation of using 3D interaction data to prioritize candidate target genes. (A) AS-SNP (red pin) in a regulatory enhancer harbored in a TAD loop defined by HiC data. The genomic architecture reduces the pool of possible target genes to the genes enclosed in the TAD (genes 1–4). HiCap analyses narrow the list of putative target genes even further by evaluating specific probe interactions, pointing in this example to gene 2 as the likely target of the enhancer element. (B) The presence of an AS-SNP in a TAD interaction domain may lead to an altered assembly of the TAD formation protein complex (e.g. CTCF, cohesin, etc., represented in green), resulting in a different TAD architecture. In this example the disruption of TAD1 extends the list of putative target genes to genes 1–5.
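The two-step narrowing in panel (A) can be sketched as interval lookups: first keep the genes that share a TAD with the AS-SNP, then keep the genes whose promoter is linked to a HiCap probe containing the AS-SNP. The data layout, class, function names, and coordinates below are hypothetical and chosen only to mirror the schematic; they do not reflect the file formats or tools used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # inclusive
    end: int    # exclusive

    def contains(self, chrom, pos):
        return self.chrom == chrom and self.start <= pos < self.end

    def overlaps(self, other):
        return (self.chrom == other.chrom
                and self.start < other.end and other.start < self.end)


def prioritize_targets(chrom, pos, tads, genes, hicap_links):
    """Return (genes sharing a TAD with the SNP, genes linked by HiCap to the SNP)."""
    snp_tads = [t for t in tads if t.contains(chrom, pos)]
    tad_genes = sorted(name for name, iv in genes.items()
                       if any(iv.overlaps(t) for t in snp_tads))
    hicap_genes = sorted({gene for probe, gene in hicap_links
                          if probe.contains(chrom, pos)})
    return tad_genes, hicap_genes


# Toy layout mirroring panel (A): genes 1-4 inside the TAD, gene 5 outside.
tads = [Interval("chr1", 1_000, 9_000)]
genes = {
    "gene1": Interval("chr1", 1_200, 1_800),
    "gene2": Interval("chr1", 2_500, 3_200),
    "gene3": Interval("chr1", 6_000, 6_700),
    "gene4": Interval("chr1", 7_500, 8_400),
    "gene5": Interval("chr1", 9_500, 9_900),
}
hicap_links = [(Interval("chr1", 4_800, 5_200), "gene2")]

tad_genes, hicap_genes = prioritize_targets("chr1", 5_000, tads, genes, hicap_links)
print(tad_genes, hicap_genes)
```

Returning both lists keeps the coarser TAD-level candidates available when no HiCap probe covers the variant, which is one possible design choice rather than a description of the published workflow.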
Figure 5. Multi-layered evidence for a candidate functional AS-SNP in LD with an eQTL SNP. AS-SNP rs724016 is in LD with an eQTL SNP (rs1344672) associated with the expression of ZBTB38. The SNPs are located in a genomic region with a multi-loop TAD architecture defined by HiC data (purple interactions) that narrows the possible target genes to two candidates: ZBTB38 and RASA2. HiCap data (green interaction) show that the probe harboring rs724016 interacts with the promoter of ZBTB38, confirming the association observed in the eQTL study. The distal probe region (~5 kb) is highlighted in gray. Histone modification tracks for H3K4me1, H3K4me3 and H3K27ac were retrieved from the ENCODE project for the B cell line GM12878 (scaled using vertical viewing range settings), as were the DNaseI hypersensitive clusters (DHS). The PMM tracks show the networks of intra-motif dependencies at each position using PMMs with defined parameters. The sequence logo for the TF binding motif of BATF has been computed from putative BSs predicted by PMMs.
Figure 6. Multi-layered evidence for a candidate functional AS-SNP in LD with two eQTL SNPs. AS-SNP rs1257573 is in LD with two eQTL SNPs (rs2182909 and rs568515) associated with the expression of CR2. The SNPs are located in a genomic region with a multi-loop TAD architecture defined by HiC data (purple interactions) that narrows the possible target genes to a few candidates. HiCap data (green interaction) showed an interaction between the promoter of CD55 and the regulatory element harboring the AS-SNP rs1257573. The distal probe region (~5 kb) is highlighted in gray. Histone modification tracks for H3K4me1, H3K4me3 and H3K27ac were retrieved from the ENCODE project for the B cell line GM12878 (scaled using vertical viewing range settings), as were the DNaseI hypersensitive clusters (DHS). The PMM tracks show the networks of intra-motif dependencies at each position using PMMs with defined parameters. The sequence logo for the TF binding motif of PRD14 has been computed from putative BSs predicted by PMMs.