Siddharth Sethi, Ilya E Vorontsov, Ivan V Kulakovskiy, Simon Greenaway, John Williams, Vsevolod J Makeev, Steve D M Brown, Michelle M Simon, Ann-Marie Mallon.
Abstract
BACKGROUND: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype.
Keywords: Expression; Gene-phenotype prediction; Phenotypes; Protein-protein interactions; Super-enhancers; Tissue-specificity; Transcription factors; Typical-enhancers
Year: 2020 PMID: 33138777 PMCID: PMC7607678 DOI: 10.1186/s12864-020-07109-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overview of TSREs identified in 22 mouse tissues. a Strong enhancers, b Active promoters: Heatmaps showing chromatin state posterior probability of tissue-specific regulatory elements (Tau_reg ≥ 0.85) (left) and their corresponding DNase I signal (right) in every tissue. Each row is a genomic location and columns represent different mouse tissues and cell lines. Grey columns show tissues for which data was not available. The heatmaps have been sorted by the order of the tissues across the columns. (BAT: Brown Adipose Tissue; Bmarrow: Bone Marrow; BmarrowDm: Bone Marrow derived macrophage; CH12: B-cell lymphoma; Esb4: mouse embryonic stem cells; Es-E14: mouse embryonic stem cell line embryonic day 14.5; MEF: Mouse Embryonic Fibroblast; MEL: Leukaemia; Wbrain: Whole Brain). c Distribution of H3K27ac ChIP-seq signal over cerebellum-specific enhancers stitched together within 12.5 kb (n = 3741). Stitched cohesive units (x-axis) are ranked in increasing order of their input-normalised H3K27ac signal (reads per million, y-axis). This approach identified 237 SEs (highlighted in blue) and 3504 TEs in cerebellum. d-e Metagene profiles of mean H3K27ac ChIP-seq signal across all the SEs and TEs in cerebellum. The profiles are centred on the enhancer regions, and the 2 kb regions flanking each enhancer are shown. The length of the enhancer region is scaled to represent the median size of SEs (22,600 bp) and TEs (600 bp) in cerebellum. The shaded area shows the standard error of the mean (SEM). f Distribution of constituent enhancers within SEs and TEs across all 22 tissues. See also Additional file 1: Figures S2-S5
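The stitching and ranking described for panel c can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the caption states only the 12.5 kb stitching distance and the ranking by input-normalised signal. Summing constituent signal per stitched unit, and separating SEs from TEs at the point where the rescaled ranked curve has slope 1 (the convention used by tools such as ROSE), are assumptions made here, and the function names `stitch` and `split_super_typical` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int, float]  # (start, end, signal) on a single chromosome


def stitch(enhancers: List[Interval], max_gap: int = 12_500) -> List[Interval]:
    """Merge enhancers whose gap to the previous unit is <= max_gap.

    The signal of a stitched unit is the sum of its constituents
    (an assumption; the caption does not say how signal is aggregated).
    """
    stitched: List[Interval] = []
    for start, end, signal in sorted(enhancers):
        if stitched and start - stitched[-1][1] <= max_gap:
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def split_super_typical(signals: List[float]) -> Tuple[List[float], List[float]]:
    """Rank signals in increasing order and split them at the slope-1 point.

    With rank and signal both rescaled to [0, 1], the point where the curve
    has slope 1 is the point lying farthest below the diagonal. Units ranked
    above that point are returned as "super", the rest as "typical".
    The cutoff rule itself is an assumption, not taken from the caption.
    """
    ranked = sorted(signals)
    n = len(ranked)
    if n < 2 or ranked[-1] <= 0:
        return [], ranked
    top = ranked[-1]
    cut = min(range(n), key=lambda i: ranked[i] / top - i / (n - 1))
    return ranked[cut + 1:], ranked[:cut + 1]
```

Applied to a long-tailed signal distribution, the split returns a small group of high-signal units and a much larger group of low-signal units, which is the qualitative pattern the caption reports for cerebellum.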
Fig. 2 SEs promote high transcriptional activity and drive tissue-specific expression in mouse. a Box plot showing the total expression (in log-transformed RPKM) of different enhancer classes across 22 tissues. Each box plot shows the median, middle bar; interquartile range, the box; whiskers, 1.5 times the interquartile range. b Box plot showing the tissue-specific expression of different enhancer classes across 22 tissues. The p-values were calculated using the Wilcoxon Rank Sum Test. c Distribution of genes within tissue-specific expression categories (low, intermediate and high) in different enhancer classes. Y-axis for each tissue displays the density of genes scaled across the tissues, but not across the enhancer classes. d Contribution of each enhancer class (in percentage) towards the total number of enhancer associated genes in the genome, categorised by their tissue-specific expression. e A schematic to illustrate the calculation of distinct enhancer tissue-types for each enhancer-associated gene. The distinct tissue types of the various enhancers associated with the gene of interest are added up to compute the number of enhancer tissue-types for that gene. f Heatmaps showing the number of enhancer tissue-types in SEC and TEC. Each row is an enhancer associated gene and columns represent its association with enhancers across 22 tissues and cell types. g Box plot showing the correlation between the number of enhancer tissue-types and tissue-specific expression of SEC and TEC. The trend lines (green: SEs; orange: TEs) were calculated using linear regression. See also Additional file 1: Figures S7 and S8
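The count illustrated in panel e can be sketched as below, reading "number of enhancer tissue-types" as the number of distinct tissues in which a gene has at least one associated enhancer. That reading, the input format of (gene, tissue) pairs, and the function name `enhancer_tissue_types` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def enhancer_tissue_types(associations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct tissues with an associated enhancer for each gene.

    `associations` holds one (gene, tissue) pair per enhancer-gene link;
    several enhancers from the same tissue count once for that gene.
    """
    tissues_by_gene = defaultdict(set)
    for gene, tissue in associations:
        tissues_by_gene[gene].add(tissue)
    return {gene: len(tissues) for gene, tissues in tissues_by_gene.items()}
```

Under this reading, a gene linked to two cerebellum enhancers and one cortex enhancer has two enhancer tissue-types.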
Fig. 3 Mammalian phenotype and human disease ontology terms enriched in SEC and TEC. Listed are the most enriched mammalian phenotypes and human diseases among SEC and TEC in each tissue. The cells in the heatmap display the FDR (q-value) associated with the enriched terms, calculated using the Benjamini-Hochberg method. The enrichment analysis was performed using ToppGene, which retrieves mouse phenotype annotations from MGD and human disease annotations from ClinVar, DisGeNET, GWAS and OMIM
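For readers unfamiliar with the Benjamini-Hochberg adjustment named in this caption, a minimal sketch of the standard step-up procedure follows. It illustrates the general method only; ToppGene's own implementation is not shown in the source, and the function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg q-values in the same order as the input.

    For the p-value with ascending rank r out of m tests, the raw adjusted
    value is p * m / r. Walking from the largest p-value down and keeping a
    running minimum makes the q-values monotone in p.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

A term is then reported as enriched when its q-value falls below the chosen FDR threshold.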
Fig. 4 Phenotype severity of SE and TE associated gene knockouts. Violin plots showing the percentage change (normalised effect size) in phenotype procedures measured between enhancer associated gene knockouts and wild-type controls. The area under the violin is proportional to the number of data points in each category. The p-values were calculated using the Wilcoxon Rank Sum Test. All phenotyping procedures show no significant difference in phenotype severity between SECs and TECs apart from Acoustic Startle and Pre-pulse Inhibition. See also Additional file 1: Figures S11 and S12
Fig. 5 Enhancer associated genes are connected in a dense interactome. The networks display protein-protein interaction maps of enhancer associated genes. Nodes in each network represent enhancer associated genes and edges represent potential protein-protein interactions. Genes associated with tissue-type relevant phenotypes are highlighted in pink and the shape of the node displays SE and TE associated genes (squares: SEC, circles: TEC). See also Additional file 1: Figures S13 and S14
Fig. 6 Master regulators enriched in SE and TE constituent enhancers. Heatmap showing the top 3 enriched TFs identified within SEs and TEs in each tissue. The motifs associated with the enriched TFs are shown on the right. NA is shown for TFs with motifs not present in HOCOMOCO v11. The rows of the heatmap are clustered using hierarchical clustering. See also Additional file 1: Figure S15
Fig. 7 Predicting gene-phenotype associations in mouse. a Summary of the various gene features (grouped according to their data sources) used to train the random forest classifier to predict gene-phenotype associations. b Bar plot comparing the predictive power of different random forest classifiers across various phenotypes. Error bars denote standard deviation. The classifier trained on all gene features performs best for the majority of the phenotype domains. c Receiver operating characteristic (ROC) curves comparing the performance of 10 random forest classifier models applied to predict genes associated with nervous system phenotype. d Feature importance chart of the best performing model (Exp + PPI + TSRE + TSRE_PPI + TF) showing the top 20 predictor variables important in nervous system phenotype predictions, as measured by the mean decrease in accuracy (x-axis). The PPI feature was identified as the most important in predicting genes associated with nervous system phenotype, followed by expression in whole brain and cortex. Exp: expression; Enh: enhancer; Prom: promoter; TF: transcription factor. See also Additional file 1: Figures S16 and S17
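The ROC comparison in panel c is usually summarised by the area under the curve (AUC). The sketch below computes AUC from classifier scores through its pairwise interpretation: the probability that a randomly chosen positive gene scores higher than a randomly chosen negative gene, with ties counted as one half. It is an illustration under assumed inputs (binary labels and one score per gene), not the authors' evaluation code, and the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve from binary labels (1/0) and scores.

    Equivalent to the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties contribute 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a single number for comparing classifiers trained on different feature sets.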