Jing Zhang1,2, Donghoon Lee1,2, Vineet Dhiman3,4, Peng Jiang5,6, Jie Xu7,8,9, Patrick McGillivray1,2, Hongbo Yang7,8, Jason Liu1,2, William Meyerson1,2, Declan Clarke1,2, Mengting Gu1,2, Shantao Li1,2, Shaoke Lou1,2, Jinrui Xu1,2, Lucas Lochovsky1,2, Matthew Ung10, Lijia Ma3,4,11, Shan Yu3,4, Qin Cao12, Arif Harmanci13, Koon-Kiu Yan1,2, Anurag Sethi1,2, Gamze Gürsoy1,2, Michael Rutenberg Schoenberg1,2, Joel Rozowsky1,2, Jonathan Warrell1,2, Prashant Emani1,2, Yucheng T Yang1,2, Timur Galeev1,2, Xiangmeng Kong1,2, Shuang Liu1,2, Xiaotong Li1,2, Jayanth Krishnan1,2, Yanlin Feng1,2, Juan Carlos Rivera-Mulia14,15, Jessica Adrian16, James R Broach9, Michael Bolt3,4, Jennifer Moran3,4, Dominic Fitzgerald3,4, Vishnu Dileep14, Tingting Liu7,8, Shenglin Mei17, Takayo Sasaki14, Claudia Trevilla-Garcia14,15, Su Wang17, Yanli Wang9, Chongzhi Zang18, Daifeng Wang19,20, Robert J Klein21, Michael Snyder16, David M Gilbert14, Kevin Yip12, Chao Cheng10,22, Feng Yue23,24,25, X Shirley Liu26, Kevin P White27,28,29, Mark Gerstein30,31,32,33.
Abstract
ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.
Year: 2020 PMID: 32728046 PMCID: PMC7391744 DOI: 10.1038/s41467-020-14743-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of the ENCODEC resource.
Table columns list cell types and rows list assays. Blue table boundary: cell types with assays in the ENCODE Encyclopedia, highlighting the breadth of the resource. The large number of cell types allows both comparative analyses across cell types and cell-type-specific analyses. Green table boundary: cell-type-specific analyses based on deep annotations of cell lines; integrating these assays allows high-resolution investigation of genomic biology. Inset: we use annotations from cell-type-specific ENCODE assays to build extended gene definitions, i.e., coding and non-coding elements linked according to their interactions and associated functions (top). We relate transcription factors (TFs) and RNA-binding proteins (RBPs) in a joint network hierarchy that describes their regulatory potential (middle). By comparing regulatory networks in tumor and normal ENCODE samples, we build rewiring networks that may reflect regulatory changes occurring during the normal-to-tumor transition (bottom).
Fig. 2 Regulatory network hierarchies.
a TFs and b RBPs are systematically organized into a hierarchy, together forming a joint TF-RBP regulatory network. Elements in higher layers tend to regulate those in lower layers. c The regulatory potential of each TF/RBP to drive tumor-to-normal expression changes is shown as a heatmap; red and blue indicate up- and down-regulation, respectively. d Elevated MYC regulatory activity is associated with reduced disease-specific survival (DSS) in breast cancer (i); MYC knockdown in MCF-7 leads to a significantly larger expression reduction in MYC target genes (ii). e The expression of MYC correlates more positively with that of its target genes than the expression of other TFs does with their targets (top); MYC frequently forms feed-forward loops (FFLs) with NRF1, most of which are coherent FFLs in which OR-gate logic predominates (bottom). f Elevated SUB1 regulatory activity is associated with reduced overall survival (OS) in lung cancer (i); SUB1 knockdown in HepG2 leads to reduced target gene expression (ii); targets of SUB1 show a slower mRNA decay rate (iii); for cancer-associated target genes of MYC and SUB1, expression is lower after combined MYC and SUB1 knockdown (KD) than after knockdown of either MYC or SUB1 alone, or in the control (iv).
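The caption does not state how regulators are assigned to layers. One common way to do this, used here purely as an illustrative assumption rather than as the paper's method, is a hierarchy height h = (O - I)/(O + I), where O is the number of other regulators a node regulates and I is the number of regulators that regulate it; nodes with high h sit near the top. The sketch below computes h from regulator-to-regulator edges and bins nodes into three layers. The function names, the ±1/3 cutoffs, and the toy edges are all hypothetical.

```python
from collections import defaultdict


def hierarchy_heights(edges):
    """Hierarchy height h = (O - I) / (O + I) for each regulator.

    edges: iterable of (regulator, regulated_regulator) pairs.
    O is a node's out-degree (regulators it controls) and I its in-degree
    (regulators controlling it). Self-loops are ignored.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    nodes = set()
    for src, dst in set(edges):  # deduplicate repeated edges
        if src == dst:
            continue
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    # Every node in `nodes` has at least one edge, so O + I > 0.
    return {n: (out_deg[n] - in_deg[n]) / (out_deg[n] + in_deg[n]) for n in nodes}


def assign_layers(heights, upper=1 / 3, lower=-1 / 3):
    """Bin regulators into top/middle/bottom layers (hypothetical cutoffs)."""
    layers = {}
    for node, h in heights.items():
        if h > upper:
            layers[node] = "top"
        elif h < lower:
            layers[node] = "bottom"
        else:
            layers[node] = "middle"
    return layers


# Toy edges among three placeholder regulators (not from the paper's network).
toy_edges = [("REG_A", "REG_B"), ("REG_A", "REG_C"), ("REG_B", "REG_C")]
heights = hierarchy_heights(toy_edges)
print(sorted(assign_layers(heights).items()))
# [('REG_A', 'top'), ('REG_B', 'middle'), ('REG_C', 'bottom')]
```

A pure in/out-degree score is only one option; it ignores edge weights and indirect paths, which a real hierarchy analysis may need to account for.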
Fig. 3 TF-gene network rewiring.
Green and red arrows designate edge gain and loss, respectively. a Cell-type-specific network built from K562 and GM12878: top-layer TFs significantly drive tumor-normal differential expression, whereas bottom-layer TFs are more often associated with burdened binding sites. b JUND is a top edge-gainer in chronic myeloid leukemia (CML), and its targets show increased expression. However, few of its binding sites are affected by structural variants (SVs) or single-nucleotide variants (SNVs). c Rewiring index in CML, computed from direct edge counts in both the proximal and distal networks (top) and from gene community analysis (bottom). Comparisons with TF-gene rewiring networks in other cancers are also shown.
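Edge gain and loss can be read directly off two edge sets. As a minimal sketch, assuming each network is a set of (TF, target gene) pairs, and defining a TF's rewiring index as the fraction of its targets (pooled across the normal and tumor networks) that appear in only one of them, which may differ from the paper's exact definition, the counting looks like this. All TF and gene names below are placeholders.

```python
from collections import Counter


def edge_changes(normal_edges, tumor_edges):
    """Return (gained, lost) edge sets going from the normal to the tumor network."""
    normal, tumor = set(normal_edges), set(tumor_edges)
    return tumor - normal, normal - tumor


def rewiring_index(normal_edges, tumor_edges):
    """Per-TF fraction of its targets (pooled over both networks) that change.

    0.0 means identical target sets; 1.0 means no shared targets.
    """
    targets_normal, targets_tumor = {}, {}
    for tf, gene in set(normal_edges):
        targets_normal.setdefault(tf, set()).add(gene)
    for tf, gene in set(tumor_edges):
        targets_tumor.setdefault(tf, set()).add(gene)
    index = {}
    for tf in targets_normal.keys() | targets_tumor.keys():
        a = targets_normal.get(tf, set())
        b = targets_tumor.get(tf, set())
        index[tf] = len(a ^ b) / len(a | b)  # symmetric difference over union
    return index


def top_edge_gainers(normal_edges, tumor_edges, k=1):
    """TFs ranked by their number of gained edges."""
    gained, _ = edge_changes(normal_edges, tumor_edges)
    return Counter(tf for tf, _gene in gained).most_common(k)


# Hypothetical toy networks; TF and gene names are placeholders.
normal = [("TF_A", "G1"), ("TF_A", "G2"), ("TF_B", "G3")]
tumor = [("TF_A", "G1"), ("TF_A", "G4"), ("TF_A", "G5"), ("TF_B", "G3")]
print(rewiring_index(normal, tumor))
print(top_edge_gainers(normal, tumor))  # [('TF_A', 2)]
```

Here TF_A keeps one target, loses one, and gains two, so three of its four pooled targets changed, giving an index of 0.75; TF_B is unchanged.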
Fig. 4 Oncogenic transformation and cell state.
We project the expression profiles (left; poly-A long RNA-seq), proximal networks (second from right; CTCF ChIP-seq), and distal networks (right; candidate cis-regulatory elements) of the ENCODE cell types into a lower-dimensional space. Stem-like cell types form a cluster, suggesting that their profiles are distinct from those of both normal and cancerous cell types. Further, cancerous cell types tend to lie closer to the stem-like cluster. Oncogene knockdown in K562 makes its transcriptome more similar to that of a normal cell type, whereas tumor suppressor gene (TSG) knockdown makes it more similar to that of a tumor cell type (second from left, top; compared with GM12878). In general, oncogene knockdown leads to a slight reversion toward the normal state along the stem-like component (second from left, bottom).
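The caption does not name the projection method. The sketch below uses principal component analysis (PCA) as an assumed stand-in, computed by power iteration so that it needs only the standard library, and then measures each sample's distance to the centroid of the stem-like samples in the projected space. All sample names and values are hypothetical toy data.

```python
import math
import random


def pca_project(X, k=2, iters=500, seed=0):
    """Project the rows of X (samples x features) onto the top-k principal components.

    Power iteration with deflation on the sample covariance matrix; adequate
    for small toy matrices, not for genome-scale data. Assumes at least k
    directions with nonzero variance.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along this starting vector
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


# Hypothetical toy profiles: keys are cell types, values are feature vectors.
profiles = {
    "stem_A": [10.0, 0.0, 0.0],
    "stem_B": [10.0, 1.0, 0.0],
    "tumor_X": [7.0, 1.0, 0.0],
    "normal_Y": [0.0, 1.0, 0.0],
}
names = list(profiles)
coords = dict(zip(names, pca_project([profiles[n] for n in names], k=2)))
stem_center = centroid([coords["stem_A"], coords["stem_B"]])
for name in ("tumor_X", "normal_Y"):
    print(name, round(math.dist(coords[name], stem_center), 2))
```

With these toy values, tumor_X lands much closer to the stem-like centroid than normal_Y does, mirroring the qualitative trend in the caption. For real expression matrices, a library routine such as scikit-learn's `PCA` would be the practical choice, and other projections (e.g., nonlinear embeddings) may have been used in the original analysis.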
Fig. 5 Extended genes and mutation burden analysis.
a Mutation status in extended genes explains expression differences for more genes than other annotations, such as coding sequences (CDS) alone. b A 130-kbp deletion in the breast cancer cell line T47D potentially links a distal enhancer to the ERBB4 promoter, leading to ERBB4 activation. Because this change does not affect any coding sequence, it highlights the value of an extended gene annotation. c Cancer-associated GWAS SNVs show greater enrichment when proximal and distal annotations are included in extended gene definitions. d In K562, but not in GM12878, somatic structural variant breakpoints tend to be associated with the activating histone mark H4K20me1.
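Panels a and b rest on the idea that a gene counts as mutated if a variant hits any of its linked elements, not just its CDS. The sketch below illustrates that bookkeeping with simple interval overlap; the `ExtendedGene` container, its fields, and the coordinates are hypothetical and are not the ENCODEC data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based, half-open
Variant = Tuple[str, int]        # (chromosome, 0-based position)


@dataclass
class ExtendedGene:
    """Hypothetical container: a gene's CDS plus linked proximal and distal elements."""
    name: str
    cds: List[Interval] = field(default_factory=list)
    proximal: List[Interval] = field(default_factory=list)  # e.g. promoter regions
    distal: List[Interval] = field(default_factory=list)    # e.g. linked enhancers

    def regions(self, extended: bool = True) -> List[Interval]:
        return self.cds + (self.proximal + self.distal if extended else [])


def overlaps(variant: Variant, intervals: List[Interval]) -> bool:
    chrom, pos = variant
    return any(c == chrom and start <= pos < end for c, start, end in intervals)


def mutated_genes(genes, variants, extended=True):
    """Names of genes with at least one variant in the chosen annotation."""
    return {g.name for g in genes
            if any(overlaps(v, g.regions(extended)) for v in variants)}


# Toy coordinates, not real loci.
genes = [ExtendedGene("GENE_A", cds=[("chr1", 1000, 2000)],
                      proximal=[("chr1", 500, 1000)],
                      distal=[("chr1", 50000, 51000)])]
variants = [("chr1", 50500)]  # falls only in the distal element
print(mutated_genes(genes, variants, extended=False))  # set()
print(mutated_genes(genes, variants, extended=True))   # {'GENE_A'}
```

A linear scan is fine for illustration; genome-wide overlap would normally use a sorted or interval-tree index.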
Fig. 6 Variant prioritization and validation.
a A stepwise scheme for prioritizing genomic regulators, elements, and variants using the ENCODEC resource. For each step, we indicate the prioritization criteria and the applicable validation assay. b Small-scale validation of prioritized variants using a luciferase reporter assay. Candidate region 5 showed the most significant differential expression and was selected for follow-up analysis. c Multiscale integrative analysis of candidate region 5 with assorted functional genomics data. The affected region is shown in the context of large-scale Hi-C linkages (top), element-level signal tracks of histone modification marks and DNase hypersensitivity together with various TF binding events (middle), and nucleotide-level disruption of the FOSL2 motif (bottom).
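Panel c reports nucleotide-level disruption of the FOSL2 motif. A standard way to quantify such disruption, offered here as an assumed illustration rather than the paper's pipeline, is to score the reference and alternative alleles against a position weight matrix (PWM) and take the difference between their best log-odds scores. The consensus, pseudocount, uniform background, and sequences below are hypothetical; a real analysis would use a curated FOSL2 matrix from a motif database.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores vs a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_site_score(seq, pwm):
    """Highest PWM score over all windows on both strands (seq must be >= motif length)."""
    width = len(pwm)
    return max(sum(pwm[j][s[i + j]] for j in range(width))
               for s in (seq, reverse_complement(seq))
               for i in range(len(s) - width + 1))


def motif_disruption(ref_seq, alt_seq, pwm):
    """Score change caused by the variant; negative values mean a weaker site."""
    return best_site_score(alt_seq, pwm) - best_site_score(ref_seq, pwm)


# Toy motif built from an illustrative consensus, not the real FOSL2 matrix.
consensus = "TGACTCA"
counts = [{b: (10 if b == c else 0) for b in BASES} for c in consensus]
pwm = log_odds_pwm(counts)
ref = "GGTGACTCAGG"
alt = "GGCGACTCAGG"  # hypothetical T>C substitution at the first motif position
print(round(motif_disruption(ref, alt, pwm), 2))  # -4.39
```

Scanning both strands matters because TF sites can occur in either orientation; taking the best window per allele means the score reflects the strongest site the variant leaves behind.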