Chuhui Wang1, Xueqing Zong1, Fanjie Wu1, Ricky Wai Tak Leung1,2, Yaohua Hu3, Jing Qin1.
Abstract
DNA- and RNA-binding proteins (DRBPs) typically have multiple functions: they bind both DNA and RNA and regulate gene expression at more than one level. They control post-transcriptional processes such as splicing, polyadenylation, transport, translation, and degradation of RNA transcripts in eukaryotic organisms, and they also act as regulators at the transcriptional level. Although DRBPs are reported to play critical roles in various developmental processes and diseases, it is still unclear how they interact with DNA and RNA simultaneously and regulate genes at both the transcriptional and post-transcriptional levels. To investigate the functional mechanisms of DRBPs, we collected data from a variety of databases and the literature and identified 118 DRBPs that function as both transcription factors (TFs) and splicing factors (SFs), which we call DRBP-SFs. We investigated in detail four DRBP-SFs that are highly expressed in chronic myeloid leukemia (CML): heterogeneous nuclear ribonucleoprotein K (HNRNPK), heterogeneous nuclear ribonucleoprotein L (HNRNPL), non-POU domain-containing octamer-binding protein (NONO), and TAR DNA-binding protein 43 (TARDBP). By integrating and analyzing ChIP-seq, CLIP-seq, RNA-seq, and shRNA-seq data from K562 cells using Binding and Expression Target Analysis (BETA) and Statistical Utility for RBP Functions (SURF), we discovered a two-layer regulatory network system centered on these four DRBP-SFs and proposed three possible regulatory models in which DRBP-SFs connect transcriptional and alternative splicing regulatory networks cooperatively in CML. The exploration of the identified DRBP-SFs provides new ideas for studying DRBPs and regulatory networks, and holds promise for further mechanistic discoveries about the two-layer gene regulatory system that may play critical roles in the occurrence and development of CML.
Keywords: DNA- and RNA-binding protein; alternative splicing regulatory network; chronic myeloid leukemia; splicing factor; transcription factor; transcriptional regulatory network
Year: 2022 PMID: 36052164 PMCID: PMC9425088 DOI: 10.3389/fmolb.2022.920492
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Overview of this study. We collected DRBPs from high-throughput data, dedicated databases, and annotation data. The intersection of the DRBP and splicing factor lists was then defined as the set of DRBP-SFs. To investigate the functions of DRBP-SFs, we carried out differential expression analysis, functional enrichment analysis, and network construction for HNRNPK, HNRNPL, NONO, and TARDBP in chronic myeloid leukemia. Finally, we verified the models by text mining.
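The intersection step in this overview can be sketched with plain set operations. This is a minimal illustration, not the authors' pipeline: the function name `find_drbp_sfs` and the three toy input lists are assumptions, and only the four proteins named in the abstract are real list members (the `*_ONLY_1` entries are placeholders).

```python
def find_drbp_sfs(dbps, rbps, splicing_factors):
    """Return proteins that bind both DNA and RNA and are also splicing factors."""
    # A DRBP is a protein present in both the DNA-binding and RNA-binding lists.
    drbps = set(dbps) & set(rbps)
    # A DRBP-SF is a DRBP that also appears in the splicing factor list.
    return sorted(drbps & set(splicing_factors))


# Hypothetical toy lists; the *_ONLY_1 names are placeholders, not real proteins.
dbps = {"HNRNPK", "HNRNPL", "NONO", "TARDBP", "DBP_ONLY_1"}
rbps = {"HNRNPK", "HNRNPL", "NONO", "TARDBP", "RBP_ONLY_1"}
splicing_factors = {"HNRNPK", "HNRNPL", "NONO", "TARDBP", "SF_ONLY_1"}

drbp_sfs = find_drbp_sfs(dbps, rbps, splicing_factors)
print(drbp_sfs)
```

In practice each input would be the union of the protein lists gathered from the databases in the table of data sources; how those lists are weighted by confidence is not shown here.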
Data sources for DBPs and RBPs.
| Data source | Data type | Database | Features | Confidence of proteins only appearing in this database |
|---|---|---|---|---|
| High-throughput data | ChIP-seq | ReMap2022 | ReMap is a large-scale integrative analysis of DNA-binding experiments for | High |
| | | ChIPBase | ChIPBase is an integrated resource and platform for decoding transcription factor binding maps, expression profiles, and transcriptional regulation of long noncoding RNAs (lncRNAs, lincRNAs), microRNAs, other ncRNAs (snoRNAs, tRNAs, snRNAs, etc.), and protein-coding genes from ChIP-seq data. | High |
| | | JASPAR | JASPAR is an open-access database of curated, nonredundant transcription factor (TF) binding profiles stored as position frequency matrices and TF flexible models for TFs across multiple species in six taxonomic groups. | High |
| | | ENCODE | The ENCODE Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute. | High |
| | CLIP-seq | ENCODE | The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, as well as regulatory elements that control cells and circumstances in which a gene is active. | High |
| | | StarBase | StarBase is designed for decoding Pan-Cancer and Interaction Networks of lncRNAs, miRNAs, competing endogenous RNAs (ceRNAs), RNA-binding proteins (RBPs), and mRNAs from large-scale CLIP-seq (HITS-CLIP, PAR-CLIP, iCLIP, CLASH) data and tumor samples. | High |
| | | CLIPdb | CLIPdb is a CLIP-seq database for protein–RNA interactions that aims to characterize the regulatory networks between RBPs and various RNA transcript classes by integrating large amounts of CLIP-seq (including HITS-CLIP, PAR-CLIP, and iCLIP as variations) datasets. | High |
| Dedicated database data | DBP | POSTAR | POSTAR is one of the largest and first integrative resources and platforms incorporating various post-transcriptional regulation events. It enables experimental biologists to connect protein–RNA interactions with multilayer information on post-transcriptional regulation and functional genes and helps them generate novel hypotheses about the post-transcriptional regulatory mechanisms of phenotypes and diseases. | High |
| | | The Human Transcription Factors | The “HumanTFs” website displays the 1,639 known or likely human TFs, with a separate page for each TF, along with all known motifs and information and sequence alignments for each DNA-binding domain (DBD) type. | High |
| | | CIS-BP | CIS-BP is an online library of transcription factors and their DNA-binding motifs. | High |
| | | AnimalTFDB | The Animal Transcription Factor DataBase (AnimalTFDB) is a resource aimed at providing the most comprehensive and accurate information for animal TFs and cofactors. | Medium |
| | | CISBP-RNA | CISBP-RNA is an online library of RNA-binding proteins and their motifs. | High |
| | RBP | RBPbase | RBPbase is a database that integrates high-throughput RBP detection studies. | High |
| | | RNA-binding proteins database (RBPDB) | RBPDB is a collection of experimental observations of RNA-binding sites. | High |
| | | ATtRACT | ATtRACT compiles information on 370 RBPs and 1,583 RBP consensus binding motifs, 192 of which are not present in any other database. | High |
| | | EuRBPDB | EuRBPDB is a comprehensive and user-friendly database for eukaryotic RBPs. It contains 315,222 RBPs (forming 6,368 ortholog groups) from 162 eukaryotic species, including human, mouse, fly, worm, and yeast. | Medium |
| Annotation data | DBP and RBP | QuickGO | QuickGO is a web-based tool that allows easy browsing of the Gene Ontology (GO) and all associated electronic and manual GO annotations provided by the GO Consortium annotation groups. | Medium |
| | | UniProt | The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality, and freely accessible set of protein sequences annotated with functional information. | Low |
FIGURE 2. Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. (A) The top 20 GO terms in biological process, cellular components, and molecular functions. (B) The top 29 KEGG pathways.
FIGURE 3. Two-layer regulatory network of heterogeneous nuclear ribonucleoprotein K (HNRNPK). Pink indicates the target genes of the transcriptional regulatory network of HNRNPK, blue indicates the target genes of the splicing regulatory network of HNRNPK, and purple indicates the co-regulated targets of HNRNPK. For more detailed information on genes in the network, please refer to Supplementary Table S7.
FIGURE 4. Hypothetical two-layer regulatory network models of genes. (A) DNA- and RNA-binding protein splicing factors (DRBP-SFs) may regulate the same genes at the transcriptional and splicing levels, acting as transcription factors (TFs) and splicing factors (SFs), respectively. (B) A DRBP-SF may act as an SF to regulate the splicing of a gene together with another SF whose transcription is controlled by this DRBP-SF. (C) A DRBP-SF may act as a TF to regulate the transcription of a gene together with another TF whose splicing is controlled by this DRBP-SF.
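The three models can be read as simple queries over two edge sets, one transcriptional and one for splicing. The sketch below is an illustrative assumption rather than the authors' method: the function `classify_models`, the dictionary layout, and the toy edges are hypothetical. The toy edges reuse the HNRNPK, NCBP2, and SAFB examples named in the captions of Figures 7 and 8, with the long gene names there written as their standard symbols (BCOR, CRKL, DNMT3B, EZH2, FGFR3); `PLACEHOLDER_GENE` is invented only to exercise model I.

```python
def classify_models(drbp_sf, tf_edges, sf_edges):
    """Find targets of `drbp_sf` that fit each of the three two-layer models.

    tf_edges / sf_edges map a regulator to the set of genes it regulates
    at the transcriptional / splicing level.
    """
    tf_targets = tf_edges.get(drbp_sf, set())
    sf_targets = sf_edges.get(drbp_sf, set())

    # Model I (panel A): the same gene is regulated at both levels.
    model_1 = tf_targets & sf_targets

    # Model II (panel B): the DRBP-SF transcriptionally regulates another SF,
    # and both of them regulate the splicing of the same gene.
    model_2 = {}
    for partner in tf_targets:
        shared = sf_targets & sf_edges.get(partner, set())
        if shared:
            model_2[partner] = shared

    # Model III (panel C): the DRBP-SF regulates the splicing of another TF,
    # and both of them regulate the transcription of the same gene.
    model_3 = {}
    for partner in sf_targets:
        shared = tf_targets & tf_edges.get(partner, set())
        if shared:
            model_3[partner] = shared

    return model_1, model_2, model_3


# Toy edges (assumed for illustration; not the full networks).
splice_genes = {"ACTB", "AKT1", "ASXL1", "TCF3", "TFRC", "VEGFA"}
tx_genes = {"BCOR", "CRKL", "DNMT3B", "EZH2", "FGFR3"}
tf_edges = {
    "HNRNPK": {"NCBP2", "PLACEHOLDER_GENE"} | tx_genes,
    "SAFB": set(tx_genes),
}
sf_edges = {
    "HNRNPK": {"SAFB", "PLACEHOLDER_GENE"} | splice_genes,
    "NCBP2": set(splice_genes),
}

model_1, model_2, model_3 = classify_models("HNRNPK", tf_edges, sf_edges)
print(model_1, model_2, model_3)
```

Representing each layer as a regulator-to-targets mapping keeps every model down to one or two set intersections, which makes it easy to check a candidate gene by hand.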
Target gene number of two-layer regulatory networks associated with HNRNPK, HNRNPL, NONO, and TARDBP.
| Gene symbol | Target genes in transcriptional regulatory network | Target genes in splicing regulatory network | Events in splicing regulatory network | Target genes shared by the two networks |
|---|---|---|---|---|
| HNRNPK | 668 | 796 | 1,057 | 36 |
| HNRNPL | 587 | 775 | 1,035 | 30 |
| NONO | 304 | 760 | 1,072 | 37 |
| TARDBP | 639 | 2,197 | 3,863 | 132 |
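A few derived quantities make the table easier to compare across proteins. The sketch below copies the counts from the table and computes, for each protein, the number of distinct targets across both layers, the share of targets that appear in both networks, and the mean number of splicing events per splicing target. These derived values are not reported in the source; the function name `summarize` and the choice of statistics are assumptions, and the union assumes the last column counts genes present in both networks.

```python
# Counts copied from the table above:
# (transcriptional targets, splicing targets, splicing events, shared targets)
targets = {
    "HNRNPK": (668, 796, 1057, 36),
    "HNRNPL": (587, 775, 1035, 30),
    "NONO": (304, 760, 1072, 37),
    "TARDBP": (639, 2197, 3863, 132),
}


def summarize(tf_n, sf_n, events, shared):
    """Derive simple overlap statistics from one table row."""
    return {
        # Distinct genes across both layers (inclusion-exclusion).
        "union_targets": tf_n + sf_n - shared,
        # Share of transcriptional targets that are also splicing targets.
        "shared_pct_of_tf": round(100 * shared / tf_n, 1),
        # Share of splicing targets that are also transcriptional targets.
        "shared_pct_of_sf": round(100 * shared / sf_n, 1),
        # Mean number of splicing events per splicing target gene.
        "events_per_sf_gene": round(events / sf_n, 2),
    }


summary = {gene: summarize(*row) for gene, row in targets.items()}
for gene, stats in summary.items():
    print(gene, stats)
```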
FIGURE 5. Co-regulatory gene network and Venn analysis of the four DNA- and RNA-binding protein splicing factors (DRBP-SFs). (A) Two-layer co-regulated gene network diagram of the four proteins; red indicates a gene co-regulated by all four proteins, gray indicates the genes co-regulated by three proteins, purple indicates the genes co-regulated by two proteins, blue indicates the genes regulated by heterogeneous nuclear ribonucleoprotein K (HNRNPK), pink indicates the genes regulated by heterogeneous nuclear ribonucleoprotein L (HNRNPL), green indicates the genes regulated by non-POU domain–containing octamer–binding protein (NONO), and yellow indicates the genes regulated by TAR DNA-binding protein 43 (TARDBP). (B) Venn diagram of the co-regulation by the four proteins.
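The color classes in panel A correspond to how many of the four proteins regulate each gene. A minimal sketch of that grouping follows, assuming each protein's targets are available as a set; the function name `count_by_regulators` and the `GENE_*` names are hypothetical placeholders, not genes from the study.

```python
from collections import Counter


def count_by_regulators(target_sets):
    """Count regulators per gene, then tally how many genes fall in each class."""
    # Each protein contributes at most one count per gene.
    per_gene = Counter(
        gene for genes in target_sets.values() for gene in set(genes)
    )
    # Tally: number of regulators -> number of genes with that many regulators.
    return per_gene, Counter(per_gene.values())


# Placeholder target sets for illustration only.
target_sets = {
    "HNRNPK": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "HNRNPL": {"GENE_A", "GENE_B", "GENE_C", "GENE_E"},
    "NONO": {"GENE_A", "GENE_B", "GENE_F"},
    "TARDBP": {"GENE_A", "GENE_G"},
}

per_gene, tally = count_by_regulators(target_sets)
print(dict(tally))
```

With real target lists, the genes whose count is 4 would be the red node class, counts of 3 and 2 the gray and purple classes, and a count of 1 the protein-specific colors.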
FIGURE 6. Target genes of the two-layer regulatory networks associated with heterogeneous nuclear ribonucleoprotein K (HNRNPK), heterogeneous nuclear ribonucleoprotein L (HNRNPL), non-POU domain–containing octamer–binding protein (NONO), and TAR DNA-binding protein 43 (TARDBP). Principal component analysis was used to reduce the dimensionality of the expression datasets of Cells-Leukemia cell line (CML) and Whole Blood. (A) Target genes of the two-layer regulatory networks associated with HNRNPK, HNRNPL, NONO, and TARDBP in 3D. (B) Target genes of the two-layer regulatory networks associated with HNRNPK, HNRNPL, NONO, and TARDBP in 2D. (C) Target genes of the two-layer regulatory networks associated with HNRNPK. (D) Target genes of the two-layer regulatory networks associated with HNRNPL. (E) Target genes of the two-layer regulatory networks associated with NONO. (F) Target genes of the two-layer regulatory networks associated with TARDBP.
FIGURE 7. Network diagram of regulation model Ⅱ. (A) Heterogeneous nuclear ribonucleoprotein K (HNRNPK) regulates the transcription of nuclear cap–binding protein subunit 2 (NCBP2), and HNRNPK and NCBP2 then jointly regulate the splicing of genes. Green indicates the genes that act as splicing factors (SFs) and are regulated by HNRNPK at the transcriptional level, pink indicates the genes regulated by HNRNPK at the splicing level, purple indicates the genes co-regulated by HNRNPK and NCBP2 at the splicing level, and blue indicates the genes regulated by NCBP2 at the splicing level. Rectangles indicate the target genes of HNRNPK at the transcriptional level, and ellipses indicate the target genes of HNRNPK and NCBP2 at the splicing level. (B) The binding regions of HNRNPK and NCBP2 in the splicing target genes ACTB, AKT1, ASXL1, TCF3, TFRC, and VEGFA. The ChIP-seq data for HNRNPK and NCBP2 were loaded into the UCSC Genome Browser to visualize the positions of peaks in the genome. Purple and red indicate the protein binding sites, and blue indicates the location of genes in the genome. For more detailed information on genes in the network, please refer to Supplementary Table S8.
FIGURE 8. Network diagram of regulation model Ⅲ. (A) Heterogeneous nuclear ribonucleoprotein K (HNRNPK) regulates the splicing of scaffold attachment factor B1 (SAFB), and HNRNPK and SAFB then jointly regulate the transcription of downstream genes. Green indicates genes that act as transcription factors and are regulated by HNRNPK at the splicing level, pink indicates genes regulated by HNRNPK at the transcriptional level, blue indicates genes regulated by SAFB at the transcriptional level, and purple indicates genes co-regulated by HNRNPK and SAFB at the transcriptional level. Ellipses indicate the target genes of SAFB and HNRNPK at the transcriptional level, and rectangles indicate target genes of HNRNPK at the splicing level. (B) The binding regions of HNRNPK and SAFB in the transcriptional regulation target genes BCL6 corepressor, CRK-like proto-oncogene adaptor protein, DNA methyltransferase 3 beta, enhancer of zeste 2 polycomb repressive complex 2 subunit, and fibroblast growth factor receptor 3. The ChIP-seq data for HNRNPK and SAFB were loaded into the UCSC Genome Browser to visualize the positions of peaks in the genome. Purple and red indicate the protein binding sites, and blue indicates the location of genes in the genome. For more detailed information on genes in the network, please refer to Supplementary Table S9.