Xiaoying Wang, Cankun Wang, Lang Li, Qin Ma, Anjun Ma, Bingqiang Liu.
Abstract
Cis-regulatory motif (motif for short) identification and analysis are essential steps in detecting gene regulatory mechanisms. Deep learning (DL) models have shown substantial advances in motif prediction. In parallel, intuitive and integrative web databases are needed to make effective use of DL models and ensure easy access to the identified motifs. Here, we present DESSO-DB, a web database developed to allow efficient access to the identified motifs and diverse motif analyses. DESSO-DB provides motif prediction results and visualizations for 690 ENCODE human Chromatin Immunoprecipitation sequencing (ChIP-seq) datasets (covering 161 transcription factors (TFs) in 91 cell lines) and 1,677 human ChIP-seq datasets (covering 547 TFs in 359 cell lines) from Cistrome DB, all generated with DESSO, an in-house DL tool for motif prediction. It also provides online motif finding and scanning functions for new ChIP-seq/ATAC-seq datasets, as well as downloadable motif results for the above 690 ENCODE datasets, 126 cancer ChIP-seq datasets, and 55 RNA Crosslinking-Immunoprecipitation and high-throughput sequencing (CLIP-seq) datasets. DESSO-DB is deployed on the Google Cloud Platform, providing stable and efficient resources freely to the public. DESSO-DB is free and available at http://cloud.osubmi.com/DESSO/.
Year: 2022 PMID: 35782725 PMCID: PMC9233226 DOI: 10.1016/j.csbj.2022.06.031
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Overview of the webserver. (A) The data table for 2,367 ChIP-seq datasets with 410 cell lines and 576 TFs. Each entry indicates the number of datasets derived from a specific cell line and a specific TF. The MAF BZIP Transcription Factor F (MAFF) dataset in the HepG2 cell line is highlighted as an example. (B) The MAFF sequence and shape motifs identified using DESSO, with links comparing their position weight matrix (PWM) to the existing motif databases. (C) Snapshot of the MAFF sequence profile (i.e., motif instance information). (D) Line chart of the per-nucleotide vertebrate motif conservation and the ±50 bp flanking regions within the HepG2 cell line. (E) Line chart of the corresponding mean motif value and the ±50 bp flanking regions within the HepG2 cell line. (F) The occurrence of the MAFF motif in the corresponding ChIP-seq peaks, which are ranked by peak signal, and an enrichment score of the motif in its corresponding ChIP-seq peaks. (G) The UCSC Genome Browser track hub displaying genome-wide predicted binding sites for each binding profile in DESSO-DB.
Fig. 2. Single-cell CUT&RUN analysis. (A) An upset plot of the shared and unique motifs among CTCF, NANOG, and SOX2 datasets. (B) Motifs found in each cell; identified motifs are shown in blue. (C) Co-enrichment of SOX2 and NANOG motifs in the upstream regulatory regions of genes Pou5f1 and Pgk1. (D) The regulatory network of SOX2 and NANOG inferred from motif co-enrichment. The pink nodes represent the genes shared by NANOG and SOX2, which have a higher potential to be co-regulated by SOX2 and NANOG.
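As an illustration of the motif scanning and position weight matrix (PWM) concepts mentioned in the abstract and in Fig. 1B, the following is a minimal sketch of how a sequence can be scanned with a log-odds PWM. It is not DESSO's method or DESSO-DB's implementation: the count matrix, the pseudocount, the uniform background, the threshold, and the function names `counts_to_log_odds` and `scan` are all hypothetical and chosen only for this example.

```python
import math

ALPHABET = "ACGT"

# Hypothetical count matrix for a 4-bp motif (not taken from DESSO-DB).
# Rows are motif positions; columns are counts of A, C, G, T.
COUNTS = [
    [8, 1, 0, 1],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 0, 1, 9],
]


def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert raw counts to a log2-odds PWM against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in ALPHABET for base in window):
            continue
        score = sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = counts_to_log_odds(COUNTS)
hits = scan("TTACGTAACGTN", pwm, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

In this toy input, only the two `ACGT` windows exceed the assumed threshold. A real scan would typically also cover the reverse strand and use a background model and threshold fitted to the data.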