Barbara Höllbacher1,2,3, Kinga Balázs1, Matthias Heinig2,3, N Henriette Uhlenhaut1,4.
Abstract
Advances in next-generation sequencing generate ever more data, and the challenge is often how to combine and reconcile results from different omics layers such as the genome, epigenome and transcriptome. Here we provide an overview of the standard processing pipelines for ChIP-seq and RNA-seq as well as common downstream analyses. We describe popular multi-omics data integration approaches used to identify target genes and co-factors, and we discuss how machine learning techniques may predict transcriptional regulators and gene expression.
Keywords: ChIP-seq; Data integration; Multi-omics; NGS; RNA-seq; Transcriptional regulation
Year: 2020 PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Schematic representation of the transcriptional machinery. Cis-regulatory elements (enhancers or promoters), trans-regulatory elements (transcription factors) as well as epigenetic modifications and 3D chromatin structure are known to influence gene expression. TAD: Topologically associated domain.
Fig. 2. Systematic literature search on publications combining gene expression and DNA binding data. (A) Number of PubMed IDs associated with RNA-seq and ChIP-seq data submissions (retrieved on 01/22/2020). (B) The 20 most frequently cited references among publications in the intersection of the Venn diagram shown in A. PMID: PubMed ID.
Fig. 3. Standard processing workflow of ChIP-seq and RNA-seq. In both cases, the quality of the sequenced reads is checked before alignment. The ChIP-seq analysis continues with peak calling, followed by differential binding analysis. Motif search in the peak regions and peak annotation are crucial steps. For RNA-seq, the aligned reads are quantified at the gene level, and the raw counts are then filtered and normalized to enable further comparisons. The differential expression analysis yields a list of significant genes, from which biological meaning may be derived. QC: quality control, DE: differential expression.
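The RNA-seq filtering and normalization step named in this caption can be illustrated with a minimal sketch. It assumes counts-per-million (CPM) scaling and a simple total-count filter; the caption does not specify either, and the function name `filter_and_cpm` and the threshold `min_total` are hypothetical choices made for illustration only.

```python
def filter_and_cpm(counts, min_total=10):
    """Filter low-count genes, then scale raw counts to counts per million.

    counts: dict mapping gene ID -> list of raw read counts, one per sample.
    min_total: hypothetical threshold; genes whose summed count across all
               samples falls below it are dropped before normalization.
    """
    # Filtering: keep only genes with enough reads overall.
    kept = {g: c for g, c in counts.items() if sum(c) >= min_total}
    if not kept:
        return {}

    n_samples = len(next(iter(kept.values())))

    # Library size per sample, computed on the filtered genes (an assumption;
    # some workflows use the unfiltered totals instead).
    lib_sizes = [sum(c[i] for c in kept.values()) for i in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("a sample has zero reads after filtering")

    # Normalization: rescale each sample so its counts sum to one million.
    return {
        g: [1e6 * c[i] / lib_sizes[i] for i in range(n_samples)]
        for g, c in kept.items()
    }


if __name__ == "__main__":
    raw = {"geneA": [10, 20], "geneB": [0, 1], "geneC": [90, 180]}
    print(filter_and_cpm(raw))
```

CPM corrects only for sequencing depth; dedicated differential expression tools apply their own, more robust normalization, so this sketch is meant to show the idea rather than replace them.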
Fig. 4. Data integration approaches. (A) ChIP-seq and RNA-seq data can be integrated in a discretized fashion by determining the overlap of significantly affected genes in the two assays. (B) Newer approaches combine ChIP-seq data from multiple TFs and HMs with expression data and accessibility data such as DNase-seq and ATAC-seq. They achieve data integration through various mathematical concepts such as GLMs, HMMs and deep neural networks to identify co-regulators, predict gene expression or model TF binding. DE: differential expression, TF: transcription factor. This figure was created with BioRender (biorender.com).
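The discretized integration in panel A amounts to a set overlap, which can be sketched as follows. This is a minimal illustration under stated assumptions: the input shapes (a gene-to-adjusted-p-value mapping and a list of peak-to-gene annotations), the function name `discretized_overlap`, and the 0.05 cutoff are all hypothetical and not taken from the article.

```python
def discretized_overlap(de_padj, peak_annotations, alpha=0.05):
    """Overlap differentially expressed genes with genes bound in ChIP-seq.

    de_padj: dict mapping gene ID -> adjusted p-value from a DE analysis.
    peak_annotations: iterable of (peak_id, gene_id) pairs, e.g. each peak
                      assigned to its nearest gene (assumed input format).
    alpha: hypothetical significance cutoff for calling a gene DE.
    """
    # Discretize expression: a gene is either significant or not.
    de_genes = {g for g, p in de_padj.items() if p < alpha}

    # Discretize binding: a gene is bound if at least one peak maps to it.
    bound_genes = {gene for _peak, gene in peak_annotations}

    return {
        "bound_and_de": de_genes & bound_genes,  # candidate direct targets
        "de_only": de_genes - bound_genes,       # possibly indirect effects
        "bound_only": bound_genes - de_genes,    # binding without DE
    }


if __name__ == "__main__":
    de = {"Per1": 0.001, "Fkbp5": 0.01, "Actb": 0.8}
    peaks = [("peak1", "Per1"), ("peak2", "Per1"), ("peak3", "Gapdh")]
    for category, genes in discretized_overlap(de, peaks).items():
        print(category, sorted(genes))
```

The three sets correspond to the regions of a two-circle Venn diagram. Thresholding discards effect sizes and peak strengths, which is one limitation the model-based approaches in panel B aim to address.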