Sachin Pundhir1, Panayiota Poirazi2, Jan Gorodkin1.
Abstract
Functional annotation of the genome is important for understanding the phenotypic complexity of various species. The road toward functional annotation involves several challenges, ranging from experiments on individual molecules to large-scale analysis of high-throughput sequencing (HTS) data. HTS data are typically the result of a protocol designed to address a specific research question. Sequencing produces reads which, when mapped to a reference genome, often form distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction, ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods that compare read profiles, from direct (non-sequence based) alignments to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory elements (CREs) such as enhancers and promoters. We also discuss the biological rationale behind their formation.
Keywords: ChIP-seq; RNA-seq; enhancer; non-coding RNA; patterns; read profile
Year: 2015 PMID: 26042150 PMCID: PMC4437211 DOI: 10.3389/fgene.2015.00188
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Typical read profiles obtained for a microRNA (hsa-mir-30e) and a transcription factor (P300) from small RNA-seq and ChIP-seq data, respectively. (A) Read profile for hsa-mir-30e observed in the K562 cell line using small RNA-seq data from the ENCODE project (Fejes-Toth et al., 2009). It consists of two read blocks corresponding to the two arms (passenger and mature) of the microRNA. (B) Read profile for the P300 transcription factor observed in the K562 cell line using ChIP-seq data from the ENCODE project (Euskirchen et al., 2007). It consists of a peak that signifies the position where P300 binds to the genome. Also shown is the peak-valley-peak read profile for a histone modification (H3K4me1) observed at P300 peaks (O'Geen et al., 2011). Both the P300 (peak) and H3K4me1 (peak-valley-peak) read profiles are enriched at enhancer regions (Merika et al., 1998; Heintzman et al., 2007), and are thus useful for annotating enhancers in the genome.
A brief summary of computational methods that use the concept of read profiles for the prediction of microRNA (miRNA), non-coding RNA (ncRNA) and cis-regulatory elements (CREs).
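| Application | Method | HTS data | Read profile characteristic | Computational technique |
|---|---|---|---|---|
| Micro-RNA prediction | miRDeep, miRDeep2, miRDeep* | Short RNA-seq | Two predominant clusters of reads corresponding to the mature and passenger miRNA strands | Bayesian statistics, along with stable hairpin loop secondary structure (Friedländer et al., |
| Micro-RNA prediction | miRanalyzer | Short RNA-seq | Two predominant clusters of reads corresponding to the mature and passenger miRNA strands | Random forest classifier, along with stable hairpin loop secondary structure (Hackenberg et al., |
| Micro-RNA prediction | miRdba | Short RNA-seq | Two predominant clusters of reads corresponding to the mature and passenger miRNA strands | Optimal alignment of candidate and known miRNA read profiles (Pundhir and Gorodkin, |
| Non-coding RNA classification | Langenberger et al. | Short RNA-seq | Varying number of read clusters separated by a specific number of nucleotides for the major ncRNA classes (miRNA, snoRNA and tRNA). The reads are often arranged with different degrees of precision (entropy) | Random forest classifier trained on different read profile features (length, expression and others) to classify miRNA, snoRNA and tRNA (Langenberger et al., |
| Non-coding RNA classification | Jung et al. | Short RNA-seq | Varying number of read clusters separated by a specific number of nucleotides for the major ncRNA classes (miRNA, snoRNA and tRNA). The reads are often arranged with different degrees of precision (entropy) | Length and expression depth of read profiles, followed by motif and sequence similarity analysis to predict snRNA and snoRNA (Jung et al., |
| Non-coding RNA classification | deepBlockAlign, ALPS | Short RNA-seq | Varying number of read clusters separated by a specific number of nucleotides for the major ncRNA classes (miRNA, snoRNA and tRNA). The reads are often arranged with different degrees of precision (entropy) | Optimal alignment between two read profiles to classify miRNA, snoRNA and tRNA (Erhard and Zimmer, |
| Non-coding RNA classification | BlockClust | Short RNA-seq | Varying number of read clusters separated by a specific number of nucleotides for the major ncRNA classes (miRNA, snoRNA and tRNA). The reads are often arranged with different degrees of precision (entropy) | Graph kernel trained on different read profile features such as minimum read length and entropy to classify miRNA, snoRNA and tRNA (Videm et al., |
| Cis-regulatory element (CRE) prediction | DFilter | TF ChIP-seq | Reads arranged in the form of a peak profile | Hotelling observer based on signal processing to detect regions enriched for peaks (Kumar et al., |
| Cis-regulatory element (CRE) prediction | Kaikkonen et al. | Histone ChIP-seq | Reads arranged in the form of a peak-valley-peak read profile | Sliding window approach to detect the peak-valley-peak read profile in order to measure the spatiotemporal activity of CREs (Kaikkonen et al., |
| Cis-regulatory element (CRE) prediction | CAGT | Histone ChIP-seq | Reads arranged in the form of a peak-valley-peak read profile | Pearson correlation coefficient between read profiles that are represented as vectors of signal values. Read profiles with high correlation are clustered together (Kundaje et al., |
| Detect novel ncRNA classes or known ncRNAs (potentially different) sharing similar processing | deepBlockAlign, ALPS | Short RNA-seq | Read profile characteristics (such as number of read clusters and length) shared by two or more transcripts | Optimal alignment between two read profiles (Erhard and Zimmer, |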
Also included are two methods that can detect novel ncRNA classes or known ncRNAs sharing similar processing based on the similarity in their corresponding read profiles.
Column 1: The application of the computational method.
Column 2: Name or the literature reference of the computational method.
Column 3: High-throughput sequencing data that is used by the method for analysis.
Column 4: Characteristic of read profiles that the method exploits.
Column 5: Brief description of the computational technique used behind the method.
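The correlation-based comparison listed in the table for CAGT can be illustrated with a minimal sketch. This is not the CAGT implementation: the function name `pearson` and the toy coverage vectors below are assumptions made purely for illustration. Each read profile is represented as a vector of per-position signal values, and the Pearson correlation coefficient then measures how similar two profiles are in shape, independent of their absolute read depth.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy profiles (per-position read coverage), not real data.
peak_a = [0, 2, 8, 20, 8, 2, 0]          # single peak
peak_b = [1, 5, 17, 41, 17, 5, 1]        # same shape at higher depth (2 * peak_a + 1)
peak_valley_peak = [10, 18, 6, 1, 6, 18, 10]

print(pearson(peak_a, peak_b))            # close to 1: same shape
print(pearson(peak_a, peak_valley_peak))  # negative: opposite shape
```

In this toy setting, profiles with a correlation near 1 would be grouped together, while the peak and peak-valley-peak shapes would fall into different groups.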
Figure 2. A typical read profile generated upon the processing of miRNA and upon random processing (degradation) of a non-miRNA transcript. (A) The primary miRNA transcript is precisely processed by the Drosha and Dicer enzymes, generating a ~22 nt duplex (passenger and mature strands) and a loop region. While the mature miRNA is protected from degradation by being incorporated into the miRNA-induced silencing complex (miRISC), both the passenger strand and the loop region are mostly degraded. Therefore, most reads obtained during short RNA-seq experiments correspond to the mature miRNA strand. When mapped to the reference genome, reads corresponding to miRNA and star miRNA align in a pattern (read profile) consisting of two major read clusters, within each of which the reads share almost the same 5′ end base position. (B) In contrast to the precise processing of a miRNA transcript, non-miRNA transcripts are processed at random base positions. This generates many RNA fragments of no fixed length. When sequenced during RNA-seq experiments and mapped to the reference genome, the resulting read profile consists of randomly placed reads with high variability in their 5′ end base positions.
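The contrast between precise and random processing can be quantified with the Shannon entropy of the 5′ end positions of the reads in a cluster. The following is a minimal sketch under stated assumptions: the function name `five_prime_entropy` and the toy start positions are hypothetical, and none of the methods cited here is claimed to compute entropy in exactly this way. Low entropy indicates that most reads start at the same base (precise processing); high entropy indicates scattered start positions (degradation-like).

```python
import math
from collections import Counter

def five_prime_entropy(starts):
    """Shannon entropy (in bits) of the distribution of read 5' end positions."""
    if not starts:
        raise ValueError("at least one read is required")
    counts = Counter(starts)
    total = len(starts)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy data: genomic start coordinates of 20 mapped reads each.
mirna_like = [100] * 18 + [101] * 2       # almost all reads share one 5' end
degradation_like = list(range(100, 120))  # every read starts at a different base

print(five_prime_entropy(mirna_like))        # below 1 bit
print(five_prime_entropy(degradation_like))  # log2(20), the maximum for 20 reads
```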
Figure 3. The transcription factor and histone modification landscape for inactive and active cis-regulatory elements (CRE; promoters and enhancers) and the corresponding read profiles. (A) When inactive, the DNA corresponding to the CRE is wrapped around histone proteins in the form of a basic structural unit termed the nucleosome. This prevents any interaction of transcription factors (TFs) with the DNA. (B,C) When active, a series of histone modifications (H3K4me1, H3K4me3, H3K27ac and others), alongside interaction with a specific TF (pioneer factor), make the overlapping nucleosomes at a CRE hypermobile. These nucleosomes are then displaced, leading to the formation of nucleosome-free regions (NFRs) that are subsequently bound by TFs. During TF ChIP-seq and histone ChIP-seq experiments, reads corresponding to TF-bound NFRs and to histone-modified regions flanking the NFRs are obtained, respectively. Upon mapping, this leads to a read profile with a peak shape for NFRs (TF ChIP-seq) and a peak-valley-peak shape for the regions flanking the NFRs (histone ChIP-seq). By analyzing the read intensity in these read profiles, we can determine the active CREs, the TFs bound at these CREs and also their level of activity. Since distinct sets of histone modifications are observed at enhancers (H3K27ac and H3K4me1) and promoters (H3K4me3 and H3K4me1), analyzing the peak-valley-peak histone read profile also helps to distinguish between enhancers and promoters.
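A sliding-window scan for the peak-valley-peak shape described above might look like the following minimal sketch. It is an illustration only and does not reproduce the method of Kaikkonen et al.; the function name, the window layout (`flank`, `valley`) and the `min_ratio` threshold are all assumptions, as is the toy coverage data. A window is reported when the mean signal in both flanks is at least `min_ratio` times the mean signal in the central valley.

```python
def mean(values):
    return sum(values) / len(values)

def find_peak_valley_peak(signal, flank=3, valley=3, min_ratio=2.0):
    """Return start indices of windows laid out as [left flank | valley | right flank]
    in which both flanks carry clearly more signal than the valley."""
    width = 2 * flank + valley
    hits = []
    for start in range(len(signal) - width + 1):
        left = mean(signal[start:start + flank])
        centre = mean(signal[start + flank:start + flank + valley])
        right = mean(signal[start + flank + valley:start + width])
        # Require non-zero flanks so that empty regions are never reported.
        if left > 0 and right > 0 and left >= min_ratio * centre and right >= min_ratio * centre:
            hits.append(start)
    return hits

# Hypothetical per-position coverage, not real data.
histone_profile = [1, 1, 9, 12, 10, 2, 1, 2, 10, 12, 9, 1, 1]  # peak-valley-peak
tf_profile = [1, 1, 2, 4, 9, 15, 20, 15, 9, 4, 2, 1, 1]        # single peak

print(find_peak_valley_peak(histone_profile))  # [2]
print(find_peak_valley_peak(tf_profile))       # []
```

The reported index marks the start of the left flank, so in the toy histone profile the valley (the putative NFR) spans positions 5 to 7.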