David Langenberger, Sachin Pundhir, Claus T Ekstrøm, Peter F Stadler, Steve Hoffmann, Jan Gorodkin.
Abstract
MOTIVATION: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example.
Year: 2011 PMID: 22053076 PMCID: PMC3244762 DOI: 10.1093/bioinformatics/btr598
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The HTS datasets used in this study, with their GEO IDs where available, the number of reads and the number of block groups
| Dataset | GEO ID | No. of reads | No. of block groups (all) | No. of block groups (expression filter) |
|---|---|---|---|---|
| Human_eb | – | 7 351 304 | 1136 | 455 |
| Human_hesc | – | 7 836 912 | 1386 | 585 |
| Human_34 | GSM450598 | 7 299 034 | 1103 | 377 |
| Human_98 | GSM450608 | 8 371 772 | 1109 | 425 |
| Human_14 | GSM450605 | 8 538 940 | 1614 | 686 |
| Monkey_9 | GSM450615 | 10 698 419 | 1738 | 478 |
The expression filter requires a block group to have at least two blocks with a minimum of 50 reads. Furthermore, block groups >200 nt or <50 nt are excluded.
a Expression filter: block groups with >1 block, >50 nt and <200 nt in length.
b Human_eb: human embryoid cells (Morin ).
c Human_hesc: human embryonic stem cells (Morin ).
d Human_34: human brain (34 days) (Somel ).
e Human_98: human brain (98 years) (Somel ).
f Human_14: human brain (14 years) (Somel ).
g Monkey_9: monkey brain (9 years) (Somel ).
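The expression filter described in the table notes can be illustrated with a minimal sketch. The `Block` and `BlockGroup` classes, the coordinate convention and the function name are hypothetical, not part of deepBlockAlign. The sketch also assumes that the 50-read minimum applies to the block group as a whole; the note could equally be read as a per-block minimum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    start: int  # start coordinate (inclusive); hypothetical convention
    end: int    # end coordinate (exclusive)
    reads: int  # number of reads assigned to this block


@dataclass
class BlockGroup:
    name: str
    blocks: List[Block]

    @property
    def length(self) -> int:
        # Span from the leftmost block start to the rightmost block end.
        return max(b.end for b in self.blocks) - min(b.start for b in self.blocks)

    @property
    def total_reads(self) -> int:
        return sum(b.reads for b in self.blocks)


def passes_expression_filter(group: BlockGroup,
                             min_blocks: int = 2,
                             min_reads: int = 50,
                             min_len: int = 50,
                             max_len: int = 200) -> bool:
    """Return True if the block group survives the filter.

    Assumption: min_reads is checked against the group total,
    not against each individual block.
    """
    if len(group.blocks) < min_blocks:
        return False
    if group.total_reads < min_reads:
        return False
    # Groups shorter than min_len or longer than max_len are excluded.
    return min_len <= group.length <= max_len
```

Keeping the thresholds as keyword arguments makes it easy to switch to a stricter reading of the filter without touching the rest of the code.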
Fig. 1. Entropy of distinct starting positions for different classes of ncRNA among the 455 block groups in the Human_eb dataset. The different profiles suggest that the entropy is a distinct measure for each ncRNA type and could be used to separate the classes.
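As an illustration of the measure in Fig. 1, the sketch below computes a Shannon entropy over the read start positions of one block group. The exact definition used in the paper is not given in this record, so the formula (base-2 entropy of the empirical start-position distribution) and the function name are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def start_position_entropy(read_starts: Iterable[int]) -> float:
    """Shannon entropy (in bits) of the read start positions.

    Assumption: each read contributes one start position, and the
    probability of a position is its share of all reads in the group.
    """
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return abs(entropy)  # abs() only turns a possible -0.0 into 0.0
```

Under this definition, reads that all start at the same position give an entropy of 0, while reads spread evenly over many start positions give a high entropy, which matches the intuition that precisely processed fragments have low start-position entropy.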
Fig. 2. Visualization of the block and block group alignment steps of deepBlockAlign. (a) Block alignment computed between similarly placed blocks of a miRNA and an unannotated block group. Both blocks have similar expression and a precise arrangement of reads, as also shown in Figure 4c for the same example. (b) A representation of the alignment computed between two block groups using the Sankoff algorithm. The algorithm optimizes the score based on the individual block similarities and pairwise block distances. Pairwise aligned blocks with similar distances are shown in black, single block alignments in gray and inserted or deleted blocks in white.
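The block group alignment step in Fig. 2b can be approximated by a much simpler dynamic program. The sketch below is a Needleman-Wunsch-style global alignment over two ordered lists of blocks; it is not the Sankoff variant used by deepBlockAlign, because it ignores pairwise block distances. The block representation, the similarity function and the gap penalty are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical block representation: (start, end, share of the group's reads).
Block = Tuple[int, int, float]


def block_similarity(a: Block, b: Block) -> float:
    """Toy similarity in [0, 1] from block length and relative expression."""
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    if max(len_a, len_b) <= 0:
        return 0.0
    length_term = min(len_a, len_b) / max(len_a, len_b)
    expression_term = 1.0 - abs(a[2] - b[2])
    return length_term * expression_term


def align_block_groups(g1: List[Block], g2: List[Block], gap: float = -0.5) -> float:
    """Global alignment score of two block groups (blocks kept in order).

    A block is either matched to a block of the other group or
    inserted/deleted at the cost of one gap penalty.
    """
    n, m = len(g1), len(g2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + block_similarity(g1[i - 1], g2[j - 1]),
                score[i - 1][j] + gap,  # block of g1 left unaligned
                score[i][j - 1] + gap,  # block of g2 left unaligned
            )
    return score[n][m]
```

For example, aligning a two-block group with itself yields a score of 2.0 under these settings, and adding an unmatched third block to one side lowers the score by the gap penalty. Extending the recursion to score pairs of aligned blocks by their distances would bring it closer to the approach described in the caption.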
Fig. 3. Retrieval of expressed loci in different specimens based solely on read mapping profiles. For pairs of profiles from different developmental (red: Human_34 and Human_98), tissue (blue: Human_eb and Human_hesc) and evolutionary (green: Human_14 and Monkey_9) samples, the histogram shows the best ranks found in the respective mate set, supporting non-random processing.
Fig. 4. Hierarchical clustering of 455 block groups based on the alignment score from deepBlockAlign. (a) A tree visualizing the clustering. microRNA loci (red) are well separated from tRNA genes (blue). Within the microRNA cluster, microRNA-offset RNAs (moRs) are found in one subcluster (IV), illustrating their different read pattern, which is caused by the additional blocks flanking the mature microRNA regions. In some significant clusters (II, III, V and VI), tRNAs, snoRNAs or unannotated block groups cluster together with microRNAs. tRNAs that are reported to generate products with miRNA-like features are highlighted with arrows. Cluster I contains tRNAs with different anticodons but highly similar expression patterns. (b) A representation of the deepBlockAlign result for snoRNA-HACA-E3, which clusters significantly with hsa-mir-9-1. The snoRNA candidate shows not only well-placed blocks, like the microRNA, but also precise read arrangements at the 5′ end, suggesting Dicer processing. (c) Alignment of an unknown block group with the hsa-mir-424 microRNA. (d) Alignment of tRNA-Ala-AGC with hsa-mir-15a. The tRNA shows a microRNA-like read arrangement and is similar to the example presented by Cole ), with most of the reads stacked at the 5′ end of the tRNA.