Taishin Kin, Kouichirou Yamada, Goro Terai, Hiroaki Okida, Yasuhiko Yoshinari, Yukiteru Ono, Aya Kojima, Yuki Kimura, Takashi Komori, Kiyoshi Asai.
Abstract
There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions, while others might be junk transcripts. Unfortunately, experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts, including annotated and non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search and protein homology search, among others. fRNAdb provides an efficient interface that helps users select particular transcripts by their own criteria to sort out functional RNA candidates. fRNAdb is available at http://www.ncrna.org/
Year: 2006 PMID: 17099231 PMCID: PMC1669753 DOI: 10.1093/nar/gkl837
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data sources of fRNAdb
| Source | Num. seq. (mapped) |
|---|---|
| H-inv 2.0 (non-protein coding transcripts) | 5489 (5217) |
| NONCODE | 5339 (576) |
| RNAdb | 2865 (1306) |
| RNAdb (literature curation) | 1446 (524) |
| RNAdb (human chromosome 7 project) | 306 (299) |
| RNAdb (RIKEN antisense pipeline) | 1113 (486) |
| Total | 13 693 (7102) |
List of attributes
| S. no. | Description | Number of transcripts | Min/max |
|---|---|---|---|
| 1 | Length of the sequence (nt) | 13 693 | 15/107 797 |
| 2 | Number of exons | 7166 | 0/60 |
| 3 | Number of overlapping ESTs | 4184 | 0/6490 |
| 4 | Number of mapped positions | 7158 | 0/892 |
| 5 | GC-content (%) | 13 693 | 4/87 |
| 6 | Maximum length of potential ORF (amino acids) | 12 655 | 0/1664 |
| 7 | Percentage of bases that is covered with repeat elements | 6460 | 0/100 |
| 8 | Repeat elements reside proximal upstream/downstream | 2219 | |
| 9 | Known gene that is a potential sense/antisense of this transcript (exon overlapping required) | 936 | |
| 10 | Number of protein homologs (GenBank NR) | 5811 | 0/250 |
| 11 | Known gene that includes this transcript within its intron | 951 | |
| 12 | Known gene region that overlaps with the mapping extent of this transcript (strand not considered) | 4245 | |
| 13 | Known gene that overlaps with this transcript within its intron in different strand | 965 | |
| 14 | Known gene where this transcript is possibly a part of its 3′-UTR | 757 | |
| 15 | Known gene where this transcript is possibly a part of its 5′-UTR | 77 | |
| 16 | Known gene within upstream 5 kb | 1011 | |
| 17 | Known gene within downstream 5 kb | 402 | |
| 18 | Average conservation score over the mapped exonic region | 6184 | 0/93 |
| 19 | Maximum conservation score over the mapped exonic region | 5741 | 0/98 |
| 20 | Maximum conservation score within 500 base upstream from the mapped 5′ terminal | 6878 | 0/255 |
| 21 | Overlapping UCSC ultra conserved region | 24 | 0/4 |
| 22 | Number of canonical splice signals in this transcript | 751 | 0/30 |
| 23 | Number of poly(A) signals in this transcript | 8081 | 0/199 |
| 24 | Number of CpG island | 1353 | 0/4 |
| 25 | Associated transposon free region | 1137 | |
| 26 | Number of RFAM known RNA motifs in this transcript | 5511 | 0/12 |
| 27 | Number of RNAz predictive RNA motifs in this transcript | 1185 | 0/24 |
| 28 | Number of EvoFold predictive RNA motifs in this transcript | 888 | 0/7 |
| 29 | Maximum | 252 | 0.0/121.0 |
| 30 | Number of cell lines responding to Affy probes in exon regions of this transcript (Affymetrix Transcriptome Phase 2 Tiling Array Analyses) | 1593 | 0/11 |
The number of applicable transcripts and the range of the attributes are shown.
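Two of the sequence-level attributes in the table, GC-content (5) and maximum length of potential ORF (6), can be illustrated with a minimal sketch. The record does not state how fRNAdb defines a potential ORF, so the definition used here (first ATG to the next in-frame stop codon, scanned over all six frames, length counted in amino acids) is an assumption, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence (attribute 5)."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleotide sequence."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def max_orf_length(seq: str) -> int:
    """Longest ATG-to-stop ORF over six frames, in amino acids.

    Assumed definition: an ORF opens at the first ATG in a frame and
    closes at the next in-frame stop codon; ORFs that run off the end
    of the sequence without a stop codon are not counted.
    """
    best = 0
    for strand in (normalize(seq), reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOP_CODONS and start is not None:
                    best = max(best, (i - start) // 3)
                    start = None
    return best
```

A short potential ORF is commonly read as weak evidence against protein-coding capacity, which is presumably why this attribute is offered as a filter; any cutoff is the user's choice.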
Figure 1. The first page shows a set of selection interfaces (A) and the listing table of 13 693 transcripts (B).
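The selection-then-listing workflow in Figure 1 amounts to applying user-chosen criteria over the attribute table and sorting what remains. The sketch below is not fRNAdb's implementation or API; the `Transcript` record, its field names, the thresholds and the toy records are all hypothetical, and treating a non-applicable attribute (`None`) as zero is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Transcript:
    accession: str                    # hypothetical identifier
    length: int                       # attribute 1
    max_orf: int                      # attribute 6, in amino acids
    protein_homologs: Optional[int]   # attribute 10, None if not applicable
    rnaz_motifs: Optional[int]        # attribute 27, None if not applicable


Criterion = Callable[[Transcript], bool]


def select(transcripts: Iterable[Transcript],
           criteria: Iterable[Criterion]) -> List[Transcript]:
    """Keep the transcripts that satisfy every criterion."""
    checks = list(criteria)
    return [t for t in transcripts if all(check(t) for check in checks)]


# Illustrative criteria only: short ORF, no protein homolog,
# at least one RNAz-predicted structural motif.
candidate_criteria: List[Criterion] = [
    lambda t: t.max_orf < 100,
    lambda t: (t.protein_homologs or 0) == 0,
    lambda t: (t.rnaz_motifs or 0) >= 1,
]

# Toy records invented for this example; they are not fRNAdb entries.
toy = [
    Transcript("toy-1", 1200, 45, 0, 2),
    Transcript("toy-2", 3100, 410, 12, 0),
    Transcript("toy-3", 800, 60, None, 1),
    Transcript("toy-4", 950, 30, 0, None),
]

# Sort the surviving candidates by number of RNAz motifs, highest first.
hits = sorted(select(toy, candidate_criteria),
              key=lambda t: t.rnaz_motifs or 0, reverse=True)
print([t.accession for t in hits])
```

Keeping each criterion as an independent predicate mirrors the idea of a set of separate selection controls: any subset can be combined without changing the selection function.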
Functional RNA-specific tracks
| Track | Description |
|---|---|
| RNAz folds | Secondary structure annotation of RNAz |
| ENOR | ENOR (expressed non-coding region) [lifted from mm5] |
| Erdmann | Erdmann non-coding RNAs |
| NONCODE | Mapping information of NONCODE RNAs |
| RNAdb | Mapping information of RNAdb RNAs |
| RNA Clusters | Small RNA genes often reside close to each other forming clusters. This track represents computationally identified RNA clusters in human genome |
| Rfam seed folds | Genomic search results with INFERNAL and covariance models generated from RFAM seeds |
| Rfam full | BLAT mapping results for RFAM full sequence dataset |
| antisense ChenJ NAR2004 | Sense–antisense pairs among UCSC known genes |
| tRNAscan-SE | tRNA genes predicted by tRNAscan-SE |
| Ultra conserved elements | 100% conserved elements (≥200 bp) in human, rat and mouse |
| Ultra conserved elements 17 way | 100% conserved elements in 17 vertebrates (longer than 50 bp) |
| Transposon free region | Regions longer than 5 kb or 10 kb containing no LINEs, SINEs and LTRs |
| Human accelerated region | HAR non-coding gene candidates predicted by ( |
| Regions with |
Figure 2. mRNA view of a transcript. Regulatory elements, EST positions, splice positions, repeat elements and six-frame stop codons are visualized along the full span of a cDNA.
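The transposon free region track in the functional RNA-specific tracks table is described as regions longer than 5 kb or 10 kb containing no LINEs, SINEs and LTRs. One way to read that description is as the gaps left between annotated repeat intervals on a chromosome. The sketch below follows that reading; it is not the method used to build the track, and the function name, the half-open coordinate convention and the example coordinates are assumptions.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed)


def transposon_free_regions(chrom_length: int,
                            repeats: Iterable[Interval],
                            min_length: int = 5000) -> List[Interval]:
    """Return gaps between repeat intervals that are longer than min_length.

    `repeats` is assumed to hold the LINE, SINE and LTR annotations for
    one chromosome; the intervals may overlap and need not be sorted.
    """
    regions: List[Interval] = []
    cursor = 0  # end of the right-most repeat seen so far
    for start, end in sorted(repeats):
        if start - cursor > min_length:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if chrom_length - cursor > min_length:
        regions.append((cursor, chrom_length))
    return regions


# Made-up coordinates for illustration only.
example_repeats = [(6000, 7000), (6500, 9000), (20000, 21000)]
print(transposon_free_regions(30000, example_repeats))
print(transposon_free_regions(30000, example_repeats, min_length=10000))
```

Sorting the intervals and carrying a single cursor handles overlapping repeats in one pass, so the two thresholds in the track description differ only in the `min_length` argument.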
miRNA-specific tracks
| Track | Description |
|---|---|
| Known miRNAs | miRBase known miRNAs |
| Predicted miRNAs | miRNAMap and Berezikov's predicted miRNAs |
| Known targets | TarBase experimentally verified miRNA target sites |
| Predicted targets | RNAhybrid, PicTar, miRBase and T-ScanS-predicted miRNA target sites |