fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences.

Taishin Kin1, Kouichirou Yamada, Goro Terai, Hiroaki Okida, Yasuhiko Yoshinari, Yukiteru Ono, Aya Kojima, Yuki Kimura, Takashi Komori, Kiyoshi Asai.   

Abstract

There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions while others might be junk transcripts. Unfortunately, the experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts including annotated/non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search and protein homology search, among others. fRNAdb provides an efficient interface that helps users filter transcripts by their own criteria in order to identify functional RNA candidates. fRNAdb is available at http://www.ncrna.org/

Year:  2006        PMID: 17099231      PMCID: PMC1669753          DOI: 10.1093/nar/gkl837

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

fRNAdb is a database that helps in annotating non-coding transcripts acquired from three publicly available databases: H-inv, human full-length non-coding cDNAs (1); NONCODE, experimentally validated non-coding transcripts (2); and RNAdb, non-coding transcripts curated from the literature, the human chromosome 7 project and the RIKEN antisense pipeline, together with other putative non-coding RNAs (3). Details are shown in Table 1. Each transcript is analyzed for various features, such as maximum ORF length, the number of protein homologs, the average conservation score, transcription regulatory element motifs and the existence of CpG islands (listed in Table 2), that help in selecting promising non-coding candidates. Transcripts can be filtered with fRNAdb's main listing interface in many different ways (see Figure 1). This main listing interface is linked to our custom UCSC Genome Browser (4) for functional RNAs, which is equipped with our original custom tracks specific to the screening of functional RNA. Users can inspect a transcript of interest in a genomic view, with rich genomic information surrounding the mapped transcript. The information includes the original UCSC tracks, such as known genes, genome conservation and Affymetrix transcriptome tracks (5), and our original tracks, such as conserved potential secondary structure, existence of known RNA secondary structure motifs and regions with significant RNA secondary structure Z-scores (for details see Table 3).
Table 1

Data sources of fRNAdb

Source | Num. seq. (mapped)
H-inv 2.0 (non-protein coding transcripts) | 5489 (5217)
NONCODE | 5339 (576)
RNAdb | 2865 (1306)
RNAdb (literature curation) | 1446 (524)
RNAdb (human chromosome 7 project) | 306 (299)
RNAdb (RIKEN antisense pipeline) | 1113 (486)
Total | 13 693 (7102)
Table 2

List of attributes

S. no. | Description | Number of transcripts | Min/max
1 | Length of the sequence (nt) | 13 693 | 15/107 797
2 | Number of exons | 7166 | 0/60
3 | Number of overlapping ESTs | 4184 | 0/6490
4 | Number of mapped positions | 7158 | 0/892
5 | GC-content (%) | 13 693 | 4/87
6 | Maximum length of potential ORF (amino acids) | 12 655 | 0/1664
7 | Percentage of bases covered by repeat elements | 6460 | 0/100
8 | Repeat elements residing proximally upstream/downstream | 2219 | -
9 | Known gene that is a potential sense/antisense of this transcript (exon overlapping required) | 936 | -
10 | Number of protein homologs (GenBank NR) | 5811 | 0/250
11 | Known gene that includes this transcript within its intron | 951 | -
12 | Known gene region that overlaps with the mapping extent of this transcript (strand not considered) | 4245 | -
13 | Known gene that overlaps with this transcript within its intron on the opposite strand | 965 | -
14 | Known gene where this transcript is possibly a part of its 3′-UTR | 757 | -
15 | Known gene where this transcript is possibly a part of its 5′-UTR | 77 | -
16 | Known gene within upstream 5 kb | 1011 | -
17 | Known gene within downstream 5 kb | 402 | -
18 | Average conservation score over the mapped exonic region | 6184 | 0/93
19 | Maximum conservation score over the mapped exonic region | 5741 | 0/98
20 | Maximum conservation score within 500 bases upstream of the mapped 5′ terminal | 6878 | 0/255
21 | Overlapping UCSC ultra conserved region | 24 | 0/4
22 | Number of canonical splice signals in this transcript | 751 | 0/30
23 | Number of poly(A) signals in this transcript | 8081 | 0/199
24 | Number of CpG islands | 1353 | 0/4
25 | Associated transposon free region | 1137 | -
26 | Number of RFAM known RNA motifs in this transcript | 5511 | 0/12
27 | Number of RNAz predictive RNA motifs in this transcript | 1185 | 0/24
28 | Number of EvoFold predictive RNA motifs in this transcript | 888 | 0/7
29 | Maximum Z-score of RNA secondary structure over this transcript. Scores lower than −6 are significant; higher scores are considered insignificant. Stored score = raw score × −10 | 252 | 0.0/121.0
30 | Number of cell lines responding to Affy probes in exon regions of this transcript (Affymetrix Transcriptome Phase 2 Tiling Array Analyses) | 1593 | 0/11

The number of applicable transcripts and the range of the attributes are shown.
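
Two of the sequence-level attributes in Table 2, GC-content (attribute 5) and the maximum length of a potential ORF (attribute 6), can be illustrated with a minimal Python sketch. The text does not state how fRNAdb defines a potential ORF, so the ATG-to-stop, six-frame definition below is an assumption made for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only; the ORF definition (ATG to first in-frame stop,
# searched over six frames) is an assumption, not fRNAdb's documented method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return GC-content as a percentage of the sequence length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA (cDNA) sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def max_orf_length(seq: str) -> int:
    """Return the longest ATG-to-stop ORF, in amino acids, over six frames."""
    best = 0
    for strand in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the currently open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Count codons from ATG up to, but excluding, the stop.
                    best = max(best, (i - start) // 3)
                    start = None
    return best
```

Under this assumed definition, a transcript with no complete ATG-to-stop reading frame gets a value of 0, which is consistent with the minimum of 0 listed for attribute 6.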

Figure 1

The first page shows a set of selection interfaces (A) and the listing table of 13 693 transcripts (B).

Table 3

Functional RNA-specific tracks

Track | Description
RNAz folds (15) | Secondary structure annotation of RNAz
ENOR (16) | ENOR (expressed non-coding region) [lifted from mm5]
Erdmann (6) | Erdmann non-coding RNAs
NONCODE (2) | Mapping information of NONCODE RNAs
RNAdb (3) | Mapping information of RNAdb RNAs
RNA Clusters | Small RNA genes often reside close to each other, forming clusters. This track represents computationally identified RNA clusters in the human genome
Rfam seed folds | Genomic search results with INFERNAL and covariance models generated from RFAM seeds
Rfam full | BLAT mapping results for the RFAM full sequence dataset
antisense ChenJ NAR2004 (17) | Sense–antisense pairs among UCSC known genes
tRNAscan-SE (18) | tRNA genes predicted by tRNAscan-SE
Ultra conserved elements (19) | 100% conserved elements (≥200 bp) in human, rat and mouse
Ultra conserved elements 17 way | 100% conserved elements in 17 vertebrates (longer than 50 bp)
Transposon free region (20) | Regions longer than 5 kb or 10 kb containing no LINEs, SINEs and LTRs
Human accelerated region (14) | HAR non-coding gene candidates predicted by (14)
Z-score | Regions with Z-score lower than −6 (lower is better; actual track score = Z-score × −10)
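
The Z-score convention used in Tables 2 and 3 (the stored or track score is the raw Z-score multiplied by −10, and raw Z-scores below −6 are significant) can be made concrete with a short Python sketch. The function and constant names are hypothetical; only the scale factor and the threshold come from the tables.

```python
# Minimal sketch of the score convention described in Tables 2 and 3.
SCALE = -10.0           # stored score = raw Z-score x -10
SIGNIFICANCE_Z = -6.0   # raw Z-scores lower than -6 are significant


def stored_score(raw_z: float) -> float:
    """Convert a raw Z-score into the stored (track) score."""
    return raw_z * SCALE


def raw_z_score(stored: float) -> float:
    """Recover the raw Z-score from a stored (track) score."""
    return stored / SCALE


def is_significant(stored: float) -> bool:
    """A stored score is significant when its raw Z-score is below -6."""
    return raw_z_score(stored) < SIGNIFICANCE_Z
```

With this convention, significant regions are those whose stored score exceeds 60, and a stored score of 121.0 corresponds to a raw Z-score of −12.1.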

fRNAdb

fRNAdb provides two types of interfaces. The first page presents a list of all transcripts rendered as a table with 35 columns, including ones for the attributes described in Table 2 (Figure 1B). A tabbed control panel is placed above the table and presents five tabs labeled ‘Basic’, ‘DB/ID’, ‘Expert’, ‘Sort’ and ‘Column’ (Figure 1A). The Basic tab contains the basic filters: a collection of frequently used filters that provide simple and quick selection of transcripts matching common criteria of functional non-coding RNAs. For example, one can check ‘Mapped’ to select only genome-mapped transcripts; ‘Well conserved at best (Max > 50%)’ for transcripts whose maximum conservation score among 17 vertebrates (4) exceeds 50% in their exonic regions; ‘EST-supported’ for reliable expression evidence; ‘Tiny ORF (<40 aa)’ to enrich for non-coding transcripts; ‘Low Repeat Coverage (<30%)’ to limit repeat element contamination; ‘No protein homolog’ as another condition that enriches for non-coding transcripts; and ‘No overlapping known gene’ to remove the possibility that the transcript is part of a protein-coding gene transcript. After the boxes are checked, the ‘refresh’ button runs the filtering and presents the results. Our example conditions yield nine hits: one H-inv non-protein coding cDNA and eight RNAdb literature-curated miRNAs. In other words, these criteria match real functional RNAs and also indicate that one non-coding transcript shares the same properties. Clicking on the ID of this transcript produces the detailed view shown in Figure 2. This feature visualizer shows a graphical representation of a variety of sequence elements found in the transcript, including cis-regulatory elements, repeat elements, EST mapping regions and six-frame stop codon positions. There are many different ways to filter these non-coding transcripts, and there are many more potential candidates hidden in this dataset. More details of the basic filters are provided on the website.
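
The combination of basic filters in the example above can be sketched in Python as a conjunction of per-attribute conditions. This is an illustration only, not the fRNAdb implementation: the `Transcript` record and its field names are hypothetical, and treating ‘EST-supported’ as "at least one overlapping EST" is an assumption. The thresholds (>50%, <40 aa, <30%) are those named in the filter labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    """Hypothetical record holding a few of the Table 2 attributes."""
    frnadb_id: str
    mapped: bool
    max_conservation: Optional[float]  # percent; None when not available
    est_count: int                     # number of overlapping ESTs
    max_orf_aa: int                    # maximum potential ORF length (aa)
    repeat_coverage: float             # percent of bases in repeat elements
    protein_homologs: int
    overlapping_known_genes: int


def passes_basic_filters(t: Transcript) -> bool:
    """Apply all basic filters from the example as one combined condition."""
    return (
        t.mapped                                    # 'Mapped'
        and t.max_conservation is not None
        and t.max_conservation > 50                 # 'Well conserved at best'
        and t.est_count > 0                         # 'EST-supported' (assumed)
        and t.max_orf_aa < 40                       # 'Tiny ORF (<40 aa)'
        and t.repeat_coverage < 30                  # 'Low Repeat Coverage'
        and t.protein_homologs == 0                 # 'No protein homolog'
        and t.overlapping_known_genes == 0          # 'No overlapping known gene'
    )


def select_candidates(transcripts: List[Transcript]) -> List[Transcript]:
    """Return the transcripts that satisfy every basic filter."""
    return [t for t in transcripts if passes_basic_filters(t)]
```

Because every condition must hold, unchecking a box in the interface corresponds to dropping one term from the conjunction, which can only enlarge the result set.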
Figure 2

mRNA view of a transcript. Regulatory elements, EST positions, splice positions, repeat elements, six frame stop codons are visualized along with the full span of a cDNA.

The rest of the tabs offer additional functionality to further improve usability. The DB/ID tab contains DB selection and ID selection boxes. The DB selection box allows you to limit the target databases to any of the currently available databases: H-inv, NONCODE and RNAdb. The ID selection box lets you choose target transcripts that match given string patterns. For example, specifying ‘FR000001’ (an fRNAdb ID) in this box limits the target to transcript FR000001 alone. The wild-card ‘%’ is allowed for pattern matching. Specifying ‘LIT%’ limits the search to targets whose original IDs start with ‘LIT’. The string pattern is matched against the ID, Acc. and Original columns. The Expert tab provides an interface for specifying multiple conditions, which lets you perform more complex filtering than the basic filters allow. Please refer to the website for more details about the expert filters. The Sort tab has a sorting interface that lets you sort the table with multiple sorting keys. The Column tab allows you to limit the visible columns of the main listing table. Since the 35-column table is too wide for ordinary browsers to display on a single screen, you can narrow the table with this interface for better visibility.
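
The ‘%’ wild-card matching of the ID selection box can be sketched as follows. This is a minimal Python illustration under stated assumptions: ‘%’ is taken to match any run of characters (including none), matching is assumed to be case-sensitive and to cover the whole value, and the record layout (a dictionary keyed by the ID, Acc. and Original column names) is hypothetical.

```python
import re
from typing import Dict, Pattern

ID_COLUMNS = ("ID", "Acc.", "Original")  # columns named in the text


def wildcard_to_regex(pattern: str) -> Pattern[str]:
    """Translate a '%' wild-card pattern into a compiled regular expression."""
    # '%' becomes '.*'; every other character is matched literally.
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts))


def matches_id(pattern: str, record: Dict[str, str]) -> bool:
    """Return True if the pattern matches any of the ID-like columns."""
    regex = wildcard_to_regex(pattern)
    return any(
        regex.fullmatch(record.get(column) or "") is not None
        for column in ID_COLUMNS
    )
```

For instance, with a made-up record `{"ID": "FR000001", "Acc.": "AB000001", "Original": "LIT1234"}`, both `matches_id("FR000001", record)` and `matches_id("LIT%", record)` hold, while `matches_id("LIT", record)` does not because no column equals ‘LIT’ exactly.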

UCSC GENOME BROWSER FOR FUNCTIONAL RNAs

We mirrored the UCSC Genome Browser and added our custom tracks specific to functional RNAs and miRNAs, as shown in Tables 3 and 4. Most of the tracks have their own sources and reference papers. Our original tracks are RNA clusters, Rfam seed folds, tRNAscan-SE, Ultra Conserved Elements 17way and Z-score (details are shown in Table 3). In addition, we mapped RNA sequences from public functional RNA sequence databases including Erdmann (6), NONCODE, RNAdb and Rfam. The UCSC Genome Browser already has several tracks for miRNA genes and targets, and we added more: miRBase (7) known miRNA genes; miRNAMap (8) and Berezikov's predicted miRNA genes (9); TarBase (10) known miRNA targets; and predicted miRNA targets from RNAhybrid (11), PicTar 4 species and 5 species (12), miRBase targets and T-ScanS miRNA targets (13). Our custom tracks can be downloaded with the Table browser, which can be accessed via the ‘Table’ menu of the UCSC Genome Browser.
Table 4

miRNA-specific tracks

Track | Description
Known miRNAs | miRBase known miRNAs
Predicted miRNAs | miRNAMap and Berezikov's predicted miRNAs
Known targets | TarBase experimentally verified miRNA target sites
Predicted targets | RNAhybrid, PicTar, miRBase and T-ScanS-predicted miRNA target sites

In the near future, fRNAdb will include more transcripts from other sequence databases and from non-coding gene prediction results. For example, Human Accelerated Region (14) is currently included as a custom track of our Genome Browser; the sequences of these non-coding gene candidates will be included in fRNAdb. We will also add more attributes to fRNAdb, especially attributes representing the expression patterns of the transcripts or of the protein genes related to the transcripts.
  20 in total

1.  Ultraconserved elements in the human genome.

Authors:  Gill Bejerano; Michael Pheasant; Igor Makunin; Stuart Stephen; W James Kent; John S Mattick; David Haussler
Journal:  Science       Date:  2004-05-06       Impact factor: 47.728

2.  Over 20% of human transcripts might form sense-antisense pairs.

Authors:  Jianjun Chen; Miao Sun; W James Kent; Xiaoqiu Huang; Hanqing Xie; Wenquan Wang; Guolin Zhou; Run Zhang Shi; Janet D Rowley
Journal:  Nucleic Acids Res       Date:  2004-09-08       Impact factor: 16.971

3.  Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets.

Authors:  Benjamin P Lewis; Christopher B Burge; David P Bartel
Journal:  Cell       Date:  2005-01-14       Impact factor: 41.582

4.  Phylogenetic shadowing and computational identification of human microRNA genes.

Authors:  Eugene Berezikov; Victor Guryev; José van de Belt; Erno Wienholds; Ronald H A Plasterk; Edwin Cuppen
Journal:  Cell       Date:  2005-01-14       Impact factor: 41.582

5.  Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome.

Authors:  Stefan Washietl; Ivo L Hofacker; Melanie Lukasser; Alexander Hüttenhofer; Peter F Stadler
Journal:  Nat Biotechnol       Date:  2005-11       Impact factor: 54.908

6.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

7.  An RNA gene expressed during cortical development evolved rapidly in humans.

Authors:  Katherine S Pollard; Sofie R Salama; Nelle Lambert; Marie-Alexandra Lambot; Sandra Coppens; Jakob S Pedersen; Sol Katzman; Bryan King; Courtney Onodera; Adam Siepel; Andrew D Kern; Colette Dehay; Haller Igel; Manuel Ares; Pierre Vanderhaeghen; David Haussler
Journal:  Nature       Date:  2006-08-16       Impact factor: 49.962

8.  NONCODE: an integrated knowledge database of non-coding RNAs.

Authors:  Changning Liu; Baoyan Bai; Geir Skogerbø; Lun Cai; Wei Deng; Yong Zhang; Dongbo Bu; Yi Zhao; Runsheng Chen
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

9.  RNAdb--a comprehensive mammalian noncoding RNA database.

Authors:  Ken C Pang; Stuart Stephen; Pär G Engström; Khairina Tajul-Arifin; Weisan Chen; Claes Wahlestedt; Boris Lenhard; Yoshihide Hayashizaki; John S Mattick
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

10.  Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Authors:  Tadashi Imanishi; Takeshi Itoh; Yutaka Suzuki; Claire O'Donovan; Satoshi Fukuchi; Kanako O Koyanagi; Roberto A Barrero; Takuro Tamura; Yumi Yamaguchi-Kabata; Motohiko Tanino; Kei Yura; Satoru Miyazaki; Kazuho Ikeo; Keiichi Homma; Arek Kasprzyk; Tetsuo Nishikawa; Mika Hirakawa; Jean Thierry-Mieg; Danielle Thierry-Mieg; Jennifer Ashurst; Libin Jia; Mitsuteru Nakao; Michael A Thomas; Nicola Mulder; Youla Karavidopoulou; Lihua Jin; Sangsoo Kim; Tomohiro Yasuda; Boris Lenhard; Eric Eveno; Yoshiyuki Suzuki; Chisato Yamasaki; Jun-ichi Takeda; Craig Gough; Phillip Hilton; Yasuyuki Fujii; Hiroaki Sakai; Susumu Tanaka; Clara Amid; Matthew Bellgard; Maria de Fatima Bonaldo; Hidemasa Bono; Susan K Bromberg; Anthony J Brookes; Elspeth Bruford; Piero Carninci; Claude Chelala; Christine Couillault; Sandro J de Souza; Marie-Anne Debily; Marie-Dominique Devignes; Inna Dubchak; Toshinori Endo; Anne Estreicher; Eduardo Eyras; Kaoru Fukami-Kobayashi; Gopal R Gopinath; Esther Graudens; Yoonsoo Hahn; Michael Han; Ze-Guang Han; Kousuke Hanada; Hideki Hanaoka; Erimi Harada; Katsuyuki Hashimoto; Ursula Hinz; Momoki Hirai; Teruyoshi Hishiki; Ian Hopkinson; Sandrine Imbeaud; Hidetoshi Inoko; Alexander Kanapin; Yayoi Kaneko; Takeya Kasukawa; Janet Kelso; Paul Kersey; Reiko Kikuno; Kouichi Kimura; Bernhard Korn; Vladimir Kuryshev; Izabela Makalowska; Takashi Makino; Shuhei Mano; Regine Mariage-Samson; Jun Mashima; Hideo Matsuda; Hans-Werner Mewes; Shinsei Minoshima; Keiichi Nagai; Hideki Nagasaki; Naoki Nagata; Rajni Nigam; Osamu Ogasawara; Osamu Ohara; Masafumi Ohtsubo; Norihiro Okada; Toshihisa Okido; Satoshi Oota; Motonori Ota; Toshio Ota; Tetsuji Otsuki; Dominique Piatier-Tonneau; Annemarie Poustka; Shuang-Xi Ren; Naruya Saitou; Katsunaga Sakai; Shigetaka Sakamoto; Ryuichi Sakate; Ingo Schupp; Florence Servant; Stephen Sherry; Rie Shiba; Nobuyoshi Shimizu; Mary Shimoyama; Andrew J Simpson; Bento Soares; Charles Steward; Makiko Suwa; Mami Suzuki; Aiko Takahashi; Gen Tamiya; 
Hiroshi Tanaka; Todd Taylor; Joseph D Terwilliger; Per Unneberg; Vamsi Veeramachaneni; Shinya Watanabe; Laurens Wilming; Norikazu Yasuda; Hyang-Sook Yoo; Marvin Stodolsky; Wojciech Makalowski; Mitiko Go; Kenta Nakai; Toshihisa Takagi; Minoru Kanehisa; Yoshiyuki Sakaki; John Quackenbush; Yasushi Okazaki; Yoshihide Hayashizaki; Winston Hide; Ranajit Chakraborty; Ken Nishikawa; Hideaki Sugawara; Yoshio Tateno; Zhu Chen; Michio Oishi; Peter Tonellato; Rolf Apweiler; Kousaku Okubo; Lukas Wagner; Stefan Wiemann; Robert L Strausberg; Takao Isogai; Charles Auffray; Nobuo Nomura; Takashi Gojobori; Sumio Sugano
Journal:  PLoS Biol       Date:  2004-04-20       Impact factor: 8.029
