RNAdb 2.0--an expanded database of mammalian non-coding RNAs.

Ken C Pang, Stuart Stephen, Marcel E Dinger, Pär G Engström, Boris Lenhard, John S Mattick.

Abstract

RNAdb is a comprehensive database of mammalian non-protein-coding RNAs (ncRNAs). There is increasing recognition that ncRNAs play important regulatory roles in multicellular organisms, and there is an expanding rate of discovery of novel ncRNAs as well as an increasing allocation of function. In this update to RNAdb, we provide nucleotide sequences and annotations for tens of thousands of non-housekeeping ncRNAs, including a wide range of mammalian microRNAs, small nucleolar RNAs and larger mRNA-like ncRNAs. Some of these have documented functions and/or expression patterns, but the majority remain of unclear significance, and include PIWI-interacting RNAs, ncRNAs identified from the latest rounds of large-scale cDNA sequencing projects, putative antisense transcripts, as well as ncRNAs predicted on the basis of structural features and alignments. Improvements to the database comprise not only new and updated ncRNA datasets, but also provision of microarray-based expression data and closer interface with more specialized ncRNA resources such as miRBase and snoRNA-LBME-db. To access RNAdb, visit http://research.imb.uq.edu.au/RNAdb.

Year:  2006        PMID: 17145715      PMCID: PMC1751534          DOI: 10.1093/nar/gkl926

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The mammalian genome encodes thousands of non-protein-coding RNAs (ncRNAs). Ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and small nuclear RNAs (snRNAs) fulfil mainly housekeeping roles in mRNA translation and splicing. Small nucleolar RNAs (snoRNAs) and the related small Cajal body-specific RNAs (scaRNAs) guide modifications of other RNAs (1–3). MicroRNAs (miRNAs) regulate gene expression by controlling mRNA translation and turnover (4,5), PIWI-interacting RNAs (piRNAs) are thought to be important in spermatogenesis (6,7), while larger ncRNAs have been discovered to be developmentally regulated (8–10) and to function in a range of processes including genomic imprinting, intracellular protein trafficking and brain development (11–16). The abundance of ncRNAs has only become apparent in the past few years and was largely unexpected. Although many recently identified ncRNAs remain of unknown function, and appear to be evolving rapidly (15,17), it is increasingly clear that ncRNAs represent a diverse and important class of functional output from mammalian genomes.
RNAdb is a comprehensive database of mammalian ncRNAs. The focus of the database is on ncRNAs that have restricted expression and whose function is likely to be regulatory. Housekeeping RNAs (rRNAs, tRNAs, snRNAs) are not included and are covered elsewhere (18,19). The aim of the database is to provide a nucleotide sequence-based platform to facilitate both bioinformatic and experimental research in the burgeoning field of RNomics. Already, RNAdb has been used to develop machine-learning algorithms for identifying ncRNAs (20), annotate transcripts from a large-scale transcriptome project (8) and examine ncRNA evolution (17). In addition to containing sequence data, individual ncRNA entries in RNAdb are annotated based upon publicly available information in the literature or secondary databases. In this way, the database can also be browsed or searched by the casual user interested in learning more about particular ncRNAs.
Since the original release of RNAdb two years ago (21), the number of known mammalian ncRNAs has grown considerably. In recognition of this growth, we have updated the database to include tens of thousands of novel ncRNAs. Some of these have been characterized in isolation, continuing the trend of ad hoc discovery by which many earlier ncRNAs were identified. The majority, however, come from large-scale cloning and sequencing studies or structural alignment-based predictions. As well as incorporating new ncRNA datasets, the current release of RNAdb provides other enhancements, including microarray-based expression data, closer interface with specialized ncRNA resources such as miRBase and snoRNA-LBME-db (3,22), and the availability of data for use as custom tracks on the UCSC Genome Browser (23).

DATABASE ACCESS

RNAdb is available on-line at http://research.imb.uq.edu.au/RNAdb. Currently, datasets are stored in relational form in a Microsoft SQL Server 2005 database. The web application is multilayered, with the primary presentation layer implemented in C# 2.0 under the ASP.NET 2.0 framework. The application layer is implemented as a mixture of C# and C++ modules, with dataset normalization performed through SQL stored procedures.
The database can be accessed or queried in various ways. Users can casually browse the collection. Specific searches can be performed using keywords (with or without Boolean operators) and/or by applying filters across nominated fields. BLAST searches permit users to locate regions of similarity between sequences of interest and those stored in the database. To facilitate links with other on-line resources, users can now go directly to the detailed view of an entry through a URL that incorporates the RNAdb unique identifier of interest (for example, MIR1004). The entire database is available for download in either FASTA or XML format via the website. Specific datasets are also provided as custom tracks for loading directly into the UCSC Genome Browser (23). This feature allows users to take a defined subset of the database and apply it to the UCSC Genome Browser's extensive set of comparative and analytical tools.
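As an illustration of working with the FASTA download mentioned above, the following minimal Python sketch reads FASTA-formatted text and indexes sequences by identifier. The header layout of the real RNAdb files is not described in the text, so the example records, the identifiers (EXAMPLE0001, EXAMPLE0002) and the choice of the first header token as the key are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def index_by_id(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map the first whitespace-delimited header token to its sequence.

    Assumption: the identifier is the first token of the header line.
    """
    index = {}
    for header, seq in records:
        tokens = header.split()
        if tokens:
            index[tokens[0]] = seq
    return index


# Hypothetical records; not taken from RNAdb.
EXAMPLE = """>EXAMPLE0001 hypothetical short RNA
UGAGGUAGUAGGUUGUAUAGUU
>EXAMPLE0002 hypothetical two-line record
UGACAUGAACACAGGUGCUCAGAUAGCU
UU
"""

if __name__ == "__main__":
    index = index_by_id(parse_fasta(EXAMPLE.splitlines()))
    for rna_id, seq in index.items():
        print(rna_id, len(seq))
```

An index of this kind could serve as a starting point for selecting a subset of sequences before a BLAST search or other downstream analysis.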

DATABASE CONTENT

ncRNAs in RNAdb are divided into several distinct datasets (Table 1). This decision was made not only to reflect the different ways in which ncRNAs have been identified, but also in recognition that users may want to separately query and download each set. A description of each dataset is provided below.
Table 1. Summary of RNAdb 2.0 datasets

Dataset                                        Organism   Number of sequences
miRNAs                                         Human      462
                                               Mouse      358
                                               Other      1035
snoRNAs and scaRNAs                            Human      375
                                               Mouse      175
                                               Other      24
piRNAs                                         Human      32 046
                                               Mouse      30 024
                                               Rat        26 568
Other ncRNAs from the literature               Human      531
                                               Mouse      242
                                               Other      167
FANTOM3 ncRNAs                                 Mouse      34 030
H-Invitational ncRNAs                          Human      1794
ncRNAs predicted from structural alignments
    RNAz                                       Human      35 984
    Non-coding RNA Search                      Human      1273
                                               Mouse      1252
    EvoFold                                    Human      47 509
Predicted antisense ncRNAs                     Human      1068 (in 919 TUs)[a]
                                               Mouse      1615 (in 1395 TUs)[a]

[a] Sequences were clustered into transcriptional units (TUs) by joining any two or more sequences that had overlapping exons on the same genomic strand.
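The transcriptional-unit clustering described in the Table 1 footnote (joining sequences that have overlapping exons on the same genomic strand) can be sketched as follows. This is only an illustration of the stated rule, not the pipeline actually used for RNAdb: the Transcript record, the half-open coordinate convention and the example data are assumptions.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str                    # "+" or "-"
    exons: List[Tuple[int, int]]   # half-open [start, end) intervals (assumed)


def cluster_into_tus(transcripts: List[Transcript]) -> List[List[str]]:
    """Group transcripts whose exons overlap on the same chromosome and strand."""
    parent = list(range(len(transcripts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect exons per (chromosome, strand), remembering the owning transcript.
    by_locus = defaultdict(list)
    for idx, t in enumerate(transcripts):
        for start, end in t.exons:
            by_locus[(t.chrom, t.strand)].append((start, end, idx))

    # Sweep sorted exons; an exon starting before the running end overlaps
    # an exon already in the current block, so the two transcripts are joined.
    for exons in by_locus.values():
        exons.sort()
        cur_end, cur_idx = exons[0][1], exons[0][2]
        for start, end, idx in exons[1:]:
            if start < cur_end:
                union(cur_idx, idx)
                cur_end = max(cur_end, end)
            else:
                cur_end, cur_idx = end, idx

    groups = defaultdict(list)
    for idx, t in enumerate(transcripts):
        groups[find(idx)].append(t.name)
    return sorted(groups.values())


# Hypothetical transcripts for illustration only.
EXAMPLE_TRANSCRIPTS = [
    Transcript("t1", "chr1", "+", [(100, 200), (300, 400)]),
    Transcript("t2", "chr1", "+", [(350, 450)]),   # overlaps an exon of t1
    Transcript("t3", "chr1", "-", [(350, 450)]),   # same position, other strand
    Transcript("t4", "chr1", "+", [(500, 600)]),   # no overlap
]

if __name__ == "__main__":
    print(cluster_into_tus(EXAMPLE_TRANSCRIPTS))
```

Because joining is transitive, two transcripts that never overlap each other directly still end up in the same TU if a third transcript overlaps both.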

miRNAs

Over 1800 mammalian miRNAs are found within RNAdb. These sequences were obtained from the latest release of miRBase (release 8.2, July 2006) (22). miRBase is the central repository for miRNA data on the web and is regularly maintained. We have elected to link the RNAdb miRNA entries directly to miRBase, so as to keep abreast of the most recent annotations and updates.

snoRNAs and scaRNAs

RNAdb contains more than 500 mammalian snoRNAs and scaRNAs. The snoRNAs fall into two general classes, C/D box and H/ACA snoRNAs, which classically guide ribose methylation and pseudouridylation of rRNAs, respectively. Interestingly, some snoRNAs appear to regulate other RNAs; one example is HBII-52, which regulates the alternative splicing of the serotonin receptor 2C and is implicated in the pathogenesis of Prader-Willi syndrome (24). Human snoRNAs and scaRNAs in RNAdb were derived from snoRNA-LBME-db (release 3, August 2006) (3), and annotations for these sequences are maintained by linking out to this informative and specialized resource.

piRNAs

The PIWI family of proteins is known to be important for germ cell development. PIWI proteins were recently discovered to bind thousands of small RNAs, termed piRNAs (6,7). piRNAs have been identified in testis, are 26–31 nt in length and are distinct from miRNAs. Over 88 000 piRNA candidates have been cloned and sequenced from mouse, human and rat, and are included for the first time in the current release of RNAdb.

Other ncRNAs from the literature

This dataset contains more than 900 unique ncRNA sequences which have been identified and manually curated based upon extensive literature review. A majority of ncRNAs listed here are much longer than those listed above. Altogether, 36 mammalian organisms are represented, but most ncRNAs are from either mouse or human. Although some of these transcripts have documented biological roles, most are transcripts of unknown function. As well as sequence data, additional information (including GenBank accessions, references, chromosomal location, transcript length, splicing status, conservation notes, function, disease associations, antisense relationships, imprinting status and tissue expression patterns) is provided wherever possible in separate searchable fields.
New additions to this dataset include multiple long mRNA-like ncRNAs whose functions have recently become apparent. For instance, NRON, an ncRNA repressor of the nuclear factor of activated T cells (NFAT), regulates nuclear trafficking of NFAT (13). Taurine upregulated gene 1 (TUG1) is required for photoreceptor development in the eye (25). Saf, which lies antisense to the death receptor Fas, alters the expression of alternative Fas isoforms and increases resistance to Fas-induced apoptosis (26). Evf-2, an ncRNA derived from an ultraconserved element, cooperates with the homeodomain protein Dlx-2 to augment the transcriptional activity of a nearby enhancer (27). Serving as a salient reminder that many ncRNAs are poorly conserved (17) is HAR1F, which is expressed specifically in Cajal-Retzius neurons in the developing human neocortex and has evolved rapidly in the human lineage (15).

FANTOM3 ncRNAs

Using full-length cDNA cloning and sequencing strategies, the Functional Annotation of Mouse (FANTOM) project has identified thousands of novel transcripts from the mouse genome (8). In the most recent round of annotation, 34 030 cDNAs were manually annotated as putative ncRNAs (28), a subset of which were subsequently shown to be fragments derived from very long ncRNAs (29). Since both cloning and manual annotation are subject to variation and error, the true number of ncRNAs remains unclear. To this end, we provide the results of various computational prediction strategies for use as additional filters in identifying ncRNAs (Supplementary Data 1). In addition to sequence data, details such as the Riken clone identifier, GenBank accession, genomic location, transcript length, likely imprinting status and library of origin are provided. RNAdb also incorporates expression information from publicly available microarray datasets such as GNF SymAtlas (30) (Supplementary Data 2). Although limited to only a small proportion of FANTOM3 ncRNAs, this information allows the identification of transcripts that are dynamically expressed across various tissues and cell types, and is expected to provide a useful starting point for their further characterization.
As indicated earlier, the vast majority of ncRNAs identified from large-scale cDNA sequencing projects are of unknown significance. A recent screen of several hundred well-conserved FANTOM ncRNAs identified not only NRON, but also seven other functional ncRNA genes essential for cell viability or involved in Hedgehog signalling (13,31). Given that this strategy employed only a limited number of cell-based assays and that only a tiny proportion of ncRNAs were examined, it appears likely that many more functional ncRNAs from the FANTOM collection remain to be uncovered.

H-Invitational ncRNAs

This dataset contains more than 1700 putative ncRNAs from the latest round of the Human Full-length cDNA Annotation Invitational (H-Invitational) project (release 3.4, August 2006) (32). Non-protein-coding transcripts are defined in this dataset by the absence of any open reading frame and by not belonging to the pseudogene classification. In addition to the sequence data, details such as the GenBank accession number, genomic location, transcript length, library of origin and expression data (based upon publicly available microarray data where present; see Supplementary Data 2) are also listed.
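To make the idea of screening transcripts by open reading frame concrete, the sketch below finds the longest ATG-initiated, stop-terminated ORF on the forward strand of a sequence. It is not the H-Invitational procedure, whose exact criteria are not given here; the function names and the min_codons threshold are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons excluding the stop, of the longest
    ATG-initiated ORF that ends at an in-frame stop codon.

    Only the three forward-strand frames are scanned (a simplification).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start = pos          # first start codon opens the ORF
            elif codon in STOP_CODONS:
                best = max(best, (pos - start) // 3)
                start = None             # look for the next ORF in this frame
    return best


def lacks_long_orf(seq: str, min_codons: int = 100) -> bool:
    """True if no ORF reaches min_codons (hypothetical threshold)."""
    return longest_orf_codons(seq) < min_codons


if __name__ == "__main__":
    print(longest_orf_codons("ATGAAATTTTAG"))   # ATG AAA TTT then stop
    print(lacks_long_orf("ATGAAATTTTAG"))
```

In practice a length cut-off alone is a crude filter, which is one reason additional computational predictions are useful alongside it.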

ncRNAs predicted from structural alignments

Recently, a number of studies have identified thousands of putative ncRNAs based upon predicted structural features and alignments using novel comparative genomics tools. The datasets resulting from three independent approaches, RNAz (33), Non-coding RNA Search (34) and EvoFold (35), are included here. RNAz combines a comparative approach (scoring conservation of secondary structure) with the observation that ncRNAs are thermodynamically more stable than expected by chance. Using sequences conserved in at least human, mouse, rat and dog, over 35 000 structured elements were identified in the human genome (36). Non-coding RNA Search uses syntenic regions between human and mouse that are unalignable and then utilizes the FOLDALIGN algorithm to identify regions with conserved secondary structure. Finally, EvoFold utilizes a comparative genomics method based on phylogenetic stochastic context-free grammars to identify functional RNAs. Using an eight-way genome-wide alignment of human, chimpanzee, mouse, rat, dog, chicken, zebrafish and Fugu, over 47 000 candidate RNA structures were identified in the human genome.
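The notion that a structured ncRNA is "more stable than expected by chance" is commonly expressed as a z-score of its minimum free energy (MFE) against a background distribution. The sketch below shows only that general calculation; it does not reproduce RNAz, and it assumes the native and background folding energies have already been computed by some external folding tool. The function name and the example energies are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def stability_z_score(native_mfe: float, background_mfes: Sequence[float]) -> float:
    """Z-score of a native folding energy relative to background energies.

    Negative values indicate a fold more stable (lower energy) than the
    background, e.g. shuffled versions of the same sequence.
    """
    if len(background_mfes) < 2:
        raise ValueError("need at least two background energies")
    mu = mean(background_mfes)
    sigma = stdev(background_mfes)  # sample standard deviation
    if sigma == 0:
        raise ValueError("background energies have zero variance")
    return (native_mfe - mu) / sigma


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol.
    z = stability_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0])
    print(round(z, 2))
```

A strongly negative score on its own is only suggestive; the approaches described above combine such thermodynamic evidence with conservation of secondary structure across species.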

Comprehensive antisense ncRNA dataset

Natural antisense transcription is now recognized as being a common occurrence in the mammalian transcriptome and a means by which gene expression can be regulated (37–39). Data from tiling array experiments and sequencing of short tags representing 5′ and 3′ ends of transcripts suggest that more than 60% of all human and mouse loci may be transcribed on both strands and give rise to complementary transcripts (9,37). In its original release, RNAdb contained a dataset of putative antisense ncRNAs identified from cDNA and EST databases for human and mouse using a computational pipeline (21). Coinciding with the current release, we have recently re-developed the pipeline that searches for antisense RNAs and experimentally validated a subset of its predictions (40). We will continue to use the improved pipeline in regular updates of the antisense ncRNA dataset.

CONCLUSIONS

The total number of mammalian ncRNA sequences contained in RNAdb has increased ∼10-fold since the database's inception 2 years ago. Such growth in content reflects the high level of interest and activity in the field over this period. Nevertheless, most of the newly added sequences represent putative ncRNAs and their biological roles, if any, remain unclear. The fact that most of the mammalian genome is transcribed suggests that there is either a great deal of transcriptional noise or that these RNAs are fulfilling some unexpected functions in mammalian biology (11,41). Although the search for new ncRNAs is far from exhausted (42), one of today's principal challenges in the field of RNomics is to explore the functional significance of this abundant non-coding transcription. If this challenge is to be successfully met in coming years, then experimental proof of function will be paramount. Such proof often comes slowly and incrementally, and having bioinformatic resources such as RNAdb will be essential to guide and facilitate future discovery.

FUTURE DIRECTIONS

As new ncRNAs are discovered, we will continue to update RNAdb. Submissions of new mammalian ncRNAs are invited, and should be sent to RNAdb@imb.uq.edu.au. We also plan to regularly synchronize our datasets with miRBase and snoRNA-LBME-db as these resources are updated. Currently, publicly available microarray-based expression data for ncRNAs remain limited, but are likely to be significantly expanded in the future (K. Pang and M. Dinger, unpublished data). Once new expression data are released, they will be incorporated into RNAdb.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  42 in total

1.  A novel class of small RNAs bind to MILI protein in mouse testes.

Authors:  Alexei Aravin; Dimos Gaidatzis; Sébastien Pfeffer; Mariana Lagos-Quintana; Pablo Landgraf; Nicola Iovino; Patricia Morris; Michael J Brownstein; Satomi Kuramochi-Miyagawa; Toru Nakano; Minchen Chien; James J Russo; Jingyue Ju; Robert Sheridan; Chris Sander; Mihaela Zavolan; Thomas Tuschl
Journal:  Nature       Date:  2006-06-04       Impact factor: 49.962

2.  Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure.

Authors:  Elfar Torarinsson; Milena Sawera; Jakob H Havgaard; Merete Fredholm; Jan Gorodkin
Journal:  Genome Res       Date:  2006-06-02       Impact factor: 9.043

3.  A germline-specific class of small RNAs binds mammalian Piwi proteins.

Authors:  Angélique Girard; Ravi Sachidanandam; Gregory J Hannon; Michelle A Carmell
Journal:  Nature       Date:  2006-06-04       Impact factor: 49.962

4.  The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator.

Authors:  Jianchi Feng; Chunming Bi; Brian S Clark; Rina Mady; Palak Shah; Jhumku D Kohtz
Journal:  Genes Dev       Date:  2006-05-16       Impact factor: 11.361

Review 5.  Natural antisense and noncoding RNA transcripts as potential drug targets.

Authors:  Claes Wahlestedt
Journal:  Drug Discov Today       Date:  2006-06       Impact factor: 7.851

6.  Complex Loci in human and mouse genomes.

Authors:  Pär G Engström; Harukazu Suzuki; Noriko Ninomiya; Altuna Akalin; Luca Sessa; Giovanni Lavorgna; Alessandro Brozzi; Lucilla Luzi; Sin Lam Tan; Liang Yang; Galih Kunarso; Edwin Lian-Chong Ng; Serge Batalov; Claes Wahlestedt; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Christine Wells; Vladimir B Bajic; Valerio Orlando; James F Reid; Boris Lenhard; Leonard Lipovich
Journal:  PLoS Genet       Date:  2006-04-28       Impact factor: 5.917

7.  Identification and classification of conserved RNA secondary structures in the human genome.

Authors:  Jakob Skou Pedersen; Gill Bejerano; Adam Siepel; Kate Rosenbloom; Kerstin Lindblad-Toh; Eric S Lander; Jim Kent; Webb Miller; David Haussler
Journal:  PLoS Comput Biol       Date:  2006-04-21       Impact factor: 4.475

8.  Clusters of internally primed transcripts reveal novel long noncoding RNAs.

Authors:  Masaaki Furuno; Ken C Pang; Noriko Ninomiya; Shiro Fukuda; Martin C Frith; Carol Bult; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; John S Mattick; Harukazu Suzuki
Journal:  PLoS Genet       Date:  2006-04-28       Impact factor: 5.917

9.  Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

Authors:  Norihiro Maeda; Takeya Kasukawa; Rieko Oyama; Julian Gough; Martin Frith; Pär G Engström; Boris Lenhard; Rajith N Aturaliya; Serge Batalov; Kirk W Beisel; Carol J Bult; Colin F Fletcher; Alistair R R Forrest; Masaaki Furuno; David Hill; Masayoshi Itoh; Mutsumi Kanamori-Katayama; Shintaro Katayama; Masaru Katoh; Tsugumi Kawashima; John Quackenbush; Timothy Ravasi; Brian Z Ring; Kazuhiro Shibata; Koji Sugiura; Yoichi Takenaka; Rohan D Teasdale; Christine A Wells; Yunxia Zhu; Chikatoshi Kai; Jun Kawai; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal:  PLoS Genet       Date:  2006-04       Impact factor: 5.917

10.  RNA interference is not involved in natural antisense mediated regulation of gene expression in mammals.

Authors:  Mohammad Ali Faghihi; Claes Wahlestedt
Journal:  Genome Biol       Date:  2006-05-09       Impact factor: 13.583
