NONCODE: an integrated knowledge database of non-coding RNAs.

Changning Liu, Baoyan Bai, Geir Skogerbø, Lun Cai, Wei Deng, Yong Zhang, Dongbo Bu, Yi Zhao, Runsheng Chen.

Abstract

NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is to say, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from literature and GenBank, and were later manually curated. The distinctive features of NONCODE are as follows: (i) The ncRNAs in NONCODE include almost all the types of ncRNAs, except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting relevant literature: more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function in which a given ncRNA is involved, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequence, regulatory elements in the flanking sequences, secondary structure, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn.

Year:  2005        PMID: 15608158      PMCID: PMC539995          DOI: 10.1093/nar/gki041

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Traditionally, most RNA molecules were regarded as carriers conveying information from the gene to the translation machinery. The most prominent exceptions to this are transfer RNA (tRNA) and ribosomal RNA (rRNA), both of which are directly involved in the process of translation. However, since the late 1990s, it has been widely acknowledged that other types of non-protein-coding RNA molecules are present in organisms ranging from bacteria to mammals, which affect a large variety of processes including plasmid replication, phage development, bacterial virulence, chromosome structure, DNA transcription, RNA processing and modification, development control and others (1–16). These observations suggest that the traditional view of the structure of the genetic regulatory systems in organisms is far from complete. Therefore, further research on non-protein-coding RNA will give us a new framework for considering and understanding the genomic programming of biological complexity. However, the unsystematic naming of non-protein-coding RNAs may be an impediment to effective research. The term small RNAs (sRNAs) has been predominantly used for such RNAs in bacteria, whereas the term non-coding RNAs (ncRNAs) has been the most common term for eukaryotic RNAs of this kind (17,18). To have a common term for all such RNAs, we have opted to apply the term ncRNA to all these functional RNAs, irrespective of the realm of life in which they might appear. The understanding of the importance of ncRNAs in basic cellular processes is ever increasing, and new members and classes of ncRNAs are continuously being reported. Thus, over the years, several databases have been established to collect, organize and classify ncRNA sequences and information. 
Some databases are intended to collect only certain categories of ncRNAs, such as SRP RNAs, tmRNAs or RNase P RNAs, whereas others, such as the Small RNA Database, the Non-coding RNA Database and the Rfam Database, have collected ncRNAs of several categories (19–24). However, even in databases of the latter kind, certain ncRNA members or classes are missing. Another problem with all the current databases is that the classification systems for ncRNAs in current use are not uniform, and only a few attempts have been made to integrate them. In these classification systems, some ncRNA groups are named according to cellular localization, such as snRNAs, snoRNAs or scRNAs; some are named according to function, like pRNAs (package RNAs), gRNAs (guide RNAs) or tmRNAs (transfer-messenger RNAs); and still others are simply labeled according to their sedimentation coefficients (6S RNA, 5.3S RNA, etc.). Furthermore, because of this lack of integration, one type of ncRNA often appears under several names or in more than one category (7,12,25–30).
The ncRNA database NONCODE was created against this background. NONCODE comprises almost all ncRNAs now publicly available (except tRNAs and rRNAs) that are either confirmed experimentally or predicted computationally. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Furthermore, to integrate existing classification systems, a new classification system labeled the process function class (PfClass) has been introduced, based on the cellular process and function in which a given ncRNA is involved. PfClass provides a unified classification system and a concise functional annotation of ncRNAs. According to the cellular process involved, each of the 5339 ncRNAs was assigned one or more of 26 PfClasses. The PfClass classification system is the first attempt at a unified classification system for ncRNAs, and it is our hope that this integrated system will help in clearing up the classification problem. In conclusion, the aim of the NONCODE database is to be a unified gateway to search, retrieve and update information about ncRNAs in order to facilitate research on ncRNAs, gene networks and functional genomics. Access is free for all users through a user-friendly web interface at http://noncode.bioinfo.org.cn.

METHODS AND IMPLEMENTATION

NONCODE pipeline

GenBank entries were the major source of data, and the PubMed database was used as the starting point for the data collection (31). PubMed was first filtered using queries from a table of keywords, which includes ‘ncRNA’, ‘snoRNA’, ‘snRNA’, ‘tmRNA’, ‘SRP RNA’, ‘gRNA’, etc. The publications that matched these queries were then examined, and the ncRNA sequences were extracted from them. Reading the filtered literature yielded a new set of ncRNA keywords, which was added to the keywords table. This extended keywords table was used to filter the GenBank BCT, INV, MAM, PHG, PLN, PRI, ROD, VRL and VRT divisions automatically, and the filtered result was then manually confirmed. The original sequence and annotation information were imported into a MySQL database. All the data are integrated and organized in such a manner that users can efficiently query and browse information.
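The keyword-filtering step can be illustrated with a minimal Python sketch. It is not the in-house program used for NONCODE: the record layout (a dict with a `description` field), the function names and the token-matching rule are all assumptions made for illustration, and only the six keywords quoted above are included.

```python
import re

# Seed keywords quoted in the text; the real keywords table was larger ("etc.").
SEED_KEYWORDS = ["ncRNA", "snoRNA", "snRNA", "tmRNA", "SRP RNA", "gRNA"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole token."""
    # Longest keywords first, so a longer keyword wins over a shorter prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # The lookarounds stop 'ncRNA' from matching inside 'lncRNA';
    # the optional 's' accepts simple plurals such as 'tmRNAs'.
    return re.compile(
        rf"(?<![A-Za-z0-9])(?P<kw>{alternatives})s?(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def filter_records(records, keywords):
    """Return (record, matched_keywords) pairs for records that mention a keyword.

    `records` is an iterable of dicts with a hypothetical 'description' field.
    """
    pattern = build_pattern(keywords)
    hits = []
    for record in records:
        found = sorted(
            {m.group("kw").lower() for m in pattern.finditer(record["description"])}
        )
        if found:
            hits.append((record, found))
    return hits
```

Records returned by `filter_records` would correspond to the candidates that were then confirmed manually; the sketch does nothing to replace that curation step.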

NONCODE annotation

One significant characteristic of NONCODE is the additional information on the ncRNAs that it draws from the related literature. Briefly, seven steps were carried out after the GenBank screening. (i) For each sequence filtered from GenBank, we manually checked whether or not it represented an actual ncRNA and assigned the confirmed sequence an accession number (NcID, i.e. ncRNA id). (ii) Basic information (name, alias, length, organisms, references, etc.) of confirmed sequences was collected from GenBank. (iii) Additional information concerning function, cellular role, cellular location, etc. was included by consulting relevant literature. Each ncRNA has also been annotated with one of the five specific mechanisms (sequence base pairing, structural complementarity, spatial blocking, catalysis or epimodification) through which it exerts its function. (iv) According to our PfClass classification system, one or more of the 26 PfClasses were assigned to all ncRNAs. Moreover, a subset of 1114 ncRNAs has been divided into nine additional categories according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) To visualize the location of an ncRNA in the genome or in a specific DNA fragment, along with regulatory elements in the flanking sequences, GenBank annotations were used to create figures for all ncRNAs. (vi) Each ncRNA sequence was checked for redundancies using Perl scripts, and each cluster of redundant sequences was given a non-redundant accession number (UniqID, i.e. unique ncRNA id). (vii) The secondary structures of non-redundant ncRNA sequences were predicted using the Vienna RNA Package (32). The predicted structures are available in PDF format through the website.
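Step (vi) can be sketched as follows. The original check was done with Perl scripts whose redundancy criterion is not described here; this Python sketch assumes, purely for illustration, that two entries are redundant when their sequences are identical after normalization. The identifier strings (`nc1`, `uniq1`) and function names are hypothetical.

```python
from collections import defaultdict


def normalize(seq):
    """Uppercase, drop whitespace, and write the sequence in the DNA alphabet (U -> T)."""
    return "".join(seq.split()).upper().replace("U", "T")


def assign_uniq_ids(entries):
    """Map each NcID to a UniqID shared by all entries with the same normalized sequence.

    `entries` is a list of (nc_id, sequence) pairs.
    Assumption: redundancy means exact identity after normalization.
    """
    clusters = defaultdict(list)
    for nc_id, seq in entries:
        clusters[normalize(seq)].append(nc_id)

    mapping = {}
    # Dicts preserve insertion order, so UniqIDs follow order of first appearance.
    for index, members in enumerate(clusters.values(), start=1):
        for nc_id in members:
            mapping[nc_id] = f"uniq{index}"
    return mapping
```

A looser criterion (for example, near-identical sequences or subsequence containment) would need an alignment or clustering tool instead of a dictionary lookup.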

NONCODE process function classification

Since the beginning of ncRNA research, there has been no integrated system for classification, and considerable confusion therefore exists with respect to the naming of ncRNAs. This frequently causes difficulties when ncRNAs from different sources are collected for analysis. Therefore, when the NONCODE database was established, careful consideration was given to classification criteria that might increase the usefulness of the database resource. The cellular process and function of an ncRNA was chosen as the basic criterion for a unified classification system called PfClass in NONCODE. When labeled according to this system, each kind of ncRNA is named after its cellular process and corresponding function. The actual category is given according to two or three levels of keywords connected by an underscore. The first keyword will be DNA, RNA or Protein, representing a cellular process in which one of the three molecular types is a crucial component. The second keyword describes the actual process, and if the ncRNA is involved in a complex process with several aspects, a third keyword may further indicate a more specific function of the ncRNA. For example, the snRNA U1 will be assigned to the PfClass RNA_processing_splicing, and RNase P RNAs to the PfClass RNA_processing_cleavage (for details see Table 1).

Table 1. PfClasses in NONCODE v1.0 and their corresponding traditional classes

PfClass | Corresponding traditional classes
DNA_imprinting | XIST, roX, H19, MHM, KvLQT1-AS, Tsix, Air
DNA_packaging | pRNA
DNA_repair | RNA a, b, c, d
DNA_replication_initiation | RNAII
DNA_replication_regulation | ctRNA, RNA I
DNA_replication_repression | incA, RNA I
DNA_stability | telomerase RNA
DNA_transcription_initiation | RNA II
DNA_transcription_regulation | inc RNA, copA RNA, SRA
DNA_transcription_regulation of RNA polymerase | 6S RNA, 7SK
DNA_transcription_repression | RNAI, GcvB RNA
RNA_editing | gRNA
RNA_modification_methylation | snoRNA
RNA_modification_methylation&pseudouridylation | scaRNA
RNA_modification_pseudouridylation | snoRNA
RNA_processing_cleavage | RNase P RNA, RNase MRP RNA, snoRNA
RNA_processing_splicing | snRNA, self-splicing ribozyme RNA, PAN
RNA_reverse_transcription | msr RNA
RNA_translation_enhancement | csrB RNA, DsrA RNA
RNA_translation_regulation | ANTI-RAF1, RprA, sok RNA, VA RNA, RyhB, sar RNA, NaPi-2b1, 5.3S RNA, aHIF
RNA_translation_suppression | miRNA, DicF, Spot 42, Finp, MicF, OxyS, flmB, PrrB_RsmZ, NTT, GcvB RNA, etc.
RNA_translation_surveillance | tmRNA
RNA_translocation | ScYC RNA, hsr-omega RNA, Xlsirt
Protein_transport | SRP_7SL RNA, SRP_4.5S RNA
Miscfunction_mRNAlike | BORG, IGF2AS, CR20, meuRNA, Rian, Ks-1, GNAS1-as RNA, IPW, etc.
Miscfunction_snm | Bsr RNA, Y RNA, dsrB, vault RNA, 4.5S RNA, 6Sa RNA, G8, etc.

The first column lists the PfClasses. Each PfClass is given according to two or three levels of keywords connected by an underscore (‘_’). The first keyword will be DNA, RNA or Protein, representing a cellular process in which one of the three molecular types has a crucial function. The second keyword describes the actual process, and if the ncRNA is involved in a complex process with several aspects, a third keyword may further indicate a more specific function of the ncRNA. The second column lists the corresponding traditional classes.
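The naming convention can be made concrete with a small Python sketch that splits a PfClass label into its levels. The `PfClass` tuple, its field names and `parse_pfclass` are hypothetical names introduced here, not part of NONCODE; `Miscfunction` is accepted as a first-level keyword only because it appears in Table 1.

```python
from typing import NamedTuple, Optional

# First-level keywords named in the text, plus the 'Miscfunction' prefix seen in Table 1.
TOP_LEVELS = {"DNA", "RNA", "Protein", "Miscfunction"}


class PfClass(NamedTuple):
    molecule: str            # first-level keyword: DNA, RNA, Protein or Miscfunction
    process: str             # second-level keyword: the cellular process
    function: Optional[str]  # optional third-level keyword: a more specific function


def parse_pfclass(label):
    """Split a PfClass label on underscores into two or three levels."""
    # maxsplit=2 keeps any further underscores inside the third level.
    parts = label.split("_", 2)
    if len(parts) < 2 or parts[0] not in TOP_LEVELS:
        raise ValueError(f"not a PfClass label: {label!r}")
    function = parts[2] if len(parts) == 3 else None
    return PfClass(parts[0], parts[1], function)
```

For example, `parse_pfclass("RNA_processing_splicing")` yields the levels `RNA`, `processing` and `splicing`, while `parse_pfclass("DNA_packaging")` has no third level. A purely syntactic split has limits: a label such as RNA_reverse_transcription is divided into `reverse` and `transcription`, although the two words arguably name a single process.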

The PfClass classification system represents the first attempt at a unified classification system for ncRNAs. In the future, as our understanding of ncRNAs deepens and the content of NONCODE expands, steps will be taken to extend and refine the PfClass system in order to increase its usefulness. To further harmonize the exchange of data between different systems, application of Gene Ontology (GO) (33) annotation to our PfClass system will be considered.

CURRENT STATUS AND FUTURE DEVELOPMENTS

To date, more than 10 000 sequences filtered from GenBank by our in-house program have been manually examined. The current release (v1.0) of NONCODE contains a total of 6232 entries assigned to 26 PfClasses, and covers 109 traditional classes such as snRNA, snoRNA, microRNA and RNase P RNA. More than 80% of the entries are based on experimental data. Basic information on each entry is provided, including accession number in GenBank, traditional class, name, PfClass, organism, reference, UniqID (accession number without redundancy in NONCODE) and NcID (accession number with redundancy in NONCODE), all of which can be used as keywords for data search. NONCODE also provides additional information on function and cellular role, cellular location, chromosomal information, alternative names, secondary structure and whether or not the ncRNA has undergone splicing. Each ncRNA has also been annotated with one of the five specific mechanisms (sequence base pairing, structural complementarity, spatial blocking, catalysis or epimodification) through which it exerts its function. Figures showing genomic locations for all ncRNAs and their regulatory elements have been included, and a subdivision into nine additional classes (outside the PfClass system) has also been applied to a number of ncRNAs. NONCODE also offers an efficient search option, allowing recovery of sequence, related publications and other information.
In the near future, several aspects of NONCODE will be improved. (i) For a number of ncRNAs, information on function, location, etc. is still lacking, and this information will be completed as soon as it becomes available. (ii) As the information on ncRNAs increases and the content of NONCODE expands, the PfClass system will be extended and refined in order to increase its usefulness. GO annotation of the PfClass system will also be seriously considered, with the aim of harmonizing the exchange of data between the different systems. (iii) Additional services such as BLAST alignment, ncRNA prediction and the submission and registration of users' sequences will be provided. In addition, two large-scale screens for novel ncRNAs in Caenorhabditis elegans and human tissues are being carried out in our laboratory (Y. Wang, Z.Y. Sun, Y. Zhao, C.N. Liu, G. Skogerbø, W. Deng, Z. Fu, Y.D. Wang, L. Cai and H.S. He, unpublished data), and the results will be added in the next version of NONCODE. NONCODE is thus designed to adapt to and reflect the most current information available on ncRNAs. It will continue to grow in both content and functionality, and will be updated every six months to include any new data from literature and GenBank.
  32 in total

1.  6S RNA regulates E. coli RNA polymerase activity.

Authors:  K M Wassarman; G Storz
Journal:  Cell       Date:  2000-06-09       Impact factor: 41.582

Review 2.  Small nucleolar RNA-guided post-transcriptional modification of cellular RNAs.

Authors:  T Kiss
Journal:  EMBO J       Date:  2001-07-16       Impact factor: 11.598

Review 3.  Non-coding RNA genes and the modern RNA world.

Authors:  S R Eddy
Journal:  Nat Rev Genet       Date:  2001-12       Impact factor: 53.242

Review 4.  Spliceosomal UsnRNP biogenesis, structure and function.

Authors:  C L Will; R Lührmann
Journal:  Curr Opin Cell Biol       Date:  2001-06       Impact factor: 8.382

5.  An expanding universe of noncoding RNAs.

Authors:  Gisela Storz
Journal:  Science       Date:  2002-05-17       Impact factor: 47.728

6.  Noncoding regulatory RNAs database.

Authors:  Maciej Szymański; Volker A Erdmann; Jan Barciszewski
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

7.  SRPDB: Signal Recognition Particle Database.

Authors:  Magnus Alm Rosenblad; Jan Gorodkin; Bjarne Knudsen; Christian Zwieb; Tore Samuelsson
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

8.  7SK small nuclear RNA binds to and inhibits the activity of CDK9/cyclin T complexes.

Authors:  V T Nguyen; T Kiss; A A Michels; O Bensaude
Journal:  Nature       Date:  2001-11-15       Impact factor: 49.962

9.  The 7SK small nuclear RNA inhibits the CDK9/cyclin T1 kinase to control transcription.

Authors:  Z Yang; Q Zhu; K Luo; Q Zhou
Journal:  Nature       Date:  2001-11-15       Impact factor: 49.962

10.  Differential antisense transcription from the Dictyostelium EB4 gene locus: implications on antisense-mediated regulation of mRNA stability.

Authors:  M Hildebrandt; W Nellen
Journal:  Cell       Date:  1992-04-03       Impact factor: 41.582
