
Databases and resources for human small non-coding RNAs.

Eneritz Agirre, Eduardo Eyras.

Abstract

Recent advances in high-throughput sequencing have facilitated the genome-wide studies of small non-coding RNAs (sRNAs). Numerous studies have highlighted the role of various classes of sRNAs at different levels of gene regulation and disease. The fast growth of sequence data and the diversity of sRNA species have prompted the need to organise them in annotation databases. There are currently several databases that collect sRNA data. Various tools are provided for access, with special emphasis on the well-characterised family of micro-RNAs. The striking heterogeneity of the new classes of sRNAs and the lack of sufficient functional annotation, however, make integration of these datasets a difficult task. This review describes the currently available databases for human sRNAs that are accessible via the internet, and some of the large datasets for human sRNAs from high-throughput sequencing experiments that are so far only available as supplementary data in publications. Some of the main issues related to the integration and annotation of sRNA datasets are also discussed.

Year:  2011        PMID: 21504869      PMCID: PMC3500172          DOI: 10.1186/1479-7364-5-3-192

Source DB:  PubMed          Journal:  Hum Genomics        ISSN: 1473-9542            Impact factor:   4.639


sRNA databases

In 2001, three groups published independent reports on the discovery of a new class of small non-coding RNAs (sRNAs), which were named micro-RNAs (miRNAs) [1-3]. These comprise a large family of small, ~22 nucleotide-long, non-coding RNAs that have emerged as key players in post-transcriptional gene regulation [4]. Subsequent years have witnessed the discovery of many new types of sRNAs. In humans, apart from the hundreds of miRNAs detected so far, there are also many endogenous small interfering RNAs (endo-siRNAs) [5] and piwi-interacting RNAs (piRNAs) [6,7]. These and other short non-coding RNA molecules are collectively called 'sRNAs'. They are generally short (~18-30 nucleotides [nt]); do not code for proteins; exert their function as RNA molecules, generally in combination with protein factors; and represent a substantial portion of the RNA output of cells. Moreover, sRNAs encompass a diverse, widespread and basal regulatory system: they are known to regulate genes and genomes at different levels, including chromatin structure, transcription, RNA stability and translation [8-10]. Furthermore, they can act as activators or inhibitors and their disruption has been linked to disease [11]. The explosion of information on sRNAs makes it necessary to organise these data, in terms of biogenesis, expression properties and functional characteristics, into public databases. Traditionally, GenBank [12], the European Molecular Biology Laboratory (EMBL) [13] and the DNA Data Bank of Japan (DDBJ) [14] have been the repositories of RNA sequences, while the Gene Expression Omnibus (GEO) [15] database at the National Center for Biotechnology Information (NCBI) compiles high-throughput data for miRNAs and other sRNAs from publications. Besides these generic resources, there are specialised databases for sRNAs. The most complete ones are those related to miRNAs, since their functional role in RNA metabolism is also the best characterised [5].
The miRBase database [16] (Table 1) is considered the central repository for microRNA sequence information. It contains all published miRNA sequences linked to primary literature and other secondary databases. In miRBase, the user can browse published miRNA sequences from several species and can perform searches by name, accession number etc. Another database for miRNAs is MirZ [17] (Table 1), which provides analysis tools for mining various datasets from sequencing projects [18] (Table 2) and miRNA expression profiles. MirZ integrates two previously developed resources, the smiRNAdb miRNA expression atlas [18] and the ElMMo miRNA target prediction algorithm [32].
Table 1

sRNA databases

| Database | sRNA type | Data available | Reference |
| --- | --- | --- | --- |
| miRBase | miRNAs | Sequence, genomic location and predicted targets of published miRNAs with links to references | http://www.mirbase.org/ |
| MirZ | miRNAs | Sequencing-based miRNA expression profiles and predicted targets | http://www.mirz.unibas.ch |
| IsomiR database | miRNAs and isomiRs | Reads and isomiRs assigned to miRNAs from human 293T cells, with miRNA annotation from miRBase | http://galas.systemsbiology.net/cgi-bin/isomir/find.pl |
| siRNAdb | siRNAs | Experimentally verified and predicted siRNAs. Sequence information and links to the literature | http://sirna.sbc.su.se/ |
| piRNABank | piRNAs | Sequence information, clusters and homology searches for piRNAs | http://pirnabank.ibab.ac.in/ |
| snoRNA-LBME-db | snoRNAs, scaRNAs | Sequence, expression information and predicted targets with base-pairing information | http://www-snorna.biotoul.fr/ |
| Rfam | snRNAs, snoRNAs, miRNAs, other structural RNAs | Sequence families of structural RNAs. Families represented by a multiple sequence alignment and a probabilistic model | http://rfam.sanger.ac.uk/ |
| NONCODE | miRNAs, piRNAs, snoRNAs, scaRNAs | Sequence information with links to GenBank and functional information | http://noncode.org/ |
| RNAdb | miRNAs, snoRNAs, piRNAs, other ncRNAs | Sequence information with links to literature and other databases | http://jsm-research.imb.uq.edu.au/rnadb/ |
| deepBase | miRNAs, piRNAs, endo-siRNAs, nasRNAs, pasRNAs, easRNAs, rasRNAs | Sequences and clusters for sRNAs from different tissues and for computationally predicted sRNAs | http://deepbase.sysu.edu.cn/ |
| fRNAdb | ncRNAs from various sources (see text) | Annotation of known and predicted non-coding RNAs of various lengths with visualisation in genomic context | http://www.ncrna.org |

easRNAs, exon-associated small RNAs; endo-siRNAs, endogenous small interfering RNAs; miRNAs, micro-RNAs; isomiRNAs, nasRNAs, non-coding RNA-associated small RNAs; ncRNA, non-coding RNA; pasRNAs, promoter-associated small RNAs; piRNAs, piwi-interacting RNAs; rasRNAs, repeat-associated small RNAs; scaRNAs, Cajal body-specific RNAs; siRNAs, small interfering RNAs; snRNAs, small nuclear RNAs; snoRNAs, small nucleolar RNAs; sRNA, small RNA.

Table 2

Human sRNA datasets from deep sequencing

| sRNA | Biology | Datasets |
| --- | --- | --- |
| TSSa RNAs | 20-90 nt sRNAs, localised within -250 to +50 of TSSs. Similar to PASRs. Dataset included in deepBase | GSE13483 [19] |
| PASRs | Promoter-associated small RNAs. 20-200 nt long, with 5' ends coinciding with the TSSs | GSE14362 [20] |
| tiRNAs | 18 nt length sRNAs, localised downstream of TSS | http://fantom.gsc.riken.jp/4/download/Supplemental_Materials/Taft_et_al_2009 [21] |
| spliRNAs | Nuclear sRNAs, enriched at splice sites | GSE20664 [22] |
| TASRNAs | Termini-associated sRNAs | GSE7576 [23] |
| aTASRNAs | Termini-associated sRNAs from the antisense strand | SRA012676 [24] |
| miRNAs | sRNA sequences from libraries from different human organ systems and cell types. Included in MirZ | GSE7233 [18] |
| miRNAs | miRNAs from HeLa cells. Dataset included in deepBase | GSE10829 [25] |
| miRNAs | miRNAs expressed in human leucocytes | GSE19833 [26] |
| sRNAs | sRNAs associated with AGO1 and AGO2, derived from snoRNAs and with miRNA-like functions. Dataset included in deepBase | GSE13370 [27] |
| sRNAs | Derived from tRNAs, in competition with miRNAs | [28-30] |
| sRNAs | Derived from snoRNAs, with miRNA-like functions | [31] |

AGO, Argonaute; miRNAs, micro-RNAs; sRNA, small RNA; snoRNAs, small nucleolar RNAs; tRNA, transfer RNA; TSS, transcription start site.

Although it is generally assumed that a single precursor miRNA molecule leads to a single functional miRNA, there is evidence that precursors can be processed with heterogeneous ends, giving rise to isomiRs [33]. A recently published database collects isomiR sequences from high-throughput sequencing of miRNAs from human 293T cells [34] (Table 1). This database allows one to retrieve all reads and isomiRs assigned to a specific miRNA, with results linked to miRBase and Ensembl [35].
The endo-siRNAs, which were first observed in plants, are also a very abundant class of sRNAs [10]. They share some properties regarding biogenesis and function with miRNAs [36,37]. More recently, other sources of endo-siRNAs have been identified, such as convergent mRNA transcripts and sense-antisense pairs [10]. To date, there is only one database specific for siRNAs, siRNAdb [38], which contains the collections of endogenous and exogenous siRNA molecules from the literature that have been experimentally verified. Moreover, siRNAdb also includes predicted siRNAs based on a combination of computational prediction methods [39-41]. Additionally, the database includes information about targets and experimental sources. A set of target predictions is also available for the non-experimentally verified siRNAs.
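The notion of isomiRs, that is, reads derived from the same precursor as an annotated miRNA but with shifted 5' or 3' ends, can be illustrated with a minimal sketch. It is not the method used by the isomiR database [34]: the function name, the toy precursor sequence, the exact-substring matching and the `max_shift` cut-off are all assumptions made for illustration, and real pipelines would also need to handle mismatches and non-templated additions.

```python
from collections import defaultdict


def find_isomirs(reads, precursor, mature_start, mature_end, max_shift=3):
    """Count reads by their 5'/3' end offsets from an annotated mature miRNA.

    Hypothetical helper for illustration only.
    mature_start / mature_end are 0-based, end-exclusive coordinates of the
    annotated mature miRNA on the precursor sequence.
    Returns {(offset_5p, offset_3p): read_count}; (0, 0) is the canonical form.
    """
    counts = defaultdict(int)
    for read in reads:
        pos = precursor.find(read)  # assumption: exact match, first hit only
        if pos == -1:
            continue  # read does not derive from this precursor
        offset_5p = pos - mature_start
        offset_3p = (pos + len(read)) - mature_end
        # Keep only reads whose ends lie close to the annotated ends.
        if abs(offset_5p) <= max_shift and abs(offset_3p) <= max_shift:
            counts[(offset_5p, offset_3p)] += 1
    return dict(counts)


# Toy precursor (invented sequence); annotated mature miRNA at [5, 27).
precursor = "GGCUAUGACUGAUCGGAUACCUAGUCAGGCAUUCGAAGCC"
reads = [
    precursor[5:27],   # canonical mature sequence
    precursor[5:27],   # canonical again
    precursor[5:28],   # one extra nucleotide at the 3' end
    precursor[4:27],   # one extra nucleotide at the 5' end
    "CCCCCCCCCCCCCCCCCCCCC",  # unrelated read, ignored
]
isomir_counts = find_isomirs(reads, precursor, 5, 27)
```

In this toy input, the offset pair (0, 0) collects the canonical reads, while the other keys describe 5' or 3' variants of the same miRNA.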
In 2006, a new abundant sRNA species of approximately 30 nt was observed in extracts of total RNA from mouse testes [42]. These somewhat larger types of sRNA, called piRNAs, were found to exert their function most clearly in the germline [43] and possibly in cancer cell lines [44,45]. In contrast to miRNAs and endo-siRNAs, they derive from single-stranded precursors [10]. The only database exclusively dedicated to piRNAs is piRNABank [46], which contains piRNA sequences collected from GenBank [47] and published data, indexed through unique identifiers linked to NCBI and with additional information on gene name and genomic position.
Small nucleolar RNAs (snoRNAs) are longer types of non-coding RNA (ncRNA; 60-300 nt in length), but are also regarded as sRNAs. These are a highly evolutionarily conserved class of RNAs which function mostly in the nucleolus and participate in the chemical modification of other RNAs, mainly ribosomal RNAs (rRNAs) [48]. The snoRNA-LBME-db database [49] was created to collect the available information on human snoRNAs. Besides sequence information, it also includes predicted RNA targets and potential base-pairing interactions with these targets. The data have been collected from the literature and can be accessed by the name of the snoRNA or can be downloaded in their entirety. This database includes the Cajal body-specific RNAs (scaRNAs), a class of nuclear sRNAs similar to snoRNAs which accumulate specifically in Cajal bodies and guide modifications of small nuclear RNAs (snRNAs). The latter are also considered to be sRNAs and are part of the spliceosome [50]. All of these nuclear sRNAs and other structural RNAs are present in the Rfam database (Table 1), which is a general collection of structured RNA families [51] represented by their sequences and a structural model. Rfam also includes models for miRNAs.
Recent advances in deep-sequencing technologies have yielded a large number of short RNA sequences [16,37,52,53], leading to an exhaustive detection of both known and novel sRNAs. In order to facilitate access to deep-sequencing data, the Short Read Archive (SRA) of NCBI [54] and the European Nucleotide Archive (ENA) [55] (http://www.ebi.ac.uk/ena/) have been created. They provide centralised access to published sequencing data, including sRNAs, and, more notably, all the data are accessible through powerful search tools. Many of these sRNAs, however, still await classification. They vary significantly in origin and structure and their function is often unknown, which makes it difficult to develop databases for their storage and analysis. A growing number of public databases are now available for sRNAs obtained from deep-sequencing experiments that classify them in terms of the experimental origin, function (if known) and genomic localisation. These databases try to integrate novel heterogeneous data and sometimes include curated data. The NONCODE database [56] (Table 1) provides organised information for snoRNAs, piRNAs, miRNAs, scaRNAs and other ncRNA classes present in GenBank, classified according to cellular process. Similarly, RNAdb [57] (Table 1) provides access to data from snoRNA-LBME-db, miRBase, the FANTOM project [58] and the H-invitational project [59]. All the datasets are divided according to the main sRNA classes. Additionally, RNAdb contains predictions based on comparative methods. All RNAdb data are available for downloading. The deepBase [60] database (Table 1) contains the most heterogeneous collection of sRNAs from deep-sequencing data from different libraries (Table 2) [19,25,27].
In deepBase, the deep-sequencing data have been mapped to the human genome assembly and annotated according to various sRNA classes, which are defined by the location of the mapped sRNAs: non-coding RNA-associated small RNAs (nasRNAs); promoter-associated small RNAs (pasRNAs), which overlap with the transcription start site (TSS) of genes; exon-associated small RNAs (easRNAs) [60], which overlap with exons from RefSeq [61] genes; and repeat-associated small RNAs (rasRNAs) [60], which overlap with repeat elements from the University of California, Santa Cruz (UCSC) genome browser [62]. Additionally, deepBase includes: all known miRNAs from miRBase and snoRNAs from snoRNA-LBME-db; novel predicted miRNAs and snoRNAs; and RNA clusters built from the sRNAs that are proximal in the genomic sequence. Finally, fRNAdb [63] (Table 1) is a searchable database, with functional and genomic annotation for ncRNAs of various lengths from snoRNA-LBME-db, miRBase, the FANTOM and the H-invitational projects, NONCODE and RNAdb. At the same URL, the authors provide visualisation of the annotated and predicted ncRNAs with a mirror of the UCSC browser.
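Location-based annotation of the kind described for deepBase can be sketched as a simple interval-overlap test. The sketch below is not deepBase's actual pipeline: the `Interval` type, the function names, the toy coordinates, the use of 0-based half-open coordinates and the decision to ignore strand are all assumptions for illustration. Because a mapped sRNA can overlap several feature types, the function returns every matching label.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Genomic interval; 0-based, end-exclusive (an assumed convention)."""
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_srna(srna, annotations):
    """Return the labels of all annotation sets that the mapped sRNA overlaps.

    annotations: dict mapping a class label (e.g. "easRNA") to a list of
    Interval features (e.g. exons). Hypothetical helper; strand is ignored.
    """
    labels = [
        label
        for label, features in annotations.items()
        if any(overlaps(srna, feature) for feature in features)
    ]
    return labels or ["unclassified"]


# Toy annotation with invented coordinates.
annotations = {
    "pasRNA": [Interval("chr1", 1000, 1001)],  # a TSS as a 1-nt interval
    "easRNA": [Interval("chr1", 1000, 1200)],  # an exon
    "rasRNA": [Interval("chr2", 500, 800)],    # a repeat element
}
labels = classify_srna(Interval("chr1", 990, 1012), annotations)
```

For genome-scale data, a linear scan over all features would be too slow; an interval tree or a sorted sweep would be the usual replacement, with the same overlap logic.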

sRNA datasets

The sRNA world is expanding: it includes many new sRNA sequences that share many features but also differ in others, which makes it difficult to integrate them into the same resource. Despite the current wealth of public databases, there are many recently discovered sRNA classes that are not present in any specific or general database. Below, we enumerate some of the datasets that we cannot find in the previously mentioned databases, and which are only available directly from the publications. Deep sequencing has led to the detection of at least three new classes of sRNAs linked to the region proximal to the promoter and TSS of genes: promoter-associated sRNAs, which are hypothesised to result from the transcription of independent capped short transcripts or as cleavage products of longer RNAs and which were specifically called 'PASRs' [20]; TSS-associated RNAs (TSSa RNAs) [19], included in deepBase and similar to PASRs (Table 2); and transcription initiation RNAs (tiRNAs) [21] (Table 2), which are predominantly 18 nt in length and originate mostly from the region downstream of the TSS, possibly from the backtracking of RNA polymerase II (RNAPII) at the start of transcription [64]. Interestingly, a new class of nuclear sRNAs, also of about 18 nt in length, has recently been found to be associated with the 5' splice sites of genes (spliRNAs) [22] (Table 2). The PASR, TSSa RNA and tiRNA datasets are available at NCBI GEO, and spliRNAs at the FANTOM web page [65]. A different class of sRNAs, also with positional biases, is formed by the termini-associated short RNAs (TASRs) [23] (Table 2) and their antisense counterparts (aTASRs) [24], obtained via a change of the sequencing protocol. These are found to be located antisense to the 3'-untranslated regions of genes. They are only available from the SRA at NCBI (Table 2). Interestingly, some authors have proposed that some miRNAs and other sRNAs could stem from the processing of snoRNAs by a still unknown mechanism [66-68].
Moreover, there is some evidence from deep-sequencing experiments that other sRNAs could also originate from transfer RNAs (tRNAs) [28-30]. The sRNAs derived from snoRNAs [31,68] and tRNAs [28-30] (see Table 2) are not available in any of the databases mentioned above. Additionally, there are datasets of sRNAs expressed in specific cell lines, like novel miRNAs obtained from human leucocytes [26] (Table 2), which have not yet been integrated into any database. These examples of new classes of sRNAs display diverse positional and length biases and their function is mostly uncharacterised. They are therefore difficult to integrate with each other and within the existing sRNA databases.
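The positional and length biases of the TSS-proximal classes can be made concrete with a small sketch that applies the criteria summarised in Table 2. The function name and the precise reading of each criterion are assumptions for illustration: the TSS is taken as offset 0 on the sense strand, 'within -250 to +50' is read as the whole sRNA lying inside that window, a PASR 5' end is required to coincide exactly with the TSS, and tiRNAs are required to be exactly 18 nt long although the text describes them as predominantly 18 nt. The original studies define these classes experimentally, not by such rules.

```python
def tss_proximal_classes(five_prime_offset, length):
    """Return the Table 2 class labels compatible with an sRNA's position.

    five_prime_offset: position of the sRNA 5' end relative to the TSS
        (TSS = 0, negative = upstream); sense strand assumed.
    length: sRNA length in nucleotides.
    Hypothetical helper; thresholds follow one reading of Table 2.
    """
    three_prime_offset = five_prime_offset + length - 1
    labels = []
    # TSSa RNAs: 20-90 nt, localised within -250 to +50 of the TSS.
    if 20 <= length <= 90 and five_prime_offset >= -250 and three_prime_offset <= 50:
        labels.append("TSSa RNA")
    # PASRs: 20-200 nt, 5' end coinciding with the TSS.
    if 20 <= length <= 200 and five_prime_offset == 0:
        labels.append("PASR")
    # tiRNAs: 18 nt, localised downstream of the TSS.
    if length == 18 and five_prime_offset > 0:
        labels.append("tiRNA")
    return labels


candidates = {
    "read_a": tss_proximal_classes(0, 25),     # starts at the TSS
    "read_b": tss_proximal_classes(10, 18),    # short, downstream
    "read_c": tss_proximal_classes(300, 22),   # too far downstream
}
```

A read can satisfy more than one definition, which mirrors the statement in the text that TSSa RNAs are similar to PASRs and illustrates why these datasets are hard to integrate under a single classification.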

Discussion

The sRNA data currently available, especially those obtained from deep-sequencing experiments, are very heterogeneous. Ideally, these sRNA datasets should be compiled and integrated, thereby facilitating functional and computational downstream analyses. One of the main problems with the integration of sRNA data is the lack of functional information. While miRNAs and siRNAs are well characterised in terms of how they interact with their targets [10,36,69], nothing is known so far about the mode of action of many of the new classes of sRNAs. This is further complicated by the fact that sRNAs may have multiple functions. For instance, exogenous siRNAs may affect gene expression [70] and also splicing [71] by inducing chromatin changes. Likewise, miRNAs can also affect gene expression through a similar pathway affecting chromatin [72], and other sRNAs originating from sense-antisense transcription can also trigger similar mechanisms [73]. For many of the new sRNA classes, then, we still lack a function and a target definition. Interestingly, the Argonaute (AGO) family of proteins is known to be essential for the function of miRNAs, piRNAs and endo-siRNAs [74]. It may be possible, therefore, that some of the new classes of sRNAs exert their function through one or more members of the AGO family as well, and may interact with DNA or RNA in a similar way. Further evidence on the proteins that interact with the new classes of sRNAs will help in defining their function and possible targets.
Another important issue for the classification of sRNAs is the characterisation of their biogenesis. Interestingly, some of the new sRNAs have similar positional biases. The sRNAs localised near the TSS of genes have been linked to transcription activity [21]. Likewise, TSSa RNAs and PASRs have been associated with transcription and processing of other RNAs [20]. These similarities may be indicative of a common biogenesis mechanism.
Furthermore, many sRNAs are found to be independent of the processing machinery responsible for the biogenesis of miRNAs [75]. Thus, there must be other factors involved in the generation of these sRNAs. Recent experiments have shown that there is considerable endonucleolytic activity associated with still undetermined proteins [76]. These proteins may be the key factors responsible for the biogenesis of some of the novel classes of sRNAs.
In summary, high-throughput methods, and especially deep-sequencing technologies, provide a unique opportunity to explore the wealth of RNA species in diverse cellular contexts. Different types of RNA have been characterised using these technologies, giving rise to new sRNA categories. Further experiments will be necessary to determine the function and biogenesis of these new molecules. Considering the large amount of data that are being generated, we are just touching the tip of the iceberg regarding the sRNA world. The coming years will witness new and exciting developments in molecular and computational biology in the area of sRNAs.
  76 in total

1.  Rational siRNA design for RNA interference.

Authors:  Angela Reynolds; Devin Leake; Queta Boese; Stephen Scaringe; William S Marshall; Anastasia Khvorova
Journal:  Nat Biotechnol       Date:  2004-02-01       Impact factor: 54.908

2.  Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference.

Authors:  Kumiko Ui-Tei; Yuki Naito; Fumitaka Takahashi; Takeshi Haraguchi; Hiroko Ohki-Hamazaki; Aya Juni; Ryu Ueda; Kaoru Saigo
Journal:  Nucleic Acids Res       Date:  2004-02-09       Impact factor: 16.971

3.  GenBank: update.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

4.  Improved and automated prediction of effective siRNA.

Authors:  Alistair M Chalk; Claes Wahlestedt; Erik L L Sonnhammer
Journal:  Biochem Biophys Res Commun       Date:  2004-06-18       Impact factor: 3.575

Review 5.  Small regulatory RNAs in mammals.

Authors:  John S Mattick; Igor V Makunin
Journal:  Hum Mol Genet       Date:  2005-04-15       Impact factor: 6.150

6.  A novel class of small RNAs in mouse spermatogenic cells.

Authors:  Shane T Grivna; Ergin Beyret; Zhong Wang; Haifan Lin
Journal:  Genes Dev       Date:  2006-06-09       Impact factor: 11.361

7.  Stem-cell protein Piwil2 is widely expressed in tumors and inhibits apoptosis through activation of Stat3/Bcl-XL pathway.

Authors:  Jae Ho Lee; Dorothea Schütte; Gerald Wulf; Laszlo Füzesi; Heinz-Joachim Radzun; Stephan Schweyer; Wolfgang Engel; Karim Nayernia
Journal:  Hum Mol Genet       Date:  2005-12-23       Impact factor: 6.150

8.  Small interfering RNA-induced transcriptional gene silencing in human cells.

Authors:  Kevin V Morris; Simon W-L Chan; Steven E Jacobsen; David J Looney
Journal:  Science       Date:  2004-08-05       Impact factor: 47.728

9.  snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs.

Authors:  Laurent Lestrade; Michel J Weber
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  siRNAdb: a database of siRNA sequences.

Authors:  Alistair M Chalk; Richard E Warfinge; Patrick Georgii-Hemming; Erik L L Sonnhammer
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
