Jian-Hua Yang, Peng Shao, Hui Zhou, Yue-Qin Chen, Liang-Hu Qu.
Abstract
Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploration of these massive data remains a daunting challenge. In this study, we describe a novel database, deepBase, which we have developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data. The current release of deepBase contains deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. By analyzing approximately 14.6 million unique reads that perfectly mapped to more than 284 million genomic loci, we annotated and identified approximately 380,000 unique ncRNA-associated small RNAs (nasRNAs), approximately 1.5 million unique promoter-associated small RNAs (pasRNAs), approximately 4.0 million unique exon-associated small RNAs (easRNAs) and approximately 6 million unique repeat-associated small RNAs (rasRNAs). Furthermore, 2038 miRNA and 1889 snoRNA candidates were predicted by miRDeep and snoSeeker. All of the mapped reads can be grouped into about 1.2 million RNA clusters. For the purpose of comparative analysis, deepBase provides an integrative, interactive and versatile display. A convenient search option, related publications and other useful information are also provided for further investigation. deepBase is available at: http://deepbase.sysu.edu.cn/.
Year: 2009 PMID: 19966272 PMCID: PMC2808990 DOI: 10.1093/nar/gkp943
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The basic framework of deepBase. All results generated by deepBase are deposited in relational databases and displayed in the visual browser and web page. The web-interface programmes and browser can be accessed by a wide range of research biologists to analyze and visualize data over the internet.
Statistics in deepBase
| | Human | Mouse | Chicken | C. intestinalis | D. melanogaster | C. elegans | Arabidopsis |
|---|---|---|---|---|---|---|---|
| Small RNA library | 9 | 63 | 4 | 4 | 31 | 25 | 49 |
| Unique read | 1 456 537 | 1 490 531 | 137 801 | 340 879 | 2 522 289 | 3 156 821 | 5 478 930 |
| Locus number | 22 437 894 | 215 546 228 | 782 488 | 3 590 208 | 19 760 563 | 7 402 057 | 14 613 634 |
| nasRNA | 49 703 | 99 657 | 10 370 | 5448 | 63 565 | 137 904 | 12 507 |
| pasRNA | 62 791 | 105 413 | 5633 | 46 411 | 142 645 | 459 139 | 697 750 |
| easRNA | 160 347 | 354 524 | 6666 | 1687 | 751 728 | 1 990 763 | 674 086 |
| rasRNA | 616 070 | 658 476 | 8099 | 34 300 | 1 409 439 | 293 658 | 2 907 928 |
| RNA cluster | 151 245 | 538 138 | 8801 | 62 583 | 77 113 | 215 226 | 114 235 |
| Predicted miRNA | 705 | 588 | 275 | / | 134 | 336 | / |
| Predicted snoRNA | 378 | 603 | 124 | 263 | 145 | 197 | 179 |
Numbers of small RNA libraries, unique reads mapped to one or more loci, loci, ncRNA-associated small RNAs (nasRNAs), promoter-associated small RNAs (pasRNAs), exon-associated small RNAs (easRNAs), repeat-associated small RNAs (rasRNAs), RNA clusters, and predicted miRNAs and snoRNAs for the seven organisms, in column order: human, mouse, chicken, C. intestinalis, D. melanogaster, C. elegans and Arabidopsis. A slash (/) marks an organism for which no predicted miRNA count is given. Arabidopsis miRNA data are not present in the table because miRDeep (5) cannot effectively predict plant miRNAs. C. intestinalis miRNAs have been predicted previously by miRDeep (11).
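As a quick consistency check, the per-organism counts in the table can be summed and compared with the totals quoted in the abstract. The sketch below is only an illustration: the `TABLE` dictionary, the `row_total` helper and the `TOTALS` name are introduced here and are not part of deepBase. It assumes the columns follow the organism order given in the table legend and treats the "/" cells as missing values.

```python
# Counts transcribed from the statistics table. Assumed column order (from the legend):
# human, mouse, chicken, C. intestinalis, D. melanogaster, C. elegans, Arabidopsis.
# None stands for a "/" cell (no predicted miRNA count reported for that organism).
TABLE = {
    "Small RNA library": [9, 63, 4, 4, 31, 25, 49],
    "Unique read": [1456537, 1490531, 137801, 340879, 2522289, 3156821, 5478930],
    "Locus number": [22437894, 215546228, 782488, 3590208, 19760563, 7402057, 14613634],
    "nasRNA": [49703, 99657, 10370, 5448, 63565, 137904, 12507],
    "pasRNA": [62791, 105413, 5633, 46411, 142645, 459139, 697750],
    "easRNA": [160347, 354524, 6666, 1687, 751728, 1990763, 674086],
    "rasRNA": [616070, 658476, 8099, 34300, 1409439, 293658, 2907928],
    "RNA cluster": [151245, 538138, 8801, 62583, 77113, 215226, 114235],
    "Predicted miRNA": [705, 588, 275, None, 134, 336, None],
    "Predicted snoRNA": [378, 603, 124, 263, 145, 197, 179],
}


def row_total(values):
    """Sum one table row, skipping missing ("/") cells."""
    return sum(v for v in values if v is not None)


# Hypothetical helper structure: row label -> total across the seven organisms.
TOTALS = {label: row_total(values) for label, values in TABLE.items()}

if __name__ == "__main__":
    for label, total in TOTALS.items():
        print(f"{label:<18} {total:>12,d}")
```

The library, predicted miRNA and predicted snoRNA rows sum to exactly 185, 2038 and 1889, the figures stated in the abstract; the remaining rows agree with the abstract's rounded values (for example, about 14.6 million unique reads and more than 284 million loci).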
Figure 2. Snapshot of the deepView browser. (a) The controls directly underneath position the browser over a specific region in the genome. (b) RNA genes from Ensembl or the literature. (c) RefSeq genes. (d) microRNA genes from miRBase v13. (e) RNA clusters generated by this study. (f) The snoRNAs predicted from deep sequencing data using snoSeeker. (g) The miRNA genes predicted from deep sequencing data using miRDeep. (h) Strand-specific cluster expression peaks (mapped small RNA density) generated for diverse tissues and cell lines. (i) Reads mapped to the genome.