Ernesto Aparicio-Puerta1, Cristina Gómez-Martín2, Stavros Giannoukakos3,4,5, José María Medina3,4,5, Chantal Scheepbouwer2,6, Adrián García-Moreno7, Pedro Carmona-Saez7, Bastian Fromm8, Michiel Pegtel2, Andreas Keller1, Juan Antonio Marchal5,9,10, Michael Hackenberg3,4,5,10.
Abstract
The NCBI Sequence Read Archive currently hosts microRNA sequencing data for over 800 different species, evidencing the broad taxonomic scope of small RNA research. Simultaneously, the number of samples per miRNA-seq study continues to increase, resulting in a vast amount of data that requires accurate, fast and user-friendly analysis methods. Since the previous release of sRNAtoolbox in 2019, 55 000 sRNAbench jobs have been submitted, which has motivated many improvements in its usability and in the scope of the underlying annotation database. With this update, users can upload an unlimited number of samples or import them from Google Drive, Dropbox or URLs. Micro- and small RNA profiling can now be carried out using high-confidence metazoan- and plant-specific databases, MirGeneDB and PmiREN respectively, together with genome assemblies and libraries from 441 Ensembl species. The new results page includes straightforward sample annotation to allow downstream differential expression analysis with sRNAde. Unassigned reads can also be explored by means of a new tool that performs mapping to microbial references, which can reveal contamination events or biologically meaningful findings, as we describe in the example. sRNAtoolbox is available at: https://arn.ugr.es/srnatoolbox/.
Year: 2022 PMID: 35556129 PMCID: PMC9252802 DOI: 10.1093/nar/gkac363
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Evolution of miRNA-seq data hosted on SRA and consequent improvements in sRNAbench. (A) Accumulated number of samples in SRA per year and the increasing trend in samples per project. (B) Number of species with publicly available miRNA-seq data on SRA. Both plots were generated by querying the database (https://www.ncbi.nlm.nih.gov/sra/) with the ‘Library selection’ field set to miRNA-seq. (C) New sRNAbench status page that allows for on-the-go annotation and direct launching of differential expression jobs. (D) New helper tool that can summarize and visualize statistics and the count matrix. The microRNA read length distributions were obtained for two parental cell lines (WT: SRR3174960, SRR3174961) and two DICER KO cell lines (KO: SRR3174967, SRR3174968), showing clear differences between these two conditions.
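As a rough illustration of the kind of summary shown in panel D, a read length distribution can be computed by counting reads per length and normalizing to percentages. The sketch below is not sRNAbench's implementation; the function name `read_length_distribution` and the toy reads are assumptions made for illustration only.

```python
from collections import Counter


def read_length_distribution(reads):
    """Return {read length: percentage of reads} for an iterable of sequences.

    Hypothetical helper for illustration; not part of sRNAtoolbox.
    """
    counts = Counter(len(seq) for seq in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by length so the result can be plotted directly as a histogram.
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Made-up reads (NOT taken from the SRA runs named in the caption).
toy_reads = ["A" * 22, "C" * 22, "G" * 21, "T" * 23]
distribution = read_length_distribution(toy_reads)
```

Computing this per sample and overlaying the curves is one simple way to compare conditions such as the WT and DICER KO lines mentioned in the caption.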
A comparison between sRNAtoolbox 2022 and 2019 databases
| sRNAtoolbox 2022 | sRNAtoolbox 2019 |
|---|---|
| Ensembl 104 (or 51 for Metazoan), 350 genomes | Ensembl v91, 97 genomes |
| Ensembl Plants v51, 91 genomes | Ensembl Plants, 48 genomes |
| NCBI Microbial Genomes (one genome per genus), 3004 genomes | NCBI Microbial Genomes (one genome per genus), 781 genomes |
| NCBI Microbial Genomes (one genome per species), 9655 genomes | Not available |
| NCBI virus (one genome per species), 10301 genomes | Not available |
| RNAcentral release 20 (snoRNA, snRNA, …) | RNAcentral release 13 |
| PmiREN, MirGeneDB, miRBase | MirGeneDB, miRBase |
| 358 | 293 |
Figure 2. (A) Genome distribution (human and bacterial collection) of PA + sample SRR10274305, (B) relative frequencies of reads mapped to different phylum reference libraries (SRR10274305), (C) relative frequencies of reads mapped to different phylum reference libraries of a healthy subject (SRR10274270), (D) results table of reads mapped in sense direction to proteobacteria reference sequences, (E) graphical representation of the 20 most conserved Arabidopsis thaliana PmiREN microRNAs, (F) plant species that contain most exact Arabidopsis thaliana miRNA sequences, (G) 20 miRNAs with most matches to animal genomes and (H) animal genomes with most exact matches of A. thaliana miRNA sequences.