The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs.

Matthew B Stocks¹, Irina Mohorianu¹, Matthew Beckers¹, Claudia Paicu¹, Simon Moxon², Joshua Thody¹, Tamas Dalmay², Vincent Moulton¹.

Abstract

Motivation: RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs (sRNAs). Recent technical advances have enabled the analysis of larger, more complex datasets and the investigation of microRNAs and the less well-known small interfering RNAs. However, the size and intricacy of current data require a comprehensive set of tools able to discriminate patterns from low-level, noise-like variation. Numerous and varied suggestions from the community represent an invaluable source of ideas for future tools, so the ability of the community to contribute to this software is essential.
Results: We present a new version of the UEA sRNA Workbench, reconfigured to allow the easy insertion of new tools and workflows. In its released form, it comprises a suite of tools in a user-friendly environment, with enhanced capabilities for the comprehensive processing of sRNA-seq data, e.g. tools for the accurate prediction of sRNA loci (CoLIde) and miRNA loci (miRCat2), as well as workflows that guide users through the common first steps of sRNA-seq analyses, such as quality checking of the input data, normalization of abundances and detection of differential expression.
Availability and implementation: The UEA sRNA Workbench is available at: http://srna-workbench.cmp.uea.ac.uk. The source code is available at: https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2018 PMID: 29722807 PMCID: PMC6157081 DOI: 10.1093/bioinformatics/bty338

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs (sRNAs). These can be classified into microRNAs (miRNAs) and small interfering RNAs (siRNAs), which differ in both biogenesis and mode of action (Carthew and Sontheimer, 2009). sRNAs play key roles in gene regulation in eukaryotes (Wilson and Doudna, 2013). Recent advances in high-throughput sequencing, in both sequencing depth and the number of available samples and replicates, have enabled the analysis of larger, more complex datasets. Their analysis requires a comprehensive set of tools able to distinguish patterns from low-level, noise-like variation. Drawbacks of existing tools include the limited transferability of mRNA-seq methods to sRNA-seq data (Soneson and Delorenzi, 2013), a focus on particular aspects of the analysis (e.g. the prediction of miRNAs) and the limited number of analyses available within a single suite of tools (Rueda et al.). To address this, we have expanded the functionality of the UEA sRNA Workbench (Stocks et al.) with new features that facilitate its use on a wide variety of hardware and enable the seamless linking of its stand-alone components (Beckers et al.). We have also reconfigured the code to facilitate future development by members of the sRNA/miRNA community.

2 New features and usage

We now describe the main additions in the new version of the Workbench; earlier features are described in Stocks et al. We also present features that enhance the usability and versatility of the software, such as pre-configured templates and the availability of the software on Amazon Web Services (AWS).

2.1 Software architecture and usage

The Workbench is implemented in Java, a cross-platform language that provides portability across operating systems; all dependencies are precompiled for each platform, removing the need for additional configuration. The reconfigured source code follows a modular design built around a set of interfaces and classes, all of which can be extended. The Workbench supports a variety of file types, from raw data (sequencing files) to processed files. Raw data are accepted as FASTQ and/or FASTA files (Cock et al.). Alignment files, whether generated by or imported into the Workbench, can be in PatMaN, SAM (Li et al.) or Bowtie format (Ziemann et al.), as requested by users of previous versions. Reference sequences (e.g. genomes, transcriptomes) are accepted in FASTA or indexed formats. Annotation information is currently read from GFF files.

Previous versions held sequencing data (raw or indexed) in conventional in-memory data structures. As larger experiments have become affordable, the size of raw and processed files has grown proportionally, and so have the memory requirements. To reduce the memory footprint, the Workbench now uses secondary storage in a relational database.

In addition, the main interface was redesigned to facilitate the chaining of tools into workflows. Each node (tool) performs a specific processing or analysis task, and graphical summaries are presented when a node completes. To configure a workflow, a wizard guides the user through specifying the input data (samples with or without replicates) and the reference genome. Once setup is complete, the structure of the project can be inspected as a tree diagram whose leaf nodes represent the data files.

Starting with version 4.4, the source code of the UEA sRNA Workbench is available on GitHub, enabling the community to amend existing tools and to develop new tools that link seamlessly to the existing framework. To simplify the technical aspects of integrating new features, we provide a 'template' tool/workflow that already contains the integration functionality and can be extended by users.
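The Workbench's actual database schema is not described here. As a minimal sketch of the general idea of keeping collapsed read abundances in a relational store instead of in memory, the Python example below uses the standard-library `sqlite3` module. The table layout and all function names are hypothetical; the point is that aggregates such as size-class totals are computed by the database rather than by loading whole libraries.

```python
import sqlite3
from collections import Counter


def open_store(path):
    """Open (or create) a store of per-sample sequence abundances.

    Hypothetical schema: one row per (sample, sequence) pair.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS abundance ("
        " sample TEXT NOT NULL, sequence TEXT NOT NULL, count INTEGER NOT NULL,"
        " PRIMARY KEY (sample, sequence))"
    )
    return conn


def load_sample(conn, sample, reads):
    """Collapse redundant reads and write their abundances for one sample."""
    counts = Counter(reads)
    conn.executemany(
        "INSERT INTO abundance (sample, sequence, count) VALUES (?, ?, ?)",
        [(sample, seq, n) for seq, n in counts.items()],
    )
    conn.commit()


def size_class_totals(conn, sample):
    """Redundant read count per sequence length, aggregated inside the database."""
    rows = conn.execute(
        "SELECT LENGTH(sequence), SUM(count) FROM abundance"
        " WHERE sample = ? GROUP BY LENGTH(sequence) ORDER BY LENGTH(sequence)",
        (sample,),
    )
    return dict(rows.fetchall())


def abundance_across_samples(conn, sequence):
    """Look up one sequence in every sample without loading whole libraries."""
    rows = conn.execute(
        "SELECT sample, count FROM abundance WHERE sequence = ? ORDER BY sample",
        (sequence,),
    )
    return dict(rows.fetchall())


conn = open_store(":memory:")  # demo only; pass a file path (e.g. "project.db") to store data on disk
load_sample(conn, "wt_rep1", ["A" * 21, "A" * 21, "C" * 22])
load_sample(conn, "mut_rep1", ["A" * 21])
print(size_class_totals(conn, "wt_rep1"))        # {21: 2, 22: 1}
print(abundance_across_samples(conn, "A" * 21))  # {'mut_rep1': 1, 'wt_rep1': 2}
```

With a file-backed connection, only the rows needed by a query are read from disk, which is what keeps the memory footprint independent of library size.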

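The Workbench's real Java interfaces are not reproduced here. The following Python sketch only illustrates the chaining idea described above: each node returns data for the next node and emits a summary on completion, and a template class gives new tools a starting point to override. All class names (`Node`, `TemplateNode`, `Workflow`, `LengthFilter`, `Collapse`) are hypothetical.

```python
from abc import ABC, abstractmethod


class Node(ABC):
    """Hypothetical base class for one workflow step (tool)."""

    name = "node"

    @abstractmethod
    def run(self, data):
        """Process `data` and return the input for the next node."""

    def summary(self, data):
        """Post-completion summary; here simply the number of items produced."""
        return f"{self.name}: n={len(data)}"


class TemplateNode(Node):
    """Starting point for a new tool: subclass it and override run()."""

    name = "template"

    def run(self, data):
        return data  # pass-through until overridden


class Workflow:
    """Runs nodes in order, feeding each node's output to the next."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def execute(self, data):
        summaries = []
        for node in self.nodes:
            data = node.run(data)
            summaries.append(node.summary(data))
        return data, summaries


class LengthFilter(TemplateNode):
    name = "length_filter"

    def __init__(self, min_len=18, max_len=26):
        self.min_len, self.max_len = min_len, max_len

    def run(self, data):
        return [s for s in data if self.min_len <= len(s) <= self.max_len]


class Collapse(TemplateNode):
    name = "collapse"

    def run(self, data):
        return sorted(set(data))  # redundant reads -> distinct sequences


result, log = Workflow([LengthFilter(), Collapse()]).execute(["A" * 21, "A" * 21, "C" * 30])
print(log)  # ['length_filter: n=2', 'collapse: n=1']
```

Keeping the node contract to "data in, data out, plus a summary" is what lets independently written tools be linked in any order that their data types allow.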
2.2 Improved stand-alone helper tools

The Workbench helper tools include an adapter removal tool for sRNA-seq data; adapter sequences for several Illumina/454 versions are pre-loaded, and custom adapters can be specified. A new feature is the ability to process libraries prepared with bespoke adapters designed to reduce sequencing bias (e.g. high-definition adapters). The filtering tool was enhanced and added to a stand-alone workflow; it allows unwanted sequences (e.g. tRNA/rRNA fragments, degradation fragments) to be excluded from multiple datasets and reads that do not match the transcriptome to be selected. Both tools summarize their output as a histogram of abundances per size class.
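A minimal sketch of 3' adapter trimming, assuming exact matching (a partial adapter is accepted where the read ends) and assuming that high-definition (HD) adapters leave a fixed number of random nucleotides at each end of the insert, which are removed after trimming. The parameter names (`min_overlap`, `hd_random_nt`), the length limits and the example sequences are illustrative assumptions, not the Workbench's settings.

```python
def find_adapter(read, adapter, min_overlap=8):
    """Return the start of the 3' adapter in `read`, or -1 if it is absent.

    Exact matching only: the full adapter must match, except near the read
    end, where an adapter prefix of at least `min_overlap` nt is accepted
    because the read ran out before the adapter did.
    """
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:i + len(adapter)]
        if adapter.startswith(tail):
            return i
    return -1


def trim_read(read, adapter, min_overlap=8, hd_random_nt=0, min_len=16, max_len=35):
    """Trim the 3' adapter and, for HD-style libraries, the random flanking nt.

    Returns the insert, or None if no adapter is found or the insert length
    falls outside [min_len, max_len].
    """
    start = find_adapter(read, adapter, min_overlap)
    if start < 0:
        return None  # one possible policy: discard reads without an adapter
    insert = read[:start]
    if hd_random_nt:
        # Hypothetical HD handling: drop the random nt contributed by both adapters.
        insert = insert[hd_random_nt:len(insert) - hd_random_nt]
    if not (min_len <= len(insert) <= max_len):
        return None
    return insert


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example 3' adapter sequence
SRNA = "TCGGACCAGGCTTCATTCCCC"     # example 21-nt insert

print(trim_read(SRNA + ADAPTER[:15], ADAPTER))  # TCGGACCAGGCTTCATTCCCC
print(trim_read("ACGT" + SRNA + "GATC" + ADAPTER[:10], ADAPTER, hd_random_nt=4))  # TCGGACCAGGCTTCATTCCCC
```

Production trimmers usually also tolerate sequencing errors in the adapter; exact matching keeps the sketch short.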

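The filtering step and the size-class summary shared by both helper tools can be sketched as below. Matching against unwanted references is a plain forward-strand substring test, a deliberate simplification; a real filter would align reads to the references. The reference sequence and length limits are made up for illustration.

```python
from collections import Counter


def filter_reads(reads, unwanted_refs, min_len=16, max_len=35):
    """Keep reads of acceptable length that match none of the unwanted references.

    Exact forward-strand substring matching only (a simplification of alignment).
    """
    kept = []
    for read in reads:
        if not (min_len <= len(read) <= max_len):
            continue
        if any(read in ref for ref in unwanted_refs):
            continue  # e.g. a tRNA/rRNA fragment
        kept.append(read)
    return kept


def size_class_histogram(reads, redundant=True):
    """Abundance per read length; non-redundant mode counts each distinct sequence once."""
    pool = reads if redundant else set(reads)
    return dict(sorted(Counter(len(r) for r in pool).items()))


reads = ["A" * 20, "C" * 21, "C" * 21, "G" * 22, "T" * 10]
unwanted = ["GGGG" + "A" * 25 + "TTTT"]  # stand-in for an rRNA/tRNA reference
kept = filter_reads(reads, unwanted)
print(size_class_histogram(kept))                   # {21: 2, 22: 1}
print(size_class_histogram(kept, redundant=False))  # {21: 1, 22: 1}
```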
2.3 Locus analysis tools

The prediction of sRNA loci improves as the number and diversity of samples or the sequencing depth increases. However, a larger number of reads also requires a way to determine a signal-to-noise threshold. The Workbench provides two approaches based on expression patterns and entropy: CoLIde (Mohorianu et al.), for the identification of general sRNA loci, and miRCat2 (Paicu et al.), which improves the prediction of miRNA loci. CoLIde groups sRNAs that lie in close proximity on the genome and share a common expression pattern into putative loci. The up, down and straight patterns are determined from the relative positions of expression intervals and an offset fold change. The expression intervals are either simulated, when no replicates are available, or computed from the normalized replicate measurements. The patterns can be built on either ordered or unordered series of samples. The significance of a locus is based on how much the size class distribution of its constituent sRNAs differs from a uniform random distribution.

miRCat2 is a tool for miRNA discovery based on a new genome-scanning approach, coupled with an entropy-based method that identifies peaks and excludes low-abundance sequences below a noise level. First, all putative peaks are identified using the Kullback–Leibler divergence on abundances. The background level is then determined by excluding a peak and re-examining the abundance distribution. Additional empirical filters include the exclusion of multiple-matching reads, the analysis of local size class distributions (as in CoLIde) and a check for miRNA-like variants. The secondary structures of loci (determined using RNALfold) are used to identify miRNA loci (Supplementary Fig. S1e and f).
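The CoLIde ideas above (interval-based up/down/straight patterns with an offset fold change, and a size-class test against a uniform distribution) can be sketched as follows. This is not CoLIde's implementation: the offset, the fold-change cut-off, the ±10% interval used when there are no replicates, comparing the closest interval bounds, and the use of a chi-squared goodness-of-fit test are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Critical chi-squared values at alpha = 0.05, keyed by degrees of freedom.
CHI2_CRIT_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def expression_interval(replicates, width=0.1):
    """Interval spanned by replicate measurements.

    With a single measurement, a symmetric +/- `width` interval is used as a
    hypothetical stand-in for simulated intervals.
    """
    if len(replicates) > 1:
        return min(replicates), max(replicates)
    v = replicates[0]
    return v * (1 - width), v * (1 + width)


def step_pattern(a, b, offset=10.0, min_log2fc=1.0):
    """'U', 'D' or 'S' for the change from sample a to sample b."""
    lo_a, hi_a = expression_interval(a)
    lo_b, hi_b = expression_interval(b)
    if lo_b > hi_a:  # b entirely above a: compare the closest bounds
        lfc = math.log2((lo_b + offset) / (hi_a + offset))
        return "U" if lfc >= min_log2fc else "S"
    if hi_b < lo_a:  # b entirely below a
        lfc = math.log2((lo_a + offset) / (hi_b + offset))
        return "D" if lfc >= min_log2fc else "S"
    return "S"  # overlapping intervals: no change


def expression_pattern(samples, ordered=True, **kwargs):
    """Pattern over consecutive samples (ordered) or all sample pairs (unordered)."""
    pairs = zip(samples, samples[1:]) if ordered else combinations(samples, 2)
    return "".join(step_pattern(a, b, **kwargs) for a, b in pairs)


def size_class_significant(counts):
    """Chi-squared goodness-of-fit of size-class counts (e.g. 21-24 nt) against uniform."""
    expected = sum(counts) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, stat > CHI2_CRIT_005[len(counts) - 1]


samples = [[100, 110, 105], [400, 420, 410], [405, 395, 415]]  # replicates per sample
print(expression_pattern(samples))                 # US
print(expression_pattern(samples, ordered=False))  # UUS
print(size_class_significant([10, 80, 5, 5]))      # (162.0, True)
```

The offset makes the same absolute difference count less at low abundance: a change from [1, 2] to [5, 6] yields "S" here even though the raw ratio exceeds twofold.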

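The peak/background idea behind miRCat2 can be illustrated with a small greedy procedure: measure how far a window's abundance profile is from uniform using the Kullback–Leibler divergence, repeatedly exclude the highest position and re-examine the rest, and stop once the remainder looks like background. This is a hedged interpretation, not miRCat2's algorithm; the `kl_threshold` value and the choice of the highest remaining abundance as the background level are assumptions.

```python
import math


def kl_from_uniform(values):
    """KL divergence (bits) of the normalized abundance profile from a uniform one."""
    total = sum(values)
    n = len(values)
    if total == 0 or n == 0:
        return 0.0
    return sum((v / total) * math.log2(v * n / total) for v in values if v > 0)


def find_peaks(abundances, kl_threshold=0.5):
    """Greedily exclude the highest position until the rest looks uniform.

    Returns (peak positions, background level), where the background level is
    the highest abundance left once the remaining profile is near-uniform.
    """
    remaining = dict(enumerate(abundances))
    peaks = []
    while len(remaining) > 1 and kl_from_uniform(list(remaining.values())) > kl_threshold:
        pos = max(remaining, key=remaining.get)
        peaks.append(pos)
        del remaining[pos]  # exclude the peak and re-examine the distribution
    background = max(remaining.values()) if remaining else 0
    return sorted(peaks), background


print(find_peaks([2, 3, 1, 2, 500, 3, 2, 1, 2, 480, 2, 3]))  # ([4, 9], 3)
```

Two peaks in this toy profile could correspond to the two arms of a hairpin; everything at or below the background level would be treated as noise.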
2.4 Differential expression analysis of sRNAs

A workflow for processing an sRNA project (consisting of several libraries, with or without replicates), from raw data to the identification of differentially expressed (DE) transcripts and expression patterns (Beckers et al.), is also included. Since this type of analysis may involve a large number of samples, the optimized database feature is used by default. The workflow comprises four steps: (i) quality checking of the samples, (ii) evaluation of the effects of various normalizations and selection of an appropriate method, (iii) identification of DE sRNAs and (iv) summarization of expression patterns. First, diagnostic plots are generated, e.g. size class distributions of redundant and non-redundant reads, complemented by complexity analyses (Supplementary Fig. S1a). Additional plots include the nucleotide composition, the Jaccard similarity index (Supplementary Fig. S1c), histograms showing the proportions of reads assigned to the available annotation classes, and scatter and MA plots (Supplementary Fig. S1b). These are complemented by boxplots showing the distribution of differential expression per size class (Supplementary Fig. S1d). Second, the expression levels of transcriptome-matching reads are normalized. The user can compare the results of up to six normalization procedures [total count normalization, upper quartile, TMM (edgeR), DESeq normalization, quantile normalization adapted for sequencing data and sub-sampling (without replacement) normalization (Mohorianu et al.)]. For each, the quality-check plots are recreated, allowing the user to select the method that renders the samples most comparable. Third, DE sRNAs are identified based on fold changes between expression intervals. To exclude low-abundance variation, this step uses an offset (a noise-to-signal threshold) calculated using the Kullback–Leibler divergence and LOESS smoothing. Finally, the fold changes are converted into expression patterns.
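Several of the listed normalizations are simple enough to sketch: total count scaling, upper-quartile scaling, DESeq-style median-of-ratios size factors and sub-sampling without replacement, plus the M/A values shown when the quality-check plots are recreated. These are textbook forms of the techniques written for illustration (TMM and quantile normalization are omitted), not the Workbench's code; `counts` is assumed to be a list of samples whose per-sequence counts share the same order.

```python
import math
import random
from statistics import mean, median, quantiles


def total_count(counts):
    """Scale every sample to the mean library size."""
    totals = [sum(s) for s in counts]
    target = mean(totals)
    return [[c * target / t for c in s] for s, t in zip(counts, totals)]


def upper_quartile(counts):
    """Scale by the 75th percentile of each sample's non-zero counts."""
    uqs = [quantiles([c for c in s if c > 0], n=4)[2] for s in counts]
    target = mean(uqs)
    return [[c * target / uq for c in s] for s, uq in zip(counts, uqs)]


def deseq_size_factors(counts):
    """Median-of-ratios size factors, using sequences present in every sample."""
    log_geo = {}
    for j in range(len(counts[0])):
        column = [s[j] for s in counts]
        if all(c > 0 for c in column):
            log_geo[j] = sum(math.log(c) for c in column) / len(column)
    return [math.exp(median(math.log(s[j]) - g for j, g in log_geo.items())) for s in counts]


def subsample(sample, depth, seed=0):
    """Draw `depth` reads without replacement from one sample's counts."""
    pool = [j for j, c in enumerate(sample) for _ in range(c)]  # one entry per read
    out = [0] * len(sample)
    for j in random.Random(seed).sample(pool, depth):
        out[j] += 1
    return out


def ma_values(x, y):
    """(M, A) pairs for an MA plot of two normalized samples (zeros skipped)."""
    return [(math.log2(a / b), 0.5 * math.log2(a * b)) for a, b in zip(x, y) if a > 0 and b > 0]


counts = [[10, 20, 30, 40], [20, 40, 60, 80]]
print(total_count(counts))         # both samples become [15.0, 30.0, 45.0, 60.0]
print(deseq_size_factors(counts))  # approximately [0.707, 1.414]
```

Expanding every read into `pool` is memory-hungry for real libraries; it keeps the sub-sampling sketch obviously "without replacement".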
To interpret the DE results, users can group sequences sharing a pattern (or motif) and use these subsets as starting points for enrichment analyses.
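The last two steps, converting offset fold changes into patterns and grouping sequences by pattern or motif, can be sketched as below. For brevity this uses one normalized level per condition rather than expression intervals, and a fixed offset instead of one derived from the Kullback–Leibler divergence and LOESS smoothing; `offset`, `min_log2fc` and the use of `.` as a motif wildcard are assumptions.

```python
import math
import re
from collections import defaultdict


def ofc_pattern(levels, offset=20.0, min_log2fc=1.0):
    """U/D/S string from offset log2 fold changes between consecutive conditions."""
    chars = []
    for a, b in zip(levels, levels[1:]):
        lfc = math.log2((b + offset) / (a + offset))
        chars.append("U" if lfc >= min_log2fc else "D" if lfc <= -min_log2fc else "S")
    return "".join(chars)


def group_by_pattern(expression, **kwargs):
    """Map each pattern to the sequences that follow it."""
    groups = defaultdict(list)
    for seq, levels in expression.items():
        groups[ofc_pattern(levels, **kwargs)].append(seq)
    return dict(groups)


def select_motif(groups, motif):
    """Sequences whose pattern matches `motif`, where '.' stands for any step."""
    rx = re.compile(motif)
    return sorted(seq for pat, seqs in groups.items() if rx.fullmatch(pat) for seq in seqs)


expression = {"s1": [10, 200, 200], "s2": [300, 40, 40], "s3": [5, 12, 6], "s4": [50, 400, 60]}
groups = group_by_pattern(expression)
print(groups)                      # {'US': ['s1'], 'DS': ['s2'], 'SS': ['s3'], 'UD': ['s4']}
print(select_motif(groups, "U."))  # ['s1', 's4']
```

Note how the offset classifies `s3` as "SS": its raw 5 to 12 change is more than twofold, but at this abundance it is treated as noise.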

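Among the diagnostic plots in the first step of the differential expression workflow, the Jaccard similarity index compares the sequence content of samples. The sketch below assumes, as an illustration, that the index is computed on each sample's N most abundant sequences; the value of `n` and the tie-breaking rule are hypothetical.

```python
def top_n(counts, n):
    """The n most abundant sequences of one sample (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {seq for seq, _ in ranked[:n]}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_matrix(samples, n=100):
    """Pairwise Jaccard indices between the top-n sets of all samples."""
    tops = {name: top_n(c, n) for name, c in samples.items()}
    names = sorted(tops)
    return {(x, y): jaccard(tops[x], tops[y]) for x in names for y in names}


samples = {"s1": {"A": 10, "B": 8, "C": 1}, "s2": {"A": 9, "C": 7, "B": 1}}
m = jaccard_matrix(samples, n=2)
print(m[("s1", "s2")])  # 0.3333333333333333
```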
2.5 Availability on the AWS

To accommodate the increasing number of samples and replicates, and to address the difficulty of accessing suitable servers for the analyses, we enabled the use of the Workbench on AWS via a pre-configured virtual machine, an Amazon Machine Image (AMI). Scripts that facilitate the transfer of samples (via SFTP) and the handling of the remote connection are also provided. The AWS version of the Workbench allows users to perform analyses in the cloud, allocating and de-allocating resources dynamically according to their requirements and budget.

3 Discussion

The latest version of the Workbench enables biologists and bioinformaticians to critically and objectively extract information from larger, more complex sRNA datasets by combining workflows and stand-alone tools complemented with intuitive visualization features.

Funding

This work was supported by the Biotechnology and Biological Sciences Research Council (grant BBSRC BB/L021269/1 to V.M. and T.D.). Conflict of Interest: none declared.

References

1. CoLIde: a bioinformatics tool for CO-expression-based small RNA Loci Identification using high-throughput sequencing data.

Authors: Irina Mohorianu; Matthew Benedict Stocks; John Wood; Tamas Dalmay; Vincent Moulton
Journal: RNA Biol Date: 2013-06-28 Impact factor: 4.652

2. Molecular mechanisms of RNA interference.

Authors: Ross C Wilson; Jennifer A Doudna
Journal: Annu Rev Biophys Date: 2013 Impact factor: 12.981

3. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

4. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.

Authors: Peter J A Cock; Christopher J Fields; Naohisa Goto; Michael L Heuer; Peter M Rice
Journal: Nucleic Acids Res Date: 2009-12-16 Impact factor: 16.971

5. The UEA sRNA workbench: a suite of tools for analysing and visualizing next generation sequencing microRNA and small RNA datasets.

Authors: Matthew B Stocks; Simon Moxon; Daniel Mapleson; Hugh C Woolfenden; Irina Mohorianu; Leighton Folkes; Frank Schwach; Tamas Dalmay; Vincent Moulton
Journal: Bioinformatics Date: 2012-05-24 Impact factor: 6.937

6. sRNAtoolbox: an integrated collection of small RNA research tools.

Authors: Antonio Rueda; Guillermo Barturen; Ricardo Lebrón; Cristina Gómez-Martín; Ángel Alganza; José L Oliver; Michael Hackenberg
Journal: Nucleic Acids Res Date: 2015-05-27 Impact factor: 16.971

7. Comprehensive processing of high-throughput small RNA sequencing data including quality checking, normalization, and differential expression analysis using the UEA sRNA Workbench.

Authors: Matthew Beckers; Irina Mohorianu; Matthew Stocks; Christopher Applegate; Tamas Dalmay; Vincent Moulton
Journal: RNA Date: 2017-03-13 Impact factor: 4.942

8. Comparison of alternative approaches for analysing multi-level RNA-seq data.

Authors: Irina Mohorianu; Amanda Bretman; Damian T Smith; Emily K Fowler; Tamas Dalmay; Tracey Chapman
Journal: PLoS One Date: 2017-08-08 Impact factor: 3.240

9. A comparison of methods for differential expression analysis of RNA-seq data.

Authors: Charlotte Soneson; Mauro Delorenzi
Journal: BMC Bioinformatics Date: 2013-03-09 Impact factor: 3.169

10. Evaluation of microRNA alignment techniques.

Authors: Mark Ziemann; Antony Kaspi; Assam El-Osta
Journal: RNA Date: 2016-06-09 Impact factor: 4.942
