PsRobot: a web-based plant small RNA meta-analysis toolbox.

Hua-Jun Wu, Ying-Ke Ma, Tong Chen, Meng Wang, Xiu-Jie Wang.

Abstract

Small RNAs (smRNAs) in plants, mainly microRNAs and small interfering RNAs, play important roles in both transcriptional and post-transcriptional gene regulation. The broad application of high-throughput sequencing technology has made the routine generation of bulk smRNA sequences possible in laboratories, and has thus significantly increased the need for batch analysis tools. PsRobot is an easy-to-use web-based tool dedicated to the identification of smRNAs with stem-loop shaped precursors (such as microRNAs and short hairpin RNAs) and their target genes/transcripts. It rapidly identifies smRNAs with stem-loop shaped precursors among batch input data and predicts their targets using a modified Smith-Waterman algorithm. PsRobot integrates the expression data of smRNAs in major plant smRNA biogenesis gene mutants and smRNA-associated protein complexes to give clues to smRNA generation and function. In addition to offering improved specificity, psRobot allows the reliability of smRNA target prediction results to be evaluated with mRNA cleavage (degradome) data. The cross-species conservation status and the multiplicity of smRNA target sites are also provided. PsRobot is freely accessible at http://omicslab.genetics.ac.cn/psRobot/.

Year: 2012 PMID: 22693224 PMCID: PMC3394341 DOI: 10.1093/nar/gks554

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

MicroRNAs (miRNAs) and small interfering RNAs (siRNAs) are two major classes of endogenous regulatory small RNAs (smRNAs) in plants. They are usually 21–24 nucleotides (nt) long, and both function by pairing to targets via sequence complementarity (1). miRNAs are usually generated from a limited number of genomic loci and mainly work post-transcriptionally to down-regulate target mRNAs, whereas siRNAs have much broader origins and can function at both transcriptional and post-transcriptional levels (2–6). Both miRNAs and siRNAs are typically identified by cloning and sequencing of small RNAs (7). The development and application of high-throughput sequencing technology have significantly advanced studies on smRNAs, but have also left an increasing number of laboratories facing the task of data analysis.

Plant miRNAs are usually ∼21 nt long and are processed from the paired stem region of longer precursors with hairpin-shaped secondary structures (8). The presence of a stem-loop precursor as the lowest-energy folding form has been considered one of the key criteria for identifying new miRNAs (9,10). However, as many genomic loci giving rise to siRNAs can also fold into hairpin-shaped structures (11,12), searching for miRNAs by their expression evidence and the presence of stem-loop precursors may yield many false-positive candidates. According to the community-agreed plant miRNA annotation criteria (9,13), a dominantly expressed candidate sequence and detection of its pairing sequence (miRNA*) are required for miRNA annotation (13). These constraints have improved prediction specificity, but the results are still far from ideal.

It has been shown that plant miRNAs mainly pair with their targets via nearly perfect sequence complementarity, very similar to the manner of siRNAs (9,13).
This has made the prediction of plant miRNA targets relatively straightforward, yielding a limited number of targets per miRNA rather than the hundreds seen in animals (9,13). Yet a large amount of work is still required to identify real miRNA targets from the list of predicted candidates. In addition, increasing evidence has shown that gaps are also tolerated in the pairing between plant miRNAs and their targets (14), which enlarges the candidate target list and makes experimental validation more difficult (15). Most of the available smRNA target prediction software was designed for animal miRNAs and is not ideal for plant data, because large bulges are common in animal miRNA:target alignments and because miRNA seed sequences and central sequences contribute differently to the stability of miRNA:target pairs in animals and plants. Several recently developed plant miRNA target prediction tools, such as targetFinder (16,17), psRNATarget (18) and CleaveLand (19), have provided great help to researchers. However, they are limited by the requirement for local installation, the lack of degradome data support or dependence on the target prediction results of third-party software (Supplementary Table S1).

PsRobot is designed to partially solve the aforementioned problems in plant miRNA and target prediction. It incorporates the commonly agreed criteria to identify smRNAs with stem-loop shaped precursors from user-uploaded sequences and to predict their targets. Multiple user-adjustable parameters allow the software to meet different user needs. To facilitate better classification and functional analysis of input sequences, psRobot integrates the expression information of input sequences in reported plant smRNA-binding protein pull-down assays and in mutants of major smRNA biogenesis pathway genes.
For example, strong association with ARGONAUTE1 (AGO1) protein and expression depletion in dcl1 mutant will strengthen the confidence of a smRNA with stem-loop precursor as an miRNA (1). It also incorporates the available mRNA degradome data for users to evaluate the reliability of miRNA target prediction results. The multiplicity of miRNA binding sites on a single target as well as the cross species conservation status of the target sites are also analyzed and provided. PsRobot can be either used online or downloaded and installed locally. The local version offers a larger capacity for input data size and has the function to incorporate user-uploaded degradome data.

METHODS AND RESULTS

The Stem-loop smRNA Prediction Function

Input, algorithm and parameters

The stem-loop smRNA prediction function takes input smRNA sequences in FASTA or plain text format. For each query smRNA, the software finds its perfectly matched genomic origins and extracts upstream and downstream sequences of various lengths as candidate precursors. It assumes that the smRNA may originate from either the 5′ or the 3′ end of a precursor, and extends one end of the precursor by 10 nt at a time until the user-defined precursor length is reached. The secondary structures of the extracted precursor sequences are then evaluated by the MFOLD program (20). Precursor sequences whose minimal free energy folding form is a stem-loop structure are selected and reported with the corresponding query smRNA in the result pages (Fig. 1a), in either html or text format. If the conservation analysis function for smRNA sequences is selected, the cross-species conservation status of smRNAs is analyzed by aligning the query sequences to eight selected plant genomes using the BLAST program (allowing up to two mismatches in the smRNA sequences), and the ClustalW (21) alignment of the identified smRNA homologous sequences is included in the outputs (Fig. 1b). The repetitive sequence regions of the 26 preloaded genomes were identified by the RepeatMasker program (http://repeatmasker.org) and stored in the background database. Every genomic locus of the query sequences is searched against this database to identify smRNAs that originate from repeat sequences. User-adjustable parameters include smRNA conservation analysis, the minimal and maximal numbers of mismatched nucleotides within the query smRNA sequences in the obtained precursor structures, the maximal length of extracted precursors, and whether large loop sequences are permitted in the qualified precursors. Although the precursor structures of most canonical plant miRNAs are very short and well paired, some have large bulges or hairpin loops, such as ath-MIR393a and ath-MIR167d (22,23).
Enabling the “Retain large loop small RNA” function includes precursors with large loops in the prediction results.
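The precursor extraction step described above can be sketched as follows. This is a minimal illustration and not psRobot's code: the function names, the forward-strand-only search, the shared DNA alphabet for genome and query, and the default `max_len` of 200 nt are all assumptions made for the example. Folding each window (done with MFOLD in the paper) is left out.

```python
from typing import Iterator, List, Tuple


def find_exact_hits(genome: str, query: str) -> List[int]:
    """Return 0-based start positions of every exact match of query.

    Assumption: forward strand only, genome and query in the same alphabet.
    """
    hits = []
    pos = genome.find(query)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(query, pos + 1)
    return hits


def candidate_precursors(
    genome: str, start: int, length: int, max_len: int = 200, step: int = 10
) -> Iterator[Tuple[str, int, int]]:
    """Yield (arm, window_start, window_end) as half-open genome windows.

    '5p': the smRNA sits at the 5' end and the window grows downstream.
    '3p': the smRNA sits at the 3' end and the window grows upstream.
    Windows grow by `step` nt until `max_len` (hypothetical default) is reached.
    """
    end = start + length
    size = length + step
    while size <= max_len:
        if start + size <= len(genome):
            yield ("5p", start, start + size)
        if end - size >= 0:
            yield ("3p", end - size, end)
        size += step
```

Each yielded window would then be folded, and only windows whose minimal free energy structure is a stem-loop would be kept.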

Figure 1.

Output of stem-loop smRNA prediction. (A) The genomic mapping and stem-loop precursor prediction of the query smRNAs. Detailed information on the genomic location, precursor sequence and secondary structure of each predicted stem-loop smRNA locus, together with its folding energy, is included. The sequence in red capital letters is the query smRNA sequence; (B) the conservation status of the smRNA in eight plant species; (C) the normalized reads of a given smRNA in smRNA biogenesis mutants and AGO-associated libraries.

Data collection

A preloaded species needs to be selected to define the origins of smRNAs. Up to 26 completed plant genomes are currently supported by psRobot, and future finished genomes will also be incorporated on their release. The Arabidopsis thaliana genome was downloaded from the Arabidopsis Information Resource (TAIR) (24) and the rice genome was from Rice Genome Annotation Project (RGAP) (25) and the Rice Annotation Project Database (RAP-DB) (26) databases. Other plant genomes were downloaded from the Phytozome genome database (27). SmRNA deep sequencing data in smRNA biogenesis mutants and AGO-associated libraries were collected from the NCBI Gene Expression Omnibus database with datasets GSE11094, GSE14695, GSE16959, GSE13605, GSE10036, GSE6682, GSE5343 and GSE6682 for Arabidopsis thaliana (28–35) and datasets GSE20748 and GSE18250 for rice (36,37).

smRNA biogenesis and functional data

PsRobot incorporates published smRNA sequencing data from major plant smRNA biogenesis and function associated protein complexes or gene mutants, and returns this information together with the stem-loop prediction results. If a query sequence is present in any of the preloaded databases, its sequence reads in each database are listed (Fig. 1c). As smRNAs of different origins are processed by different Dicer-like (DCL) family proteins and associate with different AGO protein complexes, this information helps users to evaluate the types and functions of the query smRNAs. For example, decreased expression in the dcl1 mutant together with enrichment in the AGO1-associated libraries provides strong evidence for an authentic miRNA (1), whereas decreased expression in the dcl3 mutant together with enrichment in the AGO4-associated libraries indicates that the smRNA is an siRNA (38).
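The reasoning in the example above can be written as a small decision rule. The sketch below is only an illustration: psRobot lists the reads and leaves the interpretation to the user, so the library keys, the two-fold threshold, the pseudocount and the comparison of AGO-associated reads against the wild-type library are all hypothetical choices.

```python
def depleted(wild_type: float, mutant: float, fold: float = 2.0, pseudo: float = 1.0) -> bool:
    """True if normalized reads drop at least `fold`-fold in the mutant."""
    return (wild_type + pseudo) / (mutant + pseudo) >= fold


def enriched(ip: float, total: float, fold: float = 2.0, pseudo: float = 1.0) -> bool:
    """True if normalized reads are at least `fold`-fold higher in the AGO-associated library."""
    return (ip + pseudo) / (total + pseudo) >= fold


def suggest_class(reads: dict, fold: float = 2.0) -> str:
    """Suggest a class from normalized reads keyed by (hypothetical) library name."""
    wt = reads["wild_type"]
    mirna_like = depleted(wt, reads["dcl1"], fold) and enriched(reads["AGO1_IP"], wt, fold)
    sirna_like = depleted(wt, reads["dcl3"], fold) and enriched(reads["AGO4_IP"], wt, fold)
    if mirna_like and not sirna_like:
        return "miRNA-like"
    if sirna_like and not mirna_like:
        return "siRNA-like"
    return "ambiguous"
```

A sequence that satisfies both or neither pattern is reported as ambiguous, reflecting that this evidence supports a classification but does not prove it.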

Performance

To evaluate the accuracy of stem-loop smRNA prediction function, all miRBase (release v17) (39) recorded Arabidopsis thaliana miRNAs were collected and used as the test dataset. Among the 213 nonredundant sequences, 202 (∼94%) miRNAs were successfully identified as stem-loop smRNAs.

The smRNA Target Prediction Function

Input and parameters

To use the smRNA target prediction function, users can either select the known miRNAs of a plant species or submit their own sequences. The software searches for target sites among the preloaded genes/transcripts of the corresponding genome or in a user-uploaded target library. The uploaded query smRNA sequences and target library should be in either FASTA or plain text format. The user-adjustable parameters are: (1) the penalty score for the alignment between smRNAs and targets, which is defined by the formulas below; (2) the boundaries of the essential sequence region, within which mismatches or gaps receive double the penalty score of other regions; (3) the threshold for the total number of gaps within the smRNA and target alignment region; and (4) the region within which gaps are permitted. Degradome sequences mapped within the target sites are analyzed and presented. Only preloaded degradome data are available in the online version of psRobot, but users can incorporate their own degradome data via the psRobot_deg program once psRobot is installed locally.

Algorithms

As the pairing between smRNAs and target mRNAs involves global sequence alignment of the smRNA and local sequence alignment of the target, we applied a modified Smith–Waterman algorithm (Formula 1) (40) with the defined scoring system (Formula 2) to calculate the alignment scores between the query smRNAs and targets. The penalty score of each candidate alignment is obtained by subtracting the actual alignment score from the ideal perfect global pairing score (Formula 3). Alignments that meet the penalty score cutoff are backtraced and reported in the result page (Fig. 2). An exhaustive search is performed on each mRNA to detect the potential presence of multiple target sites (target multiplicity). Parallel computing is used to accelerate the search. A standalone local version of this function is also available for download to facilitate analysis of large datasets.
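The idea of aligning the smRNA end to end while letting the site start and end anywhere in the target can be sketched with a standard semi-global dynamic program. This is not the paper's Formula 1 or Formula 2: the score values below are hypothetical, and the sketch ignores G:U wobble pairs, the essential region, the gap limits and the search for multiple sites. Only the penalty definition (perfect score minus actual score) follows the text.

```python
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # hypothetical scores, not psRobot's


def reverse_complement(seq: str) -> str:
    """Reverse complement in the DNA alphabet (U is read as T)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def best_site(smrna: str, target: str):
    """Return (penalty, site_start, site_end) of the best-scoring site.

    The smRNA's reverse complement is aligned globally; the target is
    aligned locally (free start and end). Coordinates are half-open.
    """
    query = reverse_complement(smrna)
    target = target.upper().replace("U", "T")
    n, m = len(query), len(target)
    # Each cell holds (score, target start). Row 0: a site may start anywhere.
    prev = [(0.0, j) for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [(i * GAP, 0)]
        for j in range(1, m + 1):
            sub = MATCH if query[i - 1] == target[j - 1] else MISMATCH
            diag = (prev[j - 1][0] + sub, prev[j - 1][1])
            up = (prev[j][0] + GAP, prev[j][1])        # query base opposite a gap
            left = (cur[j - 1][0] + GAP, cur[j - 1][1])  # target base opposite a gap
            cur.append(max(diag, up, left, key=lambda cell: cell[0]))
        prev = cur
    end = max(range(m + 1), key=lambda j: prev[j][0])
    score, start = prev[end]
    penalty = n * MATCH - score  # perfect pairing score minus actual score
    return penalty, start, end
```

With these example scores a perfectly complementary site has penalty 0 and a single mismatch costs 2; a real run would compare the penalty against the user-defined cutoff.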

Figure 2.

Result summary table of smRNA target prediction function.

[Formulas 1–3: equation images not recoverable from the source text]

Output of the smRNA target prediction function

The primary output of the smRNA target prediction function is summarized in a sortable and searchable table (Fig. 2), with the query sequence, target alignment, alignment penalty score, target annotation, multiplicity of target sites and other supporting information. The contents of the expandable links in the result table are summarized in the next sections.

Target site conservation

Homolog gene groups of eight plant species (Arabidopsis thaliana, Brachypodium distachyon, Carica papaya, Oryza sativa, Populus trichocarpa, Sorghum bicolor, Vitis vinifera, Zea mays) are generated using OrthoMCL (41) with the default parameters, and serve as the source data for the conservation analysis of target sites. Predicted targets are searched against this source data for both paralogous and orthologous sequences with conserved target sites (Fig. 2). The multiple alignments of the conserved target sites and the alignments between smRNA and targets for homolog genes can be viewed via the hyperlink in the “Conservation” column (Fig. 3b and c).

Figure 3.

Illustration of expandable links in the target prediction result table. (A) A sample line of the target prediction result table; (B) target site conservation analysis results. Homologous genes with the same smRNA target site, both within and across species, are shown in the ClustalW multiple alignment format; (C) the alignments between miRNA and targets, displayed by mousing over the hyperlinks of the gene IDs listed in (B); (D) degradome data of the predicted smRNA target. Positions listed in the table are the positions of nucleotides in the smRNA sequence, plus 5 nt upstream and 5 nt downstream; ‘Loci number’ represents the number of perfectly mapped loci of each degradome sequence, marked at the start nucleotide position of each sequence. Normalized reads starting at each nucleotide position in the degradome data are listed in the following rows. The histogram shows the distribution of the degradome reads. In the example, the degradome data indicate that the target is cleaved after the 12th nucleotide; (E) expression change of the target gene in smRNA biogenesis pathway gene mutants compared with wild-type plants. Fold changes of the normalized expression values between mutants and wild type are shown in the table and on the Y axis of the plot. Data from different datasets are distinguished by color in the plot.

Degradome data

It has been shown that the products of miRNA-directed target cleavage can be cloned and detected by high-throughput sequencing technology, generating mRNA degradome data in which the 5′ ends of sequences mark the miRNA cleavage sites (28,37,42–44). PsRobot integrates well-produced datasets of Arabidopsis thaliana (GSE11094) and rice (GSE18248) in the target prediction results (Fig. 2) (28,37). The abundance of degradome sequences (normalized to reads per million, RPM) is marked at the starting genomic locus of each sequence (Fig. 3d). Candidate target sites with abundant degradome sequences are more likely to be authentic miRNA targets. The position with the most abundant degradome sequences should represent the miRNA cleavage site.
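The degradome logic above reduces to counting read 5′ ends per position, normalizing to RPM and taking the most abundant position within the site. The sketch below assumes reads have already been mapped to transcript coordinates; the function names and the input layout are hypothetical, and the 5 nt flank mirrors the window shown in Figure 3d.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def rpm(count: int, library_size: int) -> float:
    """Normalize a raw read count to reads per million."""
    return count * 1_000_000 / library_size


def cleavage_profile(
    read_starts: Iterable[int],
    library_size: int,
    site_start: int,
    site_end: int,
    flank: int = 5,
) -> Dict[int, float]:
    """Map each position in [site_start - flank, site_end + flank) to the RPM
    of degradome reads whose 5' end falls there (zero-count positions omitted)."""
    counts = Counter(read_starts)
    return {
        pos: rpm(counts[pos], library_size)
        for pos in range(site_start - flank, site_end + flank)
        if counts[pos] > 0
    }


def likely_cleavage_position(profile: Dict[int, float]) -> Optional[int]:
    """Return the position with the highest RPM, or None if no reads map."""
    return max(profile, key=profile.get) if profile else None
```

A site with no mapped reads returns `None`, which corresponds to a prediction that the degradome data do not support.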

Target expression in smRNA biogenesis mutants

As the production of smRNAs is significantly impaired in mutants of genes involved in plant smRNA biogenesis pathways, such as the dcl, hyl1, hen1 and rdr family genes (38), increased expression of a gene in these mutants may indicate that it is repressed by smRNAs. To facilitate inspection from this perspective, psRobot collects published microarray data and integrates them into the target prediction results (Fig. 3e). Currently, this function is only available for Arabidopsis thaliana (GSE2473, GSE3011, GSE24887) (45,46) and will be expanded to other species as the required data become available.

To test the reliability of the smRNA target prediction results, we selected 75 Arabidopsis thaliana miRNAs with at least one reported target in the ASRP database (Supplementary Table S2) (47). A total of 995 genes were predicted as targets of the 75 miRNAs using the default parameters, of which 306 (31%) were reported in the ASRP database, representing 89% of the 344 validated Arabidopsis miRNA targets (Supplementary Table S1). Combining target site conservation, detectability in degradome data and expression change in smRNA biogenesis mutants as filters significantly improved the prediction results: the total number of predicted targets fell to 542, of which 292 (54%) were confirmed targets from ASRP (85% of validated targets) (Supplementary Table S1).
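The percentages in this evaluation follow directly from the reported counts. The short calculation below reproduces them; the labels "precision" and "recall" are not used in the paper and are added here only to name the two ratios.

```python
def summarize(predicted: int, confirmed: int, validated: int = 344):
    """Return (precision %, recall %) rounded to whole percentages.

    precision = confirmed / predicted, recall = confirmed / validated.
    """
    precision = round(100 * confirmed / predicted)
    recall = round(100 * confirmed / validated)
    return precision, recall


# Counts taken from the text above.
default_run = summarize(predicted=995, confirmed=306)
filtered_run = summarize(predicted=542, confirmed=292)
print(default_run, filtered_run)
```

The filters remove 453 predictions while losing only 14 confirmed targets, which is why the first ratio rises from 31% to 54% and the second falls only from 89% to 85%.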

CONCLUSIONS

Computational prediction of miRNAs and their targets has suffered from high false-positive rates because few constraints can be applied. To generate more specific prediction results, psRobot integrates biogenesis and protein association information for plant smRNAs, as well as conservation, cleavage and smRNA dependency information for mRNAs. This information helps users to quickly identify bona fide miRNAs or other functional stem-loop smRNAs and their candidate targets. The ability to handle both single and batch sequence input, together with the availability of online and local versions of the software, makes it highly flexible in application.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2.

FUNDING

National Natural Science Foundation of China [90917016 and 30921061 to X.W.]; Chinese Academy of Sciences [20100322 to X.W.]; Ministry of Agriculture [2011ZX08010-002-002 to X.W.]. Funding for open access charge: National Natural Science Foundation of China [90917016]. Conflict of interest statement. None declared.

46 in total

1. Mfold web server for nucleic acid folding and hybridization prediction.

Authors: Michael Zuker
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

2. MicroRNAs in plants.

Authors: Brenda J Reinhart; Earl G Weinstein; Matthew W Rhoades; Bonnie Bartel; David P Bartel
Journal: Genes Dev Date: 2002-07-01 Impact factor: 11.361

3. Role of Arabidopsis ARGONAUTE4 in RNA-directed DNA methylation triggered by inverted repeats.

Authors: Daniel Zilberman; Xiaofeng Cao; Lisa K Johansen; Zhixin Xie; James C Carrington; Steven E Jacobsen
Journal: Curr Biol Date: 2004-07-13 Impact factor: 10.834

4. Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs.

Authors: Franck Vazquez; Hervé Vaucheret; Ramya Rajagopalan; Christelle Lepers; Virginie Gasciolli; Allison C Mallory; Jean-Louis Hilbert; David P Bartel; Patrice Crété
Journal: Mol Cell Date: 2004-10-08 Impact factor: 17.970

5. An endogenous, systemic RNAi pathway in plants.

Authors: Patrice Dunoyer; Christopher A Brosnan; Gregory Schott; Yu Wang; Florence Jay; Abdelmalek Alioua; Christophe Himber; Olivier Voinnet
Journal: EMBO J Date: 2010-04-22 Impact factor: 11.598

6. Identification of common molecular subsequences.

Authors: T F Smith; M S Waterman
Journal: J Mol Biol Date: 1981-03-25 Impact factor: 5.469

7. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis.

Authors: Ramanjulu Sunkar; Jian-Kang Zhu
Journal: Plant Cell Date: 2004-07-16 Impact factor: 11.277

8. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA.

Authors: Matthew W Jones-Rhoades; David P Bartel
Journal: Mol Cell Date: 2004-06-18 Impact factor: 17.970

9. OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors: Li Li; Christian J Stoeckert; David S Roos
Journal: Genome Res Date: 2003-09 Impact factor: 9.043

10. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets.

Authors: Xiu-Jie Wang; José L Reyes; Nam-Hai Chua; Terry Gaasterland
Journal: Genome Biol Date: 2004-08-31 Impact factor: 13.583
