Literature DB >> 28377776

isomiR2Function: An Integrated Workflow for Identifying MicroRNA Variants in Plants.

Kun Yang1, Gaurav Sablok2, Guang Qiao1, Qiong Nie1, Xiaopeng Wen1.   

Abstract

In plants, post transcriptional regulation by non-coding RNAs (ncRNAs), in particular miRNAs (19-24 nt) has been involved in modulating the transcriptional landscape in developmental, biotic and abiotic interactions. In past few years, considerable focus has been leveraged on delineating and deciphering the role of miRNAs and their canonical isomiRs in plants. However, proper classification and accurate prediction of plant isomiRs taking into account the relative features by which we define isomiRs, such as templated or non-templated is still lacking. In the present research, we present isomiR2Function, a standalone easily deployable tool that allows for the robust and high-throughput discovery of templated and non-templated isomiRs. Additionally, isomiR2Function allows for identification of differentially expressed isomiRs and in parallel target prediction based on both transcripts or PARE-Seq either using Targetfinder or Cleaveland. isomiR2Function allows for the functional enrichment of the detected targets using TopGO package. Benchmarking of isomiR2Function revealed highly accurate prediction and classification of isomiRs as compared to the previously developed isomiR prediction tools. Additionally, the downstream implementation of additional features allows isomiR2Function to be classified as a single standalone tool for isomiR profiling from discovery to functional roles. All in all, isomiR2Function allows the streamline processing of the miRNA-seq for the identification and characterization of isomiRs with minimal efforts. isomiR2Function can be accessed through: https://github.com/347033139/isomiR2Function.

Entities:  

Keywords:  PARE-seq; functional profiling; isomiRs; miRNAs; plants; post-transcriptional machinery

Year:  2017        PMID: 28377776      PMCID: PMC5359237          DOI: 10.3389/fpls.2017.00322

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


Introduction

Plants as a system are capable of regulating the stress and developmental responses either transcriptionally, or through post-transcriptional events. In view of emerging roles of non-coding RNAs (ncRNAs) such microRNAs (miRNAs) (Sablok et al., 2015; Karali et al., 2016), artificial microRNAs (Sablok et al., 2011), and circular RNAs (Sablok et al., 2016), it can be easily demonstrated that post-transcriptional regulation plays a critical role in understanding the elusive nature of the plant development and responses to varied kinds of environmental stresses. Post-transcriptionally, the role of the endogenous small RNAs has been well documented, described and widely demonstrated. High-throughput sequencing approaches have been widely applied to profile the role, diversity and functional elucidation of miRNAs in several model species such as Arabidopsis thaliana, Brachypodium distachyon, Oryza sativa, and crop species such as Sorghum and Triticum sp. (Guo et al., 2012; Yao and Sun, 2012; Bertolini et al., 2013; Li et al., 2013; Bologna and Voinnet, 2014). Interestingly, alongside the miRNAs, recently substantial focus has been leveraged to identify the variants of canonical miRNAs, henceforth termed as isomiRs, which have been recently shown to enhance miRNA predictions, binding efficiency (Guo et al., 2014) and also have been shown to be highly regulated as compared to the canonical miRNAs. One such example of isomiRs, are miRNA variants to canonical miR156, which have been shown to be widely regulated as compared to the canonical miRNAs (Baev et al., 2014). Taking into account increasing evidences of isomiRs, substantial efforts have been put forth to identify these miRNA sequence variants, which can be classified on the basis of the 5′ and 3′ nucleotide additions as well as substitutions, occurring as a result of the cleavage variations by ribonuclease Dicer-Like 1 (DCL) (Bartel, 2004). Although the biogenesis of isomiRs is still in naive, they have been further classified into three categories, i.e., 5′- isomiRs, 3′- isomiRs, and polymorphic isomiRs (Neilsen et al., 2012), and have been further grouped into templated (miRNAs length variants having homology to parent genes) and non-templated (non-templated nucleotide additions and/or post-transcriptional RNA edits resulting in no homology to parent gene) isomiRs based on occurrence of substitutions in isomiRs (Neilsen et al., 2012; Rogans and Rey, 2016). In the past few years, considerable less efforts have been leveraged for the identification of isomiRs. It is worth to mention that in plants, no such algorithm yet exits which allows for the high-throughput simultaneous identification and functional classification of isomiRs. Previously developed algorithms for the identification of isomiRs such as isomiRID (de Oliveira et al., 2013), and sRNAtoolbox (Rueda et al., 2015), lack in providing the detailed profiling of the isomiRs, whilst taking into account the sequencing artifacts, expression based read support, visualization of the isomiRs with respect to read depth and mapping, target predictions and functional enrichment. Additionally, previously developed algorithms such as isomiRex (Sablok et al., 2013) are web-based and don’t provide classification and support system for analyzing wide array of plant species. Although the isomiR detection tool DeAnnoIso (Zhang et al., 2016) allows for the detection of isomiRs, however, it can be only accessible through the web interface and lacks supports to detect the isomiRs across wide range of plant species (only four plant species are supported). Another feature that the above stand-alone or the web-based algorithms lack is the integration of the PARE-Seq data and also the in-built support for the functional predictions and target classification. From the database point of view, isomiR database such as YM500 (Cheng et al., 2013) contains pre-analyzed isomiRs, with complete lack of support for the plant species. Taking into account the relative lack of high-throughput detection of isomiRs, we developed isomiR2Function, which allows for the high-throughput detection of plant isomiRs from any miRNA-seq profiling study. isomiR2Function not only allows for the identification of the templated and non-templated 5′- isomiRs and 3′- isomiRs but also allows for the expression quantification using empirical bayes hierarchal model (EBSeq) and/or negative binomial distribution (DESeq). Prediction of biologically relevant target is an important criterion for the identification of isomiRs and their targets. isomiR2Function identifies target based on both alignment penalty using the Targetfinder as well as based on both alignment penalty and the appearance of decayed products using CleaveLand. Following target prediction, it allows for functional enrichment of the identified targets using TopGO package. In parallel, isomiR2Function provides support for the visualization of read mapping on corresponding precursor sequences thus allowing for the identification and read based visualization of the detected isomiRs with respect to the precursor and canonical miRNAs. As compared to the previously published algorithms, isomiR2Function is a stand-alone tool that can be easily configured on any Linux and MAC operating system and can be used to profile isomiRs from any plant species ranging from dicot to monocot. To assess the robustness of the developed algorithm, we benchmarked isomiR2Function against the standalone isomiRID, which only profiles isomiRs and don’t provide support for the visualization, non-templated isomiR detection with classification and also doesn’t allow for target prediction and functional enrichment. To our knowledge, this is the first end-to-end streamlined workflow for the identification and classification of plant isomiRs, which allows for the identification and comprehensive classification of templated and non-templated isomiRs with support for read based visualization coupled with subsequent target prediction and functional enrichment.

Materials and Methods

Algorithm Implementation

isomiR2Function algorithm supports the parallel environment, and can be deployed on any Linux and MAC operating system. isomiR2Function has been developed on Ubuntu OS with 8 GB of RAM. The core algorithm of isomiR2Function is implemented in PERL version 5.014. For the identification of the differentially expressed isomiRs, we have implemented two different packages in R version 3.2: (1) DESeq (Anders and Huber, 2010) and (2) EBSeq (Leng et al., 2015), which have been integrated in the core algorithm of isomiR2Function. For the identification of the targets, two target prediction modules have been implemented, which provides support for transcript based target identification using TargetFinder (Fahlgren et al., 2007) and both transcript and degradome based PARE target identification using CleaveLand (Zhang et al., 2014). In-built functional enrichment support has been provided using TopGO version 2.24.0 (Alexa and Rahnenfuhrer, 2010) within the R framework 3.2.

Sequence Datasets

To test the applicability of the isomiR2Function, we have profiled small RNAs datasets from three model species covering from dicot to monocot clade representing A. thaliana (SRR035251, SRR035252), O. sativa (SRR504362) and B. distachyon (SRR1033810). Additionally, for A. thaliana, degradome dataset (SRR1039524) was also included for target predictions (see Supplementary File 1 for details). For comparative analysis, isomiR2Function was benchmarked against isomiRID (de Oliveira et al., 2013) (see Supplementary File 1 for details).

Results and Discussion

isomiR Detection (Templated and Non-templated)

Recently, several studies have widely elucidated the role of the isomiRs (canonical miRNAs variants) in physiological responses and also demonstrated that the canonical variants are able to affect the miRNA stability and selection (Guo et al., 2014), especially the 5′- isomiRs, which might have different seed region as compared to the canonical miRNAs. It is worth to mention that in case of humans, cooperative ability of the miRNAs and isomiRs has been shown to be a way to increase the microRNAome targeting common biological pathways (Cloonan et al., 2011). Although these canonical variants have been defined as ‘templated’ and ‘non-templated’, based on the subsequent processing of DROSHA/DICER enzymatic machinery (Zhang et al., 2016), in-depth profiling of these 5′- isomiRs and 3′- isomiRs is still evolving in plants small RNA machinery. To provide supports for the high-throughput detection of the templated and non-templated isomiRs in plants, we have developed isomiR2Function, which executes in parallel as well as single thread environment. Figures , describe the workflow of isomiR2Function, which consists of several modules from the pre-processing of the miRNA-seq reads to the end visualization of the detected isomiRs with read based support and also functional enrichment of the predicted targets using either transcript based or PARE-Seq based approaches. Pre-processing and isomiR identification modules of isomiR2Function call pre-processing of the sequenced small RNAs library by adapter cleaning using the local alignment of the defined 5′ and 3′- adapters and uses cutadapt as an adaptor cleaning tool (Martin, 2011). Adapter cleaned and collapsed reads are mapped to RFAM[1], tRNAs[2], and plant snoRNAs database[3] to filter the reads originating from other ncRNAs. The adapter cleaned, filtered and collapsed small RNAs were then mapped to pre-miRNAs allowing no mismatch using bowtie (Langmead et al., 2009). isomiR2Function define templated isomiRs by comparing the mapping information of the small RNAs tags and canonical miRNAs on pre-miRNAs. Schematic view of the isomiR2Function workflow. Integrated workflow of the isomiR detection and functional analysis in isomiR2Function. (A) Shows the seed corrupt algorithm; (B) Displays the KS plots for canonical miRNAs with more than one detected isomiRs. The x-axis indicates different types of isomiRs, the y-axis indicates the abundance ratio of corresponding isomiR to the total isomiRs; (C) Shows the alignment view of isomiR detection, where the top sequence is the pre-miRNA followed by the canonical miRNA. Non-templated bases are indicated with lower case. Mod suggests the terminal drift of isomiRs; (D) Shows the differential expression analysis module using DESeq and EBSeq or intersection of the two algorithms; (E) isomiR2Function supports target prediction using CleaveLand for the PARE and transcript data both or Targetfinder for transcript data only and (F) shows the functional enrichment module of the predicted targets using the TopGO package. For non-templated isomiR detection, in case of sequences potentially originated from multiple pre-miRNAs, only best hits will be analyzed to avoid mapping ambiguity. Furthermore, unmapped sequences will be mapped to genome or transcriptome to decrease false positive rates of non-templated isomiRs identification by filtering mapped sequences. Lastly, sequences, which does not show mapping on any region of genome are re-mapped on pre-miRNA allowing ‘n’ mismatches, where ‘n’ is defined by parameter ‘s’. Taking into account above features, for non-templated isomiR detection, isomiR2Function executes in two steps: (1) In the first step, adapter cleaned tags will be mapped on pre-miRNAs with user-defined ‘n’ mismatches thus allowing the internal SNPs flexibility. Since the distribution of the SNPs can be random or tandem, isomiR2Function further checks the positions of the allowed mismatches. If the allowed mismatches are present at 3′- or 5′- end then the detected isomiRs will be classified under the category of the terminal substituted ones whereas if the allowed mismatches do not contribute to the terminal substitutions then isomiR2Function classifies them as MS for having internal random SNPs, CV for having both internal SNPs and terminal substitutions and TS for having internal tandem SNPs. Since the non-templated isomiRs are mostly by-products of the nucleotidyl transferases possessing 5′–3′ uridyltransferase and/or adenyltransferase activity, implementation of the aforementioned will allow the simultaneous detection of the non-templated isomiRs with internal SNPs and/or with terminal substitutions. (2) For the unmapped tags, detected in the first step, isomiR2Function will execute the additional recursive trimming and mapping step. In case of the recursive mapping and trimming, adapter cleaned tags, which do not map to the pre-miRNAs with the user defined ‘n’ mismatches, one nucleotide will be trimmed from either 5′ end or 3′ end at each recursive cycle and then will be re-mapped on pre-miRNAs for ‘m–n’ round, where ‘m’ is defined by parameter ‘r’. This recursive trimming and mapping continues for each of the adapter cleaned unmapped tags.

isomiR Differential Expression, Target Prediction, Functional Enrichment, and Visualization

For detected isomiRs (templated and non-templated), isomiR2Function provides support for the differential expression analysis either using the DESeq (Anders and Huber, 2010) or EBSeq (Leng et al., 2015). Taking into account the replicate and non-replicate small RNAs sequencing datasets, we implemented both DESeq, EBseq or both (intersection of the top isomiRs taking into account both the algorithms), which allows for the identification of the differentially expressed isomiRs across non-replicate and replicate small RNAs datasets. Following the detection of the differentially expressed isomiRs, isomiR2Function will automatically plot the top 50 differentially expressed isomiRs and will also provide a TAB delimited file of all differentially expressed isomiRs for further exploratory analysis. Following expression analysis, differentially expressed isomiRs will be passed for the transcripts based target identification using TargetFinder (Fahlgren et al., 2007) or degradome based PARE target identification using Cleaveland (Zhang et al., 2014) as defined by the user and the data availability. In-built support for the functional enrichment using TopGO (Alexa and Rahnenfuhrer, 2010) allows for the functional enrichment of the identified targets for hypothesis validations. At the end of each analysis run, isomiR2Function provides a summary file describing several informative descriptions such as total number of isomiRs, canonical and non-canonical isomiRs, complex isomiRs (isomiRs having more than one pre-miRNAs), most abundant isomiR pre-miRNA, and a summary of the 5′ end and 3′ modifications after successful completion of the isomiR profiling (Table ). Statistics of templated isomiRs detection in Arabidopsis thaliana, Brachypodium distachyon, and Oryza sativa.

Benchmarking isomiR2Function

To assess the robustness of isomiR2Function, we benchmarked the developed algorithm across monocot and dicot small RNAs datasets to access the diversity and class of the identified isomiRs. isomiR2Function revealed a wide diversity of the identified isomiRs (Table and Figure ) revealing 3′- additions/deletions as the most abundant event for isomiR variations, which is in line with the previous reports (Rogans and Rey, 2016). To assess the sensitivity and specificity of the algorithm, we compared isomiR2Function and isomiRID to access the accuracy of the implemented algorithm with settings: (1) at most 1 internal SNPs and 6 terminal substitution. (2) from 18 to 26 nt in length. (3) support for the identified isomiRs with only one reads from the sequenced data. (4) at least 16 nt of overlap of the defined canonical isomiRs sequences with canonical miRNAs. Benchmarking analysis revealed high accuracy of isomiR predictions by isomiR2Function as compared to previously published isomiRID in terms of classifying the the non-canonical isomiRs and non-templated canonical isomiRs (Table and Supplementary File 1). Additionally, we observed that isomiRID takes into account the length for recursive trimming and mapping of the trimmed tags as compared to the original adapter cleaned tags. Lastly, isomiRID reported isomiRs with several terminal substitution and with length beyond the user-defined length range thus resulting in incorrect graphical alignments of the identified isomiRs (Supplementary File 1). As compared to isomiRID, isomiR2Function reported correctly previously defiend isomiRs, which were predicted by isomiRID (Table ) plus predicted additional specific non-templated isomiRs, which were not reported by isomiRID. Comparative results for dataset SRR035251 and SRR035252 can be browsed at Supplementary File 1 and https://drive.google.com/file/d/0B-w2z1YjW6TieG9jTUxhSmNwTTQ/view. Different preference of isomiRs across monocot to dicot. The y-axis indicates the number of isomiR types and the x-axis indicates different families from monocot (Brachypodium distachyon and Oryza sativa) and dicot (Arabidopsis thaliana). In dicot, the most abundant isomiRs were canonical isomiRs whereas the most abundant isomiRs were non-canonical isomiRs in monocots. MIR444 generates a considerable number of non-canonical isomiRs in both B. sistachyon and O. sativa, although the most abundant family in O. sativa is MIR7695. Comparative assessment of isomiR profiling using isomiR2Function and isomiRID.

Conclusion

To conclude, isomiR2Function allows for streamline detection of templated and non-templated isomiRs with following additional functionalities: (1) It allows for customizable length range, seed corrupt tolerance and minimum sequencing depth for isomiR identification; (2) Identification of isomiRs on pre-miRNA not overlapping with canonical miRNAs; (3) Information on isomiRs biogenesis using Ks plots; (4) Support for differential expression analysis of the identified isomiRs using DESeq (Anders and Huber, 2010) and EBSeq (Leng et al., 2015); (5) Detailed classification of multiple internal substitution. isomiR2Function classifies isomiRs with multiple internal substitution in more detail, i.e., MS for having internal random SNPs, CV for having both internal SNPs and terminal substitutions, TS for having internal tandem SNPs, 5V/3V for 5′/3′ end substitutions; (6) Support for transcript based target identification using TargetFinder (Fahlgren et al., 2007) or degradome based PARE target identification using Cleaveland (Zhang et al., 2014); (7) Functional enrichment of isomiRs targets for biological implications. Taking into account the implemented functionalities, we believe that isomiR2Function will facilitate the large-scale discovery of isomiRs across the plant species to gain and strengthen the role of the miRNA variants in post-transcriptional regulatory events.

Author Contributions

KY performed the coding for the isomiR identification. GS implemented target predictions, differential expression analysis, GO enrichment and guided the designing of the isomiR2Function algorithm. GQ and QN prepared the datasets for validation of the isomiR2Function. KY, GS, and XW wrote the manuscript. XW originated the concept.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Table 1

Statistics of templated isomiRs detection in Arabidopsis thaliana, Brachypodium distachyon, and Oryza sativa.

Type of isomiRsA. thalianaB. distachyonO. sativa
Total isomiRs120290375
Complex isomiRs5574125
Canonical isomiRs61118125
Non-canonical isomiRs498125
Pre-miRNAs generating isomiRs58103150
Most abundant isomiR pre-miRNAath-MIR166abdi-MIR444bosa-MIR7695
Having 5′- addition only963
Having 3′- addition only15179
Having 5′- deletion only31425
Having 3 prime deletion only182826
Having 5′- addition and 3′- addition both102
Having 5′- addition and 3′- deletion both152330
Having 5′- deletion and 3′- addition both02825
Having 5′- deletion and 3′- deletion both025
Table 2

Comparative assessment of isomiR profiling using isomiR2Function and isomiRID.

isomiR2Function
isomiRID
Templated
Non-templated
Templated
Non-templated
athbdiosaathbdiosaathbdiosaathbdiosa
Total383161327343959888110946383161327343905686910946
Complex120230541229924713836120230541228422703836
Canonical200629836150437482921200629836149931082921
Non-canonica6375413571562662418963754135712214914189
5a1428332501338214283325011182
3a414261392491299414261391474299
5d1753892944018917538929411189
3d47981321495533234798132148520323
5a3a69186939306918672230
5a3d5016421921246144250164219212242442
5d3a17210224445115491721022444289549
5d3d82560897678256076267
No drift0003511023940000351977940
Both3831613273439056869787738316132734390568697877
Specific0005420123069000000
  24 in total

1.  Plant Circular RNAs (circRNAs): Transcriptional Regulation Beyond miRNAs in Plants.

Authors:  Gaurav Sablok; Hongwei Zhao; Xiaoyong Sun
Journal:  Mol Plant       Date:  2016-01-13       Impact factor: 13.164

Review 2.  Exploration of small non coding RNAs in wheat (Triticum aestivum L.).

Authors:  Yingyin Yao; Qixin Sun
Journal:  Plant Mol Biol       Date:  2011-10-19       Impact factor: 4.076

3.  isomiRID: a framework to identify microRNA isoforms.

Authors:  Luiz Felipe Valter de Oliveira; Ana Paula Christoff; Rogerio Margis
Journal:  Bioinformatics       Date:  2013-08-14       Impact factor: 6.937

Review 4.  Artificial microRNAs (amiRNAs) engineering - On how microRNA-based silencing methods have affected current plant silencing research.

Authors:  Gaurav Sablok; Alvaro L Pérez-Quintero; Mehedi Hassan; Tatiana V Tatarinova; Camilo López
Journal:  Biochem Biophys Res Commun       Date:  2011-02-15       Impact factor: 3.575

5.  Insight into small RNA abundance and expression in high- and low-temperature stress response using deep sequencing in Arabidopsis.

Authors:  Vesselin Baev; Ivan Milev; Mladen Naydenov; Tihomir Vachev; Elena Apostolova; Nikolay Mehterov; Mariyana Gozmanva; Georgi Minkov; Gaurav Sablok; Galina Yahubyan
Journal:  Plant Physiol Biochem       Date:  2014-09-16       Impact factor: 4.270

Review 6.  IsomiRs--the overlooked repertoire in the dynamic microRNAome.

Authors:  Corine T Neilsen; Gregory J Goodall; Cameron P Bracken
Journal:  Trends Genet       Date:  2012-08-08       Impact factor: 11.639

7.  Differential expression analysis for sequence count data.

Authors:  Simon Anders; Wolfgang Huber
Journal:  Genome Biol       Date:  2010-10-27       Impact factor: 13.583

8.  sRNAtoolbox: an integrated collection of small RNA research tools.

Authors:  Antonio Rueda; Guillermo Barturen; Ricardo Lebrón; Cristina Gómez-Martín; Ángel Alganza; José L Oliver; Michael Hackenberg
Journal:  Nucleic Acids Res       Date:  2015-05-27       Impact factor: 16.971

9.  High-resolution analysis of the human retina miRNome reveals isomiR variations and novel microRNAs.

Authors:  Marianthi Karali; Maria Persico; Margherita Mutarelli; Annamaria Carissimo; Mariateresa Pizzo; Veer Singh Marwah; Concetta Ambrosio; Michele Pinelli; Diego Carrella; Stefano Ferrari; Diego Ponzin; Vincenzo Nigro; Diego di Bernardo; Sandro Banfi
Journal:  Nucleic Acids Res       Date:  2016-01-26       Impact factor: 16.971

10.  Unveiling the Micronome of Cassava (Manihot esculenta Crantz).

Authors:  Sarah Jane Rogans; Chrissie Rey
Journal:  PLoS One       Date:  2016-01-22       Impact factor: 3.240

View more
  6 in total

Review 1.  Identifying and characterizing functional 3' nucleotide addition in the miRNA pathway.

Authors:  A Maxwell Burroughs; Yoshinari Ando
Journal:  Methods       Date:  2018-08-20       Impact factor: 3.608

2.  MicroRNAs Regulating Autophagy in Neurodegeneration.

Authors:  Qingxuan Lai; Nikolai Kovzel; Ruslan Konovalov; Ilya A Vinnikov
Journal:  Adv Exp Med Biol       Date:  2021       Impact factor: 2.622

3.  Plant IsomiR Atlas: Large Scale Detection, Profiling, and Target Repertoire of IsomiRs in Plants.

Authors:  Kun Yang; Xiaopeng Wen; Suresh B Mudunuri; Gaurav Sablok
Journal:  Front Plant Sci       Date:  2019-01-22       Impact factor: 5.753

4.  Diff isomiRs: Large-scale detection of differential isomiRs for understanding non-coding regulated stress omics in plants.

Authors:  Kun Yang; Xiaopeng Wen; Suresh Mudunuri; G P Saradhi Varma; Gaurav Sablok
Journal:  Sci Rep       Date:  2019-02-05       Impact factor: 4.379

Review 5.  MicroRNA-mediated bioengineering for climate-resilience in crops.

Authors:  Suraj Patil; Shrushti Joshi; Monica Jamla; Xianrong Zhou; Mohammad J Taherzadeh; Penna Suprasanna; Vinay Kumar
Journal:  Bioengineered       Date:  2021-12       Impact factor: 3.269

Review 6.  The Multiverse of Plant Small RNAs: How Can We Explore It?

Authors:  Zdravka Ivanova; Georgi Minkov; Andreas Gisel; Galina Yahubyan; Ivan Minkov; Valentina Toneva; Vesselin Baev
Journal:  Int J Mol Sci       Date:  2022-04-02       Impact factor: 5.923

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.