
Finding microRNA targets in plants: current status and perspectives.

Jiandong Ding¹, Shuigeng Zhou, Jihong Guan.

Abstract

MicroRNAs (miRNAs), a class of ~20-24 nt long non-coding RNAs, have critical roles in diverse biological processes including development, proliferation and stress response. With the development and availability of experimental technologies and computational approaches, the field of miRNA biology has advanced tremendously over the last decade. By sequence complementarity, miRNAs have been estimated to regulate certain mRNA transcripts. Although it was once thought to be simple and straightforward to find plant miRNA targets, this viewpoint is being challenged by genetic and biochemical studies. In this review, we summarize recent progress in plant miRNA target recognition mechanisms and principles of target prediction, and introduce current experimental and computational tools for plant miRNA target prediction. Finally, we present our outlook on future directions in the development of plant miRNA target finding methods.

Year: 2012 PMID: 23200136 PMCID: PMC5054207 DOI: 10.1016/j.gpb.2012.09.003

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

MicroRNAs (miRNAs) are one of three distinct types of small RNAs (sRNAs, including small interfering RNAs, microRNAs and Piwi-interacting RNAs) currently understood in plants and animals, which are distinguished by their biogenesis, not by their action [1]. The field of miRNA biology emerged with the discovery that the genes lin-4 and let-7, which control developmental timing in the nematode Caenorhabditis elegans, surprisingly did not code for proteins, but instead produced ∼22 nt RNA transcripts that regulate gene expression post-transcriptionally [2], [3], [4]. Since then, miRNAs have been shown to play a variety of regulatory roles and to target other regions in addition to 3′ untranslated regions (UTRs) [5], [6], [7], [8]. The field of miRNA biology has progressed quickly through the application of genome-wide approaches for the identification of miRNAs [9], [10] and their targets [11], [12], [13], [14]. In this article, we focus on plants and review these tools and their contributions to our understanding of miRNAs as local and global regulators. The biogenesis of plant miRNAs has been documented most thoroughly in Arabidopsis thaliana (Figure 1). Mature plant miRNAs range in size from 20 to 24 nucleotides. Primary miRNA transcripts (pri-miRNAs) are generally RNA polymerase II transcripts that contain imperfect, self-complementary foldback regions [15], [16]. The length of plant pri-miRNA hairpins is heterogeneous, ranging from approximately 70 to thousands of bases.

Figure 1

Major biogenesis pathways of plant miRNAs. The figure shows two major pathways for plant miRNA biogenesis. The pri-miRNA is a primary transcript. The stem–loops on the pri-miRNA are cleaved by DCL1 (DICER-LIKE 1) in the nucleus, giving rise to the mature transcript. The generated duplex is shown with a red strand (the miRNA) and a black strand (the miRNA∗). Before being exported into the cytosol by HASTY (HST), the duplex is methylated by HUA ENHANCER 1 (HEN1), which stabilizes it. The red strand is integrated into the miRISC and the black strand is either degraded or acts like the red strand. Depending on the degree of complementarity to the target site, miRISC will either cleave the mRNA, which induces immediate degradation, or suppress translation.

In animals, the pri-miRNA transcript is first processed by the RNase III domain-containing protein Drosha in association with the RNA-binding protein encoded by DGCR8 [17]. Processed miRNA precursors (pre-miRNAs) are exported from the nucleus and are cleaved ∼22 bp from the Drosha processing site by the RNase III domain-containing protein Dicer [18]. Plants have no Drosha homolog; instead, they have Dicer homologs. Of the four Dicer-like (DCL) enzymes in Arabidopsis, DCL1 is responsible for the bulk of miRNA biogenesis [19]. Two precursor-processing pathways have been identified for plant miRNA genes. The primary pathway involves stem-to-loop processing, in which the sequence and structure beyond the miRNA–miRNA∗ site are necessary and used by the cleavage pathway components to excise the mature sequences [20], [21], [22]. The second pathway involves loop-to-stem processing, in which only the structure between the miRNA and miRNA∗ is necessary for the cleavage pathway components to excise the mature sequences [23], [24]. Most plant pri-miRNA hairpins produce a single miRNA/miRNA∗ duplex, but some loci consistently produce multiple duplexes [25]. Unlike in animals, miRNA biogenesis in plants is completed within the nucleus [26]. Several accessory factors also contribute to the efficiency and fidelity of miRNA/miRNA∗ excision in plants [27], [28]. The 3′ end of the initial miRNA/miRNA∗ duplex is 2′-O-methylated by the nuclear HEN1 protein [29]; this modification prevents non-templated 3′-polymerization that accelerates miRNA turnover [30]. HASTY (HST), a plant homolog of Exportin-5, is then thought to export miRNA/miRNA∗ duplexes for loading into ARGONAUTE (AGO) proteins [31]. Usually, AGO1 acts as a ‘slicer’ to direct the endonucleolytic cleavage of target RNAs [32], although most other plant AGOs are also likely to possess slicing capabilities [33].
Many animal miRNAs are conserved even between greatly diverged species [34], especially when considering the crucial ‘seed’ regions, which are often solely responsible for targeting specificity [35]. In plants, a minority of annotated MIR gene families are conserved between plant families, while the majority are family- or species-specific, suggesting that most known MIR genes arose relatively recently in evolutionary time [36]. Unlike highly conserved, ancient miRNAs, young miRNAs are often weakly expressed and processed imprecisely, lack targets, and display patterns of neutral variation, suggesting that young MIR loci tend to evolve neutrally (for a recent review, see [37]). However, the evidence for any miRNAs conserved between animals and plants is slim, which suggests that animal and plant miRNAs may not derive from a common ‘ancestor’. An outstanding problem in the miRNA field is how miRNAs, once loaded into an AGO protein, recognize specific sequences of partial complementarity, which complicates the prediction of target sites [38]. The mechanism by which miRNAs regulate target gene expression has been a controversial subject, as there is evidence for target mRNA destabilization, translational repression and even activation of gene expression [39]. The substantial differences between the biogenesis of animal and plant miRNAs are also reflected in the differences in their requirements for target recognition. In plants, miRNAs can silence targets through RNA degradation as well as translational repression pathways [40]. Perfect [41] or near-perfect pairing between a miRNA and its target site supports endonucleolytic cleavage of the mRNA by AGO (Figure 1). This mechanism is common in plants but much rarer in animals [41]. There are also cases in plants in which miRNAs cause reduced levels of protein, but not mRNA, suggesting that translational repression is directed by the miRNA-induced silencing complex (miRISC) [42].
The actual mechanism that blocks protein production is not clear, and there is evidence for inhibition of translational initiation or elongation, as well as for directed proteolysis of the peptide that is being synthesized from the targeted mRNA [43]. Overall, miRNAs typically repress gene expression, and it remains to be seen whether positive regulation of targets extends beyond the limited cases that have been uncovered so far [44]. Methods for discerning these different mechanisms of target regulation will be discussed in the next section. In addition to the repression of target mRNAs, some miRNAs have other specialized functions or confer unique properties on miRNA–AGO complexes. For example, some miRNAs can trigger the production of 21-nt siRNAs. Trans-acting siRNAs (ta-siRNAs) are RDR6- and DCL4-dependent products of a refined RNA interference pathway, which function as repressors of specific, co-evolved target mRNAs [45], [46]. Phasing of TAS1/TAS2, TAS3 and TAS4 ta-siRNAs is set by cleavage guided by miR173–AGO1, miR390–AGO7 and miR828–AGO1 complexes, respectively [47], [48], [49]. Thus, miR173, miR390 and miR828 act as activators, rather than repressors, of an siRNA pathway. miRNAs can also direct DNA methylation. A subset of miRNA variants preferentially associate with AGO proteins involved in RNA-directed DNA methylation (RdDM) [50]. In rice, DCL3a also processes multiple MIR foldbacks, yielding 24-nt siRNA-like miRNAs [51]. Unlike 21-nt miRNAs, the 24-nt siRNA-like miRNAs preferentially associate with rice AGO4a and AGO4b and guide the methylation of target genes. To facilitate the computational study of miRNAs, especially of miRNA−target interactions, we summarize common online sources and group them into several categories (Table 1).

Table 1

Online sources for plant miRNA study

Category	Name	miR	Target (Que¹)	Target (Pre²)	Website	Note	Refs.
miRNA databases	miRBase	√			http://www.mirbase.org/	The biggest online registry for miRNAs. Target results are provided by miRCosm, but no plant is supported.	[52]
miRNA databases	Rfam	√			http://rfam.sanger.ac.uk/	It is a collection of RNA families. miRNA family arrangement is different from miRBase.	[53]

Species-specific sources	ASRP	√	√		http://asrp.cgrb.oregonstate.edu/	Arabidopsis miRNAs and ta-siRNAs are collected, and also their targets.	[66]
Species-specific sources	CSRDB	√	√	√	http://sundarlab.ucdavis.edu/smrnas/	Cereal sRNAs database, sRNAs of rice and maize are identified with 454 sequencing data.	[67]

miRNA annotation databases	MicroPC	√	√	√	http://www3a.biotec.or.th/micropc/index.html	A comprehensive resource for predicting and comparing plant miRNAs.	[54]
	PmiRKB	√	√		http://bis.zju.edu.cn/pmirkb/index.php	Four major functional modules are provided for plant miRNAs.	[55]
	PMRD	√	√		http://bioinformatics.cau.edu.cn/PMRD/	A plant-specific miRNA annotation database.	[56]

Target databases	miRTarBase		√		http://mirtarbase.mbc.nctu.edu.tw/index.html	Provides experimentally-verified miRNA-target interactions.	[60]
Target databases	starBase		√	√	http://starbase.sysu.edu.cn/index.php	Degradome-Seq data are used and five target prediction tools are integrated.	[61]

Genome & transcriptome databases	Phytozome				http://www.phytozome.net/	It provides 31 sequenced and annotated green plant genomes, which have been clustered into gene families at 11 evolutionarily-significant nodes.	[59]
	TAIR				http://www.arabidopsis.org/	TAIR maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.	[57]
	TIGR				http://rice.plantbiology.msu.edu/	TIGR provides genome sequence from the Nipponbare subspecies of rice and annotation of the 12 chromosomes.	[58]

High-throughput data	SRA				http://www.ncbi.nlm.nih.gov/sra	Sequence Read Archive (SRA) is a public repository for next-generation sequence data. sRNA sequencing data could be archived.	[64]
	MPSS				http://mpss.udel.edu/	Support several plant species, both sRNA and Degradome-Seq data are available.	[63]
	GEO				http://www.ncbi.nlm.nih.gov/geo/	Gene Expression Omnibus (GEO) database is a public repository for high-throughput gene expression data. Degradome-Seq data is also collected.	[65]

Note: 1 Query, allows users to browse pre-computed targets; 2 Predict, accepts user-submitted miRNA and/or mRNA sequences and returns predicted results.

Generally, the sequences of both miRNAs and transcripts are required when studying miRNA−target interactions. As the biggest online registry, miRBase collects miRNAs of various species that have been identified by experimental or computational methods [52]. Another well-known RNA database, Rfam, also provides miRNA sequences based on homology relationships [53]. Rfam is helpful both for miRNA identification and for target prediction, especially when structural RNAs need to be filtered out before analyzing deep sequencing data. Besides miRBase and Rfam, there are three plant-oriented miRNA annotation databases, MicroPC [54], PmiRKB [55] and PMRD [56], which greatly enhance plant miRNA studies. Several databases have been established to maintain plant genomic and/or transcriptomic data. Currently, TAIR [57] and TIGR [58] are the data centers for two well-studied plant model species, Arabidopsis and rice, respectively. Additionally, Phytozome [59], which incorporates 31 plant genomes and their annotations, is preferred when studying the evolution of plant species. When estimating the performance of target prediction methods, it is wise to use experimentally validated targets as test data. miRTarBase [60] and starBase [61] satisfy this requirement. Unlike miRTarBase, starBase further integrates plant miRNA−target interactions supported by Degradome-Seq. With the growing popularity of next-generation sequencing [62], miRNA research is progressing rapidly. The appearance of MPSS [63], SRA [64] and GEO [65] sheds light on the development of high-throughput-based computational tools to study miRNAs in both animals and plants. Two other specific databases, ASRP [66] and CSRDB [67], are also listed in Table 1 because they present a series of in-depth studies related to sRNAs, including miRNAs.

Experimental methods

Target-specific validation methods

Given the challenge of matching miRNAs to specific target sequences, several approaches have been adopted for identifying functional interactions. In order to explore the principles of miRNA targeting mRNA, target-specific methods were introduced first. Target-specific experimental validation with well-established techniques such as quantitative real-time PCR (qRT-PCR) [68], western blot [69] and 5′-rapid amplification of cDNA ends (5′-RACE) [70], is commonly used to evaluate individual miRNA:target pairs. For a detailed review of methods for the experimental validation of specific miRNA targets, please refer to Thomson et al. [71]. Clearly the miRNA and its target should be co-expressed in order for the miRNA to regulate the expression of its biological target. Co-expression is typically demonstrated by simply performing Northern blot analysis or qRT-PCR using total RNA isolated from a specific cell type and probes or primers specific for a given miRNA and mRNA target [72]. Although the majority of miRNA targets appear to be regulated both at the mRNA and protein level, such regulation is only manifested at the protein level for some targets [73]. For a true target of a specific miRNA, the modulation of miRNA concentration should correspond to a predictable change in the amount of protein encoded by the target mRNA. Therefore, a typical approach to validate the functional importance of a miRNA:target pair is a transient over-expression of a given miRNA mimic in a cell type known to express the putative target protein and subsequent western analysis using a specific antibody against that protein. One limitation of monitoring protein concentration is that it may not be selective enough to distinguish between members of the same miRNA family with similar sequences. 
Generally, the downstream effects of differential miRNAs can be observed at the protein level by western blot and at the mRNA level by qRT-PCR, although these measures do not distinguish between direct and secondary miRNA targets, and it is hard to determine whether the target mRNA is regulated predominantly by a single miRNA or by several miRNAs simultaneously. In the specific situation where a miRNA target is directly cleaved, 5′ RNA ligase mediated-RACE (5′ RLM-RACE) may be used to evaluate such targeting [41]. Briefly, 5′-RACE is a PCR-based technique in which an RNA adapter is ligated to the free 5′ phosphate of an uncapped mRNA produced by, among other nucleolytic activities, AGO2-directed mRNA cleavage. The ligation product is reverse transcribed and then PCR-amplified using a forward primer directed against the adapter and a gene-specific reverse primer; the amplified product is cloned and identified by sequencing. 5′-RACE has been employed extensively to validate products of RISC-mediated cleavage in plants [41].

High-throughput methods

To map RNA–RNA cleavage sites more comprehensively, Addo-Quaye et al. [74] and German et al. [75] introduced the parallel analysis of RNA ends (PARE), also known as Degradome-Seq or genome-wide mapping of uncapped transcripts (GMUCT). Degradome libraries are constructed by ligation of polyA-enriched RNA samples to a custom RNA adaptor containing a 3′ MmeI site, followed by reverse transcription, second-strand synthesis, MmeI digestion, ligation of a 3′ dsDNA adaptor, gel purification and PCR amplification (Figure 2). The power of coupling PARE and high-throughput sequencing has been demonstrated by the identification of widespread miRNA-regulated mRNA cleavage events in Arabidopsis [74], [75], [76], rice [77], [78] and grapevine [79], and of limited cleavage events in mammals [80], [81], [82]. Because extensive base pairing between miRNAs and mRNAs leading to direct RISC-mediated cleavage does not appear to be a major mechanism of miRNA activity in mammals, PARE is best suited to plant systems, where it identifies the large subset of miRNA targets that are subject to direct cleavage [83].

Figure 2

The PARE protocol. Parallel analysis of RNA ends (PARE) is also known as genome-wide mapping of uncapped transcripts (GMUCT) or Degradome-Seq. PARE is a modified 5′-RACE combined with high-throughput deep sequencing. After cleavage, the sequence downstream of the target site is not degraded, so the cleavage site can be preserved by adding a 5′ adaptor. Furthermore, replacing the original long downstream sequence with a shorter subsequence (20 nt) and a new 3′ double-stranded DNA adaptor makes it practical to apply deep sequencing after purification and amplification.

Degradome data can be scrutinized to find evidence of cleaved sRNA targets without resorting to computational predictions. The current task is to develop effective and efficient pipelines that make better use of these data. So far, three pipelines have been proposed to process degradome data. CleaveLand is the first general pipeline for detecting fragments diagnostic of sRNA-mediated cleavage from degradome sequencing experiments [84]. First, degradome sequences are matched to structural RNAs using Oligomap [85] so that structural RNAs can be filtered out. The filtered dataset is then mapped to the transcriptome, again with Oligomap. A 26 nt long ‘query’ mRNA subsequence is generated by extracting the 13 nt upstream and the 13 nt downstream of the location of the 5′ end of a matching degradome sequence. All query sequences are aligned to sRNA sequences using the Needle program in the EMBOSS package [86]. Alignments are then scored according to a previously described scheme [87], and alignments whose scores do not exceed a given threshold and for which at least one degradome sequence starts at the 10th nucleotide of complementarity are retained. To differentiate spurious results from real targets, the pipeline is re-run using randomly shuffled sRNA sequences, which have dinucleotide and trinucleotide compositions consistent with those of the input transcriptome, to estimate signal-to-noise ratios. This pipeline has been adopted in many subsequent studies, including the well-known web server starBase [61], whose contribution has been verified [7]. The second Degradome-Seq based analysis pipeline is SeqTar [88], which allows more mismatches overall and tolerates a critical mismatch or G:U wobble pair at position 10 or 11. This method introduces two statistics: one measures the alignment between the sRNA and the query sequence, and the other measures the abundance of reads at the center of the query sequence.
SeqTar uses a modified Smith–Waterman algorithm to align sRNAs to a query sequence, and BLASTN to align degradome sequences to the transcriptome. Because looser rules are applied, especially at the cleavage position, this pipeline always predicts many more miRNA:target pairs than CleaveLand [88]. Additionally, SeqTar can predict a potential sRNA if an accumulation of reads, termed a peak, is found at a specific position on a target and none of the input sRNAs accounts for this accumulation. Recently, Folkes et al. described a new, user-friendly, cross-platform degradome analysis tool, PAREsnip, which enables flexible and comprehensive high-throughput target analysis, allowing users to identify genome-wide networks of sRNA−target interactions resulting in transcript cleavage [89]. As with CleaveLand and SeqTar, the input for PAREsnip includes transcriptome, degradome and sRNAome data. In this method, another short-read alignment tool, PatMaN [90], is employed to map degradome sequences. When searching for sRNA sequences that could potentially cleave a transcript and so account for the degradome peak at a given position, the authors perform a pre-defined rule-based complementarity search [87], [91] by traversing a 4-way tree [92]. sRNA−target interactions identified by PAREsnip are further assessed by a P-value based metric, and only high-quality predictions are retained. Although its complementarity search rules and selection score are similar to those of CleaveLand, PAREsnip runs much faster because of the 4-way tree search. Additionally, PAREsnip employs cross-sample conservation of both sRNA and degradome sequences to reduce false predictions. Another web tool, SoMART, is a collection of tools for processing sRNAs [93]. Its slicer detector tool can predict sRNAs that could target a user-provided transcript, and the dRNA mapper tool can then be used to align degradome sequences to the transcript.
The server was recently updated with a new function, SMART COMPARE, to automatically compare the results from slicer detector and dRNA mapper, and identify slicers supported by degradome reads mapped to predicted cleavage sites.
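
The core of the CleaveLand-style check described above (a 26-nt query window around a degradome 5′ end, and a read that starts at the 10th nucleotide of complementarity) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any of the pipelines: the function names, the 0-based coordinates and the toy sequences are hypothetical, only perfectly complementary sites are searched (the real pipelines align with Needle or Smith–Waterman and tolerate mismatches), and the scoring and shuffled-control steps are omitted.

```python
# Sketch only: names, coordinates and sequences are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def query_window(transcript, read_start, flank=13):
    """Return (window, window_start): up to `flank` nt on each side of a
    degradome 5' end. `read_start` is the 0-based transcript index of the
    first nucleotide of the degradome read."""
    start = max(0, read_start - flank)
    end = min(len(transcript), read_start + flank)
    return transcript[start:end], start


def expected_cut(site_start, site_len):
    """0-based index of the first nucleotide of the 3' cleavage fragment.

    The sRNA pairs antiparallel to its site, so sRNA position i pairs with
    transcript index site_start + site_len - i. Slicing falls between the
    target nucleotides paired with sRNA positions 10 and 11, so the
    downstream fragment begins at the nucleotide paired with position 10.
    """
    return site_start + site_len - 10


def supported_sites(transcript, srna, read_starts):
    """Yield (site_start, cut_index) for perfectly complementary sites whose
    expected cut position coincides with an observed degradome 5' end."""
    site = revcomp(srna)
    reads = set(read_starts)
    pos = transcript.find(site)
    while pos != -1:
        cut = expected_cut(pos, len(site))
        if cut in reads:
            yield pos, cut
        pos = transcript.find(site, pos + 1)


srna = "UGACAGAAGAGAGUGAGCAC"                     # toy 20-nt sRNA
transcript = "A" * 30 + revcomp(srna) + "C" * 30  # toy transcript, one site
hits = list(supported_sites(transcript, srna, read_starts=[12, 40]))
window, window_start = query_window(transcript, 40)
print(hits, window_start, len(window))
```

In this toy case only the read starting at index 40 supports the site; the read at index 12 falls outside any complementary region and is ignored.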

Computational methods

As new experimental data comes to hand, the accepted modes of miRNA−target interaction are also expanded [94]. Although prediction programs may not incorporate all these experimentally-derived possibilities, their ability to provide potential targets easily and efficiently could greatly facilitate the downstream investigation. In this section, we will discuss the principles considered by existing computational methods. In order to facilitate description, we summarized popular available methods/tools in Table 2 and annotated each method/tool based on principles mentioned below.

Table 2

Plant miRNA target prediction tools

Name	Comp¹	Cons²	Hyb³/Acc⁴	Mul⁵	Fun⁶	Availability⁷	Own miRNA⁸	Own mRNA⁸	Link	Refs.
PatScan	•					L	yes	yes	N/A	[131]
miRNAassist	•			•		L⁹	yes	yes	N/A	[132]
miRU	•	•				W	no	no	N/A	[133]
WMD3	•		•			W	yes	no	http://wmd3.weigelworld.org/	[114]
TAPIR	•	•	•			W	yes	yes	http://bioinformatics.psb.ugent.be/webtools/tapir/	[115]
UEA sRNA	•		•			W	yes	no	http://srna-tools.cmp.uea.ac.uk/plant/	[103]
Target-align	•					L	yes	yes	http://www.leonxie.com/	[104]
Targetfinder	•	•				L	yes	yes	http://carringtonlab.org/resources/targetfinder	[19]
p-TAREF	•	•	•			W/L	yes	yes	http://scbb.ihbt.res.in/new/p-taref/form1.html	[116]
psRNATarget	•	•	•	•	•	W	yes	yes	http://plantgrn.noble.org/psRNATarget/	[101]
imiRTP	•	•	•	•	•	L	yes	yes	http://admis.fudan.edu.cn/projects/imiRTP.htm	[102]

Note: 1 Complementarity; 2 Conservation; 3 Hybridization; 4 Accessibility; 5 Multiplicity; 6 Function; 7 W/L indicate whether the tool can be accessed online at a website (W) or installed locally (L); 8 yes/no indicate whether users’ own miRNA and/or mRNA sequences can be used by the tool; 9 available upon request.

Complementarity

For a miRNA, the complementarity between itself and its target site determines the stability of the miRNA:target duplex and has therefore been used as a key feature for target gene analysis by computational methods. Early observations suggested that the ∼6 nt (position 2–7) cis elements required for post-transcriptional repression of Drosophila melanogaster targets are perfectly complementary to the 5′ ends of specific miRNAs [95]. Subsequently, systematic mutagenesis studies highlighted the seed regions for miRNA targeting in Arabidopsis [91], [96]. Requiring conserved pairing of the seed region markedly reduces the occurrence of false positive predictions. In general, a scoring scheme that requires a perfect or nearly perfect match within the seed region has been widely employed in published plant miRNA target prediction methods/tools. However, some exceptions have been reported. For example, miR398a post-transcriptionally regulates its target gene CSD2 in Arabidopsis, even though the seed region contains a bulge and a G:U wobble [97]. Meanwhile, a set of canonical seed types of different lengths or with a specific initial base has been verified experimentally [38], which complicates computational prediction. One recent study verified the specificity of long seeds but found that the majority of functional target sites are formed by less specific seeds of only 6 nt [98]. The same study also found that a substantial fraction of genuine target sites are non-conserved. Chi et al. identified an alternative binding mode by which miR-124 can regulate its target through a G-bulge site (positions 5–6), which cannot be explained by canonical seed matches [99]. Taken together, this contradictory evidence suggests that existing computational methods need to revisit their seed region related scoring rules. The degree of complementarity of the central region (position 9–11) is often presented as a decisive feature that determines whether slicing, or translational repression and mRNA decay, follows RISC recruitment [100].
Additionally, functional miRNA target sites that pair only with the central region of the miRNA lead to translational inhibition or mRNA decay in humans [80], and such sites might also be present in plants. Thus, many computational methods consider this region when scoring a miRNA:target pair [101], [102], [103], [104]. The methods differ in the position of the central region, in whether a mismatch or wobble pair in this region is allowed and, if it is allowed, in whether the functional type of the miRNA:target pair is also predicted. As shown in Table 2, psRNATarget [101] and imiRTP [102] predict the functional type of a miRNA based on the complementarity at the central region of the miRNA:target pair. In addition to the seed and central regions, the miRNA 3′ backbone is also thought to be critical for enhancing target recognition in Arabidopsis. The Tcp4-soj18 mutation at position 19 of miR319 or position 16 of the mRNA target site significantly affects their pairing, although these changes cause only a small difference in the calculated interaction free energy [105]. In another study, Zhang et al. systematically mutated the miR173 target site and found that perfect complementarity between the 3′ end of miR173 and the 5′ end of AT2G39675 (TAS1c) is crucial [106]. Mismatches at the 3′ end of miR173 abolished trans-acting siRNA (ta-siRNA) formation, while mismatches at the 5′ end had less effect. Unlike the seed and central regions, the 3′ backbone has been underrated by all existing computational methods. The choice of complementarity rules involves a trade-off: considering only stringent pair types increases specificity but might miss many potential targets, whereas considering both stringent and moderately stringent pair types increases sensitivity but might also increase the number of false positives.
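
The position-dependent rules discussed above can be illustrated with a small Python scoring function. It is a sketch under explicit assumptions rather than the scoring rule of any particular tool: the penalty values (1 for a mismatch, 0.5 for a G:U wobble), the doubling of penalties within positions 2–13 and the ungapped alignment are illustrative choices, and the function name `score_site` is hypothetical. The only element taken directly from the text is the central region (positions 9–11), whose pairing status is reported separately because it is used to distinguish slicing from translational repression.

```python
# Sketch only: penalty weights and the "core" range are assumptions.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
CENTRAL = (9, 10, 11)  # central region as described in the text


def score_site(mirna, site, core=range(2, 14)):
    """Score an ungapped miRNA:site pairing; lower is better.

    `mirna` and `site` are both written 5'->3' and must be the same length.
    Returns (penalty, central_paired), where `central_paired` is True when
    positions 9-11 of the miRNA are all Watson-Crick paired.
    """
    if len(mirna) != len(site):
        raise ValueError("ungapped scoring needs equal-length sequences")
    penalty = 0.0
    central_paired = True
    # Reversing the site puts the base opposite miRNA position i at index i-1.
    for pos, (m, t) in enumerate(zip(mirna, site[::-1]), start=1):
        if (m, t) in WATSON_CRICK:
            cost = 0.0
        elif (m, t) in WOBBLE:
            cost = 0.5
        else:
            cost = 1.0
        if pos in core:
            cost *= 2  # assumed heavier weighting of the 5' core
        if pos in CENTRAL and cost > 0:
            central_paired = False
        penalty += cost
    return penalty, central_paired


mirna = "UGACAGAAGAGAGUGAGCAC"    # toy 20-nt miRNA
perfect = "GUGCUCACUCUCUUCUGUCA"  # its reverse complement
central_mm = perfect[:10] + "C" + perfect[11:]  # mismatch opposite position 10
print(score_site(mirna, perfect), score_site(mirna, central_mm))
```

A stringent rule set would accept only low penalties with an intact central region, whereas a more permissive one would raise the cutoff and, as noted above, trade specificity for sensitivity.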

Target site accessibility

The secondary structure is very important for predicting both miRNAs and miRNA targets [107], [108]. An effective miRNA−target interaction begins with the hybridization reaction on an open structure at the target site (Figure 3). After binding, miRISC can disrupt the secondary structure at the site to elongate hybridization [109], [110]. Kiryu et al. performed a detailed investigation of accessibility of target sites recently [111]. They found that the efficacy of miRNAs depends strongly on the accessibility of both the very 5′ and 3′ end of their binding sites, which supports existing bioinformatics practice that extracts a certain length subsequence from upstream and downstream of target sites when computationally calculating the accessibility.

Figure 3

Target site accessibility. Taking the accessibility of the target mRNA and the miRNA into account is believed to increase the precision of miRNA target predictions, because secondary structure around the target site can prevent the miRNA and its mRNA target from making contact. Many methods mentioned in this article account for this by employing various tools (such as RNAup and RNAduplex) to calculate target site accessibility, which is represented by the energy required to open the secondary structure around the target site. The lower this energy, the more likely it is that the miRNA can contact the target mRNA.

The free energy between a hybridization of miRNA and its target is considered by some computational methods, but it is not a good indicator, especially compared with the accessibility [112], [113]. UEA sRNA [103], WMD3 [114], TAPIR [115] and p-TAREF [116] first calculate the hybridization energy of miRNA:target duplex, then compare it with the optimal hybridization energy, and finally use the percentage as a filter to choose potential targets. Another two methods, psRNATarget and imiRTP, calculate the accessibility by RNAup program in Vienna RNA package [117]. However, despite being theoretically sound, calculating accessibility could be extremely time-consuming, especially when the mRNA sequence is long. Moreover, current thermodynamic models used in RNA secondary structure prediction algorithms are not very accurate [118].
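
The percentage filter used by the hybridization-based tools above can be written down directly. The sketch below assumes that the two free energies (in kcal/mol, negative for stable duplexes) have already been computed by an external folding program; it does not call any such program. The function names and the 0.7 cutoff are illustrative assumptions, not values taken from any of the cited tools.

```python
# Sketch only: energies are assumed to come from an external folding tool;
# the cutoff and the candidate values below are made up for illustration.

def energy_ratio(duplex_dg, perfect_dg):
    """Fraction of the optimal hybridization energy reached by a duplex.

    `duplex_dg` is the free energy of the miRNA:target duplex and
    `perfect_dg` that of the miRNA paired with its perfect complement.
    """
    if perfect_dg >= 0:
        raise ValueError("perfect-match energy must be negative")
    return duplex_dg / perfect_dg


def filter_by_ratio(candidates, min_ratio=0.7):
    """Keep the names of candidates whose energy ratio reaches `min_ratio`.

    `candidates` is an iterable of (name, duplex_dg, perfect_dg) tuples.
    """
    return [name for name, dg, best in candidates
            if energy_ratio(dg, best) >= min_ratio]


candidates = [
    ("siteA", -30.0, -40.0),  # ratio 0.75
    ("siteB", -20.0, -40.0),  # ratio 0.50
    ("siteC", -36.0, -40.0),  # ratio 0.90
]
kept = filter_by_ratio(candidates)
print(kept)
```

Normalizing by the perfect-match energy makes the cutoff comparable across miRNAs of different length and GC content, which a fixed absolute energy threshold would not be.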

Evolutionary conservation of target sites

Members of a miRNA family share the same seed site and are well conserved among related species [119]. In addition, miRNA families have targets that are conserved among related species. In early target prediction methods, searching for conserved miRNA targets in homologous sequences from related species was used to reinforce target prediction [120], [121]. Several recent methods, such as Targetfinder [19], TAPIR, p-TAREF, psRNATarget and imiRTP, still use this principle as part of their scoring scheme. The main challenge with this principle is that the expression pattern of miRNA targets may be time- and space-specific; target sequences may therefore go undetected because of their low expression level in specific organisms [75], leading to false negative predictions from conservation analysis. On top of that, there are also species-specific miRNAs and targets. Applying a conservation filter can decrease the false positive rate, but the side effect is obvious: it is effective only for conserved miRNAs. When species-specific miRNAs are of interest, it is important to identify targets both with and without conservation. Besides the aforementioned principles, considering the number of putative miRNA sites per mRNA can significantly enhance target prediction [7]. Unfortunately, the importance of target site multiplicity has generally been underestimated, with the emphasis placed instead on a perfect match between a miRNA and its target site (Table 2). One good example is the processing of AtTAS3 to generate ta-siRNAs (Figure 4). Axtell et al. first proposed the two-hit trigger model, in which AtTAS3 is targeted by miR390a and miR390b simultaneously before giving rise to ta-siRNAs [47]. Later, Montgomery et al. reported that although miRNA-guided cleavage occurs at only one miRNA binding site, the other miRNA binding site is still necessary for AtTAS3 processing [122].
In legumes, an AP2-like gene Medtr2g093060 contains a miR172-cleaved target site and a non-cleaving miR156 target site [123], which is another good example supporting the importance of multiplicity. Based on our latest work (Ding et al., in preparation), we speculate that this phenomenon is more common in plants.
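Counting putative sites per mRNA, as discussed above, can be sketched as follows. This is a simplified illustration that scans for windows complementary to each miRNA with a bounded number of mismatches. It ignores G:U wobble pairs, bulges and position-specific penalties, all of which real tools handle. The sequences, names and the mismatch limit are made up for the example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_sites(mirna: str, mrna: str, max_mismatches: int = 3) -> list:
    """Return 0-based start positions of putative target sites on the mRNA."""
    site = reverse_complement(mirna)  # sequence of a perfect site on the mRNA
    n = len(site)
    hits = []
    for start in range(len(mrna) - n + 1):
        window = mrna[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def site_multiplicity(mirnas: dict, mrna: str, max_mismatches: int = 3) -> dict:
    """Map each miRNA name to the list of its putative sites on one mRNA."""
    return {name: find_sites(seq, mrna, max_mismatches)
            for name, seq in mirnas.items()}


def mutate(seq: str, index: int) -> str:
    """Introduce one substitution at the given index (toy helper)."""
    swap = {"A": "C", "C": "A", "G": "U", "U": "G"}
    return seq[:index] + swap[seq[index]] + seq[index + 1:]


# Two made-up miRNAs and a toy mRNA carrying one site for each of them.
toy_mirnas = {
    "toy-miR-A": "AUGGCUAGCUAGGCUAACGU",
    "toy-miR-B": "GCAUCCGUAAGGUCAUUGCA",
}
toy_mrna = (
    "GGGAAA"
    + reverse_complement(toy_mirnas["toy-miR-A"])             # perfect site
    + "CCCUUUCC"
    + mutate(reverse_complement(toy_mirnas["toy-miR-B"]), 5)  # one mismatch
    + "AAA"
)
sites = site_multiplicity(toy_mirnas, toy_mrna)
total_sites = sum(len(positions) for positions in sites.values())
```

A transcript with sites for more than one miRNA, like the toy mRNA here, is the kind of candidate that a multiplicity-aware score would rank higher.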

Figure 4

Target site multiplicity. In most situations, one miRNA is enough to change the expression of target genes. A. miR173 can trigger AtTAS1A (At2g27400) to generate ta-siRNAs. But there are some exceptions. B. Both miR390a and miR390b are crucial in the generation of ta-siRNAs from AtTAS3 (At3g17185).

Conclusion

The recent discovery that miRNAs can both regulate and be regulated by target interactions has profound implications for understanding their roles in gene regulation [124]. The identification of this intertwined relationship raises more questions about how miRNA targeting functions in vivo. An original and remaining challenge in the field is finding miRNA targets with high confidence. Finding true functional miRNA targets remains difficult even though many biological principles of miRNA targeting have been revealed experimentally and computationally. A perfectly complementary region on an mRNA may not function as an effective target site because of poor accessibility in terms of secondary structure or for other unknown reasons, leading to false positive predictions [125]. Another problem that has hardly been addressed is multiplicity: different miRNAs can cooperatively regulate individual targets, but miRNA expression signatures differ between cell types and conditions [27]. Thus, miRNA research will increasingly focus on miRNA-regulated networks [126], in addition to identifying individual miRNA:target interactions. Furthermore, beyond the repression of target gene expression by miRNAs, the reciprocal effect of targets on miRNAs is not entirely clear. In some cases, target interactions offer a protective influence on miRNA stability, whereas in others the outcome is miRNA degradation. Thus far, it seems that the degree of complementarity between a miRNA and its target determines the fate of the miRNA, and that extensive complementarity favors miRNA destruction [127]. Although many questions remain to be answered, recent progress in miRNA biology makes it clear that combining multiple strategies is required to obtain a comprehensive, high-confidence description of miRNA targeting networks.

Integration of various data sources

Unlike miRNAs in animals, plant miRNAs predominantly bind mRNAs in the CDS, so many studies focus on the CDS while UTRs are ignored. However, new findings indicate that both animal and plant miRNAs can target the 5′UTR, 3′UTR and coding regions [6], [7], although the proportion that target UTRs is small. The expression level of target mRNAs or proteins is generally negatively associated with that of the corresponding miRNAs. The use of transcriptome data, particularly microarray and RNA-seq based expression data, should significantly reduce false positive predictions [128]. In addition, experimentally verified miRNA:target pairs have been collected by previous studies [115] and in dedicated databases (such as miRTarBase). Instead of using them only as test data, p-TAREF applies this information in the development of its machine learning based method and thus achieves better results than other methods (for more details, see [116]).
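The use of expression data to prune predictions can be sketched as a simple anti-correlation filter. The example assumes that miRNA and target expression have been measured across the same ordered set of samples. The Pearson coefficient, the -0.5 cutoff and all names and values are illustrative assumptions, not the procedure of any cited tool.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def filter_by_anticorrelation(pairs, mirna_expr, target_expr, max_r=-0.5):
    """Keep predicted (miRNA, target) pairs whose profiles are anti-correlated.

    max_r = -0.5 is an arbitrary illustrative threshold.
    """
    kept = []
    for mirna, target in pairs:
        r = pearson(mirna_expr[mirna], target_expr[target])
        if r <= max_r:
            kept.append((mirna, target, r))
    return kept


# Toy expression values across four hypothetical samples.
mirna_expr = {"miR-X": [1.0, 2.0, 3.0, 4.0]}
target_expr = {"geneA": [8.0, 6.0, 4.0, 2.0], "geneB": [1.0, 2.0, 3.0, 5.0]}
predicted = [("miR-X", "geneA"), ("miR-X", "geneB")]
supported = filter_by_anticorrelation(predicted, mirna_expr, target_expr)
```

Here geneA falls as miR-X rises and is retained, whereas geneB rises with the miRNA and is filtered out. Such a filter would miss targets regulated mainly at the translational level, so it is best treated as supporting evidence only.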

Integration of known principles

None of the existing prediction tools has been able to incorporate all currently known principles (Table 2), not even all of those mentioned in this review. Directly building a model that incorporates more principles might provide higher accuracy and enhance site recognition efficacy, but its implementation might also become more complex. The appearance of imiRTP sheds light on a future direction: it integrates existing popular computational methods to obtain high-quality results, an approach inspired by successful experience in animal miRNA target studies [102]. Prior to imiRTP, a series of such methods had been proposed, but none of them supported plant miRNAs. As novel target prediction methods accumulate, such tools will support more species and provide more accurate results.
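One simple way to integrate several existing predictors is consensus voting, sketched below. This is an illustrative assumption about how such an integration could work and is not a description of how imiRTP combines its component methods. The tool names, gene names and the support threshold are hypothetical.

```python
from collections import Counter


def consensus_targets(predictions: dict, min_support: int = 2) -> dict:
    """Return pairs predicted by at least min_support tools.

    predictions maps a tool name to an iterable of (miRNA, target) pairs.
    The result maps each retained pair to the number of supporting tools.
    """
    votes = Counter()
    for pairs in predictions.values():
        votes.update(set(pairs))  # each tool votes once per pair
    return {pair: n for pair, n in votes.items() if n >= min_support}


# Toy output of three hypothetical prediction tools.
predictions = {
    "tool_1": [("miR-X", "geneA"), ("miR-X", "geneB")],
    "tool_2": [("miR-X", "geneA"), ("miR-Y", "geneC")],
    "tool_3": [("miR-X", "geneA"), ("miR-X", "geneB"), ("miR-X", "geneB")],
}
consensus = consensus_targets(predictions, min_support=2)
```

Raising `min_support` trades sensitivity for specificity: pairs supported by every tool are fewer but more likely to be genuine.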

Integration of different technologies

New high-throughput technologies have accelerated the discovery of sequences that are bound by the miRNA complex in vivo [74], [75]. These datasets provide an experimental framework for training computational algorithms to predict the likelihood of a sequence being recognized by specific miRNAs in a biologically relevant context. Recently, several pipelines have been proposed for the analysis of high-throughput degradome-seq data, and these can identify miRNA targets at a genome-wide scale. Several challenges remain for these methods. First, the computational cost is still high, in terms of both computation time and hardware. Second, current pipelines generate many more candidate pairs than traditional computational methods, and it is hard to believe that all of them are real targets. Third, because degradome-seq relies on a principle similar to that of 5′-RACE, novel experimental methods are needed to evaluate the candidates identified by these pipelines. Fourth, degradome-seq can identify only cleavable targets, so non-cleavable targets will be missed. starBase provides targets predicted by integrating different computational methods and high-throughput technologies. How to use these data will be an interesting and meaningful question. With the development of next-generation sequencing technology, hundreds of thousands of nucleic acid sequences are being produced. To handle data at this scale, popular target prediction tools need to improve their input/output capacity as well as their capacity for parallel computation. For several tools listed in Table 2, we have tested these abilities (more details can be found in Table 1 in [7]). Thus, there will be a trend toward adopting compute unified device architecture (CUDA) [129] and cloud computing [130], which are newly developed high-performance computing techniques, to meet the rapidly increasing demand.

Competing interests

The authors declare no competing interests.

131 in total

Review 1. Experimental validation of miRNA targets.

Authors: Donald E Kuhn; Mickey M Martin; David S Feldman; Alvin V Terry; Gerard J Nuovo; Terry S Elton
Journal: Methods Date: 2008-01 Impact factor: 3.608

2. Construction of Parallel Analysis of RNA Ends (PARE) libraries for the study of cleaved miRNA targets and the RNA degradome.

Authors: Marcelo A German; Shujun Luo; Gary Schroth; Blake C Meyers; Pamela J Green
Journal: Nat Protoc Date: 2009 Impact factor: 13.491

Review 3. Computational methods for transcriptome annotation and quantification using RNA-seq.

Authors: Manuel Garber; Manfred G Grabherr; Mitchell Guttman; Cole Trapnell
Journal: Nat Methods Date: 2011-05-27 Impact factor: 28.547

4. Nuclear processing and export of microRNAs in Arabidopsis.

Authors: Mee Yeon Park; Gang Wu; Alfredo Gonzalez-Sulser; Hervé Vaucheret; R Scott Poethig
Journal: Proc Natl Acad Sci U S A Date: 2005-02-28 Impact factor: 11.205

5. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome.

Authors: Charles Addo-Quaye; Tifani W Eshoo; David P Bartel; Michael J Axtell
Journal: Curr Biol Date: 2008-05-08 Impact factor: 10.834

6. PAREsnip: a tool for rapid genome-wide discovery of small RNA/target interactions evidenced through degradome sequencing.

Authors: Leighton Folkes; Simon Moxon; Hugh C Woolfenden; Matthew B Stocks; Gyorgy Szittya; Tamas Dalmay; Vincent Moulton
Journal: Nucleic Acids Res Date: 2012-03-29 Impact factor: 16.971

7. The sufficient minimal set of miRNA seed types.

Authors: Daniel C Ellwanger; Florian A Büttner; Hans-Werner Mewes; Volker Stümpflen
Journal: Bioinformatics Date: 2011-03-26 Impact factor: 6.937

8. starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data.

Authors: Jian-Hua Yang; Jun-Hao Li; Peng Shao; Hui Zhou; Yue-Qin Chen; Liang-Hu Qu
Journal: Nucleic Acids Res Date: 2010-10-30 Impact factor: 16.971

9. RepTar: a database of predicted cellular targets of host and viral miRNAs.

Authors: Naama Elefant; Amnon Berger; Harel Shein; Matan Hofree; Hanah Margalit; Yael Altuvia
Journal: Nucleic Acids Res Date: 2010-12-10 Impact factor: 16.971

10. PatMaN: rapid alignment of short sequences to large databases.

Authors: Kay Prüfer; Udo Stenzel; Michael Dannemann; Richard E Green; Michael Lachmann; Janet Kelso
Journal: Bioinformatics Date: 2008-05-08 Impact factor: 6.937
