Abstract
Considering the accessibility of the 3′UTR is believed to increase the precision of microRNA target predictions. We show that, contrary to common belief, ranking by the hybridization energy or by the sum of the opening and hybridization energies, as done in currently available algorithms, is not an efficient way to rank predictions. Instead, we describe an algorithm that also considers only the accessible binding sites but ranks predictions according to over-representation. When compared with experimentally validated and refuted targets in the fruit fly and human, our algorithm shows a remarkable improvement in precision while significantly reducing the computational cost in comparison with other free-energy-based methods. In the human genome, our algorithm has at least twice the precision of other methods with their default parameters. In the fruit fly, we find five times more validated targets among the top 500 predictions than other methods with their default parameters. Furthermore, using a common statistical framework, we demonstrate explicitly the advantages of using the canonical ensemble instead of the minimum free energy structure alone. We also find that 'naïve' global folding sometimes outperforms the local folding approach.
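The abstract contrasts ranking by free energy with ranking by over-representation. The exact over-representation statistic (PSH) is defined in the main text and is not reproduced here. As a stand-in, the sketch below scores each miRNA-3′UTR pair by a binomial tail probability: the chance of seeing at least k seed matches among n candidate positions if each position matched independently with background probability p. The binomial model, the uniform-nucleotide background and all names are illustrative assumptions, and the restriction to accessible sites described in the abstract is omitted.

```python
from math import exp, lgamma, log, log1p
from typing import List, Tuple

# (name, observed seed sites k, candidate positions n, per-position match probability p)
Pair = Tuple[str, int, int, float]


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are computed in log space so large n does not overflow.
    """
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)  # negligible terms underflow harmlessly to 0.0
    return total


def rank_by_overrepresentation(pairs: List[Pair]) -> List[Pair]:
    """Most over-represented pair (smallest tail probability) first."""
    return sorted(pairs, key=lambda t: binomial_tail(t[1], t[2], t[3]))


# Hypothetical pairs; a 7-mer (seed 2-8) match has probability 4**-7 per
# position under uniform, independent nucleotides (an assumption).
p7 = 4 ** -7
pairs = [("utrA", 3, 1000, p7), ("utrB", 1, 1000, p7), ("utrC", 2, 5000, p7)]
print([t[0] for t in rank_by_overrepresentation(pairs)])  # ['utrA', 'utrC', 'utrB']
```

Unlike an energy score, this ranking rewards a pair for having more seed sites than its UTR length would predict, so a long UTR with two sites can rank below a short one with three.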
Year: 2010 PMID: 20805242 PMCID: PMC3017612 DOI: 10.1093/nar/gkq768
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. While over-representation is an excellent ranking criterion, hybridization energy is a poor one. All miRNA-3′UTR pairs with at least one perfect seed 2–8 match are ranked according to over-representation (measured by PSH and labeled as PACMIT-0.0), total free energy, hybridization energy and random order. Precision versus sensitivity curves are shown for (a) the fruit fly and (b) the human. The number of true positives (i.e. experimentally validated targets) among the top predictions is also shown for (c) the fruit fly and (d) the human. In panel (d), the bars for hybridization and total free energies are not visible because the number of true positives for these two methods is always zero.
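Curves like those in panels (a) and (b) can be traced by walking down a ranked list of predictions. The sketch below assumes that precision counts only predictions with an experimental label (validated or refuted) and that sensitivity is the fraction of all validated targets recovered so far; predictions with no label are skipped. These conventions and the function name are assumptions for illustration, not taken from the paper's code.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def precision_sensitivity_curve(
    ranked: Iterable[Hashable],
    validated: Set[Hashable],
    refuted: Set[Hashable],
) -> List[Tuple[float, float]]:
    """Return (sensitivity, precision) after each labelled prediction.

    Sensitivity = validated targets recovered / all validated targets.
    Precision   = validated / (validated + refuted) among predictions so far.
    Predictions in neither set are skipped (an assumption of this sketch).
    """
    tp = fp = 0
    curve = []
    for pair in ranked:
        if pair in validated:
            tp += 1
        elif pair in refuted:
            fp += 1
        else:
            continue  # unlabelled pair: does not move the curve
        curve.append((tp / len(validated), tp / (tp + fp)))
    return curve


# Hypothetical ranked miRNA-UTR pairs; "x" has no experimental label.
ranked = ["a", "x", "b", "c", "d"]
print(precision_sensitivity_curve(ranked, validated={"a", "b", "e"}, refuted={"c", "d"}))
```

Applying the same function to a shuffled copy of the ranked list gives a random-order baseline of the kind plotted in Figure 1.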
Precisionᵃ obtained with PACMIT using different folding schemes
| Method | Fruit fly, global foldingᵇ | Fruit fly, local foldingᶜ | Human, global foldingᵇ | Human, local foldingᶜ |
|---|---|---|---|---|
| PACMIT-0.0 | 0.900 | 0.900 | 0.483 | 0.483 |
| PACMIT-0.1 | 0.923 | 0.900 | 0.469 | 0.486 |
| 0.466 | ||||
| PACMIT-0.3 | 0.947 | 0.900 | 0.414 | 0.483 |
| PACMIT-0.4 | 0.923 | 0.878 | N.A. | 0.449 |
| PACMIT-0.5 | N.A. | 0.860 | N.A. | 0.424 |
| PACMIT-MFES | 0.923 | 0.923 | 0.405 | 0.405 |
ᵃ Precision obtained at the same sensitivity as that reached by PACMIT-MFES, i.e. SE = 0.263 for the fruit fly and SE = 0.085 for the human.
ᵇ Using RNAplfold with global folding (W = L = l).
ᶜ Using RNAplfold with local folding (W = 80 and L = 40).
ᵈ Not available at this sensitivity cutoff.
* Used here and in the main text to denote global folding.
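The table reports precision at a fixed sensitivity (SE = 0.263 for the fruit fly, SE = 0.085 for the human) and marks "N.A." when a scheme never reaches that sensitivity. A minimal sketch of that lookup on a (sensitivity, precision) curve follows; taking the first point whose sensitivity reaches the cutoff, without interpolation, is an assumption, and the curve values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_at_sensitivity(
    curve: Sequence[Tuple[float, float]], target_se: float, tol: float = 1e-12
) -> Optional[float]:
    """Precision at the first point of a (sensitivity, precision) curve whose
    sensitivity reaches target_se; None (shown as "N.A.") if it never does."""
    for se, pr in curve:
        if se >= target_se - tol:  # small tolerance for floating-point ties
            return pr
    return None


curve = [(0.10, 1.0), (0.20, 0.95), (0.30, 0.90)]  # hypothetical curve
print(precision_at_sensitivity(curve, 0.263))  # 0.9
print(precision_at_sensitivity(curve, 0.5))    # None
```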
Figure 2. Considering accessibility can increase the precision of predictions. Precision versus sensitivity curves are shown for (a) the fruit fly and (b) the human. The different folding procedures used to include accessibility are compared with the case in which accessibility is not considered, i.e. PACMIT-0.0. See the main text for the precise meaning of each label.
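The abstract reports an advantage of the canonical ensemble over the minimum free energy (MFE) structure when judging accessibility. The toy sketch below illustrates the difference on a hand-written list of structures with made-up free energies (hypothetical values, not an RNA energy model; a real pipeline would take unpaired probabilities from RNAplfold, as in the table footnotes). The ensemble gives a Boltzmann-weighted probability that the site is unpaired, while the MFE structure alone gives only 0 or 1; the opening energy is then taken as −RT ln P(unpaired).

```python
from math import exp, inf, log
from typing import List, Tuple

RT = 0.6163  # kcal/mol at 37 °C (R = 1.987e-3 kcal/(mol K), T = 310.15 K)

# Hypothetical toy ensemble: (dot-bracket structure, free energy in kcal/mol).
Ensemble = List[Tuple[str, float]]


def site_unpaired(structure: str, start: int, end: int) -> bool:
    """True if every nucleotide in [start, end) is unpaired ('.')."""
    return all(c == "." for c in structure[start:end])


def ensemble_unpaired_probability(ens: Ensemble, start: int, end: int, rt: float = RT) -> float:
    """Boltzmann-weighted probability that the site is unpaired (canonical ensemble)."""
    weights = [exp(-energy / rt) for _, energy in ens]
    unpaired = sum(w for (s, _), w in zip(ens, weights) if site_unpaired(s, start, end))
    return unpaired / sum(weights)


def mfe_unpaired_probability(ens: Ensemble, start: int, end: int) -> float:
    """0 or 1: accessibility judged from the MFE structure only."""
    mfe_structure, _ = min(ens, key=lambda t: t[1])
    return 1.0 if site_unpaired(mfe_structure, start, end) else 0.0


def opening_energy(p_unpaired: float, rt: float = RT) -> float:
    """Cost of opening the site, -RT ln P(unpaired); infinite if never unpaired."""
    return inf if p_unpaired == 0.0 else -rt * log(p_unpaired)


toy = [("((((....))))", -3.0), ("....((....))", -2.5), ("............", 0.0)]
p_ens = ensemble_unpaired_probability(toy, 0, 4)  # roughly 0.31
p_mfe = mfe_unpaired_probability(toy, 0, 4)       # site is paired in the MFE structure
print(p_ens, opening_energy(p_ens))
print(p_mfe, opening_energy(p_mfe))
```

Under the MFE-only view this site is inaccessible, whereas the ensemble assigns it a moderate opening cost, which is the kind of information that averaging over structures retains.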
Figure 3. Comparison of PACMIT with other methods. Precision versus sensitivity curves obtained with the default parameters of different methods are shown for (a) the fruit fly and (b) the human. We also show the curves obtained with the ‘high-precision’ parameters for (c) the fruit fly and (d) the human. For the description of the ‘Seed 2–8’ and ‘Random’ curves, see the main text.
Figure 4. PACMIT has a higher number of validated targets among the top predictions. The numbers of validated targets among the top 100, 500 and 1000 fruit fly predictions of different methods are shown under (a) default and (b) ‘high-precision’ parameters. Also shown is the number of validated targets predicted before the first false positive (rightmost cluster of bars, labeled ‘While PR = 1’).
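The two quantities in Figure 4 can be computed from a ranked list as sketched below: counts of validated targets within the top-n cutoffs, and the number of validated targets found before the first refuted prediction (the ‘While PR = 1’ bars). Treating unlabelled predictions as neither counting nor stopping the scan is an assumption, and the function names and data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set


def validated_in_top(
    ranked: Sequence[Hashable],
    validated: Set[Hashable],
    cutoffs: Iterable[int] = (100, 500, 1000),
) -> Dict[int, int]:
    """Number of validated targets among the top-n predictions for each cutoff n."""
    return {n: sum(1 for pair in ranked[:n] if pair in validated) for n in cutoffs}


def validated_before_first_false_positive(
    ranked: Iterable[Hashable], validated: Set[Hashable], refuted: Set[Hashable]
) -> int:
    """Validated targets found while precision stays at 1, i.e. before the first
    refuted prediction. Unlabelled predictions neither count nor stop the scan."""
    count = 0
    for pair in ranked:
        if pair in refuted:
            break
        if pair in validated:
            count += 1
    return count


ranked = ["a", "x", "b", "c", "d"]  # hypothetical ranked pairs
print(validated_in_top(ranked, {"a", "b", "d"}, cutoffs=(1, 3, 5)))  # {1: 1, 3: 2, 5: 3}
print(validated_before_first_false_positive(ranked, {"a", "b", "d"}, {"c"}))  # 2
```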