Sébastien Tempel, Fariza Tahi.
Abstract
miRNAs are small non-coding RNAs that play important roles in biological processes. Finding miRNA precursors in genomes is therefore an important task, one that requires computational methods. The goal of these methods is to select candidate pre-miRNAs that can then be validated experimentally. With new-generation sequencing techniques, it is important to have fast algorithms that can process whole genomes in acceptable time. We developed an algorithm based on an original method in which an approximation of the miRNA hairpin is searched for first, before the pre-miRNA structure is reconstituted. The approximation step substantially reduces the number of possibilities and thus the search time. Our method was tested on different genomic sequences and compared with CID-miRNA, miRPara and VMir. In almost all cases it gives better sensitivity and selectivity. It is also faster than all three: it takes ≈ 30 s to process a 1 MB sequence, whereas VMir takes 30 min, miRPara 20 h and CID-miRNA 55 h. We present here a fast ab initio algorithm for finding pre-miRNAs in genomes, called miRNAFold. miRNAFold is available at http://EvryRNA.ibisc.univ-evry.fr/.
Year: 2012 PMID: 22362754 PMCID: PMC3367186 DOI: 10.1093/nar/gks146
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Almost all known pre-miRNAs contain at least one long stem. Percentage of pre-miRNA hairpins in the human genome, the mouse genome and all of miRBase, as a function of the length of their longest stem. (B) Almost no known pre-miRNAs contain large bulges. Percentage of pre-miRNAs in the human genome, the mouse genome and all of miRBase that have a gap of a given size, i.e. an excess of nucleotides on one side of the hairpin. A gap of zero corresponds to the same number of nucleotides on both sides. (C) Almost all known pre-miRNAs are covered by a non-exact stem. Percentage of pre-miRNAs in miRBase as a function of the percentage of nucleotides covered by a non-exact stem. (D) Size of the largest symmetrical loop in a non-exact stem. Percentage of non-exact stems from miRBase as a function of the size of their largest symmetrical loop.
Figure 2. (A) Example of a symmetrical matrix used to search for exact and non-exact stems in a given genomic subsequence. Three stems (circled in blue) are selected with a minimum-length threshold of 4. One of the three stems has been extended to a non-exact stem (circled in red). (B) Search for hairpins. The anchor (circled in orange) of the non-exact stem shown in (A) is positioned in the matrix and then extended to the left and to the right (green areas) on different diagonals, in order to allow bulges and internal loops.
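The exact-stem search described in Figure 2(A) can be illustrated with a minimal sketch. It is not the miRNAFold implementation: the pairing rules (Watson-Crick plus G-U wobble), the `min_loop` parameter and all function names are assumptions made for illustration. Only the minimum stem length of 4 comes from the caption. A stem appears as a run of consecutive pairings along one anti-diagonal of the base-pairing matrix.

```python
# Base pairs allowed in this sketch: Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq):
    """Symmetric boolean matrix: entry [i][j] is True if seq[i] can pair with seq[j]."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def exact_stems(seq, min_len=4, min_loop=3):
    """Return maximal exact stems as (i, j, length) tuples.

    A stem (i, j, k) means i pairs with j, i+1 with j-1, ..., i+k-1 with j-k+1.
    min_loop is a hypothetical parameter: at least that many unpaired
    nucleotides must lie between the innermost paired positions.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    n = len(seq)
    m = pairing_matrix(seq)
    stems = []
    for total in range(2 * n - 1):  # one anti-diagonal per value of i + j
        i = max(0, total - (n - 1))
        run_start, run_len = 0, 0
        while i < total - i - min_loop:  # keep j - i > min_loop
            j = total - i
            if m[i][j]:
                if run_len == 0:
                    run_start = i
                run_len += 1
            else:
                if run_len >= min_len:
                    stems.append((run_start, total - run_start, run_len))
                run_len = 0
            i += 1
        if run_len >= min_len:  # run that reaches the end of the diagonal
            stems.append((run_start, total - run_start, run_len))
    return stems


print(exact_stems("GGGGAAAACCCC"))  # [(0, 11, 4)]
```

Extending such a stem across neighbouring diagonals, as in Figure 2(B), is what would allow bulges and internal loops; that step is omitted here.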
Results obtained by miRNAFold, CID-miRNA, miRPara and VMir on an artificial sequence
| | Sensitivity | Selectivity | Time (min:s) |
|---|---|---|---|
| CID-miRNA | 97 | 11.72 | 90:49 |
| miRPara | 97 | 9.7 | 5:24 |
| VMir | 28 | 1.32 | 2:32 |
| miRNAFold50 | | 18.77 | 0:0.88 |
| miRNAFold60 | | 18.96 | 0:0.88 |
| miRNAFold70 | 97 | 19.17 | 0:0.84 |
| miRNAFold80 | 96 | 22.91 | 0:0.76 |
| miRNAFold90 | 65 | | |
Comparison of prediction results obtained on an artificial sequence by miRNAFold, CID-miRNA, miRPara and VMir. miRNAFold was run with different values of the minimal percentage of verified criteria: 50, 60, 70, 80 and 90. Values in bold denote best results.
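The record does not define the two metrics reported in these tables. As a sketch under stated assumptions, the common definitions are sensitivity = TP / (TP + FN) and selectivity = TP / (TP + FP), both expressed as percentages. The authors may use different formulas, and the counts in the example are toy values, not data from the paper.

```python
def sensitivity(tp, fn):
    """Percentage of known pre-miRNAs that were predicted (assumed definition)."""
    if tp + fn == 0:
        raise ValueError("no known pre-miRNAs to recover")
    return 100.0 * tp / (tp + fn)


def selectivity(tp, fp):
    """Percentage of predictions that are known pre-miRNAs (assumed definition)."""
    if tp + fp == 0:
        raise ValueError("no predictions were made")
    return 100.0 * tp / (tp + fp)


# Toy counts for illustration only.
print(sensitivity(tp=97, fn=3))   # 97.0
print(selectivity(tp=20, fp=80))  # 20.0
```

Under these definitions a tool can reach high sensitivity while keeping low selectivity if it emits many candidate hairpins per known pre-miRNA.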
Sensitivity of CID-miRNA, miRNAFold, miRPara and VMir
| | Human | Mouse | Zebrafish | Sea squirt |
|---|---|---|---|---|
| CID-miRNA | 38 | 29.58 | 19.30 | 28.26 |
| miRPara | 98 | 47.37 | 58.7 | |
| VMir | 88.73 | 84.21 | ||
| miRNAFold70 | 91.30 |
Sensitivity results obtained by CID-miRNA, miRNAFold, miRPara and VMir on Human, Mouse, Zebrafish and Sea squirt genomic sequences. Values in bold denote best results.
Selectivity of CID-miRNA, miRNAFold, miRPara and VMir
| | Human | Mouse | Zebrafish | Sea squirt |
|---|---|---|---|---|
| CID-miRNA | 0.69 | 0.82 | 0.75 | |
| miRPara | 5.34 | 1.4 | 5.86 | |
| VMir | 0.56 | 2.93 | 1.35 | 5.29 |
| miRNAFold70 | 0.89 | 7.98 |
Selectivity results of CID-miRNA, miRNAFold, miRPara and VMir obtained on Human, Mouse, Zebrafish and Sea squirt genomic sequences. Values in bold denote best results.
Run time of CID-miRNA, miRNAFold, miRPara and VMir
| | Human | Mouse | Zebrafish | Sea squirt | Average |
|---|---|---|---|---|---|
| CID-miRNA | 54 h 58 m | 54 h 48 m | 54 h 40 m | 55 h 29 m | 55 h 08 m |
| miRPara | 20 h 12 m | 19 h 47 m | 19 h 40 m | 19 h 25 m | 19 h 46 m |
| VMir | 30 m | 30 m | 30 m | 30 m | 30 m |
| miRNAFold70 |
Execution time of CID-miRNA, miRNAFold, miRPara and VMir for predicting pre-miRNAs in genomic sequences of 1 million nucleotides each, from the four species Human, Mouse, Zebrafish and Sea squirt. The values for miRNAFold were rounded to the second. The values for CID-miRNA, miRPara and VMir were rounded to the minute. The last column shows the average execution time for a sequence of 1 million nucleotides. Values in bold denote best results.
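To put these run times in perspective, the short sketch below converts the approximate figures quoted in the abstract (≈ 30 s for miRNAFold, 30 min for VMir, 20 h for miRPara, 55 h for CID-miRNA) into speed-up factors. The dictionary and function names are illustrative, and the inputs are the abstract's rounded values rather than the per-species measurements.

```python
# Approximate run times per 1 MB sequence, as quoted in the abstract, in seconds.
RUN_TIME_S = {
    "miRNAFold": 30,
    "VMir": 30 * 60,
    "miRPara": 20 * 3600,
    "CID-miRNA": 55 * 3600,
}


def speedup_over(tool, baseline="miRNAFold"):
    """How many times longer `tool` runs than `baseline` on the same input."""
    return RUN_TIME_S[tool] / RUN_TIME_S[baseline]


for tool in ("VMir", "miRPara", "CID-miRNA"):
    print(f"{tool}: about {speedup_over(tool):.0f}x slower than miRNAFold")
```

With these rounded inputs the factors come out at roughly 60, 2400 and 6600 respectively.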