Anouar Boucheham, Vivien Sommard, Farida Zehraoui, Adnane Boualem, Mohamed Batouche, Abdelhafid Bendahmane, David Israeli, Fariza Tahi.
Abstract
Many computational tools have been proposed during the last two decades for predicting piRNAs, which are molecules with an important role in post-transcriptional gene regulation. However, these tools are mostly based on a single feature that is generally related to the sequence. Research on piRNAs is still at an early stage, and recent publications have revealed many new properties. Here, we propose an integrative approach for piRNA prediction that examines several types of genomic and epigenomic properties that can be used to characterize these molecules. We reviewed the literature and extracted a large number of piRNA features that have been observed experimentally in several species. These features are represented by different kernels in a Multiple Kernel Learning based approach, implemented within an object-oriented framework. The resulting tool, called IpiRId, attains an accuracy of more than 90% on the tested species (human, mouse and fly), outperforming all existing tools. In addition, our method makes it possible to study the validity of each given feature in a given species. Finally, the developed tool is modular and easily extensible, and can be adapted for predicting other types of ncRNAs. The IpiRId software and the user-friendly web server of our tool are freely available to academic users at: https://evryrna.ibisc.univ-evry.fr/evryrna/.
Year: 2017 PMID: 28622364 PMCID: PMC5473586 DOI: 10.1371/journal.pone.0179787
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Relationship between piRNA biogenesis (transcription, processing and function) and measured features.
(i) piRNA clusters can be transcribed if a particular methylated histone (fly) or an A-Myb promoter (mouse) is nearby; (ii) G-quadruplexes could have a role in piRNA processing; and (iii) the first and tenth piRNA bases (U and A, respectively) form an important binding zone for Argonaute proteins, participating in a ping-pong cycle in which the piRNA sequences bind to transposons.
Biological features of piRNAs across species.
| Feature | Species | References |
|---|---|---|
| First Uridine | Fly, Mouse, Human, Rat, Nematode (C. elegans), Zebrafish and Silkworm (Bombyx mori) | [ |
| Tenth Adenine | Human, Fly, Mouse, Zebrafish and Silkworm (Bombyx mori) | [ |
| Occurrence in clusters | Mammals and Insects | [ |
| Binding with transposons | Mammals and Insects | [ |
| CpG islands | Mammals | [ |
| G-Quadruplex | Human, Mouse, Rat and Macaque | [ |
| Transposable elements presence | Mouse and Marmoset | [ |
| Promoter A-Myb | Mouse | [ |
| Inverted repeats | Mouse | [ |
| Distance to centromeres/telomeres | Fly | [ |
| Histone methylation | Fly | [ |
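The first two rows of the table (a uridine at position 1 and an adenine at position 10) are simple positional checks on the sequence. The following is a minimal sketch of how such indicators could be computed; the function name `u1_a10_features` is hypothetical and is not taken from the IpiRId software.

```python
def u1_a10_features(seq: str) -> tuple[int, int]:
    """Return (U at position 1, A at position 10) as 0/1 indicators.

    Hypothetical helper for illustration; positions are 1-based as in the table.
    """
    # Accept reads written in the DNA alphabet by mapping T to U.
    s = seq.upper().replace("T", "U")
    has_u1 = int(len(s) >= 1 and s[0] == "U")
    has_a10 = int(len(s) >= 10 and s[9] == "A")  # too-short reads give 0
    return has_u1, has_a10


if __name__ == "__main__":
    for read in ["UGCAUGCAUAGC", "TGCA", "AGCAUGCAUG"]:
        print(read, u1_a10_features(read))
```

Sequences shorter than ten bases simply score 0 for the tenth-adenine indicator, which is one possible convention; a real pipeline might instead filter such reads out.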
Fig 2. The different kernel classes defined in IpiRId and their hierarchical organisation.
IpiRId’s kernel instantiation.
(D: distance; L: minimal length).
| Kernel | Class | Instantiation parameters |
|---|---|---|
| U1\|A10 | Specific motifs inside | {motif, position}: {U, 1}, {A, 10} |
| K-merFreq | K-mer frequencies | |
| K-merPos | K-mer positions | |
| TE binding | Binding with targets | target: transposable elements (TE) |
| CentroTelo | Specific positions | observation: centromeres, telomeres |
| Histone | Specific positions | observation: H3K9me3, H3K27me3 |
| Cluster | Clusters | |
| A-Myb | Promoters upstream | promoter: A-Myb |
| G-Quadruplex | G-quadruplex | |
| CpG islands | CpG islands | |
| LINE\|SINE\|LTR | Transposons | TE: LINE, SINE, LTR |
| InvertRep | Inverted repeats | |
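To illustrate how several per-feature kernels can be merged into one similarity measure, here is a minimal sketch that combines a motif kernel and a k-mer frequency kernel as a weighted sum. It rests on several assumptions: the function names are hypothetical, the k-mer kernel is a cosine-normalised spectrum kernel chosen for illustration, and the weights are fixed by hand, whereas a Multiple Kernel Learning method would learn them from training data. It is not the IpiRId implementation.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_kernel(a, b, k=3):
    """Cosine similarity between k-mer count vectors (spectrum-style kernel)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca)
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def motif_kernel(a, b):
    """Fraction of the two motif indicators (U at 1, A at 10) on which a and b agree."""
    def flags(s):
        return (s[:1] == "U", s[9:10] == "A")
    fa, fb = flags(a), flags(b)
    return sum(x == y for x, y in zip(fa, fb)) / len(fa)


def combined_kernel(a, b, weights=(0.5, 0.5)):
    """Weighted sum of the base kernels; the weights here are assumed, not learned."""
    w_motif, w_kmer = weights
    return w_motif * motif_kernel(a, b) + w_kmer * kmer_kernel(a, b)


def gram_matrix(seqs, weights=(0.5, 0.5)):
    """Pairwise combined-kernel matrix, usable by a classifier that accepts precomputed kernels."""
    return [[combined_kernel(a, b, weights) for b in seqs] for a in seqs]


if __name__ == "__main__":
    toy = ["UGCAUGCAUAGC", "AGGCUUAGCG", "UUUUUUUUUA"]  # made-up sequences
    for row in gram_matrix(toy):
        print([round(v, 3) for v in row])
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is what allows one kernel per feature to be combined and lets the learned weights be read as a rough indication of each feature's contribution.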
The downloaded data used in our integrative approach for piRNA identification across species.
| Species | Positive: piRNA | Negative: tRNA | Negative: miRNA | Negative: exonic regions | ChIP-seq: H3K9me3 | ChIP-seq: H3K27me3 | Transposons | Reference genome assembly |
|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 32 208 | 449 | 1 747 | 9 113 | 6 346 007 | 8 968 536 | 903 140 | hg38 |
| Mus musculus | 39 986 | 244 | 712 | 4 896 | 2 751 | 1 232 402 | 3 504 253 | mm10 |
| Drosophila melanogaster | 18 508 | 93 | 288 | 740 | 508 | 2 322 | 803 255 | dm6 |
Performance comparison.
5-fold cross-validation results of IpiRId and other existing tools according to: Accuracy (Acc), Sensitivity (Se), Specificity (Sp), Precision (Pre) and F1 score (F1).
| Tool | Human Acc | Human Se | Human Sp | Human Pre | Human F1 | Mouse Acc | Mouse Se | Mouse Sp | Mouse Pre | Mouse F1 | Fly Acc | Fly Se | Fly Sp | Fly Pre | Fly F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| piRNApredictor | 71.85 ± 1.53 | 48.40 | | | 63.30 | 70.95 ± 1.15 | 47.79 | 94.10 | 89.01 | 62.19 | 52.17 ± 3.72 | 63.90 | 40.45 | 51.76 | 57.19 |
| Piano | 50 | 0 | 100 | 0 | 0 | 50 | 0 | 100 | 0 | 0 | 87.9 ± 1.472 | 78.90 | 96.90 | 96.22 | 86.70 |
| Pibomd | 78.13 ± 1.38 | 78.05 | 78.21 | 78.17 | 78.11 | 79.13 ± 1.19 | 79.43 | 78.82 | 78.94 | 79.18 | 66.08 ± 4.02 | 70.44 | 61.72 | 64.78 | 67.94 |
| piRPred | 81.20 ± 1.25 | 80.54 | 81.86 | 81.67 | 81.07 | 90.92 ± 0.51 | 90.36 | 91.48 | 91.39 | 90.87 | 86.36 ± 2.33 | 86 | 86.72 | 86.66 | 86.30 |
| IpiRId | 89.62 | 89.73 | | | | | | | | | | | | | |
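The five reported measures all follow from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the usage example are hypothetical. The counts are chosen, assuming equal numbers of positive and negative test sequences, so that Se = 47.79% and Sp = 94.10%, as in the piRNApredictor mouse entries.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, specificity, precision and F1 from confusion counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    se = ratio(tp, tp + fn)    # sensitivity (recall)
    sp = ratio(tn, tn + fp)    # specificity
    pre = ratio(tp, tp + fp)   # precision
    return {
        "acc": ratio(tp + tn, tp + fn + tn + fp),
        "se": se,
        "sp": sp,
        "pre": pre,
        "f1": ratio(2 * pre * se, pre + se),
    }


if __name__ == "__main__":
    # Hypothetical balanced test set: 10 000 positives and 10 000 negatives.
    m = binary_metrics(tp=4779, fn=5221, tn=9410, fp=590)
    print({name: round(100 * value, 2) for name, value in m.items()})
```

Under this balanced-set assumption, the derived accuracy, precision and F1 come out close to the 70.95, 89.01 and 62.19 listed for that tool in mouse, which suggests the columns are mutually consistent. The zero-denominator guard also explains entries such as Piano's precision and F1 of 0 when no sequence is predicted positive.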
Fig 3. ROC space and plots of the 5-fold cross-validation results of IpiRId and other tools across species, with fixed parameters.
Fig 4. IpiRId prediction results on piRNA and pseudo-piRNA sequences across species.
Fig 5. Relevance of IpiRId’s features across species: mouse, human and fly.