Isha Monga, Indranil Banerjee.
Abstract
RATIONALE: PIWI-interacting RNAs (piRNAs) are a recently discovered class of small non-coding RNAs (ncRNAs) with a length of 21-35 nucleotides. They play a role in gene expression regulation, transposon silencing, and inhibition of viral infection. Once considered the "dark matter" of ncRNAs, piRNAs have emerged as important players in multiple cellular functions in different organisms. However, our knowledge of piRNAs is still very limited, as many piRNAs have not yet been identified owing to a lack of robust computational predictive tools.
Keywords: algorithm; classification; non-coding RNA; physicochemical; piRNA; prediction
Year: 2019 PMID: 32655289 PMCID: PMC7327968 DOI: 10.2174/1389202920666191129112705
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236
Fig. (1). Schematic illustration of the overall workflow adopted to develop piRNAPred: the left and right arms show the processing of piRNA and non-piRNA sequences from piRBase and NONCODE, respectively, to generate a dataset (D1684p+1665n), followed by their conversion into sequence, structure, thermodynamic, physicochemical and BINARY1+10 feature spaces and the development of predictive models. (A higher resolution / colour version of this figure is available in the electronic copy of the article).
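The sequence arm of this workflow can be illustrated with a minimal sketch of k-mer nucleotide composition. It assumes that each k-mer feature is the relative frequency of that k-mer among all overlapping windows of the sequence; the normalisation, the function names and the example sequence are illustrative assumptions, not details taken from the article. Concatenating k = 1 to 5 gives 4 + 16 + 64 + 256 + 1024 = 1364 values, which matches the feature count listed for MDTTP in the performance table.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

def kmer_composition(seq, k):
    """Relative frequency of every possible k-mer (assumed normalisation)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]

def mdttp_features(seq):
    """Concatenate mono- to penta-nucleotide compositions (k = 1..5)."""
    features = []
    for k in range(1, 6):
        features.extend(kmer_composition(seq, k))
    return features

# Hypothetical 28-nt sequence, used only to show the vector length.
example = "UGAGGUAGUAGGUUGUAUAGUUUGGAAU"
print(len(mdttp_features(example)))
```

The feature order is fixed by `itertools.product`, so every sequence maps to a vector with the same layout regardless of its length.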
Performance of different predictive models using SVM during 10-fold cross-validation.

| No. | Feature | Description | No. of features | Thres | TP | FP | TN | FN | Sn (%) | Sp (%) | Acc (%) | MCC | g | c | ROC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MONO | | 4 | 0 | 1426 | 867 | 798 | 258 | 84.7 | 47.93 | 66.41 | 0.35 | 1 | 5 | 0.75 |
| 2 | DI | | 16 | 0.1 | 1153 | 531 | 1134 | 531 | 68.5 | 68.11 | 68.29 | 0.37 | 5.00E-05 | 50 | 0.73 |
| 3 | TRI | | 64 | 0.1 | 1189 | 501 | 1164 | 495 | 70.6 | 69.91 | 70.26 | 0.41 | 5.00E-05 | 500 | 0.76 |
| 4 | TETRA | | 256 | 0 | 1126 | 282 | 1383 | 558 | 66.9 | 83.06 | 74.92 | 0.51 | 0.01 | 1 | 0.81 |
| 5 | PENTA | | 1024 | 0 | 1363 | 412 | 1253 | 321 | 80.9 | 75.26 | 78.11 | 0.56 | 0.0005 | 50 | 0.85 |
| 6 | MD | Hybrid | 20 | 0 | 1212 | 574 | 1091 | 472 | 72 | 65.53 | 68.77 | 0.38 | 1.00E-05 | 300 | 0.74 |
| 7 | MT | | 68 | 0.1 | 1165 | 484 | 1181 | 519 | 69.2 | 70.93 | 70.05 | 0.40 | 5.00E-05 | 100 | 0.75 |
| 8 | DT | | 80 | 0 | 1228 | 557 | 1108 | 456 | 72.9 | 66.55 | 69.75 | 0.40 | 0.0001 | 50 | 0.76 |
| 9 | MDT | | 84 | 0.1 | 1172 | 499 | 1166 | 512 | 69.6 | 70.03 | 69.81 | 0.40 | 1.00E-05 | 1000 | 0.75 |
| 10 | MDTT | | 340 | 0 | 1233 | 554 | 1111 | 451 | 73.2 | 66.73 | 69.99 | 0.40 | 0.0001 | 10 | 0.76 |
| 11 | MDTTP | | 1364 | -0.1 | 1308 | 572 | 1093 | 376 | 77.7 | 65.65 | 71.69 | 0.44 | 5.00E-05 | 300 | 0.79 |
| 12 | A | SSTE based features | 32 | 0 | 1575 | 38 | 1627 | 109 | 93.5 | 97.72 | 95.61 | 0.91 | 0.0001 | 10 | 0.70 |
| 13 | B | Thermodynamic energies of RNA dinucleotides | 16 | 0.1 | 1282 | 310 | 1355 | 402 | 76.1 | 81.38 | 78.74 | 0.58 | 0.05 | 250 | 0.86 |
| 14 | C | Physicochemical properties of RNA dinucleotides | 96 | 0 | 1301 | 388 | 1277 | 383 | 77.3 | 76.7 | 76.98 | 0.54 | 1.00E-05 | 5 | 0.85 |
| 15 | BINARY1+10 | Position-specific NT usage of 1st and 10th position (Binary) | 16 | 0 | 817 | 642 | 522 | 363 | 69.24 | 44.85 | 57.12 | 0.15 | 1.00E-05 | 100 | 0.35 |
| 16 | AB | Hybrids of SSTE, Thermodynamic energies of RNA dinucleotides and Physicochemical properties of RNA dinucleotides (ABC) with MDTTP | 48 | -0.3 | 1653 | 27 | 1638 | 31 | 98.2 | 98.38 | 98.27 | 0.97 | 0.0005 | 500 | 1.00 |
| 17 | AC | | 128 | 0 | 1468 | 230 | 1435 | 216 | 87.2 | 86.19 | 86.68 | 0.73 | 1.00E-05 | 1000 | 0.93 |
| 18 | BC | | 112 | 0 | 1301 | 388 | 1277 | 383 | 77.3 | 76.7 | 76.98 | 0.54 | 1.00E-05 | 5 | 0.85 |
| 19 | ABC | | 144 | -0.1 | 1502 | 275 | 1390 | 182 | 89.2 | 83.48 | 86.35 | 0.73 | 1.00E-05 | 500 | 0.93 |
| 20 | MDTTP + A | | 1396 | 0 | 1592 | 99 | 1566 | 92 | 94.5 | 94.05 | 94.3 | 0.89 | 5.00E-05 | 100 | 0.98 |
| 21 | MDTTP + B | | 1380 | 0 | 1436 | 333 | 1332 | 248 | 85.3 | 80 | 82.65 | 0.65 | 5.00E-05 | 200 | 0.90 |
| 22 | MDTTP + C | | 1460 | 0 | 1369 | 127 | 1538 | 315 | 81.3 | 92.37 | 86.8 | 0.74 | 0.01 | 5 | 0.89 |
| 23 | MDTTP + AB | Hybrid of | 1412 | 0 | 1608 | 71 | 1594 | 76 | 95.5 | 95.74 | 95.61 | 0.91 | 5.00E-05 | 100 | 0.99 |
| 24 | MDTTP + BC | | 1476 | 0.1 | 1065 | 9 | 1656 | 619 | 63.2 | 99.46 | 81.25 | 0.67 | 0.01 | 5 | 0.89 |
| 25 | MDTTP + AC | | 1492 | 0 | 1488 | 227 | 1438 | 196 | 88.4 | 86.37 | 87.37 | 0.75 | 1.00E-05 | 50 | 1.00 |
| 26 | MDTTP + ABC | | 1508 | 0.1 | 1660 | 23 | 1642 | 24 | 98.57 | 98.62 | 98.60 | 0.97 | 1.00E-05 | 50 | 0.99 |
| 27 | MDTTP+ | | 1516 | 0 | 1668 | 43 | 1622 | 16 | 99.05 | 97.42 | 98.24 | 0.96 | 1.00E-05 | 50 | 0.99 |
Abbreviations: Acc, Accuracy; diNT, dinucleotide; c, Regularization parameter; FP, False Positive; FN, False Negative; g, Gamma (a kernel density parameter); k-MF, k-Mer Features, or k-MNC, k-Mer Nucleotide Composition; MMP, Mismatch Profile; MCC, Matthews Correlation Coefficient; pseDNC, pseudo Dinucleotide Composition; PCPseDNC, Parallel Correlation Pseudo Dinucleotide Composition; PSSM, Position-Specific Scoring Matrix; ROC, Receiver Operating Characteristic; SSP, Subsequence Profile; Sn, Sensitivity; Sp, Specificity; Thres, threshold; TN, True Negative; TP, True Positive; A, Structure-Sequence Triplet Elements (SSTE); B, thermodynamic energies of contiguous dinucleotides; C, RNA physicochemical properties of adjoining dinucleotides (diNTs); AB, hybrid of SSTE and thermodynamic energies of contiguous diNTs; AC, hybrid of SSTE and RNA physicochemical properties of adjoining diNTs; ABC, hybrid of SSTE, thermodynamic energies and RNA physicochemical properties of contiguous diNTs.
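The Sn, Sp, Acc and MCC values in the table can be recomputed from the four confusion-matrix counts using the standard definitions. The sketch below assumes that the count columns appear in the order TP, FP, TN, FN, which is the order under which the tabulated sensitivity and specificity are reproduced; the function name is illustrative.

```python
from math import sqrt

def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics; Sn, Sp and Acc in percent."""
    sn = tp / (tp + fn)                      # sensitivity (recall on piRNAs)
    sp = tn / (tn + fp)                      # specificity (recall on non-piRNAs)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": 100 * sn, "Sp": 100 * sp, "Acc": 100 * acc, "MCC": mcc}

# Counts from the MDTTP + ABC row (1508 features), read as TP, FP, TN, FN.
metrics = confusion_metrics(tp=1660, fp=23, tn=1642, fn=24)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these counts the rounded values agree with the row as tabulated (Sn 98.57, Sp 98.62, Acc 98.60, MCC 0.97), and TP + FN = 1684 and TN + FP = 1665 match the dataset sizes given in the figure caption.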
Performance of 1508 features on different machine learning techniques during 10-fold cross-validation.
| No. | Method | TP | FN | TN | FP | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SVM ( | 1660 | 24 | 1642 | 23 | 98.57 | 98.62 | 98.60 | 0.97 |
| 2 | Random Forest | 1616 | 68 | 1647 | 18 | 95.96 | 98.92 | 97.43 | 0.95 |
| 3 | Bagging | 1611 | 73 | 1628 | 37 | 95.67 | 97.78 | 96.72 | 0.93 |
| 4 | Classification | 1586 | 98 | 1576 | 89 | 94.18 | 94.65 | 94.42 | 0.89 |
| 5 | J48 Pruned tree | 1553 | 131 | 1515 | 150 | 92.22 | 90.99 | 91.61 | 0.83 |
| 6 | Naive Bayes | 920 | 764 | 1362 | 303 | 54.63 | 81.8 | 68.14 | 0.38 |
| 7 | IbK | 1302 | 382 | 610 | 1055 | 77.32 | 36.64 | 57.09 | 0.15 |
Abbreviations: Acc, Accuracy; FP, False Positive; FN, False Negative; MCC, Matthews Correlation Coefficient; Sn, Sensitivity; Sp, Specificity; SVM, Support Vector Machines; TN, True Negative; TP, True Positive.
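For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how a 10-fold cross-validation partition could be built for 1684 positive and 1665 negative sequences. The stratified round-robin assignment, the shuffling seed and the function names are assumptions made for illustration; the record does not state how the folds were actually formed.

```python
import random

def stratified_kfold(n_pos, n_neg, k=10, seed=0):
    """Split sample indices into k folds with balanced class proportions."""
    rng = random.Random(seed)
    pos = list(range(n_pos))                   # indices of positive samples
    neg = list(range(n_pos, n_pos + n_neg))    # indices of negative samples
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for group in (pos, neg):
        for i, idx in enumerate(group):
            folds[i % k].append(idx)           # round-robin keeps classes even
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, holding out each fold once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

folds = stratified_kfold(n_pos=1684, n_neg=1665)
for train, test in train_test_splits(folds):
    print(len(train), len(test))
```

Each sample is held out exactly once, so pooling the predictions from the ten held-out folds gives one confusion matrix over all 3349 sequences, consistent with the row totals in the table.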
Comparison of piRNAPred with current state-of-the-art piRNA prediction methods.
| No. | Method | Features | Sn (%) | Sp (%) | Acc (%) | MCC | Reference |
|---|---|---|---|---|---|---|---|
| 1 | piRNA | k-MF (k=5, 1364) | 72.47 | 95.5 | NA | NA | (Zhang |
| 2 | PIANO | SSTE | 95.89 | 94.6 | 95.27 | NA | (Wang |
| 3 | Pibomd | SSP | 91.48 | 89.8 | 90.62 | NA | (Liu |
| 4 | Accurate piRNA prediction | k-MF, MMP, SSP, PSSM, pseDNC, SSTE | 83.10 | 82.10 | 82.6 | 0.651 | (Luo |
| 5 | GA-WE | k-MF, PCPseDNC, PSSM | 90.6 | 78.3 | 84.4 | 0.694 | (Li |
| 6 | 2L-piRNA | pseDNC, C | 88.3 | 83.9 | 86.1 | 0.723 | (Liu |
| 7 | piRNApred | k-MNC, SSTE, B and C | 98.57 | 98.6 | 98.6 | 0.97 | Algorithm proposed in the current study |
Abbreviations: Acc, Accuracy; k-MF, k-Mer Features, or k-MNC, k-Mer Nucleotide Composition, or SP, Spectrum Profile; MMP, Mismatch Profile; MCC, Matthews Correlation Coefficient; pseDNC, pseudo Dinucleotide Composition; PCPseDNC, Parallel Correlation Pseudo Dinucleotide Composition; PSSM, Position-Specific Scoring Matrix; SSP, Subsequence Profile; Sn, Sensitivity; Sp, Specificity; A, Structure-Sequence Triplet Elements (SSTE); B, thermodynamic energies of contiguous dinucleotides; C, RNA physicochemical properties of adjoining dinucleotides (diNTs).