Yen-Wei Chu, Kai-Po Chang, Chi-Wei Chen, Yu-Tai Liang, Zhi Thong Soh, Li-Ching Hsieh.
Abstract
MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression and biological processes by binding to messenger RNAs. Predicting the relationship between miRNAs and their targets is crucial for research and clinical applications. Many tools have been developed to predict miRNA-target interactions, but their results vary, which confuses users. To solve this problem, we developed miRgo, an application that integrates many of these tools. To train the prediction model, extreme values and median values from four different data combinations, which were obtained via an energy distribution function, were used to find the most representative dataset. Support vector machines were used to integrate 11 prediction tools, and the numerous feature types used in these tools were classified into six categories (binding energy, scoring function, evolution evidence, binding type, sequence property, and structure) to simplify feature selection. In addition, a novel evaluation indicator, the Chu-Hsieh-Liang (CHL) index, was developed to improve the prediction power in positive data for feature selection. miRgo achieved better results than all other prediction tools when evaluated on an independent testing set and on its subset of functionally important genes. The tool is available at http://predictor.nchu.edu.tw/miRgo.
Year: 2020 PMID: 32001758 PMCID: PMC6992741 DOI: 10.1038/s41598-020-58336-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of binding energy of miRNA–mRNA pairs for positive (red) and negative (blue) data.
Computational tools for predicting miRNA–target interactions.
| Tool | Input typeᵃ | Method | Availability | Year | Integrationᵇ |
|---|---|---|---|---|---|
| PicTar | m or g | Sequence complementarity, thermodynamics and statistical model | Web-based | 2005 | x |
| RNA22ᶜ | m and g | Sequence complementarity, thermodynamics and statistical model | Stand-alone | 2006 | ○ |
| RNAhybridᶜ | m and g | Thermodynamics and statistical model | Stand-alone | 2006 | ● |
| TargetScan | m and g | Sequence complementarity, thermodynamics | Web-based | 2007 | ● |
| PITAᶜ | m and g | Site accessibility, thermodynamics | Stand-alone | 2007 | ○ |
| miRandaᶜ | m and g | Sequence complementarity, thermodynamics | Stand-alone | 2008 | ● |
| miRDB | m or g | Machine learning (support vector machines) | Web-based | 2008 | x |
| RNAduplex | m and g | Thermodynamics and statistical model | Stand-alone | 2011 | ● |
| microT-CDSᶜ | m or g | Sequence complementarity, thermodynamics | Stand-alone | 2012 | ● |
| STarMirDB | m and g | Sequence complementarity, thermodynamics and statistical model | Web-based | 2013 | ● |
| PACCMIT-CDS | m and g | Sequence complementarity and statistical model | Web-based | 2013 | ○ |
| MBSTAR | m and g | Machine learning (support vector machines) | Stand-alone | 2013 | ○ |
| MiRTDL | m and g | Machine learning (convolutional neural network) | Web-based | 2016 | x |
| TarPmiR | m and g | Machine learning (random forest) | Stand-alone | 2016 | ○ |
ᵃ The required input information. m: microRNA, g: gene.
ᵇ Whether the tool is integrated in miRgo. ○: integrated in miRgo; ●: integrated in miRgo, with selected features; x: not integrated in miRgo.
ᶜ These tools also provide a web-based service, but miRgo uses the results generated by the stand-alone programs.
The features utilized in miRgo.
| Feature category | Featureᵃ |
|---|---|
| Energy | binding energy, minimum free energy, folding energy |
| Scoring function | mirSVR score, context score, RNA22 p-value, RNAhybrid p-value, logistic probability of the site, miTG score, PACCMIT-CDS p-value, binding probability, m/e motif |
| Evolution evidence | conservation, Pct |
| Binding type | gene start and end sites, microRNA start and end sites, seed type, binding position, binding site, seed match |
| Sequence property | alignment score, nucleotide composition, AU content |
| Structure | ΔGhybrid, ΔGnucl, ΔGtotal, ΔGduplex, ΔGopen, ΔΔG, accessibility |
ᵃ The description of each feature is listed in Supplementary Table S1.
Figure 2. Flowchart of the miRgo prediction system.
Figure 3. Model performance for various training sets based on different P/N ratios. For each P/N ratio, ten randomly sampled training sets were generated for performance evaluation using six indicators: Sn, Sp, Acc, MCC, the F1-score and the CHL-index.
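The sampling scheme behind Figure 3 can be illustrated with a minimal sketch. The source states only that ten randomly sampled training sets were generated per P/N ratio; the function name `sample_training_sets`, the `n_pos` parameter, the fixed seed, and the choice to subsample both classes are assumptions made for illustration, not the authors' procedure.

```python
import random

def sample_training_sets(positives, negatives, pn_ratio, n_pos, n_sets=10, seed=0):
    """Draw n_sets labelled training sets with a given positive/negative ratio.

    Hypothetical helper: n_pos positives are sampled, and the number of
    negatives is derived from the requested P/N ratio.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n_neg = round(n_pos / pn_ratio)    # e.g. P/N = 0.5 with 20 positives -> 40 negatives
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples for the requested P/N ratio")
    training_sets = []
    for _ in range(n_sets):
        pos = rng.sample(positives, n_pos)   # sampling without replacement
        neg = rng.sample(negatives, n_neg)
        # label 1 = interacting pair, label 0 = non-interacting pair
        training_sets.append([(x, 1) for x in pos] + [(x, 0) for x in neg])
    return training_sets

# Toy identifiers standing in for miRNA-mRNA pairs
positives = [f"pos_{i}" for i in range(100)]
negatives = [f"neg_{i}" for i in range(300)]
sets_half = sample_training_sets(positives, negatives, pn_ratio=0.5, n_pos=20)
```

Each of the ten sets would then be used to train and evaluate one model, so that the spread of each indicator across sets can be compared between ratios.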
Performance comparison of different miRNA–target interaction prediction methods for the trA set.
| Prediction method | Sn | Sp | Acc | F1-score | MCC | MCC′ | CHL-index |
|---|---|---|---|---|---|---|---|
| miRgo_trAᵃ | 0.9992 | 0.9981 | 0.9986 | 0.9986 | 0.9973 | 0.9986 | 0.9986 |
| RNA22 | 0.7608 | 0.8798 | 0.8203 | 0.8090 | 0.6452 | 0.8226 | 0.8172 |
| miRanda_0_0 | 0.1967 | 0.9691 | 0.5828 | 0.3204 | 0.2610 | 0.6305 | 0.4671 |
| miRanda_0_C | 0.0568 | 0.9903 | 0.5235 | 0.1065 | 0.1315 | 0.5657 | 0.2296 |
| miRanda_S_0 | 0.1298 | 0.9896 | 0.5596 | 0.2277 | 0.2337 | 0.6169 | 0.3846 |
| miRanda_S_C | 0.0479 | 0.9934 | 0.5206 | 0.0909 | 0.1270 | 0.5635 | 0.2041 |
| STMDB_3US | 0.2434 | 1.0000 | 0.6216 | 0.3915 | 0.3722 | 0.6861 | 0.5338 |
| STMDB_3ULS | 0.4552 | 0.7248 | 0.5900 | 0.5261 | 0.1869 | 0.5934 | 0.5681 |
| STMDB_CS | 0.0000 | 1.0000 | 0.4999 | nullᵇ | 0.0000 | 0.5000 | nullᵇ |
| STMDB_CLS | 0.4892 | 0.5226 | 0.5059 | 0.4975 | 0.0118 | 0.5059 | 0.5031 |
| STMDB_5US | 0.0228 | 1.0000 | 0.5113 | 0.0446 | 0.1074 | 0.5537 | 0.1145 |
| STMDB_5ULS | 0.4451 | 0.4863 | 0.4657 | 0.4545 | −0.0686 | 0.4657 | 0.4619 |
| TargetScan | 0.9668 | 0.9084 | 0.9376 | 0.9394 | 0.8767 | 0.9383 | 0.9384 |
| DIANA_microT | 0.2832 | 0.9988 | 0.6410 | 0.4410 | 0.4038 | 0.7019 | 0.5712 |
| PITA | 0.0978 | 0.9888 | 0.5432 | 0.1763 | 0.1906 | 0.5953 | 0.3263 |
| TarPmiR | 0.9610 | 0.9157 | 0.9384 | 0.9397 | 0.8776 | 0.9388 | 0.9390 |
| MBSTAR | 0.2902 | 0.7101 | 0.5001 | 0.3673 | 0.0003 | 0.5002 | 0.4463 |
| PACCMIT-CDS | 0.0761 | 0.9992 | 0.5376 | 0.1414 | 0.1959 | 0.5980 | 0.2829 |
ᵃ The miRgo_trA model was trained on the trA training data with 10-fold cross-validation.
ᵇ null: the F1-score and the CHL-index cannot be calculated because both TP and FP are zero in this case.
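For readers who want to reproduce the standard indicators in these tables, the following sketch computes them from confusion-matrix counts. Sn, Sp, Acc, F1 and MCC follow their usual definitions. The relation MCC′ = (MCC + 1)/2 is not stated in this excerpt; it is an assumption inferred from the tabulated values (for example, 0.6452 maps to 0.8226). The CHL-index formula is not given here, so it is deliberately left out. The function name `confusion_metrics` is hypothetical.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute Sn, Sp, Acc, F1, MCC and MCC' from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else None    # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else None    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)         # accuracy
    # F1 is undefined (null) when nothing is predicted positive: TP = FP = 0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp) else None
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is set to 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Assumption inferred from the tables: MCC rescaled from [-1, 1] to [0, 1]
    mcc_prime = (mcc + 1) / 2
    return {"Sn": sn, "Sp": sp, "Acc": acc, "F1": f1, "MCC": mcc, "MCC'": mcc_prime}

# A balanced toy example, and a predictor that never predicts a positive
balanced = confusion_metrics(tp=40, fp=10, tn=40, fn=10)
all_negative = confusion_metrics(tp=0, fp=0, tn=50, fn=50)
```

The second call mirrors the pattern of the STMDB_CS row: with no positive predictions, Sn is 0, Sp is 1, MCC is 0, and the F1-score is null.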
Performance comparison of the miRgo models with and without the feature selection (FS) procedure for the trA set.
| Model | Sn | Sp | Acc | F1-score | MCC | MCC′ | CHL-index |
|---|---|---|---|---|---|---|---|
| miRgo_trAᵃ | 0.9992 | 0.9981 | 0.9986 | 0.9986 | 0.9973 | 0.9986 | 0.9986 |
| miRgo_trA_FS-mRMRᵇ | 1.0000 | 0.9981 | 0.9990 | 0.9990 | 0.9981 | 0.9990 | 0.9990 |
| miRgo_trA_FS-CVAEᵇ | 1.0000 | 0.9977 | 0.9988 | 0.9988 | 0.9977 | 0.9988 | 0.9988 |
ᵃ miRgo_trA does not include the feature selection (FS) procedure.
ᵇ miRgo_trA_FS-mRMR and miRgo_trA_FS-CVAE use the mRMR and CVAttributeEval feature selection methods, respectively.
Figure 4. The incremental feature selection (IFS) curve of the combination of features. Features ranked by the mRMR method were added to the model one by one, from highest to lowest rank, and 184 models with different combinations of features were constructed and evaluated by the CHL index. The combination of the 11 most important features gives the maximum CHL index of 0.99903.
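The incremental feature selection loop described in the Figure 4 caption can be sketched as follows. This is a generic illustration, not the authors' code: `incremental_feature_selection` and `toy_evaluate` are hypothetical names, and the toy scoring function merely stands in for training a model on each feature subset and computing its CHL index.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Add features in rank order and keep the prefix with the best score.

    ranked_features: feature names, highest rank first (e.g. from mRMR).
    evaluate: callable mapping a list of features to a score (higher is better).
    """
    best_score, best_k = float("-inf"), 0
    curve = []                                    # one score per model: the IFS curve
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])     # model built on the top-k features
        curve.append(score)
        if score > best_score:                    # ties keep the smaller feature set
            best_score, best_k = score, k
    return ranked_features[:best_k], best_score, curve

def toy_evaluate(subset):
    # Placeholder score peaking at three features; a real run would train
    # a classifier on `subset` and return its evaluation indicator.
    return 1.0 - 0.1 * abs(len(subset) - 3)

ranked = ["f1", "f2", "f3", "f4", "f5", "f6"]
selected, best, curve = incremental_feature_selection(ranked, toy_evaluate)
```

With 184 ranked features, the same loop would build 184 models, and the returned prefix would correspond to the peak of the curve.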
Performance comparison of the miRgo models with and without the feature selection (FS) procedure for the independent test set.
| Model | Sn | Sp | Acc | F1-score | MCC | MCC′ | CHL-index |
|---|---|---|---|---|---|---|---|
| miRgo_trA | 0.7765 | 0.4066 | 0.7180 | 0.8226 | 0.1538 | 0.5769 | 0.6910 |
| miRgo_trA_FS-mRMR | 0.8840 | 0.2900 | 0.7900 | 0.8760 | 0.1810 | 0.5905 | 0.7316 |
| miRgo_trA_FS-CVAE | 0.9354 | 0.1411 | 0.8098 | 0.8923 | 0.1480 | 0.5524 | 0.7201 |
Performance comparison of different miRNA–target interaction prediction methods for the independent test set.
| Prediction method | Sn | Sp | Acc | F1-score | MCC | MCC′ | CHL-index |
|---|---|---|---|---|---|---|---|
| miRgoᵃ | 0.8840 | 0.2900 | 0.7900 | 0.8760 | 0.1810 | 0.5905 | 0.7316 |
| RNA22 | 0.3917 | 0.7593 | 0.4498 | 0.5453 | 0.1143 | 0.5571 | 0.5127 |
| miRanda_0_0 | 0.0109 | 0.9959 | 0.1666 | 0.0216 | 0.0250 | 0.5125 | 0.0552 |
| miRanda_0_C | 0.4517 | 0.6141 | 0.4774 | 0.5927 | 0.0484 | 0.5242 | 0.5273 |
| miRanda_S_0 | 0.0093 | 1.0000 | 0.1659 | 0.0185 | 0.0386 | 0.5193 | 0.0484 |
| miRanda_S_C | 0.4540 | 0.6058 | 0.4780 | 0.5943 | 0.0439 | 0.5220 | 0.5272 |
| STMDB_3US | 0.3419 | 0.7303 | 0.4033 | 0.4911 | 0.0560 | 0.5280 | 0.4680 |
| STMDB_3ULS | 0.6168 | 0.5394 | 0.6046 | 0.7243 | 0.1160 | 0.5580 | 0.6215 |
| STMDB_CS | 0.3084 | 0.7925 | 0.3849 | 0.4578 | 0.0809 | 0.5405 | 0.4523 |
| STMDB_CLS | 0.6589 | 0.5104 | 0.6354 | 0.7527 | 0.1280 | 0.5640 | 0.6417 |
| STMDB_5US | 0.0312 | 0.9834 | 0.1816 | 0.0602 | 0.0317 | 0.5159 | 0.1248 |
| STMDB_5ULS | 0.6098 | 0.4938 | 0.5915 | 0.7154 | 0.0769 | 0.5385 | 0.6066 |
| TargetScan | 0.5397 | 0.6307 | 0.5541 | 0.6709 | 0.1244 | 0.5622 | 0.5912 |
| DIANA_microT | 0.3076 | 0.7054 | 0.3705 | 0.4514 | 0.0103 | 0.5052 | 0.4352 |
| PITA | 0.0522 | 0.9876 | 0.2000 | 0.0990 | 0.0693 | 0.5346 | 0.1767 |
| TarPmiR | 0.7048 | 0.4896 | 0.6708 | 0.7829 | 0.1513 | 0.5757 | 0.6659 |
| MBSTAR | 0.3512 | 1.0000 | 0.4538 | 0.5199 | 0.2807 | 0.6404 | 0.5273 |
| PACCMIT-CDS | 0.0639 | 0.9461 | 0.2033 | 0.1189 | 0.0150 | 0.5075 | 0.1961 |
ᵃ miRgo, an abbreviation of miRgo_trA_FS-mRMR, was constructed by SVM with the mRMR feature selection method and trained on the trA training dataset.
Figure 5. Comparison of the CHL index of the different microRNA target site prediction methods on the functionally important gene sets. (A) Genes with a biological process annotation. (B) Genes with a molecular function annotation. (C) Genes with a cellular component annotation.