Ramkrishna Mitra, Sanghamitra Bandyopadhyay.
Abstract
BACKGROUND: Machine learning based miRNA-target prediction algorithms often fail to achieve balanced prediction accuracy in terms of both sensitivity and specificity, owing to the lack of gold-standard negative examples, of relevant features specific to the miRNA-targeting site context, and of an efficient feature selection process. Moreover, sequence-, structure- and machine learning-based algorithms alike are unable to place true positive predictions preferentially at the top of the ranked list, which makes them unreliable for biologists. In addition, these algorithms fail to obtain a useful combination of precision and recall for target transcripts that are translationally repressed at the protein level.

METHODOLOGY/PRINCIPAL FINDING: In this article, we introduce MultiMiTar, an efficient miRNA-target prediction system: a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method results mainly from the use of high quality negative examples and the selection of biologically relevant features specific to the miRNA-targeting site context. The features are selected using a novel feature selection technique, AMOSA-SVM, which integrates the multiobjective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM.
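AMOSA keeps an archive of non-dominated solutions. The sketch below illustrates only that dominance idea, not the simulated annealing search itself. It assumes each candidate feature subset is scored on two objectives that are both maximized; the abstract does not name the objectives, so the pair used here (sensitivity, specificity) and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical objective pair (e.g. sensitivity, specificity); both are maximized.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    better = any(x > y for x, y in zip(a, b))
    return no_worse and better


def non_dominated(scores: List[Score]) -> List[Score]:
    """Keep only the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Toy scores for four candidate feature subsets (illustrative values only).
candidates = [(0.80, 0.70), (0.75, 0.78), (0.70, 0.65), (0.80, 0.60)]
front = non_dominated(candidates)
```

In this toy run the first two candidates survive, because neither is better than the other on both objectives at once.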
Year: 2011 PMID: 21949731 PMCID: PMC3174180 DOI: 10.1371/journal.pone.0024583
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Different miRNA-mRNA seed-site interaction patterns (6mer, 7mer-A1, 7mer-m8 and 8mer).
Watson-Crick complementary regions can occur in the miRNA seed and in the out-seed part.
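The four seed-site patterns named in the caption can be told apart with a few string comparisons. The sketch below assumes the commonly used definitions (6mer: pairing to miRNA positions 2-7; 7mer-m8: additionally position 8; 7mer-A1: an adenosine opposite miRNA position 1; 8mer: both), allows Watson-Crick pairs only, and takes both sequences written 5'→3'. The function names are hypothetical and this is not the feature extraction code of MultiMiTar.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site_type(mirna: str, utr: str, start: int) -> str:
    """Classify the site whose 6-nt core begins at utr[start].

    Returns "8mer", "7mer-m8", "7mer-A1", "6mer", or "none".
    """
    core = reverse_complement(mirna[1:7])  # pairs with miRNA positions 2-7
    if utr[start:start + 6] != core:
        return "none"
    # miRNA position 8 pairs with the UTR base just upstream (5') of the core.
    m8 = start >= 1 and utr[start - 1] == COMPLEMENT[mirna[7]]
    # An A just downstream (3') of the core sits opposite miRNA position 1.
    a1 = start + 6 < len(utr) and utr[start + 6] == "A"
    if m8 and a1:
        return "8mer"
    if m8:
        return "7mer-m8"
    if a1:
        return "7mer-A1"
    return "6mer"


# let-7a used purely as an example miRNA; the UTR fragments are made up.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_site_type(LET7A, "GGCUACCUCAGG", 3)
```

Scanning a whole 3'UTR would amount to calling `seed_site_type` at every start index and keeping the positions that do not return "none".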
miRNA-targeting site context specific relevant features used in MultiMiTar.
| Feature number | Feature name | Common features |
| Category 1 | ||
| 6 | Number of additional Watson-Crick pairings associated with effective 7mer-m8 | * |
| Frequency of single nucleotides in seed matching out site (Category 3) | ||
| 19 | G’s frequency in effective seed matching out site | |
| Frequency of di-nucleotides in seed matching site (Category 4) | ||
| 22 | AU’s frequency in effective seed matching site | |
| 24 | AC’s frequency in effective seed matching site | |
| 25 | UA’s frequency in effective seed matching site | |
| 26 | UU’s frequency in effective seed matching site | * |
| 28 | UC’s frequency in effective seed matching site | * |
| 30 | GU’s frequency in effective seed matching site | * |
| 32 | GC’s frequency in effective seed matching site | |
| 35 | CG’s frequency in effective seed matching site | * |
| 36 | CC’s frequency in effective seed matching site | * |
| Frequency of Di-nucleotides in seed matching out site (Category 5) | ||
| 38 | AU’s frequency in effective seed matching out site | |
| 39 | AG’s frequency in effective seed matching out site | * |
| 40 | AC’s frequency in effective seed matching out site | * |
| 42 | UU’s frequency in effective seed matching out site | * |
| 44 | UC’s frequency in effective seed matching out site | * |
| 45 | GA’s frequency in effective seed matching out site | |
| 47 | GG’s frequency in effective seed matching out site | |
| 48 | GC’s frequency in effective seed matching out site | |
| miRNA-mRNA base interaction features in seed region (Category 6) | ||
| 53 | Frequency of AU base pair | * |
| 54 | Frequency of UA base pair | |
| 56 | Frequency of GC base pair | * |
| 57 | Frequency of GU base pair | * |
| 58 | Frequency of CG base pair | * |
| Two consecutive miRNA-mRNA base interaction features in seed region (Bi-Di-nucleotide base pairing) (Category 7) | ||
| 59 | Frequency of AU-AU | * |
| 62 | Frequency of AU-CG | * |
| 64 | Frequency of AU-UG | * |
| 65 | Frequency of UA-AU | |
| 67 | Frequency of UA-GC | |
| 68 | Frequency of UA-CG | * |
| 69 | Frequency of UA-GU | |
| 70 | Frequency of UA-UG | * |
| 73 | Frequency of GC-GC | |
| 74 | Frequency of GC-CG | |
| 78 | Frequency of CG-UA | * |
| 79 | Frequency of CG-GC | * |
| 83 | Frequency of GU-AU | |
| 84 | Frequency of GU-UA | * |
| 86 | Frequency of GU-CG | |
The features are selected using the novel feature selection algorithm AMOSA-SVM. Within each category, common features, i.e. those selected by at least 90% of the non-dominated solutions in the archive, are marked with ‘*’.
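The 90% rule in the note above can be sketched as a simple count over the archive. The example assumes each non-dominated solution is represented as a set of selected feature numbers; the function name and the toy archive are hypothetical and do not reproduce the actual archive.

```python
from collections import Counter
from typing import Iterable, List, Set


def common_features(archive: Iterable[Set[int]], min_fraction: float = 0.9) -> List[int]:
    """Return features selected by at least min_fraction of the archived solutions."""
    solutions = list(archive)
    if not solutions:
        return []
    counts = Counter(feature for sol in solutions for feature in sol)
    n = len(solutions)
    return sorted(f for f, c in counts.items() if c / n >= min_fraction)


# Toy archive of 10 solutions: feature 6 appears in 10, feature 26 in 9, feature 19 in 5.
toy_archive = [{6, 26, 19}] * 5 + [{6, 26}] * 4 + [{6}]
starred = common_features(toy_archive)
```

Here features 6 and 26 meet the threshold, while feature 19 (present in half of the solutions) does not.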
Figure 2. Scatter plot of the true positive rate versus the false positive rate for different algorithms.
The plot is based on the independent test data set.
Performance of MultiMiTar and existing target prediction methods on independent test data set.
| Method | MCC | ACA |
| MultiMiTar | 0.583 | 0.800 |
| TargetMiner | 0.403 | 0.730 |
| PITA | 0.155 | 0.549 |
| TargetScan | 0.135 | 0.582 |
| miRanda | 0.128 | 0.570 |
| NBmiRTar | 0.083 | 0.550 |
| MirTarget2 | 0.052 | 0.495 |
| PicTar | −0.006 | 0.496 |
| DIANA MicroT 3.0 | −0.013 | 0.498 |
| RNAhybrid | −0.029 | 0.487 |
| MicroInspector | −0.216 | 0.378 |
| RNA22 | −0.269 | 0.321 |
| TargetSpy no-seed sens | 0.209 | 0.560 |
| TargetSpy no-seed spec | 0.209 | 0.560 |
| TargetSpy seed sens | 0.234 | 0.557 |
| TargetSpy seed spec | 0.234 | 0.557 |
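Both columns of the table can be computed from a confusion matrix. The sketch below uses the standard Matthews correlation coefficient (MCC) and assumes that ACA stands for average class-wise accuracy, taken here as the mean of sensitivity and specificity; that reading of ACA is an assumption, since the table does not define it. The counts are made up and are not the paper's test set.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def aca(tp: int, fp: int, tn: int, fn: int) -> float:
    """Assumed definition: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical confusion matrix: 100 positives and 100 negatives.
example_mcc = mcc(tp=80, fp=30, tn=70, fn=20)
example_aca = aca(tp=80, fp=30, tn=70, fn=20)
```

Under this definition a classifier that predicts at random scores close to 0 on MCC and close to 0.5 on ACA, which matches the pattern of the weaker methods in the table.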
Category-wise feature selection ratio for TargetMiner and MultiMiTar.
| Feature category | Total feat. | Corr-coeff | TargetMiner: No. of feat. | TargetMiner: Ratio (%) | MultiMiTar: No. of feat. | MultiMiTar: Ratio (%) | Common feat. in archive: No. of feat. | Common feat. in archive: Ratio (%) |
| 1 | 12 | 0.964 | 5 | 41.67 | 1 | 8.33 | 1 | 8.33 |
| 2 | 4 | 0.90 | 2 | 50 | 0 | 0 | 0 | 0 |
| 3 | 4 | 0.984 | 2 | 50 | 1 | 25 | 0 | 0 |
| 4 | 16 | 0.734 | 3 | 18.75 | 9 | 56.25 | 5 | 31.25 |
| 5 | 16 | 0.976 | 11 | 68.75 | 8 | 50 | 4 | 25 |
| 6 | 6 | 0.865 | 3 | 50 | 5 | 83.33 | 4 | 66.67 |
| 7 | 32 | 0.784 | 4 | 12.5 | 15 | 46.87 | 8 | 25 |
| Total | 90 | | 30 | 33.33 | 39 | 43.33 | 22 | 24.44 |
Figure 3. Performance comparison of several miRNA target prediction algorithms on the pSILAC data.
Proteins with a log2-fold change < −0.2 are considered targets.
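The threshold in the caption turns the proteomics measurements into a reference set against which precision and recall can be computed. A minimal sketch, assuming a mapping from gene name to log2-fold change and a set of predicted targets; the gene names, values and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def label_targets(log2_fc: Dict[str, float], threshold: float = -0.2) -> Set[str]:
    """Genes whose protein log2-fold change is strictly below the threshold."""
    return {gene for gene, fc in log2_fc.items() if fc < threshold}


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a predicted target set against the labeled targets."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


# Toy measurements (illustrative values only).
fold_changes = {"geneA": -0.5, "geneB": -0.1, "geneC": -0.3, "geneD": 0.2}
targets = label_targets(fold_changes)
precision, recall = precision_recall({"geneA", "geneB"}, targets)
```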
Figure 4. Distribution of the predictions of MultiMiTar and other algorithms in recognizing biologically validated miRNA-CDKN1A interactions.
The plots show that MultiMiTar obtains the most preferential distribution, which tends to be shifted towards the top 20th percentile compared to the other algorithms.
Pairwise comparisons between different ranked lists distributed preferentially (MultiMiTar) or uniformly (rest of the algorithms).
| | MultiMiTar | miRanda | NBmiRTar | PITA | RNAhybrid | DIANA-microT 3.0 |
| miRanda | 1.73×10⁻³ | – | – | – | – | – |
| NBmiRTar | 4.91×10⁻³ | 0.34 | – | – | – | – |
| PITA | 3.45×10⁻³ | 0.15 | 0.34 | – | – | – |
| RNAhybrid | 1.57×10⁻³ | 0.43 | 0.45 | 0.06 | – | – |
| DIANA-microT 3.0 | 1.32×10⁻² | 0.50 | 0.50 | 0.23 | 0.42 | – |
| TargetMiner | 4.2×10⁻² | 3.72×10⁻² | 0.12 | 0.15 | 0.10 | 0.13 |
P-values are obtained by the Wilcoxon rank sum test.
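For orientation, a rank sum test on two lists of ranks can be sketched as follows. The source does not say whether the test was one- or two-sided, exact or approximate; this example assumes a two-sided test with the normal approximation, without tie or continuity correction, and the function names are hypothetical. A statistics package would normally be used in practice.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value from the normal approximation (assumed variant)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U for sample x
    mean = n1 * n2 / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12        # no tie correction
    z = (u - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


# Toy data: one list concentrated at the top ranks, one at the bottom.
p_separated = rank_sum_test(range(1, 11), range(11, 21))
p_interleaved = rank_sum_test([1, 4, 5, 8], [2, 3, 6, 7])
```

A small p-value, as in the MultiMiTar column of the table, indicates that the two ranked lists are unlikely to come from the same distribution.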
Figure 5. Comparison between MultiMiTar and TargetMiner based on ranking results for true positive examples.