Wenlong Xu, Anthony San Lucas, Zixing Wang, Yin Liu.
Abstract
BACKGROUND: Currently available microRNA (miRNA) target prediction algorithms require the presence of a conserved seed match to the 5' end of the miRNA and limit the target sites to the 3' untranslated regions of mRNAs. However, it has been noted that these requirements may be too stringent, leading to a substantial number of missed targets.
Year: 2014 PMID: 25077573 PMCID: PMC4110731 DOI: 10.1186/1471-2105-15-S7-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Types of seed matches. Five different types of seed matches used in our study, including canonical seed match types 2t8A1, 2t8, 2t7A1 and 2t7, and a non-canonical 1t8GU wobble type.
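The four canonical types named in the caption can be illustrated with a short sketch. It assumes the usual reading of the labels, which this record does not spell out: 2t7 and 2t8 mean Watson-Crick pairing to miRNA positions 2 to 7 and 2 to 8, and A1 means an adenine in the target opposite miRNA position 1. The function names are hypothetical, and the non-canonical 1t8GU type is left out because its exact definition is not given here.

```python
# Sketch only: classifies canonical seed matches under the assumptions above.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_canonical_sites(mirna, target):
    """Return (start, type) for every match to miRNA positions 2-7.

    `start` is the 0-based index in `target` of the 6-nt core match.
    The target base pairing with miRNA position 8 lies just 5' of the core;
    the base opposite miRNA position 1 lies just 3' of it.
    """
    core = reverse_complement(mirna[1:7])   # partner of positions 2-7
    partner_of_8 = COMPLEMENT[mirna[7]]     # partner of position 8
    sites = []
    start = target.find(core)
    while start != -1:
        has_m8 = start > 0 and target[start - 1] == partner_of_8
        has_a1 = start + 6 < len(target) and target[start + 6] == "A"
        if has_m8 and has_a1:
            kind = "2t8A1"
        elif has_m8:
            kind = "2t8"
        elif has_a1:
            kind = "2t7A1"
        else:
            kind = "2t7"
        sites.append((start, kind))
        start = target.find(core, start + 1)
    return sites
```

Each site is reported once, under the most specific type it satisfies, so a 2t8A1 site is not also counted as 2t8 or 2t7.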
Figure 2. Signal-to-noise ratio. Signal-to-noise ratio for five different types of seed matches in four different gene regions.
Table 1. Signal-to-noise ratio, weight and proportion of different types of seed matches in different regions.
| Metric | Region | 2t8A1 | 2t8 | 2t7A1 | 2t7 | 1t8GU |
|---|---|---|---|---|---|---|
| Number of matches (miRWalk) | Promoters | 1,235 | 3,105 | 2,657 | 7,296 | 3,734 |
| | 5'UTRs | 171 | 628 | 463 | 1,594 | 730 |
| | CDSs | 1,153 | 3,116 | 2,729 | 6,494 | 3,343 |
| | 3'UTRs | 2,366 | 4,103 | 3,204 | 6,920 | 3,797 |
| Number of matches (Average of 50 times random shuffle) | Promoters | 911 | 2,639 | 2,422 | 7,014 | 3,068 |
| | 5'UTRs | 141 | 489 | 391 | 1,415 | 547 |
| | CDSs | 904 | 2,331 | 2,331 | 5,997 | 2,655 |
| | 3'UTRs | 1,069 | 2,593 | 2,513 | 6,060 | 3,175 |
| Signal-to-noise ratio | Promoters | 1.355 | 1.177 | 1.097 | 1.040 | 1.217 |
| | 5'UTRs | 1.213 | 1.284 | 1.184 | 1.126 | 1.333 |
| | CDSs | 1.275 | 1.337 | 1.171 | 1.083 | 1.259 |
| | 3'UTRs | 2.214 | 1.583 | 1.275 | 1.142 | 1.196 |
| Weight | Promoters | 0.293 | 0.146 | 0.080 | 0.033 | 0.179 |
| | 5'UTRs | 0.176 | 0.234 | 0.151 | 0.104 | 0.275 |
| | CDSs | 0.227 | 0.277 | 0.141 | 0.068 | 0.214 |
| | 3'UTRs | 1.000 | 0.480 | 0.227 | 0.117 | 0.161 |
| Proportion (miRWalk) | Promoters | 7% | 17% | 15% | 40% | 21% |
| | 5'UTRs | 5% | 18% | 13% | 44% | 20% |
| | CDSs | 7% | 19% | 16% | 39% | 20% |
| | 3'UTRs | 12% | 20% | 16% | 34% | 19% |
| Proportion (Average of 50 times shuffle) | Promoters | 6% | 16% | 15% | 44% | 19% |
| | 5'UTRs | 5% | 16% | 13% | 47% | 18% |
| | CDSs | 6% | 16% | 16% | 42% | 19% |
| | 3'UTRs | 7% | 17% | 16% | 39% | 21% |
The number of matches in the miRWalk dataset is the observed count of each seed match type in each target region. The number of matches from random shuffles is the average count of each seed match type over 50 randomly shuffled mRNA sequences. The signal-to-noise ratio is the ratio of these two numbers. The weight is then calculated using equation (1). The proportion is the percentage of each seed match type within each target region.
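The derived rows of the table can be recomputed from the two count blocks, as in the sketch below. Equation (1) is not reproduced in this record, so the weight formula used here, (SNR - 1) / (SNR_max - 1) with SNR_max taken over all regions and types, is an assumption inferred from the tabulated values, which it matches to within rounding. The function and variable names are hypothetical.

```python
# Counts copied from the table; column order is 2t8A1, 2t8, 2t7A1, 2t7, 1t8GU.
OBSERVED = {
    "Promoters": [1235, 3105, 2657, 7296, 3734],
    "5'UTRs": [171, 628, 463, 1594, 730],
    "CDSs": [1153, 3116, 2729, 6494, 3343],
    "3'UTRs": [2366, 4103, 3204, 6920, 3797],
}
SHUFFLED = {
    "Promoters": [911, 2639, 2422, 7014, 3068],
    "5'UTRs": [141, 489, 391, 1415, 547],
    "CDSs": [904, 2331, 2331, 5997, 2655],
    "3'UTRs": [1069, 2593, 2513, 6060, 3175],
}


def signal_to_noise(observed, shuffled):
    """Observed count divided by the mean shuffled count, per region and type."""
    return {
        region: [o / s for o, s in zip(observed[region], shuffled[region])]
        for region in observed
    }


def weights(snr):
    """ASSUMED form of equation (1): scale SNR - 1 so the largest SNR gets 1."""
    top = max(value for row in snr.values() for value in row)
    return {
        region: [(value - 1) / (top - 1) for value in row]
        for region, row in snr.items()
    }


def proportions(counts):
    """Share of each seed match type within its region."""
    return {
        region: [c / sum(row) for c in row]
        for region, row in counts.items()
    }
```

Because the published shuffled counts are rounded averages, values recomputed this way can differ from the table in the last digit.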
Figure 3. Performance comparison of different miRNA target prediction methods. The fraction of predicted targets with down-regulated protein production in the pSILAC dataset.
Table 2. Importance of different features.
| Features | Importance |
|---|---|
| number of all kinds of seed matches | 23.28 |
| | 79.11 |
| ΔΔG | 78.01 |
| frequency of outseed A composition | 67.96 |
| frequency of outseed C composition | 47.08 |
| frequency of outseed G composition | 49.72 |
| frequency of outseed U composition | 59.65 |
| frequency of outseed AA composition | 60.37 |
| frequency of outseed AC composition | 50.38 |
| frequency of outseed AG composition | 52.15 |
| frequency of outseed AU composition | 62.19 |
| frequency of outseed CA composition | 52.26 |
| frequency of outseed CC composition | 47.73 |
| frequency of outseed CG composition | 36.09 |
| frequency of outseed CU composition | 48.86 |
| frequency of outseed GA composition | 50.30 |
| frequency of outseed GC composition | 46.81 |
| frequency of outseed GG composition | 50.71 |
| frequency of outseed GU composition | 51.48 |
| frequency of outseed UA composition | 55.21 |
| frequency of outseed UC composition | 49.67 |
| frequency of outseed UG composition | 53.42 |
| frequency of outseed UU composition | 53.22 |
| frequency of seed A composition | 23.54 |
| frequency of seed C composition | 19.62 |
| frequency of seed G composition | 9.80 |
| frequency of seed U composition | 18.49 |
| frequency of seed AA composition | 3.72 |
| frequency of seed AC composition | 6.82 |
| frequency of seed AG composition | 3.41 |
| frequency of seed AU composition | 7.42 |
| frequency of seed CA composition | 10.57 |
| frequency of seed CC composition | 6.14 |
| frequency of seed CG composition | 0.35 |
| frequency of seed CU composition | 7.80 |
| frequency of seed GA composition | 1.01 |
| frequency of seed GC composition | 9.82 |
| frequency of seed GG composition | 0.92 |
| frequency of seed GU composition | 5.35 |
| frequency of seed UA composition | 12.01 |
| frequency of seed UC composition | 5.67 |
| frequency of seed UG composition | 9.15 |
| frequency of seed UU composition | 8.86 |
| frequency of seed AU nucleotide base pairing | 28.55 |
| frequency of seed UA nucleotide base pairing | 15.95 |
| frequency of seed GC nucleotide base pairing | 7.54 |
| frequency of seed CG nucleotide base pairing | 19.06 |
| frequency of seed GU nucleotide base pairing | 3.21 |
| frequency of seed UG nucleotide base pairing | 6.05 |
| Frequency of seed AU-AU Bi-Di-nucleotide base pairing | 3.75 |
| Frequency of seed AU-UA Bi-Di-nucleotide base pairing | 6.80 |
| Frequency of seed AU-GC Bi-Di-nucleotide base pairing | 3.59 |
| Frequency of seed AU-CG Bi-Di-nucleotide base pairing | 5.80 |
| Frequency of seed AU-GU Bi-Di-nucleotide base pairing | - |
| Frequency of seed AU-UG Bi-Di-nucleotide base pairing | 2.64 |
| Frequency of seed UA-AU Bi-Di-nucleotide base pairing | 10.61 |
| Frequency of seed UA-UA Bi-Di-nucleotide base pairing | 5.95 |
| Frequency of seed UA-GC Bi-Di-nucleotide base pairing | - |
| Frequency of seed UA-CG Bi-Di-nucleotide base pairing | 7.86 |
| Frequency of seed UA-GU Bi-Di-nucleotide base pairing | 0.33 |
| Frequency of seed UA-UG Bi-Di-nucleotide base pairing | 2.08 |
| Frequency of seed GC-AU Bi-Di-nucleotide base pairing | - |
| Frequency of seed GC-UA Bi-Di-nucleotide base pairing | 7.14 |
| Frequency of seed GC-GC Bi-Di-nucleotide base pairing | - |
| Frequency of seed GC-CG Bi-Di-nucleotide base pairing | 3.37 |
| Frequency of seed GC-GU Bi-Di-nucleotide base pairing | - |
| Frequency of seed GC-UG Bi-Di-nucleotide base pairing | 1.89 |
| Frequency of seed CG-AU Bi-Di-nucleotide base pairing | 12.21 |
| Frequency of seed CG-UA Bi-Di-nucleotide base pairing | 4.91 |
| Frequency of seed CG-GC Bi-Di-nucleotide base pairing | 7.55 |
| Frequency of seed CG-CG Bi-Di-nucleotide base pairing | 7.72 |
| Frequency of seed CG-GU Bi-Di-nucleotide base pairing | 2.55 |
| Frequency of seed CG-UG Bi-Di-nucleotide base pairing | 2.33 |
| Frequency of seed GU-AU Bi-Di-nucleotide base pairing | 1.02 |
| Frequency of seed GU-UA Bi-Di-nucleotide base pairing | 3.04 |
| Frequency of seed GU-GC Bi-Di-nucleotide base pairing | 0.95 |
| Frequency of seed GU-CG Bi-Di-nucleotide base pairing | 0.23 |
| Frequency of seed UG-AU Bi-Di-nucleotide base pairing | 2.33 |
| Frequency of seed UG-UA Bi-Di-nucleotide base pairing | 1.72 |
| Frequency of seed UG-GC Bi-Di-nucleotide base pairing | - |
| Frequency of seed UG-CG Bi-Di-nucleotide base pairing | 3.50 |
The number of all kinds of seed matches is the sum over all five seed match types in all four regions. All other features are properties of each individual miRNA-mRNA seed match site. The importance is calculated by the Random Forests method, using the miRWalk dataset as positive training data and the corresponding randomly shuffled pairs as negative training data.
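The single-nucleotide and dinucleotide composition features listed in the table can be sketched as follows. The record does not state how the frequencies are normalized, so this sketch assumes overlapping dinucleotides and division by the number of positions; the function name and the DNA-to-RNA conversion are hypothetical conveniences.

```python
from itertools import product

BASES = "ACGU"


def composition_features(seq):
    """Return 4 mononucleotide and 16 dinucleotide frequencies for `seq`.

    Assumptions: dinucleotides overlap, and each frequency is a count
    divided by the number of available positions.
    """
    seq = seq.upper().replace("T", "U")   # accept DNA-style input
    n = len(seq)
    features = {}
    for base in BASES:
        features[base] = seq.count(base) / n if n else 0.0
    pairs = max(n - 1, 0)
    for first, second in product(BASES, repeat=2):
        dinuc = first + second
        # Count overlapping occurrences; str.count would skip overlaps.
        count = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
        features[dinuc] = count / pairs if pairs else 0.0
    return features
```

Applying the function separately to the seed and to the sequence outside the seed would give 20 features for each, in line with the "seed" and "outseed" composition rows of the table.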
Figure 4. ROC curve for Random Forest method. The ROC curve for Random Forest obtained by 10-fold cross-validation on pSILAC dataset is shown with the results from other target prediction methods. The and were used as input features for Random Forest.
Figure 5. ROC curve of independent testing. The ROC curve for Random Forest with all 81 features listed in Table 2. The model was trained on the miRWalk dataset and tested on the independent pSILAC dataset.
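For readers unfamiliar with how an ROC curve such as those in Figures 4 and 5 is built from classifier output, the following is a minimal sketch. It is not the authors' evaluation code; the function names are hypothetical, labels are assumed to be 1 for true targets and 0 for non-targets, and at least one example of each class is required.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    Examples are ranked by decreasing score; tied scores are grouped so
    that they contribute a single point.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    previous = None
    for score, label in ranked:
        if previous is not None and score != previous:
            points.append((false_pos / negatives, true_pos / positives))
        if label:
            true_pos += 1
        else:
            false_pos += 1
        previous = score
    points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

In a cross-validation setting, the scores passed in would be the held-out predictions pooled over the folds.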