Yuchen Yang, Yu-Ping Wang, Kuo-Bin Li.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are a class of small non-coding RNAs that act as important negative regulators of gene expression. In animals, miRNAs repress protein translation by binding, with imperfect complementarity, to the 3' UTRs of their target genes. Identifying miRNA targets has become one of the major challenges in miRNA research. Bioinformatics studies of miRNA targeting have produced a number of target prediction tools. Although these tools can predict hundreds of targets for a given miRNA, many of them suffer from high false positive rates, which indicates the need for a post-processing filter for the predicted targets. Once trained on experimentally validated true and false targets, machine learning methods appear to be ideal approaches for distinguishing true targets from false ones.
Year: 2008 PMID: 19091027 PMCID: PMC2638144 DOI: 10.1186/1471-2105-9-S12-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Prediction accuracies of the 10-fold cross-validation experiments.

| TP | FN | SE (%) | TN | FP | SP (%) | Q (%) | AUC |
|---|---|---|---|---|---|---|---|
| 163 | 32 | 83.59 | 28 | 10 | 73.68 | 81.97 | 0.86 |
Prediction accuracies are reported as TP (true positives), FN (false negatives), TN (true negatives), FP (false positives), sensitivity SE = TP/(TP + FN), specificity SP = TN/(TN + FP), overall accuracy Q = (TP + TN)/(TP + FN + TN + FP), and AUC (area under the ROC curve).
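As a quick arithmetic check, the three count-based measures can be recomputed from the confusion-matrix counts in the cross-validation table. This is a minimal sketch; the function name `prediction_accuracies` is hypothetical, and AUC is left out because it needs the full ranked scores rather than the four counts.

```python
def prediction_accuracies(tp, fn, tn, fp):
    """Return sensitivity, specificity and overall accuracy as fractions."""
    se = tp / (tp + fn)                    # SE = TP / (TP + FN)
    sp = tn / (tn + fp)                    # SP = TN / (TN + FP)
    q = (tp + tn) / (tp + fn + tn + fp)    # Q = (TP + TN) / all samples
    return se, sp, q


# Counts from the 10-fold cross-validation table.
se, sp, q = prediction_accuracies(tp=163, fn=32, tn=28, fp=10)
print(f"SE = {se:.2%}, SP = {sp:.2%}, Q = {q:.2%}")  # SE = 83.59%, SP = 73.68%, Q = 81.97%
```

The printed values match the SE, SP and Q columns of the table.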
The top 25 most informative features.

| Feature | μ+ | σ+ | μ− | σ− | F score |
|---|---|---|---|---|---|
| 3-gram, non-seed, mismatch/AU/AU | 0.0197 | 0.0370 | 0.0000 | 0.0000 | 0.5313 |
| 2-gram, non-seed, mismatch/AU | 0.0526 | 0.0606 | 0.0107 | 0.0353 | 0.4374 |
| 2-gram, entire, mismatch/AU | 0.0441 | 0.0439 | 0.0160 | 0.0265 | 0.3991 |
| 3-gram, entire, GC/gap/gap | 0.0068 | 0.0176 | 0.0228 | 0.0236 | 0.3904 |
| 3-gram, entire, mismatch/mismatch/gap | 0.0060 | 0.0164 | 0.0000 | 0.0000 | 0.3636 |
| 3-gram, non-seed, gap/GU/AU | 0.0095 | 0.0262 | 0.0000 | 0.0000 | 0.3631 |
| 3-gram, entire, gap/GU/AU | 0.0062 | 0.0172 | 0.0000 | 0.0000 | 0.3629 |
| 3-gram, entire, mismatch/AU/AU | 0.0198 | 0.0312 | 0.0044 | 0.0132 | 0.3457 |
| 3-gram, non-seed, mismatch/mismatch/AU | 0.0212 | 0.0422 | 0.0022 | 0.0135 | 0.3417 |
| 2-gram, seed, GU/GC | 0.0117 | 0.0352 | 0.0000 | 0.0000 | 0.3337 |
| 3-gram, entire, AU/mismatch/GU | 0.0054 | 0.0167 | 0.0000 | 0.0000 | 0.3253 |
| 2-gram, entire, gap/gap | 0.0838 | 0.1059 | 0.1512 | 0.1030 | 0.3226 |
| 1-gram, entire, GC | 0.2399 | 0.0957 | 0.2893 | 0.0678 | 0.3021 |
| 1-gram, non-seed, gap | 0.1880 | 0.1581 | 0.2841 | 0.1601 | 0.3020 |
| 2-gram, non-seed, gap/gap | 0.1022 | 0.1406 | 0.1886 | 0.1505 | 0.2969 |
| 3-gram, non-seed, GC/mismatch/AU | 0.0067 | 0.0224 | 0.0000 | 0.0000 | 0.2969 |
| 1-gram, entire, gap | 0.1595 | 0.1225 | 0.2273 | 0.1066 | 0.2958 |
| 3-gram, non-seed, mismatch/mismatch/gap | 0.0067 | 0.0227 | 0.0000 | 0.0000 | 0.2943 |
| 3-gram, entire, mismatch/mismatch/AU | 0.0135 | 0.0259 | 0.0026 | 0.0111 | 0.2937 |
| 2-gram, entire, GC/gap | 0.0199 | 0.0298 | 0.0357 | 0.0243 | 0.2932 |
| 2-gram, entire, GC/GC | 0.0630 | 0.0549 | 0.0928 | 0.0471 | 0.2930 |
| 3-gram, entire, GU/GC/gap | 0.0002 | 0.0028 | 0.0043 | 0.0115 | 0.2895 |
| 1-gram, non-seed, GC | 0.1742 | 0.1261 | 0.2377 | 0.0952 | 0.2870 |
| 3-gram, entire, gap/GU/GC | 0.0005 | 0.0047 | 0.0056 | 0.0136 | 0.2810 |
| 3-gram, non-seed, GU/mismatch/AU | 0.0064 | 0.0233 | 0.0000 | 0.0000 | 0.2756 |
Features are given in the format: k-gram type, region, k-gram code. For example, "3-gram, non-seed, mismatch/AU/AU" represents a mismatch followed by an AU pair followed by another AU pair in the non-seed region (see the "Data representation" subsection of Materials and Methods for detailed definitions of k-gram, region and k-gram code). For each feature, the mean (μ) and standard deviation (σ) in both the positive (+) and negative (−) sets are listed. The F score, defined as F = |(μ+ − μ−)/(σ+ + σ−)|, measures the discriminating ability of the feature.
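The F score definition above can be turned into a small ranking routine. The sketch below is illustrative only: the helper names are hypothetical, it assumes population standard deviations (the caption does not say whether sample or population σ was used), and it returns infinity when both σ are zero but the means differ, since such a feature separates the two sets perfectly.

```python
import math
from statistics import mean, pstdev


def f_score(pos_values, neg_values):
    """F = |mu+ - mu-| / (sigma+ + sigma-), using population std devs (assumption)."""
    mu_pos, mu_neg = mean(pos_values), mean(neg_values)
    spread = pstdev(pos_values) + pstdev(neg_values)
    if spread == 0:
        # Zero spread: perfectly separating if the means differ, uninformative otherwise.
        return math.inf if mu_pos != mu_neg else 0.0
    return abs(mu_pos - mu_neg) / spread


def f_from_stats(mu_pos, sd_pos, mu_neg, sd_neg):
    """F score computed directly from tabulated means and standard deviations."""
    return abs(mu_pos - mu_neg) / (sd_pos + sd_neg)


def rank_features(names, pos_rows, neg_rows, top=25):
    """Rank feature columns by F score; each row is one duplex's feature vector."""
    scored = []
    for j, name in enumerate(names):
        pos = [row[j] for row in pos_rows]
        neg = [row[j] for row in neg_rows]
        scored.append((name, f_score(pos, neg)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# "2-gram, non-seed, mismatch/AU" row of the table.
print(round(f_from_stats(0.0526, 0.0606, 0.0107, 0.0353), 4))
```

Applied to the "2-gram, non-seed, mismatch/AU" row, this gives about 0.437, close to the tabulated 0.4374; the small gap comes from the rounding of the listed means and standard deviations.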
List of miRNAs that appear in the training set.
| Set | miRNAs |
|---|---|
| Positive set | let-7, lin-4, lsy-6, miR-273, miR-61, miR-84 |
| | bantam, let-7, miR-1, miR-11, miR-2, miR-278, miR-2a-1, miR-4, miR-7, miR-79 |
| | let-7 |
| | miR-125b, miR-134, miR-181a |
| | let-7, let-7b, let-7e, miR-1, miR-101, miR-103-1, miR-10a, miR-125a, miR-125b, miR-127, miR-130, miR-132, miR-133, miR-136, miR-141, miR-143, miR-145, miR-15, miR-16, miR-17-5p, miR-196, miR-199b, miR-19a, miR-1b, miR-20, miR-221, miR-222, miR-223, miR-23, miR-23a, miR-24, miR-26, miR-32, miR-34, miR-375, miR-431, miR-433-3p, miR-433-5p, miR-434-3p, miR-434-5p |
| Negative set | let-7 |
| | mir-276b, mir-278, mir-286, mir-287, mir-288, mir-303, mir-316, mir-317, mir-318 |
| | mir-124, miR-34, mir-375 |
| | let-7b, let-7e, miR-15, miR-16, miR-24, miR-103, miR-141, miR-145, miR-1, miR-19a, miR-34 |
Comparison of MiRTif with three ab initio target prediction programs (PicTar, miRBase and TargetScan) on miR-224, which has been found to be significantly up-regulated in hepatocellular carcinoma patients.

| Gene symbol | Entrez Gene ID | Description | Rank | Rank | Rank | MiRTif score |
|---|---|---|---|---|---|---|
| H3F3B | 3021 | H3 histone, family 3B (H3.3B) | 1 | na | 122 | +1.69 |
| API5 | 8539 | Apoptosis inhibitor 5 | 2 | na | 99 | +1.32 |
| ARMCX2 | 9823 | Armadillo repeat containing, X-linked 2 | 4 | 198 | na | -0.98 |
| CDK9 | 1025 | Cyclin-dependent kinase 9 (CDC2-related kinase) | 9 | 25 | na | -0.15 |
| NCOA6 | 23054 | Nuclear receptor coactivator 6 | 10 | 204 | 47 | +0.94 |
| ATF2 | 1386 | Activating transcription factor 2 | 64 | 19 | 104 | +1.27 |
| NUP153 | 9972 | nucleoporin 153 kDa | 168 | 1 | 46 | -1.46 |
| FOSB | 2354 | FBJ murine osteosarcoma viral oncogene homolog B | 222 | 4 | 125 | +0.11 |
Each rank is the rank assigned to the gene by the respective program; "na" indicates that the program did not predict an interaction between the corresponding target gene and miR-224. For the MiRTif discriminant scores, positive scores indicate predicted true targets and negative scores indicate predicted false targets. The score is proportional to the sample's distance from the separating hyperplane, so a large positive value implies high confidence that the sample belongs to the positive class.
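To illustrate how a discriminant score relates to distance from a hyperplane, here is a minimal sketch for a linear decision function f(x) = w·x + b. This is an assumption for illustration: the weights, bias and helper names below are hypothetical, and the kernel MiRTif actually uses is not stated in this section. With a nonlinear kernel the sign convention is the same, but the distance is measured in the kernel's feature space.

```python
import math


def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b (hypothetical linear model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def signed_distance(w, b, x):
    """Signed Euclidean distance of x from the hyperplane f(x) = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return decision_value(w, b, x) / norm


def interpret(score):
    """Positive scores are read as true targets, all others as false targets."""
    return "predicted true target" if score > 0 else "predicted false target"


# Hypothetical two-feature model: w = (3, 4), b = -5, so ||w|| = 5.
w, b = [3.0, 4.0], -5.0
for x in ([2.0, 1.0], [0.0, 0.5]):
    d = signed_distance(w, b, x)
    print(x, round(d, 3), interpret(d))
```

Because the distance is the decision value divided by the constant ||w||, ranking samples by either quantity gives the same order. This is why a larger positive score can be read as higher confidence in the positive class.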
Figure 1. Feature vector encoded for a miRNA:target duplex. Each duplex is partitioned into two parts: the first part (the seed) covers nucleotides 1 to 9 from the 5' end of the miRNA, and the second part (the non-seed) covers the rest of the duplex. Five types of base pairing are defined (AU, GC, GU, mismatch and gap). The 1-gram, 2-gram and 3-gram frequencies of these five pairing types are computed over the seed, the non-seed region and the entire duplex, which gives 3 × (5 + 25 + 125) = 465 features encoded into a vector representing a miRNA:target duplex.
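The encoding in Figure 1 can be sketched as follows. This is an illustration under several assumptions that are not taken from the paper: the duplex is given as two aligned strings of equal length, with the miRNA written 5'→3' and the target written 3'→5' so that each column is one pairing position; `-` marks a gap on either strand; each k-gram frequency is its count divided by the number of k-grams in the region; and the regions are concatenated in the order seed, non-seed, entire. All function names are hypothetical.

```python
from itertools import product

PAIR_TYPES = ["AU", "GC", "GU", "mismatch", "gap"]  # the five pairing types
SEED_LEN = 9  # seed = miRNA nucleotides 1..9 from the 5' end


def classify(mirna_base, target_base):
    """Assign one aligned column to a pairing type; '-' marks a gap (assumed convention)."""
    if mirna_base == "-" or target_base == "-":
        return "gap"
    pair = {mirna_base.upper(), target_base.upper()}
    if pair == {"A", "U"}:
        return "AU"
    if pair == {"G", "C"}:
        return "GC"
    if pair == {"G", "U"}:
        return "GU"  # G:U wobble pair
    return "mismatch"


def seed_end(mirna_aln):
    """Column index just past the 9th miRNA nucleotide (gaps in the miRNA are skipped)."""
    seen = 0
    for col, base in enumerate(mirna_aln):
        if base != "-":
            seen += 1
            if seen == SEED_LEN:
                return col + 1
    return len(mirna_aln)


def kgram_frequencies(types, k):
    """Relative frequency of all 5**k k-grams (assumed normalization: count / #k-grams)."""
    codes = list(product(PAIR_TYPES, repeat=k))
    counts = dict.fromkeys(codes, 0)
    n = len(types) - k + 1
    for i in range(max(n, 0)):
        counts[tuple(types[i:i + k])] += 1
    return [counts[c] / n if n > 0 else 0.0 for c in codes]


def encode_duplex(mirna_aln, target_aln):
    """465 features: 3 regions x (5 + 25 + 125) k-gram frequencies."""
    if len(mirna_aln) != len(target_aln):
        raise ValueError("aligned strands must have equal length")
    types = [classify(m, t) for m, t in zip(mirna_aln, target_aln)]
    cut = seed_end(mirna_aln)
    vector = []
    for region in (types[:cut], types[cut:], types):  # seed, non-seed, entire
        for k in (1, 2, 3):
            vector.extend(kgram_frequencies(region, k))
    return vector


print(len(encode_duplex("UGAGGUAGUAGGUUGUAUAGUU", "ACUCCAUCAUCCAACAUAUCAA")))  # 465
```

Whatever the duplex, the vector always has 465 entries because every possible k-gram code gets a slot, even when its frequency is zero. This fixed layout is what lets duplexes of different lengths be compared by a single classifier.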