Peilin Jia, Tieliu Shi, Yudong Cai, Yixue Li.
Abstract
BACKGROUND: siRNAs are small RNAs that serve as sequence determinants during the gene silencing process called RNA interference (RNAi). It is well known that siRNA efficiency is crucial in the RNAi pathway, and that the efficiency of siRNAs targeting different sites of a specific gene varies greatly. Therefore, there is high demand for reliable siRNA prediction tools and for design methods able to select siRNAs with high silencing potential.
Year: 2006 PMID: 16729898 PMCID: PMC1524998 DOI: 10.1186/1471-2105-7-271
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Three cut-off values were used to generate positive and negative subsets from Dieter's and Satron's datasets. The six columns correspond to the six resulting combinations of positive and negative subsets and list the number of siRNAs in each subset.
| | Dieter's, cut-off 0.5 | Dieter's, cut-off 0.6 | Dieter's, cut-off 0.7 | Satron's, cut-off 0.5 | Satron's, cut-off 0.6 | Satron's, cut-off 0.7 |
| Number of siRNAs in the positive subset | 1585 | 1180 | 734 | 221 | 178 | 141 |
| Number of siRNAs in the negative subset | 846 | 1251 | 1697 | 340 | 383 | 420 |
| All | 2431 | 2431 | 2431 | 561 | 561 | 561 |
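The split above amounts to thresholding each siRNA's measured efficacy at the cut-off. A minimal sketch, assuming an siRNA counts as positive when its efficacy is at or above the cut-off (the caption does not say whether the boundary value itself is positive); `split_by_cutoff` and the efficacy values are hypothetical, not data from either dataset:

```python
from typing import List, Sequence, Tuple


def split_by_cutoff(efficacies: Sequence[float], cutoff: float) -> Tuple[List[int], List[int]]:
    """Split siRNA indices into (positive, negative) subsets at a cut-off.

    Assumption: efficacy >= cutoff is positive; the source does not state
    whether the boundary value counts as positive or negative.
    """
    positive = [i for i, e in enumerate(efficacies) if e >= cutoff]
    negative = [i for i, e in enumerate(efficacies) if e < cutoff]
    return positive, negative


# Hypothetical toy efficacy values, not real measurements.
toy_efficacies = [0.92, 0.55, 0.31, 0.68, 0.75, 0.49]

for cutoff in (0.5, 0.6, 0.7):
    pos, neg = split_by_cutoff(toy_efficacies, cutoff)
    print(f"cut-off {cutoff}: {len(pos)} positive, {len(neg)} negative")
```

Raising the cut-off can only move siRNAs from the positive subset to the negative one, which is why the positive counts shrink (1585, 1180, 734) while each total stays at 2431.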
Self-consistency and jackknife results for the sequence-based method trained on the six combinations listed in table 1.
| | Dieter's, self-consistency | Dieter's, jackknife | Satron's, self-consistency | Satron's, jackknife |
| Cut-off 0.7 | | | | |
| Accuracy | 89.88% | 89.35% | 76.83% | 70.94% |
| Sensitivity | 94.28% | 93.87% | 75.18% | 68.09% |
| Specificity | 87.98% | 87.39% | 77.38% | 71.90% |
| Pearson | 0.658 | 0.6594 | 0.4816 | 0.4021 |
| ROC | 0.975 | 0.9698 | 0.8333 | 0.7557 |
| Cut-off 0.6 | | | | |
| Accuracy | 90.79% | 89.47% | 75.04% | 70.59% |
| Sensitivity | 91.27% | 89.49% | 74.72% | 67.42% |
| Specificity | 90.33% | 89.45% | 75.20% | 72.06% |
| Pearson | 0.8298 | 0.8264 | 0.4699 | 0.3944 |
| ROC | 0.9735 | 0.9686 | 0.8169 | 0.7381 |
| Cut-off 0.5 | | | | |
| Accuracy | 90.25% | 89.43% | 77.18% | 70.23% |
| Sensitivity | 88.33% | 87.70% | 75.57% | 68.78% |
| Specificity | 93.85% | 92.67% | 78.24% | 71.18% |
| Pearson | 0.8288 | 0.8278 | 0.4446 | 0.3851 |
| ROC | 0.9751 | 0.97 | 0.8353 | 0.7675 |
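Accuracy, sensitivity and specificity in these tables follow from the confusion counts of each positive/negative split. The sketch below recomputes the Dieter's, cut-off 0.7, self-consistency column; the counts are back-calculated from the reported percentages (692/734 = 94.28%, 1493/1697 = 87.98%), not taken from the paper. The ROC value is assumed to be the area under the ROC curve; `roc_auc` uses the rank (Mann-Whitney) formulation, which may differ from how the authors computed it. Both function names are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of positive siRNAs predicted positive
        "specificity": tn / (tn + fp),  # share of negative siRNAs predicted negative
    }


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC area as the probability that a random positive scores above a
    random negative (ties count one half). O(n*m), fine for a sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Back-calculated counts: 734 positives, 1697 negatives.
m = binary_metrics(tp=692, fn=734 - 692, tn=1493, fp=1697 - 1493)
print({k: round(100 * v, 2) for k, v in m.items()})
# {'accuracy': 89.88, 'sensitivity': 94.28, 'specificity': 87.98}

# Hypothetical prediction scores for three positive and three negative siRNAs.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.4, 0.2]))
```

Accuracy is the class-size-weighted average of sensitivity and specificity, so with unbalanced subsets (734 vs 1697) it leans toward specificity.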
Same as table 3, except that the dataset is from Satron's work.
Satron's dataset, jackknife test:
| | A+B+C | A+B | B+C | A+C | A | B | C |
| Cut-off 0.7 | | | | | | | |
| Accuracy | 78.07% | 74.87% | 75.22% | | 76.47% | 74.87% | 74.87% |
| Sensitivity | 21.99% | 9.93% | 4.26% | 24.11% | 16.31% | 0.00% | 0.00% |
| Specificity | 96.09% | 96.67% | 99.05% | 96.97% | 96.67% | 100.00% | 100.00% |
| Pearson | 0.4369 | 0.4458 | 0.4032 | 0.4562 | 0.4432 | 0.2855 | 0.4013 |
| ROC | 0.7476 | 0.7488 | 0.7309 | 0.7648 | 0.755 | 0.6554 | 0.7381 |
| Cut-off 0.6 | | | | | | | |
| Accuracy | 71.66% | 68.98% | | 73.08% | 70.23% | 68.27% | 70.41% |
| Sensitivity | 31.46% | 24.72% | 26.97% | 34.27% | 29.78% | 0.00% | 14.61% |
| Specificity | 90.34% | 89.56% | 94.78% | 91.12% | 89.03% | 100.00% | 96.34% |
| Pearson | 0.4465 | 0.4327 | 0.4254 | 0.4477 | 0.4533 | 0.3273 | 0.3698 |
| ROC | 0.7363 | 0.7228 | 0.7293 | 0.7414 | 0.7375 | 0.6679 | 0.7029 |
| Cut-off 0.5 | | | | | | | |
| Accuracy | | 71.12% | 69.34% | 72.19% | 70.77% | 63.99% | 68.09% |
| Sensitivity | 58.37% | 54.75% | 47.96% | 56.56% | 55.66% | 40.27% | 34.39% |
| Specificity | 81.76% | 81.76% | 83.24% | 82.35% | 80.59% | 79.41% | 90.00% |
| Pearson | 0.4868 | 0.4814 | 0.4642 | 0.4976 | 0.4597 | 0.3625 | 0.3994 |
| ROC | 0.7706 | 0.7721 | 0.7508 | 0.7846 | 0.755 | 0.685 | 0.7132 |
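The two evaluation schemes differ only in whether the predicted sample was seen during training. Assuming "jackknife" has its usual meaning in this literature (leave-one-out cross-validation) and "self-consistency" means re-predicting the training data, a dependency-free sketch follows. The nearest-centroid classifier is a hypothetical stand-in swapped in for the paper's SVM, and the one-feature data are toy values.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (positive centroid, negative centroid)


def fit_centroids(xs: Sequence[Vector], ys: Sequence[int]) -> Model:
    """Hypothetical stand-in classifier (not the paper's SVM): per-class mean vectors."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return mean(pos), mean(neg)


def predict_centroid(model: Model, x: Vector) -> int:
    pos_c, neg_c = model
    return 1 if math.dist(x, pos_c) <= math.dist(x, neg_c) else 0


Fit = Callable[[Sequence[Vector], Sequence[int]], Model]
Predict = Callable[[Model, Vector], int]


def self_consistency(xs, ys, fit: Fit, predict: Predict) -> List[int]:
    """Train on all samples, then re-predict those same samples."""
    model = fit(xs, ys)
    return [predict(model, x) for x in xs]


def jackknife(xs, ys, fit: Fit, predict: Predict) -> List[int]:
    """Leave-one-out: predict each sample with a model trained on all the others."""
    preds = []
    for i in range(len(xs)):
        train_x = [x for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        preds.append(predict(fit(train_x, train_y), xs[i]))
    return preds


# Hypothetical one-feature toy data: 1 = positive siRNA, 0 = negative.
xs = [[0.0], [2.0], [6.0], [9.0], [11.0]]
ys = [0, 0, 0, 1, 1]
print(self_consistency(xs, ys, fit_centroids, predict_centroid))  # [0, 0, 0, 1, 1]
print(jackknife(xs, ys, fit_centroids, predict_centroid))         # [0, 0, 1, 1, 1]
```

The sample at 6.0 is classified correctly only while it helps define its own class centroid, which illustrates why jackknife figures are generally at or below the self-consistency figures in the tables.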
Jackknife results for the support vector machine (SVM) method trained on the six combinations listed in table 1. Three attributes were defined: binary system (denoted "A" in the table), thermodynamic profile ("B") and composition ("C"). Seven combinations of these attributes were evaluated: A+B+C (binary, thermodynamic and composition), A+B (binary and thermodynamic), B+C (thermodynamic and composition), A+C (binary and composition), A (binary only), B (thermodynamic only) and C (composition only). Self-consistency and jackknife tests were run in each of the seven vector spaces to compare the contributions of the three attributes. To save space, only the jackknife results are listed here. This table lists the results for Dieter's dataset; see table 4 for Satron's dataset. Self-consistency results are given in the supplemental file (see additional file 3).
Dieter's dataset, jackknife test:
| | A+B+C | A+B | B+C | A+C | A | B | C |
| Cut-off 0.7 | | | | | | | |
| Accuracy | 94.78% | 94.90% | 85.97% | 94.86% | | 78.69% | 81.65% |
| Sensitivity | 86.51% | 87.87% | 67.98% | 87.19% | 91.14% | 48.64% | 57.90% |
| Specificity | 98.35% | 97.94% | 93.75% | 98.17% | 98.29% | 91.69% | 91.93% |
| Pearson | 0.9726 | 0.9752 | 0.8522 | 0.9749 | 0.9808 | 0.7189 | 0.7377 |
| ROC | 0.9899 | 0.9922 | 0.9302 | 0.9913 | 0.9952 | 0.8411 | 0.8809 |
| Cut-off 0.6 | | | | | | | |
| Accuracy | 94.65% | 96.01% | 83.83% | 95.80% | | 76.31% | 80.09% |
| Sensitivity | 94.32% | 96.19% | 82.37% | 95.42% | 96.61% | 73.39% | 79.66% |
| Specificity | 94.96% | 95.84% | 85.21% | 96.16% | 96.80% | 79.06% | 80.50% |
| Pearson | 0.9735 | 0.9786 | 0.8469 | 0.9775 | 0.9825 | 0.7181 | 0.7619 |
| ROC | 0.9912 | 0.9947 | 0.9223 | 0.9937 | 0.9967 | 0.8436 | 0.885 |
| Cut-off 0.5 | | | | | | | |
| Accuracy | 94.65% | 95.56% | 83.67% | 95.23% | | 77.46% | 79.47% |
| Sensitivity | 96.53% | 97.10% | 90.85% | 96.85% | 97.79% | 88.71% | 86.75% |
| Specificity | 91.13% | 92.67% | 70.21% | 92.20% | 93.85% | 56.38% | 65.84% |
| Pearson | 0.9726 | 0.974 | 0.8415 | 0.9761 | 0.98 | 0.7172 | 0.741 |
| ROC | 0.9906 | 0.9928 | 0.9121 | 0.9926 | 0.9951 | 0.8435 | 0.8668 |
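The seven vector spaces are the non-empty combinations of the three attribute blocks, each obtained by concatenating the chosen blocks into one feature vector. A sketch under loudly stated assumptions: the one-hot nucleotide code standing in for "binary system" (A), the placeholder thermodynamic numbers (B) and the nucleotide-fraction composition (C) are all hypothetical illustrations, not the paper's exact encodings.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical stand-in for attribute A ("binary system"): 4 bits per nucleotide.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}


def binary_encoding(seq: str) -> List[float]:
    return [float(bit) for nt in seq.upper() for bit in ONE_HOT[nt]]


def composition(seq: str) -> List[float]:
    """Hypothetical stand-in for attribute C: fraction of A, C, G and U."""
    seq = seq.upper()
    return [seq.count(nt) / len(seq) for nt in "ACGU"]


def attribute_subsets(names=("A", "B", "C")) -> List[str]:
    """All seven non-empty combinations, largest first: A+B+C, A+B, A+C, B+C, A, B, C."""
    return ["+".join(c) for r in range(len(names), 0, -1) for c in combinations(names, r)]


def build_vector(blocks: Dict[str, List[float]], subset: str) -> List[float]:
    """Concatenate the feature blocks named in a subset such as 'A+C'."""
    vec: List[float] = []
    for name in subset.split("+"):
        vec.extend(blocks[name])
    return vec


seq = "ACGUA"  # toy 5-nt sequence, not a real siRNA
blocks = {
    "A": binary_encoding(seq),  # 20 values
    "B": [-1.2, -0.8],          # placeholder thermodynamic values (hypothetical)
    "C": composition(seq),      # 4 values
}
for subset in attribute_subsets():
    print(subset, len(build_vector(blocks, subset)))
```

Each subset then gets its own classifier and its own self-consistency and jackknife runs, so differences between columns reflect what each attribute block contributes.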
Comparison with Dieter's results. Using the same training and test datasets, both methods were applied and the Pearson correlation coefficient was computed for each. The cut-off value must be specified as 0.5, 0.6 or 0.7; only the results for a cut-off of 0.6 are shown here. The results for cut-off values of 0.5 and 0.7 are detailed in the supplemental file (see additional file 4), which also gives the accuracy, sensitivity, specificity and ROC for the two methods. For more information on Dieter's work and the datasets used there, see [15].
SVM, cut-off 0.6
| Pearson (rows: training set; columns: test set) | All (249) | All human (198) | hE2 (139) | Rodent (51) |
| All (2182) | 0.9771 | 0.9769 | 0.9743 | 0.9713 |
| All human (1744) | 0.9721 | 0.9722 | 0.9689 | 0.9639 |
| Human E2s (1229) | 0.9653 | 0.9644 | 0.9606 | 0.9593 |
| Rodent (438) | 0.9057 | 0.9077 | 0.895 | 0.8806 |
| Random all (1091) | 0.9660 | 0.9673 | 0.9651 | 0.9510 |
| Random all (727) | 0.9343 | 0.9369 | 0.9387 | 0.9125 |
| Random all (545) | 0.9249 | 0.9252 | 0.9206 | 0.9154 |
| Random all (218) | 0.8502 | 0.8645 | 0.8570 | 0.7713 |
| All-19 | 0.9436 | | | |
| All human-19 | 0.9387 | | | |
| Rodent-19 | 0.8487 | | | |
SeqSta, cut-off 0.6
| Pearson (rows: training set; columns: test set) | All (249) | All human (198) | hE2 (139) | Rodent (51) |
| All (2182) | 0.8562 | 0.8557 | 0.8452 | 0.8520 |
| All human (1744) | 0.8104 | 0.8106 | 0.8007 | 0.8008 |
| Human E2s (1229) | 0.7294 | 0.7294 | 0.7353 | 0.7257 |
| Rodent (438) | 0.7761 | 0.7688 | 0.7608 | 0.7912 |
| Random all (1091) | 0.8632 | 0.8619 | 0.8472 | 0.8644 |
| Random all (727) | 0.7953 | 0.8023 | 0.7830 | 0.7523 |
| Random all (545) | 0.7812 | 0.7785 | 0.7679 | 0.7748 |
| Random all (218) | 0.7017 | 0.6941 | 0.6681 | 0.7292 |
| All-19 | 0.8224 | | | |
| All human-19 | 0.7809 | | | |
| Rodent-19 | 0.7097 | | | |
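The comparison tables report the Pearson correlation coefficient. A minimal sketch of that statistic, assuming (as is usual for efficacy prediction) that it is computed between predicted scores and measured efficacies of the test siRNAs; the four value pairs are hypothetical. On Python 3.10+ the standard library's `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical measured vs predicted efficacies for four test siRNAs.
measured = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(round(pearson(measured, predicted), 4))  # 0.9762
```

Because the coefficient measures linear agreement of the full score ranking rather than a thresholded call, it is the one metric that compares directly across cut-off values.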
To reduce the imbalance between sensitivity and specificity, 10 subsets were drawn at random from each of the 5 datasets whose positive subset differs greatly in size from the negative one. Shown here are the average results on Satron's dataset with a cut-off value of 0.6. All the sub-datasets and results are supplied in the additional files (see additional file 1 and additional file 3).
| | A+B+C | A+B | B+C | A+C | A | B | C |
| Accuracy | 68.43 ± 1.45% | 67.13 ± 1.95% | 65.37 ± 2.19% | 68.09 ± 2.14% | 68.76 ± 2.59% | 61.55 ± 2.56% | 63.90 ± 2.52% |
| Sensitivity | 69.05 ± 1.20% | 69.78 ± 2.98% | 64.33 ± 3.41% | 67.25 ± 3.19% | 69.72 ± 3.63% | 63.93 ± 2.96% | 58.03 ± 3.63% |
| Specificity | 67.81 ± 3.07% | 64.49 ± 2.21% | 66.4 ± 3.36% | 68.93 ± 2.53% | 67.81 ± 2.68% | 59.16 ± 3.63% | 69.78 ± 4.32% |
| Pearson | 0.4683 ± 0.03219 | 0.4618 ± 0.03205 | 0.4112 ± 0.03306 | 0.4447 ± 0.03097 | 0.4718 ± 0.03558 | 0.3758 ± 0.02486 | 0.3196 ± 0.02920 |
| ROC | 0.7452 ± 0.02383 | 0.7255 ± 0.02886 | 0.7093 ± 0.02450 | 0.7367 ± 0.02454 | 0.7411 ± 0.02933 | 0.6768 ± 0.02220 | 0.6715 ± 0.02659 |
Satron's dataset, cut-off value 0.6. Before random selection, the positive part contained 178 records and the negative part 383 records. Each of the 10 randomly chosen subsets contains the 178 positive records and 178 of the 383 negative records. The pseudorandom numbers were generated with the Java class java.lang.Random.
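The balancing procedure keeps every positive record and draws an equal number of negatives, repeated 10 times, then averages the metrics. A sketch under two assumptions: negatives are drawn without replacement within each subset, and the "±" in the table is a sample standard deviation (the caption does not say). The function names and record IDs are hypothetical, and Python's Mersenne Twister generator stands in for java.lang.Random, so these draws will not reproduce the authors' subsets in additional file 1.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def balanced_subsets(pos_ids: Sequence[int], neg_ids: Sequence[int],
                     n_subsets: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Each subset keeps every positive and draws as many negatives, without replacement."""
    rng = random.Random(seed)  # Mersenne Twister, not java.lang.Random
    k = len(pos_ids)
    if k > len(neg_ids):
        raise ValueError("more positives than negatives")
    return [(list(pos_ids), rng.sample(list(neg_ids), k)) for _ in range(n_subsets)]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (assumed meaning of the table's ±)."""
    return statistics.mean(values), statistics.stdev(values)


# Satron's dataset, cut-off 0.6: 178 positive and 383 negative records (IDs hypothetical).
positives = list(range(178))
negatives = list(range(178, 561))
subsets = balanced_subsets(positives, negatives)
print(len(subsets), {len(p) + len(n) for p, n in subsets})  # 10 {356}

print(mean_sd([68.0, 69.0, 70.0]))  # (69.0, 1.0)
```

Each balanced subset would then be evaluated with the jackknife test, and the 10 per-subset accuracies (or other metrics) summarized with `mean_sd`.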