Qi Liu, Qian Xu, Vincent W Zheng, Hong Xue, Zhiwei Cao, Qiang Yang.
Abstract
BACKGROUND: Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for studying gene function and identifying new drug targets. The key to this technique is the design of efficient siRNAs that are incorporated into RNA-induced silencing complexes (RISC), where they bind and interact with their mRNA targets to repress translation into proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, few joint analyses of RNAi experiments conducted under different experimental scenarios have been performed so far, even though such joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms across different datasets and experimental conditions can often provide new clues for the design of potent siRNAs.
Year: 2010 PMID: 20380733 PMCID: PMC2873531 DOI: 10.1186/1471-2105-11-181
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Description of the 14 cross-platform RNAi experiments and of 2 additional independent experiments performed at low siRNA concentrations.
| Experiments | #mRNA | #siRNA | Platform label scale (min-max) |
|---|---|---|---|
| E1 | 2 | 179 | 4.0-127.8 |
| E2 | 2 | 67 | 22.0-118.8 |
| E3 | 1 | 14 | 2-52 |
| E4 | 10 | 50 | 1.0-115.7 |
| E5 | 2 | 12 | 18-110 |
| E6 | 4 | 50 | 5.8-124.4 |
| E7 | 3 | 19 | 20-127 |
| E8 | 21 | 103 | 16.0-100.0 |
| E9 | 1 | 34 | 1.5-93.9 |
| E10 | 1 | 6 | 32-77 |
| E11 | 2 | 24 | 5-120 |
| E12 | 2 | 20 | 11.4-76.4 |
| E13 | 1 | 5 | 0-34 |
| E14 | 3 | 40 | 14-110 |
| IE1 | 6 | 20 | 1.56-100 |
| IE2 | 4 | 12 | 1-80 |
"E" denotes "Experiment"; "IE" denotes "Independent experiment".
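The experiments report efficacy on very different label scales (last column above), so pooling them, as in Test 2 below, first requires a common scale. The study only says that labels were "scaled into [0,1]"; the sketch below assumes per-experiment min-max scaling. The function names and the toy labels are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_scale(labels: List[float]) -> List[float]:
    """Map labels linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(labels), max(labels)
    if hi == lo:
        # Degenerate case: all labels are equal, so map everything to 0.0.
        return [0.0 for _ in labels]
    return [(y - lo) / (hi - lo) for y in labels]


def pool_scaled(experiments: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Scale each experiment separately, then concatenate (experiment, label) pairs."""
    pooled: List[Tuple[str, float]] = []
    for name, labels in experiments.items():
        pooled.extend((name, y) for y in min_max_scale(labels))
    return pooled


# Toy labels spanning the E3 (2-52) and E10 (32-77) ranges; the values are made up.
pooled = pool_scaled({"E3": [2.0, 27.0, 52.0], "E10": [32.0, 54.5, 77.0]})
print(pooled)
```

Both toy experiments map to 0.0, 0.5 and 1.0, so labels from differently scaled platforms become directly comparable before a single model is trained on the pooled set.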
Feature weights for siRNA design derived from multi-task learning
| Rank | Feature | Weight |
|---|---|---|
| 1 | position-dependent nucleotide consensus: sum | 0.1954 |
| 2 | ΔG difference between positions 1 and 18 | 0.0987 |
| 3 | ΔG of sense-antisense siRNA duplexes | 0.0774 |
| 4 | position-dependent nucleotide consensus: preferred | 0.0733 |
| 5 | preferred dinucleotide content index | 0.0726 |
| 6 | local target mRNA stability (ΔG) | 0.0651 |
| 7 | position-dependent nucleotide consensus: avoided | 0.0640 |
| 8 | nucleotide content: U | 0.0603 |
| 9 | stability (ΔG) of dimers of siRNA antisense strands | 0.0537 |
| 10 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 1 | 0.0384 |
| 11 | siRNA antisense strand intra-molecular structure stability (ΔG) | 0.0327 |
| 12 | avoided dinucleotide content index | 0.0324 |
| 13 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 13 | 0.0298 |
| 14 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 18 | 0.0279 |
| 15 | nucleotide content: G | 0.0267 |
| 16 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 2 | 0.0222 |
| 17 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 6 | 0.0159 |
| 18 | stability profile of each pair of neighboring base pairs in the siRNA sense-antisense duplex, position 14 | 0.0138 |
| 19 | frequency of potential targets for siRNA | 0.0000 |
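The weights above sum to 1.0003, so they appear to be normalized to one up to rounding. A short check, with the weights copied from the table, shows how much of the total the top-ranked features and two feature families carry. The keyword grouping is an illustrative choice, not a categorization made in the study.

```python
# Weights copied from the feature-weight table, in rank order.
WEIGHTS = [
    ("position-dependent nucleotide consensus: sum", 0.1954),
    ("ΔG difference between positions 1 and 18", 0.0987),
    ("ΔG of sense-antisense siRNA duplexes", 0.0774),
    ("position-dependent nucleotide consensus: preferred", 0.0733),
    ("preferred dinucleotide content index", 0.0726),
    ("local target mRNA stability (ΔG)", 0.0651),
    ("position-dependent nucleotide consensus: avoided", 0.0640),
    ("nucleotide content: U", 0.0603),
    ("stability (ΔG) of dimers of siRNA antisense strands", 0.0537),
    ("stability profile, position 1", 0.0384),
    ("antisense intra-molecular structure stability (ΔG)", 0.0327),
    ("avoided dinucleotide content index", 0.0324),
    ("stability profile, position 13", 0.0298),
    ("stability profile, position 18", 0.0279),
    ("nucleotide content: G", 0.0267),
    ("stability profile, position 2", 0.0222),
    ("stability profile, position 6", 0.0159),
    ("stability profile, position 14", 0.0138),
    ("frequency of potential targets for siRNA", 0.0000),
]


def top_k_share(weights, k):
    """Total weight of the k highest-weighted features."""
    ranked = sorted(weights, key=lambda item: item[1], reverse=True)
    return sum(w for _, w in ranked[:k])


def family_share(weights, keyword):
    """Total weight of all features whose name contains `keyword`."""
    return sum(w for name, w in weights if keyword in name)


total = sum(w for _, w in WEIGHTS)
print(f"total weight: {total:.4f}")
print(f"top-5 share: {top_k_share(WEIGHTS, 5):.4f}")
print(f"nucleotide consensus: {family_share(WEIGHTS, 'nucleotide consensus'):.4f}")
print(f"stability profile: {family_share(WEIGHTS, 'stability profile'):.4f}")
```

This prints a total of 1.0003, a top-5 share of 0.5174, 0.3327 for the three position-dependent nucleotide-consensus features and 0.1480 for the six neighboring-base-pair stability-profile features.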
Figure 1. Computational framework in our study.
Comparison between linear ridge regression and support vector regression (SVR) for single-task siRNA efficacy prediction (RMSE).
| Model | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
|---|---|---|---|---|---|---|---|
| Linear ridge regression | 23.5544 | 23.0751 | 12.8477 | 30.2501 | 27.8395 | 32.8025 | 32.9677 |
| SVR with linear kernel | 23.6965 | 22.1477 | 13.3903 | 31.9928 | 26.1998 | 32.8823 | 32.2824 |
| SVR with radial basis function kernel | 29.6775 | 24.4753 | 13.5664 | 31.1238 | 37.2164 | 36.2681 | 43.4349 |

| Model | E8 | E9 | E10 | E11 | E12 | E13 | E14 |
|---|---|---|---|---|---|---|---|
| Linear ridge regression | 26.5710 | 13.6068 | 13.4394 | 36.9945 | 33.6679 | 17.3333 | 28.7044 |
| SVR with linear kernel | 27.0521 | 15.2284 | 25.9767 | 34.9588 | 32.8858 | 19.9620 | 30.7536 |
| SVR with radial basis function kernel | 25.6995 | 43.3165 | 25.9767 | 32.9811 | 26.6623 | 19.9620 | 25.8301 |
"E" denotes "Experiment". Linear ridge regression and support vector regression (with linear and radial basis function kernels) are each trained on 50% of the data from each experiment. The p-value from a paired t-test between linear ridge regression and SVR with a linear kernel is 0.2592; between linear ridge regression and SVR with a radial basis function kernel, it is 0.0913.
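Each single-task model is trained on half of one experiment's data and scored by RMSE, the square root of the mean squared difference between predicted and measured efficacy. A dependency-free sketch of closed-form linear ridge regression follows. It assumes a constant column of 1.0 stands in for the intercept (and is penalized like the other weights, a simplification), and `lam` is a hypothetical regularization strength; the study's feature encoding and parameter settings are not given in this record.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_fit(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Linear predictions X w."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error between predictions and measured labels."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
```

In use, one would call `ridge_fit` on the training half of an experiment and report `rmse(predict(X_test, w), y_test)` on the other half. Setting `lam` to 0 gives ordinary least squares; larger values shrink the weights toward zero.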
Single-task learning with direct combination and label scaling for siRNA efficacy prediction (RMSE).
| Test | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
|---|---|---|---|---|---|---|---|
| Test 1 | 23.5500 | 23.0800 | 12.8500 | 30.2500 | 27.8400 | 32.8000 | 32.9700 |
| Test 2 | 24.9500 | 29.8900 | 31.2700 | 26.8300 | 32.1900 | 29.5200 | 29.2500 |

| Test | E8 | E9 | E10 | E11 | E12 | E13 | E14 |
|---|---|---|---|---|---|---|---|
| Test 1 | 26.5700 | 13.6100 | 13.4400 | 36.9900 | 33.6700 | 17.3300 | 28.7000 |
| Test 2 | 27.2600 | 15.8700 | 12.3700 | 26.2400 | 30.3800 | 21.4700 | 25.9700 |
"E" denotes "Experiment". Test 1: for each experiment, 50% of the data were selected to train a regression model, which was then tested on the remaining 50% of that experiment's data. Test 2: all experimental labels were scaled into [0,1], and 50% of the data from each experiment were pooled to train a single general model, which was then tested on the remaining 50% of each experiment's data. The p-value from a paired t-test between Test 1 and Test 2 is 0.7043.
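The p-values in these footnotes come from paired t-tests over per-experiment RMSE values. A standard-library sketch of a two-sided paired t-test follows; the two-sided choice is an assumption, and the Student's t tail probability is computed through the regularized incomplete beta function, a textbook identity. This is an illustration, not the authors' code.

```python
import math
from statistics import mean, stdev


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def paired_t_test(a, b):
    """Two-sided paired t-test on matched samples; returns (t statistic, p-value)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    df = n - 1
    # P(|T| > |t|) for Student's t with df degrees of freedom.
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


# Test 1 and Test 2 RMSE values for the 14 experiments, copied from the table above.
TEST1 = [23.55, 23.08, 12.85, 30.25, 27.84, 32.80, 32.97,
         26.57, 13.61, 13.44, 36.99, 33.67, 17.33, 28.70]
TEST2 = [24.95, 29.89, 31.27, 26.83, 32.19, 29.52, 29.25,
         27.26, 15.87, 12.37, 26.24, 30.38, 21.47, 25.97]

t_stat, p_value = paired_t_test(TEST1, TEST2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

On these rounded values the sketch gives t of about -0.39 and a p-value close to the 0.7043 reported above; the other footnote p-values can be checked the same way by substituting the corresponding rows.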
Figure 2. Comparison between multi-task learning and single-task learning for siRNA efficacy prediction. Each model is trained with 10%, 30%, 50%, 70% and 90% of the data from each experiment, respectively. STL: single-task learning. MTL: multi-task learning. RMSE: root mean square error.
Comparison between multi-task learning and single-task learning for siRNA efficacy prediction (Test 3, RMSE).
| Experiment | STL 10% | STL 30% | STL 50% | STL 70% | STL 90% | MTL 10% | MTL 30% | MTL 50% | MTL 70% | MTL 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| E1 | 28.3515 | 24.1538 | 23.5544 | 22.8080 | 23.4952 | 27.6417 | 24.0150 | 23.5313 | 22.7155 | 23.4194 |
| E2 | 28.1353 | 24.7949 | 23.0751 | 21.6717 | 20.5756 | 25.4531 | 22.0457 | 21.1488 | 20.6969 | 20.5423 |
| E3 | 14.1021 | 12.7868 | 12.8477 | 12.8390 | 11.2925 | 12.4403 | 11.5239 | 11.2708 | 11.0255 | 10.0032 |
| E4 | 36.7345 | 32.4953 | 30.2501 | 28.5389 | 25.5934 | 31.6222 | 27.8789 | 27.9831 | 27.5373 | 27.2947 |
| E5 | 37.7847 | 31.7246 | 27.8395 | 27.2221 | 32.1410 | 37.6029 | 27.5771 | 24.0499 | 23.5798 | 24.7571 |
| E6 | 37.9884 | 36.6409 | 32.8025 | 31.0090 | 27.0574 | 34.9948 | 31.9597 | 30.0650 | 28.7117 | 24.6019 |
| E7 | 46.1408 | 40.6899 | 32.9677 | 34.0303 | 29.4516 | 45.3279 | 34.8915 | 30.3053 | 29.9185 | 27.0738 |
| E8 | 29.4008 | 27.4798 | 26.5710 | 24.8380 | 26.7436 | 26.5423 | 24.6162 | 24.4261 | 23.7297 | 24.9686 |
| E9 | 31.9814 | 15.5796 | 13.6068 | 13.8639 | 12.2373 | 35.7421 | 19.8070 | 17.2665 | 16.2435 | 13.3189 |
| E10 | 56.8917 | 19.3907 | 13.4394 | 12.8776 | 11.4408 | 56.8917 | 19.1463 | 12.9610 | 12.2792 | 11.2242 |
| E11 | 40.4318 | 37.2323 | 36.9945 | 34.1775 | 32.1200 | 38.9771 | 31.7360 | 31.0361 | 29.2156 | 28.6740 |
| E12 | 30.7272 | 29.4070 | 33.6679 | 35.2603 | 24.8004 | 29.4405 | 24.4063 | 24.4616 | 24.8690 | 22.1497 |
| E13 | 18.8997 | 18.0514 | 17.3333 | 14.1208 | 13.5105 | 18.8997 | 17.5524 | 16.4534 | 13.1908 | 10.9338 |
| E14 | 34.8579 | 33.0815 | 28.7044 | 25.9012 | 25.7859 | 30.0917 | 27.8195 | 25.3132 | 24.3546 | 24.6832 |
"E" denotes "Experiment". STL: single-task learning. MTL: multi-task learning. Test 3: comparison between multi-task learning and single-task learning for siRNA efficacy prediction, each trained with 10%, 30%, 50%, 70% and 90% of the data from each experiment. p-values from paired t-tests between multi-task and single-task learning for each training-data percentage: 0.0268 (10%); 0.0046 (30%); 0.0093 (50%); 0.0151 (70%); 0.0389 (90%).
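This record does not spell out the study's multi-task formulation. Purely to illustrate how multi-task learning lets experiments share information, the sketch below swaps in a different, well-known formulation: regularized multi-task learning in the style of Evgeniou and Pontil (2004), where each task's weight vector is a shared vector w0 plus a task-specific offset v_t, with the two parts penalized separately. The names `fit_mtl`, `lam_shared` and `lam_task` are hypothetical.

```python
from typing import Dict, List, Tuple

Task = Tuple[List[List[float]], List[float]]  # (feature rows, efficacy labels)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mtl(tasks: Dict[str, Task], lam_shared: float, lam_task: float) -> Dict[str, List[float]]:
    """Fit w_t = w0 + v_t for every task by minimizing
    sum_t ||X_t (w0 + v_t) - y_t||^2 + lam_shared ||w0||^2 + lam_task sum_t ||v_t||^2.
    """
    names = list(tasks)
    d = len(tasks[names[0]][0][0])
    k = d * (len(names) + 1)  # block 0 holds w0, block t+1 holds v_t
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for t, name in enumerate(names):
        X, y = tasks[name]
        for row, target in zip(X, y):
            # Augmented features: the row appears in the shared block and in task t's block.
            idx = list(range(d)) + list(range(d * (t + 1), d * (t + 2)))
            vals = list(row) + list(row)
            for i, vi in zip(idx, vals):
                b[i] += vi * target
                for j, vj in zip(idx, vals):
                    a[i][j] += vi * vj
    for i in range(k):
        a[i][i] += lam_shared if i < d else lam_task
    theta = solve(a, b)
    w0 = theta[:d]
    return {name: [w0[j] + theta[d * (t + 1) + j] for j in range(d)]
            for t, name in enumerate(names)}
```

A large `lam_task` forces every task onto the shared vector, which resembles pooling the data as in Test 2; a large `lam_shared` lets each task fit on its own, as in single-task learning. Intermediate values let experiments with few siRNAs borrow strength from the others. The dense solver grows cubically with d·(T+1), which is acceptable only for a small sketch like this.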
Tests on the two independent experiments (RMSE).
| Test | Method | IE1 | IE2 |
|---|---|---|---|
| Test 4 (50% training data) | Single-task learning | 34.1116 | 35.8600 |
| Test 4 (50% training data) | Multi-task learning | 29.7394 | 30.5459 |
| Test 5 (with added tasks, 50% training data) | Multi-task learning | 26.6910 | 26.1009 |
"IE" denotes "Independent experiment". Test 4: comparison between single-task and multi-task learning on the two independent experiments, each trained with 50% of the data from each experiment. Test 5: multi-task learning on the two independent experiments together with the original 14 experiments (16 experiments in total), trained with 50% of the data from each experiment.
Description of the RNAi dataset in which each mRNA and its binding siRNAs are treated as one task.
| Task | mRNA | #siRNA |
|---|---|---|
| T1 | M60857 | 89 |
| T2 | U47298 | 90 |
| T3 | J03132 | 38 |
| T4 | U92436 | 29 |
| T5 | LaminA | 44 |
| T6 | M16553 | 8 |
| T7 | NM_031313 | 11 |
| T8 | NM_020548 | 9 |
| T9 | X75932 | 10 |
| T10 | NM_002046 | 20 |
| T11 | M26071 | 10 |
| T12 | U47298 | 34 |
| T13 | M16553 | 6 |
| T14 | NM_001315 | 8 |
| T15 | NM_000875 | 16 |
| T16 | M25346 | 8 |
| T17 | AF493916 | 10 |
| T18 | AK122643 | 14 |
| T19 | NM_144586 | 14 |
| T20 | M33197 | 12 |
"T" denotes "Task".
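In this task list, U47298 (T2 and T12) and M16553 (T6 and T13) each appear twice, so grouping siRNAs by accession alone would merge those tasks. The sketch below therefore assumes, hypothetically, that a task is keyed by its source experiment together with the mRNA; the record type, its field names and the toy records are illustrative and not taken from the study's data files.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class SiRNARecord(NamedTuple):
    experiment: str  # hypothetical source-experiment label
    mrna: str        # target mRNA identifier, e.g. "U47298"
    efficacy: float  # measured label on that experiment's own scale


def group_into_tasks(records: List[SiRNARecord]) -> Dict[Tuple[str, str], List[SiRNARecord]]:
    """Group siRNA records into tasks keyed by (experiment, mRNA)."""
    tasks: Dict[Tuple[str, str], List[SiRNARecord]] = defaultdict(list)
    for rec in records:
        tasks[(rec.experiment, rec.mrna)].append(rec)
    return dict(tasks)


# Toy records: the experiment labels and efficacy values are made up.
records = [
    SiRNARecord("ExpA", "U47298", 40.0),
    SiRNARecord("ExpA", "U47298", 75.0),
    SiRNARecord("ExpB", "U47298", 60.0),
    SiRNARecord("ExpB", "M16553", 20.0),
]
for key, members in group_into_tasks(records).items():
    print(key, len(members))
```

With this key, the same accession measured in two experiments yields two separate tasks, matching how the table lists U47298 and M16553 twice.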
Comparison between multi-task learning and single-task learning at the mRNA task level (RMSE).
| Test | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Test 6 | 22.9156 | 29.7953 | 24.4563 | 20.2755 | 13.6265 | 25.5433 | 28.6792 | 28.6911 | 13.8089 | 47.9704 |
| Test 7 | 22.0309 | 28.8772 | 34.4272 | 22.4800 | 29.5645 | 22.3986 | 23.4719 | 42.3385 | 16.1072 | 34.2505 |
| Test 8 | 22.2569 | 29.4852 | 22.9905 | 19.1120 | 11.7851 | 23.5123 | 29.9718 | 28.4760 | 11.7036 | 37.8482 |

| Test | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Test 6 | 43.6353 | 13.9306 | 14.4649 | 5.6649 | 35.8113 | 33.6464 | 29.6981 | 29.4559 | 30.2422 | 21.0494 |
| Test 7 | 35.4975 | 16.8432 | 13.0795 | 25.0440 | 26.3289 | 36.5158 | 29.9756 | 27.0347 | 26.0495 | 21.7607 |
| Test 8 | 41.2163 | 18.2205 | 13.6913 | 5.7872 | 27.3318 | 27.5945 | 23.6955 | 26.5286 | 24.3853 | 16.2990 |
"T" denotes "Task". Test 6: for each experiment, 50% of the data were selected to train a regression model, which was then tested on the remaining 50% of that experiment's data. Test 7: all experimental labels were scaled into [0,1], and 50% of the data from each experiment were pooled to train a single general model, which was then tested on the remaining 50% of each experiment's data. Test 8: multi-task learning for siRNA efficacy prediction, trained with 50% of the data from each experiment. The p-value from a paired t-test between Test 6 and Test 7 is 0.5900; between Test 6 and Test 8, it is 0.0033.
Test of efficacy prediction for siRNAs binding to a single mRNA (Test 9, RMSE).
| Dataset | Method | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 |
|---|---|---|---|---|---|---|
| D1 | STL | 21.7139 | 31.3104 | 22.0464 | 20.5358 | 31.3807 |
| D1 | STL with combination and scaling | 20.8203 | 24.7029 | 21.2602 | 18.7345 | 28.9061 |
| D2 | STL | 32.3753 | 28.3268 | 27.7405 | 22.1219 | 33.1770 |
| D2 | STL with combination and scaling | 26.9951 | 25.7676 | 25.0711 | 19.9418 | 32.4254 |
"T" denotes "Task". STL: single-task learning. Test 9: two datasets (D1 and D2) were each randomly split into 5 sub-tasks, and a study similar to Tests 1 and 2 was performed on each of them.
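Test 9 splits each dataset at random into 5 sub-tasks, but the split procedure is not detailed here. A minimal sketch, assuming a seeded shuffle followed by round-robin dealing so that sub-task sizes differ by at most one; the function name, the seed and the 89 placeholder items are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_subtasks(items: Sequence[T], k: int = 5, seed: int = 0) -> List[List[T]]:
    """Shuffle a copy of `items` and deal it into k sub-tasks of near-equal size."""
    pool = list(items)
    random.Random(seed).shuffle(pool)  # a local RNG keeps the split reproducible
    return [pool[i::k] for i in range(k)]


parts = random_subtasks(range(89), k=5, seed=42)
print([len(p) for p in parts])
```

This prints [18, 18, 18, 18, 17]. Each sub-task can then be treated like one experiment in the Test 1 versus Test 2 comparison, once with a separate model per sub-task and once with scaled, pooled training data.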