Qi Liu, Han Zhou, Juan Cui, Zhiwei Cao, Ying Xu.
Abstract
RNA interference via exogenous short interfering RNAs (siRNAs) is increasingly employed as a tool in gene function studies, drug target discovery and disease treatment. There is currently a strong need for rational siRNA design that achieves more reliable and specific gene silencing and keeps pace with a widening range of applications. While progress has been made in designing siRNAs against specific targets, the rational design of siRNAs with high efficacy is still in its infancy. Two of the many obstacles are the lack of a general understanding of which siRNA sequence features affect silencing efficacy, and the lack of the large-scale homogeneous data needed to carry out such association analyses. To address these issues, we investigated feature-selection-based in-silico siRNA design from a novel cross-platform data integration perspective. An integrated analysis of 4,482 siRNAs from ten meta-datasets was conducted to rank siRNA features according to their possible importance to silencing efficacy across heterogeneous data sources. Our ranking analysis revealed, for the first time, the most relevant features based on cross-platform experiments, and it compares favorably with traditional in-silico siRNA feature screening based on small samples of individual platform data. We believe that our feature ranking analysis can offer more credible suggestions to help improve the design of siRNAs with specific silencing targets. Data and scripts are available at http://csbl.bmb.uga.edu/publications/materials/qiliu/siRNA.html.
Year: 2012 PMID: 22655076 PMCID: PMC3360065 DOI: 10.1371/journal.pone.0037879
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data description for ten siRNA datasets.
| ID | Dataset | Size | Source | siRNA sequence | Concentration |
| | Novartis's data | 2431 | Huesken et al., 2005 | antisense | 50 nM |
| | Jagla's data | 601 | Jagla et al., 2005 | antisense | 100 nM |
| | Katoh's data | 702 | Katoh and Suzuki, 2007 | sense | 10/25 nM |
| | Amgen-Dharmacon | 239 | Reynolds et al., 2004 | antisense | 100 nM |
| | Harborth's data | 42 | Harborth et al., 2003 | antisense | 100 nM |
| | Hsieh's data | 108 | Hsieh et al., 2004 | antisense | 100 nM |
| | Khvorova's data | 10 | Khvorova et al., 2003 | antisense | 100 nM |
| | Vickers' data | 76 | Vickers et al., 2003 | antisense | 100 nM |
| | Ui-Tei's data | 50 | Ui-Tei et al., 2004 | antisense | 50 nM |
| | Amarzguioui's data | 223 | Amarzguioui and Prydz, 2004 | sense | 25 nM |
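The dataset sizes in the table can be tallied to check that they add up to the 4,482 siRNAs quoted in the abstract, and to see how unevenly the platforms contribute. The following is a minimal sketch; the shortened dictionary keys and the function names are illustrative labels, not identifiers from the authors' scripts.

```python
# Dataset sizes copied from the table above (dataset label -> number of siRNAs).
# The short labels are illustrative, not names used by the authors.
DATASET_SIZES = {
    "Novartis": 2431,
    "Jagla": 601,
    "Katoh": 702,
    "Amgen-Dharmacon": 239,
    "Harborth": 42,
    "Hsieh": 108,
    "Khvorova": 10,
    "Vickers": 76,
    "Ui-Tei": 50,
    "Amarzguioui": 223,
}


def total_sirnas(sizes):
    """Return the pooled number of siRNAs across all datasets."""
    return sum(sizes.values())


def share_of_total(sizes):
    """Return each dataset's fraction of the pooled siRNA count."""
    total = total_sirnas(sizes)
    return {name: count / total for name, count in sizes.items()}
```

Calling `total_sirnas(DATASET_SIZES)` reproduces the pooled count from the abstract, and `share_of_total(DATASET_SIZES)` shows that the largest dataset alone accounts for more than half of all siRNAs, which is one reason pooled analyses need to consider per-platform effects.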
Figure 1. The computational framework for integrated cross-platform feature selection in siRNA design.
Accuracy (RMSE) of three regression models for siRNA efficacy prediction.
| Norm form | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 |
| | 0.1521 | 0.2359 | 0.1493 | 0.2432 | 0.1062 | 0.2159 | 0.0155 | 0.1938 | 0.1921 | 0.2809 |
| | 0.1502 | 0.2495 | 0.1584 | 0.2643 | 0.0864 | 0.2318 | 0.0000 | 0.2188 | 0.2893 | 0.2885 |
| | 0.1575 | 0.2503 | 0.1624 | 0.2716 | 0.0813 | 0.2319 | 0.0018 | 0.2028 | 0.2690 | 0.2893 |
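For readers unfamiliar with the metric, the root-mean-square error (RMSE) reported in the table can be computed as sketched below. This is a generic implementation of the standard formula, not the authors' evaluation script. The function name `rmse` is illustrative, and the sketch assumes predicted and observed efficacies are on the same scale (for example, fractions between 0 and 1).

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical efficacies for three siRNAs (not values from the paper).
example_predicted = [0.8, 0.5, 0.3]
example_observed = [0.7, 0.6, 0.3]
example_error = rmse(example_predicted, example_observed)
```

Lower values indicate predictions closer to the measured efficacies. An RMSE of exactly zero means the model reproduced every measurement in that dataset.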
Figure 2. A representation of the integrated ranking results.
Feature ranking and correlation coefficients for siRNA design derived from cross-platform data integration.
| Rank ID | Feat No | Feature explanation | R | p-value | Support | Opposite |
| 1 | 414 | ‘GG’ | 0.3716 | 0.0000 | Lu and Mathews, 2008; Matveeva et al., 2007; Shabalina et al., 2006 | Klingelhoefer et al., 2009 |
| 2 | 40 | ‘U @ PS1’ | 0.2791 | 0.0003 | Jagla et al., 2005; Katoh and Suzuki, 2007; Reynolds et al., 2004; Shabalina et al., 2006; Vert et al., 2006 | |
| 3 | 20 | ‘A @ PS19’ | −0.1443 | 0.1572 | Huesken et al., 2005; Matveeva et al., 2007; Shabalina et al., 2006 | |
| 4 | 431 | ‘GG in PS[18,19]’ | −0.1684 | 0.1044 | Klingelhoefer et al., 2009; Lu and Mathews, 2008; Matveeva et al., 2007; Shabalina et al., 2006 | |
| 5 | 21 | ‘G @ PS1’ | −0.2137 | 0.0023 | Matveeva et al., 2007; Shabalina et al., 2006 | |
| 6 | 494 | ‘GC content<0.55’ | 0.2441 | 0.0038 | Matveeva et al., 2007; Chalk et al., 2004 | |
| 7 | 59 | ‘C @ PS1’ | −0.1836 | 0.0075 | Matveeva et al., 2007; Shabalina et al., 2006 | |
| 8 | 77 | ‘C @ PS19’ | 0.1120 | 0.2283 | Huesken et al., 2005; Jagla et al., 2005; Katoh and Suzuki, 2007; Matveeva et al., 2007; Shabalina et al., 2006 | |
| 9 | 34 | ‘G @ PS14’ | −0.1366 | 0.0876 | Matveeva et al., 2007; Chalk et al., 2004 | |
| 10 | 11 | ‘A @ PS10’ | 0.1374 | 0.0288 | Huesken et al., 2005; Jagla et al., 2005; Katoh and Suzuki, 2007; Matveeva et al., 2007; Reynolds et al., 2004; Vert et al., 2006 | |
| 11 | 65 | ‘C @ PS7’ | −0.1264 | 0.0743 | Katoh and Suzuki, 2007; Reynolds et al., 2004; Shabalina et al., 2006 | |
| 12 | 491 | ‘GC content<0.7’ | 0.2408 | 0.0024 | Elbashir et al., | |
| 13 | 492 | ‘GC content<0.65’ | 0.2436 | 0.0043 | | |
| 14 | 493 | ‘GC content<0.6’ | 0.2444 | 0.0170 | Wang et al., 2004 | |
| 15 | 2 | ‘A @ PS1’ | 0.1269 | 0.1147 | Jagla et al., 2005; Katoh and Suzuki, 2007; Matveeva et al., 2007; Reynolds et al., 2004; Shabalina et al., 2006 | |
| 16 | 39 | ‘G @ PS19’ | 0.0675 | 0.1340 | Jagla et al., 2005; Katoh and Suzuki, 2007; Matveeva et al., 2007 | |
| 17 | 125 | ‘GCC in PS[1..19]’ | −0.1272 | 0.2189 | Klingelhoefer et al., 2009 | |
| 18 | 152 | ‘CUU in PS[1..19]’ | 0.1420 | 0.0673 | Vert et al., 2006 | |
| 19 | 92 | ‘CU in PS[1..19]’ | 0.1260 | 0.2946 | | |
| 20 | 76 | ‘C @ PS18’ | 0.0705 | 0.2627 | Shabalina et al., 2006 | |
| 21 | 157 | ‘CCC in PS[1..19]’ | −0.0856 | 0.2423 | Vert et al., 2006 | |
| 22 | 117 | ‘GGC in PS[1..19]’ | −0.1259 | 0.1330 | | |
| 23 | 33 | ‘G @ PS13’ | −0.1174 | 0.1282 | Matveeva et al., 2007 | |
| 24 | 485 | ‘GC content>0.45’ | −0.1719 | 0.0388 | | |
| 25 | 19 | ‘A @ PS18’ | −0.0867 | 0.2145 | Matveeva et al., 2007 | |
| 26 | 140 | ‘UCU in PS[1..19]’ | 0.1395 | 0.0562 | Klingelhoefer et al., 2009 | |
| 27 | 49 | ‘U @ PS10’ | 0.0064 | 0.4615 | Jagla et al., 2005; Katoh and Suzuki, 2007 | |
| 28 | 155 | ‘CCG in PS[1..19]’ | −0.1352 | 0.1027 | Vert et al., 2006 | |
| 29 | 115 | ‘GGG in PS[1..19]’ | −0.1561 | 0.0092 | | |
| 30 | 471 | ‘G stretch of length >= 3’ | −0.1561 | 0.0092 | | |
| 31 | 120 | ‘GUU in PS[1..19]’ | 0.0362 | 0.3726 | Vert et al., 2006 | |
GG denotes the thermodynamic stability of dinucleotides in the siRNA antisense strand.
PS denotes the position of nucleotides in the siRNA sequence.
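To make the feature notation concrete, the sketch below encodes a few of the tabulated sequence features as 0/1 indicators for a single 19-nt siRNA and computes a correlation coefficient between a feature and efficacy. It rests on several assumptions that this record does not confirm: positions are 1-based from the start of the supplied sequence, the record does not say which correlation coefficient R denotes (Pearson is used here), and all function names are illustrative rather than taken from the authors' scripts.

```python
import math


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def encode_features(seq):
    """Encode a 19-nt siRNA as 0/1 indicators for a few tabulated features.

    Assumption: position PSk is the k-th nucleotide (1-based) of `seq`.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if len(seq) != 19 or set(seq) - set("ACGU"):
        raise ValueError("expected a 19-nt sequence over A, C, G, U")
    return {
        "U @ PS1": int(seq[0] == "U"),
        "G @ PS1": int(seq[0] == "G"),
        "A @ PS19": int(seq[18] == "A"),
        "GC content<0.55": int(gc_content(seq) < 0.55),
        "GGG in PS[1..19]": int("GGG" in seq),
        "UCU in PS[1..19]": int("UCU" in seq),
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed here; R is not defined above)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sequence, not taken from any of the ten datasets.
example_features = encode_features("UUCGGGAUCUACGAUCAUA")
```

Under this encoding, ‘GGG in PS[1..19]’ and ‘G stretch of length >= 3’ are the same indicator, which is consistent with the identical R and p-value listed for ranks 29 and 30. A feature column built with `encode_features` across many siRNAs can be passed to `pearson_r` together with the measured efficacies.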
Comparison (RMSE) between the model with the 31 identified features (31_Feat) and the model with the 19 features of Klingelhoefer et al. (19_Feat) for siRNA efficacy prediction.
| D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 |
| 0.1557 | 0.2486 | 0.1615 | 0.2650 | 0.1592 | 0.2418 | 0.4474 | 0.2601 | 0.2205 |
| 0.1641 | 0.2516 | 0.1595 | 0.2662 | 0.1601 | 0.2517 | 0.4161 | 0.2546 | 0.2753 |