Zhi John Lu, David H. Mathews.
Abstract
Small interfering RNA (siRNA) are widely used to infer gene function. Here, insights into the equilibrium of siRNA-target hybridization are used to select efficient siRNA. The accessibilities of the siRNA and the target mRNA for hybridization, as measured by folding free energy change, are shown to be significantly correlated with efficacy. In this study, target site accessibility is predicted with a partition function calculation that considers all possible secondary structures, which is a significant improvement over calculations that consider only the predicted lowest free energy structure or a set of low free energy structures. The predicted thermodynamic features, together with siRNA sequence features, are used as input to a support vector machine (SVM) that selects functional siRNA. The method works well for predicting efficient siRNA (efficacy >70%) in a large siRNA data set from Novartis. The positive predictive value (the percentage of sites predicted to be efficient for silencing that actually are) is as high as 87.6%. The sensitivity and specificity are 22.7% and 96.5%, respectively. When tested on data from different sources, adding equilibrium terms to 25 local sequence features increased the positive predictive value by 8.1 percentage points. Prediction of hybridization affinity using partition functions is now available in the RNAstructure software package.
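The three performance measures quoted above come from a confusion matrix of predicted versus observed efficient siRNA: PPV = TP/(TP+FP), sensitivity = TP/(TP+FN), specificity = TN/(TN+FP). A minimal Python sketch of those definitions; the class and function names and the counts in the usage line are hypothetical and are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of siRNA classified as efficient (positive) or not (negative)."""
    tp: int  # predicted efficient, observed efficient
    fp: int  # predicted efficient, observed inefficient
    tn: int  # predicted inefficient, observed inefficient
    fn: int  # predicted inefficient, observed efficient


def selection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV, sensitivity and specificity as fractions in [0, 1]."""
    return {
        # PPV: of the siRNA we select, how many actually silence well
        "ppv": c.tp / (c.tp + c.fp),
        # Sensitivity: of the efficient siRNA, how many we select
        "sensitivity": c.tp / (c.tp + c.fn),
        # Specificity: of the inefficient siRNA, how many we reject
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, for illustration only.
print(selection_metrics(ConfusionCounts(tp=8, fp=2, tn=78, fn=12)))
```

An operating point with high specificity and low sensitivity, like the one reported, favors PPV: few siRNA are selected, but most of the selected ones silence efficiently.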
Year: 2007 PMID: 18073195 PMCID: PMC2241856 DOI: 10.1093/nar/gkm920
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Equilibria considered in the OligoWalk algorithm for predicting the affinity of a structured oligonucleotide (siRNA) for a structured target (mRNA). Proteins involved in silencing are not shown and were neglected in the calculations. The free energy change of each equilibrium is ΔG° = −RT ln K, where K is the equilibrium constant. K1, K2, K3 and K4 are related to , , and , respectively. K0 is also related to . Folding in the target (at the region of hybridization) and self-structure in the siRNA both compete with formation of the siRNA-target complex needed for cleavage by RISC.
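The relation ΔG° = −RT ln K in the caption converts between free energy changes and equilibrium constants. A minimal Python sketch, assuming energies in kcal/mol and a default temperature of 37 °C (both assumptions of this sketch, not stated in the caption); the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); RNA folding energies are usually quoted in kcal/mol.
R_KCAL = 1.987204e-3
T_37C = 310.15  # 37 degrees Celsius in kelvin (assumed temperature)


def delta_g_from_k(k: float, temp_k: float = T_37C) -> float:
    """Standard free energy change (kcal/mol) for equilibrium constant k."""
    return -R_KCAL * temp_k * math.log(k)


def k_from_delta_g(dg: float, temp_k: float = T_37C) -> float:
    """Equilibrium constant for a standard free energy change dg (kcal/mol)."""
    return math.exp(-dg / (R_KCAL * temp_k))


# A more favorable (more negative) free energy change gives a larger K.
for dg in (-2.0, 0.0, 2.0):
    print(f"dG = {dg:+.1f} kcal/mol -> K = {k_from_delta_g(dg):.3g}")

# Coupled equilibria: free energy changes add, so equilibrium constants multiply.
dg_a, dg_b = -1.0, -2.0
print(k_from_delta_g(dg_a) * k_from_delta_g(dg_b), k_from_delta_g(dg_a + dg_b))
```

The last line shows why competing equilibria, such as target folding and siRNA self-structure, can be combined as sums of free energy terms.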
Table 1. Thermodynamic features predicted by the OligoWalk algorithm
| Free energy type | r (with ln(activity)) ᵃ | P-value ᵇ |
|---|---|---|
| | −0.2298 | 1.78 × 10⁻¹⁵ |
| | −0.1949 (−0.1799) | 2.66 × 10⁻¹⁵ (3.33 × 10⁻¹⁵) |
| | −0.1882 (−0.1873) | 3.11 × 10⁻¹⁵ (2.89 × 10⁻¹⁵) |
| | −0.1812 (−0.1790) | 3.11 × 10⁻¹⁵ (3.11 × 10⁻¹⁵) |
| | −0.3507 | 8.88 × 10⁻¹⁶ |
ᵃ The correlations were calculated within the Novartis data set (12) plus the data sets collected by Shabalina et al. (19). Activity is the percentage of the targeted mRNA remaining after RNA interference, relative to the control. Here, r is the correlation coefficient. Negative correlations indicate that decreasing a folding free energy change (increased stability) results in increased ln(activity) (decreased silencing efficiency).
ᵇ A P-value (probability) <0.05 is considered statistically significant.
ᶜ Values were calculated with the partition function method, folding 800 nucleotides centered on the binding site.
ᵈ Values in parentheses were calculated with the optimal structure prediction method.
ᵉ The best correlation was found by considering 2 bp at the end, including the AU end penalty (28).
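The r values in the table above correlate a free energy feature with ln(activity) rather than with activity itself. A minimal Python sketch, assuming a Pearson correlation coefficient (the source says only "correlation coefficient") and using the standard t statistic r·√((n−2)/(1−r²)) as the significance test; converting t to a P-value needs a t-distribution CDF (for example from SciPy), which this sketch omits. Function names and the toy numbers are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate_with_ln_activity(free_energies: list[float], activities: list[float]):
    """Correlate a free energy feature with ln(activity).

    activities are remaining target mRNA levels relative to the control
    (must be > 0); lower activity means stronger silencing.
    Returns (r, t), where t tests H0: r = 0 with n - 2 degrees of freedom.
    """
    ln_act = [math.log(a) for a in activities]
    r = pearson_r(free_energies, ln_act)
    n = len(ln_act)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Toy values only, not from the Novartis or Shabalina data sets.
dg_feature = [-14.2, -11.8, -10.1, -8.7, -6.3]
activity = [0.55, 0.40, 0.31, 0.33, 0.18]
r, t = correlate_with_ln_activity(dg_feature, activity)
print(f"r = {r:.3f}, t = {t:.2f}")
```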
Prediction performance for efficient siRNA (inhibition efficacy >70%)
| Parameters for SVM | PPV (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| All 28 features | 78.6 | 22.9 | 95.1 |
| Not considering siRNA's self-structure free energy changes | 77.0 | 19.8 | 95.5 |
| Not considering mRNA's self-structure free energy change | 73.5 | 21.2 | 94.0 |
| Not considering either siRNA or mRNA self-structure free energy changes | 70.5 | 19.1 | 93.7 |
The SVM was trained on the Novartis data set (12) and tested on data sets from other sources collected by Shabalina et al. (19). Positive predictive value (PPV), the percentage of selected siRNA sequences that are efficient at silencing, is the main performance criterion because it measures how reliably a set of efficient siRNA sequences can be selected.
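The ablation rows can be read as PPV drops relative to using all 28 features. This short Python sketch only re-does that arithmetic on the table's own numbers: the last row gives the 8.1-point gain quoted in the abstract, and removing the mRNA self-structure term (5.1 points) costs more than removing the siRNA self-structure terms (1.6 points).

```python
# PPV (%) on the external test sets, copied from the table above.
ppv = {
    "all 28 features": 78.6,
    "without siRNA self-structure terms": 77.0,
    "without mRNA self-structure term": 73.5,
    "without either": 70.5,
}

baseline = ppv["all 28 features"]
for setting, value in ppv.items():
    # Drop in percentage points relative to using every feature
    drop = round(baseline - value, 1)
    print(f"{setting:36s} PPV {value:5.1f}%  drop {drop:4.1f} points")
```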
Figure 2. The correlation between ln(activity) and the free energy cost of opening local structure in the mRNA. Three prediction methods are used: optimal structure prediction (the lowest free energy structure), suboptimal structure prediction (a set of heuristically generated low free energy structures) and the partition function calculation. Activity is the fraction of targeted mRNA expression remaining after RNA interference treatment, compared to the control. Local structures of different sizes, centered on the binding region, were folded; in global folding of sequences longer than 4000 nucleotides, 4000 nucleotides of flanking sequence are folded. The y-axis, r, is the correlation coefficient. The correlations were calculated within the Novartis data set (12) plus all other data sets collected by Shabalina et al. (19).
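The gap between the optimal-structure and partition-function estimates can be illustrated on a toy ensemble. A minimal Python sketch under two assumptions: the partition-function opening cost is −RT ln(Q_open/Q), where Q sums Boltzmann weights over all structures and Q_open only over structures that leave the binding site unpaired; and the optimal-structure cost is the lowest free energy with the site unpaired minus the overall lowest free energy. The explicit structure list, function names and energies are hypothetical; RNAstructure computes partition functions by dynamic programming rather than enumeration.

```python
import math

R_KCAL = 1.987204e-3  # kcal/(mol*K)
T_37C = 310.15        # kelvin (assumed temperature)

# Toy ensemble: (free energy in kcal/mol, set of paired nucleotide indices).
Ensemble = list[tuple[float, frozenset[int]]]


def opening_cost_partition(ensemble: Ensemble, site: set[int], temp_k: float = T_37C) -> float:
    """-RT ln(Q_open / Q): Boltzmann-weighted cost of leaving the site unpaired."""
    rt = R_KCAL * temp_k
    q_all = sum(math.exp(-g / rt) for g, _ in ensemble)
    q_open = sum(math.exp(-g / rt) for g, paired in ensemble if not (paired & site))
    if q_open == 0.0:
        raise ValueError("no structure in the ensemble leaves the site unpaired")
    return -rt * math.log(q_open / q_all)


def opening_cost_optimal(ensemble: Ensemble, site: set[int]) -> float:
    """Lowest energy with the site unpaired minus the overall lowest energy."""
    g_min = min(g for g, _ in ensemble)
    g_open = min(g for g, paired in ensemble if not (paired & site))
    return g_open - g_min


# Two equally stable folds: one pairs the target site (10-12), one leaves it open.
toy = [
    (-20.0, frozenset({10, 11, 12, 40, 41, 42})),
    (-20.0, frozenset({1, 2, 3, 60, 61, 62})),
]
site = {10, 11, 12}
print(opening_cost_optimal(toy, site))    # the optimal-structure view sees a free site
print(opening_cost_partition(toy, site))  # RT ln 2, about 0.43 kcal/mol
```

Because half of the Boltzmann weight pairs the site, the partition function charges a nonzero opening cost that a single lowest free energy structure misses.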
Table 2. Correlations between ln(activity) of siRNA and different features
| Individual feature | Position | r ᵃ | P-value |
|---|---|---|---|
| | mRNA | −0.1971 | 1.11 × 10⁻¹⁵ |
| | all | −0.1895 | 1.55 × 10⁻¹⁵ |
| | all | −0.1974 | 2.89 × 10⁻¹⁵ |
| | all | −0.2501 | 1.78 × 10⁻¹⁵ |
| | 1 versus 19 | −0.3507 | 6.66 × 10⁻¹⁶ |
| ΔG° | 1 | −0.3427 | 4.44 × 10⁻¹⁶ |
| ΔH° | 1 | −0.3215 | 1.11 × 10⁻¹⁵ |
| U | 1 | −0.2625 | 1.33 × 10⁻¹⁵ |
| G | 1 | 0.2385 | 2.22 × 10⁻¹⁵ |
| ΔH° | all | −0.2473 | 1.78 × 10⁻¹⁵ |
| U | all | −0.1962 | 2.22 × 10⁻¹⁵ |
| UU | 1 | −0.193 | 1.78 × 10⁻¹⁵ |
| G | all | 0.1838 | 3.11 × 10⁻¹⁵ |
| GG | 1 | 0.1434 | 1.20 × 10⁻¹² |
| GC | 1 | 0.1301 | 1.21 × 10⁻¹⁰ |
| GG | all | 0.1605 | 4.88 × 10⁻¹⁵ |
| ΔG° | 2 | −0.1659 | 4.22 × 10⁻¹⁵ |
| UA | all | −0.1267 | 3.61 × 10⁻¹⁰ |
| U | 2 | −0.1332 | 4.26 × 10⁻¹¹ |
| C | 1 | 0.1434 | 1.21 × 10⁻¹² |
| CC | all | 0.1447 | 7.58 × 10⁻¹³ |
| ΔG° | 18 | 0.1024 | 4.22 × 10⁻⁷ |
| CC | 1 | 0.1116 | 3.46 × 10⁻⁸ |
| GC | all | 0.1403 | 3.63 × 10⁻¹² |
| CG | 1 | 0.1018 | 4.86 × 10⁻⁷ |
| ΔG° | 13 | −0.1092 | 6.81 × 10⁻⁸ |
| UU | all | −0.1414 | 2.49 × 10⁻¹² |
| A | 19 | 0.0804 | 7.29 × 10⁻⁵ |
The siRNA (19 base pairs) sequence features were chosen from the most correlated features found by Ladunga (20) in the Novartis data set (12). They are compared with the thermodynamic features predicted by the OligoWalk algorithm. The correlations were calculated within the Novartis data set.
ᵃ Activity is the fraction of the targeted mRNA remaining after RNA interference, compared to the control.
ᵇ Values were calculated with the partition function method, folding 800 nucleotides centered on the binding site.
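The nucleotide rows of the table above can be encoded as SVM inputs. A minimal Python sketch, assuming positions are numbered 1 to 19 from the 5′ end of the strand the features refer to (the strand is not stated here), that a dinucleotide "at position 1" spans positions 1 and 2, and that "all" means an overlapping count over the whole sequence. The function name and feature keys are hypothetical; the thermodynamic rows (ΔG°, ΔH°) need nearest-neighbor parameters and are omitted.

```python
def local_sequence_features(sirna: str) -> dict[str, int]:
    """Hypothetical encoding of the nucleotide features for a 19-nt siRNA strand."""
    seq = sirna.upper().replace("T", "U")
    if len(seq) != 19 or set(seq) - set("ACGU"):
        raise ValueError("expected a 19-nt RNA sequence over A, C, G, U")

    def base_at(pos: int) -> str:    # 1-based position
        return seq[pos - 1]

    def dinuc_at(pos: int) -> str:   # dinucleotide starting at pos
        return seq[pos - 1:pos + 1]

    def count_overlapping(motif: str) -> int:
        return sum(seq.startswith(motif, i) for i in range(len(seq)))

    feats: dict[str, int] = {}
    # Single-nucleotide identity at specific positions (e.g. U at position 1)
    for base, pos in [("U", 1), ("G", 1), ("C", 1), ("U", 2), ("A", 19)]:
        feats[f"{base}@{pos}"] = int(base_at(pos) == base)
    # Dinucleotide identity starting at position 1 (e.g. UU at 1)
    for motif in ("UU", "GG", "GC", "CC", "CG"):
        feats[f"{motif}@1"] = int(dinuc_at(1) == motif)
    # Whole-sequence counts (the "all" rows)
    for motif in ("U", "G", "GG", "UA", "CC", "GC", "UU"):
        feats[f"{motif}_all"] = count_overlapping(motif)
    return feats


print(local_sequence_features("UUACGGCCAUGCAUGCAUA"))
```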
Figure 3. ROC curves and PPV of SVM prediction. (a) ROC curves and (b) PPV as a function of sensitivity when all 28 features (listed in Table 2) are used to train the SVM; siRNA with different silencing efficacies (>50% and >70%) within the Novartis data set (12) are predicted (see Methods section). (c) ROC curves and (d) PPV as a function of sensitivity when the SVM is trained on the whole Novartis data set and tested on the database collected by Shabalina et al. (19). Plots are shown for selecting efficient siRNA (silencing efficacy >70%) both with and without self-structure folding free energy terms. There are 28 features in total (Table 2) when local sequence terms and folding free energy changes are included. Thermodynamic features are those predicted by OligoWalk (Table 1).
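Both kinds of curve in the figure come from sweeping a decision threshold over classifier scores. A minimal Python sketch, assuming each siRNA has a real-valued score (for example an SVM decision value) and a binary label for "efficient"; the function names and toy scores are hypothetical. Each threshold yields one ROC point (false positive rate, sensitivity) and one PPV value.

```python
def roc_and_ppv(scores: list[float], labels: list[int]) -> list[tuple[float, float, float]]:
    """Sweep a threshold over scores; labels are 1 (efficient) or 0.

    Returns (fpr, sensitivity, ppv) tuples, one per distinct threshold,
    from the strictest threshold to the most lenient.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both efficient and inefficient examples")
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Predict "efficient" when the score reaches the threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        # tp + fp >= 1 because thr is itself one of the scores
        points.append((fp / n_neg, tp / n_pos, tp / (tp + fp)))
    return points


def roc_auc(points: list[tuple[float, float, float]]) -> float:
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    curve = [(0.0, 0.0)] + [(fpr, tpr) for fpr, tpr, _ in points]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical SVM scores and efficacy labels (1 = silencing efficacy >70%).
pts = roc_and_ppv([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
for fpr, sens, ppv in pts:
    print(f"FPR {fpr:.2f}  sensitivity {sens:.2f}  PPV {ppv:.2f}")
print("AUC", roc_auc(pts))
```

Reading PPV against sensitivity, as in panels (b) and (d), shows the trade-off directly: a stricter threshold selects fewer siRNA but, typically, a higher fraction of efficient ones.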