Abstract
BACKGROUND: RNA interference (RNAi) is a naturally occurring phenomenon that suppresses a target RNA sequence through a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences, a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities.
Year: 2007 PMID: 17553157 PMCID: PMC1906837 DOI: 10.1186/1471-2105-8-182
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
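The abstract names an RBF-kernel epsilon-regression SVM without defining it in this excerpt. A minimal sketch of the two standard ingredients, the RBF kernel and the epsilon-insensitive loss, is given below. The default values of `gamma` and `epsilon` are illustrative assumptions, not settings reported by the authors.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Radial basis function kernel: K(x, y) = exp(-gamma * ||x - y||^2).

    gamma is an illustrative default, not a value from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Loss used by epsilon-regression: errors within epsilon cost nothing,
    larger errors cost linearly in the excess.

    epsilon is an illustrative default, not a value from the paper.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

Identical feature vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; predictions inside the epsilon tube contribute no loss.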
Figure 1. Guide strand position-specific base composition correlation coefficients with RNA interference activities. A positive correlation indicates that the presence of the nucleotide at that position is associated with greater RNAi activity; a negative correlation indicates that its presence is associated with decreased RNAi activity. Position 1 is the 5'-most position of the guide strand.
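The position-specific correlations described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes an RNA alphabet `ACGU`, a binary indicator per (position, base) pair, and a plain Pearson correlation of each indicator with activity. A 21-nucleotide guide strand then yields 21 × 4 = 84 features, which matches the feature count listed for Method 1, although that correspondence is an inference.

```python
import math

BASES = "ACGU"  # assumed alphabet; the excerpt does not specify it


def one_hot(guide):
    """Encode a guide strand as len(guide) * 4 binary features.

    Features are ordered position by position, and within a position
    in the order of BASES.
    """
    return [1.0 if base == b else 0.0 for base in guide for b in BASES]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def position_base_correlations(guides, activities):
    """Correlate every (position, base) indicator with measured activity."""
    rows = [one_hot(g) for g in guides]
    n_features = len(rows[0])
    return [pearson([row[j] for row in rows], activities)
            for j in range(n_features)]
```

The value at index `4 * (position - 1) + BASES.index(base)` of the returned list corresponds to one bar of a plot like Figure 1.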
Figure 2. Guide strand secondary structure, thermodynamics, entropy and target secondary structure position-specific correlation coefficients with RNA interference activities. Thermodynamics and entropy correlation measures use a 2-nucleotide sliding window. Secondary structure values can be interpreted as the relative contribution to RNAi activity of the position being within a secondary structure. Position 1 is the 5'-most position of the guide strand.
Figure 3. Target secondary structure position-specific correlation coefficients with RNA interference activities. Individual positions within an intramolecular base pairing tend to have an overall negative correlation with RNAi activity.
Figure 4. Target secondary structure position-specific correlation coefficients, with directionality of base pairing, with RNA interference activities. Correlations for a site pairing with a more 5' site are in yellow; correlations for a site pairing with a more 3' site are in blue.
Guide strand position specific base composition (Method 1) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 84(84) | 0.784 | 0.016 | 0.510 | 0.095 | 0.711 | 0.026 |
| 1 | 65(64.2) | 0.774 | 0.016 | 0.500 | 0.098 | | 0.025 |
| 2 | 45(43.9) | 0.742 | 0.018 | 0.494 | 0.100 | 0.687 | 0.027 |
| 3 | 32(30.3) | 0.706 | 0.020 | 0.478 | 0.103 | 0.675 | 0.027 |
| 4 | 20(18.9) | 0.658 | 0.023 | 0.460 | 0.103 | 0.658 | 0.028 |
| 5 | 15(14.7) | 0.627 | 0.024 | 0.437 | 0.101 | 0.648 | 0.029 |
| 6 | 11(8.3) | 0.599 | 0.025 | 0.428 | 0.105 | 0.587 | 0.031 |
| 7 | 5(5) | 0.489 | 0.030 | 0.340 | 0.104 | 0.532 | 0.034 |
| 8 | 4(3.7) | 0.473 | 0.031 | 0.337 | 0.104 | 0.504 | 0.035 |
| 9 | 2(2) | 0.407 | 0.033 | 0.256 | 0.108 | 0.454 | 0.037 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
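The table notes describe removing features by increasing the stringency of a t-test of each feature against activity. One way to sketch such a filter is shown below. It assumes the test is the usual t statistic for a Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)), and that stringency is raised by raising a threshold on |t|; the exact test and cut-offs used by the authors are not given in this excerpt, and `t_threshold` is a hypothetical parameter.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def filter_features(columns, activities, t_threshold):
    """Return indices of feature columns whose |t| reaches the threshold.

    Raising t_threshold (a hypothetical knob) keeps fewer features,
    mimicking the increasing stringency levels in the tables.
    """
    n = len(activities)
    kept = []
    for j, column in enumerate(columns):
        t = t_statistic(pearson(column, activities), n)
        if abs(t) >= t_threshold:
            kept.append(j)
    return kept
```

With a threshold of 0 every feature is retained, which corresponds to the unfiltered first row of each table; higher thresholds shrink the feature set until none remain.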
Feature mapping methods performance in RBF-epsilon regression SVM model training and testing within dataset2431
| | | train2431 | | train2431 | |
| | | test2431 | | test2431 10 × cross validation | |
| Feature mapping method | FN2431 | R | MSE | R | MSE |
| 1-Position specific base | 84 | 0.784 | 0.016 | 0.711 | 0.026 |
| 2-Thermodynamics | 23 | 0.915 | 0.007 | 0.640 | 0.029 |
| 3-Entropy | 23 | 0.730 | 0.021 | 0.094 | 0.046 |
| 4-Guide strand structure | 24 | 0.430 | 0.033 | 0.293 | 0.041 |
| 5-Guide strand features | 32 | 0.266 | 0.037 | 0.243 | 0.042 |
| 6-Guide strand 2-grams | 16 | 0.408 | 0.033 | 0.291 | 0.041 |
| 7-Guide strand 3-grams | 64 | 0.656 | 0.024 | 0.435 | 0.037 |
| 8-Guide strand 4-grams | 256 | 0.590 | 0.027 | 0.532 | 0.034 |
| 9-Guide strand 5-grams | 1024* | 0.590 | 0.029 | 0.487 | 0.036 |
| 10-Guide strand 6-grams | 4096* | 0.621 | 0.036 | 0.439 | 0.036 |
| 11-Guide strand N-grams (2- to 5-grams) | 1360* | 0.614 | 0.026 | 0.559 | 0.033 |
| 12-Target strand structure-nondirectional | 22 | 0.646 | 0.024 | 0.257 | 0.045 |
| 13-Target strand structure-directional | 43 | 0.607 | 0.025 | 0.277 | 0.042 |
| 14-Target imprecise thermo | 22 | 0.932 | 0.007 | 0.272 | 0.045 |
FNdataset = feature number count from dataset; R = Pearson correlation coefficient; MSE = mean squared error.
* Theoretical count; five 5-grams and several 6-grams are absent from the present dataset, reducing the effective feature set size.
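The theoretical feature counts in this table follow from the number of possible n-grams over a four-letter alphabet: 4² = 16, 4³ = 64, 4⁴ = 256, 4⁵ = 1024 and 4⁶ = 4096, with 16 + 64 + 256 + 1024 = 1360. The sketch below enumerates such a vocabulary and counts n-gram occurrences in a sequence. Reading the 1360-feature method as the union of 2- to 5-grams is an inference from that sum, and using occurrence counts rather than presence flags is an assumption.

```python
from itertools import product

BASES = "ACGU"  # assumed alphabet


def ngram_vocabulary(n_min=2, n_max=5):
    """All possible n-grams over BASES for n_min <= n <= n_max."""
    return ["".join(p)
            for n in range(n_min, n_max + 1)
            for p in product(BASES, repeat=n)]


def ngram_counts(seq, vocabulary):
    """Count how often each vocabulary n-gram occurs in seq (overlaps allowed)."""
    index = {gram: i for i, gram in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for n in sorted({len(gram) for gram in vocabulary}):
        for start in range(len(seq) - n + 1):
            j = index.get(seq[start:start + n])
            if j is not None:
                counts[j] += 1
    return counts
```

An n-gram that never occurs in any sequence of a dataset produces an all-zero column, which is why the effective feature set can be smaller than the theoretical one.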
Feature mapping method performance in RBF-epsilon regression SVM modeling, alternately training and testing between dataset2431 and dataset579, and 10 × cross validation within dataset2431 or dataset579
| | | train2431 | | | | train579 | | | |
| | | test579 | | test2431 10 × cross validation | | test2431 | | test579 10 × cross validation | |
| Method | FN2431 | R | MSE | R | MSE | R | MSE | R | MSE |
| 1- | 84 | 0.510 | 0.095 | 0.711 | 0.026 | 0.485 | 0.054 | 0.562 | 0.079 |
| 2- | 23 | 0.379 | 0.105 | 0.640 | 0.029 | 0.367 | 0.069 | 0.500 | 0.087 |
| 3- | 23 | 0.130 | 0.115 | 0.094 | 0.046 | 0.017† | 0.138 | 0.026† | 0.118 |
| 4- | 24 | 0.202 | 0.115 | 0.293 | 0.041 | 0.214 | 0.073 | 0.214 | 0.041 |
| 5- | 32 | 0.214 | 0.112 | 0.243 | 0.042 | 0.164 | 0.046 | 0.194 | 0.107 |
| 11- | 1360 | 0.247 | 0.109 | 0.559 | 0.033 | 0.192 | 0.055 | 0.469 | 0.088 |
| 13- | 43 | 0.045† | 0.111 | 0.277 | 0.042 | 0.071 | 0.104 | 0.262 | 0.105 |
| 14- | 22 | 0.022† | 0.107 | 0.272 | 0.045 | 0.020† | 0.067 | 0.182 | 0.118 |
Method numbers are from Table 1.
FNdataset = Feature Number count from dataset
R = Pearson correlation coefficient; values marked with † cannot reject the null hypothesis H0: R = 0.
MSE = mean squared error
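The two reported metrics and the † criterion can be sketched as follows. The significance check is an assumption: it uses the t statistic for a correlation with a two-sided 5% large-sample cut-off of 1.96, and takes the sample size to be the size of the test set (579 or 2431). The authors' actual test and level are not stated in this excerpt.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient R; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted activities."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_is_significant(r, n, z_crit=1.96):
    """Large-sample check of H0: R = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)); for n in the hundreds the
    t distribution is close to standard normal, so z_crit = 1.96 is an
    assumed two-sided 5% cut-off.
    """
    if abs(r) >= 1.0:
        return True
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return abs(t) >= z_crit


print(r_is_significant(0.045, 579))   # False: t is about 1.08
print(r_is_significant(0.130, 579))   # True: t is about 3.15
print(r_is_significant(0.071, 2431))  # True: t is about 3.51
```

Under these assumptions the check agrees with the tabulated marks for the three values shown: 0.045 on the 579-sample test set carries a †, while 0.130 on 579 samples and 0.071 on 2431 samples do not.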
Guide strand position specific base composition (Method 1) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 84(84) | 0.716 | 0.048 | 0.485 | 0.054 | 0.562 | 0.079 |
| 1 | 45(45.9) | | 0.048 | 0.483 | 0.056 | 0.541 | 0.081 |
| 2 | 22(21) | 0.645 | 0.057 | 0.467 | 0.058 | 0.449 | 0.091 |
| 3 | 8(7.1) | 0.489 | 0.075 | 0.353 | 0.055 | 0.419 | 0.092 |
| 4 | 4(3.7) | 0.418 | 0.082 | 0.350 | 0.052 | 0.424 | 0.092 |
| 5 | 3(2.2) | 0.397 | 0.083 | 0.334 | 0.051 | 0.363 | 0.097 |
| 6 | 2(2) | 0.327 | 0.089 | 0.304 | 0.049 | 0.340 | 0.099 |
| 7 | 1(0.2) | 0.281 | 0.093 | 0.176 | 0.053 | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
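The fractional feature numbers in parentheses (for example 45.9) suggest that the feature filter was re-run on the training portion of every cross-validation fold, so that each fold may retain a slightly different number of features. That reading is an inference from the table notes. The sketch below shows 10-fold splitting and the resulting average feature count; `select_features` is a hypothetical callback standing in for whatever filter is used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_feature_count(n_samples, select_features, k=10, seed=0):
    """Average number of features kept when selection is redone per fold.

    select_features(train_indices) is a hypothetical callback that returns
    the features retained when filtering on the given training samples only.
    """
    kept_counts = []
    for held_out in k_fold_indices(n_samples, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(n_samples) if i not in held]
        kept_counts.append(len(select_features(train_idx)))
    return sum(kept_counts) / k
```

Selecting features inside each fold, rather than once on the full dataset, keeps the held-out samples from influencing which features the model sees.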
Guide strand thermodynamics (Method 2) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 23(23) | 0.915 | 0.007 | 0.379 | 0.105 | 0.640 | 0.029 |
| 1 | 20(20.2) | 0.912 | 0.007 | 0.379 | 0.104 | 0.642 | 0.029 |
| 2 | 19(19) | 0.911 | 0.007 | 0.363 | 0.113 | 0.641 | 0.029 |
| 3 | 17(17) | 0.906 | 0.008 | 0.383 | 0.111 | 0.642 | 0.029 |
| 4 | 17(15.8) | 0.906 | 0.008 | 0.383 | 0.111 | 0.640 | 0.029 |
| 5 | 13(10.9) | 0.880 | 0.009 | 0.366 | 0.108 | | 0.029 |
| 6 | 9(7.7) | 0.808 | 0.014 | 0.387 | 0.111 | 0.649 | 0.029 |
| 7 | 7(6.5) | 0.740 | 0.018 | 0.401 | 0.108 | 0.652 | 0.029 |
| 8 | 5(4.2) | 0.666 | 0.022 | | 0.109 | 0.597 | 0.031 |
| 9 | 4(3.9) | 0.587 | 0.026 | 0.334 | 0.111 | 0.586 | 0.032 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand thermodynamics (Method 2) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 23(23) | | 0.012 | 0.372 | 0.065 | 0.500 | 0.087 |
| 1 | 17(15.9) | 0.943 | 0.014 | 0.402 | 0.061 | 0.510 | 0.086 |
| 2 | 11(11.1) | 0.886 | 0.023 | 0.330 | 0.061 | 0.548 | 0.082 |
| 3 | 8(8.2) | 0.782 | 0.038 | 0.350 | 0.067 | | 0.081 |
| 4 | 8(7) | 0.782 | 0.038 | 0.350 | 0.067 | 0.520 | 0.084 |
| 5 | 4(4) | 0.505 | 0.073 | 0.262 | 0.042 | 0.474 | 0.089 |
| 6 | 3(2.8) | 0.502 | 0.073 | 0.460 | 0.050 | 0.409 | 0.095 |
| 7 | 2(1.4) | 0.359 | 0.087 | | 0.047 | 0.404 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.421 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand structure features (Method 4) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 24(24) | 0.430 | 0.033 | 0.202 | 0.115 | 0.293 | 0.041 |
| 1 | 21(20.2) | | 0.033 | 0.200 | 0.115 | 0.295 | 0.041 |
| 2 | 18(17.1) | 0.427 | 0.033 | 0.170 | 0.117 | 0.296 | 0.041 |
| 3 | 14(13.4) | 0.396 | 0.034 | 0.158 | 0.113 | 0.291 | 0.041 |
| 4 | 9(8.1) | 0.370 | 0.035 | 0.145 | 0.114 | 0.305 | 0.041 |
| 5 | 5(5.1) | 0.333 | 0.036 | 0.173 | 0.114 | | 0.041 |
| 6 | 4(4) | 0.327 | 0.036 | 0.167 | 0.113 | 0.305 | 0.041 |
| 7 | 4(3.7) | 0.327 | 0.036 | 0.167 | 0.113 | 0.304 | 0.041 |
| 8 | 3(2.9) | 0.288 | 0.037 | 0.195 | 0.113 | 0.298 | 0.041 |
| 9 | 2(1) | 0.270 | 0.037 | 0.200 | 0.113 | 0.262 | 0.042 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand structure features (Method 4) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 24(24) | | 0.079 | 0.214 | 0.073 | 0.214 | 0.106 |
| 1 | 16(15.4) | 0.433 | 0.080 | 0.187 | 0.081 | 0.212 | 0.106 |
| 2 | 8(7.5) | 0.319 | 0.089 | 0.230 | 0.059 | 0.251 | 0.105 |
| 3 | 5(4.5) | 0.308 | 0.089 | 0.210 | 0.059 | | 0.104 |
| 4 | 3(2.2) | 0.259 | 0.093 | | 0.056 | - | - |
| 5 | 0(0) | - | - | - | - | - | - |
| 6 | 0(0) | - | - | - | - | - | - |
| 7 | 0(0) | - | - | - | - | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand Xue features (Method 5) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 32(32) | 0.266 | 0.037 | 0.214 | 0.112 | 0.243 | 0.042 |
| 1 | 27(25.4) | 0.261 | 0.037 | 0.205 | 0.113 | 0.233 | 0.042 |
| 2 | 16(15.2) | 0.255 | 0.038 | 0.192 | 0.113 | 0.226 | 0.043 |
| 3 | 10(9.4) | 0.247 | 0.038 | 0.187 | 0.113 | 0.221 | 0.043 |
| 4 | 6(6.4) | 0.237 | 0.038 | 0.182 | 0.113 | 0.217 | 0.043 |
| 5 | 4(3.9) | 0.200 | 0.038 | 0.152 | 0.114 | 0.202 | 0.043 |
| 6 | 3(2.6) | 0.187 | 0.039 | 0.145 | 0.114 | 0.196 | 0.043 |
| 7 | 2(1.7) | 0.187 | 0.039 | 0.145 | 0.114 | - | - |
| 8 | 1(0.3) | 0.158 | 0.039 | 0.089 | 0.114 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand Xue features (Method 5) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 32(32) | 0.207 | 0.096 | 0.164 | 0.046 | 0.194 | 0.107 |
| 1 | 20(18.8) | | 0.095 | 0.167 | 0.046 | | 0.107 |
| 2 | 8(7.9) | 0.212 | 0.095 | | 0.048 | | 0.108 |
| 3 | 1(1.2) | 0.141 | 0.097 | 0.155 | 0.047 | - | - |
| 4 | 0(0) | - | - | - | - | - | - |
| 5 | 0(0) | - | - | - | - | - | - |
| 6 | 0(0) | - | - | - | - | - | - |
| 7 | 0(0) | - | - | - | - | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1360(1360) | 0.614 | 0.026 | 0.246 | 0.109 | 0.559 | 0.033 |
| 1 | 777(771.3) | 0.604 | 0.026 | | 0.069 | 0.526 | 0.034 |
| 2 | 424(394.3) | 0.590 | 0.027 | 0.576 | 0.070 | 0.471 | 0.036 |
| 3 | 174(160.7) | 0.533 | 0.029 | 0.516 | 0.075 | 0.391 | 0.039 |
| 4 | 71(59.5) | 0.490 | 0.031 | 0.477 | 0.077 | 0.343 | 0.040 |
| 5 | 27(22.5) | 0.404 | 0.033 | 0.408 | 0.082 | 0.295 | 0.041 |
| 6 | 9(7.1) | 0.319 | 0.036 | 0.311 | 0.090 | 0.294 | 0.041 |
| 7 | 5(3.7) | 0.291 | 0.037 | 0.246 | 0.094 | 0.268 | 0.042 |
| 8 | 2(1.7) | 0.228 | 0.038 | 0.187 | 0.096 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Guide strand N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 1360(1360) | 0.615 | 0.070 | 0.192 | 0.055 | 0.469 | 0.088 |
| 1 | 591(586.1) | 0.641 | 0.063 | | 0.028 | 0.421 | 0.093 |
| 2 | 195(179.3) | | 0.060 | 0.467 | 0.032 | 0.431 | 0.091 |
| 3 | 42(37) | 0.502 | 0.073 | 0.340 | 0.035 | 0.323 | 0.099 |
| 4 | 7(6.4) | 0.370 | 0.085 | 0.212 | 0.038 | 0.224 | 0.105 |
| 5 | 3(1.5) | 0.250 | 0.093 | 0.094 | 0.040 | - | - |
| 6 | 0(0) | - | - | - | - | - | - |
| 7 | 0(0) | - | - | - | - | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Target strand secondary structure (Method 13) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 43(43) | | 0.025 | 0.045 | 0.111 | 0.277 | 0.042 |
| 1 | 28(25.9) | 0.612 | 0.026 | 0.032 | 0.111 | 0.285 | 0.042 |
| 2 | 13(11.4) | 0.530 | 0.029 | 0.045 | 0.110 | 0.313 | 0.041 |
| 3 | 8(7.4) | 0.479 | 0.031 | | 0.109 | | 0.041 |
| 4 | 3(3.3) | 0.401 | 0.034 | 0.048 | 0.110 | 0.308 | 0.041 |
| 5 | 1(1.2) | 0.327 | 0.036 | 0.045 | 0.110 | 0.282 | 0.041 |
| 6 | 1(1) | 0.327 | 0.036 | 0.045 | 0.110 | 0.287 | 0.041 |
| 7 | 1(1) | 0.327 | 0.036 | 0.045 | 0.110 | 0.287 | 0.041 |
| 8 | 1(1) | 0.327 | 0.036 | 0.045 | 0.110 | 0.287 | 0.041 |
| 9 | 1(1) | 0.327 | 0.036 | 0.045 | 0.110 | 0.287 | 0.041 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Target strand secondary structure (Method 13) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 43(43) | | 0.055 | 0.070 | 0.104 | 0.262 | 0.105 |
| 1 | 21(19.7) | 0.408 | 0.082 | 0.077 | 0.059 | 0.194 | 0.107 |
| 2 | 5(5.9) | 0.282 | 0.091 | | 0.053 | 0.095 | 0.111 |
| 3 | 3(2.1) | 0.187 | 0.097 | 0.070 | 0.048 | 0.068 | 0.111 |
| 4 | 0(0.1) | - | - | - | - | - | - |
| 5 | 0(0) | - | - | - | - | - | - |
| 6 | 0(0) | - | - | - | - | - | - |
| 7 | 0(0) | - | - | - | - | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2) and N-Gram (Method 11) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1467(1467) | 0.793 | 0.015 | 0.518 | 0.098 | | 0.023 |
| 1 | 862(855.7) | 0.802 | 0.014 | 0.506 | 0.089 | 0.764 | 0.023 |
| 2 | 488(457.2) | 0.809 | 0.014 | | 0.091 | 0.750 | 0.023 |
| 3 | 223(208) | 0.817 | 0.013 | 0.526 | 0.091 | 0.728 | 0.025 |
| 4 | 108(94.2) | | 0.012 | 0.498 | 0.094 | 0.712 | 0.026 |
| 5 | 55(48.1) | | 0.012 | 0.495 | 0.096 | 0.711 | 0.026 |
| 6 | 29(23.1) | 0.817 | 0.013 | 0.503 | 0.098 | 0.696 | 0.026 |
| 7 | 17(15.2) | 0.770 | 0.016 | 0.444 | 0.099 | 0.673 | 0.027 |
| 8 | 11(9.6) | 0.702 | 0.020 | 0.452 | 0.106 | 0.616 | 0.030 |
| 9 | 6(5.9) | 0.606 | 0.025 | 0.340 | 0.109 | 0.603 | 0.031 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2) and N-Gram (Method 11) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 1467(1467) | 0.753 | 0.046 | | 0.064 | | 0.070 |
| 1 | 653(647.9) | 0.780 | 0.041 | 0.537 | 0.056 | 0.647 | 0.069 |
| 2 | 228(211.4) | | 0.038 | 0.521 | 0.058 | 0.640 | 0.071 |
| 3 | 58(52.3) | 0.778 | 0.039 | 0.491 | 0.086 | 0.613 | 0.074 |
| 4 | 19(17.1) | | 0.037 | 0.452 | 0.077 | 0.569 | 0.079 |
| 5 | 10(7.7) | 0.581 | 0.064 | 0.361 | 0.041 | 0.504 | 0.085 |
| 6 | 5(4.8) | 0.525 | 0.071 | 0.443 | 0.052 | 0.424 | 0.094 |
| 7 | 3(1.6) | 0.374 | 0.086 | 0.436 | 0.050 | 0.405 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.422 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1) and thermodynamics (Method 2) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 107(107) | 0.878 | 0.009 | 0.450 | 0.104 | 0.701 | 0.026 |
| 1 | 85(84.5) | 0.882 | 0.009 | 0.435 | 0.098 | 0.705 | 0.026 |
| 2 | 64(62.9) | 0.886 | 0.009 | 0.444 | 0.105 | 0.704 | 0.026 |
| 3 | 49(47.3) | 0.884 | 0.009 | 0.463 | 0.105 | 0.699 | 0.026 |
| 4 | 37(34.7) | | 0.009 | 0.447 | 0.105 | 0.698 | 0.026 |
| 5 | 28(25.6) | 0.866 | 0.010 | 0.453 | 0.105 | | 0.026 |
| 6 | 20(16) | 0.822 | 0.013 | | 0.106 | 0.685 | 0.027 |
| 7 | 12(11.5) | 0.757 | 0.017 | 0.417 | 0.107 | 0.672 | 0.028 |
| 8 | 9(7.9) | 0.684 | 0.021 | 0.439 | 0.107 | 0.614 | 0.030 |
| 9 | 6(5.9) | 0.606 | 0.025 | 0.340 | 0.109 | 0.603 | 0.031 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1) and thermodynamics (Method 2) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 107(107) | 0.874 | 0.026 | | 0.083 | 0.533 | 0.083 |
| 1 | 62(61.8) | | 0.023 | 0.486 | 0.083 | 0.521 | 0.085 |
| 2 | 33(32.1) | 0.856 | 0.028 | 0.466 | 0.069 | 0.537 | 0.083 |
| 3 | 16(15.3) | 0.816 | 0.034 | 0.388 | 0.087 | | 0.081 |
| 4 | 12(10.7) | 0.796 | 0.036 | 0.412 | 0.072 | 0.527 | 0.084 |
| 5 | 7(6.2) | 0.537 | 0.069 | 0.288 | 0.051 | 0.497 | 0.087 |
| 6 | 5(4.8) | 0.525 | 0.071 | 0.442 | 0.052 | 0.424 | 0.094 |
| 7 | 3(1.6) | 0.374 | 0.086 | 0.435 | 0.050 | 0.405 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.421 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining thermodynamics (Method 2) and N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1383(1383) | 0.746 | 0.018 | | 0.101 | | 0.025 |
| 1 | 797(791.5) | 0.755 | 0.017 | 0.480 | 0.093 | 0.709 | 0.026 |
| 2 | 443(413.3) | 0.763 | 0.017 | 0.480 | 0.094 | 0.688 | 0.027 |
| 3 | 191(177.7) | 0.773 | 0.016 | 0.462 | 0.097 | 0.671 | 0.028 |
| 4 | 88(75.3) | 0.812 | 0.014 | 0.448 | 0.100 | 0.656 | 0.028 |
| 5 | 40(33.4) | | 0.013 | 0.442 | 0.097 | 0.659 | 0.028 |
| 6 | 18(14.8) | 0.789 | 0.015 | 0.448 | 0.096 | 0.659 | 0.028 |
| 7 | 12(10.2) | 0.759 | 0.017 | 0.435 | 0.100 | 0.656 | 0.028 |
| 8 | 7(5.9) | 0.686 | 0.021 | 0.428 | 0.106 | 0.600 | 0.031 |
| 9 | 4(3.9) | 0.587 | 0.026 | 0.335 | 0.111 | 0.586 | 0.032 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining thermodynamics (Method 2) and N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 1383(1383) | 0.726 | 0.050 | | 0.067 | | 0.068 |
| 1 | 608(602) | 0.759 | 0.044 | 0.506 | 0.052 | 0.651 | 0.069 |
| 2 | 206(190.4) | 0.779 | 0.041 | 0.489 | 0.051 | 0.641 | 0.071 |
| 3 | 50(45.2) | 0.758 | 0.042 | 0.488 | 0.067 | 0.601 | 0.076 |
| 4 | 15(13.4) | | 0.038 | 0.427 | 0.059 | 0.551 | 0.081 |
| 5 | 7(5.5) | 0.553 | 0.068 | 0.310 | 0.041 | 0.486 | 0.088 |
| 6 | 3(2.8) | 0.503 | 0.073 | 0.460 | 0.050 | 0.409 | 0.095 |
| 7 | 2(1.4) | 0.359 | 0.087 | 0.463 | 0.047 | 0.404 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.422 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific base composition (Method 1) and N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1444(1444) | | 0.017 | | 0.096 | | 0.022 |
| 1 | 842(835.5) | 0.782 | 0.016 | 0.479 | 0.098 | 0.782 | 0.022 |
| 2 | 469(438.2) | 0.777 | 0.016 | 0.485 | 0.102 | 0.765 | 0.023 |
| 3 | 206(191) | 0.754 | 0.017 | 0.471 | 0.102 | 0.731 | 0.024 |
| 4 | 91(78.4) | 0.736 | 0.018 | 0.459 | 0.103 | 0.702 | 0.026 |
| 5 | 42(37.2) | 0.700 | 0.020 | 0.459 | 0.102 | 0.677 | 0.027 |
| 6 | 20(15.4) | 0.658 | 0.023 | 0.445 | 0.106 | 0.626 | 0.030 |
| 7 | 10(8.7) | 0.553 | 0.028 | 0.344 | 0.105 | 0.567 | 0.032 |
| 8 | 6(5.4) | 0.512 | 0.029 | 0.356 | 0.103 | 0.525 | 0.034 |
| 9 | 2(2) | 0.407 | 0.033 | 0.257 | 0.108 | 0.454 | 0.037 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific base composition (Method 1) and N-Grams (Method 11) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 1444(1444) | 0.751 | 0.054 | 0.423 | 0.051 | | 0.070 |
| 1 | 636(632) | | 0.046 | | 0.057 | 0.632 | 0.071 |
| 2 | 217(200.3) | 0.746 | 0.046 | 0.431 | 0.062 | 0.556 | 0.080 |
| 3 | 50(44.1) | 0.636 | 0.058 | 0.333 | 0.064 | 0.513 | 0.084 |
| 4 | 11(10.1) | 0.532 | 0.071 | 0.381 | 0.055 | 0.460 | 0.089 |
| 5 | 6(3.7) | 0.465 | 0.077 | 0.327 | 0.052 | 0.385 | 0.095 |
| 6 | 2(2) | 0.327 | 0.089 | 0.305 | 0.049 | 0.340 | 0.099 |
| 7 | 1(0.2) | 0.281 | 0.093 | 0.176 | 0.053 | - | - |
| 8 | 0(0) | - | - | - | - | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2), N-Gram (Method 11), guide strand structure (Method 4) and Xue features (Method 5) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1523(1523) | 0.797 | 0.015 | 0.523 | 0.099 | | 0.023 |
| 1 | 910(901.3) | 0.807 | 0.014 | 0.513 | 0.090 | | 0.023 |
| 2 | 522(489.5) | 0.817 | 0.013 | | 0.092 | 0.746 | 0.024 |
| 3 | 247(230.8) | 0.825 | 0.013 | 0.518 | 0.092 | 0.726 | 0.025 |
| 4 | 123(108.7) | 0.850 | 0.011 | 0.495 | 0.097 | 0.710 | 0.026 |
| 5 | 64(57.1) | | 0.011 | 0.504 | 0.097 | 0.709 | 0.026 |
| 6 | 36(29.7) | 0.844 | 0.012 | 0.504 | 0.099 | 0.695 | 0.026 |
| 7 | 23(20.6) | 0.816 | 0.013 | 0.449 | 0.100 | 0.675 | 0.027 |
| 8 | 15(12.8) | 0.738 | 0.018 | 0.457 | 0.105 | 0.618 | 0.030 |
| 9 | 8(6.9) | 0.661 | 0.022 | 0.334 | 0.109 | 0.606 | 0.031 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2), N-Gram (Method 11), guide strand structure (Method 4) and Xue features (Method 5) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | R | MSE | R | MSE | R | MSE |
| 0 | 1523(1523) | 0.754 | 0.046 | | 0.067 | | 0.070 |
| 1 | 689(682.1) | 0.786 | 0.040 | 0.543 | 0.055 | 0.638 | 0.071 |
| 2 | 244(226.8) | 0.798 | 0.037 | 0.516 | 0.061 | 0.639 | 0.071 |
| 3 | 64(58) | 0.801 | 0.036 | 0.506 | 0.079 | 0.612 | 0.074 |
| 4 | 22(19.3) | | 0.034 | 0.453 | 0.074 | 0.563 | 0.080 |
| 5 | 10(7.7) | 0.580 | 0.064 | 0.360 | 0.041 | 0.504 | 0.085 |
| 6 | 5(4.8) | 0.525 | 0.071 | 0.442 | 0.052 | 0.424 | 0.094 |
| 7 | 3(1.6) | 0.374 | 0.086 | 0.435 | 0.050 | 0.405 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.421 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models trained on dataset579 and testing performed with dataset579, dataset2431 and 10 × cross validation on dataset579, features removed by increasing stringency of t-test of individual feature to activity from dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2), N-Gram (Method 11), guide strand structure (Method 4), Xue features (Method 5) and target secondary structure (Method 13) for training RBF-epsilon regression SVM model
| | | train2431 | | train2431 | | train2431 | |
| | | test2431 | | test579 | | test2431 10 × cross val | |
| | FN2431 | R | MSE | R | MSE | R | MSE |
| 0 | 1566(1566) | 0.826 | 0.013 | 0.523 | 0.099 | 0.710 | 0.027 |
| 1 | 938(927.2) | 0.842 | 0.012 | 0.514 | 0.090 | 0.723 | 0.026 |
| 2 | 535(500.9) | 0.856 | 0.011 | | 0.092 | | 0.025 |
| 3 | 255(238.2) | 0.873 | 0.010 | 0.519 | 0.092 | 0.718 | 0.025 |
| 4 | 126(112) | 0.902 | 0.008 | 0.496 | 0.097 | 0.705 | 0.026 |
| 5 | 65(58.3) | 0.912 | 0.007 | 0.505 | 0.097 | 0.701 | 0.026 |
| 6 | 37(30.7) | | 0.007 | 0.505 | 0.099 | 0.687 | 0.027 |
| 7 | 24(21.6) | 0.908 | 0.007 | 0.449 | 0.100 | 0.672 | 0.028 |
| 8 | 16(13.8) | 0.884 | 0.009 | 0.457 | 0.105 | 0.615 | 0.030 |
| 9 | 9(7.9) | 0.862 | 0.011 | 0.335 | 0.109 | 0.605 | 0.031 |
Models trained on dataset2431 and testing performed with dataset2431, dataset579 and 10 × cross validation on dataset2431, features removed by increasing stringency of t-test of individual feature to activity from dataset2431.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining position specific composition (Method 1), thermodynamics (Method 2), N-Gram (Method 11), guide strand structure (Method 4) Xue features (Method 5) and target secondary structure (Method 13) for training RBF-epsilon regression SVM model
| | | train579 | | train579 | | train579 | |
| | | test579 | | test2431 | | test579 10 × cross val | |
| | FN579 | | | | | | |
| 0 | 1566(1566) | 0.791 | 0.041 | | 0.067 | 0.613 | 0.076 |
| 1 | 710(701.8) | 0.796 | 0.039 | 0.543 | 0.055 | | 0.071 |
| 2 | 249(232.7) | 0.801 | 0.037 | 0.517 | 0.061 | 0.628 | 0.072 |
| 3 | 67(60.1) | 0.802 | 0.036 | 0.507 | 0.079 | 0.606 | 0.075 |
| 4 | 22(19.4) | | 0.034 | 0.454 | 0.074 | 0.561 | 0.080 |
| 5 | 10(7.7) | 0.581 | 0.064 | 0.361 | 0.041 | 0.504 | 0.085 |
| 6 | 5(4.8) | 0.525 | 0.071 | 0.443 | 0.052 | 0.424 | 0.094 |
| 7 | 3(1.6) | 0.374 | 0.086 | 0.436 | 0.050 | 0.405 | 0.095 |
| 8 | 1(0.6) | 0.339 | 0.089 | 0.422 | 0.051 | - | - |
| 9 | 0(0) | - | - | - | - | - | - |
Models were trained on dataset579 and tested on dataset579, on dataset2431, and by 10 × cross validation on dataset579. Features were removed by increasing the stringency of a t-test of each individual feature against activity in dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums and are simply visual landmarks.
Combining and filtering features for training RBF-epsilon regression SVM model on dataset2431 and on dataset579
| train2431 | train579 | ||||||||||
| test579 | test2431 10 × cross val | test2431 | test579 10 × cross val | ||||||||
| Method(s) | FN2431 | FN579 | |||||||||
| 1 | 0 | 84(84) | 0.095 | 0.711 | 0.026 | 84(84) | 0.054 | 0.079 | |||
| 1 | 2 | 45(43.9) | 0.494 | 0.100 | 0.687 | 0.027 | 22(21) | 0.467 | 0.058 | 0.449 | 0.091 |
| 2 | 0 | 23(23) | 0.379 | 0.105 | 0.640 | 0.029 | 23(23) | 0.372 | 0.065 | 0.500 | 0.087 |
| 2 | 2 | 19(19) | 0.363 | 0.113 | 0.641 | 0.029 | 11(11.1) | 0.330 | 0.061 | 0.548 | 0.082 |
| 11 | 0 | 1360(1360) | 0.246 | 0.109 | 0.033 | 1360(1360) | 0.192 | 0.055 | 0.088 | ||
| 11 | 2 | 424(394.3) | 0.576 | 0.070 | 0.471 | 0.036 | 195(179.3) | 0.467 | 0.032 | 0.431 | 0.091 |
| 1,2 | 0 | 107(107) | 0.450 | 0.104 | 0.701 | 0.026 | 107(107) | 0.083 | 0.533 | 0.083 | |
| 1,2 | 2 | 64(62.9) | 0.444 | 0.105 | 0.704 | 0.026 | 33(32.1) | 0.466 | 0.069 | 0.537 | 0.083 |
| 2,11 | 0 | 1383(1383) | 0.101 | 0.025 | 1383(1383) | 0.067 | |||||
| 2,11 | 2 | 443(413.3) | 0.480 | 0.094 | 0.688 | 0.027 | 206(190.4) | 0.489 | 0.051 | 0.641 | 0.071 |
| 1,11 | 0 | 1444(1444) | 0.096 | 1444(1444) | 0.423 | 0.051 | 0.070 | ||||
| 1,11 | 2 | 469(438.2) | 0.485 | 0.102 | 0.765 | 0.023 | 217(200.3) | 0.431 | 0.062 | 0.556 | 0.080 |
| 1,2,11 | 0 | 1467(1467) | 0.518 | 0.098 | 0.023 | 1467(1467) | 0.064 | 0.070 | |||
| 1,2,11 | 2 | 488(457.2) | 0.750 | 0.023 | 228(211.4) | 0.521 | 0.058 | 0.640 | 0.071 | ||
| 1,2,4,5,11 | 0 | 1523(1523) | 0.523 | 0.099 | 0.023 | 1523(1523) | 0.067 | 0.070 | |||
| 1,2,4,5,11 | 2 | 522(489.5) | 0.092 | 0.746 | 0.024 | 244(226.8) | 0.516 | 0.061 | 0.639 | 0.071 | |
| 1,2,4,5,11,13 | 0 | 1566(1566) | 0.523 | 0.099 | 0.710 | 0.027 | 1566(1566) | 0.613 | 0.076 | ||
| 1,2,4,5,11,13 | 2 | 535(500.9) | 0.092 | 0.025 | 249(232.7) | 0.517 | 0.061 | 0.628 | 0.072 | ||
Method numbers are from Table 1.
Models trained on dataset2431 were tested on dataset579 and by 10 × cross validation on dataset2431. Features were removed by increasing the stringency of a t-test of each individual feature against activity in dataset2431.
Models trained on dataset579 were tested on dataset2431 and by 10 × cross validation on dataset579. Features were removed by increasing the stringency of a t-test of each individual feature against activity in dataset579.
Feature numbers in parentheses are the average number of features in cross validations.
Entries in bold are column maximums from their respective tables and are provided as visual indicators. Italicized entries are column maximums within the table and are again provided as visual indicators.
Figure 5. Correlation based Feature Selection (CFS) filtering by cross validation within dataset. Solid diamonds are the average number of features resulting from the CFS models, plotted against the left y-axis. Open squares are the average CFS model correlations (R), plotted against the right y-axis. Open triangles are the average pairwise fraction of features found in common between CFS models by cross validation, plotted against the right y-axis. Small closed circles are the mean squared errors (MSE) of the models, plotted against the right y-axis.
Figure 6. Venn diagrams representing the relationships among feature sets and subsets, and their model outcomes by cross validation within dataset. The large black circles represent the space of all 1566 possible features; formally, this is a set S with cardinality 1566. The smaller grey circles represent the feature subsets found by CFS selection, and the intersections of the grey circles represent the features that were, on average, consistently found in pairwise comparisons among the cross validations. Specifically, S_i and S_j are the grey circles; they represent the average of pairwise comparisons of feature subsets found by cross validation on dataset2431, where S_i ⊂ S and S_j ⊂ S. The average subset cardinality is represented by the diameter of the grey circle, and S_i ∩ S_j is given as the average fraction of features shared between subsets.
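The pairwise fraction of shared features described in these captions can be illustrated with a short sketch. The captions do not give the exact normalization, so the version below assumes the intersection size is divided by the mean size of the two subsets. A Jaccard index would be an equally plausible reading. The feature names are hypothetical.

```python
from itertools import combinations


def shared_fraction(a, b):
    """Fraction of features two subsets share, relative to their mean size
    (assumed normalization; the source does not specify one)."""
    a, b = set(a), set(b)
    mean_size = (len(a) + len(b)) / 2
    if mean_size == 0:
        return 0.0
    return len(a & b) / mean_size


def mean_pairwise_shared_fraction(subsets):
    """Average shared_fraction over every unordered pair of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(shared_fraction(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected in three cross-validation folds.
fold_subsets = [
    {"pos1_U", "pos2_G", "dG_1_2"},
    {"pos2_G", "dG_1_2", "pos19_C"},
    {"pos1_U", "pos2_G", "dG_1_2"},
]

print(round(mean_pairwise_shared_fraction(fold_subsets), 3))
```

A value near 1 would indicate that nearly the same features are selected in every fold, while a value near 0 would indicate that the selection is unstable across folds.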
Figure 7. Correlation based Feature Selection (CFS) filtering by cross validation within dataset. Solid diamonds are the average number of features resulting from the CFS models, plotted against the left y-axis. Open squares are the average CFS model correlations (R), plotted against the right y-axis. Open triangles are the average pairwise fraction of features found in common between CFS models by cross validation, plotted against the right y-axis. Small closed circles are the mean squared errors (MSE) of the models, plotted against the right y-axis.