Abid Qureshi, Nishant Thakur, Manoj Kumar.
Abstract
BACKGROUND: Selection of effective viral siRNA is an indispensable step in the development of siRNA-based antiviral therapeutics. Despite the immense potential of such therapeutics, a viral siRNA efficacy prediction algorithm is still not available. Moreover, the performance of existing general mammalian siRNA efficacy predictors is not satisfactory for viral siRNAs. Therefore, we have developed "VIRsiRNApred", a support vector machine (SVM) based method for predicting the efficacy of viral siRNA.
Year: 2013 PMID: 24330765 PMCID: PMC3878835 DOI: 10.1186/1479-5876-11-305
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Ten-fold cross-validation performance of predictive models on the viral siRNA training dataset of 1380 sequences (T1380#) using SVM, ANN, KNN and REP Tree machine learning techniques

| Model | siRNA features | No. of features | SVM PCC* | ANN PCC* | KNN PCC* | REP Tree PCC* |
|---|---|---|---|---|---|---|
| 1 | Mononucleotide frequency | 4 | 0.19 | 0.10 | 0.11 | 0.10 |
| 2 | Dinucleotide frequency | 16 | 0.32 | 0.29 | 0.29 | 0.29 |
| 3 | Trinucleotide frequency | 64 | 0.42 | 0.28 | 0.30 | 0.28 |
| 4 | Tetranucleotide frequency | 256 | 0.43 | 0.28 | 0.30 | 0.30 |
| 5 | Pentanucleotide frequency | 1024 | 0.46 | 0.29 | 0.30 | 0.30 |
| 6 | Binary | 76 | 0.19 | 0.10 | 0.11 | 0.11 |
| 7 | Thermodynamic features | 21 | 0.26 | 0.22 | 0.21 | 0.20 |
| 8 | Secondary structure | 28 | 0.07 | 0.04 | 0.04 | 0.04 |
| 9 | 1 + 2 + 3 + 4 + 5 | 1364 | 0.48 | 0.30 | 0.31 | 0.31 |
| | 6 + 9 | 1440 | 0.50 | 0.36 | 0.41 | 0.32 |
| | 6 + 7 + 8 + 9 | 1489 | 0.53 | 0.42 | 0.44 | 0.42 |

*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
#T1380 is the training dataset of experimental viral siRNA. Predictive models 1-8 were developed on individual siRNA features, while models 9-12 were based on hybrid siRNA features.
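
The feature counts in the table above (4, 16, 64, 256 and 1024) match the number of possible k-mers over a four-letter alphabet for k = 1 to 5, and 76 binary features would correspond to 19 positions times 4 bits. The following is a minimal Python sketch of how such composition and binary features could be computed. The normalization by window count, the 19-nt length, the RNA alphabet and all function names are assumptions made for illustration; they are not taken from the record, and thermodynamic and secondary-structure features are not covered.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; convert T to U before calling


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Return the normalized frequency of every possible k-mer (4**k values)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        window = seq[i:i + k]
        if window in counts:  # windows with unexpected letters are skipped
            counts[window] += 1
    # Assumption: frequencies are normalized by the number of windows.
    return [counts[km] / windows if windows > 0 else 0.0 for km in kmers]


def binary_pattern(seq: str) -> list[int]:
    """One-hot encode each position: 4 bits per nucleotide."""
    return [int(base == letter) for base in seq for letter in ALPHABET]


def hybrid_features(seq: str) -> list[float]:
    """Concatenate mono- to pentanucleotide frequencies with the binary pattern."""
    composition = [x for k in range(1, 6) for x in kmer_frequencies(seq, k)]
    return composition + binary_pattern(seq)


sirna = "GCAUGCAUGCAUGCAUGCA"  # hypothetical 19-nt sequence, not from the dataset
print(len(kmer_frequencies(sirna, 2)))  # 16 dinucleotide features
print(len(binary_pattern(sirna)))       # 76 binary features for 19 nt
print(len(hybrid_features(sirna)))      # 1364 + 76 = 1440
```

Under these assumptions, the concatenated vector has 1440 entries, the same size as the "6 + 9" hybrid feature set listed in the table.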
Evaluation of the performance of predictive models on the validation dataset of 345 viral siRNAs (V345#)

| Model | siRNA features | No. of features | SVM PCC* | ANN PCC* | KNN PCC* | REP Tree PCC* |
|---|---|---|---|---|---|---|
| 1 | Mononucleotide frequency | 4 | 0.16 | 0.08 | 0.09 | 0.08 |
| 2 | Dinucleotide frequency | 16 | 0.30 | 0.23 | 0.22 | 0.24 |
| 3 | Trinucleotide frequency | 64 | 0.39 | 0.25 | 0.24 | 0.26 |
| 4 | Tetranucleotide frequency | 256 | 0.40 | 0.26 | 0.27 | 0.28 |
| 5 | Pentanucleotide frequency | 1024 | 0.42 | 0.27 | 0.28 | 0.29 |
| 6 | Binary | 76 | 0.03 | 0.02 | 0.02 | 0.01 |
| 7 | Thermodynamic features | 21 | 0.19 | 0.15 | 0.18 | 0.15 |
| 8 | Secondary structure | 28 | 0.02 | 0.02 | 0.02 | 0.02 |
| 9 | 1 + 2 + 3 + 4 + 5 | 1364 | 0.48 | 0.32 | 0.34 | 0.30 |
| | 6 + 9 | 1440 | 0.48 | 0.32 | 0.34 | 0.32 |
| | 6 + 7 + 8 + 9 | 1489 | 0.45 | 0.32 | 0.33 | 0.30 |

*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
#V345 is the validation dataset of experimental viral siRNA not used in training. Predictive models 1-8 were developed on individual siRNA features, while models 9-12 were based on hybrid siRNA features.
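
The PCC reported in these tables is the standard Pearson correlation between experimental and predicted efficacy. As a rough sketch of how such a score could be computed on a held-out set, the pure-Python function below implements the textbook formula. The function name and the example efficacy values are hypothetical and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical efficacy values (percent inhibition), for illustration only.
experimental = [90.0, 75.0, 40.0, 20.0, 65.0]
predicted = [80.0, 70.0, 50.0, 35.0, 55.0]
print(round(pearson(experimental, predicted), 2))
```

A PCC is undefined for a single data point or for constant predictions, which is why the sketch raises an error in those cases instead of returning a number.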
Comparison of VIRsiRNApred with existing siRNA efficacy prediction algorithms developed using heterogeneous siRNA datasets

| S.No. | Algorithm | | Method | Dataset size | PCC* (training)¹ | PCC* (validation)² | PCC* on V345# |
|---|---|---|---|---|---|---|---|
| 1 | [ | NA | GPBoost, SVM | 581 | 0.46 | 0.40 | Server not available |
| 2 | [ | NA | ANN | 653 | 0.55 | 0.50 | Server not available |
| 3 | [ | | linear | 653 | 0.48 | 0.44 | Server not working |
| 4 | [ | NA | linear | 526 | 0.55 | 0.52 | Server not available |
| 5 | [ | | linear | 419 | 0.51 | 0.44 | Server not working |
| 6 | [ | | SVM | 581 | 0.56 | 0.47 | 0.10 |
| 7 | VIRsiRNApred | | SVM | 1380 | | | |

*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
¹Performance on the n-fold training dataset of the study.
²Performance on the validation dataset of the study.
#V345 is the validation dataset of experimental viral siRNA. Algorithms S.No. 1-6 used mammalian heterogeneous siRNA datasets, while S.No. 7 used the experimental viral siRNA dataset.
Comparison of VIRsiRNApred with existing siRNA efficacy prediction methods developed using mammalian homogeneous siRNA datasets

| S.No. | Algorithm | Method | | Dataset size | PCC* (training)¹ | PCC* (test)² | PCC* on V345# |
|---|---|---|---|---|---|---|---|
| 1 | [ | ANN | | 2431 | 0.66 | 0.60 | Server not available |
| 2 | [ | Linear | | 2431 | 0.67 | 0.57 | Server not working |
| 3 | [ | Rule, SVM, RFR | | 3589 | 0.85 | 0.59 | 0.12 |
| 4 | [ | Linear | | 2431 | 0.72 | NA | 0.05 |
| 5 | [ | SVM | NA | 2431, | 0.78 | 0.71 | Server not available |
| 6 | [ | Linear | | 702 | 0.77 | 0.60 | 0.18 |
| 7 | [ | SVM | | 2280 | 0.68 | 0.66 | 0.10 |
| 8 | [ | SVM | | 2431 | 0.77 | 0.53 | 0.09 |
| 9 | [ | Linear | | 2182 | 0.67 | NA | Server not working |
| 10 | [ | SVM | NA | 2431 | 0.80 | 0.71 | Server not available |

*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
¹Performance on the n-fold training dataset of the study.
²Performance on the test dataset of the study.
#V345 is the validation dataset of experimental viral siRNA.
Performance of the SVM models using the leave-one-out cross-validation (LOOCV) method

| Model | siRNA features | No. of features | PCC* | PCC* |
|---|---|---|---|---|
| 1 | Mononucleotide frequency | 4 | 0.32 | 0.29 |
| 2 | Dinucleotide frequency | 16 | 0.36 | 0.32 |
| 3 | Trinucleotide frequency | 64 | 0.45 | 0.41 |
| 4 | Tetranucleotide frequency | 256 | 0.48 | 0.44 |
| 5 | Pentanucleotide frequency | 1024 | 0.52 | 0.48 |
| 6 | Binary | 76 | 0.26 | 0.14 |
| 7 | Thermodynamic features | 21 | 0.29 | 0.24 |
| 8 | Secondary structure | 28 | 0.10 | 0.06 |
| 9 | 1 + 2 + 3 + 4 + 5 | 1364 | 0.52 | 0.49 |
| | 6 + 9 | 1440 | 0.54 | 0.51 |
| | 6 + 7 + 8 + 9 | 1489 | 0.58 | 0.54 |

*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
#T1380 is the training dataset of experimental viral siRNA. Predictive models 1-8 were developed on individual siRNA features, while models 9-12 were based on hybrid siRNA features.
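
Ten-fold cross-validation and LOOCV differ only in how many samples are held out per round: LOOCV is the case where the number of folds equals the number of samples. The sketch below generates index splits for both schemes on a dataset of 1380 items. It assumes contiguous folds over data that has already been shuffled; how the study actually partitioned its data is not stated here, and the function names are illustrative.

```python
def k_fold_splits(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def loocv_splits(n_samples: int):
    """Leave-one-out is the special case k == n_samples."""
    return k_fold_splits(n_samples, n_samples)


# Ten-fold split of a 1380-item training set: each fold holds out 138 items.
ten_fold = list(k_fold_splits(1380, 10))
print([len(test) for _, test in ten_fold])

# LOOCV on the same set: every round trains on 1379 items and tests on 1.
train, test = next(loocv_splits(1380))
print(len(train), len(test))
```

In a typical setup, the held-out predictions from all rounds are pooled and a single PCC is computed over them; that pooling step is an assumption about the evaluation and is not described in this record.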
Performance of the SVM model for each virus in the 1725 viral siRNA dataset using the leave-one-virus-out cross-validation (LOVOCV) method

| S.No. | Virus | Training siRNAs | Left-out siRNAs | PCC* | PCC* |
|---|---|---|---|---|---|
| 1 | Influenza A Virus | 1473 | 252 | 0.48 | 0.46 |
| 2 | Human Papillomavirus | 1513 | 212 | 0.43 | 0.40 |
| 3 | John Cunningham Virus | 1517 | 208 | 0.43 | 0.41 |
| 4 | Respiratory Syncytial Virus | 1577 | 148 | 0.45 | 0.41 |
| 5 | Human Immunodeficiency Virus | 1590 | 135 | 0.46 | 0.42 |
| 6 | Metapneumovirus | 1610 | 115 | 0.48 | 0.43 |
| 7 | Hepatitis B Virus | 1638 | 87 | 0.51 | 0.45 |
| 8 | Hepatitis C Virus | 1645 | 80 | 0.51 | 0.44 |
| 9 | Ebola Zaire Virus | 1652 | 73 | 0.49 | 0.43 |
| 10 | Human Coxsackievirus | 1653 | 72 | 0.50 | 0.47 |
| 11 | West Nile Virus | 1685 | 40 | 0.51 | 0.47 |
| 12 | Bovine Papillomavirus | 1689 | 36 | 0.52 | 0.48 |
| 13 | Influenza B Virus | 1689 | 36 | 0.52 | 0.46 |
| 14 | SARS Coronavirus | 1691 | 34 | 0.53 | 0.48 |
| 15 | Herpes Simplex Virus | 1704 | 21 | 0.54 | 0.48 |
| 16 | Human Rhinovirus | 1704 | 21 | 0.54 | 0.46 |
| 17 | Orthopoxvirus | 1705 | 20 | 0.55 | 0.49 |
| 18 | Measles Virus | 1709 | 16 | 0.56 | 0.51 |
| 19 | Hepatitis Delta Virus | 1710 | 15 | 0.56 | 0.51 |
| 20 | Reovirus | 1712 | 13 | 0.56 | 0.51 |
| 21 | African Swine Fever Virus | 1714 | 11 | 0.55 | 0.49 |
| 22 | Dengue Virus | 1714 | 11 | 0.56 | 0.49 |
| 23 | Hazara Nairovirus | 1714 | 11 | 0.56 | 0.49 |
| 24 | Enterovirus | 1717 | 8 | 0.56 | 0.50 |
| 25 | Epstein-Barr Virus | 1719 | 6 | 0.56 | 0.52 |
| 26 | Hepatitis A Virus | 1719 | 6 | 0.56 | 0.51 |
| 27 | Human Metapneumovirus | 1719 | 6 | 0.58 | 0.51 |
| 28 | Hepatitis E Virus | 1720 | 5 | 0.58 | 0.53 |
| 29 | Japanese Encephalitis Virus | 1720 | 5 | 0.58 | 0.51 |
| 30 | St. Louis Encephalitis | 1720 | 5 | 0.58 | 0.53 |
| 31 | Junin Virus | 1721 | 4 | 0.58 | 0.52 |
| 32 | Yellow Fever Virus | 1721 | 4 | 0.58 | 0.52 |
| 33 | Lassa Virus | 1722 | 3 | 0.58 | 0.52 |
| 34 | Rotavirus | 1723 | 2 | 0.58 | 0.53 |
| 35 | Sendai Virus | 1723 | 2 | 0.58 | 0.51 |
| 36 | Marburg Virus | 1724 | 1 | 0.58 | 0.52 |
| 37 | Polio Virus | 1724 | 1 | 0.58 | 0.53 |
*Pearson Correlation Coefficient (PCC) is the correlation between experimental and predicted viral siRNA efficacy.
#During 10-fold cross-validation training, the best-performing viral siRNA feature set was used, combining composition (mono- to pentanucleotide frequency), binary and thermodynamic features (6 + 7 + 9).
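
In leave-one-virus-out cross-validation, all siRNAs targeting one virus are held out together, so the training and left-out counts in each row of the table sum to 1725. The sketch below shows one way such grouped splits could be generated in Python. The function name and the toy labels are hypothetical; the real dataset has 1725 siRNAs across the 37 viruses listed above.

```python
def leave_one_virus_out(virus_labels):
    """Yield (virus, train_indices, test_indices), holding out one virus at a time."""
    for virus in sorted(set(virus_labels)):
        test = [i for i, v in enumerate(virus_labels) if v == virus]
        train = [i for i, v in enumerate(virus_labels) if v != virus]
        yield virus, train, test


# Toy labels for illustration only: one label per siRNA.
labels = (
    ["Influenza A Virus"] * 3
    + ["Hepatitis B Virus"] * 2
    + ["Marburg Virus"]
)

for virus, train, test in leave_one_virus_out(labels):
    # Training and left-out sizes always add up to the full dataset size.
    print(virus, len(train), len(test))
```

Grouping by virus keeps siRNAs against the same virus from appearing on both sides of a split, which makes the held-out score a closer proxy for performance on a virus the model has not seen.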
Figure 1. Workflow of the VIRsiRNApred model development.
Figure 2. Web server and its functionality: (top) submit page; (bottom) result output.