Gandharva Nagpal, Sudheer Gupta, Kumardeep Chaudhary, Sandeep Kumar Dhanda, Satya Prakash, Gajendra P S Raghava.
Abstract
Immunomodulatory oligodeoxynucleotides (IMODNs) are short DNA sequences that activate the innate immune system via toll-like receptor 9. These sequences predominantly contain unmethylated CpG motifs. In this work, we describe VaccineDA (Vaccine DNA adjuvants), a web-based resource developed to design IMODN-based vaccine adjuvants. We collected and analyzed 2193 experimentally validated IMODNs obtained from the literature. Certain nucleotides and short nucleotide patterns (e.g., T, GT, TC, TT, CGT, TCG, TTT) are dominant in IMODNs. Based on these observations, we developed support vector machine-based models to predict IMODNs using various compositional features. The developed models achieved a maximum Matthews Correlation Coefficient (MCC) of 0.75 with an accuracy of 87.57% using the pentanucleotide composition. The integration of motif information further improved the MCC of our model from 0.75 to 0.77. Similarly, models developed to predict palindromic IMODNs attained a maximum MCC of 0.84 with an accuracy of 91.94%. These models were evaluated using five-fold cross-validation and validated on an independent dataset. The models developed in this study were integrated into VaccineDA to provide a wide range of services that facilitate the design of DNA-based vaccine adjuvants (http://crdd.osdd.net/raghava/vaccineda/).
Year: 2015 PMID: 26212482 PMCID: PMC4515643 DOI: 10.1038/srep12478
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Bar graph showing average percent nucleotide composition of IMODNs, non-IMODNs, bacterial genomes and the human genome.
Figure 2. Dinucleotide composition of IMODNs and non-IMODNs represented by a Circos plot.
The width of the ribbons shows average percent composition of the dinucleotides in IMODNs and non-IMODNs.
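The percent compositions plotted in the figures, and the compositional features used by the models, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `kmer_composition`, the use of overlapping windows, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> dict[str, float]:
    """Percent composition of every possible DNA k-mer in `seq`.

    Assumptions (not stated in the source): windows overlap, and any
    window containing a character other than A, C, G, T is ignored.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, so every sequence yields a vector of equal length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    valid = sum(counts.values())
    if valid == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: 100.0 * c / valid for kmer, c in counts.items()}
```

With `k=1` this gives a 4-dimensional mononucleotide vector, with `k=2` a 16-dimensional dinucleotide vector, and with `k=5` a 1024-dimensional pentanucleotide vector.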
The performance of models developed on the training datasets IMODN2193_train and IMODN966P_train using various compositional features.
| Feature | IMODN2193_train | | | | | | IMODN966P_train | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Thres | Sen | Spec | Acc | MCC | AUC | Thres | Sen | Spec | Acc | MCC | AUC |
| MNC | 0.2 | 76.17 | 75.83 | 76.00 | 0.52 | 0.82 | 0.1 | 78.39 | 78.26 | 78.32 | 0.57 | 0.86 |
| DNC | −0.1 | 83.64 | 84.83 | 84.24 | 0.68 | 0.92 | −0.1 | 84.53 | 83.63 | 84.08 | 0.68 | 0.92 |
| TNC | −0.1 | 85.12 | 85.63 | 85.38 | 0.71 | 0.93 | −0.1 | 87.08 | 87.08 | 87.08 | 0.74 | 0.95 |
| TetNC | −0.1 | 85.52 | 86.49 | 86.00 | 0.72 | 0.94 | −0.1 | 89.64 | 89.90 | 89.77 | 0.80 | 0.96 |
| PNC | −0.2 | 87.51 | 87.63 | 87.57 | 0.75 | 0.94 | −0.2 | 90.15 | 90.54 | 90.35 | 0.81 | 0.97 |
Thres: Threshold; Sen: Sensitivity (%); Spec: Specificity (%); Acc: Accuracy (%); MCC: Matthews Correlation Coefficient; AUC: Area Under the Curve; MNC: Mononucleotide Composition; DNC: Dinucleotide Composition; TNC: Trinucleotide Composition; TetNC: Tetranucleotide Composition; PNC: Pentanucleotide Composition.
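To make the threshold-dependent columns concrete, the following Python sketch computes sensitivity, specificity, accuracy and MCC from classifier scores at a chosen threshold, using the standard confusion-matrix definitions. The function name `binary_metrics` and the rule that a score at or above the threshold counts as a positive prediction are assumptions; the source does not state how scores were thresholded.

```python
import math


def binary_metrics(scores, labels, threshold):
    """Sen, Spec, Acc (in %) and MCC for scores cut at `threshold`.

    Assumption: score >= threshold is predicted positive (label 1).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    sen = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    spec = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    acc = 100.0 * (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spec": spec, "Acc": acc, "MCC": mcc}
```

Sweeping `threshold` over a range of values and keeping the one where sensitivity and specificity are closest is one plausible way to arrive at per-feature thresholds like those tabulated, though the source does not describe the selection procedure.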
The performance of models developed using hybrid features on the training datasets IMODN2193_train and IMODN966P_train.
| Feature | Motifs | Thres | IMODN2193_train | | | | | IMODN966P_train | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Sen | Spec | Acc | MCC | AUC | Sen | Spec | Acc | MCC | AUC |
| Exclusive | Top10 | −0.2 | 87.8 | 88.65 | 88.23 | 0.76 | 0.95 | 91.18 | 91.05 | 91.11 | 0.82 | 0.97 |
| Exclusive | Top20 | −0.2 | 88.03 | 89.17 | 88.6 | 0.77 | 0.96 | 91.18 | 92.71 | 91.94 | 0.84 | 0.98 |
| Relax_200 | Top10 | −0.2 | 85.69 | 86.83 | 86.26 | 0.73 | 0.93 | 90.15 | 91.05 | 90.6 | 0.81 | 0.98 |
| Relax_200 | Top20 | −0.2 | 86.09 | 86.83 | 86.46 | 0.73 | 0.93 | 90.41 | 91.05 | 90.73 | 0.81 | 0.97 |
| Relax_300 | Top10 | −0.2 | 85.35 | 86.32 | 85.83 | 0.72 | 0.93 | 89.90 | 91.43 | 90.66 | 0.81 | 0.97 |
| Relax_300 | Top20 | −0.2 | 85.92 | 86.89 | 86.4 | 0.73 | 0.93 | 90.15 | 91.05 | 90.6 | 0.81 | 0.97 |
| Relax_400 | Top10 | −0.2 | 87.23 | 86.32 | 86.77 | 0.74 | 0.93 | 90.03 | 90.92 | 90.47 | 0.81 | 0.97 |
| Relax_400 | Top20 | −0.2 | 87.17 | 87.00 | 87.09 | 0.74 | 0.94 | 90.15 | 90.66 | 90.41 | 0.81 | 0.97 |
Exclusive: MERCI motifs found exclusively in positive sequences; Top10: top 10 MERCI motifs in the category; Relax_200: MERCI motifs found in positive sequences and in up to 200 sequences of the negative dataset.
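The Exclusive and Relax categories defined above amount to a cap on how many negative sequences may contain a motif. The Python sketch below illustrates that filter; it is not MERCI itself. It assumes motifs are plain ungapped substrings and that "top N" means ranking by the number of positive sequences covered, neither of which is stated in the source. The name `select_motifs` and its parameters are hypothetical.

```python
def select_motifs(motifs, positives, negatives, max_negative_hits=0, top_n=10):
    """Return up to `top_n` (motif, positive_hits, negative_hits) tuples.

    max_negative_hits=0 mimics the "Exclusive" category; 200 would mimic
    "Relax_200". Assumptions: motifs are exact substrings, and ranking is
    by positive coverage (ties broken alphabetically).
    """
    kept = []
    for motif in motifs:
        pos_hits = sum(motif in seq for seq in positives)
        neg_hits = sum(motif in seq for seq in negatives)
        # Keep motifs seen in positives and tolerated in the negatives.
        if pos_hits > 0 and neg_hits <= max_negative_hits:
            kept.append((motif, pos_hits, neg_hits))
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:top_n]
```

One way to build a hybrid feature vector, again as an assumption, would be to append one presence flag per selected motif to the composition vector of each sequence.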
Figure 3. A representative scheme of datasets and their use in model development.