Kumardeep Chaudhary, Gandharva Nagpal, Sandeep Kumar Dhanda, Gajendra P S Raghava.
Abstract
Our innate immune system recognizes foreign RNA sequences from a pathogen and activates an immune response to eliminate the pathogen from the body. This immunomodulatory potential of RNA can be used to design RNA-based immunotherapy and vaccine adjuvants. In the case of siRNA-based therapy, however, the immunomodulatory effect of an RNA sequence is unwanted because it may cause immunotoxicity. We therefore developed a method for designing single-stranded RNA (ssRNA) sequences with a desired immunomodulatory potential, for use in RNA-based therapeutics, immunotherapy and vaccine adjuvants. The dataset used for training and testing our models consists of 602 experimentally verified immunomodulatory oligoribonucleotides (IMORNs), which are ssRNA sequences 17 to 27 nucleotides long, and 520 circulating miRNAs used as non-immunomodulatory sequences. We developed prediction models using various features, including composition-based features, binary profiles, selected features, and hybrid features. All models were evaluated using five-fold cross-validation and external validation, achieving a maximum mean Matthews Correlation Coefficient (MCC) of 0.86 with 93% accuracy. We identified motifs using the MERCI software and observed an abundance of adenine (A) in the motifs. Based on this study, we developed a web server, imRNA, comprising various modules important for designing RNA-based therapeutics (http://crdd.osdd.net/raghava/imrna/).
Year: 2016 PMID: 26861761 PMCID: PMC4748260 DOI: 10.1038/srep20678
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A diagrammatic representation of major mammalian pattern recognition receptors (PRRs) with their localization within the cell.
Figure 2. Comparison of mononucleotide composition between immunomodulatory and non-immunomodulatory RNAs.
Figure 3. Comparison of dinucleotide composition between immunomodulatory and non-immunomodulatory RNAs.
Figure 4. Comparison of trinucleotide composition between immunomodulatory and non-immunomodulatory RNAs.
Distribution of the positive and negative motifs with their coverage and maximum number of gaps allowed.
| Positive motifs | Maximum gap length | Coverage | Negative motifs | Maximum gap length | Coverage |
|---|---|---|---|---|---|
| AAA-AA-AA-A | 5 | 270 | C-G-G-G-GG-G | 5 | 81 |
| AA-AAA-AA-AA | 5 | 268 | G-C-G-GGG | 5 | 66 |
| A-AAAA-AA-A | 5 | 268 | C-G-G-GGG | 5 | 66 |
| AAAAAA | 0 | 267 | C-C-C-GGG | 5 | 62 |
| AAA-G-A-AAA-A-G | 5 | 266 | G-C-GGG | 2 | 61 |
| AA-AAA-A-G-AA | 2 | 194 | C-G-GG-G | 1 | 59 |
| A-A-G-C-AAA | 1 | 141 | A-GC-G-GG-G | 5 | 59 |
| AA-G-C-A-AA | 1 | 139 | C-G-GGG | 2 | 55 |
| A-AA-G-G-A-A-A | 1 | 103 | G-C-G-GG | 1 | 54 |
| G-AAAA-A-A | 1 | 98 | AG-GC-G-AG | 5 | 53 |
The motifs were extracted using the MERCI program.
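To make the motif notation in the table above concrete, here is a minimal sketch of how a gapped motif could be matched against a sequence. It is not the MERCI implementation. It assumes that each `-` in a motif stands for a gap of between one and "maximum gap length" arbitrary nucleotides, and that coverage is the number of sequences containing the motif at least once. The function names `motif_to_regex` and `coverage` are hypothetical.

```python
import re


def motif_to_regex(motif, max_gap):
    """Translate a gapped motif such as 'G-C-GGG' into a compiled regex.

    Assumption (not taken from MERCI itself): each '-' is a gap of
    1..max_gap arbitrary nucleotides; a motif without '-' is a plain substring.
    """
    blocks = [re.escape(block) for block in motif.split("-")]
    gap = "[ACGU]{1,%d}" % max_gap if max_gap > 0 else ""
    return re.compile(gap.join(blocks))


def coverage(motif, max_gap, sequences):
    """Count how many sequences contain the motif at least once."""
    pattern = motif_to_regex(motif, max_gap)
    return sum(1 for seq in sequences if pattern.search(seq.upper()))


# Toy sequences, invented for illustration only.
toy = ["GACUGGG", "GCGGG", "GAAACGGG"]
print(coverage("G-C-GGG", 2, toy))
```

Under these assumptions an ungapped motif such as `AAAAAA` (maximum gap length 0) reduces to a simple substring search.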
Figure 5. Two-sample logo of the 17 positions at the 5′ and 3′ termini of the immunomodulatory (enriched) and non-immunomodulatory (depleted) RNA sequences.
Average performance, obtained from internal five-fold cross-validation, of the SVM models developed for predicting the immunomodulatory potential of an RNA sequence using different types of features.
| Feature type | Vector size | Thres | Sen (%) | Spec (%) | Acc (%) | MCC |
|---|---|---|---|---|---|---|
| MNC | 4 | 0 | 78.01 ± 1.87 | 77.98 ± 1.09 | 78.29 ± 1.45 | 0.57 ± 0.03 |
| DNC | 16 | 0 | 88.84 ± 1.06 | 89.73 ± 1.11 | 89.25 ± 0.84 | 0.78 ± 0.02 |
| TNC | 64 | 0 | 91.83 ± 0.81 | 91.71 ± 1.09 | 91.83 ± 0.83 | 0.84 ± 0.02 |
| TetNC | 256 | 0 | 91.85 ± 0.51 | 90.70 ± 1.31 | 91.32 ± 0.71 | 0.83 ± 0.02 |
| PNC | 1024 | 0 | 93.96 ± 0.67 | 91.88 ± 1.41 | 93.00 ± 0.83 | 0.86 ± 0.02 |
| 5′ bin | 68 | 0 | 89.00 ± 1.06 | 88.87 ± 0.79 | 88.94 ± 0.69 | 0.78 ± 0.01 |
| 3′ bin | 68 | 0 | 89.48 ± 1.09 | 88.39 ± 1.13 | 88.82 ± 1.24 | 0.78 ± 0.02 |
| 5′-3′ bin | 136 | 0 | 91.70 ± 0.60 | 88.99 ± 1.52 | 90.44 ± 0.89 | 0.81 ± 0.02 |
| hybrid | 200 | 0 | 91.20 ± 0.84 | 92.48 ± 0.85 | 91.79 ± 0.70 | 0.84 ± 0.01 |
Thres: Threshold, Sen: Sensitivity, Spec: Specificity, Acc: Accuracy, MCC: Matthews Correlation Coefficient, MNC: Mononucleotide Composition, DNC: Dinucleotide Composition, TNC: Trinucleotide Composition, TetNC: Tetranucleotide Composition, PNC: Pentanucleotide Composition, 5′ bin: Binary Profile 5′, 3′ bin: Binary Profile 3′, 5′-3′ bin: Binary Profile 5′-3′, hybrid: Binary Profile 5′-3′ + Trinucleotide Composition.
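The vector sizes in the table follow from the feature definitions: a k-nucleotide composition has 4^k entries (4, 16, 64, 256, 1024), a binary profile over 17 terminal positions has 17 × 4 = 68 entries, and the hybrid vector has 136 + 64 = 200. The sketch below illustrates one way such features could be computed. It is not the authors' code: the percentage normalization, the A/C/G/U one-hot ordering, the concatenation order, and the function names are all assumptions.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_composition(seq, k):
    """Percent composition of every possible k-mer: a vector of 4**k values."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases are skipped
            counts[window] += 1
    return [100.0 * counts[kmer] / total for kmer in kmers]


def binary_profile(seq, n=17, end="5"):
    """One-hot encode the first (5') or last (3') n nucleotides: 4*n values."""
    seq = seq.upper().replace("T", "U")
    if len(seq) < n:
        raise ValueError("sequence shorter than the profile window")
    window = seq[:n] if end == "5" else seq[-n:]
    vector = []
    for base in window:
        vector.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vector


def hybrid_features(seq):
    """5' profile + 3' profile + trinucleotide composition (assumed order)."""
    return (binary_profile(seq, end="5")
            + binary_profile(seq, end="3")
            + kmer_composition(seq, 3))


example = "UAGCUUAUCAGACUGAUGUUGA"  # a 22-nt RNA used only for illustration
print(len(kmer_composition(example, 5)), len(hybrid_features(example)))
```

The window length of 17 matches the shortest sequence length in the dataset, so every sequence yields a complete profile at both termini.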
Average performance of the SVM models developed using different types of features, obtained from external validation on independent datasets.
| Feature type | Vector size | Thres | Sen (%) | Spec (%) | Acc (%) | MCC |
|---|---|---|---|---|---|---|
| MNC | 4 | 0 | 78.42 ± 5.26 | 79.71 ± 3.38 | 79.02 ± 2.44 | 0.58 ± 0.04 |
| DNC | 16 | 0 | 90.50 ± 3.60 | 89.42 ± 2.03 | 90.00 ± 2.43 | 0.80 ± 0.05 |
| TNC | 64 | 0 | 92.58 ± 2.27 | 88.65 ± 2.02 | 90.80 ± 0.82 | 0.82 ± 0.02 |
| TetNC | 256 | 0 | 92.83 ± 2.52 | 90.58 ± 2.47 | 91.79 ± 1.56 | 0.84 ± 0.03 |
| PNC | 1024 | 0 | 95.00 ± 2.39 | 91.73 ± 2.09 | 93.48 ± 1.49 | 0.87 ± 0.03 |
| 5′ bin | 68 | 0 | 89.83 ± 2.88 | 88.85 ± 2.91 | 89.38 ± 1.83 | 0.79 ± 0.04 |
| 3′ bin | 68 | 0 | 91.25 ± 3.00 | 85.96 ± 3.15 | 88.79 ± 1.57 | 0.78 ± 0.03 |
| 5′-3′ bin | 136 | 0 | 93.92 ± 3.47 | 90.67 ± 3.82 | 92.41 ± 3.09 | 0.84 ± 0.06 |
| hybrid | 200 | 0 | 93.50 ± 3.23 | 91.44 ± 3.19 | 92.54 ± 2.34 | 0.85 ± 0.05 |
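For reference, the four measures reported in these tables can be computed from a confusion matrix as sketched below, using the standard definitions of sensitivity, specificity, accuracy and MCC. The assumption that a sequence is called immunomodulatory when its SVM score is at or above the threshold (0 in the tables), and the function names, are illustrative rather than taken from the paper.

```python
from math import sqrt


def confusion_counts(scores, labels, threshold=0.0):
    """Return (tp, tn, fp, fn); label 1 = immunomodulatory, 0 = not.

    Assumption: a score at or above the threshold is a positive prediction.
    """
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy in percent, plus MCC."""
    sen = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    spec = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spec": spec, "Acc": acc, "MCC": mcc}


# Hypothetical counts, not data from the study.
print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```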
Lengthwise structure analysis of IMORNs and non-IMORNs as predicted by the RNAfold program.
| Length bin (nt) | IMORNs, 17–20 | IMORNs, 21–23 | IMORNs, 24–27 | IMORNs, total | non-IMORNs, 17–20 | non-IMORNs, 21–23 | non-IMORNs, 24–27 | non-IMORNs, total |
|---|---|---|---|---|---|---|---|---|
| # of Sequences | 425 | 68 | 109 | 602 | 44 | 454 | 22 | 520 |
| # of Extended Linear Structure Sequences | 273 (64.23%) | 32 (47.05%) | 48 (44.04%) | 353 (58.64%) | 11 (25.00%) | 115 (25.33%) | 2 (9.09%) | 128 (24.62%) |
| # of Stem Loop Structure Sequences | 152 (35.76%) | 36 (52.95%) | 61 (55.96%) | 249 (41.36%) | 33 (75.00%) | 339 (74.67%) | 20 (90.91%) | 392 (75.38%) |
| Total # of loops in sequences | 152 | 36 | 65 | 253 | 33 | 340 | 23 | 396 |
| # of loops with at least one Uridine | 81 (53.29%) | 31 (86.11%) | 61 (93.84%) | 173 (68.37%) | 26 (78.79%) | 247 (72.65%) | 16 (69.56%) | 289 (72.97%) |
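The counts in this table could be derived from predicted secondary structures in dot-bracket notation, the format RNAfold emits. The sketch below is a rough illustration rather than the authors' procedure. It assumes that an "extended linear" structure is one with no predicted base pairs, that "loops" means hairpin loops (unpaired runs closed by a single base pair), and that the structure string has already been obtained. The function names are hypothetical.

```python
import re

# A hairpin loop: an opening bracket, a run of unpaired bases, a closing bracket.
HAIRPIN = re.compile(r"\((\.+)\)")


def classify_structure(dot_bracket):
    """Label a structure as 'stem-loop' if it has any base pair (assumed rule)."""
    return "stem-loop" if "(" in dot_bracket else "extended linear"


def hairpin_loops(seq, dot_bracket):
    """Return the nucleotide string of each hairpin loop."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    return [seq[m.start(1):m.end(1)] for m in HAIRPIN.finditer(dot_bracket)]


def loops_with_uridine(seq, dot_bracket):
    """Count hairpin loops containing at least one uridine."""
    return sum("U" in loop for loop in hairpin_loops(seq, dot_bracket))


# Toy sequence and structure, invented for illustration.
print(classify_structure("(((....)))"), hairpin_loops("GGGAAUUCCC", "(((....)))"))
```

Counting loops per sequence rather than per structure is consistent with the table, where the total number of loops (253 and 396) slightly exceeds the number of stem-loop sequences (249 and 392).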
Figure 6. Flow diagram showing the sampling method used for evaluating the prediction models.
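The flow diagram itself is not reproduced in this record, so the exact sampling scheme is not known here. As a generic illustration of the five-fold cross-validation mentioned in the abstract, the following sketch splits a labeled dataset into five class-balanced folds. The stratification, the fixed seed, and the function name `stratified_kfold` are assumptions, not details from the paper.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k class-balanced folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# Class sizes from the abstract: 602 IMORNs (1) and 520 non-IMORNs (0).
labels = [1] * 602 + [0] * 520
for train, test in stratified_kfold(labels):
    print(len(train), len(test))
```

Each sequence appears in exactly one test fold, so every model is evaluated on data it was not trained on.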