Nimisha Ghosh, Indrajit Saha, Nikhil Sharma, Suman Nandi.
Abstract
Across the successive waves of the SARS-CoV-2 pandemic, a comprehensive bioinformatics pipeline is essential for analysing virus genomes in order to understand their evolution, identify mutations as signature SNPs, locate conserved regions and, subsequently, design an epitope-based synthetic vaccine. As a case study, we performed multiple sequence alignment of 4996 Indian SARS-CoV-2 genomes using MAFFT, followed by phylogenetic analysis using Nextstrain to identify virus clades. Conserved regions were then identified from the entropy of each genomic coordinate of the aligned sequences. After refining the conserved regions by length, one region was selected, and primers and probes for virus detection are reported for it. The refined conserved regions were also used to identify T-cell and B-cell epitopes along with their immunogenic and antigenic scores, which were used to select the most immunogenic and antigenic epitopes. Executing this pipeline identified 40 unique signature SNPs, of which 23 are non-synonymous and produce 28 amino acid changes in proteins. Twelve conserved regions were selected based on the refinement criteria, one of which was chosen as the potential target for virus detection. Additionally, 22 MHC-I and 21 MHC-II restricted T-cell epitopes, with 10 unique HLA alleles each, and 17 B-cell epitopes were obtained for the 12 conserved regions. All results were validated both quantitatively and qualitatively, showing that the proposed pipeline can be used effectively against SARS-CoV-2, from genetic variability analysis to synthetic vaccine design.
Keywords: Bioinformatics Pipeline; Clade; Conserved Regions; Non-synonymous signature SNP; SARS-CoV-2; T-cell epitopes
Year: 2022 PMID: 36116149 PMCID: PMC9444899 DOI: 10.1016/j.intimp.2022.109224
Source DB: PubMed Journal: Int Immunopharmacol ISSN: 1567-5769 Impact factor: 5.714
Fig. 1 Pipeline of the work.
Fig. 2 (a) Phylogenetic Tree in Radial view (b) Geographical Distribution (c) Phylogenetic Tree in Rectangular view (d) Value of Entropy for the change in Nucleotide (e) Coding Regions of SARS-CoV-2 Genome (f) Signature SNPs (g) Venn Diagram of 5 clades and (h) Identification of Primers and Probes using Primer-BLAST.
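The abstract states that conserved regions are derived from the entropy of each genomic coordinate of the aligned sequences, but this record does not give the formula or the thresholds. The following is a minimal sketch under stated assumptions: it uses Shannon entropy per alignment column, treats a column as conserved when its entropy does not exceed `max_entropy`, and keeps runs of at least `min_length` columns. The function names and both parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_regions(aligned, min_length=1, max_entropy=0.0):
    """Return 1-based inclusive (start, end) runs of low-entropy columns.

    `aligned` is a list of equal-length aligned sequences (for example the
    rows of a MAFFT alignment). Both thresholds are illustrative assumptions.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    for i in range(length):
        column = [seq[i] for seq in aligned]
        if column_entropy(column) <= max_entropy:
            if run_start is None:
                run_start = i  # a new conserved run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                regions.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and length - run_start >= min_length:
        regions.append((run_start + 1, length))
    return regions


toy_alignment = ["ACGTAC", "ACGTAC", "ACCTAC"]
print(conserved_regions(toy_alignment, min_length=2))
```

In this toy alignment only the third column varies, so the sketch reports the columns on either side of it as conserved runs. Gap characters are treated as ordinary symbols here, which is a simplification.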
List of Signature SNPs in each clade for 4996 Indian SARS-CoV-2 Genomes.
| Clade | Genomic Position | Frequency | Nucleotide Change | Protein Change | Protein Coordinate | Mapped with Coding and Non-Coding Region |
|---|---|---|---|---|---|---|
| 19A | 11083 | 425 | G | Synonymous, L | 37 | NSP6 |
| | 13730 | 374 | C | A | 97 | RdRp |
| | 28311 | 364 | C | P | 13 | Nucleocapsid |
| | 23929 | 360 | C | Synonymous | 789 | Spike |
| | 6312 | 359 | C | T | 1198 | NSP3 |
| | 19524 | 111 | C | Synonymous | 495 | Exon |
| | 6310 | 98 | C | S | 1197 | NSP3 |
| | 1397 | 77 | G | V | 198 | NSP2 |
| | 29742 | 77 | G | Not Present | Not Present | 3’ UTR |
| | 28688 | 74 | T | Synonymous | 139 | Nucleocapsid |
| 19B | 28144 | 87 | T | L | 84 | ORF8 |
| | 8782 | 86 | C | Synonymous | 76 | NSP4 |
| | 28878 | 83 | G | S | 202 | Nucleocapsid |
| | 29742 | 81 | G | Not Present | Not Present | 3’ UTR |
| | 22468 | 62 | G | Synonymous, Synonymous | 302 | Spike |
| | 11230 | 19 | G | M | 86 | NSP6 |
| | 7945 | 16 | C | Synonymous | 1742 | NSP3 |
| | 28167 | 15 | G | E | 92 | ORF8 |
| | 2705 | 9 | A | T | 634 | NSP2 |
| | 14500 | 9 | G | V | 354 | RdRp |
| 20A | 23403 | 2472 | A | D | 614 | Spike |
| | 241 | 2458 | C | Not Present | Not Present | 5’ UTR |
| | 3037 | 2455 | C | Synonymous | 106 | NSP3 |
| | 14408 | 2377 | C | P | 323 | RdRp |
| | 26735 | 1432 | C | Synonymous | 71 | Membrane |
| | 18877 | 1427 | C | Synonymous | 280 | Exon |
| | 25563 | 1418 | G | Synonymous, Q | 57 | ORF3a |
| | 28854 | 1230 | C | S | 194 | Nucleocapsid |
| | 22444 | 1191 | C | Synonymous | 294 | Spike |
| | 2836 | 557 | C | Synonymous | 39 | NSP3 |
| 20B | 3037 | 1923 | C | Synonymous | 106 | NSP3 |
| | 241 | 1922 | C | Not Present | Not Present | 5’ UTR |
| | 23403 | 1922 | A | D | 614 | Spike |
| | 14408 | 1912 | C | P | 323 | RdRp |
| | 28881 | 1868 | G | R | 203 | Nucleocapsid |
| | 28882 | 1868 | G | Synonymous | 203 | Nucleocapsid |
| | 28883 | 1867 | G | G | 204 | Nucleocapsid |
| | 313 | 1120 | C | Synonymous | 16 | Leader protein |
| | 5700 | 1106 | C | A | 994 | NSP3 |
| | 4354 | 281 | G | Synonymous | 545 | NSP3 |
| 20C | 241 | 18 | C | Not Present | Not Present | 5’ UTR |
| | 1059 | 18 | C | T | 85 | NSP2 |
| | 3037 | 18 | C | Synonymous | 106 | NSP3 |
| | 14408 | 18 | C | P | 323 | RdRp |
| | 23403 | 18 | A | D | 614 | Spike |
| | 25563 | 18 | G | Synonymous, Q | 57 | ORF3a |
| | 16260 | 9 | C | Synonymous | 8 | Helicase |
| | 28821 | 9 | C | S | 183 | Nucleocapsid |
| | 28221 | 4 | G | E | 110 | ORF8 |
| | 28371 | 4 | G | S | 33 | Nucleocapsid |
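The protein coordinates in the table above can be related to the genomic positions by simple codon arithmetic. The sketch below is an illustration rather than the authors' procedure: it assumes a single, unspliced reading frame and uses the Spike and ORF3a starting coordinates (21563 and 25393) that this record lists in the conserved-region table. The function names are hypothetical.

```python
def protein_coordinate(genomic_pos, cds_start):
    """1-based amino-acid index of the codon containing a 1-based genomic position."""
    if genomic_pos < cds_start:
        raise ValueError("position lies upstream of the coding region start")
    return (genomic_pos - cds_start) // 3 + 1


def codon_position(genomic_pos, cds_start):
    """Position (1, 2 or 3) of the nucleotide inside its codon."""
    if genomic_pos < cds_start:
        raise ValueError("position lies upstream of the coding region start")
    return (genomic_pos - cds_start) % 3 + 1


SPIKE_START = 21563   # starting coordinate of the Spike coding region in this record
ORF3A_START = 25393   # starting coordinate of the ORF3a coding region in this record

print(protein_coordinate(23403, SPIKE_START))  # codon index for the Spike SNP at 23403
print(protein_coordinate(25563, ORF3A_START))  # codon index for the ORF3a SNP at 25563
```

For positions inside ORF1ab the table reports coordinates relative to the individual mature proteins (NSP2, NSP3, RdRp and so on), so the same arithmetic would need each protein's own start coordinate, which this record does not list.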
Fig. 3 Highlighted amino acid changes in the protein structures for the non-synonymous signature SNPs of (a) NSP2 (b) NSP3 (c) NSP6 (d) RdRp (e) Spike (f) ORF3a (g) ORF8 and (h) Nucleocapsid.
Sequence and structural homology-based prediction for non-synonymous signature SNPs along with their protein structural stability.
| Clade | Genomic Coordinates | Amino Residue Change | Protein | PROVEAN Effect | PROVEAN Score | PolyPhen-2 Prediction | PolyPhen-2 Score | I-Mutant 2.0 Stability | I-Mutant 2.0 DDG |
|---|---|---|---|---|---|---|---|---|---|
| 19A | 11083 | L37F | NSP6 | Neutral | -1.369 | Benign | 0.027 | Decrease | 0.05 |
| | − | − | | | | | | | |
| | 28311 | P13L | Nucleocapsid | Neutral | -1.23 | Probably Damaging | 1.000 | Increase | 0.11 |
| | 6312 | T1198I | NSP3 | Neutral | -0.085 | Probably Damaging | 0.998 | Decrease | -0.72 |
| | 6312 | T1198K | NSP3 | Neutral | -0.353 | NG | NG | Decrease | -1.37 |
| | 6310 | S1197R | NSP3 | Neutral | -0.835 | NG | NG | Decrease | -0.88 |
| | 1397 | V198I | NSP2 | Neutral | 0.307 | Benign | 0.006 | Increase | 0.18 |
| 19B | 28144 | L84S | ORF8 | Neutral | 2.333 | Benign | 0.002 | Decrease | -2.87 |
| | 28878 | S202N | Nucleocapsid | Neutral | -0.404 | Probably Damaging | 0.994 | Decrease | -0.8 |
| | 28878 | S202I | Nucleocapsid | Deleterious | -3.308 | Probably Damaging | 0.998 | Increase | 0.22 |
| | 28878 | S202T | Nucleocapsid | Neutral | -1.428 | Probably Damaging | 0.986 | Decrease | -0.53 |
| | 11230 | M86I | NSP6 | Neutral | -0.427 | Benign | 0.025 | Decrease | -1.02 |
| | 28167 | E92K | ORF8 | Neutral | -1.5 | NG | NG | Decrease | -1.05 |
| | 2705 | T634A | NSP2 | Neutral | -0.004 | Benign | 0.106 | Decrease | -1.13 |
| | − | − | | | | | | | |
| 20A | 23403 | D614G | Spike | Neutral | 0.598 | Benign | 0.004 | Decrease | -1.94 |
| | 14408 | P323L | RdRp | Neutral | -0.865 | Benign | 0.005 | Decrease | -0.80 |
| | − | − | | | | | | | |
| | 28854 | S194L | Nucleocapsid | Deleterious | -4.272 | Probably Damaging | 0.994 | Increase | 0.45 |
| 20B | 23403 | D614G | Spike | Neutral | 0.598 | Benign | 0.004 | Decrease | -1.94 |
| | 14408 | P323L | RdRp | Neutral | -0.865 | Benign | 0.005 | Decrease | -0.80 |
| | 28881 | R203K | Nucleocapsid | Neutral | -1.604 | Probably Damaging | 0.969 | Decrease | -2.26 |
| | − | − | | | | | | | |
| | 28883 | G204R | Nucleocapsid | Neutral | -1.656 | Probably Damaging | 1 | Decrease | 0 |
| | 5700 | A994D | NSP3 | Neutral | -1.103 | NG | NG | Decrease | -0.78 |
| 20C | − | − | | | | | | | |
| | 14408 | P323L | RdRp | Neutral | -0.865 | Benign | 0.005 | Decrease | -0.80 |
| | 23403 | D614G | Spike | Neutral | 0.598 | Benign | 0.004 | Decrease | -1.94 |
| | − | − | | | | | | | |
| | 28821 | S183Y | Nucleocapsid | Deleterious | -2.75 | Probably Damaging | 0.998 | Increase | 0 |
| | 28221 | E110Q | ORF8 | Neutral | -0.25 | NG | NG | Decrease | -1.13 |
| | 28371 | S33I | Nucleocapsid | Neutral | -1.372 | NG | NG | Increase | 0.63 |
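The PROVEAN effect labels in the table above are consistent with a simple score cutoff. The sketch below assumes the commonly used PROVEAN default of -2.5 (scores at or below the cutoff are called deleterious); this record does not state which cutoff the authors applied, so the constant and the function name are assumptions for illustration only.

```python
PROVEAN_CUTOFF = -2.5  # assumed default cutoff; not stated in this record


def provean_effect(score, cutoff=PROVEAN_CUTOFF):
    """Label a PROVEAN score as 'Deleterious' or 'Neutral' under the assumed cutoff."""
    return "Deleterious" if score <= cutoff else "Neutral"


# A few (amino-acid change, score) pairs copied from the table above.
examples = {
    "S202I": -3.308,
    "S194L": -4.272,
    "S183Y": -2.75,
    "G204R": -1.656,
    "D614G": 0.598,
}

for change, score in examples.items():
    print(change, score, provean_effect(score))
```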
Conserved Regions (CnRs) as derived from 4996 SARS-CoV-2 genomes with associated details
| DNA Sequence of Conserved Region (CnR) | Protein Sequence | Length of CnR | BLAST Specificity Score of CnR | % of BLAST Specificity Score as Query Coverage | Coding Region (CR) | Starting Coordinate | Ending Coordinate | Length of Coding Region | Coded Proteins |
|---|---|---|---|---|---|---|---|---|---|
| 1282-CACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGAAGGTGCCACTACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGCATGTCACAATTCAGAAGTAGGACCTGAGCATAGTCTTG-1418 | TCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSL | 137 | 254 | 100 | ORF1ab | 266 | 21552 | 21287 | NSP2 |
| 12422-AGAGATGGTTGTGTTCCCTTGAACATAATACCTCTTACAACAGCAGCCAAACTAATGGTTGTCATACCAGACTATAACACATATAAAAATACGTGTGATGGTACAACATTTACTTATGCATCAGCATTGTGGGAAAT-12558 | RDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWE | 137 | 254 | 100 | ORF1ab | 266 | 21552 | 21287 | NSP8 |
| 13125-GGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTTACACTTAAAAACACAGT-13371 | GQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDHPNPKGFCDLKGKYVQIPTTCANDPVGFTLKNT | 247 | 457 | 100 | ORF1ab | 266 | 21555 | 21290 | NSP10 |
| 14075-TCAATGGTAACTGGTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTGTAGATTCTTATTATTCATTGTTAATGCCTATATTAACCTTGACCAGGGCTTTAACTGCAGAGTCAC-14206 | NGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAES | 132 | 244 | 100 | ORF1ab | 266 | 21552 | 21287 | RdRp |
| 14221-TTAACAAAGCCTTACATTAAGTGGGATTTGTTAAAATATGACTTCACGGAAGAGAGGTTAAAACTCTTTGACCGTTATTTTAAATATTGGGATCAGACATACCACCCAAATTGTGTTAACTGTTTGGATGACAGATGCATTCTGCATTGTGCAAACTTTAATGTTTTATTCTCTACAGTGTTCCCA-14406 | LTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFP | 186 | 344 | 100 | ORF1ab | 266 | 21552 | 21287 | RdRp |
| 15607-TTACAACACAGACTTTATGAGTGTCTCTATAGAAATAGAGATGTTGACACAGACTTTGTGAATGAGTTTTACGCATATTTGCGTAAACATTTCTCAATGATGATACTCTCTGACGATGCTGTTGTGTGTTT-15737 | LQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVC | 131 | 243 | 100 | ORF1ab | 266 | 21552 | 21287 | RdRp |
| 15991-GATGGTACACTTATGATTGAACGGTTCGTGTCTTTAGCTATAGATGCTTACCCACTTACTAAACATCCTAATCAGGAGTATGCTGATGTCTTTCATTTGTACTTACAATACATAAGAAAGCTACATGATGAGTTAACAGGACACATGTTAGACATGTATTCTGTTATGCTTACTAATGATAACACTTCAAGGTATTGGGAACCTGAGTTTTATGA-16205 | DGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFY | 215 | 398 | 100 | ORF1ab | 266 | 21552 | 21287 | RdRp |
| 18487-ATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAAGTGACACACTTAAAAATCTCTCTGACAGAGTCGTATTTGTCTTATGGGCACATGGCTTTGAGTTGACATCTATGAAGTATTTTGTGAAAATAGGACCTGAGCGCACCTGTTGTCTATGT-18669 | IPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLC | 183 | 339 | 100 | ORF1ab | 266 | 21552 | 21287 | Exon |
| 18980-ACATGGTTGTTAAAGCTGCATTATTAGCAGACAAATTCCCAGTTCTTCACGACATTGGTAACCCTAAAGCTATTAAGTGTGTACCTCAAGCTGATGTAGAATGGAAGTTCTATGATGCACAGCCTTGTAGTGACAAAGCTTATAAAATAGAAG-19132 | MVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIE | 153 | 283 | 100 | ORF1ab | 266 | 21552 | 21287 | Exon |
| 24490-TTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAGTTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGC-24621 | LNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIR | 132 | 244 | 100 | Spike | 21563 | 25381 | 3819 | Spike glycoprotein |
| 25913-GCACAACAAGTCCTATTTCTGAACATGACTACCAGATTGGTGGTTATACTGAAAAATGGGAATCTGGAGTAAAAGACTGTGTTGTATTACACAGTTACTTCACTTCAGACTATTACCAGCTGTACTCAACTCAATTGAGTACAGACACT-26061 | TTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDT | 149 | 276 | 100 | ORF3a | 25393 | 26217 | 825 | ORF3a protein |
| 27394-ATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAGAGGTACAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAACATACGAGGGCA-27520 | MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEG | 127 | 235 | 100 | ORF7a | 27394 | 27756 | 363 | ORF7a protein |
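Each conserved region above is written as `start-SEQUENCE-end`, with a reported length and a translated protein sequence. The following sketch shows one way such an entry could be sanity-checked: it parses the entry, tests that the sequence length equals `end - start + 1`, and translates the DNA with the standard genetic code. It is an illustration under assumptions (upper-case unambiguous bases, a caller-supplied reading-frame offset); the helper names are hypothetical and this is not the tool used in the paper.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_region(entry):
    """Split 'start-SEQUENCE-end' into (start, sequence, end)."""
    start, sequence, end = entry.split("-")
    return int(start), sequence, int(end)


def length_is_consistent(entry):
    """True when the sequence length matches the inclusive coordinate span."""
    start, sequence, end = parse_region(entry)
    return len(sequence) == end - start + 1


def translate(sequence, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# First 18 nucleotides of the ORF7a conserved region listed above.
print(translate("ATGAAAATTATTCTTTTC"))
# A made-up entry used only to exercise the coordinate check.
print(length_is_consistent("10-ACGTA-14"))
```

The ORF7a region starts at the first base of its coding region, so frame 0 applies there; for regions inside ORF1ab the correct frame offset would have to be worked out from the coding-region coordinates.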
Targeted Conserved Region in SARS-CoV-2 Genome and its corresponding protein sequence in NSP10 which is highlighted by red colour in NSP10 gene.
| DNA Sequence of Conserved Region (CnR) | Protein Sequence | NSP10 protein structure with target region |
|---|---|---|
| 13125-GGGGACAACCAATCACTAATTGTGTTAAGATGTTGTGTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTTACACTTAAAAACACAGT-13371 | 35-GQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDHPNPKGFCDLKGKYVQIPTTCANDPVGFTLKNT-115 |
Details of Primers and Probes of NSP10 gene.
| Primer Pair | Primer Type | Primer Sequence (5’- | Primer Length | Tm | GC% | Probe Sequence | Probe Length |
|---|---|---|---|---|---|---|---|
| 1 | Forward | 117-TGTTGTCTGTACTGCCGTTG-136 | 20 | 60.05 | 50 | TGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTT | 113 |
| Reverse | 229-AAACCCACAGGGTCATTAGC-210 | 20 | 59.46 | 50 | |||
| 2 | Forward | 64-TAACAGTTACACCGGAAGCC-83 | 20 | 59.18 | 50 | TAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGA | 82 |
| Reverse | 145-TCTATGTGGCAACGGCAGTA-126 | 20 | 60.76 | 50 | |||
| 3 | Forward | 95-AGAATCCTTTGGTGGTGCAT-114 | 20 | 59.08 | 45 | AGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTT | 136 |
| Reverse | 230-AAAACCCACAGGGTCATTAGC-210 | 21 | 60.16 | 47.62 | |||
| 4 | Forward | 35-GTGTACACACACTGGTACTGG-55 | 21 | 59.89 | 52.38 | GTGTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTT | 86 |
| Reverse | 120-AACACGATGCACCACCAAAG-101 | 20 | 60.97 | 50 | |||
| 5 | Forward | 45-ACTGGTACTGGTCAGGCAATA-65 | 21 | 60.16 | 47.62 | ACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTG | 81 |
| Reverse | 125-CAGACAACACGATGCACCA-107 | 19 | 60 | 52.63 | |||
| 6 | Forward | 101-CTTTGGTGGTGCATCGTGTT-120 | 20 | 60.97 | 50 | CTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTTACAC | 134 |
| Reverse | 234-GTGTAAAACCCACAGGGTCAT-214 | 21 | 59.81 | 47.62 | |||
| 7 | Forward | 119-TTGTCTGTACTGCCGTTGC-137 | 19 | 60 | 52.63 | TTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTAAAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTTACACTT | 118 |
| Reverse | 236-AAGTGTAAAACCCACAGGGTC-216 | 21 | 59.74 | 47.62 | |||
| 8 | Forward | 66-ACAGTTACACCGGAAGCCAA-85 | 20 | 61.2 | 50 | ACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCA | 87 |
| Reverse | 152-TGGATGATCTATGTGGCAACG-132 | 21 | 59.81 | 47.62 | |||
| 9 | Forward | 44-CACTGGTACTGGTCAGGCAA-63 | 20 | 61.27 | 55 | CACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTT | 77 |
| Reverse | 120-AACACGATGCACCACCAAA-102 | 19 | 59.84 | 47.37 | |||
| 10 | Forward | 65-AACAGTTACACCGGAAGCCA-84 | 20 | 61.2 | 50 | AACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGGTGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATA | 79 |
| Reverse | 143-TATGTGGCAACGGCAGTACA-124 | 20 | 61.34 | 50 | |||
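Several columns of the primer table can be re-derived from the sequences themselves. The sketch below is a small illustration, not Primer-BLAST: it computes GC%, the reverse complement of a primer, and the length of the span between the forward primer's start and the reverse primer's start, using primer pair 1 as input. The function names are hypothetical, and melting temperature is deliberately left out because the Tm model used for the table is not described in this record.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of an upper-case DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def gc_percent(sequence):
    """Percentage of G and C bases, rounded to two decimals."""
    gc = sequence.count("G") + sequence.count("C")
    return round(100.0 * gc / len(sequence), 2)


def amplicon_length(forward_start, reverse_start):
    """Inclusive span from the forward primer start to the reverse primer start."""
    return reverse_start - forward_start + 1


forward = "TGTTGTCTGTACTGCCGTTG"   # pair 1, forward, positions 117 to 136
reverse = "AAACCCACAGGGTCATTAGC"   # pair 1, reverse, positions 229 to 210

print(gc_percent(forward), gc_percent(reverse))
print(reverse_complement(reverse))
print(amplicon_length(117, 229))
```

For pair 1 the reported probe sequence begins with the forward primer and ends with the reverse complement of the reverse primer, and its reported length matches the span between positions 117 and 229, which suggests the probe column lists the full amplified span.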
List of most Immunogenic and Antigenic Epitopes for MHC-I, MHC-II restricted T-cell and B-cell Epitopes for 12 CnRs. *I.S.-Immunogenic Score; A.S.-Antigenic Score.
| Protein Sequence | Coded Protein | Type | MHC-I Epitopes | MHC-I Alleles | MHC-I I.S.* | MHC-I A.S.* | MHC-II Epitopes | MHC-II Alleles | MHC-II I.S.* | MHC-II A.S.* | B-cell Epitopes | B-cell I.S.* | B-cell A.S.* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 160-TCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSL-204 | NSP2 | Immunogenic | SEVGPEHSL | HLA-B*40:01 | 0.99 | 0.72 | TTCGYLPQNAVVKIY | HLA-DRB5*01:01 | 4.30 | 0.04 | VVKIYCPACHNSEVGP | 0.96 | 0.66 |
| Antigenic | NSEVGPEHSL | HLA-B*40:01 | 0.79 | 0.82 | ATTCGYLPQNAVVKI | HLA-DRB5*01:01 | 5.20 | 0.18 | |||||
| 111-RDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWE-155 | Immunogenic | NTCDGTTFTY | HLA-A*01:01 | 0.97 | -0.03 | VPLNIIPLTTAAKLM | HLA-DRB1*08:02 | 0.25 | 0.88 | MVVIPDYNTYKNTCDG | 0.94 | 0.24 | |
| Antigenic | TTFTYASALW | HLA-B*57:01 | 0.95 | 0.40 | GCVPLNIIPLTTAAK | HLA-DRB1*08:02 | 0.27 | 1.13 | VPLNIIPLTTAAKLMV | 0.57 | 0.74 | ||
| 35-GQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDHPNPKGFCDLKGKYVQIPTTCANDPVGFTLKNT-115 | NSP10 | Immunogenic | DLKGKYVQI | HLA-B*08:01 | 0.92 | 1.38 | LKGKYVQIPTTCAND | HLA-DRB1*04:01 | 0.49 | 0.63 | RCHIDHPNPKGFCDLK | 0.93 | 0.72 |
| Antigenic | HPNPKGFCDL | HLA-B*07:02 | 0.69 | 1.43 | DLKGKYVQIPTTCAN | HLA-DRB1*04:01 | 0.51 | 0.86 | PNPKGFCDLKGKYVQI | 0.66 | 1.55 | ||
| 213-NGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAES-255 | RdRp | Immunogenic | SLLMPILTL | HLA-A*02:01 | 0.79 | 0.21 | SYYSLLMPILTLTRA | HLA-DRB1*01:01 | 0.16 | 0.55 | DFIQTTPGSGVPVVDS | 0.93 | 0.36 |
| Antigenic | SGVPVVDSY | HLA-B*35:01 | 0.66 | 0.59 | VDSYYSLLMPILTLTR | 0.62 | 0.47 | ||||||
| 261-LTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFP-322 | RdRp | Immunogenic | KLFDRYFKY | HLA-A*32:01 | 0.95 | -0.05 | TEERLKLFDRYFKYW | HLA-DPA1*01:03/DPB1*02:01 | 0.76 | 0.18 | YFKYWDQTYHPNCVNC | 0.88 | 0.75 |
| Antigenic | RLKLFDRYFKYWDQT | HLA-DPA1*01:03/DPB1*02:01 | 1.20 | 0.44 | |||||||||
| 723-LQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVC-765 | RdRp | Immunogenic | DTDFVNEFY | HLA-A*01:01 | 0.99 | 0.25 | NEFYAYLRKHFSMMI | HLA-DRB1*11:01 | 0.02 | 0.23 | HRLYECLYRNRDVDTD | 0.83 | 0.23 |
| Antigenic | YLRKHFSMM | HLA-B*08:01 | 0.88 | 0.49 | EFYAYLRKHFSMMIL | HLA-DRB1*11:01 | 0.05 | 0.39 | |||||
| 851-DGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFY-921 | RdRp | Immunogenic | QEYADVFHLY | HLA-B*44:03 | 0.99 | 0.27 | VFHLYLQYIRKLHDE | HLA-DRB4*01:01 | 0.37 | 0.28 | GHMLDMYSVMLTNDNT | 0.91 | 0.43 |
| Antigenic | QEYADVFHL | HLA-B*40:01 | 0.98 | 0.36 | HMLDMYSVMLTNDNT | HLA-DRB1*04:05 | 0.42 | 0.55 | HPNQEYADVFHLYLQY | 0.77 | 0.55 | ||
| 150-IPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLC-210 | Exon | Immunogenic | NLSDRVVFV | HLA-A*02:03 | 0.94 | 0.95 | VRIKIVQMLSDTLKN | HLA-DRB4*01:01 | 0.38 | 0.29 | GFELTSMKYFVKIGPE | 0.87 | 1.17 |
| Antigenic | PWNVVRIKIVQMLSD | HLA-DRB4*01:01 | 0.41 | 0.46 | |||||||||
| 315-MVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIE-364 | Exon | Immunogenic | LLADKFPVL | HLA-A*02:01 | 0.94 | 0.08 | MVVKAALLADKFPVL | HLA-DPA1*01:03/DPB1*02:01 | 1.30 | 0.40 | KCVPQADVEWKFYDAQ | 0.80 | 1.34 |
| Antigenic | KCVPQADVEW | HLA-B*57:01 | 0.90 | 1.09 | |||||||||
| 977-LNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIR-1019 | Spike glycoprotein | Immunogenic | AEVQIDRLI | HLA-B*44:03 | 0.90 | -0.56 | VEAEVQIDRLITGRL | HLA-DRB1*03:01 | 1.10 | -0.37 | DRLITGRLQSLQTYVT | 0.77 | -0.36 |
| Antigenic | RLDKVEAEV | HLA-A*02:01 | 0.83 | 0.08 | LQTYVTQQLIRAAEI | HLA-DRB4*01:01 | 2.70 | 0.02 | LNDILSRLDKVEAEVQ | 0.51 | 0.17 | ||
| 175-TTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDT-223 | ORF3a protein | Immunogenic | FTSDYYQLY | HLA-A*01:01 | 0.98 | -0.11 | VLHSYFTSDYYQLYS | HLA-DPA1*01:03/DPB1*04:01 | 0.17 | 0.06 | TSPISEHDYQIGGYTE | 0.93 | 0.72 |
| Antigenic | SEHDYQIGGY | HLA-B*44:03 | 0.91 | 1.04 | HSYFTSDYYQLYSTQ | HLA-DPA1*01:03/DPB1*04:01 | 0.33 | 0.25 | |||||
| 1-MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEG-42 | ORF7a | Immunogenic | QECVRGTTVL | HLA-B*40:01 | 0.83 | 0.60 | ILFLALITLATCELY | HLA-DRB1*01:01 | 0.16 | 0.19 | TCELYHYQECVRGTTV | 0.81 | 0.53 |
| Antigenic | ILFLALITL | HLA-A*02:01 | 0.45 | 0.82 | |||||||||
Summary of the most Immunogenic and Antigenic Epitopes along with the Allergic and Toxicity values.
| Coded Proteins | MHC-I restricted T-cell Epitopes | Allergic | Toxicity | MHC-II restricted T-cell Epitopes | Allergic | Toxicity | Linear B-cell Epitopes | Allergic | Toxicity |
|---|---|---|---|---|---|---|---|---|---|
| NSP2 | SEVGPEHSL | Non-Allergen | Non-Toxin | TTCGYLPQNAVVKIY | Non-Allergen | Non-Toxin | VVKIYCPACHNSEVGP | Allergen | Non-Toxin |
| | NSEVGPEHSL | Allergen | Non-Toxin | ATTCGYLPQNAVVKI | Non-Allergen | Non-Toxin | | | |
| NSP8 | NTCDGTTFTY | Allergen | Non-Toxin | VPLNIIPLTTAAKLM | Non-Allergen | Non-Toxin | MVVIPDYNTYKNTCDG | Non-Allergen | Non-Toxin |
| | TTFTYASALW | Allergen | Non-Toxin | GCVPLNIIPLTTAAK | Non-Allergen | Non-Toxin | VPLNIIPLTTAAKLMV | Non-Allergen | Non-Toxin |
| NSP10 | DLKGKYVQI | Allergen | Non-Toxin | LKGKYVQIPTTCAND | Allergen | Non-Toxin | RCHIDHPNPKGFCDLK | Allergen | Toxin |
| | HPNPKGFCDL | Allergen | Toxin | DLKGKYVQIPTTCAN | Allergen | Non-Toxin | PNPKGFCDLKGKYVQI | Allergen | Non-Toxin |
| RdRp | SLLMPILTL | Non-Allergen | Non-Toxin | SYYSLLMPILTLTRA | Non-Allergen | Non-Toxin | DFIQTTPGSGVPVVDS | Non-Allergen | Non-Toxin |
| | SGVPVVDSY | Allergen | Non-Toxin | | | | VDSYYSLLMPILTLTR | Allergen | Non-Toxin |
| RdRp | KLFDRYFKY | Non-Allergen | Non-Toxin | TEERLKLFDRYFKYW | Allergen | Non-Toxin | YFKYWDQTYHPNCVNC | Non-Allergen | Toxin |
| | | | | RLKLFDRYFKYWDQT | Allergen | Non-Toxin | | | |
| RdRp | DTDFVNEFY | Allergen | Non-Toxin | NEFYAYLRKHFSMMI | Non-Allergen | Non-Toxin | HRLYECLYRNRDVDTD | Non-Allergen | Toxin |
| | YLRKHFSMM | Non-Allergen | Non-Toxin | EFYAYLRKHFSMMIL | Non-Allergen | Non-Toxin | | | |
| RdRp | QEYADVFHLY | Allergen | Non-Toxin | VFHLYLQYIRKLHDE | Non-Allergen | Non-Toxin | GHMLDMYSVMLTNDNT | Allergen | Non-Toxin |
| | QEYADVFHL | Allergen | Non-Toxin | HMLDMYSVMLTNDNT | Allergen | Non-Toxin | HPNQEYADVFHLYLQY | Non-Allergen | Toxin |
| Exon | NLSDRVVFV | Non-Allergen | Non-Toxin | VRIKIVQMLSDTLKN | Non-Allergen | Non-Toxin | GFELTSMKYFVKIGPE | Non-Allergen | Non-Toxin |
| | | | | PWNVVRIKIVQMLSD | Non-Allergen | Non-Toxin | | | |
| Exon | LLADKFPVL | Allergen | Non-Toxin | MVVKAALLADKFPVL | Allergen | Non-Toxin | KCVPQADVEWKFYDAQ | Non-Allergen | Non-Toxin |
| | KCVPQADVEW | Non-Allergen | Non-Toxin | | | | | | |
| Spike glycoprotein | AEVQIDRLI | Non-Allergen | Non-Toxin | VEAEVQIDRLITGRL | Non-Allergen | Non-Toxin | DRLITGRLQSLQTYVT | Non-Allergen | Non-Toxin |
| | RLDKVEAEV | Allergen | Non-Toxin | LQTYVTQQLIRAAEI | Non-Allergen | Non-Toxin | LNDILSRLDKVEAEVQ | Allergen | Non-Toxin |
| ORF3a | FTSDYYQLY | Allergen | Non-Toxin | VLHSYFTSDYYQLYS | Non-Allergen | Non-Toxin | TSPISEHDYQIGGYTE | Allergen | Non-Toxin |
| | SEHDYQIGGY | Non-Allergen | Non-Toxin | HSYFTSDYYQLYSTQ | Non-Allergen | Non-Toxin | | | |
| ORF7a | QECVRGTTVL | Non-Allergen | Non-Toxin | ILFLALITLATCELY | Non-Allergen | Non-Toxin | TCELYHYQECVRGTTV | Allergen | Toxin |
| | ILFLALITL | Non-Allergen | Non-Toxin | | | | | | |
Fig. 4 Modelling of MHC-I, MHC-II restricted T-cell and B-cell epitopes for 12 CnRs belonging to (a) NSP2 (b) NSP8 (c) NSP10 (f) RdRp (f) Exon (g) Spike glycoprotein (h) ORF3a and (i) ORF7a.
Docking and Z-scores of most Immunogenic and Antigenic MHC-I restricted T-cell epitopes for 12 CnRs.
| MHC-I restricted T-cell epitopes | Allele PDB ID | Score from AutoDock Vina | Total Energy | vdW Energy | Electric Energy | ERRAT Score | Z Score |
|---|---|---|---|---|---|---|---|
| SEVGPEHSL | 3LN4:A | -7.02 | 56.597 | 4.242 | -84.058 | 92.1127 | -8.92 |
| NSEVGPEHSL | 3LN4:A | -7.826 | 62.78 | 0.135 | -71.237 | 92.1127 | -8.92 |
| NTCDGTTFTY | 3BO8:A | -7.896 | 79.478 | 0.388 | -72.211 | 82.3529 | -8.98 |
| TTFTYASALW | 3VRI:A | -9.932 | 131.03 | -26.04 | -49.8 | 81.5642 | -9.27 |
| DLKGKYVQI | 4QRU:A | -8.007 | 30.829 | -7.715 | -80.4 | 80.4469 | -9.48 |
| HPNPKGFCDL | 4U1H:A | -7.438 | 51.815 | -3.509 | -61.083 | 84.9582 | -8.97 |
| SLLMPILTL | 3UTQ:A | -8.166 | 117.669 | -10.804 | -48.976 | 83.3333 | -9.38 |
| SGVPVVDSY | 2CIK:A | -8.074 | 79.882 | -6.491 | -77.615 | 84.0336 | -9.28 |
| KLFDRYFKY | 5E00:A | -8.323 | 38.063 | 0.837 | -81.052 | 85.1955 | -8.77 |
| DTDFVNEFY | 3BO8:A | -7.786 | 84.77 | -1.521 | -75.162 | 82.3529 | -8.98 |
| YLRKHFSMM | 4QRU:A | -8.029 | 40.78 | -18.508 | -41.459 | 80.4469 | -9.48 |
| QEYADVFHLY | 1N2R:A | -8.848 | 88.793 | -9.037 | -85.66 | 85.1955 | -8.95 |
| QEYADVFHL | 3LN4:A | -7.996 | 48.824 | 1.057 | -95.906 | 92.1127 | -8.92 |
| NLSDRVVFV | 3OX8:A | -7.321 | 2.558 | -17.624 | -83.824 | 82.5843 | -9.3 |
| LLADKFPVL | 3UTQ:A | -7.845 | 60.256 | -0.423 | -73.612 | 83.3333 | -9.38 |
| KCVPQADVEW | 3VRI:A | -7.362 | 44.618 | 9.799 | -82.426 | 81.5642 | -9.27 |
| AEVQIDRLI | 1N2R:A | -7.302 | -5.739 | -14.044 | -59.423 | 85.1955 | -8.95 |
| RLDKVEAEV | 3UTQ:A | -7.406 | -35.156 | -10.383 | -59.389 | 83.3333 | -9.38 |
| FTSDYYQLY | 3BO8:A | -8.007 | 91.699 | -12.984 | -63.351 | 83.3333 | -8.98 |
| SEHDYQIGGY | 1N2R:A | -9.458 | 67.521 | -29.967 | -56.642 | 85.1955 | -8.95 |
| QECVRGTTVL | 3LN4:A | -8.409 | -0.982 | -8.186 | -75.82 | 92.1127 | -8.92 |
| ILFLALITL | 3UTQ:A | -8.656 | 123.773 | -19.829 | -50.913 | 83.3333 | -9.38 |
Docking and Z-scores of most Immunogenic and Antigenic MHC-II restricted T-cell epitopes for 12 CnRs.
| MHC-II restricted T-cell epitopes | Allele PDB ID | Score from AutoDock Vina | Total Energy | vdW Energy | Electric Energy | ERRAT Score | Z Score |
|---|---|---|---|---|---|---|---|
| TTCGYLPQNAVVKIY | 1FV1:B | -8.187 | 51.807 | -11.448 | -73.616 | 83.3333 | -9.38 |
| ATTCGYLPQNAVVKI | 1FV1:B | -7.002 | 53.457 | 3.071 | -74.542 | 92.1127 | -8.92 |
| VPLNIIPLTTAAKLM | 6CPN:B | -7.134 | 76.07 | -0.246 | -70.524 | 82.3529 | -8.98 |
| GCVPLNIIPLTTAAK | 1X7Q:A | -7.298 | 117.674 | 7.064 | -70.22 | 83.7079 | -8.91 |
| LKGKYVQIPTTCAND | 4MD4:B | -7.168 | 26.786 | 18.782 | -118.485 | 84.0336 | -9.28 |
| DLKGKYVQIPTTCAN | 4MD4:B | -7.598 | 51.579 | -8.601 | -62.765 | 84.0336 | -9.28 |
| SYYSLLMPILTLTRA | 2G9H:B | -8.185 | 93.108 | -19.626 | -34.574 | 84.0782 | -9.21 |
| TEERLKLFDRYFKYW | 3WEX:A; 3WEX:B | -8.073 | 35.351 | -8.623 | -76.368 | 83.7079 | -8.95 |
| RLKLFDRYFKYWDQT | 3WEX:A; 3WEX:B | -8.568 | 77.593 | -17.304 | -51.475 | 88.169 | -8.93 |
| NEFYAYLRKHFSMMI | 1A6A:B | -8.465 | 100.048 | -14.017 | -61.447 | 87.9552 | -9.5 |
| EFYAYLRKHFSMMIL | 1A6A:B | -10.032 | 47.328 | -36.397 | -46.922 | 88.4831 | -8.97 |
| VFHLYLQYIRKLHDE | 1T5W:B | -7.431 | 33.396 | -7.497 | -60.178 | 80.4469 | -9.48 |
| HMLDMYSVMLTNDNT | 4MD4:B | -8.019 | 88.304 | -12.212 | -63.943 | 83.7535 | -8.95 |
| VRIKIVQMLSDTLKN | 1T5W:B | -6.854 | -59.105 | 37.684 | -153.888 | 77.7465 | -9.09 |
| PWNVVRIKIVQMLSD | 1T5W:B | -7.877 | 92.966 | -19.085 | -38.808 | 83.3333 | -9.38 |
| MVVKAALLADKFPVL | 3WEX:A; 3WEX:B | -7.289 | 7.927 | 1.388 | -98.584 | 77.7465 | -9.09 |
| VEAEVQIDRLITGRL | 1A6A:B | -7.845 | 2.052 | -10.221 | -87.57 | 83.7079 | -8.95 |
| LQTYVTQQLIRAAEI | 1T5W:B | -8.080 | 24.104 | -8.501 | -96.551 | 77.7465 | -9.09 |
| VLHSYFTSDYYQLYS | 3WEX:A; 3WEX:B | -7.453 | 40.904 | 5.179 | -116.223 | 81.5642 | -9.27 |
| HSYFTSDYYQLYSTQ | 3WEX:A; 3WEX:B | -7.964 | 107.759 | -16.583 | -52.05 | 82.3529 | -8.98 |
| ILFLALITLATCELY | 2G9H:B | -8.456 | 39.487 | -18.368 | -86.629 | 85.9944 | -8.83 |
Fig. 5 Structural analysis for the most immunogenic MHC-I restricted T-cell epitope “SEVGPEHSL” in 12 CnRs (a) Docking structure of MHC-I restricted T-cell epitope (b) 2D pose representation between the epitope and HLA allele showing the different non-covalent bonds (c) ERRAT Score (d) Z-Score plot (e) Ramachandran plot of the epitope allele structure showing lower energy sites of the residues in different frames and (f) Verify 3D scores in Chain A of the docked complex (g) Verify 3D scores in Chain B of the docked complex.
Fig. 6 Structural analysis for the most antigenic MHC-I restricted T-cell epitope “HPNPKGFCDL” in 12 CnRs (a) Docking structure of MHC-I restricted T-cell epitope (b) 2D pose representation between the epitope and HLA allele showing the different non-covalent bonds (c) ERRAT Score (d) Z-Score plot (e) Ramachandran plot of the epitope allele structure showing lower energy sites of the residues in different frames and (f) Verify 3D scores in Chain A of the docked complex (g) Verify 3D scores in Chain B of the docked complex.
Fig. 7 Structural analysis for the most immunogenic MHC-II restricted T-cell epitope “NEFYAYLRKHFSMMI” in 12 CnRs (a) Docking structure of MHC-II restricted T-cell epitope (b) 2D pose representation between the epitope and HLA allele showing the different non-covalent bonds (c) ERRAT Score (d) Z-Score plot (e) Ramachandran plot of the epitope allele structure showing lower energy sites of the residues in different frames and (f) Verify 3D scores in Chain A of the docked complex (g) Verify 3D scores in Chain B of the docked complex.
Fig. 8 Structural analysis for the most antigenic MHC-II restricted T-cell epitope “GCVPLNIIPLTTAAK” in 12 CnRs (a) Docking structure of MHC-II restricted T-cell epitope (b) 2D pose representation between the epitope and HLA allele showing the different non-covalent bonds (c) ERRAT Score (d) Z-Score plot (e) Ramachandran plot of the epitope allele structure showing lower energy sites of the residues in different frames and (f) Verify 3D scores in Chain A of the docked complex (g) Verify 3D scores in Chain B of the docked complex.