Qi Ying Koo, Asif M Khan, Keun-Ok Jung, Shweta Ramdas, Olivo Miotto, Tin Wee Tan, Vladimir Brusic, Jerome Salmon, J Thomas August.
Abstract
West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites and to characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on peptides of 9 or more amino acids, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed numerous evolutionarily stable nonamer positions across the proteome (entropy value ≤ 1). At these stable positions, the representation (frequency) of nonamers that vary from the predominant peptide was generally low (≤ 10% of the WNV sequences analyzed). Eighty-eight fragments of 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were found to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses and the majority of which are potential epitopes.
The complete conservation of these immunologically relevant sequences throughout the recorded history of WNV suggests that they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
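The entropy screen summarized in the abstract can be illustrated in a few lines. This is a minimal sketch, not the authors' pipeline: it assumes a pre-aligned set of equal-length protein strings with `-` as the gap symbol, skips nonamers that contain a gap, and uses plain Shannon entropy in bits. The published analysis may treat gaps, small sample sizes and the log base differently. The names `nonamer_profile` and `stable_positions` are hypothetical.

```python
import math
from collections import Counter


def nonamer_profile(aligned, k=9):
    """Return (start, entropy_bits, predominant_peptide, variant_pct) per start position.

    Assumes `aligned` is a list of equal-length, pre-aligned protein strings with
    '-' as the gap symbol; peptides containing a gap are skipped (an assumption).
    """
    profile = []
    for start in range(len(aligned[0]) - k + 1):
        peptides = [s[start:start + k] for s in aligned]
        peptides = [p for p in peptides if "-" not in p]
        if not peptides:
            continue
        counts = Counter(peptides)
        n = len(peptides)
        # Shannon entropy (bits) of the distribution of distinct k-mers at this position
        entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
        predominant, top = counts.most_common(1)[0]
        # Percentage of sequences carrying any k-mer other than the predominant one
        variant_pct = 100.0 * (n - top) / n
        profile.append((start, entropy, predominant, variant_pct))
    return profile


def stable_positions(profile, max_entropy=1.0):
    """Keep positions at or below the entropy threshold (1 in the abstract)."""
    return [row for row in profile if row[1] <= max_entropy]


if __name__ == "__main__":
    toy = ["ACDEFGHIKLM"] * 3 + ["ACDEFGHIRLM"]  # hypothetical 4-sequence alignment
    for start, h, pep, var in stable_positions(nonamer_profile(toy)):
        print(start, round(h, 2), pep, f"{var:.0f}% variants")
```

In the toy alignment one of four sequences carries a single substitution, so every nonamer overlapping it has an entropy of about 0.81 bits (stable by the ≤ 1 criterion) while its variant represents 25% of sequences, well above the ≤ 10% level the abstract reports as typical for WNV.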
Year: 2009 PMID: 19401763 PMCID: PMC2670515 DOI: 10.1371/journal.pone.0005352
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. General overview of the bioinformatics approach employed in this study.
Number of WNV protein sequences retrieved from NCBI and their maximum percentage amino-acid difference.
| WNV protein | Total length (aa) | No. of sequences analysed | % maximum amino-acid difference |
| C | 123 | 264 | 23 |
| prM | 167 | 417 | 19 |
| E | 497 | 927 | 12 |
| NS1 | 352 | 164 | 16 |
| NS2a | 231 | 143 | 20 |
| NS2b | 131 | 146 | 10 |
| NS3 | 619 | 146 | 10 |
| NS4a | 149 | 142 | 14 |
| NS4b | 256 | 141 | 8 |
| NS5 | 905 | 256 | 10 |
Sequences were retrieved from the NCBI Entrez Protein database on 28 June 2007.
Total length: approximate size in amino acids.
% maximum amino-acid difference: maximum percentage amino-acid difference for each WNV protein, computed using ClustalW [23].
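A rough equivalent of the "% maximum amino-acid difference" column can be computed from an existing multiple alignment. This sketch assumes that the difference between two sequences is the share of mismatched residues over columns where neither sequence has a gap (that is, 100 minus percent identity); ClustalW's own pairwise scores may be computed differently, so treat the result as an approximation. `percent_difference` and `max_percent_difference` are hypothetical helpers.

```python
from itertools import combinations


def percent_difference(a, b, gap="-"):
    """Percentage of mismatched residues over aligned columns where neither has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    mismatches = sum(1 for x, y in compared if x != y)
    return 100.0 * mismatches / len(compared)


def max_percent_difference(aligned):
    """Largest pairwise difference across all sequence pairs (quadratic in len(aligned))."""
    return max((percent_difference(a, b) for a, b in combinations(aligned, 2)), default=0.0)


if __name__ == "__main__":
    toy = ["ACDEFGHIKL", "ACDQFGHIKM", "ACD-FGHIKL"]  # hypothetical aligned sequences
    print(max_percent_difference(toy))
```

The all-pairs loop is quadratic: the 927 E sequences alone give about 429,000 pairs, which is workable for a one-off calculation but slow in pure Python.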
Figure 2. Peptide entropy plots for WNV protein alignments.
Figure 3. Percentage representation of nonamer variants in relation to the predominant nonamer peptide for all nonamer positions in WNV protein alignments.
Completely conserved sequence fragments (pan-WNV sequences) of WNV proteins.
| WNV protein | Length (aa) of each pan-WNV sequence |
| C | None |
| prM | 14, 10 |
| E | 11, 14, 9, 19, 12, 10, 12 |
| NS1 | 11, 9, 9, 10, 25, 11, 10, 10 |
| NS2a | 10, 15 |
| NS2b | 10, 14, 9, 11 |
| NS3 | 10, 10, 10, 10, 10, 12, 12, 13, 11, 12, 9, 27, 14, 12, 10, 10, 16, 11, 11, 11, 16, 9, 11, 9, 10 |
| NS4a | 15, 11, 13, 20 |
| NS4b | 9, 13, 9, 12, 10, 22 |
| NS5 | 10, 9, 17, 22, 10, 10, 9, 16, 12, 10, 13, 18, 29, 10, 18, 10, 12, 29, 16, 13, 10, 15, 23, 12, 19, 14, 18, 27, 21, 12 |
Numbers before and after each sequence give its start and end positions in the protein alignment.
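Completely conserved (pan-WNV) fragments can be read off an alignment as maximal runs of columns in which every sequence carries the same residue, kept when the run is at least nine residues long. The sketch below assumes that identity without gaps is required at every column and reports 1-based inclusive coordinates in the start-SEQ-end form described above; `conserved_fragments` is a hypothetical helper, not the authors' code.

```python
def conserved_fragments(aligned, min_len=9, gap="-"):
    """Maximal runs of columns identical (and gap-free) in every aligned sequence.

    Returns strings in 'start-SEQ-end' form with 1-based inclusive alignment positions.
    """
    length = len(aligned[0])
    fragments = []
    run_start = None
    for col in range(length + 1):  # one extra step flushes a run ending at the last column
        conserved = (
            col < length
            and aligned[0][col] != gap
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                fragments.append(f"{run_start + 1}-{aligned[0][run_start:col]}-{col}")
            run_start = None
    return fragments


if __name__ == "__main__":
    toy = ["MACDEFGHIKLWQ", "MACDEFGHIKLYQ"]  # hypothetical two-sequence alignment
    print(conserved_fragments(toy))
```

Because the runs are maximal, fragments from one protein never overlap, so their lengths can simply be added to give that protein's total conserved length.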
Number of pan-WNV sequences, their length in amino acids and percentage coverage of total protein length.
| WNV protein | Total length (aa) | No. of pan-WNV sequences | Combined pan-WNV length (aa) | % of total protein length |
| C | 123 | 0 | 0 | 0 |
| prM | 167 | 2 | 24 | 14 |
| E | 497 | 7 | 87 | 18 |
| NS1 | 352 | 8 | 95 | 27 |
| NS2a | 231 | 2 | 25 | 11 |
| NS2b | 131 | 4 | 44 | 34 |
| NS3 | 619 | 25 | 296 | 48 |
| NS4a | 149 | 4 | 59 | 40 |
| NS4b | 256 | 6 | 75 | 29 |
| NS5 | 905 | 30 | 464 | 51 |
| Total | 3430 | 88 | 1169 | 34 |
Approximate length indicated in number of amino acids, according to the reference protein sequence described in the .
Percentages are rounded to the nearest whole number.
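The summary columns follow directly from the fragment lengths: a protein's combined pan-WNV length is the sum of its fragment lengths (the fragments are assumed not to overlap, which is consistent with the reported totals), and coverage is that sum divided by the protein length, rounded to a whole percentage. The sketch below recomputes the table from the lengths listed for each protein in the table of completely conserved fragments; `PAN_WNV` and `summarize` are hypothetical names.

```python
# Each value is (total protein length in aa, [lengths of its pan-WNV sequences in aa]),
# transcribed from the table of completely conserved sequence fragments.
PAN_WNV = {
    "C": (123, []),
    "prM": (167, [14, 10]),
    "E": (497, [11, 14, 9, 19, 12, 10, 12]),
    "NS1": (352, [11, 9, 9, 10, 25, 11, 10, 10]),
    "NS2a": (231, [10, 15]),
    "NS2b": (131, [10, 14, 9, 11]),
    "NS3": (619, [10, 10, 10, 10, 10, 12, 12, 13, 11, 12, 9, 27, 14, 12, 10,
                  10, 16, 11, 11, 11, 16, 9, 11, 9, 10]),
    "NS4a": (149, [15, 11, 13, 20]),
    "NS4b": (256, [9, 13, 9, 12, 10, 22]),
    "NS5": (905, [10, 9, 17, 22, 10, 10, 9, 16, 12, 10, 13, 18, 29, 10, 18,
                  10, 12, 29, 16, 13, 10, 15, 23, 12, 19, 14, 18, 27, 21, 12]),
}


def summarize(table):
    """Return {protein: (number of fragments, combined length, % coverage)} plus a 'Total' row."""
    rows = {}
    for protein, (total, frags) in table.items():
        covered = sum(frags)  # valid only because the fragments do not overlap
        rows[protein] = (len(frags), covered, round(100 * covered / total))
    grand_total = sum(total for total, _ in table.values())
    n = sum(count for count, _, _ in rows.values())
    covered = sum(length for _, length, _ in rows.values())
    rows["Total"] = (n, covered, round(100 * covered / grand_total))
    return rows


if __name__ == "__main__":
    for protein, row in summarize(PAN_WNV).items():
        print(protein, *row)
```

Python's `round` sends exact halves to the even neighbour, but none of these ratios falls exactly on a half (E, for example, is 17.505%), so the output agrees with the table, including the overall 1169/3430 ≈ 34%.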
Reported biological properties of pan-WNV sequences.
| WNV protein | Functional domains and motifs | Putative post-translational modifications |
| E | Dimerisation domain | PKC, CKII |
| | Dimerisation domain, fusion loop | N-myristoylation |
| | - | N-myristoylation |
| | Immunoglobulin-like domain | CKII |
| | Immunoglobulin-like domain | - |
| | - | CKII |
| NS1 | - | CKII |
| | ATP/GTP-binding site motif A (P-loop) | - |
| | - | PKC, CKII |
| | - | N-myristoylation |
| NS2a | - | CKII |
| NS2b | - | N-myristoylation |
| NS3 | - | PKC |
| | Peptidase S7 | - |
| | Peptidase S7 | - |
| | Peptidase S7 | - |
| | Peptidase S7 | N-myristoylation |
| | DEAD/H domain | N-myristoylation |
| | DEAD/H domain | PKC |
| | DEAD/H domain | - |
| | DEAD/H domain | - |
| | - | TK |
| | - | PKC |
| | - | CKII, N-myristoylation |
| NS4a | - | N-myristoylation |
| | - | CKII |
| NS4b | - | N-glycosylation, CKII |
| NS5 | - | CKII, N-myristoylation |
| | - | PKC, N-myristoylation |
| | - | PKC |
| | - | CKII |
| | - | N-glycosylation, CKII |
| | RdRp | - |
| | RdRp | - |
| | RdRp | - |
| | - | N-glycosylation |
| | RdRp | - |
| | RdRp | Amidation |
| | RdRp | - |
| | RdRp | N-myristoylation |
| | RdRp | - |
| | RdRp | PKC |
| | RdRp, RdRp catalytic domain | CKII, N-myristoylation |
| | RdRp | - |
| | RdRp, RdRp catalytic domain | CKII |
| | RdRp | - |
| | RdRp | - |
| | RdRp | N-myristoylation |
| | RdRp | - |
| | RdRp | PKC |
Prosite (PS) and Pfam (PF) accession numbers: PS00001, N-glycosylation site; PS00005, Protein kinase C phosphorylation (PKC) site; PS00006, Casein kinase II (CKII) phosphorylation site; PS00007, Tyrosine kinase (TK) phosphorylation site; PS00008, N-myristoylation site; PS00009, Amidation site; PS00017, ATP/GTP-binding site motif A (P-loop); PS50507, RNA-directed RNA polymerase (RdRp) catalytic domain; PF00869, dimerisation domain; PF00949, Peptidase S7; PF00972, RNA-directed RNA polymerase (RdRp); PF02832, Immunoglobulin-like domain; PF07652, Flavivirus DEAD/H domain.
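PROSITE patterns such as those listed above are short regular expressions, so candidate modification sites can be located with an ordinary regex engine. The translation below covers only three of the listed entries (PS00001, PS00005 and PS00006), rewritten by hand from PROSITE syntax into Python `re` syntax; it is not ScanProsite and does not apply any filtering that PROSITE scanning tools may add. Patterns this short match often by chance, which is consistent with the table reporting such sites as putative. `PATTERNS` and `scan` are hypothetical names.

```python
import re

# Hand-translated subset of the PROSITE patterns listed above (PROSITE syntax in comments).
PATTERNS = {
    "PS00001 N-glycosylation": r"N[^P][ST][^P]",      # N-{P}-[ST]-{P}
    "PS00005 PKC phosphorylation": r"[ST].[RK]",       # [ST]-x-[RK]
    "PS00006 CKII phosphorylation": r"[ST].{2}[DE]",   # [ST]-x(2)-[DE]
}


def scan(sequence, patterns=PATTERNS):
    """Return 1-based start positions of every (possibly overlapping) pattern match."""
    hits = {}
    for name, regex in patterns.items():
        # Wrapping the pattern in a lookahead makes matches zero-width, so overlaps are kept
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={regex})", sequence)]
    return hits


if __name__ == "__main__":
    print(scan("RNPLSRNSTHEMYWVS"))  # an NS5 pan-WNV sequence from the transgenic-mouse table
```

For that NS5 fragment the scan reports an N-glycosylation motif starting at position 7 (NSTH), a CKII motif at position 8 (STHE) and no PKC motif; positions are counted from the start of the input string, not the alignment.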
Figure 4. Number of pan-WNV sequences conserved in other flaviviruses.
Figure 5. Number of flaviviruses shared by the pan-WNV sequences.
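Figures 4 and 5 summarize how many pan-WNV sequences also occur in other flaviviruses and how many viruses share each sequence. Assuming that "occur" means an identical substring match (the authors' search procedure may differ), the tallies can be sketched as below. The virus names and protein strings in the usage example are hypothetical placeholders, not real viral data, and `shared_with` and `sharing_summary` are hypothetical helpers.

```python
from collections import Counter


def shared_with(fragments, proteomes):
    """Map each fragment to the sorted names of proteomes containing it verbatim.

    `proteomes` maps a virus name to a list of protein sequences (hypothetical input).
    """
    return {
        frag: sorted(name for name, proteins in proteomes.items()
                     if any(frag in protein for protein in proteins))
        for frag in fragments
    }


def sharing_summary(shared):
    """Tallies in the spirit of Figures 4 and 5."""
    found = sum(1 for names in shared.values() if names)  # fragments seen in >= 1 other virus
    per_virus = Counter(name for names in shared.values() for name in names)
    per_fragment = Counter(len(names) for names in shared.values())  # viruses per fragment
    return found, per_virus, per_fragment


if __name__ == "__main__":
    proteomes = {  # hypothetical placeholder data
        "virus_A": ["XXGCGLFGKGSXX"],
        "virus_B": ["GCGLFGKGS", "QELEPPFGDSYIVQ"],
    }
    print(sharing_summary(shared_with(["GCGLFGKGS", "ELEPPFGDSYIV"], proteomes)))
```

Exact substring search is the strictest possible notion of "present"; a similarity search would report more hits and would need an explicit identity threshold.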
WNV sequences with human T-cell epitopes elucidated by other studies.
| WNV protein | Pan-WNV sequence | Epitope sequence | T cells | HLA restriction | Reference(s) |
| NS3 | | | CD4 | DR2 | |
| | | | CD8 | B07 | |
| NS5 | | | - | - | 1021472 |
Epitope amino acids matching the pan-WNV sequences are underlined.
1021472 is an accession number of a record in the Immune Epitope Database.
Figure 6. Putative HLA supertype-restricted, pan-WNV T-cell epitopes predicted by computational algorithms.
Pan-WNV sequences with human T-cell epitopes identified by use of HLA transgenic mice.
| WNV protein | Pan-WNV sequence | ELISpot activation peptide | ELISpot-positive HLA transgenic mice |
| prM | 125-ESWILRNPGYALVA-138* | LVKT | DR2 & DR4 |
| | | | A24, B7 & DR2 |
| | 158-LLLLVAPAYS-167* | RVVFVV | A2, DR2, DR3 & DR4 |
| E | 104-GCGLFGKGSIDTCA-117* | RGWGN | DR3 & DR4 |
| | 293-LKGTTYGVC-301* | EKLQ | DR4 |
| | 370-ELEPPFGDSYIV-381 | KVLI | DR4 |
| | 449-LFGGMSWITQGL-460* | FRS | A2, DR2 & DR3 |
| NS1 | 209-TWKLERAVLGEVKSCTWPETHTLWG-233* | RLND | DR4 |
| | 276-DFDYCPGTTVT-286* | EGRVEI | DR4 |
| | 313-CRSCTLPPLR-322 | GKLITDWC | DR3 & DR4 |
| | 328-GCWYGMEIRP-337 | S | DR4 |
| NS2a | 69-NSGGDVVHLALMATF-83* | FAES | DR4 |
| NS2b | 1-GWPATEVMTA-10* | | DR4 |
| | 108-SAYTPWAILPS-118* | I | A24, B7 & DR4 |
| NS3 | 1-GGVLWDTPSP-10 | | B7 & DR4 |
| | 52-TTKGAALMSG-61 | WH | DR3 |
| | 74-EDRLCYGGPW-83 | GSVK | A2 |
| | 145-DVIGLYGNGVIMP-157 | PIVDKNG | A2 |
| | | | A2 |
| | 161-YISAIVQGERM-171* | | A2 & DR2 |
| | 191-VLDLHPGAGKTR-202 | MLRKKQIT | A2 & DR2 |
| | | | DR2 |
| | 235-ALRGLPIRY-243 | VAAEMAE | DR4 |
| | | E | DR4 |
| | 256-EIVDVMCHATLTHRLMSPHRVPNYNLF-282* | PREHNGN | A2, DR2 & DR4 |
| | | | DR2 |
| | | | A2 & DR2 |
| | 310-AAAIFMTATPPG-321 | KVELGE | A2 |
| | 337-QTEIPDRAWN-346 | L | A2 |
| | 422-RVIDSRKSVKP-432 | EMGANFKAS | A2 |
| | 470-GDEYCYGGHTNEDDSN-485 | | A2 & DR3 |
| | 487-AHWTEARIM-495 | | A2 & DR3 |
| | 526-LRGEERKNFLE-536 | EYR | A2 & DR2 |
| | 563-WCFDGPRTNT-572* | DRR | DR3 |
| NS4a | 19-KTWEALDTMYVVATA-33* | HFMG | DR2 & DR4 |
| | 115-MIVLIPEPEKQRSQTDNQLA-134* | | DR4 |
| NS4b | 39-PATAWSLYA-47 | GEFLLDLR | DR2 |
| | | | DR2 & DR3 |
| | 68-TSLTSINVQASAL-80* | DYIN | DR3 & DR4 |
| | 208-VTLWENGASSVWNATTAIGLCH-229* | LITAAA | DR3 & DR4 |
| NS5 | 107-GPGHEEPQLVQSYGWNIVTMKS-128* | | DR3 |
| | 152-SSAEVEEHRT-161 | CDIGESS | B7 |
| | | | A2, B7 & DR2 |
| | 208-RNPLSRNSTHEMYWVS-223* | | DR2 |
| | 451-TCIYNMMGKREK-462 | ECH | A2 |
| | 472-GSRAIWFMWLGARFLEFEALGFLNEDHWL-500 | AK | A24 |
| | | | A24 |
| | 596-ISREDQRGSGQVVTYALNTFTNL-618* | | DR2 |
| | | | DR2 & DR4 |
| | 620-VQLVRMMEGEGV-631* | NTFTNLA | DR4 |
| | 704-GWYDWQQVPFCSNHFTEL-721* | | DR4 |
| | 741-GRARISPGAGWNVRDTACLAKSYAQMW-767 | | A24 |
| | 769-LLYFHRRDLRLMANAICSAVP-789* | YAQMWL | B7 & DR4 |
| | 792-WVPTGRTTWSIH-803 | N | DR4 |
Pan-WNV sequences that are predicted, either by Multipred or TEPITOPE, to contain at least one HLA-DR supertype-restricted binding nonamer are indicated by an asterisk (*).
Epitope amino acids matching the pan-WNV sequences are underlined.
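Entries such as 161-YISAIVQGERM-171* in the table above combine alignment coordinates, a sequence and an optional asterisk. A minimal parser, sketched below with hypothetical names (`parse_entry`, `nonamers`), can confirm that the positions agree with the sequence length and list the overlapping nonamer windows a fragment contains (a fragment of length L has L - 8 of them). It does not reproduce Multipred or TEPITOPE; it only enumerates candidate nonamers and reads the asterisk as the footnote defines it.

```python
import re

# start-SEQ-end, optionally followed by '*' (predicted HLA-DR binding nonamer present)
ENTRY = re.compile(r"(\d+)-([A-Z]+)-(\d+)(\*?)")


def parse_entry(text):
    """Parse 'start-SEQ-end[*]' and check that the positions match the sequence length."""
    m = ENTRY.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    start, seq, end = int(m.group(1)), m.group(2), int(m.group(3))
    if end - start + 1 != len(seq):
        raise ValueError(f"positions {start}-{end} imply {end - start + 1} residues, got {len(seq)}")
    return start, seq, end, m.group(4) == "*"


def nonamers(start, seq, k=9):
    """Overlapping k-mers with their 1-based alignment start positions."""
    return [(start + i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    start, seq, end, flagged = parse_entry("161-YISAIVQGERM-171*")
    print(flagged, nonamers(start, seq))
```

Running every entry through `parse_entry` is a cheap consistency check on transcribed tables, since a dropped or duplicated residue immediately breaks the start/end arithmetic.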