Pragya D Yadav1, Savita Patil1, Santoshkumar M Jadhav2, Dimpal A Nyayanit1, Vimal Kumar1, Shilpi Jain1, Jagadish Sampath1, Devendra T Mourya1, Sarah S Cherian3.
Abstract
Kyasanur Forest Disease (KFD) has become a major public health problem in the State of Karnataka, India, where the disease was first identified, and in Tamil Nadu, Maharashtra, Kerala, and Goa, which together cover the Western Ghats region of India. The incidence of positive cases and the distribution of the Kyasanur Forest Disease virus (KFDV) across different geographical regions raise the need to understand its evolution and spatiotemporal transmission dynamics. A phylogeographic analysis was therefore undertaken, based on 48 whole genomes (46 from this study) and an additional 28 E-gene sequences of KFDV isolated from different regions over the period 1957-2017. The mean evolutionary rate based on the E-gene was marginally higher than that based on the whole genomes. A subgroup of KFDV strains (2006-2017), differing from the early Karnataka strains (1957-1972) by ~2.76% in their whole genomes and representing spread to different geographical areas, diverged around 1980. Dispersal from Karnataka to Goa and Maharashtra was indicated. Maharashtra has represented a new source for transmission of KFDV since ~2013. Significant evidence of adaptive evolution at site 123 A/T, located in the vicinity of the envelope protein dimer interface, may have functional implications. The findings indicate the need to curtail the spread of KFDV through surveillance measures and improved vaccination strategies.
Year: 2020 PMID: 32029759 PMCID: PMC7005018 DOI: 10.1038/s41598-020-58242-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of the nucleotide and amino acid divergence in the different gene regions and proteins of KFDV based on whole genome sequences (n = 48) during the different time frames.
| Region/Gene | Gene length (nucleotides) | Protein length (amino acids) | Percent divergence within recent isolates (2006 to 2017): gene | Percent divergence within recent isolates (2006 to 2017): protein | Percent divergence within old isolates (1957 to 1972): gene | Percent divergence within old isolates (1957 to 1972): protein | Percent divergence between recent and old isolates: gene | Percent divergence between recent and old isolates: protein | Overall percent divergence: gene | Overall percent divergence: protein | Gene-wise dN/dS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Genome | 10248 | 3416 | 2.24 | 0.77 | 0.37 | 0.3 | 2.76 | 0.86 | 2.24 | 0.75 | — |
| C | 351 | 117 | 2.98 | 2.26 | 0.35 | 0.66 | 3.27 | 3.17 | 2.79 | 2.47 | 0.1917 |
| prM | 492 | 164 | 2.36 | 1.52 | 0.22 | 0.28 | 2.46 | 1.66 | 2.15 | 1.44 | 0.1136 |
| E | 1488 | 496 | 2.65 | 0.48 | 0.44 | 0.31 | 3.22 | 0.53 | 2.63 | 0.48 | 0.0274 |
| NS1 | 1059 | 353 | 2.08 | 0.75 | 0.4 | 0.4 | 2.7 | 1.18 | 2.15 | 0.9 | 0.0643 |
| NS2 | 1083 | 361 | 2.52 | 1.24 | 0.44 | 0.53 | 2.91 | 1.4 | 2.44 | 1.23 | 0.0881 |
| NS3 | 1863 | 621 | 2.05 | 0.56 | 0.3 | 0.27 | 2.41 | 0.6 | 2 | 0.54 | 0.0408 |
| NS4 | 1203 | 401 | 2.21 | 0.65 | 0.4 | 0.32 | 2.8 | 0.57 | 2.25 | 0.57 | 0.0406 |
| NS5 | 2709 | 903 | 1.99 | 0.62 | 0.35 | 0.13 | 2.68 | 0.57 | 2.09 | 0.54 | 0.0369 |
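The record does not say how the percent divergence values were computed. As a rough illustration only, the sketch below assumes an uncorrected pairwise distance (the percentage of differing sites between aligned sequences), averaged within a group or between two groups. The function names and the toy sequences are hypothetical and are not KFDV data.

```python
from itertools import combinations, product


def p_distance(a, b, missing="-?"):
    """Percent of differing sites between two aligned sequences.

    Sites where either sequence has a character listed in `missing` are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differences / compared


def mean_within(group):
    """Mean pairwise percent divergence among sequences of one group."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b):
    """Mean pairwise percent divergence between sequences of two groups."""
    pairs = list(product(group_a, group_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (made up for illustration, not real isolates).
old_isolates = ["ACGTACGTAC", "ACGTACGTAT"]
recent_isolates = ["ACGAACGTGC", "ACGAACGTGT"]

print(mean_within(old_isolates))                    # 10.0
print(mean_between(old_isolates, recent_isolates))  # 25.0
```

The same functions apply to protein alignments; only the set of characters treated as missing may need adjusting.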
Species-wise comparison of the nucleotide and amino acid divergence in the different gene regions and proteins of KFDV based on whole genome sequences (n = 48).
| Region | Within host, Monkey (M): gene | Within host, Monkey (M): protein | Within host, Tick (T): gene | Within host, Tick (T): protein | Within host, Human (H): gene | Within host, Human (H): protein | Between hosts, M vs T: gene | Between hosts, M vs T: protein | Between hosts, M vs H: gene | Between hosts, M vs H: protein | Between hosts, T vs H: gene | Between hosts, T vs H: protein |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genome | 2.04 | 0.68 | 1.90 | 0.54 | 2.77 | 0.79 | 2.06 | 0.63 | 2.23 | 0.78 | 2.39 | 0.79 |
| C | 2.84 | 2.62 | 2.48 | 2.46 | 2.69 | 2.20 | 2.86 | 2.77 | 2.69 | 2.40 | 3.00 | 2.66 |
| prM | 1.81 | 1.02 | 1.66 | 0.86 | 2.29 | 1.64 | 1.81 | 0.97 | 2.12 | 1.53 | 2.31 | 1.51 |
| E | 2.50 | 0.38 | 2.36 | 0.30 | 2.58 | 0.52 | 2.54 | 0.34 | 2.60 | 0.51 | 2.48 | 0.53 |
| NS1 | 1.84 | 0.88 | 1.77 | 0.69 | 2.24 | 0.86 | 1.87 | 0.78 | 2.16 | 0.99 | 2.29 | 0.95 |
| NS2 | 2.26 | 1.02 | 2.02 | 0.69 | 2.55 | 1.46 | 2.25 | 0.87 | 2.40 | 1.31 | 2.54 | 1.20 |
| NS3 | 1.73 | 0.43 | 1.75 | 0.46 | 2.02 | 0.55 | 1.83 | 0.46 | 1.93 | 0.53 | 2.19 | 0.61 |
| NS4 | 2.09 | 0.59 | 1.75 | 0.22 | 2.34 | 0.70 | 2.01 | 0.43 | 2.30 | 0.63 | 2.34 | 0.51 |
| NS5 | 1.93 | 0.55 | 1.79 | 0.43 | 2.11 | 0.52 | 1.93 | 0.52 | 2.11 | 0.54 | 2.22 | 0.59 |
Selection pressure analysis of KFDV isolates using the methods (SLAC, FEL, REL and MEME) available in the Datamonkey server.
| Amino acid position in genome | Gene | Amino acid position in gene | MH 1721699 M 2017 (variable | SLAC p-value | FEL p-value | REL Bayes Factor | MEME p-value | FUBAR Post. Pr. |
|---|---|---|---|---|---|---|---|---|
| 4 | C | 4 | G(R) | 0.654 | 0.314 | 1.78 | 0.093 | 0.482 |
| 12 | C | 12 | G(A) | 0.963 | 0.554 | 0.284 | 0.095 | 0.088 |
| 22 | C | 22 | T(P) | 0.667 | 0.374 | 1.707 | 0.015 | 0.446 |
| 55 | C | 55 | R(T) | 0.707 | 0.472 | 1.461 | 0.052 | 0.444 |
| 118 | PrM | 1 | A(V) | 0.667 | 0.516 | 1.345 | 0.071 | 0.421 |
| V(I,A) | 0.297 | 0.883 | ||||||
| A(T) | 0.132 | |||||||
| 710 | E | 429 | M(V,I) | 0.752 | 0.552 | 0.524 | 0.719 | |
| 763 | E | 482 | I(T) | 0.904 | 0.743 | 0.1 | 0.215 | |
| S(N,R) | 0.359 | 0.886 | ||||||
| 1228 | NS2 | 98 | K(V) | 0.589 | 0.386 | 38.482 | 0.72 | |
| G(S,C,D) | 0.298 | 0.044 | 705.118 | 0.939 | ||||
| 1494 | NS3 | 3 | L(V) | 0.889 | 0.795 | 0.093 | 0.219 | |
| 1495 | NS3 | 4 | V(G) | 0.963 | 0.274 | 0.139 | 0.063 | |
| 1496 | NS3 | 5 | F(S,V) | 0.504 | 0.259 | 5.036 | 0.617 | |
| 1501 | NS3 | 10 | T(G) | 0.84 | 0.59 | 4.893 | 0.194 | |
| 1504 | NS3 | 13 | E(G,A) | 0.916 | 0.335 | 14.811 | 0.078 | |
| R(G,T) | 0.351 | 0.208 | 0.875 | |||||
| 1712 | NS3 | 221 | G(R) | 0.328 | 0.18 | 0.199 | 0.892 | |
| 1726 | NS3 | 235 | K(R) | 0.731 | 0.502 | 1.427 | 0.438 | |
| 1842 | NS3 | 351 | R(K) | 0.686 | 0.363 | 1.653 | 0.464 | |
| 1892 | NS3 | 401 | L(P) | 0.378 | 0.26 | 2.022 | 0.498 | |
| 2018 | NS3 | 527 | L(K) | 0.282 | 0.398 | 16.093 | 0.678 | |
| 2145 | NS4 | 33 | E(A) | 0.729 | 0.472 | 1.503 | 0.449 | |
| 2186 | NS4 | 74 | R(I) | 0.687 | 0.668 | 1.088 | 0.427 | |
| 2188 | NS4 | 76 | S(F) | 0.845 | 0.093 | 0.171 | 0.068 | |
| 2190 | NS4 | 78 | S(N) | 0.505 | 0.161 | 13.335 | 0.689 | |
| 2193 | NS4 | 81 | F(I,C) | 0.505 | 0.259 | 5.032 | 0.617 | |
| 2253 | NS4 | 141 | F(L) | 0.71 | 0.358 | 1.867 | 0.448 | |
| 2268 | NS4 | 156 | D(E) | 0.673 | 0.521 | 0.67 | 0.665 | |
| 2474 | NS4 | 362 | T(S) | 0.667 | 0.313 | 2.034 | 0.468 | |
| 2875 | NS5 | 362 | T(P) | 0.667 | 0.313 | 2.034 | 0.468 |
Values that meet the statistical significance level* are shown in bold font. Sites identified as being under positive selection pressure by at least two of the methods are also shown in bold.
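The "at least two of the methods" rule can be sketched as a simple vote across methods. The significance cut-offs are not given in this record (the footnote marked with * is missing), so the values in `ASSUMED_CUTOFFS` are placeholders chosen for illustration, and the function names are hypothetical. The first example row uses the values listed for genome position 22; the second is a made-up toy row.

```python
# Placeholder cut-offs: NOT taken from the source.
# "max" means the value must be at or below the cut-off (p-values);
# "min" means the value must be at or above it (Bayes factor, posterior probability).
ASSUMED_CUTOFFS = {
    "SLAC": ("max", 0.1),
    "FEL": ("max", 0.1),
    "MEME": ("max", 0.1),
    "REL": ("min", 50.0),
    "FUBAR": ("min", 0.9),
}


def supporting_methods(site):
    """Return the methods whose statistic passes the assumed cut-off for a site."""
    hits = []
    for method, (kind, cutoff) in ASSUMED_CUTOFFS.items():
        value = site.get(method)
        if value is None:  # statistic not reported for this site
            continue
        if kind == "max" and value <= cutoff:
            hits.append(method)
        elif kind == "min" and value >= cutoff:
            hits.append(method)
    return hits


def is_positively_selected(site, min_methods=2):
    """Apply the consensus rule: at least `min_methods` methods must agree."""
    return len(supporting_methods(site)) >= min_methods


# Values as listed in the table for genome position 22 (gene C).
site_22 = {"SLAC": 0.667, "FEL": 0.374, "REL": 1.707, "MEME": 0.015, "FUBAR": 0.446}
# Made-up toy row, only to show a site that passes the rule.
toy_site = {"SLAC": 0.5, "FEL": 0.04, "REL": 700.0, "MEME": 0.2, "FUBAR": 0.95}

print(supporting_methods(site_22), is_positively_selected(site_22))
print(supporting_methods(toy_site), is_positively_selected(toy_site))
```

Under these assumed cut-offs, position 22 is supported by MEME alone and so would not count as positively selected, which illustrates why the consensus rule is stricter than any single test.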
Figure 1. Homology model and positively selected site of KFDV E protein dimer. Homology model of the E protein dimer of KFDV and mapping of the positively selected site 123 A. The E protein domains I, II and III are presented in red, yellow and violet, with the fusion loop in orange. The dimer interface residues are shown in green. The 123 A residue is displayed in a ball-and-stick representation in each monomeric unit.
Estimates of substitution rates and root ages for KFDV sequences based on different clock models for the two datasets (a) Whole genomes (b) E-gene sequences.
| Clock model | Posterior | Marginal likelihood | Mean substitution rate × 10⁻⁴ (95% HPD) (substitutions per site per year) | Root age (95% HPD) (years before the most recent sample) | Root age (95% HPD) (calendar date) |
|---|---|---|---|---|---|
| a) Whole genomes | |||||
| Uncorrelated exponential | −25106.841 [−25134.062, −25079.784] | −22174.448 (±0.07) | 4.2 (3.2-5.35) | 64.0421 (60.70, 69.4) | November 1952 [June-1947, April-1956] |
| Uncorrelated lognormal | −25112.401 [−25136.159, −25088.453] | −22177.365 (±0.06) | 3.9 (3.16-4.67) | 63.5689 (61.07, 67.57) | June 1953 [May-1949, April-1955] |
| b) E-gene sequences | |||||
| Uncorrelated exponential | −6903.16 [−6945.708, −6861.445] | −3556.612 (±0.18) | 5.4 (3.59-7.43) | 64.82 (60.60-71.43) | February 1952 [June-1945, May-1956] |
| Uncorrelated lognormal | −6922.692 [−6959.007, −6886.008] | −3563.871 (±0.16) | 4.6 (3.3-6.05) | 64.88 (61.14-70.2) | January 1952 [October-1946, October-1955] |
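The two root-age columns are related by simple arithmetic: the calendar date is the date of the most recent sample minus the root age in years. The sketch below shows that conversion, plus the per-genome rate implied by a per-site rate. The date of the most recent sample is not stated in this record, so `LATEST_SAMPLE_YEAR = 2017.0` is an assumption, and the function names are hypothetical.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def root_date(years_back, latest_sample_year):
    """Convert a root age (years before the latest sample) to (year, month)."""
    decimal_year = latest_sample_year - years_back
    year = int(decimal_year)
    # Map the fractional part of the year onto one of 12 equal-length months.
    month_index = min(11, int((decimal_year - year) * 12))
    return year, MONTHS[month_index]


def substitutions_per_genome_per_year(rate_per_site, genome_length):
    """Expected substitutions per genome per year from a per-site rate."""
    return rate_per_site * genome_length


# Assumption: the most recent sample is dated to the start of 2017.
LATEST_SAMPLE_YEAR = 2017.0

# Whole genomes, uncorrelated lognormal clock: root age 63.5689 years.
print(root_date(63.5689, LATEST_SAMPLE_YEAR))  # (1953, 'June')

# Mean rate 3.9 x 10^-4 substitutions per site per year over 10248 nucleotides:
# roughly 4 substitutions per genome per year.
print(substitutions_per_genome_per_year(3.9e-4, 10248))
```

The month obtained this way is sensitive to the assumed sampling date, so a shift of a few weeks in `LATEST_SAMPLE_YEAR` can move the result into an adjacent month.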
Figure 2. Maximum Clade Credibility (MCC) tree of KFDV whole genomes. MCC tree for KFDV based on whole genome sequences (n = 48). Key nodes are labeled. The circles at the nodes indicate the posterior clade probabilities, with size reflecting the confidence. The 95% HPD limits of the tMRCA (Time to the Most Recent Common Ancestor) estimates are indicated as the translucent horizontal bars at the nodes. The numbers at the nodes correspond to the ancestral states with their probabilities. The branches are colored according to the respective ancestral geographical region. The geographical regions are represented by 2-letter codes (GA: Goa; KA: Karnataka; KL: Kerala; MH: Maharashtra; TN: Tamil Nadu).
Figure 3. Maximum Clade Credibility (MCC) tree of KFDV E gene. MCC tree for KFDV based on envelope gene sequences (n = 76). Key nodes are labeled. The circles at the nodes indicate the posterior clade probabilities, with size reflecting the confidence. The 95% HPD limits of the tMRCA (Time to the Most Recent Common Ancestor) estimates are indicated as the translucent horizontal bars at the nodes. The numbers at the nodes correspond to the ancestral states with their probabilities. The branches are colored according to the respective ancestral geographical region. The geographical regions are represented by 2-letter codes (GA: Goa; KA: Karnataka; KL: Kerala; MH: Maharashtra; TN: Tamil Nadu).
Figure 4. Bayesian skyline plot. Bayesian skyline plot for KFDV isolates based on (a) whole genome sequences and (b) E-gene sequences.
Figure 5. Migratory pathways and migration times of KFDV. Schematic representation of the plausible migratory pathways and migration times of KFDV on a map of the study area. Numbers along the arrows indicate the year of migration and introduction into the geographic locations representing the specific Indian State. Map not to scale.