Alice Michie, Vijaykrishna Dhanasekaran, Michael D A Lindsay, Peter J Neville, Jay Nicholson, Andrew Jardine, John S Mackenzie, David W Smith, Allison Imrie.
Abstract
Ross River virus (RRV), an alphavirus of the Togaviridae family, is the most medically significant mosquito-borne virus of Australia. Past RRV phylogenetic and evolutionary analyses have been based on partial genome analyses only. Three geographically distinct RRV lineages, the Eastern, the Western, and the supposedly extinct North-Eastern lineage, were classified previously. We sought to expand on past phylogenies through robust genome-scale phylogeny to better understand RRV genetic diversity and evolutionary dynamics. We analyzed 106 RRV complete coding sequences, which included 13 genomes available on NCBI and 94 novel sequences derived for this study, sampled throughout Western Australia (1977-2014) and during the substantial Pacific Islands RRV epidemic (1979-1980). Our final data set comprised isolates sampled over 59 years (1959-2018) from a range of locations. Four distinct genotypes were defined, with the newly described genotype 4 (G4) found to be the contemporary lineage circulating in Western Australia. The prior geographical classification of RRV lineages was not supported by our findings, with evidence of geographical and temporal cocirculation of distinct genetic groups. Bayesian Markov chain Monte Carlo (MCMC) analysis revealed that RRV lineages diverged from a common ancestor approximately 94 years ago, with distinct lineages emerging roughly every 10 years over the past 50 years in periodic bursts of genetic diversity. Our study has enabled a more robust analysis of RRV evolutionary history and resolved greater genetic diversity than had been previously defined by partial E2 gene analysis.
IMPORTANCE Ross River virus (RRV) causes the most common mosquito-borne infection in Australia and imposes a significant burden on infected individuals and on the Australian economy. The genetic diversity of RRV and its evolutionary history have so far been studied only through partial E2 gene analysis of a limited number of isolates. Robust whole-genome analysis has not yet been conducted. This study generated 94 novel near-whole-genome sequences to investigate the evolutionary history of RRV and to better understand its genetic diversity through comprehensive whole-genome phylogeny. A better understanding of RRV genetic diversity will enable better diagnostics, surveillance, and potential future vaccine design.
Keywords: Aedes camptorhynchus; Western Australia; alphavirus; arbovirus; evolutionary analysis; mosquito; phylogeny
Year: 2020 PMID: 31666378 PMCID: PMC6955267 DOI: 10.1128/JVI.01234-19
Source DB: PubMed Journal: J Virol ISSN: 0022-538X Impact factor: 5.103
FIG 1. Sampling locations of RRV isolates sequenced in this study. (A) Map indicates the boundaries of the major Western Australian regions. (B to E) Black circles indicate where mosquito-derived isolates were sampled, while blue stars indicate locations of human-derived RRV isolate sampling. All maps were generated using ArcGIS (ESRI).
The length (including gaps), pairwise nucleotide and amino acid identities, and base frequencies of individual genes within the 106-taxon Ross River virus data set
| Gene region | Gene region length including gaps (nt/aa) | Avg pairwise identity (nt/aa) (%) | Base frequencies (A, C, G, T) (%) |
|---|---|---|---|
| nsP1 | 1,602/534 | 98.4/99.0 | (28.8, 24.7, 26.7, 19.9) |
| nsP2 | 2,394/798 | 98.3/99.5 | (28.1, 24.4, 26.4, 21.2) |
| nsP3 | 1,650/550 | 95.2/96.5 | (24.0, 26.3, 29.0, 20.7) |
| nsP4 | 1,833/611 | 97.9/99.2 | (28.5, 23.9, 26.5, 21.1) |
| C | 810/270 | 98.5/99.1 | (33.4, 24.9, 26.7, 15.1) |
| 6K | 180/60 | 98.8/98.1 | (19.8, 25.4, 25.5, 29.3) |
| E1 | 1,314/438 | 98.1/99.6 | (26.3, 26.6, 26.4, 20.7) |
| E2 | 1,266/422 | 98.0/99.3 | (25.2, 28.2, 26.9, 19.8) |
| E3 | 192/64 | 97.2/99.5 | (24.8, 32.2, 22.3, 20.6) |
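
The two summary statistics in this table can be illustrated with a minimal Python sketch. The toy alignment below is hypothetical and is not RRV data, and the gap handling (ignoring columns where either sequence has a gap) is an assumption; the software and settings used to produce the table values are not stated here.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def average_pairwise_identity(seqs):
    """Mean percent identity over all unordered pairs of aligned sequences."""
    scores = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return sum(scores) / len(scores)


def base_frequencies(seqs):
    """Percent of A, C, G, and T across all sequences, ignoring gaps and other symbols."""
    counts = {base: 0 for base in "ACGT"}
    for seq in seqs:
        for ch in seq:
            if ch in counts:
                counts[ch] += 1
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


# Hypothetical toy alignment (three aligned 9-nt sequences), for illustration only.
toy_alignment = ["ATGGCGAAA", "ATGGCGAAG", "ATGACGAAA"]
avg_identity = average_pairwise_identity(toy_alignment)
freqs = base_frequencies(toy_alignment)
print(f"average pairwise identity: {avg_identity:.1f}%")
print({base: round(pct, 1) for base, pct in freqs.items()})
```

Applied per gene region, the same two functions would yield one identity value and one set of base frequencies per row.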
FIG 2. Maximum likelihood phylogeny (RAxML) reconstruction of 106 RRV whole-genome sequences. Virus nomenclature includes the strain name, location of collection, species of origin, and the year of sampling. GenBank accession numbers are provided for sequences derived from NCBI. Taxa are colored based on their geographical origin. Bootstrap support values >70% are presented above nodes.
FIG 3. Maximum clade credibility (MCC) tree of 106 dated Ross River virus whole-genome sequences, estimated under an uncorrelated lognormal (UCLN) molecular clock, assuming a GTR + G + I nucleotide substitution model. Clades are colored by the geographical origin of the taxa. Posterior probability values of >0.70 are shown above branches. Nodes defining major genetic groups are named A to H and are referenced in the lower table. Calendar years are shown on the x axis. The table presents the divergence time (time to most recent common ancestor [tMRCA]) of distinct nodes (A to H) and the nucleotide substitution rates, with statistical error reported as the 95% highest posterior density (95% HPD).
Codon sites with significant evidence of positive selection pressure
| Codon site | Values from selection detection methods (FEL, SLAC, MEME, or FUBAR; method not specified) | Amino acid substitution and genome location | Sequences with derived amino acid state |
|---|---|---|---|
| Nonstructural polyproteins | | | |
| 248 | | I248T, nsP1 | G1: 9057, 8961; G2: SW12358, K3011, WK20, AN572.2; G3: AN572.1, AN205, SW2089, P42134, P41472, P41453, P42161, P42273, P42115, P41971; G4: P5131, QML1, SW99359, SE1168, DC59627, SW94735, MIDI13, SW94735, SW97414, DC55607, RRV_TT, K79390, K78118, K76352, K80776, K80535, K67847, K65195, K61297, DC36486, SW72780, SW72209, SW64247, DC29695 |
| 441 | 0.085, 0.158 | K441E, nsP1 | G2: WK20; G3: AN205, P41472, F9073, 218100, 218072; G4: MIDI86 |
| 1165 | 0.080, 1.00 | A333T, nsP3 | G3: AN205; G4: DC40243, K80535, K80776 |
| | | A333V, nsP3 | G4: DC30218, SW64247, DC36664, SW71961, DC36025 |
| Structural polyproteins | | | |
| 329 | 0.196, 0.06 | G59R, E3 | G1: 2982, T48, 3078, 2975, 8961, 9057; G3: K2505, SHLS735, AN72.1 |
| | | G59E, E3 | G4: DC40243, SW71959, SW72718, SW72780, SW72961, SW72209, SW83959 |
| 929 | 0.208, 0.06 | V113I, E1 | All G3 (except AN572.1); all G4 (except K50610, K51670, DC55607, SE1168, SW97414, DC59627, SW94735, SW99359, K67847, K80535, K80776, MIDI32, MIDI86) |
Codon site refers to the amino acid position within either the nonstructural or structural polyprotein.
FEL, fixed-effect likelihood; SLAC, single-likelihood ancestor counting; MEME, mixed-effects model of evolution; FUBAR, fast, unconstrained Bayesian approximation. Significant selection sites confirmed by at least two of these methods as well as the isolates that contain the corresponding amino acid substitution are in boldface font.
FIG 4. Bayesian skyline plot showing fluctuations in relative RRV effective population size (y axis) through time, in calendar years (x axis). The center line shows the mean estimate of effective population size, with the upper and lower lines showing statistical error as the 95% highest posterior density (95% HPD).
RRV isolates with observed deletions within the hypervariable domain of the nsP3 gene
| Isolate name | Virus genotype | Genomic location of deletion (nt position) | P*P*PR motif affected | Size of deletion (nt) |
|---|---|---|---|---|
| DC5692 | 2 | 5414–5416 | None | 3 |
| SW2191 | 2 | 5378–5416 | 2 | 39 |
| K1198 | 2 | 5378–5416 | 2 | 39 |
| K3011 | 2 | 5309–5383 | None | 75 |
| SW24015 | 2 | 5336–5416 | 2 | 81 |
| SW29862 | 2 | 5338–5418 | 2 | 81 |
| SW2089 | 3 | 5414–5446 | 3 | 33 |
| P42134 | 3 | 5380–5415 | 2 | 36 |
| SW42256 | 4 | 5380–5415 | 2 | 36 |
| SW83959 | 4 | 5380–5415 | 2 | 36 |
| K50081 | 4 | 5380–5415 | 2 | 36 |
| SW94735 | 4 | 5379–5414 | 2 | 36 |
| DC7053 | 4 | 5377–5415 | 2 | 39 |
| DC29695 | 4 | 5377–5415 | 2 | 39 |
| DC36486 | 4 | 5377–5415 | 2 | 39 |
| SW72209 | 4 | 5375–5413 | 2 | 39 |
| K65195 | 4 | 5356–5415 | 2 | 60 |
| P5131 | 4 | 5356–5415 | 2 | 60 |
| SW72780 | 4 | 5356–5415 | 2 | 60 |
| DC55607 | 4 | 5338–5415 | 2 | 78 |
| K79390 | 4 | 5338–5415 | 2 | 78 |
| SW97414 | 4 | 5338–5415 | 2 | 78 |
| SW72718 | 4 | 5312–5413 | 2 | 102 |
| SW74249 | 4 | 5281–5415 | 2 | 135 |
For most isolates, these deletion events resulted in the loss or partial loss of one of the four conserved RRV proline (P*P*PR) motifs within nsP3. The identity of the affected proline motif (numbered 1 to 4) is presented for each deletion. The genotype of the isolate and the genomic location and size of the deletion events are also presented.
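
The relationship between the genomic location and size columns can be checked with a short Python sketch. It assumes the coordinates are 1-based and inclusive, an interpretation that agrees with the rows used below but is not stated explicitly in the table; the function names are illustrative.

```python
def deletion_size(start, end):
    """Size in nucleotides of a deletion given inclusive start and end positions."""
    return end - start + 1


def is_in_frame(size_nt):
    """A deletion preserves the reading frame when its size is a multiple of 3."""
    return size_nt % 3 == 0


# (isolate, start, end, reported size) copied from three rows of the table.
rows = [
    ("DC5692", 5414, 5416, 3),
    ("K3011", 5309, 5383, 75),
    ("SW74249", 5281, 5415, 135),
]

results = {}
for isolate, start, end, reported in rows:
    size = deletion_size(start, end)
    results[isolate] = (size, size == reported, is_in_frame(size))
    print(isolate, size, "nt;", size // 3, "codons; in frame:", is_in_frame(size))
```

Under this reading, every deletion checked here removes a whole number of codons, which is consistent with the downstream nsP3 reading frame being preserved.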
FIG 5. MAFFT alignment of the Ross River virus data set revealed a 12-amino-acid insertion within the hypervariable region of the nsP3 gene, which was unique and characteristic of G3 and G4 isolates. Two isolates, DC36025 and RRV_TT, had a 5-amino-acid and a 12-amino-acid deletion within this insertion region, respectively. No G1 or G2 isolates in our study contained this 36-nucleotide insertion.
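
As a rough illustration of how an insertion such as this shows up in an alignment, the Python sketch below locates runs of gap characters in an aligned sequence. The two toy protein sequences are hypothetical and are not RRV nsP3 sequences, and the helper name `gap_runs` is illustrative rather than part of MAFFT or any library.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, end, length) for each run of '-' using 1-based inclusive columns."""
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in re.finditer(r"-+", aligned_seq)
    ]


# Hypothetical aligned protein sequences: one carries a 12-residue insertion,
# so the other shows a 12-column gap at the same alignment positions.
toy = {
    "with_insertion": "MKPAPPRRLLTTSSAAGGEE",
    "without_insertion": "MKPA------------GGEE",
}

runs = gap_runs(toy["without_insertion"])
for start, end, length in runs:
    print(f"gap at columns {start}-{end}: {length} aa = {length * 3} nt")
```

A 12-column gap in a protein alignment corresponds to 36 nucleotides, matching the relationship between the 12-amino-acid and 36-nucleotide descriptions in the caption.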