Reilly Hostager1, Manon Ragonnet-Cronin1, Ben Murrell1, Charlotte Hedskog2, Anu Osinusi2, Simone Susser3, Christoph Sarrazin2,4, Evguenia Svarovskaia2, Joel O Wertheim1.
Abstract
Recombination is an important driver of genetic diversity, though it is relatively uncommon in hepatitis C virus (HCV). Recent investigation of sequence data acquired from HCV clinical trials produced twenty-one full-genome recombinant viruses belonging to three putative inter-subtype forms: 2b/1a, 2b/1b, and 2k/1b. The 2k/1b chimera is the only known HCV circulating recombinant form (CRF), provoking interest in its genetic structure and origin. Discovered in Russia in 1999, 2k/1b cases have since been detected throughout the former Soviet Union, Western Europe, and North America. Although 2k/1b prevalence is highest in the Caucasus mountain region (i.e., Armenia, Azerbaijan, and Georgia), the origin and migration patterns of CRF 2k/1b have remained obscure due to a paucity of available sequences. We assembled an alignment which spans the entire coding region of the HCV genome containing all available 2k/1b sequences (>500 nucleotides; n = 109) sampled in nineteen countries from public databases (102 individuals), additional newly sequenced genomic regions (from 48 of these 102 individuals), unpublished isolates with newly sequenced regions (5 additional individuals), and novel complete genomes (2 additional individuals) generated in this study. Analysis of this expanded dataset reconfirmed the monophyletic origin of 2k/1b with a recombination breakpoint at position 3,187 (95% confidence interval: 3,172–3,202; HCV GT1a reference strain H77). Phylogeography is a valuable tool used to reveal viral migration dynamics. Inference of the timed history of spread in a Bayesian framework identified Russia as the ancestral source of the CRF 2k/1b clade. Further, we found evidence for migration routes leading out of Russia to other former Soviet Republics or countries under the Soviet sphere of influence. These findings suggest an interplay between geopolitics and the historical spread of CRF 2k/1b.
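The dataset composition and breakpoint interval quoted in the abstract can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the abstract; the variable names are illustrative and not taken from the study.

```python
# Individuals contributing 2k/1b sequences, as enumerated in the abstract.
individuals = {
    "public databases": 102,
    "unpublished isolates with newly sequenced regions": 5,
    "novel complete genomes": 2,
}
total_individuals = sum(individuals.values())

# 48 of the 102 public-database individuals gained newly sequenced regions.
resequenced_fraction = 48 / individuals["public databases"]

# Recombination breakpoint and 95% confidence interval (H77 coordinates).
breakpoint_pos, ci_low, ci_high = 3187, 3172, 3202
ci_half_width = (ci_high - ci_low) / 2

print(f"total individuals: {total_individuals}")
print(f"resequenced fraction of public set: {resequenced_fraction:.2f}")
print(f"breakpoint {breakpoint_pos} +/- {ci_half_width:.0f} nt")
```

The three groups sum to the reported n = 109, and the confidence interval is symmetric around the point estimate.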
Keywords: circulating recombinant form; hepatitis C virus; phylogeography
Year: 2019 PMID: 31616569 PMCID: PMC6785677 DOI: 10.1093/ve/vez041
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Density of sequence data available per gene segment in HCV genome. The full coding region of the HCV genome was divided into nine partitions: one for each gene, with NS4A and NS4B combined into a single partition. The densely sampled regions are sections of the genome which contain at least 25 sequences of the 109 included, totaling four partitions.
Figure 2. Inferred recombinant breakpoints for 2b/1a, 2b/1b, and 2k/1b full-genomes. Breakpoint estimates are shown with 95% confidence intervals. For 2k/1b, the results of the combined genomic analysis are depicted as a shaded red box (95% confidence intervals), with the breakpoint point estimate as a solid vertical line. HCV recombinant viruses follow the naming convention: 5′ genomic segment/3′ genomic segment. Genomic positions are shown relative to the genotype 1a H77 reference strain.
Figure 3. Maximum likelihood phylogenetic trees depicting position of recombinant HCV taxa. (A) 2b/1a recombinant hemigenomes relative to subtype 1a taxa (n = 1,049 taxa). (B) 2b/1b and 2k/1b recombinant hemigenomes relative to subtype 1b reference taxa (n = 463 taxa). (C) 2b/1a and 2b/1b recombinant hemigenomes relative to subtype 2b reference taxa (n = 116 taxa). All non-recombinant clades comprising four or more taxa are collapsed. Recombinant genomes are bolded and colored. Complete phylogenies including bootstrap support values are available as Supplementary Material.
Region-specific evolutionary rates for CRF 2k/1b from full coding region analysis with partitioned GTR+Γ4 nucleotide substitution models and partitioned UCLD molecular clocks.
| Genomic region | Mean rate | 95% HPD |
|---|---|---|
| Core | 0.45 | 0.38–0.52 |
| E1 | 5.53 | 4.71–6.43 |
| E2 | 1.42 | 1.22–1.63 |
| P7 | 1.74 | 1.27–2.21 |
| NS2 | 0.70 | 0.62–0.79 |
| NS3 | 1.40 | 1.27–1.54 |
| NS4A/NS4B | 0.62 | 0.50–0.75 |
| NS5A | 1.09 | 0.98–1.22 |
| NS5B | 0.71 | 0.61–0.81 |
Rates are shown as ×10⁻³ substitutions/site/year.
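To make the spread of rates in the table easier to compare, the following minimal sketch transcribes the table values, converts them to absolute units, and ranks the regions. The helper `hpd_overlap` is a hypothetical convenience for this illustration; overlapping 95% HPD intervals are only an informal indication, not a formal test used in the study.

```python
# (mean, HPD lower, HPD upper), transcribed from the table,
# in units of 10^-3 substitutions/site/year.
rates = {
    "Core": (0.45, 0.38, 0.52),
    "E1": (5.53, 4.71, 6.43),
    "E2": (1.42, 1.22, 1.63),
    "P7": (1.74, 1.27, 2.21),
    "NS2": (0.70, 0.62, 0.79),
    "NS3": (1.40, 1.27, 1.54),
    "NS4A/NS4B": (0.62, 0.50, 0.75),
    "NS5A": (1.09, 0.98, 1.22),
    "NS5B": (0.71, 0.61, 0.81),
}
SCALE = 1e-3  # table values are multiples of 10^-3


def hpd_overlap(a, b):
    """Return True if the 95% HPD intervals of regions a and b overlap."""
    _, a_lo, a_hi = rates[a]
    _, b_lo, b_hi = rates[b]
    return a_lo <= b_hi and b_lo <= a_hi


# Rank regions from fastest to slowest mean rate.
ranked = sorted(rates, key=lambda region: rates[region][0], reverse=True)
fastest, slowest = ranked[0], ranked[-1]
fold_difference = rates[fastest][0] / rates[slowest][0]

for region in ranked:
    mean, lo, hi = rates[region]
    print(f"{region:10s} {mean * SCALE:.2e} ({lo * SCALE:.2e} to {hi * SCALE:.2e})")
print(f"{fastest} vs {slowest}: {fold_difference:.1f}-fold difference in mean rate")
```

On these values, E1 has the highest mean rate and Core the lowest, and their HPD intervals do not overlap.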
Figure 4. Phylogeographic reconstruction of the HCV 2k/1b clade from full coding region analysis. Phylogenetic inference performed using a single substitution model and relaxed molecular clock. Node and branch colors correspond to the inferred most probable ancestral state location, and each lineage is labeled with country name. Posterior probability support for inferred geographic location is depicted as the area of the circle on each internal node (larger circles indicate greater support; scale shown in inset). Gray bar at 1991 represents the collapse of the Soviet Union.
Figure 5. Density plots of the relevant migration events between Russia and six other countries (A–F) estimated from full coding region analysis with a single substitution model and molecular clock partition. The probability that the number of migration events is zero is shown as a bar at 0 on the x-axis. Only migration routes with Bayes Factor (BF) ≥20 in the phylogeographic analysis are reported here. See Supplementary Table S3 for results from other partition strategies and densely sampled region analyses.
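The BF ≥20 cutoff can be illustrated with the standard definition of a Bayes factor as the ratio of posterior odds to prior odds for a migration route being included in the model. The source does not state how its Bayes factors were computed, so the sketch below is an assumption-laden illustration: the function name, the route names, and all probabilities are hypothetical and are not results from the study.

```python
import math


def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds for including a migration route."""
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    if not 0 <= posterior_prob <= 1:
        raise ValueError("posterior_prob must be between 0 and 1")
    if posterior_prob == 1:
        return math.inf  # route present in every posterior sample
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


BF_THRESHOLD = 20  # cutoff quoted in the Figure 5 caption

# Hypothetical inclusion probabilities, for illustration only.
example_routes = {
    "route_A": (0.90, 0.20),  # (posterior, prior)
    "route_B": (0.50, 0.20),
}
supported = [
    name
    for name, (post, prior) in example_routes.items()
    if bayes_factor(post, prior) >= BF_THRESHOLD
]
print(supported)
```

With these made-up inputs only the first route clears the threshold; a route whose posterior inclusion probability barely exceeds its prior would yield a Bayes factor near 1.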