Molecular characterization of early hepatitis C virus (HCV) infection remains rare. Ten of 78 patients of a hematology/oncology center were found to be HCV RNA positive two to four months after hospitalization. Only two of the ten patients were anti-HCV positive. HCV hypervariable region 1 (HVR1) was amplified in seven patients (including one anti-HCV positive patient) and analyzed by next-generation sequencing (NGS). Genetic variants were reconstructed with ShoRAH, and an empirically established 0.5% variant frequency cut-off was applied. The resulting sequences were compared by phylogenetic and diversity analyses. Ten unrelated blood donors with newly acquired HCV infection detected at the time of donation (HCV RNA positive and anti-HCV negative) served as controls. One to seven HVR1 variants were found in each patient. Sequences from different patients were phylogenetically intermixed, with no evidence of patient-specific clustering. These sequences were more similar to each other (similarity 95.4% to 100.0%) than to those of controls (similarity 64.8% to 82.6%). An identical predominant variant was present in four patients, whereas other closely related variants dominated in the remaining three patients. In five patients the HCV population was limited to a single variant or to one predominant variant plus minor variants of less than 10% frequency. In conclusion, NGS analysis of a cluster of HCV infections acquired in the hospital setting revealed low-diversity, very closely related variants in all patients, suggesting early-stage infection with the same virus. NGS combined with phylogenetic analysis and classical epidemiological analysis could help track HCV outbreaks.
The high level of intrahost and interhost hepatitis C virus (HCV) diversity results from the high error rate of the RNA-dependent RNA polymerase (RdRp) and the rapid replication of the virus. Consequently, the HCV population within a host exists as a swarm of closely related variants called quasispecies. HCV variability enables evasion of host adaptive immune responses, establishment of chronic infection and the emergence of drug resistance [1, 2]. Viral molecular diversity is often markedly reduced upon transmission to a new host (bottleneck effect) [3-6]. The bottleneck may be affected by the size of the inoculum, the HCV genotype, the viral load and the complexity of the virus population in the donor (number and frequency of variants), as well as by recipient host factors such as IL28B genotype [7, 8].
Studies on the early evolution of HCV following infection are rare because clinical samples from the early stages of infection are seldom available [9, 10]. Previous studies analyzed HCV intrahost diversity using well-established techniques such as the DNA heteroduplex gel shift method [11] or bulk clonal sequencing [12]. However, the sensitivity of these methods for detecting minor variants is typically low [4, 13]. Newer methods suitable for in-depth analysis of quasispecies, such as single-genome analysis and next-generation sequencing (NGS), allow evaluation of a wide spectrum of genetic variants, including those present at low frequency [5, 14, 15]. Despite some technical limitations, reliable detection of variants constituting as little as 0.5% of the population has become feasible [16].
In the present study we took advantage of a unique opportunity to investigate genetic diversity in depth during the early stages of infection, by analyzing a cluster of HCV infections among patients of a regional hematology and oncology center in Southern Poland.
We investigated the diversity of hypervariable region 1 (HVR1), a highly exposed segment of the envelope glycoprotein E2 that plays a major role in HCV cell entry (receptor binding, membrane fusion) and is a major target of the specific antiviral response (antibody shielding, epitopes for antibody responses) [17, 18]. HVR1 variability facilitates immune evasion and reflects the immune pressure exerted by the host [19].
Our study demonstrates very low HVR1 diversity in the early stage of infection. Since the variants found in different patients were closely related, the patients were most likely infected from a common source.
Materials and methods
Patients
In November and December 2015, five clinically overt cases of acute hepatitis C were diagnosed among patients of a regional hematology and oncology center in Southern Poland. As all of these patients had repeated hospital stays between August and October 2015, all patients hospitalized in the same ward during this period were contacted and asked to provide a blood sample for HCV screening and analysis. Of 129 inpatients, 34 had died by the time of the study, 17 refused to participate or could not be reached (including one patient from the initial cluster), and 78 provided both a sample and consent. HCV RNA was detected in ten of these 78 individuals. An extensive epidemiological investigation did not identify the source of infection. Basic clinical and virological data on the study subjects are presented in Table 1.
Table 1
Clinical and virological characteristics of ten hospitalized patients infected with HCV 1b.
| Patient ID | Sex | Age (years) | Anti-HCV^b | Alanine aminotransferase (ALT) activity at HCV RNA detection (U/L; normal range 10–40) | Viral load (IU/mL) |
|---|---|---|---|---|---|
| 1 | F | 26 | negative | 540 | 6.9×10^1 |
| 2 | M | 49 | negative | 481 | 2.5×10^4 |
| 3^a | M | 73 | positive | N/A | 1.6×10^7 |
| 4^a | F | 80 | negative | 498 | 6.3×10^6 |
| 5^a | M | 55 | negative | 751 | 1.6×10^7 |
| 6^a | M | 66 | negative | 621 | 1.2×10^7 |
| 7 | F | 68 | negative | N/A | 1.7×10^4 |
| 8 | F | 63 | negative | N/A | 3.7×10^6 |
| 9 | M | 64 | positive | N/A | 7.4×10^4 |
| 10 | M | 54 | negative | N/A | 1.2×10^8 |
N/A, not available.
^a First patients to be diagnosed with clinically overt acute HCV infection.
^b Elecsys Anti-HCV assay (Roche Diagnostics, Mannheim, Germany).
Plasma samples from ten HCV RNA–positive, anti-HCV-negative blood donors were used as controls for phylogenetic and sequence similarity comparisons. These controls were infected with the same HCV subtype (1b), and their infection was identified at the time of attempted donation.
The study was approved by the Bioethical Committee of the Medical University of Warsaw (Approval Number WUM AKBE/144/16) and the Institute of Hematology and Transfusiology (Approval Number 55/2013), and all subjects and controls provided written informed consent.
HVR1 amplification
HVR1 amplification was performed as described previously [20]. In brief, total RNA was extracted from 250 μl of serum by a modified guanidinium thiocyanate-phenol-chloroform method using Trizol (Life Technologies, Carlsbad, CA, USA). RNA was then reverse transcribed at 42°C for 60 minutes using AccuScript High Fidelity Reverse Transcriptase (Agilent Technologies, Santa Clara, CA, USA) and random hexamers. A 175-nt region encompassing HVR1 was amplified in a two-round PCR using FastStart High Fidelity Taq DNA Polymerase (Roche, Indianapolis, IN, USA). First-round primers were 5′-CATTGCAGTTCAGGGCCGTGCTA-3′ (nt 1632–1610) and 5′-GGTGCTCACTGGGGAGTCCT-3′ (nt 1389–1408), numbered according to the sequence of reference strain H77 (GenBank accession no. AF009606). Second-round primers contained the tags recognized by the GS Junior sequencing platform, standard 10-nucleotide multiplex identifiers (MIDs) and the target-complementary sequence [5′-TCCATGGTGGGGAACTGGGC-3′ (positions 1428–1447) and 5′-TGCCAACTGCCATTGGTGTT-3′ (positions 1603–1584)] [20].
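In the Results, failed amplification in one patient is attributed to a mismatch between the primers and the viral strain. A minimal sketch of how a primer could be screened against a candidate template is shown below, assuming simple ungapped matching on both strands; the helper names and the toy template are hypothetical and are not part of the described protocol.

```python
# Hypothetical primer-screening helpers (illustration only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def best_primer_match(primer: str, template: str) -> tuple:
    """Slide the primer, then its reverse complement, along the template
    without gaps and return (mismatches, start, strand) for the first placement
    with the fewest mismatches; strand is '+' for the primer as written and
    '-' for its reverse complement."""
    primer, template = primer.upper(), template.upper()
    best = (len(primer) + 1, -1, "")  # sentinel: worse than any real placement
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - len(probe) + 1):
            window = template[start:start + len(probe)]
            mismatches = sum(a != b for a, b in zip(probe, window))
            if mismatches < best[0]:
                best = (mismatches, start, strand)
    return best


# Toy template carrying the second-round forward primer site with one mismatch.
toy_template = "AAAA" + "TCCATGGTGGGGAACTGGGA" + "TTTT"
print(best_primer_match("TCCATGGTGGGGAACTGGGC", toy_template))  # -> (1, 4, '+')
```

In practice the template would be an aligned reference or consensus sequence for the strain in question, and mismatches near the primer's 3′ end matter more than the raw count suggests.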
Pyrosequencing
Approximately 3×10^7 DNA amplicon molecules were subjected to emulsion PCR using the GS Junior Titanium emPCR Lib-A Kit (454 Life Sciences, Branford, CT, USA). Pyrosequencing was carried out on the GS Junior (454 Life Sciences) according to the manufacturer's protocol for amplicon sequencing.
Data analysis
Sequencing errors (mismatches, insertions and deletions) were corrected and haplotypes reconstructed using the program diri_sampler from the ShoRAH software suite (https://www1.ethz.ch/bsse/cbg/software/shorah) [21]. Haplotypes with posterior probability >95% that were supported by at least 10 reads were extracted with LStructure (https://github.com/ozagordi/LocalVariants/blob/master/src/LStructure.py). In a previous study based on pyrosequencing and reconstruction of a cloned HVR1 sequence, we showed that variants constituting as little as 0.5% of the population could be reliably detected [16]; this cut-off was applied in the current analysis. Reconstructed haplotypes with frequency >0.5% were then aligned to the consensus sequence (the most frequent sequence across all patients) and translated into amino acid sequences with MEGA (Molecular Evolutionary Genetics Analysis) version 6.0 (http://www.megasoftware.net/) [22]. Phylogenetic trees were constructed by the maximum likelihood method based on the Tamura-Nei model [23] in MEGA 6.0. We used the same approach in our previous studies [20, 24], and other authors have reported the superiority of the Tamura-Nei model for the analysis of HVR1 [25]. The robustness of tree topology was assessed by bootstrapping (1000 resampled data sets) in MEGA 6.0. Genetic diversity parameters were calculated with DnaSP version 5 (http://www.ub.edu/dnasp/) and MEGA 6.0. Sequence similarity was assessed using the Clustal 2.1 Percent Identity Matrix (http://www.clustal.org/omega/) [26]. Finally, the Highlighter tool from HIV.lanl.gov was used to visualize individual sequence polymorphisms [13].
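The three haplotype thresholds described above (posterior probability >95%, at least 10 supporting reads, frequency >0.5%) amount to a simple filter. This is a sketch under assumptions: the `Haplotype` record and its field names are hypothetical and do not reflect the actual ShoRAH or LStructure output format, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class Haplotype:
    """One reconstructed haplotype (hypothetical record layout)."""
    name: str
    posterior: float  # posterior probability, 0-1
    reads: int        # number of supporting reads
    frequency: float  # estimated frequency in the sample, 0-1


def filter_haplotypes(haplotypes, min_posterior=0.95, min_reads=10,
                      min_frequency=0.005):
    """Keep haplotypes with posterior > min_posterior, at least min_reads
    reads and frequency > min_frequency, most frequent first."""
    kept = [
        h for h in haplotypes
        if h.posterior > min_posterior
        and h.reads >= min_reads
        and h.frequency > min_frequency
    ]
    return sorted(kept, key=lambda h: h.frequency, reverse=True)


# Hypothetical reconstruction output for one sample.
sample = [
    Haplotype("h1", 0.99, 3000, 0.90),
    Haplotype("h2", 0.97, 150, 0.05),
    Haplotype("h3", 0.99, 12, 0.004),  # below the 0.5% cut-off
    Haplotype("h4", 0.90, 200, 0.04),  # posterior too low
    Haplotype("h5", 0.99, 8, 0.006),   # too few reads
]
print([h.name for h in filter_haplotypes(sample)])  # -> ['h1', 'h2']
```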
Results
HVR1 amplification and molecular analysis were successful in seven of the ten HCV-infected patients. In three patients HVR1 could not be amplified, most likely because of low viral load (patients 1 and 9) or a mismatch between the primers and the particular viral strain (patient 10). An average of 4506 HVR1 sequence reads was obtained per sample (median 4149; Table 2). After haplotype reconstruction with ShoRAH and application of the experimentally established 0.5% cut-off, one to seven HCV variants were retained per sample (mean 3.4, median 3.0). Mean nucleotide diversity was 0.032 (median 0.015), and the mean number of nucleotide substitutions was 11.4 per patient (median 5.0). After translation, the number of amino acid variants ranged from one to three per patient (mean 2.4, median 3.0). Detailed data for each patient are presented in Table 2.
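Nucleotide diversity (per site) is conventionally defined as the mean number of pairwise differences between sequences divided by the number of aligned sites. The sketch below implements that textbook definition for aligned, ungapped sequences, counting each sequence once. It is an illustration only and may not reproduce the DnaSP/MEGA values in Table 2, whose exact settings (for example, whether variants were weighted by frequency) are not stated.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (Nei's pi) for aligned,
    equal-length, ungapped sequences, each sequence counted once."""
    if len(seqs) < 2:
        return 0.0  # a single variant has no pairwise differences
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Toy example: 4 differences over 3 pairs x 4 sites.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

A population made of a single variant, as in patient 7, gives a diversity of zero under this definition.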
Table 2
Diversity parameters of HVR1 HCV variants in seven hospitalized patients infected with HCV 1b.
| Patient ID | Number of NGS reads before filtering | Number of HVR1 nucleotide variants^a | Number of HVR1 amino acid variants^a | Number of nucleotide substitutions^a,b | Nucleotide diversity (per site)^a,b |
|---|---|---|---|---|---|
| 2 | 3363 | 2 | 2 | 4 | 0.01515 |
| 3 | 3588 | 3 | 2 | 6 | 0.01705 |
| 4 | 4149 | 7 | 3 | 12 | 0.01705 |
| 5 | 4185 | 5 | 3 | 5 | 0.01061 |
| 6 | 5987 | 3 | 3 | 3 | 0.00852 |
| 7 | 6481 | 1 | 1 | 0 | 0.00000 |
| 8 | 3789 | 3 | 3 | 50 | 0.15333 |
| Mean (median) | 4506 (4149) | 3.4 (3.0) | 2.4 (3.0) | 11.4 (5.0) | 0.03167 (0.01515) |
^a Variants above the 0.5% frequency cut-off.
^b Relative to the consensus sequence (the most frequent sequence across all patients).
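The Mean (median) row of Table 2 can be checked directly from the per-patient values. A short sketch using Python's standard `statistics` module, with the values transcribed from the table:

```python
from statistics import mean, median

# Per-patient values transcribed from Table 2 (patients 2 to 8, in table order).
table2 = {
    "NGS reads": [3363, 3588, 4149, 4185, 5987, 6481, 3789],
    "nucleotide variants": [2, 3, 7, 5, 3, 1, 3],
    "amino acid variants": [2, 2, 3, 3, 3, 1, 3],
    "nucleotide substitutions": [4, 6, 12, 5, 3, 0, 50],
    "nucleotide diversity": [0.01515, 0.01705, 0.01705, 0.01061,
                             0.00852, 0.00000, 0.15333],
}

for column, values in table2.items():
    print(f"{column}: mean {mean(values):.5g}, median {median(values):.5g}")
```

Rounding the computed means and medians to the precision used in the table reproduces the summary row.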
In a phylogenetic comparison with sequences from ten unrelated controls, all patient sequences clustered together, except for two lower-frequency variants (26.6% and 3.0%) from patient 8, which clustered with variants from one control (C_118). Within the cluster, sequences from different patients were interspersed, with no evidence of patient-specific clustering (Fig 1).
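The trees were built by maximum likelihood under the Tamura-Nei model, which corrects observed differences for unequal base composition and for different rates of purine (A↔G) and pyrimidine (C↔T) transitions versus transversions. The sketch below computes only the pairwise Tamura-Nei (1993) distance between two aligned sequences; it is not a maximum likelihood tree search and not how MEGA builds the trees. It assumes ungapped sequences containing only A, C, G and T, with all four bases present.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tamura_nei_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned, ungapped DNA sequences.
    Base frequencies are estimated from both sequences pooled; P1 and P2 are
    the proportions of A<->G and C<->T transitions, Q that of transversions."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq1)
    pooled = seq1 + seq2
    pi = {b: pooled.count(b) / (2 * n) for b in "ACGT"}
    pi_r = pi["A"] + pi["G"]  # purine frequency
    pi_y = pi["C"] + pi["T"]  # pyrimidine frequency

    p1 = p2 = q = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} == PURINES:
            p1 += 1
        elif {a, b} == PYRIMIDINES:
            p2 += 1
        else:
            q += 1
    p1, p2, q = p1 / n, p2 / n, q / n

    # Each log argument must stay positive; saturated pairs have no finite distance.
    term1 = -(2 * pi["A"] * pi["G"] / pi_r) * math.log(
        1 - pi_r * p1 / (2 * pi["A"] * pi["G"]) - q / (2 * pi_r))
    term2 = -(2 * pi["C"] * pi["T"] / pi_y) * math.log(
        1 - pi_y * p2 / (2 * pi["C"] * pi["T"]) - q / (2 * pi_y))
    term3 = -2 * (pi_r * pi_y - pi["A"] * pi["G"] * pi_y / pi_r
                  - pi["C"] * pi["T"] * pi_r / pi_y) * math.log(
        1 - q / (2 * pi_r * pi_y))
    return term1 + term2 + term3


s = "ACGT" * 25
print(tamura_nei_distance(s, "G" + s[1:]))  # one A->G transition in 100 sites
```

For a single transition in 100 sites the corrected distance is slightly above the raw proportion of differing sites (0.01), as expected from a multiple-hit correction.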
Fig 1
Phylogenetic analysis of HVR1 variants in seven hospitalized patients infected with HCV 1b and in ten unrelated controls.
Variant frequencies are expressed as percentages and follow the haplotype number. Pt denotes patients from the infection cluster. For clarity, each patient is marked with a different graphical symbol. Controls (C) came from the same geographic area and were infected with the same HCV subtype (1b). Bootstrap values obtained with 1000 replicates are shown at the bifurcation points. An HVR1 genotype 1a sequence (GenBank accession no. EF56024) was used as the outgroup.
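The bootstrap values above come from resampling alignment columns with replacement (1000 replicates), inferring a tree from each pseudo-alignment and recording how often each clade recurs. A minimal sketch of the resampling step alone, assuming an equal-length alignment; tree inference and clade counting would be done by a phylogenetics package such as MEGA, and the toy alignment is invented.

```python
import random


def bootstrap_alignments(alignment, replicates=1000, seed=None):
    """Yield bootstrap pseudo-alignments built by resampling alignment columns
    with replacement. Each replicate keeps the original number of sequences
    and columns; a tree would then be inferred from every replicate."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    rng = random.Random(seed)
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in columns) for seq in alignment]


toy_alignment = ["ACGTAC", "ACGTTC", "GCGTTA"]
for replicate in bootstrap_alignments(toy_alignment, replicates=3, seed=7):
    print(replicate)
```

Resampling whole columns keeps the sequences aligned to one another, so each replicate is a valid alignment of the same taxa.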
Intrapatient phylogenetic trees could be constructed only for patients with at least three HVR1 variants (patients 3, 4, 5, 6 and 8; Fig 2). In patients 3, 4 and 6 the trees displayed a star-like phylogeny, while in patient 5 the tree was more complex, with a higher number of clades. Nevertheless, the two dominant variants in patient 5 (54.1% and 23.5% frequency) differed by only one substitution. The 54.1% variant was identical to the consensus variant (the most prevalent variant in the analyzed patients), and the 23.5% variant harbored a substitution that was also seen in variants from some other patients (patients 2, 3 and 4). In patient 8 all three variants were markedly divergent from each other. In the remaining patients the HVR1 population consisted of only one variant (patient 7) or two variants (patient 2); in the latter, the major variant accounted for 97.6% of the population.
Fig 2
Phylogenetic analysis of HVR1 variants in hospitalized patients 3, 4, 5, 6 and 8.
The trees from patients 3, 4 and 6 are consistent with infection with a single founder (star-like phylogeny). Variant frequencies are expressed as percent values and follow haplotype number. For clarity, each patient is marked with a different graphical symbol (corresponding to Fig 1).
Sequence similarity analysis showed that HVR1 sequences from the analyzed patients were more similar to each other (95.4% to 100.0%) than to sequences from controls (64.8% to 82.6%). The only exceptions were the two low-frequency variants (26.6% and 3.0%) of patient 8, whose similarity to variants from the other six patients ranged from 79.0% to 82.8%, while their similarity to variants from controls ranged from 67.6% to 98.3%. Sequences from all seven patients are compared in Fig 3.
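Pairwise similarity values such as those above are percent identities between aligned sequences. One common definition divides identical positions by the positions compared, excluding gapped columns; Clustal's own gap handling may differ, so treat this sketch as an approximation rather than the tool's exact computation. For example, two mismatches over 175 compared positions give 173/175, about 98.86% identity.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences over the positions where
    neither sequence has a gap ('-'). Gap handling differs between tools."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def identity_matrix(seqs):
    """All-against-all percent identity for a {name: aligned_sequence} dict."""
    return {(m, n): percent_identity(seqs[m], seqs[n]) for m in seqs for n in seqs}


# Two mismatches over 175 compared positions.
print(round(percent_identity("A" * 175, "GG" + "A" * 173), 2))  # -> 98.86
```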
Fig 3
Highlighter plot showing differences (mismatches and gaps) of HVR1 HCV sequence variants in seven hospitalized patients.
Variants are compared with the consensus sequence of all variants from all patients. A, C, T and G mismatches and gaps are shown in green, blue, orange, red and gray, respectively. Nucleotide numbering follows the reference strain H77 (GenBank accession no. AF009606).
Nucleotide sequence analysis showed that the predominant variant was identical in patients 5, 6, 7 and 8, and a very similar variant predominated in patients 2 and 4 (Fig 3). These two predominant variants differed by only two nucleotide substitutions (98.86% similarity). In patient 3 the predominant variant differed slightly from the predominant variants in patients 2, 4, 5, 6, 7 and 8 (98.30% similarity).
When the frequency structure of the variants was analyzed, the HVR1 populations of patients 2, 3, 4, 6 and 7 (71.4% of patients) were found to be "narrow", i.e. limited to a single variant or to one predominant variant plus minor variants of less than 10% frequency. Compared with the consensus sequence, most nucleotide substitutions were non-silent (Fig 4).
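The "narrow" criterion defined above is simple enough to state as code. This sketch assumes variant frequencies are given in percent; the 10% threshold comes from the text, while the function name is hypothetical.

```python
def is_narrow(frequencies, minor_threshold=10.0):
    """True if the population is a single variant, or one predominant variant
    with every other variant below `minor_threshold` percent."""
    if not frequencies:
        raise ValueError("at least one variant frequency is required")
    ordered = sorted(frequencies, reverse=True)
    return all(f < minor_threshold for f in ordered[1:])


print(is_narrow([97.6, 2.4]))   # patient 2 (two variants) -> True
print(is_narrow([54.1, 23.5]))  # the two dominant variants of patient 5 -> False
```

Because only the second most frequent variant can break the rule, knowing the two largest frequencies is enough to classify a population.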
Fig 4
Highlighter plot showing silent and non-silent mutations of HVR1 HCV sequence variants in seven hospitalized patients from the infection cluster.
Variants are compared with the consensus sequence of all variants from all patients. Silent and non-silent mutations are shown in green and red, respectively. Nucleotide numbering follows the reference strain H77 (GenBank accession no. AF009606).
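Classifying a substitution as silent or non-silent amounts to translating the consensus and variant codons and comparing the resulting amino acids. The sketch below assumes the aligned sequences start in reading frame and uses the standard genetic code; it labels whole codons, so a codon with two changes receives a single label. It is not the Highlighter implementation.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitutions(consensus, variant):
    """Compare variant and consensus codon by codon, assuming both are aligned
    and in frame from the first position. Returns (codon_index,
    consensus_codon, variant_codon, label) for every differing codon; label is
    'silent', 'non-silent', or 'undetermined' for gaps/ambiguous bases."""
    if len(consensus) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    usable = len(consensus) - len(consensus) % 3  # ignore a trailing partial codon
    changes = []
    for start in range(0, usable, 3):
        ref = consensus[start:start + 3].upper()
        alt = variant[start:start + 3].upper()
        if ref == alt:
            continue
        if ref in CODON_TABLE and alt in CODON_TABLE:
            same = CODON_TABLE[ref] == CODON_TABLE[alt]
            label = "silent" if same else "non-silent"
        else:
            label = "undetermined"
        changes.append((start // 3, ref, alt, label))
    return changes


print(classify_substitutions("ATGAAACTT", "ATGAAGCAT"))
# -> [(1, 'AAA', 'AAG', 'silent'), (2, 'CTT', 'CAT', 'non-silent')]
```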
Relative to the consensus sequence, amino acid substitutions affected codons 384, 386, 392, 398, 404 and 407 within HVR1 and codon 413 outside HVR1, whereas variants 2 and 3 from patient 8 carried multiple changes (Fig 5). None of the identified changes affected the potential N-glycosylation site at position 417.
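A potential N-linked glycosylation site is conventionally identified by the sequon N-X-S/T, where X is any amino acid except proline. A minimal regular-expression scan follows, using a lookahead so that overlapping sequons are all reported; the toy peptide and the `first_position` offset (for matching a reference numbering such as H77) are illustrative assumptions.

```python
import re

# Sequon N-X-S/T with X any residue except proline; the lookahead lets
# overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein, first_position=1):
    """Positions of the N in every potential N-linked glycosylation sequon,
    numbered from `first_position` (e.g. to follow a reference numbering)."""
    return [first_position + m.start() for m in SEQUON.finditer(protein.upper())]


print(n_glycosylation_sites("AANGTAPNPT"))  # toy peptide -> [3]
```

Checking whether any substitution falls inside the three residues of a reported sequon is then a matter of comparing codon positions with these start positions.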
Fig 5
Highlighter plot showing differences in amino acid composition of HVR1 HCV sequence variants in hospitalized patients from the infection cluster.
Variants are compared with the consensus sequence built from all variants from all patients. The amino acid color coding is shown in the legend. The potential N-glycosylation site in the consensus sequence is marked by a purple dot. Amino acid numbering follows the reference strain H77 (GenBank accession no. AF009606). HVR1 spans codons 384–410.
Discussion
In the present study we characterized HCV HVR1 variants in a group of patients from a regional hematology and oncology center. These patients were found to be HCV RNA positive after hospitalization, but the source of infection remained unknown despite an extensive epidemiological investigation. As only one of the seven sequenced patients had seroconverted at that time, the samples most likely represented the very early stage of infection. Phylogenetic linkage and the high sequence similarity of the patients' HVR1 variants (95.4% to 100.0%) indicated the presence of the same or a nearly identical viral strain in all patients.
In five of the seven patients the intrapatient phylogeny was either star-like or no more than two variants were present (one dominant and one minor), which also suggests infection with a single founder. In addition, per-site diversity within each patient's viral population was low and roughly the same (except in patient 8), as would be expected if the time since transmission was short and similar for each patient.
The complexity of the variant populations, reflected by the number of nucleotide variants, was very low (mean 3.4 variants), in contrast to the high HVR1 diversity seen during chronic infection. For example, in our previous studies the average number of HVR1 variants during chronic HCV 1b infection was 30–40 [16, 24]. This low complexity probably reflects the bottleneck effect at transmission and suggests that infection was initiated by a single variant, the so-called "founder", which must have been very similar or even identical to the consensus sequence inferred from all patients' sequences [3, 4, 27]. Furthermore, the HVR1 population structure was "narrow" in most cases (one predominant variant accompanied by minor variants at <10% frequency), which is also compatible with recent infection by a single variant.
Previous studies showed that in chronic hepatitis C the viral population tends to become "flatter" in terms of frequency structure, with a greater share of moderate- and low-frequency variants [24]. Alternatively, the structure of the populations could have been affected by immunosuppression resulting from immunosuppressive drugs and the underlying disease. However, during such an early phase of infection the immune response, which could narrow population diversity, is likely to be limited [4].
So far, very few published studies have analyzed diversity in the early stages of HCV infection [4–6, 28–32]. In the typical clinical setting, the complexity and diversity of HCV quasispecies are reduced at the time of transmission [6, 33]. However, a bottleneck may be absent in the case of massive infections [30, 34].
In our study the exact route of infection was not identified, but possible routes include errors during line flushing and/or the use of multidose vials. In that case the inoculum (i.e. infectious dose) would have been very small, which could explain the bottleneck and the very "narrow" character of the viral populations in the infected patients.
Whether one of the patients was the source of infection is unclear. Patient 8 differed from the other patients in her higher intrahost HVR1 heterogeneity and in the high similarity of two of her variants to those found in an unrelated control. As these samples were sequenced in separate runs using different multiplex identifiers (MIDs), the findings are unlikely to be artifacts (sequencing or demultiplexing errors, or contamination). These data suggest that patient 8 could have acquired the infection earlier and was subsequently superinfected with the predominant strain. Alternatively, she could have been the source of infection herself, transmitting only the predominant strain to other patients.
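The argument that a very small inoculum could produce a strong bottleneck can be illustrated with a toy sampling model: draw a fixed number of virions from the donor's variant frequencies and count which variants are transmitted. This is a deliberately simplified sketch (random sampling with replacement, no differences in fitness, no host factors) and not a model used in the study; the donor frequencies are hypothetical.

```python
import random
from collections import Counter


def transmitted_variants(donor_freqs, inoculum_size, seed=None):
    """Toy bottleneck model: draw `inoculum_size` virions from the donor
    population with replacement, in proportion to variant frequency, and
    return a Counter of how many copies of each variant were transmitted."""
    rng = random.Random(seed)
    variants = list(donor_freqs)
    weights = [donor_freqs[v] for v in variants]
    return Counter(rng.choices(variants, weights=weights, k=inoculum_size))


# Hypothetical donor population: one predominant and several minor variants.
donor = {"major": 0.85, "minor1": 0.05, "minor2": 0.05, "minor3": 0.05}
for size in (1, 5, 50, 500):
    founders = transmitted_variants(donor, size, seed=42)
    print(size, "virions ->", len(founders), "distinct variant(s)")
```

With an inoculum of one virion exactly one variant is transmitted, and the predominant donor variant is the most likely to be carried over; larger inocula increasingly transmit minor variants as well.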
Indeed, in the study by Campo et al., in which multiple HCV outbreaks were investigated by NGS, intrahost HVR1 populations from the source of infection displayed the highest genetic heterogeneity [35].
Another possibility is that patient 5 was the source of infection. This patient's HVR1 variants displayed a more complex phylogeny (a higher number of clades) and a frequency distribution typical of a later phase of infection (a greater share of moderate- and low-frequency variants closely related to each other). Patient 5 was also the first to show elevated ALT activity.
Conclusions
Our NGS analysis of a cluster of HCV infections in the hospital setting revealed low-diversity, very closely related variants in all patients, suggesting early-stage infection with the same viral variant. NGS combined with phylogenetic analysis and classical epidemiological analysis could help track HCV outbreaks.
P Farci; A Shimoda; A Coiana; G Diaz; G Peddis; J C Melpolder; A Strazzera; D Y Chien; S J Munoz; A Balestrieri; R H Purcell; H J Alter. Science. 2000-04-14.
Mattia C F Prosperi; Andrea De Luca; Simona Di Giambenedetto; Laura Bracciale; Massimiliano Fabbiani; Roberto Cauda; Marco Salemi. PLoS One. 2010-10-25.
Zhi Liu; Dale M Netski; Qing Mao; Oliver Laeyendecker; John R Ticehurst; Xiao-Hong Wang; David L Thomas; Stuart C Ray. J Clin Microbiol. 2004-09.
Jesus F Salazar-Gonzalez; Maria G Salazar; Brandon F Keele; Gerald H Learn; Elena E Giorgi; Hui Li; Julie M Decker; Shuyi Wang; Joshua Baalwa; Matthias H Kraus; Nicholas F Parrish; Katharina S Shaw; M Brad Guffey; Katharine J Bar; Katie L Davis; Christina Ochsenbauer-Jambor; John C Kappes; Michael S Saag; Myron S Cohen; Joseph Mulenga; Cynthia A Derdeyn; Susan Allen; Eric Hunter; Martin Markowitz; Peter Hraber; Alan S Perelson; Tanmoy Bhattacharya; Barton F Haynes; Bette T Korber; Beatrice H Hahn; George M Shaw. J Exp Med. 2009-06-01.
Richard J P Brown; Natalia Hudson; Garrick Wilson; Shafiq Ur Rehman; Sara Jabbari; Ke Hu; Alexander W Tarr; Persephone Borrow; Michael Joyce; Jamie Lewis; Lin Fu Zhu; Mansun Law; Norman Kneteman; D Lorne Tyrrell; Jane A McKeating; Jonathan K Ball. J Virol. 2012-08-01.
Hamish McWilliam; Weizhong Li; Mahmut Uludag; Silvano Squizzato; Young Mi Park; Nicola Buso; Andrew Peter Cowley; Rodrigo Lopez. Nucleic Acids Res. 2013-05-13.
David L Thomas; Chloe L Thio; Maureen P Martin; Ying Qi; Dongliang Ge; Colm O'Huigin; Judith Kidd; Kenneth Kidd; Salim I Khakoo; Graeme Alexander; James J Goedert; Gregory D Kirk; Sharyne M Donfield; Hugo R Rosen; Leslie H Tobler; Michael P Busch; John G McHutchison; David B Goldstein; Mary Carrington. Nature. 2009-10-08.
Thomas J Stopka; Omar Yaghi; Min Li; Elijah Paintsil; Kenneth Chui; David Landy; Robert Heimer. PLoS One. 2022-08-25.