Literature DB >> 35050401

Complete genome sequencing and evolutionary analysis of hepatitis C virus subtype 6a, including strains from Guangdong Province, China.

Ru Xu1,2, Hao Wang1,2, Jieting Huang1,2, Min Wang1,2, Qiao Liao1,2, Zhengang Shan1,2, Huishan Zhong1,2, Xia Rong3,4,5, Yongshui Fu6,7,8,9.   

Abstract

We performed an evolutionary analysis using whole genome sequence isolates of hepatitis C virus (HCV) 6a from Guangdong Province and reference sequences from various countries. Less than 5% of the HCV genome was found to be under positive selection. The E1 and E2 proteins had the highest proportion of positively selected sites both within and outside of CD8 T cell epitopes in all of the strains. Regions corresponding to CD8 T cell epitopes were under negative selection except in the isolates from Guangdong. Furthermore, we found evidence of three introductions of the virus into Guangdong from Vietnam and other Southeast Asian countries. Thus, this study provides information about the transmission of HCV 6a by comparison of full-length sequences, indicating the impact of selective constraints in Guangdong and across China.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.

Entities:  

Mesh:

Year:  2022        PMID: 35050401      PMCID: PMC8771652          DOI: 10.1007/s00705-021-05358-7

Source DB:  PubMed          Journal:  Arch Virol        ISSN: 0304-8608            Impact factor:   2.574


Introduction

Hepatitis C virus (HCV) is divided into eight genotypes (1–8) and subdivided into more than 90 subtypes, which are named in alphabetical order (https://talk.ictvonline.org/ictv_wikis/flaviviridae/w/sg_flavi/634/table-1). Of them, genotype 6 (gt6) is the most divergent genotype and is predominantly distributed in South China and other Southeast Asian countries [1-3]. HCV 6a, the first subtype of gt6 discovered, is endemic in Vietnam, Hong Kong, and South China [3, 4], especially in Guangdong Province. In the southern part of China, Guangdong is the most populous province, with 126 million people. Approximately 32 million of these are migrants who work in business or are temporarily hired as laborers (http://www.stats.gov.cn/tjsj/pcsj/). Guangdong Province also has a large population of drug users, which account for the highest proportion in China. Unsafe injection practices have posed an increasing risk for HCV infection in Guangdong in recent years. Collectively, these factors have created an environment that has allowed an increase in HCV transmission in Guangdong and elsewhere in China. Our recent study showed that HCV 6a is the most prevalent subtype in Guangdong Province and has become more prevalent in other provinces [5], indicating that it is being transmitted rapidly all over China. It has been shown that subtype 6a has spread rapidly across China [6]. It has been reported that transmission is an instrumental force driving HCV evolution [7]. When the virus infects a novel host, selection pressure on the viral genome drives the virus to mutate to escape the host immune system [8]. Therefore, positive selection analysis can be used to identify specific sites involved in immune escape. Comparison of whole genome sequences can maximize the evolutionary resolution [9] and help to identify positively selected sites in the genome. The aim in this study was to investigate the evolutionary history of HCV 6a, obtain a detailed view of the mechanism of selection, and understand how this may differentially affect variants of this virus in Guangdong and elsewhere in China.

Materials and methods

Previously, the E1 genotypes of 100 HCV 6a isolates from a cohort of blood donors (BDs) were determined, revealing that 6a is predominant in Guangdong Province [4]. Fourteen representative 6a sequences were selected according to their position in a maximum-likelihood phylogenetic tree (Supplementary Fig. S1). All of these isolates were from treatment-naïve individuals. The Institutional Review Board at the Guangzhou Blood Center approved this study, and the guidelines set by this board were strictly followed. In addition, the study protocol followed the ethical guidelines set in place by the 1975 Declaration of Helsinki and was approved by the Medical Ethics Committee of Guangzhou Blood Center. Total RNA was extracted using an Agencourt RNAdvance Blood Kit (Beckman Coulter) and then reverse transcribed using Superscript III (Invitrogen). Metagenomic library construction, Illumina HiSeq sequencing, and bioinformatic analysis were performed as described previously [10]. RDP4 [11] and GARD [12] were used to identify evidence of recombination. All 108 available whole-genome reference sequences of HCV 6a were downloaded from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/gene). Sequences that lacked location and sampling date information were excluded (n = 5). Ultimately, 117 sequences (103 reference sequences and 14 sequences from this study) were used to construct a Bayesian Markov chain Monte Carlo (MCMC) tree using the SRD06 nucleotide partitioning model, an uncorrelated lognormal relaxed molecular clock, and a Bayesian Skyline coalescent model. We classified the whole genome sequences of HCV 6a into two groups (Guangdong and non-Guangdong) for positive selection analysis according to their geographic location. We ensured that all sequences were derived from DAA-naïve patients, excluding the reference sequences from DAA-treated patients described in the literature. The mixed-effects model of evolution (MEME), implemented in the Datamonkey package (http://www.datamonkey.org/), has superior performance over older models. Therefore, MEME was used to detect adaptive evolution between Guangdong genomes and non-Guangdong genomes. Information about the positions of CD4 and CD8 T-cell epitopes was obtained from the Los Alamos National Laboratory website (http://hcv.lanl.gov/content/immuno/immuno-main.html) and the Immune Epitope Database and Analysis Resource (http://www.iedb.org). The nucleotide sequences reported in this study were deposited in the GenBank database with the accession numbers MZ161145-MZ161158.

Results

To obtain reliable consensus sequences, we defined clear criteria for consensus base calling based on quality score and depth. The criteria were as follows: 1) Phred quality score should be above 32, and 2) the minimum depth at a position of the HCV genome should be greater than or equal to 5. After trimming the sequences, all 14 sequences had a complete open reading frame. Neither RDP4 nor GARD showed evidence of recombination. The genotyping results based on whole genome sequences were identical to those obtained based on the E1 gene sequences in our previous study [4]. The evolutionary rate of the HCV 6a whole genome was found to be 9.59 × 10−4 substitution/sites/year (s/s/y). For partitioned genes, E2 had the fastest evolutionary rate (9.40 × 10−3 s/s/y), followed by E1, NS3, NS5A, NS4A, P7, NS4B, NS2, NS5B, NS4A, Core, and the 5'UTR (Supplementary Table S1). The HCV 6a sequences formed two clades (I and II) in an MCMC tree (Fig. 1). The reference sequences located at the root of clade I and all of those in clade II were from Vietnam, Asian immigrants (Canada), or a community in China, indicating that the HCV 6a strains circulating worldwide may have originated from Vietnam and other Asian countries. Except for HCV072, the Guangdong HCV 6a isolates were concentrated in three groups, indicating three transmission events in Guangdong Province. Group A originated around 1964 (95% CI: 1871–1988), while group B originated around 1966 (95% CI: 1941–1975). HCV 6a was transported from Vietnam to Hong Kong, then to Guangdong, where it became endemic. Group C originated around 1968 (95% CI: 1884–1981), and most of the sequences in this group were from Guangdong Province.
Fig. 1

An MCMC tree based on whole-genome sequences of HCV 6a isolates. Colored branches represent the geographic distribution of the sequences. HCV 6a isolates identified in this study are indicated by a solid red circle. The posterior value and the time to the most recent common ancestor (tMRCA) are shown for the main clusters.

An MCMC tree based on whole-genome sequences of HCV 6a isolates. Colored branches represent the geographic distribution of the sequences. HCV 6a isolates identified in this study are indicated by a solid red circle. The posterior value and the time to the most recent common ancestor (tMRCA) are shown for the main clusters. MEME analysis detected a total of 102 and 105 positively selected sites (PSS) among Guangdong and non-Guangdong sequences, respectively. The proteins with the highest proportion of PSSs were E1 and E2. The one with the lowest proportion was NS3, both in the Guangdong and non-Guangdong sequences (Fig. 2). There was a significant difference in the proportion of PSSs between Guangdong and the non-Guangdong region (χ2 = 39.257, P < 0.05). Non-Guangdong HCV 6a isolates tended to have more PSSs in E2 (28 sites vs. 25 sites) and NS2 (seven sites vs. four sites), whereas Guangdong HCV 6a had more PSSs in E1 (18 sites vs. 12 sites), P7 (five sites vs. one site), and Core (six sites vs. two sites). Fifteen homologous sites were found to be positively selected in both regions, located in Core (one site), E1 (three sites), E2 (eight sites), NS5A (one site), and NS5B (two sites). A map of Guangdong and non-Guangdong genomes representing the different layers of data analyzed (PSSs, CD8 cell epitopes, and CD4 T cell epitopes) is shown in Figure 3. An association between PSSs and the presence of CD8 epitopes was found (χ2 = 9.675, P < 0.05). Regions corresponding to CD8 T cell epitopes were under negative selection except in the isolates from Guangdong. There was no association between PSS, CD8 T cell epitopes, and CD4 T cell epitopes in the Guangdong genomes. The proteins with the highest proportion of PSSs within CD8 T cell epitopes were E1 and E2 in both locations (Supplementary Table S2). Seven amino acid sites located within CD8 T cell epitopes were found to be positively selected both in the Guangdong and non-Guangdong subsets. Among these, site 158 is located in the core region, which is targeted by cytotoxic T lymphocyte (CTL)-restricted HLA type A*02:01. Amino acid positions 227, 235, and 241 are located in the E1 region, targeted by HLA type A*02:01(227) and B35 (235 and 241). Sites 402 and 405 in E2 are within a known epitope targeted by HLA type A2. Site 2274 in NS5A is located in a region targeted by HLA type B60. However, heterogeneity between genotypes and mutations in the above epitopes where these seven PSSs are located has not been considered. Although two positively selected sites, 532 in E2 and 1078 in NS3, which were targeted by HLA type B60 and A2 within a CD8 T cell epitope, were only found in Guangdong. T cells specific for these two epitopes (GENDTDVFVL, aa 530-539, and AINGVMWTV, aa 1071-1079) displayed HCV genotype 6 reactivity according to records in the IEDB database (http://www.iedb.org/).
Fig. 2

Distribution of PSSs in the polyprotein of H77 reference sequence for the Guangdong and non-Guangdong subsets. “PSS” indicates a positively selected site

Fig. 3

Map of the HCV-6a Guangdong (yellow) and non-Guangdong (green) genomes, indicating the location of PSSs, CD8 T cell epitopes (blue), and CD4 T cell epitopes (purple)

Distribution of PSSs in the polyprotein of H77 reference sequence for the Guangdong and non-Guangdong subsets. “PSS” indicates a positively selected site Map of the HCV-6a Guangdong (yellow) and non-Guangdong (green) genomes, indicating the location of PSSs, CD8 T cell epitopes (blue), and CD4 T cell epitopes (purple)

Discussion

We performed a comparative analysis of evolutionary forces that shape the genomes of HCV 6a. Our results reveal a history of separate introductions of HCV 6a into Guangdong Province. In addition, we found that there are several differences as well as similarities between Guangdong and non-Guangdong genomes. These results help us understand the underlying factors driving HCV 6a evolution at the genome level. The molecular evolution of HCV 6a has previously only been investigated using a single genome region or two concatenated partial genome regions [13]. In this study, we analyzed 117 whole-genome sequences and found that HCV 6a worldwide evolved from a common ancestor in 1905, making it younger than any other subtypes of HCV gt6 [14]. The topology of the MCMC tree showed that Vietnam and other Asian countries could be the origin of 6a worldwide, which is consistent with the previous studies [3, 15]. Although there has been no spatiotemporal phylogenetic analysis, the data provide considerable evidence to support this hypothesis [15]. In addition, we found that HCV 6a was introduced into Guangdong around 1960–1970 via three transmission events. First, HCV 6a was introduced from Vietnam into Denmark and Guangdong in 1964, and HCV 6a strains were probably transmitted separately to Guangdong and Denmark, as there is no evidence of population migration between Guangdong and Denmark. In the second introduction, HCV 6a was apparently transported from Vietnam to Hong Kong and then to Guangdong, with cross-dissemination between these two places. It has been speculated that HCV 6a came from Vietnam to Hong Kong and then to Guangdong due to the fact that Hong Kong acted as the "first collecting port" for refugees from Vietnam [16]. However, because there is a lack of time-stamped sequence data from Hong Kong from that time period, this hypothesis could not be tested. In this study, we found evidence in the phylogeographic tree that the transmission route was from Vietnam to Hong Kong and then to Guangdong. Two historical events also corroborate this scenario: one is the association between Hong Kong and Vietnamese immigration, and the other is the population exchange between Guangdong Province and Hong Kong in the initial stage after the foundation of the People’s Republic of China [17]. In the third introduction, HCV 6a appears to have been introduced directly from Vietnam into Guangdong, causing a regional epidemic, as suggested in a previous study [16]. Guangdong Province, which is very close to Vietnam, accepted many refugees from Vietnam during the Vietnam War, and a fraction of those people might have been infected. We also found that most isolates in group C were from Guangdong Province. We speculate that the members of this clade clustered together because of similarities in the basic characteristics or lifestyle of the population in this area, resulting in a concentration of specific virus sequences. It is worth noting that HCV072 does not cluster into clade A, B, or C like other Guangdong isolates. The other isolates clustering with HCV072 were mainly from one community in China and Asian immigrants residing in Canada. Although detailed information on this specific community is lacking, we speculate that HCV072 was imported from Asian immigrants when they moved to Canada. Genetic drift may also account for why HCV072 did not cluster with the other Guangdong isolates. Most published studies detecting PSSs in this virus have analyzed different subtypes, such as HCV 1a and HCV 1b, without considering the region from which the sequences were obtained [8, 18]. In this study, we found that the proportion of PSSs differed between Guangdong and non-Guangdong genomes. Non-Guangdong HCV 6a strains have more PSSs in the E2 and NS2 regions, while Guangdong HCV 6a strains have more PSSs in the E1, P7, and Core regions. We speculate that geographical location is related to viral adaptation between HCV and the host's cellular immune response. The main reason for this speculation is that the virus transmission pattern and host genetic background in different regions led to the different positive sites. Seven amino acid sites within CD8 T cell epitopes were found to be positively selected both in the Guangdong and non-Guangdong subsets. However, when considering the heterogeneity between genotypes and mutations in the epitopes, only two sites are located in epitopes that are targeted by HLA type B60 and A2 in Guangdong. HLA-A2 accounts for 30.28%, and HLA type B60, which is broadly specific for the antigen epitope B*40:01 motif, account for 13-14.4% of the population of Guangdong Province (http://allelefrequencies.net/). The difference in selectivity between Guangdong and non-Guangdong subsets revealed that HCV 6a evolved under different forces and constraints in Guangdong and other regions. This information could be beneficial for finding constrained codon sites (non-PSSs) in the CD8 T cell epitope for viral vaccination to limit amino acid substitutions and reduce the likelihood of viral escape in different regions [19]. The negative selection of CD8 T cell epitopes outside Guangdong suggested that HCV 6a was better able to escape cellular immune defenses in other regions than it was in Guangdong, which was consistent with the fact that HCV 6a is the prevalent subtype in Guangdong Province [5]. The highest proportion of PSSs both within and outside of CD8 T cell epitopes was found in E1 and E2 in both locations, consistent with a previous study [20]. Furthermore, these regions were reported previously to have especially high genetic variability [21], suggesting the generation of escape mutants due to interactions between positive selection and the host immune response. The E1 and E2 glycoproteins are involved in the binding and fusion of HCV during cell entry. Therefore, the positively selected sites within CD8 T cell epitopes are likely to be a consequence of the human immune response and virus mutation after infection of a new host. On the other hand, the protein with the lowest proportion of sites under positive selection was NS3 (Fig. 1), a potential target for direct-acting antivirals that inhibit virus replication [22]. When considering epitopes, the proportion of PSSs in NS3 within the CD8 T cell epitope was higher in Guangdong (1.11%) than elsewhere (0.32%) (Supplementary Table S2). Since NS3 is one of the most frequent HCV protein targets of the CD8 T cell response [23, 24], the larger number of PSSs within the CD8 T cell epitope in Guangdong Province would make the CD8 T-cell-mediated immune response less effective, leading to chronic infection. In conclusion, we have analyzed the propagation route and evolution rate of HCV 6a in Guangdong by comparing whole-genome sequences. Guangdong and non-Guangdong HCV 6a sequences exhibit different selective pressures in different parts of their genomes. This potentially provides information about the interactions between HCV and its host. The locations of PSSs, especially within epitopes, will provide useful information for choosing CD8 T cell epitopes for vaccine development and for the formulation of an HCV 6a prevention strategy in Guangdong, China. Below is the link to the electronic supplementary material. Supplementary Fig. S1 Maximum-likelihood phylogenetic tree of E1 from a previous study [4]. The chosen representative samples are labeled by $$$ Supplementary file2 (DOC 33 kb) Supplementary file3 (DOC 40 kb)
  2 in total

1.  Very high prevalence of hepatitis C virus genotype 6 variants in southern Vietnam: large-scale survey based on sequence determination.

Authors:  Van H Pham; Huyen D P Nguyen; Phat T Ho; Dien V Banh; Hoa L T Pham; Phiet H Pham; Ling Lu; Kenji Abe
Journal:  Jpn J Infect Dis       Date:  2011       Impact factor: 1.362

2.  Hepatitis C virus genotypes and subtypes circulating in Mainland China.

Authors:  Ying Chen; Changshun Yu; Xueru Yin; Xiaolei Guo; Shangwei Wu; Jinlin Hou
Journal:  Emerg Microbes Infect       Date:  2017-11-01       Impact factor: 7.163

  2 in total
  1 in total

1.  The Transmission Route and Selection Pressure in HCV Subtype 3a and 3b Chinese Infections: Evolutionary Kinetics and Selective Force Analysis.

Authors:  Ru Xu; Xia Rong; Elihu Aranday-Cortes; Sreenu Vattipally; Joseph Hughes; John McLauchlan; Yongshui Fu
Journal:  Viruses       Date:  2022-07-11       Impact factor: 5.818

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.