Shengdi Li, Zhen Wang, Yixue Li, Guohui Ding.
Abstract
Hepatitis B virus (HBV) is classified into several genotypes, which correlate with different geographic distributions, clinical outcomes and susceptible human populations. Investigating the evolutionary significance behind the diversification of HBV genotypes is crucial because it improves our understanding of their pathological differences and of pathogen-host interactions. Here, we performed a comprehensive analysis of HBV genome sequences collected from public databases. Using stringent criteria, we generated a dataset of 2992 HBV genomes from eight major genotypes. In particular, we applied a specific classification of non-synonymous and synonymous variants in overlapping regions to distinguish joint from independent gene evolution. We confirmed the presence of selective constraints on non-synonymous variants when overlapping regions are taken into account. We then performed the McDonald-Kreitman test and revealed adaptive evolution of non-synonymous variants during genotypic differentiation. Remarkably, we identified strong positive selection that drove the differentiation of the PreS1 domain, an essential regulator involved in viral transmission. Our study presents novel evidence for the adaptive evolution of HBV genotypes, which suggests that these viruses evolve directionally to maintain or improve successful infection.
Year: 2017 PMID: 28512348 PMCID: PMC5434055 DOI: 10.1038/s41598-017-02012-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Population structure of 2992 HBV genomes. (A) Principal component analysis of 2992 HBV genome sequences, colored according to the genotypes predicted by fragment typing. (B) Unrooted phylogenetic tree computed using the neighbor-joining method. A bootstrap test with 100 replicates was performed, and the results are shown as numbers on the clades.
Figure 2. Polymorphisms within HBV genotypes in the presence of constraints. (A) The Z-transformed pairwise nucleotide difference (Z) is shown for eight genotypes (A–H), computed over a 100-bp sliding window with a step size of 50 bp. Regions without gene overlaps are highlighted with a grey background. (B) Distribution of the derived allele frequency (DAF) of non-synonymous and synonymous variants, where non-synonymous variants comprise INS, DNS and NNS variants. Histograms for genotypes A–F are shown at high resolution (each bar represents a 2.5% interval of variant frequency), whereas the distributions for genotypes G and H are shown at lower resolution (4% intervals) because of small sample sizes.
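The sliding-window statistic in panel (A) can be illustrated with a minimal Python sketch. It assumes pre-aligned, equal-length sequences, treats the genome as linear (no wrap-around for the circular HBV genome), counts every mismatching column including gaps, and standardizes across the windows of one genotype. These choices and the function names are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations
from statistics import mean, pstdev


def windowed_pairwise_difference(seqs, window=100, step=50):
    """Mean pairwise nucleotide difference per site in each sliding window.

    `seqs` is a list of aligned sequences of equal length (assumption).
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    values = []
    for start in range(0, length - window + 1, step):
        end = start + window
        # Count mismatching columns for every pair of sequences in this window.
        diffs = [
            sum(a != b for a, b in zip(s1[start:end], s2[start:end]))
            for s1, s2 in pairs
        ]
        values.append(mean(diffs) / window)
    return values


def z_transform(values):
    """Standardize window values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Calling `z_transform(windowed_pairwise_difference(seqs))` on the aligned genomes of one genotype would give one Z value per 100-bp window, which is the kind of profile the panel plots.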
Figure 3. Adaptive evolution of non-synonymous variants in HBV. (A–B) Plots of the AS statistic for NNS, INS and DNS variants. The calculation was performed separately for (A) low-frequency and (B) high-frequency variants. Low-frequency variants were defined as those with <1% DAF or singletons. AS was first computed from the polymorphic variants in each genotype separately, and then from all polymorphic variants combined (black dot). Genotypes G and H are not shown because of small sample sizes (Supplementary Tables S4 and S5). Error bars denote the 90% confidence intervals of AS derived from 1000 bootstrap replicates. The blue dashed line denotes the neutral index, where AS = 0.
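A percentile bootstrap of the kind behind the error bars can be sketched as follows. Two things here are assumptions rather than statements from the legend: AS is taken to be the log2 odds ratio of the McDonald-Kreitman 2×2 table, and the resampling unit is the individual variant, drawn with replacement from the pooled counts. The function names are hypothetical.

```python
import math
import random


def log2_odds(dn, pn, ds, ps):
    """Assumed form of AS: log2 of (Dn/Pn) / (Ds/Ps)."""
    return math.log2((dn * ps) / (ds * pn))


def bootstrap_ci(dn, pn, ds, ps, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap interval for the assumed AS statistic.

    dn/pn: fixed/polymorphic non-synonymous counts;
    ds/ps: fixed/polymorphic synonymous counts.
    """
    rng = random.Random(seed)
    labels = ["dn", "pn", "ds", "ps"]
    weights = [dn, pn, ds, ps]
    total = sum(weights)
    stats = []
    for _ in range(n_boot):
        # Resample variants with replacement, keeping the total count fixed.
        sample = rng.choices(labels, weights=weights, k=total)
        c = {lab: sample.count(lab) for lab in labels}
        if min(c.values()) == 0:
            continue  # statistic undefined when a cell is empty
        stats.append(log2_odds(c["dn"], c["pn"], c["ds"], c["ps"]))
    stats.sort()
    lo_idx = int((1 - level) / 2 * len(stats))
    hi_idx = int((1 + level) / 2 * len(stats)) - 1
    return stats[lo_idx], stats[hi_idx]
```

An interval that excludes 0 would correspond to an error bar that does not cross the neutral line in the plot.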
MK test results and AS and α statistics for different categories of variants.
| Variant type | Fixed mutations | Polymorphic mutations | AS | α | MK test P value |
|---|---|---|---|---|---|
| synonymous | 383 | 1450 | / | / | / |
| NNS | 242 | 1013 | −0.145 | −0.106 | 0.2735 |
| INS | 288 | 926 | 0.236 | 0.151 | 0.0651 |
| DNS | 121 | 338 | 0.438 | 0.262 | 0.0114 |
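The statistics in this table can be approximately reproduced from the fixed and polymorphic counts. The sketch below uses the standard definition α = 1 − (Ds·Pn)/(Dn·Ps). The other two formulas are assumptions inferred from the tabulated numbers, not definitions given here: AS is taken as the log2 odds ratio of the 2×2 table, and the P value as an uncorrected chi-square test with one degree of freedom. `mk_stats` is a hypothetical helper name.

```python
import math


def mk_stats(dn, pn, ds, ps):
    """McDonald-Kreitman summary statistics from a 2x2 table.

    dn/pn: fixed/polymorphic non-synonymous counts;
    ds/ps: fixed/polymorphic synonymous counts.
    """
    # Assumed AS: log2 of the odds ratio (Dn/Pn) / (Ds/Ps).
    as_stat = math.log2((dn * ps) / (ds * pn))
    # Standard alpha: proportion of fixed differences attributed to adaptation.
    alpha = 1.0 - (ds * pn) / (dn * ps)
    # Assumed test: Pearson chi-square without continuity correction, 1 d.f.
    n = dn + pn + ds + ps
    chi2 = n * (dn * ps - pn * ds) ** 2 / (
        (dn + pn) * (ds + ps) * (dn + ds) * (pn + ps)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {"AS": as_stat, "alpha": alpha, "p": p_value}


# Synonymous counts come from the first row of the table.
for name, dn, pn in [("NNS", 242, 1013), ("INS", 288, 926), ("DNS", 121, 338)]:
    print(name, mk_stats(dn, pn, ds=383, ps=1450))
```

Under these assumptions the three rows agree with the tabulated AS, α and P values to within about 0.001, which supports the inferred formulas but does not confirm them.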
Figure 4. Adaptive evolution of protein-coding genes in HBV. Plots of the AS statistic for protein-coding genes in HBV. (A) Independent gene evolution was estimated from NNS + INS variants, or from INS variants alone (PreS1/PreS2/S contain no NNS variants), and (B) co-evolution of overlapping genes was estimated from DNS variants, using all observed mutations from the eight genotypes. Genes or regions with few observed fixed or polymorphic mutations (<10 mutations) are not shown (Table 2). Error bars indicate 90% confidence intervals and the blue dashed line denotes the neutral index, as in Fig. 3.
MK test results and AS and α statistics for variants in genes.
| Variant type | Gene | Fixed mutations | Polymorphic mutations | AS | α | MK test P value |
|---|---|---|---|---|---|---|
| synonymous | / | 383 | 1450 | / | / | / |
| INS | | 19 | 32 | 1.168 | 0.555 | 0.0049 |
| INS | | 20 | 63 | 0.265 | 0.168 | 0.4839 |
| INS | | 40 | 154 | −0.024 | −0.017 | 0.9283 |
| NNS + INS | | 322 | 1013 | 0.267 | 0.169 | 0.0312 |
| NNS + INS | | 100 | 299 | 0.340 | 0.21 | 0.0669 |
| NNS + INS | | 3 | 22 | −0.954 | −0.937 | 0.2762 |
| NNS + INS | | 26 | 356 | −1.855 | −2.617 | 1.08E-10 |
| DNS | | 68 | 80 | 1.685 | 0.689 | 2.73E-12 |
| DNS | | 13 | 53 | −0.107 | −0.077 | 0.814 |
| DNS | | 20 | 94 | −0.312 | −0.241 | 0.3915 |
| DNS | | 12 | 62 | −0.449 | −0.365 | 0.3302 |
| DNS | | 2 | 9 | −0.247 | −0.187 | 0.8253 |
| DNS | | 6 | 40 | −0.816 | −0.761 | 0.1943 |
The PreS1/PreS2/S ORF is completely overlapped by the P ORF; therefore PreS1, PreS2 and S contain no NNS variants.
Genotype-specific amino acid variants in the PreS1 region.
| Genotype (a) | Typical amino acid variants in PreS1 (b) |
|---|---|
| A | 48I, 54A, 67L, 74I, 89S, 90T, 91I |
| B | 35K, 39E, 45L, 48H, 56H, 87S |
| C | 10Q |
| D | 39A, 51T, 65L, 86Q, 91N, 114N |
| E | 16H, 19T, 39R, 45H, 53T, 84M, 86K, 109T |
| F | 8T, 47K, 84V |
| G | 25L, 48K, 51P, 81S, 84T |
| H | 8A, 47T, 88S, 90P |
| A + B | 10K |
| A + C | 35G, 57Q |
| B + D | 54D, 108L |
| C + D | 60A |
| D + G | 19S |
| E + G | 3L, 4S, 5W, 6T, 7V, 9L, 10E, 11W, 14K, 63Y |
| F + H | 3A, 4P, 5L, 7T, 10R, 33L, 39S, 40S, 54M, 62G, 100R, 104K, 108V |
| A + B + C | 3G, 4W, 5S, 7K, 14T, 88V |
| A + C + E | 51H |
| A + C + G | 39N, 115S |
| B + F + H | 51N |
| C + E + G | 54E |
| D + E + G | 18T, 38T |
| D + F + H | 14Q |
(a) The viral genotype(s) and their specific PreS1 amino acid residues are listed. Only genotype-specific variants found in three or fewer genotypes are shown.
(b) The genotype-specific amino acid variants are represented by their position on the PreS1 peptide (residues 1 to 119) followed by the amino acid symbol.
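The variant notation described in the footnotes (position on the 119-residue PreS1 peptide followed by a one-letter amino acid symbol) is easy to parse for downstream use. The following is a small illustrative sketch; the helper names `parse_variant` and `parse_row` are hypothetical and not part of the original analysis.

```python
import re

PRES1_LENGTH = 119  # PreS1 peptide length stated in the table footnote
_VARIANT = re.compile(r"^(\d+)([A-Z])$")


def parse_variant(token):
    """Turn a token such as '48I' into (48, 'I')."""
    match = _VARIANT.match(token.strip())
    if match is None:
        raise ValueError(f"not a position+residue token: {token!r}")
    position = int(match.group(1))
    if not 1 <= position <= PRES1_LENGTH:
        raise ValueError(f"position outside PreS1: {position}")
    return position, match.group(2)


def parse_row(genotypes, variants):
    """Parse one table row, e.g. ('A + C', '35G, 57Q')."""
    groups = tuple(g.strip() for g in genotypes.split("+"))
    return groups, [parse_variant(v) for v in variants.split(",")]


print(parse_row("A + C", "35G, 57Q"))
```

Parsed rows can then be indexed by position, for example to compare the residues that different genotype groups carry at position 10.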