Anna L McNaughton, Valentina D'Arienzo, M Azim Ansari, Sheila F Lumley, Margaret Littlejohn, Peter Revill, Jane A McKeating, Philippa C Matthews.
Abstract
Hepatitis B virus (HBV) is a unique, tiny, partially double-stranded, reverse-transcribing DNA virus whose proteins are encoded by multiple overlapping reading frames. Its substitution rate is surprisingly high for a DNA virus, but lower than that of other reverse-transcribing organisms. More than 260 million people worldwide have chronic HBV infection, which causes 0.8 million deaths a year. Because of this high burden of disease, international health agencies have set the goal of eliminating HBV infection by 2030. Nonetheless, the intriguing HBV genome has not been well characterized. We summarize data on the HBV genome structure and replication cycle, explain and quantify diversity within and among infected individuals, and discuss the advances offered by next-generation sequencing technology. In-depth HBV genome analyses could increase our understanding of disease pathogenesis and allow us to better predict patient outcomes, optimize treatment, and develop new therapeutics.
Keywords: Diversity; Evolution; Genotype; Hepatitis B Virus
Year: 2018 PMID: 30268787 PMCID: PMC6347571 DOI: 10.1053/j.gastro.2018.07.058
Source DB: PubMed Journal: Gastroenterology ISSN: 0016-5085 Impact factor: 22.682
Figure 1. Relationships between HBV and other hepadnaviruses, genotype diversity, and genome size. (A) Phylogenetic tree showing the relationships among avian, mammalian, and other hepadnaviruses. Hepadnavirus reference sequences for avian (NC_005950.1, NC_001344.1, NC_016561.1, NC_005890.1, NC_001486.1, NC_035210.1, NC_005888.1), mammalian (NC_003977.2, NC_028129.1, NC_024445.1, NC_024444.1, NC_024443.1, NC_020881.1, NC_004107.1, NC_001484.1), and other (NC_027922.1, NC_030446.1, NC_030445.1) species were downloaded from GenBank. This dataset was supplemented with hepadnavirus isolates from chimpanzees, orangutans, and gorillas (AF193863, FJ798097, FJ798098) and some widely cited HBV genotype strains (X02763, D00330, AY123041, V01460, X75657, X69798, AF160501, AY090454). (B) Midpoint-rooted maximum likelihood phylogenetic tree, generated in MEGA7 with 1000 bootstrap replicates, showing the relationships between HBV genotypes and subtypes and their typical geographic distribution. Widely used reference sequences for genotypes A–D and F are included. For genotypes with a single subtype, the reference sequences were used to generate the tree.
The sequences used to generate the tree were genotype A: KP234050.1, HE974376.1, KP234052.1, AY934764.1, KP234053.1, GQ331048.1; genotype B: D50521.1, AB073825.1, GQ924628.1, AB073826.1, AB219427.1, DQ463792.1, AP011091.1, AP011093.1, GQ205440.1, GQ358146.1; genotype C: KM999990.1, KY629637.1, AB554019.1, AB554018.1, AB644281.1, AB644283.1, AB644286.1, AB644287.1, KP017272.1, KU695741.1, KF873519.1, KM999992.1, KM999993.1, AP011107.1, KP017269.1, AP011108.1; genotype D: AB104711.1, HQ700511.1, KP090181.1, FJ692533.2, DQ315780.1, KF170740.1, KP322600.1, FJ904406.1; genotype F: AF223963.1, AY311369.1, AY311370.1, AB166850.1; genotype I: AB562462.1, FJ023671.1; genotype J: AB486012.1; and HBVdb genotype reference sequences for genotypes A–H, respectively: X02763, D00331, AY123041.1, V01460.1, X75657.1, X69798.1, AF160501, AY090454. (C) Relative genome sizes of viruses pathogenic to humans, including HBV (3.2 kb; arrow). Genomes were obtained from https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239 and sorted by nucleotide length. For each virus type, a representative genome was selected. Metadata, including accession numbers for each organism, are available at doi:10.6084/m9.figshare.6080402. CMV, cytomegalovirus; dsRNA, double-stranded RNA; EBV, Epstein-Barr virus; gt, genotype; HAV, hepatitis A virus; HCV, hepatitis C virus; HDV, hepatitis D virus; HEV, hepatitis E virus; HHV, human herpesvirus; HPV, human papillomavirus; HSV, herpes simplex virus; HTLV, human T-lymphotropic virus; LCMV, lymphocytic choriomeningitis virus; MERS, Middle East Respiratory Syndrome; SARS, Severe Acute Respiratory Syndrome; ssDNA, single-stranded DNA; ssRNA, single-stranded RNA; VZV, varicella zoster virus.
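Panel B reports support from 1000 bootstrap replicates. As a rough illustration only, the Python sketch below shows the resampling step of the standard nonparametric (Felsenstein) bootstrap: alignment columns are drawn with replacement to build each pseudo-replicate alignment. It is not MEGA7's implementation; inferring a tree from each replicate and tallying clade support are omitted, and the function names are hypothetical.

```python
import random
from typing import List, Optional


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """One nonparametric bootstrap pseudo-replicate of a multiple sequence alignment.

    Columns are sampled with replacement until the replicate is as long as the
    original; each sequence keeps its own row, so taxa stay aligned.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column indices with replacement: the core of the nonparametric bootstrap.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


def bootstrap_replicates(
    alignment: List[str], n_replicates: int = 1000, seed: Optional[int] = None
) -> List[List[str]]:
    """Generate n_replicates pseudo-alignments (1000 matches the caption's setting)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each pseudo-alignment would then be passed to a tree-building program; the percentage of replicate trees that recover a given clade is reported as that clade's bootstrap support.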
Figure 2. Annotated HBV genome and replication cycle. (A) The 4 overlapping ORFs and the 7 products they encode. Gene products are indicated by text boxes, with start and end positions derived using X02763.1 as a reference strain. The major functional domains of the P gene product are indicated (dotted lines). Large HBs consist of pre-S1, pre-S2, and S; medium HBs consist of pre-S2 and S; and small HBs consist of S only. The overlap of >1000 nucleotides between the P and S genes is the largest gene overlap of any known animal virus. The near-complete negative DNA strand and the partially complete positive DNA strand (dotted line indicates the approximate missing region) are also shown, together with the position of the EcoRI site. The 5′ end of the negative-sense DNA strand is covalently bound to the viral RT. The complementary positive-sense DNA strand is incomplete, covering approximately two thirds of the viral genome. Its 5′ end is defined by a short oligoribonucleotide region; its 3′ end varies within and among hosts. (B) Replication cycle (adapted from Liang, special issue). (i) Infective HBV virions in serum, often referred to as Dane particles (diameter, 42 nm). The capsid has icosahedral symmetry: T = 4 (31 nm; 90% of the population) and T = 3 (28 nm; 10% of the population) [37, 38]. (ii) The virus enters hepatocytes via HSPG (low-affinity binding) and solute carrier family 10 member 1 (SLC10A1, also called sodium taurocholate co-transporting polypeptide [NTCP]; high-affinity binding). (iii) The molecular processes of uncoating and nuclear import are unclear but likely require cellular proteins. (iv) Viral DNA enters the nucleus as RC-DNA. (v) Within the nucleus, viral DNA is converted to cccDNA by the cell's DNA repair factors; this stable structure associates with host histones that mediate DNA packaging. (vi) The open cccDNA structure is a template for host RNA polymerase II.
(vii) cccDNA is transcribed in the nucleus into 4 mRNAs (blue): a 3.5-kb precore transcript (the full-length pregenomic RNA is also shown, in green); 2.4- and 2.1-kb transcripts for pre-S and S, respectively; and a 0.7-kb transcript encoding the X protein. The RNAs are exported to the cytoplasm, where they are translated into 7 viral proteins (small, medium, and large S proteins; core; e antigen; polymerase; and X protein). (viii) HBV RT synthesizes negative-strand DNA from the pregenomic RNA. The RNA template is degraded by RNase H, and synthesis of the positive-strand DNA is then initiated. HBV DNA is repackaged in relaxed circular form with other proteins inside the host cell. (ix) New virions and viral proteins are released into the blood. Excess HBsAg forms small, noninfectious subviral particles (∼20 nm in diameter) and long filaments; free HBeAg and capsids are also secreted. C, core; cccDNA, covalently closed circular DNA; HBeAg, hepatitis B e antigen; HBx, hepatitis B X protein; HSPG, heparan sulfate proteoglycan; NTCP, Na+-taurocholate co-transporting polypeptide; ORF, open reading frame; pol, polymerase; RC-DNA, relaxed circular DNA; RT, reverse transcriptase; TP, terminal protein.
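Because the genome is circular and numbered from the EcoRI site, ORFs such as P and pre-S1/pre-S2/S run through position 1, which makes overlap arithmetic slightly awkward. The Python sketch below computes the overlap of two ORFs on a circular genome. The coordinates are assumptions: they are values commonly quoted for a 3215-bp genome (the reference length used in the genotype table), not the X02763.1 coordinates used in the figure, and should be checked against an annotated sequence before use. The helper names are hypothetical.

```python
from typing import Set, Tuple

GENOME_LENGTH = 3215  # bp; reference length used in the genotype table

# ASSUMED 1-based, inclusive coordinates (EcoRI-based numbering) commonly quoted
# for 3215-bp genomes; verify against an annotated sequence before relying on them.
ORFS = {
    "P": (2307, 1623),       # polymerase ORF, wraps through position 1
    "preS1/S": (2848, 835),  # large HBs ORF (pre-S1 + pre-S2 + S), also wraps
    "S": (155, 835),         # small HBs ORF only
}


def orf_positions(start: int, end: int, genome_length: int = GENOME_LENGTH) -> Set[int]:
    """Positions covered by an ORF on a circular genome; end < start means it wraps past position 1."""
    if start <= end:
        return set(range(start, end + 1))
    return set(range(start, genome_length + 1)) | set(range(1, end + 1))


def overlap_length(
    a: Tuple[int, int], b: Tuple[int, int], genome_length: int = GENOME_LENGTH
) -> int:
    """Number of nucleotides shared by two ORFs on a circular genome."""
    return len(orf_positions(*a, genome_length) & orf_positions(*b, genome_length))


if __name__ == "__main__":
    print("P vs preS1/S:", overlap_length(ORFS["P"], ORFS["preS1/S"]))
    print("P vs S:", overlap_length(ORFS["P"], ORFS["S"]))
```

With these assumed coordinates the entire pre-S1/pre-S2/S ORF (1203 nt) lies within P, consistent with the >1000-nucleotide P/S overlap described in panel A.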
HBV Subtypes and Genotype Features
| Genotype | Subtypes | Geographic distribution | Genome length (bp) | Distinguishing features of HBV sequence |
|---|---|---|---|---|
| A | A1–A4 | Africa, Europe | 3221 | 6-bp insertion in core gene; G1896A |
| B | B1–B5 | Japan, China, Southeast Asia | 3215 | B1 and B5 are pure strains, whereas B2, B3, and B4 are recombinants with genotype C in the core region |
| C | C1–C16 | Southeast Asia, Australia | 3215 | BCP mutations |
| D | D1–D7 | Worldwide, Middle East, West Africa | 3182 | 33-bp deletion in pre-S1 |
| E | No subtypes described | West Africa | 3212 | 3-bp deletion in pre-S1 |
| F | F1–F4 | North and South America | 3215 | G1896A |
| G | No subtypes described | Central America and Europe | 3248 | 36-bp insertion in core and 3-bp deletion in pre-S1; insertion results in a high level of core expression; stop codons at positions 2 and 28 (G1896A) |
| H | No subtypes described | Central America | 3215 | G1896A |
| I | I1–I2 (putative) | Southeast Asia | 3215 | Evolved as a recombinant of genotypes A, C, and G |
| J (putative) | No subtypes described | Japan | 3182 | Single isolate identified in an elderly Japanese patient with HCC; highly divergent from other human HBV strains; likely a genotype C–gibbon recombinant |
NOTE. Further details about genotypes and subtypes can be found in Rajoriya et al and Tong and Revill.
BCP, basal core promoter; HBeAg, hepatitis B e antigen; HCC, hepatocellular carcinoma.
Insertions and deletions are given relative to a genome length of 3215 bp.
The G1896A mutation introduces a premature stop codon in the precore region, resulting in loss of HBeAg expression.
The basal core promoter mutations A1762T and G1764A result in decreased HBeAg expression.
Few sequences of genotype I have been characterized, although the genetic distance between isolates suggests there might be 2 subtypes.
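Because insertions and deletions are stated relative to a 3215-bp genome, the genome lengths in the table follow from simple arithmetic on the listed indels. The minimal Python check below uses only the indels named in the table; genotype J is left out because its length difference is not attributed to a specific indel here, and the dictionary and function names are illustrative.

```python
REFERENCE_LENGTH = 3215  # bp; indels in the table are stated relative to this length

# Indels listed in the table (positive = insertion, negative = deletion, in bp).
# Genotypes with no listed indel get an empty list.
INDELS = {
    "A": [6],        # 6-bp insertion in core
    "B": [],
    "C": [],
    "D": [-33],      # 33-bp deletion in pre-S1
    "E": [-3],       # 3-bp deletion in pre-S1
    "F": [],
    "G": [36, -3],   # 36-bp insertion in core and 3-bp deletion in pre-S1
    "H": [],
    "I": [],
}


def genome_length(indels, reference_length=REFERENCE_LENGTH):
    """Genome length implied by a list of indels relative to the reference length."""
    return reference_length + sum(indels)


if __name__ == "__main__":
    for genotype, indels in INDELS.items():
        # An indel whose size is a multiple of 3 bp does not shift the downstream reading frame.
        in_frame = all(size % 3 == 0 for size in indels)
        print(genotype, genome_length(indels), "in-frame" if in_frame else "frameshift")
```

The computed lengths (3221 for A, 3182 for D, 3212 for E, 3248 for G, 3215 otherwise) match the table, and every listed indel is a multiple of 3 bp.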
Figure 3. HBV diversity. (A) Relation between genome type and substitution rate. Estimates of evolutionary rate (substitutions per nucleotide per year) were taken from Sanjuán and were calculated using Bayesian molecular clock approaches. Median rates of evolution for the different genome types were 9.32 × 10⁻⁶ (interquartile range [IQR], 7.00 × 10⁻⁷ to 7.20 × 10⁻⁵) for dsDNA, 6.36 × 10⁻⁴ (IQR, 1.60 × 10⁻⁴ to 1.88 × 10⁻³) for dsRNA, 1.10 × 10⁻³ (IQR, 4.52 × 10⁻⁴ to 2.69 × 10⁻³) for +ssRNA, 9.17 × 10⁻⁴ (IQR, 3.55 × 10⁻⁴ to 3.40 × 10⁻³) for −ssRNA, and 2.08 × 10⁻⁴ (IQR, 1.36 × 10⁻⁴ to 5.65 × 10⁻⁴) for ssDNA. (B) Distribution of diversity along the HBV genome. Full-length HBV genome sequences were obtained from HBVdb in August 2017 (n = 5383) and aligned using MAFFT (https://mafft.cbrc.jp/alignment/server/). Sequences of each genotype were randomly shuffled using a function within SSE 1.3, and 250 sequences per genotype were randomly selected so that equal numbers of sequences were analyzed for each genotype. Only 225 sequences were available for genotype F; genotypes G, H, I, and J were excluded because too few sequences were available for comparison with the other genotypes. Within-genotype pairwise nucleotide distances were calculated for genotypes A–F in SSE 1.3, using a 150-bp sliding window moved in 20-bp increments. The greatest variability (typically >5% sequence divergence) is observed in regions without overlapping ORFs. Shannon entropy at each nucleotide position in the dataset was calculated using SSE 1.3. (C) Comparison of Shannon entropy at each site in overlapping and nonoverlapping regions of the HBV genome. Genotypes were analyzed individually, and the genome was divided into overlapping and nonoverlapping regions using an annotated genome (https://hbvdb.ibcp.fr/HBVdb/HBVdbGenome).
Mean Shannon entropy is significantly lower in overlapping regions (0.16; 95% confidence interval, 0.14–0.17) than in nonoverlapping regions (0.20; 95% confidence interval, 0.18–0.21; P < .0001, Mann-Whitney U test). C, core; dsDNA, double-stranded DNA; dsRNA, double-stranded RNA; HCV, hepatitis C virus; ORF, open reading frame; ssDNA, single-stranded DNA; ssRNA, single-stranded RNA.
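Panels B and C rest on two standard alignment summaries: Shannon entropy at each column, H = −Σ pᵢ ln pᵢ over the nucleotide frequencies pᵢ observed at that site, and mean pairwise p-distance computed in sliding windows. The Python sketch below implements both under stated assumptions: gap characters ("-") are ignored, entropy uses the natural logarithm, and SSE 1.3's exact conventions are not reproduced, so absolute values need not match the figure. Window and step defaults follow the caption (150 bp and 20 bp); the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

GAP = "-"  # assumed gap character in the alignment


def site_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (natural log) of one alignment column, ignoring gaps."""
    counts = Counter(base.upper() for base in column if base != GAP)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return max(0.0, h)  # avoid returning -0.0 for invariant sites


def per_site_entropy(alignment: List[str]) -> List[float]:
    """Entropy at every column of an alignment of equal-length sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [site_entropy(column) for column in zip(*alignment)]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP or y == GAP:
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def sliding_window_diversity(
    alignment: List[str], window: int = 150, step: int = 20
) -> List[Tuple[int, float]]:
    """Mean pairwise p-distance per window, as (1-based window start, mean distance)."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in alignment]
        distances = [p_distance(a, b) for a, b in combinations(chunk, 2)]
        results.append((start + 1, sum(distances) / len(distances)))
    return results
```

Averaging the per-site entropies separately over overlapping and nonoverlapping positions and comparing the two distributions with a Mann-Whitney U test (for example, `scipy.stats.mannwhitneyu`) would mirror the comparison in panel C.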
Determinants of Diversity and Conservation Within HBV
| HBV attribute | Factors associated with sequence conservation | Factors associated with sequence diversity |
|---|---|---|
| Genome structure and sequence | Overlapping ORFs and regulatory regions in the viral genome | Redundancy within the third codon position in regions where ORFs overlap allows selection of mutations |
| Persistence and transmission | Superior transmission potential of wild-type variants. | Long duration of infection can generate diverse quasispecies populations within hosts. |
| Replication cycle | Stable reservoir of cccDNA with a long half-life | Error-prone viral RT enzyme with a high substitution rate when reverse-transcribing pgRNA into RC-DNA |
| Genotypes | The lowest level of diversity is observed in genotype E | Increased diversity is a feature of some specific genotypes; genotype F diverges considerably from other genotypes |
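BCP, basal core promoter; cccDNA, covalently closed circular DNA; HBeAg, hepatitis B e antigen; ORF, open reading frame; pgRNA, pregenomic RNA; RC-DNA, relaxed circular DNA; RT, reverse transcriptase.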
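The table attributes some diversity to redundancy at the third codon position. The Python sketch below quantifies that redundancy for the standard genetic code (NCBI translation table 1). For every sense codon, it tries each single-nucleotide change at each codon position and records the fraction of changes that leave the encoded amino acid unchanged. It looks at one reading frame at a time. In an overlapping region, a nucleotide that is the third position in one frame is the first or second position in the other, which is consistent with the lower entropy of overlapping regions. The function and constant names are illustrative.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position() -> List[float]:
    """For codon positions 1-3, the fraction of single-nucleotide changes that are synonymous.

    Starting codons are the 61 sense codons; a change to a stop codon counts as nonsynonymous.
    """
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                synonymous[pos] += CODON_TABLE[mutant] == aa
    return [s / t for s, t in zip(synonymous, total)]


if __name__ == "__main__":
    for position, fraction in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"codon position {position}: {fraction:.3f} of changes synonymous")
```

Running it shows that third-position changes are far more often synonymous than first-position changes, and that, starting from sense codons, no second-position change is synonymous.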