Anna L McNaughton, Peter A Revill, Margaret Littlejohn, Philippa C Matthews, M Azim Ansari.
Abstract
Hepatitis B virus (HBV) is a diverse, partially double-stranded DNA virus, with 9 genotypes (A–I), and a putative 10th genotype (J), characterized thus far. Given the broadening interest in HBV sequencing, there is an increasing requirement for a consistent, unified approach to HBV genotype and subgenotype classification. We set out to generate an updated resource of reference sequences using the diversity of all genomic-length HBV sequences available in public databases. We collated and aligned genomic-length HBV sequences from public databases and used maximum-likelihood phylogenetic analysis to identify genotype clusters. Within each genotype, we examined the phylogenetic support for currently defined subgenotypes, as well as identifying well-supported clades and deriving reference sequences for them. Based on the phylogenies generated, we present a comprehensive set of HBV reference sequences at the genotype and subgenotype level. All of the generated data, including the alignments, phylogenies and chosen reference sequences, are available online (https://doi.org/10.6084/m9.figshare.8851946) as a simple open-access resource.
Keywords: HBV; phylogenetics; reference sequences; whole genome
Year: 2020 PMID: 32134374 PMCID: PMC7416611 DOI: 10.1099/jgv.0.001387
Source DB: PubMed Journal: J Gen Virol ISSN: 0022-1317 Impact factor: 3.891
Fig. 1. HBV genome lengths of each genotype. Standard genome lengths, and sites with deletions and insertions, are illustrated for genotypes A–J, along with a map of the HBV genome layout. Deletions and insertions are shown relative to genotype A, which is widely used as a numbering reference for HBV. Deletions are shown as white gaps and sites of insertions are indicated in black with triangles above them. All other genotypes have a 6 bp deletion in the core (C) relative to genotype A (at nucleotide (nt) 2354). Genotypes E and G have a 3 bp deletion in the pre-S1 region (at nt 2861) and genotypes D and J have a 33 bp deletion at the start of the pre-S1 region (at nt 2854) [2]. Genotype G also has a 33 bp insertion in the core (at nt 1903).
Fig. 2. Genomic-length maximum-likelihood phylogeny of all genotype A–I HBV sequences included in analysis (n=2839) after removing highly similar sequences, indicating the number of sequences in each genotype analysed separately (Figs 3–10). Bootstrap support ≥70 after 1000 replicates is given for the deepest branches on the tree. The scale bar indicates the estimated nucleotide substitutions per site. *, a strain known to be from a 14th century skeleton clustering distantly with genotype A, LT992441, was removed from the subsequent analysis. **, KU736915 was identified as a genotype D/E recombinant and removed from the subsequent analysis.
Table 1. Proposed reference sequences for HBV genotypes, subgenotypes and clades
The number of sequences in each clade is given for each subgenotype and clade identified. Note that the sum of the subgenotype sequence clusters may not correspond to the total number of genotype sequences, as a number of sequences did not group within a specific clade. Reference sequences for the genotypes are highlighted in grey boxes. In Figs 3–10, genotype references are marked with blue dots and subgenotype reference sequences are marked with red dots. Subgenotypes B5, C7, C9 and D6 are not included, as these sequences either did not cluster as monophyletic clades or were not retained in our analysis. Hamming distance indicates the number of nucleotide differences between the clade consensus and the chosen reference. The pairwise distance is the number of nucleotide differences between the clade consensus and the chosen reference, normalized by the length of the genome.
| HBV genotype | Subgenotype | No. of sequences | Reference GenBank ID | Hamming distance | Pairwise distance | References | Collection year* | Country of origin |
|---|---|---|---|---|---|---|---|---|
| A | A1 (1) | 50 | KP168423 | 26 | 0.008 | [ | 2012 | Kenya |
| A | | | | 21 | 0.007 | [ | 2006 | Haiti |
| A | A2 | 80 | EU594385 | 5 | 0.002 | [ | 2004 | Estonia |
| A | A3 | 9 | AM184126 | 29 | 0.009 | [ | 2005 | Gabon |
| A | A4 | 2 | KM606737 | | | [ | 2015 | Cuba |
| A | A5 | 25 | FJ692601 | 10 | 0.003 | [ | 2006 | Haiti |
| A | A6 | 2 | GQ331046 | | | [ | 2006 | Belgium |
| B | B1 | 47 | D23679 | 23 | 0.007 | [ | 1993 | Japan |
| B | B2 (1) | 131 | JQ801514 | 15 | 0.005 | [ | 2009 | Thailand |
| B | | | | 10 | 0.003 | [ | 2010 | Taiwan, ROC |
| B | B3 | 106 | AP011085 | 23 | 0.007 | [ | 2001 | Indonesia |
| B | B4 | 69 | AB073835 | 35 | 0.011 | [ | 2001 | Japan |
| B | B6 | 36 | AB287314 | 29 | 0.009 | [ | 2006 | Alaska |
| C | C1† | 240 | DQ089781 | 29 | 0.009 | [ | 2005 | Hong Kong SAR |
| C | C2 (1) | 261 | KC774182 | 21 | 0.007 | [ | 2012 | PR China |
| C | | | | 14 | 0.004 | [ | 2007 | PR China |
| C | C2 (3) | 157 | AP011098 | 10 | 0.003 | [ | 2009 | Indonesia |
| C | C4 | 21 | KF873526 | 68 | 0.021 | [ | 2011 | Australia |
| C | C5 | 15 | AP011099 | 41 | 0.013 | [ | 2009 | Indonesia |
| C | C6‡ | 2 | EU670263 | 28 | 0.009 | [ | 2008 | Philippines |
| C | C8 | 15 | AP011107 | 42 | 0.013 | [ | 2009 | Indonesia |
| C | C10 | 21 | KJ173333 | 33 | 0.010 | [ | 2012 | PR China |
| C | C11§ | 28 | AB554015 | 86 | 0.027 | [ | 2010 | Indonesia |
| C | UA (1) | 31 | KC774298 | 15 | 0.005 | [ | 2012 | PR China |
| C | UA (2) | 16 | DQ089802 | 55 | 0.018 | [ | 2005 | Hong Kong SAR |
| D | D1 (1) | 216 | AB222711 | 11 | 0.003 | [ | 2005 | Uzbekistan |
| D | | | | 17 | 0.005 | [ | 2013 | India |
| D | D2 | 100 | MF925358 | 21 | 0.007 | [ | 2015 | Bangladesh |
| D | D3 | 78 | FJ692507 | 18 | 0.006 | [ | 2006 | Haiti |
| D | D4 | 15 | FJ692533 | 17 | 0.008 | [ | 2006 | Haiti |
| D | D5 | 15 | GQ205389 | 19 | 0.006 | [ | 2008 | India |
| D | D7 | 15 | FJ904435 | 44 | 0.014 | [ | 2006 | Tunisia |
| E | | | | 19 | 0.006 | [ | 2006 | Guinea |
| F | | | | 13 | 0.004 | [ | 2007 | Chile |
| F | F2 | 13 | DQ899143 | 26 | 0.008 | [ | 2006 | Venezuela |
| F | F3 | 19 | MH051986 | 17 | 0.005 | [ | 2011 | Venezuela |
| F | F4 | 18 | KJ843175 | 17 | 0.005 | [ | 2012 | Argentina |
| G | | | | | | [ | 2001 | USA |
| H | | | | 36 | 0.011 | [ | 2008 | Argentina |
| I | | | | 17 | 0.005 | [ | 2007 | Vietnam |
| I | I2 | 4 | FJ023669 | 17 | 0.005 | [ | 2008 | Laos |
| J¶ | | | | | | [ | 2006 | Japan/Borneo |
n/a, not applicable. In the Subgenotype column: the genotype does not diverge into multiple subgenotypes. In the Hamming/Pairwise distance columns: too few sequences belonging to the genotype/subgenotype were identified to generate a consensus sequence for selecting the closest biological isolate.
*Collection date of sample or year submitted to GenBank (if collection date not given).
†C1 is a large clade that also contains sequences labelled as subgenotype C3, with no clear separation between the two sets of sequences.
‡This sequence has been used previously as a subgenotype C7 reference in a number of publications [21, 22]. No other putative C7 sequences are proposed in the literature.
§In clade C11, a large number of sequences were unpublished (>30 of the first closest sequences).
¶Genotype J remains putative, with a single isolate identified in a Japanese patient. The isolate shows considerable divergence from other known HBV strains and is thought to be a recombinant of genotype C and a gibbon HBV isolate.
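The Hamming and pairwise distances reported in Table 1 can be illustrated with a minimal Python sketch. It assumes a simple majority-rule consensus over sequences that are already aligned to equal length, and it picks the isolate with the fewest differences from that consensus. The function names and the toy sequences are hypothetical, and the authors' actual consensus and selection procedure (including how gaps and ambiguous bases were handled) is not described in this record.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length aligned sequences (assumed method)."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be non-empty and aligned to one length")
    # For each alignment column, keep the most frequent character.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_seqs)
    )


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_isolate(clade):
    """Return (isolate id, Hamming distance, pairwise distance) for the isolate
    nearest to the clade consensus. `clade` maps isolate id -> aligned sequence."""
    cons = consensus(list(clade.values()))
    best_id = min(clade, key=lambda name: hamming(clade[name], cons))
    distance = hamming(clade[best_id], cons)
    # Pairwise distance = Hamming distance normalized by sequence length.
    return best_id, distance, distance / len(cons)


# Toy 10-nt "clade" (hypothetical data, not real HBV sequences).
toy_clade = {
    "iso1": "TCGTACGTAC",
    "iso2": "ACGTTCGTAC",
    "iso3": "ACGTACGTAA",
    "iso4": "ACGAACGTCC",
}
print(closest_isolate(toy_clade))  # ('iso1', 1, 0.1)
```

When several isolates are equally close to the consensus, this sketch returns the first one encountered; a real workflow would need an explicit tie-breaking rule.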
Fig. 5. Genomic-length maximum-likelihood phylogeny of HBV genotype C sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, GQ377617, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. We were unable to verify the subgenotype of two genotype C clades, and these have been designated unassigned clades 1 and 2 [unassigned_C (1) and unassigned_C (2), respectively].
Fig. 11. Genomic-length maximum-likelihood phylogenetic tree of HBV genotype, subgenotype and clade reference strains identified in Figs 3–10 and listed in Table 1 with accession numbers. The genotype is given in each case and the subgenotype or clade identification is given where possible. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. In addition to the references identified in Figs 3–10, genotype A isolate X02763 and genotype D isolate NC_003977.2 have been included in the tree.
Fig. 12. Pairwise distance distribution for the genomic-length sequences of HBV genotypes A, B, C, D, E, F, H and I. Probability densities of pairwise distances for whole-genome sequences of HBV genotypes. Genotypes E, F, H and I are shown on a separate plot from genotypes A–D as they contained a smaller number of sequences. Too few sequences were available after filtering for genotype G and (putative) genotype J to be analysed.
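A within-genotype pairwise distance distribution of the kind summarized in Fig. 12 could be computed along the following lines. This is only a sketch under stated assumptions: sequences are already aligned to equal length, every differing column counts as one difference, and the result is the unsmoothed proportion of sequence pairs at each distance rather than the smoothed probability density plotted in the figure. The function name and toy input are hypothetical.

```python
from collections import Counter
from itertools import combinations


def distance_distribution(aligned_seqs):
    """Map each normalized pairwise distance to the proportion of sequence
    pairs showing that distance (empirical distribution, no smoothing)."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and aligned to one length")

    # Count differing positions for every unordered pair of sequences.
    diffs = [
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(aligned_seqs, 2)
    ]
    counts = Counter(diffs)
    n_pairs = len(diffs)
    # Normalize differences by alignment length and counts by number of pairs.
    return {d / length: c / n_pairs for d, c in sorted(counts.items())}


# Toy 4-nt sequences (hypothetical data, not real HBV sequences).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
for distance, proportion in distance_distribution(toy).items():
    print(f"{distance:.2f}\t{proportion:.3f}")
```

The number of pairs grows quadratically with the number of sequences, so for the thousands of genomes analysed here a vectorized or sampled approach would likely be preferable.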