HBVdb: a knowledge database for Hepatitis B Virus.

Juliette Hayer, Fanny Jadeau, Gilbert Deléage, Alan Kay, Fabien Zoulim, Christophe Combet.

Abstract

We have developed a specialized database, HBVdb (http://hbvdb.ibcp.fr), that allows researchers to investigate the genetic variability of Hepatitis B Virus (HBV) and viral resistance to treatment. HBV is a major health problem worldwide, with more than 350 million individuals chronically infected. HBV is an enveloped DNA virus that replicates by reverse transcription of an RNA intermediate. The HBV genome is highly optimized: it is circular and encodes four overlapping reading frames, so that each nucleotide of the genome takes part in the coding of at least one protein. However, HBV shows some genome variability, leading to at least eight different genotypes and recombinant forms. The main drugs used to treat infected patients are nucleos(t)ide analogs (reverse transcriptase inhibitors). Unfortunately, HBV mutants resistant to these drugs may be selected and cause treatment failure. HBVdb contains a collection of computer-annotated sequences based on manually annotated reference genomes. The database can be accessed through a web interface that allows static and dynamic queries and offers integrated generic sequence analysis tools and specialized analysis tools (e.g. annotation, genotyping, drug resistance profiling).

Substances: Viral Proteins

Year: 2012 PMID: 23125365 PMCID: PMC3531116 DOI: 10.1093/nar/gks1022

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Hepatitis B virus (HBV) is a major health problem worldwide, with more than 350 million chronic carriers. Chronic HBV infection is associated with a significantly increased risk of developing severe liver disease, including liver cirrhosis and hepatocellular carcinoma (HCC), one of the most common forms of human cancer. The estimated risk of HCC in chronic HBV carriers is ∼100 times greater than in uninfected individuals (1). Currently available anti-HBV drugs have limitations: interferon alpha administration is associated with adverse reactions, while nucleos(t)ide analogs are virostatic and require long-term administration (2,3). HBV is an enveloped DNA virus that belongs to the Hepadnaviridae, a family of hepatotropic DNA viruses infecting certain mammalian or avian hosts (4). It contains a small (∼3.2 kb), partially double-stranded, relaxed-circular DNA (rcDNA) genome that replicates by reverse transcription of an RNA intermediate, the pregenomic RNA (pgRNA). The genome encodes four overlapping open reading frames (ORFs) that are translated to produce the viral core protein (5,6), the surface proteins (5,7), a polymerase/reverse transcriptase (RT) (2,4) and HBx (8,9). The HBV life cycle starts with the binding of the virus to an unknown receptor on the host cell, after which the viral particle is internalized. The virion rcDNA is delivered to the nucleus, where it is repaired to form covalently closed circular DNA (cccDNA). The episomal cccDNA serves as the template for the transcription of the pgRNA and the other viral mRNAs by the host RNA polymerase II (10). The viral genome is variable because the viral polymerase is error-prone and lacks proofreading activity. Eight HBV genotypes, designated A–H, are defined by >8% nucleotide divergence over the entire genome, and they have distinct geographical distributions (11–13). Recombinant forms involving different genotypes have also been reported (14).
However, the extensive overlap between the four ORFs limits the diversity that the virus can tolerate: every nucleotide participates in the coding of at least one viral protein, attesting to a highly optimized small genome (15,16). Genome variability is further constrained by the environmental pressure exerted by the host immune response and, in treated patients, by antiviral drugs. Several databases and repositories (17–19) have been published to allow researchers to investigate the genetic variability of HBV sequences and viral resistance to treatment. In addition, virologists can use HBV-specific genotyping tools (20–22) as well as tools for drug resistance mutation analysis, some of which are freely accessible (23) while others require registration (24). We have developed HBVdb, a database containing a collection of HBV sequences computer-annotated on the basis of manually annotated reference genomes. The input sequences are the partial and complete HBV genomes publicly available in the INSDC (25). The database can be queried via a web interface, and the query results can be further analysed with the numerous integrated generic and specialized (e.g. annotation, genotyping, drug resistance profiling) analysis tools.

DATABASE BUILDING

We developed a fully automated procedure to annotate all HBV sequences from the European Nucleotide Archive (ENA) (26), using a reference set of 16 manually annotated, non-recombinant complete genomes representing the eight genotypes. The HBVdb building process starts with the retrieval of all HBV entries from ENA. The second step is the automatic annotation of these entries in their text (ENA flat file) format. The annotated HBVdb entries are then loaded into a PostgreSQL relational database. Finally, sequence datasets are extracted, and multiple sequence alignments and associated data are computed. HBVdb is updated monthly. The programs for automatic annotation, querying and database management are implemented in Java and SQL.

ANNOTATION PROCEDURE

A standard numbering system for HBV genomes exists, which takes the (often hypothetical) EcoRI restriction site as the origin of the genome (27). However, because the HBV genome is circular, sequences deposited in generic public databases do not always follow this system. Such sequences produce one or several partial pairwise alignments with the reference genomes. To circumvent this problem, we modeled each HBV reference genome by duplicating its sequence (from ∼3.2 kb to ∼6.4 kb). The first step of the automated annotation procedure is a similarity search with the FASTA program (28) (Figure 1A) to identify the reference genome most similar to the sequence to annotate. The second step checks whether the query sequence follows the EcoRI numbering by determining whether it is aligned to only one replicate or overlaps both. In the latter case, the part of the query sequence aligned to the second replicate is shifted to the corresponding region of the first replicate, and any gap between the shifted part and the fixed part is filled with ‘n’. This check ensures that all sequences follow the EcoRI numbering system. The third step optimizes the global query-reference pairwise alignment in order to avoid erroneous translation and to detect non-functional sequences. For each coding sequence (CDS) found in the query, a pairwise alignment is extracted from the global one and divided into non-overlapping 3-nt windows (i.e. codons). If a window contains one or two gaps, the process tries to rearrange the gaps so that the window contains either 3 nt or three gaps (a codon deletion). If this optimization fails, the entry is discarded from the database. In the fourth step, the reference genome features (e.g. CDS, mat_peptide) are mapped onto the sequence to annotate when they are present. The sequence is genotyped in a fifth step.
The last step is drug resistance profiling. Finally, the annotated HBV entry is formatted as an ENA text entry for later inclusion in the relational database.
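The coordinate handling of the second step and the codon check of the third step can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the HBVdb Java implementation: the query is assumed to align to the duplicated reference without indels, starting at the 1-based position `start`, and the names `rotate_to_origin` and `codon_windows` are hypothetical. The gap-rearrangement heuristic itself is not described in detail above, so the sketch only flags the codon windows that would need it.

```python
def rotate_to_origin(query: str, start: int, genome_len: int) -> tuple[str, int]:
    """Map a query aligned to a duplicated reference (2 x genome_len) back to
    replicate-1 (reference) coordinates.

    Simplifying assumption: the query aligns ungapped, with its first base at
    the 1-based position `start` of the duplicated reference.
    Returns the re-ordered sequence and its 1-based start on replicate 1.
    """
    if len(query) > genome_len:
        raise ValueError("query is longer than the genome")
    end = start + len(query) - 1  # last position covered on the duplicated reference
    if end <= genome_len:
        return query, start  # entirely on replicate 1: already in reference numbering
    if start > genome_len:
        return query, start - genome_len  # entirely on replicate 2: fold coordinates
    # The query overlaps both replicates: split it at the replicate boundary.
    split = genome_len - start + 1
    fixed = query[:split]    # covers positions start..genome_len of replicate 1
    shifted = query[split:]  # covers positions 1..(end - genome_len) once shifted
    missing = genome_len - len(query)  # positions between shifted and fixed parts
    return shifted + "n" * missing + fixed, 1


def codon_windows(aligned_cds: str, gap: str = "-") -> list[str]:
    """Classify the non-overlapping 3-column windows of an aligned query CDS."""
    if len(aligned_cds) % 3:
        raise ValueError("aligned CDS length must be a multiple of 3")
    labels = []
    for i in range(0, len(aligned_cds), 3):
        n_gaps = aligned_cds[i:i + 3].count(gap)
        if n_gaps == 0:
            labels.append("codon")
        elif n_gaps == 3:
            labels.append("deletion")  # a whole-codon deletion keeps the frame
        else:
            labels.append("needs_optimization")  # 1 or 2 gaps would shift the frame
    return labels
```

For a toy 10-nt genome, `rotate_to_origin("ABCDEFGH", 6, 10)` returns `("FGHnnABCDE", 1)`: the three bases aligned to the second replicate move to positions 1 to 3, and the two positions not covered by the query become `n`.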

Figure 1.

Annotation and genotyping processes. (A) The annotation procedure starts with the computation of a pairwise alignment between the query sequence (Q) and the most similar duplicated reference genome sequence (R1, R2: replicates 1 and 2; nn indicates missing nucleotides between the shifted and fixed parts of the query sequence). This alignment is split into CDS alignments that are optimized before the mapping of features and the transfer of annotations. (B) Genotyping process. Pairwise alignments between the query (Q) and each reference sequence (e.g. GenoA, the genotype A reference genome) are scanned with sliding windows to compute a matrix of mean identities. The matrix is used to produce an array giving the genotype at each query sequence position, from which the predicted genotype is deduced.

GENOTYPING

Starting from all the pairwise query-reference alignments (Figure 1B), the algorithm computes a matrix containing the identity percentage at each query position for each genotype. At each query position, this value is the sum of the identity percentages of the overlapping sliding windows (window length 301 nt, step 1 nt) covering that position, divided by the number of such windows. For each query position, the genotype with the maximum mean identity percentage is taken from the matrix and its letter is stored in an array, provided that this maximum is at least 90%. The genotype is then computed from this array. Only sequences at least as long as the window can be processed. The genotyping algorithm identifies ‘pure’ genotype sequences as well as recombinant ones (14,29). The result file reports the number of informative positions used in the genotype computation, which can serve as a confidence value for the prediction. The accuracy of our genotyping tool is similar to that of the Oxford (20,30), jpHMM (21) and NCBI (22) HBV genotyping tools, including for the detection of recombinant genomes. The HepSEQ genotyping tool (17) uses only the polymerase/surface genes in the genotype computation and is less accurate in recombinant detection (Supplementary Tables S1 and S2).
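The per-position identity averaging and the 90% threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the HBVdb code: each query-reference alignment is assumed to have already been projected onto query coordinates as a 0/1 match vector of the same length for every genotype. The rule that turns the per-position array into the final call is not specified above, so the hypothetical `summarize` helper only reports the genotype letters present, the number of positions that received a call (treated here, as an assumption, as the informative positions) and the run-length segments.

```python
from itertools import accumulate, groupby


def mean_window_identity(matches: list[int], window: int = 301) -> list[float]:
    """Mean identity (%) at each query position over all sliding windows
    (step 1) that cover it. matches[j] is 1 if query position j is identical
    to the reference, else 0."""
    n = len(matches)
    if n < window:
        raise ValueError("query is shorter than the window")
    prefix = [0, *accumulate(matches)]
    n_win = n - window + 1
    # Identity percentage of the window starting at each position s
    win_id = [100.0 * (prefix[s + window] - prefix[s]) / window for s in range(n_win)]
    win_prefix = [0.0, *accumulate(win_id)]
    means = []
    for j in range(n):
        lo, hi = max(0, j - window + 1), min(j, n_win - 1)  # windows covering j
        means.append((win_prefix[hi + 1] - win_prefix[lo]) / (hi - lo + 1))
    return means


def genotype_per_position(matches_by_genotype: dict[str, list[int]],
                          window: int = 301, threshold: float = 90.0) -> list[str]:
    """Best-matching genotype letter per position, or '-' below the threshold."""
    means = {g: mean_window_identity(m, window) for g, m in matches_by_genotype.items()}
    n = len(next(iter(means.values())))
    calls = []
    for j in range(n):
        best = max(means, key=lambda g: means[g][j])
        calls.append(best if means[best][j] >= threshold else "-")
    return calls


def summarize(calls: list[str]) -> tuple[list[str], int, list[tuple[str, int]]]:
    """Genotypes seen, number of called positions, and run-length segments."""
    genotypes = sorted({c for c in calls if c != "-"})
    informative = sum(c != "-" for c in calls)
    segments = [(g, len(list(run))) for g, run in groupby(calls)]
    return genotypes, informative, segments
```

With a single letter in `genotypes`, a sequence would look like a pure genotype, while several letters would point to a possible recombinant; any minimum segment length or other rule applied by the real tool is not modeled here.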

DRUG RESISTANCE PROFILING

An alignment between the query protein sequence and a reference reverse transcriptase (RT) sequence is computed. The algorithm then searches the query sequence for mutations defining known resistance profiles to lamivudine, telbivudine, entecavir, adefovir and tenofovir (31). If all the mutations of a profile are found, the profile is reported with the associated drug and resistance status. There are three possible resistance statuses: ‘Sensitive’ (S), ‘Intermediate’ (I, reduced susceptibility) and ‘Resistant’ (R). The output gives the detected profile, the status, and the mutation positions both in the query sequence and in the RT numbering system (32). In its current implementation, unlike the HepSEQ polymerase annotator (17), our resistance tool does not search for antiviral drug-associated potential vaccine-escape mutants [ADAPVEM (33)].
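The rule that a profile is reported only when all of its mutations are present can be sketched as below. The profile table is a small illustrative placeholder, not the curated set of profiles from reference (31) used by HBVdb, and the query is assumed to be already projected onto RT numbering as a hypothetical `{position: residue}` mapping; `Profile` and `detect_profiles` are hypothetical names.

```python
import re
from dataclasses import dataclass

# e.g. "rtM204I": reference residue M at RT position 204 replaced by I
MUTATION = re.compile(r"rt([A-Z])(\d+)([A-Z])")


@dataclass(frozen=True)
class Profile:
    drug: str
    mutations: tuple[str, ...]
    status: str  # "S", "I" or "R"


# Illustrative placeholder entries only; not the HBVdb profile table.
PROFILES = [
    Profile("lamivudine", ("rtM204I",), "R"),
    Profile("lamivudine", ("rtL180M", "rtM204V"), "R"),
    Profile("adefovir", ("rtN236T",), "R"),
]


def parse_mutation(name: str) -> tuple[str, int, str]:
    match = MUTATION.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised mutation: {name}")
    ref_aa, pos, alt_aa = match.groups()
    return ref_aa, int(pos), alt_aa


def detect_profiles(query_rt: dict[int, str], profiles=PROFILES):
    """Report (drug, status, mutations) for every profile whose mutations are
    all present in the query; positions absent from the query never match."""
    hits = []
    for profile in profiles:
        required = [parse_mutation(m) for m in profile.mutations]
        if all(query_rt.get(pos) == alt for _, pos, alt in required):
            hits.append((profile.drug, profile.status, profile.mutations))
    return hits
```

Requiring every mutation of a profile means that a query carrying only rtL180M does not match the two-mutation profile; a partial RT sequence that lacks a profile position cannot match it either.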

DATABASE CONTENT

The text format of an HBVdb entry is an extension of the ENA-Annotation format, as used for euHCVdb (34). Some elements of the ENA entry are kept, such as the accession number, the organism name, the creation date and the references. During the annotation procedure, some elements, mainly the features, are corrected and/or completed. In particular, new qualifiers storing specific data are added to some features. The PRABI_genotype qualifier is added to the source feature to indicate the provisional genotype predicted by the genotyping tool. The PRABI_name qualifier is added to each feature to ensure standard names across all database entries. For protein annotations, PRABI_prodft qualifiers are added to the mat_peptide features. These qualifiers follow the feature table format of the UniProtKB database (35) and describe the protein chains, the protein domains, sites such as active sites (act_site) designating the catalytic residues of enzymes, and the resistance mutations (res_mut) with the drug and the resistance status.
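To make the qualifier layout concrete, the sketch below reads features and their qualifiers from EMBL-style feature-table lines, where feature keys start at column 6 and qualifiers at column 22. It assumes single-line qualifier values and ignores continuation lines. The sample lines, including the `PRABI_name` value and the coordinates, are invented for illustration and are not taken from a real HBVdb entry; `parse_features` is a hypothetical helper.

```python
import re

# EMBL-style feature table: feature key at column 6, qualifiers at column 22
FEATURE = re.compile(r"^FT   (\S+)\s+(\S+)$")
QUALIFIER = re.compile(r'^FT {19}/(\w+)(?:="?(.*?)"?)?$')


def parse_features(entry_text: str) -> list[dict]:
    """Group qualifier (name, value) pairs under the feature they follow.
    Simplification: multi-line (continued) qualifier values are not handled."""
    features = []
    for line in entry_text.splitlines():
        if m := FEATURE.match(line):
            features.append({"key": m.group(1), "location": m.group(2), "qualifiers": []})
        elif (m := QUALIFIER.match(line)) and features:
            features[-1]["qualifiers"].append((m.group(1), m.group(2)))
    return features


PAD = " " * 19  # "FT" followed by 19 spaces puts the qualifier slash in column 22
SAMPLE = "\n".join([  # invented example, not a real HBVdb entry
    "FT   source          1..3215",
    f'FT{PAD}/PRABI_genotype="A"',
    "FT   CDS             1814..2452",
    f'FT{PAD}/PRABI_name="core"',
])
```

Applied to `SAMPLE`, `parse_features` yields a `source` feature carrying `("PRABI_genotype", "A")` and a `CDS` feature carrying `("PRABI_name", "core")`.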

WEB INTERFACE

HBVdb is accessible through a website (Figure 2A) divided into two parts. In the static part, users can find general information about HBV and the nomenclature used, through pages on genome organization, protein descriptions (Figure 2B), the reference genomes and the genotypes. Users can also access pre-computed ‘Nucleotide’ and ‘Protein’ datasets arranged by genotype (rows) and by protein or CDS name (columns). The datasets provide full-length sequences in Pearson/FASTA format, their multiple sequence alignments computed with MUSCLE (36) and displayed in Clustal W format (Figure 2C), and the corresponding residue repertoires with Shannon entropies, which are useful for identifying conserved and variable alignment positions. The corresponding files can be downloaded for further analysis, and the alignments can be edited interactively with the ‘EditAlignment’ applet developed by our team. In the dynamic part, users can extract their own datasets by combining multiple criteria (e.g. genotype, sequence length and protein name). These datasets can be exported as Pearson/FASTA sequences, accession number lists and entry flat files for further analysis with the integrated analysis tools.
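The Shannon entropy attached to each alignment position can be computed as sketched below. The conventions are assumptions, since the text above does not specify them: entropy is expressed in bits (log base 2), gap characters are excluded from the residue counts, and an all-gap column is given an entropy of 0. Low values indicate conserved positions and high values variable ones; `column_entropies` is a hypothetical name.

```python
from collections import Counter
from math import log2


def column_entropies(alignment: list[str], gap: str = "-") -> list[float]:
    """Shannon entropy (bits) of the residue repertoire of each alignment column."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    entropies = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)  # residue repertoire
        total = sum(counts.values())
        # H = sum(p * log2(1/p)) with p = c / total; an all-gap column gives 0.0
        entropies.append(float(sum(c / total * log2(total / c) for c in counts.values())))
    return entropies
```

A fully conserved column scores 0 bits, while a column in which four residues are equally frequent scores 2 bits.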

Figure 2.

Snapshot of the HBVdb web interface. (A) The HBVdb homepage, with the menu bar and all the menus repeated as buttons for ease of use. (B) The HBV core protein information page. (C) The multiple sequence alignment of the genotype H SHBs proteins. (D) The Annotate tool detailed result page for an input nucleotide sequence, showing the CDS positions with links to their alignments, the genotype and the resistance status. (E) The Resistance tool result page showing the resistance status and the resistance profile.

The available analysis tools are either generic or specialized. The generic analysis tools (e.g. BLAST (37) or Clustal W (38)) are available through NPS@ (39), an integrated sequence analysis web server. The specialized ‘Annotate’ (Figure 2D), ‘Genotype’ and ‘Resistance’ (Figure 2E) tools allow the analysis of one or several HBV sequences uploaded by the user. Annotating a nucleotide sequence produces a result page that lists the CDS found in the query sequence, presents the predicted genotype, and gives the drug resistance status (with a link to the resistance output file) if the sequence contains the CDS of the RT domain. The result page also gives access to the global pairwise alignment between the query sequence and the reference genome, to the pairwise alignment of each CDS, and to the text entry of the annotated sequence. Annotating a protein sequence produces a result page indicating the most similar sequence used for the annotation, with links to its entry and to the pairwise alignment, and the resistance status if the sequence contains the RT domain. The ‘Genotype’ tool genotypes nucleotide sequences and produces a result page giving the predicted genotype and the genotype computed at each position of the query sequence. The ‘Resistance’ tool detects known drug resistance mutations in nucleotide or protein sequences.
Its output lists the drug, the resistance status (R, I or S, or ‘n.a.’ if the query sequence does not contain the HBV RT domain) and the identified mutations.

STATISTICS

HBVdb has been available since June 2012. Release 4 (September 2012) comprises 39 289 sequences, including 3606 complete genomes.

CONCLUSIONS AND PERSPECTIVES

HBVdb is a collection of human HBV sequences annotated by a computer-automated system. The automatic annotation process used to generate HBVdb entries guarantees standard annotations and allows efficient keyword searches. The HBVdb website gives researchers access to pre-computed datasets and lets them extract data through a dynamic query system. The extracted data can be further analysed with the set of bioinformatics tools available on the NPS@ server. The HBVdb website also enables users to annotate their own sequences with the HBVdb automatic annotation process. In the future, the annotation will be enriched with new data (e.g. mapping of promoters), and new analysis tools (e.g. ADAPVEM detection) will be added. HBVdb and its related programs will be useful for analysing the flood of sequences generated by next-generation sequencing (NGS) technologies.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2.

FUNDING

J.H. is the recipient of a doctoral fellowship from the Ministère de l’Enseignement Supérieur et de la Recherche. HBVdb is hosted on the Pôle Rhône-Alpes de BioInformatique platform funded by the Groupement d’Intérêt Scientifique Infrastructures en Biologie Santé et Agronomie (GIS IBiSA). Funding for open access charge: Groupement d’Intérêt Scientifique Infrastructures en Biologie Santé et Agronomie (GIS IBiSA).

Conflict of interest statement. None declared.

REFERENCES (39 in total; 10 listed)

C Combet, C Blanchet, C Geourjon, G Deléage. NPS@: network protein sequence analysis. Trends Biochem Sci, 2000-03.

Tulio de Oliveira, Koen Deforche, Sharon Cassol, Mika Salminen, Dimitris Paraskevis, Chris Seebregts, Joe Snoeck, Estrelita Janse van Rensburg, Annemarie M J Wensing, David A van de Vijver, Charles A Boucher, Ricardo Camacho, Anne-Mieke Vandamme. An automated genotyping system for analysis of HIV-1 and other microbial sequences. Bioinformatics, 2005-08-02.

EASL Clinical Practice Guidelines: management of chronic hepatitis B. J Hepatol, 2008-10-29.

Lilly K W Yuen, Anna Ayres, Margaret Littlejohn, Danielle Colledge, Andrew Edgely, William J Maskill, Stephen A Locarnini, Angeline Bartholomeusz. SeqHepB: a sequence analysis program and relational database system for chronic hepatitis B. Antiviral Res, 2006-12-22.

W R Pearson, D J Lipman. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A, 1988-04.

Natalia M Araujo, Ricardo Waizbort, Alan Kay. Hepatitis B virus infection from an evolutionary point of view: how viral, host, and environmental factors shape genotypes and subgenotypes. Infect Genet Evol, 2011-04-22.

Fabien Zoulim, Stephen Locarnini. Hepatitis B virus resistance to nucleos(t)ide analogues. Gastroenterology, 2009-09-06.

Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res, 2010-11-04.

Ilene Karsch-Mizrachi, Yasukazu Nakamura, Guy Cochrane. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res, 2011-11-12.

Saravanamuttu Gnaneshan, Samreen Ijaz, Joanne Moran, Mary Ramsay, Jonathan Green. HepSEQ: International Public Health Repository for Hepatitis B. Nucleic Acids Res, 2006-11-27.
