Guang Lan Zhang, Angelika B Riemer, Derin B Keskin, Lou Chitkushev, Ellis L Reinherz, Vladimir Brusic.
Abstract
High-risk human papillomaviruses (HPVs) cause many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal cancers. To facilitate the diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity, and extracting knowledge from this large volume of immunological data with data mining techniques remains challenging. To support the integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder that streamlines the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each supporting a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated entries of antigenic proteins derived from 18 high-risk and 18 low-risk HPV genotypes. HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected from UniProtKB and annotated. T cell epitopes and HLA ligands were collected by data mining of the scientific literature and databases. The data were subjected to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for in-depth analysis has been integrated within HPVdb, including sequence comparison using BLAST search, multiple alignment of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis.
Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To the best of our knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/
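Before HLA binding can be predicted, each antigen has to be broken into candidate peptides. The sketch below assumes that candidates are all overlapping windows of a fixed length, using 9-mers for class I and 15-mers for class II; these lengths are common conventions, not values stated for HPVdb, and `overlapping_peptides` and `candidate_peptides` are hypothetical helpers. The resulting peptides would then be scored by a predictor such as NetMHCpan or NetMHCIIpan, which is not reproduced here.

```python
from collections.abc import Iterator

# Assumed window lengths (common conventions, not HPVdb's stated settings).
CLASS_I_LENGTH = 9
CLASS_II_LENGTH = 15


def overlapping_peptides(sequence: str, length: int) -> Iterator[tuple[int, str]]:
    """Yield (1-based start position, peptide) for every window of `length` residues."""
    seq = sequence.upper()
    # A sequence shorter than `length` produces no windows.
    for start in range(len(seq) - length + 1):
        yield start + 1, seq[start:start + length]


def candidate_peptides(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Collect class I and class II candidate windows for one antigen sequence."""
    return {
        "class_I": list(overlapping_peptides(sequence, CLASS_I_LENGTH)),
        "class_II": list(overlapping_peptides(sequence, CLASS_II_LENGTH)),
    }


toy = "MHGDTPTLHEYMLDLQPETT"  # toy 20-residue sequence for illustration
peptides = candidate_peptides(toy)
print(len(peptides["class_I"]), len(peptides["class_II"]))  # 12 6
```

A sequence of length L yields L - k + 1 windows of length k, so the 20-residue toy sequence gives 12 class I and 6 class II candidates.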
Year: 2014 PMID: 24705205 PMCID: PMC3975992 DOI: 10.1093/database/bau031
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Schematic overview of the KB-builder framework.
Figure 2. Classification of the viruses in HPVdb based on cancer risk, using the virus classification system suggested by the International Committee on Taxonomy of Viruses (ICTV).
Number of antigen entries in HPVdb, grouped by UniProt review status and sequence type
| Sequence Type | Reviewed | Not reviewed | Total |
|---|---|---|---|
| Complete sequence | 160 | 1684 | 1844 |
| Fragment | 2 | 935 | 937 |
| Total | 162 | 2619 | 2781 |
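The table is a cross-tabulation of antigen entries on two attributes: UniProt review status (reviewed entries come from Swiss-Prot, unreviewed ones from TrEMBL) and whether the stored sequence is complete or a fragment. A minimal sketch, assuming each entry is a dictionary with hypothetical boolean fields `fragment` and `reviewed`, rebuilds the counts and checks that the marginal totals add up.

```python
from collections import Counter

ROWS = ("Complete sequence", "Fragment")
COLUMNS = ("Reviewed", "Not reviewed")


def review_status_table(entries: list[dict]) -> dict[str, dict[str, int]]:
    """Cross-tabulate entries by sequence type and UniProt review status, with totals."""
    # 'fragment' and 'reviewed' are hypothetical boolean fields on each entry.
    cells = Counter(
        (ROWS[1] if e["fragment"] else ROWS[0], COLUMNS[0] if e["reviewed"] else COLUMNS[1])
        for e in entries
    )
    table: dict[str, dict[str, int]] = {}
    for row in ROWS:
        counts = {col: cells[(row, col)] for col in COLUMNS}  # missing cells count as 0
        counts["Total"] = sum(counts.values())
        table[row] = counts
    # The bottom row sums each column over both sequence types.
    table["Total"] = {col: sum(table[row][col] for row in ROWS) for col in (*COLUMNS, "Total")}
    return table


# Rebuild entry flags from the four cell counts in the table above.
entries = (
    [{"fragment": False, "reviewed": True}] * 160
    + [{"fragment": False, "reviewed": False}] * 1684
    + [{"fragment": True, "reviewed": True}] * 2
    + [{"fragment": True, "reviewed": False}] * 935
)
table = review_status_table(entries)
print(table["Total"])  # {'Reviewed': 162, 'Not reviewed': 2619, 'Total': 2781}
```

The grand total of 2781 matches the number of curated antigen entries reported in the abstract.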
Figure 3. Screenshots of the HPV antigen search tool and result pages. (A) HPV antigen search page. (B) Search result page; the accession numbers in the result table are hyperlinked to HPV antigen information pages. (C) Information table for HPV00092 (UniProt ID: P06788).
Figure 4. (A) Screenshot of a T cell epitope record table in HPVdb. This table catalogs the relevant information for T cell epitope T000125, i.e. the epitope sequence, restricting HLA allele, PubMed ID(s) of the reference paper(s) and its characteristics (e.g. how the epitope was identified). A multiple sequence alignment of the protein sequences containing the epitope (highlighted) is displayed. (B) Screenshot of the conservation analysis result page obtained by clicking the ‘check conservation of T cell epitope T000125’ button.
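Conservation analysis asks how many of the antigen sequences contain a given epitope. The sketch below is a simplified stand-in, assuming conservation is measured as the percentage of sequences that contain an ungapped window matching the epitope at or above a percent-identity threshold (100 means an exact match). Whether HPVdb's tool uses exactly this criterion is an assumption, and `best_identity` and `conservation` are hypothetical names.

```python
def best_identity(epitope: str, protein: str) -> float:
    """Highest percent identity between the epitope and any same-length window of the protein."""
    if not epitope:
        raise ValueError("epitope must be non-empty")
    protein = protein.replace("-", "")  # drop alignment gaps before scanning
    k = len(epitope)
    best = 0.0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, 100.0 * matches / k)
    return best


def conservation(epitope: str, proteins: list[str], min_identity: float = 100.0) -> float:
    """Percentage of proteins with at least one window at or above `min_identity`."""
    if not proteins:
        return 0.0
    hits = sum(best_identity(epitope, p) >= min_identity for p in proteins)
    return 100.0 * hits / len(proteins)
```

For a 9-mer, a threshold of 80 admits windows with at most one mismatch (8/9, about 88.9% identity), so lowering `min_identity` widens the count to close variants of the epitope.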
Figure 5. (A) Screenshot of the sequence variability analysis tool page. (B) Plot of entropy (red curve) and of the percentage of sequences containing the consensus amino acid (blue curve). The consensus sequence is shown below the x-axis, with conserved positions in blue. A conserved position is one with entropy <1, gap fraction <0.1 and the consensus amino acid present in >90% of sequences.
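The three conservation criteria in the Figure 5 caption can be applied column by column to a multiple sequence alignment. A minimal sketch, assuming Shannon entropy in bits computed over non-gap residues, gap fraction and consensus percentage computed over all sequences, and `-` as the gap character; the entropy base and gap handling used by HPVdb are not stated in this section, and the function names are hypothetical.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character in the alignment


def column_stats(column: str) -> tuple[float, float, str, float]:
    """Return (entropy in bits, gap fraction, consensus residue, consensus %) for one column."""
    n = len(column)
    gap_fraction = column.count(GAP) / n
    residues = [c for c in column if c != GAP]
    if not residues:
        # An all-gap column has no consensus residue.
        return 0.0, gap_fraction, GAP, 0.0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy over observed residues only (gaps excluded, an assumption).
    entropy = sum((k / total) * math.log2(total / k) for k in counts.values())
    consensus, top = counts.most_common(1)[0]
    # Percentage of all sequences (gapped or not) carrying the consensus residue.
    consensus_pct = 100.0 * top / n
    return entropy, gap_fraction, consensus, consensus_pct


def conserved_positions(alignment: list[str]) -> list[int]:
    """Return 1-based columns with entropy < 1, gap fraction < 0.1 and consensus > 90%."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        entropy, gap_fraction, _, consensus_pct = column_stats(column)
        if entropy < 1 and gap_fraction < 0.1 and consensus_pct > 90:
            positions.append(i + 1)
    return positions
```

As an example, a column in which 19 of 20 sequences share a residue passes all three tests (about 0.29 bits, no gaps, 95% consensus), whereas an even 10/10 split of two residues gives exactly 1 bit and fails.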
The analysis tools integrated in HPVdb and their URLs
| Tool | URL | References |
|---|---|---|
| BLAST | | |
| MAFFT MSA | | |
| NetMHCpan | | |
| NetMHCIIpan | | |
| Search tool for HPV antigens | | |
| Search tool for T cell epitope/HLA ligand | | |
| Blast HPVdb | | |
| MSA of HPV sequences | | |
| Sequence variability analysis tool | | |
| T cell epitope/HLA ligand visualization tool | | |
| Classification of the viruses based on cancer risk | | |
| HLA binding prediction tool | Embedded in each antigen entry table | |
| T cell epitope/HLA ligand conservation analysis tool | Embedded in each experimentally validated T cell epitope/HLA entry table; also embedded in each HLA binding prediction result page. | |