Christian Simon, Ulrich J Kudahl, Jing Sun, Lars Rønn Olsen, Guang Lan Zhang, Ellis L Reinherz, Vladimir Brusic.
Abstract
FluKB is a knowledge-based system focusing on data and analytical tools for influenza vaccine discovery. The main goal of FluKB is to provide access to curated influenza sequence and epitope data and to enhance the analysis of influenza sequence diversity and of the targets of immune responses. FluKB contains more than 400,000 influenza protein sequences, known epitope data (357 verified T-cell epitopes, 685 HLA binders, and 16 naturally processed MHC ligands), and a collection of 28 influenza antibodies and their structurally defined B-cell epitopes. FluKB was built using a modular framework that allows the implementation of analytical workflows. It includes standard search tools, such as keyword search and sequence similarity queries, as well as advanced tools for the analysis of sequence variability. The advanced analytical tools for vaccine discovery include visual mapping of T- and B-cell vaccine targets and assessment of neutralizing antibody coverage. FluKB supports the discovery of vaccine targets and the analysis of viral diversity and its implications for vaccine discovery, as well as of potential T-cell breadth and antibody cross-neutralization involving multiple strains. FluKB represents a new generation of databases that integrate data, analytical tools, and analytical workflows to enable comprehensive analysis and automatic generation of analysis reports.
Year: 2015 PMID: 26504853 PMCID: PMC4609449 DOI: 10.1155/2015/380975
Source DB: PubMed Journal: J Immunol Res ISSN: 2314-7156 Impact factor: 4.818
Figure 1. Overview of the architecture of FluKB. (a) Users can access FluKB through an interactive user interface where they can select specific data and tools or deploy a predefined workflow. (b) Top to bottom: data are collected, cleaned, and enriched. Higher-level knowledge extraction is enabled by utilization of tools assembled into workflows.
Figure 2. Semiautomated generation and updating of protein sequence repository of FluKB (upper path) and data extraction and repository creation of the influenza specific epitopes from IEDB (lower path).
All the alternative instances of the host Mallard in all of the entries in the knowledge base of FluKB. The standardized name for the search key is Mallard; Anas platyrhynchos; 8839.
| Alternative names for Mallard | Number of times present in dataset | Status |
|---|---|---|
| | 1 | Ambiguous |
| | 9 | Error |
| | 39 | Correct |
| Domestic duck | 58 | Variant |
| Domestic Mallard | 11 | Variant |
| Feral duck | 1 | Ambiguous |
| Khaki campbell duck | 12 | Variant |
| Mallard | 25,844 | Correct |
| Mallard duck | 1,172 | Redundant |
| Pekin duck | 137 | Variant |
| Peking duck | 92 | Error |
| Sentinel duck | 13 | Error |
| Wdk | 11 | Ambiguous (abbreviation) |
| Wild duck | 952 | Ambiguous |
| Total | 28,352 | |
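The mapping in the table above can be illustrated with a small lookup. This is a minimal sketch, not FluKB's actual curation code: the dictionary and the `normalize_host` function are hypothetical, only the alias names that survive in the table are included, and the assumption is that every listed alias resolves to the standardized search key while its status is kept for auditing.

```python
# Standardized search key, as given in the table caption.
STANDARD_MALLARD = "Mallard; Anas platyrhynchos; 8839"

# Hypothetical alias table: lower-cased alias -> status from the table above.
HOST_STATUS = {
    "domestic duck": "Variant",
    "domestic mallard": "Variant",
    "feral duck": "Ambiguous",
    "khaki campbell duck": "Variant",
    "mallard": "Correct",
    "mallard duck": "Redundant",
    "pekin duck": "Variant",
    "peking duck": "Error",
    "sentinel duck": "Error",
    "wdk": "Ambiguous (abbreviation)",
    "wild duck": "Ambiguous",
}


def normalize_host(raw_name):
    """Return (standard_name, status) for a raw host annotation.

    Case and repeated whitespace are ignored when matching.
    Names that are not in the alias table return (None, "Unknown").
    """
    key = " ".join(raw_name.lower().split())
    status = HOST_STATUS.get(key)
    if status is None:
        return None, "Unknown"
    return STANDARD_MALLARD, status
```

Keeping the status alongside the standardized name lets ambiguous or erroneous annotations be reviewed later without losing the link to the original entry.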
The data fields in each protein entry.
| Field title | Field content |
|---|---|
| CVC accession | Accession number unique to FluKB |
| NCBI accession | Accession number unique to NCBI |
| GenBank GI | Accession number unique to GenBank |
| Type | The influenza virus type |
| Subtype | The serotype |
| Host | Host of collection |
| Country | Location of collection |
| Year | Time of collection |
| Isolate | Isolate name |
| Vaccine strain | Years the strain has been used as a vaccine strain |
| Original nomenclature | The original nomenclature from the raw data |
| Protein name | The protein name, based on template BLAST |
| Sequence type | Full or fragment of the protein sequence |
| Predict HLA binders | Predict HLA binders within the sequence |
| Blast sequence | BLAST the sequence against FluKB to find similar sequences |
| Sequence | The amino acid sequence of the entry |
| Epitopes in sequence | IEDB epitopes found in the protein sequence |
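As a rough illustration of how the stored fields of such an entry might be modeled, the sketch below uses a Python dataclass. The class name, attribute names, and types are assumptions and do not come from FluKB. The two action fields (Predict HLA binders, Blast sequence) are left out because they trigger tools rather than hold data. The example values are read from the Figure 3 caption, assuming the FLU-prefixed record identifier is the CVC accession; everything not stated there is left unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinEntry:
    """Hypothetical container mirroring the data fields of a protein entry."""
    cvc_accession: str                      # accession unique to FluKB
    protein_name: str                       # based on template BLAST
    ncbi_accession: Optional[str] = None
    genbank_gi: Optional[str] = None
    influenza_type: Optional[str] = None    # e.g. "A"
    subtype: Optional[str] = None           # the serotype, e.g. "H7N9"
    host: Optional[str] = None
    country: Optional[str] = None
    year: Optional[int] = None
    isolate: Optional[str] = None
    vaccine_strain_years: List[int] = field(default_factory=list)
    original_nomenclature: Optional[str] = None
    sequence_type: Optional[str] = None     # "full" or "fragment"
    sequence: Optional[str] = None          # amino acid sequence
    epitopes_in_sequence: List[str] = field(default_factory=list)


# Values taken from the Figure 3 caption; unknown fields stay unset.
example = ProteinEntry(
    cvc_accession="FLU0306481",
    protein_name="HA",
    influenza_type="A",
    subtype="H7N9",
    country="China",
    year=2013,
)
```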
The types of errors found for the geographical location in the metadata of the entry.
| Type of error | Number |
|---|---|
| Case errors | 26,335 |
| Redundant information | 14,165 |
| Alternative name and abbreviations | 11,208 |
| Misspellings | 2,027 |
| Alternative spellings | 9,112 |
| True location could not be determined | 9,493 |
| Total | 72,340 |
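A tally like the one above could be produced by classifying each raw location string while cleaning it. The following is a minimal sketch under stated assumptions: the canonical names and aliases are invented examples, `clean_location` and `tally_errors` are hypothetical helpers, and only three of the listed error types are handled.

```python
from collections import Counter

# Hypothetical canonical country names and known aliases (illustrative only).
CANONICAL = {"China", "Viet Nam", "United States of America"}
ALIASES = {
    "usa": "United States of America",
    "vietnam": "Viet Nam",
}
_BY_CASEFOLD = {name.casefold(): name for name in CANONICAL}


def clean_location(raw):
    """Return (canonical_name, error_type); error_type is None if already clean."""
    stripped = " ".join(raw.split())
    if stripped in CANONICAL:
        return stripped, None
    folded = stripped.casefold()
    if folded in _BY_CASEFOLD:
        return _BY_CASEFOLD[folded], "Case errors"
    if folded in ALIASES:
        return ALIASES[folded], "Alternative name and abbreviations"
    return None, "True location could not be determined"


def tally_errors(raw_locations):
    """Count how often each error type occurs in a list of raw locations."""
    errors = (clean_location(raw)[1] for raw in raw_locations)
    return Counter(e for e in errors if e is not None)
```

For example, `tally_errors(["CHINA", "China", "USA", "Atlantis"])` counts one case error, one alternative name, and one undetermined location.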
Figure 3. Record FLU0306481, Protein HA representing the 2013 H7N9 outbreak of bird flu in China.
The sources of the integrated tools in FluKB and the URLs for their stand-alone versions or web services.
| Tool | URL | Reference |
|---|---|---|
| BLAST | | |
| MAFFT MSA | | |
| NetMHCpan 2.8a | | |
| NetMHCIIpan 3.0 | | |
| Block entropy | | |
| ISO-3166 | | |
| BlockLogo | | |
Figure 4. T-cell epitope EPI102 and the T-cell epitope analysis from the record FLU0021633. (a) Entry EPI150 with the graphical display of present T-cell epitope variants. (b) Graphical display of predicted T-cell epitopes (HLA-B*07:02, length 10, IC50 < 1000 nM).
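The selection criteria in panel (b), peptide length 10 and IC50 below 1000 nM, can be sketched as a sliding window followed by a threshold filter. This is an illustrative sketch only: it does not call NetMHCpan, and the prediction records are assumed to be already parsed into dictionaries with hypothetical keys `peptide` and `ic50_nm`.

```python
def candidate_peptides(sequence, length=10):
    """Return (1-based start position, peptide) for every window of `length`."""
    return [
        (i + 1, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


def filter_binders(predictions, threshold_nm=1000.0):
    """Keep predictions with IC50 strictly below the threshold, strongest first."""
    kept = [p for p in predictions if p["ic50_nm"] < threshold_nm]
    return sorted(kept, key=lambda p: p["ic50_nm"])
```

A lower IC50 means stronger predicted binding, which is why the kept records are sorted in ascending order.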
Figure 5. Results of B-cell epitope analysis II: analysis of entry FLU0099373 for interaction with the broadly neutralizing antibody F10. (a) The BLAST result aligning the query sequence to the sequence with the highest identity in FluKB, (b) the discontinuous peptide extracted from the query sequence, with the respective residue positions highlighted in the full sequence, and (c) summary information on the neutralizing antibody. The discontinuous peptide from the query sequence is compared to all discontinuous peptides generated from FluKB (with their neutralizing status listed). The identical sequence is highlighted in yellow, and its estimated neutralization status is given.
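The comparison described in this caption can be sketched in two steps: build the discontinuous peptide from the residues at the epitope positions, then look up whether an identical peptide with a known status exists. The code below is a minimal sketch with hypothetical names; the toy sequence, positions, and status labels are invented and do not describe the F10 epitope.

```python
def discontinuous_peptide(sequence, positions):
    """Concatenate the residues at the given 1-based positions, in position order."""
    return "".join(sequence[p - 1] for p in sorted(positions))


def estimate_status(query_peptide, reference_peptides):
    """Return the status of an identical reference peptide, or 'unknown'."""
    return reference_peptides.get(query_peptide, "unknown")


# Toy inputs invented for illustration only.
toy_sequence = "ACDEFGHIKLMNPQRS"
toy_positions = [2, 5, 9, 12]
toy_references = {"CFKN": "neutralized", "CFKS": "not neutralized"}

toy_peptide = discontinuous_peptide(toy_sequence, toy_positions)
toy_status = estimate_status(toy_peptide, toy_references)
```

This exact-match lookup only covers the case of an identical discontinuous peptide; how near matches are scored is not stated in the caption and is left out of the sketch.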