Brian Yang, Samantha Sayers, Zuoshuang Xiang, Yongqun He.
Abstract
Protective antigens are specifically targeted by the acquired immune response of the host and are able to induce protection in the host against infectious and non-infectious diseases. Protective antigens play important roles in vaccine development, as biological markers for disease diagnosis, and in the analysis of fundamental host immunity against diseases. Protegen is a web-based central database and analysis system that curates, stores and analyzes protective antigens. Basic antigen information and experimental evidence are curated from peer-reviewed articles. More detailed gene/protein information (e.g. DNA and protein sequences, and COG classification) is automatically extracted from existing databases using internally developed scripts. Bioinformatics programs are also applied to compute different antigen features, such as protein weight and pI, and the subcellular localizations of bacterial proteins. Presently, 590 protective antigens have been curated against over 100 infectious diseases caused by pathogens and non-infectious diseases (including cancers and allergies). A user-friendly web query and visualization interface has been developed for interactive protective antigen search. A customized BLAST sequence similarity search has also been developed for the analysis of new sequences provided by users. To support data exchange, information on protective antigens is stored in the Vaccine Ontology (VO) in OWL format and can also be exported to FASTA and Excel files. Protegen is publicly available at http://www.violinet.org/protegen.
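As an illustration of the kind of antigen feature mentioned above, the following is a minimal sketch of a protein molecular weight calculation. The abstract does not say which programs Protegen uses, so this is not Protegen's implementation; the function name `protein_molecular_weight` is hypothetical, and the table holds standard average residue masses for the 20 common amino acids.

```python
# Average residue masses in daltons for the 20 standard amino acids.
# These are widely tabulated values, not values taken from Protegen.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def protein_molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified protein chain.

    Hypothetical helper: sums average residue masses and adds one water.
    Ambiguity codes (e.g. X, B, Z) are rejected rather than guessed.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```

Calling `protein_molecular_weight` on a curated protein sequence gives a weight that ignores post-translational modifications and signal peptide cleavage, so it should be read as an estimate only.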
Year: 2010 PMID: 20959289 PMCID: PMC3013795 DOI: 10.1093/nar/gkq944
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Semi-automatic annotation of protective antigens in Protegen: overall design and architecture. Manual curation draws on peer-reviewed publications from PubMed. A PubMed ID (PMID) is extracted and used to retrieve detailed citation information (e.g. authors, journal, and date). The evidence that establishes each protein as a protective antigen is curated from published experimental studies. Vaccines associated with the protective antigens are also curated. PDB IDs are manually retrieved when available to provide 3D structure information for individual protective antigens. An internally developed script uses an input sequence ID from an NCBI database (e.g. the NCBI Entrez Gene database) to automatically retrieve different types of information. The extracted DNA and protein sequences are then used for bioinformatics analyses with different methods.
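The retrieval step in this caption could be sketched as follows. The caption does not describe how the internal scripts contact NCBI, so the use of the NCBI E-utilities EFetch endpoint here is an assumption, and the helper name `efetch_url` is hypothetical. The sketch only builds the request URL and performs no network access.

```python
from typing import Optional
from urllib.parse import urlencode

# Public NCBI E-utilities EFetch endpoint (assumed here; the source does not
# state which retrieval mechanism Protegen's internal scripts use).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, record_id: str, retmode: str = "xml",
               rettype: Optional[str] = None) -> str:
    """Build an EFetch URL for one record ID (hypothetical helper)."""
    if not record_id.strip():
        raise ValueError("record_id must not be empty")
    params = {"db": db, "id": record_id.strip(), "retmode": retmode}
    if rettype is not None:
        params["rettype"] = rettype
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Citation details for a curated PMID, requested as XML.
citation_url = efetch_url("pubmed", "20959289")
```

A protein sequence could be requested the same way with `efetch_url("protein", some_id, retmode="text", rettype="fasta")`, where `some_id` stands for an NCBI protein identifier.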
Curated protective antigens from select pathogens as of 14 August 2010
| Pathogen (Disease name) | Number of protective antigens |
|---|---|
| Bacteria (10 out of 44) | |
| | 19 |
| | 10 |
| | 10 |
| | 17 |
| | 10 |
| | 14 |
| | 24 |
| | 19 |
| | 12 |
| | 24 |
| Viruses (10 out of 40) | |
| Dengue virus (Dengue fever) | 4 |
| Ebola virus (Hemorrhagic fever) | 13 |
| Herpes simplex virus type 1 and type 2 (Herpes) | 9 |
| Human Immunodeficiency Virus (AIDS) | 5 |
| Human papillomavirus (HPV) | 4 |
| Influenza virus (Influenza) | 37 |
| Japanese encephalitis virus (Japanese encephalitis) | 6 |
| Marburg virus (Hemorrhagic fever) | 6 |
| Pseudorabies virus (Aujeszky's disease) | 8 |
| Rotavirus (Severe diarrhea) | 8 |
| Parasites (10 out of 19) | |
| | 3 |
| | 5 |
| | 4 |
| | 5 |
| | 11 |
| | 10 |
| | 8 |
| | 26 |
| | 3 |
| | 9 |
| Others | |
| Allergy | 14 |
| Cancer | 36 |
| Ricin Toxin | 1 |
| | 9 |
| Total | 403 |
Figure 2. Example of protective antigen query and BLAST sequence similarity analysis. A COG category search for ‘Cell wall/membrane/envelope biogenesis’ combined with a subcellular localization search for ‘Outer Membrane’ (A) identified 11 genes in the Protegen database, including Pla from Yersinia pestis strain CO92 and Pal from Haemophilus influenzae strain 86-028NP (B). Clicking the Protegen antigen ID associated with Pla displayed the curated data, including the sequence strain, NCBI Gene GI, NCBI Protein GI, protein name, NCBI taxonomy ID, DNA and protein sequences, and other information (C). A BLAST sequence similarity analysis of the DNA sequence produced multiple hits with significant alignments (D).
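The combined query in panel A amounts to an AND filter over two annotated fields. A minimal in-memory sketch is given below; the record layout, the field names, and the `search` function are hypothetical and do not reflect Protegen's actual schema or query code. The first two records use the Pla and Pal annotations stated in the caption, and the third is a made-up placeholder included only to show a non-match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AntigenRecord:
    """Hypothetical record layout; not Protegen's real schema."""
    name: str
    organism: str
    cog_category: str
    localization: str


def search(records: Iterable[AntigenRecord],
           cog_category: Optional[str] = None,
           localization: Optional[str] = None) -> List[AntigenRecord]:
    """Return records matching every criterion that was supplied (logical AND)."""
    def matches(rec: AntigenRecord) -> bool:
        return ((cog_category is None or rec.cog_category == cog_category)
                and (localization is None or rec.localization == localization))
    return [rec for rec in records if matches(rec)]


records = [
    AntigenRecord("Pla", "Yersinia pestis CO92",
                  "Cell wall/membrane/envelope biogenesis", "Outer Membrane"),
    AntigenRecord("Pal", "Haemophilus influenzae 86-028NP",
                  "Cell wall/membrane/envelope biogenesis", "Outer Membrane"),
    # Placeholder record, invented purely to demonstrate filtering.
    AntigenRecord("placeholder", "example organism",
                  "Translation", "Cytoplasmic"),
]

hits = search(records,
              cog_category="Cell wall/membrane/envelope biogenesis",
              localization="Outer Membrane")
```

Leaving a criterion as `None` drops it from the filter, which mirrors a search form where empty fields are ignored.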