M M Albà, D Lee, F M Pearl, A J Shepherd, N Martin, C A Orengo, P Kellam.
Abstract
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Year: 2001 PMID: 11125070 PMCID: PMC29831 DOI: 10.1093/nar/29.1.133
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (a) Flow of procedures used to build VIDA 1.0. Virus protein sequences are retrieved for a defined virus family from GenBank and fields are parsed [see also (c)]. Three separate procedures are performed on the protein sequences: identification of structural relatives (CATH domain profiles), construction of homologous protein families (HPFs), and protein functional annotation with virus taxonomical classification. These data are all mapped and visualised through web pages in a searchable format.
(b) Schematic representation of an HPF with four protein members and one conserved sequence region; rectangles and ovals with the same filling represent regions of sequence similarity. Initially all sequence domains are identified using XDOM with default parameters. Our own C programs are then used to compile the HPFs under two criteria: all proteins in an HPF must share at least one region of conserved sequence similarity, and the HPF should be as large as possible and not fragmented.
(c) Example of a homologous protein family table showing the virus proteins included in family 102. Within the table, information is provided on the proposed function and functional class of the HPF. In addition, the proteins, virus family, virus names and GenBank-derived gene names are provided. Links to the HPF conserved sequence region, proteins in FASTA format and source EMBL record are included. Where available, links to structural PDB files and CATH domains are also provided.
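The two HPF criteria described in (b) can be illustrated with a minimal sketch. This is not the authors' C implementation, which is not shown in the source; it is a simple greedy grouping written under stated assumptions. The function name `build_families`, the input layout (a mapping from protein identifier to the set of domain identifiers found in it) and the identifiers `P1`, `dA` and so on are all hypothetical. In each round the sketch picks the domain shared by the most unassigned proteins, so every family shares at least one conserved region and is as large as this greedy choice allows.

```python
from collections import defaultdict


def build_families(protein_domains):
    """Greedily group proteins into families that share one conserved domain.

    protein_domains: dict mapping a protein ID to the set of domain IDs
    detected in it (hypothetical layout, e.g. parsed from a domain finder).
    Returns a list of (shared_domain, set_of_protein_ids) tuples.
    """
    unassigned = set(protein_domains)
    families = []
    while unassigned:
        # For every domain, collect the unassigned proteins that contain it.
        members = defaultdict(set)
        for protein in unassigned:
            for domain in protein_domains[protein]:
                members[domain].add(protein)
        if not members:
            # Proteins with no detected domain become singleton families.
            families.extend((None, {p}) for p in sorted(unassigned))
            break
        # Choose the domain shared by the most proteins; iterating in
        # sorted order makes ties resolve to the smallest domain ID.
        best = max(sorted(members), key=lambda d: len(members[d]))
        families.append((best, members[best]))
        unassigned -= members[best]
    return families


# Toy input (assumed data): four proteins share domain dA, one has only dC.
toy = {
    "P1": {"dA", "dB"},
    "P2": {"dA"},
    "P3": {"dA", "dC"},
    "P4": {"dA"},
    "P5": {"dC"},
}
families = build_families(toy)
```

On the toy input this yields one family of four proteins built around `dA`, mirroring the schematic in (b), plus a singleton family for `P5`. A greedy pass keeps the sketch short; it does not guarantee a globally optimal partition.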
VIDA 1.0 summary information
| Virus family | Complete genomes | Protein entries [a] | HPFs [b] | Structural relatives |
| Herpesviridae | 26 | 4196 | 756 | 61 |
| Coronaviridae | 7 | 491 | 42 | 3 |
| Arteriviridae | 7 | 790 | 14 | 1 |
| Total | 40 | 5477 | 812 | 65 |
[a] Total number of complete protein sequence entries.
[b] Homologous protein families.