Lucio Marcello, Suraj Menon, Pauline Ward, Jonathan M Wilkes, Nicola G Jones, Mark Carrington, J David Barry.
Abstract
BACKGROUND: Trypanosomes are coated with a variant surface glycoprotein (VSG) that is so densely packed that it physically protects underlying proteins from effectors of the host immune system. Periodically, cells expressing a distinct VSG arise in a population and thereby evade immunity. The main structural features of VSGs are two long alpha-helices that form a coiled coil, and sets of relatively unstructured loops that are distal to the plasma membrane and contain most or all of the protective epitopes. The primary structures of different VSGs are highly variable, typically displaying only ~20% identity with each other. The genome has nearly 2000 VSG genes, which are located in subtelomeres. Only one VSG gene is expressed at a time, and switching between VSGs primarily involves gene conversion events. The archive of silent VSGs undergoes diversifying evolution rapidly, also involving gene conversion. The VSG family is a paradigm for alpha-helical coiled coil structures, epitope variation and GPI-anchor signals. At the DNA level, the genes are a paradigm for diversifying evolutionary processes and for the role of subtelomeres and recombination mechanisms in the generation of diversity in multigene families. To enable ready availability of VSG sequences for addressing these general questions, and trypanosome-specific questions, we have created VSGdb, a database of all known sequences.
DESCRIPTION: VSGdb contains fully annotated VSG sequences from the genome sequencing project, with which it shares all identifiers and annotation, and other available sequences. The database can be queried in various ways. Sequence retrieval, in FASTA format, can deliver protein or nucleotide sequence filtered by chromosomes or contigs, gene type (functional, pseudogene, etc.), domain and domain sequence family. Retrieved sequences can be stored as a temporary database for BLAST querying, reports from which include hyperlinks to the genome project database (GeneDB) CDS Info and to individual VSGdb pages for each VSG, containing annotation and sequence data. Queries (text search) with specific annotation terms yield a list of relevant VSGs, displayed as identifiers leading again to individual VSG web pages.
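The filtered FASTA retrieval described above can be illustrated with a minimal Python sketch. This is not the VSGdb implementation. The record fields, the function names `filter_records` and `to_fasta`, and the example identifiers and sequences are all hypothetical and serve only to show filtering by chromosome and gene type followed by FASTA output.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class VsgRecord:
    """Hypothetical in-memory record; VSGdb's real storage layout is not shown here."""
    identifier: str
    chromosome: str
    gene_type: str  # e.g. "functional" or "pseudogene"
    sequence: str


def filter_records(
    records: Iterable[VsgRecord],
    chromosome: Optional[str] = None,
    gene_type: Optional[str] = None,
) -> Iterator[VsgRecord]:
    """Yield records matching every filter that is set (None means 'any')."""
    for rec in records:
        if chromosome is not None and rec.chromosome != chromosome:
            continue
        if gene_type is not None and rec.gene_type != gene_type:
            continue
        yield rec


def to_fasta(records: Iterable[VsgRecord], width: int = 60) -> str:
    """Format records as FASTA, wrapping sequence lines at `width` characters."""
    lines = []
    for rec in records:
        lines.append(f">{rec.identifier} chromosome={rec.chromosome} type={rec.gene_type}")
        lines.extend(
            rec.sequence[i:i + width] for i in range(0, len(rec.sequence), width)
        )
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Made-up identifiers and sequences, for illustration only.
    demo = [
        VsgRecord("vsgA", "1", "functional", "ATGC" * 20),
        VsgRecord("vsgB", "1", "pseudogene", "ATGCAA"),
        VsgRecord("vsgC", "2", "functional", "GGCC"),
    ]
    print(to_fasta(filter_records(demo, chromosome="1", gene_type="functional")), end="")
```

Leaving a filter as `None` means that criterion is not applied, mirroring a web form in which only some parameters are set.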
Year: 2007 PMID: 17474977 PMCID: PMC1868767 DOI: 10.1186/1471-2105-8-143
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Information flow in a VSGdb query. The user inputs a query with defined parameters through a web interface, which is processed by CGI scripts to extract information from the source files and present the results as a web page.
Figure 2. Annotation components utilized by VSGdb. The rectangles represent the primary tags which define parts of a VSG (CDS defines an entire VSG). The ovals represent features of each primary tag. Secondary annotation was carried out to link the systematic_id, domain_combo and product features of each VSG to all of its components. These features are used in the sequence retrieval (as FASTA or as a database for BLAST). All features are used in the construction of individual VSG pages. Features containing curation and comments are used in the text search.
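The secondary annotation step in this caption, in which three features of the whole VSG are linked to each of its components, could be sketched as follows. This assumes annotations are held as plain dictionaries. The function name `link_to_components`, the `tag` key, and all example values are hypothetical; only the three linked feature names come from the caption.

```python
from typing import Dict, List

# The three features the caption says are linked from the VSG to its components.
LINKED_KEYS = ("systematic_id", "domain_combo", "product")


def link_to_components(
    cds_features: Dict[str, str],
    components: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return copies of the component annotations with the CDS-level linked features added."""
    linked = {key: cds_features[key] for key in LINKED_KEYS if key in cds_features}
    # Build new dicts so that the input annotations are left unmodified.
    return [{**component, **linked} for component in components]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    cds = {
        "systematic_id": "example_id",
        "domain_combo": "example_combo",
        "product": "example product",
        "comment": "not propagated",
    }
    parts = [{"tag": "component_1"}, {"tag": "component_2"}]
    for part in link_to_components(cds, parts):
        print(part)
```

With the linked features copied onto every component, a retrieval query can select a single component and still report which VSG it belongs to.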
Figure 3. Screenshot of the web form used to define parameters for retrieval of sequences from genome project (annotated) VSGs in FASTA format or as a temporary database against which to BLAST a query sequence. For non-genome project (unannotated) VSGs, only the type of sequence and the organism need to be specified.
Figure 4. Screenshot of an individual VSG page (here Tb927.1.520). It contains all the sequence and annotation information available within VSGdb for this particular VSG. Individual VSG pages can be accessed by typing the VSG identifier into the search bar or through the results of the text search and VSG BLAST functions. These pages are available only for annotated VSGs.