Markus Fischer, Quan K Thai, Melanie Grieb, Jürgen Pleiss.
Abstract
BACKGROUND: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data, and thus allows researchers to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. DESCRIPTION: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification into homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming, and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed through an interface for searching and browsing, and through analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering Database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families.
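As a rough illustration of the three-section relational model described above (protein, sequence, structure), the sketch below defines a heavily simplified schema with Python's built-in `sqlite3` module and counts sequences and structures per superfamily by following foreign keys. All table and column names, and the toy rows, are hypothetical assumptions for illustration; they are not DWARF's actual schema or data.

```python
import sqlite3

# Hypothetical, simplified schema: names are illustrative assumptions,
# not DWARF's actual table definitions.
SCHEMA = """
-- Protein section: classification, biochemical function, source organism
CREATE TABLE superfamily (
    superfamily_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE homologous_family (
    family_id INTEGER PRIMARY KEY,
    superfamily_id INTEGER NOT NULL REFERENCES superfamily(superfamily_id),
    name TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES homologous_family(family_id),
    biochemical_function TEXT,
    source_organism TEXT
);
-- Sequence section: sequence entries plus position-specific annotation
CREATE TABLE protein_sequence (
    accession TEXT PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    residues TEXT NOT NULL
);
CREATE TABLE sequence_annotation (
    accession TEXT NOT NULL REFERENCES protein_sequence(accession),
    position INTEGER NOT NULL,
    label TEXT NOT NULL,
    PRIMARY KEY (accession, position, label)
);
-- Structure section: experimentally determined structures
CREATE TABLE protein_structure (
    pdb_chain TEXT PRIMARY KEY,
    accession TEXT NOT NULL REFERENCES protein_sequence(accession),
    secondary_structure TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and load a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO superfamily VALUES (?, ?)",
                     [(1, "SF_A"), (2, "SF_B")])
    conn.executemany("INSERT INTO homologous_family VALUES (?, ?, ?)",
                     [(10, 1, "HF_A1"), (20, 2, "HF_B1")])
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)",
                     [(100, 10, "toy function", "toy organism"),
                      (101, 10, None, None),
                      (200, 20, None, None)])
    conn.executemany("INSERT INTO protein_sequence VALUES (?, ?, ?)",
                     [("SEQ1", 100, "MKT"), ("SEQ2", 101, "MKS"),
                      ("SEQ3", 200, "MRT")])
    conn.execute("INSERT INTO protein_structure VALUES (?, ?, ?)",
                 ("1ABCA", "SEQ1", "CCC"))
    conn.commit()
    return conn


def counts_per_superfamily(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Return (superfamily, #sequences, #structures) by joining along foreign keys."""
    query = """
    SELECT sf.name,
           COUNT(DISTINCT s.accession),
           COUNT(DISTINCT st.pdb_chain)
    FROM superfamily sf
    LEFT JOIN homologous_family hf ON hf.superfamily_id = sf.superfamily_id
    LEFT JOIN protein p ON p.family_id = hf.family_id
    LEFT JOIN protein_sequence s ON s.protein_id = p.protein_id
    LEFT JOIN protein_structure st ON st.accession = s.accession
    GROUP BY sf.superfamily_id
    ORDER BY sf.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(counts_per_superfamily(build_demo_db()))
```

For the toy rows, the script prints `[('SF_A', 2, 1), ('SF_B', 1, 0)]`. Keeping the family/superfamily hierarchy in separate tables linked by foreign keys means counts like these are plain joins, and the `LEFT JOIN`s keep superfamilies that have no structure entries in the result.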
Year: 2006 PMID: 17094801 PMCID: PMC1647292 DOI: 10.1186/1471-2105-7-495
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Conceptual data schema for DWARF, using Logical Data Structure (LDS) notation. Each box represents an entity, which is implemented as a database table. Primary key attributes that identify each data instance of the entity are underlined. Entities associated with the protein are shaded in dark grey on the upper left, entities describing the protein structure are shown in white on the upper right, and entities describing protein sequence properties are shaded in light grey. Lines between entities denote relationships.
Oxyanion hole region for representative sequences of the GGGX class.

| Family | Sequence or structure identifier |
|---|---|
| abH1 | 1MX1F |
| abH2 | Q99156 |
| abH2 | 50546206 |
| abH3 | 1LPS |
| abH4 | 1JJIA |
| abH5 | AAC50666 |
| abH5 | CAF98042 |
| abH6 | 1JKMA |
The sequence block was extracted from an alignment of representative GGGX sequences. It illustrates the high similarity of the oxyanion hole region between the families abH2 and abH5, which have no protein structure entries, and the remaining GGGX class families, which include proteins of known structure. The oxyanion hole-forming residue is shown in bold.
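The caption does not state how similarity was quantified. One common, simple measure is the mean pairwise percent identity over a window of alignment columns; the sketch below computes it under that assumption, skipping any column where either sequence of a pair has a gap. The function name and the aligned strings are hypothetical toy values, not the sequences from the original alignment.

```python
from itertools import combinations


def window_identity(aligned: dict[str, str], start: int, end: int) -> float:
    """Mean pairwise percent identity over alignment columns [start, end).

    Assumption: gap characters ('-') are excluded pairwise, so each pair is
    scored only on columns where both sequences have a residue.
    """
    scores = []
    for a, b in combinations(aligned.values(), 2):
        cols = [(x, y) for x, y in zip(a[start:end], b[start:end])
                if x != "-" and y != "-"]
        if cols:
            matches = sum(x == y for x, y in cols)
            scores.append(100.0 * matches / len(cols))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, for illustration only)
    toy = {"s1": "HGGGFA", "s2": "HGGGWA", "s3": "HGGG-A"}
    print(round(window_identity(toy, 0, 6), 2))
```

For the toy fragments the pairwise identities are about 83.3, 100, and 100 percent, so the script prints `94.44`. Restricting `start` and `end` to the oxyanion hole columns would score only that region.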
Figure 2. Distribution of protein functions for the three α/β-hydrolase classes GGGX (a), GX (b), and Y (c). Putative proteins with no assigned function, and functions with a low percentage, were grouped into "others".
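The grouping rule in the Figure 2 caption can be sketched as follows. The source does not give the cutoff for a "low percentage", so `min_fraction` is a hypothetical parameter. The function name and the example labels are also illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional


def function_distribution(functions: Iterable[Optional[str]],
                          min_fraction: float = 0.05) -> dict[str, float]:
    """Fraction of proteins per function; small and missing ones become "others".

    Proteins with no function (None or "") and functions whose share is below
    the hypothetical cutoff `min_fraction` are pooled into "others".
    """
    labels = list(functions)
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(f for f in labels if f)  # assigned functions only
    others = total - sum(counts.values())     # proteins without a function
    dist: dict[str, float] = {}
    for name, n in counts.most_common():
        if n / total >= min_fraction:
            dist[name] = n / total
        else:
            others += n
    if others:
        dist["others"] = others / total
    return dist


if __name__ == "__main__":
    toy = ["lipase"] * 12 + ["esterase"] * 5 + ["cutinase"] + [None] * 2
    print(function_distribution(toy, min_fraction=0.1))
```

With `min_fraction=0.1`, the single toy "cutinase" entry (5%) is pooled with the two unassigned proteins, so "others" accounts for 15%. Calling the function once per class (GGGX, GX, Y) would give one distribution per figure panel.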