Silke Trissl, Kristian Rother, Heiko Müller, Thomas Steinke, Ina Koch, Robert Preissner, Cornelius Frömmel, Ulf Leser.
Abstract
BACKGROUND: Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures.
DESCRIPTION: COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web.
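The combined selection described in the abstract can be pictured as a conjunctive filter over annotated entries. The following is a minimal sketch under stated assumptions, not COLUMBA's implementation: the `Entry` class, the `select` function, the placeholder identifiers (`0aaa`, `0bbb`, `0ccc`), and every annotation value attached to them are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified view of one annotated PDB entry."""
    pdb_id: str
    resolution: float  # in angstroms
    pathways: set = field(default_factory=set)
    cath_architectures: set = field(default_factory=set)
    go_functions: set = field(default_factory=set)


def select(entries, pathway, architecture, go_function, max_resolution):
    """Return the IDs of entries that satisfy all four conditions at once."""
    return [
        e.pdb_id
        for e in entries
        if pathway in e.pathways
        and architecture in e.cath_architectures
        and go_function in e.go_functions
        and e.resolution < max_resolution
    ]


# Made-up records: identifiers and annotations are placeholders, not real data.
entries = [
    Entry("0aaa", 1.8, {"Pyrimidine metabolism"},
          {"3-layer(aba) sandwich"}, {"oxidoreductase activity"}),
    Entry("0bbb", 3.2, {"Pyrimidine metabolism"},
          {"3-layer(aba) sandwich"}, {"oxidoreductase activity"}),
    Entry("0ccc", 1.5, {"Glycolysis / Gluconeogenesis"},
          {"3-layer(aba) sandwich"}, {"oxidoreductase activity"}),
]

hits = select(entries, "Pyrimidine metabolism", "3-layer(aba) sandwich",
              "oxidoreductase activity", max_resolution=2.5)
print(hits)
```

In this sketch the second record fails the resolution threshold and the third fails the pathway condition, so only the first is selected.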
Year: 2005 PMID: 15801979 PMCID: PMC1087474 DOI: 10.1186/1471-2105-6-81
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic entity-relationship model of COLUMBA. The dark gray part in the middle is the subschema that originates from the Protein Data Bank (PDB). The other subschemas are represented by a single box indicating the name of the data source and are grouped according to a broad classification of their content.
Data sources integrated in COLUMBA.
| Source | Download page | Format | Parsed by |
| PDB | | flat file | BioPython |
| SCOP | | flat file | BioPython |
| CATH | | flat file | own |
| DSSP | computed | - | own |
| ENZYME | | flat file | BioPython |
| Boehringer | | HTML | own |
| KEGG | | HTML | own |
| Swiss-Prot | | flat file | bioSQL |
| GO | | flat file | bioSQL |
| GOA | | DB dump | COPY |
| Taxonomy | | flat file | bioSQL |
| PISCES | | DB dump | own |
The fourth column gives the parsers used.
Number of entries from the PDB.
| from PDB | to | Number of entries |
| | SCOP | 42 908 |
| | CATH | 32 825 |
| | DSSP | 54 028 |
| chains (total 60 241) | Swiss-Prot | 36 651 |
| | NCBI Taxonomy | 36 651 |
| | Gene Ontology via GOA | 36 008 |
| | PISCES | 8 367 |
| | SCOP & CATH | 32 439 |
| | SCOP & CATH & Swiss-Prot | 27 972 |
| | Enzyme | 12 510 |
| | Boehringer | 5 029 |
| compounds (total 33 779) | KEGG | 7 162 |
| | Enzyme & KEGG | 7 162 |
| | Enzyme & SCOP & CATH | 9 172 |
| | Enzyme & SCOP & CATH & KEGG | 5 054 |
| | Enzyme & Swiss-Prot | 9 440 |
| entries (total 26 104) | all minus PISCES | 2 868 |
| | all | 621 |
The entries are divided into compounds and chains, which link to second-party databases; counts are also given for selected combinations of these databases.
Figure 2. Screenshots of COLUMBA web forms. (A) Interface for the full text search. (B) Query form for the metabolism information, where the result set can be restricted by information from ENZYME and KEGG.
Figure 3. Screenshots of COLUMBA query results. (A) Result set for a query requesting structures from the ENZYME class '1.-.-.-' combined with a full text condition on 'TIM barrel'. (B) COLUMBA Explorer detailed view of the PDB structure 1d3h.
The number of enzymes for selected metabolic pathways from KEGG.
| Metabolic pathway | Enzymes (total) | Enzymes with structure | Coverage (%) | CATH alpha/beta | CATH mainly alpha | CATH mainly beta | CATH few secondary structures |
| all pathways | 1 952 | 508 | 26.0 | 443 | 114 | 107 | 15 |
| Fatty acid biosynthesis (path 1) | 14 | 7 | 50.0 | 6 | 0 | 2 | 0 |
| Oxidative phosphorylation | 10 | 5 | 50.0 | 3 | 3 | 3 | 1 |
| Streptomycin biosynthesis | 14 | 7 | 50.0 | 6 | 0 | 1 | 0 |
| Pyrimidine metabolism | 59 | 30 | 50.8 | 29 | 6 | 5 | 0 |
| Selenoamino acid metabolism | 21 | 11 | 52.3 | 11 | 2 | 2 | 1 |
| Pentose phosphate pathway | 33 | 18 | 54.5 | 17 | 2 | 2 | 2 |
| Methionine metabolism | 23 | 13 | 56.5 | 13 | 3 | 1 | 1 |
| One carbon pool by folate | 23 | 13 | 56.5 | 13 | 2 | 1 | 0 |
| Phe, Tyr and Trp biosynthesis | 31 | 19 | 61.2 | 18 | 6 | 2 | 0 |
| Glycolysis / Gluconeogenesis | 38 | 24 | 63.1 | 24 | 2 | 5 | 2 |
| Reductive carboxylate cycle (CO2 fixation) | 13 | 9 | 69.2 | 8 | 3 | 1 | 1 |
| Aminoacyl-tRNA biosynthesis | 21 | 16 | 76.1 | 16 | 8 | 6 | 0 |
| Carbon fixation | 23 | 18 | 78.2 | 18 | 2 | 3 | 0 |
The sum of observations in CATH classes can be higher than the number of enzymes with structures from the pathway, because in one chain, several domains with distinct folds can occur.
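The arithmetic behind the coverage column and the note on CATH class sums can be checked with a few lines of Python. This is a minimal sketch using three rows copied from the table; the names `rows`, `coverage_percent`, and `cath_sum` are illustrative. The truncation to one decimal place is an assumption inferred from the printed values (for example, 11 of 21 is given as 52.3 rather than 52.4), not something the source states.

```python
# Rows copied from the table: (enzymes total, enzymes with structure,
# CATH class counts in the order alpha/beta, mainly alpha, mainly beta, few).
rows = {
    "all pathways": (1952, 508, (443, 114, 107, 15)),
    "Pyrimidine metabolism": (59, 30, (29, 6, 5, 0)),
    "Selenoamino acid metabolism": (21, 11, (11, 2, 2, 1)),
}


def coverage_percent(total, with_structure):
    """Percentage of enzymes with a structure, truncated to one decimal.

    Truncation (not rounding) is an assumption that reproduces the table.
    """
    return (1000 * with_structure) // total / 10


def cath_sum(class_counts):
    """Total number of CATH class observations for a pathway."""
    return sum(class_counts)


for name, (total, with_structure, classes) in rows.items():
    print(name, coverage_percent(total, with_structure), cath_sum(classes))
```

For 'Pyrimidine metabolism' the class observations add up to 40 although only 30 enzymes have a structure, which is the effect described in the note: a single chain can contain several domains with distinct folds.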
Figure 4. The CATH wheel for KEGG pathways. The color of the CATH wheel represents the CATH classes, where yellow stands for alpha/beta, red for mainly alpha, blue for mainly beta, and green for Few Secondary Structures. The inner circle represents the CATH architectures (C.A.), where the width of each segment represents the number of enzymes found to exhibit that architecture. The outer circle stands for the Topology (C.A.T.). (A) shows the distribution of all enzymes participating in KEGG pathways, with the '3-layer(aba) sandwich' representing the largest architecture. (B) shows the CATH wheel for the pathway 'Pyrimidine metabolism' and (C) the CATH wheel for 'Glycolysis/Gluconeogenesis'.