Clemens Wrzodek, Finja Büchel, Manuel Ruff, Andreas Dräger, Andreas Zell.
Abstract
BACKGROUND: The KEGG PATHWAY database provides a plethora of pathways for a wide range of organisms. All pathway components are directly linked to other KEGG databases, such as KEGG COMPOUND or KEGG REACTION. Therefore, the pathways can be extended with an enormous amount of information and provide a foundation for initial structural modeling approaches. However, KGML-formatted KEGG pathways are primarily designed for visualization purposes and often omit important details for the sake of a clear arrangement of their entries. Thus, a direct conversion into systems biology models would produce incomplete and erroneous models.
Year: 2013 PMID: 23433509 PMCID: PMC3623889 DOI: 10.1186/1752-0509-7-15
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Generation of systems biology models from KEGG pathways. The flowchart shows all major steps involved in creating initial systems biology models from KEGG pathways. The method requires two sources: a KGML-formatted KEGG pathway and access to other KEGG databases, e.g., via the KEGG API. The preprocessing steps, depicted at the top, mainly involve the removal of inappropriate nodes and the processing of reactions. An important step is the removal of duplicate entries. However, some later steps require information about these duplicates (e.g., when using the layout extension package for SBML); thus, duplicate removal is not always part of preprocessing and may be performed at a later stage. Depending on the desired output format, separate processing steps are executed for the appropriate conversion and annotation of the initial model.
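The duplicate-removal step can be sketched in a few lines. This is a minimal illustration, not the converter's actual implementation: it assumes that two KGML `entry` elements are duplicates when their `type` and `name` attributes are identical, and the function name `remove_duplicate_entries` and the KGML fragment are hypothetical. Because later steps (such as SBML layout) may still need the removed copies, the sketch returns a mapping from each removed entry id to the retained one instead of discarding that information.

```python
import xml.etree.ElementTree as ET

# Hypothetical KGML fragment: entries 1 and 3 are two drawings of the same compound.
KGML = """<pathway name="path:hsa00010" org="hsa" number="00010">
  <entry id="1" name="cpd:C00031" type="compound"/>
  <entry id="2" name="hsa:3098 hsa:3099" type="gene"/>
  <entry id="3" name="cpd:C00031" type="compound"/>
</pathway>"""


def remove_duplicate_entries(root):
    """Remove entries whose (type, name) pair was already seen.

    Returns {removed id: retained id} so that later steps (e.g. relations
    or layout information) can still refer to the removed copies.
    """
    kept = {}      # (type, name) -> id of the first occurrence
    replaced = {}  # id of a removed duplicate -> id of the retained entry
    for entry in list(root.findall("entry")):  # copy, because root is mutated below
        key = (entry.get("type"), entry.get("name"))
        if key in kept:
            replaced[entry.get("id")] = kept[key]
            root.remove(entry)
        else:
            kept[key] = entry.get("id")
    return replaced


root = ET.fromstring(KGML)
mapping = remove_duplicate_entries(root)
print(mapping)                                       # {'3': '1'}
print([e.get("id") for e in root.findall("entry")])  # ['1', '2']
```

Relations whose `entry1` or `entry2` points at a removed id would then be redirected through the returned mapping.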
Figure 2. Simplified class structure and mapping from KGML to BioPAX. The figure shows the raw mapping of KGML to BioPAX class instances. The type attribute of each entry determines how it is translated (see Table 1). Reactions catalyzed by enzymes are translated to Catalysis, whereas non-catalyzed reactions are translated directly to BiochemicalReaction. Relations are translated differently depending on their subtype, the participating entities, and the chosen BioPAX level (see Table 2). For clarity, the figure omits the fact that in BioPAX Level 2, control and conversion inherit from physicalInteraction. Furthermore, a Catalysis consists of two elements: a Controller and a Controlled element. For our purposes, the Controller is always an enzyme and the Controlled element is a BiochemicalReaction. Similarly, KGML relations may be translated to a Control element that regulates either a Conversion or a TemplateReaction.
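The catalyzed/non-catalyzed distinction in Figure 2 can be illustrated with a short sketch. It assumes that a reaction counts as catalyzed when some KGML `entry` lists it in its `reaction` attribute (which may hold several space-separated reaction IDs); the function name `classify_reactions` and the trimmed KGML fragment are hypothetical. The sketch only decides the BioPAX class and collects the would-be Controller entries; it does not build BioPAX objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed KGML fragment: only the first reaction has an enzyme entry.
KGML = """<pathway name="path:hsa00010" org="hsa" number="00010">
  <entry id="10" name="hsa:3098" type="gene" reaction="rn:R00299"/>
  <reaction id="10" name="rn:R00299" type="irreversible">
    <substrate id="1" name="cpd:C00031"/>
    <product id="4" name="cpd:C00092"/>
  </reaction>
  <reaction id="11" name="rn:R00771" type="reversible">
    <substrate id="4" name="cpd:C00092"/>
    <product id="6" name="cpd:C00085"/>
  </reaction>
</pathway>"""


def classify_reactions(root):
    """Map each KGML reaction ID to (BioPAX class, controller entry ids)."""
    controllers = {}  # reaction ID -> ids of entries that reference it
    for entry in root.findall("entry"):
        # The 'reaction' attribute may hold several space-separated reaction IDs.
        for rn in (entry.get("reaction") or "").split():
            controllers.setdefault(rn, []).append(entry.get("id"))
    result = {}
    for reaction in root.findall("reaction"):
        for rn in reaction.get("name", "").split():
            if rn in controllers:
                # Catalysis: the entries are Controllers, the reaction is Controlled.
                result[rn] = ("Catalysis", controllers[rn])
            else:
                result[rn] = ("BiochemicalReaction", [])
    return result


print(classify_reactions(ET.fromstring(KGML)))
# {'rn:R00299': ('Catalysis', ['10']), 'rn:R00771': ('BiochemicalReaction', [])}
```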
Table 1. BioPAX instances and SBO terms corresponding to KGML entry types

| KGML entry type | BioPAX instance | SBO term |
|---|---|---|
| compound | smallMolecule | 247 (simple chemical) |
| enzyme | protein | 252 (polypeptide chain) |
| gene | protein | 252 (polypeptide chain) |
| ortholog | protein | 252 (polypeptide chain) |
| group | complex | 253 (non-covalent complex) |
| map | pathway | 552 (reference annotation) |

This table depicts the conversion of KGML entries to BioPAX or SBML, which depends on the KGML entry type attribute. For BioPAX, an instance of the listed class is created. Conversion to SBML always creates one species per KGML entry, annotated with the listed SBO term. The KGML specification states that an entry of type ‘gene’ “is a gene product (mostly a protein)”. Additionally, a ‘group’ “is a complex of gene products (mostly a protein complex)” [16]. For compatibility with previous KGML versions, the deprecated type ‘genes’, which has corresponded to ‘group’ since KGML v0.6.1, is treated as ‘group’. Entries of type ‘reaction’ are not listed in the table; they are discussed in a separate section.
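The entry-type mapping of Table 1, including the treatment of the deprecated ‘genes’ type, amounts to a lookup table. The sketch below transcribes the table into a dictionary; the names `ENTRY_MAPPING`, `ALIASES`, and `map_entry_type` are hypothetical, and formatting SBO numbers as `SBO:` plus seven zero-padded digits follows the usual SBO identifier convention rather than anything stated in the table.

```python
# KGML entry type -> (BioPAX class, SBO term number), transcribed from Table 1.
ENTRY_MAPPING = {
    "compound": ("smallMolecule", 247),  # simple chemical
    "enzyme": ("protein", 252),          # polypeptide chain
    "gene": ("protein", 252),
    "ortholog": ("protein", 252),
    "group": ("complex", 253),           # non-covalent complex
    "map": ("pathway", 552),             # reference annotation
}

# Deprecated KGML type that is handled like its replacement.
ALIASES = {"genes": "group"}


def map_entry_type(kgml_type):
    """Return (BioPAX class, 'SBO:0000nnn') for a KGML entry type, or None
    for types handled elsewhere (e.g. 'reaction') or unknown types."""
    kgml_type = ALIASES.get(kgml_type, kgml_type)
    if kgml_type not in ENTRY_MAPPING:
        return None
    biopax_class, sbo = ENTRY_MAPPING[kgml_type]
    return biopax_class, f"SBO:{sbo:07d}"


print(map_entry_type("compound"))  # ('smallMolecule', 'SBO:0000247')
print(map_entry_type("genes"))     # ('complex', 'SBO:0000253')
```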
Table 2. BioPAX instances and ontology terms corresponding to KGML relation subtypes

| KGML relation subtype | BioPAX instance | SBO term | PSI-MI term | GO term |
|---|---|---|---|---|
| activation | conversion, control | 170 (stimulation) | | |
| inhibition | conversion, control | 169 (inhibition) | | |
| expression | TemplateReaction, TemplateReactionRegulation | 170 (stimulation) | | 10467 |
| repression | TemplateReaction, TemplateReactionRegulation | 169 (inhibition) | | |
| indirect effect | conversion | 344 (molecular interaction) | | |
| state change | conversion | 168 (control) | | |
| binding/association | ComplexAssembly | 177 (non-covalent binding) | 914 | 5488 |
| dissociation | ComplexAssembly | 180 (dissociation) | | |
| missing interaction | MolecularInteraction | 396 (uncertain process) | | |
| phosphorylation | conversion, control | 216 (phosphorylation) | 217 | 16310 |
| dephosphorylation | conversion, control | 330 (dephosphorylation) | 203 | 16311 |
| glycosylation | conversion, control | 217 (glycosylation) | 559 | 70085 |
| ubiquitination | conversion, control | 224 (ubiquitination) | 220 | 16567 |
| methylation | conversion, control | 214 (methylation) | 213 | 32259 |

This table shows how relations are handled during conversion to BioPAX or SBML, which depends on the subtype of each relation. For each subtype, the corresponding BioPAX element and terms from different ontologies are listed. When converting to BioPAX, all terms are annotated as instances of InteractionVocabulary, whereas an SBML transition has a dedicated field for the SBO term, and the other terms are added as controlled vocabulary (CV) terms on the transition. Note that some BioPAX elements are subject to certain conditions, and others must be replaced by more generic classes in BioPAX Level 2 because of differences between the two releases. See the KEGG to BioPAX section for more details.
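The ontology annotations of Table 2 can likewise be generated from a lookup table. In this sketch we assume that the numeric columns after the SBO term are PSI-MI and GO accessions (for example, MI:0217 is “phosphorylation reaction” and GO:0016310 is “phosphorylation”), and we place 10467 (GO:0010467, “gene expression”) in the GO position for ‘expression’; both readings are our interpretation of the flattened table. The names `RELATION_TERMS` and `relation_terms` are hypothetical, and the zero-padding (seven digits for SBO and GO, four for PSI-MI) follows the usual identifier conventions.

```python
# KGML relation subtype -> (SBO, PSI-MI, GO) accession numbers; None = no term.
RELATION_TERMS = {
    "activation": (170, None, None),
    "inhibition": (169, None, None),
    "expression": (170, None, 10467),
    "repression": (169, None, None),
    "indirect effect": (344, None, None),
    "state change": (168, None, None),
    "binding/association": (177, 914, 5488),
    "dissociation": (180, None, None),
    "missing interaction": (396, None, None),
    "phosphorylation": (216, 217, 16310),
    "dephosphorylation": (330, 203, 16311),
    "glycosylation": (217, 559, 70085),
    "ubiquitination": (224, 220, 16567),
    "methylation": (214, 213, 32259),
}


def relation_terms(subtype):
    """Format the ontology terms of a KGML relation subtype as prefixed IDs."""
    sbo, mi, go = RELATION_TERMS[subtype]
    terms = [f"SBO:{sbo:07d}"]  # goes into the sboTerm field of an SBML transition
    if mi is not None:
        terms.append(f"MI:{mi:04d}")  # added as an additional CV term
    if go is not None:
        terms.append(f"GO:{go:07d}")  # added as an additional CV term
    return terms


print(relation_terms("phosphorylation"))  # ['SBO:0000216', 'MI:0217', 'GO:0016310']
```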
Figure 3. Simplified class structure and mapping from KGML to SBML. This mapping includes the SBML qualitative models (qual) and groups extension packages. Most properties are encoded as attributes on the actual classes. Tables 1 and 2 give further details about the translation of entries and relations. Core SBML can only encode reactions; therefore, SBML-qual is required to properly encode relations. This extension package requires its own model. Consequently, the SBML core model and each species have to be duplicated to obtain a qualitativeModel that includes the translated relations. Furthermore, the groups extension package can be used to encode groups properly in SBML.
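The duplication step described for SBML-qual can be sketched with plain data classes. This deliberately does not use libSBML: the classes `QualitativeSpecies`, `Transition`, and `QualitativeModel` and the function `build_qual_model` are hypothetical stand-ins, and the real qual package attaches separate Input and Output elements (with attributes such as a sign) to each transition, which this sketch flattens to lists of species ids.

```python
from dataclasses import dataclass, field


@dataclass
class QualitativeSpecies:
    id: str
    name: str
    sbo_term: str


@dataclass
class Transition:
    id: str
    inputs: list   # ids of the regulating qualitative species
    outputs: list  # ids of the regulated qualitative species
    sbo_term: str


@dataclass
class QualitativeModel:
    species: dict = field(default_factory=dict)      # id -> QualitativeSpecies
    transitions: list = field(default_factory=list)  # Transition objects


def build_qual_model(core_species, relations, sbo_for_subtype):
    """Copy core species into a qualitative model and turn relations into transitions.

    core_species: {species id: (name, SBO term)} from the SBML core model
    relations: [(source id, target id, KGML relation subtype)]
    sbo_for_subtype: {subtype: SBO term}, e.g. taken from Table 2
    """
    qual = QualitativeModel()
    # Each core species is duplicated so the qualitative model is self-contained.
    for sid, (name, sbo) in core_species.items():
        qual.species[sid] = QualitativeSpecies(sid, name, sbo)
    # Each KGML relation becomes one transition from source to target.
    for i, (source, target, subtype) in enumerate(relations, start=1):
        qual.transitions.append(
            Transition(f"tr{i}", [source], [target], sbo_for_subtype[subtype]))
    return qual


core = {"s1": ("protein A", "SBO:0000252"), "s2": ("protein B", "SBO:0000252")}
qual = build_qual_model(core, [("s1", "s2", "phosphorylation")],
                        {"phosphorylation": "SBO:0000216"})
print(qual.transitions[0])
# Transition(id='tr1', inputs=['s1'], outputs=['s2'], sbo_term='SBO:0000216')
```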
Comparison of different available converters for KEGG pathways

| Criterion | | | | | | |
|---|---|---|---|---|---|---|
| SBML output | ✓ | ∘ | ✓ | ✓ | ✓ | ✓ |
| BioPAX output | - | ✓ | - | ✓ | - | ✓ |
| Machine interpretable | ∘ | ✓ | ∘ | ✓ | ✓ | ✓ |
| Human interpretable | ✓ | - | ✓ | - | ✓ | ✓ |
| Signaling pathways | - | - | - | - | - | ✓ |
| Complete | - | ✓ | ✓ | - | ✓ | ✓ |
| No duplicate entries | ✓ | ✓ | ✓ | ✓ | - | ✓ |
| No duplicate reactions | ✓ | - | ✓ | ✓ | ✓ | ✓ |
| Unbundle reactions | ✓ | - | - | - | - | ✓ |
| Revision of reactions | ✓ | - | - | - | ✓ | ✓ |
| Stoichiometry | - | - | - | - | - | ✓ |
| SBML: Valid | ✓ | n/a | - | ✓ | ✓ | ✓ |
| SBML: Level.Version | 1.1 up to 2.3 | n/a | 2.1 | 2.4 | 2.4 | 2.4, 3.1 |
| SBML: SBO terms | - | n/a | - | - | ✓ | ✓ |
| SBML: Notes | - | n/a | - | - | ✓ | ✓ |
| SBML: Annotations | - | n/a | - | - | ✓ | ✓ |
| BioPAX: Valid | n/a | - | n/a | - | n/a | ✓ |
| BioPAX: Level | n/a | 2 | n/a | 2 | n/a | 2, 3 |
| BioPAX: Appropriate classes | n/a | ✓ | n/a | - | n/a | ✓ |
| BioPAX: Notes | n/a | - | n/a | - | n/a | ✓ |
| BioPAX: Annotations | n/a | ✓ | n/a | - | n/a | ✓ |
| BioPAX: SM annotations | n/a | - | n/a | - | n/a | ✓ |

This table compares various applications that can convert KEGG pathways to BioPAX or SBML models. A checkmark (✓) indicates that the corresponding converter completely fulfills the criterion, a circle (∘) that the criterion is met only partially or incorrectly, and a minus (-) that the feature is not supported at all. ‘n/a’ indicates that a criterion is not applicable to a converter. A model is Machine interpretable if its entities can be mapped directly to a database. The criterion Human interpretable indicates that a model assigns human-readable names or gene symbols to its entities. Signaling pathways are supported if a converter can read and convert KEGG pathways that contain relations. A conversion is Complete if every relevant reaction of a KGML pathway appears, in some form, in the translated document. For visualization purposes, KGML files often contain multiple copies of entries or reactions; these duplicates should be removed. The contained reactions are often bundled (multiple reactions are summarized as one) or lack some reaction participants. Revision of reactions refers to the completion of missing reaction participants. Stoichiometry is not contained in KGML documents and must be parsed from the reaction equations in the KEGG REACTION database. To test the validity of the models, we used the corresponding validators from SBML.org and BioPAX.org; a model is marked as Valid if the validator does not return any errors. For SBML, we further inspected whether the models contain SBO terms. It is also recommended to include notes, such as human-readable descriptions, and annotations (e.g., cross-references in the form of CV terms, MIRIAM URNs, or Xrefs). For BioPAX only, we checked whether appropriate classes are used (instances of smallMolecule for small molecules and instances of protein for proteins) and, as a useful additional feature, whether the available BioPAX fields for the chemical formula or molecular weight of small molecules are filled (SM annotations).
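Two of the criteria above, unbundling reactions and deriving stoichiometry from KEGG REACTION equations, can be illustrated with a small parser. This is a sketch under stated assumptions: bundled KGML reactions are taken to carry several space-separated `rn:` identifiers in one `name` attribute, equations are assumed to use the `<=>` separator with ` + ` between terms, and only integer coefficients are handled (symbolic coefficients such as `n` raise an error). The function names are hypothetical, the equation string is illustrative, and retrieving equations from KEGG is not shown.

```python
import re

COMPOUND_ID = re.compile(r"[CG]\d{5}")  # KEGG compound (C) or glycan (G) identifiers


def unbundle(reaction_name):
    """Split a bundled KGML reaction name, e.g. 'rn:R00299 rn:R01786',
    into individual KEGG REACTION identifiers."""
    return [token.split(":", 1)[-1] for token in reaction_name.split()]


def _parse_side(side):
    """Turn '2 C00001 + C00002' into {'C00001': 2, 'C00002': 1}."""
    stoichiometry = {}
    for term in side.strip().split(" + "):
        parts = term.split()
        if len(parts) == 1:
            coefficient, compound = 1, parts[0]
        elif len(parts) == 2:
            # int() rejects symbolic coefficients such as 'n' or '(n+1)'.
            coefficient, compound = int(parts[0]), parts[1]
        else:
            raise ValueError(f"cannot parse term {term!r}")
        if not COMPOUND_ID.fullmatch(compound):
            raise ValueError(f"unexpected compound identifier {compound!r}")
        stoichiometry[compound] = stoichiometry.get(compound, 0) + coefficient
    return stoichiometry


def parse_equation(equation):
    """Parse a KEGG REACTION equation into (substrates, products) dictionaries."""
    left, right = equation.split("<=>")
    return _parse_side(left), _parse_side(right)


print(unbundle("rn:R00299 rn:R01786"))  # ['R00299', 'R01786']
print(parse_equation("C00002 + C00031 <=> C00008 + C00092"))
# ({'C00002': 1, 'C00031': 1}, {'C00008': 1, 'C00092': 1})
```

Rejecting symbolic coefficients outright, rather than guessing a number, keeps the generated stoichiometry from silently becoming wrong.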