Ismael Navas-Delgado, María Jesús García-Godoy, Esteban López-Camacho, Maciej Rybinski, Armando Reyes-Palomares, Miguel Ángel Medina, José F Aldana-Montes.
Abstract
In the last few years, the Life Sciences domain has experienced rapid growth in the number of available biological databases. The heterogeneity of these databases makes data integration a challenging issue. Integration challenges include locating resources and dealing with relationships, data formats, synonyms and ambiguity. The Linked Data approach partially solves the heterogeneity problem by introducing a uniform data representation model. Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. This article introduces kpath, a database that integrates information related to metabolic pathways. kpath also provides a navigational interface that enables not only browsing but also deeper use of the integrated data to build metabolic networks from existing dispersed knowledge. This user interface has been used to showcase relationships that can be inferred from the information available in several public databases.
Year: 2015 PMID: 26055101 PMCID: PMC4460419 DOI: 10.1093/database/bav053
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Comparison of kpath Browser features with those of related tools
| Reference information | Pathway Editor Tool | Multiple species | Facilitate sharing across group members | Attached source information on nodes and edges | Manipulate node properties | Pathway comparison/alignment | Multiple-linked views | Zooming | Query Pathways | Genetic information and pathways | Build history of edited pathways | Update of Database | Building network from node list(s) by search | Integration of updated data from multiple sources | Export to standard formats | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kegg | ✓ | ✓ | ✓ | |||||||||||||
| Biocarta | ✓ | ✓ | ✓ | |||||||||||||
| EcoCyc | ✓ | ✓ | ||||||||||||||
| Pathway Editor | ✓ | ✓ | ✓ | |||||||||||||
| PathwayAssist | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||
| GenePath | ✓ | ✓ | ✓ | |||||||||||||
| GeneMAPP | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||
| Cytoscape | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
| Knowledge Editor | ✓ | |||||||||||||||
| Biological Story Editor | ✓ | ✓ | ||||||||||||||
| Patika | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||
| Genies | ✓ | ✓ | ||||||||||||||
| Vector PathBlazer | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||
| MapMan | ✓ | ✓ | ||||||||||||||
| Pubgene | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||
| MetScape 3 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| MetDraw | ✓ | ✓ | ||||||||||||||
| GeneSpring | ✓ | ✓ | ||||||||||||||
| GenePath | ✓ | |||||||||||||||
| Chibe | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||
| PCViz | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| STKE | ✓ | ✓ | ✓ | |||||||||||||
| KeggScape | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| kpath Browser | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
a: Cells of this group correspond to database websites where pathways can be visualized.
b: Cells of this group correspond to visualization tools.
Figure 1. A conceptual representation of the data model proposed for the RDF pathway database. Circles denote concepts and links denote properties; the concepts at either end of a link (property) denote its domain and range.
Data properties and their provenance (prefix khaos-pathways omitted for readability)
| Concept | Property | Provenance |
|---|---|---|
| Compound | Name | Kegg and Reactome |
| | Synonym | Kegg |
| | Formula | Kegg |
| | Mass | Kegg |
| Enzyme | Name | Kegg |
| | Synonym | Kegg |
| | Ecnumber | Kegg |
| Gene | Id | Kegg |
| Glycan | Name | Kegg |
| | Synonym | Kegg |
| | Formula | Kegg |
| | Mass | Kegg |
| Pathway | Name | Kegg and Reactome |
| | Metabolism | Own Data |
| Protein | Name | Uniprot/Swissprot |
| | Synonym | Uniprot/Swissprot |
| | UniprotID | Uniprot/Swissprot |
| | GeneName | Uniprot/Swissprot |
| | Keyword | Uniprot/Swissprot |
| | Comment | Uniprot/Swissprot |
| Organism | Name | NCBI taxonomy |
| | Synonym | NCBI taxonomy |
| | KeggCode | Kegg |
| | Comment | NCBI taxonomy |
| Reaction | Name | Kegg and Reactome |
A summary of nodes and edges stored in the kpath RDF graph
| Concept | No. of entities | Outgoing links per entity | Literals per entity |
|---|---|---|---|
| pathways:Compound | 24 708 | – | 3.9 |
| pathways:Enzyme | 4245 | 105.34 | 6.81 |
| pathways:Gene | 689 997 | – | 2 |
| pathways:Glycan | 10 965 | – | 2.1 |
| pathways:Organism | 2278 | – | 3.69 |
| pathways:Pathway | 83 097 | 16.36 | 2.01 |
| pathways:Protein | 538 849 | 1.22 | 16.95 |
| pathways:Reaction | 12 815 | 6.68 | 1.87 |
The Concept column lists all types (classes) stored in the RDF graph of the endpoint. The No. of entities column gives the number of resources of each type (e.g. of the Pathway class). The third column gives the average number of outgoing edges (links) per entity, and the fourth column gives the average number of literals per entity.
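As an illustration of how per-class figures of this kind could be derived, the following is a minimal sketch in plain Python over a toy triple list. The `ex:` identifiers, the `Literal` wrapper and the `summarize` function are hypothetical and are not taken from kpath. The sketch also assumes that `rdf:type` triples are not counted as outgoing links; the source does not state how the authors treated them.

```python
from collections import defaultdict
from dataclasses import dataclass

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class Literal:
    """Marks an object value as a literal rather than a resource."""
    value: str


# Hypothetical toy graph: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:path1", RDF_TYPE, "pathways:Pathway"),
    ("ex:path1", "ex:name", Literal("Glycolysis")),
    ("ex:path1", "ex:hasReaction", "ex:r1"),
    ("ex:path1", "ex:hasReaction", "ex:r2"),
    ("ex:path2", RDF_TYPE, "pathways:Pathway"),
    ("ex:path2", "ex:name", Literal("TCA cycle")),
    ("ex:path2", "ex:synonym", Literal("Krebs cycle")),
    ("ex:r1", RDF_TYPE, "pathways:Reaction"),
    ("ex:r1", "ex:name", Literal("R1")),
    ("ex:r2", RDF_TYPE, "pathways:Reaction"),
]


def summarize(triples):
    """Return per-class entity counts and average links/literals per entity."""
    types = {}                    # entity -> class
    links = defaultdict(int)      # entity -> outgoing resource links
    literals = defaultdict(int)   # entity -> literal values
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s] = o          # assumption: type triples are not links
        elif isinstance(o, Literal):
            literals[s] += 1
        else:
            links[s] += 1

    by_class = defaultdict(list)
    for entity, cls in types.items():
        by_class[cls].append(entity)

    summary = {}
    for cls, entities in by_class.items():
        n = len(entities)
        summary[cls] = {
            "entities": n,
            "links_per_entity": sum(links[e] for e in entities) / n,
            "literals_per_entity": sum(literals[e] for e in entities) / n,
        }
    return summary


if __name__ == "__main__":
    for cls, row in summarize(TRIPLES).items():
        print(cls, row)
```

On a real endpoint the same aggregation would more naturally be expressed as a SPARQL query with `GROUP BY`; the in-memory version is used here only to keep the example self-contained.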
Figure 2. The top of the figure shows the SPARQL query used to get Kegg genes and SwissProt proteins linked by their corresponding organism codes and EC numbers. Results are shown at the bottom of the figure.
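The query itself is not reproduced in this record. The join it describes can be approximated in memory: the sketch below pairs genes and proteins that share both an organism code and an EC number. The record layout, the identifiers and the function `link_genes_to_proteins` are hypothetical stand-ins, not the authors' SPARQL query or schema.

```python
from collections import defaultdict

# Hypothetical toy records; field names and identifiers are illustrative only.
GENES = [
    {"gene": "gene:g1", "organism": "hsa", "ec": "1.1.1.1"},
    {"gene": "gene:g2", "organism": "eco", "ec": "1.1.1.1"},
    {"gene": "gene:g3", "organism": "hsa", "ec": "2.7.1.1"},
]
PROTEINS = [
    {"protein": "protein:p1", "organism": "hsa", "ec": "1.1.1.1"},
    {"protein": "protein:p2", "organism": "hsa", "ec": "1.1.1.1"},
    {"protein": "protein:p3", "organism": "eco", "ec": "2.7.1.1"},
]


def link_genes_to_proteins(genes, proteins):
    """Pair each gene with every protein sharing its organism code and EC number."""
    # Index proteins by the composite join key (organism code, EC number).
    index = defaultdict(list)
    for p in proteins:
        index[(p["organism"], p["ec"])].append(p["protein"])

    pairs = []
    for g in genes:
        # A gene matches only when both parts of the key agree.
        for protein in index.get((g["organism"], g["ec"]), []):
            pairs.append((g["gene"], protein))
    return pairs


if __name__ == "__main__":
    for gene, protein in link_genes_to_proteins(GENES, PROTEINS):
        print(gene, "->", protein)
```

Requiring both keys to match is what prevents a gene from one organism being linked to a protein with the same EC number in a different organism.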
Figure 3. kpath Browser user interface to search for pathways.
Figure 4. Detailed biochemical reactions and associated components for each use case. (A) Glyoxylase deficiency. (B) Hypercholesterolemia.