Miroslav Kratochvíl1,2, Jiří Vondrášek1, Jakub Galgonek3.
Abstract
MOTIVATION: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to efficiently utilize these resources. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of the chemical databases still lack the functionality of structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space.
Keywords: Interoperability; Linked data; Small molecule databases; Substructure search
Year: 2019 PMID: 31254167 PMCID: PMC6599361 DOI: 10.1186/s13321-019-0367-2
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Overview of the interoperable chemical structure search implementation, separated into logical layers. Clients submit the SPARQL queries to the frontend HTTP server in Apache Tomcat. The incoming queries are passed to the SPARQL engine, which uses the connected RDF/SQL mapping to translate them into equivalent SQL queries. The resulting queries are evaluated by the backend PostgreSQL DBMS, using the Sachem extension (which executes chemical structure search) and the pgSPARQL extension (which evaluates SPARQL-specific parts of the query and gathers metadata). Stored data include the relational data in PostgreSQL and specialized chemical indexes in Sachem. Result sets of the evaluated queries are passed back to the SPARQL engine, translated into a SPARQL result in the requested format, and sent back to the client.
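
The first step in Fig. 1, a client submitting a query to the frontend HTTP server, can be sketched with the standard SPARQL 1.1 Protocol, in which the query travels in the `query` parameter of a GET request. This is only an illustration: the endpoint URL and the helper name `build_sparql_request` are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL (an assumption); substitute the real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    # The protocol carries the query text in the URL-encoded "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header asks for the standard SPARQL JSON result format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the `Accept` header corresponds to the "requested format" that the caption says the SPARQL engine produces.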
Parameters of the chemical structure query executed via the SPARQL procedure call
| Parameter name | Description and values |
|---|---|
| sachem:query | Query molecule structure, formatted as SMILES or MDL |
| sachem:topn | Maximum number of results to return |
| sachem:searchMode | Chooses between exact structure and substructure search; values: sachem:substructureSearch, sachem:exactSearch |
| sachem:tautomerMode | Tautomer handling; accepted values: sachem:ignoreTautomers (do not consider tautomerism), sachem:inchiTautomers (use InChI-based algorithm) |
| sachem:chargeMode | Selects coalescing of unspecified charge values in query: sachem:defaultChargeAsAny (unspecified charge is wildcard), sachem:defaultChargeAsZero (unspecified charge matches only uncharged atoms), sachem:ignoreCharges (ignores all charge annotations) |
| sachem:isotopeMode | Selects coalescing of unspecified isotope values in query: sachem:defaultIsotopeAsStandard (unspecified isotope matches only the standard isotope), sachem:defaultIsotopeAsAny (unspecified isotope is wildcard), sachem:ignoreIsotopes (ignore all isotope annotations) |
| sachem:stereoMode | Handling of stereochemistry: sachem:strictStereo (remove results with conflicting stereochemistry information), sachem:ignoreStereo (ignore all stereochemistry annotations) |
| sachem:cutoff | Minimum similarity score of returned results in range 0–1, defaults to 0.8 |
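
As a rough illustration of how the parameters in the table might be combined, the sketch below renders a SPARQL query string from a SMILES string and a few mode settings. Several things here are assumptions rather than facts from the table: the namespace IRI is a placeholder, the procedure is assumed to be invoked as a predicate whose object is a blank node listing the parameters, and the helper name `build_structure_query` is hypothetical. Only the parameter and value names come from the table.

```python
# Placeholder namespace (an assumption); take the real IRI from the endpoint's documentation.
SACHEM_NS = "http://example.org/sachem#"


def build_structure_query(smiles, procedure="sachem:substructureSearch",
                          topn=None, **modes):
    """Render a SPARQL query; the call syntax used here is assumed, not confirmed."""
    # Escape backslashes and quotes so the SMILES is a valid SPARQL string literal
    # (backslashes do occur in SMILES, e.g. in double-bond stereo notation).
    literal = smiles.replace("\\", "\\\\").replace('"', '\\"')
    params = [f'sachem:query "{literal}"']
    if topn is not None:
        params.append(f"sachem:topn {int(topn)}")
    # Mode parameters and their values are both names from the table above.
    for name, value in modes.items():
        params.append(f"sachem:{name} sachem:{value}")
    body = " ;\n        ".join(params)
    return (
        f"PREFIX sachem: <{SACHEM_NS}>\n"
        "SELECT ?compound WHERE {\n"
        f"    ?compound {procedure} [\n"
        f"        {body}\n"
        "    ]\n"
        "}"
    )


query = build_structure_query(
    "CCO", topn=10, tautomerMode="ignoreTautomers", stereoMode="ignoreStereo"
)
print(query)
```

The helper does not validate mode names or values, so a misspelled value would only be caught by the server.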
Fig. 2 (a) An example of a federated SPARQL query that connects assay results of morphine derivatives to corresponding organism names. (b) Response to the same query in JSON format (shortened for brevity). (c) Schematic view of the distributed query processing. Colors match the execution place of the corresponding query parts, and the data source of the JSON response entries.
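
A JSON response such as the one in Fig. 2 (b) typically follows the W3C SPARQL 1.1 Query Results JSON Format, with a `head.vars` list and a `results.bindings` array. The sketch below flattens such a document into plain rows. The sample document is made up for illustration and is not the response from the figure; the variable names and values are assumptions.

```python
import json

# Made-up response in the standard SPARQL JSON results layout (illustrative values only).
SAMPLE = """
{
  "head": {"vars": ["compound", "organism"]},
  "results": {"bindings": [
    {"compound": {"type": "uri", "value": "http://example.org/compound/1"},
     "organism": {"type": "literal", "value": "Example organism"}},
    {"compound": {"type": "uri", "value": "http://example.org/compound/2"}}
  ]}
}
"""


def bindings_to_rows(document):
    """Turn a SPARQL JSON result document into a list of {variable: value} dicts."""
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable may be unbound in a given solution; fill those with None.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


rows = bindings_to_rows(json.loads(SAMPLE))
for row in rows:
    print(row)
```

In a federated query, entries of one row can originate from different endpoints, which is why unbound variables are handled explicitly here.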