Gang Fu, Colin Batchelor, Michel Dumontier, Janna Hastings, Egon Willighagen, Evan Bolton.
Abstract
BACKGROUND: PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications.

DESCRIPTION: This work, one of a series covering the PubChemRDF project, describes an approach to translate PubChem Substance and Compound information into Resource Description Framework (RDF) format. Basic examples are provided to demonstrate its use. The aim of this effort is to provide two new primary benefits to researchers in a cost-effective manner. Firstly, we aim to remove the inherent limitations of using the web-based resource PubChem by allowing a researcher to use readily available semantic technologies (namely, RDF triple stores and their corresponding SPARQL query engines) to query and analyze PubChem data on local computing resources. Secondly, this work intends to help improve data sharing, analysis, and integration of PubChem data to resources external to NCBI and across scientific domains, by means of the association of PubChem data to existing ontological frameworks, including CHEMical INFormation ontology, Semanticscience Integrated Ontology, and others.
Year: 2015 PMID: 26175801 PMCID: PMC4500850 DOI: 10.1186/s13321-015-0084-4
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
The prefixes and corresponding namespaces of PubChem subdomains and standardized ontologies
| Prefix [a] | Namespace [b] |
|---|---|
| PubChemRDF subdomains | |
| compound | |
| substance | |
| descr | |
| inchikey | |
| syno | |
| concept | |
| reference | |
| nbr | |
| source | |
| vocab | http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary# |
| External RDF resources | |
| pdbr | |
| mesh | |
| chembl | |
| linkedchem | |

[a] The prefix substitutes for the full URI namespace in the context of an XML qualified name (QName).
[b] Namespaces can be associated with element and attribute names in URI references; SIO and CHEMINF share the same namespace.
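
The prefix substitution described in the first footnote can be sketched in a few lines of Python. This is a minimal illustration, not part of PubChemRDF itself: only the `vocab` namespace is taken from the table, while the function names `expand_qname` and `compact_uri` and the local name `ExampleTerm` are hypothetical and chosen purely for demonstration.

```python
# Only the vocab namespace comes from the table; other prefixes would be
# added to this dict in the same way once their namespaces are known.
NAMESPACES = {
    "vocab": "http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#",
}


def expand_qname(qname: str, namespaces: dict = NAMESPACES) -> str:
    """Expand 'prefix:local' into a full URI by substituting the prefix."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"not a QName: {qname!r}")
    if prefix not in namespaces:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return namespaces[prefix] + local


def compact_uri(uri: str, namespaces: dict = NAMESPACES) -> str:
    """Inverse operation: shorten a full URI to 'prefix:local' if possible."""
    for prefix, namespace in namespaces.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace matches; leave the URI untouched


# 'ExampleTerm' is a hypothetical local name used only for illustration.
full_uri = expand_qname("vocab:ExampleTerm")
short_form = compact_uri(full_uri)
```

Keeping the mapping in a single dictionary means a query or serializer can switch between compact and full forms without hard-coding any namespace string.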
Figure 1. RDF diagram representing the attributes for substances SID103554720, SID43118161, SID26697365, SID822166, and compound CID60823, as well as the annotations for synonym and InChIKey instances.
Figure 2. RDF diagram representing PubChem data provenance model.
CHEMINF IDs, corresponding labels, and definitions of terms used to annotate interrelationships between compounds and substances
| CHEMINF term ID | Label | Definition |
|---|---|---|
| CHEMINF_000477 | Has PubChem normalized counterpart | Non-symmetric [a] predicate between substance as domain [b] and compound as range [c] |
| CHEMINF_000480 | Has component with uncharged counterpart | Non-symmetric predicate between a mixture compound as domain and its component as range |
| CHEMINF_000455 [d] | Is isotopologue of | Symmetric [e] predicate between two compounds (isotopomers) |
| CHEMINF_000461 [d] | Is stereoisomer of | Symmetric predicate between two compounds (stereoisomers) |
| CHEMINF_000462 | Has same connectivity as | Symmetric predicate between two compounds with same connectivity |
| CHEMINF_000482 | Similar to by PubChem 2-D similarity algorithm | Symmetric predicate between two similar compounds according to 2-D Tanimoto score |
| CHEMINF_000483 | Similar to by PubChem 3-D similarity algorithm | Symmetric predicate between two similar compounds according to 3-D Shape and Color Tanimoto scores |

[a] Non-symmetric means the subject and object in the triple are not interchangeable.
[b] Domain is the subject of the triple.
[c] Range is the object of the triple.
[d] The predicate is a sub-property of CHEMINF_000462.
[e] Symmetric means the subject and object in the triple are interchangeable.
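
To make the symmetric and sub-property notes concrete, the sketch below derives the triples they imply from a small set of asserted triples. It is a hand-rolled illustration under stated assumptions, not how PubChemRDF or any particular triple store performs inference: the predicate IDs come from the table, whereas the identifiers `SID_X`, `CID_A`, and `CID_B` and the function `entail` are hypothetical.

```python
# Predicate IDs are taken from the table; everything else is illustrative.
SYMMETRIC = {
    "CHEMINF_000455",  # is isotopologue of
    "CHEMINF_000461",  # is stereoisomer of
    "CHEMINF_000462",  # has same connectivity as
    "CHEMINF_000482",  # similar to by PubChem 2-D similarity algorithm
    "CHEMINF_000483",  # similar to by PubChem 3-D similarity algorithm
}
SUBPROPERTY_OF = {
    "CHEMINF_000455": "CHEMINF_000462",
    "CHEMINF_000461": "CHEMINF_000462",
}


def entail(triples):
    """Return the input triples plus those implied by the two rules above."""
    inferred = set(triples)
    frontier = list(inferred)
    while frontier:
        subject, predicate, obj = frontier.pop()
        candidates = []
        if predicate in SYMMETRIC:
            # Subject and object are interchangeable.
            candidates.append((obj, predicate, subject))
        if predicate in SUBPROPERTY_OF:
            # A sub-property statement also holds for its parent property.
            candidates.append((subject, SUBPROPERTY_OF[predicate], obj))
        for triple in candidates:
            if triple not in inferred:
                inferred.add(triple)
                frontier.append(triple)
    return inferred


# Hypothetical identifiers, not real PubChem records.
asserted = {
    ("SID_X", "CHEMINF_000477", "CID_A"),  # non-symmetric: never reversed
    ("CID_A", "CHEMINF_000461", "CID_B"),  # symmetric and a sub-property
}
closure = entail(asserted)
```

Here the stereoisomer statement yields its reverse plus the two "has same connectivity as" statements, while the substance-to-compound link stays one-directional.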
Figure 3. RDF diagram representing the calculated attributes of CID60823, and its interconnections with other compounds.
Figure 4. RDF diagram representing PubChem 2-/3-D similarity neighboring and score.
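
As background for the similarity scores mentioned in the caption above, the following sketch computes the standard Tanimoto coefficient for two binary fingerprints represented as sets of on-bit positions. The fingerprints are made up for illustration and the function name `tanimoto` is an assumption; PubChem's actual fingerprint definition, neighboring thresholds, and the 3-D shape and color scores are not covered here.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Invented bit positions, not real PubChem fingerprints.
fingerprint_a = {1, 4, 7, 9}
fingerprint_b = {1, 4, 8, 9, 12}
score = tanimoto(fingerprint_a, fingerprint_b)  # 3 shared / 6 distinct = 0.5
```

Because the coefficient is unchanged when its arguments are swapped, a similarity relation built on it is naturally symmetric.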
Calculated chemical descriptors and the corresponding ontology term IDs
| Property name | Term ID | Software library |
|---|---|---|
| Molecular weight | CHEMINF_000334 | PubChem |
| Molecular formula | CHEMINF_000335 | PubChem |
| Total formal charge | CHEMINF_000336 | PubChem |
| Mono isotopic weight | CHEMINF_000337 | PubChem |
| Exact mass | CHEMINF_000338 | PubChem |
| Compound identifier | CHEMINF_000140 | PubChem |
| Covalent unit count | CHEMINF_000369 | PubChem |
| Defined atom stereocenter count | CHEMINF_000370 | PubChem |
| Defined bond stereocenter count | CHEMINF_000371 | PubChem |
| Isotope atom count | CHEMINF_000372 | PubChem |
| Heavy atom count | CHEMINF_000373 | PubChem |
| Undefined atom stereocenter count | CHEMINF_000374 | PubChem |
| Undefined bond stereocenter count | CHEMINF_000375 | PubChem |
| Canonical SMILES | CHEMINF_000376 | OEChem |
| Isomeric SMILES | CHEMINF_000379 | OEChem |
| Preferred IUPAC name | CHEMINF_000382 | LexiChem |
| Hydrogen bond donor count | CHEMINF_000387 | Cactvs |
| Hydrogen bond acceptor count | CHEMINF_000388 | Cactvs |
| Rotatable bond count | CHEMINF_000389 | Cactvs |
| Structure complexity | CHEMINF_000390 | Cactvs |
| Tautomer count | CHEMINF_000391 | Cactvs |
| TPSA | CHEMINF_000392 | Cactvs |
| XLogP3 | CHEMINF_000395 | XLogP3 |
| IUPAC InChI | CHEMINF_000396 | InChI |
| IUPAC InChIKey | CHEMINF_000399 | InChI |

The software libraries used by PubChem to calculate property values are associated with each chemical property.
Name, version, and corresponding CHEMINF term ID for software libraries used to calculate chemical properties of PubChem compounds
| Name | Version [a] | CHEMINF term ID |
|---|---|---|
| PubChem | 2.1 | CHEMINF_000333 |
| OEChem | 1.9.0 | CHEMINF_000429 |
| LexiChem | 2.2.0 | CHEMINF_000384 |
| Cactvs | 3.408 | CHEMINF_000386 |
| XLogP3 | 3.0 | CHEMINF_000394 |
| InChI | 1.0.4 | CHEMINF_000398 |

[a] Please note that these versions will change as a function of software updates.
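
The descriptor and software tables can be joined to answer "which library, at which version, produced this property?". The sketch below shows one way to do that lookup in Python. It is an illustration only: the term IDs and versions are copied from the tables (and, as noted, versions change with software updates), just one descriptor per library is included, and the names `Software`, `DESCRIPTORS`, and `provenance` are hypothetical.

```python
from typing import NamedTuple


class Software(NamedTuple):
    name: str
    version: str   # snapshot from the table; changes with software updates
    term_id: str


SOFTWARE = {
    "PubChem": Software("PubChem", "2.1", "CHEMINF_000333"),
    "OEChem": Software("OEChem", "1.9.0", "CHEMINF_000429"),
    "LexiChem": Software("LexiChem", "2.2.0", "CHEMINF_000384"),
    "Cactvs": Software("Cactvs", "3.408", "CHEMINF_000386"),
    "XLogP3": Software("XLogP3", "3.0", "CHEMINF_000394"),
    "InChI": Software("InChI", "1.0.4", "CHEMINF_000398"),
}

# Property name -> (descriptor term ID, software library name).
DESCRIPTORS = {
    "Molecular weight": ("CHEMINF_000334", "PubChem"),
    "Canonical SMILES": ("CHEMINF_000376", "OEChem"),
    "Preferred IUPAC name": ("CHEMINF_000382", "LexiChem"),
    "Hydrogen bond donor count": ("CHEMINF_000387", "Cactvs"),
    "XLogP3": ("CHEMINF_000395", "XLogP3"),
    "IUPAC InChI": ("CHEMINF_000396", "InChI"),
}


def provenance(property_name: str) -> dict:
    """Resolve a descriptor to its term ID and the software that computes it."""
    term_id, library = DESCRIPTORS[property_name]
    software = SOFTWARE[library]
    return {
        "descriptor_term": term_id,
        "software": software.name,
        "version": software.version,
        "software_term": software.term_id,
    }


xlogp_provenance = provenance("XLogP3")
```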
The types and corresponding CHEMINF term ID of the depositor-provided synonyms and identifiers
| Database identifier | CHEMINF term ID |
|---|---|
| ChEMBL identifier | CHEMINF_000412 |
| KEGG identifier | CHEMINF_000409 |
| Human Metabolome Database identifier | CHEMINF_000408 |
| ChemSpider identifier | CHEMINF_000405 |
| ChEBI identifier | CHEMINF_000407 |
| DrugBank identifier | CHEMINF_000406 |
| CAS registry number | CHEMINF_000446 |
| EC number [a] | CHEMINF_000447 |
| RTECS number [b] | CHEMINF_000566 |
| LipidMaps identifier | CHEMINF_000564 |
| National service center number | CHEMINF_000565 |
| Unique ingredient identifier | CHEMINF_000563 |
| Validated chemical database identifier [c] | CHEMINF_000467 |
| Drug trade name | CHEMINF_000561 |
| International nonproprietary name | CHEMINF_000562 |
| PubChem depositor-supplied name | CHEMINF_000339 |

[a] A seven-digit identifier for chemical substances for regulatory purposes within the European Union.
[b] Identifying numbers used in the Registry of Toxic Effects of Chemical Substances (RTECS) database of toxicity information.
[c] Identifying descriptor is the superclass of other identifier types in the table.