Sunghwan Kim, Paul A Thiessen, Tiejun Cheng, Bo Yu, Benjamin A Shoemaker, Jiyao Wang, Evan E Bolton, Yanli Wang, Stephen H Bryant.
Abstract
BACKGROUND: PubChem is an open archive consisting of a set of three primary public databases (BioAssay, Compound, and Substance). It contains information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically modified) amino acid and nucleic acid sequences (including siRNA and miRNA). Currently (as of Nov. 2015), PubChem contains more than 150 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results provided from over 1 million biological assay records.
DESCRIPTION: Many PubChem records (substances, compounds, and assays) include depositor-provided cross-references to scientific articles in PubMed. Some PubChem contributors provide bioactivity data extracted from scientific articles. Literature-derived bioactivity data complement high-throughput screening (HTS) data from the concluded NIH Molecular Libraries Program and other HTS projects. Some journals provide PubChem with information on chemicals that appear in their newly published articles, enabling concurrent publication of scientific articles in journals and associated data in public databases. In addition, PubChem links records to PubMed articles indexed with the Medical Subject Heading (MeSH) controlled vocabulary thesaurus.
Year: 2016 PMID: 27293485 PMCID: PMC4901473 DOI: 10.1186/s13321-016-0142-6
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Summary of depositor-provided cross-references to PubMed articles from PubChem substances and compounds
| Depositor | NSID-PMID (cross-references) | NCID-PMID (cross-references) | NPMID (records involved) | NSID (records involved) | NCID (records involved) |
|---|---|---|---|---|---|
| All | 5,614,567 | 5,412,256 | 2,192,601 | 301,358 | 261,497 |
| IBM Almaden Research Center (a) | 5,196,617 | 5,125,878 | 2,107,354 | 152,777 | 147,576 |
| Comparative toxicogenomics database (b) | 226,585 | 111,029 | 110,000 | 14,463 | 7856 |
| NIAID ChemDB (c) | 144,477 | 133,012 | 11,951 | 114,953 | 104,418 |
| IUPHAR/BPS guide to PHARMACOLOGY (d) | 14,309 | 11,250 | 7163 | 6398 | 4913 |
| Human metabolome database (e) | 13,998 | 13,971 | 10,414 | 1788 | 1781 |
| Immune epitope database (IEDB) (f) | 4849 | 3863 | 1747 | 2067 | 1948 |
| BioCyc (g) | 3318 | 3267 | 1386 | 2989 | 2939 |
| DrugBank (h) | 3249 | 3225 | 3158 | 1044 | 1030 |
| Biocatalysis/biodegradation database (BBD) (i) | 2299 | 2299 | 644 | 1343 | 1342 |
| Bioinformatics and drug design (BIDD) group (j) | 2270 | 2262 | 91 | 1768 | 1673 |
| Others | 2596 | 2200 | 1180 | 1768 | 1448 |
NSID and NCID are the number of PubChem substances and compounds with depositor-provided cross-references to PubMed articles, respectively, and NSID-PMID and NCID-PMID are the number of depositor-provided cross-references from PubChem substance and compound records to PubMed articles, respectively. NPMID is the number of unique PubMed articles associated with the PubChem records via the depositor-provided cross-references
a http://www.research.ibm.com/labs/almaden/index.shtml
b Ref. [15]. http://ctdbase.org
c http://chemdb.niaid.nih.gov
d Ref. [16]. http://www.guidetopharmacology.org
e Ref. [17]. http://www.hmdb.ca
f Ref. [18]. http://www.iedb.org
g Ref. [19]. http://biocyc.org
h Ref. [20]. http://www.drugbank.ca
i Ref. [21]. http://eawag-bbd.ethz.ch
j http://bidd.nus.edu.sg
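The counts in the table above can be turned into simple ratios, for example the average number of PubMed cross-references per substance, or the share of all cross-references that comes from a single depositor. The following is a minimal sketch of that arithmetic; the dictionary keys and function names are illustrative choices, not part of PubChem, and only three rows of the table are copied in.

```python
# Counts copied from three rows of the table above.
# Key names (sid_pmid, cid_pmid, ...) are illustrative, not PubChem field names.
ROWS = {
    "All": {"sid_pmid": 5_614_567, "cid_pmid": 5_412_256,
            "pmid": 2_192_601, "sid": 301_358, "cid": 261_497},
    "IBM Almaden Research Center": {"sid_pmid": 5_196_617, "cid_pmid": 5_125_878,
                                    "pmid": 2_107_354, "sid": 152_777, "cid": 147_576},
    "NIAID ChemDB": {"sid_pmid": 144_477, "cid_pmid": 133_012,
                     "pmid": 11_951, "sid": 114_953, "cid": 104_418},
}


def refs_per_substance(name):
    """Average number of SID-PMID cross-references per substance record."""
    row = ROWS[name]
    return row["sid_pmid"] / row["sid"]


def share_of_all(name, key="sid_pmid"):
    """Fraction of the 'All' total for one column contributed by one depositor."""
    return ROWS[name][key] / ROWS["All"][key]


if __name__ == "__main__":
    for name in ROWS:
        print(f"{name}: {refs_per_substance(name):.2f} cross-references per substance")
    print(f"IBM share of SID-PMID cross-references: "
          f"{share_of_all('IBM Almaden Research Center'):.1%}")
```

Under these numbers, substances carry roughly 18.6 cross-references each on average, while the IBM Almaden records average about 34 and the NIAID ChemDB records a little over 1.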
Summary of cross-references from literature-extracted bioassay data to PubMed articles
| Source | NAID-PMID | NSID-PMID | NCID-PMID | NPMID | NAID | NSID | NCID |
|---|---|---|---|---|---|---|---|
| ChEMBL (a) | 829,503 | 1,068,347 | 1,058,637 | 55,582 | 828,594 | 860,191 | 849,149 |
| PDBbind (b) | 6946 | 20,221 | 16,993 | 5252 | 4 | 10,543 | 8158 |
| IUPHAR/BPS guide to PHARMACOLOGY (c) | 442 | 1088 | 1080 | 151 | 55 | 273 | 264 |
| BindingDB (d) | 143 | 3114 | 3113 | 121 | 19 | 3101 | 3098 |
| GLIDA (e) | – | – | – | – | 6 | 19,474 | 19,458 |
NAID, NSID and NCID are the number of PubChem assays, substances and compounds extracted from scientific articles, respectively; and NAID-PMID, NSID-PMID and NCID-PMID are the number of cross-references from PubChem assays, substances, and compounds to PubMed articles, respectively. NPMID is the number of unique PubMed articles from which the assay data are extracted
a Ref. [22]. https://www.ebi.ac.uk/chembl/
b Ref. [23]. http://www.pdbbind-cn.org
c Ref. [16]. http://www.guidetopharmacology.org
d Ref. [26]. https://www.bindingdb.org
e Ref. [27]. http://pharminfo.pharm.kyoto-u.ac.jp/services/glida/
Fig. 1 The substance record page for SID 85856310 (warfarin), with a link to the source article published in Nature Chemical Biology. The original article has a link to SID 85856310 in PubChem, allowing article readers to access comprehensive information on warfarin available in the PubChem Compound database, by clicking a link to the Compound Summary page for CID 54678486
Fig. 2 Retrieving compound records annotated with MeSH terms using the Advanced Search Builder. Clicking the “Advanced” link under the “Compound” tab on the PubChem Homepage directs users to the PubChem Compound Advanced Search Builder. Selecting the “MeSHTerm” from the dropdown menu and providing a MeSH term in the search box will retrieve compounds annotated with that MeSH term
Entrez indices used to search for records with MeSH annotations
| Entrez index | Description |
|---|---|
| Compound database | |
| Complete MeSH term | Retrieve compounds annotated with the MeSH term that |
| MESH term | Retrieve compounds annotated with MeSH terms that |
| MeSH tree node | Retrieve compounds annotated with the MeSH term that matches the query and those annotated with any MeSH terms beneath the node corresponding to that MeSH term. For example, “Penicillins[MeSHTreeNode]” will retrieve records annotated with MeSH term “Penicillins” and those with MeSH terms “Oxacillin”, “Cloxacillin”, and so on, which correspond to child nodes beneath the “Penicillins” node in the MeSH tree |
| MeSH description | Retrieve compounds annotated with the MeSH terms whose description contains the query string |
| PharmAction | Retrieve compounds annotated with the Pharmacological Action term given as a query. Pharmacological Action terms are a subset of MeSH terms |
| PharmActionID | Retrieve compounds annotated with the Pharmacological Action term corresponding to the numeric identifier given as a query |
| BioAssay database | |
| MeSH term active | Retrieve assays in which only an active substance is annotated with the MeSH term given as a query |
| MeSH term tested | Retrieve assays in which any tested substance is annotated with the MeSH term given as a query |
| MeSH description active | Retrieve assays in which only an active substance is annotated with the MeSH terms whose descriptions have a query string |
| MeSH description tested | Retrieve assays in which any tested substance is annotated with the MeSH terms whose descriptions have a query string |
| Pharm action active | Retrieve assays in which only an active substance has the Pharmacological Action annotation given as a query |
| Pharm action tested | Retrieve assays in which any tested substance has the Pharmacological Action annotation given as a query |
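The indices in the table above are used by appending the index name in square brackets to a query, as in the “Penicillins[MeSHTreeNode]” example. As a rough sketch, such a fielded query could also be sent programmatically through the NCBI E-utilities ESearch endpoint. The helper names below are hypothetical, the code only builds the request URL (it does not contact NCBI), and only the `MeSHTreeNode` and `MeSHTerm` index spellings are taken from the text; the exact spellings of the other indices are not given here and would need to be checked.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_term(query, index):
    """Return an Entrez term restricted to one index, e.g. 'Penicillins[MeSHTreeNode]'."""
    return f"{query}[{index}]"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL for the given database and term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# 'pccompound' is the Entrez name of the PubChem Compound database.
term = fielded_term("Penicillins", "MeSHTreeNode")
url = esearch_url("pccompound", term)

if __name__ == "__main__":
    print(term)
    print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect before it is submitted with any HTTP client.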
Fig. 3 The document summary (DocSum) page that shows the results for a search for “warfarin”. Scientific articles associated with the returned compound records can be accessed via the Entrez Links, which are available under the “Find related data” menu (for multiple records) or from a link for individual compound records
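The Entrez Links described in this caption also have a programmatic counterpart in the NCBI E-utilities ELink endpoint. The sketch below builds an ELink request URL that asks for PubMed articles linked to a compound through a named link; the function name is hypothetical, the request is only constructed and not sent, and the warfarin CID is the one quoted in the Fig. 1 caption.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ELink endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(cids, linkname="pccompound_pubmed"):
    """Build (but do not send) an ELink URL from PubChem Compound to PubMed.

    cids     -- iterable of compound identifiers (CIDs)
    linkname -- Entrez link name, e.g. 'pccompound_pubmed' or 'pccompound_pubmed_mesh'
    """
    params = {
        "dbfrom": "pccompound",
        "db": "pubmed",
        "linkname": linkname,
        "id": ",".join(str(cid) for cid in cids),
    }
    return f"{ELINK}?{urlencode(params)}"


# CID 54678486 is warfarin (see the Fig. 1 caption).
warfarin_links = elink_url([54678486])
warfarin_mesh_links = elink_url([54678486], linkname="pccompound_pubmed_mesh")

if __name__ == "__main__":
    print(warfarin_links)
    print(warfarin_mesh_links)
```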
Fig. 4 The literature section of the Compound Summary page (DocSum) for CID 5288826 (morphine). Clicking the “Literature” section in the Table of Contents allows users to jump to the literature section, which consists of two subsections: depositor-provided and NLM-curated PubMed citations
Fig. 5 Retrieving compound records with a particular property using the Entrez filters. Selecting the “Filter” option under the drop-down menu on the PubChem Compound Advanced Search Builder allows you to retrieve compounds with a particular property or annotation. Available filters can be shown or hidden by clicking the “Show/Hide index list” button
Entrez filters used to search for records with MeSH annotations
| Entrez filter | Description |
|---|---|
| Compound database | |
| has_mesh | Equivalent to “pccompound_mesh” |
| has_pharm | Equivalent to “pccompound_mesh_pharm” |
| pccompound_mesh | Select compounds annotated with MeSH terms. Equivalent to “has_mesh” |
| pccompound_mesh_pharm | Select compounds annotated with MeSH Pharmacological Actions. Equivalent to “has_pharm” |
| pccompound_pmc | Select compounds that have associated full-text articles in PubMed Central |
| pccompound_pubmed | Select compounds that have depositor-provided cross-references to PubMed articles |
| pccompound_pubmed_mesh | Select compounds associated with PubMed abstracts that are annotated with common MeSH annotations |
| pccompound_pubmed_publisher | Select compounds that have cross-references to PubMed articles, provided to PubMed by publishers |
| BioAssay database | |
| pcassay_pmc | Select assays that have associated full-text articles in PubMed Central |
| pcassay_pubmed | Select assays that have depositor-provided cross-references to PubMed articles |
| pcassay_pubmed_major | Select assays that have cross-references to the PubMed articles that contain the original bioactivity data in the assays |
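A filter from the table above is typically combined with an ordinary query so that only records carrying the corresponding annotation are returned. The following sketch composes such a query string; the helper name is hypothetical, and the quoted `"name"[Filter]` form is an assumption about how the “Filter” option of the Advanced Search Builder renders a filter, so it should be verified against the builder's own output.

```python
def with_filters(query, filters):
    """Combine a free-text query with Entrez filters using AND.

    The '"name"[Filter]' syntax is assumed here, not confirmed by the text.
    """
    clauses = [query] + [f'"{name}"[Filter]' for name in filters]
    return " AND ".join(clauses)


# Compounds matching 'warfarin' that have MeSH annotations and
# depositor-provided cross-references to PubMed articles.
term = with_filters("warfarin", ["has_mesh", "pccompound_pubmed"])

if __name__ == "__main__":
    print(term)
```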
Fig. 6 Distribution of PMIDs per CID and CIDs per PMID for three types of Entrez links. Distributions of (a) PMIDs per CID and (b) CIDs per PMID are shown for three Entrez links between PubChem Compound and PubMed: “pccompound_pubmed”, “pccompound_pubmed_mesh”, and “pccompound_pubmed_publisher”. See the text for the description of these links
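The two distributions in this figure can be derived from a flat list of CID-PMID associations. The sketch below shows one way to compute them; the pairs are made-up toy data rather than PubChem records, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy (CID, PMID) associations, not real PubChem data.
pairs = [(1, 100), (1, 101), (1, 102), (2, 100), (3, 103), (3, 100)]


def links_per_key(pairs, key_index):
    """Count distinct partners per key: PMIDs per CID (key_index=0) or CIDs per PMID (key_index=1)."""
    partners = {}
    for pair in set(pairs):  # set() drops duplicate associations
        partners.setdefault(pair[key_index], set()).add(pair[1 - key_index])
    return {key: len(values) for key, values in partners.items()}


def distribution(counts):
    """Map 'number of links' to 'number of records that have that many links'."""
    return dict(Counter(counts.values()))


pmids_per_cid = links_per_key(pairs, 0)
cids_per_pmid = links_per_key(pairs, 1)

if __name__ == "__main__":
    print("PMIDs per CID:", distribution(pmids_per_cid))
    print("CIDs per PMID:", distribution(cids_per_pmid))
```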
Fig. 7 Venn diagrams for depositor-provided CID-PMID associations and those generated via MeSH. The Venn diagrams compare depositor-provided CID-PMID associations and automated annotations via MeSH in terms of (a) the number of CID-PMID associations, (b) the CIDs, and (c) the PMIDs involved in these associations
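The three comparisons in this figure amount to set overlaps at the level of association pairs, CIDs, and PMIDs. A minimal sketch with made-up toy data follows; the sets and function name are hypothetical and only illustrate how the three region counts of each Venn diagram could be obtained.

```python
def venn_counts(a, b):
    """Return the sizes of the three regions of a two-set Venn diagram."""
    a, b = set(a), set(b)
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


# Hypothetical toy (CID, PMID) associations, not real PubChem data.
depositor = {(1, 100), (1, 101), (2, 100)}
mesh = {(1, 100), (3, 100), (3, 102)}

# (a) overlap of the associations themselves
pair_overlap = venn_counts(depositor, mesh)
# (b) overlap of the CIDs involved
cid_overlap = venn_counts({cid for cid, _ in depositor}, {cid for cid, _ in mesh})
# (c) overlap of the PMIDs involved
pmid_overlap = venn_counts({pmid for _, pmid in depositor}, {pmid for _, pmid in mesh})

if __name__ == "__main__":
    print("associations:", pair_overlap)
    print("CIDs:", cid_overlap)
    print("PMIDs:", pmid_overlap)
```

Note that the overlap of CIDs or PMIDs can be much larger than the overlap of associations, because two sources may mention the same compound or article without linking the same pair.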