Abstract
Bioactive structures published in medicinal chemistry patents typically exceed those in papers by at least twofold and may precede them by several years. The Big-Bang of open automated extraction since 2012 has contributed to over 15 million patent-derived compounds in PubChem. While mapping between chemical structures, assay results and protein targets from patent documents is challenging, these relationships can be harvested using open tools and are beginning to be curated into databases.
Year: 2015 PMID: 26194581 PMCID: PMC4548146 DOI: 10.1016/j.ddtec.2014.12.001
Source DB: PubMed Journal: Drug Discov Today Technol ISSN: 1740-6749
Figure 1. Finding and extracting selected examples from WO2014066132 and WO2014065434. In the upper panel the search term matches are highlighted in green. In the lower-left panel, example 72 from 5434 (page 238 in the PDF) was reported to have an IC50 against purified enzyme of 13.6 nM (page 249). The structure was determined from an initial image conversion using OSRA [11] and subsequently edited in the PubChem sketcher [12], from which PubChem searches were launched. The SMILES and InChIKey are shown below.
FC4C([C@@]1(NC(OC2C1COC2)N)C#C)CC(NC(O)C3CNC(OCF)CN3)CC4
InChIKey = SOYARSISURDFSW-SKMDKRRUSA-N.
In the lower-right panel, example 8 from 6132 (page 60 in the PDF) is shown, with a reported IC50 of 105 nM (page 63). The IUPAC name was submitted to chemicalize.org [13] to generate a range of molecular outputs, including the SMILES string and InChIKey below.
CC(C)(O)C1C(F)CNC(N1)N1C[C@H]2CSC(N)NC2(C1)C1CNCCN1
InChIKey = IKIJFJKFIYFTBZ-YOZOHBORSA-N.
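The InChIKeys quoted in this caption can be checked and turned into PubChem lookups with a few lines of code. The following is a minimal sketch, not the authors' workflow: the helper names `split_inchikey` and `pubchem_lookup_url` are hypothetical, and the URL follows the PubChem PUG REST pattern for InChIKey queries. It only builds the URL and makes no network request.

```python
import re
from urllib.parse import quote

# Standard InChIKey layout: 14 letters (connectivity hash), a hyphen,
# 8 letters (hash of the remaining layers, including stereo),
# a standard/non-standard flag (S or N), a version letter, a hyphen,
# and a single protonation character.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,  # shared by stereoisomers of one skeleton
        "layers": layers,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def pubchem_lookup_url(key):
    """Build (but do not fetch) a PUG REST URL listing CIDs for an InChIKey."""
    split_inchikey(key)  # validate the format first
    base = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/"
    return base + quote(key.strip()) + "/cids/JSON"


if __name__ == "__main__":
    for key in ("SOYARSISURDFSW-SKMDKRRUSA-N", "IKIJFJKFIYFTBZ-YOZOHBORSA-N"):
        print(split_inchikey(key)["connectivity"], pubchem_lookup_url(key))
```

Comparing only the first 14-letter block is a common way to find records that share a skeleton but differ in stereo assignment, which is relevant given how often patent-extracted structures leave stereo unspecified.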
Comparative assessment of patent structures inside PubChem. The specified sources can be retrieved from PubChem by simple selects (e.g. ‘SureChem’[SourceName], but note this is now SureChEMBL), with the results given as CID counts. PubChem and ChEMBL are included for comparison. Totals are in millions for each source and dates are from the last update. Subsequent columns are filters expressed as %. In order, these are: stereo and E/Z (completely or partially unspecified), Mw < 400, unique structures (to that source), entries with two components, and rule-of-five with a 200-800 Mw range. Int. refers to links to patent documents provided inside PubChem. Ext. refers to document mappings in the source links.
| Total (millions) | Last update | Stereo or E/Z unspecified | Mw < 400 | Unique to source | Two components | Rule-of-five (Mw 200-800) | Int. | Ext. |
|---|---|---|---|---|---|---|---|---|
| 49.8 | May-14 | 35% | 58% | 52% | 2.5% | 72% | | |
| 2.3 | Jun-12 | 41% | 71% | 31% | 0.3% | 58% | Yes | Yes |
| 9.3 | Mar-13 | 38% | 52% | 52% | 5.9% | 63% | No | Yes |
| 3.9 | Aug-12 | 29% | 48% | 27% | 5.4% | 56% | Yes | Yes |
| 0.9 | Apr-14 | 26% | 48% | 17% | 1.9% | 61% | | |
| 4.2 | May-14 | 25% | 49% | 15% | 4.0% | 53% | No | No |
| 15.5 | May-14 | 35% | 52% | 47% | 4.7% | 57% | | |
Signifies the out-links are subscriber-only.
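The source selects described in the table caption can also be issued programmatically. The sketch below assumes the NCBI E-utilities `esearch` endpoint against the PubChem Compound Entrez database (`pccompound`) with the `[SourceName]` field named in the caption. The function name `source_count_url` is hypothetical, and the exact source name strings registered in PubChem should be checked before use. The code only builds the request URL and does not fetch it.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def source_count_url(source_name):
    """Build an esearch URL asking for the CID count of one PubChem source."""
    # Entrez field-restricted query, e.g. "SureChEMBL"[SourceName]
    term = f'"{source_name}"[SourceName]'
    params = {
        "db": "pccompound",   # PubChem Compound database in Entrez
        "term": term,
        "rettype": "count",   # ask for the hit count only
        "retmode": "json",
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Source name is an assumption; substitute the depositor name of interest.
    print(source_count_url("SureChEMBL"))
```

Counts returned this way reflect the current state of PubChem and will differ from the dated totals in the table.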
Figure 2. Examples of curated and annotated database mappings from patents. The upper panel shows part of the Guide to PHARMACOLOGY (GToPdb) entry (http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=6476) for AZD9668, a clinically tested neutrophil elastase (UniProt P08246) inhibitor. The curator's notes and the two references are shown, including a SureChEMBL link to the patent US20070203129 [19] (additional connectivity for this entry has been added for the next update). The lower panel shows one of the BindingDB views for PubChem CID 44247663, from US8541427 [20] on BACE1 inhibitors.