Anna Gaulton, Anne Hersey, Michał Nowotka, A Patrícia Bento, Jon Chambers, David Mendez, Prudence Mutowo, Francis Atkinson, Louisa J Bellis, Elena Cibrián-Uhalte, Mark Davies, Nathan Dedman, Anneli Karlsson, María Paula Magariños, John P Overington, George Papadatos, Ines Smit, Andrew R Leach.
Abstract
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have been added to the database. These include deposited data sets from neglected disease screening, crop protection data, drug metabolism and disposition data, and bioactivity data from patents. A number of improvements and new features have also been incorporated, including the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, the addition of metabolic pathways for drugs and the calculation of structural alerts. The ChEMBL data can be accessed via a web interface, an RDF distribution, data downloads and RESTful web services.
Year: 2016 PMID: 27899562 PMCID: PMC5210557 DOI: 10.1093/nar/gkw1074
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data sources included in ChEMBL release 22
| Short name | Source | No. compounds | No. assays | No. activities |
|---|---|---|---|---|
| LITERATURE | Scientific Literature | 967 242 | 963 186 | 5 635 084 |
| PUBCHEM_BIOASSAY | PubChem BioAssays | 489 575 | 2937 | 7 559 601 |
| GATES_LIBRARY | Gates Library compound collection | 68 490 | 2 | 69 444 |
| BINDINGDB | BindingDB Database | 68 149 | 1317 | 99 061 |
| GSK_TCMDC | GSK Malaria Screening | 13 467 | 6 | 81 198 |
| ST_JUDE_LEISH | St Jude Leishmania Screening | 13 422 | 6 | 42 105 |
| USP/USAN | USP Dictionary of USAN and International Drug Names | 11 356 | 0 | 0 |
| DNDI | Drugs for Neglected Diseases Initiative (DNDi) | 7053 | 233 | 14 452 |
| ASTRAZENECA | AstraZeneca Deposited Data | 5799 | 15 | 11 687 |
| NOVARTIS | Novartis Malaria Screening | 5614 | 6 | 27 888 |
| ORANGE_BOOK | Orange Book | 2016 | 0 | 0 |
| SUPPLEMENTARY | Deposited Supplementary Bioactivity Data | 1786 | 13 | 4817 |
| CANDIDATES | Clinical Candidates | 1633 | 0 | 0 |
| ST_JUDE | St Jude Malaria Screening | 1524 | 16 | 5456 |
| TP_TRANSPORTER | TP-search Transporter Database | 1434 | 3592 | 6765 |
| DRUGMATRIX | DrugMatrix | 930 | 113 678 | 350 929 |
| METABOLISM | Curated Drug Metabolism Pathways | 828 | 0 | 0 |
| GSK_TB | GSK Tuberculosis Screening | 826 | 15 | 1814 |
| WHO_TDR | WHO-TDR Malaria Screening | 740 | 16 | 5853 |
| GSK_TCAKS | GSK Kinetoplastid Screening | 592 | 13 | 7235 |
| MMV_MBOX | MMV Malaria Box | 400 | 138 | 45 158 |
| MMV_PBOX | MMV Pathogen Box | 400 | 0 | 0 |
| ATLAS | Gene Expression Atlas Compounds | 398 | 0 | 0 |
| DRUGS | Manually Added Drugs | 378 | 0 | 0 |
| GSK_PKIS | GSK Published Kinase Inhibitor Set | 366 | 456 | 169 451 |
| OSM | Open Source Malaria Screening | 211 | 22 | 344 |
| WITHDRAWN | Withdrawn Drugs | 192 | 0 | 0 |
| TG_GATES | Open TG-GATEs | 160 | 158 199 | 158 199 |
| SANGER | Sanger Institute Genomics of Drug Sensitivity in Cancer | 137 | 714 | 73 169 |
| FDA_APPROVAL | FDA Approval Packages | 43 | 1386 | 1387 |
| HARVARD | Harvard Malaria Screening | 37 | 4 | 111 |
Figure 1. Examples of more complex queries that could be performed (e.g. using web services) by combining BioAssay Ontology, protein family and GO classifications.
Figure 2. Compound Report Card for Troglitazone showing mechanism of action, indication and withdrawal information (https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL408).
Figure 3. Metabolism scheme for Simvastatin (https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1064).
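The abstract notes that ChEMBL data can be accessed through RESTful web services. As a rough illustration only, the Python sketch below builds request URLs for the two compounds named in the figure captions (CHEMBL408 and CHEMBL1064). The base URL, the resource names (`molecule`, `activity`) and the filter names (`molecule_chembl_id`, `limit`) are assumptions that do not appear in this record and should be checked against the current ChEMBL web services documentation. The helpers `build_url` and `fetch_json` are hypothetical names introduced for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the ChEMBL data web services (not stated in this record).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_url(resource, chembl_id=None, fmt="json", **filters):
    """Build a request URL for a resource, optionally for one ChEMBL ID.

    Filters are sorted by name so the resulting URL is deterministic.
    """
    path = f"{BASE_URL}/{resource}"
    if chembl_id is not None:
        path += f"/{chembl_id}"
    url = f"{path}.{fmt}"
    if filters:
        url += "?" + urlencode(sorted(filters.items()))
    return url


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the URLs here; call fetch_json(url) to actually retrieve data.
    print(build_url("molecule", "CHEMBL408"))
    print(build_url("activity", molecule_chembl_id="CHEMBL1064", limit=5))
```

Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent. Passing a built URL to `fetch_json` would return the decoded response, whose structure depends on the service version.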