Prudence Mutowo, A Patrícia Bento, Nathan Dedman, Anna Gaulton, Anne Hersey, Jane Lomax, John P Overington.
Abstract
BACKGROUND: Discovering new drugs is a lengthy and expensive process. Modern drug discovery relies heavily on the rapid identification of novel 'targets', usually proteins that can be modulated by small molecule drugs to cure or minimise the effects of a disease. Of the 20,000 proteins currently reported as comprising the human proteome, just under a quarter can potentially be modulated by known small molecules. Storing information in curated, actively maintained drug discovery databases can help researchers access current drug discovery information quickly. However, as the amount of data generated from both experimental and in silico efforts increases, databases can grow very large very quickly, and retrieving information from them can become a challenge. Developing database tools that facilitate rapid information retrieval is important for keeping pace with the growth of databases.
DESCRIPTION: We have developed a Gene Ontology-based navigation tool (Gene Ontology Tree) to help users retrieve biological information for single protein targets in the ChEMBL drug discovery database. 99% of single protein targets in ChEMBL have at least one GO annotation associated with them. There are 12,500 GO terms associated with 6200 protein targets in the ChEMBL database, resulting in a total of 140,000 annotations. The slim we have created, the 'ChEMBL protein target slim', allows broad categorisation of the biology of 90% of the protein targets using just 300 high-level, informative GO terms. We used the GO slim method of assigning fewer higher-level GO groupings to numerous very specific lower-level terms derived from the GOA to describe a set of GO terms relevant to proteins in ChEMBL. We then used the resulting slim to provide a web-based tool that allows quick and easy navigation of protein target space.
Terms from the GO are used to capture information on protein molecular function, biological process and subcellular localisation. The ChEMBL database also provides compound information for small molecules that have been tested for their effects on these protein targets. The 'ChEMBL protein target slim' provides a means of describing the biology of protein drug targets and allows users to easily establish a connection between biological and chemical information regarding drugs and drug targets in ChEMBL. The 'ChEMBL protein target slim' is available as a browsable 'Gene Ontology Tree' on the ChEMBL site under the browse targets tab (https://www.ebi.ac.uk/chembl/target/browser). A ChEMBL protein target slim OBO file containing the GO slim terms pertinent to ChEMBL is available from the GOC website (http://geneontology.org/page/go-slim-and-subset-guide).
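The GO slim method mentioned in the abstract, rolling many specific terms up to a small set of high-level groupings, can be illustrated with a minimal sketch. It is not the authors' implementation. The term identifiers, the `PARENTS` graph, the `SLIM` set and the protein names are hypothetical toy data, not real GO accessions or ChEMBL targets. The sketch also assumes a simple rule: a specific term maps to every slim term found among its ancestors.

```python
from collections import deque

# Toy is_a graph: child term -> parent terms.
# All identifiers are hypothetical, not real GO accessions.
PARENTS = {
    "TOY:root": [],
    "TOY:signaling": ["TOY:root"],
    "TOY:metabolism": ["TOY:root"],
    "TOY:kinase_signaling": ["TOY:signaling"],
    "TOY:lipid_metabolism": ["TOY:metabolism"],
    "TOY:lipid_kinase_signaling": ["TOY:kinase_signaling", "TOY:lipid_metabolism"],
}

# Hypothetical slim: the high-level terms chosen for broad categorisation.
SLIM = {"TOY:signaling", "TOY:metabolism"}

# Hypothetical protein -> specific term annotations.
ANNOTATIONS = {
    "protein_A": ["TOY:lipid_kinase_signaling"],
    "protein_B": ["TOY:lipid_metabolism"],
    "protein_C": ["TOY:root"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_slim(term, parents, slim):
    """Map one specific term to the slim terms among its ancestors (or itself)."""
    return ancestors(term, parents) & slim


def slim_annotations(annotations, parents, slim):
    """Roll protein -> specific terms up to protein -> slim terms."""
    rolled = {}
    for protein, terms in annotations.items():
        hits = set()
        for term in terms:
            hits |= map_to_slim(term, parents, slim)
        rolled[protein] = hits
    return rolled


def coverage(rolled):
    """Fraction of proteins that received at least one slim term."""
    return sum(1 for hits in rolled.values() if hits) / len(rolled)
```

In this toy data, `protein_C` is annotated only to the root term and so receives no slim term. The same effect explains why a slim can categorise most of a protein set without covering all of it.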
Keywords: Bioinformatics; Biology; Database; Drug discovery; Ontologies; Protein
Year: 2016 PMID: 27678076 PMCID: PMC5039825 DOI: 10.1186/s13326-016-0102-0
Source DB: PubMed Journal: J Biomed Semantics
Proteins mapped to GO slim terms per species
| Species | Protein targets mapped to slim |
|---|---|
| Homo sapiens | 3254 |
| Rattus norvegicus | 899 |
| Mus musculus | 828 |
| Bos taurus | 194 |
| Sus scrofa | 98 |
| Escherichia coli K-12 | 74 |
| Oryctolagus cuniculus | 74 |
| Mycobacterium tuberculosis | 73 |
| Saccharomyces cerevisiae S288c | 70 |
| Staphylococcus aureus | 50 |
Fig. 1 Searching the ChEMBL database using the GO tree to retrieve all proteins involved in response to toxic substance and their related compound and bioactivity information. Panel a shows the biological process node of the GO tree with a 'toxic substance' keyword search. Panel b shows the search output of the list of proteins annotated with the 'toxic substance' GO term.
Fig. 2 Number of drugs used as Antineoplastic and Immunomodulating Agents (ATC Class L) targeting proteins in 5 biological process categories generated using the ChEMBL slim.
Fig. 3 Mechanism of action for drugs at intersection of protein GO categories.