
CART: a chemical annotation retrieval toolkit.

Samy Deghou, Georg Zeller, Murat Iskar, Marja Driessen, Mercedes Castillo, Vera van Noort, Peer Bork.

Abstract

MOTIVATION: Data on bioactivities of drug-like chemicals are rapidly accumulating in public repositories, creating new opportunities for research in computational systems pharmacology. However, integrative analysis of these data sets is difficult due to prevailing ambiguity between chemical names and identifiers and a lack of cross-references between databases.
RESULTS: To address this challenge, we have developed CART, a Chemical Annotation Retrieval Toolkit. As a key functionality, it matches an input list of chemical names into a comprehensive reference space to assign unambiguous chemical identifiers. In this unified space, bioactivity annotations can be easily retrieved from databases covering a wide variety of chemical effects on biological systems. Subsequently, CART can determine annotations enriched in the input set of chemicals and display these in tabular format and interactive network visualizations, thereby facilitating integrative analysis of chemical bioactivity data.
AVAILABILITY AND IMPLEMENTATION: CART is available as a Galaxy web service (cart.embl.de). Source code and an easy-to-install command line tool can also be obtained from the web site.
CONTACT: bork@embl.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27256313      PMCID: PMC5018367          DOI: 10.1093/bioinformatics/btw233

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Understanding the effects of chemicals, in particular small organic molecules, on biological systems is fundamental to research in pharmacology, toxicology, chemical biology and related fields. Bioactivities of chemicals can be investigated at various scales by analyzing drug-associated readouts, such as protein interactions, cellular phenotypes, toxicity or side effects (Iskar et al.). Owing to the development of high-throughput screening technologies, bioactivity data for large chemical libraries have rapidly accumulated in recent years and are increasingly becoming available in public repositories (see Table 1).

While this has created tremendous opportunities for research that aims to integrate these heterogeneous data sets in order to gain a better systemic understanding of chemical effects, in practice such efforts are severely impeded by disparities in data representation. In particular, unambiguous identification of chemicals across databases can be difficult, because a myriad of synonyms and trade names exist for many chemicals, and even controlled nomenclature and structural descriptions are sometimes ambiguous. This is similar to the problem of mapping between various gene, transcript and protein nomenclatures, which many bioinformatics tools now address (Huang et al., among others).

To address the persisting need in chemoinformatics, we here present CART, a Chemical Annotation Retrieval Toolkit. By solving the chemical name-matching problem, CART aims to integrate bioactivity annotations across various databases and to provide functional annotation and enrichment analysis for chemicals. CART can thereby identify coherent functional themes, analogous to gene ontology annotation tools such as DAVID (Huang et al.). This makes CART useful, e.g. for the automatic characterization of hits derived from chemical screens (Rihel et al., for instance). In other contexts, too, annotating chemicals with various biological effects is becoming an important task, which has so far largely required expert manual annotation but can be greatly simplified by CART.
Table 1.

Chemical bioactivity databases available through CART

| Bioactivity | Database | Size (a) | References |
|---|---|---|---|
| Molecular target | STITCH | 221 724 / 9015 | stitch.embl.de |
| Molecular target | TTD | 11 340 / 1120 | bidd.nus.edu.sg/group/cjttd |
| Molecular target | DrugBank | 853 / 147 | www.drugbank.ca |
| Gene interactions | CTD | 6334 / 8346 | ctdbase.org |
| Metabolization | DrugBank | 396 / 64 | www.drugbank.ca |
| Therapeutic class. | ChEMBL | 1118 / 1538 | www.ebi.ac.uk/chembl/ftc |
| Therapeutic class. | ATC | 2515 / 924 | www.whocc.no/atc |
| Drug side effects | SIDER | 1309 / 4130 | sider.embl.de |
| Toxicity | DrugMatrix | 742 / 22 | ntp.niehs.nih.gov/drugmatrix |

(a) Annotated chemicals / annotation terms, see Supplementary Figure S3 and Supplementary Material S2.

2 Approach

The first component of CART matches user-provided chemical names to a comprehensive dictionary of synonyms, which serves as a reference space for disambiguation to unique chemical identifiers (Fig. 1). To improve matching sensitivity over exact synonym look-up, we additionally implemented an approximate text matching method based on the Apache Lucene search engine (http://lucene.apache.org/) and heuristics such as the conversion between salt (e.g. salicylate) and acid form (salicylic acid; see Supplementary Material S1 for details). CART also offers the possibility to match structural chemical identifiers, SMILES and InChIKeys, via exact string matching. Taken together, these search capabilities go beyond what existing tools, such as CTD (Davis et al.), currently offer (see Supplementary Table S1).
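The matching cascade described above (exact look-up, salt-to-acid heuristic, approximate matching) can be illustrated with a minimal Python sketch. It is not CART's implementation: the synonym dictionary and its identifiers are made-up placeholders, the suffix rule is a deliberately crude stand-in for the heuristics detailed in Supplementary Material S1, and the standard-library `difflib` module replaces the Apache Lucene index that CART actually uses.

```python
import difflib

# Hypothetical toy synonym dictionary: lower-cased name -> chemical identifier.
# The identifiers are invented placeholders, not real STITCH identifiers.
SYNONYMS = {
    "aspirin": "CHEM:0001",
    "acetylsalicylic acid": "CHEM:0001",
    "salicylic acid": "CHEM:0002",
    "ibuprofen": "CHEM:0003",
}


def salt_to_acid(name):
    """Crude heuristic: rewrite a salt-style name ('salicylate') as an acid name."""
    if name.endswith("ate"):
        return name[:-3] + "ic acid"
    return None


def match_name(raw, cutoff=0.85):
    """Return (identifier, how_matched), or (None, 'unmatched')."""
    name = " ".join(raw.lower().split())  # normalise case and whitespace
    if name in SYNONYMS:                  # 1) exact synonym look-up
        return SYNONYMS[name], "exact"
    acid = salt_to_acid(name)             # 2) salt -> acid heuristic
    if acid in SYNONYMS:
        return SYNONYMS[acid], "salt->acid"
    # 3) approximate matching; difflib is an assumption standing in for Lucene
    close = difflib.get_close_matches(name, list(SYNONYMS), n=1, cutoff=cutoff)
    if close:
        return SYNONYMS[close[0]], "approximate"
    return None, "unmatched"
```

Calling `match_name("salicylate")` resolves through the salt-to-acid step, while a misspelling such as `"ibuprofin"` falls through to the approximate step. The `cutoff` value is an arbitrary choice for this toy example; a stricter value trades sensitivity for precision.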
Fig. 1.

Typical CART workflow including chemical name matching, annotation retrieval and enrichment analysis. The lower panels contain a toy example of non-steroidal anti-inflammatory drug (NSAID) compounds and show excerpts of how these are matched and annotated by CART; the rightmost panel displays a (partial) enrichment network. PTGS, prostaglandin-endoperoxide synthase targets; M01A, ATC code for NSAIDs; Adj. P, FDR-corrected P-value; nephritis and vasculitis are NSAID-associated side effects. See Supplementary Material S3 and Supplementary Figure S4 for an application of CART to hits from a drug screen.

Mapping to CART's chemical reference space facilitates subsequent retrieval of bioactivity annotations (Table 1, Supplementary Material S2). This allows for easy, multi-faceted annotation of chemical libraries, synonym retrieval (useful e.g. for text mining) and the identification of bioactivities that are enriched in the user-provided input. Statistical significance for these enrichments is established using Fisher’s exact test with FDR correction for multiple testing.

In a typical use case, users may want to subject a set of hits resulting from a high-throughput chemical screen to CART analysis. After name matching, the enrichment analysis can be done relative to a user-specified background, in this case the library of all chemicals probed in the screen. Enriched annotations are subsequently retrieved from databases describing chemical effects at various scales, including molecular targets, metabolizing enzymes, functional classifications, indication areas and side effects (Table 1, Supplementary Material S2). The results are visualized as a network linking the input set of chemicals to enriched annotations (Fig. 1, Supplementary Material S3, Supplementary Figure S4). Implemented in Cytoscape.js (Franz et al.), this network can be interactively explored.
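The enrichment statistics mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than CART's code: the text specifies Fisher's exact test with FDR correction but not the sidedness of the test or the FDR procedure, so the one-sided (over-representation) test and the Benjamini-Hochberg adjustment used here are assumptions, as are all function and parameter names.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test P-value for over-representation.

    k: input chemicals carrying the annotation
    n: size of the input set (e.g. screen hits)
    K: background chemicals carrying the annotation
    N: size of the background (e.g. the screened library)
    """
    total = comb(N, n)
    upper = min(n, K)
    # Hypergeometric probability of observing k or more annotated chemicals.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest P-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, if 4 of 4 hits carry an annotation that 5 of 20 library compounds share, `fisher_right_tail(4, 4, 5, 20)` evaluates to 5/4845. Running one such test per annotation term and passing the resulting P-values to `benjamini_hochberg` gives adjusted values of the kind labelled "Adj. P" in Fig. 1.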
The Galaxy (Goecks et al.) front-end of CART enables users to combine individual modules into new workflows, allowing for easy customization and extension of the standard use case described above. Galaxy moreover facilitates reproducibility due to its history and sharing functionalities (Goecks et al.).

3 Results

CART uses a comprehensive chemical reference space of about 98.8 million names and synonyms and 68.3 million InChIKeys that are disambiguated to 37.7 million chemical identifiers based on information from the STITCH database version 4.0 (Kuhn et al.). Matching user-provided chemical names into this reference space is fast: processing 1,000 chemicals takes <40 s (Supplementary Figure S1), allowing integrative analyses at a large scale. This is becoming crucial due to the data deluge of publicly available chemical bioactivity data (Wang et al.).

We benchmarked the accuracy of CART's (approximate) name matching algorithm using four datasets for which a mapping to STITCH or PubChem identifiers already existed, so that they could serve as a gold standard. We found CART’s sensitivity to range between 92 and 100% on these benchmarks, while precision ranged between 79 and 98% (Supplementary Figure S2). As an additional means of ensuring high analysis standards, CART enables the user to interactively curate the automatic name matching results before proceeding further.

Owing to its unified reference chemical space, CART offers seamless integration of user-provided data with a number of databases containing functional annotations of chemicals at various scales (Table 1). These databases vary in scope: the number of annotated chemicals ranges from >220 000 compounds with known protein interactions (Kuhn et al.; Qin et al.) to a few hundred drugs for which therapeutic classification, metabolization and toxicity information (Croset et al.; Kuhn et al.; Law et al.) is publicly available (Supplementary Figure S3). However, for a set of 1,120 well-characterized chemicals, annotations from ≥5 databases are provided (Supplementary Figure S3).

CART’s annotation and enrichment functionality is demonstrated on drug sets previously defined in a study by Rihel et al. that screened chemicals for behavioural effects on zebrafish larvae (Supplementary Material S3 and Supplementary Figure S4). This analysis revealed coherent themes of drug bioactivities, which could otherwise only be discovered by expert manual annotation (as done in Rihel et al.).

In summary, CART implements a fast and accurate approach for matching chemical names to a comprehensive chemical universe. This facilitates the retrieval of enriched annotations from various databases describing chemical effects on biological systems (Table 1) and their exploration in an interactive network view. CART thus makes integrative analysis of chemical bioactivity data easy even for non-specialists.
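The sensitivity and precision figures reported for the name-matching benchmark can be made concrete with a short Python sketch. The exact definitions used for Supplementary Figure S2 are not given in the text, so the ones below are assumptions: sensitivity is taken as the fraction of gold-standard names that receive the correct identifier, and precision as the fraction of assigned identifiers that are correct. Function and variable names are hypothetical.

```python
def precision_and_sensitivity(predicted, gold):
    """Score a name-matching run against a gold-standard mapping.

    predicted: dict mapping each input name to an identifier, or None if unmatched
    gold:      dict mapping each input name to its true identifier
    """
    # A true positive is a name that was assigned its gold-standard identifier.
    true_pos = sum(
        1 for name, cid in predicted.items()
        if cid is not None and gold.get(name) == cid
    )
    n_assigned = sum(1 for cid in predicted.values() if cid is not None)
    precision = true_pos / n_assigned if n_assigned else 0.0
    sensitivity = true_pos / len(gold) if gold else 0.0
    return precision, sensitivity
```

With four gold-standard names, of which two are matched correctly, one is matched to a wrong identifier and one stays unmatched, this yields a precision of 2/3 and a sensitivity of 1/2. Interactive curation of the matches, as offered by CART, would act on exactly the wrongly assigned and unmatched cases.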
References

1.  Drug discovery in the age of systems biology: the rise of computational approaches for data integration.

Authors:  Murat Iskar; Georg Zeller; Xing-Ming Zhao; Vera van Noort; Peer Bork
Journal:  Curr Opin Biotechnol       Date:  2011-12-05       Impact factor: 9.740

2.  Zebrafish behavioral profiling links drugs to biological targets and rest/wake regulation.

Authors:  Jason Rihel; David A Prober; Anthony Arvanites; Kelvin Lam; Steven Zimmerman; Sumin Jang; Stephen J Haggarty; David Kokel; Lee L Rubin; Randall T Peterson; Alexander F Schier
Journal:  Science       Date:  2010-01-15       Impact factor: 47.728

3.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors:  Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal:  Genome Biol       Date:  2010-08-25       Impact factor: 13.583

4.  PubChem's BioAssay Database.

Authors:  Yanli Wang; Jewen Xiao; Tugba O Suzek; Jian Zhang; Jiyao Wang; Zhigang Zhou; Lianyi Han; Karen Karapetyan; Svetlana Dracheva; Benjamin A Shoemaker; Evan Bolton; Asta Gindulyte; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2011-12-02       Impact factor: 16.971

5.  The functional therapeutic chemical classification system.

Authors:  Samuel Croset; John P Overington; Dietrich Rebholz-Schuhmann
Journal:  Bioinformatics       Date:  2013-10-30       Impact factor: 6.937

6.  The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

Authors:  Allan Peter Davis; Cynthia J Grondin; Kelley Lennon-Hopkins; Cynthia Saraceni-Richards; Daniela Sciaky; Benjamin L King; Thomas C Wiegers; Carolyn J Mattingly
Journal:  Nucleic Acids Res       Date:  2014-10-17       Impact factor: 16.971

7.  The SIDER database of drugs and side effects.

Authors:  Michael Kuhn; Ivica Letunic; Lars Juhl Jensen; Peer Bork
Journal:  Nucleic Acids Res       Date:  2015-10-19       Impact factor: 16.971

8.  Therapeutic target database update 2014: a resource for targeted therapeutics.

Authors:  Chu Qin; Cheng Zhang; Feng Zhu; Feng Xu; Shang Ying Chen; Peng Zhang; Ying Hong Li; Sheng Yong Yang; Yu Quan Wei; Lin Tao; Yu Zong Chen
Journal:  Nucleic Acids Res       Date:  2013-11-21       Impact factor: 16.971

9.  STITCH 4: integration of protein-chemical interactions with user data.

Authors:  Michael Kuhn; Damian Szklarczyk; Sune Pletscher-Frankild; Thomas H Blicher; Christian von Mering; Lars J Jensen; Peer Bork
Journal:  Nucleic Acids Res       Date:  2013-11-28       Impact factor: 16.971

10.  Cytoscape.js: a graph theory library for visualisation and analysis.

Authors:  Max Franz; Christian T Lopes; Gerardo Huck; Yue Dong; Onur Sumer; Gary D Bader
Journal:  Bioinformatics       Date:  2015-09-28       Impact factor: 6.937
