
STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data.

Damian Szklarczyk¹, Alberto Santos², Christian von Mering¹, Lars Juhl Jensen², Peer Bork³, Michael Kuhn⁴.

Abstract

Interactions between proteins and small molecules are an integral part of biological processes in living organisms. Information on these interactions is dispersed over many databases, texts and prediction methods, which makes it difficult to get a comprehensive overview of the available evidence. To address this, we have developed STITCH ('Search Tool for Interacting Chemicals') that integrates these disparate data sources for 430 000 chemicals into a single, easy-to-use resource. In addition to the increased scope of the database, we have implemented a new network view that gives the user the ability to view binding affinities of chemicals in the interaction network. This enables the user to get a quick overview of the potential effects of the chemical on its interaction partners. For each organism, STITCH provides a global network; however, not all proteins have the same pattern of spatial expression. Therefore, only a certain subset of interactions can occur simultaneously. In the new, fifth release of STITCH, we have implemented functionality to filter out the proteins and chemicals not associated with a given tissue. The STITCH database can be downloaded in full, accessed programmatically via an extensive API, or searched via a redesigned web interface at http://stitch.embl.de.

Year: 2015; PMID: 26590256; PMCID: PMC4702904; DOI: 10.1093/nar/gkv1277

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

The role of small molecules in biological systems can be understood only in relation to the function of the targeted biomolecules, which, in turn, is largely defined by their interaction partners (1–3). The interaction network is even more important in drug development, since diseases are often a consequence of multiple changes in the same pathway or protein complex (4,5). Taking into account the neighborhood of the targeted proteins and the topology of the network itself can lead to a better understanding of a drug's cellular impact (6,7). Furthermore, as only a subset of all proteins are viable drug targets (8), most therapeutics target proteins in the network vicinity of more promising, but undruggable, proteins (7). Several databases provide proteome-wide protein–chemical interactions (9–11), and several others (12–14) put protein–chemical interactions in the context of protein–protein interaction networks, which is essential for effective in silico drug discovery. A drug's impact on the organism and its efficacy depend on its engagement with the targeted proteins and on the extent to which it disrupts the protein–protein and protein–chemical interaction network (7,15). This, in turn, depends on the concentration of the drug, the strength with which it modulates the activity of the target, and the distribution of target proteins among different tissues (16). To help users rationally select possible drug targets, we have added two new features to STITCH: a new mode that shows known binding affinities between proteins and chemicals, and the ability to filter the network to show only proteins related to a selected tissue. In its fifth release, STITCH shares its protein space with STRING v10 (17) and now encompasses more than 9 600 000 proteins from 2031 eukaryotic and prokaryotic genomes.
Its chemical space has also grown by about a quarter compared to the previous version (18), from 340 000 to 430 000 compounds (not counting different stereoisomers). STITCH is available through a new, redesigned web interface at http://stitch.embl.de and via an extensive API that allows programmatic access, including the ability to disambiguate queries, modify all network parameters and generate images. To enable large-scale analyses that may not be feasible through the web interface or API, the precomputed network and the supplementary information are freely available for download.

SOURCES OF INTERACTIONS

Although there is a plethora of data from which protein–chemical networks could be derived, their dispersed nature and their differing precision, namespaces and focus make it cumbersome to assemble a full picture of all available knowledge. The STITCH pipeline aggregates high-throughput experimental data, manually curated datasets and the results of several prediction methods into a single global network of protein–protein and protein–chemical interactions. This shields the user from the heterogeneity of the underlying data while keeping all the primary evidence for each interaction readily accessible. A large part of the known interactions comes from manually curated datasets such as DrugBank (19), the GPCR-ligand database (GLIDA) (20), Matador (21), the Therapeutic Targets Database (TTD) (22) and the Comparative Toxicogenomics Database (CTD) (23), and from several pathway databases including the Kyoto Encyclopedia of Genes and Genomes (KEGG) (12), the NCI/Nature Pathway Interaction Database (24), Reactome (25) and BioCyc (26). As different manually curated datasets can overlap, we do not consider multiple reports of identical interactions to be independent of each other. Instead, we count redundant interactions only once and do not increase the confidence level. Other large sources of protein–chemical links are datasets of experimentally validated interactions, which include ChEMBL (27), the PDSP Ki Database (28), the Protein Data Bank (PDB) (29) and two high-throughput kinase–ligand interaction studies (30,31). Here, too, interactions may be reported in different databases and with different binding affinities. To compute the final confidence score, we take only the strongest reported affinity into account. These sources of verified protein–chemical interactions are complemented by automated text mining and a structure-based prediction method (18).
The text-mining pipeline includes co-occurrence text mining and natural language processing of all MEDLINE abstracts as well as the available PubMed Central open-access full-text articles (32). The newest addition to the text-mining sources is the collection of NIH RePORTER grant abstracts (https://projectreporter.nih.gov/). For co-occurring terms, adding the RePORTER data increased the number of high-confidence interactions between human proteins and chemicals from 2740 to 4740. Extensive benchmarking of each data source allows us to provide a unified confidence score for every interaction that takes into account each source's predicted precision.
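The two redundancy rules described above (curated duplicates counted once; only the strongest experimental affinity kept) can be sketched in Python. This is a minimal illustration under assumptions: the `Evidence` record layout, the use of nM for affinities and the toy records below are hypothetical, not STITCH's actual pipeline or data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]  # (chemical, protein)

@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of one reported interaction."""
    source: str                          # e.g. "DrugBank", "ChEMBL"
    chemical: str
    protein: str
    affinity_nm: Optional[float] = None  # reported affinity in nM; lower = stronger binding

def collapse_curated(records: Iterable[Evidence]) -> Set[Pair]:
    """Count each curated pair once, however many databases report it."""
    return {(r.chemical, r.protein) for r in records}

def strongest_affinity(records: Iterable[Evidence]) -> Dict[Pair, float]:
    """Keep only the strongest (numerically lowest) reported affinity per pair."""
    best: Dict[Pair, float] = {}
    for r in records:
        if r.affinity_nm is None:
            continue  # no affinity reported for this record
        key = (r.chemical, r.protein)
        if key not in best or r.affinity_nm < best[key]:
            best[key] = r.affinity_nm
    return best

# Toy inputs (illustrative only, not actual database contents).
curated = [
    Evidence("DrugBank", "aspirin", "PTGS1"),
    Evidence("TTD", "aspirin", "PTGS1"),
    Evidence("KEGG", "aspirin", "PTGS2"),
]
experimental = [
    Evidence("ChEMBL", "aspirin", "PTGS2", 52000.0),
    Evidence("PDSP", "aspirin", "PTGS2", 70000.0),
]
print(len(collapse_curated(curated)))     # 2
print(strongest_affinity(experimental))   # {('aspirin', 'PTGS2'): 52000.0}
```

Collapsing curated reports into a set means that three databases reporting the same pair contribute no more confidence than one.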

DISPLAY OF BINDING AFFINITIES IN THE NETWORK VIEW

Small molecules that activate or inhibit proteins such as enzymes or receptors are among the most studied classes of exogenous small molecules. To assess the effect and confidence of protein–ligand binding, as well as the variability in affinity among known ligands, it is essential to know the binding affinity between the compound and its target. This binding affinity is usually quantified as the inhibition constant Ki. When Ki values are not available, other values such as the IC50 (half-maximal inhibitory concentration) or the EC50 (half-maximal effective concentration) can serve as an approximation. Ki values of drugs vary greatly, from nanomolar inhibition constants to relatively high values, such as 52 μM between aspirin and cyclooxygenase 2 (27). Therefore, for any given drug, it is not so much the absolute value of the Ki but rather the relative binding affinities that determine the impact on the interaction network. In previous versions of STITCH, Ki values from primary sources (27,28) were accessible to the user through the web interface. In the new release of STITCH, users can switch the network view to show the binding affinities of all protein–chemical interactions for which this value is known (Figure 1). This new network view is similar to STITCH's confidence view, except that the thickness of the edge between nodes is determined by the Ki value, with thicker edges indicating stronger binding (lower Ki). If a Ki is not available, the EC50 or IC50 is used to determine the depicted strength of the interaction. If multiple measurements are available, the lowest value (i.e. the highest reported affinity) determines the thickness of the edge.
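The edge-width rule can be sketched as follows. The selection logic (Ki first, EC50/IC50 as fallback, lowest value wins) follows the text; the numeric mapping from affinity to width in the hypothetical `edge_width` function (a linear scale over pAffinity 4 to 10) is an illustrative choice, not STITCH's actual rendering code.

```python
import math
from typing import Dict, List, Optional

def representative_affinity(measurements: Dict[str, List[float]]) -> Optional[float]:
    """Value (nM) that sets the edge width: lowest Ki if any, else lowest EC50/IC50."""
    ki = measurements.get("Ki", [])
    if ki:
        return min(ki)  # lowest value = highest reported affinity
    fallback = measurements.get("EC50", []) + measurements.get("IC50", [])
    return min(fallback) if fallback else None

def edge_width(affinity_nm: Optional[float], min_w: float = 1.0, max_w: float = 8.0) -> float:
    """Hypothetical mapping from affinity to edge width; stronger binding gives a thicker edge."""
    if affinity_nm is None:
        return 0.0  # assumption: no known affinity, no edge in this view
    p = 9.0 - math.log10(affinity_nm)           # pAffinity = -log10(concentration in mol/L)
    frac = min(max((p - 4.0) / 6.0, 0.0), 1.0)  # clamp pAffinity 4..10 (100 uM..0.1 nM) to 0..1
    return min_w + frac * (max_w - min_w)

# Aspirin-PTGS2 Ki from the text; the IC50 values are toy numbers.
aspirin_ptgs2 = {"Ki": [52000.0]}
toy_pair = {"IC50": [30.0, 12.0]}
print(representative_affinity(toy_pair))                             # 12.0
print(round(edge_width(representative_affinity(aspirin_ptgs2)), 2))  # 1.33
```

A logarithmic scale keeps nanomolar and micromolar binders on a comparable visual range; a width proportional to 1/Ki would make a 1 nM binder thousands of times thicker than a micromolar one.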

Figure 1.

Display of binding affinities. The user interface of STITCH has been updated, and an option has been added to scale the edge width of protein–chemical interactions according to binding affinity. The network of several NSAIDs shown here makes their different binding affinities clear: for example, aspirin has relatively low binding affinities, whereas rofecoxib binds PTGS2 specifically.

DATA AND FILTERING FOR TISSUE SPECIFICITY

The protein–chemical network in STITCH is global and as such considers interactions anywhere in an organism. However, in multicellular organisms such as humans, not all proteins are present in every tissue. STITCH 5 addresses this through a new feature that allows users to filter a human interaction network so that only the proteins believed to be present in a specified tissue are shown (Figure 2). To provide this feature, STITCH now integrates tissue-specific protein expression patterns from two data sources. The first is the TISSUES resource (33), which combines evidence from UniProt annotations, systematic large-scale transcriptomics and proteomics studies, and co-occurrence text mining; for use in STITCH, the text-mining evidence was recomputed from the same texts used elsewhere in STITCH. The second is the set of baseline expression patterns from tissues deposited in the Expression Atlas (34). Before augmenting the network with tissue data, users choose whether to use data from TISSUES or from the Expression Atlas. The TISSUES resource assigns confidence levels ranging from one (lowest confidence) to five (highest confidence); accordingly, on the STITCH website users select a tissue and a minimum confidence level. In contrast, datasets from the Expression Atlas are transformed into percentiles. The confidence score for a protein–protein interaction in the given tissue is then multiplied by the geometric mean of the two proteins' expression percentiles, and the confidence score for a protein–chemical interaction is multiplied by the protein's expression percentile. To access the tissue expression patterns, users can search for a tissue by typing part of its name or select one from a list. They then submit the changed settings to STITCH, which returns an updated network.
Because non-expressed nodes are removed (with TISSUES) or confidence scores are updated (with the Expression Atlas), other interaction partners may become part of the network.
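A minimal sketch of the two tissue modes, assuming interactions are held as `(a, b, score)` tuples with scores in [0, 1] and Expression Atlas percentiles also expressed as fractions in [0, 1]. The function names, the default three-star cutoff and the toy values are hypothetical; only the filtering rule and the multiplication by the geometric mean (or by the single protein percentile) come from the text above.

```python
import math
from typing import Dict, List, Tuple

Link = Tuple[str, str, float]  # (node_a, node_b, confidence score in [0, 1])

def filter_by_tissues(ppi: List[Link], pci: List[Link], stars: Dict[str, int],
                      min_stars: int = 3) -> Tuple[List[Link], List[Link]]:
    """TISSUES mode: drop every interaction involving a protein below the star cutoff."""
    def present(protein: str) -> bool:
        return stars.get(protein, 0) >= min_stars
    kept_ppi = [(a, b, s) for a, b, s in ppi if present(a) and present(b)]
    kept_pci = [(c, p, s) for c, p, s in pci if present(p)]  # (chemical, protein, score)
    return kept_ppi, kept_pci

def rescale_by_expression(ppi: List[Link], pci: List[Link],
                          pct: Dict[str, float]) -> Tuple[List[Link], List[Link]]:
    """Expression Atlas mode: scale scores by expression percentiles (fractions in [0, 1])."""
    new_ppi = [(a, b, s * math.sqrt(pct.get(a, 0.0) * pct.get(b, 0.0))) for a, b, s in ppi]
    new_pci = [(c, p, s * pct.get(p, 0.0)) for c, p, s in pci]
    return new_ppi, new_pci

def top_partners(links: List[Link], node: str, n: int = 5) -> List[Tuple[str, float]]:
    """Highest-scoring partners of `node`, e.g. the top five shown in a network view."""
    partners = [(b if a == node else a, s) for a, b, s in links if node in (a, b)]
    return sorted(partners, key=lambda x: x[1], reverse=True)[:n]

# Toy values (illustrative only).
ppi = [("PTGS1", "PTGS2", 0.6)]
pci = [("diclofenac", "PTGS1", 0.9), ("diclofenac", "PTGS2", 0.9)]
stars = {"PTGS1": 4, "PTGS2": 1}    # TISSUES-style confidence levels (1-5)
pct = {"PTGS1": 0.8, "PTGS2": 0.1}  # expression percentiles as fractions

print(filter_by_tissues(ppi, pci, stars))  # ([], [('diclofenac', 'PTGS1', 0.9)])
_, scaled_pci = rescale_by_expression(ppi, pci, pct)
print(top_partners(scaled_pci, "diclofenac", n=1))  # PTGS1 first (0.9 * 0.8, about 0.72)
```

Re-ranking after rescaling is what lets new partners enter the top-N view when a previously strong partner is weakly expressed.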

Figure 2.

Filtering interaction networks according to tissue expression patterns. (A) The interaction network around diclofenac and PTGS1/2 is shown without filtering for tissue expression patterns. In this and the following panels, the top five interaction partners with the highest scores are shown. (B) Using the TISSUES resource, only proteins believed to be expressed in blood platelets (with medium confidence, i.e. three stars in TISSUES) become part of the interaction network. With these settings, PTGS2 is not expressed and is therefore shown in a lighter color. (C) Expression patterns from RNA-seq data of the Human Protein Atlas are used to focus on genes expressed in smooth muscle. Confidence scores of interactions are scaled by the geometric mean of the binding partners' expression percentiles. Because of the recomputed confidence scores, four interaction partners have been replaced by other proteins.

USE CASES

STITCH has been widely used for a variety of purposes, which fall into three broad classes: (i) small- to medium-scale analyses performed via the web interface, (ii) large-scale analyses that make use of the bulk download files and (iii) reuse of data from STITCH to develop new web-based resources. Work by O'Reilly et al. on identifying potential drug targets for α1-antitrypsin deficiency exemplifies web-based usage (35). Through a genome-wide RNAi screen in a Caenorhabditis elegans disease model, the authors identified 104 C. elegans genes of interest (with 85 human orthologs). To validate these as potential drug targets, the authors queried STITCH and MetaCore for each of the human proteins and thereby identified compounds for use in follow-up experiments. Conversely, STITCH can also be queried with a set of chemicals to identify possible targets, as exemplified by the screen by Kumar et al. for compounds capable of altering intracellular manganese levels (36). The ability to see binding affinities in the new web interface makes STITCH 5 even better suited to such use cases than previous versions. STITCH is also commonly used for large-scale analyses, which we facilitate by making the data available for bulk download. Ligeti et al. used these files to construct a network neighborhood of proteins around each drug and showed that the neighborhood overlap of two drugs can predict the synergy of drug combinations (37). Along similar lines, Vogt et al. used both the drug thesaurus and the protein–chemical interactions from STITCH to predict drug contraindications (38). Last, but not least, the integrated data provided by STITCH are useful to researchers who develop their own web resources and prediction methods. An example is the ChemDIS resource, which combines the protein–chemical interactions from STITCH with tools for gene enrichment analysis to link chemicals via proteins to GO terms, pathways and diseases (39).
The experimental protein–chemical interactions from STITCH are also sometimes used as a benchmark set when developing prediction methods as exemplified by Zhou et al. (40).
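As an illustration of the kind of bulk-download analysis mentioned above, the following computes a drug's network neighborhood and the overlap between two neighborhoods. This is a simplified stand-in, not the target overlap score of Ligeti et al. (37): the neighborhood definition (direct targets plus their first-degree partners), the 0.7 score cutoff, the Jaccard index and the toy network are all assumptions, and scores are assumed to be on a 0 to 1 scale.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str, float]  # (node_a, node_b, confidence score in [0, 1])

def neighborhood(drug: str, pci: Iterable[Link], ppi: Iterable[Link],
                 min_score: float = 0.7) -> Set[str]:
    """Direct protein targets of `drug` plus their first-degree protein partners (hypothetical definition)."""
    targets = {p for c, p, s in pci if c == drug and s >= min_score}
    hood = set(targets)
    for a, b, s in ppi:
        if s < min_score:
            continue  # ignore low-confidence protein-protein links
        if a in targets:
            hood.add(b)
        if b in targets:
            hood.add(a)
    return hood

def overlap(h1: Set[str], h2: Set[str]) -> float:
    """Jaccard index of two neighborhoods; 0.0 if both are empty."""
    union = h1 | h2
    return len(h1 & h2) / len(union) if union else 0.0

# Toy network (illustrative only).
pci = [("drugA", "P1", 0.9), ("drugB", "P2", 0.9)]
ppi = [("P1", "P3", 0.8), ("P2", "P3", 0.8), ("P2", "P4", 0.5)]
hood_a = neighborhood("drugA", pci, ppi)  # contains P1 and P3
hood_b = neighborhood("drugB", pci, ppi)  # contains P2 and P3 (P4 is below the cutoff)
print(round(overlap(hood_a, hood_b), 3))  # 0.333
```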
