Lihua Liu1, David R Damerell2, Leonidas Koukouflis2, Yufeng Tong1,3,4, Brian D Marsden2,5, Matthieu Schapira1,3. 1. Structural Genomics Consortium, University of Toronto, Toronto, ON, Canada. 2. Structural Genomics Consortium, University of Oxford, Headington Oxford, Oxfordshire, UK. 3. Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada. 4. Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON, Canada. 5. Kennedy Institute of Rheumatology, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Roosevelt Drive, Headington, Oxford, Oxfordshire, UK.
Abstract
MOTIVATION: Protein ubiquitination plays a central role in important cellular machineries such as protein degradation or chromatin-mediated signaling. With the recent discovery of the first potent ubiquitin-specific protease inhibitors, and the maturation of proteolysis targeting chimeras as promising chemical tools to exploit the ubiquitin-proteasome system, protein target classes associated with ubiquitination pathways are becoming the focus of intense drug-discovery efforts. RESULTS: We have developed UbiHub, an online resource that can be used to visualize a diverse array of biological, structural and chemical data on phylogenetic trees of human protein families involved in ubiquitination signaling, including E3 ligases and deubiquitinases. This interface can inform target prioritization and drug design, and serves as a navigation tool for medicinal chemists, structural and cell biologists exploring ubiquitination pathways. AVAILABILITY AND IMPLEMENTATION: https://ubihub.thesgc.org.
Over the past 2 years, important progress has been made in drug discovery targeting ubiquitination pathways. First, potent, selective and reversible chemical inhibitors of ubiquitin-specific proteases (USPs), a protein family that had resisted intense medicinal chemistry efforts for over a decade, were discovered (Gavory et al., 2017; Kategaya; Liang; Turnbull). Second, proteolysis targeting chimeras (PROTACs), heterobifunctional molecules that recruit E3 ligases to protein targets for ubiquitination and subsequent proteasomal degradation, matured from a novel chemical biology concept into a promising paradigm for drug discovery (Churcher, 2018). To date, selective inhibitors have been disclosed for 2 out of 57 USPs in the human genome (USP1 and USP7) (Gavory et al., 2017; Kategaya; Liang; Turnbull), and PROTACs currently exploit 5 out of >600 human E3 ligases (VHL, CRBN, MDM2, IAPs and DCAF15) (Chan; Demizu; Fischer; Han; Ohoka; Schneekloth; Testa; Uehara). The rapidly growing body of data on the biology, structure and chemistry of these emerging and important target classes will guide target selection and drug design.
Here, we present UbiHub, an online data hub where drug-discovery scientists focused on ubiquitination pathways can easily navigate data relevant to their work. The UbiHub graphical user interface represents protein families as phylogenetic trees, onto which heterogeneous data collected from diverse repositories and the literature can be projected and scrutinized.
2 Materials and methods
2.1 Assembling protein families
Four protein families are included in UbiHub: 8 E1 ubiquitin activating enzymes, 41 E2 ubiquitin conjugating enzymes, 634 E3 ubiquitin ligases and 113 deubiquitinases (DUBs). The composition of each family was derived from searches for their respective signature domains in the PFAM (Finn) and SMART (Schultz) databases. Previously reported atypical enzymes were added to the E1 list (Schulman and Harper, 2009). The E3 ligase list was complemented with a previously reported genome-wide functional annotation of human E3s and a systematic inventory of DCAFs (Lee and Zhou, 2007; Li). To improve visibility of the very large E3 family, it was divided into 297 proteins that rely on multi-subunit complexes (mostly E3s interacting with Cullins, adaptor proteins and E2-recruiting subunits) and 337 standalone E3 ligases. DUBs were divided into 57 USPs and 56 functionally related but biochemically distinct non-USP proteins. The composition and subfamily classification of DUBs were based on a previously reported inventory of deubiquitinating enzymes (Nijman) and on the latest developments in the field (Kwasna; Maurer and Wertz, 2016).
2.2 Ubiquitin-proteasome system association
Ubiquitination can serve as a signal for ubiquitin-proteasome system (UPS)-mediated degradation or for other, non-degradative signaling pathways. The association of each E3 ligase with the UPS was estimated automatically and assigned a confidence score from 0 (no indication of UPS association) to 3 (reliable UPS association) based on three criteria. First, we checked whether the string ‘degrad’ was found in the Function section of the UniProt entry of the protein (UniProt Consortium, 2018). Second, we searched for the string ‘degrad’ among the Reactome pathways (Fabregat) linked to the protein. Third, we compiled for each E3 ligase the list of Reactome pathways assigned to all of its protein interactors in the BioGrid database (Stark, 2006), and searched for the string ‘degrad’ in the pathways that were enriched among these interactors (pathways at least three times more frequent than their prevalence in the human proteome, and found in at least three interactors). The UPS association score was set to 0, 1, 2 or 3 when none, one, two or all three of these conditions were met, respectively. Upon literature review of over 30 randomly selected E3s, we found the score to be reasonable in over 90% of cases, and adjusted it manually where it was found to be inaccurate.
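The three-criterion score described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the UbiHub implementation: the function names, the input data structures and the toy data are hypothetical, and the enrichment test assumes that "enriched at least three times" means a fold change of at least 3 between the fraction of interactors annotated with a pathway and the fraction of the human proteome annotated with it.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Set

KEYWORD = "degrad"  # string searched for in annotations and pathway names


def mentions_degradation(texts: Iterable[str]) -> bool:
    """Return True if any text contains 'degrad' (case-insensitive)."""
    return any(KEYWORD in text.lower() for text in texts)


def enriched_pathways(
    interactor_pathways: Mapping[str, Set[str]],
    background_freq: Mapping[str, float],
    min_fold: float = 3.0,
    min_interactors: int = 3,
) -> List[str]:
    """Pathways over-represented among the interactors of one E3 ligase.

    interactor_pathways: interactor name -> set of pathway names (hypothetical layout).
    background_freq: pathway name -> fraction of the proteome annotated with it.
    Fold-change definition is an assumption, see the lead-in.
    """
    n_interactors = len(interactor_pathways)
    if n_interactors == 0:
        return []
    counts = Counter(p for pathways in interactor_pathways.values() for p in set(pathways))
    enriched = []
    for pathway, count in counts.items():
        background = background_freq.get(pathway, 0.0)
        if background <= 0.0:
            continue  # no background frequency available: cannot compute enrichment
        fold = (count / n_interactors) / background
        if count >= min_interactors and fold >= min_fold:
            enriched.append(pathway)
    return enriched


def ups_score(
    uniprot_function: str,
    reactome_pathways: Iterable[str],
    interactor_pathways: Mapping[str, Set[str]],
    background_freq: Mapping[str, float],
) -> int:
    """Count how many of the three criteria are met (0 to 3)."""
    criteria = [
        mentions_degradation([uniprot_function]),
        mentions_degradation(reactome_pathways),
        mentions_degradation(enriched_pathways(interactor_pathways, background_freq)),
    ]
    return sum(criteria)


# Toy, made-up inputs for a single hypothetical E3 ligase.
toy_interactors = {
    "P1": {"Hypothetical protein degradation pathway", "Cell cycle"},
    "P2": {"Hypothetical protein degradation pathway"},
    "P3": {"Hypothetical protein degradation pathway"},
    "P4": {"Cell cycle"},
}
toy_background = {"Hypothetical protein degradation pathway": 0.05, "Cell cycle": 0.10}

toy_score = ups_score(
    uniprot_function="Targets substrates for proteasomal degradation.",
    reactome_pathways=["Hypothetical ubiquitin-dependent degradation"],
    interactor_pathways=toy_interactors,
    background_freq=toy_background,
)
print(toy_score)
```

Matching on the stem ‘degrad’ rather than a full word catches "degradation", "degraded" and "degrades" alike. The manual adjustment after literature review described above is not modeled in this sketch.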
2.3 Phylogenetic trees and data collection
Phylogenetic trees are generated, and biological, structural and chemical data are collected, as previously described for ChromoHub (Liu; Shah); the data are stored in a MySQL database. Additionally, gene essentiality in cancer is extracted from the Broad Institute’s cancer dependency map: we use essentiality scores from CRISPR-knockout studies corrected for copy-number effect, and data from RNAi knock-down studies normalized with DEMETER2 (McFarland; Meyers).
3 Results
The graphical user interface is based on zoomable phylogenetic trees that represent any pre-selected protein family. In the case of E3 ligases, users can choose to display only proteins that are associated with the UPS at a pre-defined confidence level. A checkbox menu allows users to simultaneously tag proteins on a tree with diverse icons related to biological, structural or chemical data. Clicking on any of these icons opens pop-up windows with figures providing further details and HTML links to the source of information (a PubMed record or a public repository entry, such as a PDB entry). The checkbox menu includes clickable ‘?’ symbols next to each menu item that display information on the data source and the way the data were processed. Through this graphical interface, users can get a bird’s-eye view of the disease association landscape of an entire protein family, medicinal chemists can rapidly retrieve compounds co-crystallized with their protein target, structural biologists can inspect the structural coverage of a protein or its phylogenetic neighbors, and cell biologists can find the K or IC50 and selectivity profile of chemical inhibitors, display the chemical coverage of E3 ligases involved in the UPS, or quickly visualize the cancer dependency map of USPs.
Authors: Sebastian M B Nijman; Mark P A Luna-Vargas; Arno Velds; Thijn R Brummelkamp; Annette M G Dirac; Titia K Sixma; René Bernards Journal: Cell Date: 2005-12-02 Impact factor: 41.582
Authors: Wei Li; Mario H Bengtson; Axel Ulbrich; Akio Matsuda; Venkateshwar A Reddy; Anthony Orth; Sumit K Chanda; Serge Batalov; Claudio A P Joazeiro Journal: PLoS One Date: 2008-01-23 Impact factor: 3.240
Authors: Robert D Finn; Alex Bateman; Jody Clements; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Andreas Heger; Kirstie Hetherington; Liisa Holm; Jaina Mistry; Erik L L Sonnhammer; John Tate; Marco Punta Journal: Nucleic Acids Res Date: 2013-11-27 Impact factor: 16.971
Authors: Odetta Antico; Alban Ordureau; Michael Stevens; Francois Singh; Raja S Nirujogi; Marek Gierlinski; Erica Barini; Mollie L Rickwood; Alan Prescott; Rachel Toth; Ian G Ganley; J Wade Harper; Miratul M K Muqit Journal: Sci Adv Date: 2021-11-12 Impact factor: 14.136