WDSPdb: a database for WD40-repeat proteins.

Yang Wang1, Xue-Jia Hu1, Xu-Dong Zou1, Xian-Hui Wu1, Zhi-Qiang Ye2, Yun-Dong Wu3.   

Abstract

WD40-repeat proteins, one of the largest protein families, often serve as platforms for assembling functional complexes through hotspot residues on their domain surfaces, and thus play vital roles in many biological processes. Researchers who study WD40 proteins and protein-protein interactions therefore need structural information on WD40 domains. Systematic identification of WD40-repeat proteins, including prediction of their secondary structures, tertiary structures and potential hotspot residues responsible for protein-protein interactions, would constitute a valuable resource for this purpose. To this end, we developed a specialized database, WDSPdb (http://wu.scbb.pkusz.edu.cn/wdsp/), which provides these details for WD40-repeat proteins based on our recently published method WDSP. WDSPdb contains 63 211 WD40-repeat proteins identified from 3383 species, including most well-known model organisms. To better serve the community, we implemented a user-friendly interactive web interface for browsing, searching and downloading the secondary structures, 3D structure models and potential hotspot residues provided by WDSPdb.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2014        PMID: 25348404      PMCID: PMC4383882          DOI: 10.1093/nar/gku1023

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

WD40-repeat proteins, named for containing one or more WD40 domains, form one of the largest protein families. About 1% of the genes in the human genome encode proteins that belong to this family (1). A typical WD40 domain consists of 6–8 structurally conserved WD40 repeats, each containing four anti-parallel β-strands, and folds into a β-propeller conformation that exposes three types of surface: top, bottom and side. Through certain residues on these surfaces (hotspot residues), WD40 proteins take part extensively in protein–protein interactions (PPIs) (2–4) and, as a result, often serve as hubs in cellular networks. They mainly provide platforms for assembling proteins or nucleic acids into functional complexes, which play vital roles in many biological processes, such as DNA replication, transcription, RNA processing, histone modification/recognition and protein degradation. For example, through PPIs provided by WD40 platforms, protein complexes such as E3 ubiquitin ligase (5,6), G protein (7) and MLL1 (8) are formed and can then carry out their biological activities. Consequently, predicting or annotating the structural information and potential PPI hotspot residues of WD40 domains is of great importance for understanding their function. To date, protein family annotations (including those for the WD40-repeat family) are available in several databases, such as UniProt (9), SMART (10), Pfam (11), Prosite (12) and Superfamily (13). Although these databases provide valuable information about thousands of protein families or subfamilies, sensitively identifying WD40-repeat proteins and deriving their structural information remains a challenge. As a result, the number of WD40 proteins in proteomes is still considerably underestimated (1). The main difficulty in family annotation of WD40 proteins is that the average pairwise sequence identity of WD40 domains is too low for most regular HMM or sequence-pattern recognition models (14).
Moreover, for most WD40 proteins, detailed structural information and potential PPI residues are still lacking in these general-purpose databases. A comprehensive knowledgebase specific to the WD40-repeat protein family that provides such information is therefore highly desirable. Recently, we reported a method, WDSP (WD40 Structure Predictor), that identifies WD40 repeats and predicts the secondary structures of WD40 domains from their primary sequences (14). In practice, it determines repeat and β-strand boundaries more accurately because it relies on better-predicted secondary structures (14). We have further extended this tool to build 3D structure models of WD40 domains and to predict the potential surface hotspot residues responsible for PPIs. WDSP incorporates both local residue information (amino acid occurrence preference and loop length propensity) and non-local, WD40 family-specific structural features (the conserved hydrogen-bond network (15,16) and the β-bulge (17)) into its scoring function, and uses a genetic algorithm to combine the identified WD40 repeats into domains. As a result, it achieves higher sensitivity than general-purpose tools such as Pfam, Prosite and SMART, while keeping a very low false positive rate (14). Using WDSP, we scanned all protein sequences in the UniProt database and annotated 63 211 WD40 proteins, each of which comprises at least six WD40 repeats. In addition to the predicted secondary structures and potential hotspot residues, tertiary structure models of these WD40 domains were built and stored in the database, WDSPdb. WDSPdb uses a user-friendly 3D structure visualization interface and color-highlighted text to display this information. We believe WDSPdb will benefit the WD40 and PPI research community, especially experimentalists who are not familiar with protein structure modeling tools.

MATERIALS AND METHODS

Data summary

The protein sequences in WDSPdb come from the UniProt knowledgebase (release version 201310). In total, the WDSP program identified 63 211 WD40-repeat proteins with 71 480 WD40 domains and 489 411 WD40 repeats from 3383 species (Table 1). Among these proteins, 726 685 potential hotspot residues responsible for PPIs on the top surface were predicted. WD40 proteins are known to be abundant in eukaryotes and have been considered rare in prokaryotes (1). Interestingly, a large number of bacterial WD40 proteins were also identified by the WDSP program and are stored in our database. The WDSP program also identified some WD40 proteins in archaea and viruses.
Table 1.

Statistics of WDSPdb. The numbers of identified WD40 proteins (with ≥6 repeats), WD40 domains, WD40 repeats, and potential hotspots in total, different taxa and several model organisms

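| Category | WD40 proteins | WD40 domains | WD40 repeats | Potential hotspots | Species |
|---|---|---|---|---|---|
| Total | 63 211 | 71 480 | 489 411 | 726 685 | 3383 |
| Eukaryota | 58 284 | 65 311 | 447 323 | 662 833 | 860 |
| Bacteria | 4832 | 6065 | 41 358 | 62 704 | 2476 |
| Archaea | 50 | 59 | 419 | 637 | 34 |
| Virus | 45 | 45 | 311 | 511 | 13 |
| Homo sapiens | 610 | 708 | 4837 | 7084 | - |
| Mus musculus | 562 | 659 | 4508 | 6530 | - |
| Danio rerio | 407 | 467 | 3242 | 4788 | - |
| Drosophila melanogaster | 299 | 319 | 2193 | 3159 | - |
| Caenorhabditis elegans | 142 | 157 | 1076 | 1562 | - |
| Arabidopsis thaliana | 358 | 384 | 2635 | 3866 | - |
| Oryza sativa | 16 | 18 | 123 | 178 | - |
| Saccharomyces cerevisiae | 83 | 92 | 635 | 969 | - |
| Schizosaccharomyces pombe | 104 | 115 | 787 | 1147 | - |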
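
As a quick consistency check on Table 1, the four per-taxon rows can be summed and compared with the Total row. The sketch below simply hard-codes the figures from the table; the variable and function names are illustrative and are not part of WDSPdb.

```python
# Counts transcribed from Table 1, in the order:
# (WD40 proteins, WD40 domains, WD40 repeats, potential hotspots, species).
TAXA = {
    "Eukaryota": (58284, 65311, 447323, 662833, 860),
    "Bacteria": (4832, 6065, 41358, 62704, 2476),
    "Archaea": (50, 59, 419, 637, 34),
    "Virus": (45, 45, 311, 511, 13),
}
TOTAL = (63211, 71480, 489411, 726685, 3383)


def column_sums(rows):
    """Sum each column across the per-taxon rows."""
    return tuple(sum(col) for col in zip(*rows.values()))


def hotspots_per_repeat(row):
    """Average number of predicted top-surface hotspots per WD40 repeat."""
    _, _, repeats, hotspots, _ = row
    return hotspots / repeats


if __name__ == "__main__":
    print("taxa sum to total:", column_sums(TAXA) == TOTAL)
    print("hotspots per repeat:", round(hotspots_per_repeat(TOTAL), 2))
```

The per-repeat average stays below the maximum of three candidate positions per repeat, which is what one would expect if only some of the candidate positions carry a binding-type residue.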

The framework of WDSPdb

Figure 1 shows how WD40-repeat proteins are identified and classified in WDSPdb. First, the WDSP program was used to identify WD40 repeats in each protein sequence in the UniProt database. If at least six WD40 repeats were identified, the protein was classified as a WD40 protein that contains one WD40 domain, or multiple domains if more than eight repeats were identified. Second, for each WD40 domain, the secondary structure and the potential hotspot residues for PPIs were predicted and displayed in a table on the result page (Figure 2A). Third, the predicted 3D structure models were presented in the interactive JSmol applet (http://jsmol.sourceforge.net/). Finally, general annotations extracted from the UniProt database were also shown on the result page.
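
The repeat-count thresholds in the first step can be written down as a small decision function. This is a minimal sketch of the thresholds stated in the text only; the function name and the returned labels are hypothetical, and the actual WDSP program additionally uses a scoring function and a genetic algorithm to group repeats into domains, which is not reproduced here.

```python
MIN_REPEATS_FOR_WD40_PROTEIN = 6   # "at least six WD40 repeats" (from the text)
MAX_REPEATS_SINGLE_DOMAIN = 8      # "more than eight repeats" implies multiple domains


def classify_by_repeat_count(n_repeats: int) -> str:
    """Classify a protein from the number of WD40 repeats found in it.

    The labels are illustrative; they are not WDSPdb field values.
    """
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats < MIN_REPEATS_FOR_WD40_PROTEIN:
        return "not stored"
    if n_repeats > MAX_REPEATS_SINGLE_DOMAIN:
        return "WD40 protein, multiple domains"
    return "WD40 protein, single domain"


if __name__ == "__main__":
    for n in (3, 6, 7, 8, 14):
        print(n, "->", classify_by_repeat_count(n))
```
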
Figure 1.

The framework of WDSPdb.

Figure 2.

(A) The secondary structure table produced by WDSP. Each row represents a WD40 repeat sequence. Secondary structure markers are colored in the table heading. Residues shown in blue in each repeat form the family-conserved DHSW tetrad hydrogen bond networks that stabilize the structure. Residues shown in red are hotspot residues predicted to be responsible for PPIs. (B) The structure of a DHSW tetrad hydrogen bond network. (C) The interactive interface, implemented with the JSmol applet, for viewing and manipulating the 3D structure. Clicking on a potential hotspot residue listed in the table displays it as sticks with a red label.


DHSW tetrad hydrogen bond networks and 3D structure models generation

In most WD40 proteins, one or more DHSW tetrad hydrogen bond networks (a tetrad comprises four residues: Asp, His, Ser and Trp; blue residues in Figure 2A and B) can be identified. These tetrads are specifically conserved in the WD40 protein family and have been shown to contribute substantially to structural stabilization (15,16). Besides tetrads, pentad and triad hydrogen bond networks are also widespread in WD40 proteins. All of these are important WD40 structural features, and highlighting them on the result page helps researchers assess structural stability intuitively. Moreover, identifying these structural features substantially benefits the subsequent 3D structure prediction. The 3D structure models of WD40 domains were generated by an in-house program, WDSP3D, which combines the Modeller v9.12 homology modeling package (18) with secondary structure-based sequence alignment to obtain more accurate 3D structure predictions. In the Modeller input file, the DHSW tetrads identified by WDSP were imposed as distance restraints. For each annotated WD40 domain, the PDB structure with the highest sequence identity was used as the template, and each model was refined by simulated annealing molecular dynamics (MD). In our tests, the backbones of models predicted from different PDB templates were quite consistent with the original PDB structure, whereas long loops were less well defined. Loop and side-chain structures can be refined further with longer MD simulations, and the refined structures will be added to the database in the future.
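
The template-selection step can be illustrated with a small sketch. The text does not say how sequence identity is computed in WDSP3D, so the version below assumes that each query/template pair has already been aligned to equal length with '-' as the gap character, and that identity is the fraction of identical residues over ungapped columns. The function names and the template labels are hypothetical; no Modeller call is shown.

```python
from typing import Dict, Tuple


def sequence_identity(aligned_query: str, aligned_template: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    return sum(q == t for q, t in columns) / len(columns)


def pick_template(alignments: Dict[str, Tuple[str, str]]) -> str:
    """Return the template name whose alignment to the query has the highest identity.

    `alignments` maps a template name to (aligned_query, aligned_template).
    """
    if not alignments:
        raise ValueError("no candidate templates given")
    return max(alignments, key=lambda name: sequence_identity(*alignments[name]))


if __name__ == "__main__":
    # Toy alignments with made-up template labels.
    candidates = {
        "template_A": ("GHSDWV", "GHSEWI"),
        "template_B": ("GHSDWV", "GH-DWV"),
    }
    print(pick_template(candidates))
```

Ignoring gapped columns is one of several common identity conventions; normalizing by the full alignment length or by the shorter sequence would rank templates differently in some cases.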

Potential hotspot residues responsible for PPIs

Hotspot residues are the major contributors to a given PPI. In this initial version of the database, we provide predictions of potential hotspot residues on the top surface. Gaudet et al. and Wu et al. first reported that WD40 proteins bind other proteins on the top surface through the 16th, 18th and 34th residues of each WD40 blade (19,20). We found that this is in fact a common phenomenon in the WD40 protein family (17). These three residues are at the R1, R1–2 and D-1 positions, where R1 is one of the three residues (R1, R2, X) in a β-bulge conserved in the WD40 protein family, R1–2 is the position two residues ahead of R1, and D-1 is the position just ahead of the Asp residue that forms the DHSW tetrad. If a binding-type residue (Arg, His, Lys, Asp, Glu, Trp, Tyr, Phe, Leu, Ile, Met, Asn or Gln) occurs at the R1, R1–2 or D-1 position, we assign it as a potential hotspot residue. Because each WD40 domain has 6–8 WD40 repeats with three candidate positions each, up to 18–24 residues can be assigned as potential hotspot residues (red residues in Figure 2A and C) in a WD40 domain. We believe this information can help experimentalists select residues for mutagenesis and help computational biologists build models of protein complexes. We also provide a convenient way to highlight these residues on the 3D structure models: clicking on a potential hotspot residue listed in the secondary structure table displays that residue as sticks in the 3D model panel (Figure 2C).
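
The assignment rule above can be sketched in a few lines. The sketch assumes that the positions of R1 and of the tetrad Asp within a repeat are already known (in WDSP they come from the predicted secondary structure, which is not reproduced here), uses 0-based indices, and reads "ahead of" as "preceding in sequence". The function name, argument names and the toy sequence are hypothetical.

```python
# The 13 binding-type residues listed in the text, as one-letter codes:
# Arg, His, Lys, Asp, Glu, Trp, Tyr, Phe, Leu, Ile, Met, Asn, Gln.
BINDING_TYPE = frozenset("RHKDEWYFLIMNQ")


def potential_hotspots(repeat_seq: str, r1_index: int, asp_index: int):
    """Return (label, index, residue) for each candidate position holding a binding-type residue.

    r1_index  : 0-based index of the R1 residue of the beta-bulge (assumed given).
    asp_index : 0-based index of the Asp of the DHSW tetrad (assumed given).
    """
    candidates = {
        "R1-2": r1_index - 2,   # two residues before R1
        "R1": r1_index,
        "D-1": asp_index - 1,   # residue just before the tetrad Asp
    }
    hits = []
    for label, idx in candidates.items():
        if 0 <= idx < len(repeat_seq) and repeat_seq[idx] in BINDING_TYPE:
            hits.append((label, idx, repeat_seq[idx]))
    return hits


if __name__ == "__main__":
    # Toy repeat fragment, not a real WD40 sequence.
    print(potential_hotspots("GSAKTWDL", r1_index=3, asp_index=6))
```

Each repeat contributes at most three hits, so a domain of 6–8 repeats yields at most 18–24 potential hotspot residues, in line with the upper bound given in the text.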

WEB INTERFACE

Database organization

MySQL was used as the database management system. Two tables were created to store the data: one stores the general information on proteins, and the other stores the detailed structural information on WD40 domains. The UniProt ID of each protein serves as the key that organizes and links the two tables. We adopted Tomcat as the web server, and JSP technology was used to display the results of browsing and searching.
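
A two-table layout linked by UniProt ID might look like the sketch below. The actual WDSPdb schema is not given in the text, so every table and column name here is an assumption, and Python's built-in sqlite3 module stands in for MySQL purely to keep the example self-contained. The accession "X00000" is a placeholder, not a real entry.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (              -- general information, one row per protein
            uniprot_id  TEXT PRIMARY KEY,
            gene_name   TEXT,
            organism    TEXT,
            description TEXT
        );
        CREATE TABLE wd40_domain (          -- structural details, one row per WD40 domain
            domain_id           INTEGER PRIMARY KEY,
            uniprot_id          TEXT NOT NULL REFERENCES protein(uniprot_id),
            n_repeats           INTEGER NOT NULL,
            secondary_structure TEXT,
            hotspot_residues    TEXT
        );
        """
    )
    return conn


def domains_for(conn: sqlite3.Connection, uniprot_id: str):
    """Join the two tables on UniProt ID and list the domains of one protein."""
    query = """
        SELECT p.uniprot_id, p.organism, d.n_repeats
        FROM protein AS p
        JOIN wd40_domain AS d ON d.uniprot_id = p.uniprot_id
        WHERE p.uniprot_id = ?
        ORDER BY d.domain_id
    """
    return conn.execute(query, (uniprot_id,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    conn.execute(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        ("X00000", "demo", "Homo sapiens", "placeholder entry"),
    )
    conn.execute(
        "INSERT INTO wd40_domain (uniprot_id, n_repeats) VALUES (?, ?)",
        ("X00000", 7),
    )
    print(domains_for(conn, "X00000"))
```

Keeping domain-level data in its own table lets a protein with several WD40 domains have several rows without repeating its general annotation.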

Data browse and search

To present the data clearly, WDSPdb provides two ways to view it: (i) Users can browse by species. In the ‘DataBase’ drop-down menu, several well-known species can be selected directly to display all WD40 proteins identified in that organism. (ii) Users can search by UniProt ID, gene name, GenBank ID or description. A drop-down menu also lets users restrict the search to a specified taxon or organism.

The result page

The result page (Figure 3) reached by browsing or searching the database is composed of three parts: the general annotations extracted from the UniProt database, the interactive JSmol applet that presents the 3D structure model, and a table that shows the detailed secondary structure, the specific hydrogen bond networks for structure stabilization and the hotspot residues for PPIs. Within the structure panel of the JSmol applet, users can rotate, zoom and move the structure to get a better viewing angle, and can carry out more sophisticated operations with the embedded JSmol console. All of these data, including the 3D structures, can be downloaded for further investigation.
Figure 3.

The result page for each identified WD40-repeat protein, which comprises general annotation from UniProt database, JSmol applet presenting the predicted 3D structure and the detailed secondary structure table.


The WDSP predictor page

We retained the WDSP program page so that users can submit their own protein sequences and check whether a query sequence contains WD40 domains. If a WD40 domain is found, the table of secondary structure and hotspot residues is shown on the result page.

DISCUSSION

Comparison with other databases

We compared the WD40 proteins in our database with those in other widely used protein family databases: UniProt, SMART, Pfam and Prosite (Table 2). With WDSP, we identified 99 262 proteins with at least one WD40 repeat and 63 211 proteins with at least six WD40 repeats. Compared with the other four databases, WDSPdb contains many more predicted WD40 proteins, especially those with at least six WD40 repeats. Such repeats can potentially form one or more complete WD40 domains. Take the human protein WDR46, a well-known WD40 protein, as an example: only WDSPdb identified seven WD40 repeats, from which a regular WD40 domain can be inferred, whereas none of the other databases annotated the repeats completely. Moreover, WDSPdb holds more multiple-WD40-domain proteins (those with more than eight WD40 repeats) than the other databases. In total, WDSPdb stores 7444 proteins with multiple WD40 domains, and we found that such proteins are more likely to appear in bacteria than in eukaryota.
Table 2.

Comparison of WD40 proteins among WDSPdb, SMART, Pfam, Prosite, UniProt databases and the union set of SMART+Pfam+Prosite+UniProt database

| Database | Protein (repeat≥1) | Protein (repeat≥6)¹ | Multi-WD40-domain proteins² |
|---|---|---|---|
| WDSPdb | 99 262 | 63 211 | 7444 |
| SMART | 83 877 | 39 378 | 6511 |
| Pfam | 73 298 | 15 018 | 2256 |
| Prosite | 68 376 | 9750 | 1749 |
| UniProt | 3196 | 2033 | 198 |
| SMART+Pfam+Prosite+UniProt | 84 912 | 39 883 | 6610 |

¹ Only proteins with at least six WD40 repeats are stored in WDSPdb, since a WD40 domain requires at least six WD40 repeats to form a complete structure.

² Proteins with more than eight WD40 repeats.

Conclusion and future perspectives

WDSPdb is a specialized database of WD40-repeat protein structures and potential PPI hotspot residues. It contains the most comprehensive list of WD40 proteins while keeping a low false positive rate. WDSPdb will be a powerful tool for scientists who study WD40 proteins or WD40-interacting proteins. From a structural point of view, visualizing potential hotspot residues and variants in the 3D structures is very helpful for understanding why some variants are disease-causing and others are not. Indeed, results from WDSPdb have been applied successfully to interpret several recently discovered disease-causing mutations (21). We believe that WDSPdb will be used in a broader range of settings for understanding the structural basis of many biological processes. We will continue to add comprehensive annotations and links from the literature and other databases to WDSPdb. We will also improve the underlying WDSP algorithm to obtain more accurate results for WDSPdb. It is worth pointing out that the top surface of the WD40 domain is the most active surface for PPIs, but the side and bottom surfaces can also participate in binding. We will gradually add potential PPI or protein–DNA interaction hotspot residues on these surfaces to our database. Moreover, our recently developed RSFF1 force field (22) will be used to refine our 3D structure models, especially those with long loops.

AVAILABILITY

WDSPdb is freely available at http://wu.scbb.pkusz.edu.cn/wdsp/.

REFERENCES (22 in total)

Review 1.  WD40 proteins propel cellular networks.

Authors:  Christian U Stirnimann; Evangelia Petsalaki; Robert B Russell; Christoph W Müller
Journal:  Trends Biochem Sci       Date:  2010-05-05       Impact factor: 13.807

Review 2.  Diversity of WD-repeat proteins.

Authors:  Temple F Smith
Journal:  Subcell Biochem       Date:  2008

3.  Is Asp-His-Ser/Thr-Trp tetrad hydrogen-bond network important to WD40-repeat proteins: a statistical and theoretical study.

Authors:  Xian-Hui Wu; Hui Zhang; Yun-Dong Wu
Journal:  Proteins       Date:  2010-04

4.  Histone H3 recognition and presentation by the WDR5 module of the MLL1 complex.

Authors:  Alexander J Ruthenburg; Wooikoon Wang; Daina M Graybosch; Haitao Li; C David Allis; Dinshaw J Patel; Gregory L Verdine
Journal:  Nat Struct Mol Biol       Date:  2006-07-09       Impact factor: 15.369

Review 5.  The WD repeat: a common architecture for diverse functions.

Authors:  T F Smith; C Gaitatzes; K Saxena; E J Neer
Journal:  Trends Biochem Sci       Date:  1999-05       Impact factor: 13.807

6.  Folding a WD repeat propeller. Role of highly conserved aspartic acid residues in the G protein beta subunit and Sec13.

Authors:  I Garcia-Higuera; C Gaitatzes; T F Smith; E J Neer
Journal:  J Biol Chem       Date:  1998-04-10       Impact factor: 5.157

Review 7.  CRL4s: the CUL4-RING E3 ubiquitin ligases.

Authors:  Sarah Jackson; Yue Xiong
Journal:  Trends Biochem Sci       Date:  2009-10-07       Impact factor: 13.807

8.  Structure of a beta-TrCP1-Skp1-beta-catenin complex: destruction motif binding and lysine specificity of the SCF(beta-TrCP1) ubiquitin ligase.

Authors:  Geng Wu; Guozhou Xu; Brenda A Schulman; Philip D Jeffrey; J Wade Harper; Nikola P Pavletich
Journal:  Mol Cell       Date:  2003-06       Impact factor: 17.970

9.  Mutation of POC1B in a severe syndromic retinal ciliopathy.

Authors:  Bodo B Beck; Jennifer B Phillips; Malte P Bartram; Jeremy Wegner; Michaela Thoenes; Andrea Pannes; Josephina Sampson; Raoul Heller; Heike Göbel; Friederike Koerber; Antje Neugebauer; Andrea Hedergott; Gudrun Nürnberg; Peter Nürnberg; Holger Thiele; Janine Altmüller; Mohammad R Toliat; Simon Staubach; Kym M Boycott; Enza Maria Valente; Andreas R Janecke; Tobias Eisenberger; Carsten Bergmann; Lars Tebbe; Yang Wang; Yundong Wu; Andrew M Fry; Monte Westerfield; Uwe Wolfrum; Hanno J Bolz
Journal:  Hum Mutat       Date:  2014-08-11       Impact factor: 4.878

10.  SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny.

Authors:  Derek Wilson; Ralph Pethica; Yiduo Zhou; Charles Talbot; Christine Vogel; Martin Madera; Cyrus Chothia; Julian Gough
Journal:  Nucleic Acids Res       Date:  2008-11-26       Impact factor: 16.971
