
WDSPdb: an updated resource for WD40 proteins.

Jing Ma1, Ke An1, Jing-Bo Zhou1, Nuo-Si Wu2, Yang Wang3,4, Zhi-Qiang Ye1, Yun-Dong Wu1,5.   

Abstract

SUMMARY: WD40-repeat proteins are a large family of scaffold molecules that assemble complexes in various cellular processes. Obtaining their structures is key to understanding the details of their interactions. We present WDSPdb 2.0, a significantly updated resource that provides accurately predicted secondary and tertiary structures and annotations of featured sites. Built with an optimized pipeline, WDSPdb 2.0 contains about 600 thousand entries, a 10-fold increase, and integrates more than 37 000 variants from ClinVar, Cosmic, 1000 Genomes, ExAC, IntOGen, cBioPortal and IntAct. In addition, the web site has been substantially improved for visualization, exploration and data download.
AVAILABILITY AND IMPLEMENTATION: http://www.wdspdb.com/wdsp/ or http://wu.scbb.pkusz.edu.cn/wdsp/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31161214      PMCID: PMC6853709          DOI: 10.1093/bioinformatics/btz460

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

WD40-repeat proteins are a subfamily of β-propellers whose sequence and structure relationships and disease associations have been widely studied (Kopec and Lupas, 2013; Paoli, 2001; Pons ; Song ). As some of the most prevalent interactors in protein–protein interaction (PPI) networks, they act as scaffolds that assemble various molecular machineries and play versatile roles in fundamental biological processes, including signal transduction, ubiquitination and cell cycle control (Stirnimann ; Xu and Min, 2011). Structural information is key to revealing the details of their interactions, and thus to understanding their biological functions and gaining insight into the underlying pathogenic mechanisms, but experimental structures are scarce relative to the abundance of WD40 proteins in eukaryotic proteomes. WDSPdb (Wang ) is a database that provides accurate structure predictions and featured-site annotations specifically for WD40 domains, based on the WDSP tool (Wang ). WD40 domains, as a type of β-propeller, are composed of several repeated β-sheet units arranged in a circular layout. WDSPdb reports the β-strand boundaries of each repeat unit, together with the sites of thermally stabilizing hydrogen bond networks and potential interaction hotspots. These data are missing from general-purpose domain databases but are indispensable for understanding the functional roles of WD40 proteins. Since its publication, WDSPdb 1.0 has been used frequently by the scientific community. However, its contents now lag far behind the rapid growth of protein sequences in public databases, and its data coverage is relatively small because of overly strict inclusion criteria. In this work, we optimized the overall curation pipeline and applied it to a more recent version of UniProtKB (The UniProt Consortium, 2017) to construct WDSPdb 2.0.

2 Materials and methods

We first updated the WDSP tool, which relies on a WD40-specific position weight matrix (PWM) and PSIPRED (Buchan ) as backends. We expanded the set of experimental structures used to generate the PWM from 33 to 65, and upgraded PSIPRED from V3 to V4. With these improvements, the updated WDSP outperforms both its previous version and other general-purpose domain annotations, as measured by the 5-fold cross-validation F1 score (Supplementary Material). We then optimized the overall annotation pipeline for more comprehensive inclusion of WD40 proteins and better annotation quality (Fig. 1A and Supplementary Material). The optimized pipeline is briefly described as follows: (i) retrieve the protein sequences of UniProtKB, including the Swiss-Prot and TrEMBL sections; (ii) use HMMSearch to screen all input sequences against 54 WD40-related profiles, and retain the sequences with an E-value no greater than 10 as WD40 candidates; (iii) apply the updated WDSP to predict the secondary structures and the featured sites; (iv) assign a confidence category (‘High’, ‘Middle’ or ‘Low’) to each WD40 candidate according to our customized rules; and (v) use MODELLER (Webb and Sali, 2016) to build 3D structures for single-domain WD40s, and integrate missense variants and their associated annotations into ‘High’-confidence WD40s from the Swiss-Prot section.
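Step (ii) can be illustrated with a minimal Python sketch. It assumes the screening results are available as an HMMER `--tblout` table, in which the first, third and fifth whitespace-separated columns hold the target name, the query (profile) name and the full-sequence E-value. The function names, the sample accessions and the sample values are hypothetical, and this is not the authors' implementation; it only shows one way to keep every sequence whose best hit against any profile has an E-value no greater than 10.

```python
from typing import Dict, Iterator, NamedTuple


class Hit(NamedTuple):
    target: str    # sequence identifier (column 1 of --tblout)
    profile: str   # profile HMM name (column 3 of --tblout)
    evalue: float  # full-sequence E-value (column 5 of --tblout)


def parse_tblout(text: str) -> Iterator[Hit]:
    """Yield one Hit per data line of an HMMER --tblout table."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        yield Hit(fields[0], fields[2], float(fields[4]))


def wd40_candidates(text: str, max_evalue: float = 10.0) -> Dict[str, float]:
    """Map each retained sequence to its best (lowest) E-value over all profiles."""
    best: Dict[str, float] = {}
    for hit in parse_tblout(text):
        if hit.evalue <= max_evalue and hit.evalue < best.get(hit.target, float("inf")):
            best[hit.target] = hit.evalue
    return best


# Made-up sample rows for illustration only (not real search results).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias  ...
PROT_A - WD40 PF00400.35 1.2e-30 105.3 0.1 3.4e-10 40.1 0.0 6.8 7 0 0 7 7 7 7 hypothetical protein A
PROT_A - ANAPC4_WD40 PF12894.10 3.0e-05 24.0 0.0 1.1e-03 19.0 0.0 2.1 2 0 0 2 2 2 1 hypothetical protein A
PROT_B - WD40 PF00400.35 8.5 6.1 0.0 9.9 5.9 0.0 1.2 1 0 0 1 1 0 0 hypothetical protein B
PROT_C - WD40 PF00400.35 25 4.0 0.0 30 3.8 0.0 1.0 1 0 0 1 1 0 0 hypothetical protein C
"""

if __name__ == "__main__":
    for target, evalue in wd40_candidates(SAMPLE).items():
        print(target, evalue)
```

A threshold as permissive as 10 favours sensitivity over specificity at this stage, which is consistent with the later steps assigning confidence categories to the retained candidates.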
Fig. 1.

(A) The pipeline of curating WDSPdb 2.0. (B) The distribution of WD40 proteins in different confidence categories, calculated separately for Swiss-Prot and TrEMBL source. The WD40 proteins in both WDSPdb 1.0 and 2.0 are indicated with slashes. ‘All’ equals the sum of ‘High’, ‘Middle’ and ‘Low’

WDSPdb 2.0 is based on a more recent version of UniProtKB (release July 5, 2017) and an optimized curation pipeline that allows the inclusion of more non-canonical WD40 proteins. As a result, its data coverage is about 10 times that of WDSPdb 1.0. In brief, it contains 594 319 WD40 proteins with 4 033 034 repeats from 4426 species. Among these proteins, 852 295 potential side-chain hydrogen bond networks and 4 963 216 PPI hotspots were predicted. In addition, we mapped 37 184 variants to 252 WD40 proteins from ClinVar (Landrum ), Cosmic (Forbes ), IntOGen (Gonzalez-Perez ), cBioPortal (Cerami ), IntAct (Kerrien ), 1000 Genomes (1000 Genomes Project Consortium, 2015) and ExAC (Exome Aggregation Consortium, 2016); these variants are pathogenic, cancer-related, cancer-driver, highly recurrent in cancer, PPI-influencing or neutral. WDSPdb 2.0 comprises almost all of the entries in WDSPdb 1.0; only a few entries are exclusive to WDSPdb 1.0, owing to entry merging, removal and renaming during UniProtKB updates (Fig. 1B and Supplementary Material). As expected, the intersection of WDSPdb 2.0 and 1.0 falls mainly in the ‘High’ confidence category, and most newly added entries are assigned to the other confidence categories, since the new pipeline adopts looser inclusion criteria. Many proteins that are widely regarded as WD40 proteins but were absent from WDSPdb 1.0, such as LRRK2, PALB2 and APAF1, are now included in WDSPdb 2.0. Taken together, WDSPdb 2.0 is much more comprehensive in both the number of records and the annotation information.

We re-implemented the web interface using Django to provide a cleaner and more organized browsing experience. It adopts a powerful table plug-in that enables customized data display and download in multiple formats, and the visualization tool has been replaced with NGL Viewer (Rose and Hildebrand, 2015) for faster loading and smoother operation. A REST service has also been implemented for downloading the secondary structure annotations. In addition, we deployed the updated WDSP tool with tunable parameters (the search database and the number of iterations), so that users can obtain predictions for their own sequences.
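To put the reported totals in perspective, the short Python sketch below derives per-protein averages from the counts quoted above. Only the input numbers come from the text; the variable names and the choice of averages are illustrative assumptions, and the averages themselves are not figures reported by the authors.

```python
# Counts quoted from the text above.
PROTEINS = 594_319            # WD40 proteins in WDSPdb 2.0
REPEATS = 4_033_034           # WD40 repeats
HBOND_NETWORKS = 852_295      # potential side-chain hydrogen bond networks
HOTSPOTS = 4_963_216          # predicted PPI hotspots
VARIANTS = 37_184             # mapped variants
VARIANT_PROTEINS = 252        # WD40 proteins with mapped variants

# Simple ratios; each is an average over the stated denominator.
stats = {
    "repeats_per_protein": REPEATS / PROTEINS,
    "hbond_networks_per_protein": HBOND_NETWORKS / PROTEINS,
    "hotspots_per_protein": HOTSPOTS / PROTEINS,
    "variants_per_annotated_protein": VARIANTS / VARIANT_PROTEINS,
}

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

The averages come to roughly seven repeats and eight predicted hotspots per protein, while the variant annotations are concentrated in a small subset of 252 proteins.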

3 Conclusion and discussion

WDSPdb 2.0 incorporates significant improvements. Version 1.0 was confined to typical WD40 proteins, but users have frequently requested annotations of atypical ones. This update records as many putative WD40 proteins as possible, with more accurate structure predictions, and assigns confidence levels to support customized usage. The integration of variant data enables direct and intuitive exploration of the relationship between variants and featured sites in their structural context. The web interface has also been substantially enhanced for browsing, visualization and downloading. We will update WDSPdb regularly to continue to benefit researchers working on repeat proteins, PPIs and the interpretation of genetic variants.
  19 in total

Review 1.  Protein folds propelled by diversity.

Authors:  M Paoli
Journal:  Prog Biophys Mol Biol       Date:  2001       Impact factor: 3.667

Review 2.  WD40 proteins propel cellular networks.

Authors:  Christian U Stirnimann; Evangelia Petsalaki; Robert B Russell; Christoph W Müller
Journal:  Trends Biochem Sci       Date:  2010-05-05       Impact factor: 13.807

3.  Comparative Protein Structure Modeling Using MODELLER.

Authors:  Benjamin Webb; Andrej Sali
Journal:  Curr Protoc Bioinformatics       Date:  2016-06-20

4.  The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data.

Authors:  Ethan Cerami; Jianjiong Gao; Ugur Dogrusoz; Benjamin E Gross; Selcuk Onur Sumer; Bülent Arman Aksoy; Anders Jacobsen; Caitlin J Byrne; Michael L Heuer; Erik Larsson; Yevgeniy Antipin; Boris Reva; Arthur P Goldberg; Chris Sander; Nikolaus Schultz
Journal:  Cancer Discov       Date:  2012-05       Impact factor: 39.397

5.  Scalable web services for the PSIPRED Protein Analysis Workbench.

Authors:  Daniel W A Buchan; Federico Minneci; Tim C O Nugent; Kevin Bryson; David T Jones
Journal:  Nucleic Acids Res       Date:  2013-06-08       Impact factor: 16.971

6.  WDSPdb: a database for WD40-repeat proteins.

Authors:  Yang Wang; Xue-Jia Hu; Xu-Dong Zou; Xian-Hui Wu; Zhi-Qiang Ye; Yun-Dong Wu
Journal:  Nucleic Acids Res       Date:  2014-10-27       Impact factor: 16.971

7.  NGL Viewer: a web application for molecular visualization.

Authors:  Alexander S Rose; Peter W Hildebrand
Journal:  Nucleic Acids Res       Date:  2015-04-29       Impact factor: 16.971

8.  Analysis of protein-coding genetic variation in 60,706 humans.

Authors:  Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2016-08-18       Impact factor: 49.962

9.  UniProt: the universal protein knowledgebase.

Authors:  The UniProt Consortium
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

10.  A method for WD40 repeat detection and secondary structure prediction.

Authors:  Yang Wang; Fan Jiang; Zhu Zhuo; Xian-Hui Wu; Yun-Dong Wu
Journal:  PLoS One       Date:  2013-06-11       Impact factor: 3.240

