DisBind: A database of classified functional binding sites in disordered and structured regions of intrinsically disordered proteins.

Jia-Feng Yu, Xiang-Hua Dou, Yu-Jie Sha, Chun-Ling Wang, Hong-Bo Wang, Yi-Ting Chen, Feng Zhang, Yaoqi Zhou, Ji-Hua Wang.

Abstract

BACKGROUND: Intrinsically unstructured or disordered proteins function by interacting with other molecules. Annotating these binding sites is the first step toward mapping the functional impact of genetic variants in the coding regions of human and other genomes, given that a significant portion of eukaryotic genomes codes for intrinsically disordered regions in proteins.
RESULTS: DisBind (available at http://biophy.dzu.edu.cn/DisBind) is a collection of experimentally supported binding sites in intrinsically disordered proteins (IDPs) and in proteins with both structured and disordered regions. It contains a total of 226 IDPs with functional site annotations. According to DisProt annotations, these IDPs contain 465 structured regions (ORs) and 428 intrinsically disordered regions (IDRs). The database contains a total of 4232 binding residues (from UniProt and PDB structures), of which 2836 are in ORs and 1396 in IDRs. These binding residues are classified according to their interacting partners, namely proteins, RNA, DNA, metal ions, and others, with 2984, 258, 383, 350, and 262 annotated binding residues, respectively. Each entry contains site-specific annotations (structured regions, intrinsically disordered regions, and functional binding regions) that are experimentally supported by PDB structures or UniProt annotations.
CONCLUSION: The searchable DisBind provides a reliable data resource for functional classification of intrinsically disordered proteins at the residue level.

Keywords:  Binding site; Database; Function classification; Intrinsic disorder; Protein disorder prediction; Protein function

Year: 2017; PMID: 28381244; PMCID: PMC5382478; DOI: 10.1186/s12859-017-1620-1

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169


Background

A growing number of proteins have been shown to be partially or wholly unstructured, or intrinsically disordered [1, 2]. These intrinsically disordered proteins (IDPs), or intrinsically disordered regions (IDRs) within proteins, have a wide variety of functions, ranging from molecular recognition, molecular assembly, and protein modification to entropic chain activities [3]. Flexible disordered regions offer many unique advantages, such as accommodating multiple binding partners, enabling posttranslational modifications, and preventing aggregation [4]. Some IDPs implicated in human diseases are attractive targets for drug discovery [5]. Recognizing the importance of IDPs, researchers have built several databases. DisProt is the first curated database of experimentally verified IDPs and IDRs [6]. Its latest release contains a total of 694 proteins with 1539 disordered regions (a newer release, just published, expands to more than 800 entries [7], and we will update DisBind accordingly in its next version). D2P2, on the other hand, consists of computationally predicted IDPs from 1765 proteomes of 1256 distinct species [8]. MobiDB used both computational and experimental annotations to annotate more than 500,000 disordered proteins [9]. Its computational annotations rely on a consensus of predictors, including IUPRED [10] and ESpritz [11]. Its most recent version [12] further links to post-translational modification information from the Universal Protein Resource (UniProt) [13] and to STRING protein-protein interactions [14]. IDEAL [15] is a database that combines functional annotations with structural/disorder annotations for 582 IDPs (as of the latest release on 12/Jun/2015), built by manually integrating the Protein Data Bank (PDB) [16], UniProt [13], and DisProt [6]. It focuses on the interaction networks of IDPs, with induced-folding sites annotated in disordered regions.
Here we have compiled a database, DisBind (Disorder Binding sites), dedicated to classifying the functional binding sites of IDPs, and of proteins with both intrinsically disordered and structured regions, from the DisProt database, regardless of whether the IDPs have experimentally determined structures from induced folding. Residue-level binding sites are an important first step toward understanding the functional impact of genetic variants in the coding regions of human and other genomes, given that a significant portion of eukaryotic genomes codes for intrinsically disordered regions in proteins [17]. We categorize binding sites into eight categories according to their binding partners: DNA, RNA, proteins, cofactor/heme, metal ions, substrate/ligand, ATP/GTP, and others. Although some categories contain only a few sites, we include them in the database for completeness. The database thus provides a classification of functional binding sites in IDPs based on experimentally supported evidence. By comparison, IDEAL does not contain binding sites for metals and ligands, and DisProt does not contain binding-site information. For completeness, both the structured and the disordered regions of an intrinsically disordered protein are annotated. Most disordered regions with annotated binding sites do not have known structures. Some disordered regions, however, have experimentally determined structures when in complex with their interaction partners (binding-induced folding or conformational selection). For these special cases, we annotated the secondary structure motifs involved in the binding regions, which can provide a basis for an initial understanding of binding mechanisms.

Construction and content

We obtained all annotated IDRs and IDPs from a recent version of the DisProt database (v6.02). Binding sites for these IDPs were retrieved from the specific binding-site annotations in UniProt and/or derived from high-resolution complex structures (resolution better than 3.5 Å) in the PDB. Most binding sites from UniProt are ion-binding sites, whereas binding sites from PDB structures mainly involve IDP-RNA, IDP-DNA, and IDP-protein interactions. For IDPs in a complex structure, an IDP residue is defined as a binding residue if any of its atoms lies within a cutoff distance of 3.5 Å of any atom of the binding partner, as in previous studies [18, 19]. Binding partners are classified into eight categories: DNA, RNA, proteins, cofactor/heme, metal ions, substrate/ligand, ATP/GTP, and others. The secondary structure of binding residues was also obtained from the PDB, based on the DSSP (Dictionary of protein secondary structure) assignment [20]. The eight secondary structure states are combined into three classes: α-helix (H, G, I), β-sheet (B, E), and coil (T, S, D). Note that DSSP assignments exist only for IDPs with regions whose three-dimensional structures have been determined. If the same IDP binds different proteins in different PDB structures, each interaction was annotated separately.
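The two processing rules above (the 3.5 Å atom-contact criterion for binding residues and the reduction of DSSP codes to three classes) can be sketched in Python as follows. This is a minimal illustration, not the DisBind pipeline: the atom tuple layout, the function names, the inclusive treatment of the 3.5 Å boundary, and mapping DSSP codes not listed in the text to coil are all assumptions made here. A real pipeline would read coordinates with a structure-parsing library rather than from hand-built tuples.

```python
import math
from typing import Iterable, Set, Tuple

# Hypothetical atom record used only in this sketch: (residue_number, x, y, z) in Å.
Atom = Tuple[int, float, float, float]

CONTACT_CUTOFF = 3.5   # Å, atom-atom distance threshold from the text
MAX_RESOLUTION = 3.5   # Å, complexes must be solved at better (smaller) resolution

# Three-class grouping of DSSP codes as listed in the text. Codes not listed
# (e.g. a blank or "-" loop symbol) default to coil: an assumption of this sketch.
DSSP_TO_3 = {
    "H": "helix", "G": "helix", "I": "helix",
    "B": "sheet", "E": "sheet",
    "T": "coil", "S": "coil", "D": "coil",
}


def accept_structure(resolution: float) -> bool:
    """Keep a complex only if its resolution is better than MAX_RESOLUTION."""
    return resolution < MAX_RESOLUTION


def binding_residues(idp_atoms: Iterable[Atom], partner_atoms: Iterable[Atom]) -> Set[int]:
    """Return IDP residue numbers with any atom within CONTACT_CUTOFF of any partner atom.

    Brute-force search over all atom pairs; the boundary distance counts (<=) by assumption.
    """
    partner_xyz = [(x, y, z) for _, x, y, z in partner_atoms]
    hits: Set[int] = set()
    for resnum, x, y, z in idp_atoms:
        if resnum in hits:
            continue  # residue already marked; skip its remaining atoms
        if any(math.dist((x, y, z), p) <= CONTACT_CUTOFF for p in partner_xyz):
            hits.add(resnum)
    return hits


def three_state(dssp_code: str) -> str:
    """Collapse an eight-state DSSP code into helix, sheet, or coil."""
    return DSSP_TO_3.get(dssp_code, "coil")
```

For example, with one IDP atom at the origin (residue 10), another 10 Å away (residue 11), and a single partner atom 3 Å from the origin, `binding_residues` marks only residue 10. For large complexes, a spatial index such as a k-d tree or grid would replace the pairwise loop.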

Utility and Discussion

The current version of DisBind contains 226 IDPs with functional site annotations. According to DisProt annotations, these IDPs contain 465 structured regions (ORs) and 428 IDRs. For completeness, both structured and unstructured regions are annotated. The database contains a total of 4232 binding residues (from UniProt and PDB structures), of which 2836 are in ORs and 1396 in IDRs. Table 1 further classifies these binding residues according to their binding partners. The largest subset of DisBind involves binding to proteins, with 1070 binding residues in disordered regions. This is followed by 189, 55, and 69 residues in disordered regions that interact with RNA, DNA, and metal ions, respectively. The remaining functional categories contain only a few binding residues in disordered regions.
Table 1

The number of residues and binding residues of IDPs and IDRs according to binding partners of IDPs in DisBind

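Category | # IDPs (a) | # Residues in IDPs (b) | # Residues in IDRs | # Residues in ORs | # Helix residues in IDRs (c) | # Sheet residues in IDRs (c) | # Binding residues in IDPs (b) | # Binding residues in IDRs | # Binding residues in ORs
ALL | 226 | 166235 | 29908 | 136327 | 1705 | 439 | 4232 | 1396 | 2836
Protein | 127 | 57586 | 12822 | 44764 | 1299 | 244 | 2984 | 1070 | 1914
RNA | 12 | 6040 | 1286 | 4754 | 106 | 13 | 258 | 189 | 69
DNA | 32 | 12092 | 2853 | 9239 | 301 | 64 | 383 | 55 | 328
Metal | 81 | 40351 | 6242 | 34109 | -- | -- | 350 | 69 | 281
Cofactor | 12 | 6825 | 1193 | 5632 | -- | -- | 41 | 2 | 39
Substrate | 32 | 5791 | 1014 | 4777 | -- | -- | 61 | 2 | 59
ATP/GTP | 32 | 14695 | 2475 | 12220 | -- | -- | 37 | 1 | 36
Others | 44 | 22855 | 2023 | 20832 | -- | -- | 123 | 8 | 115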

(a) Some IDPs can bind to different partners.
(b) Counts of residues or binding residues in IDPs include all residues or all binding residues, regardless of whether they are in structured, unstructured, or unannotated regions.
(c) Number of helical or sheet residues in IDRs.

Note: IDPs may contain structured regions, IDRs, and unannotated regions.
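Table 1 splits each category's binding residues into those in IDRs, those in ORs, and all binding residues of the IDPs (which, per footnote b, also covers unannotated regions). A minimal Python sketch of that tally follows. It assumes region boundaries given as 1-based inclusive (start, end) pairs and per-category lists of binding-residue positions for each protein; the function names are hypothetical and not taken from DisBind.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Region = Tuple[int, int]  # assumed 1-based, inclusive (start, end)


def in_regions(pos: int, regions: Iterable[Region]) -> bool:
    """True if residue position pos falls inside any of the regions."""
    return any(start <= pos <= end for start, end in regions)


def tally(binding: Dict[str, Iterable[int]],
          idrs: List[Region], ors: List[Region]) -> Dict[str, Counter]:
    """Count one protein's binding residues per partner category.

    "IDPs" counts every binding residue, including those in unannotated
    regions; IDRs are checked first in case regions overlap (an assumption).
    """
    table: Dict[str, Counter] = {}
    for category, positions in binding.items():
        counts: Counter = Counter()
        for pos in set(positions):  # count each residue once per category
            counts["IDPs"] += 1
            if in_regions(pos, idrs):
                counts["IDRs"] += 1
            elif in_regions(pos, ors):
                counts["ORs"] += 1
        table[category] = counts
    return table


def aggregate(per_protein: Iterable[Dict[str, Counter]]) -> Dict[str, Counter]:
    """Sum per-protein tallies into database-wide totals, as in Table 1."""
    total: Dict[str, Counter] = {}
    for table in per_protein:
        for category, counts in table.items():
            total.setdefault(category, Counter()).update(counts)
    return total
```

For a protein with an IDR at 1-20, an OR at 40-100, and protein-binding residues at 5, 6, 50, and 120, `tally` reports 4 binding residues overall, 2 in IDRs, and 1 in ORs; residue 120 lies in an unannotated region and is counted only in the IDP total.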

Figure 1 shows the front page of DisBind, which consists of seven parts: ‘Home’, ‘Classification’, ‘Browse’, ‘Search’, ‘Blast’, ‘Download’, and ‘Help’. Under the ‘Classification’ option, entries can be retrieved according to their binding partners (i.e., DNA, RNA, protein, cofactor/heme, metal ions, substrate/ligand, ATP/GTP, and others). All entries in DisBind, numbered N00001 to N00226, can also be retrieved via the ‘Browse’ option. Alternatively, users can retrieve entries by entering keywords in ‘Search’ or protein sequences in ‘Blast’. In addition, all binding sites, along with their secondary structures, can be downloaded in FASTA format. The ‘Help’ page explains each page and the meaning of the color codes in detail.
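Because the download is offered in FASTA format, a generic FASTA reader is enough to load it. The sketch below assumes the standard layout (a header line beginning with '>' followed by one or more sequence lines). How DisBind encodes binding sites and secondary structure inside those records is not described here, so the reader only returns (header, sequence) pairs and leaves their interpretation to the caller; the example headers are hypothetical.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Multi-line sequences are joined, blank lines are ignored, and any
    lines before the first header are dropped.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical two-record example; real DisBind headers may differ.
    demo = io.StringIO(">N00001 example\nMKVL\nAG\n\n>N00002\nGGS\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```

To read a saved download, pass an open file handle, e.g. `with open(path) as fh: records = list(read_fasta(fh))`, where `path` is whatever name the file was saved under.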
Fig. 1

The front page of the DisBind database

The information stored for each IDP has five parts, as illustrated in Fig. 2 using entry N00004 as an example. Part I provides basic information, such as the identification numbers from DisBind, DisProt, UniProt, and NCBI, along with the protein name and sequence length. Part II lists the specific binding sites and corresponding binding partners according to UniProt annotations and/or the PDB complex structure, along with the PDB ID. Clicking a PDB ID links directly to the Protein Data Bank for structural visualization. These sites, along with the disordered regions annotated by DisProt, are highlighted in the sequence. Part III shows the secondary structure of disordered regions alongside the sequence. Parts IV and V contain, respectively, comments from DisProt on the disordered protein and the references for its functional annotations.
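When working with DisBind data locally, the five-part entry layout can be mirrored by a small record type. The Python dataclasses below are a hypothetical model, not DisBind's schema: the field names, the 1-based position convention, and the text mask that mimics the sequence highlighting are assumptions. Only the five parts, the eight partner categories, and the N00001-style identifiers come from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Entry identifiers as described in the text (N00001 to N00226).
ENTRY_ID = re.compile(r"N\d{5}")

# The eight binding-partner categories used by DisBind.
PARTNERS = {"DNA", "RNA", "proteins", "cofactor/heme", "metal ions",
            "substrate/ligand", "ATP/GTP", "others"}


@dataclass
class BindingSite:
    position: int                 # residue number, assumed 1-based
    partner: str                  # one of PARTNERS
    source: str                   # "UniProt" or "PDB"
    pdb_id: Optional[str] = None  # set when the site comes from a complex structure

    def __post_init__(self) -> None:
        if self.partner not in PARTNERS:
            raise ValueError(f"unknown partner category: {self.partner}")


@dataclass
class DisBindEntry:
    # Part I: identifiers and basic information
    disbind_id: str
    disprot_id: str
    uniprot_id: str
    ncbi_id: str
    name: str
    sequence: str
    # Part II: binding sites/partners and DisProt disordered regions (1-based, inclusive)
    binding_sites: List[BindingSite] = field(default_factory=list)
    disordered_regions: List[Tuple[int, int]] = field(default_factory=list)
    # Part III: per-residue secondary structure, only when a structure exists
    secondary_structure: Optional[str] = None
    # Parts IV and V: DisProt comments and references
    comments: str = ""
    references: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not ENTRY_ID.fullmatch(self.disbind_id):
            raise ValueError(f"malformed DisBind ID: {self.disbind_id}")

    @property
    def length(self) -> int:
        return len(self.sequence)

    def site_mask(self) -> str:
        """Text stand-in for the web highlighting: '*' binding, '~' disordered, '.' other."""
        mask = ["."] * self.length
        for start, end in self.disordered_regions:
            for pos in range(start, end + 1):
                mask[pos - 1] = "~"
        for site in self.binding_sites:
            mask[site.position - 1] = "*"  # binding overrides disordered
        return "".join(mask)
```

For a six-residue sequence with a disordered region at 1-3 and a protein-binding site at residue 2, `site_mask()` returns `~*~...`.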
Fig. 2

Information collected for each IDP as demonstrated for IDP N00004

Conclusion

DisBind is a database dedicated to the residue-level classification of functional binding sites in the disordered and structured regions of intrinsically disordered proteins. It compiles information from the structural database (Protein Data Bank), the database of experimentally validated disordered proteins (DisProt), and the comprehensive protein sequence and function database (UniProt). The database is fully searchable and freely accessible. In the next version, we will significantly expand the dataset by including more than 17,000 disordered proteins collected in MobiDB [12] that are indirectly supported by X-ray crystallography and nuclear magnetic resonance (NMR) data. Moreover, we plan to incorporate regions predicted by existing techniques such as IUPRED [10] and ESpritz [11], as well as by recently developed, more accurate techniques such as SPOT-Disorder [21]. This larger dataset should provide a comprehensive resource for classifying functional sites in IDPs.

Availability and requirements

Database homepage: http://biophy.dzu.edu.cn/DisBind. These data are freely available without restrictions for use by academics.
References (partial list; 21 in total)

Bin Xue, A Keith Dunker, Vladimir N Uversky. Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J Biomol Struct Dyn. 2012.

Yugong Cheng, Tanguy LeGall, Christopher J Oldfield, James P Mueller, Ya-Yue J Van, Pedro Romero, Marc S Cortese, Vladimir N Uversky, A Keith Dunker. Rational drug design via intrinsically disordered protein. Trends Biotechnol. 2006-07-28.

Zhirong Liu, Yongqi Huang. Advantages of proteins being disordered. Protein Sci. 2014-03-17.

W Kabsch, C Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983-12.

A K Dunker, Z Obradovic, P Romero, E C Garner, C J Brown. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform. 2000.

J J Ward, J S Sodhi, L J McGuffin, B F Buxton, D T Jones. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004-03-26.

Jingna Si, Zengming Zhang, Biaoyang Lin, Michael Schroeder, Bingding Huang. MetaDBSite: a meta approach to improve protein DNA-binding sites prediction. BMC Syst Biol. 2011-06-20.

Damian Szklarczyk, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-Cepas, Milan Simonovic, Alexander Roth, Alberto Santos, Kalliopi P Tsafou, Michael Kuhn, Peer Bork, Lars J Jensen, Christian von Mering. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2014-10-28.

Damiano Piovesan, Francesco Tabaro, Ivan Mičetić, Marco Necci, Federica Quaglia, Christopher J Oldfield, Maria Cristina Aspromonte, Norman E Davey, Radoslav Davidović, Zsuzsanna Dosztányi, Arne Elofsson, Alessandra Gasparini, András Hatos, Andrey V Kajava, Lajos Kalmar, Emanuela Leonardi, Tamas Lazar, Sandra Macedo-Ribeiro, Mauricio Macossay-Castillo, Attila Meszaros, Giovanni Minervini, Nikoletta Murvai, Jordi Pujols, Daniel B Roche, Edoardo Salladini, Eva Schad, Antoine Schramm, Beata Szabo, Agnes Tantos, Fiorella Tonello, Konstantinos D Tsirigos, Nevena Veljković, Salvador Ventura, Wim Vranken, Per Warholm, Vladimir N Uversky, A Keith Dunker, Sonia Longhi, Peter Tompa, Silvio C E Tosatto. Corrigendum: DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res. 2016-12-13.

Matt E Oates, Pedro Romero, Takashi Ishida, Mohamed Ghalwash, Marcin J Mizianty, Bin Xue, Zsuzsanna Dosztányi, Vladimir N Uversky, Zoran Obradovic, Lukasz Kurgan, A Keith Dunker, Julian Gough. D²P²: database of disordered protein predictions. Nucleic Acids Res. 2012-11-29.