
MFIB: a repository of protein complexes with mutual folding induced by binding.

Erzsébet Fichó¹, István Reményi², István Simon¹, Bálint Mészáros¹.

Abstract

MOTIVATION: It is commonplace that intrinsically disordered proteins (IDPs) are involved in crucial interactions in the living cell. However, the study of protein complexes formed exclusively by IDPs is hindered by the lack of data, and such analyses remain sporadic. Systematic studies have benefited other types of protein-protein interactions, paving the way from basic science to therapeutics; yet these efforts require reliable datasets that are currently lacking for synergistically folding complexes of IDPs.
RESULTS: Here we present the Mutual Folding Induced by Binding (MFIB) database, the first systematic collection of complexes formed exclusively by IDPs. MFIB contains an order of magnitude more data than any dataset used in corresponding studies and offers a wide coverage of known IDP complexes in terms of flexibility, oligomeric composition and protein function from all domains of life. The included complexes are grouped using a hierarchical classification and are complemented with structural and functional annotations. MFIB is backed by a firm development team and infrastructure, and together with possible future community collaboration it will provide the cornerstone for structural and functional studies of IDP complexes.
AVAILABILITY AND IMPLEMENTATION: MFIB is freely accessible at http://mfib.enzim.ttk.mta.hu/. The MFIB application is hosted by an Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance, a MySQL database was also created.
CONTACT: simon.istvan@ttk.mta.hu, meszaros.balint@ttk.mta.hu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2017 PMID: 29036655 PMCID: PMC5870711 DOI: 10.1093/bioinformatics/btx486

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Intrinsically disordered proteins (IDPs) do not have a stable structure under native conditions (Wright and Dyson, 1999), yet they perform crucial biological roles, being deeply embedded in regulatory and signaling pathways, among others (Dyson and Wright, 2005; Wright and Dyson, 2015). Despite the lack of intrinsic tertiary structure of IDPs, many critical biological processes require them to interact with molecular partners, most often other proteins. During the vast majority of these interactions IDPs do adopt a stable bound structure, hence their folding is coupled to binding (Sugase ), giving rise to weak, transient, yet highly specific interactions. Accordingly, IDPs often represent hubs of protein–protein interaction networks (Haynes ), presenting promising therapeutic targets (Joshi and Vendruscolo, 2015). In line with their biological importance, IDPs are heavily studied. The resulting information is collected in disorder-specific databases (such as DisProt, Piovesan or IDEAL, Fukuchi ) and is disseminated at various levels of annotation in core biology databases, such as UniProt (Pundir ). Most of this information concerns which protein regions are disordered and which have intrinsic structure, with some additional information about the detailed structural properties of IDPs (Varadi ). These data are in turn used to develop prediction algorithms that enable the in silico identification of IDP regions (Oates ) and functional sites (Dosztanyi ; Malhis ), which aids experimental verification, creating an iterative synergistic workflow. This targeted research and synergy can be seen in the identification of IDPs; other areas of unstructural biology still lack this kind of focus. The identification of the interactions of IDPs in structural detail seems to be much more sporadic, lacking systematic targeted efforts.
While no specific IDP interaction database exists, a subset of such interactions has been studied in detail (Mészáros ; Mohan ). Interactions between IDPs and ordered proteins are often mediated by short linear motifs (SLiMs) residing in the IDP partner (Fuxreiter ). Accordingly, SLiM databases, such as the Eukaryotic Linear Motif database (Dinkel ), can provide a starting point for structural studies of IDP–ordered protein interactions. In contrast to the study of IDP–ordered protein interactions, protein complexes formed exclusively by IDPs are far less understood from both structural and functional points of view. The primary reason behind the lack of systematic research on IDP-only complexes is the lack of well-organized and accessible data. While several such complexes are known (and some have been studied in detail, see for example, Demarest ), no specific database exists, and the majority of corresponding data are scattered across various databases. Yet, a targeted database often proves to be not only beneficial, but vital for the development of research areas in biology (Baxevanis and Bateman, 2015). Our current work lays this missing foundation for systematic structural/functional studies of IDP complexes by assembling Mutual Folding Induced by Binding (MFIB). MFIB is constructed by integrating information from a range of databases and a wealth of literature to assemble by far the largest repository of protein complexes in which the interacting chains mutually fold as a result of the interaction.

2 Database assembly

MFIB aims to serve as a starting point for the functional and structural analysis of interactions between IDPs. Accordingly, the existence of a solved complex structure of the interacting protein partners was a prerequisite for inclusion in the dataset. The existence of a solved structure also serves as verification of the interaction and proof that the proteins involved in fact adopt a stable structure upon interacting. The PDB (version March 28, 2017) was therefore taken as a starting point, and was filtered and annotated using various criteria and information from other databases to derive a high-quality set of interacting IDPs. Structures that contain at least two protein chains in interaction were selected and were filtered for structure quality (keeping only nuclear magnetic resonance structures, and X-ray structures with a resolution better than 5 Å, to discard poor quality structures) and biological relevance (discarding chimeras and other structures containing non-biological polypeptide chains). Complexes where non-protein chains (typically DNA and RNA) participate in the interaction were also discarded. The remaining set of candidate complexes was annotated based on experimental evidence in various annotation databases (see Fig. 1). Disorder annotations were taken from DisProt (version 7 v0.4) (Piovesan ) and IDEAL (version March 29, 2017) (Fukuchi ). Using this manually curated information, protein chains in the candidate PDB complexes were annotated using three different approaches.
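The structure-level filters described in this section can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Candidate` record, its field names and the method labels are hypothetical, the check for chains being "in interaction" is omitted, and any structure containing a non-protein chain is discarded, which is a simplification of the stated rule that non-protein chains must not participate in the interaction.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_RESOLUTION = 5.0  # Å; threshold stated in the text


@dataclass
class Candidate:
    """Hypothetical summary of one PDB structure (all field names are assumptions)."""
    pdb_id: str
    method: str                  # "NMR" or "X-RAY" (hypothetical labels)
    resolution: Optional[float]  # in Å; None for NMR structures
    chain_types: List[str]       # e.g. ["protein", "protein", "DNA"]
    has_chimera: bool            # True if any chain is chimeric or non-biological


def passes_filters(c: Candidate) -> bool:
    """Apply the quality and biological-relevance filters to one candidate."""
    # At least two protein chains are required.
    if sum(t == "protein" for t in c.chain_types) < 2:
        return False
    # Simplification: drop any structure that contains a non-protein chain.
    if any(t != "protein" for t in c.chain_types):
        return False
    # Keep NMR structures, and X-ray structures better than 5 Å.
    if c.method == "NMR":
        quality_ok = True
    elif c.method == "X-RAY":
        quality_ok = c.resolution is not None and c.resolution < MAX_RESOLUTION
    else:
        quality_ok = False
    # Discard chimeras and other non-biological polypeptide chains.
    return quality_ok and not c.has_chimera
```

In practice the inputs would be derived from PDB metadata; here they are supplied by hand so that the decision logic stays visible.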

Fig. 1

Workflow of the construction of MFIB. The figure shows the annotation steps of a hypothetical example of three interacting disordered protein regions, where the three chains are annotated through direct, UniRef90-transfer and Pfam-transfer of annotations (marked A, B and C, respectively). Light grey boxes represent disordered protein regions. Smaller black boxes mark regions that are present in the candidate PDB structure. Boxes with dashed outline represent Pfam objects. Arrows show the transfer of annotations either with direct sequence comparisons (direct annotations between UniProt sequences) or with mapping (using Pfam, UniRef90 clusters, or BLAST in the case of transfer between UniRef90 sequences and between UniProt and the PDB candidate proteins).

First, some candidate protein chains had direct disorder annotations, meaning that they cover the same region in the corresponding UniProt protein sequence as referenced in disorder databases. Second, annotations were transferred to close homologues, considering proteins that share at least 90% sequence identity (i.e. they belong to the same UniRef90 sequence cluster). As the third level of annotations, disorder information was transferred through Pfam (release 31.0, Bateman, 2000) objects (families, domains, motifs or repeats). If a Pfam object covered at least 70% of both an interacting chain and a disorder annotation, then the disordered status was also assigned to the interacting chain. Taking all three types of annotations (direct, UniRef90-transferred and Pfam-transferred) into account, all candidate complexes were categorized. Complexes containing only disordered chains were kept; and complexes with both disordered chains and chains without annotations were further inspected. If evidence uncovered using literature searches indicated that the unknown chains were in fact disordered, the complex was also kept.
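The Pfam-based transfer rule can be made concrete with a short sketch. It assumes that regions are inclusive residue intervals, that the same Pfam object has been matched on both the interacting chain's protein and the annotated protein, and that "covered" means the fraction of a region lying inside the Pfam match. The function names are illustrative and are not taken from the MFIB implementation.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive residue positions (start, end)

MIN_COVERAGE = 0.7  # the 70% threshold stated in the text


def covered_fraction(region: Interval, pfam: Interval) -> float:
    """Fraction of `region` that lies inside the Pfam match `pfam`."""
    start = max(region[0], pfam[0])
    end = min(region[1], pfam[1])
    overlap = max(0, end - start + 1)
    return overlap / (region[1] - region[0] + 1)


def transfer_disorder(chain: Interval, pfam_on_chain: Interval,
                      disorder: Interval, pfam_on_annotated: Interval) -> bool:
    """Return True if the disordered status may be transferred to the chain.

    `chain` and `pfam_on_chain` use the coordinates of the chain's protein;
    `disorder` and `pfam_on_annotated` use those of the annotated protein.
    Both Pfam matches are assumed to belong to the same Pfam object.
    """
    return (covered_fraction(chain, pfam_on_chain) >= MIN_COVERAGE
            and covered_fraction(disorder, pfam_on_annotated) >= MIN_COVERAGE)
```

For example, a chain spanning residues 10 to 109 with a Pfam match at 1 to 90 is 81% covered, so the transfer depends only on whether the disorder annotation is also sufficiently covered.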
The database-based annotations coupled with information from the literature resulted in a set of 1406 complexes that all exclusively contain protein chains that are disordered in their monomeric form. Each complex is manually inspected by database curators with a focus on the validity of the experimental evidence for disorder to ensure the reliability of the database. Curators also check the true biological assemblies of the complexes using PISA (Proteins, Interfaces, Structures and Assemblies) to avoid the inclusion of non-biological contacts due to crystallization. These manually curated protein complexes together comprise MFIB. To reduce redundancy, complexes in MFIB were clustered based on sequence similarities of their constituent chains. Protein chains were considered to be similar if they belong to the same UniRef90 cluster and show at least 70% overlap. Two complexes are deemed related if they contain the same number of proteins, and the proteins from the two structures show pairwise similarity. Related complexes were grouped into clusters forming the entries in MFIB. This clustering grouped the 1406 structures into 205 MFIB entries. Furthermore, each entry in MFIB is assigned a class and a subclass during the manual annotation and curation step. Supplementary Table S1 shows the 8 classes and 33 subclasses currently defined in MFIB.
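The redundancy-reduction step can be sketched as follows. Several details are assumptions rather than statements from the text: chain regions are taken to share a common coordinate frame, overlap is measured relative to the longer region, "pairwise similarity" is read as the existence of a one-to-one matching of similar chains, and related complexes are merged by single linkage. All identifiers are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List

MIN_OVERLAP = 0.7  # the 70% overlap threshold stated in the text


@dataclass(frozen=True)
class Chain:
    uniref90: str  # hypothetical UniRef90 cluster identifier
    start: int     # region covered by the chain (shared coordinates assumed)
    end: int


def chains_similar(a: Chain, b: Chain) -> bool:
    """Same UniRef90 cluster and at least 70% overlap (relative to the longer region)."""
    if a.uniref90 != b.uniref90:
        return False
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    longer = max(a.end - a.start, b.end - b.start) + 1
    return overlap / longer >= MIN_OVERLAP


def complexes_related(x: List[Chain], y: List[Chain]) -> bool:
    """Equal chain counts and some one-to-one pairing of similar chains."""
    if len(x) != len(y):
        return False
    # Brute force over pairings; only practical for small complexes.
    return any(all(chains_similar(a, b) for a, b in zip(x, perm))
               for perm in permutations(y))


def cluster(complexes: Dict[str, List[Chain]]) -> List[List[str]]:
    """Group related complexes with union-find (single linkage is an assumption)."""
    parent = {k: k for k in complexes}

    def find(k: str) -> str:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    ids = list(complexes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if complexes_related(complexes[a], complexes[b]):
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for k in ids:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())
```

Trying every permutation is feasible here because the text reports oligomeric states only up to hexamers; larger assemblies would call for a proper bipartite matching algorithm instead.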

3 Web interface

MFIB is made available through a dedicated website at http://mfib.enzim.ttk.mta.hu/. The 205 entries representing interactions of IDPs form the core of MFIB. Accordingly, each entry is assigned a unique accession and has a separate page that details information about the given complex. Furthermore, the MFIB server also includes features to ease searching and navigating through the database. The ‘Home’ page describes the basis and purpose of the database for users unfamiliar with MFIB. The ‘Statistics’ page shows basic statistics about MFIB. The ‘Help’ page answers questions connected to the conception, assembly, design and usability of the database and the server. MFIB also offers several ways of structured access to the database including browsing, searching and multiple ways of downloading data in XML and text formats for local use.

4 Discussion

The construction of MFIB presents the first systematic collection of data concerning complexes formed by IDPs. It is based on the integration of structural and sequence annotation databases coupled with the results of an extensive manual literature survey. Previous studies of complexes of mutually folding IDPs were typically based on 10–35 structures (Gunasekaran ; Nussinov ; Rumfeldt ). In contrast, MFIB contains over 1400 complex structures organized into 205 entries. These data provide the missing cornerstone of future structural and functional studies of the synergistic folding of IDPs. MFIB not only contains far more complexes than were used in previous analyses but also provides a wide coverage of possible IDP–IDP interactions in many ways. Entries in MFIB cover all three domains of life and also include complexes of viral proteins, shedding light on the importance of synergistic folding in host–pathogen interactions. MFIB entries also cover the majority of possible oligomeric compositions from dimers to hexamers, including both hetero- and homo-oligomers. Most importantly, entries in MFIB also cover the known spectrum of protein disorder. Protein disorder is a highly heterogeneous property with various IDPs exhibiting markedly different levels of flexibility in their unbound form. MFIB contains complexes of IDP regions from near random coil proteins (such as the CBP (CREB Binding Protein)-interacting region of ACTR, Demarest ), through molten globules (such as the Arc repressor, Peng ) to near-ordered structures, where a monomeric structure can be stabilized with a limited number of mutations (such as the nucleoside diphosphate kinase, Giartosio ). MFIB is currently by far the largest collection of interactions between IDPs; yet there is undoubtedly much more information scattered in the PDB and the literature that is not currently incorporated.
In accord, we consider the present version of MFIB as a stepping stone and plan to constantly update, expand and revise the database. This process will rely on the past experience of the authors in database-maintenance, the firm technical and infrastructural background of the initiative, and the encouragement of a community effort to contribute to MFIB.

References (25 in total)

1. Mutual synergistic folding in recruitment of CBP/p300 by p160 nuclear receptor coactivators.

Authors: Stephen J Demarest; Maria Martinez-Yamout; John Chung; Hongwu Chen; Wei Xu; H Jane Dyson; Ronald M Evans; Peter E Wright
Journal: Nature Date: 2002-01-31 Impact factor: 49.962

Review 2. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm.

Authors: P E Wright; H J Dyson
Journal: J Mol Biol Date: 1999-10-22 Impact factor: 5.469

3. Local structural disorder imparts plasticity on linear motifs.

Authors: Monika Fuxreiter; Peter Tompa; István Simon
Journal: Bioinformatics Date: 2007-03-25 Impact factor: 6.937

4. UniProt Protein Knowledgebase.

Authors: Sangya Pundir; Maria J Martin; Claire O'Donovan
Journal: Methods Mol Biol Date: 2017

5. Mechanism and evolution of protein dimerization.

Authors: D Xu; C J Tsai; R Nussinov
Journal: Protein Sci Date: 1998-03 Impact factor: 6.725

Review 6. Intrinsically disordered proteins in cellular signalling and regulation.

Authors: Peter E Wright; H Jane Dyson
Journal: Nat Rev Mol Cell Biol Date: 2015-01 Impact factor: 94.444

7. Thermal stability of hexameric and tetrameric nucleoside diphosphate kinases. Effect of subunit interaction.

Authors: A Giartosio; M Erent; L Cervoni; S Moréra; J Janin; M Konrad; I Lascu
Journal: J Biol Chem Date: 1996-07-26 Impact factor: 5.157

Review 8. Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins.

Authors: Zsuzsanna Dosztányi; Bálint Mészáros; István Simon
Journal: Brief Bioinform Date: 2009-12-10 Impact factor: 11.622

9. IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners.

Authors: Satoshi Fukuchi; Takayuki Amemiya; Shigetaka Sakamoto; Yukiko Nobe; Kazuo Hosoda; Yumiko Kado; Seiko D Murakami; Ryotaro Koike; Hidekazu Hiroaki; Motonori Ota
Journal: Nucleic Acids Res Date: 2013-10-30 Impact factor: 16.971

10. MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences.

Authors: Nawar Malhis; Matthew Jacobson; Jörg Gsponer
Journal: Nucleic Acids Res Date: 2016-05-12 Impact factor: 16.971
