MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins.

Damiano Piovesan, Francesco Tabaro, Lisanna Paladin, Marco Necci, Ivan Micetic, Carlo Camilloni, Norman Davey, Zsuzsanna Dosztányi, Bálint Mészáros, Alexander M Monzon, Gustavo Parisi, Eva Schad, Pietro Sormanni, Peter Tompa, Michele Vendruscolo, Wim F Vranken, Silvio C E Tosatto.

Abstract

The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high-temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provide an additional experimental layer of information on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set, and synchronization has been improved by covering all UniParc sequences. An advanced search function lets users create a wide array of custom datasets for download and further analysis. The large amount of information and the cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year: 2018 | PMID: 29136219 | PMCID: PMC5753340 | DOI: 10.1093/nar/gkx1071

Source DB: PubMed | Journal: Nucleic Acids Res | ISSN: 0305-1048 | Impact factor: 16.971


INTRODUCTION

The protein structure-function paradigm is a cornerstone of molecular biology. It offers a mechanistic understanding of processes ranging from enzyme catalysis and signal transduction to molecular recognition and allosteric regulation. Underlying this paradigm is the assumption that proteins become functional by adopting a well-defined structure, typically described by the coordinates of all their atoms. A solid foundation for this view is provided by the 130 000 structures of proteins and complexes in the Protein Data Bank (PDB) (1). However, it is increasingly recognized that many proteins do not obey this rule. Intrinsically disordered proteins (IDPs) and regions (IDRs) lack a well-defined structure in their native unbound state (2–4). Intrinsic disorder is prevalent in the human proteome (5), appears to play important signaling and regulatory roles (2) and is frequently involved in disease (6). The discovery of intrinsic disorder, together with its prevalence and functional importance, is transforming molecular biology. As intrinsic disorder emerges as a general phenomenon, databases are collecting and presenting disorder-related data in a systematic manner. MobiDB has been a major contributor by providing consensus predictions and functional annotations for all UniProt proteins, driving the field ahead (7,8). The MobiDB upgrade presented here is needed for several reasons. First, the functional understanding of intrinsic disorder is advancing rapidly. The functional classification of IDPs/IDRs is becoming ever more elaborate, with several newly recognized functional mechanisms (9). For example, the central role of intrinsic disorder in forming membraneless organelles, such as nucleoli and stress granules, through liquid-liquid phase separation has recently been characterized (10–13).
A wide range of experimental observations on the structure-function relationship of IDPs/IDRs is furthering our understanding of disordered states and of the ways in which they function (14–16). These developments have also played a central role in the recent update of the DisProt database (17), the central repository of experimentally characterized IDPs and IDRs. The re-curated version of DisProt contains experimental observations of disorder for more than 800 protein entries and a renewed functional ontology schema. Its experimental evidence has also been significantly broadened to cover a wide range of biophysical techniques. DisProt is the basis for most developments in disorder predictors (18,19), and its recent update is a major motivation for a new version of MobiDB. Additional developments in the field make this release timely. A major source of intrinsic disorder annotation is the identification of residues with missing atomic coordinates in the PDB. These can now be augmented with cryo-electron microscopy (cryo-EM) data, a technique that is having a tremendous impact on structural biology (20,21). Structural descriptions of IDPs and IDRs under physiological conditions have also advanced greatly and are starting to appear in dedicated databases such as IDEAL (22), DIBS (23) and MFIB (24). IDPs and IDRs can play key roles in molecular recognition through short linear motifs (SLiMs) that fold upon binding, which are collected in the ELM database (25). In general, a full functional characterization of IDPs and IDRs requires describing not only their free (disordered) states (26,27) but also their residual dynamics in the bound state (28). Fuzzy (disordered) complexes can be found in FuzDB (29). Structural ensembles describing the free form (30) can be found in the Protein Ensemble Database (PED) (31).
Techniques such as in-cell nuclear magnetic resonance (NMR) spectroscopy (32,33) and single-molecule fluorescence (34) will soon help to study these states under physiological conditions. Reflecting these developments, we are now launching a significantly updated version of our database, MobiDB 3.0. The new version incorporates additional curated data from specialized databases. Novel annotation features include disorder derived from publicly available NMR chemical shift data (35) and an extended list of predictors. Database searches are facilitated by an improved search algorithm, pre-calculated data and new sections in the database.

DATABASE DESCRIPTION

MobiDB 3.0 is intended to be a central resource for large-scale intrinsic disorder sequence annotation. The new version is organized by both type of disorder annotation and quality of disorder evidence (Figure 1). Disorder information is grouped into three sections: disorder, linear interacting peptides (LIPs) and secondary structure populations. The latter captures the conformational heterogeneity of IDPs and IDRs, i.e. their ability to populate different secondary structures in solution. LIPs are protein fragments that interact with other molecules, either retaining an extended conformation or folding upon binding. The data in MobiDB are organized hierarchically. The top tier consists of manually curated data from external databases and represents the highest-quality annotations. Annotations derived from experimental data, such as X-ray structures and NMR chemical shifts, are indirect but far more abundant. At the bottom, predictions provide disorder annotation at lower confidence than experimental evidence. As in the previous version (8), the main disorder definition in MobiDB is a consensus combining all available sources, with curated and indirect evidence taking priority over predictions. The following sections describe the main improvements since the previous release. The database schema, web interface and server have been completely redesigned and the underlying technology updated. The feature viewer showing sequence annotations is now fully dynamic and can generate publication-quality images with a single click. Where available, MobiDB annotation is projected directly onto the structure and shown in a new 3D viewer. The look and feel, page organization and loading latency have also been improved.
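The priority rule behind this consensus can be illustrated with a minimal sketch. For illustration only, it assumes that each evidence tier has already been reduced to one per-residue track, where True means disordered, False means ordered and None means the tier has no data. The tier names and the `consensus` function are hypothetical. This is not MobiDB's actual implementation, which combines many more sources.

```python
from typing import Dict, List, Optional

# Hypothetical evidence tiers, ordered from highest to lowest priority,
# mirroring the curated > indirect > predicted hierarchy.
PRIORITY = ("curated", "indirect", "predicted")

# Per-residue state: True = disordered, False = ordered, None = no data.
Track = List[Optional[bool]]


def consensus(tracks: Dict[str, Track], length: int) -> Track:
    """For each residue, take the state from the highest-priority tier
    that has an annotation at that position."""
    result: Track = [None] * length
    for i in range(length):
        for tier in PRIORITY:
            track = tracks.get(tier)
            if track is not None and track[i] is not None:
                result[i] = track[i]
                break  # a higher tier overrides all lower ones
    return result


if __name__ == "__main__":
    tracks = {
        "curated":   [None, True, None, None],
        "indirect":  [False, False, None, None],
        "predicted": [True, True, True, None],
    }
    print(consensus(tracks, 4))  # [False, True, True, None]
```

In this example, residue 1 takes the indirect "ordered" call over the prediction. Residue 2 keeps the curated annotation. Residue 3 falls back to the prediction, and residue 4 stays unannotated.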
Figure 1.

Overview of different annotation data types (A) and levels of accuracy (B) in MobiDB 3.0.

New curated data

MobiDB 3.0 includes several sources of manually curated disorder annotations (Table 1). These annotations fall into two categories: disorder and LIPs. LIPs are binding regions, presumed or demonstrated to be intrinsically disordered, that fold upon binding. In the literature they go by different names, such as SLiMs or MoREs (molecular recognition elements). The IDEAL database calls them 'protean' segments (ProS) (22). MobiDB includes both 'verified' and 'possible' ProS from IDEAL, where verified means that disorder has been experimentally observed in the isolated molecule. The Database of Disordered Binding Sites (DIBS) (23) collects cases where a disordered region folds upon binding to a globular domain. The Mutual Folding Induced by Binding (MFIB) database (24) includes disordered regions that fold upon binding to another disordered region. ELM (25) provides annotations of SLiMs involved in binding and post-translational modifications. General disorder annotation, i.e. without any knowledge of interaction-driven transitions, is collected from UniProtKB (36), DisProt (17) and FuzDB (29). UniProtKB provides manually curated disorder annotations under the region field of its features section. FuzDB collects fuzzy complexes, where conformational diversity plays a functional role in the regulation and formation of protein complexes or higher-order assemblies. DisProt has recently been revamped, and MobiDB now propagates DisProt disordered regions by homology transfer. Regions homologous to experimentally characterized IDRs are mapped across homologs obtained from GeneTree alignments (37). Aligned regions with sequence identity and similarity both above 80% over at least 10 residues are retained as homologous IDRs. Gene3D (38) contributes complementary order annotation to the MobiDB consensus calculation, while Pfam (39) is used to highlight protein domains. Lastly, MobiDB maps CoDNaS information to highlight conformational diversity in globular regions. CoDNaS measures structural differences among conformers of the same protein (40).
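The retention criterion for homology-transferred DisProt regions can be written as a simple filter. This is a sketch under stated assumptions. The `AlignedRegion` record and its fields are hypothetical, and the GeneTree alignment step that would produce the identity and similarity values is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRegion:
    """Hypothetical record: a DisProt IDR mapped onto a homolog."""
    target: str          # accession of the homologous protein
    identity: float      # percent identity over the aligned region
    similarity: float    # percent similarity over the aligned region
    aligned_length: int  # number of aligned residues


def is_transferable(region: AlignedRegion,
                    min_percent: float = 80.0,
                    min_length: int = 10) -> bool:
    """Identity and similarity must both exceed 80%, over at least 10 residues."""
    return (region.identity > min_percent
            and region.similarity > min_percent
            and region.aligned_length >= min_length)


def homologous_idrs(regions: List[AlignedRegion]) -> List[AlignedRegion]:
    """Keep only the mapped regions that pass the transfer criterion."""
    return [r for r in regions if is_transferable(r)]


if __name__ == "__main__":
    candidates = [
        AlignedRegion("P1", identity=85.0, similarity=90.0, aligned_length=12),
        AlignedRegion("P2", identity=85.0, similarity=75.0, aligned_length=30),
        AlignedRegion("P3", identity=95.0, similarity=98.0, aligned_length=9),
    ]
    print([r.target for r in homologous_idrs(candidates)])  # ['P1']
```

Here P2 fails on similarity and P3 fails on alignment length, so only P1 would be retained as a homologous IDR.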
Table 1.

Overview of databases integrated into MobiDB 3.0

Database | Type | Comment | URL
UniProt | Curated | Disorder | http://www.uniprot.org/
DisProt | Curated | Disorder | http://www.disprot.org/
FuzDB | Curated | Disorder | http://protdyn-database.org/
ELM | Curated | LIPs | http://elm.eu.org/
MFIB | Curated | LIPs | http://mfib.enzim.ttk.mta.hu/
DIBS | Curated | LIPs | http://dibs.enzim.ttk.mta.hu/
IDEAL | Curated | LIPs | http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/
Gene3D | Curated/Prediction | Structure | http://gene3d.biochem.ucl.ac.uk/
Pfam | Curated/Prediction | Domains/Families | http://pfam.xfam.org/
CoDNaS | Indirect | Conformational diversity | http://ufq.unq.edu.ar/codnas/

New indirect annotations

Previous releases of MobiDB provided indirect annotations from the PDB in the form of missing residues in X-ray structures and mobile regions in NMR ensembles, as calculated with the Mobi software (41). In the current release, this annotation has been complemented with additional indirect information from experimental data in the PDB. It is also complemented with chemical shifts from the Biological Magnetic Resonance Data Bank (BMRB) (35). The new Mobi 2.0 software (42) is used to extract LIPs and disorder information from PDB files. Disorder is encoded by three types of evidence: high-temperature, missing and mobile residues. High-temperature residues are detected from B-factors in X-ray and cryo-EM structures, using a threshold proportional to the resolution of the structure. Missing residues are identified for all experimental methods by comparing the experimental sequence (i.e. PDB SEQRES records) with the residues observed in the structure (i.e. PDB ATOM records). A mobility estimate is provided for NMR structures by comparing Cα displacement and local conformation across the aligned models (41). LIPs are identified by comparing intra- versus inter-chain contacts calculated with RING (43). A contact between two residues is established from their closest atoms and then classified by chemical type (e.g. hydrogen bond, salt bridge, π−π stack). LIPs are defined as any region where the number of inter-chain contacts is at least twice the number of intra-chain contacts (42). MobiDB 3.0 also better exploits the power of NMR spectroscopy to probe the structural properties of proteins in solution, as well as their dynamics over a wide range of timescales (44). Chemical shifts quantify structural fluctuations of proteins up to the millisecond timescale and are relatively easy to measure.
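Two of the Mobi 2.0 rules described in the preceding paragraph can be sketched as follows: missing residues and the LIP contact ratio. Several simplifications are assumed. Observed ATOM residues are taken to be already mapped to SEQRES positions, whereas real PDB files need an alignment step to handle numbering offsets and insertion codes. Per-residue contact counts are assumed to come from an external tool such as RING. Requiring at least one inter-chain contact is an added guard, not a documented Mobi 2.0 rule. The function names are hypothetical.

```python
from typing import List, Sequence, Set


def missing_residues(seqres_length: int, observed: Set[int]) -> List[int]:
    """1-based SEQRES positions that have no coordinates in the ATOM records."""
    return [pos for pos in range(1, seqres_length + 1) if pos not in observed]


def is_lip_region(intra_contacts: Sequence[int],
                  inter_contacts: Sequence[int],
                  ratio: float = 2.0) -> bool:
    """A region is a LIP candidate when its inter-chain contacts are at
    least `ratio` times its intra-chain contacts (counts given per residue)."""
    intra = sum(intra_contacts)
    inter = sum(inter_contacts)
    # Guard (assumption): a region with no inter-chain contacts is not a LIP.
    return inter > 0 and inter >= ratio * intra


if __name__ == "__main__":
    print(missing_residues(8, {1, 2, 3, 6, 7}))  # [4, 5, 8]
    print(is_lip_region([1, 0, 1], [2, 2, 1]))    # True  (5 >= 2 * 2)
    print(is_lip_region([3, 2], [4, 1]))          # False (5 < 2 * 5)
```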
Using chemical shifts to obtain the statistical populations of different structural motifs gives a more comprehensive structural description of proteins in solution than static structures or binary definitions such as 'ordered' and 'disordered' (44). MobiDB 3.0 uses chemical shift data from BMRB as deposited, without applying chemical shift re-referencing methods. The software packages δ2D (45) and Random Coil Index (RCI) (46) are used to calculate, respectively, two-dimensional ensembles described in terms of secondary structure populations (44) and backbone flexibility. Secondary structure populations are calculated only for residues with measured chemical shifts for at least three atom types, because fewer chemical shifts give less accurate population estimates (45). MobiDB 3.0 reports the experimental conditions under which the chemical shifts were measured. The structural properties of some proteins can change drastically between conditions (e.g. binding partners, lipids, pH), and these changes can help elucidate protein function (44). When a MobiDB entry is associated with multiple chemical shift datasets, an overview of the predominant secondary structure conformation is provided in a consensus track. This track can be expanded in the feature viewer to show the experimental conditions, such as pH, temperature, binding partners, molecular state, sample information and the titles of the corresponding BMRB entries.
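The residue filter and the multi-entry consensus track can be sketched as follows. The three-atom-type cutoff comes from the text. Everything else is assumed for illustration. Per-residue δ2D-style populations are represented as plain dictionaries with illustrative state labels. The consensus is taken as a simple majority vote over each entry's predominant state, a hypothetical rule, since the exact rule MobiDB uses is not specified here.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def residues_with_enough_shifts(shifts: Dict[int, Set[str]],
                                min_atom_types: int = 3) -> List[int]:
    """Residues whose measured chemical shifts cover >= min_atom_types atom types."""
    return sorted(res for res, atoms in shifts.items()
                  if len(atoms) >= min_atom_types)


def predominant_state(populations: Dict[str, float]) -> str:
    """Secondary structure state with the highest population."""
    return max(populations, key=populations.get)


def consensus_state(entries: List[Optional[Dict[str, float]]]) -> Optional[str]:
    """Majority vote over the predominant state of each BMRB entry
    (hypothetical rule); entries with no data for the residue are None."""
    states = [predominant_state(p) for p in entries if p]
    if not states:
        return None
    return Counter(states).most_common(1)[0][0]


if __name__ == "__main__":
    shifts = {1: {"CA", "CB", "N"}, 2: {"CA", "HA"}, 3: {"CA", "CB", "C", "N"}}
    print(residues_with_enough_shifts(shifts))  # [1, 3]
    entries = [
        {"helix": 0.6, "strand": 0.1, "coil": 0.2, "ppii": 0.1},
        {"helix": 0.3, "strand": 0.1, "coil": 0.5, "ppii": 0.1},
        {"helix": 0.7, "strand": 0.0, "coil": 0.2, "ppii": 0.1},
        None,
    ]
    print(consensus_state(entries))  # helix
```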

New predictors

MobiDB 3.0 includes the same set of disorder predictors used in the previous release: ESpritz (47), IUPred (48), DisEMBL (49) and VSL2b (50). Consensus generation is handled by MobiDB-lite (51). It uses a stronger majority threshold and requires at least 20 consecutive disordered residues, which makes its predictions highly specific. This is complemented by a continuous representation of the fraction of methods predicting disorder at each residue. DynaMine (52), ANCHOR (53) and FeSS (54) are now also part of the annotation pipeline. DynaMine predicts backbone flexibility on a scale where 1.0 means complete order (stable conformation, i.e. rigid) and 0 means fully random bond vector movement (highly dynamic, i.e. flexible). ANCHOR predicts binding regions located in disordered proteins, providing LIP annotations for all proteins in the database. FeSS is a component of the FELLS method (54) that provides three-state (helix, sheet, coil) secondary structure propensity. FeSS prediction confidence can be interpreted similarly to the dynamic behavior measured by δ2D from chemical shifts, i.e. as a propensity to remain in a given secondary structure state. The complete list of tools is given in Table 2.
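A MobiDB-lite-style consensus can be sketched as below. The per-residue fraction of predictors calling disorder gives the continuous track. Residues at or above a majority threshold are merged into regions of at least 20 consecutive residues. The 0.625 threshold (5 of 8 methods) is an assumed value for illustration. The real MobiDB-lite may apply further post-processing (for example, handling of short gaps) that is omitted here. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def disorder_fraction(predictions: Sequence[Sequence[bool]]) -> List[float]:
    """Continuous track: fraction of methods predicting disorder at each residue."""
    n_methods = len(predictions)
    length = len(predictions[0])
    return [sum(pred[i] for pred in predictions) / n_methods for i in range(length)]


def long_disordered_regions(predictions: Sequence[Sequence[bool]],
                            threshold: float = 0.625,
                            min_length: int = 20) -> List[Tuple[int, int]]:
    """Runs of residues whose disorder fraction is >= threshold (assumed > 0)
    and that are at least min_length long, as 1-based inclusive (start, end)."""
    fractions = disorder_fraction(predictions)
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing 0.0 sentinel closes a run that reaches the C-terminus.
    for i, frac in enumerate(fractions + [0.0]):
        if frac >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions


if __name__ == "__main__":
    # Eight hypothetical predictors over a 30-residue protein: six call
    # residues 1-22 disordered, two call residues 23-30 disordered.
    preds = ([[True] * 22 + [False] * 8 for _ in range(6)]
             + [[False] * 22 + [True] * 8 for _ in range(2)])
    print(disorder_fraction(preds)[0])     # 0.75
    print(long_disordered_regions(preds))  # [(1, 22)]
```

Raising `threshold` or `min_length` trades sensitivity for specificity, which is the design choice the text describes for MobiDB-lite.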
Table 2.

Overview of tools used in MobiDB 3.0

Tool | Type | Description
Mobi 2.0 | Indirect | Missing, high-temperature and mobile residues from PDB structures
RING 2.0 | Indirect | Residue interactions from PDB structures, used to define LIPs
RCI | Indirect | Random coil index from BMRB chemical shifts
δ2D | Indirect | Secondary structure populations from BMRB chemical shifts
DynaMine | Prediction | Backbone flexibility
FeSS | Prediction | Secondary structure prediction component of FELLS
MobiDB-lite | Prediction | Long disorder based on consensus
DisEMBL | Prediction | Disorder. Versions: 465, Hot-loops
ESpritz | Prediction | Disorder. Versions: DisProt, NMR, X-ray
IUPred | Prediction | Disorder. Versions: Short, Long
VSL2b | Prediction | Disorder
GlobPlot | Prediction | Globular regions, used as the opposite of disorder
SEG | Prediction | Low complexity
Pfilt | Prediction | Low complexity
The MobiDB-lite version used in MobiDB 3.0 has been extended to provide a structural characterization of disordered regions that can help interpret their functional role. It distinguishes different types of disordered regions by measuring their fraction of charged residues and net charge, following a previous classification (55). The types are: positive polyelectrolytes (D_PPE), negative polyelectrolytes (D_NPE), polyampholytes (D_PA) and weak polyampholytes (D_WC). A statistical analysis of these disorder flavours was already performed on the MobiDB 2.0 data (8).
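The charge-based classification can be sketched as below. The cutoffs are assumptions for illustration, loosely adapted from the Das and Pappu diagram of states. In that diagram, regions are weakly charged when the fraction of charged residues is below 0.25, and they are strong polyelectrolytes when the absolute net charge per residue is above 0.35. The sketch folds the intermediate region of that diagram into D_PA for simplicity and counts only K/R and D/E as charged. The exact MobiDB-lite rules may differ, and the function names are hypothetical.

```python
from typing import Tuple

POSITIVE = set("KR")  # histidine treated as neutral (assumption)
NEGATIVE = set("DE")


def charge_properties(sequence: str) -> Tuple[float, float]:
    """Return (fraction of charged residues, net charge per residue)."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    fcr = (n_pos + n_neg) / len(seq)
    ncpr = (n_pos - n_neg) / len(seq)
    return fcr, ncpr


def disorder_flavour(sequence: str,
                     weak_fcr: float = 0.25,
                     strong_ncpr: float = 0.35) -> str:
    """Assign one of the MobiDB-lite labels using hypothetical cutoffs."""
    fcr, ncpr = charge_properties(sequence)
    if fcr < weak_fcr:
        return "D_WC"    # weak polyampholyte / weakly charged
    if ncpr > strong_ncpr:
        return "D_PPE"   # positive polyelectrolyte
    if ncpr < -strong_ncpr:
        return "D_NPE"   # negative polyelectrolyte
    return "D_PA"        # polyampholyte (includes the intermediate region here)


if __name__ == "__main__":
    for region in ("KKKKRRRRGG", "EEEEDDDDGS", "KEKEKEKEGG", "GSGSGSGSKE"):
        print(region, disorder_flavour(region))
```

Because the absolute net charge per residue can never exceed the fraction of charged residues, checking the net charge alone is enough to separate the two polyelectrolyte classes once weakly charged regions are excluded.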

Usage and annotated data

MobiDB now contains all sequences from UniParc, the most comprehensive non-redundant set of protein sequences. Entries are also identified by UniProtKB (36) accession numbers and can be retrieved by organism, taxonomy and other identifiers provided by UniProtKB. Prediction results are combined with two other sources: indirect disorder evidence derived from PDB data (using Mobi 2.0) and data extracted from manually curated third-party databases. DisProt (17) curators use MobiDB annotations to guide the annotation of disordered regions. MobiDB data are publicly available through a web interface with extensive search functionality and through RESTful services for programmatic access. MobiDB 3.0 includes a pre-calculated consensus for all entries. This allows real-time statistics and download of entire datasets in different formats directly from the web interface. The new database schema makes it possible to perform complex search queries and to generate custom datasets, for example retrieving all entries with manually curated annotations. MobiDB updates have been automated and are scheduled every three months because of the high computational cost of generating predictions for new sequences.

DISCUSSION

MobiDB 3.0 improves on previous releases by adding descriptions of conformational diversity and disorder-related functions, in terms of both experimental data and predictions. One area where it may have a significant impact is the establishment of a long-awaited schema relating disorder sequence to function. The most reliable route to this goal is to assess protein function by homology transfer, i.e. transferring functional annotation based on sequence similarity. However, aligning IDR sequences is complicated by their high evolutionary variability, which often limits evolutionary analysis (56,57). New functional terms introduced in the DisProt update (17) represent non-canonical functions, probably characteristic only of IDPs, that are not incorporated in functional classification schemes such as GO (58). A large-scale analysis of IDP functional annotations will be necessary to find adequate boundaries for transferring IDP functions by homology. As sufficient data are now available in MobiDB 3.0, we expect rapid advances in the study of sequence-function correlations of IDPs. For proteins with sufficient NMR data, MobiDB now features quantitative annotations that incorporate structure and equilibrium dynamics in a unified framework. These large-scale quantitative annotations will help to understand the biological role of order and disorder and will serve as a basis for constructing predictive models. NMR measurements of proteins in their native complex environments, such as inside living cells, are becoming more common (59). As they do, we will be able to address fundamental biological questions with greater physiological relevance (60). MobiDB is widely used by the scientific community and by third-party services such as DisProt (17) and ProViz (61). It has recently joined the InterPro consortium to provide disorder annotation alongside protein domains and families (62).
MobiDB is becoming a thematic hub for IDPs within the European sustainable bioinformatics infrastructure (ELIXIR), and we encourage contributions of novel predictors and datasets. Future work will focus on including IDP annotations in core data resources such as UniProt.
References (10 of 61 shown)

1. Determination of secondary structure populations in disordered states of proteins using nuclear magnetic resonance chemical shifts.

Authors: Carlo Camilloni; Alfonso De Simone; Wim F Vranken; Michele Vendruscolo
Journal: Biochemistry | Date: 2012-03-06

2. Liquid-liquid phase separation in biology. (Review)

Authors: Anthony A Hyman; Christoph A Weber; Frank Jülicher
Journal: Annu Rev Cell Dev Biol | Date: 2014

3. Intrinsically disordered proteins in cellular signalling and regulation. (Review)

Authors: Peter E Wright; H Jane Dyson
Journal: Nat Rev Mol Cell Biol | Date: 2015-01

4. Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive. (Review)

Authors: Stephen K Burley; Helen M Berman; Gerard J Kleywegt; John L Markley; Haruki Nakamura; Sameer Velankar
Journal: Methods Mol Biol | Date: 2017

5. Conformational propensities of intrinsically disordered proteins influence the mechanism of binding and folding.

Authors: Munehito Arai; Kenji Sugase; H Jane Dyson; Peter E Wright
Journal: Proc Natl Acad Sci U S A | Date: 2015-07-20

6. Single-Particle Cryo-EM at Crystallographic Resolution. (Review)

Authors: Yifan Cheng
Journal: Cell | Date: 2015-04-23

7. Intrinsically disordered proteins in human diseases: introducing the D2 concept. (Review)

Authors: Vladimir N Uversky; Christopher J Oldfield; A Keith Dunker
Journal: Annu Rev Biophys | Date: 2008

8. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis.

Authors: Jonathan Lees; Corin Yeats; James Perkins; Ian Sillitoe; Robert Rentzsch; Benoit H Dessailly; Christine Orengo
Journal: Nucleic Acids Res | Date: 2011-12-01

9. InterPro in 2017-beyond protein family and domain annotations.

Authors: Robert D Finn; Teresa K Attwood; Patricia C Babbitt; Alex Bateman; Peer Bork; Alan J Bridge; Hsin-Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David Haft; Gemma L Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Aron Marchler-Bauer; Huaiyu Mi; Jaina Mistry; Darren A Natale; Marco Necci; Gift Nuka; Christine A Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon C Potter; Neil D Rawlings; Nicole Redaschi; Lorna Richardson; Catherine Rivoire; Amaia Sangrador-Vegas; Christian Sigrist; Ian Sillitoe; Ben Smithers; Silvano Squizzato; Granger Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Cathy H Wu; Ioannis Xenarios; Lai-Su Yeh; Siew-Yit Young; Alex L Mitchell
Journal: Nucleic Acids Res | Date: 2016-11-29

10. Prediction of protein binding regions in disordered proteins.

Authors: Bálint Mészáros; István Simon; Zsuzsanna Dosztányi
Journal: PLoS Comput Biol | Date: 2009-05-01
