Proteo3Dnet: a web server for the integration of structural information with interactomics data.

Guillaume Postic¹,², Jessica Andreani³, Julien Marcoux⁴, Victor Reys⁵, Raphaël Guerois³, Julien Rey¹, Emmanuelle Mouton-Barbosa⁴, Yves Vandenbrouck⁶, Sarah Cianferani⁷, Odile Burlet-Schiltz⁴, Gilles Labesse⁵, Pierre Tufféry¹.

Abstract

Proteo3Dnet is a web server dedicated to the analysis of mass spectrometry interactomics experiments. Given a flat list of proteins, its aim is to organize it in terms of structural interactions to provide a clearer overview of the data. This is achieved by three means: (i) the search for interologs with resolved structure available in the Protein Data Bank, including cross-species remote homology search, (ii) the search for possibly weaker interactions mediated through Short Linear Motifs as predicted by ELM, a unique feature of Proteo3Dnet, and (iii) the search for protein-protein interactions physically validated in the BioGRID database. The server then compiles this information and returns a graph of the identified interactions and details about the different searches. The graph can be interactively explored to understand how the identified core complexes could interact. It can also suggest undetected partners to the experimentalists, or specific cases of conditionally exclusive binding. The interest of Proteo3Dnet, previously demonstrated for the difficult cases of the proteasome and pragmin complex data, is illustrated here in the context of yeast precursors to the small ribosomal subunits and the smaller interactome of 14-3-3zeta frequent interactors. The Proteo3Dnet web server is accessible at http://bioserv.rpbs.univ-paris-diderot.fr/services/Proteo3Dnet/.

Year: 2021 PMID: 33963857 PMCID: PMC8262742 DOI: 10.1093/nar/gkab332

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

MS-based proteomics techniques, such as the combination of affinity purification and mass spectrometry (AP-MS) (1,2), aim to identify sets of proteins that interact to fulfill cellular functions. In such an experiment, a protein of interest (the ‘bait’) is co-precipitated along with its bound partners (the ‘preys’). After elution, those interacting proteins, as well as their relative amounts, are determined by liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS). The resulting list of identified proteins thus defines the interactome of the bait. In this process, the main drawback is the loss of information about the network organization of protein–protein interactions (PPIs). Thus, proteomics experiments generate ‘flat’ lists of candidate partners as an output. To a lesser extent, information loss may also take the form of false negatives, i.e. proteins of the targeted interactome that are missing from the experimental results. Misleading hits may also occur, for example when two mutually exclusive subunits of a multiprotein complex are both included in the output list. Large-scale methods also give no information about the binding strength of PPIs, as permanent and transient complexes are indistinguishable in the flat lists. Knowing the types of the amino acid residues located at the protein–protein interface would also be particularly valuable for applications such as thorough validation, protein engineering or drug design. Unfortunately, proteomics data do not provide atomic-level knowledge about the molecular mechanisms through which proteins interact. Finally, besides proteomics, it should be noted that such flat lists may also be produced by computational studies, e.g. when considering proteins whose genes are located within the same genomic region as candidate partners of a multimeric complex.
Establishing connectivity between the different proteins of a proteomics-derived list can hardly be carried out manually, as it requires processing large amounts of data from various sources. In this regard, three web servers are currently available to assist biologists in this task. The STRING database (https://string-db.org/) (3), one of the most cited resources for building PPI networks, is a precomputed global resource including both physical interactions and functional associations. It uses multiple types of information, such as genetic interactions, text mining, or experimental and predicted PPIs from other databases. The GeneMANIA web server (http://genemania.org) (4) uses lists of gene IDs as input to perform functional annotation and prediction, by integrating data from PPI databases such as BioGRID (https://thebiogrid.org/) (5) or PathwayCommons (www.pathwaycommons.org) (6). Lastly, Interactome3D (https://interactome3d.irbbarcelona.org/) (7) introduced the use of protein three-dimensional (3D) structures for modeling the protein complexes of a given organism and, thus, may shed light on the protein residues involved in a given interaction. It also includes information retrieved from external resources, such as IntAct (https://www.ebi.ac.uk/intact/) (8). These three web resources are capable of finding interologs, i.e. complex-forming partners predicted on the basis of the interaction between their respective homologous proteins, provided that this interaction has been conserved throughout evolution. This task is performed either through identical gene names across different species or, in the case of Interactome3D, by homology search using BLAST (9). Finally, other useful tools for studying protein complexes can be cited, such as PrePPI (10), ProtCID (11), ComplexPortal (12), CORUM (13), and the XlinkDB database and software tools, specifically dedicated to cross-linking data analysis (14). However, these cannot be compared to the aforementioned servers, as most fall in the database category and focus on different aspects of PPIs. PPI3D (15) is another tool to search for interologs with a 3D interface modeling perspective, but is presently limited to only two sequences.
Recently, we have proposed Proteo3Dnet (16), a new integrative pipeline which stands out from the state of the art mainly by its detection of distant homologies and its combination of domain-domain interactions with motif-domain contacts. We have shown that our method can identify some direct interaction partners to which proteomics techniques and other algorithms are blind. A unique feature of Proteo3Dnet is the integration of information about transient interactions involving Pfam (17) domains and short linear motifs (SLiMs) located in intrinsically disordered regions. Here, we present a web implementation of Proteo3Dnet, which allows for rapid detection of PPI networks from an input list of proteins, and their online exploration.

Data integration

Given a set of protein sequences (bait and preys), Proteo3Dnet provides end-users with an integrative protocol aimed at validating and connecting candidate protein partners detected by interactomics experiments; the information is organized and structured using three means:

Search for complexes of resolved structure

The Proteo3Dnet core processing consists of four steps:
(i) Search for homologs in the Protein Data Bank using HHsearch. We use the precalculated banks available from the HHsuite repository, in which a non-redundant subset of PDB entries at 70% sequence identity is maintained. To reduce the chance of identifying irrelevant homologs, only hits associated with a probability of >95% are retained, a cutoff that in our experience fits well the objective of lowering the risk of missing some remote templates while identifying the relevant ones.
(ii) Enlargement of the collection of hits with the protein chains belonging to the same cluster. Because the initial search is performed on a subset of the PDB, and because non-representatives of the clusters may correspond to PDB entries containing the structures of complexes of interest, this step extends the search to all homologs of the PDB.
(iii) Complex identification, based on the search for PDB entries for which several, if not all, chains of the input are homologous. To avoid incorrect complex assignment that could occur in the unit cell of the crystals, the search is performed considering the biological unit specified by the authors, or ranked one otherwise, as provided by the PDB entries in the mmCIF format. Information about homo-oligomers is also collected from these entries.
(iv) Ranking of the complexes according to the number of chains of the input they encompass, and identification of the chains that bind stable core complexes conditionally.
We emphasize that this search procedure covers the entirety of the PDB and, thus, is not per se biased by the number of structures of complexes that are evolutionarily conserved, nor by species specificity. It simply collects and organizes all the structural information available. Note that in the web server implementation, it is possible to bypass this step by directly querying the Swiss Model Repository (SMR) (18). However, since the SMR is limited to models of high enough confidence, this facility usually leads to fewer hits than the Proteo3Dnet standard protocol. It can nevertheless provide a quick overview of the organization of the data.
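The filtering and ranking logic described above can be illustrated with a minimal sketch. It is not the server's implementation: the hit tuples, the entry names and the rule that an entry must match at least two input proteins to count as a complex are assumptions made for illustration. Only the >95% probability cutoff and the ranking by number of input chains come from the text.

```python
from collections import defaultdict

PROB_CUTOFF = 95.0  # HHsearch probability cutoff (percent) quoted in the text


def rank_complexes(hits, cutoff=PROB_CUTOFF):
    """Rank PDB entries by how many input proteins they cover.

    hits: iterable of (input_id, pdb_entry, chain, probability) tuples.
          This tuple layout is hypothetical, not the HHsearch output format.
    Returns a list of (pdb_entry, sorted_input_ids), best-covered entry first.
    """
    covered = defaultdict(set)
    for input_id, pdb_entry, _chain, probability in hits:
        if probability > cutoff:  # keep only confident homologs
            covered[pdb_entry].add(input_id)
    # Assumption: an entry counts as a complex if it matches >= 2 input proteins.
    complexes = [(entry, sorted(ids)) for entry, ids in covered.items() if len(ids) >= 2]
    # Sort by decreasing coverage, then by entry name for a stable order.
    complexes.sort(key=lambda item: (-len(item[1]), item[0]))
    return complexes
```

In this sketch, an entry in which only one input protein has a confident homolog is dropped, which mirrors the idea that a complex must bring together several chains of the input.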

Search for weak/transient interactions

The processing described above corresponds to the identification of complexes of rather high affinity. It is however well known that weaker and more transient functional interactions can occur. Particularly in eukaryotes, these can be mediated by Short Linear Motifs (SLiMs) (19). These are usually located in disordered regions of proteins, often in missing parts of resolved structures. To overcome this limitation, Proteo3Dnet relies on the ELM database, which links the motifs with interacting Pfam domains. Occurrences of SLiM-Pfam domain pairs correspond to potential PPIs. The ELM database currently gathers 291 linear motifs (‘ELM classes’) which are reported to bind one or several Pfam domains, according to published experiments (‘ELM instances’). To identify transient interactions, Proteo3Dnet searches for these motifs and their corresponding Pfam domains in the proteins of the input list. The resulting connections are then filtered, based on the location of the motifs within intrinsically disordered regions, as predicted by the ANCHOR2 score of IUPred2A (20), computed and averaged over all positions of the motif. The default threshold of 0.95 can be raised (up to 1.0) in the ‘Advanced options’, to increase the specificity of these predicted transient PPIs, thus reducing their number.
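As an illustration of the averaging filter, the following sketch scans a sequence for a motif written as a regular expression and keeps the matches whose mean per-residue score reaches a threshold. The per-residue scores are assumed to have been computed beforehand (e.g. by IUPred2A), the function name and the `>=` comparison are assumptions, and the motif pattern is a toy example rather than an actual ELM class.

```python
import re


def filter_motif_hits(sequence, pattern, scores, threshold=0.95):
    """Return (start, end, mean_score) for motif matches passing the threshold.

    sequence:  protein sequence as a string
    pattern:   motif as a regular expression (toy pattern, not a real ELM class)
    scores:    per-residue scores, one float per residue (assumed precomputed)
    threshold: minimal mean score over the motif positions
    """
    if len(scores) != len(sequence):
        raise ValueError("one score per residue is required")
    kept = []
    # re.finditer reports non-overlapping matches, scanned left to right.
    for match in re.finditer(pattern, sequence):
        window = scores[match.start():match.end()]
        if not window:  # skip empty matches to avoid dividing by zero
            continue
        mean_score = sum(window) / len(window)
        if mean_score >= threshold:
            kept.append((match.start(), match.end(), mean_score))
    return kept
```

Raising `threshold` towards 1.0 keeps fewer matches, which corresponds to the gain in specificity described above.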

Additional sources of information

For cases where no 3D structure can be found, the pipeline integrates interaction data from the Biological General Repository for Interaction Datasets (BioGRID) (5). Only physical (not genetic) interactions between proteins are retrieved, and those associated with the ‘Far Western’, ‘Co-fractionation’, ‘Co-localization’, ‘Biochemical Activity’ and ‘High Through-put’ experimental systems are discarded.
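This selection rule amounts to a simple record filter, sketched below under stated assumptions: each interaction is represented as a dictionary with hypothetical `system` and `system_type` keys, which is not the BioGRID file format, and the excluded labels are copied as written in the text.

```python
# Labels excluded by the text, copied verbatim.
EXCLUDED_SYSTEMS = {
    "Far Western",
    "Co-fractionation",
    "Co-localization",
    "Biochemical Activity",
    "High Through-put",
}


def keep_interaction(record):
    """True for a physical interaction whose experimental system is not excluded.

    record: dict with hypothetical keys 'system' and 'system_type'.
    """
    return (
        record["system_type"] == "physical"
        and record["system"] not in EXCLUDED_SYSTEMS
    )


def filter_interactions(records):
    """Keep only the records that pass keep_interaction, preserving order."""
    return [record for record in records if keep_interaction(record)]
```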

Implementation

Server input

The Proteo3Dnet input consists of a specification of the sequences identified during an interactome experiment. These can be specified either in the form of the UniProt identifiers or full UniProt sequences in the FASTA format. Note that presently, it is not possible to enter data without specifying the UniProt identifier.
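A minimal sketch of how such an input could be normalized to a list of UniProt accessions is given below. It assumes that FASTA headers follow the usual UniProt `>db|accession|entry_name` layout and that bare identifiers are separated by whitespace; the function name is hypothetical, and the cap of 400 proteins is the limit stated in the Runtime section.

```python
MAX_PROTEINS = 400  # maximum number of input proteins stated in the Runtime section


def parse_input(text):
    """Extract unique UniProt accessions from bare identifiers or FASTA records."""
    accessions = []
    if ">" in text:
        # FASTA mode: read the accession from each header, ignore sequence lines.
        for line in text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split("|")
                if len(fields) < 3:
                    raise ValueError(f"no UniProt accession in header: {line}")
                accessions.append(fields[1].strip())
    else:
        # Identifier mode: whitespace-separated accessions.
        accessions = text.split()
    unique = list(dict.fromkeys(accessions))  # drop duplicates, keep input order
    if len(unique) > MAX_PROTEINS:
        raise ValueError(f"at most {MAX_PROTEINS} proteins are accepted")
    return unique
```

Rejecting headers without an accession reflects the constraint that data cannot currently be entered without a UniProt identifier.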

Output

The outputs consist of three main sections: (i) interactive graph exploration, (ii) tables reporting information about the hetero- and homo-multimers identified, along with raw information on the homologs identified, and (iii) for each input protein, a list of experimentally verified interactions from the IntAct database (8), which may include additional protein partners, presented as an interactive PPI matrix (https://github.com/ebi-uniprot/interaction-viewer/). We focus here on the main output, which corresponds to the interactive exploration of the groups of sequences identified as interacting (Figure 1A). This facility is based on cytoscape.js, a JavaScript library for interactive network visualization (https://js.cytoscape.org/) (21).
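To make the data handed to such a viewer concrete, the sketch below serializes a small interaction network into the elements JSON layout that cytoscape.js accepts (a `group` of `nodes` or `edges`, and a `data` object carrying `id`, `source` and `target`). The `kind` field and the node and edge categories are hypothetical labels chosen for illustration; how the server actually encodes its graph is not described in the text.

```python
import json


def to_cytoscape_elements(nodes, edges):
    """Build a cytoscape.js elements list.

    nodes: iterable of (node_id, kind), e.g. kind in {"input", "structure", "biogrid"}
    edges: iterable of (source_id, target_id, kind), e.g. kind in {"stable", "weak", "biogrid"}
    The 'kind' values are hypothetical labels, not the server's vocabulary.
    """
    elements = []
    for node_id, kind in nodes:
        elements.append({"group": "nodes", "data": {"id": node_id, "kind": kind}})
    for source, target, kind in edges:
        elements.append({
            "group": "edges",
            "data": {
                "id": f"{source}-{target}-{kind}",  # unique per edge and type
                "source": source,
                "target": target,
                "kind": kind,
            },
        })
    return elements


if __name__ == "__main__":
    demo = to_cytoscape_elements(
        [("bait", "input"), ("prey1", "input"), ("extra", "structure")],
        [("bait", "prey1", "stable"), ("bait", "extra", "weak")],
    )
    print(json.dumps(demo, indent=2))
```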

Figure 1.

Proteo3Dnet results presentation. (A) The interaction graph can be explored interactively using a JavaScript viewer adapted from cytoscape.js. The top panel provides access to node identification, selection and display modes. Right-clicking the nodes or the edges provides access to further information about the sequences, the structures of the complexes identified (B) and the details of the homology search leading to complex identification (C).

The graph consists of nodes that correspond to proteins and edges that represent the interactions between them. Three types of nodes are distinguished in the graph: (i) input proteins, (ii) proteins not present in the input but interacting with input proteins in at least one complex structure, and (iii) proteins with an interaction suggested by BioGRID (see above). In terms of AP-MS, the second type can correspond either to proteins that could have been detected during the experiment but were, for instance, below the cutoff, or to proteins that interact conditionally with a ‘core’ complex.
Different levels of interaction are proposed. The first one concerns the ability to simplify graph visualization and facilitate its exploration. Different mechanisms for node and edge selection and visualization are proposed. Visualization relies on three levels of visibility (hidden, background, foreground) that, coupled with selection, make it possible to reduce the display to the information of interest. Selection encompasses interactive selection using the mouse, and higher-level, more complex selections using text boxes to specify proteins (nodes), identified complexes or PDB entries. Higher levels of selection include selection by edge type (structurally stable, weak and available in BioGRID), expansion to the neighbors of the visible nodes, or the identification of paths linking distant nodes in the graph. Selections can be assigned to each of the different levels of visibility. Additional mechanisms provide means to handle node/edge display and position in the display, e.g. using the mouse.
The second one concerns backpropagation to the information available about the interactions. Nodes and edges are right-clickable to open pop-up menus. From these menus, it is possible to rapidly get information about the sequences (UniProt), the structures (PDB, MolArt (22)) (Figure 1B) and the raw information about the complexes detected (Figure 1C). Similarly, right-clicking the edges provides access to the PDB complexes in which the nodes are seen together, or to the ELM motif(s) connecting the two nodes.
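Two of the higher-level selections mentioned above, expansion to the neighbors of the visible nodes and identification of a path linking distant nodes, can be sketched on a plain adjacency dictionary. This is a generic breadth-first search written for illustration, with hypothetical function names; it is not the viewer's own code.

```python
from collections import deque


def expand_to_neighbors(adjacency, visible):
    """Return the visible nodes plus all their direct neighbors."""
    expanded = set(visible)
    for node in visible:
        expanded.update(adjacency.get(node, ()))
    return expanded


def shortest_path(adjacency, start, goal):
    """Breadth-first search for one shortest path, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start through the parents.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

Both helpers treat the graph as undirected as long as each edge is listed in the adjacency of both of its endpoints.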

Runtime

Runtime depends on the number of sequences and the complexity of the graph generated. It typically ranges from several minutes for small datasets of fewer than 20 sequences to several tens of minutes when the number of sequences exceeds one hundred. For example, the 62 proteins of the Pragmin dataset (23) were processed in 15 min. For the larger proteasome 20S dataset (n = 192) (24), the calculations were completed in only 20 min. These times were obtained using the ‘Normal’ (i.e. slowest) mode of the server. To avoid very long runs, the maximum number of input proteins currently accepted is 400.

USE-CASES

The advantages of our protocol over the other approaches have recently been discussed through two case studies from published AP-MS data (16). Here we introduce two new case studies showing how Proteo3Dnet can assist users in analysing their data.

Use case of yeast precursors to the small ribosomal subunits

We have queried Proteo3Dnet with a list of interactants obtained by tandem-affinity purification of Saccharomyces cerevisiae precursors to the small ribosomal subunits (hereafter termed pre-40S particles) (25). These 62 proteins include 39 highly abundant Ribosomal Proteins of the Small ribosomal subunit (RPS) and 8 ribosome biogenesis factors (RBFs), together with 14 much less abundant Ribosomal Proteins of the Large ribosomal subunit (RPL) and the Lrg1 protein. Proteo3Dnet identified templates for all 62 proteins (Figure 2A). Of these, 56 were identified in Saccharomyces cerevisiae. There is currently no structure of the yeast Dim1 and Nob1 RBFs, and these were matched to their human (PDB: 1ZQ9) and archaeal (PDB: 2LCQ) counterparts. A human model (PDB: 3EAP) was also proposed for Lrg1. It is important to note that the presence of RNA molecules within ribosomal particles (or any other type of structuring molecule) is not an issue for our pipeline, since it only considers proteins.

Figure 2.

Use cases. (A, B) Use case of S. cerevisiae precursors to the small ribosomal subunits. (A) Snapshot of Proteo3Dnet showing the 62 proteins identified by AP-MS of pre-40S particles (blue), together with potential interactors from BioGRID analysis (green circles). (B) Structural representation of a eukaryotic cytoplasmic pre-40S ribosomal subunit (PDB: 6FAI) (28). Figure generated in UCSF ChimeraX v.0.9 (29). (C) Use case of 14-3-3zeta frequent interactors. The small subset of interactors of 14-3-3zeta is shown within the cytoscape.js viewer of Proteo3Dnet. 3D-based interfaces are highlighted by thicker blue edges and the ELM connection by a thinner blue edge. For the latter, a popup window shows the type of motif identified.

Of the 145 complexes retrieved, ∼80 correspond to the full ribosome (either from yeast, as c006, or from other related species such as c130 (human), c144 (M. tuberculosis) or c065 (pig)). They do not correspond to, but rather include, the pre-40S particles, with low maximal completeness, from 7% (c107) to 60% (c001). In contrast, ∼60 complexes with maximal completeness ranging from 30% (c016) to 100% (c009) and corresponding to small ribosomal subunits were identified, either from yeast (as c009) or from other species. The c009 complex actually corresponds to the S. cerevisiae pre-40S particle (PDB: 6FAI) and served as a positive control here (Figure 2B). It contains 35 (out of 62) entries with 100% sequence identity. The c027 complex is a subcomplex corresponding to the dimeric Enp1/Ltv1 RBFs (PDB: 5WWO). Finally, a couple of complexes (c095, c098…) matched large ribosomal subunits, which were co-purified as traces rather than being true partners of the pre-40S particles. Very similar results were obtained on this example using Interactome3D, although some discrepancies exist. For example, only one interaction is found for Rps6 (with Rps4A) in Interactome3D, whereas Proteo3Dnet reports interactions with all the other 38 RPS. In Interactome3D, the model of Rio2 is based on a distant similarity with an archaeal protein (PDB: 4GYI), while the 3D structure from S. cerevisiae is known (PDB: 6RBD) (26). In addition, this server provides no model for Nob1 while Proteo3Dnet does, using the NMR structure of an archaeal Nob1 (PDB: 2LCQ). Hence, the two servers appeared rather complementary on this example owing to these small variations, although they both clearly matched the macromolecular assembly of the yeast small ribosomal subunit.

Case study of 14-3-3zeta frequent interactors

BioGRID lists three partners (Raf1, Bad and Tau) that are detected >12 times in interaction with the human 14-3-3zeta protein. Complete or partial structures are available for all of them, and we interrogated Interactome3D and Proteo3Dnet to highlight their likely interfaces. Domain–domain interactions are suggested by the two servers to derive complexes for 14-3-3zeta and Raf1 or 14-3-3zeta and Tau, based on known crystal structures (at 100% and 87% sequence identity, respectively). While Interactome3D pinpoints experimental interactions listed in IntAct for 14-3-3zeta and Bad, no structural clue for those interactions is provided. Thanks to an ELM motif for 14-3-3 binding, a structural connection is provided by Proteo3Dnet (Figure 2C). Notably, Bad contains up to 6 canonical 14-3-3 recognition motifs (ELM E-value ∼ 0.0045), four of which correspond to serine phosphorylation sites according to PhosphoSitePlus (https://www.phosphosite.org/). In conclusion, all the detected 3D interactions with Raf1, Tau and Bad correspond to recognition of similar phosphosites by the 14-3-3zeta protein, which suggests competition for the same binding groove.

CONCLUSIONS AND PERSPECTIVES

Protein–protein interactions play a major role in the molecular machinery of life, and their identification through techniques such as AP-MS is highly beneficial. Proteo3Dnet is an online facility to assist their analysis. By performing on-the-fly analysis and not being limited to a per-species perspective, it offers a resource complementary to well-established tools such as Interactome3D, GeneMANIA or STRING. It also embeds unique features, such as the analysis of transient interactions, that are out of the scope of these earlier tools. In the context of the rapidly evolving field of PPI identification and analysis, several evolutions of Proteo3Dnet can be foreseen. Firstly, its detection of remote interologs clearly opens the door to the development of comparative modeling of protein complexes, which could contribute to servers such as InterEvDock3 (27). Secondly, Proteo3Dnet is not per se limited to AP-MS data, though several adaptations could be required to make the current implementation independent of UniProt identifiers.

29 in total

Review 1. Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes.

Authors: Andreas Bauer; Bernhard Kuster
Journal: Eur J Biochem Date: 2003-02

2. Subcellular distribution and dynamics of active proteasome complexes unraveled by a workflow combining in vivo complex cross-linking and quantitative proteomics.

Authors: Bertrand Fabre; Thomas Lambour; Julien Delobel; François Amalric; Bernard Monsarrat; Odile Burlet-Schiltz; Marie-Pierre Bousquet-Dubouch
Journal: Mol Cell Proteomics Date: 2012-12-13 Impact factor: 5.911

3. Interactome3D: adding structural details to protein networks.

Authors: Roberto Mosca; Arnaud Céol; Patrick Aloy
Journal: Nat Methods Date: 2012-12-16 Impact factor: 28.547

4. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data.

Authors: Devin K Schweppe; Chunxiang Zheng; Juan D Chavez; Arti T Navare; Xia Wu; Jimmy K Eng; James E Bruce
Journal: Bioinformatics Date: 2016-04-29 Impact factor: 6.937

5. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding.

Authors: Bálint Mészáros; Gábor Erdos; Zsuzsanna Dosztányi
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971

6. MolArt: a molecular structure annotation and visualization tool.

Authors: David Hoksza; Piotr Gawron; Marek Ostaszewski; Reinhard Schneider
Journal: Bioinformatics Date: 2018-12-01 Impact factor: 6.937

7. GeneMANIA update 2018.

Authors: Max Franz; Harold Rodriguez; Christian Lopes; Khalid Zuberi; Jason Montojo; Gary D Bader; Quaid Morris
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971

8. ProtCID: a data resource for structural information on protein interactions.

Authors: Qifang Xu; Roland L Dunbrack
Journal: Nat Commun Date: 2020-02-05 Impact factor: 14.919

9. PrePPI: a structure-informed database of protein-protein interactions.

Authors: Qiangfeng Cliff Zhang; Donald Petrey; José Ignacio Garzón; Lei Deng; Barry Honig
Journal: Nucleic Acids Res Date: 2012-11-27 Impact factor: 16.971

10. CORUM: the comprehensive resource of mammalian protein complexes-2019.

Authors: Madalina Giurgiu; Julian Reinhard; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Gisela Fobo; Goar Frishman; Corinna Montrone; Andreas Ruepp
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

3 in total

1. InterEvDock3: a combined template-based and free docking server with increased performance through explicit modeling of complex homologs and integration of covariation-based contact maps.

Authors: Chloé Quignot; Guillaume Postic; Hélène Bret; Julien Rey; Pierre Granger; Samuel Murail; Pablo Chacón; Jessica Andreani; Pierre Tufféry; Raphaël Guerois
Journal: Nucleic Acids Res Date: 2021-07-02 Impact factor: 16.971

Review 2. QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure.

Authors: Sucharita Dey; Jaime Prilusky; Emmanuel D Levy
Journal: Front Mol Biosci Date: 2022-01-05

3. Localpdb- a Python package to manage protein structures and their annotations.

Authors: Jan Ludwiczak; Aleksander Winski; Stanislaw Dunin-Horkawicz
Journal: Bioinformatics Date: 2022-02-23 Impact factor: 6.931