A Network Module for the Perseus Software for Computational Proteomics Facilitates Proteome Interaction Graph Analysis.

Abstract

Proteomics data analysis strongly benefits from not studying single proteins in isolation but taking their multivariate interdependence into account. We introduce PerseusNet, the new Perseus network module for the biological analysis of proteomics data. Proteomics is commonly used to generate networks, e.g., with affinity purification experiments, but networks are also used to explore proteomics data. PerseusNet supports the biomedical researcher for both modes of data analysis with a multitude of activities. For affinity purification, a volcano-plot-based statistical analysis method for network generation is featured which is scalable to large numbers of baits. For posttranslational modifications of proteins, such as phosphorylation, a collection of dedicated network analysis tools helps in elucidating cellular signaling events. Co-expression network analysis of proteomics data adopts established tools from transcriptome co-expression analysis. PerseusNet is extensible through a plugin architecture in a multi-lingual way, integrating analyses in C#, Python, and R, and is freely available at http://www.perseus-framework.org .

Keywords: Perseus; computational proteomics; network analysis

Year: 2019 PMID: 30931570 PMCID: PMC6578358 DOI: 10.1021/acs.jproteome.8b00927

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

The study of complex systems[1] is concerned with the question of how the relationships between the parts of a system give rise to its collective behavior. Complex systems often generate emergent properties[2] which are not present in an obvious way in its parts. The interactions between the components of a complex system define a network of connections consisting of nodes and edges. Examples of such networks range over all disciplines of science, including the study of social media networks,[3] scientific collaboration networks,[4] and the human brain and its interconnected neurons as a particularly interesting one. Much of the relevant content is concealed in the network constructed from these interactions and is not visible in the components themselves. For instance, the brain connectome,[5] and not the cellular content of the brain, is believed to make us who we are.[6,7] Similarly, the observation of cellular concentrations of biomolecules without considering their interaction would provide a limited picture that ignores potential emergent properties of the biomolecular complex system. Hence, it is mandatory to study biological systems, such as cellular concentrations of biomolecules, in the framework of network biology.[8] At a fundamental level, all network connections between the cellular biomolecules are biochemical reactions, and their specification in biochemical pathways together with their subcellular spatial distribution would provide complete knowledge about the biological network state of the cell. This collective network of all biochemical reactions contains all metabolic reactions, the signaling cascades, gene regulatory networks, and all complex-forming non-covalent interactions between molecules, as for instance protein–protein interactions (PPIs). 
Due to the limitations of experimental and computational methods to map out this interaction network, we often obtain only partial knowledge about the complete biochemical reaction network from experiments. Networks are, however, not limited to describing fundamental physicochemical interactions between biomolecules. For instance, in a gene co-expression network analysis,[9] one looks for similarity of expression patterns of gene products over many samples. Strongly correlated expression implies that these genes have some kind of non-physical interaction; e.g., they are part of the same transcriptional regulatory program, or they share membership in the same pathway or protein complex. However, the exact relationship in terms of biochemical reactions remains unknown with these and other techniques. Hence, in these cases, networks describe a more coarsely grained level of detail, in which relationships between molecules are not necessarily biochemical reactions, but of a more general kind. Computational proteomics is a mature data science that copes well with the large amounts of data produced in mass spectrometry (MS) experiments.[10] Perseus is an established framework for the downstream bioinformatics analysis of quantitative proteomics data.[11,12] The initial version of Perseus provided a comprehensive framework and set of activities to analyze data matrices originating from quantitative proteomics in a workflow environment. The main idea behind Perseus is to enable researchers in biomedical sciences to perform data analysis themselves. Here we describe how we extend this program to the analysis of biological networks in the context of proteomics. While cytoscape[13] exists as the de facto standard for network analysis and visualization, many proteomics-specific tasks for the generation and analysis of networks are lacking from this framework, as well as workflow navigation. 
PerseusNet fills this gap and enables non-computational experts to perform complete network-based analysis of their data. We explicitly do not want to re-invent existing methods and algorithms. Instead, we designed an extensible framework that integrates with existing tools, like cytoscape, and interoperates with existing code and scripts from the network analysis community that were written in diverse languages, like Python and R. The data structures within Perseus that hold the networks were set up in a way that facilitates studying dynamic changes in networks and finding differential network properties over complex experimental designs. Side-by-side analysis of networks with data matrices in a common workflow environment allows for a seamless transition between matrix-centric and network-centric approaches. In the following we start with a general description of the new network framework in Perseus, including how it enables multilingual programming and usage of code resources from R and Python. We then introduce the new volcano-plot based analysis workflow scalable to large affinity purification-mass spectrometry (AP-MS) datasets. We describe how general and, more specifically, large-scale PPI networks are handled and curated in Perseus. A section on the analysis of posttranslational modification (PTM)-induced networks, like kinase–substrate relationships for phosphoproteomics, is next. Finally, we cover co-expression analysis in Perseus and its applications to clinical proteomics.

Experimental Section

Creating Interaction Networks from Pulldown Experiments

We created an interaction network from a pull-down screen.[14] First, .RAW files were obtained from PRIDE (PXD003758) and processed with MaxQuant version 1.6.2.10. Mouse protein sequences were downloaded from UniProt (release 2017_07). Parameters “matching between runs” and “LFQ” were selected in addition to the default parameters. Downstream analysis of the ‘proteinGroups.txt’ output table was performed in Perseus using the tools described in this Article. Columns for baits Eed, Ring1b, and Bap1 and their controls in the ESC and NPC cell lines were selected and log transformed. Quantitative profiles were filtered for missing values, and were filtered independently for each of the bait control pairs, retaining only proteins that were quantified in all three replicates of either the bait or control pull-down. Missing values were imputed (width 0.3, down shift 1.8) before combining the tables and performing the multi-volcano analysis (Table S1). The s0 and FDR parameters of the multi-volcano analysis for Class A (higher confidence, s0 = 1, FDR = 0.01%) and Class B (lower confidence, s0 = 1, FDR = 0.2%) were chosen by visual inspection, aiming for a low number of significantly depleted proteins in any of the experiments. Class C interactions, which are based on profile correlation between bait and prey, were not considered in this network due to the limited number of pull-downs in the dataset, which would result in inaccurate correlation estimation. Edges representing known protein complex interactions were annotated in the network. Due to missing mouse CORUM annotations for any of the baits, mouse CORUM annotations were obtained by mapping between mouse and human homologues as listed in the MGI database.[15]
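The imputation step above (width 0.3, down shift 1.8) can be sketched as follows. This is a minimal illustration, assuming the usual convention that missing values are drawn from a normal distribution whose standard deviation is the width times the observed standard deviation and whose mean is lowered by the down shift times the observed standard deviation, computed per column. The function name and the example values are hypothetical and not taken from Perseus.

```python
import numpy as np

def impute_from_shifted_normal(values, width=0.3, down_shift=1.8, seed=0):
    """Replace NaNs in one column of log intensities with draws from a
    narrowed, down-shifted normal distribution (assumed convention)."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    observed = values[~missing]
    mean, sd = observed.mean(), observed.std(ddof=1)
    rng = np.random.default_rng(seed)
    imputed = values.copy()
    # Draw one value per missing entry from N(mean - down_shift*sd, (width*sd)^2).
    imputed[missing] = rng.normal(mean - down_shift * sd, width * sd,
                                  size=int(missing.sum()))
    return imputed

# Toy column of log-transformed intensities with two missing values.
column = np.array([25.1, 26.3, np.nan, 24.8, np.nan, 27.0])
filled = impute_from_shifted_normal(column)
```

Observed values are left untouched; only the missing entries are filled, so low-abundance proteins absent from a control pull-down receive plausible background-level intensities.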

Approximately Scale-Free Topology of the STRING Interaction Network

We downloaded the human STRING interaction network (v10.5) from the STRING website. After filtering for high confidence interactions (combined score > 0.9), the scale-free fit index was calculated according to ref (16). Node degrees were calculated, and the frequency of each degree was plotted against the degree on a log–log scale. The R² of a linear fit in log–log space represents the scale-free fit index.
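The fit index described above can be sketched in a few lines. This is an illustrative simplification, not the procedure of ref (16): it fits a line to the raw degree frequencies rather than to binned degrees, and the function name and the toy degree list are hypothetical.

```python
from collections import Counter

import numpy as np

def scale_free_fit_index(degrees):
    """Return (R^2, slope) of a linear fit to log10(frequency) vs. log10(degree)."""
    counts = Counter(d for d in degrees if d > 0)  # degree 0 has no logarithm
    k = np.array(sorted(counts), dtype=float)
    freq = np.array([counts[int(d)] for d in k], dtype=float)
    freq /= freq.sum()
    x, y = np.log10(k), np.log10(freq)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - residuals.var() / y.var()
    return r_squared, slope

# Toy degree list whose frequencies follow k^-2 exactly.
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
r_squared, slope = scale_free_fit_index(toy_degrees)
```

An R² close to 1 together with a negative slope indicates an approximately scale-free degree distribution.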

Network Analysis of a Phosphoproteomic Dataset of EGF Stimulation

Two separate analysis tools, PHOTON and KSEA, were applied to the same experimental dataset of 9184 phosphorylation sites with high localization probability (>0.75)[17] (Table S2). Log2 fold-changes for EGF from two replicates were averaged. For PHOTON analysis, we first generated a high-confidence PPI network. We downloaded all interactions from HIPPIE and filtered them for high-confidence interactions (confidence > 0.72), additionally removing high-degree nodes so that only nodes with degree < 700 were retained. Nodes in the HIPPIE network are identified by their Entrez GeneID. Therefore, the experimental data were mapped from UniProt to Entrez GeneIDs before the nodes of the network were annotated. Phosphorylation sites with multiple GeneIDs were mapped to all matching nodes in the network. We then performed PHOTON analysis with adjusted default parameters. Network reconstruction with ANAT was enabled with the 100 highest scoring proteins and EGF as anchor (GeneID 1950). Additionally, we increased the number of permutations to 100 000. The KSEA analysis was performed on the human site-specific kinase–substrate network from PhosphoSitePlus.[18] Data and network were matched on the basis of UniProt identifiers.
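The two network filters used for the PHOTON input (edge confidence and node degree) can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: node degrees are counted after the confidence filter, and a node is treated as a hub when its degree reaches the cutoff. The function name, the toy identifiers, and the toy confidence values are hypothetical.

```python
from collections import defaultdict

def filter_ppi(edges, min_confidence=0.72, max_degree=700):
    """edges: iterable of (node_a, node_b, confidence) tuples.

    Keeps edges above the confidence threshold, then drops every edge
    that touches a node whose degree is max_degree or higher."""
    confident = [(a, b, c) for a, b, c in edges if c > min_confidence]
    neighbors = defaultdict(set)
    for a, b, _ in confident:
        neighbors[a].add(b)
        neighbors[b].add(a)
    hubs = {node for node, nb in neighbors.items() if len(nb) >= max_degree}
    return [(a, b, c) for a, b, c in confident if a not in hubs and b not in hubs]

# Toy network with integer node identifiers; the cutoff is lowered to 3
# so that the effect is visible on five edges.
toy_edges = [
    (1, 2, 0.95),
    (2, 3, 0.90),
    (2, 4, 0.80),
    (3, 5, 0.85),
    (4, 5, 0.50),  # removed by the confidence filter
]
filtered = filter_ppi(toy_edges, min_confidence=0.72, max_degree=3)
```

In the toy example node 2 has three confident neighbors and is removed as a hub, leaving only the edge between nodes 3 and 5.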

Co-expression Analysis of a Clinical Proteomics Dataset

Protein quantification data and clinical annotation were obtained from Yanovich et al.[19] SILAC ratios were first transformed to log(light/heavy). The dataset was filtered for the 43 patients unique to ref (19). Using global hierarchical clustering of the patients, four outlier samples were identified and removed from the dataset. Additionally, proteins with fewer than 70% valid values were removed from the dataset, and the resulting patient profiles were Z-scored (Table S3). Following the WGCNA workflow,[16] the power parameter for the co-expression analysis was selected using the ‘Soft-threshold’ activity provided by PluginCoExpression. Co-expression analysis was performed in a signed network with biweight midcorrelation and the power parameter set to 10. The eigengene of each co-expression module was correlated with the provided clinical data using Pearson correlation and clustered using hierarchical clustering.
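Two building blocks of this workflow, the signed soft-threshold adjacency and the module eigengene, can be sketched as follows. This is an illustration only and not the WGCNA implementation called by PluginCoExpression: Pearson correlation is swapped in for biweight midcorrelation to keep the example short, and the function names and the simulated data are hypothetical.

```python
import numpy as np

def signed_adjacency(profiles, power=10):
    """profiles: samples x proteins matrix without missing values.

    Signed network: adjacency = ((1 + correlation) / 2) ** power.
    Pearson correlation is used here instead of biweight midcorrelation."""
    correlation = np.corrcoef(profiles, rowvar=False)
    return ((1.0 + correlation) / 2.0) ** power

def module_eigengene(profiles):
    """First principal component (across samples) of a standardized module."""
    z = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # Orient the eigengene so that it follows the average module profile.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Simulated module: five proteins that follow one shared profile plus noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=39)
module = np.column_stack([shared + 0.1 * rng.normal(size=39) for _ in range(5)])
adjacency = signed_adjacency(module, power=10)
eigengene = module_eigengene(module)
```

The eigengene is a single per-sample summary of the module, which is what gets correlated with the clinical variables.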

PluginInterop Provides a Central Entry Point for All External Plugins

The PluginInterop project is written in the C# programming language and implements several Perseus plugin APIs. For users it provides a number of activities in Perseus for executing script files written in the Python and R languages. Upon selection of any of these activities, users will be prompted with a parameter window, allowing them to pass additional arguments to the script and requiring them to specify the executable that should be used for processing. Since Perseus does not include an installation of Python or R, users will have to install those and any other dependencies separately. PluginInterop aids the user by trying to automatically detect an existing installation and provide meaningful error messages in case of missing dependencies. Developers can additionally leverage the functionality implemented in PluginInterop as a basis for parametrized scripts. In general, developers are free to choose which external scripting language or program they would like to utilize. We found the R and Python scripting languages to be most useful, which is why we provide two companion libraries, ‘perseuspy’ and ‘PerseusR’, to be used alongside PluginInterop. These libraries aid the communication between Perseus and the script. The communication between Perseus and external scripts is straightforward and is easily implemented for any tools of choice. In short, Perseus will persist all necessary data to the hard-drive and call the specified tool with specific command-line arguments. The first arguments contain all the parameters specified by the user, per choice of the developer, either in an XML format or simply separated by spaces. Second, the input data from the workflow is saved to a temporary location which is passed to the script. The final arguments specify the expected location of the output data. 
The external process can provide status and progress updates to the user, as well as detailed error reporting by printing to stdout/stderr and indicating success or failure through the exit code. Once the process exits, Perseus will parse the output data from its expected location and insert it into the workflow. Any step in the pipeline is customizable for advanced scenarios, such as custom data formats. The PluginInterop binary is automatically included in the latest Perseus version. The source code was published under the permissive, open-source MIT license on GitHub (https://github.com/cox-labs/PluginInterop). The website also provides more information on how to develop plugins, including a video demonstration. The plugins presented in this Article are all developed on top of PluginInterop and the perseuspy and PerseusR companion libraries.
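The script side of this exchange can be sketched with the Python standard library alone. The sketch assumes the argument order described above (parameters, then input location, then output location) and treats the input as a plain tab-separated table with a single header row; real Perseus files carry additional annotation rows, which the companion libraries handle. The function name and the added column are hypothetical.

```python
import csv
import sys

def main(argv):
    """Hypothetical external script: argv = [script, parameters, input, output]."""
    if len(argv) != 4:
        # Errors are reported on stderr and through a non-zero exit code.
        print("usage: script.py <parameters> <input> <output>", file=sys.stderr)
        return 1
    _, parameters, input_path, output_path = argv
    print(f"received parameters: {parameters!r}")  # status update on stdout

    with open(input_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]

    # Placeholder processing step: append a running row index as a new column.
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header + ["Row index"])
        writer.writerows(row + [str(i)] for i, row in enumerate(body))
    return 0
```

A real script would end with `sys.exit(main(sys.argv))`, so that the return value becomes the exit code that signals success or failure to the calling program.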

Library Support for Scripting Languages

We implemented libraries in R and Python which facilitate the interoperability of Perseus with external scripting languages. The main aim of these libraries is to map the data structures of Perseus to a counterpart native to the external language. Developers proficient in these languages will be more comfortable and productive with these native data structures. The largest benefit comes from the resulting integration with the existing data science ecosystem, all now available to Perseus plugin developers. The ‘perseuspy’ module provides data mappings for the Python language. The Perseus expression matrix is mapped to the ‘DataFrame’ object of the popular ‘pandas’ module, which is tightly integrated with ‘numpy’, the de facto standard for numerical computations in Python. The Perseus network collection data type maps to a list of networks from the ‘networkx’ package, which features a variety of graph algorithms and interfaces well with other modules, due to its usage of standard Python dictionaries. ‘perseuspy’ is distributed via The Python Package Index (PyPI), allowing for easy installation of the module for developers and users alike. The code of ‘perseuspy’ is published under the permissive, open-source MIT license, and is available alongside usage examples and more information on https://github.com/cox-labs/perseuspy. For the R language, we implemented the ‘PerseusR’ package. It provides a mapping of the Perseus expression matrix to a custom wrapper class around the R ‘data.frame’ object. The wrapping was necessary to represent Perseus-specific information such as annotation rows. Alternatively, developers can load data as a Bioconductor ‘ExpressionSet’ object, which enables interfacing with the entire Bioconductor bioinformatics suite. Currently there is no support for network collections in ‘PerseusR’, but we plan to implement it in the near future.
‘PerseusR’ is also published under the MIT license and its code is available on https://github.com/cox-labs/PerseusR. ‘PerseusR’ is easily installed directly from CRAN.

Implementation of PluginPHOTON

We implemented a Perseus plugin for the PHOTON tool on top of the functionality provided by PluginInterop and perseuspy. PHOTON was previously capable of running only a single experiment at a time with a fixed human PPI network. We expanded its implementation to allow for parallel processing of any number of experiments on any network. These changes make large datasets from any species directly amenable to PHOTON analysis. PluginPHOTON is published under the MIT license, its code is available on https://github.com/jdrudolph/photon, and it is included in the latest Perseus release.

Implementation of PluginCoExpression

We implemented parts of the WGCNA pipeline as a Perseus plugin. PluginCoExpression provides access to the WGCNA functions implemented in the R language via PluginInterop and PerseusR.

Implementation of KSEA in Perseus

KSEA analysis was implemented in Perseus and tested for correctness against the reference implementation.

Results and Discussion

Workflow-Based Biological Network Analysis

PerseusNet was devised to fulfill the computational needs of proteomics researchers wishing to accomplish network analysis of their data. While it is extensible through a new plugin application programming interface (API), and hence any network analysis functionality can be implemented, most tools needed for proteomics research and connecting it to generic network analysis platforms are included in the software (Figure 1). Dedicated activities for analyzing AP-MS datasets and phosphoproteomics experiments in the context of kinase–substrate networks belong to the basic infrastructure of PerseusNet. The most common standard data formats (tab, txt, csv, gml, sif, json) are supported as input. An extended multi-language plugin API allows leveraging many existing tools in the analysis workflow. As an important example, co-expression clustering tools are integrated in this way.

Figure 1

Schematic overview of the new network functionality in Perseus. PerseusNet implements a number of processing and analysis steps facilitated by the network collection data type. While including proteomics-centric analyses, such as the analysis of interaction screens, the network module also provides a number of general-purpose tools, for instance for network annotation, filtering, and topology determination. With the extension of the Perseus plugin API to networks and furthermore to other programming languages, it becomes possible to integrate existing network analysis tools in Perseus. Networks are easily imported to and exported from Perseus, due to its support for standard formats.

To accommodate PerseusNet, we extended the Perseus framework with a new data type termed network collection (Figure 2) that represents a set of one or more networks which are analyzed jointly in the workflow. Different networks within the same network collection can, for instance, represent networks derived from different individuals (patients), experimental conditions, or biological replicates. All information in the network collection is organized in data tables, leveraging the existing augmented data matrix[11] in Perseus. General information on the networks in the collection is stored in the networks table, where each row represents an individual network. Here, sample-related annotations, such as calculated global network properties, can be stored to enable their usage in analysis activities operating on a network collection. For instance, if the samples correspond to different patients, the networks table can hold patient-specific information as derived from patient records or questionnaires. These variables can then be used as independent or confounding factors in statistical analysis of the networks.

Figure 2

Schematic representation of the network collection data type. User-facing information is displayed in tabular form with tables listing the networks in the collection, as well as providing detailed information on the nodes and edges of each network. Internally an auxiliary graph data structure aids in the implementation of graph algorithms. Node- and edge-mapping provide the required cross-references between the tabular and graph representation.

The nodes and edges of each individual network are stored in a pair of separate tables. The nodes table further describes the entities in the network, while the edges table provides details on the connections between the entities. The entities in the nodes table can be annotated with local network properties, such as the node degree. In case the entities correspond to proteins, biologically meaningful annotations could include membership in gene ontology terms, pathways, or protein complexes. Similarly, edges can be annotated in the edges table with properties of pairwise relationships between proteins, as, for instance, interaction confidence measures. All of these properties are then accessible to the network analysis tools. Furthermore, all mentioned tables can be sorted and searched, allowing all information to be browsed and inspected intuitively. Internally, a graph data structure for each network enables the efficient execution of graph algorithms. We did not aim to include generic graphical representation of networks as node-link diagrams, since this can be achieved in other tools such as Cytoscape, for which we provide simple adaptors for the transfer of networks. However, several activities include specialized visualizations tailored to specific analyses. In Perseus, all data analysis steps are performed within a graphical workflow (see Figure S1).
Enabled by the newly implemented network collection, the Perseus workflow is now capable of all import, processing, and analysis steps in the side-by-side analysis of expression matrices and networks. All data imported into Perseus is represented as a separate entity in the workflow. Any matrix or network undergoing a processing step is not modified in place but rather becomes a new entity that gets connected to the original data in the workflow. By inspecting both input and output data, every step in the analysis is traceable and easily understood. Certain processing steps allow for the transformation of matrices into networks and vice versa, or the mapping of data between the two. As a result, any analysis performed in Perseus, potentially including several side-by-side processing steps of networks and matrices, always remains transparent to the user.
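The network collection described in this section (a networks table, plus a nodes table, an edges table, and an auxiliary graph per network) can be mirrored in a small data structure. This is a conceptual sketch, not the Perseus implementation: the class names, the column names ("Node", "Source", "Target", "Degree"), and the assumption of undirected edges are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Network:
    nodes: list  # nodes table: one dict per node, keyed by column name
    edges: list  # edges table: one dict per edge with "Source" and "Target"
    adjacency: dict = field(init=False)  # auxiliary graph structure

    def __post_init__(self):
        # Build the graph from the tables; node names cross-reference both views.
        self.adjacency = {row["Node"]: set() for row in self.nodes}
        for row in self.edges:
            self.adjacency[row["Source"]].add(row["Target"])
            self.adjacency[row["Target"]].add(row["Source"])

    def annotate_degree(self):
        """Write a local network property back into the nodes table."""
        for row in self.nodes:
            row["Degree"] = len(self.adjacency[row["Node"]])

@dataclass
class NetworkCollection:
    networks: list  # networks table: one dict of annotations per network
    members: dict   # network name -> Network

toy = Network(
    nodes=[{"Node": "A"}, {"Node": "B"}, {"Node": "C"}],
    edges=[{"Source": "A", "Target": "B", "Confidence": 0.9},
           {"Source": "A", "Target": "C", "Confidence": 0.7}],
)
toy.annotate_degree()
collection = NetworkCollection(
    networks=[{"Name": "toy", "Condition": "control"}],
    members={"toy": toy},
)
```

Keeping the tables as the user-facing view and deriving the graph from them reflects the split between tabular and graph representation described above: annotations stay sortable and searchable, while graph algorithms operate on the adjacency structure.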

Multilingual Plugin Activities

The network collection data structure (Figure 2) and the extended Perseus workflow provide the foundation for enabling various network analyses, many of which are available in Perseus. In general, networks either originate from external sources or are created in a data-driven manner from within the workflow. To facilitate the import of external networks into the workflow, we implemented parsers for standard network formats, such as edge table (.tab|.txt|.csv), GraphML (.gml) (http://graphml.graphdrawing.org/), Cytoscape’s simple interaction format (.sif) (http://manual.cytoscape.org/en/stable/Supported_Network_File_Formats.html), and D3js’s JSONgraph (.json) (http://jsongraphformat.info/), which enable loading interactions from most popular network databases, including STRING,[20] BioGRID,[21] IntAct,[22] CORUM,[23] and PhosphoSitePlus.[18] Furthermore, specific quantitative expression data, such as AP-MS, drives the creation of novel PPI networks, and phosphoproteomics datasets allow for a more detailed view or construction of kinase–substrate relationship networks. Specialized visualizations of such networks are provided (see later sections), which allow for an intuitive visual inspection of the results of the analysis. Perseus is not limited to physical interaction networks: co-expression clustering provides a powerful alternative to regular hierarchical clustering for expression proteomics studies. Finally, any network collection can be exported from the workflow in a plain text file format (Supplementary Data 1) for sharing or use in any other external tools, such as Cytoscape. In order to accommodate these new capabilities in the Perseus plugin system, we extended the Perseus plugin API with new programming interfaces for the network collection and other associated data types, as well as the respective import, processing, and analysis interfaces (see Figure S2).
This fully featured API is available to all developers wishing to extend Perseus’s functionality with plugins. All analyses presented in this Article adhere to the new API. In order to better leverage the existing network analysis ecosystem, we additionally implemented a new mode of interoperability between Perseus and external tools (Figure 3). The PluginInterop project enables this functionality and allows the user to run external tools from within the Perseus workflow, most prominently scripts written in the popular R and Python languages. Open-source companion libraries for R (PerseusR, https://github.com/cox-labs/PerseusR) and Python (perseuspy, https://github.com/cox-labs/perseuspy) provide utilities for interfacing with Perseus. As a result, network analysis tools originally implemented in external tools can run from within the Perseus workflow with only minor adjustments. The implementations of the PHOTON and WGCNA plugins presented in this Article are based upon PluginInterop and its companion libraries. Instructions for interested developers on how to write scripts for Perseus or how to adapt existing tools can be found on the PluginInterop website (https://github.com/cox-labs/PluginInterop). In the following sections, we will present a number of network analyses which are now implemented in Perseus, with focus on their application to different types of proteomics data.
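As an illustration of what importing one of the standard formats listed above involves, the following sketch turns Cytoscape's simple interaction format into an edge table. It is not the parser used in Perseus; it only assumes the documented SIF layout, in which each line holds a source node, an interaction type, and one or more target nodes. The function name is hypothetical.

```python
def parse_sif(text):
    """Parse SIF text into a list of (source, interaction, target) rows."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # If a line contains tabs, only tabs delimit fields, so node names
        # may contain spaces; otherwise any whitespace is a delimiter.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node name declares a node without edges
        source, interaction, targets = fields[0], fields[1], fields[2:]
        edges.extend((source, interaction, target) for target in targets)
    return edges

sif_text = "A pp B C\nD\nE\tpp\tF G\n"
edge_table = parse_sif(sif_text)
```

Each resulting row corresponds to one row of an edges table, with the interaction type available as an edge annotation.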

Figure 3

Schematic of the Perseus plugin system. Plugins written in C# are native to Perseus and implement their functionality directly on top of the application programming interfaces and data structures provided by the application framework. PluginInterop enables the execution of scripts in the Python and R languages, as well as other external programs. By communicating via the file system, data are transferred between Perseus and the external program. The companion libraries ‘perseuspy’ and ‘PerseusR’ enable developers to access the data science ecosystem in their language of choice. For custom graphical user interface elements and an improved user experience of external tools, developers can implement a thin C# wrapper class that extends the generic functionality of PluginInterop.

Affinity Enrichment MS Interactomics

Affinity purification or enrichment coupled to MS analysis has become a powerful tool for interrogating PPIs.[24,25] Not only is it able to provide a detailed view on proteins of interest, but it can also determine the basic building blocks for the assembly of large-scale PPI networks.[26,27] Historically, protein complex members were detected by subjecting the sample to a series of purification steps followed by MS identification. With the advent of quantitative MS, detecting even transient interactions has become possible by relying not on the identification itself, but instead on quantitative information. The sample is not purified but only enriched for the protein of interest and its interaction partners and then subjected to MS quantification.[28] Confidently identifying bona fide interactions and distinguishing them from background binders, arising from off-target binding or contamination, require data analysis of replicate case and control measurements. Compared to purely fold-change-based methods, statistical tests provide a powerful way to compare case and control samples by calculating a test statistic and an associated p-value and limit the number of false-positives. For visual inspection of the results, the (negative logarithm of the) p-value can be plotted against the size of the effect, i.e., the difference between the means of logarithmic abundances, in a so-called volcano plot. Since one statistical test is performed for each protein, which amounts to a large number of tests performed simultaneously, the significance level needs to be adjusted to avoid increased numbers of false positives due to the multiple hypothesis testing problem.[29] A popular strategy to adjust for multiple testing is to control the false discovery rate (FDR), which can be achieved by permutation-based methods. 
Furthermore, in the volcano plot method it is necessary to define the functional form of the curves that separate significant from non-significant hits, either by straight lines or, in a more sophisticated way, introduced in the significance analysis of microarrays (SAM) method,[30] by modifying the t test statistic with the background variance parameter s0. This standard workflow is available in Perseus but becomes increasingly cumbersome for interaction screens with more than a handful of baits. Parameter values for s0 and the FDR thresholds are often applied separately for each pulldown, inviting overfitting and cherry-picking, and requiring the results to be combined manually afterwards. We implemented the interactive multi-volcano plot (Figure 4a) to analyze interaction screens with arbitrarily many baits and conditions simultaneously. Given the experimental design of the dataset, defined by baits and conditions, the analysis is applied to each experiment. FDR thresholds and s0 parameters for two confidence classes, Class A (high) and Class B (low), can be selected globally. For sufficiently large datasets, instead of dedicated control samples, an internal control can be assembled from the dataset for each pulldown consisting of pulldowns of other, unrelated baits. The results can be inspected through an interactive user interface. All volcano plots are displayed in the overview panel. A multi-functional detail panel shows more information on selected plots and provides zoom, protein selection, and labeling options. If a single plot is selected, the volcano plot is shown in the detail panel. When two plots are selected, the t test differences between the selected experiments are plotted against each other, highlighting changes in the enrichment of proteins between experiments (Figure 4b). Additionally, all data can be browsed in tabular form, making it easily searchable and allowing for rich styling options.
Known interactors or gene ontology annotations matching the experiment can be used to highlight proteins in the plot and can serve as a positive control for the adjustment of test parameters. Since all test parameters are controlled on a global level, overfitting and cherry-picking of parameter values are effectively prevented. We integrated the multi-volcano analysis into the new network module. Results from PPI screens can be exported as network objects into the Perseus workflow. A specialized node-link visualization based on the open-source cytoscape.js library[31,32] with multiple layers of information allows for easy interpretation of the results (Figure 4c). A PPI network that was newly created in this way can be integrated with existing networks or exported in various formats using the functions available through the network module.
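The s0-modified test statistic underlying each volcano plot can be sketched as follows. The sketch follows the general SAM form, in which s0 is added to the standard error in the denominator of a two-sample t statistic; it leaves out the permutation-based FDR and is not the Perseus code. The function name and the toy intensities are hypothetical.

```python
import numpy as np

def s0_modified_t(case, control, s0=1.0):
    """case, control: proteins x replicates arrays of log intensities.

    Returns the difference of means and the s0-modified t statistic."""
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = case.shape[1], control.shape[1]
    difference = case.mean(axis=1) - control.mean(axis=1)
    pooled_variance = ((n1 - 1) * case.var(axis=1, ddof=1)
                       + (n2 - 1) * control.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    standard_error = np.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    # s0 damps the statistic of proteins with small differences and tiny variance.
    return difference, difference / (standard_error + s0)

# Toy data: protein 0 is enriched in the pull-down, protein 1 is unchanged.
case = [[10.0, 10.2, 9.8], [5.0, 5.1, 4.9]]
control = [[5.0, 5.2, 4.8], [5.0, 5.1, 4.9]]
difference, statistic = s0_modified_t(case, control, s0=1.0)
_, plain_t = s0_modified_t(case, control, s0=0.0)
```

With s0 = 0 the statistic reduces to the ordinary two-sample t statistic; larger values of s0 give the difference of means more weight relative to the variance, which bends the significance curves in the volcano plot.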

Figure 4

AP-MS. (a) The Hawaii plot provides an overview over an entire dataset, in this case consisting of three baits in two conditions (Table S1 and Experimental Section).[14] Each volcano plot displays the results of a pull-down of a specific bait (Bap1, Eed, Ring1b) in one of the ESC or NPC cell lines. Significant interactors are determined using a permutation-based FDR and the resulting high-confidence Class A (solid line) and low-confidence Class B (dashed lines) thresholds are displayed in the plot. In this case, Class B interactions could be found only in the Bap1 ESC pull-down. Class A interactors are displayed in dark gray, other proteins are shown in light gray. (b) Enrichment plot comparing the Eed pull-downs in ESC and NPC cell lines. Significant interactors in either of the two conditions are displayed in black, nonsignificant proteins are displayed in light gray. Proteins differentially enriched in one of the two conditions will be located far from the diagonal and can be identified visually. (c) Visualization of the resulting protein interaction network for both cell lines. Bait proteins are colored in green, and their interactors are colored in blue. Thick lines represent Class A interactions, thinner lines Class B. Interactions which were already annotated in the human CORUM database are highlighted in red.

As an example application, we obtained pull-down experiments of Polycomb group proteins from ref (14), covering the three baits Bap1, Eed, and Ring1b in mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs). The filtered dataset contained 2995 proteins (Table S1). Using the new multi-volcano analysis (Figure 4a), we obtained an interaction network connecting the bait proteins with their significantly enriched prey proteins. Bait proteins were identified by their gene name, as specified in the annotation rows of the dataset. In order to have a consistent representation, the protein groups of the preys are also identified by their gene names.
The resulting network contained 134 nodes and 140 edges. The results were comparable to the original publication, with overlaps between 55% (Ring1b ESC) and 91% (Bap1 ESC) between the previously reported interactions and the detected Class A interactions. Differences can be explained by the slightly different methodology used in this Article. We used the s0-modified t test with s0 set to 1.0, and FDRs of 0.01% and 0.2% for Class A and B, respectively, while the authors of ref (14) used individually chosen fold-change and p-value cutoffs for each experiment. No Class C interactions were included. Using the built-in visualization features, such as the enrichment between experiments, we identified several interactions that were conditional on the cell type (Figure 4b). By annotating the newly created protein-interaction network with known complex interactions from CORUM and inspecting the resulting node-link network visualization (Figure 4c), previously known and possibly novel interactions could be distinguished. Further confidence in the existence of an interaction between a protein identified in a pull-down and the bait can be obtained by correlation analysis. The correlation of the intensity profiles over many pull-downs with the bait intensity profile is reported in the output tables, together with the volcano-plot-derived significance of the interaction. When assembling the interaction network, a threshold is applied to this correlation in order to define an additional class of interactions (Class C), which might not have been found by volcano plot analysis (Class A and Class B). This workflow is especially appealing for interaction screens with a large number of bait proteins.
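The effect of the s0 parameter can be illustrated with a minimal sketch. It assumes the SAM-style form of the statistic, in which a constant s0 is added to the pooled standard error of an ordinary two-sample t test. The function name and the intensity values are hypothetical, and the permutation-based FDR that defines the Class A and Class B thresholds is not reproduced here.

```python
import math
from statistics import mean


def s0_modified_t(group_a, group_b, s0=0.0):
    """Two-sample t statistic with a constant s0 added to the standard error.

    Assumption: SAM-style definition d = (mean_a - mean_b) / (se + s0),
    with se the pooled standard error. With s0 = 0 this reduces to the
    ordinary equal-variance t statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / (std_err + s0)


# Hypothetical log2 intensities (pull-down vs. control), three replicates each.
# Protein 1: small fold change, very low noise.
small_bait, small_ctrl = [10.0, 10.2, 9.8], [9.0, 9.2, 8.8]
# Protein 2: four times the fold change and four times the noise.
large_bait, large_ctrl = [14.0, 14.8, 13.2], [10.0, 10.8, 9.2]

t_small = s0_modified_t(small_bait, small_ctrl, s0=0.0)
t_large = s0_modified_t(large_bait, large_ctrl, s0=0.0)
d_small = s0_modified_t(small_bait, small_ctrl, s0=1.0)
d_large = s0_modified_t(large_bait, large_ctrl, s0=1.0)

print(f"ordinary t:  {t_small:.3f} vs {t_large:.3f}")
print(f"s0 = 1.0:    {d_small:.3f} vs {d_large:.3f}")
```

Both hypothetical proteins have the same ordinary t statistic, but with s0 = 1.0 the protein with the larger fold change scores clearly higher. A nonzero s0 therefore favors interactors that are both reproducible and strongly enriched.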

Importing, Curating, and Probing Large-Scale PPI Networks

While protein interaction screens can uncover novel or condition-specific interactions, a wealth of detected and predicted interactions is already stored in PPI databases.[33] Analyzing large-scale PPI networks jointly with other omics data holds great promise. However, a major obstacle to systems-level analysis of these large-scale networks is the lack of easy-to-use software that transparently handles their processing and analysis. Many studies under-utilize the existing resources and mostly report the interactions of a single protein as an afterthought. In the following, we introduce the new network capabilities of Perseus to assemble, filter, and understand large-scale PPI networks, which lay the foundation for any network analysis. The first task is assembling a high-confidence interaction network. Many databases, such as STRING,[20] BioGRID,[21] or HIPPIE,[34] allow researchers to download all interactions in a tabular format, which can be easily loaded into Perseus, even at sizes of up to a few million interactions. Supporting information on the interactions, such as the interaction type or a measure of confidence, remains available at each step of the subsequent data analysis. Networks need not originate from a single data source. Perseus provides all necessary tools to integrate information from any source, providing full control over the choice of identifier and the handling of duplication and ambiguity in the mapping. Conversely, generalized interaction networks such as STRING can be filtered by interaction type to generate a physical interaction network.
Confidence measures often integrate diverse knowledge into a single score, derived from how often, and by which experimental technique, an interaction was detected, combined with more abstract measures, such as co-expression and literature co-occurrence of the interaction partners.[20] There are two approaches to confidence-aware network analysis (Figure 5a). Applying a cutoff to the confidence score removes low-confidence interactions from the network, which is especially useful when applying methods that treat all interactions equally. The cutoff can be chosen according to the confidence score distribution and the targeted network size (Figure 5b). Other methods operate on weighted networks and distinguish between interactions with high or low confidence. In this case, the confidence scores can be used as edge weights. In addition to static confidence scores, one can devise dynamic confidence scores from experimental data which reflect, e.g., changes in abundance or localization of any of the interactors.
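The two approaches can be sketched on a tabular edge list. This is a minimal illustration outside of Perseus; the protein names, the scores on a 0 to 1 scale, and the function names are hypothetical.

```python
def filter_by_confidence(edges, cutoff):
    """Hard threshold: keep only edges whose score reaches the cutoff."""
    return [(a, b, score) for a, b, score in edges if score >= cutoff]


def edges_remaining(edges, cutoffs):
    """Network size as a function of the cutoff, to guide its choice."""
    return {c: sum(1 for _, _, score in edges if score >= c) for c in cutoffs}


def to_weighted_adjacency(edges):
    """Weighted alternative: keep every edge and use the score as weight."""
    adjacency = {}
    for a, b, score in edges:
        adjacency.setdefault(a, {})[b] = score
        adjacency.setdefault(b, {})[a] = score
    return adjacency


# Hypothetical interactions: (protein, protein, confidence score).
edges = [
    ("A", "B", 0.95),
    ("A", "C", 0.40),
    ("B", "C", 0.72),
    ("C", "D", 0.15),
]

size_by_cutoff = edges_remaining(edges, [0.0, 0.4, 0.7, 0.9])
high_confidence = filter_by_confidence(edges, 0.7)
weights = to_weighted_adjacency(edges)

print(size_by_cutoff)
print(high_confidence)
```

The dictionary returned by `edges_remaining` corresponds to the curve that relates the cutoff to the size of the filtered network, while the weighted adjacency retains the low-confidence edge between C and D with a correspondingly small weight.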

Figure 5

Handling large-scale protein interaction databases in Perseus. (a) Interactions in PPI databases are often annotated with confidence scores derived from various sources. Perseus provides tools to load and combine confidence scores derived from a variety of data sources, including dynamic confidence adjustments based on condition-specific data. High-confidence networks can be obtained by removing edges below a given hard threshold; alternatively, confidence scores can be used directly as edge weights, thereby allowing for the inclusion of lower-confidence interactions. (b) Histogram of the combined confidence score from the human STRING PPI network. Superimposed in orange is the number of interactions in the filtered network if the edges with scores lower than the current value were removed. Filtering out low-confidence edges leads to a significant reduction in the number of edges in the final network. (c) Log–log plot of the node degree against the degree frequency generated from the human STRING PPI network. The R2 value of the linear fit (orange) to these data represents the scale-free fit index.

A deeper understanding of the network requires a different perspective in addition to the interaction-centric view. Any list of interactions can be converted into a network collection with a single click. A dedicated set of network-specific processing activities is then available. While processing the list of interactions, the focus remains on the edges of the network. In the network view, the focus shifts to the nodes. With the powerful identifier and data mapping mechanisms in Perseus, nodes are easily annotated with various annotations, such as gene ontology (GO),[35] or quantitative proteomics data. Any annotation can subsequently be used to filter the nodes of the network. One could, for example, extract a sub-network of proteins associated with a specific GO category and their interactions from the large-scale network.
Using the data mapping from, e.g., deep proteomes of specific cell lines or tissues, condition-specific sub-networks can be created. Further understanding is gained by studying the intrinsic properties of networks. By calculating node degrees, corresponding to the number of neighbors of each node in the network, hub nodes can be distinguished from peripheral nodes. By analyzing the distribution of the node degrees in the network, global network properties, such as approximate scale-freeness,[36,37] of the topology can be identified (Figure 5c). Furthermore, intrinsic local network properties, like the node degree, can be correlated with biological properties derived from protein annotations or experimental data. The proper construction of large-scale interaction networks and understanding of their basic properties are central to the successful application of more specialized analyses such as the integration of such networks with PTM data.
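As a minimal sketch of these two intrinsic properties, the node degree and a scale-free fit index can be computed from an edge list as follows. The fit index is taken here as the R2 of a straight-line fit of log10 degree frequency against log10 degree, without any binning of degrees; this is a simplifying assumption, and the node names and function names are hypothetical.

```python
import math
from collections import Counter


def node_degrees(edges):
    """Number of distinct neighbors per node; self-loops are ignored."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def scale_free_fit(degrees):
    """Slope and R2 of the linear fit log10(frequency) ~ log10(degree)."""
    counts = Counter(degrees.values())
    total = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(counts[k] / total) for k in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if n < 2 or sxx == 0 or syy == 0:
        raise ValueError("need at least two distinct degrees and frequencies")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical toy network: one hub (H) and four peripheral nodes.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]

degrees = node_degrees(edges)
hubs = sorted(degrees, key=degrees.get, reverse=True)
slope, r_squared = scale_free_fit(degrees)

print(degrees)
print(f"slope = {slope:.2f}, R2 = {r_squared:.2f}")
```

A clearly negative slope together with an R2 close to 1 indicates an approximately scale-free degree distribution. The toy network is far too small for such a conclusion and only demonstrates the calculation.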

Network Analysis of PTM Data

The MS-based study of PTMs is now possible on a global scale for several types of modifications. The best known example is MS-based phosphoproteomics,[38] which is a powerful tool for interrogating signaling events on a large scale. However, drawing conclusions directly from phosphorylation changes is challenging, due to the mostly missing functional information on the inhibitory or excitatory action of a specific protein phosphorylation at a specific site. Network-based approaches for the analysis of phosphorylation data derive functional information on the protein level by interrogating the phosphorylation changes observed in the network neighborhood.[17,39,40] We implemented the popular kinase–substrate enrichment analysis[39] (KSEA) tool for predicting kinase activities in Perseus. Site-specific kinase–substrate networks (Figure 6a) assign kinases to the experimentally observed phosphorylation sites. The core of the analysis is the calculation of a series of scores (mean, enrichment, Z-score, p-value, q-value) for each kinase, based on the quantitative phosphorylation changes of its substrates. These predicted kinase activities can be analyzed further to find differentially activated kinases. KSEA most often utilizes the curated kinase–substrate network from the PhosphoSitePlus database.[18,41,42] In order to extend the coverage of the network and thereby allow for the utilization of a larger fraction of the experimental data, the network can be supplemented with predicted kinase–substrate interactions from tools such as NetworKIN[43,44] or with low-specificity interactions derived from kinase target sequence motifs.
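The per-kinase scores can be sketched as follows. The sketch assumes the commonly cited KSEA formulation, in which the Z-score compares the mean log2 fold change of a kinase's substrates with the mean over all quantified sites, scaled by the square root of the number of substrates and the standard deviation over all sites. The Perseus implementation may differ in detail, and the site identifiers, kinase names, and values are hypothetical. The q-value (multiple-testing correction) is omitted.

```python
import math
from statistics import mean, stdev


def ksea_scores(site_log2fc, kinase_substrates):
    """Mean, Z-score and one-sided p-value per kinase.

    Assumption: z = (mean_substrates - mean_all) * sqrt(m) / sd_all,
    with m the number of quantified substrate sites of the kinase.
    """
    all_values = list(site_log2fc.values())
    background_mean = mean(all_values)
    background_sd = stdev(all_values)
    results = {}
    for kinase, sites in kinase_substrates.items():
        observed = [site_log2fc[s] for s in sites if s in site_log2fc]
        if not observed:
            continue  # kinase without any quantified substrate site
        m = len(observed)
        substrate_mean = mean(observed)
        z = (substrate_mean - background_mean) * math.sqrt(m) / background_sd
        p = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # normal tail probability
        results[kinase] = {"substrates": m, "mean": substrate_mean, "z": z, "p": p}
    return results


# Hypothetical log2 fold changes of six phosphorylation sites.
site_log2fc = {"S1": 2.0, "S2": 1.5, "S3": -0.2, "S4": 0.1, "S5": -1.0, "S6": 0.0}
# Hypothetical site-specific kinase-substrate network; S9 was not quantified.
kinase_substrates = {
    "KinaseA": ["S1", "S2", "S9"],
    "KinaseB": ["S5"],
    "KinaseC": ["S9"],
}

scores = ksea_scores(site_log2fc, kinase_substrates)
for kinase, row in scores.items():
    print(kinase, row["substrates"], round(row["z"], 2), round(row["p"], 3))
```

Only sites that are both quantified and present in the kinase–substrate network contribute to a score, which is why extending the network increases the fraction of the experimental data that can be used.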

Figure 6

Network analysis of an EGF stimulation phosphoproteomics study. (a) Comparison of network topologies used for the analysis of phosphoproteomics data. Nodes in the network are represented as gray circles or pie charts where each slice represents the observed phosphorylation changes at a specific site on the protein. Physical protein–protein interactions (left side) are present between all classes of proteins and are by definition undirected. In order to capture the enzymatic action of kinases more accurately, directed interactions (right side) from kinase to substrate are defined in a site-specific manner. (b) KSEA Z-score and PHOTON signaling functionality scores derived from phosphoproteomics data measured after EGF stimulation (Table S2) correlate only weakly with each other (Pearson correlation 0.52). Kinases annotated in GO with the term ‘Epidermal growth factor receptor signaling pathway’ are highlighted in red. Both methods assign high scores to central members of the expected pathway. (c) Signaling network reconstructed by PHOTON from the 100 highest scoring proteins anchored at EGF. The interactive visualization has an automatic layout and phosphorylation data overlay.

PHOTON,[17] now available in Perseus, is an alternative approach to KSEA that calculates more broadly defined signaling functionality scores for any protein, rather than activities for kinases only. A data-annotated large-scale PPI network serves as the input (Figure 6a). The resulting signaling functionality scores for each experimental condition are based on the observed phosphorylation in the neighborhood of each protein and are assigned a significance by a permutation scheme.
The scores can either be analyzed directly, to find proteins with differentially changing signaling functionality, or utilized in a second step of the PHOTON pipeline, in which signaling pathways that connect the proteins with significant signaling functionality are automatically reconstructed from the large-scale network.[17] The Perseus network module allows for performing both KSEA and PHOTON analysis on the same experimental data[17] and a choice of networks.[18,34] When applied to a phosphoproteomics dataset of EGF stimulation,[17] the trade-offs of both methods in terms of coverage can be compared at every step of the analysis by inspecting the matrices and network collections in the workflow. For KSEA, 583 (5.66%) phosphorylation sites could be mapped to 975 (9.43%) site-specific kinase–substrate interactions. As expected, PHOTON provided more coverage, with 9148 (99.82%) sites mapped to 2070 (16.87%) nodes in the PPI network. Due to the differences in the utilized methodologies and the chosen networks, the resulting scores will differ but can be compared with the analyses and visualizations provided by Perseus (Figure 6b). While KSEA is tailored to the analysis of phosphoproteomics data due to its focus on kinase–substrate interactions, PHOTON is not limited to phosphorylation. Any quantitative, large-scale PTM dataset can be mapped to the PPI network, signaling functionality scores can be calculated, and sub-networks can be reconstructed. Both tools support the analysis of datasets with multiple conditions, effectively transforming the peptide-level phosphorylation data into protein-level scores. The entire well-established toolset for the analysis of protein quantification data can be applied to these scores, including hierarchical clustering, enrichment analysis,[17] and time-series analysis.
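The idea of scoring a protein by the phosphorylation observed in its network neighborhood and assessing the score by permutation can be illustrated with a deliberately simplified stand-in. It is not the published PHOTON scoring function: the score used here is a plain confidence-weighted mean of the neighbors' values, and the network, the values, and all names are hypothetical.

```python
import random


def neighborhood_score(node, adjacency, node_values):
    """Confidence-weighted mean of the values measured on a node's neighbors."""
    weighted_sum = 0.0
    weight_total = 0.0
    for neighbor, weight in adjacency.get(node, {}).items():
        if neighbor in node_values:
            weighted_sum += weight * node_values[neighbor]
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


def permutation_p_value(node, adjacency, node_values, n_permutations=1000, seed=0):
    """Empirical p-value obtained by shuffling the values across the nodes."""
    observed = neighborhood_score(node, adjacency, node_values)
    if observed is None:
        return None, None  # no neighbor carries data
    rng = random.Random(seed)  # fixed seed for reproducibility
    names = list(node_values)
    values = [node_values[name] for name in names]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(values)
        permuted = dict(zip(names, values))
        if neighborhood_score(node, adjacency, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical weighted PPI network (edge weight = confidence score).
adjacency = {
    "hub": {"n1": 0.9, "n2": 0.8, "n3": 0.7},
    "isolated": {"unmeasured": 0.5},
}
# Hypothetical protein-level phosphorylation changes (log2).
node_values = {"n1": 2.0, "n2": 1.8, "n3": 1.5, "n4": -0.3,
               "n5": 0.1, "n6": -0.8, "n7": 0.2, "n8": -0.1}

score, p_value = permutation_p_value("hub", adjacency, node_values)
print(f"score = {score:.4f}, p = {p_value:.4f}")
```

Because the score depends only on the neighbors, a protein such as the hypothetical hub receives a score even if it was not itself measured. This is what allows a neighborhood-based method to cover more of the network than a kinase-centric one.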
To visualize PTM data in the context of any network, we implemented an interactive visualization directly in Perseus (Figure 6c) using the cytoscape.js library.[31] The visualization allows for the joint visual inspection of the networks, e.g., sub-networks reconstructed by PHOTON, and the measured data. Browsing the quantitative PTM data in a reduced and highly structured network view while also considering the signaling functionality scores allows for the generation of hypotheses that explain the signal transduction mechanistically.

Co-expression Clustering and Clinical Data

When performing co-expression analysis, the correlation matrix between the proteins in the dataset describes a fully connected, weighted network, in which the weight on each edge denotes the correlation between the quantitative profiles of the two proteins (Figure 7a). Hence, the actual network usually remains implicit. A hierarchical clustering of the co-expression network can utilize the network neighborhood of each protein and integrate it into the similarity calculation.[45] The cluster dendrogram and the detected co-expression modules are then transferred back to the original data, where their interpretation is equivalent to ordinary hierarchical clustering. In addition to the clustering, a representative expression profile, termed the module eigengene, is generated for each of the clusters. This highly reduced view of the data can be correlated with clinical or phenotype data and clustered to gain a better understanding of the behavior of the detected clusters (Figure 7b). The described co-expression analysis is available in Perseus through the R language interface provided by PluginInterop, which interfaces directly with the established WGCNA library.[16]
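The equivalence between a correlation matrix and a fully connected weighted network can be made concrete with a small sketch. It assumes the unsigned soft-thresholding used in WGCNA-style analyses, in which the edge weight is the absolute correlation raised to a power, and it uses Pearson correlation for brevity. The profiles, the power of 6, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, power):
    """Fully connected network with edge weight |correlation| ** power."""
    names = sorted(profiles)
    adjacency = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = abs(pearson(profiles[a], profiles[b])) ** power
            adjacency[a][b] = weight
            adjacency[b][a] = weight
    return adjacency


def connectivity(adjacency):
    """Weighted analogue of the node degree: sum of edge weights per node."""
    return {node: sum(edges.values()) for node, edges in adjacency.items()}


# Hypothetical protein profiles over four samples.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.5],
    "P3": [4.0, 1.0, 3.0, 2.0],
}

adjacency = soft_threshold_adjacency(profiles, power=6)
k = connectivity(adjacency)
print(adjacency["P1"])
print(k)
```

Raising the correlation to a power suppresses weak correlations much more strongly than strong ones, so the implicit network becomes sparse in terms of weight although every pair of proteins remains formally connected.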

Figure 7

Co-expression network analysis on clinical data. (a) The correlation matrix is an equivalent representation of a fully connected network with edge weights corresponding to the correlation between the proteins. (b) Co-expression clustering and identified co-expression modules annotate the original expression matrix. Phenotype data can be correlated with representative co-expression module profiles and provide a high-level interpretation of the modules. (c) Selection of the power parameter for the Yanovich et al.[19] dataset (Table S3 and Experimental Section). The lowest power at which the scale-free fit index came close to 0.9 (red line) was selected. (d) Co-expression cluster dendrogram. Each color corresponds to one co-expression module. (e) Correlation heat map between module eigengenes and clinical parameters.

We applied the WGCNA co-expression analysis to parts of a cancer proteomics dataset[19] following the recommended workflow (http://www.peterlangfelder.com/wgcna-resources-on-the-web/) from within Perseus. Bi-weight midcorrelation, a robust alternative to Pearson correlation, was chosen to calculate correlations between all pairs of proteins. In order to obtain a scale-free co-expression network, a power parameter of 10 was selected (Figure 7c), leading to an approximately scale-free network with a scale-free fit index of 0.9. Hierarchical clustering of the co-expression network identified 30 modules (Figure 7d). The representative expression profile of each module, as provided by the corresponding module eigengene, was correlated with the available clinical annotations. This high-level overview of the data was then visualized in a heat map (Figure 7e). Several modules showed high correlations with specific clinical annotations. The magenta module showed high correlation with the triple-negative subtype (TN) and was significantly enriched for the ‘interferon-gamma-mediated signaling pathway’ GO category (q = 1.12 × 10^-5).
The top module hub genes with kME > 0.8 were GBP1, TAP1, TAPBP, HLA-A, TAP2, STAT1, and EML4. The purple module showed high correlation with Stage III, but when inspecting the profile of its eigengene, we found it to have a single peak at one patient while being flat for all others. Hence, a set of proteins that are highly expressed in only a single patient dominates the purple module, thereby limiting the validity of the module.
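How a module eigengene and the kME-based hub criterion relate can be sketched in a few lines. The sketch takes the eigengene as the first principal component of the standardized module profiles, computed by power iteration, and kME as the Pearson correlation of each profile with the eigengene. This follows the usual WGCNA definitions as an assumption and is not the code executed by Perseus; the four proteins and their profiles are hypothetical, and profiles are assumed to be non-constant.

```python
import math


def standardize(profile):
    """Center a profile and scale it to unit sample standard deviation."""
    n = len(profile)
    m = sum(profile) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in profile) / (n - 1))
    return [(x - m) / sd for x in profile]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_eigengene(profiles, iterations=200):
    """First principal component across samples, found by power iteration."""
    rows = [standardize(p) for p in profiles.values()]
    n_samples = len(rows[0])
    v = list(rows[0])  # start vector: first standardized profile
    for _ in range(iterations):
        # Multiply by X^T X, where X holds the standardized profiles as rows.
        scores = [sum(r[j] * v[j] for j in range(n_samples)) for r in rows]
        v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene along the average standardized profile.
    average = [sum(r[j] for r in rows) / len(rows) for j in range(n_samples)]
    if sum(a * b for a, b in zip(v, average)) < 0:
        v = [-x for x in v]
    return v


def module_membership(profiles):
    """kME: correlation of each protein profile with the module eigengene."""
    eigengene = module_eigengene(profiles)
    return {name: pearson(p, eigengene) for name, p in profiles.items()}


# Hypothetical module: three co-varying proteins and one unrelated protein.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "P2": [2.0, 3.0, 5.0, 7.0, 9.0],
    "P3": [1.0, 1.0, 2.0, 4.0, 4.0],
    "P4": [5.0, 1.0, 4.0, 2.0, 3.0],
}

kme = module_membership(profiles)
hub_proteins = [name for name, value in kme.items() if value > 0.8]
print({name: round(value, 2) for name, value in kme.items()})
print(hub_proteins)
```

Inspecting the eigengene itself, as done above for the purple module, remains necessary: a profile dominated by a single sample can still yield high kME values for the proteins that share that peak.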

Software Implementation, Download, and Maintenance

The Perseus network module PerseusNet is implemented in the C# programming language using Visual Studio 2017, like the whole Perseus software. PerseusNet is distributed with Perseus by default and can be downloaded from http://www.perseus-framework.org. The current version, which is described in this Article, is 1.6.2.3. The PluginInterop and PHOTON plugins are also included in the standard download. In the current release, it is recommended to use Windows as the operating system, although Linux support is underway, realized in the same way as for the MaxQuant software,[46,47] by ensuring Mono compatibility. A plugin API enables external programmers to extend the functionality of PerseusNet and Perseus in general by programming their own workflow activities. Plugin extensions by the user community will be linked from the plugin store at http://www.coxdocs.org/doku.php?id=perseus:user:plugins:store upon request. Context-specific documentation is linked from each activity (Figure S3). Step-by-step guides for the integration of external tools, such as Python or R, which have to be installed and configured separately from the main Perseus software, are available online (https://github.com/jdrudolph/PluginInterop). A help forum for Perseus and PerseusNet is available at https://groups.google.com/group/perseus-list. Bugs that are reproducible in the latest available software version should be reported at https://maxquant.myjetbrains.com/youtrack. All presented analyses and necessary installations take less than an hour altogether.

Conclusions

We introduced PerseusNet, the network analysis extension for the Perseus software. It enables proteomics researchers to perform most network analyses themselves. PerseusNet is highly extensible through a plugin API and its extension to R and Python, which allows for the incorporation of a plethora of existing scripts and programs from the network community. We envision that a large part of the future programming will be done not by local developers but by the global community through the plugin API. Programmers can release their plugins under licenses of their choice. We have implemented powerful proteomics-specific activities for AP-MS network generation and PTM-related network analysis, presumably the two main applications of networks in proteomics. We plan to extend PerseusNet in the near future with activities from other proteomics sub-domains, such as interaction determination by protein correlation profiling[48] and large-scale network generation from cross-linking experiments on whole-cell lysates.[49]
