Jan Daniel Rudolph1, Jürgen Cox1,2. 1. Computational Systems Biochemistry, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany. 2. Department of Biological and Medical Psychology, University of Bergen, Jonas Liesvei 91, 5009 Bergen, Norway.
Abstract
Proteomics data analysis strongly benefits from not studying single proteins in isolation but taking their multivariate interdependence into account. We introduce PerseusNet, the new Perseus network module for the biological analysis of proteomics data. Proteomics is commonly used to generate networks, e.g., with affinity purification experiments, but networks are also used to explore proteomics data. PerseusNet supports the biomedical researcher in both modes of data analysis with a multitude of activities. For affinity purification, a volcano-plot-based statistical analysis method for network generation is featured, which is scalable to large numbers of baits. For posttranslational modifications of proteins, such as phosphorylation, a collection of dedicated network analysis tools helps in elucidating cellular signaling events. Co-expression network analysis of proteomics data adopts established tools from transcriptome co-expression analysis. PerseusNet is extensible through a plugin architecture in a multi-lingual way, integrating analyses in C#, Python, and R, and is freely available at http://www.perseus-framework.org.
The
study of complex systems[1] is concerned
with the question of how the relationships between the parts of a
system give rise to its collective behavior. Complex systems often
generate emergent properties[2] which are
not present in an obvious way in their parts. The interactions between
the components of a complex system define a network of connections
consisting of nodes and edges. Examples of such networks are found in
all disciplines of science and include social media networks,[3] scientific collaboration networks,[4] and, as a particularly interesting case, the human brain with its interconnected neurons.
Much of the relevant content is
concealed in the network constructed from these interactions and is
not visible in the components themselves. For instance, the brain
connectome,[5] and not the cellular content
of the brain, is believed to make us who we are.[6,7] Similarly,
the observation of cellular concentrations of biomolecules without
considering their interaction would provide a limited picture that
ignores potential emergent properties of the biomolecular complex
system. Hence, it is mandatory to study biological systems, such as
cellular concentrations of biomolecules, in the framework of network
biology.[8]

At a fundamental level,
all network connections between the cellular
biomolecules are biochemical reactions, and their specification in
biochemical pathways together with their subcellular spatial distribution
would provide complete knowledge about the biological network state
of the cell. This collective network of all biochemical reactions
contains all metabolic reactions, the signaling cascades, gene regulatory
networks, and all complex-forming non-covalent interactions between
molecules, as for instance protein–protein interactions (PPIs).
Due to the limitations of experimental and computational methods to
map out this interaction network, we often obtain only partial knowledge
about the complete biochemical reaction network from experiments.
Networks are, however, not limited to describing fundamental physicochemical
interactions between biomolecules. For instance, in a gene co-expression
network analysis,[9] one looks for similarity
of expression patterns of gene products over many samples. Strongly
correlated expression implies that these genes have some kind of non-physical
interaction; e.g., they are part of the same transcriptional regulatory
program, or they share membership in the same pathway or protein complex.
However, the exact relationship in terms of biochemical reactions
remains unknown with these and other techniques. Hence, in these cases,
networks describe a more coarsely grained level of detail, in which
relationships between molecules are not necessarily biochemical reactions,
but of a more general kind.

Computational proteomics is a mature
data science that copes well
with the large amounts of data produced in mass spectrometry (MS)
experiments.[10] Perseus is an established
framework for the downstream bioinformatics analysis of quantitative
proteomics data.[11,12] The initial version of Perseus
provided a comprehensive framework and set of activities to analyze
data matrices originating from quantitative proteomics in a workflow
environment. The main idea behind Perseus is to enable researchers
in biomedical sciences to perform data analysis themselves. Here we
describe how we extend this program to the analysis of biological
networks in the context of proteomics. While Cytoscape[13] exists as the de facto standard for network
analysis and visualization, many proteomics-specific tasks for the
generation and analysis of networks are lacking from this framework,
as well as workflow navigation. PerseusNet fills this gap and enables
non-computational experts to perform complete network-based analysis
of their data. We explicitly do not want to re-invent existing methods
and algorithms. Instead, we designed an extensible framework that
integrates with existing tools, like Cytoscape, and interoperates
with existing code and scripts from the network analysis community
that were written in diverse languages, like Python and R. The data
structures within Perseus that hold the networks were set up in a
way that facilitates studying dynamic changes in networks and finding
differential network properties over complex experimental designs.
Side-by-side analysis of networks with data matrices in a common workflow
environment allows for a seamless transition between matrix-centric
and network-centric approaches.

In the following we start with
a general description of the new
network framework in Perseus, including how it enables multilingual
programming and usage of code resources from R and Python. We then
introduce the new volcano-plot-based analysis workflow scalable to
large affinity purification-mass spectrometry (AP-MS) datasets. We
describe how general and, more specifically, large-scale PPI networks
are handled and curated in Perseus. A section on the analysis of posttranslational
modification (PTM)-induced networks, like kinase–substrate
relationships for phosphoproteomics, is next. Finally, we cover
co-expression analysis in Perseus and its applications to clinical
proteomics.
Experimental Section
Creating Interaction Networks from Pulldown Experiments
We created an interaction network from a pull-down
screen.[14] First, .RAW files were obtained
from PRIDE (PXD003758)
and processed with MaxQuant version 1.6.2.10. Mouse protein sequences
were downloaded from UniProt (release 2017_07). Parameters “matching
between runs” and “LFQ” were selected in addition
to the default parameters. Downstream analysis of the ‘proteinGroups.txt’
output table was performed in Perseus using the tools described in
this Article. Columns for baits Eed, Ring1b, and Bap1 and their controls
in the ESC and NPC cell lines were selected and log transformed. Quantitative
profiles were filtered for missing values, and were filtered independently
for each of the bait control pairs, retaining only proteins that were
quantified in all three replicates of either the bait or control pull-down.
Missing values were imputed (width 0.3, down shift 1.8) before combining
the tables and performing the multi-volcano analysis (Table S1). The s0 and FDR parameters of the multi-volcano analysis for Class A (higher
confidence, s0 = 1, FDR = 0.01%) and Class
B (lower confidence, s0 = 1, FDR = 0.2%)
were chosen by visual inspection, aiming for a low number of significantly
depleted proteins in any of the experiments. Class C interactions,
which are based on profile correlation between bait and prey, were
not considered in this network due to the limited number of pull-downs
in the dataset, which would result in inaccurate correlation estimation.
Edges representing known protein complex interactions were annotated
in the network. Due to missing mouse CORUM annotations for any of
the baits, mouse CORUM annotations were obtained by mapping between
mouse and human homologues as listed in the MGI database.[15]
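
The filtering, imputation, and test-statistic steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Perseus implementation: the function names are hypothetical, missing values are represented as None, imputation is assumed to operate per sample column by drawing from a normal distribution shifted down by 1.8 standard deviations and narrowed to 0.3 standard deviations of the observed values, and the statistic is assumed to take the SAM-style form (difference of means divided by the pooled standard error plus s0).

```python
import math
import random
import statistics


def quantified_in_all(bait, control):
    """Valid-value filter: keep a protein if every replicate of the bait
    or every replicate of the control pull-down is quantified (not None)."""
    return all(v is not None for v in bait) or all(v is not None for v in control)


def impute_column(values, width=0.3, down_shift=1.8, rng=None):
    """Replace None entries of one sample column with draws from a
    down-shifted, narrowed normal distribution (assumed per-column mode)."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [
        v if v is not None else rng.gauss(mean - down_shift * sd, width * sd)
        for v in values
    ]


def s0_statistic(bait, control, s0=1.0):
    """Return (difference of means, SAM-style statistic with offset s0)."""
    n1, n2 = len(bait), len(control)
    mean1, mean2 = statistics.mean(bait), statistics.mean(control)
    sum_sq = sum((x - mean1) ** 2 for x in bait)
    sum_sq += sum((x - mean2) ** 2 for x in control)
    pooled_se = math.sqrt((1 / n1 + 1 / n2) * sum_sq / (n1 + n2 - 2))
    difference = mean1 - mean2
    return difference, difference / (pooled_se + s0)


# Hypothetical log-transformed intensities, for illustration only.
column = [25.1, 24.8, None, 26.0, 25.5, None]
imputed = impute_column(column)
difference, statistic = s0_statistic([28.0, 28.4, 27.9], [24.1, 24.0, 24.5], s0=1.0)
```

A larger s0 shrinks the statistic of proteins with small differences more strongly than that of proteins with large differences, which gives the significance boundary in the volcano plot its curved shape.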
Approximately Scale-Free Topology of the STRING Interaction Network
We downloaded the human STRING interaction network
(v10.5) from the STRING website. After filtering for high confidence
interactions (combined score > 0.9), the scale-free fit index was
calculated according to ref (16). Node degrees were calculated and plotted against their
frequency distribution on a log–log scale. The R² of a linear fit in log–log space represents
the scale-free fit index.
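
As a rough illustration of the fit index, the following Python sketch computes the R² of a least-squares line through the points (log10 degree, log10 relative frequency). It is a simplification under stated assumptions: the function name is hypothetical, and raw degree frequencies are used without binning, which may differ from the procedure of ref (16).

```python
import math
from collections import Counter


def scale_free_fit_index(degrees):
    """R^2 of a straight-line fit to log10(frequency) versus log10(degree).

    Nodes with degree 0 are ignored because their logarithm is undefined.
    At least two distinct degrees with differing frequencies are required.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / total) for c in counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a simple linear regression, R^2 equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


# Hypothetical degree sequence whose frequencies follow k^-2 exactly.
power_law_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
fit_index = scale_free_fit_index(power_law_degrees)
```

For a degree sequence that follows a power law exactly, as in the example, the index is 1 up to floating-point error; values well below 1 indicate a departure from scale-free topology.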
Network Analysis of a Phosphoproteomic Dataset of EGF Stimulation
Two separate analysis tools, PHOTON and
KSEA, were applied to the
same experimental dataset of 9184 phosphorylation sites with high
localization probability (>0.75)[17] (Table S2). Log2 fold-changes for EGF from two
replicates were averaged. For PHOTON analysis, we first generated
a high-confidence PPI network. We downloaded all interactions from
HIPPIE and filtered them for high-confidence interactions (confidence
> 0.72), additionally removing high-degree nodes (retaining only nodes with degree < 700).
Nodes in the HIPPIE network are identified by their Entrez GeneID.
Therefore, the experimental data were mapped from UniProt to Entrez
GeneIDs before the nodes of the network were annotated. Phosphorylation
sites with multiple GeneIDs were mapped to all matching nodes in the
network. We then performed PHOTON analysis with adjusted default parameters.
Network reconstruction with ANAT was enabled with the 100 highest
scoring proteins and EGF anchor (GeneID 1950). Additionally, we increased
the number of permutations to 100 000. The KSEA analysis was
performed on the human site-specific kinase–substrate network
from PhosphoSitePlus.[18] Data and network
were matched on the basis of UniProt identifiers.
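
The construction of the high-confidence PPI network can be sketched as below. This is an illustrative sketch only: the function name and the edge representation as (node, node, confidence) tuples are hypothetical, and it is assumed that node degrees are computed after the confidence filter has been applied.

```python
from collections import Counter


def filter_network(edges, min_confidence=0.72, max_degree=700):
    """Keep edges with confidence > min_confidence, then drop every node
    (and its edges) whose degree in the remaining network is >= max_degree."""
    confident = [(a, b, c) for a, b, c in edges if c > min_confidence]
    degree = Counter()
    for a, b, _ in confident:
        degree[a] += 1
        degree[b] += 1
    hubs = {node for node, d in degree.items() if d >= max_degree}
    return [(a, b, c) for a, b, c in confident if a not in hubs and b not in hubs]


# Toy example with a small degree cutoff so that the effect is visible.
toy_edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.80),
    ("A", "D", 0.95),
    ("B", "C", 0.50),  # removed: confidence too low
    ("C", "D", 0.75),
]
filtered = filter_network(toy_edges, min_confidence=0.72, max_degree=3)
```

In the toy example, node A reaches the degree cutoff and is removed together with its edges, leaving only the edge between C and D.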
Co-expression Analysis of a Clinical Proteomics Dataset
Protein quantification data and clinical annotation were obtained
from Yanovich et al.[19] SILAC ratios were
first transformed to log(light/heavy). The dataset was filtered for
the 43 patients unique to ref (19). Using global hierarchical clustering of the patients,
four outlier samples were identified and removed from the dataset.
Additionally, proteins with less than 70% valid values were removed
from the dataset, and the resulting patient profiles were Z-scored (Table S3). Following
the WGCNA workflow,[16] the power parameter
for the co-expression analysis was selected using the ‘Soft-threshold’
activity provided by PluginCoExpression. Co-expression analysis was
performed in a signed network with biweight midcorrelation and the
power parameter set to 10. The eigengene of each co-expression module
was correlated with the provided clinical data using Pearson correlation
and clustered using hierarchical clustering.
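
To illustrate the role of the power parameter in a signed network, the sketch below converts pairwise correlations r into adjacencies ((1 + r) / 2)^power, the signed adjacency used in weighted co-expression analysis. It is a simplified sketch under stated assumptions: Pearson correlation is substituted for the biweight midcorrelation used in the analysis, and the function and profile names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(profiles, power=10):
    """Map each pair of profiles to ((1 + r) / 2) ** power.

    profiles: dict of protein name -> list of values over the same samples.
    """
    names = list(profiles)
    adjacency = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            adjacency[(first, second)] = ((1 + r) / 2) ** power
    return adjacency


# Hypothetical Z-scored profiles over four samples.
example_profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with P1
    "P3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with P1
}
adjacency = signed_adjacency(example_profiles, power=10)
```

In a signed network, strongly anti-correlated pairs receive an adjacency near 0 rather than near 1, and raising the power suppresses weak correlations more strongly.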
PluginInterop Provides a Central Entry Point for All External Plugins
The PluginInterop project is written in the C# programming
language and implements several Perseus plugin APIs. For users it
provides a number of activities in Perseus for executing script files
written in the Python and R languages. Upon selection of any of these
activities, users will be prompted with a parameter window, allowing
them to pass additional arguments to the script and requiring them
to specify the executable that should be used for processing. Since
Perseus does not include an installation of Python or R, users will
have to install those and any other dependencies separately. PluginInterop
aids the user by trying to automatically detect an existing installation
and provide meaningful error messages in case of missing dependencies.
Developers can additionally leverage the functionality implemented
in PluginInterop as a basis for parametrized scripts. In general,
developers are free to choose which external scripting language or
program they would like to utilize. We found the R and Python scripting
languages to be most useful, which is why we provide two companion
libraries, ‘perseuspy’ and ‘PerseusR’,
to be used alongside PluginInterop. These libraries aid the communication
between Perseus and the script.

The communication between Perseus
and external scripts is straightforward and is easily implemented
for any tools of choice. In short, Perseus will persist all necessary
data to the hard-drive and call the specified tool with specific command-line
arguments. The first arguments contain all the parameters specified
by the user, per choice of the developer, either in an XML format
or simply separated by spaces. Second, the input data from the workflow
is saved to a temporary location which is passed to the script. The
final arguments specify the expected location of the output data.
The external process can provide status and progress updates to the
user, as well as detailed error reporting by printing to stdout/stderr
and indicating success or failure through the exit code. Once the
process exits, Perseus will parse the output data from its expected
location and insert it into the workflow. Any step in the pipeline is
customizable for advanced scenarios, such as custom data formats.

The PluginInterop binary is automatically included in the latest
Perseus version. The source code was published under the permissive,
open-source MIT license on GitHub (https://github.com/cox-labs/PluginInterop). The website also provides more information on how to develop plugins,
including a video demonstration. The plugins presented in this Article
are all developed on top of PluginInterop and the perseuspy and PerseusR
companion libraries.
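
The communication scheme described above can be illustrated with a minimal external script. The sketch makes several assumptions that are not taken from PluginInterop itself: exactly three arguments are passed (a parameter string, an input path, and an output path), and the data are treated as a plain tab-separated table with a single header row. The actual Perseus file format carries additional annotation information, which the companion libraries handle. All function names are hypothetical.

```python
import csv
import math
import sys


def transform(cell):
    """Log2-transform numeric cells; leave text cells unchanged."""
    try:
        value = float(cell)
    except ValueError:
        return cell
    return str(math.log2(value)) if value > 0 else "NaN"


def run(parameters, in_path, out_path):
    """Read the input table, transform it, and write it to the output path."""
    print(f"parameters: {parameters}")  # status messages go to stdout
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            writer.writerow([transform(cell) for cell in row])
    return 0  # exit code 0 signals success


def main(argv):
    """Validate the arguments and report failures via stderr and exit code."""
    if len(argv) != 4:
        print("usage: script.py <parameters> <input> <output>", file=sys.stderr)
        return 1
    try:
        return run(argv[1], argv[2], argv[3])
    except OSError as error:
        print(f"error: {error}", file=sys.stderr)
        return 1


# When saved as a script, the entry point would be: sys.exit(main(sys.argv))
```

Because the only contract is files on disk, command-line arguments, and an exit code, the same pattern carries over to any language or external program.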
Library Support for Scripting Languages
We implemented
libraries in R and Python which facilitate the interoperability of
Perseus with external scripting languages. The main aim of these libraries
is to map the data structures of Perseus to a counterpart native to
the external language. Developers proficient in these languages will
be more comfortable and productive with these native data structures.
The largest benefit comes from the resulting integration with the
existing data science ecosystem, all now available to Perseus plugin
developers.

The ‘perseuspy’ module provides data
mappings for the Python language. The Perseus expression matrix is
mapped to the ‘DataFrame’ object of the popular ‘pandas’
module, which is tightly integrated with ‘numpy’, the
de facto standard for numerical computations in Python. The Perseus
network collection data type maps to a list of networks from the ‘networkx’
package. ‘networkx’ features a variety of graph algorithms and interfaces
well with other modules, due to its usage of standard Python dictionaries.
‘perseuspy’ is distributed via the Python Package Index
(PyPI), allowing for easy installation of the module for developers
and users alike. The code of ‘perseuspy’ is published
under the permissive, open-source MIT license, and is available alongside
usage examples and more information on https://github.com/cox-labs/perseuspy.

For the R language, we implemented the ‘PerseusR’
package. It provides a mapping of the Perseus expression matrix to
a custom wrapper class around the R ‘data.frame’ object.
The wrapping was necessary to represent Perseus-specific information
such as annotation rows. Alternatively, developers can load data as
a Bioconductor ‘ExpressionSet’ object, which enables
interfacing with the entire Bioconductor bioinformatics suite.
Currently there is no support for network collections in ‘PerseusR’,
but we plan to implement it in the near future. ‘PerseusR’
is also published under the MIT license and its code is available
on https://github.com/cox-labs/PerseusR. ‘PerseusR’ is easily installed directly from CRAN.
Implementation of PluginPHOTON
We implemented a Perseus
plugin for the PHOTON tool on top of the functionality provided by
PluginInterop and perseuspy. PHOTON was previously capable of running
only a single experiment at a time with a fixed human PPI network.
We expanded its implementation to allow for parallel processing of
any number of experiments on any network. These changes make large
datasets from any species directly amenable to PHOTON analysis. PluginPHOTON
is published under the MIT license, its code is available on https://github.com/jdrudolph/photon, and it is included in the latest Perseus release.
Implementation of PluginCoExpression
We implemented
parts of the WGCNA pipeline as a Perseus plugin. PluginCoExpression
provides access to the WGCNA functions implemented in the R language
via PluginInterop and PerseusR.
Implementation of KSEA in Perseus
KSEA analysis was
implemented in Perseus and tested for correctness against the reference
implementation.
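
For orientation, one commonly used form of the KSEA score can be sketched as follows: the mean log fold-change of a kinase's quantified substrates is compared to the mean over all sites and scaled by the square root of the number of substrates divided by the standard deviation over all sites. Whether the Perseus implementation uses exactly this variant is not stated here; the function name, the use of the sample standard deviation, and the site identifiers are assumptions made for illustration.

```python
import math
import statistics


def ksea_scores(site_changes, kinase_substrates):
    """Compute a z-score per kinase.

    site_changes: dict of site identifier -> log2 fold-change.
    kinase_substrates: dict of kinase -> iterable of site identifiers.
    Kinases without any quantified substrate are skipped.
    """
    background = list(site_changes.values())
    mean_all = statistics.mean(background)
    sd_all = statistics.stdev(background)  # sample SD (assumption)
    scores = {}
    for kinase, sites in kinase_substrates.items():
        observed = [site_changes[s] for s in sites if s in site_changes]
        if observed:
            m = len(observed)
            scores[kinase] = (statistics.mean(observed) - mean_all) * math.sqrt(m) / sd_all
    return scores


# Hypothetical sites and kinases, for illustration only.
example_changes = {"s1": 2.0, "s2": 2.0, "s3": 0.0, "s4": -2.0, "s5": -2.0}
example_network = {"K1": ["s1", "s2"], "K2": ["s4", "s5", "s9"], "K3": ["s9"]}
scores = ksea_scores(example_changes, example_network)
```

Positive scores indicate that the substrates of a kinase are, on average, more phosphorylated than the background, and negative scores indicate the opposite.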
Results and Discussion
Workflow-Based Biological Network Analysis
PerseusNet
was devised to fulfill the computational needs of proteomics researchers
wishing to accomplish network analysis of their data. While it is
extensible through a new plugin application programming interface
(API), and hence any network analysis functionality can be implemented,
most tools needed for proteomics research and connecting it to generic
network analysis platforms are included in the software (Figure 1). Dedicated activities
for analyzing AP-MS datasets and phosphoproteomics experiments
in the context of kinase–substrate networks belong to the basic
infrastructure of PerseusNet. The most common standard data formats
(tab, txt, csv, gml, sif, json) are supported as input. An extended
multi-language plugin API allows leveraging many existing tools in
the analysis workflow. As an important example, co-expression clustering
tools are integrated in this way.
Figure 1
Schematic overview of the new network
functionality in Perseus.
PerseusNet implements a number of processing and analysis steps facilitated
by the network collection data type. While including proteomics-centric
analyses, such as the analysis of interaction screens, the network
module also provides a number of general-purpose tools, for instance,
for network annotation, filtering, and topology determination. With
the extension of the Perseus plugin API to networks and furthermore
to other programming languages, it becomes possible to integrate existing
network analysis tools in Perseus. Networks are easily imported to
and exported from Perseus, due to its support for standard formats.
To accommodate PerseusNet, we
extended the Perseus framework with
a new data type termed network collection (Figure 2) that represents a set of one or more networks
which are analyzed jointly in the workflow. Different networks within
the same network collection can, for instance, represent networks
derived from different individuals (patients), experimental conditions,
or biological replicates. All information in the network collection
is organized in data tables, leveraging the existing augmented data
matrix[11] in Perseus. General information
on the networks in the collection is stored in the networks table,
where each row represents an individual network. Here, sample-related
annotations, such as calculated global network properties, can be
stored to enable their usage in analysis activities operating on a
network collection. For instance, if the samples correspond to different
patients, the networks table can hold patient-specific information
as derived from patient records or questionnaires. These variables
can then be used as independent or confounding factors in statistical
analysis of the networks.
Figure 2
Schematic representation of the network collection
data type. User-facing
information is displayed in tabular form with tables listing the networks
in the collection, as well as providing detailed information on the
nodes and edges of each network. Internally an auxiliary graph data
structure aids in the implementation of graph algorithms. Node- and
edge-mapping provide the required cross-references between the tabular
and graph representation.
The nodes and edges of each individual network are stored
in a
pair of separate tables. The nodes table further describes the entities
in the network, while the edges table provides details on the connections
between the entities. The entities in the nodes table can be annotated
with local network properties, such as the node degree. In case the
entities correspond to proteins, biologically meaningful annotations
could include membership in gene ontology terms, pathways, or protein
complexes. Similarly, edges can be annotated in the edges table with
properties of pairwise relationships between proteins, as, for instance,
interaction confidence measures. All of these properties are then
accessible to the network analysis tools. Furthermore, all mentioned
tables can be sorted and searched, allowing all information to be
browsed and inspected intuitively. Internally, a graph data structure
for each network enables the efficient execution of graph algorithms.
We did not aim to include generic graphical representation of networks
as node-link diagrams, since this can be achieved in other tools such
as Cytoscape, for which we provide simple adaptors for the transfer
of networks. However, several activities include specialized visualizations
tailored to specific analyses.

In Perseus, all data analysis
steps are performed within a graphical
workflow (see Figure S1). Enabled by the
newly implemented network collection, the Perseus workflow is now
capable of all import, processing, and analysis steps in the side-by-side
analysis of expression matrices and networks. All data imported into
Perseus is represented as a separate entity in the workflow. Any matrix
or network undergoing a processing step is not modified in place but
rather becomes a new entity that gets connected to the original data
in the workflow. By inspecting both input and output data, every step
in the analysis is traceable and easily understood. Certain processing
steps allow for the transformation of matrices into networks and vice
versa, or the mapping of data between the two. As a result, any analysis
performed in Perseus, potentially including several side-by-side processing
steps of networks and matrices, always remains transparent to the
user.
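
The organization of the network collection described in this section (a networks table, per-network nodes and edges tables, an auxiliary graph structure, and processing steps that create new entities instead of modifying data in place) can be mirrored in a small Python sketch. The class, method, and column names below are hypothetical and do not correspond to the Perseus API; edges are assumed to be undirected and to reference only nodes listed in the nodes table.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    name: str
    nodes: list = field(default_factory=list)  # nodes table: one dict per node
    edges: list = field(default_factory=list)  # edges table: one dict per edge

    def adjacency(self):
        """Auxiliary graph view derived from the two tables."""
        graph = {row["node"]: set() for row in self.nodes}
        for row in self.edges:
            graph[row["source"]].add(row["target"])
            graph[row["target"]].add(row["source"])
        return graph

    def with_degree(self):
        """Return a new Network whose nodes table has a 'degree' column.

        The original network is left untouched, mirroring the workflow
        behavior in which each processing step yields a new entity.
        """
        graph = self.adjacency()
        nodes = [{**row, "degree": len(graph[row["node"]])} for row in self.nodes]
        return Network(self.name, nodes, [dict(row) for row in self.edges])


@dataclass
class NetworkCollection:
    networks: list = field(default_factory=list)        # Network objects
    networks_table: list = field(default_factory=list)  # one annotation dict per network


network = Network(
    "example",
    nodes=[{"node": "A"}, {"node": "B"}, {"node": "C"}],
    edges=[{"source": "A", "target": "B"}, {"source": "A", "target": "C"}],
)
annotated = network.with_degree()
collection = NetworkCollection([annotated], [{"name": "example", "condition": "control"}])
```

Keeping the tables as the primary representation and deriving the graph view on demand is one way to keep annotations browsable while still supporting graph algorithms.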
Multilingual Plugin Activities
The network collection
data structure (Figure 2) and the extended Perseus workflow provide the foundation for enabling
various network analyses, many of which are available in Perseus.
In general, networks either originate from external sources or are
created in a data-driven manner from within the workflow. To facilitate
the import of external networks into the workflow, we implemented
parsers for standard network formats, such as edge table (.tab|.txt|.csv),
GraphML (.gml) (http://graphml.graphdrawing.org/), Cytoscape’s simple interaction format (.sif) (http://manual.cytoscape.org/en/stable/Supported_Network_File_Formats.html), and D3js’s JSONgraph (.json) (http://jsongraphformat.info/), which enable loading interactions from most popular network databases,
including STRING,[20] BioGRID,[21] IntAct,[22] CORUM,[23] and PhosphoSitePlus.[18] Furthermore, specific quantitative expression data, such as AP-MS,
drives the creation of novel PPI networks, and phosphoproteomics
datasets allow for a more detailed view or construction of kinase–substrate
relationship networks. Specialized visualizations of such networks
are provided (see later sections), which allow for an intuitive visual
inspection of the results of the analysis. Perseus is not limited
to physical interaction networks: co-expression clustering provides
a powerful alternative to regular hierarchical clustering for expression
proteomics studies. Finally, any network collection can be exported
from the workflow in a plain text file format (Supplementary Data 1) for sharing or use in any other external
tools, such as Cytoscape. In order to accommodate these new capabilities
in the Perseus plugin system, we extended the Perseus plugin API with
new programming interfaces for the network collection and other associated
data types, as well as the respective import, processing, and analysis
interfaces (see Figure S2). This fully
featured API is available to all developers wishing to extend Perseus’s
functionality with plugins. All analyses presented in this Article
adhere to the new API.

In order to better leverage the existing
network analysis ecosystem, we additionally implemented a new mode
of interoperability between Perseus and external tools (Figure 3). The PluginInterop project
enables this functionality and allows the user to run external tools
from within the Perseus workflow, most prominently scripts written
in the popular R and Python languages. Open-source companion libraries
for R (PerseusR, https://github.com/cox-labs/PerseusR) and Python (perseuspy, https://github.com/cox-labs/perseuspy) provide utilities for interfacing with Perseus. As a result, network
analysis tools originally implemented in external tools can run from
within the Perseus workflow with only minor adjustments. The implementations
of the PHOTON and WGCNA plugins presented in this Article are based
upon PluginInterop and its companion libraries. Instructions for interested
developers on how to write scripts for Perseus or how to adapt existing
tools can be found on the PluginInterop website (https://github.com/cox-labs/PluginInterop). In the following sections, we will present a number of network
analyses which are now implemented in Perseus, with focus on their
application to different types of proteomics data.
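
As an example of what importing a standard network format involves, the following sketch parses lines in the simple interaction format (.sif), in which each line lists a source node, a relationship type, and one or more target nodes. It is not the parser used in Perseus: the function name is hypothetical, and the delimiter is chosen per line (tabs if present, otherwise whitespace) as a simplification.

```python
def parse_sif(lines):
    """Return (nodes, edges) parsed from SIF-formatted lines.

    Each line reads 'source relationship target [target ...]'. A line with
    a single token declares an isolated node. Blank lines are skipped.
    """
    nodes, edges = set(), []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        tokens = line.split("\t") if "\t" in line else line.split()
        source = tokens[0]
        nodes.add(source)
        for target in tokens[2:]:
            nodes.add(target)
            edges.append((source, tokens[1], target))
    return nodes, edges


sif_nodes, sif_edges = parse_sif(["A pp B C", "", "D"])
```

The resulting node set and edge list map directly onto the nodes and edges tables of a network, with the relationship type as an edge annotation.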
Figure 3
Schematic of the Perseus
plugin system. Plugins written in C# are
native to Perseus and implement their functionality directly on top
of the application programming interfaces and data structures provided
by the application framework. PluginInterop enables the execution
of scripts in the Python and R languages, as well as other external
programs. By communicating via the file system, data are transferred
between Perseus and the external program. The companion libraries
‘perseuspy’ and ‘PerseusR’ enable developers
to access the data science ecosystem in their language of choice.
For custom graphical user interface elements and an improved user
experience of external tools, developers can implement a thin C# wrapper
class that extends the generic functionality of PluginInterop.
Affinity Enrichment MS Interactomics
Affinity purification
or enrichment coupled to MS analysis has become a powerful tool for
interrogating PPIs.[24,25] Not only is it able to provide
a detailed view on proteins of interest, but it can also determine
the basic building blocks for the assembly of large-scale PPI networks.[26,27] Historically, protein complex members were detected by subjecting
the sample to a series of purification steps followed by MS identification.
With the advent of quantitative MS, detecting even transient interactions
has become possible by relying not on the identification itself, but
instead on quantitative information. The sample is not purified but
only enriched for the protein of interest and its interaction partners
and then subjected to MS quantification.[28]

Confidently identifying bona fide interactions and distinguishing
them from background binders, arising from off-target binding or contamination,
requires data analysis of replicate case and control measurements.
Compared to purely fold-change-based methods, statistical tests provide
a powerful way to compare case and control samples by calculating
a test statistic and an associated p-value, and they limit
the number of false positives. For visual inspection of the results,
the (negative logarithm of the) p-value can be plotted
against the size of the effect, i.e., the difference between the means
of logarithmic abundances, in a so-called volcano plot. Since one
statistical test is performed for each protein, which amounts to a
large number of tests performed simultaneously, the significance level
needs to be adjusted to avoid increased numbers of false positives
due to the multiple hypothesis testing problem.[29] A popular strategy to adjust for multiple testing is to
control the false discovery rate (FDR), which can be achieved by permutation-based
methods. Furthermore, in the volcano plot method it is necessary to
define the functional form of the curves that separate significant
from non-significant hits, either by straight lines or, in a more
sophisticated way, introduced in the significance analysis of microarrays
(SAM) method,[30] by modifying the t test statistic with the background variance parameter s0. This standard workflow is available in Perseus
but becomes increasingly cumbersome for interaction screens with more
than a handful of baits. Parameter values for s0 and the FDR thresholds are often applied separately for each
pulldown, inviting overfitting and cherry-picking, and also requiring
that results subsequently be combined manually.
We implemented the
interactive multi-volcano plot (Figure 4a) to analyze interaction screens
with arbitrarily many baits and conditions simultaneously. Given the
experimental design of the dataset, defined by baits and conditions,
the analysis is applied to each experiment. The FDR threshold and s0 parameter can be selected globally for each of two confidence classes, Class A (high)
and Class B (low). For
sufficiently large datasets, instead of dedicated control samples,
an internal control can be assembled from the dataset for each pulldown
consisting of pulldowns of other, unrelated baits. The results can
be inspected through an interactive user interface. All volcano plots
are displayed in the overview panel. A multi-functional detail panel
shows more information on selected plots and provides zoom, protein
selection, and labeling options. If a single plot is selected, the
volcano plot is shown in the detail panel. When two plots are selected,
the t test differences between the selected experiments
are plotted against each other, highlighting changes in the enrichment
of proteins between experiments (Figure 4b). Additionally, all data can be browsed
in tabular form, making it easily searchable and allowing for rich
styling options. Known interactors or gene ontology annotations matching
the experiment can be used to highlight proteins in the plot and can
serve as a positive control for the adjustment of test parameters.
Since all test parameters are controlled on a global level, overfitting
and cherry-picking of parameter values are effectively prevented. We integrated
the multi-volcano analysis into the new network module. Results from
PPI screens can be exported as network objects into the Perseus workflow.
A specialized node-link visualization with multiple layers of information, based on the open-source cytoscape.js
library,[31,32] allows
for easy interpretation of the results (Figure 4c). A PPI network that was newly created
in this way can be integrated with existing networks or exported in
various formats using the functions available through the network
module.
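The s0-modified test statistic described above can be illustrated with a minimal Python sketch. It assumes an equal-variance (pooled) two-sample t-test on log-transformed intensities, with s0 added to the standard error in the denominator in the style of SAM. The function name `s0_t_statistic` and all numerical values are hypothetical and are not taken from the Perseus implementation.

```python
import math
from statistics import mean, variance


def s0_t_statistic(case, control, s0=0.0):
    """Return (difference of means, s0-modified t statistic) for one protein.

    case, control: replicate log-intensities of the pull-down and the control.
    s0: constant added to the standard error (s0 = 0 gives the ordinary t statistic).
    """
    n1, n2 = len(case), len(control)
    difference = mean(case) - mean(control)
    # Pooled variance of the equal-variance two-sample t-test.
    pooled = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return difference, difference / (standard_error + s0)


# Toy data: a tiny but very reproducible difference versus a large enrichment.
low_variance_protein = ([20.10, 20.11, 20.12], [20.00, 20.01, 20.02])
enriched_protein = ([25.0, 26.0, 27.0], [20.0, 21.0, 22.0])

for s0 in (0.0, 1.0):
    _, t_small = s0_t_statistic(*low_variance_protein, s0=s0)
    _, t_large = s0_t_statistic(*enriched_protein, s0=s0)
    print(f"s0={s0}: small-effect t={t_small:.2f}, large-effect t={t_large:.2f}")
```

In this toy example the ordinary t statistic (s0 = 0) ranks the protein with the tiny difference first because its variance is very small, whereas a nonzero s0 favors the protein with the large difference. This is the behavior that gives the significance curves in the volcano plot their curved shape.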
Figure 4
AP-MS. (a) The Hawaii plot provides an overview of an entire
dataset, in this case consisting of three baits in two conditions
(Table S1 and Experimental
Section).[14] Each volcano plot displays
the results of a pull-down of a specific bait (Bap1, Eed, Ring1b)
in one of the ESC or NPC cell lines. Significant interactors are determined
using a permutation-based FDR and the resulting high-confidence Class
A (solid line) and low-confidence Class B (dashed lines) thresholds
are displayed in the plot. In this case, Class B interactions were found only
in the Bap1 ESC pull-down. Class A interactors are displayed
in dark gray, other proteins are shown in light gray. (b) Enrichment
plot comparing the Eed pull-downs in ESC and NPC cell lines. Significant
interactors in either of the two conditions are displayed in black, nonsignificant
proteins are displayed in light gray. Proteins differentially enriched
in one of the two conditions will be located far from the diagonal
and can be identified visually. (c) Visualization of the resulting
protein interaction network for both cell lines. Bait proteins are
colored in green, and their interactors are colored in blue. Thick
lines represent Class A interactions, thinner lines Class B. Interactions
which were already annotated in the human CORUM database are highlighted
in red.
As an example application, we
obtained pull-down experiments of
Polycomb group proteins from ref (14), covering the three baits Bap1, Eed, and Ring1b
in mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs).
The filtered dataset contained 2995 proteins (Table S1). Using the new multi-volcano analysis (Figure 4a), we obtained an
interaction network connecting the bait proteins with their significantly
enriched prey proteins. Bait proteins were identified by their gene
name, as specified in the annotation rows of the dataset. In order
to have a consistent representation, the protein groups of the preys
are also identified by their gene names. The resulting network contained
134 nodes and 140 edges. The results were comparable to the original
publication: the overlap between the previously reported interactions and the detected Class A
interactions ranged from 55% (Ring1b ESC) to 91% (Bap1 ESC). Differences can be explained by the slightly different
methodology used in this Article. We used the s0-modified t test with s0 set to 1.0, and FDRs of 0.01% and 0.2% for Class A and B,
respectively, while the authors of ref (14) used individually chosen fold-change and p-value cutoffs for each experiment. No Class C interactions
were included. Using the built-in visualization features, such as
the enrichment between experiments, we identified several interactions
that were conditional on the cell type (Figure 4b). By annotating the newly created protein-interaction
network with known complex interactions from CORUM and inspecting the
resulting node-link network visualization (Figure 4c), previously known and possibly novel interactions
could be distinguished.
Further confidence in the existence
of an interaction between a
protein identified in a pulldown and the bait can be obtained by correlation
analysis. The correlation of the intensity profiles over many pulldowns
with the bait intensity profile is reported in the output tables,
together with the volcano plot-derived significance of the interaction.
When assembling the interaction network, a threshold is applied to
this correlation in order to define an additional class of interactions
(Class C), which might not have been found by volcano plot analysis
(Class A and Class B). This workflow is especially appealing for interaction
screens with a large number of bait proteins.
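The correlation-based definition of Class C interactions can be sketched as follows. This minimal Python example assumes that Pearson correlation is used and that a single fixed threshold is applied; neither the correlation measure nor the threshold value is specified above, and the function names and intensity profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlated_preys(profiles, bait, threshold=0.9):
    """Return {prey: correlation} for preys whose profile over all pulldowns
    correlates with the bait profile at or above the (assumed) threshold."""
    bait_profile = profiles[bait]
    hits = {}
    for protein, profile in profiles.items():
        if protein == bait:
            continue
        r = pearson(profile, bait_profile)
        if r >= threshold:
            hits[protein] = r
    return hits


# Toy intensity profiles over five pulldowns (invented values).
profiles = {
    "bait": [1.0, 2.0, 3.0, 4.0, 5.0],
    "prey_following_bait": [1.1, 2.0, 3.2, 3.9, 5.1],
    "prey_background": [3.0, 3.1, 2.9, 3.0, 3.05],
}
print(correlated_preys(profiles, "bait"))
```

Only the prey whose profile follows the bait across the pulldowns passes the threshold. In a real screen, the number of pulldowns determines how reliable such a correlation is, which is why this approach is most useful for screens with many baits.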
Importing, Curating, and Probing Large-Scale PPI Networks
While protein interaction
screens can uncover novel or condition-specific
interactions, a wealth of detected and predicted interactions are
already stored in PPI databases.[33] Analyzing
large-scale PPI networks jointly with other omics data has great promise.
However, a major obstacle to performing systems-level analysis on
these large-scale networks is the lack of easy-to-use software solutions
to transparently handle the processing and analysis of these networks.
Many studies under-utilize the existing resources and mostly report
the interactions of a single protein as an afterthought. In the following,
we introduce the new network capabilities of Perseus to assemble,
filter, and understand large-scale PPI networks, which lay the foundation
for any network analysis.
The first task is assembling a high-confidence
interaction network. Many databases, such as STRING,[20] BioGRID,[21] or HIPPIE,[34] allow researchers to download all interactions
in a tabular format, which can be easily loaded into Perseus, even
at sizes of up to a few million interactions. Supporting information
on the interactions, such as the interaction type
or a measure of confidence, remains available at each step of the subsequent
data analysis. Networks need not originate from a single
data source. Perseus provides all necessary tools to integrate information
from any source, providing full control over the choice of identifier
and handling of duplication and ambiguity in the mapping. Conversely,
generalized interaction networks such as STRING can be filtered by
interaction type to generate a physical interaction network. Confidence
measures often integrate diverse knowledge into a single score, derived
from how often, and by which experimental technique, an interaction
was detected, combined with more abstract measures, such as co-expression
and literature co-occurrence of the interaction partners.[20] There are two approaches to confidence-aware
network analysis (Figure 5a). Applying a cutoff to the confidence score removes low-confidence
interactions from the network, which is especially useful when applying
methods that treat all interactions equally. The cutoff can be chosen
according to the confidence score distribution and the targeted network
size (Figure 5b). Other
methods operate on weighted networks and distinguish between interactions
with high or low confidence. In this case the confidence scores can
be used as an edge weight. In addition to static confidence scores,
one can devise dynamic confidence scores from experimental data which
reflect, e.g., changes in abundance or localization of any of the
interactors.
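The two approaches to confidence-aware analysis can be sketched in a few lines of Python. The example assumes an edge list of `(protein_a, protein_b, score)` tuples; the function names, protein labels, and scores are hypothetical and stand in for a table downloaded from a PPI database.

```python
def filter_by_confidence(edges, cutoff):
    """Hard threshold: keep only edges whose confidence score reaches the cutoff."""
    return [(a, b, score) for a, b, score in edges if score >= cutoff]


def edges_remaining(edges, cutoffs):
    """Number of edges that would survive each candidate cutoff.

    This is the quantity one would inspect next to the score histogram
    when choosing a cutoff for a targeted network size.
    """
    return {c: sum(1 for _, _, score in edges if score >= c) for c in cutoffs}


def weighted_adjacency(edges):
    """Weighted alternative: keep all edges and use the score as edge weight."""
    adjacency = {}
    for a, b, score in edges:
        adjacency.setdefault(a, {})[b] = score
        adjacency.setdefault(b, {})[a] = score  # PPI edges are undirected
    return adjacency


# Toy edge list (invented proteins and scores).
edges = [("A", "B", 950), ("A", "C", 700), ("B", "C", 400), ("C", "D", 150)]
print(edges_remaining(edges, [0, 400, 700, 900]))
print(filter_by_confidence(edges, 700))
print(weighted_adjacency(edges)["C"])
```

Methods that treat all interactions equally would operate on the filtered list, whereas methods for weighted networks would consume the adjacency structure with its scores.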
Figure 5
Handling large-scale protein interaction databases in
Perseus.
(a) Interactions in PPI databases are often annotated with confidence
scores derived from various sources. Perseus provides tools to load
and combine confidence scores derived from a variety of data sources, including
dynamic confidence adjustments based on condition-specific data. High-confidence
networks can be obtained by removing edges below a given hard threshold
or alternatively, confidence scores can be utilized directly as so-called
edge weights, thereby allowing for the inclusion of lower-confidence
interactions. (b) Histogram of the combined confidence score from
the human STRING PPI network. Superimposed in orange is the number
of interactions in the filtered network if the edges with scores lower
than the current value were removed. Filtering out low confidence
edges leads to a significant reduction in the number of edges in the
final network. (c) Log–log plot of the node degree against
the degree frequency generated from the human STRING PPI network.
The R² value of the linear fit (orange)
to these data represents the scale-free fit index.
A deeper understanding of the network requires
a different perspective
in addition to the interaction-centric view. Any list of interactions
can be converted into a network collection with a single click. A
dedicated set of network-specific processing activities is now available.
While processing the list of interactions, the focus remains on the
edges of the network. In the network view, the focus is shifted to
the nodes. With the powerful identifier and data mapping mechanisms
in Perseus, nodes are easily annotated with various annotations, such
as gene ontology (GO),[35] or quantitative
proteomics data. Any annotation can be subsequently used to filter
the nodes of the network. One could, for example, extract a sub-network
of proteins associated with a specific GO category and their interactions
from the large-scale network. Using the data mapping from, e.g., deep
proteomes of specific cell lines or tissues, condition-specific sub-networks
can be created.
Further understanding is gained by studying
the intrinsic properties
of networks. By calculating node degrees, corresponding to the number
of neighbors of each node in the network, hub nodes can be distinguished
from peripheral nodes. By analyzing the distribution of the node degrees
in the network, global properties of the topology, such as approximate scale-freeness,[36,37] can be identified (Figure 5c). Furthermore, intrinsic local network
properties, like the node degree, can be correlated with biological
properties derived from protein annotations or experimental data.
The proper construction of large-scale interaction networks and understanding
of their basic properties are central to the successful application
of more specialized analyses such as the integration of such networks
with PTM data.
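The node degree and the scale-free fit index can be illustrated with a short Python sketch. It assumes the simplest possible definition: the R² of an ordinary least-squares line through the log-log plot of degree against degree frequency, without any binning of degrees. The function names and the toy network are hypothetical.

```python
import math
from collections import Counter


def node_degrees(edges):
    """Number of neighbors per node for an undirected edge list without duplicates."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def scale_free_fit_index(degrees):
    """Return (slope, R^2) of a linear fit of log10(frequency) on log10(degree).

    degrees: mapping node -> degree, all degrees >= 1.
    """
    frequency = Counter(degrees.values())
    ks = sorted(frequency)
    if len(ks) < 2:
        raise ValueError("need at least two distinct degrees for a fit")
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(frequency[k]) for k in ks]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all degrees are equally frequent; R^2 is undefined")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Toy network: one hub connected to three peripheral nodes, plus one extra edge.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
degrees = node_degrees(edges)
print(degrees.most_common(1))  # the hub has the highest degree
print(scale_free_fit_index(degrees))
```

A negative slope together with an R² close to 1 indicates an approximately scale-free degree distribution. For real networks the fit is sensitive to how degrees are binned, so the value obtained here should be read as an approximation.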
Network Analysis of PTM Data
The MS-based study of
PTMs is now possible on a global scale for several types of modifications.
The best known example is MS-based phosphoproteomics,[38] which is a powerful tool for interrogating signaling
events on a large scale. However, drawing conclusions directly from
phosphorylation changes is challenging, due to the mostly missing
functional information on the inhibitory or excitatory action of a
specific protein phosphorylation at a specific site. Network-based
approaches for the analysis of phosphorylation data derive functional
information on the protein level by interrogating the phosphorylation
changes observed in the network neighborhood.[17,39,40]
We implemented the popular kinase–substrate
enrichment analysis[39] (KSEA) tool for predicting
kinase activities in Perseus. Site-specific kinase–substrate
networks (Figure 6a)
assign kinases to the experimentally observed phosphorylation sites.
The core of the analysis is the calculation of a series of scores
(mean, enrichment, Z-score, p-value, q-value) for each kinase, based on the quantitative phosphorylation
changes of its substrates. These predicted kinase activities can be
analyzed further to find differentially activated kinases. KSEA most
often utilizes the curated kinase–substrate network from the
PhosphoSitePlus database.[18,41,42] In order to extend the coverage of the network and thereby allow
for the utilization of a larger fraction of the experimental data,
the network can be supplemented with predicted kinase–substrate
interactions from tools such as NetworKIN[43,44] or with low-specificity interactions derived from kinase target
sequence motifs.
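The per-kinase scores can be illustrated with a minimal Python sketch of the commonly used KSEA z-score, z = (mean of substrates − mean of all sites) · √m / (standard deviation of all sites), where m is the number of quantified substrates. The two-sided normal p-value, the function name `ksea_scores`, and the site and kinase labels are assumptions made for this example; the enrichment score and the q-value correction are omitted.

```python
import math
from statistics import mean, stdev


def ksea_scores(site_log2fc, kinase_substrates):
    """Compute substrate count, mean, z-score, and p-value for each kinase.

    site_log2fc: mapping phosphosite -> log2 fold change.
    kinase_substrates: mapping kinase -> list of substrate phosphosites.
    """
    all_changes = list(site_log2fc.values())
    background_mean = mean(all_changes)
    background_sd = stdev(all_changes)
    scores = {}
    for kinase, sites in kinase_substrates.items():
        observed = [site_log2fc[s] for s in sites if s in site_log2fc]
        if not observed:
            continue  # none of the annotated substrates was quantified
        m = len(observed)
        substrate_mean = mean(observed)
        z = (substrate_mean - background_mean) * math.sqrt(m) / background_sd
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
        scores[kinase] = {"substrates": m, "mean": substrate_mean, "z": z, "p": p}
    return scores


# Toy data (invented sites, kinases, and fold changes).
site_log2fc = {"s1": 2.0, "s2": 2.0, "s3": 0.0, "s4": 0.0, "s5": -2.0, "s6": -2.0}
kinase_substrates = {
    "kinase_up": ["s1", "s2"],
    "kinase_down": ["s5", "s6"],
    "kinase_unobserved": ["s99"],
}
for kinase, score in ksea_scores(site_log2fc, kinase_substrates).items():
    print(kinase, score)
```

Kinases without any quantified substrate receive no score, which is the coverage limitation that motivates supplementing the curated network with predicted interactions.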
Figure 6
Network analysis of an EGF stimulation phosphoproteomics
study. (a) Comparison of network topologies used for the analysis
of phosphoproteomics data. Nodes in the network are represented
as gray circles or pie charts where each slice represents the observed
phosphorylation changes at a specific site on the protein. Physical
protein–protein interactions (left side) are present between
all classes of proteins and are by definition undirected. In order
to capture the enzymatic action of kinases more accurately, directed
interactions (right side) from kinase to substrate are defined in
a site-specific manner. (b) KSEA Z-score and PHOTON
signaling functionality scores derived from phosphoproteomics
data measured after EGF stimulation (Table S2) correlate only weakly with each other (Pearson correlation 0.52).
Kinases annotated in GO with the term ‘Epidermal growth factor
receptor signaling pathway’ are highlighted in red. Both methods
assign high scores to central members of the expected pathway. (c)
Signaling network reconstructed by PHOTON from the 100 highest scoring
proteins anchored at EGF. The interactive visualization has an automatic
layout and phosphorylation data overlay.
PHOTON,[17] now available in Perseus,
is an alternative approach to KSEA that calculates more broadly defined
signaling functionality scores for any protein, rather than activities
for kinases only. A data-annotated large-scale PPI network now serves
as the input (Figure 6a). The resulting signaling functionality scores for each experimental
condition are based on the observed phosphorylation in the neighborhood
of each protein and are assigned a significance by a permutation scheme.
The scores can either be analyzed directly, to find proteins with
differentially changing signaling functionality, or utilized in a
second step of the PHOTON pipeline, in which signaling pathways that connect
the proteins with significant signaling functionality are
automatically reconstructed from the large-scale network.[17]
The Perseus network module allows for
performing both KSEA and
PHOTON analysis on the same experimental data[17] and a choice of networks.[18,34] When applied to a phosphoproteomics
dataset of EGF stimulation,[17] the trade-offs
of both methods in terms of coverage can be compared at every step
of the analysis by inspecting the matrices and network collections
in the workflow. For KSEA, 583 (5.66%) phosphorylation sites could
be mapped to 975 (9.43%) site-specific kinase–substrate interactions.
As expected, PHOTON provided more coverage, with 9148 (99.82%) sites
mapped to 2070 (16.87%) nodes in the PPI network. Due to the differences
in the utilized methodologies and the chosen networks, resulting scores
will differ but can be compared with the analyses and visualizations
provided by Perseus (Figure 6b). While KSEA is tailored to the analysis of phosphoproteomics
data due to its focus on kinase–substrate interactions, PHOTON
is not limited to phosphorylation. Any quantitative, large-scale PTM
dataset can be mapped to the PPI network, signaling functionality
scores can be calculated, and sub-networks can be reconstructed.
Both tools support the analysis of datasets with multiple conditions,
effectively transforming the peptide-level phosphorylation data into
protein-level scores. The entire well-established toolset for the
analysis of protein quantification data can be applied to these scores,
including hierarchical clustering, enrichment analysis,[17] and time-series analysis. To visualize PTM data
in the context of any network, we implemented an interactive visualization
directly in Perseus (Figure 6c) using the cytoscape.js library.[31] The visualization allows for the joint visual inspection of the
networks, e.g., sub-networks reconstructed by PHOTON, and the measured
data. Browsing the quantitative PTM data in a reduced and highly structured
network view while also considering the signaling functionality scores
allows for the generation of hypotheses that explain the signal transduction
mechanistically.
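The general idea of turning site-level changes into protein-level scores from the network neighborhood, with significance assigned by permutation, can be sketched in Python. The example below is a deliberately simplified illustration and not the PHOTON algorithm: it assumes that the score of a protein is the plain mean of the values measured on its neighbors and that the null distribution is obtained by shuffling values across measured proteins. All names and values are hypothetical.

```python
import random
from statistics import mean


def neighborhood_scores(adjacency, values, n_perm=1000, seed=0):
    """Return {protein: (score, empirical p-value)}.

    adjacency: mapping protein -> list of neighboring proteins.
    values: mapping protein -> summarized PTM change measured on that protein.
    The p-value is the fraction of permutations whose neighborhood mean is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    measured = sorted(values)
    observed = {}
    for protein, neighbors in adjacency.items():
        neighbor_values = [values[n] for n in neighbors if n in values]
        if neighbor_values:
            observed[protein] = mean(neighbor_values)
    exceed = dict.fromkeys(observed, 0)
    pool = [values[n] for n in measured]
    for _ in range(n_perm):
        shuffled = pool[:]
        rng.shuffle(shuffled)
        permuted = dict(zip(measured, shuffled))
        for protein in observed:
            null = mean(permuted[n] for n in adjacency[protein] if n in permuted)
            if null >= observed[protein]:
                exceed[protein] += 1
    return {p: (observed[p], (exceed[p] + 1) / (n_perm + 1)) for p in observed}


# Toy network: two proteins with contrasting neighborhoods (invented values).
adjacency = {"protein_up": ["a", "b"], "protein_down": ["e", "f"]}
values = {"a": 3.0, "b": 2.5, "c": 0.1, "d": -0.2, "e": -2.0, "f": -3.0}
for protein, (score, p_value) in neighborhood_scores(adjacency, values).items():
    print(protein, score, round(p_value, 3))
```

Because the score is defined for any protein with measured neighbors, coverage is not restricted to kinases, which mirrors the difference in coverage between the two approaches discussed above.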
Co-expression Clustering and Clinical Data
When performing
co-expression analysis, the correlation matrix between the proteins
in the dataset describes a fully connected, weighted network, in which
the weight on each edge denotes the correlation between the quantitative
profiles of the two proteins (Figure 7a). Hence, the actual network usually remains implicit.
A hierarchical clustering of the co-expression network can utilize
the network neighborhood of each protein and integrate it into the
similarity calculation.[45] The cluster dendrogram
and the detected co-expression modules are then transferred back to
the original data, where their interpretation is equivalent to ordinary
hierarchical clustering. In addition to the clustering, a representative
expression profile for each of the clusters is generated, which is
termed the eigengene. This highly reduced view of the data can be correlated
with clinical or phenotype data and clustered to gain a better understanding
of the behavior of the detected clusters (Figure 7b). The described co-expression analysis
is available in Perseus through the R language interface provided
by PluginInterop, which interfaces directly with the established WGCNA
library.[16]
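The notion of an eigengene and its correlation with phenotype data can be made concrete with a small Python sketch. It assumes that the eigengene is the first principal component of the standardized profiles of one module, computed here by power iteration and oriented along the average profile. This is a didactic stand-in written without external libraries; it does not call WGCNA, and the function names, profiles, and trait values are hypothetical.

```python
import math


def standardize(profile):
    """Center a profile and scale it to unit sample standard deviation."""
    n = len(profile)
    m = sum(profile) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in profile) / (n - 1))
    return [(v - m) / sd for v in profile]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_eigengene(profiles, n_iter=200):
    """First principal component over samples for the proteins of one module.

    profiles: list of protein profiles, each a list of values over samples.
    Returns a unit-length vector with one entry per sample.
    """
    z = [standardize(p) for p in profiles]
    n_samples = len(z[0])
    v = z[0][:]  # start vector; must not be orthogonal to the leading component
    for _ in range(n_iter):
        projections = [sum(a * b for a, b in zip(row, v)) for row in z]
        w = [sum(projections[i] * z[i][j] for i in range(len(z))) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    average = [sum(row[j] for row in z) / len(z) for j in range(n_samples)]
    if sum(a * b for a, b in zip(v, average)) < 0:
        v = [-x for x in v]  # orient the eigengene along the average profile
    return v


# Toy module of three co-expressed proteins over five samples, and a binary trait.
module = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [10, 11, 12, 13, 14.5]]
trait = [0, 0, 1, 1, 1]
eigengene = module_eigengene(module)
print([round(x, 3) for x in eigengene])
print(round(pearson(eigengene, trait), 3))
```

One value per sample summarizes the whole module, so a table of module-by-trait correlations remains small even for datasets with thousands of proteins.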
Figure 7
Co-expression network analysis on clinical
data. (a) The correlation
matrix is an equivalent representation of a fully connected network
with edge weights corresponding to the correlation between the proteins.
(b) Co-expression clustering and identified co-expression modules
annotate the original expression matrix. Phenotype data can be correlated
with representative co-expression module profiles and provide a high-level
interpretation of the modules. (c) Parameter selection of the power
parameter for the Yanovich et al.[19] dataset
(Table S3 and Experimental
Section). The lowest power reaching close to a high scale-free
fit index of 0.9 (red line) was selected. (d) Co-expression cluster
dendrogram. Each color corresponds to one co-expression module. (e)
Correlation heat map between module eigengenes and clinical parameters.
We applied the WGCNA co-expression
analysis to parts of a cancer
proteomics dataset[19] following the recommended
workflow (http://www.peterlangfelder.com/wgcna-resources-on-the-web/)
from within Perseus. Bi-weight midcorrelation, a robust alternative
to Pearson correlation, was chosen to calculate correlations between
all pairs of proteins. In order to obtain a scale-free co-expression
network, a power parameter of 10 was selected (Figure 7c), leading to an approximately scale-free
network with a scale-free fit index of 0.9. Hierarchical clustering
of the co-expression network identified 30 modules (Figure 7d). The representative expression
profiles of each of the modules, as provided by the corresponding
module eigengene, were correlated with the available clinical annotations.
This high-level overview of the data was then visualized in a heatmap
(Figure 7e). Several
modules showed high correlations with specific clinical annotations.
The magenta module showed high correlation with the triple-negative
subtype (TN) and was significantly enriched for the ‘interferon-gamma-mediated
signaling pathway’ GO category (q = 1.12 ×
10⁻⁵). The top module hub genes with kME > 0.8
were GBP1, TAP1, TAPBP, HLA-A, TAP2, STAT1, and EML4. The purple module
showed high correlation with Stage III, but when inspecting the profile
of its eigengene, we found it to have a single peak at one patient
while being flat for all others. Hence, a set of proteins that are
highly expressed in only a single patient dominates the purple module,
thereby limiting the validity of the module.
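To illustrate why a robust correlation measure is attractive for such data, the following Python sketch implements the bi-weight midcorrelation from its standard definition: values are centered on the median, scaled by nine times the median absolute deviation, and down-weighted with the weight (1 − u²)² for |u| < 1 and zero otherwise. The implementation is a plain-Python approximation for illustration rather than the WGCNA routine, and the two profiles are invented.

```python
import math
from statistics import mean, median


def _weighted_deviations(x):
    """Median-centered values multiplied by their bi-weight."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    out = []
    for v in x:
        u = (v - med) / (9.0 * mad)
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        out.append((v - med) * weight)
    return out


def bicor(x, y):
    """Bi-weight midcorrelation of two equally long profiles."""
    a, b = _weighted_deviations(x), _weighted_deviations(y)
    numerator = sum(p * q for p, q in zip(a, b))
    denominator = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return numerator / denominator


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    mx, my = mean(x), mean(y)
    numerator = sum((p - mx) * (q - my) for p, q in zip(x, y))
    denominator = math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))
    return numerator / denominator


# Two toy profiles that are anti-correlated except for one extreme sample.
x = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 10.0]
y = [0.2, 0.1, 0.3, 0.15, 0.25, 0.2, 0.18, 5.0]
print("Pearson:", round(pearson(x, y), 3))
print("bicor:  ", round(bicor(x, y), 3))
```

In this toy example a single extreme sample drives the Pearson correlation to a strongly positive value, while the bi-weight midcorrelation gives that sample zero weight and reports the negative relationship among the remaining samples. Robust pairwise correlations do not by themselves protect a module eigengene from being dominated by one sample, so inspecting eigengene profiles remains advisable.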
Software Implementation, Download, and Maintenance
The Perseus network module PerseusNet
is implemented in the C# programming
language using Visual Studio 2017, like the whole Perseus software.
PerseusNet is distributed with Perseus by default and can be downloaded
from http://www.perseus-framework.org. The current version, which is described in this Article, is 1.6.2.3.
The PluginInterop and PHOTON plugins are also included in the standard
download. In the current release, it is recommended to use Windows
as the operating system, although Linux support is underway, realized
in the same way as for the MaxQuant software,[46,47] by ensuring Mono compatibility. A plugin API enables external programmers
to extend the functionality of PerseusNet and Perseus in general,
by programming their own workflow activities. Plugin extensions by
the user community will be linked from the plugin store at http://www.coxdocs.org/doku.php?id=perseus:user:plugins:store upon request. Context-specific documentation is linked from each
activity (Figure S3). Step-by-step guides
for the integration of external tools, such as Python or R, that have
to be installed and configured separately from the main Perseus software,
are available online (https://github.com/jdrudolph/PluginInterop). A help forum for Perseus and PerseusNet is available at https://groups.google.com/group/perseus-list. Bugs that are reproducible in the latest available software version
should be reported at https://maxquant.myjetbrains.com/youtrack. All presented analyses and necessary installations take less than
an hour altogether.
Conclusions
We introduced PerseusNet,
the network analysis extension for the
Perseus software. It enables proteomics researchers to perform most
network analyses by themselves. PerseusNet is highly extensible through
a plugin API and its extension to R and Python, which allows for the
incorporation of a plethora of existing scripts and programs from
the network community. We envision that a large part of the future programming
will be done not by local developers but by the global community through
the plugin API. Programmers can release their plugins under licenses
of their choice.
We have implemented powerful proteomics-specific
activities for
AP-MS network generation and PTM-related network analysis, presumably
the two main applications of networks in proteomics. We plan to
extend PerseusNet in the near future with activities from other proteomics
sub-domains, such as interaction determination by protein correlation profiling[48] and large-scale network generation from cross-linking
experiments on whole-cell lysates.[49]