Suruchi Aggarwal1,2,3, Ajay Kumar1, Shilpa Jamwal1, Mukul Kumar Midha1, Narayan Chandra Talukdar2,3, Amit Kumar Yadav1. 1. Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad 121001, Haryana, India. 2. Division of Life Sciences, Institute of Advanced Study in Science and Technology, Vigyan Path, Paschim Boragaon, Garchuk, Guwahati, Assam 781035, India. 3. Department of Molecular Biology and Biotechnology, Cotton University, Panbazar, Guwahati, Assam 781001, India.
Abstract
Quantitative proteomics has evolved considerably over the last decade with the advent of higher order multiplexing (HOM) techniques. With the development of methods such as multitagging, cPILOT, hyperplexing, BONPlex, and MITNCAT, the HOM technique is rapidly taking center stage in multiplexed quantitative proteomics. These methods combine MS1 and MS2 labels in a single experiment, enabling higher sample throughput. While HOM is highly promising, the computational analysis is still a big challenge, as the available tools cannot harness its power completely. We have developed a new quantitative pipeline, HyperQuant, to aid in accurately quantitating complex HOM data. The pipeline uses identification results from either MaxQuant or any other search engine and quantitation results from QuantWizIQ. The Mapper and Combiner modules of HyperQuant allow facile integration of the labeled data and peptide spectrum match (PSM) intensity/ratio integration for proteins, respectively, for each PSM label combination. This also includes appropriate combination of replicates/fractions before summarizing the protein intensity/ratio, leading to robust quantitation. To the best of our knowledge, this is the first tool for the quantitation of HOM data with flexibility for any combination of MS1 and MS2 labels. We demonstrate its utility in analyzing two 18-plex data sets from the hyperplexing and the BONPlex studies. The tool is open source and freely available for noncommercial use. HyperQuant is a highly valuable tool that will help in advancing the field of multiplexed quantitative proteomics.
Proteomics has enabled
the high throughput study of cellular systems
to uncover the mechanisms regulating cellular health and disease.
Understanding the cellular signaling networks or perturbations to
different types of stimuli requires robust and reproducible quantitation
at a large scale. Quantitative proteomics has made it possible to
identify as well as quantify proteins from multiple conditions in
a single run.[1] Protein quantitation in
shotgun proteomics is carried out using metabolic or chemical labeling.[2] Metabolic labeling of proteins with SILAC (stable
isotope labeling of amino acids in cell culture) replaces essential
amino acids in the cell culture with their stable isotope-labeled
counterparts (such as heavy lysine or arginine).[3] The separate cultures from normal (light) and labeled (heavy)
samples are subsequently mixed in equal amounts, digested, and analyzed
by LC–MS/MS. The peptides from the two samples are reflected
in MS1 spectra as pairs separated by known mass differences
between light and heavy peptides. On sequencing in MS/MS, the peptides
are identified by database search followed by FDR control,[4] while their MS1 intensities are a
proxy for their relative quantitation. Because the cell
cultures are mixed early in the workflow, SILAC is the most accurate technique
for quantitation. However, biological samples cannot always be labeled
in cell culture and require chemical labeling. In chemical labeling
techniques such as iTRAQ[5] (isobaric tags
for relative and absolute quantitation) or TMT[6] (tandem mass tag), digested peptides from two to sixteen samples
are labeled with different variants of isobaric tags that label the
N-termini and the free amino group on lysine residues.[7] The isobaric tags increase the mass of peptides from all
samples equally, and the peaks in MS1 represent a sum of
peptide intensities from all samples. Upon fragmentation in MS/MS,
the unique mass reporter ions from iTRAQ/TMT tags are observed in
the low mass region, while the sequencing peaks are used for identification.
The reporter peaks help in relative quantitation between the samples.[8] While these techniques have made proteome-wide
quantitation possible, strategies that enhance the depth and multiplexing
capacity of quantitation are desirable for systems biology studies.[9]
With the advancements in higher order multiplexing (HOM) technologies
(combining MS1 and MS2 labels), designing a
statistically robust experiment with high sample throughput is now
feasible for studying proteome dynamics at the systems level.[10] Identifying and quantifying proteins from up to 54
conditions in a single run of the mass spectrometer with the help
of HOM considerably reduces the technical variability arising because
of multiple runs.[11] Since the first experiment
performed in 2010, the technology has vastly evolved with different
combinations of metabolic and chemical labeling such as multitagging,[12] cPILOT,[13−17] hyperplexing,[18] SILAC-iTRAQ tails,[19] TMT-SILAC hyperplexing,[20,21] BONPlex,[51] MITNCAT,[22] and mPDP[23] to achieve higher
sample throughput in a single mass spectrometry run. Even though the
technique has been around for almost a decade, the computational analysis
is still lagging behind and is performed with two different searches
and custom scripts for analysis. Using conventional tools, the quantitative
analysis is cumbersome as the individual runs inadvertently summarize
protein quantitation incorrectly. The dual search strategy is performed
to identify peptides as no search engine can search for two modifications
together on one amino acid. To identify peptides labeled
with both MS1 and MS2 labels, a modified MS1 search (including the mass of the MS2 label) is conducted, which
provides a summed MS1 quantitation (Figure A). For quantitation of MS2 labels, another search is performed for the MS2 label combined
with the light MS1 label, and custom scripts are written to derive
the MS2 quantitation of the heavier MS1 combinations
(Figure B). The downstream
analysis becomes a challenge that requires custom analysis suited
to each experiment type for the correct biological interpretation
or designing ingenious experiments to evade the need for quantitation
from one of the labels, limiting the capacity of HOM.[10] For example, in the 18-plex hyperplexing study, the SILAC
labels were used for biological replicates, while TMT labels were
used for time points.[18] Two database searches
were conducted, one for each label as a fixed modification, and
quantitation was performed using the in-house program Vista.[24] The use of two types of chemical labels in cPILOT
allowed it to be useful for biological and clinical samples that cannot
be metabolically labeled. The protein ratios in the cPILOT studies
were calculated with the help of an internal standard reporter ion,
thus requiring one of the reporters to be used as an internal standard.[15] In the TMT-SILAC hyperplexing study, the SILAC labels
were used to study protein kinetics over time points labeled with TMT
labels. Two searches were performed, one with SILAC as a fixed label,
yielding SILAC ratios, and one with only TMT labels. A kinetic model
based on reporter ion intensities was used to calculate protein
synthesis and degradation.[20] In the MITNCAT
study, protein dynamics were studied with TMT labels at specific
time points, while SILAC and AHA labels were used exclusively for
protein selection in a time window.[22]
Figure 1
Conceptual
comparison of data analysis performed by current tools
versus HyperQuant. Current tools cannot quantitate the HOM data correctly.
Manipulating label definitions in search tools (SILAC-Light, SILAC-Medium,
and SILAC-Heavy) can help in the identification of PSMs but not correct
protein quantitation whether MS1-based search (SILAC search)
or MS2-based search (iTRAQ/TMT) is used. (A) MS1 quantitative search will combine the quantitative contribution from
all six time points of each SILAC into the respective MS1 peaks (L/M/H), as depicted by the colored boxes (light blue, light
red, and light yellow). (B) MS2 search only allows identification
and quantitation of SILAC light-based reporters. The SILAC medium
and SILAC heavy reporters will not be identified or quantified. (C)
HyperQuant combines data from A with reporter quantitation from QuantWizIQ to correctly carry out quantitation across each SILAC-iTRAQ
combination separately.
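The deconvolution described for panel C can be illustrated with a minimal Python sketch. The record layout, the field names (`scan`, `protein`, `silac`), the reporter names (`r1`, `r2`), and the function `map_psms` are hypothetical; they are not HyperQuant's actual input format or API. The sketch only shows how joining the SILAC state of an identified PSM with its reporter intensities yields one value per MS1 x MS2 label combination.

```python
from typing import Dict, List, Tuple

def map_psms(identifications: List[dict],
             reporter_quant: Dict[int, Dict[str, float]]
             ) -> List[Tuple[str, str, str, float]]:
    """Join PSM identifications with reporter intensities by scan number.

    Returns (protein, silac_channel, reporter_channel, intensity) rows, so
    that every MS1 x MS2 label combination stays separate.
    """
    rows = []
    for psm in identifications:
        reporters = reporter_quant.get(psm["scan"])
        if reporters is None:  # PSM without reporter quantitation is skipped
            continue
        for channel, intensity in sorted(reporters.items()):
            rows.append((psm["protein"], psm["silac"], channel, intensity))
    return rows

# Hypothetical toy data: two PSMs of one protein in different SILAC states
ids = [
    {"scan": 101, "protein": "P1", "silac": "L"},
    {"scan": 102, "protein": "P1", "silac": "H"},
]
quant = {
    101: {"r1": 10.0, "r2": 20.0},
    102: {"r1": 5.0, "r2": 40.0},
}
rows = map_psms(ids, quant)
```

In this toy case the protein ends up with four separate values (light and heavy, each with two reporters) instead of one summed MS1 value per SILAC state.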
Despite the obvious benefits
of the HOM technique in providing reproducible,
cost-efficient, and time-efficient study designs, it is not yet fully exploited
as a regular proteomic workflow because of challenges in computational
analysis and mixing of quantitation from MS1 and MS2 labels.[10] To enable easy and flexible
analysis of data from HOM studies, we have designed a computational
pipeline—HyperQuant, which is the first tool that collates,
deconvolutes labels, summarizes, and calculates normalized protein
intensities or ratios for each condition (Figure C). The normalized intensities can be used
to calculate relative protein ratios from any combination of metabolic
or chemical labels as per the experiment to facilitate uncomplicated
biological interpretations. HyperQuant is flexible enough to use the
results from any search engine provided in a simple predefined text
format making it interoperable with many search engines. The peptide
and protein quantitation is performed by combining data from multiple
replicates and can output the summarized protein intensity. HyperQuant
has several advantages apart from experiment-agnostic integration
of HOM data. It also rescues proteins with a low number of peptide
spectrum matches (PSMs) in individual replicates: because PSMs are combined
before outlier removal or filtering, one-hit wonders that occur consistently
across replicates are retained. Apart from rescuing protein identifications,
increasing the number of peptide data points can enhance the statistical
confidence in the protein quantitation. Poor (or highly variable)
quantitation values across replicates are removed during
outlier removal. HyperQuant also provides normalized intensities for each
sample used in the experiment, thus giving users the flexibility to
choose any denominator for ratio calculation. To demonstrate the
utility of the pipeline, we reanalyzed the 18-plex hyperplexing
data from Dephoure and Gygi[18] and compared
the HyperQuant results with the results from their study.
We also
analyzed our previously published data of “hyperplexing
and click chemistry” (later named BONPlex),[51,10,25] investigating the effect of mycobacterial
infection on human macrophage secretome using a combination of pSILAC,
AHA, and iTRAQ. Although the macrophage secretome has previously been
studied using similar techniques by Eichelbaum et al.[26,27] (pSILAC with AHA) and Khan et al.[28] (targeted proteomics), these studies used either LPS or attenuated
bacteria to create an immune response. The sample multiplexing was
merely 2–3 plex and involved multiple 2-plex experiments for
temporal study.[27] Our study exploited AHA
with the power of the HOM technique to reduce the need for multiple
experiments and run-to-run variability.[51] The study was also unique for using clinical and lab strains of
Mtb instead of LPS, thus providing more realistic insights into the
elicited immune response. The experiment focused only on newly synthesized
secreted proteins and their temporal profile upon infection. Quantifying
the secretome can aid in understanding the dynamics of the host–pathogen
crosstalk. With the help of HyperQuant, strain specific as well as
temporal changes in the host secretome were quantitated demonstrating
its utility in a truly HOM experiment.
Results & Discussion
The novel design of an HOM experiment allows two types of labels
for two quantitative dimensions (MS1 and MS2) in one MS run. This can be conceptually represented as a 2-D matrix
to explain two types of labeling (Figure A,B). To ensure the complete biological understanding
of the conditions labeled, the quantitation must be performed on both
the dimensions of the matrix. The analysis is a challenge with the
current tools, necessitating multiple searches for quantitating both
labels and custom analysis scripts. This does not completely harness
the power of HOM studies. To exploit the true multiplexing nature
of the technology and to enable easy quantitation without loss of information
on any label, we designed the HyperQuant tool to integrate identification
with quantitation, along with replicate combination. There are two
modules in the HyperQuant tool—Mapper and Combiner. Mapper
aids in mapping of PSM identification and quantification results from
MaxQuant and QuantWizIQ outputs, respectively. Apart from
MaxQuant, users also have the flexibility to use any other identification
engine using a simple text format that specifies PSM and label information.
The Combiner module integrates the PSMs from all replicates and summarizes
the protein intensity/ratio depending on the user choice. The detailed
workflow of the tool is shown in Figure C.
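The Combiner step can be sketched in Python as follows. This is an illustration under stated assumptions, not HyperQuant's implementation: the text does not specify the outlier test or the summary statistic, so a median absolute deviation (MAD) filter and a median summary are assumed here, and the function name `combine` and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import median

def combine(replicates, max_dev=3.0):
    """Pool PSM rows from all replicates, drop outliers, summarize.

    replicates: list of replicates, each a list of
                (protein, ms1_label, ms2_label, intensity) rows.
    Returns {(protein, ms1_label, ms2_label): summarized intensity}.
    """
    pooled = defaultdict(list)
    for rows in replicates:
        for protein, ms1, ms2, intensity in rows:
            pooled[(protein, ms1, ms2)].append(intensity)

    summary = {}
    for key, values in pooled.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad > 0:
            # Assumed rule: keep values within max_dev * MAD of the median
            kept = [v for v in values if abs(v - med) <= max_dev * mad]
        else:
            kept = values
        summary[key] = median(kept)
    return summary

# A protein seen with a single PSM in two replicates and two PSMs in a third
rep1 = [("P1", "L", "r1", 10.0)]
rep2 = [("P1", "L", "r1", 12.0)]
rep3 = [("P1", "L", "r1", 11.0), ("P1", "L", "r1", 100.0)]
result = combine([rep1, rep2, rep3])
```

Because pooling happens before filtering, the protein has four data points available for the outlier test, which is why replicate-level single hits can still contribute to the summary. Here the value 100.0 is discarded and the summarized intensity is 11.0.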
Benchmarking of the HyperQuant Tool Using Hyperplexing Data
We tested the pipeline for identifying
and quantifying the proteome
using the Dephoure and Gygi data.[18] We searched
the 3 × 6 experiment (18-plex) from the Dephoure and Gygi study using MaxQuant,
performed spectrum-level quantitation with QuantWizIQ, and
integrated the results using the HyperQuant pipeline to obtain protein
ratios. For this data set, we used the weighted average method
for calculating protein ratios, as in the original study. We compared
the results obtained from the pipeline with their protein list. Despite
the use of different tools (Sequest[29] vs MaxQuant),
we observed that ∼84% of the identified proteins matched
(Figure A). The numbers
of peptides identified in the light, medium, and heavy SILAC channels
also showed good agreement (Figure B). Our reanalysis identified 394 more proteins, while
100 proteins were missed; we did not analyze these differences further, as we
were only interested in the quantitative comparison of common proteins.
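The reported overlap can be checked with a short calculation. It assumes that the ∼84% refers to the common proteins divided by the union of both protein lists, which is one reading consistent with the numbers given in this section (2645 common proteins, 394 additional, 100 missed); the variable names are illustrative.

```python
common = 2645           # proteins found by both pipelines
only_reanalysis = 394   # additional proteins in the MaxQuant reanalysis
only_original = 100     # proteins from the original list that were missed

union = common + only_reanalysis + only_original
overlap_fraction = common / union
print(f"{union} proteins in total, {overlap_fraction:.1%} shared")
# prints: 3139 proteins in total, 84.3% shared
```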
Figure 2
Benchmarking
of the HyperQuant tool using the public data. (A)
Concordance of proteins identified by the two search engines MaxQuant
(our search) and Sequest (Dephoure and Gygi). (B) For the common proteins,
the scatter plots depict good correlation between the number of peptides
identified for light, medium, and heavy SILAC channels between the
two searches. (C) Overview of HyperQuant benchmarking with results
from our pipeline (x-axis) compared with those from
Dephoure and Gygi data (y-axis). The comparison of
protein quantitation between the two pipelines depicts a high level
of agreement and good correlation between quantitated values despite
the use of different tools.
For the 2645 common proteins between our analysis (MaxQuant + HyperQuant)
and Dephoure and Gygi results (Sequest + Vista), the protein ratios
from the two pipelines were compared. Figure C depicts the comparison of ratios across
the TMT[6] labels for each SILAC channel—light,
medium, and heavy separately. Because the protein ratio method used
here uses peptide numbers as weights, some differences are attributable
to differences in peptide identifications (because of the different search
engines), which alter the quantitation marginally. We still observed
high ratio concordance across all labels. These results suggest that
the HyperQuant pipeline is able to analyze HOM data and can accurately
calculate the quantitative values for every label combination.
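The text states that the protein ratio method uses peptide numbers as weights but does not give the formula. As a minimal sketch, assume the protein ratio is a weighted mean of peptide-level ratios, with the number of PSMs observed for each peptide as its weight. This is an assumption for illustration, not the confirmed HyperQuant or Vista formula, and `weighted_protein_ratio` is a hypothetical name.

```python
def weighted_protein_ratio(peptide_ratios, weights):
    """Weighted mean of peptide-level ratios.

    peptide_ratios: ratio of each peptide for one label combination
    weights: assumed weight per peptide (here, its number of PSMs)
    """
    if len(peptide_ratios) != len(weights):
        raise ValueError("one weight is required per peptide ratio")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(peptide_ratios, weights)) / total

# Same two peptides, but the first is supported by 3 PSMs in one search
# and by 1 PSM in another search
ratio_search_a = weighted_protein_ratio([2.0, 1.0], [3, 1])  # 1.75
ratio_search_b = weighted_protein_ratio([2.0, 1.0], [1, 1])  # 1.5
```

Under this assumption, the same peptide ratios give slightly different protein ratios when the two search engines report different PSM counts, which matches the marginal differences described above.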
Analysis of BONPlex Data
The BONPlex study was designed
to investigate the effect of mycobacterial infection (clinical and
laboratory strains) on the human macrophage secretome. In this study,
BONCAT was used to study the newly translated proteins secreted by
THP-1 human macrophages after mycobacterial infection (Figure A). The second technique, SILAC,
was used in pulsed mode, with medium and heavy labels tagging THP-1 cells
infected with different mycobacterial strains, namely H37Ra (medium)
and H37Rv (heavy) in one set and BND433 (medium) and JAL2287 (heavy)
in the other, each with its respective light SILAC uninfected control.
The pulsed SILAC allowed selective labeling of newly synthesized proteins
in a time window postinfection, while AHA allowed their selective
enrichment (Figure B). Although the light channel contains both pre-existing and
newly synthesized proteins, which would otherwise be indistinguishable,
the BONCAT label allows selective enrichment of the newly synthesized proteins, removing
this quantitation bias. The third technique is isobaric quantitation
with iTRAQ used to label six time points of each infection in the
THP-1 cells (Figure C) from 6 to 26 h at 4 h intervals. The labeled and digested secretome
from the light, medium, and heavy SILAC, each with 6-plex iTRAQ labeling
for time points were mixed, separated by LC, and analyzed by tandem
MS. MaxQuant was used to identify the spectra while the in-house developed
tool QuantWizIQ was used to calculate iTRAQ areas for each
time point. The HyperQuant tool was used to integrate the data from
identification and quantitation followed by replicate combination
and calculation of ratios from the areas.
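The 3 × 6 design described above can be written out explicitly. The sketch below enumerates the 18 label combinations of each set from the strain assignments and time points given in the text; the dictionary layout and the function name `channels` are illustrative only, and the assignment of individual iTRAQ reporters to time points is not stated here, so it is left out.

```python
from itertools import product

# Six time points from 6 to 26 h at 4 h intervals
TIME_POINTS_H = list(range(6, 27, 4))

# SILAC channel -> condition, as described for the two sets
SILAC_ASSIGNMENT = {
    "set1": {"light": "uninfected", "medium": "H37Ra", "heavy": "H37Rv"},
    "set2": {"light": "uninfected", "medium": "BND433", "heavy": "JAL2287"},
}

def channels(set_name):
    """List every (SILAC channel, condition, time point) of one set."""
    assignment = SILAC_ASSIGNMENT[set_name]
    return [(silac, assignment[silac], t)
            for silac, t in product(assignment, TIME_POINTS_H)]

set1 = channels("set1")
set2 = channels("set2")
print(len(set1), len(set2))  # prints: 18 18
```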
Figure 3
Overview of the BONPlex
workflow and labeling. (A) Workflow comprises
selective enrichment of newly synthesized proteins with BONCAT (AHA)
and pulsed SILAC labeling. Two sets of SILAC were used to label THP-1
cells infected with different mycobacterial strains. (B) SILAC and
AHA labels were incorporated (after 1 h of methionine, lysine, and
arginine depletion) in specific time windows postinfection. (C) Cells
were harvested, proteins were digested, and time points were labeled
with iTRAQ reagents as shown. The supernatant was selected for LC–MS/MS,
and the 18-plexed data generated from each run was analyzed using
the HyperQuant pipeline.
NSS Analysis for THP-1 Cells
After removing contaminants
and decoy proteins, we identified and quantified 454 newly synthesized
secretome (NSS) proteins using BONPlex in both the experiments combined
together. We observed 343 proteins in set 1 with laboratory strains
and 213 proteins in set 2 with clinical strains (Table S2). The number of secreted proteins
in infected cells was considerably lower than in uninfected cells
in both sets (Figure A,B). We found that 81, 46, 59, and 36 proteins were secreted
in response to infection with H37Ra (avirulent), H37Rv (virulent),
BND433 (clinical virulent), and JAL2287 (clinical virulent) respectively,
which are hereafter referred to as Ra, Rv, BND, and JAL for the sake
of brevity. We compared the proteins identified in uninfected and
infected THP-1 cells in set 1 (Figure A) and set 2 (Figure B). The majority of the proteins expressed
in the control (246 in set 1 and 144 in set 2) were not expressed upon
infection, hinting at suppression of cellular function, as also
observed in Figure S4A,B. Some proteins
are expressed exclusively in infected cells (34, 17, 24, and 15 proteins
in Ra, Rv, BND, and JAL, respectively, Figure S4A,B). We then combined the NSS proteins in virulent strains
(Rv, BND, and JAL) for comparison against the avirulent Ra as well
as the controls from both the sets (Figure S4C). We observed 27 and 48 proteins to be exclusive to avirulent and
virulent strains, respectively. By observing the quantitation of proteins
identified, we found that the majority of proteins secreted in different
infections are downregulated (Figure C, down + absent in infection & Figure S5A,B). There are some proteins specifically expressed
only under infection conditions (Figure D, up + absent in the control, Figure S6A,B), hinting at infection specific
cellular response (Figure E). Proteins that were not quantitated in any of the infections were not considered
in this analysis. The macrophages probably secrete these proteins to
aid in the initiation of an immune response. Failure to initiate an
immune response leads to apoptosis or necrosis.[30] The mycobacteria, on the other hand, try to block the
signals that can lead to an immune response or to force production
of anti-inflammatory proteins, which may be a fail-safe mechanism in
case some inflammatory signals leak out to the extracellular region.[31] To identify proteins that can mount
an immune response, we queried UniProt for specific immunomodulatory
keywords. We found 9 chemokines, 43 cytokines, 11 laminins, 14 proteases,
6 matrix metalloproteases, and 3 caspases. We also observed that 275
proteins identified were involved in signaling mechanisms. Because we were
studying macrophage responses, we also found proteins involved
in bacterial response (6 proteins), immune response (55 proteins),
or inflammatory response (48 proteins) in our keyword searches (Tables S1 and S3).
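The protein counts reported at the start of this section imply the overlap between the two experiments. The calculation below assumes that the combined figure of 454 counts each protein once across both sets (inclusion-exclusion); the variable names are illustrative.

```python
set1_proteins = 343   # NSS proteins in set 1 (laboratory strains)
set2_proteins = 213   # NSS proteins in set 2 (clinical strains)
combined = 454        # NSS proteins in both experiments combined

# |A and B| = |A| + |B| - |A or B|
shared = set1_proteins + set2_proteins - combined
print(shared)  # prints: 102
```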
Figure 4
Overview of NSS proteins
identified for each strain. (A,B) represent
the number of NSS proteins observed in the experiment with laboratory
(A) and clinical (B) strain infections respectively. (C,D) depict
the number of up- and downregulated proteins along with the unique
proteins in either the infection or the control. (E) Heatmap showing
the strain specific expression of proteins observed in uninfected
and infected cells. The proteins expressed exclusively in control
but not quantitated in any of the infections were excluded from this
analysis.
Common NSS Proteins in All Four Infections
We observed
sixteen proteins expressed in all four infections. With one exception (CSF1R
was absent in the control of set 2), all of these proteins were observed
in the controls as well as in the infections. CSF1R is the macrophage colony
stimulating factor-1 receptor that acts as a cell surface receptor
for CSF1 and IL34 and is important for release of proinflammatory
chemokines.[32] It is a tyrosine kinase that
activates the signaling for MAPK, STAT3, and PIP3 pathways.[33] It was downregulated in Ra, Rv, and BND infections
but was upregulated at the initial time points of JAL infection and
was downregulated by the last time point (Figure S7).
Among the other 15 proteins (Figure S7), all were either downregulated
or unchanged in all infections except for IL1RN and PPT1 in JAL. IL1RN
is the receptor antagonist of IL1 that inhibits the binding of IL1
(α and β) to its receptor and reduces the inflammatory
response. In unstimulated macrophages, it is expressed to prevent
any unwanted inflammatory response due to nonspecific stimuli.[34] During Ra infection, it was highly downregulated
at later time points as the macrophage exhibits a delayed immune response
and increased production of IL1β to contain Ra bacilli. In Rv
infection, however, its secretion levels kept increasing at later
time points, although it did not qualify as differentially expressed
at our threshold criteria (±2 fold change). During BND infection,
it remained downregulated in the secretome for the first few time points
and increased gradually toward the last time point. In JAL infection,
it was overexpressed consistently, demonstrating its heightened capability
to suppress the immune response. PPT1 is a thioesterase responsible
for negative regulation of apoptotic processes in infections. It removes
protein palmitoylation, thus regulating the cellular localization
of substrates to cytosol.[35] It was significantly
downregulated only in one time point (6 h) in Ra. In Rv, the expression
was initially low which then increased to normal levels by 14 h and
dropped back to lower levels again by 26 h. In BND infection, the
secretion behavior was similar to that in Ra infection, with underexpression
at only one time point. However, it was significantly upregulated
in JAL infection.
Immune Regulatory Processes
We searched
the immune-related
pathways in UniProt and Reactome to identify proteins observed in
specific immune processes in our data. The common defense mechanisms
during infections are cytokine production and secretion, interleukin
production, inflammation, antigen processing and presentation, complement
cascade, apoptosis, and autophagy. Several pathways lead to regulation
of one or more defense response pathways that are observed in this
experiment. The detailed integration of these pathways with the NSS
proteins and how they may be regulated in the infections is shown
in Figure , Table S3, and Figure S8. The most common pathway observed during a bacterial infection is
toll-like receptor (TLR) signaling which leads to activation of the
MYD88 pathway for caspase activation.[36,37] CD14 is the
protein that activates TLR4 for the MYD88 pathway. However, we did
not observe CD14 in infection, although it was observed in uninfected
cells.
Figure 5
Summary of the important NSS proteins observed in immune
pathways. Shown are the common pathways regulated during the immune response and
the proteins observed in the secretome of the infected cells. Proteins
involved in immune pathways but only observed in uninfected cells
are not represented here. The green arrows represent the activation
of a process by the protein, and the red arrow represents the inhibition.
While the upright triangle denotes overexpression, the inverted triangle
represents suppression, and the circle denotes no noticeable change
in secretome. The proteins identified in only specific pathways in
UniProt keyword search are represented here. An NSS protein
is considered up- or downregulated overall based
on its expression in more than three time points. For NSS proteins
shown with no change of expression, the change is not significant enough
in three or more time points to be considered up- or downregulated.
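The calling rule in this caption, together with the ±2 fold change threshold stated in the text, can be sketched as follows. The caption is ambiguous about whether three time points are sufficient, so this sketch assumes "at least three" (`min_points=3`); the function name `classify`, the tie handling, and the example values are hypothetical.

```python
def classify(fold_changes, fold=2.0, min_points=3):
    """Call a protein 'up', 'down', or 'unchanged' across time points.

    fold_changes: infected/control ratio at each time point (positive values)
    fold: fold change threshold (2.0 means >= 2x up or <= 0.5x down)
    min_points: assumed minimum number of time points passing the threshold
    """
    if any(fc <= 0 for fc in fold_changes):
        raise ValueError("fold changes must be positive ratios")
    up = sum(1 for fc in fold_changes if fc >= fold)
    down = sum(1 for fc in fold_changes if fc <= 1.0 / fold)
    if up >= min_points and up > down:
        return "up"
    if down >= min_points and down > up:
        return "down"
    return "unchanged"

# Hypothetical profiles over six time points
call_up = classify([2.5, 3.0, 2.1, 1.2, 0.9, 1.0])         # "up"
call_down = classify([0.4, 0.3, 0.5, 0.45, 1.0, 1.1])      # "down"
call_same = classify([2.5, 0.4, 1.0, 1.0, 1.0, 1.0])       # "unchanged"
```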
MAPK Pathway
The MAPK cascade helps
in cytokine production
during infection and also aids in caspase activation.[38] Several activators of the MAPK cascade, such as PSAP,
CHI3L1, and CSF1R, were expressed in all four infections, but all were underexpressed
except CSF1R in JAL infection, indicating bacterial control in
restricting MAPK signaling. We also observed certain proteins such
as GDF15 (down) and ITGA4 (up) that could activate the MAPK pathway
in avirulent infection. MTUS1, which blocks activation of the MAPK pathway, was also upregulated in avirulent infection,
displaying a dual control
on infected cells (one by the macrophages and the other by the infecting bacteria).
PPEF2, which blocks MAPK, showed upregulated secretion in Rv infection,
while RASGEF1A, which activates RAS and thus MAPK, was also upregulated,
again indicating dual control. In BND, RBPJ was secreted, which activates
the NOTCH pathway, which in turn activates MAPK and NFκB pathways.
This interaction could produce SPP1 (in all infections) and CXCL3
(in Ra infection) cytokines.
AKT/PI3K Pathway
The AKT/PI3K pathway activation leads to downstream regulation of
multiple immune response pathways.[39] It aids in transcription of inflammation pathway
proteins. It directly inhibits apoptosis processes and inhibits the
mTOR pathway. The mTOR pathway inhibits autophagy processes, and in
its absence, autophagy gets activated. Because the avirulent strain
will not be able to block apoptosis of cells to evade infection, many
proteins that activate the AKT pathway, such as CCDC88A, IGFBP3, and
ANGPT2, were observed in the secretome of avirulent infected cells.
AKT itself, which is required to activate growth response pathways,
was observed in the secretome of the uninfected cells. THEM4, a known
inhibitor of the AKT pathway, was observed in avirulent strain
infection. APP and CLSTN1 (both down) were observed in avirulent and
clinical infections, which leads to the activation of apoptotic
pathways. Also, the inhibitors of the apoptosis pathway (CST3, CTSH,
TREM2, and PPT1) were observed in NSS proteins (Figure ). QSOX1, a
known inhibitor of the autophagy process, was downregulated in all
infections. However, in BND infection, TMEM74, which helps in the
activation of the autophagy process, was secreted.
Complement Cascade, Antigen Processing, and Presentation
The complement cascade leads to cell lysis, opsonization, and
inflammatory response.[40] We observed A2M (up) in avirulent
and C4A (up) in Rv infections that assist in regulation of the complement
cascade. Interestingly, there was no known protein observed in clinical
strain infections. We observed B2M (down) in Ra, Rv, and BND infections,
which is known for regulating antigen processing and presentation
in macrophages. B2M is a known target of the mycobacterial ESX secretion
system, which aims to degrade B2M in the ER, so that it cannot be
secreted out for antigen presentation.[41,42] We also observed
HLA-B (JAL), IL4I1, and CENPE (Ra, Rv) that are part of antigen processing
and presentation.
Of the four strains, Ra is not able to block
the necessary pathways of immune response. IL1β was observed
in Ra infection that leads to caspase activation and thus apoptosis.
In Rv infection, we observed proteins pertaining to complement cascade,
MAPK pathways, antigen processing and presentation, and so forth that
could lead to an immune response. In BND infection, proteins that
activate NOTCH and autophagy pathways were observed. In JAL infection,
which is the most aggressive virulent strain among the four, upregulated
NSS proteins were observed in inhibitory pathways of apoptosis and
IL1β with only one exception of CSF1R that activated the MAPK
pathway. Even the antigen processing or cytokines expressed showed
no significant change in secretion, indicating its competence in molding
the immune response to suit its purpose.
Because the aim of this study is to demonstrate the utility of
the HyperQuant pipeline in analyzing HOM study designs, these
examples highlight the advantages of using the tool for analysis of
complex data. Several interesting hypotheses could be generated for
further studies from these data. While we were able to observe many
immune response proteins, we also observed cytoskeletal proteins,
kinases, phosphatases, thiols, and proteases required for housekeeping
functions. Although this analysis was intended as a proof of concept
for the tool being able to analyze a truly HOM study with 18 different
conditions in one run, several immunomodulatory pathways were observed,
which can be a highly useful resource for the TB community.
Conclusions
Several higher order multiplexing studies have been applied to
interesting biological problems over the past decade.[1,10] The advances in HOM techniques, however, outpaced the associated
computational developments for analysis, which prevented its widespread
adoption. The number of HOM studies has recently been increasing
rapidly, creating a need for computational tools that allow labs
without computational expertise to exploit this technique.
With the development of HyperQuant, we aim to fill this gap and
facilitate easy analysis of data from HOM studies, allowing a true
hyperplexing capacity for quantitation. We have also shown its utility
by reanalyzing 18-plex data from the hyperplexing study and the BONPlex
study. By summarizing the individual label combinations using areas or
ratios, HyperQuant allows flexible analysis designs for any type of HOM
data. With increased multiplexing from novel HOM study designs,
HyperQuant is already suited to analyze data beyond 18-plex. For examples
of possible multiplexing using HOM, the readers are referred to Table
1 of Aggarwal et al. 2019.[10] The tool is
open source, platform independent, freely available, and allows flexibility
in analysis of complex HOM datasets for diverse experimental designs.
By exploiting the HOM technique to include replicates and/or biological
conditions in the same run, researchers can achieve better data fidelity
and robustness, which are usually lacking in underpowered studies.
To the best of our knowledge, this is the first and only tool available
for analyses of HOM data. This will enable researchers to adopt HOM
techniques to plan interesting studies without worrying about complicated
custom data analysis pipelines.
Methods
BONPlex Dataset (PXD004281)
The data were searched against the UniProt human database (downloaded
on 7th June 2014) with cRAP proteins (https://www.thegpm.org/crap) (total 65 745 sequences) and the contaminants added by MaxQuant (1.5.0.30).
The search was performed using MaxQuant[43] with carbamidomethylation
and iTRAQ (N-term) as fixed modifications. Two new
modifications were created in the MaxQuant parameter dictionary for
the combined masses of iTRAQ and SILAC (medium and heavy separately)
on lysine residues. The SILAC search pipeline was used with SILAC light
defined as the mass of iTRAQ, and with SILAC-medium and SILAC-heavy
defined as the new modifications created on lysine. The SILAC masses on
arginine residues were 0, 6, and 10 Da for light, medium, and heavy,
respectively. BONCAT at methionine residues, deamidation of asparagine
and glutamine, and methionine oxidation were used as variable
modifications. The enzyme specificity was trypsin, and the number of
missed cleavages was set to 1. The data were filtered at ≤1% FDR for
PSMs and proteins. The
precursor and fragment mass tolerances were specified as defaults
for the AB Sciex 5600 instrument in MaxQuant. For isobaric quantitation,
the mgf files were converted using msconvert and provided as an input
to an in-house tool QuantWizIQ (https://sourceforge.net/projects/quantwiz/files/IQ) for iTRAQ quantitation (described later), using the sum intensity
method, as described in Aggarwal et al. 2016.[7] Spectra with the reporter ion intensity below 20 were filtered out,
and the sum intensity-based method was used to calculate the iTRAQ
area. The quantitation results thus obtained were merged with identifications
from MaxQuant, using HyperQuant Mapper, and the normalized protein
intensities were summarized using the median intensity method with
the HyperQuant Combiner. The results are provided in Table S2.
Hyperplexing Dataset (Dephoure and Gygi)
The 18-plex data from the hyperplexing study (kindly provided by
N. Dephoure and S. Gygi) were used for benchmarking of the HyperQuant tool.[18] The data in Thermo RAW format were searched
using MaxQuant with search parameters matching those stated in the
article as closely as possible. The data were searched with the triplex
SILAC mode in MaxQuant with carbamidomethylation and TMT at the N-term
as fixed modifications. The SILAC labels were redefined in the dictionary
for lysine as light + TMT, medium + TMT, and heavy + TMT, while
arginine labels were used as is. The data were filtered at ≤1%
FDR. The RAW files from the 18-plex experiment were converted to mzML
format using msconvert. The PSM quantitation from the mzML files was
carried out using QuantWizIQ with the sum intensity
method, as described in Aggarwal et al. 2016.[7] The HyperQuant Mapper
module combined the PSM identifications from MaxQuant with their corresponding
quantitation from QuantWizIQ. The HyperQuant Combiner module
was used to integrate replicates and output raw intensity values,
from which the protein ratios were calculated using the weighted average
method. The protein list and the ratios thus obtained were compared
with the results from the hyperplexing study.
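The Mapper step described above, which pairs each identified PSM with its reporter quantitation, can be pictured as a join on a spectrum identifier. The following Python sketch is only an illustration under stated assumptions: HyperQuant itself is written in Perl, and the field names (raw_file, scan, protein_group), the use of a (raw file, scan number) key, and the function name map_psms are all hypothetical and not taken from the tool.

```python
def map_psms(identifications, quantitation):
    """Attach reporter intensities to identified PSMs (illustrative only).

    identifications: list of dicts, one per PSM that passed the FDR filter.
    quantitation: dict keyed by (raw_file, scan) -> reporter intensities.
    The key choice and field names are assumptions made for this sketch.
    Returns the mapped PSMs and the protein groups left without any
    quantified PSM.
    """
    mapped = []
    for psm in identifications:
        reporters = quantitation.get((psm["raw_file"], psm["scan"]))
        if reporters is not None:
            mapped.append({**psm, "reporters": reporters})
    identified = {psm["protein_group"] for psm in identifications}
    quantified = {psm["protein_group"] for psm in mapped}
    return mapped, sorted(identified - quantified)


# Toy input: three identified PSMs, only two of which have quantitation.
identifications = [
    {"raw_file": "run1", "scan": 101, "protein_group": "P1", "silac": "light"},
    {"raw_file": "run1", "scan": 102, "protein_group": "P1", "silac": "heavy"},
    {"raw_file": "run1", "scan": 250, "protein_group": "P2", "silac": "medium"},
]
quantitation = {
    ("run1", 101): {"126": 1500.0, "127": 1800.0},
    ("run1", 102): {"126": 900.0, "127": 700.0},
}

mapped, unquantified_groups = map_psms(identifications, quantitation)
print(len(mapped), unquantified_groups)
```

In this toy input, protein group P2 has no quantified PSM, so it ends up in the separate list rather than in the mapped results.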
Reporter Quantitation Using the Isobaric Quantitator (QuantWizIQ)
QuantWizIQ is an isobaric quantitation
tool developed in-house for spectrum-level quantitation from the HUPO-PSI
standard mzML data format as well as the de facto standard Mascot generic
format (MGF).[51] It performs fast quantitation of iTRAQ/TMT-labeled
spectra with a low memory footprint. It allows isobaric
quantitation from 4/8-plex iTRAQ and 2/6/10/16-plex TMT labels. The
tool also provides information on MS3 scans in the case of
TMT labels, which HyperQuant can use later for merging identification
and quantitation. It gives users flexibility in the area method used
to perform quantitation, along with options for intensity
normalization and purity correction. It is a platform-independent
command-line tool developed in Perl and freely available for noncommercial
use. Details of the software can be found in the documentation,
along with example data, at https://sourceforge.net/projects/quantwiz/files/IQ/. It is also available through Bioconda[44] (https://anaconda.org/bioconda/quantwiz-iq) and Galaxy-P[45] (http://galaxyp.org/) (courtesy of the Galaxy-P
team).
Protein Quantitation
To integrate the identification
and quantitation results individually for each label combination and
to calculate protein ratios, we developed a command-line tool, HyperQuant,
in the Perl programming language. HyperQuant takes identification
results along with MS/MS quantitation results as input and provides
a list of protein areas after replicate combination. As an example,
the MaxQuant text files were used to select PSM identification results.
For PSMs identified and passed at ≤1% FDR threshold, quantitation
values were extracted from QuantWizIQ results for corresponding
spectra. If no PSM is quantified for a protein, it is reported in
a separate file at this step. This module of the tool is called the
Mapper, which maps the identification and quantitation results. The
Mapper results for all the replicates or a single file (in case of
no replicates) are subsequently provided to another module called
the Combiner. The Combiner creates a dictionary (hash) of each protein
group specified by MaxQuant as a key and information about all the
PSMs in that group as the value. If a protein group is a subset of
another protein group from other replicate(s), the two groups are
combined and the PSMs are merged for these groups as one group (Note S1 and Figure S1). The quantitation values for each label combination are processed
separately (Figure S2). For example, the
intensity values for all the PSMs with the 113-iTRAQ reporter of SILAC
light are used to calculate the standard deviation, and values outside
the ±2 standard deviation range are discarded as outliers. This outlier
removal step ensures that any extreme value is removed during
combination (Note S2). The median intensity
of the remaining values is used as the protein intensity for that
sample. This enables normalization of the PSM intensities to be used
for the protein-normalized intensity. The users can also select peptide
ratios of isobaric labels to summarize protein ratios instead of normalized
intensities. Three methods are available to the user for summarizing
protein intensities/ratios: the median, average, and weighted average
methods (Note S3). The results are reported
with details of scan count, unique peptide count, protein length,
and replicates in which the particular labels were observed along
with their calculated quantitation errors. The postsearch filters
on the number of peptides/replicates for each protein can be used
as per the experimental design or user choice. Decoupling these
analysis intricacies keeps the experimental design flexible and
allows innovative ways to analyze data after quantitative integration,
depending on whether the labels were used as replicates. A demonstration
of quantitative processing is shown in Figure S2. The tool is platform-independent, open-source, and freely
available software for the research community, and it can be downloaded
from https://sourceforge.net/projects/hyper-quant/files/ along with
the example data and detailed documentation enumerating all steps
for analysis.
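The per-label summarization described above (group PSM intensities by protein group and label combination, discard values outside ±2 standard deviations, take the median of the rest) can be sketched as follows. This is a minimal Python illustration and not the HyperQuant Perl code. Centering the window on the mean, using the sample standard deviation, and skipping the filter for fewer than three values are assumptions of this sketch; the field and function names are hypothetical.

```python
import statistics
from collections import defaultdict


def remove_outliers(values, n_sd=2.0):
    """Keep values within mean +/- n_sd sample standard deviations.

    Assumption: the window is centered on the mean and uses the sample SD;
    with fewer than three values the list is returned unchanged.
    """
    if len(values) < 3:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def combine(psms):
    """Median intensity per (protein group, SILAC label, reporter) key."""
    grouped = defaultdict(list)
    for psm in psms:
        key = (psm["protein_group"], psm["silac"], psm["reporter"])
        grouped[key].append(psm["intensity"])
    return {key: statistics.median(remove_outliers(vals))
            for key, vals in grouped.items()}


# Toy data: nine PSMs for one label combination (one extreme value),
# plus two PSMs for another combination.
light = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 100.0, 500.0]
psms = [{"protein_group": "P1", "silac": "light", "reporter": "113",
         "intensity": x} for x in light]
psms += [{"protein_group": "P1", "silac": "heavy", "reporter": "113",
          "intensity": x} for x in (50.0, 70.0)]

protein_intensity = combine(psms)
print(protein_intensity)
```

Each label combination is summarized independently, so the extreme value of 500 in the light channel is dropped without affecting the heavy channel.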
Ratio Calculation
For the BONPlex study, the ratio is calculated
separately for infection by different strains, as well as temporally,
for further analysis (Figure S3). The log2 fold change was calculated for each labeled protein with respect
to its corresponding unlabeled counterpart at a given time point for
studying the strain-specific variations in THP-1 responses to infection.
Some proteins were only observed in
infected cells. To observe changes in these proteins, iTRAQ ratios
were calculated for all time points with respect to the first one.
The significance threshold chosen was ±2 fold change, that is,
±1 at log2 scale (also see Note S4 and Table S3). All absent and
zero values (identified in any one channel) were converted either to −6.64
(log2 of 0.01, the minimum quantified value), which denoted
an infinitely small value, or to 6.64 (log2 of 100, the maximum
quantified value) for proteins highly overexpressed or only
expressed in infection conditions.
For hyperplexing data, the peptide ratios
were used for calculating
the weighted average protein ratio, as described earlier.[8] The following formula was used:
where N represents the total
number of quantitated peptides for the protein.
The protein
ratios were calculated from TMT labels, as the SILAC
labels were used as biological replicates in this experiment. The
results provided as the Supporting Information file by the authors were used for comparison with our results.
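The BONPlex log2 fold-change rules described in this section (substituting log2 of 0.01 or log2 of 100 when one side is absent or zero, and calling a change significant at ±1 on the log2 scale) can be sketched in Python as below. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and treating the threshold as inclusive and returning None when both values are missing are assumptions of this sketch.

```python
import math

FLOOR = math.log2(0.01)  # about -6.64, used when the numerator is absent/zero
CEIL = math.log2(100)    # about +6.64, used when the denominator is absent/zero


def log2_fold_change(treated, control):
    """log2(treated / control) with substitution for absent or zero values."""
    treated_missing = treated is None or treated <= 0
    control_missing = control is None or control <= 0
    if treated_missing and control_missing:
        return None  # assumption: nothing to compare
    if treated_missing:
        return FLOOR
    if control_missing:
        return CEIL
    return math.log2(treated / control)


def classify(log2fc, threshold=1.0):
    """Label a change using the +/-1 log2 threshold (inclusive by assumption)."""
    if log2fc is None:
        return "not quantified"
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


for treated, control in [(400.0, 100.0), (120.0, 100.0), (0.0, 100.0), (50.0, None)]:
    fc = log2_fold_change(treated, control)
    print(treated, control, round(fc, 2), classify(fc))
```

A protein seen only in the infected condition therefore receives the ceiling value and is classified as upregulated, mirroring the "only expressed in infection conditions" case in the text.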
Gene Ontology/Keyword Search/Pathway Analysis
For BONPlex data, we used a peptide cut-off of ≥3 in individual
conditions/biological samples and used the GeneCodis server for gene ontology[46] (Table S3). The data
was filtered at ≤1% Bonferroni FDR. Because the study was on
immune cells, several keywords based on immune response, as elucidated
in the literature,[26−28,47,48] were searched in UniProt[49] to fetch
lists of proteins. The lists were then compared to the proteins identified
in this experiment. The lists were generated using the following keywords:
cytokine, chemokine, interferon, interleukin, defense response, caspase,
growth factor, signaling, secretory, proteases, and matrix metalloproteases[26,27] (Tables S1 and S3). The proteins associated
with immune response pathways were also queried in UniProt, and
the resulting list was matched with strain-specific changes in proteins.
These pathways included the TLR-4, MAPK, AKT/PI3K, NOTCH, JAK/STAT, WNT,
RAS, TEK/TIE2, IGF, and NF-κB pathways (Table S3). We also fetched gene ontology categories pertaining
to defense response to bacteria, inflammation, complement
cascade, cytokine production, interleukin secretion, antigen processing
and presentation, apoptosis, and autophagy pathways, as suggested in
the cited literature.[26−28,47,48] The proteins were queried in the Reactome[50] database, and the proteins related to immune response pathways were
observed (Figure S8).