HyperQuant-A Computational Pipeline for Higher Order Multiplexed Quantitative Proteomics.

Suruchi Aggarwal¹,²,³, Ajay Kumar¹, Shilpa Jamwal¹, Mukul Kumar Midha¹, Narayan Chandra Talukdar²,³, Amit Kumar Yadav¹.

Abstract

Quantitative proteomics has evolved considerably over the last decade with the advent of higher order multiplexing (HOM) techniques. With the development of methods such as multitagging, cPILOT, hyperplexing, BONPlex, and MITNCAT, HOM is rapidly taking center stage in multiplexed quantitative proteomics. These studies combine MS1 and MS2 labels in a single experiment, enabling higher sample throughput. While HOM is highly promising, the computational analysis remains a major challenge, as the available tools cannot harness its power completely. We have developed a new quantitative pipeline, HyperQuant, to aid in accurately quantitating complex HOM data. The pipeline uses identification results from either MaxQuant or any other search engine and quantitation results from QuantWizIQ. The Mapper and Combiner modules of HyperQuant allow facile integration of the labeled data, along with peptide spectrum match (PSM) intensity/ratio integration for proteins, respectively, for each PSM label combination. This also includes appropriate combination of replicates/fractions before summarizing the protein intensity/ratio, leading to robust quantitation. To the best of our knowledge, this is the first tool for the quantitation of HOM data with flexibility for any combination of MS1 and MS2 labels. We demonstrate its utility in analyzing two 18-plex data sets from the hyperplexing and the BONPlex studies. The tool is open source and freely available for noncommercial use. HyperQuant is a highly valuable tool that will help in advancing the field of multiplexed quantitative proteomics.

Year: 2020 PMID: 32455206 PMCID: PMC7240821 DOI: 10.1021/acsomega.0c00515

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Proteomics has enabled the high throughput study of cellular systems to uncover the mechanisms regulating cellular health and disease. Understanding the cellular signaling networks or perturbations to different types of stimuli requires robust and reproducible quantitation at a large scale. Quantitative proteomics has made it possible to identify as well as quantify proteins from multiple conditions in a single run.[1] Protein quantitation in shotgun proteomics is carried out using metabolic or chemical labeling.[2] Metabolic labeling of proteins with SILAC (stable isotope labeling of amino acids in cell culture) replaces essential amino acids in the cell culture with their stable isotope-labeled counterparts (such as heavy lysine or arginine).[3] The separate cultures from normal (light) and labeled (heavy) samples are subsequently mixed in equal amounts, digested, and analyzed by LC–MS/MS. The peptides from the two samples are reflected in MS1 spectra as pairs separated by known mass differences between light and heavy peptides. On sequencing in MS/MS, the peptides are identified by database search followed by FDR control,[4] while their MS1 intensities are a proxy for their relative quantitation. Because of mixing the cell cultures early in the workflow, it is the most accurate technique for quantitation. However, biological samples cannot always be labeled in cell culture and require chemical labeling. In chemical labeling techniques such as iTRAQ[5] (isobaric tags for relative and absolute quantitation) or TMT[6] (tandem mass tag), digested peptides from two to sixteen samples are labeled with different variants of isobaric tags that label the N-termini and the free amino group on lysine residues.[7] The isobaric tags increase the mass of peptides from all samples equally, and the peaks in MS1 represent a sum of peptide intensities from all samples. 
Upon fragmentation in MS/MS, the unique mass reporter ions from iTRAQ/TMT tags are observed in the low mass region, while the sequencing peaks are used for identification. The reporter peaks help in relative quantitation between the samples.[8] While these techniques have made proteome-wide quantitation possible, strategies that enhance the depth and multiplexing capacity of quantitation are desirable for systems biology studies.[9] With the advancements in higher order multiplexing (HOM) technologies (combining MS1 and MS2 labels), designing a statistically robust experiment with high sample throughput is now feasible for studying proteome dynamics at systems level.[10] Identifying and quantifying proteins up to 54 conditions in a single run of the mass spectrometer with the help of HOM considerably reduces the technical variability arising because of multiple runs.[11] Since the first experiment performed in 2010, the technology has vastly evolved with different combinations of metabolic and chemical labeling such as multitagging,[12] cPILOT,[13−17] hyperplexing,[18] SILAC-iTRAQ tails,[19] TMT-SILAC hyperplexing,[20,21] BONPlex,[51] MITNCAT,[22] and mPDP[23] to achieve higher sample throughput in a single mass spectrometry run. Even though the technique has been around for almost a decade, the computational analysis is still lagging behind and is performed with two different searches and custom scripts for analysis. Using conventional tools, the quantitative analysis is cumbersome as the individual runs inadvertently summarize protein quantitation incorrectly. The dual search strategy is performed to identify peptides as no search engine can search for two modifications together on one amino acid. For identification of peptides labeled with both MS1 and MS2, a modified MS1 search (including mass of MS2 label) is conducted which provides a summed up MS1 quantitation (Figure A). 
For quantitation of MS2 labels, another search is performed for MS2 combination with lighter MS1, and custom scripts are written to derive MS2 quantitation of heavier MS1 combinations (Figure B). The downstream analysis becomes a challenge that requires custom analysis suited to each experiment type for the correct biological interpretation or designing ingenious experiments to evade the need for quantitation from one of the labels, limiting the capacity of HOM.[10] For example, in the 18-plex hyperplexing study, the SILAC labels were used for biological replicates, while TMT labels were used for time points.[18] Two database searches were conducted—one for each label as fixed modification and quantitation was performed using in-house program Vista.[24] The use of two types of chemical labels in cPILOT allowed it to be useful for biological and clinical samples that cannot be metabolically labeled. The protein ratios in the cPILOT studies were calculated with the help of an internal standard reporter ion, thus requiring one of the reporters to be used as an internal standard.[15] In TMT-SILAC hyperplexing study, the SILAC labels were used to study the kinetics of protein quantitation over time points labeled by TMT labels. Two searches were performed, one with SILAC as a fixed label, yielding SILAC ratios and one with only TMT labels. A kinetics model was calculated based on reporter ion intensities to calculate protein synthesis and degradation.[20] In MITNCAT study, the protein dynamics was studied with the TMT labels at specific time points, while SILAC and AHA labels were used exclusively for protein selection in a time window.[22]

Figure 1

Conceptual comparison of data analysis performed by current tools versus HyperQuant. Current tools cannot quantitate the HOM data correctly. Manipulating label definitions in search tools (SILAC-Light, SILAC-Medium, and SILAC-Heavy) can help in the identification of PSMs but does not yield correct protein quantitation, whether an MS1-based search (SILAC search) or an MS2-based search (iTRAQ/TMT) is used. (A) MS1 quantitative search will combine the quantitative contribution from all six time points of each SILAC into the respective MS1 peaks (L/M/H), as depicted by the colored boxes (light blue, light red, and light yellow). (B) MS2 search only allows identification and quantitation of SILAC light-based reporters. The SILAC medium and SILAC heavy reporters will not be identified or quantified. (C) HyperQuant combines data from A with reporter quantitation from QuantWizIQ to correctly carry out quantitation across each SILAC-iTRAQ combination separately.

Despite the obvious benefits of the HOM technique in providing reproducible, cost-efficient, and time-efficient study designs, it is not yet fully exploited as a regular proteomic workflow because of challenges in computational analysis and mixing of quantitation from MS1 and MS2 labels.[10] To enable easy and flexible analysis of data from HOM studies, we have designed a computational pipeline, HyperQuant, which is the first tool that collates, deconvolutes labels, summarizes, and calculates normalized protein intensities or ratios for each condition (Figure C). The normalized intensities can be used to calculate relative protein ratios from any combination of metabolic or chemical labels as per the experiment to facilitate uncomplicated biological interpretations. HyperQuant is flexible enough to use the results from any search engine provided in a simple predefined text format, making it interoperable with many search engines.
The peptide and protein quantitation is performed by combining data from multiple replicates and can output the summarized protein intensity. HyperQuant has several advantages apart from experiment-agnostic integration of HOM data. It also rescues proteins with a low number of peptide spectrum matches (PSMs) in individual replicates, as combining PSMs before outlier removal or filtering saves consistently occurring replicate-level one-hit wonders. Apart from rescuing protein identifications, increasing the number of peptide data points can enhance the statistical confidence in the protein quantitation. The poorer (or highly variable) quantitation values across replicates will be removed during outlier removal. HyperQuant also provides normalized intensities for each sample used in the experiment, thus giving users the flexibility to choose any denominator for ratio calculation. To demonstrate the utility of the pipeline, we have reanalyzed the 18-plex hyperplexing data from Dephoure and Gygi[18] and compared HyperQuant results with the results from their study. We also analyzed our previously published data of “hyperplexing and click chemistry” (later named BONPlex),[51,10,25] investigating the effect of mycobacterial infection on human macrophage secretome using a combination of pSILAC, AHA, and iTRAQ. Although macrophage secretome has been previously studied using similar techniques by Eichelbaum et al.[26,27] using pSILAC with AHA and Khan et al.[28] using targeted proteomics, these studies used either LPS or attenuated bacteria to create an immune response. The sample multiplexing was merely 2–3 plex and involved multiple 2-plex experiments for temporal study.[27] Our study exploited AHA with the power of the HOM technique to reduce the need for multiple experiments and run-to-run variability.[51] The study was also unique for using clinical and lab strains of Mtb instead of LPS, thus providing more realistic insights into the elicited immune response. 
The experiment focused only on newly synthesized secreted proteins and their temporal profile upon infection. Quantifying the secretome can aid in understanding the dynamics of the host–pathogen crosstalk. With the help of HyperQuant, strain specific as well as temporal changes in the host secretome were quantitated demonstrating its utility in a truly HOM experiment.
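
The rescue of replicate-level one-hit wonders mentioned above can be illustrated with a minimal Python sketch. The record layout, the function names, and the two-PSM threshold are assumptions chosen for illustration; they are not taken from the HyperQuant source code.

```python
from collections import defaultdict

# Hypothetical PSM records: (replicate, protein, peptide).
psms = [
    ("rep1", "P1", "PEPTIDEA"),
    ("rep2", "P1", "PEPTIDEB"),
    ("rep3", "P1", "PEPTIDEA"),
    ("rep1", "P2", "PEPTIDEC"),
    ("rep1", "P2", "PEPTIDED"),
]

MIN_PSMS = 2  # assumed filter threshold, for illustration only


def filter_then_combine(records, min_psms=MIN_PSMS):
    """Apply the PSM-count filter inside each replicate, then merge."""
    per_replicate = defaultdict(int)
    for replicate, protein, _peptide in records:
        per_replicate[(replicate, protein)] += 1
    return {prot for (_rep, prot), n in per_replicate.items() if n >= min_psms}


def combine_then_filter(records, min_psms=MIN_PSMS):
    """Pool PSMs from all replicates first, then apply the filter."""
    pooled = defaultdict(int)
    for _replicate, protein, _peptide in records:
        pooled[protein] += 1
    return {prot for prot, n in pooled.items() if n >= min_psms}
```

In this toy input, protein P1 has a single PSM in each of three replicates, so it fails a per-replicate filter but survives once the PSMs are pooled.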

Results & Discussion

The novel design of an HOM experiment allows two types of labels for two quantitative dimensions (MS1 and MS2) in one MS run. This can be conceptually represented as a 2-D matrix to explain two types of labeling (Figure A,B). To ensure the complete biological understanding of the conditions labeled, the quantitation must be performed on both the dimensions of the matrix. The analysis is a challenge with the current tools, necessitating multiple searches for quantitating both labels and custom analysis scripts. This does not completely harness the power of HOM studies. To exploit the true multiplexing nature of technology and easy quantitation without any loss of information on any label, we designed the HyperQuant tool to integrate identification with quantitation along with replicate combination. There are two modules in the HyperQuant tool—Mapper and Combiner. Mapper aids in mapping of PSM identification and quantification results from MaxQuant and QuantWizIQ outputs, respectively. Apart from MaxQuant, users also have the flexibility to use any other identification engine using a simple text format that specifies PSM and label information. The Combiner module integrates the PSMs from all replicates and summarizes the protein intensity/ratio depending on the user choice. The detailed workflow of the tool is shown in Figure C.
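
As a rough illustration of what the mapping step does conceptually, the Python sketch below joins PSM identifications to reporter-ion quantitation through a shared spectrum key and keeps the MS1 label on every row. The field names (`scan`, `ms1_label`, `reporters`) and the record layout are hypothetical; the actual MaxQuant, QuantWizIQ, and HyperQuant file formats are not reproduced here.

```python
def map_psms(identifications, quantitations):
    """Join identifications with reporter intensities on the scan key.

    Returns one row per (PSM, MS2 reporter) pair so that every
    MS1-MS2 label combination is kept as a separate value.
    """
    quant_by_scan = {q["scan"]: q["reporters"] for q in quantitations}
    rows = []
    for ident in identifications:
        reporters = quant_by_scan.get(ident["scan"])
        if reporters is None:
            continue  # identified but not quantified: skipped in this sketch
        for ms2_label, intensity in reporters.items():
            rows.append({
                "protein": ident["protein"],
                "peptide": ident["peptide"],
                "ms1_label": ident["ms1_label"],
                "ms2_label": ms2_label,
                "intensity": intensity,
            })
    return rows


# Toy input with hypothetical values.
idents = [
    {"scan": 101, "protein": "P1", "peptide": "PEPK", "ms1_label": "light"},
    {"scan": 102, "protein": "P1", "peptide": "PEPK", "ms1_label": "heavy"},
    {"scan": 103, "protein": "P2", "peptide": "TIDER", "ms1_label": "medium"},
]
quants = [
    {"scan": 101, "reporters": {"114": 50.0, "115": 75.0}},
    {"scan": 102, "reporters": {"114": 20.0, "115": 10.0}},
]
mapped = map_psms(idents, quants)
```

Keeping the MS1 and MS2 labels as separate columns preserves both dimensions of the label matrix, so a later step can summarize each combination independently.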

Benchmarking of the HyperQuant Tool Using Hyperplexing Data

We tested the pipeline for identifying and quantifying the proteome using Dephoure and Gygi data.[18] We searched the 3 × 6 experiment (18-plex) from Dephoure study using MaxQuant, performed spectra level quantitation by QuantWizIQ, and integrated the results using the HyperQuant pipeline to obtain protein ratios. For this particular data, we used the weighted average method for calculating protein ratios as per the original study. We compared our results obtained from the pipeline with their protein list. Despite using different tools (Sequest[29] vs MaxQuant), we observed that ∼84% of the proteins identified were matching (Figure A). The number of peptides identified in light, medium, and heavy SILAC channels also showed good corroboration (Figure B). Our reanalysis identified 394 more proteins, while 100 proteins were missed, but we did not analyze it further as we were only interested in quantitative comparison for common proteins.

Figure 2

Benchmarking of the HyperQuant tool using the public data. (A) Concordance of proteins identified by the two search engines MaxQuant (our search) and Sequest (Dephoure and Gygi). (B) For the common proteins, the scatter plots depict good correlation between the number of peptides identified for light, medium, and heavy SILAC channels between the two searches. (C) Overview of HyperQuant benchmarking with results from our pipeline (x-axis) compared with those from Dephoure and Gygi data (y-axis). The comparison of protein quantitation between the two pipelines depicts a high level of agreement and good correlation between quantitated values despite the use of different tools.

For the 2645 common proteins between our analysis (MaxQuant + HyperQuant) and Dephoure and Gygi results (Sequest + Vista), the protein ratios from the two pipelines were compared. Figure C depicts the comparison of ratios across the TMT[6] labels for each SILAC channel (light, medium, and heavy) separately. Because the protein ratio method used here uses peptide numbers as weights, there are some differences attributable to differences in peptide identifications (because of different search engines), which alter quantitation marginally. We still observed high ratio concordance across all labels. These results suggest that the HyperQuant pipeline is able to analyze HOM data and can accurately calculate the quantitative values for every label combination.
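
To see why a count-weighted ratio shifts when the two search engines identify different numbers of peptides, consider the minimal Python sketch below. It assumes that the protein ratio is the mean of peptide-level ratios weighted by the number of PSMs per peptide; the exact weighting used in the original study and in HyperQuant may differ, and the function name is hypothetical.

```python
def weighted_average_ratio(peptide_ratios, psm_counts):
    """Protein ratio as the PSM-count-weighted mean of peptide ratios.

    Assumed formula: sum(w_i * r_i) / sum(w_i).
    """
    if len(peptide_ratios) != len(psm_counts):
        raise ValueError("one weight is required per peptide ratio")
    total_weight = sum(psm_counts)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(r * w for r, w in zip(peptide_ratios, psm_counts))
    return weighted_sum / total_weight


# Same two peptide ratios, different PSM counts from two hypothetical searches.
ratio_search_1 = weighted_average_ratio([2.0, 1.0], [3, 1])
ratio_search_2 = weighted_average_ratio([2.0, 1.0], [1, 3])
```

With identical peptide ratios, the two toy searches give 1.75 and 1.25, which shows how identification differences alone can move a weighted protein ratio.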

Analysis of BONPlex Data

The BONPlex study was designed to investigate the effect of mycobacterial infection (clinical and laboratory strains) on the human macrophage secretome. In this study, BONCAT was used to study the newly translated proteins secreted by THP-1 human macrophages after mycobacterial infection (Figure A). The second technique, SILAC, was used in pulsed mode, with medium and heavy labels tagging the THP-1 cells infected with different mycobacterial strains, namely H37Ra (medium) and H37Rv (heavy) in one set and BND433 (medium) and JAL2287 (heavy) in another, each with their respective light SILAC uninfected controls. The pulsed SILAC allowed selective labeling of newly synthesized proteins in a time window postinfection, while AHA allowed their selective enrichment (Figure B). Although the light channel contains both pre-existing and newly synthesized proteins, which would otherwise be indistinguishable, the BONCAT label allows selective enrichment of the newly synthesized ones, removing this quantitation bias. The third technique is isobaric quantitation with iTRAQ, used to label six time points of each infection in the THP-1 cells (Figure C), from 6 to 26 h at 4 h intervals. The labeled and digested secretome from the light, medium, and heavy SILAC channels, each with 6-plex iTRAQ labeling for time points, was mixed, separated by LC, and analyzed by tandem MS. MaxQuant was used to identify the spectra, while the in-house developed tool QuantWizIQ was used to calculate iTRAQ areas for each time point. The HyperQuant tool was used to integrate the data from identification and quantitation, followed by replicate combination and calculation of ratios from the areas.
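
The 18-plex design described above can be enumerated with a short Python sketch. The SILAC-to-sample assignment for set 1 follows the text; the variable names are illustrative, and no assignment of specific iTRAQ reporter masses to time points is assumed.

```python
from itertools import product

# MS1 dimension: SILAC channels and the sample each one carries in set 1.
silac_to_sample = {
    "light": "uninfected control",
    "medium": "H37Ra",
    "heavy": "H37Rv",
}

# MS2 dimension: six time points from 6 to 26 h at 4 h intervals.
time_points_h = list(range(6, 27, 4))

# Every MS1-MS2 combination is one quantitative condition in the run.
conditions = [
    (silac, silac_to_sample[silac], hours)
    for silac, hours in product(silac_to_sample, time_points_h)
]
```

Three SILAC channels times six time points give the 18 conditions that have to be quantified separately from a single LC-MS/MS run.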

Figure 3

Overview of the BONPlex workflow and labeling. (A) Workflow comprises selective enrichment of newly synthesized proteins with BONCAT (AHA) and pulsed SILAC labeling. Two sets of SILAC were used to label THP-1 cells infected with different mycobacterial strains. (B) SILAC and AHA labels were incorporated (after 1 h of methionine, lysine, and arginine depletion) in specific time windows postinfection. (C) Cells were harvested, proteins were digested, and time points were labeled with iTRAQ reagents as shown. The supernatant was selected for LC–MS/MS, and the 18-plexed data generated from each run was analyzed using the HyperQuant pipeline.

NSS Analysis for THP-1 Cells

After removing contaminants and decoy proteins, we identified and quantified 454 newly synthesized secretome (NSS) proteins using BONPlex in both the experiments combined together. We observed 343 proteins in set 1 with laboratory strains and 213 proteins in set 2 with clinical strains (Table S2). We observed that the number of secreted proteins in infected cells reduces considerably as compared to uninfected cells in both the sets (Figure A,B). We found that 81, 46, 59, and 36 proteins were secreted in response to infection with H37Ra (avirulent), H37Rv (virulent), BND433 (clinical virulent), and JAL2287 (clinical virulent) respectively, which are hereafter referred to as Ra, Rv, BND, and JAL for the sake of brevity. We compared the proteins identified in uninfected and infected THP-1 cells in Set 1 (Figure A) and Set 2 (Figure B). We observed that majority of the proteins expressed in the control (246 in set 1 and 144 in set 2) are not expressed in infection hinting at suppression of the cellular function, as also observed in Figure S4A,B. Some proteins are expressed exclusively in infected cells (34, 17, 24, and 15 proteins in Ra, Rv, BND, and JAL, respectively, Figure S4A,B). We then combined the NSS proteins in virulent strains (Rv, BND, and JAL) for comparison against the avirulent Ra as well as the controls from both the sets (Figure S4C). We observed 27 and 48 proteins to be exclusive to avirulent and virulent strains, respectively. By observing the quantitation of proteins identified, we found that the majority of proteins secreted in different infections are downregulated (Figure C, down + absent in infection & Figure S5A,B). There are some proteins specifically expressed only under infection conditions (Figure D, up + absent in the control, Figure S6A,B), hinting at infection specific cellular response (Figure E). The proteins absent in any of the infection were not considered in this analysis. 
The macrophages probably secrete these proteins to aid in the initiation of an immune response. Failure to initiate an immune response leads to apoptosis or necrosis.[30] The mycobacteria, on the other hand, try to block the signals that can lead to an immune response or try to force production of anti-inflammatory proteins, which may be a fail-safe mechanism in case some inflammatory signals leak out to the extracellular region.[31] In order to identify proteins that can mount an immune response, we queried UniProt for specific immunomodulatory keywords. We found 9 chemokines, 43 cytokines, 11 laminins, 14 proteases, 6 matrix metalloproteases, and 3 caspases. We also observed that 275 of the identified proteins were involved in signaling mechanisms. As we were studying macrophage responses, we also observed proteins involved in bacterial response (6 proteins), immune response (55 proteins), or inflammatory response (48 proteins) in our keyword searches (Tables S1 and S3).

Figure 4

Overview of NSS proteins identified for each strain. (A,B) represent the number of NSS proteins observed in the experiment with laboratory (A) and clinical (B) strain infections respectively. (C,D) depict the number of up- and downregulated proteins along with the unique proteins in either the infection or the control. (E) Heatmap showing the strain specific expression of proteins observed in uninfected and infected cells. The proteins expressed exclusively in control but not quantitated in any of the infections were excluded from this analysis.

Common NSS Proteins in All Four Infections

We observed sixteen proteins expressed in all four infections. Except one (CSF1R was absent in control of set 2), all of the proteins were observed in control as well as in infections. CSF1R is the macrophage colony stimulating factor-1 receptor that acts as a cell surface receptor for CSF1 and IL34 and is important for release of proinflammatory chemokines.[32] It is a tyrosine kinase that activates the signaling for MAPK, STAT3, and PIP3 pathways.[33] It was downregulated in Ra, Rv, and BND infections but was upregulated in initial time points of JAL infection, which later was downregulated by the last time point (Figure S7). Among the other 15 proteins (Figure S7), all proteins were either downregulated or unchanged in all infections except for IL1RN and PPT1 in JAL. IL1RN is the receptor antagonist of IL1 that inhibits the binding of IL1 (α and β) to its receptor and reduces the inflammatory response. In unstimulated macrophages, it is expressed to prevent any unwanted inflammatory response due to nonspecific stimuli.[34] During Ra infections, it gets highly downregulated in later time points as the macrophage exhibits delayed immune response and increased production of IL1β to contain Ra bacilli. In Rv infection, however, the secretion levels kept increasing in later time points, although it does not qualify as differentially expressed at our threshold criteria (±2 fold change). During BND infection, it remains downregulated in the secretome for the first few time points, while gradually increasing in the last time point. In JAL infection, it was overexpressed consistently, demonstrating its heightened capability to suppress the immune response. PPT1 is a thioesterase responsible for negative regulation of apoptotic processes in infections. It removes protein palmitoylation, thus regulating the cellular localization of substrates to cytosol.[35] It was significantly downregulated only in one time point (6 h) in Ra. 
In Rv, the expression was initially low, then increased to normal levels by 14 h and dropped back to lower levels by 26 h. In BND infection, the secretion behavior was similar to that in Ra infection, with underexpression at only one time point. However, it was significantly upregulated in JAL infection.

Immune Regulatory Processes

We searched the immune-related pathways in UniProt and Reactome to identify proteins observed in specific immune processes in our data. The common defense mechanisms during infections are cytokine production and secretion, interleukin production, inflammation, antigen processing and presentation, complement cascade, apoptosis, and autophagy. Several pathways lead to regulation of one or more defense response pathways that are observed in this experiment. The detailed integration of these pathways with the NSS proteins and how they may be regulated in the infections is shown in Figure , Table S3, and Figure S8. The most common pathway observed during a bacterial infection is toll-like receptor (TLR) signaling, which leads to activation of the MYD88 pathway for caspase activation.[36,37] CD14 is the protein that activates TLR4 for the MYD88 pathway. However, we did not observe CD14 in infection, although it was observed in uninfected cells.

Figure 5

Overview summary of the important NSS proteins observed in immune pathways. The common pathways regulated during immune response and the proteins observed in the secretome for the infected cells. Proteins involved in immune pathways but only observed in uninfected cells are not represented here. The green arrows represent the activation of a process by the protein, and the red arrow represents the inhibition. While the upright triangle denotes overexpression, the inverted triangle represents suppression, and the circle denotes no noticeable change in secretome. The proteins identified in only specific pathways in UniProt keyword search are represented here. The overall expression of NSS proteins for considering either up- or downregulated depends on its expression in more than three time points. The NSS proteins with no change of expression means the change is not significant enough in three or more time points to be considered as up- or downregulated.
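
The rule for calling a protein up- or downregulated across the time course can be sketched in Python as follows. The 2-fold threshold comes from the text; the function name, the use of infected/control ratios as input, and the tie handling are assumptions. Because the caption wording leaves open whether the cutoff is three or four time points, the count is exposed as a parameter.

```python
def classify_regulation(fold_changes, fold=2.0, min_time_points=3):
    """Label a protein from its per-time-point infected/control ratios.

    A time point counts as 'up' if ratio >= fold and as 'down' if
    ratio <= 1/fold. The protein is called up or down only when enough
    time points agree; otherwise it is reported as unchanged.
    """
    up = sum(1 for fc in fold_changes if fc >= fold)
    down = sum(1 for fc in fold_changes if fc <= 1.0 / fold)
    if up >= min_time_points and up > down:
        return "up"
    if down >= min_time_points and down > up:
        return "down"
    return "unchanged"


# Hypothetical ratios for six time points.
label_a = classify_regulation([2.5, 3.0, 2.1, 1.0, 1.2, 0.9])
label_b = classify_regulation([0.4, 0.3, 0.5, 0.2, 1.0, 1.1])
label_c = classify_regulation([2.5, 1.0, 1.0, 1.0, 1.0, 0.4])
```

A protein that crosses the threshold at only one or two time points, like the third toy example, stays in the unchanged class under this rule.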

MAPK Pathway

The MAPK cascade helps in cytokine production during infection and also aids in caspase activation.[38] Several activators of the MAPK cascade, such as PSAP, CHI3L1, and CSF1R, were expressed in all four infections, but all were underexpressed except CSF1R in JAL infection, indicating bacterial control in restricting MAPK signaling. We also observed certain proteins such as GDF15 (down) and ITGA4 (up) that could activate the MAPK pathway in avirulent infection. MTUS1, which blocks activation of the MAPK pathway, was also upregulated in avirulent infection, displaying a dual control on infected cells (one by macrophages and the other by infecting bacteria). PPEF2, which blocks MAPK, had upregulated secretion in Rv infection, while RASGEF1A, which activates RAS and thus MAPK, was also upregulated, indicating the dual control. In BND, RBPJ was secreted, which activates the NOTCH pathway, which in turn activates MAPK and NFκB pathways. This interaction could produce SPP1 (in all infections) and CXCL3 (in Ra infection) cytokines.

AKT/PI3K Pathway

The AKT/PI3K pathway activation leads to downstream regulation of multiple immune response pathways.[39] It aids in transcription of inflammation pathway proteins. It directly inhibits apoptosis processes and inhibits the mTOR pathway. The mTOR pathway inhibits autophagy processes, and in its absence, autophagy gets activated. Because the avirulent strain will not be able to block apoptosis of cells to evade infection, many proteins were observed in the secretome of avirulent infected cells to activate the AKT pathway such as CCDC88A, IGFBP3, and ANGPT2. AKT itself is observed in the secretome of the uninfected cells required to activate growth response pathways. THEM4 is observed in avirulent strain infection, which is a known inhibitor of the AKT pathway. APP and CLSTN1 (both down) were observed in avirulent and clinical infection that leads to the activation of apoptotic pathways. Also, the inhibitors of the apoptosis pathway—CST3, CTSH, TREM2, and PPT1—were observed in NSS proteins (Figure ). QSOX1 is a known inhibitor of the autophagy process identified to be downregulated in all infections. However, in BND infection, TMEM74 is secreted out which helps in the activation of the autophagy process.

Complement Cascade, Antigen Processing, and Presentation

The complement cascade leads to cell lysis, opsonization, and inflammatory response.[40] We observed A2M (up) in avirulent and C4A (up) in Rv infections that assist in regulation of the complement cascade. Interestingly, there was no known protein observed in clinical strain infections. We observed B2M (down) in Ra, Rv, and BND infections, which is known for regulating antigen processing and presentation in macrophages. B2M is a known target of the mycobacterial ESX secretion system, which aims to degrade B2M in the ER, so that it cannot be secreted out for antigen presentation.[41,42] We also observed HLA-B (JAL), IL4I1, and CENPE (Ra, Rv) that are part of antigen processing and presentation. Of the four strains, Ra is not able to block the necessary pathways of immune response. IL1β, which leads to caspase activation and thus apoptosis, was observed in Ra infection. In Rv infection, we observed proteins pertaining to complement cascade, MAPK pathways, antigen processing and presentation, and so forth that could lead to an immune response. In BND infection, proteins that activate NOTCH and autophagy pathways were observed. In JAL infection, which is the most aggressive virulent strain among the four, upregulated NSS proteins were observed in inhibitory pathways of apoptosis and IL1β, with the sole exception of CSF1R, which activated the MAPK pathway. Even the antigen processing proteins or cytokines expressed showed no significant change in secretion, indicating its competence in molding the immune response to suit its purpose. Because the aim of this study is to demonstrate the utility of the HyperQuant pipeline in analyzing HOM study designs, these examples highlight the advantages of using the tool for analysis of complex data. Several interesting hypotheses could be generated for further studies from these data. 
While we were able to observe many immune response proteins, we also observed cytoskeletal proteins, kinases, phosphatases, thiols, and proteases required for housekeeping functions. Although this analysis was aimed as the proof of concept for the tool being able to analyze a truly HOM study with 18 different conditions in one run, several immunomodulatory pathways were observed which can be a highly useful resource for the TB community.

Conclusions

Several higher order multiplexing studies have been applied to interesting biological problems over the past decade.[1,10] The advances in HOM techniques, however, outpaced the associated computational developments for analysis, which prevented its widespread adoption. Recently, the number of HOM studies has been increasing rapidly, creating a need for computational tools that allow labs without computational expertise to exploit this technique. With the development of HyperQuant, we aim to fill this gap and facilitate easy analysis of data from HOM studies, allowing a true hyperplexing capacity for quantitation. We have also shown its utility by reanalyzing 18-plex data from the hyperplexing and BONPlex studies. By summarizing the individual label combinations using areas or ratios, HyperQuant allows flexible analysis designs for any type of HOM data. With increased multiplexing from novel HOM study designs, HyperQuant is already suited to analyze data beyond 18-plex. For examples of possible multiplexing using HOM, the readers are referred to Table 1 of Aggarwal et al. 2019.[10] The tool is open source, platform independent, freely available, and allows flexibility in analysis of complex HOM datasets for diverse experimental designs. By exploiting the HOM technique, including replicates and/or biological conditions in the same run, better data fidelity and robustness can be achieved, which is usually lacking in most underpowered studies. To the best of our knowledge, this is the first and only tool available for analyses of HOM data. This will enable researchers to adopt HOM techniques to plan interesting studies without worrying about complicated custom data analysis pipelines.

Methods

BONPlex Dataset (PXD004281)

The data were searched against the UniProt human database (downloaded on 7th June 2014) combined with cRAP proteins (https://www.thegpm.org/crap) (total 65 745 sequences) and the contaminants added by MaxQuant (1.5.0.30). The search was performed using MaxQuant[43] with carbamidomethylation and iTRAQ (N-term) as fixed modifications. Two new modifications were created in the MaxQuant parameter dictionary for the combined masses of iTRAQ and SILAC (medium and heavy separately) on lysine residues. The SILAC pipeline search was used with SILAC light defined as the mass of iTRAQ, and SILAC-medium and SILAC-heavy defined as the new modifications created on lysine. The SILAC masses on arginine residues were 0, 6, and 10 Da, respectively, for light, medium, and heavy. BONCAT at methionine residues, deamidation of asparagine and glutamine, and methionine oxidation were used as variable modifications. The enzyme specificity was trypsin, and the number of missed cleavages was set to 1. The data were identified at ≤1% FDR for PSMs and proteins. The precursor and fragment mass tolerances were specified as defaults for the AB Sciex 5600 instrument in MaxQuant. For isobaric quantitation, the mgf files were converted using msconvert and provided as input to an in-house tool, QuantWizIQ (https://sourceforge.net/projects/quantwiz/files/IQ), for iTRAQ quantitation (described later), using the sum intensity method, as described in Aggarwal et al. 2016.[7] Spectra with a reporter ion intensity below 20 were filtered out, and the sum intensity-based method was used to calculate the iTRAQ area. The quantitation results thus obtained were merged with identifications from MaxQuant using the HyperQuant Mapper, and the normalized protein intensities were summarized using the median intensity method with the HyperQuant Combiner. The results are provided in Table S2.

Hyperplexing Dataset (Dephoure and Gygi)

The 18-plex data from the hyperplexing study (kindly provided by N. Dephoure and S. Gygi) were used for benchmarking of the HyperQuant tool.[18] The data in Thermo RAW format were searched using MaxQuant with search parameters matching those stated in the article as closely as possible. The data were searched with the triplex SILAC mode in MaxQuant with carbamidomethylation and TMT at the N-term as fixed modifications. The SILAC labels were redefined in the dictionary for lysine labels as light + TMT, medium + TMT, and heavy + TMT, while arginine labels were used as is. The data were filtered at ≤1% FDR. The RAW files from the 18-plex experiment were converted to mzML format using msconvert. The PSM quantitation from the mzML files was carried out using QuantWizIQ software with the sum intensity method, as described in Aggarwal et al. 2016.[7] The HyperQuant Mapper module combined the PSM identifications from MaxQuant with their corresponding quantitation from QuantWizIQ. The HyperQuant Combiner module was used to integrate replicates and output raw intensity values, from which the protein ratios were calculated using the weighted average method. The protein list and the ratios thus obtained were compared with the results from the hyperplexing study.

Reporter Quantitation Using the Isobaric Quantitator (QuantWizIQ)

QuantWizIQ is an isobaric quantitation tool developed in-house for spectra-level quantitation from the HUPO-PSI standard mzML data format as well as the de facto standard Mascot generic format (MGF).[51] It performs fast quantitation of iTRAQ/TMT-labeled spectra with a low memory footprint. It allows isobaric quantitation from 4/8-plex iTRAQ and 2/6/10/16-plex TMT labels. The tool also provides information on MS3 scans in the case of TMT labels, which HyperQuant can use later for merging identification and quantitation. It gives users flexibility in the choice of the area method used for quantitation. It also provides options for intensity normalization and purity correction. It is a platform-independent command-line tool developed in Perl and freely available for noncommercial use. The details of the software can be found in the detailed documentation along with the example data at https://sourceforge.net/projects/quantwiz/files/IQ/. It is also available through Bioconda[44] (https://anaconda.org/bioconda/quantwiz-iq) and Galaxy-P[45] (http://galaxyp.org/) (courtesy of the Galaxy-P team).

Protein Quantitation

To integrate the identification and quantitation results individually for each label combination and to calculate protein ratios, we developed a command-line tool, HyperQuant, in the Perl programming language. HyperQuant takes identification results along with MS/MS quantitation results as input and provides a list of protein areas after replicate combination. As an example, the MaxQuant text files were used to select PSM identification results. For PSMs that passed the ≤1% FDR threshold, quantitation values were extracted from the QuantWizIQ results for the corresponding spectra. If no PSM is quantified for a protein, the protein is reported in a separate file at this step. This module of the tool is called the Mapper, which maps the identification and quantitation results. The Mapper results for all the replicates, or for a single file (in case of no replicates), are subsequently provided to another module called the Combiner. The Combiner creates a dictionary (hash) with each protein group specified by MaxQuant as a key and information about all the PSMs in that group as the value. If a protein group is a subset of another protein group from other replicate(s), the two groups are combined and their PSMs are merged as one group (Note S1 and Figure S1). The quantitation values for each label combination are processed separately (Figure S2). For example, the intensity values for all the PSMs with the 113-iTRAQ reporter of SILAC light are used to calculate the standard deviation, and values outside the ±2 standard deviation range are discarded as outliers. This outlier removal step ensures that any extreme value is removed during combination (Note S2). The median intensity of the remaining values is used as the protein intensity for that sample. This enables normalization of the PSM intensities to be used for the protein-normalized intensity.
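The Mapper step described above can be illustrated with a minimal Python sketch. This is not the HyperQuant Perl implementation; the record keys (`file`, `scan`, `protein_group`) and the function name `map_psms` are hypothetical, and the identified PSMs are assumed to have been filtered at ≤1% FDR beforehand.

```python
def map_psms(identified, quantified):
    """Join identified PSMs to reporter intensities for the same spectrum.

    identified: list of dicts with hypothetical keys 'file', 'scan' and
                'protein_group' (assumed already filtered at <=1% FDR).
    quantified: dict mapping (file, scan) -> {label: intensity}.
    Returns the mapped PSMs and the protein groups with no quantified PSM.
    """
    mapped = []
    all_groups, quantified_groups = set(), set()
    for psm in identified:
        group = psm["protein_group"]
        all_groups.add(group)
        intensities = quantified.get((psm["file"], psm["scan"]))
        if intensities is None:
            continue  # spectrum was identified but not quantified
        quantified_groups.add(group)
        mapped.append({**psm, "intensities": intensities})
    # Protein groups without any quantified PSM are returned separately,
    # mirroring the separate report file mentioned in the text.
    return mapped, sorted(all_groups - quantified_groups)
```

Keying the lookup on the file name and scan number together is an assumption made here so that scans from different replicates or fractions cannot collide.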
Users can also select peptide ratios of isobaric labels to summarize protein ratios instead of normalized intensities. Three methods are available for summarizing protein intensities/ratios: the median, average, and weighted average methods (Note S3). The results are reported with details of scan count, unique peptide count, protein length, and the replicates in which the particular labels were observed, along with their calculated quantitation errors. Postsearch filters on the number of peptides/replicates for each protein can be applied as per the experimental design or user choice. This allows flexibility in the experimental design, as decoupling the analysis intricacies allows innovative ways to analyze data after quantitative integration, depending on whether the labels were used as replicates. A demonstration of quantitative processing is shown in Figure S2. The tool is platform-independent, open source, and freely available software for the research community, and it can be downloaded from https://sourceforge.net/projects/hyper-quant/files/ along with the example data and detailed documentation enumerating all steps for analysis.
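The Combiner logic described in this section (merging protein groups that are subsets of one another, then removing outliers and taking the median per label combination) can be sketched in Python as follows. This is an illustrative sketch, not the HyperQuant code: the function names are hypothetical, the ±2 standard deviation window is assumed to be centered on the mean, the sample standard deviation is assumed, and subset groups are assumed to be folded into the larger group. The exact rules are given in Notes S1 and S2.

```python
from statistics import mean, median, stdev


def merge_subset_groups(groups):
    """Merge protein groups that are subsets of a larger group.

    groups: dict mapping frozenset of protein IDs -> list of PSM values.
    Assumption: a subset group is folded into the larger group's key.
    """
    merged = {}
    # Visit larger groups first so that subsets find an existing superset.
    for key in sorted(groups, key=len, reverse=True):
        target = next((k for k in merged if key <= k), None)
        if target is None:
            merged[key] = list(groups[key])
        else:
            merged[target].extend(groups[key])
    return merged


def summarize_label(values, n_sd=2.0):
    """Median of the values lying within mean +/- n_sd standard deviations.

    values: PSM intensities for one label combination of one protein group.
    Returns None when there are no values.
    """
    if not values:
        return None
    if len(values) < 2:
        return values[0]  # a standard deviation needs at least two values
    centre, spread = mean(values), stdev(values)
    kept = [v for v in values if abs(v - centre) <= n_sd * spread]
    return median(kept)
```

For example, `summarize_label([10, 11, 9, 10, 12, 10, 9, 11, 10, 100])` discards the value 100 and returns the median of the remaining nine intensities. Note that with very few PSMs, a single extreme value cannot fall outside a ±2 standard deviation window, so the filter only takes effect when a protein has enough quantified PSMs.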

Ratio Calculation

For the BONPlex study, the ratio was calculated separately for infection by different strains, as well as temporally, for further analysis (Figure S3). The log2 fold change was calculated for each labeled protein with respect to its corresponding unlabeled counterpart at a given time point for studying the strain-specific variations in THP-1 responses to infection. Some proteins were only observed in infected cells. To observe changes in these proteins, iTRAQ ratios were calculated for all time points with respect to the first one. The significance threshold chosen was ±2 fold change, that is, ±1 at log2 scale (also see Note S4 and Table S3). All absent and zero values (identified in any one channel) were converted to −6.64 (log2 of 0.01, the minimum quantified value), which denoted an infinitely small value, or to 6.64 (log2 of 100, the maximum quantified value) for proteins that were highly overexpressed or expressed only in infection conditions. For hyperplexing data, the peptide ratios were used for calculating the weighted average protein ratio, as described earlier.[8] The following formula was used: where N represents the total number of quantitated peptides for the protein. The protein ratios were calculated from TMT labels, as the SILAC labels were used as biological replicates in this experiment. The results provided by the authors as a Supporting Information file were used for comparison with our results.
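The log2 fold change handling described above for the BONPlex data can be illustrated with a short Python sketch. The function names are hypothetical, and three details are assumptions not stated in the text: finite ratios are clamped to the same ±6.64 bounds, a protein missing in both channels yields no value, and the ±1 threshold is treated as inclusive.

```python
import math

LOG2_FLOOR = math.log2(0.01)  # about -6.64, minimum quantified value
LOG2_CEIL = math.log2(100)    # about +6.64, maximum quantified value


def log2_fold_change(labeled, unlabeled):
    """log2(labeled / unlabeled) with absent or zero values replaced.

    A missing or zero numerator maps to the floor and a missing or zero
    denominator maps to the ceiling, as described in the text.
    """
    if not labeled and not unlabeled:
        return None  # assumption: nothing to compare
    if not labeled:
        return LOG2_FLOOR
    if not unlabeled:
        return LOG2_CEIL
    value = math.log2(labeled / unlabeled)
    # Assumption: finite ratios are clamped to the same bounds.
    return max(LOG2_FLOOR, min(LOG2_CEIL, value))


def is_significant(log2_fc, threshold=1.0):
    """True when the change is at least 2-fold in either direction."""
    return log2_fc is not None and abs(log2_fc) >= threshold
```

Under these assumptions, a protein quantified only in the infected (labeled) channel receives 6.64 and is flagged as significant, while a ratio of 1.2 is not.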

Gene Ontology/Keyword Search/Pathway Analysis

For BONPlex data, we used a peptide cut-off of ≥3 in individual conditions/biological samples and used the GeneCodis server for gene ontology[46] (Table S3). The data were filtered at ≤1% Bonferroni FDR. Because the study was on immune cells, several keywords based on immune response, as elucidated in the literature,[26−28,47,48] were searched in UniProt[49] to fetch lists of proteins. The lists were then compared to the proteins identified in this experiment. The lists were generated using the following keywords: cytokine, chemokine, interferon, interleukin, defense response, caspase, growth factor, signaling, secretory, proteases, and matrix metalloproteases[26,27] (Tables S1 and S3). The proteins associated with immune response pathways were also queried in UniProt, and the resulting list was matched with strain-specific changes in proteins. These pathways included the TLR-4, MAPK, AKT/PI3K, NOTCH, JAK/STAT, WNT, RAS, TEK/TIE2, IGF, and NF-κB pathways (Table S3). We also fetched gene ontology categories pertaining to defense response to bacteria, inflammation, complement cascade, cytokine production, interleukin secretion, antigen processing and presentation, apoptosis, and autophagy pathways, as suggested in the cited literature.[26−28,47,48] The proteins were queried in the Reactome[50] database, and the proteins related to immune response pathways were observed (Figure S8).

50 in total

1. Monitoring matrix metalloproteinase activity at the epidermal-dermal interface by SILAC-iTRAQ-TAILS.

Authors: Pascal Schlage; Tobias Kockmann; Jayachandran N Kizhakkedathu; Ulrich auf dem Keller
Journal: Proteomics Date: 2015-05-15 Impact factor: 3.984

2. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification.

Authors: Jürgen Cox; Matthias Mann
Journal: Nat Biotechnol Date: 2008-11-30 Impact factor: 54.908

Review 3. The host immune response to tuberculosis.

Authors: N W Schluger; W N Rom
Journal: Am J Respir Crit Care Med Date: 1998-03 Impact factor: 21.405

4. Time-resolved Analysis of Proteome Dynamics by Tandem Mass Tags and Stable Isotope Labeling in Cell Culture (TMT-SILAC) Hyperplexing.

Authors: Kevin A Welle; Tian Zhang; Jennifer R Hryhorenko; Shichen Shen; Jun Qu; Sina Ghaemmaghami
Journal: Mol Cell Proteomics Date: 2016-10-20 Impact factor: 5.911

5. Microparticles released from Mycobacterium tuberculosis-infected human macrophages contain increased levels of the type I interferon inducible proteins including ISG15.

Authors: Nathan J Hare; Brian Chan; Edwina Chan; Kimberley L Kaufman; Warwick J Britton; Bernadette M Saunders
Journal: Proteomics Date: 2015-07-03 Impact factor: 3.984