
The Multi-Q web server for multiplexed protein quantitation.

Chuan-Yih Yu, Yin-Hao Tsui, Yi-Hwa Yian, Ting-Yi Sung, Wen-Lian Hsu.

Abstract

The Multi-Q web server provides an automated data analysis tool for multiplexed protein quantitation based on the iTRAQ labeling method. The web server is designed as a platform that can accommodate various input data formats from search engines and mass spectrometer manufacturers. Compared with the previous stand-alone version, the web server version provides many enhanced features and more flexible quantitation options. The workflow of the web server is organized as a quantitation wizard, which makes the tool easy to use. It also provides a friendly interface that helps users configure their parameter settings before running the program. The web server generates a standard report of the quantitation results. In addition, it allows users to customize their output reports so that information of interest can be easily highlighted. The output also provides visualization of the mass spectral data so that users can conveniently validate the results. The Multi-Q web server is a fully automated, easy-to-use quantitation tool that is suitable for large-scale multiplexed protein quantitation. Users can download the Multi-Q Web Server from http://ms.iis.sinica.edu.tw/Multi-Q-Web.

Year: 2007 PMID: 17553828 PMCID: PMC1933177 DOI: 10.1093/nar/gkm345

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The iTRAQ (1) labeling method combined with shotgun proteomic techniques represents a new dimension in multiplexed protein quantitation, allowing relative protein expression to be compared across different cell states. The quantitation strategy for iTRAQ-labeling proteomic experiments is based on a set of four isobaric reagents, each of which consists of three groups: a reporter group, a balance group and a reactive group. After cell lysis, reduction, alkylation and protein digestion, the samples from the four states are labeled separately at peptide N-termini and lysine residues by the reactive group of iTRAQ. The labeled peptides are then combined and analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Fully automated LC-MS/MS usually generates tens of thousands of MS/MS spectra, so the raw data are typically on the order of several gigabytes. Many thousands of peptides can be analyzed collectively by multiple LC-MS/MS runs in each proteomic experiment, so that hundreds, or even thousands, of proteins can be identified. However, high-throughput spectral data invariably contain a great deal of noise, which makes manual data analysis for proteomic quantitation impractical. Thus, there is a pressing need for an automated quantitation tool. Although i-Tracker (2) and ProQUANT (Applied Biosystems, Foster City, CA) have been used for iTRAQ-labeling quantitation, they have some limitations. The former is limited to peptide-level quantitation and provides no protein-level analysis, while the latter is limited to instruments developed by Applied Biosystems. Other commonly used quantitation tools, such as RelEx (3) and MSQuant (4), are designed for stable isotope labeling, a labeling technique different from iTRAQ.
In RelEx and MSQuant, quantitation is based on MS spectra and designed for two-plex quantitation, whereas iTRAQ-labeling quantitation is based on MS/MS spectra and designed for four-plex quantitation. Therefore, we have developed an automatic quantitation tool for iTRAQ-labeling quantitation, called Multi-Q, which is flexible and accepts various types of protein search results and spectral data. To make the quantitation results more accurate and to reduce the amount of manual validation, we paid particular attention to the following issues when designing Multi-Q. First, the relatively high complexity of iTRAQ samples means that some identical peptides are inevitably produced by two homologous proteins. These degenerate (non-unique) peptides not only cause ambiguous protein identification but also introduce protein quantitation errors; Multi-Q therefore bases quantitation on non-degenerate peptides. Second, the dynamic range of detection (5) of an instrument is very likely to affect the accuracy of the quantitation results, so Multi-Q allows users to input a threshold for dynamic range filtering. Multi-Q is already available as a stand-alone program executable on the Windows platform (6). We have now developed a web server version of Multi-Q, which users can conveniently download and run as an internal server. This version provides several enhanced functions as well as a user-friendly visualization interface, as described in the USAGE section.

METHOD

The data analysis procedure of Multi-Q is as follows (Figure 1). Based on the identification results and the MS/MS spectra, signature ions of the iTRAQ-labeled peptides are selected, processed and quantified. Then, for protein ratio determination, peptides with low identification confidence are removed. We assess the experimental data and filter out peptides outside the instrument's dynamic range before normalizing the peptide ratios. The normalized ratios of the quantified, non-degenerate iTRAQ peptides are then weighted according to their peak intensities to calculate the average protein ratios, and the final protein quantitation results are exported to an HTML page. This is Multi-Q's default procedure for calculating protein ratios; users can modify or omit some of its steps (see Set up parameters for quantitation in USAGE).

Figure 1.

The Multi-Q web server workflow.

We now describe each step in detail.

Step 1. Prepare input data

The major MS manufacturers use different data storage formats for mass spectra acquisition, such as WIFF (Applied Biosystems), RAW (Thermo Finnigan and Waters) and BAF (Bruker Daltonics). To accept these different spectral data formats, we use the standard mzXML (7) format as the input spectral data format. For RAW and BAF spectral files, converters are available to convert them into mzXML files. For WIFF spectral files, we provide a tool called mzFast for conversion from WIFF to mzXML. The mzFast program generates a ‘reduced’ mzXML file by removing spectral peaks with m/z above 120, which reduces data storage costs and processing time. The web server accepts search results from MASCOT (8) (CSV format) and SEQUEST (9) [XML format obtained by executing PeptideProphet (10) and ProteinProphet (11)]. Internally, it generates two data structures: 1) a peptide list containing the peptides' identification confidence scores and degeneracy information, and 2) a protein list containing the probabilities that the proteins are correctly identified. Using these two data structures, Multi-Q generates a protein summary list with annotated information. In MASCOT CSV files, peptides are indexed by query number, and each query corresponds to a spectrum. Since quantitation is based on the scan number, we have developed a wiff2scan program to map query numbers to scan numbers. MASCOT users need to run wiff2scan first to generate the mapping file (.table file). The data flow diagram of Multi-Q is shown in Figure 2.
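The two input-preparation steps above, trimming spectra to the low-m/z region and joining MASCOT query numbers to scan numbers, can be sketched as follows. This is a minimal illustration, not mzFast or wiff2scan: the `Spectrum` class, the helper names and especially the tab-separated `query<TAB>scan` layout assumed for the .table file are hypothetical, since the text does not describe the real file format.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cut-off taken from the text: the 'reduced' mzXML keeps peaks up to m/z 120,
# which still covers the iTRAQ reporter region (m/z 114-117).
REDUCED_MZ_CUTOFF = 120.0


@dataclass
class Spectrum:
    scan: int
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs


def reduce_spectrum(spectrum: Spectrum, cutoff: float = REDUCED_MZ_CUTOFF) -> Spectrum:
    """Return a copy of the spectrum without peaks above the m/z cut-off."""
    kept = [(mz, inten) for mz, inten in spectrum.peaks if mz <= cutoff]
    return Spectrum(spectrum.scan, kept)


def load_query_to_scan(lines: list[str]) -> dict[int, int]:
    """Parse a HYPOTHETICAL tab-separated 'query<TAB>scan' table.

    The real .table layout written by wiff2scan is not described in the text.
    """
    mapping: dict[int, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, scan = line.split("\t")
        mapping[int(query)] = int(scan)
    return mapping


def attach_scans(hits: list[dict], query_to_scan: dict[int, int]) -> list[dict]:
    """Add a 'scan' field to MASCOT-style hits keyed by 'query'; unmapped hits are dropped."""
    return [
        {**hit, "scan": query_to_scan[hit["query"]]}
        for hit in hits
        if hit["query"] in query_to_scan
    ]
```

Dropping hits without a scan mapping is a design choice of this sketch: without a scan number the reporter ions of that spectrum cannot be located, so the peptide cannot be quantified.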

Figure 2.

The Multi-Q web server data flow diagram.

Step 2. Peptide level processing

After the data have been input, Multi-Q selects iTRAQ-labeled peptides with confident MS/MS identifications, detects signature ions and performs automated quantitation of peptide abundance.

Signature ion detection and background subtraction

Peptide ratios are determined by identifying the signature peaks at m/z 114 to 117 in an MS/MS spectrum and comparing their peak intensities. To distinguish true signature peaks from noise, we first smooth the mass spectra with a 3-point moving average and then select peaks from the smoothed spectra. The mass tolerance for signature peak detection is determined internally by the Multi-Q web server. After peak selection, the Multi-Q web server performs background subtraction. The spectrum baseline is defined as the mean of all the valleys in the smoothed curve, where the valleys are located using the first- and second-order derivatives of the curve. The peak intensity is then calculated by subtracting the baseline from the original data. In our experience, however, omitting smoothing and background subtraction affects the quantitation results only slightly, whereas performing these tasks takes a great deal of time. Therefore, the Multi-Q web server lets users decide whether or not to perform smoothing and background subtraction.
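The smoothing, valley detection and baseline subtraction described above can be sketched in a few lines. This is an illustration under stated assumptions, not Multi-Q's code: the end points of the moving average are averaged over their available neighbours, the reporter m/z values are the nominal iTRAQ 4-plex values, and the 0.05 tolerance is a hypothetical placeholder for the tolerance that Multi-Q determines internally.

```python
from statistics import mean

# Nominal iTRAQ 4-plex reporter ion m/z values.
REPORTER_MZ = {"114": 114.11, "115": 115.11, "116": 116.11, "117": 117.12}


def moving_average_3(y):
    """3-point moving average; the two end points average only their available neighbours."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - 1): min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def valley_indices(s):
    """Local minima: the first difference turns from negative to positive
    and the second difference is positive."""
    idx = []
    for i in range(1, len(s) - 1):
        d_left = s[i] - s[i - 1]
        d_right = s[i + 1] - s[i]
        d2 = s[i + 1] - 2 * s[i] + s[i - 1]
        if d_left < 0 < d_right and d2 > 0:
            idx.append(i)
    return idx


def baseline(s):
    """Mean intensity of all valleys in the smoothed curve (0 if there are none)."""
    v = valley_indices(s)
    return mean(s[i] for i in v) if v else 0.0


def reporter_intensities(mz, intensity, tolerance=0.05, smooth=True):
    """Pick each reporter peak on the (optionally smoothed) curve and subtract the baseline
    from the ORIGINAL intensity at that point. tolerance=0.05 is a hypothetical value."""
    s = moving_average_3(intensity) if smooth else list(intensity)
    base = baseline(s) if smooth else 0.0  # smoothing off also skips background subtraction
    result = {}
    for label, target in REPORTER_MZ.items():
        candidates = [i for i, m in enumerate(mz) if abs(m - target) <= tolerance]
        if not candidates:
            result[label] = 0.0
            continue
        apex = max(candidates, key=lambda i: s[i])
        result[label] = max(intensity[apex] - base, 0.0)
    return result
```

The `smooth` flag mirrors the user option in the text: when it is off, neither smoothing nor background subtraction is applied, which trades a small loss of accuracy for speed.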

Isotope impurity correction and peptide ratio determination

The iTRAQ quantitation strategy is based on four samples labeled with isotopically distinct tags. The relative peak intensities of the resulting isotope clusters are used to represent changes in peptide abundance. However, each batch of iTRAQ reagents contains trace levels of isotopic impurities, which must be corrected to avoid distorting the true peak intensities. Since the isotopic distributions of the 114, 115, 116 and 117 signature peaks interfere with each other, heavy signature ions would otherwise be over-represented. Using the impurity information in the ‘Certificate of Analysis’ provided by the iTRAQ reagent manufacturer, the interference of each isotopic reagent with its two predecessors and two successors can be corrected by solving a system of simultaneous linear equations. The Multi-Q web server then calculates the peptide ratios from the peak intensities of the signature peaks. By default the ratio calculation uses a weighted average, but users can instead choose an unweighted average, in which each ion contributes equally.
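The correction above amounts to solving a 4x4 linear system A·x = b, where b holds the observed reporter intensities, x the corrected ones, and column i of A describes how reagent i spreads its signal over the −2, −1, +1 and +2 positions. The sketch below assumes that model; the percentages in `IMPURITY_PCT` are hypothetical placeholders (real values come from the Certificate of Analysis of each reagent lot), and a plain Gaussian elimination is used so the example needs no external library.

```python
CHANNELS = ["114", "115", "116", "117"]
OFFSETS = (-2, -1, 1, 2)

# HYPOTHETICAL impurity percentages at offsets -2, -1, +1, +2 for each reagent.
# Replace with the values printed on the lot's Certificate of Analysis.
IMPURITY_PCT = {
    "114": (0.0, 1.0, 5.9, 0.2),
    "115": (0.0, 2.0, 5.6, 0.1),
    "116": (0.0, 3.0, 4.5, 0.1),
    "117": (0.1, 4.0, 3.5, 0.1),
}


def correction_matrix(impurity_pct):
    """a[j][i] = fraction of reagent i's signal observed in channel j."""
    n = len(CHANNELS)
    a = [[0.0] * n for _ in range(n)]
    for i, ch in enumerate(CHANNELS):
        pct = impurity_pct[ch]
        a[i][i] = 1.0 - sum(pct) / 100.0  # signal remaining at the nominal mass
        for offset, p in zip(OFFSETS, pct):
            j = i + offset
            if 0 <= j < n:  # spill-over outside 114-117 is simply lost
                a[j][i] += p / 100.0
    return a


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular correction matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def correct_impurities(observed, impurity_pct=IMPURITY_PCT):
    """Return impurity-corrected reporter intensities, clamping negative solutions to 0."""
    a = correction_matrix(impurity_pct)
    x = solve_linear(a, [observed[ch] for ch in CHANNELS])
    return {ch: max(v, 0.0) for ch, v in zip(CHANNELS, x)}
```

Clamping negative solutions to zero is an assumption of this sketch; noisy low-intensity channels can otherwise produce small negative corrected values.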

Step 3. Protein level processing

After determining the peptide ratios, the Multi-Q web server reconstructs the proteins’ abundance ratios from the ratios of their iTRAQ-labeled peptides. Only confident peptides are used to determine protein ratios, and users can set the threshold for peptide confidence scores. The web server also lets users input the dynamic range of their instruments (see Set up parameters for quantitation in USAGE). For protein quantitation, peptide ratios outside the dynamic range are removed before peptide ratio normalization and protein ratio determination.
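The two filters in this step, a confidence-score threshold and a ratio window given by the instrument's dynamic range, can be sketched as a single pass over the peptide list. The `PeptideRatio` record and its field names are hypothetical, and the sketch assumes the dynamic range is expressed as a (lower, upper) bound on the peptide ratio.

```python
from dataclasses import dataclass


@dataclass
class PeptideRatio:
    # Hypothetical record; field names are illustrative only.
    sequence: str
    protein: str
    score: float      # identification confidence score
    ratio: float      # e.g. 115/114 after impurity correction
    intensity: float  # reporter intensity, used later as a weight


def filter_peptides(peptides, min_score, dynamic_range):
    """Keep peptides that pass the confidence threshold AND whose ratio lies
    inside the (lower, upper) dynamic range, bounds inclusive."""
    low, high = dynamic_range
    return [p for p in peptides if p.score >= min_score and low <= p.ratio <= high]
```

Both filters are applied before normalization, in line with the order given in the text, so that out-of-range ratios cannot bias the fitted normalization factor.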

Normalization of peptide ratios

The Multi-Q web server normalizes the peptide ratios within the dynamic range by fitting them with a Gaussian distribution (12). The normalization factor, i.e. the reciprocal of the mean of the fitted Gaussian distribution, can be used to correct the systematic bias (13).
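A minimal sketch of this normalization, assuming a maximum-likelihood Gaussian fit to the ratios themselves: under that assumption the fitted mean is simply the sample mean, so the normalization factor is 1/mean and multiplying every ratio by it re-centres the distribution at 1. The fitting procedure actually used by Multi-Q (for example, a least-squares fit to a ratio histogram) may differ.

```python
from statistics import mean, pstdev


def fit_gaussian(ratios):
    """Maximum-likelihood Gaussian fit: the MLE mean is the sample mean and the
    MLE standard deviation is the population standard deviation."""
    return mean(ratios), pstdev(ratios)


def normalize(ratios):
    """Return (normalization factor, normalized ratios); factor = 1 / fitted mean."""
    mu, _sigma = fit_gaussian(ratios)
    factor = 1.0 / mu
    return factor, [r * factor for r in ratios]
```

For example, ratios of 0.8, 1.0, 1.2 and 1.4 have a fitted mean of 1.1, so each is divided by 1.1 and the normalized ratios average exactly 1.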

Determination of protein abundance ratios

By default, the Multi-Q web server uses only non-degenerate peptides to calculate protein ratios, but users can decide whether or not to restrict the calculation to non-degenerate peptides. Each protein ratio is calculated from its corresponding peptide ratios. The default calculation uses a weighted average of the peptide ratios, with weights determined by the peptides' abundance; users can instead choose an unweighted average.
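The protein-level averaging can be sketched as below, assuming each peptide record carries the set of proteins it maps to, its normalized ratio, and an intensity used as its abundance weight (all hypothetical field names). How Multi-Q assigns degenerate peptides when the non-degenerate restriction is switched off is not stated; this sketch simply counts such a peptide towards every protein it maps to.

```python
from collections import defaultdict


def protein_ratios(peptides, weighted=True, unique_only=True):
    """peptides: iterable of dicts with keys 'proteins' (set of protein IDs),
    'ratio' (normalized peptide ratio) and 'intensity' (abundance weight)."""
    grouped = defaultdict(list)
    for pep in peptides:
        if unique_only and len(pep["proteins"]) != 1:
            continue  # skip degenerate peptides shared by several proteins
        for prot in pep["proteins"]:
            grouped[prot].append(pep)

    result = {}
    for prot, peps in grouped.items():
        total = sum(p["intensity"] for p in peps)
        if weighted and total > 0:
            # Abundance-weighted mean of the peptide ratios.
            result[prot] = sum(p["ratio"] * p["intensity"] for p in peps) / total
        else:
            # Unweighted mean (also the fallback when all weights are zero).
            result[prot] = sum(p["ratio"] for p in peps) / len(peps)
    return result
```

Weighting by intensity lets strong, well-measured peptides dominate the protein ratio, while the unweighted option gives every peptide an equal vote.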

Output the results

First, the Multi-Q web server lets users choose the numerators and denominators of their peptide and protein ratios, and the web server then reports the ratios accordingly. It reports statistics on the experimental results and a protein summary list, which includes the quantitation results and the peptides in each protein. It also has a new interface for visualizing results and information. Users can compose their own reports by choosing the fields to be included in the protein summary and the peptide summary. Once the output composition has been specified, a pop-up window shows the quantitation results. Users can save the results as a web page and use the browser's search function to quickly find the proteins or peptides of interest. They can also view and save the spectral information for a peptide.
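User-defined ratios can be thought of as a list of (numerator, denominator) channel pairs applied to each set of reporter intensities. The sketch below illustrates that idea; the function name, the "num/den" key format and the choice to report NaN for a zero denominator are assumptions, not Multi-Q's documented behaviour.

```python
import math


def user_ratios(intensities, definitions):
    """intensities: reporter intensities keyed by channel label ('114'..'117').
    definitions: (numerator, denominator) channel pairs chosen by the user.
    A zero or negative denominator yields NaN instead of raising an error."""
    out = {}
    for num, den in definitions:
        d = intensities[den]
        out[f"{num}/{den}"] = intensities[num] / d if d > 0 else math.nan
    return out
```

For example, with 114 as the common reference, the pairs ("115", "114"), ("116", "114") and ("117", "114") give the three ratios of a typical 4-plex design.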

USAGE

Users can download the Multi-Q web server from http://ms.iis.sinica.edu.tw/Multi-Q-Web and follow the installation guide to install the program.

Input phase

The Multi-Q web server uses the ‘Experiment_Data’ folder as its root folder. We recommend that users create a folder under the root to store all input files of an experiment (including search results, spectral files and table files). Figures 3 and 4 show the input phase; only the MASCOT input is shown here.

Figure 3.

Select the search result type.

Figure 4.

Input phase for MASCOT users.

Set up parameters for quantitation

In this stage, users can input their isotope impurity correction table and dynamic range, define their own ratio calculations, and choose peptides for protein quantitation, as shown in Figure 5.

Figure 5.

Set up quantitation parameters.

We recommend that new users first run the Multi-Q web server on standard data to determine the dynamic range of their instruments. To do this, select the ‘Standard’ data type rather than the ‘Sample’ data type, and leave the ‘Dynamic Range’ box unchecked. The web server will then output the dynamic range (Figure 6) in addition to the quantitation results. The reported (or modified) dynamic range can then be input for subsequent quantitation in other experiments, for which the ‘Sample’ data type should be selected.

Figure 6.

Quantitation results for a standard data experiment.

Output

The Multi-Q web server provides extensive information on proteins and peptides. Terms used in the report are explained in the User Guide available at http://ms.iis.sinica.edu.tw/Multi-Q-Web. In addition to the default output report (Figure 6), the web server allows users to customize their output reports by clicking ‘Output composition’ at the top of the web page shown in Figure 6. Users can highlight fields of interest and thus conveniently find peptides with very low or very high ratios among the many fields in the output. Figure 7 shows an example of a customized report.

Figure 7.

An example of a customized report.

Finally, the Multi-Q web server provides a visualization interface that shows four versions of each MS/MS spectrum: the raw data, the smoothed data, the spectrum with background and the spectrum with isotope impurity correction. By clicking the spectrum number in the peptide summary table, users can easily compare their quantitation results with the MS/MS spectra. Figure 8 shows an MS/MS spectrum with isotope impurity correction.

Figure 8.

An example of an MS/MS spectrum with isotope impurity correction.

CONCLUSION

The Multi-Q web server can help users achieve accurate, robust and high-throughput computation for multiplexed protein quantitation. Compared with the stand-alone version of Multi-Q, the web server provides many enhanced features as well as more flexibility. First, given standard protein mixture data as input, it determines the dynamic range of first-time users' instruments, and the reported dynamic range can then be used for subsequent quantitation. Second, it allows users to choose their own quantitation parameters, such as the confidence score threshold, whether to use only non-degenerate peptides for quantitation, and the ratio definitions. Third, the web server version strengthens the result output so that users can easily find useful information. In addition to the default report, the Multi-Q web server provides a friendly interface for users to customize their own output report, in which they can highlight information of interest, e.g. peptides or proteins with very low or very high ratios. Since the output is shown on a web page, users can use their browser's search function to find items of interest. The Multi-Q web server can also show MS/MS spectra in the output so that users can easily validate their results. In summary, the Multi-Q web server is a fully automated, easy-to-use quantitation tool that is suitable for large-scale multiplexed protein quantitation.

REFERENCES

1. Probability-based protein identification by searching sequence databases using mass spectrometry data.

Authors: D N Perkins; D J Pappin; D M Creasy; J S Cottrell
Journal: Electrophoresis Date: 1999-12 Impact factor: 3.535

2. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Authors: Andrew Keller; Alexey I Nesvizhskii; Eugene Kolker; Ruedi Aebersold
Journal: Anal Chem Date: 2002-10-15 Impact factor: 6.986

3. A correlation algorithm for the automated quantitative analysis of shotgun proteomics data.

Authors: Michael J MacCoss; Christine C Wu; Hongbin Liu; Rovshan Sadygov; John R Yates
Journal: Anal Chem Date: 2003-12-15 Impact factor: 6.986

4. A novel proteomic screen for peptide-protein interactions.

Authors: Waltraud X Schulze; Matthias Mann
Journal: J Biol Chem Date: 2003-12-16 Impact factor: 5.157

5. A statistical model for identifying proteins by tandem mass spectrometry.

Authors: Alexey I Nesvizhskii; Andrew Keller; Eugene Kolker; Ruedi Aebersold
Journal: Anal Chem Date: 2003-09-01 Impact factor: 6.986

6. Multi-Q: a fully automated tool for multiplexed protein quantitation.

Authors: Wen-Ting Lin; Wei-Neng Hung; Yi-Hwa Yian; Kun-Pin Wu; Chia-Li Han; Yet-Ran Chen; Yu-Ju Chen; Ting-Yi Sung; Wen-Lian Hsu
Journal: J Proteome Res Date: 2006-09 Impact factor: 4.466

7. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.

Authors: J K Eng; A L McCormack; J R Yates
Journal: J Am Soc Mass Spectrom Date: 1994-11 Impact factor: 3.109

8. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents.

Authors: Philip L Ross; Yulin N Huang; Jason N Marchese; Brian Williamson; Kenneth Parker; Stephen Hattan; Nikita Khainovski; Sasi Pillai; Subhakar Dey; Scott Daniels; Subhasish Purkayastha; Peter Juhasz; Stephen Martin; Michael Bartlet-Jones; Feng He; Allan Jacobson; Darryl J Pappin
Journal: Mol Cell Proteomics Date: 2004-09-22 Impact factor: 5.911

9. i-Tracker: for quantitative proteomics using iTRAQ.

Authors: Ian P Shadforth; Tom P J Dunkley; Kathryn S Lilley; Conrad Bessant
Journal: BMC Genomics Date: 2005-10-20 Impact factor: 3.969

10. A common open representation of mass spectrometry data and its application to proteomics research.

Authors: Patrick G A Pedrioli; Jimmy K Eng; Robert Hubley; Mathijs Vogelzang; Eric W Deutsch; Brian Raught; Brian Pratt; Erik Nilsson; Ruth H Angeletti; Rolf Apweiler; Kei Cheung; Catherine E Costello; Henning Hermjakob; Sequin Huang; Randall K Julian; Eugene Kapp; Mark E McComb; Stephen G Oliver; Gilbert Omenn; Norman W Paton; Richard Simpson; Richard Smith; Chris F Taylor; Weimin Zhu; Ruedi Aebersold
Journal: Nat Biotechnol Date: 2004-11 Impact factor: 54.908