
PANDA: A comprehensive and flexible tool for quantitative proteomics data analysis.

Cheng Chang, Mansheng Li, Chaoping Guo, Yuqing Ding, Kaikun Xu, Mingfei Han, Fuchu He, Yunping Zhu.

Abstract

SUMMARY: As experimental techniques and strategies in quantitative proteomics improve rapidly, there is a continuing need for algorithms and tools that quantify proteins with high accuracy and precision. Here, we present PANDA, a comprehensive and flexible tool for proteomics data quantification. PANDA supports both label-free and labeled quantification and is compatible with existing peptide identification tools and pipelines. Compared with MaxQuant on several complex datasets, PANDA was more accurate and precise and required less computation time. Additionally, PANDA is an easy-to-use desktop application with a user-friendly interface.
AVAILABILITY AND IMPLEMENTATION: PANDA is freely available for download at https://sourceforge.net/projects/panda-tools/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.


Year:  2019        PMID: 30816924      PMCID: PMC6394390          DOI: 10.1093/bioinformatics/bty727

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Quantitative proteomics is gaining popularity because it provides a global and systematic view of biological processes and cellular functions (Schubert et al.). Protein quantification approaches fall into two classes according to whether the sample is isotope labeled: label-free and labeled quantification. Many algorithms and software tools have been developed to facilitate label-free or labeled quantification of proteomics data. However, owing to the variety of experimental designs and strategies in quantitative proteomics, current quantification tools are usually suitable for only a few specific strategies, such as PyQuant (Mitchell et al.) and SILVER (Chang et al.) for stable isotope labeling quantification, and RIPPER (Van Riper et al.) and LFQuant (Zhang et al.) for label-free quantification. Even the widely used tool MaxQuant (Cox and Mann, 2008), which contains many methods for label-free and labeled quantification, does not support the 15N labeling method. Moreover, MaxQuant relies on its own mass spectrometry (MS) data analysis algorithms, which are not compatible with other tools or pipelines. In brief, comprehensive and flexible quantification tools are lacking for the rapidly developing field of quantitative proteomics.

Here, we present a new tool named PANDA for accurate and precise analysis of quantitative proteomics data with high comprehensiveness and flexibility. PANDA can process MS data from different instrument manufacturers by reading the standard formats mzXML and mzML. It is also compatible with existing peptide identification tools (e.g. Mascot) through its support for the standard format mzIdentML. PANDA contains multiple methods for handling MS data produced by various quantitative strategies. Further, by integrating the advanced algorithms of our previous quantification tools LFQuant and SILVER, PANDA achieves accurate and precise protein quantification.

2 Materials and methods

Benchmark datasets

For label-free quantification, we analyzed the yeast samples from Chang et al., in which UPS2 (Proteomics Dynamic Range Standard, Sigma-Aldrich) standard proteins were spiked in as a serial dilution (1 µg, 0.2 µg, 0.04 µg and 0.008 µg). For labeled quantification, a large-scale complex dataset obtained from HeLa cells (Cox and Mann, 2008) with stable isotope labeling by amino acids in cell culture (SILAC) was used. Several phosphoproteomic datasets (Hogrebe et al.) and a 15N labeling dataset (Arsova et al.) were also used for evaluation. See Supplementary Methods for details.

PANDA workflow

PANDA is designed for comprehensive and flexible analysis of both label-free and labeled quantitative proteomics data. As shown in Figure 1, PANDA consists of three core layers: the data layer, the function layer and the algorithm layer.

(i) The data layer covers the two kinds of input data in PANDA: MS data and peptide identifications. For MS data, PANDA can directly process Thermo raw files through MSFileReader, and it also accepts the standard MS data formats mzXML and mzML. For peptide identifications, PANDA reads the mzIdentML format proposed by the Human Proteome Organization Proteomics Standards Initiative, which allows it to quantify the results of commonly used peptide identification tools such as Mascot, SEQUEST, X! Tandem and MS-GF+. PANDA can also read the quality control results of PeptideProphet (Keller et al.) and PepDistiller (Li et al.), which further broadens its usage and flexibility.

(ii) The function layer contains the current mainstream quantification methods. For label-free quantification, PANDA implements the spectral count (SC) method and the extracted ion chromatogram (XIC, also called intensity-based) method. For labeled quantification, PANDA supports the prevalent precursor ion labeling methods, i.e. SILAC, 18O, 15N, isotope-coded affinity tags (ICAT) and isotope-coded protein labels (ICPL), as well as the product ion labeling methods, i.e. isobaric tags for relative and absolute quantitation (iTRAQ) and tandem mass tags (TMT). Furthermore, users can define their own labeling methods in PANDA.

(iii) The algorithm layer includes the basic algorithms for MS data processing and peptide/protein quantification (Supplementary Note 1). Some of them are adapted from LFQuant and SILVER, such as the reversible retention time (RT) alignment algorithm in LFQuant, and the multi-filters for XIC construction and the dynamic isotopic matching tolerance algorithm in SILVER.
Fig. 1.

Schematic of the PANDA workflow. PANDA consists of three core components: the data layer, the function layer and the algorithm layer.
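The two label-free methods named in the function layer can be illustrated with a minimal Python sketch. This is not PANDA's implementation: the function names, the toy scan layout and the fixed 10 ppm tolerance are all assumptions made for illustration, and the multi-filter XIC construction and dynamic isotopic matching tolerance adapted from SILVER are not reproduced here.

```python
from collections import Counter


def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Build an extracted ion chromatogram from MS1 scans.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples
           (a hypothetical in-memory layout, not a real file format).
    Returns a list of (retention_time, summed_intensity) points.
    """
    low, high = ppm_window(target_mz, tol_ppm)
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if low <= mz <= high)
        xic.append((rt, intensity))
    return xic


def xic_area(xic):
    """Integrate an XIC with the trapezoidal rule (intensity x time)."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(xic, xic[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area


def spectral_counts(psms):
    """Count peptide-spectrum matches per protein (spectral count method).

    psms: iterable of (spectrum_id, protein_id) pairs.
    """
    return Counter(protein for _, protein in psms)


# Toy data: one peptide ion near m/z 500.25 eluting over five scans.
scans = [
    (10.0, [(500.2500, 0.0), (600.0000, 50.0)]),
    (10.1, [(500.2510, 100.0)]),
    (10.2, [(500.2490, 200.0), (500.3000, 999.0)]),  # 500.30 is outside 10 ppm
    (10.3, [(500.2500, 100.0)]),
    (10.4, [(700.0000, 10.0)]),
]
xic = extract_xic(scans, target_mz=500.25, tol_ppm=10.0)
area = xic_area(xic)

psms = [("scan_1", "P1"), ("scan_2", "P1"), ("scan_3", "P2")]
counts = spectral_counts(psms)

print(xic)
print(round(area, 6))
print(dict(counts))
```

In this sketch the XIC area serves as the intensity-based abundance of a peptide ion, while the spectral count uses only the number of identified spectra per protein. A real pipeline would read the scans from mzML or mzXML and the identifications from mzIdentML rather than from in-memory lists.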


3 Results

In this study, PANDA was compared with MaxQuant (v1.6.0.13, released in August 2017) on a yeast dataset with four concentration levels of spiked-in UPS2 standard proteins (groups A–D) for label-free quantification, and on a large-scale SILAC-labeled HeLa dataset as well as several SILAC- and TMT-labeled phosphoproteomic datasets for labeled quantification.

Accuracy evaluation

In the yeast dataset, the theoretical ratios of the spiked-in UPS2 proteins for A/B, A/C and A/D are 5, 25 and 125, respectively. As shown in Supplementary Figure S1, the quantification results of PANDA were closer to the theoretical ratios than those of MaxQuant. In the HeLa dataset, the SILAC ratios of the 3471 proteins quantified by both PANDA and MaxQuant are shown in Supplementary Figure S2. The ratio distribution of PANDA was also closer to the theoretical ratio (1:1) than that of MaxQuant. In the phosphoproteomic datasets, PANDA achieved accuracy similar to that of MaxQuant (Supplementary Figs S3 and S4). These results demonstrate that PANDA is highly accurate for both label-free and labeled quantification across a wide dynamic range. Notably, PANDA can also handle 15N labeling data with high accuracy (Supplementary Fig. S5).
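The theoretical ratios quoted above follow directly from the spiked-in UPS2 amounts listed under Benchmark datasets. The short Python sketch below reproduces that arithmetic and shows one common way to express the deviation of a measured ratio from its expected value, as a log2 error. The mapping of groups A–D to the four amounts is inferred from the stated ratios, and the log2 error metric and function names are assumptions; the source does not state which accuracy metric underlies the supplementary figures.

```python
import math

# Spiked-in UPS2 amounts in micrograms; the group-to-amount mapping is
# inferred from the stated A/B, A/C and A/D ratios, not given explicitly.
spiked_ug = {"A": 1.0, "B": 0.2, "C": 0.04, "D": 0.008}


def theoretical_ratio(numerator, denominator, amounts=spiked_ug):
    """Expected abundance ratio between two dilution groups."""
    return amounts[numerator] / amounts[denominator]


def log2_error(observed_ratio, expected_ratio):
    """Signed log2 deviation of an observed ratio from the expected one.

    0 means perfect agreement; +1 means a two-fold overestimate.
    """
    return math.log2(observed_ratio / expected_ratio)


for other in ("B", "C", "D"):
    print(f"A/{other}: {theoretical_ratio('A', other):.0f}")

# A hypothetical measured A/B ratio of 4.0 against the expected 5.
print(round(log2_error(4.0, theoretical_ratio("A", "B")), 3))
```

Working on the log2 scale treats a two-fold overestimate and a two-fold underestimate symmetrically, which is convenient when the expected ratios span two orders of magnitude as they do here.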

Precision evaluation

In the yeast dataset, the coefficients of variation (CVs) of the yeast proteins across the technical replicates within each group (A–D) were lower for PANDA than for MaxQuant, indicating the high precision of PANDA for label-free quantification (Supplementary Fig. S6). In the HeLa dataset, the protein intensity CVs across the three technical replicates were calculated for both the SILAC-labeled and unlabeled samples, and PANDA again displayed a lower CV distribution than MaxQuant, indicating that PANDA is precise for labeled quantification (Supplementary Fig. S7). More details are provided in Supplementary Notes 2-3. Finally, PANDA is efficient owing to the refinement of its source code and the use of popular third-party libraries, such as the GNU Scientific Library. It required less computation time than MaxQuant on all the datasets (Supplementary Table S1).
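The precision comparison rests on the coefficient of variation, i.e. the standard deviation of a protein's intensity across technical replicates divided by its mean. A minimal Python sketch of that calculation follows. The protein names and intensities are invented for illustration, and the use of the sample (n - 1) standard deviation is an assumption, since the main text does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean of replicate intensities."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean intensity is zero")
    return statistics.stdev(values) / mean


# Hypothetical protein intensities from three technical replicates.
replicates = {
    "P1": [100.0, 110.0, 90.0],
    "P2": [50.0, 50.0, 50.0],
}

cvs = {protein: coefficient_of_variation(v) for protein, v in replicates.items()}
median_cv = statistics.median(cvs.values())

for protein, cv in cvs.items():
    print(f"{protein}: CV = {cv:.3f}")
print(f"median CV = {median_cv:.3f}")
```

Comparing the distribution of per-protein CVs between two tools, for example through the median as sketched here, shows which tool gives more reproducible intensities across replicates.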

4 Conclusion

In summary, PANDA contains a comprehensive collection of algorithms for label-free and labeled quantification and supports all the main methods in quantitative proteomics. Because it reads proteomics data in public formats, PANDA is flexible and compatible with existing peptide identification tools and MS data analysis pipelines. Most importantly, PANDA was shown to be accurate and precise for both label-free and labeled quantification. Although PANDA currently runs only on Windows, other operating systems will be supported in the future. Finally, the quantification results of PANDA can be further analyzed in its companion tool PANDA-view (Chang et al.) for statistical analysis and data visualization.

Funding

This work was supported by the National Key Research and Development Program of China [2017YFA0505002 and 2017YFC0906602] and the National Natural Science Foundation of China [21605159 and 21475150]. Conflict of Interest: none declared.
References

1.  Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Authors:  Andrew Keller; Alexey I Nesvizhskii; Eugene Kolker; Ruedi Aebersold
Journal:  Anal Chem       Date:  2002-10-15       Impact factor: 6.986

2.  MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification.

Authors:  Jürgen Cox; Matthias Mann
Journal:  Nat Biotechnol       Date:  2008-11-30       Impact factor: 54.908

3.  SILVER: an efficient tool for stable isotope labeling LC-MS data quantitative analysis with quality control methods.

Authors:  Cheng Chang; Jiyang Zhang; Mingfei Han; Jie Ma; Wei Zhang; Songfeng Wu; Kehui Liu; Hongwei Xie; Fuchu He; Yunping Zhu
Journal:  Bioinformatics       Date:  2013-12-15       Impact factor: 6.937

4.  Quantitative proteomics: challenges and opportunities in basic and applied research.

Authors:  Olga T Schubert; Hannes L Röst; Ben C Collins; George Rosenberger; Ruedi Aebersold
Journal:  Nat Protoc       Date:  2017-06-01       Impact factor: 13.491

5.  LFQuant: a label-free fast quantitative analysis tool for high-resolution LC-MS/MS proteomics data.

Authors:  Wei Zhang; Jiyang Zhang; Changming Xu; Ning Li; Hui Liu; Jie Ma; Yunping Zhu; Hongwei Xie
Journal:  Proteomics       Date:  2012-12       Impact factor: 3.984

6.  RIPPER: a framework for MS1 only metabolomics and proteomics label-free relative quantification.

Authors:  Susan K Van Riper; LeeAnn Higgins; John V Carlis; Timothy J Griffin
Journal:  Bioinformatics       Date:  2016-02-18       Impact factor: 6.937

7.  Quantitative and In-Depth Survey of the Isotopic Abundance Distribution Errors in Shotgun Proteomics.

Authors:  Cheng Chang; Jiyang Zhang; Changming Xu; Yan Zhao; Jie Ma; Tao Chen; Fuchu He; Hongwei Xie; Yunping Zhu
Journal:  Anal Chem       Date:  2016-06-20       Impact factor: 6.986

8.  PyQuant: A Versatile Framework for Analysis of Quantitative Mass Spectrometry Data.

Authors:  Christopher J Mitchell; Min-Sik Kim; Chan Hyun Na; Akhilesh Pandey
Journal:  Mol Cell Proteomics       Date:  2016-05-26       Impact factor: 5.911

9.  Benchmarking common quantification strategies for large-scale phosphoproteomics.

Authors:  Alexander Hogrebe; Louise von Stechow; Dorte B Bekker-Jensen; Brian T Weinert; Christian D Kelstrup; Jesper V Olsen
Journal:  Nat Commun       Date:  2018-03-13       Impact factor: 14.919

10.  PANDA-view: an easy-to-use tool for statistical analysis and visualization of quantitative proteomics data.

Authors:  Cheng Chang; Kaikun Xu; Chaoping Guo; Jinxia Wang; Qi Yan; Jian Zhang; Fuchu He; Yunping Zhu
Journal:  Bioinformatics       Date:  2018-10-15       Impact factor: 6.937

