aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data.

George Rosenberger¹, Christina Ludwig², Hannes L Röst¹, Ruedi Aebersold¹, Lars Malmström².

Abstract

MOTIVATION: The determination of absolute quantities of proteins in biological samples is necessary for multiple types of scientific inquiry. While relative quantification has been commonly used in proteomics, few proteomic datasets measuring absolute protein quantities have been reported to date. Various technologies have been applied using different types of input data, e.g. ion intensities or spectral counts, as well as different absolute normalization strategies. To date, a user-friendly and transparent software supporting large-scale absolute protein quantification has been lacking.
RESULTS: We present a bioinformatics tool, termed aLFQ, which supports the commonly used absolute label-free protein abundance estimation methods (TopN, iBAQ, APEX, NSAF and SCAMPI) for LC-MS/MS proteomics data, together with validation algorithms enabling automated data analysis and error estimation.
AVAILABILITY AND IMPLEMENTATION: aLFQ is written in R and freely available under the GPLv3 from CRAN (http://www.cran.r-project.org). Instructions and example data are provided in the R-package. The raw data can be obtained from the PeptideAtlas raw data repository (PASS00321).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Substances: Proteins

Year: 2014 PMID: 24753486 PMCID: PMC4147881 DOI: 10.1093/bioinformatics/btu200

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

A variety of quantitative proteomic methods have been established to measure the relative abundance of proteins across samples. Although relative quantification methods are useful for comparing the same proteins between multiple biological samples, they do not allow direct comparison with other datasets or between different proteins within a dataset and, by definition, they do not provide absolute quantitative data. Further, specific applications, such as differential equation-based modeling of biological systems or the determination of subunit stoichiometry of protein complexes, depend on absolute protein quantities.

The current gold standard for LC-MS/MS–based absolute protein quantification is the use of stable isotope-labeled standard (SIS) peptides or proteins in precisely determined concentrations (Brun et al.). These standards are spiked into the biological sample of interest, and the absolute concentration of the endogenous peptides and proteins can be determined directly by calculating the ratio of the measured intensities of the spiked-in heavy and the endogenous light forms. For economic reasons, usually only a few proteins are quantified using SIS peptides in a single study.

To overcome this limitation, multiple absolute label-free methods have been developed in recent years, which allow the estimation of absolute protein abundances for all or a significant fraction of the identified proteins (Lu et al.; Ludwig et al.; Malmström et al.; Schwanhäusser et al.; Silva et al.). For a recent discussion and comparison of the methods, see Ahrné et al. What these methods have in common is that they use either the linear log–log correlation between absolute protein abundance and experimentally estimated protein intensity or an estimate of the total protein concentration of the sample. However, they differ in their protein intensity inference strategy, and to date each requires its own computational framework.
Here, we provide aLFQ, an open-source implementation of algorithms supporting the estimation of protein quantities by any of the aforementioned methods, and additionally provide automated workflows for data analysis and error estimation.
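For orientation, the isotope-dilution calculation behind the SIS gold standard described in this section reduces to a ratio of light to heavy intensity, scaled by the known spiked-in amount. The following is a minimal Python sketch of that arithmetic only; the function name, the units and the example numbers are hypothetical and are not part of aLFQ.

```python
def endogenous_amount(light_intensity, heavy_intensity, spiked_heavy_amount):
    """Isotope dilution: amount of the endogenous (light) peptide.

    The result is in the same units as the spiked-in heavy standard.
    Illustrative helper, not an aLFQ function.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy-standard intensity must be positive")
    return light_intensity / heavy_intensity * spiked_heavy_amount


# Hypothetical measurement: 50 fmol of heavy standard spiked into the sample.
amount = endogenous_amount(
    light_intensity=3.0e6,
    heavy_intensity=1.5e6,
    spiked_heavy_amount=50.0,
)
print(amount)  # 100.0 (fmol)
```

The sketch assumes that the light and heavy forms respond identically in the mass spectrometer, which is the premise of the SIS approach.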

2 IMPLEMENTATION

aLFQ was implemented in R as a modular S3 package. An example workflow for model selection, depicting the individual functions and their sequential arrangement, is shown as a diagram in Figure 1. Detailed information on the various workflows and example datasets is provided in the Supplementary Material as well as in the R-package itself.

Fig. 1.

Diagram of an exemplary aLFQ workflow with TopX transition and TopN peptide model selection for estimating protein abundance using SIS peptides. 1. import: generates a generic aLFQ input data structure. 2. ProteinInference: different protein intensity estimation methods can be used to infer protein intensities from measured peptides and transitions. 3. AbsoluteQuantification: using SIS peptides, a model is built and cross-validation is conducted to examine the performance. 4. ALF: different models for varying numbers of transitions and peptides are generated and evaluated, and the model with the smallest MFE is selected.

aLFQ consists of three main modules. The import module provides unified access to the results of common proteomic quantification tools (see Supplementary Material). In addition, an input table with the SIS anchor peptides or anchor proteins and sample-specific absolute abundances, or an estimate of the total protein concentration, is required. The ProteinInference module enables inference of protein quantities from precursor intensities, transition intensities or spectral counts. If the dataset contains targeted proteomics data, the transitions (paired precursor and fragment ion signals) are first summarized to the precursor level using one of multiple algorithms. To summarize precursor intensities or spectral counts to protein intensities, the TopN (Ludwig et al.; Malmström et al.; Silva et al.), iBAQ (Schwanhäusser et al.), APEX (Lu et al.), NSAF (Zybailov et al.) and SCAMPI (Gerster et al.) methods are provided, enabling direct comparability of the quantitative results. The AbsoluteQuantification module provides absolute protein-abundance estimation from a linear correlation of a set of predefined anchor proteins or peptides. For this, label-free anchor protein intensities and independently determined accurate anchor protein concentrations are both log transformed and a first-order linear least-squares regression is calculated.
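Two of the protein intensity estimators named above have compact textbook definitions, sketched here in Python for illustration. iBAQ divides the summed peptide intensity of a protein by its number of theoretically observable peptides, and NSAF divides each protein's spectral count by its length and normalizes over all proteins. The function names and input values are hypothetical, and the sketch does not reproduce the aLFQ implementation or its options; APEX and SCAMPI rely on additional models and are not shown.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ: summed peptide intensity / number of theoretically observable peptides."""
    if n_theoretical_peptides <= 0:
        raise ValueError("need at least one theoretically observable peptide")
    return sum(peptide_intensities) / n_theoretical_peptides


def nsaf(spectral_counts, lengths):
    """NSAF: (spectral count / protein length), normalized over all proteins.

    Both arguments are dicts keyed by protein identifier.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical inputs for illustration only.
print(ibaq([2.0e6, 1.0e6, 3.0e6], n_theoretical_peptides=4))  # 1500000.0

fractions = nsaf({"A": 10, "B": 30}, {"A": 100, "B": 600})
print(fractions)  # A is about 2/3, B about 1/3
```

Note that protein B has more spectra than protein A but the lower NSAF, because the length normalization corrects for the larger number of peptides a long protein can yield.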
The abundance of all other proteins in the dataset can be estimated based on this regression. The error of the abundance estimation arises from biological and technical variation as well as from the protein and peptide intensity estimators. To estimate the error of the predicted protein concentrations, bootstrapping and Monte Carlo cross-validation are performed, with minimization of the mean-fold error (MFE) as the objective function.
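The regression and error estimation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the aLFQ code: the fold error of one protein is assumed to be the larger of predicted/true and true/predicted, the split fraction, number of rounds and seed are arbitrary, the anchor values are invented, and only Monte Carlo cross-validation is shown (bootstrapping is omitted).

```python
import math
import random


def fit_loglog(intensities, concentrations):
    """Least-squares fit of log10(concentration) = slope * log10(intensity) + intercept."""
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, intensity):
    """Back-transform the fitted line to a concentration estimate."""
    slope, intercept = model
    return 10 ** (slope * math.log10(intensity) + intercept)


def mean_fold_error(predicted, true):
    """Mean over proteins of max(predicted/true, true/predicted); 1.0 is a perfect fit."""
    folds = [max(p / t, t / p) for p, t in zip(predicted, true)]
    return sum(folds) / len(folds)


def monte_carlo_cv(intensities, concentrations, n_rounds=100, test_fraction=0.3, seed=1):
    """Average held-out MFE over repeated random train/test splits of the anchors."""
    rng = random.Random(seed)
    idx = list(range(len(intensities)))
    n_test = max(1, round(test_fraction * len(idx)))
    errors = []
    for _ in range(n_rounds):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_loglog([intensities[i] for i in train],
                           [concentrations[i] for i in train])
        predicted = [predict(model, intensities[i]) for i in test]
        errors.append(mean_fold_error(predicted, [concentrations[i] for i in test]))
    return sum(errors) / len(errors)


# Hypothetical, noise-free anchors: intensity follows a power law of concentration.
anchor_conc = [0.5, 5.0, 50.0, 500.0, 5000.0, 50000.0]
anchor_int = [1.0e4 * c ** 0.9 for c in anchor_conc]
print(monte_carlo_cv(anchor_int, anchor_conc))  # approximately 1.0 for noise-free data
```

Because the error is evaluated on anchors held out of the fit, the averaged MFE estimates how well the model predicts proteins that were not used to build it; real data with measurement noise would give values above 1.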

3 EXAMPLE APPLICATION

An example dataset was produced for this study and is delivered with the aLFQ R-package. The Universal Proteomic Standard 2 (UPS2, Sigma-Aldrich, St. Louis, MO, USA) consists of 48 proteins spanning a dynamic range of five orders of magnitude in bins of eight proteins. The sample was measured in a complex background in shotgun and targeted MS modes (see Supplementary Material). The example data can be accessed using the following commands:

    library(aLFQ)
    data(UPS2MS)

An exemplary integrated workflow termed ALF (Fig. 1; Ludwig et al.), which conducts peptide and protein inference model selection, can be executed with the following command:

    ALF(UPS2_SRM)

The workflow evaluates the performance for each combination of TopX transitions and TopN peptides. The resulting MFEs are depicted in a levelplot, and the model with the lowest error is selected for estimation of the concentrations of the target proteins.
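The grid of TopX transitions and TopN peptides that the workflow evaluates can be illustrated with a small Python sketch. It assumes that the X most intense transitions are summed to a peptide intensity and that the N most intense peptides are averaged to a protein intensity, using whatever is available when fewer than X or N signals exist. The function names, intensities and MFE values are hypothetical; this is not the ALF implementation.

```python
def top_sum(values, x):
    """Sum of the x most intense values (all of them if fewer than x exist)."""
    return sum(sorted(values, reverse=True)[:x])


def top_mean(values, n):
    """Mean of the n most intense values (all of them if fewer than n exist)."""
    top = sorted(values, reverse=True)[:n]
    return sum(top) / len(top)


def protein_intensity(peptides, top_x, top_n):
    """peptides maps a peptide sequence to its list of transition intensities."""
    peptide_intensities = [top_sum(t, top_x) for t in peptides.values()]
    return top_mean(peptide_intensities, top_n)


def select_model(grid_errors):
    """Return the (top_x, top_n) combination with the smallest error."""
    return min(grid_errors, key=grid_errors.get)


# Hypothetical protein with three peptides and three transitions each.
protein = {
    "PEPTIDEA": [900.0, 500.0, 100.0],
    "PEPTIDEB": [400.0, 300.0, 200.0],
    "PEPTIDEC": [50.0, 30.0, 20.0],
}
for top_x in (1, 2, 3):
    for top_n in (1, 2, 3):
        print(top_x, top_n, protein_intensity(protein, top_x, top_n))

# Hypothetical cross-validated MFEs for a few combinations.
mfe_grid = {(1, 1): 2.4, (2, 2): 1.9, (3, 2): 1.7, (3, 3): 2.1}
print(select_model(mfe_grid))  # (3, 2)
```

In a full workflow, each combination's protein intensities would be fed into the anchor regression and scored by cross-validated MFE before the minimum is taken.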

4 CONCLUSION

aLFQ enables automated absolute label-free protein abundance estimation based on input data from various mass spectrometric measurement modes and analysis software tools. Different quantification methods can be applied in a single framework, and thanks to its implementation in the statistical programming language R, it is accessible to a wide audience of biologists and bioinformaticians. Thus, aLFQ enables easy and fast comparison and selection of the most suitable quantification method and additionally provides an estimation of the absolute abundance estimation error.

Funding: G.R. was funded by the Swiss Federal Commission for Technology and Innovation CTI (13539.1 PFFLI-LS), H.L.R. was funded by ETH (ETH-30 11-2), R.A. was funded by PhosphonetX project of SystemsX.ch, the advanced European Research Council grant Proteomics v3.0 (233226) and the Swiss National Science Foundation.

Conflict of Interest: none declared.

9 in total

1. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae.

Authors: Boris Zybailov; Amber L Mosley; Mihaela E Sardiu; Michael K Coleman; Laurence Florens; Michael P Washburn
Journal: J Proteome Res Date: 2006-09 Impact factor: 4.466

2. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition.

Authors: Jeffrey C Silva; Marc V Gorenstein; Guo-Zhong Li; Johannes P C Vissers; Scott J Geromanos
Journal: Mol Cell Proteomics Date: 2005-10-11 Impact factor: 5.911

Review 3. Isotope dilution strategies for absolute quantitative proteomics.

Authors: Virginie Brun; Christophe Masselon; Jérôme Garin; Alain Dupuis
Journal: J Proteomics Date: 2009-03-31 Impact factor: 4.044

4. Critical assessment of proteome-wide label-free absolute abundance estimation strategies.

Authors: Erik Ahrné; Lars Molzahn; Timo Glatter; Alexander Schmidt
Journal: Proteomics Date: 2013-07-30 Impact factor: 3.984

5. Statistical approach to protein quantification.

Authors: Sarah Gerster; Taejoon Kwon; Christina Ludwig; Mariette Matondo; Christine Vogel; Edward M Marcotte; Ruedi Aebersold; Peter Bühlmann
Journal: Mol Cell Proteomics Date: 2013-11-19 Impact factor: 5.911

6. Global quantification of mammalian gene expression control.

Authors: Björn Schwanhäusser; Dorothea Busse; Na Li; Gunnar Dittmar; Johannes Schuchhardt; Jana Wolf; Wei Chen; Matthias Selbach
Journal: Nature Date: 2011-05-19 Impact factor: 49.962

7. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation.

Authors: Peng Lu; Christine Vogel; Rong Wang; Xin Yao; Edward M Marcotte
Journal: Nat Biotechnol Date: 2006-12-24 Impact factor: 54.908

8. Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry.

Authors: Christina Ludwig; Manfred Claassen; Alexander Schmidt; Ruedi Aebersold
Journal: Mol Cell Proteomics Date: 2011-11-20 Impact factor: 5.911

9. Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans.

Authors: Johan Malmström; Martin Beck; Alexander Schmidt; Vinzenz Lange; Eric W Deutsch; Ruedi Aebersold
Journal: Nature Date: 2009-07-15 Impact factor: 49.962
