
iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics.

Thang V Pham, Alex A Henneman, Connie R Jimenez.

Abstract

SUMMARY: We present an R package called iq to enable accurate protein quantification for label-free data-independent acquisition (DIA) mass spectrometry-based proteomics, a recently developed global approach with superior quantitative consistency. We implement the popular maximal peptide ratio extraction module of the MaxLFQ algorithm, so far only applicable to data-dependent acquisition mode using the software suite MaxQuant. Moreover, our implementation shows, for each protein separately, the validity of quantification over all samples. Hence, iq exports a state-of-the-art protein quantification algorithm to the emerging DIA mode in an open-source implementation.
AVAILABILITY AND IMPLEMENTATION: The open-source R package is available on CRAN, https://github.com/tvpham/iq/releases and oncoproteomics.nl/iq.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.


Year:  2020        PMID: 31909781      PMCID: PMC7178409          DOI: 10.1093/bioinformatics/btz961

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The data-independent acquisition (DIA) approach in mass spectrometry (MS)-based proteomics has emerged as a promising alternative to data-dependent acquisition (DDA) because of its ability to provide a more complete data matrix by combining unbiased, broad-range precursor ion fragmentation and targeted data extraction (Gillet et al., 2012). In both approaches, protein abundances are derived from multiple partial intensities: peptide precursor intensities in DDA and peptide fragment intensities in DIA. It is therefore desirable to extend advanced DDA quantification methods and insights to DIA data, which possesses the largest degree of multiplicity. Existing methods to estimate protein abundances for DIA data include the so-called topN approach, where the N most intense ions are aggregated, either by summation or averaging, as they tend to be more robust than less intense ions. The MeanInt approach averages all extracted ion intensities (Ahrné et al., 2013). Another method is the median polish algorithm (Tukey, 1977), available as part of the standard R distribution and employed by the package MSstats (Choi et al., 2014). For DDA data, the MaxLFQ algorithm, which is widely used as part of the closed-source MaxQuant software package, elegantly combines multiple peptide ratios to derive optimal protein ratios between pairs of samples (Cox et al., 2014). In this note, we present iq, short for ion-based protein quantification: an open-source implementation of the MaxLFQ maximal peptide ratio extraction algorithm for protein quantification in a DIA data processing pipeline.
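To make the three existing aggregation strategies concrete, the following base R sketch applies them to a toy matrix of log2 intensities (rows are fragment ions, columns are samples). The function names top_n_sketch, mean_int_sketch and median_polish_sketch are hypothetical and are not the functions exported by iq; averaging is done on log2 values, which is an assumption of this sketch rather than a statement about any particular package.

```r
# Toy log2 intensity matrix: rows are fragment ions, columns are samples.
X <- matrix(c(20, 21, 22,
              18, 19, NA,
              15, 16, 17,
              10, NA, 12),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ion", 1:4), paste0("s", 1:3)))

# topN: rank ions by their mean intensity across samples, keep the N most
# intense ions and average them per sample (missing values are skipped).
top_n_sketch <- function(X, N = 3) {
  ion_rank <- order(rowMeans(X, na.rm = TRUE), decreasing = TRUE)
  keep <- head(ion_rank, N)
  colMeans(X[keep, , drop = FALSE], na.rm = TRUE)
}

# MeanInt: average all available ion intensities per sample.
mean_int_sketch <- function(X) {
  colMeans(X, na.rm = TRUE)
}

# Median polish: fit overall + row + column effects with stats::medpolish
# and report the overall effect plus the column (sample) effects.
median_polish_sketch <- function(X) {
  fit <- stats::medpolish(X, na.rm = TRUE, trace.iter = FALSE)
  fit$overall + fit$col
}

top3   <- top_n_sketch(X, N = 3)
meanit <- mean_int_sketch(X)
polish <- median_polish_sketch(X)
```

All three functions return one value per sample. They differ in which ions contribute and in how robust they are to outlying or missing ions.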

2 Implementation

The analysis starts with a long-format data table as exported by Spectronaut version 13.0. The default columns include a primary protein identification column PG.ProteinGroups, secondary columns EG.ModifiedSequence, FG.Charge, F.FrgIon, F.Charge and quantitative values in F.PeakArea. By preparing the input data appropriately, we can use iq to accommodate other proteomic data extraction pipelines [see Supplementary Material for examples of processing MaxQuant and OpenSWATH outputs (Röst et al., 2014)]. Figure 1 outlines the three main steps in iq. First, unique values in the primary column form the list of proteins, and a concatenation of the secondary columns determines the fragments used to quantify individual proteins. We provide an option for median normalization of all observed intensities and visualization for quality control.
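The first step can be illustrated with a minimal base R sketch on a toy long-format table. The column names follow the Spectronaut export described above, except for the sample column R.Condition, which is an assumption here. The normalization rule (shifting each sample's log2 intensities so that all sample medians coincide at their average) is also an assumption about one reasonable form of median normalization, not a description of the package internals.

```r
# Toy long-format table: one row per fragment ion per sample.
long <- data.frame(
  R.Condition         = rep(c("s1", "s2"), each = 3),  # assumed sample column
  PG.ProteinGroups    = "P1",
  EG.ModifiedSequence = rep(c("_PEPTIDEK_", "_PEPTIDEK_", "_ANOTHERR_"), times = 2),
  FG.Charge           = 2,
  F.FrgIon            = rep(c("y4", "y5", "b3"), times = 2),
  F.Charge            = 1,
  F.PeakArea          = c(1024, 2048, 4096, 2048, 4096, 8192),
  stringsAsFactors    = FALSE
)

# The list of proteins is given by the unique values of the primary column.
proteins <- unique(long$PG.ProteinGroups)

# A fragment is identified by concatenating the secondary columns.
secondary <- c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge")
long$fragment_id <- do.call(paste, c(long[secondary], sep = "_"))

# Median normalization in log2 space: shift every sample so that all
# sample medians equal the average of the original medians.
long$log2_area <- log2(long$F.PeakArea)
sample_median  <- tapply(long$log2_area, long$R.Condition, median, na.rm = TRUE)
shift          <- mean(sample_median) - sample_median
long$log2_norm <- long$log2_area + unname(shift[as.character(long$R.Condition)])
```

After this step each row carries a protein, a fragment identifier and a normalized log2 intensity, which is the information needed to build one fragment-by-sample matrix per protein.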
Fig. 1. The iq pipeline. The preprocessing step performs data filtering and median normalization. Subsequently, a numerical matrix is formed for each protein containing its fragment ion intensities. The MaxLFQ algorithm optimizes ratios between all pairs of samples i and j.

Second, we enumerate a list of matrices of log2-transformed intensities for all proteins. Let $X$ be an observed data matrix for a particular protein with n columns for samples and m rows for quantified fragment ions. The matrix $X$ may have missing values. The aim is to estimate an n-vector $x$ of relative protein abundances across all samples. The MaxLFQ algorithm obtains $x$ by minimizing the sum of squared differences between the estimated pairwise sample ratio and the median of the observed ratios

$$\min_{x} \sum_{(i,j)} \left( x_i - x_j - r_{ij} \right)^2 \tag{1}$$

where the sum runs over all valid sample pairs and $r_{ij}$ is the median of all pairwise fragment intensity ratios observed between samples i and j (differences in log-space here). For each fragment, all intensity ratios without missing values are valid pairs.

Third, we solve the above minimization problem as follows. We populate a matrix $A$ of size p × n and a p-element vector $b$, where p is the number of valid pairs; each row of $A$ contains all zeroes except for two entries with values −1 and +1 corresponding to the sample ratio in $b$. The minimization of (1) becomes an equality-constrained least squares problem

$$\min_{x} \lVert Ax - b \rVert^2 \quad \text{subject to} \quad \mathbf{1}^T x = c$$

where $\mathbf{1}$ is an n-vector of ones and c is a scaling constant to preserve the average sum intensity. It can be shown that $x$ is part of a solution of the system of linear equations

$$\begin{bmatrix} 2A^TA & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} 2A^Tb \\ c \end{bmatrix} \tag{2}$$

where z is an auxiliary variable. We employ the R function lsfit to estimate $x$. Note that (2) has a unique solution when the square matrix on the left-hand side is invertible. A close examination shows that $A^TA$ is the Laplacian matrix of the sample graph $G$ whose nodes are the n samples, with two nodes directly connected if there is at least one valid sample pair between them. If $G$ is connected, that is, there is a path connecting any two nodes, it can be shown that (2) has a unique solution. This is not the case when $G$ is not connected. Therefore, we implement a recursive spreading algorithm to detect the connected components of $G$. Subsequently, the MaxLFQ algorithm is applied to each connected component. In principle, only samples within a connected component can be compared in downstream analysis, due to factors such as the different ionization efficiencies of different protein fragments. An example is illustrated in Supplementary Figure S7. In short, we have specified the condition under which the algorithm will lead to a valid relative quantification.

Finally, we also implement the topN method and the MeanInt method, and provide a wrapper for the median polish method in iq. The package has no dependencies on external R packages.
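The second and third steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: the function names sample_components, maxlfq_component and maxlfq_sketch are hypothetical, each row of the pair matrix uses +1 for sample i and -1 for sample j, the constant c is chosen so that the mean of the estimates equals the mean of the observed log2 intensities in the component, and the linear system is solved with solve() instead of lsfit for brevity.

```r
# Label connected components of the sample graph: samples i and j are
# linked when at least one fragment is observed in both of them.
sample_components <- function(X) {
  n <- ncol(X)
  observed  <- !is.na(X)
  adjacency <- crossprod(1 * observed) > 0
  label   <- rep(NA_integer_, n)
  current <- 0L
  spread <- function(i) {            # recursive spreading from sample i
    label[i] <<- current
    for (j in which(adjacency[i, ])) {
      if (is.na(label[j])) spread(j)
    }
  }
  for (i in seq_len(n)) {
    if (is.na(label[i]) && any(observed[, i])) {
      current <- current + 1L
      spread(i)
    }
  }
  label                              # NA for samples without any observation
}

# Solve the equality-constrained least squares problem for one component.
maxlfq_component <- function(X) {
  n <- ncol(X)
  if (n == 1) return(mean(X, na.rm = TRUE))
  A <- NULL
  b <- NULL
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d <- X[, i] - X[, j]           # log-space ratios of sample i over j
      if (any(!is.na(d))) {          # valid pair: at least one shared fragment
        a_row <- numeric(n)
        a_row[i] <- 1
        a_row[j] <- -1
        A <- rbind(A, unname(a_row))
        b <- c(b, median(d, na.rm = TRUE))
      }
    }
  }
  c_const <- n * mean(X, na.rm = TRUE)   # assumed choice of the constant c
  M   <- rbind(cbind(2 * crossprod(A), 1), c(rep(1, n), 0))
  rhs <- c(2 * crossprod(A, b), c_const)
  solve(M, rhs)[seq_len(n)]          # drop the auxiliary variable z
}

# Apply the solver to every connected component separately.
maxlfq_sketch <- function(X) {
  comp <- sample_components(X)
  x <- rep(NA_real_, ncol(X))
  for (k in unique(comp[!is.na(comp)])) {
    idx <- which(comp == k)
    x[idx] <- maxlfq_component(X[, idx, drop = FALSE])
  }
  list(estimate = x, component = comp)
}

# Three samples, three fragments, some values missing.
X <- matrix(c(10, 11, 13,
               8,  9, NA,
              NA,  6,  8), nrow = 3, byrow = TRUE)
res <- maxlfq_sketch(X)
```

In this toy matrix every pair of samples shares at least one fragment, so there is a single component, and the estimated differences between consecutive samples reproduce the median log-ratios. Estimates from different components should not be compared with one another, in line with the condition stated above.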

3 Example

We analyze a publicly available dataset used in a benchmark experiment for label-free DDA and DIA proteomics (Bruderer et al., 2015). The dataset for each mode of data acquisition contains 24 runs (8 biological replicates with 3 technical replicates each), in which 12 proteins were spiked in at different concentrations. MaxQuant version 1.6.4.0 is used to process the DDA data, and Spectronaut version 13.0 is used for the DIA data. The result of the MaxQuant DDA search is used as a spectral library in the DIA analysis. We use the default Spectronaut long-format export with the addition of the columns F.ExcludedFromQuantification, F.FrgLossType, F.FrgIon, F.Charge and F.PeakArea. The Spectronaut output is processed with a short R script of seven statements. Briefly, the first two statements load the data into R and remove entries not used for quantification. The next four statements load the iq package and perform quantification in three steps as described in Section 2. The last statement exports the result to a text file. The protein quantification visualization used in Figure 1 is created by the function call iq::plot_protein(protein_list$P00366). In the Supplementary Material, we show that the MaxLFQ algorithm compares favorably with other methods in terms of both correlation to the ground-truth values for the 12 spike-in proteins and the stability of background proteins.
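The script listing itself is not reproduced in this copy. The sketch below follows the description above and should be read as a reconstruction under assumptions, not as the authors' exact script. It assumes the iq functions preprocess, create_protein_list and create_protein_table with their default arguments, a result object with an estimate matrix, and "True"/"False" coding of the F.ExcludedFromQuantification column. The wrapper name run_iq_pipeline and the file names are hypothetical.

```r
# Hypothetical wrapper around the steps described in the text.
run_iq_pipeline <- function(input_file, output_file) {
  # Load the long-format export and drop entries excluded from quantification.
  raw <- read.delim(input_file)
  excluded <- toupper(as.character(raw$F.ExcludedFromQuantification)) == "TRUE"
  raw <- raw[!excluded, ]

  # Three quantification steps as outlined in Section 2 (assumed iq calls).
  norm_data    <- iq::preprocess(raw)                  # filtering, normalization
  protein_list <- iq::create_protein_list(norm_data)   # one matrix per protein
  result       <- iq::create_protein_table(protein_list)

  # Export the protein-by-sample table as tab-separated text.
  write.table(cbind(Protein = rownames(result$estimate), result$estimate),
              output_file, sep = "\t", row.names = FALSE)
  invisible(result)
}

# Example call, to be run only where the iq package and the export exist:
# run_iq_pipeline("spectronaut-long-format.txt", "output-maxLFQ.txt")
```

Comparing the exclusion flag after upper-casing keeps the filter working whether read.delim parses the column as logical or as text.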

4 Conclusion

The R package iq contains an open-source implementation of the maximal peptide ratio extraction module of the MaxLFQ algorithm in a complete pipeline for processing proteomics DIA data. It offers an additional option for protein quantification next to the topN and the median polish approaches, while being a direct implementation of a popular algorithm for DDA analysis. We show that optimal relative protein quantification is achieved for comparisons involving the same peptide components.
References

1.  Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis.

Authors:  Ludovic C Gillet; Pedro Navarro; Stephen Tate; Hannes Röst; Nathalie Selevsek; Lukas Reiter; Ron Bonner; Ruedi Aebersold
Journal:  Mol Cell Proteomics       Date:  2012-01-18       Impact factor: 5.911

2.  Critical assessment of proteome-wide label-free absolute abundance estimation strategies.

Authors:  Erik Ahrné; Lars Molzahn; Timo Glatter; Alexander Schmidt
Journal:  Proteomics       Date:  2013-07-30       Impact factor: 3.984

3.  MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments.

Authors:  Meena Choi; Ching-Yun Chang; Timothy Clough; Daniel Broudy; Trevor Killeen; Brendan MacLean; Olga Vitek
Journal:  Bioinformatics       Date:  2014-05-02       Impact factor: 6.937

4.  OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data.

Authors:  Hannes L Röst; George Rosenberger; Pedro Navarro; Ludovic Gillet; Saša M Miladinović; Olga T Schubert; Witold Wolski; Ben C Collins; Johan Malmström; Lars Malmström; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2014-03       Impact factor: 54.908

5.  Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues.

Authors:  Roland Bruderer; Oliver M Bernhardt; Tejas Gandhi; Saša M Miladinović; Lin-Yang Cheng; Simon Messner; Tobias Ehrenberger; Vito Zanotelli; Yulia Butscheid; Claudia Escher; Olga Vitek; Oliver Rinner; Lukas Reiter
Journal:  Mol Cell Proteomics       Date:  2015-02-27       Impact factor: 5.911

6.  Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ.

Authors:  Jürgen Cox; Marco Y Hein; Christian A Luber; Igor Paron; Nagarjuna Nagaraj; Matthias Mann
Journal:  Mol Cell Proteomics       Date:  2014-06-17       Impact factor: 5.911
