Johannes Griss, Florian Stanek, Otto Hudecz, Gerhard Dürnberger, Yasset Perez-Riverol, Juan Antonio Vizcaíno, Karl Mechtler.
Abstract
Label-free quantification has become common practice in many mass spectrometry-based proteomics experiments. In recent years, we and others have shown that spectral clustering can considerably improve the analysis of (primarily large-scale) proteomics data sets. Here we show that spectral clustering can also be used to infer additional peptide-spectrum matches and improve the quality of label-free quantitative proteomics data in data sets containing only tens of MS runs. We analyzed four well-known public benchmark data sets that represent different experimental settings, using both spectral counting and peak intensity-based label-free quantification. In both approaches, the peptide-spectrum matches additionally inferred by our spectra-cluster algorithm improved the detectability of low-abundance proteins while increasing the accuracy of the derived quantitative data, without increasing the noise in the data sets. Additionally, we developed a Proteome Discoverer node for our spectra-cluster algorithm, which allows anyone to rebuild our proposed pipeline using the free version of Proteome Discoverer.
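The idea of inferring additional peptide-spectrum matches (PSMs) from clusters can be illustrated with a minimal sketch. It is not the spectra-cluster algorithm or the Proteome Discoverer node itself: the clustering is assumed to be already done, and the function names, the `min_ratio` agreement threshold, and all spectrum, peptide, and protein identifiers are hypothetical. Unidentified spectra in a cluster simply inherit the peptide that dominates the identified spectra of that cluster, and spectral counts are then recomputed.

```python
from collections import Counter, defaultdict


def infer_psms(clusters, identified, min_ratio=0.7):
    """Assign the dominant peptide of each cluster to its unidentified spectra.

    clusters:   dict cluster_id -> list of spectrum ids (clustering assumed done)
    identified: dict spectrum_id -> peptide sequence from the search engine
    min_ratio:  assumed threshold; minimum share of identified spectra that
                must agree on the peptide before it is transferred
    """
    inferred = {}
    for spectra in clusters.values():
        peptides = Counter(identified[s] for s in spectra if s in identified)
        if not peptides:
            continue  # nothing identified in this cluster, nothing to transfer
        top_peptide, top_count = peptides.most_common(1)[0]
        if top_count / sum(peptides.values()) < min_ratio:
            continue  # conflicting identifications, skip the cluster
        for s in spectra:
            if s not in identified:
                inferred[s] = top_peptide
    return inferred


def spectral_counts(psms, peptide_to_protein):
    """Count PSMs per protein (simple spectral counting)."""
    counts = defaultdict(int)
    for peptide in psms.values():
        counts[peptide_to_protein[peptide]] += 1
    return dict(counts)


# Hypothetical toy data
clusters = {"c1": ["s1", "s2", "s3"], "c2": ["s4", "s5"], "c3": ["s6", "s7", "s8"]}
identified = {"s1": "PEPTIDEK", "s2": "PEPTIDEK", "s6": "AAAK", "s7": "CCCK"}
peptide_to_protein = {"PEPTIDEK": "P1", "AAAK": "P2", "CCCK": "P3"}

inferred = infer_psms(clusters, identified)
before = spectral_counts(identified, peptide_to_protein)
after = spectral_counts({**identified, **inferred}, peptide_to_protein)
print(inferred)  # {'s3': 'PEPTIDEK'}
print(before, after)
```

In this toy case only cluster `c1` yields an inferred PSM: `c2` contains no identified spectrum and `c3` contains conflicting identifications, so both are skipped.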
Keywords: IMP free nodes; Proteome Discoverer; Proteome Discoverer node; benchmarking study; bioinformatics; label-free quantification; mass spectrometry; proteomics; spectral clustering; spectral counting
Year: 2019 PMID: 30859831 PMCID: PMC6456873 DOI: 10.1021/acs.jproteome.8b00377
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Overview of used workflows to assess the influence of additionally inferred PSMs through clustering on LFQ, for both spectral counting and intensity-based approaches.
Figure 2. Number of detected spiked UPS proteins (n = 48) in the Ramus et al. data set from the (A) spectral counting pipeline (two search engines used, X!Tandem and MSGF+) and (B) the intensity-based pipeline (with and without MBR enabled).
Figure 3. Results of the statistical analysis using limma for the intensity-based pipeline (A, B) and edgeR for the spectral counting based pipeline (C), as true versus false positive rates. (A) Combined result for the three CPTAC data sets using the intensity-based pipeline. (B) Result for the Ramus et al. data set from the intensity- and (C) from the spectral counting-based pipeline.
Figure 4. Logarithmic fold change of background proteins from all comparisons using the spectral counting pipeline. In all analyzed data sets, the estimated fold change of background proteins came close to 0 through the clustering of inferred identifications. Panels show the data for the CPTAC data sets site_65 (A), site_65_OrbiP (B), site_86 (C), and the iPRG (D), the Ramus et al. (E), and the Shalit et al. data sets (F).
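The two evaluation measures named in the captions for Figures 3 and 4 can be sketched as follows. This is only an illustration of the definitions, not the limma or edgeR analysis used in the study: the p-values, protein names, the significance cutoff `alpha`, the base-2 logarithm, and the pseudocount of 1 are all assumptions made for the example. In a spike-in benchmark, spiked proteins are the true positives to find, and background proteins are expected to show a log fold change near 0.

```python
import math


def true_false_positive_rates(p_values, spiked, alpha=0.05):
    """Return (TPR, FPR) for proteins called significant at p < alpha.

    p_values: dict protein -> p-value from a differential expression test
    spiked:   set of proteins that truly differ between conditions
    """
    tp = sum(1 for prot, p in p_values.items() if p < alpha and prot in spiked)
    fp = sum(1 for prot, p in p_values.items() if p < alpha and prot not in spiked)
    n_pos = sum(1 for prot in p_values if prot in spiked)
    n_neg = len(p_values) - n_pos
    tpr = tp / n_pos if n_pos else 0.0
    fpr = fp / n_neg if n_neg else 0.0
    return tpr, fpr


def log2_fold_change(count_a, count_b, pseudocount=1):
    """log2 ratio of two spectral counts; the pseudocount (an assumption)
    avoids division by zero for proteins missing in one condition."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Hypothetical toy data: two spiked and four background proteins
p_values = {"UPS1": 0.001, "UPS2": 0.2, "BG1": 0.03, "BG2": 0.6, "BG3": 0.9, "BG4": 0.5}
spiked = {"UPS1", "UPS2"}

tpr, fpr = true_false_positive_rates(p_values, spiked)
print(tpr, fpr)                # 0.5 0.25
print(log2_fold_change(7, 3))  # 1.0
print(log2_fold_change(5, 5))  # 0.0
```

Sweeping `alpha` over a range of cutoffs and plotting the resulting pairs gives a curve of true versus false positive rates of the kind described for Figure 3.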