Abstract
Error estimation for differential protein quantification by label-free shotgun proteomics is challenging due to the multitude of error sources, each contributing uncertainty to the final results. We have previously designed a Bayesian model, Triqler, to combine such error terms into one combined quantification error. Here we present an interface for Triqler that takes MaxQuant results as input, allowing quick reanalysis of already processed data. We demonstrate that Triqler outperforms the original processing for a large set of both engineered and clinically or biologically relevant data sets. Triqler and its interface to MaxQuant are available as a Python module under an Apache 2.0 license from https://pypi.org/project/triqler/.
Keywords: Bayesian statistics; label-free quantification; mass spectrometry; proteomics; quantification
Year: 2021 PMID: 33661646 PMCID: PMC8041382 DOI: 10.1021/acs.jproteome.0c00902
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Summary of Data Sets and Results
| data set | samples | groups | S | F | DE proteins |
|---|---|---|---|---|---|
| iPRG2015 | 12 | 4 | 7 | 0.5 | 30 tp (max: 30) + 0 fp |
| MaxLFQ benchmark | 8 | 2 | 4 | 1.0 | 37 tp (max: 40) + 2 fp |
| UPS-Yeast Ratio2 | 6 | 2 | 3 | 0.8 | 9 tp (max: 48) + 0 fp |
| UPS-Yeast Ratio2.5 | 6 | 2 | 3 | 0.8 | 39 tp (max: 48) + 0 fp |
| Glioblastoma | 6 | 2 | 3 | 1.0 | 270 |
| Multiple sclerosis | 27 | 2 | 17 | 0.5 | 10 |
| Cholangiocarcinoma | 30 | 3 | 8 | 0.5 | 50 |
| Lung cancer | 12 | 2 | 6 | 1.0 | 278 |
Results for the engineered data sets (iPRG2015, MaxLFQ benchmark, UPS-Yeast Ratio2, and UPS-Yeast Ratio2.5) demonstrate the high sensitivity and correct FDR control of Triqler. For each of the biological data sets (Glioblastoma, Multiple sclerosis, Cholangiocarcinoma, and Lung cancer), Triqler finds differentially abundant proteins after multiple testing corrections, which the original studies generally were unable to do. S is the minimum number of nonmissing values for a peptide to be retained. F is the log2 fold-change threshold used to evaluate the differential abundance. DE proteins is the number of differentially abundant proteins at 5% differential abundance FDR. If more than two groups were present, then this column lists the sum of the differentially abundant proteins for each pairwise comparison. For the engineered data sets, the first number is the number of true-positives (tp), with the maximum number of true-positives given in parentheses, and the last number is the number of false-positives (fp).
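The tp/fp entries for the engineered data sets can be turned into a sensitivity and an empirical false discovery proportion with simple arithmetic. The sketch below is only an illustration of that arithmetic on the four table cells; the function name `summarize` and the dictionary `ENGINEERED` are hypothetical helpers and are not part of Triqler.

```python
import re

# Cells copied from the "DE proteins" column for the engineered data sets.
ENGINEERED = {
    "iPRG2015": "30 tp (max: 30) + 0 fp",
    "MaxLFQ benchmark": "37 tp (max: 40) + 2 fp",
    "UPS-Yeast Ratio2": "9 tp (max: 48) + 0 fp",
    "UPS-Yeast Ratio2.5": "39 tp (max: 48) + 0 fp",
}

# Matches the "<tp> tp (max: <max_tp>) + <fp> fp" layout used in the table.
PATTERN = re.compile(r"(\d+) tp \(max: (\d+)\) \+ (\d+) fp")


def summarize(cell):
    """Return (sensitivity, empirical false discovery proportion) for one cell."""
    tp, max_tp, fp = map(int, PATTERN.fullmatch(cell).groups())
    calls = tp + fp
    # Empirical proportion of false calls among all calls (0 if nothing was called).
    fdp = fp / calls if calls else 0.0
    return tp / max_tp, fdp


for name, cell in ENGINEERED.items():
    sensitivity, fdp = summarize(cell)
    print(f"{name}: sensitivity={sensitivity:.3f}, empirical FDP={fdp:.3f}")
```

The empirical proportion for a single data set is one realization, whereas the 5% FDR threshold is a statement about the expected proportion, so the two need not coincide exactly.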
Figure 1. Posterior fold-change distributions allow for a quick and intuitive interpretation of Triqler’s results. (a) Posterior distributions of the log2 fold change for the spiked-in UPS proteins in the UPS-Yeast Ratio2.5 data set correctly center around log2(2.5) ≈ 1.3. The proteins are sorted by the confidence of the protein identification, with high-confidence proteins (multiple high-confidence peptides) at the top and low-confidence proteins (few or low-confidence peptides) at the bottom. (b) The number of true-positive differentially abundant proteins in the UPS-Yeast Ratio2.5 data set slowly decreases as a function of an increasing fold-change evaluation threshold. The lower bound for this threshold is given in orange and is calculated from the standard deviation of the protein prior distribution. Below this lower bound, the number of false-positives rapidly increases. (c) For the UPS-Yeast Ratio2 set, the initially chosen threshold of 0.8 leads to very low sensitivity. On the basis of the lower bound estimation, a threshold of 0.5 is still within the range where few false-positives will occur.
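To see why the fold-change threshold matters so much for the Ratio2 set, it can help to work through a toy calculation. The sketch below assumes a normal posterior for the log2 fold change with a hypothetical standard deviation of 0.3; both the normal shape and that value are illustrative assumptions, not Triqler's actual posterior or implementation. It computes the posterior mass beyond a threshold for an expected log2 fold change of log2(2) = 1.0.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_beyond_threshold(mean, sd, threshold):
    """P(|log2 fold change| > threshold) under an assumed normal posterior."""
    upper = 1.0 - normal_cdf(threshold, mean, sd)
    lower = normal_cdf(-threshold, mean, sd)
    return upper + lower


expected_ratio2 = math.log2(2.0)    # exactly 1.0
expected_ratio25 = math.log2(2.5)   # about 1.32
HYPOTHETICAL_SD = 0.3               # illustrative value, not taken from the paper

for threshold in (0.8, 0.5):
    p2 = prob_beyond_threshold(expected_ratio2, HYPOTHETICAL_SD, threshold)
    p25 = prob_beyond_threshold(expected_ratio25, HYPOTHETICAL_SD, threshold)
    print(f"threshold={threshold}: Ratio2 mass={p2:.3f}, Ratio2.5 mass={p25:.3f}")
```

Under these assumptions, a threshold of 0.8 sits close to the expected Ratio2 fold change of 1.0, so a sizable part of the posterior mass falls below it, while lowering the threshold to 0.5 recovers most of that mass.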
Figure 2. Posterior distributions reflect the uncertainty of the input data. Posterior distributions for three UPS proteins in the UPS-Yeast Ratio2.5 data set, shown at three levels: the protein, the treatment group, and the fold change between groups. The plots exemplify the different degrees of confidence in the differential abundance, as inferred by Triqler. For P02788, we have multiple peptide identifications that all agree on the relative abundances, which leads to a narrow posterior distribution. For O00762 and Q15483, fewer peptides were identified, and some missing values were present, which leads to wider posterior distributions and, in the case of Q15483, a visible influence of the protein prior, which “pulls” the distribution toward 0.
Figure 3. Genes and proteins close to a fold-change threshold risk being overlooked. (a) Gene RPL21 was called differentially expressed in the original study of the Glioblastoma data set but missed the 5% FDR cutoff in the Triqler analysis because the log2 fold change was close to 1.0. (b) Gene RPL13 was not called differentially abundant in the original study or by Triqler; however, it shows equally strong evidence of differential expression as RPL21 and should ideally be taken into account as evidence of the regulation of the Ribosome KEGG pathway in downstream pathway analysis tools.
Figure 4. Pathways can be easily inspected by heatmaps of posterior distributions. Heatmap of posterior distributions of the fold change for the (a) Ribosome KEGG pathway (ko03010) and (b) Regulation of actin cytoskeleton KEGG pathway (ko04810) of the Glioblastoma data set. The genes are sorted by confidence of the gene being identified, with genes with multiple high-confidence peptides closer to the top and genes with few or low-confidence peptides toward the bottom. The ko03010 pathway shows very consistent down-regulation, whereas the ko04810 pathway displays both up- and down-regulated genes.