Matthew The, Ayesha Tasnim, Lukas Käll.
Abstract
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework for thinking about protein-level FDRs, starting from their basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
Keywords: Bioinformatics; Data processing and analysis; Mass spectrometry-LC-MS/MS; Protein inference; Simulation; Statistical analysis
Year: 2016 PMID: 27503675 PMCID: PMC5096025 DOI: 10.1002/pmic.201500431
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. The two null hypotheses gave different proportions of true null hypotheses among the discoveries. Here, we plotted the mean and SD of the Observed FDR under one null hypothesis as a function of the Observed FDR under the other, over randomized simulations with different absent protein fractions. The two Observed FDRs do not agree, and for lower absent protein fractions, that is, higher fractions of present proteins, this effect becomes more apparent.
Figure 2. The number of proteins not supported by the best scoring peptide inference can be estimated by a decoy model. Here, we plotted the reported fraction of proteins with incorrect best scoring peptide inference as a function of the classical decoy–target ratio, for ten randomized simulations for three different values of the absent protein fraction and 20 000 peptide inferences. The classical decoy–target ratio accurately matches the fraction of proteins with incorrect best scoring peptide inference.
Figure 3. For high coverage of the present peptide set, the picked target–decoy strategy should be used instead of the classical target–decoy strategy for an accurate estimation of the fraction of proteins with incorrect best scoring peptide inference. Here, for ten randomized simulations, we plotted the reported fraction of proteins with incorrect best scoring peptide inference as a function of (a) the classical decoy–target ratio and (b) the picked decoy–target ratio, for the same simulations of 80 000 peptide inferences and a single absent protein fraction. The classical decoy–target ratio is no longer a good estimator for the fraction of proteins with incorrect best scoring peptide inference, and the picked decoy–target ratio should be used instead.
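The two estimators compared in Figures 2 and 3 can be illustrated with a minimal sketch. It assumes that each protein is scored by its best peptide inference, that higher scores are better, and that each target protein shares an identifier with its decoy counterpart. The function names, the dictionary layout and the tie-breaking rule are illustrative assumptions and are not taken from the paper. The classical ratio counts all decoys and all targets above the threshold, whereas the picked variant first lets each target compete against its own decoy and counts only the winner.

```python
from typing import Dict


def classical_decoy_target_ratio(
    target_scores: Dict[str, float],
    decoy_scores: Dict[str, float],
    threshold: float,
) -> float:
    """Decoys above threshold divided by targets above threshold."""
    n_targets = sum(1 for s in target_scores.values() if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores.values() if s >= threshold)
    return n_decoys / n_targets if n_targets else 0.0


def picked_decoy_target_ratio(
    target_scores: Dict[str, float],
    decoy_scores: Dict[str, float],
    threshold: float,
) -> float:
    """Same ratio, but only the better-scoring member of each
    target/decoy pair is counted."""
    n_targets = 0
    n_decoys = 0
    for protein in set(target_scores) | set(decoy_scores):
        # A missing partner never wins the pairwise competition.
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        if max(t, d) < threshold:
            continue
        # Assumption: ties are resolved in favour of the target.
        if t >= d:
            n_targets += 1
        else:
            n_decoys += 1
    return n_decoys / n_targets if n_targets else 0.0
```

When most present proteins are identified with high scores, their decoy counterparts rarely win the pairwise competition, so the picked ratio drops below the classical one. This is the regime the Figure 3 caption describes as high coverage of the present peptide set.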
Figure 4. The protein absence ratio can be estimated by the classical decoy–target ratio, as long as we compensate for the absent protein fraction. We plotted the reported fraction of absent proteins as a function of the classical decoy–target ratio (red), together with two reference lines (solid, black and dashed, blue) for ten randomized simulations with 20 000 peptide inferences. The protein absence ratio roughly corresponds to the classical decoy–target ratio times the absent protein fraction.
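The relation stated in the Figure 4 caption can be written as a one-line estimator. The sketch below assumes that the absent protein fraction is known or has been estimated separately; the function and parameter names are hypothetical.

```python
def estimate_protein_absence_ratio(
    n_decoys: int, n_targets: int, absent_fraction: float
) -> float:
    """Approximate the fraction of absent proteins among the reported
    targets as (classical decoy-target ratio) * (absent protein fraction)."""
    if not 0.0 <= absent_fraction <= 1.0:
        raise ValueError("absent_fraction must lie in [0, 1]")
    if n_targets == 0:
        return 0.0
    return absent_fraction * n_decoys / n_targets
```

For example, 5 decoys and 100 targets above the threshold with an absent protein fraction of 0.5 give an estimate of 0.025, half of the uncorrected decoy–target ratio of 0.05.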
| 2: | | ▷ Set |
| 3: | | ▷ randomly select |
| 4: | | ▷ digest target proteins into peptides |
| 5: | | ▷ digest present proteins into peptides |
| 6: | | ▷ reverse sequences for decoy peptides |
| 7: | | ▷ initialize list of tuples (peptide, PEP, isCorrect, isDecoy) |
| 8: | | ▷ make peptide pools global for calls to |
| 9: | | ▷ randomly select |
| 12: | | ▷ randomly select |
| 14: | | ▷ randomly select |
| 16: | | ▷ return list of tuples (peptide, PEP, isCorrect, isDecoy) |
| 18: | | ▷ draw from uniform distribution |
| 20: | isCorrect ← False | |
| 23: | isDecoy ← True | |
| 26: | isDecoy ← False | |
| 29: | isCorrect ← True | |
| 30: | isDecoy ← False | |
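The surviving step comments above suggest a simulation along the following lines. This is only a sketch under stated assumptions, not the authors' implementation: it starts from peptide pools rather than digesting proteins, draws each PEP from a uniform distribution, treats an inference as incorrect with probability equal to its PEP, and sends incorrect inferences to a decoy or a target peptide with equal probability. All names, including `simulate_inferences` and `PeptideInference`, are hypothetical.

```python
import random
from typing import List, NamedTuple, Optional, Sequence


class PeptideInference(NamedTuple):
    peptide: str
    pep: float        # posterior error probability
    is_correct: bool
    is_decoy: bool


def simulate_inferences(
    target_peptides: Sequence[str],
    present_peptides: Sequence[str],
    n_inferences: int,
    seed: Optional[int] = None,
) -> List[PeptideInference]:
    """Return a list of (peptide, PEP, isCorrect, isDecoy) tuples."""
    rng = random.Random(seed)
    # Decoy peptides are the reversed target sequences.
    decoy_peptides = [p[::-1] for p in target_peptides]
    inferences: List[PeptideInference] = []
    for _ in range(n_inferences):
        pep = rng.random()  # assumption: PEP ~ Uniform[0, 1)
        if rng.random() < pep:
            # Incorrect inference: decoy or target with equal
            # probability (assumption).
            if rng.random() < 0.5:
                peptide, is_decoy = rng.choice(decoy_peptides), True
            else:
                peptide, is_decoy = rng.choice(target_peptides), False
            inferences.append(PeptideInference(peptide, pep, False, is_decoy))
        else:
            # Correct inference: drawn from the peptides actually present.
            peptide = rng.choice(present_peptides)
            inferences.append(PeptideInference(peptide, pep, True, False))
    return inferences
```

Because every inference carries its own `is_correct` and `is_decoy` labels, the observed error rates can be computed directly and compared with decoy-based estimates, which is how the abstract describes using the simulations to verify FDR estimates.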