Xiao-Dong Feng, Li-Wei Li, Jian-Hong Zhang, Yun-Ping Zhu, Cheng Chang, Kun-Xian Shu, Jie Ma.
Abstract
BACKGROUND: Mass spectrometry-based technical pipelines provide a high-throughput, high-sensitivity and high-resolution platform for post-genomic biology. Different tools implement varied models and algorithms to improve proteomics data analysis. The target-decoy search strategy has become the most popular way to control false identifications of peptides and proteins. While this strategy can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false positive matches among the target identifications.
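The distinction drawn above can be illustrated with a minimal Python sketch. It assumes the common decoy/target ratio as the FDR estimate and, as a separate assumption not stated in this abstract, treats the share of accepted target-database hits that fall on entrapment sequences as a direct readout of false matches. The labels, scores and the function name `estimate_rates` are hypothetical.

```python
from typing import List, Tuple

# Each PSM is (score, label); label is "target", "entrapment" or "decoy".
# Scores and labels are illustrative only, not data from the study.
PSM = Tuple[float, str]

def estimate_rates(psms: List[PSM], threshold: float) -> Tuple[float, float]:
    """Return (estimated FDR, entrapment fraction) for PSMs scoring >= threshold."""
    accepted = [label for score, label in psms if score >= threshold]
    decoys = accepted.count("decoy")
    entrapment = accepted.count("entrapment")
    # Sample and entrapment hits both come from the target database.
    targets = len(accepted) - decoys
    if targets == 0:
        return 0.0, 0.0
    fdr = decoys / targets                    # assumed estimator: decoys / targets
    entrapment_fraction = entrapment / targets  # assumed readout of false target matches
    return fdr, entrapment_fraction

example = [(10, "target"), (9, "target"), (8, "entrapment"),
           (7, "decoy"), (6, "target"), (1, "decoy")]
print(estimate_rates(example, threshold=5))
```

The decoy count only estimates how many target hits are wrong, whereas entrapment hits are target-database matches known to be wrong, which is why the two numbers can differ.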
Keywords: Entrapment sequence method; Proteomics; Quality control; Tandem mass spectrometry; Target-decoy search
Year: 2017 PMID: 28361671 PMCID: PMC5374549 DOI: 10.1186/s12864-017-3491-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Construction of the target database for Pfu and LM3 datasets
| DataSets | Sample sequences | Entrapment sequences | Sample tryptic peptides | Entrapment tryptic peptides | Shared tryptic peptides | Shared/Sample tryptic peptides (%) |
|---|---|---|---|---|---|---|
| | | | 145358 | 2338004 | 102 | 0.070 |
| | | | 2338004 | 15344503 | 4864 | 0.208 |
| | | | 2338004 | 1479773 | 1333 | 0.057 |
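The last column of the table can be checked from the peptide counts in each row. The short Python sketch below assumes the percentage is simply shared tryptic peptides divided by sample tryptic peptides; the helper name `shared_percent` is hypothetical.

```python
# (sample tryptic peptides, shared tryptic peptides) for the three table rows
rows = [(145358, 102), (2338004, 4864), (2338004, 1333)]

def shared_percent(sample_peptides: int, shared_peptides: int) -> float:
    """Shared peptides as a percentage of the sample's tryptic peptides."""
    return 100.0 * shared_peptides / sample_peptides

for sample, shared in rows:
    print(f"{shared_percent(sample, shared):.3f}")
# prints 0.070, 0.208 and 0.057, matching the table
```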
Fig. 1 Workflow for the evaluation of database search engines and quality control methods using the entrapment sequence method. Five search engines (Mascot, X!Tandem, Comet, MS-GF+ and Tide) and four quality control methods (PepDistiller, BuildSummary, PeptideProphet and FDRAnalysis) were studied on a standard Pfu dataset and a complex LM3 dataset
Fig. 2 Numbers of PSMs, peptides and proteins identified by the five search engines at the estimated FDRs on the Pfu dataset (a-c) and the LM3 dataset (d-f). The reprocessed scores of all five search engines are used
Fig. 3 Distributions of the identification numbers and FMRs at 0.01 FDR at the spectrum, peptide and protein levels for the five search engines on the Pfu dataset (a-c) and the LM3 dataset (d-f). The reprocessed scores of all five search engines are used
Fig. 4 Numbers of PSMs, peptides and proteins retained by the four quality control methods at the estimated FDRs on the Pfu dataset (a-c) and the LM3 dataset (d-f)
Fig. 5 Distributions of the identification numbers and FMRs at 0.01 FDR at the spectrum, peptide and protein levels for the four quality control methods on the Pfu dataset (a-c) and the LM3 dataset (d-f)
Fig. 6 Distributions of the PSMs, FDRs and FMRs identified by different numbers of search engines on the LM3 dataset. a Distributions of the original FDRs and FMRs at 0.01 spectrum FDR. b Distributions of the refined FDRs and FMRs at 0.01 spectrum FDR, where the PSMs in each subgroup are further filtered to keep their sub-FDRs below the pre-defined threshold
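The subgroup refinement mentioned in the Fig. 6 caption could look roughly like the Python sketch below. This is an assumption about the general idea, not the authors' procedure: within one subgroup, PSMs are ranked by score and the longest top-ranked prefix whose decoy/target ratio stays at or below the pre-defined FDR is kept. The function name `filter_subgroup` and the `(score, is_decoy)` representation are hypothetical, and score ties are not treated specially.

```python
from typing import List, Tuple

def filter_subgroup(psms: List[Tuple[float, bool]],
                    max_fdr: float = 0.01) -> List[Tuple[float, bool]]:
    """Keep target PSMs from the longest top-scoring prefix with decoys/targets <= max_fdr.

    psms: (score, is_decoy) pairs belonging to a single subgroup.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the sub-FDR is still acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p[1]]

subgroup = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(filter_subgroup(subgroup, max_fdr=0.01))
```

Applying such a filter to each subgroup separately keeps a low-quality subgroup from hiding behind the global FDR of the pooled results.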