Xuan Guo, Zhou Li, Qiuming Yao, Ryan S Mueller, Jimmy K Eng, David L Tabb, William Judson Hervey, Chongle Pan.
Abstract
Motivation: Complex microbial communities can be characterized by metagenomics and metaproteomics. However, metagenome assemblies often generate enormous, and yet incomplete, protein databases, which undermines the identification of peptides and proteins in metaproteomics. This challenge calls for increased discrimination of true identifications from false identifications by database searching and filtering algorithms in metaproteomics.
Year: 2018 PMID: 29028897 PMCID: PMC6192206 DOI: 10.1093/bioinformatics/btx601
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Consistency and accuracy of PSM identifications by three diverse scoring functions
| Class | Metric | Soil 1 | Soil 2 | Soil 3 | Marine 1 | Marine 2 | Marine 3 |
|---|---|---|---|---|---|---|---|
| Total | # Spectra | 374 692 | 454 828 | 360 409 | 128 648 | 132 605 | 119 403 |
| | % Spectra | 100% | 100% | 100% | 100% | 100% | 100% |
| Unanimous PSM | # Spectra | 126 386 | 117 693 | 108 730 | 54 357 | 47 759 | 56 139 |
| | % Spectra | 34% | 26% | 30% | 42% | 36% | 47% |
| | % Decoy | 7% | 8% | 7% | 4% | 4% | 3% |
| Majority PSM: WDP & Xcorr; Minority PSM: MVH | # Spectra | 42 721 | 53 157 | 43 706 | 13 343 | 15 053 | 12 014 |
| | % Spectra | 11% | 12% | 12% | 10% | 11% | 10% |
| | % Decoy, Majority | 36% | 36% | 33% | 29% | 29% | 27% |
| | % Decoy, Minority | 44% | 45% | 44% | 41% | 44% | 40% |
| Majority PSM: WDP & MVH; Minority PSM: Xcorr | # Spectra | 25 558 | 29 713 | 23 414 | 8677 | 8231 | 7661 |
| | % Spectra | 7% | 7% | 6% | 7% | 6% | 6% |
| | % Decoy, Majority | 37% | 39% | 37% | 27% | 32% | 27% |
| | % Decoy, Minority | 43% | 44% | 42% | 41% | 42% | 40% |
| Majority PSM: MVH & Xcorr; Minority PSM: WDP | # Spectra | 20 010 | 26 478 | 23 053 | 5353 | 7445 | 4836 |
| | % Spectra | 5% | 6% | 6% | 4% | 6% | 4% |
| | % Decoy, Majority | 33% | 32% | 27% | 30% | 27% | 31% |
| | % Decoy, Minority | 45% | 46% | 46% | 41% | 43% | 38% |
| Discordant PSM | # Spectra | 160 017 | 227 787 | 161 506 | 46 918 | 54 117 | 38 753 |
| | % Spectra | 43% | 50% | 45% | 36% | 41% | 32% |
| | % Decoy, WDP | 48% | 48% | 47% | 46% | 47% | 46% |
| | % Decoy, Xcorr | 47% | 48% | 46% | 47% | 46% | 46% |
| | % Decoy, MVH | 47% | 47% | 47% | 48% | 47% | 47% |
# Spectra: number of spectra in a class.
% Spectra: percentage of spectra in a class out of all acquired spectra.
% Decoy: percentage of decoy PSMs out of all PSMs in a class or a sub-class.
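The classes in the table above group each spectrum by how many of the three scoring functions (WDP, Xcorr, MVH) agree on its top-ranked peptide. The following is a minimal sketch of that grouping and of the decoy percentage, not the authors' implementation. The function names `classify_consensus` and `percent_decoy` and the dictionary input format are assumptions made for illustration.

```python
from collections import Counter


def classify_consensus(top_hits):
    """Classify one spectrum by agreement among scoring functions.

    top_hits: dict mapping a scoring-function name (e.g. "WDP", "Xcorr",
    "MVH") to the peptide that function ranks first for the spectrum.
    This input format is a hypothetical choice for the sketch.
    """
    counts = Counter(top_hits.values())
    _, votes = counts.most_common(1)[0]
    if votes == len(top_hits):
        return "unanimous"      # all functions pick the same peptide
    if votes > len(top_hits) / 2:
        return "majority"       # e.g. two of three agree
    return "discordant"         # no peptide wins more than half the votes


def percent_decoy(is_decoy_flags):
    """Percentage of decoy PSMs out of all PSMs in a class or sub-class."""
    flags = list(is_decoy_flags)
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)
```

For example, `classify_consensus({"WDP": "PEPTIDEK", "Xcorr": "PEPTIDEK", "MVH": "OTHERK"})` falls in the majority class with MVH as the minority scoring function.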
Benchmarking of identification performance using six real-world metaproteomes
| Metaproteomes | | Soil 1 | Soil 2 | Soil 3 | Marine 1 | Marine 2 | Marine 3 |
|---|---|---|---|---|---|---|---|
| Search | Filter | # PSM Identifications at PSM FDR 1% | |||||
| W | P | 102 664 | 95 009 | 88 686 | 46 010 | 36 999 | 48 232 |
| M | P | 87 328 | 74 647 | 69 213 | 39 576 | 26 249 | 41 465 |
| C | P | 100 683 | 92 596 | 94 842 | 35 012 | 32 580 | 39 923 |
| G | P | 97 702 | 94 341 | 94 373 | 36 328 | 33 241 | 40 220 |
| C&M | I | 127 582 | 121 166 | 121 567 | 49 262 | 42 688 | 52 154 |
| C&M&G | I | ||||||
| C&M&G | SE-F | 96 220 | 99 579 | 90 507 | 42 811 | 40 282 | 46 603 |
| SE-S | SE-F | ||||||
| Search | Filter | # Peptide Identifications at Peptide FDR 1% | |||||
| W | P | 34 049 | 31 233 | 25 618 | 25 868 | ||
| M | P | 27 700 | 24 236 | 20 210 | 23 572 | 17 935 | 26 277 |
| C | P | 30 165 | 27 165 | 23 680 | 21 726 | 22 823 | 26 173 |
| G | P | 30 465 | 28 693 | 24 100 | 21 603 | 22 252 | 25 338 |
| C&M | I | 35 303 | 32 594 | 27 557 | 27 154 | 27 403 | 30 948 |
| C&M&G | I | 27 412 | 31 244 | ||||
| C&M&G | SE-F | 30 201 | 29 179 | 23 574 | 26 158 | 26 597 | 29 706 |
| SE-S | SE-F | ||||||
| Search | Filter | # Protein Identifications at Protein FDR 1% | |||||
| W | P | 6660 | 5996 | 4636 | 6173 | 7982 | |
| M | P | 7142 | 6546 | 5654 | 6536 | 5892 | 7101 |
| C | P | 7752 | 7020 | 6517 | 6818 | ||
| G | P | 7138 | 6623 | 7086 | 7360 | 8067 | |
| C&M | I | 7103 | 6738 | 5929 | 6107 | 6622 | 7019 |
| C&M&G | I | 7067 | 6800 | 5810 | 6129 | 6571 | 7198 |
| C&M&G | SE-F | 7966 | 6717 | 7302 | 7702 | ||
| SE-S | SE-F | ||||||
Searching algorithms: W, WDP; M, MyriMatch; C, Comet; G, MS-GF+; SE-S, Sipros Ensemble Searching.
Filtering algorithms: P, Percolator; I, iProphet; SE-F, Sipros Ensemble Filtering.
The best entry was underlined and the second best was in bold.
Benchmarking of identification performance using E. coli and synthetic metaproteome databases
| Databases | 100% | 50% | 50% | |
|---|---|---|---|---|
| Search | Filter | 1% FDR | 5% Non- | |
| W | P | 2062 | 955 ( | |
| M | P | 2153 | 972 (0.0%) | 776 (0.0%) |
| C | P | 966 (0.0%) | 836 (0.0%) | |
| G | P | 803 (0.0%) | ||
| C&M | I | 2170 | 915 ( | 807 (0.0%) |
| C&M&G | I | 2182 | 917 ( | 815 (0.0%) |
| C&M&G | SE-F | 2158 | 854 (0.0%) | 726 (0.0%) |
| SE-S | SE-F | 2137 | ||
Searching algorithms: W, WDP; M, MyriMatch; C, Comet; G, MS-GF+; SE-S, Sipros Ensemble Searching.
Filtering algorithms: P, Percolator; I, iProphet; SE-F, Sipros Ensemble Filtering.
Number of identified E. coli proteins filtered at 1% protein FDR estimated by target-decoy searches.
Number of identified E. coli proteins (and their FDRs estimated by target-decoy searches in parentheses) filtered at 5% non-E. coli proteins.
The best entry was underlined and the second best was in bold.
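The footnotes above refer to filtering at a fixed FDR estimated by target-decoy searches. The sketch below shows one common way such a filter can work, assuming FDR is estimated as the number of decoys divided by the number of targets above a score cutoff. The source does not state the exact estimator or tie handling, so the formula, the function name `filter_at_fdr`, and the `(score, is_decoy)` tuple format are all assumptions.

```python
def filter_at_fdr(hits, max_fdr=0.01):
    """Return target hits accepted at the most permissive score cutoff
    whose estimated FDR does not exceed max_fdr.

    hits: list of (score, is_decoy) tuples; higher score is better.
    FDR is estimated as decoys / targets (an assumed estimator).
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked hits to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    # Decoys above the cutoff are dropped from the reported list.
    return [h for h in ranked[:best_cut] if not h[1]]
```

Calling `filter_at_fdr(hits, max_fdr=0.01)` on a list of scored target and decoy identifications would return only the targets above the chosen cutoff. The same loop could be adapted to the 5% non-E. coli criterion by counting non-E. coli hits in place of decoys.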
Fig. 1. Computational performance of database searching in metaproteomics. (A) Comparison of the execution time and peak memory of Sipros Ensemble, Comet, MyriMatch, MS-GF+ and their combinations used by iProphet. (B) Computational scalability of Sipros Ensemble on a supercomputer. Only a single LC-MS/MS cycle out of 22 cycles in a MudPIT run was used to measure the computational resources used by the database searching algorithms.