Chunwei Ma, Shaohang Xu, Geng Liu, Xin Liu, Xun Xu, Bo Wen, Siqi Liu.
Abstract
BACKGROUND: Tandem mass spectrometry (MS/MS) followed by database search is a principal approach for identifying peptides and proteins in proteomic studies. Considerable effort has been devoted to improving the accuracy and sensitivity of peptide and protein identification, for example by developing advanced algorithms and expanding protein databases.
Keywords: Bioinformatics; Machine learning; Mass spectrometry; Proteogenomics; RNA-Seq; Shotgun proteomics
Year: 2017 PMID: 28201984 PMCID: PMC5311845 DOI: 10.1186/s12859-017-1491-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of peptide identifications at 1% peptide-level FDR for different methods on two data sets
| No. | Method | Jurkat cell line: peptides | Jurkat cell line: improvement | Mouse liver: peptides | Mouse liver: improvement |
|---|---|---|---|---|---|
| 1 | DBref | 71645 | - | 49937 | - |
| 2 | DBref + DBnovel | 71499 | - | 50083 | - |
| 3 | DBref + DBnovel + Rlow | 72283 | 1.10% | 50993 | 1.82% |
| 4 | DBref + DBnovel + Rlow + FmRNA | 75649 | 5.80% | 52503 | 4.83% |
| 5 | DBref + DBnovel + Rlow + Fpeptide | 76259 | 6.66% | 52170 | 4.17% |
| 6 | DBref + DBnovel + Rlow + Fpeptide+mRNA | 77682 | 8.65% | 53024 | 5.87% |
Note:
1. DBref: MS/MS data were searched against the reference protein database, and the identification results were processed with MascotPercolator.
2. DBref + DBnovel: MS/MS data were searched against the reference protein database supplemented with novel transcript-derived proteins, and the identification results were processed with MascotPercolator.
3. DBref + DBnovel + Rlow: MS/MS data were searched against the customized protein database (reference proteins plus novel transcript-derived proteins, with low-RNA-level protein entries removed), and the identification results were processed with MascotPercolator.
4. DBref + DBnovel + Rlow + FmRNA: MS/MS data were searched against the customized protein database (reference proteins plus novel transcript-derived proteins, with low-RNA-level protein entries removed), and the identification results were processed with MascotPercolator with transcript abundance added as a feature (FmRNA).
5. DBref + DBnovel + Rlow + Fpeptide: MS/MS data were searched against the customized protein database (reference proteins plus novel transcript-derived proteins, with low-RNA-level protein entries removed), and the identification results were processed with MascotPercolator with peptide abundance (MS1 XIC of the peptide) added as a feature (Fpeptide).
6. DBref + DBnovel + Rlow + Fpeptide+mRNA: MS/MS data were searched against the customized protein database (reference proteins plus novel transcript-derived proteins, with low-RNA-level protein entries removed), and the identification results were processed with MascotPercolator with both features added (Fpeptide+mRNA = FmRNA + Fpeptide).
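The record does not state which method serves as the baseline for the "Improvement" columns. The reported percentages are consistent with a relative gain over method 2 (DBref + DBnovel), so the following minimal sketch assumes that baseline. The names `PEPTIDES`, `BASELINE`, `improvement_percent` and `improvements` are illustrative and do not come from the paper.

```python
# Peptide counts at 1% peptide-level FDR, copied from the table above:
# method -> (Jurkat cell line, mouse liver)
PEPTIDES = {
    "DBref": (71645, 49937),
    "DBref + DBnovel": (71499, 50083),
    "DBref + DBnovel + Rlow": (72283, 50993),
    "DBref + DBnovel + Rlow + FmRNA": (75649, 52503),
    "DBref + DBnovel + Rlow + Fpeptide": (76259, 52170),
    "DBref + DBnovel + Rlow + Fpeptide+mRNA": (77682, 53024),
}

# Assumption: the baseline is inferred from the reported numbers,
# not stated in the source.
BASELINE = "DBref + DBnovel"


def improvement_percent(count, baseline):
    """Relative gain of `count` over `baseline`, in percent."""
    return 100.0 * (count - baseline) / baseline


def improvements(method):
    """Return (Jurkat, mouse liver) improvements rounded to two decimals."""
    return tuple(
        round(improvement_percent(count, base), 2)
        for count, base in zip(PEPTIDES[method], PEPTIDES[BASELINE])
    )


if __name__ == "__main__":
    for method in PEPTIDES:
        print(method, improvements(method))
```

Under this assumption, the computed values for methods 3 to 6 match the percentages reported in the table.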
Fig. 1. The correlation of transcript and protein abundances. (a) Jurkat cell line dataset and (b) mouse liver dataset. The Pearson correlation coefficients were 0.6318 and 0.4987 for the Jurkat cell line and mouse liver datasets, respectively. Intensity-based absolute quantification (iBAQ) was used to represent protein abundance
Fig. 2. The distribution of two features (XIC and FPKM) in target and decoy PSMs. The values of the two features are log-transformed
Fig. 3. Peptide identification at different q-values and Venn plots of peptide identifications. (a) and (b) show the estimated number of correct peptides for the Jurkat cell line and mouse liver data sets. (c) and (d) show Venn plots of unique peptides for the four methods. “MP” stands for processing by MascotPercolator alone; FPKM, for MascotPercolator with FPKM added as a feature; XIC, for MascotPercolator with XIC added as a feature; XIC + FPKM and FX, for MascotPercolator with both FPKM and XIC added as features
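For readers unfamiliar with q-values in a target-decoy search, the following sketch shows one common way to derive them from scored target and decoy matches. It is a simplified illustration, not MascotPercolator's actual procedure: it assumes that higher scores are better, estimates FDR as decoys divided by targets at each score threshold, and uses made-up example data. The function name `target_decoy_q_values` is hypothetical.

```python
def target_decoy_q_values(scores, is_decoy):
    """Return one q-value per match, in the original input order.

    Simplified estimate: FDR at a threshold = decoys / targets among all
    matches scoring at or above it; the q-value is the smallest FDR at
    which a match would still be accepted.
    """
    # Rank matches from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Enforce monotonicity: running minimum from the worst score upward.
    q_sorted = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_sorted.append(running_min)
    q_sorted.reverse()

    # Map the q-values back to the input order.
    q_values = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q_values[i] = q_sorted[rank]
    return q_values


# Made-up example: six matches, two of them decoys.
example_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
example_is_decoy = [False, False, True, False, False, True]
example_q = target_decoy_q_values(example_scores, example_is_decoy)
```

Counting the target matches with a q-value at or below a cutoff such as 0.01 gives the number of identifications accepted at that threshold.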
Fig. 4. Workflow
Fig. 5. Validation by permutation test. In each panel, the left arrow indicates the number of peptides identified by MascotPercolator alone, and the right arrow indicates the number identified after adding (a) FPKM, (b) XIC, and (c) both as features. The p-value was calculated by a permutation test (100 permutations)
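The record does not describe what was permuted. One plausible reading is that the feature values were shuffled across PSMs and the identification count recomputed each time. The sketch below illustrates a generic one-sided permutation test under that assumption. The statistic `targets_in_top_10` is a hypothetical toy stand-in for rerunning MascotPercolator, and the data are made up.

```python
import random


def permutation_p_value(feature, statistic, n_permutations=100, seed=0):
    """One-sided permutation p-value for statistic(feature)."""
    rng = random.Random(seed)
    observed = statistic(feature)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_permutations):
        # Break the link between feature values and PSMs.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data (made up): 10 target PSMs followed by 10 decoy PSMs,
# with higher feature values assigned to the targets.
is_target = [True] * 10 + [False] * 10
feature = [float(v) for v in range(20, 0, -1)]


def targets_in_top_10(values):
    """Hypothetical statistic: targets among the 10 highest feature values."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:10]
    return sum(is_target[i] for i in top)


p_value = permutation_p_value(feature, targets_in_top_10)
```

With 100 permutations, the smallest p-value this estimator can return is 1/101, which bounds the resolution of the test.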