Hao Yang1,2, Hao Chi1,2, Wen-Feng Zeng1,2, Wen-Jing Zhou1,2, Si-Min He1,2.
Abstract
MOTIVATION: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and for assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in low precision of de novo sequencing.
Year: 2019 PMID: 31510687 PMCID: PMC6612832 DOI: 10.1093/bioinformatics/btz366
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The workflow of pNovo 3
Median values of three similarities on all datasets
| | V.mungo | M.musculus | M.mazei | S.cerevisiae | A.mellifera | QE_HF_X1 | QE_HF_X2 |
|---|---|---|---|---|---|---|---|
| Cosine | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 | 0.96 | 0.96 |
| Pearson | 0.97 | 0.97 | 0.96 | 0.96 | 0.96 | 0.95 | 0.95 |
| Spearman | 0.94 | 0.94 | 0.94 | 0.92 | 0.94 | 0.94 | 0.93 |
| #PSMs | 41 721 | 12 538 | 67 452 | 55 163 | 126 966 | 104 052 | 83 313 |
All spectra whose top-10 peptide candidates contain the correct result are considered. These spectra account for 60–76% of the total spectra across the seven datasets: Vigna mungo (V.mungo), Mus musculus (M.musculus), Methanosarcina mazei (M.mazei), Saccharomyces cerevisiae (S.cerevisiae), Apis mellifera (A.mellifera), QE_HF_X1 and QE_HF_X2.
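The three similarities reported in the table can be illustrated with a minimal sketch. It assumes that each spectrum has already been reduced to a vector of fragment-ion intensities aligned ion by ion; the intensity values and function names below are hypothetical and are not taken from pNovo 3.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length intensity vectors.
    Assumes neither vector is all zeros (otherwise this divides by zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

# Hypothetical aligned intensities (illustrative values only).
predicted = [0.10, 0.80, 0.35, 0.00, 0.60]
observed = [0.12, 0.75, 0.40, 0.05, 0.50]

print(cosine(predicted, observed))
print(pearson(predicted, observed))
print(spearman(predicted, observed))
```

Cosine and Pearson compare intensity magnitudes, whereas Spearman depends only on the rank order of the peaks. This is one reason a Spearman value can differ from the other two for the same pair of spectra.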
Recall of top-1 peptides identified by different de novo sequencing algorithms
| | V.mungo | M.musculus | M.mazei | S.cerevisiae | A.mellifera | QE_HF_X1 | QE_HF_X2 |
|---|---|---|---|---|---|---|---|
| pNovo 3 | 64.6% | 50.4% | 66.0% | 64.7% | 62.5% | 47.8% | 38.3% |
| pNovo | 42.9% | 25.7% | 42.4% | 47.7% | 36.7% | 29.8% | 21.4% |
| PEAKS | 44.3% | 24.9% | 42.4% | 50.0% | 38.0% | 32.2% | 24.6% |
| Novor | 17.4% | 9.7% | 19.1% | 19.1% | 13.7% | 10.9% | 9.3% |
| #Total PSMs | 62 089 | 25 354 | 103 959 | 81 326 | 217 841 | 196 759 | 201 301 |
Fig. 2. Venn diagram of the correct results of pNovo 3, pNovo and PEAKS on the first three datasets: (a) V.mungo, (b) M.musculus and (c) M.mazei
Fig. 3. The recalls of top-1 to top-10 on the first three datasets: (a) V.mungo, (b) M.musculus and (c) M.mazei
Fig. 4. An example showing that the features extracted by pNovo 3 can effectively discriminate between the correct result and a very similar incorrect one. The real spectrum is from the V.mungo dataset and its title is 4723.8552.8552.2.dta. Both the correct (KYDEIDAAPEER, upper subfigure) and the incorrect (KYDEIDAAEPER, lower subfigure) peptide sequences are matched to this real spectrum. Five features of the correct and incorrect peptide sequences are labeled with green and red figures, respectively
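To see why two such candidates are hard to tell apart, the sketch below lists which singly charged b and y ions differ between them. It is a simplified illustration, not pNovo 3's scoring: it assumes unmodified residues, standard monoisotopic residue masses and charge 1+, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da (only the residues needed here).
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "I": 113.08406,
    "K": 128.09496, "P": 97.05276, "R": 156.10111, "Y": 163.06333,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da

def b_ions(peptide):
    """m/z of singly charged b1..b(n-1): N-terminal prefix sums plus a proton."""
    mzs, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        mzs.append(total + PROTON)
    return mzs

def y_ions(peptide):
    """m/z of singly charged y1..y(n-1): C-terminal suffix sums plus water and a proton."""
    mzs, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        mzs.append(total + WATER + PROTON)
    return mzs

def differing(a, b, tol=1e-6):
    """1-based indices of ions whose m/z differ by more than tol."""
    return [i + 1 for i, (p, q) in enumerate(zip(a, b)) if abs(p - q) > tol]

correct, incorrect = "KYDEIDAAPEER", "KYDEIDAAEPER"
diff_b = differing(b_ions(correct), b_ions(incorrect))
diff_y = differing(y_ions(correct), y_ions(incorrect))
print("differing b ions:", diff_b)
print("differing y ions:", diff_y)
```

Under these assumptions, the two sequences share every b and y ion except the complementary pair that cleaves between the swapped residues. If neither of those two peaks is observed, the fragment masses alone cannot decide the order of P and E.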
Fig. 5. The precision-recall (PR) curves of pNovo 3, pNovo, PEAKS and Novor on the first three datasets: (a) V.mungo, (b) M.musculus, (c) M.mazei. (d) The AUCs of the three algorithms on the seven datasets