Abstract
De novo sequencing is an important computational approach to determining the amino acid sequence of a peptide from tandem mass spectrometry (MS/MS) data. Most existing approaches model a spectrum as a graph and perform the sequencing by computing the longest antisymmetric path in that graph. The task is often computationally intensive because an MS/MS spectrum may contain noisy data, missing mass peaks, or post-translational modifications/mutations. This paper develops a new parameterized algorithm that can efficiently compute the longest antisymmetric partial path in an extended spectrum graph of bounded path width. Our testing results show that this algorithm can efficiently process experimental spectra and provide sequencing results of high accuracy.
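To make the graph model concrete, the following is a minimal sketch of an antisymmetric path search, not the parameterized algorithm developed in the paper. It assumes an idealized setting: vertices are prefix masses with no ion offsets, edges join vertices whose mass difference equals a residue mass, and each peak contributes a pair of complementary vertices of which a path may use at most one. The function names, the three-residue mass table, and the exhaustive search are illustrative assumptions.

```python
from itertools import combinations

# Monoisotopic residue masses in Da; a deliberately tiny table for illustration.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "Q": 128.05858}


def build_edges(masses, tol=0.01):
    """Add edge i -> j when masses[j] - masses[i] matches one residue mass.

    `masses` must be sorted in ascending order.
    """
    edges = {i: [] for i in range(len(masses))}
    for i, j in combinations(range(len(masses)), 2):
        diff = masses[j] - masses[i]
        for residue, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                edges[i].append((j, residue))
    return edges


def longest_antisymmetric_path(masses, pairs, tol=0.01):
    """Exhaustively search for the longest path from vertex 0 to the last
    vertex that uses at most one vertex from each pair in `pairs`.

    Returns the residue string read along the path, or None if no path exists.
    This brute-force search is exponential in the worst case (an assumption
    made for brevity, not the paper's method).
    """
    edges = build_edges(masses, tol)
    partner = {}
    for u, v in pairs:
        partner[u], partner[v] = v, u
    sink = len(masses) - 1
    best = None

    def dfs(vertex, used, label):
        nonlocal best
        if vertex == sink:
            if best is None or len(label) > len(best):
                best = label
            return
        for nxt, residue in edges[vertex]:
            if partner.get(nxt) in used:
                continue  # the complementary vertex is already on the path
            dfs(nxt, used | {nxt}, label + residue)

    dfs(0, {0}, "")
    return best


# Hypothetical peptide GQ: one peak yields vertex 1 (57.02146) and its
# complementary vertex 2 (128.05858), which together form a forbidden pair.
masses = [0.0, 57.02146, 128.05858, 185.08004]
constrained = longest_antisymmetric_path(masses, pairs=[(1, 2)])
unconstrained = longest_antisymmetric_path(masses, pairs=[])
```

In this toy input, dropping the pair constraint lets the search pass through both vertices derived from the same peak and read the spurious sequence `GAG`, whereas the constrained search reads `GQ`. This is the kind of error the antisymmetry requirement is meant to exclude.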
Year: 2014 PMID: 24551059 PMCID: PMC3925086 DOI: 10.1371/journal.pone.0087476
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. (a) The mass peaks in a tandem mass spectrum; (b) The corresponding extended spectrum graph.
Figure 2. (a) An example of a graph; (b) A path decomposition for the graph in (a).
Figure 3. A path decomposition and its corresponding dynamic programming tables.
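Since the figures are not reproduced here, a short sketch of the standard definition of a path decomposition may help in reading the path width tables. It checks the three usual conditions (every vertex is in some bag, every edge is contained in some bag, and the bags containing any given vertex are consecutive) and reports the width as the largest bag size minus one. The example graph and bags are hypothetical and are not those of Figure 2.

```python
def is_path_decomposition(vertices, edges, bags):
    """Return True if the sequence `bags` is a path decomposition of the graph."""
    bags = [set(bag) for bag in bags]

    # Condition 1: the bags together cover exactly the vertex set.
    if set().union(*bags) != set(vertices):
        return False

    # Condition 2: both endpoints of every edge share at least one bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags):
            return False

    # Condition 3: the bags containing a vertex form a contiguous run.
    for vertex in vertices:
        positions = [i for i, bag in enumerate(bags) if vertex in bag]
        if positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags) - 1


# Hypothetical graph: a path a-b-c-d with one extra edge a-c.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

good = [{"a", "b", "c"}, {"c", "d"}]
bad = [{"a", "b"}, {"c", "d"}, {"a", "c"}]  # edge b-c uncovered, a not contiguous

good_ok = is_path_decomposition(vertices, edges, good)
bad_ok = is_path_decomposition(vertices, edges, bad)
good_width = width(good)
```

The path width of a graph is the minimum width over all of its path decompositions; the check above only validates one given decomposition and does not search for an optimal one.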
The distribution of path widths (PW) of extended spectrum graphs.
| N/S | PW < 5 | PW = 5 | PW > 5 |
| 0.00 | 51.32% | 42.23% | 6.45% |
| 0.20 | 39.34% | 40.72% | 19.94% |
| 0.50 | 32.53% | 30.26% | 37.21% |
| 0.80 | 27.45% | 30.57% | 42.18% |
| 1.00 | 20.63% | 30.31% | 49.06% |
The prediction accuracy of the program on simulated data.
| N/S | PDS | TDS | NovoHMM | PepNovo |
| 0.00 | 98.60% | 98.60% | 98.32% | 98.46% |
| 0.20 | 98.27% | 98.27% | 98.25% | 98.31% |
| 0.50 | 98.29% | 98.29% | 98.13% | 98.23% |
| 0.80 | 97.98% | 97.98% | 98.08% | 98.12% |
| 1.00 | 96.95% | 96.95% | 95.32% | 97.03% |
The computation time (secs) of the program on simulated data.
| N/S | PDS | TDS | NovoHMM | PepNovo |
| 0.00 | 0.06 | 0.78 | 3.23 | 0.08 |
| 0.20 | 0.28 | 3.67 | 3.79 | 0.31 |
| 0.50 | 0.53 | 6.27 | 7.32 | 0.67 |
| 0.80 | 0.42 | 6.64 | 9.57 | 0.73 |
| 1.00 | 0.66 | 7.83 | 11.46 | 0.85 |
The distribution of path widths of the extended spectrum graphs of 10000 real spectra.
| PW < 3 | PW = 3 | PW = 4 | PW = 5 | PW > 5 |
| 5.33% | 21.42% | 36.79% | 22.13% | 14.33% |
The average sequencing accuracy achieved by PDS, PepNovo and NovoHMM on each group.
| Group | PDS | PepNovo | NovoHMM | TDS |
| Group 1 | 90.7% | 87.4% | 85.3% | |
| Group 2 | 88.3% | 82.6% | 78.4% | |
| Group 3 | 86.4% | 81.7% | 74.2% | |
| Group 4 | 86.2% | 87.8% | 83.1% | |
| Group 5 | 83.1% | 86.3% | 84.6% | |
The average computation time (secs) needed to process the spectra in each group.
| Group | PDS | PepNovo | NovoHMM | TDS |
| Group 1 | 0.04 | 0.06 | 3.27 | 0.07 |
| Group 2 | 0.18 | 0.19 | 4.38 | 0.39 |
| Group 3 | 1.35 | 2.23 | 5.66 | 4.76 |
| Group 4 | 3.78 | 5.65 | 14.63 | 32.13 |
| Group 5 | 12.29 | 6.75 | 16.79 | 127.35 |