Kosaku Shinoda, Nozomu Yachie, Takeshi Masuda, Naoyuki Sugiyama, Masahiro Sugimoto, Tomoyoshi Soga, Masaru Tomita.
Abstract
BACKGROUND: Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.
Year: 2006 PMID: 17069662 PMCID: PMC1643838 DOI: 10.1186/1471-2105-7-479
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Correlation between experimentally measured and predicted LC-ETs. Data for all peptides in the training dataset obtained through cross validation are shown. The correlation coefficient was 0.9755 and the mean prediction error was 1.52 min (SD = 1.63 min).
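The two summary statistics quoted in the Figure 1 caption can be reproduced from paired measured and predicted elution times. The following is a minimal sketch, not the authors' code: it assumes the "mean prediction error" is the mean absolute difference and that the SD is the sample standard deviation of those absolute differences. The function names and the four data points are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

def error_stats(measured, predicted):
    """Mean and sample SD of the absolute prediction errors (assumed definition)."""
    errors = [abs(m - p) for m, p in zip(measured, predicted)]
    mean = sum(errors) / len(errors)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (len(errors) - 1))
    return mean, sd

# Hypothetical elution times in minutes, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 32.0, 40.0]

r = pearson_r(measured, predicted)
mean_err, sd_err = error_stats(measured, predicted)
print(f"r = {r:.4f}, mean error = {mean_err:.2f} min (SD = {sd_err:.2f} min)")
```

In a cross-validation setting, `predicted` would hold the out-of-fold prediction for each peptide, so that every peptide is scored by a model that was not trained on it.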
Target protein ranking among the search hit results using precursor ion m/z values, product ion m/z values, and/or LC-ETs.
| Gene product | Rank | | | | |
| | Precursor | Precursor and ET | Precursor and product | All | Mascot |
| Average | 157.83 | 35.77 | 4.69 | 1.85 | - |
| | 1 | 1 | 1 | 1 | 1 |
| | 1 | 1 | 1 | 1 | 1 |
| | 2 | 1 | 1 | 1 | 1 |
| | 7 | 1 | 1 | 1 | 1 |
| | 9 | 1 | 1 | 1 | 1 |
| | 9 | 1 | 1 | 1 | 1 |
| | 10 | 1 | 1 | 1 | 1 |
| | 12 | 1 | 1 | 1 | 1 |
| | 16 | 1 | 1 | 1 | 1 |
| | 21 | 1 | 1 | 1 | 1 |
| | 25 | 1 | 1 | 1 | 1 |
| | 30 | 1 | 1 | 1 | 1 |
| | 5 | 2 | 1 | 1 | 1 |
| | 10 | 2 | 1 | 1 | 1 |
| | 6 | 3 | 1 | 1 | 1 |
| | 8 | 3 | 1 | 1 | 1 |
| | 10 | 3 | 1 | 1 | 1 |
| | 11 | 3 | 1 | 1 | 1 |
| | 28 | 3 | 1 | 1 | 1 |
| | 301 | 3 | 1 | 1 | 1 |
| | 28 | 4 | 1 | 1 | 1 |
| | 282 | 6 | 1 | 1 | 1 |
| | 55 | 7 | 1 | 1 | 1 |
| | 211 | 7 | 1 | 1 | 1 |
| | 324 | 7 | 1 | 1 | 1 |
| | 155 | 8 | 1 | 1 | 1 |
| | 56 | 9 | 1 | 1 | 1 |
| | 34 | 11 | 1 | 1 | 1 |
| | 512 | 11 | 1 | 1 | 1 |
| | 34 | 12 | 1 | 1 | 1 |
| | 111 | 14 | 1 | 1 | 1 |
| | 139 | 14 | 1 | 1 | 1 |
| | 207 | 27 | 1 | 1 | 1 |
| | 325 | 31 | 1 | 1 | 1 |
| | 493 | 32 | 1 | 1 | N.I. |
| | 617 | 50 | 1 | 1 | 1 |
| | 139 | 80 | 1 | 1 | 4 |
| | 1752 | 767 | 1 | 1 | 3 |
| | 19 | 2 | 2 | 1 | 1 |
| | 62 | 4 | 2 | 1 | 1 |
| | 4 | 1 | 4 | 1 | 1 |
| | 27 | 8 | 4 | 1 | 2 |
| | 294 | 271 | 7 | 1 | 1 |
| | 129 | 1 | 16 | 1 | 1 |
| | 151 | 13 | 16 | 1 | 1 |
| | 312 | 15 | 23 | 7 | N.I. |
| | 429 | 15 | 77 | 32 | N.I. |
For comparison, the results of searches using Mascot software are also shown. Each recombinant protein experimentally tested in the validation dataset is listed by name, together with its rank in the search results.
Figure 2. Positive predictive value (PPV) of the peptide-screening algorithm using precursor ion m/z values, product ion m/z values and/or LC-ETs. The PPV is the ratio of correctly identified peptides (i.e. peptide sequences included in the validation protein datasets) to all identified peptides.
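The PPV definition in the Figure 2 caption translates directly into a set operation. This is a minimal sketch under that definition; the function name and the peptide strings are hypothetical, and duplicates among the identified peptides are collapsed, which is an assumption rather than something the caption states.

```python
def positive_predictive_value(identified, validation_peptides):
    """PPV = correctly identified peptides / all identified peptides."""
    identified = set(identified)  # assumption: count each sequence once
    if not identified:
        raise ValueError("no peptides were identified")
    correct = identified & set(validation_peptides)
    return len(correct) / len(identified)

# Hypothetical peptide sequences, for illustration only.
identified = ["PEPTIDEK", "SAMPLER", "DECAYK", "ELVISK"]
validation_peptides = {"PEPTIDEK", "SAMPLER", "LIVESK"}

ppv = positive_predictive_value(identified, validation_peptides)
print(f"PPV = {ppv:.2f}")
```

Two of the four identified sequences occur in the validation set here, so the PPV of this toy example is 0.5.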
Specificity and sensitivity of peptide identifications using precursor ion m/z values, product ion m/z values, and/or LC-ETs.
| Search criteria | Sensitivity (%) | Specificity (%) | Relative FP rate |
| Precursor | 41.208 | 96.038 | 100.000 |
| | 35.317 | 99.028 | 24.522 |
| | 8.467 | 99.995 | 0.134 |
| | 9.789 | 99.997 | 0.073 |
Relative false-positive (FP) rates are also shown. The highest FP rate (precursor) was normalized to 100, and the other FP rates are expressed relative to that value to make comparison easier.
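The normalization described in this note amounts to dividing every FP rate by the largest one and scaling to 100. A minimal sketch follows; the function name, the dictionary keys, and the raw FP rates are hypothetical and are not taken from the paper.

```python
def relative_fp_rates(fp_rates):
    """Scale FP rates so that the highest one equals 100."""
    highest = max(fp_rates.values())
    if highest <= 0:
        raise ValueError("the highest FP rate must be positive")
    return {name: 100.0 * rate / highest for name, rate in fp_rates.items()}

# Hypothetical raw FP rates (fractions), for illustration only.
fp_rates = {"precursor": 0.040, "precursor + ET": 0.010, "all": 0.001}

relative = relative_fp_rates(fp_rates)
for name, value in relative.items():
    print(f"{name}: {value:.3f}")
```

Because only ratios are reported, the relative values show how much each additional search criterion reduces false positives, but the absolute FP rate cannot be recovered from them alone.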
Figure 3. Schematic flowchart depicting the HybGFS method. (A and B) Generation of in silico peptide sequences from a genome sequence. (C) Calculation of precursor and product ion m/z values for in silico peptides. (D) LC-ET prediction for in silico-generated peptides using ANNs. (E) Pre-compilation of in silico peptide information into a database. (F) Querying experimental peptide data obtained by MS/MS to the database.
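Parts of this workflow can be illustrated with a small sketch covering peptide generation, precursor m/z calculation (C), pre-compilation (E), and querying (F). It is not the HybGFS implementation. It assumes that the in silico peptides come from a tryptic digest (cleavage after K or R, except before P) of an already translated protein sequence, uses standard monoisotopic residue masses, and matches on precursor m/z only; product ions and the ANN-based LC-ET prediction (D) are left out. All function names, the toy sequence, and the tolerance are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of one charge-carrying proton

def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (assumed enzyme rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides

def precursor_mz(peptide, charge=2):
    """Monoisotopic m/z of the protonated peptide at the given charge."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge

def build_index(peptides, charge=2):
    """Pre-compile peptides into a list sorted by precursor m/z."""
    return sorted((precursor_mz(p, charge), p) for p in peptides)

def query(index, mz, tol=0.01):
    """Return peptides whose precursor m/z lies within +/- tol of the query."""
    keys = [m for m, _ in index]
    lo = bisect_left(keys, mz - tol)
    hi = bisect_right(keys, mz + tol)
    return [peptide for _, peptide in index[lo:hi]]

# Toy protein sequence, for illustration only.
peptides = tryptic_digest("AGGSPEPTIDEKLLVVAARGGHHK")
index = build_index(peptides)
hits = query(index, 371.25)
print(peptides, hits)
```

Sorting the pre-compiled entries by m/z keeps each lookup to a binary search over the index, which is the practical reason for building the database once before any experimental spectra are queried. A fuller version would store predicted elution times and product ion m/z values alongside each entry and filter on them as well.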