Xinning Jiang, Xiaogang Jiang, Guanghui Han, Mingliang Ye, Hanfa Zou.
Abstract
BACKGROUND: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. SEQUEST characterizes each peptide-to-spectrum assignment with several scores, including Xcorr, ΔCn (Delta Cn), Sp, Rsp and matched ion count. A filtering criterion built from several of these scores is used to separate correct identifications from random assignments. However, such filtering criteria have not yet been well optimized.
Year: 2007 PMID: 17761002 PMCID: PMC2040164 DOI: 10.1186/1471-2105-8-323
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distribution of peptides identified from human liver tissue lysate by SEQUEST. A) Singly charged peptides; B) doubly charged peptides; C) triply charged peptides. Each data point represents a peptide identification from the composite database: a cross marks an identification from a reversed sequence, and a square marks an identification from a forward sequence. The cumulative curves drawn in each graph are 1% false-discovery curves. Each point on a curve is a filtering criterion that yields peptide identifications with an FDR of 1%; the peptides identified by that criterion lie in the region where the Xcorr and ΔCn scores exceed the criterion's Xcorr and ΔCn cutoffs. Graphs were drawn in Origin 7.5 using Speed Mode with a maximum of 5000 points per curve; the three raw graphs with all data points are also provided [see Additional file 1].
Figure 2. Relationship between the number of peptide identifications and the Xcorr cutoff of the criteria that led to these identifications, for human liver tissue lysate at the same FDR (<1%). To keep the FDR below 1%, the ΔCn cutoff of each criterion changes with the Xcorr cutoff. Curves for the three charge states were drawn separately: A) singly charged peptides, B) triply charged peptides and C) doubly charged peptides. D) is the zoomed curve for singly charged peptides.
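The relationship in Figure 2 can be illustrated with a minimal sketch: for a fixed Xcorr cutoff, scan candidate ΔCn cutoffs and keep the lowest one that brings the FDR below 1%, since that one retains the most identifications. The record layout `(xcorr, delta_cn, is_reversed)`, the function names, and the decoy-based estimate FDR = 2 × reversed / total are assumptions made for illustration; this is not the authors' code.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (xcorr, delta_cn, is_reversed)
PSM = Tuple[float, float, bool]


def fdr(psms: List[PSM]) -> float:
    """Decoy-based FDR estimate, assumed here to be 2 * reversed / total."""
    if not psms:
        return 0.0
    n_rev = sum(1 for p in psms if p[2])
    return 2.0 * n_rev / len(psms)


def min_delta_cn_cutoff(
    psms: List[PSM], xcorr_cut: float, max_fdr: float = 0.01
) -> Optional[Tuple[float, int]]:
    """Return (delta_cn_cutoff, n_identifications) for the lowest ΔCn cutoff
    that gives FDR < max_fdr at the given Xcorr cutoff, or None if no cutoff
    achieves it."""
    kept = [p for p in psms if p[0] > xcorr_cut]
    # Candidate cutoffs: zero plus every observed ΔCn value, in ascending order.
    for cut in sorted({0.0} | {p[1] for p in kept}):
        passing = [p for p in kept if p[1] > cut]
        if passing and fdr(passing) < max_fdr:
            return cut, len(passing)
    return None
```

Calling `min_delta_cn_cutoff` over a grid of Xcorr cutoffs would trace out one curve of the kind shown in the figure, one point per Xcorr value.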
Figure 3. Dependence of fitness on generation for doubly charged peptides. The fitness of each individual (criterion) is the number of peptide identifications that pass the criterion. The fitness of the fittest individual in each generation is shown as black dots.
Comparison of the performance of conventional criteria, PeptideProphet and SFOER in peptide identification for the analysis of human liver tissue lysate (a)
| | Conventional criteria (b) | PeptideProphet (c) | SFOER (d) |
| --- | --- | --- | --- |
| # 1+ | 99 | 26 | 162 |
| # 2+ | 17950 | 17451 | 18606 |
| # 3+ | 7388 | 12587 | 11313 |
| # total | 25428 | 30064 | 30081 |
| %incr | / | 18.2% | 18.3% |
| # false pep | 126 | 113 | 147 |
| FDR | 0.99% | 0.75% | 0.98% |
| #unique pep | 4591 | 5175 | 5285 |
| %incr unique pep | / | 12.7% | 15.12% |
| # proteins | 1467 | 1596 | 1665 |
a. Summary of each category returned by the different strategies: #1+, #2+ and #3+ are the numbers of peptide identifications for charge states 1+, 2+ and 3+, respectively. #total = (#1+) + (#2+) + (#3+). #false pep is the number of peptides from the reversed database, and #unique pep is the total number of unique peptides. %incr and %incr unique pep are the increases in peptide identifications and unique peptide identifications obtained by PeptideProphet and SFOER relative to the conventional criteria. #proteins is the number of positive proteins identified by each strategy. The FDR of the identifications is also shown.
b. Conventional criteria: Xcorr > 2.0, 2.5 and 3.8 for singly, doubly and triply charged peptides, respectively, and ΔCn > 0.164 for all charge states [25, 33].
c. The cutoff is set to a PeptideProphet probability > 0.9 [13, 36-38].
d. Optimized criteria determined by SFOER for charge states 1+, 2+ and 3+, respectively: Xcorr > 1.76, 2.31 and 2.41; ΔCn > 0.061, 0.199 and 0.265; Sp > 44.42, 104 and 276.9; and Rsp < 3, 4 and 2.
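The table's derived rows can be checked with a short sketch. The FDR equation itself is not reproduced in this record, so the formula FDR = 2 × (reversed hits) / (total hits) is an assumption; it does reproduce the FDR row from the #false pep and #total rows. The `passes` helper applies the SFOER liver thresholds from footnote d; its name and argument order are hypothetical.

```python
def decoy_fdr(n_reversed: int, n_total: int) -> float:
    """Assumed decoy-based estimate: 2 * reversed / total."""
    return 2.0 * n_reversed / n_total


def pct_increase(new: int, baseline: int) -> float:
    """Percentage increase of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# SFOER criteria for liver tissue (footnote d), keyed by charge state:
# (min Xcorr, min ΔCn, min Sp, max Rsp), all comparisons strict.
SFOER_LIVER = {
    1: (1.76, 0.061, 44.42, 3),
    2: (2.31, 0.199, 104.0, 4),
    3: (2.41, 0.265, 276.9, 2),
}


def passes(charge: int, xcorr: float, delta_cn: float, sp: float, rsp: int) -> bool:
    """True if an identification satisfies the criterion for its charge state."""
    min_xcorr, min_dcn, min_sp, max_rsp = SFOER_LIVER[charge]
    return xcorr > min_xcorr and delta_cn > min_dcn and sp > min_sp and rsp < max_rsp


if __name__ == "__main__":
    # (#false pep, #total) taken from the table above.
    for name, n_rev, n_total in [
        ("Conventional", 126, 25428),
        ("PeptideProphet", 113, 30064),
        ("SFOER", 147, 30081),
    ]:
        print(f"{name}: FDR = {100 * decoy_fdr(n_rev, n_total):.2f}%")
```

Under this assumed formula the three strategies come out at 0.99%, 0.75% and 0.98%, matching the FDR row, and `pct_increase(30081, 25428)` rounds to the 18.3% reported for SFOER.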
The optimized criteria of peptide identifications from human liver tissue lysate and human plasma by SFOER with FDR less than 1%
| Sample | Charge | Xcorr | ΔCn | Sp | Rsp |
| --- | --- | --- | --- | --- | --- |
| liver tissue | 1+ | 1.76 | 0.061 | 44.42 | 3 |
| | 2+ | 2.31 | 0.199 | 104 | 4 |
| | 3+ | 2.41 | 0.265 | 276.9 | 2 |
| plasma | 1+ | 1.88 | 0.179 | 238 | 80 |
| | 2+ | 2.31 | 0.270 | 71 | 2 |
| | 3+ | 2.40 | 0.319 | 215.6 | 1 |
Summary of the peptide identifications from human liver tissue by applying filtering criteria optimized using different score combinations
| | All four scores | Xcorr, ΔCn, Rsp | Xcorr, ΔCn, Sp | Xcorr, ΔCn |
| --- | --- | --- | --- | --- |
| # peptides | 30,081 | 29,996 | 29,595 | 29,248 |
| % increase | 2.87% | 2.56% | 1.19% | / |
| FDR | 0.977% | 0.980% | 0.980% | 0.998% |
Figure 4. Overlap of peptides identified by SFOER and PeptideProphet for human liver tissue lysate. The numbers of peptide identifications made by one or both algorithms are indicated; e.g., 27,272 peptides are identified by both algorithms (intersection).
Figure 5. Evaluation of the classification performance of SFOER and PeptideProphet with a standard protein mixture. A) Numbers of correct and incorrect peptide identifications by SFOER and PeptideProphet at different FDRs, where an incorrect identification is a peptide assignment from the forward yeast database and a correct one is from the known standard proteins or trypsin. B) Predicted and observed FDRs. The observed FDR is calculated as the number of peptide identifications not from standard proteins divided by the total number of peptide identifications, while the predicted FDR is calculated using equation (1). Observed FDRs for SFOER are shown as open circles, and observed FDRs for PeptideProphet are shown as filled circles.
Parameter settings for the genetic algorithm
| GA configuration | Score | Value |
| --- | --- | --- |
| Variables | | 4 |
| Population size | | 100 |
| Crossover probability | | 0.2 |
| Mutation probability | | 0.01 |
| Bits | Xcorr | 9 |
| | ΔCn | 9 |
| | Sp | 12 |
| | Rsp | 8 |
| Fitness evaluation | | n (peptides) |
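With the bit widths in the table, one criterion can be encoded as a 38-bit string (9 + 9 + 12 + 8). The sketch below decodes such a string into four thresholds by linear scaling. The bit widths come from the table; the value ranges in `RANGES`, the linear mapping, and the field order are hypothetical choices for illustration, since the record does not state how the bits map to score values.

```python
from typing import Dict

# Bit widths from the GA parameter table, in an assumed field order.
BITS = {"xcorr": 9, "delta_cn": 9, "sp": 12, "rsp": 8}

# Hypothetical (low, high) range for each threshold; not taken from the paper.
RANGES = {
    "xcorr": (0.0, 5.11),
    "delta_cn": (0.0, 0.511),
    "sp": (0.0, 409.5),
    "rsp": (1.0, 256.0),
}


def decode(chromosome: str) -> Dict[str, float]:
    """Map a 38-character string of '0'/'1' to four score thresholds."""
    if len(chromosome) != sum(BITS.values()) or set(chromosome) - {"0", "1"}:
        raise ValueError("chromosome must be a 38-character string of 0s and 1s")
    thresholds = {}
    pos = 0
    for name, n_bits in BITS.items():
        raw = int(chromosome[pos:pos + n_bits], 2)
        low, high = RANGES[name]
        # Linear scaling of the integer 0 .. 2^n - 1 onto [low, high].
        thresholds[name] = low + raw / (2 ** n_bits - 1) * (high - low)
        pos += n_bits
    return thresholds
```

With these assumed ranges, the 9-bit Xcorr field has a resolution of 0.01 and the 12-bit Sp field a resolution of 0.1; wider fields buy finer cutoffs at the cost of a larger search space.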
Figure 6. Flowchart of the optimization procedure using the genetic algorithm. It starts with the initialization phase, which randomly generates the initial population P_0. The population of the next generation, P_{i+1}, is obtained by applying genetic operators to the current population P_i. The fitness of each individual (criterion) is evaluated as the number of filtered peptides. Evolution continues until a terminating condition is reached. The genetic algorithm uses selection, mutation and crossover operators.
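The loop in the flowchart can be sketched as a small binary genetic algorithm. Population size, crossover probability and mutation probability default to the values in the parameter table. Tournament selection, single-point crossover, keeping the best individual (elitism), the fixed generation count as the terminating condition, and the zero fitness for criteria that miss the FDR target are all assumptions added for illustration, as the record does not specify them.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]


def constrained_count(passing_is_reversed: List[bool], max_fdr: float = 0.01) -> int:
    """Fitness sketch: number of passing identifications, or 0 if the assumed
    decoy-based FDR (2 * reversed / total) is not below max_fdr."""
    n = len(passing_is_reversed)
    if n == 0:
        return 0
    return n if 2.0 * sum(passing_is_reversed) / n < max_fdr else 0


def evolve(
    fitness: Callable[[Bits], float],
    n_bits: int = 38,
    pop_size: int = 100,
    p_cross: float = 0.2,
    p_mut: float = 0.01,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Bits, List[float]]:
    """Return the fittest individual and the best fitness per generation."""
    rng = random.Random(seed)
    # Initialization: random population P_0.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]

        def pick() -> Bits:
            # Binary tournament selection (assumed selection scheme).
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        nxt = [best[:]]  # elitism: carry the current best over unchanged
        while len(nxt) < pop_size:
            c1, c2 = pick()[:], pick()[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]
            for child in (c1, c2):
                for i in range(n_bits):  # per-bit mutation
                    if rng.random() < p_mut:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history
```

In the setting described here, `fitness` would decode the bit string into score cutoffs, filter the identifications, and return something like `constrained_count` over the survivors. Because of the assumed elitism, the best fitness per generation never decreases, which is the kind of curve Figure 3 plots.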