Cong Zhou, Lucas D Bowler, Jianfeng Feng.
Abstract
BACKGROUND: A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for the development of more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the derived m/z values of fragment ions, and the knowledge provided by the intensity information present in MS/MS spectra has not been fully exploited. Indeed, spectrum intensity information is very rarely utilized in the algorithms currently in use for high-throughput protein identification.
Year: 2008 PMID: 18664292 PMCID: PMC2529326 DOI: 10.1186/1471-2105-9-325
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Features that potentially influence peptide fragmentation.
| No. | Feature | Abbreviation |
| 1 | Identity of residue C-terminal to fragmentation site | RB_C |
| 2 | Identity of residue N-terminal to fragmentation site | RB_N |
| 3 | Distance from fragmentation site to peptide N-terminus | DB_N |
| 4 | Distance from fragmentation site to peptide C-terminus | DB_C |
| 5 | Distance from fragmentation site to peptide center | DB_M |
| 6 | Whether fragmentation site is at either end of peptide | B_E |
| 7 | Basicity of residue N-terminal to fragmentation site | BaRB_N |
| 8 | Basicity of residue C-terminal to fragmentation site | BaRB_C |
| 9 | Average basicity of residues N/C terminal to fragmentation site | BaRB_A |
| 10 | Difference in basicity of residues N/C terminal to fragmentation site | BaRB_D |
| 11 | Basicity of fragmented y-ion | BaYI |
| 12 | Basicity of whole peptide | BaP |
| 13 | Helicity of residue N-terminal to fragmentation site | HeRB_N |
| 14 | Helicity of residue C-terminal to fragmentation site | HeRB_C |
| 15 | Average helicity of residues N/C terminal to fragmentation site | HeRB_A |
| 16 | Difference in helicity of residues N/C terminal to fragmentation site | HeRB_D |
| 17 | Hydrophobicity of residue N-terminal to fragmentation site | HyRB_N |
| 18 | Hydrophobicity of residue C-terminal to fragmentation site | HyRB_C |
| 19 | Average hydrophobicity of residues N/C terminal to fragmentation site | HyRB_A |
| 20 | Difference in hydrophobicity of residues N/C terminal to fragmentation site | HyRB_D |
| 21 | Hydrophobicity of fragmented y-ion | HyYI |
| 22 | Hydrophobicity of whole peptide | HyP |
| 23 | pI value of residue N-terminal to fragmentation site | PIRB_N |
| 24 | pI value of residue C-terminal to fragmentation site | PIRB_C |
| 25 | Average pI of residues N/C terminal to fragmentation site | PIRB_A |
| 26 | Difference in pI of residues N/C terminal to fragmentation site | PIRB_D |
| 27 | Length of whole peptide | LP |
| 28 | Length of fragmented y-ion | LYI |
| 29 | Ratio of length of fragmented y-ion and peptide | RLIP |
| 30 | Number of basic residues in whole peptide | NBaR_P |
| 31 | Number of basic residues in fragmented y-ion | NBaR_YI |
| 32 | Mass of whole peptide | MP |
| 33 | Mass of fragmented y-ion | MYI |
| 34 | Ratio of mass of fragmented y-ion and peptide | RMIP |
| 35 | Distance from fragmentation site to basic residues | DBBa |
All features listed above are hypothesized to influence the gas-phase fragmentation of peptides. Their relevance is examined further with the Bayesian neural network model.
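To make the position and length features concrete, the following minimal sketch computes a handful of them for one cleavage site. The exact definitions are assumptions, not taken from the paper: distances are counted in residues, the "center" is taken as half the peptide length, and Lys, Arg and His are treated as the basic residues. The function name `positional_features` is hypothetical.

```python
def positional_features(peptide: str, site: int) -> dict:
    """Position/length features for cleavage of the bond between
    peptide[site - 1] and peptide[site] (1 <= site <= len(peptide) - 1).

    All definitions here are illustrative assumptions.
    """
    n = len(peptide)
    if not 1 <= site <= n - 1:
        raise ValueError("site must index an internal peptide bond")
    basic = set("KRH")       # assumption: Lys, Arg, His count as basic
    y_ion = peptide[site:]   # C-terminal fragment produced by the cleavage
    return {
        "RB_N": peptide[site - 1],          # residue N-terminal to the site
        "RB_C": peptide[site],              # residue C-terminal to the site
        "DB_N": site,                       # residues between site and N-terminus
        "DB_C": n - site,                   # residues between site and C-terminus
        "DB_M": abs(site - n / 2),          # distance to the peptide center
        "B_E": site == 1 or site == n - 1,  # site at either end of the peptide
        "LP": n,
        "LYI": len(y_ion),
        "RLIP": len(y_ion) / n,
        "NBaR_P": sum(r in basic for r in peptide),
        "NBaR_YI": sum(r in basic for r in y_ion),
    }


features = positional_features("GYSFVTTAER", 4)  # cleavage between F and V
```

Under these assumptions, the F|V bond of GYSFVTTAER yields a six-residue y-ion (VTTAER) that contains the peptide's only basic residue.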
Values of amino acid properties used in the study.
| Residue | Mass | Hydrophobicity | Helicity | Basicity | pI |
| Ala (A) | 71.0788 | 0.16 | 1.24 | 206.4 | 6 |
| Cys (C) | 103.1388 | 2.5 | 0.79 | 206.2 | 5.02 |
| Asp (D) | 115.0886 | -2.49 | 0.89 | 208.6 | 2.77 |
| Glu (E) | 129.1155 | -1.5 | 0.85 | 215.6 | 3.22 |
| Phe (F) | 147.1766 | 5 | 1.26 | 212.1 | 5.48 |
| Gly (G) | 57.0519 | -3.31 | 1.15 | 202.7 | 5.97 |
| His (H) | 137.1411 | -4.63 | 0.97 | 223.7 | 7.47 |
| Ile (I) | 113.1594 | 4.76 | 1.28 | 209.6 | 5.94 |
| Lys (K) | 128.1741 | -5 | 0.88 | 221.8 | 9.59 |
| Leu (L) | 113.1594 | 4.76 | 1.28 | 209.6 | 5.98 |
| Met (M) | 131.1926 | 3.23 | 1.22 | 213.3 | 5.74 |
| Asn (N) | 114.1038 | -3.79 | 0.94 | 212.8 | 5.41 |
| Pro (P) | 97.1167 | -4.92 | 0.57 | 214.4 | 6.3 |
| Gln (Q) | 128.1307 | -2.76 | 0.96 | 214.2 | 5.65 |
| Arg (R) | 156.1875 | -2.77 | 0.95 | 237 | 11.15 |
| Ser (S) | 87.0782 | -2.85 | 1 | 207.6 | 5.68 |
| Thr (T) | 101.1051 | -1.08 | 1.09 | 211.7 | 5.64 |
| Val (V) | 99.1326 | 3.02 | 1.27 | 208.7 | 5.96 |
| Trp (W) | 186.2132 | 4.88 | 1.07 | 216.1 | 5.89 |
| Tyr (Y) | 163.176 | 2 | 1.11 | 213.1 | 5.66 |
The table lists the amino acid property values used in the study. Values for all the features listed in Table 1 are calculated from these property values during network training. The values for mass, hydrophobicity, helicity and basicity are cited from [19] and the values for pI are cited from
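As an illustration of how the property values feed into the Table 1 features, the sketch below derives the "N-side, C-side, average, difference" group for one property around a cleavage site, plus a plain sum of residue masses. Several points are assumptions: the residue labels are inferred from the standard average residue masses, the difference is taken as N-side minus C-side, and the mass sum omits terminal groups because the paper's exact definition of MP and MYI is not given here. Only the residues of the example peptide GYSFVTTAER are included; the names `PROPS`, `flank_features` and `residue_mass_sum` are hypothetical.

```python
# Values copied from the property table above. The residue labels are an
# assumption inferred from the standard average residue masses.
# Tuple order: mass, hydrophobicity, helicity, basicity, pI
PROPS = {
    "A": (71.0788, 0.16, 1.24, 206.4, 6.0),
    "E": (129.1155, -1.5, 0.85, 215.6, 3.22),
    "F": (147.1766, 5.0, 1.26, 212.1, 5.48),
    "G": (57.0519, -3.31, 1.15, 202.7, 5.97),
    "R": (156.1875, -2.77, 0.95, 237.0, 11.15),
    "S": (87.0782, -2.85, 1.0, 207.6, 5.68),
    "T": (101.1051, -1.08, 1.09, 211.7, 5.64),
    "V": (99.1326, 3.02, 1.27, 208.7, 5.96),
    "Y": (163.176, 2.0, 1.11, 213.1, 5.66),
}
COLUMN = {"mass": 0, "hydrophobicity": 1, "helicity": 2, "basicity": 3, "pI": 4}


def flank_features(peptide: str, site: int, prop: str) -> dict:
    """N-side, C-side, average and difference of one property around the
    bond between peptide[site - 1] and peptide[site]."""
    col = COLUMN[prop]
    n_val = PROPS[peptide[site - 1]][col]
    c_val = PROPS[peptide[site]][col]
    return {
        prop + "_N": n_val,
        prop + "_C": c_val,
        prop + "_A": (n_val + c_val) / 2,
        prop + "_D": n_val - c_val,  # sign convention is an assumption
    }


def residue_mass_sum(sequence: str) -> float:
    """Sum of residue masses only; terminal groups are deliberately omitted."""
    return sum(PROPS[r][COLUMN["mass"]] for r in sequence)


pi_features = flank_features("GYSFVTTAER", 4, "pI")  # bond between F and V
y_ion_mass = residue_mass_sum("GYSFVTTAER"[4:])      # residues V, T, T, A, E, R
```

The same helper covers the basicity, helicity and hydrophobicity groups by changing the `prop` argument.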
Figure 1. Structure of the Bayesian neural network used to explore the mechanism of gas-phase fragmentation of peptides. The network is fully connected and feed-forward, with three layers including one hidden layer. The input layer uses 73 nodes to represent the 35 features: 40 binary nodes encode the presence of each of the 20 residues on the N- and C-terminal sides of the target peptide bond, and the remaining 33 nodes carry the other 33 features. Every node in the input layer has an independent coefficient that reveals its "relevance" to the network output. The hidden layer has 40 nodes with a sigmoidal activation function.
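The node count in Figure 1 can be checked with a small sketch: two one-hot residue encodings (2 x 20 binary nodes) plus 33 numeric features give 73 inputs, which feed 40 sigmoidal hidden units. The ordering of the input nodes, the single linear output unit, and the random weights are assumptions made for illustration; the Bayesian treatment of the weights and the per-input relevance coefficients are not modeled here.

```python
import math
import random

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue: str) -> list:
    """20 binary nodes marking which residue is present."""
    return [1.0 if residue == r else 0.0 for r in RESIDUES]


def encode_input(rb_n: str, rb_c: str, numeric_features: list) -> list:
    """Build the 73-node input: 20 + 20 binary nodes plus 33 numeric ones.

    The node ordering is an assumption.
    """
    if len(numeric_features) != 33:
        raise ValueError("expected 33 numeric features")
    return one_hot(rb_n) + one_hot(rb_c) + list(numeric_features)


def random_network(n_in: int = 73, n_hidden: int = 40, seed: int = 0):
    """Small random weights, standing in for trained parameters."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0


def forward(x, w_hidden, b_hidden, w_out, b_out) -> float:
    """One hidden layer of sigmoidal units, then a linear output (assumed)."""
    hidden = []
    for row, b in zip(w_hidden, b_hidden):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


x = encode_input("F", "V", [0.0] * 33)
prediction = forward(x, *random_network())
```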
Figure 2. Verification of the features that potentially influence peptide fragmentation. The importance of the features listed in Table 1 is evaluated by the Bayesian neural network. Red circles: normalized irrelevance scores of the features under non-mobile status. Blue squares: normalized irrelevance scores of the features under partial-mobile status. Green triangles: normalized irrelevance scores of the features under mobile status. The higher the irrelevance score, the less important the corresponding feature. The threshold for each mobility status is shown as a dashed line, and the features found to influence peptide fragmentation (below threshold) are highlighted with filled circles/squares/triangles.
Figure 3. Influence of each residue on fragmentation at its N/C-terminal peptide bond. The influence of each residue on cleavage at its N-terminus is illustrated in the left panel (blue dots), and the influence on cleavage at its C-terminus is illustrated in the right panel (red dots). The most influential residues are marked with arrows. Down arrows indicate inhibition whereas up arrows indicate enhancement. Figure 3-A: Mobile status. Figure 3-B: Partial-mobile status. Figure 3-C: Non-mobile status.
Figure 4. Reduction of training errors in the feature selection phase. Features are removed according to their relevance to the fragmentation process (Figure 2). The X-axis represents the number of features removed and the Y-axis represents the average training error, in percent, over 100 training runs. The training error increases significantly when the 23 least relevant features are removed, as indicated by the red arrow, which suggests that at most 22 features can be eliminated.
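The selection rule described for Figure 4 can be sketched as a scan over the error curve: keep removing features until the error rises by more than some tolerance. The criterion for a "significant" increase and the error values below are hypothetical, chosen only to reproduce the shape described in the caption; they are not the paper's data. The function name `max_removable` is also hypothetical.

```python
def max_removable(errors: list, jump: float) -> int:
    """Largest number of features that can be removed before the error rises
    by more than `jump` in a single step.

    errors[k] is the average training error (in percent) with the k least
    relevant features removed. The step-wise criterion is an assumption.
    """
    for k in range(1, len(errors)):
        if errors[k] - errors[k - 1] > jump:
            return k - 1
    return len(errors) - 1


# Hypothetical curve: flat while 0 to 22 features are removed, then a jump.
hypothetical_errors = [10.0] * 23 + [14.0, 15.0]
removable = max_removable(hypothetical_errors, jump=1.0)
```

With this hypothetical curve the jump occurs at 23 removed features, so the function returns 22.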
Figure 5. Predicting the spectrum intensity pattern of peptide GYSFVTTAER. Figure 5-A: The raw MS/MS data of peptide GYSFVTTAER. Unlabeled green peaks are ions derived from the labeled b/y ions by loss of H2O, NH3, etc. Figure 5-B: Comparison of the experimental spectrum (red) with the spectrum predicted by the network model (blue). The experimental spectrum consists of the y-ions extracted from the raw data (Figure 5-A), with intensities log-transformed. Figure 5-C: The effect of using probability theory. Blue dots indicate the interval [mean intensity - SD, mean intensity + SD] within which the ion intensities are expected to lie.
Figure 6. Predicting the spectrum intensity pattern of peptide VLYPNDNFFEGK. Figure 6-A: The raw MS/MS data of peptide VLYPNDNFFEGK. Unlabeled green peaks are ions derived from the labeled b/y ions by loss of H2O, NH3, etc. Figure 6-B: Comparison of the experimental spectrum (red) with the spectrum predicted by the network model (blue). The experimental spectrum consists of the y-ions extracted from the raw data (Figure 6-A), with intensities log-transformed. Figure 6-C: The effect of using probability theory. Blue dots indicate the interval [mean intensity - SD, mean intensity + SD] within which the ion intensities are expected to lie.
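The interval shown in panels 5-C and 6-C amounts to a per-ion check of whether the log-transformed experimental intensity falls within one predicted standard deviation of the predicted mean. The sketch below illustrates that check. The natural-log transform and all numeric values are assumptions for illustration; the paper's exact transform and normalization are not given here.

```python
import math


def log_intensities(raw: list) -> list:
    """Log-transform raw peak intensities (natural log is an assumption)."""
    return [math.log(v) for v in raw]


def within_one_sd(observed: list, mean: list, sd: list) -> list:
    """For each ion, report whether the observed intensity lies in
    [mean - SD, mean + SD]."""
    return [m - s <= o <= m + s for o, m, s in zip(observed, mean, sd)]


# Hypothetical values for two y-ions, not taken from the paper.
observed = [1.0, 3.0]
predicted_mean = [1.2, 2.0]
predicted_sd = [0.5, 0.5]
inside = within_one_sd(observed, predicted_mean, predicted_sd)
```

Here the first ion lies inside its interval [0.7, 1.7] and the second lies outside [1.5, 2.5].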
Figure 7. Comparison of scores with/without intensity information on all test peptides. Similarity scores are computed using Eq. 1. The higher the score, the more similar the predicted spectrum is to its experimental counterpart. Figure 7-A: The red line represents the sorted scores calculated with the predicted intensity information. The blue line represents the corresponding scores calculated without intensity information. The two scores use the same variances predicted by the Bayesian neural network model. Figure 7-B: The red line represents the sorted scores calculated with the predicted intensity information. The blue line represents the corresponding scores calculated without intensity information. Variances for intensity-free scores are set to 1.
Prediction error of the kinetic model and the Bayesian intensity model.
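| Model | Accuracy of predictions | Mean of prediction errors | SD of prediction errors |
| Kinetic model | 710 | 0.4294 | 0.1769 |
| Bayesian intensity model | 897 | 0.4094 | 0.1650 |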
The intensity patterns of the 1607 test data were predicted with Zhang's kinetic model and with our Bayesian intensity model. The table lists the accuracy of the predictions and the mean and SD of the prediction errors.