William S Sanders, Susan M Bridges, Fiona M McCarthy, Bindu Nanduri, Shane C Burgess.
Abstract
BACKGROUND: When proteins are subjected to proteolytic digestion and analyzed by mass spectrometry using a method such as 2D LC MS/MS, only a portion of the proteotypic peptides associated with each protein will be observed. The ability to predict which peptides can and cannot potentially be observed for a particular experimental dataset has several important applications in proteomics research, including calculation of peptide coverage in terms of potentially detectable peptides, systems biology analysis of data sets, and protein quantification.
Year: 2007 PMID: 18047723 PMCID: PMC2099492 DOI: 10.1186/1471-2105-8-S7-S23
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Classifier construction process.
A list of initial features used for classifier construction in addition to AAIndex features.
| Length of peptide |
| Net charge of peptide |
| Positive charge |
| Negative charge |
| Isoelectric point |
| Molecular weight |
| Hydropathicity |
| Count of each amino acid (20 features) |
| Percent composition of each amino acid (20 features) |
| Percent polar amino acids |
| Percent positive amino acids |
| Percent negative amino acids |
| Percent hydrophobic amino acids |
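As a rough illustration of how several of the sequence-derived features listed above could be computed, the following Python sketch derives length, charge counts, amino acid counts, percent composition, and group percentages for a single peptide. The function name `peptide_features`, the residue groupings (K/R as positive, D/E as negative, and the polar and hydrophobic sets), and the use of the mean Kyte-Doolittle value as the hydropathicity score are assumptions made for this example; the source does not state which definitions the authors used. Isoelectric point and molecular weight are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed residue groupings; the source does not specify them.
POSITIVE = set("KR")
NEGATIVE = set("DE")
POLAR = set("STNQYC")
HYDROPHOBIC = set("AVILMFW")

# Kyte-Doolittle hydropathy values, used here as an assumed hydropathicity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(seq):
    """Return a dict of simple sequence-derived features for one peptide."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("peptide must be a non-empty string of standard residues")
    n = len(seq)
    counts = Counter(seq)  # missing residues count as 0

    def pct(group):
        # Percentage of residues in the peptide that belong to `group`.
        return 100.0 * sum(counts[aa] for aa in group) / n

    positive = sum(counts[aa] for aa in POSITIVE)
    negative = sum(counts[aa] for aa in NEGATIVE)
    features = {
        "length": n,
        "positive_charge": positive,
        "negative_charge": negative,
        "net_charge": positive - negative,  # crude integer estimate
        "hydropathicity": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        "pct_polar": pct(POLAR),
        "pct_positive": pct(POSITIVE),
        "pct_negative": pct(NEGATIVE),
        "pct_hydrophobic": pct(HYDROPHOBIC),
    }
    # Count and percent composition of each amino acid (20 features each).
    for aa in AMINO_ACIDS:
        features[f"count_{aa}"] = counts[aa]
        features[f"pct_{aa}"] = 100.0 * counts[aa] / n
    return features


features = peptide_features("ACDK")
print(features["length"], features["net_charge"], features["pct_A"])
```

A dictionary keyed by feature name keeps the sketch easy to extend, for example with the AAIndex-derived features, before converting to a numeric matrix for a classifier.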
A feature selection procedure is used to reduce dimensionality prior to classifier construction.
Description of features selected for the classifiers built for the two datasets.
| Avian Bursa Dataset |
| Number of prolines |
| Percent glycine |
| Percent alanine |
| Percent leucine |
| Percent polar amino acids |
| Percent hydrophobic amino acids |
| Percent positive amino acids |
| Percent negative amino acids |
| Size (Dawson, 1972) |
| Optimized transfer energy parameter (Oobatake et al., 1985) |
| Weights for beta-sheet at the window position of 5 (Qian-Sejnowski, 1988) |
| Transfer free energy from oct to wat (Radzicka-Wolfenden, 1988) |
| Information measure for C-terminal turn (Robson-Suzuki, 1976) |
| Amphiphilicity index (Mitaku et al., 2002) |
| Hodgkin's Lymphoma Model Dataset |
| Number of cysteines |
| Signal sequence helical potential (Argos et al., 1982) |
| Transfer free energy to surface (Bull-Breese, 1974) |
| Normalized relative frequency of alpha-helix (Isogai et al., 1980) |
| Normalized relative frequency of double bend (Isogai et al., 1980) |
| Distance between C-alpha and centroid of side chain (Levitt, 1976) |
| Retention coefficient in NaH2PO4 (Meek-Rossetti, 1981) |
| Interior composition of amino acids in intracellular proteins (Fukuchi-Nishikawa, 2001) |
| Linker propensity from 1-linker dataset (George-Heringa, 2003) |
10-fold cross-validation accuracy by class for neural networks generated for two datasets.
| Class | True positive rate | False positive rate | Precision | Recall | ROC Area |
| Avian Bursa Dataset | |||||
| Not observed | 0.80 | 0.19 | 0.81 | 0.80 | 0.87 |
| Observed | 0.82 | 0.20 | 0.80 | 0.82 | 0.87 |
| Hodgkin's Lymphoma Model Dataset | |||||
| Not observed | 0.66 | 0.22 | 0.75 | 0.66 | 0.80 |
| Observed | 0.78 | 0.34 | 0.70 | 0.78 | 0.80 |
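The columns in the table above follow the usual per-class definitions, with each class treated in turn as the positive class. The sketch below spells out those definitions for a two-class problem. The function name `per_class_metrics` and the counts in the usage lines are hypothetical and are not taken from the study's data; ROC area is not computed here because it requires the classifier's scores rather than a single confusion matrix.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def per_class_metrics(tp, fp, fn, tn):
    """Per-class rates from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    """
    true_positive_rate = safe_div(tp, tp + fn)   # identical to recall
    false_positive_rate = safe_div(fp, fp + tn)
    precision = safe_div(tp, tp + fp)
    return {
        "true_positive_rate": true_positive_rate,
        "false_positive_rate": false_positive_rate,
        "precision": precision,
        "recall": true_positive_rate,
    }


# Hypothetical counts for 100 "observed" and 100 "not observed" peptides.
observed = per_class_metrics(tp=82, fp=20, fn=18, tn=80)
# Swapping the roles of the two classes gives the "not observed" row.
not_observed = per_class_metrics(tp=80, fp=18, fn=20, tn=82)

for name, metrics in (("Observed", observed), ("Not observed", not_observed)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Because true positive rate and recall share one definition, the two columns should agree within each row; the false positive rate of one class equals one minus the true positive rate of the other.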
Accuracy by class for neural networks generated using one dataset as the training set and the other dataset for test data.
| Class | True positive rate | False positive rate | Precision | Recall | ROC Area |
| Avian Bursa Dataset training set, Hodgkin's Lymphoma Model Dataset test set | |||||
| Not observed | 0.71 | 0.46 | 0.61 | 0.71 | 0.70 |
| Observed | 0.54 | 0.29 | 0.66 | 0.54 | 0.70 |
| Hodgkin's Lymphoma Model Dataset training set, Avian Bursa test set | |||||
| Not observed | 0.81 | 0.41 | 0.81 | 0.73 | 0.73 |
| Observed | 0.59 | 0.19 | 0.59 | 0.66 | 0.73 |
Number of tryptic peptides predicted to be observable for selected proteins from the two data sets.
| Protein GI Number | Number of tryptic peptides (>= 6 aa) | Number of tryptic peptides observed | Percent amino acid coverage | Number predicted detectable | Percent predicted detectable | Percent amino acid coverage of detectable |
| Avian bursa data set | ||||||
| 5902793 | 20 | 2 | 10 | 9 | 45 | 33 |
| 119359 | 50 | 5 | 9 | 15 | 30 | 21 |
| 128413 | 16 | 2 | 11 | 3 | 18 | 14 |
| 2119012 | 7 | 2 | 28 | 3 | 43 | 17 |
| 17025728 | 16 | 2 | 6 | 7 | 44 | 20 |
| 122000 | 6 | 4 | 33 | 0 | 0 | 0 |
| 1762374 | 7 | 1 | 23 | 2 | 29 | 21 |
| 1172808 | 13 | 1 | 6 | 4 | 30 | 19 |
| 7512219 | 44 | 1 | 2 | 11 | 25 | 34 |
| 104697 | 9 | 2 | 22 | 4 | 44 | 30 |
| 118106991 | 12 | 0 | 0 | 0 | 0 | 0 |
| Hodgkin's lymphoma model data set | ||||||
| 479367 | 34 | 1 | 3 | 5 | 15 | 11 |
| 729629 | 18 | 2 | 14 | 11 | 61 | 43 |
| 899264 | 13 | 1 | 10 | 4 | 31 | 21 |
| 63544 | 48 | 2 | 2 | 6 | 13 | 15 |
| 50750413 | 38 | 3 | 11 | 8 | 21 | 25 |
| 45433516 | 26 | 0 | 0 | 0 | 0 | 0 |
| 46048702 | 14 | 0 | 0 | 0 | 0 | 0 |
| 125745137 | 9 | 0 | 0 | 0 | 0 | 0 |
| 125745114 | 9 | 0 | 0 | 0 | 0 | 0 |
| 45433516 | 26 | 0 | 0 | 0 | 0 | 0 |
1) For the avian bursa dataset, 10 randomly selected observed proteins and the DR3 protein that was expected but not observed. 2) For the Hodgkin's lymphoma model dataset, 5 proteins that were observed in the pathway under consideration and 5 that had been observed using other methods in previous experiments but not observed in this dataset.
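The quantities in the table above can be sketched as follows: an in silico tryptic digest yields the candidate peptides of at least 6 residues, and the percentage columns are simple ratios. The cleavage rule used here (cut after K or R unless the next residue is P, with no missed cleavages), the function names, and the example sequence are assumptions for illustration; the source does not state the digestion rules that were applied. The `percent(9, 20)` call uses the counts from the first avian bursa row (GI 5902793), and the result appears consistent with the 45 reported there.

```python
def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest with no missed cleavages.

    Assumed rule: cleave after K or R unless the next residue is P.
    Peptides shorter than min_len are discarded.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_len]


def percent(part, whole):
    """Percentage of `whole` represented by `part`; 0.0 if `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def percent_coverage(peptides, protein):
    """Percent of residues covered, assuming the peptides do not overlap."""
    return percent(sum(len(p) for p in peptides), len(protein))


# Hypothetical sequence, not a protein from either dataset.
sequence = "AAAAAAKPGGGGRCCCCCCKDD"
candidates = tryptic_peptides(sequence)
print(candidates)
print(percent(9, 20))  # 9 predicted detectable out of 20 tryptic peptides
print(round(percent_coverage(candidates, sequence), 1))
```

Reporting coverage relative to the peptides predicted to be detectable, rather than all tryptic peptides, is what separates the last column of the table from the plain percent amino acid coverage.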