Sean McIlwain, David Page, Edward L Huttlin, Michael R Sussman.
Abstract
MOTIVATION: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic labeling experiments can be complicated because the spacing between labeled and unlabeled forms of each peptide depends on its sequence, and is thus variable from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations it would be necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural abundance light isotopic distributions and their metabolically labeled counterparts. This algorithm is designed in two stages: first we annotate the isotopic peaks using a modified version of the IDM algorithm described last year; then we use a probabilistic classifier that is supplemented by dynamic programming to find the metabolically labeled matched isotopic pairs. Such a method is needed for high-throughput quantitative proteomic and metabolomic experiments measured via mass spectrometry.
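To make the sequence dependence of the light/heavy spacing concrete, here is a minimal sketch. It assumes complete 15N metabolic labeling of unmodified peptides, which the abstract does not specify; the helper names `nitrogen_count` and `light_heavy_spacing` are hypothetical and are not part of the published algorithm. Under that assumption the spacing is the number of nitrogen atoms in the peptide times the 15N/14N mass difference, divided by the charge.

```python
# Illustrative sketch only: assumes complete 15N labeling of unmodified peptides.
# The abstract does not name the label; helper names here are hypothetical.

# Mass difference between 15N and 14N in daltons (standard isotope masses).
N15_MINUS_N14 = 15.0001089 - 14.0030740

# Nitrogen atoms per amino acid residue (backbone nitrogen plus side-chain nitrogens).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3, "F": 1, "R": 4,
    "Y": 1, "W": 2,
}


def nitrogen_count(sequence: str) -> int:
    """Count nitrogen atoms in an unmodified peptide given as one-letter codes."""
    return sum(NITROGENS[aa] for aa in sequence.upper())


def light_heavy_spacing(sequence: str, charge: int = 1) -> float:
    """Return the m/z spacing between the light and fully 15N-labeled forms."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return nitrogen_count(sequence) * N15_MINUS_N14 / charge


if __name__ == "__main__":
    # Two peptides of equal length but different nitrogen content.
    for peptide in ("PEPTIDE", "SAMPLER"):
        print(peptide, nitrogen_count(peptide), light_heavy_spacing(peptide, charge=2))
```

Both example peptides have seven residues, yet they contain 7 and 10 nitrogen atoms respectively, so their light/heavy spacings differ. This is why a matcher that does not know the sequence cannot rely on a single fixed offset.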
Year: 2008 PMID: 18586733 PMCID: PMC2718665 DOI: 10.1093/bioinformatics/btn190
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1.Example naïve Bayes model for estimating isotopic match distribution probabilities.
Fig. 2.Example of annotated mass spectrum. The red color corresponds to noise peaks. The distributions of the same color are the light–heavy matched pair.
Number of states and equations for each look-back depth used in the dynamic programming algorithm
| Look back | States | Equations |
|---|---|---|
| 1 | 3 | 5 |
| 2 | 11 | 21 |
| 3 | 49 | 105 |
| 4 | 257 | 599 |
| 5 | 1539 | 3831 |
Statistical results of classifier and dynamic programming algorithms using expert-annotated isotopes (see Section 3 for the distinction between the scores)
| | Recall | Precision | F1 | FPR |
|---|---|---|---|---|
| Absolute match | | | | |
| Classifier | 86±6 | 83±8 | 84±6 | 0.7±0.3 |
| 3-Step look back | 99±2 | 83±7 | 90±4 | 0.7±0.3 |
| Monoisotopic match | | | | |
| Classifier | 86±6 | 83±8 | 84±6 | 0.7±0.3 |
| 3-Step look back | 99±2 | 83±7 | 90±4 | 0.7±0.3 |
| Monoisotopic fine match | | | | |
| Classifier | 87±5 | 88±7 | 87±5 | 38±19 |
| 3-Step look back | 99±1 | 86±6 | 92±4 | 49±14 |
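As a reading aid for the Recall, Precision, F1 and FPR columns, the sketch below computes the four scores from confusion-matrix counts. It assumes the standard definitions; which matches count as positives and negatives for each score type (absolute, monoisotopic, monoisotopic fine) is defined in the paper's Section 3 and is not reproduced here. The `Counts` class and the numbers in the usage example are made up for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one evaluation (illustrative container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def recall(c: Counts) -> float:
    """Fraction of true items that were found: TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: Counts) -> float:
    """Fraction of reported items that are correct: TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: Counts) -> float:
    """Fraction of true negatives wrongly reported: FP / (FP + TN)."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up counts, purely to exercise the functions.
    example = Counts(tp=8, fp=2, fn=2, tn=88)
    for name, fn_ in (("recall", recall), ("precision", precision),
                      ("F1", f1), ("FPR", false_positive_rate)):
        print(f"{name}: {100 * fn_(example):.1f}%")
```

Because F1 is a harmonic mean, it stays close to the lower of precision and recall, which is consistent with how the F1 column tracks the other two in the tables.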
Statistical results of classifier and dynamic programming algorithms using machine-selected peaks (see Section 3 for the distinction between the scores)
| | Recall | Precision | F1 | FPR |
|---|---|---|---|---|
| Absolute match | | | | |
| Classifier | 35±4 | 58±8 | 43±5 | 1±0.4 |
| 3-Step look back | 36±6 | 59±9 | 44±6 | 1±0.4 |
| Monoisotopic match | | | | |
| Classifier | 43±4 | 72±10 | 54±5 | 0.7±0.4 |
| 3-Step look back | 45±6 | 74±10 | 55±6 | 0.7±0.4 |
| Monoisotopic fine match | | | | |
| Classifier | 43±5 | 77±8 | 55±5 | 6±2 |
| 3-Step look back | 44±6 | 78±8 | 56±6 | 6±2 |
Fig. 3.F1-scores using expert-annotated isotopes. DP-LB-X stands for the dynamic programming algorithm with look back of X.
Fig. 5.F1-scores using machine-selected peaks.
Statistical results of isotope annotation algorithm using expert- and machine-selected peaks (see McIlwain et al., 2007 and Section 3.1 for the distinction between the scores)
| | Recall | Precision | F1 | FPR |
|---|---|---|---|---|
| Absolute isotope | | | | |
| Expert-selected peaks | 91±3 | 98±2 | 94±2 | 0.4±0.4 |
| Machine-selected peaks | 59±5 | 87±3 | 70±6 | 2±0.4 |
| Monoisotopic isotope | | | | |
| Expert-selected peaks | 99±1 | 95±3 | 97±1 | 2±1 |
| Machine-selected peaks | 76±6 | 72±8 | 73±6 | 3±0.4 |
| Monoisotopic fine isotope | | | | |
| Expert-selected peaks | 97±2 | 100±0 | 98±1 | 0±0 |
| Machine-selected peaks | 78±4 | 75±7 | 76±5 | 18±3 |
Statistical results of regressor and dynamic programming algorithms using machine-selected peaks
| | Recall | Precision | F1 | FPR |
|---|---|---|---|---|
| Absolute match | | | | |
| Unweighted recall | 43±5 | 49±6 | 45±4 | 2±1 |
| Weighted recall | 50±5 | 48±7 | 48±4 | 2±1 |
| Monoisotopic match | | | | |
| Unweighted recall | 59±6 | 67±8 | 62±5 | 1±0.4 |
| Weighted recall | 65±6 | 63±9 | 63±5 | 2±1 |
| Monoisotopic fine match | | | | |
| Unweighted recall | 73±5 | 80±7 | 76±4 | 8±2 |
| Weighted recall | 79±5 | 75±8 | 76±4 | 12±3 |
Fig. 4.F1-scores using expert-selected peaks.
Statistical results of classifier and dynamic programming algorithms using expert-selected peaks (see Section 3 for the distinction between the scores)
| | Recall | Precision | F1 | FPR |
|---|---|---|---|---|
| Absolute match | | | | |
| Classifier | 70±6 | 67±9 | 68±7 | 1±0.3 |
| 3-Step look back | 77±5 | 68±9 | 72±7 | 1±0.3 |
| Monoisotopic match | | | | |
| Classifier | 81±5 | 77±7 | 79±5 | 0.9±0.3 |
| 3-Step look back | 91±3 | 81±8 | 86±5 | 0.8±0.3 |
| Monoisotopic fine match | | | | |
| Classifier | 81±4 | 86±7 | 83±3 | 40±17 |
| 3-Step look back | 89±2 | 86±7 | 87±4 | 44±15 |