Clement Chung, Jian Liu, Andrew Emili, Brendan J Frey.
Abstract
MOTIVATION: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Consequently, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called 'blind') PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.
Year: 2011 PMID: 21258065 PMCID: PMC3051323 DOI: 10.1093/bioinformatics/btr017
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Histograms of inputs to our algorithm [generated by SIMS (Liu )] for spectra previously determined to be mapped to phosphopeptides (Beausoleil ). They show that the statistics for modification mass and modified amino acid deviate from the reference, which determined that the PTM (phosphorylation) occurs at ∼80 Da and on serine (S) and threonine (T). (A) The distribution of the measured modification mass. (B) Identified amino acids that deviate from S and T. (C) The distribution of the distance (in residues) from the identified amino acid to the reference for misplaced modifications; this demonstrates that identified modifications are generally only a few residues away from the reference.
Fig. 2. A Bayesian network describing our generative model, using plate notation (box). The shaded nodes represent observed variables, the unshaded nodes represent latent variables and the variables outside the plate are model parameters. The model describes how the observed modification mass and modification position are generated. Given the type of PTM (PTM group), we can generate the observed modification mass as a noisy version of the modification mass mean, and select an amino acid to be modified. Given the peptide sequence, we can choose a position along it that matches the modified amino acid as the ‘true’ modification position. We can generate the observed modification position as a noisy version of the ‘true’ modification position. The plate notation indicates there are N copies of the model, one for each input peptide.
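The generative story in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the PTMClust implementation: the `PTM_GROUPS` table, the function name `sample_observation`, the uniform choice among groups, the Gaussian mass noise (`mass_sd`) and the uniform position shift (`max_shift`) are all assumptions made for the example. Restricting the group choice to groups that can modify the given peptide is also a simplification.

```python
import random

# Hypothetical PTM groups: approximate mean modification mass (Da) and the
# amino acids each group may modify. Illustrative values, not fitted parameters.
PTM_GROUPS = {
    "phosphorylation": {"mass_mean": 80.0, "residues": ("S", "T")},
    "acetylation": {"mass_mean": 42.0, "residues": ("K",)},
}


def sample_observation(peptide, rng, mass_sd=0.5, max_shift=2):
    """Sample one (observed mass, observed position) pair for a peptide."""
    # Pick a PTM group that can modify this peptide (uniform choice assumed).
    candidates = [
        name for name, group in PTM_GROUPS.items()
        if any(aa in peptide for aa in group["residues"])
    ]
    if not candidates:
        raise ValueError("peptide has no residue that any PTM group can modify")
    group_name = rng.choice(candidates)
    group = PTM_GROUPS[group_name]

    # Observed mass: a noisy version of the group's mean modification mass.
    observed_mass = rng.gauss(group["mass_mean"], mass_sd)

    # Select the modified amino acid, then a matching position in the
    # peptide as the 'true' modification position (0-based from the N-terminus).
    amino_acid = rng.choice([aa for aa in group["residues"] if aa in peptide])
    true_pos = rng.choice([i for i, aa in enumerate(peptide) if aa == amino_acid])

    # Observed position: the 'true' position shifted by a few residues,
    # clipped so that it stays inside the peptide.
    shift = rng.randint(-max_shift, max_shift)
    observed_pos = min(max(true_pos + shift, 0), len(peptide) - 1)

    return {
        "group": group_name,
        "observed_mass": observed_mass,
        "true_pos": true_pos,
        "observed_pos": observed_pos,
    }
```

Calling `sample_observation("AKSTPEK", random.Random(0))` draws one synthetic observation; repeating the call N times corresponds to the N copies indicated by the plate.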
Fig. 3. Distribution of modification position error used by PTMClust. This empirical distribution was derived using yeast PTM data (Krogan ) analyzed with SIMS (Liu ). A positive (negative) modification position error indicates that the observed modification position is toward the C-terminus (N-terminus) of the expected modification position.
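The sign convention for the modification position error can be made concrete with a small helper. The function names below are hypothetical, and the sketch assumes positions are counted from the N-terminus, as peptide sequences are conventionally written.

```python
def position_error(observed_pos, expected_pos):
    """Signed error in residues, with positions counted from the N-terminus."""
    return observed_pos - expected_pos


def error_direction(error):
    """Describe where the observed position lies relative to the expected one."""
    if error > 0:
        return "C-terminus"
    if error < 0:
        return "N-terminus"
    return "exact"
```

For example, an observed position of 7 against an expected position of 5 gives an error of +2, meaning the observed site lies two residues toward the C-terminus.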
Fig. 4. A comparison of clustering algorithms on a synthetically generated dataset. It shows how each of the three methods, k-means clustering, a mixture of Gaussians (MOG) and PTMClust (our algorithm), performs as more sets of data points with different modifications are added (increasing complexity). Correction rate is a quality measure defined as the difference between the total true positives and the total false positives, divided by the total number of peptides in the sample; a higher correction rate indicates better performance. The result shows that PTMClust performs consistently well, while the other two algorithms exhibit a significant drop as the complexity of the dataset increases.
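The correction rate defined in this caption is simple to compute. The sketch below follows the stated definition directly; the function and argument names are chosen for illustration, and the numbers in the usage note are invented, not taken from the paper.

```python
def correction_rate(true_positives, false_positives, total_peptides):
    """(total true positives - total false positives) / total peptides."""
    if total_peptides <= 0:
        raise ValueError("total_peptides must be positive")
    return (true_positives - false_positives) / total_peptides
```

With 80 true positives and 10 false positives among 100 peptides, `correction_rate(80, 10, 100)` evaluates to 0.7. The measure becomes negative when false positives outnumber true positives.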
Results for SIMS, InsPecT, MODmap and PTMFinder with and without application of our method, PTMClust
| Method | No. of correct modification position matches (% improvement over base algorithm) | No. of misplaced modification position matches (% improvement over base algorithm) | Total correct peptide sequence matches |
|---|---|---|---|
| SIMS | 685 | 267 | 952 |
| SIMS with PTMClust | 791 (∼15%) | 161 (∼40%) | 952 |
| InsPecT | 621 | 239 | 860 |
| InsPecT with PTMClust | 712 (∼15%) | 148 (∼38%) | 860 |
| PTMFinder | 620 | 242 | 862 |
| PTMFinder with PTMClust | 711 (∼15%) | 151 (∼38%) | 862 |
| MODmap | 97 | 28 | 125 |
| MODmap with PTMClust | 108 (∼11%) | 17 (∼39%) | 125 |
A reference set of MS/MS spectra previously mapped to phosphopeptides (Beausoleil ) was analyzed by SIMS (Liu ), InsPecT (Tanner ; Tsur ), MODmap (Na and Paek, 2009), and InsPecT followed by PTMFinder (Tanner ), a PTM refinement method. Taking the reference peptide sequences and modifications as the truth, the table shows the number of correct peptide sequence matches, and the numbers of correct and misplaced modification positions before and after applying PTMClust (our algorithm) to the output of the four methods. PTMClust corrected roughly 38 to 40% of the modification position errors made by each method, and the improvement is consistent across methods. Furthermore, PTMClust corrects errors that PTMFinder missed, significantly outperforming it in terms of refining PTMs.
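The percentages in the table can be checked from the raw counts. The sketch below assumes that "% improvement" means the change relative to the base algorithm's own count, which is not stated explicitly in the caption but reproduces the rounded values shown; the `RESULTS` layout and function names are illustrative.

```python
# (before PTMClust, after PTMClust) counts copied from the table above.
RESULTS = {
    "SIMS": {"correct": (685, 791), "misplaced": (267, 161)},
    "InsPecT": {"correct": (621, 712), "misplaced": (239, 148)},
    "PTMFinder": {"correct": (620, 711), "misplaced": (242, 151)},
    "MODmap": {"correct": (97, 108), "misplaced": (28, 17)},
}


def gain_in_correct(before, after):
    """Percent increase in correct modification position matches."""
    return (after - before) / before * 100


def drop_in_misplaced(before, after):
    """Percent decrease in misplaced modification position matches."""
    return (before - after) / before * 100


def summarize(results):
    """Return {method: (rounded % gain in correct, rounded % drop in misplaced)}."""
    return {
        method: (
            round(gain_in_correct(*counts["correct"])),
            round(drop_in_misplaced(*counts["misplaced"])),
        )
        for method, counts in results.items()
    }
```

Because the total number of correct peptide sequence matches is fixed for each method, every position that PTMClust corrects moves one count from the misplaced column to the correct column, which is why both percentages derive from the same absolute difference.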
Summary of known modifications in the yeast proteome dataset
| PTM | PTMClust: known PTM sites (% improvement over SIMS) | PTMClust: peptides with known PTM sites (% improvement over SIMS) | SIMS: known PTM sites | SIMS: peptides with known PTM sites |
|---|---|---|---|---|
| Phosphorylation | 66 (∼8%) | 115 (∼15%) | 61 | 100 |
| Acetylation | 9 (∼13%) | 75 (∼4%) | 8 | 72 |
| Cysteine oxidation (Cysteine sulfinic acid) | 1 (∼0%) | 7 (∼17%) | 1 | 6 |
| Others | 5 (∼0%) | 35 (∼0%) | 5 | 35 |
| Total | 81 (∼8%) | 232 (∼9%) | 75 | 213 |
The known set of modifications was taken from UniProt (Release 2010_11). We matched the sets of modified peptides produced by SIMS and post-processed with PTMClust to the set of known yeast modification sites. The results show that PTMClust is able to identify and refine PTMs in a complex dataset.