Surya Gupta, Kenneth Verheggen, Jan Tavernier, Lennart Martens.
Abstract
Mass-spectrometry-based, high-throughput proteomics experiments produce large amounts of data. While typically acquired to answer specific biological questions, these data can also be reused in orthogonal ways to reveal new biological knowledge. Here we present a novel method for such orthogonal reuse of public proteomics data. Our method elucidates biological relationships between proteins based on the co-occurrence of these proteins across human experiments in the PRIDE database. The majority of the significantly co-occurring protein pairs detected by our method were successfully mapped to existing biological knowledge. The validity of the method is supported by the observation that extremely few pairs can be mapped to existing knowledge when the same set of proteins is associated at random. Moreover, using literature searches and the STRING database, we were able to derive meaningful biological associations for unannotated protein pairs detected by our method, further illustrating that as-yet unknown associations present highly interesting targets for follow-up analysis.
Keywords: computational analysis; mass spectrometry; pathways; protein co-occurrence; protein complex; protein-protein interaction; proteomics
Year: 2017 PMID: 28480704 PMCID: PMC5491052 DOI: 10.1021/acs.jproteome.6b01066
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. (a) Outline of the workflow to calculate and annotate protein pairs generated from MS-based proteomics experiments. (b) Identification is performed using a pipeline built from three existing tools, pride-asap, SearchGUI, and PeptideShaker, all automated on the Pladipus backend. (c) Identified proteins are then analyzed for co-occurrence using the Jaccard similarity coefficient. (d) Protein pairs with a similarity coefficient above the threshold are then mapped to existing knowledge bases to validate our findings.
Figure 2. Protein–experiment matrix obtained after reprocessing human PRIDE data. Columns represent experiments and rows represent proteins, with values representing distinct peptide counts for a protein in an experiment.
Figure 3. Example of Jaccard similarity coefficient calculation, where the similarity between Protein 1 and Protein 2 is calculated using the peptide counts for each across six different experiments.
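The co-occurrence calculation described in the captions for Figures 2 and 3 can be illustrated with a minimal Python sketch. This is not the authors' implementation. The captions do not say whether the coefficient is computed on presence/absence or directly on peptide counts, so both variants are shown as assumptions: a binary Jaccard coefficient and a weighted (Ruzicka) generalization. The protein names, peptide counts, the 0.5 threshold, and all function names are hypothetical.

```python
from itertools import combinations


def binary_jaccard(counts_a, counts_b):
    """Jaccard coefficient on presence/absence: |A and B| / |A or B|.

    A protein counts as present in an experiment when its distinct
    peptide count there is greater than zero (an assumption).
    """
    present_a = {i for i, c in enumerate(counts_a) if c > 0}
    present_b = {i for i, c in enumerate(counts_b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # neither protein was seen in any experiment
    return len(present_a & present_b) / len(union)


def weighted_jaccard(counts_a, counts_b):
    """Weighted (Ruzicka) variant: sum of minima over sum of maxima."""
    numerator = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    denominator = sum(max(a, b) for a, b in zip(counts_a, counts_b))
    return numerator / denominator if denominator else 0.0


def pairs_above_threshold(matrix, threshold, similarity=binary_jaccard):
    """Return (protein, protein, score) for every pair scoring >= threshold.

    `matrix` maps a protein name to its row of peptide counts,
    one value per experiment, as in a protein-experiment matrix.
    """
    selected = []
    for (name_a, row_a), (name_b, row_b) in combinations(matrix.items(), 2):
        score = similarity(row_a, row_b)
        if score >= threshold:
            selected.append((name_a, name_b, score))
    return selected


# Hypothetical peptide counts across six experiments.
matrix = {
    "protein_1": [3, 0, 2, 5, 0, 1],
    "protein_2": [2, 1, 0, 4, 0, 1],
    "protein_3": [0, 0, 0, 0, 4, 0],
}

print(binary_jaccard(matrix["protein_1"], matrix["protein_2"]))    # 3 shared / 5 in union
print(weighted_jaccard(matrix["protein_1"], matrix["protein_2"]))  # 7 / 12
print(pairs_above_threshold(matrix, threshold=0.5))
```

In this toy matrix, protein_1 and protein_2 are both observed in three of the five experiments in which either appears, so they pass the hypothetical threshold, while protein_3 shares no experiment with either and scores zero. Enumerating every pair scales quadratically with the number of proteins, so a real reprocessing of a repository the size of PRIDE would likely need a vectorized or sparse-matrix formulation.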
Figure 4. Distribution of the percentage of annotated pairs versus similarity score for (a) the pride-data set and (b) the Pandey-data set.
Figure 5. Abundance of four co-occurring proteins in annotated protein pairs for both data sets.