Marc Vaudel, Kenneth Verheggen, Attila Csordas, Helge Raeder, Frode S Berven, Lennart Martens, Juan A Vizcaíno, Harald Barsnes.
Abstract
In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, this extensive material has, however, largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, show ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.
Keywords: Bioinformatics; Computational proteomics; Data analysis; Data standards; Databases
Year: 2015 PMID: 26449181 PMCID: PMC4738454 DOI: 10.1002/pmic.201500295
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. The amount of publicly available proteomics data is increasing, here indicated by the monthly submission statistics for PRIDE from June 2012 to May 2015. The x‐axis represents the months and the y‐axis the monthly number of submissions. The size of each bubble indicates the amount of data submitted that month. Note that the cumulative size of PRIDE data reached the 100 TB milestone in April 2015.
Figure 2. The major milestones that enabled efficient proteomics data sharing: (A) standard data formats for sharing proteomics data, (B) data format converters and software exporters able to generate output in the standard formats, (C) tools for simplifying the submission of proteomics data to central proteomics repositories, and (D) central proteomics repositories that store and disseminate public proteomics data, here indicated by the main ProteomeXchange member repositories.
Figure 3. The four ways in which public proteomics data can be utilized: (i) use, (ii) reuse, (iii) reprocess, and (iv) repurpose. See main text for details.
Figure 4. The rapidly growing amount of publicly available proteomics data opens up the opportunity for in silico proteomics, that is, using bioinformatics to test hypotheses directly against the available data instead of generating new experimental data.