Yasset Perez-Riverol, Emanuele Alpi, Rui Wang, Henning Hermjakob, Juan Antonio Vizcaíno.
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. To address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, along with some tools that enable the integration, mining, and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Keywords: Bioinformatics; Databases; MS; Repositories
Year: 2015 PMID: 25158685 PMCID: PMC4409848 DOI: 10.1002/pmic.201400302
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. Hierarchy of proteomics data repositories and databases according to the different data types stored: raw MS data repositories, resources that store peptide/protein identification and quantification results, and protein knowledge bases. Some resources are duplicated in different levels because they can be included in more than one category.
Main characteristics of the major MS-based proteomics repositories and databases
| Repositories | Raw data | Support for targeted approaches | Metadata | Human protein expression information | Species | Quantification data | Related stand-alone tools | Web services URL | URL |
|---|---|---|---|---|---|---|---|---|---|
| PRIDE (June 2014) | X | – | High level | 41 835 Protein accessions | Approximately 450 species | X | PRIDE Inspector, PRIDE Converter 2, PeptideShaker | ||
| PeptideAtlas (Human, August 2013) | X | X | Medium level | 14 018 Proteins | – | TIQAM, TIQAM–Digestor, TIQAM–PeptideAtlas, TIQAM–Viewer, ATAQS, PIPE2, PABST | |||
| GPMDB (May 2014) | – | X | High level | 136 373 Protein accessions | – | – | |||
| MassIVE (May 2014) | X | – | Low level | * | * | – | – | – | |
| Chorus (May 2014) | X | – | Low level | * | * | – | – | – | |
| ProteomicsDB (May 2014) | X | – | High level | 18 097 Proteins | X | – | – | ||
| MOPED (May 2014) | – | – | Medium level | 17 141 Proteins | X | – | – | ||
| Human Proteinpedia (May 2014) | X | – | Low level | 15 231 Proteins | – | – | – | ||
| MaxQB (May 2014) | – | – | Medium level | 14 732 Proteins | X | – | – | ||
| PaxDb (May 2014) | – | – | Low level | 10 482 Proteins | X | – | |||
| HPM (June 2014) | – | – | Low level | 10 482 Proteins | X | – | – | ||
X, the feature or characteristic is supported; –, the feature is not supported; *, it was not possible to retrieve the corresponding information.
Figure 2. Bubble chart representation of the size of the PX complete submissions to PRIDE (until May 2014). The x-axis includes months with at least one submission, since PX submissions started (from March 2012). The y-axis corresponds to the number of PX “complete” public datasets submitted to PRIDE in each specific month. The size of each bubble represents the total number of mass spectra included in all the datasets in a given month.
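The per-month aggregation behind a chart like Figure 2 can be sketched as follows. This is a minimal illustration only: the record layout, the accession labels, and the spectrum counts are hypothetical placeholders, not actual PRIDE statistics, and the function name `summarize_by_month` is introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical records: (dataset label, submission month "YYYY-MM", number of spectra).
# All values are placeholders for illustration, not real PRIDE data.
datasets = [
    ("dataset_A", "2012-03", 1000),
    ("dataset_B", "2012-03", 2500),
    ("dataset_C", "2012-05", 400),
]


def summarize_by_month(records):
    """Return {month: (dataset_count, total_spectra)} for months with submissions."""
    counts = defaultdict(int)
    spectra = defaultdict(int)
    for _label, month, n_spectra in records:
        counts[month] += 1            # y-axis: datasets submitted in that month
        spectra[month] += n_spectra   # bubble size: total spectra in that month
    # Months without any submission never enter the dictionaries,
    # which matches an x-axis that only lists months with at least one dataset.
    return {month: (counts[month], spectra[month]) for month in sorted(counts)}


monthly = summarize_by_month(datasets)
for month, (n_datasets, total_spectra) in monthly.items():
    print(month, n_datasets, total_spectra)
```

Each resulting tuple maps directly onto one bubble: the month gives the x position, the dataset count the y position, and the spectrum total the bubble area.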
Figure 3. (A) Number of UniProtKB/Swiss-Prot human proteins (release 2014_05, 20,265 entries) observed in different proteomics resources that have a uniform data processing pipeline (GPMDB, ProteomicsDB, PeptideAtlas, HPM, PaxDb, MaxQB, Human Proteinpedia; PRIDE is not included); (B) Venn diagram representing the human protein identifications observed in GPMDB, PeptideAtlas, and ProteomicsDB; (C) Area chart showing the distribution of the number of PRIDE assays for those proteins present in three, two, and one proteomics resources, or for those proteins not identified at all.
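The overlap analysis summarized in Figure 3 (B) and (C) amounts to counting, for every protein in a reference proteome, how many resources report it. A minimal sketch under stated assumptions: the accessions `P1` to `P6` and the set contents are invented placeholders, not the actual content of GPMDB, PeptideAtlas, or ProteomicsDB, and `resource_support` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical accession sets per resource; placeholders only.
resources = {
    "GPMDB": {"P1", "P2", "P3", "P4"},
    "PeptideAtlas": {"P2", "P3", "P5"},
    "ProteomicsDB": {"P3", "P4", "P5"},
}
# Hypothetical reference proteome (stands in for a UniProtKB/Swiss-Prot release).
reference_proteome = {"P1", "P2", "P3", "P4", "P5", "P6"}


def resource_support(reference, resource_sets):
    """Map each reference protein to the number of resources that report it."""
    return {
        protein: sum(protein in members for members in resource_sets.values())
        for protein in reference
    }


support = resource_support(reference_proteome, resources)

# How many proteins are seen in three, two, one, or zero resources.
distribution = Counter(support.values())

# Central region of a three-set Venn diagram: proteins reported by every resource.
shared_by_all = set.intersection(*resources.values())

for n_resources in (3, 2, 1, 0):
    print(n_resources, distribution[n_resources])
```

Grouping proteins by this support count is what allows a per-group statistic, such as the number of PRIDE assays per protein, to be compared across the four categories.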