Attila Csordas, Rui Wang, Daniel Ríos, Florian Reisinger, Joseph M Foster, Douglas J Slotta, Juan Antonio Vizcaíno, Henning Hermjakob.
Abstract
The PRIDE database, developed and maintained at the European Bioinformatics Institute (EBI), is one of the most prominent data repositories dedicated to high-throughput MS-based proteomics data. Peptidome, developed by the National Center for Biotechnology Information (NCBI) as a sibling resource to PRIDE, was discontinued due to funding constraints in April 2011. Soon after Peptidome closed, the two teams began a joint effort to export its data to PRIDE so that they would not be "lost" to the wider proteomics community. As a result, data in the low terabyte range have been migrated from Peptidome to PRIDE and made publicly available under experiment accessions 17 900–18 271. These data represent 54 projects, ~53 million mass spectra, ~10 million peptide identifications, ~650,000 protein identifications, ~1.1 million biologically relevant protein modifications, and 28 species, from more than 30 different labs.
Year: 2013 PMID: 23533138 PMCID: PMC3717177 DOI: 10.1002/pmic.201200514
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. Peptidome data volume (red) relative to existing public PRIDE data (blue), in percent, as of July 31, 2012, for seven data types: number of projects, experiments, distinct NEWT taxonomy terms, identified proteins, reported biologically relevant protein modifications, peptide identifications, and mass spectra. Peptidome percentage values are shown above the red Peptidome bars.
Figure 2. Word cloud depicting the frequencies of the terms used in the Peptidome project names given by the original submitters. To weight each project by its number of experiments, the 54 different project names were counted 371 times in total. Only terms occurring four or more times are shown (108 terms in total). Common English words (e.g. and, of, the), biologically uninformative terms (e.g. assessment, detection), and obviously overrepresented meta-expressions (e.g. proteome, proteins, proteomics) were removed. Note that the project focusing on the phototroph Synechocystis species (PSE117) contained 109 experiment files.