Web resources for mass spectrometry-based proteomics.

Tao Chen1, Jie Zhao2, Jie Ma1, Yunping Zhu3.   

Abstract

With the development of high-resolution and high-throughput mass spectrometry (MS) technology, a large volume of proteomic data is continually being generated. Collecting and sharing these data is a challenge that requires immense and sustained human effort. In this report, we classify important web resources for MS-based proteomics and rate them based on whether raw data are stored, whether data submission is supported, and whether data analysis pipelines are provided. These web resources are important for biologists involved in proteomics research.
Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

Keywords:  Mass spectrometry; Proteomics; Web resources

Year:  2015        PMID: 25721607      PMCID: PMC4411487          DOI: 10.1016/j.gpb.2015.01.004

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

Advances in tandem mass spectrometry (MS) have made it possible to identify hundreds of thousands of proteins in MS-based experiments [1]. With the development of a wide range of methods for spectrometry and data analysis, MS-based proteomics has gained popularity in biomedical research, and the rapidly expanding body of research using tandem MS is continually generating large amounts of proteomics data. Collecting these datasets is becoming crucial to the research community. A proteomics data repository can offer a proteome with high coverage and sufficient data content for statistical analysis, and can also provide extensive observational data for genome annotation projects. However, maintaining such a repository is challenging because of the diversity and volume of the data and the varying needs of different users. In this report, we describe web data repositories for MS-based proteomics and rate them against criteria such as storage of raw data, data submission support, and provision of data analysis pipelines. The main features of these resources are shown in Table 1. Based on their focus areas within proteomic research, we classified these resources into three categories: general proteomics data repositories, quantitative proteomics data repositories, and proteomics data repositories focusing on protein post-translational modifications (PTMs).
Table 1. List of major MS-based proteomics resources

Category | Name | Link | Main features | Rating | Refs.
General | PRIDE | http://www.ebi.ac.uk/pride/archive | Supports raw data storage and data submission | ★★★★★ | [3]
General | PeptideAtlas | http://www.peptideatlas.org | Supports raw data storage, data submission, and data analysis | ★★★★★ | [1]
General | Human Proteinpedia | http://www.humanproteinpedia.org | Supports raw data storage and data submission | ★★★★☆ | [4]
General | iProX | http://iprox.hupo.org.cn | Supports raw data storage, data submission, and data analysis | ★★★★★ |
General | Tranche | https://proteomecommons.org/tranche | Supports raw data storage and data submission | ★★★☆☆ | [5]
General | GPMDB | http://www.thegpm.org | Supports data analysis | ★★★☆☆ | [6]
General | MOPED | http://moped.proteinspire.org | Stores protein expression information from MS-based proteomics experiments | ★★★☆☆ | [7]
General | YPED | http://yped.med.yale.edu | An integrated bioinformatics suite and database for proteomics research | ★★★★☆ | [8], [9]
Quantitative | PaxDb | http://pax-db.org | Supports quantitative proteomics data storage | ★★★☆☆ | [10]
PTMs-focused | Phospho.ELM | http://phospho.elm.eu.org | Supports phosphoproteomic MS data storage | ★★★☆☆ | [11]
PTMs-focused | PhosphoSitePlus | http://www.phosphosite.org | Stores raw data and MS-reported PTM sites | ★★★★☆ | [12]
PTMs-focused | dbPTM | http://dbptm.mbc.nctu.edu.tw | Stores raw data and MS/MS peptides associated with PTMs | ★★★★☆ | [13], [14]
PTMs-focused | PHOSIDA | http://www.phosida.com | Supports raw data storage and phosphoproteomic MS data storage | ★★★★☆ | [15], [16]

Note: These web resources are rated based on their score against parameters such as storage of raw data, data submission support, and provision of data analysis pipelines. MS, mass spectrometry; PTM, post-translational modification.
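
As a rough illustration of how the three criteria named in the note could be tabulated, the following Python sketch encodes the "Main features" column of Table 1 as boolean flags for the general repositories whose features are stated in those terms. The class name, field names, and helper functions are hypothetical and introduced only for this example. The count of criteria met is not the star rating: the exact scoring behind the stars is not given in this report, and the flags reflect only what the table column states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One row of Table 1, reduced to the three rating criteria (hypothetical schema)."""
    name: str
    raw_data: bool      # raw data storage mentioned in "Main features"
    submission: bool    # data submission mentioned in "Main features"
    analysis: bool      # data analysis mentioned in "Main features"


# Flags transcribed from the "Main features" column of Table 1.
# A False value means only that the column does not mention the feature.
RESOURCES = [
    Resource("PRIDE", True, True, False),
    Resource("PeptideAtlas", True, True, True),
    Resource("Human Proteinpedia", True, True, False),
    Resource("iProX", True, True, True),
    Resource("Tranche", True, True, False),
    Resource("GPMDB", False, False, True),
]


def criteria_met(resource: Resource) -> int:
    """Count how many of the three criteria a resource satisfies."""
    return sum((resource.raw_data, resource.submission, resource.analysis))


def supporting_all(resources: list[Resource]) -> list[str]:
    """Return the names of resources that satisfy all three criteria."""
    return [r.name for r in resources if criteria_met(r) == 3]


if __name__ == "__main__":
    for r in RESOURCES:
        print(f"{r.name}: {criteria_met(r)} of 3 criteria")
    print("All three:", ", ".join(supporting_all(RESOURCES)))
```

A reader choosing where to deposit data might filter on `raw_data` and `submission`, whereas one looking for reprocessed results might filter on `analysis`.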

General proteomics data repositories

Proteomics IDEntifications database

The Proteomics IDEntifications (PRIDE) database, created by the European Bioinformatics Institute (EBI), is a web resource that collects MS-based proteomics data. By the end of 2014, PRIDE had accumulated data for 41,835 proteins, 269,806 unique peptides, and about 101 million spectra [2]. PRIDE is one of the most popular proteomic data repositories and has played an important role in the nascent Human Proteome Project (HPP) [3].

PeptideAtlas

PeptideAtlas is a database that stores output files in various formats, together with metadata, from MS-based experiments [1]. It also allows users to submit raw data. These raw data are periodically reanalyzed for identification and statistical analysis, and the results are made available to researchers through web-based presentation systems. PeptideAtlas can help plan targeted proteomics experiments, improve genome annotation, and support data mining projects [1].

Human Proteinpedia

Human Proteinpedia is a resource to integrate, store, and share proteomic data [4]. It is a platform for collecting human proteomic data using a distributed annotation system, which allows the research community to contribute protein annotations. By the end of 2014, Human Proteinpedia covered 15,231 proteins, 1,960,352 peptides, and about 5 million spectra [2]. It also provides a panorama of the human proteome.

iProX

iProX is an integrated proteome resources center based in China, built to support the worldwide sharing of proteomics data. Currently, iProX comprises an experiment data submission system and a proteome database. The iProX submission system is a public platform that was set up following the data-sharing policy of the ProteomeXchange consortium. Raw data and standardized metadata from proteomics experiments can be collected and shared, with controlled vocabularies used to describe the Minimum Information About a Proteomics Experiment (MIAPE). Registered users can submit their proteomics datasets to iProX in either public or private mode. Datasets submitted in public mode are openly accessible, whereas private datasets can be accessed only by authorized users. The iProX proteomics database, in turn, was developed as a structured storage platform for data deposited in the system. iProX thus facilitates both data analysis and data sharing. To date, it covers 46 projects, 190 subprojects, and 6441 data files.

Tranche

Tranche is a data repository targeting storage and sharing of information for proteomics researchers. It supports re-use and dissemination of both data and software. To reduce data redundancy and achieve load balancing, it adopts peer-to-peer networking. It also uses a client–server model to ensure authentication and reliability. A client tool is required to upload and download datasets. It has several important features including pre-publication encryption, data pedigree, data integrity, immutability, and versioning. Tranche provides interfaces for PRIDE, Human Proteinpedia, and PeptideAtlas to store and disseminate large MS-based data files [5].
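
To make the ideas of data integrity and immutability more concrete, the following Python sketch shows a content-addressed store, in which each dataset is filed under a cryptographic hash of its own bytes. This is an illustration of the general technique only, not Tranche's actual protocol: the class `ContentStore`, its methods, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


class ContentStore:
    """Minimal in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under the hex digest of its contents and return that key."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the stored data after checking that it still matches its key."""
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its key")
        return data


if __name__ == "__main__":
    store = ContentStore()
    key = store.put(b"example spectra file contents")
    print(key)
    print(store.get(key) == b"example spectra file contents")
```

Because the key is derived from the content, any change to a file yields a different key, so a published key always refers to exactly one version of the data, and re-uploading identical data adds no redundant copy.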

Global Proteome Machine Database

The Global Proteome Machine Database (GPMDB) is a resource for collecting diverse tandem mass spectra. It also includes peptide and protein identifications that are important for further MS computational research [6]. GPMDB provides a pipeline for reprocessing raw data submitted by users or imported from other repositories, generating XML files that store the resulting peptide and protein identifications. Identified proteins are organized into separate spreadsheets for each chromosome and for mitochondrial DNA. By the end of 2014, GPMDB data spanned 136,373 proteins, 1,786,698 peptides, and 1020 million spectra [2]. GPMDB has played an important role in the Chromosome-Centric Human Proteome (C-HPP) Project.

Model Organism Protein Expression Database

The Model Organism Protein Expression Database (MOPED) is a proteomics repository that integrates protein expression information from MS-based proteomics experiments on human specimens and model organisms [7]. It also provides new estimates of protein abundance and concentration, as well as statistical summaries of experiments. Several search and visualization tools are available. By the end of 2014, MOPED had developed into a repository containing 17,141 proteins, 250,000 unique peptides, and approximately 15 million spectra [2], providing researchers with information on complex biological processes and thus supporting biomedical discovery.

Yale Protein Expression Database

The Yale Protein Expression Database (YPED) [8] is an integrated bioinformatics suite and database for proteomics research, which was significantly improved from the first version released in 2007 [9]. YPED supports many kinds of data including those from multiple MS instruments, different search engines, and labeled or label-free quantification. YPED is a web-accessible and user-friendly resource, designed to meet data management, archival, and analysis needs of high-throughput MS-based proteomics research.

Quantitative proteomics data repositories

PaxDb

PaxDb is a meta-database integrating whole-organism data and tissue-resolved data at absolute protein abundance levels for various model organisms. It imports quantitative proteomics datasets exclusively from published experiments and from primary proteomics data resources such as PRIDE and PeptideAtlas, and then analyzes the actual spectral counts [10]. By the end of 2014, it included 10,482 proteins, 143,456 peptides, and about 24 million spectra [2]. PaxDb brings together disparate aspects of biology for high-throughput analysis and supports global comparative analysis across different organism groups.
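
As a simplified illustration of how spectral counts can be turned into comparable abundance estimates, the Python sketch below divides each protein's spectral count by its sequence length and rescales the results so that they sum to one million (parts per million). This is a generic length-normalized spectral counting scheme chosen for the example, not PaxDb's actual pipeline; the function name, argument names, and input values are hypothetical.

```python
def abundance_ppm(spectral_counts: dict[str, int],
                  lengths: dict[str, int]) -> dict[str, float]:
    """Estimate relative abundance in ppm from spectral counts.

    Each count is divided by protein length, so that long proteins are not
    favored simply because they yield more peptides, and the resulting rates
    are rescaled to sum to 1e6.
    """
    rates = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no spectra observed; abundances are undefined")
    return {p: 1e6 * rate / total for p, rate in rates.items()}


if __name__ == "__main__":
    # Made-up counts and lengths, for illustration only.
    counts = {"protA": 10, "protB": 30, "protC": 5}
    lengths = {"protA": 100, "protB": 300, "protC": 250}
    for protein, ppm in abundance_ppm(counts, lengths).items():
        print(f"{protein}: {ppm:.0f} ppm")
```

Expressing every dataset on the same relative scale is one way that abundances from different experiments and organisms can be placed side by side for comparison.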

Proteomics data repositories focusing on protein PTMs

Phospho.ELM

Recent advances in MS techniques have enabled more efficient detection of phosphorylated proteins [9]. Phospho.ELM is a web-based resource for storing phosphorylation data imported from research papers and phosphoproteomic MS analyses. The MS experiments were run on human and mouse cell lines and tissues. Phospho.ELM is used by laboratory scientists and computational biologists to develop public repositories [11]. To date, this web resource covers 42,914 instances, 299 kinases, 3657 references, 11,224 sequences, and 8698 substrates.

PhosphoSitePlus

PhosphoSitePlus (PSP) is a comprehensive, manually curated resource designed to collect information on the structure and function of PTMs, primarily of human and mouse origin. PSP supports two kinds of data: the modified amino acid with its surrounding sequence, and upstream and downstream interactions with regard to functional regions of the protein [12]. The majority of PTM sites in PSP were detected using MS. PSP is useful to life scientists and biomedical researchers. Currently, PSP spans 50,636 proteins, 1,933,888 MS peptides, 438,576 high-throughput MS sites, 20,262 low-throughput sites, and 18,374 curated papers.

dbPTM

dbPTM is a resource that collects data on experimentally validated protein PTMs. It imports PTM sites from public resources such as SwissProt, Phospho.ELM, and O-GLYCBASE [13]. It also extracts identified peptides with PTMs from research papers. dbPTM is an important resource for researchers working on substrate specificity of PTM sites [14]. To date, dbPTM covers 153,113 phosphorylation experimental sites, 23,673 ubiquitylation experimental sites, 10,385 acetylation experimental sites, 15,678 N-linked glycosylation experimental sites, and 3711 O-linked glycosylation experimental sites.

Phosphorylation site database

The phosphorylation site database (PHOSIDA) is a collection of a large number of high-confidence phosphorylation sites. MS-based proteomics is used to identify these sites in various species [15]. To date, the database covers 80,062 N-glycosylated, phosphorylated, or acetylated sites. Stringent quality criteria based on a very low false positive rate are used to obtain these sites from high-resolution MS data [16]. PHOSIDA contains PTM sites from human as well as other species, including bacteria.

Concluding remarks

In this report, we have covered some important proteomics data repositories that are useful for the research community. These resources not only provide raw data and identification results, but also support prospective, high-throughput proteomics research. In addition, they act as data providers for large-scale genome annotation efforts. In the years to come, sharing data and metadata between repositories will become more important, so proteomics repositories need to develop an integrated approach to data accessibility across repositories. At the same time, with the advent of new instruments, new sample preparation techniques, and new data analysis methods, new forms of data will be continuously generated. The data currently shared in the repositories are just a small fraction of the proteomics data actually generated, which will eventually become available. To attract more researchers to submit data, the resources will have to standardize the submission process and simplify the interface for data submission.

Competing interests

The authors declared that there are no competing interests.
  16 in total

1.  Tranche distributed repository and ProteomeCommons.org.

Authors:  Bryan E Smith; James A Hill; Mark A Gjukich; Philip C Andrews
Journal:  Methods Mol Biol       Date:  2011

2.  Open source system for analyzing, validating, and storing protein identification data.

Authors:  Robertson Craig; John P Cortens; Ronald C Beavis
Journal:  J Proteome Res       Date:  2004 Nov-Dec       Impact factor: 4.466

3.  YPED: a web-accessible database system for protein expression analysis.

Authors:  Mark A Shifman; Yuli Li; Christopher M Colangelo; Kathryn L Stone; Terence L Wu; Kei-Hoi Cheung; Perry L Miller; Kenneth R Williams
Journal:  J Proteome Res       Date:  2007-09-15       Impact factor: 4.466

4.  The PeptideAtlas Project.

Authors:  Eric W Deutsch
Journal:  Methods Mol Biol       Date:  2010

5.  PHOSIDA 2011: the posttranslational modification database.

Authors:  Florian Gnad; Jeremy Gunawardena; Matthias Mann
Journal:  Nucleic Acids Res       Date:  2010-11-16       Impact factor: 16.971

6.  Phospho.ELM: a database of phosphorylation sites--update 2011.

Authors:  Holger Dinkel; Claudia Chica; Allegra Via; Cathryn M Gould; Lars J Jensen; Toby J Gibson; Francesca Diella
Journal:  Nucleic Acids Res       Date:  2010-11-09       Impact factor: 16.971

7.  PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.

Authors:  Peter V Hornbeck; Jon M Kornhauser; Sasha Tkachev; Bin Zhang; Elzbieta Skrzypek; Beth Murray; Vaughan Latham; Michael Sullivan
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

8.  YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research.

Authors:  Christopher M Colangelo; Mark Shifman; Kei-Hoi Cheung; Kathryn L Stone; Nicholas J Carriero; Erol E Gulcicek; TuKiet T Lam; Terence Wu; Robert D Bjornson; Can Bruce; Angus C Nairn; Jesse Rinehart; Perry L Miller; Kenneth R Williams
Journal:  Genomics Proteomics Bioinformatics       Date:  2015-02-21       Impact factor: 7.691

9.  Human Proteinpedia: a unified discovery resource for proteomics research.

Authors:  Kumaran Kandasamy; Shivakumar Keerthikumar; Renu Goel; Suresh Mathivanan; Nandini Patankar; Beema Shafreen; Santosh Renuse; Harsh Pawar; Y L Ramachandra; Pradip Kumar Acharya; Prathibha Ranganathan; Raghothama Chaerkady; T S Keshava Prasad; Akhilesh Pandey
Journal:  Nucleic Acids Res       Date:  2008-10-23       Impact factor: 16.971

10.  PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.

Authors:  Florian Gnad; Shubin Ren; Juergen Cox; Jesper V Olsen; Boris Macek; Mario Oroshi; Matthias Mann
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583
