
ProteoSign v2: a faster and evolved user-friendly online tool for statistical analyses of differential proteomics.

Evangelos Theodorakis¹,², Andreas N Antonakis¹, Ismini Baltsavia¹, Georgios A Pavlopoulos³, Martina Samiotaki⁴, Grigoris D Amoutzias⁵, Theodosios Theodosiou¹, Oreste Acuto⁶, Georgios Efstathiou¹,⁶, Ioannis Iliopoulos¹.

Abstract

Bottom-up proteomics analyses have proven over recent years to be a powerful tool for characterizing the proteome and are crucial for understanding cellular and organism behaviour. Through differential proteomic analysis, researchers can shed light on groups of proteins or individual proteins that play key roles in certain normal or pathological conditions. However, several tools for the analysis of such complex datasets are powerful but hard to use, with steep learning curves, while other tools are easy to use but weak in terms of analytical power. Previously, we introduced ProteoSign, a powerful yet user-friendly open-source online platform for protein differential expression/abundance analysis, designed with the end-proteomics user in mind. Part of ProteoSign's power stems from its use of the well-established Linear Models For Microarray Data (LIMMA) methodology. Here, we present a substantial upgrade of this computational resource, called ProteoSign v2, in which we introduce major improvements, several of them based on user feedback. The new version offers more plot options, supports additional experimental designs, analyzes updated input datasets and performs a gene enrichment analysis of the differentially expressed proteins. We also introduce deployment through Docker technology and significantly increase the speed of a full analysis. ProteoSign v2 is available at http://bioinformatics.med.uoc.gr/ProteoSign.


Year: 2021 PMID: 33963869 PMCID: PMC8262687 DOI: 10.1093/nar/gkab329

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Mass spectrometry (MS)-based quantitative proteomics is a powerful approach to study the global proteome dynamics in a cell, a tissue or an organism (1). The latest technological advances in bioanalytical chemistry, mass spectrometry and bioinformatics allow the detection, relative quantitation and functional annotation of thousands of proteins in a single experiment in an hour using the so-called bottom-up proteomics approach (2,3). In contrast to the still developing top-down proteomics approach, where intact proteins are analyzed by MS, in the widely used bottom-up approach proteins are proteolytically digested into peptides, which are then separated and analyzed. The peptides are analyzed with MS techniques so that their accurate mass, absolute or relative abundance and amino acid sequence are determined. This information is used to deduce the abundance and primary structure of the peptides' parent proteins. A significant drawback of the bottom-up approach, compared with top-down approaches, is its inability to unequivocally identify the different isoforms of the proteins. However, the bottom-up approach provides better separation of peptides at both the nLC and MS levels, resulting in a much higher coverage of the predicted proteome. Thus, the bottom-up approach is the most commonly used one in high-throughput proteomics (4). There are many available proteomic differential expression analysis tools, such as Perseus (probably the most popular one) (5), DanTE (6), Prostar (7), MsqRob (8), ProteoSign (9), MSstats (10), Rover (11), HiQuant (12), PIQMIe (13), Scaffold Q+S, ProtExA (14) and StatQuant (15). These tools differ in terms of features, such as filtering, normalization, aggregation, statistical methods, types of analyses, data import/export formats and plots offered. They also differ in their user requirements (need for statistical and programming skills) and in their experimental and software restrictions (for a comparison, see (9)).

Previously, we published ProteoSign, a web tool for differential and statistical analysis of quantification datasets, in the 2017 Nucleic Acids Research web server issue (9). It uses the Linear Models for Microarray Data (LIMMA) methodology to statistically assess the difference in abundance of proteins between two or more proteome states. It provides descriptive statistics, plots and automated data reformatting, and offers only the minimum customizability in order to keep its interface simple. ProteoSign places no restriction on the experimental method used to produce its input, which makes it possible to analyze both label-free and labeled experiments. It accepts as input proteomics quantification data produced by either MaxQuant (MQ) (16) or Proteome Discoverer (PD) (Thermo Scientific). The data can derive from label-free or labeled experiments, currently supporting SILAC (17), pulsed SILAC (18), iTRAQ (19), TMTs (20) and dimethyl labeling (21); label-swap replication is also supported. Here, we present a significantly updated second version of ProteoSign (v2), guided by feedback from actual users and the broader proteomics community. In this update, we have implemented new features while preserving the user friendliness of the tool. The new features and improvements include acceptance of updated input datasets, much faster analysis, support for new experimental designs, user-customized and reproducible plots, easy installation on a local server, improved documentation, deployment through Docker technology and gene enrichment analysis of differentially expressed proteins.

NEW FEATURES AND UPDATES

General design and input data

The pipeline of ProteoSign v2 is shown in Figure 1. The frontend is written in HTML and JavaScript and consists of a welcome page, an analytical help page with examples and relevant pictures describing the whole process, and detailed documentation. ProteoSign's backend is written in PHP and R and manages data uploading and analysis, as well as the visualization and downloading of results. It is platform independent and fully compatible with all major browsers (Mozilla Firefox, Google Chrome, Apple Safari, Opera, etc). In its first version, ProteoSign accepted quantified differential proteomics data produced by either Proteome Discoverer (PD) 1.3+ or MaxQuant (MQ) 1.3.0.5+. The updated version also accepts datasets produced by Proteome Discoverer (PD) 2.4. Beyond the experimental designs already supported, many users requested support for Replication Multiplexing, i.e. experiments where biological/technical replicates or conditions are represented either as different tags or as different MS runs; this feature has been integrated in the new version of ProteoSign. Additionally, the user interface, help pages and documentation files incorporate user-suggested changes towards a more user-friendly environment and learning experience.

Figure 1.

The ProteoSign v2 pipeline.

Performance

The source code has been extensively rewritten and optimized to increase the speed of ProteoSign. Moreover, data tables (R package data.table) are now employed instead of data frames. This data structure is better suited to large datasets and helps ProteoSign v2 run two to three times faster. Specifically, data.table offers fast aggregation of large data (e.g. 100 GB in RAM), fast ordered joins, fast addition/modification/deletion of columns by group without temporary copies, column listing and faster text reading/writing. As a case study, we repeated the analyses performed with the first version of ProteoSign on the same demonstration datasets and measured the running time. Speed improvements are presented in Table 1.

Table 1.

Running time comparison between ProteoSign and ProteoSign v2

Data set and PRIDE ID	Data size (MB)	Conditions	Biological replicates	Technical replicates	Fractionation	Samples	Running time ProteoSign v1 (min)	Running time ProteoSign v2 (min)
SILAC 2-plex (MQ) PXD001909 (23)	122	2	3	2	Y	72	<1	0.35 (21 s)
SILAC 2-plex (MQ) large PXD000778 (24)	787	2	4	6	Y	240	6	2
SILAC 2-plex (PD) large PXD000778 (24)	1100	2	4	6	Y	40	4	<2
Label-Free (MQ) large PXD004124 (25)	1070	2	2	3	Y	108	7	<4
TMT (MQ) PXD002622 (26)	62	2	5	0	Y	50	2	<1
TMT (PD) PXD002622 (26)	109	2	5	0	Y	50	2	<1
iTRAQ (PD) PXD004869	684	4	2	0	Y	42	12	<5
pSILAC 3-plex (MQ) PXD001976 (27)	336	2	6	0	Y	120	3	<2
pSILAC 3-plex (PD) PXD001976 (27)	831	2	6	0	Y	120	7	<4
Dimethyl 2-plex (PD) large PXD002073 (28)	1505	2	3	0	Y	36	9	<5


User intervention

By adopting a visual analytics approach, ProteoSign v2 lets users manually adjust several parameters that affect the output. For example, users can define the adjusted P-value threshold for differentially expressed proteins at any time. In addition, they can disqualify proteins that were not quantified with at least a user-defined number of different peptides in at least a user-defined number of biological replicates. Finally, users can save or load the parameter set of each run and quickly rerun the analysis without re-entering every parameter. This is especially useful when planning and defining an experimental design. Users can also customize and reproduce any of the offered plots: the relevant R script is included, and an extended help file describes how to run the processes from the command line.
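
The peptide/replicate filter described above can be illustrated with a minimal sketch. This is not ProteoSign's actual code (which is written in R); the function name `filter_proteins`, its parameter names and the input layout (a mapping from protein accession to the number of distinct peptides quantified in each biological replicate) are assumptions made purely for illustration.

```python
def filter_proteins(peptide_counts, min_peptides=2, min_replicates=2):
    """Return the proteins that pass the peptide/replicate filter.

    peptide_counts: dict mapping a protein accession to a dict of
        {biological replicate: number of distinct quantified peptides}.
        (Hypothetical input layout, chosen for this sketch.)
    A protein is kept when it has at least `min_peptides` distinct peptides
    in at least `min_replicates` biological replicates.
    """
    kept = []
    for protein, per_replicate in peptide_counts.items():
        # Count the replicates in which the protein is sufficiently supported.
        n_ok = sum(1 for n in per_replicate.values() if n >= min_peptides)
        if n_ok >= min_replicates:
            kept.append(protein)
    return kept


# Toy data: three proteins measured in three biological replicates.
counts = {
    "P12345": {"rep1": 3, "rep2": 2, "rep3": 0},
    "Q67890": {"rep1": 1, "rep2": 1, "rep3": 4},
    "O11111": {"rep1": 2, "rep2": 5, "rep3": 2},
}

print(filter_proteins(counts, min_peptides=2, min_replicates=2))
# → ['P12345', 'O11111']
```

Raising either threshold makes the filter stricter: with these toy data, requiring two peptides in all three replicates would retain only O11111.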

Docker

All ProteoSign v2 components, along with the necessary dependencies, are packed in a Docker image. The Docker image allows ProteoSign v2 to be run without installing any external libraries. This way, users can run ProteoSign v2 locally on any operating system and access it via their local web browser, overcoming network bandwidth issues.

Gene enrichment analysis

A new feature of ProteoSign v2 is its ability to perform gene enrichment analysis to functionally annotate significantly enriched groups of differentially expressed proteins. The analysis relies on gprofiler2 (22), an R package providing an R interface to g:Profiler, a web server for functional interpretation of gene lists. The query set consists of the UniProt accession numbers of the proteins found to be differentially expressed. Because it relies on an external server such as g:Profiler, the analysis can be extended to a plethora of organisms and to enriched terms from several databases. By default, the significance threshold for the detection of enriched terms is 0.05 and the multiple testing correction method is g:Profiler's Set Counts and Sizes (g:SCS). Results can be viewed in an interactive table and exported in CSV format. Detailed instructions are provided in the help pages.
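
To make the idea of term enrichment concrete, the following is a minimal over-representation sketch based on a one-sided hypergeometric test. It does not call g:Profiler and does not implement the g:SCS correction; a simple Bonferroni correction is swapped in as a stand-in, and the function names, term names and accession-like identifiers are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background size, K: background proteins annotated with the term,
    n: query size, k: query proteins annotated with the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query, background, term_sets, alpha=0.05):
    """Return (term, overlap, adjusted p) for terms significant at `alpha`.

    Bonferroni correction is used here as a simple stand-in; it is NOT
    the g:SCS method that g:Profiler applies by default.
    """
    background = set(background)
    query = set(query) & background
    n_tests = len(term_sets)
    results = []
    for term, members in term_sets.items():
        members = set(members) & background
        k = len(query & members)
        p = hypergeom_pvalue(k, len(query), len(members), len(background))
        p_adj = min(1.0, p * n_tests)
        if p_adj <= alpha:
            results.append((term, k, p_adj))
    return sorted(results, key=lambda r: r[2])


# Toy example: 20 background proteins and two made-up annotation terms.
background = [f"P{i}" for i in range(20)]
term_sets = {
    "term_A": [f"P{i}" for i in range(0, 5)],
    "term_B": [f"P{i}" for i in range(10, 15)],
}
query = ["P0", "P1", "P2", "P3", "P10"]  # hypothetical differentially expressed proteins

for term, overlap, p_adj in enrich(query, background, term_sets):
    print(term, overlap, round(p_adj, 4))
```

In this toy example only term_A is reported, because four of the five query proteins carry that annotation, whereas the single hit for term_B is well within what chance alone would produce.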

Local server installation

The source code of ProteoSign v2 has been rewritten so that it can be installed on a local server in a few simple steps (see the new documentation page). Regardless of the server's operating system, ProteoSign can now be downloaded, installed and configured easily, even by non-experts. A local server installation provides much higher data transmission speeds, allows laboratory-specific configuration and makes it possible to incorporate ProteoSign into a custom pipeline.

DISCUSSION

ProteoSign v2 is an updated, user-friendly web server application for automated differential and statistical analysis of high-throughput proteomic quantification datasets. Novel features enable users to submit additional experimental designs (replication multiplexing), analyze updated input datasets and perform gene enrichment analysis of the differentially expressed proteins. Finally, the deployment of Docker technology, along with a much faster performance (2–3×) due to significant code improvements, is of great importance. The main differences between ProteoSign v1 and v2 are summarized in Table 2.

Table 2.

ProteoSign version 1 versus version 2

Feature	ProteoSign v1	ProteoSign v2
Aggregation	X	X
Filtering	X	X
Normalization	X	X
Differential analysis (based on LIMMA)	X	X
Generation of various data plots	X	X (plus Venn diagrams)
Enrichment analysis		X
Docker image		X
Support of Proteome Discoverer (PD) 2.4		X
Ability to install to Local Server		X
Support for Replication Multiplexing		X
User defined parameters		X
Higher speed performance		X


SERVER INFORMATION

ProteoSign v2 is a web application that runs on an Apache 2 web server hosted on a Dell PowerEdge R720xd server machine. The server runs Ubuntu (kernel 3.2), has 128 GB of RAM and is equipped with two Intel Xeon E5-2650 processors clocked at 2 GHz.

DATA AVAILABILITY

ProteoSign v2 is freely available at http://bioinformatics.med.uoc.gr/ProteoSign and the source code at https://github.com/ananas14/ProteoSign. The Docker image can be pulled from Docker Hub at https://hub.docker.com/r/mpaltsai/proteosign (command: docker pull mpaltsai/proteosign).

28 in total

1. Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.

Authors: Ludger J E Goeminne; Kris Gevaert; Lieven Clement
Journal: Mol Cell Proteomics Date: 2015-11-13 Impact factor: 5.911

2. Comparative Proteomics and Functional Analysis Reveal a Role of Plasmodium falciparum Osmiophilic Bodies in Malaria Parasite Transmission.

Authors: Pablo Suárez-Cortés; Vikram Sharma; Lucia Bertuccini; Giulia Costa; Naa-Lamiley Bannerman; Anna Rosa Sannella; Kim Williamson; Michael Klemba; Elena A Levashina; Edwin Lasonder; Pietro Alano
Journal: Mol Cell Proteomics Date: 2016-07-18 Impact factor: 5.911

3. StatQuant: a post-quantification analysis toolbox for improving quantitative mass spectrometry.

Authors: Bas van Breukelen; Henk W P van den Toorn; Madalina M Drugan; Albert J R Heck
Journal: Bioinformatics Date: 2009-03-31 Impact factor: 6.937

4. HiQuant: Rapid Postquantification Analysis of Large-Scale MS-Generated Proteomics Data.

Authors: Kenneth Bryan; Mohamed-Ali Jarboui; Cinzia Raso; Manuel Bernal-Llinares; Brendan McCann; Jens Rauch; Karsten Boldt; David J Lynn
Journal: J Proteome Res Date: 2016-05-16 Impact factor: 4.466

5. p53-Regulated Networks of Protein, mRNA, miRNA, and lncRNA Expression Revealed by Integrated Pulsed Stable Isotope Labeling With Amino Acids in Cell Culture (pSILAC) and Next Generation Sequencing (NGS) Analyses.

Authors: Sabine Hünten; Markus Kaller; Friedel Drepper; Silke Oeljeklaus; Thomas Bonfert; Florian Erhard; Anne Dueck; Norbert Eichner; Caroline C Friedel; Gunter Meister; Ralf Zimmer; Bettina Warscheid; Heiko Hermeking
Journal: Mol Cell Proteomics Date: 2015-07-16 Impact factor: 5.911

6. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents.

Authors: Philip L Ross; Yulin N Huang; Jason N Marchese; Brian Williamson; Kenneth Parker; Stephen Hattan; Nikita Khainovski; Sasi Pillai; Subhakar Dey; Scott Daniels; Subhasish Purkayastha; Peter Juhasz; Stephen Martin; Michael Bartlet-Jones; Feng He; Allan Jacobson; Darryl J Pappin
Journal: Mol Cell Proteomics Date: 2004-09-22 Impact factor: 5.911

7. Rover: a tool to visualize and validate quantitative proteomics data from different sources.

Authors: Niklaas Colaert; Kenny Helsens; Francis Impens; Joël Vandekerckhove; Kris Gevaert
Journal: Proteomics Date: 2010-03 Impact factor: 3.984

8. Proteomics Is Analytical Chemistry: Fitness-for-Purpose in the Application of Top-Down and Bottom-Up Analyses.

Authors: Jens R Coorssen; Alfred L Yergey
Journal: Proteomes Date: 2015-12-03

9. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update).

Authors: Uku Raudvere; Liis Kolberg; Ivan Kuzmin; Tambet Arak; Priit Adler; Hedi Peterson; Jaak Vilo
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

10. Robust, reproducible and quantitative analysis of thousands of proteomes by micro-flow LC-MS/MS.

Authors: Yangyang Bian; Runsheng Zheng; Florian P Bayer; Cassandra Wong; Yun-Chien Chang; Chen Meng; Daniel P Zolg; Maria Reinecke; Jana Zecha; Svenja Wiechmann; Stephanie Heinzlmeir; Johannes Scherr; Bernhard Hemmer; Mike Baynham; Anne-Claude Gingras; Oleksandr Boychenko; Bernhard Kuster
Journal: Nat Commun Date: 2020-01-09 Impact factor: 14.919
