
DAPAR & ProStaR: software to perform statistical analyses in quantitative discovery proteomics.

Samuel Wieczorek, Florence Combes, Cosmin Lazar, Quentin Giai Gianetto, Laurent Gatto, Alexia Dorffer, Anne-Marie Hesse, Yohann Couté, Myriam Ferro, Christophe Bruley, Thomas Burger.

Abstract

DAPAR and ProStaR are software tools to perform the statistical analysis of label-free XIC-based quantitative discovery proteomics experiments. DAPAR contains procedures to filter, normalize, impute missing values, aggregate peptide intensities, perform null hypothesis significance tests and select the most likely differentially abundant proteins with a corresponding false discovery rate. ProStaR is a graphical user interface that gives user-friendly access to the DAPAR functionalities through a web browser.
AVAILABILITY AND IMPLEMENTATION: DAPAR and ProStaR are implemented in the R language and are available on the website of the Bioconductor project (http://www.bioconductor.org/). A complete tutorial and a toy dataset accompany the packages.
CONTACT: samuel.wieczorek@cea.fr, florence.combes@cea.fr, thomas.burger@cea.fr.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27605098      PMCID: PMC5408771          DOI: 10.1093/bioinformatics/btw580

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


The objectives of quantitative discovery proteomics are to identify the proteins present in several biological samples that fall into at least two different biological conditions, and to perform a relative quantification so as to discriminate between the proteins that are significantly differentially abundant and those that are not. This classically involves numerous steps: (i) protein extraction; (ii) protein digestion into peptides; (iii) liquid chromatography and tandem mass spectrometry analysis; (iv) peptide identification on the basis of the fragmentation spectra; (v) peptide quantitation on the basis of the precursor chromatograms (XIC) and (vi) peptide aggregation into protein identity and abundance. The outcome of this analytical pipeline is a quantitative dataset that contains protein abundances across all replicates.

Once the quantitative dataset is available, the quantitative analysis may start. Its objective is to rely on an efficient and reproducible statistical pipeline to isolate the subset of proteins that are characteristic of the differences between the biological conditions, on which further, more exhaustive wet-laboratory experiments will be performed. Numerous tools are available to perform such quantitative analysis: stand-alone tools (e.g. MSstats; Choi et al.), modules of a larger bioinformatics tool (e.g. Skyline; MacLean et al.), generic software that is not restricted to proteomics but can be used in a wider omics context (e.g. InfernoRDN, formerly DAnTE; Polpitiya et al.), or even general-purpose statistics software (e.g. JMP, http://www.jmp.com/). The available tools can also be sorted according to whether their code is open (MSstats, and more generally any R package) or not (Perseus, http://www.biochem.mpg.de/5111810/perseus), and according to whether they provide a graphical user interface (GUI): most R packages do not come with a GUI, whereas other software tools generally do.
To date, the only software tool that is based on R and endowed with a GUI is InfernoRDN. However, the underlying R packages are not accessible, so the code is not really open, and the GUI only works on Windows operating systems. As a result, to the best of the authors’ knowledge, there is so far no software tool that is (i) devoted to proteomics; (ii) devoted to quantitative analysis; (iii) open source, which guarantees reproducibility, interoperability and quality control of the code; (iv) equipped with a user-friendly GUI and (v) operable on any operating system. This lack has motivated the developments reported here.

In general, quantitative analysis is composed of the following steps:
Filtering: Some peptides or proteins may be discarded on the basis of several user-defined criteria (number of missing values within each or across all the biological condition(s), contaminant database, decoy sequences, etc.).
Normalization: The protein abundances are rescaled (within or between conditions) to account for the variability between the analyses. Several algorithms can be used: quantile normalization (Bolstad, 2007), abundance normalization, scaling/centering (either globally applied or by condition), etc.
Imputation: To maximize the power of the statistical analysis, the missing values are imputed. This is achieved with one of several available algorithms, each of which accounts in a specific manner for the nature of the missing values (missing at random, or censored because of low abundance): k Nearest Neighbors (Hastie et al.), Maximum Likelihood Estimation (Schafer, 2008), Bayesian Principal Component Analysis (Stacklies et al.), Quantile Regression to Impute Left-Censored data (Lazar, 2015), etc.
Aggregation: The peptide intensities are aggregated together so as to infer back the abundances of the proteins originally present in the samples. Several aggregation functions are classically used: sum, mean or median of the intensities of a set of peptides (all of them, the protein-specific ones or only the N most abundant ones).
Differential analysis: Finally, null hypothesis significance testing (with Welch or limma t-tests; Ritchie et al.) and P-value adjustment are conducted, leading to a list of differentially abundant proteins endowed with a false discovery rate estimate.

DAPAR (differential analysis of protein abundance with R) is an R package that either proposes new algorithms for these five computational steps or simply wraps the R packages implementing pre-existing state-of-the-art methods (refer to the ProStaR and DAPAR tutorial for an updated list of the available algorithms). The main feature of DAPAR is to gather in a single package all the statistical routines necessary for quantitative analysis. Moreover, it is completely compatible with (i) the MSnbase package (Gatto and Lilley, 2012), which provides a standard format for quantitative datasets, and (ii) any Bioconductor package, so that its functionalities can be easily extended.

However, as is, its use requires being comfortable with R programming, which is not the case for all proteomics practitioners. This is why DAPAR is accompanied by ProStaR, a package that relies on Shiny technology (http://shiny.rstudio.com/) to dynamically build a web-based GUI to the DAPAR functionalities. All the user has to do is to copy and paste the following command lines into the R console: source("http://www.bioconductor.org/biocLite.R"); biocLite("DAPAR"); biocLite("Prostar"); library(Prostar); Prostar(). This opens the GUI, from which the quantitative analysis is carried out by a series of clicks. Moreover, ProStaR is also available in server mode: R, DAPAR and ProStaR are installed and maintained on a single (server) machine, to which each practitioner connects through a given URL.
This makes ProStaR particularly suited for proteomics labs where a single bioinformatician deploys and maintains the tools that are used by the proteomicians for their data analyses. In addition to providing menus devoted to each of the five processing steps (filtering, normalization, imputation, aggregation and differential analysis), ProStaR provides import/export functionalities, as well as a ‘descriptive statistics’ menu where it is possible to visualize the dataset at hand, so as to best understand it or to produce display elements for publications.

The packages DAPAR and ProStaR are separated for two reasons. First, ProStaR may be bypassed by any R coder who wants to directly access the low-level functions of DAPAR, script their own pipelines and reproduce them more simply. Second, the DAPAR functions can be directly mapped to other GUIs (for instance the ProLine software, http://proline.profiproteomics.fr/), so as to provide the same statistical pipeline in a different computational environment.

DAPAR and ProStaR are actively maintained. Future versions of DAPAR will include additional algorithms for the five aforementioned processing steps, as well as possibly new steps, such as bioanalysis and biological inference. ProStaR will include the interfaces to these new functionalities, as well as predefined pipelines proposing only a restricted set of functionalities that are particularly adapted to specific proteomics analyses (e.g. tandem affinity purification and subcellular localization). Finally, a demo version of ProStaR can be directly tested online at the following URL: http://www.prostar-proteomics.org.
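
As a rough illustration of the five processing steps discussed in the text (filtering, normalization, imputation, aggregation and differential analysis), the sketch below chains them in base R on simulated data. It deliberately does not call any DAPAR function: the toy matrix, all variable names, the filtering threshold (at least two observed values per condition), the median centering, the low-quantile imputation and the sum aggregation are assumptions picked for brevity among the options listed in the text, not the package's implementation.

```r
# Minimal base-R sketch of the five steps. NOT the DAPAR API: every name
# and parameter below is a hypothetical choice made for illustration.
set.seed(1)

# Toy data (assumption): 12 peptides x 6 samples of log2 intensities,
# 3 replicates per condition, 3 peptides per protein.
condition <- factor(rep(c("A", "B"), each = 3))
protein   <- rep(paste0("P", 1:4), each = 3)
x <- matrix(rnorm(12 * 6, mean = 20, sd = 0.3), nrow = 12,
            dimnames = list(paste0("pep", 1:12), paste0(condition, 1:3)))
# Protein P1 is made more abundant in condition B (+2 on the log2 scale).
x[protein == "P1", condition == "B"] <- x[protein == "P1", condition == "B"] + 2
x[2, 1] <- NA                   # one isolated missing value
x[12, condition == "B"] <- NA   # one peptide never observed in condition B

# 1. Filtering: keep peptides with at least 2 observed values per condition.
n_obs <- sapply(levels(condition), function(cl)
  rowSums(!is.na(x[, condition == cl, drop = FALSE])))
keep   <- apply(n_obs >= 2, 1, all)
x_f    <- x[keep, , drop = FALSE]
prot_f <- protein[keep]

# 2. Normalization: median centering, so all samples share the same median.
col_med <- apply(x_f, 2, median, na.rm = TRUE)
x_n <- sweep(x_f, 2, col_med - mean(col_med))

# 3. Imputation: replace remaining missing values by a low quantile of the
#    sample (a crude stand-in for a left-censoring assumption).
x_i <- x_n
for (j in seq_len(ncol(x_i))) {
  miss <- is.na(x_i[, j])
  if (any(miss)) x_i[miss, j] <- quantile(x_i[!miss, j], probs = 0.025)
}

# 4. Aggregation: sum peptide intensities per protein, back on the log2 scale.
prot <- log2(rowsum(2^x_i, group = prot_f))

# 5. Differential analysis: Welch t-test per protein (the default of t.test),
#    followed by Benjamini-Hochberg adjustment of the p-values.
pval <- apply(prot, 1, function(v)
  t.test(v[condition == "A"], v[condition == "B"])$p.value)
padj <- p.adjust(pval, method = "BH")
result <- data.frame(
  protein     = rownames(prot),
  log2FC      = rowMeans(prot[, condition == "B"]) - rowMeans(prot[, condition == "A"]),
  p_value     = pval,
  adj_p_value = padj
)
print(result)
```

Proteins whose adjusted p-value falls below a chosen threshold would then form the list of candidates for follow-up. In practice, each step offers several alternatives (for example quantile normalization or a limma moderated t-test instead of the choices made here), and the ProStaR and DAPAR tutorial remains the reference for what the packages actually provide.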
References

1.  MSnbase-an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation.

Authors:  Laurent Gatto; Kathryn S Lilley
Journal:  Bioinformatics       Date:  2011-11-22       Impact factor: 6.937

2.  pcaMethods--a bioconductor package providing PCA methods for incomplete data.

Authors:  Wolfram Stacklies; Henning Redestig; Matthias Scholz; Dirk Walther; Joachim Selbig
Journal:  Bioinformatics       Date:  2007-03-07       Impact factor: 6.937

3.  DAnTE: a statistical tool for quantitative analysis of -omics data.

Authors:  Ashoka D Polpitiya; Wei-Jun Qian; Navdeep Jaitly; Vladislav A Petyuk; Joshua N Adkins; David G Camp; Gordon A Anderson; Richard D Smith
Journal:  Bioinformatics       Date:  2008-05-03       Impact factor: 6.937

4.  Skyline: an open source document editor for creating and analyzing targeted proteomics experiments.

Authors:  Brendan MacLean; Daniela M Tomazela; Nicholas Shulman; Matthew Chambers; Gregory L Finney; Barbara Frewen; Randall Kern; David L Tabb; Daniel C Liebler; Michael J MacCoss
Journal:  Bioinformatics       Date:  2010-02-09       Impact factor: 6.937

5.  MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments.

Authors:  Meena Choi; Ching-Yun Chang; Timothy Clough; Daniel Broudy; Trevor Killeen; Brendan MacLean; Olga Vitek
Journal:  Bioinformatics       Date:  2014-05-02       Impact factor: 6.937

6.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

