
Mass-spectrometry-based spatial proteomics data analysis using pRoloc and pRolocdata.

Laurent Gatto, Lisa M Breckels, Samuel Wieczorek, Thomas Burger, Kathryn S Lilley.

Abstract

MOTIVATION: Experimental spatial proteomics, i.e. the high-throughput assignment of proteins to sub-cellular compartments based on quantitative proteomics data, promises to shed new light on many biological processes given adequate computational tools.
RESULTS: Here we present pRoloc, a complete infrastructure to support and guide the sound analysis of quantitative mass-spectrometry-based spatial proteomics data. It provides unsupervised and supervised machine-learning functionality for data exploration and protein classification, as well as novelty detection to identify new putative sub-cellular clusters. The software builds upon existing infrastructure for data management and data processing.

Year:  2014        PMID: 24413670      PMCID: PMC3998135          DOI: 10.1093/bioinformatics/btu013

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Knowledge of the spatial distribution of proteins is of critical importance to elucidate their role and refine our understanding of cellular processes. Mis-localization of proteins has been associated with cellular dysfunction and disease states (Kau ; Laurila ; Park ), highlighting the importance of localization studies. Spatial or organelle proteomics is the systematic study of proteins and their sub-cellular localization; the compartments studied can be organelles, i.e. structures defined by lipid bi-layers, macro-molecular assemblies of proteins and nucleic acids, or large protein complexes. Despite technological advances in spatial proteomics experimental designs and progress in mass-spectrometry (Gatto ), software support is lacking. To address this, we developed the pRoloc package, which provides a wide range of thoroughly documented analysis methodologies. The software includes state-of-the-art statistical machine-learning algorithms and bundles them in a consistent framework that accommodates any experimental design and quantitation strategy.

2 AVAILABLE FUNCTIONALITY

pRoloc makes use of the architecture implemented in the MSnbase package (Gatto and Lilley, 2012) for data storage, feature and sample annotation (meta-data) and data processing, such as scaling, normalization and missing data imputation. We also distribute 16 annotated datasets in the pRolocdata package, which are used for illustration of different pipelines as well as algorithm testing and development. Algorithms for (i) clustering, (ii) novelty detection and (iii) classification are proposed along with visualization functionalities.
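As an illustrative sketch (not taken from the paper), the R snippet below shows how one of the pRolocdata datasets might be loaded and inspected with MSnbase accessors. It assumes that the dunkley2006 dataset stores its organelle markers in a feature variable named markers and that the data contain no missing values; the variable name dunkley_norm is introduced here for illustration only.

```r
library(MSnbase)
library(pRolocdata)

# Load an annotated example dataset (an MSnSet object)
data(dunkley2006)

dim(dunkley2006)            # proteins (rows) x fractions (columns)
head(exprs(dunkley2006))    # quantitative protein profiles
head(fData(dunkley2006))    # feature meta-data (assumed to include "markers")
pData(dunkley2006)          # sample meta-data

# Number of proteins per marker class
table(fData(dunkley2006)$markers)

# Rescale every protein profile so that its intensities sum to 1
dunkley_norm <- normalise(dunkley2006, method = "sum")
summary(rowSums(exprs(dunkley_norm)))
```

Keeping the quantitative matrix and both kinds of meta-data in a single object means that subsetting or processing the data keeps annotations and values in sync.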

2.1 Clustering

Unsupervised machine-learning techniques are used primarily as data exploration and quality control tools. Several critical factors, such as feature-level quantitation values, the extent of missing values and organelle markers, can be overlaid on the data clusters to support effective data exploration and quality control.
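The exploratory step might look like the following sketch, which overlays known markers on a PCA plot and then compares an unsupervised k-means partition with the marker labels. The choice of k-means, the number of restarts and the seed are assumptions made for illustration, not the authors' procedure; the snippet also assumes that unlabelled proteins carry the value "unknown" in the markers feature variable and that the dataset has no missing values.

```r
library(pRoloc)
library(pRolocdata)
data(dunkley2006)

# PCA scatter plot with marker proteins colour-coded by organelle
plot2D(dunkley2006, fcol = "markers")
addLegend(dunkley2006, fcol = "markers", where = "bottomright", cex = 0.7)

# Number of marker classes, ignoring unlabelled proteins
classes <- setdiff(unique(as.character(fData(dunkley2006)$markers)), "unknown")
k <- length(classes)

# Unsupervised partition of the protein profiles (illustrative settings)
set.seed(1)
km <- kmeans(exprs(dunkley2006), centers = k, nstart = 10)

# Cross-tabulate clusters against marker classes as a quality check
table(cluster = km$cluster, marker = fData(dunkley2006)$markers)
```

Clusters dominated by a single marker class suggest that the fractionation resolved that organelle well, whereas mixed clusters point to overlapping profiles.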

2.2 Novelty detection

Reliable classification depends on the availability of well-characterized labeled data, termed ‘marker proteins’. These reliable organelle residents define the set of observed organelles and are used to train a classifier. It is, however, laborious and extremely difficult to manually define reliable markers for all possible sub-cellular structures. As a result, any organelle without suitable markers will be completely omitted from subsequent classification. pRoloc provides an implementation of the phenoDisco novelty detection algorithm (Breckels ) that, based on a minimal set of markers and unlabeled data, can be used to effectively detect new putative clusters in the data, beyond those that were initially manually described (Fig. 1).
Fig. 1.

Current state-of-the-art experimental organelle proteomics data analysis with pRoloc. On the left, we replicated the original findings from Tan on Drosophila embryos. On the right, we present results of the same data set obtained with pRoloc, utilizing the novelty discovery functionality (new color-coded organelles) and a class-weighted support vector machine (SVM) algorithm with classifier posterior probabilities (point sizes)


2.3 Classification

Since the development and refinement of spatial proteomics experiments, several classification methods have been used: partial least-squares discriminant analysis (Dunkley ), SVMs (Trotter ), random forest (Ohta ), neural networks (Tardif ) and naive Bayes (Nikolovski ), all available in pRoloc. In addition, other novel algorithms are proposed, such as PerTurbo (Courty ). We have compared and contrasted these algorithms using reliable marker sets and demonstrate in the package documentation that the driving factor for good classification is the intrinsic quality of the data itself, i.e. efficient cellular content separation, accurate quantitation (Jakobsen ), etc. This illustrates the minor importance of the classification algorithm relative to thorough data exploration and quality control. While the exact algorithm might not be the major reason for a good analysis, it is essential to guarantee that the chosen algorithm is applied optimally. A central design decision in the development of the classification schema was therefore to explicitly implement model parameter optimization routines to maximize the generalization power of the results.
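A minimal sketch of such a parameter optimization for the SVM classifier is given below. The reduced parameter grid and the small number of iterations (times = 5) are assumptions chosen only to keep the example fast; they are not recommended settings, and the existing markers feature variable is used for training.

```r
library(pRoloc)
library(pRolocdata)
data(dunkley2006)

# Grid search over the SVM cost and Gaussian kernel width (sigma),
# repeated on several random splits of the marker proteins.
# Grid and 'times' are deliberately small for illustration.
params <- svmOptimisation(dunkley2006, fcol = "markers",
                          cost = 2^(-2:2), sigma = 10^(-2:0),
                          times = 5, verbose = FALSE)

params                     # summary of the search
best <- getParams(params)  # best-performing parameter pair
best
```

Repeating the search on several splits, rather than relying on a single one, reduces the risk of picking parameters that only suit one particular partition of the markers.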

3 A TYPICAL PIPELINE

A typical pipeline is summarized below using data from Arabidopsis thaliana callus (Dunkley ). We first load the required packages and example data. The phenoDisco function is then run to identify new putative clusters that, after validation (the pd.markers feature meta-data), can be used for classification with the SVM algorithm (with a Gaussian kernel). The algorithm's parameters are first optimized and then applied in the actual classification. Finally, the plot2D function is used to generate an annotated scatter plot along the first two principal components (Fig. 1).

    # Load the packages and the example data
    library(pRoloc)
    library(pRolocdata)
    data(dunkley2006)
    # Novelty detection: identify new putative clusters
    res <- phenoDisco(dunkley2006)
    # Optimize the SVM parameters, then classify using the validated markers
    p <- svmOptimisation(res, fcol="pd.markers")
    res <- svmClassification(res, p, fcol="pd.markers")
    # Annotated scatter plot along the first two principal components
    plot2D(res, fcol="svm")
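Once a classification like the one above has been run, the results can be inspected in the feature meta-data. The sketch below is an assumption-laden illustration rather than part of the published pipeline: to stay fast and self-contained it skips phenoDisco, trains on the existing markers feature variable, and uses hypothetical fixed values for sigma and cost and a hypothetical score threshold of 0.9.

```r
library(pRoloc)
library(pRolocdata)
data(dunkley2006)

# Classify with fixed, purely illustrative SVM parameters
res <- svmClassification(dunkley2006, fcol = "markers",
                         sigma = 0.1, cost = 1)

# Predicted classes and classifier scores are stored as feature variables
fd <- fData(res)
table(fd$svm)
summary(fd$svm.scores)

# Keep only confident assignments; the 0.9 threshold is a hypothetical choice
assigned <- ifelse(fd$svm.scores >= 0.9, as.character(fd$svm), "unknown")
table(assigned)

# Scale point sizes by classification score, in the spirit of Fig. 1
ptsze <- exp(fd$svm.scores) - 1
plot2D(res, fcol = "svm", cex = ptsze)
```

Thresholding the scores trades coverage for reliability: a stricter cut-off leaves more proteins unassigned but reduces the number of doubtful localizations.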

4 CONCLUSIONS

The need for statistically sound proteomics data analysis has spawned interest in the proteomics community (Gatto and Christoforou, 2013) for R and Bioconductor (Gentleman ). pRoloc is a mature R package that provides users with dedicated data infrastructure, visualization functionality and state-of-the-art machine-learning methodologies, enabling unparalleled insight into experimental spatial proteomics data. It is also a framework to further develop spatial proteomics data analysis and novel pipelines. Multiple organelle proteomics datasets illustrating diverse experimental designs are available in pRolocdata. Both packages come with thorough documentation and represent a unique framework for sound and reproducible organelle proteomics data analysis.

Funding: European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067); BBSRC Tools and Resources Development Fund (Award BB/K00137X/1); Prospectom project (Mastodons 2012 CNRS challenge).

Conflict of Interest: none declared.

REFERENCES (10 of 15 shown)

Review 1.  Nuclear transport and cancer: from mechanism to intervention.

Authors:  Tweeny R Kau; Jeffrey C Way; Pamela A Silver
Journal:  Nat Rev Cancer       Date:  2004-02       Impact factor: 60.716

Review 2.  Organelle proteomics experimental designs and analysis.

Authors:  Laurent Gatto; Juan Antonio Vizcaíno; Henning Hermjakob; Wolfgang Huber; Kathryn S Lilley
Journal:  Proteomics       Date:  2010-11-02       Impact factor: 3.984

3.  PredAlgo: a new subcellular localization prediction tool dedicated to green algae.

Authors:  Marianne Tardif; Ariane Atteia; Michael Specht; Guillaume Cogne; Norbert Rolland; Sabine Brugière; Michael Hippler; Myriam Ferro; Christophe Bruley; Gilles Peltier; Olivier Vallon; Laurent Cournac
Journal:  Mol Biol Evol       Date:  2012-07-23       Impact factor: 16.240

4.  Mapping the Arabidopsis organelle proteome.

Authors:  Tom P J Dunkley; Svenja Hester; Ian P Shadforth; John Runions; Thilo Weimar; Sally L Hanton; Julian L Griffin; Conrad Bessant; Federica Brandizzi; Chris Hawes; Rod B Watson; Paul Dupree; Kathryn S Lilley
Journal:  Proc Natl Acad Sci U S A       Date:  2006-04-17       Impact factor: 11.205

5.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

6.  Improved sub-cellular resolution via simultaneous analysis of organelle proteomics data across varied experimental conditions.

Authors:  Matthew W B Trotter; Pawel G Sadowski; Tom P J Dunkley; Arnoud J Groen; Kathryn S Lilley
Journal:  Proteomics       Date:  2010-12       Impact factor: 3.984

7.  The effect of organelle discovery upon sub-cellular protein localisation.

Authors:  L M Breckels; L Gatto; A Christoforou; A J Groen; K S Lilley; M W B Trotter
Journal:  J Proteomics       Date:  2013-03-21       Impact factor: 4.044

8.  The protein composition of mitotic chromosomes determined using multiclassifier combinatorial proteomics.

Authors:  Shinya Ohta; Jimi-Carlo Bukowski-Wills; Luis Sanchez-Pulido; Flavia de Lima Alves; Laura Wood; Zhuo A Chen; Melpi Platani; Lutz Fischer; Damien F Hudson; Chris P Ponting; Tatsuo Fukagawa; William C Earnshaw; Juri Rappsilber
Journal:  Cell       Date:  2010-09-03       Impact factor: 41.582

9.  Protein localization as a principal feature of the etiology and comorbidity of genetic diseases.

Authors:  Solip Park; Jae-Seong Yang; Young-Eun Shin; Juyong Park; Sung Key Jang; Sanguk Kim
Journal:  Mol Syst Biol       Date:  2011-05-24       Impact factor: 11.429

10.  Putative glycosyltransferases and other plant Golgi apparatus proteins are revealed by LOPIT proteomics.

Authors:  Nino Nikolovski; Denis Rubtsov; Marcelo P Segura; Godfrey P Miles; Tim J Stevens; Tom P J Dunkley; Sean Munro; Kathryn S Lilley; Paul Dupree
Journal:  Plant Physiol       Date:  2012-08-24       Impact factor: 8.340
