
SAINT: probabilistic scoring of affinity purification-mass spectrometry data.

Hyungwon Choi, Brett Larsen, Zhen-Yuan Lin, Ashton Breitkreutz, Dattatreya Mellacheruvu, Damian Fermin, Zhaohui S Qin, Mike Tyers, Anne-Claude Gingras, Alexey I Nesvizhskii.

Abstract

We present 'significance analysis of interactome' (SAINT), a computational tool that assigns confidence scores to protein-protein interaction data generated using affinity purification-mass spectrometry (AP-MS). The method uses label-free quantitative data and constructs separate distributions for true and false interactions to derive the probability of a bona fide protein-protein interaction. We show that SAINT is applicable to data of different scales and protein connectivity and allows transparent analysis of AP-MS data.

Year:  2010        PMID: 21131968      PMCID: PMC3064265          DOI: 10.1038/nmeth.1541

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


The analysis of protein complexes and protein interaction networks is of central importance in biological research. A combination of affinity purification and mass spectrometry (AP-MS) has been increasingly used for both small-scale and large-scale analysis of protein complexes and interaction networks 1–4. However, the development of computational tools for the processing of AP-MS data has not kept pace with improvements in experimental approaches. In addition to the general challenge of false positive protein identifications in MS-based proteomic data 5, unfiltered AP-MS datasets contain a large number of nonspecifically binding proteins; filtering these contaminants represents the foremost computational challenge. While early methods filtered the noise using binary data (presence or absence of a protein), more recently proposed methods take into account quantitative information embedded in the mass spectrometric data (e.g. label-free quantification, such as spectral counts). For example, one recently described method, which we term PP-NSAF hereafter, converts the normalized spectral abundance factor (NSAF) into the posterior probability of a true interaction between a bait-prey pair using simple heuristics 6. Another method, CompPASS, computes scores that adjust observed spectral counts relative to the reproducibility of detection across biological replicates and to the frequency of observing prey proteins in purifications of different baits 7. Although both approaches are effective in analyzing the datasets for which they were developed, these scores are empirical transformations of spectral counts and lack a transparent probability model for the measurement errors in the data. In a recent work we introduced an advanced approach for statistical analysis of interaction data from AP-MS experiments utilizing label-free quantification, which we termed Significance Analysis of INTeractome (SAINT) 8.
Like PP-NSAF and CompPASS, our original SAINT approach was designed for the analysis of a specific dataset, the yeast kinase and phosphatase interactome. Here we present a generalized SAINT framework that can compute interaction probabilities in a variety of datasets. The method incorporates negative controls commonly generated as a part of the experimental study, but can also be applied to large datasets in the absence of such data. Here we illustrate the methodology and its advantages through the analysis of datasets of different sizes and network density levels: from a large, sparsely connected network involving human deubiquitinating enzymes to a smaller, highly interconnected network for chromatin remodeling proteins, and even to the analysis of a single bait, the protein CDC23.
The aim of SAINT is to convert the label-free quantification (spectral count X_ij) for a prey protein i identified in a purification of bait j into the probability of true interaction between the two proteins, P(True | X_ij). The spectral counts for each prey-bait pair are modeled with a mixture distribution of two components representing true and false interactions. Note that these distributions are specific to each bait-prey pair. The parameters for true and false distributions, P(X_ij | True) and P(X_ij | False), and the prior probability π of true interactions in the dataset, are inferred from the spectral counts for all interactions involving prey i and bait j. SAINT normalizes spectral counts to the length of the proteins and to the total number of spectra in the purification. In addition to the experimental data for bait proteins, AP-MS data often contain negative controls (Fig. 1a). When these are available, SAINT estimates the spectral count distribution for false interactions directly from the negative controls, which makes the modeling approach semi-supervised (see Methods).
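The two-component mixture described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both components are Poisson distributions with fixed means; the text here does not specify the likelihood, and the rates and prior below are made-up values, not parameters estimated by SAINT.

```python
import math

def poisson_pmf(count: int, rate: float) -> float:
    """Poisson probability of observing `count` spectra given a mean `rate`."""
    return math.exp(-rate) * rate ** count / math.factorial(count)

def posterior_true(count: int, rate_true: float, rate_false: float,
                   prior_true: float) -> float:
    """P(True | X) by Bayes rule for a two-component mixture.

    The Poisson likelihood is an assumption of this sketch, not a
    statement about the distribution used in the SAINT software.
    """
    weighted_true = prior_true * poisson_pmf(count, rate_true)
    weighted_false = (1.0 - prior_true) * poisson_pmf(count, rate_false)
    return weighted_true / (weighted_true + weighted_false)

# Hypothetical parameters for one prey-bait pair: false interactions
# average 1 spectrum, true interactions average 10 spectra, and 10% of
# all candidate interactions are assumed to be true.
RATE_FALSE, RATE_TRUE, PRIOR_TRUE = 1.0, 10.0, 0.1

for observed in (1, 5, 12):
    probability = posterior_true(observed, RATE_TRUE, RATE_FALSE, PRIOR_TRUE)
    print(f"spectral count {observed:2d}: P(True | X) = {probability:.4f}")
```

With these illustrative values, a low count is attributed almost entirely to the false component and a high count to the true component, while intermediate counts receive intermediate probabilities.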
SAINT modeling can also be performed without negative control data, given that a sufficient number of independent baits are profiled, and provided that these baits are not densely interconnected. In this case (illustrated in Fig. 1b), a prey detected in the purification of a bait is scored in reference to the quantitative information for the same prey across purifications of all other baits in the dataset. While this is possible for large datasets such as the yeast kinase and phosphatase network 8 and the human deubiquitinating (DUB) enzyme interaction network 7 (each of which contains >75 baits; see below), this unsupervised approach involves additional assumptions and separate treatment of high and low frequency prey proteins (see Methods).
Figure 1

Probability model in SAINT

a–b Interaction data in the presence (a) and absence (b) of control purifications. Top: schematic of the experimental AP-MS procedure; Bottom: illustration of a spectral count interaction table. c. Modeling spectral count distributions for true and false interactions. For the interaction between prey i and bait j, SAINT utilizes all relevant data for the two proteins, as shown in the column of the bait (green) and the data in the row of the prey (orange) in a and b. d. Probability is calculated for each replicate by application of Bayes rule, and a summary probability is calculated for the interaction pair (i,j).

One challenge in modeling AP-MS data is the limited number of replicates available for each bait. SAINT addresses this problem by inferring individual bait-prey interaction parameters via joint modeling of the entire bait-prey data. To this end, SAINT defines a protein-specific abundance parameter and establishes a multiplicative model in the mixture component distributions. In other words, if prey i and bait j interact, then the “interaction abundance” (the spectral count of the prey i in purification with bait j) is assumed to be proportional to α_i × α_j. Under this assumption, the protein-specific abundance parameters α_i and α_j can be learned not only from the interaction between the two proteins themselves, but also from other bona fide interactions that involve either one of them. The same principle applies to false interactions. Hence SAINT builds a large number of mixture distributions by pooling data (separate mixture distributions for individual prey-bait pairs), but all models are interconnected through the shared abundance parameters. The probability distributions P(X_ij | True) and P(X_ij | False) are then used to calculate the posterior probability of true interaction P(True | X_ij) (Fig. 1c and 1d, Methods). For baits profiled in replicates, the next step involves computing a combined probability score from independent scoring of each replicate (see Methods). Finally, SAINT probabilities can be used to estimate the false discovery rate (FDR). By ordering interactions in a decreasing order of probabilities, a threshold can be selected that considers the average of the complement probabilities as the Bayesian FDR 9. Although the accuracy of FDR estimates remains to be validated, the availability of an objective reliability measure that has been widely used is an advantage over other methods.
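The Bayesian FDR estimate described above can be sketched as follows: rank the interactions by decreasing probability and, for each candidate cutoff, average the complement probabilities (1 - p) of everything at or above it. The function names and the example probabilities are hypothetical and serve only to illustrate the calculation.

```python
from typing import List, Optional, Tuple

def bayesian_fdr(probabilities: List[float]) -> List[Tuple[float, float]]:
    """Return (cutoff, estimated FDR) pairs, one per distinct probability.

    The estimated FDR at a cutoff is the mean of (1 - p) over all
    interactions whose probability is at or above that cutoff.
    """
    ranked = sorted(probabilities, reverse=True)
    results = []
    complement_sum = 0.0
    for rank, probability in enumerate(ranked, start=1):
        complement_sum += 1.0 - probability
        # Report only once all interactions tied at this value are included.
        last_of_tie = rank == len(ranked) or ranked[rank] < probability
        if last_of_tie:
            results.append((probability, complement_sum / rank))
    return results

def threshold_for_fdr(probabilities: List[float],
                      target_fdr: float) -> Optional[float]:
    """Lowest probability cutoff whose estimated FDR is <= target_fdr."""
    best = None
    for cutoff, fdr in bayesian_fdr(probabilities):
        if fdr <= target_fdr:
            best = cutoff
    return best

# Made-up probabilities for five candidate interactions.
example = [0.99, 0.95, 0.90, 0.60, 0.20]
for cutoff, fdr in bayesian_fdr(example):
    print(f"cutoff {cutoff:.2f}: estimated FDR {fdr:.3f}")
print("cutoff for 5% FDR:", threshold_for_fdr(example, 0.05))
```

Because the complements grow as the ranking descends, the estimated FDR never decreases when the cutoff is lowered, so the last cutoff that meets the target is the most permissive one.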
The performance of the generalized SAINT model was first investigated using a human dataset centered around four key protein complexes involved in chromatin remodeling, Prefoldin, hINO80, SRCAP, and TRRAP/TIP60 (referred to as the TIP49 dataset) 6. While the original publication focused the analysis on the interaction network observed between a core set of 65 proteins, the entire dataset provided by the authors of the study is analyzed here. The dataset consists of 27 baits (35 purifications) and 1207 preys which yielded 5521 unfiltered interactions. Thirty-five negative controls were included in the dataset, allowing semi-supervised modeling (Fig. 1a; Supplementary Table 1). We applied SAINT to these data and compared the results to PP-NSAF 6 and CompPASS Z and DN scores 7,10, which we re-implemented in-house (see Methods). We note that PP-NSAF 6 removes all interactions involving prey proteins for which the sum of squared NSAF values across the negative control purifications is higher than that in the experiments containing bait proteins. CompPASS is the only method that does not incorporate negative controls in scoring. SAINT selected 1375 interactions at the probability threshold 0.9, which was approximately equivalent to an estimated FDR of 2%. In PP-NSAF, since arbitrary cutoffs were set to define high, moderate, and low probability interaction sets, the same number of top scoring interactions was selected from the method (corresponding to a PP-NSAF probability 0.2 or higher). In CompPASS, the same number of interactions corresponded to a DN-score threshold of 1.48 (Supplementary Table 1). We evaluated the performance of each algorithm first by benchmarking the selected interactions against two interaction databases, BioGRID 11 and iRefWeb 12 (Fig. 2a), and second by assessing the co-annotation rate of interaction partners to common Gene Ontology (GO) terms in Biological Processes (Fig. 2b; Supplementary Table 1).
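The comparison strategy above (select the same number of top-scoring interactions from each method, then measure overlap with previously reported interactions) can be sketched as below. The protein names, scores, and reference set are invented placeholders, and treating interactions as unordered pairs is an assumption of this sketch rather than a detail given in the text.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]

def top_n(scores: Dict[Pair, float], n: int) -> List[Pair]:
    """Return the n highest-scoring bait-prey pairs."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def literature_overlap(selected: Iterable[Pair],
                       reported: Iterable[Pair]) -> float:
    """Fraction of selected pairs found in a reference set.

    Pairs are compared as unordered sets, so (A, B) matches (B, A).
    """
    reported_keys = {frozenset(pair) for pair in reported}
    selected = list(selected)
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if frozenset(pair) in reported_keys)
    return hits / len(selected)

# Hypothetical scores from one scoring method.
method_scores = {
    ("BAIT_A", "PREY_1"): 0.98,
    ("BAIT_A", "PREY_2"): 0.40,
    ("BAIT_B", "PREY_1"): 0.91,
    ("BAIT_B", "PREY_3"): 0.10,
}
# Hypothetical set of previously reported interactions.
reference = {("PREY_1", "BAIT_A"), ("BAIT_B", "PREY_3")}

chosen = top_n(method_scores, 2)
print("selected:", chosen)
print("overlap with reference:", literature_overlap(chosen, reference))
```

Fixing the number of selected interactions across methods keeps the comparison independent of each method's score scale.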
SAINT-filtered interactions (with controls) consistently showed the highest overlap with previously reported interactions and the highest co-annotation rates to terms relevant to chromatin remodeling, including histone acetylation, protein amino acid acetylation, chromatin organization and modification, and cellular macromolecular complex assembly. Variation of the SAINT probability threshold (0.8 to 0.95) did not qualitatively change this conclusion (data not shown). Note that omission of negative controls from SAINT modeling decreased the literature overlap (Supplementary Fig. 1). Explicit incorporation of the negative control data improves the robustness of modeling, especially in small to medium datasets.
Figure 2

Analysis of TIP49 and DUB datasets

a. Benchmarking of filtered interactions in the TIP49 dataset by the overlap with interactions previously reported in BioGRID and iRefWeb databases. b. Co-annotation of interaction partners to common GO terms in Biological Processes in the TIP49 dataset. c. Benchmarking against BioGRID and iRefWeb in the DUB dataset. d. Co-annotation to GO terms in the DUB dataset.

The performance of SAINT for large-scale datasets without negative controls (Fig. 1b) was tested on the human deubiquitinating enzymes (DUB) dataset 7 (this dataset was used in the development of CompPASS). High confidence interactions from SAINT were compared to the high confidence set from CompPASS (see Supplementary Table 2). Due to the absence of negative controls, it was not possible to apply PP-NSAF to this dataset. SAINT probabilities and DN scores were notably correlated (Pearson correlation r=0.79). At a probability threshold of 0.8, SAINT selected 1300 interactions, while CompPASS DN≥1 (the threshold value used in ref. 7) reported 1377 interactions. Of these, 1051 interactions were common to both methods. Reflecting the similarity of selected interactions, SAINT and CompPASS recovered previously reported interactions at comparable rates (Fig. 2c). In the top 1000 interactions, SAINT showed higher overlap with literature data. The co-annotation of interaction partners to the common GO terms also showed similar results between the two methods (Fig. 2d), including relevant terms such as positive and negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle, proteasome, etc. (Supplementary Table 2). While SAINT and CompPASS recovered largely overlapping interactions, SAINT removed the interactions identified with 1–2 spectral counts, which were still scored by CompPASS if they were specific to a single bait protein and detected in duplicates.
Another advantage of SAINT over other methods is that it is applicable to the analysis of small-scale datasets for which control purifications are available; this extends to the case of a single bait. We illustrate this by using a recent dataset 13 containing 3 experimental purifications of the bait CDC23 and 3 control purifications. In the original analysis, the authors of the study identified true interactions using ion intensity-based quantification followed by a simple t-test.
We applied the SAINT approach to the same dataset by using spectral counts (the data were re-searched in-house as described in Methods). The results obtained by SAINT were nearly identical to the initial report (Supplementary Table 3), the sole exception being the single peptide hit C11orf51, which was reported as a new interactor in the original analysis 13, but which was removed by SAINT.
In summary, SAINT is a probability-based model that is generally applicable to mass spectrometry-based interaction data. The SAINT model presented here is based on label-free quantification using spectral counts, a parameter that is easily extracted from most AP-MS datasets. However, SAINT can also be extended to model other types of quantitative parameters such as peptide ion intensity 14 or other continuous variables 15, which can be accommodated by simply substituting the likelihood with an appropriate continuous distribution.
Supplementary Figure 1. Literature overlap for the TIP49 dataset using SAINT with and without control purification data.
Supplementary Table 1. Supplementary data for the TIP49 dataset. 1A) List of all detected interactions and scores from PP-NSAF, CompPASS, and SAINT. 1B) All interactions in control purifications were included in a separate table after merging of 35 technical replicate purifications into 9 purifications. 1C) Table of technical replicates of control purifications. 1D) GO terms enrichment in top scoring interactions for each scoring method.
Supplementary Table 2. Supplementary data for the DUB dataset. 2A) List of all detected interactions and scores from CompPASS and SAINT. 2B–D) GO terms enrichment in top scoring interactions for each scoring method.
Supplementary Table 3. Supplementary data for the CDC23 dataset. List of all detected interactions with SAINT scores and results reported by t-test.
  15 in total

1.  Proteome survey reveals modularity of the yeast cell machinery.

Authors:  Anne-Claude Gavin; Patrick Aloy; Paola Grandi; Roland Krause; Markus Boesche; Martina Marzioch; Christina Rau; Lars Juhl Jensen; Sonja Bastuck; Birgit Dümpelfeld; Angela Edelmann; Marie-Anne Heurtier; Verena Hoffman; Christian Hoefert; Karin Klein; Manuela Hudak; Anne-Marie Michon; Malgorzata Schelder; Markus Schirle; Marita Remor; Tatjana Rudi; Sean Hooper; Andreas Bauer; Tewis Bouwmeester; Georg Casari; Gerard Drewes; Gitte Neubauer; Jens M Rick; Bernhard Kuster; Peer Bork; Robert B Russell; Giulio Superti-Furga
Journal:  Nature       Date:  2006-01-22       Impact factor: 49.962

2.  An integrated mass spectrometric and computational framework for the analysis of protein interaction networks.

Authors:  Oliver Rinner; Lukas N Mueller; Martin Hubálek; Markus Müller; Matthias Gstaiger; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2007-02-25       Impact factor: 54.908

3.  Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics.

Authors:  Mihaela E Sardiu; Yong Cai; Jingji Jin; Selene K Swanson; Ronald C Conaway; Joan W Conaway; Laurence Florens; Michael P Washburn
Journal:  Proc Natl Acad Sci U S A       Date:  2008-01-24       Impact factor: 11.205

4.  A global protein kinase and phosphatase interaction network in yeast.

Authors:  Ashton Breitkreutz; Hyungwon Choi; Jeffrey R Sharom; Lorrie Boucher; Victor Neduva; Brett Larsen; Zhen-Yuan Lin; Bobby-Joe Breitkreutz; Chris Stark; Guomin Liu; Jessica Ahn; Danielle Dewar-Darch; Teresa Reguly; Xiaojing Tang; Ricardo Almeida; Zhaohui Steve Qin; Tony Pawson; Anne-Claude Gingras; Alexey I Nesvizhskii; Mike Tyers
Journal:  Science       Date:  2010-05-21       Impact factor: 47.728

5.  Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions.

Authors:  Nina C Hubner; Alexander W Bird; Jürgen Cox; Bianca Splettstoesser; Peter Bandilla; Ina Poser; Anthony Hyman; Matthias Mann
Journal:  J Cell Biol       Date:  2010-05-17       Impact factor: 10.539

6.  iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence.

Authors:  Brian Turner; Sabry Razick; Andrei L Turinsky; James Vlasblom; Edgard K Crowdy; Emerson Cho; Kyle Morrison; Ian M Donaldson; Shoshana J Wodak
Journal:  Database (Oxford)       Date:  2010-10-12       Impact factor: 3.451

7.  Defining the human deubiquitinating enzyme interaction landscape.

Authors:  Mathew E Sowa; Eric J Bennett; Steven P Gygi; J Wade Harper
Journal:  Cell       Date:  2009-07-16       Impact factor: 41.582

8.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.

Authors:  Nevan J Krogan; Gerard Cagney; Haiyuan Yu; Gouqing Zhong; Xinghua Guo; Alexandr Ignatchenko; Joyce Li; Shuye Pu; Nira Datta; Aaron P Tikuisis; Thanuja Punna; José M Peregrín-Alvarez; Michael Shales; Xin Zhang; Michael Davey; Mark D Robinson; Alberto Paccanaro; James E Bray; Anthony Sheung; Bryan Beattie; Dawn P Richards; Veronica Canadien; Atanas Lalev; Frank Mena; Peter Wong; Andrei Starostine; Myra M Canete; James Vlasblom; Samuel Wu; Chris Orsi; Sean R Collins; Shamanta Chandran; Robin Haw; Jennifer J Rilstone; Kiran Gandi; Natalie J Thompson; Gabe Musso; Peter St Onge; Shaun Ghanny; Mandy H Y Lam; Gareth Butland; Amin M Altaf-Ul; Shigehiko Kanaya; Ali Shilatifard; Erin O'Shea; Jonathan S Weissman; C James Ingles; Timothy R Hughes; John Parkinson; Mark Gerstein; Shoshana J Wodak; Andrew Emili; Jack F Greenblatt
Journal:  Nature       Date:  2006-03-22       Impact factor: 49.962

9.  Large-scale mapping of human protein-protein interactions by mass spectrometry.

Authors:  Rob M Ewing; Peter Chu; Fred Elisma; Hongyan Li; Paul Taylor; Shane Climie; Linda McBroom-Cerajewski; Mark D Robinson; Liam O'Connor; Michael Li; Rod Taylor; Moyez Dharsee; Yuen Ho; Adrian Heilbut; Lynda Moore; Shudong Zhang; Olga Ornatsky; Yury V Bukhman; Martin Ethier; Yinglun Sheng; Julian Vasilescu; Mohamed Abu-Farha; Jean-Philippe Lambert; Henry S Duewel; Ian I Stewart; Bonnie Kuehl; Kelly Hogue; Karen Colwill; Katharine Gladwish; Brenda Muskat; Robert Kinach; Sally-Lin Adams; Michael F Moran; Gregg B Morin; Thodoros Topaloglou; Daniel Figeys
Journal:  Mol Syst Biol       Date:  2007-03-13       Impact factor: 11.429

10.  The BioGRID Interaction Database: 2008 update.

Authors:  Bobby-Joe Breitkreutz; Chris Stark; Teresa Reguly; Lorrie Boucher; Ashton Breitkreutz; Michael Livstone; Rose Oughtred; Daniel H Lackner; Jürg Bähler; Valerie Wood; Kara Dolinski; Mike Tyers
Journal:  Nucleic Acids Res       Date:  2007-11-13       Impact factor: 16.971
