
PhosphoPep--a phosphoproteome resource for systems biology research in Drosophila Kc167 cells.

Bernd Bodenmiller, Johan Malmstrom, Bertran Gerrits, David Campbell, Henry Lam, Alexander Schmidt, Oliver Rinner, Lukas N Mueller, Paul T Shannon, Patrick G Pedrioli, Christian Panse, Hoo-Keun Lee, Ralph Schlapbach, Ruedi Aebersold.

Abstract

The ability to analyze and understand the mechanisms by which cells process information is a key question of systems biology research. Such mechanisms critically depend on reversible phosphorylation of cellular proteins, a process that is catalyzed by protein kinases and phosphatases. Here, we present PhosphoPep, a database containing more than 10 000 unique high-confidence phosphorylation sites mapping to nearly 3500 gene models and 4600 distinct phosphoproteins of the Drosophila melanogaster Kc167 cell line. This constitutes the most comprehensive phosphorylation map of any single source to date. To enhance the utility of PhosphoPep, we also provide an array of software tools that allow users to browse through phosphorylation sites on single proteins or pathways, to easily integrate the data with other, external data types such as protein-protein interactions and to search the database via spectral matching. Finally, all data can be readily exported, for example, for targeted proteomics approaches and the data thus generated can be again validated using PhosphoPep, supporting iterative cycles of experimentation and analysis that are typical for systems biology research.

Year: 2007; PMID: 17940529; PMCID: PMC2063582; DOI: 10.1038/msb4100182

Source DB: PubMed; Journal: Mol Syst Biol; ISSN: 1744-4292; Impact factor: 11.429

Introduction

It is the premise of systems biology that biological processes are studied as integrated systems consisting of multiple interacting elements and that the basis for the system's properties is the contextual information of the elements' interactions. Operationally, biological systems are frequently represented as networks and their properties are studied by iterative cycles of targeted network perturbation followed by quantitative measurement of all the system's elements (Ideker ). Networks typically studied are transcriptional networks analyzed by gene expression arrays (Schena ; Lipshutz ) and ChIP-on-chip assays (Ren ; Iyer ), protein interaction networks analyzed by the yeast two-hybrid system (Fields and Song, 1989; Uetz ; Giot ) or mass spectrometry of purified protein complexes (Rigaut ; Gavin ; Gingras ; Ewing ) and genetic interactions analyzed by synthetic lethal screens (Tong ). Protein phosphorylation, a network of protein kinases and phosphatases and their respective cellular substrates, is a universal regulatory mechanism and plays a pivotal role in the control of most cellular processes. Thus, the understanding of protein phosphorylation networks and their dynamic changes is of fundamental importance for systems biology (Hunter, 2000). Recently, phosphoproteomics has become a robust technique for the analysis of protein phosphorylation networks. Typically, (phospho)protein samples are digested with a protease, and the peptides are analyzed by liquid-chromatography tandem mass spectrometry (LC-MS/MS) (Aebersold and Mann, 2003). Because phosphopeptides are present at a low concentration after the digestion of a proteome, it is necessary to specifically enrich them before analysis (Aebersold and Goodlett, 2001; Reinders and Sickmann, 2005). Recently, several phosphopeptide enrichment methods have been described and their performance has been compared (Bodenmiller ). They include affinity chromatography and phosphoramidate chemistry-based purification.
The most commonly used affinity-based methods are immobilized metal affinity chromatography (IMAC) (Andersson and Porath, 1986) and titanium dioxide (TiO2) (Pinkse ; Larsen ). Alternatively, phosphoramidate chemistry (PAC) can be used, in which the phosphopeptides are covalently captured on an amino-modified solid phase (e.g. a dendrimer (Tao ) or glass beads (Zhou ; Bodenmiller )) and are released by acid hydrolysis of the phosphoramidate bond (Zhou ; Tao ; Bodenmiller , 2007b). Using the technologies described above, several large-scale data sets on protein phosphorylation have recently been published (Ficarro ; Beausoleil ; Schwartz and Gygi, 2005; Olsen ). However, a number of factors limit the usefulness of these data for systems biology research. First, the data sets are far from being complete. Second, false-positive and false-negative error rates are frequently unknown and spectra may not be accessible to independently assess the quality of peptide identification and assigned site of phosphorylation. Third, the data are mostly presented as lists of identified phosphopeptides, limiting their use for further experimentation or meta-analysis. In this report, we describe PhosphoPep, a database for phosphopeptides and phosphoproteins from Drosophila melanogaster Kc167 cells and a suite of associated software tools as a resource for systems biology research in D. melanogaster. The small genome size, short generation time, the highly developed genetic tools that can be easily combined with biochemical analysis (Bier, 2005) and the high degree of conservation of signaling pathways between the fly and humans (Reiter ) make Drosophila an ideal, but as yet largely unexplored, species for systems biology. PhosphoPep contains over 10 000 high-confidence phosphorylation sites from 3472 gene models and 4583 distinct phosphoproteins and is therefore, to date, the most completely mapped phosphoproteome of any single source.
To support further experimentation and analysis of the phosphorylation data, we added to the PhosphoPep database a number of software tools. First, we implemented a search function to detect the sites of phosphorylation on individual proteins and to place phosphoproteins within cellular pathways as defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa ). Such pathways, along with the identified phosphoproteins, can be interrogated by a pathway viewer and exported to Cytoscape (Shannon ), a software tool that supports the integration of the data from PhosphoPep and other databases. Second, we added utilities for the use of the phosphopeptide data for targeted proteomics experiments. In a typical experiment of this type, the known phosphorylation sites of a protein or set of proteins are detected and quantified in extracts representing different cellular conditions via targeted mass spectrometry experiments such as MRM (Gerber ; Domon and Aebersold, 2006; Picotti ; Stahl-Zeng ; Wolf-Yadlin ). Third, we made the data in PhosphoPep searchable by spectral matching through SpectraST (Lam ). Specifically, for each distinct phosphopeptide ion identified in this study, all corresponding MS2 spectra were collapsed into a single consensus spectrum. Unknown query spectra can then be identified by spectral searching against the library of phosphopeptide consensus spectra. Collectively, PhosphoPep and the associated software tools and data mining utilities support the use of the data for diverse types of studies, from the analysis of the state of phosphorylation of a single protein to the detection of quantitative changes in the state of phosphorylation of whole signaling pathways at different cellular states, and have been designed to enable the iterative cycles of experimentation and analysis that are typical for systems biology research.

Results and discussion

Strategy

To generate an extensive phosphopeptide map of D. melanogaster Kc167 cells, we first performed a large-scale phosphorylation site mapping project as described in the Supplementary information and Supplementary Figure S1. Briefly, as the phosphoproteome strongly depends on the cellular state, we performed tryptic digestion of protein extracts from D. melanogaster Kc167 cells grown under various conditions: nutrient-rich medium; nutrient-depleted medium; medium supplemented with insulin (a growth inducer); medium supplemented with rapamycin (a growth inhibitor); and medium containing Calyculin A, an inhibitor of protein phosphatase 1 and protein phosphatase 2A. The combined peptide sample was separated by peptide isoelectric focusing (IEF) in a free-flow electrophoresis (FFE) instrument (Malmstrom ). From each fraction, phosphopeptides were isolated using three different phosphopeptide isolation methods (IMAC, TiO2 and PAC) to maximize coverage of the phosphoproteome (Bodenmiller ). Each phosphopeptide fraction was then subjected to LC-MS/MS using a high mass accuracy tandem mass spectrometer. The generated LC-MS/MS data were searched against a protein (decoy) database and the identified phosphorylation sites were validated using the PeptideProphet software tool (Keller ) or the target-decoy search strategy (Elias and Gygi, 2007). The resulting combined data set consisting of 10 118 high-confidence phosphorylation sites from 3472 gene models and 4583 distinct phosphoproteins was incorporated into the PhosphoPep database.

Assignment of fragment ion spectra to phosphopeptide sequences

The fragment ion spectra obtained in this study were assigned to (phospho)peptide sequences using the sequence database search tool Sequest (Eng ) and were investigated for two forms of errors in the data set: first, the misassignment of the fragment ion spectrum to a peptide sequence (Keller ; Elias and Gygi, 2007) and second, the misassignment of the phospho-amino acid in an otherwise correctly identified phosphopeptide (Beausoleil ). When assessing the first type of error using the statistical tool PeptideProphet (Keller ) or a decoy database (DD) (Elias and Gygi, 2007), we found that at a PeptideProphet probability score cutoff of 0.8 approximately 2.6% (1.8% DD), at a cutoff of 0.9 approximately 1.5% (0.8% DD) and at a cutoff of 0.99 approximately 0.2% (0% DD) of all identifications were false-positive assignments. Based on these results, we decided to upload all phosphopeptides with a PeptideProphet probability score greater than 0.8 into PhosphoPep. To assess the second type of error, the misassignment of the phospho-amino acid in a correctly identified phosphopeptide, we used the dCn score computed by Sequest (Eng ) as described in the Supplementary information and Supplementary Figure S2. We found that a dCn value greater than 0.1 corresponds to >90% certainty in phosphorylation site assignment. Overall, the application of a dCn threshold of 0.1 yielded 10 118 distinct phosphorylation sites (PeptideProphet probability score >0.9) or 12 756 phosphorylation sites (PeptideProphet probability score >0.8). Without any dCn filter, PhosphoPep contains 12 596 (PeptideProphet probability score >0.9) or 16 608 phosphorylation sites (PeptideProphet probability score >0.8).
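The two filters described above can be illustrated with a minimal Python sketch. It assumes each peptide-spectrum match carries a PeptideProphet probability, a Sequest dCn value and a decoy flag; the `PSM` record, the function names and the toy values are hypothetical and are not part of PhosphoPep. The decoy-based estimate uses the simple ratio of decoy to target hits above the cutoff, which is one common estimator and not necessarily the exact formula applied in this study.

```python
from dataclasses import dataclass

@dataclass
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    peptide: str
    probability: float  # PeptideProphet probability score
    dcn: float          # Sequest dCn value
    is_decoy: bool      # True if the hit comes from the decoy database

def decoy_error_rate(psms, min_probability):
    """Estimate the false-positive rate as decoy hits / target hits above the cutoff."""
    passing = [p for p in psms if p.probability > min_probability]
    targets = sum(1 for p in passing if not p.is_decoy)
    decoys = sum(1 for p in passing if p.is_decoy)
    return decoys / targets if targets else 0.0

def confident_sites(psms, min_probability=0.9, min_dcn=0.1):
    """Keep target hits that pass both the probability and the dCn threshold."""
    return [
        p for p in psms
        if not p.is_decoy and p.probability > min_probability and p.dcn > min_dcn
    ]

# Toy data (made up for illustration; lowercase s/t/y mark phosphorylated residues).
example_psms = [
    PSM("AAsK", 0.99, 0.25, False),
    PSM("BBtK", 0.95, 0.05, False),
    PSM("CCyK", 0.85, 0.30, False),
    PSM("DDsK", 0.92, 0.20, True),
    PSM("EEsK", 0.50, 0.40, False),
]
```

Lowering `min_probability` from 0.9 to 0.8 admits more identifications at the cost of a higher expected error rate, which mirrors the trade-off between the 10 118 and 12 756 site counts reported above.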

Structural and functional properties of the identified phosphopeptides

We next analyzed the structural and functional properties, namely the distribution and number of phosphorylated residues per phosphopeptide, the molecular functions, and the biological processes and the pathways that are associated with the identified phosphoproteins along with their predicted abundance.

Distribution of phosphorylated amino acids

We found that 78% of the identified phosphorylation sites were on serine, 19% on threonine and 3% on tyrosine. Furthermore, nearly 87% of all peptides were phosphorylated at one site, 10% at two sites and 3% at three sites. These results differ slightly from the previously assumed distribution of phospho-amino acids (Hunter and Sefton, 1980) (89% serine, 10% threonine and 1% tyrosine) and from other large-scale data sets (Olsen ).
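As an illustration of how such a distribution can be tallied, the following Python sketch counts phosphorylated residues in a list of peptide strings. It assumes, purely for the example, that phosphorylated residues are written as lowercase s, t and y; this notation and the function names are assumptions and do not describe the PhosphoPep export format.

```python
from collections import Counter

PHOSPHO_RESIDUES = "sty"  # assumed notation: lowercase marks a phosphorylated residue

def residue_distribution(peptides):
    """Return the percentage of phosphorylation sites on s, t and y."""
    counts = Counter(ch for pep in peptides for ch in pep if ch in PHOSPHO_RESIDUES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {res: 100.0 * counts[res] / total for res in PHOSPHO_RESIDUES}

def sites_per_peptide(peptides):
    """Count how many peptides carry one, two, three, ... phosphorylation sites."""
    return Counter(sum(ch in PHOSPHO_RESIDUES for ch in pep) for pep in peptides)

# Made-up peptides for illustration only.
example_peptides = ["AsK", "AsR", "AtsK", "AyR"]
```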

Molecular function and biological processes

To derive the molecular functions and biological processes of the identified phosphoproteins, we used ‘panther' ontology (PO) (Mi ). We also investigated whether some molecular functions or biological processes were enriched or depleted in the phosphoprotein data set compared to an external (proteome predicted from the FlyBase (r4.3) sequence database) and an internal reference (proteins identified from the peptide sample before the phosphopeptide enrichment). For both the molecular function (Figure 1A) and the biological processes (Figure 1B), all possible PO annotations were identified from the phosphoprotein data set. However, for many processes and functions, biases were visible compared to the external reference. Many of these biases can be explained by proteomics workflows, in which low-abundant, small or membrane proteins are often underrepresented (Brunner ). This is also reflected in the comparison between the internal and external reference. We therefore also contrasted the phosphoprotein data set to the internal reference detecting differences between the two proteomic data sets (Figure 1A and B).

Figure 1

Phosphoprotein properties. (A) Depletion/enrichment of molecular functions derived from ‘panther' ontology (Mi ) of the corresponding phosphoproteins (red) and the proteins identified from the separated peptides before enrichment (yellow) relative to the FlyBase database (0%) is shown. (B) Depletion/enrichment of biological functions derived from ‘panther' ontology (Mi ) of the corresponding phosphoproteins (red) and the proteins identified from the separated peptides before enrichment (yellow) compared to the FlyBase database (0%) is shown. (C) A comparison of the predicted phosphoprotein abundance (blue) with the predicted abundance (Duret and Mouchiroud, 1999) of all proteins of the used FlyBase database (pink) is shown. The scale ranges from 0 (low abundance) to 1 (highly abundant). Proteins for which no molecular function or biological process could be assigned were omitted for (A) and (B). χ2 test results for (A) and (B) are shown in Supplementary Table II.

In regards to the molecular functions and biological processes, enrichment for phosphoproteins (compared to the internal reference) involved in regulatory processes was apparent, in particular for kinases, transcription factors, ion channels (Figure 1A) or developmental processes (Figure 1B). In contrast, in the categories metabolism (lyases, isomerases and synthases) or metabolic processes (sulfur, coenzyme, carbohydrate and other metabolism) phosphoproteins were depleted (Figure 1B). The overrepresentation of kinases, transcription factors and ion channels compared to the internal reference is expected as these classes of proteins are known to be highly regulated by protein phosphorylation (Hunter, 2000). In addition, the enrichment of phosphoproteins in developmental processes indicates that these processes are highly regulated by protein phosphorylation as well.
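The kind of comparison behind these enrichment and depletion statements can be sketched in Python as follows. The sketch assumes a simple 2×2 layout (proteins in or out of a category, in the data set or in the reference) and computes a relative enrichment in percent together with a Pearson χ2 statistic with one degree of freedom. The function names and the example counts are hypothetical, the reference is treated as an independent group for simplicity, and the exact procedure used for Supplementary Table II may differ.

```python
import math

def relative_enrichment(k_sample, n_sample, k_ref, n_ref):
    """Percent enrichment (+) or depletion (-) of a category relative to the reference (0%)."""
    return 100.0 * ((k_sample / n_sample) / (k_ref / n_ref) - 1.0)

def chi2_2x2(k_sample, n_sample, k_ref, n_ref):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    observed = [
        [k_sample, n_sample - k_sample],
        [k_ref, n_ref - k_ref],
    ]
    total = n_sample + n_ref
    row_totals = [n_sample, n_ref]
    col_totals = [k_sample + k_ref, total - k_sample - k_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom, the upper tail of chi-square equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, with made-up counts of 30 kinases among 100 phosphoproteins versus 10 among 100 reference proteins, `relative_enrichment(30, 100, 10, 100)` gives roughly +200% and `chi2_2x2(30, 100, 10, 100)` gives a statistic of 12.5.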

Pathway association and abundance of identified phosphoproteins

We next investigated the depth of phosphoproteome coverage achieved by the data set. Of 118 PO pathways (Mi ) (from the FlyBase database (r4.3) (Grumbling and Strelets, 2006)), 98 were represented by the phosphoproteome data set. Most of the pathways to which no phosphoprotein could be assigned (15 of the 20) consisted of three or fewer proteins, thus reducing the likelihood of their detection. A comparison of the codon bias distribution (Duret and Mouchiroud, 1999) of the complete predicted D. melanogaster proteome (from the FlyBase database (r4.3)) with that of the identified phosphoproteins showed similar curves, indicating that proteins from all levels of abundance were identified (Figure 1C). Overall, these data indicate that the phosphoprotein data set reached a considerable depth of the analysis of the phosphoproteome of Kc167 cells. This finding is further strengthened by the observation that we detected proteins mapping to over 50% of the ∼6200 gene models for which a protein has so far been detectable in D. melanogaster Kc167 cells (Brunner ). For systems biology-based signaling research, such an in-depth coverage of phosphorylation sites is highly beneficial and strengthens the use of D. melanogaster Kc167 cells as a model system for systems biology.

PhosphoPep—a database and associated utilities for systems biology signaling research

To increase the utility of the phosphopeptide data set described above, we organized the data in a publicly accessible relational database, PhosphoPep, and added functions supporting data mining and meta-analysis. The following sections describe the database and the added functions.

The PhosphoPep database

The consolidated D. melanogaster Kc167 cell phosphopeptide data set was uploaded to PhosphoPep, which is publicly accessible (www.phosphopep.org). PhosphoPep is a derivative of the UniPep (Zhang ) and PeptideAtlas (Desiere ) databases, connected to the Systems Biology Experiment Analysis Management System (SBEAMS; http://www.sbeams.org), a tool to collect, store and access different data types. All peptides were parsed and loaded into a relational database using SQL (structured query language). Access to the phosphorylation sites and the database is provided by a CGI web interface. We designed a ‘Search interface' that allows users to query the data using different parameters (Figure 2A). These include searches for single proteins (using the gene ID, protein name, gene symbol, Swiss-Prot/FlyBase accession number or amino-acid sequence) or searches for a set of proteins (identified proteins search, bulk search and pathway search) at a user-defined PeptideProphet probability score. When a search is executed, a list of all proteins that match the search criteria is shown. Each listing contains a link to view a detailed record for the respective phosphoprotein entry, called the ‘protein information page'. On that page, four different types of information are displayed for each protein in the PhosphoPep database (Figure 2B).
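To make the idea of a relational store queried at a user-defined probability cutoff concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table layout, the column names, the accession and the peptide rows are all hypothetical; the actual PhosphoPep/SBEAMS schema is not described in the text and is certainly richer.

```python
import sqlite3

# Hypothetical two-table layout; not the real PhosphoPep/SBEAMS schema.
SCHEMA = """
CREATE TABLE protein (
    protein_id  INTEGER PRIMARY KEY,
    accession   TEXT UNIQUE,
    gene_symbol TEXT
);
CREATE TABLE phosphopeptide (
    peptide_id  INTEGER PRIMARY KEY,
    protein_id  INTEGER REFERENCES protein(protein_id),
    sequence    TEXT,
    probability REAL,
    dcn         REAL
);
"""

def build_example_db():
    """Create an in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO protein VALUES (1, 'EXAMPLE_ACC_1', 'chico')")
    conn.executemany(
        "INSERT INTO phosphopeptide (protein_id, sequence, probability, dcn) "
        "VALUES (?, ?, ?, ?)",
        [(1, "AAsPEK", 0.95, 0.21), (1, "GGtLLR", 0.82, 0.04)],
    )
    return conn

def search_by_symbol(conn, gene_symbol, min_probability):
    """Return phosphopeptides of one gene at or above a probability cutoff."""
    cursor = conn.execute(
        """
        SELECT pp.sequence, pp.probability, pp.dcn
        FROM protein AS p
        JOIN phosphopeptide AS pp ON pp.protein_id = p.protein_id
        WHERE p.gene_symbol = ? AND pp.probability >= ?
        ORDER BY pp.probability DESC
        """,
        (gene_symbol, min_probability),
    )
    return cursor.fetchall()
```

Calling `search_by_symbol(build_example_db(), "chico", 0.9)` returns only the higher-scoring of the two made-up peptides, whereas a cutoff of 0.8 returns both.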

Figure 2

(A) Design of the PhosphoPep database. By using the ‘Search interface' (α) PhosphoPep can be interrogated for single proteins, a set of proteins or pathways. For each protein, several types of information including the observed phosphopeptides are shown in the ‘Protein information' page (see panel B and β). Single proteins or a set of proteins can be placed into their pathways (χ). From this ‘Pathway view' all phosphoproteins can be exported to Cytoscape (Shannon ) (δ). This software tool allows integrating data from PhosphoPep with external data such as protein–protein interaction networks (ɛ). For most phosphopeptides, consensus MS2 spectra (φ) are given which can be exported for targeted proteomics experiments such as multiple reaction monitoring (Domon and Aebersold, 2006) (γ). As we supply an online spectral matching search tool, results generated by such experiments can be validated using PhosphoPep. (B) Representative output of the PhosphoPep database. The PhosphoPep (www.phosphopep.org) database contains more than 10 000 phosphorylation sites from nearly 3500 gene models and nearly 5800 phosphoproteins derived from the FlyBase (Grumbling and Strelets, 2006) nonredundant database (r4.3). For each phosphoprotein, the phosphopeptide sequence, the protein annotation and the predicted subcellular location are shown. Furthermore, additional information for each phosphopeptide is given: the probability, the number of tryptic ends, the dCn value, the mass, how often it was observed and to how many gene models and transcripts it maps. The phosphopeptides are represented in both the protein sequence and in a graphical representation, the protein map. Finally, a link to the ‘Pathway view', to the ‘Cytoscape export' function and to http://scansite.mit.edu/ (Obenauer ) is given as represented by the three symbols beside the FlyBase gene entry.

The first section, ‘Protein info,' indicates the protein database ID, the protein name (including synonyms), and a protein summary. The ‘Protein info' section also contains three links represented by symbols. The first link queries the protein sequence for potential kinase motifs using the Scansite (Obenauer ) algorithm. The second link displays all KEGG pathways in which the respective phosphoprotein is represented and the third link allows exporting the phosphoprotein to the Cytoscape software (see ‘Pathway search, pathway building and data integration'). Additionally, the ‘Protein info' section categorizes the subcellular location of the proteins into cell surface, secreted, transmembrane or intracellular (Nielsen ; Krogh ). The second section displays the ‘Observed phosphopeptides'. For every protein, all phosphopeptides identified in the data set are shown. To allow the user to assess the quality of the phosphopeptide assignment, the PeptideProphet (Keller ) score is given as well as the number of tryptic ends, the mass of the phosphopeptide, the dCn value (Eng ), a link to the MS2 consensus spectrum and a link to export the consensus spectrum ion values for targeted proteomic approaches (see the consensus spectra section below). In addition, unambiguously assigned phosphorylation sites (dCn>0.1) are highlighted in red and ambiguous sites (dCn<0.1) are highlighted in yellow. Finally, for each phosphopeptide, it is indicated whether it maps to a single protein or to several, an important aspect for quantitative targeted proteomics experiments. In the third section, ‘Protein/Peptide sequence', the whole sequence of the respective phosphoprotein is shown with the identified phosphopeptides, the site(s) of phosphorylation and transmembrane regions, which are highlighted to give a general overview.
In the fourth section, ‘Protein/Peptide map', the phosphopeptides and the phosphorylation sites are shown according to their position in the protein sequence, thereby giving an indication of the general protein topology.

Pathway search, pathway building and data integration

To build pathways and query the phosphorylation state of the constituent proteins, we placed a protein or proteins contained in PhosphoPep within pathways retrieved from KEGG (Kanehisa ) (‘Pathway view', Figure 2A). Proteins can be placed into ‘Pathway view' from both the ‘Search interface' as well as from the ‘Protein information' page of a given protein. ‘Pathway view' also retrieves from PhosphoPep and displays all other identified phosphoproteins of a particular pathway. A ‘Bulk search' option allows placing all of the proteins within their respective pathways. Finally, each pathway can readily be exported, annotated with the relevant phosphoprotein information to ‘Cytoscape' (Shannon ). Cytoscape is a generic visualization tool to integrate and visualize different data types. In this case, the phosphoprotein information contained in PhosphoPep can be complemented with additional data types, such as biomolecular interaction networks, accessible through the web. To facilitate the retrieval of relevant information, ‘Cytoscape' is automatically linked to ‘Gaggle' (Shannon ). Gaggle is an informatics-working environment in which information from different web resources can be retrieved and imported into the Cytoscape environment.

Consensus spectra: a searchable fragment ion representation of the phosphoproteome

The analysis of proteomic data sets carries a large computational overhead. This is particularly true for spectra of phosphopeptides, due to their particular fragmentation characteristics and increased peptide search space in database searching. Furthermore, targeted proteomic workflows are emerging in which sets of specific analytes, for example, the phosphorylation sites on proteins constituting a signaling pathway are analyzed under varying cellular conditions (Domon and Aebersold, 2006; Wolf-Yadlin ). To support the rapid (Supplementary Figure S3A), highly sensitive (Supplementary Figure S3B and Supplementary Table I) and reliable identification of phosphopeptides in future experiments and targeted mass spectrometry by MRM, we built a searchable consensus spectral library of most identified peptides in PhosphoPep, and made them available in a searchable and downloadable form (Figure 2A). By using the spectral matching search tool SpectraST (Lam ), both as a web interface in PhosphoPep, and as a stand-alone application released as part of the TPP suite of software (Keller ), spectra can be searched against the phosphopeptide consensus library (see also Supplementary information). To support MRM-based targeted proteomic experiments, we provide a download function for consensus spectra representing a specific phosphopeptide (Domon and Aebersold, 2006; Picotti ; Stahl-Zeng ; Wolf-Yadlin ). Such spectra can be a useful start for the optimization of precursor ion to fragment ion transitions for MRM experiments, for example by performing MRM-triggered MS2 experiments searchable against the phosphopeptide consensus spectra library (Lam ). Overall, these functionalities are highly useful for researchers focused on single proteins and especially for systems biologists who wish to conduct iterative cycles of experimentation and analysis on differentially perturbed cell states.
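The principle of collapsing replicate spectra into a consensus spectrum and scoring a query against it can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions (unit-width m/z bins, plain averaging, cosine similarity); it is not the algorithm implemented in SpectraST, and all function names and the bin width are choices made for this example.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks falling into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)

def consensus_spectrum(spectra, bin_width=1.0):
    """Average the binned intensities over all replicate spectra of one peptide ion."""
    summed = defaultdict(float)
    for peaks in spectra:
        for bin_index, intensity in bin_spectrum(peaks, bin_width).items():
            summed[bin_index] += intensity
    return {bin_index: value / len(spectra) for bin_index, value in summed.items()}

def cosine_similarity(a, b):
    """Normalized dot product of two binned spectra (0 = no shared peaks, 1 = identical)."""
    dot = sum(value * b.get(bin_index, 0.0) for bin_index, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(query_peaks, library, bin_width=1.0):
    """Return (peptide, score) of the library entry most similar to the query."""
    query = bin_spectrum(query_peaks, bin_width)
    return max(
        ((peptide, cosine_similarity(query, spectrum)) for peptide, spectrum in library.items()),
        key=lambda item: item[1],
    )
```

Here `library` is assumed to be a dictionary mapping a peptide identifier to its consensus spectrum as produced by `consensus_spectrum`. Because the search only compares a query with previously observed spectra, it avoids enumerating every possible phosphorylated form of every database peptide, which is one reason spectral matching tends to be fast.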

Assessment of the identified phosphoproteome

There is no ‘gold standard' phosphoproteome data set that could be used to assess the extent to which the Kc167 phosphoproteome has been mapped out. To further investigate the achieved phosphoproteome coverage, we compared the phosphorylation sites from our data set that matched the highly conserved (Oldham ; Garofalo, 2002) and clinically relevant insulin/TOR pathway with the already known sites in D. melanogaster. The results are shown in Figure 3. Of the 15 pathway members, 6 (dAKT1, CHICO, dFOXO, dTSC2, dS6K and d4E-BP) have been known to be phosphorylated in D. melanogaster. In our data set, we found all 15 members to be phosphorylated. Furthermore, for the proteins for which phosphorylation sites have been published previously, we were able to identify multiple new sites. The most prominent example is the insulin receptor substrate, CHICO, for which the number of known phosphorylation sites increased from 2 to 20. For dFOXO and d4E-BP, we identified all of the already known phosphorylation sites, and for dS6K, we identified one of them. For dAKT1, CHICO and dTSC2, the already known sites were not found in our experiments, indicating that in spite of the high number of sites identified in this study the Kc167 phosphoproteome is likely not complete at this time (see Supplementary information).

Figure 3

Proteins involved in the target of rapamycin (TOR) and insulin signaling. To demonstrate the usefulness of our database, we compared the already known phosphoproteins (left) with our identified phosphoproteins (right). As can be seen, compared to the literature in which only 6 out of the 15 proteins were found to be phosphorylated, we extended the phosphorylation map to all proteins of the pathway (Hay and Sonenberg, 2004; Oldham and Hafen, 2003) (phosphorylation sites are depicted by the P in a red circle; the number gives the count of distinct phosphorylation sites). The number of identified phosphorylation sites per protein ranged from 1 to 20 (CHICO). Peptides with P>0.8 and a defined phosphorylation site (dCn>0.1) were considered.

This example shows that we have reached a depth in phosphoproteome coverage that is suitable for systems biology signaling research in D. melanogaster and, due to a myriad of orthologous sites (Reiter ), also in other species.

Materials and methods

All chemicals, if not otherwise mentioned, were bought with the highest available purity from Sigma-Aldrich, Taufkirchen, Germany.

Cell culture, lysis and protein digestion

D. melanogaster Kc167 cells were grown in Schneider's Drosophila medium (Invitrogen) supplemented with 10% fetal calf serum, 100 U penicillin (Invitrogen) and 100 μg/ml streptomycin (Invitrogen, Auckland, New Zealand) in an incubator at 25°C. To increase the number of mapped phosphorylation sites, different batches of cells were pooled. Cells were either grown in rich medium, or were serum-starved, or were treated for 30 min with 100 nM Rapamycin (LClabs, Woburn, MA, USA) in rich medium, or were treated for 30 min with 100 nM insulin (serum starved), or were treated for 30 min with 100 nM Calyculin A (rich medium). Then the cells were washed with ice-cold phosphate-buffered saline and resuspended in ice-cold lysis buffer containing 10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM dithiothreitol and a protease inhibitor mix (Roche, Basel, Switzerland). To preserve protein phosphorylation, several phosphatase inhibitors were added to a final concentration of 20 nM calyculin A, 200 nM okadaic acid, 4.8 μM cypermethrin (all bought from Merck KGaA, Darmstadt, Germany), 2 mM vanadate, 10 mM sodium pyrophosphate, 10 mM NaF and 5 mM EDTA. After 10 min incubation on ice, cells were lysed by douncing. Cell debris and nuclei were removed by centrifugation for 10 min at 4°C at 5500 g. Then the cytoplasmic and membrane fractions were separated by ultracentrifugation at 100 000 g for 60 min at 4°C. The proteins of the cytosolic fraction (supernatant) were subjected to acetone precipitation. The protein pellets were resolubilized in 3 mM EDTA, 20 mM Tris–HCl, pH 8.3, and 8 M urea. The disulfide bonds of the proteins were reduced with tris (2-carboxyethyl) phosphine at a final concentration of 12.5 mM at 37°C for 1 h. The resulting free thiols were alkylated with 40 mM iodoacetamide at room temperature for 1 h.
The solution was diluted with 20 mM Tris–HCl (pH 8.3) to a final concentration of 1.0 M urea and digested with sequencing-grade modified trypsin (Promega, Madison, WI) at 20 μg per mg of protein overnight at 37°C. Peptides were desalted on a C18 Sep-Pak cartridge (Waters, Milford, MA) and dried in a speedvac. Finally, 280 mg of peptides were separated by IEF using FFE.

Peptide separation

The FFE-Weber reagent basic kit (Prolyte 1, Prolyte 2, Prolyte 3 and Prolyte 4–7 and pI markers) was purchased from FFE-Weber Inc. (now BD-Diagnostics, NJ, USA). Hydroxyisobutyric acid, DL-2-aminobutyric acid, nicotinamide, glycyl-glycine and ethanolamine were purchased from Sigma-Aldrich (Steinheim, Germany), AMPSO and HEPES from Roth (Karlsruhe, Germany) and TAPS from ACROS (NJ, USA).

Free-flow electrophoresis

IEF was performed using an FFE instrument, type prometheus, from FFE Weber Inc. (now BD-Diagnostics, PAS). For a detailed description of the experimental procedure, please see (Malmstrom ). The digested peptides were diluted in separation media containing 8 M urea, 250 mM mannitol and 20% ProLyte solution at a concentration of 10 mg/ml. This sample was loaded continuously for 1 h at 1 ml/h. Total collection time was 24 h and the volume of each collected fraction was about 25–50 ml. A Thermo Orion needle tip micro pH electrode (Thermo Electron Corporation, Beverly, MA) was used to measure the pH value of each fraction. Peptides from the FFE fractions 18–60 were purified on a C18 Sep-Pak cartridge (Waters Corporation, Milford, MA, USA). After purification, the eluted peptides were split into three fractions (one each for phosphopeptide isolation using PAC, TiO2 and IMAC), dried down and used for phosphopeptide isolation.

Phosphopeptide isolation

The phosphopeptides were isolated using PAC, IMAC and TiO2 as described (Bodenmiller , 2007b).

MS analysis

The majority of samples were analyzed on a hybrid LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) interfaced with a nanoelectrospray ion source. Chromatographic separation of peptides was achieved on an Eksigent nano LC system (Eksigent Technologies, Dublin, CA, USA), equipped with an 11 cm fused silica emitter, 75 μm inner diameter (BGB Analytik, Böckten, Switzerland), packed in-house with a Magic C18 AQ 3 μm resin (Michrom BioResources, Auburn, CA, USA). Peptides were loaded from a cooled (4°C) Spark Holland auto sampler and separated using an ACN/water solvent system containing 0.1% formic acid with a flow rate of 200 nl/min. Peptide mixtures were separated with a gradient from 3 to 35% ACN in 90 min. Up to five data-dependent MS2 spectra were acquired in the linear ion trap for each FT-MS spectral acquisition range, the latter acquired at 60 000 FWHM nominal resolution settings with an overall cycle time of approximately 1 s. Charge state screening was employed to select ions with two charges and to reject ions with one charge or an undetermined charge state. The same sample was injected a second time with the same settings except for the charge state screening, which was then set to three and higher (excluding 1, 2 and undetermined charge states). For injection control, the automatic gain control was set to 5e5 and 1e4 for full FTMS and linear ion trap MS2, respectively. The instrument was calibrated externally according to the manufacturer's instructions. The samples were acquired using internal lock mass calibration on m/z 429.088735 and 445.120025. For some pre-experiments and re-measurements, a hybrid LTQ-FTICR mass spectrometer (Thermo, San Jose, CA) interfaced with a nanoelectrospray ion source was used.
Chromatographic separation of peptides was achieved on an Agilent Series 1100 LC system (Agilent Technologies, Waldbronn, Germany), equipped with an 11 cm fused silica emitter, 150 μm inner diameter (BGB Analytik, Böckten, Switzerland), packed in-house with a Magic C18 AQ 5 μm resin (Michrom BioResources, Auburn, CA, USA). Peptides were loaded from a cooled (4°C) Agilent auto sampler and separated with a linear gradient of ACN/water, containing 0.15% formic acid, with a flow rate of 1.2 μl/min. Peptide mixtures were separated with a gradient from 2 to 30% ACN in 90 min. Three MS2 spectra were acquired in the linear ion trap for each FT-MS scan, the latter acquired at 100 000 FWHM nominal resolution settings with an overall cycle time of approximately 1 s. Charge state screening was employed to select for ions with at least two charges and to reject ions with an undetermined charge state. For each peptide sample, a standard data-dependent acquisition method on the three most intense ions per MS scan was used, and a threshold of 200 ion counts was used for triggering an MS2 attempt.

Data analysis

The MS2 data were searched against the FlyBase (Release 4.3) (Grumbling and Strelets, 2006) nonredundant database containing 19 465 proteins using SORCERER-SEQUEST(TM) v3.0.3, which was run on the SageN Sorcerer2 (Thermo Electron, San Jose, CA, USA). For the in silico digest, trypsin was defined as the protease, cleaving after K and R (cleavage was not allowed if the residue was followed by P). Two missed cleavages and one nontryptic terminus were allowed for peptides, which had a maximum mass of 6000 Da. The precursor ion tolerance was set to 5 p.p.m. and the fragment ion tolerance was set to 0.8 Da. Before searching using Sequest, the neutral loss peaks were removed and indicated as described previously (Bodenmiller ). The data were then searched (for IMAC and TiO2) allowing phosphorylation (+79.9663 Da) of serine, threonine and tyrosine as a variable modification and carboxyamidomethylation of cysteine (+57.0214 Da) residues as a fixed modification. For PAC, in addition to these modifications, the methylation (+14.0156 Da) of all carboxylate groups was defined as a static modification. Finally, the search results obtained by Sequest were subjected to statistical filtering using PeptideProphet (V3.0) (Keller ) and ProteinProphet (V3.0) (Keller ). Proteins identified in this way were used for the analysis in Figure 1A and B. The proteins were queried using the batch search of the ‘panther classification system' (Mi ) http://www.pantherdb.org/. FlyBase (r4.3) was used as reference (Grumbling and Strelets, 2006) (0% depletion/enrichment). Significance of the biases was determined using a χ2 test.
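The in silico digestion rule and the precursor tolerance described above can be illustrated with a short Python sketch. It is a minimal illustration, not the SEQUEST implementation: it only generates fully tryptic peptides (the nontryptic terminus, the 6000 Da mass limit and the modifications are left out), and the function names `tryptic_peptides` and `within_ppm` are hypothetical.

```python
import re


def tryptic_peptides(protein, missed_cleavages=2):
    """Return fully tryptic peptides: cleave after K or R unless followed by P."""
    # Cleavage positions are the indices just after each K/R not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    # Peptide boundaries: protein start, internal cleavage sites, protein end.
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # Joining up to `missed_cleavages` extra fragments models missed cleavages.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.add(protein[bounds[i]:bounds[j]])
    return peptides


def within_ppm(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the relative mass error is within the tolerance in p.p.m."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


if __name__ == "__main__":
    # "RP" is not cleaved, so "RPGK" stays intact.
    print(sorted(tryptic_peptides("AKRPGKLR", missed_cleavages=0)))
```

With no missed cleavages, the toy sequence `AKRPGKLR` yields `AK`, `RPGK` and `LR`; the R before P is not a cleavage site.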
When the same analysis was carried out using all proteins from PhosphoPep (PeptideProphet P>0.9; in the construction of PhosphoPep, each peptide identified using PeptideProphet (with P>0.8) was mapped against each possible protein derived from the FlyBase database (r4.3)), essentially the same biases (with similar significances) as shown in Figure 1A and B were observed when queried using the batch search of the ‘panther classification system' (Mi ) http://www.pantherdb.org/. To determine the certainty of the assignment of a phosphate group to a hydroxyamino acid, the dCn was used, as it has been shown recently that it directly correlates with the certainty of phosphorylation site assignment (Beausoleil ). To estimate a dCn cut-off above which a site is considered well assigned (>90% certainty), the following assumption was made: as many of our phosphopeptides were sequenced more than once, an uncertainty in the phosphorylation site assignment will result in several ‘versions' of a phosphopeptide, that is, entries with an identical amino-acid sequence but a different site of phosphorylation. After consolidation of the phosphopeptides using the computer program ‘Phosphogigolo' (Bodenmiller ), we computed for a given dCn value the percentage of peptides that have the same amino-acid sequence (ignoring the phosphate group and the fact that a peptide can exist in two phosphorylation states with a high certainty of phosphorylation site assignment). Finally, the ‘percent' ambiguous was computed as 2 × (percentage of redundant ‘stripped' peptide entries) (Supplementary Figure S2).
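The ambiguity estimate described above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: phosphorylated residues are marked with a trailing `*` (a hypothetical notation), and an entry is counted as redundant when its stripped sequence has already been seen among the entries passing the dCn cut-off. Both choices are assumptions.

```python
from collections import Counter


def strip_phospho(peptide):
    """Remove the (hypothetical) '*' phosphorylation marks from a peptide."""
    return peptide.replace("*", "")


def percent_ambiguous(entries, dcn_cutoff):
    """Estimate the percentage of ambiguous site assignments at a dCn cut-off.

    entries: iterable of (annotated_peptide, dCn) pairs.
    """
    # Distinct annotated phosphopeptides that pass the cut-off.
    kept = {pep for pep, dcn in entries if dcn >= dcn_cutoff}
    if not kept:
        return 0.0
    # Group the entries by their stripped amino-acid sequence.
    counts = Counter(strip_phospho(pep) for pep in kept)
    # Assumption: every entry beyond the first per sequence is redundant.
    redundant = sum(n - 1 for n in counts.values())
    return 2.0 * 100.0 * redundant / len(kept)


if __name__ == "__main__":
    toy = [("AS*TK", 0.2), ("AST*K", 0.2), ("GS*K", 0.3), ("LT*R", 0.05)]
    for cutoff in (0.1, 0.25):
        print(cutoff, percent_ambiguous(toy, cutoff))
```

Scanning the cut-off over a range of dCn values and picking the lowest value at which the estimate drops below 10% would correspond to the >90% certainty criterion in the text.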

Decoy database search strategy

The decoy database was designed in the following way: the FlyBase database (r4.3) was digested in silico using trypsin. The amino acids of the resulting peptides were then scrambled, except for the C-terminal lysine or arginine. Proteins were reconstructed from the scrambled peptides and the label Rev_ was added to the protein names. This resulted in a protein database with half the proteins being original and the other half concatenated from the scrambled peptides. This decoy protein database gives rise to peptides with approximately the same length distribution as the original database. The false-positive rate was estimated as described by Elias and Gygi (2007).
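The decoy construction described above can be sketched in Python. This is a minimal illustration under stated assumptions: the protein name `CG1234` and all function names are hypothetical, the cleavage rule (after K or R, not before P) is taken from the search settings, and the error estimate uses the commonly applied target-decoy form 2 × decoy hits / total hits for a concatenated database, which is assumed here to match the cited procedure.

```python
import random
import re


def scramble_peptide(peptide, rng):
    """Shuffle a peptide's residues, keeping a C-terminal K or R in place."""
    if peptide[-1] in "KR":
        body, tail = list(peptide[:-1]), peptide[-1]
    else:
        body, tail = list(peptide), ""
    rng.shuffle(body)
    return "".join(body) + tail


def make_decoy(name, sequence, rng):
    """Digest in silico, scramble each peptide and rebuild a Rev_ protein."""
    # Zero-width split after K/R unless followed by P (Python 3.7+).
    peptides = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    decoy = "".join(scramble_peptide(p, rng) for p in peptides)
    return "Rev_" + name, decoy


def estimated_fdr(n_decoy_hits, n_total_hits):
    """Target-decoy error estimate for a concatenated database search."""
    if n_total_hits == 0:
        return 0.0
    return 2.0 * n_decoy_hits / n_total_hits


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the decoy is reproducible
    print(make_decoy("CG1234", "AKRPGKLRMM", rng))
    print(estimated_fdr(5, 1000))
```

Because scrambling happens within each tryptic peptide, the decoy protein has the same length and composition as the original and its cleavage sites stay at the same positions.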

Creation of the consensus spectral library

The PeptideProphet-processed SEQUEST search results from all LC-MS/MS runs performed on either an LTQ-Orbitrap or LTQ-FT mass spectrometer were screened for spectra that were identified above a probability threshold of 0.9 and a dCn value of 0.1. A total of over 170 000 confidently identified spectra mapping to about 33 000 distinct peptide ions were collected. The spectra identified to the same peptide ion (replicates) were then grouped and collapsed into a single consensus spectrum. The corresponding peaks in the replicates are m/z-aligned, and only peaks that are present in a majority of the replicates are included in the consensus, making no assumption about the possible identities of the fragments. The consensus intensity of each peak is calculated as the average of the peak intensities in the replicates, weighted by a measure of the varying spectral quality of the replicates. For peptide ions for which only a single observation is made, the raw spectrum is included after simple noise reduction. All the resulting spectra are then annotated and indexed for fast searching. The details of the consensus spectrum building algorithm, as well as the software to perform it, will be provided in a future publication. For the comparison of SpectraST and the Sequest database search algorithm with regard to search speed, two test data sets were used. For the LTQ-Orbitrap, a randomly chosen data set with 10 166 spectra was used, and for the LTQ, a randomly chosen data set with 27 556 spectra. SpectraST was run on a single processor, while SORCERER-SEQUEST(TM) v3.0.3 was run on the SageN Sorcerer2. For the database search, a 5 p.p.m. parental mass tolerance was used for the Orbitrap data set and 3 Da for the LTQ data set. The sensitivity and error curves were determined using PeptideProphet (Keller ) (Supplementary Figure S3).
For the comparison of SpectraST and the Sequest database search algorithm with regard to the achieved sensitivity/identifications, three randomly chosen test data sets were used for each of IMAC, TiO2 and PAC. After the database search (SpectraST was run on a single processor, SORCERER-SEQUEST(TM) v3.0.3 was run on the SageN Sorcerer2), the sensitivity and error curves were determined using PeptideProphet (Keller ) (Supplementary Table I).
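The consensus-building step described in this section (m/z alignment, majority voting and quality-weighted intensity averaging) can be illustrated with a simplified Python sketch. It is not the published algorithm, whose details the text defers to a future publication. The gap-based peak clustering, the `mz_tol` parameter, the per-replicate `weights` and the choice to average only over replicates that contain the peak are all assumptions made for illustration.

```python
def consensus_spectrum(replicates, weights=None, mz_tol=0.5):
    """Collapse replicate spectra into one consensus spectrum.

    replicates: list of spectra, each a list of (mz, intensity) pairs.
    weights:    optional per-replicate quality weights (default: all 1.0).
    """
    n = len(replicates)
    weights = weights or [1.0] * n
    # Pool all peaks, tagged with their replicate index, sorted by m/z.
    pooled = sorted(
        (mz, inten, r)
        for r, spectrum in enumerate(replicates)
        for mz, inten in spectrum
    )
    # Align peaks: start a new cluster when the m/z gap exceeds mz_tol.
    clusters, current = [], []
    for peak in pooled:
        if current and peak[0] - current[-1][0] > mz_tol:
            clusters.append(current)
            current = []
        current.append(peak)
    if current:
        clusters.append(current)

    consensus = []
    for cluster in clusters:
        members = {r for _, _, r in cluster}
        # Keep only peaks seen in a strict majority of the replicates.
        if 2 * len(members) <= n:
            continue
        total_w = sum(weights[r] for _, _, r in cluster)
        mz = sum(weights[r] * m for m, _, r in cluster) / total_w
        inten = sum(weights[r] * i for _, i, r in cluster) / total_w
        consensus.append((mz, inten))
    return consensus


if __name__ == "__main__":
    reps = [
        [(100.0, 10.0), (200.0, 5.0)],
        [(100.1, 20.0), (300.0, 7.0)],
        [(100.05, 30.0), (200.1, 15.0)],
    ]
    for mz, inten in consensus_spectrum(reps):
        print(round(mz, 3), round(inten, 3))
```

In the toy input, the peak near m/z 300 occurs in only one of three replicates and is therefore dropped, while the peaks near m/z 100 and 200 are retained.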

Determination of protein abundance based on codon bias

The abundance of all proteins, ranging from 1 (highly abundant) to 0 (very low abundance), was calculated as described previously (Duret and Mouchiroud, 1999) (Figure 1C).

59 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. Genome-wide location and function of DNA binding proteins.

Authors: B Ren; F Robert; J J Wyrick; O Aparicio; E G Jennings; I Simon; J Zeitlinger; J Schreiber; N Hannett; E Kanin; T L Volkert; C J Wilson; S P Bell; R A Young
Journal: Science Date: 2000-12-22 Impact factor: 47.728

3. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Authors: Andrew Keller; Alexey I Nesvizhskii; Eugene Kolker; Ruedi Aebersold
Journal: Anal Chem Date: 2002-10-15 Impact factor: 6.986

4. An integrated chemical, mass spectrometric and computational strategy for (quantitative) phosphoproteomics: application to Drosophila melanogaster Kc167 cells.

Authors: Bernd Bodenmiller; Lukas N Mueller; Patrick G A Pedrioli; Delphine Pflieger; Martin A Jünger; Jimmy K Eng; Ruedi Aebersold; W Andy Tao
Journal: Mol Biosyst Date: 2007-02-19

5. Functional organization of the yeast proteome by systematic analysis of protein complexes.

Authors: Anne-Claude Gavin; Markus Bösche; Roland Krause; Paola Grandi; Martina Marzioch; Andreas Bauer; Jörg Schultz; Jens M Rick; Anne-Marie Michon; Cristina-Maria Cruciat; Marita Remor; Christian Höfert; Malgorzata Schelder; Miro Brajenovic; Heinz Ruffner; Alejandro Merino; Karin Klein; Manuela Hudak; David Dickson; Tatjana Rudi; Volker Gnau; Angela Bauch; Sonja Bastuck; Bettina Huhse; Christina Leutwein; Marie-Anne Heurtier; Richard R Copley; Angela Edelmann; Erich Querfurth; Vladimir Rybin; Gerard Drewes; Manfred Raida; Tewis Bouwmeester; Peer Bork; Bertrand Seraphin; Bernhard Kuster; Gitte Neubauer; Giulio Superti-Furga
Journal: Nature Date: 2002-01-10 Impact factor: 49.962

Review 6. Genetic control of size in Drosophila.

Authors: S Oldham; R Böhni; H Stocker; W Brogiolo; E Hafen
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2000-07-29 Impact factor: 6.237

7. A novel genetic system to detect protein-protein interactions.

Authors: S Fields; O Song
Journal: Nature Date: 1989-07-20 Impact factor: 49.962

8. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites.

Authors: Jianru Stahl-Zeng; Vinzenz Lange; Reto Ossola; Katrin Eckhardt; Wilhelm Krek; Ruedi Aebersold; Bruno Domon
Journal: Mol Cell Proteomics Date: 2007-07-20 Impact factor: 5.911

9. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. UniPep--a database for human N-linked glycosites: a resource for biomarker discovery.

Authors: Hui Zhang; Paul Loriaux; Jimmy Eng; David Campbell; Andrew Keller; Pat Moss; Richard Bonneau; Ning Zhang; Yong Zhou; Bernd Wollscheid; Kelly Cooke; Eugene C Yi; Hookeun Lee; Elaine R Peskind; Jing Zhang; Richard D Smith; Ruedi Aebersold
Journal: Genome Biol Date: 2006-08-10 Impact factor: 13.583
