
The Online Protein Processing Resource (TOPPR): a database and analysis platform for protein processing events.

Niklaas Colaert, Davy Maddelein, Francis Impens, Petra Van Damme, Kim Plasman, Kenny Helsens, Niels Hulstaert, Joël Vandekerckhove, Kris Gevaert, Lennart Martens.

Abstract

We here present The Online Protein Processing Resource (TOPPR; http://iomics.ugent.be/toppr/), an online database that contains thousands of published proteolytically processed sites in human and mouse proteins. These cleavage events were identified with COmbined FRActional DIagonal Chromatography (COFRADIC) proteomics technologies, and the resulting database is provided with full data provenance. Indeed, TOPPR provides an interactive visual display of the actual fragmentation mass spectrum that led to each identification of a reported processed site, complete with fragment ion annotations and search engine scores. Apart from warehousing and disseminating these data in an intuitive manner, TOPPR also provides an online analysis platform, including methods to analyze protease specificity and substrate-centric analyses. Concretely, TOPPR supports three ways to retrieve data: (i) the retrieval of all substrates for one or more cellular stimuli or assays; (ii) a substrate search by UniProtKB/Swiss-Prot accession number, entry name or description; and (iii) a motif search that retrieves substrates matching a user-defined protease specificity profile. The analysis of the substrates is supported through the presence of a variety of annotations, including predicted secondary structure, known domains and experimentally obtained 3D structure where available. Across substrates, substrate orthologs and conserved sequence stretches can also be shown, with iceLogo visualization provided for the latter.

Year: 2012 PMID: 23093603 PMCID: PMC3531153 DOI: 10.1093/nar/gks998

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

More than two percent of all human and mouse genes encode proteases. These enzymes control many biological processes and are of crucial importance for relatively simple processes such as food digestion, as well as for highly regulated proteolytic cascades such as controlled cell death or blood coagulation. In addition, misregulated protease activities add to the severity of several pathologies, including cancer, cardiovascular and inflammatory diseases. It is commonly recognized that a more detailed understanding of protease-controlled or protease-affected processes can be achieved by extending our overall knowledge of proteases, their (preferred) substrates and specificities (1).

The N-terminal COFRADIC (COmbined FRActional DIagonal Chromatography) technique developed in our laboratory enables isolation and identification of protein N-terminal peptides using peptide chromatography and mass spectrometry (MS) (2). Given that protein processing induces new N- and C-terminal protein ends, the resulting neo-N-terminal peptides are also isolated and identified, and represent proxies for the actual cleavage position in the protease substrate (3). Recently, a similar C-terminal COFRADIC technique was developed to select and identify (neo-) C-terminal peptides, thus also identifying processing events (4). These COFRADIC techniques made the identification of large numbers of processing events possible, and they are applicable both to individual proteases and to cellular setups in which several proteases are active (3,5–12). It should be noted that besides the COFRADIC technologies, other mass spectrometry-based technologies were recently introduced to identify protein processing events [reviewed in (13)].

A number of databases are currently available that disseminate protein processing events. The MEROPS database is specialized in proteases and their classification, and stores substrates linked to proteases (14). However, this database does not easily allow a user to perform specific meta-analyses such as a direct comparison among substrates. PMAP/CutDB on the other hand is a community-driven (Wikipedia style) database, implying that any scientist can add new substrates or substrate predictions (15,16). The intrinsic disadvantage is that the quality of the reported substrates cannot be guaranteed, especially because the original data leading to the discovery of a substrate can only be accessed by using the provided links to the original article. CASBAH, developed in 2007 as part of a review article on proteolytic processes in dying cells, stores processing sites of caspases only (17). It is also important to note that a processing event in a substrate stored in CutDB or CASBAH is not necessarily linked to the actual cleavage position in the substrate. Recently, the TopFIND 2.0 database has been released as well, offering a protein-centric knowledgebase on protein termini, including modifications and processing events (18). TopFIND comes equipped with data mining and analysis tools, but is ultimately based on curated (imported) data from experimental data sets and existing databases, losing any direct connection to the underlying experimental data in the process. As such, it resembles the UniProtKB/Swiss-Prot model, albeit with more integrated analysis tools. Finally, the ApoptoProteomics database was also recently launched, but as can be derived from its name, this database centers on protein processing found in apoptotic cells from different origins (19). In conclusion, no single database exists today that provides complete data provenance, an essential feature in guaranteeing data quality, with only TopFIND 2.0 and ApoptoProteomics providing support for further (meta-) analyses on both proteases and their substrates.
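The statement in the introduction that a neo-N-terminal peptide acts as a proxy for the cleavage position can be made concrete with a minimal Python sketch. It is not taken from TOPPR or ms_lims: the function name, the toy sequence and the 1-based coordinate convention are assumptions chosen for illustration, and only the first occurrence of the peptide in the protein is considered.

```python
from typing import Optional, Tuple


def cleavage_from_neo_n_terminus(protein: str, peptide: str) -> Optional[Tuple[int, int]]:
    """Infer a cleavage position from a neo-N-terminal peptide.

    Returns the 1-based positions (P1, P1') flanking the scissile bond,
    or None when no internal cleavage can be inferred.
    Hypothetical helper for illustration only.
    """
    start = protein.find(peptide)  # 0-based index of the first occurrence
    if start <= 0:
        # -1: peptide not found; 0: peptide starts at the protein's own
        # N-terminus, so it does not point to an internal cleavage.
        return None
    # The first peptide residue is P1' (1-based position start + 1);
    # the residue just before it is P1 (1-based position start).
    return start, start + 1


# Toy example with a made-up sequence: the peptide starts after "DEVD".
toy_protein = "MKTAYDEVDGSLLK"
toy_peptide = "GSLLK"
print(cleavage_from_neo_n_terminus(toy_protein, toy_peptide))  # (9, 10)
```

In this toy case the cleavage lies between residue 9 (D, the P1 residue) and residue 10 (G, the P1' residue). A neo-C-terminal peptide would be handled symmetrically, using the end of the match instead of its start.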
We here present The Online Protein Processing Resource (TOPPR) that stores published processing events identified in our laboratory by N- and C-terminal COFRADIC technologies. TOPPR makes our data available through an easy and intuitive analysis platform. Furthermore, the application provides a user interface that is specifically tailored to verify the actual MS/MS data that led to the identification of the reported processed sites. In fact, the Mascot identification score (20), corresponding threshold score and confidence level are easily checked for every peptide reporting a processing event. Additionally, the b- and y-ion annotated MS/MS spectrum can be viewed and downloaded using the PRIDE spectrum viewer (21). These annotations are derived from the underlying ms_lims processing pipeline (22) that is in turn built on the MascotDatfile library for reading and interpreting Mascot search results (23). Full data provenance is thus guaranteed, making it simple for the user to check data quality. Note that this implies that TOPPR is exclusively dedicated to displaying experimentally observed cleavage sites, and that the system does not include any predicted cleavage sites. The focus on empirical mass spectrometry-based proteomics data also means that TOPPR will under-represent any cleavage sites that are difficult to detect with this technology, notably in the case of heavily modified peptides. TOPPR also supports user-level security, allowing data to be kept private to one or more authenticated users before publication.

All information in TOPPR is stored in a MySQL database (see Supplementary Figure S1 for the relational schema), and query results are generated by JavaServer Pages and Java Servlet technologies running on an Apache Tomcat server infrastructure. TOPPR is released under the permissive Apache 2.0 open source license, and all source code can be downloaded from http://code.google.com/p/toppr/.
At the time of writing, TOPPR contains 2234 substrates, for 18 studied treatments or peptidases, resulting in 27 147 cleavages. To navigate these data, TOPPR provides three different search methods. The first method, the parameter search, is used to find all substrates for one treatment or a combination of treatments, where a treatment corresponds to either a cellular stimulus in an in vivo/in cellulo assay or, alternatively, a protease used in an in vitro assay (i.e. a protease added to a cell lysate). The user selects one or more treatments from a list of published treatments, and can perform a range of set operations on these to create specific queries. Furthermore, this query interface allows even more fine-grained retrieval options to be specified, including combinations of treatments that result in the same site being cleaved (processing ‘hot spots’). Second, a UniProtKB/Swiss-Prot search is provided by which users can use a UniProtKB/Swiss-Prot accession number, a UniProtKB/Swiss-Prot entry name or a fraction of the corresponding protein description as the search string. This method will reveal all stored processed sites linked to the specified substrate. Third, a motif search enables users to search for substrates containing processed sites that match the user-defined protease specificity profile. This specificity profile is defined in two parts: a pre-site (non-primed sites) and a post-site motif (primed sites) (24), with each motif defined using a simplified regular expression syntax. TOPPR also supports two types of analysis: analysis of the processed sites and corresponding protease specificity and, in addition, detailed analysis of individual substrates. 
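The parameter search and the motif search described above can be illustrated with a small Python sketch. This is not TOPPR's implementation: the `Site` record, the function names, the accessions and the sequences are all hypothetical, and Python's `re` syntax stands in for TOPPR's simplified regular expression syntax, whose exact rules are not given here. The pre-site motif is anchored at the end of the non-primed residues (ending in P1) and the post-site motif at the start of the primed residues (starting at P1').

```python
import re
from typing import Iterable, List, NamedTuple, Set, Tuple


class Site(NamedTuple):
    """One observed cleavage (hypothetical record layout)."""
    accession: str   # substrate accession (made-up values below)
    position: int    # 1-based position of the P1 residue
    treatment: str   # stimulus or protease used in the assay
    non_primed: str  # residues N-terminal of the cut, ending in P1
    primed: str      # residues C-terminal of the cut, starting at P1'


def hot_spots(sites: Iterable[Site], treatments: List[str]) -> Set[Tuple[str, int]]:
    """Sites cleaved under every listed treatment (set intersection)."""
    if not treatments:
        return set()
    site_list = list(sites)
    per_treatment = [
        {(s.accession, s.position) for s in site_list if s.treatment == t}
        for t in treatments
    ]
    return set.intersection(*per_treatment)


def motif_search(sites: Iterable[Site], pre_motif: str, post_motif: str) -> List[Site]:
    """Sites whose flanking residues match both halves of the profile."""
    pre = re.compile(f"(?:{pre_motif})$")  # must end exactly at P1
    post = re.compile(f"(?:{post_motif})")  # used with match(): starts at P1'
    return [s for s in sites if pre.search(s.non_primed) and post.match(s.primed)]


example_sites = [
    Site("X00001", 120, "treatment A", "AADEVD", "GSLA"),
    Site("X00001", 120, "treatment B", "AADEVD", "GSLA"),
    Site("X00002", 45, "treatment A", "KLIETD", "SGVD"),
    Site("X00003", 300, "treatment B", "GGRRAK", "LLPE"),
]

print(hot_spots(example_sites, ["treatment A", "treatment B"]))
# {('X00001', 120)}
print([s.accession for s in motif_search(example_sites, "D..D", "[GSA]")])
# ['X00001', 'X00001']
```

Other set operations on treatments (union, difference) follow the same pattern by swapping `set.intersection` for the corresponding set method.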
The analysis of processed sites to infer protease specificity can be carried out by using integrated tools like iceLogo [probability-based visualization of significantly enriched/depleted residues in aligned sequences (25)], WebLogo [sequence logos (26)], PoPS [prediction of protease specificity (27)] and Jalview [multiple sequence viewer and analysis tool (28)]. The list of processed sites used as input for these tools is extracted from TOPPR through the data retrieval options listed earlier, or as a manually selected subset of sites. The detailed analysis of individual substrates, on the other hand, is provided in TOPPR through a variety of integrated substrate metadata whenever available. First of all, processing events by different proteases are readily visualized using the substrate sequence view of TOPPR (see Figure 1). Additionally, SMART (29) and Pfam (30) annotations can be shown alongside reported processing events at the substrate level, thus indicating possible processing-derived interference with substrate function. Conservation of processing events among different species can be studied via a built-in function that globally aligns the surrounding sequences of processed sites in all known substrate homologs or orthologs (found in the HomoloGene database). Where available, TOPPR provides 3D structures (visualized with Jmol) of the substrates. Processing events are indicated in these structures using a ball and stick configuration, whereas the rest of the polypeptide chain is in the cartoon configuration. This visualization makes it straightforward to assess the processing event in the context of the substrate’s 3D structure. Finally, the UniProtKB/Swiss-Prot annotated secondary structure elements, or, if unavailable, a secondary structure prediction (31) can be shown in the sequence view, facilitating substrate examination in the absence of 3D structures.
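Logo-style tools of the kind listed above start from aligned sequence windows around each processed site. The Python sketch below computes only the per-position residue frequencies of such windows. It is a simplified stand-in and not the iceLogo or WebLogo algorithm: iceLogo in particular also compares against a reference set and tests for significant enrichment or depletion, which is omitted here. The function name and the example windows are assumptions.

```python
from collections import Counter
from typing import Dict, List


def positional_frequencies(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies for equally long, aligned windows.

    Each window covers the residues around one cleavage site, for
    example P4..P1 followed by P1'. Hypothetical helper for illustration.
    """
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    frequencies = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        frequencies.append({aa: n / total for aa, n in counts.items()})
    return frequencies


# Made-up windows: four residues before the cut (P4..P1) plus P1'.
example_windows = ["DEVDG", "DEADS", "DQTDG", "LEHDA"]
profile = positional_frequencies(example_windows)
print(profile[3])       # P1 position: {'D': 1.0}
print(profile[0]["D"])  # P4 position: 0.75
```

In this toy input, aspartate is invariant at P1 and frequent at P4, which is the kind of pattern a sequence logo would then display.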

Figure 1.

The substrate sequence view in TOPPR. This display provides an overview of the protein sequence and links to dynamic annotations (here secondary structure and domain annotation have been selected). A protein bar representation, below the sequence, represents the full length of the protein with processing events indicated; immediately underneath, the domain visualization is shown. Below this protein-centric sequence view, details are shown on the individual peptides that were found to represent the annotated cleavage sites. Each peptide sequence in turn links to a peptide-centric view, where motif analyses, mass spectrometry data and all matching proteins can be found. Note that the peptide is annotated with its start and end coordinates on the protein.

The TOPPR database is continuously updated with novel findings, keeping track of all published protease cleavage sites identified by COFRADIC (or related) technologies in our laboratory. Over time, more treatments and substrates will therefore be included, leading to an increasingly comprehensive database of cleavage sites for the most abundant eukaryotic intracellular proteases (e.g. human and mouse caspases, granzymes, calpains and cathepsins). Furthermore, through the ability to transmit the data associated with an entire project from one ms_lims system to another via the Internet, TOPPR can easily receive incoming data from third parties. Users of ms_lims need only contact the authors for a username and password to connect to the TOPPR-linked ms_lims installation at the authors’ laboratories, at which point the project transmission application of ms_lims allows the data to be transmitted with a single click, ensuring its downstream uptake in TOPPR as well. Note that the reliance on ms_lims as the underlying data processing and management platform implicitly ensures consistency and comparable quality across all assembled data.
Apart from its role as a data storage system, TOPPR also serves as a powerful exploration platform to verify data quality, assess protease specificity, perform processing site motif analyses and carry out detailed substrate analyses. An online user manual provides detailed information on the available search methods and types of analysis. TOPPR thus provides a powerful platform for discovery and analysis to both protease researchers and scientists studying a single substrate.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Figure 1.

FUNDING

Ghent University Multidisciplinary Research Partnership ‘Bioinformatics: from nucleotides to networks’; Fund for Scientific Research (FWO)—Flanders (Belgium) (postdoctoral research fellowship to F.I., P.V.D. and K.H.); Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen) (PhD to K.P.); ProteomeXchange project, funded by the European Union 7th Framework Program under grant agreement number [260558 to N.H.]; PRIME-XS project, funded by the European Union 7th Framework Program under grant agreement number [262067 to K.G. and L.M.]. Funding for open access charge: VIB, Ghent, Belgium.

Conflict of interest statement. None declared.

31 in total

Review 1. Who gets cut during cell death?

Authors: Francis Impens; Joël Vandekerckhove; Kris Gevaert
Journal: Curr Opin Cell Biol Date: 2010-09-16 Impact factor: 8.382

2. The substrate specificity profile of human granzyme A.

Authors: Petra Van Damme; Sebastian Maurer-Stroh; Han Hao; Niklaas Colaert; Evy Timmerman; Frank Eisenhaber; Joël Vandekerckhove; Kris Gevaert
Journal: Biol Chem Date: 2010-08 Impact factor: 3.915

3. Complementary positional proteomics for screening substrates of endo- and exoproteases.

Authors: Petra Van Damme; An Staes; Silvia Bronsoms; Kenny Helsens; Niklaas Colaert; Evy Timmerman; Francesc X Aviles; Joël Vandekerckhove; Kris Gevaert
Journal: Nat Methods Date: 2010-06-06 Impact factor: 28.547

4. A quantitative proteomics design for systematic identification of protease cleavage events.

Authors: Francis Impens; Niklaas Colaert; Kenny Helsens; Bart Ghesquière; Evy Timmerman; Pieter-Jan De Bock; Benjamin M Chain; Joël Vandekerckhove; Kris Gevaert
Journal: Mol Cell Proteomics Date: 2010-07-13 Impact factor: 5.911

5. ms_lims, a simple yet powerful open source laboratory information management system for MS-driven proteomics.

Authors: Kenny Helsens; Niklaas Colaert; Harald Barsnes; Thilo Muth; Kristian Flikka; An Staes; Evy Timmerman; Steffi Wortelkamp; Albert Sickmann; Joël Vandekerckhove; Kris Gevaert; Lennart Martens
Journal: Proteomics Date: 2010-03 Impact factor: 3.984

6. Improved visualization of protein consensus sequences by iceLogo.

Authors: Niklaas Colaert; Kenny Helsens; Lennart Martens; Joël Vandekerckhove; Kris Gevaert
Journal: Nat Methods Date: 2009-11 Impact factor: 28.547

7. The Pfam protein families database.

Authors: Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971

8. Proteome-wide substrate analysis indicates substrate exclusion as a mechanism to generate caspase-7 versus caspase-3 specificity.

Authors: Dieter Demon; Petra Van Damme; Tom Vanden Berghe; Annelies Deceuninck; Joost Van Durme; Jelle Verspurten; Kenny Helsens; Francis Impens; Magdalena Wejda; Joost Schymkowitz; Frederic Rousseau; Annemieke Madder; Joël Vandekerckhove; Wim Declercq; Kris Gevaert; Peter Vandenabeele
Journal: Mol Cell Proteomics Date: 2009-09-16 Impact factor: 5.911

9. TopFIND 2.0--linking protein termini with proteolytic processing and modifications altering protein function.

Authors: Philipp F Lange; Pitter F Huesgen; Christopher M Overall
Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971

10. MEROPS: the peptidase database.

Authors: Neil D Rawlings; Alan J Barrett; Alex Bateman
Journal: Nucleic Acids Res Date: 2009-11-05 Impact factor: 16.971
