Determinants of Macromolecular Specificity from Proteomics-Derived Peptide Substrate Data.

Julian E Fuchs¹, Oliver Schilling², Klaus R Liedl³.

Abstract

BACKGROUND: Recent advances in proteomics methodologies allow for high-throughput profiling of proteolytic cleavage events. The resulting substrate peptide distributions provide deep insights into the underlying macromolecular recognition events, as determinants of biomolecular specificity identified by proteomics approaches may be compared to structure-based analysis of corresponding protein-protein interfaces.
METHOD: Here, we present an overview of experimental and computational methodologies and tools applied in the area and provide an outlook beyond the protein class of proteases.
RESULTS AND CONCLUSION: We discuss the future potential, synergies and needs of the emerging overlapping disciplines of proteomics and structure-based modelling. Copyright © Bentham Science Publishers.

Keywords: Macromolecular recognition; molecular modelling; peptide binding; protease substrate profiling; protein-protein-interface; specificity

Year: 2017 PMID: 27455965 PMCID: PMC5898033 DOI: 10.2174/1389203717666160724211231

Source DB: PubMed Journal: Curr Protein Pept Sci ISSN: 1389-2037 Impact factor: 3.272

INTRODUCTION

Proteolytic enzymes (proteases, proteinases, peptidases) cleave peptide bonds in substrate proteins and peptides and fulfil multiple central roles in living organisms [1]. Accordingly, specialized proteases with diverse functions and distinct catalytic classes and fold types have evolved [2]. Some enzymes process well-defined substrates at specific sites and are involved in signalling cascades, e.g., the blood clotting cascade [3], the apoptosis pathway [4] or the complement cascade [5]. Other proteases cleave a variety of substrates and are required for the digestion of dietary proteins [6] as well as the degradation of extracellular matrix proteins [7]. The range of substrates (“degradome”) therefore defines the biological function of proteolytic enzymes [8] and turns them into attractive drug targets [9]. Proteases are of interest not only for their biological functions but also as prototypes of multi-specific protein-protein interfaces [10]. A multitude of protease substrate sequences has been reported in the scientific literature [11] and gathered in publicly available online databases (MEROPS [12], CutDB [13], PMAP [14], DegraBase [15], TopFIND [16]). The information content of MEROPS, its access and utilization, also with respect to protease substrate specificity, has recently been reviewed by the curators of the database [17]. Consensus substrate sequences in the P4-P4' amino acid positions [18] flanking the scissile bond of protease substrates are often depicted as heat maps [19], sequence logos [20], or iceLogos [21] (see Fig. for an example sequence logo for the serine protease factor Xa generated with Weblogo [22]). Recently, the Skylign web server was launched to facilitate generation and interactive manipulation of sequence logos [22]. As of December 2014, MEROPS lists 13,768 substrates for the unspecific serine protease trypsin-1 alone, with the vast majority stemming from proteomics-based identification techniques [23, 24].
Several other proteolytic enzymes spanning different catalytic types are characterized with at least 1,000 annotated targets. These innovative experimental methodologies allow for rapid identification of proteolytic events at the proteome level using mass spectrometry and therefore increasingly broaden the range of available peptide substrate data [25-32]. The gathered amount of substrate data allows for quantification and direct comparison of protease specificity [33]. In combination with structure-based techniques, molecular determinants of macromolecular specificity and promiscuity can be identified and generalized from proteases to general protein-protein interfaces [34]. In the following review, we will outline technologies used on both the experimental and computational side and aim to judge future potential and challenges for this emerging field at the interface of proteomics and structural bioinformatics.

DEGRADOMICS METHODS AND DATA

Several approaches for the specificity profiling of proteases have been established. Importantly, the different strategies each have particular advantages and should be considered highly complementary. Determination of protease specificity is a fundamental step in biochemical characterization and provides the basis for the design of specific probes and inhibitors. For as yet uncharacterized, so-called “novel” proteases, powerful specificity profiling approaches enable rapid de-orphanizing and the establishment of robust activity assays. As outlined in the present review, the combination of positional specificity profiles with structural investigations and modern computational techniques is exceptionally powerful in providing a molecular understanding of peptide substrate recognition by proteolytic enzymes. On a basic level, protease specificity can be investigated with a small number of peptidic substrates. This is exemplified by an early study on matrix metalloproteases, in which a set of 16 synthetic octapeptides was used to assess the specificity of skin fibroblast collagenase [35]. The sequences of these peptides represent variations of known collagenase cleavage sites in proteins. However, the use of only a few peptidic substrates severely limits coverage of sequence diversity and is intrinsically biased. Phage display is a powerful technique for the profiling of protein-peptide interactions and has been adopted to identify preferred peptidic substrates for proteolytic enzymes [36]. Randomized, genetically encoded sequences are expressed as protease-sensitive linkers between an affinity domain and a truncated form of the gene III protein. Each phage particle encodes one substrate sequence. Efficient cleavage of the substrate sequence in the linker region removes the affinity domain, thus enabling separation of phage particles with cleavable sequences.
Substrate phage display enables extensive coverage of sequence diversity and iterative refinement of protease specificity profiles. The method has been widely adopted. Further developments include bacterial display in combination with fluorescence-activated cell sorting [37] and automated platforms for increased throughput, enabling profiling of entire protease families, such as matrix metalloproteases [38]. While phage display is a genetic approach to generate sequence diversity, complementary techniques have been developed based on synthetic peptide libraries. Here, three approaches stand out: positional scanning synthetic combinatorial libraries (PS-SCLs), peptide nucleic acid (PNA) arrays, and mixture-based oriented peptide libraries. The PS-SCL strategy employs peptide libraries in which one position (e.g. P1) is occupied by a defined residue while the other positions are randomized [39]. The aim is to profile the specificity of the defined position without interference from the randomized other positions. Important refinements of the PS-SCL strategy include a more versatile chemical synthesis, allowing for randomization of the P1 residue, together with more sensitive fluorescence detection. The typical design of a PS-SCL experiment focuses on either prime or non-prime specificity. However, some proteases such as matrix metalloproteases (MMPs) have a specificity profile that spans the scissile peptide bond. Turk et al. [40] designed a two-step strategy to tackle such cases using synthetic peptide libraries. First, prime-site specificity is profiled with an N-terminally protected degenerate peptide library. Cleavage products possess free N-termini and are analyzed by Edman sequencing, while chemical protection of uncleaved peptides prevents Edman degradation. The mixed signals stemming from Edman sequencing of the library pool are translated into subsite preferences. The preferred prime-site motif is used in a second step.
Here, the library consists of fixed prime-site and randomized non-prime site residues. The library has free N-termini and C-terminal biotin tags. The fixed prime-site motif serves as an anchor to define the scissile peptide bond. Proteolysis releases the non-prime cleavage product while uncleaved peptides and prime-site cleavage products are captured by biotin affinity chromatography. Non-prime cleavage products are analyzed by Edman sequencing, yielding positional preferences. The approach has been particularly useful in characterizing metalloproteases [40-42]. PNAs employ peptidic substrates that are tagged to specific nucleic acid sequences. Proteolysis removes a terminal fluorophore. Subsequently, PNAs are hybridized to a complementary, spatially deconvoluting microarray [43]. Lack of fluorescence at a specific position indicates preferential proteolysis of that specific peptide sequence. The approach typically employs hundreds to thousands of different peptide sequences [44]. In contrast to synthetic peptide libraries, proteome-derived peptide libraries employ natural sequence diversity. Protease specificity profiling using proteome-derived peptide libraries is highlighted in (Fig. ) [24]. Peptide libraries are generated by endoproteolytic digestion of proteomes such as cell lysates, thereby representing natural sequence diversity. After inactivation of the digestion protease and dimethylation of α− and ε−amines, peptide libraries are incubated with a test protease. Prime-site cleavage products possess free N-termini, allowing biotinylation and specific retrieval, followed by mass-spectrometry based identification. The corresponding non-prime site sequences are inferred from proteome sequence databases. Thus, this approach profiles prime- and non-prime specificity in a single experiment, directly identifies the position of the scissile peptide bond, and retrieves individual cleavage sequences rather than pooled preferences. 
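The database inference step described above (recovering the non-prime side of a cleavage site from an identified prime-side peptide) can be illustrated with a minimal sketch. It assumes exact, unique string matching against a single protein sequence; the function name `infer_cleavage_window`, the padding character and the toy sequences are hypothetical and are not taken from the cited workflow [24], which operates on full proteome databases and search-engine output.

```python
def infer_cleavage_window(protein_seq, prime_peptide, flank=4, pad="-"):
    """Return (non_prime, prime) windows around the inferred scissile bond.

    The identified peptide is assumed to start at P1'. Returns None when the
    peptide is absent or occurs more than once (ambiguous assignment).
    """
    start = protein_seq.find(prime_peptide)
    if start == -1 or protein_seq.find(prime_peptide, start + 1) != -1:
        return None
    # Residues preceding the peptide are P4..P1; pad if near the N-terminus.
    non_prime = protein_seq[max(0, start - flank):start].rjust(flank, pad)
    # First residues of the peptide are P1'..P4'; pad short peptides.
    prime = prime_peptide[:flank].ljust(flank, pad)
    return non_prime, prime


# Hypothetical toy protein and identified prime-side peptide.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
window = infer_cleavage_window(protein, "QISFVK")
print(window)
```

Peptides that map to several positions are rejected here for simplicity; a real pipeline would need an explicit policy for such ambiguous matches.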
The method often identifies hundreds of cleavage sequences for a test protease and has been used for serine-, cysteine-, metallo- and aspartyl-proteases [24, 45] as well as for the deorphanizing of previously uncharacterized proteases [46]. Identification of large arrays of cleavage sequences enables investigation of subsite cooperativity, as highlighted for HIV protease 1 (see Fig. ): here, the presence of a “large” residue in P1 favors acceptance of a “small” residue in P3 and vice versa. This effect was originally described by Ridky et al. [47] and is correctly portrayed by specificity profiling with proteome-derived peptide libraries [24]. Notably, few other methods enable direct assessment of protease subsite cooperativity. Adaptations of the technique include multiplexed stable isotope tagging for kinetic investigations [48]. Proteome-derived peptide libraries have also been used to investigate the specificity of carboxypeptidases and Nα-acetyltransferases [49, 50]. In both cases, sophisticated chromatographic strategies were employed to retrieve modified peptides. Mass spectrometry-based identification of cleavage products has also been used in combination with synthetic rather than proteome-derived peptide libraries [51]. Importantly, proteome-derived peptide libraries do not indicate processing of native substrates. To this end, powerful approaches exist that identify and quantify protease cleavage sites in cells and tissues. Since identification of proteinaceous substrates in physiologic or pathologic settings exceeds the scope of the present review, we refer to two other reviews for a discussion of these degradomic techniques [52, 53]. However, detailed knowledge of protease specificity is useful in delineating candidate proteases that are able to mediate cleavage events identified in vivo. This concept has already been implemented for peptidomic analyses [54].
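As an illustration of how subsite cooperativity might be read out from a list of individual cleavage sequences, the sketch below computes a log odds ratio for the co-occurrence of a large P1 residue and a small P3 residue. The residue size classes, the function name `log_odds_p1_p3` and the example windows are assumptions made for illustration only; they are not the analysis used in [24] or [47].

```python
import math

# Assumed, simplified residue size classes (illustrative only).
LARGE = set("FWYLMIRK")
SMALL = set("GASTC")


def log_odds_p1_p3(non_prime_windows, pseudocount=0.5):
    """Natural-log odds ratio for 'P3 small' given 'P1 large' vs. 'P1 not large'.

    Each window is a four-letter string P4 P3 P2 P1. A positive value means
    that large P1 and small P3 residues co-occur more often than expected
    from independent subsites.
    """
    a = b = c = d = 0
    for w in non_prime_windows:
        p3_small = w[1] in SMALL
        p1_large = w[3] in LARGE
        if p1_large and p3_small:
            a += 1
        elif p1_large:
            b += 1
        elif p3_small:
            c += 1
        else:
            d += 1
    # A pseudocount of 0.5 per cell (Haldane-Anscombe correction) avoids
    # division by zero for empty cells.
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Hypothetical windows in which large P1 pairs with small P3.
coupled = ["AGAF", "AAAL", "AFAG", "ALAA"]
print(log_odds_p1_p3(coupled))
```

With real data, the statistical significance of such a 2x2 table would also have to be assessed, and the size classes would need a justified definition.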

QUANTIFYING, MAPPING AND COMPARING SPECIFICITY

Experimental peptide substrate data can be utilized to quantify protease specificity. Based on the probabilities of each amino acid in each protease subpocket, an information entropy termed “cleavage entropy” can be calculated [30]. These continuous values between zero (perfectly specific) and one (completely unspecific) highlight positions of specificity in protease substrate recognition and can be readily projected onto the binding site where protein-substrate co-crystal structures are available (see Fig. ). Because most proteases show a canonical orientation for the substrate peptide in extended beta conformation around the scissile bond for at least 6 amino acids (P3-P3') [55, 2], amino acid side-chains explore overlapping regions of the substrate binding cleft. Effects on substrate turnover have recently been demonstrated for regions far beyond this central region in human and mouse granzyme B [56]. Specificity landscapes form the basis for investigation of biomolecular recognition processes and can even be utilized to rationalize virtual screening results targeting proteases [57]. Alternatively, protease substrate data may be exploited to map distances between proteolytic enzymes (the degradome) in substrate space [33]. Thereby, similarities in substrate recognition can be identified in the absence of protease sequence or structural similarity. Here, peptide substrate data show a similar information content to structural information [58]. Substrate-derived protease similarities have also been successfully employed as a lead discovery technique for novel protease inhibitors [59]. Substrate-driven mapping strategies have recently been explored in a similar way for kinase specificity mapping [60].
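A per-position information entropy of the kind described in this section can be sketched as a Shannon entropy with logarithm base 20, computed from aligned substrate sequences. This is a minimal illustration under stated assumptions: the function name `positional_entropy` is hypothetical, no background amino acid frequencies or sampling corrections are applied, and the exact formulation in the cited work [30] may differ. With this plain normalization, an invariant position scores 0 and a position populated uniformly by all 20 amino acids scores 1; a score running in the opposite direction would simply be 1 minus this value.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_entropy(substrates):
    """Normalized Shannon entropy (log base 20) for each aligned position.

    substrates: equal-length strings, e.g. P4-P4' windows of known cleavage
    sites. Characters outside the 20 standard amino acids are ignored.
    """
    length = len(substrates[0])
    if any(len(s) != length for s in substrates):
        raise ValueError("substrates must be aligned to equal length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in substrates if s[i] in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            entropies.append(float("nan"))  # no usable residues at this position
            continue
        entropies.append(
            sum(-(n / total) * math.log(n / total, 20) for n in counts.values())
        )
    return entropies


# Hypothetical two-position example: position 0 is always Arg, position 1
# is populated uniformly by all 20 amino acids.
toy = ["R" + aa for aa in AMINO_ACIDS]
print(positional_entropy(toy))
```

Because each position yields a single number, the values can be mapped directly onto the corresponding subpockets of a structure, which is the projection step mentioned in the text.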

CHARACTERIZING PROTEIN STRUCTURE AND DYNAMICS

Structural data for proteases and their complexes with substrates and inhibitors has increased dramatically in recent years. Currently, the Protein Data Bank [61] contains in total 105,212 entries (database accession 27.11.2014). There are 8,901 of these structures annotated as peptidases (with enzyme classification (EC) number 3.4), thus representing 8.5% of the whole database and 14.6% of all enzymes with annotated EC code (total 61,014 entries). These structural data form the basis for a molecular understanding of protease-substrate interactions as a model for biomolecular recognition. Binding sites of proteases may be computationally characterized in terms of static molecular properties like electrostatics [62], hydrophobicity [63] or cooperative hydration networks [64]. Size and surface properties of binding cavities can be similarly explored automatically [65, 66]. Different molecular probes can be used for the detection of interaction hot spots and their characterization [67]. See (Fig. ) for example mappings to the binding site of the serine protease thrombin [68]. Molecular properties were calculated and mapped using Molecular Operating Environment (MOE) [69]. All aforementioned approaches treat proteins and their binding sites as rigid bodies. By contrast, proteins are inherently flexible entities and explore a range of conformational states in physiological conditions [70, 71]. Therefore, static enthalpic driving forces of molecular recognition are complemented by entropic factors arising from the dynamics of the system [72]. Molecular dynamics simulations allow exploration of conformational dynamics of biological systems at atomistic level in silico [73] with increasing time scales [74] and accuracy [75]. Generated conformational ensembles of proteins can be utilized to characterize binding sites in terms of global and local flexibility and their respective time scales [76]. 
Conformational entropies calculated from positional fluctuations [77] or dihedral distributions [78, 79] allow flexible regions in binding sites to be identified. Additionally, state-of-the-art simulations are performed in the presence of water boxes surrounding the simulated systems (explicit solvation), so that the enthalpy and entropy of water molecules bound to binding site regions can be explored [80-83]. Hydration effects are especially interesting for proteases, as hydration is known to be key for protein stability and function [84, 85]. Recently, technologies using mixed solvents to probe binding site preferences for particular molecular fragments and allowing direct calculation of binding free energies have been developed [86, 87]. Experimental data also provide insights into binding site flexibility, albeit for a limited set of proteolytic enzymes. Nuclear magnetic resonance (NMR) spectroscopy has been successfully employed to characterize solution dynamics in bacterial subtilisins [88], thrombin [89], and HIV protease [90]. In addition, ensemble refinement of crystal structures has recently been utilized to investigate binding site conformations of complement factor D [91]. Broader future utilization of noise in electron densities for the identification of alternate protein conformations will help to characterize macromolecular flexibility based on crystallographic data [92].
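The idea of a conformational entropy derived from dihedral distributions, mentioned in this section, can be illustrated with a simple histogram estimate over angles sampled from a simulation trajectory. This is a sketch under stated assumptions: the function name `dihedral_entropy` and the bin count are hypothetical, the result is a relative, unitless measure in nats that depends on bin width, and the cited approaches [78, 79] may rely on different estimators.

```python
import math


def dihedral_entropy(angles_deg, n_bins=36):
    """Shannon entropy (nats) of a dihedral angle histogram.

    Angles are given in degrees and wrapped onto a periodic 360 degree
    range before binning. Higher values indicate a broader, more flexible
    distribution.
    """
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for angle in angles_deg:
        wrapped = (angle + 180.0) % 360.0  # periodic wrap onto [0, 360)
        index = min(int(wrapped // width), n_bins - 1)  # guard rounding at the edge
        counts[index] += 1
    total = len(angles_deg)
    return sum(-(c / total) * math.log(c / total) for c in counts if c)


# Hypothetical samples: a rigid dihedral locked near 60 degrees and a
# flexible one visiting every 10 degree bin equally often.
rigid = [60.0] * 100
flexible = [-175.0 + 10.0 * i for i in range(36)]
print(dihedral_entropy(rigid), dihedral_entropy(flexible))
```

Comparing such values residue by residue is one way to rank binding site regions by local flexibility, provided the same binning and sampling length are used throughout.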

SPECIFICITY OF PROTEASES

Substrate specificity of proteases has long been thought to be driven by static structural features only. Within the class of chymotrypsin-like serine proteases, single amino acids directly contacting the substrate in the S1 pocket were thought to explain specificity completely [93]. This class of proteases shares a common catalytic triad and protein framework with mostly unspecific pockets except for S1-P1 interactions [94]. Accordingly, the presence of an Asp residue in the S1 pocket of trypsin explains its specificity for Arg and Lys at P1, whilst chymotrypsin and elastase show a preference for hydrophobic amino acids at the same position [95]. Hedstrom et al. tested the simple paradigm that single residues direct specificity by attempting to convert trypsin to chymotrypsin by S1-directed mutagenesis [96]. Even after replacing the whole S1 pocket of trypsin with the corresponding residues of chymotrypsin, the specificity could not be exchanged entirely. Simple static effects are therefore not sufficient to explain protease specificity, and attention was directed towards adjacent surface loops [97]. Similarly, no unique solution was later found to exchange the specificity of trypsin and elastase, and it was hence concluded that protease specificity is difficult both to rationalize and to transfer [98]. The situation was complicated further by the discovery that the S1 pocket in elastase communicates with other subpockets [99]. Factors for protease specificity were therefore summarized by Hedstrom as follows [1]: “Catalysis and specificity are not simply controlled by a few residues, but are rather a property of the entire protein framework, controlled via the distribution of charge across a network of hydrogen bonds and perhaps also by the coupling of domain motion to the chemical transformation”. All the aforementioned findings point towards more complex origins of protease specificity beyond static structural factors.
Indeed, dynamic contributions have been described in the recognition of almost all catalytic classes ranging from the serine proteases subtilisins [88] and α-lytic protease [100] via the aspartic HIV protease [101] to snake venom metalloproteases [102]. Recently, quantitative metrics for substrate specificity allowed for direct correlation of binding site flexibility and substrate promiscuity. Thereby, a direct interplay of dynamics and substrate recognition was shown for effector caspases [34]. Here, correlations between receptor backbone dynamics and specificity were shown to be stronger than between hydrogen bonding occupancy and specificity. Recently, a propagation of protease dynamics to the first hydration layers that might explain the specificity profile has been shown for thrombin [103]. This highlights that dynamics are key to a molecular understanding of protease-substrate peptide interactions and macromolecular binding events in general. Flexible binding sites adopt more diverse conformations which leads to promiscuity when following a binding paradigm of conformational selection [104, 105].

IMPLICATIONS FOR GENERAL MACROMOLECULAR BINDING EVENTS BEYOND PROTEASES

Protein-protein interfaces are of the highest interest for both structural biology [106] and drug design [107]. The interface between proteolytic enzymes and their substrates is a well-studied example and therefore offers peptide data sets suitable for statistical analysis. In addition to the described analysis of the protease side of the protein-protein interface, the substrate side may be investigated by similar means, raising the question of which structural properties turn proteins into protease substrates. Here, broad proteolysis corresponding to non-specificity has been successfully used as a probe for thermal unfolding [108, 109], indicating strong links to local flexibility and accessibility [110]. Thereby, differences in protein dynamics and stability caused by mutations have been linked to different so-called conformational diseases [111-113]. Statistical analyses of glutamyl endopeptidase and caspase-3 cleavage sites revealed independence of cleavage sites from local secondary structure [114]. Exposure appears to be more crucial than flexibility and local interactions in allowing proteolysis [115]. Recently, limited proteolysis was coupled with targeted proteomics to describe conformational changes in proteins [116]. On the other hand, local unfolding events are required for some proteolytic events, thus preventing their direct identification by fluctuations around the native state [117, 118]. Proteomics techniques allow profiling of substrate spectra of enzyme classes beyond proteases. Recently, several techniques assessing substrate preferences of kinases by phosphoproteomics have been developed [119, 120]. Still, the data basis here is not yet as broad as for peptides binding to PDZ domains, where again correlations between receptor promiscuity and flexibility have been reported [121]. Similarly, the binding specificity of ephrins to the Eph receptor appears to be coupled to intrinsic dynamics [122], as does the binding specificity of ubiquitin [123].
Small molecule selectivities of G protein-coupled receptors have recently been successfully modelled and predicted based on structural descriptors [124]. Here, the number of disulfide bonds in the extracellular region seems to govern receptor promiscuity by determining the maximum ligand size in the entrance pathway. Comparably, affinity maturation of antibodies leading to proteins with increased affinity and selectivity is paralleled by a loss of flexibility [125]. Recently, even antibody degradation sites quantified via mass spectrometry data have been linked to local flexibility [126]. Intrinsically disordered proteins (IDPs) demonstrate the extreme case: Here, extreme flexibility leads to almost complete non-specific binding [127] and also higher susceptibility to proteolysis [128]. This extreme binding promiscuity is central for the function of IDPs as mediators for many cellular interaction networks in parallel [129].

CONCLUSION AND PERSPECTIVES

We have demonstrated how molecular determinants of specificity may be deduced from proteomics-derived peptide data. The joint application of proteomics and structure-based modelling approaches allows topical questions in the molecular biosciences to be tackled and provides further insights into protein-protein interactions at the molecular level. As for all data-driven modelling techniques, data accessibility and careful database curation are prerequisites for statistical analyses. We therefore encourage the scientific community to support and make use of public data resources and associated tools [130]. For the described studies, the required data range from crystal structures from the Protein Data Bank [61], via protease classification schemes from MEROPS [12] and their sequences from UniProt [131], to proteomics-derived peptide data. Here, the PRoteomics IDEntifications (PRIDE) database [132] allows uploading and annotation of large proteomics data sets that may be shared for further analysis via ProteomeXchange [133, 134]. Together with increasingly sophisticated experimental techniques to detect proteolytic events and their kinetics [135], increasingly broad and detailed analyses of their molecular origins can be performed. Thus, we are convinced that collaborations between experts in proteomics and structural bioinformatics will lead to a new understanding of macromolecular interactions and in turn to exciting novel opportunities for structure-based drug design.

Consent for Publication

Not applicable.

132 in total

1. Structural characterisation and functional significance of transient protein-protein interactions.

Authors: Irene M A Nooren; Janet M Thornton
Journal: J Mol Biol Date: 2003-01-31 Impact factor: 5.469

2. The energy landscapes and motions of proteins.

Authors: H Frauenfelder; S G Sligar; P G Wolynes
Journal: Science Date: 1991-12-13 Impact factor: 47.728

3. Biomolecular simulation: a computational microscope for molecular biology.

Authors: Ron O Dror; Robert M Dirks; J P Grossman; Huafeng Xu; David E Shaw
Journal: Annu Rev Biophys Date: 2012 Impact factor: 12.981

4. Stabilizing of a globular protein by a highly complex water network: a molecular dynamics simulation study on factor Xa.

Authors: Hannes G Wallnoefer; Sandra Handschuh; Klaus R Liedl; Thomas Fox
Journal: J Phys Chem B Date: 2010-06-03 Impact factor: 2.991

5. On the size of the active site in proteases. I. Papain.

Authors: I Schechter; A Berger
Journal: Biochem Biophys Res Commun Date: 1967-04-20 Impact factor: 3.575

6. Structural and kinetic determinants of protease substrates.

Authors: John C Timmer; Wenhong Zhu; Cristina Pop; Tim Regan; Scott J Snipas; Alexey M Eroshkin; Stefan J Riedl; Guy S Salvesen
Journal: Nat Struct Mol Biol Date: 2009-09-20 Impact factor: 15.369

7. MMP-20 is predominately a tooth-specific enzyme with a deep catalytic pocket that hydrolyzes type V collagen.

Authors: Benjamin E Turk; Daniel H Lee; Yasuo Yamakoshi; Andreas Klingenhoff; Ernst Reichenberger; J Timothy Wright; James P Simmer; Justin A Komisarof; Lewis C Cantley; John D Bartlett
Journal: Biochemistry Date: 2006-03-28 Impact factor: 3.162

8. Promiscuous and specific recognition among ephrins and Eph receptors.

Authors: Dandan Dai; Qiang Huang; Ruth Nussinov; Buyong Ma
Journal: Biochim Biophys Acta Date: 2014-07-10

9. Human immunodeficiency virus, type 1 protease substrate specificity is limited by interactions between substrate amino acids bound in adjacent enzyme subsites.

Authors: T W Ridky; C E Cameron; J Cameron; J Leis; T Copeland; A Wlodawer; I T Weber; R W Harrison
Journal: J Biol Chem Date: 1996-03-01 Impact factor: 5.157

10. Structure-based prediction of asparagine and aspartate degradation sites in antibody variable regions.

Authors: Jasmin F Sydow; Florian Lipsmeier; Vincent Larraillet; Maximiliane Hilger; Bjoern Mautz; Michael Mølhøj; Jan Kuentzer; Stefan Klostermann; Juergen Schoch; Hans R Voelger; Joerg T Regula; Patrick Cramer; Apollon Papadimitriou; Hubert Kettenberger
Journal: PLoS One Date: 2014-06-24 Impact factor: 3.240