
The alphabet of intrinsic disorder: I. Act like a Pro: On the abundance and roles of proline residues in intrinsically disordered proteins.

Francois-Xavier Theillet¹, Lajos Kalmar², Peter Tompa^2,3, Kyou-Hoon Han^4,5, Philipp Selenko¹, A Keith Dunker⁶, Gary W Daughdrill⁷, Vladimir N Uversky^8,9.

Abstract

A significant fraction of every proteome is occupied by biologically active proteins that do not form unique three-dimensional structures. These intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) have essential biological functions and are characterized by extensive structural plasticity. Such structural and functional behavior is encoded in the amino acid sequences of IDPs/IDPRs, which are enriched in disorder-promoting residues and depleted in order-promoting residues. In fact, amino acid residues can be arranged according to their disorder-promoting tendency to form an alphabet of intrinsic disorder that defines the structural complexity and diversity of IDPs/IDPRs. This review is the first in a series of publications dedicated to the roles that different amino acid residues play in defining the phenomenon of protein intrinsic disorder. We start with proline because data suggests that of the 20 common amino acid residues, this one is the most disorder-promoting.


Keywords: cis-trans isomerization; conformational restriction; intrinsically disordered protein; post-translational modification; protein solubility; protein surfaces

Year: 2013 PMID: 28516008 PMCID: PMC5424786 DOI: 10.4161/idp.24360

Source DB: PubMed Journal: Intrinsically Disord Proteins ISSN: 2169-0707

Introduction

Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) have recently become a hot topic in molecular and structural biology. Computational analyses show that about 10–20% of full-length eukaryotic proteins are IDPs and that 25–40% of all protein residues are classified as IDPRs. Furthermore, more than half of IDPs experimentally characterized by NMR are in fact IDPRs. Although IDPs/IDPRs do not form regular three-dimensional structures on their own, they are nevertheless associated with various important cellular roles and implicated in a number of prominent human diseases. The unique structural properties of IDPs/IDPRs require new methods for their analysis and new concepts for understanding their functions. Structural and functional properties of a protein are encoded by the alphabet of the 20 naturally occurring amino acids. Therefore, to understand the unique structural and functional properties of IDPs/IDPRs, it is necessary to determine how their amino acid sequences differ from those of ordered proteins. A number of research groups, including ours, have interrogated this problem using computational methods and determined that the amino acid compositions of IDPs and IDPRs are biased in relation to ordered proteins. Based on these studies, the concept of “order-promoting” residues (cysteine, tryptophan, tyrosine, isoleucine, phenylalanine, valine, leucine, histidine, threonine, asparagine) and “disorder-promoting” residues (aspartic acid, methionine, lysine, arginine, serine, glutamine, proline, glutamic acid) has been proposed. From a physico-chemical point of view, the majority of order-promoting residues are non-polar and commonly found within the hydrophobic cores of ordered proteins, whereas the majority of disorder-promoting residues are polar, often charged, and commonly found on the surfaces of ordered proteins. 
This notion is consistent with our current understanding of the highly dynamic structures of IDPs/IDPRs, which do not form stable hydrophobic cores and probably expose most of their amino acids to the solvent. An important exception to the polar or charged tendency just stated is proline, which is the most disorder-promoting residue despite the non-polar nature of its side chain. The differences in composition between ordered and disordered proteins are coupled to distinct evolutionary patterns, with IDPs and IDPRs typically displaying higher global mutation rates than ordered proteins. Despite this, some IDP residues, such as aromatic amino acids (tryptophans, tyrosines, and phenylalanines), leucines and prolines, are well conserved. With the exception of prolines, all of these conserved residues are generally less abundant in IDPs than in ordered proteins. Conserved aromatic and hydrophobic IDP residues are frequently found in protein segments with molecular recognition features (MoRFs) and in pre-structured motifs (PreSMos). MoRFs are short IDPRs that fold upon binding to other proteins, as well as to DNA. MoRFs determine the functions of many IDPs because they define specific protein-protein interaction surfaces, which likely explains their higher degree of evolutionary conservation. Figure 1 and Table 1 show the statistics of amino acid compositions of proteins in four standard data sets (Swiss-Prot, PDB Select 25, surface residues and DisProt), where Figure 1A recapitulates Table 1 in graphical form and Figure 1B shows the compositional differences between the structured and disordered data sets. The Swiss-Prot database (UniProtKB/Swiss-Prot) was chosen because it contains sequence and functional information on ~550,000 proteins from all kingdoms of life and therefore represents the unbiased distribution of amino acids throughout nature. PDB Select 25 contains a representative set of PDB entries with less than 25% sequence identity. 
This database was chosen because of its bias toward “structural” proteins that are likely to crystallize. Surface residues were determined with the Molecular Surface Package applied to a number of PDB structures of monomeric proteins that were found suitable for studying biological activities associated with protein surface properties, such as protein binding. Finally, the DisProt database comprises entries of proteins and protein regions that have been experimentally verified to be intrinsically disordered. Figure 1A and Table 1 show that the average proline contents in these four data sets are 4.83 ± 0.03%, 4.57 ± 0.05%, 5.6 ± 0.1% and 8.1 ± 0.6%, respectively (cprofiler.org/help.html). Hence, IDPs contain, on average, 1.7 and 1.8 times more prolines than proteins in UniProt and PDB Select 25, respectively. Furthermore, the overall proline content in IDPs is 1.4 times higher than on the surfaces of folded proteins.
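The enrichment factors quoted above follow directly from the mean proline contents listed in Table 1. The short Python sketch below reproduces that arithmetic; the dictionary keys and the function name `fold_enrichment` are illustrative choices made here and are not part of the original analysis.

```python
# Mean proline content (%) of the four data sets, copied from Table 1.
PROLINE_CONTENT = {
    "SwissProt": 4.83,
    "PDB_S25": 4.57,
    "Surface": 5.63,
    "DisProt": 8.11,
}


def fold_enrichment(reference: str, target: str = "DisProt") -> float:
    """Return how many times more proline `target` contains than `reference`."""
    return PROLINE_CONTENT[target] / PROLINE_CONTENT[reference]


# Print the DisProt enrichment relative to each of the other data sets.
for name in ("SwissProt", "PDB_S25", "Surface"):
    print(f"DisProt vs {name}: {fold_enrichment(name):.2f}-fold")
```

Rounded to one decimal place, the three ratios correspond to the 1.7-, 1.8- and 1.4-fold values given in the text.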

Table 1. Amino acid compositions of the standard data sets (modified from ref. 37)

Residue^a	Disorder propensity^b	SwissProt^c	PDB S25^d	Surface residues^e	DisProt^f
Cys (C)	0.000	1.50 ± 0.02	1.74 ± 0.05	0.78 ± 0.04	0.80 ± 0.08
Trp (W)	0.004	1.13 ± 0.01	1.44 ± 0.03	1.33 ± 0.05	0.67 ± 0.06
Ile (I)	0.090	5.90 ± 0.04	5.61 ± 0.06	2.77 ± 0.07	3.24 ± 0.13
Tyr (Y)	0.113	3.03 ± 0.02	3.50 ± 0.04	3.58 ± 0.08	2.13 ± 0.15
Phe (F)	0.117	3.96 ± 0.03	3.98 ± 0.04	2.38 ± 0.05	2.44 ± 0.13
Leu (L)	0.195	9.65 ± 0.04	8.68 ± 0.08	5.11 ± 0.08	6.22 ± 0.25
His (H)	0.259	2.29 ± 0.02	2.41 ± 0.04	2.60 ± 0.06	1.93 ± 0.11
Val (V)	0.263	6.73 ± 0.03	6.72 ± 0.06	4.01 ± 0.06	5.41 ± 0.44
Asn (N)	0.285	4.13 ± 0.04	4.58 ± 0.06	6.23 ± 0.15	3.82 ± 0.27
Met (M)	0.291	2.38 ± 0.02	2.22 ± 0.04	1.13 ± 0.04	1.87 ± 0.10
Arg (R)	0.394	5.40 ± 0.04	4.93 ± 0.06	6.56 ± 0.13	4.82 ± 0.23
Thr (T)	0.401	5.41 ± 0.02	5.63 ± 0.05	6.08 ± 0.11	5.56 ± 0.24
Asp (D)	0.407	5.35 ± 0.03	5.83 ± 0.05	8.18 ± 0.10	5.80 ± 0.30
Gly (G)	0.437	6.96 ± 0.04	7.16 ± 0.07	7.06 ± 0.11	7.41 ± 0.40
Ala (A)	0.450	7.89 ± 0.05	7.70 ± 0.08	6.03 ± 0.13	8.10 ± 0.35
Lys (K)	0.588	5.92 ± 0.05	6.37 ± 0.08	9.75 ± 0.16	7.85 ± 0.45
Gln (Q)	0.665	3.95 ± 0.03	3.95 ± 0.05	5.21 ± 0.09	5.27 ± 0.37
Ser (S)	0.713	6.83 ± 0.04	6.19 ± 0.06	6.87 ± 0.13	8.65 ± 0.43
Glu (E)	0.781	6.67 ± 0.04	6.65 ± 0.07	8.70 ± 0.17	9.89 ± 0.61
Pro (P)	1.000	4.83 ± 0.03	4.57 ± 0.05	5.63 ± 0.10	8.11 ± 0.63

a Residues are arranged in order of increasing intrinsic disorder propensity; b Disorder propensity is calculated from the fractional difference in amino acid composition between the disordered and ordered proteins, renormalized to lie between 0 and 1; c SwissProt 51 is closest to the distribution of amino acids in nature among the four data sets; d PDB Select 25 is a subset of proteins from the Protein Data Bank with less than 25% sequence identity, biased toward the composition of proteins amenable to crystallization studies; e Surface residues determined by the Molecular Surface Package over a sample of PDB structures of monomeric proteins suitable for protein surface analysis; f DisProt 3.4 comprises a set of experimentally determined disordered regions.

Figure 1. Amino acid determinants defining structural and functional differences between the ordered and intrinsically disordered proteins. (A) Amino acid compositions of several data sets discussed in the text (DisProt, UniProt, PDB Select 25 and surface residues). (B) Fractional difference in the amino acid composition (compositional profile) between the typical IDPs from the DisProt database and a set of completely ordered proteins, calculated for each amino acid residue. The fractional difference was evaluated as (C_DisProt − C_PDB)/C_PDB, where C_DisProt is the content of a given amino acid in the DisProt database and C_PDB is the corresponding content in the data set of fully ordered proteins. Positive bars correspond to residues found more abundantly in IDPs, whereas negative bars show residues in which IDPs are depleted. Amino acid types were ranked according to their decreasing disorder-promoting potential.

Figure 1B shows that proline exhibits the largest fractional change between structured and disordered proteins, and the fractional changes for the various residues provide the basis for estimating the disorder propensities given in Table 1 (see Table 1, footnote b). 
Indeed, the disorder propensities here yield the same P, E and S ranking for the most disorder-promoting residues as obtained in a previous study, while the remaining amino acids show some alterations in ranking compared with the previous study, especially amino acids with similar disorder propensity values. Of course, such estimates depend on both the methods used and the sets of proteins in the databases, both of which differed significantly between the previous study and this one. Overall, the disorder propensity rankings of the two studies differ in detail, but these differences are not significant. This article starts a series of publications on the alphabet of intrinsic disorder, which is dedicated to exploring the amino acid determinants of intrinsic protein disorder. Here, we review the functions of prolines in IDPs/IDPRs and provide compelling evidence for proline-specific biological activities that may explain their high levels of abundance and conservation in disordered proteins and protein regions.
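The calculation described in the Figure 1B legend and in footnote b of Table 1 can be sketched in a few lines of Python. The sketch rests on two assumptions that are not stated explicitly in the text: that the PDB Select 25 column of Table 1 serves as the ordered reference set, and that "renormalizing" means a min-max rescaling of the fractional differences. The names `COMPOSITION`, `fractional_difference` and `disorder_propensity` are illustrative.

```python
# (PDB Select 25 %, DisProt %) for each residue, copied from Table 1.
COMPOSITION = {
    "C": (1.74, 0.80), "W": (1.44, 0.67), "I": (5.61, 3.24), "Y": (3.50, 2.13),
    "F": (3.98, 2.44), "L": (8.68, 6.22), "H": (2.41, 1.93), "V": (6.72, 5.41),
    "N": (4.58, 3.82), "M": (2.22, 1.87), "R": (4.93, 4.82), "T": (5.63, 5.56),
    "D": (5.83, 5.80), "G": (7.16, 7.41), "A": (7.70, 8.10), "K": (6.37, 7.85),
    "Q": (3.95, 5.27), "S": (6.19, 8.65), "E": (6.65, 9.89), "P": (4.57, 8.11),
}


def fractional_difference(ordered: float, disordered: float) -> float:
    """(C_DisProt - C_PDB) / C_PDB, as defined in the Figure 1B legend."""
    return (disordered - ordered) / ordered


def disorder_propensity(composition: dict) -> dict:
    """Rescale fractional differences to [0, 1] (assumed min-max normalization)."""
    diffs = {aa: fractional_difference(o, d) for aa, (o, d) in composition.items()}
    low, high = min(diffs.values()), max(diffs.values())
    return {aa: (value - low) / (high - low) for aa, value in diffs.items()}


propensity = disorder_propensity(COMPOSITION)
# List residues from most to least disorder-promoting.
for aa in sorted(propensity, key=propensity.get, reverse=True):
    print(f"{aa}: {propensity[aa]:.3f}")
```

Under these assumptions the rescaled values for the residues spot-checked here (Cys, Ile, Val, Arg, Lys, Ser, Glu, Pro) agree with the disorder propensity column of Table 1 to within rounding, and the three most disorder-promoting residues come out as P, E and S.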

Structural Properties of Prolines

Chemical structure of prolines

Among the 20 natural amino acids, proline is unique in that it is the only imino acid; that is, the proline backbone nitrogen is bound to two alkyl carbons and lacks the usual proton (see Fig. 2). Proline’s distinctive cyclic structure renders the backbone conformation more rigid than in any other amino acid. Hence, proline peptide bonds exhibit structural features that differ substantially from those of other residues. In addition, because they do not contain backbone amide hydrogen atoms at physiological pH, they do not form stabilizing hydrogen bonds in α-helices or β-sheets. In consequence, prolines are rarely found as integral parts of secondary structure elements, but rather at the ends of α-helices or in protein loop regions. Their characteristic backbone angle properties and unique structural properties in proteins and polypeptides (see below) also give rise to atypical Ramachandran plot features. Prolines sample restricted areas of the Ramachandran space, which are primarily defined by their backbone pyrrolidine constraints. They also exert pronounced effects on the backbone geometries of the residues preceding them, i.e., pre-prolines.

Figure 2. Chemical structure of peptide fragments in trans (A) and (C) and cis conformation (B) and (D); (C) and (D) show a proline-containing fragment. The red arrows point out the steric hindrances between the Cα of the residue (−1) and the amide H (A) or the Cα of the residue (0) (B) for the non-proline-containing peptides, and between the Cα of the residue (−1) and the Cδ (C) or the Cα of the proline (D). Ramachandran plots of non-proline, non-glycine, non-isoleucine, non-valine residues (E) and proline residues (F) result from the analysis of 1.5 million residues in 8,000 protein chains with resolution < 2 Å and backbone B-factors < 30. The contours separate the “outlier,” “allowed” and “favored” regions of the Ramachandran plots. The Ramachandran plots were adapted from commons.wikimedia.org/wiki/User:Dcrjsr. The β-strand (β), α-helix (α), α-L-helix (αL) and poly-proline II (PPII) regions of the Ramachandran plots are indicated, and a representation of a model poly-proline II helix is shown.

Cis-trans isomerization

Although most amino acids form peptide bonds that are in their trans-isomer conformations (> 99.5%), Xaa-Pro peptide bonds populate both cis- and trans-states. Xaa-Pro trans isomers are indeed less favored than the trans isomers of other peptide bonds because of relatively high steric conflicts between Xaa-Cα and Pro-Cδ atoms (see Fig. 2). The energy differences between proline cis/trans conformers are less pronounced than in other amino acids, which, in connection with a high energy barrier between the two isomers (~20 kcal/mol), results in slow cis/trans interconversion rates (10⁻³ s⁻¹). Hence, on average, ordered proteins contain 5–10% cis-conformers of the Xaa-Pro peptide bonds, whereas the occurrence of cis-isoforms of usual amide bonds in proteins is typically below 0.5%. The cis-isomer content is influenced by the nature of the surrounding residues and by the types of surrounding secondary structure. Despite these similar energy levels in disordered peptides, prolines in natively folded proteins tend to display exclusive cis- or trans-conformations, which are primarily established via the protein fold and the resulting specific interactions with residues close in space. Within protein Xaa-Pro motifs, Cα(Xaa)/Cα(Pro) distances of trans-proline conformations are on average 1.5 Å larger than for cis-proline isomers; however, these effects are not systematic and are strongly influenced by the nature of Xaa. In most folded proteins, isomer-specific structural changes are local and vanish at a distance of 2–3 residues from the proline of interest. More extended conformational rearrangements have been observed in only a few cases. From a local point of view, the effects that proline cis/trans isomers induce in polypeptide chains are important. Cis-isoforms result in turn-like structures, whereas trans-isoforms favor locally extended conformations (see Fig. 2). In protein folding, cis/trans isomerization plays an important role and often functions as the rate-limiting step in the overall folding process. 
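To connect the quoted barrier height with the quoted interconversion rate, one can use the Eyring equation from transition-state theory, k = (k_B·T/h)·exp(−ΔG‡/RT). This relation is not used in the original text; it is brought in here as a standard textbook estimate, with a transmission coefficient of 1 and a temperature of 25 °C as explicit assumptions. The function name `eyring_rate` is illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """First-order rate constant (s^-1) from the Eyring equation.

    Assumes a transmission coefficient of 1; the barrier is a free energy
    of activation in kcal/mol.
    """
    prefactor = K_B * temperature_k / H  # attempt frequency, s^-1
    return prefactor * math.exp(-barrier_kcal_per_mol / (R_KCAL * temperature_k))


# Scan a few barrier heights around the ~20 kcal/mol quoted in the text.
for barrier in (19.0, 20.0, 21.0, 22.0):
    print(f"{barrier:.1f} kcal/mol -> {eyring_rate(barrier):.2e} s^-1")
```

Under these assumptions a 20 kcal/mol barrier corresponds to a rate on the order of 10⁻² s⁻¹, and every additional ~1.4 kcal/mol slows the reaction about tenfold, so barriers of roughly 20 to 22 kcal/mol span the 10⁻² to 10⁻³ s⁻¹ range. The estimate is therefore consistent with the slow interconversion described above only at the order-of-magnitude level.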
Important cellular enzymes such as peptidyl-prolyl isomerases (PPIases) accelerate proline isomerization processes and thereby enhance the kinetic rates with which thermodynamic equilibrium states are reached. The relationships between PPIases and IDPs will be discussed in more detail, later in the article. One aspect that we want to stress is that proline cis-trans characteristics and behaviors of IDPs are similar to those of peptides. IDPs display cis population averages of ~5–10% and, therefore, IDPs with 10 or more prolines have high probabilities for multiple cis conformations. This creates substantial diversity in population conformers that sample a vast conformational space.
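The statement about multiple cis conformers can be made concrete with a simple binomial estimate. The sketch below assumes that each Xaa-Pro bond isomerizes independently and with the same cis probability, which is a simplification, since the cis content depends on neighboring residues as noted above. The function names are illustrative.

```python
from math import comb


def prob_at_least_k_cis(n_prolines: int, k: int = 1, p_cis: float = 0.1) -> float:
    """Probability that at least k of n Xaa-Pro bonds are cis.

    Assumes independent bonds with an identical cis probability p_cis.
    """
    p_fewer = sum(
        comb(n_prolines, i) * p_cis**i * (1 - p_cis) ** (n_prolines - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


def n_isomer_states(n_prolines: int) -> int:
    """Number of distinct cis/trans patterns for n Xaa-Pro bonds."""
    return 2 ** n_prolines


# Compare the two ends of the 5-10% cis range for a chain with 10 prolines.
for p in (0.05, 0.10):
    print(
        f"p_cis={p:.2f}: P(>=1 cis)={prob_at_least_k_cis(10, 1, p):.3f}, "
        f"P(>=2 cis)={prob_at_least_k_cis(10, 2, p):.3f}"
    )
print("cis/trans patterns for 10 prolines:", n_isomer_states(10))
```

The number of possible cis/trans patterns grows as 2ⁿ, which illustrates how even a modest number of prolines multiplies the conformational sub-ensembles available to a disordered chain.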

On the hydrophobicity of the proline residue

In the initial development of hydrophobicity scales, the backbone was considered to be constant for all of the amino acids, and thus only the side chain was considered to contribute to the values of the scale. However, with regard to residue hydrophobicity, the proline imine brings the backbone into play. That is, when a typical amino acid residue is buried, its backbone has both hydrogen bond donors and acceptors, leading to helices, sheets, turns or other structures in which the backbone hydrogen bonding potential is self-satisfied. For proline, on the other hand, the backbone has hydrogen bond acceptors but no donors, and for this reason it is energetically costly to sequester the proline backbone from the solvent. The consequence of this donor/acceptor imbalance in the backbone is that, compared with valine, the other amino acid with a side chain containing 3 aliphatic carbons, proline is less frequently buried and more frequently found on protein surfaces (Table 1; Fig. 1B). In this regard, the solubility of the individual amino acids is generally inversely correlated with hydrophobicity, yet proline is by far the most soluble of the amino acids at neutral pH. Furthermore, polyproline is much more water soluble than polyglycine, polyalanine and polyleucine because polyproline lacks an NH group. Thus, despite its hydrophobic side chain, the proline residue is very hydrophilic.

Prolines in IDPs/IDPRs: Structural and Functional Roles

The polyproline type II helix as a unique binding interface

The unusual chemistry of prolines imposes several constraints on neighboring residues, and proline-rich motifs (PRMs) have high propensities for adopting non-classical conformations such as the polyproline type II (PPII) helix. PPII helices are left-handed, extended structures that contain three residues per turn and no internal hydrogen bonding. They are surprisingly abundant structural scaffolds in virtually every proteome. Even ordered proteins contain short PPII stretches, and PPII backbone dihedral angles (−75°, 150°) are frequently observed in amino acids other than prolines. In PPII helices, side-chain and backbone carbonyls are solvent-exposed and often engage in intermolecular hydrogen bonds, thereby mediating generic intermolecular recognition events of rather low ligand specificities. In turn, a great number of proline-recognition domains (PRDs) interact with PRMs and PPII helices, among which SH3 and WW domains are probably the best-known examples. The giant human protein titin, with a total of 34,000 amino acids, contains ~550 SH3 binding motifs, of which ~100 are found in PRMs. PPII-mediated interactions regulate diverse sets of particular cellular functions. A statistical analysis of 74 scaffolding proteins, for example, revealed that this class of proteins has predicted degrees of disorder (i.e., 49.7% by IUPred, 63.36% by VSL2 and 47.82% by FoldIndex) that are comparable to those of highly disordered classes of proteins, such as transcription factors and RNA chaperones. Furthermore, 26 of the most disordered scaffolding proteins had average proline contents of 11.2 ± 0.4%, which appears to predispose PRM-proteins to function as hubs in protein-protein interaction networks. PRMs, or polyproline regions (PPRs), are also found in the proteomes of several viruses, such as hepatitis E virus (HEV), rubivirus and cutthroat trout virus (CTV). 
Although the functional significance of PPRs in viruses remains poorly understood, they appear to mediate interactions of viral proteins with cellular host factors to modulate viral replication efficiencies. A recent study further demonstrated that sequence variability in viral PPRs plays important roles in adaptation and in specifying the range of host cells. PPRs of HEV genotypes 3 and 4, for example, which are zoonotic variants that can infect both humans and animals, are twice as heterogeneous as PPRs of the HEV genotype 1 variant, which is purely anthropotropic and infects humans only. Also, in these PRM-containing binding regions, proline is not only involved in maintaining an open conformational state compatible with binding, it is also the most important residue that contacts the partner protein. An analysis of short linear motifs (SLiMs, also termed Eukaryotic Linear Motifs, ELMs) showed that Pro is the residue most significantly enriched in the sites that determine the binding specificity of the motif (restricted sites, RSs).

PRMs and IDP conformations

Based on the high levels of PPII sequence conservation in folded proteins, it has been suggested that these structural elements constitute a separate class of secondary structure elements with two major functions: to promote super-secondary structures, such as PPII/α-helical interactions, and to form inter-domain linkers. In IDPs, the unique propensity of PPII structures to rigidify polypeptide backbone conformations is thought to spatially separate functionally important protein regions. An example of such a separation function is provided by the human oncoprotein and transcription factor p53, which contains two PRMs in PPII-type conformations: one separates the intrinsically disordered N-terminal transactivation domain (NTAD) of p53 from its folded DNA-binding domain (DBD), and the other, within the NTAD, separates a helical pre-structured segment from two pre-structured turns that mediate distinct protein-protein interactions. Similarly, two transactivation domains within the C-terminus of herpes simplex virus protein 16 (VP16) are separated by a conserved PRM (452PGPGFTPHDSAP464). In both cases, spatial positioning via PRMs likely regulates independent transcription activation processes that rely on different interactions with the RNA polymerase II machinery. By analogy, two helical segments within the C-terminal portion of human securin, potentially mediating the interactions with separase, are separated by a PRM (162PPSPVKMPSPP173), whereas a PRM in the human transcription factor FoxA3 (250PPQPPPPAPEP260) separates its DNA-binding domain from its histone-binding domain. Whereas PRMs often induce extended conformations, many IDPs are usually more compact than chemically denatured proteins of comparable lengths, whose conformational behaviors still cannot be described as random coils. 
Because most IDPs are not restricted to stable three-dimensional architectures, the ability to seamlessly vary their degree of global compaction is thought to constitute an important functional IDP feature. Therefore, the ability of PRMs to elongate and stiffen polypeptide chains has to be discussed in this context. For example, proline-rich salivary proteins possess significantly larger radii of gyration than are expected for unfolded polypeptides of similar lengths. It has been proposed that organized PPII helices in these proteins result in larger collisional cross sections that facilitate their interactions with tannins, which form the basis of the sensory perception of astringency. Extending IDP structures via PRM-mediated effects may not necessarily be restricted to long proline sequences alone. In fact, a strong correlation between the number of prolines in an IDP and its radius of gyration has been established. Such expansions have been attributed to the unique properties of Xaa-Pro peptide bonds, which adopt backbone dihedral angles that correspond to extended conformations. However, prolines can also promote β-turn conformations, which elicit various degrees of polypeptide chain compaction. The degree of compaction can moreover be tuned by cis/trans equilibria. In line with these observations, mutating proline residues in a short, disordered elastin-like peptide has been shown to induce a stepwise expansion. In contrast, the overall stiffness of four disordered peptides was reported to be more correlated with their PPII contents than with their proline counts, whereas the intrinsic capacities for hairpin structures strongly correlated with the numbers of glycines and prolines. Therefore, the possible role(s) of prolines in compacting or expanding IDP conformations would depend on the context. 
While increasing the number of prolines in PPII conformations appears to rigidify IDPs, a high abundance of prolines in combination with favorable glycine contents, or with selective positioning of charged and/or hydrophobic residues, gives rise to preferred hairpin conformations that result in more collapsed structures.

Prolines as secondary structure-breakers

Because of their unique chemical and structural properties, and because of their negative influence on classical secondary structure, it is tantalizing to speculate that proline positions in folded, but also in intrinsically disordered proteins, have been evolutionarily selected, as well as conserved, for their unique capacities to modulate the structural propensities of neighboring protein residues. In folded proteins, a preference for prolines at helix-capping positions was recognized very early on. Depending on the data set, or the methods for defining secondary structures, prolines in N- or C-cap positions preferentially occur between Ncap-1 and Ncap+2 and between Ccap and Ccap+3, respectively. In these instances, high proline frequencies do not relate to helix stabilization effects; the prolines more likely function as border elements that confine existing secondary structures to certain lengths. In IDPs, proline positions may have been evolutionarily conserved to ensure that protein regions with residual structural propensities, such as MoRFs, retain their partially folded states in a balanced manner. Recent findings support this notion by showing that prolines at positions that flank partially folded IDP segments (PreSMos) occur more frequently and display higher levels of positional conservation than elsewhere in these proteins. In essence, this notion represents an extension of the “proline bracket” concept, according to which prolines in segments flanking protein interaction sites negatively modulate the propagation of α-helices and β-strands. Such effects may preserve various degrees of conformational IDP plasticity, which may eventually steer different binding behaviors in protein-protein interactions.

Prolines and prevention of amyloid-like aggregation

As mentioned earlier, positional proline effects in IDPs may preserve levels of disorder in regions with residual structural propensities. This, in turn, may also reduce the likelihood of spontaneous IDP aggregation, which is often cytotoxic, results in cell death and produces several devastating disease phenotypes. In fact, many different IDP aggregation processes proceed via intermediate conformations that harbor folded aggregation cores, which progressively expand into highly ordered macromolecular assemblies such as amyloid fibrils. In folded proteins, uncontrolled association events via existing secondary structure elements are often prevented by combinations of dedicated structural features that “protect” aggregation-prone entities such as peripheral β-strands. These include “covering” interactions with loop or helical segments, β-strand distortions via inward-pointing charged residues, incorporation of prolines, β-bulges, or glycine-promoted bends and twists, and the formation of continuous β-sheets that yield β-barrels. Therefore, prolines at the domain boundaries are often highly conserved, and mutating them usually promotes aggregation. In-depth analyses of various protein segments that display high propensities for β-aggregation have shown that β-breaking prolines, together with charged amino acids such as lysines, arginines, glutamates and aspartates, are specifically enriched at these positions and are thought to serve as anti-aggregation “gatekeepers.”

Elastomeric proteins

Elastomeric proteins exemplify another important aspect of the “usage” of prolines for specific biological functions. These proteins display remarkable propensities for elastic recoiling behaviors and undergo innumerable reversible deformations in the course of their lifetimes, which are directly related to their specific biological functions in tissues and other biomaterials. In all vertebrates, elastomeric proteins constitute the building blocks of blood vessels; in spiders, they give rise to specialized structures such as silk; in other arthropods, they make up the intrinsic energy storage apparatus that enables jumping. Some of these proteins are IDPs that have evolved to aggregate in a controlled manner to form dedicated, rubber-like structures that can be stretched under extreme physical conditions and later recoil on their own. Although these elastomeric proteins can spontaneously organize themselves into elastomeric protein complexes, they are surprisingly resistant to forming β-rich amyloid structures. Despite their sequence and functional diversity, all elastomeric proteins and IDPs contain unusually high proline and glycine contents, which clearly separates elastomeric proteins from amyloidogenic proteins and peptides (Fig. 3). Prolines in these structures, together with glycines, prevent the formation of long, stable amyloid structures, whereas their relatively high hydrophobicities promote aggregation-like behaviors such as recoiling. Thus, the amino acid compositions of elastomeric proteins depend on a fine balance between polypeptide hydrophobicity and high proline and glycine contents.

Figure 3. A two-dimensional plot correlating proline and glycine content for a wide variety of elastomeric and amyloidogenic peptides. Elastomeric proteins are characterized by high PG content and are located in the upper-right part of this plot. In contrast, amyloidogenic peptides are characterized by low PG content and are therefore located in the lower-left corner of the plot. The coexistence region (shaded in gray) contains P and G compositions consistent with both amyloidogenic and elastomeric properties. Elastomeric proteins, including the domains of elastin, major ampullate spidroin (MaSp) 2, flagelliform silk, the elastic domains of mussel byssus thread, and abductin, appear above a composition threshold (upper dashed line). Amyloidogenic sequences are primarily found below the PG threshold, along with rigid lizard egg shells, tubuliform silk (TuSp1), a protective silk for spider eggs, and aciniform silk (AcSp), used for wrapping prey. The coexistence region contains amyloid-like peptides as well as the elastomeric adhesive produced by the frog Notaden bennetti, the PEVK domains of titin, wheat glutenin protein, and the strongest spider silks, namely MaSp1 and minor ampullate spidroin (MiSp). Figure reproduced from ref. 130. Abbreviations: AcSp, aciniform silk; MaSp, major ampullate spidroin; MiSp, minor ampullate spidroin; TuSp1, tubuliform silk.

Proline-Directed Post-translational Modifications

Post-translational protein modifications (PTMs) range from enzymatic cleavage reactions of peptide bonds to covalent additions of particular chemical groups, lipids, carbohydrates or even entire proteins onto selected subsets of amino acid side chains. PTMs extend the range of amino acid structures and properties and greatly diversify the functional space of virtually every proteome. With regard to our subject, strong correlations exist between predicted and experimentally verified protein disorder and the occurrence of PTMs, the most common of which are phosphorylation, ubiquitination, acetylation, methylation, and glycosylation reactions. These PTMs are typically involved in the regulation and control of various signaling and recognition processes (for example, see ref. 139). Although direct post-translational modifications of proline residues have only a limited range of functions, prolines play important roles in regulating the occurrence of other PTMs.

Proline PTMs

Annotated lists of experimentally verified PTMs, in Swiss-Prot and other databases, clearly indicate that prolines are primarily subject to post-translational hydroxylation (selene.princeton.edu/PTMCuration/), which can occur at the Cβ ((2S,3S)-3-hydroxyproline) or Cγ ((2S,4R)-4-hydroxyproline) positions. The nonreversible conversions of prolines to (2S,4R)-4-hydroxyprolines (Hyps) are catalyzed by prolyl 4-hydroxylase enzymes and, surprisingly, represent the most common PTM in humans. In fact, Hyps are more abundant in animals than seven of the most “common” amino acid types: Cys, Gln, His, Met, Phe, Trp and Tyr. The best-known role of Hyps is in stabilizing collagen triple helices. Proline hydroxylation enhances the stability of trans-isoforms of Xaa-Pro peptide bonds relative to cis-isoforms. Since proline trans-isoforms already constitute the major conformations in IDPs (~90%), hydroxylation is not thought to play additional important roles in their conformational behaviors. Apart from their roles in collagen-like coiled-coil structures, Hyps are also found in many other connective tissue proteins, in proteins with collagen-like domains, as well as in the (partially) disordered proteins elastin, conotoxin and argonaute 2. The best example of Pro-hydroxylation generating a regulatory signal is hypoxia-inducible transcription factor 1α (HIF-1α). Under low-oxygen conditions (hypoxia), HIF-1α activates transcription by recruiting the general coactivator CBP/p300 via interaction with its TAZ1 domain. Upon elevation of the oxygen level, Pro564 of HIF-1α becomes hydroxylated; the protein then binds to the ubiquitin ligase von Hippel-Lindau factor and undergoes ubiquitination, which targets it for degradation.

Proline-directed limited proteolysis

Structural disorder and the extended structure ensured by Pro residue(s) are also involved in directing the action of proteases in limited proteolysis. Because it is an irreversible modification, limited proteolysis is a serious and tightly regulated signaling decision by the cell. For example, the intracellular protease calpain cleaves specific substrates only if it is activated by calcium and released from its tightly binding inhibitor, calpastatin, and it shows a strong preference for regions of local structural disorder dominated by Pro residues (Tompa et al. J Biol Chem 2004; 279: 20775-85). More precisely, Pro is depleted around the scissile bond (positions P2, P1 and P1’), but is significantly enriched in the flanking regions (positions P4, P3 and P2’ to P6’).

Roles of prolines in protein phosphorylation

Many serine/threonine kinases modify substrate sites that constitute integral or distal parts of kinase consensus motifs. Within these consensus motifs, proline residues often define substrate site specificities. Examples include many proline-directed protein kinases, such as cyclin-dependent kinases (CDKs), the mitogen-activated protein kinases (MAPKs), extracellular signal-regulated kinases (ERKs), stress-activated protein kinases/c-Jun-N-terminal kinases (SAPKs/JNKs), p38 kinases, glycogen synthase kinase-3 (GSK3) and Polo-like kinases (PLKs), all of which require prolines at position +1 with respect to the sites of modification. These kinases play important roles in diverse cellular processes, such as cell-cycle progression, sensing of metabolic states, regulation of cellular growth, mediation of intracellular signaling, and execution of deterministic cell response behaviors. In turn, mutations of proline-directed kinase consensus and phosphorylation sites are often involved in different forms of cancer and in neurodegenerative disorders. A second, less stringent proline at position −2 has recently been identified as a supplementary specificity determinant for some proline +1-directed kinases. Whereas many prolines positively regulate kinase activities by targeting kinases to their phosphorylation sites, proline residues within kinase consensus motifs can also weaken kinase activities, especially when they occur at positions −1 and −2 relative to the PTM sites, or even at position +1. In other phosphorylation reactions, prolines serve as specific kinase docking sites that are distal from the actual phosphorylation sites but key to recruiting kinases to substrate proteins. In addition to kinases, the enzymatic properties of phosphatases are also modulated by prolines, either in the vicinities of phospho-sites or at distal docking sites. Finally, prolines that are close to modified substrate residues may critically influence PTM-mediated protein-protein interactions. It has been shown that phosphorylated serines or threonines followed by a proline are more specifically recognized by subsets of 14-3-3 proteins or by Group IV WW domains.
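The +1 and −2 proline positions described above can be made concrete with a short sequence scan. The following is a minimal sketch, assuming one-letter sequences. The function name `proline_directed_sites` and the example sequence are hypothetical, and a match only marks a candidate Ser/Thr-Pro motif. It does not predict that the site is actually phosphorylated or which kinase acts on it.

```python
import re
from typing import List, Tuple

# Ser or Thr immediately followed by Pro (proline at position +1).
# The lookahead restricts the match itself to the Ser/Thr residue.
SP_TP_SITE = re.compile(r"[ST](?=P)")


def proline_directed_sites(sequence: str) -> List[Tuple[int, str, bool]]:
    """List candidate sites as (1-based position, residue, Pro at -2)."""
    seq = sequence.upper()
    sites = []
    for match in SP_TP_SITE.finditer(seq):
        i = match.start()
        pro_at_minus_2 = i >= 2 and seq[i - 2] == "P"
        sites.append((i + 1, seq[i], pro_at_minus_2))
    return sites


# Hypothetical sequence, for illustration only.
print(proline_directed_sites("MAPLSPEKTPRSA"))
# prints [(5, 'S', True), (9, 'T', False)]
```

In this example, Ser5 carries prolines at both +1 and −2, Thr9 only at +1, and Ser12 is not reported because it is not followed by a proline.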

Roles of prolines in protein glycosylation

Glycosyltransferases are a class of enzymes that transfer sugar moieties onto proteins, and they are strongly influenced by the presence of prolines in their substrate proteins. N-glycosylation of asparagines within the Asn-Xaa-Ser/Thr motif has been found to have a very low penetrance when the Xaa residue is a proline or when prolines are present at the +1 positions. In contrast, N-glycosylation is greatly enhanced when prolines are present at the −2 positions. O-glycosylation preferentially occurs in protein regions with high proline contents, and particularly high proline frequencies have been reported for positions −1 and +3 relative to O-glycosylation sites. Neither phosphorylation nor glycosylation affects the proline cis-conformer contents of phospho-Ser/Thr/Tyr-Pro motifs and of glyco-Ser-Pro motifs, respectively.
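The Asn-Xaa-Ser/Thr rule lends itself to a simple scan that flags sequons whose Xaa residue is a proline. The following is a minimal sketch, assuming one-letter sequences. The function name `n_glyc_sequons` and the example sequence are hypothetical, and only the Xaa = Pro case is flagged, because the text does not spell out exactly how the other positions are counted.

```python
import re
from typing import List, Tuple

# Asn-Xaa-Ser/Thr, matched with a lookahead so that overlapping
# sequons are all reported.
SEQUON = re.compile(r"N(?=([A-Z])[ST])")


def n_glyc_sequons(sequence: str) -> List[Tuple[int, str, bool]]:
    """Return (1-based Asn position, three-residue motif, Xaa is Pro)."""
    seq = sequence.upper()
    hits = []
    for match in SEQUON.finditer(seq):
        i = match.start()
        hits.append((i + 1, seq[i:i + 3], match.group(1) == "P"))
    return hits


# Hypothetical sequence, for illustration only.
print(n_glyc_sequons("AANGTKNPSLNNTT"))
# prints [(3, 'NGT', False), (7, 'NPS', True), (11, 'NNT', False), (12, 'NTT', False)]
```

Here the motif at Asn7 (NPS) would be expected to show very low N-glycosylation penetrance, whereas the other three sequons are not disfavored by this particular criterion.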

Roles of proline isomerases in PTM establishments

As mentioned previously, proline cis/trans isomerization reactions play important roles in protein folding and refolding processes, via the establishment of rather long-lived kinetic intermediates. Therefore, classes of cellular enzymes, so-called peptidyl-prolyl isomerases (PPIases), specifically accelerate proline cis/trans isomerization without affecting the thermodynamic equilibrium between the two states. PPIases are evolutionarily conserved and often characterized as foldases, or annotated as catalytic structural chaperones. Due to their inherent differences in stereochemistry, proline cis/trans isomers can also define different functional states of proteins. In these cases, PPIase activity drastically impacts protein function, as has been shown for the folded SH2 domain of the interleukin-2 inducible T-cell kinase (Itk) and the PHD-BRD tandem domain of the MLL1 protein. In both cases, proline cis/trans isomerization leads to large inter-domain conformational changes that subsequently affect protein-protein interaction behaviors. Enhanced proline cis/trans isomerization in the presence of PPIases leads to rapid sequestration of binding-competent protein states, which shifts the global population equilibrium toward the structure with which the more abundant binding partner interacts. Therefore, without changing the free energies of the cis/trans isomers, PPIases are capable of promoting new cis/trans distributions via additional factors that form complexes with, and thereby stabilize, individual isomer states. Because many IDPs are PPIase substrates, enzyme-controlled proline cis/trans isomerization processes provide intricate extensions to the long list of possible proline functions in IDPs. 
For example, proline isomerization controls switching of the adaptor protein Crk between two conformations: an auto-inhibitory state is stabilized by intramolecular association of two tandem SH3 domains via a flexible linker IDPR containing a cis-proline isomer, whereas a non-inhibited, activated conformation results from the promoted interconversion of this proline into its trans form. In turn, this particular cis/trans isomerization is targeted by the PPIase cyclophilin A. Among other PPIase enzymes, the phospho-dependent Pin1 [protein interacting with NIMA (never in mitosis A)-1] enzyme is of special interest. Pin1 functions in phospho-dependent signaling by catalyzing cis/trans interconversions of pSer/pThr-Pro peptide bonds. Structurally, Pin1 consists of an N-terminal phospho-recognition WW domain and a C-terminal, catalytic PPIase domain. Whereas cis/trans population ratios in these Ser/Thr-Pro motifs are not affected by phosphorylation in a peptide/IDP context, cis/trans isomerization rates are severely reduced when the motif is modified. In folded proteins, the protein fold and the amino acids that surround these Ser/Thr-Pro sites often stabilize, or destabilize, one of the isomers. Enzymes such as Pin1 establish faster interconversion rates upon phosphorylation, which enables a 2-way control over the protein’s function: one way is regulation via phosphorylation, added by a kinase or removed by a phosphatase, and a second way is control via isomerization, accelerated by a non-phospho-dependent PPIase or by the phospho-dependent PPIase Pin1. Could similar 2-way controls be utilized by IDPs? A limitation is that, in IDPs, the Ser/Thr-Pro cis/trans thermodynamic equilibrium is not greatly affected by phosphorylation, whereas it is substantially affected in folded proteins. An additional binding partner of the IDP is thus required for phospho-dependent cis/trans isomerization to acquire a function. 
For example, 2-way control like that discussed above has been observed for the pSer7-Pro8 motif within the intrinsically disordered C-terminal domain (CTD) of RNA polymerase II, whose phosphorylation status correlates with transcriptional activity. Only the cis-isomer of the modified peptide motif serves as a substrate for the Ssu72 phosphatase. Hence, Ssu72-mediated dephosphorylation of the CTD pSer7-Pro8 sequence occurred much faster when Pin1 was present, and proline cis/trans isomerization has been identified as the rate-limiting step in Ser7 dephosphorylation. Another interesting example is afforded by pSer62 of the c-Myc oncoprotein, a key regulator of cell growth that is stabilized by Ser62 phosphorylation. Dephosphorylation by PP2A occurs only when Thr58-Pro59 is phosphorylated and Pin1 is present. Therefore, pSer62 dephosphorylation may similarly require Pro59 to be in the cis isomer state. Analogous relations between the Alzheimer disease-associated protein Tau, Pin1 and PP2 have been observed. Based on these examples, it is evident that PPIase activities represent important supplementary levels of regulatory control in many cellular processes, although, in some cases, it remains unclear whether Pin1 binding or catalysis constitutes the mechanism of action.
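The statement that PPIases accelerate cis/trans interconversion without shifting the equilibrium can be illustrated with a two-state exchange model, cis ⇌ trans. The following is a minimal sketch under stated assumptions: catalysis is modeled as multiplying both rate constants by the same factor, the rate constants and the 1000-fold speedup are illustrative values rather than measured ones, and the names `trans_fraction`, `k_ct`, `k_tc` and `speedup` are hypothetical. The rate constants are chosen only so that the trans population is 90% at equilibrium, in line with the ~90% trans content quoted above for IDPs.

```python
import math


def trans_fraction(t: float, k_ct: float, k_tc: float, f0: float = 0.0) -> float:
    """Trans population at time t for two-state cis <-> trans exchange.

    k_ct: cis -> trans rate constant; k_tc: trans -> cis rate constant;
    f0: trans population at t = 0.
    """
    f_eq = k_ct / (k_ct + k_tc)  # equilibrium trans population
    return f_eq + (f0 - f_eq) * math.exp(-(k_ct + k_tc) * t)


# Assumed, illustrative rate constants (per second).
k_ct, k_tc = 0.009, 0.001
speedup = 1000.0  # hypothetical catalytic acceleration of both directions

for label, c in (("uncatalyzed", 1.0), ("with PPIase", speedup)):
    f_eq = (c * k_ct) / (c * k_ct + c * k_tc)
    half_time = math.log(2) / (c * (k_ct + k_tc))
    print(f"{label}: equilibrium trans = {f_eq:.2f}, half-time = {half_time:.3g} s")
```

Because the catalytic factor cancels in the ratio of rate constants, the equilibrium trans population is identical in both cases, and only the relaxation half-time shortens. In this picture, a new cis/trans distribution can arise only when a further process, such as a binding partner or a phosphatase that selects one isomer, removes that isomer from the exchange.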

Conclusions

The examples presented in this review show that there are multiple, distinct mechanisms by which proline regulates IDP and IDPR structure and function. The unique chemical properties of proline define not only its role as a modulator of secondary structural elements, but also its propensity to promote specific structural motifs such as the polyproline type II helix. In turn, these features appear to be especially important in regulating a multitude of functional IDP and IDPR properties, including their aggregation propensities. In addition, nature seems to have taken full advantage of the slow proline cis/trans isomerization characteristics in a number of biological processes that, altogether, extend the impressive functional range of this unique imino acid.
