Time, space, and disorder in the expanding proteome universe.

David-Paul Minde^1,2,3, A Keith Dunker^4, Kathryn S Lilley^1,2,3.

Abstract

Proteins are highly dynamic entities. Their myriad functions require specific structures, but proteins' dynamic nature ranges all the way from the local mobility of their amino acid constituents to mobility within and well beyond single cells. A truly comprehensive view of the dynamic structural proteome includes: (i) alternative sequences, (ii) alternative conformations, (iii) alternative interactions with a range of biomolecules, (iv) cellular localizations, (v) alternative behaviors in different cell types. While these aspects have traditionally been explored one protein at a time, we highlight recently emerging global approaches that accelerate comprehensive insights into these facets of the dynamic nature of protein structure. Computational tools that integrate and expand on multiple orthogonal data types promise to enable the transition from a disjointed list of static snapshots to a structurally explicit understanding of the dynamics of cellular mechanisms.

Keywords: Alternative splicing; Conformation; Intrinsically disordered protein; Membrane proteins; Post-translational modification

Year: 2017 PMID: 28145059 PMCID: PMC5573936 DOI: 10.1002/pmic.201600399

Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984

Introduction

The human genome sequence has a smaller number of genes than expected: ∼19 000 compared to 6.7 million genes in earlier estimates 1. It has remained largely unclear how this small number of genes can be sufficient to support human complexity. In recent years, hierarchical layers of regulation have been revealed that give rise to some of the functional complexity observed in living cells despite the compact nature of the protein coding genome. These are directly linked to spatiotemporal dynamics on all levels of protein structure from their sequence, three‐dimensional structure to alternative cellular localizations and spatial organization of specific proteins in tissues and organs. We discuss these new regulatory mechanisms which contribute to emergent complexity of living systems (Fig. 1), as follows:

Figure 1

Challenging questions in proteomics. The proteome is not a fixed entity but a dynamic system. Unraveling a multitude of dynamic layers of its regulation is key to comprehensive understanding.

About 80% of the human genome maps to non‐coding yet functional genomic elements 2. These regulatory elements include sites for DNA methylation, DNase I hypersensitive regions that function as preferential interaction sites for transcription factors, and long‐range regulatory elements. Fine‐tuning the control of transcription makes it possible to switch among a large variety of transcriptional states depending on intracellular and extracellular changes.

Alternative splicing has been implicated in tissue differentiation and is positively correlated with organism complexity 3, 4, 5. It is an important mechanism for generating multiple sequence variants from the same gene, for instance in different tissues or developmental stages 6.

Post‐translational modifications (PTMs) such as phosphorylation 7 and acetylation 8 crucially modulate protein function. PTMs further expand the space of alternative sequence variants of proteins.

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) can assume alternative secondary and tertiary conformations. This expands the available space of alternative structures 9.

Switchable alternative protein–protein interactions lead to yet more diversity 10. IDPs can engage in a large number of alternative interactions as a function of PTMs and overlapping short linear motifs. Some IDPs use overlapping linear segments for binding to multiple, distinct protein partners with low affinities 11, thus enabling rapid rewiring of large cellular interaction networks; these same capabilities enable the rapid rewiring of gene regulatory networks 6, 12, 13.

Protein turnover: the half‐lives of eukaryotic proteins range from minutes to decades 14, 15. Such differential protein turnover leads to a greater range of protein abundance than, for example, of transcript abundance. Transcripts vary over some 2 orders of magnitude in their cellular abundance, whereas proteins cover a dynamic range of 6 orders of magnitude or more in some cell types. Low‐abundance proteins are turned over more rapidly by proteasomal proteolysis, contain more IDRs and are enriched in PEST motifs 16, 17.

Multiple subcellular locations enable the same protein to exert different functions in different parts of the cell 18.

Proteins involved in transducing intrinsic and extrinsic signaling exist within spatially restricted concentration gradients. For example, Wnt signaling gradients control asymmetric cell divisions during early development and, later in the life of complex organisms, maintain tissue organization. Spatial organization enables the formation of complex tissues and organs up to the highly interconnected human brain.

The term “proteoform” has recently been proposed as an umbrella term to summarize all possible alternative protein sequences for a given protein, including genetic sequence variants, PTMs, splice variants and proteolysis variants 19. Powerful methods to characterize proteoforms have been comprehensively covered in several excellent papers 20, 21, 22. It should be added at this point that IDPs and IDRs can be viewed as “outliers” in the context of structural biology terminology: folded proteins are readily described by the well‐established hierarchy of 1D structure (i.e. protein sequence) to 2D structure (i.e. local secondary structure elements) to 3D structure (i.e. coordinates of the atoms of a folded protein chain), but IDPs lack a fixed 2D or 3D structure and therefore elude a straightforward classification in the established terminology framework. To cope with this phenomenon, it was recently suggested to extend the concept of “proteoforms” to include the manifold alternative conformations of IDPs and IDRs as “conformational (or basic or intrinsic) proteoforms” 23. Other authors have used various descriptors for IDPs 24, including “4D proteins” 25 to indicate that their conformations and functions can change over time; alternatively, other authors have attempted to classify IDPs by physical parameters such as charge patterns, IDR length and residual structure 26. While an in‐depth discussion of the issue of IDP classification and terminology is clearly beyond the scope of this viewpoint, it is important to acknowledge the current imperfections of our terminology and to encourage community‐wide efforts to find a new consensus solution for a more effective terminology that would fully integrate IDPs and IDRs into the terminology of biological sciences.

Compared to extensive insights into multiple aspects of proteoforms, much less is known about higher‐order structural proteome dynamics that enable cellular complexity (Fig. 1). We focus on recently developed methodologies designed to study dynamic protein conformations, interactions, and subcellular mobility. We also present a brief summary of what we consider to be remaining key challenges in studying the structure and function of cellular proteomes.

How can alternative structures tune functional protein interactions (and vice versa)?

Not all protein interactions fit the classical lock and key model of molecular recognition achieved by docking of rigid components. Fine‐tuning target recognition can require ‘conformability’, as in the case of the bacterial Lac repressor protein, which assumes a fuzzy complex when sliding along non‐specific DNA sequence but a mostly structured state once tightly bound to its specific target sequence 27. Similar observations have been made for the human sequence‐specific transcription factor LEF1, which is mostly disordered when free in solution but assumes a defined 3D structure in complex with its specific target DNA 28. Even more pronounced structural transitions from unstructured to pathologically structured fibril conformations can contribute to neurodegenerative disorders, as in the case of Parkinson disease, which is associated with toxic accumulation of α‐Synuclein aggregates 29. Many cell‐regulatory hub proteins contain IDRs 30. Adenomatous polyposis coli (APC), a tumor suppressor protein, is frequently mutated in cancer, and cancer‐mutated forms of APC often lack most of their 2000‐residue‐long IDR. Axin1, an interaction partner of APC, can gain pathological functions if single point mutations disrupt the normal fold of a small folded domain that is located between its long IDRs 31, 32, 33. Increasing, though still largely anecdotal, evidence suggests that both transient and persistent structural disorder play crucial roles in biology and in disease mechanisms, and that there is no unique disordered state but rather a continuum from fully structured to fully disordered 34, 35.

“LEGO brick” structural biology is getting more dynamic

X‐ray crystallography beyond static structures

Traditionally, structural biology was rationalized by the dogma that biological function requires a rigid 3D protein structure. According to this dogma, it should be possible to understand biology by solving one minimal energy structure per protein. Greater than 100 000 structures of folded domains have been solved over the last decades and first near‐complete structural proteome models have been proposed based on homology modeling 36. 90% of these protein structures have been solved using X‐ray crystallography, which is intrinsically restricted to the solid phase of proteins. Structural protein dynamics in solution are, therefore, incompletely characterized so far. Despite its historical bias towards solving static structures, X‐ray crystallography has chiefly contributed to the birth of the IDP field 37 as thousands of polypeptide segments in crystallised protein constructs do not give rise to a well‐defined electron density and can therefore be classified as “disordered” 38. More direct time‐resolved methods are currently under development building on the latest advances in high‐brilliance X‐ray sources. Spectacular first dynamic pictures of ultrafast light‐induced femtosecond isomerization events in the photoactive yellow protein and alternative conformations of riboswitches dynamically reshaping upon ligand‐binding highlight the possibility of capturing dynamic structural data in the future 39, 40. In addition to exciting technological developments, it will be interesting to explore improved computational possibilities for a more comprehensive analysis of existing X‐ray crystallographic datasets: further improvements are possible by treating protein dynamics explicitly and enabling improved fitting of existing electron density maps to alternative conformations and locally flexible parts in proteins 41. Even fully disordered proteins are no longer outside of the reach of X‐ray crystallography. 
Several important c‐Myc structures have been solved in complex with specifically binding partner proteins 42. It is hoped that this will make previously undruggable IDPs specifically targetable by exploiting unique interfaces that only arise in specific protein–protein complexes of these IDPs 43. X‐rays can make numerous protein dynamics crystal clear.

Cryo‐EM and NMR – a dynamic pair

In the last two decades, major technological breakthroughs in electron detection efficiency and image processing have culminated in a recent explosion of new Cryo‐EM structures, whose number is growing faster per year than that of X‐ray crystal structures (on average 34 versus 9%). Cryo‐EM, like NMR spectroscopy, is capable of revealing local structural disorder. NMR peaks of disordered protein segments cluster together more closely because their more averaged chemical environments result in lower chemical shift dispersion 44. Anisotropic Cryo‐EM resolution scales with flexibility 45, i.e. the highest resolution is achievable for rigid regions and the lowest for very flexible regions 46. The preferred molecular size ranges of the two methods are complementary: typically below 50 kDa for NMR and above 150 kDa for Cryo‐EM. While the rate of progress is accelerating, the costs of state‐of‐the‐art Cryo‐EM and NMR facilities still restrict broader community access to these technologies. Establishing optimal protein production protocols and sample conditions remains a shared bottleneck among all high‐resolution structural techniques 47, 48. While the contribution of NMR to solving new structures might shrink in the future, it cannot be over‐emphasised that this technique has unique capabilities in directly covering a large range of protein solution dynamics on timescales ranging from picoseconds to hours 49. In brief, all major high‐resolution structural biology technologies continue to develop dynamically and complement each other.

Biochemical approaches to study protein conformational dynamics

Many aspects of protein conformational dynamics are either impractical or impossible to study using exclusively the above‐mentioned high‐resolution structural methods. Biochemical methods including 1D SDS‐PAGE and proteolysis have been successfully used as valuable complementary methods to characterize protein folding and conformational heterogeneity in solution 50. Short digestion protocols as in pulse proteolysis 51, 52, membrane pulse proteolysis 53, SILAC pulse proteolysis 54, and FASTpp 55 have considerably increased throughput in recent years. FASTpp uses thermal denaturation, in contrast to the chemical denaturation used in pulse proteolysis. FASTpp exploits the principle of rapid digestion of exposed, thermally unfolded polypeptide segments before they have a chance to aggregate. FASTpp detects ligand‐induced folding and stabilisation as well as the effects of missense mutations on protein stability 56, 57, 58. FASTpp is technically simple and fast to implement because it does not require equilibrating samples in denaturant, which can take months in the case of kinetically stable proteins 59; pulse proteolysis, in turn, can be used to derive equilibrium unfolding energies (ΔΔGs). Limited proteolysis (LiP) has been used for many decades in structural biology and continues to be actively developed using a wide range of proteases and readout methods from low to high multiplexity 60, 61. A recent breakthrough study used LiP in combination with peptide sequencing by mass spectrometry to simultaneously map conformations of 1000 yeast proteins and to reveal quantitative structural changes in 300 proteins upon growth on different sugars 62, 63. Similar methods combining the best of classical biochemical methods with ultra‐sensitive and large‐scale protein detection have great potential for revealing structural proteome dynamics under a large range of biological 64, physical, and chemical conditions, thereby redefining our understanding of protein stability and folding in the cellular context.

Label‐dependent protein folding assays

A wide range of highly specific methods to study protein conformations depends on selective chemical protein labeling. Tryptophan‐free proteins can be selectively labeled using a single tryptophan substitution of a chemically similar aromatic residue like phenylalanine, which often does not perturb the biological behavior of the wild‐type protein 65, 66. Another more widely used chemical labeling method is hydrogen deuterium exchange (HDX). As all proteins contain hydrogens, their exchange with deuterons presents a very generic and minimally perturbing strategy of labeling. Hydrogens are ubiquitous in proteins yet local hydrogen to deuterium exchange rates vary over many orders of magnitude depending on their structural interactions: rigidly folded and hydrogen‐bonded segments of proteins exchange very slowly (∼hours to years) while random coil regions can often exchange rapidly (∼milliseconds–seconds) 67. This effect can be used to investigate how much structure a disordered region assumes upon addition of specific ligands by investigating how the exchange rates decrease as ligand is added. A recent study demonstrated the use of reverse (i.e. deuterium to hydrogen) exchange to map peptidome‐wide peptide–protein interactions. This study highlights the fundamental possibility of exploiting atomic changes to map protein interactions on a global scale 68. Using HDX technologies on whole cells for cellular structural studies is a desirable extension of the method, however a significant hurdle is the need to minimize back‐exchange during necessary processing steps such as cell lysis and protein digestion prior to bottom‐up LC‐MS/MS analysis. Novel strategies in directed evolution or metagenomics selection 69 have the potential to identify novel types of acid‐compatible specific proteases that can help to accelerate specific digestion under conditions that drastically slow down back‐exchange. 
These highly acidic conditions would be only necessary after conformational features are “encoded” as deuterium incorporation and thus do not affect the native structural states of cells. Ultra‐rapid digestion methods, mass‐spectrometry compatible detergents and faster computation of complex spectra resulting from a large number of variable isotope changes may further help to pave the way toward proteome‐wide in vivo HDX experiments 70, 71.
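The relationship between structure and exchange rate described above can be illustrated with a minimal sketch. It assumes simple first‐order exchange at a single amide site, D(t) = 1 − exp(−k·t), and uses purely illustrative rate constants that are not taken from any of the cited studies; the function names are hypothetical.

```python
import math

def fraction_deuterated(rate_per_s: float, time_s: float) -> float:
    """Fraction of one amide site carrying deuterium after time_s,
    assuming first-order exchange with rate constant rate_per_s."""
    return 1.0 - math.exp(-rate_per_s * time_s)

def protection_factor(rate_unprotected: float, rate_observed: float) -> float:
    """Ratio of the unprotected (random-coil-like) rate to the observed rate."""
    return rate_unprotected / rate_observed

# Assumed, illustrative rate constants (per second), not measured values.
K_DISORDERED = 10.0     # fast exchange of a disordered segment without ligand
K_LIGAND_BOUND = 0.01   # slower exchange once ligand binding induces structure

for t in (0.1, 1.0, 10.0, 100.0):
    free = fraction_deuterated(K_DISORDERED, t)
    bound = fraction_deuterated(K_LIGAND_BOUND, t)
    print(f"t={t:6.1f} s  free={free:.3f}  ligand-bound={bound:.3f}")

print("protection factor:", protection_factor(K_DISORDERED, K_LIGAND_BOUND))
```

In this toy model, a drop in the apparent rate constant upon ligand addition shows up as slower deuterium uptake at every time point, which is the readout that a ligand titration would follow.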

Solubility methods to probe protein conformation

Alternative methods based on physical principles increasingly complement chemical methods. One of the earliest physical methods to characterize protein unfolding is monitoring the soluble fraction of proteins at a range of temperatures. Analogous to egg‐white protein in boiled eggs, most proteins irreversibly precipitate above their unfolding temperature. Temperatures just slightly above the physiological growth optimum can cause dramatic reductions of proteome solubility in cells lacking the Hsp70 system, an essential component of the cellular heat shock protection system that interacts with aggregation‐prone unfolded and partially folded proteins 72, 73. The cellular thermal shift assay (CETSA) exploits this effect to screen ligand‐dependent changes of thermal solubility of proteins 74. CETSA revealed drug‐dependent increases of kinase stability. Initial examples of CETSA required a large number of samples to be screened by quantitative antibody‐based detection methods 74. Thermal proteome profiling (TPP) overcomes the dependence on antibodies and the limited throughput by combining the CETSA principle with TMT 10‐plex mass spectrometric detection in a large temperature window between 37 and 67°C. A small number of TPP runs in human cells and cell lysates enabled quantitative tracing of drug interactions with nearly 7000 human proteins and revealed off‐target interactions of a drug 75. Interestingly, TPP can also be applied to many transmembrane proteins either before or after detergent solubilisation using a range of mild detergents 76. A systematic comparison of both datasets suggests that cellular compartments alter the biophysical stability of membrane proteins: membrane proteins in native membranes are more stable than intracellular proteins, while detergent‐solubilized membrane proteins are less stable than intracellular proteins. 
This finding suggests that membrane proteins are more stable in vivo than intracellular proteins yet significantly less stable in vitro consistent with their reputation of being notoriously unstable during crystallization trials in detergents. As protein structural dynamics affect protein interactions and interactions in turn affect structural stability, characterizing these dynamics‐functional relations is of fundamental interest and has started to be bio‐medically transformative by establishing novel drug discovery routes.
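As a rough illustration of how a thermal shift could be read out from solubility data, the following sketch simulates soluble fractions for one protein with and without a drug and estimates the apparent melting temperature as the 50% crossing point. The two‐state sigmoid, the slope, the melting temperatures and the even spacing of the ten temperatures are all assumptions made for illustration; this is not the analysis pipeline of the cited studies, and the function names are hypothetical.

```python
import math

def soluble_fraction(temp_c, tm_c, slope=1.5):
    """Assumed two-state sigmoid: fraction of protein still soluble at temp_c."""
    return 1.0 / (1.0 + math.exp((temp_c - tm_c) / slope))

def estimate_tm(temps, fractions):
    """Temperature at which the curve crosses 0.5, by linear interpolation."""
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None  # no crossing inside the sampled window

# Ten temperatures spanning 37-67 degrees C (one per TMT channel);
# the even spacing is an assumption.
temps = [37 + i * 30 / 9 for i in range(10)]

# Hypothetical melting points for one protein without and with a drug.
vehicle = [soluble_fraction(t, tm_c=50.0) for t in temps]
treated = [soluble_fraction(t, tm_c=54.0) for t in temps]

shift = estimate_tm(temps, treated) - estimate_tm(temps, vehicle)
print(f"apparent Tm shift: {shift:.1f} degrees C")
```

Because only ten temperatures are sampled, the interpolated shift only approximates the simulated 4 degree difference; a curve fit over all points would usually be preferred for real data.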

How can transient protein–protein interactions contribute to functional diversity?

Specific protein‐protein interactions (PPIs) are widely considered key to understanding the cellular functions of proteins. One might intuitively expect most PPIs to be of high affinity, as this ensures a high fraction of specifically bound complexes. The highest affinities can be reached with rigid proteins, but transient and biophysically weak interactions are at the hub of biological interaction networks 77 and ultra‐affinity is rare 78. One of the most striking examples of two non‐rigid proteins interacting specifically is the mutually synergistic folding of two independently flexible proteins in the ACTR‐NCBD complex 79. Both protein domains engage in an intimate complex that covers a large, rather hydrophobic interface to jointly regulate transcription as crucial parts of many larger proteinaceous transcription‐regulatory machineries 80, 81.

High‐throughput affinity‐based methods to study protein interactions

Most large‐scale methods depend on short, disordered affinity‐tags 48, 82. Affinity purification (AP)‐MS uses a single affinity enrichment step and investigates all co‐eluting proteins, while tandem (T)AP‐MS uses two sequential affinity steps. More specific interactors than in sequential multiple‐affinity methods can be retrieved using two or more orthogonal tag systems in parallel for the same target protein, for instance FLAG‐tag and Strep‐tag, in interactomes using parallel affinity capture (iPAC) 83 or quantitative SILAC‐iPAC 84. Recently, GFP was introduced as novel affinity tag in AP‐MS 85, which made it possible to build on existing large GFP‐fusion libraries and to selectively enrich interactors of most human proteins 77. How good are these methods for capturing weak yet potentially biologically important interactions? It is a priori not clear how these methods might bias against the detection of very transient binding events shorter than current affinity protocols or bias toward complexes that only form in vitro in dilute lysis and affinity purification buffers but would never form in the crowded intracellular environment in the presence of optimal concentrations of molecular chaperones. Clearly, orthogonal methods are needed to validate interactions and to discover additional interactions that are too transient or weak for detection by affinity‐enrichment methods.

Overcoming the quantitative protein‐protein interaction validation bottle‐neck

While affinity methods readily provide large lists of specific interactors, it is generally difficult to derive predictions about proteoform‐specific dissociation constants, which would enable quantitative predictions for other protein concentrations. Direct biophysical high‐throughput quantification of binding strength of putative protein‐protein interactions has remained highly challenging. Single‐molecular‐interaction sequencing (SMI‐seq) enables high‐throughput quantification of up to hundreds of protein interactions in parallel covering a broad range of affinities by covalently crosslinking proteins to nucleotide‐barcodes for multiplexed sequencing in situ 86. SMI‐seq has been successfully applied to both water‐soluble and membrane proteins incorporated in phospholipid bilayer nanodiscs 86. SMI‐seq uses cell‐free in vitro production of proteins and is, therefore, not fundamentally limited by the natural genetic code. Related approaches that offer high‐throughput and quantification of protein interactions will be valuable for coping with the validation bottle‐neck in protein‐protein interaction research.
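To make concrete why a dissociation constant enables predictions at other protein concentrations, the following minimal sketch computes the fraction of a protein A bound to a partner B for a simple 1:1 equilibrium. It assumes a single binding site and no competing partners; all concentrations and Kd values are hypothetical, and the function names are illustrative.

```python
import math

def complex_concentration(a_total, b_total, kd):
    """Equilibrium concentration of the 1:1 complex AB (same units as inputs).

    Solves Kd = [A][B]/[AB] together with mass conservation for both partners.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

def fraction_bound(a_total, b_total, kd):
    """Fraction of protein A found in the complex."""
    return complex_concentration(a_total, b_total, kd) / a_total

# Hypothetical numbers in micromolar: A fixed at 1 uM, three assumed Kd values.
for kd in (0.01, 1.0, 100.0):
    for b_total in (0.1, 1.0, 10.0):
        print(f"Kd={kd:6.2f} uM  [B]={b_total:5.1f} uM  "
              f"bound A={fraction_bound(1.0, b_total, kd):.2f}")
```

Under these assumptions, a weak interaction (large Kd) leaves most of A unbound at low partner concentrations, which is one reason why such interactions can be missed when lysates are diluted during affinity purification.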

Comparing in vitro and in vivo protein associations

Comparison of in vitro and in vivo protein complexes is in principle possible by fixation of protein interactions using chemical crosslinking or using fluorescence correlation spectroscopy (FCS) 87. In vivo FCS can visualize the dynamic assembly and disassembly of protein complexes during the cell cycle 87. Crosslinking mass spectrometry (XL‐MS) recently advanced from the study of a few crosslinks of small protein complexes to large viruses thanks to improvements on all levels from MS‐cleavable cross‐linkers over new mass spectrometric strategies to novel data analysis workflows 88, 89, 90. A wide range of cross‐linkers exist that cover zero‐length to several nanometers in distance between crosslinked molecules. Relatively short lengths, such as 0.5 nm for MS‐cleavable DSSO, can be ideal for use in integrative biology to refine structural models of protein complexes of partly solved composition 91. Larger cross‐linkers can be beneficial to elucidate the network of transiently or weakly binding proteins in large protein complexes 92. Future expansion of these novel crosslinking‐MS strategies to in vivo analysis of intracellular protein complexes using a class of cross‐linkers that combines clickable affinity purification handles for enrichment of crosslinked peptides and MS‐cleavability for accelerated peptide identification is becoming possible 93, 94.

How does protein‐organelle partitioning affect protein interactions?

Even the simplest known living cells are compartmentalized 95. Membrane enrichment is crucial for membrane‐intrinsic transporters and helps to orchestrate a variety of metabolic pathways 96. Eukaryotic cells have multiple membrane‐enclosed organelles that enable a wide range of physicochemical conditions to coexist in a single cell. Secretory granules can have a pH of 5.0 while other compartments typically vary between pH 6.4 and pH 7.2 97. Some proteins are fully folded in one compartment but unfolded in another 98.

Chemical proximity‐labeling strategies to discover protein co‐localisation

Efficient strategies are being developed to selectively label membrane‐associated protein complexes for subsequent MS detection. APEX2‐MS 99 is based on an enzyme that catalyzes the conversion of biotin phenol to a biotin radical, which rapidly labels nearby proteins 99, 100. Both phenol and peroxide, as co‐substrates of this labeling reaction, might induce cellular stress in some organisms and cell types. Selective proteomic proximity labeling using tyramide (SPPLAT) is a chemical variation on the same theme of enzymatically creating an activated biotin conjugate that has a short half‐life and therefore can only react in the immediate vicinity of the activating enzyme 101, 102, which is horseradish peroxidase in the case of SPPLAT, in contrast to ascorbate peroxidase in APEX 103. Biotinylation is in principle also possible using gentler enzymatic approaches, as biotinylation is one of the most specific known PTMs 104. This natural specificity is, however, a challenge for APEX‐like applications that require promiscuous biotinylation in the proximity of the enzyme. A mutant of the bacterial BirA ligase that lacks this substrate specificity, “BioID”, has been applied to discover transient interaction partners of specific BirA‐mutant labelled proteins 105; an accelerated unspecific biotin ligase called BioID2 is available 106, 107. Directed evolution might further improve the activity of BioID2 at 37°C, as BioID2 is derived from a highly thermophilic source (Aquifex aeolicus) and displays optimal activity far above 37°C 106. Additional improvements of the method appear possible for many applications if biotin enrichment is performed on the peptide level instead of the protein level, as a ∼200‐fold increase in direct mass spectrometric detection was demonstrated for biotin‐peptides 108.

How does lipid‐less subcellular partitioning affect protein interactions?

Even within a single organelle, biomolecules are not homogeneously mixed. Active sub‐organellar partitioning often involves ATP‐fuelled molecular machines, for instance dynein guiding cargo proteins along the cytoskeleton 109. Other sub‐organellar structures form spontaneously. IDPs have recently been identified as crucial components driving the assembly of membrane‐less cellular compartments. The prion‐like domain of Xvelo, an IDP, is crucial for the formation of Balbiani bodies, which are a hallmark of asymmetry in oocyte formation 110. A variety of different flavours of protein‐RNA bodies have been identified, including stress granules, nucleoli, Cajal bodies, and PML bodies in the nucleus. Intriguingly, some of their properties can be explained by sequence patterns in their specific IDPs. Specific F/R/G‐rich motifs in these IDPs can efficiently drive liquid‐liquid phase separations and contribute to the formation of these membrane‐less bodies 111. Thus subcellular order comes, at least in part, out of intrinsic disorder. Given their large molecular size, the ribosome and other large cellular machines including the proteasome and chaperonins constitute nanoscopic cellular compartments in their own right. Based on RNA‐seq and isolation of translationally halted ribosomes, and one‐by‐one addition of chaperones, it is now becoming possible to selectively profile ribosomal complexes to unravel how molecular chaperones engage during the translation process. This “selective ribosome profiling” approach revealed that trigger factor (TF) engages in vivo only upon emergence of ∼100 nascent residues, in contrast to earlier suggestions, based on in vitro work, that TF waits by default at the ribosomal exit tunnel 112; analogous approaches have great potential to transform our understanding of the spatiotemporal organisation of proteostasis, including synthesis and folding of membrane proteins. 
Exciting open questions related to suborganellar cellular structures include: how is the timing of metabolic pathways tuned by subcellular structures? Are PTMs regulating their formation? How can we monitor systems‐wide perturbations of these structures by changing environments?

Organelle proteomics

Combining state‐of‐the‐art mass spectrometry, partial separation of organelles in a density gradient, and statistical analysis of the resulting patterns enabled the first quantitative and nearly proteome‐wide maps of cellular localizations for eukaryotic cells; such methods include protein correlation profiling (PCP) 113 and localization of organelle proteins by isotope tagging (LOPIT) 18. LOPIT has been further refined by combination with 10‐plex TMT labeling in hyperLOPIT 114. TMT labeling of peptides is independent of subcellular protein fractionation in density gradients and is used solely to achieve maximal subcellular resolution, coverage of sub‐cellular niches and reduction of false assignments to different sub‐cellular niches; differential centrifugation and in‐solution digests have been used as technical variations of hyperLOPIT 115. LOPIT studies have revealed that many more proteins than expected are present in multiple locations of the cell. This observation gives rise to intriguing questions, including how multiple locations are linked to structural and functional diversity, PTMs, splice variants and IDRs. APC, which contains an unstructured region of some 2000 residues, for instance, can travel from the nucleus to near the membrane and engage in several condition‐dependent transient functional protein and protein‐RNA complexes, including the machinery for its own synthesis 116, 117. It will be a fascinating challenge to explore globally how other IDPs act differently in different parts of the cell and how dynamic cellular structures form under direct control from IDP regions. Selected examples for other Wnt pathway members are highlighted in a hyperLOPIT plot (Fig. 2) 118.

Figure 2

Thousands of proteins have multiple alternative cellular localizations 114, 118. Mouse stem cell hyperLOPIT data 114. The predominant variations in the partitioning of individual proteins into fractions of the density gradient are captured by the first two components (denoted PC1 and PC2) of a principal component analysis (PCA). The Wnt signaling proteins APC2, CK1, and GSK3β, the neurodegeneration‐linked Huntingtin, and the breast cancer‐linked tumor suppressor protein BRCA1 (highlighted as solid black circles) are not assigned to a single location, which is characteristic of proteins with mixed localization.

Although cryo‐electron tomography (cryo‐ET) 119, 120, 121 and super‐resolution (SR) fluorescence microscopy 122 are currently limited to relatively small numbers of different proteins that can be observed simultaneously, it will be interesting to explore their complementary benefits. Both techniques are experiencing rapid technological advances, and further improvements have the potential to provide novel insights into high‐resolution spatiotemporal subcellular dynamics as well as fine details of tissue architectures 123.

How can tissue and organ partitioning affect localized interactions?

Complex tissues and organs such as the human brain clearly require a high degree of spatial organization beyond single cells. Nearly 50 years ago, Francis Crick proposed diffusive “morphogen” gradients as a minimal ingredient for the spatial organization of cells during embryogenesis 124. Only very recently has it become possible to directly visualize morphogen gradients in vivo using elegant organoid models that reflect most architectural features of organs while adding the benefits of infinite expansion and culturability. Surprisingly, the measured short‐range cellular Wnt gradients are inconsistent with free diffusion and instead appear to require a cell‐bound propagation mechanism 125. Wnt signaling as a whole is a perfect illustration of the importance of various levels of disorder in establishing multi‐cellular order. Many of its crucial signaling components, including the scaffolds APC, Axin and WTX, contain large IDRs of up to some 2000 residues 33, have large numbers of PTMs and alternative interactions 126, are mobile within the cell (Fig. 2), and read the gradient signal that spans several cell lengths and ultimately establishes tissue and organ shape. Curiously, the massively disordered APC protein is also needed for proper synapse formation in the brain. Specific mutations of APC correlate with autism, and a conditional knock‐out impaired synapse maturation 127. Defined disorder appears to be an architectural hallmark of some of the most intricate structures in nature, which are just becoming observable by mass spectrometry imaging 128.

Computational biology helping to fill the voids in structural proteomics

Acquiring all‐atom movies of living organisms is clearly beyond experimental reach. Computational methods increasingly help to fill gaps in our understanding of structural biology. Efficient algorithms can predict secondary structure, IDRs and, increasingly, 3D structure from readily available genomic sequences 129, 130, 131. Despite the astronomical number of ways to arrange even a short polypeptide sequence in 3D, de novo prediction of the folding of ∼100‐residue peptides based on physical principles in silico has been demonstrated for some examples 132. However, the community experiment on protein structure prediction known as CASP shows that de novo prediction of even small, single‐domain proteins, while improving over time, is still far from routine. It further shows that the most reliable method for protein 3D structure prediction remains the construction of protein models using the known structures of homologous proteins as templates. These template‐based models suffer from template bias, i.e. the resulting structures are more similar to the templates than to the true structures. Improvements in protein dynamics methods are finally leading to approaches for reducing the degree of template bias 133. Similarly, the most recent force‐field developments now show promise toward correct prediction of the conformational ensemble properties of IDPs 134, 135. Computational approaches can also amplify the insight attainable from highly complex multi‐dimensional proteomics experiments through efficient dimensionality reduction. PCA plots often capture most of the variation in high‐dimensional data in visually intuitive two‐dimensional plots (Fig. 2) 136. Significant efforts by the computational science community are needed to maximize the knowledge gained from rapidly accumulating and diversifying multi‐omics datasets and ultimately to reveal fascinating new hidden ordered patterns in complex cellular dynamic systems 137.
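To make the dimensionality reduction step concrete, the following minimal sketch projects a handful of fractionation profiles onto their first two principal components. It uses only the Python standard library and finds the components by power iteration with deflation, a simple textbook approach chosen here for brevity; it is not the procedure used to produce Fig. 2. The profiles are invented toy values, and all function names are hypothetical.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each column mean so every fraction has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigen(mat, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(mat)
    # Non-uniform start vector: a uniform one would lie in the null space
    # of the covariance when every profile sums to the same total.
    vec = [float(i + 1) for i in range(d)]
    for _ in range(iters):
        nxt = [dot(row, vec) for row in mat]
        norm = sqrt(dot(nxt, nxt))
        vec = [v / norm for v in nxt]
    return dot(vec, [dot(row, vec) for row in mat]), vec

def pca2(rows):
    """Return 2D scores, fraction of variance explained, and PC1, PC2."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    val1, pc1 = top_eigen(cov)
    # Deflation: remove the PC1 contribution, then find the next component.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    val2, pc2 = top_eigen(deflated)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in x]
    return scores, (val1 + val2) / total, pc1, pc2

# Invented toy profiles: two proteins each resembling three organelles.
profiles = [
    [0.50, 0.30, 0.10, 0.05, 0.05], [0.55, 0.25, 0.10, 0.06, 0.04],  # ER-like
    [0.05, 0.10, 0.60, 0.20, 0.05], [0.04, 0.12, 0.58, 0.22, 0.04],  # mitochondrion-like
    [0.05, 0.05, 0.10, 0.20, 0.60], [0.04, 0.06, 0.08, 0.22, 0.60],  # nucleus-like
]
scores, explained, pc1, pc2 = pca2(profiles)
print(f"variance explained by PC1 and PC2: {explained:.3f}")
```

In this toy case nearly all of the variance lies in the first two components because the six profiles form three tight groups, so proteins that co‐fractionate land close together in the two‐dimensional plot. Real datasets contain thousands of proteins and more fractions, and the share of variance captured by two components has to be checked rather than assumed.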

Outstanding challenges in proteomics

Which weak or transient interactions are functionally important? How can we quantitatively understand and predict in vivo versus in vitro protein interactions? How can we quantitatively link the various “omics” from DNA to RNA and to the higher‐order structure of proteins, including their cellular trafficking? What are the underlying principles that determine cellular protein structural dynamics, and how can we predict them from readily accessible genomic sequences? How can we improve the mutually enhancing efforts of experimentalists and theoretical scientists to tackle highly complex “multi‐omics” projects? It is a formidable challenge for computational biologists and mathematicians to glean a sufficient breadth of data types from experimentalists and to discover overarching patterns in various “omics” datasets that are fundamentally connected by common cellular biology. How can we link different protein structural states to functional diversity? The decades‐old C‐value paradox states that genome sizes are not well correlated with organism complexity 138. Extensive multi‐purposing in eukaryotic proteomes might explain the exceptional “coding efficiency” of many eukaryotic genomes that appear too small relative to the complexity of the organism 4, 13. Quantifying the extent of multi‐purposing is highly challenging because its individual dimensions, such as the discovery of PTMs, alternative splicing and IDRs, and the prediction and validation of protein function, are each challenging in their own right. Expanding and integrating these efforts into comprehensive high‐throughput methods is highly desirable but not yet straightforward 139. Can we use our improved understanding of spatiotemporal proteome dynamics to improve life in ageing and growing societies?

Conclusion

Bottom‐up approaches have been very powerful in structural biology over many decades. DNA and RNA sequencing technologies have become highly robust and widely accessible, and rapid proteome‐wide protein sequencing is now possible for several organisms and is transforming our understanding of biology. Protein de novo folding simulations have reached near‐atomic precision for small folded domains and IDRs. Higher‐order structures are so far less readily predictable. Complicating factors are intracellular and environmental fluctuations, which can be observed even in the simplest model systems 140, 141. Clever combinations of traditional biochemical and physical assays with increasingly rapid bottom‐up mass spectrometry generate many new opportunities to characterize these higher‐order structures, as outlined in this review (Fig. 3). Collectively, these new bottom‐up mass spectrometric techniques make it possible to “sequence” many crucial layers of the dynamic regulation of protein structures.

Figure 3

Opening the black box of biology. New proteomics techniques enable a more comprehensive understanding of the systems dynamics of life.

New structural proteomics technologies can accelerate the analysis of the dynamics of biological regulation and extend the scope of structural biology well beyond its descriptive origins toward closer connections with cellular functions and the prediction of systems behaviors of proteins. Very recent breakthrough studies demonstrated that spatiotemporal engineering of just a few proteins can improve carbon fixation in organisms or accelerate the switch from reduced photosynthetic activity under low‐light conditions to full photosynthetic productivity once more light becomes available after clouds have passed 142, 143. An improved proteome‐wide understanding of the hidden order in the apparent disorder of higher‐order protein structures in living organisms can pave the way to de novo spatiotemporal engineering of organisms with beneficial properties. While this might sound like a long way off at present, just 20 years ago it was well beyond the wildest imagination that we would be able to routinely sequence entire proteomes in an hour of measurement time 144. With an improved understanding of cellular dynamics, it will become increasingly possible to avoid late‐stage failures in drug discovery pipelines. There is plenty of dynamics at the bottom of biology (Fig. 3).

143 in total

Review 1. Molecular dynamics simulations of biomolecules.

Authors: Martin Karplus; J Andrew McCammon
Journal: Nat Struct Biol Date: 2002-09

2. A human interactome in three quantitative dimensions organized by stoichiometries and abundances.

Authors: Marco Y Hein; Nina C Hubner; Ina Poser; Jürgen Cox; Nagarjuna Nagaraj; Yusuke Toyoda; Igor A Gak; Ina Weisswange; Jörg Mansfeld; Frank Buchholz; Anthony A Hyman; Matthias Mann
Journal: Cell Date: 2015-10-22 Impact factor: 41.582

Review 3. Flexible nets. The roles of intrinsic disorder in protein interaction networks.

Authors: A Keith Dunker; Marc S Cortese; Pedro Romero; Lilia M Iakoucheva; Vladimir N Uversky
Journal: FEBS J Date: 2005-10 Impact factor: 5.542

4. Femtosecond structural dynamics drives the trans/cis isomerization in photoactive yellow protein.

Authors: Kanupriya Pande; Christopher D M Hutchison; Gerrit Groenhof; Andy Aquila; Josef S Robinson; Jason Tenboer; Shibom Basu; Sébastien Boutet; Daniel P DePonte; Mengning Liang; Thomas A White; Nadia A Zatsepin; Oleksandr Yefanov; Dmitry Morozov; Dominik Oberthuer; Cornelius Gati; Ganesh Subramanian; Daniel James; Yun Zhao; Jake Koralek; Jennifer Brayshaw; Christopher Kupitz; Chelsie Conrad; Shatabdi Roy-Chowdhury; Jesse D Coe; Markus Metz; Paulraj Lourdu Xavier; Thomas D Grant; Jason E Koglin; Gihan Ketawala; Raimund Fromme; Vukica Šrajer; Robert Henning; John C H Spence; Abbas Ourmazd; Peter Schwander; Uwe Weierstall; Matthias Frank; Petra Fromme; Anton Barty; Henry N Chapman; Keith Moffat; Jasper J van Thor; Marius Schmidt
Journal: Science Date: 2016-05-05 Impact factor: 47.728

5. Determining biophysical protein stability in lysates by a fast proteolysis assay, FASTpp.

Authors: David P Minde; Madelon M Maurice; Stefan G D Rüdiger
Journal: PLoS One Date: 2012-10-03 Impact factor: 3.240

6. Mass-spectrometry-based spatial proteomics data analysis using pRoloc and pRolocdata.

Authors: Laurent Gatto; Lisa M Breckels; Samuel Wieczorek; Thomas Burger; Kathryn S Lilley
Journal: Bioinformatics Date: 2014-01-11 Impact factor: 6.937

7. SILAC-iPAC: a quantitative method for distinguishing genuine from non-specific components of protein complexes by parallel affinity capture.

Authors: Johanna S Rees; Kathryn S Lilley; Antony P Jackson
Journal: J Proteomics Date: 2014-12-20 Impact factor: 4.044

8. A draft map of the mouse pluripotent stem cell spatial proteome.

Authors: Andy Christoforou; Claire M Mulvey; Lisa M Breckels; Aikaterini Geladaki; Tracey Hurrell; Penelope C Hayward; Thomas Naake; Laurent Gatto; Rosa Viner; Alfonso Martinez Arias; Kathryn S Lilley
Journal: Nat Commun Date: 2016-01-12 Impact factor: 14.919

9. Visual proteomics of the human pathogen Leptospira interrogans.

Authors: Martin Beck; Johan A Malmström; Vinzenz Lange; Alexander Schmidt; Eric W Deutsch; Ruedi Aebersold
Journal: Nat Methods Date: 2009-10-18 Impact factor: 28.547

10. Localization of organelle proteins by isotope tagging (LOPIT).

Authors: T P J Dunkley; R Watson; J L Griffin; P Dupree; K S Lilley
Journal: Mol Cell Proteomics Date: 2004-08-04 Impact factor: 5.911
