
Circuit Topology Analysis of Polymer Folding Reactions.

Maziar Heidari¹,², Helmut Schiessel³, Alireza Mashaghi¹.

Abstract

Circuit topology is emerging as a versatile measure to classify the internal structures of folded linear polymers such as proteins and nucleic acids. The topology framework can be applied to a wide range of problems, most notably molecular folding reactions that are central to biology and molecular engineering. In this Outlook, we discuss the state of the art of the technology and elaborate on the opportunities and challenges that lie ahead.


Year: 2020 PMID: 32607431 PMCID: PMC7318069 DOI: 10.1021/acscentsci.0c00308

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553

Introduction

Linear polymer chains are building blocks of life and an important class of synthetic macromolecules in chemistry. The chemical and physical properties of these molecules are determined partly by the chemical nature of their monomers and partly by their arrangements along the chain and in 3D. Natural proteins and synthetic protein origami are good examples of such linear polymers, in which the topological arrangement of interacting non-neighboring monomers (contact sites) determines their structure, stability, surface properties, and folding kinetics and pathways. Chain segments can come into physical contact via electrostatic, van der Waals, hydrogen bond, or covalent interactions, but they can also cross each other and form knots. Circuit topology categorizes the arrangement of contacts, while knot theory categorizes the arrangement of chain crossings. Thus, a combination of both conceptual frameworks provides a complete theory for folded linear polymer chains. The arrangements of contacts and knot crossings are functionally important properties of folded linear macromolecules. For proteins, topology has been used to describe folding pathways;[1−4] supercoiling and knotting are linked to protein stability;[4,5] and intriguing observations have revealed possible connections between topology, function, and evolution of some proteins.[6] Similar links are being explored for RNA molecules. For example, the statistics of loops have been linked to the thermodynamics of RNA folding.[7] RNA molecules involving base pairs between loops are likely to become topologically trapped in persistent frustrated states.[8] The folding structure of genomic DNA has also been a subject of intense study and discussion.[9,10] A topology analysis has been used to extract conformational reaction pathways that lead to the formation of the chromosome.
A combination of modeling and experimental analysis of contacts suggests that the organization of the genome is determined by an interplay of loop extrusion and compartmental segregation.[11,12] Condensin I and II proteins together form nested DNA loops around a helical DNA scaffold leading to the formation of the mitotic chromosome.[13] In addition to protein and genome studies, much progress has been made in the field of polymer science by studying synthetic and model polymers. For example, a recent study revealed the interplay between topology and mechanics in elastic knots. By combining an optomechanical analysis of knotted fibers with modeling, Patil et al. identified simple topological counting rules to predict the relative mechanical stability of knots and tangles.[14] The synthesis of molecules with different topologies is a challenging task. Some good progress has been made,[15,16] but there is still much to be learned on this emerging research frontier. Studies on the topology of linear molecules are naturally linked to studies on folding processes. As mentioned earlier, folding may happen through knotting or contact formation or both. Until now, most studies have focused on the knot formation pathway and kinetics. However, contacts are ignored in knot theory, and most proteins and RNA molecules do not form knots.[17] Thus, understanding the contact arrangement (circuit topology) of folded molecules and its implications for folding is critically important for biochemistry, bioinspired molecular engineering, and beyond. Research on contact arrangements and their implications for folding has gained momentum only recently thanks to theoretical and technological advancements. Here, we focus on circuit topology and discuss its applications to folding studies. The organization of the paper is as follows. We start with defining the statistics of loop formation and circuit topology.
Then we will provide a topological description of single-molecule folding and unfolding dynamics. Next, we will discuss how one can assist folding toward preferred topologies by introducing various forms of confinement. Afterward, we will show how one can sort and enrich certain topologies. Finally, we will discuss possible extensions of current topology frameworks and particularly, how knot theory and circuit topology can merge into a complete topology theory for folded chains.

Formation and Arrangement of Contacts

A given linear polymer, biological or synthetic, can be considered as a fluctuating chain (path), whose equilibrium statistics are defined by the Boltzmann weight, which is the exponential of the negative chain energy in units of the thermal energy. As a result of thermal fluctuations, different segments of the chain can meet and form contacts. The contact probability of two points separated by contour length s along an ideal chain in a dilute solution decays as $s^{-3/2}$.[18] However, a polymer chain in a poor solvent folds into a globule, whose structure is either close to equilibrium or out of equilibrium. In the former case, the contact probability still decays as $s^{-3/2}$, while in the latter, which is also known as a fractal or collapsed globule, it decays as $s^{-1}$.[19] The stability of these contacts is determined by the free energy change between the bound (after contact) and unbound (free) states; the free energy change includes enthalpic and entropic contributions. The formation of a contact is associated with the formation of a closed loop along the contour length; the statistics of such loops carry critical information with functional relevance (e.g., see ref (7)). The loops can be characterized by their geometry (e.g., contact order, defined as the ratio of the loop size to the total contour length of the chain) and by their topology, that is, the way in which the loops are mutually closed. Loop geometry has been discussed in several studies; for example, statistical physics frameworks have been introduced to study loop geometry in nucleic acids and proteins.[7,20] However, the topological arrangements of the loops and their importance in determining folding kinetics have only recently received attention since the introduction of “circuit topology” by Mashaghi et al.[17] The topological arrangements of the loops in the circuit topology framework are defined by considering a linear polymer chain with N contact sites and M loops.
Here, the contact sites are labeled as $C = C_1, C_2, \ldots, C_N$ and loops as $L = L_1, L_2, \ldots, L_M$. If loop L1 connects sites $C_i$ and $C_j$ with contour segment $[C_i, C_j]$ and loop L2 connects sites $C_r$ and $C_s$ with contour segment $[C_r, C_s]$, the circuit topology relation between L1 and L2 is defined by the relative position of the two contour segments: the loops are in series (S) if the segments are disjoint, in parallel (P) or inverse parallel ($P^{-1}$) if one segment is nested within the other, and in cross (X) if the segments partially overlap. Here L1SL2 denotes that L1 is in series arrangement with L2 (and similarly for P, X, and $P^{-1}$). As can be seen in Figure 1, such arrangements between each loop pair are analogous to the arrangements of elements in an electrical circuit; hence the name “circuit topology”. This definition applies to binary contacts (valency of binding sites is 1). If we allow for a valency of 2, then we need to take into account the possibility of overlap between the connecting sites. In that case, we can slightly change the definitions by categorizing the shared binding site to allow for concerted parallel (CP), concerted series (CS), concerted inverse parallel ($CP^{-1}$), and concerted inverse series ($CS^{-1}$) (see Figure 2b). Valencies higher than 2 are less common in biology, although one can extend the definition to higher valency too. Thus, a topological space can be defined by three independent elements (if we allow commutativity of the parallel loops, i.e., $P^{-1}$ = P) whose values are set by the topological fraction of each category, that is, the number of loop pairs in that category divided by the total number of loop pairs. Correlation patterns of higher order than binary ones can be expected in the contact matrix of macromolecules having hierarchical structures such as the genome. The detection and comparison of these patterns can be facilitated and quantified using machine learning as the size of the circuit topology matrix and the higher order correlations of the patterns grow. Figure 2a shows an example of a folded chain having 11 contacts. The contacts can exemplify local loops in RNA molecules or β–β interactions in proteins.
The circuit topology matrix can be generated using the information on either all contacts (high-resolution map) or coarse-grained (CG) contacts (low-resolution map). In the latter, the local contacts are grouped into single CG contacts, which in this example leads to four CG contacts. In Figure 2c, a folding pathway of the N-terminal domain of ribosomal protein L9 (2HFV)[21] is illustrated, and its corresponding circuit topology changes are shown.
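As an illustration of the pairwise series, parallel, and cross relations, the following minimal Python sketch classifies two loops from the positions of their contact sites along the chain and tabulates the circuit topology matrix and the topological fractions. It is a sketch under stated assumptions, not the implementation used in the cited works: contacts are binary (valency 1), each loop is represented by a hypothetical pair of site positions, and `relation(a, b) == "P"` is taken to mean that loop a is nested inside loop b. The function names and the example chain are illustrative.

```python
from collections import Counter
from itertools import combinations


def relation(loop_a, loop_b):
    """Classify loop_a relative to loop_b as 'S', 'P', 'P-1' or 'X'.

    Each loop is a pair of site positions along the chain. Assumption:
    binary contacts only, so the four positions must be distinct.
    """
    i, j = sorted(loop_a)
    r, s = sorted(loop_b)
    if len({i, j, r, s}) < 4:
        raise ValueError("shared sites (concerted relations) are not handled")
    if j < r or s < i:
        return "S"    # contour segments are disjoint
    if r < i and j < s:
        return "P"    # loop_a is nested inside loop_b (assumed convention)
    if i < r and s < j:
        return "P-1"  # loop_b is nested inside loop_a
    return "X"        # segments partially overlap


def topology_matrix(loops):
    """Square matrix of pairwise relations; the diagonal is left as '-'."""
    n = len(loops)
    matrix = [["-"] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                matrix[a][b] = relation(loops[a], loops[b])
    return matrix


def topological_fractions(loops):
    """Fraction of loop pairs in S, P and X, merging P-1 into P."""
    counts = Counter()
    for a, b in combinations(loops, 2):
        rel = relation(a, b)
        counts["P" if rel == "P-1" else rel] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least two loops are needed")
    return {key: counts[key] / total for key in ("S", "P", "X")}


# Hypothetical chain with four contacts (site positions 1..9).
example_loops = [(1, 4), (2, 3), (5, 8), (6, 9)]
fractions = topological_fractions(example_loops)
```

For this hypothetical chain, four of the six loop pairs are in series, one is parallel, and one is cross, so the topological fractions are 2/3, 1/6, and 1/6.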

Figure 1

Configuration of a folded chain with 6 contact pairs is depicted in the left panel. Each contact along the chain is marked with a different color. The corresponding circuit topology of the folded chain is shown in the middle panel. The loop pairs are wired in series (S), parallel (P), or cross (X) topology. The inverse parallel topology ($P^{-1}$) is assumed to be the same as parallel topology (P). Since the arrangement of the contacts is analogous to the arrangement of elements in an electrical circuit, the mathematical framework is dubbed “circuit topology”. The circuit topology entropy of a folded chain configuration with 9 contact pairs is shown in the right panel as a function of the series ($n_s$) and parallel ($n_p$) topological fractions.[22] Reproduced from ref (22) 2015 with permission from the Royal Society of Chemistry.

Figure 2

(a) Configuration of a folded chain with 11 contact pairs is illustrated. The contacts can exemplify local loops in RNA molecules or β–β interactions in proteins. The circuit topology matrix built on the high-resolution information, that is, the contact pairs, is shown in the top middle. The contact pairs are categorized in series (S), parallel (P), cross (X), and inverse parallel ($P^{-1}$). If the local contacts are grouped into single coarse-grained (CG) contacts, the number of contacts reduces to four CG contacts, and the new low-resolution circuit topology is shown in the top right matrix. (b) Two molecular chains with contact pairs formed in the concerted series (CS) and concerted parallel (CP) arrangements are illustrated. In these cases, a single binding site on the chain forms two contacts and its valency rises to two. By a slight change of the definition, one can regard CP as P and CS as S, although a full consideration is also an option. (c) Snapshot and circuit topology of the folding trajectory of the N-terminal domain of ribosomal protein L9 (2HFV).[21] First, the N-terminal hairpin forms. Next, the α helix forms a contact in concerted parallel with the hairpin, contacting the first strand. This second contact helps initiate the hydrogen bonding of the third β strand to the first, yielding three contacts in concerted parallel relation. The contact marked with * is not numbered in the molecular structure because the final α helix was excluded in folding simulations. The dashed and solid lines in the circuit topology matrix indicate the two folding steps.

In Figure 2 we show different levels of coarse graining and simplification. For CP and CS, by slight changes in definition one can regard CP as P and CS as S, although a full consideration is also an option. The averaged contact order of each topological set can also be calculated. The contact order of loop pairs with topology i is calculated as $CO_i = \frac{1}{2 N_i L} \sum (\Delta L_1 + \Delta L_2)$, where the sum runs over the loop pairs in topological state i, $N_i$ is the number of double loops that are categorized in that state, $\Delta L_1$ and $\Delta L_2$ are the monomer separations of the two loops, and L is the total polymer length. Having defined the topological circuit framework, one can introduce measures to quantify the distance between two distinct topologies. Here, analogous to reaction coordinates along a pathway between two thermodynamic states, it is possible to define measures to represent progress along a topological reaction pathway from an initial topology to a final topology.[22] The developed measures on the topological space can also be employed to categorize and map molecular structures such as proteins and nucleic acids and compare the corresponding topological circuits.
This allows translation of familiar molecular operations in biology, such as duplication, permutation, and elimination of contacts, into the language of circuit topology, which is based on a coherent algebraic framework.[23] Moreover, the statistical mechanics of the loops can be utilized to describe the statistical mechanics of networks with different circuit topologies, as described in the following.[24] For a chain with N contacts, one can build a graph with a corresponding link configuration set defined between the connected monomer pairs (i, j). As prescribed in circuit topology, each pair of links is assigned one of three states: series (S), parallel (P), or cross (X). The topology matrix of links for each configuration has entries A ∈ {S, P, X}, as shown in Figure 1 for a chain having 6 contacts. The number of perfect matching configurations of the graph having 2N nodes reads $(2N - 1)!! = (2N - 1) \times (2N - 3) \times \cdots \times 1$. Using Stirling’s approximation for large N, the number of configurations grows to leading order as $e^{N \ln N}$. By imposing the perfect matching condition on the graph configurations, one can calculate the circuit topology entropy of the chain configurations, normalized by the total number of possible configurations, from the number of configurations in each topology state, which is obtained by enumeration. The exact entropy of a chain with nine contacts as a function of the series and parallel topology fractions is shown in the right panel of Figure 1. The cross topology fraction is not an independent variable and can be determined from the relation $n_s + n_p + n_x = 1$. The circuit topology entropy goes to zero for the all-S, all-P, and all-X configurations in the corners of the entropy plot, while its maximum appears where the contact pairs are equally distributed among the topology states, that is, for $n_s = n_p = n_x = 1/3$.[24] We will discuss later how the topological state equipartition breaks when the chain is internally constrained.
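The counting argument above can be checked by brute force for small chains. The following Python sketch enumerates all perfect matchings of 2N sites, classifies every pair of contacts as S, P, or X (merging inverse parallel into parallel), and tallies how many configurations share each triple of pair counts. It is only an illustration under stated assumptions: the function names are hypothetical, the normalization of the entropy used in ref (24) is not reproduced, and the tally merely provides the degeneracies from which any such entropy could be computed.

```python
from collections import Counter
from itertools import combinations


def perfect_matchings(sites):
    """Yield every pairing of the given sites into contacts."""
    if not sites:
        yield []
        return
    first, rest = sites[0], sites[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def relation(loop_a, loop_b):
    """'S', 'P' or 'X' for two loops with distinct sites (P-1 merged into P)."""
    i, j = sorted(loop_a)
    r, s = sorted(loop_b)
    if j < r or s < i:
        return "S"
    if (r < i and j < s) or (i < r and s < j):
        return "P"
    return "X"


def degeneracy(n_contacts):
    """Map (Ns, Np, Nx) to the number of configurations with those counts."""
    table = Counter()
    for matching in perfect_matchings(list(range(2 * n_contacts))):
        pairs = Counter(relation(a, b) for a, b in combinations(matching, 2))
        table[(pairs["S"], pairs["P"], pairs["X"])] += 1
    return table


def odd_double_factorial(n_contacts):
    """(2N - 1)!! = 1 * 3 * ... * (2N - 1)."""
    result = 1
    for k in range(1, 2 * n_contacts, 2):
        result *= k
    return result


table = degeneracy(4)                      # N = 4 contacts, 6 loop pairs
total = sum(table.values())                # equals (2N - 1)!! = 105
corners = [table[(6, 0, 0)], table[(0, 6, 0)], table[(0, 0, 6)]]
```

For N = 4 the enumeration visits 105 configurations, and each pure topology (all series, all parallel, all cross) is realized by exactly one configuration, which is why a logarithmic measure of the degeneracy vanishes in the corners. Because the number of configurations grows as (2N – 1)!!, this brute-force approach is practical only for small N.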
As the circuit topology of a folded structure is defined by the arrangement of contacts and as it is invariant upon deformations of the loops formed between the contacts, several fundamental questions are raised: (i) How can a thermodynamic process, which changes the configuration states of a chain, alter or preserve the circuit topology of the chain? (ii) Is a thermodynamic process, which might be the consequence of an external interference such as chaperones involved in protein folding processes, able to smoothly deform a topological circuit of a chain into a different one? (iii) How do the dynamics and out-of-equilibrium nature of a process affect the final topological states? (iv) How can the circuit topology degeneracy associated with a topological reaction hamper evolution to desired topologies? We address these questions by discussing results from simple and generic models prevailing in polymer physics and statistical mechanics in the following sections.
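The averaged contact order per topological state defined earlier in this section, CO = (1/2NL)∑(ΔL1 + ΔL2) with N the number of loop pairs in the state, can be evaluated directly from a list of contacts. The Python sketch below is a minimal illustration; the loop representation as pairs of monomer positions, the function names, and the example chain are assumptions made here, and inverse parallel is merged into parallel.

```python
from collections import defaultdict
from itertools import combinations


def relation(loop_a, loop_b):
    """'S', 'P' or 'X' for two loops with distinct sites (P-1 merged into P)."""
    i, j = sorted(loop_a)
    r, s = sorted(loop_b)
    if j < r or s < i:
        return "S"
    if (r < i and j < s) or (i < r and s < j):
        return "P"
    return "X"


def contact_order_by_topology(loops, chain_length):
    """Average contact order of the loop pairs in each topological state.

    For state i: CO_i = sum(dL1 + dL2) / (2 * N_i * L), where the sum runs
    over the N_i loop pairs in that state and L is the chain length.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loop_a, loop_b in combinations(loops, 2):
        state = relation(loop_a, loop_b)
        sums[state] += abs(loop_a[1] - loop_a[0]) + abs(loop_b[1] - loop_b[0])
        counts[state] += 1
    return {
        state: sums[state] / (2 * counts[state] * chain_length)
        for state in counts
    }


# Hypothetical chain of length 10 with four contacts.
co = contact_order_by_topology([(1, 4), (2, 3), (5, 8), (6, 9)], chain_length=10)
```

In this hypothetical example the series pairs have an average contact order of 0.25, the single parallel pair 0.2, and the single cross pair 0.3. States with no loop pairs are simply absent from the returned dictionary.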

Folding/Unfolding Pathways in Topological Landscapes

Folding of molecular chains involves conformational searches within the free energy landscape. Since the conformational transitions are not always associated with topological transitions, finding the topological landscape and mapping the conformational transitions to their topological counterparts are fundamental questions in the field. Additionally, due to the cooperativity and nonlinearity of the interactions between the binding sites in the presence of solvent as well as the existence of many intermediate transition states, the folding and unfolding pathways are often irreversible.[25,26] Thus, it is not obvious that the reverse unfolding process follows the same route as the folding process. In the following, we address the question of how folding and unfolding processes are mapped on the topological space by looking at different scenarios of mechanical folding and unfolding.

Folding

The physics of folding reactions can be mostly explained by conceptual frameworks including the nucleation–propagation mechanism and the diffusion–collision model.[27−31] However, the topological arrangements and evolution of the contacts are not addressed in any of these theories. One way to monitor real-time folding in a controlled manner is by restraining the ends of the linear (bio)molecules, for example, between two optically trapped beads (as is done for single-molecule mechanical interrogation studies). In silico modeling of end-restrained folding polymers revealed that the folding process starts with nucleation followed by growth. Before the onset of nucleation, transient local entropic loops dominate, leading to an increase in the number of loop pairs with series topology. After nucleation, the circuit topology of the loops inside the nucleus leads to a drop in the topological fraction of the series topology, while the other topologies, cross and parallel, grow. Such transient topological rearrangements converge to a steady state, implying that the fold grows in a self-similar manner.[32] The circuit topology is a determinant of folding kinetics (and complements size and contact order). It has been shown that the folding rate increases with the fractions of parallel and cross relations.[33] This is explained by the zipping (nesting loops) effect, in which the formation of contacts placed closely along the chain expedites the formation of contacts that are relatively far apart by bringing the contact sites closer together. The loops in parallel and cross topologies feature the zipping effect, while folding of the loops in series topology does not involve any nesting of contacts.

Unfolding

Many biological processes depend on the unfolding of biopolymers. For example, translocation through a nanopore channel, degradation, and even folding of proteins and nucleic acids require a biopolymer to be partially or completely unfolded. The dependency of the unfolding pathways on the native state topology is under investigation. The unfolding of biopolymers can be triggered either by a change in the thermodynamic conditions of the medium, such as a change in temperature[26] or ion concentration,[34] or by an externally applied mechanical force.[35−40] Three different unfolding strategies can be anticipated for mechanical unfolding: threading through a pore, pulling from the ends, and pulling by threading (see Figure 3). For a three-contact chain as shown in Figure 3a, provided the contacts are likely to break under the same force, each unfolding route has a different number of unfolding pathways. In the pulling method, when two contacts are in series or cross relation relative to each other, either of the two contacts can be opened independently. Thus, there exist two unfolding pathways for series and cross topologies. However, for a 2-contact chain with parallel contact arrangement, there exists only one pathway. The contact nested inside the other contact cannot be opened unless the outer contact is opened first. Thus, for the pulling unfolding of the chain example in Figure 3a, the contact number 2 cannot be opened prior to the other contacts. In the threading method, only the contact in the nanopore can be opened at any given time. Therefore, there exists only one pathway for the three topologies in the threading method. In the pulling by threading method, which is a combination of the previous two methods, the ratio of the length of the released chain behind the nanopore and the distance between the nanopore and the tethered chain end, that is, L/d, determines whether pulling by threading primarily acts like pulling or threading.
For L < d, the pulling component would be dominant, while for L > d, the unfolding method would be similar to simple threading.[41] In Figure 3b, the number of pathways is shown for a 5-contact chain (N = 5) for the different unfolding methods. The total number of possible ways to pick a contact pair is N(N – 1)/2 = 10. This number is equal to the number of contact pairs summed over the topologies, Ns + Np + Nx = 10, where Ns, Np, and Nx are the numbers of contact pairs in series, parallel, and cross topologies, respectively. The plots in the figure show the number of unfolding pathways as a function of Ns and Np (Nx follows from the other two numbers). As discussed earlier, the number of pathways in either the pulling or the pulling by threading method decreases upon increasing the number of parallel contact pairs in the system. On the other hand, the number of pathways is constant for the threading unfolding process.
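The pathway counting for end pulling can be illustrated with a deliberately simplified model. The Python sketch below assumes that all contacts break under the same force, that contacts open one at a time, and that the only constraint is the nesting rule stated above: a contact enclosed by another intact contact cannot open before the outer one. This is an assumption-laden toy model with hypothetical function names, not the procedure used to generate Figure 3.

```python
from itertools import permutations


def encloses(outer, inner):
    """True if the contour segment of `inner` lies strictly inside `outer`."""
    r, s = sorted(outer)
    i, j = sorted(inner)
    return r < i and j < s


def pulling_pathways(loops):
    """Count opening orders allowed when pulling the chain from its ends.

    Simplifying assumption: a contact may open only if no contact that is
    still intact encloses it; series and cross contacts open independently.
    """
    n = len(loops)
    count = 0
    for order in permutations(range(n)):
        intact = set(range(n))
        allowed = True
        for idx in order:
            shielded = any(
                encloses(loops[other], loops[idx])
                for other in intact
                if other != idx
            )
            if shielded:
                allowed = False
                break
            intact.remove(idx)
        if allowed:
            count += 1
    return count


series = pulling_pathways([(1, 2), (3, 4)])
cross = pulling_pathways([(1, 3), (2, 4)])
parallel = pulling_pathways([(1, 4), (2, 3)])
```

Under these assumptions the two-contact cases reproduce the counts quoted in the text: two pathways for series and cross arrangements and one for the parallel arrangement. In the threading method the order is instead fixed by the sequence in which contacts reach the pore, so a single pathway results regardless of topology.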

Figure 3

(a) Schematic representation of the three mechanical unfolding methods: pulling, threading, and pulling by threading. The number of pathways and efficiency of unfolding are listed for a 3-contact chain with a specific topology, L1PL2, L1SL3, L2SL3 (see the definition of the circuit topology relations). (b) Number of pathways for unfolding a 5-contact chain using pulling, pulling by threading, and threading methods. The total number of contact pairs summed over the topologies is Ns + Np + Nx = 10, where Ns, Np, and Nx are the numbers of contact pairs in series, parallel, and cross topologies, respectively. The plots in the figure show the number of pathways as a function of Ns and Np (Nx follows from these numbers). The color codes the number of pathways. Reproduced from ref (41). Copyright 2018 American Chemical Society.

Circuit Topology under Confinement

Confinement may drastically change the configuration entropy of linear molecular chains and consequently affect their topology or direct their folding toward a certain topology. Here, two different scenarios can be considered: confinement can be introduced internally; alternatively, the chain can be confined by an enclosing confinement. Understanding the so-called confinement-assisted folding is important both for biologists and for chemists who wish to facilitate the synthesis of a desired molecular fold.

Internal Constraint

For linear chains with 2N intrachain binding sites, the number of possible ways to fold is given as ∼(2N – 1)!!. For example, for a chain having 10 binding sites (N = 5), there are 9!! = 945 different folding pathways. Every binding pair can adopt different topologies with distinct transition rates between them. The total number of contact pairs, each of which can occupy series, parallel, or cross topology, grows as N(N – 1)/2. Using Monte Carlo simulations, it was shown that the topological dynamics of a simple linear chain is strongly affected by the presence of a “non-native” contact that is transiently introduced into the chain during the folding process.[42] The role of an external molecule (chaperone) is schematically shown in Figure 4, where it can be seen that an external two-point contact is sufficient to accelerate or slow down the formation of certain topologies. The presence of such contacts deforms the folding-time landscapes of the chains and hence alters the occupation probability of topological states. Examples of such mechanisms can be found in chaperone-assisted folding processes in cells. For example, trigger factor has finger-like appendages that are able to touch a few sites on the unfolded chain and restrain those segments.[43,44] Such internal confinements bias the conformational search of nonfolded molecular chains toward certain fold topologies.

Figure 4

Folding time maps of a chain under different types of internal restraints. The folding time of a chain with no external perturbation (control case) is shown on the top. The folding time map in the presence of a chaperone is shown in the lower panels, where the chaperone binds to a native contact of the chain (lower left) and where it binds to the chain and forms an external contact (lower right). In both cases, the diffusion of the contact pairs of the chaperone is three times smaller than the diffusion of the chain’s native binding sites. The axes of the ternary plots are topological fractions in series (S), parallel (P), and cross (X).[42] Reproduced from ref (42) 2017 with permission from the PCCP Owner Societies.

External Confinement

An important class of confinement is external confinement imposed by an enclosing cavity. Here, the whole chain or a large part of it is confined within a certain geometry such as a sphere of radius Rc.[45−48] These external perturbations can alter the localization of the chain binding sites and accordingly enhance the binding probabilities and contact formation. It has been observed that the persistence length, which is determined by the stiffness of the chain, can influence the contact probabilities as well. Given the confinement length scale Rc and the gyration radius Rg of the chain, which depends on the persistence length as well as on the chain length, one can distinguish different regimes based on the ratio of the two scales. At large confining radius, Rc > Rg, due to entropy, the linear chain remains in a coiled configuration in which the formation of independent contacts along the chain with small contact order is probable (see Figure 5). This accordingly increases the fraction of series topology with respect to the other topologies. However, at small confining radius, Rc < Rg, the chain folds on itself. In this regime, the fractions of cross and parallel topologies are enhanced in the chain topological circuits, with cross becoming predominant. At an intermediate confining regime, a critical radius can be expected at which all topological states have equal probability. This means that the confining radius can be considered as the topological reaction coordinate through which one can tune the occupancy of topological space. Furthermore, over a wide range of confining radii, loops arranged in parallel and cross topologies have nearly identical contact orders. The existence of such a degeneracy implies that the kinetics and transition rates between the topological states cannot be solely explained by contact order.
In addition to spherical confinement, the topological circuit of the chain can be investigated under ellipsoidal confinement in which more topological reaction coordinates are introduced into the system. In this case, the change in the aspect ratio of the confinement can alter the contact probability and accordingly the circuit topology.

Figure 5

Fraction of topological circuits and corresponding averaged contact order (CO) as a function of the confining radius, Rc, for a linear polymer chain (see the definition of the contact order). The dashed line indicates the transition confinement radius, Rct, at which all topological fractions are equal.[45] Topological circuits of two intramolecular loops under spherical confinement with radius Rc are illustrated on the right side. Reproduced from ref (45) 2017 with permission from the Royal Society of Chemistry.

Nuclear Confinement

An extreme case of external confinement of a polymer is the organization of the DNA molecules inside cells[49] where macroscopic lengths of DNA, for example, 2 m in human cells, need to fit into micrometer-sized nuclei. As the cell needs access to genetic information, there are serious demands on the topology of DNA inside the nucleus. In fact, already in 1993,[50] it was speculated that the DNA cannot show an equilibrium conformation (a polymer inside a spherical compartment, similar to a polymer globule in a poor solvent) as this would lead to conformations too entangled to be accessible. Instead a structure akin to a collapsed globule was proposed, the state into which a polymer folds when one quickly changes the solvent quality from good to poor (e.g., through a temperature jump). In 2009, experimental evidence for this claim was found through a then new method, chromosome conformation capture.[9] This study suggested that the contact probability along the DNA (for human cells in interphase) decays as $s^{-1}$ with genomic distance s, as opposed to $s^{-3/2}$ for the equilibrium globule.[49] Unlike for an equilibrium globule, the overall conformation is fractal. An interesting question is whether the hierarchical organization of such a nonequilibrium fractal globule would manifest itself in the context of circuit topology in different fractions of the three topological states. In this context, one might also study space-filling curves like Peano and Hilbert curves that have been invoked as toy models for fractal globules.[9] Chromosome capture experiments at higher resolution led in 2014[51] to the discovery of contact domains (also called topologically associated domains), contiguous stretches of DNA of median length 185 kilobases, which have a substantially higher contact probability with themselves than with the rest of the genome.
Boundaries of topologically associated domains are typically demarcated by a short base pair sequence to which the insulator protein CCCTC-binding factor (CTCF) is bound. The two boundaries of a domain are in direct physical contact, and such a pair of nonpalindromic CTCF binding sites is always in convergent orientation. At first it appeared mysterious how pairs of DNA sites about 200 kilobases apart can find each other and then bind only if their sequences happen to be in convergent orientation. The solution to this puzzle lies in another protein that is bound to these locations: cohesin. It has been suggested that these molecules act as loop extruders, causing the formation of the contact domains.[12] Extrusion complexes contain two DNA-binding subunits tethered together. Initially these two subunits bind close to each other on the DNA. Then they move along the DNA in opposite directions while bridging these increasingly distant chromosomal sites, thereby increasing the size of the loop. The spooling of DNA into the loop continues until the subunits encounter CTCF proteins bound to flanking, convergently arranged CTCF binding sites, which block further extrusion. As a result, topological domains are dynamical systems of loops that are nonconcatenated with each other. As nonconcatenated polymer loops are known not to mix[52] (unlike open polymers in solution[18]), such topological domains are separated spatially from each other. Circuit topology applied to such cohesin-induced loops would show that there are only series and parallel topological states but no cross topologies. If cross topologies were present, especially beyond the boundaries of topological domains, the isolating effect of the ring topology would be destroyed, and the effect of the cohesins would merely be comparable to that of a poor solvent. Finally, before cell division, there is a spectacular polymer physics problem to overcome. 
Each DNA double helix (chromosome) is copied, and one has two identical DNA copies entangled with each other. In order to propagate the genome to the two daughter cells, these molecules need to be neatly separated. The result, the X-shaped mitotic chromosome, is well-known, but how does the cell arrive at that state? Polymer physics teaches us that the free energy gained by separating two overlapping chains is only on the order of the thermal energy.[53] What is needed are motor proteins that pull the chains apart, but the challenge lies in how these motors can distinguish the two identical chains. We know now that this is achieved by another loop extruder, condensin (see ref (49) for a historical overview of how this insight was reached). When condensin molecules start to act on the DNA molecules, they create loops, shortening each chromosome lengthwise. As the loops are nonconcatenated, they repel each other. On one hand, this creates the desired repulsion between the chromosome pairs, which are only kept together at the centromere. On the other hand, the loops along each chromosome stiffen the complex. The result of this process is the mitotic X-shaped chromosome, as beautifully demonstrated in a computer simulation[54] (see ref (13) for a more detailed study into mitotic chromosome formation). Here too, in terms of circuit topology, one finds only series and parallel topological states but no cross topology. Interestingly, despite the absence of cross topology, the two DNA copies still suffer from entanglements as they are driven apart from each other. A specialized protein, topoisomerase II, resolves these entanglements by letting the DNA double helices pass through each other. In summary, the three-dimensional organization of extremely long DNA molecules in tiny nuclear compartments leads to various challenges. Nature overcomes some of these challenges by making use of topology. 
Typically these topological states cannot be seen directly but leave their footprints in circuit topology by showing unusual distributions of the different topological states.
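As a minimal sketch of how such a distribution of topological states might be tabulated, the following Python snippet classifies every pair of loops as series, parallel, or cross and reports the fraction of each. It is an illustration under stated assumptions, not the implementation used in the cited studies: the function names and the loop coordinates are hypothetical, each loop is represented only by the chain positions of its two anchor sites, and no two loops are assumed to share a site.

```python
from collections import Counter
from itertools import combinations


def classify_pair(a, b):
    """Return 'series', 'parallel', or 'cross' for two contacts.

    Each contact is a pair (i, j) of positions along the chain.
    Assumption: the four contact sites are all distinct.
    """
    # Order each contact internally, then order the two contacts by start site.
    (i, j), (k, l) = sorted((tuple(sorted(a)), tuple(sorted(b))))
    if len({i, j, k, l}) != 4:
        raise ValueError("contacts must not share a site")
    if j < k:
        return "series"    # i < j < k < l: loops lie side by side
    if l < j:
        return "parallel"  # i < k < l < j: one loop is nested in the other
    return "cross"         # i < k < j < l: loops interleave


def topology_fractions(contacts):
    """Fraction of contact pairs found in each topological state."""
    counts = Counter(classify_pair(a, b) for a, b in combinations(contacts, 2))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least two contacts")
    return {s: counts[s] / total for s in ("series", "parallel", "cross")}


# Hypothetical anchor positions for nested and adjacent loops that never interleave.
extruded_loops = [(0, 100), (10, 40), (50, 90), (120, 200), (130, 150)]
print(topology_fractions(extruded_loops))
# {'series': 0.7, 'parallel': 0.3, 'cross': 0.0}
```

In this toy example the cross fraction is zero because none of the loops interleave, which is the signature expected for nonconcatenated, extrusion-generated loops. Adding a single interleaved contact such as (30, 110) would produce a nonzero cross fraction.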

Topology Sorting and Enrichment

In the previous sections, we discussed how a chain folds into its “native” circuit topology and how this process can be guided (e.g., by confinement). Synthesis is a key process both in biology and in polymer chemistry. In chemistry, synthesis is typically followed by purification, enrichment, and characterization. These steps can be carried out, for example, by passing the synthesized chains through nanopores.[55] This technique has been used to identify the contacts and loops in nucleic acids. Under constant pulling force, one can identify the passage of loops with different topologies by examining the passage time. If the intramolecular contacts remain intact during the passage, the topology of the chain dramatically affects the dynamics of passage through the nanopore. This phenomenon can be exploited to separate or enrich a certain topology in a mixture. If the chain contacts are lost during passage, the discrimination between pure states, in which the majority of contacts are arranged identically, becomes possible.[55]

Conclusions and Future Perspectives

The topological description of a molecular chain ignores geometric and chemical details but keeps the contact arrangements and knotted structures. This allows us to disentangle the contribution of topology to folding processes from that of chemistry and geometry. In this Outlook, we discussed recent studies on the circuit topology of polymer folding and unfolding reactions occurring in different circumstances, such as confinement, internal constraints, and nanopore translocation. The circuit topology framework can be readily extended to include additional complexities seen in nature or designed in an engineered setting. Contacts with higher valencies, transient contacts, and crossings can be included. Geometric and mechanical information can also be combined with the topological description to address a given physicochemical problem. Among these possible extensions, merging knot theory and circuit topology is a key step. Some initial progress has been made in this direction. Adams et al. have recently added intrachain contacts to knot theory.[56] Future developments are needed to provide a generalized circuit topology that can be generically applied to describe the topology of any given molecular fold and can provide measures that are readily observable in experiments. We envision that the topology approaches discussed here will open up new research directions in polymer chemistry, genome biology, and protein biophysics.

