
On the computational ability of the RNA polymerase II carboxy terminal domain.

Jim Karagiannis

Abstract

The RNA polymerase II carboxy terminal domain has long been known to play an important role in the control of eukaryotic transcription. This role is mediated, at least in part, through complex post-translational modifications that take place on specific residues within the heptad repeats of the domain. In this addendum, a speculative, but formal mathematical conceptualization of this biological phenomenon (in the form of a semi-Thue string rewriting system) is presented. Since the semi-Thue formalism is known to be Turing complete, this raises the possibility that the CTD - in association with the regulatory pathways controlling its post-translational modification - functions as a biological incarnation of a universal computing machine.


Keywords:  Carboxy terminal domain; Computation; RNA polymerase II; Semi-Thue string rewriting system; Transcription; Turing completeness; Universal Turing machine

Year:  2014        PMID: 25371772      PMCID: PMC4217226          DOI: 10.4161/cib.28303

Source DB:  PubMed          Journal:  Commun Integr Biol        ISSN: 1942-0889


The Computational Power of Simple, Combinatorial, Symbolic Systems

Past research has clearly demonstrated the ease with which simple, combinatorial, symbolic systems can function as universal computing machines (i.e., systems capable of calculating any computable function). In addition to Alan Turing's original construction, there are many other notable examples, one of the earliest being Marvin Minsky's 7-state, 4-symbol machine based upon an Emil Post type “tag system”. Other examples include Stephen Wolfram's 2-state, 5-color Turing machine, as well as Wolfram's 2-state, 3-color Turing machine (the simplest such system defined to date). In each case the simple manipulation of symbols according to explicit rules imparts upon the system the capacity to perform complex computations; in fact, given access to infinite memory and sufficient time, any function that can be computed is computable by the given system. Excellent reviews on this subject can be found in both the popular and scientific literature. Similar to the realm of abstract computing, the field of molecular and cellular biology is also filled with a multitude of combinatorial, symbolic systems, one of the most striking of which consists of the RNA polymerase II carboxy terminal domain and its associated effector molecules. A speculative examination of the computational ability of this system forms the basis of this addendum and is discussed in detail below.

The RNA Polymerase II Carboxy Terminal Domain

The RNA polymerase II holoenzyme is a large, eukaryotic enzyme complex that functions to transcribe protein coding genes (as well as microRNAs). Interestingly, the largest subunit of the complex, Rpb1, possesses at its carboxy terminus an unusual, repetitive consensus sequence referred to simply as the carboxy terminal domain (or CTD). The CTD consists of multiple repeats of the heptapeptide sequence, YSPTSPS, and is highly conserved in all fungi, plants, and metazoans. In addition, it has long been known that the Rpb1 CTD exists in both hyper- and hypo-phosphorylated states and that regulated changes in phosphorylation (on Tyr-1, Ser-2, Thr-4, Ser-5, and Ser-7 residues) influence both the initiation of transcription and transcript elongation. Current models suggest that these modifications also affect the physical recruitment of accessory proteins that function in various aspects of pre-mRNA processing. The importance of the CTD is also supported by the fact that it is essential for viability in all organisms tested to date. While partial truncations of the CTD sequence can be tolerated, the deletion of the entire CTD is invariably lethal. Curiously, while the CTD is indeed essential for viability, it is not required for basal transcriptional activity in vitro. This strongly suggests that, while the CTD is not catalytically essential, it must perform other crucial functions within eukaryotes.
What these functions are, and the mechanism(s) by which the CTD carries them out, has been the subject of much interest. In the remainder of this addendum, a speculative but testable mathematical hypothesis regarding the underlying nature of the CTD is proposed (a hypothesis based upon a careful consideration of some of the lab's previous results). Within this paradigm, the CTD (and its associated effectors) are viewed as a simple semi-Thue string rewriting system. Since the semi-Thue computational formalism is known to be Turing complete, this raises the possibility that the CTD functions as a biological incarnation of a universal computing machine. To advance these ideas, it is first necessary to present some simple mathematical preliminaries regarding semi-Thue systems. These preliminaries are described below.

Semi-Thue String Rewriting Systems

Semi-Thue string rewriting systems define an abstract model of computation first described by the Norwegian mathematician Axel Thue. Essentially, such systems consist of a series of “rewrite” rules that control how the system converts symbols in a string into other symbols. Formally, a semi-Thue system can be defined as a 2-tuple T = (A, R), where A is a finite alphabet and R is a set of rewrite rules. Given A, it is possible to define A* (the Kleene closure). A* is simply the set of finite length words over A (i.e., the set of finite words resulting from the concatenation of the symbols comprising the alphabet). For example, if A were defined as A = {a, b, c}, then A* would be A* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, …}, where ε represents the empty string. Using A*, the rewrite rules, R, of the system can be defined as R ⊆ A* × A*. That is, R simply defines a set of pairs of strings, where each string is an element of A*. For instance, if A = {a, b, c} and R = {(a, b), (aa, bc)}, then the semi-Thue system, T, would search an initial string for an instance of a, or aa, and replace these symbols so that a → b, or aa → bc. Each rewrite step in the process is performed non-deterministically, i.e., if there is more than one possibility of applying rules from R, then there is no preference as to which rule is applied, or where in the string it is applied. Rules continue to be applied until no occurrences of rewritable strings remain. Using this definition it is possible to create systems capable of computation. For example, one could create a system, T, capable of adding quantities together by defining A and R as A = {*, +} and R = {(*+*, **)}, where n concatenated asterisks represent the natural number n. If given the string “*+**+***+****” (representing 1 + 2 + 3 + 4), the system would then non-deterministically apply the rewrite rule until the string was reduced to “**********” (representing the number 10).
A useful tool to both create and examine semi-Thue systems is a Java interpreter (freely available at https://github.com/mvmn/Thue-in-java) for the esoteric programming language “Thue” (http://esolangs.org/wiki/Thue). Programs in “Thue” consist of a series of rewrite rules followed by the initial string. The rewrite rules are of the form lefthandside::=righthandside. The list of rewrite rules terminates with the symbol ::= on a line of its own, which is immediately followed by the initial string. For example, the system defined above would be represented in the “Thue” programming language as

    *+*::=**
    ::=
    *+**+***+****

Despite its simplicity, the semi-Thue formalism is nevertheless known to be Turing complete. Thus, given infinite memory and sufficient time, any function that can be computed is computable using the semi-Thue formalism. Other more sophisticated examples of semi-Thue systems implemented in the “Thue” language can be found at http://lvogel.free.fr/thue.htm.
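The rewriting procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by the “Thue” interpreter: the function name `rewrite`, the step limit, and the seeded random number generator are assumptions introduced here to model the non-deterministic choice of rule and position.

```python
import random

def rewrite(rules, string, rng=None, max_steps=10_000):
    """Apply semi-Thue rules (lhs, rhs) until no left-hand side occurs."""
    rng = rng or random.Random(0)  # seeded only to make runs repeatable
    for _ in range(max_steps):
        # Collect every (position, lhs, rhs) at which some rule can fire.
        matches = [
            (i, lhs, rhs)
            for lhs, rhs in rules
            for i in range(len(string) - len(lhs) + 1)
            if string.startswith(lhs, i)
        ]
        if not matches:
            return string  # nothing left to rewrite
        # Non-deterministic step: no preference for rule or position.
        i, lhs, rhs = rng.choice(matches)
        string = string[:i] + rhs + string[i + len(lhs):]
    raise RuntimeError("step limit reached without terminating")

# The addition system from the text: A = {*, +}, R = {(*+*, **)}.
ADD_RULES = [("*+*", "**")]
result = rewrite(ADD_RULES, "*+**+***+****")
print(result)
```

Whatever order the rule is applied in, each step removes one “+” and preserves the number of asterisks, so the run always ends with ten asterisks.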

Conceptualizing the RNA Pol II CTD as a Semi-Thue String Rewriting System

To conceptualize the CTD as a biologically relevant and naturally selectable semi-Thue string rewriting system, several key observations must be noted. First, each copy of the YSPTSPS heptad is phosphorylatable on Tyr-1, Ser-2, Thr-4, Ser-5, or Ser-7. Second, eukaryotic cells modulate the phosphorylation status of each heptad through the regulated action of both kinases and phosphatases. Third, mutations affecting post-translational modification of the CTD profoundly influence phenotype in a wide variety of distinct organisms. And fourth, progress through the transcription cycle is controlled (at least in part) by a series of sequential phosphorylation events. For example, Kin28 mediated Ser-5 phosphorylation in budding yeast leads to the recruitment of the Bur1/2 Ser-2 kinase complex and the subsequent phosphorylation of Ser-2 residues [55]. Thus, modifications at one residue can influence the subsequent recruitment of interacting proteins that are themselves CTD effectors. Many other examples of such phenomena can be found in the literature. Taking all of the above observations together, it is clear that 1) the CTD possesses symbols, 2) these symbols can be altered according to explicit rules, and 3) these modifications influence phenotype. Thus, all the requirements of a simple combinatorial, symbolic system capable of computation (and that is sensitive to natural selection) are satisfied. Significantly, this conceptualization is realized without postulating novel biological mechanisms or introducing unconventional computational paradigms. Thus, in the final analysis, these observations lead directly to the hypothesis that the CTD computes, and does so in a manner analogous to that described for semi-Thue string rewriting systems.

Simulating a Turing machine with a Semi-Thue String Rewriting System

In the following paragraphs the computational power of semi-Thue systems is formally demonstrated by generating an abstract semi-Thue string rewriting system that acts as a Turing machine. As first described by Huet and Lankford, it is relatively simple to construct a semi-Thue string rewriting system capable of simulating any Turing machine. Briefly, take a Turing machine $M = (Q, \Gamma, \delta, q_0, F)$ where
1. $Q$ is a finite set of internal states, $Q = \{q_0, \ldots, q_p\}$;
2. $\Gamma$ is the tape alphabet, $\Gamma = \{s_0, \ldots, s_n\}$;
3. $\delta$ is the transition function, $\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$;
4. $q_0 \in Q$ is the initial state; and
5. $F \subseteq Q$ is the set of final states.
The transition function of this machine can then be given as a list of 5-tuples $(q_i, s_j, q_k, s_l, d)$ where
1. $q_i \in Q \cup F$ is the current state;
2. $s_j \in \Gamma$ is the current symbol;
3. $q_k \in Q \cup F$ is the next state;
4. $s_l \in \Gamma$ is the next symbol; and
5. $d \in \{L, R\}$ is the direction of movement of the tape head (left or right).
Next, rewrite rules can be created that correspond to the transition rules in $\delta$. Movements of the tape head to the left correspond to rewrite rules of the form $(s_m q_i s_j,\; q_k s_m s_l)$, where $s_m$ indicates the symbol initially to the left of the tape head ($0 \le m \le n$). Movements of the head to the right, on the other hand, correspond to rewrite rules of the form $(q_i s_j s_m,\; s_l q_k s_m)$, where $s_m$ in this case indicates the symbol initially to the right of the current symbol. In this way the position of $q_i$ denotes the position of the head, and the symbol to the right of $q_i$ denotes the current symbol to be read. Since it is possible to 1) simulate any Turing machine using semi-Thue grammar, as well as 2) define a universal Turing machine capable of simulating any other Turing machine (given the code and input word for that machine), it thus follows that the semi-Thue formalism is indeed Turing complete.
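The translation from transition tuples to rewrite rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every state and tape symbol is a single character, the head is written immediately to the left of the symbol being read, and the function name `tm_to_semi_thue` is hypothetical.

```python
def tm_to_semi_thue(transitions, tape_alphabet):
    """Turn 5-tuples (q_i, s_j, q_k, s_l, d) into semi-Thue rules.

    One rule is emitted per neighbouring symbol s_m, because the rewrite
    must carry the neighbour across the head unchanged.
    """
    rules = []
    for q_i, s_j, q_k, s_l, d in transitions:
        for s_m in tape_alphabet:
            if d == "L":
                # s_m q_i s_j -> q_k s_m s_l (head steps over left neighbour)
                rules.append((s_m + q_i + s_j, q_k + s_m + s_l))
            elif d == "R":
                # q_i s_j s_m -> s_l q_k s_m (head steps over written symbol)
                rules.append((q_i + s_j + s_m, s_l + q_k + s_m))
            else:
                raise ValueError(f"unknown direction: {d!r}")
    return rules

# Toy machine over the alphabet {a, b}; the names q and p are illustrative.
right_rules = tm_to_semi_thue([("q", "a", "p", "b", "R")], "ab")
left_rules = tm_to_semi_thue([("q", "a", "p", "b", "L")], "ab")
print(right_rules)
print(left_rules)
```

A machine with t transitions over an alphabet of k symbols therefore yields t × k rewrite rules.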

A Concrete Example

To further illustrate the ideas presented above, consider a simple Turing machine that takes any binary string and prepends 0 to that string. A description of the transition function for such a machine is shown in Table 1. It is possible to visualize the actions of such a machine by encoding the transition function into the Turing machine simulator of the software package, JFLAP (http://www.jflap.org/). As shown in the animation contained in Supplementary Video SV1, the machine (using a binary string as input) halts after prepending 0 to the string. The “.jff” file encoding this Turing machine is included as Supplementary File S1.

Table 1. Transition function for a Turing machine that prepends 0 to any binary string

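Current State | Current Symbol | Next State | Next Symbol | Direction
s             | >              | $r_0$      | >           | R
$r_0$         | 0              | $r_0$      | 0           | R
$r_0$         | 1              | $r_1$      | 0           | R
$r_0$         | *              | l          | 0           | L
$r_1$         | 0              | $r_0$      | 1           | R
$r_1$         | 1              | $r_1$      | 1           | R
$r_1$         | *              | l          | 1           | L
l             | 0              | l          | 0           | L
l             | 1              | l          | 1           | L
l             | >              | h          | >           | -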

The symbol s represents the starting state; $r_0$ represents the state in which the head moves to the right and prints 0; $r_1$ represents the state in which the head moves to the right and prints 1; l represents the state in which the head moves to the left and prints the last read symbol; h represents the halting state; > is the symbol denoting the start of the binary string; * is the symbol denoting the end of the binary string; R and L represent the direction of movement of the head (right or left, respectively).

As described in the previous section, it is now straightforward to simulate this Turing machine using a semi-Thue grammar by implementing the following rewrite rules:
1. $(s > s_m,\; > r_0\, s_m)$
2. $(r_0\, 0\, s_m,\; 0\, r_0\, s_m)$
3. $(r_0\, 1\, s_m,\; 0\, r_1\, s_m)$
4. $(s_m\, r_0\, *,\; l\, s_m\, 0)$
5. $(r_1\, 0\, s_m,\; 1\, r_0\, s_m)$
6. $(r_1\, 1\, s_m,\; 1\, r_1\, s_m)$
7. $(s_m\, r_1\, *,\; l\, s_m\, 1)$
8. $(s_m\, l\, 0,\; l\, s_m\, 0)$
9. $(s_m\, l\, 1,\; l\, s_m\, 1)$
10. $(l > s_m,\; > h\, s_m)$
Here $s_m$ ranges over the four tape symbols (0, 1, >, and *), so each of the ten rules expands to four concrete rules in the program given next. It is again possible to visualize the actions of the machine, this time by encoding the rewrite rules into the software package “Thue.” Since “Thue” is unable to accept subscripted symbols, $r_0$ becomes r, and $r_1$ becomes t within the program. In addition, l becomes p to avoid confusion between the symbols 1 and l. Thus, in this incarnation of the machine, the tape head is denoted by a letter (s, r, t, p, or h) that represents $q_i$. The Turing machine can thus be simulated in “Thue” using the program

    s>0::=>r0
    s>1::=>r1
    s>>::=>r>
    s>*::=>r*
    r00::=0r0
    r01::=0r1
    r0>::=0r>
    r0*::=0r*
    r10::=0t0
    r11::=0t1
    r1>::=0t>
    r1*::=0t*
    0r*::=p00
    1r*::=p10
    >r*::=p>0
    *r*::=p*0
    t00::=1r0
    t01::=1r1
    t0>::=1r>
    t0*::=1r*
    t10::=1t0
    t11::=1t1
    t1>::=1t>
    t1*::=1t*
    0t*::=p01
    1t*::=p11
    >t*::=p>1
    *t*::=p*1
    0p0::=p00
    1p0::=p10
    >p0::=p>0
    *p0::=p*0
    0p1::=p01
    1p1::=p11
    >p1::=p>1
    *p1::=p*1
    p>0::=>h0
    p>1::=>h1
    p>>::=>h>
    p>*::=>h*
    ::=
    s>*

where the binary string would be inputted between the “>” and “*” in the final line of the program. As shown in the animation contained in Supplementary Video SV2, the machine, upon inputting a binary string, halts after prepending 0.
The “Thue” file encoding this Turing machine is included as Supplementary File S2.
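The behaviour of this machine can also be checked with a short Python sketch. It is not the Supplementary File S2 program itself: it regenerates the forty rewrite rules from Table 1 (using the single-letter state names of the “Thue” program) and applies them until none matches. The names `TABLE_1`, `build_rules`, and `run` are assumptions made for illustration, and the halting transition, listed with direction “-” in Table 1, is given the right-move shape it has in the “Thue” program.

```python
# Table 1 with the "Thue" state names: r = r_0, t = r_1, p = l.
TABLE_1 = [
    ("s", ">", "r", ">", "R"),
    ("r", "0", "r", "0", "R"),
    ("r", "1", "t", "0", "R"),
    ("r", "*", "p", "0", "L"),
    ("t", "0", "r", "1", "R"),
    ("t", "1", "t", "1", "R"),
    ("t", "*", "p", "1", "L"),
    ("p", "0", "p", "0", "L"),
    ("p", "1", "p", "1", "L"),
    ("p", ">", "h", ">", "R"),  # "-" in Table 1; right-move form in "Thue"
]
TAPE_SYMBOLS = "01>*"

def build_rules(table, symbols):
    """Expand each transition into one rewrite rule per neighbour s_m."""
    rules = []
    for state, read, nxt, write, move in table:
        for s_m in symbols:
            if move == "L":
                rules.append((s_m + state + read, nxt + s_m + write))
            else:
                rules.append((state + read + s_m, write + nxt + s_m))
    return rules

def run(rules, tape, max_steps=10_000):
    """Rewrite until no rule applies (the tape holds a single head letter,
    so at most one position can ever match)."""
    for _ in range(max_steps):
        for lhs, rhs in rules:
            if lhs in tape:
                tape = tape.replace(lhs, rhs, 1)
                break
        else:
            return tape  # no left-hand side occurs: the machine has halted
    raise RuntimeError("step limit reached")

RULES = build_rules(TABLE_1, TAPE_SYMBOLS)
print(run(RULES, "s>10*"))
```

For the input `s>10*` the run ends with `>h010`: the head sits in the halting state h, the end marker has been overwritten, and the binary string 10 has become 010.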

Encoding Programs Using a CTD-like Symbolic Structure

From a biological perspective, the key question that now remains is whether the CTD possesses enough symbolic complexity to encode specific programs using a semi-Thue grammar. As previously described, it can be formally shown that the CTD possesses (at minimum) 260 bits of informational entropy that could be exploited to encode such programs [9]. In other words, symbols composed of distinct heptad configurations could be used to represent the rewrite rules. Transition from one heptad configuration to another could then be envisioned to result from the phosphorylation dependent recruitment of a specific CTD effector. In such a scenario, CTD-like symbols could be represented in “Thue” by way of five character strings (representing single heptads) in which “O” denotes a non-phosphorylated residue, and “P” denotes a phosphorylated residue (e.g., a heptad unphosphorylated on Tyr-1, Thr-4, and Ser-5, but phosphorylated on Ser-2 and Ser-7, would be represented by “OPOOP”). Since each heptad can exist in any one of 32 distinct configurations, it becomes possible to encode sophisticated programs using this grammar. For example, one could encode the Turing machine described above using only nine distinct heptad configurations (where s in the original “Thue” program is represented by (OOOOO), r by (POOOO), t by (OPOOO), p by (OOOPO), h by (PPPPP), 0 by (PPOPP), 1 by (OOPOO), > by (OPPPP), and * by (PPPPO)).
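The five-character heptad notation can be made concrete with a small Python sketch. The names `SITES`, `heptad_code`, and `ALL_CONFIGS` are illustrative assumptions; the code only restates the O/P convention and the count of configurations given above.

```python
from itertools import product

# The five phosphorylatable positions of a YSPTSPS heptad, in order.
SITES = ("Tyr-1", "Ser-2", "Thr-4", "Ser-5", "Ser-7")

def heptad_code(phosphorylated):
    """Return the O/P string for a set of phosphorylated residues."""
    unknown = set(phosphorylated) - set(SITES)
    if unknown:
        raise ValueError(f"not a CTD phosphorylation site: {sorted(unknown)}")
    return "".join("P" if site in phosphorylated else "O" for site in SITES)

# Every heptad configuration: two states at each of five sites.
ALL_CONFIGS = ["".join(c) for c in product("OP", repeat=len(SITES))]

print(heptad_code({"Ser-2", "Ser-7"}))
print(len(ALL_CONFIGS))
```

Two states at each of five sites give 2^5 = 32 configurations, of which the machine above uses nine.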
This machine (encoded using a CTD-like symbolic structure) can now be simulated in “Thue” by the program

    (OOOOO)(OPPPP)(PPOPP)::=(OPPPP)(POOOO)(PPOPP)
    (OOOOO)(OPPPP)(OOPOO)::=(OPPPP)(POOOO)(OOPOO)
    (OOOOO)(OPPPP)(OPPPP)::=(OPPPP)(POOOO)(OPPPP)
    (OOOOO)(OPPPP)(PPPPO)::=(OPPPP)(POOOO)(PPPPO)
    (POOOO)(PPOPP)(PPOPP)::=(PPOPP)(POOOO)(PPOPP)
    (POOOO)(PPOPP)(OOPOO)::=(PPOPP)(POOOO)(OOPOO)
    (POOOO)(PPOPP)(OPPPP)::=(PPOPP)(POOOO)(OPPPP)
    (POOOO)(PPOPP)(PPPPO)::=(PPOPP)(POOOO)(PPPPO)
    (POOOO)(OOPOO)(PPOPP)::=(PPOPP)(OPOOO)(PPOPP)
    (POOOO)(OOPOO)(OOPOO)::=(PPOPP)(OPOOO)(OOPOO)
    (POOOO)(OOPOO)(OPPPP)::=(PPOPP)(OPOOO)(OPPPP)
    (POOOO)(OOPOO)(PPPPO)::=(PPOPP)(OPOOO)(PPPPO)
    (PPOPP)(POOOO)(PPPPO)::=(OOOPO)(PPOPP)(PPOPP)
    (OOPOO)(POOOO)(PPPPO)::=(OOOPO)(OOPOO)(PPOPP)
    (OPPPP)(POOOO)(PPPPO)::=(OOOPO)(OPPPP)(PPOPP)
    (PPPPO)(POOOO)(PPPPO)::=(OOOPO)(PPPPO)(PPOPP)
    (OPOOO)(PPOPP)(PPOPP)::=(OOPOO)(POOOO)(PPOPP)
    (OPOOO)(PPOPP)(OOPOO)::=(OOPOO)(POOOO)(OOPOO)
    (OPOOO)(PPOPP)(OPPPP)::=(OOPOO)(POOOO)(OPPPP)
    (OPOOO)(PPOPP)(PPPPO)::=(OOPOO)(POOOO)(PPPPO)
    (OPOOO)(OOPOO)(PPOPP)::=(OOPOO)(OPOOO)(PPOPP)
    (OPOOO)(OOPOO)(OOPOO)::=(OOPOO)(OPOOO)(OOPOO)
    (OPOOO)(OOPOO)(OPPPP)::=(OOPOO)(OPOOO)(OPPPP)
    (OPOOO)(OOPOO)(PPPPO)::=(OOPOO)(OPOOO)(PPPPO)
    (PPOPP)(OPOOO)(PPPPO)::=(OOOPO)(PPOPP)(OOPOO)
    (OOPOO)(OPOOO)(PPPPO)::=(OOOPO)(OOPOO)(OOPOO)
    (OPPPP)(OPOOO)(PPPPO)::=(OOOPO)(OPPPP)(OOPOO)
    (PPPPO)(OPOOO)(PPPPO)::=(OOOPO)(PPPPO)(OOPOO)
    (PPOPP)(OOOPO)(PPOPP)::=(OOOPO)(PPOPP)(PPOPP)
    (OOPOO)(OOOPO)(PPOPP)::=(OOOPO)(OOPOO)(PPOPP)
    (OPPPP)(OOOPO)(PPOPP)::=(OOOPO)(OPPPP)(PPOPP)
    (PPPPO)(OOOPO)(PPOPP)::=(OOOPO)(PPPPO)(PPOPP)
    (PPOPP)(OOOPO)(OOPOO)::=(OOOPO)(PPOPP)(OOPOO)
    (OOPOO)(OOOPO)(OOPOO)::=(OOOPO)(OOPOO)(OOPOO)
    (OPPPP)(OOOPO)(OOPOO)::=(OOOPO)(OPPPP)(OOPOO)
    (PPPPO)(OOOPO)(OOPOO)::=(OOOPO)(PPPPO)(OOPOO)
    (OOOPO)(OPPPP)(PPOPP)::=(OPPPP)(PPPPP)(PPOPP)
    (OOOPO)(OPPPP)(OOPOO)::=(OPPPP)(PPPPP)(OOPOO)
    (OOOPO)(OPPPP)(OPPPP)::=(OPPPP)(PPPPP)(OPPPP)
    (OOOPO)(OPPPP)(PPPPO)::=(OPPPP)(PPPPP)(PPPPO)
    ::=
    (OOOOO)(OPPPP)(PPPPO)

where the binary string would be inputted between the “(OPPPP)” and “(PPPPO)” in the final line of the program. Furthermore, it is again possible to visualize the actions of the machine using “Thue” (Supplementary Video SV3). The “Thue” file encoding this Turing machine is included as Supplementary File S3. Finally, if we were to continue this speculative line of reasoning, and imagined the existence of distinct protein complexes that specifically bound the given tri-heptads of the left-hand side and specifically converted them (through the action of associated kinases/phosphatases) to the tri-heptad configurations of the right-hand side, it would then be possible to envision a CTD-like system capable of behaving as a Turing machine. Of course, it is not being suggested that this is indeed the case in vivo (i.e., it is not being suggested that the CTD literally behaves as a Turing machine). Instead, these concepts are presented only to demonstrate how easily one could program sophisticated algorithms into the CTD using string rewriting systems and established biological mechanisms.
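The correspondence between the single-letter program and its heptad-encoded counterpart is a plain symbol-for-symbol substitution, which can be sketched in Python. The helper names `encode`, `decode`, and `encode_rule` are hypothetical; the mapping itself is the one stated in the text.

```python
# Symbol-to-heptad mapping stated in the text.
HEPTAD = {
    "s": "OOOOO", "r": "POOOO", "t": "OPOOO", "p": "OOOPO", "h": "PPPPP",
    "0": "PPOPP", "1": "OOPOO", ">": "OPPPP", "*": "PPPPO",
}
SYMBOL = {code: letter for letter, code in HEPTAD.items()}

def encode(text):
    """Replace each single-letter symbol with its parenthesised heptad."""
    return "".join(f"({HEPTAD[ch]})" for ch in text)

def decode(text):
    """Inverse of encode: read back one letter per parenthesised heptad."""
    if not text:
        return ""
    return "".join(SYMBOL[tok] for tok in text.strip("()").split(")("))

def encode_rule(rule):
    """Translate one rule written as 'lhs::=rhs' into heptad notation."""
    lhs, rhs = rule.split("::=")
    return encode(lhs) + "::=" + encode(rhs)

print(encode_rule("s>0::=>r0"))
print(encode("s>10*"))
print(decode(encode("s>10*")))
```

Because every encoded symbol begins with “(” and ends with “)”, a left-hand side can only match at a heptad boundary, so the encoded program rewrites exactly as the single-letter one does.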

Final Thoughts

In conclusion, it is important to note that the re-write rules used to construct a given program necessarily determine the computation being performed. This is to say, any number of unique programs could be constructed using different rewrite rules. Moreover, when considering these principles in a biological context, it is crucial to be cognisant of the fact that the rewrite rules would themselves be governed by the biochemical activity of the CTD effectors (e.g., kinases, phosphatases, cis-trans isomerases) present within the cell. Thus, in the final analysis, the “program” encoded would ultimately be under the control of natural selection. Thus, depending on the selective pressures experienced, any number of computational machines could be implemented through the CTD as a function of the rewrite rules. Lastly, the conspicuous location of the CTD as part of an enzyme complex required for the transcription of all protein coding genes in almost all developmentally complex eukaryotes must also be noted. This last fact raises the fundamental biological question of whether CTD based computations have been exploited over the course of evolutionary time to control the sophisticated temporal/spatial regulation of transcription in these organisms.