Weijun Zhu1, Changwei Feng2, Huanmei Wu3. 1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China. 2. The Second Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China. 3. School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA.
Abstract
Temporal logic model checking is an important and complex problem that remains far from fully resolved in the setting of DNA computing, especially for Computation Tree Logic (CTL), Interval Temporal Logic (ITL), and Projection Temporal Logic (PTL), because approaches for DNA model checking are still lacking. To address this challenge, a model checking method is proposed for checking the basic formulas of these three temporal logic types with DNA molecules. First, one type of single-stranded DNA molecule is employed to encode the Finite State Automaton (FSA) model of the given basic formula, so that a sticker automaton is obtained. Second, another type of single-stranded DNA molecule is employed to encode the given system model, so that the input strings of the sticker automaton are obtained. Next, a series of biochemical reactions is conducted between these two types of single-stranded DNA molecules. It can then be decided whether the system satisfies the formula or not. As a result, we have developed a DNA-based approach for checking all the basic formulas of CTL, ITL, and PTL. The simulated results demonstrate the effectiveness of the new method.
Differing from an electronic computer, a DNA computer uses DNA molecules as the carrier of computation. In 1994, the Turing Award winner Professor Adleman published an article in Science that solved a small-scale Hamiltonian path problem with a DNA experiment [1], which is regarded as the pioneering work in the field of DNA computing. Because DNA computing has a huge advantage in parallel processing, the technique subsequently advanced rapidly. Many models and approaches based on DNA computing have been developed to solve complex computational problems, especially the famous NP-hard and PSPACE-hard problems. For example, Lipton published an article in Science that improved Adleman's idea for the SAT problem [2]. Ouyang et al. published an article in Science that presented a DNA-computing-based model for solving the maximal clique problem [3]. Benenson et al. published an article in Nature that solved an automata problem of two states and two characters using the autonomous DNA computing technique [4].
Many other DNA models have been constructed, such as the restricted model [5], the sticker system [6], the length-encoding model [7], the sticker automaton model [8], the DNA Turing machine model [9], the nonenumerative DNA model [10], the giant-magneto-resistance-based DNA model [11], the logical DNA molecular model [12], and the logical nanomolecular model [13]. A series of methods based on nonautonomous or self-assembling techniques have also been proposed for solving various complex computational problems, including the Nondeterministic Polynomial (NP) ones. 
For example, methods have been proposed for the maximum clique problem [14, 15], the vertex coloring problem [10, 16], the SAT problem [11], the N-queens problem [17], the maximum matching problem [18], the minimum vertex cover problem [19], the minimum and exact cover problem [20], the subset-sum problem [21], the classical Ramsey number problem [22], spatial cluster analysis [23, 24], and the knapsack problem [25].
On the one hand, some problems in computer science can be solved by applying techniques based on biochemical reactions in test tubes, nanodevices, or molecular self-assembly [1, 26–28]. On the other hand, owing to their excellent information processing mechanism and huge parallelism, some living cells can also be employed to perform computations. The site-specific DNA recombinase Hin, which can mediate inversion of DNA segments that represent variables, was used to produce the solution. In this model, each cell can produce and examine a solution of the satisfiability problem. As a result, billions of cells can explore billions of possible solutions [29]. In this way, Professor Chen et al. constructed a cellular computing model [29] to solve the satisfiability problem. In addition, a conditional learning system in Escherichia coli was built to identify the “bad man” signal with the help of the “learning” signal. It is a useful attempt to construct an artificial intelligence system using molecular biological techniques.
One of the key differences between a computer and other computing tools is universality. Professor Xu constructed a mathematical model called the “probe machine” for the general DNA computer [30]. By integrating the storage system, operation system, detection system, and control system into a whole, a real general DNA computer was gradually obtained, which was the “Zhongzhou DNA computer” [30]. 
A probe machine is a nine-tuple consisting of data library, probe library, data controller, probe controller, probe operation, computing platform, detector, true solution storage, and residue collector [15]. It is a universal DNA computing model which can be realized in biology, and a Turing machine is just a “special case” of a probe machine [15]. This significant progress has raised the practical importance of research on DNA computing.
More studies on DNA computing have been conducted over the last three years. Some of the major studies are summarized as follows: (1) dealing with some inherent flaws of DNA computing, such as adaptability [31] and instability [32]; (2) employing DNA to realize some basic computing components and/or techniques, such as data storage [33], database operations [34], an odd parity checker [35], a half adder [36], encryption [37], and data hiding [38]; (3) utilizing DNA to address some real-world problems, such as the inverse kinematics redundancy problem of six-degree-of-freedom humanoid robot arms [39], dynamic control of elevator systems [40], and hyperspectral remote sensing data/imagery [41, 42].
Besides the satisfiability problem, model checking (MC) is another important computational problem, and the two problems are correlated. MC, proposed by the Turing Award winner Professor Clarke et al. [43], is widely used in CPU verification [44], network protocol verification, security protocol verification [45], and software verification [46]. MC algorithms automatically answer the question of whether a system satisfies a given property or not. NASA, Intel, IBM, and Motorola are using this technique. 
The general principles of MC can be given as follows: (i) a system model is constructed with an automaton; (ii) a property which the system should satisfy is described by a temporal logic formula; and (iii) if the automaton is a model of the formula, the system model satisfies the property; otherwise, the system does not satisfy the property.
In order to describe different temporal properties, several temporal logic types have been proposed. For instance, Linear Temporal Logic (LTL) was introduced into computer science to express linear properties by the Turing Award winner Professor Pnueli [47]. Computation Tree Logic (CTL) was proposed to express branching properties by the Turing Award winner Professor Clarke [48, 49]. Interval Temporal Logic (ITL), Duration Calculus (DC), and Projection Temporal Logic (PTL) were also investigated to express other temporal properties [50-52].
As a complex computational problem, model checking in the setting of DNA computing has long been a goal for researchers. In 2006, some DNA molecules were applied to conduct CTL model checking for the first time by the Turing Award winner Professor Emerson et al. [53]. However, this method can check only one basic CTL formula, called EFp. It is known that there are eight basic formulas in CTL, that is, EpUq, ApUq, EFp, AFp, EGp, AGp, EXp, and AXp. Performing model checking for all eight basic CTL formulas using DNA computing has been a pending and challenging issue. As shown in Table 1, there are eight basic formulas for CTL, two for ITL, and one for PTL. Except for the EFp formula, none of the other ten basic formulas in CTL, ITL, and PTL can be model checked with DNA computing using the existing methods.
Table 1
State of the art of DNA model checking and its open problems.
Temporal logic | State of the art (the formulas which have been checked) | The problems to be solved (the formulas which cannot be checked yet)
LTL | All four basic formulas | General formulas and cellular model checking
CTL | The basic formula EFp | The other seven basic formulas (will be studied in this paper)
ITL | Nothing reported | Both basic formulas (will be studied in this paper)
PTL | Nothing reported | The single basic formula (will be studied in this paper)
DC | It is infeasible due to the limitation of the current biochemical experimental technique
Motivated by this gap, we propose a set of DNA-based model checking algorithms. With the new algorithms, all eleven basic formulas of CTL, ITL, and PTL can be model checked with DNA molecules. In essence, the core model checking problem for CTL, ITL, and PTL is solved by DNA computing, because every CTL/ITL/PTL formula can be obtained by recursively combining the basic CTL/ITL/PTL formulas. This is the main contribution of this paper.
The rest of this paper is organized as follows. Section 2 introduces some basic concepts. Our newly proposed algorithms are described in Section 3. The simulated experiments are presented in Section 4, which demonstrates that the new algorithms are feasible in molecular biology. Section 5 provides brief conclusions. The formal definitions of these temporal logic types are given in the Appendix.
2. Preliminary
2.1. The Basic Formulas in CTL [43]
Definition 1.
Let p and q be atomic propositions, and let EpUq, ApUq, EFp, AFp, EGp, AGp, EXp, and AXp be the basic CTL formulas. An arbitrary CTL formula can be obtained by recursive combinations of these basic CTL formulas.
An atomic proposition and a basic CTL formula are interpreted on a system model M, and their intuitive meanings are given as follows:
p or q is satisfied in a state s.
EpUq describes the property: there exists at least one path in M, such that p is always satisfied until q is satisfied.
ApUq describes the property: for each path in M, p is always satisfied until q is satisfied.
EFp describes the property: there exists at least one path in M, such that p is eventually satisfied.
AFp describes the property: for each path in M, p is eventually satisfied.
EGp describes the property: there exists at least one path in M, such that p is always satisfied.
AGp describes the property: for each path in M, p is always satisfied.
EXp describes the property: there exists at least one path in M, such that p is satisfied in the next state.
AXp describes the property: for each path in M, p is satisfied in the next state.
Figure 1 gives some example models which satisfy the eight basic CTL formulas. A circle represents a state, and a letter in a circle represents an atomic proposition which is satisfied in the state. A line segment with an arrow means an edge (i.e., a transition between two states). A state sequence from the root node to a leaf node is called a path. Time passes from top to bottom, and the different branches represent the alternative transitions from the current state to the next one.
Figure 1
Examples of the basic CTL formulas and their models.
For the model M in Figure 1(a), there are four paths. Each path passes through three states at three moments, which forms four sequences of atomic propositions: ppr, ppq, prr, and prr. Note that ppq, that is, the second path, satisfies the following property: p is always satisfied until q is satisfied. In contrast, no other path in M satisfies this property. According to the definition of EpUq, the model M satisfies EpUq.
For the model M in Figure 1(b), there are also four paths with three states for each path. The four sequences of atomic propositions are ppq, ppq, pqr, and pqr. All paths in M satisfy the property: p is always satisfied until q is satisfied. According to the definition of ApUq, the model M satisfies ApUq.
Similarly, for the model M in Figure 1(c), the path qqp in M satisfies the property: p is eventually satisfied. Thus, the model M satisfies EFp. For the model M in Figure 1(d), all four paths, qpq, qpq, qqp, and qqp, satisfy the property: p is eventually satisfied. Thus, the model satisfies AFp. For the model M in Figure 1(e), the path ppp in M satisfies the property: p is always satisfied. Thus, the model M satisfies EGp. 
For the model M in Figure 1(f), all four paths, ppp, ppp, ppp, and ppp, satisfy the property: p is always satisfied, which makes the model M satisfy AGp. For the model M in Figure 1(g), the paths rpq and rpr in M satisfy the property: p is satisfied in the next state. Thus, M satisfies EXp. For the model M in Figure 1(h), all four paths, rpq, rpr, rpp, and rpq, satisfy the property: p is satisfied in the next state. Thus, this M satisfies AXp.
Given an arbitrary model M, the challenge is how to use a DNA-computing-based method to determine whether the eight basic CTL formulas are satisfied by M or not. Section 3.1 provides our new approach, which can check all eight basic CTL formulas.
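The path-based reading of the eight basic formulas can be illustrated with a small classical sketch. It is not the DNA method of this paper: it assumes a model is given simply as its list of finite root-to-leaf paths, with one atomic proposition per state (as in the sequences ppq, prr, and so on quoted above). The function names are hypothetical and chosen only for this illustration.

```python
from typing import Callable, Iterable, Sequence

Path = Sequence[str]  # one atomic proposition per state, e.g. "ppq"


def until(path: Path, p: str, q: str) -> bool:
    """pUq on a finite path: some state satisfies q, every earlier state satisfies p."""
    for state in path:
        if state == q:
            return True
        if state != p:
            return False
    return False  # q never occurred


def eventually(path: Path, p: str) -> bool:
    """Fp: p is satisfied in at least one state of the path."""
    return any(state == p for state in path)


def always(path: Path, p: str) -> bool:
    """Gp: p is satisfied in every state of the path."""
    return all(state == p for state in path)


def next_state(path: Path, p: str) -> bool:
    """Xp: p is satisfied in the second state of the path."""
    return len(path) > 1 and path[1] == p


def exists_path(paths: Iterable[Path], prop: Callable[[Path], bool]) -> bool:
    """E-quantifier: at least one path has the property."""
    return any(prop(path) for path in paths)


def all_paths(paths: Iterable[Path], prop: Callable[[Path], bool]) -> bool:
    """A-quantifier: every path has the property."""
    return all(prop(path) for path in paths)


# Path sets quoted in the text for Figure 1(a), (b), (d), (f), and (h).
fig_1a = ["ppr", "ppq", "prr", "prr"]
fig_1b = ["ppq", "ppq", "pqr", "pqr"]
fig_1d = ["qpq", "qpq", "qqp", "qqp"]
fig_1f = ["ppp", "ppp", "ppp", "ppp"]
fig_1h = ["rpq", "rpr", "rpp", "rpq"]

print(exists_path(fig_1a, lambda x: until(x, "p", "q")))  # EpUq on Figure 1(a)
print(all_paths(fig_1a, lambda x: until(x, "p", "q")))    # ApUq on Figure 1(a)
```

On the Figure 1(a) paths, the existential check succeeds because of ppq, while the universal check fails because of ppr and prr, which matches the discussion above.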
2.2. The Basic Formulas in LTL [43]
Definition 2.
Let p and q be atomic propositions, and let pUq, Fp, Gp, and Xp be the basic LTL formulas. An arbitrary LTL formula can be obtained by recursively combining basic LTL formulas. An atomic proposition and a basic LTL formula are interpreted on a path L and a system model M, and their intuitive meanings are given as follows:
p or q is satisfied in a state s, or not.
pUq describes the property: for each path L in M, p is always satisfied until q is satisfied.
Fp describes the property: for each path L in M, p is eventually satisfied.
Gp describes the property: for each path L in M, p is always satisfied.
Xp describes the property: for each path L in M, p is satisfied in the next state.
For a path L, time passes from left to right, and the system transits from the current state to the next one. For a model M, time passes from top to bottom, and the different branches represent the alternative transitions from the current state to the next one.
Figure 2 gives one sample M for each basic LTL formula, respectively. The formula pUq is called the core LTL formula since every basic LTL formula can be expressed by pUq. Given an arbitrary model M, previous studies have provided approaches on how to use a DNA-computing-based method to determine whether the four basic LTL formulas are satisfied by M or not [54, 55].
Figure 2
Examples of the basic LTL formulas and their models.
2.3. The Basic Formulas in ITL [50]
Definition 3.
Let p, q, p1, p2, q1, and q2 be atomic propositions, and let (p1Uq1); (p2Uq2) and (pUq)∗ be the basic ITL formulas. An arbitrary ITL formula can be obtained by recursively combining basic ITL formulas. A basic ITL formula is interpreted on a path L and a system model M, and the intuitive meanings are given as follows:
(p1Uq1); (p2Uq2): for each path L in M, the following property holds: the prefix subpath (i.e., prefix interval) satisfies the core LTL formula p1Uq1, and the suffix subpath (i.e., suffix interval) satisfies the core LTL formula p2Uq2.
(pUq)∗: for each path L in M, the following property holds: L circulates in a loop body consisting of a subpath (i.e., a loop body consisting of an interval), and the loop body satisfies the core LTL formula pUq.
Figure 3 gives an example for each basic ITL formula, respectively. For (p1Uq1); (p2Uq2), any path L in M has the following characteristic: L reaches a number of red states after it crosses some blue states. The prefix interval of L is the state sequence marked in blue, whereas the suffix interval of L is the state sequence marked in red. An ITL formula is satisfied in such an interval. In Figure 3(a), p1q1p2p2q2 is a path satisfying the ITL formula (p1Uq1); (p2Uq2). The prefix interval of p1q1p2p2q2 satisfies the LTL formula p1Uq1, whereas the suffix interval of p1q1p2p2q2 satisfies the LTL formula p2Uq2. Similarly, all the paths in M of Figure 3(a) satisfy the ITL formula (p1Uq1); (p2Uq2). In Figure 3(b), pqpq… is a path satisfying the ITL formula (pUq)∗. The loop body consisting of an interval (i.e., pq) satisfies the LTL formula pUq. Similarly, all the paths in M of Figure 3(b) satisfy the ITL formula (pUq)∗.
Figure 3
Examples of the basic ITL formulas and their models.
Given an arbitrary model M, we will provide a new solution in Section 3.2 on how to use a DNA-computing-based method to determine whether the two basic ITL formulas are satisfied by M or not.
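The two basic ITL formulas can be illustrated on a single path with a short classical sketch. Two simplifying assumptions are made here and are not taken from the paper: an interval is read strictly as a run of p-states closed by one q-state, and the looping formula is checked on a finite unrolling of the loop. All function names are hypothetical.

```python
from typing import Sequence


def interval_until(interval: Sequence[str], p: str, q: str) -> bool:
    """Strict interval reading of pUq (an assumption): zero or more p, then exactly one closing q."""
    return (
        len(interval) >= 1
        and interval[-1] == q
        and all(state == p for state in interval[:-1])
    )


def chop(path: Sequence[str], p1: str, q1: str, p2: str, q2: str) -> bool:
    """(p1Uq1); (p2Uq2): some split point yields a prefix and a suffix interval that both fit."""
    return any(
        interval_until(path[:k], p1, q1) and interval_until(path[k:], p2, q2)
        for k in range(1, len(path))
    )


def star(path: Sequence[str], p: str, q: str) -> bool:
    """Finite unrolling of the looping formula: the path is one or more loop bodies p...pq.

    A sequence over {p, q} that ends in q can always be cut after each q,
    so it suffices to check the alphabet and the last state.
    """
    if len(path) == 0 or path[-1] != q:
        return False
    return all(state in (p, q) for state in path)


# The path quoted for Figure 3(a) and a finite unrolling of the one for Figure 3(b).
print(chop(["p1", "q1", "p2", "p2", "q2"], "p1", "q1", "p2", "q2"))
print(star(list("pqpq"), "p", "q"))
```

A model M would satisfy the formula when every one of its paths passes the corresponding check.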
2.4. The Basic Formula in PTL [52]
Definition 4.
Let p1, p2, p3, q1, q2, and q3 be atomic propositions, and let ((p1Uq1), (p2Uq2)) prj (p3∧Xq3) be the basic PTL formula. An arbitrary PTL formula can be obtained by recursively combining the basic PTL formula. A basic PTL formula is interpreted on a path L and a system model M, and its intuitive meaning is given as follows:
((p1Uq1), (p2Uq2)) prj (p3∧Xq3): for each path L in M, the following property holds: (1) the prefix subpath (i.e., the fine-grained prefix interval) satisfies the core LTL formula p1Uq1, (2) the suffix subpath (i.e., the fine-grained suffix interval) satisfies the core LTL formula p2Uq2, and (3) the state sequence consisting of the first state in the fine-grained prefix interval and the first state in the fine-grained suffix interval (i.e., the coarse-grained interval) satisfies the LTL formula p3∧Xq3.
Figure 4 gives a sample model for the basic PTL formula. For ((p1Uq1), (p2Uq2)) prj (p3∧Xq3), any path L in M has three intervals: the fine-grained prefix interval is made up of the blue states, the fine-grained suffix interval is made up of the red states, and the coarse-grained interval is made up of the black states. In fact, the difference between the fine-grained intervals and the coarse-grained interval lies in their different units of elapsed time.
Figure 4
An example of the basic PTL formula ((p1Uq1), (p2Uq2)) prj (p3∧Xq3) and its model.
Given an arbitrary model M, we will provide a new solution in Section 3.3 on how to use a DNA-computing-based method to determine whether the basic PTL formula is satisfied by M or not.
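The three conditions of the basic PTL formula can be sketched classically for one path. This is only an illustration under stated assumptions: each state is modelled as the set of atomic propositions it satisfies (so that one state can satisfy both p1 and p3), an interval is read strictly as p-states closed by one q-state, and the example path is hypothetical rather than taken from Figure 4.

```python
from typing import FrozenSet, Sequence

State = FrozenSet[str]  # the atomic propositions satisfied in a state


def interval_until(interval: Sequence[State], p: str, q: str) -> bool:
    """Strict interval reading of pUq (an assumption): p in all but the last state, q in the last."""
    return (
        len(interval) > 0
        and q in interval[-1]
        and all(p in state for state in interval[:-1])
    )


def basic_prj(path: Sequence[State], p1: str, q1: str, p2: str, q2: str,
              p3: str, q3: str) -> bool:
    """Check ((p1Uq1), (p2Uq2)) prj (p3 and Xq3) on a single path.

    Some split point k must give
      (1) a fine-grained prefix interval path[:k] satisfying p1Uq1,
      (2) a fine-grained suffix interval path[k:] satisfying p2Uq2,
      (3) a coarse-grained interval [path[0], path[k]] satisfying p3 and Xq3.
    """
    for k in range(1, len(path)):
        coarse_ok = p3 in path[0] and q3 in path[k]
        if (coarse_ok
                and interval_until(path[:k], p1, q1)
                and interval_until(path[k:], p2, q2)):
            return True
    return False


# Hypothetical path: the first states of both fine-grained intervals carry p3 and q3.
example = [
    frozenset({"p1", "p3"}),
    frozenset({"q1"}),
    frozenset({"p2", "q3"}),
    frozenset({"q2"}),
]
print(basic_prj(example, "p1", "q1", "p2", "q2", "p3", "q3"))
```

As in the definition, a model M would satisfy the formula when every path of M passes this check.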
2.5. Finite State Automata and Model Checking
Definition 5.
A Finite State Automaton (FSA) is a five-tuple (Σ, Q, T, q0, F), where
Σ is a finite alphabet,
Q is a finite set of states,
T is a finite set of transitions: T : Q × Σ → R(Q),
q0 ∈ Q is an initial state,
F ⊆ Q is a set of acceptance states.
Figure 6 depicts an example of an FSA. This automaton is made up of two states and two transitions. State 0 is an initial state, which is pointed at by an arrow without source, whereas state 1 is an acceptance state, which is marked by a double circle. The automaton will enter state 0 if p is input at state 0, whereas the automaton will enter state 1 if q is input at state 0. The string pq is an acceptance word, since the automaton will transit from an initial state to an acceptance state if pq is input. Similarly, the strings q, ppq, pppq, … are acceptance words too. The acceptance language of an automaton is made up of all of the acceptance words of the automaton. In this example, {q, pq, ppq, pppq, …} is the acceptance language of the automaton illustrated by Figure 6.
Figure 6
An example of an FSA.
The only difference between the automaton in Figure 6 and the one in Figure 7 is that the atomic propositions in the latter automaton are satisfied in the states rather than in the transitions. Therefore, the latter automaton is called a Label FSA (LFSA).
Figure 7
Some examples on LFSA: the systematic models of the experiments in this paper.
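To make Definition 5 concrete, the two-state automaton described for Figure 6 can be simulated classically. The sketch below is a minimal illustration; the class and method names are hypothetical, and the transition function is modelled as a map from (state, symbol) to a set of successor states.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class FSA:
    """A finite state automaton (alphabet implied by the transition keys)."""
    transitions: Dict[Tuple[int, str], Set[int]]  # (state, symbol) -> successor states
    initial: int
    accepting: FrozenSet[int]

    def accepts(self, word: str) -> bool:
        """Return True when some run on `word` ends in an acceptance state."""
        current = {self.initial}
        for symbol in word:
            successors: Set[int] = set()
            for state in current:
                successors |= self.transitions.get((state, symbol), set())
            current = successors
        return bool(current & self.accepting)


# The automaton described for Figure 6: p loops on state 0, q leads to state 1.
figure_6 = FSA(
    transitions={(0, "p"): {0}, (0, "q"): {1}},
    initial=0,
    accepting=frozenset({1}),
)

for word in ["q", "pq", "ppq", "pppq", "p", "qp"]:
    print(word, figure_6.accepts(word))
```

The first four words belong to the acceptance language {q, pq, ppq, pppq, …}; the last two do not, because p never reaches state 1 and no transition leaves state 1.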
In classical computation, the principles of the algorithms for temporal logic model checking can be illustrated by Figure 5. An LFSA, denoted as B1, is used to describe the behaviors of a system, whereas an FSA, denoted as B2, is employed to construct a model of a temporal logic formula. The model checking algorithm decides that the system meets the property specified by the formula if certain inclusion relations hold between the acceptance languages of the two automata.
Figure 5
Principle of the model checking algorithms based on classic computing.
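The inclusion test behind this principle can be sketched as follows. The text does not spell out which inclusion relation is used, so this sketch assumes the common one, namely that every word accepted by the system automaton B1 is also accepted by the formula automaton B2. It further assumes that both automata are deterministic and are given as plain (transitions, initial state, acceptance states) triples; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Delta = Dict[Tuple[Hashable, str], Hashable]  # (state, symbol) -> next state
Automaton = Tuple[Delta, Hashable, Set[Hashable]]


def language_included(b1: Automaton, b2: Automaton, alphabet: Iterable[str]) -> bool:
    """Return True when every word accepted by b1 is also accepted by b2.

    Both automata are assumed deterministic. The search explores reachable
    pairs (state of b1, state of b2); None stands for the dead state that b2
    enters when it has no matching transition.
    """
    delta1, init1, acc1 = b1
    delta2, init2, acc2 = b2
    symbols = list(alphabet)
    start = (init1, init2)
    seen = {start}
    queue = deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if s1 in acc1 and s2 not in acc2:
            return False  # a word accepted by b1 but not by b2 reaches this pair
        for symbol in symbols:
            if (s1, symbol) not in delta1:
                continue  # b1 cannot read this symbol here
            pair = (delta1[(s1, symbol)], delta2.get((s2, symbol)))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# b2 is the automaton of Figure 6; the two b1 automata are hypothetical systems.
formula = ({(0, "p"): 0, (0, "q"): 1}, 0, {1})
system_ok = ({(0, "p"): 1, (1, "q"): 2}, 0, {2})   # accepts only "pq"
system_bad = ({(0, "p"): 1, (1, "p"): 2}, 0, {2})  # accepts only "pp"

print(language_included(system_ok, formula, "pq"))
print(language_included(system_bad, formula, "pq"))
```

The number of explored pairs is bounded by the product of the two state counts, which is one reason classical checking cost grows with the sizes of both automata.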
2.6. Sticker Automata and DNA Model Checking
2.6.1. Sticker Automata
As a model of DNA computing, a sticker automaton can realize an FSA. Given a DNA strand characterizing an input string and an FSA, the sticker automaton can determine whether or not the string is accepted by the FSA.
M = (Σ, S, T, s0, F) is an FSA, and every character a in the alphabet Σ can be encoded as C(a). One way of the DNA encoding is as follows [56]:
An input string a1,…, a in Σ can be encoded with the single-stranded DNA molecule: 5′ I1 X0 ⋯ X C(a1) ⋯ X0 ⋯ X C(a) X0 ⋯ X I2 3′, where I1 is an initiator sequence, X0 ⋯ X is a spacer sequence separating C(a), and I2 is a terminator sequence.
A transition T(s, a) = s is encoded as 3′ 5′, where means the Watson-Crick complement (WC for short) of a nucleotide X and means the WC of the DNA strand characterizing a.
An initial state s is encoded as 3′ 5′.
An acceptance state s is encoded as 3′ 5′.
The computational process of sticker automata can be summarized in the following three steps [56].
Step 1 (data preprocessing).
(1) Synthesize the DNA strands characterizing an automaton and its input strings.
(2) Put all the DNA strands into the test tube T, and anneal to make sure that the strands and their WC complements can be hybridized completely. The process of base pairing and the placement of ligase can form complete or partial double-stranded DNA molecules.
Step 2 (computation).
After Step 1, there are two possible cases. If the input string is accepted by the automaton, the tube T contains only complete double-stranded DNA molecules, which begin with an initiator sequence and terminate at a terminator sequence. Otherwise, there are partial double-stranded or single-stranded DNA molecules in T. In this second case, some fragments of the single-stranded DNA molecules which characterize the input strings are paired successfully with single-stranded DNA molecules which characterize transitions, whereas other fragments of the input strands cannot be paired with any transition strand. Therefore, mung bean nuclease, an enzyme that degrades single-stranded DNA, is added to the test tube T to degrade the single-stranded DNA fragments and retain the double-stranded DNA molecules.
Step 3 (output of results).
The DNA molecules with different lengths can be separated using the electrophoretic technique. If DNA molecules of a variety of lengths exist, this indicates that there were some partial double-stranded DNA molecules in T before the nuclease was added, and the input string cannot be accepted by the automaton. Otherwise, T contained only complete double-stranded DNA molecules before the nuclease was added, and the input string can be accepted by the automaton.
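The logic of the three steps can be mimicked with a purely symbolic sketch. It is not a biochemical simulation and does not reproduce the encoding of [56]: it assumes a deterministic automaton and abstracts the input strand into blocks (one initiator block, one block per input symbol, one terminator block), with each sticker pairing to exactly one block. The function names and the block abstraction are assumptions made only for illustration.

```python
from typing import Dict, Set, Tuple

Transitions = Dict[Tuple[int, str], int]  # (state, symbol) -> next state


def duplex_length(word: str, transitions: Transitions, initial: int,
                  accepting: Set[int]) -> int:
    """Steps 1 and 2: count the blocks that are double-stranded after annealing.

    Blocks left single-stranded are the ones the nuclease would degrade.
    """
    state = initial
    covered = 1  # the initial-state sticker pairs with the initiator block
    for symbol in word:
        if (state, symbol) not in transitions:
            return covered  # no transition sticker fits; the rest stays single-stranded
        state = transitions[(state, symbol)]
        covered += 1
    if state in accepting:
        covered += 1  # the acceptance-state sticker pairs with the terminator block
    return covered


def accepted(word: str, transitions: Transitions, initial: int,
             accepting: Set[int]) -> bool:
    """Step 3: the word is accepted when the duplex spans the full strand length."""
    full_length = len(word) + 2  # initiator + symbols + terminator
    return duplex_length(word, transitions, initial, accepting) == full_length


# The automaton described for Figure 6, written deterministically.
figure_6 = {(0, "p"): 0, (0, "q"): 1}

for word in ["ppq", "pqp", "pp"]:
    print(word, duplex_length(word, figure_6, 0, {1}), accepted(word, figure_6, 0, {1}))
```

In this abstraction, a rejected word yields a duplex shorter than the full strand, which corresponds to the mixture of fragment lengths that electrophoresis would reveal in Step 3.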
2.6.2. DNA Model Checking
On the basis of sticker automata, a DNA-computing-based LTL model checking method has been presented [55]. It can be denoted as the algorithm TL-MC-DNA(DNACODE(A), x), whose two inputs are DNACODE(A) and x. Here, A is an FSA expressing a run of a system, DNACODE(A) is a sticker-automaton encoding that characterizes A, A(f) is an FSA model of a formula f, and x = DNACODE(A(f)) is a sticker-automaton encoding that characterizes A(f). The scope of f includes all the basic LTL formulas and some popular LTL formulas (the f formulas) [55]. The output of the algorithm is yes or no, representing the result of the model checking. The principle of this algorithm is illustrated by Figure 8.
Figure 8
Principle of the model checking algorithms based on DNA computing [55].
2.7. The Four FSAs of the Formulas and Their DNA Model Checking
Given a temporal logic formula, an FSA model can be computed [43, 50, 52]. Figure 9 gives the four FSA models for four specific temporal logic formulas. The correspondence is shown in Table 2, where φ2 and φ3 are the basic ITL formulas and φ4 is the basic PTL formula. In addition, ┐ is logical negation, Ū is the logical dual of U, and φ1 describes the property: for each path in M, there exists at least one state which does not satisfy p, before q is satisfied.
Figure 9
The four FSA models of the four formulas.
Table 2
Relationships: the four formulas of temporal logic and their FSA models, where Ū is the logical dual of U.
The formula | FSA of formula
φ1 = ┐pŪ┐q | A1
φ2 = (p1Uq1); (p2Uq2) | A2
φ3 = (pUq)∗ | A3
φ4 = ((p1Uq1), (p2Uq2)) prj (p3∧Xq3) | A4
3. The DNA Model Checking Method
As mentioned in Section 2.6.2, if the sticker-automaton encoding of an FSA of a system and the sticker-automaton encoding of an FSA of a formula are input into the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f))) [55], the algorithm computes and returns the model checking result. The effectiveness of the algorithm TL-MC-DNA for the f formulas has been confirmed by simulated biological experiments [55]. This paper expands the range of the formula f and provides a series of new encodings of sticker automata, which will be explained in detail in Section 4. The DNA model checking for the four temporal logic formulas in Table 2 is performed by running the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f′′))), where f′′ ∈ {φ1, φ2, φ3, φ4} (the f′′ formulas). Section 4 studies the effectiveness of the new algorithm for the f′′ formulas by a number of simulated biological experiments. It should be noted that the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f))) comes from previous research [55]. Due to space limitations, its pseudocode is not given in this paper.
3.1. The DNA Model Checking for the Basic CTL Formulas
There are eight basic CTL formulas. Due to the different semantics, the DNA model checking algorithms are different too.
3.1.1. The DNA Model Checking for the Four Universal Formulas
Four basic CTL formulas, ApUq, AFp, AGp, and AXp, are called the universal formulas since their semantics all quantify over “all paths.” Comparing the CTL formula ApUq with the LTL formula pUq, it can be clearly seen that these two formulas have the same semantics. Therefore, the algorithm TL-MC-DNA(DNACODE(A), x) [55] can be employed to check the CTL formulas ApUq, AFp, AGp, and AXp. The detailed algorithm is formulated as shown in Algorithm 1.
Algorithm 1
CTLQ-MC-DNA(DNACODE(A), DNACODE(A(f))), the DNA model checking algorithm for the universal CTL formulas.
3.1.2. The DNA Model Checking for the Four Existence Formulas
The remaining four basic CTL formulas, EpUq, EFp, EGp, and EXp, are called the existence formulas since their semantics all involve “there exists at least one path.” Each of these four existence formulas is related to one of the universal formulas, as summarized in Table 3.
Table 3
The relationships between the existence and the universal CTL formulas.
Existence formulas | Universal formulas | Relationships
EpUq | ApUq | ┐EpUq = A┐pŪ┐q; EpUq = ┐A┐pŪ┐q
EGp | AFp | ┐EGp = AF┐p; EGp = ┐AF┐p
EFp | AGp | ┐EFp = AG┐p; EFp = ┐AG┐p
EXp | AXp | ┐EXp = AX┐p; EXp = ┐AX┐p
Comparing the CTL formula A┐pŪ┐q with the formula φ1 = ┐pŪ┐q, it can be observed that these two formulas have the same semantics. Thus, ┐φ1 = EpUq. Therefore, the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f′′ = φ1))) can be used to check the CTL formula EpUq. Similarly, the algorithm TL-MC-DNA(DNACODE(A), x) can be employed to check the CTL formulas EFp, EGp, and EXp. The detailed algorithm is formulated as shown in Algorithm 2. It should be noted that when the negation of an atomic proposition occurs in the algorithm and is passed as its argument, only one new atomic proposition is needed in the design of the DNA encoding. No modification is needed to the algorithm, the FSA structure, or the encoding scheme of sticker automata.
Algorithm 2
CTLC-MC-DNA(DNACODE(A), DNACODE(A(f))), the DNA model checking algorithm for the existence CTL formulas.
3.1.3. The DNA Model Checking for the Basic CTL Formulas
The principle of this algorithm is as follows. (1) If a basic CTL formula is a universal formula, Algorithm 1 is called. (2) If a basic CTL formula is an existence formula, Algorithm 2 is called. In this way, model checking of the basic CTL formulas can be conducted. The algorithm is formulated as shown in Algorithm 3.
Algorithm 3
CTL-MC-DNA(DNACODE(A), DNACODE(A(fCTL))), the DNA model checking algorithm for the basic CTL formulas.
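The dispatch described for Algorithms 1 to 3 can be sketched classically. The following is not the pseudocode of those algorithms, which is not reproduced here: TL-MC-DNA is replaced by a hypothetical classical stand-in that checks a path formula on every finite path of a model, and an existence formula is reduced to its universal counterpart through the relationships of Table 3. The operator name "R" denotes the dual of U, and paths are assumed to contain at least two states.

```python
from typing import Callable, Optional, Sequence

Pred = Callable[[str], bool]  # an atomic proposition or its negation


def atom(name: str) -> Pred:
    return lambda state: state == name


def neg(pred: Pred) -> Pred:
    return lambda state: not pred(state)


def holds_on_path(path: Sequence[str], op: str, a: Pred,
                  b: Optional[Pred] = None) -> bool:
    """Evaluate one path formula on a finite path (a is the left or only argument)."""
    if op == "U":  # a U b
        for state in path:
            if b(state):
                return True
            if not a(state):
                return False
        return False
    if op == "R":  # dual of U: b holds up to and including a state where a holds, or throughout
        for state in path:
            if not b(state):
                return False
            if a(state):
                return True
        return True
    if op == "F":
        return any(a(state) for state in path)
    if op == "G":
        return all(a(state) for state in path)
    if op == "X":
        return len(path) > 1 and a(path[1])
    raise ValueError(f"unknown operator: {op}")


def tl_mc_stand_in(paths, op: str, a: Pred, b: Optional[Pred] = None) -> bool:
    """Hypothetical classical stand-in for TL-MC-DNA: every path satisfies the formula."""
    return all(holds_on_path(path, op, a, b) for path in paths)


DUAL = {"U": "R", "F": "G", "G": "F", "X": "X"}


def ctl_mc(paths, quantifier: str, op: str, a: Pred, b: Optional[Pred] = None) -> bool:
    """Dispatch in the spirit of Algorithm 3."""
    if quantifier == "A":  # universal formula: check it directly
        return tl_mc_stand_in(paths, op, a, b)
    # existence formula: check the dual universal formula on negated propositions, then negate
    negated_b = neg(b) if b is not None else None
    return not tl_mc_stand_in(paths, DUAL[op], neg(a), negated_b)


fig_1a = ["ppr", "ppq", "prr", "prr"]  # path set quoted for Figure 1(a)
print(ctl_mc(fig_1a, "E", "U", atom("p"), atom("q")))
print(ctl_mc(fig_1a, "A", "U", atom("p"), atom("q")))
```

The negated propositions are passed as ordinary predicates, which mirrors the remark that a negated atomic proposition only requires one new atomic proposition in the encoding.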
3.1.4. Complexity Analysis
The time complexity of the algorithm TL-MC-DNA is O(m + n) [55], where m is the number of nodes in an automaton and n is the number of edges in this automaton. Therefore, Algorithm 1 needs to execute O(m + n) + O(3) = O(m + n) operations. Similarly, Algorithm 2 needs to execute O(m + n) + O(3) = O(m + n) operations. Algorithm 3 calls Algorithm 1 or Algorithm 2, so the complexity of Algorithm 3 is also O(m + n). In comparison, model checking of the basic CTL formulas based on classical computing has quadratic complexity.
Regarding efficiency, a computational process advances sequentially in classical model checking based on electronic computing. In DNA model checking, the process is different: a large number of molecules execute computations at the same time, in parallel. Because massive numbers of computational units (i.e., molecules) are involved in the computation, the efficiency of the algorithm is improved. In contrast, classical model checking requires fewer computational units but more computational steps. In short, compared with classical computing, DNA computing gains time at the cost of space. Thus, the two kinds of computing approaches are complementary.
3.2. The DNA Model Checking for the Basic ITL Formulas
3.2.1. The DNA Model Checking for the Basic ITL Formulas
There are two basic ITL formulas. The basic ITL formula (p1Uq1); (p2Uq2) can be model checked with DNA by calling the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f′′ = φ2))). The basic ITL formula (pUq)∗ can be model checked with DNA by calling the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f′′ = φ3))). The algorithm is formulated as shown in Algorithm 4.
Algorithm 4
ITL-MC-DNA(DNACODE(A), DNACODE(A(fITL))), the DNA model checking algorithm for the basic ITL formulas.
3.2.2. Complexity Analysis
Algorithm 4 calls the algorithm TL-MC-DNA, which has a complexity of O(m + n) [55]. Therefore, the complexity of Algorithm 4 is O(m + n). In comparison, the model checking of the basic ITL formulas based on classical computing has an exponential complexity.
3.3. The DNA Model Checking for the Basic PTL Formula
3.3.1. The DNA Model Checking for the Basic PTL Formula
The DNA model checking for the basic PTL formula ((p1Uq1), (p2Uq2)) prj (p3∧Xq3) can be performed by calling the algorithm TL-MC-DNA(DNACODE(A), DNACODE(A(f′′ = φ4))). The algorithm is formulated as shown in Algorithm 5.
Algorithm 5
PTL-MC-DNA(DNACODE(A), DNACODE(A(fPTL))), the DNA model checking algorithm for the basic PTL formulas.
3.3.2. Complexity Analysis
Algorithm 5 calls the algorithm TL-MC-DNA which has a complexity of O(m + n) [55]. Thus, the complexity of Algorithm 5 is O(m + n). In comparison, the model checking of the basic PTL formula based on classical computing has an exponential complexity.
4. Simulated Experiments
The core component of our new approaches is the TL-MC-DNA algorithm, which is called by all the new methods. We have implemented this algorithm on the general model of sticker automata, with a simulation platform called NUPACK [57]. It has been confirmed that (1) for the nine FSAs of the nine specific temporal logic formulas, the algorithm TL-MC-DNA can be realized effectively in molecular biology; (2) for the above FSAs, one can design an appropriate encoding of their sticker automata, so that the accuracy rate of base pairing exceeds 99% [55]. For the four FSAs of the formulas presented in Section 2.7, it is important to implement the TL-MC-DNA algorithm effectively in molecular biology. In particular, the biological effectiveness of Algorithms 1 to 5 depends on this. Therefore, the same experimental platform and experimental means as those in [55] are employed to carry out the simulated molecular biological experiments.
The design of the DNA encoding determines the success of the experiment. In order to ensure the specificity of hybridization, an encoding sequence must satisfy some physical constraints and thermodynamic constraints [58]. In this paper, only the thermodynamic constraints, including the thermal denaturation temperature and the free energy, are studied, because the problem is limited by the physical constraints [55]. NUPACK can be employed to design the DNA encodings for sticker automata, and this tool can simulate the hybridization phenomena which originate from the running of the TL-MC-DNA algorithm. This experimental approach has been validated in [55].
Experimental Procedure. (1) According to Figures 7 and 9, one can design the encoding of the sticker automata for the systematic FSAs shown in each subgraph, as well as the encoding of the sticker automata for the FSAs of the formulas shown in each subgraph; (2) for the FSAs mentioned above, one can simulate the process of hybridization between the single-stranded DNA molecules; (3) according to the five algorithms proposed in this paper, one can obtain the model checking results for the various temporal logic formulas by reading the results of hybridization.
Experimental Objective. The objective is to test the correctness, effectiveness, and biological reliability of the new algorithms.
4.1. Simulated Experiments for φ1
4.1.1. Encoding Designs
The DNA encoding via NUPACK is designed, as illustrated in Table 4. Figures 10, 11, and 12 show the thermodynamic analysis of the encoding sequence at 10°C. As shown in Figure 10, the Normalized Ensemble Defect (NED) means the incorrect matching ratio of the nucleotides when a biochemical reaction is in equilibrium. 0% implies an optimal design, whereas 100% implies the worst design. The NED of our coding sequence is 0.1%.
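To make the NED figure concrete, the following sketch computes a normalized ensemble defect from a table of equilibrium pairing probabilities, following the usual definition (one minus the average probability that each nucleotide is in its intended paired or unpaired state). The function name, the data layout, and the four-base toy duplex are assumptions made for illustration; the probabilities are not taken from the NUPACK output behind Figure 10.

```python
def normalized_ensemble_defect(pair_probs, target_partner):
    """Average fraction of incorrectly paired nucleotides at equilibrium.

    pair_probs[i][j] -- probability that base i pairs with base j;
                        column n (the last one) is the probability
                        that base i is unpaired.
    target_partner[i] -- index of the intended partner of base i,
                         or None if base i should stay unpaired.
    """
    n = len(target_partner)
    correct = 0.0
    for i, partner in enumerate(target_partner):
        j = n if partner is None else partner
        correct += pair_probs[i][j]
    return 1.0 - correct / n


# Toy duplex (illustrative values only): bases 0 and 1 are meant to
# pair with bases 3 and 2, each with probability 0.999.
target = [3, 2, 1, 0]
probs = [[0.0] * 5 for _ in range(4)]
for i, j in enumerate(target):
    probs[i][j] = 0.999   # paired as designed
    probs[i][4] = 0.001   # otherwise unpaired

ned = normalized_ensemble_defect(probs, target)
```

A value of 0 corresponds to the optimal design mentioned above and a value of 1 to the worst one.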
Table 4
Checking for φ1: the designed encoding sequence, where WC means Watson-Crick complementary strand of code.
Checking the formula φ1: the structural properties of encoding sequence.
Figure 11
Thermodynamic analysis for φ1: minimum free energy structure.
Figure 12
Checking for φ1: pairing probability in equilibrium.
The principle of minimum free energy states that the free energy is minimized when a biochemical reaction is in equilibrium. As shown in Figure 11, the color of the match between the two kinds of molecules is dark red. The probability that the double-stranded molecule is completely matched almost reaches 100%. We find this by comparing the color with the vertical bar that indicates the equilibrium probability. Thus, its free energy is approximately equal to the minimum free energy.
As shown in Figure 12, the position of the red line indicates that all bases in the two single strands are completely complementary to each other, and the color of the red line indicates that the probability of all the pairs is approximately equal to 1. As analyzed above, our DNA sequence satisfies the minimum free energy constraint, and the DNA molecules that participate in the reaction have basically consistent melting temperatures. Therefore, the experimental results obtained from this encoding are biologically reliable and effective.
In fact, Table 4 indicates the encoding rules for the input strings, as shown in Table 5. According to Table 5 and the encoding principle of sticker automata, we can deduce the encoding of the sticker automaton characterizing φ1, as shown in Table 6.
Table 5
Checking for φ1: the encoding rules of input strings characterizing runs, encoding by the way of sticker automata.
With the DNA code given in Section 4.1.1 at hand, we can conduct our simulated experiments. It should be noted that, in Section 4.1.2, all the encoding of the DNA molecules is written from left to right in the 5′-3′ direction, which is consistent with the notation used in NUPACK.
We will check whether or not the systematic FSA M1 satisfies the formula φ1. According to the DNA codes given in Section 4.1.1, we can get all the paths which come from the systematic runs, as shown in Table 7, where k is a natural number. The transition rules shown in Table 6 clearly indicate that no atomic propositions except s, u, and q take part in the state transitions. Therefore, we do not need to consider whether or not the states satisfy the atomic propositions p and r.
Table 7
The runs of the system M1.
Path
DNA code of the path or sequence of nodes (atomic propositions) crossed by the path
First, we will check path 1. There are two possible runs in this path. Without loss of generality, we suppose that the atomic proposition sequence which is crossed by the run is suq.
All the molecules expressing the runs begin with GCCAGAA and end with GGCCGTC. Thus, we only need to consider d = TTGCAAGGCAGCGAATTGCAAGGCGCGGAATTGCAAGGCCCCGAATTGCAA. In short, we will observe whether or not hybridization occurs between the DNA molecules expressing transitions and the molecule d. For this experiment, the following six kinds of molecules are poured into a container with a volume of $10^{-15}$ L: d, t0s0, t0u1, t0s2, t1s1, and t1q2, for observing the hybridization.
The systematic run, which is expressed by the molecule d, crosses three states. If hybridization occurs between the DNA molecules expressing transitions and the molecule d, at most three kinds of molecules, each the WC complement of some segment of d, are involved in the specific hybridization. There are ten ways to select three kinds of molecules from the five kinds of WC molecules. Thus, the following ten groups of subexperiments are performed.
(1) Group 1: t0s0, t0u1, t1q2, and d. The concentrations of the four kinds of molecules are all 100 μM, and their molecular numbers are all 60000. With the temperature naturally dropping to 10°C, the hybridization reaction is observed. Figures 13(a) and 13(b) show the result of the hybridization, where strand1, strand2, strand3, and strand4 mean d, t0s0, t0u1, and t1q2, respectively.
Figure 13
Checking for φ1: the groups of subexperimental results on base pairing and hybridization.
In Figure 13(b), the coordinates of the first red line from top to bottom indicate that the base sequence of the molecule d from the 1st to the 15th sites in the 5′-3′ direction is paired with all of the fifteen bases of the molecule t0s0 in the 3′-5′ direction. The coordinates of the second red line indicate that the base sequence of d from the 16th to the 33rd sites in the 5′-3′ direction is paired with all of the eighteen bases of the molecule t0u1 in the 3′-5′ direction. The coordinates of the third red line indicate that the base sequence of d from the 34th to the 51st sites in the 5′-3′ direction is paired with all of the eighteen bases of the molecule t1q2 in the 3′-5′ direction. The results suggest that complete double-stranded DNA molecules are formed, and the hybridization among the four kinds of single-stranded DNA molecules is specific.
Comparing the color of the three red lines with the color scale of the vertical bar on the right side of Figure 13(b), it can be clearly seen that the former colors are very close to the color at the top of the vertical bar. This suggests that the probabilities of these base pairs are close to 100%, which is a high degree of specificity.
As shown in Figure 13(a), the concentration of the molecule strand1-strand2-strand3-strand4 is 100 μM, and the concentrations of the molecules t0s0, t0u1, t1q2, and d are approximately equal to 0 after their hybridization. This indicates that all of the molecular reactants are involved in the specific hybridization, since 100 μM/100 μM = 100%. Therefore, both the false negative rate and the false positive rate are approximately 0, and the true positive rate is approximately 100%. In short, the results show that the four kinds of molecules are hybridized with strong specificity.
(2) Group 2: t0s0, t0u1, t0s2, and d. The concentrations of the four kinds of molecules are all 100 μM, and their molecular numbers are all 60000. As the temperature naturally drops to 10°C, the hybridization reaction is observed. Figure 13(c) shows the result of the hybridization, where strand1, strand2, strand3, and strand4 mean d, t0s0, t0u1, and t0s2, respectively. There exists a red dot in the segment of strand1 of the vertical thin bar on the right side of strand4, indicating that some bases of strand1 are not paired with others. The results suggest that the four kinds of molecules in group 2 do not form complete double strands.
For all the other groups, the biochemical conditions and processes are the same as for the groups above.
(3) Group 3: t0s0, t0u1, t1s1, and d. Figure 13(d) shows the result. There exist some red dots in the segment of strand1 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
(4) Group 4: t0s0, t0s2, t1s1, and d. Figure 13(e) shows the results. No red line is found at the 5′ end of strand1, indicating that the 5′ end of strand1 is not paired with any molecule. This suggests that the four kinds of molecules do not form complete double strands.
(5) Group 5: t0s0, t0s2, t1q2, and d. The results are shown in Figure 13(f). There exist some red dots in the segments of strand3 and strand4 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
(6) Group 6: t0s0, t1s1, t1q2, and d. As shown in Figure 13(g), no red line is found at the 5′ end of strand1, suggesting that the four kinds of molecules do not form complete double strands.
(7) Group 7: t0u1, t0s2, t1s1, and d. As shown in Figure 13(h), there are some red dots in the segment of strand1 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
(8) Group 8: t0u1, t0s2, t1q2, and d. As shown in Figure 13(i), there are some red dots in the segments of strand2 and strand3 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
(9) Group 9: t0u1, t1s1, t1q2, and d. As shown in Figure 13(j), there exists a red dot in the segment of strand1 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
(10) Group 10: t0s2, t1s1, t1q2, and d. As shown in Figure 13(k), there exist some red dots in the segments of strand2 and strand3 of the vertical thin bar on the right side of strand4, suggesting that the four kinds of molecules do not form complete double strands.
According to the ten groups of subexperiments above, only group 1 (i.e., t0s0, t0u1, t1q2, and d) can form complete double strands by the hybridization reaction. That is to say, the systematic run suq satisfies the formula φ1, since the first state does not satisfy q, the second state satisfies neither p nor q, and the third state satisfies q.
The above results are obtained when k = 1. It has been proved that a system satisfies the formula pUq if and only if all the runs whose lengths are less than $|V| \cdot 2^{|V|-1} + |E|$ satisfy pUq, where $|V|$ and $|E|$ are the number of nodes and the number of edges in the systematic FSA, respectively [55]. Similarly, we can prove that this conclusion holds for φ1. M1 has three nodes and three edges. Thus, we need to check fifteen paths, since $3 \cdot 2^{3-1} + 3 = 15$. In the same experimental way, we have checked the kth path, as shown in Table 8. M1 satisfies the formula φ1 since all paths (i.e., runs) satisfy this formula.
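The bookkeeping behind the subexperiments above (how many molecules a 100 μM solution holds in $10^{-15}$ L, and how many three-strand groups can be drawn from the five WC molecules) can be checked with a short script. This is an illustrative sketch; the helper name `molecule_count` is an assumption, and the strand names are used only as labels.

```python
from itertools import combinations
from math import comb

AVOGADRO = 6.02214076e23  # molecules per mole


def molecule_count(concentration_molar: float, volume_litre: float) -> float:
    """Number of molecules in a solution of given molarity and volume."""
    return concentration_molar * volume_litre * AVOGADRO


# 100 uM in a 1e-15 L container: roughly the 60000 molecules quoted above.
n_molecules = molecule_count(100e-6, 1e-15)

# The five WC molecules (labels only) and every way to pick three of them.
wc_molecules = ["t0s0", "t0u1", "t0s2", "t1s1", "t1q2"]
groups = list(combinations(wc_molecules, 3))

# Group 1 of the text is one of the enumerated groups.
group_1 = ("t0s0", "t0u1", "t1q2")
```

The enumeration yields the same ten groups as the list above, although `combinations` emits them in a different order (group 1 of the text appears third).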
Table 8
The results: checking for φ1 in the different paths of M1 (whether or not the path satisfies φ1).
Formula | Path 1 | Path k, where 15 > k > 1 | Path 15 | Does M1 satisfy φ1?
φ1 | Yes | Yes | Yes | Yes
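The acceptance criterion used in the subexperiments, namely that the transition strands pair with d end to end, can be illustrated with an idealized exact-match check. This sketch ignores thermodynamics entirely, which NUPACK does not, and the three strands are derived here as Watson-Crick complements of sites 1-15, 16-33, and 34-51 of d, as described for Figure 13(b), rather than copied from Table 4. The function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick partner of seq, written 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_complete_duplex(template: str, strands: list) -> bool:
    """True if the strands, taken in order, pair with the template
    segment by segment and leave no template base unpaired."""
    pos = 0
    for strand in strands:
        segment = template[pos:pos + len(strand)]
        if reverse_complement(strand) != segment:
            return False
        pos += len(strand)
    return pos == len(template)


# The run molecule d for phi1 (51 bases), taken from the text.
D = "TTGCAAGGCAGCGAATTGCAAGGCGCGGAATTGCAAGGCCCCGAATTGCAA"

# Strands derived from the pairing sites reported for Figure 13(b).
t0s0 = reverse_complement(D[0:15])
t0u1 = reverse_complement(D[15:33])
t1q2 = reverse_complement(D[33:51])

accepted = forms_complete_duplex(D, [t0s0, t0u1, t1q2])
```

Dropping a strand or supplying strands that do not match the next segment makes the check fail, which mirrors the incomplete double strands reported for groups 2 to 10.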
By calling the procedure for checking φ1, Algorithm 2 can get the model checking results on the formula EpUq. The model checking results on the eight basic CTL formulas are shown in Table 9. According to the experimental processes and results in Section 4.1, we can safely say that Algorithm 3, which can be employed to check the basic CTL formulas, has been effectively implemented in molecular biology.
Table 9
The model checking results: M1 and the basic CTL formulas (whether or not the system M1 satisfies these formulas).
Formula | Result | The used algorithm and decision basis
ApUq | No | TL-MC-DNA determines that M1 does not satisfy pUq, and thus Algorithm 1 determines that M1 does not satisfy ApUq
AFp | Yes | TL-MC-DNA determines that M1 satisfies Fp, and thus Algorithm 1 determines that M1 satisfies AFp
AGp | No | TL-MC-DNA determines that M1 does not satisfy Gp, and thus Algorithm 1 determines that M1 does not satisfy AGp
AXp | No | TL-MC-DNA determines that M1 does not satisfy Xp, and thus Algorithm 1 determines that M1 does not satisfy AXp
EpUq | No | Extended TL-MC-DNA determines that M1 satisfies φ1, and thus Algorithm 2 determines that M1 does not satisfy EpUq
EFp | Yes | TL-MC-DNA determines that M1 does not satisfy G¬p, and thus Algorithm 2 determines that M1 satisfies EFp
EGp | No | TL-MC-DNA determines that M1 satisfies F¬p, and thus Algorithm 2 determines that M1 does not satisfy EGp
EXp | No | TL-MC-DNA determines that M1 satisfies X¬p, and thus Algorithm 2 determines that M1 does not satisfy EXp
4.2. Simulated Experiments for φ2 and φ3
4.2.1. Encoding Designs
The formula φ2 and the formula φ3 need to be encoded with the same coding scheme since both formulas are ITL ones. Therefore, we combine the FSAs of these two formulas into one, as shown in Figure 14. Our design of a DNA encoding via NUPACK is shown in Table 10, while Figures 15, 16, and 17 show the thermodynamic analysis of the encoding sequence presented in Table 10 at 10°C. The NED of our coding sequence is 0.1%, which is illustrated in Figure 15. Its free energy is approximately equal to the minimum free energy, as shown in Figure 16. All bases in the two single strands are completely complementary to each other as shown in Figure 17, and the probabilities of all the pairs are approximately equal to 1. In Table 11, the encoding rules for the input strings are provided while Table 12 shows the encoding of the sticker automaton characterizing φ2 and φ3.
Figure 14
FSA of formula: merged graph A5 of A2 and A3.
Table 10
Checking for φ2 and φ3: the designed encoding sequence.
With the DNA code given in Section 4.2.1 at hand, we can conduct our simulated experiments. It should be noted that, in Section 4.2.2, all the encoding of the DNA molecules is written from left to right in the 5′-3′ direction, which is consistent with the notation used in NUPACK.
(1) Model Checking: Whether the Systematic FSA M2 Satisfies φ2 or Not. With our DNA codes, all the paths of M2 are shown in Table 13, where k is a natural number. By observing the transition rules which are related to φ2 and shown in Table 12, it can be seen that no atomic propositions except p1, q1, p2, and q2 take part in the state transitions. Therefore, we do not need to consider whether or not the states satisfy other atomic propositions.
Table 13
The runs of the system M2.
Path
DNA code of the path or sequence of nodes (atomic propositions) crossed by the path
Code of path 1
CGCT CGAATCGGAATG GAT CGAATCGGAATG GAT | ATA CGAATCGGAATG GAA CGAATCGGAATG TTC CGAATCGGAATG CGGC
Sequence of nodes crossed by path 1
0,1, 2,3 (p1, p1 | q1, p2, q2)
Code of path k
CGCT CGAATCGGAATG (GAT CGAATCGGAATG GAT | ATA CGAATCGGAATG)^k GAA CGAATCGGAATG TTC CGAATCGGAATG CGGC
Sequence of nodes crossed by path k
(0,1)^k, 2,3
First, we will check path 1. There are two possible runs in this path. Without loss of generality, we suppose that the atomic proposition sequence which is crossed by the run is p1q1p2q2. We only need to deal with d = ATCGGAATGGATCGAATCGGAATGATACGAATCGGAATGGAACGAATCGGAATGTTCCGAATCGGA. In short, we will observe whether or not hybridization occurs between the DNA molecules expressing transitions and the molecule d. To this end, we pour the following five kinds of molecules into a container with a volume of $10^{-15}$ L: d, t0p10, t0q11, t1p21, and t1q22, for observing the hybridization.
The concentrations of the five kinds of molecules are all 100 μM, and their molecular numbers are all 60000. With the temperature naturally dropping to 10°C, the hybridization reaction is observed. Figure 18 shows the result of the hybridization, where strand1, strand2, strand3, strand4, and strand5 mean d, t0p10, t0q11, t1p21, and t1q22, respectively.
Figure 18
Checking for φ2: the experimental results on base pairing and hybridization.
In Figure 18(b), the coordinates of the four red lines from top to bottom indicate that complete double-stranded DNA molecules are formed by the hybridization among the five kinds of single-stranded DNA molecules. Comparing the color of the four red lines with the color scale of the vertical bar on the right side of Figure 18(b), we can see clearly that the probabilities of these base pairs are close to 100%, which is a high degree of specificity. As shown in Figure 18(a), the concentration of the molecules indicates that the true positive rate is approximately equal to 100%. Once again, this suggests that the five kinds of molecules are hybridized with strong specificity. Thus, the systematic run p1q1p2q2 satisfies the formula φ2.
The above results are obtained when k = 1. It has been proved that a system satisfies the formula pUq if and only if all the runs whose lengths are less than $|V| \cdot 2^{|V|-1} + |E|$ satisfy pUq, where $|V|$ and $|E|$ are the number of nodes and the number of edges in the systematic FSA, respectively [55]. A system satisfies the formula φ2 if and only if all the runs whose lengths are less than $(|V_1| \cdot 2^{|V_1|-1} + |E_1|) + (|V_2| \cdot 2^{|V_2|-1} + |E_2|)$ satisfy φ2, since φ2 is composed of two pUq-like formulas in sequence, where $|V_1|$ and $|E_1|$ are the number of nodes and the number of edges in the prefix interval of the systematic FSA, respectively, and $|V_2|$ and $|E_2|$ are the number of nodes and the number of edges in the suffix interval of the systematic FSA, respectively. For M2, $|V_1| = 2$, $|E_1| = 2$, $|V_2| = 2$, and $|E_2| = 1$. Thus, we need to check eleven paths, since $2 \cdot 2^{2-1} + 2 + 2 \cdot 2^{2-1} + 1 = 11$, as shown in Table 14. M2 satisfies the formula φ2 since all paths (i.e., runs) satisfy this formula. By calling the procedure for checking φ2, Algorithm 4 can get the model checking results on this basic ITL formula.
Table 14
The results: checking for φ2 in the different paths of M2 (whether or not the path satisfies φ2).
Formula | Path 1 | Path k, where 11 > k > 1 | Path 11 | Does M2 satisfy φ2?
φ2 | Yes | Yes | Yes | Yes
(2) Model Checking: Whether the Systematic FSA M3 Satisfies φ3 or Not. According to the DNA codes given in Section 4.2.1, we can get all the paths of M3, as shown in Table 15, where k is a natural number. The transition rules, related to φ3 and shown in Table 12, show that no atomic propositions except p and q take part in the state transitions. Therefore, we do not need to consider whether or not the states satisfy other atomic propositions.
Table 15
The runs of the system M3.
Path
DNA code of the path or sequence of nodes (atomic propositions) crossed by the path
Code of path 1
CGCT CGAATCGGAATG TAT CGAATCGGAATG TGA CGAATCGGAATG CGGC
First, we will check path 1. The atomic proposition sequence which is crossed by the run is pq. We only need to deal with d = ATCGGAATGTATCGAATCGGAATGTGACGAATCGGA. In short, we will observe whether or not hybridization occurs between the DNA molecules expressing transitions and the molecule d. To this end, we pour the following five kinds of molecules into a container with a volume of $10^{-15}$ L: d, t0p0, t0q2, t2p0, and t2q2, for observing the hybridization.
The concentrations of the five kinds of molecules are all 100 μM, and their molecular numbers are all 60000. The hybridization reaction is observed as the temperature naturally drops to 10°C. Figure 19 shows the result of the hybridization, where strand1, strand2, strand3, strand4, and strand5 mean d, t0p0, t0q2, t2p0, and t2q2, respectively.
Figure 19
Checking for φ3: the experimental results on base pairing and hybridization.
As shown in Figure 19(b), the coordinates of the two red lines from top to bottom indicate that complete double-stranded DNA molecules are formed by the hybridization among t0p0, t0q2, and d. Comparing the color of the two red lines with the color scale of the vertical bar on the right side of Figure 19(b), we can see clearly that the probabilities of these base pairs are close to 100%, which is a high degree of specificity.
As shown in Figure 19(a), 99 μM/100 μM = 99% of the molecules d take part in the specific hybridization. Note that only the molecule strand1-strand3-strand2 is the product of the specific hybridization. Therefore, the true positive rate of the specific hybridization of d is approximately equal to 99%. Similarly, the false negative rate of d is equal to 0, and the false positive rate of d is 0.68 μM/100 μM = 0.68%. As for t0p0, its false negative rate is 0.65 μM/100 μM = 0.65%, its false positive rate is equal to 0, and its true positive rate is approximately equal to 99%. Similarly, the false negative rate of t0q2 is equal to 0, the false positive rate of t0q2 is 0.68 μM/100 μM = 0.68%, and the true positive rate of t0q2 is approximately equal to 99%. Once again, this suggests that the three kinds of molecules are hybridized with strong specificity. Thus, the systematic run pq satisfies the formula φ3.
As before, the above results are obtained when k = 1. It has also been proved that a system satisfies the formula pUq if and only if all the runs whose lengths are less than $|V| \cdot 2^{|V|-1} + |E|$ satisfy pUq, where $|V|$ and $|E|$ are the number of nodes and the number of edges in the systematic FSA, respectively [55]. The same property holds for φ3, since φ3 is composed of one pUq-like formula applied recursively. For M3, we need to check six paths, since $2 \cdot 2^{2-1} + 2 = 6$, as shown in Table 16. M3 satisfies the formula φ3 since all paths (i.e., runs) satisfy this formula.
Table 16
The results: checking for φ3 in the different paths of M3 (whether or not the path satisfies φ3).
Formula | Path 1 | Path k, where 6 > k > 1 | Path 6 | Does M3 satisfy φ3?
φ3 | Yes | Yes | Yes | Yes
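The number of paths checked for M1, M2, and M3 follows from the same bound, which can be evaluated directly. The closed form used below, $|V| \cdot 2^{|V|-1} + |E|$, is the one implied by the worked instances in the text (15, 11, and 6); the function name `path_bound` is an assumption.

```python
def path_bound(num_nodes: int, num_edges: int) -> int:
    """Bound |V| * 2**(|V| - 1) + |E| on the run lengths to check
    for a pUq-like formula over an FSA with |V| nodes and |E| edges."""
    return num_nodes * 2 ** (num_nodes - 1) + num_edges


# M1 (phi1): three nodes, three edges.
bound_m1 = path_bound(3, 3)

# M2 (phi2): prefix interval (2 nodes, 2 edges) plus suffix interval
# (2 nodes, 1 edge), because phi2 chains two pUq-like formulas.
bound_m2 = path_bound(2, 2) + path_bound(2, 1)

# M3 (phi3): two nodes, two edges.
bound_m3 = path_bound(2, 2)
```

The exponential factor in the number of nodes is what makes the number of subexperiments grow quickly for larger systematic FSAs.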
By calling the procedure for checking φ3, Algorithm 4 can get the model checking results on this basic ITL formula. According to the experimental processes and results in Section 4.2, we can safely say that Algorithm 4, which can be employed to check the basic ITL formulas, has been effectively implemented in molecular biology.
4.3. Simulated Experiments for φ4
4.3.1. Encoding Designs
We have designed a DNA encoding via NUPACK, as shown in Table 17. Figures 20, 21, and 22 show the thermodynamic analysis of the encoding sequence presented in Table 17 at 10°C. As shown in Figure 20, the NED of our coding sequence is 0.1%. Figure 21 shows that its free energy is approximately equal to the minimum free energy and Figure 22 shows that all bases in the two single strands are completely complementary to each other, and the probabilities of all the pairs are approximately equal to 1. Table 18 gives the encoding rules for the input strings. And Table 19 shows the encoding of the sticker automaton characterizing φ4.
With the DNA code given in Section 4.3.1 at hand, we can conduct our simulated experiments. It should be noted that, in Section 4.3.2, all the encoding of the DNA molecules is written from left to right in the 5′-3′ direction, which is consistent with the notation used in NUPACK.
According to the DNA codes given in Section 4.3.1, we can get all the paths of M2, as shown in Table 20. The transition rules related to φ4 are shown in Table 19, and no atomic propositions except p1, q1, p2, q2, and m1 take part in the state transitions. Therefore, we do not need to consider whether or not the states satisfy other atomic propositions. First, we will check path 1. There are two possible runs in this path. Without loss of generality, we suppose that the atomic proposition sequence which is crossed by the run is m1q1p2q2. We only need to deal with d = CGCATCATGTGGTCTTTGCATGGACGTAGTGATCGGCGCATCATGTGGTCTTTGCATGGACGTAATCCTCGGCGCATCATGTGGTCTTTGCATGGACGTACAAATCGGCGCATCATGTGGTCTTTGCATGGACGTAGGGATCGGCGCATCATGTGGTCTT.
Table 20
The runs of the system M2.
Path
DNA code of the path or sequence of nodes (atomic propositions) crossed by the path
In short, we will observe whether or not hybridization occurs between the DNA molecules expressing transitions and the molecule d. There are seventy ways to select four kinds of molecules from the eight kinds of WC molecules. Thus, we need to execute seventy groups of subexperiments. For example, we pour the following five kinds of molecules into a container with a volume of $10^{-15}$ L: d, t0m11, t1q12, t2m33, and t3q24, for observing the hybridization.
The concentrations of the five kinds of molecules are all 100 μM, and their molecular numbers are all 60000. With the temperature naturally dropping to 10°C, the hybridization reaction is observed. Figure 23 shows the result of the hybridization, where strand1, strand2, strand3, strand4, and strand5 mean d, t0m11, t1q12, t2m33, and t3q24, respectively.
Figure 23
Checking for φ4: a group of subexperimental results on hybridization: location and rate of base pairing.
As shown in the graph, the bases in the middle of strand4 are not paired with strand1, indicating that complete double strands are not formed. In every other group of subexperiments, complete double strands are not formed either. Thus, the systematic run m1q1p2q2 does not satisfy the formula φ4. Similarly, none of the runs in path 1 satisfies this formula. Therefore, there exists a path which does not satisfy φ4. That is to say, M2 does not satisfy the formula φ4.
By calling the procedure for checking φ4, Algorithm 5 can get the model checking results on this basic PTL formula. According to the experimental processes and results in Section 4.3, we can safely say that Algorithm 5, which can be employed to check the basic PTL formulas, has been effectively implemented in molecular biology.
4.4. The Effect of Reaction Temperature on the Above Experimental Results
As mentioned above, (1) the complete double strands shown in Figures 13(a) and 13(b) are formed in the process of checking φ1; (2) the complete double strands shown in Figure 18 are formed in the process of checking φ2; (3) the complete double strands shown in Figure 19 are formed in the process of checking φ3. For these complete double strands which come from the hybridization, the relationships between the rates of unpaired bases and the reaction temperatures are illustrated in Figures 24, 25, and 26.
Figure 24
Effect of reaction temperature on hybridization for φ1: forming complete double strands (rate of unpaired bases).
Figure 25
Effect of reaction temperature on hybridization for φ2: forming complete double strands (rate of unpaired bases).
Figure 26
Effect of reaction temperature on specific hybridization for φ3: forming complete double strands (rate of unpaired bases).
As shown in these graphs, the lower the reaction temperature, the higher the ratio of base pairing. This result suggests that the cooling target temperature (i.e., reaction temperature) has an important influence on the experimental results, and 10°C is a suitable temperature to ensure the specificity of hybridization. In comparison, the initial temperature and the cooling speed are not crucial. As far as sticker automata are concerned, one can directly place a container with the molecular reactants at room temperature in order to obtain the products of hybridization. This is the standard experimental way given in [56] for sticker automata.
4.5. The Simulated Experiments on Molecular Kinetics
Regarding the complete double strands shown in Figures 13(a), 13(b), 18, and 19, the base pairings are illustrated in the corresponding graphs. This is a result of the competitive hybridization among the different kinds of molecules. In order to better observe the process of molecular competition, we design a number of experiments. With the DIZZY tool for the DNA molecular kinetics [59], the famous chemical kinetics algorithm, called Gibson-Bruck [56], is applied to compute the dynamic changes of the numbers of the various kinds of molecules in the process of hybridization. The results are illustrated in Figures 27 and 28.
Figure 27
Simulated experiments in molecular kinetics: complete double strands formed for checking φ1.
Figure 28
Experiments in molecular kinetics: complete double strands formed for checking φ2 and φ3.
Figure 27 shows the variation of the numbers of the different kinds of molecules during the formation of the complete double strands shown in Figures 13(a) and 13(b). The blue line in Figure 27(a) shows the change in the number of the complete double strands. Throughout the process, the number of the complete double-stranded molecules increases exponentially over time, and the number of the molecular reactants decreases exponentially over time, approaching zero, as shown in Figure 27(a). Within 10 seconds, the blue line begins to approach the upper bound of 60000, and it reaches the maximum value of 59405 (more than 99% of the upper bound) in 727 seconds (about 12 minutes), as shown in Figure 27(d).
The changes in the numbers of the nonspecific molecular products are illustrated in Figures 27(b) and 27(c). At first, they increase rapidly, suggesting that a large number of nonspecific products occur. In the end, they decrease exponentially, suggesting that the specific molecular products eventually dominate the competition. This can be explained by the fact that the specific molecular products have advantages in physical structure and thermodynamic properties over the nonspecific molecular products.
Figure 28 shows the variation of the numbers of the different kinds of molecules during the formation of the following two kinds of complete double strands: the one shown in Figure 18 and the one shown in Figure 19. The phenomena, rules, and causes in molecular kinetics are similar to the ones in Figure 27.
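The qualitative behaviour described above, where nonspecific products appear early and specific products win in the end, can be reproduced with a toy stochastic simulation. The sketch below uses the Gillespie direct method rather than the Gibson-Bruck next-reaction method used through DIZZY; both sample the same stochastic process. The species, the rate constants, the molecule counts, and the reduction of the complete double strand to a single specific partner t are all assumptions made for illustration and are not taken from the experiments.

```python
import random


def gillespie(counts, reactions, t_end, seed=0):
    """Direct-method stochastic simulation.

    counts    -- dict: species name -> initial molecule number
    reactions -- list of (rate_constant, reactants, products), where
                 reactants and products are tuples of species names;
                 the reactants of one reaction must be distinct species
    Returns a list of (time, state) snapshots, one per reaction event.
    """
    rng = random.Random(seed)
    state = dict(counts)
    t = 0.0
    history = [(t, dict(state))]
    while True:
        propensities = []
        for rate, reactants, _products in reactions:
            a = rate
            for species in reactants:
                a *= state[species]
            propensities.append(a)
        total = sum(propensities)
        if total <= 0.0:
            break  # nothing can react any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        _rate, reactants, products = reactions[index]
        for species in reactants:
            state[species] -= 1
        for species in products:
            state[species] += 1
        history.append((t, dict(state)))
    return history


# Toy system: d binds its specific partner t irreversibly and a
# nonspecific competitor x reversibly (arbitrary rate units).
K_ON, K_OFF = 0.01, 1.0
INITIAL = {"d": 200, "t": 200, "x": 200, "dt": 0, "dx": 0}
REACTIONS = [
    (K_ON, ("d", "t"), ("dt",)),   # specific hybridization
    (K_ON, ("d", "x"), ("dx",)),   # nonspecific hybridization
    (K_OFF, ("dx",), ("d", "x")),  # nonspecific product dissociates
]
```

Calling `gillespie(INITIAL, REACTIONS, t_end=1e5)` returns the full trajectory; plotting the `dt` and `dx` counts against time gives curves of the same general shape as those described for Figure 27, with the nonspecific product rising first and then decaying.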
4.6. Some Comparisons among the New Method and the Related Ones
4.6.1. Comparing the New Methods with Other Related DNA-Based Ones
Table 21 compares the power of the new method with that of the existing DNA-based ones. Two observations, detailed after the table, can be drawn from it.
In summary, our new method extends the range of DNA model checking, so that some stronger temporal properties can be checked. Moreover, the new method does not simply call the algorithm TL-MC-DNA in [55]; there are four key differences between the two methods, listed after Figure 29.
In addition, Sections 4.1, 4.2, and 4.3 have confirmed that the simulated biochemical experiments can ensure the correctness and effectiveness of DNA model checking. The simulated experiments on molecular kinetics in Section 4.5 further demonstrate that DNA model checking can be biochemically implemented in acceptable time.
Table 21
A comparison of power among the various DNA model checking methods (can the method conduct DNA model checking for a given formula?).
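Logic | Basic formula | Method in [53] | Method in [55] | Method in [54] | The new method | What the new method can do
LTL | pUq | No | Yes | Yes | No | —
LTL | Fp | No | Yes | The method can be used to check. However, it is not practical to check due to the limitation of the code | No | —
LTL | Gp | No | Yes | The method can be used to check. However, it is not practical to check due to the limitation of the code | No | —
LTL | Xp | No | Yes | No | No | —
CTL | ApUq | No | No | No | Yes | A combination of Algorithm 1 and the experiments in [55]
CTL | AFp | No | No | No | Yes | A combination of Algorithm 1 and the experiments in [55]
CTL | AGp | No | No | No | Yes | A combination of Algorithm 1 and the experiments in [55]
CTL | AXp | No | No | No | Yes | A combination of Algorithm 1 and the experiments in [55]
CTL | EpUq | No | No | No | Yes | The experiments for φ1 in Section 4.1
CTL | EFp | Yes | No | No | Yes | A combination of Algorithm 2 and the experiments in [55]
CTL | EGp | No | No | No | Yes | A combination of Algorithm 2 and the experiments in [55]
CTL | EXp | No | No | No | Yes | A combination of Algorithm 2 and the experiments in [55]
ITL | (p1Uq1); (p2Uq2) | No | No | No | Yes | The experiments for φ2 in Section 4.2
ITL | (pUq)∗ | No | No | No | Yes | The experiments for φ3 in Section 4.2
PTL | ((p1Uq1), (p2Uq2)) prj (p3∧Xq3) | No | No | No | Yes | The experiments for φ4 in Section 4.3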
The DNA-computing-based approach in [53] for checking the basic CTL formula EFp cannot deal with any other CTL formula, including the other basic formulas. In comparison, the new method can conduct model checking for all eight basic CTL formulas via DNA molecules. In addition, the method in [53] cannot deal with any ITL or PTL formula, whereas the new method can.
There are previous DNA-computing-based approaches for checking all four basic LTL formulas and some popular LTL formulas [54, 55]. However, these methods cannot handle any CTL, ITL, or PTL formula. In comparison, the new method can conduct model checking for all of the basic formulas of these three temporal logic types. The relationship among the expressive powers of these temporal logic types is shown in Figure 29.
Figure 29
Comparison of power among the several logic types (comparison of action ranges among the new method and the existing ones).
First, the new approach employs formal techniques based on semantically equivalent transformations before calling the algorithm TL-MC-DNA.
Second, the new approach extends the scope of the input parameters of the algorithm TL-MC-DNA.
Third, we have designed a number of new DNA encoding schemes that are more effective and are used together with the new algorithms.
Fourth, the targeted problems are different. The algorithm TL-MC-DNA is used for basic LTL model checking, whereas the new approach is used for basic CTL, ITL, and PTL model checking. In other words, the latter problems are reduced to the former, already solved problem, using a series of logical and molecular-biological techniques. This is the research scheme of this paper.
4.6.2. Comparing the New Method to the Classical Model Checking Algorithms
As shown in Section 3.1.4, model checking with DNA molecules differs from the model checking algorithms that run on electronic computing devices in its computing mechanism, because the computing carriers are different. As a result, the new method and the classical ones in [43] are complementary.
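For contrast with the DNA-based procedure, the following Python sketch shows how an electronic computer would typically check two of the basic CTL formulas, EFp and AGp, on an explicit finite model. It is a textbook-style fixpoint computation, not necessarily the formulation used in [43]; the function names `check_EF` and `check_AG` and the small example model are hypothetical.

```python
def check_EF(transitions, p_states):
    """Return the set of states satisfying EF p.

    transitions: iterable of (source, target) pairs of the system model.
    p_states:    states where the atomic proposition p holds.
    EF p is computed as the least fixpoint of Z = p or EX Z, that is, by
    repeatedly adding predecessors of states already known to satisfy it.
    """
    transitions = list(transitions)
    satisfied = set(p_states)
    changed = True
    while changed:
        changed = False
        for source, target in transitions:
            if target in satisfied and source not in satisfied:
                satisfied.add(source)
                changed = True
    return satisfied


def check_AG(states, transitions, p_states):
    """Return the set of states satisfying AG p, using AG p = not EF (not p)."""
    states = set(states)
    not_p = states - set(p_states)
    return states - check_EF(transitions, not_p)


if __name__ == "__main__":
    # Hypothetical four-state model; every state has at least one successor.
    states = {0, 1, 2, 3}
    transitions = [(0, 1), (1, 2), (2, 2), (3, 3)]
    print(sorted(check_EF(transitions, {2})))          # states that can reach p
    print(sorted(check_AG(states, transitions, {2})))  # states where p always holds
```

The sequential loop visits the transitions one at a time until nothing changes, whereas the DNA-based method relies on massively parallel hybridization; this difference in computing mechanism is the point of the comparison above.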
4.6.3. Additional Discussions
In [60], Professor Lamport discussed the problems that he sees with liveness. One of them is that “more than 90% (probably more than 95%) of the errors in real systems are violations of safety properties.” In practical model checking, a safety property is usually described by the CTL formula AGp. Our new method can check AGp via DNA molecules, using Algorithm 1 together with the experiments in [55] (Table 21). Therefore, the core of CTL treated in this paper is useful in computing practice.
Previous research has demonstrated that the DNA model checking technique using sticker automata can be implemented in molecular biology [55], provided that the FSA of the logical formula has no more than 7 nodes and no more than 42 edges. The new method also handles CTL/ITL/PTL formulas using sticker automata. Thus, this molecular biology technique indicates that, by extending the new method, not only all of the basic formulas but also some popular CTL/ITL/PTL formulas used in practice can be dealt with, as in the case of LTL in [55]. In addition, methods based on sticker automata can, in theory, deal with the complete decidable sets of temporal logic by using some extended DNA encoding [55]. The details of these extensions are omitted owing to the scope of this paper and space limitations.
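The size bound quoted from [55] can be turned into a simple pre-check on a formula's automaton. The following Python sketch assumes an FSA given as a collection of (source, label, target) triples; the function name `within_reported_limits`, this representation, and the example automaton are hypothetical, and only the two numeric limits come from the text.

```python
MAX_NODES = 7    # node bound reported for [55] in the text
MAX_EDGES = 42   # edge bound reported for [55] in the text


def within_reported_limits(transitions):
    """Check an FSA against the reported node and edge bounds.

    transitions: iterable of (source, label, target) triples.
    Nodes are taken to be the endpoints of the listed edges, so a state
    without any incoming or outgoing edge is not counted (an assumption
    of this sketch).
    """
    edges = set(transitions)
    nodes = {src for src, _, _ in edges} | {dst for _, _, dst in edges}
    return len(nodes) <= MAX_NODES and len(edges) <= MAX_EDGES


if __name__ == "__main__":
    # Hypothetical two-state automaton in the spirit of p U q.
    until_fsa = [("s0", "p", "s0"), ("s0", "q", "s1"), ("s1", "true", "s1")]
    print(within_reported_limits(until_fsa))
```

A check like this only says whether an automaton falls inside the range for which [55] reported a biochemical implementation; it says nothing about whether a DNA encoding for a larger automaton exists.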
5. Conclusions
Early studies on DNA computing focused on nonautonomous models and algorithms. In recent years, DNA computing techniques have been optimized with self-assembly. This paper has presented a novel DNA computing method using sticker automata for model checking temporal logic formulas. In particular, our newly developed algorithms are based on the self-assembly of sticker automata.
The state of the art of the universal DNA computer is encouraging [15, 30], and there is a great need for new components to enhance the theoretical architecture of DNA computing, especially for temporal logic model checking, which is a complex computational problem. The new algorithms have been implemented in simulated experiments, and model checking has been conducted via DNA molecules for the basic formulas of CTL, ITL, and PTL. Compared with the classical model checking methods, the model checking technique based on molecular computation has the intrinsic advantage of parallel computing. The new DNA computing approach based on sticker automata will provide a molecular solution and expand the existing library of DNA computational problems.
There are several directions that can be further explored based on the new method described in this paper. One is to apply the cellular model checking technique to genomic research. For example, it can be used to study DNA repair and mismatches during cell division, which are believed to be associated with cancer occurrence [56]. To improve the ability to discover and repair abnormal genes not only at the structural level but also at the functional level, it is necessary to study the temporal and spatial expression of genes. Some previous research has employed the cellular computation technique to provide an autonomous intelligent method for the molecular diagnosis and treatment of some human diseases caused by genetic mutation [56]. However, the new method can deal with more temporal relationships.
Therefore, one direction of our future work is to incorporate our new DNA computing approach for model checking temporal logic into artificially controlled gene repair techniques. This could provide a molecular means for the early detection, diagnosis, and treatment of cancer, and could affect the prognosis and survival rate of cancer patients.
A specific future application of our method is to study gastric cardiac cancer at the stage of gastric inflammation and precancerous lesions, in which genes show abnormal behavioral changes with temporal characteristics. For example, a previous study discovered a susceptibility gene locus of gastric cardiac cancer in the Han population of Northern China [61]. Future research will focus on how to embed the autonomous model checking method into living human cells. Such an approach could be used to develop a molecular robot technique to repair susceptibility loci or DNA mismatches in gastric cancer cells or in normal cells. More specifically, the basic CTL formulas can describe the branching-time relationships among dynamic changes in genes, the basic ITL formulas can describe the simple interval relationships among such changes, and the basic PTL formula can describe the general relationships between intervals and their effects on such changes. These basic temporal logic formulas are sufficient to describe the dynamic changes of genes; no other formulas are needed.