Sarah Guiziou1, Federico Ulliana2, Violaine Moreau1, Michel Leclere2, Jerome Bonnet1. 1. Centre de Biochimie Structurale (CBS), INSERM U1054 , CNRS UMR5048, University of Montpellier , 34090 Montpellier , France. 2. Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier (LIRMM) , CNRS UMR 5506, University of Montpellier , 34090 Montpellier , France.
Abstract
Tools to systematically reprogram cellular behavior are crucial to address pressing challenges in manufacturing, environment, or healthcare. Recombinases can very efficiently encode Boolean and history-dependent logic in many species, yet current designs are performed on a case-by-case basis, limiting their scalability and requiring time-consuming optimization. Here we present an automated workflow for designing recombinase logic devices executing Boolean functions. Our theoretical framework uses a reduced library of computational devices distributed into different cellular subpopulations, which are then composed in various manners to implement all desired logic functions at the multicellular level. Our design platform called CALIN (Composable Asynchronous Logic using Integrase Networks) is broadly accessible via a web server, taking truth tables as inputs and providing corresponding DNA designs and sequences as outputs (available at http://synbio.cbs.cnrs.fr/calin ). We anticipate that this automated design workflow will streamline the implementation of Boolean functions in many organisms and for various applications.
Tools to systematically reprogram cellular behavior are crucial to address pressing challenges in manufacturing, environment, or healthcare. Recombinases can very efficiently encode Boolean and history-dependent logic in many species, yet current designs are performed on a case-by-case basis, limiting their scalability and requiring time-consuming optimization. Here we present an automated workflow for designing recombinase logic devices executing Boolean functions. Our theoretical framework uses a reduced library of computational devices distributed into different cellular subpopulations, which are then composed in various manners to implement all desired logic functions at the multicellular level. Our design platform called CALIN (Composable Asynchronous Logic using Integrase Networks) is broadly accessible via a web server, taking truth tables as inputs and providing corresponding DNA designs and sequences as outputs (available at http://synbio.cbs.cnrs.fr/calin ). We anticipate that this automated design workflow will streamline the implementation of Boolean functions in many organisms and for various applications.
Reprogramming
the response of
living cells to chemical or physical signals is a key goal of synthetic
biology and would support the development of complex manufacturing
processes, sophisticated diagnostics, or cellular therapies.[1] In order to control cellular behavior, researchers
have engineered many types of Boolean logic gates operating in single
cells by using transcriptional regulators,[2−8] RNA molecules,[9−11] or site-specific recombinases.[12−14] However, scaling-up
single-cell logic systems requires solving multiple engineering challenges.
First, when program complexity increases (number of inputs ≥3),
the high number of parts needed can cause metabolic burden and affect
cellular viability. Second, current design methods are mostly ad-hoc, and each Boolean function is implemented using a
different genetic architecture that needs to be fully characterized
and optimized. Despite recent progress toward predictable gate design,[7] some gates simply do not work or are too complex
to be implemented within a single cell. Finally, in order to avoid
cross-talk, single-cell logic systems need to use different components
for every novel signal to be detected. While library of orthogonal
regulatory components have greatly expanded,[3,6,15,16] their deployment
can be challenging and requires time-consuming optimization.In nature, division of labor between cellular subpopulations is
a ubiquitous mechanism allowing cellular communities to accomplish
complex functions.[17,18] Early efforts to engineer synthetic
multicellular systems led to the construction of pattern-forming communities,[19] predator–prey ecosystems,[20] synchronized oscillators,[21,22] or distributed metabolic pathways.[23] Researchers
also realized that problems faced by logic circuits operating in single
cells could be addressed by distributing the logic program between
different cells.[24] Because of the spatial
separation allowed by cellular compartments, optimized regulatory
components can be reused in different subpopulations. As the circuit
is divided into smaller subcircuits, metabolic burden is reduced.
Finally, simple cellular computing modules can be composed in different
manners and wired via cell–cell communication
channels to obtain different logic functions. For example, Tamsir et al. used multilayered circuit designs inspired from electronics
to construct all 2-input logic gates by combining spatially separated E. coli colonies encoding NOR gates wired via quorum-sensing molecules.[25] Specific
features of biology can also be used to our advantage to engineer
logic systems in a more efficient manner than by strictly transposing
electronic designs.[12,24,26] One particularly promising approach is distributed multicellular
computation (DMC).[24,27−29] DMC is based
on the decomposition of a Boolean function into various subfunctions,
each performed by a particular subpopulation of cells. Different subpopulations
can then be combined in different manners to realize any given Boolean
function of interest. Importantly, multiple cells are capable of producing
the output which is therefore distributed among the cellular subpopulations.
Recently, Macia and colleagues implemented DMC within a multicellular
consortium by using cellular computing units performing elementary
IDENTITY or NOT operations.[30] While highly
scalable, the need for spatial separation between each subpopulation
prevents these systems from operating autonomously.Here we
present a composable framework enabling the systematic
design of logic gates performing Boolean logic within an autonomous
multicellular consortium.We designed our system to operate
using site-specific recombinases,
more specifically serine integrases, which allow robust and flexible
engineering of complex logic gates.[12,13] Serine integrases
are members of the large serine recombinase family[31] and catalyze site-specific recombination between attachment
sites attB and attP. Recombination
operates via double-strand breaks located at the
central dinucleotides followed by the generation of hybrid sites attL and attR. Depending on the relative
orientation of attB and attP, the
recombination reaction leads to excision (parallel orientation) or
inversion (antiparallel orientation) of the DNA sequence flanked by
the attachment sites.[32] Recombinase devices
can implement complex logic functions without the need of cascading
multiple logic gates like in electronics.[12,13,26] Integrase recombination is irreversible
in the absence of cofactors, so that recombinase logic gates exhibit
memory, are single use (one-shot), and therefore belong to the family
of asynchronous logic devices (i.e., the system can
respond to multiple signals even if they are not present simultaneously).Our design for Boolean logic is based on a reduced library of cellular
computing units responding to one or multiple inputs that can be composed
at will to implement all desired Boolean functions (Figure ). Our logic system is single
layer, does not require cell–cell communication nor spatial
separation, greatly facilitating its implementation. In order to make
our design framework broadly accessible, we provide a fully automated
web platform called CALIN (Composable Asynchronous Logic using Integrase
Networks) taking truth tables as inputs and providing corresponding
DNA designs and sequences as outputs.
Figure 1
Distribution of a Boolean function within
a multicellular consortium.
The Boolean function of interest is decomposed as a disjunction (i.e., sum) of subfunctions (or clauses). Here, as an example,
a given function, f, is decomposed into functions f1, f2, and f3. The strains performing f1, f2 and f3 are selected from the strain library to assemble a multicellular
consortium computing the desired Boolean function.
Distribution of a Boolean function within
a multicellular consortium.
The Boolean function of interest is decomposed as a disjunction (i.e., sum) of subfunctions (or clauses). Here, as an example,
a given function, f, is decomposed into functions f1, f2, and f3. The strains performing f1, f2 and f3 are selected from the strain library to assemble a multicellular
consortium computing the desired Boolean function.
Results
A Hierarchical Composition
Framework for Multicellular Boolean
Logic Using Integrase Switches
In order to implement a Boolean
function within a multicellular consortium, we decomposed the function
into several independent subfunctions, or clauses,[30] executed by a different cellular subpopulation, chosen
from a library containing a reduced number of cellular computing units
(Figure ). To facilitate
multicellular system composition, we designed our system so that each
cellular subpopulation computes independently of the others, without
cell–cell communication needed. As a consequence, if one cellular
subpopulation is ON (expression of the output gene), the global output
of the system is considered to be ON. Because of their reduced number
and of the absence of cell–cell communication, cellular computing
units can be extensively characterized and optimized to predictably
implement all Boolean functions at the multicellular level.Boolean functions encode the output state of the logic gate. The
variables of the function are the inputs of the gate which are equal
to 1 if the signal has been present and otherwise to 0. We express
Boolean functions using the disjunctive normal form.[33]The Boolean function f is a disjunction: f = β1 OR...OR β, where M is the number
of clauses present in f, and each β is a conjunctive clause: β = θ AND...AND θ, where each θ is a literal of the variable x (either the identity of the variable
or its negation), with j being an integer between
1 and n. n corresponds to the number of variables
in this conjunction (an integer between 1 and N). N is the number of variables in the function f.Each cellular computing unit executes a particular “subfunction”
corresponding to a conjunctive clause. Then, the full function is
performed by combining multiple cellular computing units (Figure A).
Figure 2
A hierarchical composition
framework for asynchronous Boolean recombinase
logic. (A) Distribution of a Boolean function within a multicellular
consortium by decomposition into conjunctions of literals (variables
or their negations). Here an example is depicted in which a Boolean
function is decomposed into three subfunctions and implemented in
three separate cellular computing units. (B) attB and attP disposed in parallel orientation. (C)
Elements implementing IDENTITY and NOT functions. To obtain an IDENTITY
function, a transcriptional terminator is flanked by parallel attachment
sites, blocking transcription of the gene of interest. When the signal
is present, the terminator is excised and the output gene is expressed.
To obtain a NOT function, a promoter is flanked by parallel attachment
sites. When the signal is present, the promoter is excised, and the
gene is no longer expressed. (D) Functional composition of ID-elements
into ID-modules, by placing elements in series to obtain the conjunction
of IDENTITY functions. For a 2-input ID-module, the output gene is
expressed only when both inputs have been present, both terminators
excised (corresponding to an AND gate (A AND B)).
(E) Functional composition of NOT-elements into NOT-modules, by nesting
elements to obtain conjunction of NOT functions. For a 2-input NOT-module,
the output gene is expressed only when none of the inputs has been
present (corresponding to a NOR gate: NOT(A) AND NOT(B)). (F) Hierarchical composition framework for Boolean recombinase
logic. ID- and NOT-modules are composed in series, following a priority
rule in which the NOT-module is placed upstream the ID-module. The
device shown here can be scaled to perform all functions based on
conjunction of NOT and IDENTITY functions.
A hierarchical composition
framework for asynchronous Boolean recombinase
logic. (A) Distribution of a Boolean function within a multicellular
consortium by decomposition into conjunctions of literals (variables
or their negations). Here an example is depicted in which a Boolean
function is decomposed into three subfunctions and implemented in
three separate cellular computing units. (B) attB and attP disposed in parallel orientation. (C)
Elements implementing IDENTITY and NOT functions. To obtain an IDENTITY
function, a transcriptional terminator is flanked by parallel attachment
sites, blocking transcription of the gene of interest. When the signal
is present, the terminator is excised and the output gene is expressed.
To obtain a NOT function, a promoter is flanked by parallel attachment
sites. When the signal is present, the promoter is excised, and the
gene is no longer expressed. (D) Functional composition of ID-elements
into ID-modules, by placing elements in series to obtain the conjunction
of IDENTITY functions. For a 2-input ID-module, the output gene is
expressed only when both inputs have been present, both terminators
excised (corresponding to an AND gate (A AND B)).
(E) Functional composition of NOT-elements into NOT-modules, by nesting
elements to obtain conjunction of NOT functions. For a 2-input NOT-module,
the output gene is expressed only when none of the inputs has been
present (corresponding to a NOR gate: NOT(A) AND NOT(B)). (F) Hierarchical composition framework for Boolean recombinase
logic. ID- and NOT-modules are composed in series, following a priority
rule in which the NOT-module is placed upstream the ID-module. The
device shown here can be scaled to perform all functions based on
conjunction of NOT and IDENTITY functions.We designed a hierarchical composition framework in which
two elements
encoding the NOT and IDENTITY functions (called ID-element and NOT-element)
are composed into computational modules which are then combined to
generate computational devices executing a particular clause within
a cellular subpopulation.For the sake of simplicity and robustness,
we designed switches
controlled by integrase-mediated excision (Figure B). Excision-based design reduces the distance
between gate promoter and the gene of interest. Moreover, as no asymmetric
terminator is needed, this design might be easier to deploy into many
organisms.[14]The ID-element consists
of a transcriptional terminator flanked
by recombination sites and placed between the promoter and the output
gene. In presence of the signal, the terminator is excised and the
output gene is expressed (Figure C, left panel). The NOT-element consists of a promoter
driving the output gene and flanked by recombination sites. In presence
of the signal, the promoter is excised and the gene is not expressed
anymore (Figure C,
right panel). Computational modules performing conjunctions of NOT
or conjunctions of IDENTITY functions are respectively realized by
nesting NOT-elements or by placing ID-elements in series (Figure D,E). Finally, NOT-
and ID-modules are composed in series to obtain the final computational
devices: in this case the NOT-module containing the promoter is positioned
in 5′ of the ID-module, with the output gene positioned downstream
(Figure F). Following
this hierarchical composition framework, all conjunctive clauses are
implementable within a cellular computing unit. The full Boolean function
is then executed by a multicellular consortium containing different
cellular computing units.To reduce the number of computational
devices, we implemented only
one computational device per set of symmetric Boolean functions and
interchanged connection between integrases and control signals. For
example, the two Boolean functions: NOT(A) AND B; B AND NOT(A) are executed using the same computational
device (Figure S1). Consequently, only
14 computational devices are needed to realize all 4-input Boolean
functions (65 536 functions) (Figure A). For every additional input (from N – 1 to N), only N + 1 novel computational devices are needed while the number of Boolean
functions increases drastically. For example, 7 additional devices
are needed to transition from 5 to 6 inputs (27 devices in total),
enabling a 1010 fold increase in the number of Boolean
functions (for a total of ∼1019) (Figure B). Of note, the different
cellular computing units do not always include N integrases and computational
devices responding to N inputs. As an example, the 4-input Boolean
equation shown in Figure D can be executed using 3 strains containing respectively
4, 3, and 2 integrases and with different signal-integrase connectivities.
Figure 3
Implementing
all Boolean logic functions using a reduced number
of computational devices. (A) Schematics of all devices needed to
implement up to 4-input functions. (B) Maximum number of strains and
number of computational devices needed to compute all Boolean functions
for a given number of inputs. See Methods for
details. (C) Proportion of Boolean functions implementable with a
specific number of strains for 3 and 4 inputs (obtained by generating
all the biological designs for 3 and 4-input Boolean functions, see Table S1 for numbers). (D) Example of a biological
implementation for a 4-input Boolean function. The function shown
here is divided into a disjunction of conjunctive clauses (see Figure A). Each conjunctive
clause is executed using a particular computational device (defined
in panel A) each placed into a separate cellular computing unit. By
combining the different units, the full logic function is obtained.
If at least one of the cellular units is ON, the output is considered
to be ON. Of note, inputs are not always connected to the same integrase
(as for input D in Cell 1 and Cell 2), and all integrases and inputs
are not present in all cells.
Implementing
all Boolean logic functions using a reduced number
of computational devices. (A) Schematics of all devices needed to
implement up to 4-input functions. (B) Maximum number of strains and
number of computational devices needed to compute all Boolean functions
for a given number of inputs. See Methods for
details. (C) Proportion of Boolean functions implementable with a
specific number of strains for 3 and 4 inputs (obtained by generating
all the biological designs for 3 and 4-input Boolean functions, see Table S1 for numbers). (D) Example of a biological
implementation for a 4-input Boolean function. The function shown
here is divided into a disjunction of conjunctive clauses (see Figure A). Each conjunctive
clause is executed using a particular computational device (defined
in panel A) each placed into a separate cellular computing unit. By
combining the different units, the full logic function is obtained.
If at least one of the cellular units is ON, the output is considered
to be ON. Of note, inputs are not always connected to the same integrase
(as for input D in Cell 1 and Cell 2), and all integrases and inputs
are not present in all cells.To implement a N-input Boolean function, a maximum of 2 different cellular computing units
have
to be composed, corresponding to a culture of 2 different strains: 4 for 3 inputs and 8 for 4 inputs
(Figure B). However,
most logic functions can be performed using less cellular computing
units (an average of 2.3 strains for 3-input and 3.6 strains for 4-input
Boolean functions, Figure C).In summary, we provide a hierarchical composition
framework using
a reduced library of computational devices to systematically implement
all N-input Boolean logic functions within a multicellular consortium.
An Automated Design Platform for Recombinase Logic
We then
aimed at generating a software for automating the design
of cellular consortia performing asynchronous Boolean logic. Softwares
enabling such automated genetic circuit design are necessary and extremely
useful when the design space becomes too large for humans to explore
it efficiently.[7,34−36]We thus
designed an algorithm called CALIN (Composable Asynchronous Logic
using Integrase Networks) based on two main steps (Figure A). First, the Boolean function
of interest is decomposed into a disjunction of conjunctive clauses
using the Quine–McCluskey algorithm (see Methods). Then, each clause is converted into a given computational device
for which particular connections between integrases and inputs are
generated.
Figure 4
Automated design of multicellular recombinase logic. (A) The CALIN
algorithm enables the systematic design of Asynchronous Boolean logic.
(B) CALIN web-interface takes as an entry a Boolean truth table and
generates as outputs: the connection map between inputs and integrases,
the DNA architectures of the computational devices and the corresponding
DNA sequences.
Automated design of multicellular recombinase logic. (A) The CALIN
algorithm enables the systematic design of Asynchronous Boolean logic.
(B) CALIN web-interface takes as an entry a Boolean truth table and
generates as outputs: the connection map between inputs and integrases,
the DNA architectures of the computational devices and the corresponding
DNA sequences.The CALIN script written
in Python is available on Github and can
be directly used for high-throughput generation of biological designs.
Furthermore, the CALIN python script can design logic devices customized
for specific organisms (E. coli, B. subtilis and S. cerevisiae) and can be tailored by
the user to generate devices using fully customized DNA sequences.In order to enable broader access to our design framework, we also
provide a Web site of CALIN accessible at http://synbio.cbs.cnrs.fr/calin.In the CALIN web-interface, the user fills the number of
inputs
to process (up to 5) and the desired Boolean truth table or corresponding
binary number. The Web site provides as outputs the DNA architectures
of the computational devices, the connection map between signals and
integrases, and the corresponding DNA sequences (Figure B).
Discussion
In
this work we present a scalable composition framework for implementing
asynchronous Boolean logic within a multicellular consortium. We provide
an online design tool for the systematic design of recombinase logic
circuits called CALIN (Composable Asynchronous Logic using Integrase
Networks). While these designs are currently theoretical, the robustness
of integrase-mediated recombination against various site permutations
and orientations[12,13,34,37] should support straightforward experimental
implementation.By taking advantage of the single-layer architecture
of recombinase
logic, we encapsulated complex Boolean functions into various subcellular
populations. Because of its compact architecture, our design exhibit
two significant improvements over previous DMC systems: (i) no cell–cell
communication channels are needed, and (ii) cells do not need to be
spatially separated, thereby supporting the implementation of fully
autonomous multicellular consortia operating without an external physical
device.Another difference between our system and other DMC
is the use
of recombinase switches that provide memory.[34,38,39] Recombinase mediated data-storage could
be useful for applications requiring endpoint measurements, or delayed
readout, like diagnostics. Also, because the state of the logic system
is written within DNA, it can be addressed via PCR
or DNA sequencing,[13,38,40] even if the cells die, providing other robust readout modalities.As with others DMC systems, for a given number of inputs, the number
of elementary computational devices needed to compose all logic functions
compares very favorably with the number of possible functions. For
example, implementing all 65 536 4-input, or all ∼4.3
× 109 5-input Boolean functions only requires respectively
14 and 20 computational modules.As serine recombinases do not
require host-specific cofactors and
can operate in several species, the designs presented here could be
implemented in many organisms. Logicfunctions could also be distributed
between different species operating in concert. In such schemes, researchers
could take advantage of the particular capacities of different organisms
to detect different signals and/or perform specific tasks. Examples
of applications include environmental remediation[41,42] or microbiome engineering for therapeutic applications.[43]A possible challenge for our system is
the high number of strains
that have to operate together when the number of inputs increases
(Figure B). Cultivating
many strains together could lead to counter selection of some subpopulations,
but this problem could be addressed by encapsulating the different
strains into hydrogel beads.[40] Also, as
the number of strains increases, the output of one subpopulation representing
a small fraction of the whole consortia could become difficult to
measure. The output level in the ON state will also be different if
one or multiple cellular subpopulations are turned ON. However, adding
a single cell–cell communication channel could address this
problem by propagating the output to the whole-population (Figure S2).Finally, for some applications,
“real-time” response
could be achieved via a similar composition framework
using synchronous recombinase logic gates based on reversible recombination
reactions performed by integrases coupled with recombination directionality
factors (RDFs) (Figure S3).[12,26]
Methods
Equations for Determining of Numbers of Functions/Strains/Devices
The number of Boolean functions corresponds to 2 to the power of
the number of possible states. As each state can be equal to 1 or
to 0, the number of possible states is equal to 2 to the power of N where N is the number of inputs. Consequently,
the number of Boolean functions is equal to eq .The maximum number of strains needed
to implement any Boolean logic function with N inputs is equal to eq , as all N-input Boolean equations can be written in the disjunctive normal
form, then as a disjunction of a maximum of 2 conjunctive clauses.[33]The number
of different conjunctive clauses (corresponding to a
conjunction of literals) is equal to eq .If we implement all these functions within
cells, the number of
standard devices needed is equal to the number of conjunctive clauses
(eq ).This method leads to a high number of devices.
Therefore, we decided
to construct only one device per set of symmetric Boolean functions
(e.g., A AND NOT(B) is the symmetric
function of NOT(A) AND B). This approach reduces
the number of standard devices. In consequence, for an N-input Boolean function, devices computing from 1 to N inputs are needed and k + 1 nonsymmetric Boolean
functions computing the conjunction of k literals
exist:Of note, the number of devices follows
the
arithmetic series: where a1 =
2, d = 1, and N is the number of
inputs.In a first approximation, N sensor-modules
in
which a control signal (i.e., a sensor device responding
to an input of interest) is connected to an integrase are needed for
the construction of an N-input system. However, as
we reduced the number of devices to a set composed of nonsymmetric
Boolean functions, we need to connect all control signals to all integrases
to compute all Boolean functions. Therefore, N2 sensor-modules are needed.
Automated Generation of
Genetic Designs
We encoded
an algorithm generating genetic designs executing N-input Boolean
functions using Python (Figure S4). The
algorithm takes as input a Boolean truth table or the binary number
corresponding to the function. The output corresponds to the biological
implementation of the Boolean function, such as for each strain: a
graphical representation of the genetic circuit and its associated
DNA sequences.The truth table is transformed into a Boolean
function in the disjunctive normal form using the Quine–McCluskey
algorithm[33] (Figure A). The Boolean function is decomposed into
conjunctive clauses (conjunction of literals). In this scheme, each
clause can be regarded as a “subfunction”. From each
conjunctive clause, we extract two types of information. First, based
on the number of IDENTITY and NOT functions, we identify which logic
device is needed. Second, based on the association of inputs to either
IDENTITY and NOT functions, we identify which sensor-modules are needed
among the different connection possibilities between control signals
and integrases. Finally, we combine the designs executing the different
conjunctive clauses to obtain the global design for implementing the
desired truth table.To simplify the construction process, the
DNA sequence of the computational
devices is generated by our Python code. In CALIN, sequences are adapted
for E. coli, but sequence generation can be
adapted to other organisms (database available for B. subtilis and Saccharomyces cerevisiae) or customized using
the source Python code available on github.
Authors: Lei Yang; Alec A K Nielsen; Jesus Fernandez-Rodriguez; Conor J McClune; Michael T Laub; Timothy K Lu; Christopher A Voigt Journal: Nat Methods Date: 2014-10-26 Impact factor: 28.547
Authors: Frederick K Balagaddé; Hao Song; Jun Ozaki; Cynthia H Collins; Matthew Barnet; Frances H Arnold; Stephen R Quake; Lingchong You Journal: Mol Syst Biol Date: 2008-04-15 Impact factor: 11.429
Authors: Virgil A Rhodius; Thomas H Segall-Shapiro; Brian D Sharon; Amar Ghodasara; Ekaterina Orlova; Hannah Tabakh; David H Burkhardt; Kevin Clancy; Todd C Peterson; Carol A Gross; Christopher A Voigt Journal: Mol Syst Biol Date: 2013-10-29 Impact factor: 11.429