Towards synthesis of a minimal cell.

Abstract

Construction of a chemical system capable of replication and evolution, fed only by small molecule nutrients, is now conceivable. This could be achieved by stepwise integration of decades of work on the reconstitution of DNA, RNA and protein syntheses from pure components. Such a minimal cell project would initially define the components sufficient for each subsystem, allow detailed kinetic analyses and lead to improved in vitro methods for synthesis of biopolymers, therapeutics and biosensors. Completion would yield a functionally and structurally understood self-replicating biosystem. Safety concerns for synthetic life will be alleviated by extreme dependence on elaborate laboratory reagents and conditions for viability. Our proposed minimal genome is 113 kbp long and contains 151 genes. We detail building blocks already in place and major hurdles to overcome for completion.

Year: 2006 PMID: 16924266 PMCID: PMC1681520 DOI: 10.1038/msb4100090

Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429

Overview

‘How far can we push chemical self-assembly?’

This question was posed recently as one of the big 25 questions in science for the next 25 years (Service, 2005). Nowadays, big questions often are addressed by big experimental efforts. But before embarking on a big project, it is helpful to get specific. What push in chemical self-assembly might be most worthwhile and practical? Self-assembly in vitro of viruses and the ribosome, achieved decades ago, taught us some of the principles assumed to be used in general by cells (Lewin, 2004). For example, self-assembly occurs in a definite sequence and is generally energetically favored, obviating the need for enzymes and an energy source. Assembling some type of cell (i.e. a self-replicating, membrane-encapsulated collection of biomolecules) would seem to be the next major step, yet detailed plans have not been published. Here, we attempt to outline the synthesis of a minimal cell containing the core cellular replication machinery, review the pertinent literature and highlight gaps in knowledge that need filling.

Utility

Synthesizing a minimal cell will advance knowledge of biological replication. Many hypotheses in replication and its subsystems can only be tested in such a synthetic biology project. The meaning of ‘synthetic' (from Greek sunthesis, to put together) discussed here bypasses the current reliance of synthetic biology on cells or macromolecular cell products: the aim is to put together an organism from small molecules alone. The simplest approach for creating an artificial cell may be by evolving an RNA polymerase made exclusively of RNA (Szostak ) to replace all protein components of in vitro replicating and evolving systems (e.g. to replace Qβ replicase; Mills ). But in comparison with a purified protein-based system, it is neither guaranteed to arrive sooner nor tell us more. A protein-based system will connect with, and reveal more about, existing biological systems. Life, like a machine, cannot be understood simply by studying it and its parts; it must also be put together from its parts. Along the way to synthesizing a cell, we might discover new biochemical functions essential for replication, unsuspected macromolecular modifications or previously unrecognized patterns of coordinated expression. How good a model would an artificial, protein-based, minimal cell be for natural cells? The only cellular alternative is a perturbed natural cell, an incredibly complex system even for the simplest of cells. A much simpler purified system based on a real cell would thus be easier to model and understand. It could certainly answer questions that cannot be answered in vivo or in crude extracts, such as which macromolecules and macromolecular modifications are sufficient for subsystem function. However, even the simplest minimal cell would still be highly complex; so its construction and study would be facilitated by substituting some of the necessary subsystems with simpler analogs. 
Should the simpler in vitro model turn out to be a poor model for the more complex in vivo system, one could always construct a more complex in vitro system that may better reflect in vivo. Synthesizing a cell will also lead to new applications. Purified biochemical systems already offer major advantages, such as the polymerase chain reaction (PCR) and in vitro transcription. A better understanding and manipulation of all cellular replication subsystems (molecular biology's tool kit) should spin off new technologies. For example, in vitro genome replication may be useful for replicating very large segments of DNA with high fidelity. Combined in vitro transcription, RNA processing and RNA modification would allow preparation of rRNAs and tRNAs with defined modifications to test the roles of the modifications, and modified tRNAs to aid incorporation of unnatural amino acids into proteins. Purified translation systems have enabled reassignment of mRNA codons to encode unnatural amino acids by omission of competing natural amino acids (Forster ); further improvements of the purified translation system could enable the genetic selection of protease-resistant, peptide-like ligands for drug discovery by pure translation display (Forster ). The purified translation system may also facilitate expression of proteins difficult to express by standard approaches. Better control of lipid vesicle synthesis could advance liposome-based drug delivery. Since bacterial translation is the main target of antibiotics, greater understanding may assist development of new drugs to fight mounting antibiotic resistance. Ultimate success in cell synthesis could generate useful microorganisms, for example, for renewable production of biodegradable plastics (Pohorille and Deamer, 2002).

Approach

The ideal approach for synthesizing a cell would allow all of the machine parts to be understood and tested. Like any engineering project, this requires detailed blueprints, raw synthetic capabilities and an overall diagnostic and debugging strategy. The use of entire genomes as the blueprints, some of which are small enough to synthesize de novo, is inconsistent with this approach. Self-replication of an unadulterated genome, however impressive, would not define the unnecessary genes, and the functions of about a third of the genes would remain unknown (Fraser ; Jaffe ). Building a machine from mysterious parts can only create a mysterious machine. What is needed is some way of defining a near-minimal genome and then a strategy that will lead inexorably to an understanding of all of its parts. Theoretical and experimental studies have attempted to establish a minimal set of genes needed for a self-replicating system in a cushy constant environment of unlimited, small molecule nutrients. Three basic approaches present themselves.

Comparative genomics

Comparative genomics searches for genes that have homologs in the genomes of groups of organisms. The approach estimates from 50 to 380 genes in a minimal genome (Mushegian and Koonin, 1996; Tomita ; Koonin, 2000; Jaffe ). It has the caveat that, among closely related genomes, some genes appear ‘required’ for those species (e.g. many of the genes retained in the synthetic reduced-genome Escherichia coli (Posfai )) although they are not required for basic life. If one goes to longer evolutionary distances, many gene functions are replaced by non-homologous genes, hence making some essential genes look dispensable (e.g. some of the tRNA modification enzymes used by Mycoplasma are either different from those of E. coli or unidentified by sequence identity, but that does not mean the different ones are dispensable). An additional challenge is that about a third of the essential genes have unknown functions. It is thus expected that a minimal genome based on this approach alone would be inviable, and it would not be possible to identify the missing essential genes.

Genetics

Genetics searches for essential genes by mutating one gene at a time. This approach estimates 430 genes in a minimal genome (out of Mycoplasma genitalium's total of 528; Supplementary Table S3; Hutchison ; Glass ). About a fifth of these essential genes have unknown functions. The approach is limited by false ‘essentials’ that arise from the fraction of genes that were never mutated in the screen, from creation of toxic partial complexes or pathways, and from inadvertent effects on adjacent genes. The latter effects are prevalent in bacteria because a primary RNA transcript typically encodes multiple gene products. At the other extreme, false ‘dispensables’ are disastrous when trying to assemble a viable minimal genome that lacks all of the individual ‘dispensables’. For example, most RNA modification enzymes are individually dispensable, but simultaneous deletion of tens of them would be expected to be unsustainable due to cumulative reductions in efficiency or fidelity (a useful working definition of essentials for a minimal genome should encompass such lethal ‘dispensables’). Again, in using this approach alone, it would not be possible to identify the missing essential genes.
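The cumulative argument above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each deletion independently multiplies translational efficiency by the same factor; the function name and the numbers (95% retained efficiency per deletion, 30 deletions) are hypothetical and are not taken from the cited studies.

```python
def combined_efficiency(per_deletion_efficiency: float, n_deletions: int) -> float:
    """Efficiency remaining after n independent, multiplicative losses.

    ASSUMPTION: losses are independent and multiply; real modification
    enzymes may interact, so this is only an order-of-magnitude sketch.
    """
    if not 0.0 <= per_deletion_efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if n_deletions < 0:
        raise ValueError("n_deletions must be non-negative")
    return per_deletion_efficiency ** n_deletions


# Hypothetical numbers: each deletion alone looks harmless (5% loss),
# but thirty of them together leave roughly a fifth of the activity.
single = combined_efficiency(0.95, 1)
combined = combined_efficiency(0.95, 30)
print(f"one deletion: {single:.2f}, thirty deletions: {combined:.2f}")
```

Under these assumed values, a gene set in which every member scores as ‘dispensable’ in a one-at-a-time screen could still be collectively essential, which is the sense in which such genes are counted as essentials for a minimal genome.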

Biochemistry

Biochemistry identifies from cell fractions those gene products essential for the reconstitution of biochemical reactions. It does not suffer from the above problems (except creation of toxic partial complexes), gives access to details of kinetic steps and allows debugging of isolated subsystems. However, the cellular subsystems must be integrated and thoroughly tested for accuracy on long templates before they can be considered physiological. Nevertheless, the biochemical approach has been successful at identifying macromolecules sufficient for reconstituting DNA, RNA and protein syntheses and, based on individual subtraction experiments, the components have either been shown to be necessary or could be so tested. Mindful of the remaining self-replication functions that need to be discovered (see below), it seems likely that a largely biochemical approach, now further empowered by mass spectrometry analyses and genetic and comparative genomic information, will be the most practical route to define a near-minimal, well-understood genome. We now review the relevance of current knowledge and technology to this new minimal cell project (MCP; Luisi, 2002).

A minimal genome

An MCP may be realized by reconstituting the macromolecular catalysts that synthesize DNA, RNA and protein. However, this overlooks the formation of the membrane compartment and the poorly understood process in which it is divided by membrane proteins (Gitai, 2005), both of which are required for life. But lipids alone have been shown to be sufficient for formation of rudimentary membranous compartments capable of both transmembrane transport of small molecules and fission autocatalytically (Szostak ), so membrane proteins may be dispensable. Polysaccharides should also be dispensable. If the simplest and best-characterized examples of DNA, RNA and protein synthesis are selected, if translation of all codons is enabled for generalizability and if efficiency and accuracy are not compromised, then this leads to the macromolecules and pathways of Figure 1.

Figure 1

A minimal cell containing biological macromolecules and pathways proposed to be necessary and sufficient for replication from small molecule nutrients. The macromolecules are all nucleic acid and protein polymers and are encapsulated within a bilayer lipid vesicle. The small molecules (brown) diffuse across the bilayer. The macromolecules are ordered according to the pathways in which they are synthesized and act. They are colored by biochemical subsystem as follows: blue=DNA synthesis, red=RNA synthesis and cleavage, green=RNA modification, purple=ribosome assembly, orange=post-translational modification and black=protein synthesis. MTF=methionyl-tRNAfMeti formyltransferase. The system could be bootstrapped with DNA, RNA polymerase, ribosome, translation factors, tRNAs, MTF, synthetases, chaperones and small molecules.

A detailed list of the gene products in the hypothetical synthetic minimal cell of Figure 1 is shown in Table I (left column). This list overlaps with a computational model of minimal cell genes largely derived from a minimal organism, M. genitalium (Tomita ; Supplementary Table S4), but differs by omitting enzymes for synthesizing small molecules (e.g. lipids and glycolysis substrates) and by including DNA replication, RNA processing, RNA modification, extra tRNAs to decode the whole genetic code, some additional essential translation components and chaperones. It should be emphasized that Table I is a working model only and that strict adherence will likely hamper progress. Examples of omitted, potentially stimulatory genes are given below and in Supplementary Table S1. Conversely, examples of included, potentially dispensable genes may be gleaned by comparison with the streamlined Mycoplasma genome (Fraser ; Table I, middle column; Supplementary Table S2).

Table I

Biochemically derived list of genes that may encode a useful, near-minimal, self-replicating system dependent only on small molecule nutrients

Escherichia coli gene product or DNA site	Mycoplasma	3D structure
Coliphage φ29 DNA polymerase	+	+
Coliphage P1 Cre recombinase	−	+
>Coliphage Lox/Cre recombinase site	−	+

Coliphage T7 RNA polymerase	Analog	+
>Coliphage T7 RNA polymerase initiation site	Analog	+
>Coliphage T7 RNA polymerase class II termination site	Analog	+

Lucerne viral hammerhead RNA	−	+
RNase P RNA	+	+
RNase P protein	+	+
>RNase P site/RNA primer for DNA polymerase	+	+

Small subunit 16S ribosomal RNA	+	+
All 21 small subunit ribosomal proteins (1–21)	+ except 1, 21	+
Large subunit 5S ribosomal RNA	+	+
Large subunit 23S ribosomal RNA	+	+
Large subunit 23S rRNA G2445>m2G methylase: unidentified	Unknown	−
Large subunit 23S rRNA U2449>dihydroU synthetase: unidentified	Unknown	−
Large subunit 23S rRNA U2457>pseudoU synthetase	Unknown	−
Large subunit 23S rRNA C2498>Cm methylase: unidentified	Unknown	−
Large subunit 23S rRNA A2503>m2A methylase: unidentified	Unknown	−
Large subunit 23S rRNA U2504>pseudoU synthetase	Unknown	−
All 33 large subunit ribosomal proteins (1–7, 9–11, 13–25, 27–36)	+ except 25, 30	+

Translational initiation factor 1	+	+
Translational initiation factor 2	+	+
Translational initiation factor 3	+	+
Translational elongation factor Tu	+	+
Translational elongation factor Ts	+	+
Translational elongation factor G	+	+
Translational release factor 1	+	+
Translational release factor 2	−	+
Translational release factor Gln methylase	+	+
Translational release factor 3	−	+
Ribosome recycling factor	+	+

33/45 tRNAs (see Figure 3)	Set of 29	+
tRNA C34>lysidine synthetase	Unidentified	+
tRNA A34>I deaminase	Unidentified	+
tRNA U34>cmo5U (=V) synthetases: unidentified	−	−
tRNA U34>2sU Cys desulfurase	−	+
tRNA U34>2sU synthetase	Unidentified	+
tRNA U34>cmnm5U GTPase	Unidentified	+
tRNA U34>cmnm5U synthetase	Unidentified	+
tRNA cmnm5U34>nm5U>mnm5U synthetase	Unidentified	−
tRNA G37 N1-methylase	+	+
tRNA A37>t6A N6-threonylcarbamoyl-A synthetase: unidentified	Unidentified	−
tRNA A37>i6A synthetase	−	+
tRNA i6A37>s2i6A>ms2i6A synthetase	−	+
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes)	+ except Gly sub., Gln	+ except Gly sub., Ala
Met-tRNA formyltransferase	+	+

Chaperonin GroEL	+	+
Chaperonin GroES	+	+

151 genes=38 RNAs+113 proteins

Gaps in knowledge are in bold. Left column: chosen gene products and DNA sites. Middle column: relationship to the minimal genome of M. genitalium; clear sequence homolog=‘+’; known enzyme product without an evident sequence homolog=‘unidentified’; no functional homolog=‘−’. Right column: high-resolution, three-dimensional, structural information; >25% of the structure solved=‘+’, <25%=‘−’. The small molecules known to be required are four dNTPs, four NTPs, 20 amino acids, N-5,10-methenyltetrahydrofolate, S-adenosylmethionine and isopentenyl pyrophosphate. Note: a full version listing the nomenclature, database link, length and sequence of each individual product is available in Supplementary Tables S1 and S2.

Several conclusions can be drawn from the provisional list of genes selected for a minimal cell, most of which are attractive when contemplating an MCP. In genomic terms, the list is very short, containing only 151 genes and 113 kbp. All of the genes are derived from E. coli and its bacteriophages (except for the hammerhead RNA from a plant virus; Forster and Symons, 1987), implying that the individual subsystems will be compatible. In contrast to lists derived by comparative genomics or genetic approaches, the biochemically based list does not contain any genes of unknown function or challenging membrane proteins; so it is close to a fully understood, accurately replicating ‘platform’ for life. The few known gaps constitute only about seven genes, all of which are predicted to be for RNA modification (Table I, bold in the left column). From the viewpoint of structural biology, courtesy of recent breakthroughs in ribosome structure determination (Diaconu ; Ogle and Ramakrishnan, 2005), significant three-dimensional information is lacking for only 3% of the products: a few RNA modification proteins and aminoacyl-tRNA synthetases (Table I, right column). While some of the states and complexes remain to be solved at high resolution, a draft three-dimensional structure for any replicating system is a major milestone in the history of biology.
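The gene total quoted from Table I can be checked with a simple tally. The sketch below groups the table rows by subsystem and counts RNAs and proteins separately; DNA sites (Lox, promoter, terminator, RNase P site) are not counted as genes. One labeled assumption is needed to reproduce the stated totals: the single row for the unidentified cmo5U ‘synthetases’ (plural) is counted as two genes. The dictionary names are illustrative only.

```python
# (RNAs, proteins) per group, transcribed from the rows of Table I.
TABLE_I = {
    "DNA replication (phi29 DNA polymerase, Cre)": (0, 2),
    "Transcription (T7 RNA polymerase)": (0, 1),
    "RNA processing (hammerhead RNA, RNase P RNA, RNase P protein)": (2, 1),
    "rRNAs (16S, 5S, 23S)": (3, 0),
    "Small subunit ribosomal proteins": (0, 21),
    "23S rRNA modification enzymes": (0, 6),
    "Large subunit ribosomal proteins": (0, 33),
    "Translation factors (IF1-3, EF-Tu/Ts/G, RF1-3, RF methylase, RRF)": (0, 11),
    "tRNAs": (33, 0),
    # 12 table rows; ASSUMPTION: the cmo5U 'synthetases' row is two genes.
    "tRNA modification enzymes": (0, 13),
    "Aminoacyl-tRNA synthetase subunits": (0, 22),
    "Met-tRNA formyltransferase": (0, 1),
    "Chaperonins (GroEL, GroES)": (0, 2),
}

# Rows flagged 'unidentified' in the left column (the gaps in knowledge),
# again counting the cmo5U row as two genes (same ASSUMPTION as above).
UNIDENTIFIED = {
    "23S rRNA modification enzymes": 4,
    "tRNA modification enzymes": 3,
}

n_rna = sum(rna for rna, _ in TABLE_I.values())
n_protein = sum(protein for _, protein in TABLE_I.values())
n_gaps = sum(UNIDENTIFIED.values())

print(f"{n_rna} RNAs + {n_protein} proteins = {n_rna + n_protein} genes")
print(f"unidentified gene products: {n_gaps}")
```

With that assumption the tally gives 38 RNAs and 113 proteins, matching the 151 genes in the table footer, and seven unidentified products, matching the ‘about seven genes’ cited as gaps.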

Tools

Genes for an MCP could be synthesized using either natural or unnatural gene sequences as starting points. Using natural gene sequences, genes can be readily synthesized by PCR, and large cloned operons of essential genes can be fused using synthetic linkers and homologous recombination. However, gene synthesis by cloning and PCR will soon be more expensive than raw synthesis from synthetic oligodeoxyribonucleotides (oligos). The latter also allows unnatural sequences, such as versions with altered codon bias to adjust mRNA secondary structures (Tian ). Scalability and cost limitations of established methods for gene synthesis from synthetic oligos are now being overcome by oligo synthesis on chips followed by PCR amplification and error correction (Carr ; Richmond ; Tian ; Zhou ).

Biochemical subsystems

Several biochemical subsystems are required to synthesize a minimal cell, and they are reviewed here. For each subsystem, possible examples from natural systems will be compared, gaps in knowledge will be identified and diagnostic and debugging strategies to fill the gaps will be suggested. Mindful of the goal of integration of the subsystems, emphasis is placed on subsystems that are homologous and that operate under standard physiological conditions.

Genome replication

In principle, the genetic material for an MCP could be either DNA or RNA. Although an RNA genome has the advantage of obviating genes for DNA replication, the challenges of preventing inhibitory double-stranded RNA structures and replicative mutations in artificial RNA genomes (Mills ) are unsolved. So the genetic material for an MCP should be DNA. A simple possible scheme for DNA replication that could be completely integrated with biological systems is shown in Figure 2. It shows rolling-circle DNA strand displacement (Zhong ) initiated with RNA transcript primers synthesized in situ by an RNA polymerase. Processing of the resulting double-stranded DNA concatemers into monomeric DNA circles occurs by homologous recombination at Lox sites catalyzed by Cre recombinase (Sauer, 2002). This approach has advantages over existing rolling-circle (Dahl ) or PCR (Mitra and Church, 1999) replication methods, as it requires neither solid-phase oligo synthesis nor changes in temperature, and is far simpler than natural DNA replication systems (Khan, 1997).

Figure 2

A generalizable, physiologically compatible, theoretical scheme for accurate DNA replication and RNA synthesis in vitro. Polymerase movements are illustrated by colored arrowheads. DNA synthesis: a nicked double-stranded DNA circle (middle) undergoes rolling-circle DNA synthesis by coliphage φ29 DNA polymerase (Dahl ) to give an oligomeric single-stranded DNA (bottom, blue). RNA primers (red) then hybridize at two sites to prime lagging strand DNA synthesis (bottom, green). When two Lox sites (bottom, L) are completed, recombination occurs between them catalyzed by coliphage P1 Cre recombinase (black cross) to form a duplicate of the original circular template. RNA synthesis: the circular genetic operon (middle) contains a promoter for T7 RNA polymerase (P), a ribosomal RNA (rRNA) gene, two transfer RNA (tRNA) sequences, a self-cleaving hammerhead sequence (H) and a T7 terminator (T). RNA synthesis from P generates a precursor RNA (top, red) containing three cleavage sites (thin black arrows). The second tRNA sequence merely serves as a recognition site for RNase P cleavage. Cleavages yield the mature rRNA and tRNA1. Any cleavage product containing a 3′ hydroxyl group or primary RNA transcript can serve as a primer for DNA synthesis (bottom, red).

Rolling-circle DNA strand displacement could be engineered in a stepwise manner. First, a simpler version could be constructed in which the T7 RNA polymerase and RNA processing are substituted by addition of short RNA primers to test the effect of multiple initiation sites. The efficiency of synthesis of monomeric DNA circles would be followed by gel electrophoresis (Dahl ), and replication fidelity at the base pair and whole genome levels should be tested with different polymerases. The biggest challenge anticipated is boosting the efficiency of monomeric circular template generation over by-products, such as linear DNAs or oligomeric circles. Such defective by-products would also be replicated and compete for nutrients (like PCR deletion products or defective interfering viruses). Defective by-products potentially could be weeded out with appropriate selection schemes. For example, encapsulation of individual genomes within membranous cells would result in non-viability of cells containing deleted genomes. However, encapsulation would raise new challenges, especially for large genomes. This might be aided by compacting the DNA through addition of DNA gyrase.
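The logic of the replication scheme, and of its linear by-product, can be illustrated with a toy string model. This is a minimal sketch under strong assumptions: the genome is a string, a single character stands in for a Lox site, rolling-circle synthesis is modeled as tandem repetition, and Cre is assumed to recombine every adjacent pair of Lox sites. All function names and the example sequence are hypothetical.

```python
LOX = "L"  # one character standing in for a Lox recombination site


def rolling_circle(template: str, copies: int) -> str:
    """Linear concatemer of `copies` tandem repeats of a circular template."""
    if template.count(LOX) != 1:
        raise ValueError("template must contain exactly one Lox site")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return template * copies


def resolve_with_cre(concatemer: str) -> tuple[list[str], str]:
    """Recombine adjacent Lox sites; return (monomer circles, linear remainder)."""
    parts = concatemer.split(LOX)
    # Each segment flanked by two Lox sites is excised as a circle that
    # retains one Lox site.
    circles = [LOX + body for body in parts[1:-1]]
    # The two outer ends stay joined through the last remaining Lox site.
    linear = parts[0] + LOX + parts[-1]
    return circles, linear


def is_rotation(a: str, b: str) -> bool:
    """True if circular sequences a and b are the same up to rotation."""
    return len(a) == len(b) and a in b + b


template = "AB" + LOX + "CD"  # hypothetical five-element genome
circles, linear = resolve_with_cre(rolling_circle(template, 3))
print(circles, linear)
```

In this caricature a concatemer of n copies yields n-1 monomeric circles plus one linear monomer, which shows why linear DNA is an expected by-product even when recombination itself is complete.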

Transcription

A single RNA polymerase should suffice for an MCP. E. coli's multi-subunit enzyme (Lewin, 2004) or the single polypeptide enzyme encoded by coliphage T7 (Studier ) seems to be the best, with the choice influenced by several considerations that also determine possible modes of regulation. In considering the whole transcription cycle for a minimal replicating system, the simpler, more predictable T7 RNA polymerase is arguably a better starting point than the E. coli RNA polymerase (a detailed comparison is provided in Supplementary information).

RNA processing

A host of RNases cleave precursor RNAs in vivo (Li and Deutscher, 1996) with a complexity that could be reproduced in an MCP. However, inclusion of these RNases comes with the risks of cryptic cleavages, and a simpler approach may be easier to engineer (Figure 2, top). This approach generates all required unadulterated termini: tRNA 5′ and 3′ ends (Forster and Altman, 1990) and, if necessary, the 3′ end of an rRNA. The self-cleaving sequence (Forster and Symons, 1987) is included because precursor tRNAs with substantial 3′ extensions can be poor substrates for RNase P (Li and Deutscher, 1996) and RNA polymerase terminators are inefficient. The efficiency of RNA processing, monitored by gel electrophoresis, could be improved by trying several different precursor-specific sequences.

A minimal translatome

The most complex universal biological machinery is clearly translation. Translation-associated genes (the ‘translatome’) account for a large fraction of cellular genes, 96% of the genes in Table I, and all of the currently predicted gaps in knowledge of an MCP. The eukaryotic version is less attractive for engineering than the bacterial version because it contains some 30 initiation factor proteins and because eukaryotic ribosome assembly in vitro awaits the coordination of more than a hundred non-ribosomal macromolecules (Fromont-Racine ). Of the bacterial systems, Mycoplasma has advantages over E. coli owing to its eight-fold-smaller minimal genome and its simple set of 29 tRNAs that is the only completely characterized set (Andachi ). Unfortunately, other important biochemical information for Mycoplasma is essentially unknown in areas where it is well studied in E. coli (e.g. reconstitution of ribosomes and translation, characterization and functional assays of rRNA modifications, characterization of RNA modification enzymes). Presently, this seems to favor the E. coli translatome for an MCP.

Purified translation

Efficient synthesis of proteins has been reconstituted from purified natural components (Kung ) or recombinant His-tagged translation factors (Shimizu ) from E. coli, but not yet from eukaryotes. The next steps with the E. coli system will be verifying accuracy by mass spectrometry and extending the short lifetime of the batch mode by continuous dialysis (Spirin ). The versatility of the system will become apparent as more mRNAs are translated. If stronger mRNA secondary structures prove inhibitory despite the helicase activity of the ribosome (Takyar ), introduction of an RNA helicase may be helpful. Given that aminoacyl-tRNA synthetases, translation factors and ribosomal proteins are among the most abundant proteins in the cell, it will be important to verify that the purified system can produce high concentrations of all of these proteins.

An in vitro ribosome

The ribosome of choice is from E. coli because, in contrast with its eukaryotic cousins, it has been self-assembled from its purified components (Traub and Nomura, 1968; Nomura and Erdmann, 1970; Nierhaus and Dohme, 1974) and is homologous with the other components of the gene list (Table I). Reconstituted ribosomes have only been assayed by synthesis of phenylalanine polymers from polyU templates (Lietzke and Nierhaus, 1988); so future assays need to test initiation and elongation at non-UUU codons, and also termination. Furthermore, the self-assembly protocol is finicky and non-physiological. In vitro assembly of the 30S subunit under physiological temperatures has been attained recently by adding the DnaK/DnaJ/GrpE chaperone system (Maki and Culver, 2005), although this system is dispensable in vivo (El Hage ). Perhaps addition of natural polyamines might overcome the requirement for an unphysiologically high concentration of magnesium ions. All 54 of the ribosomal proteins have been cloned (Culver and Noller, 1999; Semrad ); the hypothesis that they (and other proteins in Table I) can be synthesized in a purified translation system in active forms warrants testing. rRNA production in a purified system is complicated by post-transcriptional nucleoside modifications. Since 5S rRNA lacks nucleoside modifications and is short, it is not surprising that it is active when transcribed in vitro (Zvereva ). But the other two rRNAs are modified by about 20 enzymes in E. coli, half of which are unidentified. All 11 modifications of the E. coli small subunit 16S rRNA are dispensable for subunit assembly and aminoacyl-tRNA binding (Krzyzosiak ). However, E. coli 23S rRNA lacking its 23 modifications is 30-fold less active than the natural version in N-Ac-Met-puromycin synthesis (Semrad and Green, 2002) due to one to six modifications in a relatively small RNA domain (Green and Noller, 1996). 
The enzymes that catalyze these six modifications are therefore included in Table I, although the two known ones are individually dispensable (Del Campo ). Other bacteria should also be entertained for an MCP, as these six E. coli modifications are not conserved and the unmodified 23S RNAs from two other eubacteria are quite active (Green and Noller, 1999; Khaitovich ).

In vitro tRNAs

Which of the myriad tRNA genes and tRNA modification enzymes are likely to be sufficient to decode all 61 sense codons in an MCP? There are some 85 tRNA genes in E. coli coding for some 45 different tRNAs each bearing post-transcriptional modifications on about 10% of their nucleosides (Supplementary Table S5), and a fifth of the tRNAs still remain to be characterized at the modification level. At least 27 different types of nucleoside modifications are present in E. coli (Bjork, 1995). There are an estimated 40–50 tRNA modification enzymes in E. coli, about half of which remain to be identified. To make matters worse (or more interesting) for an MCP, the roles of the tRNA modifications are controversial. Arguments for choosing essential tRNA modification activities are highly speculative (detailed in Supplementary information). As few as 33 E. coli tRNAs may be sufficient to translate the entire genetic code accurately (Table I, left, and Figure 3). E. coli tRNAs could be substituted with the completely characterized set from Mycoplasma capricolum (Supplementary Table S7), which contains only 14 types of nucleoside modifications (Andachi ), some of which differ from E. coli (Supplementary Table S1). However, the predicted savings in the number of essential tRNAs and modification enzymes are minor (Table I, middle column), and full compatibility with the heterologous E. coli translation apparatus seems unlikely (e.g. the codon UGA in Mycoplasma encodes Trp, not stop).
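The codon counts behind this discussion follow directly from the structure of the genetic code. The short sketch below enumerates all triplets and removes the stop codons, first for the standard code and then for the Mycoplasma code in which UGA is read as Trp; the variable and function names are illustrative.

```python
from itertools import product

BASES = "UCAG"
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

STANDARD_STOPS = {"UAA", "UAG", "UGA"}
MYCOPLASMA_STOPS = {"UAA", "UAG"}  # UGA encodes Trp in Mycoplasma


def sense_codons(stop_codons: set[str]) -> list[str]:
    """All codons that must be read by a tRNA under the given stop set."""
    return [codon for codon in ALL_CODONS if codon not in stop_codons]


print(len(ALL_CODONS))                      # 64
print(len(sense_codons(STANDARD_STOPS)))    # 61
print(len(sense_codons(MYCOPLASMA_STOPS)))  # 62
```

The extra sense codon in the Mycoplasma count is UGA itself, which is one concrete reason why a Mycoplasma tRNA set would not be a drop-in replacement alongside E. coli release factors.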

Figure 3

All nucleoside modifications of all 33 synthetic tRNAs that may be sufficient for accurate translation. Outside (shaded): mRNA codons of the genetic code are illustrated in the standard format, except that the 3′ U and C are switched to simplify depiction of decoding. Inside: tRNA nucleotides 34–37 (from 5′ to 3′) and their cognate amino acids. Nucleotides 34–36 are the anticodons, and the 37th nucleotides are represented by black superscripts. Codon and anticodon positions that base pair with each other are colored similarly. Stop codon specificities of release factor (RF) proteins are included. The portions of the tRNA sequences not shown in the figure are unmodified. Expected modifications of in vitro transcripts by the enzymes in Table I, and expected amino-acid and codon specificities are given. *=unspecified modification, _=unknown modification status, ms2i6A=2-methylthio-N6-isopentenyladenosine, m1G=1-methylguanosine, t6A=N6-threonylcarbamoyladenosine, cmnm5U=5-carboxymethylaminomethyluridine, V=cmo5U=uridine 5-oxyacetic acid, I=inosine, cmnm5s2U=5-carboxymethylaminomethyl-2-thiouridine, k2C=lysidine, S=mnm5s2U=5-methylaminomethyl-2-thiouridine, mnm5U=5-methylaminomethyluridine.

Each in vitro-synthesized nascent tRNA transcript should be modified with different combinations of modification enzymes and tested for efficiency and accuracy of codon recognition in translation, initially in a simplified purified translation system (Forster ). Identification of the unknown modification enzymes is being hastened by bioinformatic and genomic approaches (Soma ). It is also conceivable, although unlikely, that unknown small molecules would need to be identified biochemically for RNA modification (or other reactions). The remaining E. coli tRNA modification enzymes not listed in Table I might be predicted to be dispensable based on available data (Bjork, 1995; Giege ). But given the uncertainties, it may be faster to get to a working near-minimal cell by using every known E. coli modification enzyme. Such a system would be ideally suited for freeing up codons to encode unnatural amino acids: this would be carried out by omission of one or more codons from all mRNAs and omission of their cognate tRNAs.

Post-translation

An MCP must promote correct protein folding and any necessary post-translational amino-acid modifications. Early versions of a purified replicating system will contain cell-derived macromolecules, so establishing that such systems can be completely weaned from cells will require enough rounds of replication for ‘infinite’ dilution of the starting macromolecules. This will test for dependence on folding by chaperones and on post-translational modifications. It is unclear which, if any, chaperones will be necessary, but GroEL/ES (El Hage ; Kerner ) are likely candidates (Table I). The only known examples of required post-translational modifications for the proteins in Table I are the recently discovered methylations of translation release factors 1 and 2 catalyzed by release factor Gln methylase (Table I) (Heurgue-Hamard ; Nakahigashi ). Other possibilities include ribosomal protein acetylations. Mass spectral comparisons between proteins made in the purified system and those made in vivo will expose modifications and also assess fidelity, while the inactivity of a protein of expected mass would suggest a protein-folding deficit and the need for an additional chaperone. Any necessary missing components could be identified biochemically by mixing with fractionated crude extracts or through genetics.
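The number of replication rounds needed for effectively ‘infinite’ dilution can be estimated with a short calculation. The sketch below assumes that each round doubles the system and splits the cell-derived macromolecules evenly, so the expected number of starting molecules per compartment halves every round. The function name and the starting copy number of 1000 are hypothetical and chosen only for illustration.

```python
def rounds_to_dilute(starting_molecules: int) -> int:
    """Smallest number of doublings after which the expected number of
    cell-derived molecules per compartment falls below one.

    ASSUMPTION: even partitioning at every round and no degradation.
    """
    if starting_molecules < 1:
        raise ValueError("starting_molecules must be at least 1")
    rounds = 0
    per_compartment = float(starting_molecules)
    while per_compartment >= 1.0:
        per_compartment /= 2.0  # each doubling halves the inherited pool
        rounds += 1
    return rounds


# Hypothetical example: 1000 cell-derived copies of one component.
print(rounds_to_dilute(1000))  # 10
```

Because the required number of rounds grows only logarithmically with the starting copy number, even abundant seed components would be diluted out within a few tens of doublings under these assumptions.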

Compartments and division

Membranes would allow evolution without serial transfers and purifications, extension of the system to new environments and better modeling of cells. On the other hand, membranous boundaries are unnecessary for directed evolution (Mills ) or, in theory, self-replication. Membranes also restrict applications (e.g. delivery of unnatural aminoacyl-tRNAs, selection schemes based on binding and spatial arraying for nanofabrication). Adding lipids alone to self-replicating macromolecules may be sufficient for encapsulation of the macromolecules within bilayer membrane vesicles, synthetic cell division and transmembranous small molecule transport (Szostak ). The choice of lipids is wide open, but one should not underestimate the challenges involved in working with them (Luisi, 2002) nor the advantages in regulation to be gained by adding membrane-modeling proteins (e.g. pores, transporters and the yet-to-be-discovered complement of cell division proteins; Gitai, 2005).

Integrating the subsystems

How might all of the biochemical subsystems in Figure 1 be combined to generate a self-sustaining system? This is clearly a new level of complexity in comparison with prior self-assembly projects. None of the subsystems described above is complete, yet their selection is based on a reasonable plan for their ultimate integration. The approach again would be stepwise, and there are many possible pathways that could be integrated in parallel (Figure 1). For example, transcription by T7 RNA polymerase couples well with a purified E. coli translation system (Shimizu ). Theoretical integration of DNA synthesis, RNA synthesis and RNA processing was discussed above (Figure 2). These four different subsystems could then be combined to synthesize part of a fifth system (the ribosome) by synthesis of an antibiotic-resistant 16S rRNA and His-tagged versions of all 21 small subunit ribosomal proteins (Tian ). The products of these integrated subsystems could then be assayed for correct in vitro reconstitution of small ribosomal subunits by (i) selecting for resistance of protein synthesis to the antibiotic, and (ii) detecting the presence of tagged proteins in purified small ribosomal subunits by Western blot with anti-His antibodies. As another example, rudimentary vesicles encapsulating replicating systems (e.g. Qβ replicase) were shown to be capable of multiplication (Luisi, 2002). Numerous fine-tuning strategies can be envisioned. Relative strengths of DNA promoters and mRNA ribosome-binding sites for different genes could be modeled on the in vivo strengths, with necessary adjustments of synthetic rates (and thus concentrations of products) achieved by mutations in the binding sites (see Supplementary information on transcription). Additional modules might be useful, such as catabolism (nucleases and proteases), active conversion or removal of waste products (e.g. by energy regenerating enzymes (Supplementary Table S1) or membrane transporters) and regulatory feedback (e.g. excess transcription → excess T7 lysozyme mRNA → excess lysozyme → lysozyme binding to and inhibition of T7 RNA polymerase). Control of macromolecular concentrations will be aided by in silico modeling and design (Tomita ). Given that the subsystems discussed above were selected with integration in mind by choosing physiological reaction conditions and homologous components, and given that additional subsystems could always be borrowed from living cells as needed (e.g. E. coli RNA polymerase (Supplementary Table S1) and regulatory modules such as riboswitches (Isaacs )), it seems likely that this approach will eventually produce synthetic self-replication and ultimately a self-sustaining minimal cell. It is important to note that a minimal cell would be intentionally fragile. For example, the vesicle would be easily lysed and the small molecule feeding mix would be highly specialized (including unstable cofactors such as N-5,10-methenyltetrahydrofolate and S-adenosylmethionine). These built-in safety features will prevent a minimal cell from replicating outside the laboratory. However, some or all of the synthetic genes for an MCP would be intentionally passaged through living cells for construction of recombinant DNA clones and for amplification. Constantly upgraded ethical and safety regulations in place for existing biohazards would also encompass this research (Cho ; http://arep.med.harvard.edu/SBP/Church_Biohazard04c.htm).
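The regulatory feedback loop mentioned above (transcription → T7 lysozyme mRNA → lysozyme → inhibition of T7 RNA polymerase) is the kind of module that in silico modeling could explore. The following is a toy sketch only: the two-variable model, the simple saturable inhibition term and every rate constant are assumptions chosen for illustration, not measured values or the authors' model.

```python
def simulate(k_tx: float = 1.0,   # hypothetical max transcription rate
             k_tl: float = 1.0,   # hypothetical translation rate per mRNA
             d_m: float = 0.2,    # hypothetical mRNA decay/dilution rate
             d_l: float = 0.05,   # hypothetical lysozyme decay/dilution rate
             K: float = 5.0,      # hypothetical lysozyme level giving 50% inhibition
             feedback: bool = True,
             t_end: float = 400.0,
             dt: float = 0.01) -> tuple[float, float]:
    """Integrate mRNA and lysozyme levels with forward Euler steps.

    Returns the (mRNA, lysozyme) levels at t_end in arbitrary units.
    """
    mrna = 0.0
    lysozyme = 0.0
    for _ in range(int(t_end / dt)):
        # Fraction of polymerase left active; lysozyme binding removes the rest.
        active = 1.0 / (1.0 + lysozyme / K) if feedback else 1.0
        d_mrna = k_tx * active - d_m * mrna
        d_lys = k_tl * mrna - d_l * lysozyme
        mrna += d_mrna * dt
        lysozyme += d_lys * dt
    return mrna, lysozyme


m_fb, l_fb = simulate(feedback=True)
m_open, l_open = simulate(feedback=False)
print(f"with feedback:    mRNA={m_fb:.2f}, lysozyme={l_fb:.2f}")
print(f"without feedback: mRNA={m_open:.2f}, lysozyme={l_open:.2f}")
```

With these illustrative constants, the steady state can be solved by hand: without feedback the lysozyme level settles at k_tx·k_tl/(d_m·d_l) = 100, whereas with feedback it settles at 20, showing how the loop caps transcriptional output. Real parameter values would have to come from kinetic measurements on the purified system.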

Completion

In conclusion, a stepwise biochemical approach lends itself to the eventual identification of any remaining functions essential for the synthesis of a minimal cell sustained solely by small molecules. Five states of completion present themselves as tractable goals of an MCP, namely the identification of: (i) the genes listed as missing in Table I; (ii) any additional genes and organization experimentally necessary for minimal cell synthesis; (iii) any dispensable genes; (iv) biochemical parameters and computational models sufficiently detailed to predict the effects of alterations; and (v) the missing three-dimensional structures of the gene products and their relevant complexes. It is difficult to predict how long it will take to debug each of the individual biochemical subsystems or to put them all together, so it is important to bear in mind that there are short-term goals (see the Utility section). Intermediate assembly steps could also be pursued while the gaps in RNA modification knowledge (Table I) are being filled. For example, the project to assemble a ribosome under physiological conditions could be carried out without the missing 23S rRNA modification enzymes (Table I) by substituting in natural 23S rRNA. Similarly, assembly of self-replication in the absence of functional in vitro-synthesized tRNA substrates could be carried out using cellular total tRNA to enable self-replication from substrates (rather than just small molecules) as a major step towards understanding biological self-replication. This would also allow directed evolution of all of the components except the tRNAs in a more flexible manner than is possible in vivo (e.g. for selecting ribosome mutants that incorporate unnatural amino acids more efficiently). The biochemical subsystems necessary for an MCP are central, old fields that have lost impetus. Completion within a decade will only be possible through a coordinated filling of the key gaps in knowledge by the cutting-edge laboratories scattered around the world in these fields. It will also require stimulation of rate-limiting fields. For example, although rRNAs and tRNAs can constitute more than 70% of the dry weight of a cell, half of the estimated 60–70 RNA modification enzymes of E. coli and one-fifth of the tRNAs remain to be characterized (Supplementary Tables S5 and S6), despite the recent completion of about 300 bacterial whole genome sequences. The momentum of genomics and the consequent deluge of computed hypotheses cry out for comparable breakthroughs in experimental tests. Synthetic systems biology projects such as an MCP promise such tests with the added bonus of new applications.

60 in total

Review 1. Artificial cells: prospects for biotechnology.

Authors: Andrew Pohorille; David Deamer
Journal: Trends Biotechnol Date: 2002-03 Impact factor: 19.536

Review 2. Cre/lox: one more step in the taming of the genome.

Authors: Brian Sauer
Journal: Endocrine Date: 2002-12 Impact factor: 3.633

Review 3. Toward the engineering of minimal living cells.

Authors: Pier Luigi Luisi
Journal: Anat Rec Date: 2002-11-01

4. Programming peptidomimetic syntheses by translating genetic codes designed de novo.

Authors: Anthony C Forster; Zhongping Tan; Madhavi N L Nalam; Hening Lin; Hui Qu; Virginia W Cornish; Stephen C Blacklow
Journal: Proc Natl Acad Sci U S A Date: 2003-05-16 Impact factor: 11.205

5. HemK, a class of protein methyl transferase with similarity to DNA methyl transferases, methylates polypeptide chain release factors, and hemK knockout induces defects in translational termination.

Authors: Kenji Nakahigashi; Naoko Kubo; Shin-ichiro Narita; Takeshi Shimaoka; Simon Goto; Taku Oshima; Hirotada Mori; Maki Maeda; Chieko Wada; Hachiro Inokuchi
Journal: Proc Natl Acad Sci U S A Date: 2002-01-22 Impact factor: 11.205

6. The hemK gene in Escherichia coli encodes the N(5)-glutamine methyltransferase that modifies peptide release factors.

Authors: Valérie Heurgué-Hamard; Stéphanie Champ; Ake Engström; Måns Ehrenberg; Richard H Buckingham
Journal: EMBO J Date: 2002-02-15 Impact factor: 11.598

7. Identification and site of action of the remaining four putative pseudouridine synthases in Escherichia coli.

Authors: M Del Campo; Y Kaya; J Ofengand
Journal: RNA Date: 2001-11 Impact factor: 4.942

8. Osmolytes stimulate the reconstitution of functional 50S ribosomes from in vitro transcripts of Escherichia coli 23S rRNA.

Authors: Katharina Semrad; Rachel Green
Journal: RNA Date: 2002-04 Impact factor: 4.942

9. A simplified reconstitution of mRNA-directed peptide synthesis: activity of the epsilon enhancer and an unnatural amino acid.

Authors: A C Forster; H Weissbach; S C Blacklow
Journal: Anal Biochem Date: 2001-10-01 Impact factor: 3.365

Review 10. How many genes can make a cell: the minimal-gene-set concept.

Authors: E V Koonin
Journal: Annu Rev Genomics Hum Genet Date: 2000 Impact factor: 8.929
