
From essential to persistent genes: a functional approach to constructing synthetic life.

Carlos G Acevedo-Rocha, Gang Fang, Markus Schmidt, David W Ussery, Antoine Danchin.

Abstract

A central undertaking in synthetic biology (SB) is the quest for the 'minimal genome'. However, 'minimal sets' of essential genes are strongly context-dependent and, in all prokaryotic genomes sequenced to date, not a single protein-coding gene is entirely conserved. Furthermore, a lack of consensus in the field as to what attributes make a gene truly essential adds another aspect of variation. Thus, a universal minimal genome remains elusive. Here, as an alternative to defining a minimal genome, we propose that the concept of gene persistence can be used to classify genes needed for robust long-term survival. Persistent genes, although not ubiquitous, are conserved in a majority of genomes, tend to be expressed at high levels, and are frequently located on the leading DNA strand. These criteria impose constraints on genome organization, and these are important considerations for engineering cells and for creating cellular life-like forms in SB.
Copyright © 2012 Elsevier Ltd. All rights reserved.

Year:  2012        PMID: 23219343      PMCID: PMC3642372          DOI: 10.1016/j.tig.2012.11.001

Source DB:  PubMed          Journal:  Trends Genet        ISSN: 0168-9525            Impact factor:   11.639


The Holy Grail of SB

The goal of SB is to engineer cells for useful applications, while contributing to our understanding of the origin of life on Earth [1]. A core undertaking in SB has been the quest for the minimal set of genes required to allow cellular life; that is, the ‘minimal genome’ concept (Box 1). The idea is to use the minimal genome as a scaffold onto which genes can be added, followed by its transplantation into a chassis (see Glossary). The final aim is to build upon the chassis through inclusion of specialized genes to create ‘turbo cells’ for applications in the fields of energy production, health, and the environment [2]. These applications require an understanding of minimal genomes onto which these specialized genes can be grafted. Attempts to define the minimal genome in microorganisms have employed both random and targeted reductions to strip down nonessential genes as well as comparative genomics. These approaches have yielded valuable information on basic processes required for life, but they have not been entirely successful in their stated goal, and our understanding of minimal genomes still remains limited [3]. Moreover, the lack of consensus in the field as to how to define gene essentiality further complicates the issue. Here, we suggest using metrics of gene persistence as a constructive way to identify the minimal universal functions that support robust cellular life. We review work combining experiments on gene essentiality with in silico comparative genomics that supports our belief that searching for a universal minimal genome is unproductive, and we highlight the advantages of this new approach. Assembling rationally designed sets of persistent genes should enable the successful engineering of genomes. A deeper analysis of persistent functions also provides an opportunity to explore the evolution of cells from the origin of life to the extant microbial diversity. This has important implications for designing other evolvable synthetic lifeforms.

The elusive minimal genome

For years, scientists have explored ways to define a universal minimal genome. Some efforts have focused on gene mutagenesis experiments, but ‘minimal gene sets’ have remained problematic because these experiments do not take into account gene–environment interactions. Others have centered on comparative genomics, and this allowed scientists to compare genomes from closely or distantly related microorganisms. But as an ever-increasing number of genome projects were completed, the outcome of the comparisons did not improve. As a consequence, even in combination, these approaches failed to provide a universal minimal genome.

The minimal genome in vivo

The first attempts to delineate minimal gene sets arose from experiments meant to identify novel drug targets by determining which genes were essential for the survival of a pathogen. These studies were carried out on libraries of microorganisms using transposons or antisense RNA expression [4], but the ‘minimal set’ outcomes differed widely in terms of gene number (and often identity), not only in distant organisms but also in the same organism under different (and sometimes similar) conditions (Table 1). This environmental context-dependency reflects the existence of many ‘minimal cells’ with different ‘minimal genome’ versions.

Table 1. Minimal gene sets obtained by direct and random experimental mutagenesis

| Microorganism | Minimal gene set (a) | Method | Refs |
|---|---|---|---|
| Cell factories/model organisms | | | |
| Acinetobacter baylyi | 205/499 | DGI (b) | [51] |
| Bacillus subtilis | 217 | DGI | [52] |
| Corynebacterium glutamicum | 658 | RTM (c) | [53] |
| Caulobacter crescentus | 480 | RTM | [54] |
| Escherichia coli | 620 | RTM | [11] |
| | 303 | DGI | [12] |
| | 302 | DGI | [55] |
| Saccharomyces cerevisiae | 1105 | DGI | [56] |
| Human pathogen | | | |
| Bacillus anthracis | 253 | RTM | [57] |
| Francisella tularensis sp. novicida | 396 | RTM | [58] |
| Haemophilus influenzae | 670 | RTM | [59] |
| | 136/358 | RTM | [60] |
| Helicobacter pylori | 255–344 | RTM | [61] |
| Mycobacterium tuberculosis | ∼614 | RTM | [62] |
| Mycoplasma genitalium | 265–350 | RTM | [41] |
| | 382 (d) | RTM | [42] |
| Mycoplasma pulmonis | 321 | RTM | [63] |
| Pseudomonas aeruginosa | 335 | RTM | [64] |
| Salmonella enterica ser. Typhimurium | 257–490 | RTM | [65] |
| Streptococcus pneumoniae | 82 | RTM | [66] |
| Staphylococcus aureus | 71 | RTM | [67] |
| | 351 | RTM | [68] |
| | 150/600 | Random antisense RNA | [69] |
| Vibrio cholerae | 789 | RTM | [70] |

(a) Protein-coding genes.
(b) DGI, direct gene inactivation.
(c) RTM, random transposon mutagenesis.
(d) The study additionally suggested 43 RNA genes, in other words the first minimal gene set of 405 genes including RNA genes.

Targeted methods were also developed to eliminate one gene at a time to generate collections of knockout strains (Table 1). However, the simultaneous elimination of two individually nonessential genes may lead to a lethal phenotype, an outcome commonly known as ‘synthetic lethality’ due to mutually inclusive mutations [5]. Conversely, some genes may be individually essential but, in combination with a second disruption, the first disruption becomes tolerated. This phenomenon was reported in genes of toxin–antitoxin systems due to mutually exclusive mutations [6]. Therefore, mutually inclusive and exclusive mutations preclude the cumulative elimination of dispensable genes in a single strain. The recent quantitative concept of ‘degree of essentiality’ aims at providing a framework to determine synthetic lethal interactions [7]. This approach could then be combined with advanced genome engineering tools to disrupt dispensable genes rationally in a cumulative manner for applications in SB [8].
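The difference between single-gene essentiality and synthetic lethality can be made concrete with a toy model. The sketch below assumes a hypothetical cell that stays viable as long as every required function is still covered by at least one remaining gene; the gene names, the function map, and the `is_viable` rule are illustrative assumptions, not data from the cited studies.

```python
from itertools import combinations

# Hypothetical toy genome: each required function lists the genes able to provide it.
FUNCTION_PROVIDERS = {
    "function_1": {"geneA"},
    "function_2": {"geneB", "geneC"},  # two redundant providers
    "function_3": {"geneD"},
}
GENOME = set().union(*FUNCTION_PROVIDERS.values())


def is_viable(genes):
    """Assumed rule: viable if every required function keeps at least one provider."""
    return all(providers & genes for providers in FUNCTION_PROVIDERS.values())


def single_essential(genome):
    """Genes whose individual deletion is lethal."""
    return {g for g in genome if not is_viable(genome - {g})}


def synthetic_lethal_pairs(genome):
    """Pairs of individually dispensable genes whose joint deletion is lethal."""
    dispensable = sorted(genome - single_essential(genome))
    return [
        (a, b)
        for a, b in combinations(dispensable, 2)
        if not is_viable(genome - {a, b})
    ]
```

In this toy case `geneB` and `geneC` each look dispensable in a single-knockout screen, yet they cannot both be removed, which is why dispensable genes cannot simply be deleted cumulatively.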

The minimal genome in silico

Based on the first two bacterial genomes available, a ‘minimal set’ of 256 genes was identified via comparative genomics, but this number dropped to 63 when 100 genomes were examined [4] and to 0 when 1000 genomes were compared [9]. The number of universally conserved genes, however, often depends on the species of the tree of life chosen. This is illustrated by the fact that two protein-coding (elongation factor and ribosomal protein S12) and two non-coding (16S and 23S rRNAs) genes are conserved in 930 of the available 1000 bacterial genomes, but these genes are not conserved in 70 archaeal genomes [9]. Naturally evolved symbionts with reduced genome sizes were also considered as a way to assess the minimal number of genes required for life, but again this showed little consistency in terms of gene number and identity of essential genes [10]. Indeed, obligatory intracellular symbionts and parasites with host-associated lifestyles have considerably relaxed selection on the maintenance of genes that are not required in their protected environments (e.g., synthesis of essential amino acids or vitamins). Therefore, by definition, ‘minimal genomes’ of symbionts and parasites are ecologically constrained. Combining computational and experimental approaches to define a minimal genome is also problematic. Comparing minimal gene sets characterized by two studies for Escherichia coli K-12 revealed an overlap of only 205 genes out of 620 genes [11] and 303 genes [12]. This discrepancy is due to differences in the interpretation of essentiality based on bacterial growth (slow versus rapid growth) and the method used to generate the mutant strains (targeted deletion versus random transposon insertion). Furthermore, if the gene set in the latter study is compared with other genomes, the following numbers of conserved genes are found: (i) 282 genes (90%) among three E. coli species, (ii) 147 genes (49%) among 20 different enterobacteria, (iii) 85 genes (28%) among 74 proteobacteria, and (iv) 42 genes (14%) among 171 bacteria [12]. Another example compared the minimal sets obtained in vitro and in silico from symbionts and free-living organisms, and only 206 universal genes were identified [13]. However, when the robustness of the metabolic network derived from this minimal set of 206 genes was explored in silico, the results suggested that this set would make a very fragile network, indicating that more genes would be required for a truly sustainable lifeform [14]. The main issue is that minimal cells endowed with a minimal genome are adapted to ideal environments, typically nutrient-rich media and relatively constant temperature. However, the elimination of stress-response genes, determined as dispensable under ideal conditions, results in cell death upon mild changes of temperature or nutrient availability [15]. In addition, the elimination of genes from the toxin/antitoxin or restriction/methylase systems, which are dispensable for simple growth on solid media, would render cells vulnerable to infections by phages or other microorganisms in a natural environment. Thus, minimal cells are fragile and restricted to various ecological niches.
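The shrinking of the strictly conserved gene set follows directly from how it is computed: it is a set intersection, so every added genome can only leave it unchanged or make it smaller. A minimal sketch, assuming each genome is represented as a set of ortholog-family labels (the labels and their presence/absence pattern below are invented for illustration):

```python
from functools import reduce


def strict_core(genomes):
    """Ortholog families present in every genome (plain set intersection)."""
    return reduce(set.intersection, genomes) if genomes else set()


def core_size_curve(genomes):
    """Size of the strict core after adding the genomes one at a time."""
    return [len(strict_core(genomes[: i + 1])) for i in range(len(genomes))]


# Hypothetical ortholog-family content of four genomes.
toy_genomes = [
    {"famA", "famB", "famC", "famD", "famE"},
    {"famA", "famB", "famC", "famD", "famF"},
    {"famA", "famB", "famC", "famF"},
    {"famA", "famB", "famG"},
]

print(core_size_curve(toy_genomes))  # [5, 4, 3, 2]
```

With enough sufficiently diverse genomes the curve can reach zero, which is the behavior reported above for 1000 genomes.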

Genomes as late inventions of cellular life

A major failing of the minimal genome concept is that it assumes a unique origin of cellular life based on the genome of the ‘last universal common ancestor’ (LUCA) [16]. However, recent research based on comparative proteomics hints that LUCA could have given rise to a community of primordial cells, which were in turn the genetic founders of the three domains of life: Archaea, Eubacteria, and Eukarya [17]. Therefore, it is likely that DNA-based genomes may have developed at a late stage of cellular evolution in which the enzymes involved in DNA replication [18], lipid biosynthesis [19], and RNA degradation pathways [20] were invented not once, but multiple times. Taken together, we believe that these results are indicative of the futility of defining a universal minimal genome, in large part because different essential functions depend on highly diverse environmental constraints, and because life does not appear to have evolved around such a basic unit. Thus, the focus should shift away from the universal minimal genome and towards a more robust and general way to reevaluate the essentiality of a gene. Here, we suggest that the concept of gene persistence can be used to assess the in/dispensability of a gene and provide a list of universal functions shared by living cells that should guide future synthetic biologists as they assemble synthetic constructs.

Gene persistence as a metric of functional essentiality

As discussed above, the number of universally conserved genes, assumed to be essential, drops to 0 as more genomes are compared, especially if many different species from different branches of life are considered. Nevertheless, important genes are preserved and passed on, and this is reflected in gene persistence – in other words, in the fact that some genes, although not ubiquitous, are conserved in the majority of genomes and are distributed throughout the tree of life (Figure 1). This indicates that even if a gene ortholog is not found in the genomes of particular microbial clades, another family of genes might encode the corresponding function.
Figure 1

Different criteria for defining universally conserved genes. When 1000 genomes are compared via comparative genomics, the number of orthologous genes falls to 0 (left), but this number can increase to about 500 persistent genes by comparing orthologs that belong to a quorum of similar or different genomes from evolutionarily distinct bacteria, above a threshold computed using a measure that retains frequent genes that tend to cluster together (right).
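The contrast between strict conservation and persistence can be sketched with a simple frequency cutoff. This is a deliberate simplification: the threshold described in the caption also takes gene clustering into account, whereas the hypothetical `quorum` parameter below only counts how many genomes carry a family. The family labels and genome contents are invented.

```python
from collections import Counter


def persistent_families(genomes, quorum=0.5):
    """Families found in at least a `quorum` fraction of the genomes."""
    counts = Counter(fam for genome in genomes for fam in set(genome))
    needed = quorum * len(genomes)
    return {fam for fam, n in counts.items() if n >= needed}


# Hypothetical ortholog-family content of four genomes.
toy = [
    {"famA", "famB", "famC"},
    {"famA", "famB", "famD"},
    {"famA", "famC", "famD"},
    {"famB", "famC", "famE"},
]

strictly_conserved = set.intersection(*toy)      # empty: no family is in all four
persistent = persistent_families(toy, quorum=0.75)  # famA, famB, famC
```

No family is present in all four toy genomes, yet three families pass a 75% quorum, mirroring the left and right panels of the figure.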

When designing synthetic life, the concept of gene persistence can be used as a metric, replacing gene essentiality. Persistent genes can be identified via gene orthology (Box 2) and are defined by several characteristics: they tend to be expressed at high levels [21], and they are preferentially located on the leading DNA strand [22,23]. This has implications for engineering because replication and transcription occur simultaneously on the same DNA molecule; this biased gene distribution helps in avoiding collisions between the respective machineries. Accordingly, gene persistence suggests strong constraints on genome organization, which should be taken into account by engineers designing robust synthetic cells.
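For a design workflow, the leading-strand bias can be checked computationally. The sketch below assumes a circular chromosome replicated bidirectionally from a single origin `ori` to a terminus `ter`, and uses only the gene start coordinate; the function names, coordinates, and gene list are hypothetical. A gene counts as leading-strand when it is transcribed in the same direction as the replication fork that passes over it.

```python
def on_leading_strand(position, strand, ori, ter, genome_length):
    """True if the gene is co-oriented with the replication fork crossing it."""
    clockwise_to_gene = (position - ori) % genome_length
    clockwise_to_ter = (ter - ori) % genome_length
    # Between ori and ter (clockwise) the fork moves toward higher coordinates;
    # on the other replichore it moves toward lower coordinates.
    fork_moves_forward = clockwise_to_gene < clockwise_to_ter
    return (strand == "+") == fork_moves_forward


def leading_fraction(genes, ori, ter, genome_length):
    """Fraction of (position, strand) genes lying on the leading strand."""
    flags = [on_leading_strand(p, s, ori, ter, genome_length) for p, s in genes]
    return sum(flags) / len(flags)


# Hypothetical layout: 1000-unit chromosome, origin at 0, terminus at 500.
genes = [(100, "+"), (200, "+"), (700, "-"), (800, "+")]
fraction = leading_fraction(genes, ori=0, ter=500, genome_length=1000)
```

Here three of the four toy genes are co-oriented with replication, so `fraction` is 0.75; a designer could use such a check to flag highly expressed genes placed on the lagging strand.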

Engineering constraints for genomes and chassis

One way to assess the persistence of a gene is to conceptually reverse-engineer life. By considering the necessary components of life, one can work backwards to identify the functions that have persisted throughout various organisms that have evolved under many different environmental conditions. Once these functions have been identified, they can be assigned to genes based on the criteria used to define persistence (Box 2). We know that life requires at least three interrelated components: (i) genetic program, (ii) metabolism, and (iii) compartmentalization. Although we can synthesize a genetic program from scratch relatively easily [24], our ability to design it de novo is still limited [25]. One challenge lies in understanding the physical constraints of the genome such as organization, codon bias, and conformation [26]. Furthermore, we should not forget that any engineered cellular chassis will need safety valves to control osmotic pressure, transporters for discarding useless metabolic products, and the ability to cope with leftovers resulting from macromolecule degradation, all of which are indispensable functions for cellular maintenance and robust growth [27]. The presence of genes involved in these processes must be ubiquitous. However, like chopsticks and forks, things with the same function do not need to resemble each other. As a case in point, an essential function such as degradation of very short RNA leftovers requires nanoRNases that come from a variety of origins (Orn, NrnA, NrnB, NrnC); sometimes these are considered essential because a unique gene exists in a given organism (orn in E. coli), or apparently nonessential because of functional redundancy (nrnA, nrnB) [28]. If this persistent function were considered nonessential and eliminated from a particular genome, the cells would inevitably age, lose their capacity to generate progeny, and die [29].
An additional important outcome of considering persistent over essential genes in SB is that the former could provide an inventory of essential functions as ‘parts’ to which the corresponding gene sequences could be listed depending on the ‘chassis’. For example, a part could be an essential amino acid, but if a given microorganism lives in a rich environment, the chassis of that organism would have to include a receptor/uptake mechanism. However, if an organism lives in a poor environment, it would have to synthesize the essential amino acid by itself. It is important to mention, though, that the biosynthesis of essential amino acids or coenzymes is highly variable, as illustrated by the MetaCyc database, which contains multiple pathways for NAD and lysine biosynthesis (three and six pathways, respectively) [30]. Thus, gene persistence also takes into account that there are many solutions to the same problem. Finally, analysis of gene persistence also provides a list of universal functions shared by most bacterial genomes. Accordingly, genomes can be divided into two classes: the ‘paleome’ containing persistent genes and the ‘cenome’ composed of nonpersistent genes [31]. This delineation is important for engineering goals because the paleome corresponds to a universal core, whereas the cenome provides accessory functions for particular niches (vide infra).
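The idea of an inventory of functions as ‘parts’, with the implementing genes depending on the chassis, can be sketched as a lookup from function to alternative genes. The nanoRNase alternatives are the ones named in the text; the `PARTS` structure, the function label, the helper names, and the two chassis gene sets are hypothetical and purely illustrative.

```python
# Alternative genes named in the text for one persistent function.
PARTS = {"nanoRNA_degradation": {"orn", "nrnA", "nrnB", "nrnC"}}


def providers(function, chassis_genes, parts=PARTS):
    """Genes in this chassis that can supply the given function."""
    return parts[function] & chassis_genes


def classify(function, chassis_genes, parts=PARTS):
    """Label a function as missing, single-provider, or redundant in a chassis."""
    n = len(providers(function, chassis_genes, parts))
    return "missing" if n == 0 else "single provider" if n == 1 else "redundant"


# Hypothetical chassis gene contents (illustrative only).
chassis_1 = {"orn", "dnaA"}
chassis_2 = {"nrnA", "nrnB", "dnaA"}

label_1 = classify("nanoRNA_degradation", chassis_1)  # "single provider"
label_2 = classify("nanoRNA_degradation", chassis_2)  # "redundant"
```

A single-knockout screen would call the sole provider in the first chassis essential and both providers in the second nonessential, although the function itself is required in both; tracking the function rather than the gene avoids that inconsistency.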

Paleome and cenome

The paleome, or old genome, can be defined as an early archive of the origin of life [31]. It contains persistent genes involved in essential functions related to growth, replication, transcription, translation, maintenance, as well as in aging and senescence [31]. The paleome can be further subdivided into two main functionalities: persistent essential genes that allow cells to sustain life, reproduce, and replicate their DNA, and a set of persistent nonessential genes (as determined experimentally in many cases), which are mainly involved in cellular maintenance and stress response [32]. These ‘dispensable’ genes should be particularly important in SB because their elimination can result in cell death upon environmental fluctuations [15]. Structurally, the paleome is composed of about 500 persistent genes (see borderline genes in [31]) distributed in three clusters: (i) core metabolism and synthesis of amino acids, nucleotides, coenzymes, and lipids, (ii) cell division and aminoacyl-tRNA synthetases, and (iii) transcription and translation [31]. A significant proportion of persistent genes allow cells to adapt by evolving while maintaining important functional elements [29]. Although the paleome will allow survival in an optimal environment, many more genes are required for dealing with natural environments. These nonpersistent genes comprise the cenome, or community genome, a set of genes whose functions are necessary for an organism to exploit particular niches by sensing, moving, or scavenging [31]. These genes tend to move from organism to organism by horizontal gene transfer, which accounts for the fact that they tend to cluster together within the genome [33]. The cenome is extremely variable and differs from strain to strain in a given species. 
Whereas the cenome of a given species is a subset of the corresponding pan-genome of a particular species in a particular niche, the sum of the paleome and all of the cenomes corresponds to the pan-genome of all strains of a given species (Figure 2).
Figure 2

A universe of gene functions. In a particular environment, the sum of all microbial genes corresponds to the metagenome, which is in turn formed by pan-genomes. A pan-genome is the sum of all genomes of similar strains, each having similar (core genome) or distinct (cenomes) sets of nonpersistent genes. About 500 persistent genes form the paleome. As an example, the addition of 1500 nonpersistent genes to the 500 persistent genes of the paleome in E. coli makes a core genome of 2000 genes, whereas the sum of all cenomes of each individual E. coli strain comprises about 18 000 genes [71]. For the time being, the pan-genome of E. coli is composed of roughly 20 000 genes (2000 of the core-genome and 18 000 of the cenomes), the majority of which (80%) is often colocalized on genomic islands [72]. For a particular E. coli strain with a genome of 4500 genes the cenome alone would be about 4000 genes.
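The gene counts in the caption fit together by simple addition and subtraction, written out here as a worked check. The variable names are only labels for the approximate figures quoted in the caption.

```python
# Approximate gene counts quoted in the Figure 2 caption for E. coli.
paleome = 500                # persistent genes
core_nonpersistent = 1_500   # nonpersistent genes shared by all strains
all_cenomes = 18_000         # strain-specific genes summed over all strains
strain_genome = 4_500        # genes in one particular strain

core_genome = paleome + core_nonpersistent   # 2000
pan_genome = core_genome + all_cenomes       # 20000
# In the caption's last sentence, the strain's cenome is everything
# outside the paleome, so it includes the shared nonpersistent genes.
strain_cenome = strain_genome - paleome      # 4000

print(core_genome, pan_genome, strain_cenome)
```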

Concluding remarks

Like the Holy Grail, a universal DNA ‘minimal genome’ has remained elusive despite efforts to define it. This is partially due to the strong context-dependency of essential genes and the likelihood that DNA-based genomes may have developed at a late stage of cellular evolution. Furthermore, many functions may be fulfilled by a variety of gene products, precluding ubiquitous conservation between species. Therefore, gene essentiality has to be defined within the specific context of the bacterium, growth conditions, and possible environmental fluctuations. This presents a bewildering number of conditions to consider, but gene persistence can be used as an alternative because it provides a more general framework for defining the requirements for long-term survival via identification of universal functions. These functions are contained in the paleome, which provides the core of the cell chassis, whereas the cenome corresponds to nonpersistent genes required to explore a particular niche. These concepts are useful for engineering life for a particular context-dependent application: first, identify a specific chassis (i.e., one suited to the specific environmental conditions), then rationally delete nonpersistent (or truly dispensable) functions [8] to leave behind the paleome and a reduced cenome, and finally add particular sets of functions (extracted from known cenomes or metagenomics projects) helpful for the application in question. It is important to note that it should be easier to design synthetic constructs for scaling-up in a fermenter [27], than for applications in the environment [34], because there are more fluctuations in the latter. In this case, experimentally determining whether a gene is persistent would require evaluating the survival of a mutant in laboratory adaptive-evolution experiments [35], where fluctuation of nutrients or changing environmental conditions could take place, for instance, in a chemostat or turbidostat. 
This will also enable bottom-up tinkerers [36] and xenobiologists [37] to evolve other similar synthetic lifeforms.
  72 in total

1.  Genomics. Tinker, tailor: can Venter stitch together a genome from scratch?

Authors:  Carl Zimmer
Journal:  Science       Date:  2003-02-14       Impact factor: 47.728

2.  Genome-wide identification of Streptococcus pneumoniae genes essential for bacterial replication during experimental meningitis.

Authors:  T E Molzen; P Burghout; H J Bootsma; C T Brandt; Christa E van der Gaast-de Jongh; M J Eleveld; M M Verbeek; N Frimodt-Møller; C Østergaard; P W M Hermans
Journal:  Infect Immun       Date:  2010-11-01       Impact factor: 3.441

3.  Life's demons: information and order in biology. What subcellular machines gather and process the information necessary to sustain life?

Authors:  Philippe M Binder; Antoine Danchin
Journal:  EMBO Rep       Date:  2011-05-06       Impact factor: 8.807

4.  Bottom-up synthetic biology: engineering in a tinkerer's world.

Authors:  Petra Schwille
Journal:  Science       Date:  2011-09-02       Impact factor: 47.728

5.  Essence of life: essential genes of minimal genomes.

Authors:  Mario Juhas; Leo Eberl; John I Glass
Journal:  Trends Cell Biol       Date:  2011-09-01       Impact factor: 20.808

6.  Synthetic chromosome arms function in yeast and generate phenotypic diversity by design.

Authors:  Jessica S Dymond; Sarah M Richardson; Candice E Coombes; Timothy Babatz; Héloïse Muller; Narayana Annaluru; William J Blake; Joy W Schwerzmann; Junbiao Dai; Derek L Lindstrom; Annabel C Boeke; Daniel E Gottschling; Srinivasan Chandrasegaran; Joel S Bader; Jef D Boeke
Journal:  Nature       Date:  2011-09-14       Impact factor: 49.962

7.  Genes required for mycobacterial growth defined by high density mutagenesis.

Authors:  Christopher M Sassetti; Dana H Boyd; Eric J Rubin
Journal:  Mol Microbiol       Date:  2003-04       Impact factor: 3.501

Review 8.  Microbial laboratory evolution in the era of genome-scale science.

Authors:  Tom M Conrad; Nathan E Lewis; Bernhard Ø Palsson
Journal:  Mol Syst Biol       Date:  2011-07-05       Impact factor: 11.429

9.  The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

Authors:  Ron Caspi; Tomer Altman; Kate Dreher; Carol A Fulcher; Pallavi Subhraveti; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Lukas A Mueller; Quang Ong; Suzanne Paley; Anuradha Pujar; Alexander G Shearer; Michael Travers; Deepika Weerasinghe; Peifen Zhang; Peter D Karp
Journal:  Nucleic Acids Res       Date:  2011-11-18       Impact factor: 16.971

10.  The essential genome of a bacterium.

Authors:  Beat Christen; Eduardo Abeliuk; John M Collier; Virginia S Kalogeraki; Ben Passarelli; John A Coller; Michael J Fero; Harley H McAdams; Lucy Shapiro
Journal:  Mol Syst Biol       Date:  2011-08-30       Impact factor: 11.429
