Kalliopi Georgiades, Vicky Merhej, Didier Raoult
Unité de Recherche en Maladies Infectieuses Tropical Emergentes, CNRS-IRD UMR 6236-198, Université de la Méditerranée, Marseille, France. didier.raoult@gmail.com
Abstract
Many of the definitions in microbiology are currently false. We have reviewed the major denominations of microbiology and attempted to free microorganisms from the theories of the twentieth century. The presence of compartmentation and a nucleoid in Planctomycetes clearly calls into question the accuracy of the definitions of eukaryotes and prokaryotes. Archaea are viewed as prokaryotes resembling bacteria. However, the name archaea, which suggests an archaic origin and lifestyle, is inconsistent with the lifestyle of this domain. Viruses are defined as small, filterable infectious agents, but giant viruses challenge the size criteria used for the definition of a virus. Pathogenicity does not require the acquisition of virulence factors (except for toxins), and in many cases, gene loss is significantly linked to the emergence of virulence. Species classification based on 16S rRNA is useless for the taxonomy of human pathogens, as a 2% divergence threshold would classify all Rickettsiae within the same species and would not identify bacteria specialized for mammalian infection. Metagenomics helps us to understand evolution and physiology by elucidating the structure, function, and interactions of the major microbial communities, but it neglects minority populations. Finally, Darwin's theory of descent with modification, as represented by the tree of life, no longer matches our current genomic knowledge, because genomics has revealed de novo-created genes and the mosaic structure of genomes; the Rhizome of life is therefore a more appropriate representation.
Keywords:
archaea; bacterial virulence factors; definitions; metagenomics; orphan genes; prokaryotes; tree of life; virus
Post-modern philosophy, also called French theory (Wicks, 2003), holds that most theories, including scientific theories, are based only on meta-narratives expressing the influence of a culture at a given time. Such scientific theories can be questioned when a change in techniques destabilizes them, as postulated by Karl Popper (Popper, 1959; Raoult, 2010a). In addition (and in line with post-modern theory), these theories can also be called into question by an intellectual change of paradigm (Kuhn, 1962). The study of Rickettsiae has long been challenging because these bacteria are very difficult to handle. Moreover, the ancestors of Rickettsiae contributed to the birth of modern eukaryotes by transferring genes to the mitochondrion and the nucleus (Koonin, 2010; Renvoisé et al., 2011). During the explosion of microbial genetics, the study of Rickettsia did not benefit from the Escherichia coli model, and rickettsiologists had to develop alternative approaches that did not rely on the common meta-narratives (Renvoisé et al., 2011). Among these were approaches based on observations of the characteristics of intracellular bacteria, whose genomes and behaviors resemble those of viruses. Thus, Rickettsia was classified as intermediate between viruses and bacteria. Today, the genomic revolution and “multiomics” have made it possible to analyze Rickettsia with many new tools (Bechah et al., 2010), and Rickettsia was among the first bacteria to be sequenced (Andersson et al., 1998; Ogata et al., 2001). This sequencing, and more generally all the work achieved by rickettsiologists, prompted an important revision of the E. coli-centered way of thinking and forced microbiologists to view the general theories concerning bacterial species differently; several theories concerning bacteria have been, or need to be, revised (Georgiades and Raoult, 2011a).
In this work, our goal was to revise the overarching classifications and denominations used in microbiology. In particular, as postulated by post-modern philosophy (Lyotard, 1979; Williams, 1998), the name given to an object constrains its definition, and when the definition is inappropriate, the object cannot be conceived of in a reasonable way.
Definition of Eukaryotes and Prokaryotes
The word “microbe,” literally meaning “small life,” was introduced by the French surgeon Charles Sédillot in 1878 to designate infinitely small living organisms (Vallery-Radot, 1885). One of the most important advances in our understanding of the living world was the realization by the French scientist Edouard Chatton that there are two major groups of organisms, which he named the prokaryotic (bacteria) and eukaryotic (organisms with nucleated cells) types (Chatton, 1925; Stanier and van Niel, 1962; Sapp, 2005). This classification was adopted by Stanier and van Niel (1962), and the prokaryote–eukaryote dichotomy was universally accepted as the natural order of things until the 1970s and the emergence of rRNA phylogenetics (Sapp, 2005). At that time, Woese achieved a comprehensive understanding of bacterial phylogeny using laborious molecular sequencing methods (Woese et al., 1975). These data revealed two separate lineages among prokaryotes: the Archaea (Archaebacteria) and the Bacteria (Eubacteria). The prokaryote/eukaryote system was then replaced by the “three-domain system,” which classifies organisms into Eukarya, Archaea, and Bacteria (Woese, 1994). However, bacteria had always been defined largely in negative terms: they lacked a nucleus, compartmentation, and sexual reproduction (Sapp, 2005). This negative description is unsatisfactory because it does not define what a prokaryote is, only what it is not (Pace, 2006). Furthermore, recent observations of Planctomycetes show that the current definitions of eukaryotes and prokaryotes are erroneous. Planctomycetes is a distinctive phylum of the domain Bacteria whose cells have a structural plan different from that of other prokaryotes: the cells of all cultured and some uncultured species are divided into compartments by one or more membranes (Figure 1).
In addition, in one particular species, Gemmata obscuriglobus, the nucleoid is enveloped in two membranes that form a nuclear body analogous to the eukaryotic nucleus. The existence of these organisms clearly calls into question the accuracy of the current definitions of eukaryotes and prokaryotes (Fuerst, 1995, 2005, 2010). The nucleus of these cells likely resulted from autogenous membrane development in a prokaryotic lineage (Taylor, 1976; Lake and Rivera, 1994; Glansdorff et al., 2008), most likely in Planctomycetes and the closely related Chlamydia (Ward et al., 2000; Horn et al., 2004; Figure 2). This theory has been strengthened by the discovery in Planctomycetes of a nuclear envelope fold topology analogous to that of eukaryotic cells (Fuerst, 2005, 2010). Moreover, all eukaryotes harbor mitochondria, or mitochondria-related genes, inherited from Rickettsiales (Golding and Gupta, 1995; Lang et al., 1999). Therefore, eukaryotes are younger than Rickettsia; their other ancestors are unknown, and there is no evidence that these ancestors had a nucleus. Consequently, the three-domain system, as previously defined, does not hold (Lake, 1988).
Figure 1
Compartmentation in prokaryotes and eukaryotes. The compartmentation of Gemmata obscuriglobus (A) is comparable to that of a eukaryotic cell (B).
Figure 2
Time scale of eukaryogenesis and nucleogenesis. Eukaryotes are not the only organisms with compartmentation. The first eukaryotes emerged from an endosymbiotic event. The first nucleus appeared approximately 3 billion years ago in Planctomycetes and Chlamydia. These numbers are approximations (Bromham and Penny, 2003; Cavalier-Smith, 2004; Trevors and Abel, 2004).
Archaea
When they were identified in the late 1970s on the basis of ribosomal sequences, Archaea were viewed as a group of archaic bacteria (Woese and Fox, 1977). Indeed, because of their capacity for methanogenesis, archaea were assumed to be ancient organisms and received the name “archaea.” This name is misleading, as it implies that these organisms resemble ancient cells and live in specific, “archaic-like” environments. They have long been considered extremophiles found in the most extreme environments (of temperature, salinity, and pH). This explains why archaea have not been extensively studied in clinical microbiology and why their place among living organisms long went unrecognized.
Because of their archaic label, Archaea have been used as models of the early evolution of cellular life forms (Romano and Conway, 1996; Embley and Martin, 2006; Poole and Penny, 2007; Cox et al., 2008). The information processing machineries of archaea are considered ancestral forms of the more complex replication, transcription, and translation machineries of the eukaryotic cell (Gribaldo et al., 2010). Other evolutionary hypotheses about the path of life reject the archaic status of archaea. One of them suggests that the three domains of life evolved from a pre-cellular community containing different types of genes, through a process that led to the fixation of specific subsets of genes in the ancestors of each domain (Woese, 1998).
Considering evidence from molecular sequences, envelope structure, and motility mechanisms, another hypothesis suggests that archaea evolved from Gram-positive bacteria as an adaptation to hyperthermophily or hyperacidity (Cavalier-Smith, 2002) or in response to antibiotic selection pressure (archaea are resistant to a wide variety of antibiotics, which are primarily produced by Gram-positive bacteria; Gupta, 1998a,b, 2000).
Recent results obtained using molecular approaches and metagenomic studies have changed our perspective on the nature and diversity of archaea. Archaea were long considered predominant over bacteria in all extreme environments. This is true for high-temperature environments, as only archaea can thrive at temperatures from 95 to 113°C (Huber et al., 2000). However, in all other environments, species of Bacteria and Eukarya have been found together with archaea (Aravalli et al., 1998; DeLong, 1998; Rothschild and Mancinelli, 2001). Novel archaea have been isolated from a variety of temperate and cold environments (Cavicchioli, 2006), agricultural and forest soils, plankton, freshwater lake sediments, and the deep waters of oceans (Schleper et al., 2005). Archaea seem to constitute a major part of global ecosystems. They were estimated to account for approximately 34% of the total marine biomass of Antarctica (DeLong, 1998) and for nearly 20% of the total marine picoplankton biomass worldwide (Karner et al., 2001). The ubiquitous abundance of archaea and their influence on biogeochemical cycles remain largely unexplored. A recent attempt to infer the ancestral conditions of life suggests that the last common ancestor of archaea was hyperthermophilic and that mesophilic species later adapted to cooler environments (Groussin and Gouy, 2011).
Methanogenic archaea play a paramount role in digestion processes.
Indeed, metagenomic analysis of the gut flora in three healthy individuals established that Methanobrevibacter smithii comprised up to 11.5% of the gut microorganisms (Eckburg et al., 2005). In contrast, although many studies using 16S rDNA sequencing confirmed the presence of M. smithii in the human gut, its prevalence was low, and Methanosphaera stadtmanae was not detected in most cases (Miller and Wolin, 1982; Belay et al., 1988; Dridi et al., in revision). This discrepancy is due to limitations of experimental protocols that are largely designed for the study of bacteria. Recently, our laboratory developed an optimized protocol for the extraction and specific PCR-based detection of M. smithii and M. stadtmanae in DNA from stool samples, using specific primers (Dridi et al., 2009). Using this protocol, it was demonstrated that all individuals carried methanogenic archaea, with a high prevalence of M. smithii (95.5%). The application of this specific approach allowed the isolation of Methanomassiliicoccus luminyensis and its description as a new species (Dridi et al., in revision), and the same protocol can be used to identify other archaeal species (Dridi et al., 2009, 2011). Clearly, previous methods failed to identify Archaea because they were not designed for that purpose.
Molecular experiments and genomic approaches have suggested that the different criteria used to define archaea are not completely valid. The definition currently used for Archaea merely cloaks our lack of knowledge of this domain of life. Undoubtedly, Archaea are not a form of “archaic” bacteria; rather, they represent a distinct evolutionary domain.
Bacterial Virulence Factors
It is not surprising that many people believe that bacteria that are dangerous to us are better armed than non-pathogenic bacteria. Toxins were identified in 1884 and defined as virulence factors; subsequently, early genetic studies on bacterial virulence demonstrated that removing certain genes from pathogenic bacteria decreases their capacity to infect hosts. Therefore, most studies on bacterial virulence, driven by anthropocentric intuition and perspective, concluded, and some still conclude, that non-pathogenic bacteria lack supplementary virulence factors (Lawrence, 1999; Ochman et al., 2005).
An outstanding example of this way of thinking is the Shigella paradigm. Shigella spp. are human pathogens associated with bacillary dysentery, or shigellosis. Shigella dysenteriae causes deadly epidemics in many of the world’s poorest countries. Shigella spp. and E. coli have always been considered closely related, and they have even been placed in the same species (Pupo et al., 2000). However, most E. coli strains are commensals of the human intestine (Maurelli et al., 1998), and Shigella spp. differ from E. coli in lacking certain phenotypic traits, such as extracellular motility and the ability to ferment lactose and many other sugars (Karaolis et al., 1994; Pupo et al., 2000). Like S. dysenteriae, pathogenic enteroinvasive E. coli lack lysine decarboxylase (LDC) activity. In a study by Maurelli et al. (1998), the induction of LDC expression attenuated the virulence of a transformed strain of S. flexneri. It seems plausible that Shigella evolved from the E. coli complex through the acquisition of a plasmid containing critical genes. Plasmids of Shigella spp. have been directly associated with virulence and were even named “virulence plasmids” after their discovery (Hale et al., 1983). Furthermore, actin-based motility initiated by the icsA gene has also been reported to be a virulence factor (Goldberg et al., 1994).
However, virulence increased after massive gene deletions (Maurelli et al., 1998). In conclusion, S. dysenteriae was not found to have more virulence genes than related bacteria (Georgiades and Raoult, 2011a).
Many recent comparative genomics studies have demonstrated that the specialization of bacteria for the colonization of eukaryotic hosts is associated with massive gene loss (Nierman et al., 2004; Merhej et al., 2009a) and with the loss of identified “virulence factors” (Audic et al., 2007). Genomic analysis has revealed that Borrelia recurrentis, the agent of deadly louse-borne relapsing fever, encodes fewer putative virulence factors than Borrelia duttonii (Lescot et al., 2008). Gene loss has also accompanied the evolution of pathogenic Bordetella species (Cummings et al., 2004), and gene deletions in Mycobacterium tuberculosis have resulted in a hypervirulent phenotype (Bokum et al., 2008). Finally, Audic et al. (2007) found the number of putative virulence factors to be higher in water-dwelling bacteria than in any other category of bacteria, including specialized pathogens.
One of the best examples of genome reduction can be found in the epidemic-causing Rickettsia prowazekii, the most dangerous rickettsial species. Genome comparisons of R. prowazekii with the less virulent R. conorii have revealed that the R. prowazekii gene repertoire is a subset of that of R. conorii, with only 834 open reading frames (ORFs) compared to the 1,374 ORFs of R. conorii (Ogata et al., 2001). Although intracellular motility has been considered a virulence factor of Shigella (Goldberg and Theriot, 1995) and Listeria monocytogenes (Tilney and Portnoy, 1989; Mounier et al., 1990), R. prowazekii is completely immobile in the cytoplasm (Teysseire et al., 1992). Actin-based motility in R. conorii and R. rickettsii requires two proteins functioning together, Sca2 and RickA, suggesting that these two proteins could be virulence factors of R. rickettsii. R. typhi possesses only the Sca2 protein and is also mobile in the cytoplasm, but less so than R. conorii (Teysseire et al., 1992; Figure 3). However, neither of these proteins is found in R. prowazekii, which lacks actin-based motility (Kleba et al., 2010). Consequently, motility is not a virulence factor per se but can be part of a virulence repertoire in some pathogens (Georgiades and Raoult, 2011a). Other studies have also demonstrated genome reduction, to a lesser extent, in the extremely successful and fit R. africae, the agent of African tick-bite fever. Contrary to what the virulence-factor paradigm would predict, R. prowazekii and R. africae have the most and the least decayed genomes, respectively, among pathogenic Rickettsiae (Fournier et al., 2009). A comparison of R. africae with R. rickettsii suggested the loss of essential genes in R. rickettsii as a possible factor in the development of virulence (Fournier et al., 2009). In general, pathogenic Rickettsia species lack the “pathogenicity islands” described in other bacterial pathogens (Hacker and Kaper, 2000). It has been suggested that plasmids contain genes encoding proteins responsible for host recognition, invasion, and pathogenicity. The presence of plasmids in Rickettsia species, however, did not show any correlation with virulence (Paddock et al., 2004; Ogata et al., 2005; Blanc et al., 2007a). The examples of Rickettsiae and Shigella spp. show that the terms “pathogenicity islands” and “virulence plasmids” are misleading. The genomic analysis of rickettsial species has revealed that the shift to pathogenicity does not require the acquisition of new genes; in many cases, and not only in Rickettsia, gene loss seems to be implicated in the emergence of virulence (Moran, 1996; Andersson and Kurland, 1998; Andersson and Andersson, 1999; Blanc et al., 2007a; Darby et al., 2007; Merhej et al., 2009a).
In a recent study in our laboratory, we demonstrated that the only features found at higher levels in extremely dangerous bacterial pathogenic species than in closely related less pathogenic species were toxins and toxin–antitoxin modules (TA; Georgiades and Raoult, 2011b).
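The subset relationship between the R. prowazekii and R. conorii repertoires described above can be expressed with simple set operations. The following Python sketch is illustrative only: the ortholog identifiers are hypothetical placeholders and the function name `compare_repertoires` is ours; only the ORF counts (834 and 1,374) come from the comparison cited in the text.

```python
# Sketch: comparing two gene repertoires encoded as sets of ortholog IDs.
# The IDs below are hypothetical placeholders; only the ORF counts
# (834 and 1,374; Ogata et al., 2001) come from the text.


def compare_repertoires(reduced: set[str], reference: set[str]) -> dict:
    """Summarize how a reduced repertoire relates to a reference repertoire."""
    return {
        "is_subset": reduced <= reference,          # no gene unique to the reduced genome
        "lost": sorted(reference - reduced),        # genes absent from the reduced genome
        "gained": sorted(reduced - reference),      # genes found only in the reduced genome
        "retained_fraction": len(reduced & reference) / len(reference),
    }


# Toy repertoires (hypothetical IDs) for a conorii-like and a prowazekii-like genome
conorii_like = {"og1", "og2", "og3", "og4", "og5"}
prowazekii_like = {"og1", "og3", "og4"}

result = compare_repertoires(prowazekii_like, conorii_like)
print(result)
# {'is_subset': True, 'lost': ['og2', 'og5'], 'gained': [], 'retained_fraction': 0.6}

# The same arithmetic with the ORF counts quoted in the text
orfs_conorii, orfs_prowazekii = 1374, 834
print(orfs_conorii - orfs_prowazekii)               # 540
print(round(orfs_prowazekii / orfs_conorii, 3))     # 0.607
```

A strict subset (`is_subset` true, `gained` empty) is the pattern expected from reductive evolution; the raw difference of 540 ORFs is only a rough proxy for gene loss, because ORF predictions depend on annotation choices.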
Figure 3
Motility is not necessarily a virulence factor.
(A) Rickettsia conorii is mobile in the cytoplasm and in the nucleus and moves quickly. Actin-based motility is associated with the Sca2 and RickA proteins (red and blue circles, respectively). Yellow stars indicate that the bacterium moves fast in the cytoplasm; dotted lines show that the bacterium can be found anywhere in the cytoplasm and even in the nucleus.
(B) Rickettsia typhi is also mobile in the cytoplasm, but it moves less quickly than R. conorii. Its motility is associated with the Sca2 protein.
(C) R. prowazekii is completely immobile in the cytoplasm. The Sca2 and RickA proteins are absent.
In conclusion, except for toxins and TA modules, which have a direct effect and are indeed virulence factors, other products named “virulence factors” are, in reality, associated with fitness in a given genomic context and a specific environment, including the experimental models in which they were tested. Comparative genomics has shown that pathogenic bacteria have smaller genomes than non-specialized bacteria. Therefore, it is not possible to say that supplementary virulence factors establish pathogenicity; rather, the overall gene repertoire is more strongly associated with virulence than are specific genes. In a recent study, the deletion of four different gene clusters in fungi attenuated their virulence in plants, while deletion of the “divergence cluster 8–12” (a region encoding effector genes with low sequence conservation) caused a hypervirulent fungal phenotype (Schirawski et al., 2010). Thus, a virulent gene repertoire is composed of both present and absent genes. The term “virulence factor” seems to be invalid, and we propose that it should be abandoned.
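The idea that a virulent repertoire is defined by absent as well as present genes can be made concrete with a small data structure. This Python sketch is a hypothetical model, not a published method: the `Repertoire` class is ours, genome contents are placeholders, and we assume cadA, the E. coli gene encoding lysine decarboxylase, as the absent gene, following the LDC example discussed for Shigella; icsA is the motility gene mentioned in the text.

```python
# Sketch (illustrative model, not a published method): a gene repertoire
# defined by genes that must be present AND genes that must be absent.
# icsA and LDC activity come from the text; cadA as the LDC gene is an
# assumption; genome contents are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Repertoire:
    present: frozenset[str]  # genes the genome must carry
    absent: frozenset[str]   # genes whose loss is part of the repertoire

    def matches(self, genome: set[str]) -> bool:
        # Match only if every required gene is present and no excluded gene is.
        return self.present <= genome and self.absent.isdisjoint(genome)


shigella_like = Repertoire(present=frozenset({"icsA"}), absent=frozenset({"cadA"}))

commensal = {"cadA", "geneX"}           # hypothetical commensal E. coli
pathogen = {"icsA", "geneY"}            # hypothetical Shigella-like genome
ldc_restored = pathogen | {"cadA"}      # same genome with the LDC gene reintroduced

for name, genome in [("commensal", commensal),
                     ("pathogen", pathogen),
                     ("LDC restored", ldc_restored)]:
    print(name, shigella_like.matches(genome))
# commensal False
# pathogen True
# LDC restored False
```

Reintroducing the absent gene breaks the match, which mirrors, in a toy way, the attenuation observed when LDC expression was induced in S. flexneri; the model encodes the repertoire but says nothing about mechanism.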
Phylogeny and Taxonomy
Biological dogma states that taxonomy reflects phylogeny. Indeed, the 16S rRNA sequence has been widely used for the description of many newly classified bacterial species (Rosello-Mora and Amann, 2001; Drancourt et al., 2004; Roux et al., 2004). A 16S rRNA divergence of 1–2% is considered to correspond to approximately 50 million years of divergence (Ochman et al., 1999; Ogata et al., 2001), and a 16S rRNA similarity below the cut-off of 98.7% indicates a new species (Stackebrandt and Ebers, 2006). However, ribosomal DNA sequence identity cannot guarantee an accurate delineation of bacterial species and often leads to misleading species definitions (Fox et al., 1992; Rosello-Mora and Amann, 2001). In some cases, Bartonella henselae has two copies of the 16S rRNA gene, which likely emerged through recombination (Sanogo et al., 2003) and differ by more than 1.3% (Viezens and Arvand, 2008). The presence of multiple copies of the 16S rRNA gene has been documented for several bacterial species (Acinas et al., 2004). Although these copies are generally identical or nearly identical within an organism, in some cases they are divergent enough to inflate estimates of the number of bacterial species. Such overestimation can be seen in the case of Delisea pulchra strains, whose 16S rRNA gene copies were used to illustrate the effects of 16S rRNA heterogeneity in the marine bacterial community (Dahllöf et al., 2000; Adékambi et al., 2008). The use of 16S rRNA for such analyses is limited by its inherent heterogeneity (Dahllöf et al., 2000). Moreover, when the 16S rRNA-based molecular clock is used as a species definition criterion, bacteria specialized within mammalian hosts are not defined as species (Georgiades and Raoult, 2011a). Species definition cannot be based on the percent divergence of 16S rRNA, because bacteria with a divergence of less than 1.3% correspond to bacterial complexes rather than species (Doolittle and Papke, 2006).
There are 9,000 validated bacterial species but 1.5 million eukaryotic species, even though the biomass of bacteria is comparable to that of eukaryotes; this suggests that the 16S rDNA sequence is not suited as a taxonomic tool for defining species. Furthermore, genomic content is not represented by phylogeny. In studies comparing the genomic content of bacteria with different lifestyles, discrepancies between taxonomy and gene content were observed (Audic et al., 2007; Merhej et al., 2009b). The phylogenomic analysis yielded a tree similar to the one produced using the 16S rDNA gene sequence. However, γ-proteobacteria were divided into three groups, confirming that these species were more similar to each other in terms of gene content than to their close phylogenetic relatives (Audic et al., 2007). Similarly, rickettsial species and their relatives, such as Wolbachia and Ehrlichia species, form an α-proteobacterial clade characterized by small genomes; this clade is distantly related to other α-proteobacterial species with larger genomes (Moran, 2002).
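The 98.7% cut-off and the problem of divergent intragenomic copies can be illustrated with a few lines of Python. This is a minimal sketch assuming pre-aligned sequences; the function names `percent_identity` and `same_species_by_16s` and the toy sequences are hypothetical, and real analyses use curated alignments rather than this simplified gap handling.

```python
# Sketch: percent identity between two pre-aligned 16S rRNA sequences,
# compared with the 98.7% similarity cut-off (Stackebrandt and Ebers, 2006).
# Assumptions: sequences are already aligned to equal length with '-' gaps;
# columns that are gaps in both sequences are skipped, and a gap opposite a
# base counts as a mismatch. The toy sequences are hypothetical.

CUTOFF = 98.7  # % identity; below this, two sequences suggest distinct species


def percent_identity(a: str, b: str) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for either sequence
        compared += 1
        matches += int(x == y)
    if compared == 0:
        raise ValueError("alignment contains only gap columns")
    return 100.0 * matches / compared


def same_species_by_16s(a: str, b: str) -> bool:
    return percent_identity(a, b) >= CUTOFF


# Two hypothetical 16S copies from ONE genome: 100 aligned columns, 2 differences
copy1 = "ACGT" * 25
copy2 = copy1[:98] + "AA"   # positions 98-99 change from "GT" to "AA"

print(percent_identity(copy1, copy2))      # 98.0
print(same_species_by_16s(copy1, copy2))   # False
```

Both sequences come from a single hypothetical genome, yet the cut-off assigns them to different species: this is the kind of overestimation that divergent copies, such as those of B. henselae, can cause.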
Phylogenetic analysis of Rickettsia species based on 16S rRNA sequences has frequently been performed; however, no significant inferences about intragenus phylogeny are possible because the sequences are almost identical (Roux and Raoult, 1997). In fact, the official molecular criteria used to classify bacteria into a species (DNA/DNA hybridization >70%, GC content difference <5%, and 16S rRNA divergence <1.3%) cannot be applied to Rickettsia species. A criterion of 16S rRNA divergence <2% alone would classify all Rickettsiae within the same species (Fournier and Raoult, 2009). Furthermore, based on this criterion, bacteria specialized to mammalian hosts are not defined as species (Georgiades and Raoult, 2011a). Homo sapiens emerged approximately 250,000–400,000 years ago, while the first human-specialized pathogenic bacterial species, M. tuberculosis, emerged much later, only 20,000 years ago (Wirth et al., 2008). At a rate of 1–2% per 50 million years, lineages that diverged within such time spans cannot have accumulated anywhere near the 1.3% 16S rRNA divergence that the criterion requires. For organisms such as archaea, bacteria, and some unicellular eukaryotes, species trees and gene trees show little agreement on an evolutionary scale (Bapteste et al., 2009), because individual gene histories can differ from the history of a species. Over the past 15 years, lateral inheritance (as opposed to vertical inheritance) has been shown to be a major evolutionary force in microorganisms (Bapteste and Boucher, 2008). Examples of extensive chimerism and lateral gene transfer (LGT) across prokaryotes are common, and it is plausible that every gene in prokaryotes has been affected by LGT at some point in evolutionary history (Bapteste et al., 2009). With this in mind, whole gene content, including both present and absent genes, should be taken into consideration when searching for a reliable species classification (Figure 4).
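The argument that host-specialized bacteria cannot be delineated by 16S rRNA can be checked with a worked calculation. The sketch below assumes a constant clock of 1% pairwise divergence per 50 million years, the lower end of the 1–2% calibration cited in this section; the function names and the DNA/DNA hybridization value are illustrative assumptions.

```python
# Sketch: expected 16S rRNA divergence under a constant molecular clock.
# Assumption: pairwise divergence accumulates at 1% per 50 million years,
# the lower end of the 1-2% per ~50 Myr calibration quoted in the text.
# The species thresholds are the official criteria listed in the text.

RATE_PERCENT_PER_MYR = 1.0 / 50.0  # assumed % divergence per million years


def expected_divergence(years: float, rate: float = RATE_PERCENT_PER_MYR) -> float:
    """Expected % 16S divergence between two lineages that split `years` ago."""
    return rate * years / 1e6


def meets_species_criteria(ddh: float, gc_diff: float, div_16s: float) -> bool:
    """True if two strains fall in one species: DNA/DNA hybridization > 70%,
    GC content difference < 5%, and 16S rRNA divergence < 1.3%."""
    return ddh > 70 and gc_diff < 5 and div_16s < 1.3


for label, years in [("since Homo sapiens emerged (upper estimate)", 400_000),
                     ("since M. tuberculosis emerged", 20_000)]:
    d = expected_divergence(years)
    print(f"{label}: {d:.4f}% (reaches 1.3%: {d >= 1.3})")

# With a hypothetical DNA/DNA hybridization of 85% and identical GC content,
# a lineage that split 400,000 years ago is not separable as a distinct species.
print(meets_species_criteria(ddh=85, gc_diff=0.0, div_16s=expected_divergence(400_000)))
```

At 0.008%, a gene of about 1,500 nucleotides would be expected to carry roughly 0.1 substitution; doubling the rate to 2% per 50 million years still leaves the divergence roughly 80-fold below the 1.3% threshold.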
Figure 4
Phylogenomic tree based on whole gene content (present/absent genes) in pathogenic and non-pathogenic Escherichia coli. Two clusters are formed: one for pathogenic strains (in red) and one for non-pathogenic strains (in blue). Pathogenic strains are divided into five groups: EPEC, enteropathogenic; EHEC, enterohemorrhagic; UPEC, uropathogenic; ETEC, enterotoxigenic; EAEC, enteroaggregative.
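Figure 4 groups strains by whole gene content rather than by a single marker gene. A minimal Python sketch of that idea follows, assuming Jaccard distance on gene presence/absence and average-linkage clustering; this pairing is one common choice and not necessarily the method used for the figure, and all strain names and genes are hypothetical.

```python
# Sketch: clustering genomes by whole gene content (presence/absence).
# Assumptions: Jaccard distance plus average-linkage (UPGMA-style) clustering
# is one common choice, not necessarily the method behind the figure; the
# strain names and gene sets are hypothetical.
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |shared genes| / |genes present in either genome|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage_tree(genomes: dict[str, frozenset]):
    """Merge the two closest clusters until one remains; return nested tuples."""
    members = {name: [name] for name in genomes}  # cluster id -> strain names
    trees = {name: name for name in genomes}      # cluster id -> subtree

    def distance(c1: str, c2: str) -> float:
        # Mean Jaccard distance over all cross-cluster strain pairs
        pairs = [(x, y) for x in members[c1] for y in members[c2]]
        return sum(jaccard_distance(genomes[x], genomes[y]) for x, y in pairs) / len(pairs)

    while len(members) > 1:
        c1, c2 = min(combinations(sorted(members), 2), key=lambda p: distance(*p))
        new_id = f"({c1},{c2})"
        members[new_id] = members.pop(c1) + members.pop(c2)
        trees[new_id] = (trees.pop(c1), trees.pop(c2))
    (root,) = trees.values()
    return root


genomes = {
    "path_A": frozenset({"g1", "g2", "t1"}),
    "path_B": frozenset({"g1", "g2", "t1", "g3"}),
    "comm_A": frozenset({"g1", "g4", "g5", "g6"}),
    "comm_B": frozenset({"g1", "g4", "g5", "g6", "g7"}),
}
tree = average_linkage_tree(genomes)
print(tree)  # (('comm_A', 'comm_B'), ('path_A', 'path_B'))
```

Absent genes matter here: the denominator counts every gene carried by either genome, so a gene lost in one strain increases its distance to the strains that kept it.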
Definition of a Virus
The discovery of giant viruses with large genomes has raised many questions about the definition and evolution of viruses. According to Lwoff, viruses have typically been defined as “filterable infectious agents” smaller than 200 nm that are unable to undergo binary fission and have one type of nucleic acid with few protein-encoding genes (Lwoff, 1957). Giant viruses, such as mimivirus (Raoult et al., 2004, 2007) and mamavirus (La Scola et al., 2008), challenge the size criterion used for the definition of a virus. These viruses, with an icosahedral capsid diameter of nearly 400 nm, have particle sizes comparable to those of bacteria such as Mycoplasma (La Scola et al., 2003; Raoult et al., 2004). Mimivirus possesses a large double-stranded DNA genome (1,181 kb). The mimivirus genome has 1,262 putative ORFs, of which 911 are predicted to be protein-coding genes, and 298 could be associated with functional attributes (Raoult et al., 2004). Mamavirus has a slightly larger genome than mimivirus (1,200 kb), and 99% of its predicted genes are orthologous to mimivirus ORFs (Colson and Raoult, 2010). The concept of the small particle (and genome) that once defined viruses is no longer valid.
The discovery of large viruses prompted a re-evaluation of the commonly used viral isolation methods and consideration of the role played by amebae as a source of giant viruses. Because amebae ingest any particle larger than 100 nm, these phagocytes represent a potential source of giant viruses with chimeric repertoires (Raoult and Boyer, 2010). Indeed, another virus, marseillevirus, has recently been isolated from amebae. It has a diameter of 250 nm and a genome size of 368,454 bp (Boyer et al., 2009). Mimivirus, mamavirus, and marseillevirus all belong to the group of nucleo-cytoplasmic large DNA viruses (NCLDVs; Iyer et al., 2006; Boyer et al., 2009), mimivirus and mamavirus being members of the family Mimiviridae.
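Lwoff's size bound can be applied mechanically to the giant viruses discussed above. This short Python sketch uses the approximate diameters and genome sizes quoted in the text; treating 200 nm as a strict threshold is our simplification of a historical definition, and the variable names are illustrative.

```python
# Sketch: applying Lwoff's size bound (particles smaller than 200 nm) to the
# giant viruses described above. Sizes are the approximate values quoted in
# the text; treating 200 nm as a hard cut-off is an illustrative simplification.

LWOFF_MAX_NM = 200

viruses = {
    # name: (approximate particle or capsid diameter in nm, genome size in kb)
    "mimivirus": (400, 1181),
    "marseillevirus": (250, 368.454),
}

for name, (diameter_nm, genome_kb) in viruses.items():
    within_bound = diameter_nm < LWOFF_MAX_NM
    print(f"{name}: ~{diameter_nm} nm, {genome_kb} kb genome, "
          f"meets Lwoff size criterion: {within_bound}")
```

Both viruses fall outside the bound, which is the sense in which giant viruses break the size-based definition.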
Genomic analysis of the giant viruses showed that only 4.6% and 11.2% of the ORFs of mimivirus and marseillevirus, respectively, had homologs in the NCLDV core gene set. Hence, the majority of the genome is lineage-specific. In addition to the core genome, the gene repertoire of these ameba-associated viruses contains duplicated genes, ORFans, and genes likely acquired through lateral gene transfer (LGT). Indeed, a substantial proportion of the genome exhibits sequence similarity to gene homologs found in bacteria, archaea, eukaryotes, and viruses (Raoult et al., 2004). Using phylogenetic analyses, a bacterial or bacteriophage origin has been inferred for 49 genes and a eukaryotic origin for 85 genes of the marseillevirus genome (Boyer et al., 2009). Likewise, 60 genes from the mimivirus genome had reliable homologs in cellular species and seemed to have been acquired from eukaryotes, especially from amebae (Moreira and Brochier-Armanet, 2008). These chimeric gene contents may have resulted from LGT involving the eukaryotic host (ameba) and sympatric bacteria and viruses. Amebae may serve as a genetic mixing bowl from which giant viruses have gathered a complex set of genes, leading to large chimeric genomes (Raoult and Boyer, 2010).
The genomes of giant viruses help to elucidate their origin and early evolution. The position of viruses within the tree of life (TOL) has been a subject of disagreement. Indeed, the classification of organisms into a universal TOL based on ribosomal RNA sequences (Pace, 2006) necessarily excludes viruses, which lack ribosomes. As acellular organisms, viruses were intentionally not represented with other living ribosome-encoding organisms in the TOL (Moreira and Lopez-Garcia, 2009). Like other viruses, the mimivirus genome contains genes for replication.
Surprisingly, the mimivirus genome also contains genes encoding components of the translation machinery never before found in viruses, including four aminoacyl-tRNA synthetases, peptide release factor 1, translation elongation factor EF-Tu, and translation initiation factor 1 (Raoult et al., 2004). These genomic features have prompted a reappraisal of the definition of living beings (Raoult and Forterre, 2008) and of the evolutionary significance of viruses. Phylogenetic analysis of mimivirus proteins with closely related eukaryotic homologs supports the view that mimivirus represents a fourth domain of life alongside bacteria, archaea, and eukaryotes (Raoult et al., 2004). Although a whole organism cannot be represented in a classic TOL, some genes, including DNA-processing genes, do allow this history to be traced. A further genomic study indicated that the NCLDVs emerged early and that their core genome is as ancient as the three currently accepted domains of life (Boyer et al., 2010a). These findings confirm earlier hypotheses that viruses may be at the origin of many eukaryotic genes (Villarreal and De Filippis, 2000; Forterre, 2006) and might have contributed to the formation of the nucleus (Bell, 2001; Takemura, 2001). The study of giant virus genomes thus sheds light on the origin of eukaryotes and highlights the possible role of capsid-encoding organisms in the evolution of ribosome-encoding organisms.
Metagenomics and Microbial Diversity
The study of many species is difficult or even impossible, mainly because we are unable to culture them in the laboratory (Zengler et al., 2005). Metagenomics, the culture-independent genomic analysis of an assemblage of organisms, allows us to study microorganisms by deciphering the genetic information in DNA extracted directly from environmental microbial communities. Metagenomics has revealed that the vast majority of microbial diversity has been missed by cultivation-based methods (Handelsman, 2004; Riesenfeld et al., 2004; Eckburg et al., 2005). Indeed, approximately 10% and 60% of the sequences from environmental microbial and viral metagenomes, respectively, are novel, with no significant similarity to any sequence in the GenBank non-redundant database (Tyson et al., 2004; Venter et al., 2004; Edwards and Rohwer, 2005). Our knowledge has thus been gleaned from the relatively small number of currently culturable representatives, while the “uncultured majority” has been ignored (Hugenholtz et al., 1998).

Metagenomics has offered unprecedented insights into microbial diversity and sparked a revolution in microbiology. Historically, microbiology has focused on single species in pure laboratory cultures, so the understanding of microbial communities has lagged behind that of their individual members. In addition, only limited information about physiology and functional roles can be gained from microbes in culture. Metagenomics is a new tool for studying microbes in the complex communities in which they live and for beginning to understand how these communities work. Because it relies on high-throughput sequencing, metagenomics recovers large portions of genomes, providing access to protein-coding genes and biochemical pathways. It focuses on profiling the functions encoded by a microbial community in a given environment rather than the types of organisms that encode them.
Analysis of the genomic content of communities sheds light on the metabolic variability of an environment and on specific physiological functions (Eckburg et al., 2005; Dinsdale et al., 2008). Metagenomic studies of the pathogen-associated microbiome have helped clarify the role of microbial communities and their clinical implications (Gill et al., 2006; Ley et al., 2006; Turnbaugh et al., 2006; Willner et al., 2009). Information from metagenomic libraries can enrich our knowledge and has applications in industry, therapeutics, and environmental sustainability.

Metagenomic approaches have also provided insights into environmental features with important evolutionary implications. Functional metagenomic analyses of ecosystems have revealed correlations between geochemical conditions, metabolic capacity, and genetic diversity in microbial communities (Edwards et al., 2006; Frias-Lopez et al., 2008; Simon et al., 2009). Sequencing projects also provide a means of sampling the genetic diversity of natural microbial populations, for example by estimating recombination rates, and have the potential to reveal much about the evolution of these populations (Johnson and Slatkin, 2009). Moreover, this gene-centric approach to environmental sequencing suggests that the functional profile predicted from a community’s environmental sequences resembles that of other communities whose environments impose similar demands. These findings have provided insight into the processes of adaptation and the evolution of life on earth.

Metagenomics is nevertheless a powerful tool for accessing the abundant biodiversity of environmental samples, but its accuracy is constrained by several limitations.
Technical and economic constraints limit the depth of analysis needed to obtain a representative picture of microbial and viral communities, their metabolic profiles, and their adaptation dynamics (Morgan et al., 2010). Large-scale sequencing of metagenomic DNA mainly recovers the predominant species in an environment, whereas sequences from low-abundance species may go undetected. As a result, only the most frequently represented functional genes and metabolic pathways in a given ecosystem can be identified and assessed. Yet low-abundance species and the functions they encode may also play a critical role in the ecology and physiology of the environment under study (Piganeau and Moreau, 2007).

In conclusion, metagenomics has shown that the uncultured microbial world far outnumbers the cultured world and has emphasized the extent of our ignorance about microbial life. Metagenomics has helped elucidate the structure, function, and interactions of microbial communities, advances that were not possible in the culture-dependent era. It constitutes a comprehensive approach for understanding evolution and physiology.
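The limitation discussed above, that sequences from low-abundance species may go undetected, can be illustrated with a simple calculation. The sketch below assumes, as a deliberate simplification, that each sequencing read is an independent draw from the community, so a species at relative abundance p contributes at least one of N reads with probability 1 - (1 - p)^N. It ignores genome size, extraction and amplification bias, and the need for several reads to identify a taxon reliably, all of which would lower real detection rates. The function names are illustrative and do not come from any metagenomics package.

```python
import math


def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Probability that a taxon at relative abundance p yields at least one of n reads.

    Simplifying assumption: reads are independent draws from the community,
    so the number of reads from the taxon follows binomial(n_reads, rel_abundance).
    """
    if not 0.0 <= rel_abundance <= 1.0:
        raise ValueError("relative abundance must lie in [0, 1]")
    if n_reads < 0:
        raise ValueError("number of reads must be non-negative")
    return 1.0 - (1.0 - rel_abundance) ** n_reads


def reads_needed(rel_abundance: float, target: float = 0.95) -> int:
    """Smallest number of reads giving at least `target` probability of one hit."""
    if not 0.0 < rel_abundance <= 1.0:
        raise ValueError("relative abundance must lie in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target probability must lie in (0, 1)")
    if rel_abundance == 1.0:
        return 1
    # Solve 1 - (1 - p)^N >= target for N: N >= log(1 - target) / log(1 - p).
    # log1p(-p) computes log(1 - p) accurately when p is very small.
    return math.ceil(math.log(1.0 - target) / math.log1p(-rel_abundance))


if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        print(f"abundance {p:g}: P(detect | 10,000 reads) = "
              f"{detection_probability(p, 10_000):.3f}; "
              f"reads for 95% detection = {reads_needed(p):,}")
```

Under these assumptions, a taxon represented by 1 in 10,000 reads has only about a 63% chance of contributing even a single read to a 10,000-read sample, which is one way to see why minority populations are easily missed.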
Orphan Genes
Orphan genes (ORFans) constitute a class of lineage-specific genes that show no homology to sequences in other species (Fischer and Eisenberg, 1999). They typically encode small proteins and show high non-synonymous substitution rates, and their functions are generally unknown (Domazet-Loso and Tautz, 2003; Daubin and Ochman, 2004). A classification has recently been proposed that divides ORFans into singletons, multipletons, and lineage ORFans (Boyer et al., 2010b). Each newly sequenced genome contains significant numbers of such genes (Toll-Riera et al., 2009). For example, across 60 fully sequenced microbial genomes, 14% of genes were species-specific orphans (Siew and Fischer, 2003), while 18% of genes in Drosophila are restricted to the Drosophila group (Zhang et al., 2007; Zhou et al., 2008). The origin of orphan genes, however, remains a mystery (Merkeev and Mironov, 2008). One proposed scenario is that they derive from gene duplication events in which one copy accumulated so many sequence changes that its similarity to the ancestral gene is no longer detectable (Domazet-Loso and Tautz, 2003). It has also been proposed that some ORFans represent genes of viral or plasmid origin (Rocha et al., 2006), and some appear to be truly new genes formed de novo through diverse mechanisms of gene evolution (Boyer et al., 2010b). De novo formation has been proposed to make a significant contribution to the emergence of novel genes in mammals, particularly in primates, in which 5.5% of orphan genes may have originated de novo from non-coding genomic regions (Toll-Riera et al., 2009). De novo gene formation has also been described in Drosophila (Begun et al., 2006; Levine et al., 2006; Zhou et al., 2008) and Saccharomyces cerevisiae (Cai et al., 2008).

In a recent study in our laboratory, we identified a small number of gene sequences in Rickettsia species that had no match in any database and that seem to have resulted from de novo creation (Georgiades et al., 2011). Specifically, 17 rickettsial gene sequences had no homologs in the NR database, and their Ka/Ks ratios indicated that 15 of them were either non-functional or had acquired a function. Pseudogenization, or even a viral origin, cannot be excluded for these genes; however, because they were not found in regions bearing traces of active or ancient integrated extra-chromosomal elements, we strongly believe that they are novel genes (Georgiades et al., 2011).

Finally, it has been reported that new genes can be essential to an organism’s viability. In Drosophila, 59 new genes were found to be essential for viability, at a frequency comparable to that of old genes. The lethal phenotypes observed after knockdown of these new genes suggest that newly created genes may become integrated into vital pathways by interacting with existing genes, and that this co-evolution can make the new gene indispensable (Chen et al., 2010).

In summary, gene creation is a continuous, ongoing phenomenon, as shown by the increasingly frequent identification of new genes, which are generated all the time (Raoult, 2010a; Boyer et al., 2010b). De novo-created genes are evidence of life’s permanent creativity.
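The Ka/Ks ratio used for the rickettsial sequences compares the number of non-synonymous substitutions per non-synonymous site (Ka) with the number of synonymous substitutions per synonymous site (Ks). By convention, a ratio well below 1 suggests purifying selection, as expected for a functional protein-coding gene; a ratio near 1 suggests neutral evolution, as expected for a non-functional sequence; and a ratio above 1 suggests positive selection. The minimal sketch below assumes that site and difference counts have already been obtained from a codon alignment (for example by Nei-Gojobori counting) and applies the Jukes-Cantor correction. It is not the pipeline used by Georgiades et al. (2011); the ±0.1 tolerance band is an arbitrary illustrative choice, whereas published analyses rely on statistical tests.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from pre-computed counts (assumed to come from codon-based site counting)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        # No synonymous change: the ratio is infinite if Ka > 0, otherwise undefined.
        return math.inf if ka > 0.0 else math.nan
    return ka / ks


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Coarse reading of a Ka/Ks value; the tolerance band is an arbitrary assumption."""
    if math.isnan(ratio):
        return "undefined"
    if ratio < 1.0 - tolerance:
        return "purifying selection (consistent with a functional gene)"
    if ratio > 1.0 + tolerance:
        return "positive selection"
    return "roughly neutral (consistent with a non-functional sequence)"


if __name__ == "__main__":
    # Hypothetical counts for one pair of orthologous sequences (illustration only).
    ratio = ka_ks(nonsyn_diffs=6, nonsyn_sites=600, syn_diffs=20, syn_sites=200)
    print(f"Ka/Ks = {ratio:.3f}: {interpret(ratio)}")
```

Keeping the counting step separate from the ratio makes the correction and the interpretation thresholds easy to inspect or replace with a model-based estimate.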
The Tree of Life
Darwin used the TOL approximately 150 years ago as a concept to explain the evolutionary relationships between species (Doolittle, 1999; Lawton, 2009), and it has since been accepted as a biological fact (Doolittle and Bapteste, 2007). According to Darwin’s “descent with modification theory” (Penny, 2011), the common descent of species is demonstrated by the similarities between them, while modifications driven by natural selection create differences that result in speciation (Doolittle and Bapteste, 2007). The TOL is therefore composed of a common ancestor (the root of the tree), key branches along which species separated quickly and stably, and branches containing the most recently arisen species (Raoult, 2010b,c). However, comparative genomic analyses contradict the existence of a single common ancestor for the gene repertoire of any organism. They suggest that nearly all genes have been exchanged or recombined at some point and that no two genes share the same history on the phylogenetic tree (Raoult, 2010c).

Since the late 1990s, LGT and gene loss in bacterial genomes have been recognized as much more frequent than previously thought (Ochman et al., 2000; Lawrence, 2005; Dagan and Martin, 2007). Up to 30% of the genome-to-genome variation within a species results from LGT and gene loss, and homologous recombination is now thought to be the primary cause of sequence divergence in many bacteria (Doolittle and Bapteste, 2007). LGT was nevertheless considered a rare phenomenon in intracellular bacteria (Audic et al., 2007) until the discovery of the mobilome in Rickettsiae showed that such events were possible (Merhej and Raoult, 2010). Several studies have identified candidate LGT events in Rickettsia species (Wolf et al., 1999; Ogata et al., 2006; Blanc et al., 2007a,b; Georgiades et al., 2011).
Moreover, mobile genetic elements invade and proliferate in rickettsial genomes and eventually integrate genes into their host’s chromosome (Merhej and Raoult, 2010). Analysis of the R. felis genome has provided evidence for gene transfers between the chromosome and the R. felis plasmid, while the plasmids themselves seem to have been acquired through conjugation (Ogata et al., 2005). The first evidence for LGT in R. bellii also pointed to the role of amebae in gene exchange; amebae constitute a melting pot in which species can exchange genetic material (Ogata et al., 2006; Moliner et al., 2010). Indeed, the genome of R. bellii contains many genes highly similar to those of intracellular bacteria of amebae, such as Legionella pneumophila and Protochlamydia amoebophila (Ogata et al., 2006). L. pneumophila can infect different species of amebae (Rowbotham, 1980; Fields et al., 2002), and a recent study provided evidence for non-vertical inheritance in this species: 34–57% of its genome has been involved in recombination events, and LGT events were detected between Legionella and all bacterial groups known to be present in amebae (Coscolla et al., 2011). In parallel, other studies have identified eukaryotic-like genes in Legionella that are most likely of amebal origin (Lurie-Weinberger et al., 2010; Moliner et al., 2010; Schmitz-Esser et al., 2010). The most plausible explanation for the multiple phylogenetic origins of a substantial fraction of Legionella genes is the exchange of genetic material within the common ameba host.

These lateral transfer events do not always involve whole genes or particular gene functions. The analysis of R. felis was the first rickettsial genomic study to show random transfers of DNA sequences occurring independently of gene function or sequence length (Merhej et al., 2011). A functional view of genes and sequences often shapes scientists’ analytical strategies and interpretations.
Some bacterial genomes contain up to 40% of genes with no apparent function other than their own duplication (selfish genes; Raoult, 2010b). Likewise, random sequences could have hybridized between species because of their sympatric lifestyle (Mayr, 1957).

In light of these post-genomic data, a post-Darwinian concept should be introduced, one that accounts for the chimerism and mosaic structure (Figure 5) of all living organisms, which arise through both non-vertical inheritance and de novo gene creation (Raoult, 2010c). The TOL is a biblical phrase (Penny, 2011) that fits well with the desire for a classification reflecting the “natural order”, one that is inclusively hierarchical and goes back to a single ancestor (Doolittle, 1999). Our current genomic knowledge no longer matches Darwin’s representation of the TOL. Species evolution looks much more like a rhizome (Deleuze and Guattari, 1976; Raoult, 2010c), reflecting the various origins of the genomic sequences in each species. Every living organism has a variety of ancestors; exchanges between species are intense, and new genes are created frequently and constantly in all organisms. For example, the human genome is a chimera, and viruses and bacteria are also among our ancestors. Retroviruses have left relics in our genomes; similarly, HHV-6A and HHV-6B can integrate into human chromosomes and may be vertically transmitted through the germ line (Arbuckle et al., 2010). Trypanosoma cruzi sequences have also been found integrated into the genomes of patients (Hecht et al., 2010). The definition of a common ancestor should therefore be revised to refer not to a single ancestor but to viral, bacterial, eukaryotic, and archaeal ancestors.
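The statement in this section that no two genes share the same phylogenetic history is typically examined by building a tree for each gene and comparing their topologies. One common measure of such incongruence is the Robinson-Foulds distance, the number of clades found in one tree but not in the other. The sketch below uses its rooted (cluster-based) variant, assumes trees are written as nested Python tuples of taxon names, and uses made-up gene trees purely for illustration; it is not drawn from the analyses cited in this section.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree: either a taxon name (leaf) or a tuple of subtrees (internal node).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Set of taxon names at the tips of the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets (clades) defined by every internal node of a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def rooted_rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of clades present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must be built on the same set of taxa")
    return len(clusters(t1) ^ clusters(t2))


if __name__ == "__main__":
    # Hypothetical gene trees for the same four taxa (illustration only).
    gene_a = ((("A", "B"), "C"), "D")
    gene_b = ((("A", "C"), "B"), "D")
    print("RF distance:", rooted_rf_distance(gene_a, gene_b))
```

Here the two hypothetical gene trees disagree on one clade each ({A, B} versus {A, C}), giving a distance of 2; a distance of 0 would mean identical rooted topologies.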
Figure 5
Each of the four domains of life, (A) Eukaryotes (in yellow), (B) Archaea (in blue), (C) Viruses (in pink), and (D) Bacteria (in green), is represented as a mosaic containing genes from all four domains. Purple squares represent ORFan genes.
Conclusion
We believe that the radical approach developed by the postmodern French philosophers is useful at a time when technology has enabled important discoveries. From this perspective, rickettsiologists, virologists, and bacteriologists, who all have different points of view, can make a real contribution to their fields and to the study of the evolution of living organisms. Without a non-traditional vision, a large proportion of living organisms that are now within reach will remain invisible, because we will be trapped by the theories of the twentieth century. Objects are constrained by their definitions. For example, giant viruses were missed and not identified earlier because of misleading definitions that required viruses to be filterable and smaller than 200 nm (Lwoff, 1957). If definitions are false, as we have shown for the great denominations of microbiology, objects cannot be conceived in a reasonable way, and conclusions derived from observations of microorganisms will be biased by misleading beliefs and theories.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.