Microbial ecology is the study of microbes in the natural environment and their interactions with each other. Investigating the nature of microorganisms residing within a specific habitat is an extremely important component of microbial ecology. Such microbial diversity surveys aim to determine the identity, physiological preferences, metabolic capabilities, and genomic features of microbial taxa within a specific ecosystem. A comprehensive treatment of all aspects of microbial diversity (phylogenetic, functional, and genomic) across the microbial (bacterial, archaeal, and microeukaryotic) world is a daunting task that could not be adequately covered in a single review. Here, we focus on one aspect of diversity (phylogenetic diversity) in one microbial domain (the Bacteria). We restrict our analysis to the highest taxonomic rank (phylum) and attempt to investigate the extent of global phylum-level diversity within the Bacteria. We present a brief historical perspective on the subject and highlight how the adoption of molecular biological and phylogenetic approaches has greatly expanded our view of global bacterial diversity. We also summarize recent progress toward the discovery of novel bacterial phyla, present evidence that the scope of phylum-level diversity in nature has hardly been exhausted, and propose novel approaches that could greatly facilitate the discovery of novel bacterial phyla within various ecosystems.
Microbial ecology is the scientific discipline in which scientists examine microbes in their environment, their impact on and adaptation to their habitat, and their interactions with each other. Microbial diversity surveys, which aim to identify the types of microorganisms within a specific habitat, are an integral part of microbial ecology. The discovery of “animalcules” (single-celled microscopic organisms) by Antony van Leeuwenhoek in various samples, e.g. rain drops, water samples from wells and lakes, and oral and stool samples from humans, amounted, in essence, to a series of microbial diversity surveys [1]. Following Leeuwenhoek’s discoveries, a relative hiatus in microbiology research ensued in the 18th and the earlier parts of the 19th century. The revival of microbiology research during the mid 19th–early 20th century was characterized by a marked shift in research philosophy. Holistic observation of microorganisms in their natural habitats was replaced with a reductionist research philosophy, with emphasis on the identification of etiological agents of microbially mediated phenomena such as fermentation and pathogenesis. Research during this era, deservedly referred to as the “golden age of microbiology”, led to multiple seminal advances, e.g. development of solid media for culturing bacteria, germ theory of disease, staining techniques, and vaccination procedures [2]. However, such spectacular advances shifted the research focus of microbiologists from an ecosystem-oriented, holistic philosophy to a reductionist, pure-culture centric focus.
The Russian/Ukrainian scientist Sergei Winogradsky, whose biography is almost as interesting as his research accomplishments, advocated a research approach that emphasizes the study of microorganisms in their natural habitats, in mixed cultures, or in isolates recently recovered from the ecosystem of interest. Winogradsky correctly reasoned that microorganisms in nature survive in conditions that are a far cry from the controlled, nutrient-rich conditions under which pure cultures are maintained in the laboratory. He reasoned that the behavior of a specific microorganism in its natural habitat is markedly different from its behavior in pure culture, owing to the differences in nutrient and resource availability between the two conditions as well as to the constant interactions with the various microbial taxa coexisting within the same habitat [1]. His work on environmental samples, especially soil, led to a better appreciation of the metabolic and functional diversity of microorganisms in their natural habitats.
Winogradsky’s research, and subsequent efforts by eminent microbiologists (Beijerinck, van Niel, Kluyver, and Hungate), defined the goals of microbial ecology. These could be simplified for the non-specialist as the “who” (identity of microorganisms), “what” (their metabolic capabilities), “where” (their spatiotemporal distribution within an ecosystem as well as on a global scale), and “why” (functions in a specific ecosystem and role in geochemical cycling). The “who” is, obviously, the most basic question in microbial ecology. Some 340 years after the discovery of animalcules, and almost a century after the revival of microbial ecology by Winogradsky, one would imagine that this seemingly straightforward question has been satisfactorily answered, and that the science of microbial discovery and description of new taxa would be as dead as the science of discovering new organs in the human body. This could not be further from the truth. A global census of all microbial species on earth is now recognized as a truly impossible task [3]. Even for a single sample from a highly diverse ecosystem (e.g. soil), such a census still represents a daunting challenge [4,5,6].
In this review, we examine the scope of phylogenetic diversity within the domain Bacteria. We limit our assessment to the highest taxonomic rank (phylum) and attempt to address seemingly straightforward questions: How many bacterial phyla exist in nature? Have all such phyla already been described? And what approaches could be implemented to more effectively document novel, yet undescribed phylum-level diversity within the Bacteria?
From the great plate count anomaly to the uncultured bacterial majority
The great plate count anomaly and the “missing” cells
It was observed as early as 1932 that, within freshwater samples, only an extremely small fraction of microscopically observed microbial cells is recoverable as pure cultures in microbial growth media [7]. This observation has since been validated in a wide array of environmental samples (e.g. marine, soil, and freshwater habitats, see [8] and references within). Typically, the vast majority (99–99.9%) of cells within an environmental sample are not recoverable in pure culture using plating or most probable number (MPN) enumeration procedures. Specific measures have been shown to slightly improve the proportion of cultured cells within select environmental samples. These include the utilization of multiple media targeting various metabolic capabilities and physiological preferences, longer incubation times [9], novel isolation contraptions [10,11], use of dilute media to mimic resource scarcity in nature and/or media mimicking natural settings [12], and the implementation of more sensitive growth detection methods [11,13]. Nevertheless, even with improved methodologies, the majority of cells within highly complex habitats remain uncultured. The term “the great plate count anomaly” was aptly coined in 1988 to describe this phenomenon [8].
A logical inquiry stemming from the recognition of this phenomenon concerns the identity of the microorganisms escaping enrichment and isolation procedures. Do these microorganisms represent novel, hitherto unknown bacterial taxa, or do they represent close relatives of bacterial taxa available in pure culture that possess attenuated growth capabilities, multiple unidentified auxotrophies, and/or yet-unclear physiological and growth requirements? The presence of unique cellular morphologies in environmental samples that have never been recovered in pure cultures has often hinted at the putative novelty of at least a fraction of these uncultured cells [14]. However, prior to the advent of molecular taxonomic approaches and their wide utilization in diversity surveys, this question was mostly philosophical in nature [15].
Use of molecular phylogeny in culture-independent diversity surveys
The late American microbiologist Carl Woese pioneered the use of the 16S rRNA gene as a phylogenetic marker to provide an evolution-based taxonomic outline for living organisms. Using comparative 16S rRNA gene sequence analysis, he proposed a three-kingdom classification scheme [16], in which all living organisms are grouped into three domains (Bacteria, Archaea, and Eukaryotes). His further investigation of cultured taxa within the bacterial domain produced the first high-rank taxonomic outline for Bacteria, with all known bacterial taxa grouped into 12 different phyla or divisions (Fig. 1) [17].
Fig. 1
Phylogenetic tree depicting the twelve “original” bacterial phyla proposed by Carl Woese in his seminal review on bacterial evolution. Adapted from Ref. [9]. These phyla are Thermotogae, Chloroflexi (Green non-sulfur Bacteria), Deinococcus, Spirochaetes, Chlorobia (Green sulfur bacteria), Bacteroidetes, Planctomycetes, Chlamydia, Cyanobacteria, Gram-positive Bacteria (comprising the high GC Actinobacteria, and the low GC Firmicutes), Proteobacteria (Purple bacteria).
Building on these efforts, the American microbiologist Norman Pace pioneered the use of 16S rRNA gene-based sequencing and analysis procedures as a tool for direct identification of microbial populations in environmental samples. This approach was originally dubbed “phylotyping” but is now more commonly referred to as a “16S rRNA gene-based culture-independent diversity survey”, or simply “16S rRNA analysis” (Fig. 2) [18]. It involves direct isolation of bulk DNA from an environmental sample followed by PCR amplification of a fragment of the 16S rRNA gene using primers targeting conserved regions within the molecule. The amplicon, representing a mix of 16S rRNA genes originating from different cells within the environmental sample of interest, is then cloned and sequenced (or directly sequenced when using newer high throughput sequencing procedures, see below) [15,19]. The obtained sequences are analyzed and their phylogenetic affiliation is assessed using various phylogenetic and bioinformatics procedures. This approach has the monumental advantage of being culture-independent, i.e. capable of identifying microorganisms within a specific environmental sample regardless of their amenability or recalcitrance to isolation [18]. As such, it is well suited to address the questions posed above regarding the identity and taxonomy of uncultured microorganisms routinely escaping detection in enrichment and isolation-based procedures.
Fig. 2
Flowchart depicting the “16S rRNA analysis” protocol. The protocol starts by DNA extraction, followed by amplifying the small subunit rRNA gene using universal or domain-specific primers. PCR products are then cloned and sequenced. Obtained small subunit rRNA gene sequences are then analyzed, binned into operational taxonomic units (OTUs), and used for phylogenetic inferences.
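The OTU binning step in this protocol can be illustrated with a minimal sketch. It assumes the sequences are already aligned to a common length, uses a simple greedy seed-based scheme, and adopts a 97% identity cutoff purely as an illustrative default; none of these choices is specified in the text, and the function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def bin_into_otus(seqs: dict[str, str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy binning: a sequence joins the first OTU whose seed it matches
    at or above the threshold; otherwise it seeds a new OTU.
    The 0.97 default is an assumption, not a value taken from the text."""
    otus: list[tuple[str, list[str]]] = []  # (seed sequence, member names)
    for name, seq in seqs.items():
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members.append(name)
                break
        else:
            # No existing seed was close enough, so start a new OTU.
            otus.append((seq, [name]))
    return [members for _, members in otus]


if __name__ == "__main__":
    toy = {"a": "A" * 100, "b": "A" * 98 + "CC", "c": "A" * 90 + "C" * 10}
    print(bin_into_otus(toy))
```

Greedy seeding is order-dependent, so dedicated clustering tools would normally be preferred for real surveys; the sketch only shows what "binning at a similarity threshold" means.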
The uncultured bacterial majority revealed
The 16S rRNA gene-based approach has been readily adopted over the past three decades by the vast majority of the scientific community, and extensively utilized to study microbial diversity in ecosystems ranging from large global habitats, e.g. oceans [20-40] and soil [41-60], to hardly accessible extreme environments such as deep sea hydrothermal vents [61-76], Antarctic lakes [32,62,77-82], and Antarctic soils [33,62,83-90]. Collectively, these studies have demonstrated that the scope of phylogenetic diversity is much broader than previously implied from culture-based studies. Multiple novel microbial lineages have been identified, many of which appear to be deeply branching within the bacterial tree and unaffiliated with any of the known bacterial phyla. The discovery of these lineages necessitated coining the term candidate phylum (or candidate division) to accommodate bacterial phyla for which only 16S rRNA sequences, but no isolates, are available. Indeed, examination of taxonomic outlines provided by curated 16S rRNA gene databases, e.g. Greengenes [91] and SILVA [33], suggests that the majority of currently recognized bacterial phyla are candidate phyla (Table 1). Therefore, the application of 16S rRNA gene-based diversity surveys has resulted in the discovery of multiple novel bacterial lineages at the highest taxonomic rank and has revolutionized our understanding of the scope of phylum-level diversity in nature. More importantly, such analysis demonstrated that a fraction of the microbial cells consistently missed in enumeration and isolation approaches belong to novel, hitherto unrecognized bacterial lineages.
Table 1
Bacterial phylum names according to the Greengenes [91] and SILVA [33] databases (August 2014).a
Greengenes
SILVA
AC1
Acidobacteria
Acidobacteria
Actinobacteria
Actinobacteria
AD3
AncK6
aquifer1
aquifer2
Aquificae
Aquificae
Armatimonadetes
Armatimonadetes
Bacteroidetes
Bacteroidetes
BD1-5
BHI80-139
BHI80-139
BRC1
BRC1
Caldiserica
Caldiserica
Caldithrix
CD12
Chlamydiae
Chlamydiae
Chlorobi
Chlorobi
Chloroflexi
Chloroflexi
Chrysiogenetes
Chrysiogenetes
CKC4
Cyanobacteria
Cyanobacteria
Deferribacteres
Deferribacteres
Thermi
Deinococcus-Thermus
Dictyoglomi
Dictyoglomi
Elusimicrobia
Elusimicrobia
EM3
EM19
FBP
FCPU426
Fibrobacteres
Fibrobacteres
Firmicutes
Firmicutes
Fusobacteria
Fusobacteria
GAL08
GAL15
Gemmatimonadetes
Gemmatimonadetes
GN01
GN02
GN04
GOUTA4
GOUTA4
H-178
Hyd24-12
Hyd24-12
Kazan-3B-28
KB1
KSB3
LCP-89
JL-ETNP-Z39
JS1
LD1
LD1-PA38
Lentisphaerae
Lentisphaerae
MAT-CR-M4-B07
MVP-21
MVS-104
NC10
Nitrospirae
Nitrospirae
NKB19
NPL-UPA2
NPL-UPA2
OC31
OC31
OctSpA1-106
OD1
OD1
OP1
OP3
OP3
OP8
OP8
OP9
OP9
OP11
OP11
PAUC34f
Planctomycetes
Planctomycetes
Poribacteria
Proteobacteria
Proteobacteria
RsaHF231
S2R-29
SAR406
SBR1093
SBYG-2791
SC4
SHA-109
SM2F11
Spirochaetes
Spirochaetae
SR1
SR1
Synergistetes
Synergistetes
TA06
TA06
Tenericutes
Tenericutes
Thermodesulfobacteria
Thermotogae
Thermotogae
TM6
TM6
TM7
TM7
TPD-58
Verrucomicrobia
Verrucomicrobia
VHS-B3-43
WCHB1-60
WD272
WPS-2
WS1
WS2
WS3
WS3
WS4
WS5
WS6
WS6
WWE1
ZB3
Phyla shown in Boldface are those already known with cultured representatives prior to the advent of 16S rRNA gene diversity surveys. Phyla in italics are those with cultured representatives originally identified using 16S rRNA sequencing as uncultured bacterial phyla, with representative isolates subsequently obtained. The rest of the phyla currently have no cultured representatives.
Global phylum level diversity in bacteria
These discoveries of novel bacterial phyla and candidate phyla have added multiple new deep branches (phyla) to the bacterial tree of life, but are we done with this exercise? Has the phylum-level diversity within the Bacteria been exhausted, or are there multiple, yet-undescribed novel bacterial phyla (or even domains) in nature? One would imagine that, after three decades of research, thousands of published 16S rRNA gene-based diversity surveys, 5.4 million Sanger-generated 16S rRNA gene sequences in GenBank and >1.7 billion sequences in high throughput sequencing archives, e.g. SRA [92], CAMERA [93], and MG-RAST [94], and the discovery and documentation of tens of novel bacterial candidate phyla, the global scope of bacterial diversity on earth would have been documented, at least at the highest taxonomic (phylum) level. However, based on our research experience in the last decade, we are now firm believers that the scope of global phylum-level bacterial diversity is much greater than currently recognized in curated 16S rRNA gene databases such as Greengenes [91] and SILVA [33] (Table 1). Below, we present three different reasons why we believe that this is the case, as well as procedures that could putatively facilitate the discovery of these novel phyla.
Novel bacterial phyla as constituents of the rare biosphere
Within highly diverse microbial ecosystems, several distribution models can be used to fit the frequency data, e.g. ordinary Poisson distribution, gamma-mixed Poisson, inverse Gaussian-mixed Poisson, lognormal-mixed Poisson, Pareto-mixed Poisson, and mixture of 2 exponentials-mixed Poisson [58,95-99]. Regardless of the distribution pattern, the community structure in diverse habitats typically exhibits a taxon rank distribution curve with a long tail corresponding to bacterial species present in low abundance. This fraction, which constitutes the majority of species, is referred to as the “rare” biosphere [20]. The reason why these lineages are present and maintained at low abundances, as well as their global distribution patterns and putative ecological roles (or lack thereof), is an active area of interest to microbial ecologists and evolutionary microbiologists.
Access to the rare members of the community has been greatly augmented by the advent of high throughput sequencing technologies and their adaptation to amplicon-based 16S rRNA gene diversity surveys, e.g. pyrosequencing [20] and Illumina sequencing [100]. Such adaptation has allowed for the generation of hundreds of thousands (pyrosequencing) to millions (Illumina) of sequencing reads in a single run and hence provided unprecedented access to the rare biosphere. Collectively, these studies have documented the extremely high level of species richness within the rare biosphere. More interestingly, within such studies, a significant fraction of the obtained sequences (10–74% [101-105]) are considered unclassified, i.e. they fall below a preset sequence similarity threshold (e.g. 80%) to the closest classifiable relative in databases.
However, it is important to note that, while pyrosequencing- and Illumina-based studies are excellent tools for suggesting the occurrence of novel bacterial diversity within a sample, they are very poor at accurately documenting and describing such diversity. Accurate determination of the phylogenetic affiliation of such pyrosequencing- and Illumina-generated sequences is unfeasible, mainly due to the short read length of currently available high throughput technologies and the error rate associated with them, which preclude the direct deposition of the obtained short sequences into public databases, e.g. GenBank. Hopes for the development of a high throughput, long-read sequencing approach have been high, but the newer systems that offer long reads (e.g. PacBio SMRT) have a very high error rate (∼14% indels for PacBio SMRT sequencing) that precludes their utilization for high throughput phylogenetic studies.
Therefore, Sanger-generated near full-length 16S rRNA gene sequences remain the only viable way to accurately describe and document novel bacterial lineages. Although an extremely large number of Sanger-generated 16S rRNA gene sequences (>5 M, as of August 2014) are currently available through the GenBank database, the vast majority of these sequences have been obtained during the course of small-scale diversity surveys (e.g. <200 sequences generated per study). Accordingly, these studies, and consequently the entire database, have an extremely poor representation of the rare biosphere within the ecosystems studied.
Two strategies have been developed to obtain near full-length 16S rRNA gene sequences from the rare biosphere. The first is a brute force approach in which a large number of clones are sequenced from a single sample; the second is a more targeted approach that specifically accesses putatively novel members within the rare biosphere. Due to cost issues, relatively few studies have utilized a brute force approach. For example, one study [106] examined the bacterial diversity in grassland soil by analyzing 13,001 sequences from a single sample. This study demonstrated that rare members of the microbial community have, on average, more novelty (i.e. less sequence similarity to their closest relative in the database) than more abundant members of the sample. More importantly, the authors identified multiple novel lineages at various taxonomic levels, including 6 putative new phyla. A more recent and more extensive effort [107] analyzed ∼119,000 Sanger-generated sequences obtained from 10 equivalent sections pooled from 4 core samples of a 5 cm thick Guerrero Negro microbial mat, and resulted in the identification of 43 putatively novel phyla. Collectively, both studies, as well as other deep sequencing Sanger-based studies conducted on a smaller scale, e.g. [108,109], consistently demonstrate that novel bacterial phyla are still to be encountered in the rare biosphere.
A more targeted approach to zoom in on putatively novel members of the rare biosphere has been independently developed by three different research laboratories and used to target putatively novel and rare members of the microbial community in a sulfide and sulfur-rich spring in southwestern Oklahoma (Zodletone spring) [110], freshwater microbial communities [111], and marine sponges [103]. This approach (Fig. 3) is based on using sequences generated in high throughput sequencing surveys to identify sequences with low sequence similarity (e.g. <80%) to their closest relatives in the GenBank database. Primers specific to these putatively novel sequences are then designed and used in conjunction with universal bacterial primers to obtain near full-length 16S rRNA amplicons, which can be cloned, sequenced using Sanger sequencing, and subjected to detailed phylogenetic analysis. Using this approach, five novel bacterial phyla were identified within the rare members of the microbial community in Zodletone spring [110]. Therefore, regardless of the approach utilized, it is clear that dedicated efforts to identify novelty within the rare biosphere in various ecosystems have almost invariably yielded novel bacterial phyla. We conclude that a sustained and dedicated effort to investigate phylum-level diversity in the rare biosphere of multiple complex habitats could have a profound effect on our understanding of the global scope of phylum-level diversity within the domain Bacteria.
Fig. 3
Flowchart depicting a targeted approach developed for the identification of novel bacterial phyla within the rare biosphere. The approach combines the read length and accuracy of Sanger sequencing with the high throughput capability of next generation (pyrosequencing or Illumina) sequencing. Pyrosequencing or Illumina sequencing output is first used to identify potentially novel members within the rare members of the community. The short sequences are then used to design custom primers. The newly designed primers are then used in conjunction with a universal forward or reverse bacterial primer for amplification of near-complete 16S rRNA gene sequences. The obtained PCR products are cloned and Sanger-sequenced, and the resulting sequences are used for detailed phylogenetic inferences.
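The first step of this targeted workflow, picking out rare and putatively novel sequences from high throughput output, can be sketched as a simple filter. The sketch below assumes that each OTU already carries a read count and the identity to its closest database relative; the `OtuHit` record, the function name, and the 0.1% rarity cutoff are hypothetical illustrations, while the 80% similarity threshold follows the example given in the text.

```python
from typing import NamedTuple


class OtuHit(NamedTuple):
    """Hypothetical summary of one OTU from a high throughput survey."""
    otu_id: str
    read_count: int
    best_hit_identity: float  # fraction, e.g. 0.78 for 78% identity


def select_rare_novel(
    hits: list[OtuHit],
    max_identity: float = 0.80,        # "<80%" example threshold from the text
    max_rel_abundance: float = 0.001,  # assumed definition of "rare"
) -> list[str]:
    """Return IDs of OTUs that are both rare and distant from known relatives."""
    total = sum(h.read_count for h in hits)
    if total == 0:
        return []
    return [
        h.otu_id
        for h in hits
        if h.best_hit_identity < max_identity
        and h.read_count / total <= max_rel_abundance
    ]


if __name__ == "__main__":
    survey = [
        OtuHit("otu1", 5000, 0.99),  # abundant, well known
        OtuHit("otu2", 3, 0.76),     # rare and divergent: candidate for primer design
        OtuHit("otu3", 2, 0.95),     # rare but close to a known relative
        OtuHit("otu4", 4000, 0.70),  # divergent but not rare
    ]
    print(select_rare_novel(survey))
```

The selected OTUs would then be the templates for custom primer design; that design step is not modeled here.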
Novel bacterial phyla in the shadow biosphere
All 16S rRNA gene-based diversity surveys are initiated by amplification of 16S rRNA genes using primers that target conserved regions within the 16S rRNA molecule. A list of universal bacterial primers used in diversity surveys is shown in Table 2. It has often been argued that these “universal” primers could not theoretically amplify every single microbial strain within a complex environmental sample, and that a fraction of microbial diversity is routinely missed in PCR-based diversity studies. However, the proportion of this missed diversity, or “shadow biosphere”, as a fraction of the total number of cells is currently unclear. Indeed, 16S rRNA gene sequences within genomic fragments obtained via PCR-independent techniques, e.g. cloned in fosmids [112], have mismatches to the sequences of commonly used universal 16S rRNA primers [113]. Further, a detailed in silico analysis of 16S rRNA gene sequences identified in PCR-independent metagenomic surveys in the NCBI environmental survey repository also identified multiple 16S rRNA gene sequences that harbor mismatches to common universal bacterial 16S rRNA primers [114].
Table 2
Common 16S rRNA bacterial primers used for culture-independent analysis.a
Primer name   Primer sequenceb
8F            AGAGTTTGATCCTGGCTCAG
27F           AGAGTTTGATCMTGGCTCAG
338R          GCCTTGCCAGCCCGCTCAG
338F          ACTCCTACGGGAGGCWGCAGC
518R          GTATTACCGCGGCTGCTGG
530F          ACGCTTGCACCCTCCGTATT
805R          GGATTAGATACCCTGGTAGTC
967F          CAACGCGAAGAACCTTACC
1238R         GTAGCRCGTGTGTMGCCC
1100F         YAACGAGCGCAACCC
1492R         CGGTTACCTTGTTACGACTT
a F indicates a forward primer and R indicates a reverse primer. The number in the primer name indicates the starting position of the primer sequence within the E. coli 16S rRNA gene sequence.
b Data from references [57,126].
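The kind of in silico primer check described above can be sketched as a mismatch count between a primer and a candidate binding site. The sketch assumes the site is supplied in the same 5′ to 3′ orientation as the primer (so a reverse primer's site would first need to be reverse-complemented), treats IUPAC degenerate bases in the primer as matching any base they encode, and uses a hypothetical function name.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not allowed by the primer base.

    Both strings are read 5' to 3' on the same strand; U is treated as T.
    """
    primer = primer.upper().replace("U", "T")
    site = site.upper().replace("U", "T")
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


if __name__ == "__main__":
    primer_27f = "AGAGTTTGATCMTGGCTCAG"  # 27F from Table 2 (M = A or C)
    print(count_mismatches(primer_27f, "AGAGTTTGATCCTGGCTCAG"))  # site identical to 8F
    print(count_mismatches(primer_27f, "AGAGTTTGATCGTGGCTCAG"))  # G at the degenerate position
```

Counting mismatches alone ignores their position; mismatches near the 3′ end are generally more disruptive to amplification, which a fuller analysis would weight accordingly.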
In addition, several studies provide empirical evidence that the shadow biosphere harbors a disproportionately large fraction of bacterial cells belonging to novel bacterial phyla. For example, the discovery of candidate divisions AD3, NC10, and mesophilic Thermotoga as integral constituents of soil ecosystems was long hampered by the mismatches between their 16S rRNA gene sequences and universal bacterial primers, resulting in their chronic underrepresentation in, or outright absence from, soil clone libraries [115]. More importantly, recent studies from the Banfield laboratory at UC Berkeley have reconstructed multiple genome assemblies from metagenomic datasets derived from a variety of habitats [113,116-120]. Many of these reconstructed genomes represent completely novel bacterial phyla that had never been observed before, either in pure cultures or in 16S PCR-based diversity surveys. All such novel phyla exhibit multiple mismatches within their 16S rRNA gene sequences to various “universal” bacterial primers currently in use, and hence were always missed in diversity surveys. A similar situation has been encountered within the domain Archaea, where culture-independent single cell genomic analysis recovered genomes belonging to completely novel archaeal phyla with 16S rRNA gene sequences exhibiting marked mismatches, and even indels (insertions and deletions), which render them recalcitrant to amplification using current PCR primers and protocols [121,122].
Utilization of PCR-independent metagenomic approaches as a routine procedure for assessing diversity might be possible in the future, but currently, PCR-based approaches represent the most feasible way to assess diversity. Therefore, to assess diversity within the shadow biosphere using PCR-based approaches, newer strategies are needed. One approach to potentially decrease the proportion of cells missed due to primer mismatches is to utilize miniprimers (10 bp primers) instead of the standard 18–20 bp primers currently in use, and to employ engineered S-Tbr DNA polymerase instead of Taq polymerase to allow such an amplification procedure [114]. Theoretically, mismatches are less likely to occur in a shorter 10 bp primer than in a standard 18–20 bp primer, simply because fewer positions need to match. Isenbarger et al. [114] used this approach to examine bacterial diversity in soil, as well as in a microbial mat sample from Cabo Rojo, PR, using a shorter version of the standard 27F and 1505R primers (Table 2) [27F-10 (5′ TTCCGGTTGA), 1505R-10 (5′ CCTTGTTACG)] and engineered S-Tbr DNA polymerase. The authors compared clone libraries obtained using both approaches and demonstrated that a higher proportion of putatively novel sequences was obtained with the miniprimer approach than with the standard primer approach.
We further propose an additional approach based on designing multiple degenerate primers to account for mismatches to the universal 16S rRNA gene primers. Since base pairing is necessary to maintain 16S rRNA secondary structure, degenerate primers would be designed to theoretically maintain canonical base pairings in the 16S rRNA secondary structure (Fig. 4), i.e. any base change at one position would be compensated by a complementary base change at the pairing position. Application of such an exercise to two primers (27F and 1492R) would generate a list of 21 degenerate forward and 19 degenerate reverse primers (Table 3). Each of these degenerate primers can theoretically be paired with the universal forward or reverse primer and used for 16S rRNA sequence amplification in a multiplexed high throughput PCR approach to identify novel sequences. Such an approach has been considered before but, to our knowledge, has never been utilized to survey diversity.
Table 3
List of degenerate primers for 27F and 1492R designed as specified in the text. The sequences of the non-degenerate 27F and 1492R are given in the table heading.
27F (AGAGUUUGAUCAUGGCUCAG)  1492R (AAGUCGUAACAAGGUAACC)
BGAGUUUGAUCAUGGCUCAG        GAGUCGUAACAAGGUAACC
AHAGUUUGAUCAUGGCUDAG        AGGUCGUAACAAGGUAACC
AGBGUUUGAUCAUGGCVCAG        AAHUCGUAACAAGGUAACC
AGAHUUUGAUCAUGHDUCAG        AAGCCGUAACAAGGUAACC
AGAGGUUGAUCAUGHCUCAG        AAGUDGUAACAAGGUAACC
AGAGAUUGAUCAUGHCUCAG        AAGUCHUAACAAGGUAACC
AGAGCUUGAUCAUGHCUCAG        AAGUCGCAACAAGGUAACC
AGAGUGUGAUCAUGHCUCAG        AAGUCGUGACAAGGUAACC
AGAGUAUGAUCAUGHCUCAG        AAGUCGUABCAAGGUAACC
AGAGUCUGAUCAUGHCUCAG        AAGUCGUAADAAGGUAACC
AGAGUUCGAUCAUGGCUCAG        AAGUCGUAACGAGGUAACC
AGAGUUUAAUCAUGGCUCAG        AAGUCGUAACAGGGUAACC
AGAGUUUGGUCAUGGCUCAG        AAGUCGUAACAAAGUAACC
AGAGUUUGAVCAUGGCUCAG        AAGUCGUAACAAGAUAACC
AGAGUUUGAUDAUGGCUCAG        AAGUCGUAACAAGGVAACC
AGAGUUUGAUCBUGGCUCAG        AAGUCGUAACAAGGUBACC
AGAGUUUGAUCACGGCUCAG        AAGUCGUAACAAGGUABCC
AGAGUUUGAUCAUUGCUCAG        AAGUCGUAACAAGGUAADC
AGAGUUUGAUCAUGUCUCAG        AAGUCGUAACAAGGUAACD
AGAGUUUGAUCAUGGCUCGG
AGAGUUUGAUCAUGGCUCAH
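To see what a degenerate entry in Table 3 stands for, it can be expanded into the concrete sequences it encodes. The sketch below uses the standard IUPAC codes in the RNA alphabet and a hypothetical function name. Note that it enumerates every combination of the degenerate positions; it does not apply the compensatory base-pairing constraint described in the text, so for primers with two degenerate positions it lists more sequences than the paired design intends.

```python
from itertools import product

# Standard IUPAC nucleotide codes, written in the RNA alphabet used in Table 3.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC_RNA[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # First degenerate 27F variant in Table 3: B stands for C, G, or U.
    for seq in expand_degenerate("BGAGUUUGAUCAUGGCUCAG"):
        print(seq)
```

A variant with one degenerate position expands to three sequences, and one with two independent three-way positions expands to nine, which gives a sense of how quickly the multiplexed primer pool grows.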
Fig. 4
Secondary structure of regions (A) 8–27, and (B) 1492–1510 of the 16S rRNA molecule. Canonical base pairing (shown as lines) is targeted for designing degenerate primers such that a change in one base is associated with a complementary change in the pairing position. Noncanonical base pairings (A-A, C-C, G-G, C-A, U-G, G-A, U-U), and wobble base pairing (G-U), often a consequence of canonical pairings, are theoretically less necessary for maintaining ribosomal integrity, and so are not targeted for primer design. The sequences of all possible degenerate 27F and 1492R primers are shown in Table 3.
Inadequate documentation of phylum level diversity within existing databases
In addition to the failure to detect novel bacterial phyla due to their rarity in environmental samples or to their possession of mismatches to most commonly used 16S rRNA gene primers, we argue that current inadequate curation of deposited 16S rRNA gene sequences is leading to failure in recognizing novel bacterial phyla for which 16S rRNA gene sequence has already been reported. All published studies of 16S rRNA gene surveys deposit sequences obtained in a public database, most commonly GenBank database (available at ftp://ftp.ncbi.nih.gov/blast/db/nt. and EMBL database). Many of the studies are focused on various ecological questions and do not conduct a detailed assessment of the phylogenetic affiliation of every obtained 16S rRNA gene sequence. Therefore, 16S rRNA gene sequences representing novel phyla could be deposited unnoticed to GenBank database. Curated 16S rRNA gene databases (e.g. Greengenes [91], and SILVA [33]) routinely upload recently deposited 16S rRNA gene sequences in GenBank and add such sequences to their taxonomic outlines. However, proposing novel bacterial phyla based on newly obtained sequences represent but one of the interests and responsibilities of database curators, and many novel 16S rRNA sequences that putatively represent novel bacterial phyla are simply refer to as “unclassified” in such databases.We hypothesized that 16S rRNA sequences representing multiple novel bacterial phyla have already been obtained and deposited in public databases but has so far escaped detection and documentation due to reasons highlighted above. As a proof of principle, we queried one of such database depositories, the European Nucleotide Archive (ENA) [92], for novel 16S rRNA sequences. At the time of download (September, 2013), 3,178,046 16S rRNA gene sequences were obtained. The sequences were trimmed for length to remove all sequences shorter than 800 bp and were classified using Greengenes taxonomy and Wang method employed in Mothur. 
Most of the sequences (∼80%) were classified into a known phylum or candidate division with >50% bootstrap support. The remaining 20% of sequences were subjected to an extensive phylogenetic analysis using maximum likelihood approaches (implemented in RAxML [123] and MEGA [124]). As a result, 79 sequences were judged to represent 8 novel bacterial phyla. These 79 sequences formed eight independent, deep-branching, reproducibly monophyletic, bootstrap-supported clusters, both when various tree-building algorithms were applied and when the composition and size of the data set used for phylogenetic analysis were varied (Fig. 5). Sequences representing potentially novel classes and orders belonging to known phyla were also identified (data not shown). This analysis, conducted on sequences from the comparatively smaller ENA database, demonstrates that novel bacterial phyla are routinely detected in diversity surveys but often escape documentation. A similar analysis of sequences in larger databases, e.g. GenBank, as well as continuous evaluation of recently deposited sequences, could result in the identification of additional novel phyla.
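The screening step described above (length filter, then separation of sequences by phylum-level bootstrap support) can be illustrated with a minimal Python sketch. It is not the pipeline used in this analysis, which relied on Mothur. The function names are hypothetical, and the sketch assumes a taxonomy file in which each line holds a sequence name, a tab, and semicolon-separated ranks with bootstrap support in parentheses, with the phylum as the second rank. It also assumes that support of exactly 50% counts as classified.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

MIN_LENGTH = 800     # bp; shorter sequences are discarded (value from the text)
MIN_BOOTSTRAP = 50   # percent; phylum-level support threshold (value from the text)
PHYLUM_RANK = 1      # assumption: rank 0 is the domain, rank 1 is the phylum

# Matches one rank such as 'p__Firmicutes(42)' -> taxon, support
RANK_PATTERN = re.compile(r"^(?P<taxon>.*)\((?P<support>\d+)\)$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_enough(records: Iterable[Tuple[str, str]],
                min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only sequences of at least min_length bases."""
    return {name: seq for name, seq in records if len(seq) >= min_length}


def phylum_call(taxonomy: str) -> Tuple[Optional[str], int]:
    """Return (phylum, bootstrap support) or (None, 0) if no phylum is given."""
    ranks = [rank for rank in taxonomy.strip().split(";") if rank]
    if len(ranks) <= PHYLUM_RANK:
        return None, 0
    match = RANK_PATTERN.match(ranks[PHYLUM_RANK])
    if match is None:
        return None, 0
    return match.group("taxon"), int(match.group("support"))


def split_by_phylum_support(taxonomy_lines: Iterable[str],
                            min_bootstrap: int = MIN_BOOTSTRAP
                            ) -> Tuple[List[str], List[str]]:
    """Separate sequence names into classified and potentially novel lists."""
    classified: List[str] = []
    candidates: List[str] = []
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        name, _, taxonomy = line.partition("\t")
        taxon, support = phylum_call(taxonomy)
        if (taxon is None or "unclassified" in taxon.lower()
                or support < min_bootstrap):
            candidates.append(name)
        else:
            classified.append(name)
    return classified, candidates
```

Sequences in the second list returned by `split_by_phylum_support` are only candidates. As described above, whether they represent novel phyla still has to be established by detailed phylogenetic analysis.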
Fig. 5
Maximum likelihood dendrogram based on the 16S rRNA gene sequences affiliated with representatives of the putatively novel phyla (PNP1-PNP8). Bootstrap values (in percentages) are based on 1000 replicates and are shown for branches with more than 50% bootstrap support. Sequences obtained from the ENA database (n = 3,178,046) were classified in MOTHUR using the classify.seqs command with the Greengenes taxonomy outline and the Wang method. Sequences that failed to classify into a known phylum with at least 50% bootstrap support (n = 664,621) were considered potentially novel and were subjected to extensive phylogenetic analysis using a combination of MEGA [124], RAxML [123], and ARB [125]. Seventy-nine sequences formed 8 independent, deep-branching, reproducibly monophyletic, bootstrap-supported clusters, both when various tree-building algorithms were applied and when the composition and size of the data set used for phylogenetic analysis were varied. Representatives of these 8 novel phyla are shown in the tree along with their source.
Conclusions
We hope to convey that, in spite of spectacular technological advances in DNA sequencing and intense research in the area of microbial diversity, a complete census of the phylum level diversity within the domain Bacteria has not yet been realized. A similar statement could be made regarding the domain Archaea and, to some extent, the microeukaryotes. Our review summarizes progress toward this goal and outlines potential strategies and procedures that could facilitate the discovery process. It is interesting to note that many of these novel bacterial phyla appear to have a limited distribution and often represent a minor fraction of the microbial community within a specific habitat. The reason for the retention of such cells in highly diverse habitats, and their potential role (or lack thereof) within a specific ecosystem, is currently unclear. Access to the genomes of such microorganisms through single cell genomics or metagenomics, or success in obtaining representative pure cultures, would be required to address such questions.
Conflict of interest
The authors have declared no conflict of interest.
Compliance with ethics requirements
This article does not contain any studies with human or animal subjects.