High-throughput metagenomic technologies for complex microbial community analysis: open and closed formats.

Jizhong Zhou, Zhili He, Yunfeng Yang, Ye Deng, Susannah G Tringe, Lisa Alvarez-Cohen.

Abstract

Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.

Year: 2015 PMID: 25626903 PMCID: PMC4324309 DOI: 10.1128/mBio.02288-14

Source DB: PubMed Journal: MBio Impact factor: 7.867

Minireview

Microorganisms inhabit almost every imaginable environment in the biosphere, play integral and unique roles in ecosystems, and are involved in the biogeochemical cycling of essential elements, such as carbon, oxygen, nitrogen, sulfur, phosphorus, and various metals. Their structure, function, interaction, and dynamics are critical to our existence, yet their detection, identification, characterization, and quantification pose several great challenges. First, microbial communities can be extremely diverse, and the majority of microorganisms in natural environments have not yet been cultivated (1, 2). Second, in any ecosystem, various microorganisms interact with each other to form complicated networks whose behavior is hard to predict (3, 4). Establishing mechanistic linkages between microbial diversity and ecosystem functioning poses an additional challenge to understanding the interactions and activities of complex microbial communities (5, 6). Effective high-throughput technologies for analyzing microbial community structure and functions are critical for advancing this mechanistic understanding.

Sequencing and phylogenetic analysis of 16S rRNA genes provided the foundation for modern study of microbial communities. PCR-based 16S rRNA cloning analysis has driven the explosion of information about community memberships and vastly expanded the known diversity of microbial life (7). However, PCR-based analyses of 16S rRNA genes have three major limitations: (i) PCR limits the information obtained to the sequence between the primers, thereby disregarding functional information; (ii) PCR-based analysis is only somewhat quantitative, with most measurements providing only relative abundance information; and (iii) PCR primer mismatches may result in some lineages being missed entirely (8). All three limitations have been addressed by the development of metagenomic analyses involving direct sequencing or screening of unamplified environmental DNA (9–12). These methods constitute critical “open formats,” which do not require prior knowledge of the community, thereby enabling unprecedented discovery of new taxa and genes and associations between them. Analysis of cloned DNA has largely been replaced by next-generation sequencing of DNA extracted from environmental sources, which has transformed the field of microbial ecology by increasing the speed and throughput of DNA sequencing by orders of magnitude. Now the metagenomic databases are packed with high-quality sequence information from diverse habitats across the globe, revolutionizing molecular analyses of biological systems (13, 14) and facilitating research on questions that formerly could not be approached. Although functional metagenomics, in which clones containing metagenomic DNA are screened for expressed activities, holds great promise to shape ecological theory and understanding, it has lagged behind shotgun sequencing because of the comparatively slow advances in screening technology.

Ecological insights from the massive data sets generated by high-throughput sequencing (open formats) have been facilitated by sophisticated computational methods and by closed-format methods, such as microarrays, which can be used to rapidly query taxa, genes, or transcripts over space and time in complex communities. High-throughput sequencing and microarray technologies have been applied to diverse communities. The plethora of research using these methods has stimulated several excellent reviews (15–17), particularly as applied to the human microbiome (18–20). Our intent here is to complement previous reviews by focusing primarily on DNA-based metagenomic technologies applied to complex environmental communities, such as those found in soils.

OVERVIEW OF OPEN AND CLOSED MOLECULAR DETECTION APPROACHES

Since 1990, various molecular methods capable of tracking one to hundreds of biomarkers have been widely used to analyze microbial community structure, such as PCR amplification-based gene cloning, sequencing of 16S rRNA genes (21) and functional genes (22), amplified ribosomal DNA restriction analysis (23), denaturing gradient gel electrophoresis (24), terminal restriction fragment length polymorphism (25), phospholipid fatty acid analysis (26), and BioLog EcoPlates for measuring carbon and nitrogen metabolisms (27). Especially in the last decade, high-throughput molecular technologies capable of tracking multiple thousands of biomarkers have been developed for characterizing microbial communities, including high-throughput DNA/RNA sequencing (18, 28–31), PhyloChip (32), GeoChip (33), mass spectrometry-based proteomics for community analysis (34), and metabolite analysis (35). We can group high-throughput molecular microbial detection technologies into two major categories: open and closed formats (16, 17). “Open format” refers to technologies whose potential experimental results cannot be anticipated prior to performing the analysis, and thus, the experimental outcome is considered open. For instance, when using sequencing to analyze a microbial community, we will not know what types of sequences will be obtained prior to sequencing. The main characteristics of technologies of this type are that they typically do not require a priori sequence information from the community of interest (16, 17) and, overall, they enable discovery of new genes, pathways, and taxa (Table 1). This category includes a variety of molecular techniques, such as high-throughput sequencing technologies, screening for functional expression, fingerprinting methods, and mass spectrometry-based proteomic and metabolomic approaches.

TABLE 1

Key differences among open and closed high-throughput platforms for microbial community analysis

Step or parameter	Characteristic or consideration	Open format: TGS	Open format: SMS	Open format: MTS	Closed format: FGAs	Closed format: PGAs	Comments
Sample preparation and analysis	Sample/target preparation	Complicated	Simple	Very complicated	Simple	Simple	DNA/RNA quality is important for all approaches
	Analysis of multiplex samples per assay	Large potential	Medium potential	Medium potential	Low (only one or two)	Low (only one or two)	FGAs and PGAs use 1 or 2 dyes for labeling, and it is difficult to multiplex samples in a single assay
	PCR amplification or whole-genome analysis	Yes	No	No	No/yes	Yes/no	Amplification introduces major problems for quantification
	Potential uneven hybridization	NA	NA	NA	Yes	Yes	Signal normalization is needed within and between arrays to correct signal differences due to systematic errors
Data processing and analysis	Raw data processing	Relatively easy	Difficult	Difficult	Easy	Easy	A major challenge for SMS and MTS with large raw datasets
	Phylogeny	Yes	Some	Some	No/yes	Yes	GeoChip uses gyrB for phylogeny
	Taxonomic resolution	Strain, species, genus	Strain, species	Strain, species	Strain, species	Genus, family	It depends on molecular markers with high resolution for functional genes
	Functional features	No/yes	Yes	Yes	Yes	No	TGS can analyze DNA and RNA for functional genes
	Signal threshold	Yes	NA	NA	Yes	Yes	Both PGAs and FGAs require a threshold to call positive signals, which is more or less arbitrary. Thus, some ambiguity exists for positive or negative spots.
	Requires a priori knowledge	No/yes	No	No	Yes	Yes	Closed-format technologies are designed based on known sequences
	Analysis of α diversity	Very good	Good	Very poor	Fair	Fair	Here, α diversity estimation is based on a single gene
	Data comparison across samples	Moderate	Difficult	Difficult	Easy	Easy	Random or undersampling is a major issue for open-format approaches
Performance	Coverage/breadth (no. of different genes detected)	Very low	High	High	High	Very low	TGS can analyze phylogenetic or functional genes
	Sampling depth (no. of sequences or OTUs per gene)	Very high	Low/medium	Low/medium	Medium	High	The sampling depth for closed-format approaches depends on the number of probes used
	Detection of rare species/genes	Medium	Difficult	Difficult	Easy	Easy	Easy for closed format as long as the appropriate probes are present
	Quantification	Low	Not known	Not known	High	Low/medium	Not rigorously tested for SMS and MTS; for PhyloChip, if RNA is used instead of DNA (no PCR step), quantification is high
	Susceptibility to the artifacts associated with random sampling process	Medium	High	High	Low	Medium/low	A major problem for sequencing approaches; PCR amplification may be involved in PhyloChip
	Potential discovery of novel genes/species	Yes	Yes	Yes	No	No
	Results skewed by dominant populations	Yes	Yes	Yes	No	No
	Sensitivity to (host) DNA/RNA contamination	No/yes	Yes	Yes	No	No	Difficult to remove host DNA/RNA contamination
Applicability and cost	Most promising applications	In-depth studies of microbial diversity or specific functional groups and discovery of novel genes	Surveys of microbial genetic diversity of unknown communities and discovery of novel genes	Surveys of functional activity of unknown microbial communities and discovery of novel genes	Comparisons of functional diversity and structure of microbial communities across many samples	Comparisons of taxonomic or phylogenetic diversity and structure of microbial communities across many samples	The choice of technology mainly depends on the biological questions and hypotheses to be addressed
	Relative cost per assay	Medium	High	High	Low	Low	It is challenging to make general statements of cost because they depend on technology platforms, depth of analysis, and approaches used for processing and analyzing data
	Cost per sample ($)	30–150	1200–4000	1500–4500	150–800	150–1000	This is only based on the cost of materials for target gene amplicon preparations and sequencing.
	Cost for bioinformatic analysis	Medium	High	High	Low	Low

Since the various technologies have different features, it is difficult to make a straightforward, point-by-point comparison. Thus, we attempt to highlight the major differences among the technologies in a general sense. We focus on the issues important to microbial ecology within the context of environmental applications and complex microbial communities like those in soil rather than listing the differences among the technologies in a comprehensive manner.

TGS, target gene (e.g., 16S rRNA, amoA, nifH) sequencing; SMS, shotgun metagenome sequencing; MTS, metatranscriptome sequencing; FGAs, functional gene arrays: the listed analysis is mostly based on GeoChip; PGAs, phylogenetic gene arrays: the listed analysis is mostly based on PhyloChip; NA, not applicable.

“Closed format” refers to the detection technologies whose range of potential experimental results is defined prior to performing the analysis, and thus, the experimental outcome is considered closed. For example, when a functional gene array containing 10,000 probes is used for analyzing a microbial community, the experimental results from this sample cannot go beyond the detection capability of the probes (10,000) fabricated on the array. The main features of technologies of this type are that they require a priori sequence information (16, 17) and they do not provide new molecular information because all molecules used for designing the querying devices are known. DNA arrays (32, 33), protein arrays (36), carbohydrate arrays (37), phenotype arrays (38), and BioLog EcoPlates (27), as well as quantitative PCR, are all considered closed-format technologies. Open- and closed-format technologies typically differ in sample preparation and quality control, data processing and analysis, performance, and application (Table 1), and each presents advantages and limitations. 
In the following discussion, we compare features important to meaningful applications of each platform by giving special consideration to their usefulness in analyzing complex microbial communities like those in soils. Since next-generation sequencing and microarrays are the best and most widely used representatives of open- and closed-format technologies, respectively, our comparison and discussion are primarily focused on these technologies.
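
The distinction between the two formats can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the taxon names, the probe set, and both function names are hypothetical, and real detection also depends on abundance, hybridization, and sampling depth, none of which is modeled here.

```python
def closed_format_detect(community_taxa, probe_targets):
    """Closed format: only taxa with a matching probe on the array can be reported."""
    return set(community_taxa) & set(probe_targets)


def open_format_detect(sampled_taxa):
    """Open format: any taxon that happens to be sampled can be reported."""
    return set(sampled_taxa)


# Hypothetical community and probe set, invented for illustration.
community = {"taxon_A", "taxon_B", "novel_taxon_X"}
probes = {"taxon_A", "taxon_B", "taxon_C"}

detected_closed = closed_format_detect(community, probes)
detected_open = open_format_detect(community)

print("closed format reports:", sorted(detected_closed))
print("open format reports:  ", sorted(detected_open))
```

In this sketch the taxon without a probe is invisible to the closed-format assay, which mirrors the statement that array results cannot go beyond the probes fabricated on the array.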

SEQUENCING-BASED HIGH-THROUGHPUT MOLECULAR TECHNOLOGIES FOR MICROBIAL COMMUNITY ANALYSIS

Sequencing technologies and applications.

Several high-throughput sequencing platforms have been developed and are widely used, including the Illumina (e.g., HiSeq, MiSeq), Roche 454 GS FLX+, SOLiD 5500 series, and Ion Torrent/Ion Proton platforms. The advantages and limitations of these platforms are detailed elsewhere (18, 29, 30, 39–42). Currently, the majority of microbial ecology studies apply high-throughput sequencing by focusing on either targeted gene sequencing with phylogenetic (e.g., 16S rRNA) (29, 43) or functional (e.g., amoA, nifH) (44, 45) gene targets or on shotgun metagenome sequencing (Fig. 1a). For targeted gene sequencing, community DNA is extracted from environmental samples (e.g., samples from soils, sediments, water, bioreactors, or humans) using various extraction and purification methods (46, 47). After high-quality DNA is obtained, targeted genes can be amplified with conserved primers. Each set of primers is generally barcoded with short oligonucleotide tags (6- to 12-mer), as well as sequencing adapters, so that multiple samples can be pooled and sequenced simultaneously (29, 43). Then, after nontarget DNA fragments are removed by gel electrophoresis, target DNA is quantified, sequenced, and analyzed using bioinformatic approaches, such as operational taxonomic unit (OTU) assignment, sequence assembly, phylogeny, and annotation (Fig. 1a) (41).
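
The pooling step relies on reads being sorted back to their samples by barcode after sequencing. A minimal sketch of that sorting is given below, assuming exact matching on a leading barcode; the 8-mer barcodes, sample names, read sequences, and the `demultiplex` helper are all invented for illustration, and production pipelines typically also tolerate barcode mismatches and trim primers and adapters.

```python
from collections import defaultdict

# Hypothetical 8-mer barcodes (within the 6- to 12-mer range described above).
BARCODES = {
    "ACGTACGT": "soil_A",
    "TGCATGCA": "soil_B",
}
BARCODE_LEN = 8


def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Assign each read to a sample by exact match on its leading barcode.

    Returns a dict mapping sample name -> list of reads with the barcode
    removed. Reads whose prefix matches no barcode are kept untrimmed
    under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)


# Toy pooled run: three barcoded reads and one read with an unknown prefix.
pooled = [
    "ACGTACGT" + "GGATTAGATACCC",
    "TGCATGCA" + "GGATTAGATACCC",
    "ACGTACGT" + "CCTACGGGAGGCA",
    "NNNNNNNN" + "GGATTAGATACCC",
]
result = demultiplex(pooled)
for sample, sample_reads in sorted(result.items()):
    print(sample, len(sample_reads))
```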

FIG 1

Key steps of high-throughput metaomic technologies for microbial community analysis. (a) Sequencing-based open-format technologies. Extracted DNA/RNA samples are prepared for sequencing by target gene sequencing (TGS), shotgun metagenome sequencing (SMS), and/or metatranscriptome sequencing (MTS). RT, reverse transcription. (b) Data processing and analysis. Both sequencing- and microarray-based data are processed and then statistically analyzed to address specific microbial ecology questions related to community diversity, composition, structure, function, and network, as well as their linkages with environmental factors. (c) Array-based closed-format technologies. For the GeoChip and PhyloChip, extracted DNA is directly labeled and hybridized, while RNA is first reverse transcribed (RT) to cDNA. DNA and RNA can be amplified by whole-community genome amplification (WCGA) or by whole-community RNA amplification (WCRA), respectively, when there is not enough mass for direct hybridization, but this compromises quantification. Images from both arrays are digitized for further data processing and statistical analysis.

Although targeted gene sequencing is a powerful tool for providing information on specific genes within a microbial community, its suitability for analyzing the whole genetic and functional diversity of communities is limited (18). To query broader characteristics and identify novel genes, shotgun metagenome sequencing has been widely used (10, 28, 48–50). Briefly, community DNA is randomly sheared using various methods, including nebulization, endonucleases, or sonication (Fig. 1a) (40). The sheared fragments are end repaired prior to ligation to platform-specific adaptors, which serve as the priming sites for template amplification (40). A transposon-based approach for simultaneous fragmentation and tagging has also become available (40). 
Subsequent sequencing produces vast amounts of short reads, which can be assembled and annotated for functional characterization (41, 51). The shotgun metagenomic sequencing approach provides community-level information in complex environments with thousands to millions of different archaeal, bacterial, and eukaryotic species (52, 53), such as soil (49), ocean (10, 28), groundwater (54), cow rumen (50), and human microbiome (48), although short read sequences from complex communities cannot always be assembled and only a fraction may be useful for functional or phylogenetic analyses. Targeted and shotgun sequencing of DNA provide snapshots of the gene content and genetic diversity of microbial communities but cannot distinguish between expressed and nonexpressed genes in a given environment. In contrast, metatranscriptomic sequencing (i.e., metatranscriptomics) involves random sequencing of expressed microbial community RNA (Fig. 1a) (31, 55–57). Typically, total RNA extracted from microbial communities is dominated by rRNA, which must be removed to obtain high levels of mRNA transcripts (55, 58). Then, the remaining RNAs are reverse transcribed into cDNAs, ligated to adapters, and sequenced (Fig. 1a) (55, 58). Metatranscriptomic studies have provided insight into microbial community functions and activities from diverse habitats, including soil (59), sediment (60), seawater (31, 57), gut microbiomes (61, 62), and activated sludge (63). However, major challenges include the inherent lability of mRNA, requiring proper nucleic acid stabilization and storage procedures to obtain sufficient quantities of high-quality mRNA. Furthermore, mRNA is still one or more steps away from actualized microbial community functions. Therefore, the further development of proteomics and metabolomics is important to understand microbial community functions in the environment.

Key features of sequencing-based open-format detection technologies.

One of the most appealing features of the sequencing-based open-format approaches is that they are ideal for novel discovery (Table 1). Many new genes, phylotypes, regulators, and/or pathways have been discovered using shotgun metagenome sequencing (48–50, 64). For example, in cow rumen samples, 15 uncultured microbial genomes involved in biomass decomposition were reconstructed along with 27,755 putative carbohydrate-active genes, dozens of which were demonstrated to exhibit carbohydrate-degrading activity despite a <55% average amino acid similarity to known proteins (50). Based on mate-paired short-read oceanic metagenomes, the genome of an uncultured member of a novel class of marine photoheterotrophic Euryarchaeota was reconstructed (65). Sequence analyses of this genome also suggested that proteorhodopsin (28, 66, 67) appears to be of euryarchaeal origin. Metatranscriptomics has also provided new insights into microbial community activities and functions, as well as discovery of novel genes and regulatory elements. For example, the first metatranscriptomic analysis of seawater communities demonstrated that this technique is capable of detecting novel gene- and taxon-specific expression patterns and led to the discovery of novel gene categories undetected in previous DNA-based surveys (31). Subsequently, Shi et al. employed metatranscriptomics to discover well known small RNAs and previously unrecognized putative small RNAs in the ocean’s water column (57). More recently, Haroon et al. (68) used a combination of metagenomics and metatranscriptomics to demonstrate a novel archaeal pathway for anaerobic oxidation of methane coupled with nitrate reduction in an anaerobic bioreactor. Another distinguishing characteristic of the sequencing-based open-format approaches is in the assessment of α and γ diversity. 
While α diversity is the diversity within a particular area or ecosystem, which is usually expressed as the number of taxa and abundance of each taxon within a community, γ diversity refers to the overall total diversity of taxa/genes for the different ecosystems within a region. Since new genes and taxa can be detected by sequencing-based open-format technologies, deep sequencing of phylogenetically informative genes (e.g., 16S rRNA) or functional genes (e.g., nifH, amoA) is more suitable for estimating α and γ diversity of microbial communities at the whole-community level or functional-population level. With current high-throughput technologies, it is possible to recover substantial portions of the microbial diversity in complex communities, even if only a few samples are analyzed. For instance, deep pyrosequencing analysis of amoA gene fragments in soil communities identified novel amoA sequences and previously undiscovered phylogenetic lineages (44, 45). In addition, many samples can be multiplexed for analysis in a single assay by targeted gene sequencing (29), and so, the experimental cost per single assay or per sample can be very low for this technique (Table 1). There are distinct differences between targeted and shotgun metagenome sequencing approaches in terms of sample preparation, sequence output, and data analysis (Fig. 1a), and some of these differences are particularly important for microbial ecology research (Table 1). Targeted sequencing can provide greater depth of coverage for specific gene(s) of interest (e.g., nifH), while shotgun metagenome sequencing captures information about the community as a whole, as well as divergent homologs not captured by the primers employed. Functional metagenomics can be treated as another open-format approach that does not presuppose or require sequence information, providing the opportunity for novel discovery and representing a powerful complement to shotgun sequencing. 
This approach involves screening cloned DNA for expressed functional activity in a surrogate host cell (12, 69, 70). Given that the majority of genes in most metagenomic databases do not have homologs with biochemically characterized functions, the opportunity for discovery in the metagenomic sequence space is vast. Although this approach has been used to successfully discover new biosynthetic enzymes (71), degradative enzymes (11, 72), and antibiotics (73, 74), active clones are identified at low frequency (typically 1 clone in 10,000 to 100,000 is active). Selective screening, e.g., using antibiotic resistance, can facilitate screening libraries containing 10⁷ or more clones. Functional metagenomic studies of antibiotic resistance in soil (70, 75, 76), water (77), and insect, bird, pig, cow, and human microbiomes (78–80) have yielded a new understanding of the genes encoding antibiotic resistance in natural and managed environments and provide the basis for comparing frequencies of antibiotic resistance among habitats.
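
To see why library size matters at such low hit frequencies, the following sketch computes the expected number of active clones and the chance of recovering at least one. It assumes, purely for illustration, that every clone is active independently with the same probability; the function name `screening_stats` and the chosen library sizes are hypothetical and not taken from the studies cited above.

```python
import math


def screening_stats(library_size, hit_frequency):
    """Return (expected active clones, probability of at least one active clone).

    Assumes each clone is active independently with probability
    `hit_frequency`, a simplification introduced for this sketch.
    """
    expected = library_size * hit_frequency
    # 1 - (1 - p)^N, written with log1p/expm1 to stay accurate for tiny p.
    p_at_least_one = -math.expm1(library_size * math.log1p(-hit_frequency))
    return expected, p_at_least_one


# Hit frequencies quoted in the text: 1 in 10,000 and 1 in 100,000.
for library_size in (10_000, 100_000, 10_000_000):
    for hit_frequency in (1e-4, 1e-5):
        expected, p_any = screening_stats(library_size, hit_frequency)
        print(f"N={library_size:>10,}  p={hit_frequency:.0e}  "
              f"expected={expected:10.1f}  P(at least one)={p_any:.4f}")
```

Under these assumptions, small libraries are likely to contain no active clone at the lower frequency, whereas a library of 10⁷ clones is expected to contain on the order of hundreds.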

Challenges and limitations associated with open-format techniques.

The open-format techniques described above each have their challenges. Some of the major technical challenges for targeted gene sequencing are bias caused by PCR amplification (81–84), sequencing errors, and chimeric sequences (83, 85, 86). In one study, based on 90 identical mock community samples, the average error rate in 16S rRNA pyrotag sequences was 0.6%, and the chimera rate was 8% (83). Sequencing errors have been reduced 30-fold (from 0.6 to 0.02%) by the use of effective sequence analysis pipelines (83, 86). Recently, low-error amplicon sequencing approaches have been developed for human and plant microbiome studies (87, 88). Although the sequencing errors and chimera rates are less problematic for analyses based on assembled sequences, due to sequence overlap and redundancy, they are challenging in studies based on individual sequence reads (83). Sequence errors and chimeras can generate numerous spurious OTUs, which can inflate community diversity estimates by as much as 2 orders of magnitude (16, 82, 86). There is an intense debate regarding how much of the “rare biosphere” is due to sequencing artifacts (43, 82, 83). Thus, great caution and attention to denoising the data are needed when using high-throughput sequencing technologies for estimating microbial community diversity. Another technical challenge for amplicon-based sequencing approaches can be low reproducibility (84, 85, 89–93, 163–165) and poor quantitation (89) due to the artifacts associated with inadequate random sampling (89, 94, 95), amplification biases (82, 83), and/or sequencing errors (83). For example, both the subset of 16S molecules that is amplified and the subset of tagged amplified fragments that is attached to the surface of the flow cell (e.g., Illumina) or allocated to beads (e.g., 454) for sequencing are random and follow a Poisson random sampling distribution (95). 
How such artifacts associated with inadequate molecular-level sampling can lead to low technical reproducibility was described with an analogy to reading random words in a book (96) and explicitly demonstrated by recent mathematical modeling and simulations (95). To better visualize the potential differential effects of inadequate random sampling on open- and closed-format detection, it is useful to consider a hypothetical community. We assume that such a microbial community has 50 exponentially distributed taxa with 5,000 individuals (or 16S rRNA molecules) (Fig. 2a), and the community is sampled twice with 1% effort (i.e., 50 individuals) as technical replicates (Fig. 2b). Due to the molecular-level random sampling artifacts generated by insufficient sequences to represent all taxa, the taxon membership and abundance distribution are quite different between these two samples even though they are from the same community (Fig. 2b). Based on mathematical simulation, the overlap between these two samples is approximately 50% (Fig. 2d), which is consistent with experimental observations (84, 85, 89–93). However, as the sampling effort increases, the overlap between samples increases, achieving 95% overlap between two samples with ~20% of the community sampled. If all individuals are effectively sampled, erasing all the random sampling artifacts, 100% overlap is theoretically expected. For one real soil microbial community, on average, more than 60,000 16S rRNA sequences per sample were needed to achieve 90% OTU overlap among three technical replicates (95). Due to artifacts associated with inadequate random sampling, PCR amplification biases, chimeras, and/or sequencing errors, amplicon-based target sequencing is not considered quantitative (85, 89). This is consistent with the results of previous pyrotag sequencing studies (81) and with a general consensus that conventional PCR amplification of the template can introduce significant biases and artifacts (97).
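
The thought experiment above can be reproduced approximately with a short simulation. The sketch below is not the authors' model: it assumes abundances that decay exponentially with taxon rank using an arbitrary decay parameter of 0.1, samples individuals without replacement, and measures overlap with the Jaccard index only; the function names are hypothetical. No particular overlap values are claimed, since they depend on these choices.

```python
import math
import random


def build_community(n_taxa=50, n_individuals=5000, decay=0.1):
    """Return a list with one taxon label per individual.

    Abundances fall off exponentially with rank; `decay` is an assumed
    value chosen for illustration. Rounding means the total is only
    approximately `n_individuals`.
    """
    weights = [math.exp(-decay * rank) for rank in range(n_taxa)]
    total = sum(weights)
    community = []
    for taxon, weight in enumerate(weights):
        count = max(1, round(n_individuals * weight / total))
        community.extend([taxon] * count)
    return community


def jaccard(sample_a, sample_b):
    """Shared taxa divided by all taxa seen in either sample."""
    a, b = set(sample_a), set(sample_b)
    return len(a & b) / len(a | b)


def mean_overlap(community, effort, trials=200, seed=0):
    """Average Jaccard overlap between two technical replicates.

    Each replicate draws a fraction `effort` of the individuals at
    random, without replacement, from the same community.
    """
    rng = random.Random(seed)
    k = max(1, int(len(community) * effort))
    total = 0.0
    for _ in range(trials):
        total += jaccard(rng.sample(community, k), rng.sample(community, k))
    return total / trials


community = build_community()
for effort in (0.01, 0.05, 0.20, 0.50, 1.00):
    print(f"sampling effort {effort:>4.0%}: "
          f"mean Jaccard overlap = {mean_overlap(community, effort):.2f}")
```

Raising the sampling effort in this toy model increases the overlap between replicates, and sampling every individual gives complete overlap by construction, in line with the qualitative argument in the text.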

FIG 2

Illustration of random sampling processes and their impacts on the analysis of microbial communities using open- and closed-format metagenomic technologies. (a) A theoretical community contains 50 taxa with 5,000 individuals and follows an exponential distribution, λe^(−λx) (λ = 0.01 in this case). The taxa are ranked based on their abundance. Two technical replicates of this community are taken for analysis (sample I and sample II). Also, assume that a microarray is constructed, covering about 50% of the taxa, as indicated by asterisks (*). (b) For sequencing, 1% sampling effort is performed. Overlapping taxa detected in the two samples are indicated by carets (^). (c) The community DNA is directly labeled and hybridized with the microarrays. Because some populations are below the detection limit, only certain portions are detected. Overlapping taxa detected in the two samples are also indicated by carets (^). In both cases (b and c), similar numbers of taxa were detected. (d and e) Jaccard and Bray-Curtis overlaps for the open- and closed-format technologies.

Targeted sequencing of functional genes can provide important functional gene information from microbial communities (45, 98); however, there are several challenges associated with this approach. First, widespread lack of sequence conservation across functionally homologous genes can make PCR primer design difficult, leading to lack of detection of relevant functional genes in the environment. Second, even though fairly conserved primers can be designed for some functional genes of interest (e.g., amoA, nifH, nirS, nirK), the success of amplification is habitat/ecosystem dependent, most likely due to variations in the quality of extracted DNA, community complexity, sequence divergence, and target gene abundance. As a result, comparative studies can be compromised or impossible (99). 
In addition, preparing high-quality libraries of amplified PCR products for various functional genes from multiple samples is often difficult. Nonspecific amplification requires the tedious and time-consuming step of additional gel purification of PCR products prior to sequencing, which could substantially slow down the sequencing process as a whole. Shotgun metagenomic sequencing avoids many of the biases encountered in amplicon sequencing because it does not require amplification prior to sequencing. While it often fails to provide sufficient sequence depth to assemble and model the genomes of individual species (41, 100), especially in complex microbial communities like those found in soils, whole-genome recovery from ever more complex communities is now possible (50, 64, 101). Another obstacle to adequate sequence coverage is contaminant DNA, particularly in host-associated microbiome studies, where sequence data may be predominantly from the host (41, 102). Sequence-based open-format approaches can also be impaired by dominant populations in the sample, which may be excessively oversampled. In metatranscriptomic studies, this issue can be compounded by high rRNA abundance (55). Data analysis can be challenging for the open-format sequencing technologies, particularly shotgun sequencing data, as the assembly and analysis of large sequencing data sets are computationally demanding and often require specialized computing hardware (50, 51, 64). Many genome-oriented analyses of interest are still impractical with short reads alone (102). Also, although many studies are focused on single-read-based analysis, statistical analysis of large short read datasets is time consuming and sometimes only a fraction of reads are usable for biological inference (103), depending on the length of the reads and the availability of representative reference genomes. 
With frequent changes in technology, there may be little consensus on appropriate procedures for quality filtering and statistical validation. However, with recent rapid advances in both hardware and software for data analysis, plus an ever-growing genome database, sequence data analysis is constantly improving. In addition, for MiSeq-based target gene sequencing data, considerable variations (up to 10-fold) of the estimated OTU numbers can be obtained from the same data set with different computational software tools (e.g., UCLUST versus UPARSE) (86), which presents a challenge for microbial diversity assessments; however, an increasing number of controlled benchmarking experiments are addressing these issues. The challenges of functional metagenomics are largely associated with barriers to heterologous gene expression. Transcription and translation machinery of the surrogate host must recognize cues in the foreign DNA, and authentic posttranslational modification, protein secretion, and/or availability of precursors for synthesis of active small molecules may not be sufficiently coordinated to enable detection of the active product (104, 105). These challenges have been addressed using phylogenetically diverse hosts (106) and promoters tailored to the host species (107, 108). Functional metagenomics is also laborious and time consuming, providing deep information about a small collection of clones that is in sharp contrast with the expansive views provided by high-throughput sequencing or microarrays. Yet the functional analysis of novel gene products that lack sequence similarity to genes of known function is necessary to illuminate the contents of the vast collection of genes with no known functions that are now in metagenomic databases.

CLOSED-FORMAT MICROARRAY-BASED HIGH-THROUGHPUT DETECTION APPROACHES FOR MICROBIAL COMMUNITY ANALYSIS

Array-based detection technologies.

Various types of DNA microarrays have been developed for microbial detection and community analyses (109), including phylogenetic and functional gene arrays as two main categories. Phylogenetic gene arrays often target rRNA genes, which are useful for identifying specific taxa within microbial communities and studying phylogenetic relationships among different microorganisms. Different types of phylogenetic gene arrays have been developed for microbial ecology applications, such as the PhyloChip (32) that broadly targets known taxa, a microbiota microarray (110) targeting human gut microbiomes, COMPOCHIP targeting compost-degrading microbial communities (111), and SRP-PhyloChip for detecting sulfate-reducing microorganisms (112). PhyloChip is the most comprehensive and widely used phylogenetic gene array. It is a photolithographic Affymetrix-based technology with 25-mer oligonucleotide probes to discriminate 16S rRNA gene sequences in microbial communities. The most recent version of the PhyloChip (G3) has probes targeting ~60,000 operational taxonomic units (OTUs), representing 2 domains (Archaea and Bacteria), 147 phyla, 1,123 classes, 1,219 orders, 1,464 families, and 10,993 subfamilies (32). Generally, 16S rRNA genes are extracted and PCR amplified from microbial community DNA and then biotin labeled for PhyloChip hybridization and digital image detection (Fig. 1) (32, 113, 114). Functional gene arrays contain probes targeting genes involved in various biogeochemical cycling processes or specific genomes (115), pangenomes (116), or metagenomes (117), which are useful for monitoring the functional composition and structure of microbial communities (Fig. 1c). 
Over the past decade, different types of functional gene arrays have been developed, including GeoChip, a generic functional array targeting hundreds of functional gene categories for biogeochemical, ecological, and environmental analyses (33, 118), as well as arrays for detecting specific functional processes, such as nitrogen cycling (119, 120), methanotrophy (121), virulence (122, 123), stress responses (124), hydrogen production and consumption (125), marine microbial communities (117), and bioleaching potential (Fig. 1c) (126). The most recent GeoChip (version 5.0) contains about 167,000 50-mer oligonucleotide probes covering ~395,000 coding sequences from >1,590 functional genes related to microbial (archaea, bacteria, fungi, and protists) carbon, nitrogen, sulfur, and phosphorus cycling, energy metabolism, antibiotic resistance, metal homeostasis and resistance, secondary metabolism, organic remediation, stress responses, bacteriophages, and virulence. GeoChip also uses phylogenetic markers like gyrB rather than 16S rRNA genes for fine-level phylogenetic analysis (33, 127). GeoChip probes are designed from sequences retrieved from public databases using the CommOligo program (128). Once probes are selected, microarrays are spotted or photolithographically manufactured (e.g., Roche NimbleGen and Agilent). In general, community nucleic acids are extracted, directly labeled with fluorescent dyes, hybridized with GeoChip, and digitally imaged (Fig. 1). Specificity, sensitivity, and quantitation are critical parameters for any technique used to detect and monitor microorganisms in natural environments, due to the presence of numerous orthologous sequences for each gene in a sample (33). Extremely stringent conditions can improve microarray hybridization specificity, generating results that can be species/strain specific (33, 129). Also, only moderate amounts of total community DNA are needed for PhyloChip and GeoChip analyses. 
For instance, 0.5 to 2.0 µg of PCR amplicons or ~2.0 µg of total RNA are generally needed for PhyloChip hybridization (113, 114), and the PhyloChip exhibits a detection limit of 10^7 copies or 0.01% of nucleotides hybridized to the array (114, 130). For GeoChip hybridization, samples comprising 0.2 to 2.0 µg of DNA or 2 to 5 µg of total RNA (33, 118) are needed, depending on the array format. If the amount of community DNA or RNA is not sufficient, it can be amplified using whole-community genome amplification (131) or whole-community RNA amplification (132), with initial DNA concentrations as low as 10 fg (~2 bacterial cells) resulting in positive detection but not accurate quantification (131). With appropriate amounts of unamplified material, reliable quantitation can be obtained with microarrays like the GeoChip (33) and PhyloChip (130). For example, GeoChip-based studies have shown good correlations between target DNA or RNA concentrations and hybridization signal intensities using pure cultures, mixed cultures, and environmental samples without amplification (33, 129–132) over DNA input amounts varying by 5 orders of magnitude (0.01 to 500 ng) (131). Good correlations have also been reported between PhyloChip signal intensities and quantitative PCR copy numbers over 5 orders of magnitude (130, 133). It should be noted that PCR amplification biases also occur with the PhyloChip-based detection approach if the 16S rRNA genes are PCR amplified for detection prior to hybridization. Recently, two PCR-independent methods have been developed as viable alternatives to PCR-amplified microbial community analysis for PhyloChip analysis (113, 114).

Key features of array-based detection.

Technical reproducibility in array-based closed-format technologies is less affected by inadequate random sampling than open-format sequencing technologies. To better illustrate this point, we return to the hypothetical community described above (Fig. 2a) to analyze it with a microarray-based technology. Even if the arrays only have probes covering half of the taxa in the community, the simulated overlap between two replicate samples is expected to be above 90% (Fig. 2e). However, taxa with no probes or taxa whose abundance is below the array detection limit will remain undetected by the microarrays (Fig. 2c). The number of taxa detected is defined by the probe sets on the array, and the overlap between samples is less dependent on the level of sampling effort. Furthermore, because hybridization is reasonably quantitative, the taxon identities and abundance distribution are very similar between replicates. Consequently, depending on the sampling coverage of microbial communities, technical reproducibility can be a significant issue in open-format approaches, while it is minimized in closed-format approaches (94, 134). As a consequence, open- and closed-format detection can yield different results when they are used for comparing microbial community structure. This could be particularly important in examining microbial taxa-area relationships (TARs), one of the best studied and documented patterns in biogeography (94, 134), because taxon richness data are used. 
The lower susceptibility to random sampling artifacts associated with closed-format-based detection approaches renders them better suited for assessing β diversity, which describes the site-to-site variability in taxon/gene/population composition among communities (89, 94, 95, 117), as well as for detecting low-abundance organisms (117, 135). Another feature of the array-based closed-format detection is that it is less affected by dominant genes/populations because, although detection is confined to the defined probe set (Table 1), even low-abundance populations present at numbers above the detection limit will be detected (135). Unlike the sequencing-based open-format detections, the array-based closed-format detections are also less susceptible to contaminant DNAs or rRNAs because only targeted nucleic acids generate signals and, hence, interference from the contaminating nucleic acids is minimal (133). Compared to other high-throughput technologies that target a single gene, such as targeted sequencing and phylogenetic gene arrays, functional gene arrays have several unique features (Table 1). First, they are capable of simultaneously identifying and quantifying many microbial functional genes/pathways that are important for biogeochemical, environmental, and ecological processes, which is critical for ecosystem-level studies, functional biodiversity (136), and trait-based microbial biogeography (137). In contrast, 16S rRNA gene-based techniques do not provide functional information. Second, functional gene arrays can have higher taxonomic resolution than the 16S rRNA gene-based approaches because functional gene markers are generally more divergent than phylogenetic gene markers (129). High taxonomic resolution is important for differentiating treatment effects and examining fine-scale biogeographical patterns. 
Moreover, technologies that do not require PCR amplification can provide reliable quantitative information on the genes detected (8, 89, 129) across space, time, and environmental gradients. However, unlike the 16S rRNA gene-based technologies, functional arrays may not be suitable for providing phylogenetic information at high taxonomic levels (e.g., family and above), due to faster molecular evolution (i.e., rapid mutation saturation), lack of representation on the array, and complications associated with horizontal gene transfer for some functional genes, especially for the genes involved in metal resistance, antibiotic resistance, and contaminant degradation. Rapid mutational saturation of the functional genes could make them less suitable for broad-scale (e.g., continental) microbial biogeographical investigations because the functional genes among various communities could diverge too quickly to preserve signals that would be reliable for resolving broad-scale biogeographical patterns. The beneficial characteristics of closed-format technologies, including high throughput, low detection limits, high reproducibility, and/or potential for quantification, enable them to provide novel insights into specific ecosystems of interest. For instance, surprisingly rich and diverse metabolic reservoirs of microbial communities were revealed using these technologies in a hydrothermal vent chimney (135), Antarctic dry valleys (138), and urban aerosols (130). The importance of stochasticity in controlling ecological diversity and succession was also recently demonstrated by GeoChip-based functional community structure data (139, 140).
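
The random sampling argument made for the hypothetical community of Fig. 2 can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the procedure used to produce the figure: the exponential rank-abundance weights, the rule that every other taxon has a probe, the detection limit of 85 individuals, and the 2% log-normal hybridization noise are all illustrative choices, and every function name is hypothetical.

```python
import numpy as np

def exponential_community(n_taxa=50, n_individuals=5000, lam=0.01):
    """Expected abundance of each ranked taxon, proportional to lam*exp(-lam*rank)."""
    ranks = np.arange(1, n_taxa + 1)
    weights = lam * np.exp(-lam * ranks)
    return n_individuals * weights / weights.sum()

def jaccard(x, y):
    """Taxa detected in both replicates divided by taxa detected in either."""
    a, b = x > 0, y > 0
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def bray_curtis_similarity(x, y):
    """One minus the Bray-Curtis dissimilarity of two abundance vectors."""
    total = x.sum() + y.sum()
    return float(2.0 * np.minimum(x, y).sum() / total) if total else 1.0

def sequence_replicate(abund, effort, rng):
    """Open format: reads are drawn at random in proportion to abundance."""
    n_reads = int(round(effort * abund.sum()))
    return rng.multinomial(n_reads, abund / abund.sum())

def array_replicate(abund, has_probe, limit, noise_sd, rng):
    """Closed format: noisy signal, kept only for probed taxa above the limit."""
    signal = abund * rng.lognormal(0.0, noise_sd, size=abund.size)
    signal[~has_probe] = 0.0
    signal[signal < limit] = 0.0
    return signal

def compare_formats(n_pairs=200, seed=0):
    """Mean overlap between two technical replicates, averaged over n_pairs trials."""
    rng = np.random.default_rng(seed)
    abund = exponential_community()
    has_probe = np.arange(abund.size) % 2 == 0  # assumption: probes for every other taxon
    scores = {"seq_jaccard": [], "seq_bray": [], "array_jaccard": [], "array_bray": []}
    for _ in range(n_pairs):
        s1 = sequence_replicate(abund, 0.01, rng)  # 1% sampling effort, as in Fig. 2b
        s2 = sequence_replicate(abund, 0.01, rng)
        a1 = array_replicate(abund, has_probe, 85.0, 0.02, rng)  # assumed limit and noise
        a2 = array_replicate(abund, has_probe, 85.0, 0.02, rng)
        scores["seq_jaccard"].append(jaccard(s1, s2))
        scores["seq_bray"].append(bray_curtis_similarity(s1, s2))
        scores["array_jaccard"].append(jaccard(a1, a2))
        scores["array_bray"].append(bray_curtis_similarity(a1, a2))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

if __name__ == "__main__":
    for name, value in compare_formats().items():
        print(f"{name}: {value:.2f}")
```

In this sketch the sequencing replicates disagree mainly because only about 50 reads are drawn from 50 taxa, whereas the array replicates disagree only for taxa whose signal lies close to the detection limit. Changing the assumed limit or noise level changes the numbers, but not the direction of the contrast.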

Challenges and limitations of array-based closed-format detection technologies.

Unlike the sequencing-based open-format detection technologies, one of the main drawbacks of the closed-format technologies is that they do not enable novel discoveries, such as new genes, taxa, and/or regulatory elements. This is because the input required for array construction must be based upon known sequence information. Thus, the closed-format approaches are not suitable for novel explorations. Another major limitation of the array-based closed format is that all of the probes on the arrays are derived from a chosen set of genes/sequences that do not necessarily represent the known diversity of the microbial communities of interest. As a result, closed-format technologies will fail to detect potentially important taxa not represented on the microarrays, potentially underestimating the diversity of microbial communities. Thus, it is necessary to continuously update closed-format technologies to reflect the expanding knowledge generated by open-format technologies. Since high-throughput sequencing is ideal for characterizing diversity and discovering new genes, while functional metagenomics assigns function to genes of previously unknown function, coupling high-throughput sequencing approaches, functional expression, and array hybridization is desirable for describing microbial community structure, function, and activity in a comprehensive manner that includes both depth and breadth, as well as quantitative and qualitative surveys. Although many technical challenges regarding environmental applications of microarrays have been solved over the last decade, several critical bottlenecks still limit the technology. One critical issue is the designing of oligonucleotide probes specific to the target genes/microorganisms of interest when sequences of a particular phylogenetic/functional gene are highly homologous and/or incomplete. 
This is especially challenging when using arrays for analyzing complex natural systems, since the majority of microorganisms (1, 2) are not yet cultivated and, even among cultured organisms, the biochemical functions of many genes have not been assigned, dramatically compounding this issue. In addition, due to the variability of reagents (e.g., dyes) and hybridization dynamics, large variations within or between technical microarray replicates are sometimes observed, so that normalization within and between replicates (32, 33, 127, 141, 142) is generally needed. Such variations could affect the probe numbers detected and their quantitation if they are not well controlled experimentally. Various types of controls and skilled personnel with extensive experience are important to minimize such variations. Finally, due to sequence conservation and the complicated nature of surface hybridization, there can be low-level cross-hybridization to nontarget genes/strains. The challenge is to distinguish true hybridization signals from nonspecific background noise. Also, differentiating genes/populations with low abundance/expression from those not present or not expressed can be a challenge. Generally, subjective thresholds of signal intensity based on signal-to-noise ratio are applied to call positive signals (33). Thus, great caution is needed in interpreting the gene numbers detected when estimating microbial diversity.
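
To make the dependence on a subjective threshold concrete, the following minimal sketch calls positive spots from a signal-to-noise ratio (SNR). The SNR definition used here (spot signal minus mean background, divided by the background standard deviation), the default cutoff of 2.0, and the example intensities are assumptions for illustration only; they are not taken from a specific GeoChip or PhyloChip pipeline.

```python
import numpy as np

def call_positive_spots(signal, background_mean, background_sd, snr_threshold=2.0):
    """Return per-spot SNR values and a boolean mask of spots called positive.

    SNR is taken here as (signal - background mean) / background SD,
    which is one common convention and an assumption of this sketch.
    """
    if background_sd <= 0:
        raise ValueError("background_sd must be positive")
    signal = np.asarray(signal, dtype=float)
    snr = (signal - background_mean) / background_sd
    return snr, snr >= snr_threshold

if __name__ == "__main__":
    # Hypothetical spot intensities (arbitrary fluorescence units).
    spots = [120, 450, 95, 2000]
    for cutoff in (2.0, 0.5):
        _, positive = call_positive_spots(spots, 100.0, 30.0, snr_threshold=cutoff)
        print(f"cutoff {cutoff}: {int(positive.sum())} of {len(spots)} spots called positive")
```

With these made-up intensities, relaxing the cutoff from 2.0 to 0.5 raises the number of positive spots from two to three, which is why gene counts derived from arrays should be reported together with the threshold used.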

CRITICAL ISSUES IN THE USE OF HIGH-THROUGHPUT METAGENOMIC TECHNOLOGIES TO ADDRESS ECOLOGICAL QUESTIONS

Quality of community DNA/RNA.

Variations in DNA extraction methods can have dramatic impacts on the results of metagenomic studies, especially in high-diversity communities like those in soil (91, 143). Obtaining representative high-quality DNA and RNA from environmental samples is challenging since different populations within the community may require different lysis conditions and diverse, sometimes unidentified contaminants must be removed (144, 145). Thus, any comparisons between studies, whether the analysis is an open or closed format, must be undertaken with caution if nucleic acid extraction methods vary among the studies compared. High-molecular-weight DNA is required to produce representative, quantitative, and efficient amplification of whole-community DNA for microarray analysis (131), to build long-range mate-pair libraries for effective scaffolding of metagenome sequence, and to perform functional metagenomic studies in which entire genes or gene clusters linked to their regulatory sequences need to be maintained intact. But the gentle extraction methods that produce large DNA fragments may underrepresent cells that are harder to lyse, such as those of Gram-positive bacteria and archaea (131). In addition, metagenomic DNA should be sufficiently pure (e.g., A260/A230 ratios of >1.7) for subsequent experimental analyses, such as template amplifications, tagging, or dye labeling. Although PCR amplification can occasionally be obtained with lower-quality DNA, such amplifications might be unreliable and carry the risk of biases, errors, and artifacts. Since community DNA extracted and processed using many commercial kits is often of low purity or low molecular weight, well established custom-optimized DNA extraction protocols are preferred for certain applications (46, 47), ensuring that reliable experimental data are generated for subsequent resource- and effort-intensive analyses and interpretation.
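
As a small illustration of the purity criterion mentioned above, the sketch below screens extracts by their A260/A230 ratio. The 1.7 cutoff comes from the text; the sample names, absorbance readings, and function names are hypothetical, and a real workflow would normally also consider DNA yield and fragment size.

```python
def a260_a230_ratio(a260, a230):
    """Ratio of absorbance at 260 nm to absorbance at 230 nm."""
    if a230 <= 0:
        raise ValueError("A230 must be positive")
    return a260 / a230

def screen_extracts(readings, min_ratio=1.7):
    """Return the names of extracts whose A260/A230 ratio exceeds min_ratio.

    readings maps a sample name to an (A260, A230) tuple.
    """
    return [
        name
        for name, (a260, a230) in readings.items()
        if a260_a230_ratio(a260, a230) > min_ratio
    ]

if __name__ == "__main__":
    # Hypothetical spectrophotometer readings for two soil extracts.
    readings = {"soil_A": (1.00, 0.50), "soil_B": (0.80, 0.64)}
    print(screen_extracts(readings))
```

Here soil_A has a ratio of 2.0 and passes, while soil_B has a ratio of 1.25 and would be flagged for further purification before amplification or labeling.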

Biological and technical replicates.

The composition, structure, activities, and dynamics of microbial communities in natural settings are shaped by a variety of biological (e.g., competition, predation, mutualistic interactions) and environmental (e.g., temperature, pH, and moisture) factors, which are generally characterized by high spatial and temporal variability. Quantifying the scale at which variation is of interest (between sites, samples, subsamples, nucleic acid extractions, or PCR amplifications) is necessary to determine the nature and degree of replication and to design proper statistical analysis and interpretation of results. That is, it is not possible to determine whether communities in different environments differ significantly if the within-site variability in sampling and analysis is not known. Technical replicates (splitting one sample into two or more aliquots prior to parallel processing and analysis) are useful for estimating the variability associated with the multiple steps of sample processing and analytical methods. On the other hand, biological replicates (e.g., multiple samples taken from soil plots or microcosms that have been manipulated identically) are necessary for estimating spatial and temporal variability associated with experimental conditions so that proper statistical analysis can lead to appropriate interpretation of data (89, 146). This is especially important for highly heterogeneous soil samples (114, 147). Having a priori knowledge of the expected ranges of variability allows the experimental design to integrate appropriate numbers and types of replicates. For example, while technical replicates are often not performed with photolithographic microarrays due to the known analytical reproducibility of those platforms, biological replicates are essential for proper statistical analysis (114). In contrast, both technical and biological replicates could be important for PCR amplicon-based sequencing approaches (85, 89). 
In the early years of molecular microbial ecology, many studies were performed without sufficient biological replicates for valid quantitative comparisons and statistical analysis (146). In particular, targeted gene sequencing data may have higher technical variation, which could make comparative studies challenging, particularly with inadequate sampling and replication (89). Increasing the biological replicates, even at the cost of sampling depth, can be an effective way to improve the comparability of data (4, 89, 146). Based on past experience with soils, 3 to 12 biological replicates are needed in typical microbial ecology studies, and more replicates are needed for proper network analysis (4, 148).

Sampling, replication, and sequencing depth.

The site-to-site variability in species/taxon composition, known as β diversity, is crucial to understanding spatiotemporal patterns of species diversity and the mechanisms controlling community composition and structure, which is a central but poorly understood issue in ecology, especially in microbial ecology. However, quantifying β diversity in microbial ecology by using sequencing-based metagenomic technologies requires proper experimental design, including suitable replication, minimal amplification, adequate depth, and stringent quality control (89, 95). With recent advances in sequencing technologies and associated reductions in cost, appropriate replication can be attained with greater sequencing coverage (29). Balancing sequencing depth with the number of samples per sequencing run is dependent on the biological question and the complexity of the community (8, 32, 89). If the objective is to differentiate the impacts of various conditions (e.g., warming versus nonwarming or high versus low CO2 exposure) on microbial community structure, sampling only dominant microorganisms could be sufficient, necessitating less sequencing coverage per sample (149). However, if the objective is to focus on microbial diversity, distribution, and biogeography, sampling rare taxa could be more important, and thus, deep sequencing with up to millions of reads per sample may be preferred (29, 150). Increasing the sequencing depth will reduce the chance of artifacts associated with random sampling (95). In addition, the sampling effort generally depends on the variations between microbial communities to be compared. For communities that share great similarity, deeper sequencing is needed to distinguish treatment effects on microbial communities (29).
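
The link between sequencing depth and the detection of rare taxa can be illustrated with a simple calculation. Assuming reads are drawn independently and a taxon has relative abundance p, the chance of seeing it at least once in N reads is 1 − (1 − p)^N. This idealized model ignores extraction and amplification biases, and the function names below are hypothetical.

```python
import math

def detection_probability(rel_abundance, n_reads):
    """Chance that at least one of n_reads independent reads comes from the taxon."""
    return 1.0 - math.exp(n_reads * math.log1p(-rel_abundance))

def reads_needed(rel_abundance, target=0.95):
    """Smallest read count whose detection probability reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log1p(-rel_abundance))

if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        n = reads_needed(p)
        both = detection_probability(p, n) ** 2  # detected in two independent replicates
        print(f"relative abundance {p:g}: {n} reads for 95% detection, "
              f"{both:.2f} chance of detection in both of two replicates")
```

Under this model the required depth scales roughly as 3/p for a 95% detection probability, so each 10-fold drop in relative abundance needs about 10-fold more reads. The squared term shows why two replicates can still disagree on a rare taxon even when each has a high individual chance of detecting it.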

Relative comparisons.

In analyses of microbial communities with high-throughput molecular technologies, relative comparisons are often valid when absolute measurements are not possible. Making relative comparisons mitigates the possible effects of technical variations associated with both open and closed detection formats, such as incomplete cell lysis in DNA extraction, PCR amplification biases, chimerism, sequencing errors, molecular-level random sampling artifacts, variability in bioinformatics analysis, specificity, sensitivity, and/or quantification issues. In general, relative changes in microbial communities can be reliably measured by sequence abundance or treatment sample/control sample hybridization signal ratios. When ratios are used under the assumption that technical variations are similar between the treatment and control samples, such a relative comparison could cancel out the effects of technical variations on the final experimental outcomes and, hence, increase quantitative accuracy (142). Describing relative changes between treatment and control samples is usually defensible, whereas describing absolute changes is much more complicated (32, 147). One drawback of applying a strictly relative approach is that changes in relative abundances can easily mask large changes in actual abundances. In some cases, the change in absolute abundance can be more informative in describing the dynamics of a population in a community. For example, a 10-fold increase in the expression of a gene with extremely low abundance may simply be an artifact, whereas a 10-fold increase in a moderately abundant gene is more likely to be biologically meaningful. Ideally, both relative and absolute abundances should be used, but caution is needed in the interpretation of data, including assessments of statistical significance.
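
Both points in this section, that ratios can cancel shared technical bias and that relative abundances can mask absolute changes, can be shown with a toy example. All numbers below are hypothetical, and the assumption that each taxon is distorted by the same multiplicative bias in treatment and control samples is exactly the condition stated above, not a general property of real data.

```python
import numpy as np

def relative_abundance(counts):
    """Convert absolute counts to proportions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def fold_change(treatment, control):
    """Element-wise treatment/control ratio."""
    return np.asarray(treatment, dtype=float) / np.asarray(control, dtype=float)

# Hypothetical absolute abundances of four taxa; only taxon 0 responds (10-fold).
control_true = np.array([100.0, 100.0, 100.0, 100.0])
treatment_true = np.array([1000.0, 100.0, 100.0, 100.0])

# (1) A taxon-specific bias shared by both samples cancels in the ratio.
bias = np.array([0.5, 2.0, 1.0, 0.8])  # assumed lysis/labeling efficiency per taxon
true_fc = fold_change(treatment_true, control_true)
observed_fc = fold_change(treatment_true * bias, control_true * bias)

# (2) Relative abundances make unchanged taxa appear to decline.
relative_fc = fold_change(relative_abundance(treatment_true),
                          relative_abundance(control_true))

if __name__ == "__main__":
    print("true fold change:    ", np.round(true_fc, 2))
    print("observed fold change:", np.round(observed_fc, 2))
    print("relative fold change:", np.round(relative_fc, 2))
```

In this example the 10-fold absolute increase of taxon 0 appears as only about a 3.1-fold increase in relative abundance, and the three unchanged taxa appear to drop to about 0.31 of their control share, which is why absolute measurements are worth adding when they can be obtained.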

INTEGRATED FRAMEWORK FOR ANALYZING COMPLEX MICROBIAL COMMUNITIES

A wide variety of open- and closed-format technologies have been developed, each having distinct features and advantages suitable for different applications in microbial ecology, and thus, they provide complementary approaches for addressing microbial ecology questions (Table 1). Here, we describe an integrated workflow for analyzing microbial communities from different environments using high-throughput metaomic technologies (Fig. 3). Cultivated microorganisms are isolated and sequenced to study their physiology, ecology, gene functions, and regulation. For not-yet cultivated microorganisms, single-cell genomics (151) may provide similar information. To study microbial communities, extracted nucleic acids (DNA/RNA) are analyzed by high-throughput sequencing, including targeted gene sequencing, metagenome, and/or metatranscriptome sequencing. The resulting sequence data are assembled, annotated, and analyzed with information from reference isolates or single-cell genomes (Fig. 3) (152). Functional metagenomics or stable isotope probing (153) can be integrated into the workflow to assign functions to hypothetical genes and uncharacterized populations. Metaomic data and functional information can then be used to develop more comprehensive microarray technologies that complement sequencing. Subsequently, both sequencing and microarray data might be used to link the microbial community structure to ecosystem metadata (e.g., biogeochemical variables) with deeper sampling. In this manner, open- and closed-format technologies can be used as complementary tools for examining microbial community diversity and distribution and to address fundamental questions in microbial ecology. In addition, the data can be used for studying microbial network interactions, identifying keystone species/populations, examining the effects of environmental perturbations, and simulating and modeling community dynamics for predictive microbial ecology (148, 154).

FIG 3

An integrated workflow for analyzing microbial communities from different environments using high-throughput metaomic technologies. DNA, RNA, proteins, and/or metabolites are extracted from environmental samples for sequencing and protein/metabolite identification. At the same time, physiological, ecological, and functional information can be obtained via reference genomes and single-cell genomics, which helps with sequencing data analysis and functional annotation, generating useful information for microarray development, especially with novel genes. Microarray-based technologies can be used as a routine tool to address various microbial ecology questions in a rapid and cost-effective manner. Furthermore, metagenomic, metaproteomic, metametabolomic, stable isotope probing, and microarray data can be used alone or coupled with metadata for network analysis and modeling, understanding of microbial diversity, distribution and assembly mechanisms, and linking the microbial community structure with both environmental factors and ecosystem functioning.

CONCLUDING REMARKS AND FUTURE PERSPECTIVES

Significant progress has been made in the development and application of high-throughput molecular technologies for microbial community analysis, but many challenges still remain, especially in the context of environmental applications. For instance, metagenomic sequence assembly, especially from complex communities like those in soil, is one of the grand challenges in bioinformatics (51, 155) although metagenome-specific assembly algorithms (155) and methods for “binning” genomes from metagenome data (64, 156) have led to numerous successes. Single-cell genomics technologies are also proving to be a powerful complement to metagenome studies (Fig. 3) (50, 151, 152). Another grand challenge for the application of high-throughput molecular tools for microbial community research is the analysis, visualization, and interpretation of massive amounts of both sequencing and array data, especially shotgun metagenome sequencing data (16, 18, 41, 100). For instance, it is difficult to annotate abundant short read sequences to be tabulated and compared in an intuitive manner. This limits our ability to address ecological questions related to microbial biodiversity (e.g., taxonomic, phylogenetic, genetic, functional diversity), functional trait-based microbial biogeography (94, 134, 137), and ecosystem functioning, stability, and succession (157–159). Many excellent bioinformatics tools have been developed for processing, mining, visualizing, and comparing molecular data (41), but they are not optimized for dealing with the vast amounts of experimental data from complex communities like those in soil. Network tools to delineate the interactions among different microbial populations based on high-throughput metagenomics datasets are a promising new development, since understanding the interactions among different species is a central but poorly understood issue in microbial ecology (Fig. 3) (4, 99, 160). 
Each omics technology has its strengths and weaknesses and must be selected based on the biological questions and objectives of the study (Fig. 3). In general, open-format technologies are most suitable for exploratory discovery studies, whereas the closed-format technologies can be advantageous for more narrowly defined, hypothesis-driven, quantitative, and comparative studies (117). As sequencing technologies improve and costs decrease, high-throughput sequencing may replace microarrays as the method of choice for many applications (40), but for now, microarray-based closed-format approaches play a valuable role in microbial community analysis, especially for complex microbial communities whose comprehensive sampling remains infeasible (16). Functional metagenomics will continue to identify functions of previously unknown genes. As more functional gene sequences of interest become available, functional arrays that are both more comprehensive (e.g., the next generation of GeoChip, with up to 1 million probes) and more specific (e.g., PathoChip and StressChip) (123, 124) will be developed for addressing different ecological questions and applications. Also, high-throughput molecular technologies should be integrated with other approaches, such as single-cell genomics, metaproteomics (161), and metametabolomics (35, 162), as well as targeted techniques like stable isotope probing (Fig. 3), to address ecological questions and hypotheses within the context of environmental and medical applications. Only in this way will their power for microbial community analysis be realized. The ultimate goal of microbial ecology is to understand who is where, with whom, doing what, why, and when (159). To answer such questions, reliable, reproducible, quantitative, and statistically valid (146) experimental information on community-wide spatial and temporal dynamics is needed. 
Also, to achieve this predictive goal, it is essential to model microbial community dynamics and their behaviors at both structural and functional levels (Fig. 3). With the rapid and continuous advances of molecular high-throughput technologies and high-performance computational tools, it is anticipated that in the not-too-distant future, microbiologists will be able to model and predict the behaviors of microbial communities. A new era of quantitative predictive microbial ecology is coming.
