Jizhong Zhou, Zhili He, Yunfeng Yang, Ye Deng, Susannah G. Tringe, Lisa Alvarez-Cohen.
Abstract
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.
Year: 2015 PMID: 25626903 PMCID: PMC4324309 DOI: 10.1128/mBio.02288-14
Source DB: PubMed Journal: MBio Impact factor: 7.867
Key differences among open and closed high-throughput platforms for microbial community analysis
| Step or parameter | Characteristic or consideration | TGS (open format) | SMS (open format) | MTS (open format) | FGAs (closed format) | PGAs (closed format) | Comments |
|---|---|---|---|---|---|---|---|
| Sample preparation and analysis | Sample/target preparation | Complicated | Simple | Very complicated | Simple | Simple | DNA/RNA quality is important for all approaches |
| | Analysis of multiplex samples per assay | Large potential | Medium potential | Medium potential | Low (only one or two) | Low (only one or two) | FGAs and PGAs use 1 or 2 dyes for labeling, and it is difficult to multiplex samples in a single assay |
| | PCR amplification or whole-genome analysis | Yes | No | No | No/yes | Yes/no | Amplification introduces major problems for quantification |
| | Potential uneven hybridization | NA | NA | NA | Yes | Yes | Signal normalization is needed within and between arrays to correct signal differences due to systematic errors |
| Data processing and analysis | Raw data processing | Relatively easy | Difficult | Difficult | Easy | Easy | A major challenge for SMS and MTS with large raw datasets |
| | Phylogeny | Yes | Some | Some | No/yes | Yes | GeoChip uses |
| | Taxonomic resolution | Strain, species, genus | Strain, species | Strain, species | Strain, species | Genus, family | It depends on molecular markers with high resolution for functional genes |
| | Functional features | No/yes | Yes | Yes | Yes | No | TGS can analyze DNA and RNA for functional genes |
| | Signal threshold | Yes | NA | NA | Yes | Yes | Both PGAs and FGAs require a threshold to call positive signals, which is more or less arbitrary. Thus, some ambiguity exists for positive or negative spots. |
| | Requires | No/yes | No | No | Yes | Yes | Closed-format technologies are designed based on known sequences |
| | Analysis of α diversity | Very good | Good | Very poor | Fair | Fair | Here, α diversity estimation is based on a single gene |
| | Data comparison across samples | Moderate | Difficult | Difficult | Easy | Easy | Random or undersampling is a major issue for open-format approaches |
| Performance | Coverage/breadth (no. of different genes detected) | Very low | High | High | High | Very low | TGS can analyze phylogenetic or functional genes |
| | Sampling depth (no. of sequences or OTUs per gene) | Very high | Low/medium | Low/medium | Medium | High | The sampling depth for closed-format approaches depends on the number of probes used |
| | Detection of rare species/genes | Medium | Difficult | Difficult | Easy | Easy | Easy for closed format as long as the appropriate probes are present |
| | Quantification | Low | Not known | Not known | High | Low/medium | Not rigorously tested for SMS and MTS; for PhyloChip, if RNA is used instead of DNA (no PCR step), quantification is high |
| | Susceptibility to the artifacts associated with random sampling process | Medium | High | High | Low | Medium/low | A major problem for sequencing approaches; PCR amplification may be involved in PhyloChip |
| | Potential discovery of novel genes/species | Yes | Yes | Yes | No | No | |
| | Results skewed by dominant populations | Yes | Yes | Yes | No | No | |
| | Sensitivity to (host) DNA/RNA contamination | No/yes | Yes | Yes | No | No | Difficult to remove host DNA/RNA contamination |
| Applicability and cost | Most promising applications | In-depth studies of microbial diversity or specific functional groups and discovery of novel genes | Surveys of microbial genetic diversity of unknown communities and discovery of novel genes | Surveys of functional activity of unknown microbial communities and discovery of novel genes | Comparisons of functional diversity and structure of microbial communities across many samples | Comparisons of taxonomic or phylogenetic diversity and structure of microbial communities across many samples | The choice of technology mainly depends on the biological questions and hypotheses to be addressed |
| | Relative cost per assay | Medium | High | High | Low | Low | It is challenging to make general statements of cost because they depend on technology platforms, depth of analysis, and approaches used for processing and analyzing data |
| | Cost per sample ($) | 30–150 | 1200–4000 | 1500–4500 | 150–800 | 150–1000 | This is only based on the cost of materials for target gene amplicon preparations and sequencing. |
| | Cost for bioinformatic analysis | Medium | High | High | Low | Low | |
Because the technologies differ in their features, a straightforward point-by-point comparison is difficult. We therefore highlight their major differences in a general sense, focusing on issues important to microbial ecology in the context of environmental applications and complex microbial communities such as those in soil, rather than listing every difference comprehensively.
TGS, target gene (e.g., 16S rRNA, amoA, nifH) sequencing; SMS, shotgun metagenome sequencing; MTS, metatranscriptome sequencing; FGAs, functional gene arrays: the listed analysis is mostly based on GeoChip; PGAs, phylogenetic gene arrays: the listed analysis is mostly based on PhyloChip; NA, not applicable.
FIG 1 Key steps of high-throughput metaomic technologies for microbial community analysis. (a) Sequencing-based open-format technologies. Extracted DNA/RNA samples are prepared for sequencing by target gene sequencing (TGS), shotgun metagenome sequencing (SMS), and/or metatranscriptome sequencing (MTS). RT, reverse transcription. (b) Data processing and analysis. Both sequencing- and microarray-based data are processed and then statistically analyzed to address specific microbial ecology questions related to community diversity, composition, structure, function, and network, as well as their linkages with environmental factors. (c) Array-based closed-format technologies. For the GeoChip and PhyloChip, extracted DNA is directly labeled and hybridized, while RNA is first reverse transcribed (RT) to cDNA. DNA and RNA can be amplified by whole-community genome amplification (WCGA) or by whole-community RNA amplification (WCRA), respectively, when there is not enough mass for direct hybridization, but this compromises quantification. Images from both arrays are digitized for further data processing and statistical analysis.
FIG 2 Illustration of random sampling processes and their impacts on the analysis of microbial communities using open- and closed-format metagenomic technologies. (a) A theoretical community contains 50 taxa with 5,000 individuals and follows an exponential distribution, λe^(−λx) (λ = 0.01 in this case). The taxa are ranked based on their abundance. Two technical replicates of this community are taken for analysis (sample I and sample II). Also, assume that a microarray is constructed, covering about 50% of the taxa, as indicated by asterisks (*). (b) For sequencing, 1% sampling effort is performed. Overlapping taxa detected in the two samples are indicated by carets (^). (c) The community DNA is directly labeled and hybridized with the microarrays. Because some populations are below the detection limit, only certain portions are detected. Overlapping taxa detected in the two samples are also indicated by carets (^). In both cases (b and c), similar numbers of taxa were detected. (d and e) Jaccard and Bray-Curtis overlaps for the open- and closed-format technologies.
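The thought experiment in FIG 2 can be approximated with a short simulation. The Python sketch below is illustrative only and is not the authors' code. Several details are assumptions that the caption does not specify: the exponential term is applied to abundance rank, the array probes every other taxon to reach about 50% coverage, array signal is the true count with multiplicative Gaussian noise, and the detection limit is 100 individuals. All function names are hypothetical.

```python
import math
import random


def build_community(n_taxa=50, n_individuals=5000, rate=0.01):
    """Per-taxon counts with exponentially decaying rank abundance (assumed form)."""
    weights = [rate * math.exp(-rate * rank) for rank in range(1, n_taxa + 1)]
    total = sum(weights)
    counts = [int(n_individuals * w / total) for w in weights]
    # Hand rounding leftovers to the top-ranked taxa so the total is exact.
    for i in range(n_individuals - sum(counts)):
        counts[i % n_taxa] += 1
    return counts


def sequence_sample(counts, effort, rng):
    """Open format: draw a random fraction of individuals without replacement."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    reads = rng.sample(pool, round(len(pool) * effort))
    observed = [0] * len(counts)
    for taxon in reads:
        observed[taxon] += 1
    return observed


def array_sample(counts, probed, detection_limit, noise_sd, rng):
    """Closed format: only probed taxa above the detection limit give signal."""
    signal = [0.0] * len(counts)
    for taxon in probed:
        # Assumed noise model: true abundance times Gaussian measurement noise.
        s = counts[taxon] * max(0.0, rng.gauss(1.0, noise_sd))
        if s >= detection_limit:
            signal[taxon] = s
    return signal


def jaccard(a, b):
    """Shared taxa divided by all taxa detected in either sample."""
    present_a = {i for i, v in enumerate(a) if v > 0}
    present_b = {i for i, v in enumerate(b) if v > 0}
    union = present_a | present_b
    return len(present_a & present_b) / len(union) if union else 1.0


def bray_curtis_overlap(a, b):
    """Abundance-weighted overlap: 2 * sum(min) / (sum(a) + sum(b))."""
    total = sum(a) + sum(b)
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / total if total else 1.0


rng = random.Random(0)  # fixed seed so the run is repeatable
community = build_community()

# Two technical replicates (sample I and sample II) per technology.
seq_i = sequence_sample(community, 0.01, rng)
seq_ii = sequence_sample(community, 0.01, rng)

probed = range(0, 50, 2)  # assumed probe layout: every other taxon
arr_i = array_sample(community, probed, 100, 0.1, rng)
arr_ii = array_sample(community, probed, 100, 0.1, rng)

print("sequencing: Jaccard", jaccard(seq_i, seq_ii),
      "Bray-Curtis", bray_curtis_overlap(seq_i, seq_ii))
print("array:      Jaccard", jaccard(arr_i, arr_ii),
      "Bray-Curtis", bray_curtis_overlap(arr_i, arr_ii))
```

Changing the seed, the sampling effort, or the detection limit shows how sensitive the replicate overlap is to each choice. The printed values depend entirely on the assumed parameters and should not be read as reproducing panels d and e.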
FIG 3 An integrated workflow for analyzing microbial communities from different environments using high-throughput metaomic technologies. DNA, RNA, proteins, and/or metabolites are extracted from environmental samples for sequencing and protein/metabolite identification. At the same time, physiological, ecological, and functional information can be obtained via reference genomes and single-cell genomics, which helps with sequencing data analysis and functional annotation, generating useful information for microarray development, especially with novel genes. Microarray-based technologies can be used as a routine tool to address various microbial ecology questions in a rapid and cost-effective manner. Furthermore, metagenomic, metaproteomic, metametabolomic, stable isotope probing, and microarray data can be used alone or coupled with metadata for network analysis and modeling, understanding of microbial diversity, distribution and assembly mechanisms, and linking the microbial community structure with both environmental factors and ecosystem functioning.