Zhou Shi, Huaqun Yin, Joy D Van Nostrand, James W Voordeckers, Qichao Tu, Ye Deng, Mengting Yuan, Aifen Zhou, Ping Zhang, Naijia Xiao, Daliang Ning, Zhili He, Liyou Wu, Jizhong Zhou.
Abstract
While functional gene arrays (FGAs) have greatly expanded our understanding of complex microbial systems, specificity, sensitivity, and quantitation challenges remain. We developed a new generation of FGA, GeoChip 5.0, using the Agilent platform. Two formats were created, a smaller format (GeoChip 5.0S), primarily covering carbon-, nitrogen-, sulfur-, and phosphorus-cycling genes and others providing ecological services, and a larger format (GeoChip 5.0M) containing the functional categories involved in biogeochemical cycling of C, N, S, and P and various metals, stress response, microbial defense, electron transport, plant growth promotion, virulence, gyrB, and fungus-, protozoan-, and virus-specific genes. GeoChip 5.0M contains 161,961 oligonucleotide probes covering >365,000 genes of 1,447 gene families from broad, functionally divergent taxonomic groups, including bacteria (2,721 genera), archaea (101 genera), fungi (297 genera), protists (219 genera), and viruses (167 genera), mainly phages. Computational and experimental evaluation indicated that designed probes were highly specific and could detect as little as 0.05 ng of pure culture DNAs within a background of 1 μg community DNA (equivalent to 0.005% of the population). Additionally, strong quantitative linear relationships were observed between signal intensity and amount of pure genomic (∼99% of probes detected; r > 0.9) or soil (∼97%; r > 0.9) DNAs. Application of the GeoChip to a contaminated groundwater microbial community indicated that environmental contaminants (primarily heavy metals) had significant impacts on the biodiversity of the communities. This is the most comprehensive FGA to date, capable of directly linking microbial genes/populations to ecosystem functions.
IMPORTANCE The rapid development of metagenomic technologies, including microarrays, over the past decade has greatly expanded our understanding of complex microbial systems. However, because of the ever-expanding number of novel microbial sequences discovered each year, developing a microarray that is representative of real microbial communities, is specific and sensitive, and provides quantitative information remains a challenge. The newly developed GeoChip 5.0 is the most comprehensive microarray available to date for examining the functional capabilities of microbial communities important to biogeochemistry, ecology, environmental sciences, and human health. The GeoChip 5 is highly specific, sensitive, and quantitative based on both computational and experimental assays. Use of the array on a contaminated groundwater sample provided novel insights on the impacts of environmental contaminants on groundwater microbial communities.
Keywords: functional gene array; microarrays; microbial communities
Year: 2019 PMID: 31213523 PMCID: PMC6581690 DOI: 10.1128/mSystems.00296-19
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
Summary of probes on GeoChip 5.0M by functional gene categories
| Functional gene category | Subcategories | Genes or | Sequence-specific probes | Group-specific probes | Total probes | Covered CDS | % of probe |
|---|---|---|---|---|---|---|---|
| C cycling | 3 | 118 | 4,354 | 19,261 | 23,615 | 50,040 | +114 |
| N cycling | 7 | 22 | 2,397 | 3,600 | 5,997 | 11,654 | −19 |
| S cycling | 5 | 17 | 1,969 | 2,317 | 4,286 | 6,823 | +38 |
| P cycling | 4 | 7 | 960 | 2,300 | 3,260 | 6,245 | +143 |
| Metal homeostasis | 24 | 105 | 5,084 | 37,543 | 42,627 | 91,614 | +360 |
| Organic contaminant | 7 | 157 | 2,204 | 9,241 | 11,445 | 27,938 | −33 |
| Electron transport | 3 | 35 | 612 | 1,348 | 1,960 | 3,351 | +72.3 |
| Stress response | 18 | 86 | 2,098 | 23,634 | 25,732 | 79,356 | +19 |
| Plant growth | 7 | 31 | 957 | 2,263 | 3,220 | 5,720 | NA |
| Microbial defense | 4 | 87 | 3,284 | 19,954 | 23,238 | 50,019 | +597 |
| Virulence | 10 | 587 | 1,264 | 3,596 | 4,860 | 10,863 | +30 |
| Virus specific | 4 | 115 | 1,521 | 1,336 | 2,857 | 5,182 | +167 |
| Protozoan specific | 10 | 84 | 845 | 615 | 1,460 | 2,146 | NA |
| Fungus specific | 9 | 66 | 2,559 | 2,079 | 4,638 | 6,987 | −7 |
| GyrB | 1 | 1 | 532 | 2,234 | 2,766 | 9,997 | +18 |
| Total | 116 | 1,447 | 30,640 | 131,321 | 161,961 | 365,651 | +97 |
NA (not applicable) because this is a new category for GeoChip 5.0.
Total number of covered coding DNA sequences (CDS) does not equal the sum of those from individual categories due to the presence of CDS that were covered in two or more categories.
Detailed information on individual subcategories of functional genes is presented in Table S1.
Summary of probes in GeoChip 5.0M within broad microbial groups
| Major microbial group | Phyla | Genera | Species | Strains | Genes | Probes | Covered CDS | % of probe |
|---|---|---|---|---|---|---|---|---|
| Bacteria | 33 | 1,122 | 2,721 | 6,465 | 1,003 | 141,153 | 333,675 | +93 |
| Archaea | 6 | 101 | 188 | 282 | 269 | 5,728 | 38,978 | +124 |
| Fungi | 7 | 297 | 404 | 625 | 226 | 8,856 | 21,101 | +130 |
| Protists | 10 | 219 | 251 | 362 | 201 | 2,051 | 5,376 | |
| Other eukaryotes | 7 | 64 | 66 | 86 | 62 | 509 | 1,170 | |
| Viruses | 1 | 167 | 311 | 1,364 | 116 | 2,848 | 6,028 | +166 |
| Unclassified | | | | | 125 | 816 | 2,561 | +116 |
| Total | 64 | 1,970 | 3,941 | 9,184 | 1,447 | 161,961 | 365,651 | +97 |
Other eukaryotes include Metazoa and Viridiplantae.
Total number of genes does not equal the sum of those from individual taxonomic groups due to the presence of the genes shared across two or more taxonomic groups.
Total number of covered CDS does not equal the sum of those from individual taxonomic groups due to the presence of the CDS covered in two or more taxonomic groups.
The sequences are unclassified due to missing annotations in the data source; most of these are metagenomics sequencing contigs.
Detailed information on the phylogenetic distribution of functional genes is in Table S2.
FIG 1 Relationship between detected spots and the concentration of community DNAs used. (a) Hybridization of grassland soil community DNAs with GeoChip 5.0S (see the images in Fig. S2). (b) Hybridization of community DNAs from a wastewater treatment plant with GeoChip 5.0M. Different amounts of unamplified community DNAs were labeled with Cy3 in triplicate. Hybridizations were carried out at 67°C plus 10% formamide for 24 h. Any spots with a signal-to-noise ratio (SNR) of >2 were considered positive.
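The positive-spot rule in this caption can be sketched as follows. The caption does not say how SNR was computed, so the sketch assumes one common microarray definition, (signal minus mean background) divided by the background standard deviation. The function names and all intensities are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

SNR_CUTOFF = 2.0  # threshold stated in the FIG 1 caption


def signal_to_noise(spot_signal, background_signals):
    """SNR under an ASSUMED definition: (signal - mean background) / background SD."""
    bg_mean = mean(background_signals)
    bg_sd = stdev(background_signals)
    return (spot_signal - bg_mean) / bg_sd


def count_positive_spots(spot_signals, background_signals, cutoff=SNR_CUTOFF):
    """Count spots whose SNR is strictly greater than the cutoff."""
    return sum(
        1
        for signal in spot_signals
        if signal_to_noise(signal, background_signals) > cutoff
    )


# Hypothetical intensities for illustration only.
background = [100.0, 110.0, 90.0, 105.0, 95.0]
spots = [102.0, 130.0, 500.0, 98.0]
print(count_positive_spots(spots, background))
```

Counting positives at each DNA concentration in this way would give the kind of detected-spots curve the figure describes.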
FIG 2 Computational evaluation of the specificity of the designed probes based on sequence identity, length of continuous sequence stretch, and free energy. Three parameters were evaluated by comparing the designed probes to sequences in the databases. (a) Maximal sequence identity (%) of a probe (sequence or group specific) to its closest nontarget sequences. (b) Minimal sequence identity (%) of a group-specific probe to its targeted group sequences. (c) Maximal sequence stretch length (bp) of a probe to its closest nontarget sequences. (d) Minimal sequence stretch length (bp) of a group-specific probe to its targeted group sequences. (e) Minimal free energy (kcal/mol) of a probe to its closest nontarget sequence. (f) Maximal free energy (kcal/mol) of a group-specific probe to its targeted group sequences.
FIG 3 Experimental evaluation on the specificity of designed arrays with perfect match (PM)/mismatch (MM) probes. One hundred nanograms of genomic DNAs was labeled with Cy3 and hybridized with a modified GeoChip 5.0S in triplicate. For each PM or MM pair, the net signal intensity was obtained by subtracting the signal intensity from the Agilent negative controls within a subarray from the raw signal intensity. The ratio of PM to MM probe pairs was estimated. DvH, D. vulgaris Hildenborough.
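The net-signal and PM/MM calculation described in this caption might look like the sketch below. It assumes the negative-control signal of a subarray is summarized by its mean, which the caption does not specify. The function names, the handling of a non-positive mismatch signal, and the example numbers are hypothetical.

```python
from statistics import mean


def net_signal(raw_signal, negative_control_signals):
    """Raw intensity minus the negative-control signal of the same subarray.

    ASSUMPTION: the control signal is summarized as the mean of the controls.
    """
    return raw_signal - mean(negative_control_signals)


def pm_mm_ratio(pm_raw, mm_raw, negative_control_signals):
    """Ratio of net perfect-match (PM) signal to net mismatch (MM) signal."""
    pm_net = net_signal(pm_raw, negative_control_signals)
    mm_net = net_signal(mm_raw, negative_control_signals)
    if mm_net <= 0:
        # Hypothetical convention: MM at or below control level gives no finite ratio.
        return float("inf")
    return pm_net / mm_net


# Hypothetical intensities for one PM/MM pair and its subarray controls.
controls = [50.0, 60.0, 40.0]
print(pm_mm_ratio(1050.0, 150.0, controls))
```

A ratio well above 1 means the perfect-match probe hybridized more strongly than its mismatch counterpart, which is what a specific probe design is expected to show.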
FIG 4 Sensitivity evaluation of the designed arrays with pure genomic DNAs. Genomic DNAs from D. vulgaris Hildenborough and H10 (0.05 ng to 100 ng) were mixed with grassland soil community DNAs as a background to equal 1,000 ng. The mixed DNAs were labeled with Cy3 and hybridized in triplicate to a GeoChip 5.0S containing 938 probes each from D. vulgaris Hildenborough (DvH) and H10.
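The mixing arithmetic behind this sensitivity test can be checked with a short sketch. It uses only the two endpoint amounts given in the caption (0.05 ng and 100 ng) and the 1,000 ng total; the function names are illustrative.

```python
TOTAL_DNA_NG = 1000.0  # target plus soil background, per the FIG 4 caption


def target_percent(target_ng, total_ng=TOTAL_DNA_NG):
    """Share of the labeled DNA that comes from the pure culture, in percent."""
    return 100.0 * target_ng / total_ng


def background_ng(target_ng, total_ng=TOTAL_DNA_NG):
    """Amount of soil community DNA needed to bring the mix to the total."""
    return total_ng - target_ng


# Endpoints of the range stated in the caption.
for target in (0.05, 100.0):
    print(target, "ng target:", target_percent(target), "% of total,",
          background_ng(target), "ng background")
```

The lower endpoint reproduces the 0.005% figure quoted in the abstract for the detection limit.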
FIG 5 Quantitative evaluation of the designed arrays with pure culture and soil community DNAs. (a) Relationship of total signal intensity of all detected spots to the amount of pure culture DNAs used. (b) Relationship of the signal intensity of selected representative probes to the amount of pure culture DNAs used. (c) Distribution of Pearson correlation coefficients (r) based on individual spots for pure culture DNAs. (d) Relationship of total signal intensity of all detected spots to the amount of soil community DNAs used. (e) Relationship of signal intensity of selected representative probes to the amount of soil community DNAs used. (f) Distribution of Pearson correlation coefficients (r) based on individual spots for soil community DNAs.
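The per-spot Pearson correlation summarized in panels (c) and (f) can be illustrated with a minimal sketch. Whether the authors correlated raw or transformed values is not stated here, so the sketch works on whatever pairs it is given. The function names and the example data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / sqrt(var_x * var_y)


def fraction_above(r_values, cutoff=0.9):
    """Fraction of spots whose r exceeds the cutoff (0.9 is the abstract's bound)."""
    return sum(1 for r in r_values if r > cutoff) / len(r_values)


# Hypothetical DNA amounts and signal intensities for two spots.
dna_amount = [1.0, 2.0, 3.0, 4.0]
spot_a = [10.0, 20.0, 30.0, 40.0]   # perfectly linear response
spot_b = [40.0, 10.0, 30.0, 20.0]   # poorly correlated response
r_values = [pearson_r(dna_amount, spot_a), pearson_r(dna_amount, spot_b)]
print(r_values, fraction_above(r_values))
```

Applying `fraction_above` to the r values of all detected spots gives the kind of summary the abstract reports as the share of probes with r > 0.9.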
FIG 6 Associating variations in microbial functional gene structure with environmental variables. (a) CCA based on selected environmental variables. A total of 7 environmental factors (U, pH, redox, Se, O2, DIC, and DOC) were selected from 41 measured variables. The top two axes (CCA1 and CCA2) were included and accounted for 50.7% and 13.8% of the variation in microbial functional gene structure, respectively. (b) Partial CCA-based VPA assigning variance to U, pH, and DOC. The value inside each colored circle indicates the fraction of variance assigned to that variable alone. Asterisks indicate the significance level of the partial CCA test: *, P < 0.05; **, P < 0.01. The value by the solid black line indicates the variance assigned to the interactive effect of the two connected variables. The value inside the dashed triangle indicates the variance assigned to the interactive effect of all three variables.
Comparison of GeoChip and shotgun metagenomics sequencing
| Contamination level | Method | Functional gene richness (no. of genes) | No. of significantly different genes |
|---|---|---|---|
| L0 | Shotgun | 6,166 ± 415 | |
| L0 | GeoChip | 63,739 ± 3,663 | |
| L1 | Shotgun | 5,462 ± 396 | 782 |
| L1 | GeoChip | 66,225 ± 12,710 | 1,987 |
| L2 | Shotgun | 6,040 ± 180 | 221 |
| L2 | GeoChip | 53,999 ± 7,848 | 1,501 |
| L3 | Shotgun | 5,231 ± 285 | 971 |
| L3 | GeoChip | 53,357 ± 11,180 | 832 |
Standard deviation of triplicate samples.
Compared to gene abundance in L0, t test.
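One simple way to read the comparison table is as a per-level ratio of GeoChip richness to shotgun richness. The sketch below copies the richness values from the table and ignores their standard deviations. The dictionary layout and function name are illustrative, and the ratio itself is not a statistic reported by the authors.

```python
# Functional gene richness copied from the comparison table (SD omitted).
RICHNESS = {
    "L0": {"shotgun": 6166, "geochip": 63739},
    "L1": {"shotgun": 5462, "geochip": 66225},
    "L2": {"shotgun": 6040, "geochip": 53999},
    "L3": {"shotgun": 5231, "geochip": 53357},
}


def fold_difference(level, table=RICHNESS):
    """GeoChip richness divided by shotgun richness at one contamination level."""
    row = table[level]
    return row["geochip"] / row["shotgun"]


for level in sorted(RICHNESS):
    print(level, round(fold_difference(level), 1))
```

At every contamination level the GeoChip value is roughly an order of magnitude larger than the shotgun value.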