
A comparison of computationally predicted functional metagenomes and microarray analysis for microbial P cycle genes in a unique basalt-soil forest.

Erick S LeBrun, Sanghoon Kang

Abstract

Here we compared microbial results for the same phosphorus (P) biogeochemical cycle genes from a GeoChip microarray and from PICRUSt functional predictions based on 16S rRNA data for 20 samples in the four spatially separated Gotjawal forests on Jeju Island in South Korea. The high homogeneity of microbial communities detected at each site allows sites to act as environmental replicates for comparing the two functional analysis methods. We found that while both methods capture the homogeneity of the system, they differed greatly in the total abundance of genes detected, as well as in the diversity of taxa detected. Additionally, we introduce a more comprehensive functional assay that again captures the homogeneity of the system but also captures more extensive community gene and taxonomic information and depth. While both methods have their advantages and limitations, PICRUSt appears better suited to asking questions specifically related to microbial community P as we did here. This comparison of methods makes important distinctions between both the results and the capabilities of each method and can help select the best tool for answering different scientific questions.


Keywords:  GeoChip; Metagenome; MiSeq; PICRUSt; microbial communities; nutrient cycling; phosphorus

Year:  2018        PMID: 30057749      PMCID: PMC6051228          DOI: 10.12688/f1000research.13841.1

Source DB:  PubMed          Journal:  F1000Res        ISSN: 2046-1402


Introduction

Relating the functionality of microbes to environmental factors is one of the primary goals in microbial ecology. With the advent of modern genomic technologies, such as next generation sequencing and microarray hybridization, there are more options than ever to test the genomic and functional capabilities of environmental communities. Metagenome sequencing is one of the most thorough and comprehensive methods currently available for looking at microbial community gene compositions [1–5], but it can be costly and can generate enormous data sets that require a large amount of work in processing, analysis, and storage. Two technologies currently in use for looking at community functional profiles that can be less expensive and more accessible than metagenome sequencing are computationally predicted functional metagenomes (PFMs) [6] and microarray analyses [7]. These technologies both have known advantages and disadvantages [8], but investigation into how they compare in the same system is still needed. Here we compare PFMs from Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) [6] to GeoChip [9] microarray data. While the two methods are distinct, each can be applied to an environmental community gene pool to estimate the presence and abundance of genes related to function within the community genomic landscape. The resulting datasets from each technique are tables showing counts of genes or functions as determined by either probes (microarray) or reference data (PFMs), and are therefore directly comparable in the context of functional gene landscapes within the system. We utilize 20 sites in a unique basalt-soil Gotjawal forest on Jeju Island in Korea. Despite its rocky, lava-formed basalt substrate and dense vegetation [10], this forest is considered a wetland environment because of its homogeneous, rocky soil and its capacity for absorbing water [11].
All 20 sites, though separated by distances of 5 km to 65 km (Figure S1), showed strong homogeneity in bacterial/archaeal community assemblies in 16S rRNA gene taxonomic analysis (Figure S2) and so act as replicates in this system for the current study. This makes the system well suited to comparing the two technologies. We specifically look at how these technologies perform for the same phosphorus (P) cycle genes, because the basalt-soil environment has the potential to be a unique P environment [12–14].

Methods

Data origination and processing

GeoChip 4.0 data for P cycle genes came from Kim et al. [15]. For sequencing data, we started with raw sequencing files also from the study by Kim et al. [16]. Paired-end reads were combined using the fastq-join algorithm from ea-utils [17]. Unpaired reads were discarded at this time. Additional sequence processing was performed using Quantitative Insights Into Microbial Ecology (QIIME) version 1.9.1 [18]. Sequences were then filtered with a maximum unacceptable Phred quality score of 20. Chimeric sequences were identified and removed using the UCHIME algorithm within USEARCH [19]. Operational taxonomic unit (OTU) picking was performed via open reference using uclust against the Greengenes 13_8 database with a 0.97 similarity cutoff [20]. Singleton sequences were removed during OTU picking, and taxonomy was assigned with the Greengenes 13_8 database as reference. Only reads identified in closed reference picking were used for the PICRUSt analysis. Using PICRUSt [6], predicted functional metagenomes (PFMs) were constructed from the resulting 16S rRNA sequences. PFMs were generated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [21, 22] as a functional reference.
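To make the quality-filtering step concrete, the sketch below decodes a FASTQ quality string and truncates a read at the first base whose Phred score is at or below the "maximum unacceptable" value of 20. It is a simplification for illustration only: it assumes Phred+33 encoding, the function names are hypothetical, and QIIME applies further parameters (for example, limits on runs of low-quality bases and on minimum retained length) that the text does not report.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def truncate_at_low_quality(sequence, quality_string, max_unacceptable=20):
    """Return the read prefix before the first unacceptable base.

    A base is treated as unacceptable when its Phred score is at or below
    max_unacceptable. This is a simplified rule, not QIIME's full logic.
    """
    for i, score in enumerate(phred_scores(quality_string)):
        if score <= max_unacceptable:
            return sequence[:i]
    return sequence


# Toy read: 'I' decodes to Q40 and '5' decodes to Q20 under Phred+33.
trimmed = truncate_at_low_quality("ACGTACGT", "IIII5III")
```

In this toy read the fifth base has a score of exactly 20, so only the first four bases are retained.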

Genes studied

The GeoChip 4.0 data provided probe data for genes identified as “phytase”, “ppk”, and “ppx”. We identified these genes in the KEGG database as having the KEGG orthology (KO) numbers K01083 and K01093 for phytase, K00937 for ppk, and K01514 for ppx. These KO numbers were the only PICRUSt results extracted for direct comparison. Additionally, we built another P assay in PICRUSt utilizing 417 KO numbers associated with P (Table S1).
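The extraction of these KO numbers from a PICRUSt output table can be sketched as follows. This is a minimal illustration, not the authors' code: the nested-dictionary table layout, the function name, and the toy counts are all assumptions, and only the KO numbers themselves come from the text.

```python
# KO numbers named in the text for the three GeoChip P cycle genes.
GENE_TO_KOS = {
    "phytase": ["K01083", "K01093"],
    "ppk": ["K00937"],
    "ppx": ["K01514"],
}


def gene_counts_per_sample(ko_table, gene_to_kos=GENE_TO_KOS):
    """Sum predicted KO counts per gene for each sample.

    ko_table maps KO id -> {sample id -> predicted count}; this layout is
    hypothetical. KOs missing from the table contribute zero.
    """
    samples = sorted({s for counts in ko_table.values() for s in counts})
    result = {}
    for gene, kos in gene_to_kos.items():
        result[gene] = {
            s: sum(ko_table.get(ko, {}).get(s, 0) for ko in kos)
            for s in samples
        }
    return result


# Toy table with made-up counts, for illustration only.
toy_table = {
    "K01083": {"site1": 10, "site2": 12},
    "K01093": {"site1": 5, "site2": 3},
    "K00937": {"site1": 40, "site2": 38},
    "K01514": {"site1": 22, "site2": 25},
    "K00001": {"site1": 99, "site2": 99},  # unrelated KO, ignored
}
counts = gene_counts_per_sample(toy_table)
```

The expanded 417-KO assay would follow the same pattern with a longer KO list; those KO numbers are given in Table S1 and are not reproduced here.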

Statistical analyses

All analyses were performed in the R software package v.3.2.3 [23]. The relationship between the PICRUSt and GeoChip data was tested using a Mantel test with the Pearson correlation method and 1,000 permutations through the vegan package [24]. Non-metric multidimensional scaling (NMDS) ordinations were constructed using Bray-Curtis dissimilarity through the vegan package. A PROcrustean randomization TEST of community environment concordance (PROTEST), a potentially more sensitive detection method than a Mantel test, was also used to compare the NMDS ordinations to each other [25]. Figures and plots were created using the ggplot2 package [26].
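For readers unfamiliar with the Mantel test, the sketch below shows the general idea in plain Python: compute Bray-Curtis dissimilarities between sites for each dataset, correlate the two dissimilarity matrices with Pearson's r, and assess significance by permuting the site order of one matrix. The analyses in this study were run in R with vegan; this is an independent illustration with hypothetical function names and made-up toy counts, and the one-sided p-value uses the common (hits + 1) / (permutations + 1) convention, which is an assumption about the details.

```python
import math
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(rows):
    """Square Bray-Curtis matrix for a list of per-site count vectors."""
    n = len(rows)
    return [[bray_curtis(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(d, order=None):
    """Flatten the lower triangle of d, optionally after reordering sites."""
    n = len(d)
    order = order or list(range(n))
    return [d[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def mantel(d1, d2, permutations=1000, seed=0):
    """Return (Pearson r, one-sided permutation p-value) for two matrices."""
    rng = random.Random(seed)
    x = lower_triangle(d1)
    observed = pearson(x, lower_triangle(d2))
    n = len(d2)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of d2 together
        if pearson(x, lower_triangle(d2, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up gene counts per site (rows = sites, columns = genes).
picrust = [[120, 30, 50], [100, 45, 55], [90, 20, 80], [130, 35, 40], [60, 70, 20]]
geochip = [[3, 1, 2], [2, 2, 2], [4, 1, 1], [1, 3, 2], [2, 1, 4]]
r, p = mantel(distance_matrix(picrust), distance_matrix(geochip))
```

Permuting rows and columns together keeps each permuted matrix a valid dissimilarity matrix, which is why the site order is shuffled rather than the individual matrix cells.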

Results and discussion

Both PICRUSt and GeoChip appear to have captured the homogeneity of the system (Figure 1). PICRUSt captured much more diversity and depth than GeoChip in terms of taxa identified (Figure 1) and total counts (Figure 2). PICRUSt identified organisms from 40 different phyla, whereas GeoChip identified organisms from 15. Total counts at each site for the two methods were on very different scales. When each set of counts is placed on a scale that shows its own variation, it becomes apparent that the trends of total counts across sites do not match between methods (Figure S3). The Mantel test found no significant correlation between the two data sets, and Procrustes analysis confirmed this, also showing no significant correlation (Figure S4). The same analyses were performed with the data for each gene isolated, and each of the three genes independently showed inconsistency between methods similar to that seen in the comparison of total gene datasets. There was no correlation between the datasets in Mantel or Procrustes analysis, and gene counts and trends were markedly different.

Figure 1.

Bubble plots of taxa relative abundance detected by the GeoChip 4.0 array and by PICRUSt from 16S rRNA data for P cycle genes found on the GeoChip array.

Figure 2.

Plot of total P cycle gene counts as detected by PICRUSt and GeoChip at each site.
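Because total counts from the two methods sit on very different scales, comparing their trends across sites requires rescaling each set of counts separately. The text does not state which scaling was used for this purpose, so the sketch below assumes simple min-max scaling; the function name and the toy totals are hypothetical.

```python
def min_max_scale(values):
    """Rescale values to the range 0..1; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up total counts per site, for illustration only.
picrust_totals = [5200, 4800, 6100, 5000]
geochip_totals = [14, 19, 12, 16]

scaled_picrust = min_max_scale(picrust_totals)
scaled_geochip = min_max_scale(geochip_totals)
```

After rescaling, both series span the same range, so differences in which sites rank highest or lowest become visible regardless of the raw magnitudes.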

The new PICRUSt assay with 417 P related genes captured the system homogeneity but with additional depth (Figure S5). The new assay identified organisms from 41 phyla, similar to the 40 of the smaller, comparative assay, but also provided counts per site ranging from ~70,000 to ~110,000. The PICRUSt dataset from the new assay not only represents what is likely a better dataset for answering community functional questions within the P cycle than the previous, comparative PICRUSt or GeoChip datasets, but also illustrates an important difference between the two methods. While both methods could be considered “closed-format” technologies in that they are reliant on the available known references [8], the processes of adapting or updating the two methods contrast. The method of using computational predictions is highly adaptable and allows for the easy inclusion or exclusion of additional genes [6]. Improving or expanding the reference data that computational prediction relies on can be achieved simply by updating the curated reference database. The microarray method is more involved, requiring the identification, creation, and inclusion of specific target probes in the manufacturing of a microarray [7]. It is important to note that for our comparison we are specifically looking at functional genes within the P biogeochemical cycle. Both methods explored are designed for, and capable of, looking at a more comprehensive whole functional profile for communities. Computational functional prediction seems to be better suited to the task of viewing independent functional groupings as we did here. While microarrays have shown linear relationships to RNA and DNA levels in environmental systems [16, 27], they are limited in coverage, and small sequence divergence can affect quantitative capability [7].
These quantitative limitations should be carefully considered in light of recent findings showing that the composition of P cycle genes in some microbial communities is more closely related to environmental P levels than absolute abundance is [1]. Computational functional prediction again seems better equipped to handle questions related to functional gene composition, because microarray probes are highly specific to taxa and microarrays include a limited set of genes. It is also important to note that the data from both methods are representative of DNA present in microbial communities and not true expression levels or enzyme abundance.

Conclusions

Computational functional prediction and microarray analysis of P cycle genes both captured system homogeneity. However, they did not agree in terms of absolute abundance or taxonomic composition of P cycle genes. Computational functional prediction provided more count depth and taxonomic diversity than microarray analysis did. Because computational functional prediction is easily adapted, expanding the PICRUSt assay from the original 4 KO numbers used in the microarray comparison to 417 KO numbers related to P function captured additional genes and taxonomic diversity in P function along with increased depth. While we compared two methods for the exploration of functional P cycle genes within microbial communities to each other, an additional comparison to whole metagenome data in a system would further validate either method.

Data availability

The sequence data used in this study were deposited in the NCBI Sequence Read Archive (SRA) under the BioSample accession numbers SAMN06049757 to SAMN06049776. The GeoChip microarray data used in this study are available in OSF: http://doi.org/10.17605/OSF.IO/AT93H [28]. Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

LeBrun and Kang compared two metagenomic functional gene datasets of the same microbial community, functional gene microarray and PICRUSt 16S rRNA sequencing data. PICRUSt detected a much higher gene diversity and abundance than the microarray. Some of that difference could be accounted for by group probes on the GeoChip microarray that cover two or more similar sequences, although PICRUSt offers a much larger database. The authors make a good point about the relative ease of expanding coverage by including additional genes for the PICRUSt method. Microarrays require additional steps and the manufacturing of a new array, compared to simply updating a database for PICRUSt. The methods used and the conclusions are sound. This paper can be indexed. Minor edit: the second Kim et al. reference in the methods section has an incorrect citation; I believe it should be 10 (or 15). I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors reported a parallel comparison of the two most popular high-throughput technologies in environmental microbiota studies, 16S sequencing and GeoChip. Other than the taxa aspect, they focused on the comparison of functional genes involved in P. Overall, the methods and results are sound: 16S sequencing detected more taxa involved in P since 417 KO could be predicted by PICRUSt rather than 3 KO in GeoChip. This is unsurprising because a functional gene microarray can only pick up a limited number of genes in each bio-process as marked functional traits, while high-throughput sequencing can be much more open to uncultured and novel taxa in environments. Zhou et al. already mentioned this opinion in their review paper [1] as well. Thus, according to the policy of F1000, I think this paper can be indexed. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
  16 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Development and evaluation of functional gene arrays for detection of selected genes in the environment.

Authors:  L Wu; D K Thompson; G Li; R A Hurt; J M Tiedje; J Zhou
Journal:  Appl Environ Microbiol       Date:  2001-12       Impact factor: 4.792

Review 3.  The soil metagenome--a rich resource for the discovery of novel natural products.

Authors:  Rolf Daniel
Journal:  Curr Opin Biotechnol       Date:  2004-06       Impact factor: 9.740

Review 4.  Development of functional gene microarrays for microbial community analysis.

Authors:  Zhili He; Ye Deng; Jizhong Zhou
Journal:  Curr Opin Biotechnol       Date:  2011-11-16       Impact factor: 9.740

5.  Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Authors:  T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen
Journal:  Appl Environ Microbiol       Date:  2006-07       Impact factor: 4.792

6.  A metagenome-wide association study of gut microbiota in type 2 diabetes.

Authors:  Junjie Qin; Yingrui Li; Zhiming Cai; Shenghui Li; Jianfeng Zhu; Fan Zhang; Suisha Liang; Wenwei Zhang; Yuanlin Guan; Dongqian Shen; Yangqing Peng; Dongya Zhang; Zhuye Jie; Wenxian Wu; Youwen Qin; Wenbin Xue; Junhua Li; Lingchuan Han; Donghui Lu; Peixian Wu; Yali Dai; Xiaojuan Sun; Zesong Li; Aifa Tang; Shilong Zhong; Xiaoping Li; Weineng Chen; Ran Xu; Mingbang Wang; Qiang Feng; Meihua Gong; Jing Yu; Yanyan Zhang; Ming Zhang; Torben Hansen; Gaston Sanchez; Jeroen Raes; Gwen Falony; Shujiro Okuda; Mathieu Almeida; Emmanuelle LeChatelier; Pierre Renault; Nicolas Pons; Jean-Michel Batto; Zhaoxi Zhang; Hua Chen; Ruifu Yang; Weimou Zheng; Songgang Li; Huanming Yang; Jian Wang; S Dusko Ehrlich; Rasmus Nielsen; Oluf Pedersen; Karsten Kristiansen; Jun Wang
Journal:  Nature       Date:  2012-09-26       Impact factor: 49.962

7.  A Metagenome-Based Investigation of Gene Relationships for Non-Substrate-Associated Microbial Phosphorus Cycling in the Water Column of Streams and Rivers.

Authors:  Erick S LeBrun; Ryan S King; Jeffrey A Back; Sanghoon Kang
Journal:  Microb Ecol       Date:  2018-03-22       Impact factor: 4.552

8.  GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis.

Authors:  Qichao Tu; Hao Yu; Zhili He; Ye Deng; Liyou Wu; Joy D Van Nostrand; Aifen Zhou; James Voordeckers; Yong-Jin Lee; Yujia Qin; Christopher L Hemme; Zhou Shi; Kai Xue; Tong Yuan; Aijie Wang; Jizhong Zhou
Journal:  Mol Ecol Resour       Date:  2014-03-14       Impact factor: 7.090

9.  Detection of genes involved in biodegradation and biotransformation in microbial communities by using 50-mer oligonucleotide microarrays.

Authors:  Sung-Keun Rhee; Xueduan Liu; Liyou Wu; Song C Chong; Xiufeng Wan; Jizhong Zhou
Journal:  Appl Environ Microbiol       Date:  2004-07       Impact factor: 4.792

Review 10.  High-throughput metagenomic technologies for complex microbial community analysis: open and closed formats.

Authors:  Jizhong Zhou; Zhili He; Yunfeng Yang; Ye Deng; Susannah G Tringe; Lisa Alvarez-Cohen
Journal:  MBio       Date:  2015-01-27       Impact factor: 7.867

