Literature DB >> 28348642

Metagenomic Analysis of a Complex Community Present in Pond Sediment.

Abstract

The metagenomic profiling of complex communities is gaining immense interest across the scientific community. A complex community present in the pond sediment of a water body located close to a hexachlorocyclohexane (HCH) production site of the Indian Pesticide Limited (IPL) (Chinhat, Lucknow) was selected in an attempt to identify and analyze the unique microbial diversity and functional profile of the site. In this study, we supplement the metagenomic study of pond sediment with a variety of binning approaches along with an in depth functional analysis. Our results improve the understanding of ecology, in terms of community dynamics. The findings are crucial with respect to the mechanisms such as those involving the lin group of genes that are known to be implicated in the HCH degradation pathway or the Type VI secretory system (T6SS) and its effector molecules. Metagenomic studies using the comparative genomics approach involving the isolates from adjacent HCH contaminated soils have contributed significantly towards improving our understanding of unexplored concepts, while simultaneously uncovering the novel mechanisms of microbial ecology.

Entities: Chemical Disease Mutation Species

Keywords: Functional diversity; genomic variation.; pond sediment

Year: 2017 PMID： 28348642 PMCID： PMC5362764 DOI： 10.7150/jgen.16685

Source DB: PubMed Journal: J Genomics

Introduction

Environmental pollution caused by the presence of highly persistent organic pollutants (POPs) has been identified as a problem at several places in Lucknow, India (7, 35). The presence of pesticides, especially those of organochlorine (OC) and organophosphate (OP) origin, in the soil significantly impacts our ecosystem. Although these pesticides might be present in low concentrations but due to biomagnifications they acquire the potential to introduce mutagenic and/or carcinogenic changes. Soil is naturally rich for the several uncultured microbes that aid, directly or indirectly, in the metabolism of these pesticides and their conversion to safer compounds. From this point of view, the field of metagenomics has been instrumental in elucidating and characterizing microflora belonging to different environmental niches. The studies have revealed the diverse functional contribution of the microbial community towards the environment. These contributions and functional roles include acid mine drainage 1, presence of bacterial rhodopsin in the marine metagenome 2, and the existence of degradation pathways for xenobiotic compounds and other unique cellular processes as required in stressed niches 3, 4, 5. The extensive analysis of the cultivable 6, 7, 8, 9, 10, 11, 12 and uncultivable diversity 4, 5 of the HCH dumpsite located at the Ummari village in Lucknow, India, has revealed the presence of several potent HCH degraders as well as several HCH tolerant bacterial species. lin Genes, a group of genes involved in catabolic activities, are known to mediate the degradation of HCH isomers 13, 14, 15, 16 as well as play a major role in the degradation of xenobiotic compounds 17. A comparative metagenomic survey of HCH contaminated soil was carried out considering different gradients, such as DS (dumpsite) (450 mg/kg of ∑HCH), 1 Km (0.7 mg/kg of ∑HCH) and 5 Km (0.03 mg/kg of ∑HCH), to determine the microbial dynamics active at stressed site. The comparative metagenomics presents new opportunities to access significant difference and similarity between communities. It also highlights the importance of functional identity for communities existing in specific environments. At such sites, the selection pressure of xenobiotic compounds is linearly correlated to increase in genome size and acquisition of foreign genes. Thus the environment specific and strain specific traits in these genomes have been well implicated in ancestral evolution and positive selection of bioremediation traits 4, 5. Pond sediment has been previously reported to harbor pesticides such as lindane (57.6 ng/mL), α-endosulfan (80.9 ng/mL), β-endosulfan (220.9 ng/mL), chlorpyriphos (20.9 ng/mL), monocrotophos (8.3 ng/mL), dimethoate (2.9 ng/mL) and malathion (1.6 ng/mL) (35). As per the physiochemical analysis conducted by our group, the HCH content of pond sediment was found to be very low (> 0.909 mg/kg). In this study, we performed a metagenomic analysis of pond sediment (Chinhat village, Lucknow, India) with comparative genomics of isolates from the hexachlorocyclohexane (HCH) dumpsite (Ummari village, Lucknow India) or the pond sediment. In order to decipher the unique mechanisms present within the complex community, we investigated the community structure as well as the genetic structure in the context of the environment. Thus, in an attempt to elucidate the role of microbes in balancing the soil ecosystem, especially in contaminated sites, we (i) performed differential taxonomic binning of the pond sediment metagenome; (ii) compared the functional potential of pond sediment metagenome with that of HCH gradient metagenomes; (iii) conducted a metagenomic reassignment of reads with the help of genomic isolates from the same site or the pond sediment; and (iv) constructed a profile of genomic variation of the pond sediment based on essential single copy genes.

Materials and Methods

Sample Collection and Downstream Physiochemical Analysis

Two sediment samples were collected in triplicate from a pond located near the Chinhat village in Lucknow, India (26° 54ʹ 34.86 ˝ N, 81° 04ʹ 25.22 ˝ E), which is a few meters away from the IPL (India Pesticides Ltd.) factory. Collection was done in the month of January 2011. Weather conditions prevalent at the time of collection were as follows: 20 °C temperature, 0% precipitation, 38% humidity, 15 km/h wind and 966.24 mm average annual rainfall [Lucknow Centre, Geological Survey of India (GSI) Training Institute, Aliganj, Lucknow, India]. The samples were kept in sterile plastic bags, sealed and transported on ice (4°C) and thereafter stored at -80 °C until processed. The physiochemical properties of the sediment samples were analyzed at the Soil Testing Services - Indian Agricultural Research Institute. The extraction and analysis of HCH residues from samples were conducted as per the method described previously by Concha-Grana et al. A recovery of 74.95% with standard deviation of ±1.5% was obtained.

Metagenomic DNA Extraction and Sequencing

Power Max® Soil DNA Isolation Kit (MO-BIO, Carlsbad, CA, USA) was used for the isolation of DNA from the pond sediment samples in triplicate. Equal concentrations from each sample were pooled separately for two PS samples to reveal wide variety and range of the microbial community present in the pond sediment. DNA purity and concentration (PS1 = 1200 ng μL-1, PS2 = 1178 ng μL-1) were estimated using a NanoDrop Spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE, USA). The two samples were then proceeded for DNA sequencing. Clean sequencing data were obtained using the Illumina HiSeq 2000 platform (~28,164,307 and ~28,044,108 paired-end reads of PS1 and PS2, respectively). The data were generated by the removal of adaptors and reads containing the 'N' annotation from the raw data. The insert length and average read length of clean data were 170 bp and 91 bp, respectively.

De-novo Assembly of Pond Sediment Metagenome

The sequence data (Illumina HiSeq 2000 clean data) were assembled using the Velvet_1.2.03 19 software at an optimized k-mer length of 39 (insert length, 170 bp; expected coverage, auto; minimum contig standard deviation for insert length, 20 bp; and length cutoff, 200 bp). The k-mer length was optimized after running Velvet at different k-mer lengths. The CAST algorithm was used to filter the assembled sequences 20. The cutoff for minimum contig length was kept >200 bp in order to ensure a reasonable length for predicted ORFs 3.

Comparative Taxonomic Binning of Metagenomes

Assembled contigs were binned using RAIphy 21 and also by comparing the same to the NCBI non-redundant protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr, June, 2015) using BLASTX (E-value = 1e-10) 22. The MEtaGenome ANalyzer (MEGAN 5.0) 23 was used at default parameters in order to assign contigs to phylogenetic groups at the genera level. Six metagenomes were classified (PS1, PS2, 1 Km, 5 Km, DS(Dumpsite) and SolexaDS) against the nr database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr, June, 2015) (BLASTX, E-value = 1e-10) with and without the isolates (Table obtained from the HCH contaminated site or from other suitable sites of pond sediment. Differential abundance of bacterial taxa at the phylum level among PS1 and PS2 samples were statistically computed using METASTATS (http://metastats.cbcb.umd.edu). The total numbers of subjects were two with 0.05 threshold p value and 1000 permutations. For each group, mean values for the present phyla were computed along with variance and standard error.

Genomic Variation in Pond Sediment Metagenome

For the analysis of the genomic variation present in pond sediment, 74 of the most abundant prokaryotic genomes obtained from taxonomic binning of pond sediment samples and whole genomes isolated from the HCH dumpsite were shortlisted as the set of reference genomes (Table S4). A set of 107 essential single-copy gene HMMs (24) were searched against the predicted ORFs of pond sediment associated genomes using HMMER3 (http://hmmer.janelia.org/). Out of these 107 genes, a set of 60 essential single-copy genes were found to be in consensus with all selected genomes (Table . These marker genes, which were deemed capable of resolving global phylogeny, were then used to determine genomic variation in pond sediment with respect to the reference genome marker genes. All associated analytical procedures were performed using R v. 3.2.2 via the ace function of ape. Principle component analysis of tetranucleotide frequencies of the marker genes was conducted on the genomes and the pond sediment sample in order to form clusters. To identify genomic variation within pond sediment samples, we used a combination of principle components that display the clusters in question. To estimate and analyze the pattern of genotypic diversity, we also performed phylogenetic principal components analysis (PCA) using a maximum-likelihood algorithm for essential single copy genes. The variation in genotypic diversity present within the pond sediment is a function of the complexity of the community. The plot was constructed along two pairs of axes; it was composed of all terminal and internal phylogenetic nodes with the branches connecting the adjacent nodes. The R plot visualization also includes the branches and nodes connecting clades.

Gene Prediction and Functional Annotation

MetaGene gene finder 26 and the metagenome version of MetaProdigal 27 were used to predict open reading frames (ORFs) of the metagenomic contigs. The functional annotation of these predicted ORFs was carried out by database comparison searches. HMMER3 was used to assign Reverse Position-Specific BLAST (RPS-BLAST) of peptide sequences present in pond sediment metagenomes to COG clusters 28 and PFAM families 29 present in the NCBI Conserved Domain Database (E-value < 1e-5 and Percentage Identity > 20%). Pathway annotation of peptide sequences was done against the KEGG GENES 30 database using Reverse Position-Specific BLAST (RPS-BLAST) 22 (Bit Score > 60 and bi-directional hit rate (BHR) > 0.95). Rarefaction curve, two sided Fisher's exact test and Storey's FDR methods were carried out on the Pfam 29 database using STAMP 31. Principle Component Analysis (PCA) of COG categories of the metagenomic samples was conducted using R v. 3.2.2 [https://cran.r-project.org/bin/windows/base/old/3.2.2/]. Shannon diversity for each metagenomic sample was calculated for functional COG categories and microbial diversity by EGT analysis in R using the vegan 32 package. MinPath 33 was used for predicting probable pathways. All-versus-all BLASTP (default parameters) followed by MCL clustering (http://micans.org/mcl/) was used to determine clusters of homologs in order to validate the pathways.

Data Availability

The data obtained post-analysis of the pond sediment was submitted to NCBI SRA under the accession ID SRP047092. Metagenomic sequence data obtained from previous metagenomic reads have been deposited at MG-RAST ID under the study accessions: DS =4461840.3, 1 Km =4461013.3 and 5 Km =4461011.3 and at GenBank/DDBJ/EMBL : SolexaDS = ERP001726.

Results

Sequence data generation and assembly

The metagenomic data obtained after analyzing the pond sediment from Chinhat, Lucknow resulted in 106 × 106 Illumina HiSeq 2000 paired-end reads of PS1 and PS2 each. These paired-end reads were arrived at after quality filtering raw data for adapter sequences and 'N' nucleotide reads. After assembling the clean data on Velvet_1.2.03 19 at an optimized k-mer length of 39,774,416 and 1,483,949 contigs with n50; 158 and 116 of PS1 and PS2 samples were obtained, respectively (Table . The percentage of reads used for the assembly process was 50.17% for PS1 and 30.25% for PS2. The quality of assembly was checked by CAST algorithm to obtain good quality of sequence for the analysis of complex communities present in the metagenomic samples. In total 243,624 and 206,869 ORFs were predicted for PS1 and PS2, respectively.

Physiochemical Properties of the Sediment

The electrical conductivity across the pond sediment [1.10 dS/m (PS1) and 0.73 dS/m (PS2)] was lower as compared that obtained for DS (8.5 dS/m) 4. Along similar lines, when compared to DS, the potassium content of the pond sediment was <10% while the salinity was within the normal range (PS1 = 89 kg/ha and PS2 = 90 kg/ha; DS: 918 kg/ha) 4. Physiological properties of the pond sediments (PS1 and PS2) exhibited significant differences (p<0.00001) based upon the different HCH contaminated locations (DS, 1 Km and 5 Km) (Table 4. The residue of HCH isomer contamination detected in the sediment samples is elucidated in Table . The collective concentration of all HCH isomers found in the pond sediment was 0.905 ±0.003 mg/kg (PS1) and 0.909 ±0.006 mg/kg (PS2). In all the sediment samples collected for this study, it was found that the β-HCH and α-HCH isomer content was less than 0833 ±0.003 mg/kg and greater than 0.076 ±0.002 mg/kg, respectively.

Phylogenetic Reassignment of Metagenomic Reads using Sequenced Whole Genomes from Contaminated Sites

The percentage of assigned reads was higher when the metagenomic reads were aligned to a nonredundant (nr) database along with the whole genome data rather than a nonredundant (nr) database only (p value < 0.0001) (Fig. . In pond sediment metagenome, the improved metagenomic reassignment was computed to be 1.02%, whereas the same cumulatively for all the metagenomes was determined to be 0.44%. Percentage of reads reassigned to major phyla in pond sediment were as follows: Proteobacteria [16.35%], Actinobacteria [7.67%], Chloroflexi [5.48%], Acidobacteria [4.45%], Planctomycetes [4.22%], Verrucomicrobia [2.48%], Firmicutes [2.38%], Cyanobacteria [1.79%], Thaumarchaeota [0.92%], Euryarchaeota [0.35%], Deinococcus-Thermus [0.28%], Nitrospirae [0.29%], Armatimonadetes [0.25%], Spirochaetes [0.27%], Candidatus [0.12%], Chlorobi [0.06%], Poribacteria [0.11%], Aminicenantes [0.08%] and Ignavibacteriae [0.12%] (Table . Most of the bacteria belong to phylum Proteobacteria, and whole genome sequence information is available for maximum members of this phylum, therefore the outcome of the analysis for this group is best. In addition, this methodology also highlights comparatively rarer phyla such as Candidatus, Cyanobacteria, etc.

Pond Sediment Microbial Diversity Analysis

Phylum level of taxonomic assignment as obtained from BLASTP was displayed using MEGAN (E-value ≥ 1e-5) (Fig. . The cumulative number of unique NCBI non-redundant protein (y-axis) was plotted as the function of the cumulative number of percentage of contigs (x-axis). The curve for PS1 and PS2 becomes flat at the right side, which implies that the abundance of the non-redundant protein family had reached a saturation level (Fig. . Additionally, by using ggplo2 in R v. 3.2.2, a stacked area plot was constructed in order to show that pond sediment has an abundance of microbes spread over a range of phyla. All the components were stacked in this plot (Fig. . p-Value and q-value were chosen in the top order to resolve individual patterns. The relative abundance of dominant bacterial phyla in the pond sediment samples are represented in the plot along with median values, variance, standard error, p-value and q-value. (Table The majority of the microbiota present in pond sediment includes the degraders and contaminant resistant and antibiotic producing microbes. Taxonomical classification across PS1 and PS2 was determined using the Environmental Gene Tag analysis (EGT: a compositional based binning) as well as BLASTP of contigs against NCBI's non-redundant protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr, June, 2015). According to EGT analysis, the PS1 and PS2 samples have similar profiles (R2 = 0.99). Furthermore, as EGT typing assay annotates the abundance of microbes by presenting an average percentage for each genera along with standard deviation, the following bacteria were identified as the most abundant in both PS1 and PS2 (R2 > 0.99): Corynebacterium (4.33% ±0.53), Rhodococcus (3.06% ±0.90), Bradyrhizobium (2.62% ±0.84), Sorangium (2.25% ±0.99), Thauera (0.94% ±1.41), Methylibium (1.02% ±1.54), Candidatus (2.24% ±3.40), Anaeromyxobacter (5.83% ±2.73), Streptomyces (5.82% ±4.46) and Burkholderia (12.61% ±3.56) (Table .

Genomic Variation within the Pond Sediment Metagenome

The most abundant genomes present within the complex pond sediment communities (Table were identified by clustering 60 essential single copy genes (Table present in the two principle components. The tetranucleotide frequency of essential single copy gene based clusters, as generated by principle component analysis, has proved to be immensely useful in identifying genomes from the metagenome of sediments. A large majority was found to be composed of organisms that are actively evolving to compete and proliferate in synchrony with the microenvironment of the soil (Fig. . However, despite the several tools used for the purpose, many of the genomes present in the clusters could not be clearly demarcated. In order to distinguish between the genotypic diversity present within the pond sediment, we used a combination of principle components by utilizing a maximum-likelihood algorithm that gave us the greatest possible resolution between the genomes (Fig. . In this study, we have demonstrated that genotypic diversity and clustering of genomes can be used to predict the magnitude by which each genome is related to the other and can also be used to reveal how one cluster achieved greater diversity than the other. The differential pattern of clusters and the position of the internal nodes are predictive of the mean distance between adjacent nodes; these values are known to be relatively different within the clusters. The ability of a null Brownian model of genotypic diversity to generate clusters with the observed difference in mean length between the nodes can be tested via simulation. The clusters that are being compared have either diversified under differing sequence of single copy essential genes or remain conformed; this is checked for by the rejection of the simulated Brownian null model for genotypic diversity for the pond sediment. The clusters that are packed more densely in the plot show lower genotypic diversity as compared to the clusters with diffused nodes. Clusters A, C and D appear to have continuously expanded to a larger number of regions whereas clusters B and D are confined only to one region. Each cluster is denoted with a different color and the pond sediment appears to be well associated with cluster B. The Brownian rate parameter was higher in clusters A, C and E than in clusters B and D.

Comparative Functional Annotation of Contaminated Sites

In an attempt to delineate the interaction of the bacteria present in pond sediment 7, 34, 35, 36 with their environment as well as among themselves, we explored their metabolic potential using KEGG (Kyoto Encyclopedia of Genes and Genomes, version 69) 30, PFAM (Pfam 27.0, Protein families database) 29 and COG (Cluster of Orthologs Groups) 28 (Fig. . In line with previously published metagenomic studies 4, 5, pond sediment metagenome was also found to contain a variety of subsystem proteins coding for membrane transport, cell signaling, chemotaxis, transposases, and phage elements (Table . Pond sediment metagenome also contained genes that code for proteins of plasmid related function and resistance against toxins as well as antibiotics. The KEGG analysis results were used to map enzymes present in pond sediment metagenome into the various metabolic pathways that they were involved in. These pathways included the bacterial secretory system, the aromatic compound metabolism (degradation of chlorocyclohexane, chlorobenzene, toluene, catechol, terephthalate and xylene isomers) system and the two component system. Principal component analysis (PCA) was performed on the COG categories of all metagenomic samples (PS1, PS2, DS, 1 Km, 5 Km and SolexaDS) (4, 5) (Fig. ). To compare the variability present among the metagenomic samples, we separately calculated the coefficient of variation (CV) within each sample for both functional as well as EGT analysis. Finally, Shannon diversity was calculated between samples for computing functional (COG analysis) and taxonomic diversity (EGT analysis). Solexa sequencing of the dump site (SolexaDS) revealed the presence of a high degree of diversity in microbes as well as in the functional subsystems of the community (Fig. ). We found that pond sediment is less diverse than the three HCH gradients. According to the gene centric approach for sequencing of the contaminated sites, several enzymes involved in the degradation of chlorocyclohexane and chlorobenzene (ko00361) were found to be present in the pond sediment metagenome. These enzymes mostly belong to the families of hydrolases, oxidoreductase, isomerases, and lyases (Table which assist microbes in tolerating and metabolizing xenobiotic compounds. Genes implicated in HCH (Hexachlorocyclohexane) degradation, especially those involved in the upper pathway, are present in pond sediment metagenomes that are enriched with sequences for the linA, linB (Evalue > 1e-5) genes (Evalue > 1e-5). Genes involved in T6SS were abundant in the pond sediment metagenome and were assigned to the KEGG Orthology (KO) system 30 using reverse position-specific (RPS) BLAST 22 against the KEGG GENES database with a BLAST score (bit score) greater than 60 and a bi-directional hit rate (BHR) of > 0.95 (Fig. .

Discussion

Physiochemical Properties of the Pond Sediment

The presence of a relatively lower electrical conductivity level, salinity, and ion concentration of PS1 and PS2 sediment samples indicated towards positive microbial activity and a reduced stress microbial community that operated with a high metabolic efficiency rate (Table . The physiochemical conditions of the pond sediment match well with the condition of bacteria (predicted by EGT analysis) (Fig. derived from the site of isolation 35. As per the Stockholm Convention of 2009, α-HCH, β-HCH and γ-HCH are considered as persistent organic pollutants. Residue levels of HCH isomers and total HCH obtained from different pond sediment sites were computed during the course of this study; the results indicated towards low levels of HCH contamination (Table The presence of HCH contamination in the pond sediment near the Indian Pesticide Limited (IPL) factory at Chinhat, Lucknow is concordant with the fact that the factory discharges its waste (HCH muck) into the IPL drainage system, which later enters the pond. This phenomenon results in distribution of the contaminant in the pond water as well as in the bank (sediment). The waste produced at the plant comprises predominantly of α- and β-HCH residues. The lindane production plant produces waste that can be classified under two categories: (a) bulk waste fraction comprising mainly of α- and β-HCH isomers; and (b) minor waste fraction, which contains γ- and δ-HCH isomers 37. The prevalence of HCH was less than the permissible limit (>4000 mg/kg for t-HCH). However, the recalcitrant nature and stability of β-HCH and δ-HCH in sediment can lead to its biomagnification and harm to nature.

Differential Binning of Pond Sediment

Phylogenetic origins of a majority of metagenomic reads remain anonymous even after applying most of the popular and credited binning algorithms. The enormous size of the metagenomic sample is one of the crucial factors that interfere with the classification of its reads largely due to the biased similarity with reference genomes in databases. In order to improve the efficiency of the metagenomic binning process, we introduced the concept of metagenomic reassignment of reads. The first step was to annotate the six metagenomes (PS1, PS2, 1 Km, 5 Km, DS and SolexaDS) against nr-database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr, June, 2015). Then the annotation was performed against the nr-database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr, June, 2015) with the isolates (Table that were obtained from the HCH contaminated site or the pond sediment. The metagenomic reads that were phylogenetically reassigned during the second step were determined and reported in percentage. In pond sediment metagenome, the improved metagenomic reassignment was computed to be 1.02%, whereas the same for all the metagenomes was cumulatively determined to be 0.44%. The use of the refined binning methodology on the pond sediment samples gave the following results: cellular organisms- 61.43%, unassigned reads- 1.98% and, no hit- 2%. It was determined that the cellular organism component in pond sediment predominantly included bacteria [56.39 %], archaea [0.69 %] and eukarya [0.86 %]; it is evident that the presence of bacteria-based communities was in dominant majority (Fig. . The dominant bacterial phyla in pond sediment were identified and these, along with other unknown phyla, were subjected to METASTATS (http://metastats.cbcb.umd.edu) in order to compute differential abundance. The most significant (P-value ≤ 0.05) as well as abundant phyla were explored as a part of this study. Statistically significant phyla that were displayed in stacked plot of R v. 3.2.2 were as follows: Proteobacteria (P-value = 7.8 × 10-7, Q-value = 2.7 × 10-6), Actinobacteria (P-value = 1.0 × 10-26, Q-value = 7.0 × 10-26), Euryarchaeota (P-value = 8.4 × 10-3, Q-value = 2.3 × 10-2), Chloroflexi (P-value = 4.1 × 10-17, Q-value = 1.9 × 10-16), Verrucomicrobia (P-value = 1.7 × 10-2, Q-value = 3.3 × 10-2), Planctomycetes (P-value = 3.1 × 10-2, Q-value = 4.7 × 10-2), Chlorobi (P-value = 1.9 × 10-2, Q-value = 3.3 × 10-2) and Acidobacteria (P-value = 1.7 × 10-2, Q-value = 3.2 × 10-2). All of the above mentioned phyla, especially the Proteobacteria, were predominantly found in the soil fraction. Taxonomical distributions, such as the one outlined above, are in context of the environment in question. Community structure of pond sediment is known to be extremely complex and only a few dominant phyla are capable of thriving under such severely competitive environmental conditions. In order to determine which genera were the most active and abundant in the pond sediment, genomic variation was analyzed using essential single copy genes 24. These 107 essential single copy genes are known to be highly conserved; examples of the same include the ribosomal genes or the conserved core genes. The top 74 genera that were selected from the pond sediment were found to be most abundant as well as to contain all 60 such essential single copy genes (Table . It was observed that the pond sediment was close to cluster B and that the genera dominating this cluster were Beutenbergia, Brachybacterium, Cellulomonas, Cellvibrio, Clavibacter, Isoptericola, Kineococcus, Kitasatospora, Kribbella, Kytococcus, Microbacterium, Micrococcus, Nocardioides, Sanguibacter, Streptomyces, Thermobispora and Xylanimonas (Fig. . These genera are found abundantly in soil specimens and are possibly enriched in the pond sediment due to presence of aromatic hydrocarbons and some other similar factors. The functional properties of the pond sediment are in accord with the presence of genes involved in aromatic compound degradation, secretion, and transport.

Statistical Analysis

Various statistical analyses were performed to prove and validate the significance of metagenomic samples with complex communities. In terms of the Pfam protein domains, the functional profiles of samples PS1 and PS2 are closely correlated to each other (R2 = 0.985) (Fig. The results also highlight the abundance of bacteria belonging to the phylum Actinobacteria and Betaproteobacteria in PS1. Actinobacteria consist of bacterial groups that span a wide array of morphology and are known to demonstrate highly variable physiological and metabolic profiles. Betaproteobacteria on the other hand have been previously reported to degrade xenobiotic compounds as well as to fix nitrogen. PS2 sample was found to be dominated by Alphaproteobacteria and Deltaproteobacteria. These phyla represent phototrophs, plant symbionts and are composed of predominantly of aerobic bacteria. The solid lines with different colors in the rarefaction curve represent ≤ 4 × 106 ORFs each across Dump Site and pond metagenomes (Fig. . The cumulative number of Pfam protein families (y-axis) is plotted as the function of the cumulative number of ORFs (x-axis). The curve for PS1, PS2, 1 Km and 5 Km becomes flat at right, which implies that the abundance of the Pfam protein family has reached a saturation level. However, in case of DS, the slope of the curve was steep, hence there is some scope to determine further the protein family richness. The demography of PS1, PS2, and SolexaDS forms one cluster (Fig. . This can be attributed to similar sequencing data (Illumina Hiseq) and a significantly correlated cluster of ortholog (COG) profiles of respective samples (R2 > 0.75). To compare the variability in Shannon diversity of the metagenomic samples, coefficient of variation was calculated for functional (COG) as well as taxonomic diversity (Fig. . The HCH gradients show a linear pattern, which is indicative of an increase in diversity as distance from DS increases. Pond sediment metagenomic samples show a higher functional diversity rather than taxonomic diversity. It might be due to the microbe-friendly conditions that are prevalent in the pond sediment, which lead to a competition wherein only the dominant genera survive. DS sequencing methodology can be held responsible for the large changes that are observed in diversity. In case of SolexaDS, a very high diversity is observed and this can be due to HCH contamination. As the HCH contaminated site is not favorable for microbes, the possibility of dominant genera affecting the diversity at such sites is expected to be less.

Metabolic Profile of Pond Sediment

Soil is the most complex community known till date and its diversity remains to be adequately decoded and understood. Even with the help of high throughput technologies (e.g., Illumina, Pyrosequencing and Sanger) and the presence of advanced soil DNA extraction kits (MoBio), delineating complete and comprehensive information regarding soil remain a tough challenge; however, it is possible to make some assumptions and generalizations regarding metabolic pathways present at the site. In the specific case of pond sediment, the most abundant subsystem includes factors implicated with amino acids and its derivatives, carbohydrates, cell division and cell cycle, cell wall and capsule, clustering-based subsystems, cofactors, vitamins, prosthetic groups, pigments, DNA metabolism, dormancy and sporulation, fatty acids, lipids, and isoprenoids, iron acquisition and metabolism, membrane transport, metabolism of aromatic compounds and several other processes (Table . These are the populated subsystems, which were assembled from analysis of the different microbes present in pond sediment. Detailed annotation of pond sediment environment helps in creation, curation, population, and exchange of subsystems within the community. Further in-depth functional study of pond sediment revealed a high abundance of microbial cation/multidrug efflux pump system, response regulator (CheY-like receiver, AAA-type ATPase, and DNA-binding domains) and the serine/threonine protein kinase enzyme. The same subsystems were prevalent in other metagenomes (DS, 1 Km, 5 Km and SolexaDS) as well. These trends are constant in all the above mentioned soil and sediment metagenomes. Efflux pumps play an important role in enabling multidrug resistance as they function as an energy-dependent active transport system for pumping out unwanted toxic substances. The structure of the pump complex in bacteria is such that the unwanted toxic substance is exported directly into the external medium rather than into the periplasm. The bacterial cells are benefited by this mechanism because once the toxic substance has been expelled into the external space it then has to pass outer membrane again in order to reenter the cell. These pumps function in conjunction with other outer membrane barriers in order to defend the bacterial cell in extreme niches. The microbial response regulator consists of an N-terminal CheY-like receiver domain and a C-terminal DNA-binding domain. These response regulators are involved in regulating the production of important virulence factors, bacteriocins, and extracellular polysaccharides. Table S7 of the supplementary information enumerates the enzymes implicated in degradation of the aromatic compounds chlorocyclohexane and chlorobenzene as obtained from the annotation profile of the pond sediment. The two enzymes found in pond sediment metagenome, maleylacetate reductase (EC 1.3.1.32) and hydroxyquinol 1, 2-dioxygenase (EC 1.13.11.37), are capable of catalyzing the conversion of hydroxyquinol to 3-oxoadipate when they act together. Enzymes of the modified ortho cleavage pathway yield chlorocatechols after degrading chlorinated aromatic compounds 38. Maleylacetate is converted to 3-oxoadipate by maleylacetate reductase in an enzymatic pathway that is common to the metabolism of several aromatic compounds. A rich diversity is observed in case of dehalogenases owing to the ability of microorganisms to degrade a wide variety of halogenated compounds to different degrees. Focusing on the lin group of genes (Table that are implicated in the degradation pathway of HCH isomers, it is well known that these genes diverge in order to catalyze several catabolic functions 39. For example, linA encodes for HCH dehydrochlorinase and linB, belonging to α/β hydrolase family, encodes for halidohydrolase, which acts as the primary enzyme of the pathway. LinB has a very broad substrate preference for halogenated compounds, whereas the substrate range for LinA is restricted to α-, γ- and δ-HCH and their corresponding PCCH products 14. linC encodes for a 2, 5-DDOL dehydrogenase and has an activity profile similar to that of the linX gene 40. In nature, microbes thrive in complex communities where they interact with other species living within the same environment. The abundance of the type VI secretion system (T6SS) in pond sediment indicates towards a complex interaction between microbes and their environment with respect to transport mechanisms 41 (Fig. . This secretion system is a multicomponent complex that has a sec-independent mechanism for the transport of the effector proteins associated with it. The T6SS system translocates its effector proteins across recipient cells in a contact dependent manner 42, 42. These well known and characterized effector proteins include enzymes that can act upon actin or degrade peptidoglycan. Many unknown effector molecules may be involved in different biotic interactions ranging from symbiosis to pathogenesis; this is an area that still needs to be researched and explored. T6SS is composed of 13 core subunits along with various additional components that are essential for the secretion process 44; together these subunits and components form a cluster of genes that is approximately 20 kb size. According to results derived from functional analysis, protein localization experiments 45, 46 and bioinformatic analysis 44, the genes involved in this secretory system can be classified into three categories. The first category is composed of membrane associated proteins, integral membrane proteins and lipoproteins (K11892, K11891, K11906, K11894, K11910, K11911, K11893, K11907 and K11896). The second category (K11903, K11904, K11900, K11895) is composed of proteins that make up the tailed bacteriophage machinery that make up the cell puncturing device 47. The last category (K11901) is composed of proteins whose function remains to be delineated. In addition to the abovementioned genes, several accessory genes (K11908, K11907, K02557, K11898, K01115, 03Y_06575) are also associated with the cluster of T6SS genes. It is noteworthy that within the T6SS machinery, genes comprising of the Virulence Associated Cluster (VAS) have both a regulatory as well as a structural significance 48. An analysis of the relative functional subsystems present in pond sediment samples have highlighted some unique functional attributes (R2 = 0.985) that are specific to the system under study. Based on our results we can deduce that the complex community present within the pond sediment fosters an environment conducive to a high level of inter-specific competition. As a direct consequence of this, the microbes present in pond sediment have developed highly evolved subsystems for defense, survival, and resource gathering.

Conclusion

This study involved a physiochemical analysis of sediment, microbial diversity, and functional capability of the microbial community inhabiting pond sediment near IPL a well-known producer of lindane. The assembly of contigs plays an important role in diversity and functional analysis. Phylum, such as the Proteobacteria, gets assembled more conveniently as compared to Verrucomicrobia or Actinobacteria due the absence of regions that limit assembly. Thus, the information retrieved by high-throughput sequencing of complex communities might be biased towards certain phyla. Hence, in order to get a complete and comprehensive profile of the microbiota of complex communities, the comparative approach of metagenomics must be supplemented with genomic analysis. This study sheds light on the comparative functional profiling between DS, 1 KM, 5 KM and pond sediment in order to correlate the microbial community with its complexity. Besides HCH, there are many other aromatic compounds that also get discharged from the factory effluents into the soil, from where it leaches into the ground water thus contaminating the entire ecosystem. According to differential binning approaches Mycobacterium, Corynebacterium, Rhodococcus, Bradyrhizobium, Sorangium, Thauera, Methylibium, Candidatus, Anaeromyxobacter, Streptomyces and Burkholderia were the most abundant genera in the pond sediment samples collected by us. Most of the genera listed above are known to degrade aromatic compounds and have advanced secretory systems. Distribution of T6SS in bacteria inhabiting environmental niches is not well documented. Metagenomic sequencing of the pond sediment dataset has helped us in understanding T6SS as well as the mechanism of export of effector molecules in these tightly regulated processes. Most of the elements present in pond sediment are involved in horizontal transfer of genetic material, owing to that, there is a very robust network for exchange of genetic information within the community. As evidence for the widespread exchange of genetic information in the pond sediment, it was seen that an analysis of the diversity of the pond sediment shows a high occurrence frequency for genera that harbor a large number of plasmids. Due to this, the microbes present at pond sediment have evolved a superior adaptability for the contaminated sites and are also actively involved in degradation of the toxic contaminants. Until now, the studies have revealed the existence of very few pathways for microbial interaction but with increased research and improved knowledge along with the discovery of novel pathways, the concept of microbial interaction will reach a new level of understanding. However, understanding complex microbial communities such as those that exist in soil requires ultra deep sequencing (50 Tbp approx) for the extraction of useful information 49. Supplementary figures and tables. Click here for additional data file.

Table 1

Sequencing Data and sequence assembly statistics of both pond sediment samples.

Sample	PS1	PS2
Total no. of reads	106,039,716	175,086,320
Percentage of reads used	50.17 %	30.25 %
Total Assembly length (bp)	59,521,231	51,183,827
K-mer	39	39
No of Contigs obtained	774,416	1,483,949
n50	158	116
Maximum length of Contig (bp)	10,214	10,340
GC Content of Contigs	62 ±8 %	±9 %

Table 2

Chemical properties and HCH Residue Analysis (mg/kg) of soil samples. pH values of the samples are neutral and normal salinity levels are represented by low concentration of cations (except Nitrogen) and Electrical Conductivity.

Samples	PS1	PS2
pH	7.72	6.92
Electrical Conductivity (dS/m)	1.10	0.73
Organic Carbon (%)	0.85	0.41
Available Phosphorus (kg/ha)	25.3	24.9
Available Potassium (kg/ha)	89	90
Available Nitrogen (kg/ha)	439	502
Salinity	Normal	Normal
αHCH (mg/kg)	0.074	0.076
βHCH (mg/kg)	0.831	0.833
ΣHCH (mg/kg)	0.905	0.909

45 in total

Review 1. Mechanisms of protein export across the bacterial outer membrane.

Authors: Maria Kostakioti; Cheryl L Newman; David G Thanassi; Christos Stathopoulos
Journal: J Bacteriol Date: 2005-07 Impact factor: 3.490

2. Identification of a conserved bacterial protein secretion system in Vibrio cholerae using the Dictyostelium host model system.

Authors: Stefan Pukatzki; Amy T Ma; Derek Sturtevant; Bryan Krastins; David Sarracino; William C Nelson; John F Heidelberg; John J Mekalanos
Journal: Proc Natl Acad Sci U S A Date: 2006-01-23 Impact factor: 11.205

3. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18 Impact factor: 9.043

Review 4. Structural similarity of tailed phages and pathogenic bacterial secretion systems.

Authors: Shuji Kanamaru
Journal: Proc Natl Acad Sci U S A Date: 2009-03-10 Impact factor: 11.205

5. Sphingobium quisquiliarum sp. nov., a hexachlorocyclohexane (HCH)-degrading bacterium isolated from an HCH-contaminated soil.

Authors: Kiran Bala; Pooja Sharma; Rup Lal
Journal: Int J Syst Evol Microbiol Date: 2009-08-03 Impact factor: 2.747

6. Organization of lin genes and IS6100 among different strains of hexachlorocyclohexane-degrading Sphingomonas paucimobilis: evidence for horizontal gene transfer.

Authors: Charu Dogra; Vishakha Raina; Rinku Pal; Mrutyunjay Suar; Sukanya Lal; Karl-Heinz Gartemann; Christof Holliger; Jan Roelof van der Meer; Rup Lal
Journal: J Bacteriol Date: 2004-04 Impact factor: 3.490

Review 7. Diversity, distribution and divergence of lin genes in hexachlorocyclohexane-degrading sphingomonads.

Authors: Rup Lal; Charu Dogra; Shweta Malhotra; Poonam Sharma; Rinku Pal
Journal: Trends Biotechnol Date: 2006-02-13 Impact factor: 19.536

Review 8. Aerobic degradation of lindane (gamma-hexachlorocyclohexane) in bacteria and its biochemical and molecular basis.

Authors: Yuji Nagata; Ryo Endo; Michihiro Ito; Yoshiyuki Ohtsubo; Masataka Tsuda
Journal: Appl Microbiol Biotechnol Date: 2007-07-19 Impact factor: 4.813

9. Comparative metagenomic analysis of soil microbial communities across three hexachlorocyclohexane contamination levels.

Authors: Naseer Sangwan; Pushp Lata; Vatsala Dwivedi; Amit Singh; Neha Niharika; Jasvinder Kaur; Shailly Anand; Jaya Malhotra; Swati Jindal; Aeshna Nigam; Devi Lal; Ankita Dua; Anjali Saxena; Nidhi Garg; Mansi Verma; Jaspreet Kaur; Udita Mukherjee; Jack A Gilbert; Scot E Dowd; Rajagopal Raman; Paramjit Khurana; Jitendra P Khurana; Rup Lal
Journal: PLoS One Date: 2012-09-28 Impact factor: 3.240

10. The Pfam protein families database.

Authors: Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal: Nucleic Acids Res Date: 2011-11-29 Impact factor: 16.971

3 in total

1. Metagenomics as a Public Health Risk Assessment Tool in a Study of Natural Creek Sediments Influenced by Agricultural and Livestock Runoff: Potential and Limitations.

Authors: Brittany Suttner; Eric R Johnston; Luis H Orellana; Luis M Rodriguez-R; Janet K Hatt; Diana Carychao; Michelle Q Carter; Michael B Cooley; Konstantinos T Konstantinidis
Journal: Appl Environ Microbiol Date: 2020-03-02 Impact factor: 4.792

2. Diazotrophic Anaeromyxobacter Isolates from Soils.

Authors: Yoko Masuda; Haruka Yamanaka; Zhen-Xing Xu; Yutaka Shiratori; Toshihiro Aono; Seigo Amachi; Keishi Senoo; Hideomi Itoh
Journal: Appl Environ Microbiol Date: 2020-08-03 Impact factor: 4.792

3. Recent Advanced Technologies for the Characterization of Xenobiotic-Degrading Microorganisms and Microbial Communities.

Authors: Sandhya Mishra; Ziqiu Lin; Shimei Pang; Wenping Zhang; Pankaj Bhatt; Shaohua Chen
Journal: Front Bioeng Biotechnol Date: 2021-02-10

3 in total