
Reconstructing Draft Genomes Using Genome Resolved Metagenomics Reveal Arsenic Metabolizing Genes and Secondary Metabolites in Fresh Water Lake in Eastern India.

Samrat Ghosh^1,2, Aditya Narayan Sarangi^1, Mayuri Mukherjee^1,2, Deeksha Singh^1,2, Madduluri Madhavi^1, Sucheta Tripathy^1,2.

Abstract

Rabindra Sarovar lake is an artificial freshwater lake in the arsenic-affected eastern region of India. In this study, using a genome-resolved metagenomics approach, we have deciphered the taxonomic diversity of the lake as well as functional insights into the gene pools specific to this region. Initially, a total of 113 Metagenome Assembled Genomes (MAGs) were recovered from the two predominant seasons, that is, rainy (n = 50) and winter (n = 63). After bin refinement and de-replication, 27 MAGs (18 from the winter season and 9 from the rainy season) were retained. Based on genome completeness and contamination, these MAGs were classified as either high quality (n = 10) or medium quality (n = 17). The 27 MAGs span 6 bacterial phyla, of which Proteobacteria, Bacteroidetes, and Cyanobacteria were the most predominant regardless of season. Functional annotation across the MAGs suggested the presence of all known types of arsenic resistance and metabolism genes. In addition, genes for important secondary metabolites such as zoocin_A, prochlorosin, and microcin were abundant in these genomes. This metagenomic study provides the first insights into the microbiome composition of the lake and the functional classification of its gene pools in two predominant seasons. The presence of arsenic metabolism and resistance genes in the recovered genomes is a sign of microbial adaptation to the arsenic contamination in this region. The presence of secondary metabolite genes in the lake microbiome has several implications, including their potential use in the pharmaceutical industry.


Keywords: MAGs (Metagenome assembled genomes); NRPS (Non-ribosomal Peptide Synthases); PKS (Polyketide Synthases); Rabindra Sarovar lake; freshwater; genome-resolved metagenomics

Year: 2021 PMID: 34220198 PMCID: PMC8221699 DOI: 10.1177/11779322211025332

Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322

Introduction

Freshwater lakes are commonly found in low-lying areas and are fed by rivers, streams, and overflow from surrounding sites. Compared with other aquatic environments such as flowing water and oceans, freshwater lakes provide a distinct environment for microbes. Microbes in such environments carry out several important processes, including sequestering inorganic compounds, decomposing organic matter, and mineralizing nitrogen. However, climate change and human activities affect lake ecosystems, especially their biogeochemical cycles. Because bacteria are highly sensitive to environmental changes, changes in the environment lead to changes in microbial eco-dynamics. Rabindra Sarovar lake is an artificial freshwater lake situated in south Kolkata (22.5121°N, 88.3637°E) in the Indian state of West Bengal. The lake area is about 192 acres, of which 73 acres are water body (Figure 1). It is bordered by Southern Avenue to the north, the Kolkata suburban railway track to the south, Dhakuria to the east, and Shyama Prasad Mukherjee Road to the west. The area lies in the humid tropical climate zone, with a maximum temperature of 40°C and a minimum temperature of 10°C. Because of its environmental and ecological significance, the lake was upgraded to the status of a national lake. However, because of serious anthropogenic pressures, its ecosystem is degrading rapidly. In the recent past, unrestricted growth of several aquatic and semi-aquatic species of grasses and sedges has been observed in the lake area. In addition, cyanobacterial blooms form regularly. It is known that 25% to 75% of cyanobacterial blooms are harmful, especially in freshwater.[7,8]

Figure 1.

Sampling site of this project including (A) a broad view, (B) Google Maps view, and (C) photographic view of the Rabindra Sarovar lake, an artificial freshwater lake located in south Kolkata, West Bengal, India. The photo was taken by Dr. Abhishek Das (ORCID: 0000-0002-1640-1154).

In freshwater lakes, Proteobacteria are found to be the most diverse group of microbes, controlling water quality and pollution.[9,10] Members of the phylum Bacteroidetes also colonize freshwater environments to a large extent. Environmental Bacteroidetes are considered to be specialized in the degradation of polysaccharides and proteins in the biosphere. Interestingly, the class Flavobacteria of the Bacteroidetes contains opportunistic human pathogens that infect immunocompromised hosts. Like Proteobacteria, microbes of the Bacteroidetes group play a crucial role in controlling water quality. Microbes of aquatic ecosystems possess a broad array of metabolic genes that play a key role in maintaining biogeochemical cycles, and these genes need to be characterized to better understand their roles. For example, arsenic resistance and metabolism-related genes are well characterized and have substantial effects on biogeochemistry. Arsenic-resistant microbes are reportedly widespread in the environment, and arsenic is a major contaminant of aquatic ecosystems worldwide. Organisms resistant to arsenic have been reported even from sites with low arsenic concentrations (<7 ppm). Lake microbial communities are also known to be fertile sources of secondary metabolites, including those made by non-ribosomal peptide synthases (NRPS) and polyketide synthases (PKS). NRPS and PKS are two families of modular mega-synthases whose products are used to treat cancer, infectious diseases, inflammation, and many other diseases.
Both mega-synthase families, which produce peptides (from aminoacyl monomers) and polyketides (from acyl-CoA monomers), are present in this ecosystem and are found in many taxonomic groups ranging from bacteria (Proteobacteria, Cyanobacteria, Actinobacteria) to fungi.[18,19] Secondary metabolites such as bacteriocins are often seen as part of an elaborate chemical defense system and are produced in all major lineages of bacteria. They are derived from short, ribosomally produced precursor proteins that comprise a conserved N-terminal leader sequence and a C-terminal core peptide. Bacteriocins were analyzed carefully in this study because they are powerful potential alternatives to antibiotics and food preservatives.[22-24] Our hypothesis is that in a highly urbanized area where arsenic and other heavy metal contamination is common, the microbial composition and dynamics should reflect the metabolism of these contaminants by the microbial population. Furthermore, seasonal changes in microbial composition are a good indicator of shifts in microbial dynamics driven by physiological and environmental conditions. In this study, we discuss the genomic features of microbial genomes reconstructed from the lake ecosystem using a genome-resolved metagenomics approach. Screening for arsenic metabolism and resistance genes among the microbes helps to understand their impact on local biogeochemistry and levels of contamination. Our analysis reveals a strong presence of well-characterized arsenic resistance and metabolism-related genes among the metagenome assembled genomes (MAGs), which is a possible sign of arsenic stabilization and neutralization by the identified microbes in this region. Genes involved in secondary metabolite production could be explored for industrial applications and potential use as environmental bio-indicators.
Using the current approach, we have identified several important classes of secondary metabolites including NRPS, PKS, and bacteriocins across the MAGs, indicating the lake is a rich source of natural products.

Materials and Methods

Sampling, DNA isolation, and sequencing

Two liters of surface water were collected from the middle of the lake and transported to the lab for further processing. Samples were collected six months apart, during the Winter season (January 2015) and the Rainy season (July 2015). The temperature and pH of the lake water were measured at the same time. Water was filtered using a Millipore filtration unit attached to a vacuum pump. Each water sample (2 L) was immediately filtered through a 1.2 μm pore size membrane filter to remove pollutants, and the filtrate was then passed through a 0.2 μm pore size membrane filter to capture the cells. The filter membranes were cut into pieces and transferred to a 15 ml Falcon tube containing 1 ml lysis buffer. The tubes were stored at -20°C over the weekend before DNA extraction. DNA was extracted following the protocol of Oh et al. Briefly, cells were treated with lysozyme and incubated at 37°C for 30 minutes. Sodium dodecyl sulfate (SDS) was added, and the sample was incubated at 55°C in a water bath for 4 hours. Phenol-chloroform extraction was carried out, and the upper phase containing DNA was precipitated with cold 100% ethanol. Precipitated DNA was stored at -20°C overnight. Subsequently, the DNA pellet was washed with 75% ethanol, air-dried, and dissolved in 20 μl autoclaved Milli-Q water. Approximately 3 μg of genomic DNA was sheared using Covaris tubes (#SN001887) to obtain fragments of 300 to 500 bp. The size distribution was checked by running an aliquot of the sample on a High Sensitivity Bioanalyzer Chip (Agilent, #5067-4626). Sample cleanup was done using the HighPrep PCR clean-up system (MagBio, #AC-60050). A series of enzymatic reactions involving overhang addition and adaptor ligation was carried out using the NEXTflex DNA Sequencing kit.
After ligation, ~300–600 bp fragments were size-selected on a 2% agarose gel and cleaned using a MinElute column (QIAGEN, #28604 or 28606). The library thus prepared was quantified using a Qubit fluorometer (Invitrogen, Cat # Q32854), and its quality was validated by running an aliquot on a High Sensitivity Bioanalyzer Chip (Agilent, #5067-4626). The QC-qualified metagenomic DNA library was sequenced (150 bp paired-end) on the Illumina MiSeq platform. Library preparation and sequencing were performed at the Next-Generation Sequencing (NGS) facility of Genotypic Technology Pvt Ltd, Karnataka, India.

Quality control of reads

The quality of the raw paired-end reads from the two samples (Winter and Rainy) was checked using FastQC. Adapters, poor-quality reads (Q < 30), and reads containing ambiguous bases were removed using fastp (v.0.19.1) with default parameters, except for --qualified_quality_phred and --n_base_limit, which were set to 30 and 0, respectively.
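The per-read filtering rule implied by these settings can be illustrated with a simplified Python re-implementation. This is a sketch, not fastp itself: it assumes Phred+33 quality encoding, uses what we understand to be fastp's defaults for the two parameters the authors did not change (an unqualified-base limit of 40% and a minimum length of 15), and assumes that a pair is dropped when either mate fails. Adapter trimming is not modelled.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def read_passes(
    seq: str,
    quality: str,
    qualified_quality_phred: int = 30,    # value set by the authors
    unqualified_percent_limit: int = 40,  # assumed fastp default
    n_base_limit: int = 0,                # value set by the authors
    length_required: int = 15,            # assumed fastp default
) -> bool:
    """Return True if a single read survives this simplified filter."""
    if len(seq) < length_required:
        return False
    # More Ns than allowed discards the read (limit 0 means no Ns at all).
    if seq.upper().count("N") > n_base_limit:
        return False
    scores = phred_scores(quality)
    unqualified = sum(1 for q in scores if q < qualified_quality_phred)
    # Discard when the share of low-quality bases exceeds the percentage limit.
    if unqualified * 100 > unqualified_percent_limit * len(scores):
        return False
    return True


def pair_passes(r1: tuple[str, str], r2: tuple[str, str], **kwargs) -> bool:
    """Keep a pair only if both mates pass (assumed paired-end behaviour)."""
    return read_passes(*r1, **kwargs) and read_passes(*r2, **kwargs)


if __name__ == "__main__":
    good = ("ACGT" * 5, "I" * 20)             # 'I' encodes Q40 in Phred+33
    with_n = ("ACGN" + "ACGT" * 4, "I" * 20)  # one ambiguous base
    low_q = ("ACGT" * 5, "#" * 20)            # '#' encodes Q2
    print(read_passes(*good), read_passes(*with_n), read_passes(*low_q))
    print(pair_passes(good, low_q))
```

With `n_base_limit` set to 0, a single N is enough to remove a read, which is why the text describes the filter as discarding reads containing ambiguous bases.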

Statistical analysis of metagenome data sets to check seasonal effects on lake microbes

After the initial quality check, the unassembled reads were uploaded to the MG-RAST server for taxonomic and functional profiling. To assess seasonal variation in lake microbes at the taxonomic and functional levels, statistical analysis was performed using the MG-RAST server followed by STAMP (Statistical Analysis of Metagenomic Profiles) v2.1.3. Statistically significant taxonomic and functional differences between the Winter and Rainy season samples were identified using Fisher's exact test with Benjamini-Hochberg FDR correction, and confidence intervals (CIs) were calculated with the Newcombe–Wilson method.
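The two-sample test described above can be sketched in plain Python: each feature (a taxon or functional category) becomes a 2x2 table of reads assigned to that feature versus all other reads in each season, is tested with a two-sided Fisher exact test, and the resulting p-values are corrected with the Benjamini-Hochberg procedure. This is an illustrative re-implementation, not STAMP's code; STAMP may define two-sided p-values or handle ties differently, the Newcombe–Wilson interval is not shown, and the counts in the usage block are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are seasons (e.g. Winter, Rainy); columns are reads assigned to one
    feature vs. all other reads. Exact integer binomials are used, so this is
    only practical for modest counts.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical read counts (feature, other) for Winter then Rainy;
    # these are NOT values from this study.
    tables = {
        "feature_A": (30, 970, 60, 940),
        "feature_B": (50, 950, 52, 948),
    }
    raw = {name: fisher_exact_two_sided(*t) for name, t in tables.items()}
    adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
    for name in tables:
        print(f"{name}: p = {raw[name]:.3g}, BH-adjusted = {adjusted[name]:.3g}")
```

For tables built from millions of reads, a library routine would be preferable to this exact-integer version, which is written for clarity rather than speed.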

Read assembly, binning and evaluation

In parallel, the Q30 metagenomic reads were assembled using metaSPAdes (v3.12).[30,31] The assembled scaffolds were binned using MetaBAT (v.2.12.1). All parameters of both tools were kept at their defaults. CheckM (v.1.0.11) was used to estimate the completeness and contamination of the recovered genome bins.
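CheckM estimates completeness and contamination from lineage-specific single-copy marker genes. The sketch below shows the underlying idea in a deliberately simplified form: each expected marker is treated as its own set (CheckM additionally groups co-located markers), completeness is the share of expected markers found at least once, and contamination is the number of extra copies divided by the number of expected markers. The marker names in the example are hypothetical placeholders.

```python
from collections import Counter


def marker_completeness_contamination(
    marker_hits: list[str], expected_markers: set[str]
) -> tuple[float, float]:
    """Simplified completeness and contamination (%) from marker-gene hits.

    marker_hits lists one entry per detected copy of a marker in the bin.
    Hits to markers outside the expected set are ignored.
    """
    counts = Counter(m for m in marker_hits if m in expected_markers)
    n = len(expected_markers)
    present = sum(1 for m in expected_markers if counts[m] >= 1)
    # Every copy beyond the first suggests a contaminating genome.
    extra_copies = sum(max(0, counts[m] - 1) for m in expected_markers)
    return 100.0 * present / n, 100.0 * extra_copies / n


if __name__ == "__main__":
    markers = {f"M{i}" for i in range(10)}  # hypothetical marker set
    hits = ["M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7", "M7"]
    print(marker_completeness_contamination(hits, markers))
```

Here eight of ten markers are found and one is duplicated, so the bin would be reported as 80% complete with 10% contamination.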

Bin refinement, de-replication, evaluation, coverage calculation, taxonomic assignment, 16S rRNA gene detection, and functional annotation

RefineM (v.0.0.3) was used to refine each bin. Duplicate bins were removed using dRep. The de-replicated refined bins were re-evaluated using CheckM (v.1.0.11) and selected according to the stated genome quality standards. The coverage of each bin was calculated using the coverage and profile functions of CheckM (v.1.0.11). Taxonomy was assigned to the final improved bins using CheckM (v.1.0.11), MiGA, and GTDBtk v.0.2.1. After taxonomy assignment, the 16S rRNA genes of each de-replicated genome were detected using the ssu_finder function of CheckM (v.1.0.11). The taxonomically assigned refined bins were annotated using both Prokka v1.13 and RAST (Rapid Annotation using Subsystem Technology). COG annotations were carried out against the STRING v10.5 database (only top hits were retained). Pathways were annotated using GhostKOALA.
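De-replication groups near-identical bins and keeps one representative per group. The sketch below is a simplified greedy version of that idea, not dRep's algorithm: bins are ranked by a quality score, and each bin joins the first already-chosen representative whose pairwise ANI meets a threshold, otherwise it becomes a new representative. The score (completeness minus five times contamination), the 99% default threshold, and the example bins and ANI values are assumptions for illustration; dRep's own scoring combines several genome metrics and uses hierarchical clustering.

```python
def score(completeness: float, contamination: float) -> float:
    # Illustrative quality score (assumption): contamination penalised 5x.
    return completeness - 5 * contamination


def dereplicate(
    genomes: dict[str, tuple[float, float]],
    ani: dict[frozenset, float],
    threshold: float = 99.0,
) -> dict[str, list[str]]:
    """Greedy representative-based de-replication.

    genomes: bin name -> (completeness %, contamination %)
    ani: frozenset({a, b}) -> pairwise ANI in percent (missing pairs = unrelated)
    Returns representative -> members, with the representative listed first.
    """
    ranked = sorted(genomes, key=lambda g: score(*genomes[g]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for g in ranked:
        for rep in clusters:
            if ani.get(frozenset((g, rep)), 0.0) >= threshold:
                clusters[rep].append(g)
                break
        else:
            # No sufficiently similar representative: start a new cluster.
            clusters[g] = [g]
    return clusters


if __name__ == "__main__":
    bins = {"binA": (95.0, 1.0), "binB": (80.0, 0.5), "binC": (70.0, 2.0)}
    pairwise = {frozenset(("binA", "binB")): 99.5, frozenset(("binA", "binC")): 85.0}
    print(dereplicate(bins, pairwise))
```

Because higher-scoring bins are visited first, the representative of each cluster is always its best-scoring member under the chosen score.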

Average nucleotide identity and average amino acid identity calculation of the refined bins

All pairwise ANI (Average Nucleotide Identity) and AAI (Average Amino Acid Identity) values of the selected bins were calculated with the Python packages “pyani” and “CompareM”, respectively. The results were visualized with the R package “pheatmap” in RStudio.
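pyani computes ANI from genome alignments. As a lighter-weight illustration of what an ANI value summarizes, the sketch below swaps in a different, alignment-free technique: the Mash-style estimate, which converts the Jaccard index j of two k-mer sets into a distance D = -(1/k) ln(2j / (1 + j)) and reports ANI as 1 - D. It uses exact forward-strand k-mer sets rather than MinHash sketches, so it is only a rough approximation and not a substitute for the pyani values used in this study.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All forward-strand k-mers of a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_style_ani(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Estimate ANI (%) from the k-mer Jaccard index via the Mash distance."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 0.0  # no shared k-mers: distance undefined, report 0
    distance = -math.log(2 * j / (1 + j)) / k
    return 100.0 * max(0.0, 1.0 - distance)


if __name__ == "__main__":
    ref = "ATGCGTACGTTAGCCGATCGATTGCAGCTAGGCTAACGTTGACCTGAAGTCGCATGACT"
    mutant = ref[:30] + "A" + ref[31:]  # one substitution
    print(round(mash_style_ani(ref, ref, k=11), 2))
    print(round(mash_style_ani(ref, mutant, k=11), 2))
```

On genome-scale sequences a single substitution barely moves the estimate, whereas on this toy sequence it removes a large fraction of the shared k-mers; the 95% species threshold discussed in the Results refers to genome-wide ANI.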

Construction of phylogenomic tree to infer taxonomy of the refined bins

The taxonomy of the refined bins (n = 27) was further confirmed by placing them in a phylogenomic context. For this purpose, we retrieved 827 representative complete bacterial genomes from RefSeq. The phylogenomic tree was constructed with the UBCG v3.0 pipeline using default parameters and was edited in iTOL v4.
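UBCG infers the tree from a concatenated alignment of core marker genes (92 genes, per the Figure 8 caption). The step that turns per-gene alignments into one supermatrix can be sketched as below. Padding a genome that lacks a gene with gap characters is a common convention assumed here, not a documented UBCG detail; gene and genome names are hypothetical, and gene alignment and tree inference are outside the scope of the sketch.

```python
def concatenate_alignments(
    gene_alignments: dict[str, dict[str, str]], genomes: list[str]
) -> dict[str, str]:
    """Build a concatenated (supermatrix) alignment from per-gene alignments.

    gene_alignments: gene -> {genome -> aligned sequence}; all sequences in
    one gene alignment must share a length. Genomes lacking a gene (common
    for incomplete MAGs) are padded with gaps so all rows stay equal length.
    """
    rows: dict[str, list[str]] = {g: [] for g in genomes}
    for gene in sorted(gene_alignments):  # fixed gene order for reproducibility
        aln = gene_alignments[gene]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"unequal or missing sequences in alignment for {gene}")
        width = lengths.pop()
        for g in genomes:
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}


if __name__ == "__main__":
    alignments = {
        "geneA": {"mag_1": "AC-T", "ref_1": "ACGT"},
        "geneB": {"mag_1": "GG"},  # ref_1 lacks geneB
    }
    for name, row in concatenate_alignments(alignments, ["mag_1", "ref_1"]).items():
        print(name, row)
```

Gap padding keeps incomplete MAGs in the matrix instead of discarding them, at the cost of more missing data in their rows.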

Comparison of lake water sample bins

We compared the genome bins with one another to gain deeper insight into their composition. Carbohydrate-active enzyme (CAZyme) families were profiled using stand-alone dbCAN against the CAZy database. In addition, bacteriocins, CRISPRs, prophages, and insertion sequences (ISs) were screened using BAGEL4 (a web-based resource), the CRISPRFinder stand-alone tool, the PHASTER web server, and the ISsaga web platform, respectively. The presence of arsenic resistance genes among the genome bins was confirmed using blastp against publicly available databases of known arsenic resistance genes. Transporter-related genes were identified using TCDB (The Transporter Classification Database). Secondary metabolites were identified using antiSMASH (detection strictness left at the default, “relaxed”, and extra features set to “all on”) and NAPDOS.
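Confirming arsenic genes with blastp typically means filtering tabular hits and keeping the best reference match per query protein. The sketch below parses the twelve default columns of BLAST+ tabular output (-outfmt 6) and applies identity and e-value cut-offs. The thresholds (50% identity, e-value 1e-5) and the reference sequence names are hypothetical: the text does not report the cut-offs the authors used.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(
    tsv_text: str, min_pident: float = 50.0, max_evalue: float = 1e-5
) -> dict[str, tuple[str, float]]:
    """Return query -> (best subject, bitscore) after threshold filtering."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(OUTFMT6, row))
        # Hypothetical cut-offs; tune to the reference database in use.
        if float(rec["pident"]) < min_pident or float(rec["evalue"]) > max_evalue:
            continue
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return best


if __name__ == "__main__":
    example = (
        "bin1_00010\tarsM_ref1\t72.5\t250\t60\t2\t1\t250\t1\t252\t1e-80\t300\n"
        "bin1_00010\tacr3_ref2\t40.0\t200\t100\t5\t1\t200\t1\t210\t1e-10\t90\n"
        "bin1_00020\taioA_ref3\t55.0\t400\t150\t3\t1\t400\t1\t405\t0.01\t60\n"
    )
    print(best_hits(example))
```

Keeping only the highest-bitscore hit per protein avoids counting one gene twice when it matches several related reference families.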

Data availability

Merged paired-end reads with annotations are publicly available on the MG-RAST server under the following IDs: mgm4875225.3 (Rainy season) and mgm4875226.3 (Winter season). The assembled data, the recovered genome bins (n = 27) with their annotations (protein files), and the phylogenetic trees are available at https://ndownloader.figshare.com/articles/8058902/versions/6. The remaining bins (n = 86) are available at https://ndownloader.figshare.com/articles/8796320/versions/3; because of their poor quality, they were not used for downstream analysis and are not discussed in this paper. The raw fastq files were deposited in the NCBI SRA database under accession numbers SRR8209851 and SRR8209852.

Results

Environmental conditions of sampling site and sequencing of metagenome samples

Differences in environmental conditions (temperature and pH) at the sampling site were small. At the time of sample collection, the lake temperature was 21°C in the Winter season and 31°C in the Rainy season, and the pH was 8.5 and 8, respectively. Whole-genome shotgun sequencing of the Winter and Rainy season samples (microbial DNA from surface water) on the Illumina platform produced 53,902,467 × 2 and 39,523,695 × 2 reads, respectively. After the initial quality check (QC), 37,659,495 × 2 (Winter) and 24,195,845 × 2 (Rainy) high-quality (Q30) reads with a read length of 151 bp were obtained (Table 1).
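The share of reads that survived QC is not stated explicitly, but it follows directly from the reported counts: roughly 70% of read pairs were retained for the Winter sample and 61% for the Rainy sample. The arithmetic is shown below using only the numbers from Table 1.

```python
def retention_percent(raw_pairs: int, q30_pairs: int) -> float:
    """Percentage of read pairs surviving QC."""
    return 100.0 * q30_pairs / raw_pairs


# Read-pair counts as reported in Table 1 (raw, Q30).
TABLE_1 = {
    "Winter": (53_902_467, 37_659_495),
    "Rainy": (39_523_695, 24_195_845),
}

for season, (raw, q30) in TABLE_1.items():
    print(f"{season}: {retention_percent(raw, q30):.2f}% retained")
# Winter: 69.87% retained
# Rainy: 61.22% retained
```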

Table 1.

Pre- and post-QC read counts and assembly size after assembly with metaSPAdes.

Sample	Raw reads (file size, GB)	Q30 reads (file size, GB)	Assembly (file size, GB)
SRR8209851/WINTER	53,902,467 × 2 (38)	37,659,495 × 2 (24)	2,230,279 scaffolds (1.2)
SRR8209852/RAINY	39,523,695 × 2 (30)	24,195,845 × 2 (15.2)	2,738,321 scaffolds (1.2)


Comparison between the winter and rainy season samples suggests a very subtle difference in microbial composition

To compare seasonal variation in the lake microbial community at the taxonomic and functional levels, high-quality (Q30) reads from the Rainy and Winter seasons were uploaded to the MG-RAST server, and the MG-RAST outputs were then tested statistically using STAMP. Bacteria were the dominant domain in both seasons (Figure 2A). Among the bacterial phyla, Proteobacteria had the highest proportion in both seasons, followed by Cyanobacteria, Actinobacteria, Bacteroidetes, Planctomycetes, Verrucomicrobia, and Firmicutes (Figure 2B). Proteobacteria, Actinobacteria, Planctomycetes, Verrucomicrobia, and Firmicutes were significantly enriched in the Rainy season, whereas Cyanobacteria and Bacteroidetes were enriched in the Winter season (Figure 2B). Genera containing pathogenic bacteria, such as Bordetella, Microcystis, Burkholderia, Mycobacterium, Aeromonas, Vibrio, and Escherichia, were found in both seasons.

Figure 2.

Taxonomic comparison of Winter season (yellow) and Rainy season (blue) microbial communities by RefSeq (A) at the domain level and (B) at the phylum level using STAMP. STAMP indicates Statistical Analysis of Metagenomic Profiles.

Functional annotation at level 1 revealed that genes related to carbohydrates, amino acids, and their derivatives were highly dominant in the lake microbial communities and were slightly more abundant in the Rainy season than in the Winter season (Figure 3A). Interestingly, mobile genetic elements such as phages, prophages, and plasmids were enriched in the Rainy season compared with the Winter season (Figure 3A). Annotation at levels 2 and 3 also showed enrichment of mobile genetic elements in the Rainy season. In addition to mobile genetic elements, antibiotic resistance genes were also enriched in the Rainy season (Figure 3B).

Figure 3.

Functional group comparison of Winter season (yellow) and Rainy season (blue) microbial communities by SEED subsystems (A) at level 1 and (B) level 2 using STAMP. STAMP indicates Statistical Analysis of Metagenomic Profiles.

Bacteria remain the predominant domain in the MAGs, with arsenic methylation genes in abundance

Assembly of the Q30 reads generated a total of 2,230,279 and 2,738,321 scaffolds for the Winter and Rainy season samples, respectively (Table 1). To reconstruct high-quality individual genomes from the assembled metagenomes, we used the MetaBAT2 binning tool, which produced a total of 113 genome bins (a.k.a. Metagenome Assembled Genomes or MAGs). Of these 113 bins, 63 were from the Winter season and the remaining 50 from the Rainy season sample (supplementary material: CheckM of all bins). After refinement, de-replication, and assessment, 27 of the 113 bins (18 from Winter and 9 from Rainy season) were selected according to defined genome standards: high-quality draft (>90% complete, <5% contamination), medium-quality draft (>50% complete, <10% contamination), or low-quality draft (<50% complete, <10% contamination). Based on these standards, 10 MAGs were classified as high-quality and 17 as medium-quality drafts (Table 2). Other genome quality parameters, such as N50, L50, and GC%, were calculated for these MAGs and are given in Table 2 and in the supplementary material (Quast). Information on the computed coverage of the reconstructed bins and on bins with identified 16S rRNA genes is provided in the supplementary material (Coverage of all bins, Refined bins coverage, and SSU information).
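The quality categories and assembly statistics reported in Table 2 can be sketched as small functions. `draft_quality` applies only the completeness and contamination thresholds stated above and ignores any other criteria a full reporting standard may impose; the label "unclassified" for bins with 10% or more contamination is a hypothetical convention. `n50_l50` and `gc_percent` follow the usual definitions (GC computed over unambiguous bases only, an assumption) and are not QUAST's code.

```python
def draft_quality(completeness: float, contamination: float) -> str:
    """Assign a draft category from the completeness/contamination thresholds."""
    if contamination >= 10:
        return "unclassified"  # outside all three stated categories
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness > 50:
        return "medium"
    return "low"


def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """N50: length of the contig at which the cumulative length (longest
    first) reaches half the assembly; L50: number of contigs needed."""
    total = sum(contig_lengths)
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("empty assembly")


def gc_percent(seq: str) -> float:
    """GC content over unambiguous bases only (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # Completeness/contamination values copied from Table 2.
    for name, comp, cont in [
        ("bin_winter_42_filtered", 99.15, 0.45),
        ("bin_rainy_3_filtered", 88.81, 0.16),
        ("bin_rainy_49_filtered", 93.6, 4.48),
    ]:
        print(name, draft_quality(comp, cont))
    print(n50_l50([5, 4, 3, 2, 1]))  # toy contig lengths
```

Applied to the three Table 2 rows above, the thresholds reproduce the reported labels (high, medium, high); bin_rainy_3_filtered is medium quality because its completeness falls just below 90%.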

Table 2.

Characteristics of recovered genome bins based on CheckM and QUAST analysis.

Bin name	Completeness (%)	Contamination (%)	Draft genome quality	N50	L50	No. of scaffolds	Size (Mb)	GC %
bin_winter_42_filtered	99.15	0.45	High	81344	9	40	2.25	53.71
bin_winter_49_filtered	92.61	0	High	23814	37	189	3.10	51.38
bin_winter_31_filtered	92.35	0.55	High	31412	21	128	2.44	57.68
bin_winter_40_filtered	91.37	1.08	High	20066	40	181	2.59	68.92
bin_winter_6_filtered	90.8	0.47	High	13054	86	348	3.49	41.95
bin_winter_53_filtered	90.46	3.94	High	20859	58	295	3.91	46.1
bin_winter_34_filtered	90.32	1.08	High	14315	64	264	2.86	54.28
bin_winter_45_filtered	81.69	1.11	Medium	9566	106	427	3.23	43.38
bin_winter_59_filtered	80.41	3.04	Medium	7842	69	256	1.64	44.27
bin_winter_55_filtered	76.53	4.54	Medium	6044	245	817	4.47	52.08
bin_winter_33_filtered	68.38	0.6	Medium	6163	118	391	2.19	40.34
bin_winter_46_filtered	64.42	1.15	Medium	12520	60	207	2.22	68.56
bin_winter_7_filtered	64.2	0.16	Medium	7041	56	196	1.14	59.48
bin_winter_4_filtered	64	0.08	Medium	5225	82	255	1.26	35.58
bin_winter_22_filtered	63.79	1.72	Medium	71625	9	39	1.98	71.05
bin_winter_1_filtered	60.08	0.48	Medium	5052	127	401	1.89	40.18
bin_winter_23_filtered	54.9	3.15	Medium	4098	145	416	1.70	41.51
bin_winter_24_filtered	51.27	0	Medium	4576	162	494	2.20	43.15
bin_rainy_49_filtered	93.6	4.48	High	11199	68	289	2.46	69.22
bin_rainy_34_filtered	91.85	0.41	High	9550	158	579	4.39	56.22
bin_rainy_48_filtered	91.52	0.88	High	8316	78	276	1.88	53.96
bin_rainy_3_filtered	88.81	0.16	Medium	13245	57	255	2.49	68.96
bin_rainy_23_filtered	78.89	2.1	Medium	7634	57	212	1.45	47.54
bin_rainy_9_filtered	57.05	0	Medium	12329	64	450	2.37	72.97
bin_rainy_27_filtered	54	0.48	Medium	4717	74	232	1.07	57.54
bin_rainy_35_filtered	53.45	0	Medium	13596	63	242	2.66	70.77
bin_rainy_7_filtered	50.85	1.08	Medium	4694	146	450	2.03	66.25

Taxonomic ranks of all 27 MAGs were determined using CheckM, MiGA, and GTDB-tk. The MAGs were inferred to be bacteria spanning 7 phyla, that is, Proteobacteria, Cyanobacteria, Bacteroidetes, Verrucomicrobia, Planctomycetes, Gemmatimonadetes, and Chlamydia (Table 3 and supplementary material: Refined bins taxonomy). Among these 27 MAGs, 10 represent Proteobacteria, 8 Bacteroidetes, 4 Cyanobacteria, 2 Planctomycetes, and 1 each Verrucomicrobia, Gemmatimonadetes, and Chlamydia (Table 3). We further investigated whether MAGs of the same phylum belonged to the same species, using average nucleotide identity (ANI) and average amino acid identity (AAI) analysis. The ANI and AAI values of bin_winter_40_filtered vs bin_rainy_3_filtered and of bin_winter_42_filtered vs bin_rainy_48_filtered were greater than 95%. ANI and AAI values above 95% are used as a threshold for assigning genomes to the same species.[57,58]

Table 3.

Taxonomy of recovered genome bins based on CheckM, MiGA, and GTDB-tk analysis.

Bin name	CheckM taxon	CheckM taxon level	MiGA taxon	MiGA taxon level	GTDB-tk taxon	GTDB-tk taxon level
bin_winter_42_filtered	Betaproteobacteria	Class	Burkholderiaceae	Family	Burkholderiaceae	Family
bin_winter_49_filtered	Bacteroidetes	Phylum	Flavobacteriia	Class	Chitinophagales	Order
bin_winter_31_filtered	Bacteria	Kingdom	Bacteroidetes	Phylum	Kapabacteriaceae	Family
bin_winter_40_filtered	Alphaproteobacteria	Class	Caulobacteraceae	Family	Phenylobacterium	Genus
bin_winter_6_filtered	Cyanobacteria	Phylum	Cyanobacteria	Phylum	Pseudanabaena	Genus
bin_winter_53_filtered	Cytophagales	Order	Cytophagia	Class	Cyclobacteriaceae	Family
bin_winter_34_filtered	Bacteria	Kingdom	Flavobacteria	Class	Flavobacteriales	Order
bin_winter_45_filtered	Bacteria	Kingdom	Verrucomicrobia	Phylum	Kapabacteriales	Order
bin_winter_59_filtered	Bacteria	Kingdom	Chlamydia	Class	Parachlamydiales	Order
bin_winter_55_filtered	Bacteria	Kingdom	Planctomycetes	Class	Pirellulaceae	Family
bin_winter_33_filtered	Bacteroidetes	Phylum	Chitinophagaceae	Family	Chitinophagaceae	Family
bin_winter_46_filtered	Bacteria	Kingdom	Bacteria	Kingdom	Pirellulales	Order
bin_winter_7_filtered	Proteobacteria	Phylum	Betaproteobacteria	Class	Burkholderiaceae	Family
bin_winter_4_filtered	Bacteria	Kingdom	Betaproteobacteria	Class	Caenarcaniphilales	Order
bin_winter_22_filtered	Bacteria	Kingdom	Rhodanobacteraceae	Family	Ahniellaceae	Family
bin_winter_1_filtered	Bacteroidetes	Phylum	Sphingobacteriia	Class	Bacteroidia	Class
bin_winter_23_filtered	Bacteroidetes	Phylum	Chitinophagaceae	Family	Chitinophagales	Order
bin_winter_24_filtered	Cyanobacteria	Phylum	Synechococcales	Order	Microcystis	Genus
bin_rainy_49_filtered	Cyanobacteria	Phylum	Synechococcaceae	Family	Cyanobiaceae	Family
bin_rainy_34_filtered	Cyanobacteria	Phylum	Cyanobacteria	Phylum	Prochlorotrichaceae	Family
bin_rainy_48_filtered	Betaproteobacteria	Class	Burkholderiaceae	Family	Burkholderiaceae	Family
bin_rainy_3_filtered	Alphaproteobacteria	Class	Caulobacteraceae	Family	Phenylobacterium	Genus
bin_rainy_23_filtered	Burkholderiales	Order	Burkholderiaceae	Family	Polynucleobacter	Genus
bin_rainy_9_filtered	Bacteria	Kingdom	Gemmatimonadetes	Class	Gemmatimonadaceae	Family
bin_rainy_27_filtered	Betaproteobacteria	Class	Betaproteobacteria	Class	Burkholderiaceae	Family
bin_rainy_35_filtered	Bacteria	Kingdom	Acetobacteraceae	Family	Roseomonas	Genus
bin_rainy_7_filtered	Bacteria	Kingdom	Flavobacteria	Class	Flavobacteriales	Order

Initial annotation of each MAG was carried out using Prokka, and all predicted genes of the 27 bins were cataloged in the supplementary material. Protein-coding genes involved in the arsenic biogeochemical cycle were identified, and the annotations were confirmed by running BLAST against publicly available databases of known arsenic-related genes. Nine genes, that is, acr3, aioA, arrA, arsB, arsC_glut, arsC_thio, arsD, arsM, and arxA, are generally considered important players in arsenic resistance and metabolism in microbes.[59,14] Eight of these nine genes were found across the recovered MAGs/bins (n = 27); the only missing gene was arsD. Based on existing knowledge of the functions of arsenic resistance and metabolism-related genes in microbes,[14,59] we constructed a hypothetical map of arsenic metabolism (Figure 4). Notably, irrespective of bin quality, the arsM gene was predicted at a high frequency in the genome bins (Figure 5). Arsenic detoxification genes (acr3, arsB, arsC_glut, arsC_thio) were less abundant than arsenic metabolism genes (aioA, arrA, arsM, and arxA) (Figure 5).
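The tally behind statements such as "eight of nine genes were found" and "detoxification genes were less abundant than metabolism genes" can be sketched as follows. The gene groupings are taken from the text above (arsD belongs to neither group there); the per-bin hits in the example are hypothetical and are not the study's results.

```python
from collections import Counter

# Gene list and groupings as described in the text.
ARSENIC_GENES = ["acr3", "aioA", "arrA", "arsB", "arsC_glut",
                 "arsC_thio", "arsD", "arsM", "arxA"]
DETOXIFICATION = {"acr3", "arsB", "arsC_glut", "arsC_thio"}
METABOLISM = {"aioA", "arrA", "arsM", "arxA"}


def summarize_arsenic_hits(hits_by_bin: dict[str, list[str]]):
    """Tally gene copies across bins, list genes never detected, and
    total the detoxification and metabolism groups."""
    counts = Counter(g for genes in hits_by_bin.values() for g in genes)
    missing = [g for g in ARSENIC_GENES if counts[g] == 0]
    detox_total = sum(counts[g] for g in DETOXIFICATION)
    metab_total = sum(counts[g] for g in METABOLISM)
    return counts, missing, detox_total, metab_total


if __name__ == "__main__":
    example = {  # hypothetical annotations
        "bin_x": ["arsM", "arsM", "acr3"],
        "bin_y": ["arsM", "aioA", "arsB", "arrA", "arsC_glut", "arsC_thio", "arxA"],
    }
    counts, missing, detox, metab = summarize_arsenic_hits(example)
    print("missing:", missing, "| detox:", detox, "| metabolism:", metab)
```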

Figure 4.

A schematic overview of common pathways involving arsenic metabolism and resistance-related genes in lake microbes. Genes that were detected are shown in bold black.

Figure 5.

Distribution of arsenic resistance and metabolism-related genes across 27 MAGs. MAGs indicates Metagenome Assembled Genomes.

Transporter genes were annotated using data from TCDB (Transporter Classification Database) and are cataloged in the supplementary material. The distributions of functional COG and RAST categories are illustrated as heatmaps (Figures 6 and 7), and the corresponding values are provided in a supplementary table.

Figure 6.

COG frequency heat map of recovered genome bins based on hierarchical clustering. The horizontal axis shows COG categories and the vertical axis represents bins.

Figure 7.

Heat map based on hierarchical clustering illustrates the distribution of the RAST categories. The horizontal axis shows RAST categories and the vertical axis represents bins. RAST indicates Rapid Annotation using Subsystem Technology.

Pathway annotation using the GhostKOALA platform identified 80 complete KEGG pathway modules across the 27 MAGs combined. A module is considered complete when all of the essential components (gene blocks) needed to complete its metabolic cycle are present. Among the 80 modules, those related to central carbohydrate metabolism (n = 13), ATP synthesis (n = 8), cofactor and vitamin metabolism (n = 6), and carbon fixation (n = 6) were predominant. The central carbohydrate metabolism module M00005 (PRPP biosynthesis, ribose 5P => PRPP) and the fatty acid metabolism module M00086 (beta-oxidation, acyl-CoA synthesis) were present in most of the MAGs (n = 20). Complete sulfur metabolism modules, such as M00595 (thiosulfate oxidation by SOX complex, thiosulfate => sulfate) and M00176 (assimilatory sulfate reduction, sulfate => H2S), were also found in the lake. Secondary metabolism-related modules, that is, M00096 (C5 isoprenoid biosynthesis, non-mevalonate pathway), M00364 (C10-C20 isoprenoid biosynthesis, bacteria), and M00793 (dTDP-L-rhamnose biosynthesis), were also identified. Genes related to ABC transporters and two-component systems, under the “environmental information processing” category, were highly abundant in all bins (n = 27). Notably, under the “xenobiotics biodegradation and metabolism” category, genes linked to benzoate degradation were comparatively abundant.
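The completeness rule for a pathway module can be sketched as a check over blocks. In this simplified model, loosely mirroring how KEGG module definitions list steps, alternative enzymes, and multi-subunit complexes, a module is a list of blocks, each block lists alternatives, and each alternative is a set of KOs that must all be present. The module and KO identifiers below are hypothetical placeholders rather than real KEGG definitions, and GhostKOALA's own evaluation may differ in detail.

```python
# A module is a list of blocks (steps); each block is a list of alternatives;
# each alternative is a set of KOs that must all be present together
# (for example, the subunits of one enzyme complex).
Module = list[list[set[str]]]


def missing_blocks(module: Module, kos: set[str]) -> list[int]:
    """Indices of blocks with no fully present alternative."""
    return [i for i, block in enumerate(module)
            if not any(alt <= kos for alt in block)]


def is_complete(module: Module, kos: set[str]) -> bool:
    return not missing_blocks(module, kos)


def bins_with_complete_module(module: Module, kos_by_bin: dict[str, set[str]]) -> list[str]:
    """Bins whose annotated KOs complete the module."""
    return sorted(b for b, kos in kos_by_bin.items() if is_complete(module, kos))


if __name__ == "__main__":
    # Hypothetical module: step 1 needs K_A; step 2 accepts K_B or K_C;
    # step 3 needs a two-subunit complex K_D1 + K_D2.
    example_module = [[{"K_A"}], [{"K_B"}, {"K_C"}], [{"K_D1", "K_D2"}]]
    annotations = {
        "bin_1": {"K_A", "K_C", "K_D1", "K_D2"},
        "bin_2": {"K_A", "K_B", "K_D1"},  # complex incomplete
    }
    print(bins_with_complete_module(example_module, annotations))
    print(missing_blocks(example_module, annotations["bin_2"]))
```

Counting, for each module, how many bins pass this check gives figures of the form "present in most of the MAGs (n = 20)".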

Phylogenomic analysis placed metagenome-assembled genomes into six bacterial phyla

Phylogenomic analysis placed the MAGs into six bacterial phyla (Figure 8). Among the recovered MAGs (n = 27), nine were placed in Proteobacteria (bin_rainy_3_filtered, bin_rainy_35_filtered, bin_winter_40_filtered, bin_rainy_23_filtered, bin_rainy_27_filtered, bin_rainy_48_filtered, bin_winter_7_filtered, bin_winter_42_filtered, and bin_winter_22_filtered); nine in Bacteroidetes (bin_winter_1_filtered, bin_winter_23_filtered, bin_winter_31_filtered, bin_winter_33_filtered, bin_winter_34_filtered, bin_winter_45_filtered, bin_winter_49_filtered, bin_winter_53_filtered, and bin_rainy_7_filtered); five in Cyanobacteria (bin_rainy_49_filtered, bin_rainy_34_filtered, bin_winter_24_filtered, bin_winter_6_filtered, and bin_winter_4_filtered); two in Planctomycetes (bin_winter_46_filtered and bin_winter_55_filtered); and one each in Verrucomicrobiota (bin_winter_59_filtered) and Gemmatimonadetes (bin_rainy_9_filtered) (Figure 8). Within Proteobacteria, phylogenomic analysis placed one MAG in gamma-Proteobacteria (bin_winter_22_filtered), three in alpha-Proteobacteria (bin_rainy_3_filtered, bin_rainy_35_filtered, and bin_winter_40_filtered), and five in beta-Proteobacteria (bin_rainy_23_filtered, bin_rainy_27_filtered, bin_rainy_48_filtered, bin_winter_7_filtered, and bin_winter_42_filtered) (Figure 8). Importantly, all five beta-Proteobacteria MAGs belonged to the family Burkholderiaceae (Table 3). Comparison with MiGA revealed three discrepancies: bin_winter_59_filtered was assigned to Chlamydiia, bin_winter_45_filtered to Verrucomicrobia, and bin_winter_4_filtered to Proteobacteria/beta-Proteobacteria (supplementary material: Refined bins taxonomy), whereas phylogenomic analysis placed these three bins in the phyla Verrucomicrobia, Bacteroidetes, and Cyanobacteria, respectively.
Taxonomic assignment with GTDB-Tk revealed one ambiguity: bin_winter_59_filtered was assigned to Parachlamydiales, whereas phylogenomically it falls under the phylum Verrucomicrobiota (Refined bins taxonomy). The complete tree in Newick format was deposited in figshare (details in the Data availability section).

Figure 8.

A phylogenomic tree based on the alignment of 92 concatenated marker genes to infer the taxonomic position of 27 reconstructed MAGs. Branches representing MAGs are marked in red. MAGs indicates Metagenome Assembled Genomes.
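A concatenated marker-gene tree is usually inferred from a supermatrix built by joining per-gene alignments. The sketch below shows only that joining step, assuming the per-gene alignments already exist and that genomes missing a marker are padded with gaps; the gene and genome names are hypothetical, and the sketch does not reproduce the specific 92-gene pipeline used for Figure 8.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # genome name -> aligned sequence for one marker gene


def concatenate(gene_alignments: Dict[str, Alignment],
                genomes: List[str], gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix row per genome.

    Genes are concatenated in sorted order so every row uses the same layout.
    A genome that lacks a marker gets a run of gap characters as long as that
    gene's alignment, so all rows end up with equal length.
    """
    rows: Dict[str, List[str]] = {g: [] for g in genomes}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences missing or not of equal aligned length")
        width = widths.pop()
        for g in genomes:
            rows[g].append(aln.get(g, gap * width))
    return {g: "".join(parts) for g, parts in rows.items()}


if __name__ == "__main__":
    markers = {  # toy protein alignments, not real marker genes
        "geneA": {"MAG_1": "MK-L", "MAG_2": "MKVL"},
        "geneB": {"MAG_1": "AG"},  # MAG_2 lacks this marker
    }
    for name, row in concatenate(markers, ["MAG_1", "MAG_2"]).items():
        print(name, row)  # MAG_1 MK-LAG / MAG_2 MKVL--
```

Fixing the gene order and padding absent markers keeps every column homologous across genomes, which is what tree inference on a supermatrix requires.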

Glycobiome profile of the recovered MAGs

We identified 72 glycoside hydrolase (GH), 42 glycosyltransferase (GT), 13 polysaccharide lyase (PL), 28 carbohydrate-binding module (CBM), 12 carbohydrate esterase (CE), and 7 auxiliary activity (AA) families, plus 3 other families, among the recovered genome bins (n = 27). Across the bins, GH23 was predicted to be the predominant glycoside hydrolase family and GT4 the predominant glycosyltransferase family. GH23 was identified in 25 bins, that is, in all bins except bin_winter_46_filtered (Pirellulales) and bin_winter_59_filtered (Parachlamydiales), whereas GT4 was found in all 27 bins. The total number of carbohydrate-active enzyme (CAZyme) families per bin ranged from 12 (bin_winter_7_filtered) to 65 (bin_winter_53_filtered).
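Statements such as "GT4 was found in all bins" or "family counts ranged from 12 to 65" reduce to simple set operations on a per-bin table of CAZy family labels. The sketch below assumes that table has already been parsed from the CAZyme annotation output into a dictionary; the bin names and families are toy data, not results from this study.

```python
import re
from collections import Counter
from typing import Dict, Set, Tuple


def family_prevalence(bins: Dict[str, Set[str]]) -> Counter:
    """Number of bins in which each CAZy family (e.g. 'GH23', 'GT4') occurs."""
    return Counter(fam for fams in bins.values() for fam in fams)


def families_per_class(bins: Dict[str, Set[str]]) -> Dict[str, int]:
    """Distinct families per CAZy class, taken from the letter prefix (GH, GT, PL, CBM, CE, AA)."""
    distinct = set().union(*bins.values())
    return dict(Counter(re.match(r"[A-Z]+", fam).group() for fam in distinct))


def family_count_range(bins: Dict[str, Set[str]]) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """(bin, count) pairs for the bins with the fewest and the most families."""
    counts = {b: len(fams) for b, fams in bins.items()}
    fewest = min(counts, key=counts.get)
    most = max(counts, key=counts.get)
    return (fewest, counts[fewest]), (most, counts[most])


if __name__ == "__main__":
    toy = {  # toy annotations, not data from this study
        "bin_a": {"GH23", "GT4", "CBM50"},
        "bin_b": {"GT4", "GT2"},
        "bin_c": {"GH23", "GT4", "PL1", "CE4"},
    }
    print(family_prevalence(toy)["GT4"], "of", len(toy))  # GT4 present in 3 of 3 bins
    print(families_per_class(toy))
    print(family_count_range(toy))  # (('bin_b', 2), ('bin_c', 4))
```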

Screening of secondary metabolite gene clusters indicates their abundance in the MAGs

We analyzed the MAGs using NAPDOS. A total of 53 ketosynthase (KS) domains were obtained, most of which were classified as FAS (fatty acid synthase) domains. Interestingly, condensation (C) domains were absent from most of the recovered bins (n = 24); they were detected only in bin_rainy_7_filtered (Flavobacteriales), bin_rainy_34_filtered (Prochlorotrichaceae), and bin_winter_24_filtered (Microcystis). The C domain-associated pathway products in these three bins were identified as lichenysin (bin_rainy_7_filtered), calcium-dependent antibiotic (bin_rainy_34_filtered), syringomycin (bin_rainy_34_filtered and bin_winter_24_filtered), and microcystin (bin_winter_24_filtered). Both KS and C domains were absent in bin_winter_4_filtered. The highest number of KS domains (n = 7) was detected in bin_winter_55_filtered. With antiSMASH, 66 secondary metabolite gene clusters were identified and classified as terpene, NRPS-like, bacteriocin, NRPS, arylpolyene, T1PKS, T3PKS, resorcinol, and betalactone (Figures 9 and 10). Terpene was the most abundant cluster class, followed by bacteriocins (Figure 9). Of the 66 gene clusters, 9 belonged to bin_rainy_49_filtered and 6 to bin_winter_45_filtered. Notably, antiSMASH detected no such gene clusters in bin_rainy_3_filtered, bin_winter_1_filtered, bin_winter_23_filtered, and bin_winter_40_filtered. Because bacteriocins are best analyzed with BAGEL4, we carried out a BAGEL4 analysis, which further revealed several classes of bacteriocins, such as zoocin_A, prochlorosin, and microcin. A total of 39 bacteriocins were predicted, with the highest numbers in bin_rainy_49_filtered (n = 8) and bin_winter_55_filtered (n = 5) (Figure 11).
Bacteriocins were completely missing in bin_rainy_3_filtered (Phenylobacterium), bin_rainy_7_filtered (Flavobacteriales), bin_rainy_27_filtered (Burkholderiaceae), bin_winter_7_filtered (Burkholderiaceae), bin_winter_23_filtered (Chitinophagaceae), bin_winter_31_filtered (Kapabacteriaceae), bin_winter_40_filtered (Phenylobacterium) and bin_winter_59_filtered (Parachlamydiales).
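Figures 9 and 10 summarize the antiSMASH results in two ways: counts per cluster type, and the subset of clusters with more than 75% similarity to a known BGC. A minimal sketch of both summaries follows, assuming the cluster records have already been extracted into a simple record type; the `Cluster` fields are hypothetical and do not mirror antiSMASH's own output format.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Cluster(NamedTuple):
    """One predicted gene cluster (hypothetical record layout, not antiSMASH's schema)."""
    bin_id: str
    cluster_type: str                  # e.g. "terpene", "bacteriocin", "NRPS"
    known_similarity: Optional[float]  # % similarity to the closest known BGC; None if no hit


def type_abundance(clusters: List[Cluster]) -> List[Tuple[str, int]]:
    """Cluster types ordered from most to least abundant."""
    return Counter(c.cluster_type for c in clusters).most_common()


def similar_to_known(clusters: List[Cluster], threshold: float = 75.0) -> List[Cluster]:
    """Clusters whose similarity to a known BGC exceeds `threshold` percent."""
    return [c for c in clusters
            if c.known_similarity is not None and c.known_similarity > threshold]


if __name__ == "__main__":
    toy = [  # toy records, not results from this study
        Cluster("bin_x", "terpene", None),
        Cluster("bin_x", "bacteriocin", 100.0),
        Cluster("bin_y", "terpene", 40.0),
        Cluster("bin_y", "NRPS", 80.0),
    ]
    print(type_abundance(toy))
    print([c.bin_id for c in similar_to_known(toy)])  # ['bin_x', 'bin_y']
```

Keeping clusters without any known-BGC hit as `None` rather than 0 separates "no match at all" from "weak match", which matters when reporting the high-similarity subset.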

Figure 9.

Abundance of secondary metabolite gene cluster types among the recovered genome bins (n = 27) obtained with antiSMASH.

Figure 10.

Bins with secondary metabolite genes having more than 75% similarity to the database of biosynthetic gene clusters (BGCs) encoding secondary metabolites.

Figure 11.

Bacteriocin (Prochlorosin, Zoocin, and Microcin) gene clusters of the recovered genome bins.


Various genomic features derived from the comparative genome-centric analysis of MAGs

We extended our analysis to phage elements that may contribute to bacterial genome plasticity. Of the 27 genome bins (MAGs), 16 (bin_rainy_7_filtered, bin_rainy_23_filtered, bin_rainy_27_filtered, bin_rainy_48_filtered, bin_winter_1_filtered, bin_winter_4_filtered, bin_winter_6_filtered, bin_winter_7_filtered, bin_winter_22_filtered, bin_winter_24_filtered, bin_winter_31_filtered, bin_winter_33_filtered, bin_winter_34_filtered, bin_winter_45_filtered, bin_winter_46_filtered, and bin_winter_49_filtered) were devoid of phage elements. The phage elements detected in the remaining bins were found to be incomplete. The highest number of phage elements (n = 2) was detected in bin_winter_53_filtered, and each of the other phage-containing bins carried a single phage element (Figure 12). In addition, IS elements were predicted across these genome bins using the latest version of ISsaga. A total of 134 IS elements were found, with the highest numbers in bin_winter_6_filtered (n = 30), followed by bin_winter_42_filtered (n = 13) and bin_winter_40_filtered (n = 10) (Figure 12).

Figure 12.

Distribution of phage elements, CRISPRs, and IS elements identified in the recovered genome bins.

A total of 102 CRISPR sites were detected among these bins (n = 27), with the highest number (n = 27) in bin_rainy_34_filtered. CRISPR sites were absent in bin_rainy_48_filtered, bin_rainy_35_filtered, bin_rainy_27_filtered, bin_rainy_23_filtered, bin_winter_42_filtered, and bin_winter_7_filtered (Figure 12).
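Each count reported for phage elements, IS elements, and CRISPR sites (the total, the bin with the most, and the bins lacking them) can be derived from a per-bin count table. The sketch assumes such a table has already been tallied from the tool output for one feature type; the function name, bin names, and counts are hypothetical toy values.

```python
from typing import Dict, List, Tuple


def summarize(counts: Dict[str, int]) -> Tuple[int, List[str], List[str]]:
    """Total count, bin(s) with the highest non-zero count, and bins with none."""
    total = sum(counts.values())
    top = max(counts.values(), default=0)
    leaders = sorted(b for b, n in counts.items() if n == top and n > 0)
    absent = sorted(b for b, n in counts.items() if n == 0)
    return total, leaders, absent


if __name__ == "__main__":
    # Toy per-bin CRISPR counts (illustrative only).
    crispr = {"bin_a": 0, "bin_b": 3, "bin_c": 1, "bin_d": 3}
    print(summarize(crispr))  # (7, ['bin_b', 'bin_d'], ['bin_a'])
```

Returning all tied leaders instead of a single bin avoids silently dropping a bin that shares the maximum.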

Discussion

Freshwater bio-resources remain largely unexplored, especially in the eastern part of India. Freshwater ecosystems are a treasure trove of pharmaceutically and biotechnologically important bioactive compounds. Freshwater bacteria also serve as indicators of water quality and help stabilize biogeochemical cycles. In this study, the analysis of MAGs (Metagenome Assembled Genomes) provides new insights into the structure and composition of indigenous microbial communities within this anthropogenically influenced artificial lake ecosystem. The initial objective was to catalog differences in the lake microbiota between the Winter and Rainy seasons, during which substantial microbial growth was observed in the lake. However, our detailed analysis revealed that differences in microbial diversity between the two seasons were minimal (Figure 2B). In both seasons, the majority of bacterial members belonged to Proteobacteria, Cyanobacteria, Actinobacteria, Bacteroidetes, Planctomycetes, Verrucomicrobia, and Firmicutes (Figure 2B). The phylum Proteobacteria comprises Gram-negative bacteria, many of which are agriculturally, industrially, or medically relevant, although a few, such as Roseomonas, are known opportunistic pathogens. A Roseomonas genome was recovered as bin_rainy_35_filtered (Table 3). Among the recovered genome bins (n = 27), bin_winter_42_filtered, bin_winter_7_filtered, bin_rainy_48_filtered, bin_rainy_27_filtered, and bin_rainy_23_filtered were identified as members of the family Burkholderiaceae (Table 3), which belongs to the order Burkholderiales within the class Betaproteobacteria. Bacteria of this family occur in diverse ecological niches, such as corals, oceans, freshwater, and plant-associated habitats.
It has been demonstrated that species from this group, especially plant-associated ones, can utilize a large number of aromatic compounds as energy and carbon sources, and some have considerable biotechnological potential for reducing the concentration and toxicity of chemical pollutants in the environment. In addition, several reports mention that members of the genus Polynucleobacter (Burkholderiaceae) form symbiotic relationships with plants, are found in freshwater,[63,64] and possess genes related to the degradation of pesticides used in agriculture. In our study, the Burkholderiaceae MAG bin_rainy_23_filtered was identified as belonging to the genus Polynucleobacter. Cyanobacteria, which were dominant in the lake (Figure 2B), are well known for producing biologically significant secondary metabolites, so this group could be a source of a plethora of industrially relevant metabolites. Taxonomic analysis revealed that 4 of the 27 recovered genome bins belonged to Cyanobacteria: 2 from the Rainy season and 2 from the Winter season (Table 2). One of the two Winter season bins was identified as the genus Microcystis, and STAMP analysis also supports the presence of Microcystis in the lake. Eutrophication leads to cyanobacterial blooms in freshwater lakes, and Microcystis plays an important role in their formation. Importantly, Microcystis blooms release toxic secondary metabolites such as microcystins, which cause serious health issues in humans and other organisms.[67,68] Eight of the 27 MAGs belonged to Bacteroidetes (bin_rainy_7_filtered, bin_winter_1_filtered, bin_winter_23_filtered, bin_winter_31_filtered, bin_winter_33_filtered, bin_winter_34_filtered, bin_winter_49_filtered, and bin_winter_53_filtered) (Table 3).
This raises the question of whether the lake was exposed to human intestinal waste: given the anaerobic nature of these bacteria, their occurrence in the aquatic environment may indicate recent contamination.[69,70] On the other hand, Bacteroidetes are also members of normal bacterial communities in diverse habitats, where they play a beneficial role in degrading or recycling organic matter such as complex polysaccharides and proteins. Glycobiome analysis revealed that all the recovered Bacteroidetes encoded GH23 family enzymes, which according to the CAZy database include chitinases and peptidoglycan lyases. The presence of chitinases and peptidoglycan lyases may indicate that competing organisms such as fungi and bacteria are selectively eliminated. Although no Actinobacterial genomes were recovered among the MAGs, STAMP analysis of unassembled reads confirmed the presence of this group, albeit at low abundance compared with other phyla such as Proteobacteria, Cyanobacteria, and Bacteroidetes (Figure 2B). One possible reason for this low abundance is the predominance of Cyanobacteria in the lake. Under conditions of inorganic nutrient availability and high organic matter, cyanobacterial blooms may outcompete highly streamlined Actinobacterial cells, and the resulting population shift could also affect the functional capacity of these ecosystems. The phylum Actinobacteria is the most promising bacterial group for the exploration of bioactive compounds such as antimicrobials, antiparasitics, antitumor agents, anticancer agents, and enzymes; it is reported that 45% of all documented bioactive compounds of microbial origin are produced by Actinobacteria. In addition, freshwater Actinobacteria have the potential to become standards of ecological freshwater quality.
Proteins related to DNA metabolism, photosynthesis, respiration, amino acids and derivatives, carbohydrates, and membrane transport were found in the lake microbes (Figure 3A). These genes are essential for any organism to survive in its environment. Central carbohydrate metabolism, ATP synthesis, and carbon fixation modules were dominant throughout the recovered bins (n = 27). However, a few functional genes form a distinct signature of this particular lake ecosystem. Sulfate-reducing bacteria can change the arsenic concentration of a lake by producing hydrogen sulfide, which can re-precipitate arsenic. Complete sulfur metabolism modules such as M00595 (thiosulfate oxidation by SOX complex, thiosulfate => sulfate) and M00176 (assimilatory sulfate reduction, sulfate => H2S) were found in the lake. Sulfur oxidizers and reducers may therefore play a significant role in reducing arsenic toxicity in this lake ecosystem. Low levels of metal ions that occur naturally in aquatic systems due to slow leaching from soil and rocks have no harmful impact on aquatic biota. However, excessive metal ion concentrations have been reported in waters across many parts of the globe, originating primarily from agricultural, industrial, municipal, and other anthropogenic waste. West Bengal has been reported to be heavily contaminated with arsenic, which can cause serious health problems such as keratosis, skin cancer, liver fibrosis, and chronic respiratory disease. In the lake dataset, mobile genetic elements (MGEs) such as plasmids and transposable elements were detected and were significantly enriched in the Rainy season; they may enter this environment through urban runoff (Figure 3A). Such elements are important contributors to the emergence of arsenic-resistant microbes.
Therefore, the possible occurrence of arsenic (As) resistance genes in lake microbes can be investigated with a genome-resolved metagenomics approach.[75-78] Importantly, As resistance and metabolism-related genes were present in the recovered genome bins of the lake sample (Figure 5), suggesting that the lake microbes already possess mechanisms to mitigate arsenic toxicity. The predominant forms of As in the environment are inorganic arsenate (As(V)) and arsenite (As(III)). Arsenite, the trivalent form, is considered more toxic and mobile, so redox transformations of As are key to predicting its fate in the environment. To date, the bioremediation potential of microorganisms in arsenic-rich environments rests mainly on their ability to resist or metabolize arsenic through redox reactions. ArsB and Acr3 are arsenic biotransformation genes that encode arsenite (As(III)) pumps, which expel reduced arsenic from the cell. Microorganisms evolved in early anoxic environments in which reduced As(III) was present at high concentrations, and the majority of microbes developed efflux systems to remove As(III) from their cells. Hence, almost every extant microbe is equipped with genes encoding As(III) permeases, such as ArsB or Acr3. Some organisms evolved anaerobic respiratory pathway genes that use As(III) as an electron donor, and we hypothesize that this type of arsenic cycling occurs in the present lake (Figure 4). With increasing atmospheric oxygen concentrations, As(III) was oxidized to As(V), which is also toxic and can enter the cells of most microbes via phosphate uptake systems. Organisms therefore needed ways to survive in such toxic environments; the independent evolution of ArsC genes encoding As(V) reductases is one such way.
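The redox chemistry referred to above can be summarized as a worked equation. The first line is the two-electron oxidation of arsenite to arsenate, written with H3AsO3 and H2AsO4^- as representative species (an assumption: the dominant arsenate species shifts with pH); read right to left, it is the net reduction that ArsC-type arsenate reductases carry out. The second line is an idealized, textbook form of As(III) precipitation with sulfide as orpiment (As2S3), one route by which hydrogen sulfide from sulfate reducers can re-precipitate arsenic. Both are general stoichiometries, not quantities measured in this lake.

```latex
\begin{align}
\mathrm{H_3AsO_3 + H_2O} &\longrightarrow \mathrm{H_2AsO_4^{-} + 3\,H^{+} + 2\,e^{-}}
  && \text{(As(III) $\to$ As(V), charge: $0 = -1 + 3 - 2$)} \\
2\,\mathrm{H_3AsO_3} + 3\,\mathrm{H_2S} &\longrightarrow \mathrm{As_2S_3} + 6\,\mathrm{H_2O}
  && \text{(sulfide precipitation of As(III))}
\end{align}
```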
The arsenic methylation gene ArsM was found exclusively in the lake, at higher abundance (Figure 5). This gene is largely responsible for methylating arsenic species, making them less toxic to the organisms. Arsenic methylation is thus another strategy to get rid of toxic As(V). In some microbes, As(III) efflux is coupled with ATP hydrolysis, an energy-dependent strategy to actively pump As(III) out of the cell.[72,83] The ArsB gene was identified in the MAGs; its product can bind ArsA and uses the membrane potential to pump out As(III). The presence of a whole set of arsenic-metabolizing and detoxifying genes strongly suggests that the microbes help stabilize this region against arsenic contamination. The surge in genome data has significantly helped natural product research to find new lead structures and discover new precursors. Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that have been unveiled by genome sequencing. These compounds are synthesized in all three domains of life and display immense structural diversity. Bacteriocins belong to a class of ribosomally synthesized and post-translationally modified antimicrobial peptides that can inhibit or kill other bacterial strains without harming the producer. These compounds are viable alternatives to conventional antibiotics and can be used as narrow-spectrum antibiotics. Very few studies have used a metagenomics approach to screen for bacteriocin genes, for example in fermented food microbiomes[87-89] and host-associated microbiomes. Our study identified several classes of bacteriocins, such as zoocin_A, prochlorosin, and microcin, using an advanced metagenomic approach, namely genome-resolved metagenomics (Figure 11).
Among these classes of bacteriocins, the biological role of prochlorosins is presently elusive, but they are considered functional because they are found in several Prochlorococcus strains and their biosynthetic genes are transcribed in response to environmental changes. It has also been reported that prochlorosin leader peptides are unique in sharing sequence homology with Nif11 proteins. The exact role of Nif11 proteins is not yet understood, but they are believed to play a role in nitrogen fixation. Unlike prochlorosins, microcins and zoocin_A have well-studied biological roles. Many reports suggest that microcins have important antimicrobial properties, which make them excellent candidates as next-generation antibiotics in human as well as veterinary medicine. There is also evidence that microcins could be used as a targeted strategy to treat Enterobacterial colitis and other infectious diseases. Application of zoocin_A to an artificial plaque significantly and selectively reduced the numbers of Streptococcus mutans, leading to the suggestion that zoocin_A could be useful as an anti-cariogenic agent. The presence of zoocin_A, prochlorosin, microcin, and other bacteriocins in the recovered bins suggests that these microbes could be exploited for therapeutic purposes. Apart from bacteriocins, our study also examined other secondary metabolite classes such as NRPS and PKS. Non-ribosomal peptides and polyketides are key natural products and potential drug candidates. Fatty acid synthesis was the most frequent KS domain pathway product among the recovered genome bins. In addition, KS domain pathway products assigned to epothilone, chlorothricin, kirromycin, 5-alkenyl-3,3(2H)-furanone, nystatin, spinosad, and C-1027 were found in small numbers across these bins.
Important C domain pathway products such as syringomycin, microcystin, and calcium-dependent antibiotic were identified in bin_winter_24_filtered and bin_rainy_34_filtered. Moreover, NRPS and NRPS-like clusters were identified in bin_winter_24_filtered and bin_rainy_34_filtered, respectively. Interestingly, all the bins carrying these C domains, NRPS, and NRPS-like clusters belonged to the phylum Cyanobacteria. Recent studies demonstrate that lineages of Cyanobacteria, Myxobacteria, Streptomycetes, and Pseudomonas are rich sources of non-ribosomal peptides and polyketides. Non-ribosomal peptides and polyketides have exceptionally diverse structures, which can be cyclic, linear, or branched. Moreover, they can be re-engineered to generate complex products with exotic chemical structures and biological activities. These products show a wide range of biological activities, and many are clinically important antiparasitic, antifungal, antimicrobial, antitumor, and immunosuppressive agents.[102-104]

Conclusion

Analyzing lake metagenomes from Illumina reads with a genome-resolved metagenomic approach provides comprehensive information about taxonomic diversity and potentially commercially important metabolites. The lake microbiome did not differ significantly between seasons, possibly because the temperature difference between the Winter and Rainy seasons was not drastic. Taxonomic analysis of the recovered genomes (n = 27) showed that the predominant phyla were Bacteroidetes, Cyanobacteria, and Proteobacteria, some members of which are well known for their pathogenic activity. Functional annotation of each recovered genome revealed the diversity and distribution of arsenic resistance and metabolism-related genes. The presence of these genes across the genome bins is a possible indicator of arsenic stabilization and neutralization by the resident microbes in this region. Recovery of important classes of secondary metabolites, including NRPS, PKS, and bacteriocins, indicates that the lake is a rich source of natural products that can be further explored for pharmaceutical applications. Secondary metabolite-producing, arsenic-resistant microbes identified in this way could also be used in bioremediation. Therefore, the present study leads to the conclusion that the recovered microbes have immense potential for bioprospecting.

Supplemental material, sj-docx-1-bbi-10.1177_11779322211025332 for Reconstructing Draft Genomes Using Genome Resolved Metagenomics Reveal Arsenic Metabolizing Genes and Secondary Metabolites in Fresh Water Lake in Eastern India by Samrat Ghosh, Aditya Narayan Sarangi, Mayuri Mukherjee, Deeksha Singh, Madduluri Madhavi and Sucheta Tripathy in Bioinformatics and Biology Insights

92 in total

1. Polyketide and nonribosomal peptide antibiotics: modularity and versatility.

Authors: Christopher T Walsh
Journal: Science Date: 2004-03-19

2. Natural Products as Sources of New Drugs from 1981 to 2014.

Authors: David J Newman; Gordon M Cragg
Journal: J Nat Prod Date: 2016-02-07

3. Catalytic promiscuity in the biosynthesis of cyclic peptide secondary metabolites in planktonic marine cyanobacteria.

Authors: Bo Li; Daniel Sher; Libusha Kelly; Yanxiang Shi; Katherine Huang; Patrick J Knerr; Ike Joewono; Doug Rusch; Sallie W Chisholm; Wilfred A van der Donk
Journal: Proc Natl Acad Sci U S A Date: 2010-05-17

4. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes.

Authors: Elizabeth M Glass; Jared Wilkening; Andreas Wilke; Dionysios Antonopoulos; Folker Meyer
Journal: Cold Spring Harb Protoc Date: 2010-01

5. STAMP: statistical analysis of taxonomic and functional profiles.

Authors: Donovan H Parks; Gene W Tyson; Philip Hugenholtz; Robert G Beiko
Journal: Bioinformatics Date: 2014-07-23

6. The chemical versatility of natural-product assembly lines.

Authors: Christopher T Walsh
Journal: Acc Chem Res Date: 2007-05-17

7. ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes.

Authors: Alessandro M Varani; Patricia Siguier; Edith Gourbeyre; Vincent Charneau; Mick Chandler
Journal: Genome Biol Date: 2011-03-28