Literature DB >> 27729905

Metagenomics of Thermophiles with a Focus on Discovery of Novel Thermozymes.

María-Eugenia DeCastro1, Esther Rodríguez-Belmonte1, María-Isabel González-Siso1.   

Abstract

Microbial populations living in environments with temperatures above 50°C (thermophiles) have been widely studied, increasing our knowledge in the composition and function of these ecological communities. Since these populations express a broad number of heat-resistant enzymes (thermozymes), they also represent an important source for novel biocatalysts that can be potentially used in industrial processes. The integrated study of the whole-community DNA from an environment, known as metagenomics, coupled with the development of next generation sequencing (NGS) technologies, has allowed the generation of large amounts of data from thermophiles. In this review, we summarize the main approaches commonly utilized for assessing the taxonomic and functional diversity of thermophiles through metagenomics, including several bioinformatics tools and some metagenome-derived methods to isolate their thermozymes.

Entities:  

Keywords:  NGS; bioinformatics; metagenomics; thermophiles; thermozymes

Year:  2016        PMID: 27729905      PMCID: PMC5037290          DOI: 10.3389/fmicb.2016.01521

Source DB:  PubMed          Journal:  Front Microbiol        ISSN: 1664-302X            Impact factor:   5.640


Introduction

Thermophiles (growing optimally at 50°C or higher), extreme thermophiles (65–79°C) and hyperthermophiles (above 80°C), categories defined per Wagner and Wiegel (2008), are naturally found in various geothermally heated regions of Earth such as hot springs and deep-sea hydrothermal vents. They can also be present in decaying organic matter like compost and in some man-made environments. Besides the high temperatures, many of these environments are characterized by extreme pH or anoxia. The adaptation to these harsh habitats explains the high genomic and metabolic flexibility of microbial communities in these ecosystems (Badhai et al., 2015) and makes thermophiles and their thermostable proteins very suitable for some industrial and biotechnological applications. Therefore, screening for novel biocatalysts from extremophiles has become a very important field. In the last few years, novel thermostable polymerases (Moser et al., 2012; Schoenfeld et al., 2013), beta-galactosidases (Wang et al., 2014), esterases (Fuciños et al., 2014), and xylanases (Shi et al., 2013), among others, have been described and characterized, opening a new horizon in biotechnology. Apart from the bioprospecting purposes, the analysis of these high-temperature ecosystems and their inhabitants can improve our understanding of microbial diversity from an ecological point of view and increase our knowledge of heat-tolerance adaptation (Lewin et al., 2013). Additionally, the study of thermophiles provides a better comprehension about the origin and evolution of earliest life, as they are considered to be phenotypically most similar to microorganisms present on the primitive Earth (Farmer, 1998; Stetter, 2006). In addition to the bacterial and archaeal communities, there is an increasing interest in the study of the viral populations living in high-temperature ecosystems, as viruses are reported to be the main predators of prokaryotes in such environments (Breitbart et al., 2004), participating in the biogeochemical cycles and being important exchangers of genetic information (Rohwer et al., 2009). The first studies of these extremophiles required their cultivation and isolation (Morrison and Tanner, 1922; Brock and Freeze, 1969; Fiala and Stetter, 1986; Prokofeva et al., 2005; De la Torre et al., 2008). Although these techniques have been improved (Tsudome et al., 2009; Pham and Kim, 2012), the growth of thermophiles under laboratory conditions is still a limitation for the insights into the microbial diversity. The evolution of high-throughput DNA sequencing has enabled the development and improvement of metagenomics: the genomic analysis of a population of microorganisms (Handelsman, 2004). Different high-temperature ecosystems like hot springs (Schoenfeld et al., 2008; Gupta et al., 2012; Ghelani et al., 2015; López-López et al., 2015b; Sangwan et al., 2015), deserts (Neveu et al., 2011; Fancello et al., 2012; Adriaenssens et al., 2015), compost (Martins et al., 2013; Verma et al., 2013), hydrocarbon reservoirs (de Vasconcellos et al., 2010; Kotlar et al., 2011), hydrothermal vents (Anderson et al., 2011, 2014), or a biogas plant (Ilmberger et al., 2012) have been analyzed using this metagenomic approach. These whole community DNA based studies were initially focused to answering the question “who are there” and now have shifted to finding out “what are they doing,” allowing us the access to the natural microbial communities and their metabolic potential (Kumar et al., 2015).

Diversity analysis of thermophiles

Targeted metagenomics

The universality of the 16S rRNA genes makes them an ideal target for phylogenetic analysis and taxonomic classification (Olsen et al., 1986). Schmidt et al. (1991) were the pioneers in performing a community characterization based on metagenome amplified 16S rRNA genes. Since then, the diversity of other natural microbial communities started to be studied using this approach. Jim's Black Pool hot spring, in Yellowstone National Park (YNP), is reported to be the first metagenome-derived analysis of a high-temperature environment based on 16S rRNA gene profiling (Barns et al., 1994). Initially, these studies required the amplification of the 16S rRNA genes followed by either denaturing gradient gel electrophoresis (DGGE, Muyzer et al., 1993) and sequencing or by cloning of the amplicons. In this case, the libraries obtained were screened using direct Sanger sequencing or restriction fragment length polymorphism (RFLP) analysis (Liu et al., 1997; Baker et al., 2001), to select and sequence those clones with unique patterns (Figure 1). As an example, the effect of pH, temperature, and sulfide in the hyperthermophilic microbial communities living in hot springs of northern Thailand was determined with the amplification of complete 16S rRNA genes followed by DGGE separation and sequencing (Purcell et al., 2007). In a different study, RFLP analysis and sequencing of clones with unique RFLP patterns was used to reveal the presence of abundant novel Bacteria and Archaea sequences in a 16S rRNA gene clone library prepared from the 55°C water and sediments of Boiling Spring Lake in California, USA (Wilson et al., 2008).
Figure 1

Schematic representation of the main approaches used for metagenomic analysis of thermophiles.

Schematic representation of the main approaches used for metagenomic analysis of thermophiles. With the development of next generation sequencing (NGS) technologies, more samples can be analyzed at lower sequencing cost and time, improving the production of 16S rRNA gene-based biodiversity studies. Additionally, the use of NGS allows to recover more information about the taxonomy of the sample, as reflected by Song et al. (2013), who obtained greater detail in the community structures from 16 Yunnan and Tibetan hot springs with high throughput 454-pyrosequencing than previous studies using conventional clone library and DGGE (Song et al., 2010). These analyses often rely on a partial sequence of 16S rRNA genes, as the read length of most NGS platforms is relatively short. For this purpose, primers designed for amplification of variable regions of 16S rRNA, like the V4–V8 (Hedlund et al., 2013; Huang et al., 2013), or the V3–V4 (Chan et al., 2015) are used. In the last few years, a high amount of extreme temperature environments have been analyzed with this procedure, especially hot springs, some of which are summarized in Table 1. Thanks to this strategy, a large number of 16S rRNA sequences have been produced and deposited in public databases like the Ribosomal Database Project (RDP, Cole et al., 2014) or the SILVA database (Quast et al., 2013).
Table 1

Examples of hot springs studied using the amplification of the variable regions of 16S rRNA.

Hot springType of samplepHTemperature (°C)SequencerRegion amplifiedReferences
Siloam, Limpopo, South AfricaWater9.563Roche 454 GS FLXV4–V7Tekere et al., 2011
Lake Bogoria, KenyaWater, sediment and microbial mat8.9–9.540–80Roche 454 GS FLXV3–V4Dadheech et al., 2013
Arzakan and Jermuk, ArmeniaWater and sediment7.20–7.5040–53Roche 454 GS FLXV4–V8Hedlund et al., 2013
Bacon Manito Geothermal Field, PhilippinesSediment3.72–6.5860–92Roche 454 GS FLXV4–V8Huang et al., 2013
Furnas Valley, Saõ Miguel, AzoresWater, sediment and microbial mat2.5–851–92Roche 454 GS FLXV2–V3Sahm et al., 2013
Yunnan province and Tibet, ChinaSediment3.2–8.647–96Roche 454 GS FLXV4Song et al., 2013
Zavarzin, Uzon Caldera, Kamchatka, RussiaMicrobial mat6.656–58Roche 454 GS FLXV3Rozanov et al., 2014
Sungai Klah, MalaysiaWater and sediment875–85Illumina MiSeqV3–V4Chan et al., 2015
Jakrem, Meghalaya, IndiaMicrobial matIlluminaV3Panda et al., 2015
Odisha, Deulajhari, IndiaSediment7.14–7.8343–55Illumina GAIIXV3–V4Singh and Subudhi, 2016
Examples of hot springs studied using the amplification of the variable regions of 16S rRNA. Even when the process of generating and sequencing the libraries is relatively fast, this PCR-based approach is biased due to limitations of primers, PCR artifacts like chimeras (Ashelford et al., 2005) and inhibitors that could be present in the sample hindering the amplification (Urbieta et al., 2015). Although there are some previous studies focused on primer design to acquire a high coverage rate (Wang and Qian, 2009), difficulties of the primers in recognizing all the 16S rRNA sequences have been described (Cai et al., 2013), leading to the unequal amplification of species 16S rRNA genes. Furthermore, analysis of 16S rRNA sequences can result in misidentification of the taxonomy, as closely related species may harbor nearly identical 16S rRNA genes. In addition, an overestimation of the community diversity could occur since sporadic cases of distant horizontal transfer of the 16S rRNA gene have been inferred from comparisons of these genes within and between individual genomes (Yap et al., 1999; Acinas et al., 2004). The most used taxonomically informative genomic marker in targeted metagenomics is 16S rRNA, but there are other signature sequences that have been used to study the diversity of thermophiles such as internal transcribed spacer regions (ITS, Ferris et al., 2003) or 18S rRNA genes (Wilson et al., 2008), as well as different protein-coding genes such as aoxB gene fragment, which encodes the catalytic subunit of As(III) oxidase, employed by Sharma et al. (2015) in combination to 16S rRNA to assess the microbial diversity of the Soldhar hot spring in India. Apart from the above mentioned amplicon-targeting strategy, in some studies a sequence capture technique coupled with NGS is driven to enrich the targeted sequences present in the metagenome. Captured metagenomics involves custom-designed hybridization-based oligonucleotide probes that hybridize with the metagenomic libraries followed by the sequencing of the probe-bound DNA fragments. Denonfoux et al. (2013) firstly used this procedure to explore the methanogen diversity in Lake Pavin (Frech Massif Central), showing that this GC-independent procedure is less biased and can detect broader diversity than traditional amplicon sequencing. The same approach has been used to enhance the capture of functional genes coding for carbohydrate-active enzymes and proteases in agricultural soils (Manoharan et al., 2015), and could also be an interesting tool to study thermophilic populations. Another method for targeted metagenomics enrichment is stable isotope probing (SIP) in which the environmental microorganisms are grown in the presence of substrates labeled with isotopes. As a consequence of metabolic activity, the isotope (usually 13C or 15N) is incorporated into the nucleic acids of the microbes metabolizing the substrate, increasing the density of DNA or RNA that can be after separated from unlabelled ones (Coyotzi et al., 2016). The high-density community DNA is then used as template to amplify by PCR the 16S rRNA sequences (Brady et al., 2015) and/or some functional genes involved in the selected metabolic pathway, thus allowing the study of the microorganisms that are actively participating in the processes of interest. Gerbl et al. (2014) used this technique to assess the microbial populations implicated in the carbon cycle in the Franz Josef Quelle radioactive thermal spring (Austrian Central Alps). Although the strategies of targeted metagenomics can be used to infer the taxonomic diversity of the community (16S rRNA gene profiling) or particular aspects of its functional diversity, a broader view of functional diversity, i.e., a more exhaustive answer to the question “what are they doing,” is provided by shotgun metagenomics (Figure 1).

Shotgun metagenomics

Random sequencing of metagenomic DNA using high-throughput sequencing technology is becoming increasingly common. In this approach, DNA is extracted from the whole community and subsequently sheared into small fragments that are independently sequenced. At present, this is considered the most accurate method for assessing the structure of an environmental microbial community, since it does not comprise any selection and reduces technical biases, especially the ones introduced by amplification of the 16S rRNA gene (Lewin et al., 2013). Shah et al. (2011) compared bacterial communities analyzed with both 16S rRNA and whole shotgun metagenomics, revealing that the taxonomy derived from these two different approaches cannot be directly compared. This study also proposed that low abundance species are best identified through 16S rRNA gene sequencing. Therefore, some high-temperature studies use, in parallel, both techniques to assess the taxonomic composition of the microbial community (Dadheech et al., 2013; Klatt et al., 2013; Chan et al., 2015). The biodiversity of several hot environments such as oil reservoirs (Kotlar et al., 2011), compost (Martins et al., 2013), or hot springs (Zamora et al., 2015; Mehetre et al., 2016), was studied using shotgun metagenomics sequencing. Some of them are summarized in Table 2.
Table 2

Examples of high temperature environments studied with shotgun metagenomics.

EnvironmentLocationType of sample/spHTemperature (°C)SequencerTotal readsSize (Mbp)References
Hot springYellowstone National Park, USAMicrobial mat60–65Sanger161,976167Klatt et al., 2011
Yellowstone National Park, USAMicrobial mat6.2–9.140–60Sanger320.6Klatt et al., 2013
Yellowstone National Park, USAMicrobial mat3.560–78Roche 454Kozubal et al., 2013
Yellowstone National Park, USAMicrobial mat2.5–7.865–80Sanger75,00060Inskeep et al., 2010
Yellowstone National Park, USAMicrobial mat and sediment2.5–6.470–85Sanger250Inskeep et al., 2013
Yellowstone National Park, USAWater1.879Roche 4541,604,079Menzel et al., 2015
Yellowstone National Park, USAWater3.5–4.092Roche 454420,726Menzel et al., 2015
Yellowstone National Park, USAMicrobial mat7.980–82Sanger1.29Colman et al., 2016
Los Azufres, MexicoSediment3.675Illumina GaIIx6,000,792216Servín-Garcidueñas et al., 2013
Lake Bogoria, KenyaWater, sediment and microbial mat8.9–9.540–80Roche 45424,56712.7Dadheech et al., 2013
Saõ Miguel, AzoresWater, sediment and microbial mat2.5–851–92Roche 454Sahm et al., 2013
Champagne pool, New ZealandWater and sediment5.5–6.945–75Illumina MiSeq4,623,251Hug et al., 2014
Long Valley Caldera, CaliforniaMicrobial mat50–80Illumina MiSeqStamps et al., 2014
Odisha, IndiaWater and sediment7.2–7.440–58Roche 454 GS71.26Badhai et al., 2015
Sungai Klah, MalaysiaWater and sediment8.0075–85Illumina HiSeq5,527,175,000Chan et al., 2015
Shi-Huang-Ping, TaiwanWater2.569Illumina HiSeq557,415,266Lin et al., 2015
Lobios, Ourense, SpainWater8.276Illumina HiSeq11,982,436López-López et al., 2015b
Tuwa, IndiaWater8.2–954–65Ion Torrent PGM541,37998.7Mangrola et al., 2015a
Lasundra, IndiaWater6.042–52Ion Torrent PGM606,86798.6Mangrola et al., 2015b
Eryuan, ChinaSediment7.065Illumina HiSeq10,360,000Menzel et al., 2015
Uzon Caldera, RussiaWater and sediment5.8–6.061–64Roche 454660,054Menzel et al., 2015
Pozzuoli, ItalyWater and sediment3.076Illumina HiSeq10,060,000Menzel et al., 2015
Pisciarelly, ItalySediment5.586Roche 454876,681Menzel et al., 2015
Grensdalur, IcelandWater and sediment5.085–90Illumina HiSeq10,330,000Menzel et al., 2015
Krísuvík, IcelandWater and sediment3.5–4.090Illumina HiSeq10,050,000Menzel et al., 2015
Parvati River, IndiaWater and microbial mat7.1–7.493–52Illumina GaIIx78,891,278Sangwan et al., 2015
Unkeshwar, IndiaWater7.350–60Illumina HiSeq848,096212.87Mehetre et al., 2016
Deep sea hidrotermal ventMid-Atlantic RidgeMicrobial mat9.0–11.090Illumina HiSeq46,36135Brazelton and Baross, 2009
Juan de Fuca RidgeSulfide chimney2.0–3.0316Roche 454308,03471Xie et al., 2011
Guaymas Baisin, Gulf of CaliforniaBlack-smoker chimney6.0190Roche 454512,83196.38He et al., 2013
Juan de Fuca RidgeVent fluid125Roche 454808,051Anderson et al., 2014
Biogas reactorGermanyMicrobial mat6.64–8.1155Roche 454303,493120.50Rademacher et al., 2012
Linköping, SwedenDigester material7.5–8.150–55Roche 454250,478Sundberg et al., 2013
RomaniaSludge6.94–7.6255Ion Torrent PGM300,000Pap et al., 2015
Oil reservoirNorwegian seaOil, water and gas85Roche 454702,607345Kotlar et al., 2011
Gippsland Basin, AustraliaFormation water7.2120Roche 454Li et al., 2013a
CompostSão Paulo Zoo Park, BrazilCompost material765.8–67.2Roche 4543,167,044836Martins et al., 2013
Gold mineSouth AfricaFracture water9.360Sanger and Roche 454500,008Chivian et al., 2008

In those studies comprising several samples, the total reads and size reflected is just the one of the sample with higher values.

Examples of high temperature environments studied with shotgun metagenomics. In those studies comprising several samples, the total reads and size reflected is just the one of the sample with higher values. Development of NGS has greatly enhanced this approach. The most widely used platforms for this kind of analysis in high temperature environments are Illumina and Roche 454 (Table 2). Illumina currently offers the highest throughput per run and the lowest cost per-base (Liu et al., 2012), generating read lengths up to 300 bp. On the other hand, Roche 454 gives longer reads (1 kb maximum), which are easier to map to a reference genome; however it is more expensive and has lower throughput (van Dijk et al., 2014). Even though they have substantial differences (Kumar et al., 2015), some studies have demonstrated that the information recovered from both sequencing platforms is comparable when analyzing the biodiversity of the same sample (Luo et al., 2012). The main limitations of shotgun metagenome sequencing include its relatively expensive setup cost and the requirement of very high computing power for data storage, retrieval, and analysis. Another important drawback of this approach is that high quality whole community DNA is needed, which makes the extraction a critical step in the process of generating metagenomic data. Therefore, some studies have focused on the improvement of metagenomic DNA extraction from thermal environments (Mitchell and Takacs-Vesbach, 2008; Li et al., 2013a; Gupta et al., 2016). Nowadays the NGS platforms allow sequencing with low inputs of DNA, nevertheless in some cases it is necessary to amplify the metagenomic DNA to obtain enough quantity for preparing the sequencing libraries. As an example, Nakai et al. (2011) used multiple displacement amplification with Phi29 to sequence the metagenome of the hydrothermal fluid of the Mariana Trough, an active back-arc basin in the western Pacific Ocean. This amplification step is frequently required to generate viral metagenomic libraries, introducing a subsequent bias (Kim and Bae, 2011), as the extraction of enough high quality viral nucleic acids is a difficult process that usually relies on virus concentration methods. To assess the taxonomic diversity with the short metagenomic reads obtained after sequencing, there are several non-exclusive approximations that can be done: analyzing taxonomically informative marker genes, grouping sequences into defined taxonomic groups (binning) or/and assembling sequences into definite genomes (Sharpton, 2014). As mentioned before, the most frequently used taxonomically informative marker genes are rRNA genes or protein-coding genes that tend to be single copy and common to microbial genomes. In this approach, those reads that are homologs to the marker gene are identified in the sequences of the metagenome and annotated using sequence or phylogenetic similarity to the marker gene database sequences. Bioinformatics applications for this purpose include MetaPhyler (Liu et al., 2010), EMIRGE (Miller et al., 2011), and AMPHORA (Wu and Scott, 2012). Gladden et al. (2011) used EMIRGE to reconstruct near full-length small subunit (SSU) rRNA genes from metagenomic Illumina sequences to determine the taxonomy of compost-derived microbial consortia adapted to switchgrass at 60°C, finding a low-diversity community with predominance of Rhodothermus marinus and Thermus thermophilus. In another study, Klatt et al. (2011) used AMPHORA to identify the phylogenetic and functional marker genes in the assemblies of several hot springs cyanobacterial metagenomes from YNP. These studies allowed the discovery of novel chlorophototrophic bacteria belonging to uncharacterized lineages within the order Chlorobiales and within the Kingdom Chloroflexi. In a similar approach, Lin et al. (2015) and Colman et al. (2016) used a 16S rRNA gene-based diversity method blasting the metagenomic reads against the SILVA reference database to characterize bacterial populations in Shi-Huang-Ping acidic hot spring (Taiwan) and in two thermal springs in YNP, respectively. Taxonomic binning is defined as the process of grouping reads or contigs and assigning them to operational taxonomic units, depending on information such as sequence similarity, sequence composition or read coverage (Dröge and McHardy, 2012). Metagenomic sequences can be binned based on their sequence similarity to a database of taxonomically annotated sequences using tools like MEGAN (Huson et al., 2011) or MG-RAST, a public resource for the automatic phylogenetic and functional analysis of metagenomes (Meyer et al., 2008). MEGAN bases its taxonomic classification on the NCBI taxonomy using BLAST. With this tool, Klatt et al. (2013) assessed the community structure of six phototrophic microbial mat communities in YNP and Badhai et al. (2015) revealed the dominance of Bacteria over Archaea in four geothermal springs in Odisha, India. Taxonomic binning can be done with assembled or unassembled reads, although assessing taxonomic abundance with assembled data can led to a miscalculation of the abundance of some taxa, as contigs are treated as a single sequence in most downstream analysis, hindering the analytical tools to accurately quantify the abundance of the taxon (Sharpton, 2014). Assembly is described as the process of merging individual metagenomic reads into longer pieces of contiguous sequences (contigs) based on overlapping sequences and paired read information (Dröge and McHardy, 2012). Bioinformatic implements like MetaVelvet (Namiki et al., 2012) or IDBA-UD (Peng et al., 2012) have been used in the assembly of whole shotgun metagenome reads to study the taxonomical composition of different high-temperature environments. For example, MetaVelvet was applied in the study of eight globally distributed hot springs by Menzel et al. (2015) and IDBA-UD in the analysis of the community composition of Sungai Klah hot spring in Malaysia (Chan et al., 2015). This step can simplify bioinformatic analysis, but it may also produce chimeras, therefore researchers often bin reads and assemble each bin independently to decrease the probability of generating chimeras (Sharpton, 2014). In recent studies, the integration of assembly and taxonomic binning by sequence composition allowed the reconstruction of several partial genomes from high-temperature environments such as the genome of a novel archaeal Rudivirus obtained from a Mexican hot spring, (Servín-Garcidueñas et al., 2013) or the draft genome sequence of Thermoanaerobacter sp. strain A7A, reconstructed from the metagenome of a 102°C hydrocarbon reservoir in the Bass Strait, Australia (Li et al., 2013b). Using a similar approach, Sangwan et al. (2015), reconstructed the genome of the bacterial predator Bdellovibrio ArHS, with the metagenomic assembly of the microbial mats of an arsenic rich hot spring in the Parvati river valley (Manikaran, India). Also, Sharma et al. (2016) combining genomic and metagenomic data, used two Cellulosimicrobium cellulans genomes derived from metagenomics, to study the evolution of pathogenicity across the species of C. cellulans.

Functional analysis of thermophiles

Sequence-based function prediction

The metagenomic reads obtained from shotgun sequencing of an environmental DNA can be annotated with functions to determine the functional diversity of the microbial community. This usually comprises two steps: identifying metagenomic reads that contain protein coding sequences (gene prediction), and comparing the coding sequences to a database of genes, proteins, protein families, or metabolic pathways (gene annotation) (Sharpton, 2014). Some frequently used databases for functional annotation are the SEED annotation system (Overbeek et al., 2014), the KEGG orthology (KO) database (Kanehisa et al., 2016) or the Pfam database, based on hidden Markov models (HMM) to classify in accordance with the protein domains (Finn et al., 2015). There are several robust web resources that can be easily used to perform gene prediction, database search, family classification, and annotation, including MG-RAST (Meyer et al., 2008), IMG/M (Markowitz et al., 2014), or SUPER-FOCUS (Silva et al., 2015). Considerable functional profiles of thermophilic populations have been based on these tools such as the study of the microbiota of Tuwa hot spring in India (Mangrola et al., 2015a) in which the functional annotation was performed using the MG-RAST pipeline. In this study, a high number of annotated features were classified as unknown function, suggesting the potential source of novel microbial species and their products. Similar results were found in the metagenome of Unkeshwar, another hot spring in India, where pathway annotation was done using KEGG (Mehetre et al., 2016). For each contig sequence, the assignment of KO numbers obtained from known reference hits was done, revealing up to 20% unclassified sequences. These results reflect a promising world of undiscovered proteins that could be explored to find new catalysts for biotechnological applications. In this approach, it is important to consider that, despite the information given by functional annotation of the metagenomic sequences; the presence of a gene on a metagenome does not mean that it is expressed. Therefore, functional metagenomics, metatranscriptomics, and metaproteomics assays are necessary to assess the real community functional activity. To increase the probability of finding active functional genes involved in a substrate uptake and transformation, some studies use a substrate-induced enrichment of the community before the metagenomic DNA extraction. After, these genes can be detected either by sequence (Graham et al., 2011; Wang et al., 2016) or by functional metagenomics (Chow et al., 2012). Using this procedure, Graham et al. (2011) found and characterized an hyperthermophilic cellulase in an archaeal community, obtained by growth at 90°C of the sediment of a geothermal source enriched with crystalline cellulose. Another important limitation of shotgun metagenomics is that the databases may be subjected to phylogenetic biases, as some communities are more accurately or more exhaustively annotated than others (Chistoserdova, 2010).

Functional metagenomics

Function-based metagenomics relies on the construction of metagenomic libraries by cloning environmental DNA into expression vectors and propagating them in the appropriate hosts, followed by activity-based screening. After an active clone is identified, the sequence of the clone is determined, the gene of interest is amplified and cloned with the subsequent expression and characterization of the product to explore its biotechnological potential (Figure 2). This technique has the advantage of not requiring the cultivation of the native microorganisms or previous sequence information of known genes, thus representing a valuable approach for mining enzymes with new features.
Figure 2

The two major strategies used for screening metagenomes in search of new thermozymes.

The two major strategies used for screening metagenomes in search of new thermozymes. The use of functional metagenomics allows the discovery of novel enzymes whose functions would not be predicted based on DNA sequence. This approach complements sequence-based metagenomics as the information from function-based analyses can be used to annotate genomes and metagenomes derived exclusively from sequence-based analyses (Lam et al., 2015). Therefore, several investigations in thermal environments combine sequencing methods (taxonomical and functional characterization) with functional screening of clones (Chen et al., 2007; Wemheuer et al., 2013; Leis et al., 2015; López-López et al., 2015b). Depending on the size of the insert, functional metagenomics can be explored using fosmids (35–45 kb insert), BACs (~200 kb insert), cosmids (30–42 kb insert), or plasmids (<10 kb insert). Bigger inserts are more likely to contain complete genes and operons, allowing the expression of more enzymes. A great number of high temperature functional metagenomics studies use the commercial vector pCC1FOS (Table 3), which allows inserts up to 40 kb, and it is available in a toolkit to simplify the library construction. More information about this vector is compiled in Lam et al. (2015) review.
Table 3

Examples of thermozymes obtained by functional metagenomics.

FamilyActivityMetagenomic DNA sourceVectorHostSubstrateTotal No. clonesPositive clonesReferences
Lypolitic enzymesLipaseBiomass sequencing fed-batch reactorpIAFS2E. coli LE392MPTributyrin10,00010Meilleur et al., 2009
LipaseOil field soilpZErO-2E. coli TOP10Tributyrin, rhodamine B83,0001Fan et al., 2011
LipaseBotanical Garden soil, Hamburg, GermanypSuperCosE. coli EPI100Tributyrin65002Chow et al., 2012
LipaseIndustrial wastewater treatment plantpBeloBAC11E. coli DH10BTributyrin, rhodamine B40,00012Sharma et al., 2012
LipaseActivated sludgepUC18E. coli DH5αTricaprylin130,0001Roh and Schmid, 2013
Lipase/EsteraseHot spring, Kamchatka PeninsulapCR-XL-TOPOE. coli DH5αTributyrin3Wemheuer et al., 2013
Lipase/EsteraseOil polluted mud flatpUC19E. coli DH5αGlyceryl trioctanoate30002Kim et al., 2015
EsterasePoly(DL-lactic acid) disks in compostpUC18E. coli DH10BPLA solution40,0007Mayumi et al., 2008
EsteraseCompostpCC1FOSE. coli EPI300T1RTributyrin, gum arabic13,00010Kang et al., 2011
EsteraseTurpan Basin soil, ChinapUC118E. coli TOP10Caprylate21,0003Fan et al., 2012a
EsteraseTurpan Basin soil, ChinapUC118E. coli DH5αCaprylate26,0001Fan et al., 2012b
EsteraseActivated sludgepBluescript SK+E. coli DH5αTributyrin40,0001Shao et al., 2013
EsteraseTurpan Basin soil, ChinapUC118E. coli DH5αTributyrin200,00019Wang et al., 2013
EsteraseDeep-sea hydrothermal field, East PacificpUC18E. coli XL1-blueTributyrinZhu et al., 2013
EsteraseGuayas Basin smoker chimmeneypCC2FOSE. coli EPI300T1RTributyrin18,0007Fu et al., 2015
EsteraseOil polluted mud flatpUC19E. coli DH5αTributyrin30007Kim et al., 2015
EsteraseHot spring, Furnas, AzorespCT3FKT. thermophilus BL03Tributyrin79686Leis et al., 2015
EsteraseHot spring water Lobios, Galicia, SpainpCC1FOSE. coli EPI300T1RTributyrin11,6006López-López et al., 2015b
GlycosidaseCellulaseCompostpCC2FOSE. coli EPI300T1RCMC Trypan blue12,3802Kwon et al., 2010
CellulaseBiogas plant and elephant fecespCC1FOSE. coli EPI300T1RCMC Congo red29,00014Ilmberger et al., 2012
CellulaseActivated sludgepUC18E. coli DH5αCMC32,0004Sharma et al., 2012
β-glucosidaseHot spring water, mud and sedimentpCR-XL-TOPOE. coli TOP10Esculin hydrate, ferricammonium citrate1Schröder et al., 2014
β-glucosidaseThermophilic methanogenic digesterpCC1FOSE. coli EPI300T1REsculin hydrate, ferricammonium citrateWang et al., 2015
β-glucosidaseTermite hindgutpUC118E. coli DH5αEsculin hydrate, ferricammonium citrate800,00013Gao et al., 2016
β-galactosidaseHot spring, Hymalaya, IndiapSMART LCampE. coli DH10BX-gal10,0001Gupta et al., 2012
β-galactosidaseTurpan Basin soil, ChinapUC19ΔlacZE. coli DH5αX-gal80001Zhang et al., 2013
β-galactosidaseTurpan Basin soil, ChinapUC19ΔlacZE. coli DH5αX-gal700,0001Wang et al., 2014
XylanaseCompostpCC2FOSE. coli EPI300T1RXylan12,3805Kwon et al., 2010
EndoxylanaseCompost-soilp18 GFPE. coli DH10BRBB-xylan180,0001Verma et al., 2013
β-xylosidaseThermophilic methanogenic digesterpCC1FOSE. coli EPI300T1REsculin hydrate, ferricammonium citrateWang et al., 2015
VariousRaw and torrified wheat strawpCC1FOSE. coli EPI300T1RX-fuc, X-gal, X-Xyl, X-Man, X-cel, X-glu44,00071Maruthamuthu et al., 2016
Amylaseα-amylaseWestern Ghats soilpCC1FOSE. coli EPI300T1RPotassium iodide solution76,0001Vidya et al., 2011
α-amylaseBiogas reactorpBK-CMVE. coli XLOLRAZCL-amylose20001Jabbour et al., 2013
PhosphatasePhytaseSoilpCC1FOSE. coli EPI300T1RPhytate14,00028Tan et al., 2014
ProteaseSerine proteaseSand Gobi desertpCC1FOSE. coli EPI300T1RSkim milk17,00016Neveu et al., 2011
Serine proteaseSand Death Valley desertpBSKII+E. coli DH10BSkim milk30,0001Neveu et al., 2011
Serine proteaseSoilpHT01E. coli DH10BAZCL-casein1Biver et al., 2013
Patatin-like proteinHot spring, KamchatkapCR-XL-TOPOE. coli DH5αSkim milk1Wemheuer et al., 2013
Serine proteaseHot spring, Chumathang LadakhpUC18E. coli DH5αCasein90001Singh et al., 2015
OxidoreductaseExtradiol dioxygenaseActivated sludgepCC1FOSE. coli EPI300T1RCatechol96,00091Suenaga et al., 2007
Oil reservoirpCC2FOSE. coli EPI300T1RHexadecane500072de Vasconcellos et al., 2010
DioxygenaseActivated sludgepBeloBAC11E. coli DH10BPhenol, catechol40,0004Sharma et al., 2012
Examples of thermozymes obtained by functional metagenomics. There are several technically challenging steps in library construction. Mainly, the high quality and length of the metagenomic DNA required for proceeding to the ligation into the vector and the need of obtaining a high proportion of clones in order to cover all the variability of the microbial community. This limitation is particularly important in soil studies, where it has been reported that contaminants like humic acids are present in metagenomic DNA extracts, interfering with the subsequent enzymatic reactions. Therefore, the widely extended method of soil DNA extraction established by Zhou et al. (1996), is usually accomplished with further purification of the sample that can lead to a loss of DNA yield. Some studies show that the humic acids can be easily removed by gel electrophoresis of the metagenomic DNA followed by gel extraction, as humic acids migrate faster than the large metagenomic DNA (Kwon et al., 2010). This simple approach was used to construct a Turpan Basin soil metagenomic library for a functional screening of thermostable beta-galactosidases (Wang et al., 2014). Alternatively, to avoid contaminating the circulating buffer, electrophoresis can be paused after humic acids have formed a front, excising the part of the gel containing the humic acids, and replacing it with fresh gel (Cheng et al., 2014). Another important drawback that compromises the functional metagenomics approach is the selection of the expression host. Although the commonly used E. coli strains have relaxed requirements for promoter recognition and translation initiation, some genes from environmental samples may not be efficiently expressed due to differences in codon usage, transcription and/or translation initiation signals, protein-folding elements, post-translational modifications, or toxicity of the active enzyme (Uchiyama and Miyazaki, 2009). This problem could be even worse when the proteins expressed need special conditions to be active, such as high temperatures, considering that mesophiles, like E. coli, do not survive at these high temperature conditions. Accordingly, an alternative expression host may be required to overcome the heterologous expression of some genes derived from hot environments and thus, identify a broader range of enzymes. The thermophilic bacterium T. thermophilus has been proposed as a good candidate for function-based detection of thermozymes. In a recent functional screening to detect esterases, Leis et al. (2015) constructed two large insert fosmid metagenomic libraries of compost and hot spring water using pCT3FK, a pCC1FOS derived T. thermophilus/E. coli shuttle fosmid (Angelov et al., 2009), in T. thermophilus and compared them to the same libraries expressed in E. coli. Only two esterases were found at 60°C in the libraries generated in E. coli while 5 different esterases were discovered in the same libraries expressed in T. thermophilus. Therefore, this could be a suitable system to improve the detection of metagenome-derived thermozymes. The main restriction of this approach is that pCT3FK integrates into T. thermophilus chromosomal DNA. In fact, the genomes of the positive clones isolated by Leis et al. (2015) were completely sequenced before proceeding with the PCR amplification and cloning of the candidate genes, with the consequent cost of time and money. Other versatile broad-host-range cosmids that have been used in a soil study (pJC8 and pJC24) allow the phenotypic screening of the library in bacteria such as Bacillus and in the yeast Saccharomyces (Cheng et al., 2014). The selection of the appropriate substrate for the functional screening is also a crucial step in this approach, as the substrate may cause biases in the selection of the activities of interest. Recent studies suggest that the initial selection of active clones with general substrates should be followed by a more specific one to improve the effectiveness of the detection (Ferrer et al., 2016). Other biases and limitations of functional metagenomics and strategies for its improvement have been previously reviewed by Ferrer et al. (2005) and Ekkers et al. (2012). Some hot environments where function-based screening of microbial communities have been done include hot springs (López-López et al., 2015b), deserts (Neveu et al., 2011), petroleum reservoirs (de Vasconcellos et al., 2010), or human-made environments like a biogas plant (Ilmberger et al., 2012), demonstrating the potential of functional metagenomics as a very important source of new thermozymes.

Metagenome-derived thermozymes

Many industrial processes require elevated temperatures to take place. Thus, microorganisms surviving at temperatures above 55°C represent an important source of biotechnological richness for high temperature bioprocesses by producing a large variety of biocatalysts. Biotechnological processes carried out at high temperatures provide numerous benefits such as higher solubility of reagents, and reduced risk of microbial contamination (Mirete et al., 2016). From an industrial point of view, thermozymes possess certain advantages over their mesophilic counterparts as they are active and efficient under high temperatures, extreme pH values, high substrate concentrations, and high pressure (Sarmiento et al., 2015). Some of them are also highly resistant to denaturing agents and organic solvents (Fan et al., 2011; Roh and Schmid, 2013). In addition, thermozymes are easier to separate from heat-labile proteins during purification steps as reported by Pessela et al. (2004). As a result, high temperature-active enzymes can be potentially used in diverse industrial and biotechnological applications including food, paper and textile processing, chemical synthesis and the production of pharmaceuticals. Some thermostable enzymes are still recovered by isolation from thermophilic microorganisms (Shi et al., 2013; Fuciños et al., 2014; Sen et al., 2016), however metagenomics has opened a new important field in the discovery of novel biocatalysts and has been revealed as a promising mining strategy of resources for the biotechnological and pharmaceutical industry. There are two different ways of screening a metagenome in search of thermozymes: a sequence-based approach and a function-based approach (Figure 2). Sequence-based screening methods rely on the prior knowledge of conserved sequences of domains/proteins/families of interest. It involves primer designing followed by amplification and cloning of the metagenomic genes. The main drawback of this approach is its failure to detect fundamentally different novel genes, as it cannot discover non-homologous enzymes. Some potential biocatalysts have been isolated mining metagenomic sequences in prospecting for genes coding thermozymes (Table 4). Namely, a gene encoding a thermostable pectinase was isolated from a soil metagenome sample collected from hot springs of Manikaran (India), using a PCR-based cloning strategy with primers designed based on known sequences of pectinase genes from other species (Singh et al., 2012). The recombinant protein is proposed to be of great use in industrial processes due to its activity over a broad pH range. Thanks to this search based on sequence homology to related gene families, 22 putative ORFs (open reading frames) were identified from a switchgrass-adapted compost community finding a bi-functional β-xylosidase/α-arabinofuranosidase that maintained ~75% of its activity after 16 h at 60°C (Dougherty et al., 2012). The same sequence-based approach was used by Ferrandi et al. (2015) who discovered, cloned and characterized two novel limonene-1,2-epoxide hydrolases (LEHs) with an in-silico screening of the LEHs sequences in the assembled contigs from hot spring metagenomes.
Table 4

Examples of thermozymes isolated by sequence-based screening of metagenomes.

EnzymeSource of metagenomic DNAReferences
CellulaseLong-term dry thermophilic methanogenic digesterWang et al., 2015
XylanaseCow dung compostSun et al., 2015
XylanaseLong-term dry thermophilic methanogenic digesterWang et al., 2015
EndoxylanaseCompostDougherty et al., 2012
β-galactosidaseHot spring water, YongtaiLiu et al., 2015
β-xylosidase/α-arabinofuranosidaseCompostDougherty et al., 2012
β-xylosidaseLong-term dry thermophilic methanogenic digesterWang et al., 2015
PectinaseHot spring soil, Manikaran, IndiaSingh et al., 2012
α-fucosidaseCompostDougherty et al., 2012
Phytase (phosphatase)Insect-cultivated fungus gardensTan et al., 2016
Limonene 1,2-epoxide hydrolaseHot springs waterFerrandi et al., 2015
NitrilaseAtlantis II Deep Brine Pool, Red SeaSonbol et al., 2016
α-amylaseHydrotermal vent, Juan de la FucaWang et al., 2011
DehidroclorinaseSoil contaminated with HCH-isomersMacwan et al., 2011
PolymeraseHot spring 3173 PolMoser et al., 2012
Examples of thermozymes isolated by sequence-based screening of metagenomes. The function-based metagenomic screening is the most important way to discover novel thermozymes as it doesn't rely on the sequence. The main advantage of directly screening for enzymatic activities from metagenome libraries is that it gives access to previously unknown genes and their encoded enzymes. Thus, some completely new thermozymes that couldn't be found by sequence screening have been discovered, like the unusual glycosyltransferase-like enzyme with β-galactosidase activity recovered by Wang et al. (2013) from a Turpan Basin soil metagenomic library. Function-based metagenomic screening has allowed the discovery of a wide range of thermozymes (Table 3). In this review, we focus on the recovery of the functional-derived thermostable metagenomic enzymes that are mostly used in biocatalysis and industrial sectors, such as lipolytic enzymes, glycosidases, proteases, and oxidoreductases (Böhnke and Perner, 2015).

Lipolytic enzymes

Lipolytic enzymes, comprising esterases (EC 3.1.1.1) and lipases (EC 3.1.1.3), are extensively distributed in microorganisms, plants, and animals. They catalyze the hydrolysis, synthesis, or transesterification of ester bonds. At present, these enzymes represent about 20% of commercialized enzymes for industrial use (López-López et al., 2015a), as they have great potential in several industrial processes such as production of biodegradable polymers, detergents, food flavoring, oil biodegradation, or waste treatment, among others (Anobom et al., 2014). Therefore, a considerable number of functional metagenomics studies are focused on mining thermal environments in search for these enzymes (Table 3). Lipases are generally defined as carboxylesterases hydrolyzing water-insoluble (acyl chain length >10) triglycerides, with trioleoylglycerol as the standard substrate. In contrast, esterases catalyze the hydrolysis of short-chain esters (acyl chain length <10) with tributylglycerols (tributyrin) as the standard substrate, although lipases are also capable of hydrolyzing esterase substrates (Rhee et al., 2005). At least 200 different substrates have been successfully applied in assays for functional selection of esterase/lipase biocatalysts in metagenomic clone libraries (Ferrer et al., 2016), including the widely used tributyrin (Rhee et al., 2005; Meilleur et al., 2009; López-López et al., 2015b), and p-nitrophenyl (NP) acetate (Wang et al., 2013). Meilleur et al. (2009) isolated a new alkali-thermostable lipase with an optimal activity at 60°C and pH 10.5 by functional screening of a metagenomic cosmid library from the biomass produced in a gelatin enriched fed-batch reactor. Another gene coding for a thermostable esterase was detected by functional screening of fosmid environmental DNA libraries constructed with metagenomes from thermal environmental samples of Indonesia (Rhee et al., 2005). The recombinant esterase was active from 30 up to 95°C with an optimal pH of approximately 6.0. Mayumi et al. (2008), generated a metagenomic library with the community DNA extracted from biodegradable polyester poly(lactic acid) (PLA) disks buried in compost and found a PLA depolymerase that had an esterase domain. Purified enzyme showed the highest activity at 70°C and degraded not only PLA, but also various aliphatic polyesters, tributyrin, and p-NP esters. As mentioned before, those enzymes able of retaining activity even in the presence of organic solvents are considered very interesting for industrial applications. A new thermophilic organic solvent-tolerant and halotolerant esterase with an optimum pH and temperature of 7.0 and 50°C, respectively, was found in the functional screening of a soil metagenomic library with 48,000 clones (Wang et al., 2013). Apart from these above cited sources, metagenomic esterases, and lipases have been isolated by functional screening of other hot environments like deep-sea hydrothermal vents (Zhu et al., 2013) and hot springs (López-López et al., 2015b) as shown in Table 3. A more extensive review of metagenome derived extremophilic lipolytic enzymes can be found in López-López et al. (2014).

Glycosidases

The enzymes that hydrolyze glycosidic bonds between two or more sugars or a sugar and a nonsugar moiety within carbohydrates or oligosaccharides are known as glycosyl hydrolases (GHs) or glycosidases (Sathya and Khan, 2014). There are 115 GH families, collected in the Carbohydrate Active enZyme database (CAZy; http://www.cazy.org) (Lombard et al., 2014), including a broad number of enzymes like cellulases, β-galactosidases, amylases, and pectinases. Cellulases encompass a group of complex enzymes conformed by endo-β-1,4 glucanases, cellobiohydrolases, cellodextrinases, and β-glucosidases. These enzymes work together to degrade cellulose into simple sugars and their thermostable representatives could be used in biofuel production from lignocellulosic biomass (Bhalla et al., 2013). Several substrates can be employed in plate-based screens for the functional detection of clones harboring cellulase activity, such as carboxymethyl-cellulose in combination with trypan blue, Gram's iodine, or Congo Red. Meddeb-Mouelhi et al. (2014), found that Gram's iodine may lead to the identification of false positives, making Congo Red a more suitable dye for this approach. Using Congo Red dye as a colorimetric substrate, Ilmberger et al. (2012) obtained two fosmid clones derived from a carboxymethyl-cellulose (CMC)-enriched library from a biogas plant. These two fosmids were designated as pFosCelA2 and pFosCelA3, encoding two thermostable cellulases with significant activities in the presence of 30% (v/v) ionic liquids (ILs). This is an interesting property for the cellulose degradation, as cellulose could increase its solubility in the ILs. From the group of cellulases, β-glucosidases have attracted considerable attention in recent years due to their important roles in various biotechnological processes such as hydrolysis of isoflavone glucosides or the production of fuel ethanol from agricultural residues (Singhania et al., 2013). Other uses of β-glucosidases include the cleavage of phenolic and phytoestrogen glucosides from fruits and vegetables for medical applications or to enhance the quality of beverages. An archaeal β-glucosidase (Bgl1) showing activity toward cellobiose, cellotriose, and lactose was isolated from a metagenome from a hydrothermal spring in the island of Sγo Miguel (Azores, Portugal) (Schröder et al., 2014). β-Galactosidases (EC 3.2.1.23), which hydrolyze lactose to glucose and galactose, have two main applications in the food industry: the production of low-lactose milk and dairy products for lactose intolerant people and the generation of galactooligosaccharides from lactose by the transgalactosylation reaction. These enzymes can be also used in the revalorisation of cheese whey (Becerra et al., 2015), a by-product of the dairy industry with a high organic load that can be considered a pollutant. The most widely used substrate for the β-galactosidase screening, 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal), is the substrate providing, in some cases, the lowest number of positive hits in relation to the total number of clones screened (Ferrer et al., 2016). Usually, the positive clones capable of hydrolyzing the X-gal are further tested against ortho-NP-β-galactoside (ONPG) and lactose (Wierzbicka-Woś et al., 2013). Mayor drawbacks for the use of β-galactosidase in industrial processes is the inhibition by the reaction products, leading to a decrease in the reaction rates or even to stop the enzymatic reaction completely. The thermostable β-galactosidase (Gal308) discovered by Zhang et al. (2013) exhibited high tolerance to galactose and glucose with the highest activity at 78°C, an optimum pH of 6.8 and high enzyme activity with lactose as substrate. The authors suggest that these properties would make it a good candidate for the production of low-lactose milk and dairy products. Another novel and thermostable alkalophilic β-D-galactosidase with an optimum temperature at 65°C and with high transglycosylation activity was identified through functional screening of a metagenomic library from a hot spring in northern Himalayan region of India (Gupta et al., 2012). Xylans, made of β-1,4 linked xylopyranoses as a linear backbone with branches, constitute the second most significant group of polysaccharides in plant cell walls and are degraded by xylanases (Sathya and Khan, 2014). These hemicellulolytic enzymes are mostly used as biobleaching agents in the paper and pulp industry. The discovery of thermostable and alkali-stable xylanases has become an important goal in this field since this process requires high temperatures and alkali media, but this is not the only application of thermostable xylanases (Kumar et al., 2016). Functional metagenomics of hot environments represents an interesting source of xylanases. As an example, a novel alkali-stable and thermostable GH-11 endoxylanase encoding gene (Mxyl), was isolated by functional screening of a compost-soil metagenome (Verma et al., 2013). The thermostability of this enzyme was subsequently engineered by directed site mutagenesis (Verma and Satyanarayana, 2013). Amylases are known as enzymes that catalyze the hydrolysis of starch into sugars (Sundarram et al., 2014). A novel and thermostable amylase with the highest activity at 90°C was retrieved from a black smoker chimney by combining fosmid library construction with pyrosequencing (Wang et al., 2011). Another α-amylase was isolated in the functional screening of a metagenomic library of Western Ghats soil constructed in pCC1FOS. This amylase retained 30% activity after incubation for 60 min at 80°C and had an optimal pH of 5.0 and could be potentially used in some industrial processes like liquefaction and saccharification of starch in food industry, or formulation of enzymatic detergents and removing starch from textiles (Vidya et al., 2011).

Proteases

Proteases are protein-hydrolyzing enzymes classified into acidic, neutral, or alkaline groups, based on their optimum pH. They can also be classified into aspartic, cysteine, glutamic, metallo, serine, and threonine protease types based on the amino acids present in their active sites (Singh et al., 2015). These enzymes are widely used in various industries such as detergent, food, and leather (Haddar et al., 2009; George et al., 2014). A thermotolerant, alkali-stable and oxidation resistant protease (CHpro1) was found by functional screening of a metagenomic library constructed from sediments of hot springs in Chumathang area of Ladakh, India (Singh et al., 2015). This enzyme, that showed optimum activity at pH 11 and stability in high alkaline range, could be especially interesting for the detergent industry, as the pH of laundry detergents is generally in the range of 9.0–12.0. This property, in addition to the resistance in the presence of detergent compounds, like oxidizing agents, and the possibility of working at high wash temperatures (optimum activity at 80°C), makes it a very suitable detergent protease.

Oxidoreductases

These enzymes catalyze oxidation-reduction reactions, in which hydrogen or oxygen atoms or electrons are transferred between molecules and are important biocatalysts for several industrial processes. From a pharmaceutical point of view, oxidoreductases can act like quorum-quenching enzymes, degrading signal molecules to block quorum-sensing-dependent infection, as reported by Bijtenhoorn et al. (2011), who found a soil-derived dehydrogenase/reductase implicated in the decreasing of Pseudomonas aeruginosa biofilm formation and virulence of Caenorhabditis elegans. These enzymes can also be used in food industry as they catalyze oxidation- reduction reactions that can play an important role in taste, flavor and nutritional value of aliments such as virgin olive oil (Peres et al., 2015). Another relevant application of oxidoreductases is their role in decomposing specific recalcitrant contaminants by precipitation or by transforming them to other products, leading to a better final treatment of the waste. Some oxidoreductases that can be used for this purpose include peroxidases, polyphenol oxidases, and estradiol dioxygenases (EDOs) (Durán and Esposito, 2000). Suenaga et al. (2007) constructed a metagenomic library from activated sludge used to treat coke plant wastewater containing various organic pollutants like phenol, mono- and polycyclic nitrogen-containing aromatics or aromatic hydrocarbons, among others. The library was screened for EDOs, using catechol as a substrate, yielding 91 EDO-positive clones, 38 of them were sequenced in order to conduct similarity searches using BLASTX. A polyphenol oxidase enzyme, with alkaline laccase activity and highly soluble expression, showing the optimum activity of 55°C, was isolated from a functional screening of DNA from mangrove soil (Ye et al., 2010).

Comparative metagenomics

The increasing number of metagenomes from high-temperature environments sequenced and the possibility of generating more sequences with a lower cost of time and money has enabled the comparison of metagenomic sequences between and within environments, opening a new field in metagenomics. Comparative metagenomics can enlighten how the microbial community taxa or the metabolic potential vary between sampling locations or time points, as well as explain the influence of several factors, such as high temperatures, in the taxonomical and functional composition of an ecosystem. Comparison of metagenomic data recovered from different high temperature habitats indicates that these communities are different with respect to species abundance and microbial composition. However, some groups of species are more commonly represented, for example, bacterial taxa such as Thermotoga, Deinococcus-Thermus, and Proteobacteria, as well as Archaea, like Methanococcus, Thermoprotei, and Thermococcus (Lewin et al., 2013). The comparison between metagenomes derived from six distantly located hot springs of varying temperature and pH revealed a wide distribution of four archaeal viral families, Ampullaviridae, Bicaudaviridae, Lipothrixviridae, and Rudiviridae (Gudbergsdóttir et al., 2016). Even though the important role of viruses in high temperature ecosystems has been demonstrated, the comparative studies are limited since the diversity of thermophilic viruses in many hot environments remains unknown, as revealed by Adriaenssens et al. (2015) in the Namib desert hypoliths metagenome, where the majority of the viral sequence reads were classified as unknown. Comparative metagenomics can also increase our insight into the adaptation of microorganisms to high temperature environments. Xie et al. (2011) compared the sequences obtained from a fosmid metagenomic library of a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge with metagenomes of different environments, including a biofilm of a carbonate chimney from the Lost City hydrothermal vent field (90°C, pH 9–11 fluids). This study revealed that the deep-sea vent chimneys are highly enriched in genes for mismatch repair and homologous recombination, and exhibited a high proportion of transposases. These enzymes, which are critical in horizontal gene transfer, were also abundantly found when comparing the metagenomic data obtained from three different deep-sea hydrothermal vent chimneys (He et al., 2013). This fact supports the previous hypothesis that horizontal gene transfer may be common in the deep-sea vent chimney biosphere and could be an important source of phenotypic diversity (Brazelton and Baross, 2009). Other comparative studies show that, apart from temperature, pH is also an important factor in the composition of microbial communities. A comparison of the biodiversity and community composition in eight geographically remote hot springs (temperature range between 61 and 92°C and pH between 1.8 and 7) showed a decrease in biodiversity with increasing temperature and decreasing pH (Menzel et al., 2015). The loss of biodiversity in hot environments with low pH was also observed by Song et al. (2013), showing a more diverse bacterial population in non-acidic hot springs than in acidic hot springs from the Yunan Province (China). IMG/M (Markowitz et al., 2014) and MG-RAST (Meyer et al., 2008) are two frequently used metagenomics pipelines to easily perform comparative analysis of microbial communities, and can be explored to find sequences of different high temperature environments, since a considerable number of metagenomes are deposited in their databases. MG-RAST has about 240 thousand data sets containing over 800 billion sequences and more than 36 thousand public metagenomes, including 225 metagenomes (0.61%) from different thermophilic biomes with temperatures ranging from 52 to 122°C obtained by whole shotgun and/or amplicon sequencing (data publicly available at MG-RAST server on August 2016). Usually, these studies require statistical tools to explore multivariate data, like principal component analysis (PCA), in order to compare and contrast metagenomes from different environments. PCA is one of the most widely used statistical analyses for genomic data as it is a simple and robust data reduction technique that can be applied to large data sets. A more exhaustive description of some of the statistical analysis that can be used to compare metagenomes can be found in the study by Dinsdale et al. (2013). In this study, the metabolic functions of 212 metagenomes, including six different hot springs, were compared between and within environments using different statistical methods. Several tools like STAMP (statistical analysis of metagenomics profiles, Parks et al., 2014) and PRIMER-E can be used for this purpose, allowing the statistical analysis of multivariate data.

Future perspectives

High-throughput screening methods

Although the new ultra-fast sequencing technologies quickly generate a remarkable number of target gene candidates, functional assays are still needed to confirm them. These assays for protein function represent one of the most reliable and invaluable tools for mining target genes. Thus, developing of high-throughput screening (HTS) methods and improved chromogenic substrates for the detection of thermozymes (Kračun et al., 2015) is a priority for reducing the time invested in primary screening. HTS techniques increase the success of function-based metagenomic screens since they compensate for the often low hit rates in such screens (Ekkers et al., 2012). Apart from conventional high throughput screens, which use microtiter plate wells to store a large number of clones (Ko et al., 2013), microarray-based technologies coupled with microfluidic devices, cell compartmentalization, flow cytometry, and cell sorting are arising as promising new technologies for this purpose (Najah et al., 2014; Meier et al., 2015; Vidal-Melgosa et al., 2015). Microfluidic technologies are of undeniable interest when it comes to reaching screening rates of a million clones per day (Ufarté et al., 2015). This screening method generally uses fluorogenic substrates (Najah et al., 2013) and it is based on the encapsulation of single clones of the metagenomic library in droplets, followed by the substrate induced gene-expression screening and the fluorescence-activated cell sorting to isolate plasmidic clones containing the genes of interest (Colin et al., 2015; Hosokawa et al., 2015). The main advantages of this ultra-fast screening method are the small volume required (usually picoliters to femtoliters) and the capability of detecting intracellular, extracellular, and membrane proteins. This approach could be used for the screening of thermozymes, as droplets can be incubated at high temperature before proceeding to the screening and fluorescence sorting. In this regard, there is an ongoing FP7 Marie Curie Action named HOTDROPS that involves four companies and four academic partners (including the authors' group) aimed to develop a microfluidics-based ultra-high-throughput platform for the selection of thermozymes from metagenomics and directed evolution libraries.

Advanced sequencing

Until recently, most of the sequences collected in reference databases were related to humans and their pathogens. Currently, the advances in sequencing technologies have enabled the generation of considerable amounts of longer reads in less time. This fact, in addition to the lower per base cost and the development of metagenomics, has produced a relevant increase in the number of genomes sequenced and annotated deposited in databases like GenBank, thus covering a high range of microorganisms from a wide variety of habitats, including high temperature environments. Therefore, the bias in the databases toward microorganisms with clinical or pathogenic interest is decreasing, allowing a better analysis of the populations with metagenomics. Furthermore, metagenomics is becoming a tool in reach of many laboratories with the recent release of new cheaper and smaller devices such as the Oxford Nanopore MinION, a USB flash drive-size sequencer that measures deviations in electrical current as a single DNA strand passes through a protein nanopore (Bayley, 2015). However, this technology presents high error rates compared to the others (Goodwin et al., 2015) and still has to be improved. Altogether, these breakthrough developments make metagenomics a more affordable and robust tool to explore the taxonomy and the functional diversity of microbial communities. Nevertheless, the complexity of microbial species, together with the limitations of the technology to cover fully whole genome sequences, still pose a great challenge for metagenome research. NGS technologies have limitations and remain at least an order of magnitude more expensive than other conventional microbiological assays, thus samples often must be individually barcoded and pooled into single runs to decrease costs. All these deficiencies will probably disappear as technologies continue developing like they did in the last years, from the end of the human genome sequencing project in 2003 (Collins et al., 2003), up to now.

Advances in bioinformatics

Due to the massive amount of metagenome data generated in the last 10 years, infrastructural developments associated with managing and serving sequence data are needed. Additionally, the fast growth in the size of data complicates its storage, organization, and distribution. As the volume of metagenomics data keeps growing, new assemblers have been developed, namely MEGAHIT that can assemble large and complex metagenomics data in a time and cost-efficient way, especially on a single-node server (Li et al., 2015). New bioinformatic pipelines designed to support researchers involved in functional and taxonomic studies of environmental microbial communities have been released like BioMaS (Fosso et al., 2015), DUDes (Piro et al., 2016), or MOCAT2 (Kultima et al., 2016), among others. Since there is an increasing number of complex communities sequenced, improved statistical methodology is needed, especially to enhance comparative studies where a large number of covariates (e.g., environmental or host physiological parameters) are collected for each sample.

Author contributions

MD did all the data gathering and write-up. ER and MG supervised and reviewed the manuscript, providing comments and guidance during the manuscript development.

Funding

Funding both from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 324439, and from the Xunta de Galicia (Consolidación D.O.G. 10-10-2012, Contract Number: 2012/118) co-financed by FEDER. The work of MD was supported by a FPU fellowship (Ministerio de Educación Cultura y Deporte) FPU12/05050.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  190 in total

1.  High abundance of heterotrophic prokaryotes in hydrothermal springs of the Azores as revealed by a network of 16S rRNA gene-based methods.

Authors:  Kerstin Sahm; Patrick John; Heiko Nacke; Bernd Wemheuer; Ralf Grote; Rolf Daniel; Garabed Antranikian
Journal:  Extremophiles       Date:  2013-05-26       Impact factor: 2.395

2.  Roles of viruses in the environment.

Authors:  Forest Rohwer; David Prangishvili; Debbie Lindell
Journal:  Environ Microbiol       Date:  2009-11       Impact factor: 5.491

3.  Improvement in thermostability of metagenomic GH11 endoxylanase (Mxyl) by site-directed mutagenesis and its applicability in paper pulp bleaching process.

Authors:  Digvijay Verma T Satyanarayana
Journal:  J Ind Microbiol Biotechnol       Date:  2013-10-08       Impact factor: 3.346

4.  Taxonomic binning of metagenome samples generated by next-generation sequencing technologies.

Authors:  Johannes Dröge; Alice C McHardy
Journal:  Brief Bioinform       Date:  2012-07-31       Impact factor: 11.622

5.  The metagenome-derived enzymes LipS and LipT increase the diversity of known lipases.

Authors:  Jennifer Chow; Filip Kovacic; Yuliya Dall Antonia; Ulrich Krauss; Francesco Fersini; Christel Schmeisser; Benjamin Lauinger; Patrick Bongen; Joerg Pietruszka; Marlen Schmidt; Ina Menyes; Uwe T Bornscheuer; Marrit Eckstein; Oliver Thum; Andreas Liese; Jochen Mueller-Dieckmann; Karl-Erich Jaeger; Wolfgang R Streit
Journal:  PLoS One       Date:  2012-10-24       Impact factor: 3.240

6.  A novel family VII esterase with industrial potential from compost metagenomic library.

Authors:  Chul-Hyung Kang; Ki-Hoon Oh; Mi-Hwa Lee; Tae-Kwang Oh; Bong Hee Kim; Jung- Hoon Yoon
Journal:  Microb Cell Fact       Date:  2011-05-29       Impact factor: 5.328

7.  A novel metagenomic short-chain dehydrogenase/reductase attenuates Pseudomonas aeruginosa biofilm formation and virulence on Caenorhabditis elegans.

Authors:  Patrick Bijtenhoorn; Hubert Mayerhofer; Jochen Müller-Dieckmann; Christian Utpatel; Christina Schipper; Claudia Hornung; Matthias Szesny; Stephanie Grond; Andrea Thürmer; Elzbieta Brzuszkiewicz; Rolf Daniel; Katja Dierking; Hinrich Schulenburg; Wolfgang R Streit
Journal:  PLoS One       Date:  2011-10-26       Impact factor: 3.240

8.  A new generation of versatile chromogenic substrates for high-throughput analysis of biomass-degrading enzymes.

Authors:  Stjepan Krešimir Kračun; Julia Schückel; Bjørge Westereng; Lisbeth Garbrecht Thygesen; Rune Nygaard Monrad; Vincent G H Eijsink; William George Tycho Willats
Journal:  Biotechnol Biofuels       Date:  2015-04-23       Impact factor: 6.040

9.  Metagenomics of an Alkaline Hot Spring in Galicia (Spain): Microbial Diversity Analysis and Screening for Novel Lipolytic Enzymes.

Authors:  Olalla López-López; Kamila Knapik; Maria-Esperanza Cerdán; María-Isabel González-Siso
Journal:  Front Microbiol       Date:  2015-11-20       Impact factor: 5.640

10.  Metagenomic analysis of a tropical composting operation at the são paulo zoo park reveals diversity of biomass degradation functions and organisms.

Authors:  Layla Farage Martins; Luciana Principal Antunes; Renata C Pascon; Julio Cezar Franco de Oliveira; Luciano A Digiampietri; Deibs Barbosa; Bruno Malveira Peixoto; Marcelo A Vallim; Cristina Viana-Niero; Eric H Ostroski; Guilherme P Telles; Zanoni Dias; João Batista da Cruz; Luiz Juliano; Sergio Verjovski-Almeida; Aline Maria da Silva; João Carlos Setubal
Journal:  PLoS One       Date:  2013-04-24       Impact factor: 3.240

View more
  18 in total

1.  Cellulolytic, amylolytic and xylanolytic potential of thermophilic isolates of Surajkund hot spring.

Authors:  Snehi Soy; Vinod Kumar Nigam; Shubha Rani Sharma
Journal:  J Biosci       Date:  2019-10       Impact factor: 1.826

2.  Identification of a novel protease from the thermophilic Anoxybacillus kamchatkensis M1V and its application as laundry detergent additive.

Authors:  Sondes Mechri; Khelifa Bouacem; Nadia Zaraî Jaouadi; Hatem Rekik; Mouna Ben Elhoul; Maroua Omrane Benmrad; Hocine Hacene; Samir Bejar; Amel Bouanane-Darenfed; Bassem Jaouadi
Journal:  Extremophiles       Date:  2019-08-12       Impact factor: 2.395

Review 3.  The endophytome (plant-associated microbiome): methodological approaches, biological aspects, and biotech applications.

Authors:  Thamara de Medeiros Azevedo; Flávia Figueira Aburjaile; José Ribamar Costa Ferreira-Neto; Valesca Pandolfi; Ana Maria Benko-Iseppon
Journal:  World J Microbiol Biotechnol       Date:  2021-10-28       Impact factor: 3.312

Review 4.  Hunt for α-amylase from metagenome and strategies to improve its thermostability: a systematic review.

Authors:  Prayatna Sharma; Krishnendu Mondal; Keshab Chandra Mondal; Nagendra Thakur
Journal:  World J Microbiol Biotechnol       Date:  2022-08-24       Impact factor: 4.253

5.  Characterization of a novel thermophilic metagenomic GH5 endoglucanase heterologously expressed in Escherichia coli and Saccharomyces cerevisiae.

Authors:  Juan-José Escuder-Rodríguez; María González-Suarez; María-Eugenia deCastro; Almudena Saavedra-Bouza; Manuel Becerra; María-Isabel González-Siso
Journal:  Biotechnol Biofuels Bioprod       Date:  2022-07-07

6.  A bifunctional cellulase-xylanase of a new Chryseobacterium strain isolated from the dung of a straw-fed cattle.

Authors:  Hao Tan; Renyun Miao; Tianhai Liu; Lufang Yang; Yumin Yang; Chunxiu Chen; Jianrong Lei; Yuhui Li; Jiabei He; Qun Sun; Weihong Peng; Bingcheng Gan; Zhongqian Huang
Journal:  Microb Biotechnol       Date:  2017-12-04       Impact factor: 5.813

7.  Establishment of a functional system for recombinant production of secreted proteins at 50 °C in the thermophilic Bacillus methanolicus.

Authors:  Marta Irla; Eivind B Drejer; Trygve Brautaset; Sigrid Hakvåg
Journal:  Microb Cell Fact       Date:  2020-07-28       Impact factor: 5.328

Review 8.  Cell-free synthesis of stable isotope-labeled internal standards for targeted quantitative proteomics.

Authors:  Ryohei Narumi; Keiko Masuda; Takeshi Tomonaga; Jun Adachi; Hiroki R Ueda; Yoshihiro Shimizu
Journal:  Synth Syst Biotechnol       Date:  2018-02-21

Review 9.  Cellulases from Thermophiles Found by Metagenomics.

Authors:  Juan-José Escuder-Rodríguez; María-Eugenia DeCastro; María-Esperanza Cerdán; Esther Rodríguez-Belmonte; Manuel Becerra; María-Isabel González-Siso
Journal:  Microorganisms       Date:  2018-07-10

10.  Functional metagenomics identifies an exosialidase with an inverting catalytic mechanism that defines a new glycoside hydrolase family (GH156).

Authors:  Léa Chuzel; Mehul B Ganatra; Erdmann Rapp; Bernard Henrissat; Christopher H Taron
Journal:  J Biol Chem       Date:  2018-09-24       Impact factor: 5.157

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.