Mehdi Foroozandeh Shahraki1, Shohreh Ariaeenejad2, Fereshteh Fallah Atanaki1, Behrouz Zolfaghari3, Takeshi Koshiba4, Kaveh Kavousi1, Ghasem Hosseini Salekdeh2,5.
Abstract
As the availability of high-throughput metagenomic data increases, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. An in silico screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potential can be explored. Using sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify cellulolytic enzymes in a given high-throughput metagenomic dataset and to characterize them by optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. The tool was also applied in a comparative analysis of four metagenomic sources to estimate their cellulolytic profiles and capabilities. For experimental validation of MCIC's screening and prediction abilities, two identified enzymes from cattle rumen were subjected to cloning, expression, and characterization. To the best of our knowledge, this is the first time that a sequence similarity-based method has been used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. This study highlights the strength of machine learning techniques in predicting enzymatic properties based solely on sequence. MCIC is freely available as a Python package and as a standalone toolkit for Windows and Linux-based operating systems, with several functions to facilitate the screening of cellulases and the prediction of their thermal and pH dependence.
Keywords: MCIC; cellulase; enzyme screening; machine learning; metagenomics; optimum pH; optimum temperature
Year: 2020 PMID: 33193158 PMCID: PMC7645119 DOI: 10.3389/fmicb.2020.567863
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Samples in the temperature (A) and pH (B) datasets were divided into three classes based on their reported optima. The non-redundant union (C) of both datasets had 163 instances with 3 different EC numbers, drawn from 16 different glycoside hydrolase (GH) families. This figure shows the composition of each class or family in the collected datasets.
FIGURE 2. Schematic workflow of building the MCIC’s prediction models.
FIGURE 3. Boxplots of temperature (left) and pH (right) dependence prediction performance computed through 100 sixfold CV tests.
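The repeated cross-validation protocol mentioned in this caption (100 iterations of sixfold CV) can be illustrated with a minimal standard-library sketch. It is not MCIC's own code: the function names `stratified_kfold` and `repeated_cv_splits`, the use of stratification, and the per-repeat seeding are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=6, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Hypothetical helper: the source does not say whether folds were stratified.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # keeps overall fold sizes balanced across classes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin over the folds.
        for pos, idx in enumerate(members):
            folds[(pos + offset) % k].append(idx)
        offset += len(members)
    return folds


def repeated_cv_splits(labels, k=6, repeats=100):
    """Yield (repeat, train_indices, test_indices) for each fold of each repeat."""
    n = len(labels)
    for rep in range(repeats):
        # A different seed per repeat gives a different shuffle each time.
        folds = stratified_kfold(labels, k=k, seed=rep)
        for test in folds:
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            yield rep, train, sorted(test)
```

Each of the 100 repeats yields six train/test splits, so a model would be fitted and scored 600 times; the spread of those scores is the kind of distribution a boxplot summarizes.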
MCIC’s prediction performance during 100 iterations of sixfold CV tests.
| Metric (CV performance) | Topt | pHopt |
| Accuracy | 0.75 | 0.71 |
| Macro-recall | 0.73 | 0.62 |
| Macro-precision | 0.77 | 0.60 |
| Macro-F1 | 0.73 | 0.60 |
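For readers unfamiliar with macro-averaged metrics, the sketch below shows one common way to compute the four quantities reported in this table for a three-class problem. It is an illustration, not MCIC's implementation: the function name `macro_metrics` is hypothetical, and macro-F1 is assumed here to be the unweighted mean of per-class F1 scores, which the source does not specify.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged recall, precision, and F1.

    Assumption: macro-F1 is the unweighted mean of per-class F1 scores.
    """
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    recalls, precisions, f1s = [], [], []

    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (zero denominator) are scored as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_classes = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_recall": sum(recalls) / n_classes,
        "macro_precision": sum(precisions) / n_classes,
        "macro_f1": sum(f1s) / n_classes,
    }
```

Because every class contributes equally to a macro average regardless of its size, these scores can fall below accuracy when a small class is predicted poorly.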
Two novel cellulolytic enzymes were discovered in the camel rumen metagenome and are being investigated in other studies. This table lists the detected enzymatic function of each enzyme and compares the predicted thermal and pH dependence with the experimentally measured temperature and pH optima. MCIC correctly predicted both attributes for each enzyme.
| Enzyme name | Enzymatic function | Predicted thermal dependence | Predicted pH dependence | Optimum temperature (°C) | Optimum pH |
| PersiCel5 | Endo-glucanase | Mesophilic | Neutral | 50 | 6.5 |
| PersiCel6 | Endo-glucanase | Thermophilic | Neutral | 70 | 7.5 |