Yassin Mreyoud1, Myoungkyu Song2, Jihun Lim3, Tae-Hyuk Ahn1,4.
Abstract
The diversity within microbiome communities that drive biogeochemical processes influences many different phenotypes. Analyses of these communities and their diversity by countless microbiome projects have revealed an important role for metagenomics in understanding the complex relationship between microbes and their environments. This relationship can be understood in the context of the microbiome composition of specific known environments, and these compositions can then serve as templates for predicting the status of similar environments. Machine learning has been a key component of this predictive task, and several analysis tools utilizing machine learning methods for metagenomic analysis have already been published. Despite these previously proposed models, however, the performance of deep neural networks remains under-researched. Given the nature of metagenomic data, deep neural networks could substantially boost prediction accuracy in metagenomic analysis applications. To meet this demand, we present a deep learning-based tool that utilizes a deep neural network implementation for phenotypic prediction of unknown metagenomic samples. (1) First, our tool takes as input taxonomic profiles from 16S or WGS sequencing data. (2) Second, given the samples, it builds a model based on a deep neural network by computing multi-level classification. (3) Lastly, given the model, it classifies an unknown sample with its unlabeled taxonomic profile. In benchmark experiments, an analysis method employing a deep neural network such as our tool showed promising results, increasing prediction accuracy on several samples compared to other machine learning models.
Keywords: deep learning; metagenomics; phenotype prediction; sample classification
Year: 2022 PMID: 35629336 PMCID: PMC9143510 DOI: 10.3390/life12050669
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Breakdown of the three datasets used in this study. Listed are the sample size, the control/diseased split, and the accession number for accessing the raw data.
| Dataset | Sample Size | Controls | Diseased | Accession # |
|---|---|---|---|---|
| Cirrhosis | 232 | 118 | 114 | ERP005860 |
| T2D | 440 | 217 | 223 | SRA045646 |
| Purina | 3096 | 1536 | 1560 | PRJEB20308 |
Figure 1. The general workflow of our analysis and DNN tool from sampling to prediction. The first panel shows the selection of samples from a training and test dataset. The second panel shows the sequencing of the samples to produce a taxonomic profile. Finally, the last panel shows how the processed data are used in our tool to produce a predictive model for classification tasks.
Figure 2. A high-level representation of our deep neural network. The modularity of the network is represented by the black dots, which indicate a user-defined number of nodes in each layer. Although an arbitrary number of layers may be selected, only three are shown in this figure.
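The user-configurable architecture described above (a chosen number of layers, each with a chosen number of nodes) can be sketched as follows. This is a minimal stand-in using scikit-learn's `MLPClassifier` on synthetic abundance-like data; the actual MegaD implementation, layer sizes, and training hyperparameters are assumptions here, not taken from the paper.

```python
# Sketch of a modular feed-forward network as in Figure 2: the user picks
# the number of hidden layers and the nodes per layer. Data are synthetic
# stand-ins for taxonomic relative-abundance profiles (200 samples x 50 taxa).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 50))                       # toy taxonomic profiles
y = (X[:, :5].sum(axis=1) > 2.5).astype(int)    # synthetic control/diseased labels

# Three hidden layers of 64, 32, and 16 nodes; purely illustrative choices.
hidden_layers = (64, 32, 16)
model = MLPClassifier(hidden_layer_sizes=hidden_layers,
                      max_iter=500, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # training accuracy on the toy data
```

Changing `hidden_layers` to any other tuple changes the depth and width of the network, mirroring the modularity indicated by the black dots in the figure.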
Cross-validation accuracy for each software tool on Cirrhosis and type II diabetes datasets. Each tool was accessed on 27 April 2022.
| Software | Software Available | T2D | Cirrhosis |
|---|---|---|---|
| MegaD (v1.0) | | 0.700 | 0.833 |
| MegaR (v1.0) | | 0.670 | 0.885 |
| PopPhy (v1.0) | | 0.650 | 0.910 |
| MetAML (v1.0) | | 0.664 | 0.877 |
Performance of MegaD on the Purina dataset using default parameters and the grid search function. Listed are the accuracy, area under the ROC curve (AUC), precision, and F1 score.
| Parameters | Accuracy | AUC | Precision | F1 Score |
|---|---|---|---|---|
| Default parameters | 0.941 | 0.959 | 0.9103 | 0.940 |
| Grid search | 0.987 | 0.986 | 0.981 | 0.987 |
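A hyperparameter grid search of the kind behind the "Grid search" row above can be sketched as follows. The parameter grid, model, and data below are illustrative assumptions, not MegaD's actual search space; scikit-learn's `GridSearchCV` stands in for the tool's grid search function.

```python
# Sketch of a grid search over DNN hyperparameters with cross-validation.
# The grid here is a small illustrative example, not MegaD's actual grid.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((150, 30))                   # toy taxonomic profiles
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # synthetic labels

param_grid = {
    "hidden_layer_sizes": [(32,), (64, 32)],  # candidate architectures
    "alpha": [1e-4, 1e-3],                    # candidate L2 penalties
}
search = GridSearchCV(MLPClassifier(max_iter=300, random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each grid point is evaluated by cross-validation, and the best-scoring configuration is refit on the full training set, which is how a tuned model can outperform the default-parameter one.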
Figure 3. (A) The ROC curve for our model on the Purina dataset using the grid search function. (B) The confusion matrix showing the model’s misclassification rate for the Purina dataset, broken down into true positives, false positives, false negatives, and true negatives.
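The two quantities shown in Figure 3, the ROC AUC and the confusion-matrix cell counts, can be computed from a model's predicted scores as sketched below. The labels and scores here are small synthetic examples, not the Purina results.

```python
# Sketch: computing ROC AUC and confusion-matrix counts (TN, FP, FN, TP)
# from predicted scores, as visualized in Figure 3. Data are synthetic.
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true  = [0, 0, 0, 1, 1, 1, 1, 0]                  # toy ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6, 0.2]  # toy predicted scores
y_pred  = [int(s >= 0.5) for s in y_score]           # threshold at 0.5

auc = roc_auc_score(y_true, y_score)                 # uses scores, not labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(auc, (tn, fp, fn, tp))  # every positive outranks every negative here
```

Note that the ROC curve (and its AUC) is computed from the continuous scores across all thresholds, while the confusion matrix depends on a single chosen threshold.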