Abhishek S Dhoble, Pratik Lahiri, Kaustubh D Bhalerao.
Abstract
BACKGROUND: Flow cytometry is inherently high-throughput and can measure an increasing number of cell parameters at once, so it can surpass the throughput of prevalent genomic and metagenomic approaches in the study of microbiomes. Novel computational approaches to analyzing flow cytometry data will yield greater and more actionable insights than the traditional tools used in microbiome analysis. This paper demonstrates the usefulness of machine learning for analyzing microbial flow cytometry data generated in anaerobic microbiome perturbation experiments.
Keywords: Anaerobic digestion; Deep learning; Flow cytometry; Machine learning; Microbial community fingerprinting; Pattern recognition
Year: 2018; PMID: 30220912; PMCID: PMC6134764; DOI: 10.1186/s13036-018-0112-9
Source DB: PubMed; Journal: J Biol Eng; ISSN: 1754-1611; Impact factor: 4.355
Fig. 1 Unsupervised autoencoder analysis can be used to identify significantly perturbed microbiomes. The y-axis shows the mean squared error (MSE) between the actual values and their reconstruction for each sample tested. The red horizontal line at 0.05 MSE marks the error threshold above which a sample is treated as an outlier
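The thresholding step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the reconstructions have already been produced by some trained autoencoder, the function names `mean_squared_error` and `flag_outliers` are hypothetical, and the feature vectors are made up. Only the 0.05 threshold comes from the caption.

```python
from typing import List, Sequence


def mean_squared_error(actual: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between one sample and its reconstruction."""
    if len(actual) != len(reconstructed):
        raise ValueError("actual and reconstructed must have the same length")
    if len(actual) == 0:
        raise ValueError("a sample must contain at least one feature")
    return sum((a - r) ** 2 for a, r in zip(actual, reconstructed)) / len(actual)


def flag_outliers(
    samples: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
    threshold: float = 0.05,  # threshold taken from the Fig. 1 caption
) -> List[bool]:
    """Return True for each sample whose reconstruction MSE exceeds the threshold."""
    return [
        mean_squared_error(s, r) > threshold
        for s, r in zip(samples, reconstructions)
    ]


# Hypothetical feature vectors (NOT data from the paper).
samples = [[0.10, 0.20, 0.30], [0.90, 0.10, 0.50]]
# Assumed to be the output of an already trained autoencoder.
reconstructions = [[0.12, 0.18, 0.31], [0.40, 0.60, 0.50]]

flags = flag_outliers(samples, reconstructions)
```

In this toy input the first sample is reconstructed closely and stays below the threshold, while the second is reconstructed poorly and is flagged.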
Machine learning model comparison. Values are prediction accuracies on test data; higher is better. (* The demonstrated deep learning model was a feed-forward artificial neural network with three hidden layers)
| Putative Groups | Gradient Boosting | Naïve Bayes | Distributed Random Forests | Deep Learning* |
|---|---|---|---|---|
| Acetogens | 41.87% | 63.87% | 18.00% | 52.67% |
| Acidogens | 91.20% | 97.07% | 53.07% | 99.73% |
| Hydrolyzers | 65.60% | 67.20% | 10.67% | 57.07% |
| Methanogens | 85.17% | 44.75% | 89.33% | 76.83% |
| Overall | 71.26% | 60.44% | 53.55% | 70.55% |
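The Overall row is not the simple average of the four group rows; the Gradient Boosting column, for example, averages to about 70.96% rather than 71.26%. One possible explanation, which the source does not state, is that the groups have different numbers of test samples, so overall accuracy is weighted by group size. The sketch below shows that relationship on made-up labels; `accuracy_by_class` is a hypothetical helper, and it assumes each per-group value is the fraction of that group's samples classified correctly.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def accuracy_by_class(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Dict[str, float], float]:
    """Return (per-class accuracy, overall accuracy) for paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    totals = Counter(y_true)
    # Count correct predictions, keyed by the true class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {label: correct[label] / totals[label] for label in totals}
    overall = sum(correct.values()) / len(y_true)
    return per_class, overall


# Hypothetical labels (NOT data from the paper): four ACIDO and two METHA samples.
y_true = ["ACIDO", "ACIDO", "ACIDO", "ACIDO", "METHA", "METHA"]
y_pred = ["ACIDO", "ACIDO", "ACIDO", "METHA", "METHA", "ACIDO"]

per_class, overall = accuracy_by_class(y_true, y_pred)
unweighted_mean = sum(per_class.values()) / len(per_class)
```

With unequal group sizes, `overall` and `unweighted_mean` differ, which is the same pattern seen between the Overall row and the average of the rows above it.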
Fig. 2 Receiver Operating Characteristic (ROC) curves comparing Gradient Boosting (GB), Naïve Bayes (NB), Distributed Random Forests (DRF) and Deep Learning (DL) (feed-forward artificial neural network with three hidden layers) models on classification of (a) Acetogens (ACETO), (b) Acidogens (ACIDO), (c) Hydrolyzers (HYDRO) and (d) Methanogens (METHA)
Area under the curve (AUC) values for the Receiver Operating Characteristic (ROC) curves shown in Fig. 2, computed on test data (* The demonstrated deep learning model was a feed-forward artificial neural network with three hidden layers)
| Putative Groups | Gradient Boosting | Naïve Bayes | Distributed Random Forests | Deep Learning* |
|---|---|---|---|---|
| Acetogens | 0.7829 | 0.7279 | 0.6482 | 0.7853 |
| Acidogens | 0.9993 | 0.9999 | 0.9833 | 0.9983 |
| Hydrolyzers | 0.9638 | 0.9391 | 0.8055 | 0.9269 |
| Methanogens | 0.8520 | 0.8024 | 0.7773 | 0.8585 |
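For readers unfamiliar with AUC, the value can be read as the probability that a randomly chosen sample from the target group receives a higher score than a randomly chosen sample from outside it. The sketch below computes AUC directly from that definition for a single group treated one-vs-rest. It is an illustration under assumptions, not the procedure used to produce the table: `roc_auc` is a hypothetical helper, and the labels and scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical one-vs-rest setup (NOT data from the paper):
# label 1 marks samples of the target group, 0 marks all other groups.
groups = ["ACIDO", "ACIDO", "METHA", "HYDRO", "ACIDO", "ACETO"]
labels = [1 if g == "ACIDO" else 0 for g in groups]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]  # made-up predicted probabilities

auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, so it is suited to illustration only; rank-based formulations give the same value more efficiently.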
Fig. 3 Box plots of the deep learning (*feed-forward artificial neural network with three hidden layers) classification probabilities for carbon source
Fig. 4 Box plots of the deep learning (*feed-forward artificial neural network with three hidden layers) prediction probabilities for nanoparticle-perturbed communities
Fig. 5 Unsupervised autoencoder analysis of antibiotic-perturbed communities. The y-axis shows the mean squared error (MSE) between the actual values and their reconstruction for each sample tested. The red horizontal line at 17.5 MSE marks the error threshold above which a sample is treated as an outlier