Klemen Pečnik, Vesna Todorović, Maša Bošnjak, Maja Čemažar, Igor Kononenko, Gregor Serša, Janez Plavec.
Abstract
Machine learning models in metabolomics, despite their high prediction accuracy, are still not widely adopted owing to the lack of efficient explanations for their predictions. In this study, we propose using the general explanation method to explain the predictions of a machine learning model and thereby gain detailed insight into metabolic differences between biological systems. The method was tested on a dataset of 1H NMR spectra acquired on normal lung and mesothelial cell lines and their tumor counterparts. Initially, random forest and artificial neural network models were applied to the dataset, and excellent prediction accuracy was achieved. The predictions of the models were then explained with the general explanation method, which enabled identification of discriminating metabolite concentration differences between individual cell lines and the construction of their specific metabolic concentration profiles. This intuitive and robust method holds great promise for in-depth understanding of the mechanisms that underlie phenotypes as well as for biomarker discovery in complex diseases.
Keywords: NMR spectroscopy; cancer; general explanation method; machine learning; metabolomics
Year: 2018 PMID: 30067305 PMCID: PMC6220813 DOI: 10.1002/cbic.201800392
Source DB: PubMed Journal: Chembiochem ISSN: 1439-4227 Impact factor: 3.164
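The abstract does not spell out how the general explanation method assigns contributions to features. As a rough illustration only, the sketch below assumes a sampling-based, Shapley-style estimate: for one sample, a feature's contribution is the average change in the model output when that feature takes its true value instead of a value drawn from a reference sample, over random feature orderings. The names `feature_contribution`, `toy_predict`, and `background`, as well as the toy linear model and its data, are hypothetical and are not taken from the study.

```python
import random


def feature_contribution(predict, x, background, i, n_samples=2000, seed=0):
    """Monte Carlo estimate of how much feature i contributes to predict(x).

    predict    -- any callable mapping a list of feature values to a number
    x          -- the sample whose prediction is being explained
    background -- list of reference samples (hypothetical stand-in for the dataset)
    """
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice(background)      # random reference sample
        order = list(range(n))
        rng.shuffle(order)              # random feature ordering
        preceding = set(order[:order.index(i)])
        # Features preceding i keep the values of x; the rest come from z.
        with_i = [x[j] if (j in preceding or j == i) else z[j] for j in range(n)]
        without_i = [x[j] if j in preceding else z[j] for j in range(n)]
        total += predict(with_i) - predict(without_i)
    return total / n_samples


def toy_predict(v):
    # Hypothetical stand-in for a trained model; the third feature is ignored.
    return 2.0 * v[0] - 1.0 * v[1] + 0.0 * v[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 3.0, 0.0]]
x = [3.0, 1.0, 5.0]
contributions = [feature_contribution(toy_predict, x, background, i) for i in range(3)]
print(contributions)
```

For this linear toy model, the estimates should land near 4 for the first feature and near 0 for the second, and the third should be exactly 0 because the model ignores it. With a trained classifier, `predict` would instead return the predicted probability of one cell line class, and averaging the contributions over all samples of a cell line would give per-feature averages of the kind the figure captions refer to.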
Overview of cell lines.
| | A549 | WI‐38 | MeT‐5A | MSTO‐211H | NCI‐H2052 |
|---|---|---|---|---|---|
| cell type | epithelial | fibroblast | epithelial | fibroblast | epithelial |
| disease | carcinoma | normal | normal | mesothelioma | mesothelioma |
| no. of samples | 23 | 23 | 8 | 4 | 5 |
| sample nos. | 1–23 | 24–46 | 47–54 | 55–58 | 59–63 |
Figure 1. Representative 1H NMR spectrum of the WI‐38 cell line sample (index 32). Orange bars represent the 25 important features with the highest average contributions, indicated with numbers at the top of the spectrum.
Figure 2. Average contribution values, calculated by the general explanation method (GEM), of important features for the A) random forest (RF) and B) artificial neural network (ANN) models for all cell line types (indicated on top). Average contribution values are presented numerically at the top of the bars.
Figure 3. Feature values of the 15 important features for the RF model for all samples. Cell line types are indicated at the top. Plots for features 82 and 101 contain sample indexes indicated with numbers. *** p<0.001, one‐way ANOVA.
Figure 4. Feature values of the 15 important features for the ANN model for all samples. Cell line types are indicated at the top. *** p<0.001, one‐way ANOVA.
Figure 5. Changes in concentrations of the metabolites that discriminate cell lines according to the RF and ANN models. Upward‐pointing arrows indicate an increased concentration of a metabolite, whereas downward‐pointing arrows indicate a decreased concentration. The blue line around the lungs represents the mesothelium, and the cyan part on the right lung represents pleural effusion.