Elettra Barberis, Shahzaib Khoso, Antonio Sica, Marco Falasca, Alessandra Gennari, Francesco Dondero, Antreas Afantitis, Marcello Manfredi.
Abstract
Recent technological innovations in mass spectrometry have supported the use of metabolomics analysis for precision medicine. This growth has also been enabled by the application of algorithms to data analysis, including multivariate and machine learning methods, which are fundamental to managing large numbers of variables and samples. In the present review, we report and discuss the application of artificial intelligence (AI) strategies to metabolomics data analysis. In particular, we focus on widely used non-linear machine learning classifiers, such as artificial neural network (ANN), random forest, and support vector machine (SVM) algorithms. A discussion of recent studies focused on disease classification, biomarker identification, and early diagnosis is presented. Challenges in the implementation of metabolomics-AI systems, their limitations, and recent tools are also discussed.
Keywords: artificial intelligence; biomarkers; machine learning; metabolomics; precision medicine
Year: 2022 PMID: 36232571 PMCID: PMC9569627 DOI: 10.3390/ijms231911269
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Machine learning model training and prediction of a new sample using metabolomics analysis of biological samples with mass spectrometry and nuclear magnetic resonance.
Figure 2. Machine learning application using a stepwise process for biomarker discovery and diagnostic modeling. The machine learning process begins with the input of a dataset generated by various platforms; the data are then subjected to a feature selection algorithm to reduce dimensionality and obtain an optimal subset of features, which is used to build a robust classification model and to discover biomarkers.
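The stepwise process described in the Figure 2 caption (dataset input, feature selection, classification model) can be sketched with scikit-learn, one of the libraries listed among the tools in this review. This is a minimal illustration under stated assumptions, not the workflow of any reviewed study: the data are synthetic, and the choice of a univariate F-test filter, the number of retained features (15), and the random forest settings are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a metabolite intensity matrix (assumption):
# 120 samples x 200 features, of which only 10 carry class information.
X, y = make_classification(
    n_samples=120,
    n_features=200,
    n_informative=10,
    n_redundant=0,
    class_sep=2.0,
    random_state=0,
)

# Step 1: scale features. Step 2: keep the 15 features with the highest
# ANOVA F-score. Step 3: train a random forest on the reduced feature set.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=15)),
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Because selection lives inside the pipeline, it is refit on each training
# fold, so the held-out fold never influences which features are chosen.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

# Refit on all samples to inspect which feature columns were retained.
pipeline.fit(X, y)
selected_idx = pipeline.named_steps["select"].get_support(indices=True)

print(f"Mean cross-validated accuracy: {mean_accuracy:.3f}")
print(f"Retained feature columns: {selected_idx.tolist()}")
```

Keeping feature selection inside the cross-validated pipeline is a deliberate design choice here: selecting features on the full dataset before splitting tends to inflate performance estimates, which matters when features far outnumber samples.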
Recent studies that combined metabolomics data and machine learning algorithms to discover biomarkers. The feature selection algorithm, number of proposed biomarkers, and type of disease are reported.
| Feature Selection Algorithm | Number of Proposed Biomarkers | Study | Disease |
|---|---|---|---|
| Random Forest | 18 metabolites as biomarkers | [ | Weight gain (metabolic disorder) |
| Fast correlation-based filter (FCBF) | 5 biomarkers | [ | Lung cancer |
| Recursive feature elimination and PLS regression | 10 metabolites as biomarkers | [ | Renal cell carcinoma |
| MUVR algorithm | 13 metabolites | [ | Gout and asymptomatic hyperuricemia |
| PLS-DA | 26 metabolites and lipids | [ | Rheumatoid arthritis |
| SVM-RFE (recursive feature elimination) | 16 metabolites | [ | Epithelial ovarian cancer |
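As an illustration of one entry in the table above, SVM-RFE repeatedly fits a linear SVM and discards the feature with the smallest weight until a target panel size remains. The sketch below uses scikit-learn's `RFE` wrapper around a linear-kernel `SVC`. It is an assumption-laden example on synthetic data, not a reproduction of the cited study; the panel size of 16 only mirrors the number reported in the table, and the remaining settings are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for metabolite intensities (assumption):
# 100 samples x 50 features, 5 of which are informative.
X, y = make_classification(
    n_samples=100,
    n_features=50,
    n_informative=5,
    n_redundant=0,
    random_state=1,
)

# SVM weights are scale-sensitive, so standardize before ranking features.
X_scaled = StandardScaler().fit_transform(X)

# A linear kernel exposes per-feature coefficients, which RFE uses to decide
# which feature to drop at each iteration (step=1 removes one at a time).
svm = SVC(kernel="linear", C=1.0)
rfe = RFE(estimator=svm, n_features_to_select=16, step=1)
rfe.fit(X_scaled, y)

# support_ marks retained features; ranking_ is 1 for retained features and
# increases for features eliminated earlier.
candidate_idx = [i for i, keep in enumerate(rfe.support_) if keep]

print(f"Candidate biomarker columns: {candidate_idx}")
```

For an honest performance estimate, the scaling and elimination steps would need to be nested inside cross-validation rather than run once on all samples as done here for brevity.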
Recent applications of machine learning algorithms coupled to metabolomics for the diagnosis of diseases. The disease, the best-performing algorithm, the other models tested, and the type of sample analyzed are reported.
| Disease | Best-Performing Model | Other Models | Reference | Sample Type |
|---|---|---|---|---|
| Zika virus | RF | SVM (sequential minimal optimization and iterative single data algorithm), Decision Trees | [ | Serum |
| Colorectal cancer | RF | PLS, LDA, SVM | [ | Urine |
| Paracoccidioidomycosis | RF | N/A | [ | Serum |
| Malignant Mesothelioma | RF | N/A | [ | Plasma |
| Diabetic cognitive impairments | SVM | N/A | [ | Urine |
| Benzylpenicillin and multidrug resistance of Staphylococcus aureus | SVM (radial basis function), Logistic regression, Neural network | Random Forest, Linear SVM, AdaBoost, Quadratic discriminant analysis (QDA), Linear discriminant analysis (LDA), Naïve Bayes, Decision Tree | [ | Milk |
| Intrauterine growth restriction | SVM (Radial basis function) | N/A | [ | Cord blood serum |
| Parkinson's disease | Neural Network | N/A | [ | Plasma |
| Small-cell lung cancer (SCLC) and non-small-cell lung cancer | Neural Network | N/A | [ | Sputum |
| Lung cancer | Naïve Bayes | Random Forest, SVM, Neural Network, KNN, AdaBoost | [ | Plasma |
| Renal Cell Carcinoma Status Prediction | KNN | Random forest (RF), linear kernel support vector machine (SVM-Lin) | [ | Urine |
| Gout from asymptomatic hyperuricemia | SVM | Logistic regression, Random Forest | [ | Serum |
| Irritable bowel syndrome | Combination of Logistic regression and Random Forest | Random Forest, Logistic regression | [ | Faecal samples |
| Autoimmune diseases | Artificial neural network and Logistic regression | N/A | [ | Plasma |
| Multiple sclerosis | Random Forest | GLM, PLS-DA, PCA-LDA | [ | Plasma |
| Intracerebral hemorrhage from Acute Cerebral Infarction | Neural Network | N/A | [ | Dried blood spot (DBS) |
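Several rows of the table above name a best-performing model alongside other models that were tested. One common way to make such a comparison is to score every candidate on identical cross-validation folds. The following is a minimal sketch assuming synthetic data, default-like hyperparameters, and ROC AUC as the metric; none of these choices are taken from the reviewed studies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class dataset standing in for case/control samples (assumption).
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8, random_state=2
)

# Candidate classifiers; scale-sensitive models are wrapped with a scaler so
# that scaling is refit within each training fold.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=2),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# A fixed, seeded splitter gives every model exactly the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

best_model = max(results, key=results.get)
for name, auc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:20s} mean ROC AUC = {auc:.3f}")
print(f"Best on this synthetic dataset: {best_model}")
```

Which model ranks first depends on the dataset, which is consistent with the table: different studies report different best-performing algorithms.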
Tools and libraries used in metabolomics studies for the application of machine learning algorithms.
| Tool/Library | Purpose of Use in Studies | Programming Language | Programming Skills Required | Metabolomics Studies |
|---|---|---|---|---|
| Weka | Classification/feature selection | Java | No | [ |
| KNIME | Data processing | Java | No | [ |
| Orange data mining | Classification | Python, Cython, C++, C | No | [ |
| Scikit-learn | Data processing/Classification | Python | Yes | [ |
| TPOT | Classification/feature selection | Python | Yes | [ |
| Caret | Classification/feature selection | R | Yes | [ |
| Keras and TensorFlow | Data processing/Peak identification | Python, R | Yes | [ |