Jörn Lötsch, Dario Kringel, Thomas Hummel.
Abstract
The complexity of the human sense of smell is increasingly reflected in complex and high-dimensional data, which opens opportunities for data-driven approaches that complement hypothesis-driven research. Contemporary developments in computational and data science, with machine learning as their currently most popular implementation, facilitate complex data-driven research approaches. The use of machine learning in human olfactory research has comprised 5 major approaches: 1) the study of the physiology of pattern-based odor detection and recognition processes, 2) pattern recognition in olfactory phenotypes, 3) the development of complex disease biomarkers including olfactory features, 4) odor prediction from physico-chemical properties of volatile molecules, and 5) knowledge discovery in publicly available big databases. A limited set of unsupervised and supervised machine-learned methods has been used in these projects; however, the increasing use of contemporary methods of computational science is reflected in a growing number of reports employing machine learning for human olfactory research. This review provides key concepts of machine learning and summarizes current applications to human olfactory data.
Year: 2019 PMID: 30371751 PMCID: PMC6295796 DOI: 10.1093/chemse/bjy067
Source DB: PubMed Journal: Chem Senses ISSN: 0379-864X Impact factor: 3.160
Table 1. Reports of human olfactory research in which machine-learned methods were used
| Olfactory context | Analyzed problem | Machine-learning methods of main data analysis | Ref. |
|---|---|---|---|
| Physiology of pattern-based odor detection and recognition | Relationship between odorant response and mutations in olfactory receptors | Naive Bayes, neural networks, SVM, kNN, meta learning, decision trees (CART) | ( |
| | Odor recognition and discrimination | Gnostic fields as a sub-neuronal network derived algorithm | ( |
| | Ordering olfactory stimuli according to descriptors of odor perception | Multidimensional scaling and self-organizing maps | ( |
| | Prediction of the affective component of an odor from EEG-derived responses to control olfactory stimulation | Principal component analysis-based feature selection, linear discriminant analysis-based classifier | ( |
| | Prediction of personalized olfactory perception | RF classifier followed by Pearson’s correlation | ( |
| | Prediction of the activity of chemicals for a given odorant receptor | SVM algorithm | ( |
| Pattern recognition in olfactory phenotypes | Prediction of olfactory diagnosis and underlying etiologies from olfactory subtest results | Emergent self-organized (Kohonen) maps (ESOM) | ( |
| | Determination of pattern and types of odor affected in Alzheimer’s disease | RF with recursive feature elimination algorithm (RF-RFE) | ( |
| | Pattern recognition by construction of an olfactory bionic model and a 3-layered cortical model, mimicking the main features of the olfactory system | Three artificial neural networks (ANNs): back-propagation network, SVM classifier, and a radial basis function classifier | ( |
| Olfactory acuity as diagnostic biomarker | Early Parkinson diagnosis from multimodal features including olfaction | Naive Bayes classifier, logistic regression, adaptive boosted trees, RFs, and SVM | ( |
| | Diagnosis of Parkinson’s disease from olfactory phenotypes | SVM, linear discriminant analysis | ( |
| | Feature selection for predicting rapid progression of Parkinson’s disease using public data from the Parkinson’s Progression Markers Initiative, including olfactory parameters | Feature selection using the so-called wrapper approach with decision tree and naive Bayes based methods, classification using the C4.5 decision tree algorithm | ( |
| | Odor identification as diagnostic tool for Parkinson’s disease | Random-forests classifier | ( |
| Odor recognition from physico-chemical properties of volatile molecules (electronic noses) | Prediction of odors from molecular properties of odorant molecules separated by means of gas chromatography | Feature selection using RF, classification using RF, SVM, extreme learning machines | ( |
| | Prediction of olfactory perception from chemical features of odor molecules | Regularized linear models, random-forests | ( |
| | Relationships between molecular structure and perceived odor quality of ligands for a human olfactory receptor | 3D pharmacophore-based molecular modeling techniques | ( |
| | Evaluation of the applicability of composite odors | ANN | ( |
| | Application of sensory evaluation by an electronic nose for quality detection in citrus fruits | RF based on bootstrap sampling | ( |
| Knowledge discovery in publicly available big databases | Biological roles exerted by the genes expressed in the human olfactory bulb | Over-representation analysis | ( |
The list was obtained from a PubMed search at https://www.ncbi.nlm.nih.gov/pubmed on 21 September 2018 for “(machine-learn* OR machine learn* OR support* vector machines OR svm OR naive bayes OR bayes OR random forest* OR knn OR k nearest neighbor* OR k-nearest neighbor* OR adaptive boosting OR boosting OR boosted tree* OR decision tree* OR deep learning OR artificial neural network*) AND (smell OR olfact*) AND (human OR patient OR volunt*) NOT review[Publication Type]”. The search returned 57 hits, which were then cleaned to remove reports whose focus was not olfactory research, such as those using olfactory data only as an example for method validation.
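The search string above can be assembled programmatically, which makes its three AND-linked term groups easier to audit. The following is a minimal sketch in Python, not the authors' procedure: the function names are hypothetical, and the NCBI E-utilities ESearch endpoint is an assumption, since the source only states that the PubMed web site was used. The code builds the URL but does not send a request, and a query run today would not be expected to return the 57 hits reported for 21 September 2018.

```python
from urllib.parse import urlencode

# Term groups copied from the search string in the table footnote.
METHOD_TERMS = [
    "machine-learn*", "machine learn*", "support* vector machines", "svm",
    "naive bayes", "bayes", "random forest*", "knn", "k nearest neighbor*",
    "k-nearest neighbor*", "adaptive boosting", "boosting", "boosted tree*",
    "decision tree*", "deep learning", "artificial neural network*",
]
SENSE_TERMS = ["smell", "olfact*"]
SUBJECT_TERMS = ["human", "patient", "volunt*"]


def or_group(terms):
    """Join terms into one parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query():
    """Combine the three OR groups with AND and exclude reviews."""
    return (
        f"{or_group(METHOD_TERMS)} AND {or_group(SENSE_TERMS)} "
        f"AND {or_group(SUBJECT_TERMS)} NOT review[Publication Type]"
    )


def build_esearch_url(query, retmax=100):
    """Build an ESearch URL (assumed endpoint; no request is sent here)."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query())
    print(build_esearch_url(build_query()))
```

Keeping the term groups as lists means a term can be added or removed without re-balancing parentheses by hand.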
Figure 1. Overview of approaches to data processing, pursued either by statistics (left part) or by machine learning (right part). As explained previously (Chollet and Allaire 2018), statistics applies preselected rules, mathematical methods, or statistical algorithms to data with the aim of obtaining an answer about a preformulated hypothesis, such as a difference in a parameter among diagnostic groups of subjects. By contrast, in typical machine-learning tasks, data are provided together with the answers, such as the membership in a diagnostic group, with the aim of obtaining rules or algorithms that can provide the diagnostic group membership from new data for which it is not yet known. Such artificial intelligence or machine learning–based algorithms can take several different forms. The icons in the third line of the right part of the figure symbolize typical machine-learning methods, i.e., multilayer neural networks, decision tree–based algorithms, algorithms such as SVM that separate the classes by placing a hyperplane between them, or prototype-based algorithms such as k nearest neighbors that compare the feature vector carried by a case with those carried by other cases and assign the class on the basis of the classes to which the cases with the most similar feature vectors belong. In human olfactory research, machine-learned algorithms have been applied to obtain answers that can be classified into 5 groups (bottom line of the right part of the figure; Table 1).
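The prototype-based idea described in the caption, assigning a class from the classes of the cases with the most similar feature vectors, can be illustrated with a minimal k nearest neighbors sketch in Python. The training data are hypothetical: three invented subtest scores per subject, with diagnosis labels chosen only for illustration. They are not taken from any of the reviewed studies.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Assign the majority class among the k training cases nearest to query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, most_common keeps the class that was encountered first,
    # i.e., the class of the nearer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical training cases: three invented olfactory subtest scores each.
train_x = [
    (9.0, 13.0, 14.0), (8.5, 12.0, 13.0), (10.0, 14.0, 15.0),
    (4.0, 9.0, 8.0), (5.0, 8.0, 9.0), (3.5, 9.0, 7.0),
    (1.0, 4.0, 3.0), (1.5, 5.0, 4.0), (1.0, 3.0, 4.0),
]
train_y = ["normosmia"] * 3 + ["hyposmia"] * 3 + ["anosmia"] * 3

if __name__ == "__main__":
    # A new case whose diagnosis is "not yet known".
    print(knn_predict(train_x, train_y, (4.5, 8.5, 8.0), k=3))
```

The labeled training cases play the role of "data provided together with the answers", and `knn_predict` is the learned rule applied to a new case.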
Figure 2. An example of unsupervised machine learning applied to data related to human olfaction. Representation of the pattern of olfactory subtest results obtained by projecting the data points onto a toroid neural grid. The data originate from a previous analysis of patterns in olfactory subtests acquired in 10,714 subjects (Lötsch et al. 2016). For the present graphical demonstration, a subset of 5% of these data was randomly drawn in a class-proportional manner, i.e., preserving the relative numbers of subjects with normosmia, hyposmia, or anosmia. (A) The projection was obtained using a parameter-free polar swarm, Pswarm, consisting of so-called DataBots (Thrun 2018), which are self-organizing artificial “life forms” that carry vectors of the individual olfactory subtest results. During the learning phase, the DataBots were allowed to adaptively adjust their location on the grid, with a successively decreasing search radius, so that they moved close to DataBots carrying data with similar features according to the Euclidean distance. When the algorithm ended, the DataBots became projected points. To enhance the emergence of data structures on this projection, a U-matrix (Ultsch 2003; Lötsch and Ultsch 2014) displaying the distance in the high-dimensional space was added as a third dimension. It was colored in a geographical map analogy with brown or snow-covered heights and green valleys with blue lakes. Watersheds indicate borderlines between different groups of subjects, suggesting 3 clusters. (B) Hierarchical clustering of the projected data also indicated 3 clusters, supporting the machine-learned results shown in A. (C) Mosaic plot representing a contingency table of the olfactory diagnoses versus the machine-learned clusters of olfactory subtest results. The size of the cells is proportional to the number of subjects included.
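The class-proportional 5% draw mentioned in the caption corresponds to stratified random sampling. Below is a minimal Python sketch of that step only. It is not the authors' R code, and the class sizes are invented for illustration rather than taken from the 10,714 subjects of the original data set.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subset that preserves class proportions.

    Each class contributes round(fraction * class_size) cases, but at least 1.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        n = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, n))  # sampling without replacement
    return sorted(chosen)


# Hypothetical diagnosis labels; the class sizes are illustrative assumptions.
labels = ["normosmia"] * 600 + ["hyposmia"] * 300 + ["anosmia"] * 100

if __name__ == "__main__":
    subset = stratified_subsample(labels, fraction=0.05)
    print(Counter(labels[i] for i in subset))
```

Sampling within each class separately keeps the 6:3:1 ratio of the full hypothetical set in the subset, which a simple random draw would only approximate.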
The calculations and figure creation were performed using the R software package (version 3.4.3 for Linux; http://CRAN.R-project.org/; R Development Core Team 2008), in particular the library “DatabionicSwarm” (https://cran.r-project.org/package=DatabionicSwarm [Thrun 2018]). The figure reproduces results of a previous analysis of the same data set (Lötsch et al. 2016) but uses a different unsupervised machine-learning method to avoid redundancy.