Jonathan Bac, Andrei Zinovyev.
Abstract
Machine learning deals with datasets characterized by high dimensionality. However, in many cases the intrinsic dimensionality of the datasets is surprisingly low. For example, the dimensionality of a robot's perception space can be large and multi-modal, but its variables can have more or less complex non-linear interdependencies. Thus, multidimensional data point clouds can be effectively located in the vicinity of principal varieties possessing locally small dimensionality but having a globally complicated organization, which is sometimes difficult to represent with regular mathematical objects (such as manifolds). We review modern machine learning approaches for extracting low-dimensional geometries from multi-dimensional data and their applications in various scientific fields.
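The gap between ambient and intrinsic dimensionality described above can be illustrated with a crude PCA-based estimator (a minimal sketch, not a method from the review; the 95% variance threshold and the two toy datasets are illustrative assumptions):

```python
import numpy as np

def pca_dim(pts, thresh=0.95):
    """Number of principal components needed to explain `thresh` of the
    total variance -- a crude global intrinsic-dimension estimate."""
    centered = pts - pts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)      # singular values
    ratios = s**2 / np.sum(s**2)                       # explained-variance ratios
    return int(np.searchsorted(np.cumsum(ratios), thresh) + 1)

rng = np.random.default_rng(0)

# 500 points that lie (up to small noise) on a 1-D line inside R^10:
# the ambient dimension is 10, but the intrinsic dimension is 1.
direction = rng.normal(size=10)
direction /= np.linalg.norm(direction)
t = rng.uniform(0, 1, size=(500, 1))
X_line = t * direction + 0.01 * rng.normal(size=(500, 10))

# 500 isotropic Gaussian points: intrinsic dimension ~ ambient dimension.
X_rand = rng.normal(size=(500, 10))

dim_line = pca_dim(X_line)   # expected: 1
dim_rand = pca_dim(X_rand)   # expected: close to 10
```

More refined estimators (local PCA, nearest-neighbor-based methods) apply the same idea in small neighborhoods, which is what makes locally low-dimensional but globally complex structures detectable.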
Keywords: dimension reduction; high-dimensional data; intrinsic dimension; manifold learning; multi-manifold learning
Year: 2020 PMID: 31998109 PMCID: PMC6962297 DOI: 10.3389/fnbot.2019.00110
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
Figure 1. A simple motivating example of a dataset possessing a low-dimensional intrinsic structure which, however, remains hidden in any low-dimensional linear projection. The dataset is generated by a simple branching process filling the volume of an n-dimensional hypercube: one starts with a non-linear (parabolic) trajectory from a random point inside the cube, which goes up to one of its faces. Then it stops, a random point from the previously generated points is selected, and a new non-linear trajectory starts in a random direction. The process continues until k branches have been generated; then the data point cloud is produced by adding uniformly distributed noise around the generated trajectories. If k is large enough, the global estimate of the dataset dimensionality will be close to n; however, the local intrinsic dimension of the dataset remains one (or between one and two in the vicinity of branch junctions or intersections). The task of the lizard brain is to uncover the low-dimensional structure of this dataset: in particular, to classify the data points into the underlying trajectory branches and uncover the tree-like structure of branch connections. The figure shows how various unsupervised machine learning methods mentioned in this review capture the complexity of this dataset, with only k = 12 branches generated for n = 10 (each branch shown in a different color), in 2D projections. Most of the methods here use a simple Euclidean base space, except ElPiGraph, for which the tree-like structure of the base space is shown by a black line and the 2D representation is created using a force-directed layout of the graph.
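The branching process described in the caption can be sketched as follows (a minimal illustration, not the authors' code; the branch length, quadratic curvature term, and noise amplitude are assumed values chosen only to make the sketch runnable):

```python
import numpy as np

def branching_dataset(n=10, k=12, pts_per_branch=100, noise=0.02, seed=0):
    """Synthetic branching point cloud in the unit n-cube, following the
    caption's recipe: each branch is a parabolic trajectory starting from
    a randomly chosen previously generated point, truncated at the first
    cube face it crosses; uniform noise is added at the end."""
    rng = np.random.default_rng(seed)
    points = [rng.uniform(0, 1, n)]            # random start inside the cube
    labels = [-1]                              # -1 marks the seed point
    for branch in range(k):
        start = points[rng.integers(len(points))]     # random existing point
        v = rng.normal(size=n); v /= np.linalg.norm(v)  # random direction
        w = rng.normal(size=n); w /= np.linalg.norm(w)  # curvature direction
        t = np.linspace(0.0, 0.4, pts_per_branch)[:, None]  # assumed length
        traj = start + t * v + t**2 * w        # parabolic trajectory
        inside = np.all((traj >= 0) & (traj <= 1), axis=1)
        stop = len(traj) if inside.all() else int(np.argmax(~inside))
        traj = traj[:stop]                     # stop at the first face crossed
        points.extend(traj)
        labels.extend([branch] * len(traj))
    X = np.array(points) + rng.uniform(-noise, noise, (len(points), n))
    return X, np.array(labels)

X, y = branching_dataset()  # k = 12 branches in n = 10 dimensions
```

The labels `y` give the ground-truth branch of each point, which is exactly what the caption's clustering task asks a method to recover from `X` alone.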