Literature DB >> 35079712

Data-driven artificial and spiking neural networks for inverse kinematics in neurorobotics.

Alex Volinski1, Yuval Zaidel1, Albert Shalumov1, Travis DeWolf2, Lazar Supic3, Elishai Ezra Tsur1.   

Abstract

Inverse kinematics is fundamental for computational motion planning. It is used to derive an appropriate state in a robot's configuration space, given a target position in task space. In this work, we investigate the performance of fully connected and residual artificial neural networks as well as recurrent, learning-based, and deep spiking neural networks for conventional and geometrically constrained inverse kinematics. We show that while highly parameterized data-driven neural networks with tens to hundreds of thousands of parameters exhibit sub-ms inference time and sub-mm accuracy, learning-based spiking architectures can provide reasonably good results with merely a few thousand neurons. Moreover, we show that spiking neural networks can perform well in geometrically constrained task space, even when configured to an energy-conserved spiking rate, demonstrating their robustness. Neural networks were evaluated on NVIDIA's Xavier and Intel's neuromorphic Loihi chip.
© 2021 The Author(s).


Keywords:  Intel Loihi; NVIDIA Xavier; artificial neural networks; neural engineering framework; neuromorphic engineering; online learning; redundancy resolution; robotic arm; spiking neural networks; underdetermined systems

Year:  2021        PMID: 35079712      PMCID: PMC8767299          DOI: 10.1016/j.patter.2021.100391

Source DB:  PubMed          Journal:  Patterns (N Y)        ISSN: 2666-3899


Introduction

In the past few decades, multi-joint open-chain robotic arms have been utilized in a diverse set of applications, ranging from robotic surgery to space debris mitigation. While the position of a robotic arm's End-Effector (EE) is often defined in Cartesian task space, the arm itself is configured in configuration space, spanned by the robot's joint angles and accounting for the robot's Degrees of Freedom (DoF). While the forward mapping can be trivially derived by realizing the robot's Forward Kinematics (FK) with transformation matrices, the inverse mapping, termed Inverse Kinematics (IK), is computationally challenging, as the same point in task space can be reached with different configurations. IK is usually numerically optimized to achieve redundancy resolution using Jacobian inverse or fuzzy logic. While analytical descriptions of IK are generally limited to relatively simple robotic systems, numerical methods can be used to optimize solutions for intricate scenarios. While proven incredibly useful, numerical methods depend on complete mechanical descriptions and known environments. Several numerical approaches for IK have been developed to handle geometrical and environmental constraints; however, numerically handled constraints must be mathematically formulated and integrated. More recently, efforts toward utilizing Artificial Neural Networks (ANNs) for data-driven IK have been explored. As long as an FK is available, a neural network-powered IK provides a unified framework, supporting robotic systems of arbitrary complexity, operating in arbitrarily convoluted environments, subjected to any set of constraints. A data-driven approach alleviates the requirement for a full mechanical and mathematical description. In this work, we critically re-evaluated the utilization of ANNs for IK, with various loss functions, activations, and architectures. We further extend the discussion to IK with geometrical constraints.
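To make the Jacobian-inverse baseline concrete, here is a minimal sketch of iterative IK for a hypothetical 2-link planar arm. The link lengths, step size, and starting configuration are illustrative assumptions, standing in for the 5-DoF model evaluated in this work:

```python
import numpy as np

# Hypothetical 2-link planar arm; link lengths are illustrative and
# stand in for the paper's 5-DoF model.
L1, L2 = 1.0, 0.8

def fk(q):
    """Forward kinematics: joint angles -> end-effector position (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytical Jacobian of fk with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik_pinv(target, q0, iters=200, step=0.5):
    """Iterative IK: damped Newton steps through the Jacobian pseudo-inverse."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < 1e-6:
            break
        q = q + step * np.linalg.pinv(jacobian(q)) @ err
    return q

q = ik_pinv(np.array([1.0, 1.0]), q0=[0.3, 0.3])
print(np.linalg.norm(fk(q) - np.array([1.0, 1.0])))  # residual task-space error
```

The pseudo-inverse picks the minimum-norm joint update at each step, which is exactly the redundancy-resolution choice a data-driven method must otherwise learn from data.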
Robotic arms often operate in a convoluted operational space or in collaboration with a human operator. As a result, IK can be constrained to work in a sub-configuration space; for example, a robotic arm should be prevented from invading a human's personal space or hitting obstacles. Recently, Chembuly and colleagues extended conventional IK optimizers to support collision avoidance. Here, we've taken a data-driven approach, relying on the versatility of neural networks, to propose a robust framework for non-constrained and geometrically constrained IK. Uniquely, we utilized Spiking Neural Networks (SNNs), which closely emulate the nervous system's computational properties, for IK. SNNs stand at the foundation of neurorobotics, an important frontier in neuromorphic computing research that provides biologically inspired, energy-efficient control of robotic systems. We evaluated SNNs for IK with three different approaches: (1) converting ANNs to SNNs using spikes-tailored activation functions; (2) neuromorphically implementing Stochastic Gradient Descent (SGD) with recurrent neural connections; and (3) utilizing neuromorphic online learning. ANNs were defined using Keras, and SNNs were simulated using the Neural Engineering Framework (NEF). NEF brings forth a theoretical framework for the neuromorphic representation and transformation of mathematical constructs with spiking neurons, allowing the implementation of functional large-scale neural networks. NEF is extensively used to design neuromorphic systems capable of visual perception and motor control. It serves as the foundation for Nengo, a Python-based "neural compiler," which translates high-level descriptions to low-level neural models. NEF-inspired neuromorphic hardware designs have been implemented in both analog and digital circuitry, and a version has been compiled to work on the most prominent neuromorphic hardware architectures available, including Intel's neuromorphic Loihi circuit.
Here, we converted ANNs to SNNs using NengoDL and implemented neuromorphic SGD using the Nengo-based Gyrus framework. We trained the networks offline and evaluated them on specialized hardware. ANNs and SNNs were evaluated on NVIDIA's Xavier board. SNNs were also assessed on Intel's Loihi chip. We demonstrate how ANNs with no prior joint data can provide IK with fast inference time and accurate results. While their spiking counterparts were shown to be less accurate, they provide sufficient neuromorphic approximations.

Results

Neural network size and inference time

In this work, we examined the performance of various neuronal architectures (Figure 1), each with a different number of parameters and inference time. Data are summarized in Table 1. Table 1 also includes the inference time for conventional numerical optimization (with the AGX Xavier) using Jacobian inverse for comparison.
Figure 1

System design

A reachability study was conducted using a robotic model and evaluated with numerical optimization, reaching 200,000 uniformly distributed points across a 2 × 2 × 1-m space. Reachable points and the forward kinematic model are used to train deep ANNs and SNNs as well as recurrent and learning-based SNNs, providing predictive models for IK.

Table 1

ANN number of parameters and inference time

Architecture      | Number of parameters | Inference (ms)
Numerical         |                      | 16.52 ± 1.26
ResNet 4 blocks   | 3,141,893            | 6.05 ± 0.29
ResNet 2 blocks   | 183,173              | 2.55 ± 0.44
FC 10 × 128       | 149,765              | 1.34 ± 0.23
FC 8 × 128        | 116,741              | 1.06 ± 0.14
FC 6 × 128        | 83,717               | 0.87 ± 0.13
FC 6 × 256        | 331,269              | 0.92 ± 0.12
FC 5 × 128        | 67,205               | 0.79 ± 0.12
FC 5 × 256        | 265,477              | 0.82 ± 0.12
FC 4 × 128        | 50,693               | 0.67 ± 0.11
FC 4 × 256        | 199,685              | 0.71 ± 0.11
FC 3 × 128        | 34,181               | 0.58 ± 0.1
FC 3 × 256        | 133,893              | 0.62 ± 0.1
FC 2 × 128        | 17,669               | 0.49 ± 0.15
We evaluated the loss from Equations 1 and 2 with two-layer Fully Connected (FC) ANNs (Figure 2A). Results demonstrate the superior performance of the regularized loss functions for target accuracy within 1 cm and 1 mm across Tanh, Rectified Linear Unit (ReLU), and leaky ReLU activations (Figure 2B). We used shallow networks to evaluate the regularization scheme, thus justifying regularization in further assessments. We further explored IK performance for ANNs with varying depths and widths across the three activation functions. For each architecture, we measured the mean accuracy and the percentage of points for which 1-mm and 1-cm accuracy was reached. Results for 200,000 training points and 2 to 10 layers, across Tanh, ReLU, and leaky ReLU, are shown in Figure 2C. Results demonstrate the superior performance of the 5 × 128 FC architecture (<1 ms inference time, <1 cm accuracy).
Figure 2

Artificial neural networks for inverse kinematics

(A) ANN schematic.

(B) Superior performance of regularized loss functions with shallow 2 × 128 neural networks.

(C) The percentage of target points within 1 mm (top), 1 cm (middle), and the mean error distance (bottom) for ANNs with varying depths and activations.

(D and E) Performance for Tanh activated ANN with varying width, depth (D) and training data size (E).

(F) Residual neural networks architecture.

(G) Residual networks performance compared with FC ANNs with varying architectures and activations.

(H–J) The percentage of target points within 1 mm (H) and 1 cm (I), as well as mean error distance (J) for FC 5 × 128 ANNs, Tanh activated, with energy loss function.

(K) Comparison between FC 5 × 128 ANNs with energy loss function with Tanh, Swish, and Mish activations.

We further explored this architecture with a width of 256 neurons (Figure 2D) and 200,000 data points (Figure 2E), demonstrating superior performance with a six-layer network. For this network, only 0.18% of target points missed the 1-cm accuracy threshold, with <1 ms inference time (Table 1). To further investigate the computational capacity of ANNs for IK, we utilized ResNets, featuring far more parameters and skip links (Figure 2F; Table 1). ResNets were found to have comparable performance at the 1-mm accuracy specification (Figure 2G). We explored our data-driven, energy-based loss function (described in Equation 3) with varying depth (Figures 2H–2J), demonstrating dramatic improvement compared with the performance gained with traditional loss definitions.
We show that a six-layer ANN's mean error distance improved by ∼10× (from ∼2 mm to ∼0.2 mm) and that the percentage of points above the 1-cm threshold dropped by ∼10× (from ∼60% to ∼6%). We evaluated the energy-driven network with Swish and Mish activations, demonstrating further performance improvement, with a 33% drop in the percentage of points above the 1-cm threshold (to 4%) (Figure 2K). We investigated the performance of NengoDL-based deep SNNs (Figure 3A) for IK. We used ANN-to-SNN transfer learning by modulating the neurons' tuning curves into a differentiable form (Figure 3B; see methods for details). Network performance with and without transfer learning from the artificial counterparts produced similar results. ANNs can therefore be translated to SNNs by defining neurons as soft Leaky Integrate-and-Fire (LIF) spiking neurons and temporally integrating spikes. In such an implementation, the spiking neurons' differentiable approximations are used during training, while the spiking neurons themselves are used during inference.
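As a rough illustration of such a differentiable form, the hard threshold in the steady-state LIF rate curve can be replaced with a softplus. This is a plain-NumPy sketch with illustrative time constants and smoothing, not the NengoDL implementation used in the paper:

```python
import numpy as np

def lif_rate(J, tau_rc=0.02, tau_ref=0.002):
    """Steady-state LIF firing rate; zero below threshold (J <= 1)."""
    out = np.zeros_like(J, dtype=float)
    above = J > 1.0
    out[above] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[above] - 1.0)))
    return out

def soft_lif_rate(J, tau_rc=0.02, tau_ref=0.002, sigma=0.02):
    """Differentiable approximation: the hard threshold max(J - 1, 0) is
    replaced by a softplus, smoothing the kink at J = 1 so gradients exist."""
    j = sigma * np.logaddexp(0.0, (J - 1.0) / sigma)  # numerically stable softplus
    return 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / j))

J = np.linspace(0.5, 3.0, 6)
print(lif_rate(J))       # hard rates (Hz)
print(soft_lif_rate(J))  # smooth rates; converge to the hard ones above threshold
```

Training can then back-propagate through `soft_lif_rate`, while inference swaps in actual spiking LIF neurons whose time-averaged rates follow the hard curve.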
Figure 3

Deep spiking neural networks for inverse kinematics

(A) ANNs can be converted to SNNs using the NengoDL framework, where neurons are defined in ensembles.

(B) Differentiable temporally integrated activation of LIF-based spiking neurons.

(C and D) The percentage of target points within 1 cm (left) as well as mean error distance (right) for SNN with varying depth, synaptic smoothing, and maximal firing rate. Mean error heatmap is shown in (D).

(E) Mean error comparison between the best-performing SNN and its ANN counterpart.

We explored deep SNNs with varying depths, synaptic smoothing factors, and maximal firing rates (see methods for details). As expected, as the neurons' firing rates increase, they approximate their non-spiking counterparts more closely, since their integrated (or convolved) spike trains more closely resemble non-spiking behavior, producing a lower mean error and a higher percentage of targets within 1-cm accuracy (Figure 3C). Synaptic smoothing was shown to be most effective with a 20-ms time constant. A heatmap of the results is shown in Figure 3D. As expected, a comparison between spiking and non-spiking networks shows the superior performance of conventional ANNs. Note that the high spike rates required for accurate predictions are not attainable with current neuromorphic hardware. Therefore, a more neuromorphic-appropriate design would utilize recurrent or learning-based neuronal designs. Learning-based approaches can either be pre-trained to achieve a better initial weight scheme or naively initialized to zero. Pre-training was implemented with Prescribed Error Sensitivity (PES) on the entire training set, where the resulting weights were averaged and saved for network initialization. Note that this pre-training does not constitute a general solution for IK: IK is resolved using PES-driven online learning for each target point. We show that pre-trained models have faster inference.
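The interaction of firing rate and synaptic smoothing can be illustrated with a toy simulation: a spike train low-pass filtered by an exponential synapse recovers the encoded rate with a ripple that shrinks as the firing rate grows. The rates, time constants, and regular spiking below are illustrative assumptions, not the paper's networks:

```python
import numpy as np

def filtered_spike_train(rate_hz, tau=0.02, dt=1e-4, T=1.0):
    """Regular spike train at rate_hz passed through an exponential synapse
    with time constant tau; returns the mean and ripple (std) of the filtered
    signal over the second half of the run (after the transient decays)."""
    steps = int(T / dt)
    period = max(1, round(1.0 / (rate_hz * dt)))
    spikes = np.zeros(steps)
    spikes[::period] = 1.0 / dt                # unit-area impulses
    decay = np.exp(-dt / tau)
    y, trace = 0.0, np.empty(steps)
    for i in range(steps):
        y = decay * y + (1.0 - decay) * spikes[i]   # exponential synapse
        trace[i] = y
    tail = trace[steps // 2:]
    return tail.mean(), tail.std()

for rate in (50, 500, 5000):
    mean, ripple = filtered_spike_train(rate)
    print(rate, round(mean, 1), round(ripple / mean, 4))  # relative ripple shrinks with rate
```

The same trade-off appears in Figure 3C: the filtered signal tracks the encoded value at any rate on average, but the instantaneous approximation only becomes tight when many spikes fall within one synaptic time constant.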
Inference performance is shown in Table 2. SNNs were deployed on Intel's Loihi circuit (with a 1-kHz spiking rate).
Table 2

SNN number of parameters and inference time

Architecture                     | Hardware      | # parameters | Inference (s)
Deep SNN 4 × 256                 | Intel's Loihi | 199,685      | 0.4
Learning-based SNN               | Xavier        | 5,000        | 2.9 ± 1.22
Learning-based SNN               | Intel's Loihi |              | 3.4 ± 0.61
Learning-based SNN (pre-trained) | Intel's Loihi | 5,000        | 2.6 ± 0.04
SGD-recurrent SNN                | Xavier        | 300,000      | 3.8 ± 0.62

Inference time includes the standard deviation computed on the entire test set.


Recurrent and learning-based SNN

A Gyrus-based (Figure 4A) derivation of the Jacobian is demonstrated in Figure 4B (at point [0.17, 0.17, 0.35]), where the 3 × 5 = 15 values of the Jacobian are approximated. Reaching that point via the derivation of the appropriate joint configuration is shown in Figure 4C. Another approach is a learning-based derivation of IK via the PES learning rule (Figure 4D). The network architecture is discussed in detail in Zaidel et al. Error convergence and reaching point [0.17, 0.17, 0.35] are demonstrated in Figures 4E and 4F. A comparison between the learning-based and SGD-recurrent implementations is shown in Figure 4G.
Figure 4

Learning and recurrent SNNs for inverse kinematics

(A) Simplified recurrent SNN for IK.

(B) Inverse Jacobian approximation with SNN (15 values approximating the 3 × 5 Jacobian matrix).

(C) Exemplified reaching to point p = [0.17, 0.17, 0.35] with a recurrent SNN.

(D) Learning-based SNN for IK.

(E and F) Error convergence and reaching p with learning-based SNN.

(G) Compared error convergence (left), accuracy histogram (middle), the percentage of target points within 1 cm, and mean error distance (right), for learning and recurrence-based SNNs.

(H) Raster plot for error representing neurons in the learning-based SNN.

Results demonstrate the learning-based approach's superior performance. While the mean error is higher with the learning-based approach, it is biased by points for which a reach was not successfully computed. The mean error for the points to which the learning-based approach converged was approximately 1 mm, as demonstrated in the histogram. Note that in the online learning-based method, it takes time for the network to accurately compute the FK and the Jacobian used to calculate IK, due to the neurons' response dynamics (synaptic time constant). A raster plot of the error-representing neurons in the learning-based SNN is shown in Figure 4H, showing a converging spike pattern that reaches a homogeneous activation pattern after ∼2.4 s. Note that the error is rate-coded following the neurons' tuning curves: some neurons decrease their firing rate while others increase it as the error is reduced. Results show a significantly higher inference time for SNNs (increasing from the millisecond to the second range), required for computing convergence. The SGD-based network requires a relatively high number of neurons for reasonable results, pointing out the learning-based approach's advantages. Inference time was calculated for points for which 1-cm convergence was achieved.
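For intuition on the PES rule used here, the sketch below applies the canonical decoder update, Δd = −κ · E · a, to learn a function online. The rectified-linear tuning curves, learning rate, and target function are illustrative assumptions, not the paper's network:

```python
import numpy as np

# Minimal sketch of the Prescribed Error Sensitivity (PES) rule: decoders d
# are nudged in proportion to the presented error and each neuron's activity.
rng = np.random.default_rng(1)
n_neurons, kappa = 200, 1e-4
enc = rng.choice([-1.0, 1.0], n_neurons)     # encoders on a 1-D stimulus
gain = rng.uniform(0.5, 2.0, n_neurons)
bias = rng.uniform(-1.0, 1.0, n_neurons)

def activities(x):
    """Rectified-linear tuning curves (illustrative stand-in for LIF rates)."""
    return np.maximum(gain * (enc * x + bias), 0.0)

def target_fn(x):
    return x ** 2

d = np.zeros(n_neurons)                      # decoders start at zero
for _ in range(50000):
    x = rng.uniform(-1.0, 1.0)
    a = activities(x)
    error = a @ d - target_fn(x)             # scalar decoding error
    d -= kappa * error * a                   # PES decoder update

xs = np.linspace(-1.0, 1.0, 11)
mse = np.mean([(activities(x) @ d - target_fn(x)) ** 2 for x in xs])
print(mse)  # decoding error after online learning
```

Because the update only needs the momentary error and local activities, the same rule can run per target point at inference time, which is how the learning-based IK above resolves each reach online.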
We found that higher accuracy is difficult to attain in our current configuration (number of neurons, high DoF). The number of parameters and inference times are detailed in Table 2. Note that we've limited the evaluation metric to a 1-cm threshold due to the limited accuracy attained using SNNs (mean error distance is on the order of a few mm). While a few millimeters of deviation might be sufficient for many applications, it highlights the need for further adaptation (e.g., from sensors), allowing more accurate targeting (see discussion).

Geometrically constrained IK

We used our best-performing five-hidden-layer ANN to calculate geometrically constrained IK. The network was trained to avoid collision with spherical obstacles featuring radii of 10 and 20 cm, following Equation 5. A reachability map (see methods for a detailed description) and an example of 100 configurations, calculated to avoid a 20-cm obstacle, are shown in Figure 5. We measured the mean accuracy and the percentage of points for which 1-mm and 1-cm accuracy was reached with 10- and 20-cm spherical obstacles. Results across Tanh, ReLU, leaky ReLU, Swish, and Mish activations are shown in Figure 6A and demonstrate the superior performance of Mish activation. To illustrate the importance of network optimization for obstacle avoidance, we compared the network's performance when optimized to minimize Equation 3 versus Equation 5. Our comparison shows that the number of intersections with a 20-cm spherical obstacle is dramatically reduced, by a factor of 4, when the loss function considers obstacle avoidance (Figure 6B). We further compared the intersections with obstacles across the different activations, showing that while Mish activation outperformed all other activations for large obstacles, performance plateaued at 10% for small obstacles (Figure 6C). Given our relatively small dataset, we compared our ANN's performance to a two-block ResNet, demonstrating the ANN's superior performance (Figure 6D).
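The spirit of such a constrained loss can be sketched as the task-space reach error plus a hinge penalty on arm points that penetrate a spherical obstacle. The exact form of Equation 5 is not reproduced here; the penalty weighting and sample points are illustrative:

```python
import numpy as np

def constrained_loss(link_points, ee, target, center, radius, w=10.0):
    """Reach error plus obstacle penalty.
    link_points: (N, 3) points sampled along the arm's links;
    ee, target, center: 3-D positions; radius: obstacle radius (m)."""
    reach = np.linalg.norm(ee - target)
    depth = radius - np.linalg.norm(link_points - center, axis=1)
    penetration = np.maximum(depth, 0.0)   # zero for points outside the sphere
    return reach + w * penetration.sum()

pts = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.3], [0.0, 0.0, 0.5]])
target = np.array([0.0, 0.0, 0.5])
loss_clear = constrained_loss(pts, pts[-1], target,
                              center=np.array([1.0, 0.0, 0.0]), radius=0.2)
loss_hit = constrained_loss(pts, pts[-1], target,
                            center=np.array([0.0, 0.0, 0.3]), radius=0.2)
print(loss_clear, loss_hit)  # penalty inflates the colliding configuration
```

Because the penalty is differentiable almost everywhere, the same gradient-based training used for the unconstrained loss applies unchanged; multiple obstacles simply sum their penalties, mirroring the generalization from Equation 5 to Equation 6.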
Figure 5

Geometrically constrained IK

A reachability map indicates the location of a 20-cm spherical obstacle (colored green) and out of reach points (colored blue). A demonstration of 100 configurations to target locations (red dots) is shown on the right. Each of the robot's links is colored differently.

Figure 6

Geometrically constrained IK with ANNs

(A) Mean accuracy and the percentage of points for which accuracy of <1 mm and <1 cm was not achieved with 10- and 20-cm spherical obstacles. Results achieved with a five-layer ANN with Tanh, ReLU, leaky ReLU, Swish, and Mish activations.

(B) Number of the robot and the spherical obstacle intersections with (Equation 3) and without (Equation 5) consideration for obstacle avoidance.

(C) Number of the robot and the spherical obstacle intersections yielded with a five-layer ANN with Tanh, ReLU, leaky ReLU, Swish, and Mish activations.

(D) Percentage of points for which accuracy of <1 mm and <1 cm was not achieved with a five-layer Mish-activated ANN and a two-block ResNet.

We evaluated this dataset with SNNs via transfer learning, as discussed above. Interestingly, we show that obstacle-intersection performance is similar across the different maximal firing rates. In contrast to the constraint-free scenario, where network performance was found to rely heavily on spiking rate (Figure 3C), spiking rate is not a critical factor for obstacle avoidance (Figure 7A). This result demonstrates the robustness of an SNN in efficiently avoiding obstacles (a low spiking rate constitutes reduced energy consumption). However, when accuracy is considered, SNN performance relies heavily on the neurons' maximal firing rate and synaptic time constant (Figure 7B). As the maximal firing rate increases, IK becomes more accurate (Figure 7C). Similar to the non-constrained case (Figure 3C), we show that a synaptic smoothing factor of ∼20 ms outperforms faster and slower synapse configurations (Figure 7D).
Figure 7

Geometrically constrained IK with SNNs

(A) Obstacle intersections with five-layer SNNs featuring various maximal firing rates across different synapse smoothing factors.

(B) Percentage of points for which accuracy of <1 cm was not achieved with five-layer SNNs featuring various maximal firing rates across different synapse smoothing factors.

(C) Percentage of points for which accuracy of <1 cm was not achieved with five-layer SNNs featuring various maximal firing rates and a synapse smoothing factor of 20 ms.

(D) Percentage of points for which accuracy of <1 cm was not achieved with a five-layer SNN featuring various synapse smoothing factors and a maximal firing rate of 5,000 Hz.

We further evaluated our data-driven (200,000 sample points) best-performing five-layer (× 128) FC Mish-activated ANN in a convoluted space featuring two to five randomly positioned obstacles of 10 cm radius. For training, we used Equation 6, a generalized form of the single-obstacle Equation 5. Reaching 100 targets in a five-obstacle environment is demonstrated in Figure 8A. As expected, the number of out-of-reach targets increased with the number of obstacles (Figure 8B). However, as the obstacles were randomly positioned, some obstacles may constrain the arm's configuration space to a greater degree than others, as demonstrated by the number of reachable points in the three-obstacle scenario, which is higher than in the four-obstacle scenario (Figure 8C). The resulting distances to targets in the one- to five-obstacle environments demonstrate the modularity and high performance of the ANN-based IK (Figures 8D–8F). This is particularly evident in the number of obstacle intersections, which is kept below 1% in all tested scenarios (Figure 8G).
While network performance generally improves with fewer obstacles, this does not always hold, as evidenced by the similar performance measured in the three- and four-obstacle scenarios, again underscoring the influence of obstacle position on the arm's ability to navigate efficiently.
Figure 8

Multiple obstacles constrained inverse kinematics

(A) A demonstration of 100 targets reaching in a five-obstacle environment.

(B and C) Number of out-of-reach (B) and reachable targets (C) in one- to five-obstacle environments.

(D) Mean distances to targets in one- to five-obstacle environments.

(E and F) Percentage of points for which accuracy of <1 cm (E) and <1 mm (F) was not achieved in one- to five-obstacle environments.

(G) Number of obstacle intersections in one- to five-obstacle environments.


Discussion

Biological motor control is considered far superior to our most advanced robotic systems, which predominantly rely on analytical and numerical control. Therefore, there is great promise in neurorobotics, which strives to use neuronal architectures to guide robotic systems' behavior. Neural networks have the enticing capacity to be trained to perform tasks without an analytical solution; however, the successful application of neural networks to robotics, beyond sensory processing, remains limited. In contrast to numerical and analytical descriptions of IK, as long as an FK is available, a neural network-powered IK provides a unified framework for motion planning for robotic systems of arbitrary complexity, operating in arbitrarily convoluted environments, subjected to any definable constraints. In this work, we investigate the utilization of ANNs and SNNs for IK, one of the most fundamental problems in robotics. The utilization of ANNs for IK has been vastly explored. For example, Almusawi and colleagues enhanced a deep ANN with known joint configurations to solve IK for a 6-DoF robotic arm; Wand and colleagues recently enhanced conventional ANNs with a damped least-squares optimization to provide faster convergence to the desired IK threshold; Duka and colleagues generalized ANNs for EE trajectory planning; and Li and colleagues further generalized ANNs to constrained trajectories. Here, we further explored ANN-based solutions to IK, with various configurations and activations, having only the FK model and a reachability map as inputs. The models presented here were not trained to elucidate known joint configurations but rather to elucidate robot configurations by optimizing the EE's Euclidean distance from its target using the known FK. Our training data are therefore ambiguous, as multiple joint configurations can realize the same position in task space.
We demonstrated that training a neural network with ambiguous data results in low accuracy, mainly when a fine threshold is set (1 mm in contrast to 1 cm). A common practice is to resolve this ambiguity by defining a default configuration and using the Mean Square Error (MSE) to that set of joint angles, so that all solutions attempt to stay near the same configuration (constituting a "null controller"). As long as none of the points in the null space are at a singularity, this method usually works well. Here we have taken a different, data-driven approach. We reduced ambiguity by using an energy-driven loss function and training the network with numerous neighboring joint configurations that map to nearby points in task space. This method does not require one reference joint configuration to which all configurations relate. Note that since we use many pairs across the robot's reachable map, and a transitive property holds across the configuration space, the network is driven toward overall consistency. This energy-driven loss resulted in dramatically improved accuracy (10× in terms of mean error distance, and a smaller percentage of points above the 1-cm accuracy threshold). This loss function allows for IK-based motion planning with smooth transitions between targets. Using an energy term to achieve redundancy resolution in IK does not require a new dataset but rather a rational choice of neighboring target points for training. This data-driven approach has the flexibility to handle complex scenarios, as we showcased with geometrically constrained IK, discussed below. We further demonstrate Swish and Mish activations' superior performance for energy-driven ANNs and explored overparameterized architectures such as ResNets.
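The idea can be sketched as follows: the predicted joint configuration is scored through the known FK against the target, and an energy term ties together the configurations predicted for neighboring targets. The toy 2-link FK and the weighting below are illustrative assumptions; this is not the paper's Equation 3:

```python
import numpy as np

def fk(q):
    """Toy planar 2-link FK standing in for the paper's 5-DoF model."""
    return np.array([np.cos(q[0]) + 0.8 * np.cos(q[0] + q[1]),
                     np.sin(q[0]) + 0.8 * np.sin(q[0] + q[1])])

def energy_loss(q_pred, target, q_pred_neighbor, lambda_e=0.1):
    """Task-space reach error plus an 'energy' term that keeps the
    configurations predicted for neighboring targets close together."""
    reach = np.linalg.norm(fk(q_pred) - target)       # supervised by FK only
    energy = np.linalg.norm(q_pred - q_pred_neighbor) # configuration smoothness
    return reach + lambda_e * energy

q = np.array([0.0, 0.0])
print(energy_loss(q, fk(q), np.array([0.1, 0.0])))  # reach term is zero here
```

No ground-truth joint angles appear anywhere in the loss: supervision comes entirely from the FK model and the pairing of neighboring targets, which is what drives the network toward a single consistent branch of the IK solution.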
While it is often unattainable to explain the underlying reasons for the differences in performance across activation functions, we show that among the traditional activations, Tanh performs best. We show that the Tanh-dependent Mish activation further improves accuracy, probably owing to its positively unbounded, negatively bounded, smooth, and nonmonotonic characteristics. We used ResNets, which are very rich in parameters and feature skip links, for IK. These skip links allow "skipping" non-contributing parameters, thus allowing a better estimation of the capacity of ANNs to solve the problem given a dataset. We show that ResNets with millions of parameters underperformed conventional FC ANNs, suggesting that we maximized the capacity of ANNs to resolve IK with our current dataset size. Uniquely, we extend the discussion to spiking neuronal architectures. In recent years, neurorobotics, in which SNNs are the underlying computational framework, has gained increased attention. Neurorobotics is argued to outperform conventional control paradigms in terms of robustness to perturbations and adaptation to varying conditions. However, neuromorphic implementations of robotic control are usually tailored toward adaptive control rather than exact localization, since spanning a high-dimensional space (as required by robotic systems with high DoF) requires a large number of neurons. Recently, learning-based IK was implemented with SNNs, demonstrating how carefully tuned neuromorphic encoding can be used to perform high-dimensional nonlinear computations. Here we extend the discussion further, investigating deep, recurrent, and learning-based SNNs for IK. Deep SNNs have been shown to perform well in various perception tasks. By performing ANN-to-SNN transfer learning, we show that SNNs require a high spiking rate to approximate traditional ANN performance.
Spiking rate is correlated with increased energy expenditure, implying a non-energy-efficient utilization of their capacity. Rate-coded SNNs are known to be limited in their capacity to perform exact computations; they are much better suited to realizing learned behavior. With deep SNNs, we achieved a mean deviation of a few millimeters from the target. While a few millimeters of distance from the target might be sufficient for many applications, it highlights the need for further adaptation (e.g., from sensors), allowing more accurate targeting. We further show that while highly parameterized neural networks with tens to hundreds of thousands of parameters exhibit very high accuracy, learning-based spiking architectures can provide reasonably good results with merely a few thousand neurons. Note that, considering typical robotic hardware response times and accuracy, we consider 1-mm accuracy sufficient and millisecond-range response times good enough for most applications. To further evaluate SNNs for IK, we assessed spiking SGD. For the first time, we used the Nengo-based Gyrus environment for IK. Gyrus provides a numerical computing framework for SNNs, with which we designed a gradient-based approximation of IK (following the traditional Jacobian inverse guidelines). While successfully implemented, we show that for IK, SNNs are more efficiently utilized with real-time learning than as approximations of numerical methods. Finally, we show that when the number of intersections with an obstacle is considered, SNNs can perform very well, even when configured to an energy-conserving spiking rate, demonstrating their robustness. In this work, we show that SNNs underperform ANNs in terms of accuracy and inference time. However, we believe that an exploratory utilization of SNNs is of high importance when such a fundamental problem in neurorobotics is addressed, mainly when pure neuromorphic robotic control is desirable. 
This work can be further extended to include other SNN-based optimization methods, such as adding a firing-rate regularization parameter during training or using amplitude scaling. Moreover, it could be used to investigate trajectory planning and dynamic scenarios in which obstacles are in motion. We note that the methodology described here can be utilized in other areas featuring redundancy resolution, where one solution must be chosen from an infinite set. Underdetermined systems are of interest to a broad range of disciplines. For example, Hyun and colleagues recently highlighted that underdetermined inverse problems have become one of the major concerns in the medical imaging domain (e.g., under-sampled magnetic resonance imaging, interior tomography, and sparse-view computed tomography). Their work laid down the mathematical foundations for utilizing neural networks to handle such underdetermined problems. Similar challenges are addressed in hydrology and geophysics, pharmacokinetics, and image reconstruction.

Experimental procedures

Resource availability

Lead contact

Further information and requests should be directed to the lead contact, Elishai Ezra Tsur (elishai@nbel-lab.com).

Materials availability

This study did not generate new unique reagents.

Robotic model

The simulated robotic arm model has six DoF, out of which, here, we controlled five (the sixth DoF accounts for gripping). The model follows the physical arm design described in Zaidel et al. and is visualized in Figure 1. The arm features six links (12-, 30-, 6-, 20-, 10-, and 20-cm long, respectively) and five joints (rotated around the y, x, x, z, x axes, respectively). The arm has an 82-cm reach and a 1.64-m span. Visualization and measurements were performed using Autodesk's Fusion 360 CAD software. FK was implemented using transformation matrices. For our five-joint robotic arm, FK takes the form T = T(0→1)·T(1→2)·…·T(5→6), where T(i→i+1) is the transformation matrix (rotation, translation) in homogeneous coordinates, mapping coordinates at joint axis i+1 to coordinates at joint axis i (indices 0 and 6 refer to the world and EE axes, respectively). We initialized each T(i→i+1) with the appropriate set of rotations and translations and multiplied the chained transform by the homogeneous zero vector [0, 0, 0, 1]ᵀ, resulting in an FK model x = FK(q). FK(q) returns the EE position in the world's coordinate system, where the origin is at the robot's base.
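As an illustration, the chained homogeneous transforms can be sketched in NumPy. The paper does not specify the arm's frame conventions, so the choice below of stacking each link along its local z axis, and the `rot`/`trans`/`fk` helpers themselves, are a hypothetical reading rather than the authors' implementation:

```python
import numpy as np

def rot(axis, theta):
    """Homogeneous rotation about a principal axis ('x', 'y', or 'z')."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    if axis == 'x':
        R[1:3, 1:3] = [[c, -s], [s, c]]
    elif axis == 'y':
        R[0, 0], R[0, 2], R[2, 0], R[2, 2] = c, s, -s, c
    else:  # 'z'
        R[0:2, 0:2] = [[c, -s], [s, c]]
    return R

def trans(length):
    """Homogeneous translation along the local z axis (a link's length)."""
    T = np.eye(4)
    T[2, 3] = length
    return T

def fk(angles, axes=('y', 'x', 'x', 'z', 'x'),
       links=(0.12, 0.30, 0.06, 0.20, 0.10, 0.20)):
    """Chain T(0->1)...T(5->6) and read off the EE position in world frame."""
    T = trans(links[0])  # the base link carries no joint
    for theta, axis, length in zip(angles, axes, links[1:]):
        T = T @ rot(axis, theta) @ trans(length)
    return (T @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]
```

Under this assumed convention, the zero configuration simply stacks the six links along z; the real arm's 82-cm reach implies different frame offsets than the ones assumed here.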

Jacobian inverse

IK can often not be solved analytically, and it is usually numerically optimized using the Jacobian inverse. The Jacobian J = ∂x/∂q relates the change of the EE position to the evolution of joint angles, constituting Δx = J·Δq, where Δx is the change in EE position resulting from a shift Δq in the robot's configuration. As the Jacobian is not necessarily invertible, a common practice is to use its pseudo-inverse J⁺, constituting Δq = J⁺·Δx. Therefore, given an error e in task-space coordinates, defined as the distance between the EE's current position and its target position, the appropriate change in joint space can be computed using Δq = J⁺·e, where Δq is the difference in joint angles with which the robot's EE will get closer to its target. In each iteration, this equation is re-evaluated until e is within some accuracy threshold. The Jacobian inverse is more elaborately discussed in Buss.
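The iteration Δq = J⁺·e can be sketched with a finite-difference Jacobian and NumPy's pseudo-inverse; the solver below, its step size, and the two-link planar arm used to exercise it are illustrative, not the paper's optimizer:

```python
import numpy as np

def numeric_jacobian(fk, q, eps=1e-6):
    """Finite-difference estimate of J = dx/dq for a forward-kinematics map."""
    x0 = fk(q)
    J = np.zeros((len(x0), len(q)))
    for i in range(len(q)):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (fk(q + dq) - x0) / eps
    return J

def ik_pseudoinverse(fk, q, target, tol=1e-3, step=0.5, max_iter=500):
    """Iterate dq = J+ . e until the EE is within tol of the target."""
    q = np.array(q, dtype=float)
    for _ in range(max_iter):
        e = target - fk(q)
        if np.linalg.norm(e) < tol:
            break
        J = numeric_jacobian(fk, q)
        q = q + step * (np.linalg.pinv(J) @ e)
    return q
```

The damping factor `step` < 1 makes this a damped resolved-rate scheme, trading speed for stability near singular configurations.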

Artificial neural networks

Deep ANNs comprise layers of computational entities characterized by differentiable, nonlinear activation functions and weighted inputs. ANNs can be optimized, with gradient-based training, to provide predictive models. ANNs can be defined using the open-source Python-based software library Keras, which provides a layer of abstraction for the TensorFlow library. Here, we used Keras to define FC ANNs with varying depth (number of “hidden” neural layers), width (number of neurons per layer), and activation functions. For activation, we evaluated the performance of the standard Tanh, ReLU, and leaky ReLU activations, as well as the more recently defined Mish and Swish functions. Swish activation is defined using f(x) = x·sigmoid(βx), where β is a trainable parameter. Depending on the value of β, Swish interpolates between the Sigmoid-weighted linear function and the ReLU activation function. Mish is a smooth, nonmonotonic function defined using f(x) = x·tanh(softplus(x)) = x·tanh(ln(1 + eˣ)). We further utilized Residual neural Networks (ResNets) for IK, first introduced for image classification and recently adopted for other tasks, including nonlinear regression. A ResNet block can be described as a stack of one dense and two identity blocks, integrating a varied and a preserved number of features, respectively (Figure 1). Here, we started with 64 features (neurons), doubling in every dense block. All ANNs had a five-neuron FC output layer, corresponding to the configuration space dimensionality.
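The two activations can be written directly from their definitions; these NumPy versions (with a fixed β rather than a trainable one) are for illustration only:

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x). In the paper beta is trainable;
    here it is a fixed argument for illustration."""
    return x / (1.0 + np.exp(-beta * x))

def mish(x):
    """Mish: x * tanh(softplus(x)), smooth and nonmonotonic."""
    return x * np.tanh(np.log1p(np.exp(x)))
```

Both pass through the origin and track the identity for large positive inputs, while remaining bounded below for negative inputs, which matches the "positively unbounded, negatively bounded" characterization above.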

Network optimization

A reachability study was conducted using the robotic model, evaluated with numerical, Jacobian-based optimized reaching to 200,000 uniformly distributed points across a 2 × 2 × 1-m space. Numerical reachability was defined with a threshold of 1-mm proximity. To synthetically generate datasets for geometrically constrained task spaces, we implanted spherical obstacles of various radii and uniformly sampled 200,000 target points. We evaluated spheres of 10- and 20-cm radius located at the task space's origin. Every target point was evaluated for reachability: we geometrically evaluated whether the arm's total length is sufficient to reach that target point when bent around the obstacle. Target points within obstacles were also defined as unreachable. The dataset was divided into three parts: training (70%), validation (15%), and test (inference) (15%) data. We defined three loss functions: naive, regularized, and energy-based. We define the naive cost function using L_naive = ‖FK(q̂) − x_t‖² (Equation 1), where FK is the forward-kinematics mapping, q̂ is the predicted joint configuration, and x_t is the target EE position. IK is not an injective function, due to the angles' periodicity and the fact that several joint configurations can realize the same target in task space (the system is underdetermined). Underdetermined systems have infinitely many least-squares solutions. Like pseudo-inverse techniques, L2 regularization provides the least-squares solution with a minimum norm, which is unique: L_L2 = ‖FK(q̂) − x_t‖² + λ‖q̂‖² (Equation 2), where λ is a regularization coefficient. However, regularization does not resolve the ambiguities completely, since it considers neither joint identity nor sign (the L2 cost for q̂ equals its cost for −q̂). Therefore, we propose adding an energy term guided by a heuristic according to which two close points in task space should correspond to close points in configuration space. This heuristic allows for smooth transitions between targets, with minimal changes in the robot's configuration. 
Here, similarity in task space was defined by Euclidean distance, and configuration similarity was determined using MSE: L_energy = ‖FK(q̂₁) − x₁‖² + ‖FK(q̂₂) − x₂‖² + γ·MSE(q̂₁, q̂₂) (Equation 3), where q̂₁ and q̂₂ are the model predictions for x₁ and x₂ (close points in task space), respectively, and γ weighs the energy term. In Equation 3, the first two terms minimize the distance between the predicted and target configurations. The third term aims to reduce the change in configuration space between neighboring points in the task space. With this loss function, the network is trained with pairs of points within a distance threshold (here, we chose 3 cm). Equation 3 defines a loss function responsible for reaching a target point in task space such that the redundancy is resolved. To account for geometrical constraints, we incorporate another term, which is augmented significantly when any part of the robotic arm approaches an obstacle: the closer any of the robotic arm's links gets to the center of a spherical obstacle, the larger the loss function should become. This loss function should comprise a penalty P for each link; P should increase dramatically when the arm crosses the sphere's boundary. It can comprise the sum of the squared distances from each link's edges and midpoint to the sphere's center (three points were evaluated for each link). To implement a rapid increase across the sphere's boundary, we define P using a parameterized inverse logistic function: P(x) = 1 / (1 + e^(s·(x − r² − ε))) (Equation 4), where s represents the curve slope (the ascent's rate), x is the squared distance from a link's point to the sphere's center, r² is the squared sphere's radius, and ε is a penalty epsilon, which controls the value of x below which the function initiates its rapid ascent. The values of s and ε were set empirically. When x approaches r², a link's point approaches the sphere's boundary (its radius), and P rapidly climbs. Combining the penalty terms with Equation 3, the loss can be further extended as L_constrained = L_energy + Σ_links P (Equation 5). To handle a convoluted environment featuring multiple obstacles, Equation 5 can be generalized to L = L_energy + Σⱼ Σ_links Pⱼ, j = 1, …, N_obs (Equation 6), where N_obs is the number of obstacles. 
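A minimal sketch of the energy term (Equation 3) and the inverse-logistic obstacle penalty (Equation 4) follows. The weights γ, s, and ε below are illustrative placeholders, since the paper's empirically chosen values are not reproduced here:

```python
import numpy as np

def inverse_logistic_penalty(x, r2, eps=0.01, s=50.0):
    """Equation 4 sketch: near-zero while the squared distance x to the
    obstacle's center stays above r2 + eps, climbing toward 1 as a link's
    point crosses the sphere's boundary (eps and s are illustrative)."""
    return 1.0 / (1.0 + np.exp(s * (x - r2 - eps)))

def energy_loss(fk, q1_hat, q2_hat, x1, x2, gamma=0.1):
    """Equation 3 sketch: reaching terms for two neighboring task-space
    targets plus an energy term keeping their configurations close."""
    reach = np.sum((fk(q1_hat) - x1) ** 2) + np.sum((fk(q2_hat) - x2) ** 2)
    energy = np.mean((q1_hat - q2_hat) ** 2)  # MSE between configurations
    return reach + gamma * energy
```

Outside the sphere the penalty is effectively zero; once the squared distance drops below r² + ε it climbs toward 1, so boundary crossings dominate the loss.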
For ANN training, we used batch SGD, with a batch size of 10 and an initial learning rate of 0.1. The learning rate was scheduled to be reduced by 20% when an error plateau was reached, defined as a non-improving error over a 22-epoch interval, with a minimum relative improvement of 0.0005 and a floor on the learning rate. Early stopping was scheduled at 750 epochs.

The NEF and online learning

NEF-based neuromorphic rate coding of a numerical input vector (or stimulus) x is defined as δᵢ(x) = G[αᵢ·eᵢ·x + Jᵢᵇ], where δᵢ is the spike rate of neuron i, G is the LIF neuronal model, αᵢ is a gain term, eᵢ is the encoding vector (the value for which the neuron fires with the highest spike rate), and Jᵢᵇ is a fixed background current. An ensemble of neurons collectively represents the stimulus as x̂ using x̂ = Σᵢ dᵢ·aᵢ(x), where dᵢ are linear decoders, optimized to reproduce x using least-squares optimization, and aᵢ is the spiking activity convolved with a filter h. Similar to decoder optimization, it has been shown that any function f(x) can be approximated using some set of functional decoders dᵢᶠ. Defining f(x) in NEF can be achieved by connecting two neuronal ensembles A and B via neural connection weights using ω_ij = dᵢᶠ ⊗ eⱼ, where i is the neuron index in ensemble A, j is the neuron index in ensemble B, dᵢᶠ are the decoders of ensemble A, optimized to transform x to f(x), eⱼ are the encoders of ensemble B, which represents f(x), and ⊗ is the outer product operation. Connection weights, which govern the transformation from one neuromorphic representation to another, can also be adapted or learned in real time rather than optimized during model building. One efficient way to implement real-time learning with NEF is the PES learning rule, which modifies a connection's decoders to minimize an error signal E, calculated as the difference between the stimulus x and its approximated representation x̂. PES applies the update rule Δdᵢ = κ·E·aᵢ, where κ is the learning rate. It has been shown that the error goes to 0 exponentially, at a rate governed by κ‖a‖² (where a is the vector of firing rates). PES is described at length in the literature.
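The PES update Δdᵢ = κ·E·aᵢ can be demonstrated with a fixed vector of firing rates. This NumPy loop (with illustrative rates, dimensions, and learning rate κ) shows the decoded error shrinking toward zero:

```python
import numpy as np

def pes_update(d, a, error, kappa):
    """PES: move each neuron's decoder against the error, scaled by its
    firing rate (error here is decoded estimate minus stimulus)."""
    return d - kappa * np.outer(a, error)

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=50)   # fixed firing rates (illustrative)
d = np.zeros((50, 2))                # decoders for a 2-D stimulus
x = np.array([0.5, -0.3])            # stimulus to be represented

for _ in range(2000):
    error = d.T @ a - x              # decoded estimate minus stimulus
    d = pes_update(d, a, error, kappa=1e-3)
```

Each step scales the error by (1 − κ‖a‖²), so for κ‖a‖² < 1 the decoded estimate converges exponentially, matching the stated convergence behavior.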

Spiking neuronal architectures

We demonstrate three distinct spiking methodologies: transfer learning from ANNs to SNNs using a spike-tailored activation function, neuromorphically implemented SGD, and online learning. With deep SNNs, NEF's non-differentiable neuron response curves (or tuning curves) are modulated into a differentiable form. LIF neuron tuning can be described using δ(J) = 1 / (τ_ref − τ_RC·ln(1 − V_th/J)), where δ is the neuron's spiking rate, τ_ref is its refractory period, V_th is its threshold for spike initiation, and J is its input current. A rectified version of it would be δ(J) = 1 / (τ_ref + τ_RC·ln(1 + V_th/ρ(J − V_th))), where ρ(x) = max(x, 0). To provide a differentiable model, ρ can be replaced with a soft-max (softplus) function ρ(x) = γ·ln(1 + e^(x/γ)), where γ is a smoothing (low-pass) parameter. ANNs were defined using Keras, and SNNs were defined using NEF. We used the Python-based Nengo framework to simulate our SNNs and deploy them on Intel's Loihi. Nengo has two useful abstractions, NengoDL and Gyrus, providing interfaces to Keras and to Python's numerical computing library, NumPy, respectively. We used NengoDL to convert ANNs into SNNs. These deep, layered SNNs are defined with a synaptic time constant (specifying a low-pass filter) and a maximal firing rate. We neuromorphically implemented pseudo-inverse-based optimization (described above) using Gyrus. Gyrus recursively generates large-scale Nengo models using NumPy semantics. However, since Gyrus does not currently support pseudo-inverse NumPy methods (e.g., linalg.pinv), we used SGD to approximate the pseudo-inverse solution. Following each time step, the Jacobian is recalculated, and a small joint-correction signal is derived. The correction and the Jacobian calculation were implemented with a neuromorphic integrator, allowing past information to influence the current neuronal state. We can realize intricate dynamic behavior by integrating NEF's representation and transformation capabilities and recurrently connecting neuronal ensembles. NEF can be used to resolve the equation ẋ(t) = f(x) + u(t), where u is an input (possibly from another neural population), by defining a recursive connection that resolves the transformation f′(x) = τ·f(x) + x, with τ the synaptic time constant. 
It can therefore be used to solve differential equations as required here for the derivation of SGD. Our learning-based SNN follows the model proposed in Zaidel et al. Briefly, the robot's current configuration is transformed to a target configuration via PES-driven learning. Transformational synaptic weights are modulated to minimize the decoded error, calculated from the derived distance to the target. As learning progresses, the error is continually minimized, and the new robot configuration is calculated.
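The hard and softened LIF response curves above can be compared numerically. The parameter values below (τ_ref = 2 ms, τ_RC = 20 ms, V_th = 1) are common NEF defaults used here for illustration, not necessarily the paper's settings:

```python
import numpy as np

def lif_rate(j, tau_ref=0.002, tau_rc=0.02, v_th=1.0):
    """Steady-state LIF rate: zero below threshold, otherwise the
    rectified curve 1/(tau_ref + tau_rc*ln(1 + V_th/(J - V_th)))."""
    j = np.asarray(j, dtype=float)
    sub = np.maximum(j - v_th, 1e-12)  # avoid divide-by-zero below threshold
    rate = 1.0 / (tau_ref + tau_rc * np.log1p(v_th / sub))
    return np.where(j > v_th, rate, 0.0)

def soft_lif_rate(j, sigma=0.02, tau_ref=0.002, tau_rc=0.02, v_th=1.0):
    """Differentiable variant: the hard rectification max(0, J - V_th) is
    replaced by the softplus sigma*ln(1 + exp((J - V_th)/sigma))."""
    j = np.asarray(j, dtype=float)
    z = sigma * np.logaddexp(0.0, (j - v_th) / sigma)  # overflow-safe softplus
    return 1.0 / (tau_ref + tau_rc * np.log1p(v_th / z))
```

Far above threshold the two curves coincide; near threshold the soft version rises smoothly instead of jumping, which is what makes gradient-based ANN-to-SNN training possible.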

Hardware

Algorithms were evaluated on NVIDIA's AGX Xavier and Intel's neuromorphic Loihi chip. The AGX Xavier features a 32-TOPS (Tera Operations Per Second), 512-core Volta GPU with Tensor Cores, an 8-core ARM v8.2 64-bit CPU, and 32 GB of RAM. The Xavier is often used in robotic systems. The Loihi chip comprises 128 neuron cores, each simulating 1,024 neurons. The chip also includes x86 cores, which are used for spike routing and monitoring. Nengo models were compiled on the Loihi using the nengo_loihi library (version 0.19).

Review 1.  Robotic surgery: a current perspective.

Authors:  Anthony R Lanfranco; Andres E Castellanos; Jaydev P Desai; William C Meyers
Journal:  Ann Surg       Date:  2004-01       Impact factor: 12.969

Review 2.  A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input.

Authors:  A N Burkitt
Journal:  Biol Cybern       Date:  2006-04-19       Impact factor: 2.086

3.  NengoDL: Combining Deep Learning and Neuromorphic Modelling Methods.

Authors:  Daniel Rasmussen
Journal:  Neuroinformatics       Date:  2019-10

4.  Deep learning-based solvability of underdetermined inverse problems in medical imaging.

Authors:  Chang Min Hyun; Seong Hyeon Baek; Mingyu Lee; Sung Min Lee; Jin Keun Seo
Journal:  Med Image Anal       Date:  2021-01-16       Impact factor: 8.545

5.  A spiking neural model of adaptive arm control.

Authors:  Travis DeWolf; Terrence C Stewart; Jean-Jacques Slotine; Chris Eliasmith
Journal:  Proc Biol Sci       Date:  2016-11-30       Impact factor: 5.349

6.  Neuromorphic NEF-Based Inverse Kinematics and PID Control.

Authors:  Yuval Zaidel; Albert Shalumov; Alex Volinski; Lazar Supic; Elishai Ezra Tsur
Journal:  Front Neurorobot       Date:  2021-02-03       Impact factor: 2.650

7.  Nengo: a Python tool for building large-scale functional brain models.

Authors:  Trevor Bekolay; James Bergstra; Eric Hunsberger; Travis Dewolf; Terrence C Stewart; Daniel Rasmussen; Xuan Choo; Aaron Russell Voelker; Chris Eliasmith
Journal:  Front Neuroinform       Date:  2014-01-06       Impact factor: 4.081

8.  A New Artificial Neural Network Approach in Solving Inverse Kinematics of Robotic Arm (Denso VP6242).

Authors:  Ahmed R J Almusawi; L Canan Dülger; Sadettin Kapucu
Journal:  Comput Intell Neurosci       Date:  2016-08-17

Review 9.  A Survey of Robotics Control Based on Learning-Inspired Spiking Neural Networks.

Authors:  Zhenshan Bing; Claus Meschede; Florian Röhrbein; Kai Huang; Alois C Knoll
Journal:  Front Neurorobot       Date:  2018-07-06       Impact factor: 2.650

10.  Neuromorphic Analog Implementation of Neural Engineering Framework-Inspired Spiking Neuron for High-Dimensional Representation.

Authors:  Avi Hazan; Elishai Ezra Tsur
Journal:  Front Neurosci       Date:  2021-02-22       Impact factor: 4.677

