Matthew Dale1,2, Julian F Miller2, Susan Stepney1,2, Martin A Trefzer3,2. 1. Department of Computer Science, University of York, York YO10 5DD, UK. 2. York Cross-disciplinary Centre for Systems Analysis, University of York, York YO10 5DD, UK. 3. Department of Electronic Engineering, University of York, York YO10 5DD, UK.
Abstract
The reservoir computing (RC) framework states that any nonlinear, input-driven dynamical system (the reservoir) exhibiting properties such as a fading memory and input separability can be trained to perform computational tasks. This broad inclusion of systems has led to many new physical substrates for RC. Properties essential for reservoirs to compute are tuned through reconfiguration of the substrate, such as change in virtual topology or physical morphology. As a result, each substrate possesses a unique 'quality'-obtained through reconfiguration-to realize different reservoirs for different tasks. Here we describe an experimental framework to characterize the quality of potentially any substrate for RC. Our framework reveals that a definition of quality is not only useful to compare substrates, but can help map the non-trivial relationship between properties and task performance. In the wider context, the framework offers a greater understanding as to what makes a dynamical system compute, helping improve the design of future substrates for RC.
Reservoir computing (RC) first emerged as an alternative method for constructing and training recurrent neural networks [1,2]. The method primarily involved constructing a random fixed recurrent network of neurons, and training only a single linear readout layer. It was found that random networks constructed with certain dynamical traits could produce state-of-the-art performance without the laborious process of training individual internal connections. The concept later expanded to encompass any high-dimensional, input-driven dynamical system that could operate within specific dynamical regimes, leading to an explosion in new RC substrates.

The ability to perform useful information processing is an almost universal characteristic of dynamical systems, provided a fading memory and linearly independent internal variables are present [3]. However, each dynamical system will tend to suit different tasks, with only some performing well across a range of tasks.

In recent years, RC has been applied to a variety of physical systems such as optoelectronic and photonic [4,5], quantum [6-8], disordered and self-organizing [9,10], magnetic [11,12] and memristor-based [13] computing systems. The way in which each substrate realizes a reservoir computer varies. However, each tends to implement, physically or virtually, a network of coupled processing units.

In each implementation, the concept is to use and exploit the underlying physics of the substrate, embracing intrinsic properties that can improve performance, efficiency and/or computational power. Each substrate is configured, controlled and tuned to perform a desired functionality, typically requiring the careful tuning of parameters in order to produce a working, or optimal, physical reservoir computer for ad hoc problems.

Despite the recent advances of new physical reservoir systems, basic questions for RC are still unanswered. These open problems are summarized and explained in [14].
Relevant questions include: What class of problems can RC solve efficiently? What is the role of heterogeneous structure in RC? What are the limits and benefits of a given physical system for RC? What are the benefits of a physical implementation? To answer these questions, and for the field to move forward, a greater understanding is required of the computational expressiveness of reservoirs and the substrates they are implemented on, if only to determine what tasks, on what substrates, are realistically solvable.

In the terminology used here, a reservoir represents the resulting abstract system and its dynamics instantiated by (typically, but not limited to) a single, typically static, configuration of the substrate. For an artificial recurrent neural network, implemented in silico, configuration may refer to a set of trained connection weights, defined neuron types and topology. For another substrate, configuration may refer to the physical morphology, physical state, external control signals or complexification of the driving input signal. The number of possible reservoir systems realizable by a substrate depends upon the number of free parameters, and the distinct dynamical behaviours resulting from those parameters. For unconstrained substrates, limited only by the laws of physics, this number may be vast. Yet this does not imply that every such configuration and corresponding reservoir is practical or useful. Nor does it imply that each new configuration leads to a different reservoir system; the same or similar dynamical behaviour may be produced by different configurations.
The mapping between substrate configuration and instantiated reservoir may be complex. Here we present a practical framework to measure the computational expressiveness of physical or virtual substrates, providing a method to characterize and measure the RC quality of substrates. A higher-quality substrate is one that can realize more distinct reservoirs through configuration, giving it greater expressiveness, higher dynamical freedom, and so a greater capacity to tackle very different tasks. Quality is quantified and measured here as the number of distinct reservoirs, or dynamical behaviours, a single substrate can exhibit. The number of reservoirs, rather than configurations, is what is important. This does not imply that substrates with fewer available configuration degrees of freedom perform poorly at every task; they may perform very well at specific tasks within their dynamical range, but are likely to perform poorly when evaluated across a broad range of tasks.

To characterize the quality of different substrates, we present the CHARC (CHAracterization of Reservoir Computers) framework. The framework has a basic underlying structure, which can be extended if needed. To demonstrate the framework, it is applied to three different substrates: echo state networks (ESNs) [15], simulated delay-based reservoirs (DRs) [4,16] and a physical carbon nanotube (CNT) composite [9]. The definitions, techniques and substrate-independence of the framework are evaluated using a number of common benchmark tasks.

The rest of the paper describes the framework and the techniques used within it, beginning with a description of the workflow, the task-independent properties and the search procedure used to characterize the substrate.
Framework outline
The basic structure and flow of the framework is presented in figure 1. The complete characterization process is divided into a series of phases and levels. In phase one (P1), a reference substrate is evaluated, forming the basis against which to compare quality values. In phase two (P2), the test substrate is assessed and compared to the reference.
Figure 1.
Framework phases and levels. (Online version in colour.)
Basic levels
The three basic levels required for each phase are definition, exploration and evaluation. Additional levels may be added, providing further functions that can be used to manipulate, model and learn from the data produced by the characterization process. Here, an additional level is used to validate and determine the reliability and substrate-independence of the overall framework; others are also possible, see §2b.

In general, each level requires the results from the previous level. Techniques applied at each level are flexible, and may be substituted with alternative approaches. The techniques and measures used here are simple, and provide a good foundation to demonstrate the framework's concept.

The definition level (P1.1, P2.1) defines the reservoir behaviour space to be explored. The behaviour space represents the abstract behaviour of the configured substrate relative to measures of dynamical properties, and is the space in which quality is measured. The framework uses n measures (see example in figure 2) to define the axes of the n-dimensional behaviour space. See §2c for the measures used here.
Figure 2.
Example of a three-dimensional behaviour space. Here each abstract behaviour is relative to the three chosen property measures. Given enough time to explore the space, the substrate's dynamical/behavioural limitations become apparent.
The exploration level (P1.2, P2.2) measures the quality, by determining how much of the behaviour space is realizable through substrate configurations. An exhaustive search of the substrate's parameter space is infeasible. Instead, the use of diversity search algorithms [17] is recommended. These exploration techniques, based on evolutionary algorithms, can characterize the behaviour range and dynamical freedom of the substrate.

The evaluation level (P1.3, P2.3) estimates quality, using the behaviours discovered in the exploration level. The behaviour space is divided into discrete voxels; the total number of voxels occupied by discovered behaviours provides the final quality value of the substrate. In P2.3, the quality of the test substrate is compared with the quality of the reference substrate from P1.3.
Additional levels providing further functions
Additional levels can be added to the framework to extract further features about the substrate and the behaviour space representation. They need not necessarily relate to the evaluation level (the quality value), and may work independently of it. Example additional levels include: modelling the relationships between the behaviour space and task-specific performances; modelling the relationships between the behaviour space and configuration space. Such relationships can be modelled and learnt using machine learning techniques.

Here, one additional level is created: a learning level (P1.4, P2.4). The learning level is used here to evaluate whether the framework is reliable (that the behaviour metrics capture the underlying reservoir properties) and substrate-independent (that behaviours learned in one substrate can be transferred to a different substrate). To achieve independence, the reliability of the behaviour space representation should be high. In reality, due to noise and translation between the simulated and physical domain, we require reliability above some acceptable threshold.

Further levels building on the exploration and learning levels are also possible. For example, the discovered behaviours can provide a reduced search space from which to rank and find optimal reservoirs for a particular task. As the number of tasks increases, this reduced search space decreases the required time to find good task-specific reservoirs without having to repeatedly search over the full configuration space.
Task-independent properties
In order to form a suitable behaviour space, we need to define each dimension of the space carefully. Some potentially interesting properties are difficult, if not impossible, to transfer across all substrates. For example, measures that require access to the system's internal workings will not transfer to black-box systems. Measures used with the framework must represent only the observable behaviour of the system, independent of its implementation.

In general, the behaviour space should be created using as many uncorrelated measures as possible, representing different computational and dynamical properties. This will improve the reliability of the framework, but results in a larger space to explore, requiring more evaluations to build a useful characterization.

In the work here, three common property measures are taken from the RC literature to form the behaviour space: the kernel rank (KR), the generalization rank (GR) and the memory capacity (MC).

KR is a measure of the reservoir's ability to separate distinct input patterns [18]. It measures a reservoir's ability to produce a rich nonlinear representation of the input u(t) and its history u(t − 1), u(t − 2), …. This is closely linked to the linear separation property, measuring how different input signals map onto different reservoir states. As many practical tasks are linearly inseparable, reservoirs typically require some nonlinear transformation of the input. KR is a measure of the complexity and diversity of these nonlinear operations performed by the reservoir.

GR is a measure of the reservoir's capability to generalize given similar input streams [18]. It attempts to quantify the generalization capability of a learning device in terms of its estimated VC-dimension [19], i.e. how well the learned nonlinear function generalizes to new inputs.
In general, a low GR symbolizes a robust ability to map similar inputs to similar reservoir states, rather than overfitting noise. Reservoirs in ordered dynamical regimes typically have low values of both KR and GR; reservoirs in chaotic regimes have high values of both. A rule-of-thumb is that good reservoirs possess a high KR and a low GR [20]. In terms of matching reservoir dynamics to tasks, the precise balance will vary. A unique trait that physical and unconventional substrates are likely to possess, unlike their more conventional reservoir counterparts, is the ability to feature multiple time-scales, tunable through reconfiguration.

Another important property for RC is memory, as reservoirs are typically configured to solve temporal problems. (A substrate without memory may still be computationally interesting for solving non-temporal problems.) A simple measure of reservoir memory is the linear short-term memory capacity (MC). This was first outlined in [21] to quantify the echo state property. For the echo state property to hold, the dynamics of the input-driven reservoir must asymptotically wash out any information resulting from initial conditions. This property therefore implies that a fading memory exists, characterized by the short-term memory capacity.

A full understanding of a reservoir's memory, however, cannot be encapsulated by a linear measure alone, as a reservoir will also possess some nonlinear memory. Other memory measures proposed in the literature quantify other aspects of memory, such as the quadratic and cross-memory capacities, and the total memory of reservoirs using the Fisher memory curve [3,22]. The linear measure is used here to demonstrate the framework; additional measures can be added as needed.
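As a concrete illustration, the linear MC measure can be sketched in a few lines: train a separate one-shot linear readout to reconstruct each delayed input u(t − k), and sum the squared correlations between target and readout output. The toy "reservoir" below, whose states are exact delays of the input, is our own illustrative stand-in, not a substrate from this paper; its linear MC should approach the number of delay taps.

```python
import numpy as np

def memory_capacity(states, u, max_delay, washout=50):
    """Linear short-term memory capacity: for each delay k, train a
    one-shot linear readout to reconstruct u(t - k) from the reservoir
    states, and sum the squared correlations of target vs. output."""
    n = len(u)
    # Readout input: reservoir states plus a bias column
    A = np.hstack([states[washout:], np.ones((n - washout, 1))])
    mc = 0.0
    for k in range(1, max_delay + 1):
        target = u[washout - k : n - k]          # delayed input u(t - k)
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        y = A @ w
        mc += np.corrcoef(target, y)[0, 1] ** 2  # capacity at delay k
    return mc

# Toy "reservoir" whose states are exact delays of the input; its linear
# MC should be close to the number of delay taps (here, 5).
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 2000)
states = np.column_stack([np.roll(u, k) for k in range(1, 6)])
mc = memory_capacity(states, u, max_delay=10)
print(round(mc, 2))
```

The washout discards initial transients (and, in this toy case, the wrap-around rows from `np.roll`) before the readout is trained.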
Behaviour space exploration
To characterize the reservoir behaviour space, the search must explore without optimizing towards any particular property values. A balance between properties is essential to match reservoir dynamics to tasks; however, determining the right balance is challenging. During the characterization process, the exact balance required for specific tasks is irrelevant. Instead, the focus is to explore and map the space of possible trade-offs the substrate can exhibit, and use this to determine substrate quality.

For the framework to function, the mapped reservoir behaviour space requires substrate-independence, so the exploration cannot be conducted, or measured, in the substrate-specific parameter space. The exploration must also be able to function without prior knowledge of how to construct reservoirs far apart from each other in the behaviour space, as diversity in observed dynamics is not easily related to diversity in substrate-specific parameters.

Here, exploration is performed using the open-ended novelty search (NS) algorithm [23-25], one of several possible diversity algorithms [17]. NS increases the selection pressure of an underlying evolutionary algorithm towards novel behaviours far apart in the behaviour space. The full details of our NS implementation are given in appendix A.
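A minimal sketch of the NS selection pressure (not our full implementation, which is given in appendix A) is shown below. Novelty is scored as the mean distance to the k nearest neighbours in an archive of previously seen behaviours; sufficiently novel behaviours and their configurations are retained. The two-dimensional configuration-to-behaviour map used here is a toy stand-in for a real substrate.

```python
import numpy as np

def novelty(b, archive, k=5):
    """Novelty of behaviour b: mean distance to its k nearest
    neighbours in the archive of previously seen behaviours."""
    d = np.sort(np.linalg.norm(archive - b, axis=1))
    return d[:k].mean()

rng = np.random.default_rng(1)
behaviour = lambda cfg: np.tanh(cfg)          # toy configuration -> behaviour map

pop = rng.normal(0.0, 0.1, (10, 2))           # initial random configurations
archive = np.array([behaviour(c) for c in pop])
for _ in range(300):
    parent = pop[rng.integers(len(pop))]
    child = parent + rng.normal(0.0, 0.3, 2)  # mutate the configuration
    b = behaviour(child)
    if novelty(b, archive) > 0.05:            # novel enough: keep it
        archive = np.vstack([archive, b])
        pop = np.vstack([pop, child])
print(len(archive))
```

Because selection rewards distance in behaviour space rather than fitness, the archive spreads towards the boundaries of what the toy map can express, which is exactly the coverage behaviour the framework relies on.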
Phase one: reference substrate
Phase one establishes a suitable reference substrate to compare against. Here, we use recurrent neural networks (RNNs) that closely resemble ESNs [21] as the reference. These are well-established, state-of-the-art reservoir ‘substrates’. RNNs are flexible, universal approximators of dynamical systems [26], producing a vast range of dynamics when reconfigured.

For a standard ESN, the reservoir state update equation x(t) is
x(t) = f( Win u(t) + W x(t − 1) + Wfb y(t − 1) ),
where f is the neuron activation function (typically a sigmoid), and Win, W and Wfb are matrices of connection weights from the input to the internal neurons (Win), between internal neurons (W) and from the output back to the internal neurons (Wfb). In many cases, the feedback weights Wfb are unused, and the other weight matrices are drawn from a random distribution and then scaled globally.

The final trained output y(t) is given by combining the reservoir states x(t) with the trained readout layer Wout:
y(t) = Wout x(t).
Training of the readout is typically carried out in a supervised way using one-shot linear regression with a teacher signal. A practical guide to creating and training ESNs is given in [27].
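The two equations above, together with one-shot linear regression training of Wout, can be sketched as follows. The feedback weights Wfb are unused, all parameter values are illustrative, and the delay-recall task is a toy example rather than one of the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 50, 2000                               # reservoir size, sequence length

# Random input and internal weights; W rescaled to spectral radius 0.9
Win = rng.uniform(-0.1, 0.1, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

# Drive the reservoir and collect states: x(t) = tanh(Win u(t) + W x(t-1))
u = rng.uniform(-1, 1, T)
x = np.zeros(N)
X = np.zeros((T, N))
for t in range(T):
    x = np.tanh(Win * u[t] + W @ x)
    X[t] = x

# One-shot linear regression of the readout on a toy task: recall u(t - 1)
washout = 100
target = u[washout - 1 : T - 1]
Wout, *_ = np.linalg.lstsq(X[washout:], target, rcond=None)
y = X[washout:] @ Wout
nmse = np.mean((y - target) ** 2) / np.var(target)
print(f"NMSE = {nmse:.4f}")
```

Only Wout is trained; the random input and internal weights stay fixed, which is the defining trait of the RC training scheme.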
Demonstrating and validating the framework
In a typical use of the framework, one would now perform the various levels of phase one to characterize the ESN reference substrate. Here we do more, performing several experiments to demonstrate why certain choices have been made, to explore the framework in action, and to determine the reliability of the results.

Four sizes of ESN are used to demonstrate the framework: networks of 25, 50, 100 and 200 nodes. This small spectrum from simple to complex reservoirs provides a useful test suite. Each size is a constrained version of the general ESN substrate, and exhibits a different range of dynamical behaviours.
Novelty versus random search
Here we apply the exploration process P1.2, and evaluate the use of NS by comparing it to random search, determining its usefulness for characterizing a substrate. If NS performs well, discovering a greater variation in behaviours than random search within the same time and across network sizes, we argue it will continue to be advantageous for different substrates.

First, we compare NS and random search visually. The hypothesis is that NS can cover a greater volume of the behaviour space within the same number of search evaluations.

The results of this experiment show that, for every network size, NS expands further in all dimensions of the behaviour space. In figure 3, the explored spaces of the 50 and 200 node ESNs using both search techniques are plotted. In total, approximately 20 000 configurations from 10 separate runs are displayed.
Figure 3.
Behaviours discovered using NS (red, top row) and random search (black, bottom row) for network sizes of (a) 50 nodes, (b) 200 nodes. The three-dimensional behaviour space is shown as three projections. (Online version in colour.)
Random search (black, bottom row), which selects weights and scaling parameters from uniform random distributions, produces similar patterns in the behaviour space across all network sizes. These patterns show sparse regions that are difficult to discover, and dense areas that are frequently visited despite different configuration parameters. As network size increases, random search finds it increasingly challenging to cover the behaviour space uniformly, suggesting it becomes less effective as substrate complexity increases.

NS (red, top row) covers the behaviour space more uniformly, filling sparse regions and expanding into areas undiscovered by random search. It does this within the same number of network evaluations, showing itself to be a more effective search technique than simply sampling the configuration space from a uniform random distribution.
Quality measure
Here we perform the evaluation process P1.3 on the behaviours discovered by NS above, in order to evaluate the voxel-based measure proposed to quantify the coverage of the behaviour space, and thus quality.

To measure quality and coverage of the behaviour space, standard statistical measures of dispersion such as the standard deviation, mean absolute deviation and interquartile range are not suitable by themselves: they downplay outliers, whereas the aim is to measure both the volume and the boundaries of the region explored. For this reason, a voxel-based measure is adopted here. Discovered behaviour instances occupying the same voxel are counted once, thereby grouping similarly behaved reservoirs as a single behaviour voxel.

In our three-dimensional example, the discovered behaviours define the bounds of the measurable behaviour space: a large cube. The space is then discretized and partitioned into smaller voxel cubes. The smallest possible voxel size is 1 × 1 × 1: the smallest discretized value of the KR and GR property measures.

Voxel size needs to be chosen carefully in order to accurately compare substrates. If the voxel size is too small, every occupied voxel will contain exactly one explored reservoir behaviour, and the quality measure will merely record the number of search points evaluated. If the voxel size is too large, the quality measure will be too coarse-grained to make distinctions.

Experiments to investigate the effect of voxel size are given in appendix E. These lead us to choose a voxel size of Vsize = 10 × 10 × 10 for the rest of this paper.

The quality of a tested substrate is equal to the final coverage value. As voxel size and total number of evaluations both affect this value, the reference and test substrates should be compared using the same framework parameters.
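The voxel-based coverage measure amounts to discretizing each discovered behaviour triple and counting the distinct voxels occupied, as in this sketch (the behaviour values are made up for illustration):

```python
import numpy as np

def quality(behaviours, voxel_size=10):
    """CHARC-style coverage: discretize the behaviour space into voxels
    of the given side length and count the voxels occupied by at least
    one discovered behaviour (duplicates in a voxel count once)."""
    voxels = np.floor(np.asarray(behaviours) / voxel_size).astype(int)
    return len({tuple(v) for v in voxels})

# Hypothetical discovered behaviours as (KR, GR, MC) triples
b = [(3, 1, 2.5), (4, 2, 3.0),    # these two share a 10x10x10 voxel
     (15, 1, 2.0), (3, 1, 22.0)]
print(quality(b))                 # 3 occupied voxels
```

Shrinking the voxel size towards 1 × 1 × 1 makes the measure approach a count of individual search points, which is why voxel size must be fixed when comparing substrates.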
Reliability of the behaviour space
In the last part of phase one addressed here, P1.4, the reliability of the behaviour space is measured, to demonstrate that the framework produces usable results. The outcome of this measure is also used to determine that the behaviour space is independent of the substrate implementation (P2.4, §4d). If the reliability is poor, independence is difficult to measure and interpret.

To assess reliability and independence, concepts such as the representation relation and commuting diagrams from Abstraction/Representation (A/R) theory [28] are adapted to form a testable hypothesis. A/R theory proposes a framework to define when a physical system computes. Using its concepts, one can assess whether an abstract computational model reliably represents the computation performed by a physical system.

Our hypothesis for the framework is that, if the abstract behaviour space is truly representative of the system dynamics, and independent of its implementation, then similar behaviours across substrates should produce similar task performances. This hypothesis was conceived using A/R commuting diagrams as a template: if the computational model faithfully represents the computation of the physical system, one can predict how the physical system states will evolve.

To test the hypothesis, first the relationship between task performance and reservoir behaviour is modelled. The reliability of this model, measured as its prediction error, indicates how well the behaviour space captures the computation occurring within the substrate.

As explained in [14], relating property measures to expected performance across many tasks is a non-trivial problem, as good properties for one task are often detrimental to another. Therefore, no single set of measured values will lead to high performance across all tasks.
However, the relationship between behaviour measure values and a single task is often simpler to determine; these are the relationships to be modelled.

To create the prediction model, four common RC benchmark tasks are selected: the nonlinear autoregressive moving average (NARMA) task with a 10th- and a 30th-order time-lag; the Santa Fe laser time-series prediction task; and the nonlinear channel equalization (NCE) task. Each task requires a different set of reservoir properties. Full details of the tasks are provided in appendix B.

The modelling process of matching behaviours to task performances is framed as a regression problem. The model is created using standard feed-forward neural networks (FFNNs), trained using a sample of the behaviours discovered in the exploration process and their evaluated task performances. The inputs of the FFNNs are MC (continuous-valued), KR and GR (discrete-valued). The output of the FFNNs is the predicted task performance (continuous-valued) of each behaviour, measured as the normalized mean squared error (NMSE).

The prediction error of the FFNNs is measured on the test sample, as the root mean squared error (RMSE) between the predicted NMSE and the behaviour's actual evaluated NMSE for a given task. That is, the prediction error PE is
PE = √[ (1/N) Σ_{t=1}^{N} ( ptp(t) − atp(t) )² ],
where N is the size of the test set, ptp is the predicted task performance NMSE, and atp is the actual task performance NMSE.

In the experiment, multiple FFNNs of the same size are trained per task, and per substrate network size (see appendix F for experimental details). If the behaviour space provides a reliable representation, the mean prediction error of the trained FFNNs should be low, since reliability implies that a strong relationship is present, that it is not too difficult to model, and that it is similar when network size changes.

Some difference in prediction error is present between models trained with different network sizes. This is due to different behavioural ranges, resulting in an increase or decrease in the complexity of the modelled relationships. For example, reservoirs in the region of the behaviour space with KR = GR = MC ≤ 25 tend to have similarly poor performances on the NARMA-30 task because they do not meet a minimum memory requirement (MC ≥ 30). This means the task is easier to model for small networks, as performance tends to be equally poor for all behaviours. Similarly, when larger ESNs are used to model the relationship, the prediction error will likely increase, as the distribution of errors widens and the complexity of the model increases. Patterns such as this are task-dependent, adding another level of complexity to the modelling process. For some tasks, reliably modelling the relationship requires a greater variety of behaviours than smaller ESNs can provide. Therefore, FFNNs trained on the behaviour space of a 200 node network perform better than those provided by the smaller networks, despite the apparent increase in complexity.

The mean prediction errors of the FFNNs, for each task and substrate size, are shown in figure 4. Overall, the prediction errors are low, with typical values of less than 0.16. Depending on the task, errors increase or decrease as substrate network size increases.
The prediction error for task 3 (Santa Fe laser) and task 4 (nonlinear channel equalization) decreases with substrate size, suggesting the model improves when trained using a larger variety of behaviours. However, these two tasks are particularly challenging to model (with a typical RMSE > 0.1) because of outliers in the training data coming from both poor (high task error) reservoirs and very good (low task error, NMSE ≪ 0.1) reservoirs.
Figure 4.
Mean prediction error (RMSE) of FFNNs across all tasks and ESN sizes. Task 1: NARMA-10, Task 2: NARMA-30, Task 3: Santa Fe laser, and Task 4: nonlinear channel equalization. (The spread in values across the 10 FFNNs evaluated at each point is too small to see on this plot.)
For the NARMA tasks, task 1 (NARMA-10) and task 2 (NARMA-30), the prediction error increases as network size increases. Prediction accuracy of the model therefore tends to degrade when trained with larger behaviour spaces, in contrast to tasks 3 and 4. However, this increase in error happens as the variation in task performances increases, mirroring the same modelling problem seen for tasks 3 and 4. The lowest task errors for NARMA-10 drop from an NMSE ≈ 0.13 to an NMSE ≈ 0.01 as size increases. The same occurs for the NARMA-30 task, with the lowest errors decreasing from an NMSE ≈ 0.48 to an NMSE ≈ 0.14.

From these results, a strong correlation emerges between the variance in task performance (NMSE) of each behaviour space and the prediction error (RMSE). This suggests refocusing the learning process: instead of trying to reliably model all behaviours, including the poor reservoirs, try to reliably model and predict only the best performing behaviours. The additional experiments in appendix F show the effect of this refocusing. The RMSE is significantly reduced when modelling only behaviours below a task performance error threshold, rather than all behaviours. The results show the behaviour representation and model are most reliable when representing only the better-performing behaviours.

Overall, the results of this evaluation step suggest the behaviour space provides a sufficiently reliable representation of the substrate's computational capabilities. However, given that the chosen behaviour measures are known not to capture all the interesting dynamics of the system, there is room to improve both the behaviour representation and the modelling process.
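The evaluation itself reduces to fitting a regression model on behaviour-performance pairs and computing PE on held-out data. The sketch below uses a simple linear least-squares model as a stand-in for the paper's FFNNs, and an entirely synthetic behaviour-to-NMSE dataset invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic dataset: behaviours as (KR, GR, MC) triples and fabricated
# task NMSEs with a simple underlying trend plus noise (illustrative only)
B = rng.uniform(0, 50, (300, 3))
nmse = (0.5 - 0.004 * B[:, 0] + 0.003 * B[:, 1] - 0.002 * B[:, 2]
        + rng.normal(0, 0.01, 300))

train, test = slice(0, 200), slice(200, 300)
A = np.hstack([B, np.ones((300, 1))])         # add a bias column

# Stand-in for the FFNN: one-shot linear least squares on the train split
w, *_ = np.linalg.lstsq(A[train], nmse[train], rcond=None)

ptp = A[test] @ w                              # predicted task performance
atp = nmse[test]                               # actual task performance
pe = np.sqrt(np.mean((ptp - atp) ** 2))        # PE as test-set RMSE
print(f"PE = {pe:.3f}")
```

A low PE here simply reflects that the synthetic relationship is easy to model; on real behaviour spaces the residual PE is what indicates how faithfully the three measures capture the substrate's computation.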
Phase two: test substrates
Phase two of the framework, P2, evaluates the test substrate(s) against the phase one reference substrate. The behaviour space of the test substrate is explored in level P2.2; the quality is determined relative to the reference in level P2.3; and the framework's substrate-independence is evaluated in level P2.4.

To demonstrate and evaluate the framework, two test substrates are characterized here: a simulated delay-based reservoir, and a physical CNT-based system. Each chosen substrate poses a unique challenge for the framework, including differences in implementation (simulated or physical), structure (spatial or temporal) and levels of noise.
Delay-based reservoir
The first substrate to be characterized is based on the delay-based reservoir (DR) system [4,16,29], using a single nonlinear node and a delay line. This reservoir system mimics the structure of a recurrent network of coupled processing nodes in the time domain rather than spatially: by applying time multiplexing and nonlinear mixing to the input signal, a virtual network of processing nodes is created. To date, DRs have produced excellent performances across different RC benchmarks [4,30,31].

Delay-feedback dynamical systems possess high-dimensional state spaces and tunable memory, making them ideal candidates for RC. The dynamics of these systems are typically modelled using delay differential equations of the type
ẋ(t) = f( x(t), x(t − τ), J(t) ),
where t is time, τ is the delay time, f is the nonlinear function and J(t) is the weighted and time-multiplexed input signal u(t).

The DR technique is popular for optical and optoelectronic dynamical systems as it enables the exploitation of properties unique to these systems. It also provides a simple structure to overcome technical hardware challenges: it exploits high bandwidth and ultra-high speeds, and removes the demanding requirement of large, complex physical networks. The technique, however, is not limited to these systems. It also offers a novel approach to implementing networks efficiently on other hardware platforms. This is particularly useful when few inputs and outputs are available, creating the required state and network complexity in the time domain to solve tasks. Examples include electronic circuits [4,32], Boolean nodes on a field-programmable gate array (FPGA) [33], a nonlinear mechanical oscillator [34] and spin-torque nano-oscillators [8]. The DR technique also has potential shortcomings, including a serialized input, the pre-processing required on the input and limits determined by the length of the delay line. To overcome some of these shortcomings, more complex architectures with multiple time-delay nodes have been proposed, leading to improved performances compared to single-node architectures [35].

The DR system characterized here consists of a simulated Mackey–Glass oscillator and a delay line, inspired by Appeltant et al. [4]. The same system was also realized physically as an analogue electronic circuit in [4]. Details of the implementation of the Mackey–Glass system and the time-multiplexing procedure are provided in appendix Cc.
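As a minimal illustration of the time-multiplexing idea, the sketch below simulates a Mackey–Glass-type delay node with a simple Euler step. All parameter values (mask weights, node separation θ, η, γ, exponent p) and the integration scheme are illustrative assumptions, not the settings of the system characterized here (see appendix Cc).

```python
import numpy as np

def delay_reservoir(u, n_virtual=50, theta=0.2, eta=0.4, gamma=0.05, p=2):
    """Single-node delay-line reservoir sketch.

    Each input sample u[k] is time-multiplexed over n_virtual virtual
    nodes via a fixed random binary mask; the node state is integrated
    with an Euler step of size theta. Returns the (len(u), n_virtual)
    matrix of virtual node states.
    """
    rng = np.random.default_rng(1)
    mask = rng.choice([-0.1, 0.1], size=n_virtual)   # random input mask
    x = np.zeros(n_virtual)                          # one delay tau of history
    states = np.empty((len(u), n_virtual))
    for k, uk in enumerate(u):
        J = mask * uk                                # masked, multiplexed input
        x_new = np.empty(n_virtual)
        for i in range(n_virtual):
            z = x[i] + gamma * J[i]                  # delayed state + input
            drive = eta * z / (1 + z ** p)           # Mackey-Glass nonlinearity
            prev = x_new[i - 1] if i > 0 else x[-1]  # preceding virtual node
            x_new[i] = prev + theta * (-prev + drive)
        x = x_new
        states[k] = x
    return states

u = np.sin(np.linspace(0, 8 * np.pi, 200))           # toy input stream
states = delay_reservoir(u)
```

A linear readout trained on `states` would complete the reservoir computer; here only the state generation is sketched.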
Physical carbon nanotube-based reservoir
The second substrate to be characterized is a physical material deposited on a micro-electrode array. The substrate is electrically stimulated and observed using voltage signals, and configured through the selection of input and output locations on the array. The material is a mixed CNT–polymer composite, forming random networks of semi-conducting nanotubes suspended in an insulating polymer. The material has been applied to, and performs well on, several computational problems including function approximation, the travelling salesman problem and robot control [36,37]. However, the material has so far produced only modest performances on challenging RC benchmark tasks [9]. As a reservoir, the material has been shown to perform well on simple tasks, but struggles to exhibit strong nonlinearity and sufficient memory for more complex tasks [38,39].

In previous work [9,38,39], a small amount of characterization was carried out on different CNT-based reservoirs, showing that even the best fabricated material (a 1% concentration of CNTs by weight mixed with poly-butyl-methacrylate) typically exhibits low MC, despite different biasing and stimulation methods for configuration. The right concentration and arrangement of CNTs, and the method for stimulating and observing the material, are still open questions. So far, the methods and materials used have led to overall modest performances on benchmark tasks such as NARMA-10 [9] and the Santa Fe laser time-series prediction task [38], but encouraging ones when the number of inputs and outputs is taken into account.

Characterizing a black-box material like the CNT composite is challenging because of its disordered structure and stochastic fabrication process, making it impractical (or even impossible in the general case) to model its exact internal workings.
Originally, the CNT–polymer composite was proposed as a sandpit material to discover whether computer-controlled evolution could exploit a rich partially constrained source of physical complexity to solve computational problems [40]. Because of its physicality, with somewhat unknown computing properties, it provides a challenging substrate for the CHARC framework to characterize. Further details about the CNT-based substrate and its parameters are provided in appendix Cb.
Quality of test substrates
A visualization of exploration level P2.2 and the results of the evaluation level P2.3 for each substrate are presented here. Similar to phase one, the quality of each substrate is calculated as the total number of voxels occupied after 2000 search generations. Figure 5 shows the total number of occupied voxels after every 200 generations, with error bars displaying the min–max values for different evolutionary runs.
Figure 5.
Voxel measure of coverage as number of generations increase. Test substrates are shown as black lines, reference substrates are dashed grey lines. Error bars display the min-max values for all search runs. Note the logarithmic coverage scale.
The differences in behavioural freedom between the DR, CNT and ESN substrates are significant. Using the voxel measure, we can determine which of the reference substrates are close equivalents in terms of quality to the test substrates. At the beginning of the search process, the DR appears similar in quality to an ESN with 100 nodes, while the CNT has a considerably smaller quality than the ESN of 25 nodes. As the number of search generations increases, the DR's coverage increases rapidly, reaching a final value close to that of an ESN with 200 nodes, yet the CNT struggles to increase its coverage. The rates at which behaviours are discovered for the DR and CNT are very telling, suggesting it is much harder to discover new behaviours for the CNT than for the DR. This increased difficulty could imply the bounds of the substrate's behaviour space have almost been met: as the discovery rate of novel behaviours decreases, either the search is stuck exploiting a niche area, or it has reached the boundaries of the whole search space.

A visual inspection of the covered behaviour spaces provides a more detailed understanding of the final quality values. The discovered behaviours for both substrates are shown in figure 6. In each subplot, the behaviours of the test substrate (DR in figure 6a, CNT in figure 6b) are presented in the foreground, and the reference substrate with the most similar quality (200 node ESN in figure 6a, 25 node ESN in figure 6b) is placed in the background.
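The voxel-based quality measure itself is straightforward to compute: discretize the three-dimensional behaviour space into cubic cells and count how many contain at least one discovered behaviour. A minimal sketch, where the voxel size and the toy behaviour points are illustrative assumptions:

```python
import numpy as np

def voxel_coverage(behaviours, voxel_size=10.0):
    """Quality as the number of occupied voxels: each behaviour point
    (e.g. a KR, GR, MC triple) is assigned to a cubic cell of edge
    voxel_size, and distinct occupied cells are counted."""
    cells = np.floor(np.asarray(behaviours, dtype=float) / voxel_size).astype(int)
    return len({tuple(c) for c in cells})

# Two toy behaviour sets: a spread-out set occupies more voxels,
# i.e. exhibits greater behavioural freedom (higher quality)
clustered = [[1, 2, 3], [4, 5, 6], [2, 2, 2]]
spread = [[1, 2, 3], [40, 50, 60], [80, 20, 90]]
```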
Figure 6.
Behaviours discovered when exploring the ESN, CNT and DR substrates. To visually compare substrates, each test substrate is plotted over the reference substrate with the most similar quality. (a) 200 node ESN (light grey) with DR (dark grey), (b) 25 node ESN (light grey) with CNT (dark grey).
In figure 6a, the DR behaviours extend into regions that the 200 node ESN cannot reach and, as a consequence, only sparsely occupy the regions occupied by the ESN. Given more search generations, these sparse regions would likely be filled, as similar behaviours have already been discovered.

The DR struggles to exceed the MC of the 200 node ESNs, or to exhibit a KR or GR beyond 300, despite having 400 virtual nodes. This could indicate that increasing the number of virtual nodes does not necessarily lead to greater memory or dynamical variation, a feature more typical of ESNs (figure 11d). However, the virtual network size is not an isolated parameter; the time-scale and nonlinearity of the single node, and the separation between virtual nodes, all play an important role in reservoir dynamics.
Figure 11.
Average coverage (over 10 runs) of behaviour space against number of generations. Four network sizes and the two search methods are shown: NS (black, solid line) and random search (grey, dashed line). Error bars show minimum and maximum coverage. Quality is given for all network sizes, for both random and NS. (a) V_size = 1 × 1 × 1, (b) V_size = 5 × 5 × 5, (c) V_size = 10 × 10 × 10, (d) compared behaviour spaces.
In figure 6b, the CNT exploration process struggles to find behaviours with MC > 5, reaching what appears to be an MC limit. The highest discovered KR and GR values are also small, tending to be lower than (almost half) their possible maximum values, i.e. the total number of electrodes used as outputs. This suggests the substrate struggles to exhibit enough (stable) internal activity to create a strong nonlinear projection and to effectively store recent input and state information, agreeing with previous results [9,38,39]. The results here also highlight why only a limited range of tasks is suitable for the substrate, and why small ESNs tend to be good models of it.

These results show that the CNT substrate in its current form features a limited set of behaviours, explaining its usefulness for only a small class of problems. The DR system features greater dynamical freedom, implying it can perform well across a larger set of problems. The coverage of this particular Mackey–Glass system is similar to that of large ESNs, explaining why such systems can closely match the performance of ESNs across the same class of problems [4,41].
Prediction transfer
The final level here, P2.4, evaluates the substrate-independence of the framework. To do this, we evaluate the transferability of the learnt relationships (FFNNs) from level P1.4 by measuring their prediction accuracy when tasked to predict a different substrate. We evaluate how well the trained models (FFNNs) of the reference substrates predict the performance of the other reference substrates, i.e. predict the task performance of different ESN sizes.

Figure 7 shows the mean prediction error (RMSE) of all FFNNs for every predicted substrate. Each dashed line represents a predicted substrate. The x-axis represents the FFNNs trained on different reference network sizes; four sizes are shown for each task, corresponding to FFNNs trained using ESNs of 25, 50, 100 and 200 nodes. The y-axis is the prediction error (RMSE) of each model for each substrate.
Figure 7.
Prediction error (RMSE) of the learned models (FFNNs) from P1.3 when set to predict the task performance of other substrates. The modelled reference substrate (i.e. ESN size) used for the prediction is given on the x-axis and the test substrate is given in the legend.
The results show that the models trained with smaller network sizes tend to predict the task performance of larger networks poorly across all tasks. This makes intuitive sense: the smaller network models are trained without any data examples beyond their own behavioural limits, and thus cannot make accurate predictions for larger networks.

The models trained with larger network sizes tend to predict the smaller networks fairly well. The best predictions occur when the model is trained and tested using the same network size. Considering the variation in task performance as size increases, and the fewer training examples within the specific regions occupied by smaller network sizes, prediction appears to be reasonably robust when using the largest explored reference substrate.

The model of the largest network (200 nodes) tends to better predict the DR, on average resulting in the lowest prediction errors. For the CNT, models of all network sizes result in low prediction errors for most tasks, except the nonlinear channel equalization task; prediction error for this task, however, continues to improve as network size increases. Given these results, we argue that a reference substrate with high quality will tend to provide a good prediction of lower-quality substrates.

Figure 8 summarizes the results of the substrate-independence experiment. It plots the difference (Δ) between the best prediction error and the test substrate's prediction error.
When the overall prediction error is low and the difference (Δ) is close to zero, the relationship between behaviour and task performance is strong, and thus the abstract behaviour space reliably represents underlying computational properties, independent of the substrate's implementation.
Figure 8.
Difference (Δ) between best (self-)prediction and test prediction for CNT and DR substrates.
Figure 8 plots Δ for the two test substrates on all four benchmark tasks. Each bar signifies the difference Δ between the best prediction error (from the model trained and tested with the same network size) and that of the trained model used to predict the test substrate. The results show that, on average, the CNT tends to give the smallest Δ with models of smaller networks. For the DR, the model of the largest network tends to provide Δ values closest to zero.

Overall, the low and similar prediction errors across substrates indicate that the CHARC framework has a good level of substrate independence. The results also highlight the non-trivial nature of modelling the task–property relationship, with some tasks being more difficult to model and predict than others. Although not the original purpose of this level, this demonstrates that one could roughly predict the task performances of newly characterized substrates, or potentially even test new tasks using a trained model, without having to evaluate the test substrate directly. This feature of the framework is potentially beneficial for hardware systems where training can be time- and resource-intensive.
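The quantities compared in figure 8 reduce to two simple computations: the RMSE of each model's predictions, and the difference Δ from the best self-prediction. A sketch with made-up prediction and observation values (not data from the experiments):

```python
import numpy as np

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed task performances."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def transfer_delta(best_self_rmse, test_rmse):
    """Delta between the best self-prediction error and the error when the
    same model predicts a different (test) substrate; values near zero
    suggest the behaviour space transfers across implementations."""
    return test_rmse - best_self_rmse

# Made-up example: a model's error on its own substrate vs a test substrate
self_rmse = rmse([0.10, 0.20, 0.30], [0.12, 0.18, 0.33])
test_rmse = rmse([0.10, 0.20, 0.30], [0.15, 0.25, 0.40])
delta = transfer_delta(self_rmse, test_rmse)
```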
Conclusion
A fundamental question in RC is: for a given task, what characteristics of a dynamical system or substrate are crucial for information processing? The CHARC framework tackles this question by focusing on the characteristic behaviour of the substrate rather than its performance on a specific task. In the process, two non-trivial problems were addressed: (i) how to characterize the quality of a substrate for RC, and (ii) how computational properties relate to performance.

To use the framework, two phases must be completed. In the first phase, the basic levels (definition, exploration and evaluation) are applied to a reference substrate, providing context for future quality characterizations of other substrates. In the second phase, the test substrate is explored, characterized and compared.

The presented framework is flexible, allowing new computational measures, techniques and additional high-level functions to be added. In this work, we have proposed and demonstrated just one possible high-level function, which models the challenging relationships between tasks and computational properties. This is used to predict the task performance of a substrate given its task-independent behaviour.

Using the framework, we have shown that exploration through open-ended evolution can be a powerful tool for outlining the limitations and capabilities of a substrate. This explains why a CNT-based composite can solve some simple computational tasks but often struggles to compete with more complex reservoir substrates. It also shows why DRs compare so favourably to ESNs: they have similar behavioural quality.

The characterization process of CHARC has many potential future applications, for example: assessing the effect that structure, topology and complexity have on dynamical freedom; using quality to guide, understand and explore substrate design; and, eventually, the design of suitable computational models.
Ultimately, this can open the door for the co-design of both computational model and substrate to build better, more efficient unconventional computers.