Paolo Pagliuca, Nicola Milano, Stefano Nolfi.
Abstract
In this paper we systematically compare the most promising neuroevolutionary methods and two new original methods on the double-pole balancing problem with respect to three criteria: the ability to discover solutions that are robust to variations of the environment, the speed with which such solutions are found, and the ability to scale up to more complex versions of the problem. The results indicate that the two original methods introduced in this paper and the Exponential Natural Evolution Strategy (xNES) method substantially outperform the other methods on all the considered criteria. The results collected in different experimental conditions also reveal the importance of regulating the selective pressure and of exposing evolving agents to variable environmental conditions. The data collected and the results of the comparisons are used to identify the most effective methods and the most promising research directions.
Year: 2018 PMID: 30020942 PMCID: PMC6051599 DOI: 10.1371/journal.pone.0198788
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The double-pole balancing problem.
Table 1. Initial states used in the different trials carried out in the Fixed Initial States condition.
| Trial | x | ẋ | θ1 | θ̇1 | θ2 | θ̇2 |
|---|---|---|---|---|---|---|
| 1 | -1.944 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1.944 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | -1.215 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1.215 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | -0.10472 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0.10472 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | -0.135088 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0.135088 | 0 | 0 |
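Table 1 follows a simple pattern: each trial perturbs a single state variable to the negative or positive value listed, leaving the other variables at zero, and only the first four variables are ever perturbed. A minimal Python sketch that regenerates the eight states, assuming the state ordering (x, ẋ, θ1, θ̇1, θ2, θ̇2); `PERTURBATIONS` and `fixed_initial_states` are illustrative names, not taken from the paper's code.

```python
# Magnitudes copied from Table 1. The state ordering is an assumption:
# (x, x_dot, theta1, theta1_dot, theta2, theta2_dot).
PERTURBATIONS = [1.944, 1.215, 0.10472, 0.135088]
STATE_SIZE = 6


def fixed_initial_states():
    """Return the 8 initial states: one variable at -v, then +v, others 0."""
    states = []
    for index, magnitude in enumerate(PERTURBATIONS):
        for sign in (-1.0, 1.0):
            state = [0.0] * STATE_SIZE
            state[index] = sign * magnitude
            states.append(state)
    return states


if __name__ == "__main__":
    for trial, state in enumerate(fixed_initial_states(), start=1):
        print(trial, state)
```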
Table 2. Range of the state variables used to set the initial state in the Randomly Varying Initial States condition.
| Variable | Min | Max |
|---|---|---|
| x | -1.944 | 1.944 |
| ẋ | -1.215 | 1.215 |
| θ1 | -0.10472 | 0.10472 |
| θ̇1 | -0.135088 | 0.135088 |
| θ2 | -0.10472 | 0.10472 |
| θ̇2 | -0.135088 | 0.135088 |
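For the Randomly Varying Initial States condition, the captions give only the ranges, not the sampling distribution. The sketch below assumes each state variable is drawn independently and uniformly within its [min, max] interval, using the ordering (x, ẋ, θ1, θ̇1, θ2, θ̇2); `STATE_RANGES` and `sample_initial_state` are hypothetical names.

```python
import random

# (min, max) per state variable, from Table 2. Ordering assumed to be
# (x, x_dot, theta1, theta1_dot, theta2, theta2_dot).
STATE_RANGES = [
    (-1.944, 1.944),
    (-1.215, 1.215),
    (-0.10472, 0.10472),
    (-0.135088, 0.135088),
    (-0.10472, 0.10472),
    (-0.135088, 0.135088),
]


def sample_initial_state(rng):
    """Draw one initial state; uniform sampling per variable is an assumption."""
    return [rng.uniform(low, high) for low, high in STATE_RANGES]


rng = random.Random(42)  # seeded so the sampled states are reproducible
print(sample_initial_state(rng))
```

Passing a seeded `random.Random` instance keeps the sampled states reproducible across replications.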
Table 3. Evaluations required to solve the classic double-pole problem, averaged over 50 replications of the experiments, and percentage of replications that successfully solved the problem (i.e. that found a network capable of balancing the poles until the end of the trial).
| Classic Double-Pole | Evaluations |
|---|---|
| xNES | 395 |
| PSHC | 563 |
| CoSyNE | 1257 |
| SSS | 1557 |
| CGPANN | 6885 |
| NEAT | 7743 |
In the methods with fixed topologies, we used fully recurrent architectures. The data reported in the table are those obtained with the best combination of parameters: xNES (fixed positions: LearningRate 0.5, NumHiddens 0), PSHC (MutRate 50%, Stochasticity 10%, Interbreeding 10%, NumHiddens 1), CoSyNE (MutRate 90%, SubPopulations 5, NumHiddens 1), SSS (fixed positions: MutRate 50%, Stochasticity 20%, NumHiddens 1), CGPANN (MutRate 3%, NumIncomingConnections 8, NumHiddens 2), and NEAT (popSize 100, NumHiddens 1.8; see Stanley and Miikkulainen, 2002, for the other parameters). In the case of NEAT, the number of hidden neurons indicates the average number of internal neurons possessed by the first solution obtained in each replication; the network of the fastest replication included 1 hidden neuron.
Table 4. Performance and generalization ability of neural network controllers evolved with different methods on the standard double-pole balancing problem.
| Standard Double-Pole | Fixed Initial States: Performance | Fixed Initial States: Generalization | Randomly Varying Initial States: Performance | Randomly Varying Initial States: Generalization |
|---|---|---|---|---|
| NEAT | 0.879 | 0.397 [322] | 0.696 | 0.710 [656] |
| CoSyNE | 0.971 | 0.701 [609] | 0.911 | 0.699 [594] |
| CGPANN | 0.967 | 0.693 [622] | 0.824 | 0.613 [506] |
| xNES | | 0.717 [696] | 0.889 | 0.897 [884] |
| SSS | | 0.821 [791] | | 0.906 [893] |
| PSHC | | 0.785 [746] | | 0.807 [788] |
Each number indicates the average performance obtained across 30 replications of the experiment. Generalization refers to the average performance obtained by post-evaluating the evolved networks on 1000 trials in which the initial state of the cart is set randomly. The numbers in square brackets indicate the number of trials (out of 1000) in which the agents keep the poles balanced for the entire duration of the trial during the post-evaluation test. The data reported in the table are those obtained with the best combination of parameters: NEAT (fixed positions: Population 1000; varying positions: Population 1000; for the other parameters see Stanley and Miikkulainen, 2002), CoSyNE (fixed positions: MutRate 5%, SubPopulations 60; varying positions: MutRate 10%, SubPopulations 60), CGPANN (fixed positions: MutRate 20%, Incoming Connections 8; varying positions: MutRate 1%, Incoming Connections 4), xNES (fixed positions: LearningRate 0.2; varying positions: LearningRate 0.1), SSS (fixed positions: MutRate 7%, Stochasticity 10%; varying positions: MutRate 3%, Stochasticity 10%), PSHC (fixed positions: MutRate 7%, Stochasticity 50%, Interbreeding 10%; varying positions: MutRate 7%, Stochasticity 0%, Interbreeding 10%).
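The generalization test described above can be sketched as follows. The sketch assumes a hypothetical `run_trial(agent, initial_state)` that returns the fraction of time steps during which both poles stay balanced (a value in [0, 1]), so that a score of 1.0 counts as balancing for the entire trial. The score definition, `post_evaluate`, and the toy stand-ins are assumptions for illustration, not the authors' code or simulator.

```python
import random
from typing import Callable, List, Tuple

State = List[float]


def post_evaluate(
    agent: object,
    run_trial: Callable[[object, State], float],
    sample_state: Callable[[random.Random], State],
    n_trials: int = 1000,
    seed: int = 0,
) -> Tuple[float, int]:
    """Post-evaluate one agent on n_trials random initial states.

    Returns (mean score, number of trials balanced for the whole duration).
    Assumes run_trial returns the fraction of steps balanced, in [0, 1].
    """
    rng = random.Random(seed)
    scores = [run_trial(agent, sample_state(rng)) for _ in range(n_trials)]
    full_successes = sum(1 for score in scores if score >= 1.0)
    return sum(scores) / n_trials, full_successes


# Toy stand-ins for illustration only (not a pole-balancing simulator):
# a trial counts as fully balanced when the sampled cart position |x| < 1.
def toy_sample_state(rng: random.Random) -> State:
    return [rng.uniform(-1.944, 1.944)] + [0.0] * 5


def toy_run_trial(agent: object, state: State) -> float:
    return 1.0 if abs(state[0]) < 1.0 else 0.5


mean_score, successes = post_evaluate(None, toy_run_trial, toy_sample_state)
print(f"generalization {mean_score:.3f} [{successes}]")
```

Averaging the two returned quantities over the 30 evolved networks would yield entries in the same format as the tables, e.g. a score followed by a bracketed trial count.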
Table 5. Performance and generalization ability of neural network controllers evolved with different methods on the delayed double-pole balancing problem.
| Delayed Double-Pole | Fixed Initial States: Performance | Fixed Initial States: Generalization | Randomly Varying Initial States: Performance | Randomly Varying Initial States: Generalization |
|---|---|---|---|---|
| NEAT | 0.292 | 0.038 [0] | 0.300 | 0.037 [0] |
| CoSyNE | 0.589 | 0.453 [380] | 0.642 | 0.434 [360] |
| CGPANN | 0.206 | 0.077 [40] | 0.367 | 0.284 [171] |
| xNES | 0.943 | 0.692 [679] | 0.992 | 0.911 [903] |
| SSS | 0.996 | 0.756 [731] | | 0.873 [854] |
| PSHC | | 0.685 [635] | | 0.755 [722] |
Each number indicates the average performance obtained across 30 replications of the experiment. Generalization refers to the average performance obtained by post-evaluating the evolved networks on 1000 trials in which the initial state of the cart is set randomly. The numbers in square brackets indicate the number of trials (out of 1000) in which the agents keep the poles balanced for the entire duration of the trial during the post-evaluation test. The best performance is indicated in bold. The data reported in the table are those obtained with the best combination of parameters: NEAT (fixed positions: Population 1000; varying positions: Population 1000; for the other parameters see Stanley and Miikkulainen, 2002), CoSyNE (fixed positions: MutRate 10%, SubPopulations 60; varying positions: MutRate 5%, SubPopulations 60), CGPANN (fixed positions: MutRate 20%, Incoming Connections 8; varying positions: MutRate 1%, Incoming Connections 5), xNES (fixed positions: LearningRate 0.2; varying positions: LearningRate 0.1), SSS (fixed positions: MutRate 3%, Stochasticity 100%; varying positions: MutRate 7%, Stochasticity 70%), PSHC (fixed positions: MutRate 7%, Stochasticity 50%, Interbreeding 10%; varying positions: MutRate 3%, Stochasticity 0%, Interbreeding 10%).
Fig 2. Performance of the best agents evolved with different methods during the evolutionary process.
The top and bottom panels display the data of the standard and delayed double-pole problems, respectively. The left and right panels display the data of the experiments carried out in the Fixed and Randomly Varying Initial States conditions, respectively. Each curve displays the average over the 30 best networks, one from each of the 30 replications of the experiment. Data refer to the best individual of the population.
Fig 3. Long double-pole problem.
Performance of the best agents during the course of the evolutionary process. The left and right panels display the data of the experiments carried out in the Fixed and Randomly Varying Initial States conditions, respectively. Each curve displays the average over the 30 best networks, one from each of the 30 replications of the experiment. Data refer to the best individual of the population.
Table 6. Average number of evaluations required to reach a fitness equal to or greater than 0.9 for the first time.
| Fixed Initial States condition | Standard Double-Pole | Delayed Double-Pole |
|---|---|---|
| NEAT | 234,666.6 | * |
| CoSyNE | 256,000.0 | * |
| CGPANN | 2,389,333.3 | * |
| xNES | 490,666.6 | 1,450,666.6 |
| SSS | 192,000.0 | 3,989,333.3 |
| PSHC | 405,333.3 | 1,902,933.3 |
Asterisks indicate the conditions that failed to reach this threshold in more than 15 out of 30 replications.
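The entries of this table, including the asterisk rule, can be derived from per-replication logs. A minimal sketch, assuming each replication is recorded as the number of evaluations at which the best fitness first reached 0.9, or `None` if it never did. The caption does not say how replications that missed the threshold enter the average; this sketch averages only the successful ones, which is an assumption. `summarize_first_hits` is a hypothetical name.

```python
from typing import List, Optional


def summarize_first_hits(first_hits: List[Optional[int]]) -> str:
    """Format one table cell from per-replication first-hit evaluations.

    Returns "*" when more than half of the replications (more than 15 of 30)
    never reached the threshold; otherwise the mean over successful runs
    (averaging only the successes is an assumption).
    """
    failures = sum(1 for hit in first_hits if hit is None)
    successes = [hit for hit in first_hits if hit is not None]
    if not successes or failures > len(first_hits) // 2:
        return "*"
    return f"{sum(successes) / len(successes):,.1f}"


print(summarize_first_hits([1000] * 15 + [3000] * 15))  # 2,000.0
print(summarize_first_hits([None] * 16 + [500] * 14))   # *
```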
Table 7. Performance and generalization ability of neural network controllers evolved with the SSS method on the standard and delayed double-pole balancing problems in the Randomly Varying Initial States condition, in experiments carried out with different combinations of parameters.
| 1.0 [0.901] | 1.0 [0.899] | 1.0 [0.894] | 1.0 [0.890] | 1.0 [0.888] | |
| 1.0 [0.897] | 1.0 [0.906] | 1.0 [0.897] | 1.0 [0.894] | 1.0 [0.888] | |
| 0.996 [0.896] | 1.0 [0.896] | 1.0 [0.895] | 0.998 [0.881] | 1.0 [0.880] | |
| 1.0 [0.894] | 1.0 [0.895] | 1.0 [0.881] | 1.0 [0.883] | 1.0 [0.881] | |
| 0.989 [0.870] | 1.0 [0.893] | 1.0 [0.875] | 1.0 [0.874] | 1.0 [0.875] | |
| 1.0 [0.887] | 0.998 [0.880] | 0.999 [0.870] | 1.0 [0.882] | 1.0 [0.872] | |
| 0.917 [0.744] | 0.956 [0.806] | 0.957 [0.818] | 0.962 [0.809] | 0.972 [0.805] | |
| 0.894 [0.743] | 0.909 [0.748] | 0.977 [0.820] | 0.960 [0.836] | 0.988 [0.847] | |
| 0.952 [0.780] | 0.993 [0.829] | 0.994 [0.844] | 0.969 [0.818] | 0.971 [0.827] | |
| 0.965 [0.812] | 0.983 [0.839] | 0.969 [0.831] | 0.963 [0.826] | 0.966 [0.822] | |
| 0.942 [0.789] | 0.989 [0.852] | 0.983 [0.852] | 1.0 [0.862] | 0.994 [0.863] | |
| 0.982 [0.835] | 0.986 [0.836] | 0.993 [0.856] | 0.993 [0.856] | 0.980 [0.823] | |
| 1.0 [0.847] | 1.0 [0.862] | 0.994 [0.862] | 1.0 [0.873] | 1.0 [0.869] | |
| 0.988 [0.834] | 0.996 [0.845] | 1.0 [0.867] | 0.818 [0.700] | 0.350 [0.250] |
Each number indicates the average performance obtained across 30 replications of the experiment. Generalization refers to the average performance obtained by post-evaluating the evolved networks on 1000 trials in which the initial state of the cart is set randomly. The numbers in square brackets indicate the generalization performance obtained in the post-evaluation test.
Table 8. Performance and generalization ability of neural network controllers evolved with different methods on the long double-pole balancing problem.
| Long double-pole | Fixed Initial States: Performance | Fixed Initial States: Generalization | Randomly Varying Initial States: Performance | Randomly Varying Initial States: Generalization |
|---|---|---|---|---|
| xNES | | 0.220 [190] | 0.502 | 0.520 [493] |
| SSS | 0.802 | 0.247 [197] | 0.664 | 0.431 [394] |
| PSHC | 0.799 | 0.209 [142] | | 0.312 [245] |
| PSHC* | 0.715 | 0.238 [179] | 0.602 | 0.335 [261] |
Each number indicates the average performance obtained across 30 replications of the experiment. Generalization refers to the average performance obtained by post-evaluating the evolved networks on 1000 trials in which the initial state of the cart is set randomly. The best performances are indicated in bold. The numbers in square brackets indicate the number of trials (out of 1000) in which the agents keep the poles balanced for the entire duration of the trial during the post-evaluation test. The data reported in the table are those obtained with the best combination of parameters: xNES (fixed positions: LearningRate 0.5; varying positions: LearningRate 0.1), SSS (fixed positions: MutRate 20%, Stochasticity 30%; varying positions: MutRate 3%, Stochasticity 0%), PSHC (fixed positions: MutRate 7%, Stochasticity 50%, Interbreeding 10%; varying positions: MutRate 3%, Stochasticity 0%, Interbreeding 10%), PSHC* (fixed positions: MutRate 7%, EliminateConnection 40%, Stochasticity 70%, Interbreeding 10%; varying positions: MutRate 1%, EliminateConnection 15%, Stochasticity 50%, Interbreeding 10%).
Table 9. Performance of the agents evolved with the PSHC and SSS methods in the Fixed Initial States condition in experiments carried out without or with stochasticity.
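| Method (problem) | Without stochasticity | With stochasticity |
|---|---|---|
| SSS (standard double-pole) | 1.0 | 1.0 |
| PSHC (standard double-pole) | 1.0 | 1.0 |
| SSS (delayed double-pole) | 0.867 | 0.996 |
| PSHC (delayed double-pole) | 0.823 | 1.0 |
| SSS (long double-pole) | 0.760 | 0.802 |
| PSHC (long double-pole) | 0.479 | 0.799 |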
The numbers refer to the data obtained with the best combination of parameters, i.e. with the best mutation rate in the case of the data reported in the second column, and with the best mutation rate and stochasticity level in the case of the data reported in the third column.
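This comparison isolates the effect of the Stochasticity parameter, but the tables do not define how stochasticity is applied. Purely as an illustration, the sketch below assumes that uniform noise proportional to the stochasticity level is added to each fitness value before individuals are ranked for selection, so that higher levels let weaker individuals occasionally outrank better ones, which lowers the selective pressure. `noisy_rank` and this noise model are hypothetical and are not the SSS or PSHC implementation.

```python
import random
from typing import List, Sequence


def noisy_rank(
    fitnesses: Sequence[float],
    stochasticity: float,
    rng: random.Random,
    max_fitness: float = 1.0,
) -> List[int]:
    """Return individual indices sorted best-first after adding fitness noise.

    Hypothetical noise model: each fitness is perturbed by a value drawn
    uniformly from [-s * max_fitness, s * max_fitness], where s is the
    stochasticity level as a fraction (0.1 means 10%).
    """
    amplitude = stochasticity * max_fitness
    noisy = [f + rng.uniform(-amplitude, amplitude) for f in fitnesses]
    return sorted(range(len(fitnesses)), key=lambda i: noisy[i], reverse=True)


rng = random.Random(3)
print(noisy_rank([0.2, 0.9, 0.5], 0.0, rng))  # [1, 2, 0]: no noise, exact order
print(noisy_rank([0.2, 0.9, 0.5], 0.5, rng))  # order may change with noise
```

With a stochasticity of 0 the ranking reduces to plain fitness ordering, which corresponds to the "without stochasticity" column under this assumed model.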
Table 10. Statistical difference between the generalization performance obtained in the Fixed and Randomly Varying Initial States conditions for the experiments carried out with the xNES, SSS, and PSHC methods.
| Method (problem) | Fixed Initial States | Randomly Varying Initial States | Mann-Whitney U test, p-value |
|---|---|---|---|
| xNES (standard double-pole) | 0.717 [696] | 0.897 [884] | < 0.05 |
| SSS (standard double-pole) | 0.821 [791] | 0.906 [893] | < 0.05 |
| PSHC (standard double-pole) | 0.785 [746] | 0.807 [788] | < 0.05 |
| xNES (delayed double-pole) | 0.692 [679] | 0.911 [904] | < 0.05 |
| SSS (delayed double-pole) | 0.756 [731] | 0.873 [854] | < 0.05 |
| PSHC (delayed double-pole) | 0.685 [635] | 0.755 [722] | < 0.05 |
| xNES (long double-pole) | 0.220 [190] | 0.520 [493] | < 0.05 |
| SSS (long double-pole) | 0.247 [197] | 0.431 [394] | < 0.05 |
| PSHC (long double-pole) | 0.209 [142] | 0.312 [245] | < 0.05 |
Data collected in the experiments carried out with the best parameters (see Tables 4, 5 and 8). Each number indicates the average performance obtained across 30 replications of the experiment. Generalization refers to the average performance obtained by post-evaluating the evolved networks on 1000 trials in which the initial state of the cart is set randomly. The numbers in square brackets indicate the number of trials (out of 1000) in which the agents keep the poles balanced for the entire duration of the trial during the post-evaluation test.
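The p-values above come from Mann-Whitney U tests between the two conditions. The captions do not list the exact samples compared; a plausible reading is the 30 per-replication generalization scores for each condition. A minimal pure-Python sketch of the two-sided test using the large-sample normal approximation, without continuity or tie correction; `scipy.stats.mannwhitneyu` applies a continuity correction by default, so its p-values may differ slightly. The inputs in the usage lines are placeholder numbers, not data from the paper.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, no corrections).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(a), len(b)
    # U1 counts pairs (x, y) with x > y; ties contribute one half.
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Placeholder samples (illustrative only, not results from the paper).
fixed = [0.70, 0.72, 0.68, 0.75, 0.71]
varying = [0.88, 0.90, 0.91, 0.89, 0.87]
u, p = mann_whitney_u(fixed, varying)
print(f"U = {u}, p = {p:.4f}")
```

With 30 replications per condition the normal approximation is reasonable; for very small samples an exact test is preferable.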
Table 11. Performance and generalization capabilities of neural network controllers evolved with the SSS method in the Randomly Varying Initial States condition using different numbers of trials.
| N. Trials | Double-Pole | Delayed Double-Pole | Long Double-Pole |
|---|---|---|---|
| 1 | 1.0 [0.826] | 1.0 [0.818] | 0.180 [0.075] |
| 2 | 1.0 [0.863] | 1.0 [0.870] | 0.458 [0.182] |
| 4 | 1.0 [0.882] | 1.0 [0.865] | |
| | | | 0.664 [0.431] |
| 16 | 0.990 [0.907] | 0.969 [0.867] | 0.683 [0.421] |
| 32 | 0.968 [0.905] | 0.903 [0.821] | 0.493 [0.353] |
| 64 | 0.917 [0.885] | 0.855 [0.810] | 0.325 [0.261] |
Data collected using the best parameters. Each number indicates the average performance obtained across 30 replications of the corresponding experiment. The numbers in square brackets indicate the average performance obtained by post-evaluating the corresponding 30 best evolved agents on 1000 trials in which the initial state of the cart is set randomly within the ranges shown in Table 2.