Claudio Angione, Eric Silverman, Elisabeth Yaneske.
Abstract
In this proof-of-concept work, we evaluate the performance of multiple machine-learning methods as surrogate models for use in the analysis of agent-based models (ABMs). Analysing agent-based modelling outputs can be challenging, as the relationships between input parameters can be non-linear or even chaotic, even in relatively simple models, and each model run can require significant CPU time. Surrogate modelling, in which a statistical model of the ABM is constructed to facilitate detailed model analyses, has been proposed as an alternative to computationally costly Monte Carlo methods. Here we compare multiple machine-learning methods for ABM surrogate modelling to determine which approaches are best suited as surrogates for the complex behaviour of ABMs. Our results suggest that, in most scenarios, artificial neural networks (ANNs) and gradient-boosted trees outperform Gaussian process surrogates, currently the most commonly used method for the surrogate modelling of complex computational models. ANNs produced the most accurate model replications in scenarios with high numbers of model runs, although their training times were longer than those of the other methods. We propose that agent-based modelling would benefit from using machine-learning methods for surrogate modelling, as this can facilitate more robust sensitivity analyses for the models while also reducing CPU time consumption when calibrating and analysing the simulation.
Year: 2022 PMID: 35143521 PMCID: PMC8830643 DOI: 10.1371/journal.pone.0263150
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Summary of methods implemented in our study.
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Linear Regression | Predicts values using a linear combination of input features, in this case parameter values. | Fast, extremely well-studied, easy to implement with any number of tools. | Not well-suited for data with non-linear relationships. |
| Decision Trees | Generates binary trees that predict the value of a variable based on several inputs, represented as interior nodes in the tree. | Fast and easy to train, simple to implement, very easy to understand and interpret. | High variance (slight changes to input data can produce very different trees), prone to overfitting. |
| Random Forests | Ensemble approach to decision trees in which multiple models are trained using random split points. When making a prediction, the predictions of each tree in the ensemble are averaged together to produce the final result. | Easy to parallelise, low computational load, low variance. | Prone to overfitting, low interpretability. |
| Gradient Boosted Trees | Ensemble method that combines weak learners into a strong learner. | Fast, low variance, very successful across a wide range of problems. | Prone to overfitting, requires extensive parameter optimisation. |
| K-Nearest Neighbours | This method makes a prediction by finding the K most similar points in the entire training data set to the new data point we would like to label, then summarising the output values at those points to arrive at the prediction for the new point. | Simple to implement, fast, only requires computational resources when making a prediction. | Must include the entire training set, becomes ineffective as the number of input variables becomes high (the ‘curse of dimensionality’). |
| Gaussian Process Emulation | The most common method for surrogate modelling of computer models, GPs model the simulation as a Gaussian process. | Very computationally efficient, highly useful for sensitivity analyses, specialised free software (GEM-SA) speeds the process up considerably. | Assumes that the surrogate model is smooth (may not be the case with complex ABMs), GEM-SA software no longer maintained, only copes with single model outputs. |
| Support Vector Machine (SVM) | Finds a hyperplane in a high-dimensional space of data points that can separate those points with the widest possible margin. It can be used for classification or regression. | Scales well, few hyperparameters to optimise, flexible and powerful in higher dimensions thanks to the ‘kernel trick’. | It can be difficult to choose the right kernel, long training times for large datasets, low interpretability. |
| Neural Network | A network of nodes loosely based on biological neurons, normally consisting of an input and output layer with one or more hidden layers of neurons in between. Learning algorithms adjust the weighted connections between neurons to enable regression or classification of input datasets. Deep neural networks use many hidden layers and can model complex non-linear relationships. | An enormous variety of possible layer types and network architectures, can learn supervised or unsupervised, highly suitable for modelling non-linear relationships, very well-supported by powerful open-source software. | Computationally expensive hyperparameter optimisation, prone to overfitting, large networks require GPU access for training, low interpretability. |
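
To make the K-Nearest Neighbours row concrete, the following is a minimal sketch of K-NN regression used as a surrogate, written with the Python standard library only. It is not the implementation used in the study: the function name `knn_predict`, the choice of Euclidean distance, the use of the mean as the summary, and the toy data are all assumptions made for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict an output for `query` as the mean output of its k nearest training points.

    Hypothetical helper for illustration; Euclidean distance and mean
    aggregation are assumptions, not choices documented in the source.
    """
    # Distance from the query to every stored training point
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Keep the k closest points and average their outputs
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: two scaled input parameters, one simulation output
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
train_y = [1.0, 2.0, 2.0, 3.0, 2.0]

prediction = knn_predict(train_x, train_y, query=(0.9, 0.9), k=1)
```

The sketch also shows the disadvantage listed in the table: every prediction scans the whole training set, so the full set of simulation runs has to be kept available.
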
The ten parameters used in the Linked Lives ABM surrogate model generation process, with descriptions, default values and lower and upper bounds used when generating simulation output data.
| Parameter | Description | Default | Range |
|---|---|---|---|
| ageingParentsMoveInWithKids | Probability agents move back in with adult children | 0.1 | 0.1–0.8 |
| baseCareProb | Base probability used for care provision functions | 0.0002 | 0.0002–0.0016 |
| retiredHours | Hours of care provided by retired agents | 60.0 | 40–80 |
| ageOfRetirement | Age of retirement for working agents | 65 | 55–75 |
| personCareProb | General individual probability of requiring care | 0.0008 | 0.0002–0.0016 |
| maleAgeCareScaling | Scaling factor for likelihood of care need for males | 18.0 | 10–25 |
| femaleAgeCareScaling | Scaling factor for likelihood of care need for females | 19.0 | 10–25 |
| childHours | Hours of care provided by children living at home | 5.0 | 1–10 |
| homeAdultHours | Hours of care provided by unemployed adults | 30.0 | 5–50 |
| workingAdultHours | Hours of care provided by employed adults | 25.0 | 5–40 |
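
As an illustration of how the ranges in this table could drive the generation of simulation inputs, the sketch below draws parameter sets independently and uniformly within the listed bounds. The sampling scheme is an assumption: this section does not state how the study sampled the parameter space, and the names `PARAM_RANGES` and `sample_parameter_sets` are hypothetical.

```python
import random

# Lower and upper bounds copied from the parameter table above.
# ageOfRetirement is treated as continuous here, which is an assumption.
PARAM_RANGES = {
    "ageingParentsMoveInWithKids": (0.1, 0.8),
    "baseCareProb": (0.0002, 0.0016),
    "retiredHours": (40.0, 80.0),
    "ageOfRetirement": (55.0, 75.0),
    "personCareProb": (0.0002, 0.0016),
    "maleAgeCareScaling": (10.0, 25.0),
    "femaleAgeCareScaling": (10.0, 25.0),
    "childHours": (1.0, 10.0),
    "homeAdultHours": (5.0, 50.0),
    "workingAdultHours": (5.0, 40.0),
}


def sample_parameter_sets(n_runs, seed=0):
    """Draw n_runs parameter sets, each value uniform within its bounds.

    Uniform independent sampling is an illustrative assumption, not the
    documented design of the study.
    """
    rng = random.Random(seed)  # seeded so the design is reproducible
    return [
        {name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()}
        for _ in range(n_runs)
    ]


# One parameter set per simulation run, e.g. for the 200-run scenario
samples = sample_parameter_sets(200)
```

Each dictionary in `samples` would correspond to one ABM run, and the resulting input and output pairs form the training data for a surrogate.
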
Fig 1. Performance of the nine machine-learning methods trained on simulation outputs from 200, 400, 800 and 1600 runs.
The spider plots compare speed and accuracy across all nine methods for the 200, 400, 800 and 1600 run scenarios in plots (a), (b), (c) and (d) respectively. For each method, the total computational runtime on an 8-core i7 CPU and the mean-squared error (MSE) on the test set are shown (both in log scale, reversed, and mapped to the [0, 1] interval to represent relative speed and accuracy, respectively). Neural networks were the strongest overall performers, with gradient-boosted trees also performing well overall, and non-linear SVM performing increasingly well for higher numbers of runs. The high accuracy of the neural network models has a significant cost in terms of speed. Gradient-boosted trees and non-linear SVM consistently perform well in terms of speed, but suffer from a lower accuracy overall.
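
The caption describes runtime and MSE as log-scaled, reversed, and mapped to [0, 1]. One way such a transformation could be written is sketched below; the exact formula used for the figure is not given in this section, so the min-max form, the function name `relative_score`, and the runtimes are assumptions.

```python
import math


def relative_score(values):
    """Map positive values to [0, 1] on a reversed log scale.

    The smallest value (fastest runtime or lowest MSE) maps to 1 and the
    largest to 0. Min-max scaling of log10 values is an assumed reading
    of the caption, not a formula taken from the source.
    """
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        # All methods tie; give every one the top score
        return [1.0 for _ in logs]
    return [(hi - lg) / (hi - lo) for lg in logs]


# Hypothetical runtimes in seconds for three methods (not results from the paper)
runtimes = [1.0, 10.0, 100.0]
speed = relative_score(runtimes)
```

With these invented runtimes the fastest method scores 1 and the slowest 0, and the same function applied to test-set MSE values would give the relative accuracy axis.
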
Fig 2. Sample results on the 800-run simulation scenario.
Diagrams of the neural network architecture in full detail in (a) and in simplified schematic form in (b). In the 800-run scenario, the network with 10 hidden layers pictured here performed the best in a brief comparison between networks with varying numbers of hidden layers. (c) Loss of a 15000-round training run of the simple neural network. (d) Comparison plot produced after training the neural network on the simulation data.
Fig 3. Output of the GP emulator run, performed using the 400-run simulation data set.
(a) Graphs of the main effects of each of the 10 input parameters on the final output of interest, in this case social care cost per person per year. The graphs demonstrate that the emulator was unable to fit a model to the simulation results, as each successive emulator run produced very different results and estimates of the main effects. (b) Numerical outputs of the emulator. The emulator estimates total output variance at 5.41 billion, a clear indication that the emulator is not able to fit the simulation data.
Fig 4. PCA variable contribution maps and scree plots for the 400- and 1600-sample datasets.
The scree plots of the percent variance contribution of each component visually convey the location where there is a sharp change in gradient, which defines the number of significant components, i.e. the components to be retained in the analysis. The gradient change seen at component 6 of the 400-sample dataset contrasts with the steep gradient change at component 1 of the 1600-sample dataset. The 400-sample dataset variable contribution map shows variables beginning to cluster; however, very little separates the components' contributions to variance, with less than 2% difference between the first and last components (as can be seen in the 400-sample scree plot). The variable contribution map of the 1600-sample dataset shows the variables converging into a single component (component 1) contributing 90.3% of the variance. Here PCA is unable to make any useful discrimination between the variables, although it identifies eight parameters (on the first component) that significantly explain the variance in the ABM social care cost per capita.
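
The scree-plot reading described above can be sketched numerically: each component's percent variance is its eigenvalue divided by the sum of all eigenvalues, and the sharpest gradient change can be approximated by the largest drop between consecutive components. The largest-drop heuristic, the function names, and the eigenvalues below are assumptions for illustration, not the procedure or data of the study.

```python
def variance_percentages(eigenvalues):
    """Percent of total variance explained by each principal component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def largest_drop_component(percentages):
    """Return the 1-based index of the component followed by the largest drop.

    A simple stand-in for reading the elbow off a scree plot; an assumed
    heuristic rather than the criterion used in the source.
    """
    drops = [
        percentages[i] - percentages[i + 1] for i in range(len(percentages) - 1)
    ]
    return drops.index(max(drops)) + 1


# Hypothetical eigenvalues, sorted in descending order (not from the paper)
eigenvalues = [6.0, 2.0, 1.0, 0.5, 0.5]
percentages = variance_percentages(eigenvalues)
elbow = largest_drop_component(percentages)
```

When one component dominates, as in the 1600-sample case described in the caption, this heuristic points at component 1; when the percentages are nearly flat, as in the 400-sample case, the largest drop is small and the result is not informative.
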
Fig 5. Predicted value (x axis) versus actual value (y axis) for the 200-run scenario, across all the methods implemented in our comparative study.
The dotted line represents the y = x line.
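
Points lying on the y = x line are predictions that match the simulation output exactly, so the vertical spread around that line is what the mean-squared error (MSE) summarises. A minimal sketch of the calculation follows; the function name and the values are hypothetical and are not results from the paper.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-set values (not taken from the study)
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 2.5, 4.0]

# The first and last points sit on y = x and contribute nothing to the error
mse = mean_squared_error(actual, predicted)
```

A surrogate whose points all fall on the dotted line would have an MSE of zero.
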