W J M Probert, S Lakkur, C J Fonnesbeck, K Shea, M C Runge, M J Tildesley, M J Ferrari.
Abstract
The number of all possible epidemics of a given infectious disease that could occur on a given landscape is large for systems of real-world complexity. Furthermore, there is no guarantee that the control actions that are optimal, on average, over all possible epidemics are also best for each possible epidemic. Reinforcement learning (RL) and Monte Carlo control have been used to develop machine-readable context-dependent solutions for complex problems with many possible realizations, ranging from video games to the game of Go. RL could be a valuable tool to generate context-dependent policies for outbreak response, though translating the resulting policies into simple rules that can be read and interpreted by human decision-makers remains a challenge. Here we illustrate the application of RL to the development of context-dependent outbreak response policies to minimize outbreaks of foot-and-mouth disease. We show that control based on the resulting context-dependent policies, which adapt interventions to the specific outbreak, results in smaller outbreaks than static policies. We further illustrate two approaches for translating the complex machine-readable policies into simple heuristics that can be evaluated by human decision-makers. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'. This theme issue is linked with the earlier issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'.
Keywords: FMD; machine learning; optimal control; outbreak response; reinforcement learning; vaccination
Year: 2019 PMID: 31104604 PMCID: PMC6558555 DOI: 10.1098/rstb.2018.0277
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Schematic of Monte Carlo control for solving the RL problem (adapted from Sutton and Barto [26]). (Online version in colour.)
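As a minimal sketch of the Monte Carlo control loop in figure 1 (this is a generic every-visit, epsilon-greedy implementation on an abstract environment interface, not the paper's actual simulation code — `env_reset`, `env_step`, and the tabular `Q` are assumptions for illustration):

```python
import random
from collections import defaultdict

def mc_control(env_reset, env_step, actions, episodes=500, eps=0.1, gamma=1.0):
    """Every-visit Monte Carlo control with an epsilon-greedy policy.

    env_reset() -> initial state; env_step(state, action) -> (next_state,
    reward, done). Returns the utility table Q: (state, action) -> value.
    """
    Q = defaultdict(float)    # utility table, as in figure 2d
    counts = defaultdict(int) # visit counts for incremental averaging
    for _ in range(episodes):
        # generate one episode under the current epsilon-greedy policy
        episode, state, done = [], env_reset(), False
        while not done:
            if random.random() < eps:
                action = random.choice(actions)          # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])  # exploit
            next_state, reward, done = env_step(state, action)
            episode.append((state, action, reward))
            state = next_state
        # sweep backwards, averaging observed returns G into Q
        G = 0.0
        for s, a, r in reversed(episode):
            G = gamma * G + r
            counts[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
    return Q
```

The alternation between policy evaluation (averaging returns into `Q`) and policy improvement (acting greedily with respect to `Q`) is the generalized policy iteration pattern the schematic depicts.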
Figure 2.Three preprocessing steps to construct the state (a–c). (a) We first plot the farms and identify farm-level infection statuses: black = infected, white = susceptible; then (b) overlay a grid to ‘pixelate’ the landscape so that no more than one farm occupies a pixel; then (c) construct a two-dimensional array of farm-level infection status: 0 = no farm, 1 = infected farm, 2 = susceptible farm. (d) Schematic of utility table, with flattened states as rows and actions as columns. Shaded cell represents the action with the highest utility for the state in each row. The RL methods in both case studies seek to approximate the value function represented in this ‘look-up table’ representation of the state-action space.
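The preprocessing in figure 2 can be sketched as follows (a simplified illustration assuming farm coordinates and infection statuses as NumPy arrays; the grid resolution `cell` stands in for the caption's requirement that no two farms share a pixel):

```python
import numpy as np

def pixelate_state(coords, infected, cell=1.0):
    """Encode farm locations and infection statuses as a 2-D integer array.

    coords: (n, 2) array of farm x/y positions; infected: length-n boolean
    array. Codes follow the caption: 0 = no farm, 1 = infected farm,
    2 = susceptible farm.
    """
    ix = np.floor(coords[:, 0] / cell).astype(int)  # pixel column per farm
    iy = np.floor(coords[:, 1] / cell).astype(int)  # pixel row per farm
    grid = np.zeros((ix.max() + 1, iy.max() + 1), dtype=int)
    grid[ix, iy] = np.where(infected, 1, 2)
    return grid

def flatten_state(grid):
    """Flatten the grid into a hashable key, i.e. one row of the utility table."""
    return tuple(grid.ravel())
```

Flattening each grid to a tuple lets the state index rows of the look-up table in figure 2d, with actions as columns.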
Figure 3. The spatial distribution of farms for the three scenarios (rows a–c respectively); circle size scales with farm size (column i). (a(ii)–c(ii)) Performance of DQN, in terms of the reward, r, for each case study during training. (a(iii)–c(iii)) The frequency at which susceptible farms were culled during 2000 simulations of testing (colours), plotted as a function of the mean distance to the initially infected farms and the farm size. (a(iv)–c(iv)) The distribution of rewards for 2000 simulations using either the best DQN policy, a policy of culling farms at random (i.e. a null policy), a policy of culling infected premises (IPs), or a policy of ring culling.
Figure 4. Optimal policies to minimize outbreak duration as a function of outbreak area and number of infected premises. (a) Output policy for minimizing outbreak duration for different carcass constraints: (i) 11 000, (ii) 13 000, (iii) 15 000, (iv) 17 000 carcasses. (b) Histogram of outbreak duration after enacting ring culling or ring vaccination for the states highlighted in a(iii). (c) Heatmap of the frequency of visits to each state throughout all simulations used to construct the RL policy in a(iii). (d) Heatmap of the difference in outbreak duration when using ring culling at 3 km or ring vaccination at 3 km for each state, for the carcass constraint illustrated in a(iii). (e) Distribution of outbreak duration for simulations (using culling constraints ranging from 10 000 to 20 000; see electronic supplementary material, figure S6 for all policies) managed using the RL policy compared with static policies of ring culling and ring vaccination; circles give mean, bars give IQR. (Online version in colour.)
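The policies in figure 4a are look-up tables over a two-dimensional state description (outbreak area, number of infected premises). As a rough illustration of how such a machine-readable policy is consulted at decision time (the bin boundaries, table values, and action names below are invented for the example, not the paper's fitted policy):

```python
import numpy as np

# Hypothetical 3x3 policy over binned (outbreak area, infected premises):
# 0 = ring culling, 1 = ring vaccination. Values are illustrative only.
POLICY = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [1, 1, 1]])
AREA_EDGES = [10.0, 50.0]  # assumed area bin boundaries (km^2)
IP_EDGES = [5, 20]         # assumed infected-premises bin boundaries

def choose_action(area, n_infected):
    """Map the current outbreak state to an action via the policy table."""
    i = int(np.digitize(area, AREA_EDGES))       # area bin index
    j = int(np.digitize(n_infected, IP_EDGES))   # infected-premises bin index
    return 'ring vaccination' if POLICY[i, j] else 'ring culling'
```

A table of this form is also the natural starting point for the simple heuristics discussed in the abstract: a human-readable rule can be read off as thresholds on area and infected-premises counts.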