| Literature DB >> 35610985 |
Raghav Awasthi1, Keerat Kaur Guliani2, Saif Ahmad Khan1, Aniket Vashishtha3, Mehrab Singh Gill1, Arshita Bhatt4, Aditya Nagori5, Aniket Gupta1, Ponnurangam Kumaraguru1, Tavpritesh Sethi1.
Abstract
A COVID-19 vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, the vaccine is also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporal separation of hot-spots, might be an effective way of halting the disease spread. We approach this problem by proposing a novel pipeline, VacSIM, that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing the distribution of the COVID-19 vaccine. Whereas the Reinforcement Learning models suggest better actions and rewards, the Contextual Bandits allow online modifications that may need to be implemented on a day-to-day basis in real-world scenarios. We evaluate this framework against a naive allocation approach of distributing vaccine doses in proportion to the incidence of COVID-19 cases in five different States across India (Assam, Delhi, Jharkhand, Maharashtra and Nagaland) and demonstrate up to 9039 potential infections prevented, along with a significant increase in the efficacy of limiting the spread, over a period of 45 days with the VacSIM approach. Our models and the platform are extensible to all States of India and potentially across the globe. We also propose novel evaluation strategies, including standard compartmental model-based projections and a causality-preserving evaluation of our model. Since all models carry assumptions that may need to be tested in various contexts, we open-source our model VacSIM and contribute a new reinforcement learning environment compatible with OpenAI Gym to make it extensible for real-world applications across the globe.
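The naive baseline the abstract compares against can be sketched as a simple proportional split. The state names are from the study, but the case counts and supply figure below are illustrative placeholders, not the paper's data:

```python
def naive_allocation(active_cases, vaccine_supply):
    """Distribute vaccine doses in proportion to each State's COVID-19 incidence."""
    total = sum(active_cases.values())
    return {state: vaccine_supply * cases / total
            for state, cases in active_cases.items()}

# Illustrative figures only, not the study's actual case counts.
cases = {"Assam": 2000, "Delhi": 5000, "Jharkhand": 1500,
         "Maharashtra": 20000, "Nagaland": 100}
allocation = naive_allocation(cases, vaccine_supply=1_000_000)
```

Under this baseline, the hardest-hit state always receives the largest share, regardless of population structure or health-care capacity, which is exactly the limitation VacSIM's learned policy is meant to address.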
Keywords: COVID-19; Contextual bandits problem; Policy modeling; Reinforcement learning; Vaccine distribution
Year: 2022 PMID: 35610985 PMCID: PMC9119863 DOI: 10.1016/j.ibmed.2022.100060
Source DB: PubMed Journal: Intell Based Med ISSN: 2666-5212
Fig. 1 VacSIM architecture: a novel feed-forward pipeline for learning an optimal COVID-19 vaccine distribution policy. The actions and rewards obtained from the Deep Reinforcement Learning models were fed forward into the training of the Contextual Bandit algorithm, so that faster optimal online decisions could be computed for subsequent dates while taking into account the ever-changing demographics of the states.
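A minimal sketch of the online contextual-bandit stage, assuming an epsilon-greedy policy over per-arm linear reward models trained by SGD; the paper's actual bandit algorithm, feature set, and reward signal may differ, and the toy environment below is ours:

```python
import random

class EpsGreedyLinearCB:
    """Epsilon-greedy contextual bandit with one linear reward model per arm."""

    def __init__(self, n_arms, dim, eps=0.1, lr=0.05):
        self.eps, self.lr = eps, lr
        self.w = [[0.0] * dim for _ in range(n_arms)]  # per-arm weight vectors

    def predict(self, arm, x):
        return sum(wi * xi for wi, xi in zip(self.w[arm], x))

    def choose(self, x):
        if random.random() < self.eps:                 # explore
            return random.randrange(len(self.w))
        return max(range(len(self.w)), key=lambda a: self.predict(a, x))

    def update(self, arm, x, reward):
        err = reward - self.predict(arm, x)            # SGD on squared error
        for i, xi in enumerate(x):
            self.w[arm][i] += self.lr * err * xi

random.seed(0)
bandit = EpsGreedyLinearCB(n_arms=2, dim=2, eps=0.2)
for _ in range(2000):
    x = random.choice([[1.0, 0.0], [0.0, 1.0]])
    arm = bandit.choose(x)
    reward = x[arm]        # toy environment: the arm matching the context pays 1
    bandit.update(arm, x, reward)
```

Because the bandit updates after every round, it can absorb day-to-day changes in the context (case counts, capacity) without the full retraining a deep RL model would need, which is the role Fig. 1 assigns to it.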
Fig. 2 Flexible model setup for optimization of vaccine distribution policy in India using the VacSIM approach, where components can be easily replaced with alternatives and adapted to diverse settings.
Features in the context space used in our models.
| S.No. | Feature | Description |
|---|---|---|
| 1 | Predicted Death Rate | The percentage ratio of the predicted deaths in the State to the total predicted cases in that State, calculated using projections obtained from a fitted standard SEIR model. |
| 2 | Predicted Recovery Rate | The percentage ratio of the predicted recoveries in the State to the total predicted cases in that State using projections obtained from the SEIR model. |
| 3 | Population of a State | We extracted the population for each State from the 2011 census data conducted by the government of India [ |
| 4 | Predicted Susceptible Cases of a State | We estimated the susceptible population as the difference between the population of a particular State and the total number of predicted infected cases of that State. |
| 5 | Hospital Facilities in a State | We used overall Hospital Beds, overall ICU Beds and Ventilators data in our models [ |
| 6 | Age Distribution of a State | To prioritize the vulnerable population, we considered people aged over 50. |
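The six features in the table can be assembled into one context vector per State. A minimal sketch, with hypothetical field names standing in for whatever schema the data pipeline uses (the population figure for Delhi is the 2011 census value; the remaining numbers are illustrative):

```python
def make_context(state):
    """Build the 6-dimensional context vector for one State (hypothetical fields)."""
    return [
        state["predicted_death_rate"],       # 1. SEIR-projected death rate (%)
        state["predicted_recovery_rate"],    # 2. SEIR-projected recovery rate (%)
        state["population"],                 # 3. 2011 census population
        state["predicted_susceptible"],      # 4. population minus predicted infections
        state["hospital_beds"] + state["icu_beds"] + state["ventilators"],  # 5.
        state["population_over_50"],         # 6. vulnerable (age > 50) population
    ]

delhi = {"predicted_death_rate": 1.8, "predicted_recovery_rate": 92.0,
         "population": 16_787_941, "predicted_susceptible": 16_200_000,
         "hospital_beds": 40_000, "icu_beds": 5_000, "ventilators": 2_500,
         "population_over_50": 2_600_000}
context = make_context(delhi)
```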
Fig. 3 Day-wise rewards at every iteration were recorded for all the states. The mean and SD obtained over 26-day projections are shown as confidence bands. (Top) Smoothed and increasing reward curve of DQN. (Bottom) Smoothed and increasing reward curve of ACKTR.
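Curves like those in Fig. 3 can be produced from raw per-iteration rewards with a trailing moving average plus a per-day mean ± SD band across runs; a generic sketch (the window size and the toy curves are our choices, not the paper's):

```python
from statistics import mean, stdev

def smooth(rewards, window=5):
    """Trailing moving average of a single reward curve."""
    return [mean(rewards[max(0, t - window + 1): t + 1])
            for t in range(len(rewards))]

def band(runs):
    """Per-day mean and SD across multiple reward curves (the confidence band)."""
    by_day = list(zip(*runs))
    return [mean(d) for d in by_day], [stdev(d) for d in by_day]

runs = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]  # toy reward curves from two runs
mu, sd = band(runs)
```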
Hyper-parameters used during Policy learning.
| S.No. | Hyperparameter Name | Hyperparameter Value |
|---|---|---|
| 1 | Batch size (number of vials in one round of distribution) | 1,000,000 |
| 2 | Exploration Rate of DQN | 10% |
| 3 | Vaccine efficacy | 100% |
| 4 | Number of days to reach full efficacy | 45 |
| 5 | Bucket size | 1000 |
| 6 | Number of recipients per day | 5 |
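Under these hyper-parameters, a model action can be read as a percentage split over the five recipients that must be rounded to whole buckets of 1000 vials. A minimal largest-remainder rounding sketch (the function name and rounding scheme are ours, not necessarily the paper's):

```python
def to_buckets(shares, batch_size=1_000_000, bucket_size=1000):
    """Round percentage shares to whole vaccine buckets (largest-remainder method)."""
    n_buckets = batch_size // bucket_size            # 1000 buckets of 1000 vials
    exact = [s / 100.0 * n_buckets for s in shares]  # ideal (fractional) buckets
    buckets = [int(e) for e in exact]                # truncate to whole buckets
    # hand leftover buckets to the recipients with the largest fractional parts
    leftovers = n_buckets - sum(buckets)
    order = sorted(range(len(shares)),
                   key=lambda i: exact[i] - buckets[i], reverse=True)
    for i in order[:leftovers]:
        buckets[i] += 1
    return [b * bucket_size for b in buckets]        # vials per recipient

vials = to_buckets([15.5, 8.0, 17.5, 58.0, 1.0])
```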
VacSIM model output (percentage of vaccine allocated to each State) for different bucket sizes.
| Model | Bucket Size | Assam | Delhi | Jharkhand | Maharashtra | Nagaland |
|---|---|---|---|---|---|---|
| ACKTR + CB | 200 | 15.5 | 8 | 17.5 | 58 | 1 |
| ACKTR + CB | 300 | 15.5116 | 7.9208 | 17.4917 | 57.7558 | 1.3201 |
| ACKTR + CB | 400 | 15.5172 | 8.1281 | 17.734 | 57.3892 | 1.2315 |
| ACKTR + CB | 500 | 15.5206 | 8.055 | 17.6817 | 57.3674 | 1.3752 |
| DQN + CB | 200 | 15.8974 | 8.2051 | 16.9231 | 58.4615 | 0.5128 |
| DQN + CB | 300 | 15.9864 | 8.1633 | 17.0068 | 58.1633 | 0.6803 |
| DQN + CB | 400 | 16.1616 | 8.3333 | 17.1717 | 57.5758 | 0.7576 |
| DQN + CB | 500 | 16.129 | 8.2661 | 17.1371 | 57.6613 | 0.8065 |
Fig. 4 Additional projected infections prevented over the next 45 days by following the VacSIM-driven approach instead of the naive approach.
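The "infections prevented" comparison in Fig. 4 can be illustrated with a discrete-time SEIR model in which vaccinated susceptibles move directly to the removed compartment. The rates, population, and vaccination volume below are generic placeholders, not the paper's fitted parameters:

```python
def seir_cumulative_infections(days, n, i0, beta=0.3, sigma=0.2, gamma=0.1,
                               daily_vaccines=0.0):
    """Run a simple discrete-time SEIR model; return cumulative new infections."""
    s, e, i, r = n - i0, 0.0, float(i0), 0.0
    cum = 0.0
    for _ in range(days):
        new_e = beta * s * i / n                      # S -> E (new exposures)
        new_i = sigma * e                             # E -> I
        new_r = gamma * i                             # I -> R
        v = min(daily_vaccines, max(s - new_e, 0.0))  # vaccination: S -> R
        s += -new_e - v
        e += new_e - new_i
        i += new_i - new_r
        r += new_r + v
        cum += new_i
    return cum

no_vax = seir_cumulative_infections(45, n=1_000_000, i0=100)
with_vax = seir_cumulative_infections(45, n=1_000_000, i0=100,
                                      daily_vaccines=10_000)
prevented = no_vax - with_vax
```

Repeating the run with the naive and VacSIM allocations as the vaccination schedules, state by state, yields the kind of head-to-head comparison Fig. 4 reports.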
Fig. 5 Ensemble-averaged causal structure of the Bayesian network obtained from 501 bootstraps, using a Hill Climbing optimizer with AIC (left) and BIC (right) as scoring functions. Vaccine Percentage obtained from the model was observed to be a parent node of Susceptible cases, indicating the causality-preserving nature of the VacSIM simulations.
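A bare-bones sketch of score-based structure learning in the spirit of Fig. 5: greedy hill climbing over discrete variables that repeatedly adds the parent edge with the best BIC gain, with a cycle check. This is an illustration only; it omits edge deletions/reversals and the 501-bootstrap ensemble averaging, and is not the paper's actual AIC/BIC pipeline:

```python
import math
from collections import Counter

def bic_node(data, node, parents):
    """BIC score of one node given its parents (discrete data, MLE counts)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    ll = sum(c * math.log(c / marg[pc]) for (pc, _), c in joint.items())
    r = len({row[node] for row in data})      # node cardinality
    k = (r - 1) * len(marg)                   # free parameters
    return ll - 0.5 * k * math.log(n)

def is_ancestor(parents, a, b):
    """True if a is an ancestor of b in the current graph."""
    stack, seen = list(parents[b]), set()
    while stack:
        x = stack.pop()
        if x == a:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False

def hill_climb(data, variables, max_parents=2):
    """Greedily add the single edge with the best BIC gain until none remains."""
    parents = {v: [] for v in variables}
    score = {v: bic_node(data, v, []) for v in variables}
    while True:
        best = None
        for child in variables:
            if len(parents[child]) >= max_parents:
                continue
            for par in variables:
                if par == child or par in parents[child]:
                    continue
                if is_ancestor(parents, child, par):   # par->child would cycle
                    continue
                gain = bic_node(data, child, parents[child] + [par]) - score[child]
                if gain > 1e-9 and (best is None or gain > best[0]):
                    best = (gain, child, par)
        if best is None:
            return parents
        gain, child, par = best
        parents[child].append(par)
        score[child] += gain

# Toy data: B copies A 90% of the time, so one A-B edge should be recovered.
data = [{"A": a, "B": a if k % 10 else 1 - a} for a in (0, 1) for k in range(100)]
dag = hill_climb(data, ["A", "B"])
```

Running such a learner on bootstrap resamples and averaging the resulting edge sets gives the ensemble structure shown in Fig. 5.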