Ricardo Andrade-Pacheco, Francois Rerolle, Jean Lemoine, Leda Hernandez, Aboulaye Meïté, Lazarus Juziwelo, Aurélien F Bibaut, Mark J van der Laan, Benjamin F Arnold, Hugh J W Sturrock.
Abstract
The identification of disease hotspots is an increasingly important public health problem. While geospatial modeling offers an opportunity to predict the locations of hotspots using suitable environmental and climatological data, little attention has been paid to optimizing the design of surveys used to inform such models. Here we introduce an adaptive sampling scheme optimized to identify hotspot locations where prevalence exceeds a relevant threshold. Our approach incorporates ideas from Bayesian optimization theory to adaptively select sample batches. We present an experimental simulation study based on survey data of schistosomiasis and lymphatic filariasis across four countries. Results across all scenarios explored show that adaptive sampling produces superior results and suggest that similar performance to random sampling can be achieved with a fraction of the sample size.
Year: 2020 PMID: 32616757 PMCID: PMC7331748 DOI: 10.1038/s41598-020-67666-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Experimental procedure.
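| | Pseudo code for experiments |
|---|---|
| 1 | for each repetition |
| 2 | for each batch size (1, 10 and 50) |
| 3 | draw an initial random sample of 100 locations |
| 4 | assign the initial sample to both the random and the adaptive design |
| 5 | set the total number of iterations (until 100 additional locations are added) |
| 6 | for each step |
| 7 | simulate the observed positive cases at the selected locations (Binomial) |
| 8 | incorporate the environmental data |
| 9 | find the probability of exceeding the threshold |
| 10 | compute validation statistics |
| 11 | define a new batch with the random mechanism |
| 12 | define a new batch with the adaptive sampling method |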
We repeated each experiment a hundred times (line 1), for batches of size 1, 10 and 50 (line 2). We started with an initial random sample of 100 locations (line 3), shared by the random and adaptive methods (line 4). We incorporated subsequent samples until 100 additional sampling locations had been added (line 5). For the locations selected to be sampled, we simulated the observed positive cases according to a Binomial distribution with the prevalence at each location (line 7) and incorporated the environmental data (line 8). We then used the accumulated data to find the probability of exceeding the threshold (line 9). Finally, we defined a new batch of locations with the random mechanism (line 11) and with the proposed adaptive sampling method (line 12).
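The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_toy_model` is a hypothetical nearest-neighbour stand-in for the geospatial model (it ignores environmental covariates), `most_uncertain` is a placeholder acquisition rule rather than the paper's acquisition function, and the 1-D coordinates, the prevalence surface, `n_tested` (people tested per site) and the 0.5 classification cutoff are all invented for the example.

```python
import numpy as np


def make_toy_model(coords, threshold, rng, n_draws=200):
    """Hypothetical stand-in for the geospatial model (not the paper's)."""

    def fit_predict(sites, positives, n_tested):
        # Beta(1 + y, 1 + n - y) posterior draws of prevalence at sampled sites.
        draws = rng.beta(1 + positives, 1 + n_tested - positives,
                         size=(n_draws, sites.size))
        site_prob = (draws > threshold).mean(axis=0)
        # Each site borrows the exceedance probability of its nearest sampled site.
        nearest = np.abs(coords[:, None] - coords[sites][None, :]).argmin(axis=1)
        return site_prob[nearest]

    return fit_predict


def most_uncertain(exceed_prob, candidates, batch_size):
    """Placeholder acquisition: exceedance probabilities closest to 0.5."""
    order = np.argsort(np.abs(exceed_prob[candidates] - 0.5))
    return candidates[order[:batch_size]]


def run_once(prevalence, threshold, batch_size, fit_predict, acquire, rng,
             n_initial=100, n_additional=100, n_tested=50):
    """One repetition for one batch size; returns accuracy per step and design."""
    all_sites = np.arange(prevalence.size)
    initial = rng.choice(all_sites, size=n_initial, replace=False)
    n_steps = 1 + n_additional // batch_size  # step 1 is the initial sample
    accuracy = {}
    for design in ("random", "adaptive"):
        sites = np.empty(0, dtype=int)
        positives = np.empty(0, dtype=int)
        batch = initial
        accuracy[design] = []
        for _ in range(n_steps):
            # Simulate positive cases at the newly visited sites (Binomial).
            new_positives = rng.binomial(n_tested, prevalence[batch])
            sites = np.concatenate([sites, batch])
            positives = np.concatenate([positives, new_positives])
            # Probability that prevalence exceeds the threshold, for every site.
            exceed_prob = fit_predict(sites, positives, n_tested)
            # Validate on the sites that have not been sampled yet.
            unsampled = np.setdiff1d(all_sites, sites)
            predicted = exceed_prob[unsampled] > 0.5  # assumed cutoff
            truth = prevalence[unsampled] > threshold
            accuracy[design].append(float(np.mean(predicted == truth)))
            # Choose the next batch, at random or adaptively.
            if design == "random":
                batch = rng.choice(unsampled, size=batch_size, replace=False)
            else:
                batch = acquire(exceed_prob, unsampled, batch_size)
    return accuracy


rng = np.random.default_rng(0)
coords = np.linspace(0.0, 1.0, 400)       # hypothetical 1-D site locations
prevalence = 0.05 + 0.15 * coords ** 2     # hypothetical prevalence surface
model = make_toy_model(coords, threshold=0.10, rng=rng)
result = run_once(prevalence, 0.10, batch_size=10, fit_predict=model,
                  acquire=most_uncertain, rng=rng)
```

Wrapping `run_once` in loops over repetitions and over batch sizes 1, 10 and 50 would reproduce the outer structure of the pseudo code; the model and acquisition rule are passed in as functions so that either can be swapped without touching the loop.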
Figure 1. Out-of-sample accuracy (batch size = 1). The solid line represents the average value across 50 repetitions. The shaded area represents the 2.5% and 97.5% quantiles of the values observed across all 50 repetitions at each step. Note that step 1 here refers to the initial random sample of 100 sites. (A) Côte d’Ivoire. (B) Malawi. (C) Haiti. (D) Philippines.
Figure 2. Summary of validation statistics. Metrics computed after adding 100 new samples in batches of 1, 10 and 50 sites. Dots represent the mean and whiskers represent the 2.5% and 97.5% quantiles of values observed across all 50 repetitions. One hotspot threshold is used for Côte d’Ivoire and Malawi and another for Haiti and the Philippines.
For a random design (RS) with a sample size of 100, we show the sample size needed to achieve a similar accuracy using an adaptive design (AS).
| Country | Batch size | Num. obsv. (RS) | Num. obsv. (AS) | Accuracy (%) (RS) | Accuracy (%) (AS) | PPV (%) (RS) | PPV (%) (AS) | Sensitivity (%) (RS) | Sensitivity (%) (AS) | MSE (RS) | MSE (AS) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Côte d’Ivoire | 1 | 100 | 27 | 85.2 | 85.3 | 64.9 | 78.6 | 64.8 | 65.1 | 17.3 | 20.5 |
| | 10 | 100 | 30 | 85.0 | 85.3 | 78.6 | 78.7 | 64.7 | 65.2 | 16.5 | 20.4 |
| | 50 | 100 | 50 | 85.1 | 85.5 | 79.2 | 78.6 | 64.3 | 65.5 | 17.7 | 20.0 |
| Malawi | 1 | 100 | 36 | 81.8 | 81.9 | 80.8 | 80.6 | 59.9 | 59.4 | 14.0 | 14.9 |
| | 10 | 100 | 40 | 81.6 | 82.0 | 79.5 | 80.5 | 60.8 | 59.7 | 14.1 | 15.0 |
| | 50 | 100 | 50 | 82.1 | 82.1 | 80.3 | 80.3 | 61.9 | 60.4 | 13.8 | 14.5 |
| Haiti | 1 | 100 | 31 | 82.4 | 82.6 | 70.3 | 75.7 | 43.0 | 35.0 | 1.0 | 1.4 |
| | 10 | 100 | 30 | 81.5 | 81.6 | 71.7 | 71.0 | 38.9 | 36.4 | 1.0 | 1.5 |
| | 50 | 100 | 50 | 81.5 | 82.3 | 70.8 | 70.5 | 38.1 | 39.8 | 0.9 | 1.5 |
| Philippines | 1 | 100 | 7 | 95.2 | 95.2 | 93.7 | 94.4 | 69.7 | 67.7 | 2.5 | 4.3 |
| | 10 | 100 | 10 | 95.1 | 95.2 | 94.1 | 93.6 | 68.9 | 68.6 | 2.3 | 5.1 |
| | 50 | 100 | 50 | 95.2 | 95.6 | 94.6 | 85.0 | 68.9 | 79.8 | 2.7 | 5.5 |
Additional validation statistics (PPV, sensitivity and MSE) are also shown. Rows give results per country and batch size.
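The four statistics reported in the table can be computed as in the sketch below, which uses their standard definitions. Two choices here are assumptions rather than details taken from the source: a site is classified as a hotspot when its exceedance probability is above `cutoff` (0.5 by default), and MSE is taken between predicted and true prevalence. The function name and the toy numbers are illustrative only.

```python
import numpy as np


def validation_statistics(true_prev, pred_prev, exceed_prob, threshold, cutoff=0.5):
    """Accuracy, PPV, sensitivity and MSE for a set of validation sites."""
    true_prev = np.asarray(true_prev, dtype=float)
    pred_prev = np.asarray(pred_prev, dtype=float)
    truth = true_prev > threshold                                # true hotspots
    predicted = np.asarray(exceed_prob, dtype=float) > cutoff    # assumed rule
    tp = int(np.sum(predicted & truth))
    fp = int(np.sum(predicted & ~truth))
    fn = int(np.sum(~predicted & truth))
    tn = int(np.sum(~predicted & ~truth))
    return {
        "accuracy": (tp + tn) / truth.size,
        # PPV and sensitivity are undefined when their denominator is zero.
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "mse": float(np.mean((pred_prev - true_prev) ** 2)),
    }


# Toy example with five validation sites and a 10% threshold.
stats = validation_statistics(
    true_prev=[0.02, 0.15, 0.30, 0.05, 0.12],
    pred_prev=[0.04, 0.11, 0.25, 0.12, 0.08],
    exceed_prob=[0.1, 0.6, 0.9, 0.7, 0.4],
    threshold=0.10,
)
```

In this toy case there are two true positives, one false positive, one false negative and one true negative, so accuracy is 3/5 and both PPV and sensitivity are 2/3.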
Figure 3. Exploration-exploitation trade-off. (A) Spatially correlated uncertainty. (B) Batch selected (red dots) by using the greedy approach of targeting the highest values of uncertainty. (C) Batch of locations selected (red dots) using the acquisition function described in Eq. (6).
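The contrast between panels (B) and (C) can be illustrated with a small sketch. Eq. (6) is not reproduced here, so `spaced_batch` below is not the paper's acquisition function: it is a hypothetical stand-in that down-weights candidates near already selected sites with a Gaussian distance penalty, with `length_scale` as an assumed tuning parameter. When uncertainty is spatially correlated, the greedy rule tends to pick neighbouring sites, while a distance-aware rule spreads the batch.

```python
import numpy as np


def greedy_batch(uncertainty, batch_size):
    """Pick the sites with the highest uncertainty, ignoring their locations."""
    order = np.argsort(-np.asarray(uncertainty, dtype=float), kind="stable")
    return [int(i) for i in order[:batch_size]]


def spaced_batch(uncertainty, coords, batch_size, length_scale):
    """Illustrative stand-in: penalise candidates close to sites already chosen."""
    scores = np.asarray(uncertainty, dtype=float).copy()
    coords = np.asarray(coords, dtype=float)
    available = np.ones(scores.size, dtype=bool)
    chosen = []
    for _ in range(batch_size):
        best = int(np.argmax(np.where(available, scores, -np.inf)))
        chosen.append(best)
        available[best] = False
        # Shrink the scores of sites near the new pick (factor 0 at distance 0).
        dist = np.linalg.norm(coords - coords[best], axis=1)
        scores = scores * (1.0 - np.exp(-(dist / length_scale) ** 2))
    return chosen


# Ten hypothetical sites on a line, with a cluster of high uncertainty on the left.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
uncertainty = [0.9, 0.95, 1.0, 0.95, 0.9, 0.1, 0.1, 0.5, 0.6, 0.5]

greedy = greedy_batch(uncertainty, batch_size=3)
spaced = spaced_batch(uncertainty, coords, batch_size=3, length_scale=2.0)
```

Here the greedy batch is three adjacent sites in the high-uncertainty cluster, whereas the spaced batch takes the cluster's peak and then moves to the secondary peak on the right before returning to the edge of the cluster.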
Figure 4. Simulated prevalence scenarios. The locations of the villages are marked by dots, whose colors represent the hypothetical prevalence in each scenario. (A) Côte d’Ivoire (schistosomiasis). (B) Malawi (schistosomiasis). (C) Haiti (lymphatic filariasis). (D) Philippines (lymphatic filariasis).