An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying.

Nathan Green¹, Fiacre Agossa², Boulais Yovogan², Richard Oxborough³, Jovin Kitau⁴, Pie Müller^5,6, Edi Constant⁷, Mark Rowland⁸, Emile F S Tchacaya⁷, Koudou G Benjamin⁷, Thomas S Churcher⁹, Michael Betancourt¹⁰, Ellie Sherrard-Smith⁹.

Abstract

BACKGROUND: Prospective malaria public health interventions are initially tested for entomological impact using standardised experimental hut trials. In some cases, data are collated as aggregated counts of potential outcomes from mosquito feeding attempts given the presence of an insecticidal intervention. Comprehensive data, i.e. full breakdowns of probable outcomes of mosquito feeding attempts, are more rarely available. Bayesian evidence synthesis is a framework that explicitly combines data sources to enable the joint estimation of parameters and their uncertainties. The aggregated and comprehensive data can be combined using an evidence synthesis approach to enhance our inference about the potential impact of vector control products across different settings over time.
METHODS: Aggregated and comprehensive data from a meta-analysis of the impact of Pirimiphos-methyl, an indoor residual spray (IRS) product active ingredient, used on wall surfaces to kill mosquitoes and reduce malaria transmission, were analysed using a series of statistical models to understand the benefits and limitations of each.
RESULTS: Many more data are available in aggregated format (N = 23 datasets, 4 studies) relative to comprehensive format (N = 2 datasets, 1 study). The evidence synthesis model had the smallest uncertainty at predicting the probability of mosquitoes dying or surviving and blood-feeding. Generating odds ratios from the correlated Bernoulli random sample indicates that when mortality and blood-feeding are positively correlated, as exhibited in our data, the number of successfully fed mosquitoes will be under-estimated. Analysis of either dataset alone is problematic because aggregated data require an assumption of independence and there are few and variable data in the comprehensive format.
CONCLUSIONS: We developed an approach to combine sources from trials to maximise the inference that can be made from such data and that is applicable to other systems. Bayesian evidence synthesis enables inference from multiple datasets simultaneously to give a more informative result and highlight conflicts between sources. Advantages and limitations of these models are discussed.

Substances:
Insecticides

Year: 2022 PMID: 35324929 PMCID: PMC8947499 DOI: 10.1371/journal.pone.0263446

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Interventions that shorten the mean lifespan of a mosquito and interrupt biting cycles are integral to the control of malaria infections across Africa [1]. The entomological effects of indoor residual spraying of insecticide (IRS) and of insecticide-treated nets are tested using experimental hut trials as part of the product validation process, before World Health Organization (WHO) recommendations can be considered and made [2]. These trials apply IRS to huts, and the temporal impact of the IRS is then tracked using human study participants who stay overnight to act as bait for blood-seeking mosquitoes. Over the course of the malaria transmission season, when mosquitoes are present, the outcomes of mosquito mortality and feeding attempts are observed and compared to those recorded for a study participant who stays overnight in an unsprayed hut. On entering an IRS-treated hut, a mosquito may: i) blood-feed on the human bait; ii) be killed by the IRS chemical compound; or iii) exit into window or veranda traps. Data may be recorded in what we shall call comprehensive or aggregate formats. Aggregate data provide total counts of a particular outcome but do not account for the counts of other outcomes. Alternatively, comprehensive data provide more detailed information about subgroups representing two or more outcomes. These hut trial data are often published in aggregated format because the key metrics outlined by the WHO (induced mortality, blood-feeding inhibition and deterrence) [2] do not need data to be disaggregated. The effects of interventions must be replicated in multiple settings that have different ecological characteristics to better understand the overall protection that IRS, or other vector control, can afford. Systematic reviews are increasingly used to assess ecological trends in these combined data and summarise evidence to help evaluate interventions [3-5].
Compiling aggregated data (AD), such as those from experimental hut studies, can cause complications for meta-analyses that use the data for slightly different purposes because, among other reasons: i) AD can be presented in inconsistent ways when results are summarised [6], making data hard to harmonise across different trials [7]; ii) AD may not fully account for characteristics evident in comprehensive data (CD), leading to ecological bias [8,9]; iii) if large numbers of large trials are available, meta-regression analyses of AD may prove statistically powerful, but with few or smaller trials AD may miss clinically significant treatment-level effects [10]; and iv) within-study variability may be missed [11]. However, there are often few datasets available for any intervention tested and, historically, only a subset of these trials may provide the CD that could alleviate some of these issues. In the experimental hut data testing IRS products, a specific challenge arises from aggregating data because no distinction is made as to whether mosquitoes have blood-fed and survived or blood-fed and been killed. This is an important epidemiological distinction because those mosquitoes that blood-feed and survive may go on to oviposit or transmit malaria parasites onward. Our recent assessment of IRS products made the assumption that mosquitoes were equally likely to have blood-fed and survived or blood-fed and died on entering a sprayed hut [3]. However, IRS exploits the resting behaviour of mosquitoes after feeding, so we need a method to capture the likely higher mortality in fed mosquitoes. In this paper, we use systematically collated data from Sherrard-Smith et al. [3] on the IRS active ingredient Pirimiphos-methyl (300 g/L) as an example dataset to explore different models that aim to infer how the impact of the vector control product on mosquitoes changes over time using both AD and CD.
Within these data we have 4 studies presenting AD (23 time series), of which 1 study also presents CD (2 time series). Careful consideration is required particularly for any analysis where data are aggregated in different ways across trials, and where the comprehensive data are only part of the total available data. Ideally, we want to ensure that any inference using AD is in agreement with inference afforded by the comprehensive data. Bayesian statistical methodologies provide a natural paradigm to analyse evidence from multiple sources in different formats [11-16]. Bayesian evidence synthesis is a framework that explicitly combines data sources enabling joint estimation of parameters and their uncertainties [17]. We compare predictions of each statistical model presented to those estimated by analysing the subset of either aggregated data or comprehensive data individually. We demonstrate the advantage of inferring from both data sources.

Methods

We apply the models to an empirical dataset to explore how mosquito outcomes can be interpreted from experimental hut trials testing the efficacy of an IRS product.

Empirical data

Briefly, a meta-analysis of IRS experimental hut trials was previously conducted on metrics of IRS efficacy [3]. PRISMA guidelines were followed in conducting the systematic review. The outcome metrics of interest are count data for mosquitoes over a time series of multiple months. The original review used four search engines (Web of Knowledge, PubMed, JSTOR and Google Scholar) to find relevant data resources. For the present analysis, these previously published data were divided according to reporting format. Some studies reported summary data only; that is, the total number of mosquitoes and the total number fed or killed during the trial. These form the aggregated dataset. Other time series included a comprehensive division of mosquito outcomes; that is, the total number of mosquitoes and the number that had fed and died, fed and survived, not fed and died, or not fed and survived. The aggregated dataset included data without these distinctions, so it is not possible to know whether a fed mosquito was also dead. The IRS product data used are for the organophosphate insecticide Pirimiphos-methyl, which has been widely used for IRS campaigns across the African continent since its WHO recommendation in 2013 [18]. The product was evaluated using West African experimental huts in Benin, Côte d’Ivoire and Tanzania across 4 studies [19-22]. Twenty-three datasets from these studies had aggregated data reporting the total number of mosquitoes that entered sprayed huts and the total number killed, blood-fed or exited, without the distinctions needed for comprehensive assessment, for at least 3 time points (S1 Appendix). Comprehensive data were available from two of these datasets (S1 Appendix); data resources are summarised in Table 1. At least 3 repeated measures through time were made for each study, ranging from 6 to 12 months since the insecticide was first deployed.

Table 1

A list of the studies included and which models are informed by each dataset.

Where indicated, the data are provided in full, in S1 Appendix.

Data source	Aggregated data	Comprehensive data
[21] (using pirimiphos methyl 2 gm^-2)	1 time series	-
[19] (using pirimiphos methyl B 30% CS, pirimiphos methyl AA 30% CS, testing An. gambiae s.l. and An. funestus s.l.)	16 time series	-
[22] (using pirimiphos methyl B 30% CS, pirimiphos methyl BM 30% CS, mud walls, An. arabiensis)	2 time series	-
[20] (using pirimiphos methyl B 30% CS, mud and cement walls, testing An. gambiae s.l. and An. funestus s.l.)	4 time series	2 time series (Study 1 cement, 0.5 gm^-1) (Study 2 cement, 1 gm^-1)
Total	23	2

*reference for Fig 6.

Fig 6

Probability of success (alive) and fed over time for independent and joint models using simulated trial data.

A copula method is used to impose correlation. The black lines are the independent estimates and the red lines are the correct joint estimates. The estimates are based on the empirical marginal probabilities. a) negative correlation; b) independent; c) positive correlation (as is the case for the experimental hut data examined here).

Our aim is to determine the probability that a mosquito is either killed or blood-fed and surviving after attempting to feed in an experimental hut sprayed with insecticide. We determine how this association changes over time and how probabilities differ between mosquito species. A contingency table to demonstrate the different aggregation of these data from the available sources is shown in Fig 1.

Fig 1

Contingency tables of trial data.

The number of mosquitoes that are blood-fed (X^f) and the number of mosquitoes that are killed (X^d) are known for both data sources but, for the aggregated data, we do not know directly the number of mosquitoes that are both alive and blood-fed (circled). We have the full complement of data from the comprehensive source.
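
The relationship between the two formats in Fig 1 can be illustrated with a short sketch. The counts and the names `ComprehensiveCounts` and `aggregate` are hypothetical, chosen only for illustration; they are not taken from S1 Appendix. The sketch collapses a comprehensive 2x2 table to the totals that an aggregated report provides, and shows that two different comprehensive tables can yield identical aggregated data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComprehensiveCounts:
    """Full 2x2 breakdown of outcomes for one hut and time point (hypothetical)."""
    fed_dead: int
    fed_alive: int
    unfed_dead: int
    unfed_alive: int


def aggregate(cd: ComprehensiveCounts) -> dict:
    """Collapse a comprehensive table to the totals reported in aggregated data."""
    total = cd.fed_dead + cd.fed_alive + cd.unfed_dead + cd.unfed_alive
    return {
        "N": total,                             # mosquitoes entering the hut
        "X_fed": cd.fed_dead + cd.fed_alive,    # blood-fed, dead or alive
        "X_dead": cd.fed_dead + cd.unfed_dead,  # killed, fed or unfed
    }


# Two different joint tables with the same margins: the aggregated format
# cannot distinguish them, so the fed-and-alive count is not identified.
table_a = ComprehensiveCounts(fed_dead=60, fed_alive=20, unfed_dead=10, unfed_alive=10)
table_b = ComprehensiveCounts(fed_dead=55, fed_alive=25, unfed_dead=15, unfed_alive=5)

print(aggregate(table_a))
print(aggregate(table_b))
```

Because both tables aggregate to the same totals, any estimate of the fed-and-alive outcome from aggregated data alone has to rest on an additional assumption, such as independence.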

Statistical methods

We built three statistical models that were fitted in a Bayesian framework using the probabilistic programming software OpenBUGS (release 3.2.3) [23], accessed through R (version 3.6.1) [24]. Data and code are provided in S2 Appendix. We provide equivalent code for the same process to be conducted in RStan (version 2.21.1) [25] in S2 Appendix. All models are represented schematically in Fig 2.

Fig 2

Directed acyclic graphs (DAGs) for models to assess aggregated (a, b), comprehensive (c) or both aggregated and comprehensive (d) data.

Models 1, 2a and 2b represent models previously used in this field and Model 3 is the new approach. Model 2a depicts the original analysis in [3]. In each case, β0 and β1 are the shared intercept and time coefficients in the linear component of the logistic regressions, and likewise γ0 and γ1 for Model 1, which has separate sub-models for fed and dead. X denotes the count data and N the total counts of mosquitoes in each subgroup. a) Mosquitoes are counted as killed with no information on whether they are also blood-fed (X^d), or as blood-fed with no information on whether they are also killed (X^f). We can infer marginal probabilities (shown in orange boxes) and then, assuming independence, use the estimated probability of each outcome to infer the probability that mosquitoes are both alive and blood-fed. b) Alternatively, we can adjust the data using the same assumption of independence prior to fitting the model and fit a logistic binomial model to the adjusted data. c) The same model structure (Model 2) can be fitted to the comprehensive data. d) Using evidence synthesis, we can learn from the comprehensive data (N = 2 datasets) to infer probabilities that are supported by the aggregated data (N = 23 datasets). For each dataset, k represents the study index and t is the time point at which data are collected.

The first model (Model 1) uses the aggregated data only. It estimates the proportion of mosquitoes that are killed and the proportion that are blood-fed, and then infers from the product of these probabilities the proportion that are surviving and blood-feeding, assuming that a mosquito is equally likely to be blood-fed whether killed or not. The second model adjusts the aggregated data directly prior to fitting and then fits a logistic binomial model to estimate the respective probabilities from the adjusted data. This approach has been used previously [3] on aggregated data (Model 2a).
This model structure is also fitted to the comprehensive data, where no prior assumptions are required (Model 2b), to estimate the probabilities from the comprehensive dataset. The third model (Model 3), a Bayesian evidence synthesis model, is constructed similarly to, but is distinct from, previous methods [16,17,26]. This model combines the data sources probabilistically so that the inferences drawn from the comprehensive data are supplemented by the additional aggregated data source.

Model 1

The largest number of available trials provide AD. Model 1 disregards the smaller subset of richer CD. These trials do not record which mosquitoes had jointly survived and blood-fed. Posterior distributions are estimated for the marginal probabilities of mosquitoes that are blood-fed (P^f) or mosquitoes that have been killed (P^d) (Fig 1). A time-dependent logistic function is fitted to IRS impact on mosquito mortality and mosquito blood-feeding (t, in days). The superscript denotes whether a mosquito will die (d) or survive (s). The subscripts denote the trial identifier k and the time point i at which measurements are taken. We use equivalent formulae for the fed model, where blood-fed (f) replaces mosquitoes that die (d). Parameters α and β determine the shape of this relationship and have normally distributed priors with mean μ and variance σ^2. The raw data are indicated by X, the count of mosquitoes that were killed (or blood-fed), and N, the total count of mosquitoes. Given the absence of data on the mosquitoes that have survived and blood-fed, we instead infer this probability by assuming independence and combining the marginal probabilities. The joint probability that mosquitoes are both dead and blood-fed is then P^d × P^f, and the probability that they survived and fed is (1 − P^d) × P^f. In other words, we assume that mosquitoes are equally likely to have survived whether they had blood-fed or not blood-fed (Fig 3, bottom row).
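
The display equations for Model 1 are not reproduced in this text. A form consistent with the definitions above is sketched here for the mortality sub-model. The trial-level indexing of α and β and the subscripts on the hyperparameters μ and σ^2 are assumptions, and S2 Appendix remains the authoritative specification.

```latex
\begin{align}
X^{d}_{k,i} &\sim \mathrm{Binomial}\!\left(N_{k,i},\; P^{d}_{k,i}\right) \\
\operatorname{logit}\!\left(P^{d}_{k,i}\right) &= \alpha^{d}_{k} + \beta^{d}_{k}\, t_{k,i} \\
\alpha^{d}_{k} &\sim \mathrm{Normal}\!\left(\mu_{\alpha},\, \sigma^{2}_{\alpha}\right) \\
\beta^{d}_{k} &\sim \mathrm{Normal}\!\left(\mu_{\beta},\, \sigma^{2}_{\beta}\right)
\end{align}
```

The blood-feeding sub-model would take the same form with the superscript f in place of d.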

Fig 3

Model predictions of proportion of mosquitoes killed and successfully fed.

Best estimates (median posterior predictive values) across all aggregate datasets are shown with 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to aggregated data. Subfigures i) and ii) show results when the blood-fed data are empirically disaggregated before fitting the model (Model 2a), and subfigures iii) and iv) show results when this step is taken after fitting the model to the marginal probabilities (Model 1). Data for the fits are overlaid onto the figure to demonstrate the suitability of the time-dependent functions. In each of the fits, the individual study predictions are overlaid as thin grey lines, with matching symbol types for each time series. In iv) the model was not fitted to successfully fed data, so there are no overlaid points. Anopheles gambiae s.l. (circles), An. funestus s.l. (squares) and An. arabiensis (triangles) mosquitoes are indicated. The data from the comprehensive source are only used in aggregated format.

Model 2

We contrast this with a previous approach [3] in which, before fitting the time series data, the number of mosquitoes that were blood-fed (X^f) is adjusted by assuming that blood-feeding does not increase the probability of being killed, which gives an adjusted count of blood-feeding and surviving mosquitoes. A time-dependent logistic function is fitted to these adjusted data on IRS impact for mosquito survival and successful blood-feeding (t, in days) (Fig 3, top row). This model is also fitted to the comprehensive data only (Fig 4, top row) to estimate the four different potential outcomes from a mosquito feeding attempt (Model 2b), shown in grey boxes in the contingency table (Fig 1).
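
The adjustment formula itself is not reproduced in this text. One reading of the stated assumption, that blood-fed mosquitoes are killed at the same rate as all mosquitoes entering the hut, is sketched below. The function name `adjust_fed_alive` and the example counts are hypothetical, and the exact adjustment used in [3] may differ.

```python
def adjust_fed_alive(n_total: int, n_fed: int, n_dead: int) -> float:
    """Expected fed-and-surviving count if feeding and dying are independent.

    Assumption (not confirmed by the source text): the fed count is scaled
    by the overall proportion of mosquitoes that survived.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not (0 <= n_fed <= n_total and 0 <= n_dead <= n_total):
        raise ValueError("counts must lie between 0 and n_total")
    survival_fraction = 1.0 - n_dead / n_total
    return n_fed * survival_fraction


# Hypothetical aggregated time series: (days since spraying, N, fed, dead).
series = [(30, 100, 80, 90), (90, 120, 100, 84), (180, 90, 75, 36)]
for day, n_total, n_fed, n_dead in series:
    x_sf = adjust_fed_alive(n_total, n_fed, n_dead)
    print(f"day {day}: adjusted fed-and-alive = {x_sf:.1f} of {n_total}")
```

The adjusted counts, rather than the raw fed counts, would then be the response to which the time-dependent logistic function is fitted.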

Fig 4

Model predictions of proportion of mosquitoes killed and successfully fed.

Best estimates (median posterior predictive values) across all study datasets are shown with 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to the comprehensive data. Subfigures i) and ii) show results using comprehensive data only (Model 2b), and subfigures iii) and iv) show results using all datasets (Model 3). Data for the fits are overlaid onto subfigures i) and ii) to demonstrate the suitability of the time-dependent functions. In these fits, the individual study predictions are overlaid as thin grey lines, with matching symbol types for each time series. As Model 3 is inferred from both AD and CD sources, there are no data to overlay.

Model 3

Model 3 presents the Bayesian evidence synthesis, which allows inference of the parameter estimates for each potential mosquito outcome by combining both data sources (Fig 4, lower row). AD are assumed to follow binomial distributions, and a logistic function is used to describe the time-dependent data. The superscript j represents the possible outcomes noted in the contingency table (Fig 1). The comprehensive data are modelled similarly but using a multinomial distribution for the 4 categories and the logistic link function. The hyper-parameters used to generate the posterior predictions for each distribution are linked across the two datasets such that they are exchangeable, i.e. for k ranging over both CD and AD trials. To understand better how the variability in data sources affects the predictive ability of the evidence synthesis model, we contrast the predicted outcomes of the models in Fig 5. To understand how this prediction is affected by correlated data, i.e. where predicted mosquito blood-feeding is correlated with the mosquito mortality outcome, we use simulated trial data from a simple copula model (Fig 6).
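
How two data sources can inform the same parameters may be sketched as a joint log-likelihood. This is a minimal illustration under stated assumptions, not the model in S2 Appendix: it uses a single time point, takes the four joint outcome probabilities directly rather than through a logistic regression, and treats the aggregated fed and dead counts as separate binomial observations on the margins of the joint distribution (an approximation, since both counts come from the same mosquitoes). All function names and counts are hypothetical.

```python
import math


def log_binomial(x: int, n: int, p: float) -> float:
    """Log binomial probability mass of x successes in n trials."""
    log_coef = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_coef + x * math.log(p) + (n - x) * math.log(1.0 - p)


def log_multinomial(counts, probs) -> float:
    """Log multinomial probability mass for a tuple of category counts."""
    log_coef = math.lgamma(sum(counts) + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs))


def joint_log_likelihood(probs, cd_counts, ad_total, ad_fed, ad_dead) -> float:
    """Combine both sources; probs and cd_counts are ordered
    (fed_dead, fed_alive, unfed_dead, unfed_alive)."""
    p_fd, p_fa, p_ud, p_ua = probs
    p_fed = p_fd + p_fa    # marginal probability of blood-feeding
    p_dead = p_fd + p_ud   # marginal probability of being killed
    return (
        log_multinomial(cd_counts, probs)          # comprehensive data
        + log_binomial(ad_fed, ad_total, p_fed)    # aggregated fed count
        + log_binomial(ad_dead, ad_total, p_dead)  # aggregated dead count
    )


# Hypothetical data: one comprehensive table and one aggregated report.
cd = (60, 20, 10, 10)
candidate_close = (0.60, 0.20, 0.10, 0.10)
candidate_far = (0.25, 0.25, 0.25, 0.25)
print(joint_log_likelihood(candidate_close, cd, 200, 160, 140))
print(joint_log_likelihood(candidate_far, cd, 200, 160, 140))
```

In this sketch the aggregated counts constrain only the margins, while the comprehensive counts constrain the full joint distribution, which is the sense in which the smaller comprehensive source supplies information that the larger aggregated source cannot.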

Fig 5

A forest plot of posterior predictions for all models considered.

Medians with 25th and 75th percentiles are shown for the dead and the survived-and-fed probabilities across all study datasets at 3, 6 and 12 months. Individual study predictions are shown as grey points.

Results

The model predictions of the probability that mosquitoes are killed, or alive and blood-fed, in insecticide-treated experimental huts, and how these effects change over time, are presented in Figs 3 and 4. The median probable outcomes are reassuringly similar; however, the uncertainties differ considerably between models. Where only aggregated data are available, it is only possible to predict the alive and blood-fed probability if some assumption is made about independence between feeding and dying. Assuming independence using aggregated data predicts very similar median impacts to predictions made using data with the full complement of potential outcomes from a mosquito feeding attempt. However, behavioural vector biologists report that Anopheles gambiae sensu lato and An. funestus s.l. most often rest indoors after blood-feeding [27-33], which indicates that blood-fed mosquitoes are possibly more likely to be exposed to insecticides sprayed onto indoor surfaces. Only 2 datasets with a full breakdown of potential outcomes were available (Table 1). The study generating these data was conducted in Benin and tested An. gambiae s.l. mosquitoes in West African huts with both walls and ceilings sprayed. The Benin study provided no netting for sleepers. Almost all mosquitoes blood-fed throughout the studies (median estimate across studies and months since spraying: 91.5% of mosquitoes blood-fed; 95% CI: 68.4% to 100%). We can see in Fig 5 that the difference between the predictions can be substantial over longer periods of time. The model results using comprehensive data only are very uncertain, largely because few time series are available. However, these data provide worthwhile additional information when combined with the aggregate data, such that the proportion survived and fed at 12 months could be approximately 10% higher than the aggregate data only model suggests.
A major advantage of the Bayesian evidence synthesis over the alternative methods is seen in the tightened uncertainty bounds predicted (see Fig 5). This is driven by the additional data available to this statistical framework. To understand how this prediction is affected by correlated data sources, we explore the correlation coefficient using simulated trial data from a simple copula model. We can see from Fig 6a-6c that when mortality and blood-feeding are positively correlated, the number of successfully fed mosquitoes is under-estimated. This is the situation in our data. Alternatively, when mortality and blood-feeding are negatively correlated, the number of successfully fed mosquitoes is over-estimated. Increasing data in the comprehensive resource will always be beneficial to the accuracy of the predictions.
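
A simulation of this kind can be sketched as follows. The source says only that a simple copula model was used; the Gaussian copula, the function name `simulate_joint` and the marginal probabilities below are assumptions made for illustration. Following the Fig 6 caption, the two simulated outcomes are "alive" and "fed", and the correlation is imposed between those two indicators. The joint probability of two binary outcomes equals the product of their marginals plus their covariance, so the independence product falls below the joint proportion when the indicators are positively correlated and above it when they are negatively correlated.

```python
import math
import random
from statistics import NormalDist


def simulate_joint(p_alive: float, p_fed: float, rho: float,
                   n: int = 20000, seed: int = 1) -> tuple:
    """Simulate n mosquitoes with correlated alive/fed indicators.

    A Gaussian copula is assumed: two standard normal draws with
    correlation rho are thresholded so the marginals equal p_alive and p_fed.
    Returns (independence estimate, simulated joint proportion).
    """
    rng = random.Random(seed)
    cut_alive = NormalDist().inv_cdf(p_alive)
    cut_fed = NormalDist().inv_cdf(p_fed)
    n_alive = n_fed = n_both = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        alive = z1 < cut_alive
        fed = z2 < cut_fed
        n_alive += alive
        n_fed += fed
        n_both += alive and fed
    # Product of the empirical marginals, as used under independence.
    independent = (n_alive / n) * (n_fed / n)
    return independent, n_both / n


for rho in (-0.8, 0.0, 0.8):
    independent, joint = simulate_joint(p_alive=0.3, p_fed=0.7, rho=rho)
    print(f"rho={rho:+.1f}: independent={independent:.3f}, joint={joint:.3f}")
```

The size of the gap between the two estimates depends on the marginal probabilities as well as on the correlation, so the values printed here are specific to the hypothetical inputs.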

Discussion

Concerns around mosquito resistance to pyrethroid insecticides have accelerated development of alternative insecticide products to help mitigate any diminishing impact from pyrethroid-based vector control. By necessity, these products have different mechanisms of action, so rigorous testing is required prior to determining their potential benefit. This generates a wealth of data. Alongside this, statistical methods are developing and their availability to biological scientists is increasing through software languages such as R, BUGS and Stan. We demonstrate with these data how statistical models can be adapted and developed to maximise performance of inference from different but related data sources. This is all the more important since there are ethical decisions to be made when conducting such trials that result in differences between them: a key example is the debate about providing an untreated (sometimes holed) net for the study participant present to lure mosquitoes indoors during an IRS hut study. This is necessary since providing an intact net during an IRS study will prevent most blood-feeding, which will then affect mosquito behaviour, as IRS is designed primarily to target mosquitoes that are resting on walls after blood-feeding. In an IRS hut without a net, most mosquitoes will feed and then rest on walls to digest the blood meal, where they are killed by the IRS. A further complication is that, if a mosquito encounters a net, she may alight on the wall unfed and be killed by the IRS, in which case the proportion of mosquitoes that are unfed-dead will be higher, as will be the proportion of unfed-live mosquitoes, which may go on to feed elsewhere and transmit malaria. Bayesian evidence synthesis (Model 3) allows both the most available, aggregated data and the most accurate, comprehensive data to inform predictions of probable outcomes of mosquito feeding attempts. We show that this approach can refine predictions and tighten uncertainty.
In practical terms, this may significantly increase or decrease the predicted proportions and so potentially impact decision making. The volume of aggregated data relative to comprehensive data means that the inference is shrunk towards that of Models 1 and 2a (Fig 3), which use aggregated data alone. With a greater volume of comprehensive data, which would be the ideal, these predictions might be altered. Simulating a correlated Bernoulli random variable allows us to understand how these biases play out; essentially, where probabilities are positively correlated (in our case, the probability that mosquitoes will be killed is correlated with the probability that mosquitoes will also be fed), we would expect the evidence synthesis to under-estimate the probability of mosquitoes blood-feeding and surviving. Conversely, were these probabilities negatively correlated, then those mosquitoes surviving having blood-fed would be over-estimated. In the context of IRS impact on mosquito behaviour, both scenarios are problematic. The first would overestimate the impact of the intervention while the second would undersell its potential. Acknowledging these effects of the statistical approach is therefore crucial. We recognise that other, related modelling choices were available. For example, in terms of data, we could have aggregated the comprehensive data and combined these with the originally aggregated data to create a new aggregated dataset. In terms of modelling, the aggregate data-only analysis assumed independent binomial models for death and successful feeding. We could have estimated the two outcomes jointly as in the full Bayesian evidence synthesis model. However, the aim of this work was to compare approaches performed previously [16,17,26] against the full Bayesian evidence synthesis approach, which we consider the best option given all data and which can be used for understanding impact.
Alternative approaches to evidence synthesis have been published that could be explored for their suitability to the peculiarities of the example used here, such as partial reconstruction of comprehensive data [34]. Another alternative is to include an indicator in the linear predictor to mark those studies with additional data, or potentially those where some assumption such as independence is made a priori, and allow them to have more weight in the ultimate predictions [35]. There are also sources of uncertainty in the data that we do not explore here, including differences in wall surface, the provision of, and number of holes in, untreated mosquito nets for volunteer sleepers, the geographic location of the hut trial and the respective mosquito species represented. Other studies have begun to consider the scale and implications of these differences [36-38]. The approach detailed here can be thought of generally as a method of model calibration using indirect data [39]. In our case, this is using marginal (aggregate) and joint (comprehensive) data to inform a joint distribution, but the problem can be more general than this and the data less directly relevant. The approach provides a principled framework to synthesise multiple, different datasets. To date, Bayesian evidence synthesis has been applied to a range of problems, for example, to estimating HIV and Hepatitis C prevalence [40,41]. Given the increased capacity to collate multiple data and the increased access to statistical software, it is increasingly important to explore how different methods compare and infer biologically relevant conclusions. Bayesian evidence synthesis provides a robust statistical approach when different resources are available but may be biased toward the dataset with the greatest absolute quantity of information.

Datasets for analyses.

(XLSX)

Modelling details, including BUGS and Stan code.

(PDF)

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

10 Nov 2021

PONE-D-21-15152

An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying

PLOS ONE

Dear Dr. Green,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 25 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Monika Gulia-Nuss, PhD
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In the Methods section, please provide additional details regarding the methodology used to identify the three datasets included in the analysis.

3. As explained in PLOS ONE's manuscript guidelines (http://journals.plos.org/plosone/s/submission-guidelines#loc-references), we do not allow citation of, or reliance on, unpublished work. Please provide the relevant information in the manuscript and/or a Supporting Information file; if the manuscript described in line 118 is accepted for publication while your PLOS ONE submission is under review, you can remove the Supporting Information file and revert to the citation in your Methods section before acceptance/publication of the current submission.

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

5. Please note that in order to use the direct billing option the corresponding author must be affiliated with the chosen institute. Please either amend your manuscript to change the affiliation or corresponding author, or email us at plosone@plos.org with a request to remove this option.

6. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

7. Please amend the manuscript submission data (via Edit Submission) to include authors Fiacre Agossa, Boulais Yovogon, Richard Oxborough, Jovin Kitau, Pie Müller, Edi Constant, Mark Rowland, Emile FS Tchacaya, Koudou G Benjamin, Thomas S Churcher, Michael Betancourt, Ellie Sherrard-Smith.

8. Please upload a copy of Supporting Material 1, 2, and Supplementary data file 1 which you refer to in your text on page 20.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Partly
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: I Don't Know
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors present a well written manuscript on use of Bayesian evidence synthesis for predicting effects of indoor residual spraying. The method they propose allows combination of different data types. However, this is also the pitfall of the manuscript. As their Bayesian evidence synthesis model uses different data than the models to which is it is compared, it is challenging to draw conclusions towards differences in performance. Line 38, this reviewer does not see a need for using the registered trademark symbol in the mention of the brand name of the product under study in the abstract. Instead, the authors may use a description of the product using the trivial name, e.g. 300g/L Pirimiphos-methyl Lines 43-44 “The evidence synthesis model was most robust at predicting the probability of mosquitoes dying or surviving and blood-feeding.“ Most robust compared to what? Results Line 260: Reference to figure 5 where figure 6 is meant Figures 3& 4: at the resolution presented, the distinction between dotted and dashed lines mentioned in the legend for data from Benin and Cote d’Ivoire, respectively, is not clear Figure 5: the legend indicates that the plot shows predictions for all models but the annotations on the y-axis are not clear. What is meant with ‘aggregate after’ and aggregate before’? Where can we see the predictions for models 1, 2 and 3? 
Line 252-254. Indeed the uncertainty is lower for the outcomes predicted for ‘all’ compared to the other groups. But is this an intrinsic property of the model or just a result of the fact that more data is fed into this model? It is not immediate evident from the manuscript that model 3 is superior over the other models. The predictions presented in Figures 3A&C are not very different from those presented in Figure 4C, and uncertainties appear similar. The claim that the Bayesian evidence synthesis provides the best modelling option is not fully substantiated. The manuscript lacks a ‘ground truth’ to which data is compared. The approach described by the authors in lines 302-308, to created an aggregated dataset from comprehensive data, would provide a means to generate such a ground truth. It is advised that the authors include this analysis in a new version of the manuscript Conflict of interest statement Please clarify whether the authors affiliated to private entities (e.g. Abt Associates and Symplectromorphic) hold stock or performed commercial services Reviewer #2: Review of the article: “An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying” The authors describe a Bayesian evidence synthesis model and framework, building on their previously described models, to combine aggregated and comprehensive data sources for outcomes relating to mosquito feeding. The manuscript outlines the benefits and limitations of previous models and the new modelling framework, through the analysis of 23 aggregated datasets and 3 comprehensive datasets. The evidence synthesis approach is clearly described via DAGs and within the text, and the statistical differences between the previous models and the new approach are generally clearly outlined. 
My comments mainly relate to the presentation of some of the results: 1) One comment for the Editor, I was unable to access the Supplementary Material and Supplementary Data Files in the Editorial Manager system so my review does not include these materials 2) Throughout most of the manuscript, including the methods and results sections, the authors refer to ‘aggregated’ and ‘comprehensive’ data (i.e. referring to whether the outcomes related to mosquito feeding are published in a combined format or separated by each possible feeding outcome). In the Introduction, rather than ‘comprehensive’ data, the authors refer to ‘individual-level data’ (presumably referring to outcomes reported at an individual level, rather than mosquito level data, i.e. individual participant data, being available). I suggest sticking to the term ‘comprehensive data’ throughout and defining this term in the Introduction, to avoid confusion with individual participant data synthesis, such as IPD meta-analysis, which is a different framework entirely. 3) Some of the labelling of Figures in the text seems to be incorrect: a. Line 139: should this read Figure 2a, Model 1? b. Line 155: Which Model 2 is referred to here, Model 2a? c. Line 218: I'm not following how Figure 5 shows this, should this be Figure 6? d. Line 260: Should this read Figures 6a-c? 4) The assumption have needed to be made in previous models (e.g. Model 1), that mosquitos are equally likely to have been killed and blood-fed is described as a simplifying assumption within the Introduction. While it may be a simple assumption, I guess the key question is whether it is a reasonable assumption to make clinically or not. If I understand correctly, part of the rationale for the new Bayesian evidence synthesis framework is that this assumption is not required? But if this ‘simple’ assumption is actually reasonable, then is the added complexity (presumably) of this new framework a reasonable trade-off to not make the assumption? 
I think what I’m asking here is for a little more clarity on what the benefits are (if any) of the new modelling framework compared to the previous models with respect to this assumption? 5) I found the legends of most of the Figures quite unclear, specifically what each individual Figure (i.e. a – d) was showing me, i.e. which model (1, 2a/b, 3), aggregated data, comprehensive data or both etc. Although some of this information can be deduced from the text, it would be helpful to have fully descriptive legends. Specifically: a. Figure 2: A clear statement of which of these are the previously described models and which of these is the new model would be helpful. Also, not all notation is defined – e.g. the betas, Y's and N's. Again while some of this can be deduced, a full list would be helpful for completeness and particularly for readers to who may not be experts in DAGs b. Figure 3 and 4: While the general summaries are quite clear, i.e. what the shaded regions shown etc. it isn’t clear what a-d individually show (model (1, 2a/b, 3), aggregated data, comprehensive data or both etc.). I am also unable to see dotted lines and dashed lines corresponding to Benin and Côte d'Ivoire respectively on any of these Figures. c. Figure 5: Not sure I follow what aggregate data before and aggregate data after means 6) Line 248-251: “The model using comprehensive data only is very uncertain. However, these data provide worthwhile additional information when combined with the aggregate data such that the proportion survived and fed at 12 months could be approximately 10% higher than the aggregate data only model..” This is eluded to in the text below but this uncertainty is likely due to the small amount comprehensive data and the large amount of aggregate data in comparison, rather than directly due to the models themselves? Perhaps it would be better to describe that the results or estimates using comprehensive data only are uncertain, rather than the models themselves? 
7) A general query, related somewhat to the discussion. The series of models here are clearly suitable for addressing the particular question in hand related to mosquito feeding but are these models ‘tailor made’ only for this question? Are there any features of the models which could be generalizable to a wider set of research questions or contexts? Bringing this out a little more in the discussion may be helpful

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sarah J Nevitt

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

11 Jan 2022

We thank both reviewers for insightful critique of this work which we think has greatly improved it.
Reviewer #1: The authors present a well written manuscript on use of Bayesian evidence synthesis for predicting effects of indoor residual spraying. The method they propose allows combination of different data types. However, this is also the pitfall of the manuscript. As their Bayesian evidence synthesis model uses different data than the models to which is it is compared, it is challenging to draw conclusions towards differences in performance. Thank you, however, we disagree here that this is a pitfall because the point of the paper is to explore how the evidence synthesis model is able to incorporate additional comprehensive data to augment the inference, which by definition, we cannot use within the Model 1 framework. We have endeavoured to make the approaches clearer. As this is a statistical analysis, all models will benefit from more relevant data and particularly if these data contain information about additional structure. This work enables us to use all available data to make the most reliable inference. Previously, researchers in this field had to either aggregate data where comprehensive data were sparse, or disregard some resources altogether. We have tried to make this clearer and have added to the introduction: Lines 101-105: “Our recent assessment of IRS products made the assumption that mosquitoes were equally likely to have blood-fed and survived or blood-fed and died on entering a sprayed hut(3). However, IRS exploits the resting behaviour of mosquitoes after feeding, so we need a method to capture the likely higher mortality in fed mosquitoes.” To the method: Lines 121-122: “We apply the models to an empirical dataset to explore how mosquito outcomes can be interpreted from experimental hut trials testing the efficacy of an IRS product.” And: Lines 235-243: “To understand better how the variability in data sources affects predictive ability of the evidence synthesis model, we contrast the predicted outcomes of the models in Figure 5. 
To understand how this prediction is affected by correlated data sources, i.e. where the predicted mosquitoes blood-feeding is correlated with the mosquito mortality outcome, we explore the correlation coefficient using simulated trial data from a simple copula model (Figure 6).” Line 38, this reviewer does not see a need for using the registered trademark symbol in the mention of the brand name of the product under study in the abstract. Instead, the authors may use a description of the product using the trivial name, e.g. 300g/L Pirimiphos-methyl. We are happy to remove and have followed the advice from the reviewer. Lines 43-44 “The evidence synthesis model was most robust at predicting the probability of mosquitoes dying or surviving and blood-feeding.“ Most robust compared to what? We agreed this was unclear and changed to 'has the smallest uncertainty'. Results Line 260: Reference to figure 5 where figure 6 is meant Thanks for noticing this. Now corrected. Figures 3& 4: at the resolution presented, the distinction between dotted and dashed lines mentioned in the legend for data from Benin and Cote d’Ivoire, respectively, is not clear We agree this was not clear and so we have removed reference to the study country as this is not important for the analysis. We have re-labelled the Figures for clarity to make it clear which model is represented in each row for Figures 3 and 4. We have re-written the legends for clarity as follows: “Fig 3: Model predictions of proportion of mosquitos killed and successfully fed for the best-estimate (median posterior predictive value) across all aggregate datasets, 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to aggregated data. 
Subfigures i) and ii) show results in the case of empirically disaggregating the blood fed data before fitting the model (Model 2a), and subfigures iii) and iv) show results for the case of taking this step after model fitting marginal probabilities. Data for the fits are overlaid onto the figure to demonstrate the suitability of the time-dependent functions. In each of the fits, the individual study predictions are overlaid on the figures by thin grey lines and noted by matching symbol type for each timeseries. In iv) the model was not fit to successfully fed data so has no overlaid points. Anopheles gambiae s.l. (circles), An. funestus s.l. (squares) and An. arabiensis (triangles) mosquitoes are noted. The data from the comprehensive source are only used in aggregated format.”

“Fig 4: Model predictions of proportion of mosquitos killed and successfully fed for the best-estimate (median posterior predictive value) across all study datasets, 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to the comprehensive data. Subfigures i) and ii) show results using comprehensive data only (Model 2b), and subfigures iii) and iv) show results using all datasets. Data for the fits are overlaid onto subfigures i) and ii) to demonstrate the suitability of the time-dependent functions. In these fits, the individual study predictions are overlaid on the figures by thin grey lines and noted by matching symbol type for each timeseries. As Model 3 is inferred from both AD and CD sources, there are no data to overlay.”

Figure 5: the legend indicates that the plot shows predictions for all models but the annotations on the y-axis are not clear. What is meant with ‘aggregate after’ and aggregate before’? Where can we see the predictions for models 1, 2 and 3?

We agree that this was unclear.
In this instance, we made the assumption that the same proportion of mosquitoes would have been killed had they blood fed, as would have survived having blood fed. To fit Model 2a using this assumption, we manipulate the data to provide a ‘count’ of mosquitoes that have fed and survived (as the numbers fed and surviving plus dead should sum to the total or less than this count). In Model 1, we obtain this using probabilities after fitting the statistical model. To clarify, we have updated the y-axis as suggested. Now we make explicit reference to the data used and the model employed for each fit. “Fig 5: A forest plot of posterior predictions for all models considered, showing median, 25% and 75% percentiles for dead and survived fed probabilities across all study datasets at 3, 6 and 12 months. Individual study predictions are shown with grey points.” Line 252-254. Indeed the uncertainty is lower for the outcomes predicted for ‘all’ compared to the other groups. But is this an intrinsic property of the model or just a result of the fact that more data is fed into this model? We agree and have added (now to Line 272-274): “The model results using comprehensive data only are very uncertain, largely due to few time series available” It is not immediate evident from the manuscript that model 3 is superior over the other models. The predictions presented in Figures 3A&C are not very different from those presented in Figure 4C, and uncertainties appear similar. The claim that the Bayesian evidence synthesis provides the best modelling option is not fully substantiated. The manuscript lacks a ‘ground truth’ to which data is compared. The approach described by the authors in lines 302-308, to created an aggregated dataset from comprehensive data, would provide a means to generate such a ground truth. It is advised that the authors include this analysis in a new version of the manuscript. 
In our case, we consider it better to use a model that incorporates all the different data sources in a principled and appropriate way. Because the results differ between the models, that is itself an argument for using the evidence synthesis model: it shows that more information (the inclusion of both AD and CD sources) modifies the predicted outcomes. The aggregate models assume independence, which is unlikely. The question is whether these data (the proportion of mosquitoes feeding and the proportion dying) are correlated, and whether we can use the information about this in the comprehensive data to help.

From the covariance formula, we get:

E(XY) = E(X)E(Y) + cov(X, Y)

Taking X and Y to be the indicator variables I_dead and I_fed for a mosquito being dead and fed, this gives:

p(dead and fed) = p(dead)p(fed) + cov(I_dead, I_fed)

And so we don’t need to simulate a trial to show this. The aggregate models do not properly represent the correlation, but the evidence synthesis model has a latent dead and fed category. This also means that the aggregate and comprehensive data can be fit in the same model. We certainly appreciate the suggestion from the reviewer, and so we now show the effect for differently correlated data by employing a copula approach to simulate the trial data. This enables us to simulate uncorrelated and highly correlated dead and fed trial data. This gives the following figure, which we include as Figure 6. The red curves are the correct joint distribution p(fed, dead) and the black curves are the empirical p(fed)p(dead) estimates, which we would have obtained with the naïve model.

“Fig 6. Probability of success (alive) and fed over time for independent and joint models using simulated trial data. A copula method is used to impose correlation. The black lines are the independent estimates and the red lines are the correct joint estimates. The estimates are based on the empirical marginal probabilities. a) negative correlation. b) independent c) positive correlation (as is the case for the experimental hut data examined here).”

We have explained this in more detail. What we see is how the predictions may be affected given the correlation between data outcomes. That is, if blood feeding mosquitoes are more likely to die – if data outcomes are positively correlated – then we can show this would under-estimate the number of successfully fed mosquitoes (this is likely the case for our data). As more comprehensive data become available, we can now use the evidence synthesis model to learn this association more robustly, giving better inference of the impact of IRS (and of other interventions where a similar challenge exists) moving forward.

Lines 277-289: “A major advantage of the Bayesian evidence synthesis over the alternative methods is seen by the tightened uncertainty bounds predicted (see Figure 5). This is driven by the additional data available for this statistical framework. To understand how this prediction is affected by correlated data sources we explore the correlation using simulated trial data from a simple copula model. We can see from Figure 6a-c that when mortality and blood-feeding are positively correlated, the number of successfully fed mosquitoes is under-estimated. This is our data situation. Alternatively, when mortality and blood-feeding are negatively correlated the number of successfully fed is over-estimated. Increasing data in the comprehensive resource will always be beneficial to the accuracy of the predictions.”

-----

Reviewer #2: Review of the article: “An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying” The authors describe a Bayesian evidence synthesis model and framework, building on their previously described models, to combine aggregated and comprehensive data sources for outcomes relating to mosquito feeding.
The manuscript outlines the benefits and limitations of previous models and the new modelling framework, through the analysis of 23 aggregated datasets and 3 comprehensive datasets. The evidence synthesis approach is clearly described via DAGs and within the text, and the statistical differences between the previous models and the new approach are generally clearly outlined. My comments mainly relate to the presentation of some of the results: 1) One comment for the Editor, I was unable to access the Supplementary Material and Supplementary Data Files in the Editorial Manager system so my review does not include these materials. Apologies, the Supplementary Data (S1 Appendix) containing both complete data sources, and Supplementary Material (S2 Appendix) which contains the code, are now included. 2) Throughout most of the manuscript, including the methods and results sections, the authors refer to ‘aggregated’ and ‘comprehensive’ data (i.e. referring to whether the outcomes related to mosquito feeding are published in a combined format or separated by each possible feeding outcome). In the Introduction, rather than ‘comprehensive’ data, the authors refer to ‘individual-level data’ (presumably referring to outcomes reported at an individual level, rather than mosquito level data, i.e. individual participant data, being available). I suggest sticking to the term ‘comprehensive data’ throughout and defining this term in the Introduction, to avoid confusion with individual participant data synthesis, such as IPD meta-analysis, which is a different framework entirely. Thank you. This is a good point. By comprehensive data in our case we don’t actually require individual level data although this would be the ideal. We have changed all mention of ILD and individual-level as suggested. 3) Some of the labelling of Figures in the text seems to be incorrect: a. Line 139: should this read Figure 2a, Model 1? b. Line 155: Which Model 2 is referred to here, Model 2a? c. 
Line 218: I'm not following how Figure 5 shows this, should this be Figure 6? d. Line 260: Should this read Figures 6a-c? Thank you for picking this up. Now corrected. We have further adjusted the Figures to make it clearer as to which Model and which data are being considered in each case. We have updated the legends accordingly. 4) The assumption have needed to be made in previous models (e.g. Model 1), that mosquitos are equally likely to have been killed and blood-fed is described as a simplifying assumption within the Introduction. While it may be a simple assumption, I guess the key question is whether it is a reasonable assumption to make clinically or not. If I understand correctly, part of the rationale for the new Bayesian evidence synthesis framework is that this assumption is not required? But if this ‘simple’ assumption is actually reasonable, then is the added complexity (presumably) of this new framework a reasonable trade-off to not make the assumption? I think what I’m asking here is for a little more clarity on what the benefits are (if any) of the new modelling framework compared to the previous models with respect to this assumption? We agree this is an important clarification. IRS is designed to work by targeting the mosquito pathway of activity observed where having fed, mosquitoes then rest on a surface. So, it is clinically likely that the assumption of feeding and dying being independent misrepresents what might be seen in reality. If mosquitoes rest after feeding it may be reasonable to assume more deaths post feeding, and therefore the outcomes to be positively correlated. This is not possible to see with the data when aggregated. So, the benefits are to have an approach that can learn from all the data available, until comprehensive data are numerous enough to use to explore this unknown more thoroughly. 
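To illustrate why this correlation cannot be seen once the data are aggregated, the following minimal sketch builds two 2x2 outcome tables that share the same totals of dead and of fed mosquitoes but differ in how many are both dead and fed. The counts are hypothetical, invented purely for illustration, and are not taken from any of the trials; the function names are likewise assumptions.

```python
import numpy as np

# Hypothetical counts for 100 mosquitoes (illustrative only, not trial data).
# Rows: survived, dead. Columns: unfed, fed.
independent = np.array([[30, 30],
                        [20, 20]])
correlated = np.array([[40, 20],
                       [10, 30]])


def margins(table):
    """Return (total dead, total fed): all that an aggregated report provides."""
    return int(table[1].sum()), int(table[:, 1].sum())


def dead_fed_covariance(table):
    """Covariance of the dead and fed indicators: p(dead, fed) - p(dead) * p(fed)."""
    n = table.sum()
    p_dead = table[1].sum() / n
    p_fed = table[:, 1].sum() / n
    return table[1, 1] / n - p_dead * p_fed


for name, table in [("independent", independent), ("correlated", correlated)]:
    print(name, margins(table), round(dead_fed_covariance(table), 3))
```

Both tables report 40 dead and 50 fed mosquitoes, so an aggregated summary cannot tell them apart, yet the covariance between the dead and fed indicators is zero in the first table and 0.1 in the second. Only the full breakdown of outcomes carries that information.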
We have added to the introduction: Lines 101-105: “Our recent assessment of IRS products made the assumption that mosquitoes were equally likely to have blood-fed and survived or blood-fed and died on entering a sprayed hut(3). However, IRS exploits the resting behaviour of mosquitoes after feeding, so we need a method to capture the likely higher mortality in fed mosquitoes.” To the method: Lines 121-122: “We apply the models to an empirical dataset to explore how mosquito outcomes can be interpreted from experimental hut trials testing the efficacy of an IRS product.” And: Lines 235-243: “To understand better how the variability in data sources affects predictive ability of the evidence synthesis model, we contrast the predicted outcomes of the models in Figure 5. To understand how this prediction is affected by correlated data sources, i.e. where the predicted mosquitoes blood-feeding is correlated with the mosquito mortality outcome, we explore the correlation coefficient using simulated trial data from a simple copula model (Figure 6).” This is further complicated depending on the way that the experiment was conducted which is beyond the scope of our contribution (particularly given we have 2 time series from the same study comprising our comprehensive data). In these experimental hut studies, volunteers are provided with either no net, an untreated and holed net, or an untreated net without holes. This clearly will interact with the observed feeding outcomes. With more data, this is something we could eventually look at with the evidence synthesis framework. We discuss these issues briefly. Lines 300-310: “This is all the more important since there are ethical decisions to be made when conducting such trials that result in differences: a key example is the debate about providing an untreated (sometimes holed) net for the study participant present to lure mosquitoes indoors during an IRS hut study. 
This is necessary since providing an intact net during an IRS study will prevent most blood-feeding which will then affect mosquito behaviour as IRS is designed primarily to target mosquitoes that are resting on walls after blood-feeding. In an IRS hut without a net most mosquitoes will feed and then rest on walls to digest the blood meal and are then killed by the IRS. A further complication is that, if a mosquito encounters a net, she may alight on the wall unfed and be killed by the IRS in which case the proportion of mosquitoes that are unfed-dead will be higher, as will be the proportion unfed-live mosquitoes, which may go on to feed elsewhere and transmit malaria.” 5) I found the legends of most of the Figures quite unclear, specifically what each individual Figure (i.e. a – d) was showing me, i.e. which model (1, 2a/b, 3), aggregated data, comprehensive data or both etc. Although some of this information can be deduced from the text, it would be helpful to have fully descriptive legends. Specifically: a. Figure 2: A clear statement of which of these are the previously described models and which of these is the new model would be helpful. Also, not all notation is defined – e.g. the betas, Y's and N's. Again while some of this can be deduced, a full list would be helpful for completeness and particularly for readers to who may not be experts in DAGs b. Figure 3 and 4: While the general summaries are quite clear, i.e. what the shaded regions shown etc. it isn’t clear what a-d individually show (model (1, 2a/b, 3), aggregated data, comprehensive data or both etc.). I am also unable to see dotted lines and dashed lines corresponding to Benin and Côte d'Ivoire respectively on any of these Figures. c. Figure 5: Not sure I follow what aggregate data before and aggregate data after means We agree this was unclear and thank the reviewer(s) for highlighting this. We have: - Changed the labelling on the figures to make it clear which model and data source we are using. 
- Ensured the notation used in the DAGs, Table and methods aligns.
- Rewritten the legends (please see below).
- Removed reference to study country because this is not directly relevant to this piece of work.

“Fig 2: Directed acyclic graphs (DAGs) for models to assess aggregated (a, b), comprehensive (c) or both aggregated and comprehensive (d) data. Models 1, 2a and 2b represent models previously used in this field and Model 3 is the new approach. Model 2a depicts the original analysis in (3). In each case, β_0 and β_1 are the shared intercept and time coefficients in the linear component of the logistic regressions, and likewise γ_1 and γ_0 for Model 1, which has separate sub-models for fed and dead. X denotes the count data and N are the total counts of mosquitoes in each subgroup. (a) Mosquitoes are counted as killed with no information on whether they are also blood fed (X^f) or unfed without information that mosquitoes are also killed (X^d). We can infer marginal probabilities (shown in orange boxes) then (assuming independence), after estimating the probability of either outcome ($\tilde{P}^{f}$ and $\tilde{P}^{sf}$), infer the probability that mosquitoes are both alive and blood fed ($\tilde{P}^{sf}$). (b) Alternatively, we can adjust the data using the same assumption of independence prior to fitting the model and fit a logistic binomial model to the adjusted data. (c) The same model structure (Model 2) can be fitted to the comprehensive data. (d) Using evidence synthesis, we can learn from the comprehensive data (N = 2 datasets) to infer probabilities that are supported by the aggregated data (N = 23 datasets).
For each dataset, k represents the study index and t is the time point at which data are collected.”

Fig 3: Model predictions of the proportion of mosquitoes killed and successfully fed for the best estimate (median posterior predictive value) across all aggregate datasets, with 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to aggregated data. Subfigures i) and ii) show results in the case of empirically disaggregating the blood fed data before fitting the model (Model 2a), and subfigures iii) and iv) show results for the case of taking this step after fitting the model for the marginal probabilities. Data for the fits are overlaid onto the figure to demonstrate the suitability of the time-dependent functions. In each of the fits, the individual study predictions are overlaid on the figures as thin grey lines and marked by a matching symbol type for each time series. In iv) the model was not fit to successfully fed data, so it has no overlaid points. Anopheles gambiae s.l. (circles), An. funestus s.l. (squares) and An. arabiensis (triangles) mosquitoes are noted. The data from the comprehensive source are only used in aggregated format.

Fig 4: Model predictions of the proportion of mosquitoes killed and successfully fed for the best estimate (median posterior predictive value) across all study datasets, with 50% (darker shaded region) and 95% (lighter shaded region) credible intervals (CrI) from the respective fits to the comprehensive data. Subfigures i) and ii) show results using comprehensive data only (Model 2b), and subfigures iii) and iv) show results using all datasets. Data for the fits are overlaid onto subfigures i) and ii) to demonstrate the suitability of the time-dependent functions. In these fits, the individual study predictions are overlaid on the figures as thin grey lines and marked by a matching symbol type for each time series. As Model 3 is inferred from both AD and CD sources, there are no data to overlay.
Fig 5: A forest plot of posterior predictions for all models considered, showing the median and the 25th and 75th percentiles for dead and survived fed probabilities across all study datasets at 3, 6 and 12 months. Individual study predictions are shown with grey points.

6) Line 248-251: “The model using comprehensive data only is very uncertain. However, these data provide worthwhile additional information when combined with the aggregate data such that the proportion survived and fed at 12 months could be approximately 10% higher than the aggregate data only model.” This is alluded to in the text below, but this uncertainty is likely due to the small amount of comprehensive data and the large amount of aggregate data in comparison, rather than directly due to the models themselves? Perhaps it would be better to describe that the results or estimates using comprehensive data only are uncertain, rather than the models themselves?

We completely agree with the reviewer. This was the original intention. We have updated the text as suggested. Lines 273-274: “The model results using comprehensive data only are very uncertain, largely due to few time series available.”

7) A general query, related somewhat to the discussion. The series of models here are clearly suitable for addressing the particular question in hand related to mosquito feeding, but are these models ‘tailor made’ only for this question? Are there any features of the models which could be generalizable to a wider set of research questions or contexts? Bringing this out a little more in the discussion may be helpful.

Thank you for the question. These models are not tailor-made as such but can and have been adopted elsewhere. In fact, the reverse is true, meaning we wanted to see if models that have been successfully used elsewhere for similar problems could be used in this context.
We have added the following text to the discussion: Lines 344-349: “The approach detailed here can be thought of generally as a method of model calibration using indirect data. In our case, this is using marginal (aggregate) and joint (comprehensive) data to inform a joint distribution, but the problem can be more general than this and the data less directly relevant. The approach provides a principled framework to synthesise multiple, different data sets. To date, Bayesian evidence synthesis has been applied to a range of problems, for example, to estimating HIV and Hepatitis C prevalence.”

20 Jan 2022
An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying
PONE-D-21-15152R1

Dear Dr. Green,

Thank you for your patience! The external reviewer and myself have reviewed your revised manuscript. Congratulations! The manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact.
If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Monika Gulia-Nuss, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing all concerns raised. The manuscript is now recommended for publication.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Koen Dechering

15 Mar 2022
PONE-D-21-15152R1
An evidence synthesis approach for combining different data sources illustrated using entomological efficacy of insecticides for indoor residual spraying

Dear Dr. Green:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Monika Gulia-Nuss
Academic Editor
PLOS ONE
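
Illustrative note on the independence assumption described in the Fig 2 legend above. The following is a minimal Python sketch, not the authors' implementation: it shows a logistic time trend for each marginal probability and the product rule that turns marginal probabilities into a "survived and fed" probability when the two outcomes are assumed independent. All function names, coefficients and counts are hypothetical and chosen only for illustration.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse of the logit link used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_at_time(intercept: float, slope: float, t_months: float) -> float:
    """Logistic time trend: logit(p) = intercept + slope * t."""
    return inv_logit(intercept + slope * t_months)


def survived_fed_if_independent(p_dead: float, p_fed: float) -> float:
    """P(survived and fed) when feeding and death are assumed independent."""
    return p_fed * (1.0 - p_dead)


def summarise_counts(counts: dict) -> dict:
    """Marginal and joint proportions from a full (comprehensive) breakdown."""
    total = sum(counts.values())
    p_dead = (counts["fed_dead"] + counts["unfed_dead"]) / total
    p_fed = (counts["fed_dead"] + counts["fed_alive"]) / total
    return {
        "p_dead": p_dead,
        "p_fed": p_fed,
        # Directly observed in comprehensive data.
        "p_survived_fed_observed": counts["fed_alive"] / total,
        # What aggregated data alone would give under independence.
        "p_survived_fed_independent": survived_fed_if_independent(p_dead, p_fed),
    }


# Hypothetical coefficients (assumptions, not estimates from the paper).
p_dead_6m = prob_at_time(intercept=2.0, slope=-0.3, t_months=6.0)
p_fed_6m = prob_at_time(intercept=-1.0, slope=0.1, t_months=6.0)
print(survived_fed_if_independent(p_dead_6m, p_fed_6m))

# Hypothetical comprehensive counts for 100 mosquitoes (made-up numbers).
example = {"fed_dead": 10, "fed_alive": 30, "unfed_dead": 40, "unfed_alive": 20}
print(summarise_counts(example))
```

With these made-up counts the observed "survived and fed" proportion is 0.3, whereas the independence product of the marginals gives 0.2. The size and direction of such a gap depend on how feeding and death are associated, which is information that only the comprehensive breakdown carries.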

33 in total

1. Calibration of complex models through Bayesian evidence synthesis: a demonstration and tutorial.

Authors: Christopher H Jackson; Mark Jit; Linda D Sharples; Daniela De Angelis
Journal: Med Decis Making Date: 2013-07-25 Impact factor: 2.583

2. Resting behaviour, ecology and genetics of malaria vectors in large scale agricultural areas of Western Kenya.

Authors: A K Githeko; M W Service; C M Mbogo; F K Atieli
Journal: Parassitologia Date: 1996-12

3. The BUGS project: Evolution, critique and future directions.

Authors: David Lunn; David Spiegelhalter; Andrew Thomas; Nicky Best
Journal: Stat Med Date: 2009-11-10 Impact factor: 2.373

4. Individual patient meta-analysis--rewards and challenges.

Authors: Carl van Walraven
Journal: J Clin Epidemiol Date: 2010-03 Impact factor: 6.437

5. A guide to systematic review and meta-analysis of prognostic factor studies.

Authors: Richard D Riley; Karel G M Moons; Kym I E Snell; Joie Ensor; Lotty Hooft; Douglas G Altman; Jill Hayden; Gary S Collins; Thomas P A Debray
Journal: BMJ Date: 2019-01-30

6. The influence of mosquito resting behaviour and associated microclimate for malaria risk.

Authors: Krijn P Paaijmans; Matthew B Thomas
Journal: Malar J Date: 2011-07-07 Impact factor: 2.979

7. An evidence synthesis approach to estimating the incidence of seasonal influenza in the Netherlands.

Authors: Scott A McDonald; Anne M Presanis; Daniela De Angelis; Wim van der Hoek; Mariette Hooiveld; Gé Donker; Mirjam E Kretzschmar
Journal: Influenza Other Respir Viruses Date: 2013-11-10 Impact factor: 4.380

8. Bias modelling in evidence synthesis.

Authors: Rebecca M Turner; David J Spiegelhalter; Gordon C S Smith; Simon G Thompson
Journal: J R Stat Soc Ser A Stat Soc Date: 2009-01 Impact factor: 2.483

9. Comparative performance of three experimental hut designs for measuring malaria vector responses to insecticides in Tanzania.

Authors: Dennis J Massue; William N Kisinza; Bernard B Malongo; Charles S Mgaya; John Bradley; Jason D Moore; Filemoni F Tenu; Sarah J Moore
Journal: Malar J Date: 2016-03-15 Impact factor: 2.979

Review 10. Get real in individual participant data (IPD) meta-analysis: a review of the methodology.

Authors: Thomas P A Debray; Karel G M Moons; Gert van Valkenhoef; Orestis Efthimiou; Noemi Hummel; Rolf H H Groenwold; Johannes B Reitsma
Journal: Res Synth Methods Date: 2015-08-19 Impact factor: 5.273