Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections.

Qu Cheng¹, Philip A Collender¹, Alexandra K Heaney¹, Aidan McLoughlin², Yang Yang³, Yuzi Zhang⁴, Jennifer R Head⁵, Rohini Dasan¹, Song Liang⁶, Qiang Lv⁷, Yaqiong Liu⁸, Changhong Yang⁹, Howard H Chang⁴, Lance A Waller⁴, Jon Zelner¹⁰,¹¹, Joseph A Lewnard⁵, Justin V Remais¹.

Abstract

With the aid of laboratory typing techniques, infectious disease surveillance networks have the opportunity to obtain powerful information on the emergence, circulation, and evolution of multiple genotypes, serotypes or other subtypes of pathogens, informing understanding of transmission dynamics and strategies for prevention and control. The volume of typing performed on clinical isolates is typically limited by its ability to inform clinical care, cost and logistical constraints, especially in comparison with the capacity to monitor clinical reports of disease occurrence, which remains the most widespread form of public health surveillance. Viewing clinical disease reports as arising from a latent mixture of pathogen subtypes, laboratory typing of a subset of clinical cases can provide inference on the proportion of clinical cases attributable to each subtype (i.e., the mixture components). Optimizing protocols for the selection of isolates for typing by weighting specific subpopulations, locations, time periods, or case characteristics (e.g., disease severity), may improve inference of the frequency and distribution of pathogen subtypes within and between populations. Here, we apply the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework to simulate and optimize hand foot and mouth disease (HFMD) surveillance in a high-burden region of western China. We identify laboratory surveillance designs that significantly outperform the existing network: the optimal network reduced mean absolute error in estimated serotype-specific incidence rates by 14.1%; similarly, the optimal network for monitoring severe cases reduced mean absolute error in serotype-specific incidence rates by 13.3%. In both cases, the optimal network designs achieved improved inference without increasing subtyping effort. 
We demonstrate how the DIOS framework can be used to optimize surveillance networks by augmenting clinical diagnostic data with limited laboratory typing resources, while adapting to specific, local surveillance objectives and constraints.

Year: 2022 PMID: 36166479 PMCID: PMC9543988 DOI: 10.1371/journal.pcbi.1010575

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779

1 Introduction

Laboratory procedures to identify pathogen subtypes (e.g., with respect to strain, genotype, serotype, variant, or phenotypic traits such as drug resistance) are important components of infectious disease surveillance, yielding information on transmissibility, clinical spectrum, evolutionary trends, and subtype-specific risk factors [1-7]. Indeed, information gathered from laboratory pathogen typing is integral to modern disease surveillance, enabling the discovery of SARS-CoV-2 variants with higher transmissibility [7], influenza A serotypes with high mortality and transmissibility [5], changes in the prevalence rate of drug-resistant tuberculosis and methicillin-resistant Staphylococcus aureus (MRSA) [8,9], shifts in dominant serotypes causing invasive pneumococcal disease [6], and differing routes of infection across hepatitis C virus genotypes [10]. Such findings can guide the development, allocation, and evaluation of public health interventions. For instance, knowledge about the relative prevalence and virulence of pathogen subtypes is used to prioritize subtypes for vaccine or treatment development [11-13]; identify high-risk subpopulations to target with interventions [14]; and evaluate the risk of unintended consequences of interventions, such as serotype replacement [15,16]. Because of the high cost and complexity of collecting and processing laboratory samples, and because data on pathogen subtype may not inform clinical decision-making for individual patients, typing is often undertaken for only a small subset of clinical cases. 
As examples, 2.8% of COVID-19 cases in the United States have been sequenced since January 10, 2020 [17]; <3% of hand foot and mouth disease (HFMD) cases in China were serotyped between 2011 and 2015 [2]; and only 9 influenza cases per participating laboratory are required to be characterized every other week across the United States to evaluate whether circulating influenza viruses are sufficiently similar genetically and/or antigenically to those that are included in current influenza vaccines [18]. Subtyping even a small proportion of cases may enable relevant inferences about the distribution of pathogen subtypes of interest within the larger set of clinically identified cases of a disease. However, in the absence of well-designed protocols for selection of isolates for subtyping, direct extrapolation of data from subtyped cases to the much broader population of clinical cases is susceptible to substantial biases, because the decision to type a case tends to be affected by clinical severity, healthcare capabilities, case clustering status, seasonality, and other factors. In China, for example, 72% of severe HFMD cases were serotyped, compared with only 2% of mild cases [2]. Such imbalanced sampling regimes, often arising from practical clinical considerations, can substantially impact estimates of genotype-, serotype-, or other subtype-specific epidemiologic parameters (e.g., subtype-specific incidence; response of pathogen subtype distribution to public health interventions) [2]. Statistical inference may be improved by modifying sampling design to minimize such biases across the surveillance network, such as by redistributing total samples across time, space, or populations. In practice, sampling designs for laboratory subtyping vary widely across surveillance systems, and are generally ad hoc in nature, constrained by budget, logistics, or infrastructure [2,4]. 
Optimizing sampling under these constraints is a high priority for laboratory surveillance systems [2,19]. Here, we develop methods to support the optimization of sampling clinical cases for laboratory typing with the goal of improved monitoring of the distribution of specific pathogen subtypes, while abiding by constraints on available resources, e.g., the total number of clinical cases subjected to subtyping. Our work is based on the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework [20], which iteratively evaluates surveillance network performance on predefined goals while varying surveillance system design parameters using numerical optimization algorithms. We adapt the DIOS framework to the problem of optimal allocation of laboratory typing resources across subregions and case severity groups of a surveillance network in order to minimize error in estimating the incidence rates of pathogen subtypes causing a clinically-diagnosed disease. We examine major enteroviruses causing HFMD in a region experiencing a high HFMD burden in China to illustrate the application of this framework.

2 Materials and methods

2.1 General framework for optimizing laboratory-based surveillance systems to monitor multi-genotype or multi-serotype infections

Simulation framework

DIOS [20] is a simulation-based optimization framework to facilitate the design of robust disease surveillance systems. DIOS functions by linking disease system models that simulate epidemiologic processes with surveillance system models that simulate information derived from alternative surveillance system designs. Applying DIOS involves specifying surveillance objectives (e.g., accurate estimation of disease frequency; timely outbreak detection; accurate estimation of intervention effectiveness), defining relevant surveillance design parameters (e.g., target population, diagnostic techniques, and site enrollment), and imposing operational constraints (e.g., total resources available for laboratory typing) (Box 1).

Box 1. Example DIOS optimization procedure

Consider the problem of identifying the optimal active surveillance strategy to estimate the incidence rate of a disease in key subpopulations, with possible designs given by altering the number of individuals to be surveyed across each subpopulation and the diagnostic test.

Objective: minimize bias in estimated incidence rate within each subpopulation.

Design parameters: 1) number of persons to be selected from each subpopulation for diagnostic testing; and 2) laboratory technique used for diagnostic testing (e.g., polymerase chain reaction test, rapid antigen test, or culture).

Models: The disease system model simulates the underlying dynamics of the target disease in each subpopulation. The surveillance model selects a given number of persons from each subpopulation for testing according to the current design parameter values, simulates test results according to the sensitivity, specificity, or any other relevant characteristics of the test, and extrapolates incidence rates from the test results. The performance of the surveillance model is then evaluated by how close estimated incidence rates are, on average, to the true values simulated by the disease system model, using a score such as mean absolute error.

After each evaluation, an optimization search algorithm (e.g., simulated annealing; evolutionary algorithm; particle swarm optimization) is used to update the design parameters, possibly based on an archive of previous performance including the current iteration. The following process is repeated until a stopping criterion is met, such as exceeding a preset computational budget or failing to improve upon the best simulated design for a certain number of iterations: propose a new design → simulate disease and surveillance processes → evaluate performance of design. The design parameters associated with the best performance are returned (see [20]).

The disease system model (see [20]) may be statistical, mechanistic, or an ensemble of different models or parameters that account for epistemic and parametric uncertainties, and should be developed with special attention to representing any processes thought to be relevant to the surveillance process. Multiple realizations of the disease system model, which may comprise incident cases or other phenomena of interest, are then filtered through measurement processes simulated by the surveillance model [20], which mimics relevant data collection and processing behaviors of a surveillance system. This yields estimates of the target epidemiologic parameter(s) (e.g., disease incidence; probability of an outbreak; change in incidence following intervention) that can be compared to true underlying values generated by the disease system model to assess the performance of the surveillance design.
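The propose, simulate, evaluate loop described in Box 1 can be sketched in a few lines of Python. This is a toy illustration only, not the DIOS implementation: the three subpopulations, the binomial testing model, the budget of 300 tests, and the simple hill-climbing search (standing in for simulated annealing or an evolutionary algorithm) are all assumptions made for the example, as are the function names.

```python
import random

def true_incidence(rng):
    """Toy disease system model: one true incidence rate per subpopulation."""
    return [rng.uniform(0.01, 0.10) for _ in range(3)]

def estimate_incidence(truth, design, rng):
    """Toy surveillance model: test design[j] people in subpopulation j."""
    estimates = []
    for rate, n in zip(truth, design):
        positives = sum(rng.random() < rate for _ in range(n))
        estimates.append(positives / n)
    return estimates

def mean_abs_error(design, n_realizations=50, seed=1):
    """Score a design: average |estimate - truth| over simulated realizations."""
    rng = random.Random(seed)  # fixed seed so each design gets a reproducible score
    total = 0.0
    for _ in range(n_realizations):
        truth = true_incidence(rng)
        est = estimate_incidence(truth, design, rng)
        total += sum(abs(e - t) for e, t in zip(est, truth)) / len(truth)
    return total / n_realizations

def propose(design, rng):
    """Move 10 tests from one subpopulation to another (total budget unchanged)."""
    new = list(design)
    src, dst = rng.sample(range(len(new)), 2)
    if new[src] > 10:
        new[src] -= 10
        new[dst] += 10
    return new

def optimize(initial, patience=20, seed=0):
    """Hill climbing: stop after `patience` proposals in a row without improvement."""
    rng = random.Random(seed)
    best, best_score = initial, mean_abs_error(initial)
    stalled = 0
    while stalled < patience:                # stopping criterion
        candidate = propose(best, rng)       # propose a new design
        score = mean_abs_error(candidate)    # simulate and evaluate
        if score < best_score:
            best, best_score, stalled = candidate, score, 0
        else:
            stalled += 1
    return best, best_score

initial_design = [280, 10, 10]
best_design, best_mae = optimize(initial_design)
```

Because every proposal only moves tests between subpopulations, the returned design always respects the fixed testing budget, which mirrors the operational constraint in the framework.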

Adaptation of DIOS to the design of laboratory-based surveillance for monitoring infections caused by multiple genotypes or serotypes

To apply the DIOS framework to the optimization of surveillance for multiple pathogen subtypes (Fig 1), a first step is to define objective functions to evaluate surveillance performance on estimating epidemiologic parameter(s) related to pathogen subtype(s) of interest. For instance, researchers may be interested in early detection of a more infectious variant of a circulating infection, e.g., the Delta variant of SARS-CoV-2, and therefore specify an objective as minimizing prevalence of that subtype by the time it is detected. If the overall composition of cases associated with multiple pathogen subtypes is of interest, a suitable objective might be to minimize the mean absolute error of incidence rate estimates across subtypes. More example objective functions can be found in Fig 1.

Fig 1

Schematic of the DIOS framework for optimizing surveillance of infections caused by multiple pathogen subtypes, with example design parameters and objective functions presented in green boxes.

Second, design parameters relevant to laboratory surveillance must be conceptualized and defined in the surveillance system model. Examples of surveillance design parameters that may bear on the abovementioned objectives include the number of cases sampled for typing across different subpopulations; the sampling protocols used to select cases to subtype from these subpopulations; and the laboratory techniques used to identify pathogen subtypes. Third, the disease system model must represent the dynamics of multiple pathogen subtypes and their possible interactions, and be able to correct for known biases in the observed surveillance data. For example, the negative interaction between dengue virus serotypes (possibly due to short-term cross-protection [21]) would need to be accounted for in a disease model simulating the incidence of dengue fever associated with multiple serotypes. Similarly, any tendency to select severe cases for typing would need to be corrected by incorporating the heterogeneous selection probability of different disease severity groups in the disease system model [2,4]. Finally, the DIOS surveillance model must be able to represent necessary characteristics of laboratory-based surveillance systems, such as assay-dependent classification performance, turnaround time, or cost. For instance, if the design parameter subject to optimization is the laboratory technique used to determine the presence of a pathogen subtype, the surveillance model should be able to simulate known relevant attributes of the candidate techniques, such as the probability of false positive or false negative results.

2.2 Application of DIOS to optimize laboratory-based surveillance of serotypes of enteroviruses causing HFMD

2.2.1 Background

HFMD is a pediatric infectious disease of growing public health importance [22,23], with a particularly high burden in East and Southeast Asia [22,24]. A variety of enteroviruses transmitted through fecal-oral or respiratory routes are causative agents of HFMD, including enterovirus-A71 (EV-A71), coxsackievirus-A16 (CV-A16), CV-A6, and CV-A10 [25]. EV-A71 and CV-A16 have long been the serotypes associated with the highest disease burden, but other serotypes, such as CV-A6, have gained clinical relevance in recent years [26,27]. The specific etiology of HFMD impacts the severity of symptoms, and has ramifications for intervention strategies, particularly vaccination. In China, recent deployment of monovalent vaccines against EV-A71, the most virulent serotype, has led to a reduction in the incidence of severe HFMD, but the overall incidence of HFMD is still rising, suggesting the possibility of serotype replacement [15]. Thus, it is critical to optimize laboratory surveillance to accurately estimate incidence of all HFMD and severe HFMD attributable to various enterovirus serotypes within the constraints of available resources.

2.2.2 Study region and surveillance system

Between 2004–2013, HFMD was the leading cause of death for children under five years old in China amongst all 39 nationally notifiable infectious diseases, and had the highest incidence of any infectious disease in the country [28, 29]. Since the inclusion in 2008 of HFMD on the list of mandatory notifiable infectious diseases in China, over 22.5 million cases have been reported across the country as of 2019 [30]. Sichuan Province (population >80 million) exhibits strong spatial and temporal heterogeneity in HFMD disease burden across prefectures, and is among multiple ongoing centers of transmission [31]. Clinically diagnosed HFMD cases are registered by the National Infectious Disease Reporting System (NIDRS), which covers nearly all healthcare facilities in China [32]. Clinical cases of HFMD are diagnosed by the presence of papular or vesicular rash on hands, feet, mouth or buttocks with or without fever, and are required to be reported to NIDRS within 24 hours [23]. Because of the narrow affected age group, distinct clinical features, and known seasonality of the disease, clinical diagnosis is considered to be highly specific [33]. Specimens are collected from a subset of clinical cases presenting at sentinel hospitals in an ad hoc manner to determine the underlying serotype using reverse-transcriptase polymerase chain reaction (RT-PCR), and the test results are reported to a laboratory surveillance system [31]. Deidentified data on clinical HFMD cases were obtained for the 21 prefectures of Sichuan from Sichuan Center for Disease Control and Prevention, including serotype information (recorded as EV-A71, CV-A16, or other enterovirus) and indicators of case severity, and were aggregated at the prefectural level for each year from 2009 up to 2015, stopping one year before the introduction of EV-A71 vaccines into the region in 2016 [15]. Prefecture-level population data were collected from public sources for 2009–2015 [34]. 
The epidemiologic data supporting the optimization analysis herein included a total of 388,365 HFMD cases reported from 2009–2015 in Sichuan, of which 0.87 percent (3,380 cases) were severe. Annual HFMD incidence rates increased gradually over time (Fig 2A) and varied substantially across space (Fig 2B), with the highest annual mean incidence rate observed in Chengdu, the capital prefecture, and its surrounding prefectures, as well as the city with the highest per capita gross regional product, Panzhihua, in the southwest of the province. Laboratory tests were conducted for 22,100 cases (5.7%), with 52% of severe cases and 5.3% of mild cases subjected to serotyping. The number of laboratory-tested cases increased over time (Fig 2C) and exhibited substantial spatial variation (Fig 2D). The proportions of all, mild, and severe HFMD cases tested from 2009–2015 by prefecture are shown in S1 Fig. CV-A16, EV-A71, and other enteroviruses caused 26.6%, 29.1% and 44.3% of all serotyped cases, and 7.3%, 58.5% and 34.1% of severe serotyped cases, respectively, indicating that EV-A71 tended to cause severe symptoms and CV-A16 tended to cause mild symptoms. CV-A6 and CV-A10 likely constitute the majority of other enteroviruses in circulation [35-38].
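As a rough consistency check on the figures above, the reported typing rates can be multiplied back through the case counts. The Python sketch below uses only the rounded numbers quoted in this section, so a small discrepancy from the reported 22,100 typed cases is expected; the variable names are illustrative.

```python
total_cases = 388_365
severe_cases = 3_380
mild_cases = total_cases - severe_cases

severe_typed = 0.52 * severe_cases    # about 1,758 typed severe cases
mild_typed = 0.053 * mild_cases       # about 20,404 typed mild cases
typed_total = severe_typed + mild_typed

share_typed = typed_total / total_cases
severe_share_of_typed = severe_typed / typed_total

print(f"{typed_total:.0f} cases typed ({share_typed:.1%} of all cases)")
print(f"{severe_share_of_typed:.1%} of typed cases are severe")
```

Under these rounded inputs, severe cases make up under 1% of reported cases but roughly 8% of typed cases, which illustrates how strongly the existing sampling is tilted toward severe disease.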

Fig 2

Temporal and spatial variations in HFMD incidence rate (A,B) and laboratory serotyping (C,D). (A) HFMD incidence rate for Sichuan 2009–2015; (B) annual mean HFMD incidence rate for each prefecture; (C) number of serotyped HFMD cases by year; (D) proportion of all serotyped cases drawn from each prefecture from 2009–2015. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

2.2.3 Defining the optimization problem

We pursued optimization of estimates of total and severe HFMD incidence across serotypes, with the proportion of typing allocated to each prefecture (“location”) and case severity group (mild and severe) as design parameters. The optimization seeks the sample allocation vector $\boldsymbol{\theta} = \{\theta_1, \theta_2, \ldots, \theta_I, \theta_{\mathrm{sev}}\}$ ($I = 21$) that minimizes the mean absolute error (MAE) in the estimates of serotype-specific incidence rate of: 1) total; and 2) severe HFMD across time, space, and realizations, where $\theta_i$ represents the proportion of total serotyping resources allocated to the i-th location in the study province, and $\theta_{\mathrm{sev}}$ represents the probability of a severe case being tested, which is assumed to be fixed across locations. After allocating typing resources to severe cases as defined by $\theta_{\mathrm{sev}}$, the remaining available typing according to $\theta_i$ at location i will be allocated to mild cases. The total number of cases sampled for subtyping each year is fixed at the observed frequency of typing (Fig 2C). The optimization problem can be formalized as:

$$\min_{\boldsymbol{\theta}} f_n(\boldsymbol{\theta}) \quad \text{subject to} \quad \sum_{i=1}^{I} \theta_i = 1, \quad 0 \le \theta_i \le 1, \quad 0 \le \theta_{\mathrm{sev}} \le 1,$$

where $f_n(\boldsymbol{\theta})$ is the n-th objective function, representing MAE (i.e., performance) of the candidate surveillance system defined by the design parameter $\boldsymbol{\theta}$.

The first objective function explored ($f_1(\boldsymbol{\theta})$) represents the MAE of the estimated serotype-specific incidence rates of all cases (i.e., incidence rates of EV-A71, CV-A16, and other enteroviruses) across locations, time, serotypes, and realizations (i.e., samples from the posterior distribution) of the disease system model, expressed as:

$$f_1(\boldsymbol{\theta}) = \frac{1}{ITKR} \sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{r=1}^{R} \left| \lambda_{itk}^{(r)} - \hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta}) \right|,$$

where I, T, K, and R represent the total number of locations (I = 21), study years (T = 6), serotypes (K = 3; for CV-A16 [k = 1], EV-A71 [k = 2], and other enterovirus [k = 3]), and disease system model realizations (R = 80, selected to ensure convergence of the estimated MAE across model runs), respectively; $\lambda_{itk}^{(r)}$ represents the simulated incidence rate of the ith location during the tth year for the kth serotype in the rth realization of the disease system model; and $\hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta})$ represents the corresponding incidence rate estimated using the laboratory surveillance information ascertained by the surveillance system defined by the design parameter $\boldsymbol{\theta}$. The methods for simulating $\lambda_{itk}^{(r)}$ and estimating $\hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta})$ with HFMD surveillance data in the study province are described below in sections 2.2.4 and 2.2.5, respectively. An alternative objective function examined ($f_2(\boldsymbol{\theta})$) represented the MAE of the estimated serotype-specific incidence rates of severe cases across locations, time, serotypes, and realizations of the disease system model, defined as:

$$f_2(\boldsymbol{\theta}) = \frac{1}{ITKR} \sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{r=1}^{R} \left| \lambda_{itk}^{(r)} p_k^{(r)} - \hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta})\, \hat{p}_k^{(r)}(\boldsymbol{\theta}) \right|,$$

where $p_k^{(r)}$ represents the simulated probability of the kth serotype causing severe disease in the rth realization, while $\hat{p}_k^{(r)}(\boldsymbol{\theta})$ represents its estimate with information ascertained by the surveillance system defined by $\boldsymbol{\theta}$.
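The two objective functions are plain averages of absolute differences over locations, years, serotypes, and realizations. The Python sketch below illustrates that calculation on a tiny made-up array (2 locations, 1 year, 2 serotypes, 2 realizations); the nested-list layout `[i][t][k][r]`, the function names, and all numbers are assumptions for illustration, not values from the study.

```python
def mae(true_rates, est_rates):
    """Mean absolute error over every [i][t][k][r] entry of two nested lists."""
    total, count = 0.0, 0
    for loc_true, loc_est in zip(true_rates, est_rates):
        for year_true, year_est in zip(loc_true, loc_est):
            for sero_true, sero_est in zip(year_true, year_est):
                for lam, lam_hat in zip(sero_true, sero_est):
                    total += abs(lam - lam_hat)
                    count += 1
    return total / count

def severe_rates(rates, p_severe):
    """Scale each rate by p_severe[k][r], the probability that serotype k is severe."""
    return [[[[lam * p_severe[k][r] for r, lam in enumerate(sero)]
              for k, sero in enumerate(year)]
             for year in loc]
            for loc in rates]

# Toy "true" and "estimated" incidence rates (per 1,000), indexed [i][t][k][r].
lam_true = [[[[1.0, 2.0], [3.0, 4.0]]], [[[5.0, 6.0], [7.0, 8.0]]]]
lam_hat = [[[[1.5, 2.0], [2.0, 4.0]]], [[[5.0, 7.0], [7.0, 8.5]]]]

f1 = mae(lam_true, lam_hat)  # objective for all cases

# Objective for severe cases, with toy true and estimated severity probabilities.
p_true = [[0.01, 0.01], [0.02, 0.02]]
p_hat = [[0.01, 0.01], [0.02, 0.02]]
f2 = mae(severe_rates(lam_true, p_true), severe_rates(lam_hat, p_hat))
```

In this toy case the absolute errors sum to 3.0 over 8 entries, so `f1` is 0.375; `f2` is much smaller because each rate is scaled by a small severity probability.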

2.2.4 Disease system model

We estimated the underlying serotype-specific incidence rates in each region ($\lambda$) and the serotype-specific probability of severe disease ($p$) using data in the study region with a multivariate spatio-temporal Bayesian hierarchical framework (i.e., “the disease system model”; see schematic and hyperparameter priors in S2 Fig). The unobserved incidence rate of cases caused by serotype k, in location i, in year t, $\lambda_{itk}$, is modeled as:

$$\log(\lambda_{itk}) = \beta_0 + \mathbf{X}_{itk}^{\top} \boldsymbol{\beta} + \gamma_{itk},$$

where $\beta_0$ represents the intercept; $\mathbf{X}_{itk}$ represents disease risk factors with corresponding coefficients $\boldsymbol{\beta}$ (although for simplicity, we incorporate only an intercept, but no risk factors, in the model); and $\gamma_{itk}$ is a random effect. The $\gamma_{itk}$ are stacked into a vector $\boldsymbol{\gamma}$ with a covariance matrix Σ, which has a separable multivariate space-time conditional autoregressive (MSTCAR) structure. More specifically, Σ is the Kronecker product of three covariance matrices characterizing the spatial dependence, the between-serotype dependence, and the temporal dependence (see S2 Fig for details) [39]. Observed data, representing total HFMD cases at location i, in year t, with severity s, are denoted as $Y_{its}$, and serotyping results, $Z_{itks}$, are used to infer the latent disease process parameters, as well as parameters of the observation process. Given the large population size of each location, the number of new cases in each location is assumed to be adequately represented by a Poisson distribution [40]:

$$Y_{its} \sim \mathrm{Poisson}(N_{it}\, \lambda_{its}),$$

where $\lambda_{its}$ represents the aggregated incidence rate across serotype groups in location i, during year t, with severity s (s = 1 represents severe disease; s = 2, mild disease); and $N_{it}$ represents the population size of location i at year t. We denote the probability of serotype k causing severe disease as $p_k$. Thus, the incidence rate of cases of severe disease is $\lambda_{it1} = \sum_{k} \lambda_{itk}\, p_k$, while that of mild disease is $\lambda_{it2} = \sum_{k} \lambda_{itk}\,(1 - p_k)$, and $\lambda_{it1} + \lambda_{it2} = \sum_{k} \lambda_{itk}$. We assume that the probability of being selected for laboratory typing does not depend on serotype after conditioning on case severity [2].

The number of annual tests is large, and thus test-positive case counts for each serotype are assumed to be adequately represented by a Poisson distribution:

$$Z_{itks} \sim \mathrm{Poisson}\!\left(N_{it}\, \lambda_{itk}\, p_k^{\mathbb{1}(s=1)} (1 - p_k)^{\mathbb{1}(s=2)}\, \pi_{its}\right),$$

where $Z_{itks}$ represents the number of cases testing positive for serotype k at location i, year t, with severity s; and $\pi_{its}$ represents the probability of being selected for serotyping at location i, year t, with severity s, which was estimated by smoothing observed data in the study region by assuming spatial and temporal autocorrelation. Epidemiologic parameters estimated by the disease system model can be found in S1 Text.

Generating disease data. To ensure that our optimized surveillance scenarios account for uncertainty in the observation process and parameter estimates, after fitting the disease process and observation model to data from 2009–2014, R sets (R = 80) of serotype-specific incidence rates ($\lambda_{itk}^{(r)}$) and parameter values (including $\beta_0$ and hyperparameters of Σ) were drawn from the joint posterior distributions. The sampled parameter values were used by the surveillance model (section 2.2.5) to simulate $Z_{itks}$ and to estimate $\hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta})$ and $\hat{p}_k^{(r)}(\boldsymbol{\theta})$ under different surveillance designs.
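To make the observation process concrete, the sketch below simulates typed-case counts for a single location-year. It assumes the Poisson mean for each serotype and severity group is population × serotype-specific incidence rate × severity share × typing probability, which is one reading of the model described above. The population, rates, and probabilities are invented toy values, the function names are hypothetical, and the simple Knuth sampler is used only because the toy means are small.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's Poisson sampler; adequate for the small toy means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def typed_means(pop, lam, p_severe, pi):
    """Expected typed counts [k][s] for one location-year.

    lam[k]: incidence rate of serotype k; p_severe[k]: P(severe | serotype k);
    pi[s]: typing probability by severity (index 0 = severe, 1 = mild,
    i.e. s = 1 and s = 2 in the text).
    """
    return [[pop * lam_k * p_k * pi[0], pop * lam_k * (1 - p_k) * pi[1]]
            for lam_k, p_k in zip(lam, p_severe)]

def simulate_typed_counts(pop, lam, p_severe, pi, rng):
    """Draw one Poisson realization of the typed counts."""
    return [[poisson(m, rng) for m in row]
            for row in typed_means(pop, lam, p_severe, pi)]

rng = random.Random(42)
pop = 100_000
lam = [0.0005, 0.0006, 0.0009]   # toy rates for three serotype groups
p_severe = [0.002, 0.02, 0.007]  # toy severity probabilities
pi = [0.5, 0.05]                 # severe cases typed far more often than mild

means = typed_means(pop, lam, p_severe, pi)
counts = simulate_typed_counts(pop, lam, p_severe, pi, rng)
```

Raising `pi[0]` relative to `pi[1]` increases the expected severe typed counts without changing the underlying incidence, which is the source of the sampling bias the optimization is designed to manage.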

2.2.5 Surveillance model

The surveillance model generates realizations of surveillance information conditional on the simulated disease data and the candidate design parameter. After proposing a sample allocation vector $\boldsymbol{\theta}$, we first estimated the number of typing tests allocated to location i in year t, for case severity s, based on $\boldsymbol{\theta}$ and the total number of laboratory typing tests conducted in year t across all locations (Fig 2C), then further estimated $\hat{\pi}_{its}(\boldsymbol{\theta})$, the probability of being typed, based on the estimated number of typing tests and the total observed number of HFMD cases at location i and year t. The estimated probability $\hat{\pi}_{its}(\boldsymbol{\theta})$, together with the rth sample of $\beta_0$ and hyperparameters of Σ, were then used to re-estimate $\hat{\lambda}_{itk}^{(r)}(\boldsymbol{\theta})$ and $\hat{p}_k^{(r)}(\boldsymbol{\theta})$ based on the disease system model described in section 2.2.4, and to further evaluate the objective functions $f_1(\boldsymbol{\theta})$ and $f_2(\boldsymbol{\theta})$.
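The first step of the surveillance model, turning an allocation vector into typing probabilities, can be sketched as follows. This is an illustrative reading of the description above rather than the authors' code: severe cases are typed first with a common probability, the rest of each location's allocation goes to mild cases, and the caps that keep probabilities within [0, 1] are an assumption added for the example. The names `theta_loc` and `theta_sev` and all counts are hypothetical.

```python
def typing_probabilities(theta_loc, theta_sev, total_tests, severe_cases, mild_cases):
    """Return a list of (pi_severe, pi_mild) per location for one year.

    theta_loc[i]: share of the year's typing budget given to location i.
    theta_sev: target probability that a severe case is typed.
    """
    result = []
    for share, n_sev, n_mild in zip(theta_loc, severe_cases, mild_cases):
        tests_here = share * total_tests
        # Severe cases are served first, but never beyond the local allocation.
        severe_tests = min(theta_sev * n_sev, tests_here)
        mild_tests = tests_here - severe_tests
        pi_sev = severe_tests / n_sev if n_sev > 0 else 0.0
        # A location cannot type more mild cases than it has (assumed cap).
        pi_mild = min(mild_tests / n_mild, 1.0) if n_mild > 0 else 0.0
        result.append((pi_sev, pi_mild))
    return result

probs = typing_probabilities(
    theta_loc=[0.5, 0.3, 0.2],
    theta_sev=0.5,
    total_tests=1000,
    severe_cases=[20, 10, 0],
    mild_cases=[5000, 1000, 400],
)
```

In the toy input, the third location receives the smallest share of tests but ends up typing half of its mild cases, because its case count is small; the same effect appears in the optimal designs for low-incidence prefectures.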

2.2.6 Optimization search

Since design vector $\boldsymbol{\theta}$ is constrained by $\sum_{i=1}^{I} \theta_i = 1$, possibly rendering the optimization search process less efficient than an unconstrained optimization problem, we first converted the 22-dimensional design vector to an unconstrained 21-dimensional internal design vector by following methods described elsewhere [41]. This internal design vector was then optimized with a genetic algorithm (GA), a metaheuristic optimization algorithm inspired by a natural selection process [42]. GAs have the ability to handle complex optimization problems, avoid local optima, and find near-optimal solutions within a reasonable amount of time [43,44], and have been used extensively in public health and medical research [45-50]. To optimize with a GA, first an initial population of n random designs was generated and the objective function value (i.e., MAE of estimated subtype-specific incidence rates, $f_1(\boldsymbol{\theta})$) of each design was evaluated. A small number of designs with the lowest MAEs survived to the next generation, while other designs were selected for recombination with probability determined by a function of their MAE. For each randomly matched pair of designs in the recombination pool, two new descendants were produced for the next generation, during which crossover occurs with high probability, $p_c$, and mutation occurs with low probability, $p_m$. If crossover occurs, the descendants were generated as linear combinations of the parent designs with randomly sampled weights. For example, if the two parents are $\boldsymbol{\theta}^{(a)}$ and $\boldsymbol{\theta}^{(b)}$, and the random weight sampled from Uniform(0,1) is ω, then the two descendants are $\omega \boldsymbol{\theta}^{(a)} + (1 - \omega) \boldsymbol{\theta}^{(b)}$ and $(1 - \omega) \boldsymbol{\theta}^{(a)} + \omega \boldsymbol{\theta}^{(b)}$, respectively. When mutation happens, one random element of the design vector is changed to a random number sampled from its domain. Following previous studies [49,51], we set the initial population size n = 50, $p_c$ = 0.8, and $p_m$ = 0.05. The optimization process took about 45 hours on two nodes, each with 96 GB of RAM and two Skylake 20-core 2.1 GHz processors.
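The crossover and mutation steps described above can be written compactly. The Python sketch below is an illustration of those two operators only, not the GA package implementation used in the study; the function names, the example parent vectors, and the box bounds passed to `mutate` are assumptions for the example.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Blend two parents with one random weight w drawn from Uniform(0, 1)."""
    w = rng.random()
    child_1 = [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]
    child_2 = [(1 - w) * a + w * b for a, b in zip(parent_a, parent_b)]
    return child_1, child_2

def mutate(design, bounds, rng):
    """Replace one randomly chosen element with a uniform draw from its domain."""
    mutated = list(design)
    j = rng.randrange(len(mutated))
    low, high = bounds[j]
    mutated[j] = rng.uniform(low, high)
    return mutated

def make_descendants(parent_a, parent_b, bounds, rng, p_cross=0.8, p_mut=0.05):
    """Produce two descendants from a matched pair of parent designs."""
    children = (list(parent_a), list(parent_b))
    if rng.random() < p_cross:
        children = crossover(parent_a, parent_b, rng)
    return [mutate(c, bounds, rng) if rng.random() < p_mut else list(c)
            for c in children]

rng = random.Random(0)
parent_a = [0.2, 0.5, 0.3]
parent_b = [0.6, 0.1, 0.3]
bounds = [(0.0, 1.0)] * 3          # hypothetical domain for each element

child_1, child_2 = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, bounds, rng)
descendants = make_descendants(parent_a, parent_b, bounds, rng)
```

Because each descendant is a convex combination of its parents, every element of a child lies between the corresponding parent values, and the two children together preserve the element-wise sum of the parents.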

2.2.7 Benchmarking and evaluation of robustness of optima

The surveillance performance of the optimal design was benchmarked against seven archetypal designs: 1) the existing allocation of laboratory typing across locations (Fig 2D, hereafter referred to as Existing); 2) an equal allocation of typing across all locations (hereafter Equal); 3) allocation of typing proportional to the location’s population (hereafter PopSize); 4) allocation of typing proportional to absolute number of HFMD cases (hereafter Case); 5) allocation of typing proportional to HFMD incidence rate (hereafter IncRate); 6) allocation of typing proportional to absolute number of severe HFMD cases (hereafter SevereCase); 7) allocation of typing proportional to HFMD incidence rate of severe cases (hereafter SevereIncRate). See S3 and S4 Figs for the proportion of serotyping allocated to each location under each of these archetypal designs. The proportion of typing tests allocated to each location for these archetypal designs was estimated based on the 2009–2014 data, while the probabilities of severe cases serotyped were set to the values that minimize the MAEs with the corresponding locational allocation strategy, according to grid searches (S5 Fig). These seven archetypal designs were included in the initial population of the GA, together with another 43 randomly generated designs. To examine the robustness of the designs selected by the optimization process, epidemiologic data for 2015 were held out to establish whether optimal designs based on 2009–2014 data performed well for the near-term future. During this process, we compared surveillance performance of the optimal design obtained with the 2009–2014 data to that of the seven archetypal designs described above, using only 2015 data. 
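Six of the seven archetypal location allocations are simple normalizations of prefecture-level summaries (the Existing design is read directly from the observed typing data). The sketch below shows one way these proportions could be computed; the function names and the three-location toy inputs are hypothetical, and the severity probability for each design, which the study sets by grid search, is not included.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def archetypal_designs(population, cases, severe_cases):
    """Location allocation proportions for six archetypal designs."""
    n = len(population)
    return {
        "Equal": [1 / n] * n,
        "PopSize": normalize(population),
        "Case": normalize(cases),
        "IncRate": normalize([c / p for c, p in zip(cases, population)]),
        "SevereCase": normalize(severe_cases),
        "SevereIncRate": normalize([s / p for s, p in zip(severe_cases, population)]),
    }

# Toy inputs for three locations (not study data).
designs = archetypal_designs(
    population=[1000, 2000, 1000],
    cases=[100, 100, 200],
    severe_cases=[1, 2, 1],
)
```

In the toy input, the second location has the largest population but the lowest incidence rate, so it receives half of the typing under PopSize and only about 14% under IncRate.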
Furthermore, to investigate how robust the optimal design was when the total typing capacity changes, we repeated the analyses with halved, doubled, and quintupled total frequency of typing across all locations in each year (i.e., scaling the observed frequencies shown in Fig 2C). As an alternative method to examine if the performance of the surveillance system changes with resource limits, we also randomly selected 300 designs from the design space, evaluated the two objective values with the original resource constraint and when each constraint was halved, doubled, or quintupled, and investigated if the designs that performed well under one constraint also performed well under others.

2.2.8 Computing platform and code availability

All analyses were conducted in R 4.0.3 [52] on Berkeley’s Savio computational cluster [53], with rstan package 2.18.2 for Bayesian hierarchical modeling [54], GA package 3.2 for implementation of the genetic algorithm [55], and packages ggplot2 3.1.1 [56], cowplot 0.9.4 [57], and tmap 2.2 [58] for visualization. All code and data are available at: https://github.com/qu-cheng/Lab_surveillance_optimization

3 Results

3.1 Optimal designs

Allocation by location

The existing laboratory surveillance network (Existing archetypal design) allocates approximately a quarter of all subtyping effort to the most populous prefecture in the study region (Chengdu), while in contrast, less than one percent of subtyping effort is allocated to Ganzi, a remote prefecture in the northwestern mountainous region of the study area (Figs 3A and 2D). The optimal designs to minimize error in estimated serotype-specific incidence rates of all HFMD cases (Optimal for all) and only severe HFMD cases (Optimal for severe) shift the typing allocation substantially (Figs 3C and 3D and S6). Although the very populous Chengdu prefecture still receives the largest proportion of typing resources, the optimal designs allocate it just 12.2% and 9.5% of total typing resources for the two objectives, respectively. Notably, in S7 Fig, which shows the proportion of cases being serotyped at each location according to the Optimal for all and Optimal for severe designs, certain prefectures with low absolute typing allocations (e.g., Ganzi and Aba in Fig 3C and 3D) are able to serotype a large proportion of total cases (e.g., >30 percent of cases are serotyped in Ganzi and Aba); by comparison, the optimal designs serotype less than 2% of total cases in the populous Chengdu prefecture.

Fig 3

Comparison between Existing, IncRate, and Optimal subtyping allocation strategies across locations.

Treemaps show the proportion of typing efforts allocated to each location in the (A) Existing, (B) IncRate, and Optimal designs that minimize the error in estimated serotype-specific incidence rate of (C) all HFMD cases and (D) only severe HFMD cases. Tiles represent study locations, with the area of the tile representing the proportion of all typing efforts allocated to the location, and the color of the tile representing the location’s annual mean HFMD incidence rate. Tiles are ordered by decreasing annual mean incidence rate from top to bottom, then left to right. Scatterplots show the correlation between annual mean incidence rate and the optimal proportion of total typing resources allocated to each location to minimize error in estimated serotype-specific incidence rate of (E) all HFMD cases and (F) only severe HFMD cases. Black dots represent the archetypal design IncRate (see definition in section 2.2.7); blue triangles in (E) and squares in (F) represent the optimal allocation strategy for minimizing error in estimating serotype-specific incidence rate for all cases and only severe cases, respectively. The blue lines represent the best fit relating annual mean incidence rates to typing allocations across the Optimal designs. Vertical arrows represent changes from IncRate to Optimal: red arrows represent increases in typing efforts from IncRate to Optimal; green arrows represent reductions in typing efforts from IncRate to Optimal. Inset figures show data for all prefectures, showing the range (red dashed rectangle) displayed in the main panel.

Allocation by case severity

The optimized proportion of severe cases to serotype depended strongly on the surveillance objective: 0.17 when minimizing errors in serotype-specific total HFMD incidence rates, and 0.70 when minimizing errors in serotype-specific severe HFMD incidence rates. To explore the effect of changing the proportion of severe cases being serotyped on surveillance performance, we fixed the spatial allocation of typing resources for each objective at the values in the Optimal designs while varying the proportion of severe cases subjected to serotyping from 0.01 to 0.99. The mean absolute error (MAE) of estimated total serotype-specific HFMD incidence rates was minimized at 11% of severe cases serotyped (Fig 4A). Notably, the MAE increases for this goal as severe cases are increasingly prioritized for serotyping.

Fig 4

Impact of the proportion of severe cases serotyped on mean absolute error (MAE) of the estimated serotype-specific incidence rate of (A) all HFMD cases and (B) severe HFMD cases. Colored lines are smoothed by Gaussian process models. Black dot and triangle represent the probabilities of severe cases being serotyped that lead to the lowest error in estimating serotype-specific incidence rate of all (dot) and only severe (triangle) HFMD cases; blue dot and triangle represent the optimal designs from GA.

For severe HFMD cases, the MAE initially decreases as greater proportions of severe cases are serotyped, then plateaus when about half of the severe cases are serotyped, reaching its minimum when the proportion of severe cases serotyped is 0.65. The optimal proportions of severe cases subjected to serotyping identified by the GA are very close to those identified in this experiment, which suggests that the GA successfully explored the design space. For further analyses, we updated the probability of serotyping severe cases in both Optimal designs to the values identified in this grid search of θ, conditional on the optimal values of θ1, θ2,…,θ found by the GA, as the conditional grid search guarantees a better or equal estimate of θ.
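The conditional grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_mae` is a made-up deterministic stand-in for the simulation-based MAE (which in the study is evaluated over realizations of the disease model with the spatial allocation held fixed), and the function names are hypothetical. The sketch shows why the result can be no worse than the GA value: the GA value itself is kept among the candidates.

```python
def conditional_grid_search(objective, ga_value, lower=0.01, upper=0.99, step=0.01):
    """Search one design parameter on a grid while all others stay fixed.

    objective: callable returning the error (lower is better) for a given
               proportion of severe cases serotyped
    ga_value:  the value of that proportion found by the optimizer
    """
    n_steps = int(round((upper - lower) / step))
    # Round grid points to avoid floating-point drift such as 0.11000000000000001.
    grid = [round(lower + i * step, 10) for i in range(n_steps + 1)]
    # Keeping the optimizer's own value as a candidate guarantees the
    # returned point is at least as good as that value under this objective.
    candidates = grid + [ga_value]
    return min(candidates, key=objective)


def toy_mae(p_severe):
    """Hypothetical smooth error curve; NOT the study's objective function."""
    return (p_severe - 0.11) ** 2 + 0.5


ga_value = 0.17
best = conditional_grid_search(toy_mae, ga_value)
print(best, toy_mae(best) <= toy_mae(ga_value))
```

With a stochastic objective, as in a simulation study, the same comparison only holds up to Monte Carlo noise, which is one reason the curves in Fig 4 are smoothed before the minimum is read off.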

3.2 Comparisons with archetypal designs

The optimal allocation of subtyping among regional subpopulations and case severity groups—while adhering to the same level of typing effort as the current design (Existing)—yielded a significant improvement in estimating the target parameters. The distribution of error (MAE) of estimated serotype-specific incidence rate of all HFMD and severe HFMD cases, across location, serotype, and year in 1000 realizations of the disease model for the optimal design was compared to the seven archetypal designs described in section 2.2.7 (Fig 5). When compared with the current surveillance design (Existing), with the same number of cases subjected to serotyping, the selected optimal designs (Optimal) exhibit 14.1 and 20.5 percent lower average MAE for the estimated serotype-specific incidence rate of all cases for the 2009–2014 (Fig 5A) and 2015 (Fig 5C) period, respectively; and a 13.3 and 14.8 percent lower average MAE of the estimated serotype-specific incidence rate of only severe cases for the 2009–2014 (Fig 5B) and 2015 (Fig 5D) period, respectively. Among the archetypal designs, IncRate generally performed well for both objectives. The results indicate that optimal designs based on historical observed data from 2009–2014 performed well for the 2015 year, which was held out of the optimization procedure, suggesting that optimal designs identified by DIOS may be useful for planning typing resource allocations in the short-term future.
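For readers who want the error metric spelled out, the following sketch computes a mean absolute error over a set of estimates and the percent reduction of one design relative to another. The numbers are invented for illustration, and the helper names are hypothetical; in the study the average is taken across location, serotype, year, and model realization.

```python
from statistics import mean


def mean_absolute_error(estimates, truths):
    """Average of |estimate - truth| over paired values."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return mean(abs(e - t) for e, t in zip(estimates, truths))


def percent_reduction(mae_reference, mae_candidate):
    """Percent by which the candidate design lowers MAE versus the reference."""
    return 100.0 * (mae_reference - mae_candidate) / mae_reference


# Hypothetical serotype-specific incidence rates (one value per
# location-serotype-year cell) and two sets of estimates.
true_rates = [10.0, 4.0, 1.0]
existing_estimates = [12.0, 3.0, 2.0]
optimal_estimates = [11.0, 3.5, 1.5]

mae_existing = mean_absolute_error(existing_estimates, true_rates)
mae_optimal = mean_absolute_error(optimal_estimates, true_rates)
reduction = percent_reduction(mae_existing, mae_optimal)
print(f"MAE existing={mae_existing:.3f}, optimal={mae_optimal:.3f}, "
      f"reduction={reduction:.1f}%")
```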

Fig 5

Surveillance performance of the optimal design and the seven archetypal designs evaluated with data from 2009–2014 and 2015 over 1000 realizations of the disease system model.

Violin plots and boxplots for different designs (shades of color) show the distribution of mean absolute error (MAE) in estimating serotype-specific incidence rates of (A) all cases and (B) only severe cases using 2009–2014 data; and (C) all cases and (D) only severe cases using 2015 data, which was not used in the optimization procedure. The horizontal dashed lines show the median MAEs of the optimal designs.

3.3 Sensitivity of selected designs to the total number of cases sampled for subtyping

To investigate whether the optimal design is robust to changes in the availability of typing resources, we compared the optimal designs for both objectives when the frequency of typing is set to half, two times, or five times that of historical serotyping rates. With more typing resources, MAE of estimated serotype-specific incidence rate of total and severe cases decreases substantially (S8 Fig), while the optimal location-wise allocation changes modestly (Figs 6 and S9 and S10). For both surveillance objectives, as serotyping resources increase, the optimal proportion of typing allocated at each location tends to become more evenly distributed, particularly for estimating the serotype-specific incidence rates of severe HFMD, because the marginal benefits of more intensive serotyping at locations with higher incidence fall, while more frequent serotyping at locations with lower incidence rates can continue to reduce estimation error.
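One way to build intuition for the diminishing marginal benefit described above is a deliberately simplified binomial calculation. This is an illustration under stated assumptions, not the hierarchical model used in the study: the symbols n (cases typed at a location), k (typed cases positive for a given serotype), and p (true fraction of the location's cases caused by that serotype) are introduced here, and typed cases are treated as a simple random sample.

```latex
\hat{p} = \frac{k}{n}, \qquad
\operatorname{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \qquad
\frac{\partial\, \operatorname{SE}(\hat{p})}{\partial n}
  = -\frac{1}{2}\sqrt{p(1-p)}\; n^{-3/2}
```

Because the derivative shrinks as n grows, the same ten additional tests matter far more where few cases are typed. With p = 0.5, moving from n = 10 to n = 20 lowers the standard error from about 0.158 to about 0.112, whereas moving from n = 1000 to n = 1010 lowers it by only about 0.5 percent.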

Fig 6

Scatterplots of annual mean incidence rate and the proportion of typing resources allocated to each location under the archetypal design IncRate (black dots) and the Optimal designs for minimizing the MAE of estimated serotype-specific incidence rate of all HFMD cases (blue triangles) when the available typing resources are (A) halved, (B) doubled, and (C) quintupled; and the Optimal designs for minimizing the MAE of estimated serotype-specific incidence rate of severe HFMD cases (blue squares) when the available typing resources are (D) halved, (E) doubled, and (F) quintupled.

When examining 300 randomly sampled designs, the MAEs of estimated serotype-specific incidence rate of total and severe cases across the four resource limit scenarios were highly correlated (>0.8, S11 Fig), which again suggests that the optimal allocation of laboratory resources is relatively insensitive to resource constraints in this framework, even as additional typing resources result in lower estimation errors. When seeking to minimize error in estimated serotype-specific incidence rates of all HFMD cases, the optimal proportion of severe cases to serotype decreases as the availability of typing resources increases (Fig 7A). Conversely, when seeking to minimize error in estimated serotype-specific incidence rates of severe HFMD cases, the optimal proportion of severe cases to serotype increases as the availability of typing resources increases (Fig 7B). This is likely because, for estimating the serotype-specific incidence rates of all HFMD cases, the marginal improvements diminish once enough severe cases are tested to accurately estimate the virulence of each serotype, whereas for estimating the serotype-specific incidence rates of severe HFMD cases, the errors continue to decrease as more severe cases are tested.

Fig 7

Optimal proportion of severe cases to be subjected to serotyping as the availability of typing resources changes, when seeking to minimize error in serotype-specific incidence rates of (A) all HFMD cases and (B) only severe HFMD cases.

4 Discussion

Laboratory-based disease surveillance networks are often designed in an ad hoc manner, guided by budgetary, logistical, or infrastructural considerations [19], which may lead to inefficient use of limited typing resources. Here, we adapted the DIOS framework to provide a quantitative platform for simulating epidemiologic and surveillance processes in order to optimize the allocation of scarce laboratory typing resources under operational constraints. In a case study, we applied the framework to determine how a limited number of samples for typing should be drawn from subpopulations to optimize estimation of serotype-specific incidence rates for all HFMD cases, and for the subset of severe HFMD cases, in a study region in China. We demonstrated that, with the same level of typing effort as the existing network, optimal designs chosen using DIOS can reduce the mean absolute error of estimated serotype-specific incidence rates of all HFMD cases and of severe HFMD cases by 14.1 and 13.3 percent, respectively. Although beyond the scope of this study, the DIOS framework accommodates multi-objective optimization as well [20], providing a means to identify designs that optimize both objectives simultaneously. Changes to the total number of cases sampled for subtyping minimally impacted the relative performance of surveillance designs.

Our optimization identified that allocating laboratory typing resources across locations in proportion to their HFMD incidence rates gave near-optimal performance for estimating both the total serotype-specific incidence rates and the serotype-specific incidence rates of severe HFMD. For estimating total HFMD incidence, this is fairly intuitive: because errors in incidence rates exhibit higher variance when the incidence rate itself is higher, additional typing to stabilize these estimates across locations will benefit the average MAE. The optimal design for estimating serotype-specific incidence rates of severe cases involves a slightly more equal distribution of subtyping resources, in part because fewer tests are available to type mild cases as the proportion of severe cases typed increases in the optimal design, which results in insufficient resources to accurately estimate background serotype-specific incidence of mild cases at locations with low incidence rates.

Our study opens several areas for future research. While we focused on a surveillance design parameter representing the proportion of all typed cases to be drawn from each region, other design parameters can certainly be examined, such as the sampling of cases for subtyping across demographic groups, the selection of laboratories to include in the surveillance network, and the assays used for typing. Besides the total number of typing tests, other constraints can also be considered, such as the total cost of processing and shipping specimens, which may vary across locations. Other surveillance objectives, beyond estimating serotype-specific incidence rates for all cases and only severe cases, are also possible, such as early detection of a new subtype or an unusual increase in existing subtypes, evaluation of the effectiveness of subtype-specific interventions, and confirmation of the elimination or eradication of a specific subtype. Multiple objectives can also be evaluated simultaneously through multi-objective optimization, as we have demonstrated elsewhere [20]. Expanding on the use of a single disease system model here, multiple models with different structures (e.g., hierarchical models with different covariance structures, machine learning algorithms, and mechanistic models) and different parameter values or covariates could be run in an ensemble to better represent uncertainty in the underlying epidemiologic processes. Periodic intensive, cross-sectional sampling may also help to validate and fine-tune the design optimization process by providing high-resolution, high-confidence estimates of incidence rates. Furthermore, while this study assumed that the optimal design is fixed and does not change over time, future optimizations could update optimal designs iteratively as new data become available, refitting the disease system model and updating the optimal design. Such an adaptive sampling approach may result in improved surveillance performance in settings where transmission dynamics change substantially over time [59].

In conclusion, we have shown that designing laboratory networks for surveillance systems with the DIOS framework can reveal designs that allocate limited resources more efficiently. For jurisdictions with sophisticated computational capabilities, the analyses in this work could be repeated to identify the optimal designs for specific settings and surveillance goals. For regions with limited resources, rules of thumb, such as the allocation of typing resources in proportion to incidence rates, may emerge from simulations of general scenarios. Future work is needed to generate such general surveillance rules for various surveillance design parameters and goals, and to yield improved understanding of the design parameters that would allow the most cost-effective laboratory-based surveillance architectures. The scope of applications of the DIOS framework extends across many dimensions of laboratory-based surveillance networks and associated goals, raising important opportunities for developing the next generation of laboratory surveillance systems to monitor pathogen subtypes.
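The rule of thumb mentioned above, allocating typing resources in proportion to incidence rates, can be sketched in a few lines. This is an illustrative sketch only: the function name, the location labels, the incidence values, and the largest-remainder rounding step are assumptions made here so that whole tests sum to the budget; they are not taken from the study's implementation of the IncRate design.

```python
import math


def allocate_proportional(incidence_rates, total_tests):
    """Split an integer test budget across locations in proportion to incidence.

    incidence_rates: dict mapping location name -> annual mean incidence rate
    total_tests:     non-negative integer number of typing tests available
    Returns a dict of integer allocations that sums exactly to total_tests.
    """
    if total_tests < 0:
        raise ValueError("total_tests must be non-negative")
    total_rate = sum(incidence_rates.values())
    if total_rate <= 0:
        raise ValueError("incidence rates must sum to a positive value")

    # Ideal (possibly fractional) share of tests for each location.
    raw = {loc: total_tests * rate / total_rate
           for loc, rate in incidence_rates.items()}
    allocation = {loc: math.floor(value) for loc, value in raw.items()}

    # Largest-remainder rounding: hand leftover tests to the locations
    # whose fractional parts were largest.
    leftover = total_tests - sum(allocation.values())
    by_remainder = sorted(raw, key=lambda loc: raw[loc] - allocation[loc],
                          reverse=True)
    for loc in by_remainder[:leftover]:
        allocation[loc] += 1
    return allocation


# Hypothetical incidence rates per 100,000 for three made-up locations.
rates = {"A": 300.0, "B": 150.0, "C": 50.0}
plan_100 = allocate_proportional(rates, 100)
plan_7 = allocate_proportional(rates, 7)
print(plan_100, plan_7)
```

The rounding step only matters for small budgets; with large budgets the allocation is essentially the raw proportional share.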

Epidemiological parameter estimates by the disease system model.

(DOCX)

Annual mean percentage of cases being tested for (A) all clinical HFMD cases, (B) mild cases, and (C) severe cases between 2009–2015. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html. (PDF)

Schematic of the multivariate spatio-temporal Bayesian hierarchical model.

See the main text for the definitions of notations. Priors of the hyperparameters are highlighted in blue, while observed data are highlighted in green. (PDF)

Proportion of typing resources allocated to each location for the archetypal designs: (A) Existing, (B) Equal, (C) PopSize, (D) Case, (E) IncRate, (F) SevereCase, and (G) SevereIncRate. See descriptions of these designs in Section 2.2.7 of the main text. The prefectures are colored by the proportion of serotyping resources allocated to them, with darker colors representing more serotyping resources. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html. (PDF)

Proportion of serotyping resources allocated to each location for the archetypal designs: (A) Existing, (B) Equal, (C) PopSize, (D) Case, (E) IncRate, (F) SevereCase, and (G) SevereIncRate. See descriptions of these designs in Section 2.2.7 of the main text. Each tile represents one location, with the area of the tile proportional to the amount of typing resources allocated to it and the color of the tile representing the annual mean incidence rate of that location. (PDF)

Optimal probability of severe cases being serotyped for each archetypal design minimizing mean absolute errors (MAE) of the estimated serotype-specific incidence rate of (A) all HFMD cases and (B) only severe HFMD cases. Different colors represent different archetypal designs. The colored lines are smoothed by Gaussian process models. Black dots and triangles represent the optimal probability of severe cases being serotyped for each archetypal design minimizing MAE of the estimated serotype-specific incidence rate of all HFMD cases and only severe HFMD cases, respectively. (PDF)

The optimal proportion of subtyping to allocate to each location for minimizing mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) severe cases. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html. (PDF)

The proportion of cases being subtyped according to the optimal designs that minimize mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) severe cases. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html. (PDF)

Mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) only severe cases when the availability of typing resources changes. (PDF)

The optimal proportion of subtyping to allocate to each location for minimizing mean absolute error in estimating serotype-specific incidence rate of all cases when the total amount of subtyping resources is (A) half, (B) two times, or (C) five times that of the observed frequency; and for minimizing mean absolute error in estimating serotype-specific incidence rate of severe cases when the total amount of subtyping resources is (D) half, (E) two times, or (F) five times that of the observed frequency. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html. (PDF)

Scatterplots of annual mean incidence rate and the proportion of typing resources allocated to each location under the archetypal design IncRate (black dots) and the Optimal designs for minimizing the MAE of estimated serotype-specific incidence rate of all HFMD cases (blue triangles) when the available typing resources are (A) halved, (B) doubled, and (C) quintupled; and the Optimal designs for minimizing the MAE of estimated serotype-specific incidence rate of severe HFMD cases (blue squares) when the available typing resources are (D) halved, (E) doubled, and (F) quintupled. (PDF)

Correlation between the objective function values under four resource limit scenarios.

Correlation between the MAEs of estimated serotype-specific incidence rate of (A) all cases and (B) only severe cases. (PDF)

23 May 2022

Dear Prof. Remais,

Thank you very much for submitting your manuscript "Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations. Please prepare and submit your revised manuscript within 30 days.

Sincerely,
Benjamin Althouse, Associate Editor, PLOS Computational Biology
Nina Fefferman, Deputy Editor, PLOS Computational Biology
Jason A. Papin, Editor-in-Chief, PLOS Computational Biology
Feilim Mac Gabhann, Editor-in-Chief, PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1:

General: This is a Research Article submission by Cheng and colleagues focused on development of an analytic framework to improve decisions on sampling for infectious disease surveillance with the case example of Hand-foot-and-mouth disease in China. They model where to allocate a fixed number of PCR assays that identify viral etiology (CA-V16, EV-A71, or other) amongst N=21 locations with a goal of optimizing serotype specific case incidence. Overall this is an interesting analysis. My main concern is whether the data can truly support the authors’ goals.

Major comments:

- The goal of the analysis is to compare a model-driven sampling scheme to current practice (archetypal). Therefore a reference standard is needed to identify the “true” serotype specific incidence, by which to compare these approaches. Ideally, this would be high resolution sampling of all locations. However, the data from almost all locations except Chengdu appears quite limited (Figure 2D). Therefore, even with the approach done by the authors with resampling, I worry the data is limited to make an informed comparison. Would welcome clarification and input from the authors, and additional description of the available data.

Minor comments:

- I am not clear how the DIOS framework is particularly unique beyond an iterative algorithm for this case example.
- Could the authors clarify the “Realizations” from equation on line 269 and why there are 80?
- There appears to be the assumption that the number of samples is fixed rather than variable, which could be further explored.
- How do the authors deal with uncertainty?
- The authors mention additional dimensions of time and serotype, is varying these for optimization explored? Would be reasonable to do only location, just curious.
- Methods section is somewhat hard to follow.

Reviewer #2:

Thank you for the chance to review the research article submission "Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections". This paper is a clear, well written, original, innovative and potentially rather important contribution to the literature on disease surveillance system design, evaluation and optimization. While the paper may appear to overlap in part with the prior publication by Cheng et al in PLOS Comp Biol from 2020, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008477, the current submission presents a very nice methodological advance in this area. A few issues with editing and clarity are outlined below (by page and line number).

Page 5, Line 98-99: re. "and only 2-3 influenza cases are required to be typed..." Is this a typo, and should be indicating 2-3% of influenza cases, or does CDC and APHL actually say "2-3 influenza cases", and if so, I gather they also stipulate some denominator, possibly such as 2-3 cases per specified geography or time period or population.

P6 L110-118: re. "biases arising from sampling clinical cases for subtyping..." This language seems to imply that sampling follows a scheme that is intentionally designed in most cases, but I worry that it is the absence of design that is actually more typical. The academic and engineering perspective should clearly be informing hospital, clinical, laboratory and public health systems and practices. I just wonder if the assumptions in this paper are too far from reality, and a more basic and simple set of guidelines need to be met first by public health surveillance authorities.

P7 L138-143: re. "Applying DIOS involves specifying surveillance objectives..." Again, this may be assuming that surveillance systems are being developed, implemented, and are operating under more optimistic conditions than they actually are. While it is not the place for this paper to necessarily address or try to resolve the deficiencies of public health systems, it might be useful to recognize how far from optimal such practices are. To meaningfully run DIOS in most public health settings would likely require a major investment and a realignment in approach.

P10 L172-174: very true re. dynamics, and that it should represent known biases due to subtype interaction, but also possibly unrecognized interactions between variants/subtypes that are not (yet) known.

P26 L457 and L460: possibly redundant, should it just be "GA" instead of "GA algorithm" here?

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #2: Yes

Do you want your identity to be public for this peer review?

Reviewer #1: No
Reviewer #2: No

15 Jun 2022

Submitted filename: HFMD_lab_surveillance_response_letter_R1.docx

15 Sep 2022

Dear Prof. Remais,

We are pleased to inform you that your manuscript 'Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email.

Best regards,
Benjamin Althouse, Academic Editor, PLOS Computational Biology
Nina Fefferman, Section Editor, PLOS Computational Biology
Jason A. Papin, Editor-in-Chief, PLOS Computational Biology
Feilim Mac Gabhann, Editor-in-Chief, PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Thank you for your revision.

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: None

Do you want your identity to be public for this peer review?

Reviewer #1: No

22 Sep 2022

PCOMPBIOL-D-21-01940R1
Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections

Dear Dr Remais,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

With kind regards,
Olena Szabo
PLOS Computational Biology

39 in total

1. Early Evidence of Inactivated Enterovirus 71 Vaccine Impact Against Hand, Foot, and Mouth Disease in a Major Center of Ongoing Transmission in China, 2011-2018: A Longitudinal Surveillance Study.

Authors: Jennifer R Head; Philip A Collender; Joseph A Lewnard; Nicholas K Skaff; Ling Li; Qu Cheng; Julia M Baker; Charles Li; Dehao Chen; Alison Ohringer; Song Liang; Changhong Yang; Alan Hubbard; Benjamin Lopman; Justin V Remais
Journal: Clin Infect Dis Date: 2020-12-15 Impact factor: 9.079

2. Hand, foot, and mouth disease in China, 2008-12: an epidemiological study.

Authors: Weijia Xing; Qiaohong Liao; Cécile Viboud; Jing Zhang; Junling Sun; Joseph T Wu; Zhaorui Chang; Fengfeng Liu; Vicky J Fang; Yingdong Zheng; Benjamin J Cowling; Jay K Varma; Jeremy J Farrar; Gabriel M Leung; Hongjie Yu
Journal: Lancet Infect Dis Date: 2014-01-31 Impact factor: 25.071

3. Drug-resistant tuberculosis in Shanghai, China, 2000-2006: prevalence, trends and risk factors.

Authors: X Shen; K DeRiemer; Z-An Yuan; M Shen; Z Xia; X Gui; L Wang; Q Gao; J Mei
Journal: Int J Tuberc Lung Dis Date: 2009-02 Impact factor: 2.373

4. Temporal trends in invasive pneumococcal disease and pneumococcal serotypes over 7 decades.

Authors: Zitta B Harboe; Thomas L Benfield; Palle Valentiner-Branth; Thomas Hjuler; Lotte Lambertsen; Margit Kaltoft; Karen Krogfelt; Hans Christian Slotved; Jens Jørgen Christensen; Helle B Konradsen
Journal: Clin Infect Dis Date: 2010-02-01 Impact factor: 9.079

5. Time series modeling of pathogen-specific disease probabilities with subsampled data.

Authors: Leigh Fisher; Jon Wakefield; Cici Bauer; Steve Self
Journal: Biometrics Date: 2016-07-05 Impact factor: 2.571

6. The Epidemiology of Hand, Foot and Mouth Disease in Asia: A Systematic Review and Analysis.

Authors: Wee Ming Koh; Tiffany Bogich; Karen Siegel; Jing Jin; Elizabeth Y Chong; Chong Yew Tan; Mark Ic Chen; Peter Horby; Alex R Cook
Journal: Pediatr Infect Dis J Date: 2016-10 Impact factor: 2.129

7. Finding hotspots: development of an adaptive spatial sampling approach.

Authors: Ricardo Andrade-Pacheco; Francois Rerolle; Jean Lemoine; Leda Hernandez; Aboulaye Meïté; Lazarus Juziwelo; Aurélien F Bibaut; Mark J van der Laan; Benjamin F Arnold; Hugh J W Sturrock
Journal: Sci Rep Date: 2020-07-02 Impact factor: 4.379

8. Epidemiological and aetiological characteristics of hand, foot, and mouth disease in Sichuan Province, China, 2011-2017.

Authors: Di Peng; Yue Ma; Yaqiong Liu; Qiang Lv; Fei Yin
Journal: Sci Rep Date: 2020-04-09 Impact factor: 4.379

9. Model-informed COVID-19 vaccine prioritization strategies by age and serostatus.

Authors: Kate M Bubar; Kyle Reinholt; Stephen M Kissler; Marc Lipsitch; Sarah Cobey; Yonatan H Grad; Daniel B Larremore
Journal: Science Date: 2021-01-21 Impact factor: 47.728

10. Surveillance systems for neglected tropical diseases: global lessons from China's evolving schistosomiasis reporting systems, 1949-2014.

Authors: Song Liang; Changhong Yang; Bo Zhong; Jiagang Guo; Huazhong Li; Elizabeth J Carlton; Matthew C Freeman; Justin V Remais
Journal: Emerg Themes Epidemiol Date: 2014-11-25