Algorithms for detecting and predicting influenza outbreaks: metanarrative review of prospective evaluations.

Abstract

OBJECTIVES: Reliable monitoring of influenza seasons and pandemic outbreaks is essential for response planning, but compilations of reports on detection and prediction algorithm performance in influenza control practice are largely missing. The aim of this study is to perform a metanarrative review of prospective evaluations of influenza outbreak detection and prediction algorithms, restricted to settings where authentic surveillance data have been used.
DESIGN: The study was performed as a metanarrative review. An electronic literature search was performed, papers selected and qualitative and semiquantitative content analyses were conducted. For data extraction and interpretations, researcher triangulation was used for quality assurance.
RESULTS: Eight prospective evaluations were found that used authentic surveillance data: three studies evaluating detection and five studies evaluating prediction. The methodological perspectives and experiences from the evaluations were found to have been reported in narrative formats representing biodefence informatics and health policy research, respectively. The biodefence informatics narrative, having an emphasis on verification of technically and mathematically sound algorithms, constituted a large part of the reporting. Four evaluations were reported as health policy research narratives, thus formulated in a manner that allows the results to qualify as policy evidence.
CONCLUSIONS: Awareness of the narrative format in which results are reported is essential when interpreting algorithm evaluations from an infectious disease control practice perspective. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

Keywords: detection algorithms; evaluation; influenza; meta-narrative review; prediction algorithms

Year: 2016 PMID: 27154479 PMCID: PMC4861093 DOI: 10.1136/bmjopen-2015-010683

Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692

A metanarrative review of influenza detection and prediction algorithm evaluations was restricted to settings where authentic prospective data were used. Application of a semiqualitative review method allowed attention to be paid to critical dissimilarities between narratives, for example, the learning period dilemma, which arises because the statistical models used in algorithms to detect or predict an influenza-related event must be determined in a preceding time interval. Application of the review inclusion criteria resulted in the exclusion of a large number of papers. These papers may have contained additional narratives, but not on the appropriate topic.

Introduction

Experiences from winter influenza seasons1 and the pandemic pH1N1 outbreak in 20092 suggest that existing information systems used for detecting and predicting outbreaks and informing situational awareness show deficiencies when under heavy demand. Public health specialists seek more effective and equitable response systems, but methodological problems frequently limit the usefulness of novel approaches.3 In these biosurveillance systems, algorithms for outbreak detection and prediction are essential components.4 5 Regarding outbreak detection, characteristics that influence successful performance include the representativeness of data and the type and specificity of the outbreak detection algorithm, while influential outbreak characteristics comprise the magnitude and shape of the signal and the timing of the outbreak.6 After detection, mathematical models can be used to predict the progress of an outbreak and to identify thresholds that determine whether an outbreak will dissipate or develop into an epidemic. However, it has been pointed out that present prediction models have often been designed for particular situations, using the data that are available and making assumptions where data are lacking.7 8 Consequently, even biosurveillance models that have been subject to evaluation seldom produce output that fulfils standard criteria for operational readiness.9 For instance, a recent scoping review of influenza forecasting methods assessed studies that validated models against independent data.10 Use of independent data is vital for predictive model validation, because using the same data for model fitting and testing inflates estimates of predictive performance.11 The review concluded that the outcomes predicted and the metrics used in validations varied considerably, which limited the possibility of formulating recommendations.
Building on these experiences, we set out to perform a metanarrative review of evaluations of influenza outbreak detection and prediction algorithms. To ensure that the review results can be used to inform operational readiness, we restricted the scope to settings where authentic prospective surveillance data had been used for the evaluation.

Methods

A metanarrative review12 was conducted to assess publications that prospectively evaluated algorithms for the detection or short-term prediction of influenza outbreaks based on routinely collected data. A metanarrative review was conducted because it is suitable for addressing the question ‘what works?’, and also to elucidate a complex topic, highlighting the strengths and limitations of different research approaches to that topic.13 Metanarrative reviews look at how particular research traditions have unfolded over time and shaped the kind of questions being asked and the methods used to answer them. They inspect the range of approaches to studying an issue, interpret and produce an account of the development of these separate ‘metanarratives’ and then form an overarching metanarrative summary. The principles of pragmatism (inclusion criteria are guided by what is considered to be useful to the audience), pluralism (the topic is illuminated from multiple perspectives; only research that lacks rigour is rejected), historicity (research traditions are described as they unfold over time), contestation (conflicting data are examined to generate higher order insights), reflexivity (reviewers continually reflect on the emerging findings) and peer review were applied in the analysis.12 Four steps were taken: an electronic literature search was carried out, papers were selected, data from these papers were extracted and qualitative and semiquantitative content analyses were conducted. For data extraction and analyses, researcher triangulation (involving several researchers with different backgrounds) was used as a strategy for quality assurance. All steps were documented and managed electronically using a database. To be included in the review, an evaluation study had to apply an outbreak detection or prediction algorithm to authentic data prospectively collected to detect or predict naturally occurring influenza outbreaks among humans. 
Following the inclusive approach of the metanarrative review methodology, studies using clinical and laboratory diagnosis of influenza for case verification were included.14 For the evaluations of the prediction algorithms, correlation analyses were also accepted, because interventions could have been implemented during the evaluation period. In addition, studies were required to compare syndromic data with some gold standard data from known outbreaks. All studies published from 1 January 1998 to 31 January 2016 were considered. PubMed was searched using the following search term combinations: ‘influenza AND ((syndromic surveillance) OR (outbreak detection OR outbreak prediction OR real-time prediction OR real-time estimation OR real-time estimation of R))’. The database searches were conducted in February 2016. Only articles and book chapters available in the English language were selected for further analysis. To describe the characteristics of the selected papers, information was documented regarding the main objective, the publication type, whether syndromic data were used, country, algorithm applied and context of application. Information about the papers was analysed semiquantitatively by grouping papers with equal or similar characteristics and by counting the number of papers per group. In the next step, text passages, that is, sentences or paragraphs containing key terms (study aims, algorithm description and application context) were extracted and entered into the database. If necessary, sentences before and after a statement containing the key terms were added to ensure that the meaning and context were not lost. The documentation of data about the papers and the extraction of text were conducted by one reviewer and critically rechecked by a second reviewer. Next, content analysis of the extracted text was performed. The meaning of the original text was condensed. 
The condensed statements contained as much information as necessary to adequately represent the meaning of the text in relation to the research aim, but were as short and simple as possible to enable straightforward processing. If the original text contained several pieces of information, then a separate condensed statement was created for each piece of information. To analyse the information contained in the papers, a coding scheme was developed inductively. In addition, a semantic system was developed to facilitate interpretation of algorithm performance. Values for the area under the curve (AUC) exceeding 0.90, 0.80 and 0.70, respectively, were chosen to denote very strong (outstanding), strong (excellent) and acceptable performance.15 The same limits are used to interpret the area under the weighted receiver operating characteristic curve (AUWROC) and volume under the time-ROC surface (VUTROC) metrics. Sensitivity, specificity and positive predictive value (PPV) limits were set at 0.95, 0.90 and 0.85, respectively, when weekly data were analysed, and 0.90, 0.85 and 0.80 when daily data were analysed, denoting very strong (outstanding), strong (excellent) and acceptable discriminatory performance. To interpret the strength of correlations, limit values were modified from the Cohen scale.16 This scale defines small, medium and large effect sizes as 0.10, 0.30 and 0.50, respectively. The limits for the present study were set at 0.90, 0.80 and 0.70 for analyses of weekly data, and 0.85, 0.75 and 0.65 for daily data, denoting very strong (outstanding), strong (excellent) and acceptable predictive performance. A summary of the semantic system is provided in table 1.

Table 1

Summary of semantic system used to interpret algorithm performance

	Performance
Measurement	Outstanding	Excellent	Acceptable
Outbreak detection and prediction
AUC, AUWROC, VUTROC	0.90	0.80	0.70
Sensitivity, specificity, PPV (weekly)	0.95	0.90	0.85
Sensitivity, specificity, PPV (daily)	0.90	0.85	0.80
Only outbreak prediction
Pearson's correlation (weekly)	0.90	0.80	0.70
Pearson's correlation (daily)	0.85	0.75	0.65

AUC, area under the curve; AUWROC, area under the weighted receiver operating characteristic curve; PPV, positive predictive value; VUTROC, volume under the time-ROC surface.
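
The limits in table 1 can be read as a simple lookup rule. The following is a minimal sketch of that rule, not code from the reviewed studies or from the review itself. The function name `classify`, the family labels and the use of a strict "greater than" comparison (taken from the wording "exceeding" in the Methods) are assumptions made for illustration; the label "poor" for values below the acceptable limit follows its use in table 2.

```python
# Limits copied from table 1, ordered as (outstanding, excellent, acceptable).
# The dictionary keys are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    ("auc", "any"): (0.90, 0.80, 0.70),              # AUC, AUWROC, VUTROC
    ("discrimination", "weekly"): (0.95, 0.90, 0.85),  # sensitivity, specificity, PPV
    ("discrimination", "daily"): (0.90, 0.85, 0.80),
    ("correlation", "weekly"): (0.90, 0.80, 0.70),     # Pearson's correlation
    ("correlation", "daily"): (0.85, 0.75, 0.65),
}
LABELS = ("outstanding", "excellent", "acceptable")


def classify(value, family, resolution="any"):
    """Map a metric value to a performance label using the table 1 limits.

    Assumption: a value must strictly exceed a limit to earn its label.
    """
    key = (family, "any" if family == "auc" else resolution)
    for label, limit in zip(LABELS, THRESHOLDS[key]):
        if value > limit:
            return label
    return "poor"


if __name__ == "__main__":
    print(classify(0.97, "auc"))                    # an AUWROC-type value
    print(classify(0.88, "discrimination", "weekly"))
    print(classify(0.66, "correlation", "weekly"))
```

When a study reports several metrics at once, this sketch classifies each value separately; how the labels are combined into one overall judgement is not specified here.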

Condensed statements could be labelled with more than one code. The creation of the condensed statements and their coding was carried out by one reviewer and rechecked by a second reviewer. Preliminary versions were compared and agreed upon, which resulted in final versions of the condensed statements and coding. The information about the detection and prediction algorithms was summarised qualitatively in tables and analysed semiquantitatively on the basis of the coding. The next analysis phase consisted of identifying the key dimensions of algorithm evaluations, providing a narrative account of the contribution of each dimension and explaining conflicting findings. The resulting two narratives (biodefence informatics and health policy research) are presented using descriptive statistics and narratively without quantitative pooling. In the last step, a wider research team and policy leaders (n=11) with backgrounds in public health, computer science, statistics, social sciences and cognitive science were engaged in a process of testing the findings against their expectations and experience, and their feedback was used to guide further reflection and analysis. The final report was compiled following this feedback.

Results

The search identified eight studies reporting prospective algorithm performance based on data from naturally occurring influenza outbreaks: three studies17–19 evaluated one or more outbreak detection algorithms and five20–24 evaluated prediction algorithms (figure 1).

Figure 1

Flow chart of the paper selection process. Additional reasons for exclusion (*) included that the case definition did not comprise at least a clinical diagnosis of influenza or influenza-like illness.

Regarding outbreak detection, outstanding algorithm performance was reported from a Spanish study18 for two versions of algorithms based on hidden Markov models and Serfling regression (table 2). Simple regression was reported to show poor performance in this study. The same technique displayed excellent performance on US influenza data in a study comparing algorithm performances on data from two continents, as did time-series analysis and the statistical process control method based on cumulative sum (CUSUM).19 However, the performance of these three algorithms was found to be poor to acceptable when applied on Hong Kong data in the latter study.

Table 2

Evaluation algorithms included in the metanarrative review and their absolute and relative performance

Study	Algorithm	Modification	Temporal	Absolute performance	Relative performance
Outbreak detection
Closas et al17	Kolmogorov-Smirnov test		Weekly	Acceptable (sensitivity 1.00; specificity 0.88)	No comparisons
Martínez-Beneito et al18	Markov model (hidden) V.1		Weekly	Outstanding (AUWROC 0.97–0.98)	Markov model (switching)>Markov model (hidden)>regression (Serfling)>CUSUM>regression (simple)
	Regression (Serfling)			Outstanding (AUWROC 0.93)	Markov model (switching)>Markov model (hidden)>regression (Serfling)>CUSUM>regression (simple)
	Markov model (hidden) V.2			Outstanding (AUWROC 0.93–0.95)	Markov model (switching)>Markov model (hidden)>regression (Serfling)>CUSUM>regression (simple)
	Regression (simple)			Poor (AUWROC 0.57)	Markov model (switching)>Markov model (hidden)>regression (Serfling)>CUSUM>regression (simple)
	SPC (CUSUM)			Poor (AUWROC 0.65–0.70)	Markov model (switching)>Markov model (hidden)>regression (Serfling)>CUSUM>regression (simple)
Cowling et al19	Time series, dynamic linear model	Different parameter combinations tested. W represents the assumed smoothness of the underlying system. Range: 0.025, 0.050, 0.075 or 0.100	Weekly	Hong Kong: acceptable (VUTROC 0.77, sensitivity 1.00, timeliness 1.40 weeks), with fixed specificity=0.95; USA: excellent (VUTROC 0.81, sensitivity 1.00, timeliness 0.75 weeks), with fixed specificity=0.95	Hong Kong data: time series (dynamic linear model)>regression (simple)>CUSUM; US data: time series (dynamic linear model)>CUSUM>regression (simple)
	Regression (simple)	Different parameter combinations tested. m represents the number of prior weeks used to calculate the running mean and variance. Range: 3, 5, 7 or 9		Hong Kong: acceptable (VUTROC 0.75, sensitivity 1.00, timeliness 1.72 weeks), with fixed specificity=0.95; USA: excellent (VUTROC 0.81, sensitivity 0.90, timeliness 1.45 weeks), with fixed specificity=0.95	Hong Kong data: time series (dynamic linear model)>regression (simple)>CUSUM; US data: time series (dynamic linear model)>CUSUM>regression (simple)
	SPC (CUSUM)	Different parameter combinations tested. d represents the number of weeks t separating the baseline and the index day of the outbreak. Range: 2 or 3. k represents the minimum standardised difference. Range: 1 or 2		Hong Kong: poor (VUTROC 0.56, sensitivity 0.86, timeliness 2.00 weeks), with fixed specificity=0.95; USA: excellent (VUTROC 0.90, sensitivity 0.82, timeliness 1.51 weeks), with fixed specificity=0.95	Hong Kong data: time series (dynamic linear model)>regression (simple)>CUSUM; US data: time series (dynamic linear model)>CUSUM>regression (simple)
Outbreak prediction
Timpka et al20	Shewhart type		Daily and weekly	Pandemic outbreak: poor (AUC 0.84; PPV 0.58) on a daily basis and poor (at most acceptable) (AUC 0.78; PPV 0.79) on a weekly basis; Seasonal outbreaks: outstanding (AUC 0.89; PPV 0.93) on a daily basis and excellent (AUC 0.83; PPV 1.00) on a weekly basis	No comparisons
Yuan et al21	Multiple linear regression		Monthly	NA. Limits not defined for the adjusted metrics of residuals used (APE)	No comparisons
Jiang et al22	Bayesian network		Daily	Outstanding (r=0.97, prediction on day 13; r=0.94, prediction on day 22)	No comparisons
Burkom et al23	Regression (log-linear, non-adaptive)	Non-adaptive	Daily	NA. Limits not defined for the adjusted metrics of residuals used (MAD, MedAPE)	Ten series of case count data: Holt-Winters>regression (log-linear, adaptive)>regression (log-linear, non-adaptive)
	Regression (log-linear, adaptive)	Adaptive			Ten series of case count data: Holt-Winters>regression (log-linear, adaptive)>regression (log-linear, non-adaptive)
	Holt-Winters (generalised exponential smoothing)				Ten series of case count data: Holt-Winters>regression (log-linear, adaptive)>regression (log-linear, non-adaptive)
Viboud et al24	Method of analogues (non-parametric time-series forecasting method)		Weekly	From poor (r=0.66, for 10-week-ahead prediction) to excellent (r=0.81, for 1-week-ahead prediction)	Method of analogues>autoregressive model (linear)>Stone's naive method
	Autoregressive model (linear)			From poor (r=–0.07, for 10-week-ahead prediction) to acceptable (r=0.73, for 1-week-ahead prediction)	Method of analogues>autoregressive model (linear)>Stone's naive method
	The naive method			Poor (r=–0.09, for 10-week-ahead prediction; r=0.65, for 1-week-ahead prediction)	Method of analogues>autoregressive model (linear)>Stone’s naive method

APE, absolute percentage error; AUC, area under the curve; AUWROC, area under the weighted receiver operating characteristic curve; CUSUM, cumulative sum; MAD, median absolute residual; MedAPE, median absolute percentage error; NA, not applicable; PPV, positive predictive value; SPC, statistical process control; VUTROC, volume under the time-ROC surface.
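
To make the parameter m in table 2 concrete, the sketch below flags a week as an alarm when its count exceeds the running mean of the m preceding weeks by a multiple of their standard deviation. It is a simplified illustration in the spirit of the "regression (simple)" row, not the implementation evaluated by Cowling et al. The function name `running_mean_alarms`, the multiplier `z` and the example counts are assumptions introduced here.

```python
from statistics import mean, stdev


def running_mean_alarms(counts, m=5, z=1.96):
    """Return (week index, alarm flag) pairs for weeks that have m prior weeks.

    counts: weekly case counts in time order (hypothetical input).
    m:      number of prior weeks used for the running mean and variance
            (table 2 lists 3, 5, 7 or 9); must be at least 2.
    z:      assumed multiplier on the running standard deviation.
    """
    alarms = []
    for t in range(m, len(counts)):
        window = counts[t - m:t]            # only weeks before week t
        limit = mean(window) + z * stdev(window)
        alarms.append((t, counts[t] > limit))
    return alarms


if __name__ == "__main__":
    weekly_counts = [10, 12, 11, 9, 10, 11, 40]   # made-up example series
    for week, alarm in running_mean_alarms(weekly_counts, m=5):
        print(week, alarm)
```

Because the limit for week t is computed only from weeks before t, the detector can be run prospectively. Changing `z` trades sensitivity against specificity, which is why the evaluations in table 2 report sensitivity at a fixed specificity.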

Regarding prediction algorithms, a French study predicted national-level influenza outbreaks over 18 seasons,24 observing excellent performance for a non-parametric time-series method in 1-week-ahead predictions and poor performance in 10-week-ahead predictions. A study using county-level data from the USA22 reported outstanding predictive performance for a Bayesian network algorithm. However, the predictions in that study were made on days 13 and 22 of one single ongoing outbreak. Another study using telenursing data from a Swedish county to predict influenza outbreaks over three seasons, including the H1N1 pandemic in 2009, showed outstanding performance for seasonal influenza outbreaks on a daily basis and excellent performance on a weekly basis.20 However, the performance for the pandemic was poor on a daily and on a weekly basis (see online supplementary material file). An explanation of the apparent diversity of evaluation methods and findings is that the methodological perspectives and experiences from algorithm evaluations were reported in two distinct narrative formats. These narrative formats can be interpreted to represent biodefence informatics and health policy research, respectively (table 3).

Table 3

Summary of narrative characteristics

Narrative	Storyline	Intended audience*	Learning period dilemma	Theoretical proofs	Population descriptions	End point measures
Biodefence informatics17 18 22 23	System verification	Engineers and modellers	Irregular attention	Included in argument	Summary	Various statistical
Health policy research19–21 24	System validation	Policymakers	Binding attention	Excluded	Extensive	Standard epidemiological

*In addition to researchers.

The biodefence informatics narrative

Assessments informing construction of technically and mathematically sound algorithms for outbreak detection and prediction were reported from mathematical modelling and health informatics contexts. Research in these fields was described in a biodefence informatics narrative. The setting for this narrative is formative evaluation and justification of algorithms for detection and prediction of atypical outbreaks of infectious diseases and bioterror attacks. In other words, these studies can be said to answer the system verification question: ‘Did we build the system right?’25 The narrative is set in a context where algorithms need to be modified and assured for detection and prediction of microbiological agents with unusual or unknown characteristics, for example, novel influenza virus strains or anthrax.26 The number of studies presented in the biodefence informatics narrative grew rapidly after the terrorist attacks in 2001.27 Reporting of influenza algorithm performance in this narrative is characterised by presentation of statistical or technical advancements, for example, making use of increments instead of rates or introduction of methods based on Markov models.18 Because empirical data are, for logical reasons, scarce in biodefence settings, this narrative pays limited attention to the learning period dilemma. This dilemma represents a generic methodological challenge in algorithm development, that is, the statistical associations between indicative observations and the events to be predicted are determined in one time interval (the learning period) and used to predict the occurrence of corresponding events in a later interval (the evaluation period).28 When trying to detect or predict a novel infectious agent, the learning period dilemma manifests primarily as unavailability of learning data for calibration of model-based algorithms. 
For instance, for prediction algorithms based on the reproductive number,29 series of learning data of sufficient length for empirical determination of the serial interval cannot be made available during early outbreak stages, implying that the method cannot be used as intended.30 Moreover, the microbiological features of the pathogen and the environmental conditions in effect during the learning period can change after the algorithm has been defined, requiring adjustments of algorithm components and parameters to preserve predictive performance. In the biodefence informatics narrative, algorithm performance can be verified by combining prospective evaluations with formal proofs and analyses of simulated and retrospective data. Although it is commonly emphasised that the evaluation results are preliminary with regard to population outcomes,22 the evaluation results are still included in the narrative.
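
The learning period dilemma described above can be illustrated by contrasting two ways of dividing seasons into learning and evaluation data. The sketch below is an illustration only; the function names and season labels are hypothetical and do not come from the reviewed studies. In a prospective split, the learning period always precedes the evaluation period. In a leave-one-season-out split, a common retrospective scheme, later seasons may be used to fit a model that is then tested on an earlier season.

```python
def prospective_splits(seasons, min_learning=1):
    """Yield (learning, evaluation) pairs in which learning strictly precedes evaluation.

    seasons: season labels in chronological order (hypothetical input).
    """
    for i in range(min_learning, len(seasons)):
        yield seasons[:i], seasons[i]


def leave_one_season_out_splits(seasons):
    """Yield (learning, evaluation) pairs in which each season is held out once.

    The learning set can contain seasons that occur after the held-out season.
    """
    for i, held_out in enumerate(seasons):
        yield seasons[:i] + seasons[i + 1:], held_out


if __name__ == "__main__":
    seasons = ["2006/07", "2007/08", "2008/09"]   # made-up labels
    print(list(prospective_splits(seasons)))
    print(list(leave_one_season_out_splits(seasons)))
```

With a single season of data, `prospective_splits` yields nothing, which mirrors the situation for a novel agent: no learning data exist before the first outbreak to be predicted.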

The health policy research narrative

For evaluation study results to qualify as input to recommendations regarding infectious disease control practice, they should conform to general criteria established for health policy evidence. The analyses must be unbiased and not open to manipulation, for example, the data sources and analytic models should be described and fixed before data are accessed for analyses.31 In the corresponding research paradigm, the use of prospective study designs is regarded as the cornerstone of the research process.32 Correspondingly, the studies reported in the health policy research narrative answer the validation question: ‘Have we built the right system for detection and prediction of influenza seasons and outbreaks?’ Although the studies reported in this narrative mainly used data on clinical diagnoses and from laboratory tests, the two most recent studies also employed syndromic data: one study used data from telenursing call centres20 and the other study used data from an internet search engine.21 In the health policy research narrative, the foundation in real-world validation of alerts and predictions was shown, for instance, by pointing out that usually only a small number of annual infectious disease cycles of data are available for evaluations of new algorithms, leading to a constant lack of evidence-based information on which to base policy.19 It was also shown by the space given to discussing whether algorithms would perform worse when outbreak conditions change, for example, when pandemic incidences are higher than those recorded during interpandemic periods.20 24 Moreover, evaluations presented in the health policy research narrative highlight the quantitative strength of the research evidence. For instance, in the study reporting excellent predictive performance of a non-parametric time-series method,24 the evaluation period lasted 938 weeks and covered an entire nation. 
In comparison, a prospective study reported in the biodefence informatics narrative accounted for an evaluation of a Bayesian network model22 that lasted 26 weeks and covered one US county.

Discussion

In a metanarrative review of studies evaluating the prospective performance of influenza outbreak detection and prediction algorithms, we found that methodological perspectives and experiences have, over time, been reported in two narratives, representing biodefence informatics and health policy discourse, respectively. Differences between the narratives are found in elements ranging from the evaluation settings and end point measures used to the structure of the argument. The biodefence informatics narrative, having an emphasis on verification of technically and mathematically sound algorithms, originates from the need to rapidly respond to evolving outbreaks of influenza pandemics and agents disseminated in bioterror attacks. Only more recently, studies presented in the biodefence informatics narrative have been directed to common public health problems, such as seasonal influenza and air pollution.33 Although evidence-based practices have been promoted by public health agencies during the period the assessed studies were published,34 only four prospective evaluations of influenza detection and prediction algorithms were reported as a health policy research narrative. However, despite being scarce for influenza, algorithm evaluations emphasising real-world validation of algorithm performance are relatively common for several other infectious diseases, for example, dengue fever.35 One reason for not choosing to report evaluations of influenza detection and prediction algorithms in the health policy narrative may be that the urgent quest for knowledge in association with atypical influenza outbreaks has led to an acceptance of evaluation accounts with limited empirical grounding. These accounts agree with mathematical and engineering research practices in biodefence informatics and are thus accepted as scientific evidence within those domains. 
This implies that awareness of the narrative format in which evidence is reported is essential when interpreting algorithm evaluations. This study has methodological strengths and limitations that need to be taken into account when interpreting the results. A strength is that it was based on a metanarrative review. This is a relatively new method for systematic analysis of published literature, designed for topics that have been conceptualised differently and studied by different groups of researchers.36 We found that, in a historical perspective, researchers from different paradigms have evaluated algorithms for influenza outbreak detection and prediction with different means and purposes. Some researchers have conceptualised algorithm evaluations as an engineering discipline, others as a subarea of epidemiology. The intention was not to formulate recommendations for algorithm use. Instead, the aim was to summarise different perspectives on algorithm development and reporting in overarching narratives, highlighting what different researchers might learn from one another's approaches. Regarding the limitations of the review, it must be taken into consideration that the ambition was to base the narrative analysis on evaluations with relevance for operational readiness and real-world application. There is a possibility that we failed to identify some relevant evaluations due to the absence of specific indexing terms for infectious disease detection and prediction methods and that we excluded studies that were not indexed in research databases. However, we believe that the probability that we missed relevant evaluations for these reasons is low. We initially identified 1084 studies, of which 116 had relevant abstracts. Following examination of the corresponding articles, the majority had to be excluded from the final review because they did not fulfil the inclusion criteria at the detailed level (figure 1). 
One overall interpretation of this finding is that more research activity had been associated with developing detection and prediction algorithms than with evaluating them and carefully reporting the results. For instance, a large number of interesting studies had to be excluded because non-prospective data were used for the evaluations, for example, the models were developed from learning data and evaluated against out-of-sample verification data from the same set using a leave-one-season-out approach.37 38 Regarding prediction algorithms, numerous potentially interesting studies were excluded because they did not report standard evaluation metrics. One example is a prospective Japanese study of predictions conducted during the pandemic outbreak in 2009, which reported only descriptive results.39 We found no prospective algorithm evaluations that applied integrated outbreak detection and prediction. An Australian study applied an algorithm including detection and prediction functions,40 but this study used simulated data for the evaluation. Nonetheless, the eligibility criteria applied in this review accepted syndromic definitions of influenza as the gold standard, that is, specified sets of symptoms not requiring laboratory confirmation for diagnosis.41 If laboratory-confirmed diagnosis of influenza had been required by the criteria, almost no studies would have qualified for inclusion in the review. In summary, two narratives for reporting influenza detection and prediction algorithm evaluations have been identified. In the biodefence informatics narrative, technical and mathematical verification of algorithms is described, while the health policy narrative is employed to allow conclusions to be drawn about public health policy. A main dissimilarity between the narratives is the attention paid to the learning period dilemma. 
This dilemma represents a generic methodological challenge in the development of biosurveillance algorithms; the statistical models used to detect or predict an influenza-related event must be determined in a preceding time interval (the learning period). This means that there is always a shortage of time when algorithms for novel infectious diseases are to be validated in real-world settings. We offer two suggestions for future research and development based on these results. First, a sequence of evaluation research phases interconnected by a translation process should be defined, starting from theoretical research on construction of new algorithms in the biodefence informatics setting and proceeding stepwise to prospective field trials performed as health policy research. In the latter setting, the evaluation study design should be registered in an international trial database, such as ClinicalTrials.gov, before the start of prospective data collection. Second, standardised and transparent reporting criteria should be formulated for all types of algorithm evaluation research. The recent development of consensus statements for evaluations of prognostic models in clinical epidemiology42 can here be used as a reference.

35 in total

Review 1. Assessing a decade of public health preparedness: progress on the precipice?

Authors: Elin A Gursky; Gregory Bice
Journal: Biosecur Bioterror Date: 2012-03

Review 2. Evidence based public health: A review of the experience of the National Institute of Health and Clinical Excellence (NICE) of developing public health guidance in England.

Authors: Michael Kelly; Antony Morgan; Simon Ellis; Tricia Younger; Jane Huntley; Catherine Swann
Journal: Soc Sci Med Date: 2010-07-14 Impact factor: 4.634

3. Bayesian prediction of an epidemic curve.

Authors: Xia Jiang; Garrick Wallstrom; Gregory F Cooper; Michael M Wagner
Journal: J Biomed Inform Date: 2008-06-13 Impact factor: 6.317

4. Prediction and surveillance of influenza epidemics.

Authors: Justin R Boyle; Ross S Sparks; Gerben B Keijzers; Julia L Crilly; James F Lind; Louise M Ryan
Journal: Med J Aust Date: 2011-02-21 Impact factor: 7.738

5. Real-time estimation and prediction for pandemic A/H1N1(2009) in Japan.

Authors: Yasushi Ohkusa; Tamie Sugawara; Kiyosu Taniguchi; Nobuhiko Okabe
Journal: J Infect Chemother Date: 2011-03-09 Impact factor: 2.211

Review 6. Emergence of viral diseases: mathematical modeling as a tool for infection control, policy and decision making.

Authors: Derrick Louz; Hans E Bergmans; Birgit P Loos; Rob C Hoeben
Journal: Crit Rev Microbiol Date: 2010-08 Impact factor: 7.624

7. Requirements and design of the PROSPER protocol for implementation of information infrastructures supporting pandemic response: a Nominal Group study.

Authors: Toomas Timpka; Henrik Eriksson; Elin A Gursky; Magnus Strömgren; Einar Holm; Joakim Ekberg; Olle Eriksson; Anders Grimvall; Lars Valter; James M Nyce
Journal: PLoS One Date: 2011-03-28 Impact factor: 3.240

Review 8. What have we learned from the novel influenza A (H1N1) pandemic in 2009 for strengthening pandemic influenza preparedness?

Authors: José Santos-Preciado; Carlos Franco-Paredes; Isabel Hernandez-Flores; Ildefonso Tellez; Carlos Del Rio; Roberto Tapia-Conyer
Journal: Arch Med Res Date: 2010-01-29 Impact factor: 2.235

Review 9. Use of unstructured event-based reports for global infectious disease surveillance.

Authors: Mikaela Keller; Michael Blench; Herman Tolentino; Clark C Freifeld; Kenneth D Mandl; Abla Mawudeku; Gunther Eysenbach; John S Brownstein
Journal: Emerg Infect Dis Date: 2009-05 Impact factor: 6.883

10. Estimation of the reproductive number and the serial interval in early phase of the 2009 influenza A/H1N1 pandemic in the USA.

Authors: Laura Forsberg White; Jacco Wallinga; Lyn Finelli; Carrie Reed; Steven Riley; Marc Lipsitch; Marcello Pagano
Journal: Influenza Other Respir Viruses Date: 2009-11 Impact factor: 4.380
