Rajib Paul (1), Dan Han (2), Elise DeDoncker (3), Diana Prieto (4, 5). 1. Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA. 2. Department of Mathematics, University of Louisville, Louisville, Kentucky, USA. 3. Department of Computer Science, Western Michigan University, Kalamazoo, Michigan, USA. 4. Carey School of Business, Johns Hopkins University, Baltimore, Maryland, USA. 5. School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile.
Abstract
Real-time trends from surveillance data are important to assess and develop preparedness for influenza outbreaks. The overwhelming testing demand and limited capacity of testing laboratories for viral positivity render daily confirmed case data inaccurate and delay their availability for preparedness planning. Using Bayesian dynamic downscaling models, we obtained posterior estimates for daily influenza incidences from weekly estimates of the Centers for Disease Control and Prevention and daily reported constitutional and respiratory complaints during emergency department (ED) visits obtained from the state health departments. Our model provides one-day and seven-day lead forecasts along with 95% prediction intervals. Our hybrid Markov Chain Monte Carlo and Kalman filter algorithms facilitate faster computation and enable us to update our estimates as new data become available. Our method is tested and validated using State of Michigan data over the years 2009-2013. Reported constitutional and respiratory complaints at the EDs showed strong correlations of 0.81 and 0.68, respectively, with influenza rates. In general, our forecast model can be adapted to track an outbreak with only one respiratory virus as a causative agent.
Every year, an estimated one billion individuals worldwide fall sick with flu-like symptoms,
and between and individuals die from illnesses associated with seasonal influenza.
When viruses mutate, immunity mechanisms fall short in responding to the mutation, which may yield a rapid spread of pandemic proportions, like the H1N1 outbreak in 2009. Due to the continuously changing and fast-spreading nature of respiratory viruses, the resulting diseases pose constant threats to human health.

Surveillance systems are key to addressing uncertainties in disease detection, monitoring, and control for influenza.
However, due to disparities in reporting mechanisms, daily data are often noisy, subject to reporting delays, and unreliable. Traditional surveillance occurs passively when symptomatic cases report their symptoms to the healthcare systems. As an example, the state of Michigan works with three different sources to collect data from symptomatic cases: (1) the Michigan Syndromic Surveillance System (MSSS), which collects daily chief complaint data from individual registrations in emergency departments (EDs) and urgent cares, (2) the Michigan component of the Centers for Disease Control and Prevention (CDC) ILINet (influenza-like illness network), which collects weekly aggregates from primary and urgent care practitioners, and (3) the Michigan Disease Surveillance System (MDSS), which captures weekly totals and individual case reports from EDs, primary, and urgent care providers, as well as any other type of individual submission (eg, from schools or flu testing labs). Although non-traditional data sources have been proposed for influenza surveillance (from Google search queries, Twitter messages, Facebook, Wikipedia article reviews, restaurant reservations, non-prescription pharmacy sales, etc.), there remain challenges in system validation and implementation, which leave traditional surveillance as the most reliable source for uncertainty management.

Current research on influenza surveillance points to real-time forecasting of influenza incidence as an ideal tool for uncertainty management and resource planning during influenza outbreaks.
A handful of forecasting studies appear in the literature; their forecasts vary in estimation method, timeframe, setting, outcome, and data type, and their scope is reviewed by Chretien et al and Nsoesie et al. Since the 2013/2014 influenza season, the CDC has organized a challenge in which modeling groups submit real-time weekly forecasts of influenza incidence. A recent cross-validation study compares the performance of 22 of the models producing forecasts for the CDC challenge.

Due to the wide variability across seasons and the unpredictable nature of pandemic outbreaks, several modeling groups have accommodated uncertainty in their prediction models, a feature that is increasingly prevalent in the more recent models with demonstrated real-time usefulness. Approaches include Bayesian networks with probabilistic nodes, empirical Bayes, mechanistic models coupled with Bayesian-based filtering procedures to assimilate data, dynamic Bayesian hierarchical models embedding a mechanistic model, kernel density estimation to model autoregressive dependency on previous observations, and kernel density estimation of incidence distributions for the forward-looking weeks of the season, with copulas to model the dependence in incidence across different weeks. There are also superensemble methods that combine the predictions of multiple models into a single estimate. Most of these superensembles also incorporate probabilistic components.

Using a continuous-time SIR (susceptible-infected-recovered) modeling framework and simulated data, Rhodes et al
provided a strategy for detecting the start date of an epidemic and predicting the behavior of the epidemic on a daily scale. Roberts et al also used an SIR modeling framework to estimate prevalence and incidence during an epidemic, as well as its peak. Lytras et al used a flexible Bayesian model to predict probabilities for five distinct epidemic phases: pre-epidemic, epidemic growth, epidemic plateau, epidemic decline, and post-epidemic. While predicting epidemics and understanding their behavior is paramount, very little work has addressed how to track the behavior of influenza viruses on a daily basis using the resources available to health departments.

Most of the existing forecasting methods are designed to predict on a weekly basis.
This is the case because publicly available data sources (eg, influenza incidences Sentinel Network and Google Flu Trends) either accumulate case counts by week or need to refine their daily counts as more data become available. In addition, many methods are designed for seasonal flu, for which weekly resolution may suffice for planning purposes. The situation is different with pandemic outbreaks: daily incidence rates may grow exponentially, decisions carry a high cost-benefit trade-off, and daily forecasts can help to detect and stop chains of transmission in time (eg, by closing schools and workplaces) that would be harder to control if observed on a weekly basis. Of the four models in the literature that provide forecasts on a daily basis, three assumed data that were collected daily and included all daily cases of the outbreak analyzed. Some improvements are presented in Jiang et al, which assumed incomplete daily counts and used an additional data trend (daily ER chief complaints of respiratory symptoms) to complete the daily estimate. However, we were unable to find articles that fuse data with multiple time resolutions.

Our work focuses on a tool for dynamic downscaling from weekly to daily rates using data resources that are already available to state health departments, and provides one-day and seven-day forecasts of influenza incidence that county and state health departments can use for timely detection of fast-growing transmission chains. Dynamic downscaling is a numerical procedure in which data collected at a coarser time resolution (eg, weekly scale) are used to interpolate at a finer resolution (eg, daily scale). Dynamic downscaling methods are popular in regional climate predictions based on information at a coarser scale from general circulation models.
However, dynamic downscaling has rarely been applied to disease predictions and forecasts.

Our modeling framework is purely data-driven and requires few, if any, of the strong assumptions that are often needed for SIR/SEIR models. In addition to using prior seasonal outbreak data to predict the incidence in upcoming flu seasons, our model fuses data collected in real time, which broadens the applicability of our approach to single-event pandemic outbreaks.

Our model fuses two different data sources: (1) weekly confirmed flu cases from the CDC, as reported based on the World Health Organization (WHO) and the National Respiratory and Enteric Virus Surveillance System (NREVSS), and (2) MSSS data on daily reported symptoms aggregated in two categories: constitutional symptoms (ie, fever, headache, malaise, fatigue, and diarrhea) and respiratory symptoms (ie, shortness of breath or difficulty breathing, cough, sore throat, and runny nose). Constitutional and respiratory symptoms can characterize other diseases, but are a strong indicator of influenza-like illness (ILI) when appearing concurrently. We assume that the true underlying daily influenza incidence trend is a latent variable that cannot be observed, but these datasets provide information as predictors. Hybrid Markov Chain Monte Carlo (MCMC) and Kalman filter algorithms are developed for model fitting. Thus, the estimate of the true underlying daily influenza incidence trend is a type of shrinkage estimate between the weekly CDC rates and the reported daily respiratory and constitutional symptoms.

The remainder of this article is organized as follows: Section 2 gives details on our datasets; Section 3 provides the modeling framework; Section 4 presents the results along with model evaluation and sensitivity analysis; Sections 5 and 6 present our discussion and conclusions.
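As a point of reference for the downscaling idea introduced above, the following minimal Python sketch spreads each weekly count uniformly over seven days and then moves to the log scale. It is only the naive baseline that a model-based approach improves upon, not the Bayesian model developed in this article; the function names `uniform_downscale` and `to_log_rate`, the example counts, and the population size are hypothetical.

```python
import math

DAYS_PER_WEEK = 7


def uniform_downscale(weekly_counts):
    """Split each weekly count evenly across the seven days of that week."""
    daily = []
    for count in weekly_counts:
        daily.extend([count / DAYS_PER_WEEK] * DAYS_PER_WEEK)
    return daily


def to_log_rate(count, population, per=100_000):
    """Log of the rate per `per` people; assumes count > 0."""
    return math.log(count / population * per)


# Hypothetical weekly confirmed counts for three consecutive weeks.
weekly = [70.0, 140.0, 35.0]
daily = uniform_downscale(weekly)
daily_log_rates = [to_log_rate(c, population=10_000_000) for c in daily]
```

By construction, the seven daily values of each week add back up to the weekly count, so this baseline carries no within-week information; any day-to-day structure has to come from an additional daily data source.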
DATA
Weekly incidence data were retrieved from the CDC, which aggregates the reports from the NREVSS for the state of Michigan, and the WHO (https://www.who.int). The NREVSS receives data on the number of cases tested for influenza and the number of laboratory-confirmed influenza cases from a network of clinics and hospitals, and then reports these both unweighted and weighted against the state's population. The WHO's Global Influenza Surveillance and Response System (GISRS), a global network of laboratories that examine influenza-focused disease trends and viral surveillance, also contributes to the NREVSS (cases tested and confirmed influenza cases).

On the daily scale, IRB-exempted MSSS data were obtained from the Michigan Department of Health and Human Services (MDHHS). The MSSS data included constitutional and respiratory symptoms of patients attending EDs in Michigan from 1 January 2009 until 25 December 2013. The MSSS automatically captures descriptive features of symptoms from the chief complaint data in the EDs and classifies them into symptom categories, of which constitutional and respiratory are two. Hence, not all the reports in either the constitutional or respiratory categories correspond to an influenza case. In addition, there is underreporting, as only around 61% of Michigan hospitals were participating in the MSSS at the time of data retrieval (exact numbers of participating hospitals were not provided by the MDHHS, so underreporting rates could not be calculated). Nevertheless, this information serves as an important predictor for estimating the daily influenza incidences.
BAYESIAN DYNAMIC DOWNSCALING MODEL
Combining disparate data sources to improve estimates at a granular level is widely popular in survey statistics and small area estimation. For example, Rose et al combined small area census records and household microdata via an iterative proportional algorithm to obtain refined estimates of infant mortality and household attributes (such as the percentage of houses with electricity, tap water, etc.) in Bangladesh provinces. Multilevel regression and poststratification techniques were used by Zhang et al on Behavioral Risk Factor Surveillance System data, the Missouri County-Level Study, and American Community Survey data to obtain county-level estimates of health indicators such as chronic obstructive pulmonary disease and uninsured rates. Spatial downscaling approaches using regression-based models, which are widely used in the geosciences, also incorporate data fusion techniques based on exogenous variables. However, it is imperative to keep in mind that validity and sensitivity analyses based on scientific reasoning are essential to avoid bias and “garbage in, garbage out” scenarios. In what follows, we describe a Bayesian dynamic downscaling and nowcasting procedure based on data fusion and semiparametric approaches.

Our analyses were conducted on the log-transformed scale, which means that all rates were modeled as log rates. The quantity of interest is the influenza incidence for each day of each week. This quantity is latent, as it cannot be read directly from the available data, and our goal is to obtain it from the weekly CDC influenza incidence, the daily constitutional rates, and the daily respiratory rates. We also provide a mechanism for nowcasting the influenza incidence. We adopt Bayesian methods for their natural benefits of hierarchical modeling, forecasting, and prediction intervals. The structure of our hierarchical model comprises four levels, together with prior distributions for the parameters.
In this hierarchy, a vector of coefficients maps the weekly observations to the latent daily influenza incidences; a priori, we assume that incidence is uniformly distributed across the week and set the coefficient values accordingly. Prior distributions are specified for all unknown parameters estimated in the model.

At the top level of our modeling framework, we link the weekly CDC rates with the downscaled daily influenza incidences through the matrix whose elements are the daily influenza incidences during a week, plus white noise that follows a zero-mean Gaussian distribution with its own variance.

We assume Bayesian Structural Time Series (BSTS) models for the daily constitutional and respiratory rates. The BSTS model decomposes a time series into trend, autoregression, seasonal component, cyclical component, and white noise. Specifically, we consider a structural time series model in state-space form.
In this model, each time point is a day within a week; the unobserved underlying trend of the time series evolves through an autoregressive process; a slope term, characterized by a random walk, controls a steady upward or downward movement of the time series and imposes a drift on the structural model for the trend; and a seasonal component is defined over a fixed number of seasons. The four noise terms of the model are independent zero-mean Gaussian white noise processes, each with its own variance parameter. Note that we excluded the cyclical component from the BSTS model in Equation (2). In our datasets, influenza outbreaks exhibit strong seasonal variation but no cyclical or periodic variation outside the regular season.

At the next level of our hierarchical model, we characterize the relation between the latent daily influenza incidence and the mean respiratory and constitutional rates through an unknown function of the trends and seasonal components of constitutional and respiratory symptoms, which is specified semiparametrically using basis functions. The knots of the basis, selected on the bivariate domain, control the smoothness of this function. The knots were selected using exploratory analyses based on a Clara algorithm, where we consider a uniform distribution of CDC daily confirmed rates by dividing the weekly rates by an initial value of seven. The Clara algorithm selected the knots based on a two-dimensional surface of the observed daily time series of constitutional and respiratory rates via restricted maximum likelihood estimation using connections between penalized splines and linear mixed models. Clara algorithms are available in the R packages SemiPar and cluster.
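A semiparametric surface of the kind described above can be illustrated with a low-rank thin-plate spline in two dimensions. The Python sketch below is written under assumptions: the radial basis r^2 log r, the function names, the knot locations, and all coefficient values are hypothetical, and the pairing of the linear coefficients with constitutional and respiratory rates is assumed. Knot selection is not shown.

```python
import math


def tps_basis(dist):
    """Thin-plate radial basis in two dimensions: r^2 * log(r), with value 0 at r = 0."""
    return 0.0 if dist == 0.0 else dist * dist * math.log(dist)


def surface(c, r, beta, knots, u):
    """Evaluate f(c, r) = beta0 + beta1*c + beta2*r + sum_k u_k * b(||(c, r) - knot_k||).

    c, r  : log constitutional and log respiratory rates (assumed inputs)
    beta  : three coefficients of the linear part
    knots : list of (c_k, r_k) knot locations
    u     : one spline coefficient per knot
    """
    value = beta[0] + beta[1] * c + beta[2] * r
    for (ck, rk), uk in zip(knots, u):
        value += uk * tps_basis(math.hypot(c - ck, r - rk))
    return value


# Hypothetical knots and coefficients, for illustration only.
beta = (-9.0, 2.0, 0.5)
knots = [(4.0, 4.5), (5.0, 5.5), (6.0, 6.5)]
u_zero = [0.0, 0.0, 0.0]
u_bump = [0.1, -0.2, 0.1]

linear_only = surface(5.0, 5.5, beta, knots, u_zero)
with_spline = surface(5.0, 5.5, beta, knots, u_bump)
```

With all spline coefficients at zero the surface reduces to its linear part; the knot terms then bend that plane locally, and the number and placement of knots govern how much bending is possible.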
The white noise at this level follows a zero-mean Gaussian distribution with its own variance.

We use Markov Chain Monte Carlo methods and Kalman filter techniques to fit our proposed model to the data. Under our model assumptions, the weekly CDC rate follows a univariate Gaussian distribution, and the vector of daily influenza incidences within a week follows a multivariate Gaussian distribution whose mean vector collects the function values computed over that week. The full conditional distribution for generating the daily influenza incidences of a week is therefore multivariate Gaussian. Thus, the downscaled daily ILI trends are shrinkage estimates between the function values and the weekly CDC rates. Note that the function depends on the constitutional and respiratory rates after distilling their measurement errors.

We impose improper priors on the real line for all variance parameters, which result in proper inverse-gamma full conditional distributions. For the variance at the daily level, the shape and scale parameters depend on the vector of all daily downscaled influenza incidences over our study period; similarly, in our MCMC algorithm, the variance at the weekly level was generated from an inverse-gamma distribution whose parameters depend on the number of weeks.

For the coefficients in the semiparametric regression, we impose a multivariate Gaussian prior. The resulting full conditional distribution is Gaussian.
The full conditional distributions for the mean respiratory and constitutional rates are unavailable in closed form due to the nonlinearity in the semiparametric regression, Equation (7). We use a hybrid method that comprises a Kalman filter and a Metropolis-Hastings step to sample these parameters. We now focus on the Bayesian Kalman filter algorithm that is widely used for BSTS models.

We rewrite Equations (2)-(4) for constitutional and respiratory rates in state-space form, Equations (8) and (9). The state noise is a zero-mean multivariate Gaussian random vector with a diagonal covariance matrix. In Equation (8), the observation operator is built from a vector whose first element is 1 and whose remaining elements are 0, and the accompanying matrix is assembled from two blocks. In Equation (9), the matrix is block diagonal, with one block an identity matrix of rank 2 and another built from a vector whose first element is one and whose remaining entries are zero.

A Kalman filter algorithm is used, assuming that the initial state follows a Gaussian distribution. Given the observations up to the current time, the Kalman filter equations yield the mean and covariance matrix of the state vector. The state vector is then sampled from a Gaussian distribution with this mean and covariance matrix. This distribution serves as the proposal distribution for the Metropolis-Hastings step that samples the trend and seasonal components of the constitutional and respiratory rates. Pseudocode for the algorithms is provided in Algorithms 1 and 2.
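To make the filtering step concrete, here is a minimal Python sketch of a Kalman filter for a simplified structural model with a level and a slope only. It is not Algorithm 1 or 2: the autoregressive trend, the seasonal states, and the Metropolis-Hastings step are omitted, and the function name, argument names, and all numerical values are hypothetical.

```python
def kalman_local_linear_trend(ys, obs_var, level_var, slope_var,
                              init_mean=(0.0, 0.0), init_var=1e4):
    """Filter a series with state (level, slope) and scalar observations.

    State:        level_t = level_{t-1} + slope_{t-1} + noise(level_var)
                  slope_t = slope_{t-1} + noise(slope_var)
    Observation:  y_t = level_t + noise(obs_var)
    Returns the filtered (level, slope) means and the final covariance.
    """
    m0, m1 = init_mean
    p00, p01, p10, p11 = init_var, 0.0, 0.0, init_var
    filtered = []
    for y in ys:
        # Predict: x = F x and P = F P F' + Q, with F = [[1, 1], [0, 1]].
        m0, m1 = m0 + m1, m1
        p00, p01, p10, p11 = (p00 + p01 + p10 + p11 + level_var,
                              p01 + p11,
                              p10 + p11,
                              p11 + slope_var)
        # Update with H = [1, 0]; a missing observation keeps the prediction.
        if y is not None:
            innov = y - m0
            s = p00 + obs_var
            k0, k1 = p00 / s, p10 / s
            m0, m1 = m0 + k0 * innov, m1 + k1 * innov
            p00, p01, p10, p11 = (p00 - k0 * p00,
                                  p01 - k0 * p01,
                                  p10 - k1 * p00,
                                  p11 - k1 * p01)
        filtered.append((m0, m1))
    return filtered, ((p00, p01), (p10, p11))


# Hypothetical noiseless ramp: the filter should recover level and slope.
ys = [2.0 + 0.5 * t for t in range(30)]
filtered, final_cov = kalman_local_linear_trend(
    ys, obs_var=1.0, level_var=1e-4, slope_var=1e-4)
level, slope = filtered[-1]
```

Each new observation only requires one predict and one update step, which is what allows estimates to be refreshed as daily data arrive. In a hybrid sampler, the filtered mean and covariance could parameterize a Gaussian proposal for the state.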
RESULTS
Real data analysis
Recall that our primary goal is to obtain influenza incidence trends on a daily scale, based on the weekly influenza incidence trends reported by the CDC and the daily constitutional and respiratory trends reported by the MDHHS. We also obtain prediction intervals for the downscaled influenza incidences.

Because we fitted a semiparametric model, the nonparametric function estimation involves a large number of parameters. In Table 1, we show only the posterior estimates of the parameters from the linear portion of the function and of all variance parameters, summarized by posterior medians, means, and credible intervals. According to our model, the function is the a priori mean of the downscaled ILI rates on the log scale. In Figure 1, we plot the posterior median of the function with respect to log-transformed respiratory and constitutional rates. This plot shows an overall increasing trend of the mean surface of daily downscaled ILI rates with constitutional and respiratory rates.
TABLE 1
Posterior medians, means, and credible intervals (LB: lower bound and UB: upper bound) for model parameters

Parameter   Median        Mean          95% CrI LB     95% CrI UB
β0          −9.12         −9.11         −10.09         −8.09
β1          2.26          2.26          2.12           2.41
β2          0.44          0.44          0.26           0.6
σz          0.24          0.12          1.9 × 10⁻⁴     0.25
σy          0.27          0.27          0.26           0.28
σϵ,C        0.05          0.05          0.04           0.07
ση,C        0.06          0.06          4.9 × 10⁻³     0.06
σν,C        1.9 × 10⁻⁴    3.9 × 10⁻⁴    9.2 × 10⁻⁵     4.3 × 10⁻³
σξ,C        0.015         0.02          2.1 × 10⁻⁴     0.07
σϵ,R        0.04          0.04          0.03           0.08
ση,R        0.07          0.07          4.3 × 10⁻³     0.08
σν,R        2.2 × 10⁻⁴    4.1 × 10⁻⁴    1.01 × 10⁻⁴    3.5 × 10⁻³
σξ,R        2.6 × 10⁻³    7.6 × 10⁻³    1.6 × 10⁻⁴     0.04
FIGURE 1
Posterior median of the function with respect to constitutional and respiratory rates

Figure 2 shows four time series, three of which are on the daily scale and one on the weekly scale. The green and greenish blue lines indicate daily reported respiratory and constitutional log-transformed rates per 100 000 population. The red dots indicate weekly log-transformed CDC influenza incidences per 100 000 population. From these three time series, one can clearly see the seasonal patterns. The 2009 H1N1 pandemic is also clearly visible. The Pearson correlations between weekly influenza incidences and weekly constitutional and respiratory rates are 0.81 and 0.68, respectively. These high correlation values confirm that constitutional and respiratory rates are strong predictors of influenza incidences. Whenever there is a peak in weekly influenza incidences, we also see peaks in constitutional and respiratory rates.
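The comparison above requires bringing the daily symptom series to the weekly scale before correlating it with the weekly incidences. A minimal Python sketch of that step follows; the function names and the example numbers are hypothetical, and the log transformation used in the analyses is left out for brevity.

```python
import math


def weekly_totals(daily, days_per_week=7):
    """Sum consecutive blocks of seven daily values; drops an incomplete final week."""
    n_weeks = len(daily) // days_per_week
    return [sum(daily[w * days_per_week:(w + 1) * days_per_week])
            for w in range(n_weeks)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily symptom counts (three weeks) and weekly incidences.
daily_symptoms = [3, 4, 5, 4, 3, 2, 3,
                  6, 7, 9, 8, 7, 6, 5,
                  4, 4, 5, 5, 4, 3, 3]
weekly_incidence = [10.0, 22.0, 13.0]

weekly_symptoms = weekly_totals(daily_symptoms)
r = pearson(weekly_symptoms, weekly_incidence)
```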
FIGURE 2
Analyses were conducted in log‐scale. Downscaled daily log influenza incidences are denoted by black lines, red dots denote the log CDC reported weekly influenza incidences, the gray shades indicate prediction intervals, green lines indicate log daily respiratory rates, and greenish blue lines indicate log daily constitutional rates
The black lines in Figure 2 indicate the posterior medians of the downscaled daily influenza incidences, with gray-shaded prediction intervals. The downscaled time series follows patterns similar to those of the weekly CDC rates. During the 2009 pandemic, there are many confirmed cases in a typical week. For example, based on our downscaled influenza incidence trends, between 12 October 2009 and 28 November 2009, 43 741 people were affected, an average of 1121 people per day. During a regular flu season, between 1 January 2012 and 30 April 2012, 25 084 people were affected, an average of 207 people per day. Note that our downscaled series have several desirable properties: (1) they provide some degree of shrinkage, and (2) they are more stable and less noisy. If we look closely at the time period between 1 January 2013 and 1 March 2013, we see that the respiratory rates are substantially high; however, our estimates are balanced by the other two time series and are less affected by the noise present in the respiratory data during that period.

We set aside the last 15 days of data and predicted for those days using the remaining 1805 days of data. Our predicted daily influenza incidence trends (blue lines) and the associated prediction intervals (gray shades) are shown in Figure 3. The weekly CDC data are also shown (red dots). Comparing the CDC data with the predictions, we can clearly see that our predictions follow similar patterns. Additionally, the widths of the prediction intervals indicate that the model is able to capture the uncertainties associated with the data.
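The hold-out arithmetic above can be checked directly from the study dates given in the DATA section. The short Python sketch below does so and shows one way to split a daily series; `split_holdout` and the placeholder series are hypothetical.

```python
from datetime import date, timedelta

# Study period of the MSSS data as stated in the DATA section.
start, end = date(2009, 1, 1), date(2013, 12, 25)
n_days = (end - start).days + 1          # both end points included
holdout = 15                             # last days set aside for prediction

n_train = n_days - holdout
first_holdout_day = end - timedelta(days=holdout - 1)


def split_holdout(series, holdout):
    """Return (training part, held-out tail) of a daily series."""
    return series[:-holdout], series[-holdout:]


# Hypothetical placeholder series with one value per study day.
series = list(range(n_days))
train, test = split_holdout(series, holdout)
```

The study period spans 1820 days, so withholding the final 15 leaves the 1805 training days mentioned in the text.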
FIGURE 3
Analyses were conducted in log‐scale. Downscaled daily log influenza incidences are denoted by black lines, red dots denote the CDC reported weekly log influenza incidences, the gray shades indicate prediction intervals, green lines indicate daily log respiratory rates, and greenish blue lines indicate daily log constitutional rates. The blue lines indicate 14 day predictions for log influenza incidences
Since there is no gold standard to validate our model, we conducted a sensitivity analysis by setting aside portions of the data, and cross-validation by comparing the downscaled influenza incidences obtained with full data and with partial data.

In Figure 4A, we set aside data for 90 days from 1 January 2011 through 1 April 2011. The orange lines show the downscaled influenza incidences with partial data and the black lines show the downscaled influenza incidences with complete data. The red dots are weekly CDC confirmed cases. Both downscaled time series are close enough to assure us that the model developed from the three time series provides meaningful dynamic downscaling and predictions. As a further check, we also set aside data from peaks and dips of the time series, obtained downscaled influenza incidence trends for those missing portions, and compared them with the downscaled influenza incidences obtained with full data. Figure 4B singles out a dip in the time frame of June through the end of August 2011, and Figure 4C does the same for a peak in the range from January through the end of June 2012. In each case, we find that the predictions show meaningful trends when compared with the CDC data.
FIGURE 4
Analyses were conducted in log scale. In all figures, log downscaled daily influenza incidences are denoted by dark purple lines, red dots denote the CDC reported weekly log influenza incidences, the gray shades indicate prediction intervals, green lines indicate daily log respiratory rates, and greenish blue lines indicate daily log constitutional rates. The orange lines indicate the predicted daily log influenza incidences for the withheld time period
Simulation results
Since we do not have daily influenza incidence rates in the real dataset, we simulated these rates and used them for model fitting. We generated daily constitutional and respiratory rates using the BSTS model described in Equations (2)-(5). The generated respiratory and constitutional cases were used in Equation (6) to obtain daily cases. Weekly rates were then generated by adding the daily cases over a week and dividing by population size. We simulated 30 unique datasets with fixed values of the variance parameters. The initial values of the constitutional and respiratory series were set to the observed constitutional and respiratory rates at time 0. The level and slope parameters were chosen from one realization of the MCMC based on real data after burn-in. The noise parameters were chosen to avoid overlap in the seasonal perturbations of constitutional and respiratory rates.

We fitted our models to these 30 unique simulated datasets. The boxplot in Figure 5 summarizes the mean squared errors (MSE) from these 30 simulated datasets in log rates per million population. The MSE was calculated using the differences between posterior medians and simulated daily ILI rates. Figure 6 shows the results for six of the 30 simulated datasets, where the estimated (posterior median) daily log influenza incidence rates (in black) closely resemble the simulated log daily rates (in red). These figures indicate reasonable model fitting with acceptable uncertainty estimates as measured by prediction intervals. We also used posterior predictive checks to assess model fitting.
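The first step of such a simulation study can be sketched in Python as below. This is a simplified illustration under stated assumptions: the trend is a local linear trend rather than the autoregressive trend of the article, the seasonal period of seven is assumed, and the function names and all parameter values are hypothetical. The mapping from symptoms to incidence and the weekly aggregation are not shown.

```python
import random


def simulate_structural_series(n_days, level0, slope0, sd_obs, sd_level,
                               sd_slope, sd_season, n_seasons=7, seed=0):
    """Simulate a daily series as level + seasonal + observation noise.

    The level follows a local linear trend, and the seasonal terms are
    constrained to sum to noise over any n_seasons consecutive days.
    """
    rng = random.Random(seed)
    level, slope = level0, slope0
    season = [0.0] * (n_seasons - 1)      # most recent seasonal terms first
    series = []
    for _ in range(n_days):
        new_season = -sum(season) + rng.gauss(0.0, sd_season)
        series.append(level + new_season + rng.gauss(0.0, sd_obs))
        season = [new_season] + season[:-1]
        level = level + slope + rng.gauss(0.0, sd_level)
        slope = slope + rng.gauss(0.0, sd_slope)
    return series


def mean_squared_error(estimates, truth):
    """Average squared difference between two equal-length sequences."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


# Hypothetical parameter values on the log scale.
simulated = simulate_structural_series(
    n_days=28, level0=4.0, slope0=0.01, sd_obs=0.05,
    sd_level=0.06, sd_slope=2e-4, sd_season=0.015)
noiseless = simulate_structural_series(
    n_days=28, level0=4.0, slope0=0.01, sd_obs=0.0,
    sd_level=0.0, sd_slope=0.0, sd_season=0.0)
```

Setting every noise standard deviation to zero reduces the series to a straight line, which is a convenient sanity check before noise is switched on. `mean_squared_error` mirrors the comparison between posterior medians and simulated daily rates.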
FIGURE 5
Boxplot of mean squared error from 30 simulated datasets
FIGURE 6
Results from 6 simulations out of 30 simulations done. Black line: Estimated downscaled daily log influenza incidences per 100 000 million. Red dotted line: Simulated log daily influenza incidences per 100 000 million. Gray shades: prediction intervals. Green line: Simulated log daily respiratory rates. Greenish blue line: Simulated log daily constitutional rates
DISCUSSION
In this article, we present a dynamic downscaling method for estimating daily influenza incidences from weekly data. These estimates can support preparedness by state and county health systems during flu emergencies. In the United States, the healthcare burden of flu is high, and county health departments are at the forefront of getting their communities ready by providing enough vaccines, monitoring severity by tracking hospitalization and death rates, and creating awareness. The granularity of the method in this article allows peaks to be identified and will help in channeling appropriate resources whenever necessary. Bayesian hierarchical modeling provides a natural platform for combining information from disparate data sources in a coherent mathematical framework. Semiparametric modeling provides the flexibility needed to characterize the unknown relations between the daily observed respiratory and constitutional symptoms and the actual influenza incidence. Hybrid algorithms based on Markov Chain Monte Carlo and the Kalman filter provided the efficiency for better convergence and continuous updating of the estimates as new data become available on a daily basis. Posterior inferences rely on the shrinkage between the data and the prior beliefs about the unknown parameters. We selected noninformative priors for robust Bayesian inference. Additionally, the prediction intervals quantify the uncertainty associated with the estimates.

Our algorithm contributes timely and accurate information that the data collection systems are currently unable to provide due to delays in the submission of testing results to the electronic reporting systems at the state level. In the Michigan case, the electronic reporting system is the MDSS, which expects information from healthcare providers in real time. It is well known, however, that due to the variety of testing methods and processing criteria, queuing priorities for testing, and exacerbated demand for testing during influenza emergencies, submission of confirmed cases can take up to 2 weeks.
Therefore, daily trends of influenza incidence are currently observed weekly and corrected as more confirmed specimens are submitted to the system. Our algorithm is suited to refining the time resolution from weekly to daily, which corrects for the operational delays in the submission.
From our work, we found that MDHHS constitutional and respiratory trends are good proxies for estimating benchmark daily trends of influenza incidence. There are several advantages in the use of constitutional and respiratory trends: they are retrieved with a one‐day delay only, and they are based on reported symptoms and not on confirmatory testing, which makes them independent from the specificity, sensitivity, timeliness, and variability of the testing infrastructure built in each reporting healthcare provider. Although the constitutional and respiratory trends are subject to underreporting, our work shows that they are meaningful predictors that contribute their fluctuation patterns to the estimation of the daily influenza incidence trend.
In this article, a semi‐parametric function was estimated to fuse the daily constitutional and respiratory trends. We found that the shrinkage contribution associated with the constitutional trends in the basis function was higher than the contribution of the respiratory trends. This finding is justifiable, as the respiratory trends might be subject to more noise during the influenza seasons than the constitutional trends. The EDs receive many patients with respiratory symptoms from non‐flu related diseases (e.g., heart failure, chronic obstructive pulmonary disease, asthma, acute coronary syndrome, and arrhythmia). In contrast, the constitutional burden might be mostly attributed to flu and flu related diseases during the influenza seasons.
Our modeling framework considers that the constitutional and respiratory trends are proxies of influenza activity, which is a reasonable assumption during the flu seasons or during an outbreak from a recognized causative flu strain. During flu seasons, however, the trends are only helpful to track ILI, and determining which virus is driving the increasing trend is a recognized challenge.
Constitutional and respiratory trends might also work to track an outbreak caused by a respiratory virus, as long as the outbreak does not overlap with a flu related case surge. Previous research demonstrates how increasing ILI trends were heavily correlated with the respiratory syncytial virus outbreak in Washington State during December 2019, and with the SARS‐CoV‐2 outbreak in the United States during March 2020.
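The daily updating attributed above to the Kalman filter component can be illustrated with a deliberately simplified sketch. The snippet below runs a scalar local-level (random walk plus noise) filter with known variances and derives a one-day-ahead 95% prediction interval. This is a textbook stand-in, not the model used in this article: the names `LevelState`, `kalman_step`, and `one_day_forecast`, the variance values, and the toy observations are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LevelState:
    mean: float  # current estimate of the latent daily log rate
    var: float   # variance (uncertainty) of that estimate


def kalman_step(state, y, obs_var, level_var):
    """Fold one new daily observation y into the current state."""
    # Predict: under a random walk the mean carries over and the variance grows.
    pred_mean = state.mean
    pred_var = state.var + level_var
    # Update: weight the new observation by the Kalman gain.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return LevelState(new_mean, new_var)


def one_day_forecast(state, obs_var, level_var, z=1.96):
    """Approximate 95% prediction interval for the next day's observation."""
    sd = math.sqrt(state.var + level_var + obs_var)
    return state.mean - z * sd, state.mean + z * sd


# Hypothetical toy observations and variances, for illustration only.
state = LevelState(mean=0.0, var=1.0)
for y in [1.0, 1.2, 0.9]:
    state = kalman_step(state, y, obs_var=0.5, level_var=0.1)
lower, upper = one_day_forecast(state, obs_var=0.5, level_var=0.1)
```

Each call to `kalman_step` costs constant time, which is why a filter of this kind can refresh estimates as each day's data arrive without refitting from scratch. In a hybrid scheme the variances would themselves be sampled by MCMC; here they are fixed for simplicity.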
CONCLUSION
We developed a dynamic downscaling method for estimating daily influenza incidences from weekly data. Despite the assumptions and limitations described in the discussion section, our work possesses several strengths: (1) our method is based on two powerful data sources, which are also widely and easily available to the MDHHS. Most of the state health departments in the U.S. have an established infrastructure to retrieve trends such as the constitutional and respiratory trends. In addition, the CDC's weekly trends of cases confirmed with a flu virus are publicly available; (2) since we used structured surveillance data and systems to build our framework, public health officials are better suited to question and validate the results of the down‐scaled trends, if our algorithm is implemented; (3) our fast algorithms facilitate updated predictions as new data become available; and (4) our modeling framework has the ability to capture adaptive seasonal effects.
When infrastructure limits the capacity of testing, EDs report constitutional and respiratory cases that are useful proxies for estimating surges and the true incidences during epidemics. At the time of writing this article, it is unknown how SARS‐CoV‐2 and flu will interact during flu season. However, our model accounts for symptoms that are common in both diseases, and hence our model contributes to the groundwork that is needed to understand such interactions in the future.