Kim M Pepin, Jia Wang, Colleen T Webb, Jennifer A Hoeting, Mary Poss, Peter J Hudson, Wenshan Hong, Huachen Zhu, Yi Guan, Steven Riley.
Abstract
An ability to forecast the prevalence of specific subtypes of avian influenza viruses (AIV) in live-bird markets would greatly facilitate the implementation of preventative measures designed to minimize poultry losses and human exposure. The minimum requirement for developing predictive quantitative tools is surveillance data of AIV prevalence sampled frequently over several years. Recently, a 4-year time series of monthly sampling of hemagglutinin subtypes 1-13 in ducks, chickens and quail in live-bird markets in southern China has become available. We used these data to investigate whether a simple statistical model, based solely on historical data (variables such as the number of positive samples in host X of subtype Y at time t months ago), could accurately predict prevalence of H5 and H9 subtypes in chickens. We also examined the role of ducks and quail in predicting prevalence in chickens within the market setting because between-species transmission is thought to occur within markets but has not been measured. Our best statistical models performed remarkably well at predicting future prevalence (pseudo-R² = 0.57 for H9 and 0.49 for H5), especially considering the multi-host, multi-subtype nature of AIVs. We did not find prevalence of H5/H9 in ducks or quail to be predictors of prevalence in chickens within the Chinese markets. Our results suggest surveillance protocols that could enable more accurate and timely predictive statistical models. We also discuss which data should be collected to allow the development of mechanistic models.
Year: 2013 PMID: 23409145 PMCID: PMC3567063 DOI: 10.1371/journal.pone.0056157
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Models of H9 that include H9 data in other hosts compared to the “best” model selected.
| Model | AIC | Pseudo-R² | MSPE | NMSPE |
|---|---|---|---|---|
|  | 211.9 | 0 | 52.0 | NA |
|  | 213.4 | 0.01 | 52.2 | 30.8 |
|  | 211.5 | 0.06 | 56.3 | 3.8 |
|  | 213.2 | 0.07 | 59.2 | 3.2 |
| Best | 191.8 | 0.57 | 30.0 | 0.4 |
Cragg & Uhler’s method: $\left(1-(L_0/L_m)^{2/N}\right)/\left(1-L_0^{2/N}\right)$; L = likelihood; 0 = intercept-only model; m = full model; N = number of data points.
Mean Squared Prediction Error: $\sum (y-m)^2/N$; smaller values indicate better fits; y = observed data; m = mean of predicted data; N = number of points predicted.
Normalized Mean Squared Prediction Error: $\sum \left((y-m)/s\right)^2/N$; smaller values indicate better fits; y = observed data; m = mean of predicted data; s = standard deviation of predicted data; N = number of points predicted.
Best model covariates: H4 prevalence in all hosts, H6 prevalence in all hosts, H6 prevalence in quail, H9 prevalence in quail, H5 prevalence in all hosts one month in the past.
Covariates in other models: DKH9 = H9 prevalence in ducks, QAH9 = H9 prevalence in quail, DKH9+QAH9 = sum of DKH9 and QAH9.
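The three column statistics defined in the footnotes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the pseudo-R² is computed from log-likelihoods rather than raw likelihoods (to avoid numerical underflow), and m and s are taken to be a per-point predictive mean and standard deviation, which is one reading of the footnotes rather than a confirmed detail of the authors' implementation.

```python
import math

def cragg_uhler_pseudo_r2(loglik_null, loglik_model, n):
    """Cragg & Uhler pseudo-R^2 from the two log-likelihoods and N data points."""
    # (L0/Lm)^(2/N) equals exp((2/N) * (log L0 - log Lm)).
    numerator = 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))
    # L0^(2/N) equals exp((2/N) * log L0); this rescales the maximum to 1.
    denominator = 1.0 - math.exp((2.0 / n) * loglik_null)
    return numerator / denominator

def mspe(observed, predicted_mean):
    """Mean squared prediction error: sum((y - m)^2) / N."""
    n = len(observed)
    return sum((y - m) ** 2 for y, m in zip(observed, predicted_mean)) / n

def nmspe(observed, predicted_mean, predicted_sd):
    """Normalized MSPE: sum(((y - m) / s)^2) / N."""
    n = len(observed)
    return sum(((y - m) / s) ** 2
               for y, m, s in zip(observed, predicted_mean, predicted_sd)) / n

# Hypothetical numbers, for illustration only (not data from the study).
y_obs = [2, 4]
m_pred = [1.0, 2.0]
s_pred = [1.0, 2.0]
print(mspe(y_obs, m_pred), nmspe(y_obs, m_pred, s_pred))
print(cragg_uhler_pseudo_r2(-100.0, -90.0, 48))
```

Because NMSPE divides by s, it is undefined when the predictive standard deviation is zero.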
Figure 1. Model fits for H9.
Data are the prevalence of H9 per 100 chickens sampled. Data were modeled by negative binomial regression with a log link. “Best” is the set of covariates that were selected by AIC: allH4, allH6, QAH6, QAH9, allH5_{t-1}, where “all” is the prevalence of subtype HX in all 3 host species (CK+DK+QA), “QA” is for prevalence in only quail, “DK” is for prevalence in only duck, and t-1 is the prevalence in the previous month.
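To make "negative binomial regression with a log link" and "selected by AIC" concrete, the following minimal sketch evaluates the log-likelihood of such a model for given coefficients and converts it to an AIC. It assumes the common mean-dispersion (NB2) parameterization, where the variance is mu + mu²/theta; the source does not state which parameterization or software was used, and all function names here are hypothetical. The sketch evaluates a likelihood only and does not fit the model.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-probability of count y under a negative binomial (mean mu, dispersion theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def nb_loglik(counts, covariates, intercept, coefs, theta):
    """Log-likelihood of a log-link negative binomial regression."""
    total = 0.0
    for y, x in zip(counts, covariates):
        # Linear predictor on the log scale.
        eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
        mu = math.exp(eta)  # log link: log(mu) = eta, so mu is always positive
        total += nb_logpmf(y, mu, theta)
    return total

def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * loglik

# Hypothetical toy data: monthly counts and one covariate per month.
counts = [0, 3, 7, 2]
covariates = [[0.1], [0.5], [1.2], [0.4]]
ll = nb_loglik(counts, covariates, intercept=0.5, coefs=[1.0], theta=2.0)
print(aic(ll, n_params=3))  # intercept, one slope, and theta
```

With a log link, each coefficient acts multiplicatively on the expected count: a one-unit increase in a covariate with coefficient b multiplies the mean by exp(b).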
Parameter estimates for the best model of H9 in chickens.
| Covariate | Estimate | SE | P |
|---|---|---|---|
|  | 1.24 | 0.15 | <0.0001 |
|  | 0.28 | 0.15 | 0.059 |
|  | −0.56 | 0.29 | 0.058 |
|  | 1.05 | 0.29 | 0.0003 |
|  | 0.24 | 0.13 | 0.064 |
|  | 0.41 | 0.13 | 0.0021 |
Model covariates: allH4 = H4 prevalence in all hosts, allH6 = H6 prevalence in all hosts, QAH6 = H6 prevalence in quail, QAH9 = H9 prevalence in quail, allH5_{t-1} = H5 prevalence in all hosts one month in the past.
Figure 2. Forecasts with the best model for H9.
The model was fit (red) on the first 3 years of data (black). Forecasts are shown for the fourth year of data using 3 methods: 1) Forecasting the full 12 months of data (blue), 2) Iterative fitting and forecasting where additional data were included at each step (SxS A, purple), and 3) Iterative fitting and forecasting using a sliding window where model parameters were always estimated from 36 months of data (SxS B, green). B-D show an alternative way of viewing the fits. B shows the fit of the model and C and D show the fit of the forecasted points using the two best methods (SxS A (C) and SxS B (D)).
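The three forecasting schemes in this caption differ only in which months are used for fitting and which are forecast. The sketch below enumerates those index sets for a 48-month series with a 36-month initial fit, as described above. It assumes that the two iterative schemes forecast one month ahead at each step, which the caption does not state explicitly; the function name and scheme labels are hypothetical.

```python
def forecast_splits(n_months=48, n_train=36, scheme="full"):
    """Return (train_indices, test_indices) pairs for one forecasting scheme.

    "full":      fit once on the first n_train months, forecast all the rest.
    "expanding": refit at each step on all months seen so far (SxS A).
    "sliding":   refit at each step on the most recent n_train months (SxS B).
    """
    if scheme == "full":
        return [(list(range(n_train)), list(range(n_train, n_months)))]
    if scheme not in ("expanding", "sliding"):
        raise ValueError(f"unknown scheme: {scheme}")
    splits = []
    for t in range(n_train, n_months):
        # The expanding window always starts at month 0; the sliding window
        # drops the oldest month each time a new one is added.
        start = 0 if scheme == "expanding" else t - n_train
        splits.append((list(range(start, t)), [t]))  # assumed one-month-ahead
    return splits

for name in ("full", "expanding", "sliding"):
    splits = forecast_splits(scheme=name)
    train, test = splits[-1]
    print(name, len(splits), "fits; last fit uses months", train[0], "to", train[-1])
```

The sliding window keeps the amount of training data constant, so differences between SxS A and SxS B reflect the effect of retaining or discarding the oldest observations.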
Evaluation of best model of H9 in chickens.
| Method | MSPE | NMSPE |
|---|---|---|
| In-sample | 30.0 | 0.4 |
|  | 24.5 | 1.9 |
|  | 23.1 | 1.8 |
|  | 24.1 | 1.6 |
In-sample data are for the fitted model. Other methods are described in Figure 2. Note that MSPE emphasizes deviations from larger peaks. The in-sample MSPE is higher than the forecast MSPEs because the first 3 years of data contained several much larger peaks than the last year of data.
Mean Squared Prediction Error (MSPE): $\sum (y-m)^2/N$; smaller values indicate better fits; y = observed data; m = mean of predicted data; N = number of points predicted.
Normalized Mean Squared Prediction Error (NMSPE): $\sum \left((y-m)/s\right)^2/N$; smaller values indicate better fits; y = observed data; m = mean of predicted data; s = standard deviation of predicted data; N = number of points predicted.
Models of H5 that include H5 data in other hosts compared to the “best” model selected.
| Model | AIC | Pseudo-R² | MSPE | NMSPE |
|---|---|---|---|---|
|  | 71.8 | 0 | 1.7 | NA |
|  | 71.4 | 0.14 | 1.7 | 111.1 |
|  | 74.1 | 0.06 | 1.7 | 64.1 |
|  | 73.1 | 0.20 | 1.7 | 39.5 |
| Best | 60.9 | 0.49 | 0.8 | 0.8 |
Column statistics are by the same methods as described in Table 1.
Best model covariates: Maximum windspeed, H9 prevalence in ducks one month in the past.
Covariates in other models: DKH5 = H5 prevalence in ducks, QAH5 = H5 prevalence in quail, DKH5+QAH5 = sum of DKH5 and QAH5.
Figure 3. Model fits for H5.
Data are the prevalence of H5 per 100 chickens sampled. Data were modeled by zero-inflated negative binomial regression with a log link on the abundance component. “Best” is the set of covariates that were selected by AIC: maximum wind speed and DKH9_{t-1}, where “DK” is for prevalence in ducks, and t-1 is the prevalence in the previous month.
Parameter estimates for the best model of H5 in chickens.
| Component | Covariate | Estimate | SE | P |
|---|---|---|---|---|
|  |  |  |  |  |
|  |  | −0.96 | 0.58 | 0.099 |
|  |  | 1.04 | 0.43 | 0.017 |
|  |  | 0.47 | 0.20 | 0.022 |
|  |  |  |  |  |
|  |  | −2.81 | 8.63 | 0.74 |
|  |  | −0.60 | 0.94 | 0.53 |
|  |  | −8.81 | 17.08 | 0.61 |
The zero-inflated negative binomial model is a mixture of two separate data generation processes (i.e., model “components”): one to describe zeros (binomial) and the other to describe counts from a negative binomial model.
DKH9_{t-1} = H9 prevalence in ducks one month in the past.
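The two-component mixture described in the footnote above can be written as a probability mass function: a zero arises either from the zero component or from the negative binomial count component, while any positive count can only come from the count component. The sketch below is illustrative and makes two assumptions not confirmed by the source: the count component uses the mean-dispersion (NB2) parameterization, and the zero component uses a logit link. All function names are hypothetical.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-probability of count y under a negative binomial (mean mu, dispersion theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability (assumed link)."""
    return 1.0 / (1.0 + math.exp(-eta))

def zinb_pmf(y, pi_zero, mu, theta):
    """Probability of count y under a zero-inflated negative binomial.

    pi_zero is the probability that an observation comes from the zero
    component; otherwise it is drawn from the negative binomial.
    """
    nb = math.exp(nb_logpmf(y, mu, theta))
    if y == 0:
        # Zeros have two sources: the zero component and the count component.
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb

# Hypothetical parameter values, for illustration only.
pi_zero = logistic(-1.0)
mu = math.exp(0.7)  # log link on the abundance (count) component
print(zinb_pmf(0, pi_zero, mu, theta=1.5), zinb_pmf(3, pi_zero, mu, theta=1.5))
```

Under this formulation the expected count is (1 - pi_zero) * mu, so covariates can influence prevalence through either component.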
Figure 4. Forecasts with the best model for H5.
The model was fit (red) on the first 3 years of data (black). Forecasts are shown for the fourth year of data using 3 methods: 1) Forecasting the full 12 months of data (blue), 2) Iterative fitting and forecasting where additional data were included at each step (SxS A, purple), and 3) Iterative fitting and forecasting using a sliding window where model parameters were always estimated from 36 months of data (SxS B, green). B-D show an alternative way of viewing the fits. B shows the fit of the model and C and D show the fit of the forecasted points using the two best methods (SxS A (C) and SxS B (D)).
Evaluation of best model of H5 in chickens.
| Method | MSPE | NMSPE |
|---|---|---|
| In-sample | 0.8 | 0.8 |
|  | 2.9 | 1.1 |
|  | 2.9 | 1.1 |
|  | 0.1 | 8.0 |
In-sample data are for the fitted model. Other methods are described in Figure 2. Column statistics are by the same methods as described in Table 3.