Benjamin J K Davis, John M Jacobs, Benjamin Zaitchik, Angelo DePaola, Frank C Curriero.
Abstract
Vibrio parahaemolyticus is a leading cause of seafood-borne gastroenteritis. Given its natural presence in brackish waters, there is a need to develop operational forecast models that can sufficiently predict the bacterium's spatial and temporal variation. This work attempted to develop V. parahaemolyticus prediction models using frequently measured time-indexed and -lagged water quality measures. Models were built using a large data set (n = 1,043) of surface water samples from 2007 to 2010 previously analyzed for V. parahaemolyticus in the Chesapeake Bay. Water quality variables were classified as time indexed, 1-month lag, and 2-month lag. Tobit regression models were used to account for V. parahaemolyticus measures below the limit of quantification and to simultaneously estimate the presence and abundance of the bacterium. Models were evaluated using cross-validation and metrics that quantify prediction bias and uncertainty. Presence classification models containing only one type of water quality parameter (e.g., temperature) performed poorly, while models with additional water quality parameters (i.e., salinity, clarity, and dissolved oxygen) performed well. Lagged variable models performed similarly to time-indexed models, and lagged variables occasionally contained a predictive power that was independent of or superior to that of time-indexed variables. Abundance estimation models were less effective, primarily due to a restricted number of samples with abundances above the limit of quantification. These findings indicate that an operational in situ prediction model is attainable but will require a variety of water quality measurements and that lagged measurements will be particularly useful for forecasting. 
Future work will expand variable selection for prediction models and extend the spatial-temporal extent of predictions by using geostatistical interpolation techniques.
IMPORTANCE Vibrio parahaemolyticus is one of the leading causes of seafood-borne illness in the United States and across the globe. Exposure often occurs from the consumption of raw shellfish. Despite public health concerns, there have been only sporadic efforts to develop environmental prediction and forecast models for the bacterium preharvest. This analysis used commonly sampled water quality measurements of temperature, salinity, dissolved oxygen, and clarity to develop models for V. parahaemolyticus in surface water. Predictors also included measurements taken months before water was tested for the bacterium. Results revealed that the use of multiple water quality measurements is necessary for satisfactory prediction performance, challenging current efforts to manage the risk of infection based upon water temperature alone. The results also highlight the potential advantage of including historical water quality measurements. This analysis shows promise and lays the groundwork for future operational prediction and forecast models.
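The role of the Tobit model described in the abstract can be illustrated with a minimal sketch of a left-censored Gaussian log-likelihood. This is not the authors' code: the function names, the single shared censoring limit, and the toy inputs are assumptions made for illustration. Censored samples contribute the probability of falling at or below the limit, while quantified samples contribute the usual normal density, which is how one model informs both presence and abundance.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, censored, limit):
    """Log-likelihood of a left-censored (Tobit) linear model.

    beta     : regression coefficients (hypothetical parameter layout)
    sigma    : residual standard deviation of the latent outcome
    X        : list of predictor rows, each the same length as beta
    y        : observed outcomes (ignored where censored is True)
    censored : True where the sample was below the limit of quantification
    limit    : censoring limit on the same scale as y (assumed shared)
    """
    total = 0.0
    for xi, yi, ci in zip(X, y, censored):
        mu = sum(b * x for b, x in zip(beta, xi))
        if ci:
            # Censored sample: only P(latent value <= limit) is known.
            p = norm_cdf((limit - mu) / sigma)
            total += math.log(max(p, 1e-300))  # guard against log(0)
        else:
            # Quantified sample: ordinary normal density of the residual.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


# Toy usage with an intercept-only model (illustrative values only).
ll = tobit_loglik(beta=[0.0], sigma=1.0,
                  X=[[1.0], [1.0]], y=[0.0, None],
                  censored=[False, True], limit=0.0)
```

In practice the coefficients and sigma would be found by maximizing this function with a numerical optimizer; that step is omitted here.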
Keywords: Chesapeake Bay; Tobit regression; Vibrio parahaemolyticus; forecast; prediction; public health; temporal lags
Year: 2019 PMID: 31253685 PMCID: PMC6696964 DOI: 10.1128/AEM.01007-19
Source DB: PubMed Journal: Appl Environ Microbiol ISSN: 0099-2240 Impact factor: 4.792
FIG 1 Map of Chesapeake Bay monitoring stations used for V. parahaemolyticus sampling as well as time-indexed and -lagged water quality measurements.
Descriptive characteristics of V. parahaemolyticus and water measurement variables, stratified by season
| Parameter | No. of samples (total) | No. of samples (uncensored) | Median (IQR), overall | Median (IQR), summer (July) | Median (IQR), autumn (October) |
|---|---|---|---|---|---|
| V. parahaemolyticus, proportion of samples uncensored (no. uncensored) | 1,043 | 226 | 0.217 (226) | 0.232 (128) | 0.199 (98) |
| V. parahaemolyticus abundance in uncensored samples | 226 | 226 | 0.563 (0.258, 0.864) | 0.560 (0.257, 0.879) | 0.606 (0.265, 0.847) |
| Water temp (°C) | |||||
| Time indexed | 1,043 | 226 | 25.3 (18.1, 27.3) | 27.3 (26.4, 28.1) | 17.9 (15.6, 20.7) |
| 1-mo lag | 1,017 | 216 | 24.8 (23.3, 26.3) | 25.5 (24.1, 27.1) | 23.6 (22.9, 24.8) |
| 2-mo lag | 994 | 209 | 21.7 (17.9, 26.9) | 18.0 (16.7, 27.7) | 27.0 (26.0, 28.3) |
| Salinity (‰) | |||||
| Time indexed | 1,042 | 226 | 12.0 (2.8, 17.1) | 14.3 (4.1, 18.3) | 10.7 (1.6, 14.9) |
| 1-mo lag | 987 | 216 | 10.8 (2.6, 16.3) | 8.8 (0.7, 13.5) | 13.7 (4.0, 18.3) |
| 2-mo lag | 968 | 209 | 9.4 (1.3, 14.6) | 7.7 (0.1, 12.0) | 12.2 (3.1, 16.4) |
| DO concn (mg/liter) | |||||
| Time indexed | 1,031 | 224 | 7.4 (6.5, 8.5) | 6.8 (6.0, 7.6) | 8.3 (7.3, 9.0) |
| 1-mo lag | 984 | 214 | 7.1 (6.3, 8.1) | 7.3 (6.3, 8.4) | 7.1 (6.3, 7.9) |
| 2-mo lag | 975 | 205 | 7.5 (6.5, 8.8) | 8.6 (7.7, 9.7) | 6.7 (6.0, 7.3) |
| Secchi disk depth (m) | |||||
| Time indexed | 1,021 | 222 | 0.8 (0.5, 1.3) | 0.8 (0.5, 1.1) | 1.0 (0.6, 1.5) |
| 1-mo lag | 1,006 | 215 | 0.8 (0.5, 1.2) | 0.7 (0.5, 1.0) | 0.8 (0.5, 1.3) |
| 2-mo lag | 982 | 205 | 0.8 (0.5, 1.2) | 0.7 (0.4, 1.1) | 0.8 (0.5, 1.2) |
IQR, interquartile range.
FIG 2 (A) Nonlinear association of 1-month-lagged water temperature and V. parahaemolyticus (Vp) abundance in autumn with a knot set at 21°C. Note that all samples taken below this threshold had quantifiable levels of V. parahaemolyticus. (B) Association of 1-month-lagged salinity and V. parahaemolyticus abundance in summer stratified by quartiles of 2-month-lagged salinity.
Cross-validation results for presence of V. parahaemolyticus, using water temperature, salinity, DO, and Secchi disk depth lagged at 0, 1, and 2 months. Values are median (IQR).
| Season and model no. | Lag or model | AUC | AUC 2.5% | Optimal threshold | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| Summer | |||||||
| 1 | Lag 0 | 0.853 (0.834, 0.870) | 0.780 (0.755, 0.804) | 0.721 (0.694, 0.759) | 0.776 (0.744, 0.801) | 0.752 (0.702, 0.793) | 0.857 (0.800, 0.914) |
| 2 | Lag 1 + lag 2 | 0.847 (0.826, 0.868) | 0.775 (0.749, 0.802) | 0.676 (0.645, 0.717) | 0.769 (0.731, 0.795) | 0.736 (0.686, 0.791) | 0.857 (0.800, 0.914) |
| 3 | Lag 0 + lag 1 + lag 2 | 0.866 (0.850, 0.884) | 0.800 (0.777, 0.824) | 0.754 (0.704, 0.800) | 0.795 (0.756, 0.821) | 0.777 (0.711, 0.826) | 0.857 (0.800, 0.914) |
| 4 | Model 1 + covariates | 0.860 (0.840, 0.879) | 0.789 (0.762, 0.814) | 0.797 (0.762, 0.835) | 0.788 (0.762, 0.812) | 0.765 (0.728, 0.803) | 0.877 (0.833, 0.909) |
| 5 | Model 2 + covariates | 0.853 (0.833, 0.878) | 0.784 (0.759, 0.815) | 0.719 (0.678, 0.791) | 0.770 (0.729, 0.806) | 0.745 (0.686, 0.808) | 0.867 (0.789, 0.909) |
| 6 | Model 3 + covariates | 0.865 (0.844, 0.883) | 0.798 (0.771, 0.823) | 0.792 (0.741, 0.851) | 0.792 (0.752, 0.819) | 0.776 (0.718, 0.828) | 0.844 (0.793, 0.903) |
| Autumn | |||||||
| 1 | Lag 0 | 0.832 (0.812, 0.854) | 0.753 (0.725, 0.780) | 0.723 (0.680, 0.792) | 0.746 (0.687, 0.813) | 0.722 (0.639, 0.833) | 0.846 (0.731, 0.923) |
| 2 | Lag 1 + lag 2 | 0.821 (0.794, 0.843) | 0.736 (0.700, 0.762) | 0.782 (0.738, 0.839) | 0.739 (0.679, 0.804) | 0.713 (0.630, 0.824) | 0.846 (0.731, 0.885) |
| 3 | Lag 0 + lag 1 + lag 2 | 0.852 (0.830, 0.872) | 0.773 (0.744, 0.803) | 0.814 (0.779, 0.864) | 0.761 (0.724, 0.806) | 0.741 (0.676, 0.806) | 0.865 (0.808, 0.923) |
| 4 | Model 1 + covariates | 0.801 (0.777, 0.829) | 0.702 (0.672, 0.738) | 0.735 (0.686, 0.795) | 0.744 (0.683, 0.790) | 0.726 (0.636, 0.802) | 0.818 (0.720, 0.875) |
| 5 | Model 2 + covariates | 0.828 (0.801, 0.855) | 0.735 (0.700, 0.769) | 0.899 (0.835, 0.941) | 0.808 (0.762, 0.845) | 0.821 (0.752, 0.876) | 0.750 (0.696, 0.826) |
| 6 | Model 3 + covariates | 0.838 (0.813, 0.865) | 0.751 (0.719, 0.786) | 0.894 (0.832, 0.938) | 0.794 (0.744, 0.833) | 0.792 (0.719, 0.855) | 0.800 (0.739, 0.874) |
AUC, area under the receiver operating characteristic curve.
AUC 2.5%, lower bound of the 95% confidence interval from bootstrapped AUC.
IQR, interquartile range.
Covariates include the additional environmental variables described in Table S1 in the supplemental material.
The probability of quantification of V. parahaemolyticus (≥1 GE/ml).
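The classification metrics reported in this table can be computed from predicted probabilities and observed presence labels, as in the sketch below. The excerpt does not state how the optimal threshold was chosen, so the use of Youden's J statistic (sensitivity + specificity − 1) is an assumption, as are the function names and the toy data.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classify_metrics(scores, labels, threshold):
    """Return (accuracy, sensitivity, specificity) at a given threshold.

    A sample is classified as present when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for s, l in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and l == 1:
            tp += 1
        elif predicted == 1 and l == 0:
            fp += 1
        elif predicted == 0 and l == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def youden_threshold(scores, labels):
    """Pick the observed score that maximizes Youden's J (an assumed rule)."""
    def j_stat(t):
        _, sens, spec = classify_metrics(scores, labels, t)
        return sens + spec - 1.0
    return max(sorted(set(scores)), key=j_stat)


# Toy data: predicted probabilities and observed presence (1) or absence (0).
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
best = youden_threshold(scores, labels)
metrics = classify_metrics(scores, labels, best)
```

The lower AUC bound in the table would additionally require resampling the evaluation set with replacement and taking the 2.5th percentile of the resulting AUC values.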
2010 forecast results for presence of V. parahaemolyticus, using water temperature, salinity, DO, and Secchi disk depth lagged at 0, 1, and 2 months
| Season and model no. | Lag or model | AUC | AUC 2.5% | Optimal threshold | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| Summer | |||||||
| 1 | Lag 0 | 0.814 | 0.726 | 0.756 | 0.729 | 0.603 | 0.921 |
| 2 | Lag 1 + lag 2 | 0.736 | 0.632 | 0.872 | 0.729 | 0.828 | 0.579 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.777 | 0.679 | 0.867 | 0.750 | 0.793 | 0.684 |
| 4 | Model 1 + covariates | 0.811 | 0.725 | 0.966 | 0.785 | 0.893 | 0.622 |
| 5 | Model 2 + covariates | 0.710 | 0.603 | 0.857 | 0.656 | 0.607 | 0.730 |
| 6 | Model 3 + covariates | 0.785 | 0.693 | 0.895 | 0.720 | 0.679 | 0.784 |
| Autumn | |||||||
| 1 | Lag 0 | 0.822 | 0.741 | 0.617 | 0.761 | 0.722 | 0.838 |
| 2 | Lag 1 + lag 2 | 0.802 | 0.718 | 0.649 | 0.752 | 0.764 | 0.730 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.812 | 0.733 | 0.649 | 0.688 | 0.542 | 0.973 |
| 4 | Model 1 + covariates | 0.821 | 0.734 | 0.581 | 0.769 | 0.761 | 0.784 |
| 5 | Model 2 + covariates | 0.807 | 0.718 | 0.877 | 0.806 | 0.944 | 0.541 |
| 6 | Model 3 + covariates | 0.786 | 0.693 | 0.706 | 0.741 | 0.761 | 0.703 |
AUC, area under the receiver operating characteristic curve.
AUC 2.5%, lower bound of the 95% confidence interval from bootstrapped AUC.
Covariates include the additional environmental variables described in Table S1 in the supplemental material.
The probability of quantification of V. parahaemolyticus (≥1 GE/ml).
Prediction results for log10 abundance of V. parahaemolyticus (conditional expectation) using water temperature, salinity, DO, and Secchi disk depth lagged at 0, 1, and 2 months
| Season and model no. | Lag or model | Random cross-validation RMSE | Random cross-validation MPSE | Random cross-validation CV-R2 | 2010 forecast RMSE | 2010 forecast MPSE | 2010 forecast CV-R2 |
|---|---|---|---|---|---|---|---|
| Summer | |||||||
| 1 | Lag 0 | 0.409 (0.380, 0.439) | 0.240 (0.215, 0.269) | −0.069 (−0.242, 0.059) | 0.625 | 0.171 | −1.025 |
| 2 | Lag 1 + lag 2 | 0.449 (0.404, 0.496) | 0.259 (0.235, 0.283) | −0.292 (−0.491, −0.129) | 0.682 | 0.172 | −1.412 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.422 (0.390, 0.450) | 0.333 (0.299, 0.372) | −0.145 (−0.335, 0.021) | 0.639 | 0.215 | −1.115 |
| 4 | Model 1 + covariates | 0.350 (0.321, 0.381) | 0.379 (0.334, 0.416) | 0.299 (0.131, 0.427) | 0.652 | 0.276 | −1.146 |
| 5 | Model 2 + covariates | 0.413 (0.377, 0.459) | 0.365 (0.327, 0.410) | −0.001 (−0.195, 0.169) | 0.699 | 0.288 | −1.467 |
| 6 | Model 3 + covariates | 0.411 (0.378, 0.444) | 0.418 (0.373, 0.469) | 0.032 (−0.158, 0.195) | 0.669 | 0.343 | −1.258 |
| Autumn | |||||||
| 1 | Lag 0 | 0.354 (0.319, 0.391) | 0.188 (0.167, 0.210) | −0.482 (−0.677, −0.263) | 0.418 | 0.124 | −0.607 |
| 2 | Lag 1 + lag 2 | 0.330 (0.290, 0.390) | 0.261 (0.227, 0.314) | −0.224 (−0.802, −0.007) | 0.372 | 0.182 | −0.276 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.326 (0.290, 0.383) | 0.322 (0.282, 0.374) | −0.231 (−0.753, 0.031) | 0.356 | 0.286 | −0.169 |
| 4 | Model 1 + covariates | 0.379 (0.331, 0.421) | 0.255 (0.229, 0.284) | −0.510 (−0.731, −0.231) | 0.415 | 0.215 | −0.589 |
| 5 | Model 2 + covariates | 0.343 (0.303, 0.388) | 0.334 (0.291, 0.379) | −0.222 (−0.520, 0.019) | 0.379 | 0.295 | −0.326 |
| 6 | Model 3 + covariates | 0.351 (0.317, 0.385) | 0.374 (0.332, 0.425) | −0.284 (−0.613, −0.047) | 0.443 | 0.258 | −0.810 |
RMSE, root mean square error.
MPSE, mean prediction standard error.
CV-R2, cross-validation R2.
Data represent the median (interquartile range).
Covariates include additional environmental variables described in Table S1 in the supplemental material.
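The distinction between conditional and unconditional expectations in a left-censored model can be sketched with the standard Tobit formulas. This assumes a normally distributed latent outcome and that censored samples are assigned the censoring limit when forming the unconditional mean. The authors' exact definitions are not given in this excerpt, and the function name and inputs are hypothetical.

```python
import math


def norm_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_expectations(mu, sigma, limit):
    """Return (p_quantified, conditional_mean, unconditional_mean).

    mu, sigma : mean and standard deviation of the latent outcome
    limit     : left-censoring limit on the same scale (assumption)
    """
    a = (limit - mu) / sigma
    # Probability that the latent value exceeds the limit.
    p_quantified = 1.0 - norm_cdf(a)
    # Mean of the outcome given that it is above the limit
    # (mu plus sigma times the inverse Mills ratio).
    conditional = mu + sigma * norm_pdf(a) / p_quantified
    # Mean over all samples, with censored samples set to the limit.
    unconditional = (1.0 - p_quantified) * limit + p_quantified * conditional
    return p_quantified, conditional, unconditional


# Toy usage: latent mean at the limit, unit standard deviation.
p, cond, uncond = tobit_expectations(mu=0.0, sigma=1.0, limit=0.0)
```

The conditional mean is always above the limit, whereas the unconditional mean is pulled toward the limit by the censored share. This may help explain why the two abundance tables report different error levels.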
Prediction results for log10 abundance of V. parahaemolyticus (unconditional expectation) using water temperature, salinity, DO, and Secchi disk depth lagged at 0, 1, and 2 months
| Season and model no. | Lag or model | Random cross-validation RMSE | Random cross-validation MPSE | Random cross-validation CV-R2 | 2010 forecast RMSE | 2010 forecast MPSE | 2010 forecast CV-R2 |
|---|---|---|---|---|---|---|---|
| Summer | |||||||
| 1 | Lag 0 | 0.323 (0.313, 0.334) | 0.108 (0.099, 0.115) | 0.221 (0.155, 0.294) | 0.472 | 0.114 | 0.299 |
| 2 | Lag 1 + lag 2 | 0.340 (0.326, 0.354) | 0.122 (0.115, 0.131) | 0.145 (0.070, 0.213) | 0.496 | 0.117 | 0.224 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.325 (0.312, 0.337) | 0.149 (0.140, 0.161) | 0.220 (0.131, 0.293) | 0.476 | 0.146 | 0.286 |
| 4 | Model 1 + covariates | 0.301 (0.289, 0.314) | 0.161 (0.149, 0.172) | 0.366 (0.287, 0.430) | 0.484 | 0.179 | 0.285 |
| 5 | Model 2 + covariates | 0.324 (0.308, 0.340) | 0.165 (0.152, 0.177) | 0.272 (0.195, 0.337) | 0.514 | 0.197 | 0.194 |
| 6 | Model 3 + covariates | 0.315 (0.302, 0.329) | 0.179 (0.167, 0.193) | 0.303 (0.221, 0.380) | 0.498 | 0.224 | 0.243 |
| Autumn | |||||||
| 1 | Lag 0 | 0.295 (0.283, 0.307) | 0.098 (0.092, 0.104) | 0.061 (−0.023, 0.141) | 0.348 | 0.078 | 0.152 |
| 2 | Lag 1 + lag 2 | 0.298 (0.282, 0.325) | 0.132 (0.121, 0.146) | 0.036 (−0.220, 0.155) | 0.326 | 0.107 | 0.259 |
| 3 | Lag 0 + lag 1 + lag 2 | 0.292 (0.276, 0.311) | 0.152 (0.140, 0.164) | 0.072 (−0.087, 0.195) | 0.316 | 0.172 | 0.300 |
| 4 | Model 1 + covariates | 0.301 (0.289, 0.315) | 0.138 (0.128, 0.147) | 0.056 (−0.035, 0.143) | 0.333 | 0.109 | 0.232 |
| 5 | Model 2 + covariates | 0.295 (0.281, 0.312) | 0.162 (0.151, 0.175) | 0.098 (−0.031, 0.205) | 0.314 | 0.172 | 0.320 |
| 6 | Model 3 + covariates | 0.296 (0.282, 0.311) | 0.179 (0.167, 0.192) | 0.096 (−0.029, 0.199) | 0.329 | 0.159 | 0.250 |
RMSE, root mean square error.
MPSE, mean prediction standard error.
CV-R2, cross-validation R2.
Data represent the median (interquartile range).
Covariates include additional environmental variables described in Table S1 in the supplemental material.
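The error metrics in the abundance tables can be read with a small sketch of RMSE and a cross-validation R2. The R2 form used here, one minus the ratio of squared prediction error to squared deviation from the held-out mean, is an assumption about the authors' definition. Under this form, a negative value means the predictions did worse than simply predicting the mean of the held-out samples.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def cv_r2(observed, predicted):
    """Cross-validation R2: 1 - SSE / SST on held-out data (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Toy held-out values (illustrative only, not data from the study).
obs = [1.0, 2.0, 3.0]
perfect = cv_r2(obs, [1.0, 2.0, 3.0])      # predictions match exactly
mean_only = cv_r2(obs, [2.0, 2.0, 2.0])    # predicting the mean everywhere
reversed_fit = cv_r2(obs, [3.0, 2.0, 1.0]) # worse than the mean
```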
FIG 3 Geographic variation of classification performance for summer cross-validation (A), autumn cross-validation (B), summer 2010 forecast (C), and autumn 2010 forecast (D). Model 3 (index + lagged) results are shown for each map. Unsampled sites in 2010 are not displayed.