Toluwalope Ajayi, Rozita Dara, Zvonimir Poljak.
Abstract
Porcine Epidemic Diarrhea Virus (PEDV) emerged in North America in 2013. The first case of PEDV in Canada was identified on an Ontario farm in January 2014. Surveillance was instrumental in identifying the initial case and in minimizing the spread of the virus to other farms. With recent advances in predictive analytics showing promise for health and disease forecasting, the primary objective of this study was to apply machine learning predictive methods (random forest, artificial neural networks, and classification and regression trees) to provincial PEDV incidence data, and in so doing determine their accuracy for predicting future PEDV trends. Trend was defined as the cumulative number of new cases over a four-week interval, and consisted of four levels (zero, low, medium and high). Provincial PEDV incidence and prevalence estimates from an industry database, as well as temperature, humidity, and precipitation data, were combined to create the forecast dataset. With 10-fold cross validation performed on the entire dataset, the overall accuracy was 0.68 (95% CI: 0.60 - 0.75), 0.57 (95% CI: 0.49 - 0.64), and 0.55 (95% CI: 0.47 - 0.63) for the random forest, artificial neural network, and classification and regression tree models, respectively. Based on the cross-validation approach to evaluating predictive accuracy, the random forest model provided the best prediction.
Keywords: Artificial neural networks; Classification and regression trees; Disease forecasting; Disease surveillance; Porcine epidemic diarrhea; Random forest
Year: 2019 PMID: 30771890 PMCID: PMC7125872 DOI: 10.1016/j.prevetmed.2019.01.005
Source DB: PubMed Journal: Prev Vet Med ISSN: 0167-5877 Impact factor: 2.670
PEDV trend classification performance on the training dataset for Ontario PEDV and weather data from January 2014 - April 2017 (171 observations - 70% of the dataset allocated for model training and 30% allocated for model testing).
| Reference | Predicted: High | Predicted: Medium | Predicted: Low | Predicted: Zero | Kappa | Kappa 95% CI | Overall Accuracy | Accuracy 95% CI |
|---|---|---|---|---|---|---|---|---|
| training set performance - random forest model with 30 co-variates | ||||||||
| High | 9 | 5 | 0 | 0 | 0.50 | 0.38 – 0.62 | 0.64 | 0.55 – 0.72 |
| Medium | 3 | 17 | 7 | 3 | ||||
| Low | 0 | 7 | 23 | 9 | ||||
| Zero | 0 | 2 | 8 | 29 | ||||
| training set performance - artificial neural network model | ||||||||
| High | 12 | 1 | 1 | 0 | 0.88 | 0.80 – 0.95 | 0.91 | 0.84 – 0.95 |
| Medium | 0 | 28 | 2 | 0 | ||||
| Low | 0 | 1 | 35 | 3 | ||||
| Zero | 0 | 0 | 3 | 36 | ||||
| training set performance - classification and regression tree model | ||||||||
| High | 11 | 0 | 3 | 0 | 0.73 | 0.63 – 0.82 | 0.80 | 0.72 – 0.87 |
| Medium | 1 | 22 | 6 | 1 | ||||
| Low | 2 | 4 | 33 | 0 | ||||
| Zero | 0 | 3 | 4 | 32 | ||||
PEDV trend classification performance on the test dataset for Ontario PEDV and weather data from January 2014 - April 2017 (171 observations - 70% of the dataset allocated for model training and 30% allocated for model testing).
| Reference | Predicted: High | Predicted: Medium | Predicted: Low | Predicted: Zero | Kappa | Kappa 95% CI | Overall Accuracy | Accuracy 95% CI |
|---|---|---|---|---|---|---|---|---|
| test set performance - random forest model with 30 co-variates | ||||||||
| High | 3 | 1 | 1 | 0 | 0.60 | 0.38 – 0.62 | 0.71 | 0.57 – 0.83 |
| Medium | 1 | 9 | 2 | 0 | ||||
| Low | 1 | 0 | 11 | 4 | ||||
| Zero | 0 | 1 | 3 | 12 | ||||
| test set performance - artificial neural network model | ||||||||
| High | 4 | 0 | 1 | 0 | 0.66 | 0.49 – 0.82 | 0.76 | 0.61 – 0.87 |
| Medium | 1 | 7 | 2 | 2 | ||||
| Low | 1 | 0 | 12 | 3 | ||||
| Zero | 0 | 0 | 2 | 14 | ||||
| test set performance - classification and regression tree model | ||||||||
| High | 2 | 2 | 1 | 0 | 0.23 | *0.00 – 0.35 | 0.45 | 0.31 – 0.60 |
| Medium | 2 | 4 | 3 | 3 | ||||
| Low | 1 | 8 | 4 | 3 | ||||
| Zero | 0 | 5 | 2 | 9 | ||||
*Lower confidence limit truncated to 0.
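The overall accuracy and kappa columns can be recomputed from the confusion-matrix counts. The sketch below is a minimal illustration, not the authors' code: it assumes an unweighted Cohen's kappa and uses the random forest test-set counts from the table above, with rows as reference classes and columns as predicted classes. The function names are hypothetical.

```python
# Rows are reference classes, columns are predicted classes,
# both ordered High, Medium, Low, Zero.
# Counts copied from the random forest test-set block of the table.
CONFUSION = [
    [3, 1, 1, 0],
    [1, 9, 2, 0],
    [1, 0, 11, 4],
    [0, 1, 3, 12],
]


def overall_accuracy(matrix):
    """Share of observations on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Unweighted Cohen's kappa (assumed variant): agreement beyond chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement implied by the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


print(f"accuracy = {overall_accuracy(CONFUSION):.2f}")  # accuracy = 0.71
print(f"kappa    = {cohen_kappa(CONFUSION):.2f}")       # kappa    = 0.60
```

Both values agree with the 0.71 accuracy and 0.60 kappa reported for this block, which also confirms that the table rows are the reference classes.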
PEDV trend classification performance for Ontario PEDV and weather data from January 2014 - April 2017 (171 observations), with randomly allocated training and test sets (10-fold cross validation).
| Reference | Predicted: High | Predicted: Medium | Predicted: Low | Predicted: Zero | Kappa | Kappa 95% CI | Overall Accuracy | Accuracy 95% CI |
|---|---|---|---|---|---|---|---|---|
| 10-fold cross validation performance - random forest model with 30 co-variates | ||||||||
| High | 11 | 7 | 1 | 0 | 0.55 | 0.45 – 0.65 | 0.68 | 0.60 – 0.75 |
| Medium | 2 | 33 | 5 | 2 | ||||
| Low | 1 | 9 | 34 | 11 | ||||
| Zero | 0 | 1 | 16 | 38 | ||||
| 10-fold cross validation performance - artificial neural network model | ||||||||
| High | 9 | 6 | 1 | 3 | 0.39 | 0.29 – 0.49 | 0.57 | 0.49 – 0.64 |
| Medium | 3 | 21 | 9 | 9 | ||||
| Low | 1 | 8 | 25 | 21 | ||||
| Zero | 0 | 2 | 11 | 42 | ||||
| 10-fold cross validation performance - classification tree model with 30 co-variates | ||||||||
| High | 9 | 7 | 2 | 1 | 0.38 | 0.27 – 0.48 | 0.55 | 0.47 – 0.63 |
| Medium | 2 | 25 | 11 | 4 | ||||
| Low | 3 | 13 | 29 | 10 | ||||
| Zero | 1 | 7 | 16 | 31 | ||||
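The confidence interval on overall accuracy can be approximated from the counts alone. The source does not state which interval method was used, so the sketch below assumes an exact (Clopper-Pearson) binomial interval. It uses the random forest cross-validation result, where the diagonal sums to 116 correct classifications out of 171. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_decreasing(func, target, steps=60):
    """Find p in (0, 1) with func(p) == target, for func decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid  # func is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided binomial interval (assumed method, not confirmed by the source)."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower limit solves P(X >= successes | p) = alpha / 2.
        lower = bisect_decreasing(
            lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper limit solves P(X <= successes | p) = alpha / 2.
        upper = bisect_decreasing(lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


correct = 11 + 33 + 34 + 38  # diagonal of the random forest block
lower, upper = clopper_pearson(correct, 171)
print(f"{correct / 171:.2f} (95% CI: {lower:.2f} - {upper:.2f})")
```

Under this assumption the interval rounds to the 0.60 - 0.75 reported for the random forest model.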
PEDV trend classification diagnostics for Ontario PEDV and weather data from January 2014 - April 2017, for the 30% test set and randomly allocated training and test sets (10-fold cross validation).
| Diagnostic | High | Medium | Low | Zero |
|---|---|---|---|---|
| test set diagnostics – random forest model with 30 co-variates | ||||
| Sensitivity | 0.60 | 0.75 | 0.69 | 0.75 |
| Specificity | 0.95 | 0.95 | 0.82 | 0.88 |
| test set diagnostics – artificial neural network model | ||||
| Sensitivity | 0.80 | 0.58 | 0.75 | 0.88 |
| Specificity | 0.95 | 1.00 | 0.85 | 0.85 |
| test set diagnostics – classification tree model | ||||
| Sensitivity | 0.40 | 0.25 | 0.50 | 0.56 |
| Specificity | 0.93 | 0.81 | 0.67 | 0.82 |
| 10-fold cross validation diagnostics – random forest model with 30 co-variates | ||||
| Sensitivity | 0.58 | 0.79 | 0.62 | 0.69 |
| Specificity | 0.98 | 0.87 | 0.81 | 0.89 |
| 10-fold cross validation diagnostics – artificial neural network model | ||||
| Sensitivity | 0.47 | 0.50 | 0.46 | 0.76 |
| Specificity | 0.97 | 0.88 | 0.82 | 0.72 |
| 10-fold cross validation diagnostics – classification tree model with 30 co-variates | ||||
| Sensitivity | 0.47 | 0.60 | 0.53 | 0.56 |
| Specificity | 0.96 | 0.79 | 0.75 | 0.87 |
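The per-level diagnostics follow from treating each trend level as a one-versus-rest problem on the confusion matrix. The sketch below is illustrative only. It uses the random forest test-set counts (rows as reference classes, columns as predictions), and the function name is hypothetical.

```python
LEVELS = ["High", "Medium", "Low", "Zero"]

# Random forest, 30% test set: rows = reference, columns = predicted.
CONFUSION = [
    [3, 1, 1, 0],
    [1, 9, 2, 0],
    [1, 0, 11, 4],
    [0, 1, 3, 12],
]


def sensitivity_specificity(matrix, k):
    """One-vs-rest sensitivity and specificity for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # class k observed, other level predicted
    fp = sum(row[k] for row in matrix) - tp   # class k predicted, other level observed
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


for k, level in enumerate(LEVELS):
    sens, spec = sensitivity_specificity(CONFUSION, k)
    print(f"{level:<6} sensitivity={sens:.2f} specificity={spec:.2f}")
# High   sensitivity=0.60 specificity=0.95
# Medium sensitivity=0.75 specificity=0.95
# Low    sensitivity=0.69 specificity=0.82
# Zero   sensitivity=0.75 specificity=0.88
```

These values match the first block of the diagnostics table, which identifies that block as the random forest test-set result.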
Fig. 1. Boxplots of sensitivity and specificity of PEDV trend classification across models, with 10-fold cross validation applied to the Ontario PEDV and weather dataset for January 2014 - April 2017.
Fig. 2. Variable importance plot for the random forest classification model with 30 co-variates (PEDV long-term prediction), as applied to the Ontario PEDV and weather dataset for January 2014 - April 2017. The x-axis shows the decrease in predictive accuracy when a given variable is omitted from the random forest model; longer bars indicate a larger loss in accuracy and therefore a variable of higher importance for predicting trends in the number of new cases. Variables are colored according to whether they relate to environmental factors or to the level of PEDV infection.