| Literature DB >> 35029707 |
Abstract
Lumpy skin disease virus (LSDV) causes an infectious disease in cattle. Because transmission depends on the survival of arthropod vectors, geospatial and climatic features play a vital role in the epidemiology of the disease. The objective of this study was to assess the ability of several machine learning algorithms to forecast the occurrence of LSDV infection based on meteorological and geospatial attributes. Initially, the ExtraTreesClassifier algorithm was used to select, from among meteorological, animal population density, dominant land cover, and elevation attributes, the features most predictive of disease occurrence in unseen (test) data. Several machine learning techniques achieved high accuracy in predicting LSDV occurrence in test data (up to 97%). In terms of area under the curve (AUC) and F1 scores, the artificial neural network (ANN) algorithm outperformed the other machine learning methods in predicting the occurrence of LSDV infection in unseen data, with values of 0.97 and 0.94, respectively. With this algorithm, the model that included all predictive features and the one restricted to the important meteorological attributes showed similar predictive performance. According to the findings of this research, ANN can forecast the occurrence of LSDV infection with high precision using geospatial and meteorological parameters. The forecasting power of these methods could be a great help in conducting screening and awareness programs, as well as in taking preventive measures such as vaccination in areas at high risk of LSDV infection.
Keywords: Forecasting; Geospatial features; Lumpy skin disease; Machine learning techniques; Meteorological parameters
Year: 2022 PMID: 35029707 PMCID: PMC8759057 DOI: 10.1007/s11250-022-03073-2
Source DB: PubMed Journal: Trop Anim Health Prod ISSN: 0049-4747 Impact factor: 1.559
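The abstract describes an initial feature-selection step with ExtraTreesClassifier, keeping only the attributes most predictive of LSDV occurrence. A minimal sketch of that step, using synthetic stand-in data rather than the study's meteorological/geospatial dataset (the 24 features and threshold are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the study's 24 meteorological/geospatial attributes
X, y = make_classification(n_samples=500, n_features=24, n_informative=10,
                           random_state=0)

# Fit the tree ensemble, then keep only features whose importance
# is at least the mean importance across all features
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    threshold="mean")
X_selected = selector.fit_transform(X, y)
print(X_selected.shape[1], "features retained out of", X.shape[1])
```

The retained columns would then be the inputs to the downstream classifiers.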
Fig. 1Summary of steps taken in the materials and methods section
Fig. 2The distribution of reported LSDV infection points during 2011–2021
Fig. 3Reported LSDV infection outbreaks in each year during 2011–2021
Most important parameter values after hyperparameter tuning for model 1
| Machine learning algorithm | Tuned parameters |
|---|---|
| Logistic regression | class_weight = {0: 50, 1: 50}, penalty = 'l1', solver = 'liblinear' |
| Support vector machine | kernel = 'poly', degree = 5, coef0 = 1, gamma = 'scale', class_weight = {0: 50, 1: 50} |
| Decision Tree | splitter = 'best', class_weight = {0: 25, 1: 75}, criterion = 'entropy', max_depth = 14 |
| Random forest (decision tree as base estimator) | n_estimators = 5000, min_samples_split = 2, bootstrap = True, max_leaf_nodes = 200, class_weight = {0: 30, 1: 70}, criterion = 'entropy', max_depth = 14 |
| AdaBoost (decision tree as base estimator) | n_estimators = 1000, algorithm = 'SAMME.R', learning_rate = 0.1 |
| Bagging (decision tree as base estimator) | warm_start = True, oob_score = False, n_estimators = 100, max_samples = 1000, max_features = 10, bootstrap = False |
| XGBoost | objective = 'binary:logistic', max_depth = 10, colsample_bytree = 1, eta = 0.01, gamma = 2, min_child_weight = 0.1, subsample = 0.6 |
| Artificial neural networks | Input dimension = 24, Total number of neurons = 70, number of hidden layers = 0, EarlyStopping(patience = 10), activation = "relu", solver = "adam", learning rate = 0.001, loss = "binary_crossentropy", epochs = 200 |
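The ANN row above lists Keras-style settings (binary cross-entropy loss, EarlyStopping with patience 10). An approximate scikit-learn analogue, on synthetic data, is sketched below; note that `MLPClassifier` handles early stopping and the loss internally, so this mirrors the architecture and optimizer settings rather than reproducing the authors' exact implementation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data with 24 input features,
# matching model 1's input dimension
X, y = make_classification(n_samples=600, n_features=24, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# One layer of 70 ReLU units, Adam at learning rate 0.001,
# up to 200 epochs with early stopping (patience ~ n_iter_no_change=10)
clf = MLPClassifier(hidden_layer_sizes=(70,), activation="relu",
                    solver="adam", learning_rate_init=0.001,
                    max_iter=200, early_stopping=True,
                    n_iter_no_change=10, random_state=1)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```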
Most important parameter values after hyperparameter tuning for model 2
| Machine learning algorithm | Tuned parameters |
|---|---|
| Logistic regression | max_iter = 1000, class_weight = {0: 50, 1: 50}, penalty = 'l1', solver = 'liblinear' |
| Support vector machine | kernel = 'rbf', coef0 = 0, gamma = 'scale', class_weight = {0: 50, 1: 50} |
| Decision tree | splitter = 'best', class_weight = {0: 30, 1: 70}, criterion = 'entropy', max_depth = 13 |
| Random forest (decision tree as base estimator) | n_estimators = 5000, bootstrap = True, max_leaf_nodes = 300, class_weight = {0: 20, 1: 80}, criterion = 'entropy' |
| AdaBoost (decision tree as base estimator) | n_estimators = 700, algorithm = 'SAMME.R', learning_rate = 0.1 |
| Bagging (decision tree as base estimator) | warm_start = False, oob_score = True, n_estimators = 200, max_samples = 1000, max_features = 10, bootstrap = True |
| XGBoost | objective = 'binary:logistic', max_depth = 15, colsample_bytree = 1, eta = 0.2, gamma = 1.5, min_child_weight = 1, subsample = 0.6 |
| Artificial neural networks | Input dimension = 10, Total number of neurons = 80, number of hidden layers = 0, EarlyStopping(patience = 10), activation = "relu", solver = "adam", learning rate = 0.001, loss = "binary_crossentropy", epochs = 200 |
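The tables above report the winning settings from a hyperparameter search. A minimal sketch of such a search with `GridSearchCV`, shown here for the SVM (the grid values are illustrative, drawn from the kernel and class-weight options that differ between the two models; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced synthetic data, loosely mimicking rare outbreak points
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=2)

# Small grid over the SVM settings that vary between models 1 and 2
param_grid = {
    "kernel": ["rbf", "poly"],
    "class_weight": [{0: 50, 1: 50}, {0: 30, 1: 70}],
}
search = GridSearchCV(SVC(gamma="scale"), param_grid, scoring="f1", cv=3)
search.fit(X, y)
print("best parameters:", search.best_params_)
```

Scoring on F1 (rather than accuracy) is the usual choice when, as here, the positive class is rare.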
Comparative performance of various machine learning algorithms using two sets of predictors
| Model | Metric | Logistic regression | Support vector machine | Decision tree | Random forest | AdaBoost | Bagging | XGBoost | Artificial neural networks |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | Accuracy | 0.93 | 0.94 | 0.89 | 0.96 | 0.88 | 0.96 | 0.94 | 0.96 |
| | Precision | 0.85 | 0.77 | 0.57 | 0.89 | 0.67 | 0.89 | 0.91 | 0.88 |
| | Recall | 0.48 | 0.66 | 0.34 | 0.71 | 0.07 | 0.74 | 0.50 | 1 |
| | F1 score | 0.61 | 0.71 | 0.43 | 0.79 | 0.13 | 0.81 | 0.65 | 0.94 |
| | AUC | 0.73 | 0.82 | 0.65 | 0.85 | 0.53 | 0.86 | 0.75 | 0.97 |
| Model 2 | Accuracy | 0.92 | 0.96 | 0.90 | 0.93 | 0.91 | 0.95 | 0.92 | 0.97 |
| | Precision | 0.84 | 0.89 | 0.63 | 0.92 | 0.89 | 0.90 | 0.91 | 0.88 |
| | Recall | 0.41 | 0.73 | 0.37 | 0.45 | 0.27 | 0.63 | 0.39 | 1 |
| | F1 score | 0.55 | 0.80 | 0.46 | 0.61 | 0.42 | 0.74 | 0.54 | 0.94 |
| | AUC | 0.70 | 0.86 | 0.67 | 0.72 | 0.63 | 0.81 | 0.69 | 0.97 |
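The five metrics in the table can all be computed with `sklearn.metrics`. A small worked example on toy labels (the values below are illustrative, not the study's predictions); note that AUC is computed from continuous scores, while the other four use thresholded labels:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy ground truth and model scores standing in for test-set output
y_true  = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.2, 0.6, 0.4, 0.8, 0.9, 0.7]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.75
print("precision:", precision_score(y_true, y_pred))  # 0.75
print("recall   :", recall_score(y_true, y_pred))     # 0.75
print("f1       :", f1_score(y_true, y_pred))         # 0.75
print("AUC      :", roc_auc_score(y_true, y_score))   # 0.9375 (uses scores)
```

The large gaps between accuracy and recall in the table (e.g. AdaBoost's 0.88 accuracy but 0.07 recall for model 1) illustrate why F1 and AUC are the more informative metrics on imbalanced outbreak data.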
Fig. 4Receiver operating characteristic (ROC) curves of various machine learning algorithms for model 1 (including all predictors)
Fig. 5Receiver operating characteristic (ROC) curves of various machine learning algorithms for model 2 (including only predictive meteorological variables)
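The ROC curves in Figs. 4 and 5 trace the false-positive rate against the true-positive rate as the classification threshold varies, and the reported AUC is the area under that curve. A minimal sketch of how such a curve is computed, on toy scores rather than the study's predictions:

```python
from sklearn.metrics import auc, roc_curve

# Toy ground truth and classifier scores
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]

# One (fpr, tpr) point per score threshold; auc() integrates the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc(fpr, tpr))
```

Plotting `fpr` against `tpr` for each algorithm on shared axes reproduces the style of Figs. 4 and 5.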