Yaoqin Lu¹, Huan Yan², Lijiang Zhang³, Jiwen Liu¹.
Abstract
Occupational disease is a major problem in China, and many workers are at risk. Accurate forecasting of occupational disease incidence can provide critical information for prevention and control. In this study, five hybrid models were therefore assessed for their effectiveness and applicability in predicting the incidence of occupational diseases in China. The five hybrid models combine grey models (EGM, ODGM, EDGM, DGM, and Verhulst) with five state-of-the-art machine learning models: k-nearest neighbours (KNN), support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), and artificial neural network (ANN). Model quality was assessed by prediction accuracy, with a lower mean absolute percentage error (MAPE) and root-mean-squared error (RMSE) indicating a better model. The GM-ANN model gave the most precise predictions of all the models, with the lowest MAPE (3.49%) and RMSE (1076.60). The GM-ANN model can therefore be used for precise prediction of occupational diseases in China and may provide valuable information for their future prevention and control.
Year: 2019 PMID: 31662788 PMCID: PMC6791229 DOI: 10.1155/2019/8159506
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. The incidence of occupational diseases in China from 2005 to 2017. The dashed line marks the first 2/3 of the data, used as the training set, and the solid line marks the last 1/3, used as the testing set. The y-axis shows the number of occupational disease cases and the x-axis shows the year.
The models, programming languages, libraries, and parameter adjustments used in this study.
| Models | Programming language | Libraries | Parameters |
|---|---|---|---|
| GM | R (version 3.6.1) | Self-compiled function | EGM, ODGM, EDGM, DGM, Verhulst |
| KNN | R (version 3.6.1) | kknn (version 1.3.1), caret (version 6.0-81) | train.kknn(), kernel = inv |
| SVM | R (version 3.6.1) | e1071 (version 1.8-8) | kernel |
| RF | R (version 3.6.1) | randomForest (version 4.6-14) | mtry = 1, ntree |
| GBM | R (version 3.6.1) | xgboost (version 0.82.1) | nrounds, colsample_bytree, min_child_weight, eta, gamma, subsample, max_depth |
| ANN | R (version 3.6.1) | nnet (version 7.3-12) | size, decay |
Figure 2. Flowchart of the hybrid method.
The fitted values of GM models.
| Year | Number of occupational disease cases | EGM | EDGM | ODGM | DGM | Verhulst |
|---|---|---|---|---|---|---|
| 2005 | 12212 | 12212 | 12212 | 12212 | 12212 | 12212 |
| 2006 | 11805 | 14255 | 14268 | 14136 | 14415 | 14677 |
| 2007 | 14296 | 15805 | 15821 | 15700 | 15954 | 17261 |
| 2008 | 13744 | 17523 | 17543 | 17438 | 17658 | 19855 |
| 2009 | 18128 | 19429 | 19452 | 19368 | 19544 | 22345 |
| 2010 | 27240 | 21541 | 21569 | 21511 | 21631 | 24638 |
| 2011 | 29879 | 23883 | 23917 | 23892 | 23941 | 26668 |
| 2012 | 27420 | 26480 | 26519 | 26536 | 26498 | 28404 |
| 2013 | 26393 | 29359 | 29406 | 29473 | 29328 | 29845 |
| 2014 | 29972 | 32552 | 32606 | 32735 | 32460 | 31012 |
| 2015 | 27389 | 36091 | 36155 | 36358 | 35926 | 32663 |
| 2016 | 29838 | 40015 | 40089 | 40382 | 39763 | 33222 |
| 2017 | 25114 | 44366 | 44452 | 44851 | 44009 | 33649 |
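The source does not spell out the grey-model equations. As a sketch, assuming the textbook forms, the even grey model GM(1,1) (EGM) fits x0(k) + a·z1(k) = b by least squares, where x1 is the accumulated series and z1(k) is the mean of neighbouring accumulated values, and the discrete grey model DGM(1,1) fits the recursion x1(k+1) = β1·x1(k) + β2. The helper names `gm11`, `dgm11`, `_ols` and `_inverse_ago` are hypothetical, and the 2005 to 2014 training window is an inference from the EGM rows of the accuracy table, not something the source states. Under those assumptions the printed values should agree with the EGM and DGM columns above to within about 0.1%; ODGM, EDGM and Verhulst are not sketched here.

```python
import math
from itertools import accumulate


def _ols(x, y):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def _inverse_ago(x1_hat, first):
    """Undo the accumulation: keep the first value, then take successive differences."""
    return [first] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x1_hat))]


def gm11(x0, horizon=0):
    """Even GM(1,1) (assumed textbook form): fitted values plus `horizon` forecasts."""
    x1 = list(accumulate(x0))                                     # 1-AGO series
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]   # background values
    slope, b = _ols(z1, x0[1:])                                   # x0(k) = -a * z1(k) + b
    a = -slope
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a          # time-response function
              for k in range(len(x0) + horizon)]
    return _inverse_ago(x1_hat, x0[0])


def dgm11(x0, horizon=0):
    """Discrete grey model DGM(1,1) (assumed form): x1(k+1) = beta1 * x1(k) + beta2."""
    x1 = list(accumulate(x0))
    beta1, beta2 = _ols(x1[:-1], x1[1:])
    c = beta2 / (1 - beta1)                                       # fixed point of the recursion
    x1_hat = [beta1 ** k * (x0[0] - c) + c for k in range(len(x0) + horizon)]
    return _inverse_ago(x1_hat, x0[0])


# Annual occupational disease cases in China, 2005-2017 (from the fitted-values table).
cases = [12212, 11805, 14296, 13744, 18128, 27240, 29879,
         27420, 26393, 29972, 27389, 29838, 25114]
train = cases[:10]             # assumed training window: 2005-2014

egm = gm11(train, horizon=3)   # 10 fitted values, then forecasts for 2015-2017
dgm = dgm11(train, horizon=3)
for year, e, d in zip(range(2005, 2018), egm, dgm):
    print(year, round(e), round(d))
```

Solving the two-parameter least-squares problem in closed form keeps the sketch dependency-free; a matrix solver such as `numpy.linalg.lstsq` would give the same estimates.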
Figure 3. Comparison of the actual and fitted curves of the different grey models for occupational diseases in China.
Accuracy of GM models.
| Model | ME | RMSE | MAE | MPE | MAPE |
|---|---|---|---|---|---|
| EGM_training | −194.98 | 3301.76 | 2721.9 | −4.14 | 13.02 |
| EGM_testing | −12710.28 | 13539.22 | 12710.28 | −47.51 | 47.51 |
| EDGM_training | −222.41 | 3303.21 | 2729.17 | −4.26 | 13.07 |
| EDGM_testing | −12785.03 | 13612.4 | 12785.03 | −47.79 | 47.79 |
| ODGM_training | −191.14 | 3303.79 | 2711.09 | −3.99 | 12.85 |
| ODGM_testing | −13083.32 | 13918.45 | 13083.32 | −48.89 | 48.89 |
| DGM_training | −255.17 | 3305.4 | 2748.96 | −4.56 | 13.32 |
| DGM_testing | −12452.21 | 13271.52 | 12452.21 | −46.56 | 46.56 |
| Verhulst_training | −1582.72 | 3212.63 | 2745.38 | −11.26 | 15.32 |
| Verhulst_testing | −5730.92 | 6113.1 | 5730.92 | −21.53 | 21.53 |
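The five accuracy measures can be computed directly from paired actual and predicted values. The minimal sketch below assumes error = actual − predicted, the convention implied by the negative ME and MPE of the over-predicting grey models, and applies it to the rounded 2015 to 2017 EGM values from the fitted-values table. The function name `accuracy` is hypothetical, and the second decimal can differ slightly from the published EGM_testing row because the tabulated fitted values are rounded to whole cases.

```python
import math


def accuracy(actual, predicted):
    """ME, RMSE, MAE, MPE and MAPE, with error = actual - predicted (MPE/MAPE in %)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    pct = [100 * e / a for e, a in zip(errors, actual)]   # percentage errors
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
    }


# Testing period 2015-2017: reported cases vs. rounded EGM values from the fitted-values table.
actual = [27389, 29838, 25114]
egm_forecast = [36091, 40015, 44366]
for name, value in accuracy(actual, egm_forecast).items():
    print(f"{name}: {value:.2f}")
```

Because every EGM forecast exceeds the reported count, ME equals −MAE and MPE equals −MAPE, which is exactly the pattern in the EGM_testing row.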
Figure 4. Comparison of the actual and fitted curves of the GM-KNN models.
Figure 5. Comparison of the actual and fitted curves of the GM-SVM models.
Figure 6. Comparison of the actual and fitted curves of the hybrid models.
Prediction accuracy of hybrid models.
| Models | Parameter | Group | ME | RMSE | MAE | MPE | MAPE |
|---|---|---|---|---|---|---|---|
| GM-KNN | | Training | 240.70 | 1197.26 | 556.50 | 0.74 | 2.32 |
| | | Testing | 5634.33 | 9151.44 | 6487.00 | 20.17 | 23.57 |
| | kernel = inv | Training | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | Testing | 6305.41 | 9155.53 | 7479.90 | 22.21 | 26.89 |
| GM-SVM | kernel = linear | Training | 1055.45 | 3388.72 | 2422.38 | 1.71 | 11.25 |
| | | Testing | −7738.91 | 8587.02 | 7738.91 | −29.16 | 29.16 |
| | kernel = polynomial | Training | 731.33 | 2742.63 | 1970.80 | 1.33 | 8.94 |
| | | Testing | 280.26 | 1573.30 | 1280.50 | 0.78 | 4.45 |
| | kernel = radial | Training | −11.44 | 863.23 | 805.92 | −1.04 | 4.43 |
| | | Testing | 3964.53 | 4693.51 | 3964.53 | 14.10 | 14.10 |
| | kernel = sigmoid | Training | 1333.48 | 5934.06 | 3859.30 | 4.08 | 17.64 |
| | | Testing | −2810.06 | 3422.28 | 2810.06 | −10.79 | 10.79 |
| GM-RF | mtry = 1, ntree = 30 | Training | 212.67 | 1317.38 | 1174.73 | −0.45 | 6.02 |
| | | Testing | −804.74 | 2090.13 | 1862.25 | −3.44 | 6.99 |
| GM-GBM | nrounds = 100, colsample_bytree = 1, min_child_weight = 1, eta = 0.1, max_depth = 3, subsample = 0.5, gamma = 0.5 | Training | 5.27 | 418.30 | 365.87 | −0.23 | 1.86 |
| | | Testing | −1833.39 | 2661.27 | 2205.13 | −7.21 | 8.45 |
| GM-ANN | size = 5, decay = 1 | Training | −3.29 | | 12.03 | −0.01 | |
| | | Testing | −222.60 | 1076.60 | 914.97 | −1.04 | 3.49 |
Note. ME: mean error; RMSE: root-mean-squared error; MAE: mean absolute error; MPE: mean percentage error; MAPE: mean absolute percentage error. MPE and MAPE are given in percent.