Terence Fusco, Yaxin Bi, Haiying Wang, Fiona Browne.
Abstract
This research presents viable solutions for prediction modelling of schistosomiasis disease based on vector density. The novel training models proposed in this work address several problems in applied artificial intelligence: data imputation, semi-supervised labelling and synthetic instance simulation when training data are sparse. Innovative semi-supervised ensemble learning paradigms are proposed, focusing on labelling threshold selection and the stringency of classification confidence levels. A regression-correlation combination (RCC) data imputation method is also introduced for handling partially complete training data. The proposed RCC method improved imputation precision over benchmark value replacement in 70% of test cases. The proposed incremental transductive models, such as ITSVM, produced results that depend on the threshold constraints applied; they outperformed a standard SVM in 21% of test cases and can be applied to other environment-based epidemic disease domains. The proposed incremental transductive ensemble approach combines complementary algorithms to label unlabelled vector density instances. The liberal (LTA) and strict (STA) training approaches gave varied results, with LTA outperforming a Stacking ensemble in 29.1% of test cases. The proposed synthetic minority over-sampling technique (SMOTE) equilibrium approach yielded subtle increases in classification performance, which can be investigated further to assess how classification performance and efficiency relate to synthetic instance generation. © Springer-Verlag GmbH Germany, part of Springer Nature 2019.
Keywords: Data imputation; Disease prediction modelling; Incremental transductive approaches; SMOTE; Schistosomiasis; Synthetic data simulation
Year: 2019 PMID: 33727985 PMCID: PMC7224118 DOI: 10.1007/s13042-019-01029-x
Source DB: PubMed Journal: Int J Mach Learn Cybern
Fig. 1 Prediction model flowchart
Fig. 2 EO data sample
Common environment attributes
| Attribute | Name | Description | Reason For use |
|---|---|---|---|
| TC_B | Tasselled Cap Brightness | The Brightness value of a pixel in an image | (Band 1) Measure of soil |
| TC_G | Tasselled Cap Greenness | The Greenness value of a pixel in an image | (Band 2) Measure of vegetation |
| TC_W | Tasselled Cap Wetness | The Wetness value of a pixel in an image | (Band 3) Interrelationship of soil and canopy moisture |
| MNDWI | Modified NDWI | Modified NDWI uses (MIR) middle infra-red instead of (NIR) near infra-red remote sensing | Modified NDWI can enhance open water features while reducing land/vegetation/soil noise |
| NDMI | Normalised-Difference Moisture Index | Moisture Index | Used to assess whether the target being observed contains much soil moisture |
| NDVI | Normalised-Difference Vegetation Index | Vegetation Index | Used to analyse whether the target being observed contains live green vegetation |
| NDWI | Normalised-Difference Water Index | Water Index | Water index that uses near infra-red band. The image is not modified and can include many other factors which can confuse the reading |
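
The four normalised-difference attributes above share one arithmetic form, (a - b) / (a + b), and differ only in which spectral bands are paired. The sketch below uses the commonly cited band pairings (NDVI from NIR and red, NDWI from green and NIR, MNDWI from green and MIR, NDMI from NIR and MIR); the source does not list the exact bands or sensor used, so the pairings, the function names and the sample reflectances are assumptions for illustration only.

```python
def normalised_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def indices(green: float, red: float, nir: float, mir: float) -> dict:
    """Compute four normalised-difference indices for a single pixel.

    Band pairings are the commonly cited ones, assumed here rather than
    taken from the source.
    """
    return {
        "NDVI": normalised_difference(nir, red),     # live green vegetation
        "NDWI": normalised_difference(green, nir),   # water, NIR-based
        "MNDWI": normalised_difference(green, mir),  # water, MIR replaces NIR
        "NDMI": normalised_difference(nir, mir),     # moisture
    }


# Hypothetical reflectances for a vegetated pixel (not from the paper).
pixel = indices(green=0.08, red=0.05, nir=0.45, mir=0.20)
```

For a vegetated pixel like this one, NDVI is strongly positive while NDWI is negative, which is the contrast the "Reason For use" column relies on.
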
Fig. 3 ITSVM process
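
The abstract describes incremental transductive labelling governed by a confidence threshold. A minimal sketch of that general idea follows: in each round, unlabelled instances whose predicted label meets the threshold are moved into the training set and the model is refitted. This is not the authors' ITSVM. A nearest-centroid classifier stands in for the SVM so the example runs with the standard library alone, and the confidence measure, function names and data are all assumptions.

```python
import math


def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, p):
    """Return (label, confidence); confidence is a normalised inverse distance."""
    weights = {y: 1.0 / (math.dist(c, p) + 1e-9) for y, c in cents.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def incremental_transductive_labelling(X, y, unlabelled, threshold, max_rounds=10):
    """Each round, absorb unlabelled points whose confidence >= threshold."""
    X, y, pool = list(X), list(y), list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(X, y)  # refit the stand-in classifier
        scored = [(p, *predict_with_confidence(cents, p)) for p in pool]
        accepted = [(p, lab) for p, lab, conf in scored if conf >= threshold]
        if not accepted:
            break  # nothing confident enough; stop early
        for p, lab in accepted:
            X.append(p)
            y.append(lab)
        pool = [p for p, lab, conf in scored if conf < threshold]
    return X, y, pool


# Hypothetical two-class data (not from the paper).
X_out, y_out, remaining = incremental_transductive_labelling(
    X=[(0.0, 0.0), (10.0, 10.0)],
    y=["low", "high"],
    unlabelled=[(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)],
    threshold=0.8,
)
```

Raising `threshold` makes labelling stricter: here the ambiguous midpoint `(5.0, 5.0)` is never labelled, whereas a threshold of 0.5 would accept it.
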
Collective data vector classification accuracy
| Test data | NB% | SVM% | J48% | MLP% | AVG% |
|---|---|---|---|---|---|
| Cross-Val | 55.16 | 66.82 | 65.47 | 63.68 | 62.78 |
| 2003 | 59.26 | 85.19 | 85.19 | 88.89 | 79.63 |
| 2005 | 58.70 | 84.78 | 84.78 | 84.78 | 78.26 |
| 2007 | 50 | 54.55 | 54.55 | 54.55 | 53.41 |
| 2008 | 60.87 | 50 | 56.52 | 56.52 | 55.98 |
| 2009 | 41.67 | 66.67 | 66.67 | 66.67 | 60.42 |
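
The AVG% column is consistent with the unweighted mean of the four classifier accuracies in each row. The short check below reproduces two rows of the table; the dictionary layout is only an illustrative choice.

```python
from statistics import mean

# Accuracies copied from the 2003 and 2008 rows of the table above.
rows = {
    "2003": {"NB": 59.26, "SVM": 85.19, "J48": 85.19, "MLP": 88.89},
    "2008": {"NB": 60.87, "SVM": 50.0, "J48": 56.52, "MLP": 56.52},
}

# Unweighted row mean, rounded to two decimals as in the table.
averages = {year: round(mean(acc.values()), 2) for year, acc in rows.items()}
```
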
MRMD results
| Year | NB | SVM | J48 | MLP | LibLin | AVG% |
|---|---|---|---|---|---|---|
| 2003 | 70.3704 | 85.1852 | 81.4815 | 77.7778 | 85.1852 | 80 |
| 2005 | 63.0435 | 84.7826 | 71.7391 | 82.6087 | 84.7826 | 77.39 |
| 2007 | 45.4545 | 54.5455 | 54.5455 | 54.5455 | 45.4545 | 50.91 |
| 2008 | 56.5217 | 50 | 54.3478 | 56.5217 | 47.8261 | 53.04 |
| 2009 | 58.3333 | 66.6667 | 63.3333 | 70 | 66.6667 | 65 |
Fig. 4 Data imputation methods
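
The abstract names a regression-correlation combination (RCC) for imputation but this record does not spell out the procedure. The sketch below shows one plausible reading only, not the authors' algorithm: pick the attribute most strongly correlated with the incomplete one, fit a simple linear regression on the complete rows, and predict the missing values. The function names, column names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def fit_line(xs, ys):
    """Least-squares slope and intercept (assumes xs is not constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def impute(rows, target):
    """Fill None values in `target` from its most correlated attribute."""
    complete = [r for r in rows if None not in r.values()]
    target_vals = [r[target] for r in complete]
    candidates = [c for c in rows[0] if c != target]
    # Correlation step: choose the strongest linear predictor.
    best = max(
        candidates,
        key=lambda c: abs(pearson([r[c] for r in complete], target_vals)),
    )
    # Regression step: fit target ~ best on the complete rows.
    slope, intercept = fit_line([r[best] for r in complete], target_vals)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None and r[best] is not None:
            r[target] = slope * r[best] + intercept
        filled.append(r)
    return filled, best


# Hypothetical records (not from the paper); the last density is missing.
data = [
    {"NDVI": 0.1, "TC_W": 0.5, "density": 1.0},
    {"NDVI": 0.2, "TC_W": 0.1, "density": 2.0},
    {"NDVI": 0.3, "TC_W": 0.4, "density": 3.0},
    {"NDVI": 0.4, "TC_W": 0.2, "density": 4.0},
    {"NDVI": 0.5, "TC_W": 0.3, "density": None},
]
filled, predictor = impute(data, "density")
```
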
Fig. 5 ITEA comparison
Fig. 6 LTA-STA comparative analysis results
SMOTE equilibrium results 2005
| Instances | NB Acc. (%) | NB F-M | J48 Acc. (%) | J48 F-M | SVM Acc. (%) | SVM F-M |
|---|---|---|---|---|---|---|
| 100 | 93.77 | 0.936 | 95.44 | 0.954 | 79.14 | 0.759 |
| 200 | 93.88 | 0.937 | 97.66 | 0.977 | 79.14 | 0.759 |
| 300 | 93.81 | 0.937 | 97.99 | 0.98 | 80.43 | 0.776 |
| 400 | 93.53 | 0.934 | 98.44 | 0.984 | 83.33 | 0.815 |
| 500 | 93.63 | 0.935 | 98.66 | 0.987 | 83.66 | 0.819 |
| 600 | 94.15 | 0.94 | 98.83 | 0.988 | 83.63 | 0.818 |
| 700 | 92.09 | 0.918 | 90.65 | 0.906 | 48.2 | 0.422 |
| 800 | 94.24 | 0.941 | 90.65 | 0.906 | 78.42 | 0.751 |
| 900 | 93.92 | 0.938 | 99.04 | 0.99 | 84.09 | 0.824 |
| 1000 | 94.17 | 0.94 | 99.28 | 0.993 | 84.89 | 0.834 |
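
The table above sweeps the instance count from 100 to 1000 using SMOTE-generated data. As a reminder of how standard SMOTE creates a synthetic instance, the sketch below interpolates between a minority-class point and one of its k nearest minority neighbours. It illustrates plain SMOTE only; the authors' "equilibrium" variant is not described in this record, and the function name, parameters and data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority class on the unit square (not from the paper).
minority = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=20, k=2)
```

Because every synthetic point lies on a segment between two real minority points, the generated data never leave the region the minority class already occupies.
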
Fig. 7 SMOTE classifier results
SMOTE equilibrium results 2007
| Instances | NB Acc. (%) | NB F-M | J48 Acc. (%) | J48 F-M | SVM Acc. (%) | SVM F-M |
|---|---|---|---|---|---|---|
| 100 | 71.21 | 0.685 | 81.82 | 0.817 | 39.18 | 0.35 |
| 200 | 74.24 | 0.711 | 87.12 | 0.871 | 44.85 | 0.35 |
| 300 | 73.23 | 0.699 | 90.91 | 0.909 | 45.7 | 0.365 |
| 400 | 74.62 | 0.715 | 93.56 | 0.936 | 42.03 | 0.42 |
| 500 | 74.96 | 0.717 | 96.23 | 0.962 | 47.76 | 0.406 |
| 600 | 74.97 | 0.718 | 96.48 | 0.965 | 48.05 | 0.412 |
| 700 | 75.43 | 0.726 | 96.75 | 0.968 | 50.29 | 0.44 |
| 800 | 75.76 | 0.731 | 96.97 | 0.97 | 52.78 | 0.47 |
| 900 | 75.93 | 0.733 | 97.56 | 0.976 | 54.55 | 0.497 |
| 1000 | 76.72 | 0.742 | 97.73 | 0.977 | 60.69 | 0.555 |
SMOTE equilibrium results 2008
| Instances | NB Acc. (%) | NB F-M | J48 Acc. (%) | J48 F-M | SVM Acc. (%) | SVM F-M |
|---|---|---|---|---|---|---|
| 100 | 68.12 | 0.673 | 77.54 | 0.773 | 70.71 | 0.691 |
| 200 | 67.39 | 0.666 | 88.41 | 0.885 | 73.74 | 0.726 |
| 300 | 67.39 | 0.666 | 88.89 | 0.889 | 76.1 | 0.754 |
| 400 | 69.2 | 0.685 | 93.48 | 0.935 | 75.6 | 0.748 |
| 500 | 69.28 | 0.685 | 94.35 | 0.943 | 76.5 | 0.759 |
| 600 | 71.01 | 0.704 | 96.86 | 0.969 | 76.24 | 0.756 |
| 700 | 73.05 | 0.724 | 96.6 | 0.966 | 77.02 | 0.764 |
| 800 | 72.79 | 0.721 | 97.3 | 0.973 | 76.56 | 0.759 |
| 900 | 72.76 | 0.721 | 97.28 | 0.973 | 76.11 | 0.754 |
| 1000 | 71.65 | 0.709 | 97.55 | 0.975 | 77 | 0.764 |
SMOTE equilibrium results 2003
| Instances | NB Acc. (%) | NB F-M | J48 Acc. (%) | J48 F-M | SVM Acc. (%) | SVM F-M |
|---|---|---|---|---|---|---|
| 100 | 89.74 | 0.897 | 89.74 | 0.897 | 84.31 | 0.835 |
| 200 | 92.95 | 0.929 | 93.59 | 0.936 | 92.65 | 0.925 |
| 300 | 93.59 | 0.935 | 94.87 | 0.949 | 92 | 0.919 |
| 400 | 94.87 | 0.948 | 96.79 | 0.968 | 94.61 | 0.945 |
| 500 | 95.13 | 0.951 | 97.18 | 0.972 | 94.51 | 0.944 |
| 600 | 94.02 | 0.939 | 95.51 | 0.955 | 95.1 | 0.95 |
| 700 | 96.52 | 0.965 | 97.44 | 0.974 | 94.92 | 0.95 |
| 800 | 96.63 | 0.966 | 97.92 | 0.979 | 95.45 | 0.954 |
| 900 | 96.58 | 0.966 | 97.86 | 0.979 | 95.49 | 0.954 |
| 1000 | 96.92 | 0.969 | 97.82 | 0.978 | 95.63 | 0.956 |
SMOTE equilibrium results 2009
| Instances | NB Acc. (%) | NB F-M | J48 Acc. (%) | J48 F-M | SVM Acc. (%) | SVM F-M |
|---|---|---|---|---|---|---|
| 100 | 76.67 | 0.758 | 86.67 | 0.865 | 58.33 | 0.551 |
| 200 | 76.67 | 0.757 | 93.06 | 0.93 | 58.06 | 0.547 |
| 300 | 76.85 | 0.759 | 94.81 | 0.948 | 57.78 | 0.546 |
| 400 | 77.55 | 0.765 | 96.51 | 0.965 | 54.39 | 0.518 |
| 500 | 77.63 | 0.766 | 96.76 | 0.967 | 53.8 | 0.513 |
| 600 | 77.87 | 0.768 | 97.95 | 0.979 | 57.52 | 0.544 |
| 700 | 78.58 | 0.776 | 98.24 | 0.982 | 60.91 | 0.583 |
| 800 | 79.26 | 0.783 | 98.4 | 0.984 | 61.87 | 0.598 |
| 900 | 79.05 | 0.781 | 98.34 | 0.983 | 61.49 | 0.594 |
| 1000 | 78.77 | 0.778 | 97.95 | 0.98 | 61.14 | 0.589 |
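
The SMOTE tables report an F-M value beside each accuracy. Assuming F-M denotes the usual F-measure (the harmonic mean of precision and recall), it can be computed from confusion counts as sketched below. How the source averages it across classes is not stated in this record, and the counts used here are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
score = f_measure(tp=40, fp=10, fn=20)
```

Unlike accuracy, the F-measure ignores true negatives, which is one reason the two columns can move apart on imbalanced data.
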