| Literature DB >> 36052051 |
Mohammed J Abdulaal, Ibrahim M Mehedi, Abdulah Jeza Aljohani, Ahmad H Milyani, Mohamed Mahmoud, Abdullah M Abusorrah, Rahtul Jannat.
Abstract
Skin illness can develop anywhere on earth as a result of a combination of environmental conditions, and it is among the most dangerous diseases that can arise this way. A major goal of feature selection is to predict skin-disease cases from the variables that influence them. As a consequence of the widespread use of sensors, the amount of data collected in the health industry is disproportionately large compared with other sectors. Researchers have previously used a variety of machine learning algorithms to determine the relationships between illnesses and other disorders. Forecasting involves many steps, the most important of which are preprocessing and the selection of forecasting features. A major disadvantage of working in the health industry is limited data availability, which is particularly problematic when data are provided in an unstructured format. Filling in missing values and converting between data types takes slightly more than 70% of the total time. When dealing with missing data in machine learning applications, the mean and median, as well as standard imputation mechanisms, may be employed. Previous research has shown that the characteristics chosen for a model can influence its overall performance. A primary goal of this study is to develop an intelligent algorithm that identifies relevant attributes while eliminating nonsignificant attributes that harm model performance. To present a full view of the data, artificial intelligence techniques such as SVM, decision tree, and logistic regression models were used in conjunction with three separate, independently developed feature-combination methodologies.
As a consequence, the models' accuracy, F-measure, and precision are all improved. We then provide a list of the most important features, together with the weight allocated to each.
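The mean/median imputation the abstract mentions can be sketched as follows. This is an illustrative example only; the function name and the sample values are hypothetical, since the paper does not specify its preprocessing pipeline.

```python
import numpy as np

def impute_missing(column, strategy="mean"):
    """Fill NaN entries in a 1-D numeric column with the chosen statistic.

    Sketch of the mean/median imputation described in the abstract;
    the paper's exact preprocessing is not given.
    """
    col = np.array(column, dtype=float)  # copy so the caller's data is untouched
    mask = np.isnan(col)
    if strategy == "mean":
        fill = np.nanmean(col)
    elif strategy == "median":
        fill = np.nanmedian(col)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    col[mask] = fill
    return col

# Hypothetical humidity readings with two missing values
humidity = [0.61, np.nan, 0.58, 0.64, np.nan]
print(impute_missing(humidity, "mean"))  # NaNs replaced by the mean, 0.61
```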
Year: 2022 PMID: 36052051 PMCID: PMC9427218 DOI: 10.1155/2022/7538643
Source DB: PubMed Journal: Comput Intell Neurosci
Representation of all feature names with data types.
| Number | Features | Type | New_feature_name |
|---|---|---|---|
| 1 | Total precipitation | Float64 | |
| 2 | Southeast_NDVI | Float64 | |
| 3 | Max_air_temp | Float64 | |
| 4 | Total_precipitation in KG | Float64 | |
| 5 | Northeast_NVDI | Float64 | |
| 6 | Diurnal_temp | Float64 | |
| 7 | Northwest_NDVI | Float64 | |
| 8 | Mean_duepoint | Float64 | |
| 9 | South_NDVI | Float64 | |
| 10 | Mean humidity | Float64 | |
Figure 1. Male versus female gender ratio.
Figure 2. Proposed framework.
Figure 3. List of data types in the data set for supervised learning.
Figure 4. Types of feature selection techniques.
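One standard family of feature selection techniques of the kind Figure 4 refers to is the filter approach, which ranks features by a score computed independently of any model. The sketch below ranks features by absolute Pearson correlation with the target; the scoring function and the toy feature names are assumptions, as the paper does not state its exact scoring method.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Filter-style selection: rank features by |Pearson correlation|
    with the target. Illustrative sketch, not the paper's own method."""
    scores = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]  # correlation of feature j with target
        scores.append((name, abs(r)))
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Hypothetical toy data: one informative feature, two noise features
rng = np.random.default_rng(0)
y = rng.normal(size=100)
X = np.column_stack([
    y + 0.1 * rng.normal(size=100),  # strongly tied to the target
    rng.normal(size=100),            # noise
    rng.normal(size=100),            # noise
])
ranking = rank_features_by_correlation(X, y, ["Humidity", "Noise_1", "Noise_2"])
print(ranking[0][0])  # the informative feature ranks first: "Humidity"
```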
SVM versus decision tree versus logistic regression with the feature set.
| Models | Precision (%) | Features_set |
|---|---|---|
| SVM | 75.03 | Features 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 |
| Decision tree | | Features 1, 2, 5, 9, 11, 13, 17, 8 |
| Logistic regression | 74.90 | Features 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 |
SVM versus decision tree versus logistic regression with the feature set.
| Models | F-measure (%) | Features_set |
|---|---|---|
| SVM | | Features 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 |
| Decision tree | 81.45 | Features 1, 2, 5, 9, 11, 13, 17, 8 |
| Logistic regression | 82.75 | Features 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 |
SVM versus decision tree versus logistic regression with the feature set.
| Models | Accuracy (%) | Features_set |
|---|---|---|
| SVM | | Features 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 |
| Decision tree | 80 | Features 1, 2, 5, 9, 11, 13, 17, 8 |
| Logistic regression | 81 | Features 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 |
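The comparisons in the tables above can be reproduced in outline with scikit-learn: fit each of the three models on one feature subset and report accuracy, precision, and F-measure. The synthetic data and the particular subset below are stand-ins, since the skin-disease data set itself is not available.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, f1_score

def compare_models(X, y, feature_idx):
    """Fit SVM, decision tree, and logistic regression on one feature
    subset and report the three metrics used in the tables above."""
    Xs = X[:, feature_idx]
    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, random_state=0)
    models = {
        "SVM": SVC(),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Logistic regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred),
            "f_measure": f1_score(y_te, pred),
        }
    return results

# Synthetic stand-in: 20 features, target driven by features 0 and 2
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
odd_subset = list(range(0, 20, 2))  # plays the role of "Features 1, 3, 5, ..."
for name, m in compare_models(X, y, odd_subset).items():
    print(f"{name}: accuracy={m['accuracy']:.2f}")
```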
Final model accuracy with the old and new feature sets.
| Final accuracy | SVM (%) | Decision tree (%) | Logistic regression (%) |
|---|---|---|---|
| With all features (20) | 83.12 | 81.74 | 79.63 |
| With new features (11) | 84 | 82.87 | 80.32 |
List of feature importance weights.
| Number | Features | New_feature_name | Feature_weights |
|---|---|---|---|
| 1 | Year_week | | 0.0899 |
| 2 | Recorded_year | | 0.0612 |
| 3 | Recorded_month | | 0.0394 |
| 4 | Air_temp | | 0.0174 |
| 5 | Humidity | | 0.0167 |
| 6 | Surface_water3 | | 0.0152 |
| 7 | Total_vegetation | | 0.0101 |
| 8 | Min_air_temp | | 0.0069 |
| 9 | Surface_water5 | | 0.0058 |
| 10 | Surface_water1 | | 0.0051 |
| 11 | Total_precipitation | | 0.0013 |
| 12 | Southeast_NDVI | | 0.003 |
| 13 | Max_air_temp | | 0.0003 |
| 14 | Total_precipitation in KG | | 0.0002 |
| 15 | Northeast_NVDI | | 0 |
| 16 | Diurnal_temp | | −0.0002 |
| 17 | Northwest_NDVI | | −0.0019 |
| 18 | Mean_duepoint | | −0.0046 |
| 19 | South_NDVI | | −0.0054 |
| 20 | Mean_humidity | | −0.0067 |
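Signed importance weights of the kind listed above, where uninformative features receive values near or slightly below zero, are commonly produced by permutation importance: a feature's weight is the drop in test score when its values are shuffled. The paper does not name its exact weighting method, so the following is an illustrative sketch on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: only features 0 and 1 determine the target
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# Weight of each feature = mean score drop over 20 shuffles of that column;
# noise features land near zero and can come out slightly negative.
imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, w in zip(["f0", "f1", "f2", "f3", "f4"], imp.importances_mean):
    print(f"{name}: {w:+.4f}")  # f0 and f1 receive large positive weights
```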