Mohammad Savargiv, Behrooz Masoumi, Mohammad Reza Keyvanpour.
Abstract
The goal of aggregating base classifiers is to obtain an aggregated classifier with higher accuracy than any individual classifier. Random forest is an ensemble learning method that has attracted more attention than similar methods because of its simple structure, ease of understanding, and higher efficiency. The ability and efficiency of classical methods, however, are always influenced by the data: independence from the data domain and the ability to adapt to the conditions of the problem space remain the most challenging issues for any type of classifier. In this paper, a method based on learning automata is presented through which adaptivity to the problem space and independence from the data domain are added to the random forest to increase its efficiency. Applying the idea of reinforcement learning to the random forest makes it possible to handle data with dynamic behaviour, that is, variability in the behaviour of a data sample across different domains. Therefore, to evaluate the proposed method in an environment with such dynamic behaviour, data from several different domains are considered. The idea is added to the random forest using learning automata, chosen for their simple structure and their compatibility with the problem space. The evaluation results confirm the improvement in random forest efficiency.
Year: 2021 PMID: 33854542 PMCID: PMC8019375 DOI: 10.1155/2021/5572781
Source DB: PubMed Journal: Comput Intell Neurosci
Algorithm 1: The random forest pseudocode for classification applications [1].
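The body of Algorithm 1 was not recovered. As a stand-in, the following is a minimal Python sketch of the standard random forest classification procedure the caption refers to (bootstrap sampling, random feature subsets, majority voting); all names are illustrative, and for brevity each tree here sees a single random feature subset, whereas the canonical algorithm re-draws the subset at every split.

```python
# Minimal sketch of standard random forest classification, not the authors'
# exact pseudocode: bootstrap each tree's training set, restrict each tree to
# a random feature subset, and aggregate predictions by majority vote.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier  # stand-in base learner

def train_random_forest(X, y, n_trees=100, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    m = max(1, int(np.sqrt(d)))  # typical feature-subset size for classification
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)            # bootstrap sample
        cols = rng.choice(d, size=m, replace=False)  # random feature subset
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((tree, cols))
    return forest

def predict_random_forest(forest, X):
    # Majority vote over the individual trees' predictions.
    votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```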
Brief review of RF literature on functionality and innovation.
| Type | Field | Paper |
|---|---|---|
| Functionality | Astronomy | [ |
| | Bioinformatics | [ |
| | Economics | [ |
| | Global problem | [ |
| | Healthcare | [ |
| | Industrial | [ |
| | Network | [ |
| | Physics | [ |
| | Text processing | [ |
| | Tourism | [ |
| | Urban planning | [ |
| Innovative method | Economics | [ |
| | General | [ |
| | Global problem | [ |
| | Healthcare | [ |
| | Industrial | [ |
| | Network | [ |
| | Physics | [ |
| | Text processing | [ |
Figure 1: Interaction of learning automata with the environment.
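For reference, the classical linear update of a variable-structure learning automaton with r actions, reward parameter a, and penalty parameter b is given below; this is the textbook scheme the figure's loop realizes, not a formula specific to this paper.

```latex
% Linear update scheme of a variable-structure learning automaton:
% r actions, reward parameter a, penalty parameter b.
\begin{aligned}
\text{favorable response to action } i:\quad
  & p_i(n+1) = p_i(n) + a\bigl(1 - p_i(n)\bigr), \\
  & p_j(n+1) = (1 - a)\,p_j(n), \quad j \neq i; \\[4pt]
\text{unfavorable response to action } i:\quad
  & p_i(n+1) = (1 - b)\,p_i(n), \\
  & p_j(n+1) = \frac{b}{r-1} + (1 - b)\,p_j(n), \quad j \neq i.
\end{aligned}
```

Setting b = 0 gives the LRI (reward-inaction) scheme, b = a the LRP (reward-penalty) scheme, and b = εa with a small ε the LRεP (reward-ε-penalty) scheme; these are the three modes evaluated in Figures 3-5.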
Figure 2: The block diagram of the proposed method.
Algorithm 2: The pseudocode of the proposed method.
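Algorithm 2's body was likewise not recovered. The sketch below is one plausible reading of the coupling the abstract describes, reusing the (tree, cols) pairs from the Algorithm 1 sketch: each tree is an automaton action, trees are rewarded or penalized on per-sample correctness, and prediction becomes a probability-weighted vote. The environment design, names, and defaults are assumptions, not the authors' specification.

```python
# Hypothetical sketch of the automaton-forest coupling the abstract describes;
# NOT the paper's exact Algorithm 2.
import numpy as np

class TreeWeightingAutomaton:
    """Linear reward-penalty automaton over the trees of a forest."""

    def __init__(self, n_trees, a=0.1, b=0.0):    # b = 0 -> LRI, b = a -> LRP
        self.p = np.full(n_trees, 1.0 / n_trees)  # action (tree) probabilities
        self.a, self.b = a, b

    def update(self, i, favorable):
        r = len(self.p)
        if favorable:                     # reward: shift mass toward tree i
            self.p = (1.0 - self.a) * self.p
            self.p[i] += self.a
        elif self.b > 0.0:                # penalty: shift tree i's mass away
            keep = (1.0 - self.b) * self.p
            self.p = keep + self.b / (r - 1)
            self.p[i] = keep[i]

def fit_automaton(forest, automaton, X, y, epochs=5):
    # One plausible environment: every tree answers every training sample and
    # is rewarded or penalized according to correctness.
    for _ in range(epochs):
        for i, (tree, cols) in enumerate(forest):
            correct = tree.predict(X[:, cols]) == y
            for favorable in correct:
                automaton.update(i, bool(favorable))

def weighted_predict(forest, automaton, X, classes):
    # Vote weighted by the automaton's converged tree probabilities.
    scores = np.zeros((X.shape[0], len(classes)))
    for w, (tree, cols) in zip(automaton.p, forest):
        pred = tree.predict(X[:, cols])
        for k, c in enumerate(classes):
            scores[:, k] += w * (pred == c)
    return np.asarray(classes)[scores.argmax(axis=1)]
```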
Details of the datasets used for evaluation.
| Domain | Name | # Features | # Instances |
|---|---|---|---|
| Text | Stanford Sentiment140 corpus [ | Bag of words | 1,600,000 |
| | Large dataset of movie reviews [ | Bag of words | 50,000 |
| | Sentence polarity dataset v1.0 [ | Bag of words | 10,662 |
| | Internet movie database [ | Bag of words | 1,400 |
| | Yelp review [ | Bag of words | 598,000 |
| | Amazon review [ | Bag of words | 1,000,000 |
| Healthcare | Heart disease dataset [ | 13 | 200 |
| | Breast cancer dataset [ | 30 | 569 |
| | Arrhythmia dataset [ | 279 | 454 |
| | Parkinson dataset [ | 45 | 241 |
| | Caesarean section dataset [ | 5 | 81 |
| | Gene expression dataset [ | 255 | 801 |
| | Diabetes dataset [ | 7 | 765 |
| | Statlog (heart) dataset [ | 13 | 271 |
| Physical | Ionosphere dataset [ | 34 | 352 |
| | Sonar, mines vs. rocks dataset [ | 60 | 208 |
| Sound | Voice dataset [ | 20 | 3,168 |
| | Emotions from music dataset [ | 28 | 592 |
Figure 3: The results of the proposed method in LRI mode.
Figure 4: The results of the proposed method in LRεP mode.
Figure 5: The results of the proposed method in LRP mode.
Comparison of the proposed method with similar approaches in the subject literature.
| Domain | Dataset | Averaging | Majority voting | Random forest | Our method |
|---|---|---|---|---|---|
| Text | Sentiment140 dataset | 74.54 | 75.50 | 74.30 | |
| | Large dataset of movie reviews | 86.28 | 86.86 | 86.42 | |
| | Sentence polarity dataset | 73.75 | 74.63 | 73.38 | |
| | Movie reviews dataset | 81.58 | 81.58 | 81.67 | |
| | Yelp review polarity | 89.47 | 90.32 | 89.74 | |
| | Amazon review polarity | 80.86 | 81.66 | 80.97 | |
| Healthcare | Heart disease dataset | 58.00 | 57.50 | 57.50 | |
| | Breast cancer dataset | 97.41 | 97.36 | 96.49 | |
| | Arrhythmia dataset | 80.71 | 85.71 | 81.31 | |
| | Parkinson dataset | 63.95 | 64.58 | 64.58 | |
| | Caesarean section dataset | 60.31 | 62.50 | 43.75 | |
| | Gene expression dataset | 95.59 | 95.62 | 96.27 | |
| | Diabetes dataset | 75.77 | 75.32 | 74.67 | |
| | Statlog (heart) dataset | 81.20 | 81.48 | 79.62 | |
| Physical | Ionosphere dataset | 91.05 | 91.54 | 92.95 | |
| | Sonar, mines vs. rocks dataset | 85.23 | 85.71 | 73.80 | |
| Sound | Voice dataset | 76.38 | 76.18 | 76.49 | |
| | Emotions from music dataset | 78.23 | 78.15 | 82.35 | |
Comparison of statistical criteria: precision (P), recall (R), and F1-score (F1), reported for the positive and negative classes of each dataset, for majority voting (MV), random forest (RF), and our method (OM).
Figure 6: Details of the preprocessing step for text data.
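The contents of Figure 6 were not recovered, but the dataset table lists bag-of-words features, so a typical chain of tokenization, stop-word removal, and count vectorization presumably applies. A generic scikit-learn sketch, offered only as an assumption about what such a step looks like, not as the paper's pipeline:

```python
# Generic bag-of-words preprocessing sketch (an assumption based on the
# "bag of words" features listed for the text datasets).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["A great movie, loved it!", "Terrible plot and worse acting."]
vectorizer = CountVectorizer(
    lowercase=True,        # normalize case
    stop_words="english",  # drop high-frequency function words
    max_features=5000,     # cap the vocabulary size
)
X = vectorizer.fit_transform(docs)  # sparse document-term count matrix
```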
Numerical values tuned for reward and penalty parameters.
| Mode | Parameter | Values |
|---|---|---|
| LRI | a | 0, 0.1, 0.1, 0.3, 0.5, 0.7 |
| LRI | b | 0, 0, 0, 0, 0, 0 |
| LRεP | a | 0.1, 0.1, 0.3, 0.5, 0.7 |
| LRεP | b | 0, 0, 0, 0, 0 |
| LRP | a | 0, 0.1, 0.1, 0.3, 0.5, 0.7 |
| LRP | b | 0, 0.1, 0.1, 0.3, 0.5, 0.7 |
Friedman test statistical verification results for ranking the parameters of reward and penalty and comparing the proposed method with the literature.
| Method | Tuning | Mean rank | Final rank |
|---|---|---|---|
| LRP | | 19.17 | 1 |
| LRP | | 16.83 | 2 |
| LRP | | 15.58 | 3 |
| MV | Majority voting | 14.67 | 4 |
| LRP | | 13.92 | 5 |
| LRεP | | 12.17 | 6 |
| LRεP | | 11.83 | 7 |
| LRεP | | 10.08 | 8 |
| LRP | | 9.58 | 9 |
| RF | Random forest | 9.17 | 10 |
| LRP | | 8.75 | 11 |
| LRI | | 8.42 | 12 |
| LRI | | 7.67 | 13 |
| LRI | | 7.67 | 13 |
| LRI | | 7.67 | 13 |
| LRI | | 7.67 | 13 |
| LRI | | 7.67 | 13 |
| AV | Averaging | 7.58 | 14 |
| LRεP | | 7.17 | 15 |
| LRεP | | 6.75 | 16 |
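The Friedman test ranks related treatments across blocks (here, methods and parameter settings across datasets) without a normality assumption. As an illustration, applying it to the averaging, majority-voting, and random-forest accuracies reported above for the first four text datasets (the paper's actual test spans all settings and all datasets):

```python
# Friedman test across methods evaluated on the same datasets. The inputs are
# the averaging / majority-voting / random-forest accuracies reported above
# for the first four text datasets; with so few blocks the chi-square
# approximation is rough, so this is illustrative only.
from scipy.stats import friedmanchisquare

averaging = [74.54, 86.28, 73.75, 81.58]  # one accuracy per dataset
majority  = [75.50, 86.86, 74.63, 81.58]
forest    = [74.30, 86.42, 73.38, 81.67]

stat, p_value = friedmanchisquare(averaging, majority, forest)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```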
Figure 7: Convergence rate for different reward and penalty parameters. (a) a = 0.5, b = 0.5; (b) a = 0.3, b = 0.3; (c) a = 0.7, b = 0; (d) a = 0.1, b = 0.1; (e) a = 0.01, b = 0; (f) a = 0.05, b = 0.05; (g) a = 0.3, b = 0.
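A toy two-action simulation (the environment's reward probabilities below are invented for illustration) reproduces the qualitative trade-off Figure 7 plots: a larger reward rate a drives the action probability to converge in fewer steps, while a smaller a converges more slowly but more smoothly.

```python
# Toy learning-automaton convergence experiment for different reward rates.
# Two actions; the reward probabilities (0.8 vs. 0.4) are assumptions made
# only to give the automaton something to learn.
import numpy as np

def simulate(a, b=0.0, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array([0.5, 0.5])               # initial action probabilities
    reward_prob = np.array([0.8, 0.4])     # chance each action is rewarded
    for _ in range(steps):
        i = rng.choice(2, p=p)             # automaton selects an action
        if rng.random() < reward_prob[i]:  # favorable response: reward
            p = (1 - a) * p
            p[i] += a
        elif b > 0.0:                      # unfavorable response: penalty
            keep = (1 - b) * p
            p = keep + b                   # b/(r-1) with r = 2 actions
            p[i] = keep[i]
    return p[0]                            # probability of the better action

for a in (0.7, 0.1, 0.01):                 # larger a: faster, noisier convergence
    print(f"a = {a}: p(best action) after 2000 steps = {simulate(a):.3f}")
```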
Figure 8: The evaluation of the proposed method in the presence of noise.