Przemysław Juszczuk, Jan Kozak, Grzegorz Dziczkowski, Szymon Głowania, Tomasz Jach, Barbara Probierz.
Abstract
In the era of the Internet of Things and big data, we face the management of a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, selecting the most satisfactory set of solutions is often a struggle. This article investigates the possibility of using an entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields: markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold. First, since we deal with unprocessed, inconsistent data, additional preprocessing is necessary. Second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information from the data. The research uses well-known classification algorithms to investigate the quality of solutions derived from the initial preprocessing and from the information indicated by the entropy measure. Finally, the best 25% of attributes (in the sense of the entropy measure) are selected, the whole classification procedure is performed once again, and the results are compared.
Keywords: classification; decision table; entropy measure; preprocessing; real-world data
Year: 2021 PMID: 34945927 PMCID: PMC8700715 DOI: 10.3390/e23121621
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. A sample list of rejected words, the so-called stop words.
Figure 2. The sample matrix of word occurrences (words selected as conditional attributes) in documents.
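As a rough illustration of how such an occurrence matrix could be built, the sketch below drops stop words and then marks, for each document, whether each selected word occurs. The stop-word list, the tokenizer, and the names `tokenize` and `occurrence_matrix` are assumptions made for this example; the authors' actual preprocessing pipeline is not given here. Binary occurrence (rather than counts) is also an assumption, chosen because the word attributes are later reported with two distinct values each.

```python
import re

# Hypothetical stop-word list; the real list (Figure 1) is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def occurrence_matrix(documents, vocabulary):
    """Return one row per document: 1 if the word occurs in it, else 0."""
    rows = []
    for doc in documents:
        present = set(tokenize(doc))
        rows.append([1 if word in present else 0 for word in vocabulary])
    return rows

# Toy documents and vocabulary, invented for illustration only.
docs = [
    "The president said the economy is growing",
    "Shocking video of the president",
]
vocab = ["president", "economy", "video"]
matrix = occurrence_matrix(docs, vocab)
```

Each column of `matrix` then plays the role of one conditional attribute, with the true/fake label of the document as the decision attribute.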
The example frequency of words (selected as conditional attributes).
| Attribute Name | True News | Fake News |
|---|---|---|
| | 608 | 463 |
| | 592 | 715 |
| | 1036 | 78 |
| | 1151 | 655 |
| | 840 | 585 |
| | 631 | 692 |
| | 2193 | 47 |
| | 572 | 1441 |
| | 2520 | 666 |
| | 1227 | 654 |
| | 859 | 975 |
| | 371 | 970 |
| | 471 | 1167 |
| | 577 | 821 |
| | 920 | 269 |
| | 8843 | 5538 |
| | 8369 | 40 |
| | 592 | 703 |
| | 1975 | 36 |
| | 2874 | 815 |
Figure 3. The distribution of attribute values by decision class for fake news data.
Session attributes.
| User ID | Session ID |
|---|---|
| Day/Month/Year | Hour of begin |
| Hour of end | Purchase |
| Total amount | No. products bought |
| No. references bought | Discount code |
| New user | Source of navigation |
| Total time | Time universe (1–7) |
| No. total pages seen | No. pages universe (1–7) seen |
| No. universes changes | No. sections changes |
| No. subsect. changes | No. subsubsect. changes |
| No. of section seen | No. of subsection seen |
| No. product pages seen | No. of same product seen |
Figure 4. The distribution of attribute values by decision class for user websites navigation data.
Figure 5. The distribution of attribute values by decision class for real estate market data.
Figure 6. The distribution of attribute values by decision class for sport data.
Discretization procedure for the market indicators. * In the rare cases where an indicator value exceeds the border value (cases marked “above” or “below”), the indicator value is set to the border value.
| Indicator Name | Range * | Discretization Step |
|---|---|---|
| | | 0.0005 |
| | | 0.0005 |
| | Above 0.1 | 0.005 |
| | Below −0.1 | 0.005 |
| | | 20.0 |
| | | 0.1 |
| | | 0.0005 |
| | | 0.0005 |
| | Above 0.1 | 0.005 |
| | Below −0.1 | 0.005 |
| RSI | | 10.0 |
| | | 10.0 |
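The discretization described in the caption (clamp to the border value, then bin by a fixed step) can be sketched as follows. The function name `discretize`, the choice to return a bin index, and the example ranges are assumptions for illustration; the actual ranges per indicator are not recoverable from the table as extracted.

```python
import math

def discretize(value, low, high, step):
    """Clamp value to [low, high], then return the index of its step-wide bin."""
    # Values beyond a border ("above"/"below" cases) are set to that border.
    clamped = min(max(value, low), high)
    # A small epsilon guards against floating-point error at bin edges.
    return int(math.floor((clamped - low) / step + 1e-9))

# Hypothetical usage: an indicator bounded at -0.1 and 0.1 with step 0.005,
# and an RSI-like indicator assumed to lie in 0..100 with step 10.
too_high = discretize(0.25, -0.1, 0.1, 0.005)
at_border = discretize(0.1, -0.1, 0.1, 0.005)
rsi_bin = discretize(55.3, 0.0, 100.0, 10.0)
```

With this convention, any value above the upper border falls into the same bin as the border itself, which matches the rule stated in the caption.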
Figure 7. Decision calculation method for the financial data.
Figure 8. The distribution of attribute values by decision class for financial data.
Information attribute values for fake news data.
| Attribute Name | Value Count | Information Attribute |
|---|---|---|
| | 2 | 0.998331 |
| | 2 | 0.998048 |
| | 2 | 0.983818 |
| | 2 | 0.996864 |
| | 2 | 0.998051 |
| | 2 | 0.998287 |
| | 2 | 0.957435 |
| | 2 | 0.990542 |
| | 2 | 0.981507 |
| | 2 | 0.996318 |
| | 2 | 0.998106 |
| | 2 | 0.992925 |
| | 2 | 0.992260 |
| | 2 | 0.997342 |
| | 2 | 0.993210 |
| | 2 | 0.986877 |
| | 2 | 0.803497 |
| | 2 | 0.998101 |
| | 2 | 0.961002 |
| | 2 | 0.998470 |
| | 2 | 0.998473 |
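The definition of the "information attribute" is not reproduced in this listing. A minimal sketch, assuming it is the expected base-2 Shannon entropy of the decision after splitting on an attribute (lower meaning more informative), could look like the code below. This reading is an assumption, although it agrees with the magnitudes reported here: for example, the five-class AUDUSD decision is listed with 2.321928, which equals log2(5). The names `entropy`, `information_attribute`, and `select_lowest_quarter`, the toy data, and the floor-based 25% cut-off are all illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_attribute(values, labels):
    """Expected entropy of the decision after splitting on one attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def select_lowest_quarter(table, labels):
    """Return the names of the 25% of attributes with the lowest score."""
    scores = {name: information_attribute(col, labels) for name, col in table.items()}
    k = max(1, len(scores) // 4)  # assumed rounding: floor of 25%
    return sorted(scores, key=scores.get)[:k]

# Toy decision table: four binary word attributes, four documents.
labels = ["true", "true", "fake", "fake"]
table = {
    "word_a": [1, 1, 0, 0],  # separates the classes perfectly
    "word_b": [1, 0, 1, 0],  # carries no information about the class
    "word_c": [1, 1, 1, 0],
    "word_d": [0, 0, 0, 0],  # constant attribute
}
selected = select_lowest_quarter(table, labels)
```

The floor-based cut-off is consistent with the attribute counts reported later in the document (5 of 20 for fake news, 7 of 31 for user navigation), but that rounding rule is inferred rather than stated.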
Classification results for fake news data by decision class for the full set of attributes [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
|
|
| 46.05 | 93.95 | 54.87 | 91.38 |
| 92.10 | 56.88 |
|
| 62.70 |
| 66.02 | 96.12 |
| 93.93 |
| 94.65 |
Classification results for fake news data by decision class for the limited set of attributes (5 attributes selected) [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
|
|
| 46.05 | 93.68 | 54.04 | 93.67 |
| 93.67 |
|
|
| 62.70 |
| 65.58 | 96.00 |
| 95.97 |
| 95.97 |
Accuracy results for the classification over fake news data [in %].
| | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| Accuracy (20 attributes) | 71.51 | 74.55 | 75.49 | 74.89 |
| Accuracy (5 attributes) | 71.51 | 74.56 | 74.17 | 74.17 |
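For reference, PPV (positive predictive value, i.e., precision) and TPR (true positive rate, i.e., recall) are computed per decision class, while accuracy is computed over all objects. The sketch below shows these standard definitions in percent; the function names and the toy labels are illustrative, and returning `None` for an undefined ratio is an assumption meant to mirror the "-" entries used in some of the tables.

```python
def class_metrics(y_true, y_pred, cls):
    """Return (PPV, TPR) in percent for one decision class; None if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    ppv = 100.0 * tp / (tp + fp) if tp + fp else None  # class never predicted
    tpr = 100.0 * tp / (tp + fn) if tp + fn else None  # class never present
    return ppv, tpr

def accuracy(y_true, y_pred):
    """Share of correctly classified objects, in percent."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example with two decision classes.
y_true = ["true", "true", "true", "fake", "fake"]
y_pred = ["true", "true", "fake", "fake", "true"]
fake_ppv, fake_tpr = class_metrics(y_true, y_pred, "fake")
overall = accuracy(y_true, y_pred)
```

A high accuracy can coexist with a low TPR for a minority class, which is why the per-class tables are reported alongside the accuracy tables.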
Information attribute values for user websites navigation data.
| Attribute Name | Value Count | Information Attribute |
|---|---|---|
| | 14 | 0.478533 |
| | 65 | 0.465879 |
| | 101 | 0.478267 |
| | 101 | 0.479259 |
| | 30 | 0.474476 |
| | 77 | 0.402749 |
| | 2 | 0.173014 |
| | 2 | 0.161386 |
| | 3 | 0.479204 |
| | 87 | 0.364767 |
| | 64 | 0.347537 |
| | 74 | 0.445645 |
| | 61 | 0.214535 |
| | 27 | 0.481156 |
| | 36 | 0.481021 |
| | 48 | 0.481082 |
| | 74 | 0.323663 |
| | 38 | 0.339926 |
| | 65 | 0.432955 |
| | 60 | 0.217707 |
| | 20 | 0.481144 |
| | 7 | 0.481129 |
| | 7 | 0.481159 |
| | 64 | 0.349787 |
| | 16 | 0.439947 |
| | 47 | 0.464543 |
| | 64 | 0.478707 |
| | 42 | 0.468668 |
| | 88 | 0.436594 |
| | 78 | 0.471988 |
| | 89 | 0.470698 |
| | 2 | 0.481233 |
Classification results for user websites navigation data by decision class values for full set of attributes [in %].
| Decision Class | Decision Tree PPV | Decision Tree TPR | Random Forest PPV | Random Forest TPR | Bagging PPV | Bagging TPR | AdaBoost PPV | AdaBoost TPR |
|---|---|---|---|---|---|---|---|---|
| | 99.91 | 89.50 | 99.92 | 87.44 | 49.46 | 92.55 | 89.15 | 91.49 |
| | 98.80 | 99.99 | 98.56 | 99.99 | 99.03 | 89.03 | 99.01 | 98.70 |
Classification results for user websites navigation data by decision class values for limited set of attributes (7 attributes selected) [in %].
| Decision Class | Decision Tree PPV | Decision Tree TPR | Random Forest PPV | Random Forest TPR | Bagging PPV | Bagging TPR | AdaBoost PPV | AdaBoost TPR |
|---|---|---|---|---|---|---|---|---|
| | 99.92 | 89.50 | 99.92 | 89.50 | 96.70 | 92.42 | 98.83 | 90.61 |
| | 98.80 | 99.99 | 98.80 | 99.99 | 99.13 | 99.63 | 98.92 | 99.88 |
Accuracy results for user websites navigation data [in %].
| | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| Accuracy (31 attributes) | 98.90 | 98.69 | 89.40 | 97.96 |
| Accuracy (7 attributes) | 98.90 | 98.90 | 98.89 | 98.91 |
Information attribute values for real estate market data.
| Attribute Name | Value Count | Information Attribute |
|---|---|---|
| | 166 | 3.294200 |
| | 35 | 3.418894 |
| | 6 | 3.529588 |
| | 14 | 3.536040 |
| | 3,698 | 2.071138 |
| | 5 | 3.455887 |
| | 6 | 3.485657 |
| | 3 | 3.542212 |
| | 2 | 3.576844 |
| | 2 | 3.552212 |
| | 2 | 3.572712 |
| | 16 | 3.579787 |
Figure 9. Histogram of cardinality of the decision set.
Classification results for real estate market data by decision class for the full set of attributes [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
| 1 |
| 9.52 | 0.00 | 0.00 | 25.00 |
| 0.16 |
|
| 2 | 0.00 | 0.00 |
| 0.00 | 99.47 |
| 24.61 | 10.02 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 4 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 5 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 6 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 7 | 0.00 | 0.00 | 49.08 | 21.62 |
|
| 27.44 | 87.80 |
| 8 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.67 | 9.52 |
| 9 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 10 | 99.57 |
|
| 7.10 | 99.57 | 99.74 | 99.57 | 79.98 |
| 11 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 12 | 21.55 |
| 31.25 | 21.79 |
|
| 67.78 | 10.03 |
| 13 |
|
| 32.13 | 91.97 |
| 99.90 | 0.00 | 0.00 |
| 14 |
| 0.99 | 73.49 |
|
| 99.65 | 38.48 |
|
| 15 |
|
| 87.70 |
|
|
| 39.41 | 9.97 |
Classification results for real estate market data by decision class for a limited set of attributes (3 attributes selected) [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
| 1 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.16 |
|
| 2 | 0.00 | 0.00 | 56.27 | 15.61 |
|
| 24.61 | 10.02 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 4 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 5 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 6 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 7 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 27.44 | 87.80 |
| 8 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.67 | 9.52 |
| 9 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 10 |
| 100.00 | 98.91 | 15.57 |
|
|
| 79.98 |
| 11 | 0.00 | 0.00 | 0.00 | 0.00 |
|
| 0.00 | 0.00 |
| 12 | 21.56 |
| 39.43 | 30.61 |
| 99.92 | 67.78 | 10.03 |
| 13 |
|
| 30.61 | 99.95 |
| 99.74 | 0.00 | 0.00 |
| 14 | 99.74 |
| 94.89 |
| 99.74 |
| 38.49 |
|
| 15 |
|
| 98.98 |
|
|
| 39.55 | 10.03 |
Accuracy results for the classification over real estate data [in %].
| | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| Accuracy (15 attributes) | 69.00 | 53.69 | 99.07 | 28.92 |
| Accuracy (3 attributes) | 69.00 | 56.71 | 98.71 | 28.92 |
Information attribute values for sport data.
| Attribute Name | Value Count | Information Attribute |
|---|---|---|
| Germany | | |
| | 10 | 1.535055 |
| | 30 | 1.514509 |
| | 28 (28) | 1.471725 (1.464826) |
| | 18 (18) | 1.417129 (1.438857) |
| | 30 (30) | 1.514509 (1.514509) |
| | 30 (30) | 1.461058 (1.469589) |
| | 16 (16) | 1.506093 (1.502721) |
| | 24 (25) | 1.473808 (1.468121) |
| | 92 (94) | 1.457664 (1.454900) |
| | 74 (77) | 1.476410 (1.470156) |
| | 116 (117) | 1.382856 (1.395159) |
| | 85 (86) | 1.440930 (1.452137) |
| | 40 (40) | 1.505347 (1.498025) |
| Match Result (Decision) | 3 | 1.539089 |
| Italy | | |
| | 10 | 1.538402 |
| | 34 | 1.531829 |
| | 34 (34) | 1.458934 (1.460329) |
| | 20 (20) | 1.415896 (1.424320) |
| | 34 (34) | 1.531829 (1.531829) |
| | 33 (32) | 1.468639 (1.475596) |
| | 19 (19) | 1.514713 (1.512888) |
| | 29 (30) | 1.469099 (1.465326) |
| | 92 (91) | 1.481346 (1.475606) |
| | 86 (86) | 1.484403 (1.488688) |
| | 112 (110) | 1.397699 (1.398332) |
| | 97 (97) | 1.448064 (1.454139) |
| | 40 (40) | 1.506473 (1.516376) |
| Match Result (Decision) | 3 | 1.545029 |
| Spain | | |
| | 10 | 1.519782 |
| | 34 | 1.514018 |
| | 33 (33) | 1.438616 (1.436341) |
| | 20 (20) | 1.403615 (1.401159) |
| | 34 (34) | 1.514018 (1.514018) |
| | 32 (32) | 1.451360 (1.455801) |
| | 19 (19) | 1.493165 (1.486429) |
| | 27 (27) | 1.462337 (1.453162) |
| | 115 (111) | 1.448292 (1.437169) |
| | 82 (81) | 1.467792 (1.464831) |
| | 130 (132) | 1.376633 (1.375712) |
| | 95 (94) | 1.438554 (1.435251) |
| | 40 (40) | 1.493445 (1.488321) |
| Match Result (Decision) | 3 | 1.523545 |
Classification results for sport data by decision class for the full set of attributes [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
| Germany | ||||||||
| 1 | 53.66 | 85.51 | 53.96 |
| 56.94 | 70.36 |
| 68.44 |
| 2 | 55.74 | 48.34 |
|
| 51.92 | 49.94 | 55.05 | 50.92 |
|
|
| 3.18 | 0.00 | 0.00 | 33.09 | 20.45 |
|
|
| Italy | ||||||||
| 1 | 54.45 | 87.44 | 54.57 |
|
| 70.85 | 58.90 | 69.58 |
| 2 | 56.56 | 52.63 |
| 53.13 | 52.04 | 55.81 | 50.17 |
|
|
| 45.16 | 1.62 |
| 0.46 | 34.40 |
| 39.87 |
|
| Spain | ||||||||
| 1 | 55.78 | 88.11 | 55.57 |
|
| 69.23 |
| 69.79 |
| 2 | 52.00 |
|
| 44.87 | 49.55 |
| 50.73 | 40.53 |
|
| 0.00 | 0.00 |
| 0.85 | 30.29 | 22.83 | 31.73 |
|
Accuracy results for the classification over sport data [in %].
| | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| Germany, accuracy (24 attributes) | 53.89 | 55.24 | 51.83 | 53.70 |
| Germany, accuracy (6 attributes) | 53.31 | 54.84 | 49.01 | 55.78 |
| Italy, accuracy (24 attributes) | 54.96 | 55.85 | 53.59 | 53.35 |
| Italy, accuracy (6 attributes) | 54.47 | 54.85 | 50.09 | 53.54 |
| Spain, accuracy (24 attributes) | 54.82 | 55.41 | 51.55 | 51.64 |
| Spain, accuracy (6 attributes) | 55.01 | 55.49 | 51.74 | 55.31 |
Classification results for sport data by decision class values for the limited set of attributes (6 attributes selected) [in %] (all bold numbers correspond to the best values obtained).
| Decision Tree | Random Forest | Bagging | AdaBoost | |||||
|---|---|---|---|---|---|---|---|---|
| Decision Class | PPV | TPR | PPV | TPR | PPV | TPR | PPV | TPR |
| Germany | ||||||||
| 1 | 53.43 | 84.93 | 54.28 |
| 56.06 | 65.86 |
| 80.27 |
| 2 | 54.53 | 48.83 |
| 52.40 | 49.08 | 49.20 | 54.77 |
|
|
| 22.22 | 1.21 | 0.00 | 0.00 | 26.62 |
|
| 5.01 |
| Italy | ||||||||
| 1 | 54.47 | 86.83 | 53.73 |
|
| 67.61 | 55.92 | 80.31 |
| 2 | 55.76 | 54.32 |
| 51.34 | 49.85 | 50.94 | 52.44 |
|
|
| 2.00 | 0.12 | 0.00 | 0.00 | 27.26 |
|
| 5.10 |
| Spain | ||||||||
| 1 | 55.98 | 87.78 | 55.88 |
|
| 71.28 | 58.03 | 85.14 |
| 2 | 52.24 | 48.15 |
| 46.98 | 49.23 | 47.62 | 51.08 |
|
|
| 0.00 | 0.00 | 0.00 | 0.00 | 27.84 |
|
| 3.86 |
Information attribute values for the financial data (all bold numbers correspond to the best values obtained).
| Attribute Name | AUDUSD Value Count | AUDUSD Inf. Attribute | EURUSD Value Count | EURUSD Inf. Attribute | GBPUSD Value Count | GBPUSD Inf. Attribute | NZDUSD Value Count | NZDUSD Inf. Attribute |
|---|---|---|---|---|---|---|---|---|
| | 2714 | | 2717 | | 2749 | | 2665 | |
| | 2686 | | 2673 | | 2734 | | 2653 | |
| | 689 | 1.380616 | 767 | 1.258658 | 853 | 1.103368 | 623 | 1.417858 |
| | 440 | 1.612055 | 497 | 1.467387 | 510 | 1.355281 | 389 | 1.652498 |
| | 341 | 1.747386 | 336 | 1.654923 | 296 | 1.584065 | 290 | 1.800077 |
| | 78 | 1.936437 | 76 | 1.852890 | 80 | 1.752818 | 80 | 1.964629 |
| | 10 | 2.008175 | 11 | 1.928776 | 10 | 1.807938 | 9 | 2.035717 |
| | 154 | 1.899572 | 166 | 1.812773 | 214 | 1.653933 | 131 | 1.939371 |
| | 5 | 2.014515 | 5 | 1.931348 | 6 | 1.812072 | 5 | 2.039597 |
| | 10 | 2.008801 | 11 | 1.928170 | 12 | 1.807163 | 11 | 2.034959 |
| | 5 | 2.321928 | 5 | 1.935982 | 5 | 1.815961 | 5 | 2.044563 |
Classification results for the financial data by decision class for full set of attributes [in %].
| Decision Class | Decision Tree PPV | Decision Tree TPR | Random Forest PPV | Random Forest TPR | Bagging PPV | Bagging TPR | AdaBoost PPV | AdaBoost TPR |
|---|---|---|---|---|---|---|---|---|
| AUDUSD | | | | | | | | |
| | 4.05 | 1.53 | - | - | 1.52 | 2.55 | - | - |
| | - | - | - | - | 1.99 | 3.23 | 6.25 | 0.46 |
| | 35.55 | 34.77 | 33.22 | 49.91 | 31.03 | 30.81 | 33.15 | 37.32 |
| | 35.08 | 53.50 | 30.57 | 32.15 | 26.75 | 21.89 | 35.76 | 49.59 |
| | - | - | - | - | 1.29 | 0.93 | 5.88 | 0.47 |
| EURUSD | | | | | | | | |
| | - | - | - | - | 1.83 | 2.30 | 2.22 | 0.57 |
| | - | - | - | - | 1.44 | 1.62 | - | - |
| | 33.01 | 20.79 | 29.45 | 19.65 | 35.46 | 38.43 | 34.06 | 35.28 |
| | 38.06 | 66.58 | 37.71 | 65.08 | 36.43 | 34.42 | 35.80 | 44.44 |
| | - | - | - | - | 1.49 | 0.61 | 4.76 | 0.61 |
| GBPUSD | | | | | | | | |
| | - | - | - | - | 2.27 | 0.88 | - | - |
| | - | - | - | - | 1.56 | 2.05 | - | - |
| | 42.92 | 84.02 | 46.09 | 65.25 | 42.82 | 48.61 | 43.09 | 51.39 |
| | 44.44 | 16.84 | 47.45 | 43.72 | 41.20 | 40.00 | 44.17 | 49.72 |
| | - | - | - | - | 2.22 | 0.67 | - | - |
| NZDUSD | | | | | | | | |
| | - | - | - | - | 5.95 | 4.81 | 12.82 | 2.40 |
| | - | - | - | - | 5.21 | 4.80 | 7.27 | 1.75 |
| | 40.20 | 85.76 | 39.35 | 83.06 | 37.41 | 45.94 | 40.11 | 62.01 |
| | 39.63 | 13.94 | 39.29 | 16.25 | 33.80 | 31.02 | 38.48 | 35.00 |
| | - | - | - | - | 5.81 | 2.50 | - | - |
Classification results for the financial data by the decision class values for 25% of attributes with the lowest information attribute (in %).
| Decision Class | Decision Tree PPV | Decision Tree TPR | Random Forest PPV | Random Forest TPR | Bagging PPV | Bagging TPR | AdaBoost PPV | AdaBoost TPR |
|---|---|---|---|---|---|---|---|---|
| AUDUSD | | | | | | | | |
| | 2.74 | 1.02 | 4.44 | 1.02 | 2.33 | 3.06 | 2.33 | 0.51 |
| | - | - | - | - | 2.95 | 4.61 | - | - |
| | 33.43 | 29.69 | 32.32 | 36.56 | 33.85 | 34.45 | 36.80 | 44.23 |
| | 34.89 | 56.49 | 31.99 | 44.60 | 29.72 | 23.89 | 32.94 | 43.60 |
| | - | - | - | - | 4.85 | 5.12 | - | - |
| EURUSD | | | | | | | | |
| | - | - | - | - | 2.97 | 3.45 | - | - |
| | - | - | - | - | 3.91 | 5.41 | - | - |
| | 32.11 | 21.50 | 34.14 | 28.23 | 38.10 | 38.20 | 30.41 | 24.21 |
| | 39.37 | 67.34 | 37.81 | 59.23 | 41.24 | 38.76 | 36.59 | 58.48 |
| | - | - | - | - | 3.73 | 3.05 | - | - |
| GBPUSD | | | | | | | | |
| | - | - | - | - | 2.70 | 2.65 | 16.67 | 0.88 |
| | - | - | - | - | 2.11 | 3.42 | - | - |
| | 41.28 | 37.05 | 44.40 | 54.92 | 42.92 | 42.95 | 42.87 | 48.77 |
| | 42.52 | 60.73 | 43.32 | 47.53 | 42.21 | 41.05 | 44.39 | 52.55 |
| | - | - | - | - | - | - | - | - |
| NZDUSD | | | | | | | | |
| | 3.51 | 0.96 | - | - | 1.27 | 1.44 | - | - |
| | - | - | - | - | 6.46 | 8.30 | 27.27 | 1.31 |
| | 39.61 | 84.97 | 38.87 | 82.60 | 37.45 | 38.72 | 40.69 | 70.28 |
| | 39.13 | 11.63 | 42.26 | 16.90 | 29.32 | 27.89 | 39.56 | 30.10 |
| | - | - | - | - | 2.50 | 1.50 | 5.13 | 1.00 |
Accuracy results for the classification over the financial data [in %].
| | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| AUDUSD | 34.45 | 32.15 | 21.12 | 33.93 |
| AUDUSD 2 atr. | 33.55 | 31.70 | 23.78 | 34.32 |
| EURUSD | 36.13 | 35.04 | 30.02 | 32.74 |
| EURUSD 2 atr. | 36.73 | 36.03 | 32.19 | 34.11 |
| GBPUSD | 43.04 | 46.63 | 38.12 | 43.32 |
| GBPUSD 2 atr. | 41.97 | 43.89 | 36.28 | 43.47 |
| NZDUSD | 39.55 | 39.34 | 30.99 | 38.32 |
| NZDUSD 2 atr. | 38.41 | 39.39 | 26.89 | 39.63 |
Number of attributes after selection.
| | CFS | Proposed Approach | Original |
|---|---|---|---|
| Fake News Data | 8 | 5 | 20 |
| User Websites Navigation Data | 5 | 7 | 31 |
| Real-Estate Market Data | 9 | 3 | 15 |
| Sport Data (Germany) | 8 | 6 | 24 |
| Financial Data (GBPUSD) | 2 | 2 | 11 |
Accuracy results for the classification over the data after selection [in %].
| Data | Decision Tree | Random Forest | Bagging | AdaBoost |
|---|---|---|---|---|
| FN | 71.51 | 74.24 | 74.25 | 74.27 |
| Change: | — | −0.32 | +0.08 | +0.10 |
| UWN | 98.90 | 98.90 | 98.60 | 98.81 |
| Change: | — | — | −0.29 | −0.10 |
| R-EM | 69.00 | 57.67 | 98.93 | 28.92 |
| Change: | — | +0.96 | +0.22 | — |
| SD | 53.87 | 55.67 | 50.39 | 55.89 |
| Change: | +0.56 | +0.83 | +1.38 | +0.10 |
| FD | 42.44 | 41.55 | 33.52 | 43.51 |
| Change: | +0.47 | −2.34 | −2.76 | +0.04 |
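The "Change" rows appear to be the difference between the accuracies in this table and the accuracies obtained with the proposed entropy-based selection in the earlier accuracy tables; this reading is inferred from the numbers rather than stated in the captions. A small check for the fake news (FN) row, using the "Accuracy (5 attributes)" values reported earlier, is sketched below; the variable names are illustrative.

```python
# Accuracies from this table (FN row) and from the "Accuracy (5 attributes)"
# row of the fake news accuracy table, both in percent.
after_selection = {"Decision Tree": 71.51, "Random Forest": 74.24,
                   "Bagging": 74.25, "AdaBoost": 74.27}
proposed = {"Decision Tree": 71.51, "Random Forest": 74.56,
            "Bagging": 74.17, "AdaBoost": 74.17}

# Difference in percentage points, rounded to two decimals as in the table.
change = {name: round(after_selection[name] - proposed[name], 2)
          for name in after_selection}
```

Under this reading, a positive change means the accuracy in this table is higher than that of the proposed approach, and a dash in the table corresponds to a difference of zero.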
Figure 10. Classification accuracy depending on the number of attributes.