| Literature DB >> 35626457 |
Jazmín S De la Cruz-García1, Juan Bory-Reyes2, Aldo Ramirez-Arellano1.
Abstract
Decision trees are decision support data mining tools that create, as the name suggests, a tree-like model. The classical C4.5 decision tree, based on Shannon entropy, uses a simple algorithm: it calculates the gain ratio of each attribute and splits on the attribute that maximises this entropy-based measure. Tsallis and Rényi entropies can be employed instead of Shannon's to generate decision trees with better results. In practice, the entropic index parameter of these entropies is tuned so that the resulting tree outperforms the classical one. However, this tuning is carried out by testing a range of values on a given database, which is time-consuming and unfeasible for massive data. This paper introduces a decision tree based on a two-parameter fractional Tsallis entropy. We propose a constructionist approach that represents databases as complex networks, enabling efficient computation of the parameters of this entropy via the box-covering algorithm and renormalization of the complex network. The experimental results support the conclusion that the two-parameter fractional Tsallis entropy is a more sensitive measure for a decision tree classifier than its parametric Rényi, Tsallis, and Gini-index precedents.
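The split criteria mentioned in the abstract can be made concrete. Below is a minimal Python sketch of the standard Shannon, Tsallis, and Rényi entropies and the C4.5 gain ratio; the paper's two-parameter fractional Tsallis entropy generalises these, and its exact form is not reproduced here.

```python
import math
from collections import Counter

def shannon(probs):
    """Shannon entropy H = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def tsallis(probs, q):
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1); tends to Shannon (nats) as q -> 1."""
    if abs(q - 1) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1 - sum(p ** q for p in probs)) / (q - 1)

def renyi(probs, q):
    """Renyi entropy R_q = ln(sum p^q) / (1 - q); tends to Shannon (nats) as q -> 1."""
    if abs(q - 1) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** q for p in probs)) / (1 - q)

def class_probs(labels):
    """Empirical class probabilities of a list of labels."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def gain_ratio(records, labels, attr):
    """C4.5 gain ratio: information gain divided by split information."""
    base = shannon(class_probs(labels))
    n = len(labels)
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    cond = sum(len(g) / n * shannon(class_probs(g)) for g in groups.values())
    split_info = shannon([len(g) / n for g in groups.values()])
    return 0.0 if split_info == 0 else (base - cond) / split_info
```

C4.5 splits on the attribute with the highest gain ratio; the parametric variants simply substitute Tsallis or Rényi entropy for Shannon's in the gain computation.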
Keywords: Gini index; complex networks; decision trees; two-parameter Tsallis entropy
Year: 2022 PMID: 35626457 PMCID: PMC9141694 DOI: 10.3390/e24050572
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. A decision tree for the classification task.
Figure 2. Network construction from a database. Nodes of the same colour belong to the same box for the chosen box size.
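The mapping from a table of records to a network can be illustrated with a generic co-occurrence construction: one node per attribute-value pair (including the class column), one edge per pair of values appearing together in a record. This rule is a hypothetical illustration; the paper's exact construction may differ.

```python
from itertools import combinations

def database_to_network(records):
    """Build an undirected network from a table of records.
    Nodes: (attribute, value) pairs. Edges: co-occurrence of two
    values in the same record. Illustrative assumption only."""
    nodes = set()
    edges = set()
    for rec in records:
        items = sorted(rec.items())          # (attribute, value) pairs
        nodes.update(items)
        for u, v in combinations(items, 2):  # link values co-occurring in this record
            edges.add((u, v))
    return nodes, edges

# toy table with two attributes and a class column
records = [
    {"colour": "red", "shape": "round", "class": "apple"},
    {"colour": "green", "shape": "round", "class": "apple"},
    {"colour": "yellow", "shape": "long", "class": "banana"},
]
nodes, edges = database_to_network(records)
```

With nominal attributes the node count depends on the number of distinct attribute values rather than the number of records, which matches the table below, where e.g. the 1728-record Car database yields only 25 nodes.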
Figure 3. Box covering of a network. (a) Original network. (b) Dual network. (c) Colouring process. (d) Mapping colours to the original network.
The box size, the number of boxes obtained from the network of Figure 3, and the corresponding "pseudo matrix" values.
| Box size | No. of boxes |  |  |  |  |
|---|---|---|---|---|---|
| 1 | 6 | - | - | - | - |
| 2 | 3 | 0.107 |  |  |  |
| 3 | 2 | 0.107 |  |  | - |
| 4 | 2 | 0.143 |  |  | - |
| 5 | 1 | - | - | - | - |
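The box covering of Figure 3 follows the classical dual-network colouring scheme: nodes at distance greater than or equal to the box size are linked in a dual network, and a greedy colouring of the dual assigns each node to a box. A minimal sketch (greedy colouring is order-dependent, so the box count is an upper bound on the optimum):

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def box_covering(adj, lb):
    """Greedy box covering: link nodes at distance >= lb in a dual
    network, then greedy-colour the dual; each colour is one box.
    Returns a dict node -> box id."""
    nodes = list(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    dual = {u: set() for u in nodes}
    for u, v in combinations(nodes, 2):
        if dist[u].get(v, float("inf")) >= lb:
            dual[u].add(v)
            dual[v].add(u)
    colour = {}
    for u in nodes:                       # greedy colouring of the dual network
        used = {colour[v] for v in dual[u] if v in colour}
        c = 0
        while c in used:
            c += 1
        colour[u] = c
    return colour

# toy path graph 0-1-2-3: for box size 2 it splits into two boxes {0,1} and {2,3}
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
boxes = box_covering(adj, lb=2)
```

Counting the distinct box ids for each box size yields exactly the (box size, number of boxes) columns of the table above.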
Figure 4. Renormalization of a network. (a) Grouping nodes into boxes. (b) Converting boxes into supernodes.
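Renormalization (Figure 4) collapses each box into a supernode and links two supernodes whenever any pair of their member nodes is linked in the original network. A minimal sketch using an adjacency-dict representation, with a toy box assignment chosen for illustration:

```python
def renormalize(adj, boxes):
    """Renormalize a network: one supernode per box; two supernodes
    are linked if any members of the two boxes are linked."""
    super_adj = {b: set() for b in set(boxes.values())}
    for u, neighbours in adj.items():
        for v in neighbours:
            bu, bv = boxes[u], boxes[v]
            if bu != bv:                  # ignore links inside a box
                super_adj[bu].add(bv)
                super_adj[bv].add(bu)
    return super_adj

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # path of 4 nodes
boxes = {0: 0, 1: 0, 2: 1, 3: 1}               # toy box assignment (box size 2)
super_adj = renormalize(adj, boxes)            # collapses to a path of 2 supernodes
```

Iterating box covering and renormalization at growing box sizes is the standard way to estimate a network's fractal scaling, which is how the entropy parameters are obtained here.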
Database and network features. N = nominal, U = numerical, M = mixed.
| Database | Records | Attributes | Type | Classes | Balanced | Nodes | Edges |
|---|---|---|---|---|---|---|---|
| Breast Cancer | 699 | 9 | N | 2 | No | 737 | 1276 |
| Car | 1728 | 6 | N | 4 | Yes | 25 | 70 |
| Cmc | 1473 | 9 | M | 3 | No | 74 | 264 |
| Glass | 214 | 10 | U | 7 | No | 1159 | 1743 |
| Haberman | 306 | 3 | U | 2 | No | 94 | 395 |
| Hayes | 160 | 5 | N | 3 | No | 150 | 186 |
| Image | 2310 | 19 | U | 7 | Yes | 12,705 | 24,411 |
| Letter | 20,000 | 16 | U | 16 | Yes | 282 | 2700 |
| Scale | 625 | 4 | N | 3 | No | 23 | 90 |
| Vehicle | 946 | 18 | U | 4 | Yes | 1434 | 8064 |
| Wine | 178 | 13 | U | 3 | No | 1279 | 2239 |
| Yeast | 1484 | 9 | M | 10 | No | 1917 | 4907 |
Figure 5. The networks from the (a) non-discretized and (b) discretized vehicle database.
The parameters of the fractional Tsallis decision tree, obtained using the networks from the discretized databases.
| Database |  |  |  |  |  |  |
|---|---|---|---|---|---|---|
| Breast Cancer | 0.173 | 0.189 | 1.147 | 1.134 | 0.853 | 0.866 |
| Car | 0.303 | 0.347 | 1.137 | 1.120 | 0.863 | 0.880 |
| Cmc | 0.169 | 0.185 | 1.152 | 1.138 | 0.848 | 0.862 |
| Glass | 0.171 | 0.187 | 1.154 | 1.141 | 0.846 | 0.859 |
| Haberman | 0.344 | 0.420 | 1.333 | 1.273 | 0.667 | 0.727 |
| Hayes | 0.269 | 0.310 | 1.231 | 1.200 | 0.769 | 0.800 |
| Image | 0.117 | 0.123 | 1.056 | 1.054 | 0.944 | 0.946 |
| Letter | 0.155 | 0.165 | 1.05 | 1.047 | 0.950 | 0.953 |
| Scale | 0.352 | 0.421 | 1.217 | 1.182 | 0.783 | 0.818 |
| Vehicle | 0.092 | 0.096 | 1.106 | 1.101 | 0.894 | 0.899 |
| Wine | 0.119 | 0.127 | 1.147 | 1.138 | 0.853 | 0.862 |
| Yeast | 4.574 | 5.081 | 1.003 | 1.003 | 0.997 | 0.997 |
The AUROC and MCC of the classical (CT) and two-parameter fractional Tsallis (TFTT) decision trees, together with the TFTT parameters. A "+" means that the AUROC or MCC is statistically greater than that of CT.
| Database |  |  |  |  |  |  |  | Param. Set |
|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 0.959 | 0.889 | 0.173 | 0.853 | 0.866 |  |  |  |
| Car | 0.981 | 0.982 | 0.892 | 0.347 | 0.863 | 0.880 |  |  |
| Cmc | 0.691 | 0.315 | 0.169 | 1.152 | 1.138 |  |  |  |
| Glass | 0.794 | 0.56 | 0.171 | 1.154 | 1.141 |  |  |  |
| Haberman | 0.579 | 0.18 | 0.344 | 1.333 | 1.273 |  |  |  |
| Hayes | 0.869 | 0.578 | 0.269 | 1.231 | 1.200 |  |  |  |
| Image | 0.994 | 0.992 | 0.982 | 0.123 | 1.056 | 1.054 |  |  |
| Letter | 0.969 | 0.912 | 0.155 | 0.950 | 0.953 |  |  |  |
| Scale | 0.845 | 0.678 | 0.421 | 1.217 | 1.182 |  |  |  |
| Vehicle | 0.762 | 0.755 | 0.395 | 0.092 | 0.894 | 0.899 |  |  |
| Wine | 0.968 | 0.933 | 0.119 | 1.147 | 1.138 |  |  |  |
| Yeast | 0.743 | 0.733 | 0.462 | 4.574 | 0.997 | 0.997 |  |  |
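The two evaluation metrics reported in these tables can be computed directly. A self-contained sketch of the binary-case MCC and a rank-based AUROC; the paper's multi-class databases require the generalised forms (e.g. one-vs-rest averaging), which are not shown here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auroc(scores, labels):
    """AUROC via the Mann-Whitney formulation: the probability that a
    random positive is scored above a random negative (ties count 1/2).
    Assumes at least one positive and one negative label."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

MCC ranges over [-1, 1] (1 = perfect, 0 = chance), while AUROC ranges over [0, 1] with 0.5 = chance, which is why the two columns in the tables above differ in scale.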
AUROC and MCC of the classical (CT), Rényi (RT), and Tsallis (TT) decision trees. A "+" means that the AUROC or MCC is statistically greater than that of CT, and a "−" means it is statistically lower.
| Database |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 0.959 | 0.963 | 0.889 | 0.887 |  |  |  |  |
| Car | 0.981 | 0.983 | 0.982 | 0.892 |  |  |  |  |
| Cmc | 0.691 | 0.315 |  |  |  |  |  |  |
| Glass | 0.794 | 0.56 |  |  |  |  |  |  |
| Haberman | 0.579 | 0.18 | 0.152 |  |  |  |  |  |
| Hayes | 0.869 | 0.869 | 0.578 | 0.587 |  |  |  |  |
| Image | 0.994 | 0.997 | 0.995 | 0.982 | 0.984 | 0.978 |  |  |
| Letter | 0.969 | 0.967 | 0.912 | 0.913 |  |  |  |  |
| Scale | 0.845 | 0.839 | 0.857 | 0.678 |  |  |  |  |
| Vehicle | 0.762 | 0.776 | 0.748 | 0.395 | 0.371 |  |  |  |
| Wine | 0.968 | 0.963 | 0.933 | 0.924 |  |  |  |  |
| Yeast | 0.743 | 0.462 |  |  |  |  |  |  |
AUROC and MCC of the Gini (GT) and two-parameter fractional Tsallis (TFTT) decision trees. A "+" means that the AUROC or MCC is statistically greater than that of GT.
| Database | AUROC (GT) | AUROC (TFTT) | MCC (GT) | MCC (TFTT) |
|---|---|---|---|---|
| Breast Cancer | 0.963 |  | 0.888 |  |
| Car | 0.981 |  | 0.897 |  |
| Cmc | 0.58 |  | 0.357 |  |
| Glass | 0.712 |  | 0.437 |  |
| Haberman | 0.52 |  | 0.068 |  |
| Hayes | 0.871 |  | 0.655 |  |
| Image | 0.988 |  | 0.946 |  |
| Letter | 0.962 |  | 0.894 |  |
| Scale | 0.866 |  | 0.654 |  |
| Vehicle | 0.71 |  | 0.294 |  |
| Wine | 0.932 |  | 0.847 |  |
| Yeast | 0.728 |  | 0.414 |  |