
Decision Rules Derived from Optimal Decision Trees with Hypotheses.

Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov, Beata Zielosko

Abstract

Conventional decision trees use queries, each of which is based on a single attribute. In this study, we also examine decision trees that handle additional queries based on hypotheses; such queries are similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes of decision trees with hypotheses. The modifications of these algorithms considered in the present paper permit us to build decision trees with hypotheses that are optimal relative to the depth or relative to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses and from optimal conventional decision trees to determine which are preferable as a tool for the representation of information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository and on decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses are, in many cases, better than the rules extracted from conventional decision trees.


Keywords:  decision rule; decision tree; hypothesis; representation of information

Year:  2021        PMID: 34945947      PMCID: PMC8700404          DOI: 10.3390/e23121641

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Decision trees are commonly used as classifiers, as an algorithmic tool for solving various problems, and as a means of representing information [1,2,3]. They form a part of statistical learning, which refers to a vast set of tools for understanding data [4]. Conventional decision trees studied in test theory [5], rough set theory [6,7,8], and many other areas of computer science exploit queries based on a single attribute. In [9,10,11,12], we considered decision trees that also exploit queries based on hypotheses. Such decision trees are analogous to the tools that have been analyzed in exact learning [13,14,15], where both membership and equivalence queries are used. In the present paper, we analyze decision trees with hypotheses as a means for the representation of information. We design dynamic programming algorithms that optimize such trees relative to two cost functions. For various decision tables, we build optimal decision trees and analyze the length and coverage of the decision rules extracted from the constructed trees to study which kinds of decision trees are more suitable for the representation of information.

Let T be a decision table with n attributes. We can use two types of queries in the decision trees for this table. First, we can ask for the value of an attribute; as a result, we obtain this value. Second, we can choose an n-tuple of possible values of the attributes and formulate the hypothesis that it really is the tuple of values of the considered attributes; as a result, we either obtain a confirmation of the hypothesis or a counterexample. We call this hypothesis proper if the considered n-tuple is a row of the table T. We studied the following five types of decision trees:

1. Using attributes.
2. Using hypotheses.
3. Using both attributes and hypotheses.
4. Using proper hypotheses.
5. Using both attributes and proper hypotheses.

We analyzed four cost functions for the decision trees: the depth, the number of realizable nodes, the number of realizable leaf nodes, and the number of internal nodes. A node is realizable relative to a given decision table if a computation can pass through this node for at least one row of the considered decision table. Previously, in [12], we proposed a dynamic programming algorithm for each of these four cost functions. Given a decision table and a type of decision tree, this algorithm returns the minimum cost of a decision tree of the given type for the given table. The results of computer experiments show that decision trees with hypotheses can have lower complexity than conventional decision trees, which means that they can be used as a means for the representation of information.

The present paper has two aims. The first aim is to construct optimal decision trees with hypotheses. We know that such trees can be used for the representation of information (especially decision trees of type 3). However, the algorithms from [12] were designed only to find the complexity of optimal trees. The algorithms for two of the cost functions (the depth and the number of internal nodes) can be modified to build optimal decision trees. Unfortunately, we cannot use a similar approach to build optimal decision trees of types 2 and 3 relative to the number of realizable nodes or optimal decision trees of type 2 relative to the number of realizable leaf nodes. The second aim is to study the length and coverage of decision rules extracted from the optimal decision trees.
Decision rules can be considered one of the simplest and most understandable models for the representation of information, and deriving decision rules from decision trees is a well-known approach. We want to confirm that the decision rules derived from decision trees with hypotheses can be better than the rules derived from conventional decision trees. For the computer experiments, we chose eight decision tables from the UCI ML Repository [16] as well as 100 randomly generated Boolean functions with n variables (n = 3, 4, 5, 6). We constructed optimal (relative to the depth or to the number of internal nodes) decision trees of the five types for these tables. Then we analyzed the length and coverage of the decision rules extracted from these trees. For a decision tree with hypotheses, for some rows of the considered decision table there can be more than one derived decision rule that covers the row; in this case, for each row we chose the best rule. The results of the computer experiments show that the decision rules derived from the decision trees with hypotheses are, in many cases, better than the ones derived from conventional decision trees.

The novelty of the paper is directly related to its two main contributions: (i) the modification of the dynamic programming algorithms described in [12] such that the modified algorithms can construct optimal decision trees of the five types relative to two cost functions and (ii) the experimental confirmation that the decision rules derived from decision trees with hypotheses can be more suitable for the representation of information than the decision rules derived from conventional decision trees. To make the paper more understandable, we include slightly modified definitions and one algorithm from [12]. The remaining parts of the paper are organized as follows: main notions in Section 2 and Section 3, decision tree optimization based on dynamic programming algorithms in Section 4, Section 5 and Section 6, experimental results in Section 7, and short conclusions in Section 8.

2. Decision Tables

We can define a decision table $T$ as follows. It is a rectangular table with $n$ ($n \ge 1$) columns. Its columns are tagged by conditional attributes $f_1, \ldots, f_n$. The table is filled with values from the set $\omega = \{0, 1, 2, \ldots\}$. Its rows are pairwise different. Its rows are tagged by numbers from $\omega$ interpreted as decisions. Its rows are considered as tuples of values of the conditional attributes. When a decision table has no rows, we call it an empty table. We define a degenerate table as a decision table that is either empty or has all of its rows tagged by the same decision. Furthermore, we use the following notation for $T$: $F(T) = \{f_1, \ldots, f_n\}$ is the set of conditional attributes; $D(T)$ is the set of decisions attached to the rows of $T$; $E(T, f_i)$ is the set of values of the attribute $f_i$ in $T$, where $f_i \in F(T)$; $E(T)$ is the set of conditional attributes of $T$ for which $|E(T, f_i)| \ge 2$. Let $S = \{f_{i_1} = \delta_1, \ldots, f_{i_m} = \delta_m\}$ be a system of equations, where $f_{i_1}, \ldots, f_{i_m} \in F(T)$ and $\delta_1, \ldots, \delta_m \in \omega$ ($S$ is empty when $m = 0$). We denote by $TS$ the subtable of $T$ consisting of all rows of $T$ that have the values $\delta_1, \ldots, \delta_m$ at the intersection with the columns $f_{i_1}, \ldots, f_{i_m}$. Such subtables are called separable subtables of $T$.
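To make these definitions concrete, the following minimal Python sketch (our illustration, not code from the paper; all identifiers are our own) represents a decision table and computes separable subtables:

```python
# A minimal sketch (ours, not from the paper) of the notions above.
# A decision table is a list of (row, decision) pairs; a row is a tuple of
# attribute values, and index i corresponds to the attribute f_{i+1}.

Table = list[tuple[tuple[int, ...], int]]

def subtable(table: Table, system: dict[int, int]) -> Table:
    """The separable subtable TS: rows satisfying every equation f_i = delta."""
    return [(row, d) for row, d in table
            if all(row[i] == delta for i, delta in system.items())]

def is_degenerate(table: Table) -> bool:
    """Empty, or all rows tagged with the same decision."""
    return len({d for _, d in table}) <= 1

def values(table: Table, i: int) -> set[int]:
    """E(T, f_i): the set of values of attribute f_i in the table."""
    return {row[i] for row, _ in table}

def splittable(table: Table, n: int) -> list[int]:
    """E(T): attributes that take at least two distinct values."""
    return [i for i in range(n) if len(values(table, i)) >= 2]

# Example: a 3-attribute table; T{f_1 = 0} keeps the first two rows.
T = [((0, 1, 0), 1), ((0, 0, 1), 2), ((1, 1, 1), 2)]
assert subtable(T, {0: 0}) == T[:2]
assert not is_degenerate(T) and is_degenerate(subtable(T, {0: 1}))
```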

3. Decision Trees and Rules

In this section, we define the notions of decision trees and rules related to a given nonempty decision table $T$ with $n$ conditional attributes $f_1, \ldots, f_n$. We consider decision trees built from two types of queries. A query of the first type asks for the value of an attribute $f_i \in F(T)$; the answer to this query belongs to the set $E(T, f_i)$. A query of the second type asks about a hypothesis over $T$ of the form $H = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$, where $\delta_1 \in E(T, f_1), \ldots, \delta_n \in E(T, f_n)$. The answer to this query belongs to the set $A(H) = \{H\} \cup \{\{f_i = \sigma\} : i = 1, \ldots, n,\ \sigma \in E(T, f_i) \setminus \{\delta_i\}\}$. If the answer is $H$, then the hypothesis is true; the other answers are counterexamples. Note that $H$ is a proper hypothesis for $T$ if $(\delta_1, \ldots, \delta_n)$ is a row of the table $T$.

A decision tree over $T$ is a tagged finite directed rooted tree in which the following hold: each leaf node is tagged by a number from the set $D(T) \cup \{0\}$; each internal node is tagged by an attribute from the set $F(T)$ or by a hypothesis over $T$. In both cases, exactly one edge leaves this node for each possible answer (from the set $E(T, f_i)$ in the case of an attribute query and from the set $A(H)$ in the case of a hypothesis query), each such edge is tagged by the corresponding answer, and no other edges leave this node.

Let $\Gamma$ be a decision tree over $T$. For a node $v$ of $\Gamma$, we define a system of equations $S(v)$ over $T$ corresponding to the node $v$. Let $\xi$ be the directed path from the root of $\Gamma$ to the node $v$. If $\xi$ has no internal nodes, then $S(v)$ is the empty system; otherwise, $S(v)$ is the union of the systems of equations attached to the edges of $\xi$. We say that a decision tree $\Gamma$ over $T$ is a decision tree for $T$ if, for any node $v$ of $\Gamma$, the following hold: the node $v$ is a leaf node if and only if the subtable $TS(v)$ is degenerate; when $v$ is a leaf node and the subtable $TS(v)$ is empty, $v$ is tagged by the decision 0; when $v$ is a leaf node and the subtable $TS(v)$ is nonempty, $v$ is tagged by the decision attached to all rows of $TS(v)$.

An arbitrary directed path from the root of $\Gamma$ to a leaf node $v$ is called a complete path in $\Gamma$. For a complete path $\xi$ ending in the leaf node $v$, denote $TS(\xi) = TS(v)$. The depth of a decision tree is analogous to its time complexity: we denote it by $h(\Gamma)$ and define it as the maximum number of internal nodes in a complete path of the tree. Similarly, the number of internal nodes of a decision tree (denoted $L_w(\Gamma)$) is analogous to its space complexity.

Let $\Gamma$ be a decision tree for $T$ and $\xi$ be a complete path in $\Gamma$ such that $TS(\xi)$ is a nonempty table and the leaf node of the path is tagged with the decision $d$. We now define a system of equations $S'(\xi)$. If $\xi$ contains no internal nodes, then $S'(\xi)$ is the empty system. Assume now that $\xi$ contains at least one internal node. We transform the systems of equations attached to the edges leaving the internal nodes of $\xi$. If an edge is tagged with an equation system containing exactly one equation, then we do not change this system. Let an edge $e$ leaving an internal node $v$ be tagged with an equation system containing more than one equation. Then $v$ is tagged with a hypothesis $H$, and $e$ is tagged with the equation system $H$. (Note that if such a node exists, then it is the last internal node in the complete path $\xi$.) In this case, we remove from the equation system $H$ attached to $e$ all equations of the kind $f_i = \delta_i$ such that $TS(v)\{f_i = \delta_i\} = TS(v)$. Then $S'(\xi)$ is the union of the new equation systems corresponding to the edges of the path $\xi$. One can show that $TS'(\xi) = TS(\xi)$. We associate with the complete path $\xi$ the decision rule formed by the conjunction of the equations of $S'(\xi)$ with the conclusion $d$, and we denote this rule by $\mathrm{rule}(\xi)$. The number of equations in the system $S'(\xi)$ is called the length of the rule and is denoted $l(\mathrm{rule}(\xi))$. The number of rows in the subtable $TS(\xi)$ is called the coverage of the rule and is denoted $c(\mathrm{rule}(\xi))$.
Denote by $\Xi(\Gamma)$ the set of complete paths $\xi$ in $\Gamma$ such that the table $TS(\xi)$ is nonempty, and denote by $\Delta(T)$ the set of rows of the decision table $T$. For a row $r \in \Delta(T)$, we denote by $l(\Gamma, r)$ the minimum length of a rule $\mathrm{rule}(\xi)$ such that $\xi \in \Xi(\Gamma)$ and $r$ is a row of the subtable $TS(\xi)$, and we denote by $c(\Gamma, r)$ the maximum coverage of a rule $\mathrm{rule}(\xi)$ such that $\xi \in \Xi(\Gamma)$ and $r$ is a row of the subtable $TS(\xi)$. We use the following notation:

$l(\Gamma) = \frac{1}{|\Delta(T)|} \sum_{r \in \Delta(T)} l(\Gamma, r)$ and $c(\Gamma) = \frac{1}{|\Delta(T)|} \sum_{r \in \Delta(T)} c(\Gamma, r)$.
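Continuing the same illustrative sketch (ours, reusing the helpers from Section 2), the set of answers to a hypothesis query and the length and coverage of a derived rule can be expressed as follows:

```python
# Sketch (ours): answers to a hypothesis query and the length/coverage of a
# derived rule, building on the helpers from the previous sketch.

def hypothesis_answers(table, hypothesis):
    """A(H): the confirmation (all n equations of H) plus one counterexample
    {f_i = sigma} for every value sigma != delta_i of each attribute."""
    answers = [dict(enumerate(hypothesis))]            # confirmation branch
    for i, delta in enumerate(hypothesis):
        for sigma in values(table, i) - {delta}:
            answers.append({i: sigma})                 # counterexample branch
    return answers

def rule_stats(table, system):
    """Length = number of equations in the reduced system S'(xi);
    coverage = number of rows of the subtable TS(xi)."""
    return len(system), len(subtable(table, system))

# Example: for the table T above, the hypothesis (0, 1, 0) is proper (it is
# the first row of T) and has one confirmation and three counterexamples:
# f_1 = 1, f_2 = 0, f_3 = 1.
assert len(hypothesis_answers(T, (0, 1, 0))) == 4
```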

4. Construction of Directed Acyclic Graph

Let us consider a nonempty decision table $T$ with $n$ conditional attributes $f_1, \ldots, f_n$. Algorithm 1 constructs a directed acyclic graph (DAG) $\Delta(T)$, which is subsequently used for the construction of optimal decision trees. The nodes of this DAG are certain separable subtables of the table $T$. The algorithm processes one node per iteration: it begins with a graph consisting of the single unprocessed node $T$ and ends when all nodes of the graph are processed. This algorithm was described and used in [9,10,12]; it is a special version of the more general algorithm considered in [17].

Algorithm 1.
Input: A nonempty decision table $T$ with $n$ conditional attributes $f_1, \ldots, f_n$.
Output: Directed acyclic graph $\Delta(T)$.
1. Build the graph consisting of the single node $T$, which is not tagged as processed.
2. If all nodes of the graph are processed, then the algorithm halts and returns the resulting graph as $\Delta(T)$. Otherwise, select a node (table) $\Theta$ that is not yet processed.
3. If $\Theta$ is degenerate, tag it as processed and go to step 2. Otherwise, for each $f_i \in E(\Theta)$, draw a bundle of edges from $\Theta$: let $E(\Theta, f_i) = \{a_1, \ldots, a_k\}$; draw $k$ edges from $\Theta$, attach to them the systems of equations $\{f_i = a_1\}, \ldots, \{f_i = a_k\}$, and let these edges enter the nodes $\Theta\{f_i = a_1\}, \ldots, \Theta\{f_i = a_k\}$, respectively. If some of these nodes are not yet in the graph, add them. Tag $\Theta$ as processed and go to step 2.
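The following Python sketch (ours, reusing the helpers from Section 2) mirrors Algorithm 1; it identifies each separable subtable by the frozenset of its rows so that repeated subtables map to a single DAG node:

```python
# Sketch (ours) of Algorithm 1: build the DAG Delta(T). Nodes are separable
# subtables, identified here by the frozenset of their (row, decision) pairs.

def build_dag(table, n):
    key = lambda t: frozenset(t)
    nodes = {key(table): table}            # node id -> subtable
    edges = {}                             # (parent id, attribute, value) -> child id
    unprocessed = [key(table)]
    while unprocessed:                     # step 2: pick an unprocessed node
        theta_id = unprocessed.pop()
        theta = nodes[theta_id]
        if is_degenerate(theta):           # step 3: degenerate nodes get no edges
            continue
        for i in splittable(theta, n):     # a bundle of edges for each f_i in E(Theta)
            for a in values(theta, i):     # one edge per value, system {f_i = a}
                child = subtable(theta, {i: a})
                cid = key(child)
                if cid not in nodes:       # add missing nodes to the graph
                    nodes[cid] = child
                    unprocessed.append(cid)
                edges[(theta_id, i, a)] = cid
    return nodes, edges
```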

5. Construction of Decision Trees with Minimum Depth

Let us consider a nonempty decision table $T$ with $n$ conditional attributes $f_1, \ldots, f_n$ and $k \in \{1, \ldots, 5\}$. We can use the DAG $\Delta(T)$ to construct a decision tree of type $k$ with the minimum depth for the decision table $T$. For this purpose, we construct, for each node $\Theta$ of $\Delta(T)$, a decision tree $\Gamma_k(\Theta)$ of type $k$ with the minimum depth for the table $\Theta$. It is necessary to consider not only the subtables corresponding to the nodes of $\Delta(T)$ but also the empty subtable $\Lambda$ of $T$ as well as the subtables containing only one row $r$ of $T$, which may not be nodes of $\Delta(T)$. The idea is to start with these special subtables and with the leaf nodes of $\Delta(T)$, which are degenerate separable subtables of $T$, and to move stepwise, in a bottom-up fashion, toward the table $T$.

First, consider the case when $\Theta$ is a leaf node of $\Delta(T)$, or $\Theta$ contains a single row $r$ of the table $T$, or $\Theta = \Lambda$. If $\Theta$ is nonempty, then $\Gamma_k(\Theta)$ has only one node, which is tagged by the decision attached to all rows of $\Theta$; otherwise, this node is tagged with 0.

Next, consider the case when $\Theta$ is an internal node of $\Delta(T)$ and the decision tree $\Gamma_k(\Theta')$ has already been constructed for each child $\Theta'$ of $\Theta$. Based on these trees, a decision tree with the minimum depth for $\Theta$ can be constructed that uses decision trees of type $k$ for the subtables corresponding to the children of the root. In this tree, the root can be tagged in one of three ways: by an attribute from $E(\Theta)$ (such a decision tree is designated $\Gamma(f_i)$), by a hypothesis over $T$ (designated $\Gamma(H)$), or by a proper hypothesis over $T$ (designated $\Gamma(H_r)$). The set $E(\Theta)$ is nonempty because $\Theta$ is nondegenerate. We now describe three procedures for the construction of the trees that we denote $\Gamma_A$, $\Gamma_H$, and $\Gamma_P$.

Consider first a decision tree $\Gamma(f_i)$ for the node $\Theta$, where the root is tagged by an attribute $f_i \in E(\Theta)$. For each $a \in E(\Theta, f_i)$, there is an edge that leaves the root and enters the root of the decision tree $\Gamma_k(\Theta\{f_i = a\})$; we tag this edge by the equation system $\{f_i = a\}$. It is clear that

$h(\Gamma(f_i)) = 1 + \max\{h(\Gamma_k(\Theta\{f_i = a\})) : a \in E(\Theta, f_i)\}$.  (1)

Using (1), one can easily prove that $\Gamma(f_i)$ is a decision tree with the minimum depth for $\Theta$ among trees whose root is tagged by the attribute $f_i$ and which use decision trees of type $k$ for the subtables corresponding to the children of the root. We do not consider attributes $f_i \in F(T) \setminus E(\Theta)$: for such $f_i$, we can find a value $a$ with $\Theta\{f_i = a\} = \Theta$, and therefore we cannot construct an optimal tree for $\Theta$ based on $f_i$.

Construction of the tree $\Gamma_A$. We build the set $E(\Theta)$. For each $f_i \in E(\Theta)$, we construct the decision tree $\Gamma(f_i)$ and choose among these trees a tree with the minimum depth. Return this tree as $\Gamma_A$.

Let us consider a hypothesis $H = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$ over $T$. We call this hypothesis admissible for $\Theta$ and an attribute $f_i$ if $\Theta\{f_i = \sigma\} \neq \Theta$ for any $\sigma \in E(T, f_i) \setminus \{\delta_i\}$. This hypothesis is not admissible for $\Theta$ and an attribute $f_i$ if and only if all rows of $\Theta$ have the same value $\sigma$ of the attribute $f_i$ and $\sigma \neq \delta_i$. We call $H$ admissible for $\Theta$ if $H$ is admissible for $\Theta$ and any attribute $f_i \in F(T)$.

We now describe a decision tree $\Gamma(H)$ for $\Theta$, whose root is tagged by an admissible hypothesis $H$ for $\Theta$. For each equation system $S \in A(H)$, there is an edge that leaves the root of $\Gamma(H)$ and enters the root of the tree $\Gamma_k(\Theta S)$; this edge is tagged by the equation system $S$. It is clear that

$h(\Gamma(H)) = 1 + \max\{h(\Gamma_k(\Theta S)) : S \in A(H)\}$.  (2)

Using (2), one can easily prove that $\Gamma(H)$ is a decision tree with the minimum depth for $\Theta$ among trees whose root is tagged by the hypothesis $H$ and which use decision trees of type $k$ for the subtables corresponding to the children of the root. We do not consider hypotheses $H$ that are not admissible for $\Theta$: for such $H$, we can find an equation system $S \in A(H)$ with $\Theta S = \Theta$, and therefore we cannot construct an optimal decision tree for $\Theta$ based on $H$.

Construction of the tree $\Gamma_H$. We construct a hypothesis $H = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$ for $\Theta$ as follows. If $f_i \notin E(\Theta)$, then $\delta_i$ is the only value from $E(\Theta, f_i)$. If $f_i \in E(\Theta)$, then $\delta_i$ is the minimum number from $E(\Theta, f_i)$ for which $h(\Gamma_k(\Theta\{f_i = \delta_i\})) = \max\{h(\Gamma_k(\Theta\{f_i = \sigma\})) : \sigma \in E(\Theta, f_i)\}$. Return the tree $\Gamma(H)$ as $\Gamma_H$. Using (2), one can prove the correctness of this procedure.

Construction of the tree $\Gamma_P$. For each row $r$ of the decision table $T$, we consider the proper hypothesis $H_r = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$, where $(\delta_1, \ldots, \delta_n) = r$. We check whether $H_r$ is admissible for $\Theta$; if so, we construct the decision tree $\Gamma(H_r)$. We choose among the constructed trees a tree with the minimum depth and return it as $\Gamma_P$.

Given a decision table $T$ and $k \in \{1, \ldots, 5\}$, the following Algorithm 2 builds, for each node $\Theta$ of the DAG $\Delta(T)$, a decision tree $\Gamma_k(\Theta)$ of type $k$ with the minimum depth for the table $\Theta$.

Algorithm 2.
Input: A nonempty decision table $T$, the directed acyclic graph $\Delta(T)$, and a number $k \in \{1, \ldots, 5\}$.
Output: A decision tree $\Gamma_k(T)$.
1. If each node of the DAG $\Delta(T)$ has a decision tree attached to it, then return the tree attached to the node $T$ as $\Gamma_k(T)$ and halt. Otherwise, select a node $\Theta$ of the graph that does not have an attached tree; it is either a leaf node of $\Delta(T)$ or an internal node of $\Delta(T)$ all of whose children already have attached trees.
2. If $\Theta$ is a leaf node, attach to it the decision tree $\Gamma_k(\Theta)$ consisting of a single node tagged with the decision attached to all rows of $\Theta$, and go to step 1.
3. If $\Theta$ is not a leaf node, then proceed according to the value of $k$: if $k = 1$, construct the tree $\Gamma_A$ and attach it to $\Theta$ as $\Gamma_1(\Theta)$; if $k = 2$, construct the tree $\Gamma_H$ and attach it to $\Theta$ as $\Gamma_2(\Theta)$; if $k = 3$, construct the trees $\Gamma_A$ and $\Gamma_H$, choose between them a tree with the minimum depth, and attach it to $\Theta$ as $\Gamma_3(\Theta)$; if $k = 4$, construct the tree $\Gamma_P$ and attach it to $\Theta$ as $\Gamma_4(\Theta)$; if $k = 5$, construct the trees $\Gamma_A$ and $\Gamma_P$, choose between them a tree with the minimum depth, and attach it to $\Theta$ as $\Gamma_5(\Theta)$. Go to step 1.

Let $T$ be a decision table and $k \in \{1, \ldots, 5\}$. We use the following notation: $l_h^{(k)}(T) = l(\Gamma_k(T))$ and $c_h^{(k)}(T) = c(\Gamma_k(T))$, where $\Gamma_k(T)$ is the decision tree of type $k$ with the minimum depth constructed by Algorithm 2.
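The minimum depths used in these constructions can be computed by memoized recursion over the separable subtables. The sketch below is our illustration (not the authors' code; function names are ours, and it reuses the helpers from the earlier sketches). It implements the recurrences (1) and (2) for trees of types 1, 2 and 3: in the hypothesis case, the confirmation branch covers at most one row and therefore has depth 0, and choosing $\delta_i$ as a value whose child is deepest removes exactly that child from the counterexamples, matching the greedy choice in the construction of $\Gamma_H$. Restricting the hypothesis tuples to rows of $T$ would give the proper-hypothesis variants (types 4 and 5).

```python
# Sketch (ours) of the minimum-depth recurrences (1) and (2).
# Flags select type 1 (attributes), type 2 (hypotheses) or type 3 (both).

def min_depth(table, n, use_attributes=True, use_hypotheses=True):
    memo = {}

    def solve(theta):
        key = frozenset(theta)
        if key in memo:
            return memo[key]
        if is_degenerate(theta):
            memo[key] = 0                   # leaf: no queries needed
            return 0
        best = float("inf")
        if use_attributes:                  # Eq. (1): 1 + max over the branches
            for i in splittable(theta, n):
                best = min(best, 1 + max(solve(subtable(theta, {i: a}))
                                         for a in values(theta, i)))
        if use_hypotheses:                  # Eq. (2): delta_i drops the deepest
            worst = 0                       # child of f_i from the counterexamples;
            for i in splittable(theta, n):  # constant attributes only yield empty
                depths = sorted(solve(subtable(theta, {i: a}))  # subtables (depth 0)
                                for a in values(theta, i))
                worst = max(worst, depths[-2])
            best = min(best, 1 + worst)
        memo[key] = best
        return best

    return solve(table)

# Example: min_depth(T, 3) == 1 for the table T above, since splitting on
# f_3 already separates the decisions.
```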

6. Construction of Decision Trees Containing Minimum Number of Internal Nodes

Let us consider a nonempty decision table $T$ with $n$ conditional attributes $f_1, \ldots, f_n$ and $k \in \{1, \ldots, 5\}$. We can use the DAG $\Delta(T)$ to construct a decision tree of type $k$ with the minimum number of internal nodes for the decision table $T$. To construct this tree, for each node $\Theta$ of the DAG $\Delta(T)$, we construct a decision tree $\Gamma_k(\Theta)$ of type $k$ with the minimum number of internal nodes for the table $\Theta$. As before, it is necessary to consider not only the subtables corresponding to the nodes of $\Delta(T)$ but also the empty subtable $\Lambda$ of $T$ as well as the subtables containing only one row $r$ of $T$, which may not be nodes of $\Delta(T)$. The idea is to start with these special subtables and with the leaf nodes of $\Delta(T)$, which are degenerate separable subtables of $T$, and to move stepwise, in a bottom-up fashion, toward the table $T$.

First, consider the case when $\Theta$ is a leaf node of $\Delta(T)$, or $\Theta$ contains a single row $r$ of the table $T$, or $\Theta = \Lambda$. If $\Theta$ is nonempty, then the decision tree $\Gamma_k(\Theta)$ has only one node, which is tagged by the decision attached to all rows of $\Theta$; otherwise, this node is tagged with 0.

Next, consider the case when $\Theta$ is an internal node of $\Delta(T)$ such that the decision tree $\Gamma_k(\Theta')$ has already been constructed for each child $\Theta'$ of $\Theta$. Based on these trees, a decision tree with the minimum number of internal nodes for $\Theta$ can be constructed that uses decision trees of type $k$ for the subtables corresponding to the children of the root. In this tree, the root can be tagged in one of three ways: by an attribute from $E(\Theta)$ (such a decision tree is designated $\Gamma(f_i)$), by a hypothesis over $T$ (designated $\Gamma(H)$), or by a proper hypothesis over $T$ (designated $\Gamma(H_r)$). The set $E(\Theta)$ is nonempty because $\Theta$ is nondegenerate. We now describe three procedures for the construction of the trees $\Gamma_A$, $\Gamma_H$, and $\Gamma_P$.

Consider first a decision tree $\Gamma(f_i)$ for the node $\Theta$, where the root is tagged by an attribute $f_i \in E(\Theta)$. For each $a \in E(\Theta, f_i)$, there is an edge that leaves the root and enters the root of the decision tree $\Gamma_k(\Theta\{f_i = a\})$; we tag this edge by the equation system $\{f_i = a\}$. It is clear that

$L_w(\Gamma(f_i)) = 1 + \sum_{a \in E(\Theta, f_i)} L_w(\Gamma_k(\Theta\{f_i = a\}))$.  (3)

Using (3), one can easily prove that $\Gamma(f_i)$ is a decision tree with the minimum number of internal nodes for $\Theta$ among trees whose root is tagged by the attribute $f_i$ and which use decision trees of type $k$ for the subtables corresponding to the children of the root. We do not consider attributes $f_i \in F(T) \setminus E(\Theta)$: for such $f_i$, we can find a value $a$ with $\Theta\{f_i = a\} = \Theta$, and therefore we cannot construct an optimal tree for $\Theta$ based on $f_i$.

Construction of the tree $\Gamma_A$. We build the set of attributes $E(\Theta)$. For each $f_i \in E(\Theta)$, we construct the decision tree $\Gamma(f_i)$ and choose among these trees a tree with the minimum number of internal nodes. We return this tree as $\Gamma_A$.

We now describe a decision tree $\Gamma(H)$ for $\Theta$, whose root is tagged by an admissible hypothesis $H$ for $\Theta$. For each equation system $S \in A(H)$, there is an edge that leaves the root of $\Gamma(H)$ and enters the root of the tree $\Gamma_k(\Theta S)$; this edge is tagged by the equation system $S$. It is clear that

$L_w(\Gamma(H)) = 1 + \sum_{S \in A(H)} L_w(\Gamma_k(\Theta S))$.  (4)

Using (4), one can easily prove that $\Gamma(H)$ is a decision tree with the minimum number of internal nodes for $\Theta$ among trees whose root is tagged by the hypothesis $H$ and which use decision trees of type $k$ for the subtables corresponding to the children of the root. We do not consider hypotheses $H$ that are not admissible for $\Theta$: for such $H$, we can find an equation system $S \in A(H)$ with $\Theta S = \Theta$, and therefore we cannot construct an optimal decision tree for $\Theta$ based on such $H$.

Construction of the tree $\Gamma_H$. We construct a hypothesis $H = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$ for $\Theta$ as follows. If $f_i \notin E(\Theta)$, then $\delta_i$ is the only value in $E(\Theta, f_i)$. Let $f_i \in E(\Theta)$. Then $\delta_i$ is the minimum number from $E(\Theta, f_i)$ such that $L_w(\Gamma_k(\Theta\{f_i = \delta_i\})) = \max\{L_w(\Gamma_k(\Theta\{f_i = \sigma\})) : \sigma \in E(\Theta, f_i)\}$. Obviously, $H$ is admissible for $\Theta$. Return the tree $\Gamma(H)$ as $\Gamma_H$.
Using (4), one can prove the correctness of this procedure.

Construction of the tree $\Gamma_P$. For each row $r$ of the decision table $T$, we consider the proper hypothesis $H_r = \{f_1 = \delta_1, \ldots, f_n = \delta_n\}$, where $(\delta_1, \ldots, \delta_n) = r$. We check whether $H_r$ is admissible for $\Theta$; if so, we construct the decision tree $\Gamma(H_r)$. We choose among the constructed trees a tree with the minimum number of internal nodes and return it as $\Gamma_P$.

Given a decision table $T$ and $k \in \{1, \ldots, 5\}$, the following Algorithm 3 builds, for each node $\Theta$ of the DAG $\Delta(T)$, a decision tree $\Gamma_k(\Theta)$ of type $k$ with the minimum number of internal nodes for the table $\Theta$.

Algorithm 3.
Input: A nonempty decision table $T$, the directed acyclic graph $\Delta(T)$, and a number $k \in \{1, \ldots, 5\}$.
Output: A decision tree $\Gamma_k(T)$.
1. If each node of the DAG $\Delta(T)$ has a decision tree attached to it, then return the tree attached to the node $T$ as $\Gamma_k(T)$ and halt. Otherwise, select a node $\Theta$ of the graph that does not have an attached tree; it is either a leaf node of $\Delta(T)$ or an internal node of $\Delta(T)$ all of whose children already have attached trees.
2. If $\Theta$ is a leaf node, attach to it the decision tree $\Gamma_k(\Theta)$ consisting of a single node tagged with the decision attached to all rows of $\Theta$, and go to step 1.
3. If $\Theta$ is not a leaf node, then proceed according to the value of $k$: if $k = 1$, construct the tree $\Gamma_A$ and attach it to $\Theta$ as $\Gamma_1(\Theta)$; if $k = 2$, construct the tree $\Gamma_H$ and attach it to $\Theta$ as $\Gamma_2(\Theta)$; if $k = 3$, construct the trees $\Gamma_A$ and $\Gamma_H$, choose between them a tree with the minimum number of internal nodes, and attach it to $\Theta$ as $\Gamma_3(\Theta)$; if $k = 4$, construct the tree $\Gamma_P$ and attach it to $\Theta$ as $\Gamma_4(\Theta)$; if $k = 5$, construct the trees $\Gamma_A$ and $\Gamma_P$, choose between them a tree with the minimum number of internal nodes, and attach it to $\Theta$ as $\Gamma_5(\Theta)$. Go to step 1.

Let $T$ be a decision table and $k \in \{1, \ldots, 5\}$. We use the following notation: $l_{L_w}^{(k)}(T) = l(\Gamma_k(T))$ and $c_{L_w}^{(k)}(T) = c(\Gamma_k(T))$, where $\Gamma_k(T)$ is the decision tree of type $k$ with the minimum number of internal nodes constructed by Algorithm 3.
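For the number of internal nodes, the only change relative to the minimum-depth sketch is how the branch costs are combined: the maximum in (1) and (2) becomes the summation in (3) and (4). A sketch under the same assumptions as before (ours, reusing the earlier helpers):

```python
# Sketch (ours) of the minimum-internal-node recurrences (3) and (4):
# identical in structure to min_depth, except that branch costs are summed
# and the hypothesis value delta_i drops the costliest child of each
# attribute. Leaves and empty counterexample branches contribute 0.

def min_internal_nodes(table, n, use_attributes=True, use_hypotheses=True):
    memo = {}

    def solve(theta):
        key = frozenset(theta)
        if key in memo:
            return memo[key]
        if is_degenerate(theta):
            memo[key] = 0                   # a leaf has no internal nodes
            return 0
        best = float("inf")
        if use_attributes:                  # Eq. (3): 1 + sum over the branches
            for i in splittable(theta, n):
                best = min(best, 1 + sum(solve(subtable(theta, {i: a}))
                                         for a in values(theta, i)))
        if use_hypotheses:                  # Eq. (4): sum over counterexamples
            total = 0
            for i in splittable(theta, n):
                costs = [solve(subtable(theta, {i: a}))
                         for a in values(theta, i)]
                total += sum(costs) - max(costs)
            best = min(best, 1 + total)
        memo[key] = best
        return best

    return solve(table)
```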

7. Experimental Results and Discussion

In this section, we describe the results of the experiments. First, we performed experiments on eight decision tables from the UCI ML Repository [16]. We give the description of these tables in Table 1, where we show the name of the table (Name), the number of rows (#Rows), and the number of attributes (#Attrs). The decision tables in Table 1 are arranged in increasing order of the number of rows. For each of these tables, we built an optimal decision tree of each of the five possible types for each of the two cost functions. From these trees, we derive decision rules and study their coverage and length.
Table 1

Description of decision tables from [16] which were used in experiments.

Name             #Rows   #Attrs
soybean-small    47      36
zoo-data         59      17
hayes-roth-data  69      5
breast-cancer    266     10
balance-scale    625     5
tic-tac-toe      958     10
cars             1728    7
nursery          12,960  9
Next, we experimented with 100 randomly generated Boolean functions with n variables (n = 3, 4, 5, 6). Let f be such a Boolean function with variables $x_1, \ldots, x_n$. We can map it to a decision table $T_f$ with n attributes $x_1, \ldots, x_n$. This table has $2^n$ rows corresponding to all possible n-tuples of values of the variables; each row is tagged with the decision that is the value of the function f on that row. The decision trees for the table $T_f$ are interpreted as decision trees that compute the function f. For each of the tables representing the generated Boolean functions, we build an optimal decision tree of each of the five possible types for each of the two cost functions. From these trees, we derive decision rules and study their coverage and length.
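As an illustration of this mapping (our sketch; the names are ours), a Boolean function given by a truth table can be turned into a decision table that the earlier sketches accept directly:

```python
# Sketch (ours): the decision table T_f of a Boolean function f with n
# variables has 2^n pairwise different rows, one per tuple of variable
# values, each tagged with the value of f on that tuple.

import random
from itertools import product

def boolean_function_table(f, n):
    return [(bits, f(bits)) for bits in product((0, 1), repeat=n)]

# A randomly generated Boolean function of n = 3 variables, given by a
# random truth table, as in the experiments of this section.
n = 3
truth = {bits: random.randint(0, 1) for bits in product((0, 1), repeat=n)}
T_f = boolean_function_table(lambda bits: truth[bits], n)
# T_f can now be fed to min_depth(T_f, n) or min_internal_nodes(T_f, n).
```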

7.1. Decision Trees with Minimum Depth

The results of the experiments based on the eight decision tables from [16] and decision trees optimal relative to the depth are presented in Table 2 and Table 3. The first column of Table 2 contains the name of the considered decision table T. The last five columns contain the values $l_h^{(k)}(T)$ for $k = 1, \ldots, 5$ (the minimum value in each row is the best result for that decision table).
Table 2

Results for decision tables from [16]: length of decision rules derived from decision trees with minimum depth.

Decision Table T   l_h^(1)(T)  l_h^(2)(T)  l_h^(3)(T)  l_h^(4)(T)  l_h^(5)(T)
soybean-small      1.89        1.00        1.89        1.55        1.89
zoo-data           3.69        1.56        2.42        2.17        3.24
hayes-roth-data    2.83        2.22        2.16        2.32        2.35
breast-cancer      3.61        2.68        2.71        2.70        2.78
balance-scale      3.60        3.20        3.20        3.21        3.20
tic-tac-toe        5.09        3.04        3.40        3.24        3.14
cars               3.72        2.44        2.48        3.07        3.02
nursery            5.78        3.16        4.53        3.12        4.50
Average            3.78        2.41        2.85        2.67        3.01
Table 3

Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum depth.

Decision Table T   c_h^(1)(T)  c_h^(2)(T)  c_h^(3)(T)  c_h^(4)(T)  c_h^(5)(T)
soybean-small      3.47        12.53       3.47        10.62       3.47
zoo-data           7.88        10.78       9.80        10.86       6.46
hayes-roth-data    3.46        6.20        6.49        5.48        5.45
breast-cancer      4.98        9.30        8.36        9.38        6.90
balance-scale      2.60        4.19        4.18        4.16        4.19
tic-tac-toe        8.38        66.01       27.23       56.64       61.45
cars               197.60      332.76      330.35      97.20       99.42
nursery            29.33       1524.04     304.71      1530.50     246.14
Average            32.21       245.73      86.82       215.60      54.18
The first column of Table 3 contains the name of the considered decision table T. The last five columns contain the values $c_h^{(k)}(T)$ for $k = 1, \ldots, 5$ (the maximum value in each row is the best result for that decision table). The results of the experiments based on Boolean functions and decision trees optimal relative to the depth are presented in Table 4 and Table 5. The first column of Table 4 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values $l_h^{(k)}$ in the format min / avg / max over the generated functions.
Table 4

Results for Boolean functions: length of decision rules derived from decision trees with minimum depth.

Number of Variables n  l_h^(1)             l_h^(2)             l_h^(3)             l_h^(4)             l_h^(5)
3                      1.50 / 2.20 / 2.75  1.25 / 2.05 / 2.63  1.25 / 1.99 / 2.50  1.25 / 2.08 / 2.63  1.25 / 2.01 / 2.50
4                      1.88 / 3.18 / 3.75  1.63 / 2.92 / 3.50  1.63 / 2.87 / 3.50  1.63 / 2.91 / 3.50  1.63 / 2.94 / 3.50
5                      3.44 / 4.09 / 4.63  3.00 / 3.64 / 4.22  2.97 / 3.60 / 4.06  3.13 / 3.66 / 4.13  3.09 / 3.70 / 4.19
6                      4.78 / 5.14 / 5.47  3.98 / 4.36 / 4.77  3.98 / 4.41 / 4.75  3.97 / 4.35 / 4.70  4.03 / 4.46 / 4.78
Table 5

Results for Boolean functions: coverage of decision rules derived from decision trees with minimum depth.

Number of Variables n  c_h^(1)             c_h^(2)             c_h^(3)             c_h^(4)             c_h^(5)
3                      1.25 / 1.94 / 3.00  1.38 / 2.22 / 3.63  1.50 / 2.21 / 3.63  1.38 / 2.14 / 3.63  1.50 / 2.17 / 3.63
4                      1.25 / 1.99 / 5.38  1.50 / 2.57 / 6.44  1.50 / 2.52 / 6.44  1.50 / 2.53 / 6.44  1.50 / 2.36 / 6.44
5                      1.38 / 2.10 / 3.69  2.03 / 3.03 / 4.56  2.06 / 2.98 / 4.97  2.03 / 2.93 / 4.69  1.81 / 2.76 / 4.66
6                      1.59 / 2.03 / 2.84  2.69 / 3.55 / 4.81  2.58 / 3.37 / 4.64  2.72 / 3.53 / 4.70  2.53 / 3.24 / 4.69
The first column of Table 5 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values $c_h^{(k)}$ in the format min / avg / max over the generated functions.

7.2. Decision Trees Containing Minimum Number of Internal Nodes

We present the results based on the decision tables from [16] and decision trees optimal relative to the number of internal nodes in Table 6 and Table 7. The first column of Table 6 contains the name of the considered decision table T. The last five columns contain the values $l_{L_w}^{(k)}(T)$ for $k = 1, \ldots, 5$ (the minimum value in each row is the best result for that decision table).
Table 6

Results for decision tables from [16]: length of decision rules derived from decision trees with minimum number of internal nodes.

Decision Table T   l_Lw^(1)(T)  l_Lw^(2)(T)  l_Lw^(3)(T)  l_Lw^(4)(T)  l_Lw^(5)(T)
soybean-small      1.34         1.00         1.34         1.51         1.34
zoo-data           3.05         1.69         3.05         2.39         3.05
hayes-roth-data    2.64         2.22         2.61         2.23         2.61
breast-cancer      4.98         2.72         5.30         2.73         5.27
balance-scale      3.55         3.20         3.53         3.20         3.53
tic-tac-toe        4.45         3.35         4.41         3.15         4.45
cars               2.97         2.49         2.96         2.49         2.96
nursery            3.77         3.19         3.77         3.19         3.77
Average            3.34         2.48         3.37         2.61         3.37
Table 7

Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum number of internal nodes.

Decision Table T   c_Lw^(1)(T)  c_Lw^(2)(T)  c_Lw^(3)(T)  c_Lw^(4)(T)  c_Lw^(5)(T)
soybean-small      11.51        12.53        11.51        9.81         11.51
zoo-data           10.69        10.68        10.69        10.63        10.69
hayes-roth-data    3.84         6.20         3.87         6.20         3.87
breast-cancer      2.73         8.96         3.05         9.06         3.15
balance-scale      2.79         4.21         2.88         4.21         2.88
tic-tac-toe        22.49        30.19        23.50        56.50        22.69
cars               237.33       332.46       237.37       332.46       237.37
nursery            1471.45      1527.95      1471.47      1527.95      1471.47
Average            220.35       241.65       220.54       244.60       220.45
The first column of Table 7 contains the name of the considered decision table T. The last five columns contain the values $c_{L_w}^{(k)}(T)$ for $k = 1, \ldots, 5$ (the maximum value in each row is the best result for that decision table). The results of the experiments based on Boolean functions and decision trees optimal relative to the number of internal nodes are presented in Table 8 and Table 9. The first column of Table 8 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values $l_{L_w}^{(k)}$ in the format min / avg / max over the generated functions.
Table 8

Results for Boolean functions: length of decision rules derived from decision trees with minimum number of internal nodes.

Number of Variables n  l_Lw^(1)            l_Lw^(2)            l_Lw^(3)            l_Lw^(4)            l_Lw^(5)
3                      1.50 / 2.07 / 2.75  1.25 / 2.06 / 2.63  1.25 / 1.94 / 2.50  1.25 / 2.06 / 2.63  1.25 / 1.94 / 2.50
4                      1.88 / 2.90 / 3.50  1.63 / 2.94 / 3.50  1.81 / 2.79 / 3.50  1.63 / 2.94 / 3.50  1.81 / 2.79 / 3.50
5                      3.13 / 3.75 / 4.19  3.00 / 3.77 / 4.31  3.13 / 3.69 / 4.25  3.00 / 3.77 / 4.31  3.13 / 3.69 / 4.25
6                      4.28 / 4.69 / 5.06  4.13 / 4.69 / 5.19  4.25 / 4.61 / 4.98  4.13 / 4.69 / 5.19  4.25 / 4.61 / 4.98
Table 9

Results for Boolean functions: coverage of decision rules derived from decision trees with minimum number of internal nodes.

Number of Variables n  c_Lw^(1)            c_Lw^(2)            c_Lw^(3)            c_Lw^(4)            c_Lw^(5)
3                      1.25 / 2.14 / 3.00  1.38 / 2.22 / 3.63  1.50 / 2.27 / 3.63  1.38 / 2.22 / 3.63  1.50 / 2.27 / 3.63
4                      1.63 / 2.51 / 5.38  1.50 / 2.56 / 6.44  1.50 / 2.63 / 5.44  1.50 / 2.56 / 6.44  1.50 / 2.63 / 5.44
5                      1.88 / 2.79 / 4.19  1.94 / 2.91 / 4.59  1.75 / 2.82 / 4.34  1.94 / 2.91 / 4.59  1.75 / 2.82 / 4.34
6                      2.16 / 2.88 / 4.09  2.09 / 3.16 / 4.58  2.27 / 2.99 / 4.11  2.09 / 3.16 / 4.58  2.27 / 2.99 / 4.11
The first column of Table 9 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values $c_{L_w}^{(k)}$ in the format min / avg / max over the generated functions.

7.3. Analysis of Experimental Results

The experimental results show that the decision rules derived from decision trees with hypotheses are, in many cases, better than the ones derived from conventional decision trees. In particular, in the case of decision trees with the minimum depth, for each row in Table 2, Table 3, Table 4 and Table 5, the results for decision trees of type 2 are better than the results for decision trees of type 1. In the case of decision trees with the minimum number of internal nodes, for each row of Table 6, Table 7, Table 8 and Table 9 (with the exception of the row zoo-data in Table 7), there is a number $k \in \{2, 3, 4, 5\}$ such that the results for decision trees of type $k$ are better than the results for decision trees of type 1. Note that, for the decision trees with the minimum depth and for each decision table from [16] considered in this paper, the best results related to the length and the coverage among decision trees of types 2-5 are close to the optimal ones obtained in [18] with the help of dynamic programming algorithms for the construction of optimal decision rules; the results for decision trees of type 1 are, generally, far from the optimal ones. From the obtained experimental results, it follows that the decision rules derived from optimal decision trees with hypotheses are preferable, as a tool for the representation of information, to the decision rules derived from optimal conventional decision trees.

8. Conclusions

In this paper, we studied modified decision trees that use two types of queries: queries based on attributes and queries based on hypotheses. We constructed trees optimal relative to two cost functions for a number of known datasets from the UCI Machine Learning Repository and for randomly generated Boolean functions, and we compared the length and coverage of the decision rules extracted from the constructed decision trees. The experimental results confirmed that the decision rules derived from decision trees with hypotheses are, in many cases, better than the ones derived from conventional decision trees.