
On the Depth of Decision Trees with Hypotheses.

Mikhail Moshkov

Abstract

In this paper, based on the results of rough set theory, test theory, and exact learning, we investigate decision trees over infinite sets of binary attributes represented as infinite binary information systems. We define the notion of a problem over an information system and study three functions of the Shannon type, which characterize, in the worst case, the dependence of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. The considered three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses (an analog of equivalence queries from exact learning), and (iii) decision trees using both attributes and hypotheses. The first function has two possible types of behavior: logarithmic and linear (this result follows from more general results published by the author earlier). The second and third functions have three possible types of behavior: constant, logarithmic, and linear (these results were published by the author earlier without proofs; the proofs are given in the present paper). Based on the obtained results, we divide the set of all infinite binary information systems into four complexity classes. In each class, the type of behavior of each of the considered three functions does not change.


Keywords:  complexity classes; decision trees; exact learning; rough set theory; test theory

Year:  2022        PMID: 35052142      PMCID: PMC8774416          DOI: 10.3390/e24010116

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Decision trees are studied in different areas of computer science, in particular in exact learning [1], rough set theory [2,3,4], and test theory [5]. In some sense, these theories deal with dual objects: for example, membership queries from exact learning correspond to attributes from test theory and rough set theory. In contrast to test theory and rough set theory, in exact learning, besides membership queries, equivalence queries are also considered. We extend the model considered in test theory and rough set theory by adding the notion of a hypothesis, which is an analog of an equivalence query. Papers [6,7,8,9,10] are related mainly to the experimental study of decision trees with hypotheses. The present paper contains a theoretical study of the depth of decision trees with hypotheses. An infinite binary information system is a pair U = (A, F), where A is an infinite set of elements and F is an infinite set of functions (attributes) from A to {0, 1}. A problem over U is given by a finite number of attributes f_1, ..., f_n from F: for a given a ∈ A, we should find the tuple (f_1(a), ..., f_n(a)). To solve this problem, we can use decision trees with two types of queries. We can ask about the value of an attribute f_i. As a result, we obtain an answer of the kind f_i(a) = δ, where δ ∈ {0, 1}. We can also ask whether a hypothesis f_1(a) = δ_1, ..., f_n(a) = δ_n is true, where δ_1, ..., δ_n ∈ {0, 1}. Either we obtain a confirmation or a counterexample in the form f_i(a) = ¬δ_i. The depth of decision trees with hypotheses can be essentially less than the depth of decision trees using only attributes. As an example, we consider the problem of the computation of the disjunction x_1 ∨ ... ∨ x_n. The minimum depth of a decision tree solving this problem using only attributes is equal to n. However, the minimum depth of a decision tree with hypotheses solving this problem is equal to one: it is enough to ask only about the hypothesis x_1 = 0, ..., x_n = 0. If it is true, then the considered disjunction is equal to zero. Otherwise, it is equal to one.
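The disjunction example can be simulated directly. Below is a small Python sketch (function and variable names are ours, not from the paper) contrasting the worst-case number of queries of the two strategies:

```python
# Computing x1 v ... v xn for a hidden 0/1 input (illustrative sketch;
# names are hypothetical, not from the paper).

def solve_with_attributes(x):
    """Ask attribute queries one by one; n queries in the worst case."""
    queries = 0
    for bit in x:
        queries += 1
        if bit == 1:
            return 1, queries  # disjunction is 1 as soon as a 1 is seen
    return 0, queries          # all n bits had to be inspected

def solve_with_hypothesis(x):
    """Ask a single hypothesis query: x1 = 0, ..., xn = 0."""
    if list(x) == [0] * len(x):
        return 0, 1            # hypothesis confirmed: disjunction is 0
    return 1, 1                # any counterexample: disjunction is 1

print(solve_with_attributes([0, 0, 0, 0]))  # (0, 4)
print(solve_with_hypothesis([0, 0, 0, 0]))  # (0, 1)
print(solve_with_hypothesis([0, 1, 0, 0]))  # (1, 1)
```

On the all-zeros input, the attribute-only strategy needs n queries, while one hypothesis query always suffices for this problem.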
Based on the results of exact learning, rough set theory, and test theory [1,11,12,13,14,15,16], we study, for an arbitrary infinite binary information system, three functions of the Shannon type that characterize the growth in the worst case of the minimum depth of a decision tree solving a problem with the growth of the number of attributes in the problem description. The considered three functions correspond to the following three cases: (i) only attributes are used in decision trees; (ii) only hypotheses are used in decision trees; (iii) both attributes and hypotheses are used in decision trees. We show that the first function has two possible types of behavior: logarithmic and linear. The second and third functions have three possible types of behavior: constant, logarithmic, and linear. Bounds for case (i) can be derived from more general results obtained in [15,16]. Results related to cases (ii) and (iii) were presented in the conference paper [17] without proofs. In the present paper, we give complete proofs for cases (ii) and (iii). We also investigate the joint behavior of these three functions and describe four complexity classes of infinite binary information systems; these results are completely new. The obtained results allow us to understand the difference in time complexity between conventional decision trees, which use only queries based on one attribute each, and decision trees with hypotheses. Moreover, we now know which combinations of types of behavior of the three Shannon-type functions can occur for an arbitrary infinite binary information system, and we know the criteria for each combination. This paper consists of six sections. In Section 2 and Section 3, we consider the basic notions and main results. Section 4 and Section 5 contain proofs of the main results, and Section 6 gives a short conclusion.

2. Basic Notions

Let A be a set of elements and F be a set of functions from A to {0, 1}. Functions from F are called attributes, and the pair U = (A, F) is called a binary information system (this notion is close to the notion of an information system proposed by Pawlak [18]). If A and F are infinite sets, then the pair U = (A, F) is called an infinite binary information system. A problem over U is an arbitrary n-tuple z = (f_1, ..., f_n), where n ∈ ℕ, ℕ is the set of natural numbers {1, 2, ...}, and f_1, ..., f_n ∈ F. The problem z may be interpreted as a problem of searching for the tuple z(a) = (f_1(a), ..., f_n(a)) for an arbitrary a ∈ A. The number dim z = n is called the dimension of the problem z. Denote F(z) = {f_1, ..., f_n}. We denote by P(U) the set of problems over U. A system of equations over U is an arbitrary equation system of the kind {g_1 = δ_1, ..., g_m = δ_m}, where m ∈ {0, 1, 2, ...}, g_1, ..., g_m ∈ F, and δ_1, ..., δ_m ∈ {0, 1} (if m = 0, then the considered equation system is empty). This equation system is called a system of equations over z if g_1, ..., g_m ∈ F(z). The considered equation system is called consistent (on A) if its set of solutions on A is nonempty. The set of solutions of the empty equation system coincides with A. As algorithms for solving the problem z, we consider decision trees with two types of queries. We can choose an attribute f_i ∈ F(z) and ask about its value. This query has two possible answers: {f_i = 0} and {f_i = 1}. We can also formulate a hypothesis over z in the form H = {f_1 = δ_1, ..., f_n = δ_n}, where δ_1, ..., δ_n ∈ {0, 1}, and ask about this hypothesis. This query has n + 1 possible answers: H and {f_i = ¬δ_i}, where i = 1, ..., n and ¬δ_i = 1 − δ_i. The first answer means that the hypothesis is true. The other answers are counterexamples.
A decision tree over z is a marked finite directed tree with the root in which: each terminal node is labeled with an n-tuple from the set {0, 1}^n; each node that is not terminal (such nodes are called working) is labeled with an attribute from the set F(z) or with a hypothesis over z; if a working node is labeled with an attribute f_i from F(z), then there are two edges, which leave this node and are labeled with the systems of equations {f_i = 0} and {f_i = 1}, respectively; if a working node is labeled with a hypothesis H = {f_1 = δ_1, ..., f_n = δ_n} over z, then there are n + 1 edges, which leave this node and are labeled with the systems of equations H, {f_1 = ¬δ_1}, ..., {f_n = ¬δ_n}, respectively. Let Γ be a decision tree over z. A complete path in Γ is an arbitrary directed path ξ from the root to a terminal node in Γ. We now define an equation system S(ξ) over U associated with the complete path ξ. If there are no working nodes in ξ, then S(ξ) is the empty system. Otherwise, S(ξ) is the union of the equation systems assigned to the edges of the path ξ. We denote by A(ξ) the set of solutions on A of the system of equations S(ξ) (if this system is empty, then its solution set is equal to A). We say that a decision tree Γ over z solves the problem z relative to U if, for each element a ∈ A and for each complete path ξ in Γ such that a ∈ A(ξ), the terminal node of the path ξ is labeled with the tuple z(a). We now consider an equivalent definition of a decision tree solving a problem. Denote by Δ_U(z) the set of tuples (δ_1, ..., δ_n) ∈ {0, 1}^n such that the system of equations {f_1 = δ_1, ..., f_n = δ_n} is consistent. The set Δ_U(z) is the set of all possible solutions to the problem z. Let m ∈ {0, 1, 2, ...}, i_1, ..., i_m ∈ {1, ..., n}, and δ_1, ..., δ_m ∈ {0, 1}. Denote by Δ_U(z)(f_{i_1} = δ_1)...(f_{i_m} = δ_m) the set of all n-tuples (σ_1, ..., σ_n) ∈ Δ_U(z) for which σ_{i_1} = δ_1, ..., σ_{i_m} = δ_m. Let Γ be a decision tree over the problem z. We correspond to each complete path ξ in the tree Γ a word π(ξ) in the alphabet {(f_i = δ) : i = 1, ..., n, δ ∈ {0, 1}}. If the equation system S(ξ) is empty, then π(ξ) is the empty word. If S(ξ) = {f_{i_1} = δ_1, ..., f_{i_m} = δ_m}, then π(ξ) = (f_{i_1} = δ_1)...(f_{i_m} = δ_m). The decision tree Γ over z solves the problem z relative to U if, for each complete path ξ in Γ, the set Δ_U(z)π(ξ) contains at most one tuple, and if this set contains exactly one tuple, then the considered tuple is assigned to the terminal node of the path ξ.
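On a finite fragment of an information system, the set of all possible solutions to a problem can be computed directly. A small Python sketch (all names hypothetical):

```python
# Sketch: the set of all attainable answer tuples for a problem
# z = (f1, ..., fn) over a finite sample of elements.

def possible_solutions(elements, attributes):
    """Return {(f1(a), ..., fn(a)) : a in elements}."""
    return {tuple(f(a) for f in attributes) for a in elements}

# Example: three threshold attributes on the integers 0..5.
f1 = lambda a: int(a >= 1)
f2 = lambda a: int(a >= 3)
f3 = lambda a: int(a >= 5)

delta = possible_solutions(range(6), [f1, f2, f3])
print(sorted(delta))  # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
```

A decision tree solving this problem must separate the tuples of this set; here only 4 of the 8 possible answer tuples are attainable, so any tree using binary attribute queries needs depth at least 2.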
As the time complexity of a decision tree Γ, we consider its depth h(Γ), that is, the maximum number of working nodes in a complete path in the tree Γ. Let z ∈ P(U). We denote by h_U^(1)(z) the minimum depth of a decision tree over z, which solves z relative to U and uses only attributes from F(z). We denote by h_U^(2)(z) the minimum depth of a decision tree over z, which solves z relative to U and uses only hypotheses over z. We denote by h_U^(3)(z) the minimum depth of a decision tree over z, which solves z relative to U and uses both attributes from F(z) and hypotheses over z. For t = 1, 2, 3, we define a function h_U^(t) of the Shannon type that characterizes the dependence of h_U^(t)(z) on dim z in the worst case. Let t ∈ {1, 2, 3} and n ∈ ℕ. Then:

h_U^(t)(n) = max{h_U^(t)(z) : z ∈ P(U), dim z ≤ n}.

3. Main Results

Let U = (A, F) be an infinite binary information system and r ∈ ℕ. The information system U is called r-reduced if, for each system of equations over U that is consistent on A, there exists a subsystem of this system that has the same set of solutions and contains at most r equations. We denote by ℛ the set of infinite binary information systems each of which is r-reduced for some r ∈ ℕ. The next theorem follows from the results obtained in [15], where we considered closed classes of test tables (decision tables). It also follows from the results obtained in [16], where we considered the weighted depth of decision trees.

Theorem 1. Let U be an infinite binary information system. Then, the following statements hold: (a) if U ∈ ℛ, then h_U^(1)(n) = Θ(log n); (b) if U ∉ ℛ, then h_U^(1)(n) = n for any n ∈ ℕ.

A subset {f_1, ..., f_m} of the set F is called independent if, for any δ_1, ..., δ_m ∈ {0, 1}, the system of equations {f_1 = δ_1, ..., f_m = δ_m} is consistent on the set A. The empty set of attributes is independent by definition. We now define the independence dimension (or I-dimension) I(U) of the information system U (this notion is similar to the notion of the independence number of a family of sets considered by Naiman and Wynn in [19]). If, for each m ∈ ℕ, the set F contains an independent subset of cardinality m, then I(U) = ∞. Otherwise, I(U) is the maximum cardinality of an independent subset of the set F. We denote by 𝒟 the set of infinite binary information systems with a finite independence dimension. Let U = (A, F) be a binary information system, which is not necessarily infinite, f ∈ F, and δ ∈ {0, 1}. Denote A(f, δ) = {a ∈ A : f(a) = δ} and U(f, δ) = (A(f, δ), F). We now define inductively the notion of a k-information system, k ∈ {0, 1, 2, ...}. The binary information system U is called a 0-information system if all attributes from F are constant on the set A. Let, for some k ≥ 0, the notion of an m-information system be defined for m = 0, ..., k. The binary information system U is called a (k + 1)-information system if it is not an m-information system for m = 0, ..., k and, for any f ∈ F, there exist numbers δ ∈ {0, 1} and m ∈ {0, ..., k} such that the information system U(f, δ) is an m-information system. It is easy to show by induction on k that if U = (A, F) is a k-information system, then U(f, δ), where f ∈ F and δ ∈ {0, 1}, is an l-information system for some l ≤ k.
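On a finite sample of elements, the independence of a set of attributes can be tested exhaustively. A small Python sketch (all names hypothetical); note that a positive answer on a sample certifies independence over the whole system, while a negative answer on a sample is not conclusive:

```python
from itertools import product

def is_independent(elements, attributes):
    """Check that every 0/1 assignment to the attributes is realized
    by at least one element of the given finite sample."""
    realized = {tuple(f(a) for f in attributes) for a in elements}
    return len(realized) == 2 ** len(attributes)

# Threshold attributes on a line are far from independent:
f1 = lambda a: int(a >= 1)
f2 = lambda a: int(a >= 3)
print(is_independent(range(6), [f1, f2]))   # False: (0, 1) is never realized

# Two "coordinate" attributes on pairs form an independent subset:
g1 = lambda p: p[0]
g2 = lambda p: p[1]
pairs = list(product([0, 1], repeat=2))
print(is_independent(pairs, [g1, g2]))      # True
```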
We denote by 𝒞 the set of infinite binary information systems for each of which there exists k ∈ {0, 1, 2, ...} such that the considered system is a k-information system. The following theorem was presented in [17] without proof.

Theorem 2. Let U be an infinite binary information system. Then, the following statements hold: (a) if U ∈ 𝒞, then h_U^(2)(n) = O(1) and h_U^(3)(n) = O(1); (b) if U ∈ 𝒟 ∖ 𝒞, then h_U^(2)(n) = Θ(log n), h_U^(3)(n) = O(log n), and h_U^(3)(n) = Ω(log n / log log n); (c) if U ∉ 𝒟, then h_U^(2)(n) = n and h_U^(3)(n) = n for any n ∈ ℕ.

Let U be an infinite binary information system. We now consider the joint behavior of the functions h_U^(1)(n), h_U^(2)(n), and h_U^(3)(n). It depends on whether the information system U belongs to the sets ℛ, 𝒟, and 𝒞. We correspond to the information system U its indicator vector (R, D, C), in which R = 1 if and only if U ∈ ℛ, D = 1 if and only if U ∈ 𝒟, and C = 1 if and only if U ∈ 𝒞.

Theorem 3. For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Moreover, each row of Table 1 is the indicator vector of some infinite binary information system.

For i = 1, ..., 4, we denote by V_i the class of all infinite binary information systems for which the indicator vector coincides with the ith row of Table 1. Table 2 summarizes Theorems 1–3. The first column contains the name of the complexity class V_i. The next three columns describe the indicator vector of the information systems from this class. The last three columns contain information about the behavior of the functions h_U^(1)(n), h_U^(2)(n), and h_U^(3)(n) for information systems from the class V_i.
Table 1

Possible indicator vectors of infinite binary information systems.

No. | R | D | C
 1  | 0 | 0 | 0
 2  | 0 | 1 | 0
 3  | 0 | 1 | 1
 4  | 1 | 1 | 0
Table 2

Summary of Theorems 1–3.

Class | R | D | C | h_U^(1)(n) | h_U^(2)(n) | h_U^(3)(n)
V_1   | 0 | 0 | 0 | n          | n          | n
V_2   | 0 | 1 | 0 | n          | Θ(log n)   | Ω(log n / log log n), O(log n)
V_3   | 0 | 1 | 1 | n          | O(1)       | O(1)
V_4   | 1 | 1 | 0 | Θ(log n)   | Θ(log n)   | Ω(log n / log log n), O(log n)

4. Proof of Theorem 2

We precede the proof of Theorem 2 with two lemmas. Let d ∈ ℕ. A d-complete tree over the information system U is a marked finite directed tree with the root in which: each terminal node is not labeled; each nonterminal node is labeled with an attribute f ∈ F, and there are two edges leaving this node that are labeled with the systems of equations {f = 0} and {f = 1}, respectively; the length of each complete path (a path from the root to a terminal node) is equal to d; for each complete path ξ, the equation system S(ξ), which is the union of the equation systems assigned to the edges of the path ξ, is consistent. Let G be a d-complete tree over U and F(G) be the set of all attributes attached to the nonterminal nodes of the tree G. The number of nonterminal nodes in G is equal to 2^d − 1. Therefore, |F(G)| ≤ 2^d − 1. The results mentioned in the following lemma are obtained by methods similar to those used by Littlestone [12], Maass and Turán [13], and Angluin [11].

Lemma 1. Let d ∈ ℕ, G be a d-complete tree over U, and z be a problem over U such that F(G) ⊆ F(z). Then h_U^(2)(z) ≥ d and h_U^(3)(z) ≥ (d − 1)/log₂(d + 1).

Proof. (a) We prove the inequality h_U^(2)(z) ≥ d by induction on d. Let d = 1. Then, the tree G has only one nonterminal node, which is labeled with an attribute f that is not constant on A. Therefore, |Δ_U(z)| ≥ 2 and h_U^(2)(z) ≥ 1. Let d ≥ 1 and let the considered statement hold for d. Assume now that G is a (d + 1)-complete tree over U, z is a problem over U such that F(G) ⊆ F(z), and Γ is a decision tree over z with the minimum depth, which solves the problem z relative to U and uses only hypotheses. Let f be the attribute attached to the root of the tree G and H be the hypothesis attached to the root of the decision tree Γ. Then, there is an edge that leaves the root of Γ and is labeled with the equation system {f = ¬δ}, where the equation f = δ belongs to the hypothesis H. This edge enters the root of a subtree of Γ, which is denoted by Γ'. There is an edge that leaves the root of G and is labeled with the equation system {f = ¬δ}. This edge enters the root of a subtree of G, which is denoted by G'. One can show that the decision tree Γ' solves the problem z relative to the information system U' = (A', F), where A' is the set of solutions on A of the equation system {f = ¬δ}, and that G' is a d-complete tree over U'. It is clear that F(G') ⊆ F(z).
Using the inductive hypothesis, we obtain h(Γ') ≥ d. Therefore, h(Γ) ≥ d + 1 and h_U^(2)(z) ≥ d + 1. (b) We now prove the inequality h_U^(3)(z) ≥ (d − 1)/log₂(d + 1). Let Γ be a decision tree over z with the minimum depth, which solves the problem z relative to U and uses both attributes and hypotheses. The d-complete tree G has 2^d complete paths ξ_1, ..., ξ_{2^d}. For j = 1, ..., 2^d, we denote by b_j a solution of the equation system S(ξ_j). Denote B = {b_1, ..., b_{2^d}}. Note that, for distinct elements of B, the solutions of the problem z are distinct, since F(G) ⊆ F(z) and any two elements of B disagree on some attribute of G. We now show that the decision tree Γ contains a complete path whose length is at least (d − 1)/log₂(d + 1). We describe the process of this path construction beginning with the root of Γ. Let the root of Γ be labeled with an attribute f_i. For δ ∈ {0, 1}, we denote by B_δ the set of solutions on B of the equation system {f_i = δ} and choose δ for which |B_δ| ≥ |B|/2. It is clear that |B_δ| ≥ (|B| − 1)/(d + 1). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_i = δ}, and the node that this edge enters. Let us assume now that the root of Γ is labeled with a hypothesis H = {f_1 = δ_1, ..., f_n = δ_n}. We denote by ξ the complete path in G for which the system of equations S(ξ) is a subsystem of H. Let the nonterminal nodes of the complete path ξ be labeled with the attributes f_{j_1}, ..., f_{j_d}. For t = 1, ..., d, we denote by B_t the set of solutions on B of the equation system {f_{j_t} = ¬δ_{j_t}}. It is clear that |B_1 ∪ ... ∪ B_d| ≥ |B| − 1. Therefore, there exists t such that |B_t| ≥ (|B| − 1)/(d + 1). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_{j_t} = ¬δ_{j_t}}, and the node that this edge enters. We continue the construction of the complete path in Γ in the same way; one can show that, after the tth query, at least 2^d/(d + 1)^t − 1 elements from B remain. At a terminal node of Γ, at most one element of B can remain. Therefore, for the length q of the constructed path, we have 2^d/(d + 1)^q − 1 ≤ 1, i.e., (d + 1)^q ≥ 2^(d − 1). As a result, h(Γ) ≥ q ≥ (d − 1)/log₂(d + 1) and h_U^(3)(z) ≥ (d − 1)/log₂(d + 1). □

Lemma 2. Let k ∈ ℕ and U = (A, F) be a binary information system that is not an m-information system for m = 0, ..., k − 1. Then, there exists a k-complete tree over U.

Proof. We prove the considered statement by induction on k. Let k = 1. In this case, U is not a 0-information system. Then, there exists an attribute f ∈ F, which is not constant on A. Using this attribute, it is easy to construct a 1-complete tree over U. Let the considered statement hold for some k, k ≥ 1. We now show that it also holds for k + 1.
Let U = (A, F) be a binary information system that is not an m-information system for m = 0, ..., k. Then, there exists an attribute f ∈ F such that, for any δ ∈ {0, 1}, the information system U(f, δ) is not an m-information system for m = 0, ..., k − 1. Using the inductive hypothesis, we conclude that, for any δ ∈ {0, 1}, there exists a k-complete tree G_δ over U(f, δ). Denote by G a directed tree with root in which the root is labeled with the attribute f and, for any δ ∈ {0, 1}, there is an edge that leaves the root, is labeled with the equation system {f = δ}, and enters the root of the tree G_δ. One can show that the tree G is a (k + 1)-complete tree over U. □

Proof of Theorem 2. It is clear that h_U^(3)(z) ≤ h_U^(2)(z) for any problem z over U. Therefore, h_U^(3)(n) ≤ h_U^(2)(n) for any n ∈ ℕ. (a) We now show by induction on k that, for each binary k-information system U (not necessarily infinite) and for each problem z over U, the inequality h_U^(2)(z) ≤ k holds. Let U = (A, F) be a binary 0-information system and z be a problem over U. Since all attributes from F(z) are constant on A, the set Δ_U(z) contains only one tuple. Therefore, the decision tree containing only one node labeled with this tuple solves the problem z relative to U, and h_U^(2)(z) = 0. Let k ≥ 0 and, for each m, m = 0, ..., k, the considered statement hold. Let us show that it holds for k + 1. Let U = (A, F) be a binary (k + 1)-information system and z = (f_1, ..., f_n) be a problem over U. For i = 1, ..., n, choose a number δ_i ∈ {0, 1} such that the information system U(f_i, δ_i) is an m_i-information system, where m_i ≤ k. Using the inductive hypothesis, we conclude that, for i = 1, ..., n, there is a decision tree Γ_i over z, which uses only hypotheses, solves the problem z over U(f_i, δ_i), and has depth at most k. We denote by Γ a decision tree in which the root is labeled with the hypothesis H = {f_1 = ¬δ_1, ..., f_n = ¬δ_n}, the edge leaving the root and labeled with H enters the terminal node labeled with the tuple (¬δ_1, ..., ¬δ_n), and, for i = 1, ..., n, the edge leaving the root and labeled with {f_i = δ_i} enters the root of the tree Γ_i. One can show that Γ solves the problem z relative to U and h(Γ) ≤ k + 1. Therefore, h_U^(2)(z) ≤ k + 1 for any problem z over U. Let U ∈ 𝒞. Then, U is a k-information system for some natural k, and for each problem z over U, we have h_U^(2)(z) ≤ k. Therefore, h_U^(2)(n) ≤ k and h_U^(3)(n) ≤ k for any n ∈ ℕ. (b) Let U ∈ 𝒟 ∖ 𝒞. First, we show that h_U^(2)(n) = O(log n). Let z = (f_1, ..., f_n) be an arbitrary problem over U.
From Lemma 5.1 [16], it follows that |Δ_U(z)| ≤ (n + 1)^I(U). The proof of this lemma is based on results similar to the ones obtained by Sauer [20] and Shelah [21]. We consider a decision tree Γ over z, which solves z relative to U and uses only hypotheses. This tree is constructed by the halving algorithm [1,12]. We describe the work of this tree for an arbitrary element a from A. Set Δ = Δ_U(z). If |Δ| = 1, then the only n-tuple from Δ is the solution of the problem z for the element a. Let |Δ| ≥ 2. For i = 1, ..., n, we denote by δ_i a number from {0, 1} such that at least half of the n-tuples from Δ have δ_i as their ith coordinate. The root of Γ is labeled with the hypothesis H = {f_1 = δ_1, ..., f_n = δ_n}. After this query, either the problem z is solved (if the answer is H) or we at least halve the number of n-tuples in the set Δ (if the answer is a counterexample {f_i = ¬δ_i}). In the latter case, set Δ := {(σ_1, ..., σ_n) ∈ Δ : σ_i = ¬δ_i}. The decision tree continues to work with the element a and this set of n-tuples in the same way. Let, during the work with the element a, the considered decision tree make q queries. After the (q − 1)th query, the number of remaining n-tuples in the set Δ is at least two and at most |Δ_U(z)|/2^(q − 1). Therefore, 2 ≤ |Δ_U(z)|/2^(q − 1) and q ≤ log₂|Δ_U(z)|. Therefore, during the processing of the element a, the decision tree makes at most log₂|Δ_U(z)| ≤ I(U)log₂(n + 1) queries. Since a is an arbitrary element from A, the depth of Γ is at most I(U)log₂(n + 1). Since z is an arbitrary problem over U, we obtain h_U^(2)(n) ≤ I(U)log₂(n + 1). Therefore, h_U^(2)(n) = O(log n) and h_U^(3)(n) = O(log n). Using Lemma 2 and the relation U ∉ 𝒞, we obtain that, for any d ∈ ℕ, there exists a d-complete tree over U. Let d ∈ ℕ and G be a d-complete tree over U. We know that |F(G)| ≤ 2^d − 1. Denote n = 2^d − 1. From Lemma 1, it follows that h_U^(2)(n) ≥ d and h_U^(3)(n) ≥ (d − 1)/log₂(d + 1). Let n ∈ ℕ and n ≥ 3. Then, there exists d ∈ ℕ such that 2^d − 1 ≤ n ≤ 2^(d + 1) − 2. We have h_U^(2)(n) ≥ h_U^(2)(2^d − 1) ≥ d ≥ log₂(n + 2) − 1 and h_U^(3)(n) ≥ h_U^(3)(2^d − 1) ≥ (d − 1)/log₂(d + 1). It is easy to show that the function (d − 1)/log₂(d + 1) is nondecreasing for d ≥ 1. As a result, we have h_U^(2)(n) = Θ(log n) and h_U^(3)(n) = Ω(log n / log log n). (c) Let U ∉ 𝒟. We now consider an arbitrary problem z = (f_1, ..., f_n) over U and a decision tree Γ over z, which uses only hypotheses and solves the problem z over U in the following way. For a given element a ∈ A, the first query is about the hypothesis {f_1 = 0, ..., f_n = 0}. If the answer is the confirmation of this hypothesis, then the problem z is solved for the element a.
If, for some i ∈ {1, ..., n}, the answer is the counterexample {f_i = 1}, then the second query is about the hypothesis obtained from the previous one by replacing the equality f_i = 0 with the equality f_i = 1, etc. It is clear that after at most n queries, the problem z for the element a will be solved. Thus, h(Γ) ≤ n and h_U^(2)(z) ≤ n. Since z is an arbitrary problem over U, we have h_U^(2)(n) ≤ n and h_U^(3)(n) ≤ n for any n ∈ ℕ. Let n ∈ ℕ. Since U ∉ 𝒟, there exist attributes f_1, ..., f_n ∈ F such that, for any δ_1, ..., δ_n ∈ {0, 1}, the equation system {f_1 = δ_1, ..., f_n = δ_n} is consistent on A. We now consider the problem z = (f_1, ..., f_n) and an arbitrary decision tree Γ over z, which solves the problem z over U and uses both attributes and hypotheses. Let us show that h(Γ) ≥ n. If n = 1, then the considered inequality holds since |Δ_U(z)| = 2. Let n ≥ 2. It is easy to show that an equation system over z is inconsistent if and only if it contains the equations f_i = 0 and f_i = 1 for some i ∈ {1, ..., n}. For each node v of the decision tree Γ, we denote by S(v) the union of the systems of equations attached to the edges in the path from the root of Γ to v. A node v of Γ will be called consistent if the equation system S(v) is consistent. We now construct a complete path ξ in the decision tree Γ, for which all nodes are consistent. We start from the root, which is a consistent node. Let the path reach a consistent node v of Γ. If v is a terminal node, then the path is constructed. Let v be a working node labeled with an attribute f_i. Then, there exists δ ∈ {0, 1} for which the system of equations S(v) ∪ {f_i = δ} is consistent. Then, the path will pass through the edge leaving v and labeled with the system of equations {f_i = δ}. Let v be labeled with a hypothesis H = {f_1 = δ_1, ..., f_n = δ_n}. If there exists i ∈ {1, ..., n} such that the system of equations S(v) ∪ {f_i = ¬δ_i} is consistent, then the path will pass through the edge leaving v and labeled with the system of equations {f_i = ¬δ_i}. Otherwise, H ⊆ S(v), and the path will pass through the edge leaving v and labeled with the system of equations H. Let all edges in the path ξ be labeled with systems of equations containing one equation each. Since all nodes of ξ are consistent, the equation system S(ξ) is consistent. We now show that S(ξ) contains at least n equations. Let us assume that this system contains fewer than n equations. Then, the set of n-tuples from Δ_U(z) satisfying all equations of S(ξ) contains more than one n-tuple, which is impossible. Therefore, the length of the path ξ is at least n. Let there be edges in ξ that are labeled with hypotheses, and let the first edge in ξ labeled with a hypothesis H leave the node v. Then, H ⊆ S(v), the system S(v) contains at least n equations, and the length of ξ is at least n. Therefore, h(Γ) ≥ n, h_U^(3)(z) ≥ n, and h_U^(3)(n) ≥ n. As a result, we obtain h_U^(3)(n) = n and h_U^(2)(n) = n. Thus, h_U^(2)(n) = n and h_U^(3)(n) = n for any n ∈ ℕ. □
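The halving algorithm used in case (b) of the proof above admits a direct simulation over a finite set of candidate tuples. A Python sketch (all names hypothetical; the oracle models the answers to hypothesis queries, and the true tuple is assumed to be among the candidates):

```python
# Halving strategy using hypothesis (equivalence-style) queries only.
# oracle(hyp) returns None if hyp is the true tuple, and otherwise a
# counterexample (i, value) with value == truth[i] != hyp[i].
# If the true tuple is not among the candidates, the loop below
# would not terminate, so this assumption matters.

def halving_solve(candidates, oracle):
    queries = 0
    while True:
        if len(candidates) == 1:
            return candidates[0], queries
        n = len(candidates[0])
        # Majority vote per coordinate: any counterexample then
        # eliminates at least half of the remaining candidates.
        hyp = tuple(int(2 * sum(t[i] for t in candidates) >= len(candidates))
                    for i in range(n))
        queries += 1
        answer = oracle(hyp)
        if answer is None:
            return hyp, queries
        i, value = answer
        candidates = [t for t in candidates if t[i] == value]

def make_oracle(truth):
    def oracle(hyp):
        for i, (h, t) in enumerate(zip(hyp, truth)):
            if h != t:
                return (i, t)  # counterexample on the first mismatch
        return None            # hypothesis confirmed
    return oracle

candidates = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(halving_solve(candidates, make_oracle((1, 0, 0))))  # ((1, 0, 0), 2)
```

Each counterexample removes at least half of the remaining candidates, so the number of queries is at most log2 of the initial number of candidates, plus one for the final confirmation.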

5. Proof of Theorem 3

First, we prove several auxiliary statements.

Proposition 1. ℛ ⊆ 𝒟.

Proof. Let U ∈ ℛ. By Theorem 1, h_U^(1)(n) = Θ(log n). Let us assume that U ∉ 𝒟. Then, for any n ∈ ℕ, there exists a problem z over U such that dim z = n and |Δ_U(z)| = 2^n. Let Γ be a decision tree over z, which solves the problem z relative to U and uses only attributes. Then, Γ should have at least 2^n terminal nodes. One can show that the number of terminal nodes in the tree Γ is at most 2^h(Γ). Then, 2^h(Γ) ≥ 2^n, h(Γ) ≥ n, and h_U^(1)(n) ≥ n. Therefore, h_U^(1)(n) = n for any n ∈ ℕ, which is impossible. Thus, U ∈ 𝒟. □

Proposition 2. 𝒞 ⊆ 𝒟.

Proof. Let U ∈ 𝒞. By Theorem 2, h_U^(2)(n) = O(1). Let us assume that U ∉ 𝒟. Then, by Theorem 2, h_U^(2)(n) = n for any n ∈ ℕ, which is impossible. Therefore, U ∈ 𝒟. □

Proposition 3. ℛ ∩ 𝒞 = ∅.

Proof. Assume the contrary: U ∈ ℛ and U ∈ 𝒞 for some infinite binary information system U. Let r, k ∈ ℕ, U be an r-reduced information system, and U be a k-information system. We now consider an arbitrary problem z = (f_1, ..., f_n) over U and describe a decision tree over z, which uses only attributes, solves the problem z over U, and has depth at most rk. For i = 1, ..., n, let δ_i be a number from {0, 1} such that the information system U(f_i, δ_i) is an m_i-information system with m_i ≤ k − 1. Let t be the maximum number from the set {0, 1, ..., n} such that the system of equations S = {f_1 = ¬δ_1, ..., f_t = ¬δ_t} is consistent. Then, there exists a subsystem S' of the system S, which has the same set of solutions as S and for which |S'| ≤ r. For a given a ∈ A, the decision tree computes sequentially the values of the attributes from S'. If, for some attribute f_i with (f_i = ¬δ_i) ∈ S', we obtain f_i(a) = δ_i, then the decision tree continues to work with the problem z and the information system U(f_i, δ_i), which is an m_i-information system with m_i ≤ k − 1. Otherwise, a is a solution of the system S' and, hence, of the system S. Let t = n. Then, (¬δ_1, ..., ¬δ_n) is the solution of the problem z for the considered element a. Let t < n. Then, the decision tree continues to work with the problem z and the information system U' = (A', F), where A' is the set of solutions on A of the equation system S. We know that the equation system S ∪ {f_{t+1} = ¬δ_{t+1}} is inconsistent. Therefore, the system S' ∪ {f_{t+1} = ¬δ_{t+1}} is inconsistent. Hence, A' ⊆ A(f_{t+1}, δ_{t+1}), and U' is an m-information system for some m ≤ k − 1. As a result, after the computation of the values of at most r attributes, we either solve the problem z or reduce the consideration of the problem z over the k-information system U to the consideration of the problem z over some l-information system, where l ≤ k − 1.
After the computation of the values of at most rk attributes, we solve the problem z since each problem over a 0-information system has exactly one possible solution. Therefore, h_U^(1)(z) ≤ rk and h_U^(1)(n) ≤ rk for any n ∈ ℕ. By Theorem 1, the function h_U^(1)(n) is not bounded from above by a constant. The obtained contradiction shows that ℛ ∩ 𝒞 = ∅. □

Proposition 4. For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1.

Proof. Table 3 contains as rows all 3-tuples from the set {0, 1}^3. We now show that the rows with the numbers 5–8 cannot be indicator vectors of infinite binary information systems. Assume the contrary: there is i ∈ {5, 6, 7, 8} such that the row with the number i is the indicator vector of an infinite binary information system U. If i = 5, then U ∈ ℛ and U ∉ 𝒟, but this is impossible, since, by Proposition 1, ℛ ⊆ 𝒟. If i = 6, then U ∈ 𝒞 and U ∉ 𝒟, but this is impossible, since, by Proposition 2, 𝒞 ⊆ 𝒟. If i = 7, then U ∈ ℛ and U ∉ 𝒟, but this is impossible, since, by Proposition 1, ℛ ⊆ 𝒟. If i = 8, then U ∈ ℛ and U ∈ 𝒞, but this is impossible, since, by Proposition 3, ℛ ∩ 𝒞 = ∅. Therefore, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 3 with the numbers 1–4. Thus, it coincides with one of the rows of Table 1. □
Table 3

All 3-tuples from the set {0, 1}^3.

No. | R | D | C
 1  | 0 | 0 | 0
 2  | 0 | 1 | 0
 3  | 0 | 1 | 1
 4  | 1 | 1 | 0
 5  | 1 | 0 | 0
 6  | 0 | 0 | 1
 7  | 1 | 0 | 1
 8  | 1 | 1 | 1
Define an infinite binary information system U_1 = (A, F_1) as follows: A = ℕ and F_1 is the set of all functions from ℕ to {0, 1}.

Lemma 3. The information system U_1 belongs to the class V_1.

Proof. It is easy to show that the information system U_1 has an infinite I-dimension. Therefore, U_1 ∉ 𝒟. Using Proposition 4, we obtain that the indicator vector of U_1 is (0, 0, 0), i.e., U_1 ∈ V_1. □

For any i ∈ ℕ, we define two functions l_i : ℕ → {0, 1} and g_i : ℕ → {0, 1}. Let j ∈ ℕ. Then, l_i(j) = 1 if and only if j < i, and g_i(j) = 1 if and only if j = i. Define an infinite binary information system U_2 = (A, F_2) as follows: A = ℕ and F_2 = {l_i : i ∈ ℕ} ∪ {g_i : i ∈ ℕ}.

Lemma 4. The information system U_2 belongs to the class V_2.

Proof. For d ∈ ℕ, denote S_d = {g_1 = 0, ..., g_d = 0}. One can show that the equation system S_d is consistent and each proper subsystem of S_d has a set of solutions different from the set of solutions of S_d. Therefore, U_2 ∉ ℛ. Using attributes from the set {l_i : i ∈ ℕ}, we can construct a d-complete tree over U_2 for each d ∈ ℕ. By Lemma 1 and Theorem 2, U_2 ∉ 𝒞. One can show that the I-dimension of U_2 is finite. Therefore, U_2 ∈ 𝒟. Thus, the indicator vector of U_2 is (0, 1, 0), i.e., U_2 ∈ V_2. □

Define an infinite binary information system U_3 = (A, F_3) as follows: A = ℕ and F_3 = {g_i : i ∈ ℕ}.

Lemma 5. The information system U_3 belongs to the class V_3.

Proof. It is easy to show that U_3 is a 1-information system. Therefore, U_3 ∈ 𝒞. Using Proposition 4, we obtain that the indicator vector of U_3 is (0, 1, 1), i.e., U_3 ∈ V_3. □

Define an infinite binary information system U_4 = (A, F_4) as follows: A = ℕ and F_4 = {l_i : i ∈ ℕ}.

Lemma 6. The information system U_4 belongs to the class V_4.

Proof. Let us consider an arbitrary consistent system of equations S over U_4. We now show that there is a subsystem of S, which has at most two equations and the same set of solutions as S. Let S contain both equations of the kind l_i = 0 and equations of the kind l_j = 1. Denote i_0 = max{i : (l_i = 0) ∈ S} and j_0 = min{j : (l_j = 1) ∈ S}. One can show that the system of equations {l_{i_0} = 0, l_{j_0} = 1} has the same set of solutions as S. The case when S contains, for some δ ∈ {0, 1}, only equations of the kind l_i = δ can be considered in a similar way. In this case, the equivalent subsystem contains only one equation. Therefore, the information system U_4 is 2-reduced and U_4 ∈ ℛ. Using Proposition 4, we obtain that the indicator vector of U_4 is (1, 1, 0), i.e., U_4 ∈ V_4. □

Proof of Theorem 3. From Proposition 4, it follows that, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Using Lemmas 3–6, we conclude that each row of Table 1 is the indicator vector of some infinite binary information system. □
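The reduction of a consistent system of threshold-type equations to an equivalent subsystem with at most two equations, used above for the 2-reduced property, can be checked exhaustively on a finite sample. A Python sketch (hypothetical names; thresholds f_i(a) = 1 if and only if a < i):

```python
from itertools import combinations, product

def solutions(elements, system):
    """Solution set of a system given as (attribute, value) pairs."""
    return {a for a in elements if all(f(a) == v for f, v in system)}

elements = range(8)
thresholds = [lambda a, i=i: int(a < i) for i in range(1, 8)]

ok = True
for r in range(3, 6):
    for attrs in combinations(thresholds, r):
        for values in product([0, 1], repeat=r):
            system = list(zip(attrs, values))
            sol = solutions(elements, system)
            if not sol:
                continue  # only consistent systems are considered
            # look for an equivalent subsystem with at most 2 equations
            ok &= any(solutions(elements, list(sub)) == sol
                      for k in (0, 1, 2)
                      for sub in combinations(system, k))
print(ok)  # True
```

Every consistent system of threshold equations describes an interval, so the tightest lower and upper constraints alone already determine the solution set.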

6. Conclusions

Based on the results of exact learning, test theory, and rough set theory, for an arbitrary infinite binary information system, we studied three functions of the Shannon type, which characterize, in the worst case, the dependence of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. These three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses, and (iii) decision trees using both attributes and hypotheses. We described the possible types of behavior for each of these three functions. We also studied the joint behavior of these functions and distinguished four corresponding complexity classes of infinite binary information systems. In the future, we plan to translate the obtained results into the language of exact learning. The problems studied in this paper allowed us to confine ourselves to considering only the crisp (conventional) sets that are completely defined by attributes. However, in the future, when we investigate approximately defined problems or approximate decision trees, it will be necessary to work with rough sets given by their lower and upper approximations. This will require a wider range of rough set theory techniques than those used in the present paper.