New Splitting Criteria for Decision Trees in Stationary Data Streams.

Maciej Jaworski, Piotr Duda, Leszek Rutkowski.

Abstract

The most popular tools for stream data mining are based on decision trees. In the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although the Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms that are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. A general division of splitting criteria into two types is proposed. Attributes chosen based on type-I splitting criteria guarantee, with high probability, the highest expected value of the split measure. Type-II criteria ensure that the chosen attribute is the same, with high probability, as the one that would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are combinations of single criteria based on the misclassification error and the Gini index.
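The abstract names two impurity measures, the misclassification error and the Gini index, but does not state the new criteria themselves. As a minimal sketch, the textbook definitions of these two measures and of a generic impurity-reduction split measure can be written as follows. The function names are hypothetical, and the code does not implement the type-I or type-II criteria of the paper, whose bounds are not given in this record.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def class_fractions(labels: Sequence[Hashable]) -> list[float]:
    """Return the fraction of samples belonging to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(labels: Sequence[Hashable]) -> float:
    """Gini index impurity: 1 minus the sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_fractions(labels))


def misclassification_error(labels: Sequence[Hashable]) -> float:
    """Misclassification error: 1 minus the largest class fraction."""
    return 1.0 - max(class_fractions(labels))


def split_gain(
    impurity: Callable[[Sequence[Hashable]], float],
    parent: Sequence[Hashable],
    children: Sequence[Sequence[Hashable]],
) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children.

    Empty child subsets are skipped. This is the generic split measure,
    not one of the criteria proposed in the paper.
    """
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children if c)
    return impurity(parent) - weighted
```

For a node holding two samples of each of two classes, both measures equal 0.5, and a split that separates the classes perfectly yields a gain of 0.5 under either measure. A stream-based criterion would compare such gains, estimated from the samples seen so far, across candidate attributes.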

Year: 2017    PMID: 28500013    DOI: 10.1109/TNNLS.2017.2698204

Source DB: PubMed    Journal: IEEE Trans Neural Netw Learn Syst    ISSN: 2162-237X    Impact factor: 10.451


  2 in total

1.  Risk prediction of cardiovascular disease using machine learning classifiers.

Authors:  Madhumita Pal; Smita Parija; Ganapati Panda; Kuldeep Dhama; Ranjan K Mohapatra
Journal:  Open Med (Wars)       Date:  2022-06-17

2.  Complement as Prognostic Biomarker and Potential Therapeutic Target in Renal Cell Carcinoma.

Authors:  Britney Reese; Ashok Silwal; Elizabeth Daugherity; Michael Daugherity; Mahshid Arabi; Pierce Daly; Yvonne Paterson; Layton Woolford; Alana Christie; Roy Elias; James Brugarolas; Tao Wang; Magdalena Karbowniczek; Maciej M Markiewski
Journal:  J Immunol       Date:  2020-11-06       Impact factor: 5.422
