
A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees.

Doina Caragea, Adrian Silvescu, Vasant Honavar.

Abstract

This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined.
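
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the sufficient-statistics idea under stated assumptions: the data are split by rows across sites (horizontal fragmentation), the split criterion is information gain, and the statistics each site reports are (attribute, value, class) counts. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def local_counts(rows, attributes, label):
    """Runs at one site: count (attribute, value, class) co-occurrences."""
    counts = Counter()
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr], row[label])] += 1
    return counts


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum(c / total * log2(c / total) for c in class_counts.values() if c)


def information_gain(counts, attribute):
    """Information gain of splitting on `attribute`, computed from counts alone."""
    by_value = {}
    overall = Counter()
    for (attr, value, cls), c in counts.items():
        if attr == attribute:
            by_value.setdefault(value, Counter())[cls] += c
            overall[cls] += c
    total = sum(overall.values())
    remainder = sum(
        sum(cc.values()) / total * entropy(cc) for cc in by_value.values()
    )
    return entropy(overall) - remainder


def best_split(counts, attributes):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(counts, a))


# Toy data held at two sites (hypothetical); raw rows never leave a site.
attributes = ["outlook", "windy"]
site_a = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]
site_b = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]

# Each site ships only its count table; the learner adds the tables together.
merged = sum(
    (local_counts(rows, attributes, "play") for rows in (site_a, site_b)),
    Counter(),
)

# Counts computed on the pooled data, for comparison with the centralized case.
centralized = local_counts(site_a + site_b, attributes, "play")
```

Because the counts are additive across sites, `merged` equals `centralized`, so a split chosen from the merged counts matches the one chosen on pooled data at this node. This illustrates, for a single node only, the sense in which such an approach can be exact; growing a full tree would repeat the exchange for each node, restricted to the rows that reach it.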


Year:  2004        PMID: 20351798      PMCID: PMC2846376          DOI: 10.3233/his-2004-11-210

Source DB:  PubMed          Journal:  Int J Hybrid Intell Syst        ISSN: 1448-5869


  2 in total

1.  Information Integration from Semantically Heterogeneous Biological Data Sources.

Authors:  Doina Caragea; Jie Bao; Jyotishman Pathak; Adrian Silvescu; Carson Andorf; Drena Dobbs; Vasant Honavar
Journal:  Int Workshop databases Expert Syst Appl       Date:  2005-08-26

2.  Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data.

Authors:  J Zhang; D-K Kang; A Silvescu; V Honavar
Journal:  Knowl Inf Syst       Date:  2006-02-01       Impact factor: 2.822

