
Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method.

Mir S Siadaty, William A Knaus.

Abstract

BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones.
METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance.
RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores.
CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, identifying them as uninteresting. In the 100 cases examined by hand, the pruning of patterns using the surprise score matched the biomedical evidence. The method automates the acquisition of knowledge, reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
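
The comparison described under METHODS can be illustrated with a minimal sketch. The abstract does not give the authors' formulas (their implementation used Perl and R), so everything below is an assumption chosen for illustration: the association score is taken to be a log odds ratio from a 2x2 co-occurrence table, the surprise score is the difference between the database and knowledgebase log odds ratios, and the p value comes from a two-sided z-test on that difference. The function names `log_odds_ratio` and `surprise` are hypothetical.

```python
import math
from statistics import NormalDist


def log_odds_ratio(both, only_a, only_b, neither):
    """Return (log odds ratio, variance) for a 2x2 co-occurrence table.

    Assumption: association strength is measured as a log odds ratio.
    0.5 is added to every cell (Haldane-Anscombe correction) so that
    empty cells do not cause division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (both, only_a, only_b, neither))
    lor = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var


def surprise(db_counts, kb_counts):
    """Compare one pattern's strength in the database and the knowledgebase.

    Each argument is a (both, only_a, only_b, neither) tuple of counts,
    e.g. patients (database) or citations (knowledgebase) mentioning a
    disease and a lab result. Returns (surprise score, two-sided p value).
    """
    db_lor, db_var = log_odds_ratio(*db_counts)
    kb_lor, kb_var = log_odds_ratio(*kb_counts)
    score = db_lor - kb_lor  # large magnitude = the two sources disagree
    z = score / math.sqrt(db_var + kb_var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return score, p_value


# Hypothetical counts: a pattern that is strong in the patient database
# but close to independence in the literature knowledgebase.
score, p_value = surprise((200, 300, 100, 9400), (5, 995, 495, 98505))
print(f"surprise = {score:.2f}, p = {p_value:.3g}")
```

Under these assumptions, a pattern whose strength agrees across the two sources receives a score near zero and would be pruned, while a pattern with a large, statistically significant score would be kept as potentially novel.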


Year: 2006    PMID: 16522200    PMCID: PMC1420278    DOI: 10.1186/1472-6947-6-13

Source DB: PubMed    Journal: BMC Med Inform Decis Mak    ISSN: 1472-6947    Impact factor: 2.796


