
Clinical and pharmacogenomic data mining: 4. The FANO program and command set as an example of tools for biomedical discovery and evidence based medicine.

Barry Robson.

Abstract

The culmination of the methodology explored and developed in the preceding three papers is described in terms of the FANO program (also known as CliniMiner), and specifically in terms of its contemporary command set for data mining. This provides a more detailed account of how strategies were implemented in applications described elsewhere, in the previous papers in the series, and in a paper on the analysis of 667 000 patient records. Although it is not customary to think of a command set as the output of research, it represents the elements and strategies for data mining biomedical and clinical data with many parameters, that is, in a high-dimensional space that requires skilful navigation. The intent is not to promote FANO per se, but to report its science and methodologies. Typical example rules from traditional data mining are that A and B and C associate, or IF A & B THEN C. Clinical data need rules of much higher complexity, especially with the inclusion of proteomics and genomics. FANO's specific goal is to be able routinely not only to extract from clinical record repositories and other data the complex rules required for biomedical research and the clinical practice of evidence-based medicine, but also to quantify their uncertainty, that is, their essentially probabilistic nature. The underlying information-theoretic and number-theoretic basis previously described is less of an issue here, being "under the hood", although the fundamental role and use of the Incomplete (generalized) Riemann Zeta Function as a general surprise measure is highlighted, along with its covariance or multivariance analogue, as it appears to be a unique and powerful feature. Another characteristic described is the very general tactic of the metadata operator ':='. It allows decomposition of diverse data types such as trees, spreadsheets, biosequences, sets of objects, amorphous data collections with repeating items, XML structures, and so forth into universally atomic data items with or without metadata, and assists in reconstruction of ontology from the associations and numerical correlations so data mined.
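
The two ideas highlighted in the abstract can be illustrated with a minimal Python sketch. It is not FANO's implementation or command set, and the abstract gives no formulas, so everything here is an assumption: the function names (`atomize`, `partial_zeta`, `pair_surprise`) are hypothetical, the incomplete zeta function is taken to be the partial sum 1 + 2^-s + ... + n^-s, the surprise of a pair is taken to be the difference between that sum at the observed and at the expected co-occurrence count, and the expected count is rounded to an integer for simplicity.

```python
from collections import Counter
from itertools import combinations


def atomize(record):
    """Flatten one record (dict of field -> value or list of values)
    into a set of atomic 'metadata:=value' strings (assumed format)."""
    items = set()
    for field, value in record.items():
        values = value if isinstance(value, (list, tuple, set)) else [value]
        for v in values:
            items.add(f"{field}:={v}")
    return items


def partial_zeta(s, n):
    """Partial sum 1 + 2**-s + ... + n**-s for a non-negative integer n.
    Assumed here to stand in for the incomplete zeta function."""
    return sum(k ** -s for k in range(1, n + 1))


def pair_surprise(records, s=1.0):
    """Score every co-occurring pair of atomic items.

    Assumption: score = partial_zeta(observed) - partial_zeta(expected),
    where expected is the independence estimate n_a * n_b / N, rounded.
    Positive scores mean the pair occurs more often than expected.
    """
    atomized = [atomize(r) for r in records]
    total = len(atomized)
    single = Counter(item for items in atomized for item in items)
    pairs = Counter(
        pair for items in atomized for pair in combinations(sorted(items), 2)
    )
    scores = {}
    for (a, b), observed in pairs.items():
        expected = round(single[a] * single[b] / total)
        scores[(a, b)] = partial_zeta(s, observed) - partial_zeta(s, expected)
    return scores
```

Calling `pair_surprise` on a list of dict records returns a dictionary keyed by sorted item pairs such as `("drug:=X", "sex:=F")`. Treating every field, list element, or tree leaf as the same kind of `metadata:=value` string is what lets heterogeneous sources share one counting routine in this sketch.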

Year:  2008        PMID: 18698807     DOI: 10.1021/pr800204f

Source DB:  PubMed          Journal:  J Proteome Res        ISSN: 1535-3893            Impact factor:   4.466


