Knowledge discovery by accuracy maximization.

Stefano Cacciatore, Claudio Luchinat, Leonardo Tenori.

Abstract

Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, KODAMA is driven by an integrated procedure of cross-validation of the results: the discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure that maximizes cross-validated predictive accuracy. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's presidency rather than at its beginning.
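
The core idea in the abstract, a Monte Carlo search over class labels that keeps only changes that do not lower cross-validated accuracy, can be illustrated with a toy sketch. This is not the authors' implementation: the leave-one-out k-nearest-neighbour classifier, the proposal rule (relabel a random subset of misclassified points with their predicted class), and every name and parameter below (`maximize_accuracy`, `n_classes`, `n_iter`, `k`) are assumptions made for illustration only.

```python
import random


def loo_knn_predict(points, labels, k=3):
    """Leave-one-out k-NN: predict each point's label from its k nearest other points."""
    preds = []
    for i, p in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        votes = [labels[j] for _, j in dists[:k]]
        # Majority vote; ties go to the smallest label so the result is deterministic.
        preds.append(max(sorted(set(votes)), key=votes.count))
    return preds


def cv_accuracy(points, labels, k=3):
    """Fraction of points whose leave-one-out prediction matches their current label."""
    preds = loo_knn_predict(points, labels, k)
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def maximize_accuracy(points, n_classes=4, n_iter=200, k=3, seed=0):
    """Hypothetical helper: random-restart-free Monte Carlo label search."""
    rng = random.Random(seed)
    # Start from arbitrary labels; no ground truth is used anywhere.
    labels = [rng.randrange(n_classes) for _ in points]
    start = best = cv_accuracy(points, labels, k)
    for _ in range(n_iter):
        if best == 1.0:
            break
        preds = loo_knn_predict(points, labels, k)
        wrong = [i for i, (p, l) in enumerate(zip(preds, labels)) if p != l]
        # Proposal (assumed rule): relabel a random subset of misclassified points.
        chosen = [i for i in wrong if rng.random() < 0.5] or [rng.choice(wrong)]
        candidate = list(labels)
        for i in chosen:
            candidate[i] = preds[i]
        acc = cv_accuracy(points, candidate, k)
        # Accept only if cross-validated accuracy does not decrease.
        if acc >= best:
            labels, best = candidate, acc
    return labels, start, best


# Toy data: two well-separated Gaussian blobs in 2D.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(15)]
points += [(data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)) for _ in range(15)]

labels, start_acc, final_acc = maximize_accuracy(points)
print(f"cross-validated accuracy: {start_acc:.2f} -> {final_acc:.2f}")
```

Because a proposal is accepted only when accuracy does not drop, the final accuracy can never be lower than the starting one. The sketch performs a single run on the raw coordinates and omits everything else the paper describes.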

Keywords: clustering; data visualization; dissimilarity matrix; mapping; multivariate statistics

Year: 2014 PMID: 24706821 PMCID: PMC3986136 DOI: 10.1073/pnas.1220873111

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

References: 21 in total

1. A global geometric framework for nonlinear dimensionality reduction.

Authors: J B Tenenbaum; V de Silva; J C Langford
Journal: Science Date: 2000-12-22 Impact factor: 47.728

2. The isomap algorithm and topological stability.

Authors: Mukund Balasubramanian; Eric L Schwartz
Journal: Science Date: 2002-01-04 Impact factor: 47.728

3. Mapping knowledge domains.

Authors: Richard M Shiffrin; Katy Börner
Journal: Proc Natl Acad Sci U S A Date: 2004-01-23 Impact factor: 11.205

4. Stochastic proximity embedding.

Authors: Dimitris K Agrafiotis
Journal: J Comput Chem Date: 2003-07-30 Impact factor: 3.376

5. Clustering by passing messages between data points.

Authors: Brendan J Frey; Delbert Dueck
Journal: Science Date: 2007-01-11 Impact factor: 47.728

6. A simple and exact Laplacian clustering of complex networking phenomena: application to gene expression profiles.

Authors: Choongrak Kim; Mookyung Cheon; Minho Kang; Iksoo Chang
Journal: Proc Natl Acad Sci U S A Date: 2008-03-12 Impact factor: 11.205

7. Computational solutions to large-scale data management and analysis. (Review)

Authors: Eric E Schadt; Michael D Linderman; Jon Sorenson; Lawrence Lee; Garry P Nolan
Journal: Nat Rev Genet Date: 2010-09 Impact factor: 53.242

8. Cluster analysis and display of genome-wide expression patterns.

Authors: M B Eisen; P T Spellman; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A Date: 1998-12-08 Impact factor: 11.205

9. Evidence of different metabolic phenotypes in humans.

Authors: Michael Assfalg; Ivano Bertini; Donato Colangiuli; Claudio Luchinat; Hartmut Schäfer; Birk Schütz; Manfred Spraul
Journal: Proc Natl Acad Sci U S A Date: 2008-01-29 Impact factor: 11.205

10. GPU-FS-kNN: a software tool for fast and scalable kNN computation using GPUs.

Authors: Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
Journal: PLoS One Date: 2012-08-28 Impact factor: 3.240

Cited by: 9 in total

1. Metabolic Profiling in Formalin-Fixed and Paraffin-Embedded Prostate Cancer Tissues.

Authors: Stefano Cacciatore; Giorgia Zadra; Clyde Bango; Kathryn L Penney; Svitlana Tyekucheva; Oscar Yanes; Massimo Loda
Journal: Mol Cancer Res Date: 2017-01-10 Impact factor: 5.852

2. Innovation in metabolomics to improve personalized healthcare. (Review)

Authors: Stefano Cacciatore; Massimo Loda
Journal: Ann N Y Acad Sci Date: 2015-05-26 Impact factor: 5.691

3. FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets.

Authors: Melissa E Ko; Corey M Williams; Kristen I Fread; Sarah M Goggin; Rohit S Rustagi; Gabriela K Fragiadakis; Garry P Nolan; Eli R Zunder
Journal: Nat Protoc Date: 2020-01-13 Impact factor: 13.491

4. KODAMA: an R package for knowledge discovery and data mining.

Authors: Stefano Cacciatore; Leonardo Tenori; Claudio Luchinat; Phillip R Bennett; David A MacIntyre
Journal: Bioinformatics Date: 2017-02-15 Impact factor: 6.937

5. Integrated Lipidomics and Proteomics Point to Early Blood-Based Changes in Childhood Preceding Later Development of Psychotic Experiences: Evidence From the Avon Longitudinal Study of Parents and Children.

Authors: Francisco Madrid-Gambin; Melanie Föcking; Sophie Sabherwal; Meike Heurich; Jane A English; Aoife O'Gorman; Tommi Suvitaival; Linda Ahonen; Mary Cannon; Glyn Lewis; Ismo Mattila; Caitriona Scaife; Sean Madden; Tuulia Hyötyläinen; Matej Orešič; Stanley Zammit; Gerard Cagney; David R Cotter; Lorraine Brennan
Journal: Biol Psychiatry Date: 2019-01-30 Impact factor: 13.382

6. Integrative measurement analysis via machine learning descriptor selection for investigating physical properties of biopolymers in hairs.

Authors: Ayari Takamura; Kaede Tsukamoto; Kenji Sakata; Jun Kikuchi
Journal: Sci Rep Date: 2021-12-21 Impact factor: 4.379