
Knowledge discovery by accuracy maximization.

Stefano Cacciatore, Claudio Luchinat, Leonardo Tenori.

Abstract

Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, KODAMA is driven by an integrated procedure of cross-validation of the results: the discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure that maximizes cross-validated predictive accuracy. In this way, the method is designed to ensure the robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs partway through Reagan's presidency rather than at its beginning.
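
The core idea in the abstract, a Monte Carlo search over class labels that keeps a proposal only when cross-validated accuracy does not drop, can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-nearest-neighbor classifier, the random initial labels, the 20% proposal rate, and the names cv_predict_1nn and maximize_cv_accuracy are all assumptions made for illustration.

```python
import numpy as np


def cv_predict_1nn(X, labels, n_folds, rng):
    """Predict each sample's label from its nearest neighbor in the other folds."""
    n = len(X)
    # Balanced random fold assignment (hypothetical choice, not from the paper).
    folds = rng.permutation(n) % n_folds
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    pred = np.empty(n, dtype=labels.dtype)
    for f in range(n_folds):
        test = np.flatnonzero(folds == f)
        train = np.flatnonzero(folds != f)
        nearest = train[np.argmin(dist[np.ix_(test, train)], axis=1)]
        pred[test] = labels[nearest]
    return pred


def maximize_cv_accuracy(X, n_init_classes=10, n_iter=200, n_folds=5, seed=0):
    """Monte Carlo search for labels that a classifier can predict under CV.

    Assumed scheme: start from random labels, propose replacing the labels of
    a random subset with their cross-validated predictions, and accept the
    proposal only if the cross-validated accuracy does not decrease.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(0, n_init_classes, size=n)
    pred = cv_predict_1nn(X, labels, n_folds, rng)
    acc = float(np.mean(pred == labels))
    history = [acc]
    for _ in range(n_iter):
        # Proposal: copy predicted labels onto roughly 20% of the samples.
        flip = rng.random(n) < 0.2
        candidate = np.where(flip, pred, labels)
        cand_pred = cv_predict_1nn(X, candidate, n_folds, rng)
        cand_acc = float(np.mean(cand_pred == candidate))
        if cand_acc >= acc:  # accept only non-decreasing accuracy
            labels, pred, acc = candidate, cand_pred, cand_acc
        history.append(acc)
    return labels, history
```

Calling maximize_cv_accuracy(X) on a samples-by-features array returns the final label vector and the accuracy recorded after each iteration. Because folds are redrawn at every evaluation, the recorded accuracy is a noisy estimate; repeating the search with several seeds and comparing the resulting label vectors would be one way to gauge stability.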


Keywords:  clustering; data visualization; dissimilarity matrix; mapping; multivariate statistics

Year:  2014        PMID: 24706821      PMCID: PMC3986136          DOI: 10.1073/pnas.1220873111

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  9 in total

1.  Metabolic Profiling in Formalin-Fixed and Paraffin-Embedded Prostate Cancer Tissues.

Authors:  Stefano Cacciatore; Giorgia Zadra; Clyde Bango; Kathryn L Penney; Svitlana Tyekucheva; Oscar Yanes; Massimo Loda
Journal:  Mol Cancer Res       Date:  2017-01-10       Impact factor: 5.852

Review 2.  Innovation in metabolomics to improve personalized healthcare.

Authors:  Stefano Cacciatore; Massimo Loda
Journal:  Ann N Y Acad Sci       Date:  2015-05-26       Impact factor: 5.691

3.  FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets.

Authors:  Melissa E Ko; Corey M Williams; Kristen I Fread; Sarah M Goggin; Rohit S Rustagi; Gabriela K Fragiadakis; Garry P Nolan; Eli R Zunder
Journal:  Nat Protoc       Date:  2020-01-13       Impact factor: 13.491

4.  KODAMA: an R package for knowledge discovery and data mining.

Authors:  Stefano Cacciatore; Leonardo Tenori; Claudio Luchinat; Phillip R Bennett; David A MacIntyre
Journal:  Bioinformatics       Date:  2017-02-15       Impact factor: 6.937

5.  Integrated Lipidomics and Proteomics Point to Early Blood-Based Changes in Childhood Preceding Later Development of Psychotic Experiences: Evidence From the Avon Longitudinal Study of Parents and Children.

Authors:  Francisco Madrid-Gambin; Melanie Föcking; Sophie Sabherwal; Meike Heurich; Jane A English; Aoife O'Gorman; Tommi Suvitaival; Linda Ahonen; Mary Cannon; Glyn Lewis; Ismo Mattila; Caitriona Scaife; Sean Madden; Tuulia Hyötyläinen; Matej Orešič; Stanley Zammit; Gerard Cagney; David R Cotter; Lorraine Brennan
Journal:  Biol Psychiatry       Date:  2019-01-30       Impact factor: 13.382

6.  Integrative measurement analysis via machine learning descriptor selection for investigating physical properties of biopolymers in hairs.

Authors:  Ayari Takamura; Kaede Tsukamoto; Kenji Sakata; Jun Kikuchi
Journal:  Sci Rep       Date:  2021-12-21       Impact factor: 4.379

Review 7.  Current Knowledge in Skin Metabolomics: Updates from Literature Review.

Authors:  Alessia Paganelli; Valeria Righi; Elisabetta Tarentini; Cristina Magnoni
Journal:  Int J Mol Sci       Date:  2022-08-07       Impact factor: 6.208

8.  The Da Vinci European BioBank: A Metabolomics-Driven Infrastructure.

Authors:  Dario Carotenuto; Claudio Luchinat; Giordana Marcon; Antonio Rosato; Paola Turano
Journal:  J Pers Med       Date:  2015-04-22

Review 9.  High-Throughput Metabolomics by 1D NMR.

Authors:  Alessia Vignoli; Veronica Ghini; Gaia Meoni; Cristina Licari; Panteleimon G Takis; Leonardo Tenori; Paola Turano; Claudio Luchinat
Journal:  Angew Chem Int Ed Engl       Date:  2018-11-11       Impact factor: 15.336

