On a Scalable Entropic Breaching of the Overfitting Barrier for Small Data Problems in Machine Learning.

Illia Horenko

Abstract

Overfitting and treatment of small data are among the most challenging problems in machine learning (ML), when a relatively small data statistics size T is not enough to provide a robust ML fit for a relatively large data feature dimension D. Deploying a massively parallel ML analysis of generic classification problems for different D and T, we demonstrate the existence of statistically significant linear overfitting barriers for common ML methods. The results reveal that for a robust classification of bioinformatics-motivated generic problems with the long short-term memory deep learning classifier (LSTM), one needs in the best case a statistics T that is at least 13.8 times larger than the feature dimension D. We show that this overfitting barrier can be breached at a 10^-12 fraction of the computational cost by means of the entropy-optimal scalable probabilistic approximations algorithm (eSPA), performing a joint solution of the entropy-optimal Bayesian network inference and feature space segmentation problems. Application of eSPA to experimental single cell RNA sequencing data exhibits a 30-fold classification performance boost when compared to standard bioinformatics tools and a 7-fold boost when compared to the deep learning LSTM classifier.
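
The linear barrier quoted in the abstract (T at least 13.8 times D, best case, for the LSTM classifier) can be turned into a quick arithmetic check. The following is only an illustrative sketch: the function names and interface are hypothetical and not taken from the paper, and the ratio is simply the figure stated above, which applies to the generic problems studied there rather than to arbitrary data sets.

```python
import math
from fractions import Fraction

# Best-case LSTM ratio quoted in the abstract; kept as an exact fraction
# to avoid floating-point rounding in the ceiling below.
LSTM_BEST_CASE_RATIO = Fraction("13.8")


def min_statistics_size(feature_dim, ratio=LSTM_BEST_CASE_RATIO):
    """Smallest integer T satisfying T >= ratio * D (hypothetical helper)."""
    if feature_dim <= 0:
        raise ValueError("feature_dim must be positive")
    return math.ceil(ratio * feature_dim)


def meets_barrier(stat_size, feature_dim, ratio=LSTM_BEST_CASE_RATIO):
    """True if the statistics size T reaches the quoted linear barrier for D."""
    return stat_size >= ratio * feature_dim


if __name__ == "__main__":
    d = 1000  # example feature dimension, chosen arbitrarily
    print(min_statistics_size(d))
    print(meets_barrier(5000, d))
```

Under this reading, a data set with D = 1000 features would need at least 13,800 instances before the LSTM classifier could be expected to fit robustly in the best case.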

Year:  2020        PMID: 32521216     DOI: 10.1162/neco_a_01296

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  4 in total

1.  Low-Cost Probabilistic 3D Denoising with Applications for Ultra-Low-Radiation Computed Tomography.

Authors:  Illia Horenko; Lukáš Pospíšil; Edoardo Vecchi; Steffen Albrecht; Alexander Gerber; Beate Rehbock; Albrecht Stroh; Susanne Gerber
Journal:  J Imaging       Date:  2022-05-31

2.  A deeper look into natural sciences with physics-based and data-driven measures. (Review)

Authors:  Davi Röhe Rodrigues; Karin Everschor-Sitte; Susanne Gerber; Illia Horenko
Journal:  iScience       Date:  2021-02-09

3.  Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification.

Authors:  Illia Horenko
Journal:  Proc Natl Acad Sci U S A       Date:  2022-03-01       Impact factor: 11.205

4.  Genomic basis for drought resistance in European beech forests threatened by climate change.

Authors:  Markus Pfenninger; Friederike Reuss; Angelika Kiebler; Philipp Schönnenbeck; Cosima Caliendo; Susanne Gerber; Berardino Cocchiararo; Sabrina Reuter; Nico Blüthgen; Karsten Mody; Bagdevi Mishra; Miklós Bálint; Marco Thines; Barbara Feldmeyer
Journal:  Elife       Date:  2021-06-16       Impact factor: 8.140

