
Sample size planning for classification models.

Claudia Beleites, Ute Neugebauer, Thomas Bocklitz, Christoph Krafft, Jürgen Popp.

Abstract

In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show the model performance as a function of the training sample size and can help to determine the sample size needed to train good classifiers. However, building a good model is not enough: the performance must also be proven. We discuss learning curves for typical small sample size situations with 5-25 independent samples per class. Although the classification models achieve acceptable performance, the learning curve can be completely masked by the random testing uncertainty due to the equally limited test sample size. Consequently, we determine the test sample sizes necessary to achieve reasonable precision in the validation and find that 75-100 samples will usually be needed to test a good but not perfect classifier. Such a data set will then allow refined sample size planning on the basis of the achieved performance. We also demonstrate how to calculate the sample sizes necessary to show the superiority of one classifier over another: this often requires hundreds of statistically independent test samples or is even theoretically impossible. We demonstrate our findings with a data set of ca. 2550 Raman spectra of single cells (five classes: erythrocytes, leukocytes and three tumour cell lines BT-20, MCF-7 and OCI-AML3) as well as by an extensive simulation that allows precise determination of the actual performance of the models in question.
Copyright © 2012 Elsevier B.V. All rights reserved.
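
The two sample size questions raised in the abstract can be illustrated with a rough sketch. The code below is not the authors' procedure. It assumes a Wilson score interval for an observed proportion (e.g. sensitivity) and the textbook normal-approximation sample size for comparing two independent proportions with a one-sided test. The function names `wilson_interval` and `n_per_group_unpaired`, and the defaults for confidence level, significance level and power, are illustrative choices that do not come from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat, n, conf=0.95):
    """Wilson score interval for an observed proportion p_hat from n independent test samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)  # two-sided normal quantile
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at p_hat = 0 or 1.
    return max(0.0, centre - half), min(1.0, centre + half)


def n_per_group_unpaired(p1, p2, alpha=0.05, power=0.8):
    """Test samples per classifier needed to detect p1 != p2 (one-sided test).

    Assumption: each classifier is tested on its own independent sample set
    (unpaired design) and the normal approximation holds.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (
        z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Interval width for an observed proportion of 0.90 at several test sample sizes.
    for n in (10, 25, 50, 100, 200):
        lo, hi = wilson_interval(0.90, n)
        print(f"n = {n:3d}: 95% interval {lo:.3f} to {hi:.3f}, width {hi - lo:.3f}")
    # Samples per classifier to show that 0.95 is better than 0.90.
    print(n_per_group_unpaired(0.90, 0.95))
```

Under these assumptions the interval for an observed proportion of 0.90 narrows only slowly as the number of test samples grows, and separating 0.90 from 0.95 already needs several hundred samples per classifier. A paired design, in which both classifiers are tested on the same samples, would call for a different calculation.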

Year: 2012    PMID: 23265730    DOI: 10.1016/j.aca.2012.11.007

Source DB: PubMed    Journal: Anal Chim Acta    ISSN: 0003-2670    Impact factor: 6.558


