Sample size planning for classification models.

Claudia Beleites, Ute Neugebauer, Thomas Bocklitz, Christoph Krafft, Jürgen Popp.

Abstract

In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show the model performance as a function of the training sample size and can help to determine the sample size needed to train good classifiers. However, building a good model is not enough: the performance must also be proven. We discuss learning curves for typical small sample size situations with 5-25 independent samples per class. Although the classification models achieve acceptable performance, the learning curve can be completely masked by the random testing uncertainty due to the equally limited test sample size. Consequently, we determine the test sample sizes necessary to achieve reasonable precision in the validation and find that 75-100 samples will usually be needed to test a good but not perfect classifier. Such a data set will then allow refined sample size planning on the basis of the achieved performance. We also demonstrate how to calculate the sample sizes necessary to show the superiority of one classifier over another: this often requires hundreds of statistically independent test samples or is even theoretically impossible. We demonstrate our findings with a data set of ca. 2550 Raman spectra of single cells (five classes: erythrocytes, leukocytes and three tumour cell lines BT-20, MCF-7 and OCI-AML3) as well as by an extensive simulation that allows precise determination of the actual performance of the models in question.
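The two sample size questions raised in the abstract can be illustrated with a rough sketch. It is not taken from the paper and may differ from the authors' calculations: it assumes a Wilson score interval for the precision of an observed proportion (e.g. sensitivity) and a textbook normal approximation for comparing two classifiers tested on separate, unpaired sample sets. The function names and the example values (0.90, 0.95, a maximum interval width of 0.15) are hypothetical choices for illustration only.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a proportion observed on n independent test samples."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def min_test_size(p_expected, max_width, confidence=0.95, n_max=100_000):
    """Smallest n whose Wilson interval at p_expected is no wider than max_width."""
    for n in range(1, n_max + 1):
        lo, hi = wilson_interval(p_expected, n, confidence)
        if hi - lo <= max_width:
            return n
    raise ValueError("no n <= n_max reaches the requested width")


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Test samples per classifier for an unpaired, two-sided comparison
    of two proportions (normal approximation; an assumed design)."""
    if p1 == p2:
        raise ValueError("equal proportions cannot be distinguished")
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2)))
    return math.ceil((num / (p1 - p2)) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: a classifier with an expected sensitivity of 0.90.
    for n in (25, 50, 100):
        lo, hi = wilson_interval(0.90, n)
        print(f"n={n:4d}: 95% interval width {hi - lo:.3f}")
    print("n for width <= 0.15:", min_test_size(0.90, 0.15))
    print("n per group, 0.90 vs 0.95:", n_per_group_two_proportions(0.90, 0.95))
```

Here n counts statistically independent samples (e.g. patients or batches), not individual spectra. Under these assumptions, narrowing the interval for a good but imperfect classifier takes several dozen test samples, while resolving a difference of 0.05 between two good classifiers takes several hundred per classifier, which agrees in order of magnitude with the figures quoted in the abstract.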

Year: 2012 PMID: 23265730 DOI: 10.1016/j.aca.2012.11.007

Source DB: PubMed Journal: Anal Chim Acta ISSN: 0003-2670 Impact factor: 6.558

Cited by (44 in total)

1. Using Low-Frequency Oscillations to Detect Temporal Lobe Epilepsy with Machine Learning.

Authors: Gyujoon Hwang; Veena A Nair; Jed Mathis; Cole J Cook; Rosaleena Mohanty; Gengyan Zhao; Neelima Tellapragada; Candida Ustine; Onyekachi O Nwoke; Charlene Rivera-Bonet; Megan Rozman; Linda Allen; Courtney Forseth; Dace N Almane; Peter Kraegel; Andrew Nencka; Elizabeth Felton; Aaron F Struck; Rasmus Birn; Rama Maganti; Lisa L Conant; Colin J Humphries; Bruce Hermann; Manoj Raghavan; Edgar A DeYoe; Jeffrey R Binder; Elizabeth Meyerand; Vivek Prabhakaran
Journal: Brain Connect Date: 2019-03

2. Application of Machine Learning to Predict Dietary Lapses During Weight Loss.

Authors: Stephanie P Goldstein; Fengqing Zhang; John G Thomas; Meghan L Butryn; James D Herbert; Evan M Forman
Journal: J Diabetes Sci Technol Date: 2018-05-24

3. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: A machine learning approach.

Authors: Fabian Lenhard; Sebastian Sauer; Erik Andersson; Kristoffer NT Månsson; David Mataix-Cols; Christian Rück; Eva Serlachius
Journal: Int J Methods Psychiatr Res Date: 2017-07-28 Impact factor: 4.035

4. Using Raman spectroscopy to characterize biological materials.

Authors: Holly J Butler; Lorna Ashton; Benjamin Bird; Gianfelice Cinque; Kelly Curtis; Jennifer Dorney; Karen Esmonde-White; Nigel J Fullwood; Benjamin Gardner; Pierre L Martin-Hirsch; Michael J Walsh; Martin R McAinsh; Nicholas Stone; Francis L Martin
Journal: Nat Protoc Date: 2016-03-10 Impact factor: 13.491

5. Noninvasive detection of macrophage activation with single-cell resolution through machine learning.

Authors: Nicolas Pavillon; Alison J Hobro; Shizuo Akira; Nicholas I Smith
Journal: Proc Natl Acad Sci U S A Date: 2018-03-06 Impact factor: 11.205

6. Using prerecorded hemodynamic response functions in detecting prefrontal pain response: a functional near-infrared spectroscopy study.

Authors: Ke Peng; Meryem A Yücel; Christopher M Aasted; Sarah C Steele; David A Boas; David Borsook; Lino Becerra
Journal: Neurophotonics Date: 2017-10-16 Impact factor: 3.593

7. Quantitative assessment of cancer cell morphology and motility using telecentric digital holographic microscopy and machine learning.

Authors: Van K Lam; Thanh C Nguyen; Byung M Chung; George Nehmetallah; Christopher B Raub
Journal: Cytometry A Date: 2017-12-28 Impact factor: 4.355

8. Salivary microRNAs identified by small RNA sequencing and machine learning as potential biomarkers of alcohol dependence.

Authors: Andrew J Rosato; Xiaochun Chen; Yoshiaki Tanaka; Lindsay A Farrer; Henry R Kranzler; Yaira Z Nunez; David C Henderson; Joel Gelernter; Huiping Zhang
Journal: Epigenomics Date: 2019-05-29 Impact factor: 4.778

9. Estimating Surgical Blood Loss Volume Using Continuously Monitored Vital Signs.

Authors: Yang Chen; Chengcheng Hong; Michael R Pinsky; Ting Ma; Gilles Clermont
Journal: Sensors (Basel) Date: 2020-11-17 Impact factor: 3.576

10. Development of a practical spatial-spectral analysis protocol for breast histopathology using Fourier transform infrared spectroscopic imaging.

Authors: F Nell Pounder; Rohith K Reddy; Rohit Bhargava
Journal: Faraday Discuss Date: 2016-06-23 Impact factor: 4.008