Machine Learning Model Validation for Early Stage Studies with Small Sample Sizes.

Robyn Larracy, Angkoon Phinyomark, Erik Scheme.   

Abstract

In early stage biomedical studies, small datasets are common due to the high cost and difficulty of sample collection with human subjects. This complicates the validation of machine learning models, which are best suited for large datasets. In this work, we examined feature selection techniques, validation frameworks, and learning curve fitting for small simulated datasets with known underlying discriminability, with the aim of identifying a protocol for estimating and interpreting early stage model performance and for planning future studies. Of a variety of examined validation configurations, a nested cross-validation framework provided the most accurate reflection of the selected features' discriminability, but the relevant features were often not properly identified during the feature selection stage for datasets with small sample sizes. Ultimately, we recommend that: (1) filter-based feature selection methods should be used to minimize overfitting to noise-based features, (2) statistical exploration should be conducted on datasets as a whole to estimate the level of discriminability and the feasibility of the classification problems, and (3) learning curves should be employed using nested cross-validation performance estimates for forecasting accuracy at larger sample sizes and estimating the required number of samples to converge towards best performance. This work should serve as a guideline for researchers incorporating machine learning in small-scale pilot studies.
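
The nested cross-validation protocol with filter-based feature selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a classifier, a filter criterion, or fold counts, so the nearest-centroid classifier, the standardized mean-difference score, the 5x4 fold layout, and all function names (`make_data`, `filter_scores`, `nested_cv_accuracy`, and so on) are hypothetical choices made here. The point it shows is that feature selection and the choice of how many features to keep both happen inside the training portion of each outer fold, so the outer test samples never influence them.

```python
import random
import statistics

def make_data(n_per_class, n_features, n_informative, shift, rng):
    """Simulated two-class data; only the first n_informative features differ."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y

def filter_scores(X, y):
    """Assumed filter criterion: |difference of class means| / pooled std."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        pooled = (statistics.pvariance(a) + statistics.pvariance(b)) ** 0.5 or 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / pooled)
    return scores

def fit_predict(X_tr, y_tr, X_te, k):
    """Pick the top-k features on training data only, then classify test rows
    by nearest class centroid (an assumed stand-in classifier)."""
    scores = filter_scores(X_tr, y_tr)
    keep = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    centroids = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_tr, y_tr) if l == lab]
        centroids[lab] = [statistics.fmean([r[j] for r in rows]) for j in keep]
    preds = []
    for r in X_te:
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(keep, cen))
                for lab, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def stratified_folds(y, n_folds, rng):
    """Lists of test indices, with each class spread evenly across folds."""
    folds = [[] for _ in range(n_folds)]
    for lab in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == lab]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def cv_accuracy(X, y, k, n_folds, rng):
    """Plain cross-validated accuracy for a fixed number of features k."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds, rng):
        held = set(test_idx)
        tr = [i for i in range(len(y)) if i not in held]
        preds = fit_predict([X[i] for i in tr], [y[i] for i in tr],
                            [X[i] for i in test_idx], k)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)

def nested_cv_accuracy(X, y, k_grid, outer=5, inner=4, seed=0):
    """Outer loop estimates performance; inner loop tunes k on training data."""
    rng = random.Random(seed)
    correct = 0
    for test_idx in stratified_folds(y, outer, rng):
        held = set(test_idx)
        tr = [i for i in range(len(y)) if i not in held]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        # Inner loop: the outer test fold plays no part in choosing k.
        best_k = max(k_grid, key=lambda k: cv_accuracy(X_tr, y_tr, k, inner, rng))
        preds = fit_predict(X_tr, y_tr, [X[i] for i in test_idx], best_k)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)

# Small simulated dataset: 40 samples, 30 features, 3 of them informative.
X, y = make_data(n_per_class=20, n_features=30, n_informative=3,
                 shift=1.0, rng=random.Random(42))
acc = nested_cv_accuracy(X, y, k_grid=[1, 3, 5, 10])
print(f"nested CV accuracy: {acc:.2f}")
```

Repeating the final three lines over a range of `n_per_class` values would give the points of a learning curve, to which a saturating function could then be fitted to forecast accuracy at larger sample sizes, in the spirit of recommendation (3).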

Year: 2021   PMID: 34891749   DOI: 10.1109/EMBC46164.2021.9629697

Source DB: PubMed   Journal: Annu Int Conf IEEE Eng Med Biol Soc   ISSN: 2375-7477


  2 in total

1.  Machine Learning-Based Radiomics for Prediction of Epidermal Growth Factor Receptor Mutations in Lung Adenocarcinoma.

Authors:  Jiameng Lu; Xiaoqing Ji; Lixia Wang; Yunxiu Jiang; Xinyi Liu; Zhenshen Ma; Yafei Ning; Jie Dong; Haiying Peng; Fei Sun; Zihan Guo; Yanbo Ji; Jianping Xing; Yue Lu; Degan Lu
Journal:  Dis Markers       Date:  2022-05-07       Impact factor: 3.464

2.  Individualized identification of sexual dysfunction of psychiatric patients with machine-learning.

Authors:  Yang S Liu; Jeffrey R Hankey; Stefani Chokka; Pratap R Chokka; Bo Cao
Journal:  Sci Rep       Date:  2022-06-10       Impact factor: 4.996
