
Data splitting for artificial neural networks using SOM-based stratified sampling.

R J May, H R Maier, G C Dandy.

Abstract

Data splitting is an important consideration during artificial neural network (ANN) development, where hold-out cross-validation is commonly employed to ensure generalization. Even for a moderate sample size, the sampling methodology used for data splitting can have a significant effect on the quality of the subsets used for training, testing and validating an ANN. Poor data splitting can result in inaccurate and highly variable model performance; however, the choice of sampling methodology is rarely given due consideration by ANN modellers. Increased confidence in the sampling is of paramount importance, since the hold-out sampling is generally performed only once during ANN development. This paper considers the variability in the quality of subsets that are obtained using different data splitting approaches. A novel approach to stratified sampling, based on Neyman sampling of the self-organizing map (SOM), is developed, with several guidelines identified for setting the SOM size and sample allocation in order to minimize the bias and variance in the datasets. Using an example ANN function approximation task, the SOM-based approach is evaluated in comparison to random sampling, DUPLEX, systematic stratified sampling, and trial-and-error sampling to minimize the statistical differences between data sets. Of these approaches, DUPLEX is found to provide benchmark performance, yielding good model performance with no variability. The results show that the SOM-based approach also reliably generates high-quality samples and can therefore be used with greater confidence than other approaches, especially in the case of non-uniform datasets, with the benefit of scalability to perform data splitting on large datasets. Copyright 2009 Elsevier Ltd. All rights reserved.
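The Neyman allocation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the strata are already available as integer labels (in the paper these would be SOM units; here `labels` is a hypothetical stand-in), and it measures within-stratum spread with the standard deviation of a single output variable `y`, which is a simplifying assumption. The function names `neyman_allocation` and `stratified_split` are illustrative only. The sketch uses the standard Neyman rule, in which stratum h receives a share of the sample proportional to N_h * S_h (stratum size times within-stratum standard deviation).

```python
import numpy as np


def neyman_allocation(labels, y, n_sample):
    """Return {stratum: count} with counts proportional to N_h * S_h."""
    strata = np.unique(labels)
    sizes = np.array([(labels == s).sum() for s in strata])
    # Within-stratum standard deviation; a single-point stratum gets 0.
    stds = np.array([
        y[labels == s].std(ddof=1) if (labels == s).sum() > 1 else 0.0
        for s in strata
    ])
    weights = sizes * stds
    if weights.sum() == 0:
        # No spread anywhere: fall back to proportional allocation.
        weights = sizes.astype(float)
    raw = n_sample * weights / weights.sum()
    # Take at least one point per stratum, never more than it contains.
    # Because of rounding and clipping, the total may differ slightly
    # from n_sample.
    counts = np.clip(np.round(raw).astype(int), 1, sizes)
    return dict(zip(strata.tolist(), counts.tolist()))


def stratified_split(labels, y, n_sample, seed=0):
    """Draw a sample stratum by stratum; return (sampled, remaining) indices."""
    rng = np.random.default_rng(seed)
    picked = []
    for s, k in neyman_allocation(labels, y, n_sample).items():
        idx = np.flatnonzero(labels == s)
        picked.extend(rng.choice(idx, size=k, replace=False).tolist())
    picked = np.array(sorted(picked))
    rest = np.setdiff1d(np.arange(len(labels)), picked)
    return picked, rest


# Toy data: two equally sized strata, the second three times as variable.
labels = np.array([0] * 10 + [1] * 10)
y = np.array([0.0, 2.0] * 5 + [0.0, 6.0] * 5)
test_idx, train_idx = stratified_split(labels, y, n_sample=8)
```

In this toy case the more variable stratum receives three times as many sampled points as the other, which is the behaviour that distinguishes Neyman allocation from proportional allocation.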


Year:  2009        PMID: 19959327     DOI: 10.1016/j.neunet.2009.11.009

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  8 in total

1.  Boosted feature selectors: a case study on prediction P-gp inhibitors and substrates.

Authors:  Gonzalo Cerruela García; Nicolás García-Pedrajas
Journal:  J Comput Aided Mol Des       Date:  2018-10-26       Impact factor: 3.686

2.  [Review] Applications of artificial neural networks in health care organizational decision-making: A scoping review.

Authors:  Nida Shahid; Tim Rappon; Whitney Berta
Journal:  PLoS One       Date:  2019-02-19       Impact factor: 3.240

3.  Classification of MRI Brain Images Using DNA Genetic Algorithms Optimized Tsallis Entropy and Support Vector Machine.

Authors:  Wenke Zang; Zehua Wang; Dong Jiang; Xiyu Liu; Zhenni Jiang
Journal:  Entropy (Basel)       Date:  2018-12-13       Impact factor: 2.524

4.  QSAR analysis on a large and diverse set of potent phosphoinositide 3-kinase gamma (PI3Kγ) inhibitors using MLR and ANN methods.

Authors:  Fereydoun Sadeghi; Abbas Afkhami; Tayyebeh Madrakian; Raouf Ghavami
Journal:  Sci Rep       Date:  2022-04-12       Impact factor: 4.379

5.  Classification of fruits using computer vision and a multiclass support vector machine.

Authors:  Yudong Zhang; Lenan Wu
Journal:  Sensors (Basel)       Date:  2012-09-13       Impact factor: 3.576

6.  Modelling of Urban Air Pollutant Concentrations with Artificial Neural Networks Using Novel Input Variables.

Authors:  Laura Goulier; Bastian Paas; Laura Ehrnsperger; Otto Klemm
Journal:  Int J Environ Res Public Health       Date:  2020-03-19       Impact factor: 3.390

7.  Smart Anomaly Detection and Prediction for Assembly Process Maintenance in Compliance with Industry 4.0.

Authors:  Pavol Tanuska; Lukas Spendla; Michal Kebisek; Rastislav Duris; Maximilian Stremy
Journal:  Sensors (Basel)       Date:  2021-03-29       Impact factor: 3.576

8.  Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results.

Authors:  Chansik An; Yae Won Park; Sung Soo Ahn; Kyunghwa Han; Hwiyoung Kim; Seung-Koo Lee
Journal:  PLoS One       Date:  2021-08-12       Impact factor: 3.240

