
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

Peter de Boves Harrington

Abstract

Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
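The idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions: that a Latin partition deals the objects of each class out across the folds so that class proportions are roughly preserved, that one bootstrap pools the predictions from all folds into a single FOM (here, accuracy), and that a normal-approximation interval is acceptable for illustration (a t quantile would suit a small number of bootstraps better). The names `latin_partitions`, `nearest_centroid`, and `blp_accuracy` are hypothetical, and the nearest-centroid classifier is only a stand-in for whatever model is being validated.

```python
import random
from statistics import NormalDist, stdev


def latin_partitions(labels, n_partitions, rng):
    """Give every object exactly one fold index, dealing objects out class by
    class so each fold keeps roughly the overall class proportions (assumed
    construction of a Latin partition)."""
    folds = [0] * len(labels)
    offset = 0
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[i] = (j + offset) % n_partitions
        offset += len(members)
    return folds


def nearest_centroid(train_X, train_y, test_X):
    """Toy stand-in model: assign each test object to the closest class mean."""
    centroids = {}
    for cls in sorted(set(train_y)):
        rows = [x for x, lab in zip(train_X, train_y) if lab == cls]
        centroids[cls] = [sum(col) / len(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]


def blp_accuracy(X, y, n_bootstraps=10, n_partitions=4, seed=0):
    """Return (mean accuracy, 95% half-width) over repeated Latin partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bootstraps):
        folds = latin_partitions(y, n_partitions, rng)
        pred = [None] * len(y)
        for k in range(n_partitions):
            train = [i for i, f in enumerate(folds) if f != k]
            test = [i for i, f in enumerate(folds) if f == k]
            out = nearest_centroid(
                [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
            )
            for i, p in zip(test, out):
                pred[i] = p  # each object is predicted exactly once per bootstrap
        scores.append(sum(p == t for p, t in zip(pred, y)) / len(y))
    avg = sum(scores) / len(scores)
    # Normal approximation (assumption); requires n_bootstraps >= 2.
    z = NormalDist().inv_cdf(0.975)
    half_width = z * stdev(scores) / len(scores) ** 0.5
    return avg, half_width
```

Because every bootstrap reshuffles the same objects, two models evaluated with the same seed see identical partitions, which is what would allow their per-bootstrap scores to be compared with a matched-sample test.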


Keywords:  Bootstrap Latin partition; calibration; chemometrics; classification; dataset sampling; statistical validation


Year:  2017        PMID: 28777019     DOI: 10.1080/10408347.2017.1361314

Source DB:  PubMed          Journal:  Crit Rev Anal Chem        ISSN: 1040-8347            Impact factor:   6.535


  4 in total

1.  Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach.

Authors:  Mirza Rizwan Sajid; Noryanti Muhammad; Roslinazairimah Zakaria; Ahmad Shahbaz; Syed Ahmad Chan Bukhari; Seifedine Kadry; A Suresh
Journal:  Interdiscip Sci       Date:  2021-03-06       Impact factor: 2.233

2.  Metabolomic profiling and comparison of major cinnamon species using UHPLC-HRMS.

Authors:  Yifei Wang; Peter de B Harrington; Pei Chen
Journal:  Anal Bioanal Chem       Date:  2020-09-02       Impact factor: 4.142

3.  On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning.

Authors:  Yun Xu; Royston Goodacre
Journal:  J Anal Test       Date:  2018-10-29

4.  Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results.

Authors:  Chansik An; Yae Won Park; Sung Soo Ahn; Kyunghwa Han; Hwiyoung Kim; Seung-Koo Lee
Journal:  PLoS One       Date:  2021-08-12       Impact factor: 3.240

