Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

Abstract

Validation of multivariate models is important for a wide range of chemical applications, yet it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to that single validation set. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
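
The idea of bootstrapped Latin partitions can be illustrated with a minimal Python sketch. It is not the authors' implementation; it only follows the description in the abstract: objects are split into class-stratified partitions so that each object is used once for validation, the split is repeated with fresh random partitions, and the pooled FOM from each repeat is averaged and reported with a confidence interval. The function names (`latin_partitions`, `bootstrapped_latin_partition_fom`, `nearest_mean_fit_predict`), the use of classification accuracy as the FOM, the toy nearest-mean classifier, and the normal-approximation interval are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def latin_partitions(labels, n_partitions, rng):
    """Assign every object index to exactly one validation partition,
    dealing each class out round-robin so class proportions stay similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    partitions = [[] for _ in range(n_partitions)]
    offset = 0  # carries over between classes to keep partition sizes balanced
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            partitions[(offset + j) % n_partitions].append(idx)
        offset += len(members)
    return partitions


def nearest_mean_fit_predict(X_train, y_train, X_val):
    """Toy stand-in model (assumption): predict the class with the closest mean."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X_train, y_train):
        previous = sums.get(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(previous, x)]
        counts[label] += 1
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(means, key=lambda c: squared_distance(x, means[c])) for x in X_val]


def bootstrapped_latin_partition_fom(X, y, fit_predict, n_partitions=4,
                                     n_bootstraps=10, seed=0):
    """Return (mean FOM, CI half-width, per-bootstrap FOMs).
    The FOM here is accuracy pooled over all partitions of one bootstrap."""
    rng = random.Random(seed)
    foms = []
    for _ in range(n_bootstraps):
        correct = 0
        for val_idx in latin_partitions(y, n_partitions, rng):
            held_out = set(val_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            predictions = fit_predict([X[i] for i in train_idx],
                                      [y[i] for i in train_idx],
                                      [X[i] for i in val_idx])
            correct += sum(p == y[i] for p, i in zip(predictions, val_idx))
        foms.append(correct / len(y))  # every object was validated exactly once
    mean_fom = statistics.mean(foms)
    # Assumption: 95% interval from a normal approximation (needs n_bootstraps >= 2).
    half_width = 1.96 * statistics.stdev(foms) / len(foms) ** 0.5
    return mean_fom, half_width, foms
```

Because the partitions depend only on the labels and the seed, two models evaluated with the same seed see identical splits, so their per-bootstrap FOM lists are matched samples and can be compared with a paired test.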

Entities: Disease

Keywords: Bootstrap Latin partition; calibration; chemometrics; classification; dataset sampling; statistical validation

Year: 2017 PMID: 28777019 DOI: 10.1080/10408347.2017.1361314

Source DB: PubMed Journal: Crit Rev Anal Chem ISSN: 1040-8347 Impact factor: 6.535

Cited

4 in total

1. Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach.

Authors: Mirza Rizwan Sajid; Noryanti Muhammad; Roslinazairimah Zakaria; Ahmad Shahbaz; Syed Ahmad Chan Bukhari; Seifedine Kadry; A Suresh
Journal: Interdiscip Sci Date: 2021-03-06 Impact factor: 2.233

2. Metabolomic profiling and comparison of major cinnamon species using UHPLC-HRMS.

Authors: Yifei Wang; Peter de B Harrington; Pei Chen
Journal: Anal Bioanal Chem Date: 2020-09-02 Impact factor: 4.142

3. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning.

Authors: Yun Xu; Royston Goodacre
Journal: J Anal Test Date: 2018-10-29

4. Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results.

Authors: Chansik An; Yae Won Park; Sung Soo Ahn; Kyunghwa Han; Hwiyoung Kim; Seung-Koo Lee
Journal: PLoS One Date: 2021-08-12 Impact factor: 3.240
