Bai-Chuan Deng, Yong-Huan Yun, Yi-Zeng Liang, Dong-Sheng Cao, Qing-Song Xu, Lun-Zhao Yi, Xin Huang.
Abstract
Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in PLS model building is selecting the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, CV may not yield a clear minimum of prediction error, which makes model selection difficult. To solve this problem, we proposed a new strategy for PLS model selection that combines the cross-validated coefficient of determination (Q²cv) and model stability (S). S is defined as the stability of the PLS regression vectors, which is obtained using model population analysis (MPA). The results show that, when a clear maximum of Q²cv is not obtained, S can provide additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and stability.
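As a minimal sketch of the quantity the abstract relies on, Q²cv can be computed from observed responses and their cross-validated predictions as 1 - PRESS/TSS. This is the standard textbook form and assumes the total sum of squares is taken around the overall mean of the observed values; the abstract does not state the exact variant used. The helper `near_max_nlvs` and its tolerance `tol` are hypothetical names introduced here only to illustrate what "no clear maximum" can look like; they are not the authors' selection rule, and the stability S is not computed because the abstract does not give its formula.

```python
def q2_cv(y_obs, y_cv_pred):
    """Cross-validated coefficient of determination: 1 - PRESS / TSS.

    y_obs      -- observed responses
    y_cv_pred  -- predictions for the same samples, each made by a model
                  that did not see that sample during fitting
    Assumption: TSS is computed around the overall mean of y_obs.
    """
    if len(y_obs) != len(y_cv_pred):
        raise ValueError("y_obs and y_cv_pred must have the same length")
    mean_y = sum(y_obs) / len(y_obs)
    # PRESS: predicted residual error sum of squares
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_cv_pred))
    # TSS: total sum of squares of the observed responses
    tss = sum((y - mean_y) ** 2 for y in y_obs)
    return 1.0 - press / tss


def near_max_nlvs(q2_by_nlv, tol=0.01):
    """Hypothetical helper: list the nLVs whose Q2cv lies within `tol`
    of the best value. More than one candidate means Q2cv alone does
    not single out a model."""
    best = max(q2_by_nlv.values())
    return sorted(n for n, q2 in q2_by_nlv.items() if best - q2 <= tol)


# Usage with made-up numbers (not data from the paper)
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(q2_cv(y, y_hat), 4))

flat_curve = {1: 0.80, 2: 0.90, 3: 0.905, 4: 0.903}
print(near_max_nlvs(flat_curve))
```

When several nLVs are returned, as with the flat curve above, Q²cv is nearly tied across candidates; this is the situation in which the abstract proposes consulting S as a second criterion.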
Keywords: Cross-validation; Model population analysis; Model selection; Model stability; Over-fitting; Partial least squares
Year: 2015 PMID: 26092335 DOI: 10.1016/j.aca.2015.04.045
Source DB: PubMed Journal: Anal Chim Acta ISSN: 0003-2670 Impact factor: 6.558