Héctor C Goicoechea, Alejandro C Olivieri.
Abstract
Genetic algorithms and other procedures mimicking natural processes are being increasingly used for variable selection, to improve the predictive ability of partial least-squares multivariate calibration. Two issues are critical for the success of genetic algorithms: initialization (setting the first candidates for solving the problem at hand) and overfitting (the tendency to produce excellent results in training but poor predictions for fresh samples). A new procedure is presented for sensor selection problems, involving iterative reinitialization based on a statistical analysis of the included sensors. It is shown to give excellent results without the requirement of preparing independent test data sets. Monte Carlo simulations using a theoretical three-component example illustrate how partial least-squares regression greatly benefits from variable selection when the analyte of interest is diluted, and how the new initialization method compares with other strategies. The new genetic algorithm was applied to five experimental data sets. The target parameters were the concentrations of diluted analytes in four pharmaceutical mixtures studied by UV-visible spectrophotometry and the octane number in gasolines analyzed by near-infrared spectroscopy.
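The abstract does not say which statistical analysis drives the reinitialization, so the following is only a minimal sketch of one plausible reading. It assumes that the statistic is the fraction of the best chromosomes from a previous run that include each sensor, and that the next starting population is drawn with those fractions as inclusion probabilities. The function names, the `floor` and `ceil` clipping limits, and the example data are hypothetical and are not taken from the paper. The fitness evaluation (for instance, cross-validated partial least-squares prediction error) is omitted.

```python
import random


def inclusion_frequencies(population):
    """Return, for each sensor, the fraction of chromosomes that include it.

    Each chromosome is a list of 0/1 flags, one flag per sensor.
    """
    n_chromosomes = len(population)
    n_sensors = len(population[0])
    return [
        sum(chromosome[j] for chromosome in population) / n_chromosomes
        for j in range(n_sensors)
    ]


def reinitialize(best_chromosomes, pop_size, rng, floor=0.05, ceil=0.95):
    """Draw a new starting population biased toward frequently selected sensors.

    Assumption (not from the abstract): the inclusion probabilities are the
    inclusion frequencies clipped to [floor, ceil], so that no sensor is
    permanently excluded or permanently forced into every candidate.
    """
    freqs = inclusion_frequencies(best_chromosomes)
    probs = [min(max(f, floor), ceil) for f in freqs]
    return [
        [1 if rng.random() < p else 0 for p in probs]
        for _ in range(pop_size)
    ]


# Hypothetical outcome of a previous run: four chromosomes over four sensors.
best = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]
freqs = inclusion_frequencies(best)  # [1.0, 0.5, 0.0, 0.25]
new_pop = reinitialize(best, pop_size=6, rng=random.Random(0))
```

In an iterative scheme of this kind, the genetic algorithm would be rerun from `new_pop`, the best chromosomes collected again, and the cycle repeated until the set of frequently included sensors stabilizes. Clipping the probabilities is a design choice made here to preserve some diversity between cycles.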
Year: 2002 PMID: 12377002 DOI: 10.1021/ci0255228
Source DB: PubMed Journal: J Chem Inf Comput Sci ISSN: 0095-2338