Literature DB >> 22337290

Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data.

Roman M Balabin1, Sergey V Smirnov.   

Abstract

Modern analytical chemistry of industrial products is in need of rapid, robust, and cheap analytical methods to continuously monitor product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products in an on-line/in-line regime. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR), is one of the best ways to obtain information about the chemical structures and the quality coefficients of multicomponent mixtures. Together with chemometric algorithms and multivariate data analysis (MDA) methods, which were especially created for the analysis of complicated, noisy, and overlapping signals, NIR spectroscopy shows great results in terms of its accuracy, including classical prediction error, RMSEP. However, it is unclear whether the combined NIR + MDA methods are capable of dealing with much more complex interpolation or extrapolation problems that are inevitably present in real-world applications. In the current study, we try to make a rather general comparison of linear, such as partial least squares or projection to latent structures (PLS); "quasi-nonlinear", such as the polynomial version of PLS (Poly-PLS); and intrinsically non-linear, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM), regression methods in terms of their robustness. As a measure of robustness, we will try to estimate their accuracy when solving interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative examples of real-world samples. Six very different chemical systems that differed in complexity, composition, structure, and properties were studied; these systems were gasoline, ethanol-gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total. General conclusions are made about the applicability of ANN- and SVM-based regression tools in the modern analytical chemistry. The effectiveness of different multivariate algorithms is different when going from classical accuracy to robustness. Neural networks, which are capable of producing very accurate results with respect to classical RMSEP, are not able to solve interpolation problems or, especially, extrapolation problems. The chemometric methods that are based on the support vector machine (SVM) ideology are capable of solving both classical regression and interpolation/extrapolation tasks.

Entities:  

Mesh:

Substances:

Year:  2012        PMID: 22337290     DOI: 10.1039/c2an15972d

Source DB:  PubMed          Journal:  Analyst        ISSN: 0003-2654            Impact factor:   4.616


  3 in total

1.  FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier.

Authors:  Victor Tkachev; Maxim Sorokin; Artem Mescheryakov; Alexander Simonov; Andrew Garazha; Anton Buzdin; Ilya Muchnik; Nicolas Borisov
Journal:  Front Genet       Date:  2019-01-15       Impact factor: 4.599

2.  Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology.

Authors:  Victor Tkachev; Maxim Sorokin; Constantin Borisov; Andrew Garazha; Anton Buzdin; Nicolas Borisov
Journal:  Int J Mol Sci       Date:  2020-01-22       Impact factor: 5.923

Review 3.  The Sample, the Spectra and the Maths-The Critical Pillars in the Development of Robust and Sound Applications of Vibrational Spectroscopy.

Authors:  Daniel Cozzolino
Journal:  Molecules       Date:  2020-08-12       Impact factor: 4.411

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.