Valeria Tafintseva, Tiril Aurora Lintvedt, Johanne Heitmann Solheim, Boris Zimmermann, Hafeez Ur Rehman, Vesa Virtanen, Rubina Shaikh, Ervin Nippolainen, Isaac Afara, Simo Saarakkala, Lassi Rieppo, Patrick Krebs, Polina Fomina, Boris Mizaikoff, Achim Kohler.
Abstract
The aim of the study was to optimize preprocessing of sparse infrared spectral data. The sparse data were obtained by reducing broadband Fourier transform infrared attenuated total reflectance spectra of bovine and human cartilage, as well as of simulated spectral data, comprising several thousand spectral variables into datasets comprising only seven spectral variables. Different preprocessing approaches were compared, including simple baseline correction and normalization procedures, and model-based preprocessing, such as multiplicative signal correction (MSC). The optimal preprocessing was selected based on the quality of classification models established by partial least squares discriminant analysis for discriminating healthy and damaged cartilage samples. The best results for the sparse data were obtained by preprocessing using a baseline offset correction at 1800 cm−1, followed by peak normalization at 850 cm−1 and preprocessing by MSC.
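The three-step preprocessing named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the mean spectrum as the MSC reference, and the nearest-channel lookup are assumptions made for illustration. The wavenumber list is the one given in the Figure 6 caption.

```python
import numpy as np

# The seven wavenumbers (cm^-1) listed in the Figure 6 caption.
WAVENUMBERS = np.array([1800, 1745, 1620, 1560, 1210, 1080, 850], dtype=float)


def offset_correct(spectra, wavenumbers, at=1800.0):
    """Subtract each spectrum's value at the channel nearest `at`."""
    idx = int(np.argmin(np.abs(wavenumbers - at)))
    return spectra - spectra[:, [idx]]


def peak_normalize(spectra, wavenumbers, at=850.0):
    """Divide each spectrum by its value at the channel nearest `at`.

    Assumes that value is non-zero for every spectrum.
    """
    idx = int(np.argmin(np.abs(wavenumbers - at)))
    return spectra / spectra[:, [idx]]


def msc(spectra, reference=None):
    """Basic MSC: fit x = a + b * reference by least squares, return (x - a) / b.

    Using the mean spectrum as reference is an assumption of this sketch.
    """
    if reference is None:
        reference = spectra.mean(axis=0)
    design = np.column_stack([np.ones_like(reference), reference])
    # One least-squares fit per spectrum; coeffs has shape (2, n_samples).
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    a, b = coeffs
    return (spectra - a[:, None]) / b[:, None]


def preprocess_sparse(spectra, wavenumbers=WAVENUMBERS):
    """Offset correction at 1800, peak normalization at 850, then MSC."""
    spectra = np.asarray(spectra, dtype=float)
    corrected = offset_correct(spectra, wavenumbers, at=1800.0)
    normalized = peak_normalize(corrected, wavenumbers, at=850.0)
    return msc(normalized)
```

Here `spectra` is expected as an array of shape (n_samples, 7), one row per sparse spectrum. For spectra that differ only by an additive offset and a multiplicative scaling, the first two steps already make all rows identical, so MSC leaves them unchanged.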
Keywords: multiplicative signal correction; preprocessing; quantum cascade lasers; sparse spectra
Year: 2022 PMID: 35164133 PMCID: PMC8839829 DOI: 10.3390/molecules27030873
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Human dataset 2 samples: (a) raw broadband spectra after removing water spectra by the EMSC pre-classification algorithm; (b) spectra preprocessed by the weighted EMSC1 (MSC plus linear term) model.
Figure 2. Simulated human cartilage spectra: mean spectra of simulated healthy (in blue) and damaged (in red) spectra showing (a) the full spectral range and (b) the fingerprint region; (c) all simulated apparent spectra in the fingerprint region; (d) simulated spectra preprocessed by the weighted EMSC1 (MSC plus linear term) model.
Figure 3. Binary PLSDA classification of healthy and damaged samples based on OARSI grades. Models were established using preprocessed spectra of human dataset 2. From left to right: (1) benchmark broadband data, (2) sparse spectra of the benchmark data, (3) sparse raw data, (4) sparse data with simple preprocessing, (5) sparse data preprocessed by MSC, (6) sparse data preprocessed by EMSC1. The overall misclassification rate (MCR = 1 − Accuracy) as well as the False Negative Rate (FNR = 1 − Sensitivity) and False Positive Rate (FPR = 1 − Specificity) for the damaged group are provided.
Figure 4. Binary PLSDA classification of healthy and damaged samples based on OARSI grades. Models were established using preprocessed simulated cartilage spectra. From left to right: (1) benchmark broadband data, (2) sparse spectra of the benchmark data, (3) sparse raw data, (4) sparse data with simple preprocessing, (5) sparse data preprocessed by MSC, (6) sparse data preprocessed by EMSC1. The overall misclassification rate (MCR = 1 − Accuracy) as well as the False Negative Rate (FNR = 1 − Sensitivity) and False Positive Rate (FPR = 1 − Specificity) for the damaged group are provided.
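The three error rates defined in the Figure 3 and Figure 4 captions can be computed from true and predicted labels as in the sketch below. The function name and the label strings are hypothetical; "damaged" is treated as the positive class, matching the captions' statement that FNR and FPR are reported for the damaged group.

```python
def classification_rates(y_true, y_pred, positive="damaged"):
    """Return MCR, FNR and FPR with `positive` as the positive class.

    MCR = 1 - accuracy, FNR = 1 - sensitivity, FPR = 1 - specificity.
    Assumes both classes occur at least once in y_true.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # damaged sample classified as damaged
            else:
                fn += 1  # damaged sample missed
        else:
            if pred == positive:
                fp += 1  # healthy sample classified as damaged
            else:
                tn += 1  # healthy sample classified as healthy
    total = tp + fn + fp + tn
    return {
        "MCR": (fp + fn) / total,
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
    }
```

With four damaged samples of which one is missed, and six healthy samples of which two are flagged as damaged, this gives MCR = 3/10, FNR = 1/4 and FPR = 2/6.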
Figure 5. A flowchart for the PCA simulation of spectra. Blue blocks denote datasets, green blocks denote computational actions and yellow blocks denote results from the corresponding green block.
Figure 6. An average broadband spectrum of human cartilage obtained using dataset 2 (in blue), with the seven selected wavenumbers marked by red circles. The wavenumbers 1800, 1745, 1620, 1560, 1210, 1080, and 850 cm−1 were selected based on their relevance to cartilage quality assessment and spectral preprocessing.
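Reducing a broadband spectrum to the seven wavenumbers listed in this caption could look like the following sketch. Picking the nearest measured channel for each target wavenumber is an assumption of the sketch, as is the function name `to_sparse`; the record does not state how the authors extracted the sparse channels.

```python
import numpy as np

# Wavenumbers (cm^-1) listed in the Figure 6 caption.
SELECTED = np.array([1800, 1745, 1620, 1560, 1210, 1080, 850], dtype=float)


def to_sparse(spectra, wavenumbers, selected=SELECTED):
    """Keep only the channels nearest to the selected wavenumbers.

    spectra:     array of shape (n_samples, n_channels)
    wavenumbers: array of shape (n_channels,), the broadband axis
    Returns the sparse spectra and the wavenumbers actually picked.
    """
    spectra = np.asarray(spectra, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    # For each target wavenumber, index of the closest measured channel.
    distances = np.abs(wavenumbers[None, :] - selected[:, None])
    idx = distances.argmin(axis=1)
    return spectra[:, idx], wavenumbers[idx]
```

On a measurement grid that does not contain a target exactly (for example a 2 cm−1 grid and the 1745 cm−1 target), the returned wavenumber array shows which neighbouring channel was used.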