Literature DB >> 32674566

Partial Least-Squares Regression as a Tool to Retrieve Gas Concentrations in Mixtures Detected Using Quartz-Enhanced Photoacoustic Spectroscopy.

Andrea Zifarelli^1,2, Marilena Giglio^1,2, Giansergio Menduni^2,3, Angelo Sampaolo^1,2, Pietro Patimisco^1,2, Vittorio M N Passaro³, Hongpeng Wu^1,4, Lei Dong^1,4, Vincenzo Spagnolo^1,2.

Abstract

We report on a statistical tool based on partial least-squares regression (PLSR) able to retrieve single-component concentrations in a multiple-gas mixture characterized by spectrally overlapping absorption features. Absorption spectra of mixtures of CO-N2O and mixtures of C2H2-CH4-N2O, both diluted in N2, were detected in the mid-IR range by exploiting quartz-enhanced photoacoustic spectroscopy (QEPAS) and using two quantum cascade lasers as light sources. Single-gas reference spectra of each target molecule were acquired and used as PLSR-based algorithm training data set. The concentration range explored in the analysis varies from a few parts-per-million (ppm) to thousands of ppm. Within this concentration range, the influence of the gas matrix on nonradiative relaxation processes can be neglected. Exploiting the ability of PLSR to deal with correlated data, these spectra were used to generate new simulated spectra, i.e., linear combinations of the reference ones. A Gaussian noise distribution was added to the created data set, simulating the real QEPAS signal fluctuations around the peak value. Compared with standard multilinear regression, PLSR predicted gas concentrations with a calibration error up to 5 times better, even with absorption features with spectral overlap greater than 97%.

Entities: Chemical Disease Gene Species

Year: 2020 PMID： 32674566 PMCID： PMC8009469 DOI： 10.1021/acs.analchem.0c00075

Source DB: PubMed Journal: Anal Chem ISSN： 0003-2700 Impact factor: 6.986

Optical trace gas detection techniques are of great interest for a wide range of real world applications spanning from environmental protection[1,2] and health monitoring[3,4] to industrial process control and security,[5,6] since they offer high sensitivity and selectivity together with fast response time. Among all optical spectroscopic techniques, quartz-enhanced photoacoustic spectroscopy (QEPAS) has emerged as a powerful, reliable, and robust technique, with demonstrated high sensitivity in the detection of several trace gas species.[7−10] QEPAS exploits the photoacoustic effect and uses a quartz tuning fork (QTF) to detect the weak sound waves produced by molecules absorbing modulated light. Although based on a light absorption process, QEPAS works differently from direct absorption spectroscopy. The QTF signal strongly depends on the acoustic wave generation efficiency within the gas sample, which in turns is strongly related to the targeted gas molecules. For these reasons, the measured signals are not directly proportional to the line strength of the targeted absorption features. In addition, QEPAS is a wavelength-independent technique, in which the same QTF can operate with laser sources emitting in the spectral range from UV to THz. This classifies QEPAS as an ideal technique for multigas detection. Many applications require the detection of one or more analytes in a multigas mixture, specially at atmospheric pressure. Several approaches have been developed in QEPAS sensing to selectively identify different components within a mixture. One possibility is to exploit the full tunability range of a single laser source to target not-overlapped absorption features.[11] Another approach is to employ several laser sources, each one targeting the absorption feature of a single component of the gas mixture. In this case, the light sources are shined in sequence[12] or, for the specific instance of a two-gas mixture, simultaneously excite the QTF fundamental and first overtone resonance mode, respectively.[13−15] QEPAS typically targets isolated absorption features to evaluate the analytes concentrations and avoid interferences from other species contained in the gas matrix. A partially resolved or unresolved spectrum, resulting from the overlap of absorption features of different gases requires a distinct approach. Multivariate analysis (MVA) is generally used to analyze and discriminate each independent analyte of a gas mixture, treated as a physical system made up of several components. The most common MVA approach is the multilinear regression (MLR), which extends the standard linear regression to multiple variables. MLR models the relationship between the concentration of each component and the measured spectra based on ordinary least-squares. An iterative fit procedure is employed to minimize the sum-of-squares of the differences between measured and predicted values, with no possibility to predict the presence of other components besides the ones used as references. This procedure is efficient as long as the experimental data, namely X-variables, are uncorrelated or at least weakly correlated, and affected by low noise. MLR shows a high statistical significance when there is no collinearity among the predictor variables. When two or more variables in a multiple regression model are correlated (multicollinearity), they cannot independently predict the value of the dependent variable, leading to a decrease in the statistical significance of the prediction. Therefore, when dealing with complex systems made of correlated data,[16] which is the case for spectroscopic analysis of overlapping absorption features of different components in a gas mixture, these requirements cannot be guaranteed, and the use of an MLR approach can result in a lack of precision and accuracy.[17,18] Moreover, MLR models can easily fall into overfitting problems dealing with spectroscopic data, due to the high number of involved variables.[19] Sampaolo[20] and Giglio[21] detected merged absorption features using QEPAS-based sensors and analyzed using MLR. In both cases, the regression technique results in calculating values with large confidence intervals. This suggests empowering all the laser based spectroscopic techniques with a more sophisticated analysis tool whenever strongly overlapping gas species must be analyzed in a mixture. Partial Least Squares Regression (PLSR) is an excellent candidate to overcome these limitations. Originally developed as a tool for social and economic sciences,[22] PLSR has established itself as a solid technique for modeling complex systems in physics and chemistry branches in recent years.[23−27] PLSR extends the MLR approach to deal with a large number of strongly correlated and noisy experimental data. In this work, we combined the QEPAS technique with PLSR to identify gas components in a mixture with strongly overlapping absorption features over the full spectral dynamic range of quantum cascade laser (QCL) sources. A two-gas mixture composed of carbon monoxide (CO) and nitrous oxide (N2O) and a three-gas mixture of acetylene (C2H2), methane (CH4), and nitrous oxide have been analyzed. Both mixtures are diluted in nitrogen (N2). Absolute concentrations of gas components in the mixtures were estimated starting from single-gas reference spectra. Then, the results of the PLSR algorithm were compared with a standard MLR approach.

Partial Least Squares Regression

The multiple regression equation in matrix form is as follows:where X is the n × m matrix of independent variables (matrix of experimental spectra), Y the n × k matrix of the predicted values of the variables (matrix of physical parameters to be estimated, i.e., the gas component concentrations), B is the m × k matrix of the regression coefficients, and E is the n × k errors matrix, assumed to be uncorrelated and with the same variance. PLSR is based on the assumption that the investigated system is influenced by a set of factors called latent variables (LVs).[28] The prediction is achieved by extracting LVs having the best predictive power from the predictors.[18] From a geometrical point of view, this procedure is equal to a projection of the X-variables into a new space, representative of the latent variables. The strength of the PLSR method compared to other MVA techniques (i.e., multiple linear regression, ridge regression etc.) is in the stability of predictors. Since the uncertainty of the estimated parameters is the dominant factor in the variability of predictors, it is crucial to keep the number of variables as low as possible. In this way, PLSR gives the minimum number of necessary variables.[29,30] This technique can be used for both modeling the underlying relationship between physical or chemical parameters and performing predictive analysis on a sample with unknown properties requiring evaluation. The latter condition assumes a machine learning-like approach, where the experimental data set X is split into a training-set X, associated with a known Y, and a test-set X, with Y to be evaluated. With the aim of evaluating the concentrations of chemical species in a multigas mixture, the training-data set will be developed starting from single-gas spectra used as reference spectra to calibrate the model and analyze the gas mixtures spectra. The PLSR analysis is performed on the training-set to calculate the regression coefficients matrix B, used in turn to evaluate the Y matrix via the matrix product: Y = X×B. The regression matrix B provides information about the correlation between the experimental data set and the concentration of the corresponding gas, since a high absolute value of the regression coefficient highlights a significant influence of the experimental point on the gas concentration.[18] To perform PLSR, a MATLAB code has been developed using MATLAB built-in Simple Partial Least Squares (SIMPLS) algorithm to perform the regression.[31,32] In contrast with MLR, where the error on calibration is calculated as the error on the regression coefficients, the evaluation of the PLSR calibration error is not straightforward. A reliable tool for the estimating calibration errors is the 10-fold cross-validation (CV).[33] This procedure returns the root mean squared error of calibration (RMSECV, ε) based on the algorithm performance in the training step, which is known a priori without any information about the test data set. Therefore, RMSECV is based on the predictive ability of the PLS algorithm rather than on the quality of the measurements under test. For these reasons, the estimation of CV-RMSEP will be considered in the following discussion as the error associated with the PLS prediction of gas concentrations in the analyzed mixtures.[34−36] Root mean squared error of prediction (RMSEP) will be also evaluated comparing the expected concentrations and the retrieved values, for each analyte.[37]

Experimental Section

The PLSR algorithm has been tested to retrieve the concentration of the single species in a two-gas mixture and a three-gas mixture. Absorption spectra of gas mixtures were acquired by using the QEPAS setup depicted in Figure .

Figure 1

QEPAS sensor for multigas detection. QCL, quantum cascade laser; P, pinhole; L, lens; QTF, quartz tuning fork; ADM, acoustic detection module; PM, power meter; TA, transimpedance amplifier; TEC, thermo-electric cooler; WFG, waveform generator; DAQ, digital acquisition card; PC, personal computer; GC, gas cylinder; and PRESSURE CTRL, pressure controller. An AdTech QCL with a central emission wavelength at 4.61 μm and a Corning QCL with a central emission wavelength at 7.72 μm were used to detect N2O and CO in a two-gas mixture and C2H2, CH4, and N2O in a three-gas mixture, respectively. For both mixtures, the absorption features can be detected by varying the laser injection current within its dynamic range, at a fixed operating temperature. The setup allowed an easy interchange of QCL sources. The laser beams were first spatially filtered by using a pinhole and then focused within an acoustic detection module (ADM) by means of a lens with focal length f = 75 mm. The ADM (Thorlabs ADM01) consisted of a gas cell, equipped with two windows (Thorlabs WW01050-E1 with 2–5 μm AR coating and Thorlabs WW71050-E3 with 7–12 μm AR coating), a pair of connectors for gas inlet and outlet and a custom T-shaped quartz tuning fork with a resonance frequency of f0 = 12458 Hz and a quality factor of 12 500 at atmospheric pressure.[38] A power meter was set behind the ADM for alignment purpose. All measurements were performed at atmospheric pressure (P = 760 Torr) and room temperature (T = 25 °C). The piezoelectric current generated by the QTF was collected and transduced into a voltage signal by a transimpedance amplifier with a feedback resistor Rfb = 10 MΩ. The voltage signal was sent to an EG&G model 7265 lock-in amplifier, set with a time constant of 100 ms. Both QCLs were polarized using an Arroyo 5300 current driver. An Arroyo 4300 thermo-electric cooler (TEC) was used to stabilize the operating temperature. QCL emission frequencies were tuned by sweeping the laser injection current with a 2 mHz triangular ramp and were simultaneously modulated by a sinusoidal waveform with frequency f0/2. The lock-in demodulated the QTF voltage signal at f0: in this way the sensor was operated in 2f based wavelength modulation. Both the sweep and the modulation were provided by a Tektronix AFG3102 waveform generator, which also supplied the reference signal for the lock-in amplifier at f0/2. The demodulated output signal was then sent to a DAQ card (National Instrument 6002) and stored on a PC using a LabVIEW-based software. All the measurements were performed in a continuous gas flow of 30 sccm. Four cylinders with certified concentrations of the single gas targets (1000 ppm of CO in N2, 1000 ppm of N2O in N2, 1000 ppm of C2H2 in N2, and 1% of CH4 in N2) and one cylinder of pure N2 were used to generate different gas mixtures. A gas mixer (MCQ Instrument Gas Blender 1003) was used to manage gas flows for up to 3 different input gas lines at the same time, with 1σ single-channel accuracy of ∼1% provided by the instrument datasheet. The pressure inside the gas line was fixed and monitored by an MKS Pressure Controller Type 649.

Results and Discussion

Two-Gas Mixture Detection

HITRAN database[39] was used to simulate the absorption cross section of 1000 ppm of CO in N2 and 1000 ppm of N2O in N2, at atmospheric pressure and room temperature over the whole spectral dynamic range of the AdTech QCL (2188.8–2191.2 cm–1). The results of the simulation are shown in Figure (a). The CO exhibits a Lorentzian-like absorption feature peaked at 2190.02 cm–1 while the N2O shows two partially merged absorption features with peaks at 2189.35 and 2189.4 cm–1 and a well-isolated Lorentzian-like absorption feature at 2190.35 cm–1.

Figure 2

(a) HITRAN simulation of absorption cross section spectrum and (b) QEPAS scan of 1000 ppm of CO in N2 (black curve) and 1000 ppm of N2O in N2 (red curve).

(a) HITRAN simulation of absorption cross section spectrum and (b) QEPAS scan of 1000 ppm of CO in N2 (black curve) and 1000 ppm of N2O in N2 (red curve). As a first step, the single-gas reference spectra were acquired by analyzing gas mixture coming directly from the gas cylinders, without the use of the mixer. Hence, the reference spectra are referred to certified concentrations. To scan the spectral range reported in Figure (a), the AdTech QCL operating temperature was set at 15 °C, and the injection current was tuned from 230 mA to 310 mA. The maximum optical power measured at the injected current of 310 mA (corresponding to 2188.8 cm–1) was 75 mW. The QEPAS signal was collected with a lock-in amplifier demodulation phase φ1 = −132.17°, corresponding to the phase value maximizing the CO peak signal. The QEPAS scan referred to the positive slope of the triangular ramp is reported in Figure (b). In order to enlarge the data statistics, both positive and negative ramp slopes were considered in the PLSR algorithm. With a signal acquisition time of 300 ms, a single spectrum consisted in 1666 data-points. To ensure the reproducibility of the measurements, the reference data are collected every time a new set of mixtures spectra is acquired. In this way, the consistency of the operative conditions is guaranteed. The CO reference spectrum shows a single absorption feature with a signal intensity of ∼2 mV, corresponding to the isolated absorption peak at 2190.02 cm–1 in Figure (a). From left to right, the N2O reference spectrum shows three features with peak intensities of ∼1, ∼0.3, and ∼0.6 mV. The first two peaks are clearly due to the partially merged absorption features at 2189.35 cm–1 and at 2189.4 cm–1, while the third peak is associated with the isolated absorption line peaked at 2190.35 cm–1. The 1σ-noise level measured far from the absorption features is ∼3 μV for both spectra, resulting in a Signal-to-Noise Ratio (SNR) of 660 and 330 for CO and N2O, respectively. The measured noise level is comparable to the calculate thermal noise value of 2.6 μV, which affects the resonator in the employed configuration.[40] Starting from the certified concentrations of 1000 ppm of N2O in N2 and 1000 ppm of CO in N2, the following mixtures of N2O–CO were generated by using the gas blender: 250–750 ppm, 500–500, and 750–250 ppm, in N2. All QEPAS measurements were performed by setting the lock-in phase to φ1. The acquired QEPAS spectra scans are reported in Figure .

Figure 3

QEPAS scan acquired for three mixtures containing 250 ppm of N2O and 750 ppm of CO (black curve), 500 ppm of N2O and 500 ppm of CO (red curve), and 750 ppm of N2O and 250 ppm of CO (blue curve), respectively, in N2. All absorption features of N2O and CO are clearly distinguishable. Spectral overlap is only limited to the superposition of the right-side negative lobe of the CO absorption feature with the left-side negative lobe of the N2O absorption feature peaked at 2190.35 cm–1.

Two-Gas Mixture PLS Model Calibration and Test

Data analysis starts from the configuration of the training data set for PLS model calibration. MATLAB-based algorithm projects the training data set on a number of PLS factors, i.e., the number of latent variables, equal to two, representing the components of the gas mixtures. However, larger data sets correspond to lower calibration errors. Unlike MLR, PLSR allows the employment of simulated data to perform the regression.[41,42] Hence, the experimental data set was enriched by simulated spectra calculated as linear combinations of the actual reference spectra.[25] Actual reference spectra were combined using 10 coefficients, from 0 to 0.9 at step of 0.1, properly chosen according to the concentration range expected in mixtures under investigation. The simulation process resulted in 10PLS-factors = 100 simulated spectra. This enlargement of the experimental data set represents one of the main advantages of PLSR compared to MLR. In contrast to MLR, PLSR allows for the addition of input noise fluctuations on the simulated spectra to consider the non-negligible fluctuations affecting the reference spectra. A reliable distribution of the input noise fluctuations must match the distribution of the QEPAS signal fluctuations around its mean value, namely the peak value. The experimental distribution has been retrieved by repeatedly scanning over the QEPAS absorption peaks. For both target gases, a Gaussian noise distribution with a 1σ-noise fluctuation of ∼3% around the mean value was obtained. Hence, a white Gaussian noise was superimposed to the simulated reference spectra. With these conditions, the X-training data set is a 100 × 1666 matrix (100 different simulated reference spectra, each one composed of 1666 data samples) while the Y-training data set is a 100 × 2 matrix with the related gas concentrations. A preliminary analysis on the whole data set showed that modeling the system with 2 PLS factors explains more than 99% of Y variance, confirming the validity of the theoretical assumptions about physical relevance of PLS factors. Then, the PLSR algorithm is used to calculate regression coefficients (matrix B). In Table , the results of the PLSR applied to the three gas mixtures are reported, together with MLR results and their associated calibration errors ε. The nominal concentrations of the mixture components are also reported with an accuracy of η = ±10 ppm, calculated by considering the gas mixer flow accuracy of 1% starting from the certified gas cylinder concentrations. Considering this instrumental limitation, we used the cross-validation error ε as main indicator for quantifying the robustness of the regression model employed, i.e., PLSR and MLR.

Table 1

PLSR and MLR Results (Concentrations and Calibration Errors) for Each Component of Dual-Gas Mixturesa

The nominal concentrations are also reported together with the accuracy determined from the gas mixer datasheet.

The nominal concentrations are also reported together with the accuracy determined from the gas mixer datasheet. The results show that PLSR and MLR predict the same concentration values in gas mixtures, while the RMSECV estimated by PLSR is up to 3 times lower than the MLR estimation. The estimated values of gas concentrations are within the 2σ interval determined by the accuracy of the gas mixer. The PLSR-RMSEP are equal to 18 and 17 ppm, while the MLR-RMSEP are equal to 19 and 17 ppm, for N2O and CO, respectively. Due to the instrumental limitation, it is not possible to compare the collected results with the reference standard concentration values in the gas line. However, the stability of the algorithms results, which is strictly connected to the regression precision, can be verified by performing the analysis on repeated measurements. As expected from the theoretical background,[18] the PLSR results are less affected by experimental data fluctuations. This means that bias effects in concentrations estimation can be removed in a validation step to be performed before moving the sensor outside the laboratory. The influence of the input noise fluctuations added to the simulated spectra on the calibration error has been evaluated. PLSR analysis was performed by varying the input 1σ-noise fluctuation to evaluate the effect both on the retrieved concentrations and on the associated errors ε. Negligible variations in the estimated concentration values (<1 ppm) were calculated for fluctuations up to 50%. Whereas, the ε values are strongly dependent from input noise fluctuations. Figure shows the total RMSECV, calculated as the square root of the sum of the squared ε of the single gases divided by the number of gases, as a function of input 1σ-noise fluctuations.

Figure 4

Total RMSECV as a function of the 1σ-noise fluctuation added to simulated spectra in training data set (black squares) and the best linear fit (red line). Inset: zoom in the range 0–5% of noise fluctuations, as typical values in spectroscopic experiments.

Three-Gas Mixture Detection

Gas mixtures with three components having a strong spectral overlap were tested to get a benchmark on the efficiency of PLSR in analyzing QEPAS-based absorption features. With this aim, C2H2, CH4, and N2O were selected. Figure (a) shows HITRAN database simulations[39] at atmospheric pressure and room temperature of the listed gases absorption cross-section within the emission spectral range of the Corning QCL operated at 30 °C, when varying the injection current from 200 to 270 mA (1295.5 cm–1- 1296.5 cm–1). The cross-sections are scaled on the certified concentrations in gas cylinders: 1000 ppm for C2H2, 1000 ppm for N2O, and 10 000 ppm for CH4, in N2. C2H2 has a strong absorption features peaked at 1295.78 cm–1 and a weak one at 1296.16 cm–1; CH4 has two absorption lines falling at 1295.81 and 1296.12 cm–1; and N2O shows a single absorption line peaked at 1296.27 cm–1. The maximum optical power detected at the injected current of 270 mA is 112 mW. To build the training data set, the single-gas reference spectra for the three target gases were acquired directly from the gas cylinders. The lock-in amplifier demodulation phase was fixed at φ2 = −136.75°, corresponding to the phase maximizing the C2H2 peak signal. This choice allowed the enhancement of the C2H2 spectral feature, showing the weakest absorption coefficient. The three QEPAS spectral scans obtained by sweeping the QCL injection current are reported in Figure (b). As for the two-gas mixtures, the reference data are collected every time a new set of mixtures spectra is acquired, in order to ensure the consistency of the operative conditions.

Figure 5

(a) HITRAN simulation of the absorption cross section spectrum and (b) QEPAS scan of 1000 ppm of C2H2 in N2 (black curve), 10 000 ppm of CH4 in N2 (red curve), and 1000 ppm of N2O in N2 (blue curve). The C2H2 reference spectrum shows a characteristic line-shape of the second derivative of Lorentzian profile, with a signal intensity of ∼1.9 mV. Due to the choice of lock-in demodulation phase, the spectral characteristics of N2O have the same line-shape of C2H2 but inverted. The QEPAS CH4 reference spectrum has a pronounced absorption peak of ∼4.3 mV, corresponding to the strongest absorption peak at 1295.81 cm–1, while the absorption feature at 1296.12 cm–1 is also recognizable but inverted in shape due to a difference in signal phase with the peak at 1295.81 cm–1. On the left side of the graph, CH4 and C2H2 strongly overlap; on the right side, the N2O absorption feature is weakly disturbed by the other two gases. The measured 1σ-noise is ∼4 μV for all three gases, comparable with the QTF thermal noise and resulting in an SNR of 470, 1150, and 1500 for C2H2, CH4, and N2O, respectively. Starting from the certified concentrations, five mixtures of C2H2–CH4–N2O, with a fixed concentration of 3000 ppm of CH4 have been generated, as reported in the legend of Figure . All the QEPAS measurements were performed by setting the lock-in phase to φ2. The acquired QEPAS spectra scans are reported in Figure .

Figure 6

QEPAS scan for five mixtures containing 0 ppm of C2H2, 3000 ppm of CH4 and 700 ppm of N2O (purple curve), 150 ppm of C2H2, 3000 ppm of CH4 and 550 ppm of N2O (green curve), 350 ppm of C2H2, 3000 ppm of CH4 and 350 ppm of N2O (blue curve), 550 ppm of C2H2, 3000 ppm of CH4 and 150 ppm of N2O (red curve), 700 ppm of C2H2, 3000 ppm of CH4, and 0 ppm of N2O (black curve). As expected, the strong absorption feature of N2O is well recognizable at 1296.27 cm–1(650 data sample). The CH4 and C2H2 absorption features are completely overlapped in the wavenumber range from 1295.65 to 1295.91 cm–1, while deformations of the spectra induced by the increasing amount of acetylene can be observed in the 1295.50 cm–1– 1295.65 cm–1 sample range.

Three-Gas Mixture PLS Analysis

The PLSR has been performed by projecting the training data set on three PLS factors, representing the number of gas components in the mixtures. As for the two-gas mixture, a 1000 × 1666 matrix X, has been obtained by simulating 10PLS-factors = 1000 spectra with a superimposed Gaussian noise. The Y training data set is a 1000 × 3 matrix with the associated gas concentrations. Preliminary analysis on the whole data set shows that modeling the system with 3 PLS factors explains more than 99% of Y variance. The PLSR is therefore performed, and the regression coefficients matrix B is calculated. As for the two-gas mixture analysis, variations lower than 1 ppm in the estimated concentration values were calculated for input 1σ-noise fluctuations up to 50%. The analysis of the Total RMSECV as a function of Gaussian noise fluctuation showed a linear trend with a best fit equation y = (4.65 ± 0.04)x + (−0.61 ± 0.64) and R2 = 0.999. In Table , the PLSR results for the five mixtures shown in Figure and related MLR results are reported.

Table 2

PLSR and MLR Results (Concentrations and Calibration Errors) for Each Component of the Analyzed Three-Gas Mixturesa

The nominal concentrations are also reported together with the accuracy determined by the gas mixer datasheet.

The nominal concentrations are also reported together with the accuracy determined by the gas mixer datasheet. In contrast to the two-gas mixtures analysis, PLSR and MLR predict different concentration values. The estimated values of gas concentrations are within the 2σ interval determined by the accuracy of the gas mixer, with few exceptions for C2H2. This can be ascribed to the difficulties of both methods in the identification of the C2H2 contribution, due to the strong overlap with the CH4 absorption line, as supported by the highest relative error measured for C2H2 in all mixtures. With three-gas mixtures with strongly overlapped features, calibration error by PLSR is significantly lower than MLR, up to a factor of ∼5. The calculated PLSR-RMSEP are equal to 32, 113, and 9 ppm, while the MLR-RMSEP are equal to 39, 130, and 9 ppm for C2H2, CH4, and N2O, respectively. In mixture 1 with no C2H2, both PLSR and MLR predict the presence of C2H2, with a concentration of 44 and 55 ppm, respectively. A lower accuracy for the MLR can be explained considering that the algorithm is forced to search for all the gas components set as reference spectra. Therefore, a higher bias in regression is expected, reducing the accuracy of the prediction. However, as reported for the two-gas mixtures, even with three-gas mixtures, the PLSR results were verified to be more stable to repeated measurements compared to MLR ones. The evidence of the bias influence can be observed repeating the analysis excluding the C2H2 reference spectrum from the training data set. The retrieved concentrations thus become 2958 and 710 ppm for CH4 and N2O, respectively, and a decrease in calibration error is obtained, with εCH4 = 4.2 ppm and εN2O = 0.4 ppm. In the mixture with no N2O, both methods predict a negative value for N2O concentration: this is obviously not possible and must be intended as a zero concentration. However, with respect to C2H2 estimation in the mixture with no C2H2, both algorithms are more accurate in the prediction because the N2O absorption feature is well-defined within all mixture spectra. These kinds of false-positive results may occur when dealing with missing components, as well as false-negative results may occur when one of the target analytes generates a negligible QEPAS signal. For real-field applications, regression algorithms are trained on analyte concentrations similar to the ones expected in the sample to test. In this case, a threshold concentration can be set to discern the effective presence of a chemical species, based on the expected sample composition.

Overlap Parameter Estimation

The results obtained for the two-gas mixtures showed that, when dealing with weakly overlapping spectral features, the PLSR and the MLR return the same values, but the PLSR calibration error estimation can be up to 3 times lower than that of the MLR. When analyzing spectra originated by strongly overlapping absorbing features, PLSR predicts different gas concentrations with respect to MLR, with a lower calibration error up to a factor of 5. To quantify the overlap between absorption features, a parameter should be introduced. Considering the Lorentzian-like line-shape (see Figures (a) and 5(a)), the overlap parameter Z between two absorption features labeled as 1 and 2 can be defined as follows:where x is the peak wavenumber, and w is the normalized Lorentzian width defined as the ratio between the full-width-half-maximum of the Lorentzian curve w and the peak value A. The overlap parameter tends to 0 when the distance between the absorption peaks tends to w,1 + w,2, while Z is equal to 1 when x1 = x2. For features whose peaks distance is greater than w,1 + w,2, Z is negative and overlap effects are negligible. The overlap parameters (in %) calculated for adjacent absorption peaks in the three-gas mixture are ZCO-N2O = 7.3%, ZCH4-N2O = 79.8%, and ZC2H2-CH4 = 97.4%. With overlap as high as 97%, PLSR is able to identify both contributions with a precision significantly higher than that of the standard MLR.

Conclusions

In this work, we combined QEPAS with PLSR analysis to retrieve single components gas concentrations in multigas samples. Two different mixtures have been analyzed, one composed of two gases (CO–N2O) and the other one composed of three gases (C2H2–CH4–N2O), both diluted in N2. A QEPAS sensor has been realized using a custom quartz tuning fork and employing two QCLs emitting at 4.61 and 7.72 μm for the two- and three-gas mixture investigation, respectively. As a first step, the single-gas reference spectra were acquired. The PLSR procedure was implemented using a training-test approach. The training data set was built starting from the reference spectra and was enlarged by means of simulated spectra, calculated as linear combinations of reference ones, exploiting the ability of PLS model to deal with correlated measurements. A Gaussian distribution noise was added to the simulated spectra to consider the experimental errors involved in the measurements. PLSR calibration errors have been calculated as a cross-validation error on the training data set. Then, the PLSR algorithm was employed to retrieve gas concentrations in a series of gas mixtures generated from certified single-gas concentrations. The estimated values of gas concentrations are within the 2σ interval determined by the accuracy of the gas mixer. Compared to MLR, the error of calibration decreases by a factor of ∼3, for CO–N2O mixtures, and up to a factor of ∼5, for C2H2–CH4–N2O mixtures. To properly quantify the superposition among the spectral features, an overlap parameter was defined by considering the distance between the spectral peaks and their width. This allowed us to affirm that PLSR can identify a single-gas contribution in a mixture even when a 97% spectral overlap occurs. Further applications of the PLSR approach can involve the analysis of gas mixtures with missing components (like mixture 1 in Table ) exploiting the PLS capability of finding the number of components in the training step. The mutual influence of the analytes, as in the case of species acting as promoters, could also be estimated. The next step will be the testing of the sensor outdoor, in this case the concentration of water vapor acting as relaxation promoter must be fixed by using a Nafion humidifier.[13] The ADM will also be heated up to 40 °C to avoid adsorption of sticky molecules like H2O and NH3 on the internal surfaces of the sensor.[43]

15 in total

1. Thermoelectrically cooled quantum-cascade-laser-based sensor for the continuous monitoring of ambient atmospheric carbon monoxide.

Authors: Anatoliy A Kosterev; Frank K Tittel; Rüdeger Köhler; Claire Gmachl; Federico Capasso; Deborah L Sivco; Alfred Y Cho; Shawn Wehe; Mark G Allen
Journal: Appl Opt Date: 2002-02-20 Impact factor: 1.980

2. Allan Deviation Plot as a Tool for Quartz-Enhanced Photoacoustic Sensors Noise Analysis.

Authors: Marilena Giglio; Pietro Patimisco; Angelo Sampaolo; Gaetano Scamarcio; Frank K Tittel; Vincenzo Spagnolo
Journal: IEEE Trans Ultrason Ferroelectr Freq Control Date: 2015-10-27 Impact factor: 2.725

3. Quartz-enhanced photoacoustic sensor for ethylene detection implementing optimized custom tuning fork-based spectrophone.

Authors: Marilena Giglio; Arianna Elefante; Pietro Patimisco; Angelo Sampaolo; Fabrizio Sgobba; Hubert Rossmadl; Verena Mackowiak; Hongpeng Wu; Frank K Tittel; Lei Dong; Vincenzo Spagnolo
Journal: Opt Express Date: 2019-02-18 Impact factor: 3.894

4. Tuning forks with optimized geometries for quartz-enhanced photoacoustic spectroscopy.

Authors: Pietro Patimisco; Angelo Sampaolo; Marilena Giglio; Stefano Dello Russo; Verena Mackowiak; Hubert Rossmadl; Alex Cable; Frank K Tittel; Vincenzo Spagnolo
Journal: Opt Express Date: 2019-01-21 Impact factor: 3.894

5. Laser spectroscopy applied to environmental, ecological, food safety, and biomedical research.

Authors: Sune Svanberg; Guangyu Zhao; Hao Zhang; Jing Huang; Ming Lian; Tianqi Li; Shiming Zhu; Yiyun Li; Zheng Duan; Huiying Lin; Katarina Svanberg
Journal: Opt Express Date: 2016-03-21 Impact factor: 3.894

6. QEPAS based ppb-level detection of CO and N2O using a high power CW DFB-QCL.

Authors: Yufei Ma; Rafał Lewicki; Manijeh Razeghi; Frank K Tittel
Journal: Opt Express Date: 2013-01-14 Impact factor: 3.894

Review 7. Detection of cancer through exhaled breath: a systematic review.

Authors: Agne Krilaviciute; Jonathan Alexander Heiss; Marcis Leja; Juozas Kupcinskas; Hossam Haick; Hermann Brenner
Journal: Oncotarget Date: 2015-11-17

8. Multivariate Analysis as a Tool to Identify Concentrations from Strongly Overlapping Gas Spectra.

Authors: Yannick Saalberg; Marcus Wolff
Journal: Sensors (Basel) Date: 2018-05-15 Impact factor: 3.576

Review 9. Recent Developments in Spectroscopic Techniques for the Detection of Explosives.

Authors: Wei Zhang; Yue Tang; Anran Shi; Lirong Bao; Yun Shen; Ruiqi Shen; Yinghua Ye
Journal: Materials (Basel) Date: 2018-08-06 Impact factor: 3.623

10. Broadband detection of methane and nitrous oxide using a distributed-feedback quantum cascade laser array and quartz-enhanced photoacoustic sensing.

Authors: Marilena Giglio; Andrea Zifarelli; Angelo Sampaolo; Giansergio Menduni; Arianna Elefante; Romain Blanchard; Christian Pfluegl; Mark F Witinski; Daryoosh Vakhshoori; Hongpeng Wu; Vittorio M N Passaro; Pietro Patimisco; Frank K Tittel; Lei Dong; Vincenzo Spagnolo
Journal: Photoacoustics Date: 2019-12-26

5 in total

1. Calibration of Quartz-Enhanced Photoacoustic Sensors for Real-Life Adaptation.

Authors: Jesper B Christensen; David Balslev-Harder; Lars Nielsen; Jan C Petersen; Mikael Lassen
Journal: Molecules Date: 2021-01-25 Impact factor: 4.411

2. Fiber-Coupled Quartz-Enhanced Photoacoustic Spectroscopy System for Methane and Ethane Monitoring in the Near-Infrared Spectral Range.

Authors: Giansergio Menduni; Fabrizio Sgobba; Stefano Dello Russo; Ada Cristina Ranieri; Angelo Sampaolo; Pietro Patimisco; Marilena Giglio; Vittorio M N Passaro; Sebastian Csutak; Dario Assante; Ezio Ranieri; Eric Geoffrion; Vincenzo Spagnolo
Journal: Molecules Date: 2020-11-28 Impact factor: 4.411

3. Rapid Quantitative Analysis of IR Absorption Spectra for Trace Gas Detection by Artificial Neural Networks Trained with Synthetic Data.

Authors: Jens Goldschmidt; Leonard Nitzsche; Sebastian Wolf; Armin Lambrecht; Jürgen Wöllenstein
Journal: Sensors (Basel) Date: 2022-01-23 Impact factor: 3.576

4. High-concentration methane and ethane QEPAS detection employing partial least squares regression to filter out energy relaxation dependence on gas matrix composition.

Authors: Giansergio Menduni; Andrea Zifarelli; Angelo Sampaolo; Pietro Patimisco; Marilena Giglio; Nicola Amoroso; Hongpeng Wu; Lei Dong; Roberto Bellotti; Vincenzo Spagnolo
Journal: Photoacoustics Date: 2022-03-21

5. Multi-gas quartz-enhanced photoacoustic sensor for environmental monitoring exploiting a Vernier effect-based quantum cascade laser.

Authors: Andrea Zifarelli; Raffaele De Palo; Pietro Patimisco; Marilena Giglio; Angelo Sampaolo; Stéphane Blaser; Jérémy Butet; Olivier Landry; Antoine Müller; Vincenzo Spagnolo
Journal: Photoacoustics Date: 2022-09-05

5 in total