
Selection of the Effective Characteristic Spectra Based on the Chemical Structure and Its Application in Rapid Analysis of Ethanol Content in Gasoline.

Ke Li, Chi Zhang, Biao Du, Xiaoping Song, Qi Li, Zhengdong Zhang.

Abstract

Near-infrared (NIR) spectroscopy is one of the most rapid methods for determining ethanol content in gasoline. Wavelength selection is a key step in the multivariate calibration analysis of NIR spectra. To improve the detection accuracy of ethanol content in gasoline and provide a simpler interpretation, we established a rapid NIR analysis method based on effective characteristic spectra. Five effective characteristic spectral bands were selected according to the molecular structure of ethanol, followed by the development of four modeling schemes. The spectra of the four modeling schemes, the NIR full spectra, and the variable importance in projection (VIP) spectra were used for modeling and analysis. The model established on the effective characteristic spectra without the interference spectra of aromatic hydrocarbons achieved the best performance. In addition, the model was further evaluated by internal cross-validation and external validation. The model's evaluation parameters were as follows: the root mean square error of cross-validation (RMSECV) was 0.6193, the correlation coefficient of internal cross-validation (RCV2) was 0.9995, the root mean square error of prediction (RMSEP) was 0.5572, and the correlation coefficient of external prediction validation (RP2) was 0.9991. The effective characteristic spectra model had smaller RMSEP and RMSECV values and larger RCV2 and RP2 values than the full spectra and VIP spectra models. In conclusion, the effective characteristic spectra model had the highest accuracy and could provide rapid analysis of the ethanol content in gasoline.
© 2022 The Authors. Published by American Chemical Society.


Year:  2022        PMID: 35721958      PMCID: PMC9202040          DOI: 10.1021/acsomega.2c02282

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Ethanol-gasoline is a new substitute fuel[1] made by mixing ordinary gasoline and ethanol fuel in predetermined volume ratios. Ethanol fuel is a renewable resource, and the use of ethanol-gasoline can significantly alleviate energy demand pressure. Ethanol-gasoline, with its high oxygen content, burns more completely and effectively reduces the emission of carbon monoxide, hydrocarbons, and other harmful substances in automobile exhaust.[2−4] As a low-carbon, clean, and high-quality green fuel, ethanol-gasoline has received considerable attention in recent years.[5] Random inspections of ethanol-gasoline quality at various gas stations have highlighted some problems; for example, the ethanol content was either too high or too low. The use of such substandard ethanol-gasoline can lead to engine speed instability and engine failure.[6] Therefore, it is necessary to conduct spot checks to ensure that refined oil products on the market, including ethanol-gasoline, meet a high quality standard. Currently, the conventional method for determining ethanol content in gasoline is gas chromatography, which is reliable. However, this method has disadvantages, including long analysis times and bulky instruments that cannot easily be carried around, so it cannot meet the needs of on-site analysis and real-time detection. To solve this problem, some rapid analytical methods for measuring ethanol content in gasoline have been developed, for example, Raman spectroscopy[7,8] and near-infrared (NIR) spectroscopy.[9,10] NIR is a fast, non-destructive analytical method that consumes small amounts of reagents.[11−14] NIR spectroscopy combined with multivariate statistical methods, such as partial least squares or principal component analysis, has been widely used in the rapid analysis of ethanol content in gasoline.
Mabood et al.,[2] using partial least squares and principal component analysis, established a rapid analytical method to determine ethanol content in gasoline using NIR spectroscopy, achieving a low root mean square error of prediction (RMSEP). Furthermore, Carneiro et al.[15] compared NIR spectroscopy and mid-infrared (MIR) spectroscopy in the rapid analysis of the methanol content in ethanol-gasoline, demonstrating good prediction performance for the partial least squares model based on NIR spectroscopy. Wavelength selection (or variable selection) is a key step in the multivariate calibration analysis of NIR spectra.[16] Appropriate wavelength selection can remove uninformative and interfering variables in the spectra to obtain better model prediction performance and improve interpretability.[17] To date, several NIR wavelength selection methods have been developed. The most commonly used methods include uninformative variable elimination (UVE),[18] the successive projections algorithm (SPA),[19] moving window partial least squares (MWPLS),[20] interval partial least squares (iPLS),[21] simulated annealing (SA),[22] genetic algorithm (GA),[23] ant colony optimization (ACO),[24] and variable importance in projection (VIP).[25] These wavelength selection methods are purely data-driven, and each has its own characteristics and advantages in particular applications. Consequently, none of them achieves good model performance for all types of spectral data. In addition, these methods cannot solve the problem of “false correlations” between spectral variables and properties caused by environmental factors or instrument performance. Therefore, the above-mentioned methods have found no commercial application in the rapid analysis of ethanol-gasoline.
To improve the accuracy of the rapid analysis of ethanol content in gasoline using NIR spectroscopy, it is necessary to establish an effective method for selecting characteristic spectra. The chemical structure determines the properties of substances, and different functional groups or chemical bonds in chemical substances have their corresponding characteristic spectral bands.[26−28] Modeling using characteristic spectral bands based on a chemical structure can improve the predictive performance and interpretability of a model.[16,29] Based on this principle, we established a rapid analytical method for ethanol content analysis based on the effective characteristic spectra. Four calibration models were established by analyzing the effective characteristic spectral bands corresponding to each chemical bond of the ethanol molecule. The four models were compared with the models established by the full spectra and variables selected by the VIP method, to verify the accuracy of the effective characteristic spectra model.

Results and Discussion

Effective Characteristic Spectra Selection

Figure 1 shows the NIR spectra of the 44 ethanol-gasoline samples. An increase in ethanol content in the gasoline changes the absorption intensity unevenly across wavenumbers. In some spectral ranges, the NIR absorbance increased significantly with an increase in ethanol concentration, while in other spectral ranges, the change was barely noticeable. The absorption of ethanol in the NIR spectral region originates from the C–H and O–H bonds in the molecular structure. Because gasoline is mainly composed of hydrocarbons, changes in the absorption strength of the C–H bond were not apparent, whereas changes in the absorption strength of the O–H bond were apparent. Among the characteristic absorption spectra of ethanol, the range from 6060.171 to 7141.113 cm–1 covers the first overtone of O–H stretching.[30−32] The absorption peaks in the region from 4661.104 to 5000.515 cm–1 are related to the combination of the O–H stretching and bending vibrations.[33,34] The spectral range from 6450.422 to 7407.241 cm–1 results from the first overtone of the combination band from C–H + C–H and C–H + C–C stretching;[35,36] and the spectral range from 5660.050 to 6001.389 cm–1 corresponds to the first overtone of C–H stretching.[37] The absorption peaks in the region from 8300.121 to 8500.682 cm–1 are caused by the second overtone of C–H from the methyl group,[15,38] while the peaks below 6060.171 cm–1 contain the first overtone of aromatic C–H stretching.[35] Characteristic spectra of the other gasoline components may therefore adversely affect the determination of ethanol content. The above characteristic absorption spectra of ethanol were ultimately selected as the effective characteristic spectra to build a calibration analysis model.
Figure 1

Raw NIR spectra of 44 ethanol-gasoline samples. The dashed rectangles 1, 2, 3, 4, and 5 represent the spectral ranges of 4661.104–5000.515 cm–1, 5660.050–6001.389 cm–1, 6000.171–7141.113 cm–1, 6450.422–7407.241 cm–1, and 8300.121–8500.682 cm–1, respectively.

To explore the optimal effective characteristic spectra and eliminate the adverse effects of other interfering spectra, we established four modeling schemes that combined different effective characteristic spectra. The four modeling schemes are listed in Table 1. Scheme 1 contains all the characteristic spectral regions. Scheme 2 contains the characteristic spectral regions of the hydroxyl group. Scheme 3 contains the characteristic spectral region of the hydroxyl group and excludes the interference spectra of aromatic groups. Scheme 4 includes all the characteristic spectral regions and excludes the interference spectra of aromatic groups.
Table 1

Effective Characteristic Spectral Region Modeling Scheme

modeling scheme   spectral combination (cm–1)   combination strategy
1                 4661.104–5000.515             all the characteristic spectral regions
                  5660.050–7407.241
                  8300.121–8500.682
2                 4661.104–5000.515             characteristic spectral regions of the hydroxyl group
                  6060.171–7141.113
3                 6060.171–7141.113             characteristic spectral region of the hydroxyl group, excluding the interference spectra of aromatic groups
4                 6060.171–7407.241             all the characteristic spectral regions, excluding the interference spectra of aromatic groups
                  8300.121–8500.682
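
The band selection in Table 1 amounts to keeping only the spectral points whose wavenumber falls inside one of a scheme's ranges. A minimal sketch of that step follows. The ranges are copied from Table 1; the function name `select_bands`, the evenly spaced 2 cm–1 axis, and the zero-filled spectrum are illustrative assumptions, not the instrument's actual sampling grid or data.

```python
# Wavenumber ranges (cm^-1) copied from Table 1.
SCHEMES = {
    1: [(4661.104, 5000.515), (5660.050, 7407.241), (8300.121, 8500.682)],
    2: [(4661.104, 5000.515), (6060.171, 7141.113)],
    3: [(6060.171, 7141.113)],
    4: [(6060.171, 7407.241), (8300.121, 8500.682)],
}


def select_bands(wavenumbers, absorbances, scheme):
    """Keep only the points whose wavenumber lies inside a band of `scheme`."""
    bands = SCHEMES[scheme]
    kept = [
        (w, a)
        for w, a in zip(wavenumbers, absorbances)
        if any(low <= w <= high for low, high in bands)
    ]
    return [w for w, _ in kept], [a for _, a in kept]


# Hypothetical axis: 4000-10000 cm^-1 in 2 cm^-1 steps (assumption, for illustration).
axis = [4000.0 + 2.0 * i for i in range(3001)]
spectrum = [0.0] * len(axis)  # placeholder absorbances, not measured data

w3, a3 = select_bands(axis, spectrum, scheme=3)
```

The reduced vectors `w3` and `a3` would then be passed to the calibration step in place of the full spectrum.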

Results of the Preprocess Methods’ Optimization

The spectral signal of ethanol-gasoline samples may be disturbed by noise, stray light, baseline drift, and other factors,[39] which may introduce irrelevant information into the NIR spectra and affect the accurate analysis of the ethanol content. Five preprocessing methods, namely Savitzky–Golay smoothing (SGM, 13 points with a second-order polynomial), the Savitzky–Golay derivative (SGD, first derivative, 13 points with a second-order polynomial), multiplicative signal correction (MSC), vector normalization (VN), and the standard normal variate (SNV), were used to process the NIR spectra of ethanol-gasoline. Table 2 shows the performance of the full spectra calibration model under the different preprocessing methods. Among the five preprocessing methods, the calibration model based on the first derivative had the smallest root mean square error of cross-validation (RMSECV) and RMSEP values and the largest correlation coefficient of internal cross-validation (RCV2) and correlation coefficient of external prediction validation (RP2) values, indicating that this model has the highest prediction accuracy. Compared to the model based on raw spectra, the RMSECV value of the model established by the first derivative spectra decreased from 0.7682 to 0.6396, the RMSEP value decreased from 1.0955 to 0.7824, the RCV2 value increased from 0.9987 to 0.9991, and the RP2 value increased from 0.9966 to 0.9982. The first derivative was therefore selected as the optimal preprocessing method. The preprocessed NIR spectra of the 44 ethanol-gasoline samples are shown in Figure 2.
Table 2

Performance Comparison of Full Spectra Models using Different Preprocessing Methods

preprocess method   training set          validation set
                    RMSECV     RCV2       RMSEP      RP2
without             0.7682     0.9987     1.0955     0.9966
first derivative    0.6396     0.9991     0.7824     0.9982
SNV                 0.9581     0.9979     0.9676     0.9973
VN                  1.8401     0.9927     2.6418     0.9800
MSC                 0.7597     0.9987     1.0501     0.9968
SGM                 0.7677     0.9987     1.1954     0.9959
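
For readers who want to see what the selected preprocessing does numerically, here is a minimal sketch of a 13-point Savitzky–Golay first derivative. It relies on a standard result: for a symmetric window and a first- or second-order polynomial, the derivative weights are i / Σi², where i is the offset from the window centre. The function name, the `spacing` parameter, and the choice to return `None` at the edges are assumptions of this sketch; the authors used The Unscrambler X, whose edge handling may differ.

```python
def savgol_first_derivative(y, window=13, spacing=1.0):
    """First derivative from a local quadratic least-squares fit.

    For a symmetric window and polynomial order 1 or 2, the
    Savitzky-Golay first-derivative weights are i / sum(i^2), where i is
    the offset from the window centre. Points without a full window on
    both sides are returned as None in this sketch.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    norm = sum(i * i for i in offsets) * spacing
    out = [None] * len(y)
    for centre in range(half, len(y) - half):
        out[centre] = sum(i * y[centre + i] for i in offsets) / norm
    return out


# Sanity check on a parabola: the derivative of x^2 is 2x.
parabola = [float(x * x) for x in range(20)]
derivative = savgol_first_derivative(parabola)
```

Because the local fit is quadratic, the derivative of a parabola is recovered exactly at every interior point, which makes it a convenient check.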
Figure 2

Preprocessed NIR spectra by SGM (13 points with a second polynomial order).


Comparison of the Model Performance Between Effective Characteristic Spectra, Full Spectra, and VIP Spectra

After preprocessing with the optimal method, calibration models were established using the full spectra and the four effective characteristic spectral schemes, and the validation set samples were then predicted. As shown in Table 3, the models established by the effective characteristic spectra generally have better parameters than the model established by the full spectra. However, the RMSEP values of Schemes 2 and 4 are slightly larger than that of the full spectra model, which may be due to the interference of other components in the complex gasoline samples. Scheme 3 has the best overall performance, with the lowest RMSEP and the highest RCV2 and RP2: its RMSECV, RMSEP, RCV2, and RP2 values were 0.6193, 0.5572, 0.9995, and 0.9991, respectively. The close values of RMSECV and RMSEP mean that the model established by Scheme 3 has high stability and accuracy. These results indicate that the characteristic spectra of Scheme 3 are less disturbed by other components. They also support the conclusion that the hydroxyl group spectra, without the interference of aromatic groups, more accurately reflect the ethanol content in gasoline.
Table 3

Comparison of Different Wavelength Selection Methods for Modeling and Analysis of Ethanol Content

modeling scheme   training set          validation set
                  RMSECV     RCV2       RMSEP      RP2
full spectra      0.6936     0.9991     0.7824     0.9982
1                 0.6012     0.9992     0.6491     0.9988
2                 0.6204     0.9991     0.9011     0.9977
3                 0.6193     0.9995     0.5572     0.9991
4                 0.6592     0.9991     0.8958     0.9977
VIP spectra       0.6288     0.9991     0.8859     0.9977

To further test the effectiveness of the calibration model established by the effective characteristic spectra, it was compared with a model built on spectral variables screened by the VIP method. After the 3112 spectral variables were preprocessed using the first derivative, the 640 variables with VIP values greater than 1 (the VIP spectra) were retained. A partial least squares model was then established to predict the validation set samples. As shown in Table 3, the RMSECV value of the VIP spectra model is lower than that of the full spectra model, while the RMSEP value is higher. The RCV2 and RP2 values of the full spectra and VIP spectra models were nearly identical. Compared with the VIP spectra model, Scheme 3 reduced the RMSECV value from 0.6288 to 0.6193 and the RMSEP value from 0.8859 to 0.5572, while RCV2 increased from 0.9991 to 0.9995 and RP2 increased from 0.9977 to 0.9991. The results show the superior prediction performance of the model established in Scheme 3. The effective characteristic spectra model based on the chemical structure achieved highly accurate prediction results. Thus, Scheme 3 was selected for modeling and analyzing the ethanol content in gasoline.
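
The VIP screening step can be sketched as follows. The paper does not state the VIP formula it used, so this assumes the commonly used definition, in which the score of variable j is sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), with p the number of variables, w_a the PLS weight vector of component a, and SS_a the sum of squares of y explained by that component. The weights and sums of squares below are made-up numbers for illustration only.

```python
import math


def vip_scores(weights, explained_ss):
    """VIP scores under the commonly used definition (an assumption here).

    weights[a][j]   : PLS weight of variable j on component a
    explained_ss[a] : sum of squares of y explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


def select_by_vip(scores, threshold=1.0):
    """Indices of the variables whose VIP score exceeds the threshold."""
    return [j for j, v in enumerate(scores) if v > threshold]


# Made-up two-component model with three variables (NOT the paper's data).
toy_weights = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.1]]
toy_ss = [5.0, 1.0]
toy_scores = vip_scores(toy_weights, toy_ss)
toy_selected = select_by_vip(toy_scores)
```

Under this definition the squared scores average to 1 across variables, which is why 1 is the customary cut-off.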

Application of the Models Established By Effective Characteristic Spectra, Full Spectra, and VIP Spectra

The 14 ethanol-gasoline samples in the prediction set were analyzed using calibration models based on Scheme 3, the full spectra, and the VIP spectra. Internal cross-validation and external validation were used to evaluate the results. Figure 3 shows the internal cross-validation results. The RMSECV value of the partial least squares model based on Scheme 3 or the VIP spectra is lower than that of the model based on the full spectra, indicating that removing irrelevant spectral variables improves accuracy. Compared to the full spectra and the VIP spectra models, the RMSECV value of the model based on Scheme 3 decreased by 10.71 and 1.51%, respectively, indicating that the spectral information of the effective characteristic spectra is more “characteristic” and reflects the sample content more accurately. In addition, the model based on Scheme 3 has the largest RCV2 value (close to 1), indicating that it has the highest accuracy and the strongest correlation between the predicted and reference values during internal cross-validation.
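
The relative decreases quoted for RMSECV can be reproduced directly from the Table 3 values as (reference − value) / reference × 100. A short sketch, where the helper name `percent_decrease` is an illustrative choice:

```python
def percent_decrease(reference, value):
    """Relative decrease of `value` with respect to `reference`, in percent."""
    return (reference - value) / reference * 100.0


# RMSECV values from Table 3.
rmsecv_full, rmsecv_vip, rmsecv_scheme3 = 0.6936, 0.6288, 0.6193

vs_full = round(percent_decrease(rmsecv_full, rmsecv_scheme3), 2)  # 10.71
vs_vip = round(percent_decrease(rmsecv_vip, rmsecv_scheme3), 2)    # 1.51
```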
Figure 3

Internal cross-validation results of the ethanol content in gasoline using partial least squares models based on different spectral ranges.

External validation results are shown in Figure 4. In Figure 4c, all sample points fell on the fitting curve for the model based on the effective characteristic spectra. For the full spectra model (Figure 4a) and the VIP spectra model (Figure 4b), some sample points did not fall on the fitting curve. This indicates that the effective characteristic spectra model gave the smallest deviation between the predicted and reference values. The RMSEP values of the three models support this result. Compared to the full spectra and VIP spectra models, the RMSEP value of the effective characteristic spectra model decreased by 49.14 and 37.10%, respectively. The effective characteristic spectra model achieved the highest prediction accuracy. When the effective characteristic spectra model was used to predict the validation set samples, the linear equation between the predicted value and the reference value was Y = 1.0039X – 0.1275, and the correlation coefficient of the external validation prediction was RP2 = 0.9991, indicating that the predicted values are consistent with the reference values. In conclusion, modeling based on the effective characteristic spectra yielded the best prediction performance.
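
A predicted-versus-reference line such as Y = 1.0039X – 0.1275 is typically obtained by ordinary least squares; a slope near 1 and an intercept near 0 indicate agreement. The sketch below shows that fit under the assumption that plain least squares was used, which the paper does not state explicitly. The sample values are invented for illustration and are not the paper's data.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented reference/predicted ethanol contents in % (NOT the paper's data).
reference = [2.0, 5.0, 8.0, 11.0]
predicted = [2.1, 4.9, 8.2, 10.8]
slope, intercept = fit_line(reference, predicted)
```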
Figure 4

External validation results of the ethanol content in gasoline using partial least squares models based on different spectral ranges. (a) Full spectra; (b) VIP characteristic spectra; (c) effective characteristic spectra.


Conclusions

A rapid NIR analysis method based on effective characteristic spectra was developed and successfully applied to detect ethanol content in gasoline. Four effective characteristic spectral modeling schemes were developed based on the effective characteristic spectra of ethanol. To compare the model performance, models based on the full spectra and VIP spectra were also established. Among the five preprocessing methods compared, the first derivative gave the best model performance. The model established by Scheme 3, whose spectral range of 6060.171–7141.113 cm–1 contains only the characteristic spectra of hydroxyl groups without the interference of aromatic hydrocarbon spectra, had smaller RMSECV and RMSEP values than the full spectra and VIP spectra models and the best overall performance. The application results demonstrated that, compared to the full spectra model, the RMSECV and RMSEP values of the Scheme 3 model decreased by 10.71 and 49.14%, respectively. Compared to the VIP spectra model, the RMSECV and RMSEP values of the Scheme 3 model decreased by 1.51 and 37.10%, respectively. The ethanol content in gasoline was accurately and rapidly analyzed using the optimal effective characteristic spectral modeling scheme, with RCV2 = 0.9995, RMSECV = 0.6193, RP2 = 0.9991, and RMSEP = 0.5572. Thus, the rapid NIR analysis method based on effective characteristic spectra is a specific and highly accurate method that can be applied to the rapid analysis of more characteristic indicators in complex sample systems such as gasoline and diesel.

Materials and Methods

Preparation of the Samples

Octane-rated gasoline samples (92 and 95) without ethanol were obtained from different gas stations in Beijing, including different batches of gasoline from various gas stations under the PetroChina, Sinopec, and Sinochem groups. Forty-four ethanol-gasoline samples were prepared by adding a predetermined amount of ethanol (Analytical Reagent, Fuchen (Tianjin) Chemical Reagent Co., Ltd.) to gasoline (sample details in Table 4). The prepared ethanol-gasoline samples have different solvent components and can therefore simulate actual ethanol-gasoline samples. The ethanol-gasoline samples were grouped into two sets, namely, the training set (samples not marked with * in Table 4), which was used to establish the calibration analysis model, and the prediction set (samples marked with * in Table 4), which was used to evaluate the prediction performance of the model. The ethanol concentration range in the calibration set was 0.5–80% (volume fraction), covering the range of ethanol in normal and adulterated ethanol-gasoline, so the established calibration model is highly representative.
Table 4

Ethanol Volume Fraction in Ethanol-gasoline Samples

no.   volume fraction (%)   no.   volume fraction (%)   no.   volume fraction (%)   no.   volume fraction (%)
1     0.5                   12*   11                    23    22                    34    36
2     1                     13    12                    24*   23                    35    38
3*    2                     14    13                    25    24                    36*   40
4     3                     15*   14                    26    25                    37    45
5     4                     16    15                    27*   26                    38    50
6*    5                     17    16                    28    27                    39*   55
7     6                     18*   17                    29    28                    40    60
8     7                     19    18                    30*   29                    41    65
9*    8                     20    19                    31    30                    42*   70
10    9                     21*   20                    32    32                    43    75
11    10                    22    21                    33*   34                    44    80

Note: Samples marked with * are included in the validation set.
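
In Table 4 the starred samples are exactly the sample numbers divisible by 3 (3, 6, …, 42). Whether the authors chose them by that rule is an assumption; the sketch below only reproduces the resulting 30/14 split from the sample numbers.

```python
# Sample numbers 1..44 as listed in Table 4.
samples = list(range(1, 45))

# Starred (validation) samples coincide with the multiples of 3; the
# selection rule itself is inferred from the table, not stated by the authors.
validation = [s for s in samples if s % 3 == 0]
training = [s for s in samples if s % 3 != 0]
```

Picking every third sample from a list ordered by concentration spreads the validation set across the whole 0.5–80% range.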


Acquisition of NIR spectra

The NIR spectra of the ethanol-gasoline samples were measured using a Frontier NIR spectrometer (Antaris II, Thermo Fisher Scientific (China) Co., Ltd.) equipped with an InGaAs detector and a tungsten-halogen source. The spectra were obtained by co-adding 32 scans in the transmission mode. The spectral scan range was 4000–10 000 cm–1 at 4 cm–1 digital resolution. Each sample was loaded and tested three times in a 1 mm pathlength cuvette, and average spectra were used for further analysis. The spectra were recorded at room temperature and the humidity was maintained at 40%.

Data Preprocessing and Model Evaluation

The Unscrambler X software was used for the preprocessing and statistical analysis of the NIR spectral data. SGM, SGD, MSC, VN, and SNV were each applied to the spectral data, and their preprocessing results were compared. SGM can effectively reduce noise in the spectral signal and improve the signal-to-noise ratio. SGD can reduce baseline drift in the NIR spectra and the interference of certain background signals. MSC is used to reduce the scattering effects of mechanical impurities in gasoline samples. VN can strengthen the role of characteristic spectral segments in modeling and eliminate the adverse effects caused by large differences in scale, while SNV can reduce the scattering effects of sample surfaces and the influence of optical path changes. After the spectral data were preprocessed using each of the five methods described above, the partial least squares[40,41] method was used to build the calibration model. Finally, an optimal preprocessing method was selected. RCV2 and RMSECV were selected as parameters to evaluate the quality of the calibration model. The precision of the calibration model is better when the RCV2 value is closer to 1 and the RMSECV value is lower. RP2 and RMSEP are typically used as parameters to evaluate the predictive ability of the calibration model. The accuracy of the prediction model is better when the RP2 value is closer to 1 and the RMSEP value is lower. For an optimal calibration model, the RMSECV and RMSEP values should be close. Therefore, in this study, we chose four parameters, RMSECV, RCV2, RMSEP, and RP2, to compare the full spectra, VIP spectra, and effective characteristic spectra models.
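
Two of the scatter corrections named above have compact textbook definitions, sketched here with the standard library only. SNV centres each spectrum on its own mean and divides by its own standard deviation; MSC regresses each spectrum on a reference (here the mean spectrum of the set) and removes the fitted offset and slope. The use of the sample standard deviation (n − 1) and of the mean spectrum as reference are assumptions of this sketch; The Unscrambler X may use different conventions.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    return [(x - mean) / std for x in spectrum]


def msc(spectra):
    """Multiplicative signal correction against the mean spectrum of the set."""
    n_points = len(spectra[0])
    reference = [statistics.fmean([s[k] for s in spectra]) for k in range(n_points)]
    ref_mean = statistics.fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Least-squares fit: s ~ intercept + slope * reference
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Two synthetic spectra that differ only by an offset and a scale factor.
base = [1.0, 2.0, 4.0, 3.0]
toy_spectra = [[2.0 * b + 0.5 for b in base], [0.5 * b - 0.2 for b in base]]
toy_msc = msc(toy_spectra)
toy_snv = snv(base)
```

In the synthetic example the two spectra differ only by additive and multiplicative effects, so MSC maps them onto the same corrected spectrum.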
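RMSECV, RMSEP, and R2 (RCV2 or RP2) were calculated using the following equations:

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n}}$$

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2}{m}}$$

$$R^2 = 1 - \frac{\sum_{i} \left(y_i - \hat{y}_i\right)^2}{\sum_{i} \left(y_i - \bar{y}\right)^2}$$

Note: n is the number of training set samples, m is the number of validation set samples, y_i is the measurement result obtained by the standard method for sample i, y̅ is the mean value of y, and ŷ_i is the predicted result for sample i based on spectral modeling. The sums in R2 run over the training set for RCV2 and over the validation set for RP2.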
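
A minimal sketch of these metrics in code, assuming the standard definitions of root mean square error and the coefficient of determination. The same `rmse` function serves for RMSECV (fed with cross-validation predictions of the training set) and RMSEP (fed with predictions of the validation set). The function names and the four-sample example are illustrative, not the paper's data.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(reference) / len(reference)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(reference, predicted))
    ss_tot = sum((y - mean) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot


# Invented values for illustration (NOT the paper's data).
ref = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
example_rmse = rmse(ref, pred)
example_r2 = r_squared(ref, pred)
```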