
Correlation between the Molecular Structure and Viscosity Index of CTL Base Oils Based on Ridge Regression.

Chunhua Zhang1, Hanwen Wang1, Xiaowen Yu1, Chaolin Peng1, Angui Zhang2, Xuemei Liang2, Yinan Yan2.   

Abstract

In China, coal-to-liquid (CTL) lube base oils with ultrahigh viscosity index (VI) are very popular. Since it consists of chain alkanes only and can be precisely characterized by molecular structures alone, quantitative 13C nuclear magnetic resonance (NMR) data are used to generate the average structural parameters (ASPs) of CTL base oil. In this work, the ASPs and bulk properties of CTL base oils were tested and compared with those of mineral base oils. Based on the test results, the correlation between the unique property of CTL base oil VI and ASPs was analyzed. To eliminate the effect of significant multicollinearity among the input variables, statistical methods such as ordinary least-squares (OLS), stepwise regression, and ridge regression methods were used to build the VI prediction model. The main findings are as follows: according to the 13C NMR spectrum, CTL base oils had a significantly higher content of isomeric chain alkanes (including several branching structures) than mineral base oil, while the content of cycloalkanes was zero; among several branched structures, the one with the largest difference in content is structure S67, which has the highest percentage in the iso-paraffin structures, all above 25.5% in CTL base oils and below 21.39% in mineral oils; according to the distillation curve of the simulated distillation (SimDist) analysis, CTL base oils with similar carbon number distribution showed lower boiling points, narrower distillation ranges, and higher distillation efficiencies than mineral base oil; correlation analysis showed that the average chain length (ACL), normal paraffins (NPs), and structure S67 caused the CTL base oil to exhibit a higher VI; and from 13C NMR data, the ridge regression model was used to obtain regression coefficients consistent with reality, and the expected VI could be well predicted with a correlation coefficient of 0.935.
© 2022 The Authors. Published by American Chemical Society.


Year:  2022        PMID: 35694514      PMCID: PMC9178722          DOI: 10.1021/acsomega.2c01877

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

The global distribution of the energy mix and the expanding demand for high-performance lubricants have driven the development of coal-to-liquid (CTL), gas-to-liquid (GTL), and biomass-to-liquid (BTL) technologies.[1−4] In China, where coal resources are extensive, CTL base oil is a promising alternative to existing group II and III base oils. It is technically a lubricant component for machines and engines composed of straight-chain alkanes. These straight-chain alkanes are the main chemical species that provide lubricating properties similar to those of PAO (group IV).[1,5,6] CTL base oil has various advantages, including a high viscosity index (VI), excellent low-temperature performance, environmental friendliness, and the absence of polyaromatic hydrocarbons, naphthenes, and nitrogen and sulfur compounds. CTL base oils are generally produced from coal feedstocks through an indirect liquefaction process. In this method, synthesis gas (carbon monoxide and hydrogen) produced from coal is converted to paraffinic hydrocarbons by the Fischer–Tropsch (FT) reaction. The paraffinic hydrocarbons are then isomerized and fractionated to give CTL base oils. Pure synthetic CTL base oils, which have a high VI, are identified as group III base oils due to the use of traditional refining technologies such as hydrocracking and isomeric dewaxing.[4,7] Base oil properties such as VI, pour point, density, flash point, evaporation loss, and oxidation stability measured by the rotary pressure vessel oxidation test (RPVOT) affect the lubricating performance and service life of the lubricant. VI, first proposed by Standard Oil's Dean and Davis in 1929,[8] is a measure describing the viscosity–temperature relationship of base oils and lubricants. The magnitude of its value is one of the most important indicators in determining the quality grade of a lube base oil. 
For example, a high VI value allows Fischer–Tropsch synthetic base oils to be considered group III base oils, which means that their kinematic viscosity (KV) is less sensitive to temperature. This smaller variation of KV with temperature is desirable, as such base oils can be used in applications with a wide range of temperature variations to ensure effective lubrication, especially in the field of automotive engine lubricants. Furthermore, the VI of a lube base oil has been shown to be highly dependent on feedstocks, process conditions, etc., and is ultimately reflected in the molecular composition and structure of the base oil.[9] Currently, the main approaches for molecular characterization of base oils are mass spectrometry (MS) and nuclear magnetic resonance (NMR).[10,11] MS provides information on the content of different types of compounds and the carbon number of each compound, which allows us to evaluate the changes in hydrocarbon structure types during the hydrogenation process. NMR, on the other hand, is mainly used to infer the structure of base oils by measuring the content of different types of carbon atoms and calculating some ASPs.[12−16] Compared to 1H NMR, the chemical shifts of 13C NMR are more sensitive to the molecular structure and can provide more valid information,[17] so the carbon-type distributions it provides are often used as input variables in models for VI calculations. Sarpal et al.[18,19] delineated the assignment of 13C NMR spectral peaks and established the correlation between VI and carbon-type composition and the distribution of the branched structure. However, a later study found that the correlation had been established based on qualitative results. 
Sharma et al.[20] used a simple linear regression approach to analyze the relationship between VI and the average structural parameters of hydrogenated base oils, such as the amount of normal paraffins (NPs), iso-paraffins (IPs), and average chain length (ACL). The results indicated that a single structural parameter cannot accurately predict the VI and that appropriate prediction accuracy requires at least two structural parameters. In addition, the correlation indicated that a decrease in ACL caused an increase in VI; however, this conclusion was not supported by other studies.[21,22] Verdier et al.[21] developed a correlation model for VI based on molecular structure data from 13C NMR of 20 base oils, and the correlation model was shown to predict the experimental data well with an R2 of 0.9589. However, the significant correlation between molecular structures as input variables can make the regression model unstable. Recently, Noh et al.[11] developed two VI regression models for three types of base oils based on the molecular structure of hydrocarbons; they found that the constrained regression model that considers the physical significance of the regression components has better generalization ability than the stepwise regression method based on pure data. Despite the increasing production of CTL base oils in China and the fact that VI is a very important characteristic in determining the quality grade of base oils, practical correlation models between VI and the molecular structure of CTL base oils are still lacking. 
With small sample sizes, most studies have focused on simple linear statistical methods to determine regression models of dependent variables with multiple explanatory variables.[23−25] Based on experimental samples, the strongly correlated variables were selected as input features for modeling by measuring the linear correlation between the input structures and VI, which implied that the relationship between the representative variables with strong correlation coefficients and VI was confirmed. However, the signs of the regression coefficients were not guaranteed to agree with the actual positive and negative correlations because of the significant multicollinearity among different molecular structures, which increased the variance of the coefficients of the input variables and made the prediction model unrepresentative of the true relationship.[26] For severe multicollinearity, the common possible solutions are (1) grouping input variables and combining or splitting highly correlated variables, (2) using stepwise regression methods to filter and eliminate input variables, and (3) using ridge regression methods to introduce a small amount of bias to reduce sensitivity to sample data.[27,28] Studies on the relationship between base oils with various structures and bulk properties could provide the right direction for base oil production and processing. Although research on the structures and properties of mineral base oils has long been under way, we know little about the differences in the properties of CTL base oils with different molecular structures. In the current work, therefore, five CTL base oils and four mineral base oils with different molecular structures were studied, and the results showed that 13C NMR could accurately characterize the structure of CTL base oils. In addition, the established VI model has excellent predictive ability.

Experimental Devices and Methods

Samples

Nine different oil samples were analyzed in this study: C#1–C#5 were CTL base oils and M#1–M#4 were hydrotreated and hydrocracked mineral base oils. CTL base oils were supplied by a Chinese coal-to-oil production plant. Mineral base oils M#1 and M#2 were group III base oils and M#3 and M#4 were group II base oils, purchased from Ssangyong in Korea and Hainan Lian in China, respectively. The samples were chosen so that their macroscopic physical properties vary enough to be adequately representative.

Measurement of Bulk Properties

The different properties of base oils were tested according to ASTM standards. The pour point and flash point were measured using methods ASTM D-5949 and ASTM D-93, respectively, and evaporation loss was measured according to DIN 51581 (Noack). The oxidation stability was analyzed by the rotary pressure vessel oxidation test (RPVOT) according to ASTM D-2272. KV and density were measured using an Anton Paar SVM3001 Stabinger viscometer at 40 and 100 °C according to ASTM D-7042 and ASTM D-4052, respectively. VI was calculated using the ASTM D-2270 method. A gas chromatograph (Agilent 8890) equipped with a DB-HT-SIMDIS (5 m, 0.53 mm, 0.1 μm) column was used to perform the simulated distillation of the base oil samples according to the ASTM D-6352 method. The retention time was converted to boiling point by Agilent SimDis software. The COC inlet was set to ramp up at 35 °C/min to a final temperature of 400 °C. The flame ionization detector temperature was set to 450 °C, and the flow rates of hydrogen, air, and nitrogen were 32, 400, and 24 mL/min, respectively. In addition, the carbon number distribution of the base oil was determined based on the retention times of n-alkanes. The precision of the Agilent 8890 GC system was evaluated using 5010 standards before testing the carbon number distribution of the nine oil samples. The retention time and calibration model showed excellent separation and detection capabilities, and the results of 10 runs showed an average deviation of less than 2%. To account for all sources of measurement uncertainty, three repeatability tests were performed on different batches of bottled oil samples to ensure adequate representativeness of the test data. The properties of these different oil samples are given in Table 1.
Table 1

Properties of the Nine Oil Samples

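sample | VI | pour point (°C) | flash point (°C) | KV@40 °C (mm2/s) | KV@100 °C (mm2/s) | density (g/cm3) | evaporation loss | oxidation stability (min)
C#1 | 142 | –39 | 211.5 | 19.25 | 4.38 | 0.801 | 18.8 | 19
C#2 | 150 | –33 | 294.7 | 72.1 | 11.35 | 0.818 | 0.6 | 21
C#3 | 133 | –39 | 230.9 | 18.45 | 4.18 | 0.802 | 14.5 | 18
C#4 | 142 | –30 | 278.8 | 43.76 | 7.60 | 0.813 | 1.3 | 22
C#5 | 151 | –36 | 239.4 | 30.64 | 6.10 | 0.808 | 5 | 24
M#1 | 130 | –15 | 235.2 | 31.4 | 5.9 | 0.814 | 10.7 | 48
M#2 | 121 | –21 | 248.2 | 47.8 | 7.4 | 0.826 | 5.1 | 26
M#3 | 84 | –12 | 274.1 | 461.6 | 28 | 0.861 | 1.5 | 35
M#4 | 109 | –21 | 146.5 | 29.3 | 5 | 0.831 | 16.8 | 27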

13C NMR Spectroscopy

For mineral oils, 13C NMR cannot distinguish between the same type of carbon atoms in different molecules, such as those in chain alkanes and those in naphthenic side chains. In addition, since cycloalkane carbon usually appears in 13C NMR spectra as an envelope peak with chemical shifts ranging from 24 to 60 ppm, the choice of integration method has a significant impact on the calculated naphthenic carbon (Cn) content. However, CTL base oils contain no Cn, so no baseline drift occurs, and the high intermolecular similarity makes it easier and more accurate to obtain quantitative data on molecular structure. In this study, quantitative 13C NMR spectra of base oil samples were acquired on a Bruker AVANCE spectrometer at a resonance frequency of 400 MHz, equipped with a 5 mm dual resonance broad-band inverse probe. The oil samples were diluted in CDCl3 with 0.1 M Cr(acac)3. Cr(acac)3 was used as a relaxation agent to shorten spin–lattice relaxation times, and TMS was used as an internal standard for chemical shifts. An inverse-gated decoupling scheme was used with a pulse width of 2.7 μs, a relaxation delay of 5 s, and an acquisition time of 1.5 s. A total of 20 000–24 000 scans were acquired for each spectrum. MestReNova software was used to collect and analyze the data; each spectrum was processed three times and the average values were reported.

Regression Model

In petroleum science, where sample sizes are small, the workhorses of statistical analysis are OLS and stepwise regression. OLS estimates the unknown parameters of an equation by minimizing the sum of squares of the differences between the sample values and the predicted values. OLS regression produces unstable results when there is a high degree of multicollinearity among the input variables, so it can be used as a means of testing for multicollinearity.[29] Stepwise regression builds the model by screening and eliminating variables that cause multicollinearity, so that the variables ultimately retained in the model are both significant and free of significant multicollinearity. Ridge regression provides another way to address multicollinearity. The ridge regression algorithm is a regularization method that reduces the sensitivity of the results to the training data set by introducing a small amount of bias, which suppresses the adverse effects of collinearity on the predictions. In this study, ordinary least-squares (OLS) and stepwise regression models were developed using the Minitab statistical software version 19.0 (Minitab Inc., State College, PA), while ridge regression models were developed using SPSS 26.0 software (IBM Corp., Ltd., New York).
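
For reference, the three estimators and diagnostics named in this section can be written compactly. This is a standard textbook formulation rather than the exact parameterization used by Minitab or SPSS; the symbols X (matrix of centered, scaled input variables), y (centered VI), k (ridge parameter), and R_j^2 (coefficient of determination from regressing the j-th input on all other inputs) are notation introduced here.

```latex
\begin{align}
\hat{\beta}_{\mathrm{OLS}} &= \arg\min_{\beta}\,\lVert y - X\beta\rVert_2^{2}
  = (X^{\top}X)^{-1}X^{\top}y, \\
\hat{\beta}_{\mathrm{ridge}}(k) &= \arg\min_{\beta}\,\bigl(\lVert y - X\beta\rVert_2^{2} + k\,\lVert\beta\rVert_2^{2}\bigr)
  = (X^{\top}X + kI)^{-1}X^{\top}y, \qquad k \ge 0, \\
\mathrm{VIF}_j &= \frac{1}{1 - R_j^{2}}.
\end{align}
```

When the inputs are nearly collinear, X^T X is close to singular, so the OLS inverse amplifies noise and VIF_j grows without bound; adding kI keeps the matrix well conditioned at the cost of a small bias.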

Results and Discussion

The bulk properties of CTL and mineral base oils, including VI, pour point, flash point, KV@40 °C, KV@100 °C, density, evaporation loss, oxidation stability, and distillation properties, were studied and compared. In addition, the relative contents of carbon types and branched structures were compared based on 13C NMR results. Finally, VI regression models were developed based on ASPs, and the applicability of OLS, stepwise regression models, and ridge regression models was investigated in turn. Meanwhile, the physical significance of the effect of each structure on the overall VI was considered.

Measurement of Properties

Physicochemical Property Analysis

A comparison of the physicochemical properties of CTL and mineral base oils is presented in Table 1. In general, the physicochemical properties of CTL base oils differ from those of mineral base oils, although a similar isomerization process makes them alike in some respects. The KV of CTL base oils at 100 °C is similar to that of mineral base oils, but at 40 °C, the KV of mineral base oils is higher, resulting in a lower VI than that of CTL base oils. In addition, the higher pour points of the mineral base oils reflect poorer low-temperature fluidity than that of CTL base oils. Another characteristic of CTL base oil is its lower density compared to mineral base oil. In contrast, flash point and evaporation loss show no systematic difference between these two types of base oils, with C#2, C#4, and M#3 having the lowest evaporation loss and M#4 having the lowest flash point. In addition, evaporation loss and flash point were negatively correlated; in general, the lower the evaporation loss, the higher the flash point. C#1 has the lowest flash point and the largest evaporation loss among CTL base oils, while the evaporation loss of mineral oil M#4 is similar to that of C#1, but its flash point is 65 °C lower. The reason is that the flash point is the lowest temperature at which the mixture ignites briefly on contact with a flame; as shown in Figure 1, the content of the light fraction (C14–C19) in M#4 is greater than that in C#1, so its flash point is lower. The oxidation stability of mineral base oils is generally better than that of CTL base oils. The overall difference in the oxidation induction time among CTL base oils is small, and the difference between the longest and shortest times is only 5 min.
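
The negative relationship between flash point and evaporation loss described above can be checked with a short calculation. The sketch below computes the Pearson correlation coefficient over all nine samples; the numbers are transcribed from Table 1, and the helper name `pearson_r` is introduced here for illustration only.

```python
import math

# Flash point (°C) and evaporation loss for C#1..C#5, M#1..M#4 (transcribed from Table 1).
flash_point = [211.5, 294.7, 230.9, 278.8, 239.4, 235.2, 248.2, 274.1, 146.5]
evap_loss = [18.8, 0.6, 14.5, 1.3, 5.0, 10.7, 5.1, 1.5, 16.8]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_flash_evap = pearson_r(flash_point, evap_loss)
print(f"r(flash point, evaporation loss) = {r_flash_evap:.3f}")
```

A clearly negative coefficient is consistent with the statement that lower evaporation loss goes with a higher flash point.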
Figure 1

Carbon number distribution of base oils.


Simulated Distillation (SimDist) Analysis

Based on the SimDist analysis (by % wt), the fractions of the nine oil samples are divided by carbon number into light (C14–C23), medium (C33–C45), heavy (C46–C66), and super heavy (>C66). Figure 1 shows that the carbon number distribution of CTL base oils is more concentrated overall than that of mineral base oils, with the characteristic fraction exceeding 48%, similar to M#1 and M#2 but much higher than the highest content fraction in M#3 and M#4. C14–C23 is the lightest fraction of the nine base oils, and this lighter fraction, whose content varies widely, is considered the main factor affecting evaporation losses. The actual cumulative yield of the base oil and the cumulative yield distribution curve obtained from the n-alkane boiling points are shown in Figure 2; the temperatures corresponding to the n-alkane boiling points are taken from the study of Kudchadker et al.[30] It can be observed that isomerization reduces the overall boiling point of the base oil and increases the efficiency of distillation, which is more obvious in the fractions with higher carbon numbers.
Figure 2

Cumulative yield of base oil with increasing temperature (% wt) and carbon number of n-paraffins corresponding to the boiling temperature.


Measurement of ASPs

The ASPs of the base oil can be characterized in detail based on the peak positions of different types of carbon atoms provided by the 13C NMR spectra. As shown in Table 2, the chemical shift assignments for the various carbon types were taken from Sarpal et al.[18,22]
Table 2

Algorithms for ASPs

13C NMR parameter | chemical shift (ppm)
naphthenic carbons (Cn) | hump in region (60–24) ppm
paraffinic carbons (Cp) | (60–5) ppm − Cn
n-paraffinic α carbon (NPα) | 14.1 ppm
n-paraffinic β carbon (NPβ) | 22.7 ppm
n-paraffinic γ carbon (NPγ) | 32.0 ppm
n-paraffinic δ or higher carbon (NPn) | 29.4 and 29.9 ppm
normal paraffins (NPs) | NPα + NPβ + NPγ + NPn
iso-paraffins (IPs) | Cp − NPs
average chain length (ACL) | 2 × NPs/NPα
various branched structures |
2-methyl-substituted (S2) | 28.2 ppm
3-methyl-substituted (S3) | 11.4 ppm
ethyl-substituted (S3′) | 10.7 ppm
4-methyl-substituted (S4) | 14.2 ppm
5-methyl-substituted (S5) | 14.3 ppm
6- or 7-methyl-substituted (S67) | 27.0 ppm
2 or more methyl-substituted (S8) | 24–25.6 ppm
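
The derived parameters in Table 2 (NPs, IPs, ACL) follow directly from the integrated peak areas. The following is a minimal sketch of that arithmetic; the function name and the example integrals are hypothetical (the integrals are not measured values from this study) and are assumed to be normalized so that total carbon equals 100.

```python
def average_structural_parameters(cp, n_alpha, n_beta, n_gamma, n_delta_plus):
    """Derive NPs, IPs and ACL from normalized 13C NMR integrals (Table 2 formulas).

    cp            paraffinic carbon, (60-5) ppm region minus Cn
    n_alpha       n-paraffinic alpha carbon, 14.1 ppm
    n_beta        n-paraffinic beta carbon, 22.7 ppm
    n_gamma       n-paraffinic gamma carbon, 32.0 ppm
    n_delta_plus  n-paraffinic delta-or-higher carbon, 29.4 and 29.9 ppm
    """
    nps = n_alpha + n_beta + n_gamma + n_delta_plus  # NPs = NPα + NPβ + NPγ + NPn
    ips = cp - nps                                   # IPs = Cp − NPs
    acl = 2.0 * nps / n_alpha                        # ACL = 2 × NPs / NPα
    return {"NPs": nps, "IPs": ips, "ACL": acl}


# Hypothetical integrals for an oil without naphthenic carbon (Cp = 100).
example = average_structural_parameters(
    cp=100.0, n_alpha=3.0, n_beta=3.0, n_gamma=3.0, n_delta_plus=24.0
)
print(example)
```

The factor of 2 in ACL reflects that each unbranched chain contributes two terminal (α) carbons.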
The NMR spectra of CTL base oils differed significantly from those of mineral base oils, as shown in Figure 3. Taking C#4 and M#4 as an example, a severe drift of the baseline was observed in the M#4 spectrum compared with that in C#4, indicating the presence of relatively abundant Cn absorption peaks. In addition, the resonance signals of C#4 were stronger at 10–12, 14–15, 24–27, 32–34, and 36–40 ppm than those of M#4, which implies a higher content of branched structures. Based on the earlier study of peak assignments,[22] it was possible to distinguish specific branched structures, as shown in Figure 4. The relative content of the carbonaceous units was obtained by normalizing to the total carbon integral area, and the molar fractions of the different branched structures were then calculated from their contribution to the IP. The relative contents of specific carbon types and branched structures were calculated and are shown in Table 3.
Figure 3

13C NMR spectrum of M#4 and C#4 base oils.

Figure 4

Various branched structures possible in base oils from 13C NMR spectra (S1 = 14.1 ppm, S2 = 28.2 ppm, S3 = 11.4 ppm, S3′ = 10.7 ppm, S4 = 14.2 ppm, S5 = 14.3 ppm, S67 = 27.0 ppm, S8 = 24.0–25.6 ppm) (+ indicates the atoms detected from the structure).

Table 3

Structural Parameters by 13C NMR (in %) of Base Oils

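Columns Cn, NPs, and IPs are carbon-type compositions (%); columns S2 to S8 are branched structures (mol % C).
sample | ACL | Cn | NPs | IPs | S2 | S3 | S3′ | S4 | S5 | S67 | S8
C#1 | 22.22 | 0 | 32.64 | 67.36 | 5.02 | 5.07 | 7.67 | 5.07 | 4.25 | 30.67 | 9.60
C#2 | 27.66 | 0 | 35.42 | 64.58 | 4.03 | 3.63 | 7.55 | 3.91 | 3.97 | 32.99 | 8.50
C#3 | 19.21 | 0 | 32.47 | 67.53 | 5.74 | 5.45 | 7.94 | 5.85 | 5.91 | 25.50 | 11.13
C#4 | 21.73 | 0 | 32.94 | 67.06 | 4.69 | 4.82 | 8.21 | 4.82 | 6.36 | 28.10 | 10.06
C#5 | 23.78 | 0 | 31.56 | 68.44 | 4.88 | 4.53 | 8.15 | 4.70 | 4.36 | 32.07 | 9.75
M#1 | 22.43 | 13.54 | 35.78 | 50.68 | 5.19 | 3.57 | 4.27 | 4.04 | 3.29 | 21.39 | 8.94
M#2 | 22.95 | 19.23 | 35.24 | 45.53 | 4.18 | 3.32 | 3.26 | 3.26 | 2.18 | 19.99 | 9.34
M#3 | 27.12 | 36.8 | 24.94 | 38.26 | 5.38 | 2.19 | 1.46 | 2.19 | 1.93 | 10.71 | 14.37
M#4 | 20.32 | 18.22 | 30.68 | 51.10 | 5.80 | 3.57 | 2.74 | 3.89 | 2.68 | 17.65 | 14.78
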
Table 3 shows the relative contents of carbon types and branched structures in base oil components. The Cn fraction significantly reduces the relative content of IP in mineral base oils, which is the biggest structural difference between CTL and mineral base oils; CTL base oil has a higher branched-chain content than mineral base oils, such as structures S3–S7. First, structure S3′ of CTL base oil has an average value of 7.904% at 10.9 ppm, which is higher than that of mineral base oils. The average value of structure S67 at 27 ppm is 29.866%, which is also higher than that of mineral base oil. Second, the average values of structures at 11.4 (S3), 14.4 (S4), and 14.5 (S5) ppm in CTL base oils are slightly higher than those of mineral base oils. In addition, ACL counts carbon in the chain alkanes and also carbon in the side chains attached to cycloalkane rings, so the ACL of mineral base oils, which contain some cycloalkanes, is lower than the actual value.
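
The CTL averages quoted above can be reproduced from Table 3 with a few lines of arithmetic. The sketch below uses the S3′ and S67 columns as transcribed from that table; the variable names are introduced here for illustration.

```python
from statistics import mean

# S3' and S67 contents (mol % C) transcribed from Table 3.
s3_prime_ctl = [7.67, 7.55, 7.94, 8.21, 8.15]        # C#1..C#5
s3_prime_mineral = [4.27, 3.26, 1.46, 2.74]          # M#1..M#4
s67_ctl = [30.67, 32.99, 25.50, 28.10, 32.07]        # C#1..C#5
s67_mineral = [21.39, 19.99, 10.71, 17.65]           # M#1..M#4

mean_s3_prime_ctl = mean(s3_prime_ctl)
mean_s67_ctl = mean(s67_ctl)

print(f"S3' mean: CTL {mean_s3_prime_ctl:.3f}, mineral {mean(s3_prime_mineral):.3f}")
print(f"S67 mean: CTL {mean_s67_ctl:.3f}, mineral {mean(s67_mineral):.3f}")
print(f"S67 range: CTL min {min(s67_ctl)}, mineral max {max(s67_mineral)}")
```

The CTL minimum for S67 (25.50) lies above the mineral maximum (21.39), which matches the separation stated in the abstract.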

Relationship between VI and Base Oil Structures

VI reflects the variation of the viscosity of the base oil with temperature. n-Paraffins have the highest VI, and iso-paraffins have a slightly lower VI than n-paraffins. The presence of cycloalkanes and aromatic hydrocarbons negatively affects the overall VI of the base oil, and VI decreases as the number of rings increases.[31] In mineral base oils with a certain content of cycloalkanes, high VI depends on the relative content of monocyclic alkane and IP.[32] In addition, different levels of branching of iso-alkanes have different effects on the VI, and the key parameters for high VI point to molecules with the methyl branching structure at the center of the carbon chain or without ethyl branching.[21] Based on the collected ASPs of CTL base oils and mineral base oils, single-variable linear regression equations of VI against each characteristic structure fraction are constructed to determine whether there is a tight linear correlation. In Figure 5, each graph plots the data points of VI versus the fraction of ASPs and gives the correlation coefficient (R2). ACL, NPs, and structure S67 are positively correlated with VI, while the other branched structures are negatively correlated with VI. The coefficients for the same molecular structure have the same sign in both base oils, indicating a consistent effect of ASPs on VI. In addition, the values of the individual correlation coefficients are closer to 1 in the CTL base oil, indicating a stronger linear correlation between the ASPs and VI. In mineral base oils, however, the correlation between several molecular structures and VI is weak, such as Cn (R2 = 0.118), NPs (R2 = 0.184), S3′ (R2 = 0.0306), and S8 (R2 = 0.0297).
Figure 5

Correlation between VI and the fraction of molecular structure type (CTL and mineral base oils are indicated by filled blue circles and unfilled circles, respectively).

Among the various types of methyl structures of iso-paraffin, structure S67 is the only one that is positively correlated with VI, with R2 of 0.9005 in CTL base oils and 0.4629 in mineral base oils. This finding is in good agreement with previous studies showing that methyl branches in the center of carbon chains restrict molecular diffusion at high temperatures and thus lead to high VI.[22] ACL also shows a similar correlation with VI in both base oils, with R2 = 0.7597 in CTL base oil and R2 = 0.6043 in mineral base oil. However, the positive effect of ACL on VI has been underestimated in some studies;[20] in mineral base oils, ACL represents not only the length of carbon chains in normal and isoparaffinic chains but also part of the side chains attached to naphthenic rings, so the ACL of mineral base oils is affected by the carbon content of naphthenic hydrocarbons, and its positive effect is thus underestimated.
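
A single-variable fit of the kind plotted in Figure 5 can be sketched as follows for the five CTL oils, using the S67 and VI values transcribed from Tables 1 and 3. Because those tabulated values are rounded, the computed R2 is close to, but not necessarily identical to, the reported 0.9005; the variable names are introduced here for illustration.

```python
import numpy as np

# S67 content (mol % C) and VI for C#1..C#5, transcribed from Tables 1 and 3.
s67_ctl = np.array([30.67, 32.99, 25.50, 28.10, 32.07])
vi_ctl = np.array([142.0, 150.0, 133.0, 142.0, 151.0])

# Least-squares line VI = slope * S67 + intercept.
slope, intercept = np.polyfit(s67_ctl, vi_ctl, 1)

# For a one-variable linear fit, R2 equals the squared Pearson correlation.
r_squared = np.corrcoef(s67_ctl, vi_ctl)[0, 1] ** 2

print(f"VI = {slope:.3f} * S67 + {intercept:.2f}, R2 = {r_squared:.3f}")
```

The positive slope reproduces the direction of the S67 effect described in the text.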

Development of the Prediction Model

The CTL base oil ASPs obtained from the 13C NMR quantification technique were used to build the regression model for predicting VI. In Table 4, the data points in the first five rows are the experimental results for CTL base oils shown above. The data in the last six rows of Table 4 were obtained from six base oils synthesized using the Fischer–Tropsch technique.[33] Thus, in total, data from 11 samples are used to construct the regression model for VI.
Table 4

Experimental Data Sets Used for Multivariable Regression Models of VI

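sample | NPs | ACL | S2 | S3 | S3′ | S4 | S5 | S67 | S8 | VI
C#1 | 32.64 | 22.22 | 5.02 | 5.08 | 7.67 | 5.08 | 4.25 | 30.67 | 9.6 | 142
C#2 | 35.42 | 27.66 | 4.03 | 3.63 | 7.55 | 3.91 | 3.97 | 32.99 | 8.5 | 150
C#3 | 32.47 | 19.21 | 5.74 | 5.45 | 7.94 | 5.85 | 5.91 | 25.5 | 11.13 | 133
C#4 | 32.94 | 21.73 | 4.69 | 4.82 | 8.21 | 4.82 | 6.36 | 28.1 | 10.07 | 142
C#5 | 31.56 | 23.78 | 4.88 | 4.53 | 8.15 | 4.70 | 4.36 | 32.07 | 9.75 | 151
FT#1 | 25.65 | 17.00 | 5.76 | 7.48 | 9.80 | 7.15 | 6.89 | 22.31 | 14.96 | 103
FT#2 | 27.79 | 19.65 | 4.90 | 6.26 | 8.64 | 5.85 | 6.47 | 25.73 | 14.36 | 120
FT#3 | 36.75 | 29.5 | 2.78 | 3.22 | 7.36 | 3.46 | 4.76 | 31.66 | 10.02 | 154
FT#4 | 24.56 | 17.57 | 5.45 | 7.44 | 10.30 | 6.58 | 6.71 | 22.27 | 16.68 | 109
FT#5 | 28.88 | 20.75 | 3.78 | 4.83 | 8.53 | 5.10 | 6.01 | 29.3 | 13.57 | 138
FT#6 | 35.44 | 29.03 | 2.67 | 3.05 | 7.51 | 3.31 | 4.64 | 30.98 | 12.40 | 154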

Variable Selection

Selecting representative molecular structures is increasingly critical. Presently, the focus is on reducing the collinearity among the key input molecular structures, and grouping the input variables is a good way to do so.[34] Most of the chain alkanes constituting IP are lightly branched, mainly with one methyl group and a few with one ethyl group or two or more methyl groups located in the side chain, and their carbon chain lengths may be close to those of n-paraffins. Therefore, we treat ACL as an independent group, which represents the average size of base oil molecules. NPs, as a separate group, represents the molecular fraction of n-paraffins in the base oil. For the multiple branching structures in IP, two grouping strategies are considered. On the one hand, the position of the methyl branches in the chain alkanes has different effects on the rigidity of the molecular structure. For example, structure S2 is near the end of the carbon chain and in a nonequilibrium position; such a structure is easily deformed at low temperatures, while structure S67 is near the center of the carbon chain and maintains a rigidity similar to that of n-paraffins, which limits the mobility of the molecule at high temperatures. On the other hand, the grouping is based on the number and type of branching structures since they influence VI to different degrees. As shown in Figure 5, the regression coefficients for the monomethyl branched structures (except S67) are very different from those for structures S3′ and S8, which means that structures S3′ and S8 have a weaker ability to reduce VI. Ultimately, the IPs are divided into four groups: structures S2, S3, S4, and S5 are combined into one group (S2345), and S3′, S67, and S8 each form their own group.

OLS

It has been shown that differing findings can be attributed to inconsistencies in methodology and model selection. Ordinary least-squares estimation is the simplest and most widely used regression estimation method.[35,36] In this work, to see whether grouping different molecular structures could resolve the effects caused by collinearity, we use ordinary least squares for VI prediction modeling. Results of the OLS regression are shown in Table 5, with a p-value less than 0.01 and an R2 of 0.977, indicating that the OLS regression model as a whole is significant. NPs, ACL, and structure S67 are positively correlated with VI, and the other branched structures negatively; however, the coefficients (B) in Table 5 are negative for ACL and positive for S3′ and S8, which is not consistent with the results in the section Relationship between VI and Base Oil Structures. The reason is that the variance inflation factor (VIF) values of all variables range from 33.213 to infinity, which indicates strong multicollinearity between these variables. Therefore, the regression results of OLS do not reflect the true relationship between the molecular structure and VI, and new estimation methods are needed to overcome this deficiency.
Table 5

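Least-Squares Regression Coefficients

variable | B | β | VIF | R2 | F | P
constant | 0.1204 | | inf | 0.977 | 42.936 | 0.0004
NPs | 4.1059 | 0.929 | inf | | |
ACL | –3.6288 | –0.897 | 33.213 | | |
S2345 | –3.3136 | –0.842 | inf | | |
S3′ | 7.2571 | 0.384 | inf | | |
S67 | 2.9245 | 0.626 | inf | | |
S8 | 1.0670 | 0.158 | inf | | |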

Inf stands for infinity.
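
One way to see where the infinite VIF values come from is to note that, in the Table 4 data, NPs and the branched-structure fractions add up to about 100 for every sample, so the compositional inputs are almost exactly linearly dependent once an intercept is included. The sketch below checks this closure and computes VIFs as SS_tot/SS_res from auxiliary regressions. The grouping S2345 = S2 + S3 + S4 + S5 follows the text; the helper `vif` and the use of NumPy are choices made here, not the Minitab procedure. Because the tabulated values are rounded, the compositional VIFs come out very large rather than literally infinite, and the ACL value need not match 33.213 exactly.

```python
import numpy as np

# Rows of Table 4: NPs, ACL, S2, S3, S3', S4, S5, S67, S8, VI
table4 = np.array([
    [32.64, 22.22, 5.02, 5.08, 7.67, 5.08, 4.25, 30.67, 9.60, 142],
    [35.42, 27.66, 4.03, 3.63, 7.55, 3.91, 3.97, 32.99, 8.50, 150],
    [32.47, 19.21, 5.74, 5.45, 7.94, 5.85, 5.91, 25.50, 11.13, 133],
    [32.94, 21.73, 4.69, 4.82, 8.21, 4.82, 6.36, 28.10, 10.07, 142],
    [31.56, 23.78, 4.88, 4.53, 8.15, 4.70, 4.36, 32.07, 9.75, 151],
    [25.65, 17.00, 5.76, 7.48, 9.80, 7.15, 6.89, 22.31, 14.96, 103],
    [27.79, 19.65, 4.90, 6.26, 8.64, 5.85, 6.47, 25.73, 14.36, 120],
    [36.75, 29.50, 2.78, 3.22, 7.36, 3.46, 4.76, 31.66, 10.02, 154],
    [24.56, 17.57, 5.45, 7.44, 10.30, 6.58, 6.71, 22.27, 16.68, 109],
    [28.88, 20.75, 3.78, 4.83, 8.53, 5.10, 6.01, 29.30, 13.57, 138],
    [35.44, 29.03, 2.67, 3.05, 7.51, 3.31, 4.64, 30.98, 12.40, 154],
])

nps, acl = table4[:, 0], table4[:, 1]
s2345 = table4[:, [2, 3, 5, 6]].sum(axis=1)  # S2 + S3 + S4 + S5
s3p, s67, s8 = table4[:, 4], table4[:, 7], table4[:, 8]

names = ["NPs", "ACL", "S2345", "S3'", "S67", "S8"]
X = np.column_stack([nps, acl, s2345, s3p, s67, s8])

# Compositional closure: these fractions sum to ~100 for every sample.
closure = nps + s2345 + s3p + s67 + s8


def vif(X, j):
    """VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float("inf") if ss_res == 0.0 else ss_tot / ss_res  # = 1 / (1 - R_j^2)


vifs = {name: vif(X, j) for j, name in enumerate(names)}
print("closure sums:", np.round(closure, 2))
for name, value in vifs.items():
    print(f"VIF[{name}] = {value:.1f}")
```

ACL is not part of the closure constraint, which is consistent with it being the only input with a finite VIF in Table 5.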


Stepwise Regression

The stepwise regression method reduces the degree of multicollinearity by eliminating variables that are less important and highly correlated with other variables.[37,38] The model is fitted with NPs, ACL, and structures S2345, S3′, S67, and S8 as independent variables and VI as the dependent variable. The result shows that the model built from NPs and structure S67 passes the F-test with an R2 of 0.951, implying that NPs and structure S67 explain 95.1% of the variation in VI. From Table 6, the coefficient (B) for NPs is 1.796, while that for structure S67 is 2.850, indicating that the effect of structure S67 on VI is greater than that of NPs. Although n-paraffins have a higher VI than iso-paraffins of the same carbon number, the low carbon number distribution of the n-paraffins may be responsible for this phenomenon. However, ACL also has a strong positive correlation with VI, as shown in Figure 5; because the stepwise model omits ACL, its predicted VI may deviate considerably and generalize poorly when the ACL value changes substantially.
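Table 6

Stepwise Regression Coefficients

| Model | Variable | B | SE(B) | β | t | sig F | VIF | R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | constant | 10.473 | 14.023 | | 0.747 | 0.474 | | 0.901 |
| | S67 | 4.432 | 0.491 | 0.949 | 9.027 | 0.000 | 1.000 | |
| 2 | constant | −0.899 | 11.177 | | −0.080 | 0.938 | | 0.951 |
| | S67 | 2.850 | 0.662 | 0.610 | 4.303 | 0.003 | 3.277 | |
| | NPs | 1.796 | 0.627 | 0.406 | 2.865 | 0.021 | 3.277 | |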
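The selection logic behind Table 6 can be sketched as a forward-only procedure: at each step the candidate that most reduces the residual sum of squares is added, provided its partial F statistic exceeds an entry threshold. This is a simplified stand-in, not the authors' procedure: full stepwise regression also re-tests variables for removal, and the threshold `f_to_enter=4.0`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection using a partial F-to-enter threshold."""
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)
    while len(selected) < p:
        candidates = [c for c in range(p) if c not in selected]
        trial = {c: rss(X, y, selected + [c]) for c in candidates}
        best = min(trial, key=trial.get)
        # Residual degrees of freedom of the enlarged model (intercept included).
        df_resid = n - len(selected) - 2
        if df_resid <= 0:
            break
        f_stat = (current - trial[best]) / (trial[best] / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        current = trial[best]
    return selected

# Synthetic data: y depends on columns 0 and 1 only; column 2 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.1, 30)
selected = forward_select(X, y)
print(selected)
```

A variable left out by such a procedure contributes nothing to predictions, which is why the omission of ACL matters for samples whose ACL differs from the calibration set.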

Ridge Regression

An alternative regression model is developed to determine the contribution of each molecular structure to the VI, in other words, to decompose the VI into a weighted sum of the individual structures. To alleviate the effect of multicollinearity on the model and reduce the error in data processing, each variable is log-transformed, which is a common treatment. The resulting model is therefore log-linear: the logarithm of VI is expressed as a weighted sum of the logarithms of the structural variables.

Considering the multicollinearity among the molecular structures, a ridge regression approach was used for modeling. Although ridge regression is a biased estimation method, it does not require the elimination of variables, so the regression coefficients it yields are more realistic and reliable than those from stepwise regression. To represent the multicollinearity among the variables more intuitively, we calculated the Pearson correlation coefficients among the characteristic structural variables. The results are shown in Table 7. In statistics, the Pearson correlation coefficient measures the strength of the linear relationship between two variables. It ranges from −1 to +1, with 0 representing no linear correlation, negative values representing negative correlation, and positive values representing positive correlation. As seen in Table 7, the correlation coefficients between the variables range from 0.693 to 0.886 for the positively correlated pairs and from −0.971 to −0.645 for the negatively correlated pairs, which indicates that the multicollinearity among them is significant. Based on the ridge trace shown in Figure 6, 0.147 is used as the ridge parameter (k) because the coefficients stabilize around this value.
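Table 7

Pearson Correlation Coefficients

| Variables | NPs | ACL | S2345 | S3′ | S67 |
| --- | --- | --- | --- | --- | --- |
| ACL | 0.886 | | | | |
| S2345 | −0.900 | −0.971 | | | |
| S3′ | −0.948 | −0.804 | 0.867 | | |
| S67 | 0.834 | 0.860 | −0.928 | −0.863 | |
| S8 | −0.853 | −0.645 | 0.693 | 0.854 | −0.822 |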
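As a quick check on how strongly the variables are intercorrelated, the following sketch defines the Pearson coefficient and scans the 15 pairwise values transcribed from Table 7 for their extremes. The helper name `pearson` and the dictionary layout are illustrative choices, not part of the original work.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Pairwise coefficients transcribed from Table 7 (row variable, column variable).
table7 = {
    ("ACL", "NPs"): 0.886,
    ("S2345", "NPs"): -0.900, ("S2345", "ACL"): -0.971,
    ("S3'", "NPs"): -0.948, ("S3'", "ACL"): -0.804, ("S3'", "S2345"): 0.867,
    ("S67", "NPs"): 0.834, ("S67", "ACL"): 0.860,
    ("S67", "S2345"): -0.928, ("S67", "S3'"): -0.863,
    ("S8", "NPs"): -0.853, ("S8", "ACL"): -0.645, ("S8", "S2345"): 0.693,
    ("S8", "S3'"): 0.854, ("S8", "S67"): -0.822,
}

values = np.array(list(table7.values()))
strongest_positive = values.max()
strongest_negative = values.min()
weakest_magnitude = np.abs(values).min()
print(strongest_positive, strongest_negative, weakest_magnitude)
```

Every pair has an absolute correlation of at least 0.645, so no variable in the set is close to independent of the others.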
Figure 6

Ridge trace of the coefficient estimates of the ridge regression.
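A ridge trace of this kind can be produced with a few lines of linear algebra. The sketch below assumes the common convention in which the ridge parameter k is added to the diagonal of the predictor correlation matrix (that is, the variables are standardized first); the text does not state which convention or software was used. The data are synthetic and log-transformed only to mirror the modeling description, and `ridge_trace` is a hypothetical helper name.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Return standardized ridge coefficients, one row per ridge parameter k."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize predictors
    z_y = (y - y.mean()) / y.std(ddof=1)               # standardize response
    R = Z.T @ Z / (n - 1)                              # predictor correlation matrix
    r = Z.T @ z_y / (n - 1)                            # predictor-response correlations
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

# Synthetic, strongly collinear variables (hypothetical values, all positive).
rng = np.random.default_rng(0)
n = 11
latent = rng.uniform(0.0, 1.0, size=n)
X_raw = np.column_stack([
    20 + 10 * latent + rng.normal(0, 0.3, n),
    25 + 8 * latent + rng.normal(0, 0.3, n),
    30 - 12 * latent + rng.normal(0, 0.3, n),
])
vi_raw = 120 + 30 * latent + rng.normal(0, 1.0, n)

ks = np.array([0.0, 0.05, 0.147, 0.5, 1.0])
trace = ridge_trace(np.log(X_raw), np.log(vi_raw), ks)
norms = np.linalg.norm(trace, axis=1)
for k, beta in zip(ks, trace):
    print(f"k={k:.3f}  beta={np.round(beta, 3)}")
```

Plotting each column of `trace` against `ks` gives the ridge trace; k is usually chosen as the smallest value beyond which the curves flatten, since larger values add bias without further stabilizing the coefficients.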

The results of the ridge regression are shown in Table 8. The p-value of the regression model is 0.022, which is below the 0.05 significance level, so the null hypothesis is rejected, indicating that a regression relationship exists between the independent and dependent variables. Meanwhile, the goodness-of-fit R2 and the ANOVA table indicate that the model is adequate and reasonable. The experimental data versus the model regression values are shown in Figure 7.
Table 8

Model Result of Ridge Regression

ridge regression with k = 0.147
Mult.R: 0.9682305829
R-square: 0.9374704617
Adj. R-square: 0.8436761542
SE: 0.0554879472
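The statistics in Table 8 can be cross-checked arithmetically. The sketch below assumes the usual adjusted R2 formula, 1 − (1 − R2)(n − 1)/(n − p − 1), with p = 6 predictors (NPs, ACL, S2345, S3′, S67, and S8), and searches for the sample size n that reproduces the reported adjusted value. The sample size is an inference from these numbers, not a figure stated in this section.

```python
# Values reported in Table 8.
r_multiple = 0.9682305829
r2 = 0.9374704617
adj_r2_reported = 0.8436761542

def adjusted_r2(r2, n, p):
    """Adjusted R2 for n samples and p predictors (intercept fitted separately)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

p = 6  # assumed number of predictors
matches = [
    n for n in range(p + 2, 101)
    if abs(adjusted_r2(r2, n, p) - adj_r2_reported) < 1e-6
]
print(matches)                      # sample sizes consistent with the table
print(abs(r_multiple ** 2 - r2))    # multiple R squared vs. reported R-square
```

Under these assumptions only n = 11 matches, and the squared multiple R agrees with the reported R-square, so the table is internally consistent. The gap between R-square and adjusted R-square reflects how few residual degrees of freedom remain with six predictors and a small sample.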
Figure 7

Deviation between the experimental and predicted VI values by least squares, stepwise regression, and ridge regression.

The final model is built from the unstandardized coefficients B. The coefficients of the variables NPs, ACL, and structure S67 are positive, while the coefficients of the branched structures S2345, S3′, and S8 are negative, which is consistent with the conclusions obtained from the correlation analysis. The unstandardized coefficient B of each structure indicates how strongly a change in the content of that structure affects the VI, and it provides a rationale for developing CTL base oil products with desirable molecular structures.

Conclusions

In this study, the differences between CTL base oils and mineral base oils in terms of chemical structures and conventional properties were investigated. The main objective was to determine the effect of each structural feature of CTL base oils on VI, to combine the characteristic molecular structures with similar functions, and to develop a VI prediction model consistent with the physical significance of each structure. The main conclusions are as follows:

(1) The molecular structure of CTL base oils is simpler than that of mineral base oils, and the main components are iso-paraffins. The content of structure S67 differs most between the two base oils: its average content in the iso-paraffins is 29.866% in CTL base oil and 17.435% in mineral oil.

(2) CTL base oils have a higher VI but a lower flash point, density, and oxidation induction period than mineral base oils. In addition, according to the high-temperature simulated distillation tests, CTL base oil has a narrower distillation range, lower distillation temperature, and higher distillation efficiency than mineral base oil of similar carbon number distribution.

(3) From the correlation analysis, NPs, ACL, and structure S67 are the key factors behind the high viscosity index of CTL base oils, whereas an increase in the content of the other branched-chain structures reduces the viscosity index; among these, structure S3′ has the greatest impact.

(4) Based on the 13C NMR data, the stepwise regression model has an R2 of 0.951 and the ridge regression model has an R2 of 0.937. We nevertheless consider the ridge regression model more reliable because it retains all structural variables and yields regression coefficients consistent with the physical significance of the molecular structures, which should make the model generalize better.