
Determination of total organic carbon and soluble solids contents in Tanreqing injection intermediates with NIR spectroscopy and chemometrics.

Wenlong Li, Haibin Qu.

Abstract

Near infrared (NIR) spectroscopy combined with chemometrics was investigated for the fast determination of the total organic carbon (TOC) and soluble solids contents (SSC) of Tanreqing injection intermediates. The NIR spectra were collected in transflective mode, and the TOC and SSC reference values were determined with a Multi N/C UV HS analyzer and the loss on drying method, respectively. The samples were divided into calibration and validation sets using the Kennard-Stone (KS) algorithm. The Dixon test and the leverage and studentized residual test were used for outlier analysis. The wavebands, the spectral pretreatment method and the number of latent variables were optimized to obtain better results. Quantitative calibration models were established with 3 different PLS regression algorithms, namely linear PLS, non-linear PLS and concentration weighted PLS, and the net result was defined as the average of the values predicted by the different calibration models. The overall results indicated that the presented method is more powerful than a single multivariate regression method, characterized by a higher mean recovery rate (MRR) of the validation set, and that it can be used for the rapid determination of the TOC and SSC values of Tanreqing injection intermediates, two important quality indicators for process monitoring.
Copyright © 2015 Elsevier B.V. All rights reserved.

Keywords:  Consensus strategy; Near infrared spectroscopy; Soluble solids contents; Tanreqing injection intermediates; Total organic carbon

Year:  2016        PMID: 32287621      PMCID: PMC7114577          DOI: 10.1016/j.chemolab.2015.12.018

Source DB:  PubMed          Journal:  Chemometr Intell Lab Syst        ISSN: 0169-7439            Impact factor:   3.491


Introduction

Tanreqing injection is a widely used Chinese patent drug that is effective in treating upper respiratory infection and has played an important part in the prevention and treatment of Severe Acute Respiratory Syndrome (SARS) and influenza A (H1N1) [1], [2], [3]. It is made from five Traditional Chinese Medicine (TCM) materials, namely Radix Scutellariae, Forsythia suspensa, Flos Lonicerae, bear gall powder, and Cornu gorais. During the manufacture of Tanreqing injection, many technological processes and intermediates, such as those of the extraction, separation, purification, and liquid preparation processes, need online monitoring or rapid analysis for process control and real time release (RTR). Near infrared (NIR) spectroscopy is an ideal process analytical tool: it is fast, determines multiple indices simultaneously, and requires no chemical reagents. NIR spectroscopy is viewed as a green analytical technology and is increasingly applied in the quality control and process analysis of TCM products [4], [5], [6]. The test targets include physical, chemical and biological indices, which are selected according to the specific requirements of the industrial production. In NIR spectral analysis, the most important step is building the calibration models, which correlate the independent variable x (spectral data) with the dependent variable y (test targets). Various regression algorithms have been studied, including linear regression methods, such as multiple linear regression (MLR) and principal component regression (PCR), as well as nonlinear regression methods, such as artificial neural networks (ANN), support vector machines (SVM) and Gaussian processes. Among these regression algorithms, partial least-squares regression (PLSR) is the most widely used, owing to its clear physical meaning and its ability to accommodate both linear and nonlinear correlations between the independent and dependent variables.
In view of the different types of relationship between X and Y, a series of PLS-based regression algorithms have been developed, such as linear PLS, non-linear PLS, and concentration weighted PLS [7], [8], [9]. In NIR spectral analysis, linear and nonlinear correlations between the independent and dependent variables always coexist. In such cases, different kinds of regression models can be established separately, and the net result can be obtained through the consensus of these models. This can be viewed as a consensus strategy, which differs from consensus modeling in the traditional sense [10], [11], [12]. In this paper, NIR spectroscopy was adopted for the rapid determination of total organic carbon (TOC) and soluble solids contents (SSC) in 4 kinds of Tanreqing injection intermediates. During calibration, different chemometric methods were applied, and the final result was calculated through the consensus strategy of linear PLSR, non-linear PLSR, and concentration weighted PLSR models, which were established independently.

Materials and methods

Experimental materials

The samples were all collected from a TCM pharmaceutical factory (Shanghai Kaibao Pharmaceutical CO., LTD., Shanghai, China) during the liquid preparation process of Tanreqing injection. In every batch, after the purified TCM extracts were put into the solvent feed tank, 10 ml of the mixture was taken as sample 1. The pH of the mixture in the tank was then adjusted with NaOH solution and the mixture was decolored with active carbon, after which 10 ml of the filtrate was taken as sample 2. The remaining filtrate was filtered again through a triple filter, and 10 ml of the filtrate was taken as test sample 3. The remaining filtrate was then hyperfiltered and made into the semi-finished product, and 10 ml of the semi-finished product was collected as test sample 4. For confidentiality, the detailed technical parameters are omitted here. These 4 kinds of intermediate test samples were collected from the production line over 30 batches, giving 120 samples in total, and part of the samples were selected at random for the TOC and SSC reference analysis.

Instruments

The NIR spectra of the samples were collected at 4 cm⁻¹ intervals over the spectral region 4000–10,000 cm⁻¹ with an Antaris MX FT-NIR System (Thermo Scientific, Madison, USA) equipped with an optical fiber transflectance adapter. Samples were scanned with a 2 mm path length after being equilibrated at 25 °C for 10 min, so that all samples were analyzed at the same temperature, and each spectrum was obtained by averaging 64 scans. The NIR instrument was controlled by a compatible PC, and a RESULT Operation workstation was used for data acquisition. The spectra of all the samples collected under the selected instrument conditions are shown in Fig. 1.
Fig. 1

The raw NIR spectra of Tanreqing Injection intermediates.


Measurement of the reference values

The TOC values of the samples were determined with a Multi N/C UV HS analyzer (Analytik Jena AG, Germany) in NPOC mode. Before the determination, N = 66 samples were diluted 1000 times and digested using the persulfate oxidation method with the assistance of a double-wavelength high-intensity ultraviolet lamp. The standard curve was established using a series of phthalic acid solutions of different concentrations. The SSC reference values of N = 100 samples were determined with the loss on drying method under normal pressure according to the Chinese Pharmacopeia (2010 edition).

Data processing method

All the computations, including selection of the spectral wavebands, mathematical pretreatment, principal component analysis and partial least squares regression, were performed using the TQ Analyst software package (Version 8.0, Thermo Scientific, Madison, USA), The Unscrambler (Version 10.1, Camo Inc., Trondheim, Norway), and MATLAB (Version 7.13, MathWorks, USA).

Results and discussion

Elimination of outlier samples

Outlier samples are samples with abnormal spectra or reference values. Several factors may lead to unusable spectra, including an unreliable spectra collection method, fluctuations in the testing environment, small mistakes by the operating personnel, and significant differences in the sample matrix [13]. Dixon's test and the leverage diagnostic method have been proven effective for outlier identification [14]. In Dixon's test, the spectral data were first processed by PCA, and the Mahalanobis distance between the average spectrum and every spectrum was then calculated with the first few PCs. The samples were sorted according to the Mahalanobis distance, and samples that differed significantly from the average spectrum were considered outliers. The results of Dixon's test in this research are shown in Fig. 2. The leverage diagnostic shows the relationship between the leverage and residual values for each component and each sample in a PLSR calibration model. The leverage value reflects how much a standard influences the model, and the studentized residual is the residual of a spectrum after it has been scaled by the leverage value. The residual values plotted by the leverage diagnostic have been divided by their standard error to produce the studentized residuals, which places all samples on a similar scale regardless of their leverage values and makes it easier to identify standards that may be outliers. Fig. 3 illustrates how the leverage values compare with the studentized residual values. In the present study, we provisionally treated the samples with significant differences as outliers and then reinstated them one by one. If the model performance deteriorated after a certain sample was reinstated, that sample was deemed an outlier and removed from the calibration set. Otherwise, the sample was still regarded as a normal one.
Fig. 2

The Dixon's test result of the Tanreqing injection intermediate samples.

Fig. 3

The leverage and the Studentized residual values of the Tanreqing injection intermediate samples.

After Dixon's test and the leverage diagnostic test, 4 samples for TOC and 6 samples for SSC differed significantly from the average of the calibration samples; these samples were therefore identified as outliers and removed from the calibration set. The remaining 66 samples for TOC and 104 samples for SSC were used as normal samples.
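
The distance-ranking step described above can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function names and the default of three principal components are hypothetical, and the TQ Analyst implementation used in the study may differ in detail.

```python
import numpy as np

def mahalanobis_in_pc_space(spectra, n_components=3):
    """Distance of each spectrum from the mean spectrum, measured on the first PCs."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre on the average spectrum
    # SVD of the centred data gives the PCA scores as U * S
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # PC scores are uncorrelated, so their covariance matrix is diagonal
    variances = S[:n_components] ** 2 / (X.shape[0] - 1)
    return np.sqrt((scores ** 2 / variances).sum(axis=1))

def rank_by_distance(spectra, n_components=3):
    """Return (indices sorted from most to least distant, distances)."""
    d = mahalanobis_in_pc_space(spectra, n_components)
    return np.argsort(d)[::-1], d
```

Samples at the top of the ranking are only candidates: following the procedure above, each one would still be reinstated in turn to check whether the model performance actually deteriorates.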

Division of calibration and validation sets

All the remaining samples were split into calibration and validation sets by the Kennard–Stone (KS) algorithm [15]; the former was used to build the regression models and the latter to evaluate model performance. The KS algorithm first calculates the Euclidean distance between every pair of samples and places the two samples that are farthest apart in the calibration set. It then follows a stepwise procedure: at each step, the candidate whose minimum distance to the samples already in the calibration set is largest is added, until the number of samples specified by the analyst is reached. With this algorithm, about two thirds of the samples were chosen to form the calibration set, while the remaining third went into the validation set. The content ranges of TOC and SSC are listed in Table 1, from which it can be concluded that a sufficiently representative calibration set was selected from the pool of samples.
Table 1

Statistics of parameters in calibration sets and validation sets.

Quality parameters   Total sets                Calibration sets          Validation sets
                     Mean      Range           Mean      Range           Mean      Range
SSC (%)              2.55      2.27–2.79       2.53      2.27–2.79       2.58      2.27–2.76
TOC (mg/L)           55.4e3    52.9e3–59.7e3   54.9e3    52.9e3–59.7e3   55.6e3    53.2e3–59.1e3
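
The Kennard–Stone selection described in this section can be sketched in a few lines of plain Python. This is an illustrative version only: the function name and the list-of-lists input format are assumptions, and in practice the distances would be computed on spectra or PCA scores.

```python
import math

def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: dist[pair[0]][pair[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # For each candidate, take the distance to its nearest selected sample,
        # then pick the candidate for which that distance is largest
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For a split like the one used here, `n_select` would be set to roughly two thirds of the sample count, and the indices not returned would form the validation set.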

Comparison of different PLSR methods

Because linear and nonlinear relationships between the spectral data and the reference values of TOC and SSC coexist, three regression methods were adopted to establish the calibration models: linear PLS, non-linear PLS and concentration weighted PLS. Table 2 lists the RMSEC, RMSECV, RMSEP, and the related R values for all the calibration models of TOC and SSC, from which it is easy to see that the three PLS methods provide similar results. Detailed formulas for these parameters can be found in Ref. [16].
Table 2

The performance parameters of the models established with different PLSR methods.

                                  Calibration         Cross-validation    Prediction
QIs   Regression methods   LVs    RC       RMSEC      RCV      RMSECV     RP       RMSEP
TOC   PLS                  9      0.9975   0.134e3    0.9491   0.622e3    0.9689   0.410e3
TOC   WPLS                 9      0.9870   0.317e3    0.9487   0.624e3    0.9507   0.385e3
TOC   NPLS                 9      0.9801   0.328e3    0.9588   0.606e3    0.9533   0.379e3
SSC   PLS                  10     0.9918   0.0172     0.9574   0.0396     0.9335   0.0413
SSC   WPLS                 10     0.9905   0.0188     0.9578   0.0390     0.9509   0.0431
SSC   NPLS                 10     0.9908   0.0185     0.9464   0.0386     0.9519   0.0428
It is well known that, in PLS calibration, the most important issues are the selection of wavebands, the spectral pretreatment method and the determination of the most suitable number of latent variables (LVs). The optimization of each is therefore discussed in the following sections.
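
For readers who want to reproduce figures of merit like those in Table 2, a minimal sketch is given below. It assumes the common definitions (root mean square error with division by the number of samples, and the Pearson correlation coefficient); the exact formulas used in the paper are those of Ref. [16] and may differ, for example in the degrees of freedom. The function names are hypothetical.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values (divides by n)."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def correlation(reference, predicted):
    """Pearson correlation coefficient R between reference and predicted values."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)
```

Applied to calibration samples, cross-validation predictions or validation samples, the same `rmse` function would give RMSEC, RMSECV or RMSEP, respectively.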

Selection of wavelengths

Different methods, including the Monte Carlo uninformative variable elimination method (MC-UVE) [17] and the correlation coefficient method [18], were compared. However, the performance parameters of the calibration models were not remarkably improved. The wavebands with clear physical meaning and a high signal-to-noise ratio were therefore adopted [19], [20], [21], [22], [23]: 8932.66–4818.64 cm⁻¹ for the TOC calibration model and 9002.08–4839.28 cm⁻¹ for the SSC calibration model.

Spectra pre-processing approach

For the spectral data preprocessing, different combinations were compared to optimize the calibration models, including the Savitzky–Golay algorithm and wavelets for noise removal, first-order and second-order derivatives for baseline elimination, multiplicative scatter correction (MSC) and standard normal variate (SNV) for correcting light-scatter effects, and the orthogonal signal correction (OSC) method [24], [25], [26], [27]. The comparison of the different spectral pretreatment methods is listed in Table 3.
Table 3

The comparison of different calibration models with different spectra pretreated methods.

                       TOC                                    SSC
Pretreated methods     RC       RMSEC     RP       RMSEP      RC       RMSEC    RP       RMSEP
Raw spectra            0.9999   21.1      0.9248   0.784e3    0.9612   0.0377   0.9044   0.0498
OSC                    0.9477   0.188e3   0.9630   0.408e3    0.9901   0.0212   0.9289   0.0407
Wavelet + 1d           0.9322   0.129e3   0.9083   0.499e3    0.9218   0.0205   0.9387   0.0593
Wavelet + 2d           0.9778   0.192e3   0.9365   0.582e3    0.9701   0.0222   0.9054   0.0613
SNV + SG(7,3) + 1d     0.9822   0.360e3   0.4678   0.167e4    0.6802   0.1000   0.1906   0.1200
MSC + ND(15,5) + 2d    0.9975   0.134e3   0.9689   0.410e3    0.9918   0.0172   0.9335   0.0413

Bias = −0.0202.
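
Two of the scatter corrections compared in Table 3 can be sketched as below, assuming NumPy and spectra stored one per row. The function names are hypothetical, and the plain finite-difference derivative is only a simplified stand-in for the Savitzky–Golay and Norris derivative filters used in the study.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and std."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: the mean spectrum)."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        # Fit x ≈ intercept + slope * ref, then remove the offset and scale effects
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected

def first_derivative(spectra, step=1.0):
    """Finite-difference first derivative along the wavenumber axis (no smoothing)."""
    return np.gradient(np.asarray(spectra, dtype=float), step, axis=1)
```

A pipeline such as "MSC + derivative" would then chain these calls, for example `first_derivative(msc(X), step=4.0)` for spectra sampled at 4 cm⁻¹ intervals.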

Determination of the optimum latent variable numbers

Owing to the multicollinearity among the NIR spectroscopic data, and also the autocorrelation with the dependent variables, dimensionality reduction is necessary. In PLSR calibration, the optimum number of latent variables (LVs) is usually determined by leave-one-out (LOO) cross-validation: the spectrum of one sample in the calibration set is removed, a PLS regression model is built with the remaining spectra, the left-out sample is predicted with this model, and the procedure is repeated until each sample in the calibration set has been left out once. The predicted residual error sum of squares (PRESS), which is the sum of the squared deviations between the predicted and reference values of all the samples in the LOO cross-validation, decreases as the number of LVs increases; when the PRESS value levels off or begins to increase gradually, the optimum number of LVs has been reached [28]. For both SSC and TOC, the optimum LVs were determined by the LOO cross-validation method, and the PRESS-LVs correlation diagrams are shown in Fig. 4.
Fig. 4

The PRESS-LVs correlation diagrams of SSC and TOC calibration models.
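
The leave-one-out PRESS calculation described in this section can be sketched as follows. This is an illustration under stated assumptions: it uses NumPy and a basic single-response NIPALS PLS written for the example (the names `pls1_fit` and `press_curve` are hypothetical), not the TQ Analyst or Unscrambler routines used in the study.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model by NIPALS; return (coef, x_mean, y_mean)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred spectra and reference values
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f                          # weight vector (covariance direction)
        w /= np.linalg.norm(w)
        t = E @ w                            # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                     # spectral loadings
        c = f @ t / tt                       # regression of y on the scores
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original variables
    return coef, x_mean, y_mean

def press_curve(X, y, max_lv):
    """Leave-one-out PRESS for models with 1..max_lv latent variables."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    press = np.zeros(max_lv)
    for i in range(n):
        keep = np.arange(n) != i             # leave sample i out
        for k in range(1, max_lv + 1):
            coef, x_mean, y_mean = pls1_fit(X[keep], y[keep], k)
            pred = (X[i] - x_mean) @ coef + y_mean
            press[k - 1] += (y[i] - pred) ** 2
    return press
```

One simple selection rule is `int(np.argmin(press)) + 1`; the criterion described above, which stops where PRESS levels off, may choose fewer LVs than the strict minimum.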


Calibration, optimization and validation of the models

Using the optimum parameters selected according to the procedures described above, the PLSR quantitative calibration models of TOC and SSC were established. It can be concluded from Table 2 that the coefficients of determination for calibration, cross-validation, and external validation are all above 0.9, which shows a satisfactory correlation between the true values and the model-predicted values. For all 6 models, the RMSEP/RMSECV values are close to 1, which indicates that the calibration models have satisfactory robustness. The correlation diagrams and the residual plots of the PLSR calibration models are shown in Fig. 5.
Fig. 5

The correlation diagrams and the residuals plots of linear PLSR models of TOC and SSC.
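
The RMSEP/RMSECV ratio used above as a robustness indicator can be checked directly from the Table 2 entries. The short sketch below only restates that arithmetic; the dictionary layout and function name are illustrative choices.

```python
# (RMSECV, RMSEP) pairs transcribed from Table 2; TOC in mg/L, SSC in %
TABLE2 = {
    ("TOC", "PLS"): (0.622e3, 0.410e3),
    ("TOC", "WPLS"): (0.624e3, 0.385e3),
    ("TOC", "NPLS"): (0.606e3, 0.379e3),
    ("SSC", "PLS"): (0.0396, 0.0413),
    ("SSC", "WPLS"): (0.0390, 0.0431),
    ("SSC", "NPLS"): (0.0386, 0.0428),
}

def rmsep_over_rmsecv(table):
    """Ratio of prediction error to cross-validation error for each model."""
    return {key: rmsep / rmsecv for key, (rmsecv, rmsep) in table.items()}

ratios = rmsep_over_rmsecv(TABLE2)
for (qi, method), ratio in ratios.items():
    print(f"{qi} {method}: RMSEP/RMSECV = {ratio:.3f}")
```

With these entries the SSC ratios fall between about 1.0 and 1.1, and the TOC ratios between about 0.6 and 0.7.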


Application of the calibration models

In earlier studies, we developed a series of NIR spectroscopy based methods for the rapid analysis of Tanreqing injection and its intermediates [29], [30], [31]. However, the indices analyzed were mostly organic compounds. Although those active pharmaceutical ingredients (API) directly influence the effects of Tanreqing injection, it is impractical to use them as monitoring and control indices, because it is difficult to find a definite link between them and the operating parameters. SSC and TOC, by contrast, are two Critical Quality Attributes (CQAs) of the Tanreqing injection intermediates, which indicate the physical property and the chemical composition, respectively. The two CQAs need to be determined promptly and accurately during the liquid preparation process of Tanreqing injection, a demand that traditional analytical technology cannot meet. The present NIR spectroscopy based methods have been successfully applied on the manufacturing lines and solve this problem well. In the course of industrial application, we found that the analysis results can be more accurate with the consensus strategy of different PLSR models, in which the net result is defined as the average of the predictions of the different PLSR models. To prove this, the mean recovery rate (MRR), which can be calculated with the following formula, was proposed to indicate the prediction accuracy of the individual models and of the consensus strategy. Taking the validation samples as an example, the MRRs are listed in Table 4, from which it can be concluded that the MRR of the consensus strategy is always closer to 100% than those of the individual models, except for the linear PLSR model of TOC. Therefore, in practical applications, after the spectra of the samples are collected, the TOC and SSC values are calculated separately with the three PLSR models, and analysis of variance (ANOVA) is then used to test whether there are significant differences among them. If not, the average of the three models' values is taken as the final result; if there are significant differences, the standard analysis procedure should be carried out to obtain the standard values.
Table 4

The mean recovery rates of the individual models and the consensus of different PLSR models.

QIs   Regression methods   Mean recovery rate (%)
TOC   Linear PLS           98.26
TOC   WPLS                 97.53
TOC   NPLS                 104.05
TOC   Consensus            102.11
SSC   Linear PLS           104.17
SSC   WPLS                 102.22
SSC   NPLS                 96.66
SSC   Consensus            102.15
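
The decision rule described in this section (average the three model predictions unless ANOVA finds a significant difference among them) can be sketched as below. Several points are assumptions made for illustration: each model's predictions over a set of samples are treated as one ANOVA group, the critical F value is supplied by the user from an F table, and the function names are hypothetical. The MRR itself is not computed here, because its formula is not reproduced in this text.

```python
def consensus(predictions):
    """Average the predictions of several models, sample by sample."""
    return [sum(values) / len(values) for values in zip(*predictions)]

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        # Degenerate case: no scatter inside the groups
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def final_result(predictions, f_critical):
    """Return the consensus values, or None if the models differ significantly."""
    if anova_f(predictions) > f_critical:
        return None  # fall back to the standard analysis procedure
    return consensus(predictions)
```

For three models and three samples the degrees of freedom are (2, 6), for which the tabulated critical value at the 0.05 level is 5.14, so a call might look like `final_result([pls_pred, wpls_pred, npls_pred], 5.14)`.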

Conclusion

In the present study, an NIR spectroscopy based method was developed for the rapid analysis of Tanreqing injection intermediates. The overall results indicated that NIR spectroscopy is capable of determining SSC and TOC, and that with the consensus modeling strategy the determination results can be more accurate and reliable. TOC and SSC are two important quality indices in most liquid product lines of TCM; the presented method is therefore effective and merits wider adoption.

Conflict of interest

The authors declared that they have no conflicts of interest to this work.