Yang-Qiannan Tang1, Li Li1, Tian-Feng Lin1, Li-Mei Lin1, Ya-Mei Li1, Bo-Hou Xia1.
Abstract
Lonicerae japonicae Flos (LJF) and Lonicerae Flos (LF) are commonly used in Chinese patent drugs. In the Chinese Pharmacopoeia, LJF and LF were once listed under the same source, but since 2005 the two have been listed separately. As a result, they are often misused, and the two medicinal materials are used indiscriminately in their related prescriptions in China. In this work, we first established a model for discriminating LJF and LF using ATR-FTIR combined with multivariate statistical analysis. The spectral data were preprocessed with spectral filter transformations and normalization methods, and the pretreated data were used to build pattern recognition models with PLS-DA, RF, and SVM. The RF model performed best, with an overall classification accuracy of 98.86% for LJF and LF samples. The established model was then applied to discriminate their related prescriptions, where it also showed good accuracy and applicability: the RF model for discriminating prescriptions containing LJF or LF reached an accuracy of 100%. Our results suggest that this method is a rapid and effective tool for discriminating LJF and LF and their related prescriptions.
Keywords: ATR-FTIR; Lonicerae Flos; Lonicerae japonicae Flos; multivariate statistical analysis
Year: 2022 PMID: 35889512 PMCID: PMC9322902 DOI: 10.3390/molecules27144640
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. ATR-FTIR spectra of representative LJF and LF.
Peak assignments of the ATR-FTIR spectra of LJF and LF.
| LJF | LF | Vibration | Suggested Biomolecular Assignment |
|---|---|---|---|
| 4000–3500 | 4000–3500 | O-H, v | water |
| 3350 | 3357 | O-H, v | saccharides |
| 2920 | 2920 | CH2, CH3νas | lipids (cutin and waxes), proteins, carbohydrates |
| 2851 | 2850 | CH2, νs | lipids (cutin and waxes), proteins, carbohydrates |
| 2442–2208 | 2442–2208 | C-O-C, v | CO2 |
| 1730 | 1735 | C=O, v | lipids (cutin and waxes) |
| 1633 | 1629 | C-O, v C-N, v | amide I band |
| 1545 | Amide II bands | proteins | |
| 1528 | Amide II bands | phenolic acids, flavonoids | |
| 1440, 1374 | 1440, 1376 | O-H, v | organic acid, flavonoids |
| 1400 | C-H, δ | saccharides | |
| 1321 | 1314 | C-O, v | lipids, flavonoid |
| 1147 | 1152 | C-O, v CO-O-C, νas | cholesterol ester, oligosaccharides, triacylglycerols |
| 1047 | 1049 | C-O, v | starch |
| 930 | C-O-C, skeletal | saccharides | |
| 817 | 813 | C-H, δoop | |
| 781 | COO−, skeletal | saponins |
v—stretching, νs—symmetrical stretching, νas—asymmetrical stretching, δ—bending, δoop—bending out of the plane.
Performance of the RF, PLS-DA, and SVM models for discriminating LJF and LF samples under various normalization methods and spectral filter transformations.
| RF Model | |||||
|---|---|---|---|---|---|
| Pretreatment Methods | SENS | SPEC | ACC | MCC | AUC |
| No methods | 0.9167 | 0.8250 | 0.8750 | 0.7481 | 0.9190 |
| Vector (first) | 0.9706 | 1 | 0.9844 | 0.9692 | 0.9710 |
| Vector (second) | 0.9583 | 1 | 0.9773 | 0.9554 | 0.9630 |
| Min-max | 0.9375 | 0.9750 | 0.9545 | 0.9097 | 0.9368 |
| Area | 0.9500 | 0.9375 | 0.9432 | 0.8859 | 0.9491 |
| EWMA | 0.9167 | 0.9000 | 0.9091 | 0.8167 | 0.9090 |
| MSC | 0.9500 | 0.9583 | 0.9545 | 0.9083 | 0.9310 |
| RC | 0.9500 | 0.9583 | 0.9545 | 0.9083 | 0.9430 |
| S-G | 0.9250 | 0.9167 | 0.9205 | 0.8401 | 0.9168 |
| SNV | 0.9500 | 0.9375 | 0.9432 | 0.8859 | 0.9291 |
| airPLS | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9690 |
| PLS-DA Model | |||||
| Pretreatment methods | SENS | SPEC | ACC | MCC | AUC |
| No methods | 0.9412 | 1 | 0.9687 | 0.9393 | 0.9229 |
| Vector (first) | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9710 |
| Vector (second) | 0.9500 | 0.9583 | 0.9545 | 0.9083 | 0.9630 |
| Min-max | 0.9333 | 0.9667 | 0.9687 | 0.9389 | 0.9218 |
| Area | 0.9118 | 0.9706 | 0.9531 | 0.9104 | 0.9091 |
| EWMA | 0.9412 | 1 | 0.9687 | 0.9393 | 0.9290 |
| MSC | 0.9706 | 0.9667 | 0.9687 | 0.9373 | 0.9610 |
| RC | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9430 |
| S-G | 0.9412 | 1 | 0.9687 | 0.9393 | 0.9268 |
| SNV | 0.9667 | 0.9706 | 0.9687 | 0.9373 | 0.9491 |
| airPLS | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9690 |
| SVM Model | |||||
| Pretreatment methods | SENS | SPEC | ACC | MCC | AUC |
| No methods | 0.9500 | 0.9792 | 0.9659 | 0.9314 | 0.9390 |
| Vector (first) | 0.9792 | 0.9981 | 0.9716 | 0.9724 | 0.9710 |
| Vector (second) | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9630 |
| Min-max | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9668 |
| Area | 0.7250 | 0.5208 | 0.6136 | 0.2490 | 0.1291 |
| EWMA | 0.9500 | 0.9792 | 0.9659 | 0.9314 | 0.9290 |
| MSC | 0.9250 | 0.9792 | 0.9545 | 0.9089 | 0.9010 |
| RC | 0.9500 | 0.9583 | 0.9545 | 0.9083 | 0.9230 |
| S-G | 0.9500 | 0.9792 | 0.9659 | 0.9314 | 0.9168 |
| SNV | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9491 |
| airPLS | 0.9500 | 0.9792 | 0.9659 | 0.9314 | 0.9390 |
RF: random forest; PLS-DA: partial least squares discriminant analysis; SVM: support vector machine; EWMA: exponentially weighted moving average; MSC: multiplicative scatter correction; RC: row center; S-G: Savitzky–Golay; SNV: standard normal variate; airPLS: adaptive iteratively reweighted penalized least squares.
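Several of the pretreatments listed above are simple per-spectrum transformations. The following is a minimal sketch of four of them in plain Python, under stated assumptions: the source does not give the exact definitions used, so "Vector (first)" is taken here to mean scaling to unit Euclidean norm after a first derivative approximated by adjacent-point differences, and SNV uses the sample standard deviation. The function names and the `toy` spectrum are hypothetical and are not data from the study.

```python
import math

def first_difference(spectrum):
    """Approximate the first derivative by differences of adjacent points (assumed form)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean (L2) norm; an all-zero spectrum is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in spectrum))
    return [x / norm for x in spectrum] if norm else list(spectrum)

def min_max(spectrum):
    """Rescale a spectrum linearly so its minimum is 0 and its maximum is 1."""
    lo, hi = min(spectrum), max(spectrum)
    span = hi - lo
    return [(x - lo) / span for x in spectrum] if span else [0.0] * len(spectrum)

def snv(spectrum):
    """Standard normal variate: centre each spectrum on its own mean and divide by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]

# Hypothetical absorbance values at six consecutive wavenumbers (not from the study).
toy = [0.10, 0.12, 0.18, 0.30, 0.26, 0.15]

# "Vector (first)" under the assumptions above: differentiate, then normalize.
vec_first = vector_normalize(first_difference(toy))
```

Note that the differenced spectrum is one point shorter than the input. A Savitzky–Golay derivative would keep the length and smooth at the same time, and may be closer to what spectroscopy software applies in practice.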
The classification results and evaluation parameters between LJF and LF combined with RF, PLS-DA, and SVM models by vector normalization applied after the first differentiation.
| Calibration Set | SENS | SPEC | ACC | MCC | AUC |
|---|---|---|---|---|---|
| RF | 0.9706 | 1 | 0.9844 | 0.9692 | 0.9775 |
| PLS-DA | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9546 |
| SVM | 0.9792 | 0.9981 | 0.9716 | 0.9724 | 0.9668 |
| Validation set | SENS | SPEC | ACC | MCC | AUC |
| RF | 0.9706 | 0.9981 | 0.9744 | 0.9592 | 0.9425 |
| PLS-DA | 0.9250 | 0.9792 | 0.9545 | 0.9089 | 0.9006 |
| SVM | 0.9500 | 0.9792 | 0.9659 | 0.9314 | 0.9218 |
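The SENS, SPEC, ACC, and MCC columns in these tables are standard functions of a binary confusion matrix. The sketch below shows the usual definitions. The source does not report confusion-matrix counts or say which class is treated as positive, so the counts used here (33 true positives, 1 false negative, 30 true negatives, 0 false positives) are an assumption, chosen only because they are consistent with the RF calibration row. AUC is omitted because it requires per-sample scores rather than counts.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sens = tp / (tp + fn)                      # true positive rate
    spec = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)      # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SENS": sens, "SPEC": spec, "ACC": acc, "MCC": mcc}

# Assumed counts (not given in the source); see the paragraph above.
m = binary_metrics(tp=33, fn=1, tn=30, fp=0)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the four values round to 0.9706, 1.0000, 0.9844, and 0.9692, which agrees with the RF calibration row. Other count combinations could give the same rounded figures, so this is an illustration of the formulas rather than a reconstruction of the study's data.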
Parameter screening of the RF model (ntree and mtry), with variables ranked by permutation accuracy importance.
| ntree | SENS | SPEC | ACC | MCC | AUC |
|---|---|---|---|---|---|
| 100 | 0.9750 | 0.9792 | 0.9773 | 0.9554 | 0.9390 |
| 200 | 0.9286 | 1 | 0.9583 | 0.9188 | 0.9010 |
| 300 | 0.9706 | 1 | 0.9844 | 0.9692 | 0.9775 |
| 500 | 0.9583 | 0.9750 | 0.9659 | 0.9316 | 0.9168 |
| 800 | 0.9583 | 0.9750 | 0.9659 | 0.9316 | 0.9168 |
| 1000 | 0.9286 | 1 | 0.9583 | 0.9188 | 0.9018 |
| mtry | SENS | SPEC | ACC | MCC | AUC |
| 82 | 0.9583 | 1 | 0.9773 | 0.9554 | 0.9390 |
| 84 | 0.9583 | 0.9750 | 0.9659 | 0.9316 | 0.9168 |
| 86 | 0.9792 | 1 | 0.9886 | 0.9774 | 0.9875 |
| 88 | 0.9706 | 1 | 0.9844 | 0.9692 | 0.9775 |
| 90 | 0.9583 | 0.9750 | 0.9659 | 0.9316 | 0.9168 |
| 92 | 0.9583 | 1 | 0.9773 | 0.9554 | 0.9390 |
| 94 | 0.9792 | 0.9750 | 0.9773 | 0.9542 | 0.9390 |
| 96 | 0.9583 | 0.9750 | 0.9659 | 0.9316 | 0.9168 |
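One simple way to read the screening table above is to pick, for each parameter, the value with the highest accuracy. The sketch below does that using the ACC column copied from the table. The source does not state which criterion the authors used to choose the final parameters, so selecting by ACC alone is an assumption, and the variable names are hypothetical.

```python
# ACC values copied from the parameter screening table (ntree and mtry blocks).
ntree_acc = {100: 0.9773, 200: 0.9583, 300: 0.9844,
             500: 0.9659, 800: 0.9659, 1000: 0.9583}
mtry_acc = {82: 0.9773, 84: 0.9659, 86: 0.9886, 88: 0.9844,
            90: 0.9659, 92: 0.9773, 94: 0.9773, 96: 0.9659}

# Assumed selection rule: highest accuracy wins.
best_ntree = max(ntree_acc, key=ntree_acc.get)
best_mtry = max(mtry_acc, key=mtry_acc.get)

print(f"best ntree by ACC: {best_ntree} (ACC={ntree_acc[best_ntree]})")
print(f"best mtry by ACC:  {best_mtry} (ACC={mtry_acc[best_mtry]})")
```

Under this rule the table points to ntree = 300 and mtry = 86. The ACC at mtry = 86 is 0.9886, the same figure as the 98.86% overall accuracy quoted in the abstract.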
List of permutation parameters of the random forest model obtained using variables selected by vector normalization applied after the first differentiation.
| Normalization Method | SENS | SPEC | ACC | MCC | AUC |
|---|---|---|---|---|---|
| Vector (First) | |||||
| 4000–600 cm−1 except for water vapor, carbon dioxide region | 0.9750 | 0.9792 | 0.9773 | 0.9542 | 0.9425 |
| 2000–600 cm−1 | 0.9792 | 0.9750 | 0.9773 | 0.9542 | 0.9390 |
| 4000–2000 cm−1 | 0.9583 | 1 | 0.9773 | 0.9554 | 0.9390 |
| 4000–600 cm−1 | 0.9706 | 1 | 0.9844 | 0.9692 | 0.9775 |
Various VIP cutoff values using 4000–600 cm−1 wavenumber areas for the comparison of LJF and LF.
| VIP Cutoff | SENS | SPEC | ACC | MCC | AUC |
|---|---|---|---|---|---|
| 0.05 | 0.9412 | 1 | 0.9688 | 0.9393 | 0.9168 |
| 0.01 | 0.9750 | 1 | 0.9886 | 0.9773 | 0.9775 |
| 0.015 | 0.9512 | 0.9867 | 0.9731 | 0.9465 | 0.9425 |
| 0.020 | 0.9000 | 0.9706 | 0.9375 | 0.8758 | 0.8625 |