Xiaolong Li1, Wenwen Kong2, Xiaoli Liu3,4, Xi Zhang3, Wei Wang1, Rongqin Chen1, Yongqi Sun5, Fei Liu1,6.
Abstract
Accurate identification of geographical origin is of great significance for ensuring the quality of traditional Chinese medicine (TCM). Laser-induced breakdown spectroscopy (LIBS) was applied to rapidly identify the geographical origin of wild Gentiana rigescens Franch (G. rigescens Franch). However, LIBS spectra with too many variables can increase model training time and reduce discrimination accuracy. To address these problems, we proposed two methods. The first reduced the number of variables through two consecutive variable selections. The second transformed the spectrum into a spectral matrix by spectrum segmentation and recombination. Combined with a convolutional neural network (CNN), both methods improved discrimination accuracy. For the underground parts of G. rigescens Franch, the optimal prediction-set accuracies of the two methods were 92.19% and 94.01%, respectively. For the aerial parts, both methods reached the same accuracy of 94.01%. A saliency map was used to explain the rationality of the discriminant analysis by the CNN combined with the spectral matrix. The first method could provide some support for the development of portable LIBS instruments. The second method could offer a reference for the discriminant analysis of LIBS spectra with too many variables through end-to-end CNN learning. The present results demonstrate that LIBS combined with CNN is an effective tool for quickly identifying the geographical origin of G. rigescens Franch.
Keywords: Gentiana rigescens franch; convolutional neural network; geographical origin identification; spectral matrix; variable importance measured
Year: 2021 PMID: 34957390 PMCID: PMC8703168 DOI: 10.3389/frai.2021.735533
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
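The second method in the abstract, turning a spectrum into a spectral matrix by segmentation and recombination, can be illustrated with a minimal sketch. The abstract does not state the segment length, the stacking order, or how a final incomplete segment is handled, so the row-wise stacking, the zero padding, and the names `spectrum_to_matrix` and `width` are all assumptions made for illustration.

```python
def spectrum_to_matrix(spectrum, width):
    """Cut a 1D spectrum into consecutive segments of `width` points
    and stack the segments as rows of a 2D matrix.

    Assumption: the last segment is zero-padded when the spectrum
    length is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n_rows = -(-len(spectrum) // width)  # ceiling division
    padding = [0.0] * (n_rows * width - len(spectrum))
    padded = list(spectrum) + padding
    return [padded[r * width:(r + 1) * width] for r in range(n_rows)]


# Toy spectrum with 10 intensity values (hypothetical data).
spectrum = [float(i) for i in range(10)]
matrix = spectrum_to_matrix(spectrum, width=4)
for row in matrix:
    print(row)
```

A matrix built this way can be fed to a 2D convolutional network like an image, which is what makes end-to-end learning on the full spectrum possible without a prior variable-selection step.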
FIGURE 1. The schematic diagram of the laser-induced breakdown spectroscopy (LIBS) experiment.
FIGURE 2. The geographical origin identification flowchart using convolutional neural network including variable selection and end-to-end learning based on spectral matrix.
FIGURE 3. The architectures of the proposed classification models: (A) the architecture of the Convolution 1Block; (B) the architecture of the Dense Block; (C) the architecture of the Convolution 2Block; (D) the architecture of the Residual Block; (E) the architecture of the 1D-CNN1; (F) the architecture of the 1D-CNN2; (G) the architecture of the 2D-CNN.
FIGURE 4. The average LIBS spectra of G. rigescens Franch from 12 geographical origins: (A) underground parts and (B) aerial parts.
FIGURE 5. The average LIBS spectra for underground parts of G. rigescens Franch with (A) original variables and (B) reserved variables by standard deviation. Note: The variables with near-zero standard deviation were set as “0” and colored black for better comparison.
FIGURE 6. The selected variables by VIM using RF in LIBS spectra for (A) underground parts and (B) aerial parts of G. rigescens Franch.
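The Figure 5 caption refers to variables reserved by standard deviation, with near-zero-deviation variables discarded. A minimal sketch of such a filter follows. The threshold value, the use of the population standard deviation, and the names `select_by_std` and `threshold` are assumptions; the record does not give the cutoff actually used.

```python
from statistics import pstdev


def select_by_std(spectra, threshold):
    """Keep only the variables (columns) whose standard deviation
    across samples exceeds `threshold`.

    `spectra` is a list of samples, each a list of intensities.
    Returns the kept column indices and the reduced spectra.
    """
    n_vars = len(spectra[0])
    keep = [
        j for j in range(n_vars)
        if pstdev([row[j] for row in spectra]) > threshold
    ]
    reduced = [[row[j] for j in keep] for row in spectra]
    return keep, reduced


# Three toy samples with three variables each (hypothetical data).
# Column 0 is constant, column 2 barely changes, column 1 varies.
spectra = [
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 0.0],
    [1.0, 9.0, 0.001],
]
keep, reduced = select_by_std(spectra, threshold=0.01)
print(keep)
print(reduced)
```

Variables that hardly change between samples carry little information for separating origins, so dropping them shortens training without removing discriminative signal.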
The results of discriminant models based on underground parts of G. rigescens Franch using full spectra and selected variables.
| Variable selection method (number of variables) | Model | Cal (%) | Val (%) | Pre (%) |
|---|---|---|---|---|
| Full variables (22015) | LDA | 73.35 | 65.89 | 59.90 |
| | KNN | 89.46 | 68.75 | 82.03 |
| | SVM | 100.00 | 86.72 | 91.93 |
| | 1D-CNN1 | 100.00 | 8.33 | 8.33 |
| | 1D-CNN2 | 100.00 | 8.33 | 8.33 |
| First variable selection (2016) | LDA | 75.95 | 74.22 | 66.41 |
| | KNN | 88.93 | 68.75 | 82.03 |
| | SVM | 100.00 | 89.06 | 89.58 |
| | 1D-CNN1 | 100.00 | 92.19 | 89.32 |
| | 1D-CNN2 | 100.00 | 90.36 | 88.54 |
| Second variable selection (325) | LDA | 83.12 | 79.17 | 74.48 |
| | KNN | 90.18 | 73.44 | 85.94 |
| | SVM | 100.00 | 90.63 | 88.02 |
| | 1D-CNN1 | 100.00 | 92.45 | 92.19 |
| | 1D-CNN2 | 100.00 | 87.24 | 89.84 |
Cal, Val, and Pre denote the discriminant accuracy on the calibration set, validation set, and prediction set, respectively.
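One reading aid for the table above: the 8.33% reported for both 1D-CNN models on the full variables equals 100/12. Assuming the 12 geographical origins are equally represented in the validation and prediction sets (an assumption; the set sizes are not given here), that is the accuracy of a model that assigns every sample to a single origin. The arithmetic is:

```python
n_origins = 12  # number of geographical origins stated in the Figure 4 caption

# Accuracy (in percent) of always predicting one class on a balanced set.
chance = 100.0 / n_origins
print(round(chance, 2))
```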
The results of 2D-CNN with the input of spectral matrix for underground and aerial parts of G. rigescens Franch.
| Sample | | Cal (%) | Val (%) | Pre (%) |
|---|---|---|---|---|
| underground parts | 110 | 100.00 | 95.57 | 93.49 |
| | 150 | 100.00 | 95.05 | 94.01 |
| | 200 | 100.00 | 95.57 | 92.97 |
| aerial parts | 110 | 100.00 | 93.23 | 92.71 |
| | 150 | 100.00 | 93.23 | 94.01 |
| | 200 | 100.00 | 96.35 | 92.45 |
FIGURE 7. The average saliency map of each geographical origin based on 2D-CNN for underground parts of G. rigescens Franch. (A–L) represent 12 different geographical origins. (M, N) represent the important wavelengths selected by 2D-CNN and VIM of RF, respectively.
FIGURE 8. The confusion matrix for the prediction set of underground parts of G. rigescens Franch based on (A) 1D-CNN1 after the second variable selection and (B) 2D-CNN with the input of spectral matrix.
FIGURE 9. The clustering visualization in layers of (A) Conv. 1Block1, (B) Conv. 1Block2 and (C) Dense 12 in 2D-CNN for underground parts by t-SNE.
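The confusion matrices in Figure 8 and the accuracies in the tables are related in a simple way, sketched below with toy labels. The function names and the three-class data are hypothetical and are not taken from the study, which uses 12 origins.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count samples by (true class, predicted class).
    Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy example with 3 classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
for row in cm:
    print(row)
print(f"accuracy = {100 * acc:.2f}%")
```

Off-diagonal cells show which origins are confused with each other, which a single accuracy figure cannot reveal.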