Chao-Yong Wang1,2, Li Tang1,3, Li Li1,2, Qiang Zhou1,2, You-Ji Li1,4, Jing Li1,2, Yuan-Zhong Wang1,5.
Abstract
To explore the influence of different cultivation areas on the chemical profiles of Eucommia ulmoides leaves (EUL) and to rapidly authenticate their geographical origins, 187 samples from 13 provinces in China were systematically investigated using three data fusion strategies (low, mid, and high level) combined with two discrimination algorithms (partial least squares discriminant analysis, PLS-DA; random forest, RF). RF models constructed by high-level data fusion with different modes of the two spectral data types (Fourier transform near-infrared spectra and attenuated total reflection Fourier transform mid-infrared spectra) were most suitable for identifying EULs from different geographical origins. The accuracy rates of the calibration and validation sets were 92.86% and 93.44%, respectively. In addition, climate parameters were systematically investigated to explain the cluster differences in our study. Some interesting and novel information could be found in the dendrogram of the hierarchical cluster analysis. The Xinjiang Autonomous Region (Region 5), located at high latitude, was the only collection area in the middle temperate zone, and its samples formed an individual class regardless of their distance in the dendrogram. The samples from the Shennongjia Forestry District in Hubei Province came from a relatively high elevation (>1200 m), which is the main difference from the samples from Xiangyang City (78 m). Thus, the sample clusters from region 9 are different from the sample clusters from other regions. The results provide a reference for further research on the samples from these special clusters.
Keywords: Attenuated Total Reflection Fourier Transform Mid-Infrared; Eucommia ulmoides leaves; Fourier Transform Near-Infrared; chemometrics; environment
Year: 2020 PMID: 32140161 PMCID: PMC7042207 DOI: 10.3389/fpls.2020.00079
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
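The three data fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature extractors passed to `mid_level_fusion`, and the use of a simple majority vote for high-level fusion are all assumptions made for illustration, since the abstract does not say how features were extracted or how decisions were combined.

```python
from collections import Counter


def low_level_fusion(nir, mir):
    """Low level: concatenate the two spectra of each sample into one vector."""
    return [list(a) + list(b) for a, b in zip(nir, mir)]


def mid_level_fusion(nir, mir, extract_nir, extract_mir):
    """Mid level: extract features from each block, then concatenate them.

    extract_nir / extract_mir are hypothetical callables (e.g. a fitted
    dimensionality reduction) that map one spectrum to a short feature list.
    """
    return [list(extract_nir(a)) + list(extract_mir(b)) for a, b in zip(nir, mir)]


def high_level_fusion(*prediction_lists):
    """High level: combine per-model class predictions sample by sample.

    Assumption: a plain majority vote; ties go to the label seen first.
    """
    fused = []
    for votes in zip(*prediction_lists):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Toy data: two samples, a 2-point "NIR" block and a 1-point "MIR" block.
nir = [[0.1, 0.2], [0.3, 0.4]]
mir = [[1.0], [2.0]]

low = low_level_fusion(nir, mir)
mid = mid_level_fusion(nir, mir, lambda s: [max(s)], lambda s: [max(s)])
high = high_level_fusion([1, 2, 5], [1, 3, 5], [2, 3, 5])
```

In this sketch the fused matrices from the low and mid levels would feed a single classifier, whereas the high level only sees the class labels predicted by separate models.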
Information on the Eucommia ulmoides leaves (EUL) samples.
| Region | Number of individuals | Collection site | Latitude (N) | Longitude (E) | Elevation (m) |
|---|---|---|---|---|---|
| 1 | 11 | Pingxiang City, Jiangxi Province | 27°41′55.96″ | 114°05′49.77″ | 150 |
| 2 | 31 | Zunyi City, Guizhou Province | 27°24′04.63″ | 106°57′42.17″ | 857 |
| | | Zunyi City, Guizhou Province | 27°43′29.82″ | 106°52′49.39″ | 946 |
| | | Zunyi City, Guizhou Province | 27°38′42.25″ | 106°53′35.42″ | 925 |
| 3 | 10 | Guangyuan City, Sichuan Province | 32°13′38.78″ | 106°18′06.24″ | 524 |
| 4 | 21 | Ankang City, Shaanxi Province | 32°54′11.78″ | 108°30′36.21″ | 443 |
| | | Hanzhong City, Shaanxi Province | 33°20′02.71″ | 106°00′18.25″ | 727 |
| 5 | 7 | Ürümqi City, Xinjiang Autonomous Region | 44°02′35.60″ | 87°27′45.06″ | 205 |
| | | Fukang City, Xinjiang Autonomous Region | 44°18′32.65″ | 88°35′38.32″ | 513 |
| 6 | 20 | Zhangjiajie City, Hunan Province | 29°31′22.69″ | 110°46′02.50″ | 334 |
| | | Jishou City, Hunan Province | 28°18′17.38″ | 109°38′13.42″ | 285.1 |
| 7 | 16 | Shennongjia Forestry District, Hubei Province | 31°28′44.43″ | 110°22′43.08″ | 1,343 |
| | | Xiangyang City, Hubei Province | 32°00′56.49″ | 112°09′59.91″ | 78 |
| | | Shennongjia Forestry District, Hubei Province | 31°26′55.95″ | 110°23′89.11″ | 1,247 |
| 8 | 30 | Pingdingshan City, Henan Province | 34°04′21.02″ | 113°12′52.28″ | 252 |
| | | Lingbao City, Henan Province | 34°16′56.56″ | 110°39′13.91″ | 984 |
| | | Xingyang City, Henan Province | 34°43′10.21″ | 113°17′18.14″ | 420 |
| 9 | 5 | Longnan City, Gansu Province | 32°52′53.59″ | 104°24′12.91″ | 166 |
| | | Longnan City, Gansu Province | 32°53′46.02″ | 104°22′57.99″ | 1,763 |
| 10 | 10 | Nanjing City, Jiangsu Province | 32°04′77.28″ | 118°45′73.63″ | 347 |
| 11 | 6 | Dingzhou City, Hebei Province | 38°53′00.38″ | 115°22′03.12″ | 68 |
| 12 | 10 | Lu’an City, Anhui Province | 31°28′30.79″ | 115°50′50.84″ | 266 |
| 13 | 10 | Linzi City, Shandong Province | 36°77′40.87″ | 118°30′63.12″ | 31 |
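For mapping or climate lookups, the degrees-minutes-seconds coordinates in this table usually need converting to decimal degrees. The helper below is a minimal sketch (the name `dms_to_decimal` and the regular expression are illustrative assumptions, not part of the study). It also rejects values whose minutes or seconds are 60 or more, which applies to a few entries as printed here (for example 110°23′89.11″, 32°04′77.28″, and 36°77′40.87″); those would have to be checked against the original article rather than converted.

```python
import re

# Matches coordinates written like 27°41′55.96″ (degree, prime, double prime).
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″")


def dms_to_decimal(text):
    """Convert a degrees-minutes-seconds string to decimal degrees.

    Raises ValueError if the string does not match the expected layout or
    if minutes or seconds fall outside the range [0, 60).
    """
    match = DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a DMS coordinate: {text!r}")
    degrees = int(match.group(1))
    minutes = int(match.group(2))
    seconds = float(match.group(3))
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    return degrees + minutes / 60 + seconds / 3600


# Latitude of the Region 1 collection site (Pingxiang City).
pingxiang_lat = dms_to_decimal("27°41′55.96″")
print(round(pingxiang_lat, 4))
```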
Figure 1 Stacked Fourier transform near-infrared (FT-NIR) spectra of Eucommia ulmoides leaves (EUL) from 13 geographical regions.
Peak assignments on the FT-NIR and ATR-FT-MIR spectra of EUL.
| Spectral type | Wavenumber (cm⁻¹) | Assignments |
|---|---|---|
| NIR | 8,295 | Second overtone of C–H stretching |
| | 6,881 | First overtone of O–H stretching |
| | 5,775 | C–H stretching of R–OHCH3 |
| | 5,172 | O–H stretching and OH deformation of H2O |
| | 5,000–4,000 | C–H stretching and C═O group frequencies of carbohydrates and C–H stretching and deformation group frequencies of polysaccharide |
| MIR | 3,317 | O–H stretching of polysaccharide |
| | 2,919, 2,851 | Asymmetric and symmetric C–H stretching of CH2 |
| | 1,734 | C═O stretching of lipids, etc. |
| | 1,629 | Amide I band |
| | 1,607 | C═O stretching of flavones |
| | 1,553 | Amide II band |
| | 1,439 | C–H scissoring and in-plane deforming |
| | 1,375 | CH3 scissoring |
| | 1,317 | α-Helix of amide III band |
| | 1,243 | Amide III and C–O stretching |
| | 1,145 | C–O–C stretching |
| | 1,101, 1,068, 1,054 | Polysaccharide rings |
| | 920 | Sugar skeleton vibration |
Figure 2 Stacked attenuated total reflection Fourier transform mid-infrared (ATR-FT-MIR) spectra of Eucommia ulmoides leaves (EUL) from 13 geographical regions.
Figure 3 Exploratory analysis results of Eucommia ulmoides leaves (EUL) samples for ≥10°C accumulated temperature. (A) Distribution of each collection site; (B) PCA; (C) t-SNE.
Figure 4 Exploratory analysis results of Eucommia ulmoides leaves (EUL) samples for annual average temperature. (A) Distribution of each collection site; (B) PCA; (C) t-SNE.
Figure 5 Exploratory analysis results of Eucommia ulmoides leaves (EUL) samples for dryness. (A) Distribution of each collection site; (B) PCA; (C) t-SNE.
Figure 7 Exploratory analysis results of Eucommia ulmoides leaves (EUL) samples for moisture index. (A) Distribution of each collection site; (B) PCA; (C) t-SNE.
Figure 8 Exploratory analysis results of Eucommia ulmoides leaves (EUL) samples for soil type. (A) Distribution of each collection site; (B) PCA; (C) t-SNE.
Figure 9 Hierarchical cluster analysis (HCA) dendrograms of the Eucommia ulmoides leaves (EUL) samples from different regions. In each label, the number before the hyphen is the region of the EUL samples and the number after the hyphen is the elevation of the collection site.
Classification parameters obtained for the PLS-DA model using low-level data fusion of EUL samples from different collection regions.
| Parameters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calibration set | ||||||||||||||
| SEN (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| SPE (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| PRE (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| EFF (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| RMSEE | 0.08 | 0.13 | 0.09 | 0.10 | 0.05 | 0.11 | 0.10 | 0.12 | 0.06 | 0.07 | 0.07 | 0.09 | 0.08 | 0.09 |
| RMSECV | 0.15 | 0.28 | 0.20 | 0.24 | 0.19 | 0.23 | 0.23 | 0.30 | 0.17 | 0.14 | 0.17 | 0.18 | 0.18 | 0.20 |
| Validation set | ||||||||||||||
| SEN (%) | 100.00 | 100.00 | 33.33 | 100.00 | 100.00 | 57.14 | 100.00 | 100.00 | 0 | 100.00 | 0 | 100.00 | 66.67 | 73.63 |
| SPE (%) | 100.00 | 96.08 | 100.00 | 100.00 | 100.00 | 100.00 | 96.43 | 88.24 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 98.52 |
| PRE (%) | 100.00 | 83.33 | 100.00 | 100.00 | 100.00 | 100.00 | 71.43 | 62.50 | – | 100.00 | – | 100.00 | 100.00 | 92.48 |
| EFF (%) | 100.00 | 98.02 | 57.74 | 100.00 | 100.00 | 75.59 | 98.20 | 93.93 | 0 | 100.00 | 0 | 100.00 | 81.65 | 77.32 |
| RMSEP | 0.15 | 0.26 | 0.17 | 0.20 | 0.10 | 0.27 | 0.18 | 0.28 | 0.15 | 0.13 | 0.15 | 0.15 | 0.17 | 0.18 |
| Permutation test | ||||||||||||||
| R2 (min–max) | 0.66–0.84 | 0.66–0.87 | 0.55–0.85 | 0.73–0.85 | 0.26–0.88 | 0.66–0.83 | 0.67–0.85 | 0.69–0.86 | 0.47–0.84 | 0.72–0.83 | 0.65–0.86 | 0.59–0.83 | 0.65–0.85 | – |
| Q2 (min–max) | -1.08–-0.08 | -1.38–-0.33 | -0.98–-0.13 | -0.97–-0.15 | -0.91–-0.01 | -1.48–-0.37 | -1.27–-0.30 | -1.25–-0.05 | -0.83–-0.21 | -0.94–-0.21 | -0.91–-0.18 | -1.28–-0.26 | -1.17–-0.27 | – |
| Original R2 | 0.91 | 0.89 | 0.88 | 0.92 | 0.94 | 0.88 | 0.88 | 0.90 | 0.86 | 0.91 | 0.88 | 0.86 | 0.89 | 0.89 |
| Original Q2 | 0.64 | 0.44 | 0.30 | 0.48 | 0.36 | 0.58 | 0.38 | 0.42 | -0.22 | 0.70 | 0.21 | 0.48 | 0.47 | 0.40 |
| Q2-intercept | -0.69 | -0.80 | -0.64 | -0.60 | -0.57 | -0.85 | -0.68 | -0.71 | -0.55 | -0.70 | -0.62 | -0.78 | -0.67 | -0.68 |
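The per-class figures in this table can be related to a confusion matrix with a short sketch. The definitions below are the usual ones for sensitivity (SEN), specificity (SPE), and precision (PRE); treating efficiency (EFF) as the geometric mean of SEN and SPE is an inference from the tabulated values (for region 3 in the validation set, √(33.33 × 100) ≈ 57.74), not a definition stated in this record. The function name and the example counts are assumptions: the counts were chosen so that the output matches the region 2 validation column and are not reported by the authors.

```python
import math


def class_metrics(tp, fp, tn, fn):
    """Return SEN, SPE, PRE and EFF (all in %) for one class.

    tp/fp/tn/fn are one-vs-rest confusion counts for that class.
    PRE is None when the class is never predicted (tp + fp == 0),
    which corresponds to the dash entries in the table.
    Assumes the class has at least one true and one negative sample.
    """
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    pre = 100.0 * tp / (tp + fp) if (tp + fp) > 0 else None
    eff = math.sqrt(sen * spe)  # geometric mean of SEN and SPE (inferred)
    return sen, spe, pre, eff


# Inferred counts that reproduce the region 2 validation column:
# all 10 region 2 samples found, 2 of 51 other samples wrongly assigned.
sen, spe, pre, eff = class_metrics(tp=10, fp=2, tn=49, fn=0)
print(round(sen, 2), round(spe, 2), round(pre, 2), round(eff, 2))
```

A class with SEN of 0 (regions 9 and 11 in the validation set) gives EFF of 0 regardless of SPE, which is why the average EFF falls well below the average SPE.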
Classification parameters obtained for the PLS-DA model using mid-level data fusion of EUL samples from different collection regions.
| Parameters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calibration set | ||||||||||||||
| SEN (%) | 100.00 | 100.00 | 0 | 85.71 | 100.00 | 92.31 | 100.00 | 100.00 | 0 | 100.00 | 75.00 | 71.43 | 100.00 | 78.80 |
| SPE (%) | 100.00 | 91.43 | 100.00 | 100.00 | 99.17 | 100.00 | 97.39 | 97.17 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 98.86 |
| PRE (%) | 100.00 | 70.00 | – | 100.00 | 83.33 | 100.00 | 78.57 | 86.96 | – | 100.00 | 100.00 | 100.00 | 100.00 | 92.62 |
| EFF (%) | 100.00 | 95.62 | 0 | 92.58 | 99.59 | 96.08 | 98.69 | 98.57 | 0 | 100.00 | 86.60 | 84.52 | 100.00 | 80.94 |
| RMSEE | 0.15 | 0.27 | 0.21 | 0.24 | 0.07 | 0.16 | 0.18 | 0.27 | 0.15 | 0.10 | 0.14 | 0.17 | 0.17 | 0.18 |
| RMSECV | 0.21 | 0.33 | 0.21 | 0.29 | 0.19 | 0.24 | 0.24 | 0.34 | 0.16 | 0.16 | 0.18 | 0.22 | 0.20 | 0.23 |
| Validation set | ||||||||||||||
| SEN (%) | 75.00 | 100.00 | 0 | 100.00 | 100.00 | 57.14 | 100.00 | 100.00 | 0 | 100.00 | 0.00 | 33.33 | 66.67 | 64.01 |
| SPE (%) | 100.00 | 92.16 | 100.00 | 100.00 | 100.00 | 100.00 | 92.86 | 90.20 | 100.00 | 98.28 | 100.00 | 100.00 | 100.00 | 97.96 |
| PRE (%) | 100.00 | 71.43 | – | 100.00 | 100.00 | 100.00 | 55.56 | 66.67 | – | 75.00 | – | 100.00 | 100.00 | 86.87 |
| EFF (%) | 86.60 | 96.00 | 0 | 100.00 | 100.00 | 75.59 | 96.36 | 94.97 | 0 | 99.13 | 0.00 | 57.74 | 81.65 | 68.31 |
| RMSEP | 0.18 | 0.28 | 0.21 | 0.23 | 0.12 | 0.28 | 0.21 | 0.28 | 0.16 | 0.13 | 0.15 | 0.17 | 0.18 | 0.20 |
| Permutation test | ||||||||||||||
| R2 (min–max) | 0.06–0.50 | 0.12–0.46 | 0.10–0.63 | 0.10–0.57 | 0.11–0.50 | 0.19–0.48 | 0.13–0.46 | 0.24–0.49 | 0.09–0.53 | 0.11–0.57 | 0.09–0.65 | 0.13–0.49 | 0.08–0.51 | – |
| Q2 (min–max) | -0.69–-0.15 | -0.79–-0.15 | -0.78–-0.03 | -0.81–-0.19 | -0.67–-0.14 | -1.09–-0.19 | -0.62–-0.05 | -0.88–-0.27 | -0.58–-0.01 | -0.77–-0.09 | -0.72–-0.05 | -0.64–-0.14 | -0.64–-0.03 | – |
| Original R2 | 0.59 | 0.50 | 0.23 | 0.48 | 0.87 | 0.75 | 0.63 | 0.50 | 0.14 | 0.83 | 0.42 | 0.49 | 0.48 | 0.53 |
| Original Q2 | 0.21 | 0.25 | 0.11 | 0.12 | 0.49 | 0.41 | 0.32 | 0.20 | -0.06 | 0.58 | -0.05 | 0.13 | 0.25 | 0.23 |
| Q2-intercept | -0.41 | -0.50 | -0.46 | -0.44 | -0.37 | -0.55 | -0.47 | -0.56 | -0.27 | -0.45 | -0.31 | -0.39 | -0.38 | -0.43 |
Figure 10 Parameter optimization of the random forest models. (A) ntree of the low-level data fusion dataset; (B) mtry of the low-level data fusion dataset; (C) ntree of the mid-level data fusion dataset; (D) mtry of the mid-level data fusion dataset.
Figure 11 Permutation accuracy importance of each spectral variable. (A) MSC+SD FT-NIR; (B) MSC+SD ATR-FT-MIR.