Changjian Deng, Kun Lv, Debo Shi, Bo Yang, Song Yu, Zhiyi He, Jia Yan.
Abstract
In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. First, we propose an efficient feature selection method, based on separability and dissimilarity, that determines the order in which features of each type are selected as the dimension of the selected feature subset increases. Second, a K-nearest neighbor (KNN) classifier is applied to determine the dimension of the optimal feature subset for each type of feature. Finally, for feature fusion, we propose a classification dominance feature fusion strategy that establishes an effective basic feature. Experimental results on two datasets show that the recognition rates for Dataset I and Dataset II reach 97.5% and 80.11%, respectively, when k = 1 for the KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The proposed feature selection method can effectively select feature subsets that are conducive to classification, while the feature fusion framework can fuse features that describe different characteristics of the sensor signals, enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing the drift effect.
Keywords: electronic nose; feature fusion; feature selection; multiclass recognition; sensor drift
Year: 2018 PMID: 29895771 PMCID: PMC6021920 DOI: 10.3390/s18061909
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The flow chart of the feature selection and fusion framework.
Number of samples in Dataset I.
| Group | Training Set | Test Set |
|---|---|---|
| No infection | 20 | 20 |
| S. aureus | 20 | 20 |
| E. coli | 20 | 20 |
| P. aeruginosa | 20 | 20 |
| Total | 80 | 80 |
Data structure of seven features.
| Features | MV | FFT | Db1 | Db2 | Db3 | Db4 | Db5 |
|---|---|---|---|---|---|---|---|
| Feature structure | 15 × 80 | 30 × 80 | 30 × 80 | 60 × 80 | 90 × 80 | 120 × 80 | 150 × 80 |
Note: MV, maximum value; FFT, the DC component and first order harmonic component of the coefficients of fast Fourier transformation; Db1, Db2, Db3, Db4, Db5, the approximation coefficients of discrete wavelet transformation based on wavelets Db1, Db2, Db3, Db4, and Db5, respectively.
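As a rough illustration of the feature types listed in the note above, the following Python sketch computes MV, the two FFT terms, and Db1 (Haar) approximation coefficients for a single sensor response. The function names, the use of magnitudes for the FFT terms, and the single decomposition level are assumptions made for illustration; the source does not specify them.

```python
import cmath
import math


def max_value(signal):
    """MV feature: the maximum of one sensor's response curve."""
    return max(signal)


def fft_dc_and_first_harmonic(signal):
    """Magnitudes of the DC term and the first harmonic of the DFT.

    Assumption: magnitudes are used; the source only says 'DC component
    and first order harmonic component'.
    """
    n = len(signal)
    dc = abs(sum(signal))
    first = abs(sum(x * cmath.exp(-2j * math.pi * t / n)
                    for t, x in enumerate(signal)))
    return [dc, first]


def haar_approximation(signal):
    """One level of Db1 (Haar) approximation coefficients.

    Assumption: a single decomposition level; an unpaired trailing
    sample is ignored.
    """
    return [(signal[i] + signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal) - 1, 2)]


# Hypothetical response curve of one sensor (not data from the paper).
response = [0.0, 1.0, 2.0, 1.0]
features = [max_value(response)] + fft_dc_and_first_harmonic(response)
```

Concatenating such per-sensor values across the array would give one feature vector per sample. The 15 × 80 and 30 × 80 shapes in the table are consistent with one MV value and two FFT values per sensor, although the sensor count itself is not stated here.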
Concentration ranges of analytes in Dataset II.
| Analytes | Ammonia | Acetaldehyde | Acetone | Ethylene | Ethanol | Toluene |
|---|---|---|---|---|---|---|
| Concentration Range (ppm) | 50–1000 | 5–300 | 10–300 | 10–300 | 10–600 | 10–100 |
Experimental long-term sensor drift big data (number of samples per analyte in each batch).
| Batch ID | Month | Ethanol | Ethylene | Ammonia | Acetaldehyde | Acetone | Toluene |
|---|---|---|---|---|---|---|---|
| Batch 1 | 1, 2 | 83 | 30 | 70 | 98 | 90 | 74 |
| Batch 2 | 3~10 | 100 | 109 | 532 | 334 | 164 | 5 |
| Batch 3 | 11, 12, 13 | 216 | 240 | 275 | 490 | 365 | 0 |
| Batch 4 | 14, 15 | 12 | 30 | 12 | 43 | 64 | 0 |
| Batch 5 | 16 | 20 | 46 | 63 | 40 | 28 | 0 |
| Batch 6 | 17~20 | 110 | 29 | 606 | 574 | 514 | 467 |
| Batch 7 | 21 | 360 | 744 | 630 | 662 | 649 | 568 |
| Batch 8 | 22, 23 | 40 | 33 | 143 | 30 | 30 | 18 |
| Batch 9 | 24, 30 | 100 | 75 | 78 | 55 | 61 | 101 |
| Batch 10 | 36 | 600 | 600 | 600 | 600 | 600 | 600 |
Classification results of seven features based on different values of k and distance metrics for Dataset I (%).
| Distance | k | MV | FFT | Db1 | Db2 | Db3 | Db4 | Db5 |
|---|---|---|---|---|---|---|---|---|
| EU | 1 | 68.75 | 73.75 | 90.00 |  | 87.50 | 88.75 | 83.75 |
| EU | 3 | 63.75 | 72.50 | 75.00 | 78.75 | 78.75 | 81.25 | 80.00 |
| EU | 5 | 46.25 | 45.00 | 66.25 | 70.00 | 70.00 | 71.25 | 73.75 |
| EU | 7 | 51.25 | 53.75 | 60.00 | 68.75 | 70.00 | 73.75 | 75.00 |
| EU | 9 | 43.75 | 60.00 | 57.50 | 66.25 | 66.25 | 66.25 | 70.00 |
| CB | 1 | 66.25 | 70.00 |  |  | 86.25 | 86.25 | 82.50 |
| CB | 3 | 60.00 | 62.50 | 71.25 | 75.00 | 76.25 | 77.50 | 75.00 |
| CB | 5 | 41.25 | 47.50 | 61.25 | 71.25 | 70.00 | 72.50 | 72.50 |
| CB | 7 | 52.50 | 53.75 | 57.50 | 71.25 | 71.25 | 72.50 | 73.75 |
| CB | 9 | 48.75 | 48.75 | 55.00 | 65.00 | 63.75 | 65.00 | 67.50 |
| COS | 1 | 77.50 | 77.50 | 90.00 | **92.50** | 91.25 | 91.25 | 86.25 |
| COS | 3 | 72.50 | 78.75 | 80.00 | 82.50 | 82.50 | 82.50 | 82.50 |
| COS | 5 | 57.50 | 60.00 | 68.75 | 68.75 | 72.50 | 73.75 | 80.00 |
| COS | 7 | 58.75 | 53.75 | 63.75 | 65.00 | 67.50 | 71.25 | 75.00 |
| COS | 9 | 46.25 | 43.75 | 55.00 | 62.50 | 61.25 | 61.25 | 66.25 |
| COR | 1 | 78.75 | 77.50 | **93.75** | 92.50 | 91.25 | 92.50 | 87.50 |
| COR | 3 | 71.25 | 76.25 | 81.25 | 85.00 | 85.00 | 85.00 | 85.00 |
| COR | 5 | 56.25 | 57.50 | 66.25 | 76.25 | 75.00 | 76.25 | 82.50 |
| COR | 7 | 52.50 | 61.25 | 63.75 | 66.25 | 67.50 | 71.25 | 76.25 |
| COR | 9 | 51.25 | 58.75 | 50.00 | 58.75 | 63.75 | 65.00 | 67.50 |
Note: EU, Euclidean distance; CB, cityblock distance; COS, cosine distance; COR, correlation distance. The bold numbers are the highest accuracies for each distance metric.
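To make the four distance metrics concrete, here is a minimal pure-Python sketch of a KNN classifier that can switch between EU, CB, COS, and COR. It is not the authors' implementation: the function names, the tie-breaking rule, and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cosine similarity; undefined (division by zero) for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def correlation(a, b):
    # Cosine distance between the mean-centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


METRICS = {"EU": euclidean, "CB": cityblock, "COS": cosine, "COR": correlation}


def knn_predict(train_x, train_y, sample, k=1, metric="COR"):
    """Majority vote among the k nearest training samples.

    Assumption: vote ties go to the label whose neighbor is closest.
    """
    dist = METRICS[metric]
    ranked = sorted(zip((dist(sample, x) for x in train_x), train_y),
                    key=lambda pair: pair[0])
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=1, metric="COR"):
    """Recognition rate in percent."""
    hits = sum(knn_predict(train_x, train_y, s, k, metric) == y
               for s, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)


# Hypothetical toy data, not taken from either dataset.
train_x = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
train_y = ["rising", "falling"]
label = knn_predict(train_x, train_y, [2.0, 4.0, 6.5], k=1, metric="COR")
```

Correlation and cosine distance compare the shape of the feature vector rather than its absolute scale. That is one plausible reading of why they score higher than EU and CB at k = 1 in the table above, though the source does not state this explicitly.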
Classification results of eight features based on different values of k and distance metrics for Dataset II (%).
| Distance | k | DR | NDR | EMAi1 | EMAi2 | EMAi3 | EMAd1 | EMAd2 | EMAd3 |
|---|---|---|---|---|---|---|---|---|---|
| EU | 1 | 53.53 |  | 36.31 | 53.61 | 59.06 | 36.31 | 43.56 | 48.78 |
| EU | 3 | 54.25 | 58.89 | 37.28 | 54.03 | 58.42 | 36.61 | 43.47 | 49.39 |
| EU | 5 | 54.06 | 59.42 | 38.47 | 54.28 | 58.00 | 37.11 | 43.47 | 49.00 |
| EU | 7 | 53.47 | 59.53 | 38.61 | 54.08 | 57.97 | 37.39 | 43.25 | 48.42 |
| EU | 9 | 53.50 | 59.50 | 38.11 | 53.64 | 57.61 | 37.03 | 43.00 | 48.31 |
| CB | 1 | 57.33 | 61.97 | 38.22 | 54.03 | 60.25 | 36.64 | 44.36 | 50.50 |
| CB | 3 | 60.58 |  | 37.50 | 55.78 | 58.89 | 36.75 | 44.42 | 51.36 |
| CB | 5 | 60.00 | 61.28 | 38.69 | 55.58 | 59.36 | 36.81 | 44.31 | 51.28 |
| CB | 7 | 60.03 | 61.78 | 38.11 | 54.53 | 59.61 | 37.03 | 44.36 | 51.03 |
| CB | 9 | 59.97 | 62.17 | 37.86 | 53.69 | 60.28 | 36.89 | 44.31 | 51.50 |
| COS | 1 | 49.42 | **60.42** | 37.39 | 52.22 | 57.25 | 34.50 | 43.39 | 48.58 |
| COS | 3 | 51.33 | 59.81 | 37.97 | 50.47 | 55.28 | 35.00 | 43.56 | 48.69 |
| COS | 5 | 51.31 | 59.22 | 38.28 | 50.42 | 55.58 | 35.64 | 43.19 | 48.50 |
| COS | 7 | 51.53 | 59.31 | 38.08 | 50.28 | 55.42 | 36.17 | 42.94 | 48.42 |
| COS | 9 | 52.00 | 59.00 | 37.78 | 50.06 | 55.42 | 36.19 | 42.58 | 48.19 |
| COR | 1 | 49.56 | 59.72 | 37.89 | 51.03 | 56.44 | 35.72 | 40.86 | 46.69 |
| COR | 3 | 49.94 |  | 37.75 | 49.50 | 54.58 | 35.61 | 41.06 | 47.31 |
| COR | 5 | 50.25 | 59.53 | 37.86 | 49.94 | 55.00 | 35.08 | 40.56 | 47.78 |
| COR | 7 | 50.36 | 59.14 | 37.36 | 49.11 | 54.69 | 35.67 | 40.47 | 47.50 |
| COR | 9 | 50.22 | 59.28 | 36.97 | 48.81 | 55.69 | 36.28 | 40.14 | 47.14 |
Note: the bold numbers are the highest accuracies for each distance metric.
Figure 2. Separability index of maximum value (MV) for Dataset I and difference of the maximal resistance change and the baseline (DR) for Dataset II.
Figure 3. Visualization of the dissimilarity matrix of MV and DR.
Figure 4. The classification accuracies of seven features for Dataset I.
Figure 5. The classification accuracies of eight features for Dataset II.
Optimal numbers of different features after selection for Dataset I.
| Distance metric | MV | FFT | Db1 | Db2 | Db3 | Db4 | Db5 |
|---|---|---|---|---|---|---|---|
| COS | 15 | 18 | 21 | 10 | 10 | 36 | 74 |
| COR | 14 | 26 | 25 | 23 | 18 | 49 | 109 |
Optimal numbers of different features after selection for Dataset II.
| Distance metric | DR | NDR | EMAi1 | EMAi2 | EMAi3 | EMAd1 | EMAd2 | EMAd3 |
|---|---|---|---|---|---|---|---|---|
| COS | 13 | 16 | 7 | 13 | 8 | 7 | 11 | 12 |
| COR | 13 | 9 | 7 | 10 | 8 | 16 | 11 | 12 |
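The optimal dimensions in the two tables above come from growing each feature subset along a selection order and keeping the dimension at which the KNN classifier performs best. The sketch below shows one way such a sweep could look. It assumes the selection order is already given, uses Euclidean 1-NN for brevity instead of COS or COR, and breaks ties in favor of the smaller dimension; none of these choices is confirmed by the source, and all names and data are hypothetical.

```python
import math


def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy (%) of a 1-nearest-neighbor classifier (Euclidean distance)."""
    hits = 0
    for sample, label in zip(test_x, test_y):
        nearest = min(range(len(train_x)),
                      key=lambda i: math.dist(sample, train_x[i]))
        hits += train_y[nearest] == label
    return 100.0 * hits / len(test_y)


def best_prefix_dimension(order, train_x, train_y, test_x, test_y):
    """Grow the subset along `order`; return (dimension, accuracy) of the best prefix."""
    best_dim, best_acc = 0, -1.0
    for dim in range(1, len(order) + 1):
        cols = order[:dim]
        sub_train = [[row[c] for c in cols] for row in train_x]
        sub_test = [[row[c] for c in cols] for row in test_x]
        acc = one_nn_accuracy(sub_train, train_y, sub_test, test_y)
        if acc > best_acc:  # strict '>' keeps the smallest dimension on ties
            best_dim, best_acc = dim, acc
    return best_dim, best_acc


# Hypothetical data: column 0 is informative, column 1 is misleading.
train_x = [[0.0, 9.0], [1.0, 0.0]]
train_y = ["a", "b"]
test_x = [[0.1, 0.0], [0.9, 9.0]]
test_y = ["a", "b"]
result = best_prefix_dimension([0, 1], train_x, train_y, test_x, test_y)
```

In this toy case the sweep stops preferring larger subsets as soon as the misleading column is added, which mirrors why the optimal dimensions in the tables are often smaller than the full feature dimension.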
Classification accuracy of Dataset I with/without feature selection.
| Selection | Distance | Class | MV | FFT | Db1 | Db2 | Db3 | Db4 | Db5 |
|---|---|---|---|---|---|---|---|---|---|
| Without selection | COS | Dimension |  |  |  |  |  |  |  |
| Without selection | COS | 1 | 80.00 | 85.00 | 90.00 | 90.00 | 95.00 | 95.00 | 95.00 |
| Without selection | COS | 2 | 80.00 | 80.00 | 90.00 | 95.00 | 85.00 | 85.00 | 80.00 |
| Without selection | COS | 3 | 65.00 | 70.00 | 95.00 | 90.00 | 90.00 | 90.00 | 90.00 |
| Without selection | COS | 4 | 85.00 | 75.00 | 85.00 | 95.00 | 95.00 | 95.00 | 80.00 |
| Without selection | COS | Average | 77.50 | 77.50 | 90.00 | 92.50 | 91.25 | 91.25 | 86.25 |
| Without selection | COR | 1 | 75.00 | 85.00 | 95.00 | 90.00 | 90.00 | 95.00 | 95.00 |
| Without selection | COR | 2 | 85.00 | 85.00 | 100.00 | 90.00 | 90.00 | 90.00 | 85.00 |
| Without selection | COR | 3 | 70.00 | 65.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 |
| Without selection | COR | 4 | 85.00 | 75.00 | 90.00 | 100.00 | 95.00 | 95.00 | 80.00 |
| Without selection | COR | Average | 78.75 | 77.50 | 93.75 | 92.50 | 91.25 | 92.50 | 87.50 |
| With selection | COS | Dimension |  |  |  |  |  |  |  |
| With selection | COS | 1 | 80.00 | 85.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 |
| With selection | COS | 2 | 80.00 | 80.00 | 90.00 | 90.00 | 95.00 | 90.00 | 85.00 |
| With selection | COS | 3 | 65.00 | 75.00 | 95.00 | 95.00 | 90.00 | 95.00 | 85.00 |
| With selection | COS | 4 | 85.00 | 80.00 | 95.00 | 95.00 | 90.00 | 90.00 | 90.00 |
| With selection | COS | Average | 77.50 | 80.00 | 93.75 | 93.75 | 92.50 | 92.50 | 88.75 |
| With selection | COR | Dimension |  |  |  |  |  |  |  |
| With selection | COR | 1 | 85.00 | 85.00 | 95.00 | 95.00 | 90.00 | 90.00 | 95.00 |
| With selection | COR | 2 | 90.00 | 85.00 | 100.00 | 95.00 | 95.00 | 100.00 | 85.00 |
| With selection | COR | 3 | 70.00 | 70.00 | 95.00 | 95.00 | 95.00 | 95.00 | 90.00 |
| With selection | COR | 4 | 80.00 | 80.00 | 95.00 | 100.00 | 95.00 | 95.00 | 85.00 |
| With selection | COR | Average | 81.25 | 80.00 | **96.25** | **96.25** | 93.75 | 95.00 | 88.75 |
Note: 1, No-infection; 2, S. aureus; 3, E. coli; 4, P. aeruginosa.
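Each "Average" row in the table above is the mean of the four per-class accuracies in its column. The short Python check below reproduces two of the averages from the per-class values listed in the table; the function name is an illustrative choice.

```python
def class_average(per_class_accuracies):
    """Unweighted mean of per-class accuracies (%), rounded to two decimals."""
    return round(sum(per_class_accuracies) / len(per_class_accuracies), 2)


# Per-class accuracies copied from the table (classes 1 to 4).
db1_cor_without_selection = [95.00, 100.00, 90.00, 90.00]
mv_cos_without_selection = [80.00, 80.00, 65.00, 85.00]

print(class_average(db1_cor_without_selection))  # 93.75
print(class_average(mv_cos_without_selection))   # 77.5
```

Because every class in Dataset I has 20 test samples, this unweighted mean equals the overall recognition rate on the 80 test samples.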
Classification accuracy of Dataset II with/without feature selection.
| Selection | Distance | Class | DR | NDR | EMAi1 | EMAi2 | EMAi3 | EMAd1 | EMAd2 | EMAd3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Without selection | COS | Dimension |  |  |  |  |  |  |  |  |
| Without selection | COS | 1 | 61.00 | 26.00 | 73.17 | 98.67 | 65.00 | 61.17 | 69.67 | 85.00 |
| Without selection | COS | 2 | 85.17 | 98.67 | 67.17 | 83.33 | 84.50 | 54.00 | 62.83 | 67.67 |
| Without selection | COS | 3 | 90.33 | 91.83 | 10.00 | 37.17 | 89.33 | 27.33 | 58.50 | 79.50 |
| Without selection | COS | 4 | 6.67 | 17.33 | 40.67 | 59.50 | 67.67 | 1.33 | 5.83 | 15.00 |
| Without selection | COS | 5 | 46.17 | 58.17 | 30.83 | 23.83 | 27.50 | 23.33 | 55.50 | 40.67 |
| Without selection | COS | 6 | 7.17 | 70.50 | 2.50 | 10.83 | 9.50 | 39.83 | 8.00 | 3.67 |
| Without selection | COS | Average | 49.42 | 60.42 | 37.39 | 52.22 | 57.25 | 34.50 | 43.39 | 48.58 |
| Without selection | COR | 1 | 53.00 | 25.50 | 73.33 | 98.83 | 65.50 | 60.83 | 70.67 | 83.00 |
| Without selection | COR | 2 | 85.17 | 98.83 | 67.67 | 83.83 | 83.83 | 54.33 | 63.17 | 68.67 |
| Without selection | COR | 3 | 90.17 | 90.17 | 6.33 | 29.50 | 86.33 | 27.33 | 45.67 | 78.67 |
| Without selection | COR | 4 | 5.17 | 14.83 | 43.67 | 58.17 | 67.67 | 0.83 | 4.50 | 11.50 |
| Without selection | COR | 5 | 51.83 | 59.83 | 32.17 | 22.67 | 24.00 | 20.67 | 51.50 | 34.67 |
| Without selection | COR | 6 | 12.00 | 69.17 | 4.17 | 13.17 | 11.33 | 50.33 | 9.67 | 3.67 |
| Without selection | COR | Average | 49.56 | 59.72 | 37.89 | 51.03 | 56.44 | 35.72 | 40.86 | 46.69 |
| With selection | COS | Dimension |  |  |  |  |  |  |  |  |
| With selection | COS | 1 | 47.17 | 26.00 | 75.00 | 87.50 | 82.00 | 52.50 | 77.83 | 82.50 |
| With selection | COS | 2 | 92.00 | 98.67 | 67.17 | 84.33 | 90.17 | 53.00 | 66.00 | 79.33 |
| With selection | COS | 3 | 85.83 | 91.83 | 31.83 | 32.83 | 75.33 | 31.67 | 58.67 | 76.50 |
| With selection | COS | 4 | 31.67 | 17.33 | 16.50 | 71.33 | 43.00 | 1.50 | 30.50 | 20.00 |
| With selection | COS | 5 | 93.67 | 58.17 | 88.17 | 49.33 | 62.83 | 41.00 | 70.00 | 47.00 |
| With selection | COS | 6 | 76.33 | 70.50 | 19.67 | 29.50 | 59.67 | 43.67 | 47.17 | 62.00 |
| With selection | COS | Average | **71.11** | 60.42 | 49.72 | 59.14 | 68.83 | 37.22 | 58.36 | 61.22 |
| With selection | COR | Dimension |  |  |  |  |  |  |  |  |
| With selection | COR | 1 | 41.00 | 51.00 | 73.33 | 92.17 | 85.83 | 60.83 | 74.00 | 79.50 |
| With selection | COR | 2 | 92.67 | 99.17 | 72.33 | 87.50 | 91.17 | 54.33 | 64.00 | 79.33 |
| With selection | COR | 3 | 86.67 | 98.67 | 24.17 | 60.50 | 59.33 | 27.33 | 44.83 | 76.67 |
| With selection | COR | 4 | 31.33 | 0.00 | 9.50 | 5.33 | 43.67 | 0.83 | 16.00 | 9.33 |
| With selection | COR | 5 | 93.17 | 30.33 | 81.83 | 43.33 | 65.33 | 20.67 | 69.00 | 42.17 |
| With selection | COR | 6 | 81.33 | 95.67 | 37.17 | 52.83 | 91.00 | 50.33 | 47.17 | 60.33 |
| With selection | COR | Average | 71.03 | 62.47 | 49.72 | 56.94 | **72.72** | 35.72 | 52.50 | 57.89 |
Note: 1, ethanol; 2, ethylene; 3, ammonia; 4, acetaldehyde; 5, acetone; 6, toluene.
Figure 6. The 3D plot of classification results of the four feature fusion methods for Dataset I. (a) The conventional feature fusion method without feature selection; (b) the proposed feature fusion method without feature selection; (c) the proposed feature fusion method with feature selection. Note: 1, No infection; 2, S. aureus; 3, E. coli; 4, P. aeruginosa.
Figure 7. The 3D plot of classification results of the four feature fusion methods for Dataset II. (a) The conventional feature fusion method without feature selection; (b) the proposed feature fusion method without feature selection; (c) the proposed feature fusion method with feature selection. Note: 1, ethanol; 2, ethylene; 3, ammonia; 4, acetaldehyde; 5, acetone; 6, toluene.