Fangyu Sun, Ruihong Sun, Jia Yan.
Abstract
Sensor drift in the electronic nose (E-nose) is a major cause of data distortion. Existing active learning methods do not account for the drift-induced misalignment of feature distributions between domains when selecting samples. To address this, we propose a cross-domain active learning (CDAL) method based on the Hellinger distance (HD) and the maximum mean discrepancy (MMD). In this framework, the HD weighted with the MMD serves as the sample-selection criterion, so that as few labeled samples as possible reflect as much drift information as possible. Overall, the CDAL framework has the following advantages: (1) CDAL combines active learning and domain adaptation to better assess the interdomain distribution differences and the amount of information contained in the selected samples. (2) The introduction of a Gaussian kernel function mapping aligns the data distributions between domains as closely as possible. (3) The combination of active learning and domain adaptation can significantly suppress the effects of time drift caused by sensor ageing, thus improving the detection accuracy of the electronic nose system for data collected at different times. The results show that the proposed CDAL method compensates for drift better than several recent methods.
Keywords: active learning; cross-domain learning; drift compensation; electronic nose
Year: 2022 PMID: 36014182 PMCID: PMC9413090 DOI: 10.3390/mi13081260
Source DB: PubMed Journal: Micromachines (Basel) ISSN: 2072-666X Impact factor: 3.523
Figure 1. Schematic diagram of the CDAL framework. The cross-domain active learning method uses the weighted HD and MMD to select the samples that update the classification model, which yields more representative labeled samples.
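The two quantities named in the abstract can be illustrated with a minimal sketch. It computes the Hellinger distance between two discrete probability vectors and a biased estimate of the squared MMD under a Gaussian kernel. The function names (`hellinger`, `gaussian_kernel`, `mmd2`, `weighted_score`), the bandwidth `sigma`, the weight `w`, and the synthetic data are all assumptions made for illustration. In particular, the convex combination in `weighted_score` is only one plausible way to weight the HD with the MMD; this record does not give the authors' exact formula.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors (range 0..1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return float(
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def weighted_score(hd, mmd, w):
    """ASSUMPTION: a convex combination of the two terms, not the paper's formula."""
    return w * hd + (1.0 - w) * mmd


# Synthetic stand-ins for a source batch and a drifted target batch.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 4))
drifted = rng.normal(0.5, 1.0, size=(200, 4))

hd = hellinger([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
gap = mmd2(source, drifted)
print("HD:", hd, "MMD^2:", gap, "score:", weighted_score(hd, gap, w=0.5))
```

In this toy setup the MMD estimate between a batch and itself is zero, while the mean-shifted batch yields a positive value, which is the kind of interdomain difference the selection criterion is meant to capture.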
Figure 2. Active learning framework.
Summary of the drift dataset.
| Batch ID | Month | Acetone | Acetaldehyde | Ethanol | Ethylene | Ammonia | Toluene | Total |
|---|---|---|---|---|---|---|---|---|
| Batch 1 | 1–2 | 90 | 98 | 83 | 30 | 70 | 74 | 445 |
| Batch 2 | 3–10 | 164 | 334 | 100 | 109 | 532 | 5 | 1244 |
| Batch 3 | 11–13 | 365 | 490 | 216 | 240 | 275 | 0 | 1586 |
| Batch 4 | 14–15 | 64 | 43 | 12 | 30 | 12 | 0 | 161 |
| Batch 5 | 16 | 28 | 40 | 20 | 46 | 63 | 0 | 197 |
| Batch 6 | 17–20 | 514 | 574 | 110 | 29 | 606 | 467 | 2300 |
| Batch 7 | 21 | 649 | 662 | 360 | 744 | 630 | 568 | 3613 |
| Batch 8 | 22–23 | 30 | 30 | 40 | 33 | 143 | 18 | 294 |
| Batch 9 | 24, 30 | 61 | 55 | 100 | 75 | 78 | 101 | 470 |
| Batch 10 | 36 | 600 | 600 | 600 | 600 | 600 | 600 | 3600 |
Figure 3. Principal component analysis of the ten batches of data. The drift of the data across batches over time is evident from the plots.
Comparison of recognition accuracy in long-term drift (%).
| Method | 1–2 | 1–3 | 1–4 | 1–5 | 1–6 | 1–7 | 1–8 | 1–9 | 1–10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 47.99 | 57.57 | 65.22 | 32.99 | 45.09 | 35.57 | 24.83 | 40.21 | 31.19 | 42.30 |
| ELM | 69.13 | 46.22 | 32.30 | 46.19 | 44.91 | 35.37 | 25.51 | 33.19 | 37.19 | 41.11 |
| CCPCA | 77.65 | 67.91 | 65.84 | 69.54 | 72.04 | 54.58 | 65.31 | 65.11 | 37.14 | 63.90 |
| OSC | 79.74 | 35.25 | 48.45 | 52.28 | 34.30 | 43.84 | 49.66 | 45.32 | 22.83 | 45.74 |
| LDA | 70.90 | 73.58 | 63.35 | 59.90 | 63.57 | 55.58 | 67.69 | 47.23 | 43.22 | 60.56 |
| DS | 42.77 | 30.90 | 39.13 | 48.22 | 26.35 | 19.96 | 48.64 | 23.19 | 27.94 | 34.12 |
| GLSW | 72.67 | 42.37 | 70.19 | 52.79 | 49.78 | 43.18 | 57.48 | 41.91 | 37.47 | 51.98 |
| DRCA | 64.31 | 83.35 | 80.75 | 74.62 | 55.04 | 42.37 | 48.64 | 40.00 | 39.39 | 58.72 |
| CDSL | 79.18 | 82.85 | 80.75 | 76.14 | 71.78 | 56.10 | 74.49 | | 40.31 | 69.59 |
| AL-KLD | 83.63 | 74.40 | 62.75 | 85.63 | 71.45 | 53.80 | 71.78 | 32.61 | 54.62 | 65.63 |
| AL-JSD | 75.82 | 74.42 | 62.10 | 83.60 | 69.82 | 52.00 | 72.50 | 45.00 | 51.83 | 65.23 |
| AL-HD | 87.16 | 71.61 | 62.60 | 85.18 | 68.32 | 50.08 | 70.88 | 45.45 | 47.62 | 65.43 |
| CDAL | 88.63 | 87.34 | 89.31 | 91.62 | 81.06 | 63.77 | 78.41 | 62.05 | 54.68 | 77.43 |
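The Average column appears to be the arithmetic mean of the nine per-task accuracies in each row. A minimal check, using the SVM row copied from the table above, is sketched here; the helper name `row_average` is hypothetical.

```python
# Per-task accuracies (%) of the SVM row, tasks 1-2 through 1-10.
svm_long_term = [47.99, 57.57, 65.22, 32.99, 45.09, 35.57, 24.83, 40.21, 31.19]


def row_average(accuracies):
    """Arithmetic mean of a row of accuracies, rounded to two decimals."""
    return round(sum(accuracies) / len(accuracies), 2)


svm_average = row_average(svm_long_term)
print(f"{svm_average:.2f}")  # prints 42.30, matching the Average cell
```

The same check can be applied to any other complete row of the table.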
Long-term drift parameter optimization.
| Parameters | Batch 2 | Batch 3 | Batch 4 | Batch 5 | Batch 6 | Batch 7 | Batch 8 | Batch 9 | Batch 10 |
|---|---|---|---|---|---|---|---|---|---|
| Long-term drift | 0.92 | 0.98 | 0.88 | 0.96 | 0.86 | 0.16 | 0.78 | 0.13 | 0.71 |
Comparison of recognition accuracy in short-term drift (%).
| Method | 1–2 | 2–3 | 3–4 | 4–5 | 5–6 | 6–7 | 7–8 | 8–9 | 9–10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 47.99 | 60.03 | 71.40 | 58.38 | 54.69 | 57.82 | 69.73 | 27.02 | 33.56 | 53.40 |
| ELM | 69.13 | 63.68 | 63.98 | 59.90 | 47.13 | 56.02 | 69.39 | 26.81 | 28.69 | 53.86 |
| CCPCA | 77.65 | 67.15 | 57.14 | 55.33 | 53.26 | 55.47 | 75.51 | 77.45 | 26.14 | 60.57 |
| OSC | 79.74 | 73.64 | 70.19 | 51.78 | 56.22 | 53.67 | 48.64 | 61.28 | 28.89 | 58.23 |
| LDA | 70.90 | 46.78 | 82.61 | 69.04 | 73.09 | 56.35 | 85.71 | 77.23 | 16.67 | 64.26 |
| DS | 42.77 | 43.69 | 47.83 | 21.32 | 28.91 | 27.35 | 48.64 | 16.60 | 35.58 | 34.74 |
| GLSW | 72.67 | 66.08 | 43.48 | 23.35 | 27.52 | 33.63 | 48.64 | 68.94 | 30.58 | 46.10 |
| DRCA | 64.31 | 66.27 | 95.03 | 47.21 | 54.96 | 68.92 | 84.69 | 72.55 | 25.25 | 64.35 |
| CDSL | 79.18 | 77.24 | 97.52 | 65.99 | 74.13 | 86.44 | | 77.02 | 34.11 | 75.58 |
| AL-KLD | 83.63 | 87.68 | 93.74 | 70.06 | 77.43 | 85.34 | 75.76 | 27.02 | 50.26 | 72.33 |
| AL-JSD | 88.96 | 93.70 | 91.60 | 68.86 | 71.01 | 82.19 | 70.45 | 74.77 | 45.52 | 76.34 |
| AL-HD | 86.05 | 87.52 | 88.17 | 70.66 | 59.52 | 83.02 | 73.86 | 70.57 | 36.15 | 72.84 |
| CDAL | 88.63 | 96.34 | 98.47 | 72.46 | 89.96 | 87.52 | 79.92 | 81.36 | 51.99 | 82.96 |
Short-term drift parameter optimization.
| Parameters | Batch 2 | Batch 3 | Batch 4 | Batch 5 | Batch 6 | Batch 7 | Batch 8 | Batch 9 | Batch 10 |
|---|---|---|---|---|---|---|---|---|---|
| Short-term drift | 0.92 | 0.98 | 0.98 | 0.40 | 0.10 | 0.98 | 0.97 | 0.62 | 0.30 |
Figure 4. Effect of on long-term drift.
Classification accuracy with different values of on Setting 1.
| | 5 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Batch 2 | 82.57 | 86.55 | 87.01 | 88.63 | 90.78 | 91.71 |
| Batch 3 | 74.26 | 74.81 | 74.71 | 87.34 | 88.49 | 91.01 |
| Batch 4 | 71.79 | 74.83 | 77.30 | 89.31 | 91.74 | 92.79 |
| Batch 5 | 43.23 | 74.33 | 72.32 | 91.62 | 92.36 | 95.24 |
| Batch 6 | 61.83 | 61.92 | 66.23 | 81.06 | 90.62 | 94.44 |
| Batch 7 | 44.29 | 57.01 | 64.60 | 63.77 | 71.73 | 76.14 |
| Batch 8 | 69.20 | 79.23 | 80.22 | 78.41 | 84.65 | 84.02 |
| Batch 9 | 47.96 | 43.70 | 56.22 | 62.05 | 67.73 | 76.67 |
| Batch 10 | 42.29 | 51.28 | 53.07 | 54.68 | 56.18 | 58.39 |
| Average | 59.71 | 67.07 | 70.19 | 77.43 | 81.59 | 84.49 |
Figure 5. Effect of on short-term drift.
Classification accuracy with different values of on Setting 2.
| | 5 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Batch 2 | 82.57 | 86.55 | 87.01 | 88.63 | 90.78 | 91.71 |
| Batch 3 | 87.56 | 90.42 | 92.46 | 96.34 | 98.90 | 98.89 |
| Batch 4 | 90.38 | 94.04 | 95.74 | 98.47 | 99.17 | 98.20 |
| Batch 5 | 65.63 | 64.71 | 68.36 | 72.46 | 70.70 | 68.71 |
| Batch 6 | 59.30 | 72.40 | 86.01 | 89.96 | 91.50 | 93.56 |
| Batch 7 | 84.10 | 84.07 | 84.02 | 87.52 | 91.02 | 94.44 |
| Batch 8 | 69.20 | 74.65 | 76.28 | 79.92 | 85.83 | 92.62 |
| Batch 9 | 68.17 | 73.48 | 78.44 | 81.36 | 86.98 | 94.76 |
| Batch 10 | 40.11 | 42.76 | 48.49 | 51.99 | 55.14 | 58.17 |
| Average | 71.89 | 75.90 | 79.65 | 82.96 | 85.56 | 87.90 |