Pengfei Jia, Tailai Huang, Shukai Duan, Lingpu Ge, Jia Yan, Lidan Wang.
Abstract
When an electronic nose (E-nose) is used to distinguish different kinds of gases, the label information of target gas samples may be lost through operator error or for other reasons, although this is undesirable. In addition, labeled samples are usually more costly to obtain than unlabeled ones. In most cases, an E-nose trained with labeled samples achieves higher classification accuracy than one trained with unlabeled samples, so gas samples without label information should not be used to train an E-nose. However, discarding them wastes resources and can even delay the progress of research. In this work, a novel multi-class semi-supervised learning technique called M-training is proposed to train E-noses with both labeled and unlabeled samples. We employ M-training to train an E-nose that distinguishes three indoor pollutant gases (benzene, toluene and formaldehyde). Data processing results show that the classification accuracy of an E-nose trained with semi-supervised techniques (tri-training and M-training) is higher than that of an E-nose trained only with labeled samples, and that M-training outperforms tri-training because it can employ more base classifiers.
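The abstract does not spell out the M-training update rule, so the following is only a hypothetical sketch of one plausible way to extend the tri-training agreement step to more than three base classifiers: for each base classifier, an unlabeled sample becomes a candidate training sample (with a pseudo-label) when all of the other base classifiers predict the same class. The function name `agreement_candidates`, the unanimity rule and the example predictions are assumptions, not the authors' method; tri-training's error-rate checks and the retraining loop are deliberately omitted.

```python
from typing import Dict, List, Tuple


def agreement_candidates(predictions: List[List[str]]) -> Dict[int, List[Tuple[int, str]]]:
    """For every base classifier i, find unlabeled samples on which all *other*
    base classifiers predict the same class (hypothetical agreement rule).

    predictions[j][k] is the label that base classifier j predicts for
    unlabeled sample k. The result maps i to (k, agreed_label) pairs that
    could be added, with their pseudo-labels, to classifier i's training set.
    With exactly three classifiers this reduces to the basic tri-training
    agreement rule (without tri-training's error-rate conditions).
    """
    m = len(predictions)
    if m < 3:
        raise ValueError("need at least three base classifiers")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("every classifier must label the same unlabeled samples")

    candidates: Dict[int, List[Tuple[int, str]]] = {}
    for i in range(m):
        picked: List[Tuple[int, str]] = []
        for k in range(n):
            # Labels predicted for sample k by every classifier except i.
            votes = {predictions[j][k] for j in range(m) if j != i}
            if len(votes) == 1:  # the other classifiers are unanimous
                picked.append((k, next(iter(votes))))
        candidates[i] = picked
    return candidates


# Hypothetical predictions of four base classifiers on four unlabeled samples.
preds = [
    ["benzene", "toluene", "formaldehyde", "toluene"],
    ["benzene", "toluene", "benzene", "toluene"],
    ["benzene", "formaldehyde", "benzene", "toluene"],
    ["benzene", "toluene", "benzene", "formaldehyde"],
]
for i, pairs in agreement_candidates(preds).items():
    print(i, pairs)
```

With four classifiers, classifier 0 would receive samples 0 and 2 (both labeled benzene), because classifiers 1 to 3 agree on them; more classifiers make unanimous agreement stricter, which is one possible reason to use more than three.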
Keywords: electronic nose; indoor pollution gas; semi-supervised learning; unlabeled samples
Year: 2016 PMID: 26985898 PMCID: PMC4813945 DOI: 10.3390/s16030370
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Table 1. Main sensitive characteristics of the gas sensors.
| Sensors | Main Sensitive Characteristics |
|---|---|
| TGS2620 | Carbon monoxide, ethanol, methane, isobutane, VOCs |
| TGS2602 | Ammonia, formaldehyde, toluene, ethanol, hepatic gas, VOCs |
| TGS2201 | Carbon monoxide, nitric oxide, nitrogen dioxide |
Note: The responses of these three sensors are non-specific. Table 1 lists the gases to which each sensor is mainly sensitive, but the sensors also respond to other gases.
Figure 1. Schematic diagram of the experimental system.
Figure 2. Image of the experimental setup.
Figure 3. Response of the sensor array.
Concentrations of the target gases.
| Gas | Concentration Range (ppm) | Number of Samples |
|---|---|---|
| Benzene | [0.1721, 0.7056] | 144 (12 × 12) |
| Toluene | [0.0668, 0.1425] | 132 (12 × 11) |
| Formaldehyde | [0.0565, 1.2856] | 252 (12 × 21) |
Number of samples in the training set and test set.
| Gas | Training Set | Test Set |
|---|---|---|
| Benzene | 108 | 36 |
| Toluene | 100 | 32 |
| Formaldehyde | 188 | 64 |
| All-3 | 396 | 132 |
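The record does not say how the samples were divided, so the following is only a sketch assuming a random, per-gas (stratified) hold-out split that reserves a fixed number of test samples for each gas. The helper `stratified_holdout` and the integer sample IDs are hypothetical stand-ins for the real sensor-array samples; only the per-gas counts come from the tables.

```python
import random
from typing import Dict, List, Tuple


def stratified_holdout(
    samples_by_gas: Dict[str, List[int]],
    test_counts: Dict[str, int],
    seed: int = 0,
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Randomly move test_counts[gas] samples of each gas into the test set."""
    rng = random.Random(seed)
    train: Dict[str, List[int]] = {}
    test: Dict[str, List[int]] = {}
    for gas, ids in samples_by_gas.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        k = test_counts[gas]
        test[gas] = shuffled[:k]
        train[gas] = shuffled[k:]
    return train, test


# Per-gas totals and test-set sizes taken from the tables above.
totals = {"Benzene": 144, "Toluene": 132, "Formaldehyde": 252}
test_counts = {"Benzene": 36, "Toluene": 32, "Formaldehyde": 64}

# Hypothetical integer IDs standing in for the sensor-array samples.
samples: Dict[str, List[int]] = {}
next_id = 0
for gas, n in totals.items():
    samples[gas] = list(range(next_id, next_id + n))
    next_id += n

train, test = stratified_holdout(samples, test_counts)
for gas in totals:
    print(gas, len(train[gas]), len(test[gas]))
```

Splitting per gas keeps every class represented in both sets (108/36, 100/32 and 188/64), which matters here because formaldehyde has roughly twice as many samples as the other gases.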
Classification accuracy of different classifiers (%).
| | PLS-DA | RBFNN | SVM |
|---|---|---|---|
| Classification accuracy of training data set | 87.88 | 92.03 | 96.59 |
| Classification accuracy of test data set | 87.88 | 89.02 | 96.21 |
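Classification accuracy is the percentage of correctly classified samples, so each reported value can be related back to a sample count. The two helpers below (hypothetical names) convert between counts and percentages; for example, 87.88% is consistent with 116 of the 132 test samples, or 348 of the 396 training samples, being classified correctly.

```python
def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Classification accuracy as a percentage, rounded to two decimals."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return round(100.0 * n_correct / n_total, 2)


def implied_correct(accuracy: float, n_total: int) -> int:
    """Number of correctly classified samples implied by a reported percentage."""
    return round(accuracy / 100.0 * n_total)


print(accuracy_percent(116, 132))   # 87.88
print(accuracy_percent(348, 396))   # 87.88
print(implied_correct(96.21, 132))  # 127
```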
Figure 4. Flow chart of the semi-supervised learning (SSL) process.
Performance of tri-training and M-training with different numbers of base classifiers (%).
| Method | Classification Accuracy (Initial) | Classification Accuracy (Final) | Impro |
|---|---|---|---|
| Tri-training | 73.48 | 91.67 | 24.76 |
| M-training (4 base classifiers) | 74.24 | 96.97 | 30.62 |
| M-training (5 base classifiers) | 74.24 | 96.97 | 30.62 |
| M-training (6 base classifiers) | 74.24 | 96.97 | 30.62 |
Note: Impro = (final accuracy - initial accuracy)/initial accuracy, expressed as a percentage. The initial classification accuracy on the test data set is obtained when only the labeled set L is used to train the base classifiers; the final classification accuracy is obtained after the unlabeled set U is used to refine the base classifiers that were trained on set L.
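The relative improvement defined in the note can be checked directly against the table. This is a minimal sketch; the function name `impro` is an illustrative choice, and the result is multiplied by 100 because the table reports Impro as a percentage.

```python
def impro(initial: float, final: float) -> float:
    """Relative improvement in percent: (final - initial) / initial * 100."""
    if initial <= 0:
        raise ValueError("initial accuracy must be positive")
    return round((final - initial) / initial * 100.0, 2)


print(impro(73.48, 91.67))  # 24.76 (tri-training)
print(impro(74.24, 96.97))  # 30.62 (M-training)
```

Because Impro is relative to the initial accuracy, the same absolute gain yields a larger Impro when the initial accuracy is low.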
Number of samples in each data set.
| Gas | Training Data Set | Set U (Unlabeled) | Set L (Labeled) | Test Data Set |
|---|---|---|---|---|
| Benzene | 108 | 27/54/81 | 81/54/27 | 36 |
| Toluene | 100 | 25/50/75 | 75/50/25 | 32 |
| Formaldehyde | 188 | 47/94/141 | 141/94/47 | 64 |
| All-3 | 396 | 99/198/297 | 297/198/99 | 132 |
Note: Values separated by slashes correspond to unlabeled rates of 25%, 50% and 75%, respectively.
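Assuming the unlabeled rate is the fraction of the training data set whose labels are withheld (an interpretation that matches the counts in the table), the sizes of U and L follow directly from the training-set size. The helper `split_sizes` is hypothetical; the training-set sizes are those in the table.

```python
from typing import Tuple


def split_sizes(n_train: int, unlabeled_rate: float) -> Tuple[int, int]:
    """Return (size of U, size of L) when a fraction unlabeled_rate of the
    training data set has its labels withheld."""
    if not 0.0 <= unlabeled_rate <= 1.0:
        raise ValueError("unlabeled_rate must lie in [0, 1]")
    n_unlabeled = round(n_train * unlabeled_rate)
    return n_unlabeled, n_train - n_unlabeled


training_sizes = {"Benzene": 108, "Toluene": 100, "Formaldehyde": 188, "All-3": 396}
for gas, n in training_sizes.items():
    sizes = [split_sizes(n, r) for r in (0.25, 0.5, 0.75)]
    u_col = "/".join(str(s[0]) for s in sizes)
    l_col = "/".join(str(s[1]) for s in sizes)
    print(f"{gas}: U = {u_col}, L = {l_col}")
```

For benzene this gives U = 27/54/81 and L = 81/54/27, matching the corresponding row.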
Classification accuracy of M-training with a 75% unlabeled rate (%).
| Gas | Training Data Set (Initial) | Training Data Set (Final) | Test Data Set (Initial) | Test Data Set (Final) | Impro |
|---|---|---|---|---|---|
| Benzene | 100 | 100 | 44.44 | 72.22 | 62.51 |
| Toluene | 100 | 100 | 43.75 | 46.88 | 7.15 |
| Formaldehyde | 100 | 100 | 75 | 95.31 | 27.08 |
| All-3 | 100 | 100 | 59.09 | 80.3 | 35.89 |
Classification accuracy of M-training with a 50% unlabeled rate (%).
| Gas | Training Data Set (Initial) | Training Data Set (Final) | Test Data Set (Initial) | Test Data Set (Final) | Impro |
|---|---|---|---|---|---|
| Benzene | 100 | 100 | 50 | 88.89 | 77.78 |
| Toluene | 100 | 100 | 81.25 | 100 | 23.08 |
| Formaldehyde | 100 | 100 | 84.78 | 100 | 17.95 |
| All-3 | 100 | 100 | 74.24 | 96.97 | 30.62 |
Classification accuracy of M-training with a 25% unlabeled rate (%).
| Gas | Training Data Set (Initial) | Training Data Set (Final) | Test Data Set (Initial) | Test Data Set (Final) | Impro |
|---|---|---|---|---|---|
| Benzene | 100 | 100 | 69.44 | 100 | 44.01 |
| Toluene | 100 | 100 | 81.25 | 100 | 23.08 |
| Formaldehyde | 100 | 100 | 82.81 | 92.19 | 11.33 |
| All-3 | 100 | 100 | 78.79 | 96.21 | 22.11 |
Number of samples in the training data set of each base classifier c of M-training with different unlabeled rates.
| Unlabeled Rate 0.25 (Initial) | 0.25 (Final) | 0.5 (Initial) | 0.5 (Final) | 0.75 (Initial) | 0.75 (Final) |
|---|---|---|---|---|---|
| 223 | 322 (99) | 149 | 603 (454) | 74 | 246 (172) |
| 223 | 322 (99) | 149 | 214 (65) | 74 | 354 (280) |
| 223 | 322 (99) | 149 | 407 (258) | 74 | 236 (162) |
| 223 | 223 (0) | 149 | 153 (4) | 74 | 74 (0) |
Note: 322 (99) means that the training data set of c contains 322 samples, 99 more than its initial training data set (223 samples).
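The "final (added)" notation can be checked mechanically: for every base classifier, the added count should equal the final size minus the initial size. The sketch below (hypothetical helper names) parses cells in that format, verifies the difference, and totals the samples added to all base classifiers at each unlabeled rate.

```python
import re
from typing import Tuple


def parse_entry(entry: str) -> Tuple[int, int]:
    """Parse a cell such as '322 (99)' into (final_size, added)."""
    m = re.fullmatch(r"\s*(\d+)\s*\((\d+)\)\s*", entry)
    if m is None:
        raise ValueError(f"unexpected cell format: {entry!r}")
    return int(m.group(1)), int(m.group(2))


def check_growth(initial: int, entry: str) -> int:
    """Return the number of added samples, verifying final - initial == added."""
    final, added = parse_entry(entry)
    if final - initial != added:
        raise ValueError(f"{entry!r} is inconsistent with initial size {initial}")
    return added


# Initial size and the "Final" cells of the four base classifiers, per unlabeled rate.
table = {
    0.25: (223, ["322 (99)", "322 (99)", "322 (99)", "223 (0)"]),
    0.5: (149, ["603 (454)", "214 (65)", "407 (258)", "153 (4)"]),
    0.75: (74, ["246 (172)", "354 (280)", "236 (162)", "74 (0)"]),
}
for rate, (initial, cells) in table.items():
    added = [check_growth(initial, c) for c in cells]
    print(rate, added, sum(added))
```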
Figure 5. Classification accuracy for each gas in the test data set. Panels (a), (b) and (c) show the classification accuracy for benzene, toluene and formaldehyde, respectively, and panel (d) shows the classification accuracy for all gases. In each panel, the accuracy improves with M-training, and the improvement is most pronounced when the unlabeled rate is 50%.