Dongting Xu, Zhisheng Zhang, Jinfei Shi.
Abstract
Multiple sensors are often mounted in a complex manufacturing process to detect failures. Because modern manufacturing processes are highly reliable, failures happen only occasionally. Data collected in practical manufacturing processes are therefore extremely imbalanced, which often biases supervised learning models. Data collected by multiple sensors can be regarded as multivariate time series or multi-sensor stream data, and the high dimensionality of such data makes model building even more challenging. In this study, a new and easy-to-apply data augmentation approach, namely imbalanced multi-sensor stream data augmentation (IMSDA), is proposed for imbalanced learning. IMSDA can generate high-quality failure data for all dimensions, and the generated data retain temporal properties similar to those of the original multivariate time series. Both raw data and generated data are used to train the failure detection models, but the models are tested on the same real dataset. The proposed method is applied to a real-world industrial case. Results show that IMSDA not only produces good-quality failure data that reduce the imbalance level but also significantly improves the performance of supervised failure detection models.
Keywords: complex manufacturing process; data augmentation; failure detection; imbalanced learning; multi-sensor stream data; multivariate time series; supervised learning
Year: 2022 PMID: 35684662 PMCID: PMC9185280 DOI: 10.3390/s22114042
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
The description of the notations.
| Notation | Description |
|---|---|
| $\mathbb{R}$ | The field of real numbers |
| | Multivariate time series |
| | Matrix (multivariate time series for special time) |
| | Matrix |
| | Diagonal matrix |
| | Matrix |
| | Left-singular vectors |
| | Singular value |
| | Right-singular vectors |
| | Cosine similarity (threshold) |
| | The degree of class imbalance |
| | The desired balance level |
| | The size of augmented data |
| | The rows of the matrix |
| | The columns of the matrix (the number of variables) |
Figure 1. Original multi-sensor stream data from paper manufacturing processes.
The statistical description of the original multi-sensor stream data.
| Paper Grade | Normal | Failure | The Degree of Class Imbalance |
|---|---|---|---|
| 96 * | 6502 | 72 | 0.0111 |
| 82 | 4360 | 18 | 0.0041 |
| 118 | 2631 | 15 | 0.0057 |
| 139 | 1797 | 10 | 0.0056 |
| 112 | 1230 | 5 | 0.0041 |
* The number of failure rows exceeds the number of columns in the dataset.
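The values in the last column of the table above are consistent with the ratio of failure samples to normal samples. The following is a minimal Python sketch under that assumption; the ratio is inferred from the tabulated counts rather than quoted from a stated formula, and the function name `imbalance_degree` is hypothetical.

```python
def imbalance_degree(n_failure: int, n_normal: int) -> float:
    """Ratio of failure (minority) samples to normal (majority) samples.

    Assumption: this definition is inferred from the tabulated values,
    not taken from a formula given in the text.
    """
    if n_normal <= 0:
        raise ValueError("n_normal must be positive")
    return n_failure / n_normal


# (normal, failure) counts per paper grade, copied from the table above.
counts = {
    96: (6502, 72),
    82: (4360, 18),
    118: (2631, 15),
    139: (1797, 10),
    112: (1230, 5),
}

# Degree of class imbalance per paper grade, rounded to four decimals.
degrees = {
    grade: round(imbalance_degree(n_failure, n_normal), 4)
    for grade, (n_normal, n_failure) in counts.items()
}
```

Under this reading, a value near 0 indicates severe imbalance and a value of 1 would indicate equally sized classes.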
The description of the dataset after data augmentation.
| Paper Grade | | | Normal | Abnormal | The Degree of Class Imbalance |
|---|---|---|---|---|---|
| 96 * | 0.5 | 18–35 | 6502 | 1332 | 0.2049 |
| 82 | 0.5 | 14–17 | 4360 | 86 | 0.0197 |
| 118 | 0.5 | 10–14 | 2631 | 85 | 0.0323 |
| 139 | 0.5 | 6–9 | 1797 | 46 | 0.0256 |
| 112 | 0.5 | 4 | 1230 | 9 | 0.0073 |
* The number of failure rows exceeds the number of columns in the dataset.
Figure 2. The curves of selected original data and the augmented data. (a) The curves of 139 original data and some of the augmented data (x1); (b) the curves of 139 original data and some of the augmented data (x3).
The value of cosine similarity of the paper grade 139.
| | 1.0000 | 0.9837 | 0.9522 | 0.9521 | 0.9236 | 0.9020 | 0.7935 | 0.6554 |
| | 1.0000 | 0.9979 | 0.9915 | 0.5518 | 0.3487 | 0.3567 | 0.2287 | 0.2072 |
| | 1.0000 | 0.9883 | 0.8937 | 0.5886 | 0.3556 | 0.3397 | 0.4897 | 0.4260 |
| ⁝ | ⁝ | ⁝ | ⁝ | ⁝ | ⁝ | ⁝ | ⁝ | ⁝ |
| | 1.0000 | 0.9860 | 0.9800 | 0.9114 | 0.8948 | 0.8915 | 0.5730 | 0.3125 | 0.5484 |
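For reference, the cosine similarity of two vectors a and b is their dot product divided by the product of their Euclidean norms. The sketch below shows that standard computation in Python. Which pairs of vectors the study compares is not stated in this excerpt, so the inputs are made-up illustrative values and the function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative inputs only (not data from the study).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, which is how values such as 1.0000 and 0.3125 in the table can be read.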
Performance metrics for imbalanced learning.
| Notation | Description |
|---|---|
| Accuracy | (TP + TN)/(TP + FP + TN + FN) |
| Recall | TP/(TP + FN) |
| Precision | TP/(TP + FP) |
| F1 score | 2 × Precision × Recall/(Precision + Recall) |
TP: True Positive; TN: True Negative; FP: False Positive; FN: False Negative.
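The four formulas above can be computed directly from confusion-matrix counts. The following is a minimal Python sketch; the function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something specified in the source.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, recall, precision, and F1 score from confusion counts.

    Assumption: a metric whose denominator is zero is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced test set: 18 failures among 1000 samples.
metrics = classification_metrics(tp=8, tn=980, fp=2, fn=10)
```

With these made-up counts the accuracy is 0.988 while the recall is only about 0.444, which illustrates why accuracy alone is a weak indicator on imbalanced data.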
Figure 3. The flowchart for training and evaluating the supervised failure detection models.
The parameters of the training model.
| Rank | Parameters | Values |
|---|---|---|
| 1 | cv(cross-validation) | 5 |
| 2 | * Random-state | 0 |
| 3 | Max_iter | 10,000 |
| 4 | Train dataset | 0.6 |
| 5 | Test dataset | 0.2 |
| 6 | Validation dataset | 0.2 |
* For paper grade 139, random-state = 1.
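The 0.6/0.2/0.2 proportions in the table above can be illustrated with a seeded index split. This is a sketch using only the Python standard library. It assumes a plain shuffled split, since the source does not say whether the split was shuffled or stratified, and the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n: int, train: float = 0.6, test: float = 0.2, seed: int = 0):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into
    train, test, and validation parts (validation takes the remainder).

    Assumption: a plain shuffled split; the source does not state
    whether shuffling or stratification was used.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_test = round(n * test)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    val_idx = indices[n_train + n_test:]
    return train_idx, test_idx, val_idx


# Example with 1000 samples and the seed value listed in the table.
train_idx, test_idx, val_idx = split_indices(1000, train=0.6, test=0.2, seed=0)
```

With only a handful of failure rows per paper grade, a plain shuffle can leave a part with no failures at all, so a stratified split may be worth considering in practice.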
The performance of logistic regression.
| Paper Grade | Model | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|
| 82 | Original data trained model clf82 | 0.997 | 0.666 | 0.558 | 0.666 |
| 82 | Augmentation data trained model clf82_aug | 0.998 | 0.857 | 0.638 | 1 |
| 96 | Original data trained model clf96 | 0.993 | 0.400 | 0.340 | 0.330 |
| 96 | Augmentation data trained model clf96_aug | 0.995 | 0.530 | 0.542 | 0.444 |
| 112 | Original data trained model clf112 | 0.996 | 0.004 | 0 | 0 |
| 112 | Augmentation data trained clf112_aug | 1 | 1 | 1 | 1 |
| 118 | Original data trained model clf118 | 0.991 | 0 | 0.035 | 0 |
| 118 | Augmentation data trained model clf118_aug | 0.998 | 0.909 | 0.998 | 1 |
| 139 | Original data trained model clf139 | 0.994 | 0 | 0.02 | 0 |
| 139 | Augmentation data trained model clf139_aug | 1 | 1 | 1 | 1 |
The parameters of the training model.
| Rank | Parameters | Values |
|---|---|---|
| 1 | Random-state | 0 |
| 2 | Train dataset | 0.8 |
| 3 | Test dataset | 0.2 |
The performance of random forest.
| Paper Grade | Model | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|
| 82 | Original data trained model rf82 | 0.997 | 0.571 | 0.667 | 0.500 |
| 82 | Augmentation data trained model rf82_aug | 0.995 | 1.000 | 0.500 | 1.000 |
| 96 | Original data trained model rf96 | 0.995 | 0.769 | 0.833 | 0.714 |
| 96 | Augmentation data trained model rf96_aug | 0.101 | 1.000 | 0.012 | 1.000 |
| 112 | Original data trained model rf112 | 0.996 | 1.000 | 0.000 | 0.000 |
| 112 | Augmentation data trained model rf112_aug | 0.996 | 1.000 | 0.000 | 0.000 |
| 118 | Original data trained model rf118 | 0.994 | 0.400 | 1.000 | 0.250 |
| 118 | Augmentation data trained model rf118_aug | 0.998 | 1.000 | 1.000 | 0.750 |
| 139 | Original data trained model crf139 | 1.000 | 1.000 | 1.000 | 1.000 |
| 139 | Augmentation data trained rf139_aug | 0.980 | 1.000 | 0.430 | 1.000 |