Feng Duan, Shuai Zhang, Yinze Yan, Zhiqiang Cai.
Abstract
With the development of machine learning, data-driven mechanical fault diagnosis methods have been widely used in the field of prognostics and health management (PHM). Because fault data are scarce, unbalanced data sets are a persistent difficulty for fault diagnosis: faults with little historical data are hard to diagnose, and missed diagnoses lead to economic losses. To improve prediction accuracy under unbalanced data sets, this paper proposes MeanRadius-SMOTE, an extension of the traditional SMOTE oversampling algorithm that avoids generating useless samples and noise samples. The algorithm is validated on three linear unbalanced data sets and four step unbalanced data sets. Experimental results show that MeanRadius-SMOTE outperforms SMOTE and LR-SMOTE across several evaluation indicators and is more robust to different imbalance rates. In addition, MeanRadius-SMOTE balances overall prediction accuracy against minority-class prediction accuracy, which is of great significance for engineering applications.
Keywords: MeanRadius-SMOTE; mechanical fault diagnosis; minority class; unbalanced data set
Year: 2022 PMID: 35890845 PMCID: PMC9324964 DOI: 10.3390/s22145166
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The flow chart of the MeanRadius-SMOTE algorithm.
Figure 2. New sample distribution under different .
Figure 3. New samples generated by the oversampling algorithms: (a) SMOTE, (b) LR-SMOTE, (c) MeanRadius-SMOTE.
Information on the two-dimensional feature samples.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 |
|---|---|---|---|---|---|---|---|---|
| Feature 1 | 3 | 4 | 6 | 7 | 5 | 2 | 3 | 5.5 |
| Feature 2 | 6 | 3 | 2 | 4 | 5 | 2 | −1 | 0 |
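The traditional SMOTE baseline named in the abstract can be illustrated on these eight samples. The following is a minimal sketch of classic SMOTE interpolation only, not of MeanRadius-SMOTE, whose details are not given in this record. It assumes that all eight samples belong to one minority class; the function name `smote` and the parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import numpy as np

# Eight two-dimensional samples from the table above (one row per sample).
# Assumption: all eight are treated as minority-class samples.
X = np.array([
    [3, 6], [4, 3], [6, 2], [7, 4],
    [5, 5], [2, 2], [3, -1], [5.5, 0],
], dtype=float)


def smote(X, n_new, k=3, seed=0):
    """Classic SMOTE: interpolate between a sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    new = np.empty((n_new, X.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X))             # pick a seed sample
        nb = neighbours[i, rng.integers(k)]  # pick one of its k neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        new[j] = X[i] + gap * (X[nb] - X[i])
    return new


synthetic = smote(X, n_new=5)
print(synthetic)
```

Each synthetic point lies on the line segment between two existing samples, which is why plain SMOTE can place points in regions that overlap other classes.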
Figure 4. Gearbox used in the PHM 2009 challenge data.
A brief description of the faults.
| Label | Description |
|---|---|
| Label 1 | Good |
| Label 2 | Gear chipped and eccentric |
| Label 3 | Gear eccentric |
| Label 4 | Gear eccentric and broken, bearing ball fault |
| Label 5 | Gear chipped and eccentric and broken, bearing inner and ball and outer fault |
| Label 6 | Gear broken, bearing inner and ball and outer fault, shaft imbalance |
| Label 7 | Bearing inner fault, shaft keyway sheared |
| Label 8 | Bearing ball and outer fault, shaft imbalance |
The time–frequency domain features.
[Table with two columns, Time-Domain Feature and Frequency-Domain Feature; the formula cells did not survive extraction.]
Prediction results for class i samples.
| | Positive Prediction | Negative Prediction |
|---|---|---|
| Positive class | TP | FN |
| Negative class | FP | TN |
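The TP/FN/FP/TN counts above are the usual basis for the Acc, Mac-P, and Mac-F1 columns reported in the result tables. The sketch below assumes that Mac-P and Mac-F1 are the unweighted means of per-class precision and per-class F1 (one common macro-averaging convention); the record itself does not state the formulas, and the function name `macro_metrics` and the example matrix are hypothetical.

```python
import numpy as np


def macro_metrics(cm):
    """Accuracy, macro precision and macro F1 from a confusion matrix.

    cm[i, j] is the number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)            # correct predictions for each class
    fn = cm.sum(axis=1) - tp    # class-i samples predicted as something else
    fp = cm.sum(axis=0) - tp    # other samples predicted as class i

    def safe_div(num, den):
        # Return 0 where the denominator is 0 (class never predicted or absent).
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "acc": tp.sum() / cm.sum(),
        "mac_p": precision.mean(),
        "mac_f1": f1.mean(),
    }


# Illustrative two-class matrix, not data from the paper.
example = macro_metrics([[8, 2], [1, 9]])
print(example)
```

Because every class contributes equally to a macro average, a poorly predicted minority class lowers Mac-P and Mac-F1 far more than it lowers overall accuracy.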
Figure 5. Two imbalance forms: (a) linear imbalance, (b) step imbalance.
Unbalanced data sets description. The label columns give the number of samples per class.
| Imbalance Form | Name | Label 1 | Label 2 | Label 3 | Label 4 | Label 5 | Label 6 | Label 7 | Label 8 | Imbalance Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| linear | line-1 | 1500 | 465 | 258 | 50 | 672 | 879 | 1293 | 1086 | 30 |
| linear | line-2 | 1000 | 864 | 592 | 50 | 728 | 321 | 185 | 457 | 20 |
| linear | line-3 | 750 | 550 | 450 | 50 | 150 | 350 | 650 | 250 | 15 |
| step | stage-1 | 1500 | 50 | 1500 | 50 | 1500 | 1500 | 1500 | 50 | 30 |
| step | stage-2 | 750 | 50 | 750 | 50 | 750 | 750 | 750 | 50 | 15 |
| step | stage-3 | 1500 | 50 | 1500 | 50 | 50 | 50 | 50 | 1500 | 30 |
| step | stage-4 | 750 | 50 | 750 | 50 | 50 | 50 | 50 | 750 | 15 |
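The imbalance rates listed for each data set are consistent with the ratio of the largest class size to the smallest class size. The short sketch below checks that reading against the per-label sample counts; the ratio definition is an assumption inferred from the numbers, and the helper name `imbalance_rate` is hypothetical.

```python
# Per-label sample counts (Label 1 to Label 8) for each data set.
datasets = {
    "line-1": [1500, 465, 258, 50, 672, 879, 1293, 1086],
    "line-2": [1000, 864, 592, 50, 728, 321, 185, 457],
    "line-3": [750, 550, 450, 50, 150, 350, 650, 250],
    "stage-1": [1500, 50, 1500, 50, 1500, 1500, 1500, 50],
    "stage-2": [750, 50, 750, 50, 750, 750, 750, 50],
    "stage-3": [1500, 50, 1500, 50, 50, 50, 50, 1500],
    "stage-4": [750, 50, 750, 50, 50, 50, 50, 750],
}


def imbalance_rate(counts):
    """Assumed definition: largest class size divided by smallest class size."""
    return max(counts) / min(counts)


rates = {name: imbalance_rate(counts) for name, counts in datasets.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:g}")
```

The step data sets differ in how many classes are small: stage-1 and stage-2 have three minority classes, whereas stage-3 and stage-4 have five, at the same imbalance rate.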
Experimental results of the linear unbalanced data set.
| Data | Method | SVM Acc | SVM Mac-P | SVM Mac-F1 | RF Acc | RF Mac-P | RF Mac-F1 | GBDT Acc | GBDT Mac-P | GBDT Mac-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| line-1 | None | 0.8675 | 0.8896 | 0.8484 | 0.7726 | 0.8122 | 0.7256 | 0.8126 | 0.8243 | 0.7842 |
| line-1 | SMOTE | 0.9045 | 0.9148 | 0.8997 | 0.8555 | 0.8685 | 0.8471 | 0.8774 | 0.8849 | 0.8731 |
| line-1 | LR-SMOTE | 0.9065 | 0.9161 | 0.9012 | 0.8339 | 0.8528 | 0.8182 | 0.8662 | 0.8746 | 0.8591 |
| line-1 | MR-SMOTE | 0.9206 | 0.9233 | 0.9186 | 0.8678 | 0.8730 | 0.8643 | 0.8739 | 0.8773 | 0.8698 |
| line-2 | None | 0.8733 | 0.8945 | 0.8607 | 0.7668 | 0.8075 | 0.7243 | 0.8209 | 0.8412 | 0.8011 |
| line-2 | SMOTE | 0.8891 | 0.9062 | 0.8836 | 0.8548 | 0.8629 | 0.8501 | 0.8626 | 0.8713 | 0.8566 |
| line-2 | LR-SMOTE | 0.8923 | 0.9059 | 0.8865 | 0.8354 | 0.8497 | 0.8271 | 0.8588 | 0.8685 | 0.8520 |
| line-2 | MR-SMOTE | 0.9139 | 0.9160 | 0.9131 | 0.8675 | 0.8702 | 0.8657 | 0.8733 | 0.8780 | 0.8698 |
| line-3 | None | 0.8754 | 0.8890 | 0.8644 | 0.7920 | 0.8261 | 0.7583 | 0.8344 | 0.8496 | 0.8180 |
| line-3 | SMOTE | 0.8995 | 0.9069 | 0.8954 | 0.8646 | 0.8726 | 0.8618 | 0.8748 | 0.8782 | 0.8716 |
| line-3 | LR-SMOTE | 0.8988 | 0.9058 | 0.8947 | 0.8464 | 0.8575 | 0.8415 | 0.8683 | 0.8730 | 0.8639 |
| line-3 | MR-SMOTE | 0.9175 | 0.9183 | 0.9168 | 0.8691 | 0.8720 | 0.8679 | 0.8803 | 0.8807 | 0.8784 |
Experimental results of the step unbalanced data set.
| Data | Method | SVM Acc | SVM Mac-P | SVM Mac-F1 | RF Acc | RF Mac-P | RF Mac-F1 | GBDT Acc | GBDT Mac-P | GBDT Mac-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| stage-1 | None | 0.7403 | 0.8207 | 0.7066 | 0.6144 | 0.7610 | 0.5051 | 0.6793 | 0.7553 | 0.6194 |
| stage-1 | SMOTE | 0.8418 | 0.8685 | 0.8332 | 0.7614 | 0.8166 | 0.7447 | 0.8250 | 0.8487 | 0.8182 |
| stage-1 | LR-SMOTE | 0.8566 | 0.8789 | 0.8512 | 0.7103 | 0.7950 | 0.6729 | 0.8021 | 0.8403 | 0.7915 |
| stage-1 | MR-SMOTE | 0.9039 | 0.9062 | 0.9023 | 0.8440 | 0.8592 | 0.8408 | 0.8596 | 0.8706 | 0.8561 |
| stage-2 | None | 0.7746 | 0.8365 | 0.7528 | 0.6398 | 0.7694 | 0.5538 | 0.7193 | 0.7936 | 0.6838 |
| stage-2 | SMOTE | 0.8575 | 0.8760 | 0.8525 | 0.7790 | 0.8242 | 0.7682 | 0.8368 | 0.8551 | 0.8330 |
| stage-2 | LR-SMOTE | 0.8649 | 0.8833 | 0.8602 | 0.7429 | 0.8073 | 0.7202 | 0.8205 | 0.8481 | 0.8142 |
| stage-2 | MR-SMOTE | 0.9064 | 0.9078 | 0.9051 | 0.8380 | 0.8529 | 0.8357 | 0.8621 | 0.8723 | 0.8601 |
| stage-3 | None | 0.6465 | 0.8034 | 0.6369 | 0.4534 | 0.7669 | 0.3607 | 0.5651 | 0.6999 | 0.5259 |
| stage-3 | SMOTE | 0.7828 | 0.8481 | 0.7847 | 0.7390 | 0.8181 | 0.7336 | 0.8048 | 0.8297 | 0.8022 |
| stage-3 | LR-SMOTE | 0.8118 | 0.8546 | 0.8116 | 0.6766 | 0.8001 | 0.6654 | 0.7641 | 0.8044 | 0.7583 |
| stage-3 | MR-SMOTE | 0.8771 | 0.8826 | 0.8759 | 0.8163 | 0.8366 | 0.8151 | 0.8351 | 0.8431 | 0.8327 |
| stage-4 | None | 0.6823 | 0.8124 | 0.6767 | 0.5186 | 0.7697 | 0.4671 | 0.6491 | 0.7391 | 0.6334 |
| stage-4 | SMOTE | 0.8221 | 0.8606 | 0.8223 | 0.7670 | 0.8119 | 0.7634 | 0.8098 | 0.8315 | 0.8082 |
| stage-4 | LR-SMOTE | 0.8440 | 0.8700 | 0.8434 | 0.7186 | 0.7957 | 0.7131 | 0.7871 | 0.8205 | 0.7848 |
| stage-4 | MR-SMOTE | 0.8766 | 0.8829 | 0.8762 | 0.8135 | 0.8278 | 0.8119 | 0.8436 | 0.8513 | 0.8425 |
Pre on the data sets.
| Data | Method | SVM | RF | GBDT |
|---|---|---|---|---|
| line-1 | None | 0.277 | 0.048 | 0.184 |
| line-1 | SMOTE | 0.563 | 0.472 | 0.588 |
| line-1 | LR-SMOTE | 0.553 | 0.338 | 0.508 |
| line-1 | MR-SMOTE | 0.703 | 0.625 | 0.603 |
| line-2 | None | 0.358 | 0.052 | 0.261 |
| line-2 | SMOTE | 0.427 | 0.554 | 0.519 |
| line-2 | LR-SMOTE | 0.501 | 0.449 | 0.433 |
| line-2 | MR-SMOTE | 0.768 | 0.681 | 0.612 |
| line-3 | None | 0.403 | 0.110 | 0.308 |
| line-3 | SMOTE | 0.583 | 0.607 | 0.618 |
| line-3 | LR-SMOTE | 0.579 | 0.525 | 0.565 |
| line-3 | MR-SMOTE | 0.791 | 0.701 | 0.681 |
| stage-1 | None | 0.329 | 0.039 | 0.154 |
| stage-1 | SMOTE | 0.555 | 0.484 | 0.545 |
| stage-1 | LR-SMOTE | 0.584 | 0.322 | 0.506 |
| stage-1 | MR-SMOTE | 0.781 | 0.615 | 0.619 |
| stage-2 | None | 0.403 | 0.062 | 0.259 |
| stage-2 | SMOTE | 0.616 | 0.506 | 0.637 |
| stage-2 | LR-SMOTE | 0.606 | 0.360 | 0.555 |
| stage-2 | MR-SMOTE | 0.791 | 0.695 | 0.690 |
| stage-3 | None | 0.445 | 0.042 | 0.368 |
| stage-3 | SMOTE | 0.657 | 0.738 | 0.697 |
| stage-3 | LR-SMOTE | 0.662 | 0.605 | 0.616 |
| stage-3 | MR-SMOTE | 0.768 | 0.738 | 0.708 |
| stage-4 | None | 0.476 | 0.164 | 0.420 |
| stage-4 | SMOTE | 0.732 | 0.745 | 0.697 |
| stage-4 | LR-SMOTE | 0.754 | 0.648 | 0.705 |
| stage-4 | MR-SMOTE | 0.783 | 0.760 | 0.778 |
Figure 6. The line charts of Mac-P, Mac-F1, and Pre.