Cheng Ding, Tania Pereira, Ran Xiao, Randall J Lee, Xiao Hu.
Abstract
Label noise is omnipresent in the annotation process and has an impact on supervised learning algorithms. This work examines the effect of random and class-dependent label noise on a binary classification task: quality assessment for photoplethysmography (PPG). The PPG signal is used to detect physiological changes, and its quality can have a significant impact on subsequent tasks, which makes PPG quality assessment a particularly good target for examining the impact of label noise in the field of biomedicine. Random and class-dependent label noise was introduced separately into the training set to emulate the errors associated with fatigue and bias when labeling data samples. We also tested different representations of the PPG signal: features defined by domain experts, the 1D raw signal, and a 2D image. Four classifiers were tested on the noisy training data: support vector machine (SVM), XGBoost, a 1D Resnet, and a 2D Resnet, which handle the three representations. The results showed that the two deep learning models were more robust than the two traditional machine learning models under both random and class-dependent label noise. From the representation perspective, the 2D image showed better robustness than the 1D raw signal. The logits of the classifiers were also analyzed; the predicted probabilities tended to become more dispersed as more label noise was introduced. This work investigated several factors related to label noise, including representation, noise type, and data imbalance, and can serve as a guidebook for designing methods that are more robust to label noise in future work.
Keywords: Biomedical Signal; binary classification; label noise; learning models; supervised learning
Year: 2022 PMID: 36236265 PMCID: PMC9572105 DOI: 10.3390/s22197166
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
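The noise-injection step described in the abstract, random flipping and class-dependent (good-to-bad or bad-to-good) flipping of binary quality labels, can be sketched as follows. This is a minimal illustrative sketch: the function name, the 0 = bad / 1 = good encoding, and the exact sampling scheme are assumptions, not the authors' code.

```python
import numpy as np

def flip_labels(y, rate, mode="random", rng=None):
    """Inject label noise into binary labels (0 = bad quality, 1 = good quality).

    mode="random"      : flip a `rate` fraction of all labels.
    mode="good_to_bad" : flip a `rate` fraction of the good (1) labels to 0.
    mode="bad_to_good" : flip a `rate` fraction of the bad (0) labels to 1.
    """
    rng = np.random.default_rng(rng)
    y_noisy = np.asarray(y).copy()
    if mode == "random":
        pool = np.arange(len(y_noisy))          # any label may be flipped
    elif mode == "good_to_bad":
        pool = np.flatnonzero(y_noisy == 1)     # only good labels may be flipped
    elif mode == "bad_to_good":
        pool = np.flatnonzero(y_noisy == 0)     # only bad labels may be flipped
    else:
        raise ValueError(f"unknown mode: {mode}")
    n_flip = int(round(rate * len(pool)))
    idx = rng.choice(pool, size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]             # flip 0 <-> 1
    return y_noisy
```

The class-dependent modes emulate a systematic annotator bias (only one class is mislabeled), while the random mode emulates fatigue-like errors spread over the whole training set.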
Figure 1. Overview of the study pipeline: data collection, followed by annotation of the quality of each PPG segment, the artificial label noise experiments, and the learning models used in this study. The final outputs of the learning models were analyzed and the results compared.
The distribution of signal quality in the analyzed patient cohorts.
| Data | Number of Patients | Number of Records | Bad Quality Records | Good Quality Records |
|---|---|---|---|---|
| ICU—Training set | 3764 | 78,872 | 23,764 (30%) | 55,108 (70%) |
| Neuro ICU—Test set | 13 | 2683 | 815 (30%) | 1868 (70%) |
Figure 2. Performance of the classifiers at increasing levels of label noise in the training data. (a) Random flipping. (b) Good-to-bad flipping. (c) Bad-to-good flipping.
Mean accuracy values in relation to an increase in label noise in the training data.
Accuracy (Random Flipping)

| Label Noise | 1D Resnet | 2D Resnet | SVM | XGBoost |
|---|---|---|---|---|
| 10% | 96.84% ± 0.01 | 97.20% ± 0.008 | 91.78% ± 0.01 | 94.88% ± 0.01 |
| 50% | 73.26% ± 0.02 | 76.69% ± 0.09 | 61.37% ± 0.18 | 71.75% ± 0.01 |

Accuracy (Good-to-Bad Flipping)

| Label Noise | 1D Resnet | 2D Resnet | SVM | XGBoost |
|---|---|---|---|---|
| 10% | 96.79% ± 0.01 | 96.61% ± 0.007 | 94.26% ± 0.01 | 95.99% ± 0.01 |
| 50% | 85.38% ± 0.2 | 92.13% ± 0.01 | 77.20% ± 0.13 | 83.10% ± 0.01 |

Accuracy (Bad-to-Good Flipping)

| Label Noise | 1D Resnet | 2D Resnet | SVM | XGBoost |
|---|---|---|---|---|
| 10% | 97.06% ± 0.01 | 97.89% ± 0.003 | 94.38% ± 0.01 | 95.60% ± 0.01 |
| 50% | 87.30% ± 0.13 | 89.42% ± 0.05 | 81.30% ± 0.03 | 87.07% ± 0.01 |
Figure 3. Impact of the absolute number of mislabeled records on the best learning model trained on the cleanest training set from previous work [17,18]. All three types of label noise were applied, and performance degraded with different slopes according to the type of noise induced.
Figure 4. Scatter plots of the predicted probabilities after adding label noise to the PPG records in the training data: (a) trained on clean data; (b) trained with 10% random label noise; (c) trained with 10% good-to-bad flipping; (d) trained with 10% bad-to-good flipping.
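The dispersion of predicted probabilities can be quantified in several ways; one simple illustrative measure (an assumption here, not necessarily the paper's exact analysis) is the mean distance of each predicted probability from its nearest confident extreme (0 or 1):

```python
import numpy as np

def probability_dispersion(probs):
    """Mean distance of predicted class-1 probabilities from the nearest
    confident extreme (0 or 1); larger values = more dispersed predictions."""
    probs = np.asarray(probs, dtype=float)
    return float(np.mean(np.minimum(probs, 1.0 - probs)))

# A model trained on clean labels tends to produce confident outputs,
# while one trained on noisy labels tends to produce more dispersed ones.
clean = [0.02, 0.97, 0.01, 0.99]   # hypothetical confident predictions
noisy = [0.35, 0.70, 0.25, 0.55]   # hypothetical dispersed predictions
```

Under this measure, `probability_dispersion(noisy)` exceeds `probability_dispersion(clean)`, mirroring the trend visible in the scatter plots.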
Statistical results of pairwise t-tests between the different models.

| Model | 1D Resnet | | 2D Resnet | | SVM | | XGBoost | |
|---|---|---|---|---|---|---|---|---|
| | t-Statistic | p-Value | t-Statistic | p-Value | t-Statistic | p-Value | t-Statistic | p-Value |
| 1D Resnet | - | - | 2.5282 | 0.0142 | 8.6594 | <0.05 | 12.6235 | <0.05 |
| 2D Resnet | | | - | - | 9.8804 | <0.05 | 11.9236 | <0.05 |
| SVM | | | | | - | - | 7.2637 | <0.05 |
| XGBoost | | | | | | | - | - |
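The pairwise comparisons above use paired t-tests over matched performance values. A minimal sketch with SciPy's `ttest_rel`, using hypothetical accuracy numbers (the real per-setting values come from the experiments reported above):

```python
from itertools import combinations

from scipy import stats

# Hypothetical per-setting accuracies for each model (one value per noise
# setting); the actual values come from the experiments reported above.
acc = {
    "1D Resnet": [0.97, 0.94, 0.88, 0.73],
    "2D Resnet": [0.98, 0.96, 0.92, 0.77],
    "SVM":       [0.94, 0.90, 0.78, 0.61],
    "XGBoost":   [0.96, 0.93, 0.85, 0.72],
}

# Paired t-test across the matched noise settings for every pair of models.
for a, b in combinations(acc, 2):
    t, p = stats.ttest_rel(acc[a], acc[b])
    print(f"{a} vs {b}: t = {t:.4f}, p = {p:.4f}")
```

A paired (rather than independent) test is appropriate here because each model is evaluated on the same noise settings, so accuracies are compared setting-by-setting.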
Statistical results of pairwise t-tests between the different types of label noise.

| Label Noise | 'Bad to Good' | | 'Good to Bad' | | 'Random' | |
|---|---|---|---|---|---|---|
| | t-Statistic | p-Value | t-Statistic | p-Value | t-Statistic | p-Value |
| 'Bad to Good' | - | - | 0.9127 | 0.2625 | 3.2615 | <0.05 |
| 'Good to Bad' | | | - | - | 3.3762 | <0.05 |
| 'Random' | | | | | - | - |