| Literature DB >> 27627768 |
Wenjing Han1, Eduardo Coutinho2,3, Huabin Ruan4, Haifeng Li5, Björn Schuller3,5,6, Xiaojie Yu1, Xuan Zhu1.
Abstract
Coping with the scarcity of labeled data is a common problem in sound classification tasks. Approaches to classifying sounds are commonly based on supervised learning algorithms, which require labeled data that is often scarce and leads to models that do not generalize well. In this paper, we efficiently combine confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation when training sound classification models. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores; candidates with low scores are delivered to human annotators, while those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly fewer labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning, and Self-Training. A reduction of 52.2% in human-labeled instances is achieved in both the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances.
Year: 2016 PMID: 27627768 PMCID: PMC5023122 DOI: 10.1371/journal.pone.0162075
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of state-of-the-art research in sound classification.
Features — BoAP: bag-of-audio-phrases descriptor, UFL: unsupervised feature learning, E: energy, SF: spectral features, ZCR: zero-crossing rate, TFB-ED: triangle filter bank and eigen-decomposition, MFCC: mel-frequency cepstral coefficients, STE: subband temporal envelopes. Classifiers — SVM: support vector machines, RF: random forest, KFDA: kernel Fisher discriminant analysis, HMM: hidden Markov models. Learning methods — FS: fully supervised learning.
| Work | #Clips | #Classes | Features | Classifiers | Learning methods | Domains |
|---|---|---|---|---|---|---|
| [ | 1,479 | 22 | BoAP | SVM | FS | human activity |
| [ | 8,732 | 10 | UFL | RF | FS | urban environment |
| [ | 5,949 | 62 | E+SF+ZCR | SVM | FS | surveillance |
| [ | 650 | 3 | TFB-ED | KFDA | FS | environment |
| [ | 115/10,500 | 7/105 | MFCC | HMM | FS | healthcare |
| [ | 705 | 10 | STE | SVM | FS | canteen |
Overview of previous work combining Active and Semi-Supervised Learning techniques, and the work proposed in this paper.
AL: Active Learning, SSL: Semi-Supervised Learning, QBC: Query-By-Committee, EM: Expectation Maximization, SBC: Similarity-based Classifier, CRFs: Conditional Random Fields, SVM: Support Vector Machines.
| Article | AL method | SSL method | Scenario | Classifier | Domain | Year |
|---|---|---|---|---|---|---|
| [ | QBC | EM | pool | naive Bayes | text classification | 1998 |
| [ | Co-Testing | Co-EM | pool | naive Bayes | Web pages & pictures classification | 2002 |
| [ | Co-Testing | Co-Training | pool | SBC | content-based image retrieval | 2004 |
| [ | Certainty-based | Self-training | fixed & dynamic pool | Boosting | spoken language understanding | 2005 |
| [ | Certainty-based | Self-training | stream | CRFs | natural language processing | 2009 |
| this work | Certainty-based | Self-training | pool & stream | SVM | sound classification | 2015 |
Certainty-based Active Learning algorithm in a pool-based scenario.
1. Classify each instance in the unlabeled pool with the current classifier.
2. Select those instances with the lowest confidence scores and submit them to human annotators.
3. Refer to the new labeled set as the human-labeled batch and add it to the training set.
4. Re-train the classifier on the enlarged training set.
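The pool-based loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the helpers `train`, `predict_with_confidence`, and `oracle` are hypothetical placeholders standing in for the SVM training routine, the confidence-scoring classifier, and the human annotator.

```python
def active_learning_pool(labeled, pool, train, predict_with_confidence,
                         query_size, oracle):
    """One certainty-based Active Learning round over a pool of instances."""
    model = train(labeled)
    # Score every unlabeled instance with the current classifier.
    scored = [(predict_with_confidence(model, x)[1], x) for x in pool]
    scored.sort(key=lambda pair: pair[0])          # lowest confidence first
    queries = [x for _, x in scored[:query_size]]  # send these to a human
    labeled += [(x, oracle(x)) for x in queries]
    remaining = [x for _, x in scored[query_size:]]
    return train(labeled), labeled, remaining
```

Each round spends the human labeling budget only on the instances the classifier is least sure about; everything else stays in the pool for later rounds.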
Certainty-based Active Learning algorithm in a stream-based scenario.
1. Classify the current instance from the stream with the current classifier.
2. If its confidence score is low, retain the current instance in the buffer; otherwise, discard the current instance.
3. Once the buffer is full, submit the instances in the buffer to human annotators.
4. Refer to the new labeled set as the human-labeled batch and add it to the training set.
5. Re-train the classifier, empty the buffer, and continue processing the stream.
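A minimal sketch of the stream-based variant, under the same assumptions as before (`train`, `predict_with_confidence`, and `oracle` are hypothetical stand-ins for the paper's SVM, confidence scorer, and human annotator):

```python
def active_learning_stream(stream, labeled, train, predict_with_confidence,
                           threshold, buffer_size, oracle):
    """Certainty-based Active Learning over a stream with a fixed-size buffer."""
    model = train(labeled)
    buffer = []
    for x in stream:
        _, confidence = predict_with_confidence(model, x)
        if confidence < threshold:
            buffer.append(x)            # uncertain: keep for human labeling
        # confident instances are simply discarded in this scenario
        if len(buffer) == buffer_size:
            labeled += [(b, oracle(b)) for b in buffer]
            buffer.clear()
            model = train(labeled)      # re-train after each buffer flush
    return model, labeled
```

Unlike the pool-based case, instances arrive once and cannot be revisited, so the buffer decides which of them are worth a human label before they are gone.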
Semi-Supervised Learning strategy.
1. Classify every instance in the unlabeled pool with the current classifier.
2. Select those instances with confidence scores above the threshold and label them automatically with the classifier's predictions.
3. Add the machine-labeled set to the training set.
4. Re-train the classifier on the enlarged training set.
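The Self-Training step can be sketched as follows; again the helpers `train` and `predict_with_confidence` are hypothetical placeholders, and the threshold is a free parameter (the paper's experiments vary it, e.g. 0.95 in the pool-based setting):

```python
def self_training(labeled, pool, train, predict_with_confidence, threshold):
    """One Self-Training round: auto-label high-confidence instances."""
    model = train(labeled)
    machine_labeled, remaining = [], []
    for x in pool:
        label, confidence = predict_with_confidence(model, x)
        if confidence >= threshold:
            machine_labeled.append((x, label))  # trust the classifier
        else:
            remaining.append(x)                 # leave uncertain instances alone
    labeled = labeled + machine_labeled
    return train(labeled), labeled, remaining
```

No human is involved here at all; the risk is that a mislabeled high-confidence instance is fed back into training, which is why the threshold matters.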
Semi-Supervised Active Learning in a pool-based scenario.
1. Classify every instance in the unlabeled pool with the current classifier.
2. Select the instances with the lowest confidence scores and submit them to human annotators.
3. Add the new human-labeled set to the training set and re-train the classifier.
4. Select those instances with confidence scores above the threshold and label them automatically with the updated classifier.
5. Add the machine-labeled set to the training set and re-train the classifier again.
* Note that the model is re-trained twice at each learning iteration.
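Putting the two pieces together, one pool-based Semi-Supervised Active Learning round looks roughly like the sketch below (same hypothetical helpers as above; note the two re-training calls per round):

```python
def ssal_pool(labeled, pool, train, predict_with_confidence,
              query_size, threshold, oracle):
    """One Semi-Supervised Active Learning round (pool-based)."""
    model = train(labeled)
    scored = [(predict_with_confidence(model, x), x) for x in pool]
    # Step 1: human-label the least confident instances, then re-train.
    scored.sort(key=lambda pair: pair[0][1])
    queries = [x for _, x in scored[:query_size]]
    labeled += [(x, oracle(x)) for x in queries]
    model = train(labeled)                       # first re-train
    # Step 2: machine-label high-confidence instances with the updated model.
    machine, remaining = [], []
    for x in (x for _, x in scored[query_size:]):
        label, confidence = predict_with_confidence(model, x)
        if confidence >= threshold:
            machine.append((x, label))
        else:
            remaining.append(x)
    labeled += machine
    return train(labeled), labeled, remaining    # second re-train
```

The human budget goes to the low-confidence end of the pool and the machine handles the high-confidence end, which is exactly what drives the reduction in human-labeled instances reported in the abstract.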
Semi-Supervised Active Learning in a stream-based scenario.
1. Classify the current instance from the stream with the current classifier; retain low-confidence instances in the buffer.
2. Once the buffer is full, submit the buffered instances to human annotators; add the resulting human-labeled set to the training set and re-train the classifier.
3. Automatically label those instances with confidence scores above the threshold; add the machine-labeled set to the training set and re-train the classifier again.
* Note that the model is re-trained twice at each learning iteration.
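The stream-based combination can be sketched in the same style (hypothetical `train`, `predict_with_confidence`, and `oracle` helpers; `low` and `high` are the two confidence thresholds):

```python
def ssal_stream(stream, labeled, train, predict_with_confidence,
                low, high, buffer_size, oracle):
    """Semi-Supervised Active Learning over a stream (sketch)."""
    model = train(labeled)
    buffer, machine = [], []
    for x in stream:
        label, confidence = predict_with_confidence(model, x)
        if confidence < low:
            buffer.append(x)            # uncertain: queue for a human
        elif confidence >= high:
            machine.append((x, label))  # confident: machine-label it
        if len(buffer) == buffer_size:
            labeled += [(b, oracle(b)) for b in buffer]
            model = train(labeled)      # first re-train: human labels
            labeled += machine
            model = train(labeled)      # second re-train: machine labels
            buffer, machine = [], []
    return model, labeled
```

Instances whose confidence falls between the two thresholds are neither queried nor auto-labeled, mirroring the discard behavior of the plain stream-based Active Learning algorithm.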
Description of the subset of the FindSounds database used in this paper.
| Category | # Subsets | # Clips | Duration |
|---|---|---|---|
|  | 45 | 2,540 | 2 h 09 min |
|  | 85 | 2,834 | 2 h 42 min |
|  | 19 | 937 | 1 h 17 min |
|  | 34 | 2,166 | 2 h 47 min |
|  | 13 | 2,010 | 1 h 56 min |
|  | 18 | 1,769 | 1 h 01 min |
|  | 62 | 4,674 | 3 h 49 min |
Fig 1. Relationship between the classifier’s classification UARs and confidence scores for 500 and 5,000 initial training instances.
Fig 2. Distribution (in percent) of classifier confidence scores for 500 (blue) and 5,000 (red) training instances.
(No instance was assigned a confidence score in the range [0.0, 0.1].)
Fig 3. Learning curves for the active and passive learning methods in the pool-based scenario.
Fig 4. Learning curves for the active and passive learning methods in the stream-based scenario.
Fig 5. Semi-supervised learning results for varying sizes of the initial training set (different numbers of human-labeled instances) in combination with different confidence thresholds.
Fig 6. Learning curves for semi-supervised active learning (in each round, the 500 instances with the lowest confidence scores are selected for human annotation, and a variable number of instances with confidence scores above the 0.95 threshold are selected for machine annotation), active learning, and passive learning in the pool-based scenario.
Fig 7. Learning curves for semi-supervised active learning with different thresholds in the pool-based scenario.
Fig 8. Learning curves for semi-supervised active learning (in each round, the 500 instances with the lowest confidence scores are selected for human annotation, and the 100 instances with the highest confidence scores are selected for machine annotation), active learning, and passive learning in the stream-based scenario.
Best performances up to statistical significance achieved using semi-supervised active learning (SSAL), active learning (AL), and passive learning (PL) in the pool-based and stream-based scenarios, as well as the number of human-labeled instances (#HLI) needed to achieve that performance.
| Scenario | Metric | SSAL | AL | PL |
|---|---|---|---|---|
| Pool | UAR [%] | 69.4 | 69.3 | 68.5 |
| Pool | #HLI | 6,500 | 7,500 | 11,500 |
| Stream | UAR [%] | 68.7 | 68.7 | 68.5 |
| Stream | #HLI | 6,000 | 7,000 | 11,500 |