Fengqi Li, Chuang Yu, Nanhai Yang, Feng Xia, Guangming Li, Fatemeh Kaveh-Yazdy.
Abstract
Transductive graph-based semi-supervised learning (GSSL) methods usually build an undirected graph that uses both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to their neighbors along the graph edges in order to predict the labels of the unlabeled samples. Most popular semi-supervised learning approaches are sensitive to the initial label distribution, which becomes a problem when the labeled dataset is imbalanced: the class boundary is severely skewed by the majority classes. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the subsequent learning methods. Experiments on UCI datasets and the MNIST handwritten digits dataset show that the proposed approach outperforms other existing state-of-the-art methods.
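The propagation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `knn_graph` and `propagate_labels`, the unweighted k-nearest-neighbor graph, and the simple neighbor-averaging update with clamped labels are all assumptions chosen for brevity.

```python
import math

def knn_graph(points, k):
    """Build undirected k-nearest-neighbor adjacency sets (illustrative helper)."""
    n = len(points)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the edge undirected
    return neighbors

def propagate_labels(points, labels, k=2, n_classes=2, iters=50):
    """labels maps sample index -> class for the labeled samples only.

    Each unlabeled vertex repeatedly takes the average class scores of its
    neighbors; labeled vertices stay clamped to their known class.
    """
    neighbors = knn_graph(points, k)
    n = len(points)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in labels.items():
        scores[i][c] = 1.0
    for _ in range(iters):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(scores[i])  # clamp labeled samples
            else:
                updated.append([
                    sum(scores[j][c] for j in neighbors[i]) / len(neighbors[i])
                    for c in range(n_classes)
                ])
        scores = updated
    return [max(range(n_classes), key=lambda c: scores[i][c]) for i in range(n)]

# Two well-separated 1-D clusters with one labeled sample each.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
predicted = propagate_labels(points, {0: 0, 5: 1}, k=2)
```

With one labeled sample per cluster, each label spreads through its own cluster. With many more labeled samples in one class than the other, the majority class would contribute more score mass near the boundary, which is the sensitivity the abstract refers to.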
Year: 2013 PMID: 23935439 PMCID: PMC3725769 DOI: 10.1155/2013/875450
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: A demonstration of how an imbalanced labeled dataset affects transductive GSSL methods on the two-moon toy dataset.
Figure 2: Workflow of graph-based SSL integrated with INNO.
Algorithm 1: Iterative Nearest Neighborhood Oversampling (INNO).
Figure 3: Illustration of the INNO algorithm.
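The body of Algorithm 1 is not reproduced in this record, so the following is only a rough sketch of the idea stated in the abstract: iteratively move a few unlabeled samples into the minority classes until the labeled set is balanced. The function name `oversample_minority`, the nearest-to-any-member selection rule, and the `max_iters` cap are assumptions; the paper's neighbor count k and stop parameter s are not modeled here.

```python
import math
from collections import Counter

def oversample_minority(points, labels, max_iters=100):
    """Return a copy of `labels` (index -> class, non-empty) with unlabeled
    samples added to minority classes until all classes have equal size.

    Assumed selection rule: in each pass, every minority class adopts the
    unlabeled sample closest to any of its current members.
    """
    balanced = dict(labels)
    for _ in range(max_iters):
        counts = Counter(balanced.values())
        target = max(counts.values())
        minority = [c for c, n in counts.items() if n < target]
        if not minority:
            break  # labeled set is balanced
        for c in minority:
            unlabeled = [i for i in range(len(points)) if i not in balanced]
            if not unlabeled:
                return balanced  # nothing left to add
            members = [i for i, lab in balanced.items() if lab == c]
            nearest = min(
                unlabeled,
                key=lambda u: min(math.dist(points[u], points[m]) for m in members),
            )
            balanced[nearest] = c
    return balanced

# Three labeled samples in class 0, one in class 1; indices 3, 4, 5 are unlabeled.
points = [(0,), (1,), (2,), (3,), (10,), (11,), (12,)]
balanced = oversample_minority(points, {0: 0, 1: 0, 2: 0, 6: 1})
```

In this toy run the minority class grows outward from its single labeled sample, picking up the two nearest unlabeled points, after which both classes hold three labeled samples and the result can be handed to a graph-based learner.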
Confusion matrix.
| | True class: positive | True class: negative |
|---|---|---|
| Predicted positive | TP (true positives) | FP (false positives) |
| Predicted negative | FN (false negatives) | TN (true negatives) |
| Row sum | | |
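As a worked illustration of how the four confusion-matrix cells are used, the sketch below computes accuracy along with the standard precision and recall definitions. The helper name `confusion_metrics` and the example counts are hypothetical, and precision and recall are included only as common companions to accuracy, not as metrics confirmed by this record.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics derived from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against empty denominators when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical imbalanced case: 10 positives, 90 negatives,
# and a classifier that predicts "negative" for everything.
skewed = confusion_metrics(tp=0, fp=0, fn=10, tn=90)

# Hypothetical case where most positives are recovered.
better = confusion_metrics(tp=8, fp=2, fn=2, tn=88)
```

The first case reaches 90% accuracy while finding no positive samples at all, which is why accuracy alone can hide the effect of a boundary skewed toward the majority class.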
Figure 4: Influence of the imbalance ratio in the labeled dataset on classification accuracy.
Figure 5: Influence of unbalanced labeled samples on classification accuracy.
Figure 6: Influence of the number of neighbors k on classification accuracy.
Figure 7: Influence of the stop parameter s on classification accuracy.