
Towards Making Unlabeled Data Never Hurt.

Yu-Feng Li, Zhi-Hua Zhou.   

Abstract

It is usually expected that learning performance can be improved by exploiting unlabeled data, particularly when the amount of labeled data is limited. However, it has been reported that, in some cases, existing semi-supervised learning approaches perform even worse than supervised ones that use only labeled data. For this reason, it is desirable to develop safe semi-supervised learning approaches that will not significantly reduce learning performance when unlabeled data are used. This paper focuses on improving the safeness of semi-supervised support vector machines (S3VMs). First, the S3VM-us approach is proposed. It employs a conservative strategy and uses only the unlabeled instances that are very likely to be helpful, while avoiding the use of highly risky ones. This approach improves safeness, but its performance improvement from unlabeled data is often much smaller than that of S3VMs. In order to develop a safe and well-performing approach, we examine the fundamental assumption of S3VMs, i.e., low-density separation. Based on the observation that multiple good candidate low-density separators may be identified from training data, safe semi-supervised support vector machines (S4VMs) are proposed. This approach uses multiple low-density separators to approximate the ground-truth decision boundary and maximizes the improvement in performance over inductive SVMs for any candidate separator. Under the assumption employed by S3VMs, it is shown that S4VMs are provably safe and that the performance improvement from unlabeled data can be maximized. An out-of-sample extension of S4VMs is also presented, which allows S4VMs to make predictions on unseen instances. Our empirical study on a broad range of data shows that the overall performance of S4VMs is highly competitive with that of S3VMs and that, in contrast to S3VMs, which hurt performance significantly in many cases, S4VMs rarely perform worse than inductive SVMs.
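The worst-case idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the candidate low-density separators have already been turned into label vectors over the unlabeled instances, and it assumes a simple gain-minus-loss score relative to the inductive SVM's predictions. The names `improvement`, `safest_labeling`, and `lam` are hypothetical, and the brute-force search is only practical for a handful of instances.

```python
from itertools import product


def improvement(y, y_candidate, y_svm, lam=1.0):
    """Score labeling y against the baseline y_svm, supposing y_candidate is the truth.

    Assumed scoring (for illustration only):
      gain = instances y gets right that the baseline gets wrong
      loss = instances y gets wrong that the baseline gets right
    """
    gain = sum(1 for a, t, s in zip(y, y_candidate, y_svm) if a == t and s != t)
    loss = sum(1 for a, t, s in zip(y, y_candidate, y_svm) if a != t and s == t)
    return gain - lam * loss


def safest_labeling(candidates, y_svm, lam=1.0):
    """Brute-force search for the labeling with the best worst-case improvement.

    Every candidate is treated as a possible ground truth, and the labeling
    is judged by its worst score across all of them.
    """
    best, best_score = None, float("-inf")
    for y in product((-1, 1), repeat=len(y_svm)):
        score = min(improvement(y, c, y_svm, lam) for c in candidates)
        if score > best_score:
            best, best_score = list(y), score
    return best, best_score


# Baseline predictions of the inductive SVM on four unlabeled instances.
y_svm = [1, 1, -1, -1]
# Two hypothetical candidate separators, expressed as label vectors.
candidates = [[1, -1, -1, -1], [1, -1, -1, 1]]

labels, score = safest_labeling(candidates, y_svm)
print(labels, score)
```

Under this scoring, choosing the baseline labeling itself always yields an improvement of zero against every candidate, so the best worst-case score can never be negative. That is the sense in which such a max-min choice is "safe" in this sketch: it departs from the inductive SVM only where all candidates agree that doing so helps.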

Year:  2015        PMID: 26353217     DOI: 10.1109/TPAMI.2014.2299812

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0162-8828            Impact factor:   6.226


  11 in total

1.  SSC-EKE: Semi-Supervised Classification with Extensive Knowledge Exploitation.

Authors:  Pengjiang Qian; Chen Xi; Min Xu; Yizhang Jiang; Kuan-Hao Su; Shitong Wang; Raymond F Muzic
Journal:  Inf Sci (N Y)       Date:  2017-09-21       Impact factor: 6.795

2.  Multi-modal classification of neurodegenerative disease by progressive graph-based transductive learning.

Authors:  Zhengxia Wang; Xiaofeng Zhu; Ehsan Adeli; Yingying Zhu; Feiping Nie; Brent Munsell; Guorong Wu
Journal:  Med Image Anal       Date:  2017-05-13       Impact factor: 8.545

3.  Open-environment machine learning. (Review)

Authors:  Zhi-Hua Zhou
Journal:  Natl Sci Rev       Date:  2022-07-01       Impact factor: 23.178

4.  Simple strategies for semi-supervised feature selection.

Authors:  Konstantinos Sechidis; Gavin Brown
Journal:  Mach Learn       Date:  2017-07-17       Impact factor: 2.940

5.  An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

Authors:  Ana Stanescu; Doina Caragea
Journal:  BMC Syst Biol       Date:  2015-09-01

6.  A novel logistic regression model combining semi-supervised learning and active learning for disease classification.

Authors:  Hua Chai; Yong Liang; Sai Wang; Hai-Wei Shen
Journal:  Sci Rep       Date:  2018-08-29       Impact factor: 4.379

7.  Feature space learning model.

Authors:  Renchu Guan; Xu Wang; Maurizio Marchese; Mary Qu Yang; Yanchun Liang; Chen Yang
Journal:  J Ambient Intell Humaniz Comput       Date:  2018-05-09

8.  Improved Transductive Support Vector Machine for a Small Labelled Set in Motor Imagery-Based Brain-Computer Interface.

Authors:  Yilu Xu; Jing Hua; Hua Zhang; Ronghua Hu; Xin Huang; Jizhong Liu; Fumin Guo
Journal:  Comput Intell Neurosci       Date:  2019-11-25

9.  Tumour Relapse Prediction Using Multiparametric MR Data Recorded during Follow-Up of GBM Patients.

Authors:  Adrian Ion-Margineanu; Sofie Van Cauter; Diana M Sima; Frederik Maes; Stefaan W Van Gool; Stefan Sunaert; Uwe Himmelreich; Sabine Van Huffel
Journal:  Biomed Res Int       Date:  2015-08-27       Impact factor: 3.411

10.  A Novel Semi-Supervised Method of Electronic Nose for Indoor Pollution Detection Trained by M-S4VMs.

Authors:  Tailai Huang; Pengfei Jia; Peilin He; Shukai Duan; Jia Yan; Lidan Wang
Journal:  Sensors (Basel)       Date:  2016-09-10       Impact factor: 3.576

