
Classification of gene expression data: A hubness-aware semi-supervised approach.

Krisztian Buza.

Abstract

BACKGROUND AND OBJECTIVE: Classification of gene expression data is the common denominator of various biomedical recognition tasks. However, obtaining class labels for large training samples may be difficult or even impossible in many cases. Therefore, semi-supervised classification techniques are required, as semi-supervised classifiers take advantage of unlabeled data.
METHODS: Gene expression data is high-dimensional, which gives rise to the phenomena grouped under the umbrella term "curse of dimensionality". One recently explored aspect of it is the presence of hubs (hubness for short), that is, instances that appear unusually often among the nearest neighbors of other instances. Hubness-aware classifiers have therefore been developed recently, such as Naive Hubness-Bayesian k-Nearest Neighbor (NHBNN). In this paper, we propose a semi-supervised extension of NHBNN which follows the self-training schema. As one of the core components of self-training is the certainty score, we propose a new hubness-aware certainty score.
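The two ideas named in the METHODS paragraph, hubness and self-training, can be illustrated with a minimal sketch. It is not the paper's method and does not use the PyHubs API: the classifier is a plain majority-vote k-nearest-neighbor rule standing in for NHBNN, and the certainty score is simply the vote fraction, a placeholder for the hubness-aware score, which the abstract does not define. All function names (`k_nearest`, `k_occurrence`, `knn_predict`, `self_train`) and the `per_round` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def k_nearest(points, query, k, exclude=None):
    """Indices of the k points closest to `query` (Euclidean distance).

    `exclude` skips one index, so a point is not its own neighbor.
    """
    dists = [(math.dist(p, query), i)
             for i, p in enumerate(points) if i != exclude]
    return [i for _, i in sorted(dists)[:k]]


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbors
    of the other points. Points with unusually high counts are hubs."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        for j in k_nearest(points, p, k, exclude=i):
            counts[j] += 1
    return counts


def knn_predict(train_x, train_y, query, k):
    """Plain majority-vote kNN (a stand-in, NOT NHBNN).

    Returns (label, certainty); certainty is the winning vote fraction,
    a placeholder for the paper's hubness-aware certainty score.
    """
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())


def self_train(labeled_x, labeled_y, unlabeled_x, k=3, per_round=1):
    """Generic self-training loop: in each round, predict all unlabeled
    points, move the `per_round` most certain ones into the training set
    with their predicted labels, and repeat until the pool is empty."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    while pool:
        preds = [knn_predict(x, y, u, k) for u in pool]
        # Rank pool indices by certainty, most certain first.
        order = sorted(range(len(pool)),
                       key=lambda i: preds[i][1], reverse=True)
        chosen = set(order[:per_round])
        for i in sorted(chosen):
            x.append(pool[i])
            y.append(preds[i][0])
        pool = [u for i, u in enumerate(pool) if i not in chosen]
    return x, y
```

In this sketch, `k_occurrence` only measures hubness; it is not fed into the certainty score. A hubness-aware variant would replace the vote fraction in `knn_predict` with a score derived from such neighbor-occurrence statistics, as the paper proposes.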
RESULTS: We performed experiments on publicly available gene expression data. These experiments show that the proposed classifier outperforms its competitors. We investigated the impact of each of the components (classification algorithm, semi-supervised technique, hubness-aware certainty score) separately and showed that each of these components is relevant to the performance of the proposed approach.
CONCLUSIONS: Our results imply that our approach may increase classification accuracy and reduce computational costs (i.e., runtime). Based on the promising results presented in the paper, we envision that hubness-aware techniques will be used in various other biomedical machine learning tasks. In order to accelerate this process, we made an implementation of hubness-aware machine learning techniques publicly available in the PyHubs software package (http://www.biointelligence.hu/pyhubs) implemented in Python, one of the most popular programming languages of data science.
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

Keywords:  Gene expression; High dimensionality; Machine learning; Semi-supervised classification

Year:  2016        PMID: 27000293     DOI: 10.1016/j.cmpb.2016.01.016

Source DB:  PubMed          Journal:  Comput Methods Programs Biomed        ISSN: 0169-2607            Impact factor:   5.428


  4 in total

1.  Artificial Intelligence-Based Semisupervised Self-Training Algorithm in Pathological Tissue Image Segmentation.

Authors:  Qun Li; Linlin Liu
Journal:  Comput Intell Neurosci       Date:  2022-06-13

2.  Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles.

Authors:  Liying Yang; Zhimin Liu; Xiguo Yuan; Jianhua Wei; Junying Zhang
Journal:  Biomed Res Int       Date:  2016-11-24       Impact factor: 3.411

3.  Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis.

Authors:  Md Shakil Ahmed; Md Shahjaman; Md Masud Rana; Md Nurul Haque Mollah
Journal:  Biomed Res Int       Date:  2017-08-07       Impact factor: 3.411

4.  The ability to classify patients based on gene-expression data varies by algorithm and performance metric.

Authors:  Stephen R Piccolo; Avery Mecham; Nathan P Golightly; Jérémie L Johnson; Dustin B Miller
Journal:  PLoS Comput Biol       Date:  2022-03-11       Impact factor: 4.475

