Classification of mislabelled microarrays using robust sparse logistic regression.

Jakramate Bootkrajang, Ata Kabán.

Abstract

MOTIVATION: Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters.
RESULTS: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is set automatically using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach counters the adverse effects of labelling errors on predictive performance, is effective at identifying marker genes, and simultaneously detects mislabelled arrays with high accuracy.
AVAILABILITY: The code is available from http://cs.bham.ac.uk/~jxb008.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
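
The label-flipping idea mentioned under RESULTS can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the weights `w` and a 2x2 flip-probability table `gamma` are already known, and it omits the sparsity prior and the Bayesian regularization entirely. All function names and the `gamma` layout are hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clean_prob(w, x):
    """P(true label = 1 | x) under ordinary logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def noisy_prob(w, x, gamma):
    """P(observed label = 1 | x).

    gamma[j][k] is the assumed probability that a true label j is
    recorded as k, so each row of gamma sums to 1.
    """
    p1 = clean_prob(w, x)
    return gamma[0][1] * (1.0 - p1) + gamma[1][1] * p1


def true_label_posterior(w, x, observed, gamma):
    """P(true label = 1 | x, observed label), by Bayes' rule."""
    p1 = clean_prob(w, x)
    num = gamma[1][observed] * p1
    den = gamma[0][observed] * (1.0 - p1) + num
    return num / den


def flag_mislabelled(w, X, labels, gamma, threshold=0.5):
    """Indices whose observed label differs from the most probable true label."""
    flagged = []
    for i, (x, y_obs) in enumerate(zip(X, labels)):
        post1 = true_label_posterior(w, x, y_obs, gamma)
        y_hat = 1 if post1 >= threshold else 0
        if y_hat != y_obs:
            flagged.append(i)
    return flagged
```

With `gamma` set to the identity table the noisy likelihood reduces to the ordinary logistic one, which is why a model of this kind can be read as an extension of standard logistic regression. In a full method the weights and flip probabilities would be estimated jointly from the data rather than supplied by hand.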

Year:  2013        PMID: 23418189     DOI: 10.1093/bioinformatics/btt078

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  3 in total

1.  Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.

Authors:  Karl W Broman; Mark P Keller; Aimee Teo Broman; Christina Kendziorski; Brian S Yandell; Śaunak Sen; Alan D Attie
Journal:  G3 (Bethesda)       Date:  2015-08-19       Impact factor: 3.154

2.  Molecular pathway identification using biological network-regularized logistic models.

Authors:  Wen Zhang; Ying-Wooi Wan; Genevera I Allen; Kaifang Pang; Matthew L Anderson; Zhandong Liu
Journal:  BMC Genomics       Date:  2013-12-09       Impact factor: 3.969

3.  Simultaneous parameter estimation and variable selection via the logit-normal continuous analogue of the spike-and-slab prior.

Authors:  W Thomson; S Jabbari; A E Taylor; W Arlt; D J Smith
Journal:  J R Soc Interface       Date:  2019-01-31       Impact factor: 4.118
