Detecting potential labeling errors in microarrays by data perturbation.

Andrea Malossini, Enrico Blanzieri, Raymond T Ng.

Abstract

MOTIVATION: Classification is widely used in medical applications, but the quality of a classifier depends critically on accurate labeling of the training data. In many medical applications, labeling a sample or grading a biopsy can be subjective. Existing studies confirm this and show that even a very small number of mislabeled samples can severely degrade the performance of the resulting classifier, particularly when the sample size is small. This paper addresses the problem of automatically detecting samples that are possibly mislabeled.
RESULTS: We propose two algorithms for detecting possibly mislabeled samples: a classification-stability algorithm and a leave-one-out-error-sensitivity algorithm. For both algorithms, the key step is computing the leave-one-out perturbation matrix. The classification-stability algorithm measures the stability of a sample's label with respect to label changes of other samples; the version of this algorithm based on the support vector machine appears to be quite accurate on three real datasets, and the suspect list it produces is of high quality. Furthermore, when human intervention is not available, the correction heuristic appears to be beneficial.
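
The abstract does not define the leave-one-out perturbation matrix, so the following is only a minimal sketch of one plausible reading of the classification-stability idea, not the authors' implementation. It assumes binary labels coded 0/1, that a "perturbation" means flipping the label of one other sample, and that a sample is suspect when its leave-one-out prediction disagrees with its given label under most perturbations. The nearest-centroid classifier is a stand-in chosen to keep the example dependency-free (the abstract reports its best results with a support vector machine), and the names `loo_perturbation_matrix`, `suspect_list`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid_predict(X: List[Vector], y: List[int], query: Vector) -> int:
    """Stand-in classifier (assumption): predict the label of the closest class centroid."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_perturbation_matrix(X: List[Vector], y: List[int]) -> List[List[int]]:
    """M[i][j] = 1 if sample i, held out, is predicted against its given label
    when the label of sample j has been flipped (assumed definition)."""
    n = len(X)
    M = [[0] * n for _ in range(n)]
    for j in range(n):
        flipped = list(y)
        flipped[j] = 1 - flipped[j]  # perturb one label (binary 0/1 assumed)
        for i in range(n):
            if i == j:
                continue
            # Leave sample i out, train on the rest with the perturbed labels.
            train_X = [x for k, x in enumerate(X) if k != i]
            train_y = [lab for k, lab in enumerate(flipped) if k != i]
            pred = nearest_centroid_predict(train_X, train_y, X[i])
            M[i][j] = int(pred != y[i])
    return M


def suspect_list(X: List[Vector], y: List[int], threshold: float = 0.5) -> List[int]:
    """Indices whose given label is contradicted under more than `threshold`
    of the single-label perturbations (hypothetical decision rule)."""
    M = loo_perturbation_matrix(X, y)
    n = len(X)
    scores = [sum(row) / (n - 1) for row in M]
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy data: two well-separated clusters; the last point sits in the
# class-0 cluster but carries label 1, i.e. it is deliberately mislabeled.
X_demo = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (0.15, 0.05),
]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]

suspects = suspect_list(X_demo, y_demo)
print(suspects)  # the deliberately mislabeled sample (index 8) is flagged
```

In this sketch the matrix costs on the order of n² classifier fits, which is tolerable only because microarray studies typically have few samples. Swapping in a different classifier changes only `nearest_centroid_predict`.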

Year:  2006        PMID: 16820424     DOI: 10.1093/bioinformatics/btl346

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

Review 1.  Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis.

Authors:  Davood Karimi; Haoran Dou; Simon K Warfield; Ali Gholipour
Journal:  Med Image Anal       Date:  2020-06-20       Impact factor: 8.545

2.  Accounting for control mislabeling in case-control biomarker studies.

Authors:  Mattias Rantalainen; Chris C Holmes
Journal:  J Proteome Res       Date:  2011-11-08       Impact factor: 4.466

3.  An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

Authors:  Yuk Yee Leung; Chun Qi Chang; Yeung Sam Hung
Journal:  PLoS One       Date:  2012-10-17       Impact factor: 3.240

4.  A kernel-based approach for detecting outliers of high-dimensional biological data.

Authors:  Jung Hun Oh; Jean Gao
Journal:  BMC Bioinformatics       Date:  2009-04-29       Impact factor: 3.169

5.  The use of haplotype-specific transcripts improves sample annotation consistency.

Authors:  Nicole Hartmann; Evert Luesink; Edward Khokhlovich; Joseph D Szustakowski; Lukas Baeriswyl; Joshua Peterson; Andreas Scherer; Nirmala R Nanguneri; Frank Staedtler
Journal:  Biomark Res       Date:  2014-09-30
