Methods for labeling error detection in microarrays based on the effect of data perturbation on the regression model.

Chen Zhang, Chunguo Wu, Enrico Blanzieri, You Zhou, Yan Wang, Wei Du, Yanchun Liang.

Abstract

MOTIVATION: Mislabeled samples often appear in gene expression profiles because different disease subtypes resemble one another and because diagnosis is subjective. Mislabeled samples degrade supervised learning procedures. The LOOE-sensitivity algorithm detects mislabeled samples in microarray data by means of data perturbation. However, it performs poorly because it fails to measure the effect of the perturbation. The purpose of this article is to design a novel method for detecting mislabeled microarray samples that takes advantage of measuring the effect of data perturbations.
RESULTS: To measure the effect of data perturbation, we define an index named the perturbing influence value (PIV), based on the support vector machine (SVM) regression model. Three PIV-based algorithms are proposed to detect mislabeled samples: the Column Algorithm (CAPIV), the Row Algorithm (RAPIV) and the progressive Row Algorithm (PRAPIV). Experimental results on six artificial datasets and five microarray datasets demonstrate that all of the proposed methods are superior to LOOE-sensitivity. Moreover, compared with the simple SVM and CL-stability, the PRAPIV algorithm shows an increase in precision and high recall.
AVAILABILITY: The program and source code (in Java) are publicly available at http://ccst.jlu.edu.cn/CSBG/PIVS/index.htm
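
The abstract does not give the definition of PIV, so the following is only a minimal sketch of the general idea of label perturbation, not the authors' method. It assumes a single feature, binary labels coded 0/1, and an ordinary least-squares line substituted for SVM regression. The score (the reduction in squared fitting error when one label is flipped) and the function names are hypothetical, chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (stand-in for SVM regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def fit_error(xs, ys):
    """Sum of squared residuals of the fitted line on its own training data."""
    a, b = fit_line(xs, ys)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def flip_scores(xs, labels):
    """Hypothetical score: how much the fit error drops when label i is flipped."""
    base = fit_error(xs, labels)
    scores = []
    for i in range(len(labels)):
        perturbed = list(labels)
        perturbed[i] = 1 - perturbed[i]  # perturb one label at a time
        scores.append(base - fit_error(xs, perturbed))
    return scores


def suspects(xs, labels):
    """Samples whose flipped label makes the regression fit better."""
    return [i for i, s in enumerate(flip_scores(xs, labels)) if s > 0]


# Toy data: two well-separated groups; sample 3 carries the wrong label.
xs = [0.0, 0.1, 0.2, 0.3, 1.7, 1.8, 1.9, 2.0]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
print(suspects(xs, labels))  # prints [3]
```

In this toy setting, flipping a correct label always worsens the fit, while flipping the one wrong label improves it, which is the intuition behind measuring the effect of a perturbation rather than only applying it.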

Year:  2009        PMID: 19661242     DOI: 10.1093/bioinformatics/btp478

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Coding SNPs as intrinsic markers for sample tracking in large-scale transcriptome studies.

Authors:  Weihong Xu; Hong Gao; Junhee Seok; Julie Wilhelmy; Michael N Mindrinos; Ronald W Davis; Wenzhong Xiao
Journal:  Biotechniques       Date:  2012-06       Impact factor: 1.993

2.  Bottlenecks caused by software gaps in miRNA and RNAi research.

Authors:  Sean Ekins; Ron Shigeta; Barry A Bunin
Journal:  Pharm Res       Date:  2012-02-24       Impact factor: 4.200

3.  Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. (Review)

Authors:  Davood Karimi; Haoran Dou; Simon K Warfield; Ali Gholipour
Journal:  Med Image Anal       Date:  2020-06-20       Impact factor: 8.545

4.  Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis.

Authors:  Tammy Jiang; Jaimie L Gradus; Timothy L Lash; Matthew P Fox
Journal:  Am J Epidemiol       Date:  2021-09-01       Impact factor: 5.363

5.  Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.

Authors:  Karl W Broman; Mark P Keller; Aimee Teo Broman; Christina Kendziorski; Brian S Yandell; Śaunak Sen; Alan D Attie
Journal:  G3 (Bethesda)       Date:  2015-08-19       Impact factor: 3.154

6.  Comparative analyses of H3K4 and H3K27 trimethylations between the mouse cerebrum and testis.

Authors:  Peng Cui; Wanfei Liu; Yuhui Zhao; Qiang Lin; Daoyong Zhang; Feng Ding; Chengqi Xin; Zhang Zhang; Shuhui Song; Fanglin Sun; Jun Yu; Songnian Hu
Journal:  Genomics Proteomics Bioinformatics       Date:  2012-06-09       Impact factor: 7.691
