
A method for predicting disease subtypes in presence of misclassification among training samples using gene expression: application to human breast cancer.

Wensheng Zhang, Romdhane Rekaya, Keith Bertrand.

Abstract

MOTIVATION: Accurate diagnosis and prediction cannot be achieved unless the disease subtype status of every training sample used in the supervised learning step is accurately known. This assumption requires a perfect tool for disease diagnosis and classification, which in most cases is not available. Thus, the supervised learning step has to be conducted with a statistical model that accounts for and handles potential mislabeling in the input data.
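One common way to write a model that allows for mislabeled training samples is shown below. This is an assumed, generic formulation with a single symmetric mislabeling rate ε; the abstract does not state the paper's actual model, so the symbols z_i, y_i, π_i and ε are introduced here for illustration only.

```latex
% z_i : true ER status of training sample i (1 = ER+, 0 = ER-)
% y_i : observed, possibly wrong, label; x_i : expression profile
% \pi_i = P(z_i = 1 \mid x_i);  \varepsilon = P(y_i \neq z_i)  (assumed symmetric)
P(y_i = 1 \mid x_i) = (1-\varepsilon)\,\pi_i + \varepsilon\,(1-\pi_i)

P(z_i = 0 \mid y_i = 1, x_i)
  = \frac{\varepsilon\,(1-\pi_i)}{(1-\varepsilon)\,\pi_i + \varepsilon\,(1-\pi_i)}
```

The second expression is the probability that a sample labeled ER+ is actually ER-. Under this sketch, training samples with a high value are candidates for relabeling.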
RESULTS: A procedure was proposed for handling potential mislabeling among training samples when predicting disease subtypes from gene expression data. A simulation study based on real data was conducted on the estrogen receptor status (ER+/ER-) of breast cancer patients. When 1-4 of the N = 30 training samples were artificially mislabeled, the proposed method was able not only to correct the ER status of the mislabeled training samples but also, more importantly, to predict the ER status of validation samples as accurately as when the 'true' training data were used.
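The simulation design described above (artificially mislabel a few of N = 30 training samples, then check whether they can be recovered) can be mimicked with the sketch below. It is not the authors' method, which this abstract does not describe: it swaps in a simple leave-one-out nearest-centroid check as a stand-in detector. The function names flip_labels and loo_centroid_suspects are hypothetical, and the expression matrix is synthetic random data, not the breast cancer data used in the paper.

```python
import numpy as np


def flip_labels(y, k, rng):
    """Return a copy of the 0/1 label vector y with k randomly chosen
    entries flipped, together with the sorted indices that were flipped."""
    y = np.asarray(y).copy()
    idx = rng.choice(len(y), size=k, replace=False)
    y[idx] = 1 - y[idx]
    return y, np.sort(idx)


def loo_centroid_suspects(X, y):
    """Flag samples whose observed label disagrees with the class of the
    nearest centroid, where both class centroids are computed with that
    sample left out (a leave-one-out nearest-centroid check)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    suspects = []
    for i in range(n):
        keep = np.arange(n) != i                # leave sample i out
        c0 = X[keep & (y == 0)].mean(axis=0)    # centroid of samples labeled 0 (ER-)
        c1 = X[keep & (y == 1)].mean(axis=0)    # centroid of samples labeled 1 (ER+)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        if pred != y[i]:
            suspects.append(i)
    return np.array(suspects, dtype=int)


# Hypothetical synthetic data: 30 samples (15 "ER-", 15 "ER+"), 50 genes,
# with the "ER+" group shifted upward on the first 10 genes.
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 50))
X[y_true == 1, :10] += 4.0

# Artificially mislabel 3 of the 30 training samples, then try to find them.
y_obs, flipped = flip_labels(y_true, k=3, rng=rng)
suspects = loo_centroid_suspects(X, y_obs)
print("flipped:", flipped)
print("flagged:", suspects)
```

A nearest-centroid rule was chosen here only because it is short. It flags suspect samples, but unlike a model with an explicit mislabeling probability it gives no per-sample probability of being mislabeled and does not correct the labels before training the final classifier.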

Year:  2005        PMID: 16267079     DOI: 10.1093/bioinformatics/bti738

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  Comparison of two output-coding strategies for multi-class tumor classification using gene expression data and Latent Variable Model as binary classifier.

Authors:  Sandeep J Joseph; Kelly R Robbins; Wensheng Zhang; Romdhane Rekaya
Journal:  Cancer Inform       Date:  2010-03-10

2.  A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information.

Authors:  Wensheng Zhang; Kelly Robbins; Yupeng Wang; Keith Bertrand; Romdhane Rekaya
Journal:  BMC Genomics       Date:  2010-04-29       Impact factor: 3.969

3.  svdPPCS: an effective singular value decomposition-based method for conserved and divergent co-expression gene module identification.

Authors:  Wensheng Zhang; Andrea Edwards; Wei Fan; Dongxiao Zhu; Kun Zhang
Journal:  BMC Bioinformatics       Date:  2010-06-22       Impact factor: 3.169

4.  Constructing endophenotypes of complex diseases using non-negative matrix factorization and adjusted rand index.

Authors:  Hui-Min Wang; Ching-Lin Hsiao; Ai-Ru Hsieh; Ying-Chao Lin; Cathy S J Fann
Journal:  PLoS One       Date:  2012-07-16       Impact factor: 3.240

5.  Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes.

Authors:  Afrah Shafquat; Ronald G Crystal; Jason G Mezey
Journal:  BMC Bioinformatics       Date:  2020-05-07       Impact factor: 3.169

6.  Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

Authors:  Magalie Celton; Alain Malpertuy; Gaëlle Lelandais; Alexandre G de Brevern
Journal:  BMC Genomics       Date:  2010-01-07       Impact factor: 3.969

7.  An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

Authors:  Yuk Yee Leung; Chun Qi Chang; Yeung Sam Hung
Journal:  PLoS One       Date:  2012-10-17       Impact factor: 3.240

8.  Genome wide association studies in presence of misclassified binary responses.

Authors:  Shannon Smith; El Hamidi Hay; Nourhene Farhat; Romdhane Rekaya
Journal:  BMC Genet       Date:  2013-12-26       Impact factor: 2.797
