
Robust and accurate cancer classification with gene expression profiling.

Haifeng Li, Keshu Zhang, Tao Jiang.

Abstract

Robust and accurate cancer classification is critical in cancer treatment. Gene expression profiling is expected to enable us to diagnose tumors precisely and systematically. However, the classification task in this context is very challenging because of the curse of dimensionality and the small sample size problem. In this paper, we propose a novel method to solve these two problems. Our method maps gene expression data into a very low-dimensional space and thus meets the recommended ratio of samples to features per class. As a result, it can be used to classify new samples robustly, with low and trustworthy (estimated) error rates. The method is based on linear discriminant analysis (LDA). However, conventional LDA requires that the within-class scatter matrix S(w) be nonsingular. Unfortunately, S(w) is always singular in cancer classification because of the small sample size problem. To overcome this problem, we develop a generalized linear discriminant analysis (GLDA) that is a general, direct, and complete solution for optimizing Fisher's criterion. GLDA is mathematically well founded and coincides with conventional LDA when S(w) is nonsingular. Unlike conventional LDA, GLDA does not assume the nonsingularity of S(w) and thus naturally solves the small sample size problem. To accommodate the high dimensionality of the scatter matrices, we also develop a fast algorithm for GLDA. Our extensive experiments on seven public cancer datasets show that the method performs well. In particular, on some difficult instances with very small ratios of samples to genes per class, our method achieves much higher accuracy than widely used classification methods such as support vector machines and random forests.
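The small sample size problem can be checked numerically. S(w) sums the outer products of each sample's deviation from its class mean, and the deviations within a class sum to zero, so with n samples in c classes the rank of S(w) is at most n − c, far below the number of genes. The sketch below, written in Python with NumPy on made-up synthetic data, computes S(w) and the between-class scatter S(b), confirms that S(w) is singular, and then finds c − 1 directions maximizing the Fisher ratio wᵀS(b)w / wᵀS(w)w. To cope with the singular S(w) it uses a swapped-in technique, ridge regularization (S(w) + reg·I), which is a common workaround and NOT the GLDA described in the abstract. The function names, the `reg` value, and the data are hypothetical illustrations.

```python
import numpy as np


def scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter matrices.

    X: (n_samples, n_features) array; y: (n_samples,) class labels.
    """
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered          # spread around the class mean
        diff = (mean_c - overall_mean)[:, None]
        S_b += len(X_c) * (diff @ diff.T)     # spread of class means
    return S_w, S_b


def regularized_fisher_directions(X, y, reg=1e-3):
    """Hypothetical helper: Fisher directions with S_w replaced by S_w + reg*I.

    Ridge regularization is a swapped-in workaround for a singular S_w,
    not the paper's GLDA. Returns a (n_features, n_classes - 1) matrix.
    """
    S_w, S_b = scatter_matrices(X, y)
    n_components = len(np.unique(y)) - 1
    # Whiten with (S_w + reg*I)^(-1/2) so that the generalized eigenproblem
    # S_b w = lambda (S_w + reg*I) w becomes an ordinary symmetric one.
    d, U = np.linalg.eigh(S_w + reg * np.eye(S_w.shape[0]))
    T = U @ np.diag(d ** -0.5) @ U.T
    evals, V = np.linalg.eigh(T @ S_b @ T)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    top = V[:, ::-1][:, :n_components]
    return T @ top


# Synthetic example (hypothetical data): 3 classes x 4 samples, 100 "genes".
rng = np.random.default_rng(0)
n_per_class, n_classes, n_genes = 4, 3, 100
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(len(y), n_genes))
X[:, :5] += 3.0 * y[:, None]  # shift 5 "informative genes" by class

S_w, _ = scatter_matrices(X, y)
rank = np.linalg.matrix_rank(S_w)
print(f"rank(S_w) = {rank} of {n_genes}")  # at most n - c = 9, so S_w is singular

W = regularized_fisher_directions(X, y)
Z = X @ W  # 12 samples embedded in c - 1 = 2 dimensions
centroids = np.array([Z[y == k].mean(axis=0) for k in range(n_classes)])
pred = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(axis=2), axis=1)
print("training accuracy:", (pred == y).mean())
```

Because `reg` is tiny, the leading directions fall almost entirely where S(w) has no variance, so the training samples of each class collapse onto their class centroid in the 2-D embedding. This shows how a projection to c − 1 dimensions can satisfy a samples-to-features ratio; accuracy on new samples would still have to be estimated separately, for example by cross-validation.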

Year:  2005        PMID: 16447988     DOI: 10.1109/csb.2005.49

Source DB:  PubMed          Journal:  Proc IEEE Comput Syst Bioinform Conf        ISSN: 1551-7497


  3 in total

1.  A novel kernel Wasserstein distance on Gaussian measures: An application of identifying dental artifacts in head and neck computed tomography.

Authors:  Jung Hun Oh; Maryam Pouryahya; Aditi Iyer; Aditya P Apte; Joseph O Deasy; Allen Tannenbaum
Journal:  Comput Biol Med       Date:  2020-03-26       Impact factor: 4.589

2.  ANMM4CBR: a case-based reasoning method for gene expression data classification.

Authors:  Bangpeng Yao; Shao Li
Journal:  Algorithms Mol Biol       Date:  2010-01-06       Impact factor: 1.405

3.  Comparison of linear discriminant analysis methods for the classification of cancer based on gene expression data.

Authors:  Desheng Huang; Yu Quan; Miao He; Baosen Zhou
Journal:  J Exp Clin Cancer Res       Date:  2009-12-10