Abstract
The advantage of using DNA microarray data to investigate human cancer gene expression is its ability to generate an enormous amount of information from a single assay, which speeds up the scientific evaluation process. The large number of variables in gene expression data, coupled with a comparatively much smaller number of samples, creates new challenges for scientists and statisticians. In particular, the problems include an enormous degree of collinearity among gene expressions, likely violations of model assumptions, and a high level of noise with potential outliers. To deal with these problems, we propose a block wavelet shrinkage principal component analysis (BWSPCA) method to optimize the information retained during the noise reduction process. This paper first uses the National Cancer Institute database (NCI60) as an illustration and shows a significant improvement in dimension reduction. Second, we combine BWSPCA with an artificial neural network-based gene minimization strategy to establish a Block Wavelet-based Neural Network (BWNN) model for robust and accurate cancer classification. Our extensive experiments on six public cancer datasets show that BWNN performs well for tumor classification, especially on some difficult instances of multi-class (more than two classes) expression data. The proposed method is extremely useful for data denoising and is competitive with other methods such as BagBoost, Random Forest (RanFor), Support Vector Machines (SVM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN).
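The abstract does not state which wavelet, threshold rule, or block layout BWSPCA uses, so the following is only a rough sketch of the general "wavelet shrinkage, then PCA" idea under loudly stated assumptions: a Haar wavelet, soft thresholding with the universal threshold sigma*sqrt(2 log n) and a median-based noise estimate (VisuShrink style), contiguous blocks of genes taken in column order, and PCA by SVD of the centred matrix. All function names (`haar_dwt`, `wavelet_shrink`, `block_wavelet_shrink`, `pca_scores`) and the default `block_size`/`levels` values are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; those with |c| <= t become zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def wavelet_shrink(x, levels=3):
    """Denoise a 1-D profile: Haar DWT, soft-threshold the detail coefficients, invert.

    Assumption: len(x) is divisible by 2**levels. The noise level is estimated from
    the median absolute value of the finest details; threshold = sigma*sqrt(2 log n).
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)  # details[0] is the finest level
    sigma = np.median(np.abs(details[0])) / 0.6745
    t = sigma * np.sqrt(2.0 * np.log(len(x)))
    for d in reversed(details):  # rebuild from the coarsest level upward
        approx = haar_idwt(approx, soft_threshold(d, t))
    return approx


def block_wavelet_shrink(X, block_size=64, levels=3):
    """Apply wavelet_shrink to contiguous blocks of genes in each sample (rows = samples).

    The block layout (genes in column order, fixed block size) is a hypothetical choice.
    """
    n_samples, n_genes = X.shape
    if n_genes % block_size or block_size % 2 ** levels:
        raise ValueError("block_size must divide n_genes and be a multiple of 2**levels")
    Y = np.empty((n_samples, n_genes))
    for i in range(n_samples):
        for start in range(0, n_genes, block_size):
            Y[i, start:start + block_size] = wavelet_shrink(X[i, start:start + block_size], levels)
    return Y


def pca_scores(X, n_components=2):
    """PCA by SVD of the column-centred matrix; returns scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components], S ** 2 / np.sum(S ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 256))  # synthetic stand-in: 20 samples x 256 genes
scores, explained = pca_scores(block_wavelet_shrink(X), n_components=2)
print(scores.shape)  # (20, 2)
```

Denoising each block before the SVD is meant to keep noise from inflating the leading components; in this sketch the block size only has to be a multiple of 2**levels so that every Haar level splits evenly.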
Keywords: ANN; classification of cancer types; denoising; wavelet shrinkage
Year: 2008 PMID: 19255638 PMCID: PMC2646193 DOI: 10.6026/97320630003223
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. (a) Wavelet-shrinkage paradigm. (b) Schematic diagram of the data flow in the BWSPCA method.
Figure 2. Workflow of the BWNN analysis of microarray data.
Figure 3. Cross-validated BWSPCA results obtained with the nearest shrunken centroids method in PAMR: (a) cross-validated error curves over a range of threshold values; (b) cross-validated sample probabilities for the original BWSPCA result (threshold value 0).
Figure 4. Misclassification rates of 6 classifiers on 6 microarray datasets, based on 50 random splits into training and test sets. The bars represent pseudo-confidence intervals, showing the variability of the point estimates.
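Figure 3 relies on nearest shrunken centroids (the method implemented in the R package PAMR), and Figure 4 summarizes misclassification rates over 50 random training/test splits. The NumPy sketch below illustrates both ideas under stated assumptions: the textbook nearest-shrunken-centroid formulation (class centroids standardized by each gene's pooled within-class standard deviation plus its median s0, soft-thresholded by `delta`, and scored by a prior-adjusted standardized distance) and a hypothetical 2/3 training, 1/3 test split, since the split proportion is not given here. It is not PAMR's code, it evaluates a single classifier rather than the six compared in Figure 4, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def nsc_fit(X, y, delta):
    """Fit a nearest-shrunken-centroid model (textbook formulation, not PAMR's code).

    X: samples x genes; y: class labels; delta: shrinkage threshold (>= 0).
    """
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    n_k = np.array([np.sum(y == k) for k in classes])
    # Pooled within-class standard deviation of each gene, plus its median s0.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt(np.sum(resid ** 2, axis=0) / (n - classes.size))
    scale = s + np.median(s)
    # Standardized distance of each class centroid from the overall centroid.
    m = np.sqrt(1.0 / n_k - 1.0 / n)
    d = (centroids - overall) / (m[:, None] * scale)
    # Soft-threshold by delta, then map back to shrunken centroids.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    shrunk = overall + m[:, None] * scale * d_shrunk
    priors = n_k / n
    return classes, shrunk, scale, priors


def nsc_predict(model, X):
    """Assign each sample to the class with the smallest prior-adjusted standardized distance."""
    classes, shrunk, scale, priors = model
    dist = np.sum(((X[:, None, :] - shrunk[None, :, :]) / scale) ** 2, axis=2)
    return classes[np.argmin(dist - 2.0 * np.log(priors), axis=1)]


def random_split_error(X, y, delta, n_splits=50, test_frac=1.0 / 3.0, seed=0):
    """Mean and standard deviation of the test misclassification rate over random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * y.size))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(y.size)
        test, train = perm[:n_test], perm[n_test:]
        model = nsc_fit(X[train], y[train], delta)
        errors.append(np.mean(nsc_predict(model, X[test]) != y[test]))
    return float(np.mean(errors)), float(np.std(errors))


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 100))  # synthetic: 60 samples x 100 genes
y = np.repeat([0, 1, 2], 20)    # three classes of 20 samples each
X[y == 1, :10] += 3.0           # class 1 differs on genes 0-9
X[y == 2, 10:20] += 3.0         # class 2 differs on genes 10-19
for delta in (0.0, 1.0, 2.0):
    err, sd = random_split_error(X, y, delta)
    print(f"delta={delta}: mean test error {err:.3f} (sd {sd:.3f})")
```

Raising `delta` sets more standardized centroid differences to zero, so fewer genes influence the classification; sweeping it in this way gives the kind of error-versus-threshold curve shown in Figure 3(a).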