Ming-Gang Du, Shan-Wen Zhang, Hong Wang.
Abstract
Motivation. Independent component analysis (ICA) maximizes the statistical independence of the representational components of a training ensemble of gene expression profiles (GEP), but it cannot distinguish between the different factors, or modes, underlying the data, and it is not applicable to high-order GEP data mining. To generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. First, we introduce the basic concepts and operations of tensors and describe the Support Vector Machine (SVM) classifier and Multilinear-ICA. Second, the highest-scoring genes of the original high-order GEP are selected using t-statistics and arranged into tensors. Third, the tensors are decomposed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes.

Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Although we use only three datasets, the experimental results show that the method is effective and feasible. Through this study, we hope to gain some insight into the problem of high-order GEP tumor classification, to aid the development of more effective tumor classification algorithms.
Year: 2009; PMID: 19956422; PMCID: PMC2778791; DOI: 10.1155/2009/926450
Source DB: PubMed; Journal: Adv Bioinformatics; ISSN: 1687-8027
Figure 1. Third-order tensor.
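To make the tensor operations mentioned in the abstract concrete, here is a minimal NumPy sketch of a third-order tensor, its mode-n unfolding, and the mode-n product. The toy shape (4 genes, 3 samples, 2 conditions), the meaning assigned to each mode, and the helper names `unfold` and `mode_product` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns run over all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply every mode-`mode` fiber of `tensor` by `matrix`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

# Hypothetical toy tensor: 4 genes x 3 samples x 2 conditions.
A = np.arange(24, dtype=float).reshape(4, 3, 2)

A_genes = unfold(A, 0)    # shape (4, 6): one row per gene
A_samples = unfold(A, 1)  # shape (3, 8): one row per sample

# Map the 3-dimensional sample mode to 5 dimensions with an all-ones matrix.
B = mode_product(A, np.ones((5, 3)), 1)  # shape (4, 5, 2)
```

The two helpers are consistent with each other: unfolding a mode-n product along the same mode gives the matrix times the unfolded tensor, which is the identity that tensor decompositions rely on.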
Description of the three original tumor datasets.
| Tumor dataset | # Genes | # Samples | Subtype 1 | Subtype 2 |
|---|---|---|---|---|
| Leukemia dataset 1 | 7,129 | 38 | 27 (ALL) | 11 (AML) |
| Leukemia dataset 2 | 7,129 | 72 | 47 (ALL) | 25 (AML) |
| Lung dataset 3 | 12,533 | 181 | 32 | 149 |
Figure 2. The gene distribution frequency versus gene S-values.
Sample distribution of the two leukemia datasets used in our experiments.
| Tumor dataset | Selected genes | Selected samples | Training samples | Testing samples |
|---|---|---|---|---|
| Leukemia dataset 1 | 200 | 38 | 19 | 19 |
| Leukemia dataset 2 | 200 | 38 | 19 | 19 |
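The gene-selection step (ranking genes by a t-statistic and keeping the top scorers) can be sketched as follows. The record does not say which t-statistic variant was used, so this sketch assumes Welch's two-sample form; the function names, the synthetic data, and the planted informative genes are all hypothetical. Only the class sizes (27 and 11) are borrowed from Leukemia dataset 1.

```python
import numpy as np

def t_scores(X, y):
    """Absolute Welch t-statistic per gene.

    X: (n_genes, n_samples) expression matrix.
    y: array of 0/1 class labels, one per sample.
    """
    a, b = X[:, y == 0], X[:, y == 1]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    std_err = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                      + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(mean_diff) / std_err

def select_top_genes(X, y, k):
    """Indices of the k genes with the largest absolute t-statistic."""
    return np.argsort(t_scores(X, y))[::-1][:k]

# Synthetic stand-in data: 1000 genes, 38 samples (27 vs 11).
rng = np.random.default_rng(0)
y = np.array([0] * 27 + [1] * 11)
X = rng.normal(size=(1000, 38))
X[:5, y == 1] += 5.0  # plant five strongly informative genes

top = select_top_genes(X, y, k=5)
```

With real data, `k=200` would correspond to the 200 selected genes listed in the table above.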
Figure 3. Training tensor A_tn.
Figure 4. Training core tensor S_tn.
Figure 6. Training tensor A_tn.
Figure 7. Training core tensor S_tn.
Figure 8. Three matrices U_1, U_2, and U_3.
Figure 9. MICA for three-order lung microarray.
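The figure captions refer to a core tensor and three mode matrices. As a rough illustration of how such quantities can arise, the sketch below computes a higher-order SVD (HOSVD) of a third-order tensor with NumPy. This is an assumption-laden stand-in rather than the authors' Multilinear-ICA: multilinear ICA is commonly described as building on a decomposition of this kind and then rotating the mode matrices toward statistical independence, and that ICA step is omitted here. All function names and the random test tensor are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns run over all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product of `tensor` with `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)

def hosvd(tensor):
    """Return (core, factors) such that the tensor equals the core
    multiplied along each mode n by factors[n]."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's matrix.
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core tensor by every mode matrix to rebuild the tensor."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out

# Hypothetical training tensor with three modes.
rng = np.random.default_rng(1)
A_tn = rng.normal(size=(6, 5, 4))
S_tn, (U1, U2, U3) = hosvd(A_tn)
A_rebuilt = reconstruct(S_tn, [U1, U2, U3])
```

Each mode matrix has orthonormal columns, and multiplying the core tensor back by the three matrices recovers the original tensor up to floating-point error.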
Classification results (accuracy, %) on three tumor datasets by leave-one-out cross-validation (LOO-CV).
| Method | Leukemia dataset 1 | Leukemia dataset 2 | Leukemia dataset 1 + dataset 2 | Lung dataset 3 |
|---|---|---|---|---|
| SVM | 95.76 | 94.32 | 90.75 | 88.69 |
| ICA + SVM | 99.15 | 99.45 | 95.48 | 90.44 |
| Multilinear ICA + SVM | — | — | 99.80 | 90.26 |
| 99.54 | ||||
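For readers unfamiliar with leave-one-out cross-validation, the evaluation loop behind accuracy figures of this kind can be sketched as below. To keep the example dependent on NumPy only, a nearest-centroid rule stands in for the SVM; that substitution, the function names, and the synthetic two-class data are assumptions made for illustration and do not reproduce the reported numbers.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    """Stand-in classifier: assign x_test to the class with the closest mean."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    distances = np.linalg.norm(centroids - x_test, axis=1)
    return labels[np.argmin(distances)]

def loo_accuracy(X, y, predict_one):
    """Leave-one-out accuracy in percent.

    Each sample is held out once; the classifier is fitted on the
    remaining samples and asked to predict the held-out label.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        prediction = predict_one(X[keep], y[keep], X[i])
        correct += int(prediction == y[i])
    return 100.0 * correct / n

# Synthetic, well-separated two-class data: 19 samples per class, 5 features.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(19, 5)),
               rng.normal(10.0, 1.0, size=(19, 5))])
y = np.array([0] * 19 + [1] * 19)

acc = loo_accuracy(X, y, nearest_centroid_predict)
```

An SVM can be plugged in by passing a different `predict_one` callable, for example one that fits scikit-learn's `SVC` on the training fold and predicts the held-out sample.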