Jiucheng Xu, Huiyu Mu, Yun Wang, Fangzhou Huang.
Abstract
Selecting feature genes with high recognition ability from gene expression profiles has gained great significance in biology. However, most existing methods have high time complexity and poor classification performance. Motivated by this, an effective feature selection method called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2) is proposed, based on the locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes class label information into account and improves classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove coexpressed genes. Experimental results on four public tumor microarray datasets illustrate that our method is valid and feasible.
Year: 2018 PMID: 29666661 PMCID: PMC5831962 DOI: 10.1155/2018/5490513
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Example sample X.
| Sample | | |
|---|---|---|
| | 0.7 | 0.9 |
| | 0.3 | 0.3 |
| | 0.5 | 0.4 |
| | 0.2 | 0.1 |
| | 0.8 | 0.7 |
The rank sequences R and S.
| Sample | | R | | S |
|---|---|---|---|---|
| | 0.7 | 4 | 0.9 | 5 |
| | 0.3 | 2 | 0.3 | 2 |
| | 0.5 | 3 | 0.4 | 3 |
| | 0.2 | 1 | 0.1 | 1 |
| | 0.8 | 5 | 0.7 | 4 |
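The ranks in the table above can be reproduced, and Spearman's rank correlation coefficient computed from them, with a short script. This is a minimal sketch assuming no tied values, in which case the coefficient reduces to 1 - 6·Σd² / (n(n² - 1)), where d is the difference between paired ranks. The function names and the variable names `first` and `second` are illustrative, not taken from the paper.

```python
def ranks(values):
    """Return 1-based ascending ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    d_squared = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# The two value columns of the example table.
first = [0.7, 0.3, 0.5, 0.2, 0.8]
second = [0.9, 0.3, 0.4, 0.1, 0.7]

print(ranks(first))   # [4, 2, 3, 1, 5]
print(ranks(second))  # [5, 2, 3, 1, 4]
print(spearman_no_ties(first, second))  # approximately 0.9
```

Only the first and last rows differ in rank, each by one position, so Σd² = 2 and the coefficient is 1 - 12/120 = 0.9.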
Algorithm 1: Supervised locally linear embedding method description.
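The abstract notes that supervised locally linear embedding takes class label information into account. One common way to do this, shown here as a minimal sketch and not necessarily the formulation used in Algorithm 1, is to inflate the distance between samples of different classes before the neighbor search: D'[i][j] = D[i][j] + alpha · max(D) whenever the labels of i and j differ. The function names, the parameter `alpha`, and the toy points are all assumptions made for illustration.

```python
import math


def supervised_distances(points, labels, alpha):
    """Pairwise Euclidean distances, inflated for pairs with different labels.

    Assumed rule: D'[i][j] = D[i][j] + alpha * max(D) if labels differ,
    with alpha in [0, 1]; alpha = 0 gives the unsupervised distances.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist)
    return [
        [dist[i][j] + (alpha * d_max if labels[i] != labels[j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def nearest_neighbors(dist, i, k):
    """Indices of the k nearest neighbors of sample i, excluding i itself."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]


# Toy data (hypothetical): four 2-D samples, the third from another class.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
labels = ["A", "A", "B", "A"]

print(nearest_neighbors(supervised_distances(points, labels, 0.0), 0, 2))  # [2, 1]
print(nearest_neighbors(supervised_distances(points, labels, 1.0), 0, 2))  # [1, 3]
```

With `alpha = 0` the closest sample is chosen regardless of class. With `alpha = 1` the neighborhood of sample 0 contains only samples of its own class, which is the effect the label information is meant to have.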
Algorithm 2: Spearman's rank correlation coefficient method description.
Algorithm 3: SLLE-SC2 method description.
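The abstract states that Spearman's rank correlation coefficient is used to remove coexpressed genes, and Figure 2 refers to a threshold λ. The sketch below shows one plausible way such a filter could work: a greedy pass that keeps a gene only if its absolute correlation with every gene already kept does not exceed λ. The greedy strategy, the function names, the parameter `lam`, and the toy expression profiles are assumptions, not the paper's Algorithm 2 or 3.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank sequences.

    Raises ZeroDivisionError if either profile is constant.
    """
    r, s = average_ranks(x), average_ranks(y)
    n = len(r)
    mean_r, mean_s = sum(r) / n, sum(s) / n
    cov = sum((a - mean_r) * (b - mean_s) for a, b in zip(r, s))
    var_r = sum((a - mean_r) ** 2 for a in r)
    var_s = sum((b - mean_s) ** 2 for b in s)
    return cov / (var_r * var_s) ** 0.5


def drop_coexpressed(genes, lam):
    """Greedy filter: keep a gene only if |rho| with every kept gene is <= lam.

    `genes` is a list of (name, profile) pairs in priority order.
    """
    kept = []
    for name, profile in genes:
        if all(abs(spearman(profile, other)) <= lam for _, other in kept):
            kept.append((name, profile))
    return [name for name, _ in kept]


# Hypothetical profiles; g1 and g2 are strongly rank-correlated (rho = 0.9).
genes = [
    ("g1", [0.7, 0.3, 0.5, 0.2, 0.8]),
    ("g2", [0.9, 0.3, 0.4, 0.1, 0.7]),
    ("g3", [0.2, 0.9, 0.1, 0.6, 0.4]),
]
print(drop_coexpressed(genes, lam=0.8))  # ['g1', 'g3']
```

Because the pass is order-dependent, the genes would need to be sorted by some relevance score beforehand. A lower `lam` removes more genes, while a value near 1 keeps almost all of them.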
Experiment dataset.
| Dataset | Number of features | Classes | Number of instances |
|---|---|---|---|
| Leukemia | 7129 | ALL (47), AML (25) | 72 |
| Colon | 2000 | Tumor (40), normal (22) | 62 |
| Lung | 12600 | Tumor (186), normal (17) | 203 |
| Prostate | 12600 | Tumor (52), normal (50) | 102 |
Figure 1: Pareto diagram of the principal components explained variance.
Figure 2: Classification accuracies with threshold λ.
The results of various performance metrics.
| Dataset | Acc | TPR | TNR | | | AUC |
|---|---|---|---|---|---|---|
| Leukemia | 0.997 | 0.86 | 0.882 | 0.909 | 0.895 | 0.914 |
| Colon | 0.948 | 0.89 | 0.877 | 0.85 | 0.911 | 0.864 |
| Lung | 0.942 | 0.793 | 0.827 | 0.842 | 0.837 | 0.858 |
| Prostate | 0.968 | 0.863 | 0.873 | 0.858 | 0.848 | 0.904 |
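For reference, the Acc, TPR, and TNR columns follow the standard definitions based on a binary confusion matrix: accuracy is the fraction of all samples classified correctly, TPR is the fraction of positives recovered, and TNR is the fraction of negatives recovered. The sketch below uses hypothetical counts chosen only to illustrate the formulas; they are not taken from the paper's experiments.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, true positive rate and true negative rate from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "Acc": (tp + tn) / total,
        "TPR": tp / (tp + fn),   # sensitivity / recall
        "TNR": tn / (tn + fp),   # specificity
    }


# Hypothetical counts: 50 positive and 50 negative samples.
metrics = binary_metrics(tp=45, fn=5, tn=40, fp=10)
print(metrics)  # {'Acc': 0.85, 'TPR': 0.9, 'TNR': 0.8}
```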
Classification performance of leukemia data.
| Classifiers | SLLE-SC2 | LE | LLE | SLLE | SC2 |
|---|---|---|---|---|---|
| SVM | 99.7 | 85.9 | 92.3 | 97.4 | 85.2 |
| C4.5 | 97.4 | 84.6 | 87.5 | 93.2 | 81.1 |
| Naive Bayes | 98.8 | 79.7 | 82.7 | 99.1 | 74.4 |
| | 100 | 93.2 | 92.3 | 98.8 | 83.6 |
Classification performance of colon data.
| Classifiers | SLLE-SC2 | LE | LLE | SLLE | SC2 |
|---|---|---|---|---|---|
| SVM | 94.8 | 81.2 | 89.1 | 91.9 | 80.5 |
| C4.5 | 93.1 | 83.3 | 87.5 | 92.6 | 77.2 |
| Naive Bayes | 92.7 | 95.6 | 85.7 | 89.6 | 73.4 |
| | 94.6 | 79.3 | 89.3 | 92.7 | 78.7 |
Classification performance of lung data.
| Classifiers | SLLE-SC2 | LE | LLE | SLLE | SC2 |
|---|---|---|---|---|---|
| SVM | 94.2 | 80.5 | 87.1 | 91.6 | 80.6 |
| C4.5 | 92.7 | 79.2 | 87.5 | 92.3 | 79.1 |
| Naive Bayes | 94.8 | 78.1 | 90.7 | 94.7 | 80.5 |
| | 89.9 | 81.4 | 87.3 | 89.6 | 75.8 |
Classification performance of prostate data.
| Classifiers | SLLE-SC2 | LE | LLE | SLLE | SC2 |
|---|---|---|---|---|---|
| SVM | 97.9 | 85.5 | 88.2 | 96.9 | 79.5 |
| C4.5 | 95.4 | 81.3 | 90.7 | 95.3 | 81.1 |
| Naive Bayes | 94.8 | 79.1 | 86.7 | 89.9 | 73.7 |
| | 96.8 | 82.9 | 87.3 | 97.8 | 74.8 |
The number of feature genes and classification results.
| Method | Leukemia | Colon | Lung | Prostate |
|---|---|---|---|---|
| IGA-FBFE | 94.20 (35) | 90.09 (30) | 91.23 (80) | 88.12 (50) |
| BQPSO | 100 (7) | 92.52 (11) | 99.96 (9) | 99.25 (10) |
| CAGC | 95.3 (866) | 91.9 (135) | — | 68.9 (3071) |
| ILASSO | 98.61 (14) | 90.32 (4) | 100 (7) | 96.08 (9) |
| RT-PLSDA | 94.12 (9) | — | 97.99 (4) | 91.18 (18) |
| MAHP | 92.78 (5) | 83.47 (5) | 88.77 (5) | — |
| SU | 100 (6) | 83.87 (4) | 100 (3) | 93.14 (4) |
| DRF0-CFS | 91.18 (13) | 90.0 (10) | 98.66 (17) | 85.29 (113) |
| IG-SGA | 97.06 (3) | 85.48 (60) | — | 100 (26) |
| SLLE-SC2 | 99.7 (5) | 95.4 (4) | 94.8 (3) | 97.3 (5) |
Biological significance of leukemia data.
| Index | Gene selection | Description |
|---|---|---|
| 1834 | M23197 | CD33 antigen (differentiation antigen) |
| 1882 | M27891 | CST3 cystatin C |
| 3847 | U82759 | GB DEF = homeodomain protein HoxA9 mRNA |
| 4847 | X95735 | Zyxin |
| 6041 | L09209 | APLP2 |
Note. Index denotes the serial number of the selected genes in the original data.
Biological significance of colon data.
| Index | Gene selection | Description |
|---|---|---|
| 792 | R88740 | ATP synthase coupling factor 6, mitochondrial precursor |
| 1346 | T62947 | 60S ribosomal protein L24 ( |
| 1400 | M59040 | Human cell adhesion molecule (CD44) mRNA |
| 1772 | H08393 | Collagen alpha 2(XI) chain ( |
Note. Index denotes the serial number of the selected genes in the original data.
Biological significance of lung data.
| Index | Gene selection | Description |
|---|---|---|
| 4336 | AL050224 | |
| 7765 | X05323 | Human MOX2 gene for OX-2 membrane glycoprotein, exon 1, and joined CDS |
| 8537 | AJ011497 | |
Note. Index denotes the serial number of the selected genes in the original data.
Biological significance of prostate data.
| Index | Gene selection | Description |
|---|---|---|
| 5890 | AJ001625 | |
| 6462 | M11433 | Human cellular retinol-binding protein mRNA, complete cds |
| 9172 | AI207842 | Ao89h09.x1 |
| 9850 | M84526 | Human adipsin/complement factor D mRNA, complete cds |
| 12495 | M98539 | Human prostaglandin D2 synthase gene, exon 7 |
Note. Index denotes the serial number of the selected genes in the original data.