Xue-Qiang Zeng, Guo-Zheng Li, Jack Y Yang, Mary Qu Yang, Geng-Feng Wu.
Abstract
BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because a data set contains only a few observations but thousands of measured genes. Dimension reduction is often used to handle such high-dimensional problems, but its effectiveness is hampered by the large number of redundant features in microarray data sets.
Year: 2008 PMID: 18541061 PMCID: PMC2423430 DOI: 10.1186/1471-2105-9-S6-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The number of genes selected by REDISC and RELIC with different parameters.
Statistical results of performing PLS after REDISC and RELIC with different parameters on the Colon data set
| Method (parameter) | #genes | Sensitivity | Specificity | BACC | Precision | PPV | NPV | Correction |
| RELIC 0.1 | 5.62 | 0.9750 | 0.3167 | 0.6458 | 0.7362 | 0.7362 | 0.5217 | 0.7424 |
| RELIC 0.2 | 8.56 | 0.9600 | 0.4167 | 0.6883 | 0.7646 | 0.7646 | 0.6400 | 0.7667 |
| RELIC 0.3 | 16.55 | 0.9350 | 0.5567 | 0.7458 | 0.8131 | 0.8131 | 0.7258 | 0.7995 |
| RELIC 0.4 | 28.08 | 0.9150 | 0.6650 | 0.7900 | 0.8468 | 0.8468 | 0.8208 | 0.8264 |
| RELIC 0.5 | 49.55 | 0.9100 | 0.7017 | 0.8058 | 0.8633 | 0.8633 | 0.8092 | 0.8352 |
| RELIC 0.6 | 94.3 | 0.8975 | 0.7133 | 0.8054 | 0.8682 | 0.8682 | 0.7992 | 0.8329 |
| RELIC 0.7 | 218.37 | 0.8950 | 0.7650 | 0.8300 | 0.8955 | 0.8955 | 0.8075 | 0.8490 |
| RELIC 0.8 | 542.46 | 0.8825 | 0.7917 | 0.8371 | 0.9003 | 0.9003 | 0.8187 | 0.8500 |
| RELIC 0.9 | 1413.92 | 0.8750 | 0.7967 | 0.8358 | 0.9080 | 0.9080 | 0.8110 | 0.8479 |
| REDISC 0.1 | 2 | 1.0000 | 0.1850 | 0.5925 | 0.6996 | 0.6996 | 0.3600 | 0.7121 |
| REDISC 0.2 | 2.1 | 0.9850 | 0.2083 | 0.5967 | 0.7022 | 0.7022 | 0.3950 | 0.7107 |
| REDISC 0.3 | 2.94 | 0.9800 | 0.2983 | 0.6392 | 0.7315 | 0.7315 | 0.4883 | 0.7383 |
| REDISC 0.4 | 4.3 | 0.9750 | 0.3750 | 0.6750 | 0.7549 | 0.7549 | 0.6050 | 0.7617 |
| REDISC 0.5 | 8.09 | 0.9200 | 0.5133 | 0.7167 | 0.7911 | 0.7911 | 0.7133 | 0.7752 |
| REDISC 0.6 | 15.82 | 0.8975 | 0.6533 | 0.7754 | 0.8443 | 0.8443 | 0.7525 | 0.8112 |
| REDISC 0.7 | 41.9 | 0.9000 | 0.7800 | 0.8400 | 0.8992 | 0.8992 | 0.8208 | 0.8579 |
| REDISC 0.8 | 157.06 | 0.8950 | 0.8150 | 0.8550 | 0.9150 | 0.9150 | 0.8350 | 0.8662 |
| REDISC 0.9 | 558 | 0.8900 | 0.7900 | 0.8400 | 0.8985 | 0.8985 | 0.8277 | 0.8533 |
| Full Set | 2000 | 0.8750 | 0.7733 | 0.8242 | 0.8958 | 0.8958 | 0.8137 | 0.8388 |
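The columns in the table above can be related to one another with a short sketch. It assumes the standard confusion-matrix definitions of sensitivity, specificity, PPV, and NPV, and takes BACC to be the mean of sensitivity and specificity, which is consistent with the tabulated rows. The `Confusion` container, the function names, and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # positive examples predicted positive
    fn: int  # positive examples predicted negative
    fp: int  # negative examples predicted positive
    tn: int  # negative examples predicted negative


def sensitivity(c: Confusion) -> float:
    """Fraction of positive examples that are recognised."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of negative examples that are recognised."""
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    """Positive predictive value (also called precision)."""
    return c.tp / (c.tp + c.fp)


def npv(c: Confusion) -> float:
    """Negative predictive value."""
    return c.tn / (c.tn + c.fn)


def balanced_accuracy(sens: float, spec: float) -> float:
    """BACC, assumed here to be the mean of sensitivity and specificity."""
    return (sens + spec) / 2


# Illustrative counts only (assumption): 40 positive and 22 negative examples.
example = Confusion(tp=36, fn=4, fp=6, tn=16)
print(sensitivity(example), specificity(example), ppv(example), npv(example))

# Consistency check against the "RELIC 0.1" row: sensitivity 0.9750 and
# specificity 0.3167 should reproduce the tabulated BACC of 0.6458.
print(balanced_accuracy(0.9750, 0.3167))
```

Because BACC weights both classes equally, it does not reward a classifier that simply favours the larger class. This is why rows with very high sensitivity but low specificity still score poorly on BACC.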
Statistical results of performing PCA after REDISC and RELIC with different parameters on the Leukemia data set
| Method (parameter) | #genes | Sensitivity | Specificity | BACC | Precision | PPV | NPV | Correction |
| RELIC 0.1 | 18.1 | 0.9395 | 0.7433 | 0.8414 | 0.8818 | 0.8818 | 0.8758 | 0.8702 |
| RELIC 0.2 | 55.86 | 0.9115 | 0.7783 | 0.8449 | 0.9024 | 0.9024 | 0.8007 | 0.8646 |
| RELIC 0.3 | 205.48 | 0.9440 | 0.7700 | 0.8570 | 0.8960 | 0.8960 | 0.8692 | 0.8827 |
| RELIC 0.4 | 790.41 | 0.9585 | 0.7983 | 0.8784 | 0.9105 | 0.9105 | 0.8975 | 0.9027 |
| RELIC 0.5 | 2168.33 | 0.9735 | 0.8783 | 0.9259 | 0.9441 | 0.9441 | 0.9617 | 0.9405 |
| RELIC 0.6 | 3859.52 | 0.9865 | 0.9400 | 0.9632 | 0.9716 | 0.9716 | 0.9825 | 0.9700 |
| RELIC 0.7 | 5394.2 | 0.9910 | 0.9567 | 0.9738 | 0.9786 | 0.9786 | 0.9883 | 0.9784 |
| RELIC 0.8 | 6545.28 | 0.9910 | 0.9633 | 0.9772 | 0.9815 | 0.9815 | 0.9875 | 0.9807 |
| RELIC 0.9 | 7035.99 | 0.9955 | 0.9633 | 0.9794 | 0.9815 | 0.9815 | 0.9933 | 0.9836 |
| REDISC 0.1 | 3.31 | 0.9865 | 0.6267 | 0.8066 | 0.8380 | 0.8380 | 0.9167 | 0.8579 |
| REDISC 0.2 | 4.29 | 0.9805 | 0.6583 | 0.8194 | 0.8550 | 0.8550 | 0.8858 | 0.8671 |
| REDISC 0.3 | 6.88 | 0.9695 | 0.7733 | 0.8714 | 0.9013 | 0.9013 | 0.9108 | 0.9000 |
| REDISC 0.4 | 14.83 | 0.9655 | 0.8700 | 0.9178 | 0.9385 | 0.9385 | 0.9508 | 0.9304 |
| REDISC 0.5 | 46.56 | 0.9735 | 0.9417 | 0.9576 | 0.9715 | 0.9715 | 0.9642 | 0.9614 |
| REDISC 0.6 | 131.23 | 0.9690 | 0.9467 | 0.9578 | 0.9737 | 0.9737 | 0.9592 | 0.9611 |
| REDISC 0.7 | 531.33 | 0.9795 | 0.9517 | 0.9656 | 0.9765 | 0.9765 | 0.9742 | 0.9698 |
| REDISC 0.8 | 2239.81 | 0.9880 | 0.9600 | 0.9740 | 0.9798 | 0.9798 | 0.9867 | 0.9782 |
| REDISC 0.9 | 4195.03 | 0.9980 | 0.9550 | 0.9765 | 0.9787 | 0.9787 | 0.9967 | 0.9823 |
| Full Set | 7129 | 0.9955 | 0.9633 | 0.9794 | 0.9815 | 0.9815 | 0.9933 | 0.9836 |
Figure 2. Comparative results of BACC scores by using different algorithms on the Colon data set.
Figure 3. Comparative results of BACC scores by using different algorithms on the Leukemia data set.
Figure 4. The novel framework of dimension reduction.
Statistical relative classification results of two classifiers
| | true | false |
| true | a | b |
| false | c | d |
Figure 5. The REDISC algorithm.
Figure 6. The RELIC algorithm.
Experimental data sets
| Data sets | Number of examples | Class ratio | Number of genes |
| Colon | 62 | 22/40 | 2,000 |
| Leukemia | 72 | 25/47 | 7,129 |
Figure 7. Experimental procedure for comparing different algorithms.
Statistical results of performing PCA after REDISC and RELIC with different parameters on the Colon data set
| Method (parameter) | #genes | Sensitivity | Specificity | BACC | Precision | PPV | NPV | Correction |
| RELIC 0.1 | 5.62 | 0.9750 | 0.2817 | 0.6283 | 0.7231 | 0.7231 | 0.4800 | 0.7298 |
| RELIC 0.2 | 8.56 | 0.9625 | 0.3917 | 0.6771 | 0.7546 | 0.7546 | 0.6333 | 0.7605 |
| RELIC 0.3 | 16.55 | 0.9300 | 0.5483 | 0.7392 | 0.8075 | 0.8075 | 0.7267 | 0.7931 |
| RELIC 0.4 | 28.08 | 0.9175 | 0.6650 | 0.7912 | 0.8480 | 0.8480 | 0.8142 | 0.8279 |
| RELIC 0.5 | 49.55 | 0.9075 | 0.7033 | 0.8054 | 0.8645 | 0.8645 | 0.8075 | 0.8338 |
| RELIC 0.6 | 94.3 | 0.9125 | 0.7250 | 0.8188 | 0.8743 | 0.8743 | 0.8208 | 0.8455 |
| RELIC 0.7 | 218.37 | 0.9025 | 0.7917 | 0.8471 | 0.9053 | 0.9053 | 0.8483 | 0.8631 |
| RELIC 0.8 | 542.46 | 0.8725 | 0.7883 | 0.8304 | 0.8987 | 0.8987 | 0.8102 | 0.8424 |
| RELIC 0.9 | 1413.92 | 0.9000 | 0.7400 | 0.8200 | 0.8802 | 0.8802 | 0.8175 | 0.8426 |
| REDISC 0.1 | 2 | 0.9975 | 0.0150 | 0.5063 | 0.6516 | 0.6516 | 0.0200 | 0.6510 |
| REDISC 0.2 | 2.1 | 0.9875 | 0.0550 | 0.5213 | 0.6599 | 0.6599 | 0.1000 | 0.6590 |
| REDISC 0.3 | 2.94 | 0.9800 | 0.2167 | 0.5983 | 0.7039 | 0.7039 | 0.3950 | 0.7095 |
| REDISC 0.4 | 4.3 | 0.9675 | 0.3350 | 0.6513 | 0.7396 | 0.7396 | 0.5483 | 0.7424 |
| REDISC 0.5 | 8.09 | 0.9175 | 0.5017 | 0.7096 | 0.7875 | 0.7875 | 0.6833 | 0.7690 |
| REDISC 0.6 | 15.82 | 0.8975 | 0.6533 | 0.7754 | 0.8443 | 0.8443 | 0.7525 | 0.8112 |
| REDISC 0.7 | 41.9 | 0.9000 | 0.7750 | 0.8375 | 0.8978 | 0.8978 | 0.8108 | 0.8562 |
| REDISC 0.8 | 157.06 | 0.8900 | 0.8117 | 0.8508 | 0.9142 | 0.9142 | 0.8300 | 0.8617 |
| REDISC 0.9 | 558 | 0.8750 | 0.7483 | 0.8117 | 0.8828 | 0.8828 | 0.7892 | 0.8298 |
| Full Set | 2000 | 0.8925 | 0.7150 | 0.8038 | 0.8680 | 0.8680 | 0.7950 | 0.8290 |
Statistical results of performing PLS after REDISC and RELIC with different parameters on the Leukemia data set
| Method (parameter) | #genes | Sensitivity | Specificity | BACC | Precision | PPV | NPV | Correction |
| RELIC 0.1 | 18.1 | 0.9400 | 0.7567 | 0.8483 | 0.8872 | 0.8872 | 0.8898 | 0.8743 |
| RELIC 0.2 | 55.86 | 0.9110 | 0.7850 | 0.8480 | 0.9024 | 0.9024 | 0.8318 | 0.8663 |
| RELIC 0.3 | 205.48 | 0.9415 | 0.8150 | 0.8783 | 0.9161 | 0.9161 | 0.8842 | 0.8964 |
| RELIC 0.4 | 790.41 | 0.9605 | 0.8133 | 0.8869 | 0.9141 | 0.9141 | 0.9200 | 0.9084 |
| RELIC 0.5 | 2168.33 | 0.9720 | 0.9017 | 0.9368 | 0.9537 | 0.9537 | 0.9575 | 0.9470 |
| RELIC 0.6 | 3859.52 | 0.9795 | 0.9550 | 0.9672 | 0.9782 | 0.9782 | 0.9742 | 0.9711 |
| RELIC 0.7 | 5394.2 | 0.9795 | 0.9500 | 0.9647 | 0.9752 | 0.9752 | 0.9742 | 0.9686 |
| RELIC 0.8 | 6545.28 | 0.9815 | 0.9150 | 0.9483 | 0.9585 | 0.9585 | 0.9750 | 0.9573 |
| RELIC 0.9 | 7035.99 | 0.9840 | 0.9067 | 0.9453 | 0.9572 | 0.9572 | 0.9775 | 0.9571 |
| REDISC 0.1 | 3.31 | 0.9865 | 0.6317 | 0.8091 | 0.8394 | 0.8394 | 0.9267 | 0.8595 |
| REDISC 0.2 | 4.29 | 0.9805 | 0.6683 | 0.8244 | 0.8566 | 0.8566 | 0.9050 | 0.8700 |
| REDISC 0.3 | 6.88 | 0.9695 | 0.7833 | 0.8764 | 0.9052 | 0.9052 | 0.9208 | 0.9041 |
| REDISC 0.4 | 14.83 | 0.9635 | 0.8783 | 0.9209 | 0.9418 | 0.9418 | 0.9475 | 0.9316 |
| REDISC 0.5 | 46.56 | 0.9710 | 0.9533 | 0.9622 | 0.9760 | 0.9760 | 0.9625 | 0.9641 |
| REDISC 0.6 | 131.23 | 0.9665 | 0.9633 | 0.9649 | 0.9810 | 0.9810 | 0.9567 | 0.9654 |
| REDISC 0.7 | 531.33 | 0.9775 | 0.9450 | 0.9612 | 0.9723 | 0.9723 | 0.9700 | 0.9657 |
| REDISC 0.8 | 2239.81 | 0.9855 | 0.9383 | 0.9619 | 0.9702 | 0.9702 | 0.9842 | 0.9686 |
| REDISC 0.9 | 4195.03 | 0.9885 | 0.9117 | 0.9501 | 0.9600 | 0.9600 | 0.9858 | 0.9613 |
| Full Set | 7129 | 0.9840 | 0.9083 | 0.9462 | 0.9572 | 0.9572 | 0.9775 | 0.9573 |