Kun-Hong Liu, Muchenxuan Tong, Shu-Tong Xie, Vincent To Yee Ng.
Abstract
Machine learning techniques are increasingly applied to microarray data analysis. This study proposes a new ensemble system based on genetic programming (GP), named GPES, for classifying different types of cancer. Decision trees serve as the base classifiers in this ensemble framework, together with three operators: Min, Max, and Average. Each GP individual is an ensemble system, and the individuals become more accurate over the course of the evolutionary process. Feature selection and balanced subsampling are applied to increase the diversity within each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. GPES is evaluated on five binary-class and six multiclass microarray datasets, and the results show that it outperforms several other ensemble systems in most cases. Its performance may be further improved by using more elaborate base classifiers or other sampling techniques.
Year: 2015 PMID: 25810748 PMCID: PMC4355811 DOI: 10.1155/2015/193406
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1: A simple syntax tree for arithmetic.
Figure 2: An example of the individual of GP in the proposed algorithm.
Figure 3: Decompose GPES into different phases.
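
The abstract describes each GP individual as an ensemble built from decision trees and the operators Min, Max, and Average. As a rough illustration only, the sketch below evaluates such a tree for a single sample. The node encoding (integer leaves, `(operator, children)` tuples), the example scores, and the 0.5 decision threshold are assumptions made for this sketch and are not taken from the paper.

```python
from statistics import mean

# Assumed encoding: a leaf is an int index into the list of base-classifier
# scores; an internal node is a tuple (operator_name, list_of_children).
OPERATORS = {"min": min, "max": max, "average": mean}

def evaluate(node, leaf_scores):
    """Return the fused score of one sample for the given tree."""
    if isinstance(node, int):
        return leaf_scores[node]
    op_name, children = node
    return OPERATORS[op_name]([evaluate(child, leaf_scores) for child in children])

# Hypothetical individual: Average(Min(t0, t1), Max(t2, t3))
individual = ("average", [("min", [0, 1]), ("max", [2, 3])])

# Hypothetical positive-class scores from four decision trees for one sample.
scores = [0.9, 0.4, 0.2, 0.7]

fused = evaluate(individual, scores)  # Average(0.4, 0.7)
label = int(fused >= 0.5)             # assumed threshold for the positive class
```

Here the Min node is pessimistic (it takes the lowest score of its children), the Max node is optimistic, and the Average node balances the two, which is one way to read why a mix of operators can be useful inside a single individual.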
Binary class datasets used in experiments.
| Datasets | Number of genes | Number of samples of two classes | Reference |
|---|---|---|---|
| Ovarian | 15154 | 162/91 | [ |
| Leukemia | 7129 | 47/25 | [ |
| Colon | 20000 | 40/22 | [ |
| Lung | 12533 | 150/31 | [ |
| Prostate | 12600 | 77/59 | [ |
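
The class sizes in this table are imbalanced (for example 150/31 for Lung), which is the situation the balanced subsampling mentioned in the abstract addresses. The sketch below shows one common form of it, random undersampling of the larger class. The paper's exact procedure is not given here, so the function `balanced_subsample`, the label coding, and the seed are assumptions for illustration.

```python
import random

def balanced_subsample(labels, rng):
    """Return shuffled sample indices with the same number of samples per class.

    Assumption: every class is randomly undersampled to the size of the
    smallest class. The paper may use a different scheme.
    """
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    chosen = []
    for idxs in by_class.values():
        chosen.extend(rng.sample(idxs, n_min))  # sampling without replacement
    rng.shuffle(chosen)
    return chosen

# Class sizes of the Lung dataset from the table; which class is coded 0 or 1
# is arbitrary in this sketch.
labels = [0] * 150 + [1] * 31
subset = balanced_subsample(labels, random.Random(0))
```

Drawing a fresh subset with a different seed for each base classifier gives every tree a different balanced view of the data, which is one plausible way such subsampling can increase diversity.
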
Multiclass datasets used in experiments.
| Dataset | Number of classes | Number of genes | Number of training samples | Number of test samples | Reference |
|---|---|---|---|---|---|
| Leukemia 1 | 3 | 7129 | 38 | 34 | [ |
| Leukemia 2 | 3 | 12,582 | 57 | 15 | [ |
| Lung 1 | 3 | 7129 | 64 | 32 | [ |
| Lung 2 | 5 | 12,600 | 136 | 67 | [ |
| Breast | 5 | 9216 | 54 | 30 | [ |
| DLBCL | 6 | 4026 | 58 | 30 | [ |
Experimental results for binary datasets.
| Dataset | Metric | GPES | DT | Random Forest | Rotation Forest | GA-ESP |
|---|---|---|---|---|---|---|
| Ovarian | Accuracy | | 0.984 ± 0.000 | 0.988 ± 0.006 | | |
| Ovarian | AUC | | 0.987 ± 0.000 | 0.990 ± 0.000 | | |
| Leukemia | Accuracy | | 0.863 ± 0.000 | 0.944 ± 0.016 | 0.934 ± 0.020 | 0.944 ± 0.012 |
| Leukemia | AUC | | 0.829 ± 0.000 | 0.942 ± 0.006 | 0.945 ± 0.017 | 0.935 ± 0.010 |
| Colon | Accuracy | 0.810 ± 0.024 | 0.743 ± 0.000 | 0.804 ± 0.030 | 0.820 ± 0.025 | |
| Colon | AUC | | 0.756 ± 0.000 | 0.795 ± 0.024 | 0.796 ± 0.022 | |
| Lung | Accuracy | | 0.967 ± 0.000 | 0.985 ± 0.005 | 0.990 ± 0.005 | 0.980 ± 0.009 |
| Lung | AUC | | 0.927 ± 0.000 | 0.980 ± 0.003 | 0.986 ± 0.004 | 0.963 ± 0.007 |
| Prostate | Accuracy | 0.902 ± 0.014 | 0.889 ± 0.000 | 0.889 ± 0.017 | | 0.890 ± 0.018 |
| Prostate | AUC | 0.889 ± 0.010 | 0.831 ± 0.000 | 0.862 ± 0.011 | | 0.885 ± 0.015 |
| Average | Accuracy | | 0.889 ± 0.000 | 0.837 ± 0.020 | 0.930 ± 0.015 | 0.928 ± 0.014 |
| Average | AUC | | 0.866 ± 0.000 | 0.914 ± 0.009 | 0.922 ± 0.012 | 0.905 ± 0.010 |
Figure 4: Change of accuracy in different phases.
Figure 5: Average accuracy (validation set) in Phase 4 versus average number of final ensemble systems.
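
The abstract states that the final ensemble committee is selected by a forward search algorithm, and Figure 5 relates validation accuracy to the number of ensemble systems kept. The details of that search are not given in this text, so the following is only a generic greedy forward selection on validation accuracy. The majority-vote fusion, the tie rule, the stopping criterion, and the toy predictions are all assumptions.

```python
def committee_accuracy(members, predictions, truth):
    """Accuracy of an unweighted majority vote over binary predictions.

    Assumption: ties are resolved in favour of class 1.
    """
    correct = 0
    for i, y in enumerate(truth):
        votes = sum(predictions[m][i] for m in members)
        pred = int(2 * votes >= len(members))
        correct += pred == y
    return correct / len(truth)

def forward_search(predictions, truth):
    """Greedily add the candidate that most improves validation accuracy.

    Stops as soon as no remaining candidate gives a strict improvement.
    """
    selected, best = [], 0.0
    remaining = set(range(len(predictions)))
    while remaining:
        candidate = max(
            sorted(remaining),
            key=lambda m: committee_accuracy(selected + [m], predictions, truth),
        )
        accuracy = committee_accuracy(selected + [candidate], predictions, truth)
        if accuracy <= best:
            break
        selected.append(candidate)
        best = accuracy
        remaining.remove(candidate)
    return selected, best

# Hypothetical validation labels and predictions of three ensemble systems.
truth = [1, 0, 1, 0, 1, 0]
predictions = [
    [1, 0, 1, 0, 0, 0],  # system 0: 5 of 6 correct
    [0, 0, 0, 0, 1, 0],  # system 1: 4 of 6 correct
    [1, 1, 1, 1, 1, 1],  # system 2: 3 of 6 correct
]
selected, best = forward_search(predictions, truth)
```

On this toy data the search keeps systems 0 and 1, whose errors do not overlap, and stops before adding system 2 because it brings no further gain. The committee size is therefore determined by the data rather than fixed in advance.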
Proportion of each operator in the final committee (values are fractions; each row sums to approximately 1).
| Datasets | Min | Average | Max |
|---|---|---|---|
| Ovarian | 0.257 | 0.567 | 0.177 |
| Leukemia | 0.240 | 0.567 | 0.194 |
| Colon | 0.374 | 0.449 | 0.177 |
| Lung | 0.333 | 0.487 | 0.180 |
| Prostate | 0.178 | 0.582 | 0.240 |
Experimental results for multiclass datasets.
| Dataset | Metric | OVO: GPES | OVO: DT | OVO: Random Forest | OVO: HARF | OVO: GA-ESP | OVR: GPES | OVR: DT | OVR: Random Forest | OVR: HARF | OVR: GA-ESP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Leukemia 1 | | 0.897 ± 0.006 | 0.941 ± 0.000 | 0.881 ± 0.061 | 0.923 ± 0.074 | 0.932 ± 0.010 | | 0.882 ± 0.000 | 0.917 ± 0.012 | 0.941 ± 0.019 | 0.903 ± 0.012 |
| Leukemia 1 | AAc | 0.905 ± 0.004 | 0.960 ± 0.000 | 0.918 ± 0.040 | 0.949 ± 0.049 | 0.961 ± 0.007 | | 0.922 ± 0.000 | 0.944 ± 0.008 | 0.951 ± 0.012 | 0.938 ± 0.019 |
| Leukemia 2 | | 0.933 ± 0.042 | 0.733 ± 0.000 | 0.934 ± 0.066 | | 0.935 ± 0.018 | 0.891 ± 0.046 | 0.773 ± 0.000 | 0.921 ± 0.050 | 0.925 ± 0.026 | 0.902 ± 0.014 |
| Leukemia 2 | AAc | 0.956 ± 0.025 | 0.778 ± 0.000 | 0.953 ± 0.043 | | 0.978 ± 0.016 | 0.925 ± 0.028 | 0.882 ± 0.000 | 0.947 ± 0.033 | 0.940 ± 0.017 | 0.922 ± 0.008 |
| Lung 1 | | | 0.668 ± 0.000 | 0.791 ± 0.017 | 0.817 ± 0.013 | 0.812 ± 0.025 | 0.822 ± 0.033 | 0.781 ± 0.000 | 0.771 ± 0.025 | 0.818 ± 0.036 | 0.802 ± 0.013 |
| Lung 1 | AAc | | 0.792 ± 0.000 | 0.861 ± 0.011 | 0.869 ± 0.009 | 0.853 ± 0.013 | 0.881 ± 0.023 | 0.854 ± 0.000 | 0.847 ± 0.017 | 0.868 ± 0.026 | 0.865 ± 0.012 |
| Lung 2 | | 0.948 ± 0.030 | 0.955 ± 0.000 | 0.950 ± 0.008 | 0.933 ± 0.013 | 0.942 ± 0.021 | 0.913 ± 0.013 | 0.851 ± 0.000 | 0.908 ± 0.014 | 0.946 ± 0.017 | |
| Lung 2 | AAc | 0.979 ± 0.012 | 0.982 ± 0.000 | 0.980 ± 0.031 | 0.965 ± 0.009 | 0.953 ± 0.016 | 0.960 ± 0.007 | 0.940 ± 0.000 | 0.963 ± 0.006 | 0.964 ± 0.013 | |
| Breast | | 0.875 ± 0.026 | 0.700 ± 0.000 | | 0.860 ± 0.037 | 0.853 ± 0.020 | 0.821 ± 0.004 | 0.733 ± 0.000 | 0.844 ± 0.027 | 0.873 ± 0.037 | |
| Breast | AAc | 0.947 ± 0.014 | 0.880 ± 0.000 | | 0.946 ± 0.012 | 0.916 ± 0.009 | 0.915 ± 0.003 | 0.893 ± 0.000 | 0.949 ± 0.017 | 0.942 ± 0.015 | 0.912 ± 0.010 |
| DLBCL | | | 0.833 ± 0.000 | 0.935 ± 0.055 | 0.837 ± 0.007 | 0.932 ± 0.010 | 0.883 ± 0.032 | 0.833 ± 0.000 | 0.815 ± 0.071 | 0.803 ± 0.071 | 0.922 ± 0.013 |
| DLBCL | AAc | | 0.944 ± 0.000 | 0.972 ± 0.018 | 0.925 ± 0.024 | 0.958 ± 0.007 | 0.947 ± 0.011 | 0.944 ± 0.000 | 0.937 ± 0.023 | 0.923 ± 0.023 | 0.965 ± 0.009 |
| Average | | | 0.805 ± 0.000 | 0.897 ± 0.039 | 0.888 ± 0.026 | 0.901 ± 0.017 | 0.882 ± 0.024 | 0.809 ± 0.000 | 0.863 ± 0.033 | 0.884 ± 0.034 | 0.896 ± 0.014 |
| Average | AAc | | 0.889 ± 0.000 | 0.941 ± 0.024 | 0.942 ± 0.018 | 0.937 ± 0.013 | 0.934 ± 0.012 | 0.906 ± 0.000 | 0.878 ± 0.055 | 0.931 ± 0.017 | 0.932 ± 0.012 |
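
The multiclass results are reported under two decomposition schemes, OVO and OVR. Assuming these abbreviations carry their usual meaning of one-versus-one and one-versus-rest, the sketch below enumerates the binary subproblems each scheme creates for the class counts that appear in the multiclass dataset table (3, 5, and 6). The helper names are hypothetical.

```python
from itertools import combinations

def ovo_problems(classes):
    """One-versus-one: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

def ovr_problems(classes):
    """One-versus-rest: one binary problem per class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

# Class counts taken from the multiclass dataset table.
counts = {
    k: (len(ovo_problems(range(k))), len(ovr_problems(range(k))))
    for k in (3, 5, 6)
}
# counts == {3: (3, 3), 5: (10, 5), 6: (15, 6)}  as (OVO, OVR) problem counts
```

OVO grows as k(k-1)/2 with the number of classes k, while OVR grows linearly, so for the six-class DLBCL data OVO trains 15 binary classifiers against 6 for OVR. Each OVO problem uses only the samples of two classes, whereas each OVR problem uses all samples but is usually more imbalanced.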