Mei Sze Tan, Jing Wei Tan, Siow-Wee Chang, Hwa Jen Yap, Sameem Abdul Kareem, Rosnah Binti Zain.
Abstract
BACKGROUND: The potential of genetic programming (GP) has been demonstrated in various fields in recent years. In the biomedical field, much GP research has focused on the recognition of cancerous cells and on gene expression profiling data. The aim of this research is to study the performance of GP on survival prediction using a small oral cancer prognosis dataset, which is the first such study in the field of oral cancer prognosis.
Keywords: Feature selection; Genetic Programming; Machine learning; Oral cancer prognosis
Year: 2016 PMID: 27688975 PMCID: PMC5036111 DOI: 10.7717/peerj.2482
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
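For illustration, the sketch below captures the general idea behind GP-driven feature selection as studied in this paper: evolve a population of candidate solutions over the available features and read off which features the fittest individuals rely on. It is a simplified stand-in (a plain genetic search over binary feature masks rather than the tree-based GP and Operator Equalisation variants evaluated in the study), and the dataset, fitness function, and parameter values are placeholders, not the authors' setup.

```python
# Minimal, illustrative sketch of evolutionary feature selection.
# All data and settings are placeholders, not the study's configuration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(31, 17))        # placeholder: 31 samples, 17 features (as in the feature table)
y = rng.integers(0, 2, size=31)      # placeholder binary survival outcome

def fitness(mask):
    """Score a feature subset with a trivial nearest-centroid classifier (placeholder fitness)."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((Xs[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

pop = rng.integers(0, 2, size=(30, X.shape[1]))      # initial population of feature masks
for generation in range(20):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep the fittest masks
    children = parents[rng.integers(0, 10, size=30)].copy()
    flip = rng.random(children.shape) < 0.05         # mutation: flip bits with small probability
    children[flip] ^= 1
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))
```

In the study itself, the frequency with which each feature is selected across repeated GP runs (see Figure 2 below) is what identifies the candidate feature subsets.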
Features available in the oral cancer prognosis dataset.
| Feature |
|---|
| Age |
| Ethnicity |
| Gender |
| Smoke |
| Drink |
| Chew |
| Site |
| Histological differentiation of SCC |
| Pattern of invasion |
| Nodes |
| PT |
| PN |
| Stage |
| Size |
| Treatment |
| p53 |
| p63 |
Figure 1. Framework of oral cancer prognosis with genetic programming.
Parameters used in this research.
| Parameter | Value |
|---|---|
| Population size | 31 |
| Population initiation | ‘rampedinit’ |
| Maximum number of generations | 5 |
| Selection method | Tournament (size = 0.0100) |
| Crossover rate | 0.01 |
| Mutation rate | 0.01 |
| Type of kernel | Radial basis function |
| gamma (γ) | 0.06 |
| cost (c) | 1 |
| epsilon parameter | 0.001 |
| Weight (w) | 1 |
Notes.
SVM parameters: gamma (γ) determines the boundary of the RBF kernel, i.e., the region within which the kernel value exceeds a certain threshold; cost (c) is a penalty parameter that controls the trade-off between the two requirements, i.e., the margin of the SVM hyperplane depends on the c penalty; the epsilon parameter determines the level of accuracy of the function; the weight (w) parameter is an n-dimensional coefficient vector normal to the hyperplane.
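As a minimal sketch, the SVM parameters listed above could be set as follows, assuming a scikit-learn implementation (the record does not state which SVM library was used). Here gamma and C map directly to γ and c, and the epsilon parameter is read as the optimizer's stopping tolerance (tol); that mapping is an assumption.

```python
# Hedged sketch: RBF-kernel SVM configured with the values from the parameter table.
from sklearn.svm import SVC

svm = SVC(
    kernel="rbf",       # radial basis function kernel, as in the parameter table
    gamma=0.06,         # RBF boundary parameter (γ)
    C=1,                # cost / penalty parameter (c)
    tol=0.001,          # epsilon read here as stopping accuracy (assumption)
    probability=True,   # probability estimates are useful if AUROC is computed later
)
# svm.fit(X_train, y_train) would then learn the weight vector w normal to the hyperplane.
```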
Figure 2. Frequency of each feature selected by GP in 20 runs.
Best results on selected feature subsets using normal GP and Operator Equalisation GP.
| No. of features | Feature subset | Normal GP: average accuracy (%) | Normal GP: RMSE | Normal GP: average AUROC | OpEq GP: average accuracy (%) | OpEq GP: RMSE | OpEq GP: average AUROC |
|---|---|---|---|---|---|---|---|
| 10 | | 67.74 | 0.4675 | 0.6477 | 67.74 | 0.5957 | 0.6559 |
| 9 | | 77.42 | 0.5165 | 0.7432 | 69.68 | 0.4069 | 0.6995 |
| 8 | | 67.74 | 0.6947 | 0.6682 | 64.19 | 0.4478 | 0.6648 |
| 7 | | 80.65 | 0.5681 | 0.7786 | 78.39 | 0.6956 | 0.7589 |
| 6 | | 64.52 | 0.5957 | 0.6432 | 63.87 | 0.5191 | 0.6300 |
| 4 | | 67.74 | 0.4600 | 0.6886 | 57.42 | 0.5680 | 0.6270 |
| 3 | | 67.74 | 0.5598 | 0.6886 | 60.32 | 0.3548 | 0.6414 |
| 2 | | 64.52 | 0.5161 | 0.5000 | 60.65 | 0.4366 | 0.5150 |
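The tables report average accuracy, root mean square error (RMSE), and average area under the ROC curve (AUROC). The sketch below shows how these three metrics can be computed for a single run with scikit-learn and NumPy; how the study averaged across runs and folds is not reproduced here, and the prediction arrays are placeholders.

```python
# Hedged sketch of the three reported evaluation metrics for one run.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y_true, y_pred_label, y_pred_score):
    acc = accuracy_score(y_true, y_pred_label) * 100  # accuracy in %
    rmse = float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred_label)) ** 2)))  # RMSE
    auroc = roc_auc_score(y_true, y_pred_score)       # area under the ROC curve
    return acc, rmse, auroc

# Example with placeholder predictions (illustrative values only):
print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.1]))
```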
Figure 3. The ROC curve for classification using Smo, Dri, Chew, Diff, and p63 as the selected features.
Best results of SVM on the selected feature subsets.
| No. of features | Feature subset | Average accuracy (%) | Root Mean Square Error (RMSE) | Average AUROC |
|---|---|---|---|---|
| 10 | | 64.76 | 0.4000 | 0.5000 |
| 9 | | 64.76 | 0.4000 | 0.5000 |
| 8 | | 64.76 | 0.4000 | 0.5000 |
| 7 | | 64.76 | 0.4000 | 0.5000 |
| 6 | | 61.43 | 0.6000 | 0.4750 |
| 5 | | 64.76 | 0.4000 | 0.5000 |
| 4 | | 64.76 | 0.4000 | 0.5000 |
| 3 | | 64.76 | 0.4000 | 0.5000 |
| 2 | | 61.43 | 0.4000 | 0.5000 |
Best results of logistic regression for the selected feature subsets.
| No. of features | Feature subset | Accuracy (%) | Root Mean Square Error (RMSE) | AUROC |
|---|---|---|---|---|
| 10 | | 64.5161 | 0.7315 | 0.5 |
| 9 | | 64.5161 | 0.7304 | 0.5 |
| 8 | | 64.5161 | 0.7303 | 0.5 |
| 7 | | 64.5161 | 0.7303 | 0.5 |
| 6 | | 64.5161 | 0.7293 | 0.4545 |
| 5 | | 61.2903 | 0.7265 | 0.7 |
| 4 | | 54.8387 | 0.7260 | 0.65 |
| 3 | | 51.6129 | 0.7305 | 0.625 |
| 2 | | 51.6129 | 0.7301 | 0.625 |
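In outline, a logistic regression baseline on a selected feature subset could be fitted as below, assuming scikit-learn; the data, feature indices, and train/test split are placeholders rather than the study's actual configuration.

```python
# Hedged sketch of a logistic-regression baseline on a selected feature subset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(31, 17))        # placeholder predictors
y = rng.integers(0, 2, size=31)      # placeholder survival outcome
subset = [3, 4, 5, 7, 16]            # hypothetical selected-feature indices

X_tr, X_te, y_tr, y_te = train_test_split(
    X[:, subset], y, test_size=0.3, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

pred = clf.predict(X_te)
prob = clf.predict_proba(X_te)[:, 1]
print("accuracy (%):", 100 * accuracy_score(y_te, pred))
print("RMSE:", float(np.sqrt(np.mean((y_te - pred) ** 2))))
print("AUROC:", roc_auc_score(y_te, prob))
```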