Mohammad A Mezher, Almothana Altamimi, Ruhaifa Altamimi.
Abstract
Cancer is defined as an abnormal growth of human cells, and tumors are classified as benign or malignant. Cancers are further classified by site of initiation and genomic underpinnings. Lung cancer displays extreme heterogeneity, making genomic classification vital for future targeted therapies, especially considering that lung cancers account for 1.76 million deaths worldwide annually. However, tumors do not always correlate to cancer, as they can be benign, severely dysplastic (pre-cancerous), or malignant (cancerous). Lung cancer presents with ambiguous symptoms, so it is difficult to diagnose and is detected later than other cancers. Diagnosis relies heavily on radiology and invasive procedures. Various models employing Artificial Intelligence (AI) and Machine Learning (ML) have been used to classify different cancers. In this study, the authors propose a Genetic Folding Strategy (GFS) based model to predict lung cancer from a lung cancer dataset. We developed and implemented GF to improve Support Vector Machine (SVM) classification kernel functions and used it to classify lung cancer. The classification performance of the authors' GFS model was evaluated and compared against three SVM kernels (linear, polynomial, and radial basis function) on real lung cancer datasets. Using GFS to classify lung cancer, the authors obtained an accuracy of 96.2%, the highest accuracy among the kernels compared.
Keywords: classification; evolutionary algorithms; genetic folding algorithm; genetic programming; lung cancer; support vector machine
Year: 2022 PMID: 35845436 PMCID: PMC9280892 DOI: 10.3389/frai.2022.826374
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Figure 1. GF life cycle.
A list of parameters and values used in the experiments.
| Parameter | Value |
|---|---|
| Operators | {+_v, +_s, -_v, -_s, *_s} |
| Operands | {x, y} |
| Fitness function | Apply the produced equation to the datapoints |
| Selection function | Roulette wheel selection |
| Mutation function | Mutate each operator and operand at a ratio less than or equal to 0.5 |
| Stopping criterion | 300 generations |
| No. of generations | 20 |
| No. of populations | 50 |
| Mutation rate | 0.5 |
| Scaler | sklearn.preprocessing.StandardScaler() |
| K-Folds | 5-folds |
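The table names roulette wheel selection as the selection function. A minimal sketch of that general technique is given here, assuming non-negative fitness values; the function name `roulette_wheel_select` and its arguments are hypothetical and are not taken from the GF toolbox.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumption: fitness values are non-negative. If every fitness is zero,
    the choice falls back to a uniform draw.
    """
    if not population or len(population) != len(fitnesses):
        raise ValueError("population and fitnesses must be non-empty and equal in length")
    if any(f < 0 for f in fitnesses):
        raise ValueError("fitness values must be non-negative")

    # Running totals mark the right edge of each individual's wheel slice.
    cumulative = list(accumulate(fitnesses))
    total = cumulative[-1]
    if total == 0:
        return rng.choice(population)

    # Spin the wheel: a uniform point in [0, total).
    spin = rng.random() * total
    index = bisect_right(cumulative, spin)
    # Clamp guards against floating-point edge cases at the last slice.
    return population[min(index, len(population) - 1)]
```

With fitnesses `[0.0, 1.0, 3.0]`, the first individual is never drawn and the third is drawn roughly three times as often as the second.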
Lung cancer dataset sample.
| Lung cancer | F_1 | F_2 | F_3 | F_4 | F_5 | F_6 | F_7 | F_8 | F_9 | F_10 | F_11 | F_12 | F_13 | F_14 | F_15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YES | M | 69 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| YES | M | 74 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| NO | F | 59 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 |
| NO | M | 63 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 |
| NO | F | 63 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 |
Lung cancer scaled dataset sample.
| Lung cancer | F_1 | F_2 | F_3 | F_4 | F_5 | F_6 | F_7 | F_8 | F_9 | F_10 | F_11 | F_12 | F_13 | F_14 | F_15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YES | 0.953 | 0.772 | 1.135 | 0.869 | 1.003 | 1.003 | 1.010 | 0.697 | 1.120 | 0.892 | 0.892 | 0.852 | 0.749 | 1.064 | 0.892 |
| YES | 0.953 | 1.382 | 0.881 | 1.150 | 0.997 | 1.003 | 0.990 | 0.697 | 0.892 | 1.120 | 1.120 | 1.173 | 0.749 | 1.064 | 0.892 |
| NO | 1.050 | 0.448 | 1.135 | 1.150 | 0.997 | 0.997 | 1.010 | 0.697 | 1.120 | 0.892 | 1.120 | 0.852 | 0.749 | 0.940 | 0.892 |
| NO | 0.953 | 0.040 | 0.881 | 0.869 | 1.003 | 1.003 | 1.010 | 1.435 | 1.120 | 1.120 | 0.892 | 1.173 | 1.336 | 1.064 | 0.892 |
| NO | 1.050 | 0.040 | 1.135 | 0.869 | 0.997 | 1.003 | 1.010 | 1.435 | 1.120 | 0.892 | 1.120 | 0.852 | 0.749 | 0.940 | 1.120 |
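The parameter list names `StandardScaler` as the scaler. As a rough illustration of what standard (z-score) scaling does to one column, the sketch below standardizes the five ages from the raw sample using only the Python standard library. The helper `standardize` is hypothetical, and because the statistics come from five rows only, the output is not expected to reproduce the scaled values tabulated above.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return z-scores: (value - mean) / population standard deviation.

    StandardScaler also uses the population standard deviation; a constant
    column is mapped to zeros here to avoid dividing by zero.
    """
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# Ages of the five patients in the raw dataset sample.
ages = [69, 74, 59, 63, 63]
scaled_ages = standardize(ages)
```

After scaling, the column has zero mean and unit variance, so features measured on different ranges (age in years versus 1/2 indicator codes) contribute on a comparable scale to the kernel.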
A list of features found in the lung cancer dataset (Bhat, 2021).
| ID | Feature | Values | |
|---|---|---|---|
| F_1 | Gender | M(male), F(female) | - |
| F_2 | Age | Age of the patient | 62.6 |
| F_3 | Smoking | YES=2, NO=1 | 1.5 |
| F_4 | Yellow fingers | YES=2, NO=1 | 1.5 |
| F_5 | Anxiety | YES=2, NO=1 | 1.5 |
| F_6 | Peer_pressure | YES=2, NO=1 | 1.5 |
| F_7 | Chronic Disease | YES=2, NO=1 | 1.5 |
| F_8 | Fatigue | YES=2, NO=1 | 1.8 |
| F_9 | Allergy | YES=2, NO=1 | 1.5 |
| F_10 | Wheezing | YES=2, NO=1 | 1.5 |
| F_11 | Alcohol | YES=2, NO=1 | 1.5 |
| F_12 | Coughing | YES=2, NO=1 | 1.6 |
| F_13 | Shortness of breath | YES=2, NO=1 | 1.6 |
| F_14 | Swallowing difficulty | YES=2, NO=1 | 1.5 |
| F_15 | Chest pain | YES=2, NO=1 | 1.5 |
| Out | Lung cancer | YES, NO | - |
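Before a classifier can use these records, the categorical fields need a numeric form. The sketch below shows one possible encoding, assuming the column order of the dataset sample (label first, then gender, age, and the thirteen 1/2 indicator columns). The mapping of M/F to 1/0 and YES/NO to 1/0, and the helper name `encode_row`, are illustrative assumptions rather than the authors' preprocessing.

```python
def encode_row(row):
    """Split one raw record into (feature_vector, label).

    Assumed layout: [label, gender, age, thirteen indicator columns].
    Assumed coding: gender M -> 1.0, F -> 0.0; label YES -> 1, NO -> 0.
    """
    label, gender, *numeric = row
    y = 1 if label == "YES" else 0
    x = [1.0 if gender == "M" else 0.0] + [float(v) for v in numeric]
    return x, y


# Two records copied from the dataset sample (first and third rows).
raw_rows = [
    ["YES", "M", 69, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2],
    ["NO", "F", 59, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2],
]

encoded = [encode_row(row) for row in raw_rows]
X = [features for features, _ in encoded]
y = [label for _, label in encoded]
```

Each encoded record yields a 15-element feature vector, matching F_1 through F_15 in the table.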
Figure 2. The results of the GF toolbox for the lung cancer dataset. (A) ROC curve. (B) Fitness values. (C) Training vs. testing time (seconds). (D) Mean squared errors. (E) Accuracy vs. complexity (folds). (F) Best GF chromosome in tree format.
List of parameters and values used in the experiments.
| Model | Accuracy (%) | |
|---|---|---|
| Random forest (Santos, | 95.8 | 0.05 |
| SVM (linear) | 93.6 | 2.38 |
| SVM (RBF) | 91.0 | 2.76 |
| SVM (polynomial) | 89.7 | 2.17 |
| Logistic regression (Santos, | 84.0 | 0.0 |
| Gaussian NB (Santos, | 84.0 | 0.0 |
| Gradient boosting (Santos, | 84.0 | 0.0 |
| KNeighbors (Santos, | 76.5 | 0.0 |
| AdaBoost (Santos, | 76.5 | 0.1 |
| Linear regression (Bhatt, | 64.0 | 0.0 |
| KNeighbors classifier (Bhatt, | 93.5 | 0.0 |
| Quadratic discriminant analysis (Wu, | 96.1 | 0.0 |
Best are in bold.
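For reference, the three baseline SVM kernels in the comparison (linear, polynomial, and RBF) can be written in their textbook forms as below. This is a sketch only: the parameter names follow the common `gamma`, `coef0`, and `degree` convention, and the default values are assumptions, since the hyperparameters used in the experiments are not listed here.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

A kernel produced by an evolutionary search would play the same role as these functions: it takes two data points and returns a similarity score that the SVM uses in place of a plain inner product.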