Naif Al Mudawi, Abdulwahab Alazeb.
Abstract
A growing number of individuals and organizations are turning to machine learning (ML) and deep learning (DL) to analyze massive amounts of data and produce actionable insights. Predicting the early stages of serious illnesses, including cancer, kidney failure, and heart attacks, with ML-based schemes is becoming increasingly common in medical practice. Cervical cancer is one of the most frequent diseases among women, and early diagnosis could help prevent it. Thus, this study presents an astute way to predict cervical cancer with ML algorithms. The proposed research technique has four phases: research dataset, data pre-processing, predictive model selection (PMS), and pseudo-code. The PMS section reports experiments with a range of classic machine learning methods, including decision tree (DT), logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN), adaptive boosting, gradient boosting, random forest (RF), and XGBoost. For cervical cancer prediction, the highest classification score of 100% is achieved with the RF, DT, adaptive boosting, and gradient boosting algorithms, whereas SVM reaches 99% accuracy. The computational complexity of the classic machine learning techniques is computed to assess the efficacy of the models. In addition, 132 Saudi Arabian volunteers were polled as part of this study to learn their thoughts about computer-assisted cervical cancer prediction and to focus attention on the human papillomavirus (HPV).
Keywords: cervical cancer; gradient boosting; human papillomavirus (HPV); machine learning (ML); support vector machine (SVM)
Year: 2022 PMID: 35684753 PMCID: PMC9185380 DOI: 10.3390/s22114132
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Comparative analysis of existing research.
| Source | Used Dataset | Classifiers | Evaluation Metric | Findings |
|---|---|---|---|---|
| [ | UCI, 858 patients and 36 attributes | ML method | ROC-AUC | Cervical cancer diagnosis |
| [ | Patient demographics | Neural network | N/A | Applied Cox proportional techniques |
| [ | UCI repository | Decision tree | ROC-AUC | Hinselmann screening methods |
| [ | EHRs | Random forest | AUC | Traditional approaches |
| [ | N/A | ADTree | G-mean and F-measure | Handling the data imbalance |
| [ | Dataset collected from the University of California (UCI) | Using four target parameters: biopsy, cytology, Schiller, and Hinselmann, as well as 32 risk factors | Machine learning (ML) algorithms are applied, such as decision tree and decision jungle approaches. | Decision tree algorithm shows a higher value of 98.5%. |
| [ | Data mining technique | (AUROC) | The Microsoft Azure ML tool | Decision tree algorithm, a higher value range of 97.8% on the AUROC curve. |
| [ | A survey-based study on cervical cancer to collect data from 900 women aged 25 to 49 years | N/A | Using Stata 12.0 software. | A majority of 557 women (70.2%) acquired their information from the radio, while a minority of 120 women (15.1%) got their information from health care organizations. |
| [ | Unbalanced medical image dataset | Assisted in determining cervical cancer, and benefits and drawbacks of different approaches | Machine learning approaches | Employing deep learning to predict cervical cancer with high probability. |
| [ | A dataset from the University of California, Irvine | Used Hinselmann screening methods to forecast cervical cancer | Deep-learning neural network | Boosted decision tree, decision forest, and decision jungle approaches. |
| [ | Electronic health record (EHR) data | Four machine learning classifiers | Random forest algorithm | The boosted decision tree method produced a precise forecast of 98%. |
| [ | Data radiation on bone metastases in cervical cancer patients | Ant-miner, RIPPER, Ridor, PART, ADTree, C4.5, ELM, and Weighted ELM | Class imbalance learning (CIL) | Suggested genetic assistance as an optional strategy to enhance the validity of the prediction. |
| [ | N/A | Classification algorithms are used to construct the system | Method based on machine learning approaches | Utilized to improve classification accuracy and shorten the time it takes to develop a classification system. |
| [ | Data related to diabetes | Health specialists and other stakeholders collaborate | Big data analytics and machine-learning-based approaches may be used for diabetes. | A machine-learning-based system might score as high as 86% on the diagnostic accuracy of DL. |
| [ | UCI repository dataset | Classify patient data to detect cardiac disease | Boosted decision tree, decision forest | Score as high as 92% on the diagnostic accuracy of DL. |
Figure 1. Proposed research model for classifying cervical cancer.
Attributes of the research dataset.
| No. | Attribute | Type |
|---|---|---|
| 1 | Age | Int |
| 2 | Number of sexual partners | Int |
| 3 | First sexual intercourse | Int |
| 4 | Number of pregnancies | Int |
| 5 | Smokes | Bool |
| 6 | Smokes (years) | Bool |
| 7 | Smokes (pack/year) | Bool |
| 8 | Hormonal contraceptives | Bool |
| 9 | Hormonal contraceptives (years) | Int |
| 10 | IUD | Bool |
| 11 | IUD (years) | Int |
| 12 | STDs | Bool |
| 13 | STDs (number) | Int |
| 14 | STDs: condylomatosis | Bool |
| 15 | STDs: cervical condylomatosis | Bool |
| 16 | STDs: vaginal condylomatosis | Bool |
| 17 | STDs: vulvo-perineal condylomatosis | Bool |
| 18 | STDs: syphilis | Bool |
| 19 | STDs: pelvic inflammatory | Bool |
| 20 | STDs: genital herpes | Bool |
| 21 | STDs: molluscum contagiosum | Bool |
| 22 | STDs: AIDS | Bool |
| 23 | STDs: HIV | Bool |
| 24 | STDs: hepatitis B | Bool |
| 25 | STDs: HPV | Bool |
| 26 | STDs: number of diagnoses | Int |
| 27 | STDs: time since first diagnosis | Int |
| 28 | STDs: time since last diagnosis | Int |
| 29 | Dx: cancer | Bool |
| 30 | Dx: CIN | Bool |
| 31 | Dx: HPV | Bool |
| 32 | Dx | Bool |
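
The Type column above implies a typed-parsing step during data pre-processing. The following is a minimal sketch of such a step, not the authors' code. It assumes raw values arrive as strings and that missing entries are flagged with `?` (an assumption about the raw file that is not stated here); the helper names `parse_value` and `parse_record` are hypothetical.

```python
from typing import Optional, Union

# Types for a few attributes, copied from the table above.
ATTRIBUTE_TYPES = {
    "Age": "Int",
    "Number of pregnancies": "Int",
    "Smokes": "Bool",
    "IUD": "Bool",
}

MISSING_MARKER = "?"  # assumption: how the raw file flags a missing value


def parse_value(raw: str, type_name: str) -> Optional[Union[int, bool]]:
    """Convert one raw string to int or bool; return None when it is missing."""
    raw = raw.strip()
    if raw == "" or raw == MISSING_MARKER:
        return None
    number = float(raw)  # tolerates "1", "1.0", "34"
    if type_name == "Bool":
        return number != 0
    return int(number)


def parse_record(raw_record: dict) -> dict:
    """Parse every attribute listed in ATTRIBUTE_TYPES from a raw record."""
    return {
        name: parse_value(raw_record[name], type_name)
        for name, type_name in ATTRIBUTE_TYPES.items()
    }


# Hypothetical raw row, for illustration only.
example = {"Age": "34", "Number of pregnancies": "2.0", "Smokes": "0.0", "IUD": "?"}
parsed = parse_record(example)
print(parsed)
```

Returning `None` for missing values keeps the decision about imputation or row removal separate from parsing.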
Classification report of the machine learning algorithms for classifying cervical cancer.
| Algorithm | Purpose | P (class 0) | R (class 0) | F1 (class 0) | P (class 1) | R (class 1) | F1 (class 1) | Accuracy Score |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | Cervical cancer prediction | 0.98 | 1.00 | 0.99 | 1.00 | 0.77 | 0.87 | 0.98 |
| SVM | Cervical cancer prediction | 0.99 | 1.00 | 1.00 | 1.00 | 0.92 | 0.96 | 0.99 |
| Random Forest | Cervical cancer prediction | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Decision Tree | Cervical cancer prediction | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Adaptive Boosting | Cervical cancer prediction | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| KNN | Cervical cancer prediction | 0.95 | 1.00 | 0.97 | 1.00 | 0.31 | 0.47 | 0.95 |
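
The P, R, and F1 columns are linked by the standard definition of F1 as the harmonic mean of precision and recall. As a minimal sketch (plain Python, not the authors' code), the snippet below recomputes the class "1" F1 values from the reported P and R. The confusion counts passed to `precision_recall` are hypothetical and are chosen only to show how a recall near 0.31 can arise.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported class "1" precision and recall from the table above.
reported = {
    "Logistic Regression": (1.00, 0.77),
    "SVM": (1.00, 0.92),
    "KNN": (1.00, 0.31),
}
f1_by_model = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
print(f1_by_model)  # {'Logistic Regression': 0.87, 'SVM': 0.96, 'KNN': 0.47}

# Hypothetical counts: 4 of 13 positive cases found, with no false alarms.
p, r = precision_recall(tp=4, fp=0, fn=9)
print(round(p, 2), round(r, 2))  # 1.0 0.31
```

The recomputed values match the F1 column, which also shows why KNN's 0.95 accuracy hides weak detection of the positive class: most positive cases are missed even though every positive prediction is correct.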
Accuracy measurement of gradient boosting and XGBoost.
| Algorithm | MAE | MSE | RMSE | Accuracy | R2 |
|---|---|---|---|---|---|
| Gradient Boosting | 7.330935195811098 × 10⁻¹⁶⁵ | 0.0 | 0.0 | 1.00 | 1.00 |
| XGBoost | 0.04847228 | 0.021919228 | 0.14805144 | 0.68628035 | 0.68628035 |
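
The error measures in this table follow standard definitions, sketched below in plain Python. This is an illustration, not the authors' code, and the toy labels and scores are hypothetical. Two checks on the reported numbers follow: the XGBoost RMSE equals the square root of its MSE, and squaring an error of the size of the gradient boosting MAE underflows to zero in double precision. The second check is only an observation that is consistent with an MSE of 0.0 next to a nonzero MAE; it is not a claim about how the reported values were produced.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical labels and predicted scores, for illustration only.
metrics = regression_metrics([0, 0, 1, 1], [0.1, 0.0, 0.8, 1.0])
print(metrics)

# Check 1: reported XGBoost RMSE against the square root of the reported MSE.
xgb_rmse_from_mse = math.sqrt(0.021919228)
print(xgb_rmse_from_mse)

# Check 2: a squared error of this magnitude is below the smallest double.
tiny = 7.330935195811098e-165
print(tiny * tiny == 0.0)
```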
Figure 2. Correlations between different variables of cervical cancer.
Figure 3. Count measurement in terms of the number of pregnancies, number of sexual partners, and age.
Figure 4. Visualization of comparison between biopsy and number of pregnancies.
Computational complexity of machine learning algorithms.
| Algorithm | Classification/Regression | Training | Prediction |
|---|---|---|---|
| Decision Tree | C + R | | |
| Random Forest | C + R | | |
| Gradient Boosting | C + R | | |
| SVM (Kernel) | C + R | | |
| k-Nearest Neighbours | C + R | - | |
Some major survey questions for investigating cervical cancer.
| Survey Question (N = 132 responses) | Yes/Agree | No/Disagree | Maybe/No Idea |
|---|---|---|---|
| Have you done a biopsy test or any other cervical cancer (uterus)-related test before? | 68% | 26% | 6% |
| Is everyone in your family aware of cervical cancer? | 76% | 20% | 4% |
| Do you agree with the statement that the rate of being affected by this cancer is becoming higher than before? | 73% | 10% | 17% |
| Do you know about human papillomavirus (HPV)? | 62% | 31% | 7% |
| Does living in a city or urban area affect how conscious people are of this cancer? | 71% | 21% | 8% |
| Have you had a biopsy or any other cervical cancer (uterus)-related test before? | 54% | 35% | 11% |
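
Because the table reports percentages of N = 132 respondents, approximate head counts can be recovered by simple arithmetic. The sketch below is illustrative only: the dictionary keys are shortened paraphrases of two questions, and the helper `approx_counts` is hypothetical. Since the published percentages are already rounded, the recovered counts are estimates and need not sum exactly to 132.

```python
N_RESPONDENTS = 132

# (Yes/Agree, No/Disagree, Maybe/No Idea) percentages from two rows of the table.
responses = {
    "Knows about HPV": (62, 31, 7),
    "Rate of cervical cancer is rising": (73, 10, 17),
}


def approx_counts(percentages, n):
    """Estimate respondent counts from whole-number percentages."""
    return tuple(round(p * n / 100) for p in percentages)


for question, percentages in responses.items():
    assert sum(percentages) == 100  # each row should account for all respondents
    counts = approx_counts(percentages, N_RESPONDENTS)
    print(question, counts, "total:", sum(counts))
```

For the HPV question this gives roughly 82, 41, and 9 respondents; for the second question the rounded estimates total 131, which shows the limit of working back from rounded percentages.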
Figure 5. Number of responses regarding the awareness of human papillomavirus (HPV).
Figure 6. Survey responses regarding whether or not the rate of being affected by cervical cancer is becoming higher than before.
Figure 7. Total percentage of individuals who have undergone a biopsy test or another cervical cancer (uterus)-related test before.
Figure 8. The awareness level in rural and urban areas regarding cervical cancer.