Ishleen Kaur, M N Doja, Tanvir Ahmad, Musheer Ahmad, Amir Hussain, Ahmed Nadeem, Ahmed A Abd El-Latif.
Abstract
Ovarian cancer is the third most common gynecologic cancer worldwide, and patients with advanced disease face a high mortality rate. Survival estimates help clinicians and patients understand and prepare for likely outcomes. This study investigates survival predictors for cancer prognosis using data mining techniques. A dataset of 140 advanced ovarian cancer patients, drawing on three data profiles (clinical, treatment, and overall life quality), was collected and used to predict survival. Attributes from each profile were processed separately. Clinical data were cleaned for missing values and outliers. Treatment data were converted into treatment sequences with varying time intervals using sequence mining, to identify the treatments given to each patient. Finally, comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After preprocessing, the integrated dataset was classified using machine learning algorithms. The proposed integrated model achieved its highest accuracy, 76.4%, using an ensemble technique with sequential pattern mining and time intervals of 2 months between treatments. Thus, treatment sequences and, most importantly, life quality attributes contribute significantly to the survival prediction of cancer patients.
Year: 2021 PMID: 34992648 PMCID: PMC8727098 DOI: 10.1155/2021/6342226
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Methodology followed in study.
Figure 2: Treatment preprocessing.
Figure 3: Time intervals in treatment sequences.
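
The treatment attribute names that appear later in this record (for example `Chemotherapy_T5_CRS`) suggest that each pair of consecutive treatments is labeled with a time-interval bin. The following is a minimal sketch of one way such items could be built, not the authors' procedure. It assumes that `T<k>` denotes the k-th bin of fixed width (2 or 6 months), counted from 1; the binning rule, the function name `encode_transitions`, and the sample patient history are all hypothetical.

```python
from math import floor


def encode_transitions(events, bin_months=2):
    """Turn a treatment history into 'A_Tk_B' items.

    events: list of (treatment_name, month_since_diagnosis) tuples.
    Assumed binning rule (not from the paper): a gap of g months
    between consecutive treatments falls into bin
    k = floor(g / bin_months) + 1.
    """
    ordered = sorted(events, key=lambda e: e[1])  # order by time
    items = []
    for (prev, t_prev), (curr, t_curr) in zip(ordered, ordered[1:]):
        gap = t_curr - t_prev
        k = floor(gap / bin_months) + 1
        items.append(f"{prev}_T{k}_{curr}")
    return items


# Hypothetical patient: surgery at month 0, chemotherapy at month 1,
# CRS at month 10.
history = [("Surgery", 0), ("Chemotherapy", 1), ("CRS", 10)]
two_month_items = encode_transitions(history, bin_months=2)
six_month_items = encode_transitions(history, bin_months=6)
print(two_month_items)
print(six_month_items)
```

Under this assumed rule the same history yields different items at the two bin widths, which is one reason the 2-month and 6-month settings can select different attributes.
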
Dataset description.
| Category | Attribute | Description | Range/values |
|---|---|---|---|
| Clinical attributes | Age | Age at the time of diagnosis | 17–80 (median: 54) |
| Clinical attributes | CA-125 | CA-125 value at the time of diagnosis | 8.7–16301 (median: 929.13) |
| Clinical attributes | Ascites | Presence of ascites in the body | Yes: 114; No: 26 |
| Clinical attributes | Grade | Abnormality level of cancer cells | 2–4 (median: 3) |
| Clinical attributes | Stage | FIGO substage | 3–4 (median: 4) |
| Clinical attributes | Histology | Microscopic regularity of cancer cells | Clear cell: 1; Endometrioid: 4; Serous: 111; Small cell: 1; Germ cell: 1; Mucinous: 6; Poorly/undifferentiated: 13; Mixed: 3 |
| Treatment attributes | Treatment sequences | Frequent treatment sequences obtained after sequence mining | Surgery ⟶ chemotherapy; NACT ⟶ surgery; NACT ⟶ hormonal therapy; Chemotherapy ⟶ hormonal therapy; Surgery ⟶ hormonal therapy; Chemotherapy ⟶ CRS; Surgery ⟶ NACT |
| Life quality attributes | CCI | Charlson comorbidity index obtained using comorbidities | 2–9 (median: 3) |
| Life quality attributes | ECOG performance status | The general well-being of a patient | 1–5 (median: 2) |
| Class attribute | Outcome | Survival outcome after three years of cancer diagnosis | Yes: 59; No: 81 |
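
The CCI attribute collapses a patient's comorbidities into one weighted score. As a rough sketch of that idea, the snippet below sums per-condition weights using the commonly cited original Charlson weighting scheme. Which variant the study used (for example an age-adjusted or updated version) is not stated in this record, so the weight table, the condition keys, and the function name `charlson_index` are assumptions for illustration.

```python
# Weights from the commonly cited original Charlson scheme.
# Assumption: the study may have used a different variant.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "diabetes_with_end_organ_damage": 2,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}


def charlson_index(comorbidities):
    """Sum the weights of the distinct conditions recorded for a patient.

    Simplification: no hierarchy rules are applied, so a caller should
    not pass both the mild and the severe form of the same disease.
    """
    conditions = set(comorbidities)
    unknown = conditions - CHARLSON_WEIGHTS.keys()
    if unknown:
        raise ValueError(f"Unknown conditions: {sorted(unknown)}")
    return sum(CHARLSON_WEIGHTS[c] for c in conditions)


# Hypothetical patient with a solid tumor and uncomplicated diabetes.
example_score = charlson_index(["any_tumor", "diabetes"])
print(example_score)
```

Because every patient in this cohort has cancer, a tumor weight would apply to all of them under this scheme, which would fit the minimum CCI of 2 in the table above.
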
Figure 4: Data analysis with survival.
Experimental details.
| Model | Parameter settings |
|---|---|
| Bagging | Method = decision trees; max number of splits = 139; learning rate = 0.1 |
| Boosting | Ensemble method = AdaBoost; max number of splits = 20; learning rate = 0.1 |
| Random forest | Random number seed = 0; maximum depth = unlimited |
| XGBoost | Maximum number of trees = 100 |
| Logistic regression | — |
Classification results.
| Time interval | Classifier | Accuracy (%) | True positive rate or sensitivity | Specificity | Area under curve |
|---|---|---|---|---|---|
| 6 months | Bagging | 71.4 | | 0.61 | 0.80 |
| 6 months | Random forest | 70.7 | 0.64 | 0.8 | 0.72 |
| 6 months | Boosting | | 0.69 | | |
| 6 months | Logistic regression | 65.7 | 0.68 | 0.63 | 0.70 |
| 6 months | XGBoost | 71.42 | 0.71 | 0.64 | 0.78 |
| 2 months | Bagging | 74.3 | | 0.59 | 0.82 |
| 2 months | Random forest | 75.7 | 0.72 | | 0.82 |
| 2 months | Boosting | 76.4 | 0.80 | 0.71 | 0.85 |
| 2 months | Logistic regression | 67.1 | 0.64 | 0.71 | 0.70 |
| 2 months | XGBoost | 73.8 | 0.73 | 0.63 | 0.79 |
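
Accuracy, sensitivity, and specificity in the table above all derive from the same confusion matrix, so they constrain one another. The sketch below shows the arithmetic. The counts are hypothetical: the paper's confusion matrices are not given in this record. They were chosen only so that, with 140 patients split 81/59, the rounded results land near the figures reported for boosting at 2-month intervals (76.4% accuracy in the abstract, 0.80 sensitivity, 0.71 specificity). Treating the 81-patient class as the positive class is likewise an assumption.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity (TPR), and specificity (TNR)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of positives found
        "specificity": tn / (tn + fp),  # share of negatives found
    }


# Hypothetical counts: 81 positives (65 found, 16 missed) and
# 59 negatives (42 found, 17 misclassified).
metrics = confusion_metrics(tp=65, fn=16, tn=42, fp=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is the class-size-weighted average of sensitivity and specificity, so with unequal classes the three numbers cannot be read independently.
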
Figure 5: ROC curves for (a) boosting in 2 months' time interval; (b) boosting in 6 months' time interval.
Treatment attributes selected.
| Attributes (2 months' time interval) | Information gain | Attributes (6 months' time interval) | Information gain |
|---|---|---|---|
| Chemotherapy_T5_CRS | 0.0458 | Chemotherapy_T1_hormonal therapy | 0.0408 |
| Surgery_T5_chemotherapy | 0.0272 | NACT_T1_hormonal therapy | 0.008 |
| Chemotherapy_T4_CRS | 0.023 | | |
| NACT_T1_hormonal therapy | 0.01 | | |
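
Information gain, the ranking score in the table above, is the reduction in class entropy obtained by splitting the patients on an attribute. The snippet below is a minimal sketch of that textbook definition for a discrete attribute. It is not the tool the authors used, and the toy feature and label vectors are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting on each distinct value of the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical data: 1 means the treatment-sequence item occurred
# for that patient; labels are the three-year survival outcome.
has_item = [1, 1, 0, 0, 0, 0, 1, 0]
outcome = ["no", "no", "yes", "yes", "no", "yes", "no", "yes"]
gain = information_gain(has_item, outcome)
print(f"information gain: {gain:.4f}")
```

Gains as small as those in the table indicate that a single treatment-sequence item removes only a little uncertainty about the outcome on its own.
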
Classification results for each data profile.
| Data profile | Highest accuracy in % (classifier) |
|---|---|
| Clinical dataset | 61.4 (bagging) |
| Treatment dataset | 65 (boosting) |
| Life quality dataset | 71.4 (boosting) |
Figure 6: Without sequence treatment processing.
Comparison of results.
| Approach | Accuracy | Sensitivity or true positive rate | Specificity | Area under curve |
|---|---|---|---|---|
| Without sequence mining | 0.707 | 0.78 | 0.71 | 0.77 |
| 2-month time interval | 0.764 | 0.80 | 0.71 | 0.85 |
| 6-month time interval | 0.736 | 0.69 | | 0.81 |
Figure 7: Comparison of results.
Statistical significance.
| Approach | '2-month time interval' with 'without sequence mining' |
|---|---|
| t-value | 1.90429 |
| p-value | 0.036491 |
Comparison of techniques with previous literature.
| S. no. | Authors | Dataset | Type of cancer | Stage of cancer patients used | Type of attributes | Classification technique used | Results |
|---|---|---|---|---|---|---|---|
| 1. | Matsuo et al. | Clinical (768 patients) | Cervical cancer | All stages | (i) Clinical; (ii) Treatment | Deep learning and Cox proportional hazards model | Mean absolute error of 30.7 (deep learning), 43.6 (Cox proportional hazards regression) |
| 2. | Park et al. | SEER dataset | Breast cancer | All stages | (i) Clinical; (ii) Treatment | Subgroup mining | Effective rules generated |
| 3. | Simsek et al. | SEER dataset | Breast cancer | All stages | (i) Clinical | ANNs and logistic regression | 83.6% (ANNs), 82.9% (LR) for 5-year survival |
| 4. | Wang et al. | Clinical (1075 patients) | Lung cancer | All stages | (i) Clinical; (ii) Treatment; (iii) Comorbidities | Gaussian Bayesian network | |
| 5. | García-Laencina et al. | Clinical (399 patients) | Breast cancer | All stages | (i) Clinical; (ii) Treatment | KNN, logistic regression, decision trees, support vector machine | 81% (highest in KNN) |
| 6. | Toth et al. | National health database (28817 patients) | Colon cancer | All stages | (i) Treatment | Sequence mining | — |
| 7. | Koo et al. | Clinical (7267 patients) | Prostate cancer | All stages | (i) Clinical; (ii) Treatment | Artificial neural networks | 84.9% overall 5-year survival |
| 8. | Kate and Nadig | SEER dataset | Breast cancer | All stages | (i) Clinical; (ii) Treatment | Logistic regression, naïve Bayes, decision tree | 84.2% (naïve Bayes) |
| 9. | Malhotra et al. | Clinical (393 patients) | Glioblastoma | All stages | (i) Treatment; (ii) Genetic; (iii) Clinical | Sequence mining with statistical techniques | 85% (logistic regression) |
| 10. | Guo et al. | Clinical (5842 patients) | Cervical cancer | Stage IA1 to IIB2 | (i) Clinical | SVM, decision tree, random forest, ANN, etc. | 0.895 and 0.89 AUC (LightGBM and random forest) |
| 11. | Kalafi et al. | University Malaya medical cancer registry (8066 patients) | Breast cancer | All stages | (i) Clinical; (ii) Treatment | SVM, MLP (multilayer perceptron), decision trees, random forest | 88.2% accuracy (MLP) |
| 12. | Alabi et al. | SEER dataset | Oral cancer | All stages | (i) Clinical | Logistic regression, SVM, Bayes point, boosting, decision forest, decision jungle | 88.7% (boosting) |
| 13. | Bos et al. | Clinical (177 patients) | Oral cancer | All stages | (i) Clinical; (ii) Radiomic (MRI) | Logistic regression | 0.744 AUC |
| 14. | Hira et al. | TCGA (579 and 593 samples) | Ovarian cancer | All stages | (i) Multi-omics data | Deep learning | 93.2–95.5% and 87.1–95.7% accuracy |
| 15. | Proposed approach | Clinical (140 patients) | Ovarian cancer | Advanced stage | (i) Clinical; (ii) Treatment; (iii) Life quality (comorbidities + ECOG) | Sequence mining with ensemble | 76.4% accuracy and 0.85 AUC (boosting) |