Alexandros Laios, Angeliki Katsenou, Yong Sheng Tan, Racheal Johnson, Mohamed Otify, Angelika Kaufmann, Sarika Munot, Amudha Thangavelu, Richard Hutson, Tim Broadhead, Georgios Theophilou, David Nugent, Diederick De Jong.
Abstract
INTRODUCTION: Accurate prediction of patient prognosis can be especially useful for the selection of best treatment protocols. Machine Learning can serve this purpose by making predictions based upon generalizable clinical patterns embedded within learning datasets. We designed a study to support the feature selection for the 2-year prognostic period and compared the performance of several Machine Learning prediction algorithms for accurate 2-year prognosis estimation in advanced-stage high grade serous ovarian cancer (HGSOC) patients.
Keywords: Machine Learning; clinical factor analysis; cytoreduction; ovarian cancer; predictive factors; prognosis estimation
Year: 2021 PMID: 34693730 PMCID: PMC8549478 DOI: 10.1177/10732748211044678
Source DB: PubMed Journal: Cancer Control ISSN: 1073-2748 Impact factor: 3.302
Figure 1. Workflow showing the integration of ML algorithms to analyse a comprehensive resource of clinical, radiological and surgical data for the development of prognostic ovarian cancer models. The framework for building the predictive ML model comprised 5 steps.
Descriptive Statistics of the Advanced-HGSOC Cohort.
| Variables (n = 209) | Frequency | Percent (%) |
|---|---|---|
| Age, years, mean ± SD (range) | 64.6 ± 10.6 (41–85) | |
| Surgical Complexity Score (SCS) | | |
| Low (1–3) | 124 | 59.3 |
| Moderate (4–7) | 76 | 36.4 |
| High (8–12) | 9 | 4.3 |
| Radiological dissemination patterns | | |
| Intraperitoneal | 134 | 64.1 |
| Intraperitoneal and lymphatic | 59 | 28.2 |
| Intraperitoneal and haematogenous | 16 | 7.7 |
| Operation time, mean ± SD (min–max) | 177 ± 77 (45–485) | |
| Disease score | | |
| Pelvis (1) | 10 | 4.8 |
| Lower abdomen (2) | 187 | 89.5 |
| Upper abdomen (3) | 12 | 5.7 |
| Timing of surgery | | |
| PDS | 46 | 20 |
| IDS | 163 | 80 |
| Residual disease | | |
| R0 | 160 | 76.5 |
| R1 | 39 | 18.7 |
| R2 | 10 | 4.8 |
| Chemotherapy | | |
| Carboplatin+Taxol | 134 | 64.1 |
| Carboplatin+Taxol+Bevacizumab | 22 | 10.5 |
| Carboplatin+Taxol+PARP inhibitor | 25 | 12.0 |
| Carboplatin only | 22 | 10.5 |
| No | 6 | 2.9 |
Cox-Regression with Progression-Free Survival and Overall Survival as Outcomes.
| Variables | PFS univariate HR | PFS univariate P | PFS univariate 95% CI | PFS multivariate HR | PFS multivariate P | PFS multivariate 95% CI | OS univariate HR | OS univariate P | OS univariate 95% CI | OS multivariate HR | OS multivariate P | OS multivariate 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | .997 | .742 | .983–1.01 | .995 | .661 | .971–1.019 | 1.004 | .672 | .98–1.03 | .983 | .284 | .951–1.015 |
| ECOG performance status (PS) (0) | 1.000 | .133 | | 1.000 | .19 | | 1.000 | .002 | | 1.000 | .131 | |
| ECOG performance status (PS) (1) | 0.5 | .085 | .23–1.1 | .46 | .08 | .2–1.1 | .289 | .006 | .12–.69 | .39 | .061 | .14–1.04 |
| ECOG performance status (PS) (2) | .52 | .115 | .24–1.15 | .55 | .17 | .23–1.3 | .367 | .027 | .15–.91 | .53 | .23 | .2–1.49 |
| ECOG performance status (PS) (3) | .75 | 0.5 | .32–1.73 | .71 | .44 | .3–1.7 | .716 | .47 | .29–1.71 | .71 | 0.5 | .26–1.94 |
| IP dissemination (1) | 1.000 | .158 | | 1.000 | .188 | | 1.000 | .009 | | 1.000 | .007 | |
| IP dissemination (2) | 0.1 | .630 | .357–1.1 | .734 | .336 | .392–1.378 | 0.5 | .048 | .25–.99 | .556 | .127 | .262–1.182 |
| IP dissemination (3) | .49 | .811 | .44–1.48 | 1.035 | .919 | .534–2.01 | .957 | .904 | .46–1.95 | 1.226 | .603 | .570–2.637 |
| PDS | 1.43 | .084 | .95–2.14 | 1.610 | .039 | 1.026–2.529 | 1.648 | .087 | .93–2.92 | 2.008 | .039 | 1.035–3.894 |
| Residual disease (RD) | .671 | .034 | .464–.97 | .656 | .046 | .433–.992 | .422 | <.001 | .27–.66 | .437 | .001 | .264–.724 |
| Surgical complexity score (SCS)-low | 1.000 | .494 | | 1.000 | .852 | | 1.000 | .763 | | 1.000 | .825 | |
| Surgical complexity score (SCS)-intermediate | .98 | .958 | .452–2.12 | 1.102 | .865 | .359–3.387 | 1.173 | 1.173 | .36–3.75 | .596 | .535 | .116–3.061 |
| Surgical complexity score (SCS)-high | .795 | .574 | .358–1.76 | .926 | .957 | .377–2.43 | .99 | .99 | .29–3.27 | 1.226 | .572 | 1.67–2.685 |
| Operation time | 1.000 | .921 | .998–1.02 | 1.001 | .686 | .997–1.004 | .708 | .999 | .997–1.01 | .999 | .522 | .994–1.02 |
| Carboplatin and Taxol | 25.04 | .03 | 2.93–213.7 | 34.56 | .002 | 3.83–311.65 | 18.09 | .008 | 2.11–155.1 | 43.77 | .001 | 4.45–430.19 |
| Disease score (DS) (1) | 1.000 | .947 | | 1.000 | .810 | | 1.000 | .592 | | 1.000 | .516 | |
| Disease score (DS) (2) | .941 | .914 | .31–2.83 | 1.292 | .676 | .389–4.28 | .483 | .308 | .12–1.95 | .717 | .671 | .154–3.332 |
| Disease score (DS) (3) | .883 | .884 | .38–2.01 | .984 | .972 | .41–2.364 | .671 | .442 | .24–1.85 | .552 | .287 | .185–1.649 |
Figure 2. Cohort survival outcomes. Kaplan–Meier curves demonstrating (A) PFS and (B) OS analysed by complete and incomplete cytoreductive outcomes. (C) Stratification of residual disease according to intraperitoneal dissemination pattern (IDP). (D) Kaplan–Meier curves demonstrating OS according to IDP. Haematogenous metastases negatively affect OS, potentially highlighting the difficulty of achieving complete cytoreduction (P < .001).
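
The Kaplan–Meier curves in Figure 2 are step functions built from the product-limit estimator. The following is a minimal sketch of that estimator in plain Python, assuming right-censored data; the function name `kaplan_meier` and the follow-up values are hypothetical and are not taken from the study cohort.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up durations in any consistent unit (e.g. months)
    events : 1 if the event (progression or death) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove events and censored cases alike
    return curve


# Hypothetical follow-up data in months (illustration only).
times = [6, 9, 9, 14, 20, 24, 24, 30]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients never trigger a drop in the curve, but they do shrink the risk set, which is why later events produce larger steps.
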
Figure 3. Feature ranking graphs. For 2-year PFS: (A) univariate feature ranking for classification using chi-square tests; (B) multivariate feature ranking using the minimum redundancy maximum relevance (MRMR) algorithm. For 2-year OS: (C) univariate feature ranking for classification using chi-square tests; (D) multivariate feature ranking using the MRMR algorithm.
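
Univariate chi-square ranking, as in panels (A) and (C) of Figure 3, scores each categorical feature by how strongly its contingency table with the outcome departs from independence. The sketch below illustrates the idea in plain Python. It is not the authors' pipeline: the helper `chi_square_stat` is hypothetical, the feature names only echo the cohort table, and the values are invented.

```python
from collections import Counter


def chi_square_stat(feature, outcome):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(feature)
    row_totals = Counter(feature)
    col_totals = Counter(outcome)
    observed = Counter(zip(feature, outcome))
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed.get((r, c), 0) - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical toy data (illustration only).
outcome = [1, 1, 1, 1, 0, 0, 0, 0]  # e.g. 1 = alive at 2 years, 0 = not
features = {
    "residual_disease": ["R0", "R0", "R0", "R1", "R1", "R1", "R1", "R0"],
    "timing_of_surgery": ["PDS", "IDS", "PDS", "IDS", "PDS", "IDS", "PDS", "IDS"],
}

# Rank features by their chi-square statistic, largest first.
scores = {name: chi_square_stat(values, outcome)[0] for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Raw statistics are only comparable between features with the same degrees of freedom. When features have different numbers of levels, ranking by p-value (for example with `scipy.stats.chi2.sf(stat, dof)`) is the safer choice.
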
Figure 4. Example confusion matrices showing prediction accuracy for 2-year OS using (A) the SVM classifier with quadratic kernel (AUC: .66) and (B) the k-NN classifier (AUC: .63). The example shows that the prediction is more accurate for the negative class than for the positive class.
Predictive Accuracy of the ML Models and Comparisons with Conventional Logistic Regression for the 2-Year PFS and OS.

OS 2-years

| Model | Accuracy | AUC_P | AUC_N | Precision | Recall | F-score | G-score |
|---|---|---|---|---|---|---|---|
| SVM – Quadratic Kernel | 72.9% | .66 | .418 | .7182 | .9076 | .8018 | .8074 |
| SVM – Cubic Kernel | 68.2% | .58 | .41 | .7252 | .8719 | .7917 | .7951 |
| Logistic Regression | 66.5% | .59 | .413 | .7209 | .9169 | .8071 | .8130 |
| Gaussian Naïve Bayes | 66.0% | .63 | .463 | .6934 | .9879 | .8148 | .8276 |
| KNN – 5 neighbors | 71.8% | .63 | .443 | .7009 | .8656 | .7742 | .7787 |
| KNN – 10 neighbors | 69.4% | .62 | .433 | .7081 | .8350 | .7661 | .7688 |
| Ensemble – Bagged Trees | 68.8% | .60 | .432 | .7086 | .8425 | .7695 | .7725 |
| Ensemble – Subspace Discriminant | 71.8% | .61 | .411 | .7154 | .9270 | .8071 | .8141 |

PFS 2-years

| Model | Accuracy | AUC_P | AUC_N | Precision | Recall | F-score | G-score |
|---|---|---|---|---|---|---|---|
| SVM – Quadratic Kernel | 65.50% | .62 | .469 | .5160 | .8893 | .6530 | .6774 |
| SVM – Cubic Kernel | 58.20% | .52 | .485 | .4309 | .7286 | .5415 | .5603 |
| Logistic Regression | 56.50% | .58 | .468 | .5049 | .8478 | .6384 | .6619 |
| Gaussian Naïve Bayes | 58.80% | .55 | .49 | .4356 | .8373 | .5731 | .6039 |
| KNN – 5 neighbors | 57.60% | .54 | .452 | .4574 | .5834 | .5127 | .5165 |
| KNN – 10 neighbors | 56.18% | .58 | .446 | .4643 | .5947 | .5214 | .5254 |
| Ensemble – Bagged Trees | 55.30% | .52 | .494 | .4180 | .7497 | .5367 | .5598 |
| Ensemble – Subspace Discriminant | 59.40% | .58 | .475 | .5112 | .9096 | .6546 | .6819 |
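
The precision, recall, F-score and G-score columns can all be derived from a confusion matrix. This record does not define the G-score, so the sketch below assumes it is the geometric mean of precision and recall, which is consistent with the tabulated values to about three decimal places. The function name and the confusion counts are hypothetical.

```python
import math


def classification_scores(tp, fp, fn, tn):
    """Summary metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F-score: harmonic mean of precision and recall.
        "f_score": 2 * precision * recall / (precision + recall),
        # G-score: assumed here to be the geometric mean of precision and recall.
        "g_score": math.sqrt(precision * recall),
    }


# Hypothetical confusion counts (illustration only).
toy = classification_scores(tp=8, fp=2, fn=2, tn=8)
print(toy)

# Consistency check against one tabulated row (SVM, quadratic kernel, 2-year OS):
# precision .7182 and recall .9076 are reported with F-score .8018 and G-score .8074.
p, r = 0.7182, 0.9076
f_check = 2 * p * r / (p + r)
g_check = math.sqrt(p * r)
print(round(f_check, 4), round(g_check, 4))
```

Small discrepancies in the last digit are expected, because the check starts from precision and recall values that were already rounded.
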
Figure 5. Correlation heatmap of the features included in the ML models, computed using a variation of Pearson's R correlation coefficient. The colours in the heatmap represent the correlation coefficients. Only weak correlations amongst the features were observed.
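
A heatmap such as Figure 5 is a colour rendering of a pairwise correlation matrix. The caption refers to "a variation of Pearson's R" without specifying it, so the sketch below uses the standard Pearson coefficient as a stand-in. The helper names and the feature values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Standard Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one entry per pair of column names."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature values for five patients (illustration only).
columns = {
    "age": [55, 62, 70, 48, 66],
    "operation_time": [120, 200, 150, 240, 180],
    "scs": [2, 5, 3, 7, 4],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each cell of the resulting matrix maps directly to one coloured square of the heatmap; the diagonal is always 1 because every feature correlates perfectly with itself.
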
Figure 6. (A) Feature ranking for PFS based on the Lasso method. (B) Feature ranking for OS based on the Lasso method. (C) Feature ranking for PFS based on the Elastic Net method. (D) Feature ranking for OS based on the Elastic Net method.