Molly J Carroll, Katja Kaipio, Johanna Hynninen, Olli Carpen, Sampsa Hautaniemi, David Page, Pamela K Kreeger
Abstract
The platinum-free interval (PFI), the time between the last cycle of chemotherapy and recurrence, predicts overall survival in high-grade serous ovarian cancer (HGSOC). To identify secreted proteins associated with a shorter PFI, we used machine learning to predict the PFI from ascites composition. Ascites samples from patients with stage III/IV HGSOC treated with neoadjuvant chemotherapy (NACT) or primary debulking surgery (PDS) were screened for secreted proteins, and Lasso regression models were built to predict the PFI. Regularization reduced the number of analytes used in each model, and to minimize overfitting we analyzed model robustness. The resulting models used 26 analytes with a root-mean-square error (RMSE) of 19 days for the NACT cohort, and 16 analytes with an RMSE of 7 days for the PDS cohort. High concentrations of MMP-2 and EMMPRIN correlated with a shorter PFI in NACT patients, whereas high concentrations of uPA (urokinase) and MMP-3 correlated with a shorter PFI in PDS patients. Our results suggest that analysis of ascites may be useful for outcome prediction, and they identify factors in the tumor microenvironment that may lead to worse outcomes. Our approach of tuning for model stability, rather than for accuracy alone, may be applicable to other biomarker discovery tasks.
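The abstract does not specify the software or preprocessing the authors used. As a hedged illustration only, the pure-Python sketch below fits a Lasso model by cyclic coordinate descent on z-scored analyte columns and scores it with leave-one-out cross-validation RMSE, the fold scheme named in the figure captions. The objective scaling (1/2n, as in scikit-learn's `Lasso`), the function names, and the convergence settings are assumptions, not the authors' implementation.

```python
import math


def standardize(X):
    """Z-score each column; return the scaled rows plus the column means and SDs."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant columns
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return Z, means, sds


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, alpha, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)*||y - b0 - Z w||^2 + alpha*||w||_1."""
    Z, means, sds = standardize(X)
    n, p = len(Z), len(Z[0])
    b0 = sum(y) / n  # columns of Z are centred, so the intercept is the mean of y
    w = [0.0] * p
    resid = [yi - b0 for yi in y]  # residuals for the current w (all zeros)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in Z]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant feature: leave its coefficient at 0
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, alpha) / norm
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, w, means, sds


def lasso_predict(model, x):
    """Predict for one raw (unscaled) feature vector using the stored scaling."""
    b0, w, means, sds = model
    return b0 + sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, sds))


def loocv_rmse(X, y, alpha):
    """Leave-one-out RMSE: refit without sample k, then predict sample k."""
    sq_errors = []
    for k in range(len(X)):
        model = lasso_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], alpha)
        sq_errors.append((lasso_predict(model, X[k]) - y[k]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

In practice a library solver (for example scikit-learn's `Lasso`) would replace `lasso_fit`, and `alpha` would be chosen by scanning a grid. Larger `alpha` values zero out more coefficients, which is the lever for trading a little accuracy for a smaller, more stable analyte set.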
Keywords: Lasso; ascites; model stability; ovarian cancer; platinum-free interval; robustness
Year: 2022 PMID: 36077825 PMCID: PMC9454800 DOI: 10.3390/cancers14174291
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. NACT and PDS regimens. (A) Schematic of the NACT and PDS treatment regimens and of how the PFI is measured. (B) Box-and-whisker plot comparing PFIs in the NACT and PDS subcohorts; * indicates p = 0.0123 by Mann–Whitney test. (C) Heatmap of analyte concentrations normalized by Yeo–Johnson power transformation, with patients grouped by treatment regimen. (D) Similarity matrix of analyte concentrations based on Pearson correlation.
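Panels C and D rest on two standard transforms. The minimal sketch below assumes a fixed λ per analyte: it implements the Yeo–Johnson power transform (which, unlike Box–Cox, accepts zero and negative values) and a Pearson correlation matrix between analyte columns. Libraries usually estimate λ per variable by maximum likelihood (for example scikit-learn's `PowerTransformer`); how λ was chosen for this heatmap is not stated, and the function names here are illustrative.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of a single value for a fixed lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)  # lambda = 0 branch: log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-y)  # lambda = 2 branch for negative values
    return -(((1 - y) ** (2 - lam)) - 1) / (2 - lam)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def similarity_matrix(columns):
    """Pairwise Pearson correlations between analyte columns (one list per analyte)."""
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]
```

With λ = 1 the transform is the identity, so the parameter effectively controls how strongly right-skewed concentrations are compressed before plotting.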
Summary of clinical data for NACT and PDS cohorts.
| | NACT | PDS | Statistical Comparison |
|---|---|---|---|
| N | 25 | 14 | |
| Age (years) | 65.4 ± 7.0 | 66.1 ± 8.5 | |
| Stage III | 14 | 6 | |
| Stage IV | 11 | 8 | |
| 0–10 mm residual ¹ | 16 | 9 | |
| >10 mm residual | 5 | 5 | |
| <1 L ascites ² | 5 | 4 | |
| CA-125 (serum) | 1900 ± 2900 | 2800 ± 4600 | |
| Total chemotherapy cycles ³ | 6.1 ± 2.5 | 6.8 ± 1.4 | |
| Carboplatin ⁴ | 4 | 1 | |
| Bevacizumab maintenance | 4 | 3 | |

¹ Residual information not available for all NACT patients.
² Estimated at time of diagnosis; not available for 2 patients in each category.
³ For NACT, three cycles were completed prior to surgery; one patient had four cycles prior to surgery.
⁴ Most patients received carboplatin with paclitaxel; values here are for those who received carboplatin alone or carboplatin with doxorubicin.
Figure 2. Regularized regression predicts PFI from analyte levels in the ascites of the NACT cohort. Coefficients for each analyte feature in the optimal Lasso model trained on all NACT patients. The cross-validation analysis includes the mean and standard deviation of each coefficient and the number of leave-one-out cross-validation folds in which each analyte received a non-zero coefficient in the optimal NACT model. Positive coefficients correspond to a longer PFI.
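The per-fold summary described in the Figure 2 caption (mean, standard deviation, and non-zero count of each coefficient across leave-one-out folds) can be computed as in this sketch. It works under assumptions: the sample (n − 1) standard deviation and the `stable_features` threshold of 0.8 are hypothetical choices, not values reported for this study, and the toy coefficients are made up.

```python
import math


def coefficient_stability(fold_coefs, names):
    """Per-feature mean, sample SD and non-zero count of coefficients across folds.

    fold_coefs: one list of coefficients per cross-validation fold, in the same
    feature order as `names`.
    """
    n_folds = len(fold_coefs)
    summary = {}
    for j, name in enumerate(names):
        vals = [fold[j] for fold in fold_coefs]
        mean = sum(vals) / n_folds
        if n_folds > 1:
            sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n_folds - 1))
        else:
            sd = 0.0
        nonzero = sum(1 for v in vals if v != 0.0)  # folds where Lasso kept the feature
        summary[name] = (mean, sd, nonzero)
    return summary


def stable_features(summary, n_folds, min_fraction=0.8):
    """Hypothetical filter: keep features selected in at least min_fraction of folds."""
    return [name for name, (_, _, count) in summary.items()
            if count / n_folds >= min_fraction]


# Toy, made-up coefficients for three folds (not data from this study).
folds = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.5]]
summary = coefficient_stability(folds, ["analyte_A", "analyte_B"])
print(stable_features(summary, len(folds)))  # ['analyte_A']
```

A feature with a large mean coefficient but a low inclusion count is a warning sign of overfitting, which is why inclusion counts are reported alongside the coefficients.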
Figure 3. Reducing the number of features used in the NACT model produces a more robust model. Lasso coefficient values and cross-validation error analysis for the NACT model with RMSE = 18.6 days. The cross-validation analysis includes the mean coefficient value, standard deviation, and inclusion count for each feature retained in the full model across the 24 leave-one-out cross-validation folds. Positive coefficients correspond to a longer PFI.
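One way to see why stronger regularization yields fewer analytes is the standard Lasso coordinate update. The equations below assume the 1/(2n)-scaled objective (as used, for example, by scikit-learn) and standardized features with (1/n) Σᵢ x_ij² = 1. The captions state neither the exact scaling nor the software used, nor whether the reported RMSE is computed over leave-one-out predictions; the last equation shows that definition as an assumption.

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\beta\bigr)^2 + \alpha\,\lVert\beta\rVert_1
$$

$$
\rho_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}\Bigl(y_i - \beta_0 - \sum_{k\neq j} x_{ik}\beta_k\Bigr),\qquad
\beta_j \leftarrow \operatorname{sign}(\rho_j)\,\max\bigl(|\rho_j| - \alpha,\; 0\bigr)
$$

$$
\mathrm{RMSE}_{\mathrm{LOO}} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\bigl(y_k - \hat{y}_k^{(-k)}\bigr)^2}
$$

Because the update is exactly zero whenever |ρ_j| ≤ α, raising α drops every analyte whose association with the current residual is weak. Here ŷ_k^(−k) denotes the prediction for patient k from the model fit without patient k.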
Figure 4. Regularized regression predicts PFI from analyte levels in the ascites of the PDS cohort. Lasso coefficient values and cross-validation error analysis for the optimal PDS model (RMSE = 6.65 days). The cross-validation analysis includes the mean coefficient value, standard deviation, and inclusion count for each feature retained in the full model across the 14 leave-one-out cross-validation folds. Positive coefficients correspond to a longer PFI.