Wim Verleyen, Simon P Langdon, Dana Faratian, David J Harrison, V Anne Smith.
Abstract
Current clinical practice in cancer stratifies patients based on tumour histology to determine prognosis. Molecular profiling has been hailed as the path towards personalised care, but molecular data are still typically analysed independently of known clinical information. Conventional clinical and histopathological data, if used, are added only to improve a molecular prediction, placing a high burden upon molecular data to be informative in isolation. Here, we develop a novel Monte Carlo analysis to evaluate the usefulness of data assemblages. We applied our analysis to varying assemblages of clinical data and molecular data in an ovarian cancer dataset, evaluating their ability to discriminate one-year progression-free survival (PFS) and three-year overall survival (OS). We found that Cox proportional hazard regression models based on both data types together provided greater discriminative ability than either alone. In particular, we show that proteomics data assemblages that alone were uninformative (p = 0.245 for PFS, p = 0.526 for OS) became informative when combined with clinical information (p = 0.022 for PFS, p = 0.048 for OS). Thus, concurrent analysis of clinical and molecular data enables exploitation of prognosis-relevant information that may not be accessible from independent analysis of these data types.
Year: 2015 PMID: 26503707 PMCID: PMC4622081 DOI: 10.1038/srep15563
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Added value of proteomics for predicting progression-free survival.
(a–c) Example images representing proteomics, a fluorescence AQUA image (a); clinicopathology, a histological slice (b); and the combination (c). (d) C-index of Cox proportional hazards regression models for proteomics data only, clinicopathological data only, and combined proteomics and clinicopathological data. (e–g) Corresponding Monte Carlo (MC) analyses showing histograms of the c-index from 10,000 randomised datasets; the value of the actual analysis is highlighted and its p-value indicated (* = significant); histogram bars are coloured green below the actual value and pink above. (h–k) As for (d–g) after LASSO feature selection; selected features are shown below the MC histograms in order of decreasing hazard ratio. Note that only proteomics data were randomised in (g) and (k).
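The two quantities plotted in panels (d–k) can be illustrated with a minimal sketch. It assumes the c-index is Harrell's concordance index and that the Monte Carlo p-value is the fraction of randomised c-indices at or above the actual value; neither detail is stated in the caption. The function names `concordance_index` and `monte_carlo_p_value` are hypothetical, and the risk scores would in practice come from a fitted Cox model, which is not shown here.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's c-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    scores : predicted risk; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: pairs with tied times are skipped, and a pair is
        # comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def monte_carlo_p_value(observed, null_values):
    """Assumed definition: share of randomised values >= the observed value."""
    return sum(v >= observed for v in null_values) / len(null_values)
```

With 10,000 randomised datasets, `null_values` would hold 10,000 c-indices, and a p-value below 0.05 would mean that fewer than 500 of them reached the c-index of the actual data.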
Clinicopathological and proteomic measures.
| | | pERK | x | |
| age | continuous (days) | pβCatenin | x | |
| | stratified (< or > 50 years) | pSTAT3 (Ser727) | x | |
| histopathology | papillary serous | pSTAT3 (Ser705) | x | |
| | clear cell | pNFkB | x | |
| | endometrioid | pRB | x | |
| | mixed histology | pH2AX | x | |
| | mucinous | pBRCA1 | x | |
| | adenocarcinoma | p-p53 | x | |
| stage | stage 1 | Ki67 | x | |
| | stage 2 | phosphohistone H3 (pHH3) | x | |
| | stage 3 | cleaved caspase-3 | x | |
| | stage 4 | WT1 | x | |
| regimen | platinum | Snail | x | |
| | platinum + taxane | Slug | x | |
| | | E-cadherin | x | |
| progression-free survival | continuous (days) | estrogen receptor-β 1 (ERβ1) | x | x |
| overall survival | continuous (days) | estrogen receptor-β 2 (ERβ2) | x | x |
Figure 2. Shuffling methodology for novel Monte Carlo analysis.
(a) Graphical representation of a dataset with patient outcome in the leftmost column and the remainder of the columns representing predictor variables; each row is coloured uniquely in a gradient to represent data from an individual patient for illustrative purposes. (b) For the Monte Carlo analysis, the values of each variable are shuffled, randomising that single variable with respect to patient outcome; this is carried out independently for each variable such that correspondence both between a variable and outcome, and among variables, is broken. Note this differs from standard Monte Carlo analyses, which would shuffle only patient outcome with respect to predictors, thus maintaining correspondence among variables. (c) The shuffling procedure can also be performed on a subset of variables, to evaluate only the added value of these variables.
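The shuffling in panels (b) and (c) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the data layout (one dict per patient), the function names `shuffle_columns` and `null_distribution`, and the `statistic` callback are all assumptions made for the example. Passing every predictor column reproduces panel (b); passing only a subset, for example the proteomic columns, reproduces panel (c).

```python
import random


def shuffle_columns(rows, columns, rng):
    """Return a copy of rows with each listed column permuted independently.

    rows    : list of dicts, one per patient (hypothetical layout)
    columns : predictor names to randomise; unlisted columns, including
              patient outcome, keep their original row order
    rng     : random.Random instance, so that runs are reproducible
    """
    shuffled = [dict(row) for row in rows]  # leave the input untouched
    for col in columns:
        values = [row[col] for row in rows]
        # A separate permutation per column breaks the link both between
        # this variable and outcome and between this variable and the others.
        rng.shuffle(values)
        for row, value in zip(shuffled, values):
            row[col] = value
    return shuffled


def null_distribution(rows, columns, statistic, n_iter, seed=0):
    """Evaluate `statistic` on n_iter independently shuffled datasets."""
    rng = random.Random(seed)
    return [statistic(shuffle_columns(rows, columns, rng)) for _ in range(n_iter)]
```

Here `statistic` stands for any function that maps a dataset to a single number, such as fitting a Cox model and returning its c-index. A standard analysis that shuffles only patient outcome would instead pass the outcome column alone, which keeps the predictors aligned with one another.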