Keefe Murphy1, Brendan T Murphy1, Susie Boyce1,2, Louise Flynn3, Sarah Gilgunn4,5, Colm J O'Rourke6, Cathy Rooney2, Henning Stöckmann7, Anna L Walsh6, Stephen Finn3, Richard J O'Kennedy4,5, John O'Leary3, Stephen R Pennington2, Antoinette S Perry6,8, Pauline M Rudd7, Radka Saldova7, Orla Sheils3, Denis C Shields2, R William Watson2.
Abstract
Classifying indolent prostate cancer represents a significant clinical challenge. We investigated whether integrating data from different omic platforms could identify a biomarker panel with improved performance compared to individual platforms alone. DNA methylation, transcripts, protein and glycosylation biomarkers were assessed in a single cohort of patients treated by radical prostatectomy. Novel multiblock statistical data integration approaches were used to deal with missing data, with models fitted via stepwise multinomial logistic regression or LASSO. After applying leave-one-out cross-validation to each model, the probabilistic predictions of disease type for each individual panel were aggregated to improve prediction accuracy using all available information for a given patient. Through assessment of three performance parameters (area under the curve (AUC) values, calibration and decision curve analysis), the study identified an integrated biomarker panel which predicts disease type with a high level of accuracy, with a Multi AUC value of 0.91 (0.89, 0.94) and an Ordinal C-Index (ORC) value of 0.94 (0.91, 0.96), which was significantly improved compared to the values for the clinical panel alone of 0.67 (0.62, 0.72) Multi AUC and 0.72 (0.67, 0.78) ORC. Biomarker integration across different omic platforms significantly improves prediction accuracy. We provide a novel multiplatform approach for the analysis, determination and performance assessment of novel panels which can be applied to other diseases. With further refinement and validation, this panel could form a tool to help inform appropriate treatment strategies impacting on patient outcome in early-stage prostate cancer.
Keywords: LASSO; biomarkers; indolent; integration; omics; prostate cancer
Year: 2018 PMID: 29927052 PMCID: PMC6120220 DOI: 10.1002/1878-0261.12348
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 6.603
Variables retained in biomarker panel
| Data set | No. variables | No. retained | Biomarker panels |
|---|---|---|---|
| Clinical data | 4 | 4 | bxGS, Age, PSA, DRE |
| DNA methylation | 17 | 4 | GSTP1, CTNNA2., MAGPIE.1, LXN |
| Transcripts 1 | 12 | 4 | miR.663a., miR.20a.5p, miR.221.3p, miR.143.3p |
| Transcripts 2 | 25 | 9 | miR.330, miR.222, miR.101, miR.16.1, ALCAM, FAM49B, IGFBP3, AMACR, SFRP4 |
| Proteomics | 91 | 27 | DYVSQFEGSALGK and LLDNWDSVTSTFSK (APOA1), EPCVESLVSQYFQTVTDYGK (APOA2), IDQNVEELK (APOA4), NPNLPPETVDSLK (APOD), WVQTLSEQVQEELLSSQVTQELR and VQAAVGTSAAPVPSDNH (APOE), DLLLPWPDLR and VAAGAFQGLR (LRG1), ITCAEEGWSPTPK and TGDIVEFVCK (CFHR2), SDLAVPSELALLK and AAIPSALDTNSSK (LGALS3BP), SVLGQLGITK (SERPINA1), ADLSGITGAR (SERPINA3), NEDSLVFVQTDK (A2M), DFDFVPPVVR (C3), TEHYEEQIEAFK (C9), ELGCGCAASGTPSGILYEPPAEK (CD5L), EDSLEAGLPLQVR (CHGA), TTLSGAPCQPWASEATYR (F12), YGIDWASGR (FCN3), LAAIAESGVER (PSMB6), ETLLQDFR (AMBP), WEAERPVYVQRP (AZGP1), NVPLPVIAELPPK (IGHM), EAVPEPVLLSR (TGFB1) |
| Glycosylation | 50 | 13 | A2[3]G1S[3]1, A2G2S[3]1, A1, A2BG2S[3,6]2, A4F1G3S3, M6 D3, A2BG2S[6]1, A2[6]BG1, A4G4S[3,3,3,3]4, FA2BG2, A3G3S[3,3,6]3, FA2[6]BG1, FA2G2S[3,6]2 |
Table includes within each dataset the number of variables measured, the number retained by the stepwise selection procedure, and the final biomarker panels used by the individual logistic regression models. For the Proteomics variables, the peptide sequence is included with the gene name in brackets. See Table S3 for further details on function and relevance to PCa.
Patient numbers within each dataset
| Data set | Indolent, n (%) | Significant, n (%) | Aggressive, n (%) | Total |
|---|---|---|---|---|
| Clinical data | 46 (29.11) | 56 (35.44) | 56 (35.44) | 158 |
| DNA methylation | 23 (21.90) | 43 (40.95) | 39 (37.14) | 105 |
| Transcripts 1 | 20 (21.98) | 39 (42.86) | 32 (35.16) | 91 |
| Transcripts 2 | 21 (18.26) | 48 (41.74) | 46 (40.00) | 115 |
| Proteomics | 34 (29.31) | 35 (30.17) | 47 (40.52) | 116 |
| Glycosylation | 41 (35.04) | 32 (27.35) | 44 (37.61) | 117 |
| Full information | 10 (21.74) | 17 (36.96) | 19 (41.30) | 46 |
Table shows the breakdown of sample sizes and indolent, significant and aggressive disease type prevalence across the partially overlapping datasets; the clinical cohort, each of the five datasets and the subset of patients with full clinical and biomarker information.
Figure 1. ROC curves for the individual clinical and biomarker panel models and associated AUC values. (A) Indolent vs. nonindolent patients, (B) significant vs. nonsignificant patients, (C) aggressive vs. nonaggressive patients.
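Each panel in Figure 1 is a one-vs-rest comparison, so the corresponding AUC can be read as the probability that a randomly chosen patient of the named disease type receives a higher predicted probability than a randomly chosen patient of any other type. The following is a minimal sketch of that calculation using the Mann-Whitney formulation; the function name and the six example patients are hypothetical and are not taken from the study data.

```python
def one_vs_rest_auc(scores, labels, positive):
    """AUC for `positive` vs. all other classes via the Mann-Whitney statistic."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly; ties count one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of indolent disease for six patients.
labels = ["indolent", "indolent", "significant", "aggressive", "significant", "indolent"]
p_indolent = [0.80, 0.55, 0.60, 0.10, 0.30, 0.70]

auc = one_vs_rest_auc(p_indolent, labels, "indolent")
print(f"Indolent vs. nonindolent AUC: {auc:.3f}")
```

In this toy example eight of the nine indolent/nonindolent pairs are ranked correctly, giving an AUC of 8/9.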
Figure 2. Performance assessment of the IGP model via ROC curve analysis (A), calibration curve analysis (C) and decision curve analysis (D). The AUCs in (A) can be compared to those from the ROC curves in (B) for the logistic regression model based on the four current clinical parameters of age, biopsy Gleason score, DRE and PSA. Solid colours correspond to indolent, significant and aggressive in all four panels. The horizontal location of the dotted confidence interval error bars in (C) relates to the different bins used in the calibration curve analysis for the indolent, significant and aggressive probabilities.
Final Multi AUC and ORC values for the clinical and integrated geometric pooling model
| Type | Specificity | Sensitivity | AUC | Multi AUC | ORC |
|---|---|---|---|---|---|
| Clinical Model | | | | | |
| Indolent | 0.92 (0.85, 0.97) | 0.55 (0.43, 0.66) | 0.78 (0.70, 0.85) | 0.67 (0.62, 0.72) | 0.72 (0.67, 0.78) |
| Significant | 0.67 (0.58, 0.75) | 0.45 (0.27, 0.64) | 0.47 (0.37, 0.57) | | |
| Aggressive | 0.79 (0.70, 0.86) | 0.62 (0.48, 0.75) | 0.78 (0.70, 0.86) | | |
| Integrated Geometric Pooling Model | | | | | |
| Indolent | 0.92 (0.84, 0.96) | 0.84 (0.70, 0.93) | 0.95 (0.91, 0.98) | 0.91 (0.89, 0.94) | 0.94 (0.91, 0.96) |
| Significant | 0.82 (0.73, 0.89) | 0.67 (0.53, 0.79) | 0.87 (0.82, 0.92) | | |
| Aggressive | 0.87 (0.79, 0.93) | 0.75 (0.62, 0.86) | 0.93 (0.89, 0.97) | | |
The table describes the performance comparison of the clinical model against the IGP model, using various numeric metrics, with 95% Confidence Intervals in parentheses.
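The record does not spell out the pooling formula behind the IGP model, so the following is only an illustrative sketch of one common form of geometric pooling: an equal-weight geometric mean of the class probabilities from whichever panels are available for a patient, renormalised to sum to one. The equal weighting, the function name and the example probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import math

CLASSES = ("indolent", "significant", "aggressive")


def geometric_pool(panel_probs):
    """Pool class probabilities from the panels available for one patient.

    panel_probs: list of dicts mapping class name -> predicted probability.
    Panels with no measurement for this patient are simply left out of the list.
    """
    if not panel_probs:
        raise ValueError("at least one panel prediction is required")
    eps = 1e-12  # guards against log(0) when a panel assigns zero probability
    # Equal-weight geometric mean per class, computed in log space (assumed weighting).
    log_mean = {
        c: sum(math.log(max(p[c], eps)) for p in panel_probs) / len(panel_probs)
        for c in CLASSES
    }
    unnormalised = {c: math.exp(v) for c, v in log_mean.items()}
    total = sum(unnormalised.values())
    # Renormalise so the pooled probabilities sum to one.
    return {c: v / total for c, v in unnormalised.items()}


# Hypothetical patient with predictions from only two of the panels.
clinical = {"indolent": 0.5, "significant": 0.3, "aggressive": 0.2}
proteomics = {"indolent": 0.8, "significant": 0.1, "aggressive": 0.1}

pooled = geometric_pool([clinical, proteomics])
print(max(pooled, key=pooled.get), {c: round(p, 3) for c, p in pooled.items()})
```

Because the pool is taken only over the panels present, a patient with a single panel keeps that panel's prediction unchanged, while a patient with several panels gets a consensus that penalises any class to which one panel gives a low probability.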
Figure 3. Misclassification cobweb generated from the misclassification ratios of the confusion ratio matrix, which is the column‐normalised version of the confusion matrix. The cobweb depicts the misclassification errors for each of the six (|T|² − |T|) types of classification error that can be made in this |T| = 3‐class problem. Three 6‐sided polygons map the misclassification rates of the confusion ratio matrices resulting from (a) random assignment, whereby a patient is equally likely to be indolent, significant or aggressive, with probability 1/|T| = 1/3 (dotted orange); (b) the clinical model (purple); and (c) the IGP model (green).
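The quantities plotted in the cobweb can be sketched as follows: build the 3 × 3 confusion matrix, divide each column by its total, and read off the six off-diagonal entries. This is a minimal sketch that assumes columns index the true disease type and rows the predicted type (the caption states only that the matrix is column‐normalised); the function names and the six example patients are hypothetical.

```python
CLASSES = ("indolent", "significant", "aggressive")


def confusion_ratio_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Column-normalised confusion matrix (rows: predicted, columns: true; assumed layout)."""
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, predicted_labels):
        counts[idx[p]][idx[t]] += 1
    col_totals = [sum(counts[r][c] for r in range(k)) for c in range(k)]
    # Divide each column by its total; an empty column is left as zeros.
    return [
        [counts[r][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(k)]
        for r in range(k)
    ]


def misclassification_ratios(ratio_matrix, classes=CLASSES):
    """The |T|^2 - |T| off-diagonal entries, keyed by (true type, predicted type)."""
    k = len(classes)
    return {
        (classes[c], classes[r]): ratio_matrix[r][c]
        for c in range(k)
        for r in range(k)
        if r != c
    }


# Hypothetical true and predicted disease types for six patients.
true = ["indolent", "indolent", "significant", "significant", "aggressive", "aggressive"]
pred = ["indolent", "significant", "significant", "significant", "aggressive", "indolent"]

ratios = misclassification_ratios(confusion_ratio_matrix(true, pred))
for (true_type, predicted_type), ratio in ratios.items():
    print(f"{true_type} classified as {predicted_type}: {ratio:.2f}")
```

Under the random-assignment baseline described in the caption, every one of these six ratios would equal 1/3, which is the reference polygon against which the clinical and IGP polygons are compared.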