Jesus Gonzalez Bosquet, Andreea M Newtson, Rebecca K Chung, Kristina W Thiel, Timothy Ginader, Michael J Goodheart, Kimberly K Leslie, Brian J Smith.
Abstract
BACKGROUND: Nearly one-third of serous ovarian cancer (OVCA) patients do not respond to initial treatment with surgery and chemotherapy and die within one year of diagnosis. If patients who are unlikely to respond to current standard therapy could be identified up front, enhanced tumor analyses and treatment regimens could potentially be offered. Using the Cancer Genome Atlas (TCGA) serous OVCA database, we previously identified a robust molecular signature of 422 genes associated with chemo-response. Our objective was to test whether this signature is an accurate and sensitive predictor of chemo-response in serous OVCA.
Keywords: Chemo-response; Data integration; Individualized treatment; Ovarian cancer; Prediction model
Year: 2016 PMID: 27756408 PMCID: PMC5070116 DOI: 10.1186/s12943-016-0548-9
Source DB: PubMed Journal: Mol Cancer ISSN: 1476-4598 Impact factor: 27.401
Clinical data from TCGA patients
| | CR | IR | |
|---|---|---|---|
| Number of Patients | 292 | 158 | |
| Age (Avg.) | 60 | 59.6 | N.S. |
| Grade | | | N.S. |
| Grade 1 | 4 | 1 | |
| Grade 2 | 35 | 18 | |
| Grade 3 | 246 | 135 | |
| Stage | | | |
| Stage I | 10 | 3 | |
| Stage II | 19 | 1 | |
| Stage III | 224 | 123 | |
| Stage IV | 39 | 29 | |
| Surgical outcome | | | N.S. |
| Optimal (<1 cm residual) | 207 | 92 | |
| Suboptimal (>1 cm residual) | 52 | 57 | |
| Optimal Treatment | | | |
| Optimal (Surgery + 6 cycles) | 179 | 66 | |
| Suboptimal | 113 | 92 | |
*Multivariable analysis of TCGA clinical variables: Only FIGO stage and optimal treatment (including optimal surgery AND 6 cycles of platinum-based chemotherapy) were independently associated with chemo-response in serous OVCA
Fig. 1 Survivorship by chemo-response in serous OVCA TCGA data. Chemo-response was the most significant factor in the multivariable analysis for survival. Median survival of complete responders (CR) was 2 years longer than that of incomplete responders (IR)
Publicly available GEO datasets of patients with serous OVCA used for validation/replication of prediction models
| Repository accession | CR (patients) | IR (patients) | Study name | Reference |
|---|---|---|---|---|
| GEO accession number | | | | |
| GSE23554 | 90 | 37 | MCC | Marchion, 2011 |
| GSE9891 | 185 | 55 | Australia | Tothill, 2008 |
| GSE28739 | 20 | 30 | Trinh | Trinh, 2011 |
| GSE17260 | 93 | 17 | Yoshihara | Yoshihara, 2010 |
| GSE30161 | 32 | 23 | Ferriss | Ferriss, 2012 |
| EMBL-EBI accession number | | | | |
| E-MTAB-386 | 64 | 41 | Bentink | Bentink, 2012 |
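The proportion of complete responders differs considerably between cohorts, which matters when comparing prediction performance across them. A minimal sketch of that arithmetic follows, using only the CR and IR counts listed in the tables above. The function names are illustrative and are not taken from the study.

```python
# (CR, IR) patient counts copied from the tables above.
COHORTS = {
    "TCGA": (292, 158),
    "GSE23554": (90, 37),
    "GSE9891": (185, 55),
    "GSE28739": (20, 30),
    "GSE17260": (93, 17),
    "GSE30161": (32, 23),
    "E-MTAB-386": (64, 41),
}


def complete_response_rate(cr, ir):
    """Fraction of patients in a cohort classified as complete responders."""
    return cr / (cr + ir)


def summarize(cohorts):
    """Map each cohort name to its CR fraction, rounded to three decimals."""
    return {
        name: round(complete_response_rate(cr, ir), 3)
        for name, (cr, ir) in cohorts.items()
    }


if __name__ == "__main__":
    for name, rate in summarize(COHORTS).items():
        print(f"{name}: {rate:.1%} complete responders")
```

The CR fraction ranges from 40% (GSE28739) to roughly 85% (GSE17260), so the class balance of each replication cohort is far from uniform.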
Fig. 2 Area under the ROC curve (AUC) for the 422-gene prediction models. a Box plot representations of the AUC for the complete model by different methods. RF: Random forest; Lasso (least absolute shrinkage and selection operator); Elastic Net; PAM: Prediction analysis for microarrays; DDA: Diagonal discriminant analysis; PLS-LR: Partial least squares - Logistic regression; PLR: Penalized logistic regression; PLS: Partial least squares; PLS-RF: Partial least squares - Random forest. b Prediction performance measured in AUC, with their respective standard error, and confidence intervals (CI) by different methods using all 422 genes. PAM: Prediction analysis for microarrays; PLS: Partial least squares; Lasso: least absolute shrinkage and selection operator
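As an illustration of how an AUC, its standard error, and a confidence interval relate to each other, the sketch below computes the AUC as a Mann-Whitney statistic and approximates its standard error with the Hanley-McNeil formula. This is an assumption made for illustration: the record does not state how the standard errors or CIs were obtained, and the scores in the usage example are invented.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a
    random negative case; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of the AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(max(var, 0.0))


def auc_with_ci(pos_scores, neg_scores, z=1.96):
    """Return (AUC, SE, (lower, upper)) with a normal-approximation CI
    clipped to the [0, 1] range."""
    auc = auc_mann_whitney(pos_scores, neg_scores)
    se = hanley_mcneil_se(auc, len(pos_scores), len(neg_scores))
    return auc, se, (max(0.0, auc - z * se), min(1.0, auc + z * se))


if __name__ == "__main__":
    # Hypothetical predicted scores, not data from the study.
    responders = [0.9, 0.8, 0.4]
    non_responders = [0.7, 0.3, 0.2, 0.1]
    print(auc_with_ci(responders, non_responders))
```

With small validation cohorts the interval becomes wide, which is one reason overlapping CIs between datasets are weak evidence on their own.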
AUCs and their CI comparing the 422-gene prediction model and clinical prediction models
PAM: Prediction analysis for microarrays; PLS: Partial least squares; Lasso: least absolute shrinkage and selection operator; NA: not available (not computable)
CI of clinical prediction model WITH significant overlap with TCGA 422-gene prediction model CI: in red
Fig. 3 AUC for 34-gene prediction models. AUCs and CIs for the prediction methods using the most relevant genes in the prediction model/classifier: RF: Random forest; Lasso (least absolute shrinkage and selection operator); Elastic Net; PAM: Prediction analysis for microarrays; DDA: Diagonal discriminant analysis; PLS-LR: Partial least squares - Logistic regression; PLR: Penalized logistic regression; PLS: Partial least squares; PLS-RF: Partial least squares - Random forest
Fig. 4 Origin of 34 genes selected in the optimized prediction model. Initially, genes were included in the 422-gene signature because of their differential gene expression (red), miRNA expression (pink), DNA methylation (blue), or copy number variation (green) between CR and IR. Some genes had more than one biological difference
Fig. 5 Genomic position of 34 genes selected for the optimized prediction model. a The 34 most informative genes from the prediction model and their chromosomal location: chr: number of the chromosome where the gene is located; start: start of the gene position; length: length of the gene in base pairs (bp). The human genome version was hg19. b Circular layout with matrix depiction of different biological variables. From external to internal: Chromosome bands: circular representation of all chromosomes (centromere is in red); d Differential gene expression between complete and incomplete responders (CR/IR; red is over-expressed, green is under-expressed); c Differential DNA methylation between CR and IR (CR/IR; blue is hypomethylated, orange is hypermethylated); b Differential miRNA expression between CR and IR (CR/IR; red is over-expressed, grey is under-expressed); a Gene copy number variation between CR and IR (copy gain is red, green is copy loss). The order of genes in a is the same as in b. Lines represent correlations between different biological variables (for more details see Table 4)
Genomic information and reason for inclusion in the original 422-gene signature for the 34 genes selected in the prediction model
| Symbol | Name | Entrez-ID | chr | Start | Length | Expression: fold-change (CR/IR) | Copy number: presence | Copy number: cytoband | DNA methylation: methylated genes | DNA methylation: status (CR/IR) | miRNA expression: miRNA | miRNA expression: CR/IR expression |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | leucine rich repeat containing 8 family, member D | 55144 | 1 | 90286572 | 115417 | 1.13 | | | | | | |
| | olfactomedin-like 3 | 56944 | 1 | 114522029 | 2846 | 0.76 | | | | | | |
| | dermatopontin | 1805 | 1 | 168664694 | 33748 | 0.9 | | | | | | |
| | fibroblast activation protein, alpha | 2191 | 2 | 163027199 | 72846 | 0.7 | | | | | | |
| | ring finger protein 13 | 11342 | 3 | 149530474 | 149451 | | Gain | 3q22.1-q29 | CRKRS | 1.19 | | |
| | solute carrier family 7, (cationic amino acid transporter, y + system) member 11 | 23657 | 4 | 139085247 | 78256 | | Loss | 4q13.3-q35.2 | | | | |
| | triple functional domain (PTPRF interacting) | 7204 | 5 | 14143828 | 365630 | 1.18 | Gain | 5p15.33-p13.1 | | | | |
| | drosha, ribonuclease type III | 29102 | 5 | 31400601 | 131681 | 1.16 | Gain | 5p15.33-p13.1 | | | | |
| | ribosomal protein S23 | 6228 | 5 | 81569138 | 5097 | 0.96 | Loss | 5q11.2-q21.1 | UNQ9217 | 1.15 | miR-22 | 0.8 |
| | spectrin repeat containing, nuclear envelope 1 | 23345 | 6 | 152442821 | 196658 | | Loss | 6q15-q27 | LAD1, NFATC2, SLC1A2, STEAP4 | 0.77 | miR-22, miR-200b | 1.22 |
| | STEAP family member 4 | 79689 | 7 | 87905743 | 30466 | | | | SYNE1 | 0.82 | | |
| | protease, serine, 1 (trypsin 1) | 5644 | 7 | 142457318 | 3609 | 1.32 | Gain | 7q32.1-q36.3 | | | | |
| | protease, serine, 2 (trypsin 2) | 5645 | 7 | 142479907 | 1471 | 1.34 | Gain | 7q32.1-q36.3 | | | | |
| | suppression of tumorigenicity 18 (breast carcinoma) (zinc finger protein) | 9705 | 8 | 53023391 | 299048 | 0.97 | Gain | 8p11.21-q24.3 | | | | |
| | ribosomal protein S20 | 6224 | 8 | 56980738 | 6402 | | Gain | 8p11.21-q24.3 | | | miR-135b | 1.22 |
| | zinc finger and BTB domain containing 10 | 65986 | 8 | 81398447 | 36163 | 1.2 | Gain | 8p11.21-q24.3 | | | miR-708 | 1.2 |
| | methylthioadenosine phosphorylase | 4507 | 9 | 21802634 | 63335 | 1.12 | Loss | 9p21.3-p21.2 | | | | |
| | glucosaminyl (N-acetyl) transferase 1, core 2 | 2650 | 9 | 79056581 | 65751 | 1.17 | | | | | | |
| | WBP1L - chromosome 10 open reading frame 26 | 54838 | 10 | 104503726 | 72295 | 0.87 | No | | | | | |
| | myosin VIIA | 4647 | 11 | 76839309 | 86977 | 1.08 | Gain | 11q13.5-q14.1 | | | | |
| | platelet derived growth factor D | 80310 | 11 | 103777913 | 257114 | 0.76 | No | | | | | |
| | killer cell lectin-like receptor subfamily A pseudogene 1 | 10748 | 12 | 10741076 | 11358 | | Gain | 12p13.33-p11.21 | | | miR-22 | 0.8 |
| | NUAK family, SNF1-like kinase, 1 | 9891 | 12 | 106457124 | 76687 | 0.75 | No | | | | | |
| | polymerase (DNA directed), epsilon | 5426 | 12 | 133200347 | 63598 | 1.14 | No | | | | | |
| | ectonucleoside triphosphate diphosphohydrolase 5 | 957 | 14 | 74433180 | 52846 | 1.13 | No | | | | | |
| | thrombospondin 1 | 7057 | 15 | 39873279 | 16389 | | Loss | 15q11.1-q21.1 | CORO6, LAD1, NFATC2, SLC1A2, SNAI1 | 0.74 | miR-22, miR-641, miR-200b | 1.22 |
| | cell cycle progression 1 | 9236 | 15 | 55647437 | 53137 | 0.84 | Loss | 15q21.3 | | | | |
| | pyruvate dehydrogenase phosphatase regulatory subunit | 55066 | 16 | 70147528 | 47656 | 1.04 | Loss | 16q12.2-q24.3 | | | | |
| | ras homolog gene family, member T1 | 55288 | 17 | 30469472 | 83274 | 0.87 | Gain | 17p13.3-q21.2 | | | | |
| | MAX-like protein X | 6945 | 17 | 40719077 | 6144 | 0.88 | No | | | | | |
| | coatomer protein complex, subunit zeta 2 | 51226 | 17 | 46103532 | 11620 | 0.78 | No | | | | | |
| | megakaryocyte-associated tyrosine kinase | 4145 | 19 | 3777966 | 8449 | 0.93 | No | | | | | |
| | TIMP metallopeptidase inhibitor 3 | 7078 | 22 | 33196801 | 62227 | 0.7 | Loss | 22q11.22 | | | miR-22 | 0.8 |
| | minichromosome maintenance complex component 5 | 4174 | 22 | 35796115 | 24380 | 1.19 | Loss | 22q11.22-q13.33 | | | | |
In the Expression, Copy number, DNA methylation, and miRNA expression columns, only values that differed significantly between CR and IR are shown. Some genes had more than one biological difference
Copy number shows the chromosomal region (cytoband) that was significantly correlated with gene expression in the 422-gene signature. DNA Methylation and miRNA expression show the initial variables that were significantly correlated with gene expression in the 422-gene signature
Fig. 6 Multivariate survival analysis of the 34 selected genes. a Table with hazard ratio (HR), or risk of death, with 95% CIs and p-values for each of the genes independently associated with survival. b Forest plot of independently significant genes for survival. Blue boxes represent hazard ratios (HR), and lines are their CIs. HR = 1 is non-significant. HR < 1 indicates decreased risk of death; HR > 1 indicates increased risk of death. Overall p-value of the survival model: p = 5.7 × 10⁻⁷
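The reading of hazard ratios given in this caption can be sketched in a few lines. The example assumes a Cox-type model in which each gene has a log-hazard coefficient and a standard error, so that HR = exp(coefficient) and the 95% CI is exp(coefficient ± 1.96 × SE). The record does not name the survival model used, and the input values below are hypothetical.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Convert a log-hazard coefficient and its standard error into
    (HR, lower, upper) using a normal approximation on the log scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def interpret(lower, upper):
    """Classify a hazard-ratio CI relative to the null value HR = 1."""
    if lower > 1.0:
        return "increased risk of death"
    if upper < 1.0:
        return "decreased risk of death"
    return "non-significant (CI includes HR = 1)"


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, not values from the study.
    hr, lo, hi = hazard_ratio_ci(beta=math.log(2.0), se=0.1)
    print(f"HR = {hr:.2f} (95% CI {lo:.2f} to {hi:.2f}): {interpret(lo, hi)}")
```

In a forest plot, the box corresponds to the HR and the line to the interval from `lower` to `upper`; a line that crosses 1 corresponds to the non-significant case.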
Pathway enrichment analysis of the selected 34 genes constituting the simplified prediction model
Cluster Profiler pathway enrichment analysis

| KEGG ID | Description | Adjusted | FDR | Gene IDs |
|---|---|---|---|---|
| hsa03030 | DNA replication | 2.07E-02 | 7.28E-03 | POLE/MCM5 |
| hsa04974 | Protein digestion and absorption | 3.09E-02 | 1.08E-02 | PRSS1/PRSS2 |
| hsa03010 | Ribosome | 3.09E-02 | 1.08E-02 | RPS23/RPS20 |
| hsa00240 | Pyrimidine metabolism | 3.09E-02 | 1.08E-02 | POLE/ENTPD5 |
| hsa04972 | Pancreatic secretion | 3.09E-02 | 1.08E-02 | PRSS1/PRSS2 |

GeneGO pathway enrichment analysis

| # | Description | Adjusted | FDR | Gene IDs |
|---|---|---|---|---|
| 1 | Cell adhesion_Chemokines and adhesion | 8.45E-03 | 9.91E-02 | Thrombospondin 1, TRIO |
| 2 | Immune response_IL-12 signaling pathway | 3.21E-02 | 9.91E-02 | G6NT |
| 3 | Cytoskeleton remodeling_Role of PDGFs in cell migration | 3.34E-02 | 9.91E-02 | PDGF-D |
| 4 | Triacylglycerol metabolism p.2 | 3.89E-02 | 9.91E-02 | CEL |
| 5 | Development_Thrombospondin-1 signaling | 3.89E-02 | 9.91E-02 | Thrombospondin 1 |
| 6 | Cell cycle_Start of DNA replication in early S phase | 4.43E-02 | 9.91E-02 | MCM5 |
| 7 | Role of Tissue factor in cancer independent of coagulation protease signaling | 4.84E-02 | 9.91E-02 | Thrombospondin 1 |
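Pathway enrichment of a short gene list is commonly scored with a hypergeometric over-representation test followed by a multiple-testing correction. The sketch below shows that general technique with a Benjamini-Hochberg adjustment. It is an assumption for illustration only: the record does not state which test, background gene set, or correction produced the values in these tables, and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, pathway_size, selected):
    """P(X >= k) when `selected` genes are drawn without replacement from
    `population` genes, of which `pathway_size` belong to the pathway."""
    total = comb(population, selected)
    upper = min(pathway_size, selected)
    hits = sum(
        comb(pathway_size, i) * comb(population - pathway_size, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical setting: 34 selected genes, 2 of them in a 36-gene
    # pathway, drawn from an assumed background of 20000 genes.
    p = hypergeom_upper_tail(2, 20000, 36, 34)
    print(f"raw p-value: {p:.3e}")
    print(benjamini_hochberg([p, 0.04, 0.03, 0.2]))
```

With only 34 input genes, two hits in a small pathway can already yield a small raw p-value, which is why the adjusted values deserve more weight than the raw ones.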
Replication of 422-gene prediction models with different methods
The table presents the results taking into consideration: 1) whether they were part of the training or replication set; 2) the number of genes available for analysis: at the top, analyses including all 422 genes; at the bottom, analyses where not all genes were available (with the number included)
CI of replication datasets with WIDE overlap with TCGA testing set CI: in red. CI of replication datasets with NO overlap with TCGA testing set CI: in green. AUC: area under the ROC curve. CI: confidence intervals. SE: Standard Error. PAM: Prediction analysis for microarrays; PLS: Partial least squares; Lasso: least absolute shrinkage and selection operator
Validation of 422-gene prediction models in independent databases
The table presents the results taking into consideration whether they were part of the training or validation set
CI of validation datasets WITH overlap with TCGA testing set CI: in red. CI of validation datasets with NO overlap with TCGA testing set CI: in green. AUC: area under the ROC curve. CI: confidence intervals. SE: Standard Error. PAM: Prediction analysis for microarrays; PLS: Partial least squares; Lasso: least absolute shrinkage and selection operator
Validation of optimized 34-gene prediction models
The table presents the results taking into consideration whether they were part of the training or validation set
CI of validation datasets WITH overlap with TCGA testing set CI: in red. CI of validation datasets with NO overlap with TCGA testing set CI: in green. AUC: area under the ROC curve. CI: confidence intervals. SE: Standard Error. PLS: Partial least squares
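The table notes compare the CI of each replication or validation dataset with the CI of the TCGA testing set. A minimal sketch of such a comparison is given below. The intervals in the usage example are hypothetical, and the record does not define a threshold for what counts as "wide" or "significant" overlap, so the code reports only whether two intervals overlap and by how much.

```python
def cis_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def overlap_width(ci_a, ci_b):
    """Length of the shared region of two intervals; 0.0 if disjoint."""
    lower = max(ci_a[0], ci_b[0])
    upper = min(ci_a[1], ci_b[1])
    return max(0.0, upper - lower)


if __name__ == "__main__":
    # Hypothetical AUC confidence intervals, not values from the study.
    testing_ci = (0.60, 0.75)
    validation_ci = (0.70, 0.85)
    print(cis_overlap(testing_ci, validation_ci))
    print(round(overlap_width(testing_ci, validation_ci), 3))
```

Overlapping intervals do not by themselves establish that two AUCs are equal; they indicate only that the data are compatible with similar performance.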