Deena M A Gendoo, Michael Zon, Vandana Sandhu, Venkata S K Manem, Natchar Ratanasirigulchai, Gregory M Chen, Levi Waldron, Benjamin Haibe-Kains.
Abstract
A wealth of transcriptomic and clinical data on solid tumours are under-utilized due to unharmonized data storage and format. We have developed the MetaGxData package compendium, which includes manually-curated and standardized clinical, pathological, survival, and treatment metadata across breast, ovarian, and pancreatic cancer data. MetaGxData is the largest compendium of curated transcriptomic data for these cancer types to date, spanning 86 datasets and encompassing 15,249 samples. Open access to standardized metadata across cancer types promotes use of their transcriptomic and clinical data in a variety of cross-tumour analyses, including identification of common biomarkers, and assessing the validity of prognostic signatures. Here, we demonstrate that MetaGxData is a flexible framework that facilitates meta-analyses by using it to identify common prognostic genes in ovarian and breast cancer. Furthermore, we use the data compendium to create the first gene signature that is prognostic in a meta-analysis across 3 cancer types. These findings demonstrate the potential of MetaGxData to serve as an important resource in oncology research, and provide a foundation for future development of cancer-specific compendia.
Year: 2019 PMID: 31217513 PMCID: PMC6584731 DOI: 10.1038/s41598-019-45165-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Diagrammatic representation of the data processing pipeline for packages that are part of the MetaGxData compendium. Depicted are the processes involved in downloading a dataset, and standardization of molecular (gene) and clinical (patient) data to produce cancer-specific compendia that abide by the MetaGxData framework.
Figure 2. Schematic representation of some of the common clinical variables (pData) that are available across datasets in MetaGxBreast, MetaGxOvarian, and MetaGxPancreas. The stacked bar plots indicate the percentage of samples in every dataset annotated with a particular variable designation. Continuous numeric values are represented by box plots.
Figure 3. Assessment of the prognostic value of seven key gene modules in breast cancer, using the MetaGxBreast package. (a) Heatmap representation of hazard ratios for each gene module, across 9 datasets. The estimate is presented as a hazard ratio for each gene. Ratios greater than 1 (red) indicate worse prognosis for elevated expression levels of that gene in the respective datasets. (b) Random effects meta-estimates of the hazard ratios for each gene, calculated by pooling the hazard ratios from each individual dataset. (c) Kaplan-Meier curves of the most prognostic gene with p < 0.05, in this case AURKA. Each KM plot represents patients of a specific treatment type. Within each plot, patients are split into ‘high’ and ‘low’ based on the median AURKA score.
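The pooling step named in panel (b) can be illustrated with a minimal sketch. The caption does not say which random-effects estimator was used, so the DerSimonian-Laird estimator below is an assumption, and the function name `random_effects_pool` and its inputs (per-dataset log hazard ratios and their standard errors) are hypothetical.

```python
import math


def random_effects_pool(log_hrs, std_errs):
    """Pool per-dataset log hazard ratios into one random-effects estimate.

    Uses the DerSimonian-Laird estimator of between-dataset variance
    (an assumption; the source does not name its estimator).
    """
    k = len(log_hrs)
    if k == 0 or k != len(std_errs):
        raise ValueError("need one standard error per log hazard ratio")

    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / se ** 2 for se in std_errs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sw

    # Cochran's Q and the DerSimonian-Laird between-dataset variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to every within-dataset variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errs]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sw_star
    se_pooled = math.sqrt(1.0 / sw_star)

    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "tau2": tau2,
    }
```

Pooling is done on the log scale because log hazard ratios are approximately normally distributed; the result is exponentiated back to a hazard ratio with an approximate 95% interval.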
Figure 4. Assessment of the prognostic value of six key genes in ovarian cancer, using the MetaGxOvarian package. (a) Heatmap representation of hazard ratios for each gene, across 17 datasets. The estimate is presented as a hazard ratio for each gene. Ratios greater than 1 (red) indicate worse prognosis for elevated expression levels of that gene in the respective datasets. (b) Random effects meta-estimates of the hazard ratios for each gene, calculated by pooling the hazard ratios from each individual dataset. (c) Kaplan-Meier curves of NUAK1. Each KM plot represents patients of a specific tumour grade. Within each plot, patients are split into ‘high’ and ‘low’ based on whether they fall above or below the median NUAK1 gene expression. The asterisks above the D indices indicate whether the D index was statistically significant (p < 0.05).
Figure 5. Assessment of the prognostic value of genes in pancreatic cancer, using the MetaGxPancreas package. (a) Heatmap representation of hazard ratios for each gene, across 11 datasets. The estimate is presented as a hazard ratio for each gene. Ratios greater than 1 (red) indicate worse prognosis for elevated expression levels of that gene in the respective datasets. (b) Random effects meta-estimates of the hazard ratios for each gene, calculated by pooling the hazard ratios from each individual dataset. (c) Kaplan-Meier curve of ADM.
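The median split and Kaplan-Meier curves described in the Figure 3 and Figure 4 captions can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the function names are hypothetical, and assigning samples exactly at the median to the ‘low’ group is an assumption the captions do not specify.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression.

    Samples exactly at the median are labelled 'low' (an assumption).
    """
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as [(time, survival probability)].

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        at_risk -= leaving
    return curve
```

To compare groups, one would compute `split_by_median` on a gene's expression values and then call `kaplan_meier` separately on the ‘high’ and ‘low’ patients.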
Genes present in the MetaGx gene signature.
| | Gene Symbol | Description | Entrez ID | Direction |
|---|---|---|---|---|
| 1 | ACKR3 | atypical chemokine receptor 3 | 57007 | 1 |
| 2 | ACTN4 | actinin alpha 4 | 81 | 1 |
| 3 | ARHGAP21 | Rho GTPase activating protein 21 | 57584 | 1 |
| 4 | C12orf49 | chromosome 12 open reading frame 49 | 79794 | 1 |
| 5 | CACNB3 | calcium voltage-gated channel auxiliary subunit beta 3 | 784 | 1 |
| 6 | CAMK1D | calcium/calmodulin dependent protein kinase ID | 57118 | 1 |
| 7 | CAMSAP3 | calmodulin regulated spectrin associated protein family member 3 | 57662 | −1 |
| 8 | CBFB | core-binding factor beta subunit | 865 | 1 |
| 9 | CDC37L1 | cell division cycle 37 like 1 | 55664 | −1 |
| 10 | CDK19 | cyclin dependent kinase 19 | 23097 | 1 |
| 11 | CLDN4 | claudin 4 | 1364 | 1 |
| 12 | CMBL | carboxymethylenebutenolidase homolog | 134147 | 1 |
| 13 | COP1 | COP1, E3 ubiquitin ligase | 64326 | 1 |
| 14 | CRABP2 | cellular retinoic acid binding protein 2 | 1382 | 1 |
| 15 | CSE1L | chromosome segregation 1 like | 1434 | 1 |
| 16 | DARS2 | aspartyl-tRNA synthetase 2, mitochondrial | 55157 | 1 |
| 17 | DDB2 | damage specific DNA binding protein 2 | 1643 | −1 |
| 18 | DPP4 | dipeptidyl peptidase 4 | 1803 | 1 |
| 19 | EGFR | epidermal growth factor receptor | 1956 | 1 |
| 20 | FAM189A2 | family with sequence similarity 189 member A2 | 9413 | −1 |
| 21 | GSTZ1 | glutathione S-transferase zeta 1 | 2954 | −1 |
| 22 | IMPDH1 | inosine monophosphate dehydrogenase 1 | 3614 | 1 |
| 23 | IRF3 | interferon regulatory factor 3 | 3661 | 1 |
| 24 | KATNAL1 | katanin catalytic subunit A1 like 1 | 84056 | 1 |
| 25 | KIF11 | kinesin family member 11 | 3832 | 1 |
| 26 | LATS2 | large tumor suppressor kinase 2 | 26524 | 1 |
| 27 | LOXL2 | lysyl oxidase like 2 | 4017 | 1 |
| 28 | MOCS1 | molybdenum cofactor synthesis 1 | 4337 | −1 |
| 29 | MREG | melanoregulin | 55686 | −1 |
| 30 | MSC | musculin | 9242 | 1 |
| 31 | MYADM | myeloid associated differentiation marker | 91663 | 1 |
| 32 | MYLK3 | myosin light chain kinase 3 | 91807 | −1 |
| 33 | NAE1 | NEDD8 activating enzyme E1 subunit 1 | 8883 | 1 |
| 34 | NID2 | nidogen 2 | 22795 | 1 |
| 35 | OPRM1 | opioid receptor mu 1 | 4988 | 1 |
| 36 | PLAU | plasminogen activator, urokinase | 5328 | 1 |
| 37 | PPEF1 | protein phosphatase with EF-hand domain 1 | 5475 | 1 |
| 38 | PWP1 | PWP1 homolog, endonuclein | 11137 | 1 |
| 39 | RALY | RALY heterogeneous nuclear ribonucleoprotein | 22913 | 1 |
| 40 | RARRES3 | retinoic acid receptor responder 3 | 5920 | −1 |
| 41 | REX1BD | required for excision 1-B domain containing | 55049 | 1 |
| 42 | SERPINB2 | serpin family B member 2 | 5055 | 1 |
| 43 | SIPA1L2 | signal induced proliferation associated 1 like 2 | 57568 | 1 |
| 44 | STK3 | serine/threonine kinase 3 | 6788 | 1 |
| 45 | TERF2 | telomeric repeat binding factor 2 | 7014 | 1 |
| 46 | TEX261 | testis expressed 261 | 113419 | 1 |
| 47 | TGFBI | transforming growth factor beta induced | 7045 | 1 |
| 48 | TNFRSF18 | TNF receptor superfamily member 18 | 8784 | −1 |
| 49 | TPD52L2 | tumor protein D52 like 2 | 7165 | 1 |
| 50 | UTP6 | UTP6, small subunit processome component | 55813 | 1 |
| 51 | ZFAND2A | zinc finger AN1-type containing 2A | 90637 | 1 |
| 52 | ZNF204P | zinc finger protein 204, pseudogene | 7754 | −1 |
| 53 | ZSCAN32 | zinc finger and SCAN domain containing 32 | 54925 | −1 |
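One common way to turn a signed gene list like the one above into a per-sample score is a direction-weighted average of expression. The sketch below assumes that scheme; the source does not state how the MetaGx signature score is computed, and the function name `signature_score` and the expression values in the usage note are hypothetical.

```python
def signature_score(expression, directions):
    """Direction-weighted mean expression over the signature genes.

    expression : dict mapping gene symbol -> expression value for one sample
    directions : dict mapping gene symbol -> +1 or -1 (the Direction column)

    Genes missing from the sample are skipped, so the score is averaged
    over the genes that are actually measured.
    """
    shared = [gene for gene in directions if gene in expression]
    if not shared:
        raise ValueError("no signature genes found in the sample")
    total = sum(directions[gene] * expression[gene] for gene in shared)
    return total / len(shared)


# A few rows taken from the table above (symbol -> Direction).
EXAMPLE_DIRECTIONS = {"EGFR": 1, "KIF11": 1, "DDB2": -1, "GSTZ1": -1}
```

For example, `signature_score({"EGFR": 2.0, "KIF11": 3.0, "DDB2": 1.0}, EXAMPLE_DIRECTIONS)` averages over the three measured genes and ignores GSTZ1. Under this scheme a higher score means higher expression of the genes with direction 1 and lower expression of the genes with direction −1.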
Figure 6. Survival curves for the MetaGx signature with patients stratified by molecular subtypes. (a–e) Survival curves in ovarian cancer. (f–i) Survival curves in breast cancer. (j–l) Survival curves in pancreatic cancer. The asterisks above the D indices indicate whether the D index was statistically significant (p < 0.05).
Prognostic value of Pancreatic Cancer Gene Signatures.
| | Gene Signature - Subtype | D Index | D Index 95% CI | D Index P | Log Rank Test P |
|---|---|---|---|---|---|
| 1 | MetaGx - All Patients | 1.64 | (1.37, 1.90) | 1.9e-04 | 2.3e-06 |
| 2 | MetaGx - basal | 1.75 | (1.31, 2.19) | 1.1e-02 | 1.1e-03 |
| 3 | MetaGx - classical | 1.43 | (1.09, 1.77) | 3.7e-02 | 1.3e-02 |
| 4 | Newhook PLoS One (a) - All Patients | 1.22 | (0.97, 1.47) | 1.1e-01 | 1.2e-01 |
| 5 | Newhook PLoS One (a) - basal | 1.01 | (0.80, 1.23) | 9e-01 | 7.9e-01 |
| 6 | Newhook PLoS One (a) - classical | 0.99 | (0.77, 1.20) | 9e-01 | 6.2e-01 |
| 7 | Haider Genome Med (b) - All Patients | 1.56 | (1.23, 1.88) | 6.8e-03 | 4.7e-06 |
| 8 | Haider Genome Med (b) - basal | 1.22 | (0.88, 1.57) | 2.4e-01 | 2.2e-01 |
| 9 | Haider Genome Med (b) - classical | 1.43 | (1.15, 1.71) | 1e-02 | 9.5e-02 |
| 10 | Grutzmann Oncogene (c) - All Patients | 1.35 | (1.22, 1.49) | 1.3e-05 | 2.1e-06 |
| 11 | Grutzmann Oncogene (c) - basal | 1.29 | (0.91, 1.67) | 1.7e-01 | 7.8e-03 |
| 12 | Grutzmann Oncogene (c) - classical | 1.23 | (1.01, 1.46) | 6.2e-02 | 1.1e-01 |
| 13 | Stratford PLoS Med (d) - All Patients | 1.39 | (1.09, 1.68) | 2.9e-02 | 6.3e-03 |
| 14 | Stratford PLoS Med (d) - basal | 1.22 | (1.01, 1.43) | 6.4e-02 | 2.6e-01 |
| 15 | Stratford PLoS Med (d) - classical | 1.29 | (0.94, 1.63) | 1.4e-01 | 6.7e-02 |
(a) T. E. Newhook et al., A thirteen-gene expression signature predicts survival of patients with pancreatic cancer and identifies new genes of interest, PLoS One, vol. 9, no. 9, p. e105631, Sep. 2014.
(b) S. Haider et al., A multi-gene signature predicts outcome in patients with pancreatic ductal adenocarcinoma, Genome Med., vol. 6, no. 12, p. 105, Dec. 2014.
(c) R. Grutzmann et al., Meta-analysis of microarray data on pancreatic cancer defines a set of commonly dysregulated genes, Oncogene, vol. 24, no. 32, pp. 5079–5088, Jul. 2005.
(d) J. K. Stratford et al., A six-gene signature predicts survival of patients with localized pancreatic ductal adenocarcinoma, PLoS Med., vol. 7, no. 7, p. e1000307, Jul. 2010.