Lydia Mok, Yongkang Kim, Sungyoung Lee, Sungkyoung Choi, Seungyeoun Lee, Jin-Young Jang, Taesung Park.
Abstract
Although several methods identify cancer-associated pathways from gene expression data, most analyze one pathway at a time and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing HisCoM-PAGE with competing methods, namely Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test, showed that HisCoM-PAGE had the highest power to detect causal pathways in most simulation scenarios.
Keywords: Hierarchical structured component model; Pathway analysis; Survival phenotype
Year: 2019 PMID: 31739607 PMCID: PMC6896173 DOI: 10.3390/genes10110931
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Table 1. Demographics and clinical characteristics of study patients.
| Variable | Variable Description | Descriptive Statistics |
|---|---|---|
| Age | Age at diagnosis | 63.32 (10.064), mean (SE) |
| Sex | | Male: 75, Female: 50 |
| Positive | Number of cancers transmitted by Lymphocytes | (0,1,2) |
| Size | Maximum Tumor Size (cm) | 3.574 (mean) |
| Differentiation | Clinico-pathologic characteristics and prognostic value of various histological types. | WD: 19, MD: 85, PD: 18, Other: 2 (NA: 1) |
| Jaundice | | Yes: 89, No: 36 |
| 7th staging T stage | AJCC 7th T staging criteria. | 1st: 6, 2nd: 3, |
| 7th staging N stage | AJCC 7th N staging criteria. | Yes: 71, No: 54 |
| Radiation therapy | Radiation therapy after surgery | Yes: 72, No: 53 |
| Chemotherapy | Chemotherapy after surgery | Yes: 94, No: 31 |
| Overall survival time | | Median: 25 months |
The differentiation variable comprised the following categories: Well Differentiated (WD), Moderately Differentiated (MD), and Poorly Differentiated (PD). T and N stages follow the American Joint Committee on Cancer (AJCC) 7th edition.
Figure 1. A schematic diagram of the HisCoM-PAGE model with J pathways. Rectangles and circles represent observed variables (mRNA expression) and latent variables (pathways), respectively. Each pathway consists of three or more genes and is represented by a latent variable constructed as a weighted sum of its genes. Single-headed arrows represent the effects of genes within a pathway and the effects of pathways on the hazard function at survival time y.
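The structure described in the Figure 1 caption can be sketched numerically. The snippet below is a minimal illustration, assuming a Cox-type proportional hazards form in which the pathway scores enter the log-hazard linearly. The function names, toy weights, and coefficients are hypothetical, and the snippet evaluates the model for fixed parameters only; it does not reproduce the authors' estimation procedure.

```python
import numpy as np

def pathway_scores(expr, gene_idx, gene_weights):
    """Build one latent score per pathway as a weighted sum of its genes.

    expr         : (n_samples, n_genes) expression matrix
    gene_idx     : list of index arrays, one per pathway
    gene_weights : list of weight arrays aligned with gene_idx
    """
    cols = [expr[:, idx] @ w for idx, w in zip(gene_idx, gene_weights)]
    return np.column_stack(cols)

def relative_hazard(scores, beta_path):
    """Hazard relative to baseline, assuming a log-linear (Cox-type) link."""
    return np.exp(scores @ beta_path)

# Toy data: 2 samples, 6 genes, 2 pathways of three genes each (all values made up).
expr = np.array([[1.0, 2.0, 3.0, 0.5, 0.5, 1.0],
                 [0.0, 1.0, 1.0, 2.0, 2.0, 2.0]])
gene_idx = [np.array([0, 1, 2]), np.array([3, 4, 5])]
gene_weights = [np.array([0.5, 0.25, 0.25]), np.array([1.0, 1.0, 1.0])]
beta_path = np.array([0.2, -0.1])

scores = pathway_scores(expr, gene_idx, gene_weights)
hazard_ratio = relative_hazard(scores, beta_path)
```

Because every pathway score enters the same linear predictor, the pathway coefficients are estimated jointly in a model of this shape, which is how correlations among pathways can be taken into account.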
Table 2. Significant pathways for PDAC prognosis identified by HisCoM-PAGE using SNUH microarray data and the replication study result using TCGA RNA-seq data.
| Pathway Database | Pathway Name | Microarray \|βpath\| | Microarray p-value | Microarray q-value | RNA-seq \|βpath\| | RNA-seq p-value | RNA-seq q-value |
|---|---|---|---|---|---|---|---|
| BIOCARTA | Transforming Growth Factor-β (TGF-β)* | 0.017 | 0.00001 | 0.002 | 0.006 | 0.053 | 0.211 |
| | Non-Typeable Haemophilus Influenzae (NTHI) | 0.014 | 0.00033 | 0.03 | 0.006 | 0.314 | 0.419 |
| | MITOCHONDRIA | 0.01 | 0.00054 | 0.03 | 0.007 | 0.197 | 0.394 |
| | Transducer Of ERBB2.1 (TOB1)* | 0.016 | 0.00056 | 0.03 | 0.005 | 0.46 | 0.46 |
| KEGG | | 0.015 | 0.0001 | 0.0074 | 0.0071 | 0.261 | 0.894 |
| | COLORECTAL CANCER | 0.014 | 0.0001 | 0.0074 | 0.0085 | 0.843 | 0.959 |
| | CIRCADIAN RHYTHM MAMMAL | 0.01 | 0.0012 | 0.0306 | 0.0062 | 0.869 | 0.959 |
| | | 0.012 | 0.0009 | 0.0306 | 0.0088 | 0.868 | 0.959 |
| | | 0.01 | 0.0008 | 0.0306 | 0.0018 | 0.364 | 0.9055 |
| | | 0.016 | 0.0011 | 0.0306 | 0.013 | 0.951 | 0.959 |
| | | 0.014 | 0.0015 | 0.0306 | 0.0037 | 0.541 | 0.959 |
| | | 0.014 | 0.0007 | 0.0306 | 0.0078 | 0.724 | 0.959 |
| | | 0.017 | 0.0014 | 0.0306 | 0.0104 | 0.415 | 0.905 |
| | ADHERENS JUNCTION* | 0.013 | 0.0018 | 0.0326 | 0.01 | 0.819 | 0.959 |
| | | 0.016 | 0.0023 | 0.0326 | 0.011 | 0.851 | 0.959 |
| | | 0.009 | 0.0021 | 0.0326 | 0.0031 | 0.665 | 0.959 |
| | | 0.015 | 0.0022 | 0.0326 | 0.0099 | 0.178 | 0.894 |
| | | 0.01 | 0.0028 | 0.0369 | 0.0108 | 0.004 | 0.0852 |
| | DORSO VENTRAL AXIS FORMATION | 0.009 | 0.0031 | 0.0369 | 0.0036 | 0.344 | 0.905 |
| | | 0.008 | 0.0032 | 0.0369 | 0.0046 | 0.212 | 0.894 |
| | | 0.013 | 0.0038 | 0.0373 | 0.0086 | 0.732 | 0.959 |
| | ERBB SIGNALING* | 0.013 | 0.004 | 0.0373 | 0.0069 | 0.401 | 0.9055 |
| | | 0.015 | 0.0036 | 0.0373 | 0.0121 | 0.952 | 0.959 |
| | | 0.015 | 0.004 | 0.0373 | 0.0127 | 0.959 | 0.959 |
| | | 0.012 | 0.0044 | 0.039 | 0.0081 | 0.711 | 0.959 |
| | Mammalian TOR (MTOR) SIGNALING* | 0.011 | 0.0055 | 0.0465 | 0.0098 | 0.014 | 0.112 |
| | | 0.011 | 0.0058 | 0.047 | 0.0068 | 0.258 | 0.894 |
Pathways related to PDAC or pancreatic cancer are denoted by *. Pathways uniquely identified by HisCoM-PAGE are shown in bold. KEGG: Kyoto Encyclopedia of Genes and Genomes.
Figure 2. Venn diagram of the significant pathways identified by four different methods using the KEGG database. The pathways listed have FDR-adjusted q-values less than 0.05. The 17 pathways uniquely identified by HisCoM-PAGE are highlighted in Table 2.
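The q-value threshold in the Figure 2 caption can be illustrated with a short sketch. The source does not state which FDR procedure was used, so the code below assumes the Benjamini-Hochberg step-up adjustment; the function name `bh_qvalues` and the example p-values are hypothetical.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Made-up p-values for four pathways.
pvals = [0.01, 0.04, 0.03, 0.20]
q = bh_qvalues(pvals)
significant = q < 0.05  # pathways that would be reported at the 0.05 level
```

With these toy inputs only the first pathway passes the threshold, even though three raw p-values are below 0.05, which shows why the adjusted and unadjusted columns of a results table can lead to different conclusions.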
Table 3. Significant genes in PDAC prognosis identified by HisCoM-PAGE using the SNUH microarray dataset and the replication study result using the TCGA RNA-seq data.
| Pathway Database | Pathway Name | Gene | SNUH Microarray \|wgene × βpath\| | SNUH Microarray p-value | SNUH Microarray q-value | TCGA RNA-seq \|wgene × βpath\| | TCGA RNA-seq p-value | TCGA RNA-seq q-value |
|---|---|---|---|---|---|---|---|---|
| BIOCARTA | Non-Typeable Haemophilus Influenzae (NTHI) | | 0.032 | 0.00001 | 0.004 | 0.004 | 0.1246 | 0.298 |
| | Transducer Of ERBB2.1 (TOB1) | | 0.032 | 0.00001 | 0.004 | 0.004 | 0.1134 | 0.298 |
| | Transforming Growth Factor-β (TGF-β) | | 0.032 | 0.00001 | 0.004 | 0.004 | 0.1114 | 0.298 |
| | CHEMICAL | | 0.024 | 0.00003 | 0.004 | 0.006 | 0.0762 | 0.259 |
| | IL-2 receptor beta chain (IL2RB) | | 0.024 | 0.00003 | 0.004 | 0.006 | 0.0866 | 0.266 |
| | RAS | | 0.024 | 0.00003 | 0.004 | 0.006 | 0.0783 | 0.259 |
| | Bcl-2 antagonist of cell death (BAD) | | 0.024 | 0.00003 | 0.004 | 0.006 | 0.0777 | 0.259 |
| | | | 0.024 | 0.00003 | 0.004 | 0.006 | 0.0753 | 0.259 |
| | CCCTC-binding factor (CTCF) | | 0.019 | 0.00005 | 0.004 | 0.008 | 0.9707 | 0.982 |
| | Inflammatory Response (INFLAM) | | 0.019 | 0.00005 | 0.004 | 0.008 | 0.9715 | 0.982 |
| | Erythrocyte Differentiation (ERYTH) | | 0.019 | 0.00005 | 0.004 | 0.008 | 0.9716 | 0.982 |
| | MAP Kinase (MAPK) | | 0.019 | 0.00005 | 0.004 | 0.008 | 0.9726 | 0.982 |
| | Anaplastic lymphoma kinase (ALK) | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.9695 | 0.982 |
| | G1 | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.9706 | 0.982 |
| | P38MAPK | | 0.019 | 0.00006 | 0.004 | 0.008 | 0.9718 | 0.982 |
| | | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.971 | 0.982 |
| | NKT | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.971 | 0.982 |
| | Interleukin-1 receptor (IL1R) | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.971 | 0.982 |
| | | | 0.018 | 0.00006 | 0.004 | 0.008 | 0.971 | 0.982 |
| | KERATINOCYTE | | 0.015 | 0.00008 | 0.005 | 0.001 | 0.3516 | 0.643 |
| | E-26-specific (ETS) | | 0.015 | 0.0001 | 0.006 | 0.001 | 0.3588 | 0.643 |
| | P53HYPOXIA | | 0.016 | 0.00047 | 0.028 | 0.0002 | 0.4766 | 0.762 |
| | Hypoxia-Inducible Factor (HIF) | | 0.016 | 0.00047 | 0.028 | 0.0001 | 0.4767 | 0.762 |
| | Erythropoietin mediated neuroprotection through NF-kB (EPONFKB) | | 0.016 | 0.0005 | 0.028 | 0.0001 | 0.4786 | 0.762 |
| | Vascular Endothelial Growth Factor (VEGF) | | 0.015 | 0.0006 | 0.033 | 0.00005 | 0.9824 | 0.982 |
| | DEATH | | 0.018 | 0.00064 | 0.033 | 0.002 | 0.647 | 0.897 |
| | Formyl methionyl leucyl phenylalanine (FMLP) | | 0.015 | 0.00074 | 0.037 | 0.006 | 0.0485 | 0.24 |
| | IL1R | | 0.01 | 0.00095 | 0.041 | 0.002 | 0.2839 | 0.581 |
| | SET | | 0.015 | 0.001 | 0.041 | - | - | - |
| | Phosphoinositides (PTDINS) | | 0.011 | 0.0011 | 0.041 | 0.00008 | 0.5064 | 0.778 |
| | Extrinsic Prothrombin Activation (EXTRINSIC) | | 0.013 | 0.00115 | 0.041 | 0.002 | 0.6457 | 0.897 |
| | Acute Myocardial Infarction (AMI) | | 0.013 | 0.00116 | 0.041 | 0.002 | 0.6457 | 0.897 |
| | Protease-activated receptors-1 (PAR1) | | 0.017 | 0.00118 | 0.041 | 0.007 | 0.0502 | 0.24 |
| | Endothelial differentiation gene-1 (EDG1) | | 0.017 | 0.00119 | 0.041 | 0.007 | 0.0464 | 0.24 |
| | G protein-coupled receptors (GPCR) | | 0.017 | 0.00119 | 0.041 | 0.007 | 0.0499 | 0.24 |
| | SPPA | | 0.017 | 0.00122 | 0.041 | 0.007 | 0.0481 | 0.24 |
| | Bioactive Peptide Induced Signaling (BIOPEPTIDES) | | 0.017 | 0.00122 | 0.041 | 0.007 | 0.0476 | 0.24 |
| | CXC chemokine receptor type-4 (CXCR4) | | 0.017 | 0.00122 | 0.041 | 0.007 | 0.0447 | 0.24 |
| | Mannose 6-phosphate receptors (MPR) | | 0.017 | 0.00122 | 0.041 | 0.008 | 0.0432 | 0.24 |
| | Glycogen synthase kinase-3 (GSK3) | | 0.017 | 0.00123 | 0.041 | 0.008 | 0.0432 | 0.24 |
| | Peroxisome proliferator-activated receptor alpha (PPARA) | | 0.015 | 0.00122 | 0.041 | 0.003 | 0.3221 | 0.63 |
| | VEGF | | 0.01 | 0.00146 | 0.047 | 0.003 | 0.1576 | 0.339 |
| | Nitric Oxide-1 (NO1) | | 0.01 | 0.00147 | 0.047 | 0.003 | 0.1575 | 0.339 |
| KEGG | CELL CYCLE | | 0.023 | 0.0001 | 0.047 | 0.003 | 0.099 | 0.099 |
| | | | 0.023 | 0.0001 | 0.047 | 0.003 | 0.096 | 0.099 |
| | TGF-β | | 0.023 | 0.0001 | 0.047 | 0.003 | 0.0957 | 0.099 |
Pathways shown in bold were identified as significant by HisCoM-PAGE.
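The gene-level quantity in the table header, the absolute product of a gene weight and its pathway coefficient, can be computed as in the following sketch. The function name and the numeric values are hypothetical and are not taken from the fitted model.

```python
import numpy as np

def gene_effects(gene_weights, beta_path):
    """Absolute gene-level effect |w_gene * beta_path| for one pathway."""
    return np.abs(np.asarray(gene_weights, dtype=float) * beta_path)

# Made-up weights for three genes in a pathway with coefficient -0.02.
weights = np.array([0.8, -0.5, 0.1])
effects = gene_effects(weights, -0.02)
```

Under this definition a gene belonging to several pathways has one effect per pathway, since both the weight and the pathway coefficient are pathway-specific.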
Figure 3. Empirical type 1 error over 1000 replicates at censoring proportions from 0 to 0.5. The x-axis represents the censoring proportion and the y-axis represents the type 1 error. Comparison methods are as follows: Gene Set Enrichment Analysis with weight zero (GSEA1), Gene Set Enrichment Analysis with weight 1 (GSEA2), HisCoM-PAGE (HisCoM), Global test (GT), and Wald-type test (Adewale).
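The two quantities on the axes of Figure 3 can be illustrated with a small simulation sketch. The source does not describe how censoring was generated, so the code assumes exponential event and censoring times, for which a censoring rate of `event_rate * c / (1 - c)` gives an expected censoring proportion of `c`. The function names, the seed, and the uniform stand-in p-values are all hypothetical.

```python
import numpy as np

def simulate_censored(n, event_rate, censor_prop, rng):
    """Exponential survival times with a target expected censoring proportion."""
    event_time = rng.exponential(1.0 / event_rate, size=n)
    if censor_prop == 0:
        return event_time, np.ones(n, dtype=bool)
    # With exponential times, P(censored) = censor_rate / (event_rate + censor_rate).
    censor_rate = event_rate * censor_prop / (1.0 - censor_prop)
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    observed = event_time <= censor_time
    return np.minimum(event_time, censor_time), observed

def empirical_rejection_rate(pvalues, alpha=0.05):
    """Share of replicates rejected at level alpha (type 1 error under the null)."""
    return float(np.mean(np.asarray(pvalues, dtype=float) < alpha))

rng = np.random.default_rng(0)
time, observed = simulate_censored(20000, event_rate=1.0, censor_prop=0.3, rng=rng)
achieved_censoring = 1.0 - observed.mean()

# Stand-in for the p-values of 1000 null replicates; a real study would
# obtain these by fitting each method to each simulated data set.
null_pvalues = rng.uniform(size=1000)
type1_error = empirical_rejection_rate(null_pvalues)
```

The same rejection-rate function gives empirical power when the p-values come from replicates simulated under an alternative rather than the null.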
Figure 4. Empirical power in four scenarios. For the simulated gene expression data sets, four correlation structures were considered. The x-axis shows the proportion of significant genes, and the y-axis shows power. The percentage in parentheses indicates the censoring proportion.