Literature DB >> 34368650

Network analysis identifies DAPK3 as a potential biomarker for lymphatic invasion and colon adenocarcinoma prognosis.

Huey-Miin Chen1, Justin A MacDonald1.   

Abstract

Colon adenocarcinoma is a prevalent malignancy with significant mortality. Hence, the identification of molecular biomarkers with prognostic significance is important for improved treatment and patient outcomes. Clinical traits and RNA-Seq of 551 patient samples in the UCSC Toil Recompute Compendium of The Cancer Genome Atlas TARGET and Genotype Tissue Expression project datasets (primary_site = colon) were used for weighted gene co-expression network analysis to reveal the association between gene networks and cancer cell invasion. One module, containing 151 genes, was significantly correlated with lymphatic invasion, a histopathological feature of higher risk colon cancer. DAPK3 (death-associated protein kinase 3) was identified as the pseudohub of the module. Gene ontology identified gene enrichment related to cytoskeletal organization and apoptotic signaling processes, suggesting modular involvement in tumor cell survival, migration, and epithelial-mesenchymal transformation. Although DAPK3 expression was reduced in patients with colon cancer, high expression of DAPK3 was significantly correlated with greater lymphatic invasion and poor overall survival.
© 2021 The Authors.

Entities:  

Keywords:  Molecular biology; Systems biology; Transcriptomics

Year:  2021        PMID: 34368650      PMCID: PMC8326195          DOI: 10.1016/j.isci.2021.102831

Source DB:  PubMed          Journal:  iScience        ISSN: 2589-0042


Introduction

Adenocarcinomas of the colon and rectum, collectively referred to as colorectal cancer (CRC), are highly aggressive and common contributors to global cancer morbidity and mortality (Malki et al., 2020; Dekker et al., 2019; Marmol et al., 2017). The likelihood of CRC diagnosis is about 5%, and risk factors include age and sex as well as diet and other lifestyle habits (Marmol et al., 2017). The distinction between colon and rectal cancer is largely anatomical, but the tumor location does impact both surgical and therapeutic management strategies as well as prognosis. In 2018, adenocarcinoma of the colon was the fourth most common malignancy and was documented to account for >10% of all diagnosed cancers globally (Ferlay et al., 2019). Colorectal adenocarcinomas are caused by genomic instability and mutations that occur in tumor suppressor genes, oncogenes, and genes related to DNA repair. Genetic and epigenetic alterations contribute to molecular events that lead to the neoplastic transformation of healthy colorectal epithelium which then progresses to malignancy (Fearon and Vogelstein, 1990). Several canonical signaling pathways are known to be affected by genomic aberrations in CRC (i.e., mitogen-activated protein kinases (MAPKs), phosphoinositide 3-kinase (PI3K), transforming growth factor (TGF)-β, tumor protein P53 (TP53), and wingless and Int-1 (Wnt/β-catenin) and ultimately lead to changes in fundamental cellular processes that drive tumor development, including differentiation, apoptosis, cell cycle, cell proliferation, and survival (Malki et al., 2020; Marmol et al., 2017). In addition, cancer cell migration is particularly important to the process of metastasis, by which cancer cells escape from the primary tumor and travel to distant sites to invade a new niche (Chow et al., 2012). Two important pathologic processes for cancer cell migration include lymphovascular and perineural invasion, whereby tumor cells invade nervous tissue (perinuclear invasion, PNI) or small lymphatic and vascular tissues (lymphovascular invasion, LVI), respectively. Patients with locally advanced rectal cancer who demonstrated PNI and/or LVI had significantly higher risk of recurrence and poorer survival outcomes (Sun et al., 2019). Moreover, LVI and PNI had detrimental effects on survival after diagnosis of stage II or III adenocarcinoma of the colon (Mutabdzic et al., 2019; Skancke et al., 2019). In patients with stage III colon cancer, increased lymph node involvement was also associated with a need for adjuvant therapy with worsening overall survival and shorter time to recurrence (Sjo et al., 2012). Thus, an improved understanding of the genes associated with PNI and/or LVI of colon adenocarcinoma may reveal candidate biomarkers and novel therapeutic opportunities by identifying predictive and/or prognosis markers for metastatic invasion events. In this study, we assess the ubiquity between tumor and LVI molecular markers that are associated with colon cancer to improve prediction of cancer cell migration and invasion with links to overall survival. Weighted gene co-expression network analysis (WGCNA) is commonly used to explore next-generation RNA-Seq datasets for correlations among genes and clinical phenotypes (Li et al., 2018; Langfelder and Horvath, 2008; Zhang and Horvath, 2005). WGCNA can provide insights into the molecular networks that connect clinical features of tumor progression and has been previously used to identify candidate prognostic biomarkers or therapeutic targets (Chen et al., 2019; Zou and Jing, 2019; Qin et al., 2019; Zhai et al., 2017; Yang et al., 2014). Co-expression modules were constructed using normalized RNA-Seq data and then related to clinicopathological outcomes for patients with colon adenocarcinoma. Search tool for the retrieval of interacting genes/proteins (STRING) network analysis and gene ontology (GO) enrichment analysis were performed to identify hub genes and the primary biological function of selected modules that were aligned with LVI and PNI characteristics of colon adenocarcinoma.

Results

The primary objective of our study was to resolve tumor-associated molecular markers by their linkage to lymphatic invasion and overall survival. To achieve this, we used a schema that is outlined in Figure 1.
Figure 1

Workflow scheme. The workflow for data acquisition, preparation, processing, and analyses

Workflow scheme. The workflow for data acquisition, preparation, processing, and analyses

WGCNA identified a gene network associated with lymphovascular invasion in colon adenocarcinoma

The presence of tumor cells within the lymphatics or blood vessels is associated with increased risk of axillary lymph node and distant metastases. To identify candidate biomarkers for lymphatic invasion, we performed WGCNA on RNA-Seq data obtained from The Cancer Genome Atlas (TCGA) with the |primary_site| originating from colon, |histological_type| limited to colon adenocarcinoma, and non-protein-coding genes excluded. When the soft thresholding was set at 10, the scale-free topology fit index reached 0.95 (Figures 2A and 2B). Modules of co-expressed genes were determined with hierarchical clustering and the dynamic tree cut procedure, and each of the modules was marked by a color. In all, 24 different modules resulted from WGCNA with the clustering dendrograms of genes shown in Figure 2C. Merging of modules with similar expression profiles was completed with a cut height of 0.25, corresponding to a 0.75 correlation; however, this did not reduce the number of modules. The darkgrey module was the smallest, with 100 genes, while the turquoise and blue modules were the largest, with 2,005 and 1,742 genes, respectively (Table S1). All genes were assigned, and no genes were grouped in gray (i.e., the module reserved for genes with no assigned classification).
Figure 2

Weighted gene co-expression network analysis (WGCNA) of primary tumors from colon adenocarcinoma

A range of soft-thresholding power (β) was screened within the WGCNA pipeline (A), and the plot shows the scale-free topology fit index for different β powers. The mean connectivity for various soft-thresholding powers is also provided (B). Hierarchical clustering and module assignment was completed based on a topological overlap matrix to calculate the corresponding dissimilarity (dissTOM). In (C), the connected genes were grouped into colored modules with the dynamic tree cut method. Highly correlated modules were merged based on a cut height of 0.25.

Weighted gene co-expression network analysis (WGCNA) of primary tumors from colon adenocarcinoma A range of soft-thresholding power (β) was screened within the WGCNA pipeline (A), and the plot shows the scale-free topology fit index for different β powers. The mean connectivity for various soft-thresholding powers is also provided (B). Hierarchical clustering and module assignment was completed based on a topological overlap matrix to calculate the corresponding dissimilarity (dissTOM). In (C), the connected genes were grouped into colored modules with the dynamic tree cut method. Highly correlated modules were merged based on a cut height of 0.25. The clinical features of patients colon adenocarcinoma, including lymphatic invasion, venous invasion, and perineural invasion occurrence, as well as pathological stage and pathological T (tumor, T), pathological N (node, N), and pathological M (metastasis, M) staging categories were extracted from the TCGA dataset and then related to the module eigengenes (MEs). As shown in Figure 3, MEdarkred was correlated with lymphatic invasion (r = 0.13, p = 0.04) and venous invasion (r = 0.14, p = 0.04). Additional correlations were observed for MEdarkred with pathology N (r = 0.14, p = 0.03) and pathological stage (r = 0.17, p = 0.01). Notably, the darkred module contained DAPK3 (death-associated protein kinase 3) along with 150 other genes (Table S2). DAPK3 expression is attenuated in several squamous cell carcinomas (Kake et al., 2017; Das et al., 2016; Li et al., 2015; Kocher et al., 2015; Bi et al., 2009). Moreover, loss-of-function mutations or deletion of DAPK3 promote increased cell survival, proliferation, cellular aggregation, and increased resistance to chemotherapy (Brognard et al., 2011). Considering this evidence, DAPK3 is now implicated as a tumor-suppressing kinase.
Figure 3

Module-trait relationship matrix correlating MEs with clinical traits

For the module-trait relationships, each row in the matrix corresponds to a module eigengene (ME) and each column to a clinical trait. Each cell contains the corresponding correlation with the p values of the correlations in parentheses. Each module-trait relationship is color-coded by correlation as indicated in the color legend on the right; blue indicates a negative correlation, while red represents a positive one. The MEdarkred (outlined in green) had the highest correlation with lymphatic invasion and contains DAPK3.

Module-trait relationship matrix correlating MEs with clinical traits For the module-trait relationships, each row in the matrix corresponds to a module eigengene (ME) and each column to a clinical trait. Each cell contains the corresponding correlation with the p values of the correlations in parentheses. Each module-trait relationship is color-coded by correlation as indicated in the color legend on the right; blue indicates a negative correlation, while red represents a positive one. The MEdarkred (outlined in green) had the highest correlation with lymphatic invasion and contains DAPK3. To further validate the module-clinical trait relationships, we completed additional examination of the correlation between the gene significance (GS) and the module membership (MM) measures. Five modules possessed a greater proportion of gene members with positive significance toward lymphatic invasion, including the darkred module (r = 0.29; p = 0.0003), as well as blue, cyan, pink, and lightyellow modules (Figure 4). As the module of primary interest, darkred contained a majority of members (79%) with positive gene significance toward lymphatic invasion. MEs were also used as representative profiles to assess module similarity by eigengene correlation. Construction of the eigengene network provided the identification of groups of correlated eigengenes termed meta-modules. MEdarkred, MEblack, MEmidnightblue, and MElightyellow modules were highly related. The mutual correlation of these four MEs was much stronger than their correlation with lymphatic invasion; however, lymphatic invasion was present as a part of the meta-module (Figure S1).
Figure 4

Intramodular analysis for modular membership versus gene significance

(A) The correlation between the gene significance (GS) and module membership (MM) measures was explored. Shown are the Pearson correlations for modules with p values (two-tailed) indicating significance.

(B) Scatterplots of GS versus MM for lymphatic invasion. GS is the relationship between the gene and the clinical trait, and the MM describes the correlation between the ME and the expression profile of a given gene with said module. Highlighted in green are modules with a majority of members having positive GS toward lymphatic invasion. Red highlights modules with most members having negative GS toward lymphatic invasion.

Intramodular analysis for modular membership versus gene significance (A) The correlation between the gene significance (GS) and module membership (MM) measures was explored. Shown are the Pearson correlations for modules with p values (two-tailed) indicating significance. (B) Scatterplots of GS versus MM for lymphatic invasion. GS is the relationship between the gene and the clinical trait, and the MM describes the correlation between the ME and the expression profile of a given gene with said module. Highlighted in green are modules with a majority of members having positive GS toward lymphatic invasion. Red highlights modules with most members having negative GS toward lymphatic invasion.

The gene module associated with lymphovascular invasion is linked to apoptosis and actin cytoskeleton processes

Functional enrichment analysis was conducted with topGO R on the darkred module, and the enriched GO biological processes were found to be dominated by regulation of apoptosis and cellular architecture (Figure 5A). Importantly, the significantly enriched pathways were aligned with “Actin cytoskeleton organization; GO:0030036,” the “Positive regulation of apoptotic signaling pathway; GO:2001235,” the “Regulation of mitochondrion organization; GO:0010821,” and the “Regulation of mitochondrial membrane permeability involved in apoptotic process; GO:1902108”.
Figure 5

Biomolecular characteristics of the darkred module

(A) Gene ontology (GO) enrichment analysis was conducted for genes in the darkred module. Biological process (BP) GO terms are listed along the y axis. GO terms were deemed significant when |p value| ≤ 0.01 and |total number of annotated genes| ≥ 30.

(B) A volcano plot of differentially expressed genes (DEGs, n = 8,479) derived from the TCGA and GTEx datasets and plotted as the log2-fold change differences (log2FC) between the compared samples. The upregulated (red) and downregulated (blue) genes were assessed for statistical significance using the treat method with empirical Bayes-moderated t-statistics (-log(FDR)).

(C) Six genes were revealed to be differentially expressed and be significantly associated with overall survival. Kaplan-Meier survival analysis by log-rank test was used to examine survivability with respect to high or low expression of (i) ANKRD13D, ankyrin repeat domain-containing protein 13D; (ii) DAPK3, death-associated protein kinase 3; (iii) ENKD1, enkurin domain-containing protein 1; (iv) ARRDC1, arrestin domain-containing protein 1; (v) DEF8, differentially expressed in FDCP 8 homolog; and (vi) NOL3, nucleolar protein 3.

Biomolecular characteristics of the darkred module (A) Gene ontology (GO) enrichment analysis was conducted for genes in the darkred module. Biological process (BP) GO terms are listed along the y axis. GO terms were deemed significant when |p value| ≤ 0.01 and |total number of annotated genes| ≥ 30. (B) A volcano plot of differentially expressed genes (DEGs, n = 8,479) derived from the TCGA and GTEx datasets and plotted as the log2-fold change differences (log2FC) between the compared samples. The upregulated (red) and downregulated (blue) genes were assessed for statistical significance using the treat method with empirical Bayes-moderated t-statistics (-log(FDR)). (C) Six genes were revealed to be differentially expressed and be significantly associated with overall survival. Kaplan-Meier survival analysis by log-rank test was used to examine survivability with respect to high or low expression of (i) ANKRD13D, ankyrin repeat domain-containing protein 13D; (ii) DAPK3, death-associated protein kinase 3; (iii) ENKD1, enkurin domain-containing protein 1; (iv) ARRDC1, arrestin domain-containing protein 1; (v) DEF8, differentially expressed in FDCP 8 homolog; and (vi) NOL3, nucleolar protein 3.

DAPK3 is revealed as the pseudohub gene in the module associated with lymphovascular invasion

Differential expression analysis is routinely used to interpret the biological importance of transcriptomics experiments (Crow et al., 2019). RNA-Seq data from TCGA and Genotype Tissue Expression project (GTEx) were analyzed to obtain a list of differentially expressed genes (DEGs). Based on the threshold of |log2(FC)| > 0.58 and an adjusted p value cut-off of 5%, 9,390 DEGs were identified. For a stricter definition of significance, the treat method was used to calculate the p values from empirical Bayes-moderated t-statistics with a minimum |log2(FC)| requirement (McCarthy and Smyth, 2009). The number of DEGs was reduced to a total of 8,479 when treat testing required a |log2(FC)| greater than 0.58. Among the 8,479 DEGs, 3,579 were significantly upregulated in CAC samples, whereas 4,900 were significantly downregulated in CAC samples (Figure 5B). The list of DEGs generated via the limma R pipeline was amalgamated with WGCNA-derived modules. An examination of the consolidated analyses revealed 57 of 151 (38%) genes in the darkred module to be differentially expressed. The R function coxph |survival| was used to determine the association between differential gene expression and overall survival. This analysis identified 18 of 151 (12%) genes in the darkred module that passed the |log_rank = 0.05| threshold. Of this subset, 6 of 151 genes were significantly differentially expressed and significantly associated with overall survival (Figure 5C and Table S2). As shown in Figure 6, genes within the darkred module were used as input for protein-protein interaction network construction with the STRING database. Intramodular connectivity was used to identify putative intramodular hub genes with the top five candidates having greater than 25 intramodular connections (i.e., RHOT2, mitochondrial Rho GTPase 2; RIPK3, receptor-interacting serine/threonine-protein kinase 3; SSH3, slingshot homolog 3; PIDD1, p53-induced-death-domain-containing protein 1; and TELO2, telomer length regulation protein TEL2 homolog). Importantly, none of the five hub candidates were both differentially expressed and linked to overall survival probability. Of genes that were differentially expressed and were significant for overall survival, DAPK3 was revealed as the pseudo-hub gene of the network (Figure 6). It also possessed a high degree of connectivity as revealed by degree distribution and closeness centrality (Table S3). In addition to DAPK3, ANKRD13D (ankyrin-repeat-domain-containing protein 13D), ENKD1 (enkurin-domain-containing protein 1), DEF8 (differentially expressed in FDCP 8 homolog), NOL3 (nucleolar protein 3), and ARRDC1 (α-arrestin-domain-containing protein 1) exhibited differential expression profiles that significantly impacted overall survival probability.
Figure 6

Biomolecular interaction network constructed for DAPK3 and other gene products of the darkred module

Genes within the darkred module were used as input for protein-protein interaction network construction using the search tool for the retrieval of interacting genes/proteins (STRING). The network was visualized with Cytoscape. The node size reflects the degree of interactions (i.e., edges) linked to a given gene. The node color describes the differential expression: blue, downregulation; orange, upregulation; and white, not differentially expressed. The node border color is reflective of the association with overall survival: red, |log_rank <0.05| and blank, not significant. Genes that were differentially expressed and significantly associated with overall survival are noted in red text.

Biomolecular interaction network constructed for DAPK3 and other gene products of the darkred module Genes within the darkred module were used as input for protein-protein interaction network construction using the search tool for the retrieval of interacting genes/proteins (STRING). The network was visualized with Cytoscape. The node size reflects the degree of interactions (i.e., edges) linked to a given gene. The node color describes the differential expression: blue, downregulation; orange, upregulation; and white, not differentially expressed. The node border color is reflective of the association with overall survival: red, |log_rank <0.05| and blank, not significant. Genes that were differentially expressed and significantly associated with overall survival are noted in red text. To investigate DAPK3 expression in human colon cancer, the Toil recomputed expression of DAPK3 in GTEx and TCGA (healthy colon vs. colon adenocarcinoma, primary tumor) was juxtaposed via unpaired t test with Welch's correction. As shown, the level of DAPK3 expression is significantly downregulated (Figure 7A; −0.825 ± 0.089, p < 0.0001) in colon adenocarcinoma. A subsequent analysis of DAPK3 expression (healthy colon vs. colon adenocarcinoma, primary tumor; unified by Wang et al., 2018) generated consistent results and verified reduced DAPK3 expression in colon cancer samples (Figure 7B; −0.879 ± 0.048, p < 0.0001). Thus, the gene expression analysis further substantiates DAPK3 as a candidate tumor suppressor gene. Survival data were also retrieved as part of the UCSC dataset downloaded from Xena and subjected to Kaplan-Meier analysis to determine the significance of DAPK3 expression on survival probability in patients with colon adenocarcinoma. Figure 8A shows that high DAPK3 expression (>10.92, upper quartile) strongly correlates with poor overall survival probability when compared with low DAPK3 expression (<10.39, lower quartile). However, DAPK3 expression does not make a significant impact on disease-specific survival (Figure 8B). One feasible explanation for this discrepancy is an experimentally defined role for DAPK3 in the actin filament and focal adhesion dynamics (Nehru et al., 2013), as well as cellular migration (Li et al., 2015; Komatsu and Ikebe, 2004), which may play unfavorably within the context of colon cancer metastasis. Invasion data were not available for the Toil Recomputed Compendium; however, an examination of DAPK3 expression versus invasion for the ‘TCGA Colon Cancer’ dataset via Xena does reveal correlation between DAPK3 expression and metastatic invasion occurrence. In this case, incidents of venous, lymphatic, and perineural invasions were paradoxically associated with higher DAPK3 expression (Figure 8C).
Figure 7

DAPK3 expression in the colon of healthy normal and primary colonic adenocarcinoma tumors

The expression of DAPK3 in the healthy colon (GTEx) and colon adenocarcinoma primary tumor (TCGA) was examined in UCSC Toil RNA-Seq Recompute Compendium (A) and MSKCC unified (B) datasets. Statistical significance was identified by Welch's t test for two independent samples of unequal variance. Data are presented as a violin plot showing the median (dashed line) along with the upper and lower limits of the interquartile range between the 25th and 75th percentiles (dotted lines). The n values are indicated on the figure. TCGA, The Cancer Genome Atlas.

Figure 8

High DAPK3 expression is associated with poor overall survival of patients with colon adenocarcinoma

Overall survival data (A) and disease-specific survival data (B) were retrieved as part of the UCSC Toil Recompute dataset and subjected to Kaplan-Meier analyses. In (C), a comparison of DAPK3 expression in patients with colon adenocarcinoma with or without incidents of venous (i), lymphatic (ii), or perineural (iii) invasion was completed using data retrieved from the Xena TCGA data hub (https://tcga.xenahubs.net). Statistical significance was identified by Welch's t test for two independent samples of unequal variance (n = 245 (venous), 251 (lymphatic), and 171 (perineural)). Data are presented as boxplots showing the median (solid line) along with the upper and lower limits of the interquartile range between the 25th and 75th percentiles (box) and Tukey whiskers. TCGA, The Cancer Genome Atlas.

DAPK3 expression in the colon of healthy normal and primary colonic adenocarcinoma tumors The expression of DAPK3 in the healthy colon (GTEx) and colon adenocarcinoma primary tumor (TCGA) was examined in UCSC Toil RNA-Seq Recompute Compendium (A) and MSKCC unified (B) datasets. Statistical significance was identified by Welch's t test for two independent samples of unequal variance. Data are presented as a violin plot showing the median (dashed line) along with the upper and lower limits of the interquartile range between the 25th and 75th percentiles (dotted lines). The n values are indicated on the figure. TCGA, The Cancer Genome Atlas. High DAPK3 expression is associated with poor overall survival of patients with colon adenocarcinoma Overall survival data (A) and disease-specific survival data (B) were retrieved as part of the UCSC Toil Recompute dataset and subjected to Kaplan-Meier analyses. In (C), a comparison of DAPK3 expression in patients with colon adenocarcinoma with or without incidents of venous (i), lymphatic (ii), or perineural (iii) invasion was completed using data retrieved from the Xena TCGA data hub (https://tcga.xenahubs.net). Statistical significance was identified by Welch's t test for two independent samples of unequal variance (n = 245 (venous), 251 (lymphatic), and 171 (perineural)). Data are presented as boxplots showing the median (solid line) along with the upper and lower limits of the interquartile range between the 25th and 75th percentiles (box) and Tukey whiskers. TCGA, The Cancer Genome Atlas.

Differential correlation of DAPK3 in normal and tumor samples revealed loss of correlation with genes linked to cell migration and cell adhesion

Genes that are functionally related tend to have similar expression profile; therefore, differential gene correlation analysis (DGCA) comparing expression correlation in normal and disease samples can illuminate DAPK3-dependent biological processes and/or molecular pathways that may be distinctly involved between the two states. DGCA was used to compare gene expression correlation in normal and colon adenocarcinoma disease samples, and the outcomes were sorted into nine possible categories (Figure S2). Of the 8,528 protein-coding genes surveyed, 8,222 (93%) were found to be differentially correlated with DAPK3 between healthy controls and colon cancer conditions. When compared against the GTEx normal healthy population, DAPK3 lost correlation with 46% of the genes surveyed and gained correlation with 47%. Finally, 6% of the surveyed genes had no significant correlation in either condition (healthy colon or colon adenocarcinoma) or had nonsignificant differential correlation. GO enrichment analysis was used to identify the biological processes and/or molecular pathways wherein differential correlation of DAPK3 was implicated (Figure 9). DAPK3 differential correlations changed most within the biological process of “Organic acid catabolic process; GO:0016054,” “Cell projection assembly; GO:0030031, and “Muscle organ development; GO:0007517” and the molecular pathways of “Endoribonuclease activity; GO:0004521” and “Extracellular matrix structural constituent; GO:0005201.”
Figure 9

Differential gene correlation analysis of DAPK3-dependent biological processes

DGCA R was used to compare gene expression correlation in normal and colon adenocarcinoma disease samples. GO enrichment analysis was completed using |minSize| = 30, |p value cutoff| = 0.01, and |filterSigThresh| = 0.01. GO, gene ontology.

Differential gene correlation analysis of DAPK3-dependent biological processes DGCA R was used to compare gene expression correlation in normal and colon adenocarcinoma disease samples. GO enrichment analysis was completed using |minSize| = 30, |p value cutoff| = 0.01, and |filterSigThresh| = 0.01. GO, gene ontology.

Discussion

Several patient and disease characteristics are known to affect survival of patients with colon cancer, including age, sex, primary tumor location, tumor grade, and lymph node involvement (Dienstmann et al., 2019; Weiser et al., 2011). In addition, LVI and PNI are histopathological features associated with higher risk colon cancer (Skancke et al., 2019; Mutabdzic et al., 2019; Cienfuegos et al., 2017; Liebig et al., 2009). However, the molecular biomarkers that distinguish between normal and invasive colon cancer are not well described. In this study, we used the WGCNA method to evaluate the system-level functionality of genes with clinical invasion features of colon adenocarcinoma. The power of WGCNA lies in its ability to reveal gene modules (i.e., gene co-expression networks) and identify the central players (i.e., hub genes) within modules that are usually related to biological functions (Li et al., 2018; Langfelder and Horvath, 2008; Zhang and Horvath, 2005). RNA-Seq data and clinical information obtained from the TCGA database were included in the WGCNA to identify robust co-expression modules associated with tumor invasion characteristics. A total of 24 distinct gene co-expression modules that included 9,750 protein-coding genes were identified from primary tumors of patients with colon cancer. After examining the associations between the modules and clinical invasion traits, darkred was identified to be a clinically significant gene module that required further interrogation. Significantly the module was linked with positive gene correlation toward lymphatic and venous invasion as well as pathological stage. The 151 genes within this module were considered to possess functional associations with each other and, thus, could potentially identify important molecular pathways and hub genes that could advance understanding, detection, and/or treatment opportunities for colon cancer. Functional annotation of the darkred module revealed a core set of genes that were involved in the “Actin cytoskeleton pathway.” Moreover, the enrichment of GO terms “Cell projection assembly” and “Extracellular matrix structural constituent” in differential correlations identified for DAPK3 are of particular interest. These terms are associated with the molecular dynamics of cell migration and adhesion, which are particularly important to the process of metastasis. The cytoskeleton, especially the contractile tension generated by actomyosin, is also inherent to the pathogenesis and metastasis of tumors. Indeed, there is growing appreciation that a wide range of cellular activities that are highly relevant to tumorigenesis are dependent on the composition and organization of the cytoskeletal architecture, including tumor cell migration and invasion (Fletcher and Mullins, 2010; Vasiliev, 2004), epithelial-mesenchymal transformation (Rana et al., 2018; Lamouille et al., 2014), nuclear dysmorphia and genome stability (Takaki et al., 2017; Irianto et al., 2016; Chow et al., 2012), and tumor cell survival under hemodynamic shear stress (Xin et al., 2019). Furthermore, genes in the darkred module were also enriched in processes related to “Positive regulation of apoptotic signaling pathway” and “Regulation of mitochondrial membrane permeability involved in apoptotic process.” It is well known that the attenuation of apoptotic signaling is a hallmark of tumor biology (Zhang and Yu, 2013; Yang et al., 2009), and a plethora of alterations have been revealed in key apoptotic pathways that increase tumor cell survival and reduce the efficacy of chemotherapy. Enhanced invasiveness is also observed in cancer cells that experience failed apoptosis (Berthenet et al., 2020) or incomplete mitochondrial outer membrane permeabilization (Ichim and Tait, 2016). When the genes in the darkred module were mapped using STRING, DAPK3 was identified as a pseudohub gene. However, RHOT2 (mitochondrial Rho GTPase 2; also known as MIRO2) was the true core hub based on node degree when the (i) differential expression or (ii) Kaplan-Meier overall survival estimate was disregarded. RHOT2 is a mitochondrial GTPase involved in mitochondrial trafficking and serves as a docking site for parkin (PRKN)-mediated mitophagy. While the RhoA GTPase family is closely associated with cancer progression, there are few studies on the role of the RHOT2 protein in tumor cell movement. In one published study, however, mitochondrial trafficking to the cortical cytoskeleton and tumor cell invasion were suppressed when siRNA-mediated silencing of Miro2 in LN229 glioma cells was used under conditions of cellular stress (Caino et al., 2016). The RHOT2 network also included ENKD1, ARRDC1, NOL3, DAPK3, ANKRD13D, and DEF8. Each of these genes was differentially expressed in colon adenocarcinoma and was correlated with patient prognosis for overall survival analysis. The ANKRD13D protein possesses a ubiquitin-interacting motif (UIM) and forms a complex with other ANKRD13 family members to bind the Lys63-linked ubiquitin chains appended to epidermal growth factor receptor (EGFR) to regulate the internalization of ligand-activated EGFR (Mattioni et al., 2020; Tanno et al., 2012). ANKRD13 has also been used in a panel of DNA methylation markers to identify colorectal cancer (CRC) in cell-free DNA samples obtained from patients (Cho et al., 2020). The ARDDC1 protein controls the generation of endosomal microvesicles that bud from the plasma membrane (Nabhan et al., 2012), a type of protein cargo (Anand et al., 2018), and, consequently, signal transduction by receptors subjected to endosomal sorting mechanisms. Other studies suggest that ARRDC1 may act as a tumor suppressor by modulating the levels of Yes-associated protein (YAP) 1 and other membrane receptors (Xiao et al., 2018). NOL3 is an RNA-binding protein that has been implicated in tumorigenesis and metastasis. NOL3 was associated with worse prognosis in CRC and was overexpressed in patients with CRC with lymph node metastasis (Zhang et al., 2020; Toth et al., 2016). Finally, the gene products of ENKD1 and DEF8 are poorly described in the literature, so it is unclear how these gene products may act to impact tumor cells. The emergence of DAPK3 as a pseudohub gene in the darkred network with differential expression and significant linkage to survival outcome was particularly interesting. DAPK3 (also known as zipper-interacting protein kinase, ZIPK) is a serine/threonine protein kinase that acts as a molecular switch for a multitude of cellular processes, including the induction of apoptotic and autophagic cell death, cell proliferation, actomyosin contraction, and cellular migration (Shiloh et al., 2014). The proapoptotic influence of DAPK3 is aligned with its association with promyelocytic leukemia (PML) oncogenic domains, nuclear structures implicated in transcription of regulation of apoptotic factors (Kawai et al., 2003). A marked drop in basal apoptosis is observed at the polyp-to-adenoma stage of colon cancer, suggesting that resistance to apoptotic programming is acquired early in tumorigenesis (Termuhlen et al., 2002). The reduced expression of DAPK3, as a tumor-suppressing kinase, could provide a means for metastatic colon cancer cells to exhibit resistance to particular pathways of apoptosis. In addition, DAPK is associated with autophagic cell death; DAPK3 interaction with autophagy-related (ATG)-1 protein kinase allows for regulation of autophagosome formation via actomyosin activation (Tang et al., 2011). DAPK3 was also found to phosphorylate and increase the activity of Unc-51-like autophagy activating kinase (ULK1; the mammalian homolog of ATG1), thereby driving autophagy induction (Li et al., 2021). Further to this finding, the co-expression of ULK1 and DAPK3 was correlated with favorable survival outcomes in patients with gastric cancer (Li et al., 2021). DAPK3 promotes actin reorganization and actomyosin contraction by controlling the phosphorylation of myosin-regulatory light chain (Moffat et al., 2011) which has a broad range of cellular impacts, including cell migration, focal adhesion regulation, stress fiber bundling, and autophagic cell death. In addition, DAPK3 can attenuate myosin light chain phosphatase activity (Borman et al., 2002; MacDonald et al., 2001), and the loss of myosin phosphatase was further linked to cancer cell nuclear dysmorphia and genome instability (Takaki et al., 2017). Although DAPK3 displays tumor suppressor properties (Li et al., 2015, 2021; Das et al., 2016; Kocher et al., 2015; Brognard et al., 2011; Bi et al., 2009), it also plays an oncogenic role with regulation of transcriptional and translational programs that are tightly connected to cell survival, proliferation, and growth. In this regard, DAPK3 promotes epithelial-mesenchymal transition (Kake et al., 2017; Li et al., 2015). DAPK3 also provides transcriptional regulation of canonical Wnt/β-catenin signaling through an interaction with Nemo-like kinase to drive enhanced proliferation of colon carcinoma cells (Togi et al., 2011). Additionally, DAPK3 can directly enhance the transcriptional activity of both STAT3 and the DNA-binding androgen receptor (Felten et al., 2013; Sato et al., 2006). As previously hypothesized by Kake et al (Kake et al., 2017), DAPK3 may change from a tumor suppressor into a tumor promoter during the course of tumor progression. In the present study, we show that DAPK3 expression is reduced in samples obtained from patients with colon cancer; however, there is significant correlation for DAPK3 association with LVI and PNI properties and worse survival probability. Further investigations will be required to solve the incongruity observed with respect to the proapoptotic role of DAPK3 in tumor suppression versus its proliferative influence on tumorigenesis. TCGA database is frequently used to uncover molecular mechanisms and potential biomarkers associated with CRC progression and prognosis. For instance, Liu et al utilized TCGA to show that the prognostic power of the cancerous inhibitor of protein phosphatase 2A (CIP2A) was linked to activating transcription factor 6 (ATF6)-dependent ER stress signaling in CRC (Liu et al., 2018). By the same token, V-set and transmembrane-domain-containing 2A (VSTM2A) was identified as one of the most significantly downregulated secretory proteins in CRC, with low expression levels equating to worse overall survival in patients with CRC by way of Wnt signaling suppression (Dong et al., 2019). As well, the TCGA database was used by Taha-Mehlitz et al to support their discovery of adenylosuccinate lyase (ADSL) as a contributor to mitochondrial dysfunction within the context of CRC (Taha-Mehlitz et al., 2021). Finally, TCGA has facilitated the construction of prognostic indices (Hou et al., 2018) and nomograms (Lee et al., 2019) for CRC which outperform conventional prediction tools that are based solely on clinicopathological risk factors such as age and Tumor, Node, Metastasis (TNM) staging. WGCNA of TCGA data has been used in multiple studies to derive system-level properties, to identify genes and/or gene modules associated with specific clinicopathological variables, or to relate prespecified genes and/or gene sets to pathways or functions within the context of CRC. Application of WGCNA to TCGA data by Wang et al revealed patients with CRC with median immunity to have enhanced activation of TGF-β, vascular endothelial growth factor (VEGF), and MAPK signaling pathways and have better survival outcome; in contrast, a low-immunity subgroup displayed the worst prognosis (Wang et al., 2020). Similarly, Su et al found that higher neutrophil scores associate with poorer prognosis and that a SOD2-CXCL8 neutrophil recruitment axis may play a role in CRC progression (Su et al., 2021). Recently, Jiang et al combined the power of TCGA and WGCNA to investigate the oncogenic functions of B-cell lymphoma 9 (BCL9). This investigation uncovered a β-catenin-independent function of BCL9 in a poor-prognosis subtype of CRC tumors that enhances tumor progression through its interaction with paraspeckle proteins (Jiang et al., 2020). Some studies that utilize WGCNA to identify modules of highly correlated biomolecules restrict the data input to include only differentially expressed transcripts. A study by Zhou et al. constructed miRNA-gene interaction networks, and the authors subsequently identified two important gene modules with 11 of the 20 hub genes demonstrating prognostic value in survival analyses (Zhou et al., 2018). Although not the focus of the study, LVI was associated with both gene modules and subsequent KEGG analysis (i.e., Kyoto Encyclopedia of Genes and Genomes) identified extracellular matrix (ECM)-receptor interaction, focal adhesion, and vascular smooth muscle contraction as significantly enriched pathways. A study by Wu et al. cross-referenced DEGs with target genes for differentially expressed miRNAs from which GCNT4, EDN2, and miR-1295 were observed as network hubs (Wu et al., 2017). Given the theoretical distinction between differential expression and differential correlation and the process by which WGCNA determines intramodular hub genes, filtering genes by differential expression before WGCNA can result in loss of information. This information is pertinent to gene network construction, the loss of which may negatively impact the integrity of the co-expression network analysis (Langfelder and Horvath, 2008). Nevertheless, differential gene expression is a valuable component in network analyses that look to pinpoint candidate biomarkers, therapeutic targets, and/or gene signatures for diagnosis or prognosis. One viable method of integrating differential gene expression into a co-expression network that is constructed with all genes (i.e., DEGs and non-DEGs) is to project the DEGs post hoc onto said network after the identification of gene modules. An example of this approach is apparent in the study conducted by Jiang et al, where candidate BCL9-interacting proteins and genes downstream of BCL9 were projected onto a WGCNA correlation matrix constructed with all genes (Jiang et al., 2020). The authors found that most of the genes downstream of BCL9, but not the BCL9-interacting proteins, mapped into specific gene modules involved in processes such as ECM remodeling, neuron differentiation, and wound healing. Similarly, we applied the WGCNA method to systematically identify co-expression network modules associated with lymphatic invasion in colon adenocarcinoma. Our co-expression network was constructed using all genes (minus those removed during quality control), and the DEGs were then projected onto the gene module with the highest significant correlation with LVI. Finally, for our differential expression analysis, we compared TCGA tumor samples against GTEx normal samples that were reanalyzed using the same RNA-Seq pipeline to eliminate batch effects (Vivian et al., 2017). This eliminates potential noises of tumor microenvironments stemming from the proximity of “TCGA solid tissue normal” to “TCGA primary tumor” samples. To our knowledge, no previous published studies have investigated LVI in colon adenocarcinoma utilizing the methods described in the present study.

Limitations of the study

The presented analyses were completed using the TCGA and GTEx RNA-Seq datasets. Although these datasets excel at having sufficient observations for statistically sound correlation studies, similar analyses need to be repeated on additional datasets to confirm our findings. The described networks characterized a core gene set associated with invasion and survival of tumor bulk tissues. Given that genes may demonstrate diverse functions across different cell types, gene sets identified from averaged dataset need to be reexamined in a cell-specific manner for the identification of susceptible cell types and converged pathways among different cells. The described networks were based on the analysis of gene-to-gene correlations, which did not indicate causal relationships.

STAR★Methods

Key resources table

Resource availability

Lead contact

Requests for further information should be directed to and will be fulfilled by the lead contact, Justin A. MacDonald (jmacdo@ucalgary.ca).

Materials availability

This study did not generate new unique reagents.

Data and code availability

Data This study analyzes existing, publicly available RNA-Seq data. The sources for the datasets are listed in the key resources table. Code This study does not report original code. All codes were used in this study in alignment with recommendations made by authors of R packages in their respective user’s guide, which can be accessed at https://bioconductor.org. Additional information requests Any additional information required to reanalyze the data used in this study is available from the lead contact upon request.

Experimental model and subject details

RNA sequencing data (RNA-Seq) for the GTEx (https://gtexportal.org/home/) and TCGA (https://portal.gdc.cancer.gov) were retrieved from the Xena Toil RNA-Seq Recompute Compendium (https://toil.xenahubs.net) on 30 November 2020. Although TCGA includes normal samples, these “solid tissue normal” samples are derived from normal tissues located proximal to the tumor. Consequently, they may also possess tumor transcriptomic profiles. In contrast, samples from the GTEx project provide expression data from the normal tissue of healthy, cancer-free individuals. The datasets from the two projects cannot be compared directly, so the “TCGA TARGET GTEx study” with samples restricted to |Primary_site| = “colon” was retrieved from the Xena Toil data hub where TCGA, TARGET, and GTEx samples were reanalyzed (i.e., realigned to hg38 genome with expressions defined using RSEM and Kallisto methods) by the same RNA-Seq pipeline. The Toil Compendium provides recomputed TCGA and GTEx expression raw data based on a uniform bioinformatic/RNA-Seq pipeline to eliminate batch effect due to different computational processing (Vivian et al., 2017). The Xena Toil pipeline utilized STAR for uniform realignment (Dobin et al., 2013); RSEM and Kallisto were used to produce quantification data (Li and Dewey, 2011). The realignment also enabled the removal of degraded samples before quantification and batch effect correction after quantification. In addition, TCGA survival data and COAD clinicalMatrix information of patients with colon cancer were retrieved from the Xena TCGA data hub (https://tcga.xenahubs.net). The Xena browser (http://xena.ucsc.edu), hosted by the Computational Genomics Lab of the University of California Santa Cruz (UCSC; https://cglgenomics.ucsc.edu), was utilized to compare gene expressions (Goldman et al., 2020). Data retrieval was completed via the UCSCXenaTools R package.

Method details

Data preprocessing

The log-transformed expected count retrieved from Xena Toil was back-transformed via the |round(((2ˆx)-1),0)| function. The data were further refined to include only “Normal Tissue” from the GTEx gene sets and only “Primary Tumor” with |histological_type| = “Colon Adenocarcinoma” from the TCGA gene sets. TCGA and GTEx datasets were filtered to exclude non-protein-coding genes. The TARGET dataset was excluded to permit focus on adult samples. Data from metastatic and recurrent tumors were also excluded owing to small sample sizes. After these preprocessing steps, the dataset included 551 patient samples and 18,205 genes. Data processing was completed using the R (v4.0.2) programming language. All codes used in this study align with recommendations made by authors of R packages in their respective user’s guide, which can be accessed at https://bioconductor.org.

Identification of DEGs

The limma workflow was used to detect DEGs between cancer and normal samples (Robinson et al., 2010; Ritchie et al., 2015; Law et al., 2016). The input data (18,205 genes, 551 samples) were screened with function |filterByExpr.default| to eliminate outliers and lowly expressed genes. Scaling factors were computed with function |calcNormFactors.upperquartile|. The final DGEList object contained 15,164 genes and 547 samples (303 GTEx, 244 TCGA). Heteroscedasticity was removed via function |voom|. Expression values of TCGA were then compared with that of GTEx through linear modeling; namely, the limma functions |lmFit|, |contrasts.fit|, and |eBayes|. All functions were operated with default settings. The transformed expression values, generated by |voom|, were exported for subsequent analyses.

Weighted gene co-expression network analysis

To identify candidate biomarkers for lymphatic invasion within the colon adenocarcinoma subset, we applied WGCNA to voom-transformed expression values (Langfelder and Horvath, 2008). Genes with negative normalized expression levels were removed, and then, the WGCNA function |goodSamplesGenes| was used to identify genes with zero variance. Sample outliers were detected via function |hclust| and then removed with the WGCNA function |cutreeStatic.minSize=10|. The data input for WGCNA consisted of 231 TCGA samples and 9,750 genes. The network topology was calculated for a range of soft-thresholding power (β). Then, the scale-free topology fit index was plotted as a function of the soft-thresholding power, and the mean connectivity as a function of the soft-thresholding power was determined. Although β = 9 was the lowest power for which the scale-free topology fit index reached an R2 value of 0.90, β = 10 was ultimately used as it was recommended as the minimum soft-threshold power to be used for a signed network (Langfelder and Horvath, 2008). The co-expression similarity was raised to achieve scale-free topology with this soft-thresholding power. Co-expression similarity and adjacency were calculated for a signed network, transformed into a topological overlap matrix (TOM) to calculate the corresponding dissimilarity (dissTOM) which in turn was used to conduct hierarchical clustering. Module identification was completed with the function |cutreeDynamic| with method=hybrid, minClusterSize=100, deepSplit=4, pamStage=TRUE, and pamRespectDendro=FALSE. Any highly similar modules with the height of clustering lower than 0.25 were merged. A clustering dendrogram was used to display the results.

Generation of module-trait relationships and pathway enrichment analysis of genes

Correlations between modules generated by WGCNA and clinical parameters were determined by module-trait relationship (MTR) analysis. MEs were calculated and related to clinical features via standard WGCNA pipeline. GO enrichment analysis was completed with topGO [algorithm=elim, statistic=fisher] to facilitate biological interpretation (Alexa et al., 2006). Finally, intramodular connectivity was analyzed using genes within the darkred module as search inputs. STRING (Doncheva et al., 2019) and the Cytoscape plugin NetworkAnalyzer (Shannon et al., 2003) were used for computation of topological parameters for an undirected network with protein query as the data source.

Survival analysis

Survival data were retrieved as part of the TCGA dataset downloaded from Xena and were subjected to Kaplan-Meier analysis in R. The Cox proportional-hazards model was used to investigate the association between survival time and candidate biomarkers for lymphatic invasion. Z-scale cut-offs were set at ≥ 0.647 for high expression and ≤ 0.647 for low expression. Survival time cut-off was set at 10 years. The coxph R function was used in combination with the |log-rank| test for comparison of the high/low survival curves. Survival curves were plotted to show the differences in patient mortality between high- and low-expression groups.

Differential gene correlation analysis

Using the R package DGCA (McKenzie et al., 2016), the differential correlation between DAPK3 and a filtered list of human protein-coding genes (8,528 genes; exclusion criteria: bottom 25th percentile of median expression and/or dispersion index of expression) was computed with the |ddCorAll| function with [nPerms=100]. This pipeline provided the Pearson coefficient (r) and the corresponding p values for each pair of genes across samples (n=547). Significant changes in differential correlation between the two conditions (healthy colon vs. colon adenocarcinoma) were then identified using a Fisher’s Z-test. GO term enrichment analysis of differential correlation-classified genes was completed with the DGCA function |ddcorGO| with [adjusted=TRUE, conditional=TRUE, calculateVariance=TRUE].

Quantification and statistical analyses

Statistical analyses were performed within specific R packages. For DGCA, the pipeline provided Pearson coefficient (r) and the corresponding p values for each pair of genes across samples (n=547). Significant changes in differential correlation between the two conditions (healthy colon vs. colon adenocarcinoma) were then identified using a Fisher’s Z-test. For survival analyses, the Kaplan-Meier method was used to estimate the survival cure, univariate Cox analyses was used to compute hazard ratios, and outputs from log-rank testing was used to describe overall significance of the model. For the differential expression of genes analysis, the treat method with a nonparametric empirical Bayes approach for the analysis of factorial data provide a paired t-test for every gene within the limma R environment (McCarthy and Smyth, 2009). For the comparison of DAPK3 expression levels, Welch’s t-test for two independent samples of unequal variance was used. Data were plotted using GraphPad Prism 8.3.0 (GraphPad Software, La Jolla, CA). DAPK3 expression levels are presented graphically with median values along with the upper and lower limits of the interquartile range (IQR) between the 25th and 75th percentiles. The statistical tests used to assess experimental results are provided within the figure legends. Statistical test results (i.e., p values) are provided for all comparisons in figures, and p values < 0.05 were considered significant. The ‘n’ value represents the number of patient samples obtained from the RNA-Seq datasets and is marked within the figure or in the figure legend. As detailed earlier, RNA-Seq data were filtered to include only samples from adults and to exclude non-protein-coding genes; in addition, data for metastatic and recurrent tumors were excluded owing to small sample sizes.
REAGENT or RESOURCESOURCEIDENTIFIER
Deposited data

TCGA TARGET GTEx studyXena Toil RNA-Seq Recompute Compendiumhttps://toil.xenahubs.net
TCGA survival dataXena TCGA data hubhttps://tcga.xenahubs.net
COAD clinicalMatrix dataXena TCGA data hubhttps://tcga.xenahubs.net

Software and algorithms

R (v4.0.2)The R Projecthttps://www.r-project.org
RStudio (1.3.1093-1)RStudio Teamhttp://www.rstudio.com
Cytoscape (v3.8.2)Shannon et al. (2003)https://cytoscape.org
Limma (v3.48.0)Ritchie et al. (2015)https://bioconductor.org
stringAppDoncheva et al. (2019)http://apps.cytoscape.org/apps/stringapp
topGO (release 3.13)Alexa et al. (2006)https://bioconductor.org
Xena browserComputational Genomics Lab of the University of California Santa Cruzhttp://xena.ucsc.edu
ggplot2 (v3.3.3)Wickham (2016)http://ggplot2.tidyverse.org
DGCA (v1.0.2)McKenzie et al. (2016)https://cran.rproject.org/web/packages/DGCA/index.html
  1 in total

1.  Network analysis of TCGA and GTEx gene expression datasets for identification of trait-associated biomarkers in human cancer.

Authors:  Huey-Miin Chen; Justin A MacDonald
Journal:  STAR Protoc       Date:  2022-02-07
  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.