Literature DB >> 29541392

Discovery and validation of a glioblastoma co-expressed gene module.

Leland J Dunwoodie1, William L Poehlman1, Stephen P Ficklin2, Frank Alexander Feltus1.   

Abstract

Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.

Entities:  

Keywords:  cancer; complement system; gene co-expression networks; glioblastoma; systems biology

Year:  2018        PMID: 29541392      PMCID: PMC5834250          DOI: 10.18632/oncotarget.24228

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Glioblastoma (GBM) tumors, with an adult median survival time of 14.6 months after radiation and temozolomide therapy [1], are known for their heterogeneity, vascularization, and lethality. Even after resection, remaining tumor cells multiply and invade the surrounding parenchyma. Interestingly, primary GBM has few known risk factors [2] -- GBM affects patients across age, cultural, and socioeconomic boundaries. The discovery of relevant biomarker combinations driving GBM tumorgenicity would have therapeutic implications. There are known monogenic GBM biomarkers that include mutations in the IDH1 [3] and PDGFRα [4] loci. However, any given biomarker does not provide a complete picture of the GBM microenvironment. GBM tumors, as with other tumors, diseases, and complex traits, are controlled by a variety of genetic and epigenetic factors [5]. Thus, a systems approach is needed to fully understand the biology underlying the GBM phenotype. Fortunately, modern measurement technologies such as next-generation sequencing [6] now provide researchers a broad genomics perspective that is revealing new insights to human disease. These technologies, coupled with genomics and epigenomics databases such as The Cancer Genome Atlas (TCGA) [7], used in this investigation, encourage new discoveries. To identify complex gene expression relationships in multiple human tumors, we have built RNAseq-based gene expression matrices (GEMs) from publicly available RNAseq datasets. One GEM contains gene expression profiles for GBM, lower grade glioma (LGG), bladder urothelial carcinoma (BLCA), thyroid carcinoma (THCA), and ovarian serous cystadenocarcinoma (OV) from TCGA [7] and is fully described in another publication [8]. Another GEM built for this study contains different RNAseq expression profiles for GBM [9], normal brain [10], and Parkinson’s brain [11] obtained from the NCBI SRA archive [12]. Both GEMs were individually preprocessed and transformed into a gene co-expression network (GCN) using Knowledge-Independent Network Construction (KINC) software [8]. Complex gene interactions present in the input samples are mineable from these GCNs, and these gene interactions patterns can be compared between GCNs. The KINC software package ([8]; open source code available at http://www.github.com/SystemsGenetics/KINC) is unique because it deconvolutes mixed-condition expression patterns (e.g. mixed tumor expression profiles in the GEM), allowing significant co-expression relationships to be annotated with sample labels (e.g. GBM vs. non-GBM). There is no need to separate gene expression profiles prior to analysis. Thus, without providing KINC a priori knowledge, tumor- or non-tumor-specific gene expression relationships can be identified and analyzed for biological meaning. In this report, we apply a generalizable gene interaction discovery and validation workflow, outlined in Supplementary Figure 1, that allows for the detection of condition-specific gene sets in one GCN that can be validated for specificity and reproducibility in alternate GCNs. We focused this approach on GBM-specific modules to investigate GBM tumor biology. Herein, we describe the discovery and characteristics of a GBM-specific gene module present in two GCNs, and explore the expression patterns and epigenetic state of these genes across the spectrum of GBM subtypes using multiple in silico approaches.

RESULTS AND DISCUSSION

GCN construction

KINC identified significant co-expression relationships among 2016 datasets in the TCGA Network and among 204 datasets in the Brain Network. The TCGA network, obtained from our previous study in Ficklin et al. [8], included GBM and LGG datasets along with BLCA, OV, and THCA datasets. The Brain Network included GBM and normal brain datasets along with datasets from Brodmann’s Area 9 of Parkinson’s patients [11]. The TCGA Network (Supplementary Table 1) is described in [8] and the Brain Network (Supplementary Table 2) was visualized with Cytoscape [41] and shown in Figure 1. 356 LCMs were detected in the TCGA Network ([8]; Supplementary Table 3) and 456 LCMs were found in the Brain Network (Supplementary Table 4). Many of these modules were condition-specific; for example, 68 of the Brain Network’s 456 modules were enriched for GBM (p < 0.001).
Figure 1

Visualization of the brain network

Points represent transcripts (nodes) and lines represent significant expression correlations (edges) between nodes.

Visualization of the brain network

Points represent transcripts (nodes) and lines represent significant expression correlations (edges) between nodes.

Gene module discovery

Genes in the Brain Network were compared with those in the TCGA Network to investigate commonality in GBM co-expression. 477 unique genes -- 6.6% of the unique genes in the TCGA Network and 8.9% of the unique genes in the Brain Network-were found in both networks. In addition, each network was parsed into LCMs. The TCGA Network had 356 modules total and the Brain Network had 456 modules total. Considering only the genes in these modules, 74 unique genes mapped between networks. The low amount of overlap may be a function of (A) sparse gene overlap between GCNs; and (B) differential Type I and Type II error in each GCN partially due to alternate expression measurement techniques; the TCGA Network used a GEM made with TCGA’s RNAseqV2 workflow [15] and 73599 knownGene 5 UCSC transcript IDs, while the Brain Network used a GEM made with Hisat [18], Cufflinks [21], and 209086 Ensembl hg38 transcript IDs. Nonetheless, of these 74 overlapping genes, 22 (Table 1) were seen in Modules M0257 (in the Brain Network) and M0214 (in the TCGA Network). While the fate of other common interactions is unclear, these 22 intersecting GBM interactions emerged in both GCNs.
Table 1

The 22 genes overlapping between TCGA M0214 and brain M0257

Gene SymbolGene Namehg38 Ensembl IDkg5 UCSC IDEntrez IDChromosomeTranscription StartTranscription StopGene Friends
LAPTM5lysosomal protein transmembrane 5 ENST00000294507uc002iop.178051307324693075782019
C1QAcomplement C1q A chain ENST00000374642uc001bfy.27121226365062263960810
FCER1GFc fragment of IgE receptor IgENST00000367992uc001bga.32207116121527916122069922
C1QCcomplement C1q C chainENST00000374639uc001qtv.17141226436332264811011
CD86CD86 molecule ENST00000330540uc002jkv.2942312205537412212113622
HAVCR2 (TIM-3) [23]hepatitis A virus cellular receptor 2ENST00000307851uc003eet.284868515708583215710971421
LY86lymphocyte antigen 86ENST00000379953uc001fyz.1945066588108665498316
TREM2triggering receptor expressed on myeloid cells 2ENST00000373113uc001nym.254209641158507411631769
FERMT3fermitin family member 3ENST00000345728uc001xvv.28370611642067346422388619
SPI1Spi-1 proto-oncogeneENST00000378538uc003lwk.1668811473548604737857619
C3AR1complement C3a receptor 1ENST00000307637uc002zgf.3719128058302806647121
GPR65G protein-coupled receptor 65ENST00000267549uc001bsc.2847714880051248801481117
RNASE6ribonuclease A family member k6ENST00000304677uc003mwy.11004814207810512078246722
ABI3ABI family member 3 ENST00000225941uc002mkg.25122517492102274922322521
CD300ACD300a molecule ENST00000360141uc011aqf.11131417744664167448479619
TYROBPTYRO protein tyrosine kinase binding protein ENST00000262629uc001vye.3730519359044103590829522
SIGLEC9sialic acid binding Ig like lectin 9 ENST00000250360uc004euu.22718019511249085113031019
MYO1Fmyosin IFENST00000613525uc002pvu.24542198520797857757718
ITGB2integrin subunit beta 2 ENST00000397852uc001nfb.1368921448859534491082622
PARVGparvin gammaENST00000356909uc003opy.26409822441814004420663519
WASWiskott-Aldrich syndromeENST00000376701uc002ocm.212731X486837794869142719
SASH3SAM and SH3 domain containing 3 ENST00000356892uc004dkm.354440X12977998412979520119
Next, we asked if there was corroborative co-expression evidence for the 22 interacting genes from other sources. Specifically, we searched the 22 matching genes as a group in the GeneFriends gene co-expression database [27]. Table 1 shows the gene set co-expression analysis where co-expression probability is represented by the GeneFriends binomial cumulative distribution function (p < 1.00E-7). Because 22 genes were provided to GeneFriends, each gene has a maximum of 22 friends, or co-expressed genes. Interestingly, TCGA M0214 (p < 2.56E-17) and Brain M0257 (p < 1.37E-15) were both enriched for GBM but not LGG (p < 4.52E-03 in the TCGA Network) or normal brain (p < 1.00 in the Brain Network). TCGA M0214 is also enriched for ovarian cancer (OV; p < 2.56E-17). Table 2 shows the condition-specific Fisher’s Exact Test enrichment values for TCGA M0214 and Brain M0257. In total, TCGA M0214 includes 54 genes and Brain M0257 includes 63 genes (Supplementary Table 5). The clinical annotation term enrichment results are available for each TCGA module (Supplementary Table 3) and each Brain module (Supplementary Table 4).
Table 2

Condition-specific module enrichment

NetworkModuleConditionEnrichment
TCGAM0214GBM2.56E-17
TCGAM0214LGG4.52E-03
TCGAM0214OV1.61E-14
TCGAM0214THCA1.00E+00
TCGAM0214BLCA1.00E+00
BrainM0257GBM1.37E-15
BrainM0257Normal Brain1.00E+00
BrainM0257Parkinson2.05E-01
It was interesting that TCGA M0214 is enriched for both GBM and OV. While OV is enriched for 57 modules and GBM is enriched for 102 modules (Supplementary Table 3), 22 modules are enriched for both GBM and OV. While these cancers are seemingly very different, further investigations might reveal new commonalities between them. Indeed, the literature shows few links between GBM and OV, but OV can metastasize to the brain [42] and bevacizumab, an angiogenesis inhibitor used to treat GBM, has been found to alleviate mesenchymal-like, proliferative OV subtypes [43]. While TCGA M0282, described in Ficklin et al. [8], is also enriched for GBM and OV in addition to THCA, no genes are shared between TCGA M0282 and M0257. Indeed, no genes are shared between TCGA M0282 and the Brain Network.

Internetwork gene module validation shows GBM-specific correlations

The correlations in Brain M0257, which is enriched for GBM, were compared to matching correlations found by KINC using only the normal brain datasets in the Brain GEM. 70 of the 154 edges (45.45%) in Brain M0257 were rediscovered in the normal brain datasets. Of these edges, none had a significant Spearman correlation greater than 0.8801, the Brain Network’s RMT threshold (Figure 2). TCGA M0214 includes 54 unique transcripts and 416 edges correlated above its RMT threshold. 49 of these UCSC kg5 transcripts mapped to the hg38 Ensembl IDs used by the Brain and Random GEMs. As described in the Methods section, the expression values for these 49 transcripts in the Random GEM were processed with KINC [8]. KINC rediscovered 191 (45.91%) of the 416 TCGA edges using expression values from the Random GEM. Nine (4.71%) of the 191 rediscovered edges in the Random GEM had a Spearman correlation greater than 0.7901, the Random Network’s RMT threshold. Seven (4.55%) of the 154 edges in Brain M0257 were also found in TCGA M0214; 100% of these edges have a Spearman correlation greater than the Brain Network’s significance threshold (Figure 3). These data indicate that the pairwise correlation of these genes is GBM-specific.
Figure 2

Brain M0257 Correlation values mapped to normal brain datasets

The Spearman correlation values of pairwise gene expression are shown for Brain M0257 and matching edges using only normal samples from the Brain GEM.

Figure 3

Correlation values from TCGA M0214 mapped to Brain M0257 and Random GEMs

The Spearman correlation values of pairwise gene expression are shown for TCGA M0214, Brain M0257, and a miniature GEM created using expression levels from the Random GEM and the genes in TCGA M0214. Only the edges in Brain M0257 that exactly match edges in TCGA M0214 are shown.

Brain M0257 Correlation values mapped to normal brain datasets

The Spearman correlation values of pairwise gene expression are shown for Brain M0257 and matching edges using only normal samples from the Brain GEM.

Correlation values from TCGA M0214 mapped to Brain M0257 and Random GEMs

The Spearman correlation values of pairwise gene expression are shown for TCGA M0214, Brain M0257, and a miniature GEM created using expression levels from the Random GEM and the genes in TCGA M0214. Only the edges in Brain M0257 that exactly match edges in TCGA M0214 are shown.

Gene expression analysis reveals GBM-specific upregulation

We investigated the gene expression levels of the 22 genes seen in TCGA M0214 and Brain M0257 in different conditions. Heatmaps and bar graphs were constructed to visualize the expression levels of the 22 matching genes between TCGA M0214 and Brain M0257 (Figures 4A–4B and 5A–5B). All 22 genes showed significantly upregulated expression (Student’s T Test; p < 0.001) in GBM relative to LGG (in the TCGA Network) and relative to normal brain (in the Brain Network). Significance test results are available for the expression of the 22 shared genes in the TCGA GEM (Supplementary Table 6) and in the Brain GEM (Supplementary Table 7).
Figure 4

Expression levels of the 22 matching genes

(A) GBM and normal brain expression levels in the normalized Brain GEM. (B) GBM and LGG expression levels in the normalized TCGA GEM. Red indicates expression above the mean, blue indicates expression below the mean, and white indicates expression near the mean.

Figure 5

Expression levels of the 22 matching glioblastoma genes

(A) The expression of the 22 matching genes in the Brain GEM after dividing the normalized expression values by either the GBM or normal brain median expression value. Vertical axis units represent normalized FPKM values. (B) The expression of the 22 matching genes in the TCGA GEM. No further normalization was performed. Vertical axis units represent normalized RPKM values. Error bars represent SEM.

Expression levels of the 22 matching genes

(A) GBM and normal brain expression levels in the normalized Brain GEM. (B) GBM and LGG expression levels in the normalized TCGA GEM. Red indicates expression above the mean, blue indicates expression below the mean, and white indicates expression near the mean.

Expression levels of the 22 matching glioblastoma genes

(A) The expression of the 22 matching genes in the Brain GEM after dividing the normalized expression values by either the GBM or normal brain median expression value. Vertical axis units represent normalized FPKM values. (B) The expression of the 22 matching genes in the TCGA GEM. No further normalization was performed. Vertical axis units represent normalized RPKM values. Error bars represent SEM.

Putative transcriptional regulators of the GBM modules

Several gbmSygnal biclusters [37] involving genes from TCGA M0214 or Brain M0257 (Supplementary Table 8) implied regulation by the transcription factor ELF1. In addition, by loading the 22 matching genes into the Broad Institute Gene Set Enrichment Analysis [34], we found that ELF1 was the second-most enriched transcription factor for this dataset (p < 9.91 × 10–8) (Supplementary Table 9). Indeed, 5 of the 22 matching genes have putative ELF1 sites. Moreover, one ELF1 transcript, ENST00000239882, was upregulated in the Brain GEM GBM data relative to normal brain data (p < 3.79 × 10–27) (Supplementary Table 10). Furthermore, this transcript’s kg5 UCSC counterpart, uc001uxs.2, was also upregulated in the TCGA GEM GBM data relative to LGG data (p < 1.94 × 10–4) (Supplementary Table 10). In addition, one of the 22 matching genes, SPI1, is a transcription factor that shares the ETS transcription factor family with ELF1. The 22 matching genes were also provided to the transcription factor function in the GeneFriends database [27] (Supplementary Table 11). Of 1538 possible transcription factors, SPI1 was the 5th-most enriched for the gene set (p < 1.01 × 10–23) and 19 of the 22 genes were co-expressed with SPI1. In addition, the 22 matching genes were provided to the RegNetwork database [35] (Supplementary Table 12). SPI1 was the only one of the 22 matching genes to be considered a regulator by RegNetwork. Eight of the 21 other genes, three with high confidence, were considered regulated by SPI1. Indeed, of the 2221 genes potentially regulated by SPI1, three of the 22 genes ranked very highly-ITGB2 was ranked 3rd, WAS 5th, and TYROBP 27th.

Internetwork modular methylation analysis shows GBM-specific hypo-methylation

The beta methylation values for each of the 22 matching genes were evaluated using data from TCGA. A Student’s T-Test was used to compare the beta methylation values for LGG with those in GBM (p < 6.84 × 10–6). As shown in Figure 6, on average, each of the 22 genes was hypo-methylated in GBM relative to LGG. Of interest, while TCGA M0214 is enriched for GBM and OV, the methylation patterns differ between GBM and OV. While these 22 genes are hypo-methylated in GBM versus LGG datasets, several of these 22 genes are hyper-methylated in OV versus LGG datasets (data not shown), suggesting an alternate regulatory mechanism in OV.
Figure 6

Beta methylation values for the 22 matching glioblastoma genes

Error bars show SEM. Note that, because there are several methylation readings for each dataset and hundreds of datasets per condition, no error bars are visible because the SEM is appreciably zero.

Beta methylation values for the 22 matching glioblastoma genes

Error bars show SEM. Note that, because there are several methylation readings for each dataset and hundreds of datasets per condition, no error bars are visible because the SEM is appreciably zero.

GBM subtype analysis supports the mesenchymal phenotype

The gbmSygnal Network [37] uses bicluster technology to group genes based on ChIPseq signals and gene co-expression. Each of the 22 matching genes was searched in the gbmSygnal database. 16 biclusters enriched for a cancer hallmark were found with three or more of the 22 matching genes. These 16 biclusters were all enriched for “tumor-promoting inflammation” and “evading immune detection” (Supplementary Table 8). In addition, the gbmSygnal Network organizes expression data for each bicluster into quintiles and enriches each quintile for GBM subtype. Mesenchymal GBM was predominant in the highest expression quintile relative to the lowest expression quintile in each of the 16 biclusters. Furthermore, Verhaak et al. [44] described four subtypes of GBM [44] and genes upregulated in each of the four GBM subtypes. Four genes (SIGLEC9, MYO1F, LAPTM5, ITGB2) from the list of 22 matching genes were upregulated in mesenchymal GBM; none of the 22 genes were upregulated in any other subtype. Furthermore, we investigated the expression levels of nine NF1 transcripts in the Brain GEM. High NF1 expression is characteristic of mesenchymal GBM [44]. One of these transcripts, ENST00000358273, was upregulated in GBM relative to normal brain (Student’s T-Test p-value = 6.56 × 10–15; Supplementary Table 13). Finally, 17 of the 22 shared genes were found in the Glioblastoma Bio Discovery Portal [38] based on results from Verhaak et al. [44]. The average mRNA expression z-score was found across the proneural (n = 56 tumors), neural (n = 31 tumors), mesenchymal (n = 57 tumors), and classical (n = 53 tumors) GBM subtypes (Figure 7). The mesenchymal subtype showed the highest expression for 15 of 17 genes. Furthermore, using the Glioblastoma Bio Discovery Portal, it was found that above-median expression levels for these 17 genes led to decreased survival in every GBM subtype and the full cohort (Table 3).
Figure 7

Gene expression levels across GBM subtypes for 17 of the 22 matching genes

Error bars represent SEM.

Table 3

Glioblastoma bio discovery portal survival analysis

SubtypePrognostic index hazard ratioLogRank P-Value
Classical3.570
Neural15.630
Proneural4.660
Mesenchymal3.020
Full Cohort1.790

Prognostic index hazard ratio and logrank p-value are given for expression levels above the median within each subtype.

Gene expression levels across GBM subtypes for 17 of the 22 matching genes

Error bars represent SEM. Prognostic index hazard ratio and logrank p-value are given for expression levels above the median within each subtype.

cBioPortal analysis provides evidence for a GBM-specific module

If this 22-gene network is GBM-specific, as shown in Figures 4B and 5B, one would expect different co-occurrence and mutual exclusivity results for LGG and GBM data (Table 4). Indeed, this 22-gene network appears highly dependent on ATRX and p53 in LGG but not in GBM. The mutual exclusivity of PIK3R1 mutations in GBM is of particular interest; PIK3R1 knockdown decreases invasion, proliferation, and migration in GBM [45]. IDH1 mutations were neither mutually exclusive nor co-occurring with alterations in the 22 shared genes. This is consistent with Figure 7; the proneural GBM subtype, which exhibits low expression for these 22 genes, is often defined by IDH1 or PDGFRA mutation.
Table 4

Mutation enrichment in shared genes

ConditionGeneDirectionEnrichment
LGGATRXCo-occurrence2.56E-06
LGGTP53Co-occurrence5.04E-06
LGGCICMutual exclusivity1.29E-03
LGGMUC17Co-occurrence0.0122
LGGFUBP1Mutual exclusivity0.0143
LGGVPS13BCo-occurrence0.019
LGGMYH8Co-occurrence0.029
GBMKCNQ5Co-occurrence3.84E-03
GBMMUC17Co-occurrence0.0126
GBMLILRB2Co-occurrence0.016
GBMPTPRMCo-occurrence0.016
GBMPIK3R1Mutual exclusivity0.0326

Internetwork modular functional annotation analysis

We next sought to understand the function of TCGA M0214 and Brain M0257 in GBM. Comparing the functional annotations enriched (p < 0.001) in TCGA M0214 and Brain M0257 showed that 23 annotations were shared between modules (Table 5). C1QA and C1QC, components of complement protein C1 [46], and C3AR1, a receptor for the complement protein C3a [47], are among the 22 shared genes between TCGA M0214 and Brain M0257. C1Q has been shown [48] to promote GBM invasiveness and proliferation independent of complement system activation. Carro et al. also lists C1Q as a member of the transcriptional network which drives mesenchymal phenotypes in brain tumors [49]. Several complement system-related functional annotations (Table 5) are also shared between TCGA M0214 and Brain M0257. In addition, RNase6, one of the 22 matching genes, was searched with ImmuNet, a regulatory network database for immune system-related genes [36]. At a confidence level > 99%, eight of the other 21 genes shared a function with RNase6 in the complement system (Supplementary Table 14).
Table 5

Shared functional annotations in between modules

TermTerm IDFunction
GOGO:0045087Innate immune responses are defense responses mediated by germline encoded components that directly recognize components of potential pathogens.
GOGO:0050776Any process that modulates the frequency, rate or extent of the immune response, the immunological reaction of an organism to an immunogenic stimulus.
GOGO:0045650Any process that stops, prevents, or reduces the frequency, rate or extent of macrophage differentiation.
GOGO:0005581A protein complex consisting of three collagen chains assembled into a left-handed triple helix.
GOGO:0030853Any process that stops, prevents, or reduces the frequency, rate or extent of granulocyte differentiation.
GOGO:0006955Any immune system process that functions in the calibrated response of an organism to a potential internal or invasive threat.
GOGO:0034138Any series of molecular signals generated as a consequence of binding to toll-like receptor 3.
GOGO:0002283The change in morphology and behavior of a neutrophil resulting from exposure to a cytokine, chemokine, cellular ligand, or soluble factor, leading to the initiation or perpetuation of an immune response.
GOGO:0002281A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, or soluble factor, leading to the initiation or perpetuation of an immune response.
GOGO:0019864Interacting selectively and non-covalently with an immunoglobulin of an IgG isotype.
GOGO:0071404Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a low-density lipoprotein particle stimulus.
INTERPROIPR001073C1q domain
INTERPROIPR008983Tumour necrosis factor-like domain
INTERPROIPR008160Collagen triple helix repeat
INTERPROIPR013106Immunoglobulin V-set domain
KEGGhsa05322Systemic lupus erythematosus
MIM120575COMPLEMENT COMPONENT 1, q SUBCOMPONENT, C CHAIN
PFAMPF00386C1q is a subunit of the C1 enzyme complex that activates the serum complement system.
PFAMPF01391Members of this family belong to the collagen superfamily.
PFAMPF07686This domain is found in antibodies as well as neural protein P0 and CTL4 amongst others.
REACTOMER-HSA-173623Classical antibody-mediated complement activation
REACTOMER-HSA-198933Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell
REACTOMER-HSA-166663Initial triggering of complement
In conclusion, we used a GMM-based gene co-expression analysis to identify GBM-specific gene co-expression clusters embedded within and parsed from LGG and normal brain datasets. A 22-gene module was separately identified in two gene expression sources; this module has increased RNA expression and decreased DNA methylation in GBM. Furthermore, high expression of these genes is associated with decreased survival and with the mesenchymal GBM subtype. Future work involving these genes may help assess their roles in the complex GBM phenotype. We present this GBM-specific gene module and note that the cross-validating co-expression workflow used here is widely applicable.

MATERIALS AND METHODS

TCGA GEM construction

As described in Ficklin et al. [8], the TCGA GEM was constructed using 2016 RNAseq tumor samples [7]. All available normalized isoform datasets, produced by TCGA’s RNASeqV2 workflow [14, 15], were downloaded for five cancers on April 1, 2016. The datasets include 173 GBM samples, 534 lower grade glioma (LGG) samples, 427 bladder cancer (BLCA) samples, 309 ovarian cancer (OV) samples, and 572 thyroid cancer (THCA) samples. These datasets were compiled into a single GEM, which is an n x m matrix where n is the number of datasets and m is the number of RNA transcript IDs; each value represents a gene’s expression level as quantified by RSEM through the RNASeqV2 workflow [14, 15]. The TCGA data utilized 73599 knownGene version 5 (kg5) UCSC transcript IDs. As such, the raw GEM was a 73599 × 2016 matrix. Outlier expression profiles were detected using a Kolmogorov-Smirnov (KS) test (DN > 0.15) implemented in the preprocessCore [16] library. No outliers were detected. All non-zero expression values were log2 transformed and the matrix was quantile normalized.

Brain and random GEM construction

A brain-specific GEM was constructed using 220 RNAseq datasets from NCBI’s SRA database [12]. These 220 samples were the only publicly-available samples annotated as brain-specific in the SRA database upon their download on September 16, 2016. These datasets were processed into a GEM using the SRA toolkit v2.5.2 [12], Trimmomatic v0.33 [17], Hisat2-2.0.1-beta [18], Samtools v0.1.19 [19, 20], and Cufflinks v2.2.1 [21]. The Gencode v24 GFF3 file, complete with scaffolds, assembly patches, and alternate loci guided transcript quantification (http://www.gencodegenes.org/releases/24.html). The raw GEM was preprocessed through the methods described above. 16 datasets were removed by the KS test (DN > 0.15), leaving 204 samples in the log2 normalized GEM. Transcript counts were indexed as 209086 Ensembl hg38 transcript IDs resulted in a 209086 × 204 GEM. Of these samples, 38 were GBM tumor samples [9], 138 were normal brain samples [10], and 28 were from Brodmann’s Area 9 of Parkinson’s Disease patients [11]. A random human GEM was also constructed using 2004 human RNAseq datasets from the NCBI SRA database. These samples were randomly selected from all available paired-end human RNAseq datasets that were produced by an Illumina HiSeq sequencer. This GEM was constructed and preprocessed as described above. The KS test (DN > 0.15) removed 211 datasets, resulting in a 209086 × 1793 GEM. From this 209086 × 1793 preprocessed GEM, 49 transcripts mapped to the genes present in Module 0214 identified in the TCGA Network.

GCN construction and thresholding

Each normalized GEM was processed with KINC v1.0 [8], a software package that uses Gaussian mixture models (GMMs) before applying pairwise correlation analyses. For each GEM, the OSG-KINC (https://github.com/feltus/OSG-KINC) workflow was utilized to build a similarity matrix using the KINC software. This workflow utilizes the Pegasus Workflow Management System [22] to execute GMM clustering and pairwise spearman correlation on the Open Science Grid (https://www.opensciencegrid.org). By using GMMs prior to each pairwise comparison, KINC [8] samples clusters that result in co-expressed genes. Only clusters spanning 30 or more samples were further processed with Spearman correlations because a Pearson’s power analysis found that 30 samples resulted in a false positive rate at α = 0.05, a false negative at β = 0.2, and an effect size of 0.5. The Brain and TCGA Networks each took about one month to construct with KINC. Globus [24] was used to transfer KINC output files to the Palmetto Cluster at Clemson University (https://www.palmetto.clemson.edu/palmetto/). Random Matrix Thresholding (RMT) [25] was used to find a correlation significance threshold for each similarity matrix produced by KINC. The thresholding process ignored clusters with low expression levels (< 0.1 FPKM) and/or less than 30 datasets. Correlations above this experimentally determined significance threshold were extracted for each GCN. The TCGA Network, as described in Ficklin et al. [8], had a correlation threshold of 0.8601 and the Brain Network had a correlation threshold of 0.8801. Each edge in a GCN represents a relationship between two genes with a correlation value greater than the RMT-defined significance threshold. Link Community Modules (LCM), or groups of co-expressed genes, were identified using the linkcomm R package [13, 26].

TCGA transcript ID mapping

Biomart (http://useast.ensembl.org/biomart) was used to map each hg38 Ensembl transcript ID in the Brain Network to its corresponding hg38 Associated Gene Name. Using the UCSC hg19 database, the kg5 UCSC transcript IDs in the TCGA Network were mapped to kg6 and kg7 UCSC IDs. Using the UCSC hg38 database, these kg7 UCSC IDs were mapped to kg8 UCSC IDs, kg9 UCSC IDs, and finally to hg38 Ensembl IDs. Biomart was then used to map each hg38 Ensembl ID to its corresponding Associate Gene Name. 90% of the original kg5 UCSC IDs mapped to hg38 Ensembl IDs.

GeneFriends co-expression validation

The 22 matching genes between TCGA M0214 and Brain M0257 were searched as a group in the GeneFriends database [27]. The transcription factor data and the internal co-expression data for the matching genes were downloaded.

Module enrichment analysis

Each module was tested for sample label enrichment (Fisher’s Exact Test P < 0.001; e.g. LGG, GBM, OV, THCA, BLCA, Normal Brain, and Parkinson’s Brain. Only samples present in > 95% of the edges in a module were considered. Functional term enrichment was performed to associate each module with these functional annotations: Kyoto Encyclopedia of Genes and Genomes (KEGG) [28], Gene Ontology (GO) [29], Reactome [30], InterPro [31], Pfam [32], and Mendelian Inheritance in Man (MIM) [33]. A Fisher’s Exact Test p-value < 0.001 was considered significant.

Internetwork comparisons

For GBM comparisons with the Brain GEM, 63 transcripts in Brain M0257 were used to extract a 63 × 138 GEM with only normal brain datasets-no GBM datasets. This mini-GEM was processed with KINC [8] and edges matching those in Brain M0257 were identified. For cancer-specific comparisons with the random GEM, 54 transcripts in TCGA M0214 were identified in the Brain GEM and the Random GEM. 49 of the 54 kg5 UCSC transcripts in the TCGA GEM mapped to hg38 Ensembl transcripts in the Brain and Random GEMs. These 49 transcripts and their expression values were extracted into a 49 × 1793 mini-GEM that was converted into a GCN with KINC. Edges matching those in TCGA M0214 were identified. For the 22 matching genes between TCGA M0214 and Brain M0257, the normalized GBM and LGG expression values in the TCGA GEM were compared with a Student’s T-Test. The GBM and normal brain expression values in the Brain GEM were also compared with a Student’s T-Test. The Matlab clustergram function was used to generate heatmaps that represent gene expression intensities. We recognized that the Brain GEM, created from various publicly available RNAseq datasets, required further normalization within conditions; unlike the TCGA datasets, the Brain datasets were not all derived by the same research network using the same RNASeqV2 workflow. Thus, GBM expression values of the 22 matching genes were divided by the GBM global median expression value in the Brain GEM (1.1398). Similarly, the normal brain expression values of the 22 matching genes were divided by the normal brain global median expression value in the Brain GEM (1.7215). A Student’s T-Test was also used to compare ELF1 and NF1 expression levels across different conditions in the TCGA and Brain GEMs.

Module DNA methylation analysis

All publicly available beta methylation datasets for GBM and LGG were downloaded from the Genomic Data Commons’ TCGA Data Portal (https://gdc-portal.nci.nih.gov/) on January 27, 2017. 1607 datasets, including 534 LGG and 450 GBM datasets, were downloaded. The beta methylation values for the 22 matching genes across each cancer were compared with a Student’s T-Test.

External evidence for internetwork module relationships

The 22 genes matching between TCGA M0214 and Brain M0257 were queried in the Gene Set Enrichment Analysis database [34] and the resulting transcription factor data was downloaded. These 22 genes were also provided to the RegNetwork database [35] to find possible regulatory mechanisms. In addition, RNase6, one of the 22 matching genes, was searched in the ImmuNet database [36] to find functional similarities to genes in the complement cascade. The genes with ImmuNet confidence levels > 0.99 were downloaded. The gbmSygnal Network [37] was also queried and 16 biclusters with three or more of the 22 matching genes were discovered. The 22 matching genes were also provided to the Glioblastoma Bio Discovery Portal [38] using the “Verhaak Core” participants option and the “3-Platform Aggregates” experiment option. Finally, cBioPortal [39, 40] was used to analyze the 22 matching genes in patient samples; the 22 matching genes were queried twice; first, as a gene set using LGG (n = 283 tumors) data, and second, using GBM (n = 136 tumors) data from the “TCGA, Provisional” dataset. cBioPortal uses a Student’s T-Test for its reverse phase protein array comparison and a Fisher Exact Test for mutation enrichment.
  44 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

3.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-09-08       Impact factor: 6.937

4.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

5.  Cooperation of C1q receptors and integrins in C1q-mediated endothelial cell adhesion and spreading.

Authors:  Xiaodong Feng; Marcia G Tonnesen; Ellinor I B Peerschke; Berhane Ghebrehiwet
Journal:  J Immunol       Date:  2002-03-01       Impact factor: 5.422

6.  Potential risk factors for incident glioblastoma multiforme: the Honolulu Heart Program and Honolulu-Asia Aging Study.

Authors:  James S Nelson; Cecil M Burchfiel; Desta Fekedulegn; Michael E Andrew
Journal:  J Neurooncol       Date:  2012-05-17       Impact factor: 4.130

Review 7.  Platelet-derived growth factor receptor alpha in glioma: a bad seed.

Authors:  Kun-Wei Liu; Bo Hu; Shi-Yuan Cheng
Journal:  Chin J Cancer       Date:  2011-09

8.  Abrogation of PIK3CA or PIK3R1 reduces proliferation, migration, and invasion in glioblastoma multiforme cells.

Authors:  Genevieve L Weber; Marie-Odile Parat; Zev A Binder; Gary L Gallia; Gregory J Riggins
Journal:  Oncotarget       Date:  2011-11

Review 9.  From genomics to the clinic: biological and translational insights of mutant IDH1/2 in glioma.

Authors:  Gavin P Dunn; Ovidiu C Andronesi; Daniel P Cahill
Journal:  Neurosurg Focus       Date:  2013-02       Impact factor: 4.332

10.  Visualizing molecular profiles of glioblastoma with GBM-BioDP.

Authors:  Orieta Celiku; Seth Johnson; Shuping Zhao; Kevin Camphausen; Uma Shankavaram
Journal:  PLoS One       Date:  2014-07-10       Impact factor: 3.240

View more
  5 in total

1.  Linking Binary Gene Relationships to Drivers of Renal Cell Carcinoma Reveals Convergent Function in Alternate Tumor Progression Paths.

Authors:  William L Poehlman; James J Hsieh; F Alex Feltus
Journal:  Sci Rep       Date:  2019-02-27       Impact factor: 4.379

2.  Identifying Temporally Regulated Root Nodulation Biomarkers Using Time Series Gene Co-Expression Network Analysis.

Authors:  William L Poehlman; Elise L Schnabel; Suchitra A Chavan; Julia A Frugoli; Frank Alex Feltus
Journal:  Front Plant Sci       Date:  2019-10-31       Impact factor: 5.753

Review 3.  Addressing noise in co-expression network construction.

Authors:  Joshua J R Burns; Benjamin T Shealy; Mitchell S Greer; John A Hadish; Matthew T McGowan; Tyler Biggs; Melissa C Smith; F Alex Feltus; Stephen P Ficklin
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

Review 4.  The Role of Network Science in Glioblastoma.

Authors:  Marta B Lopes; Eduarda P Martins; Susana Vinga; Bruno M Costa
Journal:  Cancers (Basel)       Date:  2021-03-02       Impact factor: 6.639

5.  Integration of RNA-Seq and proteomics data identifies glioblastoma multiforme surfaceome signature.

Authors:  Saiful Effendi Syafruddin; Wan Fahmi Wan Mohamad Nazarie; Nurshahirah Ashikin Moidu; Bee Hong Soon; M Aiman Mohtar
Journal:  BMC Cancer       Date:  2021-07-23       Impact factor: 4.430

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.