Literature DB >> 33116619

Gene Expression Along with Genomic Copy Number Variation and Mutational Analysis Were Used to Develop a 9-Gene Signature for Estimating Prognosis of COAD.

Yiping Lu1, Si Wu1, Changwan Cui1, Miao Yu1, Shuang Wang1, Yuanyi Yue1, Miao Liu1, Zhengrong Sun1.   

Abstract

PURPOSE: This study aims to systematically analyze multi-omics data to explore new prognosis biomarkers in colon adenocarcinoma (COAD).
MATERIALS AND METHODS: Multi-omics data of COAD and clinical information were obtained from The Cancer Genome Atlas (TCGA). Univariate Cox analysis was used to select genes which significantly related to the overall survival. GISTIC 2.0 software was used to identify significant amplification or deletion. Mutsig 2.0 software was used to identify significant mutation genes. The 9-gene signature was screened by random forest algorithm and Cox regression analysis. GSE17538 dataset was used as an external dataset to verify the predictive ability of 9-gene signature. qPCR was used to detect the expression of 9 genes in clinical specimens.
RESULTS: A total of 71 candidate genes are obtained by integrating genomic variation, mutation and prognostic data. Then, 9-gene signature was established, which includes HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, PYGO2, CTNNA1, PTPRK, and NAT1. The 9-gene signature is an independent prognostic risk factor for COAD patients. In addition, the signature shows good predicting performance and clinical practicality in training set, testing set and external verification set. The results of qPCR based on clinical samples showed that the expression of HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, and PYGO2 was increased in colon cancer tissues and the expression of CTNNA1, PTPRK, NAT1 was decreased in colon cancer tissues.
CONCLUSION: In this study, 9-gene signature is constructed as a new prognostic marker to predict the survival of COAD patients.
© 2020 Lu et al.

Entities:  

Keywords:  9-gene signature; COAD; multi-omics; prognosis biomarkers

Year:  2020        PMID: 33116619      PMCID: PMC7569059          DOI: 10.2147/OTT.S255590

Source DB:  PubMed          Journal:  Onco Targets Ther        ISSN: 1178-6930            Impact factor:   4.147


Introduction

Colon adenocarcinoma (COAD) is a common malignant tumor of the digestive system. It is the fourth leading cause of cancer-related death in the world.1,2 Although much progress has been made in surgical and complementary therapy of COAD in recent years, the prognosis of patients with COAD is still poor.3 The main reason is that the pathogenesis of COAD is complex. Most patients are in the advanced stage at the time of diagnosis, losing the opportunity of surgical treatment.4 Therefore, in-depth research on COAD prognostic markers and potential drug targets to provide new means of diagnosis and treatment will greatly improve patients’ chances of survival. The formation and development of COAD are accompanied by complex and varied genetic molecular mechanisms. COADs with the same pathological characteristics may have different molecular pathogenesis. The prognosis of patients is also quite different from each other. Therefore, molecular markers show potential value in COAD diagnosis and prognosis. A variety of markers have been used to screen and diagnose COAD. For example, the level of carcinoembryonic antigen is closely related to lymph node metastasis and tumor stage in cancer. It can be used as a screening marker for COAD patients.5 Epidermal growth factor receptor (EGFR) and vascular endothelial growth factor (VEGF) are significantly correlated with KI-67 and prognosis in COAD.6 In addition, as far as miRNA biomarkers are concerned, the combination of miR-378, miR-199a and miR-92a in plasma is considered as an effective way to distinguish COAD cases from normal controls.7 However, these indicators are not suitable for clinical practice because of their low specificity and lack of clinical evidence. Therefore, there is an urgent need to accurately classify COAD patients and explore its molecular mechanism in depth to screen specific molecular markers. This requires a broader understanding of the heterogeneity of genome and transcriptome level. There had been plenty of previous studies trying to screen and construct a prognostic marker model for colon cancer. Smith et al8 identified a metastasis gene expression profile derived from experiment to predict colon cancer recurrence and death. Gao et al9 constructed cancer hallmark-based gene signature to predict recurrence and chemotherapy benefit of colorectal cancer patients. In addition, some recent studies have shown that VEGF could be used as a prognostic indicator, but cannot predict the response of advanced CRC to VEGF targeted therapy.10,11 In addition, CEA level is an independent prognostic factor for colon cancer, and can be used with TNM staging of colon cancer.12,13 As far as miRNA prognosis signatures are concerned, Rong et al identified 6-miRNA signature to predict overall survival;14 another study constructed 4 lncRNA-miRNA prognostic signature for stage II colon cancer patients by integrating miRNA and lncRNA data15. However, no studies construct colon cancer prognostic signature using multiple omics data. The study establishes 9-gene signature by integrating genomic copy number variation, mutation and prognostic data. It has stable robustness in internal and external validation sets. Moreover, 9-gene signature is found to be involved in the important pathway and independent prognostic risk factors in COAD patients. In conclusion, the 9-gene signature constructed in this study can be used as a potential target for COAD treatment.

Materials and Methods

Data Acquisition and Processing

Level-3 RNA-sequencing data, the clinicopathological、 SNP 、 copy number segments and survival data of patients with COAD were downloaded from the UCSC Xena browser (). The mutation Annotation Format (MAF) is downloaded from GDC client. The GSE17538 expression profile data and clinical follow-up information are downloaded from the GEO database. The samples of TCGA are randomly divided into two groups. The TCGA training set contains 226 samples and the test set contains 227 samples. As an external validation set, the GSE17538 data set contains a total of 244 samples, including 6 mouse samples, while among the 238 human samples, 38 samples recorded the survival status of NA, and finally used for follow-up analysis.

Univariate Cox Proportional Risk Regression Analysis

As Guo, J et al16 univariate Cox proportional risk regression analysis is performed for each gene in this paper. The genes significantly related to the overall survival (OS) of patients in the training cohort are identified. p < 0.05 is selected as the threshold.

Copy Number Variation Data Analysis

GISTIC is widely used to detect both broad and focal (potentially overlapping) recurring events. GISTIC 2.017 software is used to identify the genes with significant amplification or deletion, with parameter thresholds of amplification, and deletion length greater than 0.1 and p<0.05.

Gene Mutation Analysis

Mutsig 2.0 software is used to identify significant mutation genes in the maf file of TCGA mutation data. The threshold is p < 0.05.

Construction of Prognostic Gene Signature

In this study, prognostic genes with copy number amplification/deletion and mutation are selected. Random survival forest algorithm is used to sequence the importance of common genes.18,19 The number of Monte Carlo iterations is set to 100 and the number of steps forward is set to 5. The genes with relative importance greater than 0.4 are identified as feature genes. Multivariate Cox regression analysis is further carried out to construct the following risk scoring model: RiskScore = ∑Expk * eHRk k=2. In the model, N represents the number of prognostic genes. Expk represents the expression value of prognostic genes. eHRk represents the estimated regression coefficient of genes in the multivariate Cox regression analysis.

Functional Enrichment Analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis is performed using the R package clusterprofiler20 to identify over-represented GO terms in KEGG pathway and three categories (biological processes, molecular function and cellular component). For this analysis, a false discovery rate (FDR) <0 0.05 is considered to denote statistical significance.

Gene Set Enrichment Analysis (GSEA)

GSEA is performed by the JAVA program () using the MSigDB C2 Canonical pathways gene set collection, which contains 1320 gene sets. Gene sets with a FDR value less than 0.05 after performing 1000 permutations are considered to be significantly enriched.

Quantitative PCR (qPCR)

Twenty specimens were obtained from colon cancer patients which had undergone surgery. Normal tissues were mean paired with tumor biopsies from the same patient. Among the 20 patients, 10 patients were in stage 1, 6 patients were in stage 2, and 4 patients were in stage 3. None of these patients had undergone any therapy include chemotherapy or radiotherapy before surgery. We got the informed consent of every patient, and was approved by The Human Ethics Review Committee of the Shengjing hospital of China Medical University. This study is in accordance with the Declaration of Helsinki and got written informed consent from patients. And, this is a retrospective analysis, and has not been obtained prospectively and consecutively. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer’s instructions. Complementary DNA (cDNA) was synthesized from high-quality total RNA using PrimeScript™ RT Master Mix (No. RR036A, Takara Bio USA, Mountain View, CA, USA). Real-time qPCR was performed to validate gene expression using Power SYBR™ Green PCR Master Mix (No. A25742, Thermo Fisher Scientific, Waltham, MA, USA) on the 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). Relative expression was calculated based on 2-ΔΔCt method.

Statistical Analysis

A Kaplan-Meier (KM) curve is drawn when using the median risk score in each dataset as a cutoff to compare the survival risk between high-risk and low-risk groups. Multivariate Cox regression analysis is performed to test whether genetic markers are independent prognostic factors. Significance is defined as P < 0.05. All of these analyses are carried out in R 3.4.3.

Results

Preliminary Analysis of Multi-Omics Data to Obtain Prognostic Genes

Preprocessing Data Statistics

The specific distribution of including variables is shown in Table 1. Univariate cox analysis was used to identify 1639 prognostic genes with p < 0.05 in TCGA training set (). The information of the top 20 genes of the 1639 genes is shown in Table 2.
Table 1

Clinical Information Statistics

CharacteristicsTCGA Training Datasets (n=226)TCGA Test Datasets (n=227)p valueGSE17538 (n=238)
Age (years)≤5031290.87532
>50195198206
Survival times (years)Median1.811.972.38
Mean2.252.443.12
Survival statusLiving1781760.839145
Dead485155
GenderFemale1041090.739114
Male122118124
Lymphatic invasionNO1251280.695
YES8175
pathologic_TT1650.769
T23543
T3157153
T42825
pathologic_NN01271390.454
N15452
N24536
N300
pathologic_MM01651680.956
M1/MX5756
AJCC stageStage Ⅰ34420.33228
Stage Ⅱ829272
Stage III735776
Stage Ⅳ303256
Table 2

Information on Top20 Prognosis-Related Genes

ENGSIDGenesymbolHRCoefficientz scorep value
ENSG00000241697TMEFF11.5520.4404.7372.17E-06
ENSG00000160117ANKLE11.5250.4224.6912.72E-06
ENSG00000214128TMEM2131.4270.3554.5595.13E-06
ENSG00000090932DLL31.6220.4834.5285.95E-06
ENSG00000268940CT45A11.3610.3084.5226.11E-06
ENSG00000159556ISL21.5280.4244.5026.72E-06
ENSG00000205456TP53TG3D1.4740.3884.4071.05E-05
ENSG00000100156SLC16A81.4730.3874.3811.18E-05
ENSG00000182759MAFA1.4710.3864.3211.56E-05
ENSG00000106689LHX21.4720.3874.2841.84E-05
ENSG00000148331ASB62.0350.7114.2751.91E-05
ENSG00000269437NXF2B1.4250.3544.2691.96E-05
ENSG00000124260MAGEA101.3660.3124.2622.02E-05
ENSG00000126890CTAG21.3460.2974.2102.56E-05
ENSG00000184029DSCR41.4140.3474.1723.02E-05
ENSG00000187730GABRD1.4800.3924.1533.29E-05
ENSG00000149927DOC2A1.6000.4704.1273.68E-05
ENSG00000075043KCNQ21.4280.3564.1263.69E-05
ENSG00000110148CCKBR1.4040.3394.1133.91E-05
ENSG00000121905HPCA1.4920.4004.1074.01E-05
Clinical Information Statistics Information on Top20 Prognosis-Related Genes

Identification of 398 Genes with Copy Number Variation

The parameter threshold is a fragment with an amplification or deletion length greater than 0.1 and p < 0.05. Figure 1A shows a significantly amplified fragment in the COAD genome. records the genes that have been significantly amplified on each fragment. For example, CCND2 significantly amplifies on the 12p13.32 fragment (q value = 1.38E-07). CCND3 significantly amplifies on the 6p21.1 fragment (q value = 0.0002804). VEGFA significantly amplifies on the 6p21.1 fragment (q value = 0.0002804). A total of 137 genes are amplified. Figure 1B shows fragments with significant deletions in the COAD genome. records the genes that have significant deletion on each fragment. For example, RBFOX1 has significantly deletion on the 16p13.3 fragment (q value = 4.02E-44). SMAD4 has significant deletion on the 18q21.2 fragment (q value = 9.94E-15). APC has significant deletion on the 5q22.2 fragment (q value = 1.48E-05). A total of 261 genes have deletion.
Figure 1

(A) Fragments with significant amplification in the COAD genome. (B) Fragments with significant deletion in the COAD genome. q represents the long arm of the chromosome, p represents the broken arm.

(A) Fragments with significant amplification in the COAD genome. (B) Fragments with significant deletion in the COAD genome. q represents the long arm of the chromosome, p represents the broken arm.

Identification of 486 Mutant Genes

A total of 486 genes with significant mutation frequency are obtained with a threshold of p<0.05. Figure 2 shows the distribution of synonymous mutations, missense mutations, frame insertion or deletion, frame movement, nonsense mutations, cleavage sites and other non-synonymous mutations of 50 significant genes in TCGA COAD cohort. Among the identified 486 genes, some genes are closely related to the occurrence and development of cancer, such as KRAS, TP53, APC, PIK3CA, and FBXW7. By integrating copy number mutant genes and mutant genes, we identified 839 common genes, which are involved in tumor-related pathways and biological functions ().
Figure 2

Distribution of top 50 genes with the most significant P value in COAD patients. The column chart on the top shows the total number of synonymous and non-synonymous mutations in 50 genes per patient. The column chart on the right shows the number of samples in which 50 genes have mutations in all samples.

Distribution of top 50 genes with the most significant P value in COAD patients. The column chart on the top shows the total number of synonymous and non-synonymous mutations in 50 genes per patient. The column chart on the right shows the number of samples in which 50 genes have mutations in all samples.

Identification of 9 Hub Genes Using Random Forest Algorithm

Among the 1639 candidate prognostic genes, 71 genes are identified to have copy number variation and mutation. The importance of prognostic genes is sequenced using R package random survival forest. Parameters are nrep = 100 and nstep = 5, which represent the number of Monte Carlo iterations is 100 and the number of steps forward is 5 (11), respectively. Genes with relative importance greater than 0.3 are identified as the final signature. Figure 3A shows the relationship between error rate and the number of classification trees. Figure 3B shows the importance sequencing of the first 9 genes out-of-bag.
Figure 3

(A) Relationship between error rate and number of classification trees. The axe means number of trees. (B) Importance sequencing of the first five genes out-of-bag. The axe means variable importance.

(A) Relationship between error rate and number of classification trees. The axe means number of trees. (B) Importance sequencing of the first five genes out-of-bag. The axe means variable importance.

Establishing 9-Gene Signature to Divide Samples in TCGA Training Cohort

As for the identified 9-gene signature, the importance and relative importance of HR, Z score, p value, and out-of-bag for the univariate regression of the 9 genes are shown in Table 3. Then, the 9-gene signature is established using multivariate COX analysis. The signature is as follows ():
Table 3

9- Genes Significantly Associated with the Overall Survival in the Training Dataset

Ensembl Gene IDSymbolHRZ-scoreP valueImportanceRelative Importance
ENSG00000170178HOXD121.302.3490911.88E-020.00961
ENSG00000163481RNF251.522.5406261.11E-020.00940.9848
ENSG00000139899CBLN31.392.2494822.45E-020.00880.9242
ENSG00000088538DOCK31.523.7827431.55E-040.00780.8182
ENSG00000044115CTNNA10.73−2.3992781.64E-020.00640.6667
ENSG00000152894PTPRK0.71−2.3714691.77E-020.00490.5152
ENSG00000187726DNAJB131.282.6398958.29E-030.00460.4848
ENSG00000171428NAT10.70−2.4021131.63E-020.00450.4697
ENSG00000163348PYGO21.472.3901661.68E-020.00320.3333
9- Genes Significantly Associated with the Overall Survival in the Training Dataset Risk = 0.2929859 * HOXD12 + 0.1081464 * RNF25 + 0.06077754 * CBLN3 + 0.312247 * DOCK3 – 0.1765895 * CTNNA1 – 0.3442616 * PTPRK + 0.1121588 * DNAJB13 – 0.3093265 * NAT1 + 0.2425542 * PYGO2 The scoring formula for each sample is the addition of the above gene expression value * ordinal number. Then the sample score median −0.1002146 is selected as cutoff to divide samples into high-risk group and low-risk group. Figure 4 shows the classification effect in the TCGA training set. In Figure 4A, 113 patients are divided into the low-risk group and 113 patients are divided into the high-risk group. There are significant differences between the two groups. log-rank p=1.288372e-05. Figure 4B shows the ROC curve, where 5-year AUC is 0.76. Figure 4C shows that as the patient’s risk score increases, the patient’s survival time decreases. According to the expression changes of 9-gene signature, HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, and PYGO2 are identified as risk factors, with high expression associated with high risk. CTNNA1, PTPRK, and NAT1 are identified as protective factors, with high expression associated with low risk.
Figure 4

(A) KM survival curve of 9-gene signature distribution in TCGA training set. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in TCGA training set.

(A) KM survival curve of 9-gene signature distribution in TCGA training set. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in TCGA training set.

Robustness Detection of 9-Gene Signature in TCGA Testing Cohort

In order to determine the robustness of the model, the same model and the same cutoff are used as the TCGA training set, which is verified in the TCGA testing set as well. In Figure 5A, 101 patients are divided into the low-risk group and 126 patients are divided into the high-risk group. There are significant differences between the two groups (log-rank p= 0.006360314). Figure 5B shows the 5-year ROC was 0.64. Figure 5C produces results similar to those of the TCGA training set. The survival time of death samples decreases significantly as the risk score increases.
Figure 5

(A) KM survival curve of 9-gene signature distribution in TCGA testing set. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in TCGA training set.

(A) KM survival curve of 9-gene signature distribution in TCGA testing set. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in TCGA training set.

Validation of 9-Gene Signature in External Cohorts

External independent dataset GSE17538, GSE39582 and GSE24551 were used to determine the stable performance in predicting the prognosis. In Figure 6A, 79 patients are divided into the low-risk group and 121 patients are divided into the high-risk group. There are significant differences between the two groups (log-rank p= 0.027). Figure 6B shows the ROC curve with an average 5-year AUC is 0.60. Figure 6C shows that as the patient’s risk score increases, the patient’s survival time decreases. In the GSE39582 and GSE24551 cohorts, we found that the 9-gene signature can still significantly divide the two data sets into high and low-risk groups significantly. Among them, the 1-year AUC of GSE39582 is 0.68, and the 5-year AUC of GSE24551 is 0.63. Our research confirmed that our model has stable prediction performance and robustness on different platforms, and the external verification of multiple data sets minimizes the statistical deviation ().
Figure 6

(A) KM survival curve of 9-gene signature distribution in GSE17538. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in GSE17538.

(A) KM survival curve of 9-gene signature distribution in GSE17538. (B) ROC curve and AUC of 9-gene signature classification. (C) Risk score, survival time, survival status and expression of 9 genes in GSE17538.

Univariate and Multivariate Cox Regression Analysis

In order to identify the prognostic independence of 9-gene signature, univariate and multivariate COX regression were analyzed in TCGA training and testing cohort (Table 4). It showed that high-risk score (HR= 3.05, 95% CI = 1.36–6.84, p= 0.007), age >50 (HR= 1.04, 95% CI = 1.007–1.07, p= 0.016), AJCC Stage III/IV (HR= 37.58, 95% CI = 1.42–990.93, p= 0.030) were independent risk factors in multivariable cox analysis.
Table 4

Univariate and Multivariate Cox Regression Analyses

VariablesUnivariate AnalysisMultivariable Analysis
HR95% CI of HRP valueHR95% CI of HRP value
TCGA training datasets
9-gene risk score
Low-risk score1 (Ref)1 (Ref)
High-risk score3.972.04–7.734.9E-053.051.36–6.840.007
Age(≤50/>50)1.020.99–1.056.0E-021.041.00–1.070.016
Gender female1 (Ref)1 (Ref)
Gender male1.050.59–1.860.870.660.31–1.370.265
Lymphatic Invasion NO1 (Ref)1 (Ref)
Lymphatic Invasion YES2.581.35–4.910.001.940.66–5.680.227
Pathologic T 1/ T 21 (Ref)1 (Ref)
Pathologic T 31.550.54–4.430.4120.150.01–1.340.091
Pathologic T 45.151.66–160.0050.430.04–4.130.461
Pathologic N 01 (Ref)1 (Ref)
Pathologic N 11.380.63–2.960.4130.240.04–1.560.136
Pathologic N 23.001.58–5.660.0010.490.07–3.210.456
Pathologic M 01 (Ref)1 (Ref)
Pathologic M 14.522.33–8.737.42E-062.280.89–5.790.080
Pathologic M X1.820.77–4.260.1681.560.50–4.840.440
AJCC Stage Ⅰ1 (Ref)1 (Ref)
AJCC Stage Ⅱ1.940.43–8.650.385413.420.68–262.170.087
AJCC Stage III/Ⅳ4.281.01–17.980.04737.581.42–990.930.030
Univariate and Multivariate Cox Regression Analyses We performed KM curve of different stages in the training cohort, and found that our model can significantly divide Stage II and Stage III patients into high- and low-risk groups, but no significance was found in Stage II and Stage IV because of a small number of samples ()

Analysis of Pathways Enriched in High-Risk and Low-Risk Groups Using GSEA

In TCGA training data set, GSEA is used to analyze the pathways enriched in high-risk and low-risk groups (Table 5). Due to the large number of KEGG pathways included in the analysis, and the FDR q values are not significant, so here we choose the first four results with the most significant p value to plot. As shown in Figure 7, KEGG ERBB SIGNALING PATHWAY, KEGG COLORECTAL CANCER, KEGG p53 SIGNALING PATHWAY, and KEGG TGF BETA SIGNALING PATHWAY, which are significantly enriched in both high-risk and low-risk groups, are closely related to the occurrence, development and metastasis of COAD (Figure 7).
Table 5

GSEA Analysis of 9-Gene Signature

NameSIZEESNESNOM p-valFDR q-val
KEGG_O_GLYCAN_BIOSYNTHESIS26−0.650−1.9070.0000.355
KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS133−0.502−1.8820.0060.245
KEGG_ERBB_SIGNALING_PATHWAY87−0.455−1.7720.0080.494
KEGG_ENDOMETRIAL_CANCER52−0.493−1.7160.0080.586
KEGG_OOCYTE_MEIOSIS110−0.455−1.7160.0080.471
KEGG_THYROID_CANCER29−0.511−1.7090.0060.415
KEGG_COLORECTAL_CANCER62−0.470−1.7010.0120.377
KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION44−0.584−1.6900.0560.356
KEGG_PROSTATE_CANCER89−0.445−1.6890.0190.321
KEGG_APOPTOSIS86−0.438−1.6800.0210.312
KEGG_INSULIN_SIGNALING_PATHWAY136−0.393−1.6750.0100.292
KEGG_BASAL_TRANSCRIPTION_FACTORS34−0.546−1.6570.0250.306
KEGG_RENAL_CELL_CARCINOMA70−0.448−1.6020.0370.413
KEGG_ADHERENS_JUNCTION73−0.455−1.5960.0460.400
KEGG_DORSO_VENTRAL_AXIS_FORMATION24−0.511−1.5850.0350.398
KEGG_NEUROTROPHIN_SIGNALING_PATHWAY125−0.399−1.5850.0400.374
KEGG_RENIN_ANGIOTENSIN_SYSTEM17−0.527−1.5830.0350.356
KEGG_ENDOCYTOSIS177−0.379−1.5700.0420.365
KEGG_PEROXISOME78−0.470−1.5680.0630.351
KEGG_AMINO_SUGAR_AND_NUCLEOTIDE_SUGAR_METABOLISM44−0.451−1.5500.0540.373
KEGG_PROPANOATE_METABOLISM32−0.525−1.5440.0640.370
KEGG_LONG_TERM_POTENTIATION68−0.402−1.5340.0230.375
KEGG_FATTY_ACID_METABOLISM41−0.484−1.5330.1000.360
KEGG_STARCH_AND_SUCROSE_METABOLISM50−0.441−1.5170.0560.380
KEGG_CHRONIC_MYELOID_LEUKEMIA73−0.417−1.5070.0860.387
KEGG_P53_SIGNALING_PATHWAY66−0.420−1.5020.0560.383
KEGG_BIOSYNTHESIS_OF_UNSATURATED_FATTY_ACIDS20−0.506−1.4960.0690.383
KEGG_TGF_BETA_SIGNALING_PATHWAY85−0.399−1.4930.0590.374
KEGG_NICOTINATE_AND_NICOTINAMIDE_METABOLISM22−0.423−1.4670.03910.4133
Figure 7

Pathways enriched in high-risk and low-risk groups obtained by 9-gene signature.

GSEA Analysis of 9-Gene Signature Pathways enriched in high-risk and low-risk groups obtained by 9-gene signature.

Comparison with Other Prognostic Signature

In order to validate the superiority of our model, we found two published prognostic signature of colorectal cancer to make comparison, such as Zuo21 and Kim.22 To make it comparable, we used the corresponding genes in these two signatures. The risk score of each sample in TCGA was calculated using the same method, and the ROC and KM curve of each signature was evaluated. According to the median risk score value, the sample was divided into Risk-H and Risk-L groups, and the difference of prognosis between the two groups was calculated (Figure 8A and B). We further analyzed the restricted mean survival curves of these signatures, we can see that our model has the highest C-index (0.72), which has more advantages in the long-term survival prediction (Figure 8C).
Figure 8

Comparative analysis of 9-gene signature and others. (A) The AUC and KM curves of Zuo’s model. (B) The AUC and KM curves of Kim’s model. (C) RMS curves and C-index of three signatures.

Comparative analysis of 9-gene signature and others. (A) The AUC and KM curves of Zuo’s model. (B) The AUC and KM curves of Kim’s model. (C) RMS curves and C-index of three signatures.

Data Analysis Flowchart

In order to make the reader easier to understand, we have drawn a flowchart for the data analysis part of the paper (Figure 9).
Figure 9

Data analysis flowchart.

Data analysis flowchart.

Expression Levels of 9 Genes

Based on the bioinformatics analysis results, the expression of 9 genes was verified in twenty normal tissues and colon cancer tissues. The results in Figure 10 showed that the mRNA expression of HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, and PYGO2 were increased in colon cancer tissues and the mRNA expression of CTNNA1, PTPRK, NAT1were decreased in colon cancer tissues (p<0.05). It was consistent with that analyzed using bioinformatic analysis.
Figure 10

Expression levels of 9 genes quantified using qPCR in twenty paired normal tissues and colon cancer tissues. *P<0.05.

Expression levels of 9 genes quantified using qPCR in twenty paired normal tissues and colon cancer tissues. *P<0.05.

Discussion

Colon adenocarcinoma (COAD) is a highly heterogeneous disease. Its occurrence and development process changes with the change of genetic and epigenetic factors.23 Therefore, COAD patients with the same pathological features may have different prognostic and therapeutic response to certain drugs. Screening prognostic molecular markers that fully reflect the biological characteristics of COAD is greatly significant for individualized treatment. Several recent studies have shown that genomics, epigenomics and transcriptomics play a vital role in tumor development and progression, helping predict the prognosis of patients.24,25 Therefore, multi-omics studies can help determine tumor heterogeneity and screen therapeutic targets and tumor biomarkers, which have greater advantages.26 The study screens and identifies 9-gene signature associated with COAD prognosis by analyzing multi-omics data, including transcriptome data, copying number variation data and mutation data. The 9-gene signature established by screening has strong robustness and stable prediction performance in both internal verification set and external verification set. It enables stable prediction performance in data sets of different platforms. In addition, the clinical information in TCGA and GSE19234 are analyzed systematically by COX regression. The results show that 9-gene signature is an independent prognostic factor, which maintains stable clinical independence under the influence of many clinical factors, including training set, TCGA internal verification set and GSE19234 external verification set. Several studies have shown that multi-omics have been used in clinical prediction of prognosis and therapeutic responses. To be more specific, Oncotype DX used for breast cancer recurrence score contains 21 genes.27,28 Another Mammaprint™, which contains 70 genes, is used to assess the risk of metastasis in breast cancer.29,30 In the study of COAD, ColoPrint, which contains 18 genes, is used to predict the risk of prognosis and recurrence in patients.31–33 These results show that great potential of multi-omics screening in clinical application through gene expression profile. In this study, the AUC of 9-gene signature screened by multi-omics in the training set and validation set for five years is more than 0.64, which is more effective in predicting the prognosis of patients. Furthermore, the multi-omics in this study contains only 9 genes, making it easier to apply clinically. In 9-gene signature identified and verified by multi-omics data, HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, and PYGO2 are risk factors, and CTNNA1, PTPRK, NAT1 are protective factors. Consistently, our results based on clinical samples showed the mRNA expression of HOXD12, RNF25, CBLN3, DOCK3, DNAJB13, and PYGO2 were increased and the mRNA expression of CTNNA1, PTPRK, NAT1were decreased in colon cancer tissues. These results suggested these 9-genes may play an important role in the progression of COAD. There have been reported that HOXD12 has high expression in progesterone receptor positive breast cancer tissues.34 RNF25 is an E3 ubiquitin ligase. Studies have confirmed that RNF25 promotes gefitinib resistance by mediating crosstalk between the mediated NF-κB and ERK pathways.35 DOCK3 is shown to be involved in regulating tumor cell metastasis.36,37 PYG02 is closely related to the prognosis of glioma, esophageal squamous cell carcinoma, hepatocellular carcinoma and other tumors.38–40 CTNNA1 is closely related to the prognosis of invasive breast cancer and renal cell carcinoma.41,42 PTPRK is a marker of breast cancer.43 NAT1 is significantly correlated with the increase of the overall survival time of breast cancer patients.44 In brief, these genes are closely related to the prognosis of tumors. In addition, CBLN3 and DNAJB13 have not been reported to be associated with tumors. They are found to be prognostic markers of COAD for the first time in this study. In addition, GSEA enrichment analysis results also show that 9-gene signature is closely related to the progression and metastasis of COAD. Examples include KEGG ERBB SIGNALING PATHWAY, KEGG COLORECTAL CANCER, KEGG p53 SIGNALING PATHWAY, and KEGG TGF BETA SIGNALING PATHWAY. The above results show that the model has potential clinical application value and can provide potential drug targets for patients with COAD. Although this study screens and verifies the potential prognostic markers of COAD based on large sample multi-omics data, this study still has some limitations. The conclusions of this study are mainly based on bioinformatics analysis, which still need to be further verified by experiments in vitro and in vivo. In addition, all the samples involved in this study are retrospective studies. The clinical application still requires comprehensive and in-depth research. Finally, because our study based on public cohorts, there are also have limitations such as the therapeutic effects of these patients, which are not found in the original research. To sum up, our research results indicate that the 9 gene prognostic signature is a reliable tool for predicting the OS of COAD patients.
  44 in total

1.  Random Survival Forests.

Authors:  Jeremy M G Taylor
Journal:  J Thorac Oncol       Date:  2011-12       Impact factor: 15.609

2.  Tumor gene expression and prognosis in breast cancer patients with 10 or more positive lymph nodes.

Authors:  Melody A Cobleigh; Bita Tabesh; Pincas Bitterman; Joffre Baker; Maureen Cronin; Mei-Lan Liu; Russell Borchik; Juan-Miguel Mosquera; Michael G Walker; Steven Shak
Journal:  Clin Cancer Res       Date:  2005-12-15       Impact factor: 12.531

Review 3.  Strategies for targeted drug delivery in treatment of colon cancer: current trends and future perspectives.

Authors:  Antara Banerjee; Surajit Pathak; Vimala Devi Subramanium; Dharanivasan G; Ramachandran Murugesan; Rama S Verma
Journal:  Drug Discov Today       Date:  2017-05-22       Impact factor: 7.851

Review 4.  From tumour heterogeneity to advances in precision treatment of colorectal cancer.

Authors:  Cornelis J A Punt; Miriam Koopman; Louis Vermeulen
Journal:  Nat Rev Clin Oncol       Date:  2016-12-06       Impact factor: 66.675

Review 5.  Biomarkers to predict the clinical efficacy of bevacizumab in cancer.

Authors:  Adrian M Jubb; Adrian L Harris
Journal:  Lancet Oncol       Date:  2010-12       Impact factor: 41.316

6.  Colorectal cancer statistics, 2017.

Authors:  Rebecca L Siegel; Kimberly D Miller; Stacey A Fedewa; Dennis J Ahnen; Reinier G S Meester; Afsaneh Barzi; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2017-03-01       Impact factor: 508.702

7.  Independent validation of a prognostic genomic signature (ColoPrint) for patients with stage II colon cancer.

Authors:  Matthias Maak; Iris Simon; Ulrich Nitsche; Paul Roepman; Mireille Snel; Annuska M Glas; Tibor Schuster; Gisela Keller; Eliane Zeestraten; Inès Goossens; Klaus-Peter Janssen; Helmut Friess; Robert Rosenberg
Journal:  Ann Surg       Date:  2013-06       Impact factor: 12.969

Review 8.  Worldwide burden of colorectal cancer: a review.

Authors:  Pasqualino Favoriti; Gabriele Carbone; Marco Greco; Felice Pirozzi; Raffaele Emmanuele Maria Pirozzi; Francesco Corcione
Journal:  Updates Surg       Date:  2016-04-11

9.  Identification of a 6-gene signature predicting prognosis for colorectal cancer.

Authors:  Shuguang Zuo; Gongpeng Dai; Xuequn Ren
Journal:  Cancer Cell Int       Date:  2019-01-05       Impact factor: 5.722

10.  Development of a Novel Six-miRNA-Based Model to Predict Overall Survival Among Colon Adenocarcinoma Patients.

Authors:  Zhenxiang Rong; Yi Rong; Yingru Li; Lei Zhang; Jingwen Peng; Baojia Zou; Nan Zhou; Zihao Pan
Journal:  Front Oncol       Date:  2020-02-21       Impact factor: 6.244

View more
  1 in total

1.  Long Noncoding RNA SNHG7 Is a Diagnostic and Prognostic Marker for Colon Adenocarcinoma.

Authors:  Chengwei Jiang; Shanshan Qu; Tie Liu; Miao Hao
Journal:  Front Oncol       Date:  2022-06-07       Impact factor: 5.738

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.