Hengjun Lin1, Xueke Qiu1, Bo Zhang1, Jichao Zhang1. 1. Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China, 13857988075@163.com.
Abstract
BACKGROUND: Colorectal cancer is a malignant tumor with a high death rate. Chemotherapy, radiotherapy and surgery are the three common treatments for colorectal cancer. For patients with early colorectal cancer, postoperative adjuvant chemotherapy can reduce the risk of recurrence. For patients with advanced colorectal cancer, palliative chemotherapy can significantly improve quality of life and prolong survival. FOLFOX is one of the mainstream chemotherapy regimens for colorectal cancer; however, its response rate is only about 50%. METHODS: To systematically investigate why some colorectal cancer patients respond to FOLFOX therapy while others do not, we searched publicly available databases and combined three gene expression datasets of colorectal cancer patients treated with FOLFOX. Using the minimal redundancy maximal relevance and incremental feature selection methods, we identified the biomarker genes. RESULTS: A Support Vector Machine-based classifier was constructed to predict the response of colorectal cancer patients to FOLFOX therapy. Its accuracy, sensitivity and specificity were 0.854, 0.845 and 0.863, respectively. CONCLUSION: The biological analysis of representative biomarker genes suggested that apoptosis and inflammation signaling pathways were essential for the response of colorectal cancer patients to FOLFOX chemotherapy.
Colorectal cancer is a malignant tumor that seriously endangers people's health. In recent years, the incidence of colorectal cancer has increased significantly, and it has become the third most common type of cancer. Over the past few decades, early detection and treatment have improved the survival rate of colorectal cancer in many countries. In some developed countries, the 5-year survival rate has reached more than 65%.1
Treatment options for colorectal cancer include chemotherapy, radiotherapy and surgery.2 In general, surgical removal of the tumor and any adjacent intestine can effectively eliminate cancer cells and reduce the risk of the cancer spreading. Chemotherapy also plays an important role in the treatment of colorectal cancer. Postoperative adjuvant chemotherapy in early colorectal cancer can reduce the risk of recurrence. For patients with advanced, inoperable colorectal cancer, palliative chemotherapy can significantly improve quality of life and prolong survival.
Generally, combining chemotherapeutic agents results in significantly increased response rates and improved survival.3 Current combination chemotherapy includes 5-fluorouracil (5-FU)/leucovorin with oxaliplatin (FOLFOX), 5-FU/leucovorin and irinotecan (FOLFIRI), capecitabine and oxaliplatin (CAPEOX/XELOX) and 5-FU/leucovorin/oxaliplatin and irinotecan (FOLFOXIRI).
FOLFOX chemotherapy has proven to be effective in the treatment of unresectable metastatic colorectal cancer.4 Studies have suggested that patients with stage III colorectal cancer who receive adjuvant FOLFOX chemotherapy experience improved disease-free and overall survival.5 However, about half of the patients did not benefit from the treatment, and some even suffered from neurotoxicity.6
Several studies have tried to predict the response to FOLFOX chemotherapy.7,8 It has been reported that MTHFR germinal polymorphism is a potentially strong predictor of response to FOLFOX therapy, and that the response rate to FOLFOX increases continuously with the number of favorable MTHFR alleles.7 Another reported biomarker is SMURF2, which was highly expressed in non-responders to FOLFOX therapy.8
To systematically investigate the response mechanisms of FOLFOX chemotherapy in colorectal cancer patients, we collected three gene expression datasets of colorectal cancer patients treated with FOLFOX and used machine learning methods to identify the genes that can predict responders to FOLFOX therapy. The biological analysis of several representative signature genes, such as MLKL, CC2D1A, LPL, PAGE4 and SLC26A9, suggested that apoptosis and inflammation signaling pathways were the essential pathways that controlled the response of colorectal cancer patients to FOLFOX chemotherapy.
Methods
The gene expression profiles of colorectal cancer patients with FOLFOX therapy
We searched the Gene Expression Omnibus (GEO) database and found three datasets of colorectal cancer patients treated with FOLFOX. The gene expression profiles were combined from these three datasets, which were downloaded from GEO under accession numbers GSE19860, GSE28702 and GSE72970. All three datasets used the same platform, the Affymetrix Human Genome U133 Plus 2.0 Array.
These three datasets were generated by different researchers from different labs. To minimize systematic bias, the raw CEL files were downloaded and processed together using the R packages affyPLM and affy.9 The expression levels of probes were quantified with the MAS5 method10 and normalized with the quantile method. The probe expression levels were transformed into gene expression levels using the R packages gahgu133plus2cdf and gahgu133plus2.db. The expression levels of 18,733 genes were used as features to predict whether a colorectal cancer patient would respond to FOLFOX therapy.
In the GSE72970 dataset, there were 20 colorectal cancer patients with FOLFOX response and 12 without. In GSE28702, there were 42 patients with FOLFOX response and 41 without. In GSE19860, there were nine patients with FOLFOX response and 20 without. Together, there were 71 colorectal cancer patients with FOLFOX response, who were considered positive samples, and 73 colorectal cancer patients without FOLFOX response, who were considered negative samples. The sizes of the positive and negative sample sets are shown in Table 1. The clinical information of the 144 colorectal cancer patients from GEO is given in Table S1.
Table 1
The sizes of positive and negative samples
| Dataset accession | Number of positive samples (a) | Number of negative samples (b) | Sample size |
|---|---|---|---|
| GSE72970 | 20 | 12 | 32 |
| GSE28702 | 42 | 41 | 83 |
| GSE19860 | 9 | 20 | 29 |
| Combined | 71 | 73 | 144 |
Notes:
(a) Positive samples: colorectal cancer patients with FOLFOX response.
(b) Negative samples: colorectal cancer patients without FOLFOX response.
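The quantile normalization step used during preprocessing can be illustrated with a minimal Python sketch. This is not the authors' R pipeline (they used the affy and affyPLM packages); the function name `quantile_normalize`, the toy values and the tie-handling rule are assumptions made for illustration only.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each column is one array (sample); each position is one probe.
    Ties within a column are broken by original order (an assumption;
    production tools usually average tied ranks).
    """
    n_probes = len(columns[0])
    # Sort every sample independently.
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: three samples, four probes (values are made up).
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample shares the same set of values, so differences between arrays that come only from overall intensity distribution are removed while the within-sample ranking of probes is preserved.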
Rank the discriminative genes using mRMR method
The minimal redundancy maximal relevance (mRMR) method11 is widely used to select discriminative features.12–17 The mRMR software downloaded from http://home.penglab.com/proj/mRMR/ was used to perform the feature ranking.
It works as follows. First, let $\Omega$, $\Omega_s$ and $\Omega_t$ denote the set of all 18,733 genes, the set of the $m$ selected genes and the set of the $n$ to-be-selected genes, respectively. The relevance of a gene $g$ from $\Omega_t$ with the FOLFOX response $r$ can be measured with the mutual information $I(g, r)$.18,19
The redundancy $R$ of the gene $g$ from $\Omega_t$ with the selected genes in $\Omega_s$ is
$$R = \frac{1}{m} \sum_{g_i \in \Omega_s} I(g, g_i)$$
The algorithm tries to find the best gene $g$ from $\Omega_t$ that has maximum relevance with the FOLFOX response $r$ and minimum redundancy with the selected genes in $\Omega_s$ by maximizing the function below:
$$\max_{g \in \Omega_t} \left[ I(g, r) - \frac{1}{m} \sum_{g_i \in \Omega_s} I(g, g_i) \right]$$
After $N$ rounds of this evaluation procedure, all the genes from $\Omega$ will be ranked:
$$S = \{ g'_1, g'_2, \ldots, g'_h, \ldots, g'_N \}$$
The mRMR rank represents the discriminating power of the gene.
To reduce the computational time, only the top 500 mRMR genes were analyzed in the following steps.
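The greedy ranking described above can be sketched in a few lines of Python. This is a simplified illustration, not the mRMR software the authors used: it assumes expression values have already been discretized, and the names `mutual_information`, `mrmr_rank` and the three toy genes are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(features, response, n_select):
    """Greedy mRMR ranking: relevance minus mean redundancy.

    features: dict mapping gene name -> discretized expression values.
    response: discrete labels (1 = responder, 0 = non-responder).
    """
    relevance = {g: mutual_information(v, response) for g, v in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < n_select:
        def score(g):
            if not selected:
                return relevance[g]
            redundancy = sum(
                mutual_information(features[g], features[s]) for s in selected
            ) / len(selected)
            return relevance[g] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (made up): gene_b is an exact copy of gene_a, gene_c is weaker
# but carries information that gene_a does not.
response = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "gene_a": [1, 1, 1, 0, 0, 0, 0, 0],
    "gene_b": [1, 1, 1, 0, 0, 0, 0, 0],
    "gene_c": [0, 0, 0, 1, 0, 0, 0, 0],
}
ranking = mrmr_rank(features, response, n_select=3)
```

In this toy case the duplicated gene is pushed to the end of the ranking even though its relevance equals that of the top gene, which is the behavior the redundancy term is meant to produce.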
Identify the predictive genes using incremental feature selection (IFS) method
To evaluate the prediction performance of the mRMR genes, the IFS method20–26 was applied to select the genes with the greatest prediction power. IFS is a wrapper feature selection method that combines feature selection with classifier construction. We used a Support Vector Machine (SVM) as the classifier. Specifically, the svm function in the R package e1071 was used to construct the classifier.
IFS is an iterative process that adds genes one by one based on the mRMR ranking and then evaluates the classification performance of the selected genes. In each round, the top k genes from the mRMR table were selected and used to build a classifier that predicts whether a colorectal cancer patient will respond to FOLFOX therapy. The performance of each classifier was evaluated with leave-one-out cross validation (LOOCV).
The three major measurements for a classifier, sensitivity (Sn), specificity (Sp) and accuracy (ACC), were calculated as follows:
$$Sn = \frac{TP}{TP + FN}$$
$$Sp = \frac{TN}{TN + FP}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
In these equations, TP, TN, FP and FN stand for the numbers of true positive, true negative, false positive and false negative samples, respectively.
In this study, the colorectal cancer patients with FOLFOX response and those without FOLFOX response were considered positive and negative samples, respectively.
After 500 rounds of IFS evaluation, an IFS curve can be plotted, with the number of genes used on the x-axis and the LOOCV accuracy on the y-axis. From the IFS curve, we can see how many genes should be used to classify the colorectal cancer patients with and without FOLFOX response.
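The IFS loop with LOOCV can be sketched as follows. To keep the example dependency-free, a nearest-centroid rule stands in for the SVM; it is explicitly not the e1071 classifier used in the study, and all function names and toy values are assumptions.

```python
def nearest_centroid_predict(train_x, train_y, test_row):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(test_row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(x, y):
    """Leave-one-out cross validation: hold out each sample once."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


def incremental_feature_selection(x, y, ranked_columns):
    """Return (k, LOOCV accuracy) for the top-1, top-2, ... ranked features."""
    curve = []
    for k in range(1, len(ranked_columns) + 1):
        top_k = ranked_columns[:k]
        subset = [[row[c] for c in top_k] for row in x]
        curve.append((k, loocv_accuracy(subset, y)))
    return curve


# Toy data (made up): six patients, three genes already ordered by rank.
x = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.2, 4.5, 0.3],
    [2.9, 5.0, 0.7],
]
y = [1, 1, 1, 0, 0, 0]
curve = incremental_feature_selection(x, y, ranked_columns=[0, 1, 2])
best_k, best_accuracy = max(curve, key=lambda point: point[1])
```

The list `curve` holds the points of the IFS curve; taking its maximum mirrors reading the peak off the plot.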
The visualization of how predictive the genes are for FOLFOX response
After we identified the predictive genes using the mRMR and IFS methods, we visually investigated how well they can separate the colorectal cancer patients with FOLFOX response from those without.
Principal component analysis (PCA)27 was performed to extract the first and second principal components (PCs) of the selected genes. PCA is a widely used multivariate statistical method that can capture most of the gene expression variability.27 With dimensionality reduction via PCA, the high-dimensional gene expression profiles can be mapped onto the two dimensions PC1 and PC2, which explain the most variance observed in the data. Since PCA is unsupervised, the 2D PCA plot gives an intuitive view of how close the samples are to each other.
The other method that we applied was two-way hierarchical clustering of both the colorectal cancer patients and the selected genes. From the heatmap, we can explore not only whether the patients with and without FOLFOX response were clustered into different groups but also which genes were highly or lowly expressed in the patients with FOLFOX response.
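To make the PCA step concrete, the sketch below extracts the first principal component and its share of the total variance using power iteration on the covariance matrix. Power iteration is chosen here only because it needs no external libraries; it is an assumption of this example, not the routine the authors used, and the function name and toy matrix are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, explained_variance_ratio, scores) for PC1.

    rows: list of samples, each a list of gene expression values.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (genes x genes).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration; the all-ones start vector is an arbitrary choice and
    # would fail only if it were exactly orthogonal to PC1.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total_variance = sum(cov[i][i] for i in range(p))
    # Projection of each centered sample onto PC1.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, eigenvalue / total_variance, scores


# Toy data (made up): two perfectly correlated genes, so PC1 explains everything.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, ratio, scores = first_principal_component(toy)
```

The returned `ratio` corresponds to the kind of "variance represented by PC1" figure reported for a PCA plot, and `scores` gives the x-coordinates of the samples on that plot.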
Results and discussion
The top discriminative genes ranked with mRMR method
The mRMR method ranks the genes based not only on their relevance to the FOLFOX response of colorectal cancer patients but also on their redundancy with each other. Therefore, the set of discriminative genes identified by mRMR is compact: highly co-expressed genes are not all selected, and only the best representative gene is chosen. We obtained the top 500 most discriminative genes using the mRMR method. These 500 genes were further optimized using the IFS method.
The predictive genes selected based on IFS method
We used different numbers of top mRMR genes to construct the SVM classifier. Based on how accurately each model classified the colorectal cancer patients into the right FOLFOX response groups, we plotted the IFS curve, in which the x-axis was the number of genes and the y-axis was the LOOCV accuracy. The IFS curve is shown in Figure 1.
Figure 1
The IFS curve showing how the classifiers based on different numbers of genes performed.
Notes: The x-axis was the number of genes used to build the classifier and y-axis was the prediction accuracy evaluated with LOOCV. The peak of IFS curve was accuracy of 0.854 when 138 genes were used. But even when only top ten genes were used, the accuracy was over 0.8.
As shown in Figure 1, the peak was located at the top 138 genes, where the accuracy reached its highest value, 0.854. The corresponding sensitivity and specificity were 0.845 and 0.863, respectively. The top 138 genes are given in Table S2. The confusion matrix of actual and predicted responses is given in Table 2. We calculated the CIs of the prediction performance using the function sensSpec from the R package epibasix;28 the 95% CIs for sensitivity and specificity were (76.1%, 92.9%) and (78.4%, 94.2%), respectively.
Table 2
The confusion matrix of actual responses and predicted responses based on 138 genes
| Number of patients | Actual responders | Actual non-responders |
|---|---|---|
| Predicted responders | 60 | 10 |
| Predicted non-responders | 11 | 63 |
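The reported performance figures can be recomputed directly from the counts in Table 2. The sketch below also computes a normal-approximation (Wald) 95% interval; whether `sensSpec` in epibasix uses exactly this formula is an assumption here, although the Wald interval does reproduce the reported bounds.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# Counts taken from Table 2.
tp, fp = 60, 10  # predicted responders: actual responders / non-responders
fn, tn = 11, 63  # predicted non-responders: actual responders / non-responders

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

sn_low, sn_high = wald_interval(tp, tp + fn)
sp_low, sp_high = wald_interval(tn, tn + fp)

print(f"Sn={sensitivity:.3f} Sp={specificity:.3f} ACC={accuracy:.3f}")
print(f"Sn 95% CI: ({100 * sn_low:.1f}, {100 * sn_high:.1f})")
print(f"Sp 95% CI: ({100 * sp_low:.1f}, {100 * sp_high:.1f})")
```

With only 71 positive and 73 negative samples, the intervals span roughly 16 percentage points, which is worth keeping in mind when comparing the 138-gene and ten-gene classifiers.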
Although the 138-gene classifier performed best, the accuracy of the top ten genes was already over 0.8. The sensitivity and specificity of the ten-gene classifier were 0.732 and 0.890, respectively. The top ten genes are given in Table 3.
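Table 3
The top ten mRMR genes
| Order | Name | Score |
|---|---|---|
| 1 | LOC100009676 | 0.131 |
| 2 | ZNF461 | 0.101 |
| 3 | MLKL | 0.072 |
| 4 | MGC15885 | 0.083 |
| 5 | MBTD1 | 0.071 |
| 6 | CC2D1A | 0.067 |
| 7 | FAM104A | 0.061 |
| 8 | KIF3B | 0.060 |
| 9 | SYTL1 | 0.060 |
| 10 | EML6 | 0.057 |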
The first gene was LOC100009676, which is understudied and has few known functions.
The second gene was Lnc-ZNF461, which has been reported to be associated with non-small-cell lung cancer (NSCLC).29 It was involved in immune response and can promote NSCLC progression by interacting with SLA2, DEFB4A, LAT and LIME1.29
The third gene was MLKL, a necroptosis kinase. It was reported that MLKL was involved in immune activation in cancer cells.30 Chemotherapy kills MLKL−/− cancer cells, but because of the MLKL deficiency, the dying cancer cells do not trigger an immune response. MLKL may function through the immunogenic cell death (ICD) signaling pathway. A recent publication by Sun et al31 found that small-molecule analogs of SMAC mimetic in association with MLKL-pDNA and z-VAD-fmk showed antitumor effects in colorectal cancer cells in vitro via induction of RIP3-dependent necroptosis. These findings support MLKL as a good chemotherapy response biomarker.
Another interesting gene was CC2D1A, which takes part in various signaling pathways, such as nuclear factor κB, PDK1/Akt, cAMP/PKA and Notch. The Notch pathway is a well-studied colorectal cancer pathway.32,33 CC2D1A has also been reported to be involved in the antiviral pathway by interacting with TBK-1 and IKKε, and it acts as a transcriptional repressor of serotonin and dopamine receptor genes.34 CC2D1A silencing can induce apoptosis and increase chemotherapy sensitivity by decreasing Akt kinase activity.35
The responders and non-responders were different on the first PC
To intuitively explore the difference between responders and non-responders, we calculated the first and second PCs of the 138 genes and plotted the responders (blue dots) and non-responders (red dots) in Figure 2. PC1 explained 8.7% of the variance, while PC2 explained 4.7%.
Figure 2
The PCA plot of responders and non-responders.
Notes: The x-axis was the first PC and y-axis was the second PC. The red dots were NR and the blue dots were R. It can be seen that most responders were in area of PC1<0, while most non-responders were in the area of PC1>0. R and NR were different on the first PC.
Abbreviations: PCA, principal component analysis; PC, principal component; NR, non-responders; R, responders.
It can be seen that most responders were in the area of PC1<0, while most non-responders were in the area of PC1>0. The responders and non-responders therefore differed along the first PC.
The highly expressed genes in FOLFOX responders and non-responders
Although the PCA plot clearly demonstrated the difference between responders and non-responders, we were also interested in identifying the genes that were highly expressed in FOLFOX responders and in non-responders, which may reveal the biological mechanisms of FOLFOX response in colorectal cancer. Therefore, we plotted the heatmap of the 138 genes in the responder and non-responder colorectal cancer patients (Figure 3).
Figure 3
The heatmap of the 138 genes in the responder and non-responder colorectal cancer patients.
Notes: Each row corresponded to the scaled expression level of a gene. Warmer colors indicated higher expression levels and colder colors indicated lower expression levels. Each column corresponded to a colorectal cancer patient who was either a responder (red) or a non-responder (green) to FOLFOX therapy. It can be seen that the responders and non-responders were clearly clustered into two groups and, correspondingly, the 138 genes were also clustered into two groups. The top cluster of genes was highly expressed in responders and the bottom cluster of genes was highly expressed in non-responders.
Abbreviations: NR, non-responders; R, responders.
It can be seen that the responders and non-responders were clearly clustered into two groups and, correspondingly, the 138 genes were also clustered into two groups. The top cluster of genes was highly expressed in responders, and the bottom cluster of genes was highly expressed in non-responders.
The genes that were highly expressed in FOLFOX responders (fold change greater than 1.5) and the genes that were lowly expressed in FOLFOX responders (fold change smaller than 0.67) are listed in Tables 4 and 5, respectively.
Table 4
The highly expressed genes in FOLFOX responders
| Gene name | Mean in NR (a) | Mean in R (b) | Fold change (c) |
|---|---|---|---|
| MGC15885 | 11.2 | 23.8 | 2.1 |
| ENSG00000244627 | 15.7 | 32.8 | 2.1 |
| CRYBB1 | 7.6 | 15.1 | 2.0 |
| NEUROG3 | 14.1 | 26.9 | 1.9 |
| LOC284100 | 23.5 | 43.1 | 1.8 |
| PACSIN1 | 9.7 | 17.2 | 1.8 |
| LPL | 179.4 | 306.0 | 1.7 |
| LOC340107 | 18.0 | 30.5 | 1.7 |
| C16orf92 | 16.4 | 26.2 | 1.6 |
| CYP4F8 | 17.6 | 27.8 | 1.6 |
| PAGE4 | 41.3 | 64.8 | 1.6 |
Notes:
(a) NR, colorectal cancer patients without FOLFOX response.
(b) R, colorectal cancer patients with FOLFOX response.
(c) Fold change, R/NR.
Table 5
The lowly expressed genes in FOLFOX responders
| Gene name | Mean in NR (a) | Mean in R (b) | Fold change (c) |
|---|---|---|---|
| SLC26A9 | 81.0 | 31.5 | 0.39 |
| ADAMTSL2 | 15.3 | 6.9 | 0.45 |
| IGKC | 2,452.8 | 1,261.2 | 0.51 |
| TMPRSS3 | 298.0 | 175.0 | 0.59 |
| CXorf57 | 79.4 | 46.9 | 0.59 |
| OR10H2 | 13.3 | 8.2 | 0.62 |
| HS3ST5 | 92.1 | 61.1 | 0.66 |
Notes:
(a) NR, colorectal cancer patients without FOLFOX response.
(b) R, colorectal cancer patients with FOLFOX response.
(c) Fold change, R/NR.
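The fold change thresholds used to build Tables 4 and 5 amount to a simple ratio test, sketched below for a few genes from the tables. The function name `classify_fold_change` and the label strings are hypothetical; the means are copied from Tables 4 and 5.

```python
def classify_fold_change(mean_nr, mean_r, up=1.5, down=0.67):
    """Compute fold change as R/NR and label it against the two thresholds."""
    fold = mean_r / mean_nr
    if fold > up:
        label = "high in responders"
    elif fold < down:
        label = "low in responders"
    else:
        label = "unchanged"
    return fold, label


# (mean in non-responders, mean in responders), copied from Tables 4 and 5.
mean_expression = {
    "MGC15885": (11.2, 23.8),
    "LPL": (179.4, 306.0),
    "PAGE4": (41.3, 64.8),
    "SLC26A9": (81.0, 31.5),
    "HS3ST5": (92.1, 61.1),
}

results = {
    gene: classify_fold_change(nr, r) for gene, (nr, r) in mean_expression.items()
}
for gene, (fold, label) in results.items():
    print(f"{gene}: fold change {fold:.2f} ({label})")
```

Note that 0.67 is approximately the reciprocal of 1.5, so the two thresholds are symmetric on a log scale.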
Among the genes that were highly expressed in FOLFOX responders, CRYBB1 was one of the highly mutated genes in microsatellite instability colorectal cancers.36
NEUROG3 played important roles in intestinal enteroendocrine cells and was repressed by the growth factor independent 1 transcription factor (GFI1), which is normally expressed in Paneth and goblet cells of the colon.37
LPL is a crucial enzyme for the intravascular catabolism of triglyceride-rich lipoproteins. Alteration of LPL may let the cell acquire a growth advantage and develop malignancy.38 LPL gene deficiency increases cancer risk. The tumor suppressive effects of LPL have been verified in animal models; owing to its role in inflammation, it is a promising general target for chemotherapy.39
CYP4F is a member of the CYP/CYP450 superfamily of enzymes. It was highly expressed in prostate cancer, and RNAi experiments suggested that CYP4F was important for cell growth and survival.40
PAGE4 is a member of the GAGE family, which is highly expressed in various tumors.41–43 It has been reported that PAGE4 expression can predict liver metastasis of colorectal cancer.44
Among the genes that were lowly expressed in FOLFOX responders, SLC26A9 has colon-specific functions, such as transport of glucose, organic acids and metal ions and mineral absorption.45 Its low expression may affect the growth of tumor cells.
The limitations and potential improvements of this study
Although this study identified candidate genes for chemotherapy response in colorectal cancer and proposed a plausible mechanism, it had several limitations.
First, since this was a bioinformatics study, we did not validate our results with biological experiments. This limited the discovery of novel mechanisms. To reduce the effects of lacking experimental validation, we conducted a thorough literature survey and proposed the possible mechanisms based on confirmed biological functions of the candidate genes from published papers.
Second, the sample size of this study was small, even though we collected all publicly available gene expression profiles from the largest gene expression database, GEO. In the next step, we will collect data from colorectal cancer patients treated with chemotherapy at our hospital and build a large independent test dataset.
Third, the number of genes was still too large. We will try more advanced feature selection methods to further reduce the number of selected genes. Exhaustive search strategies can be applied within the 138 genes to find the optimal 3–5 genes.
Fourth, the clinical information should be documented carefully. Since the data we analyzed were from GEO, much of the clinical information of the patients was unknown. Analyzing the clinical information may provide novel insight. For example, of the 144 colorectal cancer samples, 117 were from primary sites and 27 were from metastatic lesions. Notably, all 27 metastatic samples were predicted with the correct responses, as shown in Table S1, in which the third and sixth columns are the actual and predicted responses, respectively. There may be two reasons why the metastatic lesions can predict chemotherapy response: 1) gene expression in primary tumors and metastatic lesions is strongly correlated.46,47 Staub et al reported that the primary site of metastatic cancer can be predicted based on the similarity between metastatic cancer and primary tissue.46 2) Some of the candidate genes were general tumor genes, such as PAGE4, a member of the GAGE family that is expressed in a variety of tumors.41–43
Fifth, genetic variations, such as single-nucleotide polymorphisms (SNPs) and copy number variations, have been proven to be causal factors for tumorigenesis.48–52 They can be used for cancer subtyping and drug response prediction.22,48 Unfortunately, our dataset did not include genetic data. However, based on the central dogma and previous studies, most SNPs function through expression quantitative trait loci (eQTL),17,18,53 so gene expression data can partially represent the effects of SNPs. If possible, we will perform DNA-Seq and RNA-Seq on the same patients and investigate the eQTL regulatory network of colorectal cancer patients treated with chemotherapy in the future.
Conclusion
Chemotherapy is a widely used treatment for cancers, but not all cancer patients respond to it as expected. In this study, we analyzed the gene expression profiles of FOLFOX responders and non-responders among colorectal cancer patients by combining several datasets. With advanced feature selection methods, we identified biomarkers that can accurately predict the response of colorectal cancer patients to FOLFOX treatment. The biological analysis of the selected genes revealed a possible mechanism of chemotherapy response in colorectal cancer.
Table S1
The clinical information of the 144 colorectal cancer patients