Literature DB >> 33194745

Intratumor Heterogeneity of MYO18A and FBXW7 Variants Impact the Clinical Outcome of Stage III Colorectal Cancer.

Peng-Chan Lin1,2,3, Yu-Min Yeh1,2, Bo-Wen Lin4, Shao-Chieh Lin4, Ren-Hao Chan4, Po-Chuan Chen4, Meng-Ru Shen5,6,7.   

Abstract

Many studies failed to demonstrate benefit from the addition of targeted agents to current standard adjuvant FOLFOX chemotherapy in stage III colorectal cancer (CRC) patients. Intratumor heterogeneity may foster the resistant subclones and leads to cancer recurrence. Here, we built a cancer evolution model and applied machine learning analysis to identify potential therapeutic targets. Among 78 CRC cases, whole-genome (WGS) and deep targeted sequencing data generated from paired blood and primary tumor were used for phylogenetic tree reconstruction. Genetic alterations in the PI3K/AKT, and RTK oncogenic signaling pathways were commonly detected in founding clones. The dominant subclones frequently exhibited dysregulations in the TP53, FBXW7/NOTCH1 tumor suppression, and DNA repair pathways. Fourteen genetic mutations were simultaneously selected by random forest and LASSO methods. The logistic regression model had better accuracy (79%), precision (70%), and recall (65%) and area under the curve (AUC) (82%) for cancer recurrence prediction. Three genes, including MYO18A in the founding clone, FBXW7, and ATM in the dominant subclone, affected the prognosis were selected simultaneously by different feature sets. The in vitro studies, HCT-116 cells transfected with MYO18A siRNA demonstrated a significant reduction in cell migration activity by 20-40%. These results indicate that MYO18A plays a crucial role in the migration of human CRC cells. The cancer evolution model revealed the critical mutations in the founding and dominant subclones. They can be used to predict clinical outcomes and the development of novel therapeutic targets for stage III CRC.
Copyright © 2020 Lin, Yeh, Lin, Lin, Chan, Chen and Shen.

Entities:  

Keywords:  colorectal cancer; intratumor heterogeneity; targeted gene sequencing; tumor evolution; whole-genome sequencing

Year:  2020        PMID: 33194745      PMCID: PMC7658598          DOI: 10.3389/fonc.2020.588557

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   6.244


Introduction

Colorectal cancer (CRC) is the most commonly diagnosed gastrointestinal cancer and is also one of the leading causes of cancer-related death worldwide (1). Although adjuvant FOLFOX (5-fluorouracil, leucovorin, and oxaliplatin) chemotherapy benefits stage III CRC patients, recurrence develops in 30–35% of patients (2). Many studies have tried to assess the addition of targeted therapy, including bevacizumab and cetuximab, to FOLFOX in the adjuvant treatment of stage III CRC. However, no significant improvement in survival was noted. A considerable challenge of recurrent stage III CRC is identifying the critical genetic mutations responsible for tumor metastasis and delivering effective therapeutic strategies (3, 4). CRC is a highly heterogeneous disease that differs in clinical presentations, molecular characteristics, and responses to treatment and survival. Intratumor heterogeneity is defined as the distinct morphological and phenotypic differences within a tumor (5). Hence, building the genome evolution model underlying the mechanism of tumor carcinogenesis and biological pathways and identifying genetic markers to predict cancer recurrence is crucial to accelerate and facilitate the development of CRC treatment targets. Cancer cells accumulate somatic alterations over time. Most cancers arise from a single clone with acquired genetic variability, and tumor progression and metastasis result from the sequential selection of more aggressive subclones (6). Cancer evolves dynamically as clonal expansions. Recent genomic studies have demonstrated that cancer relapse or metastasis is associated with the addition of new mutations and clonal evolution (7). Intratumor heterogeneity may foster tumor evolution and adaptation and hinder the biomarker development of personalized-medicine strategies that depend on results from single tumor-biopsy samples (7). The most common technology used for the molecular characterization of tumor heterogeneity is the high-throughput DNA sequencing of bulk samples. There is a significant acceleration in the use of next-generating sequencing (NGS) to approach tumor heterogeneity and evolution for precision medicine (8, 9). By using advances in bioinformatics and artificial intelligence, determining the essence of key genetic mutations in cancer evolution has recently become possible. From the evolutionary models, we can identify the “oncogenic addiction or driver” mutations that provide a fitness advantage to cancer targets against neutral “passenger” mutations. In this study, we aimed to develop a genome evolution model by analyzing tumor heterogeneity and discovering actionable mutational targets. We first developed a cancer evolution model for the development of new agents in tumor heterogeneity and the generation of novel and more effective therapies by analyzing somatic mutations and tumor heterogeneity. Second, we established a model predicting cancer recurrence and survival and identified therapeutic driver mutation targets via robust optimization in machine learning. Finally, we used the causal inference model and biological methods to validate the potential cancer evolution targets. The results further described early mutation changes that predict tumors progress to stage III carcinomas and showed that statistical inference predicts that the subclone-related pathogenic mutations are acquired when the cancer is progressing. Here, we defined a broad time window of opportunity for early detection to prevent recurrence and death in advanced colorectal cancer patients. A fine-resolution view of this clonal architecture provides insight into tumor heterogeneity, evolution, and treatment response, all of which may have clinical implications.

Materials and Methods

Study Population

A total of 78 CRC cancer patients were recruited for the study from National Cheng Kung University Hospital (NCKUH) between January 2014 and January 2019. All CRC patients were pathological stage III and received standard surgical resection followed by adjuvant chemotherapy with the regimen of mFOLFOX6 (5-fluorouracil, leucovorin, and oxaliplatin). Clinical information was obtained from medical records. Tumor tissues and blood samples were collected at the time of enrollment. This study was approved by the Institutional Review Board of NCKUH (A-ER-103-395 and A-ER-104-153) and conducted under the Declaration of Helsinki. All participants provided written informed consent.

Germline Whole-Genome Sequencing

Whole blood was collected for genomic DNA extraction. Genomic DNA was quantified with a Qubit fluorescence assay (Thermo Fisher Scientific) and sheared with an S2 instrument (Covaris). Library preparation was carried out using the TruSeq DNA PCR-Free HT Kit (Illumina). Individual DNA libraries were measured by 2100 Bioanalyzer (Agilent) qPCR and Qubit (Thermo Fisher Scientific). Normalized DNA libraries were combined into five-sample pools per flow cell in all eight lanes and clustered on a cBot instrument (Illumina) with Paired-End Cluster Kit V4 (Illumina). All flow cells were sequenced on the HiSeq2500 sequencer (Illumina) using the SBS Kit V4 chemistry (Illumina). FastQC was used to check read quality, and the resulting reads were aligned to the hg19 reference genome with the BWA-MEM algorithm (10). Single nucleotide variants (SNVs) and indel identification and genotyping were performed across all samples simultaneously using standard hard filtering parameters or variant quality score recalibration according to GATK Best Practices recommendations. WGS was presented with a minimum, median coverage of 30X.

Targeted Tumor Sequencing by Cancer Panel

A total of 78 formalin-fixed paraffin-embedded primary tumor samples were collected for histologic assessment followed by the extraction of nucleic acids. The histologic evaluation was performed by pathologists, who determined the percentage of tumors and adequacy for sequencing. Tumor deep targeted sequencing was performed by Oncomine Comprehensive Assays (OCA) version 1 (Thermo Fisher Scientific) (11). OCA v1 was designed to detect 143 drug targets, including 73 hotspot genes, 49 focal copy number variation (CNV) gains, 26 genes for full coding region sequencing (CDS), and 22 fusion driver genes. (druggable) The Ion PGM Sequencing 200 Kit v.2 was used with the Ion PGM sequencer (Thermo Fisher Scientific) according to the manufacturer’s instructions. All samples were analyzed using the Torrent Suite Software 5.0.4, aligning all reads to the hg19 reference genome, and variant calling was performed running the Torrent Variant Caller plugin version 5.0.4.0. We used the ANNOVAR tool to annotate variants and filter out indels not reported in the 1000 Genomes Project, the Single Nucleotide Polymorphism Database (dbSNP), and the Exome Aggregation Consortium (ExAc) (12).

Cancer Evolution Model Construction

The somatic mutation calling was performed by comparing the sequencing data generated by OCA v1 and germline genetic variants by WGS. Somatic SNVs were obtained by DeepSNV (13). DeepSNV (a beta-binomial model and a likelihood ratio test) is a tool that can detect subclonal SNVs with frequencies higher than 10-4 with higher sensitivity and specificity. The tumor subclones were identified, and clusters were identified using SciClone (14), a Bayesian clustering method. ClonEvol was used to establish the evolution tree in cancer (15).

Statistical Analysis

Chi-square tests, Fisher’s exact tests, and unpaired t-tests were used to assess the differences between groups. Kaplan–Meier curves were used to evaluate disease-free survival, and the log-rank test was used to compare the differences between groups. Disease-free survival was defined as the time between surgery and recurrence of cancer. A P value < 0.05 was considered statistically significant.

Pathway Analysis

Signaling pathways for frequently mutated genes detected in founding clones and dominant subclones were enriched by using Reactome (http://www.reactome.org) (16). Significance was derived from over-representation analysis built in Reactome.

Machine Learning Analysis

Feature Selection

The machine learning model includes logistic regression (LR), least absolute shrinkage and selection operator (LASSO) method, and random forest. The LASSO method is a regression model that penalizes the absolute size of the coefficients, causing some regression coefficients to shrink to zero. The penalization, or constraint, allows the LASSO method to estimate a model while simultaneously performing automatic variable selection (17). The random forest (RF) model consists of an ensemble of classification trees, where each classifier was built from different independent and identically distributed bootstrap samples from a training set. Each classifier casts a vote for the most popular class. Odds ratio (OD) measures the strength of the association between two types, and hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of an explanatory variable in survival analysis. LASSO method was done by R package glmnet, Random forest was done by R package randomForest, the odds ratio was done by R package fmsb, and the hazard ratio was done by R package survival and survminer.

Classifier Model

The support vector machine (SVM) (18) is a state-of-the-art classification method referred to as black-box processes. Random forest (RF) (19) is an “off-the-shelf” widely used machine learning method that shows competitive prediction performance. XGBoost (20) is an optimized implementation of gradient boosting (GBM). The advantages of the classifier include less prone to overfitting due to the strong inner regularization scheme, easy to implement parallelization and scalability. C5.0 is a machine learning method based on decision trees, which is also referred to as white box processes and is known for interpretability (21). Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables (22, 23). Finally, we performed ten-fold cross-validation on our dataset to evaluate the efficiency of the models using the caret packages in R with default parameters (24). SVM, XGBoost, and C5.0 were done by R package e1071, xgboost, and C50.

Migration Assay

For the migration assay, placed on a cell culture surface, the ibidi Culture-Insert 2 Well (ibidi GmbH, Planegg, Germany) provides two cell cultureservoirs, each separated by a 500 μm wall. Cells were plated at 80,000 cells per well and allowed to attach overnight. On the following day, culture inserts were removed, and light microscopy images were acquired. Cells were maintained under standard culture conditions while migrating toward the cell-free gap area. For HCT-116, HT-29, and DLD-1 cells, images were acquired every 24 hours later. Images were analyzed using ImageJ software.

Results

Identification of Cancer Driver Mutations by Conventional Approaches

The discovery of somatic mutations that drive cancer progression is essential for therapeutic strategies. Following the protocol shown in , we used conventional statistical methods such as odds ratio (OD) and hazard ratio (HR) to evaluate the clinical impact of somatic mutations in a cohort of 78 stage III CRC patients. The median follow-up duration of this cohort was 31.2 months. Among these patients, 33% (26/78) had recurrent disease, and 67% (52/78) remained disease-free. Of all patients, the distribution of gender was the same. The median age of these patients was 58 years old. The prevalent primary tumor site was left colon (80.8%). There was no significant difference between recurrence and tumor characteristics, such as tumor site, tumor invasion stage (T), and nodal stage (N) ( ). A total of 30 mutated genes were identified. There was no genetic variant significantly associated with recurrence in these CRC patients by odds ratio. The hazard ratio ( ) of four mutated genes, including MTOR, BAP1, TSC1, and NOTCH1, showed a correlation with worse progression-free survival [p < 0.05 and hazard ratio (HR) =10.5–76.5]. However, these four genetic variants were rare and were found only in 1.3% (1/78) of these CRC patients ( ). Targeting these rare mutations does not seem to provide significant improvements in the clinical outcome of stage III CRC. These data imply the limitation of the conventional approach of the analytic sequencing method.

Targeting Intratumor Heterogeneity by Cancer Evolution Model

Evolutionary dynamic models have been studied to elucidate the process of tumorigenesis and discover the driver somatic mutations for the development of potential therapeutic strategies (25). Accordingly, we built clonal evolution models and applied statistics and machine learning algorithms to identify disease-related driver mutations. As shown in , normal cells accumulate background variations (blue cross) and many cancer-specific mutations (stars) over a prolonged clinically latent period to become cancers. Multiple subclones could be found within single cancer tissue. Theoretically, the background variation and driver mutation (red star) are present in the ancestor cell and all the subclones. During cancer evolution, additional mutations occur subsequently in different subclones, which give rise to intratumoral heterogeneity. Deep targeted-gene sequencing of bulky tumor tissue provides useful information on variant allele frequency (VAF), which could be used to predict cellularity and construct phylogenetic trees. Therefore, we used the targeted-gene sequencing data of 78 stage III CRCs to reconstruct the tumor evolution. The WGS data from paired normal blood samples were used to filter germline variants ( ). After the somatic variant calling by DeepSNV (13), SciClone (14) was applied for analyzing the distribution of purity-scaled variant allele fractions, and ClonEvol (15) was used to reconstruct the phylogenetic tree. We determined the dominant clone according to the predicted cellularity. After that, the potential candidate driver mutations in ancestor and dominant clones could be identified. We applied machine learning models to predict the risk of cancer recurrence by using different genetic variant feature selection strategies and classifiers. By this pipeline, we could identify the critical driver genetic variants that could be potential drug targeting clonal variants and involved in cancer survival stratification ( ).
Figure 1

The concept of cancer evolution and study design. (A) The cancer evolution model depicts the accumulation of cancer-specific mutations and clonal expansion during tumor formation. The data of VAF generated from deep sequencing of tumor tissues could be used to predict the cellularity and construct the phylogenetic tree. The blue cross indicates the background variation. Stars with different colors represent mutations that develop during cancer evolution. (B) Based on the concept of cancer evolution, the research design is proposed to identify driver mutations with crucial clinical impact.

The concept of cancer evolution and study design. (A) The cancer evolution model depicts the accumulation of cancer-specific mutations and clonal expansion during tumor formation. The data of VAF generated from deep sequencing of tumor tissues could be used to predict the cellularity and construct the phylogenetic tree. The blue cross indicates the background variation. Stars with different colors represent mutations that develop during cancer evolution. (B) Based on the concept of cancer evolution, the research design is proposed to identify driver mutations with crucial clinical impact.

Phylogenetic Tree Reconstruction From the Clonal Evolution Model

A clinical example is shown in . This case was a 40-year-old man with stage III CRC at initial diagnosis. He received standard surgical resection followed by adjuvant chemotherapy with mFOLFOX6. Recurrence was detected by computed tomography (CT) scan 15.4 months after surgery. WGS and deep targeted-gene sequencing were performed on paired normal and tumor samples, respectively. displays the allele frequency of the detected variants in tumor and germline tissues and indicates the levels of significance of the deepSNV test. Using the VHL gene as an example, the dots above the diagonal line represented the variants that were called as true variants rather than sequencing errors by the deepSNV algorithm. A total of 307 somatic SNVs were detected in this case. When SciClone was used to perform the clustering, 307 SNVs clustered into four groups ( ). The mean VAF values of clusters 1 to 4 are 36.5, 22.1, 11.2, and 4.2%. Besides, the mean posterior probabilities of clusters 1 to 4 are 93.5, 83, 95, and 99%. shows the kernel density plots of VAF under two copy number estimations. The model did not perfectly fit the original distribution because the mean probability of cluster 2 was 83%. shows the scatter plot of each cluster’s SNV coverage and VAF. The SNVs with coverage less than 50x were filtered out in this study because the low coverage would lead to biased estimation. demonstrate the cancer cellularity prediction and the most likely evolutionary tree of the primary tumor via ClonEvol. In this case, the gray color was the founding clone with cellularity ranging from 18.6–33.3%, the green color was the dominant subclone ranging from 28.5–35.3%, the blue color was subclone two ranging from 25.8–35.4%, and the purple color was subclone four ranging from 3.6–14.9%. Finally, the phylogenetic tree was constructed ( ).
Figure 2

Representation of analyzing clonal evolution in a CRC patient as an example. (A) Scatter plot showing the variant allele frequencies (VAFs) of the example VHL gene in tumor and normal tissues and the levels of significance of the deepSNV test. (B) The flow chart showing the utility of various algorithms for calling of SNV, clustering of VAFs, and analysis of clonal relationships from the sequencing data of paired normal-tumor samples. (C) Inferring subclones according to the clustering of VAFs by SciClone. (D) Kernel density estimation (KDE) of VAFs. The distribution of mutations occurring in colon cancer patients. The gray line represents the mutations detected in the founding clone, and the green line represents driver mutations in the dominant subclone. (E) The read depth versus VAF plot showing the read depth of 4 clusters of genes under the assumption of neutral copy number (copy number = 2). (F) Sphere of cells demonstrating the founding clone and clonal subpopulations of the CRC tumor sample by CloneEvol. (G) Bell plot showing clonal dynamics during tumor formation.

Representation of analyzing clonal evolution in a CRC patient as an example. (A) Scatter plot showing the variant allele frequencies (VAFs) of the example VHL gene in tumor and normal tissues and the levels of significance of the deepSNV test. (B) The flow chart showing the utility of various algorithms for calling of SNV, clustering of VAFs, and analysis of clonal relationships from the sequencing data of paired normal-tumor samples. (C) Inferring subclones according to the clustering of VAFs by SciClone. (D) Kernel density estimation (KDE) of VAFs. The distribution of mutations occurring in colon cancer patients. The gray line represents the mutations detected in the founding clone, and the green line represents driver mutations in the dominant subclone. (E) The read depth versus VAF plot showing the read depth of 4 clusters of genes under the assumption of neutral copy number (copy number = 2). (F) Sphere of cells demonstrating the founding clone and clonal subpopulations of the CRC tumor sample by CloneEvol. (G) Bell plot showing clonal dynamics during tumor formation.

Sequential Oncogenic and Tumor Suppression Genetic Alterations in Cancer Evolution

By using the study protocol shown in , we identified possible driver mutations in the founding clone and dominant subclone for each cancer patient. Among 78 CRC patients, 66 and 49 genetic variants with high or moderate protein impact were detected in founding and dominant subclones, respectively. Several variants were frequently detected in the founding clones of these 78 CRCs, including the ABL1, MYO18A, and ATM mutations ( ). Approximately 78.2, 73.1, and 64.1% of patients harbored the ABL1, MYO18A, and ATM mutations, respectively, in their founding clone. In contrast, the most commonly detected mutations in the dominant subclones of these CRCs were BRCA1, BRCA2, TET2, APC, VHL, MSH2, TP53, PIK3CA, and FBXW7 mutations, accounting for 79.5, 93.6, 64.1, 55.1, 53.8, 41, 30.8, 32, and 21.8%, respectively. Driver genes can be classified into distinct signaling pathways that control cell survival, cell fate, and genome maintenance (26). Accordingly, we analyzed the dysregulated signaling pathways in founding clones and dominant subclones. As shown in , the signaling pathways involved were significantly different between the founding clones and dominant subclones. Alterations in the PI3K/AKT, RAF/MAP, and RTK signaling pathways were commonly detected in founding clones. By contrast, the dominant subclones frequently exhibited dysregulations in the TP53, FBXW7/NOTCH1, and DNA repair pathways. These results implied that cancer cells accumulated different somatic mutations during cancer evolution. Oncogenic alterations in signaling pathways controlling cell proliferation and survival, such as the PI3K/AKT and MAPK pathways, occurred at the early stage of cancer formation (27, 28). Mutations involving the tumor suppressors and DNA repair pathways became more important during evolution.
Figure 3

Mutated genes and relevant pathways in the founding clone and dominant subclones. (A) Percentage of patients who harbor each mutated gene was occurring in founding clones or dominant subclones of their primary tumors. A gene under the dashed diagonal indicates that mutation frequently occurs in the founding clone than the dominant subclone of these patients. (B) Heatmap showing oncogenic pathways enriched by mutated genes frequently occurring in founding clones and dominant subclones, respectively. Color density represents the significance of enrichment derived by Reactome.

Mutated genes and relevant pathways in the founding clone and dominant subclones. (A) Percentage of patients who harbor each mutated gene was occurring in founding clones or dominant subclones of their primary tumors. A gene under the dashed diagonal indicates that mutation frequently occurs in the founding clone than the dominant subclone of these patients. (B) Heatmap showing oncogenic pathways enriched by mutated genes frequently occurring in founding clones and dominant subclones, respectively. Color density represents the significance of enrichment derived by Reactome.

Optimizing the Selection of Clonal Mutations by Recurrence Status

To investigate the clinical significance of mutations detected in the founding clone and dominant subclones, we used different feature selection techniques to identify the important mutations associated with the recurrence of CRC patients. Top 30 mutations were selected by Gini importance using the random forest (RF). The FBXW7 and MYO18A mutations were the variables with the highest importance among these CRC patients. LASSO was performed to select mutated genes with nonzero coefficients, and 23 mutations (LASSO23) were selected. By calculating the odds ratio (OD) and hazard ratio (HR), we identified 8 (OD8) and 25 (HR25) genes, respectively, that were significantly associated with recurrence in this CRC cohort (p < 0.05). The above results were shown in . Fourteen genes were simultaneously selected by random forest and LASSO, including GATA3, ACVRL1, MYO18A, IDH1, ABL1, NFE2L2, MCL1, RET, PDCD1LG2, TSC2, CSF1R, ATM, FBXW7, and TP53. RF_LASSO14 was named for this 14-gene set ( ). We analyzed the correlation between disease-free survival (DFS) and the mutation status of 14 genes selected by both the random forest and lasso. Among these 14 genes, mutations of CSF1R, PDCD1LG2, FBXW7, TSC2, and NFE2L2 gene were significantly associated with shorter DFS. In contrast, MYO18A mutation was associated with better DFS. No association between the DFS and the mutation status of the other eight genes was observed ( ). The heatmap of these 14 mutated genes is shown in . For robust optimization, we input the gene groups identified by four different feature sets into five classifier models, including support vector machine (SVM), C5.0, random forest, logistic regression, and XGBoost (gradient boosting), to predict and classify the cancer recurrence. The recall value of 10-fold cross-validation for SVM, C5.0, RF, LR, and XGBoost were 0.615, 0.346, 0.384, 0.654, and 0.461, respectively. The accuracy for SVM, C5.0, RF, LR, and XGBoost were 0.756, 0.615, 0.756, 0.795, and 0.744, respectively. The logistic regression model had the best performance with the highest recall value and better accuracy. The results were shown in . The receiver operating characteristic (ROC) curve analysis confirmed that the RF_LASSO14 gene set had better accuracy (79%), precision (70%), and recall (65%) and area under the curve (AUC) (82%) in the logistic regression model ( ). To emphasize the importance of cancer evolution, we compared the performance between the models with or without intratumor heterogeneity. We operated the bootstrapping process 5000 times and selected background genetic mutations as features by lasso methods and built the same logistic regression classifier to calculate the probability of accuracy over 0.795. Consequently, the probability of a conventional model with better accuracy than the evolution model was only 4.54%, which proved that our model was quite meaningful.
Figure 4

Feature selection of founding clone and dominant subclone genes and prediction model for CRC recurrence. (A) Venn diagram showing overlapping genes selected by the random forest (RF) and LASSO models. RF_LASSO14 indicates a group of 14 overlapping genes. F represents the founding clone gene mutation, and S represents the dominant subclone gene mutation. (B) Receiver operating characteristic (ROC) curves of gene sets selected by different statistical and machine learning models for prediction of recurrence in stage III CRCs. The area under the curve of ROC corresponding to each gene set was shown. RF_LASSO14, 14 genes simultaneously selected by random forest and LASSO; OD8, eight genes selected by odds ratio; HR25, 25 genes selected by hazard ratio.

Feature selection of founding clone and dominant subclone genes and prediction model for CRC recurrence. (A) Venn diagram showing overlapping genes selected by the random forest (RF) and LASSO models. RF_LASSO14 indicates a group of 14 overlapping genes. F represents the founding clone gene mutation, and S represents the dominant subclone gene mutation. (B) Receiver operating characteristic (ROC) curves of gene sets selected by different statistical and machine learning models for prediction of recurrence in stage III CRCs. The area under the curve of ROC corresponding to each gene set was shown. RF_LASSO14, 14 genes simultaneously selected by random forest and LASSO; OD8, eight genes selected by odds ratio; HR25, 25 genes selected by hazard ratio.

Survival Stratification by Three Genetic Variants via the Decision Tree Model

Currently, the “one size fits all” approach is still used for adjuvant treatment of stage III CRC patients. FOLFOX6 chemotherapy is the gold standard regimen without considering genomic alterations. In this CRC cohort, all patients were pathological stage III and received standard surgical resection followed by adjuvant FOLFOX6 chemotherapy. The 5-year DFS was approximately 70% ( ). The decision tree, a nonparametric supervised learning method, was used to analyze the predictive value of the mutations identified in the founding clone and dominant to further subclassify these stage III CRC patients and identify potential treatment strategies subclone. As shown in , three mutations, including the MYO18A mutation in the founding clone and FBXW7 and the ATM mutation in the dominant subclone, could be used to stratify these 78 CRC patients into four subgroups that had different clinical outcomes. Group 1 (G1) was the patient without the MYO18A mutation in the founding clone (F.MYO18A), and this group of patients had the worst DFS ( ). Patients harboring the MYO18A mutation could be further categorized into groups 2, 3, and 4 according to the dominant subclone’s mutation status of FBXW7 and ATM. Patients in group 2 (G2) had FBXW7 mutations (S.FBXW7), and patients in group 4 (G4) had ATM mutations in the dominant subclone (S.ATM). Patients in group 4 had the best outcome, followed by groups 3 and 2. The mutated genes in the founding and dominant subclones detected in these four subgroups of patients are shown in . Targeting these relevant mutations in founding and dominant subclones might provide benefits for stage III CRC patients, especially groups 1 and 2. Since MYO18A, FBXW7, and ATM mutations ( ) have a considerable impact on clinical outcomes, these mutations might be potential therapeutic targets. MYO18A mutation was detected in 50 and 84.6% of patients with and without recurrence. In contrast, the percentage of FBXW7 mutations in patients with or without recurrence was 38.5 and 13.5%, respectively. When analyzed by the Chi-Square test, the distributions of MYO18A and FBXW7 mutations are significantly different in these two groups of patients (p = 0.002 and 0.019) ( ). Several clinical and pathological factors, as shown in the , were also considered when analyzing the prognostic impact of MYO18A and FBXW7 mutations. We analyzed these factors through univariate and multivariate Cox proportional hazards model. The results showed MYO18A, and FBXW7 mutations are significantly associated with the clinical outcome when univariate analysis. The outcome was not affected by age, gender, primary tumor location, the depth of tumor invasion, and the number of lymph node metastasis. In multivariate analysis, both the MYO18A and FBXW7 mutations were still the independent prognostic factors ( ).
Figure 5

The model of survival decision tree (CRC) (A) The Kaplan-Meier curve of disease-free survival (DFS) for 78 stage III CRC patients. (B) CRC patients are grouped into four subgroups according to the mutation status of the MYO18A gene in the founding clone (F) MYO18A) and FBXW7 and ATM genes in the dominant subclone (S. FBXW7 and S.ATM). Parentheses indicate the number of patients in each group. (C) The Kaplan-Meier curves of DFS stratified by group (1–4) and compared with the log-rank test. (D) Summary of the mutations and clinical outcome in 4 subgroups of CRC patients. The colored rectangles indicate the detection of recurrent disease or not. The gray and white rectangles indicate the gene with and without the mutation.

The model of survival decision tree (CRC) (A) The Kaplan-Meier curve of disease-free survival (DFS) for 78 stage III CRC patients. (B) CRC patients are grouped into four subgroups according to the mutation status of the MYO18A gene in the founding clone (F) MYO18A) and FBXW7 and ATM genes in the dominant subclone (S. FBXW7 and S.ATM). Parentheses indicate the number of patients in each group. (C) The Kaplan-Meier curves of DFS stratified by group (1–4) and compared with the log-rank test. (D) Summary of the mutations and clinical outcome in 4 subgroups of CRC patients. The colored rectangles indicate the detection of recurrent disease or not. The gray and white rectangles indicate the gene with and without the mutation.

Examining the Biological Role of MYO18A In Vitro

MYO18A is a gene encoding a unique myosin involved in intracellular transport processes and cell motilities (29). Therefore, we assessed whether MYO18A has a role in CRC cell invasion or migration. Specific siRNA targeting MYO18A was transfected into human CRC cancer cells, and the impact on cell migration was determined by gap closure assay. As shown in , MYO18A siRNA significantly reduced the level of MYO18A protein after 48 hours of transfection. Compared to cells treated with scrambled siRNA, HCT-116 cells transfected with MYO18A siRNA demonstrated a significant reduction in cell migration activity by 20 to 40% ( ). Reduced migration was also observed in MYO18A siRNA-treated HT-29 and DLD-1 cells. These results indicate that MYO18A plays an essential role in the migration of human CRC cells.
Figure 6

Depletion of MYO18A by siRNA suppressed cell migration in human CRC cells. (A) HCT-116 cells were transfected with either scrambled or MYO18A siRNA. Cells were harvested 24, 48, and 72 hours after transfection, and western blotting was used to determine the effect on MYO18A protein expression. Representative images (B) and quantitative analysis (C) of gap closure assay in HCT-116 cells. Forty-eight hours after the transfection of siRNA, the gap was created and monitored with a microscope every 12 hours. The red dotted lines indicate where the migration began. The total gap area created at 0 hours and gap closure areas at the indicated time points were quantified by ImageJ software. The migration activity, which was determined by the percentage of gap closure area, was compared between cells transfected with the scramble and MYO18A siRNA. Each value represents the mean +/- SEM from at least three independent experiments in each group. ***p < 0.01.

Depletion of MYO18A by siRNA suppressed cell migration in human CRC cells. (A) HCT-116 cells were transfected with either scrambled or MYO18A siRNA. Cells were harvested 24, 48, and 72 hours after transfection, and western blotting was used to determine the effect on MYO18A protein expression. Representative images (B) and quantitative analysis (C) of gap closure assay in HCT-116 cells. Forty-eight hours after the transfection of siRNA, the gap was created and monitored with a microscope every 12 hours. The red dotted lines indicate where the migration began. The total gap area created at 0 hours and gap closure areas at the indicated time points were quantified by ImageJ software. The migration activity, which was determined by the percentage of gap closure area, was compared between cells transfected with the scramble and MYO18A siRNA. Each value represents the mean +/- SEM from at least three independent experiments in each group. ***p < 0.01.

Discussion

With the advances of NGS technologies and machine learning in cancer biology, targeting cancer evolution has become more feasible. Here, we demonstrate using a genomic-machine learning model for recurrence-risk prediction and identification of potential therapeutic targets for CRC. Importantly, we designed different treatment strategies for different risk subgroups of CRC patients. Our results highlight the following important points: (i) The sequential oncogenic and tumor suppression genetic alterations were found during tumor evolution. (ii) We identified a fourteen genes panel that could predict the risk of recurrence in stage III CRC. (iii) Three genes, including MYO18A in the founding clone, FBXW7, and ATM in the dominant subclone, affected the prognosis. (iv) MYO18A plays an important role in the migration of human CRC cells. These findings suggest that the integration of genomic data and cancer evolution models provides insights into disease biology. These results could be applied for the recurrence-risk classification of stage III CRC and the development of novel therapeutic strategies. The ability to predict the future behavior of individual cancers is crucial for precision cancer medicine. Considering that traditional methods might hinder the efficacy of rare somatic selection, we established a more comprehensive pipeline by the cancer evolution model for treatment strategy analysis. First, we selected rare somatic mutations that were not detected by traditional methods. To target intratumor heterogeneity and cancer evolution somatic mutations to overcome chemotherapy resistance, we used the evolution model. We supposed that founding clone and dominant clones (the most estimated prevalence) are the significant events for cancer recurrence, which was confirmed by the probability of random two subclone sampling ( ). Finally, for robust optimization, the different machine learning algorithms and statistical methods were selected for cancer recurrence-risk prediction and survival stratification models. In the CRC multistage progression model, the adenoma-carcinoma sequence refers to a stepwise pattern of mutational activation of oncogenes and inactivation of tumor suppressor genes. In our cancer evolution model, we provide information about genetic changes in cancer-driving metastasis. In the early stages, mutations in the oncogenic pathway, such as the receptor kinase signaling (RTK) pathway, the fibroblast growth factor receptor (FGFR) signaling pathway, and the transforming growth factor-beta (TGFB) signaling pathway, appear to be the first step. Second, mutations in TP53, FBXW7, and APC may play a role in cancer evolution. Sequential oncogenic and tumor suppression genetic alterations were consistent with the hypothesis of cancer two-hit theory. Classification and decision systems in data analysis are mostly based on accuracy. In our study, we trade off accuracy, precision, and recall for useful optimization in a multiple machine learning model. We selected the 14 genetic variants for cancer recurrence prediction. The variant distribution and frequency in cancer patients with or without recurrence are shown in . There are 11 genetic variants in the funding clone and three genetic variants in the dominant subclone. This study implies a robust optimization cancer panel for recurrence prediction. We have developed a genomic-machine learning model and pipeline software for CRC recurrence-risk prediction. shows the 5-year DFS of this CRC cohort. The 5-year disease-free survival rate is approximately 70%, which has reached the benchmark of a worldwide standard. In addition to modeling for recurrence prediction, we need to improve care survival by different treatment strategies. Using the three machine learning models, we can classify the CRC subgroup by three genetic variants. Group 1 and group 2 have a poor prognosis. The progression-free survival of Group 4 was better than that of group 3. This successful study identified associations between three genetic markers and survival subgroup and recurrence status. The uniqueness of this study is that the evolution model shows the clinical impact on stage III colorectal cancer by the machine learning method utilizing the comprehensive clinical and genomic information. However, the major limitation of this study is the small sample size. It is too early to make a strong conclusion at this stage in terms of the case number. MYO18A and FBXW7 intratumor heterogeneity variants are potential targets in the cancer evolution model. MYO18A is an unconventional myosin that has been implicated in multiple cellular processes. MYO18A has been involved as a cancer driver. Overexpression of MYO18A was observed in metastatic prostate cancer cell lines in a previous study (29, 30). Migration assay of various cancer cell lines also revealed that MYO18A-depleted cells had decreased cell motility (31, 32). By analyzing the clinical data, we found that patients without MYO18A mutation in the founding clone (F. MYO18A) had the worst DFS ( ). Moreover, the in vitro study showed knockdown of MY018A by siRNA caused a reduction in cell migration activity by 20–40% ( ). These results imply that MYO18A is a potential tumor driver for cancer cell migration. The alteration of MYO18A was common in the founding clone ( ). To conclude, the alteration of MY018A is a gain-of-function or loss-of-function mutation; further in vitro and in vivo investigations are needed to study the impact of MYO18A mutations on cell survival, proliferation, or angiogenesis. FBXW7 is a critical tumor suppressor involved in the ubiquitin-proteasome system in human cancer. It has been demonstrated that metastatic CRC patients with FBXW7 missense mutations show shorter overall survival compared with patients with wild-type FBXW7 (33). The results are consistent with our data showing poor prognosis survival in the G1 and G2 groups. In conclusion, this study highlights the importance of a cancer evolution model in the development of new therapeutic strategies. The integration of genomics and machine learning could provide an opportunity to identify new targets for cancers.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI BioProject (Accessions: PRJNA655796/PRJNA662159).

Ethics Statement

This study was approved by the institutional review board of NCKUH (A-ER-103-395 and A-ER-104-153) and conducted under the Declaration of Helsinki. All participants provided written informed consent.

Author Contributions

Conception and study design: P-CL and M-RS. Development of methodology: P-CL, Y-MY, and M-RS. Acquisition of data: Y-MY, B-WL, S-CL, and P-CC. Statistical and computational analysis: P-CL, Y-MY, and M-RS. Writing, review, and/or revision of the manuscript: P-CL, Y-MY, B-WL, S-CL, R-HC, P-CC, and M-RS. Study supervision: M-RS. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the Ministry of Science and Technology (MOST 104-2320-B-006-015-MY3, MOST 107-2319-B-006-001 and MOST 108-2319-B-006-001 to M-RS), National Health Research Institutes (NHRI-108A1-CACO-02181811), the Ministry of Health Welfare (MOHW107-TDU-B-211-114018, MOHW108-TDU-B-211-124018), National Cheng Kung University and National Cheng Kung University Hospital, Taiwan.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  25 in total

1.  The life history of 21 breast cancers.

Authors:  Serena Nik-Zainal; Peter Van Loo; David C Wedge; Ludmil B Alexandrov; Christopher D Greenman; King Wai Lau; Keiran Raine; David Jones; John Marshall; Manasa Ramakrishna; Adam Shlien; Susanna L Cooke; Jonathan Hinton; Andrew Menzies; Lucy A Stebbings; Catherine Leroy; Mingming Jia; Richard Rance; Laura J Mudie; Stephen J Gamble; Philip J Stephens; Stuart McLaren; Patrick S Tarpey; Elli Papaemmanuil; Helen R Davies; Ignacio Varela; David J McBride; Graham R Bignell; Kenric Leung; Adam P Butler; Jon W Teague; Sancha Martin; Goran Jönsson; Odette Mariani; Sandrine Boyault; Penelope Miron; Aquila Fatima; Anita Langerød; Samuel A J R Aparicio; Andrew Tutt; Anieta M Sieuwerts; Åke Borg; Gilles Thomas; Anne Vincent Salomon; Andrea L Richardson; Anne-Lise Børresen-Dale; P Andrew Futreal; Michael R Stratton; Peter J Campbell
Journal:  Cell       Date:  2012-05-17       Impact factor: 41.582

2.  Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials.

Authors:  Funda Meric-Bernstam; Lauren Brusco; Kenna Shaw; Chacha Horombe; Scott Kopetz; Michael A Davies; Mark Routbort; Sarina A Piha-Paul; Filip Janku; Naoto Ueno; David Hong; John De Groot; Vinod Ravi; Yisheng Li; Raja Luthra; Keyur Patel; Russell Broaddus; John Mendelsohn; Gordon B Mills
Journal:  J Clin Oncol       Date:  2015-05-26       Impact factor: 44.544

3.  ClonEvol: clonal ordering and visualization in cancer sequencing.

Authors:  H X Dang; B S White; S M Foltz; C A Miller; J Luo; R C Fields; C A Maher
Journal:  Ann Oncol       Date:  2017-12-01       Impact factor: 32.976

4.  Oxaliplatin combined with weekly bolus fluorouracil and leucovorin as surgical adjuvant chemotherapy for stage II and III colon cancer: results from NSABP C-07.

Authors:  J Philip Kuebler; H Samuel Wieand; Michael J O'Connell; Roy E Smith; Linda H Colangelo; Greg Yothers; Nicholas J Petrelli; Michael P Findlay; Thomas E Seay; James N Atkins; John L Zapas; J Wendall Goodwin; Louis Fehrenbacher; Ramesh K Ramanathan; Barbara A Conley; Patrick J Flynn; Gamini Soori; Lauren K Colman; Edward A Levine; Keith S Lanier; Norman Wolmark
Journal:  J Clin Oncol       Date:  2007-04-30       Impact factor: 44.544

5.  An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Authors:  Carolin Strobl; James Malley; Gerhard Tutz
Journal:  Psychol Methods       Date:  2009-12

Review 6.  Cancer genome landscapes.

Authors:  Bert Vogelstein; Nickolas Papadopoulos; Victor E Velculescu; Shibin Zhou; Luis A Diaz; Kenneth W Kinzler
Journal:  Science       Date:  2013-03-29       Impact factor: 47.728

7.  Molecularly targeted therapy based on tumour molecular profiling versus conventional therapy for advanced cancer (SHIVA): a multicentre, open-label, proof-of-concept, randomised, controlled phase 2 trial.

Authors:  Christophe Le Tourneau; Jean-Pierre Delord; Anthony Gonçalves; Céline Gavoille; Coraline Dubot; Nicolas Isambert; Mario Campone; Olivier Trédan; Marie-Ange Massiani; Cécile Mauborgne; Sebastien Armanet; Nicolas Servant; Ivan Bièche; Virginie Bernard; David Gentien; Pascal Jezequel; Valéry Attignon; Sandrine Boyault; Anne Vincent-Salomon; Vincent Servois; Marie-Paule Sablin; Maud Kamal; Xavier Paoletti
Journal:  Lancet Oncol       Date:  2015-09-03       Impact factor: 41.316

8.  Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing.

Authors:  Li Ding; Timothy J Ley; David E Larson; Christopher A Miller; Daniel C Koboldt; John S Welch; Julie K Ritchey; Margaret A Young; Tamara Lamprecht; Michael D McLellan; Joshua F McMichael; John W Wallis; Charles Lu; Dong Shen; Christopher C Harris; David J Dooling; Robert S Fulton; Lucinda L Fulton; Ken Chen; Heather Schmidt; Joelle Kalicki-Veizer; Vincent J Magrini; Lisa Cook; Sean D McGrath; Tammi L Vickery; Michael C Wendl; Sharon Heath; Mark A Watson; Daniel C Link; Michael H Tomasson; William D Shannon; Jacqueline E Payton; Shashikant Kulkarni; Peter Westervelt; Matthew J Walter; Timothy A Graubert; Elaine R Mardis; Richard K Wilson; John F DiPersio
Journal:  Nature       Date:  2012-01-11       Impact factor: 49.962

9.  GOLPH3 drives cell migration by promoting Golgi reorientation and directional trafficking to the leading edge.

Authors:  Mengke Xing; Marshall C Peterman; Robert L Davis; Karen Oegema; Andrew K Shiau; Seth J Field
Journal:  Mol Biol Cell       Date:  2016-10-05       Impact factor: 4.138

10.  The Reactome pathway knowledgebase.

Authors:  David Croft; Antonio Fabregat Mundo; Robin Haw; Marija Milacic; Joel Weiser; Guanming Wu; Michael Caudy; Phani Garapati; Marc Gillespie; Maulik R Kamdar; Bijay Jassal; Steven Jupe; Lisa Matthews; Bruce May; Stanislav Palatnik; Karen Rothfels; Veronica Shamovsky; Heeyeon Song; Mark Williams; Ewan Birney; Henning Hermjakob; Lincoln Stein; Peter D'Eustachio
Journal:  Nucleic Acids Res       Date:  2013-11-15       Impact factor: 16.971

View more
  4 in total

1.  Sequential and co-occurring DNA damage response genetic mutations impact survival in stage III colorectal cancer patients receiving adjuvant oxaliplatin-based chemotherapy.

Authors:  Peng-Chan Lin; Yu-Min Yeh; Ren-Hao Chan; Bo-Wen Lin; Po-Chuan Chen; Chien-Chang Pan; Meng-Ru Shen
Journal:  BMC Cancer       Date:  2021-03-02       Impact factor: 4.430

2.  Implications of Intratumor Heterogeneity on Consensus Molecular Subtype (CMS) in Colorectal Cancer.

Authors:  Saikat Chowdhury; Matan Hofree; Kangyu Lin; Dipen Maru; Scott Kopetz; John Paul Shen
Journal:  Cancers (Basel)       Date:  2021-09-30       Impact factor: 6.639

Review 3.  Cutting-Edge AI Technologies Meet Precision Medicine to Improve Cancer Care.

Authors:  Peng-Chan Lin; Yi-Shan Tsai; Yu-Min Yeh; Meng-Ru Shen
Journal:  Biomolecules       Date:  2022-08-17

Review 4.  Myosin Motors: Novel Regulators and Therapeutic Targets in Colorectal Cancer.

Authors:  Nayden G Naydenov; Susana Lechuga; Emina H Huang; Andrei I Ivanov
Journal:  Cancers (Basel)       Date:  2021-02-11       Impact factor: 6.639

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.