Zuhuan Gan1, Qiyuan Zou2, Yan Lin3, Zihai Xu1, Zhong Huang1, Zhichao Chen1, Yufeng Lv1.
Abstract
The aim of the current study was to develop a predictive classifier for response to fluorouracil-based chemotherapy in patients with advanced colorectal cancer (CRC) using microarray gene expression profiles of primary CRC tissues. Using two expression profiles downloaded from the Gene Expression Omnibus database, differentially expressed genes (DEGs) between responders and non-responders to fluorouracil-based chemotherapy were identified. A total of 791 DEGs were identified, of which 303 were upregulated and 488 were downregulated in responders. Functional enrichment analysis revealed that the DEGs were primarily involved in 'cell mitosis', 'DNA replication' and 'cell cycle' signaling pathways. Following feature selection using two methods, a random forest classifier for response to fluorouracil-based chemotherapy with 13 DEGs was constructed. The accuracy of the 13-gene classifier was 0.930 in the training set and 0.810 in the validation set. The receiver operating characteristic curve analysis revealed that the area under the curve was 1.000 in the training set and 0.873 in the validation set (P=0.227). The 13-gene-based classifier described in the current study may be used as a potential biomarker to predict the effects of fluorouracil-based chemotherapy in patients with CRC.
Keywords: colorectal cancer; differential expression genes; fluorouracil-based chemotherapy; random forest classifier
Year: 2019 PMID: 31186717 PMCID: PMC6507297 DOI: 10.3892/ol.2019.10159
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Figure 1. Significantly enriched GO annotation and enriched KEGG pathways of differentially expressed genes. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Figure 2. LASSO model and principal component analysis. (A) 10-fold cross-validation for tuning parameter selection in the LASSO model. (B) PCA prior to and (C) following LASSO variable reduction. LASSO, least absolute shrinkage and selection operator; PCA, principal component analysis.
Overview of the 31 optimal genes.
| Gene | Log2 fold change (responder/non-responder) | P-value | LASSO coefficient | Boruta decision |
|---|---|---|---|---|
| Matrix metallopeptidase 12 | −1.069 | 0.002 | −0.154 | Tentative |
| C-X-C motif chemokine ligand 11 | −1.016 | 0.015 | −0.184 | Rejected |
| Forkhead box P2 | 0.957 | 0.003 | 0.032 | Tentative |
| Small muscle protein X-linked | 0.766 | 0.003 | 0.575 | Confirmed |
| Pleckstrin homology like domain family A member 1 | −0.625 | 0.000 | −0.584 | Confirmed |
| Prostaglandin reductase 2 | −0.602 | 0.000 | −0.792 | Confirmed |
| Chitinase 1 | 0.569 | 0.002 | 0.976 | Confirmed |
| S100 calcium binding protein A2 | −0.541 | 0.039 | −0.091 | Rejected |
| Histone cluster 1 H2B family member c | 0.539 | 0.001 | 0.927 | Confirmed |
| RP1-74M1.3 | −0.515 | 0.005 | −0.023 | Tentative |
| Formin homology 2 domain containing 3 | 0.469 | 0.013 | 0.855 | Confirmed |
| RNA binding motif protein 3 | −0.451 | 0.001 | −0.555 | Tentative |
| Tubulin polymerization promoting protein family member 3 | 0.412 | 0.011 | 0.442 | Rejected |
| Cadherin related family member 2 | 0.411 | 0.047 | 0.913 | Tentative |
| OTUD6B antisense RNA 1 (head to head) | 0.387 | 0.012 | 0.651 | Confirmed |
| Teashirt zinc finger homeobox 1 | 0.384 | 0.004 | 0.347 | Tentative |
| Cholinergic receptor nicotinic | −0.365 | 0.000 | −3.574 | Confirmed |
| Stromal antigen 3-like 4 (pseudogene) | −0.364 | 0.005 | −0.554 | Rejected |
| RPA interacting protein | −0.343 | 0.000 | −0.825 | Confirmed |
| Leucine rich repeat neuronal 1 | −0.334 | 0.017 | −0.331 | Rejected |
| Heparan- | 0.334 | 0.006 | 1.374 | Rejected |
| MINDY lysine 48 deubiquitinase 3 | −0.320 | 0.001 | −0.253 | Tentative |
| THAP domain containing 5 | −0.308 | 0.016 | −0.432 | Rejected |
| DNA ligase 4 | 0.298 | 0.002 | 1.692 | Confirmed |
| Zinc finger protein 2 | −0.291 | 0.004 | −1.885 | Tentative |
| ASAP1 intronic transcript 2 | 0.289 | 0.004 | 0.045 | Confirmed |
| Small integral membrane protein 30 | −0.287 | 0.001 | −0.973 | Confirmed |
| c-Maf inducing protein | 0.282 | 0.001 | 0.208 | Confirmed |
| ADAMTS like 2 | 0.278 | 0.005 | 1.088 | Tentative |
| Nucleoporin 133 | −0.273 | 0.011 | −1.718 | Tentative |
| DEAD-box helicase 28 | −0.267 | 0.003 | −0.063 | Tentative |
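The Boruta column can be tallied to see how the 31 candidate genes split by decision. The following is a minimal sketch: the decisions are copied from the table above in row order, while the variable names and the single-letter encoding are illustrative choices, not part of the study.

```python
from collections import Counter

# Boruta decisions copied from the table above, in row order.
# Encoding (illustrative): C = Confirmed, T = Tentative, R = Rejected.
ROW_DECISIONS = "T R T C C C C R C T C T R T C T C R C R R T R C T C C C T T T".split()

LABELS = {"C": "Confirmed", "T": "Tentative", "R": "Rejected"}

# Count how many genes fall into each Boruta decision class.
decision_counts = Counter(LABELS[code] for code in ROW_DECISIONS)

for decision, count in decision_counts.most_common():
    print(f"{decision}: {count}")
```

The number of genes marked Confirmed equals the size of the 13-gene classifier. This suggests, although the table itself does not state it, that the confirmed set is the one carried forward into the random forest model.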
Figure 3. Z score evolution during the Boruta run. Green lines correspond to confirmed attributes, yellow lines to tentative attributes and red lines to rejected attributes. Blue lines correspond to the minimal, average and maximal shadow attribute importance, respectively.
Performance of the 13-gene classifier.
| Cohort | Sensitivity | Specificity | Positive predictive value | Negative predictive value | Accuracy | Area under the curve |
|---|---|---|---|---|---|---|
| Training set | 0.970 | 0.960 | 0.910 | 0.960 | 0.930 | 1.000 |
| Validation set | 0.860 | 0.880 | 0.750 | 0.880 | 0.810 | 0.873 |
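The columns of the performance table are standard functions of a binary confusion matrix. The sketch below shows how each one is derived, treating responders as the positive class. The function name and the example counts are hypothetical and are not taken from the study, which does not report per-cohort confusion matrices here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the table's metrics from confusion-matrix counts.

    tp/fn: responders predicted as responders / non-responders.
    tn/fp: non-responders predicted as non-responders / responders.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not data from the study).
example = classification_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it always falls between those two values for a given cohort.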
Figure 4. Receiver operating characteristic curves for training and validation sets. AUC, area under the curve.
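The area under a receiver operating characteristic curve equals the probability that a randomly chosen responder receives a higher classifier score than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it that way. The function name and the example scores are hypothetical, and this is not necessarily how the authors computed their AUC values.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for responder, 0 for non-responder.
    scores: classifier scores, higher meaning more likely a responder.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for illustration only (not data from the study).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 1.000, as reported for the training set, corresponds to every responder being scored above every non-responder.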