Meng-Xiang Li1,2, Xiao-Meng Sun2,3, Wei-Gang Cheng4, Hao-Jie Ruan2, Ke Liu1,2, Pan Chen2, Hai-Jun Xu2, She-Gan Gao2, Xiao-Shan Feng5,6, Yi-Jun Qi7. 1. School of Information Engineering of Henan University of Science and Technology, 263 Kaiyuan Road, Luolong Qu, Luoyang, 471023, P. R. China. 2. Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment; Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital, College of Clinical Medicine, Medical College of Henan University of Science and Technology, 24 Jinghua Road, Jianxi Qu, Luoyang, 471003, P. R. China. 3. The Sixth People's Hospital of Luoyang, Oncology Department, 14 Xiyuan Road, Jianxi Qu, Luoyang, 471003, P. R. China. 4. Department of Thyroid and Breast Cancer Surgery, The First Affiliated Hospital, College of Clinical Medicine, Medical College of Henan University of Science and Technology, 24 Jinghua Road, Jianxi Qu, Luoyang, 471003, P. R. China. 5. School of Information Engineering of Henan University of Science and Technology, 263 Kaiyuan Road, Luolong Qu, Luoyang, 471023, P. R. China. samfeng137@hotmail.com. 6. Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment; Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital, College of Clinical Medicine, Medical College of Henan University of Science and Technology, 24 Jinghua Road, Jianxi Qu, Luoyang, 471003, P. R. China. samfeng137@hotmail.com. 7. Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment; Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital, College of Clinical Medicine, Medical College of Henan University of Science and Technology, 24 Jinghua Road, Jianxi Qu, Luoyang, 471003, P. R. China. qiyijun@haust.edu.cn.
Abstract
BACKGROUND: Many of the prognostic biomarkers reported to date for esophageal squamous cell carcinoma (ESCC) suffer from low reproducibility because of the high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal biomarkers for ESCC using machine learning algorithms.
METHODS: Biomarkers related to clinical survival, recurrence or therapeutic response of patients with ESCC were identified through literature database searches. Forty-eight biomarkers linked to recurrence or prognosis of ESCC were used to construct a molecular interaction network with NetBox and then to identify its functional modules. Publicly available mRNA transcriptome data of ESCC (GSE53625 and TCGA-ESCC) were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). Five machine learning algorithms, namely logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF) and XGBoost, were used to develop prognostic classifiers for feature selection. The area under the ROC curve (AUC) was used to evaluate the performance of the prognostic classifiers. The importance of each identified molecule was ranked by its occurrence frequency in the prognostic classifiers. Kaplan-Meier survival analysis and the log-rank test were performed to determine the statistical significance of differences in overall survival.
RESULTS: A total of 48 clinically proven molecules associated with ESCC progression were used to construct a molecular interaction network with 3 functional modules comprising 17 component molecules. For each machine learning algorithm, 131,071 prognostic classifiers were built using these 17 molecules. When the 17 molecules were ranked by their occurrence frequencies in the prognostic classifiers whose AUCs exceeded the mean of all 131,071 AUCs, stratifin, encoded by SFN, was identified as the optimal prognostic biomarker for ESCC. Its performance was further validated in 2 additional independent cohorts.
CONCLUSION: The occurrence frequencies across various feature selection approaches reflect the degree of clinical importance, and stratifin is an optimal prognostic biomarker for ESCC.
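The figure of 131,071 classifiers equals 2^17 - 1, the number of non-empty subsets of 17 molecules, which suggests that every combination of the 17 molecules was evaluated. The following is a minimal sketch of the occurrence-frequency ranking under that assumption, not the authors' implementation. The gene names other than SFN, the `toy_auc` scorer and its weights are hypothetical placeholders; in practice the scorer would train one of the five classifiers on the given subset and return its AUC.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def nonempty_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def rank_by_occurrence(features, score_subset):
    """Rank features by how often they occur in above-average classifiers.

    score_subset is any callable that maps a tuple of features to an AUC.
    """
    scored = [(subset, score_subset(subset)) for subset in nonempty_subsets(features)]
    threshold = mean(auc for _, auc in scored)  # mean AUC over all classifiers
    counts = Counter()
    for subset, auc in scored:
        if auc > threshold:
            counts.update(subset)  # one occurrence per feature in the subset
    ranking = sorted(features, key=lambda f: counts[f], reverse=True)
    return ranking, counts, len(scored)


def toy_auc(subset):
    """Hypothetical stand-in for a trained classifier's AUC (assumption)."""
    weights = {"SFN": 0.30, "GENE_B": 0.10, "GENE_C": 0.05, "GENE_D": 0.00}
    return 0.5 + sum(weights[g] for g in subset) / len(subset)


# Four placeholder genes keep the demo small; 17 genes give 2**17 - 1 subsets.
genes = ["SFN", "GENE_B", "GENE_C", "GENE_D"]
ranking, counts, n_classifiers = rank_by_occurrence(genes, toy_auc)
print(n_classifiers, ranking[0], dict(counts))
```

With 17 molecules and five algorithms, the same loop would score 131,071 subsets per algorithm, so a real run would likely need cached or parallelised model fitting.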