
Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models.

Catherine H Feng, Mary L Disis, Chao Cheng, Lanjing Zhang.

Abstract

Colorectal cancer (CRC) is one of the most common cancers worldwide and a leading cause of cancer deaths. Better classification of multicategory outcomes of CRC using clinical and omic data may help adjust treatment regimens based on an individual's risk. Here, we selected the features that were useful for classifying the four-category survival outcome of CRC using either the clinical and transcriptomic data, or the clinical, transcriptomic, microsatellite instability and selected oncogenic-driver data (all data) of TCGA. We also optimized multimetric feature selection to develop the best multinomial logistic regression (MLR) and random forest (RF) models, i.e., those with the highest accuracy, precision, recall and F1 score. We identified 2073 differentially expressed genes in the TCGA RNASeq dataset. Overall, MLR outperformed RF in the multimetric feature selection. In both RF and MLR models, precision, recall and F1 score increased with the number of features and peaked at 600-1000 features, while the models' accuracy remained stable. The best model was the MLR model with 825 features, based on the sum of squared coefficients and using all data; it attained the best accuracy of 0.855, F1 of 0.738 and precision of 0.832, which were higher than those obtained using clinical and transcriptomic data. The top-ranked features in the best-performing MLR model using clinical and transcriptomic data differed from those in the best-performing model using all data. However, pathologic staging, HBS1L, TSPYL4, and TP53TG3B were among the top-20 ranked features in the best models using either clinical and transcriptomic data or all data. Thus, we developed a multimetric feature-selection based MLR model that outperformed RF models in classifying the four-category outcome of CRC patients. Interestingly, adding microsatellite instability and oncogenic-driver data to clinical and transcriptomic data improved the models' performance.
Precision and recall of tuned algorithms may change significantly as the number of features changes, but accuracy appears insensitive to these changes.
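
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "sum of squared coefficients" means summing each feature's squared MLR coefficient across the outcome classes, and that precision, recall and F1 are macro-averaged over the four classes; the abstract states neither detail. The function names, feature names and all numbers are hypothetical toy values, not data from the study.

```python
def rank_by_sum_sq_coef(coefs, names):
    """Rank features by the sum of their squared coefficients.

    coefs: one row per outcome class, one column per feature
           (assumed layout of a fitted multinomial model's coefficients).
    """
    scores = [sum(row[j] ** 2 for row in coefs) for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    return [names[j] for j in order]


def multiclass_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy coefficient matrix: 4 outcome classes x 3 hypothetical features.
toy_names = ["stage", "gene_a", "gene_b"]
toy_coefs = [
    [1.0, 0.1, -0.5],
    [-2.0, 0.2, 0.5],
    [0.5, -0.1, 0.0],
    [0.5, -0.2, 0.0],
]
ranking = rank_by_sum_sq_coef(toy_coefs, toy_names)

# Two toy prediction sets on an imbalanced four-class outcome.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
pred_majority = [0] * 10                        # always predicts the largest class
pred_spread = [0, 0, 0, 1, 2, 3, 1, 0, 2, 3]    # finds some minority cases
m_majority = multiclass_metrics(y_true, pred_majority)
m_spread = multiclass_metrics(y_true, pred_spread)
```

In this toy case both prediction sets classify 6 of 10 samples correctly, so accuracy is identical, while macro recall differs because only the second set recovers any minority-class samples. This illustrates, under the stated assumptions, why accuracy can stay flat while precision and recall move.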
© 2021. The Author(s), under exclusive licence to United States and Canadian Academy of Pathology.

Year:  2021        PMID: 34537824     DOI: 10.1038/s41374-021-00662-x

Source DB:  PubMed          Journal:  Lab Invest        ISSN: 0023-6837            Impact factor:   5.662


