Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

Iman Kamkar, Sunil Kumar Gupta, Dinh Phung, Svetha Venkatesh.

Abstract

Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually organized in a tree (e.g. the ICD-10 tree), and codes within a tree branch may be highly correlated. The codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test) consider each variable independently and usually produce a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular because they select features jointly. However, Lasso is known to select one feature arbitrarily from a group of correlated features. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we address this problem using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with that of other feature selection methods. Using one synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, across different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than that of other methods.
Our results have implications for identifying stable risk factors in many healthcare problems and can therefore potentially assist clinical decision making for accurate medical prognosis.
Copyright © 2014 Elsevier Inc. All rights reserved.
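
The instability the abstract attributes to Lasso on correlated features can be illustrated with a small sketch. The abstract does not say which stability index the authors use, so this example assumes the mean pairwise Jaccard similarity of selected-feature sets across bootstrap resamples. The synthetic data (blocks of correlated columns standing in for codes in the same tree branch), the plain coordinate-descent Lasso, and all function names are hypothetical choices made for illustration. It does not implement Tree-Lasso and is not the authors' experimental setup.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the partial residual (feature j excluded)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def jaccard(a, b):
    """Jaccard similarity of two feature-index collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_data(rng, n=200, n_groups=5, group_size=4, rho=0.95):
    """Assumed toy data: each group of columns shares one latent factor."""
    latent = rng.standard_normal((n, n_groups))
    noise = rng.standard_normal((n, n_groups * group_size))
    X = np.sqrt(rho) * np.repeat(latent, group_size, axis=1) + np.sqrt(1 - rho) * noise
    # only the first two groups carry signal
    y = latent[:, 0] - latent[:, 1] + 0.5 * rng.standard_normal(n)
    return X, y


def selection_stability(X, y, lam, n_boot=20, seed=0):
    """Mean pairwise Jaccard similarity of Lasso supports over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    supports = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        Xb = X[idx] - X[idx].mean(axis=0)
        yb = y[idx] - y[idx].mean()
        supports.append(np.flatnonzero(lasso_cd(Xb, yb, lam)))
    scores = [
        jaccard(supports[i], supports[j])
        for i in range(n_boot)
        for j in range(i + 1, n_boot)
    ]
    return float(np.mean(scores)), supports


if __name__ == "__main__":
    X, y = make_data(np.random.default_rng(42))
    score, supports = selection_stability(X, y, lam=0.1)
    print(f"mean pairwise Jaccard over bootstraps: {score:.2f}")
    print("first three selected sets:", [s.tolist() for s in supports[:3]])
```

A score near 1 means the resamples agree on which features are selected, and a score near 0 means they largely disagree. Comparing the printed supports across resamples shows whether the picks within a correlated group change from one resample to the next.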

Keywords:  Classification; Feature selection; Feature stability; Lasso; Tree-Lasso

Year: 2014    PMID: 25500636    DOI: 10.1016/j.jbi.2014.11.013

Source DB: PubMed    Journal: J Biomed Inform    ISSN: 1532-0464    Impact factor: 6.317

