Hierarchical Information Criterion for Variable Abstraction.

Mark Mirtchouk, Bharat Srikishan, Samantha Kleinberg.

Abstract

Large biomedical datasets can contain thousands of variables, creating challenges for machine learning tasks such as causal inference and prediction. Feature selection and ranking methods have been developed to reduce the number of variables and determine which are most important. However, in many cases, such as classification from diagnosis codes, ontologies, and controlled vocabularies, we must choose not only which variables to include but also at what level of granularity. ICD-9 codes, for example, are arranged in a hierarchy, and a user must decide at what level codes should be analyzed. It is thus up to the researcher to decide whether to use any diagnosis of diabetes or to distinguish between specific forms, such as Type 2 diabetes with renal complications versus without mention of complications. No existing method makes this determination automatically, and feature selection methods do not exploit this hierarchical information, which is also found in other areas, including nutrition (hierarchies of foods) and bioinformatics (hierarchical relationships among genes). To address this, we propose a novel Hierarchical Information Criterion (HIC) that builds on mutual information and allows fully automated abstraction of variables. HIC lets us rank hierarchical features and select those with the highest scores. We show that this significantly improves performance, with an average AUROC gain of 0.053 over traditional feature selection methods and hand-crafted features on two mortality prediction tasks using MIMIC-III ICU data. Our method also improves on the state of the art (Fu et al., 2019), increasing AUROC from 0.819 to 0.887.
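
To make the granularity question concrete, the following is a minimal sketch of scoring diagnosis codes at several levels of a hierarchy by their mutual information with an outcome. It is not the HIC itself, whose definition is not given in this abstract; it only illustrates the mutual-information ranking that HIC is said to build on. The helper names (`mutual_information`, `ancestors`, `rank_hierarchical_features`), the prefix-based roll-up of ICD-9 codes, and the four-patient toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def ancestors(code):
    """Assumed roll-up: every prefix of 3 or more characters is a hierarchy node."""
    stripped = code.replace(".", "")
    return [stripped[:k] for k in range(3, len(stripped) + 1)]


def rank_hierarchical_features(patients, outcomes):
    """Score each node at every level by MI with the outcome, highest first."""
    nodes = sorted({a for codes in patients for c in codes for a in ancestors(c)})
    scores = {}
    for node in nodes:
        # Binary indicator: does the patient have any code under this node?
        present = [int(any(node in ancestors(c) for c in codes)) for codes in patients]
        scores[node] = mutual_information(present, outcomes)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (invented): each patient is a list of ICD-9 codes; outcome is binary.
patients = [["250.40"], ["250.00"], ["401.9"], ["401.9"]]
outcomes = [1, 1, 0, 0]

ranked = rank_hierarchical_features(patients, outcomes)
for node, score in ranked:
    print(f"{node}: {score:.3f} bits")
```

In this toy example the general node `250` (any diabetes diagnosis) separates the outcomes perfectly, while the more specific node `2504` carries less information, so a ranking of this kind would favor the coarser level. With real data the preferred level can differ from one branch of the hierarchy to another, which is the choice the abstract proposes to automate.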

Year:  2021        PMID: 35072086      PMCID: PMC8782429     

Source DB:  PubMed          Journal:  Proc Mach Learn Res


  14 in total

1.  A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts.

Authors:  Rimma Pivovarov; Noémie Elhadad
Journal:  J Biomed Inform       Date:  2012-01-25       Impact factor: 6.317

2.  Using mutual information for selecting features in supervised neural net learning.

Authors:  R Battiti
Journal:  IEEE Trans Neural Netw       Date:  1994

3.  Comorbidity measures for use with administrative data.

Authors:  A Elixhauser; C Steiner; D R Harris; R M Coffey
Journal:  Med Care       Date:  1998-01       Impact factor: 2.983

4.  GRAM: Graph-based Attention Model for Healthcare Representation Learning.

Authors:  Edward Choi; Mohammad Taha Bahadori; Le Song; Walter F Stewart; Jimeng Sun
Journal:  KDD       Date:  2017-08

5.  DDL: Deep Dictionary Learning for Predictive Phenotyping.

Authors:  Tianfan Fu; Trong Nghia Hoang; Cao Xiao; Jimeng Sun
Journal:  IJCAI (U S)       Date:  2019-08

6.  Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data.

Authors:  Hude Quan; Vijaya Sundararajan; Patricia Halfon; Andrew Fong; Bernard Burnand; Jean-Christophe Luthi; L Duncan Saunders; Cynthia A Beck; Thomas E Feasby; William A Ghali
Journal:  Med Care       Date:  2005-11       Impact factor: 2.983

7.  Benchmarking relief-based feature selection methods for bioinformatics data mining.

Authors:  Ryan J Urbanowicz; Randal S Olson; Peter Schmitt; Melissa Meeker; Jason H Moore
Journal:  J Biomed Inform       Date:  2018-07-17       Impact factor: 6.317

8.  A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data.

Authors:  Bjoern H Menze; B Michael Kelm; Ralf Masuch; Uwe Himmelreich; Peter Bachert; Wolfgang Petrich; Fred A Hamprecht
Journal:  BMC Bioinformatics       Date:  2009-07-10       Impact factor: 3.169

9.  Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions.

Authors:  Casey S Greene; Nadia M Penrod; Jeff Kiralis; Jason H Moore
Journal:  BioData Min       Date:  2009-09-22       Impact factor: 2.522

10.  MIMIC-III, a freely accessible critical care database.

Authors:  Alistair E W Johnson; Tom J Pollard; Lu Shen; Li-Wei H Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G Mark
Journal:  Sci Data       Date:  2016-05-24       Impact factor: 6.444
