A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases.

Christopher Kotfila, Özlem Uzuner.

Abstract

Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) using individual feature spaces as well as combinations of these feature spaces on two small training corpora (730 and 790 documents) and a combined (1520 documents) training corpus. We assess the importance of feature spaces and training data size on SVM model performance. We show that inclusion of semantically-informed features does not yield a statistically significant improvement in performance for these models. The addition of training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.
Copyright © 2015 Elsevier Inc. All rights reserved.
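
The abstract contrasts naive and lexically normalized feature spaces and their combinations. The sketch below is illustrative only and is not the authors' pipeline: the function names (`naive_features`, `normalized_features`, `combine`) and the sample note are hypothetical, and lowercasing plus punctuation stripping stands in for whatever normalization the paper actually used. Semantic features are omitted because they would require an external concept-mapping tool.

```python
import re
from collections import Counter


def naive_features(text):
    """Naive space: count whitespace-delimited tokens exactly as written."""
    return Counter(text.split())


def normalized_features(text):
    """Toy lexical normalization (an assumption): lowercase, drop punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def combine(**spaces):
    """Merge feature spaces, prefixing keys with the space name to avoid collisions."""
    merged = {}
    for name, feats in spaces.items():
        for key, value in feats.items():
            merged[f"{name}:{key}"] = value
    return merged


# Hypothetical one-line clinical note, for illustration only.
note = "Patient has Hypertension. Hypertension treated; diabetes denied."
naive = naive_features(note)
norm = normalized_features(note)
combined = combine(naive=naive, norm=norm)
```

In the naive space, "Hypertension." and "Hypertension" are distinct features, while the normalized space merges them into a single count. The combined dictionary could then be vectorized and passed to a linear SVM implementation of the reader's choice.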

Keywords:  Classification; Natural language processing; Phenotyping

Year:  2015        PMID: 26241355      PMCID: PMC4994187          DOI: 10.1016/j.jbi.2015.07.016

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


