A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases.

Abstract

Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) on individual feature spaces and on combinations of these feature spaces, using two small training corpora (730 and 790 documents) and a combined training corpus (1520 documents). We assess the importance of feature space and training data size for SVM model performance. We show that including semantically informed features does not yield a statistically significant improvement in performance for these models. Adding training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.
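
To make the three kinds of feature space concrete, the following is a minimal sketch of how naive, lexically normalized, and semantic features might be built from one note and then combined. It is not the authors' pipeline: the abstract does not say how each space was constructed, so the tokenization rules, the `TOY_CONCEPTS` lexicon, the function names, and the example note are all hypothetical, and the SVM training step is omitted.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real semantic resource;
# the surface forms and concept labels are illustrative assumptions.
TOY_CONCEPTS = {
    "hypertension": "CONCEPT_HYPERTENSION",
    "htn": "CONCEPT_HYPERTENSION",
    "diabetes": "CONCEPT_DIABETES",
    "dm": "CONCEPT_DIABETES",
    "obese": "CONCEPT_OBESITY",
    "obesity": "CONCEPT_OBESITY",
}


def naive_features(text):
    """Naive space: whitespace-delimited tokens, kept exactly as written."""
    return Counter(text.split())


def normalized_features(text):
    """Lexically normalized space: lowercased, alphanumeric-only tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_features(text):
    """Semantic space: counts of concepts matched through the toy lexicon."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS)


def combined_features(text, extractors):
    """Combine spaces, prefixing each feature with the name of its space."""
    combined = Counter()
    for name, extract in extractors.items():
        for feature, count in extract(text).items():
            combined[f"{name}:{feature}"] += count
    return combined


# Hypothetical example note, not taken from any corpus.
note = "Patient has HTN and Hypertension. Obese, no DM."
spaces = {
    "naive": naive_features,
    "norm": normalized_features,
    "sem": semantic_features,
}
features = combined_features(note, spaces)
```

In this toy note, "HTN" and "Hypertension." are distinct features in the naive space but map to the same concept in the semantic space. Feature dictionaries of this shape could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`, although that pairing is a suggestion rather than something stated in the abstract.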

Keywords: Classification; Natural language processing; Phenotyping

Year: 2015 PMID: 26241355 PMCID: PMC4994187 DOI: 10.1016/j.jbi.2015.07.016

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
