Predicting of anaphylaxis in big data EMR by exploring machine learning approaches.

Isabel Segura-Bedmar, Cristobal Colón-Ruíz, Miguel Ángel Tejedor-Alonso, Mar Moro-Moro.

Abstract

Anaphylaxis is a life-threatening allergic reaction that occurs suddenly after contact with an allergen. Epidemiological studies of anaphylaxis are important both for planning and evaluating new strategies to prevent this reaction and for guiding the treatment of patients who have just suffered an anaphylactic reaction. Electronic Medical Records (EMR) are one of the most effective and richest sources for the epidemiology of anaphylaxis, because they provide a low-cost way of accessing rich longitudinal data on large populations. However, researchers must manually review a huge amount of information, which is a very costly and highly time-consuming task. Therefore, our goal is to explore different machine learning techniques to process Big Data EMR, reducing the effort needed to perform epidemiological studies of anaphylaxis. In particular, we aim to study the incidence of anaphylaxis by automatically classifying EMR. To do this, we employ the most widely used and efficient classifiers in text classification and compare different document representations, ranging from well-known methods such as Bag of Words (BoW) to more recent ones based on word embedding models, such as a simple average of word embeddings or a bag of centroids of word embeddings. Because the identification of anaphylaxis cases in EMR is a class-imbalanced problem (fewer than 1% of the records describe anaphylaxis cases), we employ a novel undersampling technique based on clustering to balance our dataset. In addition to classical machine learning algorithms, we also use a Convolutional Neural Network (CNN) to classify our dataset. In general, experiments show that most classifiers and representations are effective (F1 above 90%). Logistic Regression, Linear SVM, Multilayer Perceptron and Random Forest achieve an F1 of around 95%; however, the linear methods have considerably lower training times. The CNN provides slightly better performance (F1 = 95.6%).
Copyright © 2018 Elsevier Inc. All rights reserved.
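The abstract compares three families of document representation: Bag of Words, a simple average of word embeddings, and a bag of centroids (cluster the embedding vocabulary, then count how many of a document's tokens fall into each cluster). The following is a minimal pure-Python sketch of those three ideas, not the authors' implementation. The toy 2-D embeddings, the vocabulary, the function names, the number of clusters and the deterministic k-means initialisation are all hypothetical; a real setup would use pretrained, high-dimensional embeddings and a library k-means.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means returning a cluster id per point.
    Deterministic init (first k points) keeps the toy example reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels


def bag_of_words(tokens: List[str], vocab: List[str]) -> List[int]:
    """BoW: one count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def average_embedding(tokens: List[str], emb: Dict[str, Vector]) -> Vector:
    """Mean of the embeddings of in-vocabulary tokens (zeros if none are known)."""
    known = [emb[t] for t in tokens if t in emb]
    dim = len(next(iter(emb.values())))
    return mean(known) if known else [0.0] * dim


def bag_of_centroids(tokens: List[str], word_cluster: Dict[str, int], k: int) -> List[int]:
    """Bag of centroids: count how many tokens fall into each embedding cluster."""
    hist = [0] * k
    for t in tokens:
        if t in word_cluster:
            hist[word_cluster[t]] += 1
    return hist


# Hypothetical 2-D embeddings; real ones would be pretrained and high-dimensional.
emb = {
    "anaphylaxis": [0.9, 0.1],
    "fracture": [0.1, 0.9],
    "shock": [0.8, 0.2],
    "knee": [0.2, 0.8],
    "epinephrine": [0.95, 0.05],
}
words = list(emb)
word_cluster = dict(zip(words, kmeans([emb[w] for w in words], k=2)))

doc = ["patient", "anaphylaxis", "shock", "epinephrine"]  # "patient" is out of vocabulary
print(bag_of_words(doc, words))                # → [1, 0, 1, 0, 1]
print(average_embedding(doc, emb))             # one dense 2-D vector
print(bag_of_centroids(doc, word_cluster, 2))  # → [3, 0]
```

The contrast the sketch is meant to show: BoW grows with the vocabulary, the averaged embedding is a fixed-size dense vector, and the bag of centroids is a fixed-size sparse histogram whose length is the chosen number of clusters.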
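The abstract states only that the dataset is balanced with a clustering-based undersampling technique, because fewer than 1% of the records are positive. One common variant, assumed here and not necessarily the authors' exact procedure, clusters the majority (non-anaphylaxis) class into as many clusters as there are minority examples and keeps, from each cluster, the real record nearest its centroid, giving a 1:1 ratio. This is similar in spirit to "hard voting" cluster-centroid undersampling. The feature vectors, the function `cluster_undersample` and the 1:1 target ratio are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means with deterministic init (first k points); returns labels and centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels, centroids


def cluster_undersample(majority: Sequence[Vector], n_keep: int) -> List[Vector]:
    """Shrink the majority class to at most n_keep records: cluster it into n_keep
    groups and keep, per non-empty group, the real record nearest the centroid."""
    labels, centroids = kmeans(majority, n_keep)
    kept = []
    for c in range(n_keep):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:
            kept.append(min(members, key=lambda p: sq_dist(p, centroids[c])))
    return kept


# Hypothetical 2-D feature vectors standing in for EMR documents.
majority = [[0.0, 0.0], [5.0, 5.0], [0.3, 0.0], [0.0, 0.3], [6.0, 5.0], [5.0, 6.0]]  # non-anaphylaxis
minority = [[2.0, 2.5], [2.5, 2.0]]  # anaphylaxis

kept = cluster_undersample(majority, n_keep=len(minority))
balanced = [(x, 1) for x in minority] + [(x, 0) for x in kept]
print(kept)  # → [[0.0, 0.0], [5.0, 5.0]]
```

Keeping a real record per cluster, rather than the synthetic centroid itself, means the balanced training set contains only genuine documents while still covering each region of the majority class.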

Keywords:  Anaphylaxis; Bag of centroids; Balancing strategies; EMR classification; Machine learning

Year:  2018        PMID: 30266231     DOI: 10.1016/j.jbi.2018.09.012

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  5 in total

1.  Advances in drug allergy, urticaria, angioedema, and anaphylaxis in 2018.

Authors:  Rachel L Miller; Maria Shtessel; Lacey B Robinson; Aleena Banerji
Journal:  J Allergy Clin Immunol       Date:  2019-06-24       Impact factor: 10.793

2.  FasTag: Automatic text classification of unstructured medical narratives.

Authors:  Guhan Ram Venkataraman; Arturo Lopez Pineda; Oliver J Bear Don't Walk Iv; Ashley M Zehnder; Sandeep Ayyar; Rodney L Page; Carlos D Bustamante; Manuel A Rivas
Journal:  PLoS One       Date:  2020-06-22       Impact factor: 3.240

3.  Cohort selection for clinical trials using deep learning models.

Authors:  Isabel Segura-Bedmar; Pablo Raez
Journal:  J Am Med Inform Assoc       Date:  2019-11-01       Impact factor: 4.497

4.  A novel surgical predictive model for Chinese Crohn's disease patients.

Authors:  Yuan Dong; Li Xu; Yihong Fan; Ping Xiang; Xuning Gao; Yong Chen; Wenyu Zhang; Qiongxiang Ge
Journal:  Medicine (Baltimore)       Date:  2019-11       Impact factor: 1.817

5.  A Predictive Model Based on Machine Learning for the Early Detection of Late-Onset Neonatal Sepsis: Development and Observational Study.

Authors:  Wongeun Song; Se Young Jung; Hyunyoung Baek; Chang Won Choi; Young Hwa Jung; Sooyoung Yoo
Journal:  JMIR Med Inform       Date:  2020-07-31