
Learning structured medical information from social media.

Abul Hasan, Mark Levene, David Weston.

Abstract

Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used, and the treatment effects, both positive and negative. To achieve this, we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that allows us to continuously train with the new data. To attain such incremental re-training, we develop a semi-supervised methodology that is capable of learning new concepts from a small set of labelled data together with a much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the baseline training algorithm for extracting medical concepts. The methodology iteratively augments the training set with sentences labelled with high confidence, and adds terms to existing dictionaries to be used as features with the baseline model for further classification. Our empirical results show that the baseline CRF performs strongly across a range of different dictionary and training sizes; when the baseline is built with the full training data, the F1 score reaches the range 84%-90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the baseline. We also discuss the significance of the potential improvement of the semi-supervised methodology and find that it is significantly more accurate in most cases than the underlying baseline model.
Crown Copyright © 2020. Published by Elsevier Inc. All rights reserved.
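
The iterative loop described in the abstract (train, tag unlabelled sentences, keep the high-confidence ones, grow the dictionary, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the CRF is replaced by a toy count-based tagger so the example runs without external libraries, and the dictionary is used as a hard lookup rather than as a CRF feature. The function names, the labels `DRUG` and `SYMPTOM`, the confidence scores, and the threshold of 0.8 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(labelled):
    """Toy stand-in for CRF training: count labels per token and per preceding token."""
    by_token, by_prev = defaultdict(Counter), defaultdict(Counter)
    for sentence in labelled:
        prev = "<s>"
        for token, label in sentence:
            by_token[token][label] += 1
            by_prev[prev][label] += 1
            prev = token
    return by_token, by_prev

def predict(model, tokens, dictionary):
    """Tag one sentence; the sentence confidence is its weakest token score."""
    by_token, by_prev = model
    tagged, confidence, prev = [], 1.0, "<s>"
    for token in tokens:
        if token in dictionary:                      # dictionary hit (assumed hard lookup)
            label, score = dictionary[token], 1.0
        elif token in by_token:                      # token seen in training data
            label, n = by_token[token].most_common(1)[0]
            score = n / sum(by_token[token].values())
        elif prev in by_prev:                        # unseen token: fall back on context
            label, n = by_prev[prev].most_common(1)[0]
            score = 0.9 * n / sum(by_prev[prev].values())
        else:                                        # nothing known: low-confidence guess
            label, score = "O", 0.5
        tagged.append((token, label))
        confidence = min(confidence, score)
        prev = token
    return tagged, confidence

def self_train(labelled, unlabelled, dictionary, threshold=0.8, max_rounds=5):
    """Self-training loop: add confident sentences and new terms, then retrain."""
    labelled, pool, dictionary = list(labelled), list(unlabelled), dict(dictionary)
    for _ in range(max_rounds):
        model = train(labelled)
        remaining, added = [], 0
        for tokens in pool:
            tagged, confidence = predict(model, tokens, dictionary)
            if confidence >= threshold:
                labelled.append(tagged)              # augment the training set
                for token, label in tagged:
                    if label != "O":
                        dictionary.setdefault(token, label)  # grow the dictionary
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:                               # nothing new learned: stop early
            break
    return train(labelled), dictionary, pool

# Hypothetical toy data, not taken from the paper.
seed = [
    [("i", "O"), ("took", "O"), ("ibuprofen", "DRUG")],
    [("i", "O"), ("have", "O"), ("nausea", "SYMPTOM")],
]
stream = [["i", "took", "naproxen"], ["naproxen", "caused", "nausea"]]
seed_dictionary = {"ibuprofen": "DRUG"}

model, new_dictionary, leftover = self_train(seed, stream, seed_dictionary)
```

In this toy run the first streamed sentence is accepted because "took" preceded a drug name in the seed data, so "naproxen" joins the dictionary; the second sentence stays in the pool because the tagger has no evidence for "caused". A real system would replace `train` and `predict` with a CRF and its per-sentence probabilities.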

Keywords:  Conditional random fields; Medical concept extraction; Pharmacovigilance; Semi-supervised algorithm; Social media mining

Year:  2020        PMID: 32942027     DOI: 10.1016/j.jbi.2020.103568

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  2 in total

1.  Monitoring COVID-19 on Social Media: Development of an End-to-End Natural Language Processing Pipeline Using a Novel Triage and Diagnosis Approach.

Authors:  Abul Hasan; Mark Levene; David Weston; Renate Fromson; Nicolas Koslover; Tamara Levene
Journal:  J Med Internet Res       Date:  2022-02-28       Impact factor: 7.076

2.  Longitudinal Changes of COVID-19 Symptoms in Social Media: Observational Study.

Authors:  Sarah Sarabadani; Gaurav Baruah; Yan Fossat; Jouhyun Jeon
Journal:  J Med Internet Res       Date:  2022-02-16       Impact factor: 5.428
