An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages.

Suppawong Tuarob, Conrad S Tucker, Marcel Salathe, Nilam Ram.

Abstract

OBJECTIVES: The role of social media as a source of timely and massive information has become more apparent since the era of Web 2.0. Multiple studies have illustrated the use of information in social media to discover biomedical and health-related knowledge. Most methods proposed in the literature employ traditional document classification techniques that represent a document as a bag of words. These techniques work well when documents are rich in text and conform to standard English; however, they are not optimal for social media data, where sparsity and noise are the norm. This paper aims to address the limitations of traditional bag-of-words-based methods and proposes using heterogeneous features in combination with ensemble machine learning techniques to discover health-related information. This could prove useful to multiple biomedical applications, especially those that need to discover health-related knowledge in large-scale social media data. Furthermore, the proposed methodology could be generalized to discover different types of information in various kinds of textual data.
METHODOLOGY: Social media data is characterized by an abundance of short, socially oriented messages that do not conform to standard language, either grammatically or syntactically. The problem of discovering health-related knowledge in social media data streams is therefore cast as a text classification problem, in which a text is labeled positive if it is health-related and negative otherwise. We first identify the limitations of traditional methods that train classifiers on N-gram word features, and then propose to overcome these limitations through the collaboration of machine-learning-based classifiers, each trained to learn a semantically different aspect of the data. The parameter analysis used to tune each classifier is also reported.
DATA SETS: Three data sets are used in this research. The first data set comprises approximately 5000 hand-labeled tweets; it is used for cross validation of the classification models in the small scale experiment and for training the classifiers in the real-world large scale experiment. The second data set is a random sample of real-world Twitter data in the US. The third data set is a random sample of real-world Facebook Timeline posts.
EVALUATIONS: Two sets of evaluations, small scale and large scale, are conducted to investigate the proposed model's ability to discover health-related information in the social media domain. The small scale evaluation employs 10-fold cross validation on the labeled data; it aims to tune the parameters of the proposed models and to compare them with the state-of-the-art method. The large scale evaluation tests the trained classification models on the native, real-world data sets, and is needed to verify the ability of the proposed model to handle the massive heterogeneity of real-world social media.
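As a rough illustration of the methodology above, the following sketch trains one simple classifier per feature view and combines their outputs. It is not the authors' implementation: the abstract does not name the feature types, the base learners, or the combination rule, so the three views (word unigrams, word bigrams, character trigrams), the Naive Bayes learner, the majority vote, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter

# Three hypothetical feature views; the abstract does not specify which
# heterogeneous features the paper actually uses.
def word_unigrams(text):
    return text.lower().split()

def word_bigrams(text):
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

def char_trigrams(text):
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over one feature view."""

    def __init__(self, extract):
        self.extract = extract
        self.counts = {0: Counter(), 1: Counter()}  # feature counts per class
        self.docs = Counter()                       # document counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = self.extract(text)
            self.counts[label].update(feats)
            self.docs[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = self.extract(text)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in (0, 1):
            total = sum(self.counts[label].values())
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for f in feats:
                score += math.log(
                    (self.counts[label][f] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def ensemble_predict(classifiers, text):
    """Majority vote over the base classifiers (an assumed combination rule)."""
    votes = Counter(clf.predict(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy, made-up messages: 1 = health-related, 0 = not health-related.
train = [
    ("i have a bad headache and fever", 1),
    ("flu season got me coughing all night", 1),
    ("my doctor said the fever is gone", 1),
    ("watching the game with friends tonight", 0),
    ("new phone arrived today so happy", 0),
    ("traffic was terrible this morning", 0),
]
texts = [t for t, _ in train]
labels = [y for _, y in train]

classifiers = [
    NaiveBayes(view).fit(texts, labels)
    for view in (word_unigrams, word_bigrams, char_trigrams)
]

print(ensemble_predict(classifiers, "bad headache and fever again"))
print(ensemble_predict(classifiers, "watching the game tonight"))
```

Using an odd number of base classifiers avoids tied votes in the binary case; a weighted or stacked combination would be an equally plausible choice.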
FINDINGS: The small scale experiment reveals that the proposed method is able to mitigate the limitations of the well-established techniques in the literature, resulting in a performance improvement of 18.61% (F-measure). The large scale experiment further reveals that the baseline fails to perform well on larger data with higher degrees of heterogeneity, while the proposed method yields reasonably good performance and outperforms the baseline by 46.62% (F-measure) on average.
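For readers unfamiliar with the evaluation terms used above, the sketch below shows how a k-fold split and an F-measure can be computed. It assumes the balanced F1 score (harmonic mean of precision and recall) on the positive class; the abstract does not state which F-measure variant was used. The function names and the example labels are hypothetical.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def f_measure(y_true, y_pred):
    """Balanced F1 on the positive class (label 1); assumed variant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
score = f_measure([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
splits = list(k_fold_indices(20, k=10))
print(score, len(splits))
```

In a 10-fold setup, each item is held out for testing exactly once, and the per-fold F-measures are typically averaged.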
Copyright © 2014 Elsevier Inc. All rights reserved.

Keywords:  Classification; Machine learning; Social media

Year:  2014        PMID: 24642081     DOI: 10.1016/j.jbi.2014.03.005

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  20 in total

Review 1.  Utilizing social media data for pharmacovigilance: A review.

Authors:  Abeed Sarker; Rachel Ginn; Azadeh Nikfarjam; Karen O'Connor; Karen Smith; Swetha Jayaraman; Tejaswi Upadhaya; Graciela Gonzalez
Journal:  J Biomed Inform       Date:  2015-02-23       Impact factor: 6.317

2.  A Connectivity Framework for Social Information Systems Design in Healthcare.

Authors:  Craig E Kuziemsky; Pavel Andreev; Morad Benyoucef; Tracey O'Sullivan; Syam Jamaly
Journal:  AMIA Annu Symp Proc       Date:  2017-02-10

3.  Online health community experiences of sexual minority women with cancer.

Authors:  Young Ji Lee; Charles Kamen; Liz Margolies; Ulrike Boehmer
Journal:  J Am Med Inform Assoc       Date:  2019-08-01       Impact factor: 4.497

4.  Classification of Helpful Comments on Online Suicide Watch Forums.

Authors:  Ramakanth Kavuluru; Amanda G Williams; María Ramos-Morales; Laura Haye; Tara Holaday; Julie Cerel
Journal:  ACM BCB       Date:  2016-10

5.  Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

Authors:  Abeed Sarker; Graciela Gonzalez
Journal:  J Biomed Inform       Date:  2014-11-08       Impact factor: 6.317

6.  A systematic literature review of machine learning in online personal health data.

Authors:  Zhijun Yin; Lina M Sulieman; Bradley A Malin
Journal:  J Am Med Inform Assoc       Date:  2019-06-01       Impact factor: 4.497

7.  HARNESSING SOCIAL MEDIA FOR HEALTH INFORMATION MANAGEMENT.

Authors:  Lina Zhou; Dongsong Zhang; Chris Yang; Yu Wang
Journal:  Electron Commer Res Appl       Date:  2017-12-29       Impact factor: 6.014

8.  Symptom clusters in women with breast cancer: an analysis of data from social media and a research study.

Authors:  Sarah A Marshall; Christopher C Yang; Qing Ping; Mengnan Zhao; Nancy E Avis; Edward H Ip
Journal:  Qual Life Res       Date:  2015-10-17       Impact factor: 4.147

9.  Deep learning for pollen allergy surveillance from twitter in Australia.

Authors:  Jia Rong; Sandra Michalska; Sudha Subramani; Jiahua Du; Hua Wang
Journal:  BMC Med Inform Decis Mak       Date:  2019-11-08       Impact factor: 2.796

10.  How are you feeling?: A personalized methodology for predicting mental states from temporally observable physical and behavioral information.

Authors:  Suppawong Tuarob; Conrad S Tucker; Soundar Kumara; C Lee Giles; Aaron L Pincus; David E Conroy; Nilam Ram
Journal:  J Biomed Inform       Date:  2017-02-15       Impact factor: 6.317
