Jina Huh (1), Meliha Yetisgen-Yildiz, Wanda Pratt.
(1) Department of Telecommunication, Information Studies, and Media, Michigan State University, 404 Wilson Rd, Rm 409, East Lansing, MI 48864, USA. Electronic address: jinahuh@msu.edu.
Abstract
OBJECTIVES: Patients increasingly visit online health communities to get help with managing their health. The large scale of these communities makes it impossible for moderators to engage in every conversation; yet some conversations need their expertise. Our work applies low-cost text classification methods to a new domain: determining whether a thread in an online health forum needs moderators' help.
METHODS: We employed a binary classifier on WebMD's online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ² statistics, and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard.
RESULTS: Using sentiment analysis features, feature selection methods, and balanced training data raised the AUC to as much as 0.75 and the F1-score to as much as 0.54, compared with the baseline of word unigrams with no feature selection on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients' posts.
DISCUSSION: We showed how feature selection methods and balanced training data can improve overall classification performance. We present implications of weighing precision against recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members' needs. We also note challenges in producing a gold standard and discuss potential solutions for addressing them.
CONCLUSION: Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators' expertise in these large-scale social media environments.
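The two preprocessing steps named in METHODS, χ² feature selection over word unigrams and undersampling of the majority class, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name any tooling, so the function names, the whitespace tokenizer, the presence/absence form of the χ² statistic, and the toy threads below are all assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(threads, labels, seed=0):
    """Randomly drop majority-class threads until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [threads[i] for i in keep], [labels[i] for i in keep]


def chi2_scores(threads, labels):
    """Chi-square score of each unigram's presence/absence against the binary label."""
    n = len(threads)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Assumption: lowercase whitespace tokenization; one vote per thread per word.
    docs = [set(text.lower().split()) for text in threads]
    df_all = Counter(w for doc in docs for w in doc)
    df_pos = Counter(w for doc, y in zip(docs, labels) if y == 1 for w in doc)
    scores = {}
    for word, total in df_all.items():
        n11 = df_pos[word]      # positive threads containing the word
        n10 = total - n11       # negative threads containing the word
        n01 = n_pos - n11       # positive threads without the word
        n00 = n_neg - n10       # negative threads without the word
        denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
        scores[word] = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
    return scores


def top_k_features(scores, k):
    """Return the k highest-scoring unigrams, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical toy data: label 1 = thread needs a moderator, 0 = it does not.
threads = [
    "my sugar is dangerously high help",
    "thanks for the recipe",
    "great recipe thanks",
    "is this dose dangerously wrong",
    "nice walk today",
    "thanks all",
]
labels = [1, 0, 0, 1, 0, 0]

balanced_threads, balanced_labels = undersample(threads, labels)
scores = chi2_scores(threads, labels)
selected = top_k_features(scores, 3)
print(balanced_labels, selected)
```

In a real pipeline the selected unigrams would be combined with the sentiment and thread-length features and passed to a classifier, and undersampling would be applied to the training split only so that evaluation reflects the original class balance.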