Tingting Wang (1), Zhengxing Huang (2), Chenxi Gan (3)
(1) Second Affiliated Hospital, School of Medicine, Zhejiang University, 88 Jiefang Road, Hangzhou, China.
(2) College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhou Yiqin Building 512, Zheda Road 38#, Hangzhou, 310008 Zhejiang, China. Electronic address: zhengxinghuang@zju.edu.cn.
(3) College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhou Yiqin Building 512, Zheda Road 38#, Hangzhou, 310008 Zhejiang, China.
Abstract
BACKGROUND: Public, internet-based social media such as online healthcare-oriented chat groups provide a convenient channel for patients and other people concerned about health to communicate and share information with each other. The chat logs of such a group can potentially be used to extract latent topics, to encourage participation, and to recommend relevant healthcare information to users.
OBJECTIVE: This paper addresses the use of online healthcare chat logs to automatically discover both underlying topics and user interests.
METHOD: We present a new probabilistic model that exploits healthcare chat logs to find hidden topics and the changes in these topics over time. The proposed model uses separate but associated hidden variables to capture both topics and individual interests, so that it can give the participants of online healthcare chat groups useful insights into their interests in terms of weighted topics, or vice versa.
RESULTS: We evaluate the proposed model on a real-world chat log by comparing its performance on the topic extraction task to that of two benchmark topic models, latent Dirichlet allocation (LDA) and the Author Topic Model (ATM). The chat log was obtained from an online chat group of pregnant women and consists of 233,452 word tokens contributed by 118 users. We demonstrate both the detected individual interests and the underlying topics, together with how the topics progress over time. The results show that the performance of the proposed model exceeds that of the benchmark models.
CONCLUSION: The experimental results illustrate that the proposed model is a promising method for extracting healthcare knowledge from social media data.
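To make the LDA benchmark named in the RESULTS more concrete, the following is a minimal sketch of plain LDA fitted by collapsed Gibbs sampling. It is not the proposed model and not the authors' implementation. The function name `lda_gibbs`, the hyperparameter values, and the toy messages are all assumptions made for illustration. Treating each user's pooled messages as one document is likewise an assumed simplification; it lets the per-document topic weights be read as a rough per-user interest profile.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a baseline sketch only).

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    is the weight of topic k in document d and phi[k][v] is the weight of
    vocabulary word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k overall
    assign = []                                # current topic of every token

    # Start from a random topic for each token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other assignments:
                # (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(n + alpha) / (len(doc) + K * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(K)
    ]
    return theta, phi, vocab


# Hypothetical toy data: one token list per user (that user's messages pooled).
user_docs = [
    "nausea morning nausea ginger nausea vomiting".split(),
    "ultrasound scan week ultrasound heartbeat scan".split(),
    "nausea ginger scan ultrasound morning week".split(),
]
theta, phi, vocab = lda_gibbs(user_docs, num_topics=2)
for user, weights in enumerate(theta):
    print(user, [round(w, 2) for w in weights])
```

Each row of `theta` sums to 1 and can be read as one user's weighted topics; each row of `phi` sums to 1 over the vocabulary. This sketch has no notion of time, so it does not reproduce the topic changes over time that the proposed model tracks.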