Ming Yang (1), Melody Kiang (2), Wei Shang (3)
(1) Department of Information Management, School of Information, Central University of Finance and Economics, Beijing 100081, China. Electronic address: yangming@cufe.edu.cn.
(2) Department of Information Systems, California State University, Long Beach, CA 90840, United States. Electronic address: mkiang@csulb.edu.
(3) Academy of Mathematics and Systems Science, Beijing 100190, China. Electronic address: shangwei@amss.ac.cn.
Abstract
OBJECTIVES: Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems aim to detect ADRs early. With the popularity of social media, Web forums and discussion boards have become important places for consumers to share their drug use experiences and, as a result, may provide useful information on drugs and their adverse reactions. In this study, we propose an automated mechanism for filtering ADR-related posts using text classification methods. In real-life settings, ADR-related messages are highly distributed across social media, while non-ADR-related messages are unspecific and topically diverse. It is expensive to manually label large numbers of ADR-related messages (positive examples) and non-ADR-related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process.
METHODS: We propose a novel pharmacovigilance system that combines a Latent Dirichlet Allocation modeling module with a partially supervised classification approach. We selected drugs with more than 500 discussion threads and used an automatic Web spidering program to collect all original posts and comments on these drugs as the text corpus. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled, and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers.
RESULTS: Compared with alternative approaches using supervised learning methods and three general-purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp.
CONCLUSIONS: Our design provides satisfactory performance in identifying ADR-related posts for post-marketing drug surveillance. The overall design of our system also points to a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks.
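The pooled evaluation described in METHODS and the F measure named in RESULTS can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy post IDs, the cutoff `k`, and the choice to measure recall against the expert-confirmed posts in the pool are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def pool_top_ranked(rankings: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k post IDs from every classifier's ranking."""
    pool: Set[str] = set()
    for ranked_ids in rankings.values():
        pool.update(ranked_ids[:k])
    return pool


def precision_recall_f(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F measure (harmonic mean of the two)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_classifiers(
    rankings: Dict[str, List[str]], expert_positive: Set[str], k: int
) -> Dict[str, Tuple[float, float, float]]:
    """Score each classifier's top-k posts against the expert-labeled pool."""
    pool = pool_top_ranked(rankings, k)
    # Assumption: only pooled posts are reviewed, so recall is relative to the pool.
    relevant = expert_positive & pool
    return {
        name: precision_recall_f(set(ranked_ids[:k]), relevant)
        for name, ranked_ids in rankings.items()
    }


# Hypothetical toy data: two classifiers ranking post IDs by ADR relevance.
rankings = {
    "clf_a": ["p1", "p2", "p3", "p4"],
    "clf_b": ["p3", "p5", "p1", "p6"],
}
expert_positive = {"p1", "p3", "p5"}  # posts the expert judged ADR-related
scores = evaluate_classifiers(rankings, expert_positive, k=2)
for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because only pooled posts are reviewed, recall here is relative to the ADR-related posts that at least one classifier ranked highly, not to all ADR-related posts in the corpus.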