Zhijun Yin (1), Lina M Sulieman (1), Bradley A Malin (1, 2, 3)
1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
2. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA.
Abstract
OBJECTIVE: User-generated content (UGC) in online environments provides opportunities to learn about an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations.

MATERIALS AND METHODS: We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL Anthology. We focused on research articles published in English in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review.

RESULTS: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support.

CONCLUSIONS: The systematic review indicated that ML can be effectively applied to UGC to facilitate the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and improving model interpretability.