Sean D Young, Wenchao Yu, Wei Wang.
Affiliations: *Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA; †University of California Institute for Prediction Technology, University of California, Los Angeles, CA; and ‡Department of Computer Science, University of California, Los Angeles, CA.
Abstract
INTRODUCTION: "Social big data" from technologies such as social media, wearable devices, and online searches continue to grow and can be used as tools for HIV research. Although researchers can uncover patterns and insights associated with HIV trends and transmission, the review process is time-consuming and resource-intensive. Machine learning methods derived from computer science might be used to assist HIV domain experts by learning how to rapidly and accurately identify patterns associated with HIV from a large set of social data.

METHODS: Using an existing social media data set that was associated with HIV and coded by an HIV domain expert, we tested whether 4 commonly used machine learning methods could learn the patterns associated with HIV risk behavior. We used the 10-fold cross-validation method to examine the speed and accuracy of these models in applying that knowledge to detect HIV content in social media data.

RESULTS AND DISCUSSION: Logistic regression and random forest resulted in the highest accuracy in detecting HIV-related social data (85.3%), whereas the ridge regression classifier resulted in the lowest accuracy. Logistic regression yielded the fastest processing time (16.98 seconds).

CONCLUSIONS: Machine learning can enable social big data to become a new and important tool in HIV research, helping to create a new field of "digital HIV epidemiology." If a domain expert can identify patterns in social data associated with HIV risk or HIV transmission, machine learning models could quickly and accurately learn those associations and identify potential HIV patterns in large social data sets.
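The evaluation described in METHODS (10-fold cross-validation, scored on accuracy and processing time) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the text features, the software, or the hyperparameters used, so the made-up posts in `toy_posts`, the bag-of-words features, the plain gradient-descent logistic regression, and the helper names are all hypothetical and are not the authors' implementation.

```python
import math
import random
import time
from collections import Counter


def toy_posts():
    """Build 40 made-up posts: label 1 = coded as risk-related, 0 = not (hypothetical data)."""
    risk_words = ["unprotected", "needle", "hookup", "drunk"]
    other_words = ["coffee", "homework", "traffic", "weather"]
    docs, labels = [], []
    for i in range(20):
        docs.append(f"tonight {risk_words[i % 4]} again")
        labels.append(1)
        docs.append(f"tonight {other_words[i % 4]} again")
        labels.append(0)
    return docs, labels


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_logistic(docs, labels, epochs=50, lr=0.1):
    """Fit bag-of-words logistic regression with stochastic gradient descent (assumed settings)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            counts = Counter(doc.lower().split())
            z = bias + sum(weights.get(t, 0.0) * c for t, c in counts.items())
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe numeric range
            err = y - 1.0 / (1.0 + math.exp(-z))
            bias += lr * err
            for t, c in counts.items():
                weights[t] = weights.get(t, 0.0) + lr * err * c
    return weights, bias


def predict(model, doc):
    """Return 1 if the model's score for the post is positive, else 0."""
    weights, bias = model
    counts = Counter(doc.lower().split())
    z = bias + sum(weights.get(t, 0.0) * c for t, c in counts.items())
    return 1 if z > 0 else 0


def cross_validate(docs, labels, k=10):
    """Train on k-1 folds, score the held-out fold, and return (accuracy, seconds)."""
    folds = k_fold_indices(len(docs), k)
    start = time.perf_counter()
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(docs)) if i not in held]
        model = train_logistic([docs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, docs[i]) == labels[i] for i in held_out)
    return correct / len(docs), time.perf_counter() - start


if __name__ == "__main__":
    posts, coded = toy_posts()
    accuracy, seconds = cross_validate(posts, coded, k=10)
    print(f"10-fold accuracy: {accuracy:.3f}, time: {seconds:.2f} s")
```

Each post is held out exactly once, so the reported accuracy is pooled over predictions on data the model did not see during training. Comparing several methods, as the study does, would amount to running the same folds with a different training function and recording accuracy and elapsed time for each.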