Yunpeng Zhao¹, Pengfei Yin¹, Yongqiu Li¹, Xing He¹, Jingcheng Du², Cui Tao², Yi Guo¹, Mattia Prosperi¹, Pierangelo Veltri³, Xi Yang¹, Yonghui Wu¹, Jiang Bian¹.
Abstract
During the coronavirus disease 2019 (COVID-19) pandemic, social media platforms such as Twitter have become a venue for individuals, health professionals, and government agencies to share COVID-19 information. Twitter has been a popular data source for researchers, especially in public health studies. However, using Twitter data for research also has drawbacks and barriers. Biases can arise at every stage, from data collection methods to modeling approaches, and these biases have not been systematically assessed. In this study, we examined six different data collection methods and three different machine learning (ML) models commonly used in social media analysis, in order to assess data collection bias and to measure the ML models' sensitivity to it. We showed that (1) publicly available Twitter data collection endpoints, used with appropriate strategies, can collect data that is reasonably representative of the Twitter universe; and (2) careful examination of ML models' sensitivity to data collection bias is critical.
©2021 AMIA - All rights reserved.
Year: 2022 PMID: 35308985 PMCID: PMC8861742
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076