Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.

Ari Z Klein, Arjun Magge, Karen O'Connor, Jesus Ivan Flores Amaro, Davy Weissenbacher, Graciela Gonzalez Hernandez.

Abstract

BACKGROUND: In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone.
OBJECTIVE: The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in the United States that are not based on testing and, thus, may not have been reported to the Centers for Disease Control and Prevention.
METHODS: Beginning January 23, 2020, we collected English tweets from the Twitter Streaming application programming interface that mention keywords related to COVID-19. We applied handwritten regular expressions to identify tweets indicating that the user potentially has been exposed to COVID-19. We automatically filtered out "reported speech" (eg, quotations, news headlines) from the tweets that matched the regular expressions, and two annotators annotated a random sample of 8976 tweets that are geo-tagged or have profile location metadata, distinguishing tweets that self-report potential cases of COVID-19 from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on bidirectional encoder representations from transformers (BERT). Finally, we deployed the automatic pipeline on more than 85 million unlabeled tweets that were continuously collected between March 1 and August 21, 2020.
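The matching and filtering steps described in METHODS can be illustrated with a minimal Python sketch. The study's handwritten regular expressions and its reported-speech filter are not given in this record, so both patterns below (`EXPOSURE_RE`, `REPORTED_RE`) and the helper `is_candidate` are hypothetical stand-ins that show only the general shape of such a stage, not the authors' implementation.

```python
import re

# Hypothetical pattern (not the study's): a first-person word followed,
# within a short window, by a COVID-19 keyword.
EXPOSURE_RE = re.compile(
    r"\b(?:i|my|we)\b.{0,60}\b(?:covid|coronavirus)",
    re.IGNORECASE,
)

# Hypothetical markers of "reported speech": retweets, quoted spans,
# and links (which often accompany news headlines).
REPORTED_RE = re.compile(r'^\s*RT\b|["“”].+["“”]|https?://', re.IGNORECASE)


def is_candidate(text: str) -> bool:
    """Keep a tweet only if it matches the exposure pattern and does not
    look like reported speech."""
    return bool(EXPOSURE_RE.search(text)) and not REPORTED_RE.search(text)


tweets = [
    "I think I was exposed to COVID at work",
    '"I have coronavirus," says actor https://t.co/x',
    "RT @user: I might have covid",
    "Coronavirus cases rise in New York",
]
candidates = [t for t in tweets if is_candidate(t)]
```

In the pipeline described above, tweets that survive this kind of rule-based stage would then be passed to the annotators or to the trained classifier; the rules serve to narrow the stream, not to make the final decision.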
RESULTS: Interannotator agreement, based on dual annotations for 3644 (41%) of the 8976 tweets, was 0.77 (Cohen κ). A deep neural network classifier, based on a BERT model that was pretrained on tweets related to COVID-19, achieved an F1-score of 0.76 (precision=0.76, recall=0.76) for detecting tweets that self-report potential cases of COVID-19. Upon deploying our automatic pipeline, we identified 13,714 tweets that self-report potential cases of COVID-19 and have US state-level geolocations.
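For readers less familiar with the two statistics reported in RESULTS, the following Python sketch shows how an F1-score is derived from precision and recall and how Cohen κ is computed from two annotators' labels. The function names and the toy labels are illustrative assumptions; the study's annotation data are not reproduced here.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# The reported precision and recall (both 0.76) give the reported F1.
reported_f1 = f1_score(0.76, 0.76)

# Toy labels (hypothetical): 1 = self-report, 0 = not a self-report.
toy_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Because precision and recall are equal here, their harmonic mean is the same value, which is consistent with the reported F1-score of 0.76.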
CONCLUSIONS: We have made the 13,714 tweets identified in this study, along with each tweet's time stamp and US state-level geolocation, publicly available to download. This data set presents the opportunity for future work to assess the utility of Twitter data as a complementary resource for tracking the spread of COVID-19.
©Ari Z Klein, Arjun Magge, Karen O'Connor, Jesus Ivan Flores Amaro, Davy Weissenbacher, Graciela Gonzalez Hernandez. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 22.01.2021.

Keywords:  COVID-19; coronavirus; data mining; epidemiology; infodemiology; natural language processing; pandemics; social media

Year:  2021        PMID: 33449904      PMCID: PMC7834613          DOI: 10.2196/25314

Source DB:  PubMed          Journal:  J Med Internet Res        ISSN: 1438-8871            Impact factor:   5.428


  10 in total

1.  Understanding interobserver agreement: the kappa statistic.

Authors:  Anthony J Viera; Joanne M Garrett
Journal:  Fam Med       Date:  2005-05       Impact factor: 1.756

2.  Social Media and Emergency Preparedness in Response to Novel Coronavirus.

Authors:  Raina M Merchant; Nicole Lurie
Journal:  JAMA       Date:  2020-05-26       Impact factor: 56.272

3.  Mining twitter to explore the emergence of COVID-19 symptoms.

Authors:  Jia-Wen Guo; Christina L Radloff; Sarah E Wawrzynski; Kristin G Cloyes
Journal:  Public Health Nurs       Date:  2020-09-16       Impact factor: 1.462

4.  Tracking Mental Health and Symptom Mentions on Twitter During COVID-19.

Authors:  Sharath Chandra Guntuku; Garrick Sherman; Daniel C Stokes; Anish K Agarwal; Emily Seltzer; Raina M Merchant; Lyle H Ungar
Journal:  J Gen Intern Med       Date:  2020-07-07       Impact factor: 5.128

5.  Self-reported COVID-19 symptoms on Twitter: an analysis and a research resource.

Authors:  Abeed Sarker; Sahithi Lakamana; Whitney Hogg-Bremer; Angel Xie; Mohammed Ali Al-Garadi; Yuan-Chi Yang
Journal:  J Am Med Inform Assoc       Date:  2020-08-01       Impact factor: 4.497

6.  Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study.

Authors:  Tim Mackey; Vidya Purushothaman; Jiawei Li; Neal Shah; Matthew Nali; Cortni Bardier; Bryan Liang; Mingxiang Cai; Raphael Cuomo
Journal:  JMIR Public Health Surveill       Date:  2020-06-08

7.  Identification of Risk Factors and Symptoms of COVID-19: Analysis of Biomedical Literature and Social Media Data.

Authors:  Jouhyun Jeon; Gaurav Baruah; Sarah Sarabadani; Adam Palanica
Journal:  J Med Internet Res       Date:  2020-10-02       Impact factor: 5.428

8.  Real-time tracking of self-reported symptoms to predict potential COVID-19.

Authors:  Cristina Menni; Ana M Valdes; Claire J Steves; Tim D Spector; Maxim B Freidin; Carole H Sudre; Long H Nguyen; David A Drew; Sajaysurya Ganesh; Thomas Varsavsky; M Jorge Cardoso; Julia S El-Sayed Moustafa; Alessia Visconti; Pirro Hysi; Ruth C E Bowyer; Massimo Mangino; Mario Falchi; Jonathan Wolf; Sebastien Ourselin; Andrew T Chan
Journal:  Nat Med       Date:  2020-05-11       Impact factor: 53.440

9.  Predicting COVID-19 Incidence Using Anosmia and Other COVID-19 Symptomatology: Preliminary Analysis Using Google and Twitter.

Authors:  Bharat A Panuganti; Aria Jafari; Bridget MacDonald; Adam S DeConde
Journal:  Otolaryngol Head Neck Surg       Date:  2020-06-02       Impact factor: 3.497

10.  The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application.

Authors:  Stephen A Lauer; Kyra H Grantz; Qifang Bi; Forrest K Jones; Qulu Zheng; Hannah R Meredith; Andrew S Azman; Nicholas G Reich; Justin Lessler
Journal:  Ann Intern Med       Date:  2020-03-10       Impact factor: 25.391

  7 in total

1.  Using Twitter data to understand public perceptions of approved versus off-label use for COVID-19-related medications.

Authors:  Yining Hua; Hang Jiang; Shixu Lin; Jie Yang; Joseph M Plasek; David W Bates; Li Zhou
Journal:  J Am Med Inform Assoc       Date:  2022-09-12       Impact factor: 7.942

2.  A chronological and geographical analysis of personal reports of COVID-19 on Twitter from the UK.

Authors:  Su Golder; Ari Z Klein; Arjun Magge; Karen O'Connor; Haitao Cai; Davy Weissenbacher; Graciela Gonzalez-Hernandez
Journal:  Digit Health       Date:  2022-05-05

3.  SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning.

Authors:  Arjun Magge; Davy Weissenbacher; Karen O'Connor; Matthew Scotch; Graciela Gonzalez-Hernandez
Journal:  medRxiv       Date:  2022-03-21

4.  Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.

Authors:  John Caskey; Iain L McConnell; Madeline Oguss; Dmitriy Dligach; Rachel Kulikoff; Brittany Grogan; Crystal Gibson; Elizabeth Wimmer; Traci E DeSalvo; Edwin E Nyakoe-Nyasani; Matthew M Churpek; Majid Afshar
Journal:  JMIR Public Health Surveill       Date:  2022-03-08

5.  Identifying the Perceived Severity of Patient-Generated Telemedical Queries Regarding COVID: Developing and Evaluating a Transfer Learning-Based Solution.

Authors:  Joseph Gatto; Parker Seegmiller; Garrett Johnston; Sarah Masud Preum
Journal:  JMIR Med Inform       Date:  2022-09-02

6.  Perspectives of the COVID-19 Pandemic on Reddit: Comparative Natural Language Processing Study of the United States, the United Kingdom, Canada, and Australia.

Authors:  Mengke Hu; Mike Conway
Journal:  JMIR Infodemiology       Date:  2022-09-27

7.  Agenda-Setting for COVID-19: A Study of Large-Scale Economic News Coverage Using Natural Language Processing.

Authors:  Guang Lu; Martin Businger; Christian Dollfus; Thomas Wozniak; Matthes Fleck; Timo Heroth; Irina Lock; Janna Lipenkova
Journal:  Int J Data Sci Anal       Date:  2022-10-06
