Chad A Melton1,2, Brianna M White2, Robert L Davis2, Robert A Bednarczyk3, Arash Shaban-Nejad2.
Abstract
BACKGROUND: The emergence of the novel coronavirus (COVID-19) and the necessary separation of populations have led to an unprecedented number of new social media users seeking information related to the pandemic. With an estimated 4.5 billion social media users worldwide, data from these platforms offer an opportunity for near real-time analysis of large bodies of text related to disease outbreaks and vaccination. These analyses can be used by officials to develop appropriate public health messaging, digital interventions, educational materials, and policies.
Keywords: COVID-19; DistilRoBERTa; Reddit; Twitter; content analysis; infodemiology; information quality; misinformation; natural language processing; public health; sentiment analysis; social media; surveillance; vaccination; vaccine
Year: 2022 PMID: 36174192 PMCID: PMC9578521 DOI: 10.2196/40408
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
DistilRoBERTa fine-tuning training metrics. The model obtained optimal fine-tuning after 2 training epochs.
| Step | Epoch | Training loss | Validation loss | Precision | Accuracy | |
| --- | --- | --- | --- | --- | --- | --- |
| 500 | 0.4 | 0.5903 | 0.4695 | 0.7342 | 0.7728 | 0.7890 |
| 1000 | 0.8 | 0.3986 | 0.3469 | 0.8144 | 0.8596 | 0.8684 |
| 1500 | 1.2 | 0.2366 | 0.1939 | 0.9313 | 0.9260 | 0.9253 |
| 2000 | 1.6 | 0.1476 | 0.1560 | 0.9207 | 0.9452 | 0.9465 |
| 2500 | 2.0 | 0.1284 | 0.1167 | 0.9561 | 0.9592 | 0.9592 |
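
The table's claim that fine-tuning was optimal after 2 epochs can be checked against the logged rows. The sketch below is illustrative only: it assumes the lowest validation loss is the selection criterion (the source does not state which criterion was used), and the helper names `best_checkpoint` and `steps_per_epoch` are hypothetical.

```python
# Rows copied from the table above: (step, epoch, training_loss, validation_loss).
rows = [
    (500, 0.4, 0.5903, 0.4695),
    (1000, 0.8, 0.3986, 0.3469),
    (1500, 1.2, 0.2366, 0.1939),
    (2000, 1.6, 0.1476, 0.1560),
    (2500, 2.0, 0.1284, 0.1167),
]


def best_checkpoint(rows):
    """Return the logged row with the lowest validation loss (assumed criterion)."""
    return min(rows, key=lambda r: r[3])


def steps_per_epoch(rows):
    """Estimate steps per epoch from the last logged row (step / epoch)."""
    step, epoch, _, _ = rows[-1]
    return round(step / epoch)


best = best_checkpoint(rows)
print(best[0], best[1])       # 2500 2.0
print(steps_per_epoch(rows))  # 1250
```

Under this assumption, the step-2500 row (epoch 2.0) is selected, which agrees with the caption, and the step-to-epoch ratio implies roughly 1250 steps per epoch.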
Figure 1. Tweet polarity from the DistilRoBERTa model fine-tuned to COVID-19 vaccine. Polarity and the corresponding confidence probability are represented on the y-axis, and time is represented on the x-axis. Tweets are represented as light blue circles. Circle size indicates the number of likes per tweet: larger circles indicate more likes and smaller circles indicate fewer likes.
Figure 2. Confidence score versus like count for Twitter. The x-axis represents the confidence score and the y-axis represents the number of likes a tweet received. Data points below 0.00 on the x-axis represent a negative classification, and data points above 0.00 represent a positive classification. Data points are represented as light blue circles.
Figure 3. Reddit comment polarity from the DistilRoBERTa model fine-tuned to COVID-19 vaccine. Polarity and corresponding confidence probability are represented on the y-axis, and time is represented on the x-axis. Data points are represented as orange-red circles. Circle size indicates the number of upvotes per comment: more upvotes are represented by larger circles and fewer upvotes are represented by smaller circles.
Figure 4. Confidence score versus like count for Reddit. The x-axis represents the confidence score and the y-axis represents the number of upvotes a comment received. Data points below 0.00 on the x-axis represent a negative classification, and data points above 0.00 represent a positive classification. Data points are represented as orange-red circles.
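
The signed confidence axis described for Figures 2 and 4 can be read as a predicted label combined with its confidence probability. The following is a minimal sketch of one way such a score might be built, not the authors' code: the function name `signed_score` and the label strings `"positive"` and `"negative"` are assumptions.

```python
def signed_score(label: str, confidence: float) -> float:
    """Map a (label, confidence) pair onto a single signed axis.

    Assumption: negative classifications are plotted below 0.00 and
    positive classifications above 0.00, as the captions describe.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    normalized = label.lower()
    if normalized == "positive":
        return confidence
    if normalized == "negative":
        return -confidence
    raise ValueError(f"unexpected label: {label!r}")


print(signed_score("positive", 0.9))   # 0.9
print(signed_score("NEGATIVE", 0.75))  # -0.75
```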
Figure 5. Monthly sentiment for Twitter and Reddit COVID-19 vaccine–related posts. The x-axis represents time and the y-axis represents the percentage of posts classified as positive. The blue line represents Twitter sentiment and the orange-red line represents Reddit sentiment. Note that since posting frequency was very low, sentiment for January 2020 is an average of all other months of corresponding data.
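
The monthly aggregation behind Figure 5 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the input format (a date paired with a `"positive"` or `"negative"` label) and the function names `monthly_percent_positive` and `fill_with_mean_of_others` are hypothetical. The second helper mirrors the caption's note that a sparse month is replaced by the average of all other months.

```python
from collections import defaultdict
from datetime import date


def monthly_percent_positive(posts):
    """Return {(year, month): percentage of posts labeled positive}.

    posts: iterable of (date, label) pairs (assumed input format).
    """
    counts = defaultdict(lambda: [0, 0])  # [positive count, total count]
    for day, label in posts:
        key = (day.year, day.month)
        counts[key][1] += 1
        if label == "positive":
            counts[key][0] += 1
    return {k: 100.0 * pos / total for k, (pos, total) in sorted(counts.items())}


def fill_with_mean_of_others(monthly, sparse_month):
    """Replace one month's value with the mean of all other months."""
    others = [v for k, v in monthly.items() if k != sparse_month]
    filled = dict(monthly)
    filled[sparse_month] = sum(others) / len(others)
    return filled


# Toy data for illustration only; not taken from the study.
posts = [
    (date(2020, 1, 3), "negative"),
    (date(2020, 2, 1), "positive"),
    (date(2020, 2, 5), "negative"),
    (date(2020, 3, 1), "positive"),
]
monthly = monthly_percent_positive(posts)
print(monthly)  # {(2020, 1): 0.0, (2020, 2): 50.0, (2020, 3): 100.0}
print(fill_with_mean_of_others(monthly, (2020, 1))[(2020, 1)])  # 75.0
```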