Marco Viviani, Cristina Crocamo, Matteo Mazzola, Francesco Bartoli, Giuseppe Carrà, Gabriella Pasi.
Abstract
In recent years we have witnessed a growing interest in the analysis of social media data from different perspectives, since these online platforms have become the preferred tool for generating and sharing content among users organized into virtual communities based on their common interests, needs, and perceptions. In the current study, by considering a collection of textual content related to COVID-19 gathered on the Twitter microblogging platform between August and December 2020, we aimed to evaluate the possible effects of some critical pandemic-related factors on the mental well-being of the population. In particular, we aimed to investigate potential lexicon identifiers of vulnerability to psychological distress in digital social interactions with respect to distinct COVID-related scenarios that could be "at risk" from the point of view of psychological discomfort. Such scenarios have been associated with specific topics discussed on Twitter. For this purpose, two approaches based on a "top-down" and a "bottom-up" strategy were adopted. In the top-down approach, three potential scenarios were initially selected by medical experts and associated with topics extracted from the Twitter dataset in a hybrid unsupervised-supervised way. In the bottom-up approach, three topics were extracted in a totally unsupervised way from a Twitter dataset filtered according to the presence of keywords related to vulnerability to psychological distress, and then associated with at-risk scenarios. The identification of such scenarios with both approaches made it possible to capture and analyze the potential psychological vulnerability in critical situations.
Keywords: Mental health; Psychological distress; Sentiment analysis; Social media; Social network analysis; Vulnerability
Year: 2021; PMID: 34934256; PMCID: PMC8678930; DOI: 10.1016/j.future.2021.06.044
Source DB: PubMed; Journal: Future Gener Comput Syst; ISSN: 0167-739X; Impact factor: 7.187
Fig. 1. The pipeline of the two approaches followed in this work.
Examples of topic modeling with LDA applied to the random dataset, for a number of topics equal to 3 and 7.
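| Run | Topic | Keywords |
|---|---|---|
| 3 topics | 1 | |
| 3 topics | 2 | |
| 3 topics | 3 | |
| 7 topics | 1 | |
| 7 topics | 2 | |
| 7 topics | 3 | Trump, |
| 7 topics | 4 | Positive, |
| 7 topics | 5 | Lockdown, |
| 7 topics | 6 | |
| 7 topics | 7 | Relief, |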
Fig. 2. Topic coherence values over the 50 runs performed in the top-down approach.
Fig. 3. Interface of the tool used to interpret the topics extracted by choosing a variable number of topics. In the figure, the number of extracted topics is 9, and the selected topic is the one labeled 6.
Keywords extracted through the application of topic modeling with 9 topics and association with the respective target scenarios.
| Target | Keywords |
|---|---|
| Social distancing | Distance, distancing, family, holiday, home, lockdown(s), mask, quarantine, restrictions, safe, smartworking, social, spread, stay, wear, wearing. |
| Vaccines & vaccinations | Asymptomatic, billion, biotech, drug(s), fear, government, herd, immunity, money, negative, paid, pfizer, positive, rich, test(s), tested, testing, trial, vaccination(s), vaccine(s). |
| Symptoms & hospitalization | Bed(s), cancer, care, doctor(s), hospital(s), nurse(s), patient(s), room, strain, symptom(s), treatment(s). |
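
The keyword-based filtering that links tweets to the three target scenarios can be illustrated with a minimal sketch. It assumes that a tweet belongs to a scenario when it contains at least one of that scenario's keywords; the matching rule, the tokenizer, the function name `assign_scenarios`, and the abbreviated keyword sets are illustrative assumptions, not the authors' implementation.

```python
import re

# Abbreviated keyword sets drawn from the keyword table above. Pairing each
# set with a scenario name is an assumption made for illustration.
SCENARIO_KEYWORDS = {
    "social distancing": {"distance", "distancing", "lockdown", "mask", "quarantine"},
    "vaccines & vaccinations": {"vaccine", "vaccines", "vaccination", "pfizer", "immunity"},
    "symptoms & hospitalization": {"hospital", "hospitals", "nurse", "patient", "symptoms"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_scenarios(tweet):
    """Return the sorted names of all scenarios with a keyword in the tweet."""
    tokens = tokenize(tweet)
    return sorted(name for name, kws in SCENARIO_KEYWORDS.items() if tokens & kws)


# Hypothetical example tweets, not taken from the dataset.
tweets = [
    "Wear a mask and keep your distance",
    "Got my Pfizer vaccine today",
    "Nothing to see here",
]
for tweet in tweets:
    print(assign_scenarios(tweet), "<-", tweet)
```

A tweet can match several scenarios under this rule, which is one reason the per-scenario datasets need not be disjoint in such a scheme.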
Characteristics of the datasets and conversation graphs associated with the three target scenarios after the filtering phase on the basis of the keywords extracted by means of topic modeling and the intervention of experts.
| Datasets | Social distancing | Vaccines & vaccinations | Symptoms & hospitalization |
|---|---|---|---|
| # tweets | 3,347,407 | 944,108 | 632,200 |
| # nodes | 3,900,140 | 3,518,257 | 1,047,215 |
| # edges | 8,398,646 | 867,068 | 1,922,023 |
| Graph degree | 4.3068 | 4.4893 | 3.6707 |
| CC nodes | 3,509,001 | 1,445,218 | 909,089 |
| CC edges | 8,146,111 | 3,441,020 | 1,832,946 |
| CC degree | 4.6430 | 4.7619 | 4.0325 |
| CC2 nodes | 212,012 | 102,497 | 49,685 |
| CC2 edges | 390,702 | 183,200 | 76,012 |
| CC2 degree | 3.686 | 3.575 | 3.060 |
Characteristics of the vulnerability dataset and the associated conversation graph.
| Vulnerability dataset | |
|---|---|
| # tweets | 1,000,000 |
| # nodes | 677,286 |
| # edges | 1,013,258 |
| Graph degree | 2.9921 |
| CC nodes | 579,469 |
| CC edges | 949,646 |
| CC degree | 3.2776 |
| CC2 nodes | 221,498 |
| CC2 edges | 591,675 |
| CC2 degree | 5.3425 |
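
The degree values in the table above are consistent with the average degree of an undirected graph, 2|E|/|V|. This section does not state the formula, so the sketch below treats it as an assumption and only checks it against the reported node and edge counts; the function name `average_degree` is illustrative.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: 2 * |E| / |V| (assumed formula)."""
    return 2 * num_edges / num_nodes


# (nodes, edges, reported degree) copied from the vulnerability dataset table.
rows = {
    "Graph": (677_286, 1_013_258, 2.9921),
    "CC": (579_469, 949_646, 3.2776),
    "CC2": (221_498, 591_675, 5.3425),
}

for name, (nodes, edges, reported) in rows.items():
    computed = average_degree(nodes, edges)
    print(f"{name}: computed {computed:.4f}, reported {reported}")
```

Under this assumption, the computed values agree with the reported ones to four decimal places.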
Fig. 4. Topic coherence values over the 50 runs performed in the bottom-up approach.
Fig. 5. Top-15 terms for the topics 1, 12, and 27, with a number of extracted topics equal to 40.
Fig. 6. Frequencies of the top-5 terms that appear in the target scenario social distancing with respect to the four LIWC categories.
Fig. 7. Frequencies of the top-5 terms that appear in the target scenario vaccines & vaccinations with respect to the four LIWC categories.
Fig. 8. Frequencies of the top-5 terms that appear in the target scenario symptoms & hospitalization with respect to the four LIWC categories.
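
Per-category term frequencies of the kind plotted in these figures can be computed with a simple counting pass. The sketch below is only an illustration: the LIWC dictionary is proprietary and the four categories are not named in this section, so the category names, the word lists, and the function name `top_terms_per_category` are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon. The category names and word lists are
# placeholders, not the LIWC dictionary or the categories used in the study.
LEXICON = {
    "category_a": {"afraid", "worried", "fear"},
    "category_b": {"sad", "alone", "cry"},
}


def top_terms_per_category(tweets, lexicon, k=5):
    """Count lexicon terms per category and return the k most frequent ones."""
    counts = {category: Counter() for category in lexicon}
    for tweet in tweets:
        for token in re.findall(r"[a-z]+", tweet.lower()):
            for category, words in lexicon.items():
                if token in words:
                    counts[category][token] += 1
    return {category: counter.most_common(k) for category, counter in counts.items()}


# Hypothetical example tweets, not taken from the dataset.
example = ["I am afraid and worried", "so worried and sad", "fear fear alone"]
for category, terms in top_terms_per_category(example, LEXICON).items():
    print(category, terms)
```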
Global vulnerability scores for the three (top-down) target scenarios.
| Target scenarios | | |
|---|---|---|
| 0.0038 | 0.0026 | 0.0037 |
Fig. 9. Vulnerability scores per category with respect to the three (top-down) target scenarios.
Sentiment analysis values obtained by VADER on the three considered (top-down) target scenarios.
| Target | Social distancing | Vaccines & vaccinations | Symptoms & hospitalization |
|---|---|---|---|
| | 39% | 46% | 40% |
| | 20% | 15% | 12% |
| | 41% | 39% | 48% |
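
Class percentages like those reported for VADER are typically derived by thresholding each tweet's compound score. This section does not say how scores were mapped to classes, so the sketch below assumes the thresholds suggested in VADER's documentation (compound >= 0.05 positive, compound <= -0.05 negative, neutral otherwise). The function names and the example scores are hypothetical.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment class."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def class_percentages(compound_scores):
    """Return the rounded percentage of scores falling into each class."""
    classes = ("positive", "neutral", "negative")
    total = len(compound_scores)
    if total == 0:
        return {name: 0 for name in classes}
    labels = Counter(label_from_compound(score) for score in compound_scores)
    return {name: round(100 * labels[name] / total) for name in classes}


# Hypothetical compound scores, not values from the study.
scores = [0.6, -0.4, 0.0, 0.05, -0.05, 0.02, -0.7, 0.3, -0.1, 0.01]
print(class_percentages(scores))
```

Because each percentage is rounded independently, the three values of a column may not sum to exactly 100.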
Sentiment analysis values obtained by employing CT-BERT on the three considered (top-down) target scenarios.
| Target | Social distancing | Vaccines & vaccinations | Symptoms & hospitalization |
|---|---|---|---|
| | 10% | 8% | 4% |
| | 40% | 35% | 46% |
| | 60% | 57% | 50% |
Fig. 10. Frequencies of the top-5 terms that appear in the (bottom-up) target scenario social distancing and protection with respect to the four LIWC categories.
Fig. 11. Frequencies of the top-5 terms that appear in the (bottom-up) target scenario tests & hospitalization with respect to the four LIWC categories.
Fig. 12. Frequencies of the top-5 terms that appear in the (bottom-up) target scenario politics with respect to the four LIWC categories.
Global vulnerability scores for the three (bottom-up) target scenarios.
| Target scenarios | | |
|---|---|---|
| 0.0011 | 0.0007 | 0.0005 |
Fig. 13. Vulnerability scores per category with respect to the three (bottom-up) target scenarios.
Sentiment analysis values obtained by VADER on the three considered (bottom-up) target scenarios.
| Target | Social distancing and protection | Tests & hospitalization | Politics |
|---|---|---|---|
| | 20% | 7% | 64% |
| | 2% | 2% | 2% |
| | 78% | 89% | 34% |
Sentiment analysis values obtained by employing CT-BERT on the three considered (bottom-up) target scenarios.
| Target | Social distancing and protection | Tests & hospitalization | Politics |
|---|---|---|---|
| | 8% | 7% | 9% |
| | 23% | 23% | 26% |
| | 69% | 70% | 65% |