Alexander Bogdanowicz¹, ChengHe Guan¹,².
Abstract
In an effort to gauge the global pandemic's impact on social thought and behavior, it is important to answer two questions: (1) What topics are individuals and groups discussing in relation to the pandemic? (2) Are there noticeable topic trends, and if so, how do these topics change over time and in response to major events? In this paper, using a Sequential Latent Dirichlet Allocation model, we identified twelve of the most popular topics in a Twitter dataset collected in the United States between April 3rd and April 13th, 2020, and discussed their growth and changes over time. These topics were both robust, in that they covered specific domains rather than single events, and dynamic, in that they changed over time in response to rising trends in our dataset. They spanned politics, healthcare, community, and the economy, and they grew at the macro level over time while also exhibiting micro-level changes in topic composition. Our approach differed from prior work in both scale and scope: it studied emerging COVID-19 topics at a scale that few works have achieved. We contributed to the field at the intersection of urban studies and big data. While we are optimistic about the future, we also recognize that this is an unprecedented time that will have lasting impacts on individuals and society at large, affecting not only the economy and geopolitics but also human behavior and psychology. In more ways than one, therefore, this research only begins to scratch the surface of what will be a concerted research effort into the history and repercussions of COVID-19.
Year: 2022 PMID: 35622866 PMCID: PMC9140268 DOI: 10.1371/journal.pone.0268669
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
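A sequential (dynamic) topic model like the one described in the abstract consumes the corpus as consecutive time slices rather than as a single bag of documents. The minimal Python sketch below shows one way to arrange date-stamped tweets into that layout. It assumes each tweet is already preprocessed into a token list and that each slice covers one calendar day; the helper name `build_time_slices` and the toy tweets are hypothetical, not the authors' code. The returned per-slice counts have the shape expected by the `time_slice` argument of gensim's `LdaSeqModel` (a list of document counts that sums to the corpus size).

```python
from collections import defaultdict
from datetime import date


def build_time_slices(tweets):
    """Arrange (date, tokens) pairs into chronologically ordered daily slices.

    Hypothetical helper; one-day slices are an assumption.
    Returns (days, docs, counts): the sorted days, the documents reordered
    day by day, and how many documents fall in each day's slice.
    """
    by_day = defaultdict(list)
    for day, tokens in tweets:
        by_day[day].append(tokens)

    days = sorted(by_day)
    docs, counts = [], []
    for day in days:
        # Documents for slice i must be contiguous and in time order.
        docs.extend(by_day[day])
        counts.append(len(by_day[day]))
    return days, docs, counts


# Toy, already-tokenized tweets (made up for illustration).
tweets = [
    (date(2020, 4, 4), ["testing", "sites"]),
    (date(2020, 4, 3), ["stay", "home"]),
    (date(2020, 4, 4), ["stimulus", "check"]),
]
days, docs, counts = build_time_slices(tweets)
print(counts)
```

Sorting by date before concatenating matters: a dynamic topic model treats the first `counts[0]` documents as slice 0, the next `counts[1]` as slice 1, and so on, so an unsorted corpus would silently mix days.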
Recent literature on topic modeling: LDA advancements and variants.
| Author | Year | Variant | Description |
|---|---|---|---|
| Blei et al. | 2003 [ | LDA | Original Generative Model |
| Blei and Lafferty | 2006 [ | SeqLDA | First Dynamic Topic Model |
| Hoffman et al. | 2010 [ | OnlineLDA | First Scalable LDA Algorithm |
| Zhang and Sun | 2012 [ | MB-LDA | Feature Space Expanded to User Network |
| Huang et al. | 2012 [ | - | Clustering Feature Space prior to LDA |
| Yan and Zhao | 2013 [ | MB-LSA | Expanded Micro-Blog Feature-Set + LSA |
| Wang et al. | 2016 [ | SH-LDA | Updated Temporal & Hashtag-Graph-Based Topic Model |
| Xu et al. | 2016 [ | TUS-LDA | Joint Temporal and Emotional Probability Space LDA |
| Yao and Wang | 2020 [ | - | 3-Step Geo-Topic Generation and Tracking LDA |
| Du et al. | 2020 [ | MF-LDA | Analyzed the life cycle of "hot topics" with Dynamic LDA |
| Tan and Guan | 2021 [ | - | Recognized time and space frequency patterns |
Twitter API services & limitations.
| Service | Tweets/Month | Description |
|---|---|---|
| 30-Days Sandbox | 25k | Tweets from within the last 30 days only |
| Full Archive | 5k | Tweets from the full Twitter archive (since 2008) |
| Standard Stream | Rate-limited | Streams live Tweets from the last 14 days (excessive requests can trigger rate limits) |
Fig 1. Dataset preprocessing stages.
Fig 2. Distribution of preprocessed Tweets.
Fig 3. Original LDA representation.
Fig 4. Original DTM representation.
Fig 5. Static LDA coherence scores for varied numbers of topics.
Cluster & model configurations.
| Cluster Configuration | |
|---|---|
| Nodes | 2 |
| Cores/Node | 16 |
| Memory/Node | 32 GB |
| Partition | Parallel |

| Model Configuration | |
|---|---|
| Dataset Passes | 5 |
| Model Update | Every 1k tweets |
| Coherence Scoring | UMass |
| Training Time | 34 hours |
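The configuration above lists UMass as the scoring measure. UMass coherence (Mimno et al., 2011) rates a topic's top words by how often they co-occur in the same documents of the training corpus: for top words w_1, ..., w_M ordered by decreasing probability, it sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs l < m, where D counts the documents containing the given word(s). The pure-Python sketch below implements that sum directly. The function name and the toy documents are illustrative assumptions, and library implementations (for example gensim's `CoherenceModel` with `coherence='u_mass'`) may smooth and average the pairwise terms differently, so absolute values are not directly comparable across tools.

```python
import math


def umass_coherence(top_words, documents):
    """Summed UMass coherence of one topic (higher means more coherent).

    top_words: the topic's top words, ordered by decreasing probability.
    documents: token lists; only document-level co-occurrence is used.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # D(...): number of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for j in range(1, len(top_words)):
        for i in range(j):
            w_j, w_i = top_words[j], top_words[i]
            d_i = doc_freq(w_i)
            if d_i == 0:
                raise ValueError(f"top word {w_i!r} never occurs in the corpus")
            # The +1 keeps log() finite when the pair never co-occurs.
            score += math.log((doc_freq(w_j, w_i) + 1) / d_i)
    return score


# Toy corpus (hypothetical, not from the paper's dataset).
docs = [
    ["mask", "test", "virus"],
    ["mask", "virus"],
    ["stimulus", "check"],
    ["test", "virus"],
]
coherent = umass_coherence(["virus", "mask", "test"], docs)
incoherent = umass_coherence(["virus", "stimulus"], docs)
print(coherent, incoherent)
```

On this toy corpus the words that always appear together score 0.0, while the unrelated pair scores log(1/3), so comparing such scores across candidate models (as in Fig 5) favors the topics whose top words genuinely co-occur.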
Fig 6. Time-series representation of the changes in topic dominance over time.
Topic word representations for April 13th and custom labels.
| Topic # | Words | Label | Topic Size |
|---|---|---|---|
| 46 | | Status Updates | 52% |
| 55 | | US Politics | 5% |
| 23 | | Infection & Testing | 5% |
| 52 | | Reports | 4% |
| 1 | | Personal Finances | 2% |
| 20 | | Social Distancing | 1.5% |
| 17 | | Healthcare | 1.5% |
| 39 | | American Response | 1.5% |
| 21 | | Positive Response | 1% |
| 35 | | Medical Resources | 1% |
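The table does not state how Topic Size was computed. One common reading, assumed here rather than taken from the paper, is the share of tweets whose single most probable topic is that topic. A minimal sketch under that assumption follows; the function name and the per-tweet probabilities are made up for illustration (only the topic ids are borrowed from the table). Applying the same function to each day's tweets separately would be one plausible way to build a topic-dominance time series like the one in Fig 6.

```python
from collections import Counter


def dominant_topic_shares(doc_topic_dists):
    """Fraction of documents whose most probable topic is each topic id.

    doc_topic_dists: one dict per document mapping topic id -> probability.
    Ties go to whichever tied topic appears first in the dict.
    """
    if not doc_topic_dists:
        return {}
    dominant = Counter(max(dist, key=dist.get) for dist in doc_topic_dists)
    total = len(doc_topic_dists)
    return {topic: count / total for topic, count in dominant.most_common()}


# Toy per-tweet topic distributions (made-up probabilities).
dists = [
    {46: 0.7, 55: 0.3},
    {46: 0.6, 23: 0.4},
    {55: 0.8, 46: 0.2},
    {23: 0.9, 46: 0.1},
]
shares = dominant_topic_shares(dists)
print(shares)
```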
Fig 7. Changes in topic-word probabilities over time.
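Fig 7 tracks how word probabilities within a topic shift over time. One simple way to surface such shifts, sketched here as an assumption rather than the authors' procedure, is to compare a topic's word distribution at two time slices and rank words by absolute change. The input format (one word-to-probability dict per slice, possibly covering only the top words), the function name `biggest_word_shifts`, and the toy probabilities are all hypothetical.

```python
def biggest_word_shifts(topic_word_by_slice, start, end, k=3):
    """Words whose probability in one topic changed most between two slices.

    topic_word_by_slice: list indexed by time slice; each entry maps
    word -> p(word | topic) at that slice (may hold only the top words).
    Returns up to k (word, delta) pairs, largest absolute change first.
    """
    before = topic_word_by_slice[start]
    after = topic_word_by_slice[end]
    vocab = set(before) | set(after)
    # A word missing from a slice is treated as having probability 0 there.
    deltas = [(w, after.get(w, 0.0) - before.get(w, 0.0)) for w in vocab]
    # Sort by magnitude of change, breaking ties alphabetically for stable output.
    deltas.sort(key=lambda pair: (-abs(pair[1]), pair[0]))
    return deltas[:k]


# Toy distributions for one topic on two days (made-up values).
slices = [
    {"test": 0.5, "mask": 0.125, "case": 0.25},
    {"test": 0.25, "mask": 0.5, "case": 0.125},
]
shifts = biggest_word_shifts(slices, start=0, end=1, k=2)
print(shifts)
```

Positive deltas mark words gaining weight in the topic and negative deltas mark words losing it, which is the kind of micro-level change in topic composition the abstract describes.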
Fig 8. Custom hierarchical coloring.
Fig 9. t-SNE clustering visualized.
Fig 10. t-SNE sample clustering visualized.