Fernando Arias, Ariel Guerra-Adames, Maytee Zambrano, Efraín Quintero-Guerra, Nathalia Tejedor-Flores.
Abstract
Over the past decade, growth in global connectivity and in the number of social media users has changed the way opinions and sentiments are shared. Platforms such as Twitter can act as public forums for expressing opinions on non-personal matters, but they often also serve as an outlet for individuals to share their feelings and personal thoughts. This becomes especially evident during times of crisis, such as massive civil disorder or a pandemic. This study proposes the estimation and analysis of sentiments expressed by Twitter users in the Republic of Panama during 2019 and 2020. The proposed workflow comprises the extraction, quantification, processing, and analysis of Spanish-language Twitter data based on sentiment analysis. This case study highlights the importance of developing natural language processing resources explicitly devised to support opinion mining applications in Latin American countries, where regionalisms can drastically change the lexicon of each country. A comparative analysis of popular machine learning algorithms demonstrated that a version of a distributed gradient boosting algorithm could infer the sentiment polarity of Spanish text in an accurate and time-effective manner. This algorithm was used to analyze over 20 million tweets produced in 2019 and 2020 by residents of the Republic of Panama, accurately capturing strong sentiment responses to events that occurred in the country over the two years covered by the analysis. The results highlight the potential of methodologies such as the one proposed here for transparent government monitoring of responses to public policies at a population scale.
Keywords: COVID-19; machine learning; natural language processing; public health; sentiment analysis; social media
Year: 2022 PMID: 36011965 PMCID: PMC9408347 DOI: 10.3390/ijerph191610328
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Weekly distribution of extracted tweets.
Figure 2. Example of the stages of tweet pre-processing.
Figure 3. Word clouds of the most frequent words used during the fourth week of 2019, (a) before and (b) after applying an immigrant bias filter.
Figure 4. Examples of tweets classified as positive, neutral, and negative, respectively.
Figure 5. Number of tweets extracted versus number of tweets used in 2019.
Figure 6. Number of tweets extracted versus number of tweets used in 2020.
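Figure 2 refers to stages of tweet pre-processing, but this record does not list them. The following is a minimal sketch, assuming a common cleaning sequence for Spanish-language tweets (lowercasing, removal of URLs and mentions, stripping of punctuation). The function name `preprocess_tweet` and the choice of steps are illustrative assumptions, not the authors' pipeline.

```python
import re
import unicodedata

# Hypothetical cleaning patterns; the study's actual stages are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def preprocess_tweet(text):
    """Return a lowercased tweet with URLs, mentions, and punctuation removed."""
    # Compose accented characters so that "á" is a single code point.
    text = unicodedata.normalize("NFC", text).lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep the hashtag word but drop the "#" symbol.
    text = text.replace("#", " ")
    # Remove remaining punctuation, including "¡" and "¿".
    text = NON_WORD_RE.sub(" ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())
```

Because `\w` matches Unicode letters in Python 3, accented characters and "ñ" survive the punctuation filter, which matters for Spanish text.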
Preliminary performance comparison between popular classification ML algorithms used for SA, with the most favorable values for each column shown in bold.
| Method | Accuracy | Precision | Recall | F1 | Time (ms) |
|---|---|---|---|---|---|
| | 0.4703 | 0.7943 | 0.4728 | 0.4106 | 10 |
| | 0.6282 | 0.6449 | 0.6271 | 0.6189 | 301 |
| | 0.7401 | 0.7944 | 0.7401 | 0.7373 | 1894 |
| | 0.7631 | 0.8041 | 0.7620 | 0.7594 | |
| | 0.7697 | 0.7852 | 0.7685 | 0.7670 | 833 |
| | | | | | 34 |
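The table reports accuracy, precision, recall, and F1 for each classifier. As a rough illustration of how such columns can be computed for a three-class sentiment task, the sketch below derives them from true and predicted labels. It assumes macro averaging across classes; the record does not state which averaging scheme the authors used, and the function name `macro_metrics` is hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights every class equally, so a classifier that neglects one sentiment class is penalized even when that class is rare. A support-weighted average would instead make recall equal to accuracy.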
Figure 7. Confusion matrices obtained from the evaluation of the (a) Gradient Boosting, (b) Stochastic Gradient Descent, (c) Support Vector Classifier, and (d) XGBoost algorithms.
Figure 8. Sentiment index values and word clouds generated from the most frequent words used each week, 2019.
Figure 9. Sentiment index values and word clouds generated from the most frequent words used each week, 2020.
Figure 10. Sentiment index in 2019, with and without accounting for the immigrant bias, as well as the immediate difference in sentiment.