Klaifer Garcia, Lilian Berton.
Abstract
Twitter is a social media platform with more than 500 million users worldwide. It has become a tool for spreading news, discussing ideas, and commenting on world events. Twitter is also an important source of health-related information, given the amount of news, opinions, and information shared by both citizens and official sources. Identifying interesting and useful content in large text streams in different languages is a challenge, and few works have explored languages other than English. In this paper, we use topic identification and sentiment analysis to explore a large number of tweets from two countries with high numbers of COVID-19 infections and deaths, Brazil and the USA. We employ 3,332,565 tweets in English and 3,155,277 tweets in Portuguese to compare and discuss the effectiveness of topic identification and sentiment analysis in both languages. We ranked ten topics and analyzed the content discussed on Twitter over four months, providing an assessment of how the discourse evolved over time. The topics we identified were representative of what news outlets covered between April and August in both countries. We contribute to the study of the Portuguese language, to the analysis of sentiment trends over a long period and their relation to announced news, and to the comparison of human behavior in two different geographical locations affected by this pandemic. It is important to understand public reactions, information dissemination, and consensus building in all major forms, including social media, in different countries.
Keywords: COVID-19; English language; Portuguese language; Sentiment analysis; Topic detection; Twitter
Year: 2020 PMID: 33519326 PMCID: PMC7832522 DOI: 10.1016/j.asoc.2020.107057
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725
Fig. 1. Workflow: we access the Twitter API and collect tweets using two different queries, one for English and the other for Portuguese. The messages are processed separately in the natural language processing step and then compared in the analysis step.
Fig. 2. Detailed natural language processing step: the text of each tweet is pre-processed (removing links, emoji, HTML, and "@"); the tweets are then fed to topic modeling and sentiment analysis algorithms, which produce a set of topics and a polarity/emotion classification for each tweet, respectively.
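The pre-processing described in Fig. 2 could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the regular expressions, the emoji ranges, and the choice to drop the whole "@user" mention rather than only the "@" symbol are all assumptions.

```python
import html
import re

# All patterns below are illustrative assumptions, not the paper's exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")          # crude HTML tag matcher
MENTION_RE = re.compile(r"@\w*")         # "@" alone or "@username"
# Rough emoji coverage: pictographs, flags, misc symbols, dingbats,
# plus the variation selector and zero-width joiner used in emoji sequences.
EMOJI_RE = re.compile(
    "[\U0001F000-\U0001FAFF\U00002600-\U000027BF\U0000FE0F\U0000200D]"
)


def clean_tweet(text: str) -> str:
    """Remove links, HTML, mentions and emoji, then collapse whitespace."""
    text = html.unescape(text)           # "&amp;" becomes "&"
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_tweet("Stay home! 😷 @user https://t.co/abc <b>now</b> &amp; safe")
```

The cleaned text would then be passed to the topic modeling and sentiment analysis steps; a production pipeline would likely need a more complete emoji pattern and a real HTML parser.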
SemEval 2018, Task 1: emotion intensity regression (EI-reg). Values are Pearson correlations (r).
| Model | Anger | Fear | Joy | Sadness | Avg. |
|---|---|---|---|---|---|
| CrystalFeel - word embedding | 0.611 | 0.557 | 0.585 | 0.580 | 0.583 |
| CrystalFeel | 0.666 | ||||
| SBERT | 0.642 | 0.657 | 0.637 | 0.658 | |
| SBERT | 0.587 | 0.615 | 0.651 | 0.636 | 0.622 |
| SBERT | 0.642 | 0.653 | 0.574 | 0.638 | 0.626 |
| SBERT | 0.628 | 0.652 | 0.691 | 0.630 | 0.650 |
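
The scores in this table are Pearson correlations between predicted and gold emotion intensities. As a reminder of what that statistic measures, here is a small self-contained sketch; the function name `pearson_r` and the toy inputs are illustrative and are not taken from the paper or from the SemEval scoring scripts.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy example: made-up gold intensities and model predictions.
gold = [0.10, 0.40, 0.35, 0.80]
pred = [0.20, 0.35, 0.50, 0.70]
r = pearson_r(gold, pred)
```

In the rows where all five values survive, the Avg. column matches the arithmetic mean of the four per-emotion correlations.
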
SemEval 2018, task 1, Sentiment Scale Regression.
| Model | Pearson correlation (r) |
|---|---|
| CrystalFeel | |
| Unigram | 0.387 |
| Unigram | 0.496 |
| Unigram | 0.388 |
| Unigram | 0.585 |
| SBERT | 0.764 |
| SBERT | 0.794 |
| SBERT | 0.765 |
| SBERT | 0.761 |
This value was calculated by the event team.
Portuguese sentiment classification considering unigrams and bigrams as features.
| Classifier | Negative precision | Negative F1-score | Positive precision | Positive F1-score |
|---|---|---|---|---|
| Naive Bayes | 0.85 | 0.72 | 0.63 | |
| Logistic Regression | 0.81 | 0.77 | 0.66 | |
| Random Forest | 0.81 | 0.65 | ||
| Linear SVM | 0.81 | 0.77 | ||
| MLP | 0.82 | 0.83 | 0.66 | 0.65 |
| AdaBoost | 0.75 | 0.83 | 0.76 | 0.51 |
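
To make "unigrams and bigrams as features" concrete, the sketch below counts both for one short text. It is a simplified illustration: the whitespace tokenizer, the lowercasing, and the function name `ngram_counts` are assumptions, and the source does not say which tokenizer or weighting scheme was used.

```python
from collections import Counter


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default) in a text."""
    tokens = text.lower().split()        # naive whitespace tokenization
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# "Fique em casa" ("stay at home") yields three unigrams and two bigrams.
features = ngram_counts("Fique em casa")
```

Each distinct n-gram becomes one feature dimension, and a classifier such as logistic regression is then trained on the resulting count vectors.
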
Portuguese sentiment classification considering sentence embedding as features.
| Classifier | Negative precision | Negative F1-score | Positive precision | Positive F1-score |
|---|---|---|---|---|
| Logistic Regression | 0.73 | 0.81 | 0.65 | 0.44 |
| Random Forest | 0.75 | 0.83 | 0.73 | 0.51 |
| Linear SVM | 0.72 | 0.81 | 0.66 | 0.41 |
| Logistic Regression | 0.79 | 0.73 | 0.60 | |
| Random Forest | 0.76 | 0.77 | 0.54 | |
| Linear SVM | 0.78 | 0.74 | 0.49 | |
| Logistic Regression | 0.79 | 0.71 | 0.61 | |
| Random Forest | 0.77 | 0.57 | ||
| Linear SVM | 0.79 | 0.72 | 0.61 | |
| Logistic Regression | 0.71 | |||
| Random Forest | 0.78 | 0.72 | 0.59 | |
| Linear SVM | 0.71 | 0.62 | ||
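
The classification tables report precision and F1-score separately for the negative and positive classes. The following sketch shows how these per-class metrics are conventionally computed; the label strings and the toy predictions are invented for illustration.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 for one target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy gold labels and predictions (not data from the paper).
y_true = ["neg", "neg", "neg", "pos", "pos"]
y_pred = ["neg", "neg", "pos", "pos", "neg"]
neg_scores = precision_recall_f1(y_true, y_pred, "neg")
pos_scores = precision_recall_f1(y_true, y_pred, "pos")
```

Because F1 is the harmonic mean of precision and recall, a class can show high precision and a much lower F1-score when its recall is poor.
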
Portuguese sentiment classification combining unigrams, bigrams and sentence embedding as features.
| Classifier | Negative precision | Negative F1-score | Positive precision | Positive F1-score |
|---|---|---|---|---|
| Logistic Regression | 0.82 | 0.86 | 0.77 | 0.68 |
| Linear SVM | 0.82 | 0.86 | 0.76 | 0.68 |
| Logistic Regression | 0.83 | 0.77 | 0.70 | |
| Linear SVM | 0.83 | 0.86 | 0.76 | 0.69 |
| Logistic Regression | ||||
| Linear SVM | 0.83 | 0.77 | 0.70 | |
| Logistic Regression | 0.80 | 0.84 | 0.71 | 0.63 |
| Linear SVM | 0.80 | 0.84 | 0.72 | 0.62 |
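
The source does not state how the n-gram and sentence-embedding features were combined. One common option, shown here purely as an assumption, is to concatenate the n-gram count vector with the dense embedding vector. The helper names and the three-dimensional embedding are hypothetical.

```python
def build_vocabulary(count_dicts):
    """Sorted list of every n-gram seen across the given texts."""
    return sorted({term for counts in count_dicts for term in counts})


def combined_vector(counts, vocabulary, embedding):
    """Concatenate n-gram counts (in vocabulary order) with a dense embedding."""
    sparse_part = [float(counts.get(term, 0)) for term in vocabulary]
    return sparse_part + [float(v) for v in embedding]


# Toy inputs: n-gram counts for two tweets and a made-up embedding for the first.
tweet_a = {"fique": 1, "em": 1, "casa": 1, "fique em": 1, "em casa": 1}
tweet_b = {"use": 1, "máscara": 1, "use máscara": 1}
vocabulary = build_vocabulary([tweet_a, tweet_b])
features = combined_vector(tweet_a, vocabulary, [0.12, -0.40, 0.33])
```

The resulting vector has one dimension per vocabulary entry followed by the embedding dimensions, so a linear classifier can weight both kinds of evidence together.
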
English topics.
| Topic | Correlated words |
|---|---|
| Economic impacts | Work, impact, business, crisis, pay |
| Case reports/statistics | Case, death, report, die, number, people, total, patient, record, update, confirm |
| Proliferation care | Mask, wear, people, school, test, social, spread, reopen |
| Politics | Trump, government, response, president, fauci (Anthony Stephen Fauci), state, minister |
| Entertainment | Watch, like, video, love, show |
| Treatments | Vaccine, patient, hospital, drug, trial, treatment, plasma, hydroxychloroquine |
| Online events | Join, webinar, live, talk, discuss, impact, virtual, tomorrow, host |
| Charity | Support, help, fund, relief, donate, provide, community, food, million |
| Sports | Player, season, football, team, league, game, play, sport |
| Anti-racism protests | Protest, police, kill, black, american, death, die, right, war |
Portuguese topics.
| Topic | Correlated words |
|---|---|
| Economic impacts | Buying, money, working, crisis |
| Treatments | Vaccine, cure, chloroquine, virus, treatment, patient, hydroxychloroquine, medical |
| Proliferation care | To stick, house, to leave, people, to pass, mask, a party |
| Case reports/statistics | Death, case, number, dead, register, confirm, mother, country, father, uncle |
| Education and culture | Class, learn, how to, study, read, school, teacher, college |
| Sports | Playing, football, player, Flamengo (a football club), team, club, championship |
| Politics | Bolsonaro, governor, president, stf (Supreme Federal Court), minister, major, approve, law, project, mp (Public Prosecutor's Office or provisional measure), chamber, federal, public |
| Health and beauty | Hair, fattening, health, mental, painting, losing weight, anxiety |
| Entertainment | Watch, series, musician, movie, listen, season, launch, netflix, show, album, live, clip |
| Daily life | Day, sleep, wake up, eat, photo, play, night, morning |
Fig. 3. Volume of messages by topic (English data on the left, Portuguese data on the right).
Fig. 4. English volume variations by topic. The horizontal axis shows the posting dates, where each point is the sum of the messages over one week; the vertical axis shows the number of posts. The blue lines represent the total messages. The other lines come from the sentiment analysis: red lines are negative messages and green lines are positive messages.
Fig. 5. Portuguese volume variations by topic. The horizontal axis shows the posting dates, where each point is the sum of the messages over one week; the vertical axis shows the number of posts. The blue lines represent the total messages. The other lines come from the sentiment analysis: red lines are negative messages and green lines are positive messages.
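The weekly points in Figs. 4 and 5 amount to summing messages per week, in total and per sentiment. A minimal sketch of such an aggregation follows, assuming each post is a (date, sentiment) pair and that weeks start on Monday; neither detail is specified in the source, and the sample posts are invented.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_counts(posts):
    """Sum posts per week (keyed by the Monday of that week), overall and by polarity."""
    weeks = defaultdict(lambda: {"total": 0, "positive": 0, "negative": 0})
    for day, sentiment in posts:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        bucket = weeks[week_start]
        bucket["total"] += 1
        if sentiment in ("positive", "negative"):
            bucket[sentiment] += 1
    return dict(sorted(weeks.items()))


# Toy posts for one topic (not data from the paper).
posts = [
    (date(2020, 4, 6), "positive"),
    (date(2020, 4, 8), "negative"),
    (date(2020, 4, 13), "negative"),
]
series = weekly_counts(posts)
```

Plotting `total`, `negative`, and `positive` against the week keys would give the blue, red, and green lines described in the captions.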
Fig. 6. Proportion of emotions related to anger, fear and sadness for the English topics.
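The proportions in this figure can be understood as each emotion's share among the messages labeled with one of the three emotions. The sketch below illustrates that calculation under the assumption that every message carries a single emotion label; the label names and the sample list are hypothetical.

```python
from collections import Counter


def emotion_proportions(labels, emotions=("anger", "fear", "sadness")):
    """Share of each listed emotion among the labels that match one of them."""
    counts = Counter(label for label in labels if label in emotions)
    total = sum(counts.values())
    if total == 0:
        return {emotion: 0.0 for emotion in emotions}
    return {emotion: counts[emotion] / total for emotion in emotions}


# Toy labels for one topic; "joy" is ignored because it is not in the list.
shares = emotion_proportions(["anger", "fear", "fear", "joy", "sadness"])
```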