| Literature DB >> 36046150 |
Amir Toliyat, Sarah Ita Levitan, Zheng Peng, Ronak Etemadpour.
Abstract
Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late 2019 and, after spreading widely through Asian countries, rapidly reached the rest of the world. The disease led governments worldwide to declare a public health crisis, with severe measures taken to slow its spread. The pandemic affected the lives of millions of people. Many citizens who lost their loved ones and jobs experienced a wide range of emotions, such as disbelief, shock, concerns about health, fear about food supplies, anxiety, and panic. These phenomena contributed to a rise in racism and hate against Asians in Western countries, especially in the United States. An analysis of official preliminary police data by the Center for the Study of Hate & Extremism at California State University shows that anti-Asian hate crime in 16 of America's largest cities increased by 149% in 2020. In this study, we first chose a baseline dataset of anti-Asian hate speech on Twitter. We then present an approach to balance this biased dataset and consequently improve the performance of tweet classification. We also downloaded 10 million tweets through the Twitter API v2; this study uses a small portion of them, and we will use the entire dataset in future work. Three thousand tweets from our collected corpus were annotated by four annotators, three Asian and one Asian-American. Using these data, we built predictive models of hate speech with various machine learning and deep learning methods. Our machine learning methods include Random Forest, K-nearest neighbors (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Tree, and Naive Bayes. Our deep learning models include a basic Long Short-Term Memory (LSTM) network, a Bidirectional LSTM, a Bidirectional LSTM with dropout, a convolutional network, and Bidirectional Encoder Representations from Transformers (BERT). We also adjusted our dataset by filtering out tweets that were ambiguous to the annotators, identified by low Fleiss' kappa agreement between annotators. Our final results showed that Logistic Regression achieved the best performance among the statistical machine learning methods, with an F1-score of 0.72, while BERT achieved the best performance of the deep learning models, with an F1-score of 0.85.
Keywords: Asian hate crime; COVID-19; Twitter; machine learning; natural language processing
Year: 2022 PMID: 36046150 PMCID: PMC9421075 DOI: 10.3389/frai.2022.932381
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
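The abstract describes filtering out tweets that annotators found ambiguous, identified by low Fleiss' kappa agreement. A minimal sketch of that step, assuming four annotators and three labels; the toy ratings and the 0.5 cutoff are hypothetical, and the kappa implementation comes from statsmodels:

```python
# Sketch of the agreement-based filtering step: four annotators assign each
# tweet one of three labels (0 = neutral, 1 = hate, 2 = counterhate).
# The toy ratings and the 0.5 cutoff are hypothetical.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = tweets, columns = annotators
ratings = np.array([
    [1, 1, 1, 1],  # full agreement
    [0, 1, 2, 1],  # ambiguous
    [2, 2, 2, 1],  # partial agreement
])

# counts[i, k] = number of annotators who gave tweet i the label k
counts, _ = aggregate_raters(ratings)

# overall Fleiss' kappa for the annotated corpus
print("corpus kappa:", fleiss_kappa(counts, method="fleiss"))

# per-tweet observed agreement P_i = (sum_k n_ik^2 - n) / (n(n - 1)),
# the per-subject term that Fleiss' kappa averages; a low P_i marks a
# tweet as ambiguous, to be dropped from the "improved" dataset
n = ratings.shape[1]
p_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
print("kept tweets:", p_i >= 0.5)
```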
Figure 1: Example of word embedding.
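Figure 1 illustrates word embedding, the mapping of each token to a dense vector that the deep learning models below consume. A minimal sketch of the idea with a Keras Embedding layer; the tiny vocabulary and the 8-dimensional embedding size are hypothetical:

```python
# Word embedding in miniature: token -> integer id -> dense vector.
# The four-word vocabulary and 8-dimensional vectors are hypothetical.
import numpy as np
import tensorflow as tf

vocab = {"<pad>": 0, "stop": 1, "asian": 2, "hate": 3}
ids = np.array([[vocab["stop"], vocab["asian"], vocab["hate"], vocab["<pad>"]]])

embed = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=8)
vectors = embed(ids)  # shape (1, 4, 8): one 8-d vector per token
print(vectors.shape)
```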
Figure 2: Comparing the balance in the number of tweets in each class in different datasets. (A) Baseline dataset, (B) our dataset, (C) our improved dataset, (D) ambiguous dataset, and (E) dataset with full agreement.
Figure 3: Comparing the wordcloud of different datasets. (A) Baseline dataset, (B) our dataset, (C) our improved dataset, (D) ambiguous dataset, and (E) dataset with full agreement.
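Figures 2 and 3 compare label balance and dominant vocabulary across the five datasets. A sketch of how such views can be generated with pandas and the wordcloud package; the three-row dataframe is a hypothetical stand-in for each annotated corpus:

```python
# Sketch of the per-class counts behind Figure 2 and the wordclouds of
# Figure 3; the three-row dataframe stands in for an annotated corpus.
import pandas as pd
from wordcloud import WordCloud

df = pd.DataFrame({
    "text": ["stop asian hate", "calling it the chinese virus",
             "cases on the cruise ship"],
    "label": ["counterhate", "hate", "neutral"],
})

# class balance for one dataset (Figure 2 repeats this per dataset)
print(df["label"].value_counts())

# one wordcloud per dataset (Figure 3)
cloud = WordCloud(width=400, height=200).generate(" ".join(df["text"]))
cloud.to_file("wordcloud.png")
```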
Most frequent phrases for each category in each dataset.

| | Dataset | Category | 1-word phrase | 2-word phrase | 3-word phrase |
|---|---|---|---|---|---|
| (a) | Baseline dataset | Counterhate | Stand | Calling Chinese | Stop Calling Chinese |
| | | | Asian | Asian friends | Calling Chinese virus |
| | | Neutral | Chinese | Chinese virus | Fuck Chinese virus |
| | | | Coronavirus | Fuck China | Fucking Chinese virus |
| | | Hate | Chinaliedpeopledied | Fuck Chinese | Fucking Chinese virus |
| | | | Fuck | Fuck China | Fuck Chinese virus |
| (b) | Our dataset | Counterhate | Iamnotavirus | Asian American | Hate Asian Americans |
| | | | Asian | Asian Americans | Calling Chinese virus |
| | | Neutral | Cases | Fuck China | Diamond Princess cruise |
| | | | Coronavirus | Cruise ship | COVID-19 Coronavirus wuflu |
| | | Hate | Fuck | Fuck chinese | Fucking Chinese virus |
| | | | Chinaliedpeopledied | Fuck China | Fuck Chinese virus |
| (c) | Our improved dataset | Counterhate | Racismisavirus | Asian American | Hate Asian Americans |
| | | | Asian | Asian Americans | Calling Chinese virus |
| | | Neutral | Bioweapon | Chinese virus | Diamond princess cruise |
| | | | Coronavirus | Fuck China | COVID-19 coronavirus wuflu |
| | | Hate | Fuck | Fuck Chinese | Fuck Chinese virus |
| | | | Chinaliedpeopledied | Fuck China | Fucking Chinese virus |
| (d) | Ambiguous dataset | Counterhate | World | Chinese virus | |
| | | | Chink | Corona virus | |
| | | Neutral | Coronavirus | Chinese virus | |
| | | | Fuck | Corona virus | |
| | | Hate | Fuck | Chinese food | |
| | | | Started | Corona virus | |
| (e) | Dataset with full agreement | Counterhate | Asian | Asian people | Fuck Chinese virus |
| | | | Racismisavirus | Calling Chinese | Calling Chinese virus |
| | | Neutral | Chinese | Fuck China | Calling Chinese virus |
| | | | Coronavirus | Chinese virus | Diamond princess cruise |
| | | Hate | Racismisavirus | Fuck Chinese | Fuck China fuck |
| | | | Coronavirus | Fuck China | Fuck Chinese virus |
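A sketch of how per-category phrase lists like those above can be extracted, taking the most frequent 1-, 2-, and 3-grams within each label with scikit-learn's CountVectorizer; the toy tweets are hypothetical stand-ins for the labeled corpus:

```python
# Top 1-/2-/3-grams per category with scikit-learn's CountVectorizer.
# The toy tweets are hypothetical stand-ins for the labeled corpus.
from sklearn.feature_extraction.text import CountVectorizer

tweets_by_label = {
    "counterhate": ["stop asian hate", "stand with asian friends"],
    "neutral": ["cases on the diamond princess cruise"],
    "hate": ["calling it the chinese virus again"],
}

for label, texts in tweets_by_label.items():
    for n in (1, 2, 3):
        vec = CountVectorizer(ngram_range=(n, n))
        counts = vec.fit_transform(texts).sum(axis=0).A1
        best = max(zip(counts, vec.get_feature_names_out()))
        print(f"{label}: top {n}-gram = {best[1]!r} ({best[0]}x)")
```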
The performance of machine learning methods for different phases. (Cells lost in extraction are left blank; two metric-column headers could not be recovered.)

| | Phase | Method | Accuracy | | ROC-AUC | F1-score | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| (a) | Phase 1 | R.F | 0.73 | 0.68 | 0.74 | 0.72 | 0.27 | 0.27 | |
| | | K.N.N | 0.65 | 0.64 | 0.61 | 0.65 | 0.36 | | 0.6 |
| | | S.V.M | 0.73 | 0.7 | 0.72 | 0.73 | 0.27 | 0.27 | |
| | | XGBoost | 0.68 | 0.65 | 0.66 | 0.68 | 0.32 | 0.32 | 0.57 |
| | | L.R | | | | | 0.25 | 0.25 | |
| | | D.T | 0.66 | 0.65 | 0.63 | 0.66 | 0.25 | 0.25 | 0.5 |
| | | N.B | 0.64 | 0.52 | 0.66 | 0.62 | | | 0.6 |
| (b) | Phase 2 | R.F | 0.75 | 0.49 | 0.8 | 0.65 | 0.25 | 0.25 | 0.5 |
| | | K.N.N | 0.72 | 0.54 | 0.62 | 0.67 | | | |
| | | S.V.M | 0.8 | 0.65 | | 0.74 | 0.19 | 0.19 | 0.44 |
| | | XGBoost | 0.78 | 0.6 | 0.8 | 0.71 | 0.21 | 0.21 | 0.46 |
| | | L.R | | | | 0.72 | 0.2 | 0.2 | 0.44 |
| | | D.T | 0.71 | 0.58 | 0.57 | 0.68 | 0.2 | 0.2 | 0.45 |
| | | N.B | 0.59 | 0.5 | 0.42 | 0.6 | | | |
| (c) | Phase 3 | R.F | | 0.62 | 0.74 | 0.72 | 0.2 | 0.2 | 0.45 |
| | | K.N.N | 0.73 | 0.48 | 0.59 | 0.64 | | | 0.62 |
| | | S.V.M | 0.78 | 0.56 | 0.75 | 0.7 | 0.22 | 0.22 | 0.47 |
| | | XGBoost | 0.77 | 0.55 | 0.68 | 0.69 | 0.23 | 0.23 | |
| | | L.R | | | | | 0.2 | 0.2 | 0.4 |
| | | D.T | 0.77 | 0.63 | 0.61 | 0.73 | 0.2 | 0.2 | 0.45 |
| | | N.B | 0.62 | 0.52 | 0.42 | 0.63 | | | 0.62 |
From the literature, we know that the F1-score and ROC-AUC are designed to work well on imbalanced data, and the F1-score combines precision and recall; we can therefore conclude that Logistic Regression has the best overall performance in all phases. The best result in each column is bolded, meaning the smallest value for MAE, MSE, and RMSE and the largest value for the remaining metrics. In each phase, the method with the best overall performance is highlighted in yellow.
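A sketch of the evaluation loop behind the table above: TF-IDF features feed the classifiers named in the abstract, and each model is scored with the table's metrics, where the error metrics treat the integer class labels as ordinal values. The toy corpus, split, and hyperparameters are hypothetical, not the authors' exact configuration, and only two of the seven classifiers are shown:

```python
# Sketch of the machine learning evaluation: TF-IDF features, two of the
# seven classifiers, and the table's metric set. The toy corpus, split,
# and hyperparameters are hypothetical, not the paper's exact setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, roc_auc_score)
from sklearn.model_selection import train_test_split

texts = ["stop asian hate", "calling it the chinese virus",
         "cases on the cruise ship"] * 20
labels = np.array([0, 1, 2] * 20)  # counterhate / hate / neutral

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)  # labels treated as ordinal
    print(type(model).__name__,
          "acc:", accuracy_score(y_te, pred),
          "f1:", f1_score(y_te, pred, average="weighted"),
          "roc-auc:", roc_auc_score(y_te, model.predict_proba(X_te),
                                    multi_class="ovr"),
          "mae:", mean_absolute_error(y_te, pred),
          "rmse:", np.sqrt(mse))
```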
The performance of deep learning methods for different phases. (Cells lost in extraction are left blank; several metric-column headers could not be recovered, and dashes mark metrics not reported for BERT. The phase-3 BERT F1-score is taken from the abstract.)

| | Phase | Method | Epochs | Val. loss | | | | Accuracy | F1-score | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (a) | Phase 1 | Basic LSTM | 6 | 0.97 | 0.55 | 0.73 | 63 | 0.63 | 0.61 | 0.31 | 0.18 | 0.43 |
| | | Bidirectional LSTM | 6 | 1.79 | 0.67 | 0.73 | 63 | 0.7 | 0.65 | 0.23 | 0.2 | 0.45 |
| | | Bi. LSTM with dropout | 6 | 2.6 | 0.66 | 0.6 | 63 | 0.63 | | 0.26 | 0.23 | 0.48 |
| | | Convolution | 6 | 1.61 | 0.66 | 0.67 | 63 | 0.66 | 0.66 | 0.23 | 0.19 | 0.43 |
| | | BERT | | | - | - | - | | | - | - | - |
| (b) | Phase 2 | Basic LSTM | 7 | 0.91 | 0.62 | 0.79 | 68 | 0.7 | 0.65 | 0.29 | 0.17 | 0.41 |
| | | Bidirectional LSTM | 7 | 1.69 | 0.71 | 0.71 | 68 | 0.71 | 0.66 | 0.22 | 0.19 | 0.44 |
| | | Bi. LSTM with dropout | 7 | 1.59 | 0.64 | 0.81 | 68 | 0.71 | 0.66 | 0.22 | 0.19 | 0.44 |
| | | Convolution | 7 | 1.56 | 0.62 | 0.81 | 68 | 0.7 | 0.66 | 0.23 | 0.19 | 0.43 |
| | | BERT | | 0.75 | - | - | - | | | - | - | - |
| (c) | Phase 3 | Basic LSTM | 6 | 0.84 | 0.71 | 0.89 | 62 | 0.79 | 0.61 | 0.3 | 0.16 | 0.4 |
| | | Bidirectional LSTM | 6 | 0.65 | 0.74 | 0.84 | 62 | 0.79 | 0.72 | 0.19 | 0.17 | 0.41 |
| | | Bi. LSTM with dropout | 6 | 0.63 | 0.76 | 0.76 | 62 | 0.76 | 0.71 | 0.2 | 0.16 | 0.41 |
| | | Convolution | 6 | 0.58 | 0.71 | 0.76 | 62 | 0.73 | 0.69 | 0.21 | 0.17 | 0.41 |
| | | BERT | | 0.64 | - | - | - | | 0.85 | - | - | - |
The preferred result is a higher F1-score with fewer epochs and lower validation loss. Hence, BERT has the best performance in all three phases, and in phase three we achieved higher performance with fewer epochs, outperforming our baseline. In each phase, the method with the best overall performance is highlighted in yellow.
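A sketch of one row of the table above, the bidirectional LSTM with dropout, in Keras; the vocabulary size, sequence length, layer widths, and dropout rates are hypothetical, not the authors' exact architecture:

```python
# Sketch of the "Bi. LSTM with dropout" row: embedding -> bidirectional
# LSTM -> softmax over the three classes. All sizes are hypothetical.
import tensorflow as tf

VOCAB, EMB_DIM, MAX_LEN, CLASSES = 10_000, 100, 50, 3

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, EMB_DIM),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(CLASSES, activation="softmax"),
])
model.build(input_shape=(None, MAX_LEN))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=6)
```

BERT, by contrast, replaces the embedding and recurrent layers with a pretrained transformer encoder that is fine-tuned on the labeled tweets, which is why the table reports it with a different set of metrics.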