Literature DB >> 35494863
Michele Mastromattei, Leonardo Ranaldi, Francesca Fallucchi, Fabio Massimo Zanzotto.
Abstract
Hate speech recognizers (HSRs) can be the panacea for containing hate in social media, or they can become the biggest form of prejudice-based censorship, hindering people from expressing their true selves. In this paper, we hypothesized that the massive use of syntax can reduce the prejudice effect in HSRs. To explore this hypothesis, we propose the Unintended-bias Visualizer based on KERMIT modeling (KERM-HATE): a syntax-based HSR endowed with syntax heat parse trees used as a post-hoc explanation of classifications. KERM-HATE significantly outperforms BERT-based, RoBERTa-based and XLNet-based HSRs on standard datasets. Surprisingly, this result is not sufficient. In fact, the post-hoc analysis on novel datasets covering recent divisive topics shows that even KERM-HATE carries the prejudice distilled from the initial corpus. Therefore, although tests on standard datasets may show higher performance, syntax alone cannot drive the "attention" of HSRs to ethically-unbiased features. ©2022 Mastromattei et al.
Keywords: Bias; Explainability; Hate speech; Neural networks; Syntax
Year: 2022 PMID: 35494863 PMCID: PMC9044272 DOI: 10.7717/peerj-cs.859
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Sentence: "Black people are the worst to each other." Unbiased syntax heat parse tree derived by KERMIT (Zanzotto et al., 2020) within a hate speech recognizer trained on the Davidson Corpus (Davidson et al., 2017). Active nodes are shown in red.
Figure 2. KERM-HATE architecture: forward and interpretation pass.
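The forward pass pairs a distributed encoding of the sentence's parse tree with a transformer sentence embedding before classification. As a rough illustration only, here is a minimal PyTorch sketch of such a two-channel classifier; the class name, dimensions, and the plain linear tree channel are hypothetical stand-ins, since the actual model uses KERMIT's distributed tree encoder (Zanzotto et al., 2020):

```python
import torch
import torch.nn as nn

class SyntaxAwareHSR(nn.Module):
    """Sketch of a two-channel classifier: parse-tree vector + transformer embedding."""

    def __init__(self, tree_dim=4000, bert_dim=768, n_classes=3):
        super().__init__()
        # Syntax channel: compresses the high-dimensional tree encoding.
        self.tree_ff = nn.Sequential(nn.Linear(tree_dim, 1024), nn.Tanh())
        # Joint head over the concatenated tree and sentence representations.
        self.head = nn.Linear(1024 + bert_dim, n_classes)

    def forward(self, tree_vec, cls_vec):
        h = self.tree_ff(tree_vec)                      # encode the syntax channel
        return self.head(torch.cat([h, cls_vec], -1))   # fuse channels and classify

# Dummy forward pass with random stand-ins for the two encodings.
model = SyntaxAwareHSR()
logits = model(torch.randn(2, 4000), torch.randn(2, 768))
```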
Black Lives Matter Corpus example sentences.
Twitter verified profiles selected for each party.
United States presidential election corpus example sentences.
Performance of the hate speech recognizers on the three different datasets. Mean and standard deviation are computed over 10 runs. The symbols ♢, † and ∗ indicate a statistically significant difference between two results at a 95% confidence level according to the sign test.
| Model | Dataset 1: Acc. | Dataset 1: F1 | Dataset 2: Acc. | Dataset 2: F1 | Dataset 3: Acc. | Dataset 3: F1 |
|---|---|---|---|---|---|---|
| BERT | 0.67 (±0.03)♢ | — | 0.73 (±0.01)♢ | — | 0.54 (±0.02)♢ | — |
| BERT | 0.66 (±0.01) | 0.47 (±0.01) | 0.54 (±0.12) | 0.34 (±0.07) | 0.49 (±0.07) | 0.38 (±0.04) |
| BERT | 0.66 (±0.02) | 0.47 (±0.01) | 0.50 (±0.10) | 0.33 (±0.07) | 0.47 (±0.08) | 0.38 (±0.04) |
| XLNet | 0.47 (±0.06)† | 0.34 (±0.03)† | 0.55 (±0.08)† | 0.39 (±0.10)† | 0.53 (±0.03)† | 0.42 (±0.01)† |
| RoBERTa | — | 0.37 (±0.05) | — | 0.44 (±0.09) | — | 0.42 (±0.03) |
| KERMIT | 0.72 (±0.02)∗ | 0.54 (±0.02)∗ | 0.79 (±0.01)∗ | 0.54 (±0.09)∗ | 0.60 (±0.02)∗ | 0.51 (±0.01)∗ |
| KERMIT | 0.68 (±0.05)† | 0.47 (±0.03)† | 0.74 (±0.02)† | 0.51 (±0.10)† | 0.56 (±0.03)† | 0.47 (±0.01)† |
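The significance markers in the table above come from a sign test paired over the 10 runs. A minimal sketch of such a test, assuming per-run scores for two models are available as lists (the function name and example numbers are illustrative, not the authors' code or data):

```python
from scipy.stats import binomtest

def paired_sign_test(scores_a, scores_b, alpha=0.05):
    """Two-sided sign test on paired per-run scores; ties are discarded."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    p = binomtest(wins, wins + losses, 0.5).pvalue
    return p, p < alpha  # significant at the (1 - alpha) confidence level

# Example: hypothetical per-run F1 scores of two recognizers over 10 runs.
p_value, significant = paired_sign_test(
    [0.54, 0.53, 0.55, 0.52, 0.56, 0.54, 0.53, 0.55, 0.54, 0.56],
    [0.47, 0.48, 0.46, 0.47, 0.49, 0.47, 0.46, 0.48, 0.47, 0.48],
)
```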
Figure 3. Labeling phase on our generated corpora using BERT-base and KERM-HATE.
Test summary.
For each test, inter-annotator agreement and the results obtained are reported.
| | BERT | KERM-HATE | Average Agreement with KERM-HATE | Average Perceived Prejudice |
|---|---|---|---|---|
| Fleiss' Kappa | 0.24 | 0.87 | — | — |
| Results | 0.11 | 0.41 | 0.52 (±0.08) | 0.55 (±0.04) |
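Fleiss' Kappa measures chance-corrected agreement among a fixed number of annotators. For reference, a minimal NumPy sketch under the standard formulation (the item-by-category count-matrix input format is an assumption, not taken from the paper):

```python
import numpy as np

def fleiss_kappa(counts):
    """counts[i][j] = number of annotators assigning item i to category j."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                     # annotators per item (assumed constant)
    p_j = counts.sum(axis=0) / counts.sum()       # overall category proportions
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()     # observed vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Example: 3 annotators rating 4 items into 2 categories.
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]))  # ~0.33
```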
Figure 4. (A) Sentence: "Max is an -". Labeled as: Offensive language. (B) Sentence: "Max is an -". Labeled as: Neither.
KERM-HATE colored parse tree outputs.