Directions in abusive language training data, a systematic review: Garbage in, garbage out.

Abstract

Data-driven and machine-learning-based approaches for detecting, categorising and measuring abusive content such as hate speech and harassment have gained traction due to their scalability, robustness and increasingly high performance. Making effective detection systems for abusive content relies on having the right training datasets, reflecting a widely accepted mantra in computer science: Garbage In, Garbage Out. However, creating training datasets that are large, varied, theoretically informed and minimally biased is difficult and laborious, and requires deep expertise. This paper systematically reviews 63 publicly available training datasets which have been created to train abusive language classifiers. It also reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com. We discuss the challenges and opportunities of open science in this field, and argue that although more dataset sharing would bring many benefits, it also poses social and ethical risks which need careful consideration. Finally, we provide evidence-based recommendations for practitioners creating new abusive content training datasets.

Year: 2020 PMID: 33370298 DOI: 10.1371/journal.pone.0243300

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

6 in total

1. In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers.

Authors: Vivek Krishna Pradhan; Mike Schaekermann; Matthew Lease
Journal: Front Artif Intell Date: 2022-05-18

Review 2. Bias and comparison framework for abusive language datasets.

Authors: Maximilian Wich; Tobias Eder; Hala Al Kuwatly; Georg Groh
Journal: AI Ethics Date: 2021-07-19

3. Artificial Intelligence to Address Cyberbullying, Harassment and Abuse: New Directions in the Midst of Complexity.

Authors: Tijana Milosevic; Kathleen Van Royen; Brian Davis
Journal: Int J Bullying Prev Date: 2022-02-25

Review 4. Datasets for Automated Affect and Emotion Recognition from Cardiovascular Signals Using Artificial Intelligence- A Systematic Review.

Authors: Paweł Jemioło; Dawid Storman; Maria Mamica; Mateusz Szymkowski; Wioletta Żabicka; Magdalena Wojtaszek-Główka; Antoni Ligęza
Journal: Sensors (Basel) Date: 2022-03-25 Impact factor: 3.576

Review 5. Perspectives on Sex- and Gender-Specific Prediction of New-Onset Atrial Fibrillation by Leveraging Big Data.

Authors: Sven Geurts; Zuolin Lu; Maryam Kavousi
Journal: Front Cardiovasc Med Date: 2022-07-11

6. Asian hate speech detection on Twitter during COVID-19.

Authors: Amir Toliyat; Sarah Ita Levitan; Zheng Peng; Ronak Etemadpour
Journal: Front Artif Intell Date: 2022-08-15
