The design, construction and evaluation of annotated Arabic cyberbullying corpus.

Fatima Shannag, Bassam H Hammo, Hossam Faris.

Abstract

Cyberbullying (CB) is one of the most severe forms of misconduct on social media. CB detection systems have been developed for many natural languages to counter this phenomenon. Arabic, however, remains an under-resourced language that lacks quality datasets in many areas of computational research. This paper discusses the design, construction, and evaluation of a multi-dialect, annotated Arabic Cyberbullying Corpus (ArCybC), a valuable resource for Arabic CB detection and a motivation for future research in Arabic Natural Language Processing (NLP). The study describes the phases of ArCybC compilation. By way of illustration, it explores the corpus to discover the strategies used in rendering Arabic CB tweets pulled from four Twitter groups: gaming, sports, news, and celebrities. Based on thorough analysis, we found that these groups were the most susceptible to harassment and cyberbullying. The collected tweets were filtered using a compiled harassment lexicon, a list of multi-dialect Arabic profane words drawn from four categories: sexual, racial, physical appearance, and intelligence. To annotate ArCybC, we asked five annotators to manually classify 4,505 tweets along two binary dimensions: Offensive/non-Offensive and CB/non-CB. We conducted a rigorous comparison of different machine learning approaches applied to ArCybC to detect Arabic CB, using two text representation models: bag-of-words (BoW) and word embedding. The experiments showed that a Support Vector Machine (SVM) with word embedding achieved an accuracy of 86.3% and an F1-score of 85%. The main challenges encountered during the ArCybC construction were the scarcity of freely available Arabic CB texts and the difficulty of annotating them.
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.
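
The filtering and annotation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-entry `LEXICON`, the whitespace tokenization, and the majority-vote rule for combining the five annotators' labels are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the harassment lexicon. The abstract names four
# categories (sexual, racial, physical appearance, intelligence); only two
# mild placeholder terms are shown here.
LEXICON = {
    "intelligence": {"غبي"},          # "stupid"
    "physical appearance": {"قبيح"},  # "ugly"
}


def matched_categories(tweet, lexicon):
    """Return the sorted lexicon categories with at least one term in the tweet."""
    tokens = set(tweet.split())  # assumption: plain whitespace tokenization
    return sorted(cat for cat, terms in lexicon.items() if tokens & terms)


def filter_tweets(tweets, lexicon):
    """Keep only tweets that contain at least one lexicon term."""
    return [t for t in tweets if matched_categories(t, lexicon)]


def majority_label(votes):
    """Combine annotator votes by majority (assumed aggregation rule).

    With five annotators and a binary label there is always a strict majority.
    """
    label, _count = Counter(votes).most_common(1)[0]
    return label
```

For example, `majority_label(["CB", "CB", "non-CB", "CB", "non-CB"])` yields `"CB"`. The same function could be applied independently to the Offensive/non-Offensive and the CB/non-CB dimension of each tweet.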

Keywords:  Annotated cyberbullying corpus; Arabic harassment dataset; Cyberbullying dataset; Hate speech; Offensive language; Profane lexicon

Year: 2022; PMID: 35502160; PMCID: PMC9046013; DOI: 10.1007/s10639-022-11056-x

Source DB: PubMed; Journal: Educ Inf Technol (Dordr); ISSN: 1360-2357


