Ghadah Alqahtani, Abdulrahman Alothaim.
Abstract
Twitter is one of the most popular social media platforms, and emotion analysis and classification of tweets have recently become a significant research topic. Emotion classification of Arabic tweets is particularly challenging because Arabic requires more preprocessing than other languages. This article provides a practical overview and detailed description of material that can help in developing an Arabic language model for emotion classification of Arabic tweets. It highlights NLP-based emotion classification of Arabic tweets, current practices, and available resources, offering a guideline and overview to facilitate future studies. Finally, the article presents challenges and open issues that suggest future research directions.
Keywords: Arabic tweets; emotion analysis; language models; natural language processing; social emotion
Year: 2022 PMID: 35434606 PMCID: PMC9007318 DOI: 10.3389/frai.2022.843038
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
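The abstract notes that Arabic tweets need more preprocessing than tweets in other languages. As a rough illustration of what such preprocessing can involve, here is a minimal Python sketch. The specific steps (stripping URLs and mentions, removing diacritics and tatweel, unifying alef variants, shortening elongated letters) are commonly used choices assumed for this example, and the function name `normalize_arabic_tweet` is hypothetical. None of it is taken from the article itself.

```python
import re

# All patterns below are illustrative assumptions, not the article's pipeline.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_DIACRITICS = re.compile("[\u064B-\u0652]")    # fathatan .. sukun (tashkeel)
_TATWEEL = re.compile("\u0640")                # elongation character
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza
_ELONGATION = re.compile(r"(.)\1{2,}")         # same character 3+ times in a row


def normalize_arabic_tweet(text: str) -> str:
    """Apply a few common normalization steps to one Arabic tweet."""
    text = _URL.sub(" ", text)                 # drop links
    text = _MENTION.sub(" ", text)             # drop @user mentions
    text = text.replace("#", " ").replace("_", " ")  # keep hashtag words only
    text = _DIACRITICS.sub("", text)           # remove short-vowel marks
    text = _TATWEEL.sub("", text)              # remove tatweel
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map alef variants to bare alef
    text = _ELONGATION.sub(r"\1\1", text)      # cap repeated characters at two
    return " ".join(text.split())              # collapse whitespace


if __name__ == "__main__":
    print(normalize_arabic_tweet("@user \u0623\u0646\u0627 https://t.co/x"))
```

Capping repeated letters at two rather than one is a deliberate hedge: it shortens emphatic elongation without destroying legitimately doubled letters. Whether any of these steps helps a given model depends on how that model's own training data was preprocessed.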
Summary of Arabic pre-trained language model versions, architecture details, and training data sets.
| Model | Version | Layers | Hidden size | Attention heads | Parameters | Pre-training data |
|---|---|---|---|---|---|---|
| Multilingual BERT | Multilingual Cased | 12 | 768 | 12 | 110M | W |
| Multilingual BERT | Multilingual Uncased | 12 | 768 | 12 | 110M | W |
| AraBert | AraBERTbase | 12 | 768 | 12 | 136M | OS, W, N |
| AraBert | AraBERTlarge | 12 | 768 | 12 | 371M | OS, W, N |
| AraBert | AraBERTTwitterbase | 12 | 768 | 12 | 136M | OS, W, N, T |
| AraBert | AraBERTTwitterlarge | 12 | 768 | 12 | 371M | OS, W, N, T |
| ArabicBERT | ArabicBERTMini | 4 | 256 | 4 | 11M | OS, W |
| ArabicBERT | ArabicBERTMedium | 8 | 512 | 8 | 42M | OS, W |
| ArabicBERT | ArabicBERTBase | 12 | 768 | 12 | 110M | OS, W |
| ArabicBERT | ArabicBERTLarge | 24 | 1024 | 16 | 340M | OS, W |
| QARiB | QARiB25mix | 12 | 768 | 12 | 110M | S, N, T |
| AraGPT2 | AraGPT2base | 12 | 768 | 12 | 135M | OS, W, N |
| AraGPT2 | AraGPT2medium | 24 | 1024 | 16 | 370M | OS, W, N |
| AraGPT2 | AraGPT2large | 36 | 1280 | 20 | 792M | OS, W, N |
| AraGPT2 | AraGPT2mega | 48 | 1536 | 25 | 1.46B | OS, W, N |
| AraELECTRA | AraELECTRAbase | 12 | 256 | 4 | 60M | OS, W, N |
| Arabic ALBERT | ALBERTbase | 12 | 768 | 12 | 12M | W, OS |
| Arabic ALBERT | ALBERTlarge | 24 | 1024 | 16 | 18M | W, OS |
| Arabic ALBERT | ALBERTxlarge | 24 | 2048 | 32 | 60M | W, OS |
| MARBERT | MARBERT | 12 | 256 | 12 | 160M | OS, W, N, B, T |
The pre-training data are indicated as follows: Tweets (T), Wikipedia (W), news (N), OSCAR corpus (OS), subtitles (S), and books (B).
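Since the target task is emotion classification of tweets, one practical use of the table is to shortlist models whose pre-training data include tweets. The sketch below encodes the legend and a handful of rows as plain Python data and filters on a source code. The names `LEGEND`, `PRETRAIN_DATA`, `expand_sources`, and `models_with` are hypothetical helpers introduced for illustration, and only a subset of the table's rows is included.

```python
# Legend codes copied from the table note.
LEGEND = {
    "T": "Tweets",
    "W": "Wikipedia",
    "N": "news",
    "OS": "OSCAR corpus",
    "S": "subtitles",
    "B": "books",
}

# A subset of table rows: model version -> pre-training data codes.
PRETRAIN_DATA = {
    "Multilingual Cased": "W",
    "AraBERTbase": "OS, W, N",
    "AraBERTTwitterbase": "OS, W, N, T",
    "ArabicBERTMini": "OS, W",
    "QARiB25mix": "S, N, T",
    "AraGPT2base": "OS, W, N",
    "AraELECTRAbase": "OS, W, N",
    "ALBERTbase": "W, OS",
    "MARBERT": "OS, W, N, B, T",
}


def split_codes(codes: str) -> list[str]:
    """Split a cell such as 'OS, W, N' into ['OS', 'W', 'N']."""
    return [c.strip() for c in codes.split(",") if c.strip()]


def expand_sources(codes: str) -> list[str]:
    """Translate legend codes into their full names, preserving order."""
    return [LEGEND[c] for c in split_codes(codes)]


def models_with(source_code: str) -> list[str]:
    """Return model versions whose pre-training data include the given code."""
    if source_code not in LEGEND:
        raise KeyError(f"unknown source code: {source_code!r}")
    return [
        version
        for version, codes in PRETRAIN_DATA.items()
        if source_code in split_codes(codes)
    ]


if __name__ == "__main__":
    print(models_with("T"))
    print(expand_sources(PRETRAIN_DATA["MARBERT"]))
```

Pre-training on tweets is only one selection criterion. Model size and dialect coverage may matter as much for a given data set.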
Emotion data sets available for the Arabic language.
| Reference | Data set | Description | Size |
|---|---|---|---|
| Badaro et al. ( | ArSEL | ArSEL is an Arabic sentiment and emotion lexicon designed to supplement the publicly available Arabic Sentiment Lexicon, ArSenL, and provide a large-scale lexicon with emotion and sentiment labels for practically every lemma in ArSenL. | 32,196 Arabic lemmas were annotated with sentiment and emotion scores at the same time. |
| Al-Khatib and El-Beltagy ( | AETD | The dataset consists of Egyptian dialect tweets, each categorized as anger, fear, happiness, love, sadness, surprise, sympathy, or none. | 10,065 tweets |
| Almahdawi and Teahan ( | IAEDS | This corpus consists of Iraqi dialect Facebook posts. It is divided into six datasets, each of which contains instances of Ekman's basic emotions. | 1,365 Facebook posts |
| Mohammad et al. ( | SemEval-2018 | This opinion corpus consists of tweets labeled as neutral or as one or more of 11 emotions: anger, anticipation, disgust, fear, happiness, love, optimism, pessimism, sadness, surprise, and trust. | 4,381 tweets |
| Saad ( | Emotion-lexicon | An Arabic and English emotion lexicon in which each entry is annotated with one of Ekman's basic emotions. | Each emotion label has 3,207 Arabic words associated with it |
| Abdul-Mageed et al. ( | DINA | A tweet dataset annotated with the following emotions: happiness, sadness, anger, disgust, surprise, and fear. The tweets were gathered by querying Twitter with a set of seeds for each class. | 3,000 tweets |
| Alhuzali et al. ( | LAMA-DINA | A dataset for emotion in MSA and AD; the tweets are labeled with the Plutchik emotions. | 9,064 tweets |
| Yang et al. ( | SenWave | Tweets annotated for fine-grained sentiment analysis with 11 emotions: optimistic, thankful, empathetic, pessimistic, anxious, sad, annoyed, denial, official report, surprise, and joking. | 10K Arabic and English tweets |
| Al-Laith and Alenezi ( | AraEmoCorpus | The corpus contains Arabic tweets tagged with emotion categories: anger, disgust, fear, joy, sadness, and surprise. | 5.5 million Arabic tweets |
| Shakil et al. ( | AEELex | Tweets from users living in Mecca and Riyadh were collected, and the Arabic-English emotion lexicon was classified on the basis of Plutchik's eight basic emotion categories. | 35,383 tweets |
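Several of the data sets above assign one label per item, but the SemEval-2018 row describes tweets labeled as neutral or as one or more of 11 emotions, which makes it a multi-label problem. The sketch below shows one possible way to encode such labels as a multi-hot vector in Python. The emotion names and their order are copied from the table row. The function names and the convention that an empty label list stands for a neutral tweet are assumptions made for this example.

```python
# Emotion names as listed in the SemEval-2018 row of the table.
SEMEVAL_EMOTIONS = (
    "anger", "anticipation", "disgust", "fear", "happiness", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
)


def to_multi_hot(labels, classes=SEMEVAL_EMOTIONS):
    """Encode a list of emotion names as a 0/1 vector over `classes`.

    An empty list yields an all-zero vector, used here (by assumption)
    to represent a neutral tweet.
    """
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


def from_multi_hot(vector, classes=SEMEVAL_EMOTIONS):
    """Decode a 0/1 vector back into emotion names, in class order."""
    if len(vector) != len(classes):
        raise ValueError("vector length does not match number of classes")
    return [c for c, bit in zip(classes, vector) if bit]


if __name__ == "__main__":
    vec = to_multi_hot(["sadness", "anger"])
    print(vec)
    print(from_multi_hot(vec))
```

A vector like this is the usual target format for a classifier with one sigmoid output per emotion. Single-label data sets such as DINA would instead use one class index per tweet.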