Yasin Kirelli, Seher Arslankaya.
Abstract
As social media use has grown, the volume of shared data has surged, making it an important research source for environmental issues as well as for popular topics. Sentiment analysis has been used to gauge people's sensitivity and behavior regarding environmental issues. However, the analysis of Turkish texts has received little attention in the literature. In this article, the sentiment of Turkish tweets about global warming and climate change is determined using machine learning methods. Supervised algorithms (linear classifiers and probabilistic classifiers) are trained on thirty thousand randomly selected Turkish tweets, sentiment intensity (positive, negative, and neutral) is detected, and the performance of the algorithms is compared. This study also provides benchmark results for future sentiment analysis studies on Turkish texts.
Year: 2020; PMID: 32963511; PMCID: PMC7492944; DOI: 10.1155/2020/1904172
Source DB: PubMed; Journal: Comput Intell Neurosci
Figure 1. Data collection process via Twitter [28].
Figure 2. Progress of data evaluation.
Data preprocessing steps.
| Step | Description |
|---|---|
| Remove numbers | Deleting numerical expressions in the texts |
| Remove punctuations | Deleting special characters and punctuation marks in the texts |
| Remove stop words | Removing Turkish stop words that do not change the meaning of the sentence |
| Remove whitespace | Deleting blank characters in the text |
| Word stemming | Determining the word roots in the sentence using Zemberek Turkish NLP [ |
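The steps in the table above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word set is a tiny hypothetical sample, not the list used in the study, and `stem` is a hypothetical placeholder that returns the word unchanged where the study applies Zemberek Turkish NLP.

```python
import re
import string

# Hypothetical sample of Turkish stop words (assumption, not the study's list).
TURKISH_STOP_WORDS = {"ve", "bir", "bu", "da", "de", "için", "ile"}

# Maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def stem(word: str) -> str:
    """Hypothetical placeholder: the study finds word roots with Zemberek."""
    return word


def preprocess(text: str) -> list[str]:
    """Apply the table's steps in order and return the remaining tokens."""
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = text.translate(_PUNCT_TABLE)    # remove punctuation
    tokens = text.split()                  # split() also drops extra whitespace
    # Stop-word matching here is case-sensitive (a simplification).
    tokens = [t for t in tokens if t not in TURKISH_STOP_WORDS]
    return [stem(t) for t in tokens]       # word stemming (placeholder)


if __name__ == "__main__":
    print(preprocess("Küresel ısınma 2020 yılında, bir sorun!  ve   tehdit."))
```

Only ASCII punctuation is handled in this sketch; tweets with emoji or typographic quotes would need a broader character filter.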
Training set attributes.
| @Relation train |
| @attribute document string |
| @attribute sentiment class {1,0} |
| @data |
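The rows above resemble the header of an ARFF file, the text format read by Weka. A minimal sketch of writing such a file from labeled tweets follows. The function name `to_arff`, the attribute name `sentiment` (a simplification of the `sentiment class` row), and the backslash escaping of quotes are assumptions made for illustration, not details confirmed by the source.

```python
def to_arff(rows, relation="train"):
    """Build ARFF text from (document, sentiment) pairs with sentiment in {1, 0}."""
    lines = [
        f"@relation {relation}",
        "@attribute document string",
        "@attribute sentiment {1,0}",  # assumed attribute name
        "@data",
    ]
    for document, sentiment in rows:
        if sentiment not in (0, 1):
            raise ValueError(f"unexpected class label: {sentiment!r}")
        # Escape backslashes and single quotes so the quoted string stays intact.
        escaped = document.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{sentiment}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff([("iklim değişikliği", 1), ("küresel ısınma", 0)]))
```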
Figure 3. Word variable set.
Emotion analysis studies in Turkish language.
| Authors | Methodology | Year | Dataset | Performance result |
|---|---|---|---|---|
| Erdogan et al. [ | | 2018 | Five most used cryptocurrencies in English text tweets | 94.60 |
| Ciftci et al. [ | RNN-based algorithm | 2018 | Turkish Wikipedia articles | 83.30 |
| Coban et al. [ | BoW vs W2VC model | 2013 | Turkish Twitter messages in the telecom sector | 59.17 |
| Ecemiş et al. [ | Support vector machine | 2018 | Turkey-based geographical user data | 0.954 |
| Isik et al. [ | Novel stacked ensemble method for sentiment analysis | 2018 | IMDB dataset of 2000 movie comments (1000 positive and 1000 negative) | 0.791 |
| Karcioglu et al. [ | Linear SVM and logistic regression | 2019 | Random English and Turkish texts collected from Twitter | 65.62 |
| Uslu et al. [ | Logistic regression | 2019 | User reviews collected from Turkey's most preferred movie site | 77.35 |
| Kanmaz et al. [ | Decision trees, support vector machine, and Naive Bayes methods | 1996–2018 | News texts related to the stock exchange | 0.64–0.80 |
| Doğan et al. [ | LSTM recurrent neural networks | 2019 | A single mixed data pool with two categories, built from data collected from multiple social networks | 0.9194–0.9266 |
| Salur et al. [ | Random forest classification method | 2019 | Tweets collected about special tourism centers | 88.974 |
| Santur [ | Gated recurrent unit method | 2019 | Turkish e-commerce platform user reviews | 0.955 |
| Kamis et al. [ | Multiple CNNs and LSTM network | 2017 | A corpus built from three datasets used in SemEval (semantic evaluation) | 0.59 |
| Ogul et al. [ | Logistic regression classifier | 2017 | Three public SemEval (semantic evaluation) sentiment analysis datasets containing both Turkish and English texts | 79.56 |
| Rumelli et al. [ | | 2019 | The dataset is built by using e-commerce website ( | 73.8 |
| Hayran et al. [ | Support vector machine (SVM) classifier | 2017 | A Turkish text dataset labeled by emoji icon (16000 positive and 16000 negative) | 80.05 |
Figure 4. Naive Bayes model evaluation.
Figure 5. K-NN (nearest neighbor) model evaluation.
Figure 6. Support vector machine (SVM) model evaluation.
Evaluation results.
| Classifier | TP rate | FP rate | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| K-NN | 0.746 | 0.251 | 0.748 | 0.746 | 0.746 |
| SVM | 0.735 | 0.269 | 0.735 | 0.735 | 0.735 |
| NB (Bayes) | 0.654 | 0.347 | 0.654 | 0.654 | 0.654 |
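For readers unfamiliar with the columns above, the following sketch shows how these metrics are conventionally derived from the four cells of a binary confusion matrix. The counts in the example are hypothetical and are not taken from the study, and the table's figures may be averages over both classes, which this sketch does not attempt to reproduce.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard metrics for one class; assumes all denominators are nonzero."""
    tp_rate = tp / (tp + fn)        # true positive rate, identical to recall
    fp_rate = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(binary_metrics(tp=75, fp=25, fn=25, tn=75))
```

This also explains why the TP rate and Recall columns carry the same values: they are two names for the same ratio.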
Figure 7. Metric comparison of models.
Recommended combined technique.
| Integrated technique | Classification algorithm | Accuracy (%) |
|---|---|---|
| Zemberek Turkish NLP (word stemming), N-gram (2.3) | K-NN | 74.63 |
| Zemberek Turkish NLP (word stemming), N-gram (2.3) | SVM | 73.51 |
| Zemberek Turkish NLP (word stemming), N-gram (2.3) | NB (Bayes) | 65.43 |
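If "N-gram (2.3)" denotes word n-grams of sizes 2 and 3 (an assumption, since the table does not spell this out), the feature extraction step could look like the sketch below. The function name `word_ngrams` and its parameters are hypothetical, and the input is assumed to be a list of already stemmed tokens.

```python
def word_ngrams(tokens: list[str], n_min: int = 2, n_max: int = 3) -> list[str]:
    """Return all word n-grams with n_min <= n <= n_max, shortest first."""
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sequence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


if __name__ == "__main__":
    print(word_ngrams(["küresel", "ısınma", "tehdit"]))
```

Each distinct n-gram would then become one feature for the K-NN, SVM, or Naive Bayes classifier.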