Bandeh Ali Talpur, Declan O'Sullivan.
Abstract
With the widespread use and popularity of online social networks, social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. In this study, we propose a cyberbullying detection framework that generates features from Twitter content using a pointwise mutual information (PMI) technique. Based on these features, we developed a supervised machine learning solution for cyberbullying detection and multi-class categorization of its severity on Twitter. We applied Embedding, Sentiment, and Lexicon features along with PMI-semantic orientation. The extracted features were used with Naïve Bayes, KNN, Decision Tree, Random Forest, and Support Vector Machine algorithms. Results from experiments with the proposed framework are promising in the multi-class setting with respect to Kappa, classifier accuracy, and F-measure, as well as in the binary setting. These results indicate that the proposed framework provides a feasible solution for detecting cyberbullying behavior and its severity in online social networks. Finally, we compared the results of the proposed and baseline features with other machine learning algorithms. The comparison indicates the significance of the proposed features in cyberbullying detection.
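
The abstract names PMI-semantic orientation as a feature but does not give its formula. The sketch below shows one common formulation, in which a token's score is its PMI with the bullying class minus its PMI with the non-bullying class, estimated from document frequencies. The function name `pmi_semantic_orientation`, the add-one smoothing, and the toy tweets are assumptions made for illustration and are not taken from the study.

```python
import math
from collections import Counter


def pmi_semantic_orientation(docs, labels, smoothing=1.0):
    """Return {token: score}; positive scores lean toward label 1.

    docs   : list of token lists (hypothetical, pre-tokenized tweets)
    labels : list of 0/1 ints, 1 = cyberbullying (assumed encoding)
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_counts if label == 1 else neg_counts
        target.update(set(tokens))  # document frequency, not raw term count

    scores = {}
    for token in set(pos_counts) | set(neg_counts):
        c_pos = pos_counts[token] + smoothing  # smoothing avoids log(0)
        c_neg = neg_counts[token] + smoothing
        # PMI(w, pos) - PMI(w, neg): the P(w) term cancels, leaving
        # log2((c_pos / n_pos) / (c_neg / n_neg)).
        scores[token] = math.log2((c_pos * n_neg) / (c_neg * n_pos))
    return scores


# Toy data, invented for this example only.
toy_docs = [
    ["you", "are", "stupid"],
    ["nice", "day"],
    ["stupid", "idiot"],
    ["have", "a", "nice", "day"],
]
toy_labels = [1, 0, 1, 0]
toy_scores = pmi_semantic_orientation(toy_docs, toy_labels)
```

In this toy corpus, "stupid" receives a positive score and "nice" a negative one. A per-tweet feature could then be formed, for example, by summing or averaging token scores, which is again a design choice rather than something the abstract specifies.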
Year: 2020 PMID: 33108392 PMCID: PMC7591033 DOI: 10.1371/journal.pone.0240924
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Proposed framework.
Annotated tweets by category.
| Category | No | Yes | Annotated Tweets |
|---|---|---|---|
| Sexual | 3616 | 229 | 3845 |
| Racial | 4273 | 700 | 4973 |
| Intelligence | 4049 | 810 | 4859 |
| Appearance | 4146 | 676 | 4822 |
| Political | 4961 | 698 | 5659 |
| Total | 21045 | 3113 | 24158 |
Cyberbullying tweets categorised per severity level.
| Category | Annotated Tweets |
|---|---|
| High | 905 |
| Medium | 1398 |
| Low | 810 |
| Non-Cyberbullying | 21045 |
| Combined Total | 24158 |
Dataset distribution by cyberbullying class.
| Classification | Class Distribution |
|---|---|
| High | 4% |
| Medium | 6% |
| Low | 3% |
| Non-Cyberbullying | 87% |
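
The percentages in this table follow directly from the severity counts reported earlier. A short check, using only those counts (the variable names are illustrative):

```python
# Severity counts as reported in the severity-level table.
counts = {
    "High": 905,
    "Medium": 1398,
    "Low": 810,
    "Non-Cyberbullying": 21045,
}

total = sum(counts.values())

# Share of each class, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded shares match the table and make the imbalance explicit: the three cyberbullying classes together account for about 13% of the annotated tweets.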
Classifier performance under various settings in multi-class classification.
| Cases | Classifier | Accuracy (%) | Kappa Statistic | F-Measure |
|---|---|---|---|---|
| Base Classifier | Naïve Bayes | 75.524 | 0.302 | 0.791 |
| | KNN | 86.692 | 0.416 | 0.864 |
| | Decision Tree (J48) | 89.714 | 0.475 | 0.886 |
| | Random Forest | 86.576 | 0.417 | 0.864 |
| | SVM | | | |
| Base Classifier with SMOTE | Naïve Bayes | 76.910 | 0.276 | 0.794 |
| | KNN | 86.679 | 0.415 | 0.864 |
| | Decision Tree (J48) | 89.731 | 0.479 | 0.887 |
| | Random Forest | | | |
| | SVM | 89.747 | 0.475 | 0.886 |
| Base Classifier with all proposed features | Naïve Bayes | 67.214 | 0.397 | 0.744 |
| | KNN | 87.878 | 0.658 | 0.879 |
| | Decision Tree (J48) | 88.363 | 0.647 | 0.883 |
| | Random Forest | | | |
| | SVM | 90.328 | 0.663 | 0.883 |
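
Kappa is reported next to accuracy because about 87% of the tweets are non-cyberbullying, so a classifier that mostly predicts the majority class can still score a high accuracy. Below is a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name `cohen_kappa` and both example matrices are illustrative assumptions, not values from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j] = number of items of true class i predicted as class j.
    Undefined (division by zero) when chance agreement equals 1.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class example: always predicting the majority class.
majority_only = [[90, 0],
                 [10, 0]]
# Hypothetical perfect classifier on the same data.
perfect = [[90, 0],
           [0, 10]]

kappa_majority = cohen_kappa(majority_only)  # accuracy is 0.90 here
kappa_perfect = cohen_kappa(perfect)
```

The majority-only matrix has 90% accuracy but a kappa of zero, which is why the gain in kappa with the proposed features is more informative than the smaller change in accuracy.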
Classifier true positive and false positive rates in multi-class classification.
| Classifier | True Positive Rate | False Positive Rate |
|---|---|---|
| Naïve Bayes | 0.672 | 0.101 |
| KNN | 0.879 | 0.224 |
| Decision Tree (J48) | 0.884 | 0.211 |
| Random Forest | | |
| SVM | 0.903 | 0.341 |
Classifier performance under various settings in binary classification.
| Cases | Classifier | TP Rate | FP Rate | Precision | Recall | F-Measure | AUC |
|---|---|---|---|---|---|---|---|
| Base Classifier | Naïve Bayes | 0.802 | 0.367 | 0.858 | 0.802 | 0.823 | 0.814 |
| | KNN | 0.873 | 0.468 | 0.869 | 0.873 | 0.871 | 0.738 |
| | Decision Tree (J48) | 0.901 | 0.491 | 0.89 | 0.901 | 0.891 | 0.777 |
| | Random Forest | 0.907 | 0.509 | 0.898 | 0.907 | 0.896 | 0.894 |
| | SVM | 0.896 | 0.491 | 0.885 | 0.896 | 0.888 | 0.703 |
| Base Classifier with all proposed features | Naïve Bayes | 0.881 | 0.201 | 0.901 | 0.881 | 0.875 | 0.858 |
| | KNN | 0.909 | 0.104 | 0.909 | 0.909 | 0.909 | 0.933 |
| | Decision Tree (J48) | 0.916 | 0.095 | 0.916 | 0.916 | 0.916 | 0.905 |
| | Random Forest | 0.931 | 0.108 | 0.933 | 0.931 | | |
| | SVM | 0.933 | 0.094 | 0.933 | 0.933 | 0.932 | 0.92 |
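
For reference, the threshold-based columns in this table can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the example are hypothetical, and how the study averaged these metrics across classes is not stated in this record.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics for the positive class of a binary classifier."""
    tp_rate = tp / (tp + fn)      # true positive rate, identical to recall
    fp_rate = fp / (fp + tn)      # false positive rate
    precision = tp / (tp + fp)
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=80, fp=20, fn=20, tn=880)
```

Because recall and TP rate are the same quantity, the two columns are identical in every row. AUC is not included in the sketch since it requires ranked scores rather than a single confusion matrix.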