Shuaitong Liu1, Xiaojun Li2, Changhua Hu1, Junping Yao1, Xiaoxia Han1, Jie Wang1.
Abstract
Spammer detection is essentially a process of judging the authenticity of users, and thus can be regarded as a classification problem. In order to improve the classification performance, multi-classifier information fusion is usually used to realize the automatic detection of spammers by utilizing the information from multiple classifiers. However, the existing fusion strategies do not reasonably take the uncertainty from the results of different classifiers (views) into account, and the relative importance and reliability of each classifier are not strictly distinguished. Therefore, in order to detect spammers effectively, this paper develops a novel multi-classifier information fusion model based on the evidential reasoning (ER) rule. Firstly, according to the user's characterization strategy, the base classifiers are constructed through the profile-based, content-based and behavior-based. Then, the idea of multi-classifier fusion is combined with the ER rule, and the results of base classifiers are aggregated by considering their weights and reliabilities. Extensive experimental results on the real-world dataset verify the effectiveness of the proposed model.Entities:
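The ER-rule aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, assuming each base classifier outputs a normalized belief distribution over singleton classes only (no compound hypotheses); the function and variable names are illustrative, not taken from the paper.

```python
def er_rule_fuse(beliefs, weights, reliabilities):
    """Fuse classifier outputs with the evidential reasoning (ER) rule.

    beliefs       -- list of dicts {class: belief degree}, each summing to 1
    weights       -- importance w_i >= 0 of each classifier
    reliabilities -- reliability r_i in [0, 1] of each classifier
    """
    P = object()  # sentinel for the residual support P(Theta)

    def discounted(p, w, r):
        # Crude masses of one piece of evidence, normalized so they sum to 1:
        # m(theta) = c * w * p(theta),  m(P) = c * (1 - r),  c = 1/(1 + w - r)
        c = 1.0 / (1.0 + w - r)
        m = {cls: c * w * b for cls, b in p.items()}
        m[P] = c * (1.0 - r)
        return m

    fused = discounted(beliefs[0], weights[0], reliabilities[0])
    for p, w, r in zip(beliefs[1:], weights[1:], reliabilities[1:]):
        m = discounted(p, w, r)
        raw = {}
        for cls in p:
            # Singleton intersections plus combinations with the neutral
            # residual P, which merges with any proposition.
            raw[cls] = fused[cls] * m[cls] + fused[cls] * m[P] + fused[P] * m[cls]
        raw[P] = fused[P] * m[P]
        # Conflict between different classes is discarded by renormalization.
        total = sum(raw.values())
        fused = {k: v / total for k, v in raw.items()}

    # Redistribute the residual support over the classes.
    residual = fused.pop(P)
    return {cls: v / (1.0 - residual) for cls, v in fused.items()}
```

With all weights and reliabilities set to 1, the residual masses vanish and the combination reduces to a Bayesian product of the classifier outputs, which is the expected degenerate case of the ER rule.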
Mesh:
Year: 2022 PMID: 35864136 PMCID: PMC9304364 DOI: 10.1038/s41598-022-16576-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. The structure of the proposed model.
Figure 2. The implementation procedure of the proposed model.
Figure 3. Development of a framework for feature grouping.
Figure 4. SVM classification results based on linear and polynomial kernel functions.
Parameter optimization results and accuracy of the base classifiers.
| Base classifier | SVM 1 | SVM 2 | SVM 3 |
|---|---|---|---|
| Parameters | (1100, 0.8) | (1100, 5) | (1000, 0.9) |
| Accuracy | 83.92% | 84.43% | 83.13% |
Figure 5. Belief degrees from the classifier based on behavior features.
Parameter settings of the ensemble learning methods.
| Methods | Parameters |
|---|---|
| Bagging | Number of ensembles: 10; Base classifier: SVM; Number of max samples: 1.0; Number of max features: 1.0; Bootstrap: True |
| AdaBoost | Number of ensembles: 10; Base classifier: SVM; Algorithm: ‘SAMME.R’; Learning rate: 1.0 |
| Random forest | Number of ensembles: 100; Criterion: Gini; Max depth: None; Min samples split: 2; Min samples leaf: 1; Bootstrap: True |
| XGBoost | Number of ensembles: 100; Max depth: 6; Gamma: 2; Min child weight: 1; Learning rate: 0.3; Subsample: 1 |
Figure 6. Comparison of classification accuracy across all methods.
Comparison of classification performance of various methods.
| Methods | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| SV | 84.48% | 88.32% | 81.76% | 0.8493 |
| WSV | 85.12% | | 81.58% | 0.8582 |
| DS | 85.45% | 89.05% | 82.99% | 0.8589 |
| ERA | 86.18% | 87.22% | 84.67% | 0.8593 |
| Bagging | 85.82% | 85.40% | 86.03% | 0.8571 |
| AdaBoost | 87.27% | 89.05% | 85.92% | 0.8737 |
| Random forest | 86.55% | 84.67% | 87.88% | 0.8625 |
| GNB | 84.36% | 85.07% | 83.21% | 0.8413 |
| XGBoost | 86.91% | 85.82% | 87.12% | 0.8647 |
| MICFA | 87.63% | 87.05% | 88.32% | 0.8768 |
| SDMER | 87.59% | | | |
Significant values are in bold.
Figure 7. Comparison of classification accuracy between the single SVM model and the SDMER model.