Mohamad Hazim¹, Nor Badrul Anuar¹, Mohd Faizal Ab Razak¹,², Nor Aniza Abdullah¹.
Abstract
Product reviews are individuals' opinions, judgements or beliefs about a product or service provided by a company. Such reviews guide companies in planning and monitoring their business ventures, for example by increasing productivity or improving the quality of their products and services. Product reviews can also increase business profits by persuading prospective customers to buy the products they are interested in. In mobile application marketplaces such as the Google Play Store, reviews and star ratings serve as indicators of application quality. However, among these reviews, hereafter referred to as opinions, there are also spam reviews that disrupt the balance of online business. Previous studies used time series and neural network approaches, which require considerable computational power, to detect such opinion spam. However, their detection accuracy can be limited because these approaches focus only on basic, discrete, document-level features and therefore capture few statistical relationships. To improve the detection of opinion spam in the mobile application marketplace, this study proposes statistics-based features modelled with supervised boosting approaches, namely Extreme Gradient Boosting (XGBoost) and the Generalized Boosted Regression Model (GBM), and evaluates them on two datasets in different languages (English and Malay). The evaluation shows that XGBoost is the most suitable model for detecting opinion spam in the English dataset, while GBM Gaussian is the most suitable for the Malay dataset. The comparative analysis also indicates that the proposed statistics-based features achieved a detection accuracy of 87.43 per cent on the English dataset and 86.13 per cent on the Malay dataset.
Year: 2018 PMID: 29889897 PMCID: PMC5995425 DOI: 10.1371/journal.pone.0198884
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of major boosting systems [38].
| System | Exact greedy | Approximate global | Approximate local | Out-of-core | Sparsity aware | Parallel |
|---|---|---|---|---|---|---|
| XGBoost | Yes | Yes | Yes | Yes | Yes | Yes |
| R GBM | Yes | No | No | No | Partially | No |
| pGBRT | No | No | Yes | No | No | Yes |
| Spark MLlib | No | Yes | No | No | Partially | No |
| Scikit-learn | Yes | No | No | No | No | No |
| H2O | No | Yes | No | No | Partially | Yes |
Fig 1. Different flavors of GBM distributions.
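The GBM variants compared later (AdaBoost, Gaussian, Bernoulli and Poisson) differ in the loss function being minimised; each boosting iteration fits a tree to the negative gradient (pseudo-residuals) of that loss. The sketch below assumes the textbook losses for these families, evaluated on the link scale `f` with spam labels coded as `y ∈ {0, 1}`. The function name is hypothetical and this is not code from the R gbm package.

```python
import math


def negative_gradient(distribution: str, y: float, f: float) -> float:
    """Pseudo-residual -dL/df for one observation at the current prediction f."""
    if distribution == "gaussian":
        # L = (y - f)^2 / 2  ->  -dL/df = y - f
        return y - f
    if distribution == "bernoulli":
        # L = -[y*f - log(1 + e^f)] with y in {0, 1}  ->  -dL/df = y - sigmoid(f)
        return y - 1.0 / (1.0 + math.exp(-f))
    if distribution == "adaboost":
        # L = exp(-(2y - 1) * f) with y in {0, 1}  ->  -dL/df = (2y - 1) * exp(-(2y - 1) * f)
        s = 2.0 * y - 1.0
        return s * math.exp(-s * f)
    if distribution == "poisson":
        # L = e^f - y*f (log link)  ->  -dL/df = y - e^f
        return y - math.exp(f)
    raise ValueError(f"unknown distribution: {distribution}")


# Pseudo-residuals for a spam review (y = 1) at the initial prediction f = 0.
for name in ("gaussian", "bernoulli", "adaboost", "poisson"):
    print(name, negative_gradient(name, 1.0, 0.0))
```

Because only the gradient changes, the same tree-fitting loop can serve every family, which is why the comparison tables can report all four GBM variants side by side.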
Fig 2. Novel sparsity-aware split-finding algorithm [38].
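In the sparsity-aware algorithm of [38], missing feature values are never sorted or enumerated. Instead, the gradient and Hessian mass of the missing entries is sent to one side of each candidate split, and the better side becomes the learned default direction. The single-feature sketch below assumes first- and second-order gradients `grad` and `hess` are already computed. It scores splits with G_L²/(H_L+λ) + G_R²/(H_R+λ) - G²/(H+λ), with regularisation `lam`, and omits the 1/2 factor and the γ penalty. For simplicity it tries both default directions at every threshold, rather than making the two sorted passes of the original algorithm. The function name is hypothetical and this is not XGBoost's implementation.

```python
from typing import List, Optional, Tuple


def sparsity_aware_split(
    values: List[Optional[float]],
    grad: List[float],
    hess: List[float],
    lam: float = 1.0,
) -> Tuple[float, Optional[float], str]:
    """Best split for one feature: (gain, threshold, default direction for missing)."""
    G, H = sum(grad), sum(hess)
    # Gradient/Hessian mass of the missing entries (None); they are never enumerated.
    G_miss = sum(g for v, g in zip(values, grad) if v is None)
    H_miss = sum(h for v, h in zip(values, hess) if v is None)
    present = sorted((v, g, h) for v, g, h in zip(values, grad, hess) if v is not None)

    def gain(gl: float, hl: float) -> float:
        gr, hr = G - gl, H - hl
        return gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)

    best: Tuple[float, Optional[float], str] = (0.0, None, "none")
    gl = hl = 0.0  # statistics of present values at or below the current threshold
    for i in range(len(present) - 1):
        v, g, h = present[i]
        gl += g
        hl += h
        nxt = present[i + 1][0]
        if v == nxt:
            continue  # no threshold can separate equal values
        threshold = (v + nxt) / 2.0
        # Send all missing entries right, then all missing entries left.
        for direction, (cand_gl, cand_hl) in (
            ("right", (gl, hl)),
            ("left", (gl + G_miss, hl + H_miss)),
        ):
            s = gain(cand_gl, cand_hl)
            if s > best[0]:
                best = (s, threshold, direction)
    return best


print(sparsity_aware_split([1.0, 2.0, None, 3.0, 4.0],
                           [-1.0, -1.0, -1.0, 1.0, 1.0],
                           [1.0] * 5))
```

In the example, the missing entry has a negative gradient like the two low-valued entries, so the best split places it on the left with them.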
Fig 3. Research methodology workflow.
Pre-processed elements from raw data.
| Element | Description |
|---|---|
| url | URL of the app’s Google Play Store page. |
| appId | App id of the respective app. |
| title | Name of the app. |
| summary | Summary description of the app. |
| developer | Developer of the app. |
| icon | URL to the app’s icon. |
| score | Average rating of the app. |
| price | Price of the app. |
| free | True/False indicator if the app is free or paid. |
Fig 4. Snippet of raw HTTP response data.
List of elements extracted from a raw response file.
| Element | Description | Value | Translation (English) |
|---|---|---|---|
| appID | App id of the app. | com.outfit7.talkingpierrefree | com.outfit7.talkingpierrefree |
| appPrice | Price of the app. | 0.0 | 0.0 |
| appScore | Average rating of the app. | 4.2 | 4.2 |
| appTitle | Name of the app. | Talking Pierre the Parrot | Talking Pierre the Parrot |
| revAuthor | Reviewer’s name. | Maya Liya | Maya Liya |
| revDate | Date the review was submitted. | 14 Mei 2015 | 14th May 2015 |
| revRating | Rating given by the reviewer. | 5.0 | 5.0 |
| revTitle | Title of the review. | Best | Best |
| revText | The review body. | game ni sangat best | This game is so good |
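Features F4 and F5 rank each review by date, so a Malay `revDate` value such as `14 Mei 2015` has to be normalised before sorting. The paper does not describe its date-parsing step. The helper below is a hypothetical sketch that assumes Play Store dates in the Malay locale use the standard Malay month names.

```python
from datetime import date

# Hypothetical lookup: standard Malay month names, January to December.
MALAY_MONTHS = {
    "januari": 1, "februari": 2, "mac": 3, "april": 4,
    "mei": 5, "jun": 6, "julai": 7, "ogos": 8,
    "september": 9, "oktober": 10, "november": 11, "disember": 12,
}


def parse_malay_review_date(text: str) -> date:
    """Convert a date string such as '14 Mei 2015' into a datetime.date."""
    day, month_name, year = text.split()
    return date(int(year), MALAY_MONTHS[month_name.lower()], int(day))


print(parse_malay_review_date("14 Mei 2015").isoformat())  # 2015-05-14
```

Parsing into `datetime.date` objects makes sorting chronological. Sorting the raw strings would be alphabetical instead.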
Fig 5. Flow of the two-step language filtration process.
Rules for labeling reviews.
| Review rating | Review sentiment polarity | Label |
|---|---|---|
| [0, 1, 2] | Negative | Normal |
| 3 | Negative | Spam |
| [4, 5] | Negative | Spam |
| [0, 1, 2] | Neutral | Spam |
| 3 | Neutral | Normal |
| [4, 5] | Neutral | Spam |
| [0, 1, 2] | Positive | Spam |
| 3 | Positive | Spam |
| [4, 5] | Positive | Normal |
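The rules above label a review as normal only when its star rating agrees with the sentiment polarity of its text: a low rating with negative text, 3 stars with neutral text, or a high rating with positive text. Every other combination is labelled spam. A minimal sketch, assuming integer star ratings from 0 to 5 and a polarity string produced by an upstream sentiment analyser that is not shown. The function name is hypothetical.

```python
def label_review(rating: float, polarity: str) -> str:
    """Label a review 'normal' or 'spam' from its star rating and text polarity."""
    if not 0 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    # Bucket the rating as in the table: [0, 1, 2] -> low, 3 -> mid, [4, 5] -> high.
    if rating <= 2:
        bucket = "low"
    elif rating < 4:
        bucket = "mid"
    else:
        bucket = "high"
    # Only these three rating/polarity pairings are consistent, hence normal.
    consistent = {("low", "negative"), ("mid", "neutral"), ("high", "positive")}
    return "normal" if (bucket, polarity.lower()) in consistent else "spam"


print(label_review(5, "neutral"))  # spam
```

The usage line matches the samples in the next table, where neutral text paired with a 5-star rating is labelled spam.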
Sample spam reviews with neutral sentiment polarity in the private dataset.
| Review author | Review text | Translation (English) | Rating |
|---|---|---|---|
| Aku Ya | cacing ni halal ke haram bro ☺ | bro, is this worm halal or haram ☺ | 5 |
| Amirul Izzat | hmmm | hmmm | 5 |
| Nadi Sudin | 2016 | 2016 | 5 |
| Abu Zaid | pakai internet ke | are you using the internet | 5 |
| badri timalsena | Wew | Wew | 1 |
| NA | xgdjgokzfbbinovgvkbjjcbjkdvxfp | xgdjgokzfbbinovgvkbjjcbjkdvxfp | 5 |
| Ina Evaina | Aku belom coba game ni | I haven’t tried this game yet | 5 |
| Noriha Abf Ghani | Ewr | Ewr | 5 |
| Mawar Izuan | hai mawar | hey mawar | 1 |
Fig 6. Distribution of spam and normal reviews across review ratings in the private dataset.
Fig 7. Feature ranking based on variable importance (gain) scores.
List of existing and proposed features.
| Label | Features | Category | Gain Score | References |
|---|---|---|---|---|
| F6 | Average cosine similarity between review bodies. | Numerical | 0.12102 | [ |
| F26 | Sentiment polarity of review text | Categorical | 0.09480 | [ |
| F4 | Position of the review in the reviews of a product sorted by date (ascending). | Numerical | 0.07711 | [ |
| F5 | Position of the review in the reviews of a product sorted by date (descending). | Numerical | 0.07237 | [ |
| F2 | Length of review body. | Numerical | 0.05983 | [ |
| F3 | Rating of review. | Numerical | 0.05547 | [ |
| F15 | Automated Readability Index (ARI) of review body. | Numerical | 0.05102 | [ |
| F14 | Standard deviation between the average review rating and the current review rating. | Numerical | 0.08079 | Proposed |
| F7 | Average Levenshtein distance between review bodies. | Numerical | 0.06223 | Proposed |
| F13 | Average number of letters per word in review body. | Numerical | 0.05044 | Proposed |
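Several of the listed features are standard text statistics. The sketch below computes F13, F15 (ARI) and the similarity measures behind F6 and F7 in plain Python. The paper does not specify its tokenisation, so the following are assumptions. Words are whitespace-delimited. ARI uses its usual formula 4.71·(characters/words) + 0.5·(words/sentences) - 21.43, counting alphanumeric characters. Cosine similarity is computed over lowercase term-frequency vectors. The "average" in F6 and F7 is read as the mean over every other review of the same app. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def letters_per_word(text: str) -> float:
    """F13: mean number of alphabetic characters per whitespace-delimited word."""
    words = text.split()
    if not words:
        return 0.0
    return sum(sum(ch.isalpha() for ch in w) for w in words) / len(words)


def automated_readability_index(text: str) -> float:
    """F15: ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    words = text.split()
    if not words:
        return 0.0
    chars = sum(ch.isalnum() for ch in text)
    # Count runs of sentence-ending punctuation; unterminated text counts as one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of lowercase term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def average_against_others(texts: List[str], i: int,
                           metric: Callable[[str, str], float]) -> float:
    """F6/F7 (assumed reading): mean metric between review i and every other review."""
    others = [t for j, t in enumerate(texts) if j != i]
    return sum(metric(texts[i], t) for t in others) / len(others) if others else 0.0


reviews = ["game ni sangat best", "best game", "hmmm"]
print(letters_per_word(reviews[0]))
print(average_against_others(reviews, 0, levenshtein))
print(average_against_others(reviews, 0, cosine_similarity))
```

Pairwise Levenshtein distance is quadratic in text length and in the number of reviews per app, so large apps may need sampling or a faster edit-distance library.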
Fig 8. Snippet of the feature types in the R data frame for the private and public datasets.
Performance metrics used in model evaluation.
| Evaluation measure | Descriptions |
|---|---|
| Confusion matrix | Shows the actual versus predicted classification (spam or normal) of each instance. |
| Accuracy | Percentage of instances, spam or normal, that are predicted correctly. |
| Sensitivity / True Positive Rate (TPR) / Recall | Percentage of actual spam instances correctly predicted as spam. |
| False Positive Rate (FPR) | Percentage of actual normal instances incorrectly predicted as spam. |
| Specificity / True Negative Rate (TNR) | Percentage of actual normal instances correctly predicted as normal. |
| Precision | Percentage of instances predicted as spam that are actually spam. |
| F-measure | Harmonic mean of precision and recall. |
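The definitions below treat spam as the positive class, with TP, FP, TN and FN taken from the confusion matrix. They are the standard forms of the measures in the table, and each is multiplied by 100 to give a percentage. The F-measure values reported in the tables match the equally weighted form F₁. For example, a precision of 54.84% and a recall of 29.31% give 38.20%.

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Recall (TPR)} &= \frac{TP}{TP + FN} \\
\text{FPR} &= \frac{FP}{FP + TN} \\
\text{Specificity (TNR)} &= \frac{TN}{TN + FP} = 1 - \text{FPR} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$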
Evaluation of different boosting classifiers using existing features on multilingual datasets.
| Dataset | Evaluation measure (%) | XGBoost | GBM AdaBoost | GBM Gaussian | GBM Bernoulli | GBM Poisson |
|---|---|---|---|---|---|---|
| English | Accuracy | 85.45 | 85.19 | 84.92 | 85.45 | |
| | Recall | 12.93 | 21.55 | 16.37 | 5.17 | |
| | FPR | 4.37 | 1.72 | 2.19 | 2.66 | |
| | Specificity | 95.62 | 98.28 | 97.34 | 97.34 | |
| | Precision | 54.84 | 57.69 | 59.52 | 52.78 | |
| | F-measure | 21.12 | 31.65 | 25.00 | 9.83 | |
| Malay | Accuracy | 85.20 | 85.07 | 84.87 | 84.87 | |
| | Recall | 36.53 | 48.23 | 36.52 | 29.43 | |
| | FPR | 3.53 | 6.40 | 8.04 | 3.94 | |
| | Specificity | 96.47 | 93.60 | 91.95 | 96.06 | |
| | Precision | 70.55 | 63.55 | 61.87 | 68.21 | |
| | F-measure | 48.13 | 54.84 | 47.58 | 42.24 | |
Comparative evaluation with existing and proposed features on English and Malay datasets.
| Evaluation measure (%) | English: without proposed features | English: with proposed features | Malay: without proposed features | Malay: with proposed features |
|---|---|---|---|---|
| Accuracy | 85.45 | | 85.27 | |
| Recall | 29.31 | | 56.38 | |
| FPR | 4.69 | | 8.04 | |
| Specificity | 95.31 | | 91.95 | |
| Precision | 54.84 | | 61.87 | |
| F-measure | 38.20 | | 59.00 | |
Confusion matrices for the English and Malay models (A: without proposed features; B: with proposed features).
| Model | Predicted | Actual: Fake | Actual: Normal |
|---|---|---|---|
| English-A | Fake | 34 | 28 |
| | Normal | 82 | 612 |
| English-B | Fake | 51 | 30 |
| | Normal | 65 | 610 |
| Malay-A | Fake | 159 | 98 |
| | Normal | 123 | 1120 |
| Malay-B | Fake | 162 | 88 |
| | Normal | 120 | 1130 |
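As a consistency check, the evaluation measures can be recomputed directly from these counts, with spam ("Fake") as the positive class. The sketch below does this in plain Python. The dictionary and function names are hypothetical. The recomputed accuracies are 85.45% (English-A), 87.43% (English-B), 85.27% (Malay-A) and 86.13% (Malay-B). The B values match the accuracies reported in the abstract for the proposed features.

```python
from typing import Dict, Tuple

# (TP, FP, FN, TN) per model, read from the confusion matrices above:
# the predicted-Fake row gives TP and FP; the predicted-Normal row gives FN and TN.
CONFUSION: Dict[str, Tuple[int, int, int, int]] = {
    "English-A": (34, 28, 82, 612),
    "English-B": (51, 30, 65, 610),
    "Malay-A": (159, 98, 123, 1120),
    "Malay-B": (162, 88, 120, 1130),
}


def evaluation_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Return the evaluation measures, in per cent, for one confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
        "recall": 100 * recall,
        "fpr": 100 * fp / (fp + tn),
        "specificity": 100 * tn / (tn + fp),
        "precision": 100 * precision,
        "f_measure": 100 * 2 * precision * recall / (precision + recall),
    }


for model, counts in CONFUSION.items():
    scores = evaluation_measures(*counts)
    print(model, {name: round(value, 2) for name, value in scores.items()})
```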
Fig 9. Percentage of score difference between the English-A and English-B models.
Fig 10. Percentage of score difference between the Malay-A and Malay-B models.