Ahmed Al-Saffar, Suryanti Awang, Hai Tao, Nazlia Omar, Wafaa Al-Saiagh, Mohammed Al-Bared.
Abstract
Sentiment analysis techniques are increasingly used to assign opinion text to one or more predefined sentiment classes, supporting the creation and automated maintenance of review-aggregation websites. This paper proposes a Malay sentiment classification model that combines semantic orientation and machine learning approaches to improve classification performance. First, a total of 2,478 Malay sentiment-lexicon words and phrases are each assigned a synonym and stored with the help of more than one native Malay speaker, and each is manually allotted a polarity score. Second, supervised machine learning approaches and the lexicon knowledge method are combined for Malay sentiment classification, and thirteen features are evaluated. Finally, three individual classifiers and a combined classifier are used to evaluate classification accuracy. A wide range of comparative experiments on a Malay Reviews Corpus (MRC) demonstrates that the feature extraction improves the performance of Malay sentiment analysis under the combined classification. However, the results depend on three factors: the features, the number of features, and the classification approach.
Year: 2018 PMID: 29684036 PMCID: PMC5912726 DOI: 10.1371/journal.pone.0194852
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features extracted for each review.
| Feature set | Feature | Description |
|---|---|---|
| Sentiment words presence-level features | F1 | Presence of positive words |
| Sentiment words presence-level features | F2 | Presence of negative words |
| Sentiment words presence-level features | F3 | Presence of positive words in proportion to the presence of negative words |
| Sentiment words presence-level features | F4 | Frequency of positive words in proportion to the frequency of negative words |
| Sentence-level features | F5 | Cumulative frequency of positive words in the first three sentences |
| Sentence-level features | F6 | Cumulative frequency of negative words in the first three sentences |
| Sentence-level features | F7 | Presence of first positive words synonyms in the first three sentences |
| Sentence-level features | F8 | Presence of first negative words synonyms in the first three sentences |
| Sentiment words polarity level features | F9 | Weighted probabilities of a positive review |
| Sentiment words polarity level features | F10 | Weighted probabilities of a negative review |
| Subjective words conditional probability features | F11 | Average conditional probability of positive subjective words |
| Subjective words conditional probability features | F12 | Average conditional probability of negative subjective words |
| Subjective words conditional probability features | F13 | Standard deviation of the conditional probability of the subjective words |
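As a rough illustration of the presence-level and sentence-level features, the following sketch computes F1, F2, F4, F5, and F6 for a single review. Everything beyond the feature descriptions is an assumption made for the example: the function names `tokenize` and `extract_features`, the tiny word sets (borrowed from the paper's sample lexicon), the regex tokenizer, splitting sentences on full stops, and the add-one smoothing used to keep the F4 ratio defined when no negative word occurs. The table does not specify any of these details.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"hebat", "bagus"}
NEGATIVE = {"buruk", "jahat"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(review):
    """Return an illustrative subset of the thirteen features for one review."""
    tokens = tokenize(review)
    pos_freq = sum(t in POSITIVE for t in tokens)
    neg_freq = sum(t in NEGATIVE for t in tokens)

    # First three sentences, splitting on full stops (assumed rule).
    sentences = [s for s in review.split(".") if s.strip()][:3]
    head_tokens = [t for s in sentences for t in tokenize(s)]

    return {
        "F1": int(pos_freq > 0),                   # presence of positive words
        "F2": int(neg_freq > 0),                   # presence of negative words
        "F4": (pos_freq + 1) / (neg_freq + 1),     # smoothed frequency ratio (assumption)
        "F5": sum(t in POSITIVE for t in head_tokens),
        "F6": sum(t in NEGATIVE for t in head_tokens),
    }


features = extract_features("Filem ini hebat. Lakonan bagus. Jalan cerita buruk. Hebat!")
print(features)
```

The fourth sentence contributes to F4 but not to F5 or F6, which is the distinction the sentence-level features are meant to capture.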
Fig 1. The model architecture.
Sample of Malay sentiment lexicon.
| Word in English | Malay word | Malay synonym | Polarity |
|---|---|---|---|
| Like | Seperti | Bagai | 2 |
| Superb | Hebat | Bagus | 5 |
| Bad | Buruk | Jahat | -3 |
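A lexicon of this shape can drive a simple semantic-orientation score. The sketch below is only an illustration under stated assumptions: it uses the three sample rows above, assumes a synonym carries the same polarity as its headword, and labels a review by the sign of the summed score. The names `LEXICON`, `POLARITY`, `review_score`, and `classify` are hypothetical, and the decision rule is not taken from the paper.

```python
# Sample rows from the lexicon table: Malay word -> (synonym, polarity).
LEXICON = {
    "seperti": ("bagai", 2),
    "hebat": ("bagus", 5),
    "buruk": ("jahat", -3),
}

# Assumption: a synonym shares the polarity score of its headword.
POLARITY = {}
for word, (synonym, score) in LEXICON.items():
    POLARITY[word] = score
    POLARITY[synonym] = score


def review_score(tokens):
    """Sum the polarity of every token found in the lexicon."""
    return sum(POLARITY.get(t.lower(), 0) for t in tokens)


def classify(tokens):
    """Label a review by the sign of its summed score (assumed rule)."""
    score = review_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tokens = "Makanan bagus tetapi layanan buruk".split()
print(review_score(tokens), classify(tokens))
```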
Fig 2. One sample of the reviews.
Sample of the Malay stop words.
| Malay stop words | English translation |
|---|---|
| Adalah | Be |
| Itu | It |
| Selepas | After |
| Mereka | They |
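Stop-word filtering with such a list is typically a set lookup over the tokens. The sketch below uses only the four sample stop words shown above; the function name `remove_stop_words` and the whitespace tokenization are assumptions for illustration.

```python
# The four sample stop words from the table, lowercased for matching.
STOP_WORDS = {"adalah", "itu", "selepas", "mereka"}


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "Filem itu adalah hebat".split()
filtered = remove_stop_words(tokens)
print(filtered)
```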
Fig 3. Illustration of the DBN training process.
Performance of the NB, SVM, and DBN classifiers.
| Feature type | Classifier | F-measure (%) |
|---|---|---|
| Unigram | NB | 75.20 |
| Unigram | SVM | 76.05 |
| Unigram | DBN | 80.02 |
| Unigram | Classifier Combination | 80.90 |
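For reference, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below assumes the balanced form (F1) and reports it as a percentage, as in the table; the record does not state which variant or averaging scheme was used, and the counts in the example are hypothetical, not taken from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) in percent from raw prediction counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f_measure(tp=80, fp=20, fn=20)
print(round(score, 2))
```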
F-measure (%) for the NB, SVM, DBN, and combination (Comb.) classifiers.
| No. | Features selected (of F1–F13) | NB | SVM | DBN | Comb. |
|---|---|---|---|---|---|
| 1 | 4 | 86.29 | 91.50 | 91.80 | 92.44 |
| 2 | 4 | 88.81 | 88.88 | 90.00 | 90.82 |
| 3 | 6 | 86.23 | 88.81 | 90.20 | 91.78 |
| 4 | 3 | 84.32 | 91.55 | 93.01 | 93.60 |
| 5 | 8 | 88.70 | 90.36 | 91.45 | 92.60 |
| 6 | 6 | 87.54 | 90.16 | 91.28 | 92.28 |
| 7 | 5 | 88.65 | 90.24 | 91.20 | 91.86 |
| 8 | 10 | 88.88 | 92.17 | 92.24 | 92.73 |
| 9 | 9 | 88.35 | 91.52 | 93.00 | 93.54 |
| 10 | 4 | 86.74 | 91.22 | 91.88 | 92.22 |
| 11 | 4 | 85.57 | 90.94 | 91.55 | 92.00 |
| 12 | 4 | 86.08 | 91.10 | 91.45 | 92.72 |
| 13 | 4 | 86.62 | 90.31 | 91.85 | 92.54 |
| 14 | 8 | 87.24 | 90.10 | 93.08 | 93.66 |
| 15 | 8 | 86.89 | 91.13 | 92.10 | 92.76 |
| 16 | 9 | 87.44 | 91.20 | 93.26 | 93.40 |
| 17 | 10 | 87.24 | 91.15 | 93.20 | 93.64 |
| 18 | 2 | 82.66 | 90.60 | 92.88 | 92.94 |
| 19 | 2 | 82.87 | 90.16 | 91.24 | 92.19 |
| 20 | 2 | 83.30 | 90.54 | 91.90 | 92.60 |
| 21 | 2 | 82.87 | 89.79 | 91.20 | 92.16 |
| 22 | 2 | 82.76 | 90.84 | 91.90 | 92.22 |
| 23 | 2 | 82.68 | 90.45 | 91.42 | 92.80 |
| 24 | 2 | 86.48 | 89.79 | 91.30 | 92.16 |
| 25 | 3 | 84.90 | 90.22 | 93.56 | 92.35 |
| 26 | 3 | 85.14 | 90.22 | 93.10 | 93.86 |
| 27 | 3 | 85.63 | 90.60 | 91.80 | 92.55 |
| 28 | 3 | 87.74 | 90.01 | 91.40 | 92.98 |
| 29 | 3 | 87.25 | 90.38 | 90.60 | 92.24 |
| 30 | 4 | 84.06 | 90.38 | 90.45 | 92.11 |
| 31 | 4 | 84.80 | 90.03 | 91.80 | 92.07 |
| 32 | 4 | 87.20 | 90.11 | 93.40 | 93.86 |
| 33 | 4 | 85.00 | 90.33 | 91.80 | 92.78 |
| 34 | 2 | 83.88 | 90.22 | 91.54 | 92.23 |
| 35 | 2 | 87.75 | 90.44 | 93.56 | 93.88 |
| 36 | 2 | 83.16 | 89.24 | 90.77 | 92.73 |
| 37 | 2 | 83.60 | 90.86 | 91.22 | 92.48 |
| 38 | 2 | 83.60 | 90.38 | 91.00 | 92.60 |
| 39 | 2 | 84.05 | 89.93 | 91.24 | 92.45 |
| 40 | 2 | 84.10 | 90.32 | 91.25 | 92.34 |
| 41 | 2 | 84.34 | 89.90 | 90.60 | 92.15 |
| 42 | 2 | 83.99 | 90.71 | 91.89 | 92.87 |
| 43 | 2 | 84.88 | 90.49 | 91.32 | 92.48 |
| 44 | 2 | 84.05 | 90.84 | 91.90 | 92.82 |
| 45 | 2 | 81.62 | 89.98 | 91.47 | 92.97 |
| 46 | 3 | 84.92 | 89.98 | 91.02 | 92.80 |
| 47 | 3 | 86.55 | 90.14 | 90.78 | 92.16 |
| 48 | 3 | 85.02 | 89.69 | 91.26 | 92.56 |
| 49 | 3 | 84.80 | 90.08 | 93.20 | 93.77 |
| 50 | 3 | 87.56 | 89.66 | 91.90 | 92.98 |
| 51 | 3 | 84.78 | 90.48 | 91.47 | 92.76 |
| 52 | 3 | 88.53 | 90.26 | 91.22 | 92.84 |
| 53 | 3 | 87.55 | 90.62 | 91.18 | 92.46 |
| 54 | 13 | 89.52 | 92.74 | 94.10 | 94.48 |
Fig 4. Highest results of the DBN, NB, SVM, and combined classifiers with the 13 sentiment features and with unigram features.
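This record does not say how the outputs of the three classifiers are merged into the combined classifier. One common option is majority voting, sketched below purely as an assumption; the function name `majority_vote`, the label strings, and the per-review predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    With three voters and two classes there is always a strict majority.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-review predictions from NB, SVM, and DBN.
nb = ["pos", "neg", "pos", "neg"]
svm = ["pos", "pos", "neg", "neg"]
dbn = ["neg", "pos", "pos", "neg"]

combined = [majority_vote(votes) for votes in zip(nb, svm, dbn)]
print(combined)
```

A vote like this can correct an individual classifier only when the other two agree against it, which is one plausible reason a combination might edge out its best member.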