Haochen Zou¹, Kun Xiang².
Abstract
With the development of Internet technology, short texts have gradually become the main medium through which people obtain information and communicate. Because of their brevity, short texts lower the threshold for producing and reading information, which fits the trend toward fragmented reading in today's fast-paced life. In addition, short texts often contain emojis, which make communication more immersive. However, this brevity also means that short texts carry relatively little information, which hinders the analysis of their sentiment characteristics. This paper therefore proposes a sentiment classification method based on blending emoticons with short-text content. Emoticons and short-text content are transformed into vectors, and the corresponding word vectors and emoticon vectors are concatenated, in order, into a sentence matrix. The sentence matrix is then fed into a convolutional neural network model for classification. The results indicate that the proposed method achieves higher classification accuracy than the existing methods it is compared against.
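The fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text is assumed to be already tokenized into words and emojis, `WORD_VECTORS` and `EMOJI_VECTORS` are hypothetical toy lookup tables standing in for Word2Vec and emoji embeddings, both share a hypothetical dimension `DIM = 4`, and unknown tokens and padding map to zero vectors. Each token's vector becomes one row, in the order the tokens appear, so emojis sit in the matrix next to the words they accompany.

```python
from typing import Dict, List, Sequence

Vector = List[float]
DIM = 4  # hypothetical embedding dimension shared by words and emojis

# Hypothetical toy lookup tables. A real system would load pre-trained
# Word2Vec word vectors and emoji vectors of the same dimension.
WORD_VECTORS: Dict[str, Vector] = {
    "great": [0.9, 0.1, 0.3, 0.0],
    "movie": [0.2, 0.7, 0.1, 0.4],
    "boring": [-0.8, 0.2, 0.0, 0.1],
}
EMOJI_VECTORS: Dict[str, Vector] = {
    "\U0001F602": [0.6, -0.1, 0.5, 0.2],   # face with tears of joy
    "\U0001F62D": [-0.5, 0.3, -0.4, 0.1],  # loudly crying face
}


def token_vector(token: str) -> Vector:
    """Return the emoji vector or word vector for a token; zeros if unknown."""
    if token in EMOJI_VECTORS:
        return list(EMOJI_VECTORS[token])
    if token in WORD_VECTORS:
        return list(WORD_VECTORS[token])
    return [0.0] * DIM


def sentence_matrix(tokens: Sequence[str], max_len: int) -> List[Vector]:
    """Stack token vectors in text order, then zero-pad or truncate to max_len rows."""
    rows = [token_vector(t) for t in tokens[:max_len]]
    rows.extend([0.0] * DIM for _ in range(max_len - len(rows)))
    return rows


matrix = sentence_matrix(["great", "movie", "\U0001F602"], max_len=5)
for row in matrix:
    print(row)
```

Padding every text to the same `max_len` gives each sentence matrix the same shape, which simplifies batching; zero-padding is an assumption here rather than a detail given in the abstract.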
Keywords: convolutional neural network; emoticon vectorization algorithm; sentiment analysis
Year: 2022 PMID: 35327909 PMCID: PMC8965825 DOI: 10.3390/e24030398
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The site “Emojitracker” monitors emoji usage in tweets in real time.
Emojis and their emotional tendencies.
| Emotional Tendencies | Emojis |
|---|---|
| Positive | |
| Negative | |
Vectors of four emojis.
| Emoji | Emoji Vector |
|---|---|
| | 1.253765409217565 |
| | 1.157409314569509 |
| | −2.219247091347005 |
| | −0.441708935387097 |
Figure 2. Visualization of five-dimensional emoji sentiment vectors in a two-dimensional space.
Figure 3. Structure diagram of the classification model.
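The abstract describes the classification model only as a convolutional neural network applied to the sentence matrix. As a hedged sketch, the forward pass below follows the widely used single-layer text-CNN pattern (filters spanning the full embedding width, ReLU, max-over-time pooling, a fully connected layer, and softmax). Whether the model in Figure 3 uses exactly these layers, and its filter sizes and counts, are assumptions; the function names and toy weights are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv_feature_map(sent: Matrix, filt: Matrix, bias: float) -> List[float]:
    """Slide an (h x d) filter down an (n x d) sentence matrix, applying ReLU."""
    h, d = len(filt), len(filt[0])
    feature_map = []
    for i in range(len(sent) - h + 1):  # one window per starting row
        s = bias + sum(sent[i + r][c] * filt[r][c] for r in range(h) for c in range(d))
        feature_map.append(max(0.0, s))  # ReLU activation
    return feature_map


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(
    sent: Matrix,
    filters: List[Matrix],
    biases: List[float],
    w_out: List[List[float]],
    b_out: List[float],
) -> List[float]:
    """Convolution -> max-over-time pooling -> fully connected layer -> softmax."""
    # One pooled feature per filter: the maximum of its feature map.
    pooled = [max(conv_feature_map(sent, f, b)) for f, b in zip(filters, biases)]
    # Fully connected layer: one row of weights per output class.
    logits = [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(w_out, b_out)]
    return softmax(logits)


# Toy example: 3 tokens, embedding dimension 2, two filters, two classes.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0]]]
biases = [0.0, 0.5]
probs = text_cnn_forward(sentence, filters, biases, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(probs)
```

In this toy run a height-2 filter and a height-1 filter each contribute one pooled feature, and the output layer maps the two features to probabilities for two hypothetical classes (for instance positive and negative). Each filter's height must not exceed the number of rows in the sentence matrix; otherwise its feature map is empty and `max` raises an error.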
Figure 4. Structure of the RNN.
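For reference, a standard (Elman) recurrent network, which the RNN baseline in Figure 4 presumably follows, updates its hidden state as shown below. The notation is assumed rather than taken from the paper: $x_t$ is the vector of the $t$-th token (word or emoji), $h_t$ is the hidden state, and the final state $h_T$ is passed to a softmax classifier.

$$
h_t = \tanh\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \qquad
\hat{y} = \operatorname{softmax}\left(W_{hy} h_T + b_y\right)
$$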
Figure 5. Structure diagram of an LSTM cell.
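The cell in Figure 5 is presumably the standard LSTM cell. As a hedged reference in assumed notation, its gates and state updates are given below, where $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and $[h_{t-1}, x_t]$ denotes concatenation of the previous hidden state with the current input.

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The forget gate $f_t$ controls how much of the previous cell state is kept, the input gate $i_t$ controls how much of the candidate $\tilde{c}_t$ is written, and the output gate $o_t$ controls how much of the cell state is exposed as $h_t$.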
Experimental results on the positive-sentiment corpus.
| Analysis Method | Correctly Identified | Accuracy |
|---|---|---|
| Naïve Bayes | 1447 | 72.35% |
| CNN (Word2Vec) | 1632 | 81.60% |
| CNN (Emoji2Word, Word2Vec) | 1649 | 82.45% |
| CNN (Emoji2Vec, Word2Vec) | 1704 | 85.15% |
| LSTM | 1660 | 83.00% |
| RNN | 1651 | 82.55% |
| SVM | 1558 | 77.90% |
Experimental results on the negative-sentiment corpus.
| Analysis Method | Correctly Identified | Accuracy |
|---|---|---|
| Naïve Bayes | 1351 | 67.55% |
| CNN (Word2Vec) | 1473 | 73.65% |
| CNN (Emoji2Word, Word2Vec) | 1385 | 69.25% |
| CNN (Emoji2Vec, Word2Vec) | 1596 | 79.80% |
| LSTM | 1479 | 73.95% |
| RNN | 1470 | 73.50% |
| SVM | 1402 | 70.10% |
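Both accuracy columns are consistent with accuracy = correctly identified ÷ 2000, that is, a test corpus of 2000 short texts per polarity; for example, 1447 / 2000 = 72.35% and 1351 / 2000 = 67.55%. The corpus size is inferred from these ratios rather than stated in this excerpt, so it is an assumption here. Every row matches except CNN (Emoji2Vec, Word2Vec) on the positive corpus, where 1704 / 2000 gives 85.20% rather than the listed 85.15%. The sketch below recomputes the percentages with integer arithmetic, so no floating-point rounding creeps in.

```python
from typing import Dict

CORPUS_SIZE = 2000  # assumption: inferred from the tables, not stated in this excerpt

# Counts copied from the second column of the two tables above.
POSITIVE: Dict[str, int] = {
    "Naïve Bayes": 1447,
    "CNN (Word2Vec)": 1632,
    "CNN (Emoji2Word, Word2Vec)": 1649,
    "CNN (Emoji2Vec, Word2Vec)": 1704,
    "LSTM": 1660,
    "RNN": 1651,
    "SVM": 1558,
}
NEGATIVE: Dict[str, int] = {
    "Naïve Bayes": 1351,
    "CNN (Word2Vec)": 1473,
    "CNN (Emoji2Word, Word2Vec)": 1385,
    "CNN (Emoji2Vec, Word2Vec)": 1596,
    "LSTM": 1479,
    "RNN": 1470,
    "SVM": 1402,
}


def accuracy_percent(correct: int, total: int = CORPUS_SIZE) -> str:
    """Format correct/total as a percentage with two decimals, rounding half up."""
    hundredths = (correct * 20000 + total) // (2 * total)  # percent x 100, rounded
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


for method in POSITIVE:
    print(f"{method}: positive {accuracy_percent(POSITIVE[method])}, "
          f"negative {accuracy_percent(NEGATIVE[method])}")
```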
Comparison of sentiment classification results.
| Analysis Method | | | | | | | |
|---|---|---|---|---|---|---|---|
| Naïve Bayes | 79.15% | 82.58% | 82.37% | 78.66% | 80.27% | 79.36% | 80.75% |
| CNN (Word2Vec) | 86.38% | 87.75% | 87.20% | 85.52% | 87.05% | 86.97% | 86.91% |
| CNN (Emoji2Word, Word2Vec) | 85.60% | 89.95% | 87.42% | 85.10% | 87.83% | 86.65% | 87.60% |
| CNN (Emoji2Vec, Word2Vec) | 89.23% | 91.60% | 92.16% | 88.33% | 90.59% | 89.70% | 90.35% |
| LSTM | 86.55% | 87.10% | 87.05% | 88.42% | 86.98% | 88.15% | 87.20% |
| RNN | 84.92% | 86.03% | 87.13% | 84.88% | 86.04% | 84.79% | 85.33% |
| SVM | 83.78% | 85.35% | 85.41% | 83.88% | 84.79% | 84.32% | 84.66% |