Abstract
In this paper, we propose an emotion-separated method (SeTF·IDF) that assigns different values to the emotion labels of sentences. In the visualization of the multi-label Chinese emotional corpus Ren_CECps, it gives a better visual effect than the values produced by TF·IDF. Inspired by the large improvement in the visualization map brought about by the changed distances among sentences, we are the first group to use the Word Mover's Distance (WMD) algorithm as a feature representation for Chinese text emotion classification. Our experiments on Ren_CECps show that, with both an 80%/20% and a 50%/50% training/testing split, WMD features achieve the best F1 scores and a larger improvement than feature vectors of the same dimension obtained by dimension-reduced TF·IDF. Comparative experiments on an English corpus also show the effectiveness of WMD features across languages.
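For orientation, the TF·IDF baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard formulation (term frequency times log inverse document frequency), not the authors' code; the SeTF·IDF variant is not specified in this record and is not reproduced here. The function name `tfidf` and the toy sentences are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy, already-tokenized sentences (hypothetical data, not from Ren_CECps).
docs = [
    ["happy", "day", "sunshine"],
    ["sad", "day", "rain"],
    ["happy", "happy", "smile"],
]
weights = tfidf(docs)
```

Terms that occur in fewer documents receive a larger weight, so in the first toy sentence "sunshine" outweighs "day".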
Year: 2018 PMID: 29624573 PMCID: PMC5889067 DOI: 10.1371/journal.pone.0194136
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The number of multi-label sentences in Ren_CECps.
| Number of labels | total | one | two | three | four | five | six |
|---|---|---|---|---|---|---|---|
| Number of sentences | 36525 | 22751 | 11731 | 1847 | 175 | 15 | 6 |
| Percentage (%) | 100 | 62.2888 | 32.1177 | 5.0568 | 0.4791 | 0.0411 | 0.0164 |
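The percentage row follows from the counts by dividing each count by the total number of sentences. A minimal sketch of that arithmetic, using the counts from the table (the variable names are illustrative):

```python
# Sentence counts per number of emotion labels, taken from the table.
counts = {"one": 22751, "two": 11731, "three": 1847,
          "four": 175, "five": 15, "six": 6}

total = sum(counts.values())

# Share of each group as a percentage of all sentences.
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.4f}%")
```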
Fig 1. Visualization of Ren_CECps with traditional TF·IDF (left) and SeTF·IDF (right).
Comparison of the time consumed by WMD and fast WMD.
| Groups | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| WMD, time per 10 runs (s) | 632.545 | 646.700 | 646.237 |
| fast WMD, time per 10 runs (s) | 0.047 | 0.042 | 0.031 |
| rate | | | |
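This record does not say how fast WMD is computed. As background on why an approximation can be much cheaper than exact WMD, the sketch below implements the relaxed lower bound known from the WMD literature, in which every word sends all of its mass to its nearest word in the other sentence, so no transport problem has to be solved. It is an illustration under stated assumptions, not the authors' method: the 2-D vectors in `EMBEDDINGS` are made up, and real use would load trained word embeddings.

```python
import math
from collections import Counter

# Toy 2-D word vectors (hypothetical; stand-ins for trained embeddings).
EMBEDDINGS = {
    "happy": (1.0, 0.0),
    "glad": (0.9, 0.1),
    "sad": (-1.0, 0.0),
    "day": (0.0, 1.0),
    "morning": (0.1, 0.9),
}

def nbow(tokens):
    """Normalized bag-of-words: each word's share of the sentence mass."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def one_sided_cost(src, dst):
    """Move each source word's mass entirely to its nearest target word."""
    return sum(
        mass * min(math.dist(EMBEDDINGS[w], EMBEDDINGS[v]) for v in dst)
        for w, mass in src.items()
    )

def relaxed_wmd(tokens_a, tokens_b):
    """Lower bound on WMD: the larger of the two one-sided relaxations."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided_cost(a, b), one_sided_cost(b, a))

near = relaxed_wmd(["happy", "day"], ["glad", "morning"])
far = relaxed_wmd(["happy", "day"], ["sad", "morning"])
```

With these toy vectors, the sentence pair that differs only by near-synonyms gets a smaller distance than the pair containing an antonym, which is the behaviour a distance-based feature relies on.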
The results of experiments on Ren_CECps and 20 newsgroup.
| Type | Algorithm | Precision | Recall | F1-score |
|---|---|---|---|---|
| | TF·IDF | 0.210819957 | 0.197094468 | 0.203726295 |
| | SeTF·IDF | 0.355171204 | 0.236033564 | 0.283598272 |
| | TF·IDF1800 | 0.116894026 | 0.115778614 | |
| | WMD | 0.358587638 | 0.273787523 | |
| | Seed_TF·IDF | 0.284783174 | 0.227757698 | 0.253098099 |
| | TF·IDF_word2vec | 0.218511086 | 0.223298753 | 0.220878979 |
| | sent2vec | 0.209975675 | 0.153755042 | 0.177520441 |
| | TF·IDF | 0.203162567 | 0.190372868 | 0.196559888 |
| | SeTF·IDF | 0.361914601 | 0.246556037 | 0.293300035 |
| | TF·IDF1800 | 0.117098454 | 0.113943072 | |
| | WMD | 0.338477937 | 0.300256706 | |
| | Seed_TF·IDF | 0.29824698 | 0.233749726 | 0.262088652 |
| | TF·IDF_word2vec | 0.238826227 | 0.231257587 | 0.234980977 |
| | sent2vec | 0.223046830 | 0.154791461 | 0.182754080 |
| | TF·IDF | 0.688655922 | 0.680110185 | |
| | TF·IDF2000 | 0.078399628 | 0.07408649 | |
| | WMD | 0.701975748 | 0.598521007 | 0.646133456 |
| | Seed_TF·IDF | 0.654635361 | 0.647471893 | 0.651033922 |
| | TF·IDF_word2vec | 0.582910280 | 0.581137014 | 0.582022296 |
| | sent2vec | 0.246903835 | 0.264207205 | 0.255262622 |
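In the rows that list all three values, the F1-score is consistent with the harmonic mean of precision and recall. The sketch below checks that relationship on two rows copied from the table; the helper name `f1_score` is an assumption made for the example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for two rows copied from the table.
rows = {
    "TF-IDF (first row)": (0.210819957, 0.197094468, 0.203726295),
    "WMD (row with F1 0.646133456)": (0.701975748, 0.598521007, 0.646133456),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{name}: computed={computed:.9f} reported={reported:.9f}")
```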
Fig 2. The results of the 1v1 and 4v1 experiments.
Fig 3. The results of five methods in the 20 newsgroup experiments.