Zhu Yuan
Abstract
More and more tourists share their travel impressions and post their real experiences on the Internet, generating tourism big data. Online travel reviews can fully reflect tourists' emotions, and mining and analyzing them can reveal their underlying value. To analyze the potential value of online travel reviews with big data and machine learning techniques, this paper proposes an improved support vector machine (SVM) algorithm for travel consumer sentiment analysis and builds a Hadoop Distributed File System (HDFS) environment based on the Map-Reduce model. First, Internet travel reviews are pre-processed so that sentiment analysis can be performed on the review text. Second, an improved SVM algorithm is proposed based on the main features of linear classification and kernel functions, in order to improve the accuracy of sentiment word classification. Third, HDFS data nodes are deployed on the Hadoop platform for the actual tourism application context. Based on the Map-Reduce programming model, map and reduce functions are designed and implemented, which enables parallel processing and reduces time consumption. Finally, the improved SVM algorithm is implemented on the constructed Hadoop platform. The test results show that online travel reviews can be an important data source for travel big data recommendation, and that the proposed method achieves fast and accurate travel sentiment classification.
Keywords: Map-Reduce; big data analysis; sentiment analysis; support vector machine; tourism consumption
Year: 2022 PMID: 35295387 PMCID: PMC8918497 DOI: 10.3389/fpsyg.2022.857292
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
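The abstract mentions pre-processing the reviews before classification but does not detail the steps. As a minimal sketch, assuming English-language reviews, a regular-expression tokenizer, and a small hypothetical stop-word list (reviews written in Chinese would need a proper word segmenter instead), pre-processing can turn each review into a term-frequency vector that a classifier such as an SVM can consume. The names `STOP_WORDS`, `preprocess`, `build_vocabulary`, and `to_vector` are illustrative, not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real system would use
# a complete list suited to the language of the reviews.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "it", "very"}


def preprocess(review: str) -> Counter:
    """Lowercase, tokenize on letters/digits/apostrophes, drop stop words, count terms."""
    tokens = re.findall(r"[a-z0-9']+", review.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def build_vocabulary(docs: list[Counter]) -> list[str]:
    """Sorted list of every term in the corpus; fixes the feature order."""
    return sorted(set().union(*docs))


def to_vector(doc: Counter, vocab: list[str]) -> list[int]:
    """Term-frequency vector aligned with the vocabulary (0 for absent terms)."""
    return [doc[term] for term in vocab]


if __name__ == "__main__":
    reviews = [
        "The scenery was beautiful and the staff were friendly!",
        "Very crowded, dirty, and full of tourist traps.",
    ]
    docs = [preprocess(r) for r in reviews]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(to_vector(doc, vocab))
```

Building the vocabulary once and reusing it for every review keeps each feature at the same position in the training and testing vectors.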
Matching results of the tourism sentiment evaluation.
| Evaluation dimension | Weight | Score | Sentiment tendency |
|---|---|---|---|
| Clogging | 20% | 84 | negative |
| Traps | 15% | 85 | negative |
| Dirty and disordered | 15% | 81 | negative |
| Help | 20% | 80 | positive |
| Beautiful scenery | 15% | 68 | positive |
| Friendliness | 15% | 45 | positive |
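The table gives a weight and a score for each evaluation dimension but not the rule that combines them. Assuming a plain weighted sum within each sentiment tendency (an illustrative assumption, not necessarily the paper's aggregation rule), the arithmetic is as follows; `weighted_scores` is a hypothetical helper name.

```python
# (dimension, weight in percent, score, tendency): values copied from the table above.
DIMENSIONS = [
    ("Clogging", 20, 84, "negative"),
    ("Traps", 15, 85, "negative"),
    ("Dirty and disordered", 15, 81, "negative"),
    ("Help", 20, 80, "positive"),
    ("Beautiful scenery", 15, 68, "positive"),
    ("Friendliness", 15, 45, "positive"),
]


def weighted_scores(rows):
    """Sum weight * score per tendency (assumed aggregation rule)."""
    totals = {}
    for _name, weight_pct, score, tendency in rows:
        totals[tendency] = totals.get(tendency, 0) + weight_pct * score
    # Weights are percentages: keep integer sums, then divide by 100 once.
    return {tendency: total / 100 for tendency, total in totals.items()}


# Sanity check: the six weights cover exactly 100%.
assert sum(weight for _, weight, _, _ in DIMENSIONS) == 100

if __name__ == "__main__":
    print(weighted_scores(DIMENSIONS))  # {'negative': 41.7, 'positive': 32.95}
```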
Vector table.
| Vector number | 1 | 2 |
|---|---|---|
| 1 | (11,84,7) | (13,80,6) |
| 2 | (22,85,9) | (12,45,4) |
| 3 | (17,81,7) | (13,44,9) |
| 4 | (4,45,−8) | (3,9,4) |
| 5 | (13,45,−10) | (14,32,−5) |
| 6 | (26,24,50) | (15,3,−6) |
Figure 1. Support vector machine (SVM)-based travel sentiment classification model.
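The classifier in Figure 1 is built on an SVM, but the source does not spell out the training procedure or the details of the improvement. For orientation only, the sketch below trains a plain linear SVM with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss, a standard primal solver swapped in here for illustration. Labels are assumed to be +1 (positive) and -1 (negative); `lam` (regularization strength), `iterations`, and `seed` are hypothetical defaults.

```python
import random


def train_linear_svm(X, y, lam=0.01, iterations=1000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.

    A constant 1.0 is appended to each input so the last weight acts as a bias
    (in this simplified form the bias is regularized too). Labels must be +1/-1.
    """
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        xi, yi = rng.choice(data)            # one random training example
        eta = 1.0 / (lam * t)                # decreasing step size
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        # Shrink w: gradient step for the regularization term.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss sub-gradient step, only when the margin is violated.
        if margin < 1.0:
            w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Sign of the linear decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(predict(w, (2.5, 2.5)), predict(w, (-2.5, -2.5)))
```

The primal form keeps the example short; a kernelized variant would replace the dot product with a kernel function, which is the direction the abstract's "kernel functions" points to.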
Experimental results for different values of the order d.
| Order d | Recognition rate | Number of support vectors |
|---|---|---|
| 2 | 0.987 | 355 |
| 3 | 0.934 | 299 |
| 4 | 0.955 | 270 |
| 5 | 0.919 | 248 |
| 6 | 0.923 | 234 |
| 7 | 0.910 | 255 |
| 8 | 0.933 | 211 |
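The table reports results for orders d = 2 to 8, and the abstract says the improved SVM builds on kernel functions. Assuming d is the degree of a polynomial kernel of the common form K(x, z) = (x · z + c)^d, with c an assumed offset, the sketch below evaluates that kernel and checks, for d = 2 and two-dimensional inputs, that it equals an ordinary dot product in an explicit six-dimensional feature space. A larger d corresponds to a richer implicit feature space, which is why the choice of d can affect both the recognition rate and the number of support vectors.

```python
import math


def poly_kernel(x, z, d, c=1.0):
    """Polynomial kernel (x . z + c) ** d; the offset c >= 0 is an assumption."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + c) ** d


def phi_degree2(x, c=1.0):
    """Explicit feature map for d = 2 and 2-D input, chosen so that
    dot(phi(x), phi(z)) == (x . z + c) ** 2."""
    x1, x2 = x
    r = math.sqrt(2 * c)
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2, r * x1, r * x2, c]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(points, d, c=1.0):
    """Kernel (Gram) matrix, which a kernel SVM solver uses instead of raw features."""
    return [[poly_kernel(p, q, d, c) for q in points] for p in points]


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(poly_kernel(x, z, 2), dot(phi_degree2(x), phi_degree2(z)))
    print(gram_matrix([x, z], d=2))  # [[36.0, 4.0], [4.0, 121.0]]
```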
Figure 2. Design of the Map-Reduce model.
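Figure 2 outlines the Map-Reduce design; the concrete map and reduce functions are not given in this excerpt. The single-process sketch below only illustrates the pattern, assuming a hypothetical sentiment lexicon (`LEXICON`): the map function emits a ((polarity, word), 1) pair for each lexicon word in a review, the shuffle step groups pairs by key (Hadoop performs this between the two phases), and the reduce function sums the counts for each key. Because each review is mapped independently, map tasks can run in parallel over HDFS blocks, which is what makes the parallel speed-up possible.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sentiment lexicon used only for this illustration.
LEXICON = {
    "beautiful": "positive", "friendly": "positive",
    "crowded": "negative", "dirty": "negative", "trap": "negative",
}


def map_review(review_id, text):
    """Map: emit ((polarity, word), 1) for every lexicon word in one review.

    review_id mirrors the input key a Map-Reduce framework passes; it is unused here.
    """
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in LEXICON:
            yield (LEXICON[word], word), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)


def run_job(reviews):
    """Run map, shuffle and reduce sequentially over {review_id: text}."""
    mapped = chain.from_iterable(map_review(rid, text) for rid, text in reviews.items())
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job({1: "Beautiful lake, friendly staff.", 2: "Crowded and dirty. So crowded!"}))
```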
Results of manually labeling online comments on the West Lake Scenic Area.
| Data set | Comments with negative meaning | Comments with positive meaning | Total |
|---|---|---|---|
| Training set | 20,670 | 18,033 | 38,703 |
| Testing set | 16,145 | 11,578 | 27,723 |
| Total | 36,815 | 29,611 | 66,426 |
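The row and column totals in the table are consistent. Because the classes are not balanced, a useful reference point when reading the classification results is the accuracy of always predicting the majority (negative) class. The sketch below derives it from the counts in the table; nothing beyond those counts is assumed, and `summarize` is a hypothetical helper name.

```python
# (negative, positive) comment counts copied from the table above.
SPLITS = {
    "Training set": (20_670, 18_033),
    "Testing set": (16_145, 11_578),
}


def summarize(splits):
    """Return size, negative share and majority-class baseline for each split."""
    summary = {}
    for name, (neg, pos) in splits.items():
        total = neg + pos
        summary[name] = {
            "total": total,
            "negative_share": neg / total,
            # Accuracy of a trivial classifier that always predicts the larger class.
            "majority_baseline": max(neg, pos) / total,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(SPLITS).items():
        print(f"{name}: n={stats['total']:,}, "
              f"negative share={stats['negative_share']:.3f}, "
              f"majority baseline={stats['majority_baseline']:.3f}")
    # Training set: n=38,703, negative share=0.534, majority baseline=0.534
    # Testing set: n=27,723, negative share=0.582, majority baseline=0.582
```

On the testing set the majority-class baseline is about 0.582, and the share of negative comments differs between the training set (about 0.534) and the testing set.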
Figure 3. Accuracy comparison of the original SVM and the improved SVM algorithm.
Figure 4. Accuracy of the improved algorithm in the big data setting.
Figure 5. Comparison of the time consumption of the two algorithms.
Classification performance of different models on the same test set.
| Algorithm | Number of test samples | | | | | |
|---|---|---|---|---|---|---|
| Improved SVM | 1,000 | 0.838 | 0.825 | 0.912 | 0.816 | 0.738 |
| | 5,000 | 0.874 | 0.883 | 0.919 | 0.859 | 0.803 |
| | 10,000 | 0.880 | 0.889 | 0.909 | 0.866 | 0.837 |
| | 15,000 | 0.880 | 0.875 | 0.907 | 0.886 | 0.847 |
| | 20,000 | 0.874 | 0.861 | 0.915 | 0.891 | 0.825 |
| | 27,723 | 0.870 | 0.856 | 0.915 | 0.890 | 0.817 |
| NaiveBayes | 1,000 | 0.838 | 0.836 | 0.895 | 0.842 | 0.762 |
| | 5,000 | 0.859 | 0.874 | 0.902 | 0.833 | 0.789 |
| | 10,000 | 0.866 | 0.878 | 0.898 | 0.849 | 0.821 |
| | 15,000 | 0.850 | 0.851 | 0.875 | 0.847 | 0.820 |
| | 20,000 | 0.846 | 0.844 | 0.878 | 0.848 | 0.807 |
| | 27,723 | 0.853 | 0.850 | 0.890 | 0.857 | 0.813 |
| Dictionary | 1,000 | 0.880 | 0.960 | 0.828 | 0.800 | 0.952 |
| | 5,000 | 0.845 | 0.900 | 0.811 | 0.790 | 0.888 |
| | 10,000 | 0.880 | 0.924 | 0.872 | 0.823 | 0.892 |
| | 15,000 | 0.855 | 0.930 | 0.836 | 0.753 | 0.889 |
| | 20,000 | 0.856 | 0.928 | 0.811 | 0.784 | 0.916 |
| | 27,723 | 0.867 | 0.931 | 0.841 | 0.790 | 0.906 |
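The names of the five score columns in the table above were lost in extraction; only the Figure 6 caption identifies accuracy (Acc) as one of the reported measures. For reference, the sketch below computes the usual binary sentiment-classification metrics (accuracy, plus precision, recall, and F1 for one chosen class) from true and predicted labels. Which of these, if any, correspond to particular table columns cannot be recovered from this excerpt; `confusion` and `binary_metrics` are hypothetical helper names.

```python
def confusion(y_true, y_pred, positive="positive"):
    """Count (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, and precision/recall/F1 for the class of interest (0.0 when undefined)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = ["positive", "negative", "positive", "negative"]
    y_pred = ["positive", "positive", "negative", "negative"]
    print(binary_metrics(*confusion(y_true, y_pred)))
```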
Figure 6. Sentiment classification accuracy (Acc) of each classifier.