Natt Leelawat1,2, Sirawit Jariyapongpaiboon1, Arnon Promjun1, Samit Boonyarak3, Kumpol Saengtabtim1, Ampan Laosunthara2, Alfan Kurnia Yudha1, Jing Tang2,3.
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has severely affected Thailand's economy, which relies heavily on tourism. In this study, we labeled the sentiment and intention classes of English-language tweets related to tourism in Bangkok, Chiang Mai, and Phuket. Then, the accuracy of three machine learning algorithms (decision tree, random forest, and support vector machine) in predicting the sentiments and intentions of the tweets was investigated. The support vector machine algorithm provided the best results for sentiment analysis, with a maximum accuracy of 77.4%. In the intention analysis, the random forest algorithm achieved an accuracy of 95.4%. In a subsequent preliminary qualitative content analysis, the top 10 words found in each sentiment and intention class were gathered to provide insights and suggestions to help increase tourism in Thailand. The results of this study suggest that to help restore tourism in Thailand, tourist destinations, natural attractions, restaurants, and nightlife should be promoted. In addition, the two main concerns of tourists to Thailand should be addressed: COVID-19 and current political tensions.
Keywords: COVID19; Machine learning; Sentiment analysis; Thailand; Tourism; Tweet
Year: 2022 PMID: 36211996 PMCID: PMC9527204 DOI: 10.1016/j.heliyon.2022.e10894
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Number of domestic travelers for Chiang Mai, Bangkok, and Phuket (2017–2021).
Figure 2. Number of foreign travelers for Chiang Mai, Bangkok, and Phuket (2017–2021).
Figure 3. Flowchart of the present study.
Number of foreign tweets collected for three main cities in Thailand.
| City | Jul | Aug | Sep | Oct | Nov | Dec | Total |
|---|---|---|---|---|---|---|---|
| Bangkok | 10,746 | 9,913 | 9,778 | 11,236 | 9,698 | 10,614 | 61,985 |
| Phuket | 2,980 | 10,519 | 13,345 | 14,965 | 16,547 | 25,546 | 83,902 |
| Chiang Mai | 831 | 807 | 759 | 899 | 794 | 603 | 4,693 |
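As a quick consistency check, the monthly counts in the table above can be summed and compared with the reported totals. The sketch below is illustrative only: the variable names are our own, and the figures are copied from the table.

```python
# Monthly tweet counts (Jul to Dec), copied from the table above.
monthly_counts = {
    "Bangkok": [10746, 9913, 9778, 11236, 9698, 10614],
    "Phuket": [2980, 10519, 13345, 14965, 16547, 25546],
    "Chiang Mai": [831, 807, 759, 899, 794, 603],
}
# Totals as reported in the last column of the table.
reported_totals = {"Bangkok": 61985, "Phuket": 83902, "Chiang Mai": 4693}

# Recompute each row total and each city's share of all collected tweets.
totals = {city: sum(counts) for city, counts in monthly_counts.items()}
grand_total = sum(totals.values())
shares = {city: total / grand_total for city, total in totals.items()}

for city, total in totals.items():
    status = "matches" if total == reported_totals[city] else "differs from"
    print(f"{city}: {total:,} tweets ({shares[city]:.1%} of all), {status} reported total")
```

The recomputed row sums agree with the reported totals, and the shares make the imbalance between cities explicit: Chiang Mai contributes only a small fraction of the collected tweets.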
Figure 4. Number of tweets separated by sentiment.
Figure 5. Number of tweets separated by intention to visit or not visit.
Criteria for sentiment classification.
| Positive | Neutral | Negative |
|---|---|---|
| Expresses an eagerness to visit; describes positive experiences of Thailand; uses positive adjectives; suggests why others should visit a particular tourist destination | Displays the current location at the time of creating a tweet; describes a tourist destination without expressing positive or negative feelings; tweets a question related to Thailand tourism | Describes negative experiences of Thailand; uses profanity; displays a lack of willingness to visit; exhibits a willingness to avoid Thailand |
Criteria for intention labeling.
| To Visit | Not to Visit |
|---|---|
| Displays an eagerness to visit | Displays a lack of willingness to visit; displays a willingness to avoid Thailand |
Hyperparameters tuned for each selected machine learning algorithm.
| CART | Random forest | SVM |
|---|---|---|
| Decision criterion (Criterion) | Number of trees | Kernel type |
| Maximum tree depth (Max depth) | Decision criterion (Criterion) | Regularization parameter (C) |
| Minimum number of samples for splitting nodes (Min sample splits) | Maximum tree depth (Max depth) | Gamma parameter (Gamma) |
| Minimum number of samples in a leaf (Min sample leaf) | Minimum number of samples for splitting nodes (Min sample splits) | |
| | Minimum number of samples in a leaf (Min sample leaf) | |
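The table names the tuned hyperparameters but gives no candidate values. As a rough sketch of how such a search space might be written down and enumerated, the example below uses scikit-learn-style parameter names (`criterion`, `max_depth`, `min_samples_split`, `min_samples_leaf`, `n_estimators`, `kernel`, `C`, `gamma`). Both the choice of library and every candidate value are assumptions made for illustration, not values taken from the study.

```python
from itertools import product

# Hypothetical candidate values; only the hyperparameter names come from the table.
search_space = {
    "CART": {
        "criterion": ["gini", "entropy"],
        "max_depth": [5, 10, None],
        "min_samples_split": [2, 5],
        "min_samples_leaf": [1, 2],
    },
    "Random forest": {
        "n_estimators": [100, 300],
        "criterion": ["gini", "entropy"],
        "max_depth": [5, 10, None],
        "min_samples_split": [2, 5],
        "min_samples_leaf": [1, 2],
    },
    "SVM": {
        "kernel": ["linear", "rbf"],
        "C": [0.1, 1, 10],
        "gamma": ["scale", "auto"],
    },
}


def iter_configs(grid):
    """Yield every combination of candidate values as a dict (exhaustive grid)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


for algorithm, grid in search_space.items():
    n_configs = sum(1 for _ in iter_configs(grid))
    print(f"{algorithm}: {n_configs} candidate configurations")
```

If scikit-learn were used, each inner dictionary could be passed as the `param_grid` of `GridSearchCV`, which evaluates every combination with cross-validation.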
Results from sentiment analysis for Bangkok dataset.
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Unigram, Over-sampling | 0.637 | 0.601 |
| Random Forest | Unigram, Under-sampling | 0.681 | 0.682 |
| Support Vector Machine | Unigram, Under-sampling | 0.721 | 0.701 |
Results from sentiment analysis for Chiang Mai dataset
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Unigram, Over-sampling | 0.637 | 0.578 |
| Random Forest | Unigram, Over-sampling | 0.637 | 0.634 |
| Support Vector Machine | Unigram, Over-sampling | 0.667 | 0.662 |
Results from sentiment analysis for Phuket dataset
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Unigram, Under-sampling | 0.639 | 0.632 |
| Random Forest | Unigram, Over-sampling | 0.708 | 0.713 |
| Support Vector Machine | Unigram, Over-sampling | 0.774 | 0.771 |
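The "Data preprocessing" column in the tables above pairs an n-gram choice (unigram or bigram) with a class-balancing strategy (over- or under-sampling). This section does not describe the tokenizer or the resampling method, so the following is only a minimal sketch that assumes whitespace tokenization and purely random resampling. The function names and sample tweets are invented for illustration.

```python
import random
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def _group_by_class(samples, labels):
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    out = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((sample, label) for sample in items + extra)
    return [s for s, _ in out], [l for _, l in out]


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    out = []
    for label, items in groups.items():
        out.extend((sample, label) for sample in rng.sample(items, target))
    return [s for s, _ in out], [l for _, l in out]


# Invented toy data: three positive tweets and one negative tweet.
tweets = ["love the beach", "great food in bangkok", "beautiful view", "protest blocked the road"]
labels = ["positive", "positive", "positive", "negative"]

print(ngrams(tweets[0], 1))  # unigrams
print(ngrams(tweets[0], 2))  # bigrams
over_x, over_y = random_oversample(tweets, labels)
under_x, under_y = random_undersample(tweets, labels)
print(Counter(over_y), Counter(under_y))
```

Over-sampling keeps every original tweet at the cost of duplicates, whereas under-sampling discards data from the majority class. That trade-off may be one reason the best-performing combination differs across cities and algorithms.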
Results from Intention Analysis for Bangkok dataset.
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Bigram, Over-sampling | 0.943 | 0.933 |
| Random Forest | Bigram, Over-sampling | 0.954 | 0.950 |
| Support Vector Machine | Bigram, Over-sampling | 0.920 | 0.886 |
Results from Intention Analysis for Chiang Mai dataset
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Bigram, Over-sampling | 0.911 | 0.879 |
| Random Forest | Unigram, Over-sampling | 0.775 | 0.804 |
| Support Vector Machine | Unigram, Over-sampling | 0.912 | 0.893 |
Results from Intention Analysis for Phuket dataset
| Algorithm | Data preprocessing | Accuracy | F1-score |
|---|---|---|---|
| CART | Bigram, Over-sampling | 0.932 | 0.914 |
| Random Forest | Unigram, Over-sampling | 0.797 | 0.797 |
| Support Vector Machine | Unigram, Over-sampling | 0.938 | 0.926 |
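The result tables report accuracy and F1-score side by side. To make the two metrics concrete, the sketch below computes both from lists of true and predicted labels. The section does not state how the F1-score was averaged across classes, so macro averaging is assumed here, and the labels are invented toy data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = harmonic mean of precision and recall for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Invented toy labels for a two-class intention task.
y_true = ["visit", "visit", "not", "not", "visit"]
y_pred = ["visit", "not", "not", "not", "visit"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Accuracy and F1 can diverge when classes are imbalanced, as in the rows above where the F1-score falls noticeably below the accuracy (for example, 0.886 against 0.920 for the support vector machine on the Bangkok intention data).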
Top 10 words in the positive, neutral, and negative classes in sentiment analysis.
| Rank | Positive (BKK) | Positive (CM) | Positive (PK) | Neutral (BKK) | Neutral (CM) | Neutral (PK) | Negative (BKK) | Negative (CM) | Negative (PK) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Day | Good | Beach | Post | Coffee | Beach | Protest | Go | Island |
| 2 | One | Day | Pool | Wat | Post | Post | Police | Still | Covid |
| 3 | Go | One | View | 2020 | Day | View | People | Covid | Go |
| 4 | Love | Place | Luxury | New | Morning | Island | Govnment | People | Beach |
| 5 | Best | Go | Beautiful | Temple | Go | Pool | Wat | New | Time |
| 6 | Time | Beautiful | Bedroom | Day | Time | Chalong | Covid | Year | Town |
| 7 | Food | Love | One | Go | Good | Resort | Go | Night | Hotel |
| 8 | Good | Visit | Sea | One | Cafe | Locate | Traffic | Last | New |
| 9 | City | Time | Island | Hotel | Wat | House | Pro-democracy | Use | Year |
| 10 | See | Best | Private | Protest | One | Apartment | Rally | Province | Pandemic |
Many tweets intentionally referred to “government” as “govnment.”
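A ranking like the one above can be obtained by counting word frequencies separately for each class. The following minimal sketch assumes whitespace tokenization and a tiny, illustrative stop-word list. The function name and the sample tweets are invented, and the study's actual text-cleaning steps are not given in this section.

```python
from collections import Counter, defaultdict

# Illustrative stop words only; a real analysis would use a fuller list.
STOP_WORDS = {"the", "a", "to", "in", "is", "at", "and", "i", "of"}


def top_words(tweets, labels, k=10):
    """Return the k most frequent non-stop-word tokens for each class label."""
    counts = defaultdict(Counter)
    for text, label in zip(tweets, labels):
        tokens = [word.strip(".,!?#@").lower() for word in text.split()]
        counts[label].update(w for w in tokens if w and w not in STOP_WORDS)
    return {label: [w for w, _ in c.most_common(k)] for label, c in counts.items()}


# Invented toy tweets.
tweets = [
    "Love the beach in Phuket",
    "Beach day at the pool",
    "Protest traffic is chaos",
    "Traffic and protest again",
]
labels = ["positive", "positive", "negative", "negative"]

for label, words in top_words(tweets, labels, k=3).items():
    print(label, words)
```

Because the counting is purely lexical, generic words such as "day" or "go" can rank highly, as they do in the table above.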
Top 10 tourism-related words in the positive, neutral, and negative classes in sentiment analysis.
| Rank | Positive (BKK) | Positive (CM) | Positive (PK) | Neutral (BKK) | Neutral (CM) | Neutral (PK) | Negative (BKK) | Negative (CM) | Negative (PK) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Night | Temple | Beach | Temple | Coffee | Beach | Protest | Night | Island |
| 2 | Temple | Festival | Pool | Wat | Morning | View | Police | COVID | COVID |
| 3 | Street | Life | Luxury | Night | Life | Island | Govnment | Province | Beach |
| 4 | Market | Night | Beautiful | Protest | Garden | Pool | Traffic | Buddhist | Town |
| 5 | Wat | Doi | Bedroom | Art | Krathong | Chalong | Monarchy | Elephant | Pandemic |
| 6 | View | Waterfall | Sea | Street | Cafe | Resort | Reform | Festival | Ghost |
| 7 | Restaurant | Mountain | Island | Market | Date | Sea | Road | Airport | Industry |
| 8 | Road | Coffee | Sale | Park | Park | Patong | Pro-democracy | Protest | Patong |
| 9 | Bar | Wat | Apartment | Bar | Wat | Bedroom | Chaos | Sexism | Refund |
| 10 | Friend | Restaurant | Bathroom | Station | Airport | Kata | COVID | Crowd | Hurt |
Many tweets intentionally referred to “government” as “govnment.”
Top 10 words in the “to visit” and “not to visit” classes obtained in the intention analysis.
| Rank | To Visit (BKK) | To Visit (CM) | To Visit (PK) | Not to Visit (BKK) | Not to Visit (CM) | Not to Visit (PK) |
|---|---|---|---|---|---|---|
| 1 | Go | Go | Go | Post | Day | Beach |
| 2 | Next | Back | Island | Protest | Good | Pool |
| 3 | Want | Visit | Beach | Day | One | View |
| 4 | Year | Day | Take | Wat | City | Island |
| 5 | Miss | Place | Day | One | Year | Bedroom |
| 6 | Back | One | Visit | 2020 | Time | Luxury |
| 7 | Visit | Time | Want | New | Place | One |
| 8 | Time | Want | Time | Go | Go | Sea |
| 9 | See | Take | Year | Time | Coffee | Beautiful |
| 10 | Day | See | Soon | City | Morning | Locate |
Top 10 words in the “to visit” and “not to visit” classes obtained in the intention analysis of tourism-related words.
| Rank | To Visit (BKK) | To Visit (CM) | To Visit (PK) | Not to Visit (BKK) | Not to Visit (CM) | Not to Visit (PK) |
|---|---|---|---|---|---|---|
| 1 | Flight | Home | Island | Protest | Coffee | Beach |
| 2 | Temple | Temple | Beach | Wat | Life | Pool |
| 3 | Home | Plan | Visit | Temple | Festival | View |
| 4 | Friend | Mountain | Want | Night | Temple | Bedroom |
| 5 | Wait | Car | Soon | Street | Event | Sea |
| 6 | Night | Work | Need | Market | Wat | Beautiful |
| 7 | COVID | Family | Miss | Police | Doi | Apartment |
| 8 | Visit | Everything | Back | Bar | Krathong | Resort |
| 9 | Life | Street | Hope | Restaurant | Garden | House |
| 10 | Plan | Doi | Like | Park | Airport | Patong |
Figure 6. WordCloud result for Bangkok. Positive sentiment (a); Negative sentiment (b); Neutral sentiment (c); Intention not to visit (d); and Intention to visit (e).
Figure 7. WordCloud result for Chiang Mai. Positive sentiment (a); Negative sentiment (b); Neutral sentiment (c); Intention not to visit (d); and Intention to visit (e).
Figure 8. WordCloud result for Phuket. Positive sentiment (a); Negative sentiment (b); Neutral sentiment (c); Intention not to visit (d); and Intention to visit (e).