Hyeonseo Lee, Nakyeong Lee, Harim Seo, Min Song.
Abstract
Rapidly growing digital data generation has ushered in the era of big data, and this data is particularly valuable because approximately 70% of the data collected worldwide comes from social media. The investigation of online social network services is therefore of paramount importance. In this paper, we use sentiment analysis, which detects attitudes and emotions toward social issues posted on social media, to understand the actual economic situation. We proceed in two steps. In the first step, we train sentiment classifiers on several large social media datasets and consider three types of feature sets: a feature vector, a sequence vector, and a combination of dictionary-based features and sequence vectors. We then assess the performance of six classifiers: MaxEnt-L1, C4.5 decision tree, SVM-kernel, Ada-boost, Naïve Bayes and MaxEnt. In the second step, we collect datasets relevant to several economic words that the public uses to express opinions explicitly. Finally, we use vector auto-regression analysis to test our hypothesis. The results show a statistically significant relationship between public sentiment and economic performance: "depression" and "unemployment" Granger-cause KOSPI. The keywords extracted from the sentiment analysis, such as "price," "year-end-tax" and "budget deficit," likewise Granger-cause the exchange rate. © Springer Science+Business Media, LLC, part of Springer Nature 2019.
Keywords: Machine learning; Sentiment analysis; Social media; Supervised learning
Year: 2019 PMID: 32435085 PMCID: PMC7224044 DOI: 10.1007/s11227-018-02737-x
Source DB: PubMed Journal: J Supercomput ISSN: 0920-8542 Impact factor: 2.474
Fig. 1 Research flow
Economic terms
Fig. 2 Data crawling process
Sentiment classification performance results
| Classifier | Accuracy | Recall | Precision | |
|---|---|---|---|---|
| Feature set–vector | ||||
| MaxEnt-L1 | 0.6787 ± 0.0051 | 0.500 | 0.708 | 0.74 |
| Decision tree | 0.5096 ± 0.0036 | 0.513 | 0.597 | 0.38 |
| SVM-kernel | 0.4778 ± 0.0099 | 0.500 | 0.483 | 0.62 |
| Ada-boost | 0.6695 ± 0.0049 | 0.259 | 0.323 | 0.23 |
| Naïve Bayes | 0.6763 ± 0.0052 | 0.500 | 0.450 | 0.41 |
| MaxEnt | 0.5129 ± 0.0027 | 0.511 | 0.516 | 0.47 |
| Feature set–sequence | ||||
| MaxEnt-L1 | 0.8929 ± 0.0168 | 0.663 | 0.746 | 0.75 |
| Decision tree | 0.6834 ± 0.0018 | 0.665 | 0.699 | 0.67 |
| SVM-kernel | 0.8942 ± 0.0226 | 0.639 | 0.719 | 0.70 |
| Ada-boost | 0.9153 ± 0.0143 | 0.644 | 0.753 | 0.67 |
| Naïve Bayes | 0.3789 ± 0.0221 | 0.500 | 0.448 | 0.49 |
| MaxEnt | 0.9091 ± 0.0145 | 0.646 | 0.74 | 0.65 |
| Feature set–combined | ||||
| MaxEnt-L1 | 0.9353 ± 0.0076 | 0.931 | 0.930 | 0.93 |
| Decision tree | 0.6834 ± 0.0018 | 0.665 | 0.699 | 0.67 |
| SVM-kernel | 0.8590 ± 0.0188 | 0.747 | 0.812 | 0.80 |
| Ada-boost | 0.8942 ± 0.0226 | 0.639 | 0.719 | 0.70 |
| Naïve Bayes | 0.8751 ± 0.0122 | 0.620 | 0.714 | 0.62 |
| MaxEnt | 0.9556 ± 0.0071 | 0.500 | 0.903 | 0.87 |
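The table reports accuracy, recall and precision per classifier, but this record does not say how they were averaged or what the ± term denotes. As a reminder of the standard binary definitions only, here is a minimal sketch; the function name `binary_metrics` and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary sentiment task.

    Hypothetical helper: the source does not describe its averaging scheme,
    so this uses the plain single-positive-class definitions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (1 = positive sentiment, 0 = negative); not data from the paper.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(gold, pred)
print(metrics)
```

With these toy labels there are three true positives, one false positive and one false negative, so all three metrics come out at 0.75.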
Pair-wise Granger causality tests
| Null hypothesis | |
|---|---|
| BOOM does not Granger Cause KOSPI | 1.25 |
| KOSPI does not Granger Cause BOOM | 2.20** |
| DEPR. does not Granger Cause KOSPI | 6.86*** |
| KOSPI does not Granger Cause DEPR. | 1.44 |
| UNEMP does not Granger Cause KOSPI | 4.96*** |
| KOSPI does not Granger Cause UNEMP | 2.38** |
Asterisks ** and *** correspond to 5% and 1% significance, respectively
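A pair-wise Granger test of this kind compares two regressions of the target series: one on its own lags only, and one that also includes lags of the candidate cause. The sketch below computes the resulting F statistic in plain Python on synthetic data. The lag length, the helper names (`solve`, `rss`, `granger_f`) and the simulated series are all assumptions for illustration; the record does not state which software or lag length produced the statistics in the table.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yt - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yt in zip(rows, y))


def granger_f(effect, cause, lags):
    """F statistic for H0: `cause` does not Granger-cause `effect`."""
    target = effect[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(effect)):
        own = [1.0] + [effect[t - i] for i in range(1, lags + 1)]
        restricted.append(own)
        unrestricted.append(own + [cause[t - i] for i in range(1, lags + 1)])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)  # observations minus unrestricted params
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example: x drives y with a one-period delay, but not vice versa.
rng = random.Random(0)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_xy = granger_f(y, x, lags=2)  # H0: x does not Granger-cause y
f_yx = granger_f(x, y, lags=2)  # H0: y does not Granger-cause x
print(f"x -> y: F = {f_xy:.2f}")
print(f"y -> x: F = {f_yx:.2f}")
```

A large F statistic leads to rejecting the null, which is how the starred rows in the table should be read: for example, the null "DEPR. does not Granger Cause KOSPI" is rejected at the 1% level.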
Vector auto-regression estimates
| | KOSPI | BOOM | DEPR | UNEMP |
|---|---|---|---|---|
| KOSPI(−1) | 0.97*** | −0.01 | 0.17 | −0.07 |
| | (0.05) | (0.19) | (0.20) | (0.17) |
| KOSPI(−2) | −0.01 | 0.05 | 0.24 | 0.46* |
| | (0.07) | (0.26) | (0.27) | (0.24) |
| KOSPI(−3) | −0.09 | −0.46* | −0.51* | −0.68*** |
| | (0.07) | (0.26) | (0.27) | (0.24) |
| KOSPI(−4) | 0.13* | 0.57** | −0.02 | 0.06 |
| | (0.07) | (0.26) | (0.27) | (0.23) |
| KOSPI(−5) | −0.03 | −0.14 | 0.01 | 0.31* |
| | (0.05) | (0.18) | (0.19) | (0.17) |
| BOOM(−1) | −0.02* | −0.04 | −0.003 | −0.09** |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| BOOM(−2) | 0.01 | 0.09* | 0.07 | −0.04 |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| BOOM(−3) | −0.03** | 0.01 | 0.09* | 0.02 |
| | (0.01) | (0.05) | (0.05) | (0.05) |
| BOOM(−4) | 0.004 | −0.08 | 0.01 | −0.08* |
| | (0.01) | (0.05) | (0.05) | (0.05) |
| BOOM(−5) | 0.002 | −0.07 | 0.02 | 0.01 |
| | (0.01) | (0.05) | (0.05) | (0.05) |
| DEPR(−1) | −0.001 | 0.06 | 0.09* | −0.07* |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| DEPR(−2) | −0.039** | 0.02 | 0.03 | 0.04 |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| DEPR(−3) | 0.02* | −0.05 | −0.04 | 0.02 |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| DEPR(−4) | 0.02 | −0.03 | 0.05 | 0.07* |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| DEPR(−5) | −0.05** | 0.05 | 0.06 | 0.03 |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| UNEMP(−1) | 0.03* | −0.06 | 0.13** | 0.23*** |
| | (0.01) | (0.05) | (0.06) | (0.05) |
| UNEMP(−2) | −0.06*** | −0.09* | −0.05 | 0.002 |
| | (0.01) | (0.05) | (0.06) | (0.05) |
| UNEMP(−3) | 0.01 | −0.07 | −0.07 | 0.05 |
| | (0.01) | (0.05) | (0.06) | (0.05) |
| UNEMP(−4) | −0.02 | 0.07 | 0.22*** | 0.33*** |
| | (0.01) | (0.05) | (0.06) | (0.05) |
| UNEMP(−5) | 0.01 | 0.14*** | −0.09 | −0.07 |
| | (0.01) | (0.06) | (0.06) | (0.05) |
| C | −0.01 | −0.01 | −0.03 | 0.01 |
| | (0.01) | (0.05) | (0.05) | (0.04) |
| Likelihood | −26.28 | −608.18 | −631.13 | −567.01 |
Standard errors are in (). The asterisks *, ** and *** correspond to 10%, 5% and 1% significance, respectively. DEPR and UNEMP represent DEPRESSION AND UNEMPLOYMENT, respectively
(−j) denotes the observation j periods earlier. For example, (−1) is the previous day's value
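As a reading aid, the estimates above can be written as a four-variable vector auto-regression with five lags. The symbols $y_t$, $c$, $A_j$ and $\varepsilon_t$ are notation introduced here and do not appear in the source.

```latex
y_t = c + \sum_{j=1}^{5} A_j \, y_{t-j} + \varepsilon_t ,
\qquad
y_t = \bigl(\mathrm{KOSPI}_t,\ \mathrm{BOOM}_t,\ \mathrm{DEPR}_t,\ \mathrm{UNEMP}_t\bigr)^{\top}
```

Each table column is one equation of this system and each row is one lagged regressor, with C the intercept. For instance, the entry −0.06*** in the UNEMP lag-2 row of the KOSPI column is the coefficient on $\mathrm{UNEMP}_{t-2}$ in the equation for $\mathrm{KOSPI}_t$.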
AIC and BIC values
| Lag | AIC | BIC |
|---|---|---|
| 0 | 11.40967 | 11.44702 |
| 1 | 8.814046 | 9.000771* |
| 2 | 8.773471 | 9.109575 |
| 3 | 8.786270 | 9.271754 |
| 4 | 8.680262 | 9.315125 |
| 5 | 8.672718* | 9.456961 |
| 6 | 8.692697 | 9.626320 |
| 7 | 8.697282 | 9.780285 |
| 8 | 8.698914 | 9.931297 |
* Lag order selected by the criterion
Pair-wise Granger causality tests
| Null hypothesis | |
|---|---|
| PRICE does not Granger Cause EXCHANGE_RATE | 4.05** |
| EXCHANGE_RATE does not Granger Cause PRICE | 1.79 |
| YEAR_END_TAX does not Granger Cause EXCHANGE_RATE | 2.59** |
| EXCHANGE_RATE does not Granger Cause YEAR_END_TAX | 1.94 |
| BUDGET_DEFICIT does not Granger Cause EXCHANGE_RATE | 3.44** |
| EXCHANGE_RATE does not Granger Cause BUDGET_DEFICIT | 1.21 |
The asterisks ** correspond to 5% significance
Vector auto-regression model estimates
| | EX. Rate | PRICE | YEAR END TAX | BUDGET DEFICIT |
|---|---|---|---|---|
| EX. Rate(−1) | 0.09** | −0.19 | 3.64 | −0.01 |
| | (0.04) | (2.98) | (2.27) | (2.83) |
| EX. Rate(−2) | 0.06 | −5.62* | 2.15 | 4.30 |
| | (0.04) | (2.97) | (2.26) | (2.82) |
| PRICE(−1) | 0.00 | 0.1*** | −0.01 | −0.03 |
| | (0.00) | (0.05) | (0.04) | (0.05) |
| PRICE(−2) | 0.00*** | 0.05 | 0.06 | −0.07 |
| | (0.00) | (0.05) | (0.04) | (0.05) |
| YEAR END TAX(−1) | 0.00 | −0.01 | 0.47*** | −0.03 |
| | (0.001) | (0.06) | (0.05) | (0.06) |
| YEAR END TAX(−2) | 0.00* | 0.05 | 0.24*** | 0.05 |
| | (0.00) | (0.06) | (0.05) | (0.06) |
| BUDGET DEFICIT(−1) | 0.00*** | −0.08* | 0.028 | 0.15*** |
| | (0.00) | (0.05) | (0.04) | (0.05) |
| BUDGET DEFICIT(−2) | −0.00 | −0.04 | −0.02 | 0.15*** |
| | (0.00) | (0.06) | (0.04) | (0.05) |
| C | −27*** | −187 | 186.42* | 138.0 |
| | (1.93) | (128.1) | (97.74) | (121.7) |
| Likelihood | 1239.9 | −619.2 | −499.27 | −596.41 |
Standard errors are shown in (). The asterisks *, ** and *** correspond to 10%, 5% and 1% significance, respectively. Ex. Rate signifies the Exchange Rate
(−j) denotes the observation j periods earlier
AIC and BIC values
| Lag | AIC | BIC |
|---|---|---|
| 0 | 2.892807 | 2.930151 |
| 1 | 2.372808 | 2.559533* |
| 2 | 2.302566* | 2.638671 |
| 3 | 2.317637 | 2.803121 |
| 4 | 2.348656 | 2.983520 |
| 5 | 2.379977 | 3.164220 |
| 6 | 2.394989 | 3.328613 |
| 7 | 2.440544 | 3.523546 |
| 8 | 2.435519 | 3.667902 |
* Lag order selected by the criterion
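Both lag-selection tables follow the same rule: each criterion picks the lag at which its value is smallest. The sketch below reproduces that selection from the tabulated values. The variable and function names (`KOSPI_MODEL`, `EXCHANGE_RATE_MODEL`, `select_lag`) are illustrative labels, not names from the source.

```python
# (lag, AIC, BIC) rows copied from the two "AIC and BIC values" tables.
KOSPI_MODEL = [
    (0, 11.40967, 11.44702),
    (1, 8.814046, 9.000771),
    (2, 8.773471, 9.109575),
    (3, 8.786270, 9.271754),
    (4, 8.680262, 9.315125),
    (5, 8.672718, 9.456961),
    (6, 8.692697, 9.626320),
    (7, 8.697282, 9.780285),
    (8, 8.698914, 9.931297),
]
EXCHANGE_RATE_MODEL = [
    (0, 2.892807, 2.930151),
    (1, 2.372808, 2.559533),
    (2, 2.302566, 2.638671),
    (3, 2.317637, 2.803121),
    (4, 2.348656, 2.983520),
    (5, 2.379977, 3.164220),
    (6, 2.394989, 3.328613),
    (7, 2.440544, 3.523546),
    (8, 2.435519, 3.667902),
]


def select_lag(rows):
    """Return (lag with the smallest AIC, lag with the smallest BIC)."""
    best_aic = min(rows, key=lambda r: r[1])[0]
    best_bic = min(rows, key=lambda r: r[2])[0]
    return best_aic, best_bic


print(select_lag(KOSPI_MODEL))          # (5, 1)
print(select_lag(EXCHANGE_RATE_MODEL))  # (2, 1)
```

The results match the starred entries in both tables. The estimated models report five lags for the KOSPI system and two for the exchange-rate system, which coincides with the AIC choice in each case; BIC, which penalises extra parameters more heavily, selects a single lag in both.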