Srishti Vashishtha, Seba Susan
Abstract
We propose MultiLexANFIS, an adaptive neuro-fuzzy inference system (ANFIS) that combines inputs from multiple lexicons to perform sentiment analysis of social media posts. Tweets are classified into two classes, neutral and non-neutral, where the latter includes both positive and negative polarity; this type of classification suits applications that test the neutrality of content posted by users on social media platforms. In the proposed model, features are extracted by integrating natural language processing with fuzzy logic, so the system handles the fuzziness of natural language efficiently and automatically. We propose a novel set of 64 rules for the neuro-fuzzy network that classifies tweets correctly from fuzzy features derived from the VADER, AFINN and SentiWordNet lexicons. The proposed rules are domain independent, i.e., they can be extended to any textual data for which lexicons are available. The antecedent and consequent parameters of the ANFIS are optimized iteratively by gradient descent and least-squares estimation, respectively. The key contributions of this paper are: (1) MultiLexANFIS, a novel neuro-fuzzy system that takes as input the positive and negative sentiment scores of tweets computed from multiple lexicons (VADER, AFINN and SentiWordNet) and classifies tweets into neutral and non-neutral content; (2) a novel set of 64 rules for the Sugeno-type fuzzy inference system underlying MultiLexANFIS; (3) single-lexicon-based ANFIS variants for classifying tweets when multiple lexicons are not available; and (4) a comparison of MultiLexANFIS with fuzzy, non-fuzzy and deep-learning state-of-the-art methods on various benchmark datasets, revealing the superiority of the proposed neuro-fuzzy system for social sentiment analysis.
Keywords: ANFIS; Lexicon; Neuro-fuzzy network; Sentiment analysis; Social media; Tweets
Year: 2021 PMID: 34867078 PMCID: PMC8628494 DOI: 10.1007/s00500-021-06528-0
Source DB: PubMed Journal: Soft comput ISSN: 1432-7643 Impact factor: 3.643
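As background for the architecture and tables below, the core computation in a Sugeno-type fuzzy inference system such as MultiLexANFIS can be sketched in a few lines. This is a minimal illustration only, with hypothetical Gaussian membership-function parameters and constant (zero-order) rule consequents; it is not the paper's trained model, whose parameters are learned by the hybrid gradient-descent/least-squares procedure described in the abstract.

```python
import math

def gaussmf(x, c, sigma):
    """Gaussian membership function ("gaussmf"), one of the MF shapes
    compared in the tables below (alongside trimf and gbellmf)."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_infer(inputs, rules):
    """Weighted-average (Sugeno) defuzzification over all rules.

    inputs: crisp input values (e.g. per-lexicon sentiment scores)
    rules:  list of (antecedent_mfs, consequent) pairs, where
            antecedent_mfs[i] is the membership function for input i
    """
    num = den = 0.0
    for mfs, consequent in rules:
        # Firing strength = product t-norm over all antecedent memberships
        w = 1.0
        for x, mf in zip(inputs, mfs):
            w *= mf(x)
        num += w * consequent
        den += w
    return num / den if den else 0.0

# Two fuzzy sets per input, mirroring the Low/High sets of MultiLexANFIS;
# centers and widths here are made up for illustration.
low = lambda x: gaussmf(x, 0.0, 0.3)
high = lambda x: gaussmf(x, 1.0, 0.3)

# Toy 2-input system (2**2 = 4 rules); consequents are hypothetical.
rules = [
    ([low, low], 0.0),
    ([low, high], 1.0),
    ([high, low], 1.0),
    ([high, high], 1.0),
]
print(sugeno_infer([0.1, 0.9], rules))
```

The full MultiLexANFIS uses six inputs (a positive and a negative score from each of the three lexicons) and hence 2^6 = 64 rules, with first-order consequents tuned by least squares.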
Class distribution of different datasets
| | Sanders | Nuclear | Apple | SemEval | SemEval | SemEval | STS-Test (Go et al.) | Airline | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Positive | 519 | 10 | 423 | 2375 | 5157 | 4377 | 182 | 2363 | 72,249 | 15,829 |
| Negative | 572 | 19 | 1219 | 3972 | 1225 | 1745 | 177 | 9178 | 35,509 | 8278 |
| Neutral | 2333 | 161 | 2162 | 5937 | 2667 | 5593 | 139 | 3099 | 55,212 | 12,909 |
| Total | 3424 | 190 | 3804 | 12,284 | 9049 | 11,715 | 498 | 14,640 | 162,970 | 37,014 |
Fig. 1 Single-lexicon-based ANFIS architecture with nine rules
MultiLexANFIS rules (1–32)
| Rules | Input1 | Input2 | Input3 | Input4 | Input5 | Input6 | Output |
|---|---|---|---|---|---|---|---|
| 1 | Low | Low | Low | Low | Low | Low | O1 |
| 2 | Low | Low | Low | Low | Low | High | O2 |
| 3 | Low | Low | Low | Low | High | Low | O3 |
| 4 | Low | Low | Low | Low | High | High | O4 |
| 5 | Low | Low | Low | High | Low | Low | O5 |
| 6 | Low | Low | Low | High | Low | High | O6 |
| 7 | Low | Low | Low | High | High | Low | O7 |
| 8 | Low | Low | Low | High | High | High | O8 |
| 9 | Low | Low | High | Low | Low | Low | O9 |
| 10 | Low | Low | High | Low | Low | High | O10 |
| 11 | Low | Low | High | Low | High | Low | O11 |
| 12 | Low | Low | High | Low | High | High | O12 |
| 13 | Low | Low | High | High | Low | Low | O13 |
| 14 | Low | Low | High | High | Low | High | O14 |
| 15 | Low | Low | High | High | High | Low | O15 |
| 16 | Low | Low | High | High | High | High | O16 |
| 17 | Low | High | Low | Low | Low | Low | O17 |
| 18 | Low | High | Low | Low | Low | High | O18 |
| 19 | Low | High | Low | Low | High | Low | O19 |
| 20 | Low | High | Low | Low | High | High | O20 |
| 21 | Low | High | Low | High | Low | Low | O21 |
| 22 | Low | High | Low | High | Low | High | O22 |
| 23 | Low | High | Low | High | High | Low | O23 |
| 24 | Low | High | Low | High | High | High | O24 |
| 25 | Low | High | High | Low | Low | Low | O25 |
| 26 | Low | High | High | Low | Low | High | O26 |
| 27 | Low | High | High | Low | High | Low | O27 |
| 28 | Low | High | High | Low | High | High | O28 |
| 29 | Low | High | High | High | Low | Low | O29 |
| 30 | Low | High | High | High | Low | High | O30 |
| 31 | Low | High | High | High | High | Low | O31 |
| 32 | Low | High | High | High | High | High | O32 |
MultiLexANFIS rules (33–64)
| Rules | Input1 | Input2 | Input3 | Input4 | Input5 | Input6 | Output |
|---|---|---|---|---|---|---|---|
| 33 | High | Low | Low | Low | Low | Low | O33 |
| 34 | High | Low | Low | Low | Low | High | O34 |
| 35 | High | Low | Low | Low | High | Low | O35 |
| 36 | High | Low | Low | Low | High | High | O36 |
| 37 | High | Low | Low | High | Low | Low | O37 |
| 38 | High | Low | Low | High | Low | High | O38 |
| 39 | High | Low | Low | High | High | Low | O39 |
| 40 | High | Low | Low | High | High | High | O40 |
| 41 | High | Low | High | Low | Low | Low | O41 |
| 42 | High | Low | High | Low | Low | High | O42 |
| 43 | High | Low | High | Low | High | Low | O43 |
| 44 | High | Low | High | Low | High | High | O44 |
| 45 | High | Low | High | High | Low | Low | O45 |
| 46 | High | Low | High | High | Low | High | O46 |
| 47 | High | Low | High | High | High | Low | O47 |
| 48 | High | Low | High | High | High | High | O48 |
| 49 | High | High | Low | Low | Low | Low | O49 |
| 50 | High | High | Low | Low | Low | High | O50 |
| 51 | High | High | Low | Low | High | Low | O51 |
| 52 | High | High | Low | Low | High | High | O52 |
| 53 | High | High | Low | High | Low | Low | O53 |
| 54 | High | High | Low | High | Low | High | O54 |
| 55 | High | High | Low | High | High | Low | O55 |
| 56 | High | High | Low | High | High | High | O56 |
| 57 | High | High | High | Low | Low | Low | O57 |
| 58 | High | High | High | Low | Low | High | O58 |
| 59 | High | High | High | Low | High | Low | O59 |
| 60 | High | High | High | Low | High | High | O60 |
| 61 | High | High | High | High | Low | Low | O61 |
| 62 | High | High | High | High | Low | High | O62 |
| 63 | High | High | High | High | High | Low | O63 |
| 64 | High | High | High | High | High | High | O64 |
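The 64 antecedents tabulated above are simply every Low/High assignment to the six inputs (2^6 = 64), with the rightmost input varying fastest. A short sketch of how such a rule base can be enumerated programmatically (the dictionary fields are illustrative, not from the paper):

```python
from itertools import product

def build_rule_base(n_inputs=6, levels=("Low", "High")):
    """Enumerate every Low/High antecedent combination, numbering rules
    1..2**n_inputs and labelling outputs O1..O64 as in the tables above."""
    rules = []
    for i, antecedent in enumerate(product(levels, repeat=n_inputs), start=1):
        rules.append({"rule": i, "antecedent": antecedent, "output": f"O{i}"})
    return rules

rules = build_rule_base()
print(len(rules))               # 64 combinations
print(rules[0]["antecedent"])   # rule 1: all inputs Low
```

Because `itertools.product` varies the last position fastest, rule 2 differs from rule 1 only in Input6, matching the ordering of the tables.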
Fig. 2 MultiLexANFIS architecture with 64 rules
Fig. 3 Overall process of our proposed ANFIS for social sentiment analysis
VADER lexicon-based ANFIS
| Datasets | TRIMF | GBELL | GAUSS |
|---|---|---|---|
| Root mean square error (RMSE) | | | |
| Apple | 0.46213 | 0.45887 | |
| Nuclear | 0.37404 | 0.35645 | |
| Sanders | 0.43458 | 0.43683 | |
| SemEval | 0.47819 | 0.47825 | |
| SemEval | 0.45249 | 0.45197 | |
| SemEval | 0.45301 | 0.45266 | |
| STS (Go et al.) | 0.37445 | 0.36976 | |
| Airline | 0.39421 | 0.38901 | |
| Twitter_2019 | 0.44973 | 0.43371 | |
| | 0.43592 | 0.39223 | |
Bold values are the lowest RMSE values for each dataset
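All tables in this section report root mean square error between the model's predicted class score and the ground-truth label. A minimal sketch of the metric, with made-up predictions and targets (not values from the paper):

```python
import math

def rmse(predicted, actual):
    """Root mean square error over paired predictions and targets."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Hypothetical example: three tweets with binary targets
print(rmse([0.9, 0.1, 0.4], [1.0, 0.0, 1.0]))
```

Lower RMSE indicates predictions closer to the target labels, which is why the best entry per dataset is the smallest value in each row.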
AFINN lexicon-based ANFIS
| Datasets | TRIMF | GBELL | GAUSS |
|---|---|---|---|
| Root mean square error (RMSE) | | | |
| Apple | 0.48046 | 0.50781 | |
| Nuclear | 0.4105 | 0.40527 | |
| Sanders | 0.45402 | 0.44803 | |
| SemEval | 0.47486 | 0.47449 | |
| SemEval | 0.45577 | 0.45535 | |
| SemEval | 0.44567 | 0.44541 | |
| STS (Go et al.) | 0.40142 | 0.39323 | |
| Airline | 0.38994 | 0.38982 | |
| Twitter_2019 | 0.42823 | 0.42991 | |
| | 0.43172 | 0.45053 | |
Bold values are the lowest RMSE values for each dataset
SENTIWORDNET lexicon-based ANFIS
| Datasets | TRIMF | GBELL | GAUSS |
|---|---|---|---|
| Root mean square error (RMSE) | | | |
| Apple | 0.4878 | 0.4866 | |
| Nuclear | 0.36638 | 0.36921 | |
| Sanders | 0.45498 | 0.45341 | |
| SemEval | 0.49336 | 0.49294 | |
| SemEval | 0.4526 | 0.45576 | |
| SemEval | 0.48483 | 0.48484 | |
| STS (Go et al.) | 0.42831 | 0.42352 | |
| Airline | 0.39752 | 0.39623 | |
| Twitter_2019 | 0.43542 | 0.43032 | |
| | 0.46493 | 0.45302 | |
Bold values are the lowest RMSE values for each dataset
Comparison of VADER lexicon-based ANFIS with VADER-specific methods
| Dataset | Methods | RMSE |
|---|---|---|
| Apple | VADER SA (Hutto and Gilbert) | 0.6127 |
| | Fuzzy rule (Vashishtha and Susan) | 0.6223 |
| | Vader-ANFIS | |
| Nuclear | VADER SA (Hutto and Gilbert) | 0.8013 |
| | Fuzzy rule (Vashishtha and Susan) | 0.4168 |
| | Vader-ANFIS | |
| Sanders | VADER SA (Hutto and Gilbert) | 0.6353 |
| | Fuzzy rule (Vashishtha and Susan) | 0.5593 |
| | Vader-ANFIS | |
| SemEval | VADER SA (Hutto and Gilbert) | 0.6137 |
| | Fuzzy rule (Vashishtha and Susan) | 0.6892 |
| | Vader-ANFIS | |
| SemEval | VADER SA (Hutto and Gilbert) | 0.6309 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8078 |
| | Vader-ANFIS | |
| SemEval | VADER SA (Hutto and Gilbert) | 0.5764 |
| | Fuzzy rule (Vashishtha and Susan) | 0.689 |
| | Vader-ANFIS | |
| STS (Go et al.) | VADER SA (Hutto and Gilbert) | 0.4251 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7525 |
| | Vader-ANFIS | |
| Airline | VADER SA (Hutto and Gilbert) | 0.5012 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8552 |
| | Vader-ANFIS | |
| Twitter_2019 | VADER SA (Hutto and Gilbert) | 0.5296 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7949 |
| | Vader-ANFIS | |
| | VADER SA (Hutto and Gilbert) | 0.4648 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7765 |
| | Vader-ANFIS | |
Bold values are the lowest RMSE values for each dataset
Comparison of AFINN lexicon-based ANFIS with AFINN-specific methods
| Dataset | Methods | RMSE |
|---|---|---|
| Apple | AFINN SA (Nielsen) | 0.6112 |
| | Fuzzy rule (Vashishtha and Susan) | 0.6402 |
| | AFINN-ANFIS | |
| Nuclear | AFINN SA (Nielsen) | 0.846 |
| | Fuzzy rule (Vashishtha and Susan) | 0.4413 |
| | AFINN-ANFIS | |
| Sanders | AFINN SA (Nielsen) | 0.6235 |
| | Fuzzy rule (Vashishtha and Susan) | 0.5634 |
| | AFINN-ANFIS | |
| SemEval | AFINN SA (Nielsen) | 0.6127 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7046 |
| | AFINN-ANFIS | |
| SemEval | AFINN SA (Nielsen) | 0.6591 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8303 |
| | AFINN-ANFIS | |
| SemEval | AFINN SA (Nielsen) | 0.5649 |
| | Fuzzy rule (Vashishtha and Susan) | 0.718 |
| | AFINN-ANFIS | |
| STS (Go et al.) | AFINN SA (Nielsen) | 0.4321 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8287 |
| | AFINN-ANFIS | |
| Airline | AFINN SA (Nielsen) | 0.532 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8819 |
| | AFINN-ANFIS | |
| Twitter_2019 | AFINN SA (Nielsen) | 0.5444 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8106 |
| | AFINN-ANFIS | |
| | AFINN SA (Nielsen) | 0.483 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8001 |
| | AFINN-ANFIS | |
Bold values are the lowest RMSE values for each dataset
Comparison of SENTIWORDNET lexicon-based ANFIS with SENTIWORDNET-specific methods
| Dataset | Methods | RMSE |
|---|---|---|
| Apple | Cavalcanti (Cavalcanti et al.) | 0.7437 |
| | Ortega (Ortega et al.) | 0.6449 |
| | Fuzzy rule (Vashishtha and Susan) | 0.655 |
| | SENTIWORDNET-ANFIS | |
| Nuclear | Cavalcanti (Cavalcanti et al.) | 0.9205 |
| | Ortega (Ortega et al.) | 0.8553 |
| | Fuzzy rule (Vashishtha and Susan) | 0.4168 |
| | SENTIWORDNET-ANFIS | |
| Sanders | Cavalcanti (Cavalcanti et al.) | 0.795 |
| | Ortega (Ortega et al.) | 0.6202 |
| | Fuzzy rule (Vashishtha and Susan) | 0.5655 |
| | SENTIWORDNET-ANFIS | |
| SemEval | Cavalcanti (Cavalcanti et al.) | 0.6892 |
| | Ortega (Ortega et al.) | 0.6693 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7176 |
| | SENTIWORDNET-ANFIS | |
| SemEval | Cavalcanti (Cavalcanti et al.) | 0.554 |
| | Ortega (Ortega et al.) | 0.7164 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8115 |
| | SENTIWORDNET-ANFIS | |
| SemEval | Cavalcanti (Cavalcanti et al.) | 0.6839 |
| | Ortega (Ortega et al.) | 0.6619 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7222 |
| | SENTIWORDNET-ANFIS | |
| STS (Go et al.) | Cavalcanti (Cavalcanti et al.) | 0.5226 |
| | Ortega (Ortega et al.) | 0.6555 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7418 |
| | SENTIWORDNET-ANFIS | |
| Airline | Cavalcanti (Cavalcanti et al.) | 0.4629 |
| | Ortega (Ortega et al.) | 0.6804 |
| | Fuzzy rule (Vashishtha and Susan) | 0.8552 |
| | SENTIWORDNET-ANFIS | |
| Twitter_2019 | Cavalcanti (Cavalcanti et al.) | 0.5654 |
| | Ortega (Ortega et al.) | 0.7703 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7949 |
| | SENTIWORDNET-ANFIS | |
| | Cavalcanti (Cavalcanti et al.) | 0.5224 |
| | Ortega (Ortega et al.) | 0.7256 |
| | Fuzzy rule (Vashishtha and Susan) | 0.7765 |
| | SENTIWORDNET-ANFIS | |
Bold values are the lowest RMSE values for each dataset
MultiLexANFIS: multiple-lexicon-based ANFIS
| Datasets | Least among Lex | TRIMF | GBELL | GAUSS |
|---|---|---|---|---|
| | Lex MF RMSE | RMSE | RMSE | RMSE |
| Apple | V Gbell—0.457 | 0.475 | 0.452 | |
| Nuclear | V Gauss—0.332 | 0.374 | 0.356 | |
| Sanders | V Trimf—0.435 | 0.494 | 0.519 | |
| SemEval | A Trimf—0.474 | 0.474 | 0.473 | |
| SemEval | V Trimf—0.452 | 0.456 | | |
| SemEval | A Gauss—0.445 | 0.447 | 0.445 | |
| STS (Go et al.) | V Gauss—0.369 | 0.380 | 0.376 | |
| Airline | A Gbell—0.3873 | 0.3860 | 0.3859 | |
| Twitter_2019 | A Gbell—0.4269 | 0.4159 | 0.4127 | |
| | V Gbell—0.4036 | 0.4104 | 0.4377 | |
Bold values are the lowest RMSE value for each dataset. V stands for VADER and A stands for AFINN
Fig. 4 Training error for Apple dataset with gaussmf in MultiLexANFIS
Fig. 5 Test error for Apple dataset with gaussmf in MultiLexANFIS (red dots indicate the predicted values for testing data) (color figure online)
Comparison of MultiLexANFIS with the state of the art
| Datasets | Least among Single-Lex ANFIS | MultiLexANFIS | RNN (Nemes and Kiss 2021) | LSTM (Rahman et al.) | n-gram SVM (Tripathy et al.) | VADER—fuzzy rule (Vashishtha and Susan) | AFINN—fuzzy rule (Vashishtha and Susan) | SWN—fuzzy rule (Vashishtha and Susan) | VADER SA (Hutto and Gilbert) | AFINN SA (Nielsen) | SWN SA (Cavalcanti et al.) | SWN SA (Ortega et al.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Lex MF RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE | RMSE |
| Apple | V Gbell—0.457 | | 0.5176 | 0.5309 | 0.6689 | 0.6223 | 0.6402 | 0.655 | 0.6127 | 0.6112 | 0.7437 | 0.6449 |
| Nuclear | V Gauss—0.332 | | 0.4699 | 0.456 | 0.3245 | 0.4168 | 0.4413 | 0.4168 | 0.8013 | 0.846 | 0.9205 | 0.8553 |
| Sanders | V Trimf—0.435 | | 0.4856 | 0.4911 | 0.5527 | 0.5593 | 0.5634 | 0.5655 | 0.6353 | 0.6235 | 0.795 | 0.6202 |
| SemEval | A Trimf—0.474 | | 0.514 | 0.5254 | 0.6978 | 0.6892 | 0.7046 | 0.7176 | 0.6137 | 0.6127 | 0.6892 | 0.6693 |
| SemEval | V Trimf—0.452 | | 0.4632 | 0.4661 | 0.5462 | 0.8078 | 0.8303 | 0.8115 | 0.6309 | 0.6591 | 0.554 | 0.7164 |
| SemEval | A Gauss—0.445 | | 0.5615 | 0.5452 | 0.6909 | 0.6890 | 0.7180 | 0.7222 | 0.5764 | 0.5649 | 0.6839 | 0.6619 |
| STS (Go et al.) | V Gauss—0.369 | | 0.5262 | 0.5244 | 0.5416 | 0.7525 | 0.8287 | 0.7418 | 0.4251 | 0.4321 | 0.5226 | 0.6555 |
| Airline | A Gbell—0.3873 | | 0.4417 | 0.4383 | 0.4569 | 0.8552 | 0.8819 | 0.8816 | 0.5012 | 0.5320 | 0.4629 | 0.6804 |
| Twitter_2019 | A Gbell—0.4269 | | 0.5241 | 0.5275 | 0.5629 | 0.7949 | 0.8106 | 0.794 | 0.5296 | 0.5444 | 0.5654 | 0.7703 |
| | V Gbell—0.4036 | | 0.5662 | 0.5627 | 0.5723 | 0.7765 | 0.8001 | 0.762 | 0.4648 | 0.483 | 0.5224 | 0.7256 |
Bold values are the lowest RMSE value for each dataset. V stands for VADER and A stands for AFINN