Mayur Wankhade, Annavarapu Chandra Sekhara Rao.
Abstract
Social media platforms significantly increase general awareness of disease severity and inform community members about preventive measures. This study aims to identify public opinion on Covid-19 through tweets and to investigate public sentiment in the country over the period. The article proposes a novel method for sentiment analysis of coronavirus-related tweets using an ensemble learning model that combines bidirectional encoder representations from transformers (BERT) with bi-directional long short-term memory (Bi-LSTM). The proposed approach consists of two stages. In the first stage, the BERT model acquires domain knowledge from Covid-19 data and is fine-tuned with a sentiment word dictionary. In the second stage, the Bi-LSTM model processes the data in both directions, preserving contextual sequence dependencies, to classify the sentiment. Finally, the ensemble technique combines both models to classify the sentiment into positive and negative categories. The proposed method achieves better results than the state-of-the-art methods. Moreover, the proposed model efficiently captures public opinion on the Twitter platform, which can aid in formulating, monitoring and regulating public health policies during a pandemic.
Year: 2022 PMID: 36224328 PMCID: PMC9555259 DOI: 10.1038/s41598-022-21604-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Comparative analysis of different methods.
| Reference | Methods | Techniques applied | Dataset | Task | Limitation |
|---|---|---|---|---|---|
| Barkur et al.[ | Word frequency | Only a word cloud was used for the analysis, built with the R software | Covid-19 Twitter | To investigate Indian public opinion following the government-ordered lockdown | Depends only on word frequency counts |
| Samuel et al.[ | Machine learning | Classifiers such as naive Bayes and logistic regression were used | Covid-19 Twitter | Assessed sentiment based on single-keyword monitoring, focused solely on the dread of people in the United States | Depends on word frequency counts; no context relations are considered |
| Hamzah et al.[ | Lexicon polarity | Predictive modeling with the susceptible-exposed-infectious-recovered (SEIR) model | Corona tracker website | To track the economic and health impacts on people as described on the corona tracker website | Polarity of Covid-19-related keywords is neutral in the majority of cases |
| Abd-Alrazaq et al.[ | Machine learning | Unigrams and bigrams were used to evaluate tweets, while latent Dirichlet allocation was used for topic modeling | Covid-19 Twitter | To derive precise themes by analyzing the major subjects tweeted by netizens regarding the Covid-19 pandemic | Does not consider context relations |
| Lwin et al.[ | Lexicon based | A lexicon-based model analyzed the expression of various emotions about Covid-19 | Covid-19 Twitter | Examined global trends in the expression of diverse emotions during the Covid-19 pandemic | The Covid-19 situation depends on time and country |
| Raamkumar et al.[ | Lexicon based | Recognizes public health authorities' communication methods for measuring public opinion and responses on social media | | To analyze public health authorities' outreach activities to the public on Covid-19 | Facebook data not related to the Covid-19 context |
| Liu et al.[ | Hybrid method | Categorizing contextual awareness via social media during the Covid-19 pandemic | Covid-19 Twitter | Investigating the effects of Covid-19 on people's mental health to aid policy and provide services to infected communities | Concerns were largely about health care |
| Satu et al.[ | Machine learning | A classification- and clustering-based approach examined subjects relevant to Covid-19 | Covid-19 Twitter | Classified and examined sentiment relevant to Covid-19 by topic | Very few tweets were identified from a single country |
| Wang et al.[ | Machine learning | Covid-19 requires authorities and stakeholders to communicate about risks and crises | Covid-19 Twitter | Evaluated the players' risk and crisis communication on Twitter | Depends on time and country |
| Su et al.[ | Machine learning | LDA topic modeling was used to identify and track persistent difficulties | Covid-19 Twitter | Classified tweets according to each country's socio-economic condition | Only topic dependence is considered |
| Basiri et al.[ | Deep learning | Fusion-based deep learning model | Covid-19 Twitter | Analyzed opinion on social media for regulating and eradicating the condition | Does not cover opinion features |
| Proposed work | Transfer + Deep learning | BERT model for domain knowledge adaptation and Bi-LSTM ensemble method | Covid-19 Twitter | Opinion analysis on Covid-19-related tweets over time for a public opinion analysis system | |
Figure 1. The architecture of the BERT-Bi-LSTM ensemble model for opinion analysis.
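
The abstract states that an ensemble technique combines the BERT and Bi-LSTM models, but this record does not say how the two outputs are merged. The sketch below illustrates one common option, weighted soft voting over class probabilities. The combination rule, the weight `w_bert`, the function names, and the example probabilities are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# Binary label order assumed for this sketch: index 0 = negative, 1 = positive.
LABELS = ("negative", "positive")


def soft_vote(p_bert: Sequence[float], p_bilstm: Sequence[float],
              w_bert: float = 0.5) -> List[float]:
    """Weighted average of two class-probability vectors (soft voting)."""
    if len(p_bert) != len(p_bilstm):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= w_bert <= 1.0:
        raise ValueError("w_bert must lie in [0, 1]")
    return [w_bert * a + (1.0 - w_bert) * b for a, b in zip(p_bert, p_bilstm)]


def ensemble_label(p_bert: Sequence[float], p_bilstm: Sequence[float],
                   w_bert: float = 0.5) -> str:
    """Return the label with the highest combined probability."""
    combined = soft_vote(p_bert, p_bilstm, w_bert)
    best = max(range(len(combined)), key=combined.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    # Hypothetical per-model outputs for a single tweet.
    bert_probs = [0.35, 0.65]    # BERT leans positive
    bilstm_probs = [0.55, 0.45]  # Bi-LSTM leans slightly negative
    print(ensemble_label(bert_probs, bilstm_probs))
```

With equal weights, the more confident model decides a disagreement; raising `w_bert` would favour the BERT branch instead.
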
Most frequent hashtags used to collect data from Twitter.
| Aspect | Related hashtag (#) |
|---|---|
| Covid-19 | COVID19, corona, covid, coronavirusindia, coronavirus, IndiaFightsCorona, virus, Covid-19 |
| Vaccination | firstdose, vaccine, CoWin, seconddose, COVAXIN, Covishield, sputnikvaccine, vaccineregistration, Modernavaccine, Novavaxvaccine, COVID-19vaccine |
| Lockdown | covidprotocol, adversary, quarantine-life, quarantine, stayhome, stayhomestaysafe, MentalHealthAwarenes, coronawarrior, SocialDistancing, StayHome, StayAtHome, SocialDistanacing, WorkFromHome, washyourhands, BackToWork, SayNoToMasks, lockdown2021, CoronaLockdown, StayAtHomeAndStaySafe, lockdownguideline |
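
As an illustration of how a hashtag table like this could be used to assign collected tweets to aspects, here is a minimal sketch. The matching rule (case-insensitive lookup of hashtags), the names `ASPECT_HASHTAGS` and `aspects_of`, and the sample tweet are assumptions; only a subset of the hashtags listed above is included.

```python
import re

# Subset of the hashtags from the table above, lowercased for matching.
ASPECT_HASHTAGS = {
    "Covid-19": {"covid19", "corona", "covid", "coronavirus"},
    "Vaccination": {"vaccine", "firstdose", "seconddose", "covaxin", "covishield"},
    "Lockdown": {"quarantine", "stayhome", "socialdistancing", "workfromhome"},
}

# A hashtag is '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def aspects_of(tweet: str) -> set:
    """Return every aspect whose hashtag set overlaps the tweet's hashtags."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    return {aspect for aspect, known in ASPECT_HASHTAGS.items() if tags & known}


if __name__ == "__main__":
    # Hypothetical tweet, not taken from the dataset.
    print(sorted(aspects_of("Got my #FirstDose today, #StayHome everyone")))
```

A tweet can carry hashtags from several rows, so the function returns a set of aspects rather than a single label.
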
Sample examples from the Covid-19 data collection.
| Sr. No. | Sample example | Category |
|---|---|---|
| 1. | Stay safe stay home. | Neutral |
| 2. | We can fight against COVID-19 and beat him. | Positive |
| 3. | It is vital to follow the guideline to prevent the spread of COVID-19. | Positive |
| 4. | Can Zinc medicine treat corona, Don’t believe it. | Negative |
| 5. | Covid-19 is everyone’s fight. but nothing like this before happened. | Negative |
Statistics of the collected datasets.
| Dataset | Total number of tweets | Positive word count | Negative word count | Neutral word count | Average word length |
|---|---|---|---|---|---|
| Covid-19 | 465,639 | 245,983 | 223,441 | 345,350 | 15.07 |
| Vaccination | 122,521 | 61,109 | 4,242 | 5,450 | 11.02 |
| Lockdown | 380,359 | 180,311 | 245,780 | 105,790 | 16.38 |
Figure 2. Month-wise analysis of the overall public opinion discussed during the Covid-19 pandemic.
Figure 3. Graphical visualization of Covid-19 cases; the report records new cases, new tests, positive rate, and reproduction rate over the period.
Comparison of the proposed model with baseline methods on the Covid-19 dataset.
| Model | Classifier | CV training | CV testing | Accuracy | Time taken (s) |
|---|---|---|---|---|---|
| BoW | Naive Bayes | 68.141 | 58.147 | 61.365 | 12.36 |
| BoW | Support vector machine | 82.451 | 70.027 | 72.895 | 16.25 |
| BoW | Random forest | 74.956 | 95.360 | 62.698 | 123.22 |
| BoW | Logistic regression | 66.254 | 76.589 | 71.587 | 17.26 |
| N-gram | Naive Bayes | 70.640 | 65.107 | 66.160 | 14.09 |
| N-gram | Support vector machine | 84.051 | 73.065 | 74.290 | 18.20 |
| N-gram | Random forest | 76.956 | 95.360 | 65.018 | 133.08 |
| N-gram | Logistic regression | 73.057 | 74.049 | 72.527 | 21.23 |
| word2vec | LSTM | 75.026 | 70.568 | 73.897 | 27.14 |
| GloVe | LSTM | 76.714 | 72.358 | 74.106 | 30.47 |
| BERT | Bi-LSTM | 82.914 | 80.478 | 82.479 | 26.45 |
| Proposed | BERT-Bi-LSTM ensemble | 86.041 | 86.450 | 86.139 | 26.40 |
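
The "CV training" and "CV testing" columns are not defined in this record. One plausible reading is the mean accuracy on the training folds and on the held-out folds of a k-fold cross-validation. The sketch below shows that computation in plain Python. The number of folds, the contiguous fold layout, and the toy majority-class classifier are assumptions made for illustration only.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Percentage of predictions that match the true labels."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(X: Sequence, y: Sequence,
                   fit: Callable[[list, list], Callable], k: int = 5):
    """Return (mean training accuracy, mean held-out accuracy) over k folds."""
    train_scores, test_scores = [], []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        train_scores.append(accuracy([y[i] for i in train],
                                     [predict(X[i]) for i in train]))
        test_scores.append(accuracy([y[i] for i in test],
                                    [predict(X[i]) for i in test]))
    return mean(train_scores), mean(test_scores)


def fit_majority(X_train: list, y_train: list) -> Callable:
    """Toy stand-in classifier: always predicts the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority
```

Any real model can be plugged in by passing a `fit` function that returns a `predict` callable; a training score well above the held-out score would suggest overfitting.
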
Comparative results of the proposed model and several variants on the Covid-19 dataset.
| Method | Positive: Precision | Positive: Recall | Positive: F-measure | Negative: Precision | Negative: Recall | Negative: F-measure |
|---|---|---|---|---|---|---|
| BERT | 74.14 | 75.07 | 75.14 | 74.47 | 75.14 | 75.47 |
| Bi-LSTM | 72.41 | 73.48 | 73.45 | 74.09 | 75.01 | 74.89 |
| SFT-BERT | 78.04 | 78.58 | 78.65 | 75.09 | 76.02 | 76.14 |
| BERT+LSTM | 79.74 | 79.89 | 79.95 | 79.18 | 80.14 | 80.49 |
| SFT-BERT+LSTM | 83.47 | 84.75 | 85.66 | 83.14 | 84.01 | 84.17 |
| SFT-BERT+Bi-LSTM | 86.04 | 86.45 | 86.13 | 84.01 | 85.47 | 85.78 |
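
For reference, per-class precision, recall, and F-measure can be computed from true and predicted labels as sketched below. The sketch assumes that "F-measure" denotes the standard F1 score, the harmonic mean of precision and recall; this record does not state which definition the authors used. The function name and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str],
                     label: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) in percent for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


if __name__ == "__main__":
    # Hypothetical labels, not taken from the dataset.
    truth = ["pos", "pos", "neg", "neg", "pos"]
    preds = ["pos", "neg", "neg", "pos", "pos"]
    for cls in ("pos", "neg"):
        print(cls, per_class_scores(truth, preds, cls))
```

Under this definition the F1 score always lies between precision and recall.
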
Aspect attention-based sentiment classification using the proposed method.
| Aspect | Positive: Precision | Positive: Recall | Positive: F-measure | Negative: Precision | Negative: Recall | Negative: F-measure |
|---|---|---|---|---|---|---|
| Covid-19 | 82.01 | 81.14 | 81.47 | 85.19 | 85.94 | 86.01 |
| Lockdown | 85.11 | 85.39 | 85.65 | 83.47 | 84.22 | 84.32 |
| Vaccination | 87.66 | 87.96 | 87.82 | 82.14 | 83.01 | 83.16 |
| Health | 85.14 | 85.36 | 85.47 | 85.77 | 85.89 | 85.44 |
| Quarantine | 82.14 | 83.01 | 83.16 | 85.42 | 86.31 | 86.11 |
| Safety | 83.47 | 84.12 | 84.32 | 87.06 | 87.26 | 87.82 |
| Policy | 84.38 | 84.79 | 84.58 | 82.04 | 83.01 | 83.16 |
| Guideline | 85.47 | 85.89 | 85.45 | 85.14 | 85.36 | 85.37 |
| Overall | 86.04 | 86.45 | 86.13 | 84.01 | 85.47 | 85.78 |
Figure 4. Month-wise comparative results of the proposed method during the pandemic period.
Figure 5. Overall public sentiment classification on various aspects discussed during the pandemic period.
Figure 6. Public opinion analysis on vaccination discussed during the pandemic period.
Figure 7. Public opinion analysis on lockdown discussed during the pandemic period.
Figure 8. The number of people vaccinated over the period.
Figure 9. The ratio of new confirmed Covid-19 cases and deaths reported over the period.
Figure 10. The most frequently used words discussed during the pandemic period.