Mahendra Kumar Gourisaria¹, Satish Chandra¹, Himansu Das¹, Sudhansu Shekhar Patra², Manoj Sahni³, Ernesto Leon-Castro⁴, Vijander Singh⁵,⁶, Sandeep Kumar⁷
Abstract
The spread of coronavirus disease (COVID-19) took a heavy toll on the social, healthcare, economic, and psychological well-being of people. Over the past few months, many organizations, individuals, and governments have used Twitter to express their sentiments about COVID-19, the lockdown, the pandemic, and related hashtags. This paper analyzes the psychological reactions and discourse of Twitter users related to COVID-19. Latent Dirichlet Allocation (LDA) is used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and several classifiers, namely random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with a stochastic gradient descent optimizer, and a majority voting classifier, are applied to determine sentiment polarity. These approaches, together with LDA modeling, have been tested, validated, and compared on several benchmark datasets and on a newly collected dataset. To obtain better results, a dual-dataset approach is used to determine the frequency of positive and negative tweets and to build word clouds, which helps identify the most effective model for analyzing the corpora. The experimental results show that the BiLSTM approach outperforms the other approaches on the first dataset, with an accuracy of 96.7%.
Keywords: BiLSTM; COVID-19 sentiment analysis; Latent Dirichlet Allocation (LDA); natural language processing; topic modeling
Year: 2022; PMID: 35628018; PMCID: PMC9141192; DOI: 10.3390/healthcare10050881
Source DB: PubMed; Journal: Healthcare (Basel); ISSN: 2227-9032
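The abstract lists logistic regression trained with a stochastic gradient descent optimizer (LR-SGDC) among the classifiers. A minimal sketch in plain Python, assuming binary bag-of-words features and per-sample SGD on the log-loss; the vocabulary, learning rate and epoch count are hypothetical. The name LR-SGDC suggests a library estimator such as scikit-learn's `SGDClassifier` with a logistic loss, but this record does not confirm the authors' implementation.

```python
import math
import random


def train_lr_sgd(X, y, lr=0.5, epochs=200, seed=0):
    """Fit logistic regression by stochastic gradient descent on the log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of the positive class
            err = p - y[i]                   # derivative of the log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Label 1 if the predicted probability exceeds 0.5, i.e. if the logit is positive."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


# Hypothetical vocabulary; each feature is 1 if the word occurs in the tweet.
VOCAB = ["good", "safe", "death", "toll"]
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_lr_sgd(X, y)
print(dict(zip(VOCAB, (round(wj, 2) for wj in w))), round(b, 2))
```

On this separable toy data the learned weights are positive for "good" and "safe" and negative for "death" and "toll"; a real setup would add regularisation and a much larger vocabulary.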
Summary of the datasets.
| Dataset | Positive | Negative | Total |
|---|---|---|---|
| First Dataset | 80,844 | 96,702 | 177,546 |
| Second Dataset | 727 | 2363 | 3090 |
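A quick arithmetic check of this table, using only the counts it lists: the Positive and Negative counts of the first dataset sum to 177,546, and the second dataset is markedly more imbalanced (about 23.5% positive versus about 45.5% for the first). The helper name `summarize` is illustrative.

```python
# Positive/negative counts copied from the dataset summary table.
DATASETS = {"First Dataset": (80_844, 96_702), "Second Dataset": (727, 2_363)}


def summarize(pos, neg):
    """Return the total number of tweets and the percentage that are positive."""
    total = pos + neg
    return total, 100 * pos / total


for name, (pos, neg) in DATASETS.items():
    total, pos_share = summarize(pos, neg)
    print(f"{name}: {total:,} tweets, {pos_share:.1f}% positive")
```

The class imbalance of the second dataset is one reason to read its accuracy alongside the class-balanced metrics reported later.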
Figure 1. Workflow of COVID-19 sentiment analysis.
Figure 2. Frequency of positive and negative tweets in (a) the first dataset and (b) the second dataset.
Figure 3. Word cloud of the dataset.
Figure 4. Plate notation for LDA; grey nodes denote observed variables.
Figure 5. LSTM memory cell.
Figure 6. A bidirectional LSTM network.
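The memory cell of Figure 5 and the bidirectional arrangement of Figure 6 can be sketched as a forward pass in plain Python. This assumes the standard gate equations (forget, input, candidate, output) with small random weights and zero biases; the hidden size, initialisation range and the names `LSTMCell` and `bilstm_encode` are illustrative, not the authors' configuration. A trained model would normally be built with a deep-learning framework (for example a Keras `Bidirectional` wrapper around an `LSTM` layer) and followed by a dense output layer for the sentiment label.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Forward pass of one LSTM memory cell (forget, input, candidate, output gates)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("f", "i", "g", "o")
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [
            sum(wv * xv for wv, xv in zip(W[r], x)) + sum(uv * hv for uv, hv in zip(U[r], h)) + b[r]
            for r in range(self.hidden_size)
        ]

    def step(self, x, h, c):
        f = [sigmoid(v) for v in self._affine("f", x, h)]    # forget gate
        i = [sigmoid(v) for v in self._affine("i", x, h)]    # input gate
        g = [math.tanh(v) for v in self._affine("g", x, h)]  # candidate cell state
        o = [sigmoid(v) for v in self._affine("o", x, h)]    # output gate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Feed a sequence of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def bilstm_encode(sequence, forward_cell, backward_cell):
    """Concatenate the final states of a left-to-right and a right-to-left pass."""
    return forward_cell.run(sequence) + backward_cell.run(list(reversed(sequence)))


# Toy usage: three time steps of 4-dimensional "word embeddings".
tokens = [[0.2, -0.1, 0.5, 0.0], [0.7, 0.3, -0.2, 0.1], [-0.4, 0.6, 0.1, 0.2]]
encoding = bilstm_encode(tokens, LSTMCell(4, 3, seed=1), LSTMCell(4, 3, seed=2))
print(len(encoding))  # 6 = 2 directions x hidden size 3
```

The backward cell reads the tweet from its last token to its first, so the concatenated vector carries context from both directions.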
Figure 7. Top 10 most frequent words for the mined dataset.
Figure 8. Top 10 most frequent words for the second dataset.
Figure 9. Top 10 most frequent words for the first dataset.
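Word-frequency rankings like those in Figures 7 to 9 can be computed with a simple token counter. A minimal sketch, assuming lower-casing, removal of URLs and @mentions, alphabetic tokens only (so `#COVID-19` becomes `covid`) and a hypothetical, deliberately tiny stop-word list; the authors' actual preprocessing is not described in this record.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on", "rt"}


def tokenize(tweet):
    """Lower-case, strip URLs and @mentions, and keep alphabetic tokens only."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return [tok for tok in re.findall(r"[a-z]+", tweet) if tok not in STOP_WORDS]


def top_k_words(tweets, k=10):
    """Return the k most frequent tokens as (word, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(k)


tweets = [
    "#COVID-19 death toll breaches 900-mark in #Odisha",
    "#Odisha: 15 more patients succumb to #COVID-19, death toll crosses 900-mark",
]
print(top_k_words(tweets, k=5))
```

The same counts can feed a word cloud such as Figure 3, where font size is proportional to frequency.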
Most prominent words using LDA for the mined dataset.
| Topics | Top 10 Most Prominent Words |
|---|---|
| 1 | cases, trump, deaths, new, positive, total, india, active, hospital, pandemic |
| 2 | trump, people, mask, just, like, president, americans, masks, caughttrump, gop |
| 3 | pandemic, health, people, world, day, like, time, positive, test, trump |
| 4 | testing, know, test, pandemic, china, positive, days, nfl, tests, world |
| 5 | trump, news, place, good, starting, waiting, hurts, ends, jump, pockets |
Most prominent words using LDA for the first dataset.
| Topics | Top 10 Most Prominent Words |
|---|---|
| 1 | india, local, lets, pm, narendramodi, lockdown, app, month, buy, time |
| 2 | fight, narendramodi, world, india, people, like, home, pm, stay, doing |
| 3 | cases, india, new, positive, deaths, total, people, delhi, number, tested |
| 4 | lockdown, people, pandemic, shri, follow, india, fight, rs, food, time |
| 5 | lockdown, govt, india, sir, pmoindia, just, like, people, health, pandemic |
Most prominent words using LDA for the second dataset.
| Topics | Top 10 Most Prominent Words |
|---|---|
| 1 | twitter, pic, wajid, away, music, suffering, khan, world, sajid, people |
| 2 | people, twitter, pic, good, time, day, sir, trump, virus, bad |
| 3 | people, just, cases, government, virus, going, like, need, help, days |
| 4 | people, twitter, india, shit, world, cases, pic, like, virus, death |
| 5 | people, home, trump, govt, stay, india, going, time, work, safe |
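The three topic tables list the ten highest-weighted words for each of five LDA topics. A minimal sketch of how such lists can be produced, swapping in collapsed Gibbs sampling as the inference method; this record does not state which inference procedure, Dirichlet priors (`alpha`, `beta`) or preprocessing the authors used, so those values and the toy documents are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the vocabulary and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic from the full conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=10):
    """Highest-count words per topic, analogous to the 'Top 10' tables."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


docs = [
    "trump mask president masks trump".split(),
    "cases deaths india positive total".split(),
    "mask trump president people masks".split(),
    "india cases positive deaths hospital".split(),
]
vocab, nkw = lda_gibbs(docs, n_topics=2, n_iter=100)
for k, words in enumerate(top_words(vocab, nkw, n=4), start=1):
    print(k, ", ".join(words))
```

Library implementations (for example gensim or scikit-learn) typically use variational inference instead; the top-word lists are read off the topic-word distribution in the same way.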
Figure 10. Inter-topic distance map for the first dataset.
Figure 11. Inter-topic distance map for the mined dataset.
Figure 12. Inter-topic distance map for the second dataset.
Confusion matrices of the classifiers for both datasets.
| Classifier | TP (First) | TN (First) | FP (First) | FN (First) | TP (Second) | TN (Second) | FP (Second) | FN (Second) |
|---|---|---|---|---|---|---|---|---|
| Bidirectional LSTM (BiLSTM) | 7769 | 9401 | 244 | 341 | 44 | 229 | 12 | 24 |
| Logistic Regression | 5509 | 8664 | 1108 | 2474 | 52 | 229 | 11 | 17 |
| Random Forest | 7420 | 9596 | 176 | 563 | 45 | 225 | 15 | 24 |
| Naïve Bayes | 5339 | 8244 | 1528 | 2644 | 51 | 230 | 10 | 18 |
| Support Vector Machine | 5408 | 8737 | 1035 | 2575 | 53 | 225 | 15 | 16 |
| LR-SGDC | 5568 | 8602 | 1170 | 2415 | 50 | 228 | 12 | 19 |
| Decision Tree | 7597 | 9555 | 217 | 386 | 46 | 219 | 21 | 23 |
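Every row of this table sums to the same number of evaluated tweets per dataset: 17,755 for the first and 309 for the second. Relative to the dataset sizes in the summary table, that is about 10% in both cases, which is consistent with a 90/10 train/test split, although this record does not state the split. The helper name `holdout_share` is illustrative.

```python
# (TP, TN, FP, FN) rows copied from the confusion-matrix table.
confusion = {
    ("BiLSTM", "first"): (7769, 9401, 244, 341),
    ("Random Forest", "first"): (7420, 9596, 176, 563),
    ("BiLSTM", "second"): (44, 229, 12, 24),
    ("Random Forest", "second"): (45, 225, 15, 24),
}
# Dataset sizes as the sum of the Positive and Negative columns of the dataset summary.
dataset_size = {"first": 80_844 + 96_702, "second": 727 + 2_363}


def holdout_share(counts, n_total):
    """Number of evaluated tweets and their share of the whole dataset (percent)."""
    n_eval = sum(counts)
    return n_eval, 100 * n_eval / n_total


for (model, ds), counts in confusion.items():
    n_eval, share = holdout_share(counts, dataset_size[ds])
    print(f"{model:13s} {ds:6s} evaluated={n_eval:5d}  share={share:.1f}%")
```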
Performance measures of the classifiers on the first dataset.
| Classifier | Accuracy (%) | F1-Score (%) | Precision (%) | Recall (%) | ROC-AUC (%) | Specificity (%) | BAC (%) |
|---|---|---|---|---|---|---|---|
| Bidirectional LSTM (BiLSTM) | 96.7 | 96.67 | 96.72 | 96.63 | 97.47 | 97.47 | 96.72 |
| Random Forest | 95.83 | 95.77 | 96.07 | 95.57 | 95.57 | 98.19 | 96.07 |
| Logistic Regression | 79.82 | 79.16 | 80.52 | 78.83 | 78.83 | 88.67 | 80.52 |
| Naïve Bayes | 76.5 | 75.85 | 76.73 | 75.62 | 75.62 | 84.36 | 76.73 |
| Support Vector Machine | 79.67 | 78.92 | 80.58 | 78.57 | 78.57 | 89.41 | 80.58 |
| LR-SGDC | 79.81 | 79.2 | 80.35 | 78.87 | 78.87 | 88.02 | 80.35 |
| Decision Tree | 96.6 | 96.56 | 96.67 | 96.47 | 96.47 | 97.78 | 96.67 |
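The performance tables can be recomputed from the confusion matrices. The sketch below assumes that Precision, Recall and F1-Score are macro-averaged over the positive and negative classes and that Specificity is the recall of the negative class; under those assumptions the Random Forest row above is reproduced from its confusion matrix to within rounding (Accuracy 95.84, F1-Score 95.77, Precision 96.07, Recall 95.57, Specificity 98.20). One point worth noting: the textbook balanced accuracy, (TPR + TNR)/2, equals the macro-averaged recall (95.57% for this row), whereas in every row of both performance tables the BAC column equals the Precision column.

```python
def binary_metrics(tp, tn, fp, fn):
    """Metrics in percent; precision, recall and F1 are macro-averaged over both classes."""
    n = tp + tn + fp + fn
    prec_pos, prec_neg = tp / (tp + fp), tn / (tn + fn)
    rec_pos, rec_neg = tp / (tp + fn), tn / (tn + fp)  # rec_neg is the specificity
    f1_pos = 2 * prec_pos * rec_pos / (prec_pos + rec_pos)
    f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg)
    return {
        "accuracy": 100 * (tp + tn) / n,
        "precision": 100 * (prec_pos + prec_neg) / 2,
        "recall": 100 * (rec_pos + rec_neg) / 2,
        "f1": 100 * (f1_pos + f1_neg) / 2,
        "specificity": 100 * rec_neg,
        "balanced_accuracy": 100 * (rec_pos + rec_neg) / 2,
    }


rf = binary_metrics(7420, 9596, 176, 563)  # Random Forest, first dataset
for name, value in rf.items():
    print(f"{name:18s} {value:6.2f}")
```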
Performance measures of the classifiers on the second dataset.
| Classifier | Accuracy (%) | F1-Score (%) | Precision (%) | Recall (%) | ROC-AUC (%) | Specificity (%) | BAC (%) |
|---|---|---|---|---|---|---|---|
| Bidirectional LSTM (BiLSTM) | 88.34 | 81.84 | 84.54 | 79.86 | 79.86 | 95.02 | 84.54 |
| Logistic Regression | 90.93 | 86.51 | 87.81 | 85.38 | 85.38 | 95.41 | 87.81 |
| Random Forest | 87.37 | 80.89 | 82.68 | 79.48 | 79.48 | 93.75 | 82.68 |
| Naïve Bayes | 90.93 | 86.36 | 88.17 | 84.87 | 84.87 | 95.83 | 88.17 |
| Support Vector Machine | 89.96 | 85.46 | 85.65 | 85.28 | 85.28 | 93.75 | 85.65 |
| LR-SGDC | 89.96 | 84.98 | 86.47 | 83.73 | 83.73 | 95 | 86.47 |
| Decision Tree | 85.76 | 79.25 | 79.57 | 78.95 | 78.95 | 91.25 | 79.57 |
Figure 13. Performance graph of the classifiers for the first dataset.
Figure 14. Performance graph of the classifiers for the second dataset.
Figure 15. ROC curve for the first dataset.
Figure 16. ROC curve for the second dataset.
Predictions on the mined tweets by the classifiers trained on the first dataset (MVC: majority voting classifier).
| No. | Tweet text | BiLSTM | Random Forest | Logistic Regression | Naïve Bayes | LR-SGDC | SVM | Decision Tree | MVC | Self (manual label) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | #COVID-19 death toll breaches 900-mark in #Odisha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | as per this data by @MoHFW_INDIA #Kerala is at the bottom in recovery rate, and top in active cases (%). Good thing is the death rate is also the lowest. State reported 3rd most cases yesterday behind #Maharashtra & #Karnataka. #COVID #COVID-19 #CoronaVirusUpdates | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 3 | People working in Tech are lucky in the current situation. If they work in a big-budget company, they have moderate (only moderate) amount of job security. Counterparts in other sectors, not lucky More to be expected from COVID-19? #COVID-19 #CoronaVirus #technology #jobs | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | White House Aids release photos of President #Trump working while being treated for #COVID-19 #COVIDCaughtTrump | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 5 | @GovMikeDeWine @LtGovHusted Where is the Prayer Day for all 209,000+ #COVID-19 DEAD AMERICANS??? | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | Greek doctors stage 2 km fun run to debunk #COVID-19 mask myth | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | #LoveIsNotTourism #LoveIsEssential #COVID-19 #travelban #poetry Credit: @igneusT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Deputy Director of Narcotics Control Bureau, KPS Malhotra tests positive for #COVID-19. | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 9 | All are concerned about their future. We are also concerned about when will our colleges and universities been opened up? #BREAKING #India #COVID-19 #education #reopencollege #students @EduMinOfIndia #reopen_ug_college #Health #healthcare #Trending | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 10 | When a Country have a Govt who are full of #COVIDIOTS that preach Cow Urine will save one from #COVID-19 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 11 | Did COVID-19 positive Donald Trump continue meeting people and attending events even after test results? #DonaldTrump #COVID-19 #coronavirus @realDonaldTrump | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 12 | So basically, one man in the entire world ate one single bat, he died, and then everybody followed… I believe that’s what the scientists call #COVID-19… That’s #wild. | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 13 | After 5 months of heavy usage, this 20 baht do it ourselves, still function, there also are YouTube Thai instruction mask making, M of Interior sponsored workshop in every district, we may not be top industrial countries in the world, but we find our #COVID-19 solution #Thailand | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 14 | To be honest there is nothing funny about anyone over the age of 70 getting #COVID-19 and it shows the level of inhumanity from such tolerant and welcoming people who seem to be on the wrong side of overthrowing fascism!!!! Stop telling us what fascism is and stop being IT!!!”! | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 15 | #Odisha: 15 more patients succumb to #COVID-19, death toll crosses 900-mark | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: # marks a Twitter-specific feature (hashtag).
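A hard majority vote over the seven per-classifier columns agrees with the MVC column on all 15 rows of this table, and likewise for the corresponding table for the second dataset. The sketch below applies that voting rule to four rows; counting the BiLSTM as one of the voters is an assumption, since this record does not list the ensemble members.

```python
def majority_vote(votes):
    """Hard (label) majority vote over binary predictions; an odd voter count avoids ties."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of voters so that ties cannot occur")
    return int(sum(votes) > len(votes) / 2)


# Columns BiLSTM, RF, LR, NB, LR-SGDC, SVM, DT for four tweets of the prediction table above.
rows = {
    3: [1, 0, 0, 1, 0, 0, 0],
    8: [1, 0, 0, 1, 0, 0, 1],
    9: [0, 1, 1, 1, 1, 1, 0],
    12: [1, 0, 1, 1, 1, 1, 1],
}
for tweet_no, votes in rows.items():
    print(f"tweet {tweet_no}: MVC = {majority_vote(votes)}")
```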
Predictions on the mined tweets by the classifiers trained on the second dataset (MVC: majority voting classifier).
| No. | Tweet text | BiLSTM | Random Forest | Logistic Regression | Naïve Bayes | LR-SGDC | SVM | Decision Tree | MVC | Self (manual label) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | #COVID-19 death toll breaches 900-mark in #Odisha | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | as per this data by @MoHFW_INDIA #Kerala is at the bottom in recovery rate, and top in active cases (%). Good thing is the death rate is also the lowest. State reported 3rd most cases yesterday behind #Maharashtra & #Karnataka. #COVID #COVID-19 #CoronaVirusUpdates | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | People working in Tech are lucky in the current situation. If they work in a big-budget company, they have moderate (only moderate) amount of job security. Counterparts in other sectors, not lucky More to be expected from COVID-19? #COVID-19 #CoronaVirus #technology #jobs | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 4 | White House Aids release photos of President #Trump working while being treated for #COVID-19 #COVIDCaughtTrump | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 5 | @GovMikeDeWine @LtGovHusted Where is the Prayer Day for all 209,000+ #COVID-19 DEAD AMERICANS??? | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Greek doctors stage 2 km fun run to debunk #COVID-19 mask myth | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 7 | #LoveIsNotTourism #LoveIsEssential #COVID-19 #travelban #poetry Credit: @igneusT | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 8 | Deputy Director of Narcotics Control Bureau, KPS Malhotra tests positive for #COVID-19. | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 9 | All are concerned about their future. We are also concerned about when will our colleges and universities been opened up? #BREAKING #India #COVID-19 #education #reopencollege #students @EduMinOfIndia #reopen_ug_college #Health #healthcare #Trending | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 10 | When a Country have a Govt who are full of #COVIDIOTS that preach Cow Urine will save one from #COVID-19 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 11 | Did COVID-19 positive Donald Trump continue meeting people and attending events even after test results? #DonaldTrump #COVID-19 #coronavirus @realDonaldTrump | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 12 | So basically, one man in the entire world ate one single bat, he died, and then everybody followed… I believe that’s what the scientists call #COVID-19… That’s #wild. | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 13 | After 5 months of heavy usage, this 20 baht do it ourselves, still function, there also are YouTube Thai instruction mask making, M of Interior sponsored workshop in every district, we may not be top industrial countries in the world, but we find our #COVID-19 solution #Thailand | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 14 | To be honest there is nothing funny about anyone over the age of 70 getting #COVID-19 and it shows the level of inhumanity from such tolerant and welcoming people who seem to be on the wrong side of overthrowing fascism!!!! Stop telling us what fascism is and stop being IT!!!”! | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 15 | #Odisha: 15 more patients succumb to #COVID-19, death toll crosses 900-mark | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: # marks a Twitter-specific feature (hashtag).
Prediction accuracy on the 15 mined tweets for the classifiers trained on the first dataset.
| Classifier | Correct predictions | Incorrect predictions | Accuracy (%) |
|---|---|---|---|
| BiLSTM | 9 | 6 | 60 |
| Random Forest | 9 | 6 | 60 |
| Logistic Regression | 8 | 7 | 53.33 |
| Naïve Bayes | 8 | 7 | 53.33 |
| LR-SGDC | 8 | 7 | 53.33 |
| SVM | 8 | 7 | 53.33 |
| Decision Tree | 9 | 6 | 60 |
| Majority Voting Classifier | 8 | 7 | 53.33 |
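The accuracy in these tables is the fraction of the 15 mined tweets on which a classifier's prediction equals the Self label in the corresponding prediction table. The sketch below recomputes two rows from the columns of the first prediction table; 8 of 15 correct rounds to 53.33%, and 11 of 15 (second table) rounds to 73.33%. The helper name `agreement` is illustrative.

```python
# Columns copied from the first prediction table (tweets 1-15, in order).
self_label = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
predictions = {
    "BiLSTM": [0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1],
    "Majority Voting Classifier": [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
}


def agreement(pred, truth):
    """Return (correct, incorrect, accuracy in percent rounded to two decimals)."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct, len(truth) - correct, round(100 * correct / len(truth), 2)


for name, pred in predictions.items():
    print(name, agreement(pred, self_label))
```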
Prediction accuracy on the 15 mined tweets for the classifiers trained on the second dataset.
| Classifier | Correct predictions | Incorrect predictions | Accuracy (%) |
|---|---|---|---|
| BiLSTM | 11 | 4 | 73.33 |
| Random Forest | 7 | 8 | 46.67 |
| Logistic Regression | 13 | 2 | 86.67 |
| Naïve Bayes | 11 | 4 | 73.33 |
| LR-SGDC | 11 | 4 | 73.33 |
| SVM | 11 | 4 | 73.33 |
| Decision Tree | 11 | 4 | 73.33 |
| Majority Voting Classifier | 13 | 2 | 86.67 |
Running time of each model on each dataset (seconds).
| Classifier | First Dataset (s) | Second Dataset (s) |
|---|---|---|
| Random Forest | 44.068 | 0.253 |
| Logistic Regression | 28.389 | 0.282 |
| Naïve Bayes | 2.682 | 0.035 |
| SVM | 35.448 | 0.069 |
| LR-SGDC | 8.089 | 0.277 |
| Decision Tree | 206.455 | 0.552 |
| BiLSTM | 15,707 | 0.66 |