Jatla Srikanth, Avula Damodaram, Yuvaraja Teekaraman, Ramya Kuppusamy, Amruth Ramesh Thelkar.
Abstract
Social media is Internet-based by design, allowing people to share content quickly by electronic means. On sites such as Twitter, people can openly express their thoughts, which can then be shared with others. During the recent COVID-19 outbreak, public opinion analytics provided useful information for determining the best public health response. At the same time, as the COVID-19 pandemic has shown, the spread of misinformation through social media and other digital platforms has proven to be a greater threat to global public health than the virus itself. The public's feelings about social distancing can be discovered by analysing the messages people post on Twitter. Sentiment analysis is the automated recognition and classification of subjective information in text data. In this work, we propose a combination of preprocessing approaches: tokenization, filtering, stemming, and building N-gram models. A deep belief network (DBN) with pseudo labelling is then used to classify the tweets. In the pseudo labelling strategy, the top layers of the base classifiers are boosted, whereas the lower layers of the base classifiers share weights for feature extraction. Through this pseudo boost mechanism, the proposed technique preserves the time complexity of a standard DBN while converging quickly to an optimal solution. Pseudo labelling improves classification performance, and the method extracts keywords from the tweets with high precision. The results show that the DBN classifier combined with the bigram N-gram model (with features selected by information gain > 0) outperformed the other classifiers evaluated under the same configuration, reaching an accuracy of 90.3 percent. The proposed approach can also help medical professionals and decision-makers determine the best course of action for each location, based on the public's views of the pandemic there.
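The preprocessing steps listed in the abstract are tokenization, filtering, stemming, and N-gram construction. A minimal sketch of them follows. It is an illustration, not the authors' implementation.

Several parts are simplified assumptions:
- the stop-word list;
- the token regular expression;
- the suffix-stripping stemmer. A real pipeline would more likely use a full stop-word list and a Porter-style stemmer.

All function names are hypothetical.

```python
import re

# Assumed, deliberately small stop-word list (hypothetical; not taken from the paper).
STOP_WORDS = {"a", "an", "and", "be", "i", "if", "is", "my", "of", "the", "to", "will", "you"}

# Suffixes for a toy stemmer; a Porter-style stemmer would normally be used instead.
SUFFIXES = ("ing", "ed", "s")


def tokenize(tweet: str) -> list[str]:
    """Lower-case the tweet, drop URLs, and split it into word, #hashtag and @mention tokens."""
    tweet = re.sub(r"https?://\S+", " ", tweet.lower())
    return re.findall(r"[#@]?[a-z0-9_]+", tweet)


def filter_tokens(tokens: list[str]) -> list[str]:
    """Remove stop words and one-character tokens."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tweet: str, n_values: tuple[int, ...] = (1, 2)) -> list[str]:
    """Run tokenize -> filter -> stem, then emit the n-grams for every n in n_values."""
    tokens = [stem(t) for t in filter_tokens(tokenize(tweet))]
    features: list[str] = []
    for n in n_values:
        features.extend(ngrams(tokens, n))
    return features


print(extract_features("Twittizens, good morning I wish you a day without coronas."))
```

Consider the sample tweet "Twittizens, good morning I wish you a day without coronas." The sketch yields seven stemmed unigrams (for example "morn" and "corona") followed by six bigrams such as "good morn". Passing `n_values=(1, 2, 3)` gives the 1- to 3-gram setting.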
Year: 2022 PMID: 35535182 PMCID: PMC9077450 DOI: 10.1155/2022/8898100
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. A typical framework with a supervised learning model for text classification.
Figure 2. Sentiment analysis pipeline.
Figure 3. N-gram model for generative applications.
Figure 4. DBN learning architecture with two hidden layers and one output layer.
Figure 5. Labelling method.
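The abstract states that the tweets are classified with pseudo labelling. In its generic form (self-training), the process has three steps:
1. A model is trained on the labelled tweets.
2. The model predicts labels for the unlabelled tweets.
3. Predictions above a confidence threshold are added to the training set, and the model is retrained.

The sketch below shows only this generic loop. It does not reproduce the paper's DBN or its pseudo boost mechanism. To keep the example self-contained, a small multinomial Naïve Bayes classifier is swapped in as the base model. The threshold, the number of rounds, the toy data, and all names are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (stand-in base model, not a DBN)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab_size = len({w for c in self.classes for w in self.word_counts[c]})
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Return {class: posterior probability} for one tokenised document."""
        log_scores = {}
        for c in self.classes:
            score = math.log(self.priors[c] / self.n_docs)
            for w in doc:
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + self.vocab_size))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max before exp() for numerical stability
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


def pseudo_label(labelled_docs, labels, unlabelled_docs, threshold=0.9, rounds=3):
    """Self-training loop: repeatedly add confidently predicted unlabelled docs to the training set."""
    docs, ys = list(labelled_docs), list(labels)
    pool = list(unlabelled_docs)
    model = NaiveBayes().fit(docs, ys)
    for _ in range(rounds):
        confident, remaining = [], []
        for doc in pool:
            label, prob = max(model.predict_proba(doc).items(), key=lambda kv: kv[1])
            if prob >= threshold:
                confident.append((doc, label))
            else:
                remaining.append(doc)
        if not confident:  # nothing passed the threshold, so further rounds would change nothing
            break
        for doc, label in confident:
            docs.append(doc)
            ys.append(label)
        pool = remaining
        model = NaiveBayes().fit(docs, ys)  # retrain on labelled + pseudo-labelled docs
    return model, len(docs) - len(labelled_docs)


labelled = [["good", "morn", "joy"], ["happy", "good", "day"], ["sad", "cancel", "hurt"], ["fear", "malwar", "hurt"]]
labels = ["positive", "positive", "negative", "negative"]
unlabelled = [["good", "happy"], ["hurt", "sad"], ["lockdown"]]
model, added = pseudo_label(labelled, labels, unlabelled, threshold=0.8)
print(added)
```

Here two of the three unlabelled documents receive pseudo labels. The word "lockdown" never appears in the training data, so its prediction stays at 0.5 and it is never added. The threshold controls a trade-off:
- A higher threshold adds fewer but more reliable pseudo labels.
- A lower threshold grows the training set faster, but risks reinforcing early mistakes.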
Statistics overview of the COVID-19 tweets dataset.
| Feature | Total | Unique | Percentage of tweets (%) |
|---|---|---|---|
| Hashtag | 3653928 | 566308 | 30 |
| Mention | 5363449 | 1251963 | 40 |
| Entity | 11537537 | 331307 | 70 |
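The hashtag and mention rows of the statistics table above can be computed along the lines of the sketch below. It rests on three interpretations, none of which are the authors' stated definitions:
- "Total" counts every occurrence.
- "Unique" counts distinct lower-cased strings.
- "Percentage of tweets" is the share of tweets that contain at least one occurrence.

The regular expressions and the function name are also assumptions. The Entity row is left out because it would require a named-entity recognizer.

```python
import re
from collections import Counter

# Assumed token patterns: a hashtag is '#' + word characters, a mention is '@' + word characters.
PATTERNS = {"Hashtag": re.compile(r"#\w+"), "Mention": re.compile(r"@\w+")}


def feature_stats(tweets: list[str]) -> dict[str, dict[str, int]]:
    """Return total, unique and percent-of-tweets counts for each feature in PATTERNS."""
    stats = {}
    for name, pattern in PATTERNS.items():
        occurrences = Counter()
        tweets_with_feature = 0
        for tweet in tweets:
            found = [match.lower() for match in pattern.findall(tweet)]
            occurrences.update(found)
            if found:
                tweets_with_feature += 1
        stats[name] = {
            "total": sum(occurrences.values()),
            "unique": len(occurrences),
            "percent_of_tweets": round(100 * tweets_with_feature / len(tweets)),
        }
    return stats


sample = ["Stay home #COVID19 #StayHome", "@WHO thanks for the update #covid19", "Good morning everyone"]
print(feature_stats(sample))
```

On these three sample tweets, the hashtag row comes out as 3 total, 2 unique, and 67% of tweets. The mention row comes out as 1 total, 1 unique, and 33% of tweets.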
Sample tweet classifications from the dataset.
| Sample tweet | Sentiment category |
|---|---|
| Bright vision, a community hospital, is transferring all patients to create room for stable COVID-19 cases. | Mixed sentiment |
| Any fellow patriot who celebrates Boris contracting the Corona virus is a complete cunt. | Negative/sad |
| Twittizens, good morning I wish you a day without coronas. | Positive/joy |
| Perhaps if I lock my front door, the coronavirus will be kept away. | Anger |
| In order to infect visitors with malware, hackers create false coronavirus maps. | Fear |
| My heart hurts so much at the notion of Jacob's Nashville performance being cancelled. Please go away. | Negative/sad |
Classifier accuracy comparison with the proposed method.
| Type of attribute | Attribute selection | Proposed DBN (%) | Naïve Bayes (%) | SVM (%) | K-nearest neighbors (%) |
|---|---|---|---|---|---|
| Unigram | All Twitter data | 80.3 | 79.4 | 81.9 | 73.3 |
| Unigram | Information gain > 0 | 84.1 | 86.6 | 83.6 | 74.2 |
| Unigram | Best 70% on ranking | 88.1 | 88.0 | 83.2 | 73.5 |
| Bigram | All Twitter data | 86.1 | 75.2 | 85.8 | 62.7 |
| Bigram | Information gain > 0 | 90.3 | 89.0 | 82.8 | 63.3 |
| Bigram | Best 70% on ranking | 89.5 | 83.0 | 87.8 | 62.7 |
| 1- to 3-gram | All Twitter data | 86.1 | 85.7 | 82.5 | 68.8 |
| 1- to 3-gram | Information gain > 0 | 90.1 | 92.5 | 84.1 | 66.0 |
| 1- to 3-gram | Best 70% on ranking | 89.0 | 88.3 | 83.8 | 66.4 |
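The "information gain > 0" and "best 70% on ranking" settings in the table suggest two feature-selection rules:
- keep every N-gram whose information gain with respect to the sentiment label is positive; or
- keep the top 70% of N-grams ranked by information gain.

A minimal sketch of both rules follows, using IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]. It assumes binary presence/absence features and base-2 entropy. The paper's exact procedure is not given here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Base-2 Shannon entropy of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)] for a binary presence feature."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, keep_fraction=None):
    """Keep terms with IG > 0, or, if keep_fraction is given, the top fraction of terms ranked by IG."""
    vocab = sorted({t for d in docs for t in d})
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    if keep_fraction is None:
        # A small tolerance stops floating-point noise from counting as positive gain.
        return [t for t in vocab if scores[t] > 1e-12]
    ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
    return ranked[: math.ceil(keep_fraction * len(ranked))]


docs = [{"good", "day"}, {"good", "morn"}, {"sad", "day"}, {"sad", "hurt"}]
labels = ["positive", "positive", "negative", "negative"]
print(select_features(docs, labels))       # information gain > 0
print(select_features(docs, labels, 0.7))  # best 70% on ranking
```

In this toy data, "day" occurs equally often in positive and negative documents. Its information gain is therefore zero, and both rules drop it. "good" and "sad" separate the classes perfectly and have the maximum gain of 1 bit.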
Classification of words by sentiment across time periods.
| Sentiment | March 2020 | April 2020 | May 2020 | June 2020 |
|---|---|---|---|---|
| Positive | 32430 | 32437 | 31572 | 26507 |
| Negative | 34181 | 31538 | 37410 | 20677 |
| Fear | 31496 | 35542 | 34982 | 29184 |
| Noncategorized | 136916 | 155750 | 147115 | 101254 |
| Total | 235023 | 255267 | 251079 | 177622 |
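As a worked check on the table above, each monthly total equals the sum of its four category counts, and the counts convert directly into monthly shares. The short script below uses only the numbers in the table. The variable and function names are illustrative.

```python
# Counts copied from the table above (March to June 2020).
MONTHS = ["March 2020", "April 2020", "May 2020", "June 2020"]
COUNTS = {
    "Positive": [32430, 32437, 31572, 26507],
    "Negative": [34181, 31538, 37410, 20677],
    "Fear": [31496, 35542, 34982, 29184],
    "Noncategorized": [136916, 155750, 147115, 101254],
}
TOTALS = [235023, 255267, 251079, 177622]


def monthly_shares() -> dict[str, dict[str, float]]:
    """Check each month's total, then return {month: {category: percent of that month's total}}."""
    result = {}
    for i, month in enumerate(MONTHS):
        column_sum = sum(counts[i] for counts in COUNTS.values())
        if column_sum != TOTALS[i]:
            raise ValueError(f"{month}: categories sum to {column_sum}, not {TOTALS[i]}")
        result[month] = {label: 100 * counts[i] / TOTALS[i] for label, counts in COUNTS.items()}
    return result


for month, shares in monthly_shares().items():
    print(month, ", ".join(f"{label} {share:.1f}%" for label, share in shares.items()))
```

For March 2020, for example, positive words account for 32430 / 235023 ≈ 13.8% of the total, and noncategorized words for about 58.3%. Noncategorized words form the majority in every month.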
Figure 6. Classifier accuracy of the proposed and literature methods.