Siva Kumar Pathuri, N Anbazhagan, Gyanendra Prasad Joshi, Jinsang You.
Abstract
The COVID-19 pandemic has spread to almost all countries of the world and has affected people both mentally and economically. The primary motivation of this research is to construct a model that takes reviews or evaluations from several people affected by COVID-19. As the number of cases accelerates day by day, people are becoming panicked and concerned about their health. A good model can help provide accurate statistics for interpreting the actual records about the pandemic. In the proposed work, a unique classifier named the Sentimental DataBase Miner algorithm (SADBM), combined with parallel processing, is used for sentiment analysis to categorize the opinions, and it is applied to data collected from online social media websites such as Twitter, Facebook, and LinkedIn. The accuracy of the proposed model is validated on trained data and compared with basic classifiers such as logistic regression and decision tree. The proposed algorithm is executed on both CPU and GPU, and the acceleration ratio of the model is calculated. The results show that the proposed model provides the best accuracy compared with the other two models, i.e., 96% (GPU).
Keywords: CUDA; GPU; SADBM; parallel processing; reviews
Year: 2021 PMID: 35009619 PMCID: PMC8747430 DOI: 10.3390/s22010080
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
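The abstract describes classifying reviews collected from Twitter, Facebook, and LinkedIn, which implies a text-cleaning step before any feature extraction. A minimal preprocessing sketch in Python, assuming a simple regex-based cleanup (the record does not specify the paper's actual preprocessing steps):

```python
import re

def clean_review(text: str) -> str:
    """Basic social-media cleanup: lowercase, then strip URLs,
    @mentions, and non-alphabetic characters (assumed steps)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("Staying home AGAIN :( #COVID19 https://t.co/xyz @who"))
# -> "staying home again covid"
```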
Literature survey.
| Author | Algorithms Used | Feature-Selection | Data Source | Accuracy |
|---|---|---|---|---|
| Gualtiero B. Colombo (2015) | Graph mining | TF-IDF | Web forums (Twitter data) | 84% |
| Dmytro Karamshuk (2017) | GloVe word vectors, conventional classification, DT | Consistency label | Public Twitter | 85% |
| Tong Liu (2017) | Support Vector Machine (SVM) | TF-IDF, N-grams | Historical Twitter posts | 88% |
| Bridianne O’Dea (2015) | SVM, logistic regression | TF-IDF with and without filter, data points | CSIRO | 80% |
| Pete Burnap (2015) | SVM, Naïve Bayes (NB), Decision Tree (DT), rotation forest | Lexical, structural, emotive, psychological, TF-IDF, N-grams | Web forums (Twitter data) | 75% |
| Benjamin L. (2016) | Logistic regression | N-grams, linguistic context | Kaggle | 82% |
| Mia Johnson Vioules (2017) | NB, sequential minimal optimization (SMO), decision tree (J48) | NBB, multinomial L-R, RF | | 80% |
| Scott R. Braithwaite (2016) | Decision Tree (DT) | Linguistic, word count | Amazon Mechanical Turk (AMT) | 76% |
| Munmun De Choudhury (2013) | SVM with a radial-basis-function (RBF) kernel | Depression set | Crowdsourcing | 86% |
| Ramit Sawhney (2018) | Ensemble, linear classification | | Twitter streaming API | 81% |
| Bart Desmet (2018) | Parallel computing | Bag of words, polarity lexicon | Kaggle | 92% |
| Shaoxiong Ji (2018) | SVM, random forest, gradient boost classification, XGBoost | TF-IDF, semantic and syntactic statistics, linguistic features | Reddit and Twitter blogs | 89% |
| Jingcheng Du (2018) | CNN binary classification | Linguistic features | Twitter streaming API | 74% |
Figure 1. Architecture of the model.
Figure 2. The workflow of the proposed model.
Figure 3. General architecture of GPU versus CPU.
Figure 4. Review count based on polarity.
Figure 5. Word cloud of a review.
Figure 6. Data classification.
Figure 7. Decision tree of a review.
Figure 8. Polarity vs. frequency.
Figure 9. Finding positive and negative words for a sample review.
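Figure 9 illustrates finding positive and negative words in a sample review. A minimal sketch of that lexicon-counting idea in Python; the word lists below are hypothetical, since the paper's actual lexicons are not reproduced in this record:

```python
# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"recovered", "safe", "hope", "better", "support"}
NEGATIVE = {"sick", "fear", "death", "panic", "worse"}

def review_polarity(review: str) -> str:
    """Count positive and negative lexicon hits in a review and
    return a polarity label, mirroring the idea in Figure 9."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"

print(review_polarity("People panic but many have recovered and feel safe"))
# 2 positive hits vs. 1 negative hit -> "positive"
```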
Number of threads versus time taken.
| No of Threads | Time Taken |
|---|---|
| 128 | 5.12 |
| 256 | 4.14 |
| 512 | 3.12 |
| 1024 | 2.69 |
Figure 10. Sample kernel code.
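Figure 10 shows a sample kernel, which is not reproduced in this record. As a rough illustration of the GPU side, here is a sketch in Python using Numba's CUDA JIT, assuming a kernel that counts positive and negative lexicon hits over encoded tokens in parallel; the encoding, array names, and lexicon flags are all illustrative, not the paper's actual SADBM kernel:

```python
import numpy as np
from numba import cuda

@cuda.jit
def count_polarity(token_ids, pos_flags, neg_flags, counts):
    """One thread per token: counts[0] accumulates positive hits,
    counts[1] negative hits; atomic adds avoid write races."""
    i = cuda.grid(1)
    if i < token_ids.size:
        t = token_ids[i]
        if pos_flags[t]:
            cuda.atomic.add(counts, 0, 1)
        elif neg_flags[t]:
            cuda.atomic.add(counts, 1, 1)

# Hypothetical encoded reviews: token ids into a 1,000-word vocabulary.
tokens = np.random.randint(0, 1000, size=100_000).astype(np.int32)
pos_flags = np.zeros(1000, dtype=np.uint8); pos_flags[:50] = 1
neg_flags = np.zeros(1000, dtype=np.uint8); neg_flags[50:100] = 1
counts = np.zeros(2, dtype=np.int32)

threads = 256  # the table above varies this from 128 to 1024
blocks = (tokens.size + threads - 1) // threads
count_polarity[blocks, threads](tokens, pos_flags, neg_flags, counts)
print("positive hits:", counts[0], "negative hits:", counts[1])
```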
Acceleration ratio to classify the records using SADBM.
| SADBM | 12,000 Records | 32,000 Records | 52,000 Records | 72,000 Records | 92,000 Records |
|---|---|---|---|---|---|
| Classification time (s) | 0.552 | 1.020 | 1.705 | 2.052 | 2.742 |
| CPU time (s) | 0.710 | 1.130 | 1.740 | 2.350 | 2.900 |
| GPU time (s) | 0.550 | 1.010 | 1.640 | 2.230 | 2.490 |
| Acceleration ratio | 1.296 | 1.118 | 1.064 | 1.054 | 1.165 |
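The acceleration ratio row is consistent with dividing CPU time by GPU time for each record count; a quick check against the tabulated values (the small deviations from the reported ratios are presumably rounding in the source):

```python
record_counts = (12_000, 32_000, 52_000, 72_000, 92_000)
cpu_times = (0.710, 1.130, 1.740, 2.350, 2.900)  # seconds, from the table
gpu_times = (0.550, 1.010, 1.640, 2.230, 2.490)

for n, c, g in zip(record_counts, cpu_times, gpu_times):
    print(f"{n:>6} records: acceleration ratio = {c / g:.3f}")
# -> 1.291, 1.119, 1.061, 1.054, 1.165
```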
Figure 11. Confusion matrix obtained for the trained data.
Figure 12. Comparison of the three algorithms.
Logistic regression classifier for multiclass classification. Training accuracy score: 0.9269. Validation accuracy score: 0.8156.
| Polarity | Precision | Recall | F-Score | Support |
|---|---|---|---|---|
| 0 | 0.79 | 0.82 | 0.77 | 2467 |
| 1 | 0.884 | 0.81 | 0.84 | 5765 |
| Accuracy | | | 0.81 | 8232 |
| Macro avg | 0.81 | 0.82 | 0.81 | 8232 |
| Weighted avg | 0.81 | 0.81 | 0.81 | 8232 |
Decision tree classifier for multiclass classification. Training accuracy score: 0.9469. Validation accuracy score: 0.8856.
| Polarity | Precision | Recall | F-Score | Support |
|---|---|---|---|---|
| 0 | 0.89 | 0.84 | 0.88 | 2899 |
| 1 | 0.884 | 0.91 | 0.89 | 5333 |
| Accuracy | | | 0.89 | 8232 |
| Macro avg | 0.87 | 0.88 | 0.87 | 8232 |
| Weighted avg | 0.89 | 0.89 | 0.89 | 8232 |
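The logistic-regression and decision-tree reports above resemble standard scikit-learn classification reports. A minimal sketch of how such baselines could be reproduced, assuming a TF-IDF representation of the cleaned reviews; the vectorizer settings, split, and hyperparameters are illustrative, not the paper's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def run_baseline(reviews, labels, model):
    """Fit a baseline on TF-IDF features and print train/validation
    accuracy plus a per-polarity classification report."""
    X = TfidfVectorizer(ngram_range=(1, 2), min_df=2).fit_transform(reviews)
    X_tr, X_va, y_tr, y_va = train_test_split(X, labels, test_size=0.2, random_state=0)
    model.fit(X_tr, y_tr)
    print("train acc:", model.score(X_tr, y_tr), "valid acc:", model.score(X_va, y_va))
    print(classification_report(y_va, model.predict(X_va), digits=2))

# Usage (reviews: list of cleaned strings, labels: 0 = negative, 1 = positive):
# run_baseline(reviews, labels, LogisticRegression(max_iter=1000))
# run_baseline(reviews, labels, DecisionTreeClassifier(max_depth=20))
```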
CUDA-SADBM classifier for multiclass classification. Training accuracy score: 0.9869. Validation accuracy score: 0.9656.
| Polarity | Precision | Recall | F-Score | Support |
|---|---|---|---|---|
| 0 | 0.953 | 0.943 | 0.924 | 2882 |
| 1 | 0.950 | 0.948 | 0.946 | 5350 |
| Accuracy | | | 0.96 | 8232 |
| Macro avg | 0.96 | 0.96 | 0.956 | 8232 |
| Weighted avg | 0.955 | 0.961 | 0.959 | 8232 |