
TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets.

Md Shahriare Satu1, Md Imran Khan2, Mufti Mahmud3, Shahadat Uddin4, Matthew A Summers5,6, Julian M W Quinn5,7, Mohammad Ali Moni5,8.   

Abstract

COVID-19, caused by SARS-CoV2 infection, varies greatly in its severity but presents with serious respiratory symptoms and vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of the virus's infectiousness (particularly of newly emerging variants), and the disease has had severe economic impacts globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially influence public opinion and in some cases can exacerbate the widespread panic and misinformation spread during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topic extraction model named TClustVID that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository, employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifiers were applied to these datasets to compare the performance of traditional classification and TClustVID. Our analysis found that TClustVID, which classifies tweets within subsets selected by clustering criteria, performed better than traditional methodologies. Finally, we extracted significant topics from the clusters, split them into positive, neutral and negative sentiments, and identified the most frequent topics using the proposed model. This approach can rapidly identify commonly prevailing aspects of public opinions and attitudes related to COVID-19 and infection prevention strategies spreading among different populations.
© 2021 Published by Elsevier B.V.


Keywords:  COVID-19; Classification; Machine learning; TClustVID; Topics modelling; Twitter data

Year:  2021        PMID: 33972817      PMCID: PMC8099549          DOI: 10.1016/j.knosys.2021.107126

Source DB:  PubMed          Journal:  Knowl Based Syst        ISSN: 0950-7051            Impact factor:   8.038


Introduction

COVID-19 has become a global concern as a major and dangerous public health threat. The World Health Organization (WHO) declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC) on January 30, 2020. During the 1960s, several coronaviruses, notably human coronaviruses 229E and OC43, were identified as infecting the human upper respiratory tract [1]. Numerous coronaviruses may circulate in wild mammalian populations, some of which cause minor human health problems. However, this picture changed with the emergence of severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 and Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012, which infect respiratory tract epithelial tissues and cause serious and often deadly respiratory disease [2]. The pandemic coronavirus SARS-CoV2 causes COVID-19, which presents with flu- and pneumonia-like symptoms and cardiovascular complications, with severity ranging from undetectable to rapidly lethal. The spread of this disease has caused huge economic disruption, personal health fears and uncertainties that have dominated both the news and social media. The widespread use of web and mobile technologies gives people opportunities to share their opinions on social media platforms such as Facebook and Twitter. Emotion plays a significant role in effective human-to-human communication and in making proper decisions [3]. Text is an essential component of affective computing, as most people express their opinions through text messages (e.g., SMS) or on computers [4]. During the COVID-19 pandemic, social media platforms have been used to communicate daily activities and thoughts, including many significant messages in which users share general feelings about their personal situation, health status, tips to stay well and other related information [5]. Such messages may provide large-scale insights into behavioral responses to the pandemic.
However, it is not easy to judge whether social media posts carry important information, not least because semantic ambiguity makes many messages hard to understand. Nevertheless, machine learning and computational methods have increasingly been used to scrutinize social media data in the biomedical sector [6]. COVID-19-related content may be useful for extracting significant information for individuals and policymakers. Twitter, in particular, is a popular microblogging and social networking service widely used for messaging and posting [7]. Automatic classification of tweets into particular classes is challenging, not least because these messages are short (140 characters or less) [8]. In recent years, sentiment analysis has proven useful for processing social media data such as blogs, wikis, microblogs and other online collaborative media [9]. It is a branch of affective computing that classifies text as either positive or negative. Analyzing Twitter messages (tweets) therefore requires identifying sentiment in text that contains abbreviations, spelling variations and ambiguous or informal language. The objective of this work is to investigate the types of tweets being communicated and to extract information on significant topics that help to understand the COVID-19 pandemic situation.
In this study, we collected several Twitter datasets and investigated sentiment topics related to COVID-19 by designing a novel machine learning model named TClustVID. The model uses clustering methods to explore significant subsets of tweets and selects those subsets that yield high classification performance. Each selected cluster was split into positive, negative and neutral groups, and latent Dirichlet allocation (LDA) was employed to extract key topics. We then interpreted the topics and identified the most significant ones. This methodology can generate relevant information on public and social behavior related to COVID-19 for researchers and policymakers. The key contributions of this work are briefly as follows. First, TClustVID combines clustering and classification to facilitate the extraction of significant topics concerning the pandemic. Second, multiple tweet datasets were used to verify the results of the proposed model on both the primary datasets and their clusters. Third, significant topics are represented using word clouds that make them more visible and understandable. Finally, identifying the most frequently raised topics can raise awareness of the underlying issues, particularly those of widespread concern.

Literature review

Affective computing and sentiment analysis are key to the advancement of artificial intelligence, and they have great potential to serve as sub-component technologies for other systems [3]. Sentiment analysis approaches are broadly categorized as symbolic or sub-symbolic [10]. Popular sources of affect words are knowledge bases built to identify text polarity, e.g., WordNet-Affect, SentiWordNet and SenticNet. SenticNet6 integrates logical reasoning with deep learning to infer the polarity of text [4], [10]. Dragoni et al. [9] proposed a commonsense ontology based on SenticNet that supports word embedding, domain information and polarity representation for sentiment analysis. Poria et al. [11] proposed three deep learning based architectures for multimodal sentiment analysis, each considering different facets of the analysis. Chaturvedi et al. [12] introduced a convolutional fuzzy commonsense reasoning model that projects features into a four-dimensional space to increase classification performance. Jiang et al. [13] proposed a joint aspect-level sentiment modification model that trains an aspect-specific sentiment word extraction module and an aspect-level sentiment transformation module. Baired et al. [14] presented a lexical knowledge base approach in which SenticNet was used to explore natural language concepts and fine-tune various feature types from a large-scale multimodal dataset. Several NLP studies have also combined knowledge-based and statistical methods for sentiment analysis of short messages and microblogs (e.g., Twitter). Khatua et al. [15] studied the 2014 Ebola and 2016 Zika outbreaks and suggested that domain-specific word vectors perform better than pre-trained Word2Vec (trained on Google News) or the Stanford NLP group's Global Vectors for Word Representation (GloVe). Ahmed et al. [16] proposed a query expansion model that augments initial queries with expansion terms, using word embedding models such as Word2Vec, GloVe and fastText trained on a tweet corpus. Behera et al. [17] proposed a hybrid model called Co-LSTM that combines a convolutional neural network (CNN) and long short-term memory (LSTM) and is highly adaptable to big social data.
Alongside these sentiment analysis works, several recent studies have scrutinized COVID-19 tweets in bulk for public health research, although such tweets have likely also been mined for commercial purposes. Aljameel et al. [18] gathered 242,525 tweets from five regions of Saudi Arabia and analyzed their sentiments using support vector machine (SVM), k-nearest neighbor (KNN) and naïve Bayes (NB). Alomari et al. [19] investigated 14 million tweets, extracting significant features using TF–IDF based correlation analysis and exploring relevant topics using LDA. Al-rakmi et al. [20] gathered 400,000 tweets and implemented entropy- and correlation-based feature selection with ensemble methods using NB, Bayes Net, KNN, C4.5, random forest (RF) and SVM. Boot-Itt and Skunkan [21] explored 109,990 tweets and analyzed their sentiments using the NRC sentiment lexicon and LDA. Gencoglu et al. [22] investigated 26 million tweets using language-agnostic BERT sentence embedding models and further classified them using KNN and LR with Bayesian hyperparameter optimization. Kouzy et al. [23] explored tweets collected with 14 trending hashtags and keywords about COVID-19 and investigated the magnitude of misinformation by comparing terms and hashtags. Kaur et al. [24] translated 16,138 tweets into English and scrutinized sentiments and emotions using TextBlob and IBM Tone Analyzer, respectively. Medford et al. [25] gathered Twitter data posted from January 14 to 28, 2020, investigated sentiments and explored topics using LDA. Mackey et al. [26] collected 4,492,954 tweets from the United States, United Kingdom, India and Australia and extracted topics using a biterm topic model (BTM) with topic clusters. Nemes and Kiss [27] analyzed tweets using TextBlob and a recurrent neural network (RNN). Samual et al. [28] investigated 9000 tweets, obtained non-textual variables and N-grams, and further analyzed sentiments using NB, linear regression, LR and KNN. Xiang et al. [29] gathered 82,893 tweets for sentiment analysis and topic modeling using the NRC lexicon and LDA, respectively. Xue et al. [30] analyzed 4 million English-language tweets using N-grams, the NRC lexicon and LDA, and in another study [31] scrutinized 1.9 million English-language tweets using machine learning models and LDA. Yin et al. [32] analyzed 13 million tweets using VADER and a dynamic LDA model. Zhang et al. [33] represented tweets using an N-gram model and TF–IDF and classified sentiments using DT, LR, KNN, RF and SVM.

Drawbacks of previous works

We observed a few issues and potential pitfalls in recently published work. Most studies have not proposed a framework that investigates tweets with both sentiment analysis and topic modeling. In addition, many works restricted their analysis to particular regions or languages, so their approaches cannot easily be generalized globally. For sentiment classification, only a small number of machine learning methods were implemented, and their results were verified with only a few evaluation metrics. Most studies focused on specific issues (e.g., psychological or human needs) and did not extract the most significant topics that individuals need to understand the pandemic situation. Studies using topic modeling did not distinguish positive, negative and neutral topics, which makes it difficult to understand the current pandemic situation from this perspective. This work provides a detailed visualization of topic orientation.

Materials and methods

We propose a machine learning based COVID-19 tweet analysis model that can be used to explore significant topics in Twitter datasets. To process the datasets, several natural language processing techniques are used together with machine learning methods, as illustrated in Fig. 1. The project code is available at https://github.com/shahriariit/COVID-19-Twitter-Data-Analysis.
Fig. 1

Details of the working methodology: (A) data preprocessing; (B) traditional classification and evaluation; (C) clustering, classification and evaluation; (D) comparison of the outcomes of the traditional approach and TClustVID; (E) selection of the best clusters/datasets and identification of positive, neutral and negative clusters; (F) topic extraction by LDA and representation of the most frequent topics.

Data description

The COVID-19 Twitter datasets were collected from the IEEE Dataport repository. They originate from an LSTM model developed by Rabindra Lamsal, who monitors real-time feeds of COVID-19-related tweets [34]. It generates over 0.3 million requests every 24 h, and its time-series graph is updated every 30 s. Almost 16 million tweets had been identified before March 20, 2020. Each database file (*.db) contains three attributes: the first, second and third columns hold the date and time, the tweet, and the sentiment score, respectively. The sentiment scores were transformed to lie within the range [0, 2], where 0, 1 and 2 indicate the most negative, neutral and most positive sentiments, respectively. Eight Twitter datasets (corona_tweets_1M.db, corona_tweets_1M_2, corona_tweets_1M, corona_tweets_2L, corona_tweets_2M.db, corona_tweets_2M_2, corona_tweets_2M_3 and corona_tweets_3M) were investigated and deemed suitable for classifying tweets in this study. Each dataset contains the COVID-19-related tweets of one day before March 20, 2020; we gathered datasets from several days to understand and extract the topics discussed each day. The first seven datasets are denoted Dataset-1, Dataset-2, Dataset-3, Dataset-4, Dataset-5, Dataset-6 and Dataset-7. Because the computational cost of processing corona_tweets_3M as a single dataset was very high, it was split into Dataset-8 and Dataset-9.
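A minimal sketch of reading one of these *.db files and deriving class labels, assuming the files are SQLite databases. The table name `tweets`, the columns `date_time`, `tweet` and `score`, and the rule that maps a score in [0, 2] to three classes (below 1 negative, exactly 1 neutral, above 1 positive) are all hypothetical. An in-memory database stands in for a real file, and the hypothetical helper `split_in_two` illustrates halving a large dataset such as corona_tweets_3M.

```python
import sqlite3

def load_tweets(conn, table="tweets"):
    """Return (date_time, tweet, score) rows; table and column names are hypothetical."""
    query = f"SELECT date_time, tweet, score FROM {table} ORDER BY rowid"
    return conn.execute(query).fetchall()

def sentiment_label(score):
    """Map a score in [0, 2] to 0 (negative), 1 (neutral) or 2 (positive).

    Assumption: 1.0 is exactly neutral, as on a [-1, 1] polarity scale shifted by +1.
    """
    if not 0.0 <= score <= 2.0:
        raise ValueError(f"score {score} is outside [0, 2]")
    if score < 1.0:
        return 0
    if score > 1.0:
        return 2
    return 1

def split_in_two(rows):
    """Split rows into two near-equal halves; the first half takes any extra row."""
    mid = (len(rows) + 1) // 2
    return rows[:mid], rows[mid:]

# In-memory database standing in for one *.db file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (date_time TEXT, tweet TEXT, score REAL)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("2020-03-19 10:00:00", "stay home stay safe", 1.6),
        ("2020-03-19 10:00:05", "new cases reported today", 1.0),
        ("2020-03-19 10:00:09", "panic buying everywhere", 0.2),
    ],
)
rows = load_tweets(conn)
labels = [sentiment_label(score) for _, _, score in rows]
part_a, part_b = split_in_two(rows)
```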

Data preprocessing

In the preprocessing step, the Twitter datasets were prepared for analysis. The raw tweets contain HTML tags, punctuation, numbers, single characters and multiple spaces, so several functions were used to clean them. These symbols were replaced with spaces, and every single character that carries no meaningful content was likewise replaced with a space. Finally, runs of multiple spaces were collapsed into one. This process was applied to all nine Twitter datasets, which were then combined for further analysis. Table 1 shows the number of tweets before and after the preprocessing steps.
Table 1

Number of COVID-19 tweets before and after data preprocessing.

Primary dataset     | Before preprocessing | Denoted   | After preprocessing
corona_tweets_1M.db | 1,578,957            | Dataset-1 | 1,569,619
corona_tweets_1M_2  | 1,889,781            | Dataset-2 | 1,880,297
corona_tweets_1M    | 1,903,768            | Dataset-3 | 1,894,526
corona_tweets_2L    | 280,304              | Dataset-4 | 276,566
corona_tweets_2M.db | 2,322,153            | Dataset-5 | 2,312,104
corona_tweets_2M_2  | 2,268,634            | Dataset-6 | 2,257,529
corona_tweets_2M_3  | 2,081,576            | Dataset-7 | 2,072,575
corona_tweets_3M    | 7,472,368            | Dataset-8 | 3,724,882
                    |                      | Dataset-9 | 3,724,881
Total (N)           | 19,797,541           |           | 19,712,979
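The cleaning steps described above can be sketched with Python regular expressions. This is an illustrative reconstruction, not the authors' code: the patterns are assumptions, and `[^a-zA-Z]` also discards non-English letters along with punctuation and digits.

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")        # e.g. <b>, </a>
NON_LETTER = re.compile(r"[^a-zA-Z]")    # punctuation, digits and other symbols
SINGLE_CHAR = re.compile(r"\b[a-zA-Z]\b")  # isolated one-letter tokens
MULTI_SPACE = re.compile(r"\s+")

def clean_tweet(text):
    text = HTML_TAG.sub(" ", text)        # drop HTML tags
    text = NON_LETTER.sub(" ", text)      # replace symbols with spaces
    text = SINGLE_CHAR.sub(" ", text)     # replace single characters with spaces
    return MULTI_SPACE.sub(" ", text).strip()  # collapse runs of spaces

cleaned = clean_tweet("<b>Stay</b> home!! 2 weeks, a lot of #COVID19 cases")
```

For the sample input, the tags, digits, punctuation, hashtag symbol and the lone "a" are removed, leaving only multi-letter words separated by single spaces.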

Tokenization

After the preprocessing steps, tokenization was used to generate a word-to-index dictionary in which each word in the corpus is a key and its unique index is the value. In the training phase, each sentence is represented as a list of indices, and these lists differ in length, so a maximum list length is fixed. Any list longer than this maximum is truncated to the maximum permitted length, and zeros are appended to the end of any shorter list until it reaches the maximum length, a process termed padding. Word embeddings help to capture significant words and to investigate word similarity and semantic relations more precisely. Pennington et al. [35] proposed a weighted least squares model, called GloVe, that is trained on global word-word co-occurrence counts and makes efficient use of corpus statistics; pre-trained GloVe vectors are publicly available [15]. These embedding vectors were used to create a dictionary that holds each word as a key and its vector as the value [36]. Finally, an embedding matrix is generated in which each row number matches the index of the corresponding word in the corpus. Because raw tweet text cannot be handled directly by machine learning procedures, we ran data preprocessing and tokenization to make it suitable for clustering and classification.
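A minimal sketch of the word-to-index dictionary, truncation, end-of-sequence zero padding and embedding matrix described above. Assumptions: indices are assigned in order of first appearance (libraries such as Keras order them by frequency instead), truncation keeps the first `max_len` tokens, index 0 is reserved for padding, and a tiny hand-written dictionary stands in for pre-trained GloVe vectors.

```python
import numpy as np

def build_word_index(texts):
    """Map each distinct word to a unique index; 0 is reserved for padding."""
    word_index = {}
    for text in texts:
        for word in text.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_padded(texts, word_index, max_len):
    """Convert texts to index lists, truncating long ones and zero-padding short ones at the end."""
    out = np.zeros((len(texts), max_len), dtype=np.int64)
    for row, text in enumerate(texts):
        ids = [word_index[w] for w in text.split() if w in word_index][:max_len]
        out[row, :len(ids)] = ids
    return out

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; words without a vector stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

texts = ["stay home stay safe", "wash hands", "stay safe and wash hands often today"]
word_index = build_word_index(texts)
padded = texts_to_padded(texts, word_index, max_len=5)
toy_vectors = {"stay": [0.1, 0.2, 0.3], "safe": [0.0, 0.5, 0.1]}  # stand-in for GloVe
emb = build_embedding_matrix(word_index, toy_vectors, dim=3)
```

In practice the vector dictionary would be filled from a GloVe text file, where each line holds a word followed by its vector components.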

Traditional analysis

In the traditional process, we applied data preprocessing and tokenization and then ran different baseline classifiers on the Twitter datasets. Several well-known classifiers were applied to the primary datasets using 10-fold cross-validation, and their results were compared with those of TClustVID. Both the traditional approach and TClustVID used the same baseline classifiers, which are described in Section 3.6.
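A sketch of the traditional evaluation with scikit-learn's stratified 10-fold cross-validation, using synthetic three-class data as a stand-in for the tokenized tweets. Only a subset of the baseline classifiers is shown, and the hyperparameters (as well as the choice of `GaussianNB` for NB) are assumptions rather than the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for tokenized tweets (0 negative, 1 neutral, 2 positive).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation that preserves class proportions in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {
    name: float(cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean())
    for name, clf in classifiers.items()
}
for name, acc in mean_accuracy.items():
    print(f"{name}: {acc:.3f}")
```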

TClustVID: Cluster-based classification and topic modeling approach

First, the COVID-19 Twitter datasets are preprocessed and tokenized, and each is then split into several groups using the k-means method. Clustering is an unsupervised technique that partitions a dataset into subsets (clusters), and creating clusters can help to improve the performance of machine learning methods. Various clustering algorithms exist, such as k-means, k-medoids, fuzzy C-means, hierarchical and density-based clustering [37], [38]. K-medoids is not the best choice for analyzing sparse data such as tweets. Fuzzy C-means has low scalability for the sheer volume of tweets, for which human annotation is also very expensive. Hierarchical clustering is slower than k-means. Density-based techniques are efficient for clustering unstructured data and are less sensitive to outliers and noise. Because this work processes a large amount of tweet data, we used k-means, which represents each cluster by its mean point and minimizes the Euclidean distances between instances and their cluster means in less time [38], [39]. The default value k = 5 was mainly used in this work. Each cluster contains positive, negative and neutral tweets; within each cluster, the generated tokens were replaced by the original (primary) tweets, which were then re-tokenized. Baseline classifiers were then used to evaluate performance on the individual clusters using 10-fold cross-validation, and the results were assessed with accuracy, the area under the curve (AUC), F-measure, G-mean, sensitivity and specificity. The detailed working steps of TClustVID are briefly summarized in Algorithm summary 1. After the classification results of the traditional approach and TClustVID are compared, the best-performing clusters are used to extract the most frequent topics. These clusters are divided into positive, neutral and negative sentiments for further analysis.
LDA was then used to explore significant positive, neutral and negative topics in the nine high-performing clusters, and 20 topics were extracted from each cluster. Each topic was represented as a word cloud of its words/tokens, in which the size of each word reflects its token weight. Because LDA does not interpret the topics it produces, we manually analyzed the words/tokens of each topic to define it.
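The core TClustVID idea of clustering first and then scoring a classifier on each cluster with 10-fold cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions: synthetic features replace the tokenized tweets, the re-tokenization step is skipped, a decision tree stands in for the nine baseline classifiers, and `cluster_then_classify` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def cluster_then_classify(X, y, n_clusters=5, n_splits=10, seed=0):
    """Cluster X with k-means, then score a classifier on each cluster with k-fold CV.

    Returns the cluster labels and {cluster_id: mean accuracy} for every cluster
    that is large and varied enough for stratified cross-validation.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    scores = {}
    for c in range(n_clusters):
        X_c, y_c = X[labels == c], y[labels == c]
        _, counts = np.unique(y_c, return_counts=True)
        # Skip clusters with a single class or too few members per class for 10 folds.
        if len(counts) < 2 or counts.min() < n_splits:
            continue
        clf = DecisionTreeClassifier(random_state=seed)
        scores[c] = float(cross_val_score(clf, X_c, y_c, cv=n_splits).mean())
    return labels, scores

X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
labels, scores = cluster_then_classify(X, y)
best_cluster = max(scores, key=scores.get) if scores else None
# Traditional baseline: the same classifier on the whole (unclustered) dataset.
baseline = float(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10).mean())
```

Comparing `scores[best_cluster]` with `baseline` mirrors the comparison between TClustVID and the traditional analysis; on synthetic data the sketch does not guarantee that any cluster beats the baseline.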

Baseline classification

In previous studies, classifiers such as decision tree (DT), gradient boosting (GB), k-nearest neighbor (KNN), logistic regression (LR), multi-layer perceptron (MLP), naïve Bayes (NB), random forest (RF), support vector machine (SVM) and XGBoost (XGB) have commonly been used to investigate Twitter datasets for sentiment analysis. These and related classifiers have been used in similar tasks: C5.0 (DT), KNN, SVM, LR and ZeroR in [40]; KNN, NB, SVM and XGB for personality prediction [41], [42]; RF, NB, SMO and IBk (a KNN equivalent) for spam detection [43]; NB, SVM and MLP for sentiment analysis of top colleges [44]; and GB for predicting price fluctuations [45]. Following this literature, we selected these classifiers to investigate the COVID-19 Twitter datasets and identify the best clusters.

Evaluation metrics

A confusion matrix, which counts correct and incorrect predictions against the known true values, is used to estimate classifier performance. For the positive and negative classes, it gives the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
Accuracy: the proportion of all predictions that are correct.
AUC: the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate; it measures how well positive classes are separated from negative classes.
F-measure: the harmonic mean of precision and recall.
Geometric mean (G-mean): the root of the product of the class-specific sensitivities, which trades off maximizing accuracy on each class against keeping the class accuracies balanced.
Sensitivity: the proportion of actual positives that are correctly detected.
Specificity: the proportion of actual negatives that are correctly identified.
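The definitions above can be made concrete for a multi-class confusion matrix. In this sketch, per-class values are computed one-vs-rest (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP)) and combined with a support-weighted average, and G-mean is the geometric mean of the per-class sensitivities. The weighting scheme is an assumption, since the text does not state how multi-class scores were aggregated. AUC is omitted because it needs predicted scores rather than a confusion matrix alone.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy, weighted sensitivity/specificity/F-measure and G-mean from a confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    Assumes every class is both present and predicted at least once.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # samples of class i predicted as another class
    fp = cm.sum(axis=0) - tp          # samples of other classes predicted as class i
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)      # per-class recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    weights = cm.sum(axis=1) / total  # class support as a fraction of all samples
    return {
        "accuracy": float(tp.sum() / total),
        "sensitivity": float(weights @ sensitivity),
        "specificity": float(weights @ specificity),
        "f_measure": float(weights @ f_measure),
        "g_mean": float(np.prod(sensitivity) ** (1.0 / len(tp))),
    }

m = metrics_from_confusion([[8, 2], [1, 9]])
```

With support weighting, the averaged sensitivity always equals accuracy, because the sum over classes of (n_i/N)(TP_i/n_i) is the sum of TP_i divided by N. If the authors used weighted averaging, this would explain why the Accuracy and Sensitivity columns of Table 2 coincide; the text does not confirm this.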

Experimental results

Sentiment analysis through classification approach

In this study, the proposed TClustVID detected positive, negative and neutral tweets more accurately through clustering-based classification and explored significant thematic topics. The primary datasets were first cleaned using the data preprocessing procedures, and word-to-index dictionaries were then created through tokenization with GloVe embeddings. Several classification algorithms (DT, GB, KNN, LR, MLP, NB, RF, SVM and XGB) were used to analyze the sentiments of the COVID-19 datasets with the scikit-learn Python machine learning library [46], [47]. The results of the individual classifiers for the nine COVID-19 Twitter datasets are presented in Table 2.
Table 2

The results of sentiment classification for individual datasets.

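Columns 3 to 7 report the traditional analysis of each dataset; columns 9 to 13 report TClustVID on the corresponding cluster.

Dataset    | Classifier | Accuracy | AUC   | F-Measure | Sensitivity | Specificity | Cluster    | Accuracy | AUC   | F-Measure | Sensitivity | Specificity
Dataset-01 | LSTM       | 0.927    | 0.897 | 0.926     | 0.927       | 0.868       | Cluster-01 | 0.983    | 0.979 | 0.983     | 0.983       | 0.974
           | DT         | 0.915    | 0.901 | 0.916     | 0.915       | 0.886       |            | 0.952    | 0.945 | 0.952     | 0.952       | 0.938
           | GB         | 0.788    | 0.653 | 0.746     | 0.788       | 0.518       |            | 0.816    | 0.713 | 0.790     | 0.816       | 0.610
           | KNN        | 0.910    | 0.880 | 0.909     | 0.910       | 0.850       |            | 0.946    | 0.930 | 0.945     | 0.946       | 0.914
           | LR         | 0.695    | 0.502 | 0.576     | 0.695       | 0.308       |            | 0.679    | 0.500 | 0.552     | 0.679       | 0.322
           | MLP        | 0.840    | 0.766 | 0.830     | 0.840       | 0.692       |            | 0.901    | 0.869 | 0.899     | 0.901       | 0.836
           | NB         | 0.654    | 0.503 | 0.597     | 0.654       | 0.352       |            | 0.644    | 0.502 | 0.577     | 0.644       | 0.359
           | RF         | 0.924    | 0.898 | 0.923     | 0.924       | 0.873       |            | 0.957    | 0.944 | 0.957     | 0.957       | 0.932
           | SVM        | 0.757    | 0.600 | 0.694     | 0.757       | 0.442       |            | 0.803    | 0.693 | 0.772     | 0.803       | 0.583
           | XGB        | 0.787    | 0.657 | 0.750     | 0.787       | 0.527       |            | 0.854    | 0.774 | 0.840     | 0.854       | 0.695
Dataset-02 | LSTM       | 0.968    | 0.949 | 0.968     | 0.968       | 0.929       | Cluster-02 | 0.988    | 0.977 | 0.988     | 0.988       | 0.967
           | DT         | 0.931    | 0.899 | 0.931     | 0.931       | 0.866       |            | 0.964    | 0.943 | 0.964     | 0.964       | 0.921
           | GB         | 0.816    | 0.567 | 0.755     | 0.816       | 0.318       |            | 0.856    | 0.626 | 0.819     | 0.856       | 0.396
           | KNN        | 0.924    | 0.865 | 0.922     | 0.924       | 0.806       |            | 0.958    | 0.918 | 0.957     | 0.958       | 0.878
           | LR         | 0.787    | 0.501 | 0.695     | 0.787       | 0.216       |            | 0.807    | 0.505 | 0.727     | 0.807       | 0.203
           | MLP        | 0.867    | 0.730 | 0.854     | 0.867       | 0.592       |            | 0.925    | 0.841 | 0.921     | 0.925       | 0.756
           | NB         | 0.213    | 0.500 | 0.076     | 0.213       | 0.787       |            | 0.192    | 0.500 | 0.063     | 0.192       | 0.808
           | RF         | 0.937    | 0.888 | 0.936     | 0.937       | 0.838       |            | 0.968    | 0.936 | 0.967     | 0.968       | 0.905
           | SVM        | 0.787    | 0.501 | 0.695     | 0.787       | 0.215       |            | 0.806    | 0.502 | 0.724     | 0.806       | 0.197
           | XGB        | 0.820    | 0.578 | 0.764     | 0.820       | 0.336       |            | 0.875    | 0.694 | 0.855     | 0.875       | 0.513
Dataset-03 | LSTM       | 0.915    | 0.922 | 0.915     | 0.915       | 0.929       | Cluster-03 | 0.985    | 0.987 | 0.985     | 0.985       | 0.988
           | DT         | 0.911    | 0.930 | 0.911     | 0.911       | 0.950       |            | 0.960    | 0.967 | 0.960     | 0.960       | 0.973
           | GB         | 0.699    | 0.717 | 0.674     | 0.699       | 0.735       |            | 0.846    | 0.838 | 0.836     | 0.846       | 0.830
           | KNN        | 0.893    | 0.918 | 0.893     | 0.893       | 0.942       |            | 0.950    | 0.958 | 0.950     | 0.950       | 0.965
           | LR         | 0.514    | 0.548 | 0.426     | 0.514       | 0.582       |            | 0.668    | 0.651 | 0.628     | 0.668       | 0.635
           | MLP        | 0.793    | 0.827 | 0.788     | 0.793       | 0.860       |            | 0.909    | 0.914 | 0.908     | 0.909       | 0.918
           | NB         | 0.485    | 0.551 | 0.441     | 0.485       | 0.616       |            | 0.212    | 0.504 | 0.178     | 0.212       | 0.796
           | RF         | 0.911    | 0.933 | 0.911     | 0.911       | 0.955       |            | 0.959    | 0.966 | 0.959     | 0.959       | 0.973
           | SVM        | 0.344    | 0.520 | 0.308     | 0.344       | 0.696       |            | 0.463    | 0.617 | 0.482     | 0.463       | 0.770
           | XGB        | 0.722    | 0.766 | 0.710     | 0.722       | 0.809       |            | 0.847    | 0.846 | 0.838     | 0.847       | 0.845
Dataset-04 | LSTM       | 0.904    | 0.915 | 0.903     | 0.904       | 0.926       | Cluster-04 | 0.957    | 0.957 | 0.956     | 0.957       | 0.956
           | DT         | 0.892    | 0.915 | 0.892     | 0.892       | 0.937       |            | 0.943    | 0.949 | 0.943     | 0.943       | 0.956
           | GB         | 0.621    | 0.614 | 0.553     | 0.621       | 0.607       |            | 0.818    | 0.788 | 0.806     | 0.818       | 0.758
           | KNN        | 0.873    | 0.901 | 0.873     | 0.873       | 0.929       |            | 0.930    | 0.939 | 0.930     | 0.930       | 0.948
           | LR         | 0.547    | 0.556 | 0.457     | 0.547       | 0.565       |            | 0.747    | 0.722 | 0.728     | 0.747       | 0.697
           | MLP        | 0.765    | 0.797 | 0.758     | 0.765       | 0.829       |            | 0.882    | 0.877 | 0.878     | 0.882       | 0.872
           | NB         | 0.533    | 0.536 | 0.422     | 0.533       | 0.539       |            | 0.274    | 0.506 | 0.139     | 0.274       | 0.737
           | RF         | 0.892    | 0.918 | 0.892     | 0.892       | 0.943       |            | 0.943    | 0.950 | 0.942     | 0.943       | 0.958
           | SVM        | 0.397    | 0.519 | 0.398     | 0.397       | 0.641       |            | 0.326    | 0.523 | 0.340     | 0.326       | 0.720
           | XGB        | 0.683    | 0.691 | 0.648     | 0.683       | 0.699       |            | 0.825    | 0.809 | 0.817     | 0.825       | 0.792
Dataset-05 | LSTM       | 0.904    | 0.927 | 0.903     | 0.904       | 0.951       | Cluster-05 | 0.968    | 0.975 | 0.968     | 0.968       | 0.983
           | DT         | 0.866    | 0.899 | 0.866     | 0.866       | 0.932       |            | 0.902    | 0.925 | 0.902     | 0.902       | 0.949
           | GB         | 0.534    | 0.625 | 0.494     | 0.534       | 0.715       |            | 0.624    | 0.684 | 0.587     | 0.624       | 0.744
           | KNN        | 0.841    | 0.880 | 0.841     | 0.841       | 0.920       |            | 0.878    | 0.907 | 0.878     | 0.878       | 0.937
           | LR         | 0.431    | 0.552 | 0.367     | 0.431       | 0.673       |            | 0.454    | 0.557 | 0.386     | 0.454       | 0.659
           | MLP        | 0.624    | 0.712 | 0.622     | 0.624       | 0.801       |            | 0.749    | 0.800 | 0.744     | 0.749       | 0.851
           | NB         | 0.419    | 0.529 | 0.305     | 0.419       | 0.639       |            | 0.429    | 0.524 | 0.344     | 0.429       | 0.619
           | RF         | 0.865    | 0.900 | 0.865     | 0.865       | 0.934       |            | 0.900    | 0.924 | 0.900     | 0.900       | 0.949
           | SVM        | 0.338    | 0.525 | 0.258     | 0.338       | 0.711       |            | 0.424    | 0.537 | 0.362     | 0.424       | 0.650
           | XGB        | 0.548    | 0.647 | 0.532     | 0.548       | 0.745       |            | 0.645    | 0.717 | 0.639     | 0.645       | 0.789
Dataset-06 | LSTM       | 0.876    | 0.908 | 0.877     | 0.876       | 0.941       | Cluster-06 | 0.977    | 0.982 | 0.977     | 0.977       | 0.987
           | DT         | 0.879    | 0.909 | 0.879     | 0.879       | 0.938       |            | 0.932    | 0.948 | 0.932     | 0.932       | 0.963
           | GB         | 0.602    | 0.659 | 0.562     | 0.602       | 0.715       |            | 0.763    | 0.785 | 0.748     | 0.763       | 0.807
           | KNN        | 0.858    | 0.893 | 0.859     | 0.858       | 0.929       |            | 0.917    | 0.936 | 0.917     | 0.917       | 0.955
           | LR         | 0.474    | 0.561 | 0.400     | 0.474       | 0.648       |            | 0.526    | 0.581 | 0.465     | 0.526       | 0.636
           | MLP        | 0.714    | 0.778 | 0.712     | 0.714       | 0.842       |            | 0.846    | 0.874 | 0.845     | 0.846       | 0.902
           | NB         | 0.450    | 0.522 | 0.315     | 0.450       | 0.594       |            | 0.475    | 0.515 | 0.328     | 0.475       | 0.554
           | RF         | 0.879    | 0.910 | 0.879     | 0.879       | 0.942       |            | 0.931    | 0.948 | 0.931     | 0.931       | 0.964
           | SVM        | 0.418    | 0.530 | 0.341     | 0.418       | 0.643       |            | 0.536    | 0.568 | 0.433     | 0.536       | 0.600
           | XGB        | 0.642    | 0.719 | 0.637     | 0.642       | 0.796       |            | 0.774    | 0.813 | 0.772     | 0.774       | 0.851
Dataset-07 | LSTM       | 0.903    | 0.919 | 0.903     | 0.903       | 0.936       | Cluster-07 | 0.983    | 0.986 | 0.983     | 0.983       | 0.990
           | DT         | 0.908    | 0.929 | 0.908     | 0.908       | 0.951       |            | 0.955    | 0.965 | 0.955     | 0.955       | 0.975
           | GB         | 0.664    | 0.718 | 0.656     | 0.664       | 0.773       |            | 0.810    | 0.830 | 0.806     | 0.810       | 0.850
           | KNN        | 0.889    | 0.915 | 0.889     | 0.889       | 0.942       |            | 0.941    | 0.954 | 0.941     | 0.941       | 0.967
           | LR         | 0.451    | 0.538 | 0.380     | 0.451       | 0.624       |            | 0.548    | 0.598 | 0.501     | 0.548       | 0.647
           | MLP        | 0.768    | 0.813 | 0.764     | 0.768       | 0.859       |            | 0.885    | 0.905 | 0.885     | 0.885       | 0.925
           | NB         | 0.219    | 0.501 | 0.083     | 0.219       | 0.783       |            | 0.220    | 0.503 | 0.094     | 0.220       | 0.787
           | RF         | 0.909    | 0.931 | 0.909     | 0.909       | 0.954       |            | 0.954    | 0.964 | 0.954     | 0.954       | 0.975
           | SVM        | 0.353    | 0.517 | 0.353     | 0.353       | 0.681       |            | 0.299    | 0.539 | 0.251     | 0.299       | 0.780
           | XGB        | 0.635    | 0.705 | 0.632     | 0.635       | 0.774       |            | 0.815    | 0.843 | 0.814     | 0.815       | 0.871
Dataset-08 | LSTM       | 0.908    | 0.921 | 0.907     | 0.908       | 0.935       | Cluster-08 | 0.976    | 0.981 | 0.976     | 0.976       | 0.985
           | DT         | 0.870    | 0.901 | 0.870     | 0.870       | 0.931       |            | 0.910    | 0.929 | 0.910     | 0.910       | 0.948
           | GB         | 0.600    | 0.654 | 0.557     | 0.600       | 0.709       |            | 0.687    | 0.698 | 0.655     | 0.687       | 0.708
           | KNN        | 0.847    | 0.884 | 0.847     | 0.847       | 0.921       |            | 0.853    | 0.884 | 0.853     | 0.853       | 0.915
           | LR         | 0.501    | 0.582 | 0.440     | 0.501       | 0.663       |            | 0.516    | 0.547 | 0.428     | 0.516       | 0.578
           | MLP        | 0.650    | 0.722 | 0.635     | 0.650       | 0.794       |            | 0.795    | 0.825 | 0.790     | 0.795       | 0.856
           | NB         | 0.460    | 0.529 | 0.332     | 0.460       | 0.599       |            | 0.489    | 0.536 | 0.379     | 0.489       | 0.582
           | RF         | 0.870    | 0.903 | 0.870     | 0.870       | 0.936       |            | 0.909    | 0.930 | 0.909     | 0.909       | 0.951
           | SVM        | 0.440    | 0.513 | 0.326     | 0.440       | 0.585       |            | 0.409    | 0.505 | 0.337     | 0.409       | 0.601
           | XGB        | 0.597    | 0.678 | 0.577     | 0.597       | 0.759       |            | 0.678    | 0.724 | 0.669     | 0.678       | 0.770
Dataset-09 | LSTM       | 0.897    | 0.912 | 0.896     | 0.897       | 0.928       | Cluster-09 | 0.976    | 0.981 | 0.976     | 0.976       | 0.986
           | DT         | 0.870    | 0.900 | 0.870     | 0.870       | 0.931       |            | 0.911    | 0.930 | 0.911     | 0.911       | 0.949
           | GB         | 0.600    | 0.654 | 0.557     | 0.600       | 0.709       |            | 0.686    | 0.698 | 0.651     | 0.686       | 0.711
           | KNN        | 0.847    | 0.884 | 0.847     | 0.847       | 0.921       |            | 0.856    | 0.886 | 0.856     | 0.856       | 0.917
           | LR         | 0.498    | 0.579 | 0.437     | 0.498       | 0.660       |            | 0.508    | 0.541 | 0.420     | 0.508       | 0.574
           | MLP        | 0.650    | 0.715 | 0.633     | 0.650       | 0.780       |            | 0.802    | 0.830 | 0.797     | 0.802       | 0.859
           | NB         | 0.221    | 0.500 | 0.083     | 0.221       | 0.780       |            | 0.250    | 0.507 | 0.191     | 0.250       | 0.764
           | RF         | 0.869    | 0.902 | 0.870     | 0.869       | 0.936       |            | 0.910    | 0.931 | 0.910     | 0.910       | 0.952
           | SVM        | 0.345    | 0.508 | 0.300     | 0.345       | 0.671       |            | 0.270    | 0.515 | 0.243     | 0.270       | 0.759
           | XGB        | 0.599    | 0.680 | 0.579     | 0.599       | 0.760       |            | 0.676    | 0.726 | 0.668     | 0.676       | 0.776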
In the traditional analysis, the classifiers LSTM, DT, GB, KNN, LR, MLP, NB, RF, SVM and XGB were implemented. For Dataset-1, LSTM gave the highest accuracy, F-measure and sensitivity, while DT provided the highest AUC and specificity. LSTM outperformed the other classifiers in all evaluation metrics for Dataset-2 and Dataset-5, and in all metrics except specificity (where RF scored 0.936 against 0.935) for Dataset-8. For Dataset-3 and Dataset-4, LSTM provided the highest accuracy, F-measure and sensitivity, whereas RF provided the best AUC and specificity. For Dataset-6, DT and RF tied for the highest accuracy, F-measure and sensitivity (0.879), while RF gave the highest AUC and specificity. RF outperformed the other classifiers in all metrics for Dataset-7. For Dataset-9, LSTM showed the highest accuracy, AUC, F-measure and sensitivity, and RF the best specificity.
The same classifiers were then applied within TClustVID, which produced several clusters from each dataset, evaluated each of them and identified the cluster that gave the best classification results. In most cases, the classifiers' results on these clusters improved over the traditional analysis. LSTM outperformed the other classifiers in all evaluation metrics for all clusters except Cluster-04, where RF achieved a slightly higher specificity (0.958 against 0.956). Fig. 2(a) shows the average outcomes of the classifiers (e.g., LSTM, DT, KNN, MLP, XGB, GB, SVM and LR) under the traditional approach, and Fig. 2(b) shows the average results of the same classifiers under TClustVID for comparison. LSTM provided the highest average accuracy, AUC, F-measure, sensitivity and specificity under both the traditional approach and TClustVID. Overall, TClustVID showed better results than the traditional approach (see Fig. 2).
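As a worked check of the kind of averaging behind Fig. 2, the LSTM accuracies in Table 2 can be averaged across the nine datasets and the nine clusters. This assumes Fig. 2 uses a plain, unweighted mean over datasets, which the text does not state.

```python
from statistics import mean

# LSTM accuracy from Table 2: Dataset-01..Dataset-09 (traditional) and Cluster-01..Cluster-09 (TClustVID).
lstm_traditional = [0.927, 0.968, 0.915, 0.904, 0.904, 0.876, 0.903, 0.908, 0.897]
lstm_tclustvid = [0.983, 0.988, 0.985, 0.957, 0.968, 0.977, 0.983, 0.976, 0.976]

avg_traditional = mean(lstm_traditional)
avg_tclustvid = mean(lstm_tclustvid)
print(f"traditional: {avg_traditional:.3f}, TClustVID: {avg_tclustvid:.3f}, "
      f"gain: {avg_tclustvid - avg_traditional:+.3f}")
```

This gives an average LSTM accuracy of about 0.911 for the traditional approach and 0.977 for TClustVID, a gain of roughly 0.066.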
Fig. 2

Average performance of the classifiers using (a) the traditional approach and (b) TClustVID across the nine Twitter experimental datasets.

We also computed SHapley Additive exPlanations (SHAP) values for individual tokens to determine positive, neutral and negative sentiments more effectively. SHAP is a game-theoretic technique for interpreting the output of any machine learning model. The TClustVID results for LSTM were evaluated in each cluster to explore which tokens are responsible for classifying tweets as positive, neutral or negative. Fig. 3 shows the SHAP values of different tokens in the nine clusters.
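SHAP builds on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating all feature subsets for a toy three-token scoring function; it is not the `shap` library and not the authors' LSTM, and the token names, weights and all-absent baseline are hypothetical. Exact enumeration is exponential in the number of features, which is why SHAP relies on approximations for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to baseline (exponential in len(x))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical "negative sentiment" score over token presence flags:
# v[0] = "lockdown", v[1] = "recovered", v[2] = "panic".
def negativity(v):
    return 0.8 * v[0] - 0.5 * v[1] + 0.3 * v[0] * v[2]

tokens = ["lockdown", "recovered", "panic"]
phi = shapley_values(negativity, x=[1, 1, 1], baseline=[0, 0, 0])
for token, value in zip(tokens, phi):
    print(f"{token:>10}: {value:+.3f}")
```

The values sum to f(x) minus f(baseline) (the efficiency property), and the 0.3 interaction term is split equally between "lockdown" and "panic".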
Fig. 3

SHAP values of tokens determining COVID-19 (a) positive, (b) neutral and (c) negative topics.

Having compared the performance of the various classifiers, we noted that TClustVID performs better than the traditional analysis. We therefore used the high-performing clusters produced by TClustVID in a topic modeling approach to extract significant topics, as described in the next section (see Fig. 4).
Fig. 4

Word cloud of various topics.

Topic modeling approach

Extraction of clusters using TClustVID

A comprehensive analysis of the different classifiers in the traditional and TClustVID analyses indicated that TClustVID is the best model for identifying significant groups of tweets in large COVID-19 Twitter datasets. The identified groups (clusters) were significant because they achieved higher classification accuracy than the traditional analysis of the primary data. In the TClustVID analysis, we generated significant clusters from each Twitter dataset (for the positive, neutral and negative categories) that showed greatly improved results for the different classifiers. These clusters are denoted Cluster-1, Cluster-2, Cluster-3, Cluster-4, Cluster-5, Cluster-6, Cluster-7, Cluster-8 and Cluster-9.
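The conclusion states that TClustVID first groups each dataset with k-means and then runs the classifiers on the clusters, keeping the cluster that gives the highest classification score. The sketch below illustrates that selection step under stated assumptions: tweets are already vectorized into numeric points (the feature representation is not specified here), k-means is a plain Lloyd's iteration seeded with the first k points for reproducibility, and `score` is a hypothetical stand-in for training and evaluating a classifier such as LSTM on one cluster. The names `kmeans`, `best_cluster` and `majority_accuracy` are hypothetical, and the majority-class score is only a placeholder.

```python
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means; the first k points seed the centroids so runs are reproducible."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def best_cluster(points: Sequence[Point], sentiments: Sequence[str], k: int,
                 score: Callable[[List[Point], List[str]], float]
                 ) -> Tuple[int, Dict[int, float], List[int]]:
    """Cluster the data, score each non-empty cluster, and return the best one."""
    labels = kmeans(points, k)
    results: Dict[int, float] = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            results[c] = score([points[i] for i in idx], [sentiments[i] for i in idx])
    best = max(results, key=lambda c: results[c])
    return best, results, labels


def majority_accuracy(_points: List[Point], sentiments: List[str]) -> float:
    """Placeholder score: accuracy of always predicting the cluster's majority label."""
    return max(sentiments.count(s) for s in set(sentiments)) / len(sentiments)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labs = ["positive", "positive", "negative", "negative", "negative", "negative"]
    print(best_cluster(pts, labs, k=2, score=majority_accuracy))
```

In practice the placeholder score would be replaced by a held-out or cross-validated metric of a trained classifier, so that a cluster is not selected merely because it is small or homogeneous.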

Topic exploration using LDA

We then used LDA to extract topics from these clusters: seven of the nine clusters yielded positive, neutral and negative topics, while the remaining two yielded only positive and neutral topics. Each topic consists of 10 tokens with associated weights, which can be used to rank the tokens. Twenty topics were identified for each category (positive, neutral and negative) in these clusters. Word clouds of all topics for the individual clusters are provided in the supplementary material. In this paper, the extracted positive, neutral and negative topics of Cluster-3 are visualized as word clouds in Fig. 5, Fig. 6 and Fig. 7, respectively.
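LDA assigns every vocabulary token a weight in every topic, and the text keeps the 10 heaviest tokens per topic. The sketch below shows that truncation, assuming the topic-word weights are available as a plain dict (for example, exported from an LDA implementation); weights are normalized over the full vocabulary before truncation. The helper `word_cloud_frequencies`, which sums each token's weight across topics to size words in a word cloud, is a hypothetical illustration; the paper does not state how its word clouds were weighted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def top_tokens(topic_word: Mapping[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n heaviest tokens of one topic, weights normalised over the vocabulary."""
    total = sum(topic_word.values())
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    # Sort by descending weight; break ties alphabetically for deterministic output.
    ranked = sorted(topic_word.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(token, weight / total) for token, weight in ranked[:n]]


def word_cloud_frequencies(topics: Iterable[List[Tuple[str, float]]]) -> Dict[str, float]:
    """Sum each token's weight across topics, e.g. to size words in a word cloud."""
    freqs: Dict[str, float] = defaultdict(float)
    for topic in topics:
        for token, weight in topic:
            freqs[token] += weight
    return dict(freqs)


if __name__ == "__main__":
    # Invented topic-word weights, for illustration only.
    topic = {"mask": 6, "stay": 3, "home": 3, "safe": 2, "hope": 1}
    print(top_tokens(topic, n=3))
```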
Fig. 5

Positive topics of Cluster-3.

Fig. 6

Neutral topics of Cluster-3.

Fig. 7

Negative topics of Cluster-3.

Qualitative analysis

Because LDA cannot interpret the meaning of topics, we manually assigned a theme to each topic by examining the meaning and weight values of its tokens in the different groups. The themes of the positive, neutral and negative topics are given in Table 3, Table 4 and Table 5, respectively. This task is not simple, because many pre-processed words carry no semantic meaning on their own. It can also be hard to understand the associations between the words/tokens in a topic, so these interpretations may differ slightly from those reached in other reviews.
Table 3

Positive themes of all significant clusters.

Theme | Cluster-1 | Cluster-2 | Cluster-3 | Cluster-4 | Cluster-5
Theme-1 | Culture | Prevention | Kids | Wish | Sunny
Theme-2 | Nationality | Situation | Wish | News | Watch
Theme-3 | Prevention | Situation | Testing | Situation | Affect
Theme-4 | Caring | Homework | Treatment | Help | Situation
Theme-5 | Blaming | News | Testing | Help | Treatment
Theme-6 | Believe | News | Caring | Facts | Awareness
Theme-7 | Die | News | Feeling | Control | Medicine
Theme-8 | Caring | Wish | Situation | Infectious | Treatment
Theme-9 | Discrimination | Awareness | Scaring | Right | Medicine
Theme-10 | Situation | Financial state | Buying | Awareness | Awareness
Theme-11 | Crisis | News | Fun | Wish | Prevention
Theme-12 | Financial Help | Avoidness | Right | News | Situation
Theme-13 | Condition | Crisis | Panic | Situation | Awareness
Theme-14 | Wish | Food | Protection | Distance & Treatment | Treatment
Theme-15 | Lockdown | Blaming | Health | Annoying | Awareness
Theme-16 | Closing | Situation | Awareness | Situation | Humor
Theme-17 | Closing | Lockdown | Panic | Job | Situation
Theme-18 | Awareness | Awareness | Effect | Stay Safe | Risk
Theme-19 | Financial help | Annoying | Micro-Organism | Awareness | Situation
Theme-20 | Caring | Awareness | News | Wish | Risk

Theme | Cluster-6 | Cluster-7 | Cluster-8 | Cluster-9
Theme-1 | Right | Testing & Treatment | Survive | Shut
Theme-2 | Need | Interest | Flu | Honest
Theme-3 | Covid | Need | Move | Media
Theme-4 | Social media | Social distance | Overreact | Right
Theme-5 | Awareness | Social distance | Situation | Testing
Theme-6 | Flight | Epidemic | Rumor | Caring
Theme-7 | Message | Social distance | Fight & Caring | Isolation
Theme-8 | Right | Symptoms | Cases | Survive
Theme-9 | Treatment | Effect | Disease | Home
Theme-10 | Wish | Confirmed | Cases | Wish
Theme-11 | Situation | Coronavirus | Awareness | Worried
Theme-12 | Warning | Message | Infectious | Situation
Theme-13 | Testing & Treatment | Coronavirus | Social guys | Quarantine
Theme-14 | Cases | Social distance | Situation | Love
Theme-15 | Message | Tourism | Quarantine | Scaring
Theme-16 | Message | Tourism | Awareness | Do not Move
Theme-17 | Situation | Coronavirus | Facts | Affect
Theme-18 | Tourism | Outbreak | Schools | Wind
Theme-19 | Coronavirus | Coronavirus | Crisis & Prevention | Awareness
Theme-20 | Awareness | Awareness | Financial enrichment | Fuck

Table 4

Neutral themes of all significant clusters.

Theme | Cluster-1 | Cluster-2 | Cluster-3 | Cluster-4 | Cluster-5
Theme-1 | Financial loss | Warning | Outbreak | Situation | Awareness
Theme-2 | Fact | Food | Sharing | Panic | Infectious
Theme-3 | Warning | Situation | Wish | Situation | Situation
Theme-4 | Estimate | Situation | Gonna | Entertainment | Need
Theme-5 | Blaming | Testing | Caring | Protection | Wish
Theme-6 | Pleased | Rumor | Caring | Dead | Food
Theme-7 | Financial loss | Warning | Panic | Health | Break
Theme-8 | Pandemic warning | Visiting | Survive | Stay Home | Treatment
Theme-9 | Awareness | Joke | Awareness | Avoid | Want
Theme-10 | Disease | Panic | Treatment | Fact | Prevention
Theme-11 | Warning | Situation | Playing game | Awareness | Awareness
Theme-12 | Caring | Panic | Coronavirus | Protection | Panic
Theme-13 | Panic | Closing | Homework | Awareness | Situation
Theme-14 | Panic | Panic | Ramadhan news | Situation | Awareness
Theme-15 | Awareness | Panic | Sanitation | Fact | Prevention
Theme-16 | Panic | Situation | Wish | Panic | Coronavirus
Theme-17 | Blaming | Homework | Situation | Wish | Avoid
Theme-18 | Joke | Blaming | Coronavirus | Update | Food
Theme-19 | Joke | Panic | Avoid | Cases | Situation
Theme-20 | Annoyed | Annoyed | Stop spreading | Hospitalize | Coronavirus

Theme | Cluster-6 | Cluster-7 | Cluster-8 | Cluster-9
Theme-1 | Vaccine | Ruin | Situation | Tourism
Theme-2 | News | Cases | Watch | Outbreak
Theme-3 | Message | Coronavirus | Virus | Situation
Theme-4 | Prevention | Awareness | Touch | Situation
Theme-5 | Dead | Wait & Things | Symptom | Quarantine
Theme-6 | News | Crisis | Problem | Education
Theme-7 | Panic | Symptom | Shot | Education
Theme-8 | Protection | News | Like | Virus
Theme-9 | Awareness | Symptom | Situation | Pandemic
Theme-10 | Situation | Infectious | Sick | Dead
Theme-11 | Thread | Expose | Dead | Education
Theme-12 | Wish | Caring | Body | Awareness
Theme-13 | Situation | Help & Need | Flu | Body
Theme-14 | Awareness | Protection | Wish | Need
Theme-15 | Message | Testing | Panic | Caring
Theme-16 | Situation | Blaming | Watch | Panic
Theme-17 | Media | Cure | Time | Fact
Theme-18 | Coronavirus | Message | Panic | Cases
Theme-19 | Cases | Stay Home | Contract | Public
Theme-20 | Health | Situation | Awareness | Exhibit

Table 5

Negative themes of all significant clusters.

Theme | Cluster-1 | Cluster-2 | Cluster-3 | Cluster-4 | Cluster-5 | Cluster-6 | Cluster-7
Theme-1 | Financial crisis | Panic | Anxiety | Warning | Serious | Financial crisis | Worry
Theme-2 | Panic | Media | Die | Avoid | Blaming | Hope | Excuse
Theme-3 | Panic | Food | Panic | Warning | Message | Panic | Fake News
Theme-4 | Situation | Jobless | Panic | Sick | Buy | Dead | Sad
Theme-5 | Isolation | Restriction | Incur | Blaming | Hate | Situation | Situation
Theme-6 | Stopping | Food | Panic | Situation | Avoid | Fever | Coronavirus
Theme-7 | Disease | Situation | Panic | Covid | Stopping | Awareness | Media
Theme-8 | Spreading | Food | Situation | Afraid | Infectious | Situation | Catch & Game
Theme-9 | Situation | Jobless | Situation | Situation | Scare | Food | Ebola
Theme-10 | Avoid | Situation | Panic | Blaming | Erazi | Lack of protection | Worst
Theme-11 | Treatment | Panic | Situation | Crisis | Crisis | Need | Sick
Theme-12 | Panic | News | Sick | Panic | Panic | Lockdown | Quarantine
Theme-13 | Fear | Closing | Coronavirus | Die | Long lasting | Fear | Disease
Theme-14 | Disease | Blaming | Situation | Spreading | Propaganda | Wrong | Scare
Theme-15 | Situation | Social distance | Suffer | Treatment | Fake | Toilet | Panic
Theme-16 | Situation | Panic | Situation | Danger | Lock | Hate | Covid
Theme-17 | Habitual Fact | Non-Reliable | Panic | Fake News | Panic | Dead | Disease
Theme-18 | Humor | Infectious | Situation | Wrong | Outbreak | Danger | Situation
Theme-19 | Panic | Disease | Die | Treatment | Accept | Cold | Panic
Theme-20 | Panic | Care | Fake News | Dead | Hope | Ebola | Annoy

For each category of tweets, we counted how often each theme appeared across the significant clusters, which indicates the activities and concerns that dominated the positive, neutral and negative discussions. To summarize the individual topics by theme, we considered the themes that appeared more than once (see Fig. 8). Example positive topics of Cluster-3 are shown as a word cloud in Fig. 5, the themes of the positive topics within the different clusters are listed in Table 3, and the most frequent positive themes are shown in Fig. 8(a). In the positive category, awareness and situation are the most frequent themes, each appearing 17 times across the significant clusters. Awareness refers to actions taken by individuals, while situation describes the general situation in particular places or incidents, with pandemic news indicating the general situation relating to COVID-19. Wish appears 8 times and news 7 times. Caring, coronavirus, right and treatment each appear 5 times, and message and social distance 4 times. Cases, prevention, testing and tourism each appear 3 times. Other precaution-related themes, such as affect, annoying, blaming, closing, crisis, effect, facts, financial help, help, infectious, lockdown, medicine, need, panic, quarantine, risk and scaring, each appear twice across the clusters. These themes recur regularly and suggest how the situation could be improved. Some negative themes, such as blaming, crisis, infectious, panic and risk, also appear in the positive category, but with low frequencies. Other emerging positive issues identified in this analysis include financial help, help, lockdown, quarantine and medicine.
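The frequencies above come from counting how many times each theme label occurs across the clusters of a table and keeping labels that occur more than once. A minimal sketch, assuming case-insensitive exact matching of labels; the text does not say how compound labels such as "Distance & Treatment" were handled, so here they are counted as separate labels. The function name `frequent_themes` is hypothetical, and the example uses only a small slice of Table 3.

```python
from collections import Counter
from typing import List, Mapping, Sequence, Tuple


def frequent_themes(table: Mapping[str, Sequence[str]],
                    min_count: int = 2) -> List[Tuple[str, int]]:
    """Count theme labels across clusters and keep those seen at least min_count times."""
    counts = Counter(theme.strip().lower()  # treat "Situation" and "situation" alike
                     for themes in table.values() for theme in themes)
    kept = [(theme, c) for theme, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda tc: (-tc[1], tc[0]))  # most frequent first


if __name__ == "__main__":
    # Themes 1-5 of Cluster-4 and Cluster-5 from Table 3 (a small slice, for illustration).
    slice_of_table_3 = {
        "Cluster-4": ["Wish", "News", "Situation", "Help", "Help"],
        "Cluster-5": ["Sunny", "Watch", "Affect", "Situation", "Treatment"],
    }
    print(frequent_themes(slice_of_table_3))
```

On this slice the call prints `[('help', 2), ('situation', 2)]`; running it over all columns of a table gives the kind of ranking plotted in Fig. 8.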
Fig. 8

Most frequent (a) positive, (b) neutral and (c) negative COVID-19-associated topics.

The neutral category contains a mixture of positive and negative topics, which indicates the most frequent topics in recent timeframes. An example of neutral topics is shown as a word cloud in Fig. 6, the neutral themes of the different clusters are listed in Table 4, and the most frequent themes are shown in Fig. 8(b). Situation, panic and awareness appear 19, 16 and 13 times, respectively. Panic is a related theme that describes epidemic conditions and news. Wish and coronavirus each appear 6 times, and caring appears 5 times in the neutral tweets. Blaming, cases, dead, warning and protection each appear 4 times, while education, food, joke, message, news, prevention and symptom each appear 3 times. The remaining themes shown appear twice. Issues relating to conditions before and after the COVID-19 pandemic, such as financial loss, crisis, food and education, also arose in this analysis.

The negative topics are represented as a word cloud in Fig. 7, the themes of the negative topics are given in Table 5, and the most frequent themes are shown in Fig. 8(c). In this category, panic and situation appear more often than any other theme, 20 and 18 times, respectively. Dead and disease appear 6 and 5 times, which gives an indication of their influence. Food and blaming each occur 4 times, and treatment, sick, fake news and avoid 3 times each. Themes such as food and treatment indicate the level of crisis perceived. The remaining themes shown have a frequency of 2. Together, these themes form the top list of negative feelings and perceptions relating to COVID-19.

Discussion

Comparison of TClustVID with recently published work

The proposed TClustVID model overcomes many of the pitfalls evident in recent work. We present a well-organized machine learning model applied to general COVID-19-oriented tweets, without restricting them to specific regions as previous studies did [18], [28], [48], [49]. We used both sentiment analysis and topic modeling to explore COVID-19-related themes, unlike many previous works [18], [20], [22], [24], [27], [48], [49], [50], [51]. In addition, we implemented many machine learning classifiers and compared our proposed model with traditional analyses to evaluate its performance, whereas most previous studies [18], [28], [31] used only a small number of classifiers to verify their results. Our work was also able to extract reliable positive, negative and neutral themes to characterize the clusters and understand the COVID-19 situation.

Implications

Twitter is a reasonable and efficient platform for assessing the effectiveness of public health communication. Real-time epidemiological data are required to properly and comprehensively characterize user discussion and self-reporting capabilities, and to rapidly evaluate the pandemic situation. In this study, we developed a machine learning based framework named TClustVID, investigated various types of public tweets related to COVID-19, identified related sentiments and extracted associated topics from a number of localities. This provides significant insight into how people interpret mixed messages around COVID-19. The model has numerous theoretical and practical implications, described as follows.

Theoretical implications of the study

The proposed method extracts positive, negative and neutral topics, allowing their content to be scrutinized and significant information about related issues to be obtained. TClustVID has been designed to focus on particular types of analyses, such as psychological and emotional analysis. However, it can easily be generalized and adapted to analyze any specific topic of interest. This study is useful for verifying these kinds of analysis from various perspectives. In addition, demographic analysis, comparison and discussion could give a more concrete picture of the various sources. The theoretical understanding gained from this work can be used to address similar problems at lower cost. Building on the limitations and suggestions of this study, researchers can take up numerous new challenges in future work.

Practical implications

Users of this model can help individuals who are isolated from one another by offering relaxation and support via social media, which helps to safeguard people's interests and needs in society. This analytical approach can support comprehensive contact tracing, help identify unrecognized hot spots of COVID-19 infection, and improve the accuracy and predictability of identifying COVID-19 cases. The model can be used to explore how to improve public health campaigns on the leading topics in Twitter conversations, so that agencies can respond in a timely manner and improve their initiatives. This work has mainly focused on a number of common concerns relating to working conditions; many tweets were posted about working from home during this outbreak. The model also offers an opportunity to follow patterns of vaccine acceptance and failure, or criticism of vaccines. It also allows assessment of real-time trends in COVID-19 treatment, medical equipment and diagnosis, cross-correlating this information with medical information and other factors. A new surveillance system could be built with this model to examine web-based content for a better understanding of public emotions and concerns. This work can be generalized to analyze data from other social media such as Instagram, Facebook and YouTube. The model could also be used to study discussions in the scientific community and determine their similarities to and differences from public comments. Our work has generated useful data for agencies, local leaders, health providers and municipalities, which can enable governments to coordinate the flow of information and combat misinformation about the pandemic.

Limitations of the study

Twitter captures community interaction, but its user profiles provide relatively little demographic data for further analysis. We gathered tweets using only a small number of keywords from a single social media platform. This study investigated only English-language tweets. In addition, machine and deep learning methods have not been applied to a very large volume of COVID-19-oriented tweets. Finally, interpreting topics is challenging, so some of our manual interpretations of the topic models may be inaccurate.

Challenges and future suggestions

A number of challenges arise in investigating COVID-19 tweets for sentiment analysis and topic modeling. On social media platforms such as Twitter, irrelevant, fake, misinformed and insufficient data are common. In addition, these tweets need to be collected from different domains of social media. It is difficult for researchers to work with such datasets, as processing them requires a high degree of technical skill. It is often hard to decide which keywords are appropriate for gathering COVID-19-related tweets and identifying the desired data. Moreover, decision makers have difficulty identifying people's sentiments on a subject or characterizing their beliefs. There is also a lack of scientific studies from which to gather knowledge for designing new models. These challenges will need to be faced. Along with Twitter, the records of other social media (such as Facebook, YouTube, Instagram and Reddit) need to be investigated to learn more about users' views of the COVID-19 pandemic. To overcome the general lack of published literature on the subject, the most relevant previous work on pandemic situations can help in finding solutions. COVID-19-related hashtags and keywords also need to be explored using recently developed academic literature and sentiment and opinion mining techniques. New developments such as TClustVID can also be used, with modifications, to analyze similar but heterogeneous records from various sources.

Conclusion

In this work, we have proposed a cluster-based machine learning model named TClustVID that achieved the best performance in sentiment analysis and topic modeling of COVID-19 Twitter datasets compared with other methods. TClustVID first extracted clusters from each dataset using the k-means algorithm [38], then applied different classifiers to these clusters and identified the cluster that gave the highest classification accuracy in each dataset. We then compared the best clustering result for each dataset with the traditional analysis, and TClustVID achieved the best outcome in every case. Furthermore, the best clusters yielded more significant topics in each dataset and represent public opinion on Twitter. The model also extracted more significant information from very large numbers of tweets by identifying commonly occurring topics and interpreting their themes. It helps to identify important themes about the situation at the time the tweets were sent, and can support the design of better strategies to counter the pandemic that take human responses and behavior into account. This knowledge was extracted from positive, neutral and negative tweets and identified the high-frequency information transmitted and commented on in response to the epidemic. As noted in the Study Limitations (Section 5.3) and future guidelines of this work (Section 5.4), more COVID-19-oriented social media data from different sources can in future be collected and investigated using TClustVID (and improved versions of it) and other current techniques, which will enable efficient extraction and analysis of significant information about COVID-19 and other health emergencies.

CRediT authorship contribution statement

Md. Shahriare Satu: Conceptualization, Methodology, Resources, Data curation, Writing—original draft preparation, Visualization. Md. Imran Khan: Conceptualization, Methodology, Software, Data curation, Visualization. Mufti Mahmud: Methodology, Formal analysis, validation. Shahadat Uddin: Formal analysis, validation. Matthew A. Summers: Formal analysis, validation. Julian M.W. Quinn: Writing—review and editing. Mohammad Ali Moni: Writing—review and editing, supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.