Samina Amin¹, Abdullah Alharbi², M Irfan Uddin¹, Hashem Alyami³.
Abstract
The COVID-19 infection, which began in December 2019, has claimed many lives and affected every aspect of human life. The World Health Organization (WHO) later declared COVID-19 a pandemic, putting massive pressure on global health. During this ongoing pandemic, fast-growing social media platforms have served both as a channel for distributing information and as a source of self-reported disease symptoms in public discourse. There is therefore an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scraped public discourse on COVID-19 symptoms from Twitter. We built a dataset of self-reported COVID-19 symptoms and manually annotated the tweets (gold standard) into four categories: confirmed, death, suspected, and recovered. We then applied machine learning and deep learning models, each paired with a feature representation such as TF-IDF. We also ran experiments with recurrent neural network (RNN) variants and compared their performance with traditional machine learning algorithms. The experimental results show that optimizing the area under the curve (AUC) improves model performance, and that the long short-term memory (LSTM) network achieves the highest accuracy in detecting COVID-19 symptoms in real-time public messaging. The LSTM classifier in the proposed pipeline reaches a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.
Keywords: COVID-19; Classification; Coronavirus; Deep learning; Pandemic; Recurrent neural networks; Twitter
Year: 2022 PMID: 35966348 PMCID: PMC9364288 DOI: 10.1007/s00500-022-07405-0
Source DB: PubMed Journal: Soft comput ISSN: 1432-7643 Impact factor: 3.732
Fig. 1 Monthly distribution of COVID-19 totals: (a) confirmed cases, (b) death rates, (c) recovered cases, and (d) “critical” or “serious” cases
Fig. 3 An RNN processing a long sentence
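Fig. 3 shows an RNN reading a sentence one token at a time. As a minimal sketch (not the authors' model), the recurrence of a plain Elman RNN, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), can be written in pure Python. The helper names `matvec` and `rnn_forward`, the embeddings, the weights, and the dimensions are all hypothetical toy values chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(inputs: Sequence[Sequence[float]], w_xh: Matrix,
                w_hh: Matrix, b_h: Sequence[float]) -> List[Vector]:
    """Run a plain (Elman) RNN over a sequence of token vectors.

    Computes h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) with h_0 = 0 and
    returns the hidden state after every time step.
    """
    h: Vector = [0.0] * len(b_h)
    states: List[Vector] = []
    for x in inputs:
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# Hypothetical toy values: 5 tokens, 3-dimensional embeddings, 2 hidden units.
sentence = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1], [0.2, 0.1, 0.0],
            [0.0, 0.0, 0.4], [0.3, 0.2, 0.1]]
w_xh = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.5]]
w_hh = [[0.6, 0.1], [-0.2, 0.7]]
b_h = [0.0, 0.1]

states = rnn_forward(sentence, w_xh, w_hh, b_h)
final_state = states[-1]  # summary of the whole sentence, passed to a classifier
```

Because every step squashes the previous state through `tanh`, information from early tokens fades over long sentences. LSTM and GRU cells replace this single update with gated updates that can carry information across many more steps.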
Examples of COVID-19 confirmed, death, suspected, and recovered cases in tweets
| Example tweet |
|---|
| This is sheer evil Remember, that till 14th Sept, 18102 kids were infected with #COVID19, and ZERO died. #covidinfected |
| I am currently infected with covid 19 right now my body feels terrible. But I’m trying my best to recover soon #COVID19 |
| I am again seeing whole families infected with #COVID19. I don’t want to do this again. #GetVaccinated #WearAMask |
| TRAGIC: 41-yo elementary schoolteacher Kelly Peterson died of #COVID-19 |
| It is informed with grief that Brig Nadeem, 95 PMA (ex DA UK) died due to Covid19/Lungs failure. This resulted in Cardiac arrest. May Allah bless him in the highest place in Jannah |
| RIP - #COVID19 has claimed the life of another South Florida police officer. Sergeant Patrick “Pat” Madison died on Friday due to complications of COVID-19, according to the police department |
| Very happy to share that both of my parents and my brother recovered from rather severe cases of #COVID19 and doing great with all your prayers and love |
| I along with Dad and Mom have won the battle against #COVID19. We are now fully recovered from #COVID19. The last 17 days have been life-changing experiences for me. I am a different person all together learned many thing’s. Keep Faith in God |
| I got infected with Covid after 1st shot of the vaccine. But due to 1st shot of the vaccine, the covid load was mild |
| I am at risk of being infected with Covid19...I knew this like a few days ago |
| Well, I have had all the symptoms of COVID-19 but took 3 tests and they were all negative moral of the story: go get vaccinated |
| It is frustrating to miss work when you have a mild cold that typically you would tough out and go to work for. I got my negative Covid results 3 days ago but need to be symptom-free for 24 hours and this congestion and slight cough are lingering. had to leave work early today |
Statistics for the manually tagged COVID-19 dataset
| Category | Number of tweets | Percentage |
|---|---|---|
| | 1,662 | 38.56% |
| | 909 | 21.09% |
| | 1,053 | 24.43% |
| | 686 | 15.92% |
| Total | 4,310 | 100.00% |
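The class shares in the table follow directly from the counts. This short sketch recomputes them and also derives the majority-class baseline, i.e. the accuracy a classifier would reach by always predicting the largest class, which is a useful reference point for the accuracies reported later. The category label of each row is not recoverable from this record, so rows are identified only by position.

```python
# Row counts from the table above, in table order; the category label of each
# row was lost in extraction, so the rows are identified only by position.
counts = [1662, 909, 1053, 686]

total = sum(counts)
shares = [round(100 * n / total, 2) for n in counts]

# A classifier that always predicts the largest class would score this accuracy.
majority_baseline = round(100 * max(counts) / total, 2)

print(total)              # 4310
print(shares)             # [38.56, 21.09, 24.43, 15.92]
print(majority_baseline)  # 38.56
```

The largest class is about 2.4 times the size of the smallest, so the dataset is moderately imbalanced.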
Fig. 4 The confusion matrix of the models classifying confirmed, death, recovered, and suspected cases in tweets
Optimal hyperparameter settings for the proposed RNN models
| Parameters | BiRNN | LSTM | GRU |
|---|---|---|---|
| Pre-trained vocabulary size | 23000 | 23000 | 23000 |
| ngram_range | (1, 2) (1, 3) | (1, 2) (1, 3) | (1, 2) (1, 3) |
| Number of hidden layers | 3 | 3 | 3 |
| Number of neurons in hidden layers | 256, 128, 64 | 256, 128, 64 | 256, 128, 64 |
| Number of output layers | 1 | 1 | 1 |
| Number of neurons in the output layer | 4 | 4 | 4 |
| Learning rate | 0.001 | 0.001 | 0.001 |
| Optimizer | adam | adam | adam |
| Number of epochs | 10, 15, 30 | 10, 20, 40, 50 | 10, 20, 40, 50 |
| Batch size | 32 | 32 | 32 |
| Activation | softmax | softmax | softmax |
| Loss function | binary_crossentropy | binary_crossentropy | binary_crossentropy |
| Dropout | 0.2, 0.5 | 0.2, 0.3, 0.5 | 0.2, 0.5 |
| Training time (sec) | 64 | 68 | 62 |
| Test loss | 0.88 | 0.23 | 0.57 |
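The hyperparameter table lists `ngram_range` values of (1, 2) and (1, 3) and a vocabulary size of 23000, and the model comparison uses TF-IDF features. The record does not say which implementation was used. `ngram_range` and `max_features` are parameter names of scikit-learn's `TfidfVectorizer`, so this pure-Python sketch assumes that style: the smoothed idf ln((1 + N) / (1 + df)) + 1 and L2 row normalisation match that class's defaults, while mapping the 23000 vocabulary size to `max_features` is an assumption. The tokenizer is a simplified, hypothetical one that keeps `#` hashtags and single-character tokens. The function names `ngrams` and `tfidf` are illustrative.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ngrams(text: str, ngram_range: Tuple[int, int]) -> List[str]:
    """Lowercased word/hashtag tokens joined into n-grams, min_n <= n <= max_n."""
    tokens = re.findall(r"[#\w]+", text.lower())
    min_n, max_n = ngram_range
    grams: List[str] = []
    for n in range(min_n, max_n + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs: Sequence[str], ngram_range: Tuple[int, int] = (1, 2),
          max_features: int = 23000) -> Tuple[List[str], List[Dict[str, float]]]:
    """Return the vocabulary and one sparse, L2-normalised TF-IDF vector per document."""
    grams_per_doc = [ngrams(d, ngram_range) for d in docs]

    # Cap the vocabulary at the max_features most frequent n-grams in the corpus.
    corpus_counts = Counter(g for grams in grams_per_doc for g in grams)
    vocab = sorted(g for g, _ in corpus_counts.most_common(max_features))
    in_vocab = set(vocab)

    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    df = Counter(g for grams in grams_per_doc for g in set(grams) if g in in_vocab)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}

    vectors: List[Dict[str, float]] = []
    for grams in grams_per_doc:
        tf = Counter(g for g in grams if g in in_vocab)  # raw term counts
        weights = {g: count * idf[g] for g, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vocab, vectors


# Fragments of the example tweets shown earlier in this record.
docs = [
    "I am currently infected with covid 19 right now",
    "We are now fully recovered from #COVID19",
    "died on Friday due to complications of COVID-19",
]
vocab, vectors = tfidf(docs, ngram_range=(1, 2))
```

With `ngram_range=(1, 2)`, phrases such as "fully recovered" become features alongside single words. This is one plausible way such phrase-level cues could help separate the recovered, confirmed, and death classes.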
Performance of the proposed models and all baselines using the TF-IDF feature representation
| Model | Accuracy (%) | Precision | Recall | F-score |
|---|---|---|---|---|
| LSTM | 90.7 | 0.89 | 0.90 | 0.89 |
| GRU | 86.6 | 0.86 | 0.85 | 0.87 |
| BiRNN | 86.1 | 0.85 | 0.86 | 0.86 |
| ANN | 83.2 | 0.82 | 0.83 | 0.82 |
| SVM | 76.7 | 0.77 | 0.76 | 0.76 |
| Logistic Regression | 76.6 | 0.77 | 0.78 | 0.77 |
| Naïve Bayes | 75.4 | 0.75 | 0.76 | 0.74 |
| Decision Tree | 73.2 | 0.72 | 0.71 | 0.72 |
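The table reports accuracy alongside precision, recall, and F-score, but the record does not state how the per-class scores were averaged. This sketch assumes unweighted (macro) averaging over the four classes and computes all four metrics from a confusion matrix like the one in Fig. 4. The predictions are hypothetical toy data, not results from the study.

```python
from typing import Dict, List, Sequence

LABELS = ["confirmed", "death", "recovered", "suspected"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus unweighted (macro) averages of per-class precision, recall, F1."""
    m = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    accuracy = sum(m[i][i] for i in range(k)) / len(y_true)
    precision, recall, f1 = [], [], []
    for i in range(k):
        tp = m[i][i]
        predicted = sum(m[row][i] for row in range(k))  # column sum: predicted as class i
        actual = sum(m[i])                              # row sum: truly in class i
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precision) / k,
        "recall": sum(recall) / k,
        "f1": sum(f1) / k,
    }


# Hypothetical toy predictions, not data from the study.
y_true = ["confirmed", "confirmed", "death", "recovered", "suspected", "suspected"]
y_pred = ["confirmed", "suspected", "death", "recovered", "suspected", "confirmed"]
scores = macro_scores(y_true, y_pred, LABELS)
# accuracy = 4/6, macro precision = macro recall = macro F1 = 0.75
```

Each per-class F1 is a harmonic mean, so it is never larger than the arithmetic mean of that class's precision and recall. Under macro or weighted averaging, the averaged F-score therefore cannot exceed both the averaged precision and the averaged recall, which gives a quick consistency check on a results row.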
Fig. 5 ROC curves and AUC of the proposed models, plotting the true positive rate (TPR) against the false positive rate (FPR) for the confirmed, death, recovered, and suspected cases
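Fig. 5 reports ROC-AUC across four classes, but the record does not say how the per-class curves were formed. This sketch assumes the common one-vs-rest scheme: each class in turn is treated as positive and scored by its predicted probability. It computes AUC through its rank interpretation, the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting one half. This equals the area under the empirical ROC curve. The pairwise loop is O(positives × negatives), which is fine for a sketch. The helper names and probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows: Sequence[Sequence[float]], y_true: Sequence[str],
                    labels: Sequence[str]) -> Dict[str, float]:
    """Per-class AUC; prob_rows[i][k] is the predicted probability of labels[k] for tweet i."""
    result: Dict[str, float] = {}
    for k, label in enumerate(labels):
        class_scores: List[float] = [row[k] for row in prob_rows]
        result[label] = binary_auc(class_scores, [y == label for y in y_true])
    return result


# Hypothetical softmax outputs for four tweets, one per class.
labels = ["confirmed", "death", "recovered", "suspected"]
y_true = ["confirmed", "death", "recovered", "suspected"]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.3],
    [0.5, 0.1, 0.1, 0.3],
]
aucs = one_vs_rest_auc(probs, y_true, labels)
macro_auc = sum(aucs.values()) / len(aucs)
```

In this toy case, the suspected tweet's score (0.3) ties with one negative, so its AUC drops to 2.5/3 while the other classes reach 1.0.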