Linkai Luo, Yue Wang, Hai Liu.
Abstract
Twitter offers extensive and valuable information on the spread of COVID-19 and the current state of public health. Mining tweets could be an important supplement for public health departments in monitoring the status of COVID-19 in a timely manner and taking appropriate actions to minimize its impact. Identifying personal health mentions (PHM) is the first step of social media public health surveillance. It aims to identify whether a person's health condition is mentioned in a tweet, and it serves as a crucial method for tracking pandemic conditions in real time. However, social media texts contain noise, many creative and novel phrases, sarcastic emoji expressions, and misspellings. In addition, class imbalance is usually severe. To address these challenges, we built a COVID-19 PHM dataset containing more than 11,000 annotated tweets, and we proposed a dual convolutional neural network (CNN) framework using this dataset. An auxiliary CNN in the dual CNN structure provides supplemental information for the primary CNN in order to detect PHMs from tweets more effectively. The experiment shows that the proposed structure could alleviate the effect of class imbalance and could achieve promising results. This automated approach could monitor public health in real time and save disease-prevention departments from the tedious manual work in public health surveillance.
Keywords: CNN; Deep learning; Health monitoring; Social media; Text mining
Year: 2022 PMID: 35399189 PMCID: PMC8976569 DOI: 10.1016/j.eswa.2022.117139
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 8.665
Fig. 1 The structure of the dual CNN network.
Fig. 2 The procedure to obtain the embedding matrix of a tweet.
The hyperparameters in the dual CNN structure.
| Hyperparameter | Value |
|---|---|
| Kernel size | 3, 4, and 5 |
| Dropout rate | 0.5 |
| Batch size | 64 |
| Learning rate | 0.002 |
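To make the kernel sizes concrete, the following is a minimal pure-Python sketch of how convolution windows of width 3, 4, and 5 could slide over a tweet's token embeddings, each followed by max-over-time pooling. This layout is an assumption (a common text-CNN design), not the authors' confirmed architecture; the names `conv_max_pool` and `tweet_features`, the toy embeddings, and the all-ones kernels are hypothetical and chosen only for illustration.

```python
KERNEL_SIZES = (3, 4, 5)  # taken from the hyperparameter table


def conv_max_pool(embeddings, kernel):
    """Slide one kernel (k rows of d weights) over a sequence of d-dim token
    vectors and return the largest response (max-over-time pooling)."""
    k = len(kernel)
    if len(embeddings) < k:
        raise ValueError("sequence is shorter than the kernel")
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # Dot product between the kernel and the current window of tokens.
        response = sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        responses.append(response)
    return max(responses)


def tweet_features(embeddings, kernels_by_size):
    """Concatenate one pooled feature per kernel, across all kernel sizes."""
    return [
        conv_max_pool(embeddings, kernel)
        for size in KERNEL_SIZES
        for kernel in kernels_by_size[size]
    ]


# Toy input (hypothetical): six tokens with 2-dimensional embeddings.
toy_tweet = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 0], [1, 2]]
# One all-ones kernel per size, so each response is the sum of a window.
toy_kernels = {size: [[[1, 1]] * size] for size in KERNEL_SIZES}
features = tweet_features(toy_tweet, toy_kernels)
```

In a trained model the kernel weights would be learned and there would be many kernels per size; the pooled features would then pass through dropout and a classification layer.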
Fig. 3 The relationship between F1 score and loss weight function.
The overall performance of the models on the COVID-19 PHM identification task.
| Model | Precision | Recall | F1 score |
|---|---|---|---|
| WE + SVM | 0.7424 | 0.7597 | 0.7477 |
| LSTM | 0.7549 | 0.7731 | 0.7576 |
| GRU | 0.7551 | 0.7731 | 0.7472 |
| BERT | 0.7943 | 0.7690 | 0.7560 |
| Dual CNN |
Performance comparison between LSTM and Dual CNN on each class.
| Model | Label | Precision | Recall | F1 score |
|---|---|---|---|---|
| LSTM | Self-mention | 0.5000 | 0.4643 | 0.4815 |
| | Other-mention | 0.6250 | 0.5906 | 0.6073 |
| | Awareness | 0.8217 | 0.9016 | 0.8598 |
| | Non-health | 0.5581 | 0.3077 | 0.3967 |
| Dual CNN | Self-mention | 0.4510 | 0.8214 | 0.5823 |
| | Other-mention | 0.5389 | 0.8189 | 0.6500 |
| | Awareness | 0.9060 | 0.8080 | 0.8597 |
| | Non-health | 0.6027 | 0.5641 | 0.5828 |
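The per-class F1 scores in the table above can be checked against the standard definition, F1 = 2PR / (P + R). The sketch below recomputes F1 for the Dual CNN rows and reports how far each recomputed value is from the reported one. It assumes the reported F1 is the harmonic mean of the reported precision and recall, which the source does not state explicitly; the names `f1_score`, `deviations`, and `dual_cnn_rows` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the table above: label -> (precision, recall, reported F1).
dual_cnn_rows = {
    "Self-mention": (0.4510, 0.8214, 0.5823),
    "Other-mention": (0.5389, 0.8189, 0.6500),
    "Awareness": (0.9060, 0.8080, 0.8597),
    "Non-health": (0.6027, 0.5641, 0.5828),
}


def deviations(rows):
    """Absolute gap between the recomputed and the reported F1, per label."""
    return {
        label: abs(f1_score(p, r) - reported)
        for label, (p, r, reported) in rows.items()
    }


if __name__ == "__main__":
    for label, gap in deviations(dual_cnn_rows).items():
        print(f"{label}: |recomputed - reported| = {gap:.4f}")
```

Under this assumption, three of the four Dual CNN rows reproduce to four decimal places. The Awareness row recomputes to roughly 0.854 rather than the reported 0.8597; the source does not say whether that gap comes from averaging over runs or from another cause.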
Fig. 4 The comparison of F1 scores with and without data augmentation for each label (class).
Performance comparison between CNN and Dual CNN on each class.
| Model | Label | Precision | Recall | F1 score |
|---|---|---|---|---|
| CNN | Self-mention | 0.5000 | 0.5357 | 0.5172 |
| | Other-mention | 0.6286 | 0.5197 | 0.5690 |
| | Awareness | 0.8206 | 0.9114 | 0.8636 |
| | Non-health | 0.6163 | 0.3397 | 0.4380 |
| Dual CNN | Self-mention | 0.4510 | 0.8214 | 0.5823 |
| | Other-mention | 0.5389 | 0.8189 | 0.6500 |
| | Awareness | 0.9060 | 0.8080 | 0.8597 |
| | Non-health | 0.6027 | 0.5641 | 0.5828 |