Jiahua Du, Sandra Michalska, Sudha Subramani, Hua Wang, Yanchun Zhang.
Abstract
The paper aims to leverage highly unstructured user-generated content for pollen allergy surveillance, using neural networks with character embeddings and an attention mechanism. Currently, there is no accurate representation of hay fever prevalence, particularly in real time. Social media offers an alternative source of knowledge about the condition, which is valuable for allergy sufferers, general practitioners, and policy makers. Despite this potential, conventional natural language processing methods prove limited when faced with the challenging nature of user-generated content. As a result, detecting actual hay fever instances among numerous false positives, and correctly identifying non-technical expressions as pollen allergy symptoms, remain major problems. We propose a deep architecture enhanced with character embeddings and neural attention to improve the classification of hay fever-related content from Twitter data. The improvement in prediction is due to the character-level semantics introduced, which effectively address the out-of-vocabulary (OOV) problem in our dataset, where the OOV word rate is approximately 9%. Overall, the study is a step towards improved real-time pollen allergy surveillance from social media with state-of-the-art technology.
Keywords: Deep learning; Hay fever; Pollen allergy; Twitter
Year: 2019 PMID: 31656594 PMCID: PMC6790203 DOI: 10.1007/s13755-019-0084-2
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
Fig. 1 Prevalence of allergic rhinitis sufferers in Australia [1]
Annotation schema with example tweets
| Class | Description | Example |
|---|---|---|
| Informative | | |
| 1 | Detailed personal reporting (symptoms, treatments, etc.) | My eyes have been watering and I’ve been sneezing heaps today...anyone else in Melbourne noticing their hay fever kicking in for the first time this spring? |
| 2 | Generic personal reporting | I wanted a Sunday morning lie-in, but hayfever is telling me different |
| Non-informative | | |
| 3 | Warnings/news/marketing | Struggling with athsma or hayfever? Find out how a #saltlamp can help |
| 4 | Ambiguous/un-related | If I had hayfever I would simply buy some hay |
OOV rate (%) of words and characters across the ten testing folds
| Fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OOV words | 8.64 | 8.52 | 9.26 | 8.19 | 9.42 | 9.08 | 9.79 | 7.57 | 8.88 | 9.89 | 8.92 |
| OOV characters | 0.07 | 0.07 | 0.03 | 0.06 | 0.03 | 0.05 | 0.01 | 0.04 | 0.03 | 0.06 | 0.04 |
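
The gap between the two rows above (about 9% for words, well under 0.1% for characters) is what motivates the character-level input. A minimal sketch of how such a rate could be computed is given below. It assumes a token-level definition (the share of test tokens never seen in training) and whitespace tokenization; the record does not state the exact definition or preprocessing used, and the helper names and toy posts are hypothetical.

```python
def oov_rate(train_tokens, test_tokens):
    """Percentage of test tokens that never occur in the training tokens.

    Assumption: a token-level rate; the source does not specify whether
    the reported figures count tokens or distinct types.
    """
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for token in test_tokens if token not in vocab)
    return 100.0 * unseen / len(test_tokens)


def words(posts):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return [w for post in posts for w in post.lower().split()]


def chars(posts):
    # Character tokens, ignoring whitespace.
    return [c for post in posts for c in post.lower() if not c.isspace()]


# Toy data for illustration only, not taken from the study's dataset.
train_posts = ["hayfever is bad today", "my eyes are itchy"]
test_posts = ["hayfever is sooo bad"]

word_oov = oov_rate(words(train_posts), words(test_posts))
char_oov = oov_rate(chars(train_posts), chars(test_posts))
print(f"OOV words: {word_oov:.2f}%  OOV characters: {char_oov:.2f}%")
```

In the toy example the elongated word "sooo" is unseen as a word, yet every one of its characters appears in the training posts, which mirrors the pattern in the table.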
Performance of the model variants (%)
| Model | Accuracy | Macro-F1 |
|---|---|---|
| BILSTM + ATT | 77.72 | 72.80 |
| BILSTM + ATT + CHAR | 79.51 | 75.67 |
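
Adding character embeddings raises accuracy by 1.79 points and macro-F1 by 2.87 points. For readers unfamiliar with the second metric, the sketch below shows the standard definitions: accuracy is the share of correct predictions, while macro-F1 is the unweighted mean of the per-class F1 scores, so small classes count as much as large ones. This is an illustrative implementation under those standard definitions, not the authors' evaluation code, and the labels are toy values.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels using the four annotation classes (1-4); not study data.
y_true = [1, 1, 1, 1, 2, 2, 3, 4]
y_pred = [1, 1, 1, 1, 2, 1, 3, 3]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because class 4 is never predicted correctly in the toy labels, macro-F1 falls well below accuracy, which is why the two columns in the table can move by different amounts.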
Examples of posts with attention maps
Color intensity indicates the weight attributed to each word towards the respective class assignment
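
As a rough illustration of where such per-word weights come from, the sketch below normalizes one score per word with a softmax and uses the resulting weights both to pool the word representations and as the intensities an attention map would display. This is a generic attention-pooling sketch: the record does not give the paper's scoring function, so the scores, hidden states, and function names here are hypothetical.

```python
import math


def attention_weights(scores):
    """Softmax over per-word scores; the weights are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, weights):
    """Weighted sum of per-word vectors, giving one post-level vector."""
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]


# Hypothetical values: one score and one 2-dimensional state per word.
tokens = ["my", "eyes", "watering", "hayfever"]
scores = [0.1, 1.2, 2.0, 2.5]
hidden_states = [[0.0, 0.1], [0.3, 0.2], [0.9, 0.4], [1.0, 0.8]]

weights = attention_weights(scores)
context = attend(hidden_states, weights)

for token, weight in zip(tokens, weights):
    print(f"{token:10s} {weight:.3f}")
```

A word with a larger weight contributes more to the pooled vector passed to the classifier and would be drawn with a stronger color in the map.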
Examples of posts with OOV words and their respective prediction probabilities for BILSTM + ATT (A) and BILSTM + ATT + CHAR (B)
| Class | Prob. (A) | Prob. (B) | Post |
|---|---|---|---|
| 1 | 0.60 | 0.99 | |
| 1 | 0.97 | 0.98 | |
| 1 | 0.20 | 0.99 | seriously one bizarre moments life told buy usual effective hayfever |
| 2 | 0.03 | 0.50 | |
| 2 | 0.94 | 1.00 | canny sun |
| 2 | 0.80 | 1.00 | im thinking immune |
| 4 | 0.26 | 0.67 | last one like |