Davy Weissenbacher¹, Abeed Sarker¹, Ari Klein¹, Karen O'Connor¹, Arjun Magge², Graciela Gonzalez-Hernandez¹.
Abstract
OBJECTIVE: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them.
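To make the recall problem concrete, here is a minimal sketch contrasting exact lexicon lookup with approximate string matching. It assumes a tiny hypothetical lexicon (`LEXICON`) and a naive lowercase tokenizer, and uses Python's standard `difflib.get_close_matches`; it is an illustration of the problem, not the method proposed in this paper.

```python
import difflib
import re

# Hypothetical toy lexicon; a real system would draw on a large drug-name resource.
LEXICON = {"adderall", "ibuprofen", "tylenol", "xanax"}


def tokenize(text: str) -> list[str]:
    # Naive assumption: lowercase the tweet and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def exact_matches(text: str) -> set[str]:
    # Only tokens spelled exactly as in the lexicon are found.
    return {tok for tok in tokenize(text) if tok in LEXICON}


def fuzzy_matches(text: str, cutoff: float = 0.85) -> set[str]:
    # Map each token to its closest lexicon entry if the similarity ratio >= cutoff.
    found = set()
    for tok in tokenize(text):
        close = difflib.get_close_matches(tok, sorted(LEXICON), n=1, cutoff=cutoff)
        if close:
            found.add(close[0])
    return found


tweet = "need my adderal before the exam"
print(exact_matches(tweet))  # the misspelling is missed
print(fuzzy_matches(tweet))  # the variant is mapped back to "adderall"
```

Loosening the match recovers misspelled variants, but the same looseness also lets more non-drug tokens through, so approximate matching typically buys recall at some cost in precision.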
Keywords: drug name detection; ensemble learning; pharmacovigilance; social media; text classification
Year: 2019 PMID: 31562510 PMCID: PMC6857507 DOI: 10.1093/jamia/ocz156
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Statistics of the UPennHLP Twitter Drug and Pregnancy Corpora
| | UPennHLP Twitter Drug Corpus | UPennHLP Twitter Pregnancy Corpus |
|---|---|---|
| Training set | 9623 tweets (4975 positive) | 69 272 tweets (181 positive) |
| Testing set | 5382 tweets (2852 positive) | 29 687 tweets (77 positive) |
| Users in training/testing set | 7584/4535 | 112/112 |
| Users posting in both training and testing sets | 1054 (these users posted 1713 tweets in the testing set, 31.8% of it) | 112 |
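The last row of the corpus statistics reports how many test-set authors also appear in training and what share of test tweets they wrote (1713 of 5382, i.e. 31.8%). A minimal sketch of how such an overlap figure could be computed, assuming tweets are available as hypothetical `(user_id, text)` pairs; the function name and data layout are illustrative only:

```python
from typing import Iterable, Tuple


def overlap_stats(train_authors: Iterable[str],
                  test_tweets: Iterable[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Return (#test authors seen in training, #test tweets they wrote, % of test tweets)."""
    seen = set(train_authors)
    test_tweets = list(test_tweets)
    test_authors = {user for user, _ in test_tweets}
    shared_users = test_authors & seen
    tweets_by_shared = sum(1 for user, _ in test_tweets if user in seen)
    share = 100 * tweets_by_shared / len(test_tweets) if test_tweets else 0.0
    return len(shared_users), tweets_by_shared, share


# Toy data: only u1 appears in both sets and wrote 2 of the 4 test tweets.
print(overlap_stats(["u1", "u2", "u3"],
                    [("u1", "a"), ("u4", "b"), ("u1", "c"), ("u5", "d")]))

# The table's Drug Corpus figure: 1713 of 5382 test tweets.
print(round(100 * 1713 / 5382, 1))
```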
Figure 1. Architecture of Kusuri, an ensemble learning classifier for drug detection in Twitter. LSTM: long short-term memory.
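The caption describes Kusuri as an ensemble learning classifier, and the Pregnancy Corpus results distinguish its "module 2 [classifier]" from the full system, which suggests that an earlier module selects candidate tweets before classification. The sketch below illustrates that two-stage pattern under this assumption only: `candidate_filter`, `classifier`, and the toy stand-ins are hypothetical and do not reproduce Kusuri's actual modules.

```python
from typing import Callable, Iterable, List

# Hypothetical interfaces: a filter flags candidate tweets,
# a classifier returns P(tweet mentions a drug).
Filter = Callable[[str], bool]
Classifier = Callable[[str], float]


def two_stage_predict(tweets: Iterable[str], candidate_filter: Filter,
                      classifier: Classifier, threshold: float = 0.5) -> List[bool]:
    """Stage 1 rejects non-candidates outright; stage 2 scores only the survivors."""
    predictions = []
    for tweet in tweets:
        if not candidate_filter(tweet):
            predictions.append(False)  # rejected early, never reaches the classifier
        else:
            predictions.append(classifier(tweet) >= threshold)
    return predictions


# Toy stand-ins (hypothetical): a word-list filter and a keyword "classifier".
DRUG_WORDS = {"xanax", "tylenol"}


def toy_filter(tweet: str) -> bool:
    return any(word in DRUG_WORDS for word in tweet.lower().split())


def toy_classifier(tweet: str) -> float:
    return 0.9 if "took" in tweet.lower().split() else 0.2


print(two_stage_predict(["took xanax", "xanax the band", "nice weather"],
                        toy_filter, toy_classifier))
```

Running an inexpensive filter first means the classifier only scores tweets that pass it, which matters on data as imbalanced as the Pregnancy Corpus testing set, where each system's true positives plus false negatives total only 77 of 29 687 tweets.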
Figure 2. Deep neural network predicting ŷ, the probability that a tweet mentions a drug name. GRU: gated recurrent unit.
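As a hedged illustration of what a GRU-based network computes when it outputs ŷ, here is an untrained, pure-Python bidirectional GRU: tokens are embedded, read by a forward and a backward GRU, and the two final hidden states are concatenated and passed through a logistic output unit. The dimensions, random initialization, single recurrent layer, and the names `GRUCell` and `BiGRUClassifier` are assumptions for illustration; the layer configuration of the network in Figure 2 is not reproduced here.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {g: (rand_matrix(hidden_dim, input_dim, rng),
                           rand_matrix(hidden_dim, hidden_dim, rng),
                           [0.0] * hidden_dim) for g in ("z", "r", "n")}

    def step(self, x, h):
        def gate(name, hidden, act):
            W, U, b = self.params[name]
            return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, hidden), b)]
        z = gate("z", h, sigmoid)
        r = gate("r", h, sigmoid)
        # Candidate state sees the previous state through the reset gate.
        n = gate("n", [ri * hi for ri, hi in zip(r, h)], math.tanh)
        # Interpolate between previous state and candidate (one common convention).
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        for x in xs:
            h = self.step(x, h)
        return h


class BiGRUClassifier:
    def __init__(self, vocab, emb_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.emb = {w: [rng.uniform(-1, 1) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim  # unknown tokens map to a zero vector
        self.fwd = GRUCell(emb_dim, hidden_dim, rng)
        self.bwd = GRUCell(emb_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def predict_proba(self, tokens):
        xs = [self.emb.get(t, self.unk) for t in tokens]
        # Concatenate final forward and backward states, then apply a logistic unit.
        h = self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


model = BiGRUClassifier(vocab=["took", "my", "xanax"])
print(model.predict_proba("took my xanax".split()))  # an (untrained) y-hat in (0, 1)
```

In practice the weights would be learned by minimizing binary cross-entropy between ŷ and the annotated label; this sketch only shows the forward computation.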
Precision, recall, and F1 scores for drug detection classifiers on the test set of the UPennHLP Twitter Drug Corpus
| System | Precision | Recall | F1 score |
|---|---|---|---|
| 1. Lexicon + variant classifier | 66.4 | 88.5 | 75.9 |
| 2. Supervised bidirectional-GRU | 93.5 | 89.5 | 91.4 |
| 3. THU_NGN hierarchical-NNs | 93.3 | 90.4 | 91.8 |
| 4. Best DNN model in the ensemble | 93.7 | 92.5 | 93.1 |
| 5. Ensemble DNNs (module 2 of | 95.1 | 92.5 | 93.7 |
DNN: deep neural network; GRU: gated recurrent unit; NN: neural network; THU_NGN.
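In the Drug Corpus results, the ensemble's gain over the best single DNN comes from precision (93.7 to 95.1) at unchanged recall (92.5). How the ensemble combines its members is not stated in this excerpt; the sketch below assumes simple soft voting, averaging each member's probability and thresholding the mean. The model interface and names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: each member returns P(tweet mentions a drug).
Model = Callable[[str], float]


def ensemble_proba(models: Sequence[Model], tweet: str) -> float:
    """Soft voting: average the members' probabilities."""
    return mean(m(tweet) for m in models)


def ensemble_predict(models: Sequence[Model], tweet: str, threshold: float = 0.5) -> bool:
    return ensemble_proba(models, tweet) >= threshold


# Toy members: one confident, one lukewarm, one skeptical.
members = [lambda t: 0.9, lambda t: 0.6, lambda t: 0.2]
print(ensemble_proba(members, "took my xanax"), ensemble_predict(members, "took my xanax"))
```

Averaging dampens the effect of any single member's overconfident mistake, which is one plausible route to higher precision; majority voting over thresholded member outputs is a common alternative.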
Categories of false positives and false negatives made by the drug detection classifier on the test set of the UPennHLP Twitter Drug Corpus
| Error category | Errors | Examples |
|---|---|---|
| False positive | | |
| Medical topic | 41 | <user> you should see a dermatologist if you can. You may just need something to break you out of a cycle. I used a topical and took pills / Lola may has a stye, or pink eye. Doc recommends warm compresses to see if it gets better today, but my eyes are itchy just looking at her. |
| Weighted words/patterns | 19 | <user> i was robbed a foul when |
| Ambiguous name | 12 | <user>I actually really like |
| Food topic | 11 | This aerobically fermented product was tested & it’s antibiotic residue free. also certified organic. |
| Insufficient context | 7 | <user> adding |
| Cosmetic topic | 5 | Doc prescribed me this dandruff shampoo, if it works, I'm definitely getting a sew in after I'm done using it |
| Unknown | 2 | Ice_Cream, Ice-Cream and More Ice-Cream…thats Ol i Want |
| Error annotation | 3 | − |
| False negative | | |
| Ambiguous name | 36 | Trying Oil of Oregano & garlic for congestion for my sinus infection. [ambiguous dietary supplement] / In the church the person close to me's sniffling & coughing… I need a bathe of bactine and some |
| Drug not/rarely seen | 25 | That’s the Pennsylvania Appellate Court Revives 1, 000 the |
| Generic terms | 18 | Tossing and turning. I need ur |
| Nonmedical topic | 11 | <user>Meet Mr an Mrs Lexapro….. guarenteed fidelity. |
| Short tweets | 3 | arnica-ointment-7 |
| Error annotation | 7 | − |
Precision, recall, F1 scores, true positives, false positives, and false negatives for drug detection classifiers on the UPennHLP Twitter Pregnancy Corpus testing set
| System | Precision | Recall | F1 score | True positives/false positives/false negatives |
|---|---|---|---|---|
| 1. Lexicon + variant classifier | 55.0 | 71.43 | 62.15 | 55/45/22 |
| 2. Ensemble supervised bidirectional-GRUs | ||||
| a. Trained on UPennHLP Twitter Pregnancy Corpus | 87.5 | 63.64 | 73.68 | 49/7/28 |
| 3. Ensemble DNNs (only module 2 [classifier] of | ||||
| a. Trained on UPennHLP Twitter Drug Corpus | 10.15 | 80.52 | 18.02 | 62/549/15 |
| b. Trained on UPennHLP Twitter Pregnancy Corpus | 93.75 | 58.44 | 72.00 | 45/3/32 |
| 4. | 94.55ᵃ | 67.53 | 78.79ᵃ | 52/3/25 |
DNN: deep neural network; GRU: gated recurrent unit; NN: neural network; THU_NGN.
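The Pregnancy Corpus table reports raw counts next to the scores, so every score can be recomputed: precision is TP/(TP+FP), recall is TP/(TP+FN), and F1, their harmonic mean, simplifies to 2TP/(2TP+FP+FN). A minimal sketch (the helper name `prf` is illustrative) that reproduces several rows of the table from their counts:

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 as percentages rounded to two decimals."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall, written directly in terms of counts.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return tuple(round(100 * v, 2) for v in (precision, recall, f1))


# TP/FP/FN counts taken from the table rows.
rows = {
    "1. Lexicon + variant classifier": (55, 45, 22),
    "2a. BiGRU ensemble, Pregnancy Corpus": (49, 7, 28),
    "3a. Ensemble DNNs, Drug Corpus": (62, 549, 15),
    "3b. Ensemble DNNs, Pregnancy Corpus": (45, 3, 32),
    "4.": (52, 3, 25),
}
for name, counts in rows.items():
    print(name, prf(*counts))
```

The counts also explain the outlier in row 3a: 549 false positives against 62 true positives drive precision down to 10.15 even though recall (80.52) is the highest in the table.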