Simon Maskell¹, Munir Pirmohamed², Danushka Bollegala¹, Richard Sloane¹, Joanna Hajne¹
Abstract
BACKGROUND: Detecting adverse drug reactions (ADRs) is an important task that has direct implications for the use of the drug in question. If previously unknown ADRs can be detected as quickly as possible, this information can be provided to regulators, pharmaceutical companies, and health care organizations, potentially reducing drug-related morbidity and saving the lives of many patients. A promising approach for detecting ADRs is to use social media platforms such as Twitter and Facebook. A high level of correlation between a drug name and an event may indicate a potential adverse reaction associated with that drug. Although the signal detection community has proposed numerous association measures for identifying ADRs, these measures are limited in that they detect correlations but often ignore causality.
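The abstract does not name specific association measures. As an illustration only, the sketch below computes the proportional reporting ratio (PRR), one widely used disproportionality measure, from a 2x2 table of post counts. The `ContingencyTable` type, the function name, and all counts are hypothetical. Note that a high PRR only says the drug and event co-occur more than expected, which is exactly the correlation-without-causality limitation described above.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """2x2 post counts for one (drug, event) pair (hypothetical structure)."""
    drug_and_event: int  # a: posts mentioning both the drug and the event
    drug_no_event: int   # b: posts mentioning the drug but not the event
    event_no_drug: int   # c: posts mentioning the event but not the drug
    neither: int         # d: posts mentioning neither


def proportional_reporting_ratio(t: ContingencyTable) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)]; assumes a + b > 0 and c + d > 0.

    A value well above 1 means the event is mentioned disproportionately often
    alongside the drug. This is a correlation signal only; it says nothing
    about whether the drug caused the event.
    """
    rate_with_drug = t.drug_and_event / (t.drug_and_event + t.drug_no_event)
    rate_without_drug = t.event_no_drug / (t.event_no_drug + t.neither)
    if rate_without_drug == 0:
        return float("inf")  # event never seen without the drug
    return rate_with_drug / rate_without_drug


# Hypothetical counts, for illustration only.
table = ContingencyTable(drug_and_event=30, drug_no_event=970,
                         event_no_drug=200, neither=98_800)
print(f"PRR = {proportional_reporting_ratio(table):.2f}")  # PRR = 14.85
```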
Keywords: ADR detection; causality; causality detection; lexical patterns; machine learning; support vector machines
Year: 2018 PMID: 29743155 PMCID: PMC5966656 DOI: 10.2196/publichealth.8214
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Figure 1. Three tweets mentioning a drug (shown in blue boldface) and symptoms (shown in red italics).
Figure 2. Extracting lexical patterns from a tweet that describes an adverse reaction (dizziness) caused by a drug (Atenolol). The tweet is split into 3 parts (prefix, midfix, and postfix), and various lexical patterns are extracted from each part. See the text for details of the pattern extraction method. Best viewed in color.
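The full extraction procedure is described in the paper's text, not here, so the following is only a minimal sketch under stated assumptions. It splits a whitespace-tokenized tweet around the drug and symptom mentions (whichever comes first opens the midfix), then emits unigrams and skip-bigrams from each part, tagged P, M, or S and joined with "+" to match the feature names in the tables below. The tokenizer, the `max_skip` window, the function names, and the example tweet are all hypothetical; `tokens.index` raises `ValueError` if either mention is missing.

```python
def split_tweet(tokens, drug, symptom):
    """Split tokens into (prefix, midfix, postfix) around the first mentions
    of the drug and the symptom, in whichever order they occur."""
    i, j = sorted((tokens.index(drug), tokens.index(symptom)))
    return tokens[:i], tokens[i + 1:j], tokens[j + 1:]


def skip_grams(tokens, max_skip=1):
    """Unigrams plus ordered token pairs with at most `max_skip` tokens between them."""
    grams = [(t,) for t in tokens]
    for i, first in enumerate(tokens):
        for gap in range(max_skip + 1):
            k = i + 1 + gap
            if k < len(tokens):
                grams.append((first, tokens[k]))
    return grams


def extract_patterns(text, drug, symptom, max_skip=1):
    """Tag each part's skip-grams with P (prefix), M (midfix) or S (postfix)."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    parts = split_tweet(tokens, drug.lower(), symptom.lower())
    features = set()
    for tag, part in zip(("P", "M", "S"), parts):
        for gram in skip_grams(part, max_skip):
            features.add("+".join((tag,) + gram))
    return features


# Hypothetical tweet, not the one shown in Figure 2.
feats = extract_patterns("ugh atenolol is giving me dizziness all day",
                         drug="atenolol", symptom="dizziness")
print(sorted(feats))
```

Each resulting feature string would become one binary dimension of the classifier's input vector.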
Figure 3. Support vector machine optimization problem.
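Figure 3 itself is not reproduced here. For reference, the standard soft-margin formulation of a linear support vector machine is given below; this is the textbook form, and the exact objective or regularizer shown in the figure may differ.

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, N
$$

Here $\mathbf{x}_i$ is the lexical-pattern feature vector of the $i$-th training tweet, $y_i \in \{-1, +1\}$ its label, $\xi_i$ a slack variable, and $C > 0$ trades margin width against training errors. A new tweet $\mathbf{x}$ is classified by $\operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$, so each entry of $\mathbf{w}$ is the weight of one pattern feature, which is the quantity summarized in Figure 4 and in the feature tables.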
Classification accuracy of different baselines and the proposed method.

| Method | Classification accuracy (%) |
| --- | --- |
| Majority baseline | 63.19 |
| Bag-of-words classifier | 69.31ᵃ |
| Convolutional neural network | 69.26ᵃ |
| Prefix only | 66.41ᵃ |
| Midfix only | 72.78ᵃ |
| Postfix only | 68.08ᵃ |
| Prefix+midfix | 74.72ᵃ |
| Prefix+postfix | 71.07ᵃ |
| Midfix+postfix | 77.10ᵃ |
| Proposed method | 77.70ᵃ |

ᵃStatistically significant values.
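The majority baseline in the table predicts the most frequent class for every tweet, so its accuracy equals the share of that class in the evaluation data; 63.19% therefore suggests that roughly 63% of the evaluated tweets belong to one class. A minimal sketch, assuming the majority label is taken from the training labels; the function name and the labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test item and
    return the accuracy as a percentage."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority_label)
    return 100.0 * correct / len(test_labels)


# Hypothetical labels: 1 = tweet describes a causal drug-reaction relation, 0 = not.
train = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
test = [1, 1, 0, 1, 0]
print(majority_baseline_accuracy(train, test))  # 60.0
```

Any classifier worth reporting has to beat this number, which is why every other row is compared against it.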
Figure 4. Histogram of the weights of the features learned by the support vector machine (SVM) classifier.
A randomly selected sample of features with zero weights.

| Prefix patterns | Midfix patterns | Postfix patterns |
| --- | --- | --- |
| P+trip+i | M+bad+idea | S+over |
| P+news+: | M+a+breakfast | S+12+hours |
| P+dat+lean | M+if+school | S+conquest |
| P+@rroddger | M+medica_authorities | S+please |
| P+fussiness+no | M+convicted+i | S+bad! |

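A weight histogram like Figure 4 and a random sample of zero-weight features can both be produced directly from the learned weight vector. One plausible reason for exact zeros, under the standard SVM dual, is that $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ with $\alpha_i > 0$ only for support vectors, so any feature that is absent from every support vector receives weight zero. The sketch assumes weights are available as a `{feature: weight}` dict; the function names, bin width, seed, and tolerance are hypothetical choices, and the example weights are copied from the tables in this section.

```python
import math
import random
from collections import Counter


def weight_histogram(weights, bin_width=0.25):
    """Count features per bin; each key is the bin's lower edge k * bin_width."""
    return Counter(math.floor(w / bin_width) * bin_width for w in weights.values())


def sample_zero_weight(weights, n, seed=0, tol=1e-12):
    """Draw up to n features whose learned weight is (numerically) zero."""
    zero = sorted(f for f, w in weights.items() if abs(w) <= tol)
    return random.Random(seed).sample(zero, min(n, len(zero)))


# Weights copied from the tables in this section; a real model has many more features.
weights = {"S+als": 1.2096, "M+commercial": -1.2304,
           "P+trip+i": 0.0, "M+bad+idea": 0.0, "S+over": 0.0}
print(weight_histogram(weights))
print(sample_zero_weight(weights, n=2))
```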
Top-ranked positively weighted (left 2 columns) and negatively weighted (right 2 columns) features (skip-gram patterns) learned by the support vector machine.

| Positive feature | Weight | Negative feature | Weight |
| --- | --- | --- | --- |
| S+als | 1.2096 | M+commercial | −1.2304 |
| M+induced | 1.1314 | P+hate+being | −1.0398 |
| P+oh+no | 1.0683 | P+I’m+definitely | −1.0000 |
| M+dstinks | 1.0000 | P+clumsiness | −1.0000 |
| S+.+wooh | 1.0000 | P+hospitalization | −1.0000 |
| M+never+work | 1.0000 | S+lol+fml | −0.9674 |
| P+high+off | 0.9006 | S+wopps | −0.9035 |
| P+took+too | 0.8449 | P+rt+xanaaxhadme | −0.8067 |
| M+was+supposed | 0.8378 | P+don’t+think | −0.7721 |

P: prefix skip-gram patterns. M: midfix skip-gram patterns. S: postfix skip-gram patterns. For bigrams, “+” separates the constituent unigrams.
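Because the classifier is linear, each feature's weight is its additive contribution to the decision score, so a ranking like the table above is simply the weight vector sorted in each direction. A minimal sketch, assuming a `{feature: weight}` dict and assuming positive weights favor the class the table calls positive; the function name and `k` are hypothetical, and the example pairs are copied from the table.

```python
def top_features(weights, k=5):
    """Return the k largest positive and k most negative (feature, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda fw: fw[1])
    negative = [fw for fw in ranked if fw[1] < 0][:k]
    positive = [fw for fw in reversed(ranked) if fw[1] > 0][:k]
    return positive, negative


# A few (feature, weight) pairs copied from the table above, plus one zero-weight feature.
weights = {"S+als": 1.2096, "M+induced": 1.1314, "P+oh+no": 1.0683,
           "P+took+too": 0.8449, "M+commercial": -1.2304,
           "P+hate+being": -1.0398, "P+don’t+think": -0.7721, "P+trip+i": 0.0}
pos, neg = top_features(weights, k=2)
print(pos)  # [('S+als', 1.2096), ('M+induced', 1.1314)]
print(neg)  # [('M+commercial', -1.2304), ('P+hate+being', -1.0398)]
```

Zero-weight features are excluded from both lists, since they do not move the score in either direction.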