Jingfang Liu, Pengzhu Zhang, Yingjie Lu.
Abstract
BACKGROUND: User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known to be valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to automatically identify ADR-related messages in online user reviews.
Keywords: Adverse drug reaction; Feature-based classification; Online user review
Year: 2014 PMID: 26060719 PMCID: PMC4449501
Source DB: PubMed Journal: Iran J Public Health ISSN: 2251-6085 Impact factor: 1.429
Data collection statistics
| Community name | Messages | Members | Messages per member | Time span |
|---|---|---|---|---|
| Allergy | 9,014 | 2,203 | 4.09 | September 2008-February 2014 |
| Schizophrenia | 1,060 | 405 | 2.62 | September 2008-February 2014 |
| Pain management | 12,180 | 5,024 | 2.42 | September 2008-February 2014 |
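The "Messages per member" column is the ratio of the two preceding columns. A minimal Python sketch that recomputes it from the counts in the table (the variable and function names are illustrative and do not come from the study):

```python
# Counts copied from the data collection table: (messages, members).
communities = {
    "Allergy": (9014, 2203),
    "Schizophrenia": (1060, 405),
    "Pain management": (12180, 5024),
}

def messages_per_member(messages: int, members: int) -> float:
    """Average number of messages posted per member, rounded to 2 decimals."""
    return round(messages / members, 2)

ratios = {name: messages_per_member(m, u) for name, (m, u) in communities.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Rounding to two decimals matches the precision used in the table.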
Fig. 1. The design framework for the automatic identification of messages related to ADRs
Fig. 2. The study flowchart and criteria
Definition of feature set variable
| Variable | Value |
|---|---|
| F1 | N-gram-based features |
| F2 | domain-specific features |
| F1+F2 | N-gram-based features and domain-specific features |
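The three feature sets can be illustrated with a small sketch. It assumes word-level n-grams for F1 and a simple count of lexicon matches for F2; the lexicon `ADR_LEXICON`, the function names, and the key prefixes are hypothetical, since this record does not list the study's actual domain-specific features.

```python
from collections import Counter

# Hypothetical lexicon of ADR-related terms (an assumption for illustration);
# the study's real domain-specific features are not given in this record.
ADR_LEXICON = {"nausea", "rash", "dizziness", "headache"}

def ngram_features(text: str, n_max: int = 2) -> Counter:
    """F1: counts of word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ngram=" + " ".join(tokens[i:i + n])] += 1
    return feats

def domain_features(text: str) -> Counter:
    """F2 (assumed form): number of lexicon terms found in the message."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t in ADR_LEXICON)
    return Counter({"lexicon_hits": hits})

def combined_features(text: str) -> Counter:
    """F1+F2: merge both feature maps; key prefixes keep them from clashing."""
    return ngram_features(text) + domain_features(text)

features = combined_features("this drug gave me a rash")
```

The resulting sparse feature map could then be vectorized and passed to any of the classifiers compared in the study.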
Fig. 3. Accuracy results using different feature sets and classification techniques
F-measure results using different feature sets and classification techniques
| Feature set | C4.5 | Naïve Bayes | SVM |
|---|---|---|---|
| F1 | 0.595 | 0.681 | 0.764 |
| F2 | 0.559 | 0.693 | 0.760 |
| F1+F2 | 0.648 | 0.755 | 0.895 |
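For reference, the F-measure can be computed from confusion-matrix counts. The sketch below assumes the usual balanced F-measure (the harmonic mean of precision and recall); the record does not state which weighting the study used. Note that "F1" in the table names a feature set, not the F1 score. The counts in the usage line are made up for illustration.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure from true positives, false positives, false negatives."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts for illustration only; they are not results from the study.
score = f_measure(tp=80, fp=20, fn=20)
```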
The results of sensitivity analyses for feature set
| Feature set | 20% ADR messages (%) | 25% ADR messages (%) | 30% ADR messages (%) | 35% ADR messages (%) | 40% ADR messages (%) |
|---|---|---|---|---|---|
| F1+F2 | 65.32 | 70.37 | 75.66 | 80.21 | 89.51 |
| F2 | 55.49** | 61.71** | 66.63** | 72.97** | 80.31* |
| F1 | 59.33** | 63.99** | 69.41** | 73.05** | 81.72* |
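The column headers suggest that the sensitivity analysis varied the share of ADR-related messages in the data. One way to build such samples is sketched below, assuming two pools of labeled messages and simple random sampling; the study's actual procedure is not described in this record, and the function name, pool contents, and sample size are hypothetical.

```python
import random

def sample_with_adr_share(adr_msgs, other_msgs, total, adr_share, seed=0):
    """Draw `total` labeled messages so that a fraction `adr_share` are ADR-related."""
    n_adr = round(total * adr_share)
    n_other = total - n_adr
    if n_adr > len(adr_msgs) or n_other > len(other_msgs):
        raise ValueError("not enough messages to reach the requested share")
    rng = random.Random(seed)
    sample = [(m, 1) for m in rng.sample(adr_msgs, n_adr)]
    sample += [(m, 0) for m in rng.sample(other_msgs, n_other)]
    rng.shuffle(sample)
    return sample

# Placeholder messages; real data would be labeled forum posts.
adr_pool = [f"adr-{i}" for i in range(100)]
other_pool = [f"other-{i}" for i in range(300)]
samples = {
    share: sample_with_adr_share(adr_pool, other_pool, 200, share)
    for share in (0.20, 0.25, 0.30, 0.35, 0.40)
}
```

Each sample could then be used to train and evaluate a classifier, giving one column of the table per ADR share.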
The results of sensitivity analyses for classifier
| Classifier | 20% ADR messages (%) | 25% ADR messages (%) | 30% ADR messages (%) | 35% ADR messages (%) | 40% ADR messages (%) |
|---|---|---|---|---|---|
| SVM | 65.32 | 70.37 | 75.66 | 80.21 | 89.95 |
| Naïve Bayes | 61.53** | 65.37** | 71.59** | 76.49** | 81.61* |
| C4.5 | 59.82** | 64.29** | 70.18** | 75.28** | 80.87* |
Identification result in each community
| Community | Total Messages | Messages related to ADRs | Percentage |
|---|---|---|---|
| Allergy | 9,014 | 2,332 | 25.87 |
| Schizophrenia | 1,060 | 331 | 31.23 |
| Pain management | 12,180 | 3,349 | 27.50 |
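The "Percentage" column follows directly from the two count columns. The short sketch below recomputes it and also derives a pooled share across the three communities; the pooled figure is calculated here and is not reported in the table. Names are illustrative.

```python
# (total messages, messages identified as ADR-related), from the table above.
results = {
    "Allergy": (9014, 2332),
    "Schizophrenia": (1060, 331),
    "Pain management": (12180, 3349),
}

def adr_percentage(total: int, adr: int) -> float:
    """Share of ADR-related messages as a percentage, rounded to 2 decimals."""
    return round(100 * adr / total, 2)

percentages = {name: adr_percentage(t, a) for name, (t, a) in results.items()}

# Pooled share across all communities (derived here, not taken from the source).
pooled_total = sum(t for t, _ in results.values())
pooled_adr = sum(a for _, a in results.values())
pooled = adr_percentage(pooled_total, pooled_adr)
```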
Definition of classifier variable
| Variable | Value |
|---|---|
| Classifier | SVM |
| | C4.5 |
| | Naïve Bayes |