Arjun Magge, Abeed Sarker, Azadeh Nikfarjam, Graciela Gonzalez-Hernandez.
Year: 2019 PMID: 31087070 PMCID: PMC6515520 DOI: 10.1093/jamia/ocz013
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Performance comparison of NERs under different training and testing modes
| Mode | Dataset Size | Precision | Recall | F1-score |
|---|---|---|---|---|
| Cocos et al on | 844 tweets | 0.70 (0.66-0.74) | 0.82 (0.76-0.89) | 0.75 (0.74-0.76) |
| October 2018: train | 526 tweets | 0.76 (0.70-0.82) | 0.72 (0.63-0.81) | 0.73 (0.70-0.76) |
| October 2018: train | 644 tweets | 0.60 (0.54-0.65) | 0.70 (0.62-0.77) | 0.63 (0.60-0.66) |
| October 2018: train Standard and test Standard | 1012 tweets | 0.73 (0.66-0.79) | 0.60 (0.52-0.68) | 0.64 (0.62-0.66) |
| Cocos et al on ADRMine Dataset | 1784 tweets | 0.68 (0.62-0.73) | 0.69 (0.62-0.75) | 0.67 (0.66-0.69) |
| ADRMine on ADRMine Dataset as published | 1784 tweets | 0.76 | 0.68 | 0.72 |
Values are mean (95% confidence interval) of the scores achieved by each model over 10 training and evaluation rounds. MostlyPos refers to the dataset as used by Cocos et al (ie, with tweets that have no span annotations removed), which leaves mostly positive tweets. Standard refers to the dataset with a roughly 50-50 balance of positive to negative tweets, as in Nikfarjam et al, which is also the balance of the ADRMine Dataset.
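As an illustration of how a "mean (95% confidence interval)" entry can be derived from repeated rounds, and of what a MostlyPos-style filter does, here is a minimal Python sketch. It is not the authors' code. The source does not state how the intervals were computed, so the normal approximation used here is an assumption (a t-based interval would be somewhat wider for 10 rounds). The function names `mean_with_ci` and `keep_annotated`, the `spans` field, and the per-round scores are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, lower, upper) for a list of per-round scores.

    Assumption: normal-approximation interval, mean +/- z * s / sqrt(n).
    The source does not say which interval method was used.
    """
    if len(scores) < 2:
        raise ValueError("need at least two rounds to estimate spread")
    m = mean(scores)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(scores) / len(scores) ** 0.5
    return m, m - half_width, m + half_width


def keep_annotated(tweets):
    """MostlyPos-style filter: drop tweets that carry no span annotations.

    Assumption: each tweet is a dict with a "spans" list (hypothetical schema).
    """
    return [t for t in tweets if t["spans"]]


# Made-up F1 scores for 10 rounds, for illustration only (not from the paper).
f1_rounds = [0.74, 0.76, 0.75, 0.73, 0.77, 0.75, 0.74, 0.76, 0.75, 0.75]
m, lower, upper = mean_with_ci(f1_rounds)
print(f"{m:.2f} ({lower:.2f}-{upper:.2f})")

# Made-up tweets: only the first has an annotated span and survives the filter.
tweets = [
    {"text": "this drug gave me a headache", "spans": [(19, 27)]},
    {"text": "picked up my prescription today", "spans": []},
]
print(len(keep_annotated(tweets)))
```

Note that averaging per-round F1 scores, as sketched here, generally gives a different number than computing F1 from the mean precision and mean recall, which is one reason the F1 column need not follow directly from the other two columns.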