Nestor Alvaro, Yusuke Miyao, Nigel Collier.
Abstract
BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines. As a result, there is no comparable set of documents from both Twitter and PubMed annotated in the same manner.
Keywords: PubMed; Twitter; annotation; corpus; natural language processing; pharmacovigilance; text mining
Year: 2017 PMID: 28468748 PMCID: PMC5438461 DOI: 10.2196/publichealth.6396
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Figure 1. Annotation pipeline. The initial number of raw sentences differed between Twitter (165,489 tweets) and PubMed (29,435 sentences).
Details on the resulting corpora produced by the researchers who used the guidelines we reviewed.
| Corpus name | DDI^a corpus | ADE^b corpus | AZDC^c corpus | ShARe^d/CLEF^e eHealth 2013 Task I |
| Annotated entities | Pharmacological substances | Drug, adverse effects, dosages | Diseases | Disorders |
| Annotated relations | Drug-drug interactions | Drug-adverse effect, drug-dosage | - | - |
| Texts origin | DrugBank and MEDLINE | MEDLINE | PubMed abstracts | Clinical notes |
| Number of documents | 1025 | 2972 | 794 (2775 sentences) | 200 |
| Number of annotators | 2 | 3 (after automatic annotation) | 2 (after automatic annotation) | 2 |
| Availability | Free use for academic research | Free | Free | Upon request |
| Annotation Tool | Brat | Knowtator | In-house tool | Knowtator |
^a DDI: drug-drug interaction.
^b ADE: adverse drug event.
^c AZDC: Arizona disease corpus.
^d ShARe: shared annotated resources.
^e CLEF: Conference and Labs of the Evaluation Forum.
Total number of sentences for each drug name in Twitter and PubMed.
| Drug name | # Tweets | # Sentences in PubMed |
| Bevacizumab | 69 | 239 |
| Buprenorphine | 363 | 244 |
| Carbamazepine | 74 | 239 |
| Ciprofloxacin | 81 | 250 |
| Citalopram | 331 | 251 |
| Cortisone | 344 | 231 |
| Dextroamphetamine sulphate | 373 | 19 |
| Docetaxel | 34 | 246 |
| Duloxetine | 242 | 241 |
| Fluoxetine | 344 | 238 |
| Fluvoxamine maleate | 13 | 204 |
| Lamotrigine | 168 | 242 |
| Lisdexamfetamine | 348 | 84 |
| Lisinopril | 56 | 147 |
| Melphalan | 2 | 234 |
| Methylphenidate hydrochloride | 349 | 112 |
| Modafinil | 287 | 10 |
| Montelukast | 71 | 239 |
| Olanzapine | 190 | 248 |
| Paroxetine | 365 | 249 |
| Prednisone | 350 | 249 |
| Quetiapine | 339 | 247 |
| Rupatadine | 1 | 45 |
| Sertraline | 343 | 236 |
| Tamoxifen | 122 | 238 |
| Topiramate | 133 | 231 |
| Trazodone | 206 | 70 |
| Triamcinolone acetonide | 14 | 253 |
| Venlafaxine | 326 | 238 |
| Ziprasidone | 62 | 226 |
Agreement with gold standard data during the annotator selection phase. We compared the results from 2 very active social media users, 1 native English speaker, and 3 pharmacists. The time taken to complete the annotation of each dataset is given in parentheses (in minutes).
| Annotator | Twitter (min) | PubMed (min) | Total (min) |
| Social1 | 0.70 (9) | 0.80 (10) | 0.75 (19) |
| Social2 | 1.00 (8) | 0.70 (7) | 0.85 (15) |
| Native speaker | 0.85 (6) | 0.50 (6) | 0.67 (12) |
| Pharmacist1 | 0.90 (8) | 0.85 (7) | 0.87 (15) |
| Pharmacist2 | 0.70 (11) | 0.80 (9) | 0.75 (20) |
| Pharmacist3 | 0.50 (15) | 0.70 (15) | 0.60 (30) |
Detail of annotations in Twitter. The first column shows the element being evaluated. Columns 2-5 show the inter-annotator agreement scores of pharmacist 1 (Ph1) and pharmacist 2 (Ph2) using relaxed and strict constraints. Columns 6 and 7 show the number of elements annotated by each pharmacist. Columns 8 and 9 show the number of matching elements between the pharmacists' annotations using relaxed and strict constraints.
| Annotated element | Ph1 relaxed (%) | Ph2 relaxed (%) | Ph1 strict (%) | Ph2 strict (%) | #Ph1 | #Ph2 | #Matches (relaxed) | #Matches (strict) |
| Drug | 97.39 | 98.72 | 93.52 | 94.80 | 1111 | 1096 | 1082 | 1039 |
| Disease | 50.86 | 91.47 | 46.12 | 82.95 | 464 | 258 | 236 | 214 |
| Symptom | 77.23 | 76.71 | 54.21 | 53.84 | 1164 | 1172 | 899 | 631 |
| Outcome-negative | 63.27 | 75.19 | 43.02 | 51.12 | 795 | 669 | 503 | 342 |
| Outcome-positive | 11.01 | 40.00 | 8.26 | 30.00 | 109 | 30 | 12 | 9 |
| Reason-to-use | 55.82 | 60.18 | 44.66 | 48.14 | 842 | 781 | 470 | 376 |
| Duration | 46.37 | 8.96 | 39.11 | 7.56 | 248 | 1283 | 115 | 97 |
| Exemplification | 10.11 | 64.77 | 3.37 | 21.59 | 564 | 88 | 57 | 19 |
| Modality | 56.92 | 30.58 | 49.57 | 26.63 | 585 | 1089 | 333 | 290 |
| Person | 72.56 | 58.55 | 60.21 | 48.58 | 1709 | 2118 | 1240 | 1029 |
| Polarity | 76.06 | 52.43 | 53.52 | 36.89 | 71 | 103 | 54 | 38 |
| Sentiment | 72.48 | 19.46 | 60.92 | 16.36 | 476 | 1773 | 345 | 290 |
| Severity | 64.18 | 19.59 | 44.03 | 13.44 | 134 | 439 | 86 | 59 |
| Status | 59.41 | 22.07 | 45.94 | 17.07 | 542 | 1459 | 322 | 249 |
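The agreement scores in the table above appear consistent with a simple ratio: the number of matching elements divided by the number of elements that pharmacist annotated, expressed as a percentage. The following is a minimal sketch under that assumption, checked only against the tabulated "Drug" row; the function name `agreement_pct` is hypothetical and this is not the authors' code.

```python
def agreement_pct(matches: int, annotated: int) -> float:
    """Share of one annotator's elements also marked by the other, in percent.

    Assumption: score = 100 * matches / annotated, rounded to 2 decimals.
    """
    if annotated == 0:
        raise ValueError("annotator marked no elements")
    return round(100 * matches / annotated, 2)


# Twitter "Drug" row from the table: element counts per pharmacist and
# the number of matches under relaxed and strict constraints.
drug = {"ph1": 1111, "ph2": 1096, "relaxed": 1082, "strict": 1039}

scores = {
    (who, mode): agreement_pct(drug[mode], drug[who])
    for mode in ("relaxed", "strict")
    for who in ("ph1", "ph2")
}
print(scores)
```

Because each score is normalized by that pharmacist's own count, the two pharmacists can have very different scores for the same element when one of them annotated far more instances than the other (for example "Duration" or "Sentiment").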
Detail of annotations in PubMed. The first column shows the element being evaluated. Columns 2-5 show the inter-annotator agreement scores of pharmacist 1 (Ph1) and pharmacist 2 (Ph2) using relaxed and strict constraints. Columns 6 and 7 show the number of elements annotated by each pharmacist. Columns 8 and 9 show the number of matching elements between the pharmacists' annotations using relaxed and strict constraints.
| Annotated element | Ph1 relaxed (%) | Ph2 relaxed (%) | Ph1 strict (%) | Ph2 strict (%) | #Ph1 | #Ph2 | #Matches (relaxed) | #Matches (strict) |
| Drug | 95.20 | 97.90 | 86.23 | 88.67 | 1271 | 1236 | 1210 | 1096 |
| Disease | 64.18 | 95.22 | 53.41 | 79.23 | 1086 | 732 | 697 | 580 |
| Symptom | 85.13 | 60.59 | 70.61 | 50.26 | 558 | 784 | 475 | 394 |
| Outcome-negative | 60.97 | 64.86 | 50.35 | 53.56 | 433 | 407 | 264 | 218 |
| Outcome-positive | 56.25 | 32.73 | 43.75 | 25.45 | 32 | 55 | 18 | 14 |
| Reason-to-use | 62.87 | 77.39 | 47.10 | 57.98 | 1535 | 1247 | 965 | 723 |
| Duration | 52.17 | 9.38 | 48.70 | 8.75 | 115 | 640 | 60 | 56 |
| Exemplification | 0.64 | 50.00 | 0.32 | 25.00 | 311 | 4 | 2 | 1 |
| Modality | 74.23 | 50.52 | 64.60 | 43.96 | 1370 | 2013 | 1017 | 885 |
| Person | 63.93 | 77.18 | 56.08 | 67.70 | 1439 | 1192 | 920 | 807 |
| Polarity | 25.00 | 22.22 | 25.00 | 22.22 | 16 | 18 | 4 | 4 |
| Sentiment | 33.33 | 1.96 | 22.22 | 1.31 | 9 | 153 | 3 | 2 |
| Severity | 42.22 | 33.33 | 37.78 | 29.82 | 45 | 57 | 19 | 17 |
| Status | 53.85 | 2.52 | 53.85 | 2.52 | 26 | 555 | 14 | 14 |
Figure 2. Sample with the annotation of a drug, a disease, and the relation between these concepts in a sentence from Twitter.
Detail of annotations in Twitter using the conflation strategy. The first column shows the element being evaluated. Columns 2-5 show the inter-annotator agreement scores of pharmacist 1 (Ph1) and pharmacist 2 (Ph2) using relaxed and strict constraints. Columns 6 and 7 show the number of elements annotated by each pharmacist. Columns 8 and 9 show the number of matching elements between the pharmacists' annotations using relaxed and strict constraints.
| Annotated element | Ph1 relaxed (%) | Ph2 relaxed (%) | Ph1 strict (%) | Ph2 strict (%) | #Ph1 | #Ph2 | #Matches (relaxed) | #Matches (strict) |
| Drug | 97.39 | 98.72 | 93.52 | 94.80 | 1111 | 1096 | 1082 | 1039 |
| Disease or symptom | 82.25 | 93.64 | 61.36 | 69.86 | 1628 | 1430 | 1339 | 999 |
| Outcome-negative | 67.30 | 79.97 | 46.29 | 55.01 | 795 | 669 | 535 | 368 |
| Benefit | 68.14 | 79.90 | 52.37 | 61.41 | 951 | 811 | 648 | 498 |
| Duration | 50.00 | 9.66 | 41.94 | 8.11 | 248 | 1283 | 124 | 104 |
| Exemplification | 10.11 | 64.77 | 3.37 | 21.59 | 564 | 88 | 57 | 19 |
| Modality | 64.44 | 34.62 | 54.53 | 29.29 | 585 | 1089 | 377 | 319 |
| Person | 77.30 | 62.37 | 63.96 | 51.61 | 1709 | 2118 | 1321 | 1093 |
| Polarity | 80.28 | 55.34 | 57.75 | 39.81 | 71 | 103 | 57 | 41 |
| Sentiment | 75.00 | 20.14 | 62.61 | 16.81 | 476 | 1773 | 357 | 298 |
| Severity | 67.16 | 20.50 | 47.01 | 14.35 | 134 | 439 | 90 | 63 |
| Status | 61.81 | 22.96 | 48.15 | 17.89 | 542 | 1459 | 335 | 261 |
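Comparing the row labels and counts with the non-conflated Twitter table suggests that the conflation strategy merges "Disease" with "Symptom" and "Outcome-positive" with "Reason-to-use" (reported as "Benefit"). The sketch below illustrates that merge on the per-pharmacist element counts. The mapping `CONFLATE` and the helper `conflate_counts` are hypothetical names inferred from the tables, not taken from the authors' implementation.

```python
from collections import Counter

# Assumed label mapping, inferred from the row labels of the two tables.
CONFLATE = {
    "Disease": "Disease or symptom",
    "Symptom": "Disease or symptom",
    "Outcome-positive": "Benefit",
    "Reason-to-use": "Benefit",
}


def conflate_counts(counts: dict[str, int]) -> dict[str, int]:
    """Sum element counts after mapping each label to its conflated label."""
    merged: Counter[str] = Counter()
    for label, n in counts.items():
        merged[CONFLATE.get(label, label)] += n
    return dict(merged)


# Pharmacist 1 element counts (#Ph1) from the non-conflated Twitter table.
ph1_twitter = {
    "Drug": 1111,
    "Disease": 464,
    "Symptom": 1164,
    "Outcome-negative": 795,
    "Outcome-positive": 109,
    "Reason-to-use": 842,
}

merged = conflate_counts(ph1_twitter)
print(merged["Disease or symptom"], merged["Benefit"])
```

The merged counts reproduce the #Ph1 values in the conflated table (1628 and 951). The match counts do not add up in the same way: the relaxed matches for "Disease or symptom" (1339) exceed the sum of the separate "Disease" and "Symptom" matches (236 + 899 = 1135), presumably because a span that one pharmacist labeled as a disease and the other as a symptom now counts as a match.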
Detail of annotations in PubMed using the conflation strategy. The first column shows the element being evaluated. Columns 2-5 show the inter-annotator agreement scores of pharmacist 1 (Ph1) and pharmacist 2 (Ph2) using relaxed and strict constraints. Columns 6 and 7 show the number of elements annotated by each pharmacist. Columns 8 and 9 show the number of matching elements between the pharmacists' annotations using relaxed and strict constraints.
| Annotated element | Ph1 relaxed (%) | Ph2 relaxed (%) | Ph1 strict (%) | Ph2 strict (%) | #Ph1 | #Ph2 | #Matches (relaxed) | #Matches (strict) |
| Drug | 95.20 | 97.90 | 86.23 | 88.67 | 1271 | 1236 | 1210 | 1096 |
| Disease or symptom | 91.91 | 99.67 | 74.21 | 80.47 | 1644 | 1516 | 1511 | 1220 |
| Outcome-negative | 81.52 | 86.73 | 65.82 | 70.02 | 433 | 407 | 353 | 285 |
| Benefit | 77.41 | 93.16 | 56.86 | 68.43 | 1567 | 1302 | 1213 | 891 |
| Duration | 53.91 | 9.69 | 50.43 | 9.06 | 115 | 640 | 62 | 58 |
| Exemplification | 0.64 | 50.00 | 0.32 | 25.00 | 311 | 4 | 2 | 1 |
| Modality | 83.43 | 56.78 | 71.39 | 48.58 | 1370 | 2013 | 1143 | 978 |
| Person | 71.58 | 86.41 | 62.13 | 75.00 | 1439 | 1192 | 1030 | 894 |
| Polarity | 43.75 | 38.89 | 43.75 | 38.89 | 16 | 18 | 7 | 7 |
| Sentiment | 33.33 | 1.96 | 22.22 | 1.31 | 9 | 153 | 3 | 2 |
| Severity | 53.33 | 42.11 | 46.67 | 36.84 | 45 | 57 | 24 | 21 |
| Status | 53.85 | 2.52 | 53.85 | 2.52 | 26 | 555 | 14 | 14 |
Figure 3. Sample of an annotation where “duration” and “exemplification” attributes are used.