Ritu Khare, John D Burger, John S Aberdeen, David W Tresner-Kirsch, Theodore J Corrales, Lynette Hirschman, Zhiyong Lu.
Abstract
Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through a crowdsourcing technique where an unknown network of workers can be recruited through the technical environment of Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for the average workers on MTurk, we first simplify the complex task such that each HIT only involves a worker making a binary judgment of whether a highlighted disease, in the context of a given drug label, is an indication. In addition, this study is novel in the crowdsourcing interface design where the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time savings, but also leads to accuracy comparable to that of domain experts. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
Year: 2015 PMID: 25797061 PMCID: PMC4369375 DOI: 10.1093/database/bav016
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. An example of an FDA Drug Label in DailyMed; drug names are specified as normalized concepts under the ‘RxNorm Names’ box, and the drug indications are described as free text in the ‘INDICATIONS AND USAGE’ section.
Figure 2. Crowdsourced Microtasking Pipeline for Cataloging Drug Indications from FDA Drug Labels. Part II shows the drug and disease mentions identified using named-entity recognition (NER) tools.
Figure 3. HITs corresponding to the drug label in Figure 2.
Figure 4. Screenshot of the drug indication micro task on MTurk.
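The abstract describes splitting each drug label into HITs, one per highlighted disease mention, with the worker judging that single mention in context. A minimal sketch of that decomposition follows. It is an illustration only, not the authors' pipeline: the `Hit` class, the `build_hits` and `highlight` helpers, the `[[ ]]` markers, and the drug "DrugX" with its label text are all hypothetical, and the disease spans are assumed to come from an upstream NER step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One microtask: a single disease mention judged in its label context (hypothetical structure)."""
    label_id: str
    drug: str
    disease: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the indication text shown to the worker


def build_hits(label_id, drug, text, disease_spans):
    """Create one HIT per (start, end) disease span found in the label text."""
    return [
        Hit(label_id, drug, text[start:end], start, end, text)
        for start, end in disease_spans
    ]


def highlight(hit, open_mark="[[", close_mark="]]"):
    """Return the label text with only this HIT's disease mention marked."""
    return (
        hit.text[:hit.start]
        + open_mark + hit.text[hit.start:hit.end] + close_mark
        + hit.text[hit.end:]
    )


# Toy label text (made up for illustration); spans are located by string search
# here, standing in for the output of an NER tool.
text = "DrugX is indicated for hypertension. Use caution in patients with asthma."
spans = []
for term in ("hypertension", "asthma"):
    s = text.index(term)
    spans.append((s, s + len(term)))

hits = build_hits("label-001", "DrugX", text, spans)
for hit in hits:
    print(hit.drug, "|", highlight(hit))
```

Each printed line corresponds to one question a worker would see: the same label text, with a different disease mention marked each time.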
Examples of non-indication disease mentions
| Category | Example |
|---|---|
| Characteristic or risk factor | |
| Side effect | A physician considering |
| Contraindication | |
| Unrelated | |
| Not a disease | |
Figure 5. Distribution of gold answers across control items.
Performance on control items
| Method | Number of HITs | Number of Judgments | Number of Turkers | Yes/no accuracy (%) | Six-way accuracy (%) |
|---|---|---|---|---|---|
| Judgment wise | 450 | 3470 | 64 | 90.95 | 83.22 |
| Turker wise | 450 | 3470 | 64 | 88.39 | 81.14 |
| 50+ turker wise | 450 | 2953 | 26 | 91.54 | 83.98 |
| 100+ turker wise | 450 | 2252 | 15 | 90.21 | 82.83 |
| Most prolific turker | 341 | 341 | 1 | 93.25 | 85.63 |
| EM | 450 | 3417 | 64 | 95.78 | 88.44 |
| Majority voting | 450 | 3417 | 64 | 96.00 | 88.66 |
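The majority-voting row aggregates the individual judgments collected for each HIT into one answer and scores that answer against the gold standard. The sketch below shows one plausible way to do this for yes/no answers. The function names, the tuple layout of a judgment, the tie-breaking rule, and the toy data are assumptions made for illustration, and the record does not describe how the study handled ties.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (hit_id, worker_id, answer) triples into one answer per HIT."""
    by_hit = defaultdict(list)
    for hit_id, _worker_id, answer in judgments:
        by_hit[hit_id].append(answer)

    aggregated = {}
    for hit_id, answers in by_hit.items():
        counts = Counter(answers)
        top = max(counts.values())
        # Assumption: ties are broken alphabetically so the result is deterministic.
        winners = sorted(a for a, c in counts.items() if c == top)
        aggregated[hit_id] = winners[0]
    return aggregated


def accuracy(aggregated, gold):
    """Fraction of gold-standard HITs whose aggregated answer matches the gold answer."""
    scored = [h for h in gold if h in aggregated]
    if not scored:
        return 0.0
    return sum(aggregated[h] == gold[h] for h in scored) / len(scored)


# Toy judgments (made up): three workers answer three control HITs.
judgments = [
    ("hit1", "w1", "yes"), ("hit1", "w2", "yes"), ("hit1", "w3", "no"),
    ("hit2", "w1", "no"),  ("hit2", "w2", "no"),  ("hit2", "w3", "yes"),
    ("hit3", "w1", "yes"), ("hit3", "w2", "no"),  ("hit3", "w3", "no"),
]
gold = {"hit1": "yes", "hit2": "no", "hit3": "yes"}

aggregated = majority_vote(judgments)
print(aggregated)
print(f"accuracy on control HITs: {accuracy(aggregated, gold):.2%}")
```

The same aggregation works for the six-way answers by passing category labels instead of yes/no. The EM row would need a model of per-worker reliability, which this sketch does not attempt.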