Byron C Wallace, Anna Noel-Storr, Iain J Marshall, Aaron M Cohen, Neil R Smalheiser, James Thomas.
Abstract
OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML.
Keywords: crowdsourcing; evidence-based medicine; human computation; machine learning; natural language processing
Year: 2017 PMID: 28541493 PMCID: PMC5975623 DOI: 10.1093/jamia/ocx053
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Left: Receiver operating characteristic curve showing the performance of our RCT classifier, trained on a subset of the Embase dataset. Right: Receiver operating characteristic curve showing the performance of our pretrained RCT classifier on the entire Embase dataset.
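Figure 1 reports classifier performance as receiver operating characteristic (ROC) curves. As a rough illustration of how such a curve and the area under it are obtained from predicted probabilities, here is a minimal pure-Python sketch. The function names `roc_points` and `auc` and the toy scores and labels are our own illustrations, not the authors' code or data; tied scores are grouped so that each distinct score contributes a single point.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, sweeping the
    decision threshold from the highest score down to the lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example sharing this score before emitting a point,
        # so tied scores move the curve diagonally rather than in steps.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example (not data from the paper): 1 = RCT, 0 = non-RCT.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))  # 0.8125
```

The toy AUC of 0.8125 equals the fraction of (RCT, non-RCT) pairs in which the RCT receives the higher score (13 of 16), which is the usual probabilistic reading of the area under an ROC curve.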
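Distribution of “ground truth” RCTs and non-RCTs within ranges of classifier confidence (N gives the number of abstracts that fall into each range; cumulative recall and percent “screened” accumulate over all ranges from the top row down to and including the current one)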
| Probability | RCT | Non-RCT | N | Cumulative recall | Percent “screened” |
|---|---|---|---|---|---|
| 0.9 to 1.0 | 1511 | 210 | 1721 | 0.633 | 3.6 |
| 0.8 to < 0.9 | 269 | 242 | 511 | 0.746 | 4.7 |
| 0.7 to < 0.8 | 150 | 270 | 420 | 0.809 | 5.6 |
| 0.6 to < 0.7 | 110 | 323 | 433 | 0.855 | 6.5 |
| 0.5 to < 0.6 | 92 | 396 | 488 | 0.893 | 7.5 |
| 0.4 to < 0.5 | 71 | 573 | 644 | 0.923 | 8.9 |
| 0.3 to < 0.4 | 63 | 912 | 975 | 0.950 | 10.9 |
| 0.2 to < 0.3 | 55 | 1635 | 1690 | 0.972 | 14.5 |
| 0.1 to < 0.2 | 47 | 3807 | 3854 | 0.992 | 22.6 |
| <0.1 | 19 | 36 690 | 36 709 | 1 | 100 |
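The two cumulative columns can be recomputed from the per-range counts: cumulative recall is the fraction of all RCTs that fall in the current range or any higher one, and percent “screened” is the share of all abstracts in those ranges. The sketch below does this arithmetic in plain Python; the names `bins` and `cumulative_stats` are our own, and the counts are copied from the table.

```python
# Per-range counts (RCT, non-RCT) copied from the table, ordered from the
# highest classifier probability range down to the lowest.
bins = [
    ("0.9 to 1.0", 1511, 210),
    ("0.8 to < 0.9", 269, 242),
    ("0.7 to < 0.8", 150, 270),
    ("0.6 to < 0.7", 110, 323),
    ("0.5 to < 0.6", 92, 396),
    ("0.4 to < 0.5", 71, 573),
    ("0.3 to < 0.4", 63, 912),
    ("0.2 to < 0.3", 55, 1635),
    ("0.1 to < 0.2", 47, 3807),
    ("< 0.1", 19, 36690),
]


def cumulative_stats(bins):
    """Return (range, cumulative recall, percent screened) for each range."""
    total_rct = sum(rct for _, rct, _ in bins)
    total = sum(rct + non_rct for _, rct, non_rct in bins)
    rows = []
    seen_rct = seen = 0
    for label, rct, non_rct in bins:
        # Screening everything down to this range finds these RCTs...
        seen_rct += rct
        # ...at the cost of reading these abstracts.
        seen += rct + non_rct
        rows.append((label, seen_rct / total_rct, 100 * seen / total))
    return rows


for label, recall, pct in cumulative_stats(bins):
    print(f"{label:>12}  recall={recall:.3f}  screened={pct:.1f}%")
```

The recomputed values match the table to within rounding; for example, the 0.3 to < 0.4 row gives 2266/2387 ≈ 0.949, where the table shows 0.950.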
The Novice, Expert, and Resolver columns give the number of labels acquired from each type of labeler under the manual, hybrid, and fully automated approaches (the automated, classifier-only approach is shown at 2 different thresholds). The Precision and Recall columns give performance with respect to identifying RCTs. The manual measures are marked with asterisks; we assume these are “ground truth” by construction (see text for discussion)
| Approach | Novice | Expert | Resolver | Precision | Recall |
|---|---|---|---|---|---|
| Manual | 29,376 | 97,512 | 1,895 | 1.0* | 1.0* |
| Hybrid | 3,884 | 12,218 | 4,175 | 0.99 | 0.96 |
| Classifier-only (threshold = 0.5) | 0 | 0 | 0 | 0.99 | 0.71 |
| Classifier-only (threshold = 0.1) | 0 | 0 | 0 | 0.27 | 0.96 |
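The labeler table treats the manual labels as ground truth and reports precision and recall for identifying RCTs. A minimal sketch of those two measures, and of how lowering a classifier threshold trades precision for recall (as in the two classifier-only rows), is shown below. The scores and labels are toy values, not the paper's data; treating a probability at or above the threshold as an RCT prediction, and returning 0.0 when a denominator is zero, are assumed conventions.

```python
def precision_recall(predicted, truth):
    """Precision and recall for the positive class (True = RCT).

    Returns 0.0 for a measure whose denominator is zero (our convention).
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy classifier probabilities and "ground truth" labels (not from the paper).
scores = [0.97, 0.62, 0.45, 0.30, 0.08, 0.03]
truth = [True, True, False, True, False, False]

for t in (0.5, 0.1):
    # Assumed rule: predict "RCT" when the probability is at least t.
    predicted = [s >= t for s in scores]
    p, r = precision_recall(predicted, truth)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the higher threshold gives precision 1.00 and recall 0.67, while the lower one gives precision 0.75 and recall 1.00, the same direction of trade-off seen between the two classifier-only rows.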
Figure 2. A scatterplot of recall versus (simulated) total expended effort for varying values of the confidence threshold t. As noted in the text, effort is modeled in unit costs: 1 novice screening decision = 1 unit, 1 expert decision = 2 units, and 1 resolver decision = 4 units.
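As a back-of-the-envelope illustration of the unit-cost model in the Figure 2 caption, the sketch below applies those costs to the Novice, Expert, and Resolver label counts from the manual and hybrid rows of the labeler table. Applying the Figure 2 costs to those counts is our own assumption; the paper's simulated effort totals may be computed differently. The names `UNIT_COST`, `LABEL_COUNTS`, and `effort` are hypothetical.

```python
# Unit costs from the Figure 2 caption.
UNIT_COST = {"novice": 1, "expert": 2, "resolver": 4}

# Label counts from the manual and hybrid rows of the labeler table.
LABEL_COUNTS = {
    "manual": {"novice": 29_376, "expert": 97_512, "resolver": 1_895},
    "hybrid": {"novice": 3_884, "expert": 12_218, "resolver": 4_175},
}


def effort(counts, unit_cost=UNIT_COST):
    """Total effort as the cost-weighted sum of screening decisions."""
    return sum(unit_cost[role] * n for role, n in counts.items())


manual = effort(LABEL_COUNTS["manual"])
hybrid = effort(LABEL_COUNTS["hybrid"])
print(manual, hybrid, f"{hybrid / manual:.1%}")  # 231980 45020 19.4%
```

Under this assumption the hybrid approach costs roughly a fifth of the fully manual effort; the classifier-only rows involve no human labels and so cost 0 units in this model.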