John D Burger, Emily Doughty, Ritu Khare, Chih-Hsuan Wei, Rajashree Mishra, John Aberdeen, David Tresner-Kirsch, Ben Wellner, Maricel G Kann, Zhiyong Lu, Lynette Hirschman.
Abstract
BACKGROUND: This article describes the capture of biological information using a hybrid approach that combines natural language processing, to extract biological entities, with crowdsourcing, in which annotators recruited via Amazon Mechanical Turk judge the correctness of candidate biological relations. These techniques were applied to extract gene-mutation relations from biomedical abstracts, with the goal of supporting production-scale capture of gene-mutation-disease findings as an open-source resource for personalized medicine.
Year: 2014 PMID: 25246425 PMCID: PMC4170591 DOI: 10.1093/database/bau094
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. HIT design for the gene–mutation task.
Figure 2. Schematic framework for hybrid curation.
Corpus comparison for Experiment 1 vs. Experiment 2
| Experiment | #1 | #2 |
|---|---|---|
| Number of Abstracts | 250 | 275 |
| Number of HITs | 1097 | 1078 |
| Number of genes (gold standard) | 279 | 246 |
| Number of mutations (gold standard) | 586 | 452 |
| Number of gene–mutation pairs (gold standard) | 586 | 444 |
| Gene–mutation pairs per abstract | 2.3 | 1.6 |
Dimensions of possible evaluation
| Element of analysis | Displayed surface text (Turker view) | Concepts (Database view) |
|---|---|---|
| | 1. Gene spans; mutation spans | 2. EntrezGene IDs; mutation triple |
| | 3. Judgments on entity spans in context | 4. Tuple of 〈gene ID, mutation triple〉 |
Precision and recall scores for gene and mutation identification
| Element of analysis | Gold standard | Candidates | Correct | Precision | Recall |
|---|---|---|---|---|---|
| Genes | 246 | 582 | 222 | 0.381 | 0.902 |
| Mutations | 452 | 497 | 395 | 0.795 | 0.874 |
| Gene–mutation pairs | 444 | 1078 | 374 | 0.347 | 0.842 |
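The precision and recall columns follow directly from the counts in the table: precision is Correct divided by Candidates, and recall is Correct divided by Gold standard. A minimal Python sketch that recomputes them is shown here; the function and variable names are illustrative and do not come from the article.

```python
def precision_recall(gold, candidates, correct):
    """Return (precision, recall) from raw counts."""
    precision = correct / candidates  # share of proposed items that are right
    recall = correct / gold           # share of gold-standard items that were found
    return precision, recall


# Counts copied from the table: (gold standard, candidates, correct).
counts = {
    "Genes": (246, 582, 222),
    "Mutations": (452, 497, 395),
    "Gene-mutation pairs": (444, 1078, 374),
}

scores = {name: precision_recall(*row) for name, row in counts.items()}

for name, (p, r) in scores.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

The low precision for gene–mutation pairs (0.347) paired with high recall (0.842) fits a design in which automatic extraction over-generates candidates and leaves the filtering to human judges.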
Concept relation accuracy for the initial experiment vs. the current study
| Individual Turker results | Experiment 1 (%) | Experiment 2 (%) |
|---|---|---|
| Baseline system (all ‘NO’) | 58.7 | 65.6 |
| Average response | 75.5 | 73.7 |
| Average 10+ Turker | 70.7 | 68.1 |
| Average 100+ Turker | 76.0 | 75.8 |
| Best Turker | 90.5 | 88.8 |
| Naïve Bayes aggregate | 84.5 | 85.3 |
Concept accuracy using Naïve Bayes aggregation, varying number of Turkers
| Number of Turkers | 5 | 4 | 3 | 2 | 1 | Dynamic |
|---|---|---|---|---|---|---|
| Concept accuracy | 85.3% | 84.2% | 82.7% | 81.6% | 74.3% | 86.0% |
| Cost | $1.89 | $1.51 | $1.13 | $0.75 | $0.38 | $0.97 |
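This record does not spell out the aggregation model, so the following is only a generic sketch of how a Naïve Bayes vote combiner can work, not the authors' implementation. It assumes each Turker has an estimated rate of answering YES when the true answer is YES and when it is NO, and that Turkers err independently given the truth. All names, rates, and the default prior below are hypothetical.

```python
import math


def naive_bayes_aggregate(votes, turker_rates, prior_yes=0.5):
    """Return the posterior probability that the true label is YES.

    votes:        dict mapping turker id -> True (YES) or False (NO)
    turker_rates: dict mapping turker id -> (P(says YES | truth YES),
                                             P(says YES | truth NO))
    prior_yes:    assumed prior probability that a candidate is correct
    """
    # Work in log-odds so that independent votes simply add up.
    log_odds = math.log(prior_yes / (1.0 - prior_yes))
    for turker, said_yes in votes.items():
        yes_given_yes, yes_given_no = turker_rates[turker]
        if said_yes:
            log_odds += math.log(yes_given_yes / yes_given_no)
        else:
            log_odds += math.log((1.0 - yes_given_yes) / (1.0 - yes_given_no))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical Turkers: "A" is reliable, "B" is only slightly better than chance.
rates = {"A": (0.9, 0.1), "B": (0.6, 0.4)}
posterior = naive_bayes_aggregate({"A": True, "B": False}, rates)
print(f"P(YES) = {posterior:.3f}")
```

Here the reliable Turker's YES outweighs the weaker Turker's NO, which is the behaviour that distinguishes this kind of weighting from a simple majority vote.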
Surface and concept aggregate performance
| | Surface level (Quadrant 3) | Concept level (Quadrant 4) |
|---|---|---|
| Accuracy | 90.6 | 85.3 |
| Precision | 83.6 | 71.9 |
| HIT recall | 95.1 | 94.3 |
| End-to-end recall | 91.7 | 78.8 |
Pairwise agreement and Kappa for each pair of most prolific Turkers
| Turker pair | A–B | A–C | B–C |
|---|---|---|---|
| Pairwise agreement | 0.630 | 0.680 | 0.731 |
| Kappa | 0.263 | 0.378 | 0.477 |
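Kappa is lower than raw agreement because it discounts the agreement two annotators would reach by chance. The sketch below computes both quantities for a pair of annotators, assuming the statistic is Cohen's kappa (the table does not say which variant was used). The label lists are toy data, not the study's judgments.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1.0 - expected)


# Toy judgments on four HITs (hypothetical).
turker_a = ["YES", "YES", "NO", "NO"]
turker_b = ["YES", "NO", "NO", "NO"]
observed, kappa = agreement_and_kappa(turker_a, turker_b)
print(f"agreement={observed:.3f} kappa={kappa:.3f}")
```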
Turker aggregate judgment vs. gold standard
| Naïve Bayes aggregate score (all HITs) | GoldStd YES | GoldStd NO | Total | Percent | Concept-level evaluation |
|---|---|---|---|---|---|
| Turker YES | 350 | 137 | 487 | 71.9 | Precision |
| Turker NO | 21 | 570 | 591 | 94.3 | HIT recall |
| Turker Total | 371 | 707 | 1078 | 85.3 | Accuracy |
| TOTAL GOLD | 444 | | | 78.8 | E2E recall |
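The four percentages in this table can be reproduced from its counts. The sketch below does so in Python; the variable names are illustrative. HIT recall is measured against the 371 gold-standard YES pairs that appeared in HITs, whereas end-to-end (E2E) recall is measured against all 444 gold-standard pairs. The difference presumably reflects gold pairs that never reached Turkers as candidates.

```python
# Counts copied from the table (Naive Bayes aggregate vs. gold standard).
turker_yes_gold_yes = 350  # true positives
turker_yes_gold_no = 137   # false positives
turker_no_gold_yes = 21    # false negatives among HITs
turker_no_gold_no = 570    # true negatives
total_gold_pairs = 444     # all gold-standard gene-mutation pairs

total_hits = (turker_yes_gold_yes + turker_yes_gold_no
              + turker_no_gold_yes + turker_no_gold_no)

precision = turker_yes_gold_yes / (turker_yes_gold_yes + turker_yes_gold_no)
hit_recall = turker_yes_gold_yes / (turker_yes_gold_yes + turker_no_gold_yes)
accuracy = (turker_yes_gold_yes + turker_no_gold_no) / total_hits
e2e_recall = turker_yes_gold_yes / total_gold_pairs

# Accuracy of a baseline that answers NO to every HIT.
all_no_baseline = (turker_yes_gold_no + turker_no_gold_no) / total_hits

for name, value in [("Precision", precision), ("HIT recall", hit_recall),
                    ("Accuracy", accuracy), ("E2E recall", e2e_recall),
                    ("All-NO baseline", all_no_baseline)]:
    print(f"{name}: {100 * value:.1f}%")
```

The all-NO baseline computed here, 65.6%, matches the Experiment 2 baseline reported in the concept relation accuracy table.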
Figure 3. Concept accuracy for static and dynamic Turker pools, removing mutations with nonlocal positions.