Todd Lingren, Louise Deleger, Katalin Molnar, Haijun Zhai, Jareen Meinzen-Derr, Megan Kaiser, Laura Stoutenborough, Qi Li, Imre Solti.
Abstract
OBJECTIVE: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias, if pre-annotation is utilized.
Keywords: Information Extraction; Natural Language Processing; Pre-annotation; clinical trial announcements; named entity recognition; UMLS
Year: 2013 PMID: 24001514 PMCID: PMC3994857 DOI: 10.1136/amiajnl-2013-001837
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. UMLS Technology Services SNOMED-CT browser: search for lung cancer.
Figure 2. Sample disease/disorder and sign/symptom entities.
Figure 3. Pre-annotated clinical trial announcement text in Knowtator.
Figure 4. Experiment study design.
CTA pre-annotation experiments
| Document set | Corpus | Number of files | DD entities | SS entities | Annotator with pre-annotated text | Dictionary method | Hypothesis |
|---|---|---|---|---|---|---|---|
| Dictionary | CTA | 500 | 6478 | 484 | N/A | N/A | |
| Control | CTA | 500 | 8117 | 474 | N/A | N/A | |
| 1 | | | | | | Manually generated | Using a dictionary of annotation terms collected by human annotators to pre-annotate CTAs will reduce annotation time without introducing bias |
| 1.1 | CTA | 100 | 719 | 39 | A2 | | |
| 1.2 | CTA | 100 | 603 | 38 | A1 | | |
| 2 | | | | | | Automatically generated | Using an automatically generated dictionary of annotation terms to pre-annotate CTAs will reduce annotation time without introducing bias |
| 2.1 | CTA | 100 | 878 | 102 | A2 | | |
| 2.2 | CTA | 100 | 994 | 76 | A1 | | |
A1, annotator 1; A2, annotator 2; CTA, clinical trial announcements; DD, disease/disorder; SS, sign/symptom.
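
The table names the dictionary method only at a high level. As an illustration of what dictionary-based pre-annotation can look like, here is a minimal sketch that assumes case-insensitive, longest-match lookup of whole terms. The function name `pre_annotate`, the sample terms, and the matching strategy are all hypothetical and are not taken from the study.

```python
import re


def pre_annotate(text, dictionary):
    """Return (start, end, matched_text, label) tuples for dictionary terms.

    `dictionary` maps a term to an entity class such as "DD" or "SS".
    Hypothetical helper for illustration; not the study's implementation.
    """
    if not dictionary:
        return []
    # Longest terms first, so "lung cancer" is preferred over "cancer".
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    labels = {term.lower(): label for term, label in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Made-up example sentence and terms.
sample_terms = {"lung cancer": "DD", "cancer": "DD", "cough": "SS"}
sample_text = "Patients with lung cancer and persistent cough are eligible."
for start, end, matched, label in pre_annotate(sample_text, sample_terms):
    print(start, end, matched, label)
```

Spans produced this way would then be shown to the annotator for correction rather than accepted as final, which is the step whose speed and bias the experiments measure.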
IAA and annotator performance
| Experiment set | IAA (%) | A1 performance (%) | A2 performance (%) |
|---|---|---|---|
| 1.1 | 95.5 | 98.8 | 96.4 |
| 1.2 | 93.4 | 98.2 | 95.2 |
| 2.1 | 93.7 | 97.0 | 96.0 |
| 2.2 | 94.7 | 97.0 | 96.9 |
A1, annotator 1; A2, annotator 2; IAA, inter-annotator agreement.
Overall and per entity time savings
| Experiment set | Overall time, pre-annotated text (hours) | Overall time, non-labeled text (hours) | Overall % saved | Overall average per experiment (%) | Time per entity, pre-annotated text (seconds) | Time per entity, non-labeled text (seconds) | Per-entity % saved | Per-entity average per experiment (%) | p Value |
|---|---|---|---|---|---|---|---|---|---|
| 1.1 | 17.7 | 20.5 | 13.9 | | 45.4 | 52.7 | 13.9 | | <0.01 |
| 1.2 | 14.3 | 17.7 | 19.3 | 16.6 | 34.9 | 43.3 | 19.3 | 16.6 | <0.01 |
| 2.1 | 14 | 17.5 | 20.0 | | 30.5 | 38.2 | 20.0 | | <0.01 |
| 2.2 | 14.25 | 18.2 | 21.5 | 20.8 | 28.7 | 36.6 | 21.5 | 20.8 | <0.01 |
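
The record does not state how the "% Saved" figures were computed. One plausible reading is the relative reduction in time, (non-labeled − pre-annotated) / non-labeled. The sketch below applies that assumed formula to the per-entity times from the table above; because the published times are rounded, the recomputed percentages agree with the table only to within a few tenths of a percentage point.

```python
# Per-entity annotation times in seconds, copied from the table above:
# experiment set -> (pre-annotated text, non-labeled text)
per_entity_seconds = {
    "1.1": (45.4, 52.7),
    "1.2": (34.9, 43.3),
    "2.1": (30.5, 38.2),
    "2.2": (28.7, 36.6),
}


def percent_saved(pre_annotated, non_labeled):
    """Relative time reduction as a percentage (assumed formula)."""
    return 100.0 * (non_labeled - pre_annotated) / non_labeled


saved = {
    name: percent_saved(pre, non)
    for name, (pre, non) in per_entity_seconds.items()
}


def experiment_average(experiment):
    """Mean saving over the sets of one experiment, e.g. '1' -> 1.1 and 1.2."""
    values = [v for name, v in saved.items() if name.startswith(experiment + ".")]
    return sum(values) / len(values)


for name, value in saved.items():
    print(f"set {name}: {value:.1f}% saved")
for experiment in ("1", "2"):
    print(f"experiment {experiment}: {experiment_average(experiment):.1f}% on average")
```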
Statistical significance of experiments
| | CTA vs 500* | 1.1 vs 500 | 1.2 vs 500 | 2.1 vs 500 | 2.2 vs 500 |
|---|---|---|---|---|---|
| A1 vs GS (D) | 0.37 | 0.38 | 0.43 | 0.44 | 0.08 |
| A1 vs GS (S) | 0.42 | 0.3 | 0.98 | ||
| A2 vs GS (D) | 0.36 | 0.14 | 0.01 | 0.22 | 0.11 |
| A2 vs GS (S) | 0.38 | 0.57 | 0.01 | ||
| IAA (D) | 0.34 | 0.96 | 0.24 | 0.54 | 0.95 |
| IAA (S) | 0.06 | 0.38 | 0.73 | ||
| Code_Ent | 0.35 | 0.14 | 0.28 | 0.48 | 0.22 |
| DS_Ent | 0.45 | 0.13 | 0.03 | 0.98 | 0.16 |
| Tokens | 0.4 | 0.06 | 0.03 | 0.11 | 0.15 |
*Control for CTA.
A1, annotator 1; A2, annotator 2; CTA, clinical trial announcements; D, disease/disorder; GS, gold standard; IAA, inter-annotator agreement; S, sign/symptom.
Bold indicates statistical significance at p<0.0001.