Haijun Zhai, Todd Lingren, Louise Deleger, Qi Li, Megan Kaiser, Laura Stoutenborough, Imre Solti.
Abstract
BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard, but traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted on crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based on very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora has never been exceptional when compared to traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally developed corpora.
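The 0.68 agreement cited above is an F-measure, the harmonic mean of precision and recall. As a minimal sketch (function name and values are illustrative, not from the paper's methods):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative only: when precision and recall are both 0.68,
# the F-measure is also 0.68.
print(round(f_measure(0.68, 0.68), 2))  # -> 0.68
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of the two values, which is why the tables below can show high precision alongside a modest F when recall drops.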
Year: 2013 | PMID: 23548263 | PMCID: PMC3636329 | DOI: 10.2196/jmir.2426
Source DB: PubMed | Journal: J Med Internet Res | ISSN: 1438-8871 | Impact factor: 5.428
Figure 1. Example of linkages between medications and their attributes.
CTA corpus statistics.

| Corpus statistics | Count |
| Documents | 3000 |
| Tokens | 635,003 |
| Medication name | 9968 |
| Medication type | 11,789 |
| Date | 16 |
| Dosage | 645 |
| Duration | 644 |
| Form | 482 |
| Frequency | 381 |
| Route | 894 |
| Status change | 598 |
| Strength | 409 |
| Modifier | 5827 |
| All classes | 31,653 |
Figure 2. Snippets of CML, CSS, and JavaScript.
Figure 3. Medication named entity recognition task interface.
Figure 4. Linking task interface.
Figure 5. The distributions of turkers' experience for the medical named-entity task, correction task, and linking task (x-axis denotes number of jobs; y-axis indicates number of turkers).
Results of correction task with 200 units and 1000 judgments (the pre-determined threshold and its corresponding P^a, R^b, and F^c for each column are italicized).

| Simple | | | | Trust | | | | Experience | | | |
| Th^d | P | R | F | Th | P | R | F | Th | P | R | F |
| 0.20 | 0.796 | 0.938 | 0.861 | 0.18 | 0.825 | 0.927 | 0.873 | 0.18 | 0.845 | 0.921 | 0.881 |
| 0.60 | 0.950 | 0.812 | 0.875 | 0.30 | 0.898 | 0.906 | 0.902 | 0.30 | 0.900 | 0.891 | 0.896 |
| 0.80 | 0.972 | 0.732 | 0.835 | 0.36 | 0.908 | 0.887 | 0.897 | 0.36 | 0.933 | 0.867 | 0.899 |

| 0.20 | 0.610 | 0.916 | 0.733 | 0.18 | 0.655 | 0.892 | 0.755 | 0.18 | 0.662 | 0.872 | 0.752 |
| 0.60 | 0.851 | 0.541 | 0.661 | 0.30 | 0.736 | 0.792 | 0.763 | 0.30 | 0.756 | 0.773 | 0.764 |
| 0.80 | 0.945 | 0.382 | 0.544 | 0.36 | 0.776 | 0.744 | 0.760 | 0.36 | 0.783 | 0.684 | 0.730 |

^a precision
^b recall
^c F-measure
^d threshold
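The threshold columns above imply an aggregation step: a crowdsourced annotation is accepted when the (possibly weighted) fraction of agreeing judgments reaches the threshold Th. A minimal sketch of that idea, assuming uniform weights reproduce the "Simple" scheme; any trust- or experience-based weighting shown here is an assumption, not the paper's exact formula:

```python
def aggregate(judgments, weights=None, threshold=0.60):
    """Accept an annotation if weighted agreement >= threshold.

    judgments: list of booleans (did this turker mark the entity?)
    weights:   per-turker weights; defaults to uniform weights,
               i.e. simple (unweighted) agreement.
    """
    if weights is None:
        weights = [1.0] * len(judgments)
    total = sum(weights)
    agree = sum(w for j, w in zip(judgments, weights) if j)
    return agree / total >= threshold

# 3 of 5 turkers agree -> 0.6 agreement, accepted at threshold 0.60
print(aggregate([True, True, True, False, False]))  # -> True
```

Raising the threshold trades recall for precision, which matches the pattern in the tables: the 0.80 rows have the highest precision and the lowest recall.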
Baseline results of medical named entity annotation corresponding to the correction task (the pre-determined threshold and its corresponding P^a, R^b, and F^c for each column are italicized).

| Simple | | | | Trust | | | | Experience | | | |
| Th^d | P | R | F | Th | P | R | F | Th | P | R | F |
| 0.20 | 0.712 | 0.934 | 0.808 | 0.18 | 0.839 | 0.891 | 0.864 | 0.18 | 0.774 | 0.908 | 0.835 |
| 0.60 | 0.909 | 0.788 | 0.844 | 0.30 | 0.870 | 0.887 | 0.878 | 0.30 | 0.881 | 0.876 | 0.879 |
| 0.80 | 0.956 | 0.655 | 0.778 | 0.36 | 0.899 | 0.803 | 0.848 | 0.36 | 0.894 | 0.809 | 0.849 |

| 0.20 | 0.473 | 0.879 | 0.615 | 0.18 | 0.627 | 0.724 | 0.672 | 0.18 | 0.594 | 0.779 | 0.674 |
| 0.60 | 0.737 | 0.519 | 0.609 | 0.30 | 0.670 | 0.700 | 0.685 | 0.30 | 0.681 | 0.664 | 0.673 |
| 0.80 | 0.890 | 0.358 | 0.510 | 0.36 | 0.726 | 0.550 | 0.626 | 0.36 | 0.717 | 0.558 | 0.628 |

^a precision
^b recall
^c F-measure
^d threshold
Figure 6. Improvement chart for the correction task.
Information on turkers participating in the 3 tasks.
| Task name | Participating turkers | Turkers passing the test |
| Medical named-entities | 1144 | 156 |
| Correction | 678 | 86 |
| Linking | 644 | 46 |
Figure 7. The distribution of turkers' F-measure for the medical named-entity task, correction task, and linking task (x-axis denotes F-measure; y-axis indicates number of turkers).
Cost and time of the 3 tasks.

| | Crowdsourcing | | | | In-house | |
| Task name | Total units | Total judgments | Total cost | Total time | Total judgments | Total time |
| Medical named-entities | 3400 | 17,000 | $652.85 | 57 hours | 6800 | 128 hours |
| Correction | 735 | 3675 | $141.13 | 38 hours | N/A | N/A |
| Linking | 3400 | 17,000 | $652.85 | 27 hours | 6800 | 44 hours |
Results of medical named entity annotation (the pre-determined threshold and its corresponding P^a, R^b, and F^c for each column are italicized).

| Simple | | | | Trust | | | | Experience | | | |
| Th^d | P | R | F | Th | P | R | F | Th | P | R | F |
| 0.20 | 0.694 | 0.931 | 0.796 | 0.18 | 0.835 | 0.887 | 0.860 | 0.18 | 0.807 | 0.895 | 0.849 |
| 0.60 | 0.920 | 0.815 | 0.865 | 0.30 | 0.869 | 0.874 | 0.871 | 0.30 | 0.885 | 0.854 | 0.870 |
| 0.80 | 0.955 | 0.696 | 0.805 | 0.36 | 0.916 | 0.819 | 0.865 | 0.36 | 0.910 | 0.820 | 0.863 |

| 0.20 | 0.431 | 0.879 | 0.579 | 0.18 | 0.632 | 0.781 | 0.699 | 0.18 | 0.583 | 0.800 | 0.675 |
| 0.60 | 0.831 | 0.598 | 0.696 | 0.30 | 0.709 | 0.745 | 0.727 | 0.30 | 0.756 | 0.703 | 0.729 |
| 0.80 | 0.911 | 0.396 | 0.552 | 0.36 | 0.819 | 0.608 | 0.698 | 0.36 | 0.816 | 0.614 | 0.700 |

^a precision
^b recall
^c F-measure
^d threshold
Results of linking task (the pre-determined threshold and its corresponding P^a, R^b, and F^c for each column are italicized).

| Simple | | | | Trust | | | | Experience | | | |
| Th^d | P | R | F | Th | P | R | F | Th | P | R | F |
| 0.20 | 0.845 | 0.984 | 0.910 | 0.18 | 0.927 | 0.982 | 0.954 | 0.18 | 0.927 | 0.979 | 0.952 |
| 0.60 | 0.981 | 0.959 | 0.970 | 0.30 | 0.949 | 0.975 | 0.962 | 0.30 | 0.955 | 0.973 | 0.964 |
| 0.80 | 0.990 | 0.925 | 0.956 | 0.36 | 0.975 | 0.967 | 0.971 | 0.36 | 0.977 | 0.965 | 0.971 |

^a precision
^b recall
^c F-measure
^d threshold