| Literature DB >> 30066655 |
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Yi Guo, Zhe He, Xi Yang, Hansi Zhang, Jiang Bian.
Abstract
BACKGROUND: There is strong scientific evidence linking obesity and overweight to the risk of various cancers and to cancer survivorship. Nevertheless, the existing online information about the relationship between obesity and cancer is poorly organized, not evidence-based, of poor quality, and confusing to health information consumers. A formal knowledge representation such as a Semantic Web knowledge base (KB) can help better organize and deliver quality health information. We previously presented OC-2-KB (Obesity and Cancer to Knowledge Base), a software pipeline that can automatically build an obesity and cancer KB from scientific literature. In this work, we investigated crowdsourcing strategies to increase the number of ground truth annotations and improve the quality of the KB.
Keywords: Biomedical named-entity recognition; Cancer; Crowdsourcing; Information extraction; Obesity; Relation extraction; Semantic web knowledge base
Year: 2018 PMID: 30066655 PMCID: PMC6069686 DOI: 10.1186/s12911-018-0635-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 An overview of the new OC-2-KB information extraction system with crowdsourcing feedback. The offline module creates dictionaries of domain-relevant entities and predicates. The online module extracts facts from scientific literature to construct an obesity and cancer knowledge base
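The online module's output is a set of subject-predicate-object triples. As a rough illustration of that data shape (the entity and predicate strings here are made up for the example, not taken from the actual KB), a minimal sketch:

```python
# Hypothetical sketch of the triple representation a pipeline like
# OC-2-KB produces; the specific facts below are illustrative only.
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Facts as they might be extracted from a sentence such as
# "Obesity increases the risk of endometrial cancer."
triples = [
    Triple("obesity", "increases", "risk of endometrial cancer"),
    Triple("interleukin 6", "is associated with", "obesity"),
]

for t in triples:
    print(f"{t.subject} --{t.predicate}--> {t.object}")
```

Each triple later becomes one RDF statement in the Semantic Web KB, which is what makes SPARQL queries like those in Figs. 7-9 possible.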
Fig. 2 An example question of the crowdsourcing task. The terms in blue are biomedical entities, and terms in purple are the predicates describing the relations
Configurations of the pilot crowdsourcing study
| Category | Item | Value |
|---|---|---|
| Data | Number of assignments per HIT | 400 |
| | Reward per assignment | $0.15 |
| Estimated cost | Total reward | $60.00 (= 400 × $0.15) |
| | Fees to Mechanical Turk | $24.00 (= 400 × $0.06) |
| | Total cost | $84.00 |
| Actual cost | Assignments done and approved | 193 |
| | Total reward | $28.95 |
| | Fees to Mechanical Turk | $11.58 |
| | Total cost | $40.53 |
The time spent by the workers on the HIT
| Time (s) | ≥ 0 s | ≥ 120 s | ≥ 300 s |
|---|---|---|---|
| Minimum | 38 | 127 | 302 |
| Maximum | 889 | 889 | 889 |
| Average | 358.47 | 372.62 | 466.59 |
| Standard deviation | 176.13 | 167.93 | 140.73 |
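Statistics like these can be recomputed for any minimum-time cutoff by filtering the assignment times first. A sketch using Python's standard library (the `times` list here is invented for illustration; the real study had 193 approved assignments):

```python
import statistics

# Made-up per-assignment completion times in seconds, for illustration.
times = [38, 140, 310, 450, 889, 95, 305, 620]

def summarize(times, min_seconds=0):
    """Keep only assignments that took at least min_seconds, then summarize."""
    kept = [t for t in times if t >= min_seconds]
    return {
        "n": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": round(statistics.mean(kept), 2),
        "stdev": round(statistics.stdev(kept), 2),
    }

print(summarize(times, 0))
print(summarize(times, 300))
```

Raising the cutoff discards hasty submissions, which is why the filtered averages in the table above shift upward while the spread narrows.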
Worker performance (F-measures) on the 10 sentences of the HIT
| Sentences | Time ≥ 0 s | Time ≥ 120 s | Time ≥ 300 s |
|---|---|---|---|
| Sentence 1 | 59.33% | 60.24% | 65.43% |
| Sentence 2 | 73.63% | 75.03% | 79.62% |
| Sentence 3 | 63.86% | 65.30% | 67.10% |
| Sentence 4 | 84.97% | 84.78% | 87.83% |
| Sentence 5 | 79.79% | 80.62% | 83.04% |
| Sentence 6 | 98.45% | 98.37% | 100.00% |
| Sentence 7 | 72.99% | 74.02% | 76.82% |
| Sentence 8 | 70.99% | 72.67% | 76.99% |
| Sentence 9 | 43.10% | 43.30% | 47.31% |
| Sentence 10 | 72.11% | 73.54% | 77.60% |
Overall worker performance in the pilot study
| Time spent | Number of workers | F-measure |
|---|---|---|
| ≥ 0 s (0 s per sentence) | 193 | 71.92% |
| ≥ 120 s (12 s per sentence) | 184 | 72.79% |
| ≥ 300 s (30 s per sentence) | 115 | 76.17% |
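The F-measures reported throughout are presumably the standard F1 score, the harmonic mean of precision and recall. A minimal computation (the counts passed in below are illustrative, not taken from the study):

```python
def f_measure(tp, fp, fn):
    """Standard F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 80 correct annotations, 20 spurious, 30 missed.
print(round(f_measure(tp=80, fp=20, fn=30) * 100, 2))
```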
Configuration and price information of the final study
| Category | Item | Value |
|---|---|---|
| Data | Number of HITs | 27 |
| | Number of assignments per HIT | 5 |
| | Reward per assignment | $0.50 |
| Estimated cost | Total reward per HIT | $2.50 (= 5 × $0.50) |
| | Fees to Mechanical Turk per HIT | $0.50 (= 5 × $0.10) |
| | Total cost per HIT | $3.00 (= $2.50 + $0.50) |
| | Total cost for 27 HITs | $81.00 (= 27 × $3.00) |
| Actual cost | Assignments done and approved | 135 (= 27 HITs × 5 assignments) |
| | Total cost | $81.00 |
The number of HITs completed by the workers
| Number of HITs | Number of workers that completed |
|---|---|
| 1 HIT | 89 workers |
| 2 HITs | 5 workers |
| 3 HITs | 1 worker |
| 4 HITs | 3 workers |
| 5 HITs | 0 workers |
| 6 HITs | 1 worker |
| 7 HITs | 1 worker |
| 8 HITs | 1 worker |
Time spent by the workers over the 27 HITs
| Time (s) | ≥ 0 s | ≥ 300 s | ≥ 750 s |
|---|---|---|---|
| Minimum | 43 | 332 | 761 |
| Maximum | 2095 | 2095 | 2095 |
| Average | 1,001.21 | 1,255.35 | 1,421.11 |
| Standard deviation | 638.27 | 495.58 | 382.54 |
| Number of assignments completed* | 5734 | 4413 | 3770 |
*The number of assignments completed within the time range. Note that there were 27 HITs, each completed by 5 workers, for a total of 135 assignments (27 HITs × 5 workers)
Number of triples validated 5, at least 4, and at least 3 times, varying by the workers’ time spent on the HITs
| Number of times validated | Time ≥ 0 s | Time ≥ 300 s | Time ≥ 750 s |
|---|---|---|---|
| Validated = 5 times | 37 | 19 | 15 |
| Validated ≥ 4 times | 258 | 109 | 68 |
| Validated ≥ 3 times | 918 | 506 | 320 |
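Since each HIT was answered by 5 workers, a triple's quality can be graded by how many of the 5 marked it correct; the table above counts triples at the unanimous (= 5), near-unanimous (≥ 4), and majority (≥ 3) levels. A sketch of that tally (the vote counts below are hypothetical):

```python
from collections import Counter

# Hypothetical worker votes: triple id -> number of workers (out of 5)
# who marked the triple as correct. Real counts come from the 27 HITs.
votes = Counter({"t1": 5, "t2": 4, "t3": 3, "t4": 2, "t5": 5})

def validated_at_least(votes, k):
    """Return the triples confirmed by at least k of the 5 workers."""
    return [t for t, n in votes.items() if n >= k]

print(len(validated_at_least(votes, 5)))  # unanimous agreement
print(len(validated_at_least(votes, 3)))  # simple majority of 5 workers
```

Tightening the threshold trades coverage for confidence, exactly the pattern in the table: unanimous agreement yields far fewer, but far more reliable, triples.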
Fig. 3 An example of triples validated through crowdsourcing
Fig. 4 The number of triples created with the baseline OC-2-KB system varying the threshold λ from 0.80 to 1.00
Fig. 5 A Venn diagram comparing the number of triples extracted with the baseline system (B) and validated through crowdsourcing (C), where the number of triples common to B and C is B ∩ C
The number of triples shared by the baseline OC-2-KB system and the crowd-validated set (B ∩ C), and the number of validated triples missed by the baseline OC-2-KB (C − B)
| OC-2-KB threshold λ | B ∩ C | C − B |
|---|---|---|
| 0.80 | 226 | 692 |
| 0.82 | 222 | 696 |
| 0.84 | 216 | 702 |
| 0.86 | 215 | 703 |
| 0.88 | 208 | 710 |
| 0.90 | 198 | 720 |
| 0.92 | 185 | 733 |
| 0.94 | 171 | 749 |
| 0.96 | 154 | 764 |
| 0.98 | 119 | 799 |
| 1.00 | 71 | 847 |
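The comparison behind this table is plain set arithmetic over the two triple sets. A sketch (the triples below are made up for illustration; note that B ∩ C and C − B partition C, which is why each row above sums to 918, the number of triples validated at least 3 times):

```python
# Hypothetical triple sets: B = baseline extractions at some threshold λ,
# C = crowd-validated triples. Contents are illustrative only.
B = {("obesity", "increases", "cancer risk"),
     ("bmi", "predicts", "survival")}
C = {("obesity", "increases", "cancer risk"),
     ("leptin", "is linked to", "obesity")}

common = B & C   # B ∩ C: validated triples the baseline also found
missed = C - B   # C − B: validated triples the baseline missed
print(len(common), len(missed))
```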
F-measures of the retrained random forest models varying the crowdsourcing parameters
| Number of times validated | Time ≥ 0 s | Time ≥ 300 s | Time ≥ 750 s |
|---|---|---|---|
| Validated = 5 times | 98.5% | 98.8% | 99.8% |
| Validated ≥ 4 times | 90.3% | 97.0% | 97.2% |
| Validated ≥ 3 times | 79.1% | 85.1% | 91.4% |
Fig. 6 The receiver operating characteristic (ROC) curves of the retrained relation detection models
Fig. 7 A SPARQL query for extracting all the entities related to “breast cancer risk” in (1). The results from the baseline OC-2-KB system are shown in (2), and the results from the retrained system are in (3)
Fig. 8 A SPARQL query for extracting entities related to “interleukin 6”
Fig. 9 Results of all predicates existing between “progesterone levels” and “risk of endometrial cancer”
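A query like the one in Fig. 9 asks for every predicate linking two fixed entities; over an in-memory list of triples the same graph pattern reduces to a filter. A sketch (the triples are illustrative, not the KB's actual contents):

```python
# Illustrative triples; in the real KB this pattern would be a SPARQL
# query with fixed subject and object and a variable predicate.
triples = [
    ("progesterone levels", "reduce", "risk of endometrial cancer"),
    ("progesterone levels", "are associated with", "risk of endometrial cancer"),
    ("obesity", "increases", "risk of endometrial cancer"),
]

def predicates_between(triples, subj, obj):
    """All predicates p such that the triple (subj, p, obj) exists."""
    return [p for s, p, o in triples if s == subj and o == obj]

print(predicates_between(triples, "progesterone levels",
                         "risk of endometrial cancer"))
```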
Fig. 10 Example of invalid triples associated with “prostate cancer risk in men”