Danny Mitry, Tunde Peto, Shabina Hayat, James E Morgan, Kay-Tee Khaw, Paul J Foster.
Abstract
AIM: Crowdsourcing is the process of outsourcing numerous tasks to many untrained individuals. Our aim was to assess the performance and repeatability of crowdsourcing for the classification of retinal fundus photography.
Year: 2013 PMID: 23990935 PMCID: PMC3749186 DOI: 10.1371/journal.pone.0071154
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Baseline characteristics of KW participation by study design for trials 1 and 2.
| Trial 1 | 0.03c | 0.05c | 0.03c_500_90% | 0.03c_5000_99% |
| Number of different KWs | 152 | 127 | 39 | 61 |
| Mean (SD) number of HITs per KW | 13(18) | 15(20) | 51(96) | 26(16) |
| Mean (SD) time on each HIT (secs) | 78(109) | 62(76) | 63(71) | 66(90) |
| Time to overall completion | <1 day | <1 day | 1–2 days | 15 days |
| Trial 2 | 0.03_20 | 0.05_20 | 0.03_500_90% | 0.03_5000_99 |
| Number of different KWs | 69 | 72 | 56 | 46 |
| Mean (SD) number of HITs per KW | 37(18) | 35(19) | 25(15) | 24(14) |
| Mean (SD) time on each HIT (secs) | 63(83) | 73(105) | 79(102) | 58(80) |
| Time to overall completion | <1 day | <1 day | 2–3 days | 7 days |
(0.03c = study design 1; 0.05c = study design 2; 0.03c_500_90% = study design 3; 0.03c_5000_99% = study design 4).
The proportion correctly identified by severity of abnormality as well as the sensitivity, specificity and area under the ROC curve (AUC) for each study design in trials 1 and 2 by classification difficulty.
| Trial 1 | Trial 2 | ||||||||
| Proportion correctly identified | Proportion correctly identified | ||||||||
| 0.03c | 0.05c | 0.03c_500_90% | 0.03c_5000_99% | 0.03c | 0.05c | 0.03c_500_90% | 0.03c_5000_99% | ||
| 57% | 64% | 55% | 72% | 67% | 69% | 67% | 79% | |
| 77% | 75% | 87% | 64% | 87% | 86% | 89% | 52% | |
| 96% | 92% | 90% | 99% | 99% | 96% | 98% | 99% | |
| ||||||||
| 74% | 68% | 85% | 64% | 87% | 86% | 89% | 52% | |
| 74% | 68% | 85% | 64% | 87% | 86% | 89% | 52% | |
| 74% | 68% | 85% | 64% | 87% | 86% | 89% | 52% | |
| ||||||||
| 61% | 70% | 61% | 72% | 67% | 69% | 67% | 79% | |
| 98% | 99% | 98% | 99% | 99% | 96% | 98% | 98% | |
| 66% | 74% | 66% | 76% | 72% | 73% | 72% | 82% | |
| AUC | ||||||||
| 0.678(0.656–0.700) | 0.692(0.669–0.715) | 0.731(0.711–0.751) | 0.681(0.658–0.704) | 0.771(0.752–0.790) | 0.777(0.758–0.796) | 0.784(0.766–0.802) | 0.656(0.634–0.680) | |
| 0.871(0.850–0.889) | 0.833(0.813–0.854) | 0.915(0.895–0.930) | 0.819(0.799–0.839) | 0.929(0.913–0.944) | 0.91(0.891–0.929) | 0.938(0.922–0.953) | 0.754(0.732–0.776) | |
| 0.704(0.683–0.724) | 0.712(0.692–0.732) | 0.757(0.738–0.776) | 0.701(0.680–0.721) | 0.794(0.776–0.811) | 0.796(0.778–0.814) | 0.806(0.789–0.823) | 0.671(0.648–0.693) | |
(0.03c = study design 1; 0.05c = study design 2; 0.03c_500_90% = study design 3; 0.03c_5000_99% = study design 4).
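The measures reported in the table above can be illustrated with a minimal sketch. It assumes each image has a reference label (abnormal or normal) and a score between 0 and 1, for example the fraction of KWs who called the image abnormal. The function names, the 0.5 cut-off and the toy data are hypothetical and are not taken from the study; the AUC is computed here as the pairwise (Mann-Whitney) probability that an abnormal image scores higher than a normal one, which may differ from the software the authors used.

```python
def sensitivity_specificity(truth, predicted):
    """truth and predicted are sequences of booleans (True = abnormal)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # abnormal, called abnormal
    fn = sum(1 for t, p in pairs if t and not p)      # abnormal, called normal
    tn = sum(1 for t, p in pairs if not t and not p)  # normal, called normal
    fp = sum(1 for t, p in pairs if not t and p)      # normal, called abnormal
    return tp / (tp + fn), tn / (tn + fp)


def auc(truth, scores):
    """Pairwise AUC: chance that an abnormal image outscores a normal one.

    Ties between an abnormal and a normal image count as half a win.
    """
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 abnormal and 4 normal images.
truth = [True, True, True, True, False, False, False, False]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5, 0.1, 0.2]

# Assumed cut-off: an image is called abnormal if its score exceeds 0.5.
predicted = [s > 0.5 for s in scores]

sens, spec = sensitivity_specificity(truth, predicted)
area = auc(truth, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarises performance across all cut-offs, which is why the two kinds of measure can rank study designs differently.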
Figure 1. Comparative graphical illustration of the AUC by study design: all classifications (normal versus abnormal), Trial 1 (A) and Trial 2 (D); easy classifications (normal versus severely abnormal), Trial 1 (B) and Trial 2 (E); difficult classifications (normal versus mildly abnormal), Trial 1 (C) and Trial 2 (F).
The percentage of HITs correctly classified by the majority (>50%) of KWs, with the range of the percentage of correct “votes” for each image category in brackets.
| Trial 1 | 0.03c | 0.05c | 0.03c_500_90% | 0.03c_5000_99% |
| Normal (N = 30) | 90%(25–95) | 87%(30–90) | 97%(50–100) | 90%(30–90) |
| Mildly abnormal (N = 60) | 58%(25–95) | 83%(25–100) | 63%(20–100) | 80%(35–100) |
| Severely abnormal (N = 10) | 100%(90–100) | 100%(90–100) | 100%(90–100) | 100%(95–100) |
| Any abnormality (N = 70) | 64%(25–100) | 86%(25–100) | 69%(20–100) | 83%(35–100) |
| Trial 2 | 0.03c | 0.05c | 0.03c_500_90% | 0.03c_5000_99% |
| Normal (N = 30) | 97%(50–100) | 97%(40–100) | 97%(45–100) | 50%(30–75) |
| Mildly abnormal (N = 60) | 68%(10–100) | 85%(20–100) | 70%(15–100) | 96%(45–100) |
| Severely abnormal (N = 10) | 100%(95–100) | 100%(95–100) | 100%(95–100) | 100%(95–100) |
| Any abnormality (N = 70) | 80%(10–100) | 87%(20–100) | 74%(15–100) | 97%(45–100) |
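The majority-vote summary in the table above can be sketched as follows. This is a minimal illustration under assumed inputs: each image is represented as a reference label plus the list of labels submitted by individual KWs. The function names and the toy votes are hypothetical and do not reproduce the study data.

```python
def share_correct(truth_label, votes):
    """Percentage of individual votes that match the reference label."""
    return 100.0 * sum(1 for v in votes if v == truth_label) / len(votes)


def percent_correct_by_majority(images):
    """Percentage of images where more than 50% of votes are correct."""
    hits = sum(1 for truth, votes in images if share_correct(truth, votes) > 50.0)
    return 100.0 * hits / len(images)


def vote_range(images):
    """Lowest and highest per-image percentage of correct votes."""
    shares = [share_correct(truth, votes) for truth, votes in images]
    return min(shares), max(shares)


# Hypothetical toy data: four abnormal images, five votes each.
images = [
    ("abnormal", ["abnormal", "abnormal", "abnormal", "abnormal", "normal"]),
    ("abnormal", ["abnormal", "normal", "normal", "normal", "abnormal"]),
    ("abnormal", ["abnormal", "abnormal", "abnormal", "abnormal", "abnormal"]),
    ("abnormal", ["abnormal", "abnormal", "abnormal", "normal", "normal"]),
]

majority_pct = percent_correct_by_majority(images)
low, high = vote_range(images)
print(f"{majority_pct:.0f}%({low:.0f}-{high:.0f})")
```

The printed string follows the layout of the table cells: the majority-vote percentage first, then the range of per-image correct vote shares in brackets.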
Figure 2. The AUC and associated 95%CI for trial 1 (0.03c) as a function of the number of KW gradings per image.
The AUC increases with the number of KW gradings, peaking at 16 individual gradings per image. A similar curve was obtained for all study designs in both trials, although the optimal number of KWs needed to reach the peak AUC varied.
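One way to picture this relationship is to recompute the AUC while using only the first k gradings of each image. The sketch below makes several assumptions that are not stated in this section: gradings are coded 1 (called abnormal) or 0 (called normal), the image score is the mean of the gradings used, and the first k gradings are taken rather than a random subsample. All names and data are hypothetical.

```python
def auc(truth, scores):
    """Pairwise AUC; ties between an abnormal and a normal image count 0.5."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_number_of_gradings(images, max_k):
    """Return {k: AUC} using only the first k gradings of every image."""
    truth = [is_abnormal for is_abnormal, _ in images]
    result = {}
    for k in range(1, max_k + 1):
        # Score each image by the fraction of its first k gradings
        # that called it abnormal.
        scores = [sum(gradings[:k]) / k for _, gradings in images]
        result[k] = auc(truth, scores)
    return result


# Hypothetical toy data: (is_abnormal, gradings), 1 = graded abnormal.
images = [
    (True, [1, 0, 1, 1]),
    (True, [0, 1, 1, 1]),
    (True, [1, 1, 0, 1]),
    (False, [0, 1, 0, 0]),
    (False, [1, 0, 0, 0]),
    (False, [0, 0, 1, 0]),
]

auc_by_k = auc_by_number_of_gradings(images, max_k=4)
for k, value in auc_by_k.items():
    print(k, round(value, 3))
```

In this toy example individual gradings are noisy, so averaging more of them separates the two groups better and the AUC rises before it levels off.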