Christopher John Brady, Lucy Iluka Mudie, Xueyang Wang, Eliseo Guallar, David Steven Friedman.
Abstract
BACKGROUND: Diabetic retinopathy (DR) is a leading cause of vision loss in working-age individuals worldwide. Although screening is effective and cost-effective, it remains underutilized, and novel methods are needed to increase detection of DR. This clinical validation study compared diagnostic gradings of retinal fundus photographs provided by volunteers on the Amazon Mechanical Turk (AMT) crowdsourcing marketplace with expert-provided gold-standard grading. It also explored whether regression methods could improve the consensus of crowdsourced classifications beyond a simple majority vote (MV).
Keywords: Amazon Mechanical Turk; Rasch analysis; crowdsourcing; diabetic retinopathy
Year: 2017 PMID: 28634154 PMCID: PMC5497070 DOI: 10.2196/jmir.7984
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
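The abstract contrasts a simple majority vote with a weighted consensus of crowdsourced gradings. The following is a minimal sketch of that contrast, not the authors' implementation: the function names, the `weights` dictionary (a hypothetical per-worker weight, for example one derived from an ability estimate), and the default cut-point of 0.5 are all assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# Each grading is (worker_id, label), where 1 = abnormal and 0 = normal.
Grading = Tuple[str, int]


def majority_vote(gradings: List[Grading]) -> int:
    """Return 1 if strictly more than half of the gradings are 'abnormal'."""
    abnormal = sum(label for _, label in gradings)
    return 1 if abnormal * 2 > len(gradings) else 0


def weighted_score(gradings: List[Grading], weights: Dict[str, float]) -> float:
    """Weighted fraction of 'abnormal' votes, in [0, 1].

    `weights` is a hypothetical per-worker weight; workers missing from
    the dictionary default to a weight of 1.0.
    """
    total = sum(weights.get(worker, 1.0) for worker, _ in gradings)
    if total == 0:
        raise ValueError("no gradings, or all weights are zero")
    abnormal = sum(weights.get(worker, 1.0) * label for worker, label in gradings)
    return abnormal / total


def weighted_vote(
    gradings: List[Grading], weights: Dict[str, float], cut_point: float = 0.5
) -> int:
    """Dichotomize the weighted score at an assumed cut-point."""
    return 1 if weighted_score(gradings, weights) > cut_point else 0
```

With three gradings where only one heavily weighted worker calls the image abnormal, the unweighted vote returns normal while the weighted vote can return abnormal, which is the kind of disagreement a weighting scheme is meant to exploit.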
Figure 1. Example color image and simulated red-free retinal photograph created by deleting the red channel in Adobe Lightroom.
Figure 2. The Rasch model formula.
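The formula in Figure 2 is not reproduced in this record. For orientation, the standard dichotomous Rasch model is usually written as below; this is the textbook form, not necessarily the exact parameterization the authors used. The symbols are assumptions: $\theta_n$ is the ability of Turker $n$, $\delta_i$ is the difficulty of image $i$, and $X_{ni} = 1$ means Turker $n$ graded image $i$ correctly.

$$
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
$$

In this form a larger $\delta_i$ means a harder item. The measure scores tabulated later in this record run from negative (hardest) to positive (easiest), so the authors' sign convention for image measures may be the reverse of the one shown here.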
Figure 3. Receiver operating characteristic for the diagnosis of abnormal retinal photograph in the phase 1 baseline analysis.
Figure 4. Image-Turker map illustrating distribution of measure scores for image grading difficulty and Turker ability (#=2 images/Turkers, .=1 image/Turker, M=mean score, S=1 standard deviation, T=2 standard deviations).
Figure 5. Receiver operating curve generated from a logistic regression model using weighted consensus scores of the random 50% (600 images) test set and a second using the nonweighted scores from the same data.
Characteristics of different cut-point values using the weighted logistic model, as compared with the majority vote weighted cut-point and the phase 1 baseline task.
| Cut-point | Correct (%) | Sensitivity (%) | Specificity (%) | AUROC^a |
| Phase 1 MV^b baseline | 75.5 | 75.5 | 75.5 | 0.75 (0.73-0.78) |
| MV weighted arbitrary cut-point | 80.7 | 87.1 | 76.1 | 0.82 (0.79-0.85) |
| | | | | 0.91 (0.88-0.93) |
| Maximizing % correct | 85.0 | 81.1 | 87.8 | 0.84 (0.81-0.87) |
| Sensitivity ≈ 90% | 77.5 | 90.3 | 68.5 | 0.79 (0.76-0.83) |
| Specificity ≈ 90% | 84.5 | 76.6 | 90.1 | 0.83 (0.80-0.86) |
^a AUROC: area under the receiver operating characteristic.
^b MV: majority vote.
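The table reports percent correct, sensitivity, and specificity at several cut-points. The sketch below shows how such figures can be computed once a consensus score has been dichotomized. It is an illustration under stated assumptions, not the authors' analysis code: `cut_point_metrics` is a hypothetical helper, scores at or above the cut-point are treated as "abnormal", and the gold-standard labels are coded 1 = abnormal, 0 = normal.

```python
from typing import Dict, Sequence


def cut_point_metrics(
    scores: Sequence[float], truth: Sequence[int], cut_point: float
) -> Dict[str, float]:
    """Percent correct, sensitivity, and specificity at one cut-point.

    Assumption: an image is called abnormal when its score >= cut_point.
    """
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")

    tp = fp = tn = fn = 0
    for score, label in zip(scores, truth):
        predicted = 1 if score >= cut_point else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "correct": 100.0 * (tp + tn) / total if total else nan,
        "sensitivity": 100.0 * tp / (tp + fn) if (tp + fn) else nan,
        "specificity": 100.0 * tn / (tn + fp) if (tn + fp) else nan,
    }
```

Lowering the cut-point trades specificity for sensitivity, which is the pattern visible in the "Sensitivity ≈ 90%" and "Specificity ≈ 90%" rows.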
Figure 6. Receiver operating curve from logistic regression model using weighted consensus scores using a dichotomization cut-point designed to permit sensitivity of 90% shown alongside unweighted and Rasch-weighted majority vote cut-points.
| Difficulty | Measure score range | Images graded correctly (%) | Messidor grade |
| Hardest | –4.74 to –2.56 | 0-8.3 | 1 |
| Intermediate 1 | –0.14 to –0.04 | 43.4-53.9 | 0 |
| Intermediate 2 | 1.01-1.1 | 69.2-76.9 | 0 |
| Intermediate 3 | 2.04-2.09 | 85.2-88.9 | 0 |
| Easiest | 4.5-4.91 | 100 | 3 |
Figure 7. Representative retinal fundus images organized by progressive ease of grading correctly (A-E). (A) The image reveals areas of chorioretinal atrophy (arrow) but is without lesions of diabetic retinopathy. (B) This image reveals very subtle microaneurysms (arrows). (C) This image reveals more obvious microaneurysms (arrowheads) and subtle hard exudates (arrow). (D) This image reveals more apparent hard exudates (arrow). (E) This image reveals obvious hard exudates (arrow) and more obvious hemorrhagic microaneurysms (arrowhead).