Mike Schaekermann¹,², Naama Hammel¹, Michael Terry¹, Tayyeba K Ali³, Yun Liu¹, Brian Basham¹, Bilson Campana¹, William Chen¹, Xiang Ji¹, Jonathan Krause¹, Greg S Corrado¹, Lily Peng¹, Dale R Webster¹, Edith Law², Rory Sayres¹.
Abstract
PURPOSE: To present and evaluate a remote, tool-based system and structured grading rubric for adjudicating image-based diabetic retinopathy (DR) grades.
Keywords: adjudication; diabetic retinopathy; retinal imaging; teleophthalmology
Year: 2019 PMID: 31867141 PMCID: PMC6922270 DOI: 10.1167/tvst.8.6.40
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Baseline Characteristics
| Characteristic | Value |
|---|---|
| Number of images | 499 |
| Number of images for which an anonymized patient code was availableᵃ | 330 |
| Number of unique individuals among the images for which a patient code was available | 307 |
| DR gradeability distribution according to Baseline adjudication | |
| Images gradable for DR, n/N (%) | 472/499 (94.6) |
| DR severity distribution according to Baseline adjudication, n (%) | |
| No apparent DR | 217 (45.9) |
| Mild NPDR | 17 (3.6) |
| Moderate NPDR | 108 (22.9) |
| Severe NPDR | 72 (15.3) |
| PDR | 58 (12.3) |

NPDR, nonproliferative diabetic retinopathy; PDR, proliferative diabetic retinopathy.
ᵃ Patient codes were available for images from two of the three hospitals (Sankara Nethralaya and Narayana Nethralaya).
Figure 1. Process diagram illustrating remote tool-based adjudication (TA). Images are first graded independently by each panel member (round 0); cases with any level of disagreement after independent grading are then reviewed by all graders in a round-robin fashion (rounds 1–N). A case closes once the graders agree, or after N review rounds if disagreement persists.
Figure 2. Illustration of the round-robin approach for remote TA in the context of DR severity grading.
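The round-robin procedure in Figures 1 and 2 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Reviser` callback, the `follow_majority` toy grader, the rule that a case closes as soon as all grades match (checked after each grader's turn), and the 15-round cap (borrowed from Figure 4) are all hypothetical choices made for the sketch.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical callback: given the current grades of all panel members,
# a grader returns their (possibly revised) grade for the case.
Reviser = Callable[[Dict[str, int]], int]


def all_agree(grades: Dict[str, int]) -> bool:
    """True when every panel member currently assigns the same grade."""
    return len(set(grades.values())) == 1


def adjudicate(independent: Dict[str, int],
               revisers: Dict[str, Reviser],
               max_rounds: int = 15) -> Tuple[Dict[str, int], int]:
    """Return the final grades and the number of review rounds used.

    Round 0 is independent grading. In each review round, graders take
    turns (round-robin) and may revise after seeing the others' grades.
    """
    grades = dict(independent)
    if all_agree(grades):
        return grades, 0  # agreement at independent grading: no review
    order = list(grades)
    for round_no in range(1, max_rounds + 1):
        for grader in order:  # one grader at a time
            grades[grader] = revisers[grader](dict(grades))
            if all_agree(grades):  # assumption: close as soon as all agree
                return grades, round_no
    return grades, max_rounds  # persistent disagreement after the cap


def follow_majority(grades: Dict[str, int]) -> int:
    """Toy reviser (assumption): adopt the most common current grade."""
    return Counter(grades.values()).most_common(1)[0][0]


# RX switches to the majority grade in round 1, closing the case.
final, rounds = adjudicate({"RX": 2, "RY": 3, "RZ": 3},
                           {g: follow_majority for g in ("RX", "RY", "RZ")})
# final == {"RX": 3, "RY": 3, "RZ": 3}, rounds == 1
```

Real graders respond to images, lesions, and written comments rather than to a vote count; the callback only stands in for that judgment so the control flow can be shown.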
Comparison of Adjudication Procedures
| Property | Baseline | Tool-Based (TA and TA-F) |
|---|---|---|
| Image viewer | Web-based image viewer with built-in tools to adjust zoom level and contrast settings; graders submitted their independent assessments using prompts embedded into the image viewer | |
| Aggregation of grades and identification of disagreements | Exporting results into a spreadsheet to manually identify disagreements | Automated process to identify images with disagreement in the grades database |
| First review round | Remotely, in a spreadsheet | Remotely, using the web-based image viewer; one grader at a time in a round-robin fashion |
| Subsequent review rounds | In-person session; all panel members convene at a set time | |
| Channel for discussion | In-person verbal discussion | Discussion thread integrated into the image viewer; up to one written comment per grader per review round |
| Scheduling of review rounds | Manual process | No manual scheduling required; grading and review tasks automatically queue up for individual graders in the online platform |
| Anonymization of graders | Possible only in the first review round, not during live discussion | Possible throughout the entire procedure |
| Organization of the disagreement discussion around a set of explicit diagnostic criteria (e.g., lesions) | Challenging to implement during live discussion | Possible using the prompt structure integrated into the image viewer |
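The automated disagreement detection and task queueing in the tool-based procedure might look roughly like the sketch below. The in-memory `GradesDB` layout (image ID mapped to per-grader grades) and both function names are hypothetical; the source does not describe the actual database schema or queueing logic.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical grades database: image ID -> {grader ID: DR grade}
GradesDB = Dict[str, Dict[str, int]]


def find_disagreements(db: GradesDB) -> List[str]:
    """Image IDs whose independent grades are not all identical."""
    return sorted(image_id for image_id, by_grader in db.items()
                  if len(set(by_grader.values())) > 1)


def queue_review_tasks(db: GradesDB) -> Dict[str, List[str]]:
    """Queue every disagreement case for each grader who graded it."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for image_id in find_disagreements(db):
        for grader in db[image_id]:
            queues[grader].append(image_id)
    return dict(queues)


db = {
    "img-001": {"A": 0, "B": 0, "C": 0},  # agreement: no review needed
    "img-002": {"A": 2, "B": 3, "C": 2},  # disagreement: queued for review
}
print(find_disagreements(db))  # ['img-002']
```

Replacing the manual spreadsheet step with a query like this is what removes the need for manual scheduling: each grader simply pulls from their own queue.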
Figure 3. Grading interface for remote TA-F for DR severity assessment. Grader pseudonyms (RX, RY, RZ) associate grading decisions and discussion comments from previous rounds with specific (anonymized) grader identities. The current grader's pseudonym is highlighted in bold white font (see RZ). The panel on the right-hand side lists all prompts included in the TA-F procedure and scrolls vertically between its top half (A) and bottom half (B).
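One way to produce per-case pseudonyms like RX, RY, and RZ is sketched below. Seeding the shuffle with the case ID is an assumption chosen so that the mapping stays stable across review rounds of the same case while varying between cases; the source does not say how pseudonyms were actually assigned, and `assign_pseudonyms` is a hypothetical helper.

```python
import random
from typing import Dict, Sequence

PSEUDONYMS = ("RX", "RY", "RZ")  # labels as shown in Figure 3


def assign_pseudonyms(case_id: str, graders: Sequence[str]) -> Dict[str, str]:
    """Map real grader IDs to pseudonyms for one case.

    A string seed makes random.Random deterministic, so every round of the
    same case sees the same mapping (hypothetical design choice).
    """
    if len(graders) > len(PSEUDONYMS):
        raise ValueError("more graders than available pseudonyms")
    order = list(graders)
    random.Random(case_id).shuffle(order)
    return dict(zip(order, PSEUDONYMS))


mapping = assign_pseudonyms("case-17", ["grader-1", "grader-2", "grader-3"])
```

Because comments and decisions are displayed only under these labels, anonymity can be preserved across all rounds, which the comparison table lists as an advantage of the tool-based procedure.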
Interpanel Agreement Between All Adjudication Panels
| Parameter | Panel A (TA) | Panel B (TA) | Panel C (TA-F) | Panel D (TA-F) |
|---|---|---|---|---|
| Baseline | 0.948 (0.931–0.964) | 0.943 (0.919–0.962) | 0.921 (0.886–0.948) | 0.963 (0.949–0.975) |
| Panel A (TA) | / | 0.932 (0.911–0.950) | 0.917 (0.885–0.944) | 0.939 (0.916–0.960) |
| Panel B (TA) | / | / | 0.911 (0.873–0.942) | 0.936 (0.914–0.953) |
| Panel C (TA-F) | / | / | / | 0.919 (0.882–0.949) |
Values are quadratically weighted Cohen's kappa (95% CI).
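For reference, quadratically weighted kappa compares observed disagreement with chance-expected disagreement, weighting each cell by the squared grade distance: κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij, with w_ij = (i − j)² / (k − 1)², where O is the observed confusion matrix, E is the outer product of the two raters' marginal counts divided by the number of images, and k is the number of grade levels. The sketch below assumes grades are encoded as integers 0 to k − 1 (for example, k = 5 for no apparent DR through PDR); how ungradable images were handled and how the confidence intervals were computed are not stated in this section, so neither is reproduced.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             num_classes: int) -> float:
    """Cohen's kappa with quadratic weights for two raters' integer grades."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty grade lists of equal length")
    n = len(a)
    # Observed confusion matrix: rows are rater a, columns are rater b.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(num_classes))
           for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            num += w * observed[i][j]             # weighted observed disagreement
            den += w * row[i] * col[j] / n        # weighted chance disagreement
    # den is 0 when both raters use one identical grade; kappa is then undefined
    # and the division below raises ZeroDivisionError.
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 2))  # 0.5
```

The quadratic weights mean that a disagreement of two severity levels costs four times as much as a one-level disagreement, which suits an ordinal scale such as DR severity.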
Interpanel Agreement (Exact Agreement Rate) Between All Adjudication Panels
| Parameter | Panel A (TA) | Panel B (TA) | Panel C (TA-F) | Panel D (TA-F) |
|---|---|---|---|---|
| Baseline | 0.820 | 0.828 | 0.789 | 0.857 |
| Panel A (TA) | / | 0.811 | 0.811 | 0.852 |
| Panel B (TA) | / | / | 0.822 | 0.816 |
| Panel C (TA-F) | / | / | / | 0.820 |
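The exact agreement rate is the simpler of the two measures: the fraction of images on which two panels assigned the identical grade. A minimal sketch, assuming both panels' grades are aligned lists over the same images (`exact_agreement_rate` is a hypothetical helper name):

```python
from typing import Sequence


def exact_agreement_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of images on which two panels assigned the identical grade."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty grade lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# A one-level miss and a four-level miss reduce exact agreement equally.
print(exact_agreement_rate([0, 1, 2, 3], [0, 1, 2, 4]))  # 0.75
print(exact_agreement_rate([0, 1, 2, 3], [4, 1, 2, 3]))  # 0.75
```

Unlike quadratically weighted kappa, this rate neither corrects for chance agreement nor distinguishes near misses from large disagreements, which is why the two tables can rank panel pairs differently.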
Interpanel Agreement (Strikeout Rate) Between All Adjudication Panels
| Parameter | Panel A (TA) | Panel B (TA) | Panel C (TA-F) | Panel D (TA-F) |
|---|---|---|---|---|
| Baseline | 0.026 | 0.026 | 0.027 | 0.017 |
| Panel A (TA) | / | 0.039 | 0.041 | 0.038 |
| Panel B (TA) | / | / | 0.042 | 0.034 |
| Panel C (TA-F) | / | / | / | 0.033 |
Figure 4. Number of review rounds required per case (i.e., the number of rounds until agreement was reached, or 15 in cases of persistent disagreement) for each of the four adjudication panels.
Figure 5. Cumulative percentage of cases resolved per adjudication round for the TA procedures.
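A cumulative resolution curve like the one in Figure 5 can be derived from per-case round counts. The sketch assumes each case records the round at which agreement was reached (0 for agreement at independent grading) or `None` if disagreement persisted to the cap; that encoding, and whether round 0 appears in the published figure, are assumptions.

```python
from typing import List, Optional, Sequence


def cumulative_resolved(rounds_per_case: Sequence[Optional[int]],
                        max_round: int) -> List[float]:
    """Percentage of cases resolved at or before each round 0..max_round.

    Cases that never reached agreement (None) are never counted as resolved.
    """
    n = len(rounds_per_case)
    if n == 0:
        raise ValueError("no cases")
    return [100.0 * sum(1 for r in rounds_per_case if r is not None and r <= k) / n
            for k in range(max_round + 1)]


print(cumulative_resolved([0, 0, 1, 2, None], 3))  # [40.0, 60.0, 80.0, 80.0]
```

Keeping unresolved cases as `None` rather than as the cap value (15) prevents them from being counted as resolved in the final round.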
Figure 6. Mean number of review rounds required per rubric criterion in remote TA-F. The y-axis indicates the number of rounds after independent grading until either agreement was reached for the given criterion or the case was closed because of overall agreement at the diagnosis level. The mean number of review rounds may be below 1 because cases that did not require adjudication (owing to agreement at independent grading) were counted as having 0 review rounds. Green bars correspond to feature criteria; blue bars correspond to differential diagnosis criteria. Error bars indicate 95% confidence intervals. CWS, cotton-wool spot; HE, hard exudate; NVFP, neovascularization or fibrous proliferation; PRHVH, preretinal or vitreous hemorrhage; PRP, panretinal photocoagulation scars; FLP, focal laser photocoagulation scars.
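The per-criterion means in Figure 6 include zero-round cases, which is why they can fall below 1. The sketch below shows that aggregation with a normal-approximation 95% interval; the interval method is an assumption (the caption does not state how the error bars were computed), and the criterion data are made up for illustration.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def mean_rounds_ci(rounds: Sequence[int],
                   z: float = 1.96) -> Tuple[float, float, float]:
    """Mean review rounds with a normal-approximation CI (assumed method).

    Cases that needed no adjudication must be included with a value of 0.
    """
    if len(rounds) < 2:
        raise ValueError("need at least two cases for a standard error")
    m = statistics.fmean(rounds)
    se = statistics.stdev(rounds) / math.sqrt(len(rounds))
    return m, m - z * se, m + z * se


def summarize(per_criterion: Dict[str, List[int]]) -> Dict[str, Tuple[float, float, float]]:
    """Apply mean_rounds_ci to each rubric criterion."""
    return {name: mean_rounds_ci(r) for name, r in per_criterion.items()}


# Hypothetical data: rounds per case for two feature criteria.
summary = summarize({"CWS": [0, 0, 1, 3], "HE": [0, 1, 1, 2]})
```

A bootstrap over cases would be a reasonable alternative for skewed round counts like these; the normal approximation is used here only to keep the sketch short.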