Kornélia Lenke Laurik-Feuerstein¹, Rishav Sapahia², Delia Cabrera DeBuc², Gábor Márk Somfai³,⁴,⁵.
Abstract
PURPOSE: Correctly labeled ground truth data are essential for training machine learning (ML) algorithms. In this pilot study, we assessed the performance of graders with different backgrounds in labeling the quality of retinal fundus images.
Year: 2022 PMID: 35881576 PMCID: PMC9321443 DOI: 10.1371/journal.pone.0271156
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Image quality grading criteria.
Image quality was determined based on four categories that together represent the standard for good-quality color fundus images. All of the following categories must be satisfied to grade retinopathy lesions fully.
| Grading categories | Definition |
|---|---|
| Focus | Detail is sufficient to grade the smallest retinal alterations, e.g., microaneurysms and intraretinal microvascular abnormalities. The small retinal vessels within one disc diameter of the fovea are depicted sharply. |
| Illumination | The amount of source light incident on the retina is adequate for visualizing the smallest retinopathy lesions. There are no washed-out or dark areas that interfere with detailed grading. |
| Image field definition | The primary image field includes the entire optic nerve head (ONH) and the macula, with at least one optic disc diameter of retina visible nasal to the ONH and temporal to the macula. |
| Artefacts | Image acquisition artefacts such as dust spots, arc defects, fingerprints, camera reflexes, or eyelash images, regardless of whether they hinder image grading. |
Fig 1. Representative color fundus images from the Excellent (A), Good (B), Adequate (C) and Insufficient for grading (D) categories. Image 1B was rated “Good” because of its image field definition (decentered image) and peripheral artifacts; 1C qualifies as Adequate because of poor illumination, defocus, and insufficient image field definition (the image field does not contain enough of the retina temporal to the fovea). Image 1D was labelled Insufficient because it neither depicts the optic nerve head nor allows visualization of the third-generation vessel branches around the macula; consequently, retinal changes characteristic of diabetic retinopathy could not be detected.
Fig 2. Screenshots of the image labeling tool developed in Python, demonstrating its function.
A) First, the folder containing the images is selected (1). Next, the output of the image labeling is chosen (2), and finally the labels are specified (3). B) Once these parameters are set, there are two options: either navigate through the images with the “Prev” and “Next” buttons, or run a timed labeling round by selecting the “automatically show next image when labeled” option and then clicking the “Start” button. When the tool is closed, a .csv file and, optionally, an .xls file are generated with the results.
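The navigation and export behaviour described in the Fig 2 caption can be sketched in plain Python. This is not the authors' tool, whose code is not shown here; it is a minimal, GUI-free sketch assuming that the images are files with common extensions in a single folder and that results are written to a CSV with hypothetical `image` and `label` columns. The names `list_images`, `LabelSession` and `save_labels` are illustrative, and the optional .xls export and the timing of a labeling round are omitted.

```python
import csv
from pathlib import Path

# Assumed image formats; the caption does not specify which the tool accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def list_images(folder):
    """Return image paths in a folder, sorted so Prev/Next order is stable."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


class LabelSession:
    """Hypothetical navigation state and labels for one grading round."""

    def __init__(self, images, allowed_labels, auto_advance=False):
        if not images:
            raise ValueError("no images found")
        self.images = list(images)
        self.allowed = list(allowed_labels)
        self.auto_advance = auto_advance  # mimics "show next image when labeled"
        self.pos = 0
        self.labels = {}

    def current(self):
        return self.images[self.pos]

    def next(self):  # "Next" button; stays on the last image
        self.pos = min(self.pos + 1, len(self.images) - 1)

    def prev(self):  # "Prev" button; stays on the first image
        self.pos = max(self.pos - 1, 0)

    def label(self, value):
        if value not in self.allowed:
            raise ValueError(f"unknown label: {value!r}")
        self.labels[self.current().name] = value  # relabeling overwrites
        if self.auto_advance:
            self.next()


def save_labels(labels, out_csv):
    """Write {filename: label} to a CSV with hypothetical 'image'/'label' columns."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["image", "label"])
        for name in sorted(labels):
            writer.writerow([name, labels[name]])
```

With `auto_advance=True`, each call to `label` moves to the next image, which approximates the timed round; with `auto_advance=False`, the grader moves explicitly with `next()` and `prev()`.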
Grading labels used for the objective grading round.
To reduce the inherent subjectivity of the grading, participants were asked to perform a second round of grading using predefined labels assigned to each category. In this round, the labels were then compiled into the same four quality categories as in the first round of grading.
| Grading categories | Labels used in the objective grading round |
|---|---|
| Focus | • Optimal |
| Illumination | • Optimal |
| Image field definition | • Optimal |
| Artefacts | • No artefacts |
Inter-rater agreement in the two setups with different image quality category groups.
Cohen’s weighted kappa was calculated for the first round of grading, which used four image quality categories (Excellent (E), Good (G), Adequate (A), Insufficient (I)), and for the second round, which used 14 predefined labels. In the second round, the labels were compiled into the same four categories (E/G/A/I) as in the first round. Cohen’s weighted kappa was also determined for both rounds with E and G merged [(E+G)/A/I]. To assess agreement in distinguishing good-quality from poor-quality images in both rounds, Cohen’s weighted kappa was calculated for two merged groups (E+G vs. A+I). Kappa values are presented as medians (interquartile range) for all graders and for each subgroup (medical and non-medical).
| Graders | 4 criteria, 4 groups (E/G/A/I) | 4 criteria, 3 groups ((E+G)/A/I) | 4 criteria, 2 groups ((E+G)/(A+I)) | 14 labels, 4 groups (E/G/A/I) | 14 labels, 3 groups ((E+G)/A/I) | 14 labels, 2 groups ((E+G)/(A+I)) |
|---|---|---|---|---|---|---|
| | 0.590 (0.167) | 0.657 (0.116) | 0.715 (0.190) | 0.598 (0.053) | 0.669 (0.052) | 0.708 (0.126) |
| | 0.554 (0.176) | 0.627 (0.147) | 0.625 (0.175) | 0.568 (0.085) | 0.612 (0.107) | 0.581 (0.127) |
| | 0.564 (0.163) | 0.637 (0.096) | 0.665 (0.178) | 0.594 (0.60) | 0.667 (0.80) | 0.670 (0.151) |
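The agreement analysis described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' analysis code. The caption does not say which weighting scheme was used, so linear weights are assumed, with quadratic available as an option. It also assumes that medians and IQRs are taken over all pairwise grader comparisons. The names `weighted_kappa`, `summarize`, `merge`, `MERGE_3` and `MERGE_2` are hypothetical. Merged analyses relabel each grader's ratings before kappa is computed. With only two groups, the single off-diagonal weight is 1 under either scheme, so weighted kappa reduces to ordinary Cohen's kappa.

```python
import statistics
from itertools import combinations

# Ordinal quality categories, best to worst; index order drives the weights.
FOUR = ["E", "G", "A", "I"]
ORDER_3 = ["E+G", "A", "I"]
ORDER_2 = ["E+G", "A+I"]

# Hypothetical relabeling maps for the merged analyses.
MERGE_3 = {"E": "E+G", "G": "E+G", "A": "A", "I": "I"}
MERGE_2 = {"E": "E+G", "G": "E+G", "A": "A+I", "I": "A+I"}


def merge(labels, mapping):
    """Relabel one grader's ratings into merged groups."""
    return [mapping[x] for x in labels]


def weighted_kappa(r1, r2, categories, weights="linear"):
    """Cohen's weighted kappa for two graders rating the same images.

    `categories` lists at least two ordinal levels in rank order.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k, n = len(categories), len(r1)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed k x k contingency table of rating pairs.
    obs = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(w(i, j) * obs[i][j] for i, j in cells) / n
    expected = sum(w(i, j) * row[i] * col[j] for i, j in cells) / (n * n)
    if expected == 0:
        return float("nan")  # kappa is undefined when chance disagreement is 0
    return 1.0 - observed / expected


def summarize(graders, categories, mapping=None):
    """Median and IQR of kappa over all grader pairs (needs >= 3 graders)."""
    kappas = []
    for a, b in combinations(graders, 2):
        if mapping is not None:
            a, b = merge(a, mapping), merge(b, mapping)
        kappas.append(weighted_kappa(a, b, categories))
    q1, _, q3 = statistics.quantiles(kappas, n=4)
    return statistics.median(kappas), q3 - q1
```

For example, `summarize(graders, FOUR)` corresponds to the four-group analysis, `summarize(graders, ORDER_3, MERGE_3)` to (E+G)/A/I, and `summarize(graders, ORDER_2, MERGE_2)` to the two-group split. `statistics.quantiles` uses the "exclusive" method by default, so IQRs from other software may differ slightly. scikit-learn's `cohen_kappa_score` with `weights="linear"` and an ordered `labels` list can serve as a cross-check.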