Janan Arslan, Gihan Samarasinghe, Kurt K Benke, Arcot Sowmya, Zhichao Wu, Robyn H Guymer, Paul N Baird.
Abstract
Purpose: The purpose of this study was to summarize and evaluate artificial intelligence (AI) algorithms used in geographic atrophy (GA) diagnostic processes (e.g., isolating lesions or disease progression).
Keywords: age-related macular degeneration; artificial intelligence; geographic atrophy
Year: 2020 PMID: 33173613 PMCID: PMC7594588 DOI: 10.1167/tvst.9.2.57
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Inclusion and Exclusion Criteria for the Literature Review
| Inclusion Criteria | Exclusion Criteria |
|---|---|
| Original, peer-reviewed publication that assessed an AI-based algorithm for GA | Systematic reviews, meta-analyses, narrative reviews |
| Published in English language | Reviews with unsystematic methods |
| No limitation on time frame | Editorials, opinion pieces, and commentary letters |
| No limitation on study design or study population | Publications in which GA was not the only disease/disease state under assessment (e.g. a classification algorithm that classified AMD into neovascular AMD or GA) |
| Could include conference proceedings and abstracts | Publications that developed an algorithm which could have a widespread use in ophthalmology (e.g. vessel or drusen segmentation not specifically designed for GA assessment) |
AI, artificial intelligence; AMD, age-related macular degeneration; GA, geographic atrophy.
Figure 1. The PRISMA flowchart illustrating the literature selection process. Online databases, PubMed and Web of Science, were used for this review. Reference lists from identified publications were also reviewed to identify any GA-AI papers which may have been missed using our search keywords. AI, artificial intelligence; GA, geographic atrophy.
Measured Variables Collected for Review
| Measured Variable | Reasoning |
|---|---|
| Study objective | Recording the objective of each study allowed the quantification of the intention (and current direction) of GA-AI studies. For example, what is the primary purpose of GA-AI studies currently? Is it to understand progression or simply to automate current annotative processes? |
| Retinal image modality | This variable quantified the various image types used in GA-AI studies to highlight (1) which imaging modalities are commonly used in GA assessment and (2) whether different image types contribute to more or less successful AI applications. Although several imaging modalities are available, FAF is considered an appropriate tool to measure GA size and growth rate longitudinally with a high degree of reproducibility. |
| Sample size | The general consensus in AI and statistical theory is that the larger the sample size, the more accurate the algorithm. However, large sample sizes may be difficult to attain in medical research, depending on the disease prevalence, confidentiality, and ethical and privacy concerns. This variable summarizes the sample sizes used for GA-AI algorithms and, by extension, whether sample sizes in the tens or hundreds would suffice for developing highly accurate algorithms. |
| AI algorithm | This variable assessed the algorithms used, and whether a diversity of methods was employed or whether similar AI algorithms were being used repeatedly. |
| Human grader comparison | Human grader comparison refers to comparing a proposed AI method to the current gold standard in GA diagnostics: the human grader. The expectation is that an AI algorithm should be designed to meet or exceed grading by a human grader. This variable identified publications that evaluated the accuracy of their algorithms against a human grader, and whether the AI was successful in meeting and/or exceeding expectations. |
| Outcome measures | This variable quantified the diagnostic accuracy of the proposed GA-AI algorithms. |
AI, artificial intelligence; FAF, fundus autofluorescence; GA, geographic atrophy.
Synopsis of GA-AI Publication Techniques
| Category | Reference | Retinal Image Modality | Total Sample Size | Artificial Intelligence Algorithm | Human Grader Comparison | Outcome Measures (Examples) |
|---|---|---|---|---|---|---|
| 1. Detection and classification of GA | Treder et al., 2018 | FAF | 690 images | Deep CNN using TensorFlow (Google Inc.) | No | Sensitivity, specificity, accuracy |
| | Keenan et al., 2019 | CFP | 59,812 images (from AREDS) | CNN | Yes | |
| 2. Segmentation of GA | Deckert et al., 2005 | FAF | 40 eyes | Region-growing algorithm | No | DSC |
| | Lee et al., 2008 | FAF | 100 images | Watershed transform algorithm | No | |
| | Devisetti et al., 2011 | FAF and IR | N/A | Supervised neural network with scaled conjugate gradient learning algorithm | No | |
| | Chen et al., 2013 | SD-OCT and FAF | | Geometric active contour model | Yes | |
| | Hu et al., 2013 | SD-OCT and FAF | 20 eyes | Level set method for segmentation | Yes | |
| | Hu et al., 2014 | FAF | 16 images from 16 patients | Supervised pixel classification using | Yes | |
| | Ramsey et al., 2014 | CFP and FAF | Ten patients, each with an average of three image pairs | Fuzzy c-Means algorithm | Yes | |
| | Feeny et al., 2015 | CFP | 143 images | Random forest decision tree | No | |
| | Hu et al., 2015 | FAF | 16 eyes | Supervised pixel classification using | Yes | |
| | Niu et al., 2016 | SD-OCT and FAF | | Chan-Vese model via local similarity factor | Yes | |
| | Fang et al., 2017 | SD-OCT | 117 volume scans from 39 participants | CNN-graph search model | Yes | |
| | Hu et al., 2018 | FAF | 50 images | CNN | Yes | |
| | Hu et al., 2018 | IR | 70 images from 70 subjects | CNN | Yes | |
| | Ji et al., 2018 | SD-OCT | | Sparse autoencoders deep network | Yes | |
| | Xu et al., 2018 | SD-OCT | | 3D CNN | Yes | |
| | Yang et al., 2018 | SD-OCT and FAF | N/A | Region-growing algorithm | Yes | |
| | Wu et al., 2019 | SD-OCT and synthesized FAF | 56 SD-OCT volumes from 56 patients | Region-aware adversarial network to synthesize FAF images and U-Net for segmentation | No | |
| | Xu et al., 2019 | SD-OCT | | A two-stage learning model with offline- and self-learning based on stacked sparse auto-encoders | Yes | |
| 3. A. Prediction of future overall GA progression | Pfau et al., 2019 | FAF and IR | 296 eyes of 201 patients | Linear mixed-effects model | No | MAE, MSE, coefficient of determination |
| | Liefers et al., 2020 | CFP | | Eight-level encoder-decoder network and linear regression | Yes | |
| 3. B. Prediction of future spatial GA progression | Niu et al., 2016 | SD-OCT | 118 SD-OCT scans from 38 eyes in 29 patients | Chan-Vese model for segmentation and random forest for prediction | No | DSC |
| | Pfau et al., 2020 | FAF, IR, SD-OCT, and OCTA | 98 eyes and 59 patients | Mixed-effect logistic regression | No | |
| | Schmidt-Erfurth et al., 2020 | SD-OCT and FAF | 491 SD-OCT volumes from 87 eyes of 54 patients | Residual U-Net and linear regression | No | |
| 4. Prediction of visual function in GA | Künzel et al., 2020 | FAF, IR, and SD-OCT | 87 patients | Linear regression modelling and LASSO for multicollinearity | No | MAE, MSE |
| | Pfau et al., 2020 | FAF, SD-OCT, and IR | 41 eyes from 41 patients (from the Directional-Spread-in-GA (DSGA) study) | Random forest decision tree | No | |
Abstract only information.
Conference paper.
N/A, not applicable or information is missing.
AI, artificial intelligence; CFP, color fundus photograph; CNN, convolutional neural network; DSC, Dice Similarity Coefficient; FAF, fundus autofluorescence; GA, geographic atrophy; IR, near-infrared imaging; MAE, Mean Absolute Error; MSE, Mean Squared Error; SD-OCT, spectral domain optical coherence tomography.
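The outcome measures abbreviated above (DSC, MAE, and MSE) can be illustrated with a minimal sketch. It assumes binary masks flattened to sequences of 0/1 and uses made-up toy values; the function names and the convention for two empty masks are assumptions chosen for illustration and are not taken from any of the reviewed publications.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_absolute_error(predicted, observed):
    """Mean Absolute Error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mean_squared_error(predicted, observed):
    """Mean Squared Error between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Toy segmentation masks (hypothetical): algorithm output vs. human grader.
algorithm_mask = [0, 1, 1, 1, 0, 0]
grader_mask = [0, 1, 1, 0, 0, 0]
dsc = dice_coefficient(algorithm_mask, grader_mask)

# Toy lesion areas (hypothetical values, arbitrary units).
predicted_area = [2.0, 3.5, 5.0]
observed_area = [2.5, 3.0, 6.0]
mae = mean_absolute_error(predicted_area, observed_area)
mse = mean_squared_error(predicted_area, observed_area)

print(dsc, mae, mse)
```

DSC rewards overlap between the predicted and reference lesion, which is why it appears for the segmentation and spatial-prediction categories, whereas MAE and MSE compare continuous quantities such as a predicted versus an observed measurement.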
Figure 2. Cumulative count of GA-AI publications. There is an increasing trend of GA-AI publications. AI, artificial intelligence; GA, geographic atrophy.
Figure 3. Imaging modalities used in GA-AI studies. FAF-only images were the most commonly used image modality among GA-AI studies (n = 6). A combination of SD-OCT and FAF imaging (n = 5) and SD-OCT only (n = 5) were the second most commonly used imaging types in GA-AI studies, followed by CFP only (n = 3). AI, artificial intelligence; GA, geographic atrophy; CFP, color fundus photograph; FAF, fundus autofluorescence; IR, near-infrared; SD-OCT, spectral domain optical coherence tomography; OCTA, optical coherence tomography angiography.
Figure 4. Examples of image segmentation using the Fuzzy c-Means algorithm reported by Ramsey et al. Top row illustrates CFP-based segmentation and the bottom row FAF-based segmentation. B and G are ground truths, C and H are segmentation results, D and I are color-coded maps of segmentation results, and E and J illustrate which GA borders were correctly identified (i.e., green). See also the website of MathWorks (https://www.mathworks.com/discovery/image-segmentation.html) for many other examples of image segmentation.
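For readers unfamiliar with Fuzzy c-Means, the following is a minimal, generic sketch of the textbook algorithm applied to a handful of 1-D pixel intensities. It is not the implementation used by Ramsey et al.; the function name, the evenly spaced initialization, the fuzzifier `m = 2.0`, and the toy intensities are all assumptions made for illustration.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50, eps=1e-9):
    """Textbook Fuzzy c-Means on 1-D values.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which values[k] belongs to cluster i (each row sums to 1).
    """
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    lo, hi = min(values), max(values)
    # Assumption: deterministic start with centers spread over the value range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for x in values:
            dists = [abs(x - c) + eps for c in centers]
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                   for d_i in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(n_clusters)
        ]
    return centers, memberships


# Toy normalized intensities (hypothetical): three dark and three bright pixels.
intensities = [0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
centers, memberships = fuzzy_c_means(intensities)

# Harden the fuzzy result by assigning each pixel to its strongest cluster.
labels = [row.index(max(row)) for row in memberships]
print(centers, labels)
```

Unlike hard clustering, each pixel keeps a graded membership in every cluster, which can be useful near lesion borders where intensity changes gradually; a real pipeline would typically operate on full 2-D images and more features than raw intensity.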
Segmentation-Only Algorithm Outcomes
| Summary of Findings for Segmentation-Only Algorithms ( | |
|---|---|
| 0.47–0.983 | |
| 0.93–0.99 | |
| 0.42–0.995 | |
| 0.659–0.899 | |
| 0.82–0.998 | |
| 0.68–0.89 | |
| 0.79–0.87 | |
| 0.13–0.20 | |
These results represent a total of 18 publications.
Algorithm Outcomes for Main Image Types: FAF, SD-OCT, and CFP
| Evaluation Metric | FAF | SD-OCT | CFP |
|---|---|---|---|
| | 0.87–0.983 | 0.81–0.90 | 0.47–0.782 |
| | 0.93–0.98 | 0.95–0.97 | 0.729–0.99 |
| | 0.75–0.97 | 0.986–0.995 | 0.42–0.966 |
| | 0.659–0.79 | 0.726–0.899 | – |
| | 0.937–0.99 | 0.72–0.998 | – |
| | 0.83–0.89 | 0.81–0.87 | 0.66–0.72 |
| | 0.80–0.87 | 0.83–0.86 | 0.82 |
| | – | 0.96–0.97 | 0.95 |
| | 0.13–0.20 | – | – |
| | 2.77–4.89 | 2.77–4.89 | – |
These results represent a total of 18 publications that have assessed FAF, including FAF only, SD-OCT and FAF, CFP and FAF, FAF and IR, and FAF, IR, and SD-OCT. Most studies that combined modalities reported results separately for each image set. Other publications did not separate results for FAF from those for the other imaging modalities (e.g., Devisetti et al. used FAF and IR and stated a sensitivity of 0.825 and a specificity of 0.93).
These results represent a total of 14 publications that have assessed SD-OCT, including SD-OCT only, SD-OCT and FAF, and SD-OCT, FAF, and IR. Both accuracies were from Ji et al. (one for each dataset used in the study). Sensitivity, specificity, positive predictive value, and negative predictive value ranges were all from Niu et al. One study by Schmidt-Erfurth et al. is not included because its results were reported as various correlation P values.
These results represent a total of 4 publications that have assessed CFP, including CFP only and CFP and FAF.
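The footnotes above refer to sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. As a minimal sketch of how these are derived from a pixel-wise comparison of an algorithm's binary mask against a reference mask, consider the following; the function names and toy masks are hypothetical, and returning NaN for an undefined ratio is an assumption of this sketch.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two flat 0/1 masks."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(pred, truth):
    """Standard diagnostic accuracy measures from confusion counts."""
    tp, fp, tn, fn = confusion_counts(pred, truth)

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (zero denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # share of lesion pixels found
        "specificity": ratio(tn, tn + fp),   # share of healthy pixels kept
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy masks (hypothetical): 1 = pixel marked as GA, 0 = not GA.
algorithm_mask = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reference_mask = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = diagnostic_metrics(algorithm_mask, reference_mask)
print(metrics)
```

Because lesion pixels are often a small fraction of an image, accuracy and specificity can look high even when sensitivity is modest, which is one reason ranges for several measures are usually read together.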