Adam Hanif, İlkay Yıldız, Peng Tian, Beyza Kalkanlı, Deniz Erdoğmuş, Stratis Ioannidis, Jennifer Dy, Jayashree Kalpathy-Cramer, Susan Ostmo, Karyn Jonas, R. V. Paul Chan, Michael F. Chiang, J. Peter Campbell.
Abstract
Purpose: To compare the efficacy and efficiency of training neural networks for medical image classification using comparison labels indicating relative disease severity versus diagnostic class labels, using a retinopathy of prematurity (ROP) image dataset.
Design: Evaluation of diagnostic test or technology.
Participants: Deep learning neural networks trained on expert-labeled wide-angle retinal images obtained from patients undergoing diagnostic ROP examinations as part of the Imaging and Informatics in ROP (i-ROP) cohort study.
Keywords: ANOVA, analysis of variance; AUC, area under the receiver operating characteristic curve; Artificial intelligence; Deep learning; ICROP, International Classification of Retinopathy of Prematurity; Labels; Neural networks; ROP, retinopathy of prematurity; Retinopathy of prematurity; i-ROP, Imaging and Informatics in ROP
Year: 2022 PMID: 36249702 PMCID: PMC9560533 DOI: 10.1016/j.xops.2022.100122
Source DB: PubMed Journal: Ophthalmol Sci ISSN: 2666-9145
Distribution of Plus Disease Severity Classes within Datasets
| Dataset | Normal | Preplus | Plus | Total |
|---|---|---|---|---|
| i-ROP | 54 | 31 | 15 | 100 |
| ICROP | 6 | 10 | 14 | 30 |
| Test dataset | 4577 | 812 | 172 | 5561 |
ICROP = International Classification of Retinopathy of Prematurity; i-ROP = Imaging and Informatics in ROP.
Figure 1. Diagram showing the labeling process. Graders were asked to perform 2 tasks. A, They were given a single image at a time and asked to label the image as plus, preplus, or no plus. B, They were shown a pair of images and asked to choose the image that represented more severe disease.
Figure 2. Schematic diagram illustrating the training, validation, and testing process involved in developing the neural networks applied to 1 of 2 binary classification tasks: normal versus abnormal and plus versus nonplus. RSD = reference standard diagnosis.
Figure 3. Flow diagram showing a simplified depiction of neural network training between class and comparison labels in experiments A and B. Sixty percent of images from either the Imaging and Informatics in ROP (i-ROP) or International Classification of Retinopathy of Prematurity (ICROP) datasets were selected randomly. This selection then was balanced so as to achieve a near-even distribution of images represented by each of the 3 severity classes. In experiment A, the total number of class labels assigned to these images by expert graders then was used to train a neural network. Similarly, all comparison labels associated with the same images in this balanced training set were used to train a neural network for performance comparison. In experiment B, a set of class labels each corresponding to a single image in the balanced test set was used for training a neural network and was compared with a neural network trained on an equivalent number of comparison labels. E = total number of expert graders. ROP = retinopathy of prematurity.
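The core idea behind comparison-label training, as described in Figure 3, is that graders judge which of two images shows more severe disease rather than assigning an absolute class. One standard way to learn from such pairwise judgments is a Bradley-Terry-style logistic model over latent severity scores; the sketch below (with hypothetical toy data, not the paper's actual loss or architecture) shows how a consistent set of comparisons recovers a severity ordering.

```python
import numpy as np

# Hypothetical toy data: 4 images; each pair (i, j) means image i was
# judged MORE severe than image j by a grader.
comparisons = [(3, 0), (3, 1), (2, 0), (2, 1), (3, 2), (1, 0)]
n_images = 4

# Bradley-Terry-style model: each image k gets a scalar severity score
# s_k, and P(i more severe than j) = sigmoid(s_i - s_j). In the paper's
# setting, a CNN would produce s_k from the image; here the scores are
# free parameters fit by gradient descent on the negative log-likelihood.
scores = np.zeros(n_images)
lr = 0.1
for _ in range(500):
    grad = np.zeros(n_images)
    for i, j in comparisons:
        p = 1.0 / (1.0 + np.exp(-(scores[i] - scores[j])))
        # d/ds_i of -log sigmoid(s_i - s_j) is -(1 - p); opposite for s_j
        grad[i] -= 1.0 - p
        grad[j] += 1.0 - p
    scores -= lr * grad

# The learned ordering recovers the implied severity: 3 > 2 > 1 > 0
ranking = np.argsort(-scores)
print(list(ranking))  # → [3, 2, 1, 0]
```

A thresholded severity score can then serve the same binary tasks (normal vs. abnormal, plus vs. nonplus) evaluated in the paper.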
Figure 4. Line graphs showing experiment A neural network performance. A, B, Normal versus abnormal (A) and plus versus nonplus (B) classification tasks from models trained on class or comparison labels corresponding to images within the Imaging and Informatics in ROP (i-ROP) dataset. No statistically significant difference was found between models trained on either label type. C, D, Classification performances from models trained on class or comparison labels corresponding to images within the International Classification of Retinopathy of Prematurity (ICROP) dataset. Training on comparison labels yielded significantly higher areas under the receiver operating characteristic curve (AUCs) than training on class labels (2-way analysis of variance: normal vs. abnormal: F = 30.41; main effect, P = 0.0006; plus vs. nonplus: F = 5.83; main effect, P = 0.04). In the normal versus abnormal task (C), the average AUC from training with comparison labels associated with 3 images was significantly higher than from training with class labels associated with the same number of images (P = 0.008, Welch's t test).
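The AUC reported throughout these figures has a useful probabilistic reading: it equals the probability that a randomly chosen positive (e.g., abnormal) image receives a higher model score than a randomly chosen negative (normal) one, with ties counted as one half (the normalized Mann-Whitney U statistic). A minimal illustration with hypothetical scores:

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: fraction of positive-negative
    pairs ranked correctly, with ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model outputs for a normal-versus-abnormal task
abnormal = [0.9, 0.8, 0.7, 0.4]
normal = [0.6, 0.3, 0.2, 0.1]
print(auc(abnormal, normal))  # → 0.9375 (15 of 16 pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.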
Figure 5. Line graphs showing experiment B neural network performance. A, B, Normal versus abnormal (A) and plus versus nonplus (B) classification tasks from models trained on class or comparison labels within the Imaging and Informatics in ROP (i-ROP) dataset. A, B, Average area under the receiver operating characteristic curve (AUC) from training with 156 comparison labels was significantly higher than that measured from training with class labels (Welch's t test: normal vs. abnormal, P = 0.002; plus vs. nonplus, P = 0.02). Training on comparison labels yielded significantly higher AUCs than training on class labels (2-way analysis of variance [ANOVA]: normal vs. abnormal: F = 12.16; main effect, P = 0.003; plus vs. nonplus: F = 8.77; main effect, P = 0.009). C, D, Classification performances from models trained on class or comparison labels corresponding to images within the International Classification of Retinopathy of Prematurity (ICROP) dataset. Training on comparison labels yielded significantly higher AUCs than training on class labels in both classification tasks (normal vs. abnormal: 2-way ANOVA: F = 13.93; main effect, P = 0.003; plus vs. nonplus: F = 7.14; main effect, P = 0.02). In the normal versus abnormal task (C), the average AUC from training with 204 comparison labels was significantly higher than that measured from training with class labels (P = 0.002, Welch's t test).
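The per-point comparisons in Figures 4 and 5 use Welch's t test, which, unlike Student's t test, does not assume equal variances between the two groups of AUCs. A sketch of the statistic and the Welch-Satterthwaite degrees of freedom, applied to hypothetical AUC samples from repeated training runs (not the paper's actual values):

```python
import math

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb                         # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical AUCs from 5 training runs per label type
auc_comparison = [0.92, 0.90, 0.93, 0.91, 0.94]
auc_class = [0.85, 0.88, 0.84, 0.87, 0.86]
t, df = welch_t(auc_comparison, auc_class)
print(round(t, 2), round(df, 1))  # → 6.0 8.0
```

The P value is then obtained from the t distribution with (generally fractional) df degrees of freedom; in practice one would use a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` rather than hand-rolling this.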