Detection of Referable Horizontal Strabismus in Children's Primary Gaze Photographs Using Deep Learning.

Ce Zheng^1,2, Qian Yao³, Jiewei Lu⁴, Xiaolin Xie⁵, Shibin Lin⁵, Zilei Wang², Siyin Wang², Zhun Fan⁴, Tong Qiao².

Abstract

Purpose: To develop and evaluate deep learning (DL) algorithms that screen for referable horizontal strabismus in children's primary gaze photographs, using clinical assessments as the reference.
Methods: DL algorithms were developed and trained using primary gaze photographs from two tertiary hospitals of children with primary horizontal strabismus who underwent surgery as well as orthotropic children who underwent routine refractive tests. A total of 7026 images (3829 non-strabismus images from 3021 orthotropic [healthy] subjects and 3197 strabismus images from 2772 subjects) were used to develop the DL algorithms. The DL model was evaluated by 5-fold cross-validation and tested on an independent validation data set of 277 images. The diagnostic performance of the DL algorithm was assessed by calculating the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
Results: Using 5-fold cross-validation during training, the average AUCs of the DL models were approximately 0.99. In the external validation data set, the DL algorithm achieved an AUC of 0.99 with a sensitivity of 94.0% and a specificity of 99.3%. The DL algorithm's performance (with an accuracy of 0.95) in diagnosing referable horizontal strabismus was better than that of the resident ophthalmologists (with accuracy ranging from 0.81 to 0.85).
Conclusions: We developed and evaluated a DL model to automatically identify referable horizontal strabismus using primary gaze photographs. The diagnostic performance of the DL model is comparable to or better than that of ophthalmologists.
Translational Relevance: DL methods that automate the detection of referable horizontal strabismus can facilitate clinical assessment and screening for children at risk of strabismus.
Copyright 2021 The Authors.

Keywords: automated detection; deep learning; strabismus

Year: 2021 PMID: 33532144 PMCID: PMC7846951 DOI: 10.1167/tvst.10.1.33

Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283

Introduction

Strabismus is characterized as any binocular misalignment. It is traditionally considered an eye disorder affecting children, with a prevalence ranging from 0.8% to 6.0%. In children, the most common type of strabismus is horizontal strabismus, which includes esotropia and exotropia. Early detection and prompt management of strabismus in children are vital because they can improve long-term visual and sensorimotor outcomes. It has been reported that strabismus is the most common cause of childhood amblyopia. Children with apparent strabismus may also be subject to psychosocial sequelae, such as lower self-confidence and self-esteem and anxiety. Despite the importance of strabismus screening, in many countries, such as China, only a limited number of pediatric ophthalmologists are available to perform this task. Furthermore, clinical assessment of strabismus, such as with the Hirschberg and Krimsky tests, is labor-intensive and needs to be performed by experts. Choi et al. reported that even strabismus subspecialists underestimated or overestimated deviations by at least 10 prism diopters (PD) when applying the Hirschberg and Krimsky tests. Several authors have suggested automatically assessing strabismus using photographic and videographic methods. Using a stationary photographic apparatus, Barry et al. presented a modified “self-assessment” method with satisfactory accuracy compared to orthoptic measurements of the angle of strabismus. In another study, Yang et al. reported a computerized three-dimensional Strabismus Photo Analyzer whose results were comparable with those of the Hirschberg and Krimsky tests. More recently, Lim et al. suggested a “feature-engineering” method for measuring ocular versions using photographs of the nine cardinal positions. However, both manual and “feature-engineering” methods are labor-intensive and need to be specified by experts.
Therefore, automated detection of strabismus, mainly referable horizontal strabismus in children, is warranted and has potential benefits in the clinical setting. Deep learning (DL) methods, such as deep convolutional neural networks (DCNNs), have demonstrated a capability to automatically screen for many pediatric eye diseases, such as congenital cataract and retinopathy of prematurity, more accurately than traditional approaches. In our previous study, we presented a DCNN that achieved automated strabismus detection based on social media photographs. However, that DL model was not developed in a clinical scenario, and the screening performance of DL algorithms has not been compared with that of human experts using clinical assessments, such as the alternate prism cover test, as the gold standard. In this study, we present DL algorithms that screen for referable horizontal strabismus on primary gaze photographs taken during a clinical assessment. We also evaluated the algorithms in an external data set to assess the performance of the DL model.

Methods

Study Design

This study followed the tenets set forth in the Declaration of Helsinki, and it was approved by the institutional review boards (IRBs) of both the Shanghai Children's Hospital (SCH; identifier, 2018RY029-E01) and the Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong (JSIEC; identifier, EC 20198911(4)-P11). All images were de-identified according to the Health Insurance Portability and Accountability Act Safe Harbor provision prior to transfer to the study investigators. Informed consent was not required by the IRBs due to the retrospective nature of the study and the use of fully de-identified images.

Data Sets

In this cross-sectional study, a total of 7530 primary gaze photographs (3330 referable horizontal strabismus and 4200 orthotropic cases) were collected from the Department of Ophthalmology, SCH (referred to as the SCH data set) from 2013 to 2019. DL networks require images with visually noticeable features in order to work properly. Each image was therefore inspected by three senior pediatric ophthalmologists. If any ophthalmologist deemed an image non-gradable (strabismus could not be assessed visually), it was removed from the raw data set. We excluded the non-gradable images (371 [4.9%] normal and 133 [1.8%] strabismus), leaving a total data set of 7026 images (3829 normal images from 3021 orthotropic subjects and 3197 strabismus images from 2772 subjects) for DL algorithm development. The participants were children with primary horizontal strabismus who underwent surgery and orthotropic children who underwent routine refractive tests at SCH. All subjects underwent a standardized ophthalmic examination, including slit-lamp examination, Snellen visual acuity, cycloplegic refraction, and axial length measurement with the IOLMaster (Carl Zeiss Meditec, Dublin, CA, USA). The Hirschberg test was used to assess eye alignment by noting the location of the corneal light reflex within the pupil. The angle of deviation was measured by an alternate prism cover test at near (0.33 m) and at distance (6 m). Referable horizontal strabismus was defined as follows: constant infantile esotropia (≥ 40 PD), residual esotropia of accommodative esotropia (> 10 PD) with full hypermetropic correction, and intermittent or infantile exotropia (> 15 PD) with a manifestation during more than 50% of waking hours after full hypermetropic correction. Exclusion criteria included children with restrictive strabismus, sensory strabismus, paralytic strabismus, myasthenia gravis, nystagmus, and Duane syndrome.
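The referral definition above can be read as a small decision rule. The following Python sketch is one possible reading and is not code from the study: the `Deviation` record, its field names, and the category strings are hypothetical, and the assumption that the exotropia criterion also requires full hypermetropic correction is an interpretation of the sentence above.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    """Hypothetical record of one child's alignment findings (illustrative only)."""
    kind: str                       # "infantile_esotropia", "accommodative_esotropia", or "exotropia"
    prism_diopters: float           # angle from the alternate prism cover test, in PD
    fully_corrected: bool = False   # measured with full hypermetropic correction
    manifest_fraction: float = 0.0  # share of waking hours with a manifest deviation


def is_referable(d: Deviation) -> bool:
    """Apply the three thresholds quoted in the text (assumed reading)."""
    if d.kind == "infantile_esotropia":
        # Constant infantile esotropia of at least 40 PD.
        return d.prism_diopters >= 40
    if d.kind == "accommodative_esotropia":
        # Residual esotropia above 10 PD despite full hypermetropic correction.
        return d.fully_corrected and d.prism_diopters > 10
    if d.kind == "exotropia":
        # Intermittent or infantile exotropia above 15 PD, manifest for more
        # than half of waking hours after full hypermetropic correction.
        return (d.fully_corrected
                and d.prism_diopters > 15
                and d.manifest_fraction > 0.5)
    return False


example = Deviation("exotropia", 20.0, fully_corrected=True, manifest_fraction=0.6)
print(is_referable(example))
```

In the study the label came from the clinical examination itself, so a rule like this would only serve to make the labeling criteria explicit, for example when auditing a label file.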
All photographs were obtained at a distance of 1 m from the subject using a commercially available camera (D800; Nikon Inc., Tokyo, Japan) with a pen torch attached to the camera. Only primary gaze photographs were collected for the data set.

Experimental Setup

The data set was randomly split into a training set (80%) and a validation set (20%). The split was person-disjoint to prevent bias, meaning that no subjects were shared between the training and validation data sets. As our data set is not large, we used fivefold cross-validation to train and evaluate the DL model. In cross-validation, the entire data set was split into five groups; four groups were used as training data and one group was used for validation. The training process was repeated five times so that each subset was used exactly once as the validation data set. We implemented the DL algorithms with the TensorFlow framework (Google, version 2.1.0). The models were trained on an Ubuntu 16.04 operating system with an Intel Xeon E5-2690 CPU, 128 GB RAM, and an NVIDIA Titan Xp 12 GB GPU. The algorithm's performance was measured by diagnostic parameters, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The DL algorithm's performance in image classification was compared with that of three professional screeners on an external validation data set. All statistical analyses were carried out using the Python programming language (version 3.5.1; Python Software Foundation, Beaverton, OR).
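The person-disjoint fivefold split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the round-robin assignment of shuffled subjects to folds, and the toy subject identifiers are all hypothetical.

```python
import random


def person_disjoint_folds(subject_ids, n_folds=5, seed=0):
    """Return one fold index per image so that all images of a subject share a fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Round-robin assignment: folds are balanced by subject count, so the
    # number of images per fold is only approximately equal.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    return [fold_of_subject[s] for s in subject_ids]


def cross_validation_splits(subject_ids, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices); each fold validates exactly once."""
    folds = person_disjoint_folds(subject_ids, n_folds, seed)
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        val = [i for i, f in enumerate(folds) if f == k]
        yield train, val


# Toy example: 12 images from 8 subjects (identifiers are made up).
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3",
               "s4", "s5", "s6", "s6", "s7", "s8"]
splits = list(cross_validation_splits(subject_ids))
for train, val in splits:
    train_subjects = {subject_ids[i] for i in train}
    val_subjects = {subject_ids[i] for i in val}
    print(len(train), len(val), train_subjects.isdisjoint(val_subjects))
```

Splitting by subject rather than by image matters here because the data set contains more images than subjects, so an image-level split could place photographs of the same child on both sides of a fold.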

Evaluation

To evaluate the performance of the DL model outside of our in-house dataset, we also collected an external test set from the Department of Pediatrics and Strabismus, JSIEC (referred to as the JSIEC validation dataset) from 2018 to 2019 using the same study protocol as described above (Fig. 1). The JSIEC validation dataset included 277 primary gaze photographs with 133 referable horizontal strabismus and 144 normal cases.

Figure 1.

STARD diagrams of image datasets collection and preprocessing for detecting referable horizontal strabismus using deep learning.

The Architecture of the DL Algorithms

In the current study, we adopted a two-stage DL algorithm (Fig. 2) that first detects and de-identifies the primary gaze region and then classifies it as referable horizontal strabismus or orthotropic. To detect regions of interest (ROIs) and to satisfy the Privacy Rule's de-identification standard, the ROIs (primary gaze photographs without full-face information) were first automatically localized and cropped using a faster region-based convolutional neural network (Faster R-CNN). Faster R-CNN is a target detection algorithm built on region proposals and a region-based CNN. It aims to obtain bounding boxes of the targets with location and size information. The primary gaze photographs generated by Faster R-CNN were manually checked, and corrections (adjusting the two eyes into a horizontal position) were made where needed. As the ROI is rectangular, we padded it to avoid changing the aspect ratio. The second stage was implemented by performing transfer learning based on three DCNN architectures, which were pretrained on ImageNet (1000 object categories with more than 1 million images). The applied DCNNs were VGG16 (Visual Geometry Group, Department of Engineering Science, University of Oxford), Inception-V3 (Google Inc.), and Xception (Google Inc.). Image pixels were rescaled to values in a range of 0 through 1 and interpolated to fill a 299 × 299 matrix to match the pretrained networks. Data augmentation was performed to enhance the data set, including random horizontal flipping and adjustments of the saturation, brightness, and contrast. For training, we used the Adam optimizer with a learning rate of 0.0001 and a minibatch size of 32. Early stopping was applied when the validation loss did not decrease for 10 epochs. To visualize what the DL algorithms learned, we used class activation maps (CAMs) to highlight the discriminative image regions on which the algorithms relied.
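Two of the preprocessing steps above, padding the rectangular ROI and rescaling pixel values to the range 0 through 1, can be illustrated with a short pure-Python sketch. The text does not say how the padding was placed or which fill value was used, so the centered placement, the zero fill, the single-channel image, and the function names below are assumptions made only for illustration. Interpolation to 299 × 299 is omitted and would normally be done with an image library.

```python
def pad_to_square(image, fill=0):
    """Pad a 2-D image (list of rows) to a square, keeping the content centered.

    Padding instead of stretching leaves the aspect ratio of the ROI unchanged.
    """
    height, width = len(image), len(image[0])
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    padded = [[fill] * side for _ in range(side)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            padded[top + r][left + c] = value
    return padded


def rescale_to_unit(image, max_value=255):
    """Map integer pixel values in [0, max_value] to floats in [0, 1]."""
    return [[value / max_value for value in row] for row in image]


# Toy 2 x 4 single-channel ROI; real ROIs are color photographs.
roi = [[255, 128, 0, 64],
       [32, 16, 8, 4]]
square = pad_to_square(roi)
unit = rescale_to_unit(square)
print(len(square), len(square[0]))
```

Because the eye region is much wider than it is tall, resizing it directly to a square input would compress the horizontal axis, which is the axis along which a horizontal deviation shows up; padding avoids that distortion.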

Figure 2.

Diagram showing an overview of the proposed two-stage deep learning (DL) algorithm for automated detection of referable horizontal strabismus. The first-stage algorithm is a faster region-based convolutional neural network (Faster R-CNN) that localizes and crops the ROIs (primary gaze photographs without full-face information). The second-stage algorithm is a pretrained Inception-V3 network that automatically detects referable horizontal strabismus.

Comparison of the DL Algorithms with Human Experts

We used an external validation data set to compare the DL algorithm's referral decisions with the decisions made by human experts. Three resident ophthalmologists (X.L.X., H.Y., and Q.Y.) with at least 3 years of clinical experience in pediatric ophthalmology and strabismus, who were blinded to the data set collection procedures, were instructed to grade each testing image independently. As in the Hirschberg test described above, they were asked to identify referable horizontal strabismus by detecting displacement of the first Purkinje image (the corneal light reflex) from the center of the patient's pupil.

Results

After processing by Faster R-CNN, only ROIs with de-identified primary gaze photographs were cropped. The proportion of labels in the training and validation data sets is described in Table 1. A training course was performed before grading. The three resident ophthalmologists achieved an unweighted kappa value of 0.772 in a test set (20 normal and 20 referable horizontal strabismus photographs, respectively).

Table 1.

Summary of SCH and JSIEC Data Set for Development and Validation of Deep Learning Models

	SCH Training Data Set	SCH Validation Data Set	JSIEC External Validation Set
Referable horizontal strabismus	2558	639	133
Orthotropic	3064	765	144
Total	5622	1404	277

All three DL models achieved excellent performance in detecting referable horizontal strabismus in primary gaze photographs. Using 5-fold cross-validation during training, the average AUCs of the DL models were 0.993 (95% confidence interval [CI] = 0.989–1.000) with the Inception-V3 model, 0.993 (95% CI = 0.989–1.000) with the VGG16 model (Supplementary Figs. 7, 8), and 0.991 (95% CI = 0.986–1.000) with the Xception model (Supplementary Figs. 9, 10). Considering the similar results, we used only the Inception-V3 model for further experiments in the current study. Figures 3 and 4 show the training performance, the mean AUCs, and the confusion matrix of the DL model using 5-fold cross-validation during training.
In the JSIEC validation dataset, an accuracy of 0.95 (95% CI = 0.92–0.97) with a sensitivity of 0.94 (95% CI = 0.92–0.97) and a specificity of 0.99 (95% CI = 0.98–1.00) was obtained for the classification of referable horizontal strabismus by the DL algorithm (Table 2). The diagnostic performance of the DL algorithm compared with the resident ophthalmologists is shown in Table 2 and Figure 5. The DL algorithm outperformed any ophthalmologist whose sensitivity and specificity point falls below the ROC curve of the DL algorithm (see Fig. 5).
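For readers who want to reproduce this kind of summary, the sketch below computes accuracy, sensitivity, and specificity from a confusion matrix, together with a normal-approximation (Wald) 95% confidence interval. The text does not state how the intervals were obtained, so the Wald formula is an assumption, and the counts are hypothetical values chosen for a 277-image set with 133 strabismus and 144 normal cases; they are not reported in the paper.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of strabismus images flagged
    specificity = tn / (tn + fp)   # share of normal images passed
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def wald_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p observed on n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical grader: 109 of 133 strabismus and 122 of 144 normal images correct.
tp, fn, tn, fp = 109, 24, 122, 22
accuracy, sensitivity, specificity = diagnostic_metrics(tp, fn, tn, fp)
low, high = wald_interval(accuracy, tp + fn + tn + fp)
print(f"accuracy {accuracy:.3f} ({low:.3f} to {high:.3f})")
print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
```

The Wald interval is simple but can be too narrow or can cross 1 when a proportion is close to 1, which is why the sketch clips it to the range 0 to 1; an exact or Wilson interval is a common alternative in that situation.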

Figure 3.

Plot showing the performance of the DL model. (A) The training accuracy and loss are plotted against epochs in the fivefold cross-validation. (B) Validation accuracy and loss are plotted against epochs in the fivefold cross-validation.

Figure 4.

Mean receiver operating characteristic curves (A) and mean confusion matrix (B) of the DL model for detecting referable horizontal strabismus. The performance was evaluated using fivefold cross-validation.

Table 2.

The Diagnostic Performance of the DL Model and Human Graders in the JSIEC Validation Data Set

	Accuracy (95% CI)	Specificity (95% CI)	Sensitivity (95% CI)
Deep learning models	0.968 (0.947 to 0.989)	0.993 (0.983 to 1.000)	0.940 (0.919 to 0.968)
Human graders
Ophthalmologist #1	0.834 (0.790 to 0.878)	0.847 (0.805 to 0.889)	0.820 (0.775 to 0.865)
Ophthalmologist #2	0.848 (0.806 to 0.890)	0.868 (0.828 to 0.908)	0.827 (0.782 to 0.872)
Ophthalmologist #3	0.809 (0.763 to 0.855)	0.778 (0.729 to 0.827)	0.842 (0.799 to 0.885)

95% CI, 95% confidence interval.

Figure 5.

Performance of the DL model and ophthalmologists for detecting referable horizontal strabismus in the JSIEC validation dataset (A). Confusion matrix of the DL model in the same testing dataset (B).

Figures 6A and 6B present a referable horizontal strabismus (left exotropia) photograph and the same photograph superimposed with its corresponding CAM (see Fig. 6B) created by the DCNN. The DL algorithms accurately identified the left eye areas in the photographs. Figures 6C and 6D demonstrate a normal case classified by the DCNN. This child has epicanthus (Fig. 6C), which is a common cause of pseudostrabismus. Figures 6E and 6F show a failure case misclassified by the DCNN. There are vague reflection areas in the corneal area (see Fig. 6F). The reasons for misclassification in all data sets are provided in Table 3. The most common reasons for misclassification were off-center eyes (the child's eye cannot be centrally displayed due to a head tilt) and poor image quality (images with weak corneal light reflection).

Figure 6.

Examples of primary gaze photographs. (A, B) A case of referable horizontal strabismus (left exotropia) superimposed with its corresponding CAM created by the DCNN. The DL algorithms accurately identified the left eye areas in the photographs. (C, D) A normal case classified by the DCNN. This child has epicanthus, which is a common cause of pseudostrabismus. (E, F) A failure case misclassified by the DCNN. There are vague reflection areas in the corneal region.

Table 3.

The Proportion of Reasons for Misclassification by the Deep Learning Model in both SCH and JSIEC Data Sets

Reason	No. (%)
Off-center of child's eye	180 (49.1)
Poor image quality	134 (36.5)
Others	53 (14.4)
Total	367 (100)

Discussion

The traditional clinical assessment of strabismus, such as with the Hirschberg and Krimsky tests, is labor-intensive and needs to be performed by experts. Recently, we proposed a DL model to automate the detection of strabismus using social media photographs. Despite the promising results, the previous DL model could not be substituted for ophthalmologists because the study subjects were not assessed by the Hirschberg and Krimsky tests, which are the gold standard for measuring binocular alignment. In this study, using clinical evaluation as the reference, we developed and validated a DL model for automated screening of referable horizontal strabismus in children. We further validated our DL model in an external data set, and its performance was better than that of the human experts. To our knowledge, this is the first DL model for the screening of strabismus in children. Therefore, this study may constitute a useful baseline for applying DL to assist in detecting strabismus in the future. Accurate, consistent assessment of binocular misalignment is particularly crucial for detecting strabismus. Compared with previous studies, our study had a unique advantage: it relies on entirely end-to-end learning instead of recognizing an output category from manual or hand-engineered features. In medical image analysis, DL has achieved excellent performance, exceeding traditional diagnosis by human experts in conditions including diabetic retinopathy, age-related macular degeneration, and glaucoma. Our DL algorithm also demonstrated high accuracy, with diagnostic performance better than that of resident ophthalmologists. Moreover, we further evaluated it using an external data set collected at a different hospital. Our DL model still achieved high performance, with an AUC of 0.997, 94% sensitivity, and 99.3% specificity.
In the present study, nearly half of the misclassified cases (49.1%) were off-center (the child's eye cannot be centrally displayed due to a head tilt). The second most common reason for misclassification was poor image quality (images with a weak corneal light reflex). Our method can be considered a reliable tool for referable horizontal strabismus screening that does not rely on expert input. This advantage offers our DL model an opportunity to be applied in many scenarios, such as telescreening programs, in a cost-effective and time-efficient manner.
Our study has some limitations. First, the potential for selection bias exists because most of the patients with strabismus were children who underwent surgery. Therefore, the screening performance of the DL model may be less accurate across the wider range of angular deviations, for example in intermittent exotropia or nystagmus. Second, our DL model was developed only among children of Chinese ethnicity; whether these results can be generalized to other populations and ages remains to be seen. Third, the patients did not wear spectacles during photography. The angle of strabismus might change with spectacle wear; thus, our DL model may not achieve good performance if glasses are prescribed. Finally, DL networks require images with visually noticeable features to work correctly. We excluded 504 (6.7%) images from the raw data set, as ophthalmologists cannot visually detect strabismus from poor-quality images. With advances in hardware and software, it is possible to design a platform that captures good-quality images and improves the performance of a strabismus screening program.
In conclusion, we demonstrated an automated DL model that can achieve high accuracy in distinguishing referable horizontal strabismus from orthotropia based on primary gaze photographs, using clinical assessment as a reference. Additional studies to determine the generalizability of the DL model, as well as its usefulness and potential cost savings in the clinical setting, seem to be warranted.

References (10 of 23 shown)

1. Clinical measurement of the angle of ocular movements in the nine cardinal positions of gaze.

Authors: Han Woong Lim; Dong Eik Lee; Jung Wook Lee; Min Ho Kang; Mincheol Seong; Hee Yoon Cho; Jae-Eung Oh; Sei Yeul Oh
Journal: Ophthalmology Date: 2014-01-10 Impact factor: 12.079

2. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Authors: Shaoqing Ren; Kaiming He; Ross Girshick; Jian Sun
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2016-06-06 Impact factor: 6.226

3. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors: Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal: JAMA Date: 2016-12-13 Impact factor: 56.272

4. Computational principles in Purkinje I and IV reflection pattern evaluation for the assessment of ocular alignment.

Authors: J C Barry; R Effert; M Reim; D Meyer-Ebrecht
Journal: Invest Ophthalmol Vis Sci Date: 1994-12 Impact factor: 4.799

5. The psychosocial benefits of corrective surgery for adults with strabismus.

Authors: S Jackson; R A Harrad; M Morris; N Rumsey
Journal: Br J Ophthalmol Date: 2006-07 Impact factor: 4.638

6. Prevalence of amblyopia and strabismus in young singaporean chinese children.

Authors: Audrey Chia; Mohamed Dirani; Yiong-Huak Chan; Gus Gazzard; Kah-Guan Au Eong; Prabakaran Selvaraj; Yvonne Ling; Boon-Long Quah; Terri L Young; Paul Mitchell; Rohit Varma; Tien-Yin Wong; Seang-Mei Saw
Journal: Invest Ophthalmol Vis Sci Date: 2010-03-05 Impact factor: 4.799

7. Prevalence of amblyopia and strabismus in white and African American children aged 6 through 71 months the Baltimore Pediatric Eye Disease Study.

Authors: David S Friedman; Michael X Repka; Joanne Katz; Lydia Giordano; Josephine Ibironke; Patricia Hawse; James M Tielsch
Journal: Ophthalmology Date: 2009-09-16 Impact factor: 12.079

8. Strabismus, Strabismus Surgery, and Reoperation Rate in the United States: Analysis from the IRIS Registry.

Authors: Michael X Repka; Flora Lum; Bhavya Burugapalli
Journal: Ophthalmology Date: 2018-05-18 Impact factor: 12.079

9. Self-assessment of angles of strabismus with photographic Purkinje I and IV reflection pattern evaluation.

Authors: R Effert; J C Barry; R Colberg; A Kaupp; G Scherer
Journal: Graefes Arch Clin Exp Ophthalmol Date: 1995-08 Impact factor: 3.117

10. The accuracy of experienced strabismologists using the Hirschberg and Krimsky tests.

Authors: R Y Choi; B J Kushner
Journal: Ophthalmology Date: 1998-07 Impact factor: 12.079
