Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology.

Babak Alipanahi¹, Farhad Hormozdiari², Babak Behsaz², Justin Cosentino³, Zachary R McCaw³, Emanuel Schorsch³, D Sculley², Elizabeth H Dorfman³, Paul J Foster⁴, Lily H Peng³, Sonia Phene³, Naama Hammel³, Andrew Carroll³, Anthony P Khawaja⁵, Cory Y McLean⁶.

Abstract

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10−8) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci are near genes involved in neuronal and synaptic biology or near genes harboring variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.

Keywords: GWAS, phenotyping, machine learning, glaucoma

Year: 2021 PMID: 34077760 PMCID: PMC8322934 DOI: 10.1016/j.ajhg.2021.05.004

Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025

Introduction

Genome-wide association studies (GWASs) require accurate phenotyping of large cohorts, but expert phenotyping can be costly and time intensive. On the other hand, self-reported phenotyping, while cost-effective and often insightful, can be inaccurate for nuanced phenotypes such as osteoarthritis or infeasible to obtain for complex quantitative phenotypes. Population-scale biobanks, such as the UK Biobank (UKB) and Biobank Japan, that contain genomics, biomedical data, and health records for hundreds of thousands of individuals provide opportunities to study complex disorders and traits. GWASs of individual blood- and urine-based biomarkers, which can be assayed accurately with high throughput, have shed light on disease etiology. Advances in deep learning have enabled the extraction of medically relevant features from high-dimensional data, such as using cardiac magnetic resonance imaging to infer cardiac and aortic dimensions, color fundus photographs to detect glaucoma risk, and optical coherence tomography images to predict age-related macular degeneration progression. Using medically relevant features extracted from biobank data by machine learning (ML) models as GWAS phenotypes provides an opportunity to identify genetic signals influencing these traits. For example, Glastonbury et al. trained an ML model to predict mean adipocyte areas from histology images and used the predictions to perform a GWAS, doubling the cohort size in comparison to similar studies.

Here, we propose training an ML model to automatically phenotype a large cohort for genomic discovery. The proposed paradigm has two phases: in the “model training” phase, a database of expert-labeled samples (for which genomics data are not required) is used to train and validate a phenotype prediction model (Figure 1A); in the “model application” phase, the model is applied to biobank data to predict phenotypes of interest, which are then analyzed for genomic associations (Figure 1B).
This paradigm has several advantages. First, model application is scalable and efficient. Second, a single model can predict multiple phenotypes simultaneously. Third, the model can be applied retrospectively to existing data, resulting in new phenotypes or more accurate predictions for the existing phenotypes. Fourth, multiple lines of evidence can be integrated to predict a single phenotype, which would be prohibitively expensive if performed manually.

Figure 1

ML-based phenotyping concept and its application to VCDR

(A) “Model training” phase in which a phenotype prediction model is trained with expert-labeled data.

(B) “Model application” phase in which the validated phenotype prediction model is applied to new, unlabeled data followed by genomic discovery.

(C) Definition of vertical cup-to-disc ratio (VCDR) in a real fundus image.

(D) Schematic of the multi-task ensemble model used in phenotype prediction.

(E–H) Scatterplots of the ML-based VCDR versus expert-labeled VCDR values for the train (E), tune (F), test (G), and UK Biobank (H) datasets. Number of grades per image is shown in parentheses.

As a proof of concept, we investigate predicting glaucoma-related features from fundus images and performing genomic discovery on the predicted features. Glaucoma is an optic neuropathy that results from progressive retinal ganglion cell degeneration and is the leading cause of irreversible blindness globally, affecting more than 80 million people. Moreover, glaucoma is one of the most heritable common human diseases, with heritability estimates of 70%, and there is evidence for effective genomic risk prediction. The hallmark diagnostic feature of glaucoma is optic disc cupping. The vertical cup-to-disc ratio (VCDR; Figure 1C), a quantitative indicator for optic nerve head morphology and a frequently reported quantitative measure of cupping, is an important endophenotype of glaucoma.¹⁸⁻²¹ With the advent of very large biobank studies and routine retinal imaging in community optometric practices, there is huge potential for furthering our understanding of glaucoma through population-level analysis of VCDR. However, human grading of optic disc images to ascertain VCDR is costly and extremely resource intensive at large scale because it requires expert knowledge and deciphering the optic cup margin is challenging.
Here, we developed an ML model using 81,830 non-UKB, ophthalmologist-labeled fundus images to predict image gradability, VCDR, and referable glaucoma risk. We used the model to predict VCDR in 65,680 UKB participants of European ancestry from 175,337 fundus images. We then performed a GWAS on the ML-based VCDR phenotype (hereafter, “ML-based GWAS”) and compared the results to prior VCDR GWASs, including a recent VCDR GWAS using phenotypes derived from expert-labeled UKB fundus images. We show that ML-based phenotypes are accurate and substantially more efficient to obtain than expert-phenotyped VCDR measurements, identify novel genetic associations with plausible links to known VCDR biology, and produce more accurate polygenic risk scores for predicting VCDR in an independent population.

Methods

Model training and validation

We followed the procedure described previously by Phene et al., modified only to exclude all UKB images. Briefly, we used 81,830 color fundus images from AREDS, EyePACS, and Inoveon (see web resources) in the United States and from two eye hospitals in India (Narayana Nethralaya and Sankara Nethralaya). Ethics review and institutional review board exemption were obtained via Quorum Review Institutional Review Board. We trained ten independent multi-task Inception v3 deep convolutional neural networks on the fundus images, initializing the convolutional layers with weights pre-trained on the ImageNet dataset. Each of the ten models used a different random seed, which changes the ordering of the training data and the selection of mini-batches, the random initialization of the last layers of the neural network, and the random image augmentation and dropout patterns. Furthermore, we performed image augmentation and used early stopping based on the mean squared error (MSE) of VCDR prediction on the tune dataset to select the best model. The final prediction is the average of the predictions of the ten models in the ensemble.
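The ensembling step can be illustrated with a minimal sketch. It assumes each trained model has already produced one VCDR prediction per image; the function name and the toy numbers are hypothetical and are not taken from the study's code.

```python
from statistics import fmean


def ensemble_predict(member_predictions):
    """Average per-image predictions across ensemble members.

    member_predictions: list of lists, one inner list per model, each
    holding that model's VCDR prediction for every image (same order).
    """
    n_images = len(member_predictions[0])
    if any(len(p) != n_images for p in member_predictions):
        raise ValueError("all members must score the same images")
    # zip(*...) regroups values so each tuple holds one image's predictions.
    return [fmean(per_image) for per_image in zip(*member_predictions)]


# Toy example: three hypothetical members scoring two images.
toy_predictions = [[0.30, 0.60], [0.34, 0.58], [0.32, 0.62]]
averaged = ensemble_predict(toy_predictions)
```

Averaging members that differ only in their random seed is a simple way to reduce the variance of the final prediction.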

Phenotype calling in the UK Biobank cohort

We included UKB participants with color fundus images. After making predictions for 175,337 images, 21,400 were predicted to be ungradable and were removed. Individual-level VCDR values were computed as the average per-eye VCDR within a single visit, with preference for the initial visit (supplemental information).
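A minimal sketch of this phenotype-calling logic follows. The 0.7 gradability cutoff is the one reported in the results; the record layout, the function name, and the choice to average multiple images of the same eye are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean

GRADABILITY_CUTOFF = 0.7  # images scoring below this are treated as ungradable


def call_vcdr(images):
    """Return {participant: VCDR} from (participant, visit, eye, gradability, vcdr) records."""
    by_person = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for pid, visit, eye, gradability, vcdr in images:
        if gradability < GRADABILITY_CUTOFF:
            continue  # drop ungradable images before any averaging
        by_person[pid][visit][eye].append(vcdr)

    phenotypes = {}
    for pid, visits in by_person.items():
        first_visit = min(visits)  # prefer the earliest visit with usable images
        per_eye = [fmean(values) for values in visits[first_visit].values()]
        phenotypes[pid] = fmean(per_eye)  # average across the available eyes
    return phenotypes


toy_images = [
    ("a", 1, "L", 0.95, 0.30),
    ("a", 1, "R", 0.92, 0.40),
    ("a", 2, "L", 0.99, 0.50),  # ignored: visit 1 already has usable images
    ("b", 1, "L", 0.10, 0.70),  # ungradable, removed
    ("b", 2, "R", 0.85, 0.45),
]
toy_phenotypes = call_vcdr(toy_images)
```

Participant "b" falls back to the second visit because the only first-visit image is ungradable.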

Genome-wide association study

We used BOLT-LMM v.2.3.4 to examine associations between genotype and ML-based VCDR in European individuals in UKB, using the --lmm parameter to compute the Bayesian mixed model statistics. We used all genotyped variants with minor allele frequency > 0.001 to perform model fitting and heritability estimation. We applied a rank-based inverse normal transformation (INT) to the ML-based VCDR phenotype to increase the power for association discovery. Finally, in our association study, we used sex, age at visit, visit number (i.e., 1 or 2 to indicate visit 1 or visit 2), number of eyes used to compute VCDR, genotyping array indicator, refractive error, average gradability scores of all fundus images included for each participant, and the top 15 genetic principal components as covariates.
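The rank-based inverse normal transformation can be sketched as follows. The text does not state which rank offset was used, so the Blom offset (3/8) below is an assumption, as are the function name and the averaging of tied ranks.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Map values to standard normal quantiles based on their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank within the tie
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


toy_vcdr = [0.2, 0.5, 0.3]
toy_int = inverse_normal_transform(toy_vcdr)
```

The transformed phenotype depends only on the ordering of the inputs, which makes the association test robust to skew and outliers in the predicted VCDR values.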

Detecting independent genome-wide significant loci

Genome-wide significant (GWS; p ≤ 5 × 10−8) lead SNPs, independent at R2 = 0.1, were identified via PLINK’s --clump command (see web resources). The reference panel for linkage disequilibrium (LD) calculation contained 10,000 unrelated subjects of European ancestry from the UKB. Loci were formed around lead SNPs on the basis of the span of reference panel SNPs in LD with the lead SNPs at R2 ≥ 0.1. Loci separated by fewer than 250 kb were subsequently merged.
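The final merging step amounts to interval merging with a distance tolerance. The sketch below assumes each locus is a (chromosome, start, end) tuple in base pairs; the representation and function name are illustrative, and the clumping itself (done with PLINK in the study) is not reproduced.

```python
MERGE_DISTANCE_BP = 250_000  # loci closer than this are merged


def merge_loci(loci, max_gap=MERGE_DISTANCE_BP):
    """Merge same-chromosome loci whose gap is smaller than max_gap."""
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


toy_loci = [
    ("1", 100_000, 200_000),
    ("1", 400_000, 500_000),  # 200 kb from the previous locus: merged
    ("1", 800_000, 900_000),  # 300 kb from the merged locus: kept separate
    ("2", 150_000, 160_000),
]
toy_merged = merge_loci(toy_loci)
```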

SNP-heritability estimates for ML-based VCDR

We computed the SNP heritability for ML-based VCDR by applying stratified LD score regression on the VCDR GWAS summary statistics while using the 75 baseline LD annotations provided by S-LDSC authors (see web resources).

Replication of existing loci

Loci for ML-based VCDR and comparator studies were formed as described above, using the common reference panel of 10,000 randomly selected unrelated subjects from the UKB. Replication was assessed via the proportion of ML-based VCDR loci that overlapped with comparators and the proportion of comparator loci that overlapped with the ML-based VCDR loci. Thus, replication required that both studies had a GWS variant within a common genomic region, although not necessarily the same variant. Loci reaching GWS in the ML-based VCDR but not identified in any comparator GWASs of VCDR analyzed here are hereafter referred to as “novel loci.”
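The overlap-based definitions of replication and novelty can be sketched as below, again assuming (chromosome, start, end) tuples. The helper names and toy coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) loci share at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_replicated(comparator_loci, discovery_loci):
    """Number of comparator loci overlapping at least one discovery locus."""
    return sum(
        any(overlaps(c, d) for d in discovery_loci) for c in comparator_loci
    )


def novel_loci(discovery_loci, comparator_sets):
    """Discovery loci that overlap no locus in any comparator study."""
    return [
        d
        for d in discovery_loci
        if not any(overlaps(d, c) for loci in comparator_sets for c in loci)
    ]


toy_discovery = [("1", 100, 500), ("2", 1000, 2000), ("3", 50, 60)]
toy_comparator = [("1", 400, 900), ("2", 3000, 4000)]
toy_replicated = count_replicated(toy_comparator, toy_discovery)
toy_novel = novel_loci(toy_discovery, [toy_comparator])
```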

Mendelian randomization and mediation analyses

We performed two-sample Mendelian randomization analysis, implemented via TwoSampleMR (see web resources), to examine the causal association between intraocular pressure (IOP), as assessed by Khawaja et al., and ML-based VCDR. Per-SNP associations were meta-analyzed via Egger regression. We performed mediation analysis to estimate the association between ML-based VCDR and glaucoma, as assessed by Gharahkhani et al. Mendelian randomization is in fact a special case of mediation analysis in which the instrumental variables (here, SNPs) have no effect on the outcome (here, glaucoma) other than through the mediator (here, ML-based VCDR). Our mediation analysis differs from Mendelian randomization in that, because of the limited availability of summary statistics from Gharahkhani et al., the SNP set was defined on the basis of association with the outcome (glaucoma) rather than the mediator (ML-based VCDR). Among the 118 independent, significant glaucoma SNPs identified by Gharahkhani et al., 116 remained after harmonizing with VCDR. To account for probable direct effects of the candidate SNPs on glaucoma odds, for example via IOP, we again meta-analyzed the per-SNP associations via Egger regression.
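The study used the TwoSampleMR package for this step. As a rough illustration of what Egger regression estimates, the following standard-library sketch fits a weighted line with an intercept to per-SNP effects. Orienting SNPs to positive exposure effects and using inverse-variance weights are common MR-Egger conventions assumed here, and standard errors are omitted for brevity.

```python
def egger_regression(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) of a weighted regression of outcome on exposure effects."""
    xs, ys = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        sign = -1.0 if bx < 0 else 1.0  # orient each SNP to a positive exposure effect
        xs.append(sign * bx)
        ys.append(sign * by)
    weights = [1.0 / se**2 for se in se_outcome]  # inverse-variance weights (assumed)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy effects constructed so that outcome = 0.01 + 0.5 * exposure after orientation.
toy_intercept, toy_slope = egger_regression(
    [0.1, -0.2, 0.3, 0.4], [0.06, -0.11, 0.16, 0.21], [0.01] * 4
)
```

A slope that differs from zero suggests a directional effect of the exposure, while an intercept that differs from zero points to effects of the SNPs on the outcome that bypass the exposure.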

VCDR polygenic risk score

We developed two polygenic risk scores (PRSs) by using the pruning and thresholding (P+T) and elastic net methods. The UKB test cohort was graded with the same guidelines used in grading other datasets used in this study. The HRT-derived VCDR was examined and, for participants with good quality scans in both eyes, the mean value of right and left eyes was considered, as previously described. Genotyping was carried out on the Affymetrix UK Biobank Axiom array, as previously described. In the P+T model, we used a set of variants common to the UKB and EPIC-Norfolk cohorts. EPIC-Norfolk’s imputation was performed with the HRC v.1 panel and excludes indels; thus, to harmonize the variants, we filtered out variants from Craig et al. and our ML-based GWAS not present in EPIC-Norfolk. This resulted in 58 of the 76 reported variants from the Craig et al. GWAS (i.e., 18 variants were dropped) and 282 of the 299 variants from our ML-based GWAS (i.e., 17 variants were dropped). In the elastic net model, we used the ML-predicted VCDR from the 62,969 UKB training samples as the target label. For Craig et al., we used 76 variants that included the 58 variants from the P+T model and 18 additional proxy variants that are in high LD (R2 ≥ 0.6) with the 18 variants dropped from the Craig et al. P+T model. The same set of 282 variants used in P+T was used for the ML-based model. We performed 5-fold cross validation and used the L1-penalty ratios of [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0].
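The harmonization and scoring steps of a P+T score can be sketched as follows. This shows only the variant filtering and the weighted sum of allele dosages; the pruning, the thresholding, and the elastic net fit are not reproduced, and all variant identifiers and weights are made up for illustration.

```python
def harmonize(weights, target_variants):
    """Keep only the GWAS effect sizes for variants present in the target cohort."""
    return {v: b for v, b in weights.items() if v in target_variants}


def polygenic_score(dosages, weights):
    """Sum of effect size times allele dosage over the shared variants."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical GWAS weights; "indelC" stands in for an indel missing from the target panel.
toy_weights = {"rsA": 0.02, "rsB": -0.01, "indelC": 0.03}
toy_target_variants = {"rsA", "rsB"}
toy_harmonized = harmonize(toy_weights, toy_target_variants)

# One participant's effect-allele dosages (0, 1, or 2 copies).
toy_score = polygenic_score({"rsA": 2, "rsB": 1}, toy_harmonized)
```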

Glaucoma liability conditional analysis

We defined glaucoma risk liability as the logit transform of the highest level of ML-based glaucoma probability (“likely glaucoma”; supplemental information) as

$$g = \log\left(\frac{p}{1 - p}\right),$$

where p and g denote ML-based glaucoma risk probability and liability, respectively. We performed conditional analysis on ML-based glaucoma risk liabilities by using BOLT-LMM conditional on ML-based VCDR. In this conditional analysis, we additionally adjusted for the same covariates used in the primary ML-based VCDR GWAS.

Glaucoma subtypes prediction in the EPIC-Norfolk cohort

We analyzed 5,868 participants from the EPIC-Norfolk Eye Study cohort who were genotyped via the Affymetrix UK Biobank Axiom array, met inclusion criteria and quality control, and had scanning laser ophthalmoscopy VCDR measurements (supplemental information). Included participants had a mean age of 68 years (SD = 7.7, range 48–90), 55% were women, and the mean VCDR was 0.34 (SD = 0.23). Of the 5,868 samples, 175 were classified as primary open-angle glaucoma (POAG) cases (see supplemental information for detailed POAG criteria), of which 98 were classified as high tension glaucoma (HTG; IOP > 21 mmHg) and 77 as normal tension glaucoma (NTG; IOP ≤ 21 mmHg) on the basis of the corneal-compensated IOP at the Eye Study assessment. Pre-treatment IOP was imputed by dividing by 0.7 for participants using glaucoma medication at the time of assessment, as previously described. We extracted age, sex, POAG status, NTG status, and HTG status from all 5,868 samples. We fitted independent logistic regression models to predict POAG, HTG, and NTG statuses by using VCDR PRS, age, and sex as predictors. We considered both the ML-based elastic net VCDR PRS and the Craig et al. elastic net PRS described above.
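The IOP handling described above can be sketched as below. The 0.7 divisor and the 21 mmHg threshold come from the text; applying the threshold to the imputed pre-treatment value is an assumption of this sketch, and the function names are hypothetical.

```python
MEDICATION_FACTOR = 0.7     # measured IOP is divided by this for treated participants
HTG_THRESHOLD_MMHG = 21.0   # IOP above this is classed as high tension glaucoma


def pretreatment_iop(measured_iop, on_medication):
    """Impute pre-treatment IOP for participants using glaucoma medication."""
    return measured_iop / MEDICATION_FACTOR if on_medication else measured_iop


def poag_subtype(measured_iop, on_medication):
    """Classify a POAG case as HTG or NTG (assumes the imputed IOP is used)."""
    iop = pretreatment_iop(measured_iop, on_medication)
    return "HTG" if iop > HTG_THRESHOLD_MMHG else "NTG"


treated_case = poag_subtype(16.8, on_medication=True)     # imputed IOP is about 24 mmHg
untreated_case = poag_subtype(18.0, on_medication=False)
boundary_case = poag_subtype(21.0, on_medication=False)   # exactly 21 mmHg counts as NTG
```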

Results

Overview of the ML-based phenotyping method

We used 81,830 fundus images graded by a panel of experts that passed our labeling guideline assessment (supplemental information) to train a phenotype prediction model that jointly predicts image gradability, VCDR, and referable glaucoma risk (Figure 1D). We split these images into “train,” “tune,” and “test” sets; training images were graded by one to two eye care providers with varied expertise, while images in the two latter sets were each graded by three glaucoma specialists. We benchmarked model performance on all data splits (Figures 1E–1G; Table S1). On the test set of 1,076 images, the model achieved a Pearson’s correlation of R = 0.91 between predicted and graded VCDR (95% confidence interval [CI] = 0.90–0.92) and a root mean square error (RMSE) of 0.079 (95% CI = 0.074–0.085). Additionally, we validated model generalizability on 2,115 UKB fundus images each graded by two to three experts (hereafter, “UKB test set”), on which the model achieved predictive performance similar to that on the test set (Figure 1H; R = 0.89, 95% CI = 0.88–0.90; RMSE = 0.092, 95% CI = 0.088–0.096; Table S1). We also validated that the model generalizes across ancestries in a larger set of 4,816 UKB fundus images with at least one manual grade (Figure S1).
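For reference, the two benchmark metrics can be computed as in this small standard-library sketch. The toy values are illustrative and unrelated to the study data; confidence intervals are not computed here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rmse(predicted, graded):
    """Root mean square error between predictions and reference grades."""
    n = len(predicted)
    return sqrt(sum((p - g) ** 2 for p, g in zip(predicted, graded)) / n)


toy_r = pearson_r([0.2, 0.4, 0.6], [0.25, 0.45, 0.65])
toy_rmse = rmse([0.3, 0.5], [0.4, 0.4])
```

The two metrics are complementary: a constant offset between predictions and grades leaves the correlation unchanged but increases the RMSE.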

ML-based GWAS replicates a manual phenotyping VCDR GWAS and discovers 93 additional novel loci

We applied the VCDR prediction model to the entire set of 175,337 UKB fundus images. Most images were either predicted to be easily gradable (predicted gradability > 0.9) or completely unusable (predicted gradability < 0.2) (Figure S2). We classified all 21,400 images with predicted gradability < 0.7 as “ungradable.” Manual inspection of 100 randomly selected ungradable images showed they were typically completely dark, bleached white, or extremely out of focus. After removing the 21,400 ungradable images, aggregating predicted VCDR values across left and right eyes and the first and second visits for each individual, subsetting the cohort to individuals of European ancestry, and performing cohort quality control, a cohort of 65,680 individuals with VCDR phenotype remained for further analysis (supplemental information, Figures S3 and S4). To control for confounding factors (e.g., population structure) and increase power, we added age at the time of visit, sex, average image gradability, number of fundus images used in VCDR calculation, normalized refractive error, genotyping array type, and the top 15 genetic principal components as covariates. We performed the ML-based GWAS by using BOLT-LMM (supplemental information). While genomic inflation λGC was 1.20 (Figure S5), the stratified LD score regression-based (S-LDSC) intercept was 1.06 (SEM = 0.02), indicating that most test statistic inflation can be attributed to polygenicity rather than population structure. The SNP-based heritability in the ML-based GWAS was 0.43 (SEM = 0.03), a majority of the 56% heritability estimated for VCDR by twin and family-based studies (Asefa et al., 2019). The ML-based GWAS identified 299 independent genome-wide significant (GWS) hits (R2 ≤ 0.1, p ≤ 5 × 10−8) at 156 independent GWS loci after merging hits within 250 kb together (Figure 2A, Tables S2 and S3).
Based on sum of single effects regression, the number of causal variants within the 156 independent GWS loci was conservatively estimated at 813 (supplemental information; Tables S4 and S5).
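The genomic inflation factor mentioned above is conventionally defined as the median observed chi-square association statistic divided by the median expected under the null (about 0.455 for one degree of freedom). The sketch below uses that standard definition with toy z-scores; it is not the study's pipeline.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Median observed chi-square statistic over its expected null median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


# Toy check: z-scores sitting exactly at the null median give a factor of 1,
# and scaling every chi-square statistic by 1.2 gives a factor of 1.2.
null_z = NormalDist().inv_cdf(0.75)
toy_lambda_null = genomic_inflation([null_z, -null_z, null_z])
toy_lambda_inflated = genomic_inflation([null_z * 1.2**0.5] * 5)
```

Because polygenic traits inflate this statistic even without confounding, it is read together with the LD score regression intercept, as in the paragraph above.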

Figure 2

ML-based VCDR GWAS results and comparison to known associations

(A) Manhattan plot depicting ML-based VCDR-associated GWAS p values from the BOLT-LMM analysis. There are 156 GWS (genome-wide significant) loci, representing 299 independent (R2 = 0.1) GWS hits. For each locus, the closest gene is shown. Blue gene names and dots indicate loci also identified in the Craig et al. study and red dots and black gene names indicate novel loci. The dashed red line denotes the GWS p value, 5 × 10−8.

(B) Venn diagram of loci overlap for three VCDR GWASs. ML-based GWAS replicates all 22 loci of the IGGC VCDR meta-analysis and 62 of 65 loci identified by Craig et al., while discovering 93 novel loci (supplemental information).

(C) Effect sizes for the 73 GWS hits shared by the Craig et al. and ML-based VCDR GWAS. The three Craig et al. hits not included failed the ML-based GWAS QC (rs61952219 for low imputation quality and rs7039467 and rs146055611 for violating Hardy-Weinberg equilibrium). Blue and red dots denote the SNP’s being more significant in the ML-based and Craig et al. GWAS, respectively. Error bars depict standard errors. The banding in Craig et al. effect sizes is due to large effect sizes’ being reported in multiples of 0.01. The blue line is the best fit line and the shaded area shows the 95% confidence interval.

To understand the influence of training dataset size on model performance and GWAS results, we retrained the ML model with as little as 10% of the full training set. Performance curves indicate that a model trained on fewer than 8,000 images achieved a Pearson’s correlation R = 0.83 (95% CI = 0.81–0.84) on the UKB test set, and the corresponding GWAS identified 131 GWS loci and replicated 123 of the 156 loci identified with the full model (Figures S6 and S7).
An analysis of the implications of phenotyping accuracy on genomic discovery suggested that the difference in power for the model trained with 10% of the training data and the model trained with all data would maximally reach 15% (Figure S8). Next, we compared the ML-based GWAS results with those from the two largest existing VCDR GWASs. First, we compared with the VCDR meta-analysis from the International Glaucoma Genetics Consortium (IGGC) in 23,899 Europeans for which all summary statistics are publicly available (see web resources). The ML-based GWAS replicated all 22 GWS loci and exhibited strong genetic correlation (0.95, SEM = 0.03, p = 2.1 × 10−167) with the IGGC GWAS (Figure 2B, Table 1), and effect size regression analysis showed a slope significantly different from zero (slope = 0.983, SEM = 0.041, p = 1 × 10−61) and indistinguishable from one (p = 0.67; Figure S9; supplemental information). Second, we compared with a GWAS on 67,040 manually phenotyped UKB fundus images for which only the independent genome-wide significant SNPs are publicly available. The ML-based GWAS replicated 62 out of 65 GWS loci with very similar estimated effect sizes (Figures 2B and 2C, Table 1) and more significant p values (Figure S10). The p values and effect sizes of the novel loci are shown in Figure S11. The three loci not replicated at the GWS level in the ML-based GWAS were all Bonferroni-replicated (adjusting for 65 tests), and p values ranged from 5.5 × 10−8 to 6.6 × 10−5. Third, we compared our results with a meta-analysis of the Craig et al. and IGGC VCDR GWASs. The ML-based GWAS replicated 82 of the 90 loci at GWS level, and the remaining eight loci were Bonferroni-replicated and had p values ranging from 1.4 × 10−7 to 6.6 × 10−5 (Table 1).
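The Bonferroni replication criterion can be checked with a short worked example. The p value range is the one quoted above for the three Craig et al. loci; the family-wise level of 0.05 is an assumption, since the text states only the number of tests.

```python
GWS_THRESHOLD = 5e-8


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for a family-wise level alpha (assumed 0.05)."""
    return alpha / n_tests


threshold_65 = bonferroni_threshold(65)  # roughly 7.7e-4

# Extremes of the p value range reported for the three loci that missed GWS.
reported_range = (5.5e-8, 6.6e-5)
misses_gws = all(p > GWS_THRESHOLD for p in reported_range)
bonferroni_replicated = all(p <= threshold_65 for p in reported_range)
```

Both extremes fall between the genome-wide threshold and the Bonferroni threshold, which is why these loci count as replicated without being genome-wide significant.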

Table 1

Replicated loci of ML-based VCDR GWASs and meta-analysis at GWS level

Study (phenotype)	Number of participants	Loci	Number of loci replicated in ML-based VCDR GWAS	Number of loci replicated in ML-based + IGGC VCDR GWAS	S-LDSC-based genetic correlation with ML-based VCDR
ML-based (VCDR)	65,680	156	–	151	–
ML-based 10% (VCDR)	65,044	131	123	125	0.99 (2.1 × 10⁻³)
ML-based + IGGC²⁰ (VCDR)	89,579	189	151	–	0.97 (2.6 × 10⁻³)
IGGC²⁰ (VCDR)	23,899	22	22	22	0.95 (0.03)
Craig et al.¹⁷ (VCDR)	67,040	65	62	63	N/A
Craig et al.¹⁷ + IGGC²⁰ (VCDR)	90,939	90	82	85	N/A
Khawaja et al.¹⁶ (IOP)	139,555	107	14	22	0.19 (0.02)
Gharahkhani et al.³² (POAG)	383,500	118	32	40	N/A

“ML-based 10% (VCDR)” denotes the GWAS performed on VCDR predictions of the ML model trained with only 10% of the training data. “ML-based + IGGC (VCDR)” denotes meta-analysis of ML-based and IGGC VCDR GWAS. Likewise, “Craig et al. + IGGC (VCDR)” denotes meta-analysis of Craig et al. VCDR and IGGC VCDR GWAS. Genetic correlation was only computed when the full set of summary statistics were available.

Finally, we performed a meta-analysis of our ML-based GWAS with the IGGC VCDR GWAS, which resulted in 189 GWS loci (supplemental information; Table 1 and Tables S6 and S7). This ML-based meta-analysis replicated 63 out of 65 of Craig et al.’s discovery GWAS and 85 out of 90 Craig et al.’s meta-analysis at GWS level (Table 1). Taken together, these comparisons demonstrate that the ML-based GWAS accurately identifies known VCDR associations and additionally identifies over 90 novel loci (Figure 2B, Table S8), substantially increasing our understanding of the genetic underpinnings of this complex trait. To assess the biological plausibility of the novel loci identified in the ML-based GWAS, we compared gene set enrichment analyses of the 156 ML-based loci to those of the 65 Craig et al. loci by using FUMA. Nine eye-related gene sets were significantly enriched in both sets of loci. The enrichment odds ratios (ML-based enrichment over Craig et al. enrichment) of all nine gene sets were greater than one, suggesting improved identification of functionally relevant pathways in the ML-based loci (Figure S12). To assess effects of distal cis-regulatory interactions, we also performed enrichment analyses of the 156 ML-based loci and the 65 Craig et al. loci by using GREAT. Consistent with the FUMA results, the ML-based loci were more significantly enriched than the Craig et al. loci across all tested ontologies (Figure S13).
The ML-based loci were significantly enriched for 22 gene sets, the majority of which are developmental and seven of which are eye related (Table S9). In contrast, the Craig et al. loci were significantly enriched for only three gene sets; two of these are eye-related sets that were also enriched in the ML-based results (Table S9). Lastly, we performed a phenome-wide association study (PheWAS) over all 299 independent GWS hits by using OpenTargets (web resources). OpenTargets reported 62,753 (variant, phenotype) pairs that were nominally significant (p ≤ 0.05); after Bonferroni correction, 974 pairs were significant (supplemental information). We observed that 314 of the 974 significant pairs belonged to the “anthropometric measurement” trait category, while the “eye measurement” category had 101 pairs (Table S10).

Biological significance of select novel VCDR-associated loci

Several of the VCDR-associated loci discovered in this study are known to be associated with intraocular pressure (IOP), including rs1361108 near CENPW, rs2570981 in SNCAIP, rs6999835 near PKIA, and rs351364 in WNT2B. This suggests that a proportion of the genetic variation in VCDR is mediated via IOP and pathophysiological processes affecting the anterior segment of the eye, consistent with IOP’s being a strong risk factor for glaucoma. Indeed, we observed that 13% (14 of 107) of the GWS loci from the latest IOP meta-analysis were GWS in the ML-based VCDR GWAS. In addition, the overall genetic correlation between our ML-based VCDR GWAS and the IOP GWAS meta-analysis is 0.19 (SEM = 0.02, p = 5.5 × 10−15), indicating that VCDR is partially explained by IOP. Moreover, a Mendelian randomization (MR) analysis followed by Egger regression suggests that IOP has a strong directional association with ML-based VCDR: the regression intercept does not differ significantly from zero (intercept = 0.001, SE = 0.002, p = 0.7), but the slope does (slope = 0.072, SE = 0.020, p = 4 × 10−4). The reverse analysis provided no evidence for a directional association between ML-based VCDR and IOP (supplemental information; Figure S14). VCDR is an objective quantification of the proportion of neuronal tissue at the head of the optic nerve (Figure 1C). Interestingly, several VCDR-associated loci discovered in this study encompass genes involved in neuronal and synaptic biology, and thus may influence VCDR via direct effects on the retina and optic nerve rather than via IOP. NCKIPSD (rs7633840) is involved in the formation and maintenance of dendritic spines, and modulates synaptic activity in neurons. CPLX4 (rs77759734) is required for the maintenance of synaptic ultrastructure in the adult retina. MARK2 (rs199826712) has roles in neuronal cell polarity and the regulation of neuronal migration. 
These loci complement additional neuronal loci also discovered by Craig et al.; some notable examples include MYO16 (rs10162202), TRIM71 (rs56131903), and FLRT2 (rs1289426). An increase in VCDR may be due not only to loss of retinal ganglion cell neurons but also loss of neural supporting tissue, such as glial cells. One of our novel VCDR-associated loci is an indel on chromosome 8 (chr8: 131,606,303_CTGTT_C), near ASAP1; this locus has been associated with glioma, suggesting glial cells as potential mediators of the VCDR association. Several genes at the novel VCDR-associated loci harbor mutations that cause severe Mendelian ophthalmic disease. Here, for the first time, we report common variants at these genes that are associated with VCDR variation at a population level. Three of our novel loci are at ADAMTSL3 (rs59199978), PITX2 (rs2661764), and FOXC1 (rs2745572), all of which are associated with syndromic ocular anterior segment dysgenesis, which in turn causes raised IOP and secondary glaucoma. ADAMTSL3 is an important paralog of ADAMTSL1—which itself is also associated with VCDR in our GWAS. A mutation in ADAMTSL1 has been reported to cause inherited anterior segment dysgenesis and secondary congenital glaucoma. Mutations in PITX2 and FOXC1 cause Axenfeld-Rieger syndrome. Common variants at these loci may mark more subtle effects on ocular anterior segment development, resulting in subclinical changes in IOP and VCDR that are apparent on a population level. While FOXC1 variants have been previously associated with glaucoma, this is the first time they have been associated with population variation in VCDR. Mutations in PRSS56, a gene at one of our novel VCDR-associated loci, cause microphthalmia in humans. Another two of our VCDR-associated loci are at EYA1 and EYA2 (eyes absent homologs 1 and 2), genes that are important for eye development in Drosophila. EYA1 has been implicated in ocular anterior segment anomalies and cataract. 
We also replicate some of the loci identified by Craig et al., such as ELP4, which has been associated with aniridia, a condition that is characterized by the absence of an iris and can predispose patients to glaucoma.

ML-based GWAS improves VCDR polygenic risk scores

We developed P+T and elastic net PRSs for both the ML-based VCDR GWAS and the Craig et al. GWAS (Tables S11–S14). These PRSs were evaluated in two test sets: a holdout set of 2,076 subjects from UKB with VCDR measured by two to three experts and a set of 5,868 subjects from the European Prospective Investigation into Cancer Norfolk (EPIC-Norfolk) cohort with VCDR measured by scanning laser ophthalmoscopy (HRT). Because the EPIC-Norfolk imputation was done with the HRC v.1 (Haplotype Reference Consortium) panel, which excludes indels, we subset the ML-based GWAS summary statistics to HRC v.1. For the P+T model, subsetting to HRC v.1 results in 282 hits, down from 299 original hits. With the effect sizes from the ML-based GWAS (Table S11), this model achieves a Pearson’s correlation R = 0.37 (95% CI = 0.33–0.40) in the UKB adjudicated cohort. The P+T model from the Craig et al. GWAS does not include 18 out of 76 SNPs (absent in HRC v.1) and achieves a Pearson’s correlation R = 0.29 (95% CI = 0.25–0.33). The performance metrics of the ML-based and Craig et al. P+T models when not subset to HRC v.1 are shown in Figure S15. Performance in the EPIC-Norfolk set was slightly lower, but the P+T model still explained 9.6% of the total variance (Figure 3A). In both sets, the ML-based P+T model outperformed the Craig et al. P+T model (UKB: ΔR = 0.079, p < 0.031, n = 2,076; EPIC: ΔR = 0.082, p < 5.9 × 10−4, n = 5,868, permutation test).
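As a minimal sketch of the evaluation described above, a P+T score can be computed as a weighted sum of effect-allele dosages and compared with measured VCDR by Pearson’s correlation. The function names, the toy genotypes, and the weights below are hypothetical, and the Fisher z-transform interval is an assumption chosen for illustration; this section does not state how the reported confidence intervals were derived.

```python
import math

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))

def pearson_r(x, y):
    """Pearson's correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z-transform (an assumed method)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical toy data: 3 retained hits, 5 individuals.
weights = [0.02, -0.01, 0.03]            # GWAS marginal effect sizes (made up)
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 0]]
measured_vcdr = [0.42, 0.30, 0.38, 0.45, 0.28]

scores = [polygenic_score(g, weights) for g in genotypes]
r = pearson_r(scores, measured_vcdr)
print(r, fisher_ci(r, len(scores)))
```

In the P+T setting the weights are taken directly from the GWAS summary statistics, so no model fitting happens at this step; only the set of retained hits changes when the summary statistics are subset to HRC v.1.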

Figure 3

VCDR polygenic risk score performance metrics

(A and B) Pearson’s correlations between measured VCDR values and predictions of the pruning and thresholding (P+T) (A) and the elastic net models (B) are shown for the PRS learned from ML-based and Craig et al. hits. Error bars depict 95% confidence intervals. Numbers above bars are the observed Pearson’s correlations. Indications of p value ranges (permutation test): ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001. The Craig et al. P+T model uses 58 out of 76 hits. Measured VCDR values were obtained from adjudicated expert labeling of fundus images (UKB, n = 2,076) and scanning laser ophthalmoscopy (HRT) (EPIC-Norfolk, n = 5,868).

We then used the ML-based VCDR values from UKB to train elastic net models; after removing all images used in building the adjudicated test set, the training set contained 62,969 samples. In contrast to the P+T model in which GWAS marginal effect sizes are used as PRS weights, elastic net jointly learns all weights in a supervised manner. To make up for the 18 missing Craig et al. SNPs, we identified LD-based proxies for all of the missing hits in HRC v.1 and included them in training the elastic net model. The ML-based elastic net model (Table S12) numerically improved upon the P+T model in both UKB (R = 0.38, 95% CI = 0.34–0.41) and EPIC (R = 0.33, 95% CI = 0.30–0.35) sets (Figure 3B). The elastic net model explains 14.2% and 10.6% of total VCDR variation in the UKB and EPIC-Norfolk sets, respectively. The Craig et al. elastic net model has a more pronounced improvement, probably because of the addition of proxy SNPs, but the ML-based model still significantly outperforms it (UKB: ΔR = 0.064, p < 9.6 × 10−3, n = 2,076; EPIC: ΔR = 0.053, p < 6.8 × 10−4, n = 5,868, permutation test).
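For reference, the contrast between the two PRS constructions can be written out in a common textbook form. The notation is assumed here rather than taken from the paper: x_i is the vector of dosages for individual i at the selected hits, y_i is the ML-based VCDR, and λ and α are the penalty strength and the L1/L2 mixing parameter. The hyperparameter values the authors used are not stated in this section.

```latex
% P+T: weights are the GWAS marginal effect sizes of the retained hits
\mathrm{PRS}^{\mathrm{P+T}}_i = \sum_{j \in \text{hits}} x_{ij}\,\hat{\beta}^{\mathrm{GWAS}}_j

% Elastic net: all weights are learned jointly from the training set
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr),
\qquad
\mathrm{PRS}^{\mathrm{EN}}_i = \hat{\beta}_0 + x_i^{\top}\hat{\beta}
```

Because the penalized fit accounts for correlation between predictors, adding LD-based proxies for missing hits can change the learned weights of neighboring SNPs, which is not the case for P+T weights.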

Relationship of primary open-angle glaucoma and VCDR

To study the relationship between primary open-angle glaucoma (POAG) and VCDR, we defined POAG status in UKB by using a combination of self-report and hospital episode International Classification of Diseases 9/10 codes (supplemental information). ML-based VCDR has moderate predictive power for POAG with an area under the ROC curve (AUC) of 0.76 (n = 65,193, 95% CI = 0.74–0.78, POAG prevalence = 1.9%) and area under the precision-recall curve (AUPRC) of 0.14 (95% CI = 0.12–0.16). After binning individuals by ML-based VCDR, we computed odds ratios (ORs) in each bin versus the bottom bin (Figure 4A). The most extreme bin (VCDR > 0.7, n = 385), which corresponds to a diagnostic criterion for glaucoma, has an OR of 74.3 (95% CI = 57.0–94.3) versus the bottom bin (VCDR < 0.3, n = 30,752).
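The per-bin comparison above can be illustrated with a short sketch that computes an odds ratio for one bin against a reference bin from a 2×2 table. The counts are hypothetical (per-bin case counts are not given in this section), and the Wald interval on the log odds ratio is an assumed choice that may differ from the method used for the reported intervals.

```python
import math

def odds_ratio(cases_bin, n_bin, cases_ref, n_ref, z_crit=1.96):
    """Odds ratio of a bin versus a reference bin, with a Wald 95% CI."""
    a, b = cases_bin, n_bin - cases_bin      # cases / non-cases in the bin
    c, d = cases_ref, n_ref - cases_ref      # cases / non-cases in the reference
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z_crit * se_log)
    upper = math.exp(math.log(estimate) + z_crit * se_log)
    return estimate, lower, upper

# Hypothetical counts: 20 cases among 100 people in the top bin,
# 10 cases among 1,000 people in the bottom (reference) bin.
print(odds_ratio(20, 100, 10, 1000))
```

The Wald interval requires every cell of the table to be nonzero; sparse bins would need a continuity correction or a resampling-based interval instead.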

Figure 4

Relationship between glaucoma and VCDR

(A) Glaucoma odds ratios for each ML-based VCDR bin versus the bottom bin are shown. The fraction of individuals in each bin is shown (n = 65,193).

(B) Glaucoma odds ratios for different VCDR elastic net PRS bins versus the bottom bin for individuals with a glaucoma phenotype not used in the GWAS or developing the PRS (n = 98,151). The fractions are selected to match those from (A).

(C) A histogram of ML-based glaucoma liability versus ML-based VCDR (Pearson’s correlation R = 0.91, n = 65,680, p < 1 × 10−300).

(D) LocusZoom for the strongest associated variant (rs12913832, p = 2.2 × 10−66) in the ML-based glaucoma liability GWAS conditioned on the ML-based VCDR.

We then performed mediation analysis (MA) to study the association of VCDR with glaucoma. Similar to MR, MA evaluates the association between an intermediary or mediating phenotype (here, VCDR) and an outcome phenotype (here, glaucoma). However, whereas in MR the SNP set is selected on the basis of association with the mediator, the SNP set for MA was selected on the basis of association with the outcome because of the limited availability of glaucoma summary statistics from the study by Gharahkhani et al. Because, contrary to MR’s exclusion restriction, the included SNPs may have affected glaucoma through a pathway other than VCDR (e.g., IOP), the per-SNP estimates of association were meta-analyzed with Egger regression (Egger et al., 1997), which is robust to violations of this assumption. The Egger slope of 5.7 (SE = 1.8, p = 3 × 10−3) differs significantly from zero, providing evidence that VCDR, as ascertained by our ML-based models, is strongly associated with the odds of glaucoma (Figure S16).
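The Egger regression step can be sketched as an inverse-variance weighted linear regression of per-SNP outcome effects on per-SNP mediator effects, with the intercept left free. The function name, argument names, and toy values below are hypothetical, and the sketch omits standard errors and p values for the slope and intercept; it is an illustration of the general technique, not the authors’ analysis code.

```python
def egger_regression(beta_mediator, beta_outcome, se_outcome):
    """Weighted least squares of outcome effects on mediator effects.

    Returns (intercept, slope). SNPs are first oriented so that each
    mediator effect is positive, and weights are 1 / se_outcome**2.
    """
    xs, ys, ws = [], [], []
    for bx, by, se in zip(beta_mediator, beta_outcome, se_outcome):
        sign = 1.0 if bx >= 0 else -1.0      # flip alleles with negative mediator effect
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se ** 2)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Synthetic effects constructed to lie exactly on a line (intercept 0.01, slope 2.0).
bx = [0.01, 0.02, -0.03, 0.05]
by = [0.03, 0.05, -0.07, 0.11]
se = [0.01, 0.02, 0.015, 0.01]
print(egger_regression(bx, by, se))
```

A nonzero intercept in this regression means that, after orientation, the SNPs shift the outcome on average even when their mediator effect is extrapolated to zero, which is how directional pleiotropy is read off the fit.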
We note that the Egger intercept of 0.04 also differs significantly from zero (p = 7 × 10−7), indicating the presence of directional pleiotropy; that is, variants included in the analysis, on average, were associated with an increase in the odds of POAG through a pathway other than VCDR. As shown above, VCDR is an informative endophenotype for glaucoma, and we hypothesize that its PRS should also be predictive of POAG. Indeed, 32 out of 118 loci previously associated with POAG were significantly associated with ML-based VCDR in this study. We applied the ML-based elastic net model to the UKB individuals of European ancestry that do not have fundus images (n = 98,151) to estimate their genetic VCDR. As expected, this genetic model performs noticeably worse than the model using a direct measurement of the VCDR phenotype (AUC = 0.56, 95% CI = 0.55–0.57, AUPRC = 0.07, 95% CI = 0.066–0.073, n = 98,151, POAG prevalence = 5.5%). Nonetheless, when we binned samples by VCDR elastic net PRS, participants in the highest bin (PRS Z > 2.5, n = 567) had a considerably higher POAG prevalence (OR = 3.4, 95% CI = 2.6–4.3; Figure 4B) than those in the lowest bin (PRS Z < −0.1, n = 46,136). In addition to VCDR, the ML model was trained to predict referable glaucoma risk; this model output can be interpreted as the probability a specialist would refer an individual for detailed glaucoma evaluation. Because the model output is a continuous value, we can evaluate the contribution of features other than VCDR to referable glaucoma risk by regressing out the VCDR signal. We computed glaucoma risk liability as the logit transform of the ML-based glaucoma probability, which is highly correlated with ML-based VCDR (Figure 4C, Pearson’s R = 0.91, n = 65,680, p < 1 × 10−300). While a large VCDR is the cardinal feature of a glaucomatous optic nerve, there are other features that suggest glaucoma that are difficult to quantify (e.g., bayoneting or baring of blood vessels and hemorrhages). 
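A minimal sketch of the liability construction described above: the model’s glaucoma probability is mapped to the logit scale, and the component explained by VCDR is removed with a simple least-squares fit. The values are toy numbers and the helper names are hypothetical; the single-covariate regression shown here illustrates the idea of regressing out VCDR rather than reproducing the exact adjustment used in the analysis.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def residualize(y, x):
    """Residuals of y after least-squares regression on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Toy values: predicted glaucoma referral probability and predicted VCDR.
glaucoma_prob = [0.02, 0.05, 0.10, 0.40, 0.70]
vcdr = [0.25, 0.30, 0.40, 0.60, 0.75]

liability = [logit(p) for p in glaucoma_prob]
residual_liability = residualize(liability, vcdr)
print(residual_liability)
```

By construction the residuals have zero mean and are uncorrelated with VCDR in the sample, so any genetic signal they carry reflects features of the prediction that VCDR does not linearly explain.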
To examine the genetic associations with glaucomatous optic disc features other than VCDR, we carried out a GWAS of ML-based glaucoma risk conditioned on ML-based VCDR by using BOLT-LMM. The observed SNP heritability was 0.062 (SEM = 0.013) with genomic inflation of 1.04 and S-LDSC-based intercept of 1.01 (SEM = 9.8 × 10−3; Figure S17) and the GWAS identified eight GWS loci (Tables S15 and S16). Interestingly, two of these loci, OCA2-HERC2 (Figure 4D; rs12913832, p = 2.2 × 10−66) and TYR (rs1126809, p = 5.8 × 10−13), have been previously associated with macular inner retinal thickness (retinal nerve fiber layer and ganglion cell inner plexiform layer) as derived from UKB optical coherence tomography images. These inner retinal parameters have diagnostic utility for glaucoma that is considered complementary to VCDR and may be particularly efficacious at detecting early glaucoma. Moreover, it is not currently possible to ascertain the thickness of the inner retina from fundus images, which are two-dimensional. Together, this suggests that ML-based phenotyping has the potential to identify glaucoma-related features from fundus images that are complementary to VCDR and not typically gradable by humans.

Glaucoma prediction in the EPIC-Norfolk cohort

To further assess the utility of the ML-based elastic net VCDR PRS for prediction of glaucoma, we classified the status of EPIC-Norfolk participants (n = 5,868) for POAG (175 cases and 5,693 controls). We additionally sub-categorized POAG cases into high-tension glaucoma (HTG; 98 cases) and normal-tension glaucoma (NTG; 77 cases). Given the enrichment of the VCDR PRS for variants associated with neuronal development and function, we hypothesized that the PRS would be particularly associated with NTG. We fit a logistic regression model to predict POAG status by using age, sex, and ML-based elastic net VCDR PRS as its three predictors. The ML-based elastic net VCDR PRS was strikingly associated with POAG, and particularly NTG, in EPIC-Norfolk (Figure 5). The ORs (95% CI) comparing the top risk decile with the bottom decile were 9.7 (3.4–27.6) for POAG, 7.4 (2.2–25.2) for HTG, and 16.5 (2.2–125.9) for NTG (Figure 5). The overall prediction metrics were AUC = 0.74, 95% CI = 0.70–0.77, AUPRC = 0.08, 95% CI = 0.06–0.11, prevalence = 3.0% for POAG; AUC = 0.73, 95% CI = 0.68–0.78, AUPRC = 0.05, 95% CI = 0.03–0.08, prevalence = 1.7% for HTG; and AUC = 0.76, 95% CI = 0.71–0.80, AUPRC = 0.04, 95% CI = 0.03–0.06, prevalence = 1.3% for NTG. The AUC and AUPRC show nominally significant improvements over those from an analogous model using the Craig et al. elastic net VCDR PRS for POAG (ΔAUC = 0.014, 95% CI = 0.0–0.03, p = 0.03; ΔAUPRC = 0.008, 95% CI = 0.0–0.02, p = 0.03, paired bootstrap test) and HTG (ΔAUC = 0.014, 95% CI = 0.0–0.03, p = 0.04; ΔAUPRC = 0.006, 95% CI = 0.0–0.02, p = 0.04, paired bootstrap test).
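The AUC values reported above have a direct interpretation as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes that quantity by pairwise comparison; the function name and toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for cohort-sized data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy predicted risks from a model using age, sex, and a PRS (values made up).
predicted_risk = [0.10, 0.40, 0.35, 0.80]
poag_status = [0, 0, 1, 1]
print(roc_auc(predicted_risk, poag_status))  # 0.75
```

Because the AUC depends only on ranks, it is insensitive to prevalence, which is why the AUPRC is reported alongside it for outcomes as rare as POAG in this cohort.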

Figure 5

Primary open-angle glaucoma (POAG) prediction in the EPIC-Norfolk cohort

(A–C) Odds ratios and 95% CIs for POAG prevalence by decile of VCDR PRS; reference is decile 1. Results are from logistic regression models adjusted for age and sex for primary open-angle glaucoma (175 cases, 5,693 controls) (A), high-tension glaucoma (HTG; 98 cases, 5,693 controls) (B), and normal-tension glaucoma (NTG; 77 cases, 5,693 controls) (C). Results are presented for the ML-based elastic net VCDR PRS (blue) and the Craig et al. elastic net VCDR PRS (yellow). Note the y axis log scale.


Discussion

Large cohorts of genotyped and phenotyped individuals have enabled researchers to identify genetic influences on many traits. As methods to ascertain genetic variants in large cohorts continue to improve, we anticipate the major challenge for cohort generation to be accurate and deep phenotyping at scale. Here, we demonstrated that ML-based phenotyping shows promise for improving both scalability to biobank-sized datasets and phenotyping accuracy. We predicted VCDR from all 175,337 UKB fundus images in less than 1 h on a distributed computing system. Multiple lines of evidence indicate that the model-based VCDR predictions improve accuracy over manual labeling, including the reproduction of known VCDR-related biology, identification of plausible novel genetic associations, and generation of polygenic risk scores that better predict VCDR in multiple held-out datasets. Additional advantages of ML-based phenotyping over manual labeling are improved joint prediction accuracy for multiple correlated phenotypes and predicting liabilities instead of binary labels for binary phenotypes. By regressing out predicted VCDR from the predicted referable glaucoma risk (i.e., whether the individual should seek further ophthalmologist care), we identified residual referable risk not attributable to variation in VCDR. The improvement of our model-based VCDR GWAS over the recent expert-labeled VCDR GWAS by Craig et al. is consistent with improved phenotyping accuracy by our model. The expert labels may include more noise or measurement error than the ML-based labels, as suggested by the inter-grader variability; the inter-grader Pearson’s correlation between the two ophthalmologists as reported by Craig et al. for images graded multiple times was 0.75 (95% CI = 0.72–0.77), whereas the ML model achieves a Pearson’s correlation of 0.89 between the model predictions and adjudicated expert labels (95% CI = 0.88–0.90).
Noise or variability in human grading of VCDR can arise from difficulty in defining the cup-rim border of the optic disc. If the cup-rim border is sloping, rather than having vertical edges, defining it is challenging via two-dimensional images. In this situation, the average VCDR of multiple graders may be considered more accurate than a single grader’s score. Our ML-based model was trained and tuned on images that were assessed by multiple graders and may therefore be expected to outperform a single human grader, on average. The 93 novel VCDR-associated loci discovered by ML-based phenotyping substantially expand our knowledge of the biological processes underlying optic nerve head morphology. While elevated IOP is an established cause of glaucoma, characterized by a pathologically enlarged VCDR, our results support the role of IOP’s contributing to variation in VCDR within the healthy range as well. Of particular note were common VCDR-associated variants in genes harboring mutations that cause inherited anterior segment dysgenesis that is well characterized phenotypically. Our findings suggest these dysgenesis processes may also occur at subclinical levels and contribute to variation in the complex VCDR phenotype. Understanding the genotype-phenotype link in rare single-gene disorders can therefore improve our knowledge of some of the many contributory causes to complex traits. Our results also support an important role of neuronal development processes for VCDR. It remains uncertain whether these processes primarily influence VCDR during optic nerve development in early life, thereby reflecting population variation in baseline optic nerve head anatomy, or act later in life and reflect a pathological, glaucomatous change in VCDR over time. Interestingly, genes involved in developmental processes more broadly, including development of the cardiovascular and urogenital systems, were significantly enriched in our results (Table S8). 
This may suggest early life processes are a major determinant of VCDR variation in adult populations. This study also showed that a substantial proportion of VCDR variation can be predicted with a polygenic risk score. Improving VCDR prediction produces a concomitant improvement in glaucoma prediction, as we demonstrated by stratifying glaucoma prevalence by using the VCDR PRS. While the UK National Screening Committee does not currently recommend population screening for glaucoma because tests lack sufficient positive predictive value, using polygenic prediction to identify subsets of the general population that are at risk for glaucoma may enable effective screening. Notably, we identified a substantially higher POAG prevalence in the top decile of VCDR PRSs and it may be that current screening tests would have sufficient positive predictive value if applied to this enriched population subset. Earlier detection and treatment of glaucoma, a disease that causes progressive and irreversible vision loss, is a key strategy outlined by the World Health Organization for the prevention of blindness worldwide. While this study demonstrates the potential for ML-based phenotyping to expand our understanding of the genetic variation underlying complex traits, the method has important limitations that must be taken into account. Application of this technique relies on the trained model’s producing accurate predictions in the genomic discovery set. Here, we showed strong generalizability of the model trained on non-UKB fundus images to the UKB fundus images used for genomic discovery by manually labeling a small subset of UKB fundus images and validating model predictions against these ground truth labels. Application to other phenotypes derived from fundus images, or other data modalities such as optical coherence tomography or magnetic resonance imaging, would require similar demonstrations of model generalizability. 
Additionally, the initial model training can be costly and time intensive, as it requires manual labeling to be performed. While our ablation analysis showed that training on only 10% of the data still identified the majority of VCDR-associated loci, model performance did not appear to saturate even at the full training set size. Ongoing improvements to transfer learning may reduce future labeled data requirements, although the ability to extrapolate consumer imaging improvements to biomedical imaging is unclear. Another limitation of our study was the absence of data for absolute vertical disc diameter (VDD), a commonly used proxy for disc size. While VDD is a heritable trait that would be of interest given its correlation with VCDR, considerable challenges preclude extending ML-based phenotyping to VDD in our study. Because VDD is an absolute size measurement, it requires strict standardization of image acquisition. In particular, differences in absolute size measurements from images arise secondary to camera-related magnification and from ocular refraction, mostly determined by the length of the eye. Since our training images were derived from multiple centers and multiple different cameras that were not standardized in terms of magnification and zoom, it is not possible to derive an accurate VDD on which to train an algorithm. Even within UKB, accurately measuring VDD from fundus images is not possible because there are no measurements of axial length. Correcting for magnification with spherical equivalent only corrects for about 30% of eye size-related magnification artifact, whereas axial length correction can account for nearly 100% of the variation. Consequently, we cannot exclude the possibility that some loci discovered in this study would not reach genome-wide significance in a GWAS adjusted for VDD. 
However, the similar effect sizes estimated for loci significant both in our study and in Craig et al., and the increased number of loci discovered in an independent ML-based GWAS of VDD-adjusted VCDR in the UKB, suggest that many of the loci discovered here influence VCDR independently of VDD. In summary, we have proposed a method for performing genomic discovery on biobank-scale datasets by using machine learning algorithms for accurate phenotyping. A key benefit of the method is its ability to use a modest-sized biomedical dataset annotated with reasonable accuracy to train a model that identifies the underlying patterns and yields usable predictions. Extending the method to additional phenotypes and data modalities in large-scale biobanks could further expand our understanding of disease etiology and improve genetic risk modeling.

Declaration of interests

P.J.F. and A.P.K. are employees of the UCL Institute of Ophthalmology, London, UK. The remaining authors are employees and shareholders of Google LLC.

54 in total

1. The Age-Related Eye Disease Study (AREDS): design implications. AREDS report no. 1.

Authors:
Journal: Control Clin Trials Date: 1999-12

2. Identification and functional analysis of an ADAMTSL1 variant associated with a complex phenotype including congenital glaucoma, craniofacial, and other systemic features in a three-generation human pedigree.

Authors: Kathryn Hendee; Lauren Weiping Wang; Linda M Reis; Gregory M Rice; Suneel S Apte; Elena V Semina
Journal: Hum Mutat Date: 2017-08-01 Impact factor: 4.878

3. Accurate balance of the polarity kinase MARK2/Par-1 is required for proper cortical neuronal migration.

Authors: Tamar Sapir; Sivan Sapoznik; Talia Levy; Danit Finkelshtein; Anat Shmueli; Thomas Timm; Eva-Maria Mandelkow; Orly Reiner
Journal: J Neurosci Date: 2008-05-28 Impact factor: 6.167

Review 4. Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis.

Authors: Yih-Chung Tham; Xiang Li; Tien Y Wong; Harry A Quigley; Tin Aung; Ching-Yu Cheng
Journal: Ophthalmology Date: 2014-06-26 Impact factor: 12.079

5. Measurement of optic disc size: equivalence of methods to correct for ocular magnification.

Authors: D F Garway-Heath; A R Rudnicka; T Lowe; P J Foster; F W Fitzke; R A Hitchings
Journal: Br J Ophthalmol Date: 1998-06 Impact factor: 5.908

6. Efficient Bayesian mixed-model analysis increases association power in large cohorts.

Authors: Po-Ru Loh; George Tucker; Brendan K Bulik-Sullivan; Bjarni J Vilhjálmsson; Hilary K Finucane; Rany M Salem; Daniel I Chasman; Paul M Ridker; Benjamin M Neale; Bonnie Berger; Nick Patterson; Alkes L Price
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

7. Genome-wide association analysis identifies TXNRD2, ATXN2 and FOXC1 as susceptibility loci for primary open-angle glaucoma.

Authors: Jessica N Cooke Bailey; Stephanie J Loomis; Jae H Kang; R Rand Allingham; Puya Gharahkhani; Chiea Chuen Khor; Kathryn P Burdon; Hugues Aschard; Daniel I Chasman; Robert P Igo; Pirro G Hysi; Craig A Glastonbury; Allison Ashley-Koch; Murray Brilliant; Andrew A Brown; Donald L Budenz; Alfonso Buil; Ching-Yu Cheng; Hyon Choi; William G Christen; Gary Curhan; Immaculata De Vivo; John H Fingert; Paul J Foster; Charles Fuchs; Douglas Gaasterland; Terry Gaasterland; Alex W Hewitt; Frank Hu; David J Hunter; Anthony P Khawaja; Richard K Lee; Zheng Li; Paul R Lichter; David A Mackey; Peter McGuffin; Paul Mitchell; Sayoko E Moroi; Shamira A Perera; Keating W Pepper; Qibin Qi; Tony Realini; Julia E Richards; Paul M Ridker; Eric Rimm; Robert Ritch; Marylyn Ritchie; Joel S Schuman; William K Scott; Kuldev Singh; Arthur J Sit; Yeunjoo E Song; Rulla M Tamimi; Fotis Topouzis; Ananth C Viswanathan; Shefali Setia Verma; Douglas Vollrath; Jie Jin Wang; Nicole Weisschuh; Bernd Wissinger; Gadi Wollstein; Tien Y Wong; Brian L Yaspan; Donald J Zack; Kang Zhang; Epic-Norfolk Eye Study; Robert N Weinreb; Margaret A Pericak-Vance; Kerrin Small; Christopher J Hammond; Tin Aung; Yutao Liu; Eranga N Vithana; Stuart MacGregor; Jamie E Craig; Peter Kraft; Gareth Howell; Michael A Hauser; Louis R Pasquale; Jonathan L Haines; Janey L Wiggs
Journal: Nat Genet Date: 2016-01-11 Impact factor: 38.330

8. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization.

Authors: Jack Bowden; Fabiola Del Greco M; Cosetta Minelli; George Davey Smith; Nuala Sheehan; John Thompson
Journal: Stat Med Date: 2017-01-23 Impact factor: 2.373

9. Comparison of Associations with Different Macular Inner Retinal Thickness Parameters in a Large Cohort: The UK Biobank.

Authors: Anthony P Khawaja; Sharon Chua; Pirro G Hysi; Stelios Georgoulas; Hannah Currant; Tomas W Fitzgerald; Ewan Birney; Fang Ko; Qi Yang; Charles Reisman; David F Garway-Heath; Chris J Hammond; Peng T Khaw; Paul J Foster; Praveen J Patel; Nicholas Strouthidis
Journal: Ophthalmology Date: 2019-08-21 Impact factor: 12.079

10. Multitrait analysis of glaucoma identifies new risk loci and enables polygenic prediction of disease susceptibility and progression.

Authors: Jamie E Craig; Xikun Han; Ayub Qassim; Alex W Hewitt; Stuart MacGregor; Mark Hassall; Jessica N Cooke Bailey; Tyler G Kinzy; Anthony P Khawaja; Jiyuan An; Henry Marshall; Puya Gharahkhani; Robert P Igo; Stuart L Graham; Paul R Healey; Jue-Sheng Ong; Tiger Zhou; Owen Siggs; Matthew H Law; Emmanuelle Souzeau; Bronwyn Ridge; Pirro G Hysi; Kathryn P Burdon; Richard A Mills; John Landers; Jonathan B Ruddle; Ashish Agar; Anna Galanopoulos; Andrew J R White; Colin E Willoughby; Nicholas H Andrew; Stephen Best; Andrea L Vincent; Ivan Goldberg; Graham Radford-Smith; Nicholas G Martin; Grant W Montgomery; Veronique Vitart; Rene Hoehn; Robert Wojciechowski; Jost B Jonas; Tin Aung; Louis R Pasquale; Angela Jane Cree; Sobha Sivaprasad; Neeru A Vallabh; Ananth C Viswanathan; Francesca Pasutto; Jonathan L Haines; Caroline C W Klaver; Cornelia M van Duijn; Robert J Casson; Paul J Foster; Peng Tee Khaw; Christopher J Hammond; David A Mackey; Paul Mitchell; Andrew J Lotery; Janey L Wiggs
Journal: Nat Genet Date: 2020-01-20 Impact factor: 38.330
