Kira J Stanzick, Yong Li, Pascal Schlosser, Mathias Gorski, Matthias Wuttke, Laurent F Thomas, Humaira Rasheed, Bryce X Rowan, Sarah E Graham, Brett R Vanderweff, Snehal B Patil, Cassiane Robinson-Cohen, John M Gaziano, Christopher J O'Donnell, Cristen J Willer, Stein Hallan, Bjørn Olav Åsvold, Andre Gessner, Adriana M Hung, Cristian Pattaro, Anna Köttgen, Klaus J Stark, Iris M Heid, Thomas W Winkler.
Abstract
Genes underneath signals from genome-wide association studies (GWAS) for kidney function are promising targets for functional studies, but prioritizing variants and genes is challenging. By GWAS meta-analysis for creatinine-based estimated glomerular filtration rate (eGFR) from the Chronic Kidney Disease Genetics Consortium and UK Biobank (n = 1,201,909), we expand the number of eGFRcrea loci (424 loci, 201 novel; 9.8% eGFRcrea variance explained by 634 independent signal variants). Our increased sample size in fine-mapping (n = 1,004,040, European) more than doubles the number of signals with resolved fine-mapping (99% credible sets down to 1 variant for 44 signals, ≤5 variants for 138 signals). Cystatin-based eGFR and/or blood urea nitrogen association support 348 loci (n = 460,826 and 852,678, respectively). Our customizable tool for Gene PrioritiSation reveals 23 compelling genes including mechanistic insights and enables navigation through genes and variants likely relevant for kidney function in human to help select targets for experimental follow-up.
Year: 2021 PMID: 34272381 PMCID: PMC8285412 DOI: 10.1038/s41467-021-24491-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Primary meta-analysis for eGFRcrea identified 424 loci, including 201 novel loci.
Shown are results from our primary meta-analysis for eGFRcrea (n = 1,201,909). We identified 424 loci with genome-wide significance (P < 5 × 10⁻⁸), including 223 known (previous GWAS[7]) and 201 novel (marked in blue and red, respectively). a Manhattan plot shows −log₁₀ association P value for the genetic effect on eGFRcrea by chromosomal base position (GRCh37). The red dashed line marks genome-wide significance (5 × 10⁻⁸). P values are two-sided and were derived using a Wald test. b Scatterplot comparing eGFRcrea effect sizes versus allele frequencies for the 424 identified locus lead variants (orange lines at 5% and 95% allele frequency). Effect sizes and allele frequencies were aligned to the eGFRcrea-decreasing alleles.
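The two-sided Wald P value and the genome-wide significance threshold mentioned in this caption can be illustrated with a minimal Python sketch. It assumes the usual normal approximation for the Wald statistic; the function names are hypothetical and are not taken from the study's analysis pipeline.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # significance threshold quoted in the caption


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided P value for H0: beta = 0 under a normal approximation."""
    z = beta / se
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_genome_wide_significant(beta: float, se: float,
                               alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """True if the Wald P value falls below the genome-wide threshold."""
    return wald_p_value(beta, se) < alpha
```

Under this approximation, an effect estimate needs to be roughly 5.45 standard errors away from zero to pass the 5 × 10⁻⁸ threshold.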
Fig. 2 Supporting alternative biomarker association for 348 loci.
Shown are results from our evaluation of alternative kidney function biomarker association for the 424 locus lead variants to establish loci with likely kidney function relevance. We classified each of the 424 variants as “validated” by BUN and/or eGFRcys based on a nominally significant association (P < 0.05) with consistent effect direction for BUN (n = 852,678, i.e. opposite effect to eGFRcrea) and/or eGFRcys (n = 460,826, i.e. same effect direction as eGFRcrea). We validated 348 of the 424 loci and thus more than doubled the number of loci with additional biomarker evidence compared to previous work (147 loci previously based on BUN-only[7]). a Pie chart showing the classification of the 424 lead variants as “validated” by eGFRcys and/or BUN effects. b Scatterplot comparing effect sizes for eGFRcrea and eGFRcys with 95% confidence intervals (green: eGFRcys and BUN validated, brown: only eGFRcys-validated, magenta: only BUN validated, grey: not validated). c Scatterplot comparing effect sizes for eGFRcrea and BUN (colouring analogous to b). The correlation coefficients between effect sizes shown are Spearman correlation coefficients and were based on the lead variants of the 348 validated loci. Genetic effect sizes are presented with error bars of ±1.96 × standard error of the genetic effect size estimate.
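The validation rule described in this caption (nominal significance plus a consistent effect direction) can be sketched as a small Python function. This is an illustrative reading of the caption only: the function name, argument names and returned labels are hypothetical.

```python
def classify_validation(beta_crea, beta_cys=None, p_cys=None,
                        beta_bun=None, p_bun=None, alpha=0.05):
    """Classify a lead variant by alternative-biomarker support.

    eGFRcys supports the locus if P < alpha and the effect has the SAME sign
    as the eGFRcrea effect; BUN supports it if P < alpha and the effect has
    the OPPOSITE sign. Missing biomarker results count as no support.
    """
    cys_ok = (beta_cys is not None and p_cys is not None
              and p_cys < alpha and beta_cys * beta_crea > 0)
    bun_ok = (beta_bun is not None and p_bun is not None
              and p_bun < alpha and beta_bun * beta_crea < 0)
    if cys_ok and bun_ok:
        return "eGFRcys and BUN validated"
    if cys_ok:
        return "only eGFRcys validated"
    if bun_ok:
        return "only BUN validated"
    return "not validated"
```

The four labels correspond to the four colour classes used in panels b and c.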
Fig. 3 Fine-mapping of 634 independent signals by credible set variants including 138 with small credible set size.
For the 424 identified eGFRcrea loci, we derived 634 independent signals by approximate conditional analyses with GCTA[24] and, for each signal, 99% credible sets of variants using the method by Wakefield[25] based on the European-only meta-analysis results (n = 1,004,040). a Distribution of the number of signals per locus across the 424 loci. b Distribution of credible set sizes for the 226 signals at novel loci. c Distribution of credible set sizes for the 408 signals at known loci. Colour in panels b and c denotes the order in which the signal appeared in the stepwise conditional analysis. Of the 634 signals, 138 were successfully fine-mapped down to a small credible set (i.e. ≤5 variants) including 44 that contained exactly one variant.
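To illustrate how a 99% credible set can be derived from summary statistics, here is a minimal Python sketch of fine-mapping with Wakefield's approximate Bayes factor. It assumes one causal variant per signal, equal prior probability for every variant, and an illustrative prior variance `prior_var`; the value used below is a placeholder, not the study's setting, and the function names are hypothetical.

```python
import math


def log_abf(beta, se, prior_var):
    """Log of Wakefield's approximate Bayes factor in favour of association."""
    v = se ** 2
    shrink = prior_var / (v + prior_var)
    # ABF = sqrt(v / (v + W)) * exp(z^2 / 2 * W / (v + W)), with z = beta / se
    return 0.5 * math.log(1.0 - shrink) + 0.5 * (beta / se) ** 2 * shrink


def credible_set(variants, prior_var=0.005 ** 2, coverage=0.99):
    """Return [(variant_id, ppa), ...] for the smallest set reaching `coverage`.

    `variants` is a list of (variant_id, beta, se) tuples for one signal.
    `prior_var` is an assumed, illustrative prior variance of the effect.
    """
    logs = [log_abf(b, s, prior_var) for _, b, s in variants]
    top = max(logs)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    ranked = sorted(((w / total, vid) for w, (vid, _, _) in zip(weights, variants)),
                    reverse=True)
    chosen, cumulative = [], 0.0
    for ppa, vid in ranked:
        chosen.append((vid, ppa))
        cumulative += ppa
        if cumulative >= coverage:
            break
    return chosen
```

A signal counts as fine-mapped to a single variant when the top variant alone carries a posterior probability of association (PPA) of at least 99%, and as "small" when the set holds at most five variants.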
Table 1 Summary of annotation of the 138 single or small 99% credible variant sets.
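Of the 138 sets, 44 (37) are single sets (1 variant): 8 (8) at novel loci and 36 (29) at known loci. The remaining 94 (83) sets contain 2–5 variants: 22 (19) at novel loci and 72 (64) at known loci.

| Among 99% credible set variants | Single set, novel loci: 8 (8) | Single set, known loci, newly single: 22 (17) | Single set, known loci, known single: 14 (12) | Small set (2–5 variants), novel loci: 22 (19) | Small set, known loci, newly small: 47 (43) | Small set, known loci, known small: 25 (21) |
|---|---|---|---|---|---|---|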
| Any protein-relevant variant | 3 (3) | 5 (3) | 6 (5) | 7 (5) | 6 (6) | 3 (1) |
| • Stop-gained/ stop-lost/non-synonymous | 2 (2) | 2 (2) | 5 (4) | 4 (4) | 3 (3) | 2 (1) |
| • Canonical-splice/noncoding-change/synonymous/splice-site | 0 | 0 | 0 | 0 | 0 | 0 |
| • Other consequence | 1 (1) | 3 (1) | 1 (1) | 3 (1) | 3 (3) | 2 (0) |
| Any kidney-tissue regulatory variant | 0 | 0 | 0 | 2 (1) | 7 (7) | 0 |
| • eQTL in glomerulus (NEPTUNE) | 0 | 0 | 0 | 1 (1) | 1 (1) | 0 |
| • eQTL in tubulo-interstitium (NEPTUNE) | 0 | 0 | 0 | 1 (0) | 6 (6) | 0 |
| • eQTL in kidney tissue (GTEx) | 0 | 0 | 0 | 0 | 1 (1) | 0 |
| • sQTL in kidney tissue (GTEx) | 0 | 0 | 0 | 0 | 1 (1) | 0 |
| Any protein-relevant or kidney-tissue regulatory variant | 3 (3) | 5 (3) | 6 (5) | 8 (6) | 12 (12) | 3 (1) |
| Any other tissue regulatory variant | 5 (5) | 13 (11) | 11 (10) | 15 (14) | 34 (31) | 18 (14) |
| • eQTL in other tissue (GTEx) | 5 (5) | 12 (10) | 11 (10) | 15 (14) | 34 (31) | 18 (14) |
| • sQTL in other tissue (GTEx) | 1 (1) | 10 (8) | 7 (6) | 11 (10) | 20 (17) | 7 (5) |
For the 138 identified eGFRcrea signals mapping to single or small (2–5 variants) 99% credible variant sets (fine-mapping in n = 1,004,040 individuals), we applied bioinformatic follow-up to the credible variants. Shown are the number of signals containing a credible variant targeting a gene in the locus by being (i) relevant for the protein (i.e., CADD score ≥15, variant within gene, Supplementary Data 9), (ii) relevant for regulatory function in kidney tissue (i.e., eQTL in NEPTUNE glomerular or tubule-interstitial tissue, Supplementary Data 10; or eQTL/sQTL in GTEx kidney tissue, Supplementary Data 11), or (iii) relevant for regulatory function in other non-kidney tissue (i.e., eQTL/sQTL in GTEx non-kidney tissues, Supplementary Data 12). Shown in brackets is the number of signals mapping to eGFRcys/BUN-validated loci.
Fig. 4 Gene PrioritiSation (GPS) yields 32 genes.
By querying our GPS (Supplementary Data 14), we identified 32 genes that are mapping to eGFRcys/BUN-validated loci and to a small credible set (≤5 variants) that contains a protein-relevant variant within the gene (CADD ≥15) or a kidney-tissue regulatory variant (eQTL in NEPTUNE glomerulus or tubule-interstitial tissue; eQTL or sQTL in GTEx kidney tissue). Shown is the locus information (locus id, signal id, number of signals in the locus and the number of credible variants in the signal), variant information for credible variants within the gene (functional annotation, blue), for regulatory credible variants (regulatory annotation, orange) and gene information for kidney-related phenotypes (in mouse or human, green). Genes are grey if the PPA of the relevant variant is <10% or if the gene was previously highlighted by Wuttke et al. without additional evidence[7]. An alternative result limited to variants that are available in the mostly European CKDGen consortium meta-analysis is shown in Supplementary Fig. 8.
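The selection criteria in this caption can be read as a simple filter over per-gene records. The Python sketch below is one possible reading; the record layout, the field names (`locus_validated`, `credible_set_size`, `in_gene_cadd`, `kidney_regulatory`, `ppa`, `previously_highlighted`) and the function names are hypothetical and do not describe the format of the actual GPS table in Supplementary Data 14.

```python
def gps_select(records, max_set_size=5, min_cadd=15.0):
    """Return names of genes meeting the criteria described in the caption."""
    selected = []
    for rec in records:
        if not rec["locus_validated"]:              # eGFRcys/BUN-validated locus
            continue
        if rec["credible_set_size"] > max_set_size:  # small credible set only
            continue
        cadd = rec.get("in_gene_cadd")               # CADD of in-gene credible variant
        protein_relevant = cadd is not None and cadd >= min_cadd
        kidney_regulatory = bool(rec.get("kidney_regulatory", False))
        if protein_relevant or kidney_regulatory:
            selected.append(rec["gene"])
    return selected


def is_greyed_out(rec, min_ppa=0.10):
    """Grey rule from the caption: low PPA, or previously highlighted
    without additional evidence."""
    return rec["ppa"] < min_ppa or bool(rec.get("previously_highlighted", False))
```

For example, a record with a validated locus, a credible set of three variants and an in-gene variant with CADD 23.8 would be selected, while the same record with a credible set of eight variants would not.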
Table 2 Highlighted genes with novel evidence for kidney function.
| Gene | Variant (EAF, PPA), consequence (CADD PHRED) | Novelty |
|---|---|---|
| Known single | ||
| | rs863678 (0.64, 99.9%), 3’ UTR (18.4) | Previously listed only as “other CADD>=15”, not further described |
| Newly single | ||
| | rs267738 (0.46, 99.1%), p.Glu115Ala (32.0) | Previous cred set size=5 (previous PPA = 46%)[ |
| | rs76572975 (0.024, 99.7%), p.Arg3842Leu (23.8) | Not fine-mapped previously[ |
| | rs112068790 (0.97, 99.2%), intron (18.3); rs55760516 (0.67, 39.8%), p.Gly2790Arg (22.3) | 1st signal newly single, 2nd signal newly small (cred set size = 2), previously one signal with cred set >5[ |
| Novel locus | ||
| | rs139323761 (0.027, 99.9%), Intron (21.9) | Experimental link to kidney function unknown |
| | rs11557049 (0.065, 99.9%), p.Gly76Glu (24.0) | Experimental link to kidney function unknown |
| | rs35529250 (0.006, 99.8%), p.Gly538Arg (28.5) | Experimental link to kidney function unknown |
| Newly small | ||
| | rs9905761 (0.81*, 36.9%), Intron (15.2) | Experimental link to kidney function unknown |
| | rs17420882 (0.72, 93.2%), Intron (16.2) | Experimental link to kidney function unknown |
| | rs6464165 (0.71, 49.9%), eQTL tubulo-interstitial; rs10224210 (0.71, 49.9%), eQTL tubulo-interstitial | Previous coloc tubulo-interstitial (PPH4 = 98%)[ |
| | rs72629024 (0.85, 85.1%), eQTL tubulo-interstitial/glomerular | New coloc tubulo-interstitial/ glomerular (PPH4 = 99.5% / 99.8%); experimental link to kidney function unknown |
| | rs854922 (0.092, 90.5%), 5’ UTR (18.0) | Experimental link to kidney function unknown |
| | rs10774020 (0.34, 51.9%), eQTL tubulo-interstitial; rs11062102 (0.34, 47.5%), eQTL tubulo-interstitial | New coloc tubulo-interstitial (PPH4 = 99.5%), cell-type specific expression in proximal tubule in both datasets; link to kidney function unclear |
| | rs143710547 (0.08, 58.7%), eQTL tubulo-interstitial | No coloc; experimental link to kidney function unknown |
| | rs434215 (0.28, 93.2%), eQTL tubulo-interstitial | New coloc tubulo-interstitial (PPH4 = 99.6%); experimental link to kidney function unknown |
| | rs4971092 (0.88, 83.1%), eQTL tubulo-interstitial | New coloc tubulo-interstitial (PPH4 = 99.3%); link to rare Mendelian disease with potential kidney involvement |
| | rs11556924 (0.38, 84.1%), p.Arg363His (27.5) | Experimental link to kidney function unknown |
| Novel locus | ||
| | rs17602729 (0.13, 96.0%), p.Gln45Ter (36.0) | Experimental link to kidney function unknown |
| | rs6084180 (0.80, 82.4%), eQTL glomerulus | Experimental link to kidney function unknown |
| | rs1800574 (0.03, 48.4%), p.Ala98Val (22.7) | Two variants with identical PPA; less-frequent variant in Mendelian disorder gene with kidney phenotype, previously associated with urate[ |
| | rs112905092 (0.017, 81.2%), Intron (18.8) | Experimental link to kidney function unknown |
| | rs3814995 (0.31, 91.5%), p.Glu117Lys (25.0) | rs3814995 as common variant in rare Mendelian kidney disorder gene not reported before |
| | rs117739035 (0.037, 65.9%), p.Ser80Tyr (23.5) | Experimental link to kidney function unknown |
Here we present details on the 23 genes (among the 32 identified by the GPS approach on eGFRcys/BUN-validated, small set relevant variants, Fig. 4) that showed adequate fine-mapping resolution for the respective protein-relevant or kidney-tissue regulatory-relevant variant (PPA ≥10%) and novel evidence compared to the previous work[7]. A detailed description of the genes can be found in Supplementary Note 3.
Fig. 5 Specific expression in GTEx and DEPICT tissues.
Shown are tissue-specific enrichment P values from gene expression enrichment analyses. a Enrichment analyses in GTEx tissues and cell types (FDR <5%). b Tissue- and cell-type-specific enrichment analysis by DEPICT (FDR <5%). Both analyses were conducted twice: based on all 5906 genes located at the 424 identified eGFRcrea loci and based on the subset of 4941 genes located at the 348 eGFRcys/BUN-validated loci. The enrichment in muscle tissue is attenuated after focusing on eGFRcys/BUN-validated loci in both approaches. Significance lines approximately refer to an FDR of 5%. P values are derived from a one-sided resampling-based enrichment test (“Methods”).
Fig. 6 Specific expression in single-cell RNA-seq datasets of the human mature kidney.
Shown are results from gene expression enrichment analyses (based on 4941 genes located at eGFRcys/BUN-validated loci) and heatmaps of expression z scores for the 23 genes highlighted by Table 2. a Enrichment in 17 cell types by Wu et al. b Enrichment in 27 cell types by Stewart et al. P values are derived from a one-sided resampling-based enrichment test (“Methods”). c Expression heatmap for 21 of the 23 genes in cell types by Wu et al. (AMPD1 and CPXM1 not specifically expressed in any cell type by Wu et al., Supplementary Data 20). d Expression heatmap for 22 of the 23 genes in cell types by Stewart et al. (GALNTL5 not specifically expressed in any cell type by Stewart et al., Supplementary Data 20). In a and b, shown are the enrichment P values; significance lines approximately refer to an FDR of 5%. AVRE ascending vasa recta endothelium, B B cell, CD4-T CD4 T cell, CNT connecting tubule, DVRE descending vasa recta endothelium, DCT distal convoluted tubule, EC endothelial cells, EPC epithelial progenitor cell, Fib fibroblast, GE glomerular endothelium, IC intercalated cells, LOH Loop of Henle (ATL ascending thin limbs, DTL descending thin limbs), Mast mast cell, MNP mononuclear phagocyte, MFib myofibroblast, NK natural killer cell, Neutro neutrophil, PCE peritubular capillary endothelium, Podo podocytes, PC principal cells, PT proximal tubule, TE transitional epithelium of ureter.
Table 3 Colocalization analysis results for selected genes.
| Locus ID | Signal ID | Gene | PP_H4 | rsid | EA | OA | BETA (gene expression) | P (gene expression) | FDR (gene expression) | BETA (eGFRcrea) | P (eGFRcrea) | PPA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expression in tubule-interstitial tissue | | | | | | | | | | | | |
| k4.1 | 1 | | | rs10224210 | C | T | 0.67 | 1.0E-05 | | −0.0078 | 2.9E-139 | 0.50 |
| k21.2 | 2 | | 0.001 | rs58436159 | T | C | −0.58 | 1.1E-05 | | −0.0051 | 1.6E-23 | 0.13 |
| k27.1 | 1 | | | rs11062102 | C | T | −0.29 | 3.4E-07 | | −0.0041 | 2.3E-47 | 0.47 |
| k88 | 1 | | | rs434215 | A | G | 0.57 | 7.4E-06 | | −0.0039 | 6.4E-26 | 0.93 |
| k99.1 | 1 | | | rs2314639 | T | C | −0.47 | 1.0E-10 | | −0.0035 | 5.0E-17 | 0.07 |
| k191.2 | 2 | | | rs4971092 | T | C | −0.73 | 2.9E-05 | | −0.0027 | 1.0E-10 | 0.83 |
| n95.1 | 1 | | 0.158 | rs6084184 | A | G | −0.43 | 5.1E-04 | 0.19 | −0.0019 | 2.0E-08 | 0.07 |
| Expression in glomerular tissue | | | | | | | | | | | | |
| k4.1 | 1 | | 0.033 | rs10224210 | C | T | 0.15 | 0.44 | 0.99 | −0.0078 | 2.9E-139 | 0.50 |
| k21.2 | 2 | | 0.035 | rs2203002 | T | C | 0.22 | 0.0976 | 0.94 | −0.0051 | 1.2E-23 | 0.14 |
| k27.1 | 1 | | 0.039 | rs11062102 | C | T | −0.11 | 0.12 | 0.95 | −0.0041 | 2.3E-47 | 0.47 |
| k88 | 1 | | 0.043 | rs434215 | A | G | 0.19 | 0.21 | 0.97 | −0.0039 | 6.4E-26 | 0.93 |
| k99.1 | 1 | | | rs72629024 | G | C | −0.46 | 4.9E-07 | | −0.0036 | 3.5E-18 | 0.85 |
| k191.2 | 2 | | 0.164 | rs4971092 | T | C | 0.30 | 0.037 | 0.90 | −0.0027 | 1.0E-10 | 0.83 |
| n95.1 | 1 | | 0.731 | rs6084180 | T | C | −0.87 | 1.9E-09 | | −0.002 | 1.3E-08 | 0.82 |
For the seven genes with small 99% credible sets (<= 5 variants, among 23 highlighted genes from Table 2) that contain significant eQTLs in kidney tissue, we here show results from colocalization analysis between eGFRcrea association signals (n = 1,004,040) and gene expression signals for two types of kidney tissues from NEPTUNE (tubule-interstitial and glomerular tissue, n = 187). PP_H4 is the posterior probability of positive colocalization[41]. We also show the respective credible set variant with the smallest P value for gene expression and its association estimates for gene expression (NEPTUNE data) and eGFRcrea (GWAS data) (EA: effect allele, OA: other allele, BETA: genetic effect per EA, P: two-sided association P value based on Wald test, FDR: false-discovery-rate, PPA: posterior probability of association from variant-based fine-mapping). Locus/Signal ID: Identifier of identified locus/signal (“n” novel, “k” known; first integer indicating the locus, second integer the signal within the locus). Marked in bold are positive colocalizations (PP_H4 ≥80%) and significant eQTLs (FDR <5%).
Table 4 Explained variance and genetic risk score analyses.
(a)

| Study | n | Study design | Number of variants | sd of age-/sex-adjusted log eGFRcrea in the respective study | R² |
|---|---|---|---|---|---|
| UKB | 436,581 | Population-based | 634 | 0.15 | 9.3% |
| HUNT | 69,389 | Population-based | 625 | 0.15 | 6.7% |
| MGI | 47,219 | Hospital-based | 620 | 0.28 | 3.7% |
| MVP | 300,680 | Hospital-based | 620 | 0.28 | 4.1% |
| Second meta | 417,288 | Meta-analysis | 632 | 0.13ᵃ | 9.8% |
| | | | | 0.28ᵇ | 2.0% |
ᵃ From population-based ARIC (as in Wuttke et al.[7]).
ᵇ From hospital-based MVP.
Shown are results from the explained variance and genetic risk score (GRS) analysis based on the 634 identified signal index variants. (a) Summary of explained variance analyses based on summary statistics from population- and hospital-based studies or from the second meta-analysis. UKB was part of the primary identifying meta-analysis. HUNT, MVP and MGI were independent studies, which were meta-analysed as the second meta-analysis (n = 417,288). The variance explained by the 634 signal lead variants (R²) was computed based on genetic effects, genotype and phenotype variance from the respective study. Since phenotype variance was not available for the second meta-analysis, we here assumed phenotype variances taken from the population-based study ARIC or from the hospital-based study MVP. (b) Summary of GRS analyses. For the GRS, two example studies of different age range were analysed, each independent from the identifying GWAS meta-analysis: HUNT (n = 26,254, population-based, age 19–99 years, sd of age-/sex-adjusted eGFRcrea = 11.9 ml/min/1.73 m²) and AugUR (n = 1105, population-based of mobile elderly, age 70–95 years, sd of age-/sex-adjusted eGFRcrea = 14.6 ml/min/1.73 m²). The unweighted and weighted GRS (i.e., using effect sizes from eGFRcrea GWAS) were computed and the association of the GRS with eGFRcrea (not log-transformed) and the variance explained (R²) were derived via linear regression with GRS as covariate and eGFRcrea as outcome (adjusted for age, sex and principal components, “Methods”; GRS and eGFRcrea descriptives in Supplementary Table 1).
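As a rough illustration of the two quantities in this table, the following Python sketch computes explained variance from summary statistics and a per-person GRS. It assumes independent variants in Hardy-Weinberg equilibrium, so that each variant contributes 2·f·(1−f)·β² to the genetic variance; the function names and input layout are hypothetical, and the study's exact computation may differ.

```python
def explained_variance(signals, phenotype_sd):
    """Approximate R^2 from (effect_allele_frequency, beta) pairs.

    Assumes independent variants in Hardy-Weinberg equilibrium, so the
    genotype variance of each variant is 2 * f * (1 - f).
    """
    genetic_var = sum(2.0 * f * (1.0 - f) * beta * beta for f, beta in signals)
    return genetic_var / phenotype_sd ** 2


def genetic_risk_score(dosages, betas=None):
    """Unweighted GRS (sum of effect-allele dosages) or, if `betas` is
    given, weighted GRS (dosages weighted by GWAS effect sizes)."""
    if betas is None:
        return float(sum(dosages))
    return sum(d * b for d, b in zip(dosages, betas))
```

Under this formula the same set of effects explains a smaller share of variance when the phenotype sd is larger. Rescaling the 9.8% obtained with sd = 0.13 to sd = 0.28 gives 9.8% × (0.13/0.28)² ≈ 2.1%, close to the 2.0% reported in the table (the sd values shown are rounded).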