Chengxiang Qiu1, Shizheng Huang1, Jihwan Park1, YoSon Park2, Yi-An Ko1,2, Matthew J Seasock1, Joshua S Bryer1, Xiang-Xi Xu3, Wen-Chao Song4, Matthew Palmer5, Jon Hill6, Paolo Guarnieri6, Julie Hawkins6, Carine M Boustany-Kari6, Steven S Pullen6, Christopher D Brown2, Katalin Susztak7,8. 1. Department of Medicine, Renal Electrolyte and Hypertension Division, University of Pennsylvania, Philadelphia, PA, USA. 2. Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. 3. Department of Cell Biology, Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA. 4. Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine at University of Pennsylvania, Pennsylvania, PA, USA. 5. Pathology and Laboratory Medicine at the Hospital of the University of Pennsylvania, Philadelphia, PA, USA. 6. Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT, USA. 7. Department of Medicine, Renal Electrolyte and Hypertension Division, University of Pennsylvania, Philadelphia, PA, USA. ksusztak@pennmedicine.upenn.edu. 8. Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA. ksusztak@pennmedicine.upenn.edu.
Abstract
Chronic kidney disease (CKD), a condition in which the kidneys are unable to clear waste products, affects 700 million people globally. Genome-wide association studies (GWASs) have identified sequence variants for CKD; however, the biological basis of these GWAS results remains poorly understood. To address this issue, we created an expression quantitative trait loci (eQTL) atlas for the glomerular and tubular compartments of the human kidney. Through integrating the CKD GWAS with eQTL, single-cell RNA sequencing and regulatory region maps, we identified novel genes for CKD. Putative causal genes were enriched for proximal tubule expression and endolysosomal function, where DAB2, an adaptor protein in the TGF-β pathway, formed a central node. Functional experiments confirmed that reducing Dab2 expression in renal tubules protected mice from CKD. In conclusion, compartment-specific eQTL analysis is an important avenue for the identification of novel genes and cellular pathways involved in CKD development and thus potential new opportunities for its treatment.
Chronic kidney disease (CKD), a condition in which the kidneys are unable to clear waste products, affects 700 million people globally. Genome-wide association studies (GWASs) have identified sequence variants for CKD; however, the biological basis of these GWAS results remains poorly understood. To address this issue, we created an expression quantitative trait loci (eQTL) atlas for the glomerular and tubular compartments of the human kidney. Through integrating the CKD GWAS with eQTL, single-cell RNA sequencing and regulatory region maps, we identified novel genes for CKD. Putative causal genes were enriched for proximal tubule expression and endolysosomal function, where DAB2, an adaptor protein in the TGF-β pathway, formed a central node. Functional experiments confirmed that reducing Dab2 expression in renal tubules protected mice from CKD. In conclusion, compartment-specific eQTL analysis is an important avenue for the identification of novel genes and cellular pathways involved in CKD development and thus potential new opportunities for its treatment.
Chronic kidney disease (CKD) affects close to 10% of the population
worldwide[1,2]. Despite advances in therapeutic methods such
as dialysis and transplantation, CKD remains the 9th leading cause of
death in the world[3,4]. The precise molecular and cellular
mechanisms underlying CKD pathogenesis are still poorly understood[5]. As a result, current treatments are
mostly palliative rather than truly curative.It is estimated that 20% of individuals with CKD in the US harbor potentially
identifiable and causal (for CKD) mutations in a single gene[6,7]. The
majority of individuals with CKD exhibit a polygenic architecture with an estimated
heritability proportion of around 30%−50%[8]. Large-scale genome-wide association (GWA) studies using
more than 100,000 subjects were performed to unravel the genetics of kidney function
and CKD in the general population and successfully identified a total of 53 loci
significantly associated with kidney disease[9]. Currently, the total number of loci with genome-wide
significant association with kidney disease traits are 83[9-12]. As seen for other traits, more than 95% of variants
significantly associated with kidney function are in the non-coding regions of the
genome[13,14].GWA studies often label risk allele regions by their proximity to specific
genes. However, regulatory elements can act over large distances and in a
cell-type-specific manner; thus, making the identification of disease-causing genes
and their mechanism exceedingly difficult[15,16]. For example,
specific genes that explain the common genetic variation associated with
hyperlipidemia, diabetes and heart disease have been uncovered only for a handful of
loci[17-19]. These studies support a model that causal
nucleotide variants are localized to cell-type-specific regulatory regions, such as
enhancers, that alter transcription factor binding and induce quantitative
differences in transcript level of nearby causal genes[20,21].
One such example is the FTO region, where variants in the intronic region of FTO
show strong association with obesity, but the actual causal genetic variation was
found to be an enhancer modulating the expression of the transcription factors IRX3
and IRX5 that are encoded nearby the FTO gene[22]. These observations highlight the critical importance of
careful, unbiased in-depth follow-up studies to GWA studies to better confirm or
identify the actual causal genes for a given pathological condition.Several large collaborative efforts have been set up to better understand
complex trait variations. The Encyclopedia of DNA Elements (ENCODE) project was an
important international effort to identify regulatory regions that could harbor
disease-causing variants. The project generated regulatory region maps for large
number of cell types and organs[23,24]. Similarly, the Genotype-Tissue
Expression (GTEx) project provided one of the largest and most comprehensive
catalogs of whole genome and transcriptome sequencing data, comprised of up to 53
human tissue samples across 635 individuals defining an association between genetic
variant and gene expression; also called expression of quantitative trait (eQTL)
analysis[25-27]. Nonetheless, the human kidney has
been poorly covered both by ENCODE[28,29] and
GTEx[25-27] studies. To address this issue, our group
has previously generated histone Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
data to annotate kidney-specific regulatory regions and performed eQTL analyses on
96 human whole kidney tissue samples[30]. As a result of this whole tissue-based analysis, we identified
putative disease associated genes for five of the 83 previously identified
CKD-associated GWAS loci. Candidate gene and animal model studies indicated that
UMOD and SHROOM3 are likely causal genes for
CKD GWAS variants[31,32]. Causal genes and pathways for the remaining
76 loci remain unknown to date.Here we argue that cell-type heterogeneity of the eQTL dataset, in addition
to the sample size limitation, are the key contributors to the low yield of
identifying causal genes for CKD using the GWAS-eQTL integration approach[24,33-35]. Our
recent single-cell transcriptome analysis highlighted important cell-type
convergence, indicating that diseases that present with similar phenotypes originate
from the same cell types[36]. We
propose that diseases are not organ-specific but, rather, cell-type-specific;
therefore, genetic variants are localized to cell-type-specific regulatory regions
and influence gene expression changes only in disease-causing cell types[35,37,38].As a first step towards identifying disease genes of CKD, we performed a
compartment-based eQTL analysis of human kidney tissue samples using manual
microdissection of the glomerulus and tubule, which are two key compartments of this
organ. This microdissection significantly reduces cell heterogeneity as each
compartment is composed of around only five cell types[36]. We aimed to define genotype-driven gene
expression changes in the glomerular and tubular compartments of human kidneys,
identifying genetic variants that influence the expression of genes. Here, we call
genetic variants that influence gene expression eVariants and their target genes
eGenes.Subsequently, we integrated this information with genotype and phenotype
association studies (that is, GWAS hits) to identify genes for which expression in
the kidney shows differences in individuals with GWAS-identified variants (Supplementary Fig. 1a). We
show that compartment-based eQTL data significantly improves identification of genes
for which expressions are regulated by GWAS-identified variants. Furthermore, we
integrated the kidney eQTL data with epigenomic data and transcriptome analysis from
single-cell RNA sequencing (RNA-Seq) to study the regulatory mechanism of the
cell-type-specific eQTL effects of disease variants. Finally, we performed
cell-type-specific gene expression manipulations in animal models and specifically
demonstrated that DAB2 is likely a causal gene for CKD development.
Our study provides a novel genetic framework for CKD development as it defines key
cell types and novel mechanisms involved in the disease.
Results
Compartment-based eQTLs in the human kidney
We separated human kidney tissue compartments, in particular glomeruli
and tubules, by manual microdissection followed by RNA-Seq of each compartment
(Supplementary Fig.
1b). The expression of tubule epithelial-specific markers such as
SLC12A1 and SLC34A1 were significantly
greater in tubules (P < 2.2 ×
10−16 and P = 3.59 ×
10−11, respectively; two-sided Student’s
t test), while glomerulus epithelial-specific genes were
almost exclusively expressed in glomeruli (Supplementary Fig. 1c). Well-known
nephrotic syndrome genes showed preferential expression in glomerular
compartment and proximal tubulopathy genes expressed in tubules (Supplementary Fig. 1d). We
validated that the fraction of each cell type was similar in the kidney samples
included in the analysis using in silico cell deconvolution
analysis that estimates cell-type proportions based on latent variable
modeling[39,40] (Supplementary Fig. 1e).
Furthermore, tissue samples underwent careful clinical and histological
evaluation, and we included samples only without significant structural and
functional changes in the analysis to minimize non-genetically driven gene
expression fluctuations (Supplementary Table 1).Using these stringent criteria, we included 151 kidneys in the analysis,
including 121 tubule samples and 119 glomerulus samples used to identify
compartment-based cis-eQTLs (further referred to as eQTLs)
(Supplementary Fig.
1f). We performed eQTL analysis separately for the two compartments
by using a linear model and ± 1Mb window around the transcription start
site (TSS). In the tubular compartment, we identified 4,081 genes with eQTLs at
a 5% FDR (hereafter referred to as eGenes) and 389,454 significant SNP-gene
pairs; in the glomerular compartment, 4,913 eGenes and 467,994 significant
SNP-gene pairs. We made this dataset available via our searchable website
(http://susztaklab.com/eqtl). We have also
performed meta-analysis on tubules and glomeruli eQTLs and have defined
tubule-compartment-specific eGenes (n = 417),
glomerulus-compartment-specific eGenes (n = 674), and
compartment-shared eGenes (n = 3,493) (Fig. 1a).
Figure 1:
Summary of kidney compartment-based eQTL analysis. a,
Diagram indicating the work flow (and results) by which eGenes were identified
in the tubular and glomerular compartment of human kidneys. b,
Diagram illustrating the number of tubule-specific (top) and glomerulus-specific
(bottom) eGenes identified by meta-analysis of eQTLs from 46 human tissues (44
GTEx tissues plus 2 kidney compartments). c, Overlap of tubule
(top) and glomerulus (bottom) eQTLs with hits of various GWA studies from GWAS
catalogue. The x-axis shows the proportion of GWAS variants of
each GWAS trait overlapped with tubule and glomerulus eQTLs (when the GWAS
varuants and eVariants are located in the same LD,
r2 > 0.8). The y-axis
shows enrichment of tubule and glomerulus eGenes by genes around the hits of
each GWAS trait. –log10 (P) was calculated by
two-sided Fisher’s exact test (height was used for background
estimation). Red dots, metabolite traits; blue dots, kidney-related traits; grey
dots, other traits. Orange line, statistical significance threshold.
d, The 15 GWAS traits showing the strongest enrichment
(y-axis) for tubule- and glomerulus-specific eGenes (left)
or shared eGenes (right). Tubule-specific eGene: n = 589;
tubule-shared eGene: n = 7050; glomerulus-specific eGene:
n = 594; glomerulus-shared eGene: n =
7090. –log (P) was calculated by two-sided
Fisher’s exact test (x-axis).
The number of eGenes identified is in line with expectations (Supplementary Fig. 2a),
from previous publications of 44 human tissues of the GTEx Consortium[25]. Furthermore, many of the
significant SNP-gene pairs identified by our tubule eQTL analysis can be
replicated in other tissues (thyroid (π1 =
0.79) and transformed fibroblasts cells (π1 =
0.77), π1 refers to the proportion of true
positives). As expected, the greatest overlap for tubule was observed in
glomeruli (π1 = 0.91) (Supplementary Fig. 2b).By performing meta-analysis of kidney compartments and 44 GTEx tissue
samples, we could identify 589 tubule-specific eGenes (m
> 0.9 in less than five tissues, including tubule, m
refers to the posterior probability that the effect is shared in each tissue)
and 594 glomerulus-specific eGenes. We found 7,050 shared eGenes in the tubules
(m > 0.9 in more than 40 tissues, including tubule);
and 7,090 shared eGenes in the glomerulus. These results are in line with
previous observations indicating that most eGenes were shared across
tissues[25] (Fig. 1b).Our compartment-based eQTL study identified many novel eQTLs that were
previously not detected by bulk tissue, whole kidney eQTL analysis (Supplementary Table 2).
For example, LRRC3 is associated with rs2838917 genotype only
in the tubular compartment, but not in whole kidney or glomerulus (Supplementary Fig. 2c,
d). In contrast, ANXA2 is associated with rs3068
genotype only in the glomerular compartment (Supplementary Fig. 2e, f). Of note,
we found one gene, NUCB1, had an opposite eQTL effect direction
between tubule and glomerulus (Supplementary Fig. 2g). These examples highlight the utility of
compartment-based eQTL analysis to identify a large number of novel eQTLs.
Renal compartment eQTLs are enriched for kidney-associated GWAS trait
variants
By assessing overlap between the entire published GWAS
catalogue[41]
(downloaded 1/4/2017) and our kidney compartment eQTLs, we found that GWAS hits
in genes encoding proteins that are related to regulating the levels various
blood metabolites were significantly enriched for tubule eQTLs. 26% of
metabolite GWAS hits colocalized with tubule eQTLs and its associated genes were
significantly enriched for tubule eGenes (P = 3.04 ×
10−6, Fisher’s exact test). Moreover, tubule eGenes
were also significantly enriched for genes for kidney disease traits, such as
IgA nephropathy (P = 2.12 × 10−3). A
similar pattern of kidney and metabolite trait enrichment was seen in the
glomerulus eQTL dataset (Fig. 1c).Next, we focused on our newly identified tubule-specific and
glomerulus-specific eGenes by assessing overlap with the GWAS catalogue.
Tubule-specific eGenes, compared to non-tubule-specific eGenes, were
significantly enriched for CKD GWAS hits (P = 4.79 ×
10−3, Fisher’s exact test) and glomerular
filtration rate GWAS hits (P = 7.96 ×
10−3). In contrast, these tubule-shared eGenes were
enriched in genes for inflammatory skin disease (P = 5.94
× 10−3) and Parkinson’s disease
(P = 2.95 × 10−2), compared with
non-tubule-shared eGenes. Glomerulus-specific eGenes were not only enriched for
CKD GWAS variants but also for blood pressure-associated genetic variants (Fig. 1d). In summary, these results indicate
that kidney-specific eQTLs show enrichment for kidney-associated phenotypic
traits, but shared eQTLs showed no enrichment for renal traits, thus confirming
the specificity and biological relevance of compartment-based eQTL analysis.
Integration of CKD GWAS variants and compartment-based eQTLs to identify
causal genes for kidney disease
Our goal was to annotate CKD GWAS variants to identify genes and
pathways responsible for CKD development. By literature survey, we identified a
total of 83 replicated, CKD-associated GWAS loci[9-12] (Supplementary Table 3). A direct overlap of loci that reached
genome-wide significance in GWAS for CKD and in our eQTL data identified
differences in the expression level of 27 genes associated with CKD GWAS loci.
To test whether two traits share a causal variant at a given locus, we conducted
colocalization analysis using coloc[42]. And, in a complementary analysis, to
assess whether a GWAS SNP tags the same functional variant as the eQTL, we
calculated RTC scores[43] for
each candidate SNP-gene pair. Most of these genetic regions showed a positive
colocalization between the eQTL effect and CKD GWAS (posterior probability of
colocalization between eQTL and GWAS effect in the given region (PP_H4) >
0.8), suggesting that these genes are strong causal candidates for kidney
disease (Table 1). The functional role of
MANBA, PGAP3 and CASP9 for kidney disease
development has been supported by prior animal model studies, confirming the
biological validity of our computational analysis[44,45].
Table 1:
Colocalization of CKD GWAS leading SNPs with kidney compartment eQTLs.
a, GWAS variant and eQTL variant are not the same one, but in the same LD
(r[2]
> 0.8); b, The regulatory direction of risk allele of each SNP on its
target gene. ↑, higher gene expression with risk allele; ↓, lower
gene expression with risk allele; N/A, this gene or SNP was excluded by data
pre-processing.
Compartment
Chr
GWAS SNP
Risk Allele
Gene
eQTL β
eQTL P
Regulatory directionb
Regulatory direction in Whole
Kidneyb
PP_H3 (coloc)
PP_H4 (coloc)
RTC
Tubule
1
rs12136063
G
SORT1
0.717
6.85 ×
10−8
↓
↓
0.049
0.943
1
1
rs12124078
G
CASP9
0.832
1.13 ×
10−9
↑
↑
0.021
0.978
1
2
rs1260326
C
NRBP1
−0.614
6.52 ×
10−7
↓
N/A
0.003
0.981
1
3
rs347685
A
TFDP2
−0.773
4.07 ×
10−8
↓
↓
0.014
0.979
1
5
rs11959928
A
DAB2
0.566
3.52 ×
10−6
↑
↑
0.039
0.938
1
7
rs228611
A
MANBAa
−0.572
1.79 ×
10−5
↓
↓
0.682
0.294
0.906
7
rs228611
A
CISD2
0.580
1.10 ×
10−5
↑
↑
0.665
0.322
1
7
rs10277115
T
UNCX
0.708
3.84 ×
10−6
↑
N/A
0.002
0.916
1
10
rs10994860
C
ASAH2B
1.153
3.13 ×
10−11
↓
↓
0.257
0.743
1
11
rs4014195
G
MAP3K11a
0.482
1.90 ×
10−4
↑
↑
0.152
0.677
0.843
12
rs10774021
T
SLC6A13
−0.709
4.14 ×
10−8
↓
↓
0.002
0.994
1
13
rs626277
A
DACH1
0.669
5.61 ×
10−8
↓
↓
0.023
0.973
1
15
rs2928148
G
CHAC1
−0.681
2.95 ×
10−7
↑
↑
0.036
0.944
1
16
rs164748
G
CHMP1A
−0.766
5.38 ×
10−10
↓
↓
0.017
0.982
1
16
rs164748
G
DPEP1
0.811
3.09 ×
10−11
↑
↑
0.023
0.975
1
17
rs11078903
A
PGAP3
−0.809
1.38 ×
10−9
↓
↓
0.032
0.967
1
17
rs11078903
A
FBXL20
0.655
2.06 ×
10−6
↑
↑
0.031
0.920
1
glom
1
rs12136063
G
ATXN7L2a
0.629
1.26 ×
10−5
↓
↓
0.034
0.801
0.896
1
rs12124078
G
CASP9
0.798
9.29 ×
10−8
↑
↑
0.021
0.968
1
2
rs13538
A
ALMS1Pa
0.572
4.72 ×
10−5
↓
↓
0.041
0.717
0.998
2
rs1260326
C
NRBP1a
−0.499
5.79 ×
10−5
↓
N/A
0.007
0.873
0.988
2
rs13538
A
NAT8
−0.609
9.34 ×
10−6
↑
↑
0.048
0.814
1
5
rs12654812
A
RGS14
−0.740
2.60 ×
10−8
↓
N/A
0.273
0.162
1
5
rs12654812
A
SLC34A1a
−0.608
1.16 ×
10−5
↓
N/A
0.021
0.525
0.97
6
rs7759001
A
ZNF391
1.081
6.18 ×
10−16
↑
↑
0.095
0.894
1
10
rs10994860
C
ASAH2B
1.261
1.11 ×
10−13
↓
↓
0.209
0.791
1
11
rs4014195
G
DPP3a
−0.510
4.94 ×
10−5
↓
↓
0.006
0.867
0.973
15
rs2928148
G
CHAC1
−0.592
2.05 ×
10−6
↑
↑
0.039
0.864
1
15
rs2467853
G
SPATA5L1
0.729
1.08 ×
10−8
↑
↑
1.000
0.000
1
16
rs164748
G
CHMP1Aa
−0.587
1.06 ×
10−5
↓
↓
0.022
0.941
0.988
16
rs164748
G
DPEP1
0.889
7.54 ×
10−13
↑
↑
0.021
0.978
1
17
rs11078903
A
PGAP3
−0.660
5.55 ×
10−7
↓
↓
0.032
0.945
1
20
rs6088580
C
ACSS2
0.567
1.43 ×
10−5
↑
↑
0.332
0.592
1
20
rs6088580
C
MAP1LC3A
0.713
3.34 ×
10−8
↑
↑
0.015
0.980
1
In summary, when compared to previous eQTL studies of whole kidney
tissue samples, the compartment-based eQTL analysis could identify larger number
of causal genes for CKD (n = 5 for bulk tissue and
n = 27 for specific compartment). Furthermore, despite the
significant eQTL sharing (π1 = 0.91) between
tubule and glomerulus, more identified GWAS causal genes came from the smaller,
newly discovered compartment-specific eQTL subset.
Compartment-specific eQTLs are enriched in distal regulatory regions and show
greater cell-type specificity
We hypothesized that integration of GWAS with compartment eQTLs can
identify potential causal cell types for a given set of traits (Fig. 2a). To further understand whether the 27 genes
with GWAS overlapping compartment-based eQTLs show cell-type specificity, we
have examined the expression of putative GWAS target genes in kidney cells. We
have previously performed single-cell RNA-Seq on mouse kidney tissue samples and
our analysis has identified 16 distinct cell types in the kidney[36]. Using this unique expression
dataset (only use 13 well-annotated cell types), we next examined the cell types
where putative GWAS target genes were expressed. We were able to map the
expression of 23 out of the 27 human target genes in our mouse kidney
single-cell atlas. We found that most CKD-associated genes have a unique
cell-type-enriched expression profile and these transcripts are not broadly
expressed in all kidney cells. Particularly, we found that renal proximal
tubules show the greatest enrichment for GWAS-eQTL target genes, where 39% of
genes (9 out of 23) were expressed in proximal tubule epithelial cells, which is
markedly higher than what is expected if genes were randomly distributed in
kidney cells (23 genes/13 cell types; ~1 gene per cell type) (Fig. 2b).
Figure 2:
Compartment-specific eQTLs show greater cell-type specificity and distal
regulatory element enrichment. a, Diagram describing the
integration of kidney eQTLs, GWAS, single-cell expression and regulatory region.
b, Heatmap of cell-type-specific expression of identified CKD
target genes. The blue/yellow color corresponds to the level of expression
(z-score). Endo, endothelial; Podo, podocyte; PT, proximal tubule; LOH, loop of
Henle; DCT, distal convoluted tubule; CD-PC, collecting duct principal cell;
CD-IC, collecting duct intercalated cell; Fibro, fibroblast; Macro, macrophage;
Neutro, neutrophil; NK, natural killer cell. c, Density plots of
best eVariants in tubule (top) and glomerulus (bottom) and the relationship to
transcription start site (TSS). d, Distance of top eVariants from
TSS (-log10) by groups. e, Odds ratios of the top
eVariants on kidney promoter by groups. The groups were compared to randomly
selected variants matched by MAF and distance to TSS (n = 5,000
randomly selected times). Center lines show the medians; box limits indicate the
25th and 75th percentiles; whiskers extend to the
5th and 95th percentiles, outliers are represented by
dots (d, e). f, Odds ratios (y-axis)
of eGenes from each group enriched by kidney-specific cell type expression.
P was calculated by two-sided Fisher’s exact test.
RTEC: PT, LOH and DCT, Myeloid: Fibro, Macro and Neutro; Lymphoid: B cell, T
cell and NK. Tubule, tubule-specific-eGenes: n = 417; Share,
compartment-shared-eGenes: n = 3,493; Glom,
glomerulus-specific-eGenes: n = 674 (d-f).
g, Genome scale integrated analysis of gene network (GIANT)
visualization of tubule-specific candidate genes. The color of each link shows
the relationship confidence, from green (0.01) to red (1).
As observed by GTEx and most prior studies, eQTLs identified by our
analysis were mostly enriched on promoters (Supplementary Fig. 3a). Upon
examining for binding sites of transcription factors, we found enrichment for
SAP30, KDM5A, HDAC1, CREB1, ELK4 and ESRRA, key transcription factors that are
proposed to be involved in maintaining the high metabolic rate of proximal
tubule cells (Supplementary
Fig. 3b).We reasoned that if compartment-based eQTLs are more informative for
GWAS annotation, then compartment-specific eGenes must arise from variants that
are localized to cell-type-specific regulatory regions such as distant
regulatory regions (e.g., enhancers), rather
than proximal regulatory regions (i.e.,
promoters) that are mostly shared between cell types. When we compared shared
and compartment-specific eGenes, we found that their eQTL variants were
significantly further away from TSS for compartment-specific compared with
shared eGenes, indicating that compartment-specific eQTLs might be linked to
more distal (i.e., enhancers) regulatory
variants (Fig. 2c, d).To further investigate this finding, we compared the genomic feature
annotations for compartment-specific eGenes and compartment-shared eGenes, using
ChIP-Seq data derived from adult human kidney. Compartment-shared eGenes were
significantly more enriched in promoters than compartment-specific eGenes
(P < 2.2 × 10−16,
Wilcox-test) (Fig. 2e). Moreover, we
examined whether cell-type-specific enrichment would explain the identification
of compartment-specific eGenes. We found that glomerulus-compartment-specific
eGenes were significantly enriched for expression in podocyte
(P = 2.91 × 10−7, Fisher’s
exact test), and tubule-compartment-specific eGenes were enriched for expression
in tubule epithelial cells (P = 6.14 ×
10−2, Fisher’s exact test), indicating that
disease-associated target genes are more likely to be cell-type-specific (Fig. 2f).To gain further insight into kidney disease pathogenesis, we have
performed functional annotation of the 27 genes identified by colocalization
between kidney compartment-based eQTLs and CKD GWAS. By using the pathway
analysis described in GIANT[46],
we found that these genes identified by tubule compartment were significantly
enriched for a single functional group: endo-lysosomal function (FDR = 1.5
× 10−2) (Fig. 2g).
Combining results from both kidney compartments showed enrichment not only for
endo-lysosomal genes but related pathways such as autophagy (FDR = 3.59 ×
10−2) and mitochondrial degradation pathway (FDR = 4.58
× 10−2) (Supplementary Fig. 3c). These
results indicate that the expression of the target genes for kidney function are
enriched for a limited number of functional groups. Specifically, we noticed
that DAB2 had the greatest number of connections in the
functional network and represented a central hub.
Identification of DAB2 as a kidney disease gene
To leverage our unique eQTL datasets to identify kidney-specific causal
genes for kidney disease, we focused on the DAB2-Complement C9
(C9) CKD GWAS for follow-up functional validation studies.
rs11959928 at the DAB2-C9 locus has been significantly
associated with CKD in multiple CKD GWA studies[9,10,12]. The DAB2 and
C9 genes are located near rs11959928; however, it was
unclear which gene is regulated by the disease-causing variant. In whole kidney
level eQTL analysis, the rs11959928 genotype was not significantly associated
with either DAB2 or C9 levels[30]. Compartment-based eQTL
analysis identified an association between the rs11959928 genotype and
DAB2 level, but not with C9 level.
Furthermore, this eQTL was only identified in tubule eQTLs, not in glomerulus
eQTLs (Fig. 3a, b). The tubule-specific
eQTL effect of rs11959928 on DAB2 can be replicated by a recent
kidney cis-eQTLs study[47] (Supplementary Fig. 4a). Although DAB2 is broadly
expressed in multiple tissues and cell types, rs11959928 was not identified as a
significant eQTL in the 44 GTEx tissue samples[25] (Fig.
4a). Meta-analysis[48] of our kidney compartment and 44 GTEx tissues indicated
that rs11959928 showed tubule-specific eQTL effect only on DAB2
(m = 0.855), rather than on C9
(m = 0.407) (Fig.
4b).
Figure 3:
CKD associated GWAS SNP (rs11959928) specifically influences
DAB2 levels in human kidney tubules. a,
LocusZoom plots of CKD GWAS or eQTL around the region of rs11959928. The
y-axis shows -log (P) of association tests
(by linear regression). Top, GWAS association (genotype and eGFR as outcome);
Bottom, eQTLs (genotype and expression of DAB2 (left) or
C9 (right)) in different compartments (whole kidney, tubule
or glomerulus). Note the overlapping genetic region associated with kidney
function and DAB2 expression. n = 133,814
individuals for CKD GWAS; n = 121 individuals for tubule eQTLs.
n = 119 individuals for glomerulus eQTLs;
n = 96 individuals for whole kidney eQTLs. Dash lines,
significance thresholds (GWAS, P = 5 ×
10−8 ; eQTL, nominal P threshold for
individual gene calculated by permutation). b, Genotype
(rs11959928) and gene expression (DAB2 or C9)
association in human tubules (n = 121) and glomeruli
(n = 119). Center lines show the medians; box limits
indicate the 25th and 75th percentiles; whiskers extend to
the 5th and 95th percentiles, outliers are represented by
dots. P was calculated by linear regression.
Figure 4:
Functional annotation of the genomic area around rs11959928.
a, The effect size (β) of eQTL
association between rs11959928 and DAB2, in tubular
(n = 121) glomerular (n = 119)
compartments and 44 human tissues from GTEx (v6p). The x-axis
shows effect size (β) with 95% CI. b,
Meta-analysis result of association relationship between rs11959928 and
DAB2, using eQTLs of kidney compartments and 44 human
tissues from GTEx. Each dot represents one tissue data point. The
y-axis shows the nominal P of association
in each single tissue. The x-axis shows m, the
posterior probability calculated by METASOFT, for each tissue. c,
Top, LocusZoom plots of GWAS (genotype and eGFR association) and kidney tubule
DAB2 eQTLs (genotype and DAB2 expression
association). The y-axis show -log (P) of association tests (by
linear regression). n = 133,814 individuals for CKD GWAS;
n = 121 individuals for tubule eQTLs. Bottom, histone
H3K27ac enrichment of the genomic region (analyzed by chromatin
immunoprecipitation) in human kidney samples, blood macrophages and the 7 ENCODE
cell lines.
Using single-cell RNA-Seq of mouse kidney tissue samples, we found that
in the kidney, Dab2 expression is restricted to the proximal
tubules and to macrophages (Fig. 2a). While
one prior study reported a significant eQTL effect of rs11959928 for
DAB2 in peripheral blood mononuclear cells (PBMC)[49] that was in the opposite
direction (z-score = −32.69, P = 5.123 ×
10−124) (Supplementary Fig. 4b); however, this eQTL was not replicated in the
whole blood samples of the GTEx dataset (β = 0.032,
P = 0.394) (Supplementary Fig. 4c).Given differences in eQTL effects, we examined the functional annotation
of this candidate GWAS and eQTL locus in the human kidney and other
tissues/cells by comparing the enrichment for histone H3K27ac, a marker that
distinguishes an active enhancer from poised ones. We noticed a binding peak of
H3K27ac that is specifically found in kidney, but not in macrophages or any
other cells analyzed by ENCODE (Fig. 4c).
The kidney-specific enhancer peak coincided with the CKD GWAS region, suggesting
that this region might have a functional role in kidney tubule cells and likely
explain the existence of a kidney eQTL effect for a gene with a relatively broad
expression profile.Furthermore, we also noted that GTEx (GTEx v7, https://www.gtexportal.org/) reported a
kidney-specific DAB2 isoform, when compared to other human
tissues. This isoform contains an exon skipping (chr5:
39,382,720–39,383,373, hg19) and was highly expressed in the kidney.
Using the LeafCutter[50] method
for isoform quantification, we confirmed the presence of this isoform in our
human kidney tubule dataset (Supplementary Fig. 4d). Kidney-specific isoform expression can also
contribute to the specific eQTL effect on DAB2 in tubule cells.
In summary, we have identified the CKD associated variant rs11959928 with renal
tubule-specific effects on DAB2 expression levels. Our results
indicate a kidney-specific isoform expression and a kidney-specific enhancer
peak at this locus that is not present in other cell types and suggest genetic
variation at this enhancer could lead to kidney diseases through higher
DAB2 expression in renal tubules.
Dab2 alters kidney disease development in mice
To determine whether C9 or Dab2 plays
a role in kidney disease development, we performed in vivo
validation using animal models with reduced gene dosage. Global knock-out and
heterozygous C9 mice were phenotypically normal and we did not
observe structural or functional changes in the kidney (Fig. 5a–c). We reasoned that
C9 might alter injury response and might still be
responsible for kidney disease development. Therefore, we subjected wild-type
(WT) and C9+/− mice to folic acid-induced
kidney injury (FAN; folic acidnephropathy) (Supplementary Fig. 5a). Transcript
levels of kidney fibrosis markers including Fibronectin 1
(Fn1), Collagen 1a1
(Col1a1), and Collagen 3a1
(Col3a1) were significantly higher in WT mice when injected
with folic acid. However, we found no differences between WT and
C9+/− mice in this CKD model (Fig. 5a). Similar results were obtained when
analyzing histological lesion on PAS-stained and Sirius Red-stained kidney
sections (Fig. 5b, c).
Figure 5:
Tubule-specific expression of Dab2 influences kidney
disease development in mice. a, Relative mRNA amount of
Fibronectin 1, Fn1; Collagen1a1, Col1a1; and Collagen3a1,
Col3a1 in C9+/+ and
C9+/− mice with or without FA injection.
b, Representative images of PAS-stained and Sirius Red-stained
kidney sections from C9+/+ and C9
+/− mice with or without FA injection. Scale bar: 20μm.
c, Quantification of Sirius Red-stained kidney sections from
C9+/+ and C9
+/− mice with or without FA injection. d,
Relative mRNA amount of Fn1, Col1a1 and
Col3a1 in control and
Kspcre/Dab2flox/+ mice with or
without FA injection. e, Representative images of PAS-stained and
Sirius Red-stained kidney sections from control and
Kspcre/Dab2flox/+ mice with or
without FA injection. Scale bar: 20μm. f, Quantification of
Sirius Red-stained kidney sections from control and
Kspcre/Dab2flox/+ mice with or
without FA injection. g, h, Representative western blots of
p-Smad2, Smad2, p-Smad3, Smad3, p-JNK, JNK, Fibronectin in TGFβ-treated
Dab2+/+ and
Dab2+/− primary TECs from
n = 4 independently repeated experiments. β-actin
was used as a loading control. P values were calculated by
One-way ANOVA with post-hoc Tukey test (a, c, d,
f). C9+/+ Control: n = 8,
C9+/− Control: n = 8,
C9+/+ FA: n = 5,
C9+/− FA: n = 6
(a-c). WT/Dab2flox/+ Control:
n = 5,
Kspcre/Dab2flox/+ Control:
n = 4, WT/Dab2flox/+ FA:
n = 8,
Kspcre/Dab2flox/+ FA:
n = 5 (d-f). Data are represented as mean
± SD.
Next, we studied the role of Dab2, as the CKD GWAS also
influenced the expression of Dab2 in the kidney. We generated
mice with tubule epithelial-specific dose reduction of Dab2
using the well-characterized tubule-specific Kspcre mice and
Dab2flox mice (Supplementary Fig. 5b). The genetic
and eQTL study integration suggested that the risk allele was associated with
higher DAB2 levels in the kidney (Table 1). We did not observe any renal functional or
histological abnormalities in
Kspcre/Dab2flox/+ animals at
baseline. Next, we subjected these animals to FA-induced acute and chronic
kidney injury. A significant lower expression of fibrosis markers
(Fn1, Col1a1 and Col3a1)
at the mRNA level in Kspcre/Dab2flox/+
mice was observed after FA injection (Fig.
5d). We also found that the degree of disease was significantly less
in tubule-specific Dab2 knock-out mice compared with FA-treated
WT mice (Fig. 5e, f), indicating that the
directionality is consistent with the GWAS effect direction. In additional to
the FAN model, we also found a lower level of interstitial fibrosis in mice with
tubule-specific loss of Dab2 subjected to unilateral ureter
obstruction (UUO). (Supplementary Fig. 5c–e).To understand the role of Dab2 in kidney fibrosis
development, we generated primary renal tubule cells with lower
Dab2 levels by infecting cells from
Dab2flox/+ mice with Cre-eGFP adenovirus (Supplementary Fig. 6a).
Injured TECs release TGFβ (encoded by Tgfb1), which is
one of the best-known profibrotic cytokines[51,52]. TGFβ
levels are known to be higher in the UUO model of fibrosis[53] and we confirmed the higher
Tgfb1 expression in the FA-induced fibrosismouse model as
well (Supplementary Fig.
6b).We found that lowering the Dab2 level resulted in less
TGFβ-induced Smad2 and Smad3 phosphorylation of cultured tubule cells
(Fig. 5g). In our experiments,
Dab2 had no significant effect on p38 or ERK
phosphorylation (Supplementary
Fig. 6c), but we found lower JNK phosphorylation in absence of
Dab2 (Fig. 5h). As
Smad3 and JNK mediate TGFβ-induced profibrotic gene expression, we found
that reducing Dab2 levels resulted in less TGFβ-induced
fibronectin and collagen1 production in cultured tubule cells (Fig. 5h and Supplementary Fig. 6d).In summary, these results indicate that Dab2 in tubule
cells is a likely causal gene for CKD development as Dab2 plays
an important role in TGFβ-induced profibrotic gene expression.
Discussion
The contribution of multiple cell types and mechanisms to kidney disease
pathogenesis has been proposed. A key limitation to disease understanding is that
most human studies cannot go beyond descriptive and correlative analysis, and animal
models often show limited relevance to human disease development as genotypes are
established before disease development. GWA studies provide valuable insight into
disease mechanisms. While GWAS has identified non-coding genetic variants associated
with CKD, the underlying genes, cell types and mechanism still remains elusive. Here
we generated novel datasets and conducted a genome- and transcriptome-wide scan for
genetic variants associated with gene expression variation (eQTLs) in two kidney
compartments: tubules and glomeruli. While some cell-type-specific eQTL analyses
have been performed in blood samples, we believe that this is the first study to
directly characterize cell-type specificity of different compartments on solid organ
samples[38]. By integrating
this data with prior GWA studies, epigenome analysis and single-cell transcriptome
analysis, we have identified putative causal genes for 24 of the 83 GWAS loci.To date, GTEx has conducted eQTL analysis on 53 human tissues, 44 of which
had sufficient sample size for further analyses (GTEx v6p). Kidney, however, was not
included due to the limited sample size[25]. By comparing our results with GTEx, we showed that the
majority of eQTLs are shared between multiple GTEx tissue types as well as two
kidney compartments. Strikingly though, we observed that previously identified CKD
GWAS loci were more likely to be associated with eQTLs that are
compartment-specific, reside further away from promoters and possibly act by
modulating compartment-specific active enhancers compared to those identified by
bulk-sequencing data analysis. Our study highlights that cell-type-specific eQTL and
epigenome datasets are crucial components of post-GWAS prioritization and mechanism
studies. Deeper insight into these cell-type-specific mechanisms may reveal that
many diseases and conditions are perturbations of a specific cell type rather than
specific organs.Next, we investigated whether specific regulatory elements were enriched for
compartment-specific or compartment-shared eQTLs. Functional annotation analysis
indicated that compartment- and tissue-shared eQTLs were more likely to be located
in promoter regions. On the other hand, by incorporating single-cell mouse kidney
RNA-Seq data, we demonstrate that disease-specific target gene regulation is likely
to be cell-type-specific and the most likely causal GWAS variants are localized to
cell-type-specific enhancers. We then focused on the DAB2 locus to
illustrate such cell-type-specific eQTL and enhancer effects.We used comprehensive and novel compartment-specific RNA-Seq dataset,
computational analysis as well as rigorous in vitro and in
vivo assays to identify DAB2 as a new causal gene for
CKD pathophysiology. The rs11959928 variant has been associated with CKD in multiple
GWAS[10-12]. Yet, this variant was not associated with
gene expression changes in GTEx[25]
or in bulk kidney eQTL datasets[30].
Furthermore, even though DAB2 is expressed across multiple cell
types, we observed that this variant only displays tubule-specific effects on
DAB2 expression. Single-cell RNA-Seq data further refined the
target tissues to two cell types (tubules and macrophages) with high
DAB2 expression in the kidney. Of note, a prior
report[49] showed a
macrophage-specific eQTL effect of DAB2 of this variant in the
opposite direction. However, this effect and direction was not replicated in the
larger GTEx dataset[25], and
follow-up in vivo results failed to show consistent effects of
macrophage-specific Dab2 reduction on kidney fibrosis development
(S. H. and K. S. unpublished data). Based on the replicated eQTL effect, combination
of strong colocalization between the tubule-specific eQTLs and CKD GWAS, and the
protection from kidney injury observed in DAB2 knock-out mice, we
conclude that the CKD risk allele at the DAB2 locus from GWAS is
associated with higher expressed levels of DAB2 in tubule cells,
leading to tubular fibrosis.DAB2 is a central adaptor protein in several
receptor-mediated pathways[54].
In vitro studies showed that DAB2 is required
for TGFβ-mediated signaling in epithelial cells[55]. TGFβ is one of the strongest
inducers of kidney fibrosis and CKD[51]. The CKD risk-increasing allele of the underlying causal
variant likely modifies this enhancer function specifically in tubule cells, raising
cytokine levels (including TGFβ), thus altering endocytosis and inducing
downstream development of fibrosis through DAB2. We demonstrate the
role of tubule-specific DAB2 in kidney disease development using
cell-type-specific knock-out mice.Through gene-gene network analysis, we also show that these 24 putative
causal genes are enriched in a specific kidney cell type and highlight the
endo-lysosomal pathway as a likely pathophysiological mechanism in the proximal
tubules for kidney disease development. Proximal tubule epithelial cells are the
most common cell type in the kidney and dedifferentiation of these highly
specialized cells has been linked to a reduction in kidney function and the
development of kidney disease[56,57]. The proximal tubules play a key
role in initial processing of the primary blood filtrate as the renal glomerulus
processes more than 144 liters of primary filtrate a day. Further, the proximal
tubules have one of the highest levels of endocytic activity in the body to reclaim
critical nutrients from the primary filtrate[58]. Alterations in the endocytic, lysosomal and autophagy
pathways of these cells have been previously observed in animal models and
individuals with kidney disease[59,60]. Now, our genetic studies can help
to refocus attention to this specific cell type and mechanisms in CKD
development.In summary, this is one of the first and extensive post-GWAS annotations of
CKD risk loci. We combined compartment-specific transcriptome, genome, epigenome and
single-cell sequencing data to identify novel genes, cell types and mechanisms
contributing to the CKD pathophysiology. The integration of these datasets is
critically needed information for the field. Our data delivered novel biological
insight by the identification of DAB2 as kidney disease-causing
gene and an additional 26 putative candidate genes. We also show that renal tubule
cells and the endo-lysosomal pathway play an important role in CKD, providing a
novel mechanism for disease development.
Online Methods
Sample procurement
Human kidney tissue collection was approved by the University of
Pennsylvania Institutional Review Board. Kidney samples were obtained from
surgical nephrectomies. Nephrectomies were de-identified, and the corresponding
clinical information was collected through an honest broker; therefore, no
consent was obtained from the subjects.Collected tissue was immersed in RNAlater (Ambion#AM7020) solution at
4°C for several hours prior to being stored at −80°C in
RNAlater. Tissue was thawed on ice, placed in RNAlater, and manually
microdissected for glomerular and tubular compartments. In general,
60–150 glomeruli that readily released from the capsule were collected
and placed into RNeasy RNA Tissue Lysis Buffer Solution (per Qiagen RNeasy kit
manufacturer instructions (Qiagen#74106)). We refer to the remaining compartment
as tubule throughout the article.Part of the tissue core was formalin fixed and paraffin embedded. These
samples were later sectioned and stained with Hematoxylin eosin or periodic acid
schiff. Our local renal pathologist performed an unbiased review of the tissue
section by scoring multiple parameters.DNA was isolated using the Qiagen DNAeasy or MagAttract High Molecular
Weight DNA Kit (Qiagen#67563) according to the manufacturer’s
instructions. DNA was quantified by the Invitrogen Quant-iT PicoGreen dsDNA
Assay Kit (Invitrogen#P11496).
Data production
Genomic DNA isolated from whole kidney or tubule tissue was used for
genotyping. 424 individuals were genotyped using Affymetrix Axiom Biobank
genotyping array. Quality control steps were performed using PLINK 1.9[61], and ultimately, we excluded
16 samples. First, two samples were excluded due to their SNP data which were
ambiguous regarding sex. An additional four samples were excluded because of
their low call-rate (<5%), and another two samples were excluded because
of elevated heterozygosity (Supplementary Fig. 7a). To further identify potential sample
contamination, identity-by-descent (IBD) was computed between all pairwise
sample combinations, and eight samples were excluded due to their high PI_HAT
(>0.2).To quantify population structure, Principal Component Analysis (PCA)
implemented in EIGENSTRAT was conducted on these final 408 samples, with
additional genotype data from 395 HapMap2 samples (112 CEU, 84 CHB, 86 JPT, 113
YRI) and 2,504 samples from the 1000 Genomes Project Phase 1 (release v3, 661
AFR, 347 AMR, 504 EAS, 503 EUR, 489 SAS), respectively[62,63]. Plots of the two main principal components (PCs) for each
dataset are shown in Supplementary Fig. 7b, c. Subsequently, genotypes of the 408 samples
were phased with SHAPEIT2[64]
and imputed with IMPUTE2[65],
using multi-ethnic panel reference[66] from 1000 Genomes Phase 1 v3.Separated by two kidney compartments, RNA was isolated from a total of
329 tubule samples and 311 glomerulus samples from up to 359 individuals. RNA
quality was assessed with the Agilent Bioanalyzer 2100. Only samples with RIN
scores above 7 and a minimum total RNA of 100 ng were used for cDNA production.
Non-strand specific, polyA+ selected RNA-Seq libraries were generated using the
Illumina TruSeq protocol. Libraries were sequenced to a median depth of 35
million 100-bp single-end reads. After first assessing sequence quality using
FastQC and then trimming the adaptor and lower-quality bases using Trim-galore,
RNA-Seq reads were aligned to the human genome (hg19/GRCh37) using STAR
(v2.4.1d) based on GENCODE v19 annotations[29,67]. Gene-level
expression was estimated on uniquely mapped reads as reads per kilobase of
transcript per million mapped reads (RPKM) using HTSeq 0.9.1[68] and DESeq2[69]. Samples with less than 15 million
mapped reads were excluded.For eQTL analysis, only samples with European ancestry and absence of
significant kidney structural changes (tubular fibrosis < 10%, glomerular
sclerosis < 10%) were used. Variants were excluded from analysis if they
had: (1) call rate < 95%; (2) Minor allele frequency (MAF) < 5%;
(3) deviated from Hardy-Weinberg Equilibrium (P <
10−6); and (4) imputation info score less than 0.4.
Finally, 121 samples were used for tubule eQTL analysis, and 119 samples were
used for glomerulus eQTL analysis, respectively. In addition, genotype-based PCA
analysis was conducted again using now the final set of tubule and glomerulus
samples, respectively, and then the first six PCs were used as covariates (Supplementary Fig. 7d,
e). We also generated a VCF file with dosages for alternative allele
counts used as input for the Matrix eQTL and FastQTL software packages[70,71].
Cis-eQTL mapping
We conducted cis-eQTL (further just referred to as
eQTL) mapping with 121 samples of tubules and 119 samples of glomeruli,
respectively. Only genes having at least 0.1 RPKM in 2 or more samples were
considered significantly expressed and used for eQTL mapping. Within each
compartment, quantile-normalization was performed on RPKMs, and then expression
measurements for each gene were rank-based inverse normal transformed. To remove
the effects of unobserved confounding variables, PEER was performed for 15
factors on gene expression with age, gender, collected site, sequencing batch,
RIN, and fibrosis percentage (for tubule samples) or sclerosis percentage (for
glomerulus samples) as covariates[72]. The Pearson correlation P between known
clinical variables and 15 PEER factors calculated by 121 tubule RNA-Seq samples
are shown in Supplementary
Fig. 7f. The Pearson correlation P between known
clinical variables and 15 PEER factors calculated by 119 glomerulus RNA-Seq
samples are shown in Supplementary Fig. 7g. Finally, once the PEER factors were regressed
out, the residuals of the analysis with rank-based inverse normal transformation
were used as expression measurements in following eQTL mapping, and 32,596 genes
were used for tubule analysis as well as 31,635 for glomeruli analysis.The cis window was defined as 1 megabase up- and
down-stream of the transcriptional start site (± 1Mb), and we tested
about 1.25 × 108 gene-SNP pairs for each compartment. Nominal
P were calculated for each SNP-gene pair with FastQTL using
linear regression with an additive effect model and adjusted by six genotype
PCs. Significance of the top associated variant per gene was estimated by
adaptive permutation with the setting “--permute 10000” in
FastQTL. Beta distribution-adjusted empirical P were used to
calculate q using Storey’s q
method[73], and a
q threshold ≤ 0.05 was applied to identify eGenes
(eQTL-containing genes).Next a genome-wide empirical P threshold,
Pt, was defined as the empirical
P of the gene closest to the 0.05 FDR threshold.
Pt was used to calculate a nominal
P threshold for each gene based on the beta distribution
model (from FastQTL) of the minimum P distribution
f(Pmin) obtained from the permutations for the
gene. For each eGene, variants with a nominal P below the
gene-level threshold were considered significant as eVariants[25].
Meta-analysis of multiple-tissue eQTL mapping
The eQTL summary results of 44 other human tissues were downloaded from
GTEx [v6p; www.gtexprotal.org][25]. To evaluate which of the GTEx eQTLs
replicated the significant SNP-gene pair eQTLs found in the kidney eQTL study,
the π1 statistic was calculated to estimate
the fraction of true positive eQTLs, considering the formula
π1 = 1 -
π0, where
π0 is the estimated fraction of null
eQTLs from the full distribution of P. For example, for tubule
eQTL, eGenes and their top significant SNPs were selected. The nominal
P of these SNP-gene pairs in another GTEx eQTL were
extracted, and then π1 was calculated on
these P using Storey’s q
method[73].To further define tubule-specific eGenes and glomerulus-specific eGenes,
METASOFT, a meta-analysis method, was performed on all variant-gene pairs that
were significant (FDR < 5%) in at least one of the 46 tissues (2 kidney
compartments and 44 GTEx tissues) based on the single-tissue results from
FastQTL. A random effects model in METASOFT (called RE2) was used and the
posterior probability that an eQTL effect exists in a given tissue (called
m) was calculated for each SNP-gene pair and tissue
tested[48]. A
significance cutoff of m > 0.9 was used to discover
high-confidence eQTLs. Based on this meta-analysis result, tubule-specific
eGenes (glomerulus-specific eGenes) were defined as having m
> 0.9 in less than 5 tissues, including tubule (glomerulus). In contrast,
tubule-shared eGenes (glomerulus-shared eGenes) were defined as having
m > 0.9 in more than 40 tissues, including tubule
(glomerulus) (Supplementary
Fig. 8a).To further test the sensitivity of the original results to these
cut-offs, we changed the threshold numbers and re-estimated the enrichment
results of tubule-specific eGenes and tubule-shared eGenes. Even though the
P are slightly inflated, we found that chronic kidney
disease, glomerular filtration rate, and metabolite traits are still the top
traits enriched for tubule-specific eGenes, as well as tubule-shared eGenes are
enriched by some common human diseases (Supplementary Fig.
8b–d).To identify tubule-compartment-specific eGenes,
glomerulus-compartment-specific eGenes and kidney-compartment-shared eGenes,
Meta-tissue, which was used to address effect size heterogeneity to detect eQTLs
across multiple tissues, was performed on three eQTLs datasets[74], tubule eQTLs, glomerulus
eQTLs and our previous whole kidney eQTLs[30], to remove the effect brought by collecting multiple
tissues from the same individuals.
Functional annotation
Traits and disease associated significant genetic variants were
extracted from GWAS Catalog v1.0.1 (accessed 1/4/2017, https://www.ebi.ac.uk/gwas/)[41]. These variants were used for enrichment analysis for
kidney compartment eQTLs.Human kidney-specific ChIP-Seq data can be found at GEO: GSM621634,
GSM670025, GSM621648, GSM772811, GSM621651, GSM1112806, GSM621638. Different
histone markers were combined into chromatin states using ChromHMM[75,76]. To quantify kidney eQTL enrichment in these regulatory
regions, the top significant SNPs per eGenes were used as tested eQTL subset. To
calculate enrichment significance, the data was compared between the tested eQTL
subset and a randomly selected SNP set that was matched for MAF (< 0.01)
and distance to TSS of target genes (< 2.5kb).Transcription Factor ChIP-Seq (161 factors) was extracted from ENCODE
project TFBS clusters (V3), and the enrichment analysis procedure was as same as
former[29].
GWAS analysis
Leading SNPs of CKD GWAS
eGFR-associated GWAS SNPs were collected from the CKDGen Consortium and
several other studies to determine whether kidney disease traits in our eQTLs
were enriched[9-12]. This step yielded 83 leading SNPs.
Coloc
To estimate the posterior probabilities of whether two potentially
related phenotypes – kidney disease and kidney compartment gene
expression – share common genetic causal variants in a given region, we
performed colocalization analyze using coloc[42] and summary data of GWAS and
eQTL analyses. The eGFR-associated GWAS study from CKDGen Consortium was used as
CKD GWAS dataset. Then, the summary data from both studies of SNPs within 100
kilobase of each leading SNP were used to calculate the posterior probability.
In the coloc results, H3 represents the posterior probability
of both traits are associated, but with different causal variants; H4 represents
the posterior probability that both traits are associated and share a single
causal variant. We used PP_H4 > 0.8 as the threshold of
colocalization.
RTC
Regulatory Trait Concordance (RTC) score was used to assess whether a
GWAS SNP tags the same functional variant as a regulatory variant[43]. In short, if an eQTL variant
and GWAS leading SNP were located in the same region between recombination
hotspots, the eQTL phenotype (gene expression) was corrected for all the N
variants within this region using linear regression, and then the residuals were
used as N pseudo-phenotypes. The P, obtained by testing for
eQTL association between the eQTL variant and these N pseudo-phenotypes, were
sorted (descending) and ranked. Then, the rank of the P
obtained from the pseudo-phenotype corrected by GWAS SNP was used to calculate
the RTC score: (N-GWASrank) / N. The RTC score ranges from 0 to 1,
and 1 represents higher likelihood of shared effect between eQTL and GWAS.
Single-cell RNA-Seq of mouse kidney
Single-cell RNA-Seq libraries were generated using the 10X Chromium
Single Cell instrument and the 10X Chromium TM Single cell 3’ Library Kit
according to the manufacturer’s original protocol. We have generated
single cell suspensions from seven different mouse kidney samples. The
single-cell sequencing libraries were sequenced on an Illumina HiSeq with
2×150 paired-end kit. The sequencing reads were demultiplexed, aligned to
the mouse genome (mm10) and processed to generate gene-cell data matrix using
Cell Ranger 1.3 (http://10xgenomics.com). We have sequenced 57,979 mouse kidney
cells[36].Cells were clustered using dimension reduction. We identified 3 cell
types in the glomerulus and 7 kidney cell types in the tubule, in addition to 6
immune cells. Of the entire 16 cell clusters identified by single-cell RNA-Seq
in mice, 13 are previously known cell types and while the other 3 are novel cell
types.
Pathway analysis of candidate genes
The pathway analysis of candidate genes, identified by colocalization
between eGFR GWAS and kidney compartment-based eQTLs, was performed by the
web-based program, GIANT (http://giant.princeton.edu)[46].
Decompose tubule compartment tissue by various segments
CIBERSORT was developed to identify the relative levels of distinct cell
types within a complex expression admixture[40]. Here, we use RNA-Seq data of microdissected rat kidney
tubule segments[39] (https://hpcwebapps.cit.nih.gov/ESBL) as
signature matrix, to decompose our 121 tubule samples, identifying their segment
composition.
Isoform detection using LeafCutter
LeafCutter[50] is used
to quantify RNA splicing variation using short-read RNA-Seq data by leveraging
reads that span each intron to quantify intron usage across samples (http://davidaknowles.github.io/leafcutter/).
We performed LeafCutter on 121 tubule RNA-Seq samples, by which a specific exon
deletion was identified on DAB2.
SMR
SMR[77]
(Summary-data-based Mendelian Randomization) is used to test for potential
causal effects of, e.g. gene on complex trait, given a SNP as an instrumental
variable, using summary-level data from GWAS and eQTL studies. We performed SMR
on CKD GWAS and eQTLs (tubule and blood[49] separately) to estimate the direction of expression
changes of DAB2 with eGFR, using multiple variants in the
cis-eQTL region of the gene.
Mice
C9 knock-out mice purchased from Jackson Lab (Stock No:
022779) were kindly provided by Dr. Wen-chao Song from University of
Pennsylvania. Dab2flox/flox mice were kindly
provided by Dr. Xiang-Xi Xu from University of Miami[78,79]. Kspcre mice were purchased from Jackson Lab
(Stock No: 012237). 8- to 10-week-old male mice were used in this study. Mice
were injected with FA (250 mg/kg, dissolved in 300mM NaHCO3)
intraperitoneally and sacrificed 1 week later. For the UUO model, mice underwent
ligation of the left ureter, and sacrificed 1 week later. Sham-operated mice
were used as controls. Animal studies were approved by the Animal Care Committee
of the University of Pennsylvania.
Quantitative Real-Time PCR
RNA was isolated using RNeasy Mini Kit per manufacturer’s
instructions. 1μg RNA was reverse transcribed using cDNA archival kit
(Applied Biosystems#4368813), and quantitative real-time PCR was conducted in
the ViiA 7 System (Applied Biosystems) machine using SYBRGreen Master Mix
(Applied Biosystems#A25742). The data were normalized and analyzed using the
delta/delta CT method. Primers used are listed in Supplementary Table 4.
Histological analysis
We used formalin-fixed, paraffin-embedded kidney sections stained with
periodic acid Schiff (PAS) or Picrosirius red (Polyscience#24901). Slides were
examined, and pictures were taken with an Olympus BX43 microscope and Olympus
DP73 Diagnostic CCD camera. Quantification was performed on Sirius Red-stained
kidney sections. For each section, five random fields were quantified in an
unbiased manner using ImageJ. Immunohistochemistry images of LRRC3 (https://www.proteinatlas.org/ENSG00000160233-LRRC3/tissue/kidney#img)
and ANXA2 (https://www.proteinatlas.org/ENSG00000182718-ANXA2/tissue/kidney#img)
were downloaded from Human Protein Atlas (v18.proteinatlas.org) [80].
Primary Culture of Renal Tubule Cells
Kidneys were collected from Dab2flox/+mice
(males, 3–5 weeks old). Cells were isolated by 2 mg/ml collagenase I
(Worthington Biochemical Product#CLS-1) digestion for 30 min at 37°C with
gentle stirring. Cells were then filtered through the 100-mm mesh to isolate
single cells. Cell suspensions were cultured in RPMI 1640
(Corning#10–040-CM) supplement with 10% fetal bovine serum (Atlanta
Biologicals#S11950), 20 ng/ml EGF (Peprotech#AF-100–15), 1 × ITS
(Gibco#51500–056), and 1% penicillin-streptomycin
(Corning#30–002-CI) at 5% CO2, at 37°C. When cell
confluence reached 70%, the media were changed with serum-free RPMI and infected
with Ad5CMV-eGFP (Ad-GFP) or Ad5CMVCre-eGFP (Ad-Cre-eGFP) (University of Iowa
Gene Transfer Vector Core, Iowa City, IA) at 4 × 1010 plaque
forming units/ml for 24h. Infection efficiency was estimated under fluorescence
microscope by the presence of GFP-positive cells. After that, cells were treated
with 5 ng/ml TGFβ (Peprotech#100–21) for 24 h.
Western blot analysis
Cell lysates were prepared with SDS lysis buffer containing protease
inhibitor cocktail (Complete Mini, Roche#11836153001) and phosphatase inhibitor
(PhosSTOP, Roche# 4906837001). Proteins were resolved on 8–12% gradient
gels, transferred on to polyvinylidene difluoride membranes and probed with
antibodies as below; Dab2 (BD Transduction Laboratories#610464; 1:1000),
Phospho-Smad2 (Cell signaling#3108; 1:1000), Smad2 (Cell signaling#5339;
1:2000), Phospho-Smad3 (Abcam#ab52903; 1:1000), Smad3 (Abcam#ab40854; 1:2000),
Phospho-JNK (Cell signaling#4668; 1:2000), JNK (Cell signaling#9252; 1:2000),
Phospho-Erk1/2 (Cell signaling#4370; 1:2000), Erk1/2 (Cell signaling#4695;
1:2000), Phospho-p38 (Cell signaling#4511; 1:2000), p38 (Cell signaling#8690;
1:2000), Fibronectin (Abcam#ab32419, 1:2000). Anti-rabbit (Cell Signaling#7074)
or anti-mouse (Cell Signaling#7076) IgG horseradish peroxidase (HRP) was used as
a secondary antibody. Blots were detected by enhanced chemiluminescence (Western
Lightning-ECL, Thermo Scientific). The full scans of the Western blots are
showed in Supplementary Fig.
9.
Statistics
Statistical analyses for animal study were performed using GraphPad
Prism 6.0 software. All values are expressed as mean and standard deviation
(SD). Unpaired two-sided Student’s t test was used for
comparisons between two groups. One-way ANOVA with post-hoc
Tukey test was used to compare multiple groups. P less than
0.05 was considered statistically significant.
Reporting Summary
Further information on experimental design is available in the Nature
Research ‘Life Sciences Reporting Summary’ linked to this
article.
Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330
Authors: Kiran Musunuru; Alanna Strong; Maria Frank-Kamenetsky; Noemi E Lee; Tim Ahfeldt; Katherine V Sachs; Xiaoyu Li; Hui Li; Nicolas Kuperwasser; Vera M Ruda; James P Pirruccello; Brian Muchmore; Ludmila Prokunina-Olsson; Jennifer L Hall; Eric E Schadt; Carlos R Morales; Sissel Lund-Katz; Michael C Phillips; Jamie Wong; William Cantley; Timothy Racie; Kenechi G Ejebe; Marju Orho-Melander; Olle Melander; Victor Koteliansky; Kevin Fitzgerald; Ronald M Krauss; Chad A Cowan; Sekar Kathiresan; Daniel J Rader Journal: Nature Date: 2010-08-05 Impact factor: 49.962
Authors: Christopher E Gillies; Rosemary Putler; Rajasree Menon; Edgar Otto; Kalyn Yasutake; Viji Nair; Paul Hoover; David Lieb; Shuqiang Li; Sean Eddy; Damian Fermin; Michelle T McNulty; Nir Hacohen; Krzysztof Kiryluk; Matthias Kretzler; Xiaoquan Wen; Matthew G Sampson Journal: Am J Hum Genet Date: 2018-07-26 Impact factor: 11.025
Authors: Alexandra C Nica; Stephen B Montgomery; Antigone S Dimas; Barbara E Stranger; Claude Beazley; Inês Barroso; Emmanouil T Dermitzakis Journal: PLoS Genet Date: 2010-04-01 Impact factor: 5.917
Authors: Anna Köttgen; Nicole L Glazer; Abbas Dehghan; Shih-Jen Hwang; Ronit Katz; Man Li; Qiong Yang; Vilmundur Gudnason; Lenore J Launer; Tamara B Harris; Albert V Smith; Dan E Arking; Brad C Astor; Eric Boerwinkle; Georg B Ehret; Ingo Ruczinski; Robert B Scharpf; Yii-Der Ida Chen; Ian H de Boer; Talin Haritunians; Thomas Lumley; Mark Sarnak; David Siscovick; Emelia J Benjamin; Daniel Levy; Ashish Upadhyay; Yurii S Aulchenko; Albert Hofman; Fernando Rivadeneira; André G Uitterlinden; Cornelia M van Duijn; Daniel I Chasman; Guillaume Paré; Paul M Ridker; W H Linda Kao; Jacqueline C Witteman; Josef Coresh; Michael G Shlipak; Caroline S Fox Journal: Nat Genet Date: 2009-05-10 Impact factor: 38.330
Authors: Gaia M Coppock; Lillian R Aronson; Jihwan Park; Chengxiang Qiu; Jeongho Park; Jonathan H DeLong; Enrico Radaelli; Katalin Suszták; Christopher A Hunter Journal: J Immunol Date: 2020-06-10 Impact factor: 5.422
Authors: Karsten B Sieber; Anna Batorsky; Kyle Siebenthall; Kelly L Hudkins; Jeff D Vierstra; Shawn Sullivan; Aakash Sur; Michelle McNulty; Richard Sandstrom; Alex Reynolds; Daniel Bates; Morgan Diegel; Douglass Dunn; Jemma Nelson; Michael Buckley; Rajinder Kaul; Matthew G Sampson; Jonathan Himmelfarb; Charles E Alpers; Dawn Waterworth; Shreeram Akilesh Journal: J Am Soc Nephrol Date: 2019-02-13 Impact factor: 10.121
Authors: Xin Sheng; Chengxiang Qiu; Hongbo Liu; Caroline Gluck; Jesse Y Hsu; Jiang He; Chi-Yuan Hsu; Daohang Sha; Matthew R Weir; Tamara Isakova; Dominic Raj; Hernan Rincon-Choles; Harold I Feldman; Raymond Townsend; Hongzhe Li; Katalin Susztak Journal: Proc Natl Acad Sci U S A Date: 2020-11-03 Impact factor: 11.205