Literature DB >> 31570892

Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.

Agnieszka D'Antonio-Chronowska¹, Wubin Ma², Paola Benaglio³, Feng Yang², William W Young Greenwald⁴, Margaret K R Donovan^4,5, Christopher DeBoever⁴, He Li¹, Frauke Drees¹, Sanghamitra Singhal³, Hiroko Matsui¹, Jessica van Setten⁶, Nona Sotoodehnia^7,8, Kyle J Gaulton³, Erin N Smith³, Matteo D'Antonio¹, Michael G Rosenfeld⁹, Kelly A Frazer^10,11.

Abstract

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.

Entities: Chemical

Mesh：

Substances：

Year: 2019 PMID： 31570892 PMCID： PMC6858543 DOI： 10.1038/s41588-019-0499-3

Source DB: PubMed Journal: Nat Genet ISSN： 1061-4036 Impact factor: 38.330

Genome-wide association studies (GWAS) for electrocardiographic (EKG) phenotypes have found >500 risk variants[1], the majority of which are non-coding and enriched in regulatory elements of the genome. Detecting the causal variants and the molecular mechanisms that drive these associations has been challenging[2], and therefore only a handful of genetic associations with EKG traits have been explained by variants with clear molecular mechanisms[3,4]. Altered transcription factor (TF) binding has been proposed as one of the major mechanisms by which non-coding regulatory variants are causally associated with complex traits[5-7]. NKX2-5 is an evolutionary conserved, cardiac-specific TF, which, through cooperative binding with other core cardiac TFs such as TBX5 and GATA4, regulates heart development[8-11] and is implicated in a spectrum of human congenital heart defects[12]. Moreover, common non-coding variants near NKX2-5, TBX5, and MEIS1 have been associated through GWAS[13-17] with EKG phenotypes, indicating that variation in developmental pathways plays an important role in these traits. Therefore, it is likely that genetic variation affecting the binding of developmental cardiac TFs also influences the heritability of EKG traits. However, this hypothesis has not yet been examined on a genome-wide scale. Because the function of regulatory variants that contribute to common traits is often cell-type specific, attention to the appropriate cellular model in which to test the variants is important. Human induced pluripotent stem cell (iPSC)-derived cell types have recently emerged as a novel platform to analyze the functional consequence of genetic variants on molecular phenotypes in target cell types. iPSCs show variation in molecular phenotypes associated with their genetic background[18-20], making them a suitable model to perform expression QTL (eQTL) studies[19-24]. However, there are only a few studies showing similar utility of iPSC-derived cardiomyocytes (iPSC-CMs) to study regulatory variations[22], with potential limitations being cell-type heterogeneity that arises from directed differentiation[24-26] and the functional immaturity of iPSC-CMs[27]. Thus, while human iPSC-CMs are a promising model system, it has yet to be shown that they could enable the identification and characterization of regulatory variants that play important roles in cardiac traits. Here we conducted a genome-wide analysis to identify regulatory variants affecting the binding of NKX2-5 and investigated their role in cardiac gene expression and EKG phenotypes. We generated iPSC-CM lines from a pedigree of seven whole-genome sequenced individuals and profiled them with a variety of functional genomic assays, including RNA-seq, ATAC-seq, and ChIP-seq of both NKX2-5 and histone modification H3K27ac. After identifying heterozygous sites that showed ASE, we investigated NKX2-5 ASE-SNVs in detail by examining whether they altered cardiac TF motifs and whether they were enriched for eQTLs and EKG GWAS-SNPs. By applying a fine-mapping statistical approach to three GWAS studies (heart rate, atrial fibrillation, and PR interval), we prioritized putative causal variants at known (as well as novel) loci. As a proof-of-principle, we experimentally interrogated two NKX2-5 ASE-SNVs, providing evidence that they are causal variants underlying genetic associations with EKG traits. Our data show that variation affecting the binding of NKX2-5 and other cardiac TFs likely serves as a molecular mechanism underlying control of numerous EKG loci across the genome, and that fine-mapping approaches, combined with molecular phenotype data from iPSC-CMs, can be used to prioritize causal variants in EKG GWAS loci.

Results

Generation and functional genomic profiling of iPSC-derived cardiomyocytes

We generated iPSC-CMs from seven individuals in a three-generation family that includes three genetically unrelated subjects and two parent-offspring quartets (Fig. 1a and Supplementary Table 1). In total, we differentiated nine iPSC lines[28] into 26 iPSC-CM samples: 12 were harvested at day 25 after lactate selection to obtain purer cardiomyocytes, and 14 were harvested at day 15, of which one was lactate purified (Fig. 1a). After confirming the expression of cardiac markers by flow cytometer and immunofluorescence (TNNT2 and MYL7; Supplementary Fig. 1a,b and Supplementary Note), we further examined the iPSC-CMs, and the iPSCs from which they were derived, by comparing their functional genomics profiles (RNA-seq, ATAC-seq, ChIP-seq of H3K27ac and NKX2-5; Supplementary Tables 2 and 3) with those from the Roadmap Epigenomics project[14]. We confirmed that the iPSC-CMs and iPSCs, respectively, expressed cardiac-specific and stem cell-specific genes and epigenetic signatures (Fig. 1b, Supplementary Note, and Supplementary Figs. 1c and 2).

Figure 1

Generation and characterization of iPSCs and iPSC-CMs by gene expression and epigenetic profiling.

a, Pedigree showing the relationships of the seven individuals and summary of derived cell types analyzed. b, Principal component 1 and 2 of RNA-seq (15,725 genes) from iPSCs (29 samples from 7 individuals), iPSC-CMs (27 samples from 7 individuals), Roadmap stem cell lines (H1, HUES64, iPS-20b and iPS18), and human tissues (right ventricle, left ventricle, right atrium, and fetal heart). c-g, Distributions of the average Spearman correlation coefficients between pairs of samples across the 1,000 most variable genes (c,d) or peaks (e-g) for the indicated molecular phenotypes. Median (white dot), interquartile range (thick bar), and the rest of the distributions (line) are shown within each violin plot, with each sample size reported below. P-values of significant (P < 0.05) one-tailed Mann-Whitney tests are shown.

Genetic background underlies variability of molecular phenotypes in iPSC-CMs

Experimental sources of variation across the iPSC-CMs, such as differentiation efficiency, may confound the effects that are driven by different genetic backgrounds[24]. To identify sources of variability in our iPSC-CM datasets, and evaluate the contribution of genetic background to this variation compared with the iPSCs, we performed principal components (PC) analysis on each of the RNA-seq and ChIP-seq datasets, and tested whether known covariates, such as batch, TNNT2 expression (for iPSC-CMs), and subject, were associated with each of the top 10 PCs. While we observed variation in both the iPSC-CMs and iPSCs due to differentiation efficiencies and/or batch effects (Supplementary Fig. 3), the average sample-to-sample Spearman correlation of molecular phenotypes was higher between samples of the same individual than between different individuals (Mann Whitney test P < 0.05); additionally, samples of related individuals tended to be more correlated than samples of unrelated individuals (Fig. 1c-g). Of note, the iPSC-CMs showed slightly greater variation (i.e. lower correlation values) than the iPSCs, likely due to cellular heterogeneity[24-26]. These analyses show that genetic background was a major driver of variability in our iPSC-CM molecular datasets.

NKX2-5 peaks commonly show allele-specific effects

We examined the fraction of genetic variants associated with variable NKX2-5 peaks compared with the other molecular phenotypes by identifying heterozygous sites that showed ASE within each individual. We first merged the sequencing reads of different samples from the same subject and calculated ASE, and then, when multiple individuals carried the same heterozygous SNV, we combined the ASE results across individuals in a meta-analysis. For each phenotype, we tested between 19,371 (NKX2-5) to 123,151 (H3K27ac in iPSC-CMs) heterozygous SNVs within 12,492 to 57,631 regions (genes or peaks) (Fig. 2a) and identified the fraction of SNVs with significant imbalance at FDR < 0.05 (ASE-SNVs) (Fig. 2b). The different phenotypes showed over a 30-fold difference in the percent of ASE-SNVs, with NKX2-5 ChIP-seq having the highest fraction (10% of tested SNVs), while H3K27ac (0.7% in iPSC-CMs) and ATAC-seq (0.3% in iPSC-CMs) had considerably lower fractions. The fact that NKX2-5 ChIP-seq was so much more efficient for detecting ASE-SNVs was largely due its higher effect sizes, consistent with the fact that the assay directly measures differential TF binding, whereas ATAC-seq and H3K27ac measure altered chromatin accessibility and histone modification, respectively, which are indirect consequences of differential TF binding (Supplementary Note and Supplementary Fig. 4). Shared ASE-SNVs between iPSC-CMs and iPSCs (519 in RNA-seq and 43 in H3K27ac) showed high concordance of ASE effects (Fig. 2c) – defined as the mean proportion of the alternate allele across heterozygous sites (Spearman correlation r > 0.85) – indicating consistency of allelic effects between the two cell types. We further tested whether the ASE observed in heterozygous individuals was consistent with the overall effect size (ß, linear regression) on the phenotype when including homozygous samples and observed a significant (P < 0.05), positive relationship for all molecular phenotypes (Fig. 2d-f), with the highest correlation in NKX2-5 peaks (r = 0.69, Spearman correlation). These data demonstrate that the majority of allele-specific effects identified in both iPSC-CMs and iPSCs are due to genetic variation, and that, among all molecular phenotypes examined, NKX2-5 peaks had substantially more ASE-SNVs and showed the highest consistency across individuals.

Figure 2

Identification of coordinated allele-specific effects (ASE) in gene expression, H3K27 acetylation, chromatin accessibility, and NKX2-5 binding in iPSCs and iPSC-CMs.

a, Total number of regions and heterozygous SNVs tested for ASE across all individuals and samples in each dataset. b, Total number of heterozygous SNVs and corresponding regions across all individuals and samples with ASE at FDR < 0.05. The number of ASE shared between iPSCs and iPSC-CMs is indicated by hatches. c, Scatterplot of the alternate allele proportion at shared ASE-SNVs between iPSCs and iPSC-CMs for RNA-seq (n = 516 SNVs) and H3K27ac (n = 43 SNVs). Spearman correlation statistics are indicated. d-f, Scatterplots of the mean proportion of the alternate allele of SNVs with ASE in heterozygous individuals and the effect size of each ASE-SNV, expressed as the slope of linear regression (ß) between gene expression or peak density and genotypes of all seven individuals. Spearman correlation statistics are indicated. The number of SNVs analyzed in d are: 970 for iPSCs and 799 for iPSC-CMs; in e: 255 for iPSCs and 550 for iPSC-CMs; in f: 1,714. g, Scatterplot showing relationship between effect sizes (ß’s) of ASE-SNVs in NKX2-5 peaks on both NKX2-5 and H3K27ac phenotypes (n = 854 SNPs). h, Table showing Spearman correlation coefficients of effect sizes between pairs of different molecular phenotypes. Correlations were calculated between ß’s of SNVs that showed ASE in ChIP-seq datasets (rows) and ß’s of the same variant for the closest gene or peak in a different molecular phenotype dataset (columns).

NKX2-5 correlated effects are consistent with dual role as activator and repressor

Genetic loci associated with differential TF binding between individuals often show coordinated effects across different molecular traits[29]. To examine if NKX2-5 loci with ASE were correlated with H3K27ac and gene expression ASEs, we compared the effect sizes (ß) of ASE-SNVs identified within ChIP-seq peaks with the effect size of the same SNV on neighboring regions from different molecular phenotypes (nearest peak or nearest gene) (Fig. 2g,h). The strongest positive correlation was found between NKX2-5 and H3K27ac genetic effects in iPSC-CMs (Spearman correlation coefficient r = 0.58, P = 1.7 x 10-77 for NKX2-5 ASE-SNVs (Fig. 2g), and r = 0.60, P = 1.6 x 10-30 for H3K27ac ASE-SNVs), supporting the role of NKX2-5 binding in enhancer and promoter activation in these cells. However, genetic effects on NKX2-5 binding were not positively correlated with the expression of neighboring genes (Fig. 2h), possibly due to NKX2-5’s dual role as an activator or repressor[30,31]. We also observed that, when iPSC-CMs or iPSCs had H3K27ac ASE, the effect sizes were positively correlated (r = 0.39, P = 3 x 10-13; r = 0.59, P = 1.3 x 10-11) with H3K27ac peaks in nearby or overlapping regions in the other cell type, suggesting conserved genetic effects at shared enhancers and promoters. On the other hand, while H3K27ac ASE effect sizes were moderately correlated with gene expression in the corresponding cell type, they were not correlated with gene expression in the other cell type (r = 0.33, P = 4 x 10-12 and r = 0.41, P = 1.7 x 10-7 within the same cell type, and r = 0.17, P = 7 x 10-4 and r = 0.06, P = 0.43 for mismatched comparisons; Fig. 2h). These results show that, in both the iPSC-CMs and iPSCs, genetic variation underlies coordinated and cell-type specific differences across multiple molecular phenotypes; of note, while NKX2-5 and H3K27ac ASE-SNVs were highly correlated, altered NKX2-5 binding was not positively correlated with gene expression changes, consistent with a more complex function as both an activator and repressor.

Variation in cardiac TF binding motifs underlie NKX2-5 ASE-SNVs

To investigate whether NKX2-5 ASE-SNVs affected sequence motifs of TF binding sites, we selected the most enriched motifs in NKX2-5 peaks, which included the NKX2-5 homeobox motif (cognate motif), as well as motifs of other heart development TFs (GATA4, TBX5, TBX20, MEF2A/C and MEIS1; Supplementary Table 4) (secondary motifs). For both alleles of all heterozygous SNVs tested for ASE within NKX2-5 peaks, we calculated the motif position weight matrix (PWM) score of each motif. We then compared SNVs with ASE to SNVs without ASE and observed that the former were enriched for altered motifs (Fisher’s exact test FDR < 0.05) (Fig. 3a). Out of the 1,941 NKX2-5 ASE-SNVs, 735 (37.8%) modified at least one of the twelve tested TF motifs: 94 (4.8%) modified both the cognate and a secondary motif, 247 (12.7%) modified only the cognate motif, and 394 (20.3%) modified one or more secondary motifs. Next, we asked whether the preferred allele (highest read count) of each ASE-SNV was associated with a higher predicted motif score. For most motifs, the preferred allele increased the motif score in 70-88% of SNVs (Fig. 3b), and the allelic proportion of ASE-SNVs positively correlated with the change in motif score, supporting an underlying causal effect for the majority of these SNVs (Fig. 3c,d and Supplementary Fig. 5). We additionally observed that ASE-SNVs tended to affect core, conserved positions within the motif more frequently than they affected less conserved positions (Fig. 3e-h), indicating a stronger effect on TF binding affinity. These data indicate that ~40% of sites containing NKX2-5 ASE-SNVs have altered motifs for NKX2-5 and/or for other known cardiac TFs, suggesting that differential allelic binding of NKX2-5 at these sites likely occurred either directly, due to alterations of its own binding sequence, or indirectly, via alterations of TF binding sites of co-binding partners.

Figure 3

Transcription factor binding motifs are altered by SNVs with ASE in NKX2-5 ChIP-seq.

a, Odds ratios from two-sided Fisher’s exact test comparing the proportion of motif-altering SNVs between variants with ASE (n = 1,941) and variants without ASE (n = 19,371) in NKX2-5 ChIP-seq peaks from combined iPSC-CM samples. Asterisks indicate enrichment at FDR corrected P-value < 0.05. b, Number of TFBS motifs that were strengthened (red) or weakened (blue) by the preferred allele of ASE-SNVs identified in NKX2-5 ChIP-seq. c, Scatterplot of the reference allele proportion at ASE-SNVs (n = 341) and the difference of NKX2-5 motif score between reference and alternate alleles. Spearman correlation coefficient and P-value are indicated at the bottom. Dots are color-coded as in b. d, Summary table of Spearman correlation statistics calculated as in c for all motifs tested (see Supplementary Fig. 4 for the other scatterplots). e-h, Frequency of ASE-SNVs altering different positions within the motifs of NKX2-5 (e), GATA (f), MEIS1 (g), and TBX20 (h). NKX2-5, GATA and TBX20 PWMs were obtained using de-novo motif finding. Bars are color-coded as in b. Blue bars overlap the red ones (i.e. they are not stacked).

NKX2-5 ASE-SNVs modulate cardiac-specific gene expression

We examined if NKX2-5 ASE-SNVs were associated with cardiac-specific effects on gene regulation by comparing the enrichment of NKX2-5 and H3K27ac ASE-SNVs with quantitative trait loci (QTL) from diverse cell types, including DNase hypersensitivity QTLs (dsQTLs) in lymphoblastoid cell lines (LCLs)[32], expression QTLs (eQTLs) from iPSCs[21], and eQTLs from 13 combined studies obtained from Haploreg[33] (“combined tissues”) (Fig. 4a-c and Supplementary Table 5). In iPSC-CMs, H3K27ac ASE-SNVs were enriched over SNVs without ASE for all three types of QTLs (Fisher’s exact test P < 0.05); on the other hand, H3K27ac ASE-SNVs in iPSCs were only enriched for iPSC eQTLs. Of note, NKX2-5 ASE-SNVs were significantly depleted for iPSC and combined tissue eQTLs, suggesting that they exert regulatory functions only in cardiac tissues.

Figure 4

Enrichment of ChIP-seq ASE variants for known QTLs.

a-c, Histograms showing the percentage of SNVs with and without ASE in each ChIP-seq (from combined iPSC or iPSC-CM samples) and overlapping dsQTLs from LCLs[32] (a), eQTLs from iPSCs[21] (b), and combined eQTLs identified in different tissues[33] (c). Two-sided Fisher’s exact test P-values are shown in red or blue for enrichment or depletion, respectively. d, Heatmap showing enrichment of ASE variants for tissue-specific eQTLs[34] (similar tissues in GTEx were merged; see Methods). Asterisks indicate two-sided Fisher’s exact test FDR corrected P-value < 0.05. Heatmap is colored based on -log10 of FDR corrected P-values, with negative sign if odds ratio was < 1. The complete Fisher’s exact test statistics including P-values, odds ratios and number of SNVs analyzed are reported in Supplementary Table 5.

We therefore investigated if NKX2-5 ASE-SNVs were enriched for heart-specific eQTLs. NKX2-5 and H3K27ac ASE-SNVs were compared with SNVs without ASE to assess enrichment for tissue-specific eQTLs (defined in methods) in 26 tissue types from the GTEx project (v6)[34]. ASE-SNVs in both NKX2-5 and H3K27ac peaks in iPSC-CMs were more enriched for heart-specific eQTLs (Fig. 4d and Supplementary Table 5) than other tissue-specific eQTLs, while H3K27ac ASE-SNVs in iPSCs were not enriched for any GTEx tissue-specific eQTL. Notably, there were 55 NKX2-5 ASE-SNVs that overlapped a heart-specific eQTL, of which 9 affected the NKX2-5 binding motif, and 13 affected one or more of the other cardiac TF motifs in Figure 3 (Supplementary Table 5). These results indicate that ASE-SNVs in the iPSC-CM lines are enriched for tissue-specific regulatory variants associated with molecular traits in previous studies. Overall, consistent with its importance as a cardiac identity transcriptional regulator, we found that SNVs affecting the binding of NKX2-5 and other cardiac TFs (with which NKX2-5 cooperatively binds) are likely to underlie cardiac-specific eQTLs.

NKX2-5 ASE-SNVs are enriched for GWAS associations with EKG traits

Based on the fact that GWAS variants near the NKX2-5 gene have been previously associated with EKG traits[13-15,35,36], we hypothesized that the altered binding of NKX2-5 in other GWAS loci could be causally implicated in these traits. We first examined if NKX2-5, H3K27ac, or ATAC peaks from iPSC-CMs were enriched for GWAS-SNPs for six EKG traits (heart rate, PR interval, QT interval, QRS duration, atrial fibrillation (AF) and P-wave duration), compared with GWAS-SNPs from 119 other traits having a comparable number of associated SNPs. We observed a strong relative enrichment for several EKG traits (Binomial test FDR < 0.05, Fig. 5a-c and Supplementary Fig. 6), with QRS duration GWAS-SNPs and heart rate GWAS-SNPs being the top two enriched traits in NKX2-5 peaks. We also examined H3K27ac and DHS peaks from Roadmap cardiac tissues, which similarly showed high enrichment for all EKG GWAS-SNPs, while H3K27ac and DHS peaks from iPSCs did not (Supplementary Fig. 6). These data show that enhancer regions in iPSC-CM and Roadmap cardiac tissues both show enrichment for EKG trait-specific regulatory variants.

Figure 5

Enrichment of NKX2-5 SNVs at GWAS loci and validation of rs590041 as a regulatory variant in the SSBP3 locus for p-wave duration.

a-c, Volcano plots showing -log10 P-values and fold enrichment for GWAS loci in NKX2-5 (a), H3K27ac (b), and ATAC-seq (c) peaks from combined iPSC-CM samples. Red symbols indicate significant enrichment at FDR corrected P-value < 0.05, calculated using GREGOR. In total n = 125 GWAS traits were tested, of which 6 were for EKG traits. d, Percentage of NKX2-5 ASE-SNVs overlapping an EKG GWAS-SNP versus overlapping a non-GWAS-SNP. Two-sided Fisher’s exact test P-value and the number of SNVs are given. e, Top panel: regional plot of association P-values with P-wave duration[37], color coded based on r2 values[54]. Second panel: regional plot of eQTLs for SSBP3 in atrial appendage samples from GTEx. NKX2-5 allelic imbalance (pie chart) for rs590041 is shown. Panels three through five: epigenetic tracks from iPSC-CM combined samples. Bottom panel: UCSC genome browser tracks for Roadmap fetal heart ChromHMM, DHS, and gene annotations. f, EMSA with nuclear extract from iPSC-CM using probes containing two allelic variants of rs590041. Similar results were obtained in two independent experiments. The full scans of the blots are shown in Supplementary Figure 9. g, Screenshot from the GTEx portal (https://gtexportal.org) showing association between rs590041 genotypes and expression levels of SSBP3 in heart atrial appendage samples. h, Luciferase assay in iPSC-CMs for rs590041, in both forward and reverse orientations. RLUs are normalized to cells transfected with the empty vector (pgl4.23). Lines indicate median, lower and upper quartiles of 6 transfection replicates per plasmid. P-values from two-tailed t-tests are shown, comparing expression from the two alleles. i, qPCR expression of SSBP3 in iPSC-CMs (id: iPSCORE_1_5) stably expressing dCas9-KRAB (CRISPRi) and either a control guide RNA (gCTL) or two guide RNAs targeting the region encompassing rs590041. Bars and error bars represent the mean and the standard deviation from three qPCR measurements, respectively; two-tailed t-test P-value is shown. Similar results were obtained in an independent cell line (Supplementary Fig. 10). All iPSC-CMs used in f, h, and i were lactate-purified.

To examine if differential binding of NKX2-5 might have a role in EKG phenotypes, we determined if NKX2-5 ASE-SNVs were enriched for being EKG GWAS-SNPs. In total, there were 121 SNPs that were associated with any of the six EKG traits and were within NKX2-5 peaks, of which 81 were heterozygous in the family and had sufficient read coverage to be tested for ASE. Fourteen of these GWAS-SNPs (17%) were NKX2-5 ASE-SNVs (Table 1), which were significantly enriched compared with the proportion of NKX2-5 ASE-SNVs overlapping heterozygous non-GWAS-SNPs (1,926/19,290 (10%), Fisher’s exact test, OR = 1.88, P = 0.0392, Fig. 5d). Among these 14 NKX2-5 ASE-SNVs at EKG GWAS loci, seven were evolutionary conserved in mammals (SiPhy conservation[33]) and/or altered a cardiac TF motif (Table 1), and three overlapped heart-specific eQTLs from GTEx. These results suggest a functional link between NKX2-5 binding, cardiac-specific gene expression, and EKG phenotypes at these loci.

Table 1

Allelic binding of NKX2-5 at GWAS loci for EKG traits

dbSNP ID	ASE FDR	ASE reference allele ratio	Gene locus	eQTL	GWAS traits	Altered motifs	Conserved	Functional validation
rs590041	2.5E-105	0.07	SSBP3 (intron)	Heart-specific	P wave duration (Lead = rs562408)[37]	Tbx5, Nkx2.5	-	EMSA, luciferase assay, CRISPRi-
rs562408	7.9E-04	0.05	SSBP3 (intron)	Heart-specific	P wave duration (Lead = rs562408)[37]	-	-	EMSA, luciferase assay, CRISPRi-

rs35176054	3.4E-18	0.16	SH3PXD2A (intron)	-	Atrial fibrillation (Lead)[47]	Gata,	Yes	-

rs7612445	2.1E-15	0.08	GNB4 (>3 kb)	Heart-specific	Heart rate (Lead)[15,39]	Meis1, Tbx5	-	EMSA

rs4890490	2.1E-12	0.29	SETBP1 (intron)	-	QRS duration[55–57]	-	-	-

rs4657167	3.5E-12	0.74	NOS1AP (intron)	-	QT interval[42]	-	-	-

rs6606689	3.8E-09	0.29	PPTC7 (intron)	Other	Heart rate[15]	-	Yes	-

rs7132327	4.9E-04	0.68	TBX3 (>130 kb)	-	PR segment[14]PR interval[13]QRS duration (Lead)[56]	-	-	-

rs3807989	6.9E-04	0.66	CAV1 (intron)	Other	PR segment (Lead)[14]PR interval (Lead)[13,41,43]Atrial fibrillation (Lead)[58]	-	Yes	EMSA, luciferase assay, CRISPRi

rs8044595	1.4E-03	0.62	MYH11 (intron)	-	Resting heart rate[39]		-	-

rs6932481	2.0E-03	0.79	SAMD3 (intron)	Other	PR interval[59]	-	-	-

rs6801957	4.2E-03	0.37	SCN10A (intron)	-	PR segment (Lead)[14]PR interval (Lead)[13,40,41]QT interval (Lead)[42]P wave duration (Lead)[14]QRS duration (Lead)[43,44]Brugada syndrome[60]Resting heart rate[39]	Meis1	Yes	EMSA, reporter assays, from [45]

rs7986508	1.0E-02	0.65	LRCH1 (intron)	Heart-specific	PR segment[14]	-	-	,-

rs10841486	1.2E-02	0.28	PDE3A (>49 kb)	Other	Resting heart rate (Lead)[39]	Eomes	-	-

rs6569252	1.7E-02	0.63	GJA1 (>7 Mb)	-	Atrial fibrillation[47]	-	-

Fourteen GWAS loci for EKG traits overlapping NKX2-5 ASE-SNVs, ordered by P-value for imbalance. For each SNV, we indicate the dbSNP ID (build 137), ASE corrected P-value (FDR) combined across heterozygous samples from the seven individuals, ASE reference allele ratio, the closest genes and relative location of the SNV, known association with gene expression (eQTL) and in which tissue (heart-specific = restricted to left ventricle and/or atrial appendage in GTEx; other = any other tissue or cell line), associated EKG GWAS traits and if the SNV is the lead variant, altered motifs, conservation in mammals and experiments performed for functional validation in this or previous studies. Additional annotations are reported in Supplementary Table 5.

Validation of NKX2-5 ASE-SNV in the SSBP3 locus as a functional regulatory variant

To provide evidence that NKX2-5 ASE-SNVs within EKG GWAS loci could be functional, we experimentally investigated the SNV that showed the strongest evidence for allelic imbalance: rs590041 (NC_000001.10:g.54742471T>C) (Table 1). Two SNPs in the transcription factor SSBP3 locus are in perfect LD and showed ASE in the same peak; while rs562408 (NC_000001.10:g.54742618A>G) was the lead variant in a P-wave duration GWAS[37], our data suggested that rs590041 is the likely functional variant, as it is more centrally located in the peak and alters both TBX5 and NKX2-5 motifs (Fig. 5e). We confirmed that rs590041 had a direct causal effect on NKX2-5 binding by electrophoretic mobility shift assay (EMSA), showing that the alternate (C) allele, which creates an NKX2-5 motif, had stronger binding to nuclear extract from iPSC-CMs (Fig. 5f), consistent with the allelic imbalance that we identified in NKX2-5 ChIP-Seq (Fig. 5e). Interestingly, the stronger NKX2-5 binding C allele was associated with lower SSBP3 expression in human atrial appendages (GTEx) (Fig. 5g), suggesting a repressive function of the regulatory element harboring rs590041. In luciferase assays in iPSC-CMs (Fig. 5h), sequences encoding both alleles showed lower expression than the control, but the stronger NKX2-5 binding C allele was significantly lower than the T allele, additionally supporting a repressive function of NKX2-5 binding in this region. This hypothesis was further substantiated by the fact that specific dCas9-KRAB blocking (CRISPRi) of the region resulted in increased expression of SSBP3 in iPSC-CMs (Fig. 5i). Of note, there is no previously described role for SSBP3 in EKG phenotypes. Altogether, these data show that rs590041 is a regulatory variant that represses the expression of SSBP3 in cardiac cells, and suggest that it likely underlies the association of P-wave duration in this locus.

NKX2-5 ASE-SNVs prioritize causal variants in heart-rate GWAS loci

To examine more broadly whether NKX2-5 ASE-SNVs could help prioritize causal variants for EKG traits, we utilized fgwas[38], a statistical framework that integrates functional genomics annotations and GWAS summary statistics to identify putative causal variants at know loci, as well as at potentially novel loci. We initially applied a single annotation model to examine a heart rate[15] meta-analysis to determine if genetic associations were enriched within each individual iPSC-CM genomic annotation (NKX2-5, H3K27ac, and ATAC-seq peaks, and NKX2-5 ASE-SNVs and H3K27ac ASE-SNVs). We found NKX2-5 ASE-SNVs were the most enriched annotation, followed by NKX2-5 peaks (Supplementary Fig. 7). We next applied a joint model, where the association enrichment was quantified simultaneously for all five annotations and refined using 10-fold cross-validation, and found again NKX2-5 ASE-SNVs to be the most significantly enriched, followed by H3K27ac peaks (Fig. 6a). Then, to prioritize causal variants, we used the enrichment estimates from the joint model as priors to update the probability for a variant to be causal (posterior probability of association, PPA) within consecutive 1-Mb windows across the genome. We found 21 variants with greater than 30% probability of being causal, of which seven (30%) were NKX2-5 ASE-SNVs (Supplementary Table 6), suggesting that altered binding of NKX2-5 accounts for a considerable fraction of the genome-wide genetic contribution underlying variable heart rate. Out of these seven NKX2-5 ASE-SNVs (Fig. 6b), four were from “sub-threshold” loci that did not reach genome-wide significance in the heart rate[15] meta-analysis. One of these variants, rs6801957 (NC_000003.11:g.38767315T>C), identified with a 35% PPA, did not reach genome-wide significance in the heart rate[15] meta-analysis, but was significantly associated in a larger heart-rate GWAS[39], as well as in several GWAS for multiple EKG traits[13,14,40-44]. While we predicted that rs6801957 altered a T-box binding sequence and resulted in differential co-binding of NKX2-5 (Fig. 6c), previous functional experiments showed that this variant affects binding of TBX3 and TBX5 and expression of SCN5A, the main cardiac sodium channel[3,45]. Thus, rs6801957 serves as a proof of principle for using NKX2-5 ASE-SNVs to identify causal variants at known EKG trait GWAS loci as well as identify novel associated loci.

Figure 6

Prioritization of candidate causal variants at heart rate loci using fgwas.

a, fgwas natural log fold enrichment of GWAS-SNPs for heart rate[15] in iPSC-CM genomic annotations (y-axis). The bars indicate 95% confidence intervals. b, Table showing 11 SNPs with > 0.3 fgwas posterior probability of causality (PPA) and that overlapped at least two of the indicated iPSC-CM genomic annotations. SNPs that showed genome-wide significance (P < 10-8) for each trait in the corresponding studies are indicated, while those with P >10-8 are sub-threshold, and thus novel, GWAS loci. c, Functional annotation of rs6801957 associated with heart rate[15]. Top panel: regional plot of association P-values; SNPs are color coded based on r2 values from the 1000 Genome Project CEU population[54]; lead GWAS variants from other studies in the locus are indicated by a diamond. Second panel: fgwas PPA of the variants in the locus. Panels three through five: epigenetic tracks from iPSC-CM combined samples. Bottom panels: Roadmap fetal heart ChromHMM and genes from UCSC genome browser. Inner panel: allelic imbalance (pie chart) of NKX2-5 ASE with FRD-corrected P-values, and altered TF motif.

To further investigate the mechanisms of association between heart rate and NKX2-5 ASE-SNVs identified as candidate causal variants by fgwas (Fig. 6b), we followed up three loci previously associated with heart rate (rs7612445, NC_000003.11:g.179172979G>T; rs8044595, NC_000016.9:g.15906130A>G; and rs6606689, NC_000012.11:g.110975675T>C) and a potential novel locus (rs176107, NC_000005.9:g.89392662A>G) with additional experimental data (Supplementary Note). These data included Hi-C chromatin conformation maps from the same iPSC-CM samples[46] (Supplementary Table 2a), and RNA-seq data from iPSC-CMs from an additional 128 whole-genome-sequenced subjects[26], to examine associations between the putative causal NKX2-5 ASE-SNVs and expression of nearby or distal candidate target genes. For rs7612445 (98% PPA), which altered a T-box motif in the GNB4 locus, we validated that the two alleles have differential binding using EMSA, and that it is associated with differential expression in iPSC-CMs of several genes, including GNB4 (heart specific eQTL in GTEx) and MFN1 (influencing heart rate in zebrafish and Drosophila[15]; Supplementary Fig. 8a-c). rs8044595 (89% PPA) was associated with expression of multiple genes within the same chromatin loop in iPSC-CMs, including a strong candidate NOMO3 (nodal signaling protein associated with heart defects) (Supplementary Fig. 8d,e). rs6606689 (86% PPA) was associated with ARPC3 gene expression, an actin cytoskeleton regulator (Supplementary Fig. 8f,g). For rs176107 (35% PPA), Hi-C showed numerous long-range interactions including with the key cardiac TF MEF2C (~1.2 Mb distal) and it was also associated with expression of MEF2C in iPSC-CMs (Supplementary Fig. 8h,i). Overall, these results uncover plausible molecular mechanism underlying variability in heart rate, both at novel and previously identified GWAS loci.

Validation of NKX2-5 ASE-SNV rs3807989 as a functional variant at the CAV1 locus

To examine other EKG traits, we applied the fgwas fine-mapping framework to both atrial fibrillation[47] and PR interval[17] GWAS studies (Fig. 7a and Supplementary Fig. 7), and identified 26 and 102 SNPs, respectively, with greater than 30% probability of being causal, of which 8% (2/26) and 14% (14/102) were NKX2-5 ASE-SNVs (Supplementary Table 6). In both the AF and PR interval fgwas analyses, rs3807989 (NC_000007.13:g.116186241A>G) had the highest probability of being causal (>99% PPA) (Fig. 7b,c), and therefore, we experimentally investigated potential mechanisms underlying these associations. rs3807989, located within the CAV1 associated interval, has been reported as an eQTL for both CAV1 and CAV2 (encoding caveolins, scaffolding proteins involved in various signaling pathways) in multiple tissues[34,48,49], including left atrial samples[17]. This eQTL was reproduced in our 128 iPSC-CMs (Fig. 7d), confirming that there is a clear genetic association between rs3807989 and expression levels of CAV1 and CAV2 in cardiomyocytes. To provide evidence that this SNP is directly responsible for differential regulatory activity, we performed EMSA using iPSC-CM nuclear extracts, which demonstrated that oligonucleotide probes for the reference allele (A) bound more strongly than those for the alternate allele (G), consistent with the allelic imbalance that we identified in NKX2-5 ChIP-seq (Fig. 7e). Although rs3807989 was not predicted to directly modify a motif for NKX2-5 or other cardiac TFs, the SNV is located 6 bp from a NKX2-5 motif (Fig. 7f), and could modify a sequence important for the recognition of the binding site, such as those affecting DNA shape[50-52]. Furthermore, we observed consistent allele-specific enhancer activity in iPSC-CMs by luciferase assays (Fig. 7g). Finally, by repressing the rs3807989-containing genomic region using dCas9-KRAB (CRISPRi), we observed a significant reduction in the expression levels of both CAV1 and CAV2 in iPSC-CMs (Fig. 7h and Supplementary Fig. 10). Altogether, these results demonstrate that rs3807989 is a regulatory variant that modulates the expression levels of CAV1 and CAV2 via differential protein binding, and as such, is highly likely the causal variant underlying the AF and PR interval GWAS signals in the CAV1 interval.

Figure 7

Functional characterization of rs3807989 as candidate causal variants for PR interval and atrial fibrillation.

a, fgwas natural log fold enrichment of GWAS-SNPs for atrial fibrillation (AF) and PR interval (PR) in iPSC-CM genomic annotations (y-axis). Bars indicate 95% confidence intervals. b, Tables showing the top 5 SNPs ordered by fgwas posterior probability of causality (PPA) and overlapping at least two of the indicated iPSC-CM genomic annotations. c, Top panel: regional plot of association P-values with PR interval[17], color coded based on r2 values[54]. NKX2-5 allelic imbalance (pie chart) for rs3807989 is shown. Second panel: fgwas PPA of the variants in the locus. Panels three through five: epigenetic tracks from iPSC-CM combined samples. Bottom panel: UCSC genome browser tracks for Roadmap fetal heart ChromHMM, DHS, and gene annotations. d, Association between rs3807989 genotypes and gene expression of CAV1 and CAV2 genes in 128 iPSC-CMs from different individuals[26]. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). P-value of linear regression is shown. e, EMSA with iPSC-CMs nuclear extract using probes containing two allelic variants of rs3807989. A second blot from an independent experiment with similar results and full scans of the blots are shown in Supplementary Figure 9. f, Position of rs3807989 with respect to the NKX2-5 motif. g, Luciferase assays in iPSC-CMs for rs3807989, in both forward and reverse orientations. RLUs are normalized to cells transfected with the empty vector (pGL4.23). Plot lines indicate median, lower and upper quartiles of 6 transfection replicates per plasmid. P-values from two-tailed t-tests are shown. h, qPCR expression of CAV1 and CAV2 genes in iPSC-CMs stably expressing dCas9-KRAB (CRISPRi) (id: iPSCORE_1_5) and either a control guide RNA (gCTL) or two guide RNAs targeting the region encompassing rs3807989. Bars and error bars represent the mean and the standard deviation from three qPCR measurements, respectively; two-tailed t-test P-value is shown. The result was replicated in an independent cell line (Supplementary Fig. 10). All iPSC-CMs used in d-h were lactate-purified.

Discussion

Our study shows that differential binding of NKX2-5 likely underlies the molecular mechanisms of numerous genetic associations with EKG traits across the genome. Additionally, we showed that molecular phenotype data from iPSC-CMs combined with fine-mapping statistical approaches can be used to prioritize putative causal variants underlying genetic associations with cardiac-specific traits. Furthermore, our study demonstrates the effectiveness of using iPSC-derived cells as a model system for understanding the genetic basis of complex human traits and diseases by conducting genome-wide genotype-phenotype analyses as well as interrogating the function of individual variants. Within ~38,000 NKX2-5 binding sites, we identified 1,941 genetic variants that altered binding of the transcription factor. Because we investigated seven individuals in a three-generational family, the statistical power for identifying ASE-SNVs was increased as there were multiple replicates of allelic imbalance at the same heterozygous SNV. However, we anticipate that analyzing a larger sample size would identify a greater fraction of the NKX2-5 sites affected by genetic variants. For the NKX2-5 sites with differential binding, ~40% had genetic variants that altered the cognate TF motif and/or motifs of functionally related cardiac TFs, suggesting that a large fraction of the observed allelic binding of NKX2-5 was either a direct consequence of the SNV, or an indirect consequence resulting from the differential binding of a known co-factor. ASE-SNVs that were not associated with core cardiac TF motifs could: (i) affect consensus motifs from TFs that were not included in our targeted analysis; (ii) affect important sequences that impact DNA shape or an as of yet unknown regulatory mechanism[50-52]; or (iii) be non-functional. Combinatorial interactions between key cardiac TFs is known to be an important mechanism for orchestrating the cardiac gene expression program during development[8-11]. While genetic variation has been shown to affect collaborative binding of lineage determining TFs in mice[53], our study is the first that we are aware of to show these effects in humans. Coding mutations in and non-coding variants near NKX2-5 have, respectively, been associated with congenital heart defects[12], as well as heart rate, atrial fibrillation and PR interval[13-15], implicating this TF in a range of cardiac disease in both development and adult stages. Here, our analysis of genome-wide NKX2-5 binding enabled us to investigate its role in cardiac phenotypes through a different genetic mechanism, i.e. variation in TF binding sites resulting in differential expression of target genes. We showed that differential NKX2-5 binding was positively correlated with H3K27ac peaks at iPSC-CM enhancers, but not iPSC enhancers, suggesting that NKX2-5 ASE-SNVs altered cardiac specific enhancer activity. These findings are consistent with the fact that we found enrichment for GTEx heart-specific eQTLs in both NKX2-5 and H3K27ac ASE-SNVs in iPSC-CMs. Importantly, out of all the molecular phenotypes examined, NKX2-5 ASE-SNVs were the more strongly enriched within EKG loci, thereby implicating NKX2-5 in the development of these traits, and indicating that NKX2-5 ASE-SNVs could be used to prioritize putative causal variants. Analyzing GWASs for heart rate, atrial fibrillation and PR interval using a fine-mapping method that integrates functional annotations with GWAS summary statistics (fgwas) revealed several NKX2-5 ASE-SNVs with a high probability of causality at known loci as well as potentially novel sub-threshold GWAS signals. As a proof that this approach was effective to prioritize causal variants, one of the NKX2-5 ASE-SNVs (rs6801957 at the SCN10A-SCN5A locus) had been previously investigated in detail and had been shown to be functionally implicated in the association with EKG[3,45]. Further investigation of NKX2-5 ASE-SNVs heart rate loci using Hi-C generated from the same iPSC-CMs and gene expression in iPSC-CMs derived from 128 individuals revealed an association between the putative causal NKX2-5 ASE-SNVs and expression of nearby or distal candidate target genes. As a notable example, one of the prioritized variants (rs176107) at a sub-threshold locus showed long-range (~1.2 Mb) interaction with MEF2C, a key cardiac morphogenesis regulator, and was associated with its expression, thus providing a plausible mechanism underlying associations between differential NKX2-5 binding and heart rate. We further followed up two NKX2-5 ASE-SNVs that were potential causal variants underlying associations with EKG traits with experimental validation including EMSA, luciferase assay, and CRISPRi. These analyses demonstrated that the two common SNPs, rs590041 (associated with P-wave duration) and rs3807989 (associated with PR interval and atrial fibrillation), are functional regulatory variants that influence the expression of SSBP3 and CAV1-CAV2 genes, respectively, via differential TF binding. Interestingly, while the rs3807989 stronger TF binding allele was associated with higher gene expression, the rs590041 stronger TF binding allele was associated with reduced gene expression, indicating that NKX2-5 binding is associated with both activating and repressing regulatory elements. Although future experimental studies are needed to elucidate the function of SSBP3 and CAV1-CAV2 with respect to the associated EKG phenotypes, our results provide novel insights into the role differential binding of NKX2-5 and other cardiac TFs play in the genetic underpinnings of EKG traits. Finally, our study demonstrates that analyzing the allelic binding of master developmental TFs in iPSC-CMs is highly effective to pinpoint genetic variation important for cardiac traits, and suggests that expanding this approach to study other cardiac TF (such as TBX5, GATA4, and MEF2C) in larger sample sizes could potentially identify and characterize many of the regulatory variants that play a role in cardiac traits and diseases.

Methods

Additional details are provided in the Supplementary Note and in the Reporting Summary.

Subjects and iPSC derivation

We selected seven individuals that are part of a three-generational family (three genetically unrelated subjects and two parent-offspring quartets) in the iPSCORE resource[28] (Supplementary Table 1). Fibroblasts from skin biopsies of each subject were reprogrammed using non-integrative Sendai virus[61] and analyzed for pluripotency as described in Panopulos et al.[28]. For five individuals, we analyzed one iPSC line (“clone”), and for two individuals we analyzed two iPSC lines (Fig. 1). The nine iPSC lines were harvested in multiple replicates between passages 12 to 40; a total of 35 different iPSC harvests were used in this study (Supplementary Table 2). This study was approved by the Institutional Review Boards of the University of California at San Diego (Project #110776ZF).

Differentiation of iPSCs into cardiomyocytes

The nine iPSCs were each differentiated multiple times using a monolayer protocol[62], resulting in a total of 26 iPSC-CM samples (Supplementary Table 2). Twelve of the iPSC-CM samples were subjected to selection using 4 mM sodium L-lactate media[63] and collected at day 25. Fourteen iPSC-CM samples were collected at day 15, of which one was subjected to lactate purification at day 11. At the day of collection, iPSC-CMs were dissociated using Accutase (Thermo Scientific), pooled, counted and separated into different aliquots. About 6 x 107 cells were fixed with formaldehyde and frozen for ChIP-seq. Cells (2 x 107) were lysed and stored in RLT plus buffer (Qiagen) for RNA extraction. Nuclei from 2 x 105 cells were frozen for ATAC-seq. Differentiation efficiency was measured by the percentage of cells that stained positive for the cardiac marker cardiac troponin T (TNNT2) (Thermo Scientific MA5-12960) using flow cytometry (FACSCanto system, BD Biosciences). The same protocols of dissociation and collection of samples for RNA-seq, ChIP-seq and ATAC-seq were applied to non-differentiated iPSC lines.

Whole-genome sequencing

Genomic DNA was whole genome sequenced as a part of the iPSCORE collection, as described by DeBoever et al.[21]. Briefly, reads were aligned against human genome b37 with decoy sequences[64] using BWA-MEM and default parameters[65]. The resulting BAM files were sorted using Sambamba[66] and duplicate reads were marked using biobambam2[67]. Variant calling was performed using the GATK best-practices pipeline[68,69] on BAM files separated into individual chromosomes.

RNA-seq

We generated and analyzed 56 RNA-seq (iPSCs: 29 independent samples; iPSC-CMs: 26 independent samples and 1 technical replicate). Total RNA was isolated using the Qiagen RNAeasy Mini Kit from frozen RTL plus pellets, and run on a Bioanalyzer (Agilent). Illumina Truseq Stranded mRNA libraries were prepared and sequenced on HiSeq2500, to an average of 40 million 100 bp paired-end reads per sample. RNA-seq reads were aligned using STAR[70] with a splice junction database built from the Gencode v19 gene annotation[71]. Gene-based expression values were quantified using the RSEM package[72] and normalized to transcript per million bp (TPM).

ChIP-seq

We generated and analyzed 48 ChIP-seq of histone modification H3K27ac (iPSCs: 17 samples and 4 technical replicates; iPSC-CMs: 25 samples and 2 technical replicates), and 15 ChIP-seq of NKX2-5 (iPSC-CMs: 12 samples and 3 technical replicates) (Supplementary Tables 2 and 3), using anti-H3K27ac (Abcam ab4729) and anti-NKX2-5 (Santa Cruz Biotechnology, sc-8697x) antibodies. Libraries were sequenced to an average of 35 million 100 bp paired-end reads per sample. ChIP-seq reads were mapped to the hg19 reference using BWA[65]. Duplicate reads, reads mapping to blacklisted regions and read-pairs with mapping quality Q < 30 were filtered. Peak calling was performed using MACS2[73] with reads derived from sonicated chromatin not subjected to IP (i.e. input chromatin) from a pool of samples used as negative control. For each data type, peak coordinates were called from combined BAM files across all samples of either iPSCs or iPSC-CMs. Quantification of the signal at peaks in each sample was performed using featureCounts[74]. Motif enrichment analysis was performed using HOMER[75] and, for NKX2-5, also using MEME ChIP[76].

ATAC-seq

We generated 37 ATAC-seq libraries (iPSCs: 12 samples and 5 technical replicates; iPSC-CMs: 11 samples and 9 technical replicates) using an adapted protocol from Buenrostro et al.[77]. Libraries were sequenced to an average depth of 20 million 100-150 bp paired end reads. ATAC-seq reads were aligned using STAR to hg19 and filtered using the same protocol as for ChIP-seq. In addition, to restrict analysis to regions spanning only one nucleosome, we required an insert size no larger than 140 bp. Peak calling was performed using MACS2 on combined BAM files of either iPSC or iPSC-CM samples.

Analysis of gene expression differences between iPSCs and iPSC-CMs

A matrix of raw gene expression values from 64 RNA-seq samples (29 iPSCs, 27 iPSC-CMs, and 8 RNA-seq samples from Roadmap including H1-hESC, HUES64, iPS-20b, iPS-18, Right Atrium, Right Ventricle, Left Ventricle, and Fetal Heart) was created from the RSEM expected counts, filtered for > 1 TPM on average samples, and rounded to integer values. After filtering, 15,725 genes remained from the initial 57,820. Expression values were normalized using variance stabilizing transformation (vst) implemented in DESeq2[78]. Hierarchical clustering and the heatmap in Supplementary Figure 1 were generated using vst-normalized read counts for a panel of 61 selected genes using ‘pheatmap’ package in R. Analysis and plotting of principal components of all 15,725 genes were performed in R (Fig. 1). To identify differentially expressed genes (DEGs) between iPSCs and iPSC-CMs, we used a matrix of raw expression counts from 56 RNA-seq (29 iPSCs and 27 iPSC-CMs), filtered for average TPM > 1 (22,447 genes), and applied DESeq2 with default settings to identify DEGs more than 2-fold and at a BH (Benjamini & Hochberg) FDR of 5%.

Normalization and analysis of variability of molecular phenotypes

For RNA-seq, we restricted the analysis to autosomal genes that had on average a minimum of 1 TPM per sample (14,933 and 15,167 genes for iPSCs and iPSC-CMs, respectively) and integer-rounded RSEM expected counts were used as expression levels. For ChIP-seq, we excluded peaks > 5 kb long and those located on sex chromosomes, resulting in 110,345 H3K27ac peaks analyzed in iPSCs and 83,689 H3K27ac peaks and 37,994 NKX2-5 peaks analyzed in iPSC-CMs (Supplementary Table 3). Matrices of raw expression levels or peak coverage for each of the 5 datasets were vst-normalized using DESeq2 and analyzed for principal components using R. To investigate the major sources of variability within each dataset, values for the first 10 PCs were correlated with known covariates across samples (for iPSCs: sequencing batch, passage and subject; for iPSC-CMs: TNNT2 expression, protocol of differentiation and subject; for ChIP-seq of both cell types, we also included the fraction of reads mapping to peaks, or FRiP) using ANOVA. We corrected the respective datasets by fitting a model including the covariates that were most associated with the first PC (batch for iPSCs; TNNT2 expression and protocol/batch for iPSC-CMs; and FrIP for all ChIP-seq datasets) using the ‘lmFit’ function from ‘limma’ package and calculating the residuals using the ‘residuals’ function in R. Mean expression and coverage values for each gene/peak were added back to the residuals. Residual-corrected values were used in all subsequent analyses. To assess the consistency of data generated from cell lines derived from the same individual versus cell lines from different individuals, we selected the 1,000 most variable genes or peaks and computed matrices of Spearman correlation values across all pairs of samples for each molecular phenotype. We then separated correlation values between pairs of samples from the same, different, related or unrelated individuals and calculated the average correlation per sample. Technical replicates were excluded for the comparisons between samples of the same subject. We tested for significant increase in correlation between samples from the same subject using a one-tailed Mann-Whitney test (Fig. 1c-g and Supplementary Fig. 3k-o).

Allelic-specific effect (ASE) analysis

ASE analysis was performed as previously described[21]. To increase sensitivity of ASE and maximize the number of genes/peaks to analyze, reads from all samples from each individual per assay were merged. Heterozygous SNVs were identified by intersecting variant calls from WGS with either exonic regions from Gencode v19 or regions identified by each ChIP-seq or ATAC-seq dataset. The WASP pipeline[79] was employed to reduce reference allele bias at heterozygous sites. The number of read pairs supporting each allele was counted using the ASEReadCounter from GATK[80]. Heterozygous SNVs were then filtered to keep SNVs where the reference or alternate allele had more than 8 supporting read pairs, the reference allele frequency was between 2-98%, the SNV was located in unique mappability regions according to wgEncodeCrgMapabilityAlign100mer track, and not located within 10 bp of another variant in a particular subject (heterozygous or homozygous alternative)[49,81,82]. ASE P-values for each SNV were calculated in each sample using a binomial test method[49,82]. To combine ASE results at each SNV across samples, we performed a meta-analysis on all samples that were heterozygous for a given SNV and for which ASE could be tested. The binomial P-values of heterozygous SNVs were combined using the Stouffer z-score method[83], using the formula where Z is the z-score derived from P-values and signed according to the direction of the effect, and k is the number of individuals for each SNV. The combined z-scores were transformed to P-values and a BH FDR was calculated using ‘p.adjust’ in R. The alternate allele frequency was averaged across all heterozygous samples.

Correlation of ASEs across all individuals

The direction of ASE effects across all family members (including homozygous individuals) was estimated using the ß coefficient of a linear model testing association between the corrected gene expression or peak coverage (normalized to z-scores across individuals) and the genotype of the seven family members (0, 1 or 2, testing only one ASE-SNV per region). Spearman correlation was used to compare ß to the average allele proportion of ASE-SNVs to estimate the consistency of effects (Fig. 2d-f).

Correlation of ASEs across different molecular phenotypes

To test if the direction of ASE of SNVs within ChIP-seq peaks correlated with changes in peak coverage of other ChIP-seq peaks or with gene expression, we performed a linear regression between the ASE-SNV genotypes and each phenotype. ChIP-seq peaks were paired with the closest gene or peak within 500 bp using ‘bedtools closest’. Using linear regression, we tested the association between the individual genotypes (0, 1, 2, testing only one ASE-SNV per region) of the ASE-SNVs (FDR < 0.05) and either the corresponding corrected and z-score normalized peak coverage or gene expression or those of the closest feature. In both peak/gene and peak/peak pairs, Spearman correlation was calculated between the two slopes (ß) of linear regression (Fig. 2g,h).

Analysis of SNVs altering TFBS motifs

The effect of NKX2-5 ASE-SNVs on TFBS motifs was estimated using position probability matrices (PPMs) of the 12 most enriched families of motifs identified using HOMER (Supplementary Table 4), from a library of known motifs. For NKX2, GATA, TEAD, MEF2, TBX20 and PDX1, we also used PPMs derived from a de-novo analysis. All PPMs are provided in Supplementary Table 4. Position weight matrices (PWMs) were calculated from the PPMs using a background nucleotide frequency of 0.25 for each base. Using a custom R script, a 40-bp window centered on each SNV tested for ASE was scanned with PWMs for each motif, and the position with the highest score was identified. For SNVs where either the reference or the alternate sequence matched or exceeded the log odds detection threshold reported by HOMER PPMs, the difference between the scores of the two alleles was calculated. In cases where a SNV matched multiple motifs from the same family, we kept only the motif with the highest score for either of the alleles. Fisher’s exact test was used to calculate enrichment for motif-altering SNVs in variants with ASE compared to variants without ASE (Fig. 3a). For each of the 12 motifs, we also calculated Spearman correlation between the allelic imbalance proportion of the reference allele and the difference in motif score between the reference and the alternate allele (Fig. 3c,d and Supplementary Fig. 5). Motifs that were altered at NKX2-5 ASE-SNVs are indicated in Supplementary Table 5.

Enrichment of ASE-SNVs for known quantitative trait loci

To examine the enrichment of ASE-SNVs in known quantitative trait loci across different tissues, we obtained dsQTLs in LCLs from Degner et al.[32], eQTLs from iPSCs from DeBoever et al.[21], and eQTLs from HaploReg v4.1[33], which contained combined results from 13 different studies including GTEx v.6[82]. To identify tissue-specific eQTLs (Fig. 4d), the 44 tissues from GTEx were classified into 26 groups by merging similar tissues (adipose (n = 2), artery (n = 3), brain (n = 10), cell lines (n = 2), colon (n = 2), esophagus (n = 3), heart (n = 2), skin (n = 2), and the remaining 18 tissues were n = 1). A gene-eQTL combination was defined as tissue-specific if 50% or more of the significant associations were in a single tissue group. All SNVs tested for ASE in ChIP-seq datasets (H3K27ac in iPSCs and H3K27ac and NKX2-5 in iPSC-CMs) were intersected with these annotations, and enrichment between heterozygous SNVs with and without ASE was calculated using Fisher’s exact test in R. In cases where multiple SNPs overlapped a peak, we counted only one SNP per peak. The complete Fisher’s exact test statistics including P-values, odds ratios and number of SNVs analyzed are reported in Supplementary Table 5.

Enrichment of GWAS-SNPs in regulatory regions in iPSC-CMs

To calculate enrichment for GWAS-SNPs in ChIP-seq and ATAC-seq peaks, we extracted sets of SNPs associated with six EKG traits (heart rate, PR interval, QT interval, QRS duration, atrial fibrillation and P-wave duration) from the GWAS catalog[1] and 119 non-EKG traits that were associated with a similar number of SNPs. We used GREGOR[84] to test each of these 125 SNP sets for enrichment in ChIP-seq and ATAC-seq peaks from iPSCs and iPSC-CMs from this study as well as in peaks from cardiac tissues from Roadmap as a control (Fig. 5a-c and Supplementary Fig. 6). To calculate the enrichment for EKG GWAS-SNPs in NKX2-5 ASE-SNVs, we obtained the SNVs overlapping NKX2-5 peaks and associated with any of the six EKG traits. For the SNVs that could be tested for ASE, we calculated the proportion with and without ASE and tested their relative enrichment using Fisher’s exact test (Fig. 5d).

Estimating GWAS enrichment in molecular phenotypes and prioritizing putative causal variants

To determine the enrichment of genetic variants influencing EKG traits within the different iPSC-CM molecular phenotypes and to identify putative causal variants and novel associations, we employed the fgwas framework, as described by Pickrell et al.[38]. We obtained summary statistics from the den Hoed et al.[15] heart-rate GWAS meta-analysis (2,516,407 SNPs analyzed) from LD hub (http://ldsc.broadinstitute.org/ldhub/), the Christophersen et al. atrial fibrillation meta-analysis[47] (11,779,664 SNPs) from the CVD portal (http://broadcvdi.org/), and the van Setten at al. PR-interval[17] GWAS (2,712,310 SNPs) as a collaboration with the authors. For each GWAS, we annotated each variant with the type of molecular phenotype it overlapped: peaks (ATAC-seq, H3K27ac, and NKX2-5 peaks) and/or ASE-SNVs (H3K27ac and NKX2-5), and applied a single annotation model followed by a joint model, where the association enrichment was quantified simultaneously for all five annotations. To prioritize causal variants, we used the enrichment estimates from the joint model as priors to estimate the probability for a variant to be causal (posterior probability of association, PPA) within consecutive 1-Mb windows across the genome. We report all variants with PPA > 0.3 in Supplementary Table 6.

Gene expression analysis of 128 iPSC-CMs

We used RNA-seq of iPSC-CMs from 128 different individuals[26]. Subjects included 43 males and 85 females, between 9 and 88 years of age, of diverse ethnicities (Europeans, n = 78, and Asians, n = 23). iPSCs were differentiated into day-25 cardiomyocytes using the method described above, including a 4 mM sodium L-lactate enrichment step at day 15, and yielded on average 83.9 +/- 13.6 % cTNT-positive populations. RNA-seq was generated and processed as described above. Raw gene expression data were first filtered for genes with TPM ≥ 2 in at least 5% of the samples and then quantile-normalized. From these values we calculated PEER factors[85] and used the residuals of the first 10 factors as normalized gene expression values. We extracted the individuals’ genotypes from WGS and performed linear regression for the specific SNV-gene expression associations in R.

Electrophoretic mobility shift assay (EMSA)

EMSAs were performed using the LightShift™ Chemiluminescent EMSA Kit (Thermo Scientific) with biotinylated and non-biotinylated single-stranded oligonucleotides corresponding to 33-34 genomic fragments containing the SNPs rs590041, rs3807989, and rs7512445 (Supplementary Table 7). Both forward and reverse strand were tested; the forward strand was bound in case of rs590041 and rs3807989 and the reverse strand was bound in case of rs7512445. Nuclear extract from day-30-33 iPSC-CMs was extracted using the NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Scientific) with 1X Halt™ Protease Inhibitor Cocktail (Thermo Scientific). The binding reaction was carried in 10 μl volume containing 1 μl of 10X Binding Buffer (100 mM Tris pH 7.5, 500 mM KCl and 10 mM DTT), 2.5% glycerol, 5 mM MgCl2, 0.05% NP40, 50 ng Poly(dI:dC), 1 pmol of biotin-labeled probe, and 15.3-16.8 μg nuclear extract. For competition experiments, a 200-fold molar excess of unlabeled probe was added. Binding reactions were incubated at room temperature for 20 min and loaded onto a 6% polyacrylamide 0.5X TBE gel. After sample electrophoresis and transfer to a Biodine B Pre-Cut Modified Nylon Membrane, 0.45 μm (Thermo Scientific), DNA was UV-crosslinked for 15 min, and the biotinylated probes were detected using Chemiluminescent Nucleic Acid Detection Module (Thermo Scientific). Membranes were acquired using C-DiGit Blot scanner (LI-COR Biosciences).

Luciferase assay

Candidate functional variants rs590041 (SSBP3 intron) and rs3807989 (CAV1 intron) were tested for differential transcriptional activity using luciferase reporter assay. ~1.7-kb regions centered on each SNP were amplified from genomic DNA and cloned into pGL4.23 Firefly Luciferase reporter vectors (Promega) using Kpn I restriction sites, with primers given in Supplementary Table 7. For rs590041, the two allelic variants were obtained using site-directed mutagenesis of a homozygous alternate genomic DNA, while for rs3807989, they were obtained by sub-cloning DNA with heterozygous genotype. Cryopreserved day-25 iPSC-CMs were seeded onto a matrigel-coated 96-w plate at a density of 30-40 x 106cells per well and cultured in RPMI + Insulin for 5-10 days prior to transfection, when media was exchanged to Opti-MEM (Life Technologies). Each well was transfected with a mix of 120 ng of Firefly Luciferase reporter vector, 30 ng of Renilla Luciferase control vector (pRL-TK, Promega), and 0.6 μl of Viafect transfection reagent (Promega) in 10 μl of Opti-MEM. We transfected six wells per construct. Luciferase activity was measured 24 hours after transfection using the Dual-Luciferase® Reporter Assay System (Promega).

CRISPRi experiments

Two gRNAs targeting CAV1 and SSBP3 regulatory elements were designed using the online software CHOPCHOP (http://chopchop.cbu.uib.no/index.php) and cloned into the lentiviral vector pLKO.1-U6-2sgRNA-ccdB-EF1a-Puromycin. Lentiviral gRNAs or Lenti-dCas9-KRAB-blast plasmids (Addgene #89567) were co-transfected with packaging plasmids (psPAX2 and pMD2.G) into human 293T cells. Culture medium containing lentivirus particles for gRNA and dCas9-KRAB was harvested, mixed well with polybrene (10 μg/ml), and added to a 24-well plate. Day-30 iPSC-CMs (cell lines iPSCORE_1_5 and iPSCORE_75_1) were dissociated and added into the virus-containing media at around 80% confluence. For higher infection efficiency, a new collection of lentiviral particles mixed with polybrene was added to the medium after 24 hours. Medium was exchanged after 24 hours to regular culture medium and changed to selection medium containing 0.2 μg/ml puromycin and 6 μg/ml blasticidin after another 24 hours. Cells were cultured for 6 days when all cells from the non-infected control died, and then harvested. RNA was isolated with Quick-RNA kit (Zymo Research) and reverse-transcribed using SuperScript III Reverse Transcriptase (Life Technologies). qPCR reactions were performed in StepOne™ Real-Time PCR Systems (Applied Biosystems) using 2X Affymetrix qPCR master mix. Relative quantities of gene expression levels were normalized to the METTL2B gene. Guide RNAs and primers for qPCR are given in Supplementary Table 7.

79 in total

Review 1. The Genetics of Transcription Factor DNA Binding Variation.

Authors: Bart Deplancke; Daniel Alpern; Vincent Gardeux
Journal: Cell Date: 2016-07-28 Impact factor: 41.582

Review 2. The Post-GWAS Era: From Association to Function.

Authors: Michael D Gallagher; Alice S Chen-Plotkin
Journal: Am J Hum Genet Date: 2018-05-03 Impact factor: 11.025

3. Complex Interdependence Regulates Heterotypic Transcription Factor Distribution and Coordinates Cardiogenesis.

Authors: Luis Luna-Zurita; Christian U Stirnimann; Sebastian Glatt; Bogac L Kaynak; Sean Thomas; Florence Baudin; Md Abul Hassan Samee; Daniel He; Eric M Small; Maria Mileikovsky; Andras Nagy; Alisha K Holloway; Katherine S Pollard; Christoph W Müller; Benoit G Bruneau
Journal: Cell Date: 2016-02-11 Impact factor: 41.582

4. Co-occupancy by multiple cardiac transcription factors identifies transcriptional enhancers active in heart.

Authors: Aibin He; Sek Won Kong; Qing Ma; William T Pu
Journal: Proc Natl Acad Sci U S A Date: 2011-03-17 Impact factor: 11.205

5. A common genetic variant within SCN10A modulates cardiac SCN5A expression.

Authors: Malou van den Boogaard; Scott Smemo; Ozanna Burnicka-Turek; David E Arnolds; Harmen J G van de Werken; Petra Klous; David McKean; Jochen D Muehlschlegel; Julia Moosmann; Okan Toka; Xinan H Yang; Tamara T Koopmann; Michiel E Adriaens; Connie R Bezzina; Wouter de Laat; Christine Seidman; J G Seidman; Vincent M Christoffels; Marcelo A Nobrega; Phil Barnett; Ivan P Moskowitz
Journal: J Clin Invest Date: 2014-03-18 Impact factor: 14.808

6. The cardiac transcription network modulated by Gata4, Mef2a, Nkx2.5, Srf, histone modifications, and microRNAs.

Authors: Jenny Schlesinger; Markus Schueler; Marcel Grunert; Jenny J Fischer; Qin Zhang; Tammo Krueger; Martin Lange; Martje Tönjes; Ilona Dunkel; Silke R Sperling
Journal: PLoS Genet Date: 2011-02-17 Impact factor: 5.917

7. The genetic and mechanistic basis for variation in gene regulation.

Authors: Athma A Pai; Jonathan K Pritchard; Yoav Gilad
Journal: PLoS Genet Date: 2015-01-08 Impact factor: 5.917

8. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures.

Authors: Xinchen Wang; Nathan R Tucker; Gizem Rizki; Robert Mills; Peter Hl Krijger; Elzo de Wit; Vidya Subramanian; Eric Bartell; Xinh-Xinh Nguyen; Jiangchuan Ye; Jordan Leyton-Mange; Elena V Dolmatova; Pim van der Harst; Wouter de Laat; Patrick T Ellinor; Christopher Newton-Cheh; David J Milan; Manolis Kellis; Laurie A Boyer
Journal: Elife Date: 2016-05-10 Impact factor: 8.140

9. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog).

Authors: Jacqueline MacArthur; Emily Bowler; Maria Cerezo; Laurent Gil; Peggy Hall; Emma Hastings; Heather Junkins; Aoife McMahon; Annalisa Milano; Joannella Morales; Zoe May Pendlington; Danielle Welter; Tony Burdett; Lucia Hindorff; Paul Flicek; Fiona Cunningham; Helen Parkinson
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

10. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo.

Authors: Matthew T Maurano; Eric Haugen; Richard Sandstrom; Jeff Vierstra; Anthony Shafer; Rajinder Kaul; John A Stamatoyannopoulos
Journal: Nat Genet Date: 2015-10-26 Impact factor: 38.330

12 in total

1. In heart failure reactivation of RNA-binding proteins is associated with the expression of 1,523 fetal-specific isoforms.

Authors: Matteo D'Antonio; Jennifer P Nguyen; Timothy D Arthur; Hiroko Matsui; Margaret K R Donovan; Agnieszka D'Antonio-Chronowska; Kelly A Frazer
Journal: PLoS Comput Biol Date: 2022-02-28 Impact factor: 4.475

2. Allele-specific epigenetic activity in prostate cancer and normal prostate tissue implicates prostate cancer risk mechanisms.

Authors: Anamay Shetty; Ji-Heui Seo; Connor A Bell; Edward P O'Connor; Mark M Pomerantz; Matthew L Freedman; Alexander Gusev
Journal: Am J Hum Genet Date: 2021-10-25 Impact factor: 11.025

3. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation.

Authors: Sylvan C Baca; Cassandra Singler; Soumya Zacharia; Ji-Heui Seo; Tunc Morova; Faraz Hach; Yi Ding; Tommer Schwarz; Chia-Chi Flora Huang; Jacob Anderson; André P Fay; Cynthia Kalita; Stefan Groha; Mark M Pomerantz; Victoria Wang; Simon Linder; Christopher J Sweeney; Wilbert Zwart; Nathan A Lack; Bogdan Pasaniuc; David Y Takeda; Alexander Gusev; Matthew L Freedman
Journal: Nat Genet Date: 2022-09-07 Impact factor: 41.307

4. In vitro Differentiation of Human iPSC-derived Cardiovascular Progenitor Cells (iPSC-CVPCs).

Authors: Agnieszka D'Antonio-Chronowska; Matteo D'Antonio; Kelly A Frazer
Journal: Bio Protoc Date: 2020-09-20

5. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes.

Authors: Michelle C Ward; Nicholas E Banovich; Abhishek Sarkar; Matthew Stephens; Yoav Gilad
Journal: Elife Date: 2021-02-08 Impact factor: 8.140

6. A genome-wide association and polygenic risk score study on abnormal electrocardiogram in a Chinese population.

Authors: Mengqiao Wang; Jiaqi Gao; Yang Shi; Xing Zhao
Journal: Sci Rep Date: 2021-02-25 Impact factor: 4.379

7. Perinatal risk factors for congenital hypothyroidism: A retrospective cohort study performed at a tertiary hospital in China.

Authors: Jinfu Zhou; Jinying Luo; Junyu Lin; Yinglin Zeng; Xiaolong Qiu; Wenbin Zhu; Guanghua Liu
Journal: Medicine (Baltimore) Date: 2020-06-26 Impact factor: 1.817

8. A Panel of rSNPs Demonstrating Allelic Asymmetry in Both ChIP-seq and RNA-seq Data and the Search for Their Phenotypic Outcomes through Analysis of DEGs.

Authors: Elena E Korbolina; Leonid O Bryzgalov; Diana Z Ustrokhanova; Sergey N Postovalov; Dmitry V Poverin; Igor S Damarov; Tatiana I Merkulova
Journal: Int J Mol Sci Date: 2021-07-06 Impact factor: 5.923

9. Tissue-specific multi-omics analysis of atrial fibrillation.

Authors: Ines Assum; Julia Krause; Tanja Zeller; Renate B Schnabel; Matthias Heinig; Markus O Scheinhardt; Christian Müller; Elke Hammer; Christin S Börschel; Uwe Völker; Lenard Conradi; Bastiaan Geelhoed
Journal: Nat Commun Date: 2022-01-21 Impact factor: 14.919

Review 10. Human iPSC-Cardiomyocytes as an Experimental Model to Study Epigenetic Modifiers of Electrophysiology.

Authors: Maria R Pozo; Gantt W Meredith; Emilia Entcheva
Journal: Cells Date: 2022-01-07 Impact factor: 7.666