Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands.

Sheng Chih Jin¹, Jason Homsy^2,3, Samir Zaidi¹, Qiongshi Lu⁴, Sarah Morton⁵, Steven R DePalma², Xue Zeng¹, Hongjian Qi⁶, Weni Chang⁷, Michael C Sierant¹, Wei-Chien Hung¹, Shozeb Haider⁸, Junhui Zhang¹, James Knight⁹, Robert D Bjornson⁹, Christopher Castaldi⁹, Irina R Tikhonoa⁹, Kaya Bilguvar⁹, Shrikant M Mane⁹, Stephan J Sanders¹⁰, Seema Mital¹¹, Mark W Russell¹², J William Gaynor¹³, John Deanfield¹⁴, Alessandro Giardini¹⁴, George A Porter¹⁵, Deepak Srivastava^16,17,18, Cecelia W Lo¹⁹, Yufeng Shen²⁰, W Scott Watkins²¹, Mark Yandell^21,22, H Joseph Yost²¹, Martin Tristani-Firouzi²³, Jane W Newburger²⁴, Amy E Roberts²⁴, Richard Kim²⁵, Hongyu Zhao⁴, Jonathan R Kaltman²⁶, Elizabeth Goldmuntz²⁷, Wendy K Chung²⁸, Jonathan G Seidman², Bruce D Gelb²⁹, Christine E Seidman^2,3,30, Richard P Lifton^1,31, Martina Brueckner^1,32.

Abstract

Congenital heart disease (CHD) is the leading cause of mortality from birth defects. Here, exome sequencing of a single cohort of 2,871 CHD probands, including 2,645 parent-offspring trios, implicated rare inherited mutations in 1.8%, including a recessive founder mutation in GDF1 accounting for ∼5% of severe CHD in Ashkenazim, recessive genotypes in MYH6 accounting for ∼11% of Shone complex, and dominant FLT4 mutations accounting for 2.3% of Tetralogy of Fallot. De novo mutations (DNMs) accounted for 8% of cases, including ∼3% of isolated CHD patients and ∼28% with both neurodevelopmental and extra-cardiac congenital anomalies. Seven genes surpassed thresholds for genome-wide significance, and 12 genes not previously implicated in CHD had >70% probability of being disease related. DNMs in ∼440 genes were inferred to contribute to CHD. Striking overlap between genes with damaging DNMs in probands with CHD and autism was also found.

Year: 2017 PMID: 28991257 PMCID: PMC5675000 DOI: 10.1038/ng.3970

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

INTRODUCTION

Congenital heart disease (CHD) affects ~1% of live births and remains the leading cause of mortality from birth defects[1]. After surgical repair, patients remain at risk of cardiac arrhythmias, heart failure, neurodevelopmental deficits and other congenital anomalies[2, 3]. Although aneuploidies and copy number variations (CNVs) account for ~23% of CHD patients[4-6], these have yielded few individual causal genes. Genes causing rare Mendelian syndromic forms of CHD have been identified, but genes underlying the large majority of sporadic CHD remain unknown. To this end, the NHLBI Pediatric Cardiac Genomics Consortium (PCGC) has collected >10,000 CHD probands, including >5,000 parent-offspring trios[7]. Whole exome sequencing (WES) of 1,213 trios from this cohort showed that ~10% of cases are attributable to de novo mutations (DNMs) in >400 target genes, including dramatic enrichment for damaging mutations in genes encoding chromatin modifiers[8, 9]. Moreover, these studies demonstrated a striking shared genetic etiology between CHD and neurodevelopmental disorders (NDD)[6, 9]. Genetic studies of humans and mice predict a role for inherited variants with large effect[10, 11]. Analysis of rare multigenerational CHD families has identified mutations in cardiac transcription factors, signaling molecules and structural components[12]. Inherited heterozygous protein-truncating variants have been implicated in non-syndromic CHD and have suggested distinct genetic architectures for syndromic and non-syndromic CHD[9, 13]. To date, the roles of recessive inheritance and novel genes operating via dominant transmission have not been systematically studied. Discovery of additional large-effect mutations requires large cohorts, comprehensive genomic data and robust statistical methods. Here, we analyze the impact of rare inherited recessive and dominant variants, and of DNMs on CHD via WES of a single large CHD cohort.

RESULTS

Cohort Characteristics and Sequencing

We studied 2,871 CHD probands comprising 2,645 parent-offspring trios and 226 singletons recruited to the PCGC and the Pediatric Heart Network (PHN) programs (Supplementary Data Set 1). These include 1,204 previously reported trios[9]. The ethnicities, gender and clinical features of probands are shown in Supplementary Table 2 and Supplementary Tables 3a–c. Patients with known trisomies and CHD-associated CNVs were prospectively excluded from analysis. Genomic DNAs underwent WES (see Online Methods). In parallel, WES from 1,789 control trios comprising parents and unaffected siblings of autism probands was analyzed[14]. Cases and controls showed similar sequencing metrics (Supplementary Table 4). Variants were called and annotated as described in methods.

Recessive Genotypes Enriched in CHD

Principal component analysis (PCA) from WES genotypes showed that CHD cases were more frequently of non-European ancestry than controls. The inbreeding coefficient of probands was higher than that of controls (Supplementary Figure 1). These differences complicate direct comparison of recessive genotypes (RGs) in cases and controls. Accordingly, we implemented a binomial test to quantify the enrichment of damaging RGs in genes or gene sets in cases, independent of controls. This method compares the observed number of rare damaging RGs to the expected frequency, estimated from the de novo probability and adjusted for inbreeding using a polynomial model (see Online Methods and Supplementary Figures 2–6).

We curated a set of 212 human CHD genes (H-CHD genes) from the Online Mendelian Inheritance in Man (OMIM) and published data[13], and human orthologs of 61 mouse CHD genes (M-CHD genes) identified in a recessive screen for CHD (Supplementary Data Set 2 and Supplementary Note)[11]. The H-CHD set comprised 104 dominant genes, 85 recessive genes, 12 X-linked genes, and 11 genes showing both dominant and recessive transmission. Accounting for 20 genes identified in both human and mouse, the combined set comprised 253 human genes (Supplementary Data Set 2).

We identified rare (minor allele frequency [MAF] < 0.001) likely loss-of-function variants (LoF; frameshift, nonsense, canonical splice site, and start loss), likely damaging missense variants (D-Mis; by MetaSVM), and non-frameshift insertion/deletion variants, and then identified homozygous or compound heterozygous genotypes comprising these alleles. This identified 467 damaging RGs in CHD cases (Supplementary Data Set 3) and 165 in controls (Supplementary Data Set 4). We used the one-tailed binomial test to determine whether damaging RGs were enriched among 96 genes implicated in recessive human CHD (Table 1a). This gene set had 29 damaging RGs vs. 6.7 expected (enrichment = 4.4, P = 8.0×10⁻¹¹; Table 1, Supplementary Figure 5b, Supplementary Table 5). This set showed zero RGs in controls (Table 1). Adding 41 recessive mouse genes, there were 34 damaging RGs compared to 11.1 expected (enrichment = 3.1, P = 1.4×10⁻⁸; Table 1). Adding 116 dominant CHD genes added 17 damaging RGs in 9 genes (cumulative total, observed 51 vs. expected 25.2, enrichment = 2.0, P = 1.8×10⁻⁶; Table 1). Similar results were obtained from independently modeling homozygous and compound heterozygous genotypes (see Online Methods, Supplementary Table 6, and Supplementary Figures 7–8) and further corroborated using a burden test-based approach[15, 16] that also integrates proband phenotype[17] (see Online Methods and Supplementary Figure 9). These findings implicate RGs in known CHD genes in 0.9% of these CHD cases.

Table 1

Damaging recessive genotypes in known CHD genes in cases and controls

2,871 CHD cases

Gene set (# genes)	Observed				Expected	Enrichment	P-value

	# homozygotes	# compound heterozygous	# unique genes	# recessive genotypes	# recessive genotypes

All genes (18,989)	265	202	391	467	-	-	-

Recessive Known Human (96)	19	10	16	29	6.65	4.36	8.0×10⁻¹¹
Recessive Known Mouse or Human (137)	21	13	19	34	11.06	3.07	1.4×10⁻⁸
Known Mouse or Human CHD (253)	28	23	28	51	25.15	2.03	1.8×10⁻⁶

1,789 controls	Observed				Expected

All genes (18,989)	22	131	146	165	-	-	-

Recessive Known Human (96)	0	0	0	0	2.61	0	1
Recessive Known Mouse or Human (137)	1	1	2	2	4.47	0.45	0.94
Known Mouse or Human CHD (253)	2	3	5	5	10.18	0.49	0.98

The expected number of recessive genotypes was determined based on fitted values from the polynomial regression model using the damaging de novo probabilities. P-values were calculated using the one-tailed binomial probability. Values in bold are p-values that pass the Bonferroni multiple-testing cutoff of 0.05/(3×2) = 8.3×10⁻³.
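The case rows of Table 1 can be checked with a short calculation. The sketch below assumes that the one-tailed binomial test treats each of the 467 damaging RGs observed in cases as a trial whose success probability is the gene set's expected count divided by 467. That reading is inferred from the table footnote and the Figure 1 legend, not taken from the Online Methods, and the function and variable names are illustrative.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total

TOTAL_RGS_CASES = 467  # all damaging RGs observed in the 2,871 cases (Table 1)

# (label, observed RGs, expected RGs), copied from the case rows of Table 1
GENE_SETS = [
    ("Recessive Known Human (96)", 29, 6.65),
    ("Recessive Known Mouse or Human (137)", 34, 11.06),
    ("Known Mouse or Human CHD (253)", 51, 25.15),
]

BONFERRONI_CUTOFF = 0.05 / (3 * 2)  # three gene sets x two cohorts

def gene_set_enrichment(observed, expected, total=TOTAL_RGS_CASES):
    """Return (enrichment, one-tailed binomial P) under the assumption stated above."""
    enrichment = observed / expected
    p_value = binom_upper_tail(observed, total, expected / total)
    return enrichment, p_value

for label, observed, expected in GENE_SETS:
    enrichment, p_value = gene_set_enrichment(observed, expected)
    print(f"{label}: enrichment = {enrichment:.2f}, P = {p_value:.1e}, "
          f"passes cutoff = {p_value < BONFERRONI_CUTOFF}")
```

Because the expected counts in Table 1 are rounded, the printed P-values should agree with the tabulated ones only approximately.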

For previously identified recessive genes, the observed and previously reported cardiac phenotypes were concordant in 22 of 31 cases, suggesting variable expressivity of RGs. For previously identified dominant genes, observed cardiac phenotypes matched those previously reported in only 3 of 17 probands. Of these, phenotypes seen with RGs were more severe than previously described dominant phenotypes (COL1A1, COL5A2, FBN2, MYH6, NSD1, and TSC2), or at the severe end of the described spectrum (CHD7 and NOTCH1; Supplementary Table 5).

We examined the contribution of consanguinity to RGs. A total of 161 probands (5.6%) had homozygous segments implying parental relationships of 3rd cousins or closer (see Supplementary Note). This group included 81 of 84 probands with reported consanguinity. Thirteen (8.1%) of these probands had damaging RGs in recessive H-CHD genes (2.4 expected, 5.4-fold enrichment, P = 1.3×10⁻⁶; Supplementary Table 7); all but one genotype was homozygous. Among the remaining 2,710 probands, RGs were also enriched (3.9-fold, 16 observed vs. 4.1 expected, P = 5.3×10⁻⁶); however, RGs were found in only 0.6% of this group (Supplementary Table 7). Among the seven homozygotes in this group, five probands had inbreeding coefficients between 0.0015 and 0.0035, implying distant parental relatedness, whereas two homozygotes and all nine compound heterozygotes had inbreeding coefficients of zero. Thus, cryptic or overt parental consanguinity was a strong driver of recessive CHD in this cohort.

Importantly, 38% of RGs in recessive CHD genes were attributable to a single GDF1 founder mutation (see below). Significant enrichment for RGs in known CHD genes persists after removal of GDF1 homozygotes (Supplementary Table 8).

We observed 44 genes with > 1 damaging RG compared to 26.4 expected (enrichment = 1.7; P = 8.9×10⁻⁵ by permutation; see Online Methods); synonymous RGs were not significantly enriched (167 observed, 156.7 expected, P = 0.15 by permutation). This excess persisted after removal of 5 known recessive genes (GDF1, ATIC, DNAH5, DAW1, LRP1; enrichment = 1.6; P = 10⁻³ by permutation). Gene Ontology (GO) analysis of the novel gene set revealed enrichment of genes involved in muscle cell development (GO:0055001, enrichment = 29.5, FDR = 3.2×10⁻³), including KEL, MYH6, MYH11, NOTCH1, and RYR1 (Supplementary Data Sets 3,5).

Founder Mutation in GDF1 in Ashkenazim

Q-Q plots comparing the observed and expected damaging RGs in each gene using the binomial test showed that two genes, GDF1 and MYH6, had more RGs than expected (genome-wide threshold, P < 2.6×10⁻⁶, Figure 1a; Supplementary Table 9); modeling homozygotes and compound heterozygotes separately yielded similar results (Supplementary Table 10). No genes approached genome-wide significance in controls (Figure 1b).

Figure 1

Quantile-quantile plots comparing observed versus expected P-values for recessive genotypes in each gene in cases and controls

Recessive genotypes (RGs) shown include LoF, D-Mis, and non-frameshift insertion/deletions. The expected number of RGs in each gene was calculated from the total number of observed RGs as described in Methods. The significance of the difference between the observed and expected number of RGs was calculated using a one-sided binomial test. (a). Quantile-quantile (Q-Q) plot in cases. (b). Q-Q plot in controls. While the observed values closely conform to expected values in controls, two genes, GDF1 and MYH6, show a significantly increased burden of RGs in cases and survive the multiple-testing correction threshold.

GDF1 had 11 damaging RGs in apparently unrelated subjects compared with 0.016 expected (enrichment = 692.6, one-tailed binomial P = 3.6×10⁻²⁸; Supplementary Table 9); all were confirmed by Sanger sequencing (Supplementary Figure 10). Ten RGs were homozygous for a p.Met364Thr (c.1091T>C) variant, suggesting a founder mutation; the other was p.Met364del (c.1090_1092delATG)/p.Cys227* (c.681C>A). Consistent with a founder mutation, PCA showed that all p.Met364Thr homozygotes clustered with Ashkenazim (Supplementary Figure 11). Additional evidence supports a role for p.Met364Thr homozygosity in CHD risk among Ashkenazim. p.Met364Thr shows remarkable violation of Hardy-Weinberg equilibrium among Ashkenazi CHD cases, with 10 homozygotes and only 1 heterozygote among 204 Ashkenazi cases defined by PCA (P = 5.5×10⁻³⁸, 1-df chi-square test with Yates' correction; Supplementary Table 11a). In contrast, among 302 Ashkenazi autism parental controls and 926 additional Ashkenazi adults from an independent cohort without CHD, there were no homozygotes and 12 heterozygotes (carrier frequency = 1.0%), providing strong association of p.Met364Thr homozygosity with CHD among Ashkenazim (two-sided Fisher's exact P = 2.8×10⁻⁹, Supplementary Table 11b). Moreover, this allele was absent among African, Asian, and Finnish European populations in ExAC. Lastly, all homozygotes shared p.Met364Thr on a common haplotype background, indicating identity by descent (Figure 2a). The length of the shared haplotype varied widely (0.4–5.9 Mb; Figure 2a), indicating remote shared ancestry. The inferred coalescent time for the last shared ancestor, using DMLE+2.3 software[18], is 50 generations (95% CI: 45 to 63 generations; Supplementary Figure 12).
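The two tests reported for p.Met364Thr can be reproduced approximately from the counts given in the text. The sketch below is illustrative only: it assumes the Hardy-Weinberg test compares the three genotype counts among the 204 Ashkenazi cases (10 homozygotes, 1 heterozygote, and therefore 193 non-carriers) with expectations from the pooled allele frequency, and that the Fisher's exact test contrasts homozygotes with all other genotypes in cases versus the 1,228 pooled controls. The exact contingency tables are in Supplementary Table 11 and are not reproduced here.

```python
import math

def hwe_chi_square_yates(n_hom_alt, n_het, n_hom_ref):
    """1-df chi-square test of Hardy-Weinberg proportions with Yates' correction."""
    n = n_hom_alt + n_het + n_hom_ref
    q = (2 * n_hom_alt + n_het) / (2 * n)  # alternate-allele frequency
    p = 1.0 - q
    observed = (n_hom_alt, n_het, n_hom_ref)
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    chi2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def hypergeom_pmf(k, row1, row2, col1):
    """Probability that k of the col1 'positive' subjects fall in row1."""
    return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(row1 + row2, col1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (hypergeom_pmf(k, row1, row2, col1) for k in range(lo, hi + 1))
    # Sum every table that is no more probable than the observed one
    return sum(pk for pk in probs if pk <= p_obs * (1 + 1e-9))

# 204 Ashkenazi cases: 10 homozygotes, 1 heterozygote, 193 non-carriers
chi2, p_hwe = hwe_chi_square_yates(10, 1, 204 - 11)

# Homozygotes vs. all other genotypes: 10/204 cases vs. 0/(302 + 926) controls
p_fisher = fisher_exact_two_sided(10, 194, 0, 302 + 926)

print(f"HWE chi-square = {chi2:.1f}, P = {p_hwe:.1e}")
print(f"Fisher's exact P = {p_fisher:.1e}")
```

Under these assumptions both P-values should land close to the reported 5.5×10⁻³⁸ and 2.8×10⁻⁹.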

Figure 2

Phenotypes and shared haplotypes among homozygotes for GDF1-p.Met364Thr

(a). Extent of homozygous SNPs flanking homozygous GDF1-p.Met364Thr genotypes. A 5.9 Mb segment of chromosome 19 extending across the location of the homozygous GDF1-p.Met364Thr mutation (denoted by red square) in each unrelated subject is depicted. At the bottom, tick marks indicate location of all SNPs found by exome sequencing among Ashkenazim in cases. Known SNPs are shown via their rs identifiers. Allele frequencies of novel SNPs are indicated by asterisks. The closest heterozygous SNP to either side of the GDF1-p.Met364Thr in each subject is shown as a white square; all SNPs between these two heterozygous SNPs, encompassed by the light blue bar, are homozygous for the same allele seen in other subjects, consistent with the p.Met364Thr variant being identical by descent among all subjects. The length of each homozygous segment is indicated at the right of the panel. The maximum length of the homozygous segment shared by all subjects is 234 kb (shown as grey vertical bar), consistent with the mutation having been introduced into a shared ancestor many generations ago. (b). Cardiac and extracardiac phenotypes of GDF1-p.Met364Thr homozygotes. Present phenotypes are denoted with ‘+’, those absent with ‘−’, and those unavailable for testing with ‘NA’ (c). Ribbon diagram of part of GDF1 homodimer containing p.Met364. The hydrophobic helix from one subunit (yellow) sits above p.Met364 on the other subunit (blue). (d). Space filling model of the segment of GDF1 containing the wild-type p.Met364 showing surface electrostatic charge (blue=positive, red=negative). (e). Surface electrostatic charge of the segment containing mutant p.Thr364. Compared to wild-type, the mutant peptide shows a more negatively charged cavity.

Consistent with this RG causing CHD and not merely being in linkage disequilibrium with the causal variant, the phenotype of p.Met364Thr homozygotes is shared by previously described cases with different recessive GDF1 mutations[19]. Like prior cases, all GDF1 p.Met364Thr homozygotes had D- or L-transposition of the great arteries, pulmonary stenosis/atresia or both (Figure 2b). GDF1 belongs to the transforming growth factor-beta (TGF-β) superfamily. Studies in mouse implicated Gdf1 in establishment of left-right asymmetry and neural development[20-22]. GDF1 functions as a homodimer with two-fold inverted symmetry (Figure 2c and Supplementary Figure 13). The interaction surface between monomers comprises a hydrophobic α-helix in one monomer and a hydrophobic cavity in the other; this interaction occurs reciprocally. Met364 lies in the hydrophobic cavity (Figure 2d–e). p.Met364Thr introduces a polar threonine into the hydrophobic cavity; we infer that this variant impairs dimer formation and downstream signaling (Figure 2c), consistent with recessive transmission. Homozygosity for GDF1 p.Met364Thr accounts for ~5% of severe CHD among Ashkenazim, including 18% of those with TGA (7 of 38), and 31% with TGA plus PS/PA (5 of 16). This finding has clinical implications for assessing risk of CHD among Ashkenazim.

Recessive MYH6 Genotypes in Shone Complex

MYH6 encodes the cardiac alpha myosin heavy chain, which is highly expressed in embryonic heart. Dominant MYH6 mutations are implicated in atrial septal defect[23] and cardiomyopathy[24, 25]. We identified seven rare damaging RGs in MYH6 versus 0.482 expected (enrichment = 14.5, P = 7.6×10⁻⁷; Supplementary Table 9). These included diverse LoF alleles and D-Mis variants, all validated by Sanger sequencing (Table 2, Supplementary Table 9, and Supplementary Figure 14). Five probands had left ventricular obstruction, including four with Shone complex[26], having mitral valve and aortic valve obstruction plus aortic arch obstruction (Table 2). Echocardiography revealed abnormal ventricular function in 4 of 7 probands, consistent with a previous report of two patients with RGs in MYH6 who had decreased ventricular function[27]. RGs in MYH6 accounted for 11% of the 37 sequenced patients with Shone complex (enrichment = 57.45, two-sided Fisher's exact P = 6.7×10⁻⁵).

Table 2

Recessive MYH6 genotypes associated with Shone complex and valvular disease.

ID	AA Change (coding DNA Change)	ExAC Ethnic-Specific Freq	Shone complex	Detailed Cardiac Phenotype	Cardiac Function	Extracardiac	NDD	Age at follow-up
1-00051	p.Lys1932*/p.Ala1891Thr (c.5794A>T/c.5671G>A)	3.0×10⁻⁵/0	+	LSVC, abn MV, sub AS, valve AS, CoA	LV diastolic dysfunction	−	+ (LD)	22
1-01407	p.Glu98Lys (c.292G>A)	3.0×10⁻⁴	−	mitral atresia, DORV, CoA	mild RV systolic dysfunction	Hypothyroid	+ (LD)	16
1-04847	p.Arg1899His/p.Asn598Lysfs*38 (c.5696G>A/c.1793dupA)	0/0	+	parachute MV, BAV, CoA	NL	−	−	16
1-05009	p.Ala1327Val/p.Leu388Phe (c.3980C>T/c.1162C>T)	2.7×10⁻³/0	−	TA, PA	dilated, hyper-trabeculated LV	−	NA	0
1-06399	p.Gly585Ser/p.Ile512Thr (c.1753G>A/c.1535T>C)	2.0×10⁻⁴/3.0×10⁻⁵	+	mitral stenosis, VSD, BAV, hypoplastic transv. Ao	NL	−	NA	0.08
1-06876	p.Ile1068Thr/Splice site (c.3203T>C/c.3979-2A>C)	1.5×10⁻⁵/2.0×10⁻⁵	+	LSVC, abn mitral valve, valve AS, CoA	dilated LV	−	−	22
1-07343	p.Arg1610Cys (c.4828C>T)	3.0×10⁻⁵	−	ASD/VSD	NA	−	NA	NA

Abbreviations: ASD- Atrial septal defect, AS- Aortic stenosis, BAV- Bicuspid aortic valve, CoA- Coarctation of the aorta, DORV- Double outlet right ventricle. MV-mitral valve, PA-Pulmonary atresia, TA-Tricuspid atresia, VSD-Ventricular septal defect. Extracardiac manifestations refer to CHD probands displaying additional abnormalities not pertaining to the heart. NDD-neurodevelopmental disabilities, LD-Learning Disability, NA-NDD status not attained as proband < age 1. “+”:Present; “ −”:Not present.

Recessive Genotypes Enriched in Patients with Laterality Defects

Among the major CHD subgroups (laterality defects, left ventricular obstruction, conotruncal defects and others; Supplementary Table 3a), only laterality defects (heterotaxy and D-TGA) were significantly enriched for damaging RGs in known CHD genes (21 damaging RGs in 13 genes vs. 4.8 expected; enrichment = 4.4, P = 8.5×10⁻⁹; Supplementary Table 12). Significant enrichment persisted after removing GDF1 RGs (enrichment = 3.2, P = 1.2×10⁻⁴). These RGs occurred in eight genes previously implicated in laterality defects (ARMC4, BBS10, DAW1, DNAAF1, DNAH5, DYNC2H1, GDF1, and PKD1L1) and five not previously implicated (ATIC, COL1A1, COL5A2, DGCR2, and MYH6). We also performed GO analysis of all 82 genes with LoF RGs. This identified significant terms related to cilia structure/regulation, a predominant mechanism in laterality determination (Supplementary Data Set 6). Genes in these GO terms included DNAI2, ARMC4, DNAH5, and DNAAF1 (proband phenotypes in Supplementary Data Set 3). Although all these genes have been associated with human primary ciliary dyskinesia and situs inversus totalis, only DNAH5 has been previously associated with human CHD[28].

Heterozygous LoF Mutations in FLT4 in Tetralogy of Fallot

We compared the observed and expected frequency of rare (MAF ≤ 10⁻⁵) heterozygous LoF variants in 115 known dominant CHD genes in cases and controls using the binomial test and found no significant enrichment in either group (Supplementary Data Sets 7–8; Supplementary Table 13a,b). Analysis of heterozygous LoF variants in all 212 known human CHD genes also showed no enrichment. To search for novel haploinsufficient CHD genes, we compared the observed and expected distribution of rare heterozygous LoFs in each gene (see Online Methods). Q-Q plots (Supplementary Figure 15) showed that FLT4, with eight different inherited LoFs, significantly departed from expectation (enrichment = 15.5, P = 7.6×10⁻⁸, Supplementary Table 14). Moreover, there were two de novo FLT4 LoF mutations, yielding a combined p-value of 9.8×10⁻¹⁰ (p-values combined by Fisher's method, Figure 3). LoF variants were distributed throughout the encoded protein; all were confirmed by Sanger sequencing (Supplementary Figure 16).

Figure 3

FLT4 loss-of-function mutations in Tetralogy of Fallot

(a). Pedigrees of 10 CHD kindreds with rare FLT4 loss-of-function (LoF) mutations are shown. Subjects with and without CHD are shown as filled and unfilled symbols, respectively. Each kindred ID number is shown along with the FLT4 genotype of each subject and CHD phenotype of affected subjects. (b). Diagram of FLT4 protein is shown with seven immunoglobulin domains (Ig) and a kinase domain. The top panel shows LoF mutations associated with Tetralogy-type CHD, whereas the bottom panel displays missense mutations associated with Milroy disease (hereditary lymphedema).

FLT4 was highly intolerant to LoF variation in ExAC (pLI = 1) and only one LoF allele was identified among 3,578 parental controls. Pedigrees of FLT4 probands revealed four family members with CHD; all shared the proband's FLT4 mutation (Figure 3a). However, only 4 of 10 FLT4 mutation carriers reported CHD, indicating incomplete penetrance. Strongly supporting a pathogenic role for the FLT4 LoFs, the phenotype of 9 of 10 probands and 3 of 4 affected relatives was tetralogy of Fallot (TOF) (Figure 3a); mutation carriers had no extracardiac malformations, growth abnormalities or NDD. Among 426 probands with TOF in our cohort, 2.3% had FLT4 LoF mutations (95.2-fold enrichment, P = 1.9×10⁻¹²; Supplementary Table 15). FLT4 encodes a VEGF receptor expressed in lymphatics and the vasculature. Interestingly, diverse missense mutations that cluster in the kinase domain and impair enzymatic activity cause hereditary lymphedema (Figure 3b)[29].

De Novo Damaging Mutations Enriched in Isolated CHD Cases

The number of observed DNMs in cases and controls closely fit the Poisson distribution (Supplementary Figure 17; Supplementary Data Sets 9–10). Damaging DNMs were enriched in cases (1.4-fold, P = 2.4×10⁻¹⁷, Supplementary Table 16) but not controls. We inferred that damaging DNMs contribute to ~8.3% of cases. Additionally, we found 89 damaging DNMs in 46 chromatin modifiers accounting for 2.3% of cases (enrichment = 3.1, P = 8.7×10⁻²⁰; Figure 4a; Supplementary Tables 17–18), including seventeen chromatin modifier genes not previously implicated in CHD.
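One common way to turn an enrichment into a "percentage of cases" is to divide the excess of observed over expected mutations by the number of probands in whom DNMs can be called. The sketch below applies this to the chromatin modifier result. It assumes that the expected count can be recovered as observed/enrichment and that the denominator is the 2,645 trios; both are working assumptions for illustration, not a restatement of the Online Methods.

```python
def attributable_fraction(observed, enrichment, n_trios):
    """Excess mutations (observed - expected) as a fraction of sequenced trios."""
    expected = observed / enrichment  # back-calculated from the reported enrichment
    excess = observed - expected
    return expected, excess, excess / n_trios

N_TRIOS = 2645  # parent-offspring trios in the cohort

# Chromatin modifiers: 89 damaging DNMs, 3.1-fold enrichment
expected, excess, fraction = attributable_fraction(89, 3.1, N_TRIOS)
print(f"expected = {expected:.1f}, excess = {excess:.1f}, "
      f"fraction of cases = {fraction:.1%}")
```

With the rounded inputs from the text this gives a fraction close to the reported 2.3%.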

Figure 4

Chromatin modification genes and genes with multiple damaging de novo mutations are enriched for high expression in developing heart and intolerance to loss-of-function mutation

(a) Enrichment of damaging mutations in chromatin modifiers in genes highly expressed in developing heart and intolerant to loss-of-function (LoF) mutation. X axis (0–100) denotes the percentile rank of heart expression in developing mouse heart at E14.5, and y axis (0–1.0) denotes intolerance to LoF mutation (pLI) in the ExAC database. (b) 66 genes with 2 or more damaging de novo mutations are plotted. Multihit genes are highly enriched (N=31) for genes that are highly expressed in developing heart and intolerant to LoF mutation (pLI ≥ 0.99).

There were 66 genes with two or more damaging DNMs compared to 21 previously[8, 9] (Figure 4b, Supplementary Tables 19–20). Interestingly, 108 damaging DNMs affecting 39 of 104 known dominant H-CHD genes accounted for 3.7% of cases (enrichment = 9.3, P = 5.5×10⁻⁶⁵; Supplementary Table 21). An orthogonal analytic approach yielded similar results (see Supplementary Note and Supplementary Figure 18). Unlike prior studies[8, 9, 13], we found that damaging DNMs were enriched in isolated CHD cases (CHD without extracardiac congenital anomaly, clinically diagnosed syndrome or neurodevelopmental abnormality, and limited to patients over age 1 at enrollment); these mutations contributed to ~3.1% of cases (1.5-fold enrichment, P = 8.5×10⁻⁴; Supplementary Table 22a). Damaging DNMs in known CHD genes accounted for ~50% (13/26) of the excess mutation burden in isolated CHD. DNMs contributed to 6%–8% of probands with any extracardiac features (EA alone or NDD alone), and to 28% of cases with both EA and NDD (Supplementary Tables 22a–d and 23).

De novo mutations are Enriched in Autism-Associated Genes

We previously showed unexpected overlap of genes harboring damaging DNMs in CHD and neurodevelopmental disorders[8, 9]. We compared the genes harboring damaging DNMs in our CHD cohort and in 4,778 probands with autism[30, 31], focusing on genes in the upper quartile of brain and heart expression. Nineteen such genes had de novo LoF mutations in both cohorts (enrichment 5.2, P < 10⁻⁶) and 48 had damaging mutations in both (enrichment 2.8, P < 10⁻⁶; Supplementary Table 24). Notably, among CHD patients with neurodevelopmental phenotyping, 67% (21/31) of those with LoF DNMs in the overlapping gene set had NDD, compared to 32.8% in the total cohort with neurodevelopmental phenotyping (OR = 4.3; two-sided Fisher's P = 1.4×10⁻⁴; Supplementary Table 25). In addition, 14/35 of all genes with LoF DNMs in both the CHD and autism cohorts are chromatin modifiers (enrichment = 14.7, P < 10⁻⁶ by permutation; Supplementary Table 25). Most strikingly, 87% of patients who had LoF DNMs in chromatin modifiers had NDD at enrollment.
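The reported odds ratio can be approximated directly from the two proportions in the text. The sketch below is a rough check, not the authors' calculation: it treats 21/31 and 32.8% as the probabilities of NDD in the two groups, whereas the published value presumably comes from the 2×2 counts in Supplementary Table 25, which are not given here.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)

def odds_ratio(p_group, p_reference):
    """Ratio of the odds in a group of interest to the odds in a reference group."""
    return odds(p_group) / odds(p_reference)

p_overlap_carriers = 21 / 31  # NDD among carriers of LoF DNMs in overlapping genes
p_total_cohort = 0.328        # NDD in the total phenotyped cohort

or_ndd = odds_ratio(p_overlap_carriers, p_total_cohort)
print(f"approximate OR = {or_ndd:.1f}")
```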

Meta-Analysis of Damaging De Novo and Loss-of-function Heterozygous Variants

We tested each gene for an excess of de novo and rare inherited heterozygous variants. Seven genes (CHD7, KMT2D, PTPN11, RBFOX2, FLT4, SMAD6, and NOTCH1) surpassed genome-wide significance (Table 3), compared to four previously[9, 13]. Among the remaining top 25 genes, KDM5B had strong prior statistical support; ELN, NSD1, NODAL, RPL5, and SOS1 have previously been associated with syndromic CHD; and GATA6, FRYL, and TBX18 were identified in case reports with a phenotype that included CHD. Our findings strengthen the evidence supporting a role for these genes.

Table 3

Top 25 genes in the meta-analysis of damaging de novo mutations and loss-of-function heterozygous mutations in probands

Gene	Damaging de novo		LoF heterozygotes		Meta P-value	pLI	HHE Rank	Gene Set

	# Damaging	P-value	# LoF	P-value
CHD7	14	1.6×10⁻²⁰	0	1	7.5×10⁻¹⁹	1	93.4	H-CHD/Chromatin
KMT2D	16	2.1×10⁻²⁰	1*	0.86	8.5×10⁻¹⁹	1	96.8	H-CHD/Chromatin
PTPN11	9	4.6×10⁻¹⁷	0	1	1.8×10⁻¹⁵	1	94.2	H-CHD
FLT4	2	5.2×10⁻⁴	8	7.6×10⁻⁸	9.8×10⁻¹⁰	1	74.4	NA
NOTCH1	5	2.7×10⁻⁵	6*	1.8×10⁻⁴	9.4×10⁻⁸	1	87.9	H-CHD
RBFOX2	3	3.4×10⁻⁷	1*	0.18	1.1×10⁻⁶	0.99	97.8	NA
SMAD6	1	0.012	8	6.0×10⁻⁶	1.3×10⁻⁶	0	78.3	M-CHD
GATA6	4	2.4×10⁻⁷	0	1	3.8×10⁻⁶	N/A	94.8	H-CHD
ELN	2	1.3×10⁻⁴	5*	8.7×10⁻³	1.7×10⁻⁵	0	79.8	H-CHD
CCDC154	0	1	7*	5.5×10⁻⁶	7.2×10⁻⁵	0.31	18.4	NA
SLCO1B3	0	1	9	6.6×10⁻⁶	8.5×10⁻⁵	0	11.7	NA
GPBAR1	2	2.6×10⁻⁵	1	0.27	9.1×10⁻⁵	0	19.9	NA
PTEN	2	6.0×10⁻⁵	1	0.16	1.2×10⁻⁴	0.98	77.9	H-CHD
RPL5	2	6.2×10⁻⁵	1	0.16	1.3×10⁻⁴	0.99	97.9	H-CHD
NSD1	5	1.0×10⁻⁵	0	1	1.3×10⁻⁴	1	94.8	H-CHD/Chromatin
SAMD11	2	1.8×10⁻⁴	4*	0.06	1.4×10⁻⁴	0	N/A	NA
C21ORF2	0	1	5	1.2×10⁻⁵	1.5×10⁻⁴	0.01	46.7	NA
NODAL	0	1	4	1.2×10⁻⁵	1.5×10⁻⁴	0.95	16.4	H-CHD
SMAD2	3	5.5×10⁻⁵	1	0.24	1.6×10⁻⁴	0.99	74.7	NA
H1FOO	0	1	4	1.6×10⁻⁵	1.9×10⁻⁴	0.4	10.3	NA
FRYL	2	2.8×10⁻³	5*	8.3×10⁻³	2.8×10⁻⁴	1	84.4	NA
KDM5B	3	2.9×10⁻⁵	2*	0.86	2.9×10⁻⁴	0	86	Chromatin
POGZ	3	2.5×10⁻⁵	0	1	2.9×10⁻⁴	1	83.8	Chromatin
SOS1	3	2.6×10⁻⁵	0	1	3.0×10⁻⁴	1	67.9	H-CHD
TBX18	1	0.02	3	1.8×10⁻³	3.0×10⁻⁴	1	72.6	NA

Meta-analysis was performed by combining the p-values from damaging de novo mutations and loss-of-function (LoF) heterozygous mutations using Fisher's method with 4 degrees of freedom. The top 25 genes are shown. Genes in bold surpass the Bonferroni multiple testing correction (2.6×10⁻⁶, 0.05/18,989) for p-values tabulated by either de novo, heterozygous, or meta-analysis. H-CHD: Known human CHD genes. M-CHD: Known mouse CHD genes. Chromatin: Chromatin modification genes, consisting of 546 genes in GO:0016569.

* denotes that at least one of the carriers has unknown transmission.
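For two p-values, Fisher's method has a closed form, because the upper tail of a chi-square distribution with 4 degrees of freedom is exp(−x/2)(1 + x/2). The sketch below applies it to three rows of Table 3; the function name is illustrative, and small differences from the tabulated meta P-values are expected because the inputs are rounded.

```python
import math

def fisher_combine_two(p1, p2):
    """Fisher's method for two independent p-values (chi-square, 4 df)."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    # Closed-form upper tail for 4 df: P(X > x) = exp(-x/2) * (1 + x/2)
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# (gene, damaging de novo P, LoF heterozygote P), copied from Table 3
ROWS = [
    ("CHD7", 1.6e-20, 1.0),
    ("FLT4", 5.2e-4, 7.6e-8),
    ("SMAD6", 0.012, 6.0e-6),
]

GENOME_WIDE_CUTOFF = 0.05 / 18989  # Bonferroni over 18,989 genes

meta = {gene: fisher_combine_two(p_dn, p_lof) for gene, p_dn, p_lof in ROWS}
for gene, p in meta.items():
    print(f"{gene}: meta P = {p:.1e}, genome-wide significant = {p < GENOME_WIDE_CUTOFF}")
```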

SMAD6, an inhibitor of BMP signaling, had eight inherited LoF mutations and one de novo LoF mutation (Meta P = 1.3×10⁻⁶; Table 3). Phenotypes included TOF, hypoplastic left heart syndrome, coarctation and D-TGA. Only two probands had extracardiac abnormalities. Zero LoFs were found among 7,156 parental control alleles, and LoFs were markedly enriched among European probands compared to non-Finnish European controls in ExAC (OR = 20.5, two-sided Fisher's P = 2.7×10⁻⁶). SMAD6 missense variants, but not LoFs, have been previously identified in three sporadic cases with bicuspid aortic valve and mitral valve disease[32]. Among parents transmitting SMAD6 LoFs, only one had a CHD diagnosis (BAV). Interestingly, SMAD6 LoFs showing incomplete penetrance have also been implicated in midline craniosynostosis, with a common variant near BMP2 modifying penetrance[33]. Our findings suggest that SMAD6 LoFs produce variable phenotypes, dependent on additional genetic or environmental factors.

DISCUSSION

This study represents the largest genetic investigation of a single CHD cohort, and the first comprehensive analysis of recessive and dominant inherited variants in CHD. Our search for disease-associated transmitted variants and pathways was enhanced by comparing observed and expected numbers of recessive or dominant genotypes independent of control subjects, accommodating for variation in inbreeding and ethnic background. While extension of the expected frequency of DNMs to standing variation is confounded by the impact of selection and drift on allele frequencies over subsequent generations, our analysis demonstrates that this approach is robust for estimating the expected frequency of rare inherited variants, which are likely to be recently introduced into the population. We anticipate this approach will be broadly relevant. Rare inherited genotypes in known CHD genes, and genome-wide significant new CHD candidate genes accounted for 1.8% of CHD in this cohort. The excess of genes with RGs suggests that more genes await discovery. A recessive founder mutation in GDF1 accounted a large fraction of severe CHD among Ashkenazim. Genotyping this specific variant, which has a minor allele frequency of ~0.5% in Ashkenazim, can immediately be used for diagnosis and population-based risk assessment. Enrichment of damaging RGs was particularly marked in probands with laterality defects. This is consistent with epidemiology showing that laterality defects have the highest recurrence risk of any CHD[10], are more prevalent in populations with high consanguinity[34], and conversely show no enrichment for damaging DNMs[8, 9]. We also found new phenotypes arising from recessive mutations in genes previously implicated in CHD caused by monoallelic mutations, including RGs in MYH6 in Shone complex, a disease of previously unknown cause. 
The finding of abnormal ventricular function in several of these patients, as well as in other patients with monoallelic MYH6 mutations, suggests that patients with Shone complex and biallelic MYH6 mutations may be at particular risk for ventricular dysfunction, potentially allowing early identification and intervention. Other genes without previously described recessive phenotypes included CHD7, COL1A1, COL5A2, FBN2, NOTCH1, NSD1, and TSC2, as well as genes previously implicated only in mouse CHD (DGCR2, DAW1, LRP1, and MYH10). Ten probands had rare LoFs in FLT4 and predominantly had TOF. None had NDD and only 1 had EA, unlike 25% of all TOF probands in this study. FLT4 LoFs resulted in phenotypes distinct from heterozygous missense mutations in the kinase domain that cause defective lymphatic development[35]. Further studies of the expression and role of FLT4 in the developing heart will be of interest. Doubling the size of our sequenced cohort more than doubled the identified CHD risk genes. The current data set includes 66 genes with two or more damaging DNMs compared to 21 previously, and 19 with two or more LoF DNMs compared to five previously[9]. Highly enriched gene sets, in which 72%–85% of genes are expected to confer risk, include 12 genes (AKAP12, ANK3, CLUH, CTNNB1, KDM5A, KMT2C, MINK1, MYRF, PRRC2B, RYR3, U2SURP, and WHSC1) not previously implicated in CHD[9], and strengthen the evidence for a role of 6 additional genes that do not yet reach thresholds for significance (CAD, FRYL, GANAB, KDM5B, NAA15, and POGZ). DNMs are highly enriched in cases with neurodevelopmental abnormalities or extra-cardiac structural manifestations, or both. Importantly, we report for the first time a significant contribution of DNMs to 3.1% of isolated CHD.
From the distribution of genes with multiple damaging DNMs, the estimated number of genes in which DNMs contribute to CHD in this cohort is 443 (95% CI = [154.1, 731.9]; Supplementary Figure 19; see Supplementary Note). Pathway analysis identifies DNMs, predominantly LoFs, in chromatin modifiers as a major contributor to CHD, accounting for 2.3% of probands (Figure 4). Eleven chromatin modifiers have two or more damaging DNMs, and we estimate that mutations in at least ~38 (95% CI = [7, 69]) chromatin modifier genes contribute to CHD using a maximum likelihood approach (Supplementary Figure 20). The implication of LoF DNMs in writers, erasers and readers of many different specific chromatin marks in CHD underscores the dosage sensitivity of these genes, which is supported by their general intolerance to LoF mutation. Together these findings suggest that heart development depends on precise control of transcription mediated by changes in chromatin state in response to developmental signals[36-38]. After removing chromatin modifiers from GO term enrichment analysis (for GO enrichment analysis with chromatin modifiers see Supplementary Data Set 11), several terms involved in developmental processes show enrichment (Supplementary Data Set 12). Extension of pathway analysis to genes with damaging RGs demonstrated enrichment of genes involved in cilia formation and function. These genes have long been known to play a critical role in establishment of the left-right body axis, and cilia gene mutations frequently contribute to heterotaxy. Understanding the mechanisms underlying the effects of these mutations will be of great interest in determining mechanisms of normal and abnormal human development. It is important to link the genetic causes of CHD to patient outcomes. There is striking overlap of genes mutated in CHD and autism. In particular, patients in our cohort with LoF mutations in chromatin modifiers are at very high risk of NDD (87%). 
Conversely, virtually all patients with LoF mutations in chromatin modifiers who have been ascertained for autism studies in the Simons Collection do not have CHD[31], indicating variable expressivity of CHD. We have noted previously that patients with DNMs in chromatin modifiers have high risk of NDD[9], suggesting that mutations in these genes may identify CHD patients at high risk of autism and intellectual disability who may benefit from early neurodevelopmental intervention[39]. By combining inherited and de novo variant analysis, we identified a genetic contribution to 10.1% of CHD. Despite these advances, the pathogenesis of a large fraction of CHD cases remains unknown. Potential explanations include contributions from more common variants, structural variants that have eluded detection by WES, variants in non-coding regions, polygenic inheritance, epistasis and gene-environment interactions[6, 33, 40, 41]. A recent study estimated that WES of 10,000 trios will yield 80% saturation for identifying genes contributing to syndromic CHD cases[13]. Our Monte Carlo simulations suggest that two or more damaging DNMs have now been identified in ~10.5% of risk loci, and that sequencing 10,000 trios will yield 170.1 risk genes, predicting 38% saturation of all CHD risk genes, comprising both syndromic and non-syndromic CHD acting via DNMs (Supplementary Figure 21). It is clear that loci suggested from human studies can be further substantiated at low cost by orthogonal approaches engineering mutations into model organisms and cells[42]. This study indicates that continued sequencing of large, well-phenotyped cohorts will provide an increasingly complete picture of the genetic underpinnings of CHD, allowing new insight into mechanisms governing human development, improved prediction of clinical outcome, and the opportunity to mitigate these risks.
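As a rough check on the saturation figure quoted above, 38% is what one obtains by dividing the projected 170.1 risk genes by the 443 genes estimated to contribute to CHD via DNMs. The sketch below assumes that this ratio is how the percentage was derived; the variable names are illustrative and are not taken from the study.

```python
# Figures quoted in the Discussion; treating saturation as their ratio is an assumption.
estimated_risk_genes = 443          # genes in which DNMs are inferred to contribute to CHD
genes_found_at_10k_trios = 170.1    # Monte Carlo projection for 10,000 sequenced trios

saturation = genes_found_at_10k_trios / estimated_risk_genes
print(f"projected saturation: {saturation:.1%}")
```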

ONLINE METHODS

Patient Subjects

Pediatric Cardiac Genomics Consortium (PCGC)

CHD subjects were recruited to the Congenital Heart Disease Network Study of the Pediatric Cardiac Genomics Consortium (CHD GENES: ClinicalTrials.gov identifier NCT01196182)[7]. The institutional Review Boards of Boston’s Children’s Hospital, Brigham and Women’s Hospital, Great Ormond Street Hospital, Children’s Hospital of Los Angeles, Children’s Hospital of Philadelphia, Columbia University Medical Center, Icahn School of Medicine at Mount Sinai, Rochester School of Medicine and Dentistry, Steven and Alexandra Cohen Children’s Medical Center of New York, and Yale School of Medicine approved the protocols. All subjects or their parents provided informed consent. Subjects were selected for structural CHD (excluding PDA associated with prematurity, and pulmonic stenosis associated with twin-twin transfusion). Individuals with either an identified chromosomal aneuploidy or a CNV that is known to be associated with CHD were not included. For all subjects, cardiac diagnoses were obtained from review of all imaging and operative reports and entered as Fyler codes based on the International Paediatric and Congenital Cardiac Codes (http://www.ipccc.net/). All patients were evaluated at study entry using a standardized protocol consisting of an interview that includes maternal, paternal and birth history and whether the patient has been examined by a geneticist. A comprehensive review of the proband’s medical record was performed that included height and weight data, along with presence or absence of a broad range of reported extracardiac malformations, the availability and results of genetic testing and the presence or absence of a clinical genetic diagnosis. For probands under age 1, specialty (other than cardiology) services obtained in the course of clinical care were documented. For probands over age 1, parents were asked if their child was diagnosed with developmental delay and whether educational supports were obtained. Each patient has a 3-generation pedigree. 
For the current study, assessment of neurodevelopmental outcome was based on parental report when the subject was at least 12 months old; a subject was classified as having NDD if the parents answered “Yes” to the presence of at least one of the following conditions: developmental delay, learning disability, mental retardation, or autism. A total of 1,027 cases could not be evaluated for neurodevelopmental outcome because the age at interview was < 1 year.

Pediatric Heart Network (PHN)

CHD subjects were chosen from the DNA biorepository of the Single Ventricle Reconstruction trial[43]. Subjects underwent in-person neurodevelopmental evaluation at 14 months old with the Psychomotor Developmental Index (PDI) and Mental Development Index (MDI) of the Bayley Scales of Infant Development-II[44]. Subjects were further assessed with the Ages and Stages Questionnaire (ASQ), from which the scores at 3 years of age were analyzed. Subjects were classified as having NDD if they had a PDI or MDI score < 70 or a risk score in at least one of the five domains of the ASQ at 3 years of age. DNA from blood or sputum was collected from trios at follow-up visits at or after 3 years.
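The two NDD classification rules (parental report for PCGC, Bayley and ASQ scores for PHN) can be written as simple predicates. This is a minimal sketch of the stated criteria only; the function names, argument layout, and condition keys are hypothetical and do not reflect the consortium's data dictionary.

```python
from typing import Mapping, Optional, Sequence

# Conditions listed for the PCGC parental report (keys are illustrative).
PCGC_CONDITIONS = ("developmental_delay", "learning_disability",
                   "mental_retardation", "autism")

def pcgc_has_ndd(age_months: float, answers: Mapping[str, bool]) -> Optional[bool]:
    """Return None when the subject is too young (< 12 months) to be evaluated."""
    if age_months < 12:
        return None
    return any(answers.get(condition, False) for condition in PCGC_CONDITIONS)

def phn_has_ndd(pdi: Optional[float], mdi: Optional[float],
                asq_domain_at_risk: Sequence[bool]) -> bool:
    """NDD if PDI or MDI < 70, or any of the five ASQ domains is flagged at risk."""
    low_bayley = any(score is not None and score < 70 for score in (pdi, mdi))
    return low_bayley or any(asq_domain_at_risk)
```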

Controls

Controls included 1,789 previously analyzed families, each of which includes one offspring with autism, one unaffected sibling, and unaffected parents[14]. Permission to access the genomic data in the Simons Simplex Collection (SSC) on the National Institute of Mental Health Data Repository was obtained. Written informed consent for all participants was provided by the Simons Foundation Autism Research Initiative[45]. Only the unaffected sibling and parents were analyzed in this study. Controls were designated as unaffected by the SSC[14].

Cardiac Phenotyping

Cardiac phenotypes were divided into 5 major categories (Supplementary Table 3a) on the basis of the major cardiac lesion: conotruncal defects (CTD, N=872), D-transposition of the great arteries (D-TGA, N=251), heterotaxy (HTX, N=272), left ventricular outflow tract obstruction (LVO, N=797), or Other (N=679). CTD phenotypes include Tetralogy of Fallot (TOF), double-outlet right ventricle (DORV), truncus arteriosus, membranous ventricular septal defects (VSD), and aortic arch abnormalities. LVO phenotypes include hypoplastic left heart syndrome (HLHS), coarctation of the aorta (CoA), and aortic stenosis/bicuspid aortic valve (AS/BAV). HTX syndromes include situs abnormalities such as dextrocardia, left or right isomerism (LAI, RAI) as the major malformation, and may include other defects such as L-transposition of the great arteries (L-TGA), atrioventricular canal defects (AVC), anomalous pulmonary venous drainage (TAPVR, PAPVR), and double outlet right ventricle. Isomerism of other organs was not considered a separate extra-cardiac malformation for this study. Lesions in the “Other” category include pulmonary valve abnormalities, anomalous pulmonary venous drainage, atrial septal defects (ASD), atrioventricular canal defects, double inlet left ventricle (DILV), and tricuspid valve atresia (TA). Any structural anomaly that was not acquired was called an extracardiac malformation.

Exome sequencing

Samples were sequenced at the Yale Center for Genome Analysis following the same protocol. Genomic DNA from venous blood or saliva was captured using the Nimblegen v.2 exome capture reagent (Roche) or Nimblegen SeqCap EZ MedExome Target Enrichment Kit (Roche) followed by Illumina DNA sequencing as previously described[8]. WES data were processed using two independent analysis pipelines at Yale University School of Medicine and Harvard Medical School (HMS). At each site sequence reads were independently mapped to the reference genome (hg19) with BWA-MEM (Yale) or Novoalign (HMS) and further processed using the GATK Best Practices workflows[46-48], which include duplicate marking, indel realignment, and base quality recalibration, as previously described[49]. Single nucleotide variants and small indels were called with GATK HaplotypeCaller and annotated using ANNOVAR[50], dbSNP (v138), 1000 Genomes (August 2015), NHLBI Exome Variant Server (EVS), and ExAC (v3)[51]. The MetaSVM algorithm, annotated using dbNSFP version 2.9[52], was used to predict deleteriousness of missense variants (annotated as “D-Mis”) using software defaults[53]. Variant calls were reconciled between Yale and HMS prior to downstream statistical analyses.

Kinship analysis

Relationship between proband and parents was estimated using the pairwise identity-by-descent (IBD) calculation in PLINK[54]. The IBD sharing between the proband and parents in all trios is between 45% and 55%.
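The kinship criterion can be expressed as a range check on each proband-parent pair. The sketch below assumes that the IBD sharing reported above corresponds to the PI_HAT proportion in PLINK's pairwise IBD output; the function names and the idea of passing one value per parent are illustrative.

```python
def is_parent_offspring(pi_hat: float, low: float = 0.45, high: float = 0.55) -> bool:
    """True when estimated IBD sharing falls in the window expected for parent and child."""
    return low <= pi_hat <= high

def trio_passes(pi_hat_mother: float, pi_hat_father: float) -> bool:
    """A trio is retained only if the proband matches both parents."""
    return is_parent_offspring(pi_hat_mother) and is_parent_offspring(pi_hat_father)
```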

Principal component analysis

To determine the ethnicity of each sample, we used the EIGENSTRAT[55] software to analyze tag SNPs in cases, controls, and HapMap subjects as described before[56]. Because all subjects who carried the p.Met364Thr RGs in GDF1 were self-reported Ashkenazi Jewish (AJ), we utilized an additional software package, LASER[57], which can accurately infer worldwide continental ancestry from sequencing data. To validate their reported AJ ancestry and to determine the number of AJ in cases and controls, we first downloaded genome-wide SNP array data for 471 AJ Individuals from the Gene Expression Omnibus database (accession no. GSE23636)[58] and then merged this data with 938 unrelated individuals from the Human Genome Diversity Project provided with LASER. We then clustered our cases and controls with these 1,409 samples whose ancestral information was known and determined which individuals in our cohort best cluster with known AJ using LASER.

Variant filtering

We filtered RGs for rare (MAF ≤ 10−3 across all samples in 1000 Genomes, EVS, and ExAC) homozygous and compound heterozygous variants that exhibited high quality sequence reads (pass GATK Variant Quality Score Recalibration [VQSR], have a minimum of 8 total reads for both proband and parents, and have a genotype quality [GQ] ≥ 20). Only LoF variants (nonsense, canonical splice-site, frameshift indels, and start loss), D-Mis, and non-frameshift indels were considered potentially damaging. For probands whose parents’ WES data were not available, only homozygous variants were analyzed. Synonymous variants were also filtered using the same criteria and analyzed separately to determine whether there is an inflation of background rate. DNMs were called by Yale using the TrioDenovo[59] program and by HMS as previously described[49], and filtered using the same criteria, which have been shown to yield a specificity of 96.3% as described previously[49]. These hard filters include: (1) an in-cohort MAF ≤ 4×10−4; (2) a minimum of 10 total reads, 5 alternate allele reads, and a minimum 20% alternate allele ratio in the proband if alternate allele reads ≥ 10 or, if alternate allele reads is < 10, a minimum 28% alternate ratio; (3) a minimum depth of 10 reference reads and alternate allele ratio < 3.5% in parents; and (4) exonic or canonical splice-site variants. For the LoF heterozygous variants, we filtered for rarity (MAF ≤ 10−5 across all samples in 1000 Genomes, EVS, and ExAC) and high-quality heterozygotes (pass GATK VQSR, minimum 8 total reads, GQ score ≥ 20, mapping quality [MQ] score ≥ 59, and minimum 20% alternate allele ratio in the proband if alternate allele reads ≥ 10 or, if alternate allele reads is < 10, a minimum 28% alternate ratio). Additionally, variants located in segmental duplication regions (as annotated by ANNOVAR[50]), RGs, and DNMs were excluded.
Of particular note, all LoF heterozygous variants that met the aforementioned criteria in 226 singletons were also included in the LoF heterozygous burden analysis even though an unknown proportion of these filtered variants could be de novo or compound heterozygous events. Finally, in silico visualization was performed on: (1) calls in the H-CHD set, (2) calls in the LoF-intolerant gene set (pLI ≥ 0.9), (3) variants that appear at least twice, and (4) variants in the top 50 significant genes from our burden analysis.
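The four DNM hard filters listed above translate directly into a predicate over read counts. The following is a minimal sketch of those thresholds only, not the Yale or HMS pipeline; the `Sample` record, the function names, and the region labels `"exonic"` and `"splicing"` are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_ratio(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0

def proband_ok(p: Sample) -> bool:
    # Filter (2): >= 10 total reads, >= 5 alternate reads, and an alternate
    # allele ratio of >= 20% (alt reads >= 10) or >= 28% (alt reads < 10).
    if p.depth < 10 or p.alt_reads < 5:
        return False
    min_ratio = 0.20 if p.alt_reads >= 10 else 0.28
    return p.alt_ratio >= min_ratio

def parent_ok(s: Sample) -> bool:
    # Filter (3): >= 10 reference reads and alternate allele ratio < 3.5%.
    return s.ref_reads >= 10 and s.alt_ratio < 0.035

def passes_dnm_filters(cohort_maf: float, region: str,
                       proband: Sample, mother: Sample, father: Sample) -> bool:
    # Filters (1) and (4), combined with the per-sample read checks.
    return (cohort_maf <= 4e-4
            and region in {"exonic", "splicing"}
            and proband_ok(proband)
            and parent_ok(mother)
            and parent_ok(father))
```

For instance, a proband call with 12 reference and 8 alternate reads passes the proband check (ratio 0.40 against a 0.28 minimum), whereas 40 reference and 9 alternate reads does not (ratio about 0.18).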

Estimation of the expected number of recessive and dominant variants

We implemented a polynomial regression model coupled with a one-tailed binomial test to quantify the enrichment of damaging RGs in a specific gene or gene set in cases, independent of controls. Details about the modeling of the distribution of recessive and dominant variant counts are in the Supplementary Note. The expectation of the RG count for each gene was calculated using the fitted values from the polynomial model by the formula below:

$$\text{Expected RGs}_i = N \times \frac{\text{Polynomial fitted value}_i}{\sum_{\text{genes}} \text{Polynomial fitted value}}$$

where ‘i’ denotes the ‘ith’ gene and ‘N’ denotes the total number of RGs. For a given gene set, the expected RG count was based on the sum of fitted values for the gene set. Alternatively, RGs can also be modeled separately as compound heterozygotes or homozygotes without the need for regression fits. In this method, the expected number of compound heterozygotes for each gene is derived from distributing the observed number of RGs, N, across all genes according to the ratio of the squared de novo probabilities:

$$\text{Expected compound heterozygotes}_i = N \times \frac{(\text{De novo probability}_i)^2}{\sum_{\text{genes}} (\text{De novo probability})^2}$$

The expected number of homozygotes is derived similarly, but using the linear ratio of de novo probabilities:

$$\text{Expected homozygotes}_i = N \times \frac{\text{De novo probability}_i}{\sum_{\text{genes}} \text{De novo probability}}$$

The total number of expected RGs for each gene is the sum of the derived expected compound heterozygous and homozygous values. For rare LoF heterozygous variants, we found that the number of LoF heterozygous variants in a gene was inversely correlated with the pLI score obtained from the ExAC database. To control for the potential confounding effect due to the pLI score, we stratified genes into 5 subsets by pLI quartiles: (1) those with a pLI score between 0 and the first quartile (pLI = 3.1×10−5); (2) those with a pLI score between the first quartile and the second quartile (pLI = 2.9×10−2); (3) those with a pLI score between the second quartile and the third quartile (pLI = 0.71); (4) those with a pLI score between the third quartile and 1; (5) those without a pLI score.
For each set, the expected number of LoF heterozygous variants for a gene was estimated by the following formula:

$$\text{Expected LoF heterozygous variants}_j = L \times \frac{\text{De novo probability}_j}{\sum_{\text{genes in set } k} \text{De novo probability}}$$

where ‘j’ denotes the ‘jth’ gene, ‘k’ denotes the ‘kth’ set, and ‘L’ denotes the total number of LoF heterozygous variants. The expected number of heterozygous variants closely matches the observed number of heterozygous variants in each gene in cases and controls (Supplementary Figure 2).
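The proportional allocation described in this section, in which an observed total is spread across genes according to per-gene weights, can be sketched as follows. The helper name `allocate`, the gene labels, and the toy probabilities are illustrative assumptions; in the regression-based variant the weights would be the polynomial fitted values, and for LoF heterozygous variants the same allocation would be applied within each pLI stratum.

```python
from typing import Dict

def allocate(weights: Dict[str, float], total: float) -> Dict[str, float]:
    """Distribute `total` across genes in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {gene: total * w / weight_sum for gene, w in weights.items()}

# Toy per-gene de novo probabilities (hypothetical values).
de_novo_prob = {"GENE_A": 1e-5, "GENE_B": 3e-5}
n_observed = 10

# Compound heterozygotes: weights are the squared de novo probabilities.
expected_ch = allocate({g: p * p for g, p in de_novo_prob.items()}, n_observed)
# Homozygotes: weights are the de novo probabilities themselves.
expected_hom = allocate(de_novo_prob, n_observed)
```

With squared weights the more mutable gene receives 9 of the 10 expected genotypes, against 7.5 under linear weights, which illustrates how strongly the compound heterozygous expectation depends on gene mutability.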

Statistical analysis

Gene-set enrichment analysis

To test for over-representation of a gene set without controls and correction for consanguinity, a one-tailed binomial test was conducted by comparing the observed number of variants to the expected count estimated using the method detailed above. Assuming that our exome capture reagent captures N genes and the testing gene set contains M genes, then the p-value of finding k variants in this gene set out of a total of x variants in the entire exome is given by

$$P = \sum_{i=k}^{x} \binom{x}{i}\, p^{i} (1-p)^{x-i}$$

where $p = \left(\sum_{M\ \text{genes in the set}} \text{Expected Value}\right) / \left(\sum_{N\ \text{captured genes}} \text{Expected Value}\right)$. Enrichment was calculated as the observed number of genotypes/variants divided by the expected number of genotypes/variants.
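A one-tailed binomial test of this kind sums the upper tail of a binomial distribution whose success probability is the gene set's share of the total expected count. The sketch below uses only the standard library; the function names and the toy numbers in the usage note are illustrative, and treating the tail as "k or more" is the usual reading of a one-tailed over-representation test rather than something the text spells out.

```python
from math import comb

def binomial_upper_tail(k: int, x: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(x, p)."""
    return sum(comb(x, i) * p**i * (1 - p)**(x - i) for i in range(k, x + 1))

def enrichment(observed: float, expected: float) -> float:
    """Observed count divided by expected count."""
    return observed / expected
```

As a usage example, a gene set whose expected share is p = 0.01 and which carries 5 of 100 observed variants would be tested with `binomial_upper_tail(5, 100, 0.01)`, and its enrichment would be `enrichment(5, 1.0)`.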

Gene-based binomial test

A one-tailed binomial test was used to compare the observed number of damaging variants within each gene to the expected number estimated using the approach detailed above. Enrichment was calculated as the number of observed damaging genotypes/variants divided by the expected number of damaging genotypes/variants.

De novo enrichment analysis

The R package ‘denovolyzeR’ was used for the analysis of DNMs based on a mutation model developed previously[60, 61]. The probability of observing a DNM in each gene was derived as described previously[49], except that the coverage adjustment factor was based on the full set of 2,645 case trios or 1,789 control trios (separate probability tables for each cohort). The overall enrichment was calculated by comparing the observed number of DNMs across each functional class to the number expected under the null mutation model. The expected number of DNMs was calculated by taking the sum of each functional class specific probability multiplied by the number of probands in the study, multiplied by two (diploid genomes). The Poisson test was then used to test for enrichment of observed DNMs versus expected as implemented in denovolyzeR[60]. For gene set enrichment, the expected probability was calculated from the probabilities corresponding to the gene set only. To estimate the number of genes with > 1 DNM, 1 million permutations were performed to derive the empirical distribution of the number of genes with multiple DNMs. For each permutation, the number of DNMs observed in each functional class was randomly distributed across the genome adjusting for gene mutability. The empirical p value is calculated as the proportion of times that the number of recurrent genes from the permutation is greater than or equal to the observed number of recurrent genes. To examine whether any individual gene contains more DNMs than expected, the expected number of DNMs for each functional class (LoF, D-Mis, and LoF+D-Mis) was calculated from the corresponding probability adjusting for cohort size. The Poisson test was then used to compare the observed DNMs for each gene versus expected. For each gene, we compared the statistical significance across LoF, D-Mis, and LoF+D-Mis and reported the most significant value.
The Bonferroni multiple-testing threshold is, therefore, equal to 8.8×10−7 (0.05/(3×18,989)).
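The expected DNM count, the Poisson enrichment test, and the Bonferroni threshold described above can be reproduced in a few lines. This is a standard-library sketch and not the denovolyzeR implementation; the function names and the example probabilities in the tests are hypothetical.

```python
from math import exp, lgamma, log
from typing import Iterable

def expected_dnms(per_gene_prob: Iterable[float], n_probands: int) -> float:
    """Sum of per-gene probabilities x number of probands x 2 (diploid genomes)."""
    return 2 * n_probands * sum(per_gene_prob)

def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), with expected > 0.

    The tail is summed directly so that very small p-values are not lost
    to cancellation, as would happen with 1 - CDF.
    """
    if observed <= 0:
        return 1.0
    term = exp(-expected + observed * log(expected) - lgamma(observed + 1))
    total, k = 0.0, observed
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)

# Three functional classes tested per gene across 18,989 genes.
bonferroni_threshold = 0.05 / (3 * 18989)
print(f"{bonferroni_threshold:.1e}")
```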

Meta-analysis of damaging de novo and LoF heterozygous variants

Fisher’s method[62] with 4 degrees of freedom was used to combine p-values from damaging DNMs and LoF heterozygous variants. We calculated p-values for damaging DNMs in each gene by comparing the observed number of damaging DNMs to the expected number in a respective gene under the null mutation model. We calculated p-values for LoF heterozygous variants using the one-tailed binomial test to compare the observed number of LoF heterozygous variants to the expected number adjusted for LoF de novo probabilities.
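For two p-values, Fisher's method has a closed form. The statistic X = −2 ln(p1·p2) follows a chi-square distribution with 4 degrees of freedom under the null, and the survival function for even degrees of freedom reduces to exp(−X/2)(1 + X/2), which gives p1·p2·(1 − ln(p1·p2)). The sketch below implements that identity; the function name is illustrative.

```python
from math import log

def fisher_combine_two(p1: float, p2: float) -> float:
    """Combined p-value for two independent tests (chi-square, 4 df), for p1, p2 in (0, 1]."""
    product = p1 * p2
    return product * (1.0 - log(product))
```

As a usage example, `fisher_combine_two(0.05, 0.05)` is about 0.0175, smaller than either input because both tests point in the same direction.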

Estimating the number of genes with more than one recessive genotype

One million permutations were performed to derive the empirical distribution of the number of genes with multiple damaging RGs. For each permutation, the number of observed damaging RGs (N = 467) was randomly distributed across the genome using the fitted values from the polynomial model for each gene. The empirical p value is calculated as the proportion of times that the number of recurrent genes from the permutation is greater than or equal to the observed number of recurrent genes (N = 44). Similarly, 1 million permutations were conducted on synonymous RGs as an ancillary analysis.
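The recurrence permutation can be sketched as repeated weighted draws of genes followed by a count of genes hit at least twice. The sketch assumes that each permuted genotype is assigned to a gene independently with probability proportional to its weight (for RGs, the polynomial fitted value); the function name, the default of 1,000 permutations, and the fixed seed are illustrative choices for a quick run, not the study's settings.

```python
import random
from collections import Counter
from typing import Dict

def recurrent_gene_pvalue(weights: Dict[str, float], n_events: int,
                          observed_recurrent: int,
                          n_perm: int = 1000, seed: int = 0) -> float:
    """Proportion of permutations with >= observed_recurrent genes hit two or more times."""
    rng = random.Random(seed)
    genes = list(weights)
    w = [weights[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        # Drop n_events genotypes onto genes, weighted by the per-gene values.
        counts = Counter(rng.choices(genes, weights=w, k=n_events))
        recurrent = sum(1 for c in counts.values() if c >= 2)
        if recurrent >= observed_recurrent:
            hits += 1
    return hits / n_perm
```

In the setting described above, `n_events` would be 467 and `observed_recurrent` would be 44, with `n_perm` raised to 1,000,000.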

Estimating the number of overlapping genes with damaging/LoF de novo mutations between CHD and autism cohorts

A permutation test was performed to assess the enrichment of overlapping genes with damaging/LoF DNMs shared between the CHD and autism cohorts. Given that the observed numbers of genes with DNMs in the CHD and autism cohorts are N1 and N2, respectively, and the observed number of overlapping genes is M, we sampled N1 genes from all genes in the CHD cohort and N2 genes from all genes in the autism cohort without replacement using the probability of observing at least one DNM as weight. The number of overlapping genes, P, was determined in each iteration of the simulation. A total of 1,000,000 iterations were conducted to construct the empirical distribution. The empirical number of overlapping genes was calculated by taking the average of the number of overlapping genes across all iterations. The empirical p-value was calculated as follows:
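One way to express this overlap test is sketched below. It assumes that the empirical p-value follows the same convention as the recurrence tests, namely the proportion of iterations in which the simulated overlap P is at least the observed overlap M. The weighted sampling without replacement uses random exponent keys (the Efraimidis-Spirakis scheme), which is a substitute chosen here for brevity and not necessarily the sampler used in the study; all names are hypothetical and all weights must be positive.

```python
import random
from typing import Dict, Set, Tuple

def weighted_sample(weights: Dict[str, float], n: int, rng: random.Random) -> Set[str]:
    """Draw n distinct genes; genes with larger weights are more likely to be chosen."""
    ranked = sorted(weights, key=lambda g: rng.random() ** (1.0 / weights[g]), reverse=True)
    return set(ranked[:n])

def overlap_test(w_chd: Dict[str, float], w_autism: Dict[str, float],
                 n1: int, n2: int, observed_overlap: int,
                 n_iter: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (mean simulated overlap, proportion of iterations with overlap >= observed)."""
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_iter):
        chd_genes = weighted_sample(w_chd, n1, rng)
        autism_genes = weighted_sample(w_autism, n2, rng)
        overlaps.append(len(chd_genes & autism_genes))
    expected_overlap = sum(overlaps) / n_iter
    p_value = sum(o >= observed_overlap for o in overlaps) / n_iter
    return expected_overlap, p_value
```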

Gene ontology enrichment analysis

The complete list of genes that harbored LoF/damaging variants was input into GOrilla[63] (http://cbl-gorilla.cs.technion.ac.il/) to identify enriched GO terms compared to the background set of genes (M=18,715). A false-discovery rate (FDR; represented as q value) of 0.1 was used as cutoff.

Case vs. control comparison

For FLT4 and SMAD6, we compared the burden of LoF alleles in all European cases to all non-Finnish European subjects in the ExAC database. Only LoF variants with a global (i.e. across all individuals) MAF < 10−5 were extracted from ExAC for comparison. The total number of alleles evaluated per gene was taken as the median of the allele numbers reported for all positions in a gene. A two-sided Fisher’s exact test was used to compare the frequency of LoF variants in FLT4 and SMAD6.
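The case versus ExAC comparison amounts to a 2×2 table of LoF and non-LoF alleles, with the ExAC allele total taken as the median allele number across positions in the gene. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library; the two-sided rule (summing all tables no more probable than the observed one) is the common convention and is assumed here, and the function names are illustrative.

```python
from math import comb
from statistics import median
from typing import Sequence

def gene_allele_number(allele_numbers: Sequence[int]) -> float:
    """Per-gene allele total: median of the allele numbers reported across positions."""
    return median(allele_numbers)

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the table [[a, b], [c, d]]; requires b and c to be non-zero."""
    return (a * d) / (b * c)

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

Here `a` and `b` would be the LoF and non-LoF allele counts in European cases, and `c` and `d` the corresponding counts in ExAC non-Finnish Europeans.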

URLs

GATK: (https://www.broadinstitute.org/gatk/); TrioDeNovo: (http://genome.sph.umich.edu/wiki/Triodenovo); DenovolyzeR: (http://denovolyzer.org); Plink: (http://pngu.mgh.harvard.edu/~purcell/plink); MetaSVM/ANNOVAR: (http://annovar.openbioinformatics.org); NHLBI ESP: (http://evs.gs.washington.edu/EVS/); ExAC03: (http://exac.broadinstitute.org) Contact the authors for the in-house pipelines

Data availability

Whole-exome sequencing data have been deposited in the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs000571.v1.p1, phs000571.v2.p1, and phs000571.v3.p2.


Journal: J Neurosci Date: 2018-11-20 Impact factor: 6.167