Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease.

Enrique Audain1,2, Anna Wilsdon3, Jeroen Breckpot4, Jose M G Izarzugaza5, Tomas W Fitzgerald6, Anne-Karin Kahlert1,2,7, Alejandro Sifrim8,9, Florian Wünnemann10, Yasset Perez-Riverol11, Hashim Abdul-Khaliq12, Mads Bak13,14, Anne S Bassett15,16, D Woodrow Benson17, Felix Berger18, Ingo Daehnert19, Koenraad Devriendt4, Sven Dittrich20, Piers Ef Daubeney21, Vidu Garg22,23,24,25, Karl Hackmann7, Kirstin Hoff1,2, Philipp Hofmann1,2, Gregor Dombrowsky1,2, Thomas Pickardt26, Ulrike Bauer26, Bernard D Keavney27,28, Sabine Klaassen29,30,31, Hans-Heiner Kramer1,2, Christian R Marshall32,33, Dianna M Milewicz34, Scott Lemaire35, Joseph S Coselli36, Michael E Mitchell36, Aoy Tomita-Mitchell36, Siddharth K Prakash34, Karl Stamm36, Alexandre F R Stewart37, Candice K Silversides15, Reiner Siebert38,39, Brigitte Stiller40, Jill A Rosenfeld17, Inga Vater39, Alex V Postma41,42, Almuth Caliebe39, J David Brook3, Gregor Andelfinger43, Matthew E Hurles44, Bernard Thienpont4,45, Lars Allan Larsen13, Marc-Phillip Hitz1,2,39,44.   

Abstract

Numerous genetic studies have established a role for rare genomic variants in Congenital Heart Disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) level. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD, including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes that were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD. The remaining seven genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have either been associated with CHD only in small case series or show new associations with CHD. In addition, a systems-level analysis revealed affected protein-protein interaction networks involved in the Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease associations and to identify novel disease genes and pathways.

Year:  2021        PMID: 34324492      PMCID: PMC8354477          DOI: 10.1371/journal.pgen.1009679

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   6.020


Introduction

Congenital Heart Disease (CHD) accounts for a large fraction of foetal and infant deaths, with incidence rates ranging from 7–9 per 1000 live births [1]. Within the last 30 years, survival rates have increased substantially because of improvements in surgical, interventional and clinical intensive care. As a result, a rapidly growing number of CHD survivors reach adulthood [2]. Nevertheless, individuals with CHD still have increased morbidity and mortality, and resource utilization is high, especially among severely affected patients. Importantly, the underlying etiology remains unclear for the majority of cases. CHD is multifactorial, with both environmental and genetic risk factors [3,4]. Familial aggregation of CHD, including thoracic aortic aneurysm (TAA), suggests a strong genetic component. So does the high burden of genomic copy number variants (CNVs) and de novo intragenic variations (DNVs) in probands with CHD. An estimated 4–20% of CHD cases are due to rare CNVs, suggesting that a significant part of CHD is caused by gene-dosage defects [5]. Recently, exome sequencing in large cohorts has been used to identify novel disease genes and to strengthen known disease associations. These studies demonstrated an excess of de novo protein-truncating variants (PTVs) and rare inherited loss-of-function (LOF) variants in probands with CHD [6,7]. Overlaying CNVs and PTVs has been used to define novel CHD-relevant disease genes in contiguous gene disorders [8,9]. Following this principle, we performed a genome-wide integrative meta-analysis of published and publicly available datasets of CNVs and DNVs identified in probands with CHD. This analysis, one of the larger meta-analyses of genomic variants in CHD so far, strengthens the disease association of known CHD genes and identifies novel haploinsufficient CHD candidate genes.

Results

Cohort description and workflow

We assembled a cohort of 7,958 cases (comprising non-syndromic CHD, syndromic CHD and TAA cases) and 14,082 controls (). Of all cases, 777 (~10%) were diagnosed with thoracic aortic aneurysm (TAA). An overview of the sources used to assemble the present cohort is listed in (for CHD cases) and (for controls). We applied a set of quality control filters to the assembled CNV data before performing case-control association tests (Materials and Methods). In addition, common CNVs (minor allele frequency (MAF) in controls > 0.01) were excluded from the analysis. After filtering, 6,746 cases and 14,024 controls remained for downstream analysis. Furthermore, we built a dataset of de novo variations (DNVs) identified in 2,489 probands with CHD from parent-offspring trios [6,7].

CNV burden test of known CHD genes

Haploinsufficiency has been shown to cause a substantial proportion of CHD [5]. Thus, genes known to be associated with CHD, and genes that are intolerant to LOF variation, should be deleted more often in probands with CHD than in controls. To test this hypothesis, we performed a CNV burden test using sets of genes known to be involved in CHD. We also included the following gene sets:
- genes known to be associated with developmental disorders;
- a curated list of known haploinsufficient disease genes;
- autosomal recessive disease genes;
- genes predicted to be intolerant to LOF variation, based on the observed/expected LOF ratio from gnomAD [10].

The burden test was performed using a logistic regression framework [11] (implemented in PLINK v1.7). The results of the burden test on the different gene sets are summarized in () and (): known CHD genes (grouped into syndromic, non-syndromic, monoallelic and biallelic), developmental disorder genes, haploinsufficient disease genes, autosomal recessive genes and all protein-coding genes. Because we assembled our cohort from different datasets, the analyses could be biased by differences in the overall CNV rate between the case and control groups. We tested all protein-coding genes to address this possibility. When comparing rare CNV deletions in cases vs controls, we observed no genome-wide enrichment (all tested protein-coding genes; P = 0.39, OR = 0.99). We also observed no enrichment in the autosomal recessive gene set (P = 0.52, OR = 1.03). In contrast, the burden of CNV deletions differed significantly between cases and controls for the set of haploinsufficient genes (P = 8.29 x 10^-13, OR = 2.27). As expected, our analysis revealed significant enrichment for the set of known CHD genes. This enrichment is mainly explained by the monoallelic CHD gene set (P = 2.04 x 10^-31, OR = 4.13) and the syndromic CHD gene set (P = 1.66 x 10^-33, OR = 4.06).
In contrast to the monoallelic and syndromic CHD gene sets, no significant enrichment was found for the non-syndromic (P = 0.75, OR = 1.16) and biallelic (P = 0.08, OR = 1.87) CHD gene sets. Our analysis also revealed a moderate enrichment of rare CNVs in the developmental disorder gene set (P = 6.90 x 10^-11, OR = 1.75).
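
The burden test above was run as a logistic regression in PLINK, which is not reproduced here. The following Python sketch is a rough illustration of what the reported odds ratios and 95% confidence intervals measure. It computes a crude (unadjusted) odds ratio with a Woolf log-scale interval from a 2x2 table of deletion carriers and non-carriers. Three elements are illustrative assumptions and not part of the original analysis: the function name `odds_ratio_woolf`, the Haldane-Anscombe correction for empty cells, and the example counts.

```python
import math


def odds_ratio_woolf(case_carriers, n_cases, control_carriers, n_controls, z=1.959964):
    """Crude odds ratio with a Woolf (log-scale) 95% confidence interval.

    Simplified stand-in for a logistic-regression burden test: one binary
    carrier status per individual and no covariates (illustrative assumption).
    """
    a = case_carriers                    # cases carrying a rare deletion in the gene set
    b = n_cases - case_carriers          # cases without such a deletion
    c = control_carriers                 # controls carrying a rare deletion
    d = n_controls - control_carriers    # controls without such a deletion
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction so that empty cells do not divide by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Woolf standard error of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))
```

For example, `odds_ratio_woolf(40, 100, 20, 100)` gives an OR of about 2.67 with a 95% CI that excludes 1. A logistic regression with a single binary predictor and no covariates yields the same point estimate and Wald interval. The crude calculation is therefore a reasonable mental model, but it does not reflect any additional terms the PLINK model may have included.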

CNV burden test on known gene sets.

The forest plot shows the odds ratio (dots), the 95% confidence intervals indicating the certainty about the OR (interrupted line) and the P-value in the indicated gene sets.

We repeated the regression-based analysis at different levels of the observed/expected LOF ratio (oeLOF) constraint metric ( and ). The highest enrichment was observed for the most LOF-constrained genes (oeLOF < 0.01, P = 9.55 x 10^-18, OR = 1.40). A moderate enrichment remained for genes with oeLOF < 0.1 (P = 0.002, OR = 1.09). No enrichment was observed in the set with oeLOF ≥ 0.1 (P = 0.03, OR = 0.99). Based on these results, we conclude that haploinsufficiency accounts for a significant component of CHD.

CNV burden test on constraint LOF genes at different observed/expected LOF ratio thresholds.

The forest plot shows the odds ratio (dots), the 95% confidence intervals indicating the certainty about the OR (interrupted line) and the P-value in the indicated gene sets.

Genome-wide identification of haploinsufficient candidate disease genes for CHD

To perform a systematic, genome-wide identification of potential haploinsufficient CHD disease genes and loci, we analysed the CNV burden of 19,969 protein-coding genes (GENCODE v19). For each gene, we compared the number of rare CNV deletions (MAF < 0.01) between cases and controls. We then identified genes with a significant CNV burden using a permutation test (significance level: adjusted P < 0.05; see Materials and Methods). If a CNV spanned two or more genes, all affected protein-coding genes were considered in the analysis. The distribution of rare CNV deletions in CHD cases across all 22 human autosomes is shown in the following figure. Significant candidate genes had a median of 12 overlapping CNVs in cases, compared to a median of 0 overlapping CNVs in controls (). Because CNVs can be large chromosomal aberrations, some of them affected multiple genes. In total, 528 genes (Sheet A in ) reached significance (permutation test, adjusted P < 0.05). These 528 genes encompass a total of 63 loci (Sheet B in , highlighted in magenta in ). The sizes of these loci range from 558 bp to 10.5 Mbp, with a median of 243 kbp (). The number of genes per locus ranged from 1 to 48, with a median of 3 (). Only 16 loci contained a single gene (Sheet B in ).
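
A gene-wise permutation test of this kind can be sketched as follows. This is a minimal illustration under stated assumptions. Four elements are choices made for the example:
- the test statistic, which is the number of case carriers of a rare deletion overlapping the gene;
- the one-sided alternative;
- the add-one correction;
- the function name `permutation_burden_p`.

The study's exact procedure is described in Materials and Methods, including how the permutation p-values were adjusted across 19,969 genes.

```python
import random


def permutation_burden_p(case_carriers, n_cases, control_carriers, n_controls,
                         n_perm=10_000, seed=0):
    """One-sided permutation p-value for an excess of rare deletions in cases.

    Statistic: number of cases carrying a rare deletion overlapping the gene.
    Under the null hypothesis, case/control labels are exchangeable, so we
    repeatedly draw a random set of n_cases individuals and count how many
    deletion carriers it contains.
    """
    rng = random.Random(seed)
    n_total = n_cases + n_controls
    n_carriers = case_carriers + control_carriers
    observed = case_carriers
    # Individuals 0..n_carriers-1 are treated as carriers; the ordering is arbitrary.
    individuals = range(n_total)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        permuted_cases = rng.sample(individuals, n_cases)
        stat = sum(1 for i in permuted_cases if i < n_carriers)
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

For a single gene with fixed case and carrier totals, this null distribution is hypergeometric, so the result should closely match a one-sided Fisher's exact test.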

CNV deletion distribution across the 22 autosomes.

The plot shows the distribution of rare CNV deletions (green track) in CHD cases and the differences between the overlapping CNV deletions in cases and controls (black track). It also highlights the locations of the 63 significant loci discovered (in magenta).

In addition, we tested previously described CNV deletion syndrome regions (https://decipher.sanger.ac.uk/disorders#syndromes/overview) associated with developmental disorders and/or CHD for enrichment in our analysis (Materials and Methods). We found eight of these regions enriched in the dataset (). The 16p11.2-p12.2 locus was the region with the largest number of deletions in cases (n = 230).

Shared genetic architecture of CHD and TAA

We independently performed a genome-wide test without the TAA cases to evaluate their impact on the results. As expected, most of the genes (447 out of 528) remained significant after removing the contribution of TAA cases, since ~90% of the cases in the analyzed CNV cohort had CHD. Ten genes were significantly enriched in separate analyses of both the CHD and the TAA cases, while 61 were significantly enriched only in TAA cases ().

De novo variation analysis

To identify an independent set of haploinsufficient CHD candidate genes, we combined de novo variations identified in two large-scale CHD case-control studies [6,7] and performed a gene-based de novo variation (DNV) burden test [12]. We analysed a total of 4,195 rare DNVs within 2,534 genes in the patient cohort. After every variant was classified into functional groups (Materials and Methods), 526 of these variants were predicted to be protein-truncating and 2,647 to be missense. We evaluated potential differences in DNV rates between cohorts (see Materials and Methods). Comparisons of the rate of each variant type across the groups were non-significant (P > 0.05, Poisson test, ). We used two available statistical methods, Mupit [12] and DeNovoWEST [13]. Both test the significance of observed DNVs at gene level by comparing the number of observed variations with the number expected from a sequence-dependent variation recurrence rate (see Materials and Methods). Mupit focuses specifically on the enrichment of protein-truncating DNVs. DeNovoWEST instead incorporates missense constraint information and applies a unified severity scale at variant level, based on the empirically estimated positive predictive value of being pathogenic. Because the two tests are complementary [13], we reported the minimal observed DNV p-value (P_DNV) per gene. We identified 14 genes significantly enriched in the DNV analysis (P_DNV < 0.05 after Bonferroni correction for multiple testing, ). All of these genes were affected by at least two constrained non-synonymous DNVs (nsDNVs), and 11 of the 14 (78.6%) are known CHD disease genes. CHD7 (OMIM 214800) was the most significant haploinsufficient gene (P_DNV = 2.84 x 10^-26), with 18 nsDNVs identified in the patient cohort.
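
Comparing observed with expected DNV counts can be illustrated with a one-sided Poisson test, similar in spirit to the gene-level test in Mupit. The sketch below is a simplification under three assumptions:
- a per-gene, per-haploid-genome mutation rate is already available (the rate used in the example is hypothetical);
- each trio contributes two transmitted genomes;
- the variant-level severity weighting used by DeNovoWEST can be ignored.

The function names are illustrative.

```python
import math


def expected_dnvs(mutation_rate, n_trios):
    """Expected de novo count in a gene: two transmitted haploid genomes per trio."""
    return 2 * n_trios * mutation_rate


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), accurate for very small tails."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0

    def pmf(k):
        return math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))

    if observed <= expected:
        # The tail is not small here, so 1 - lower tail loses no precision.
        return max(0.0, 1.0 - sum(pmf(k) for k in range(observed)))
    # Sum the upper tail directly; terms shrink geometrically because k > expected.
    total, term, k = 0.0, pmf(observed), observed
    while term > 0 and term > total * 1e-17:
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)
```

For instance, a hypothetical per-gene truncating mutation rate of 1e-5 across 2,489 trios gives about 0.05 expected truncating DNVs. Observing even a handful in one gene therefore yields a very small upper-tail probability. The sketch sums the tail directly rather than computing 1 minus the lower tail, which keeps the p-value from being rounded to zero for strongly enriched genes.
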
The other highly enriched genes for nsDNVs have previously been associated with different neurodevelopmental disorders with co-occurring CHD. These genes are KMT2D (OMIM 147920), KMT2A (OMIM 605130), NSD1 (OMIM 117550), TAB2 (OMIM 614980) and ADNP (OMIM 615873). KDM5B (OMIM 618109) has only been described in the context of a recessive neurodevelopmental phenotype, with cases presenting with ASD (atrial septal defects) [14,15]. We next evaluated the distribution of the o/e LOF ratio at different levels of DNV enrichment, splitting genes by P_DNV. The o/e LOF ratio of a gene is estimated imprecisely when few LOF variants are expected, as is the case for short genes. We therefore used the upper bound of its 90% confidence interval (termed LOEUF), as suggested by Karczewski et al [10]. LOEUF keeps the direct interpretation of the o/e ratio while reflecting this uncertainty, which allows small genes to be distinguished from large genes. Genes with higher enrichment for nsDNVs (lower P_DNV) showed significantly decreased LOEUF compared to the mean of all protein-coding genes ().
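
The sketch below makes the LOEUF idea concrete. It derives an approximate 90% confidence interval for the o/e LOF ratio from an observed and an expected LOF count. It is modelled on our understanding of the approach described by Karczewski et al [10]:
1. Evaluate a Poisson likelihood over a grid of candidate o/e values.
2. Normalise the likelihood.
3. Read off the 5th and 95th percentiles.

The grid range of 0 to 2, the step size and the function names are assumptions. This is not the gnomAD code.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam (handles lam == 0)."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def oe_confidence_interval(observed, expected, lower=0.05, upper=0.95,
                           step=0.001, max_oe=2.0):
    """Approximate 90% interval for the observed/expected LOF ratio.

    The Poisson likelihood of the observed count is evaluated on a grid of
    candidate o/e values, normalised to sum to one, and accumulated until the
    lower and upper percentiles are reached. The upper bound is LOEUF-like.
    """
    grid = [i * step for i in range(int(round(max_oe / step)) + 1)]
    likelihood = [poisson_pmf(observed, expected * x) for x in grid]
    total = sum(likelihood)
    cumulative, lo, hi = 0.0, None, None
    for x, lik in zip(grid, likelihood):
        cumulative += lik / total
        if lo is None and cumulative >= lower:
            lo = x
        if cumulative >= upper:
            hi = x
            break
    return lo, hi
```

Consider two genes with the same point estimate of 0. With zero observed and ten expected LOF variants, the upper bound is about 0.30. With zero observed and only two expected, the upper bound is roughly 1.35. This difference is why LOEUF separates confidently constrained genes from small genes that carry little information.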

Comparison of the distribution of the LOEUF metric at different levels of significance of nsDNV-enriched genes.

X-axis denotes the P-values from the DNV analysis (binned). Y-axis denotes the o/e LOF ratio upper bound fraction (LOEUF). All groups were compared against the LOEUF distribution of all protein-coding genes (purple). Differences between the distributions were tested using a two-sided Wilcoxon rank sum test. ****: P<0.0001, ns: non-significant.

Integration of DNV and CNV results

To identify high-confidence haploinsufficient CHD disease genes, we performed a joint analysis integrating the results from the CNV and DNV analyses. We combined the p-values from both analyses (P_CNV and P_DNV) using Fisher's method. We showed that genes enriched for DNVs and genes enriched for CNV deletions are both significantly overrepresented among LOF-constrained genes, as measured by the o/e LOF ratio. We therefore applied a Bonferroni multiple testing correction using independent hypothesis weighting (IHW) [16], incorporating the gene o/e LOF ratio as a measure of haploinsufficiency (). A gene was included in the final set of haploinsufficient CHD disease genes if it reached a corrected metaP < 0.05 (Bonferroni adjustment with IHW). Our analysis revealed 21 such genes, significantly enriched for CNV deletions and/or non-synonymous DNVs ().
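
The two combination steps can be sketched as follows. `fisher_combine` implements Fisher's method using the closed-form chi-squared survival function for even degrees of freedom. `weighted_bonferroni` applies a Bonferroni correction with per-gene weights that average to one. IHW applies a weighted Bonferroni rule of this form when combined with Bonferroni adjustment. However, IHW learns the weights from the binned covariate (here the o/e LOF ratio) using sample splitting, which this sketch does not attempt. The weights in the example are hypothetical, and the p-values are assumed to be strictly positive.

```python
import math


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-squared with 2k df under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # X / 2
    # Survival function for even df = 2k has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def weighted_bonferroni(p_values, weights):
    """Bonferroni-adjusted p-values with hypothesis weights rescaled to mean 1."""
    m = len(p_values)
    mean_w = sum(weights) / m
    adjusted = []
    for p, w in zip(p_values, weights):
        w = w / mean_w
        # A zero-weight hypothesis can never be rejected.
        adjusted.append(1.0 if w == 0 else min(1.0, p * m / w))
    return adjusted
```

For two p-values, Fisher's statistic has four degrees of freedom, and the combined p-value reduces to p1*p2*(1 - ln(p1*p2)). For example, two p-values of 0.01 combine to about 1.0e-3.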

Top 21 significant genes arising from both the permutation-based test and the DNV rate-based test.

Cases/Controls: number of cases and controls carrying CNV deletions overlapping the gene in the CNV analysis.
P_CNV: p-value from the CNV permutation test.
nsDNV: number of constrained non-synonymous variations in the de novo analysis.
P_DNV: p-value from the DNV analysis.
Significant: the analysis in which the gene was significant (dnv: DNV analysis; cnv: CNV analysis; both: both analyses; none: significant in neither the DNV nor the CNV analysis).
metaP: combined p-value (P_CNV and P_DNV) using Fisher's method.
Corrected metaP: Bonferroni-corrected p-value using independent hypothesis weighting (IHW), with the LOEUF metric as covariate.
LOEUF: o/e LOF ratio upper bound fraction from gnomAD.
*All 21 genes were significant after combining their p-values and applying Bonferroni correction. ¹Evidence is from mouse models [24,62].

Subclassification of CHD phenotypes

In addition to the collective analysis of all CHD phenotypes, we analysed specific CHD subtypes. The analysis focused on simplex CHD cases within two main categories: conotruncal defects (N_CNV = 873, N_DNV = 234) and left ventricular outflow tract obstruction (LVOTO; N_CNV = 594, N_DNV = 351). We only included cases with a clear phenotypic description and without overlapping phenotypic features between the two categories (LVOTO/conotruncal). The conotruncal group consisted mostly of tetralogy of Fallot (TOF), truncus arteriosus and transposition of the great arteries (TGA). The LVOTO group mainly comprised aortic stenosis (AS), including bicuspid aortic valve disease (BAV), coarctation and hypoplastic left heart syndrome (HLHS). We applied the same integrative approach described above to the LVOTO and conotruncal groups to identify CHD subtype-specific genes. Our analysis showed four significant genes (). Three were observed in LVOTO (KMT2D, KMT2A and TAB2), and a single gene, NSD1, showed significant enrichment in the conotruncal subtype.
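
The exclusion of cases with overlapping features can be expressed as a simple rule: a case is assigned to a subtype only if its phenotypes fall into exactly one of the two categories. The sketch below assumes that phenotype labels have already been harmonised to short terms. The category sets contain only the main diagnoses named in the paragraph above. They are an illustrative assumption, not the study's full mapping.

```python
# Illustrative category sets built only from the main diagnoses named in the text.
CONOTRUNCAL = {"TOF", "truncus arteriosus", "TGA"}
LVOTO = {"AS", "BAV", "coarctation", "HLHS"}


def assign_subtype(phenotypes):
    """Return 'conotruncal', 'LVOTO', or None (no match or overlapping features)."""
    terms = set(phenotypes)
    has_conotruncal = bool(terms & CONOTRUNCAL)
    has_lvoto = bool(terms & LVOTO)
    if has_conotruncal and not has_lvoto:
        return "conotruncal"
    if has_lvoto and not has_conotruncal:
        return "LVOTO"
    # Cases with features of both categories, or of neither, are excluded.
    return None
```
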

Significant CHD genes are highly and/or differentially expressed in the heart

We next evaluated the expression pattern of the 21 significant genes (Bonferroni-corrected metaP < 0.05) in the heart using RNA-Seq data from human tissues at different developmental time points [17]. We stratified the analysis by stage of heart development (see Materials and Methods). The most significant genes (metaP < 1 x 10^-5) showed significantly increased mean expression in the heart compared to all protein-coding genes (P < 0.0001, Wilcoxon test) (). This increase held across developmental stages (development, maturation and infant/adult). Moreover, 18 of the 21 genes fall in the top quartile of heart expression in both the developmental and the maturation stages (). To complement the expression analysis, we compared gene expression during human heart development with expression in two other organs, kidney and liver. This comparison identified genes whose expression changes significantly during crucial stages of heart development, which would not have been possible by focusing on expression levels alone (Materials and Methods). We found that 17 of the 21 CHD candidate genes are differentially expressed in the heart compared to their expression levels in kidney and/or liver (R > 0.50, Bonferroni-corrected P < 0.01). Interestingly, FEZ1, NALCN and MYO16 are not among the highly expressed genes. Nevertheless, all three were significantly differentially expressed during heart development compared to kidney and/or liver ().
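
The two summary comparisons in this paragraph can be sketched without external libraries:
- a two-sided Wilcoxon rank-sum test of expression values against all protein-coding genes;
- a top-quartile check.

This is an illustration only. The sketch uses the normal approximation with tie-corrected variance and no continuity correction, so its p-values may differ slightly from those of statistical packages that use exact distributions or continuity corrections. The function names are our own.

```python
import math
import statistics


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties receive mid-ranks and the variance is tie-corrected; no continuity
    correction is applied. Returns (U statistic for x, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_sum = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1          # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0                       # all values identical: no evidence
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def in_top_quartile(value, reference_values):
    """True if value is at or above the upper quartile of reference_values."""
    q3 = statistics.quantiles(reference_values, n=4)[-1]
    return value >= q3
```

In practice, `x` would be the mean heart expression of the candidate genes and `y` that of all protein-coding genes. `statistics.quantiles` uses the 'exclusive' method by default, and other quantile definitions can move the threshold slightly.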

Comparison of the mean expression (heart) distribution at different metaP cut-offs.

Panels show three different heart development stages: early development, maturation and infant/adult. X-axis denotes the combined p-value from DNV and CNV analysis (metaP, at different cut-offs). Y-axis denotes the genes’ mean expression in the heart (log scale). The 21 significant candidate CHD genes () are contained in the fraction with the higher expression (red box). Differences between the distributions were tested using a two-sided Wilcoxon rank sum test (reference group: all genes). ****: P<0.0001, ***: P<0.001, **: P<0.01, *: P<0.05, ns: non-significant.

CNV/DNV burden of specific protein complexes

CNVs and DNVs can affect heart development either through haploinsufficiency of a single gene or through their combined impact on the function of several genes. Indeed, oligogenic models have been implicated in CHD, and proteins acting in the same complex or pathway are known to be encoded in genomic clusters [18,19]. We therefore conducted a systems-level analysis to identify global mechanisms by which haploinsufficiency might promote CHD. In particular, we assessed the combined effect of CNVs and DNVs with respect to human protein-protein interactions (PPIs). The InWeb and ConsensusPathDB databases provide ranked information about experimentally determined physical interactions. They therefore serve as a proxy for understanding the functional effects of CNVs/DNVs on human protein complexes (Materials and Methods). The genes with a Benjamini–Hochberg adjusted metaP < 0.05 (n = 492 genes) were used as seeds to build a PPI network from the data available in InWeb and ConsensusPathDB. No additional interactions were considered. The final network consisted of 164 proteins and 290 interactions (). We identified 10 overlapping sub-clusters within this network using the built-in clustering algorithm implemented in GeNets [20] (Materials and Methods). Gene Ontology (GO) enrichment analysis suggested that four of these ten sub-clusters are enriched for genes involved in the Notch signaling pathway, cardiocyte differentiation, DNA repair and centrosome function (). All four clusters contain more CNV deletions in CHD cases than in controls. The remaining six sub-clusters did not show significant enrichment for any particular biological process.
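
The seeding step can be sketched in two parts. First, the combined p-values receive a Benjamini–Hochberg adjustment. Second, only interactions whose two partners are both seed genes are retained. Reading "no additional interactions were considered" as excluding non-seed interaction partners is our interpretation. The interaction list in the example is hypothetical. The sketch does not reproduce the InWeb or ConsensusPathDB scoring or the GeNets clustering.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def seed_subnetwork(seeds, interactions):
    """Keep only interactions between two distinct seed proteins.

    Returns (nodes, edges); seeds without any seed partner drop out.
    """
    seeds = set(seeds)
    edges = {tuple(sorted((a, b))) for a, b in interactions
             if a in seeds and b in seeds and a != b}
    nodes = {node for edge in edges for node in edge}
    return nodes, edges
```

Under this reading, seeds without any interaction to another seed drop out of the network. That would be consistent with the final network (164 proteins) being much smaller than the set of 492 seed genes.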

Identification of functional networks enriched for proteins encoded by genes affected by CNVs and/or DNVs associated with CHD.

The protein-protein interaction networks (a-d, for clusters 1, 3, 8 and 9, respectively) were identified using GeNets (). Proteins are shown as nodes and interactions as edges. Enrichment for CNVs (blue) and DNVs (green) is highlighted. Proteins with no specific enrichment for CNVs and/or DNVs but with a B-H adjusted metaP < 0.05 are highlighted in red. Circle size denotes whether the gene was found to be significantly highly and/or differentially expressed in the heart (large circles: significant expression; small circles: non-significant). The distributions of CHD case CNVs and control CNVs are shown for each cluster. Significant differences in the CNV distribution were assessed using a Wilcoxon rank-sum test. The horizontal bar plots show the top ten enriched GO terms for each cluster (output from the Enrichr tool). The X-axis of the horizontal bar plots shows the Enrichr combined score, computed by multiplying the log-transformed p-value by the z-score. Bar color encodes the significance level of the GO biological process (dark blue: FDR < 5%; light blue: FDR 5–10%; grey: FDR > 10%).

Discussion

We performed a meta-analysis of rare genomic variants in a cohort of 10,447 CHD probands, which provides a useful resource for interpreting CNVs and DNVs identified in patients with CHD. We implemented a statistical approach that allows the integration of different types of genomic variants to discover novel genes associated with CHD. Our data-driven integrative analysis took into account three major criteria at the genomic level:
a) gene enrichment for DNVs;
b) gene enrichment for CNV deletions;
c) gene intolerance to LOF variation.

Our analysis identified 21 significant haploinsufficient CHD genes. Fourteen of these are known CHD genes, and the remaining seven have not previously been associated with CHD (). To further strengthen the associations, we made use of a newly published human transcriptome atlas covering different developmental, maturation and adult stages in numerous organs [17]. Consistent with previous results [7], our analysis shows that the majority of the 21 significant genes are highly expressed during critical stages of heart development. Earlier studies [7,21] did not address the importance of expression changes over time. In contrast, we evaluated differential expression patterns by comparing expression levels in the heart, kidney and liver at different time points in development. This analysis strengthened the disease association for genes that do not fall into the high-expression group, and it highlights the importance of all 21 genes independently of the genomic approach. This is complemented by the fact that the majority of the genes (14/21) were already known to cause CHD. To further strengthen disease associations, further work is warranted: studies of spatiotemporal expression at single-cell resolution during critical cardiac developmental time points, and analyses of animal models with targeted mutations in the candidate disease genes. Such studies could also provide pathophysiological information.
The combined analyses showed enrichment (Bonferroni-corrected metaP < 0.05) for 21 likely haploinsufficient disease genes. Among them, 14 genes (CHD7, KMT2D, KMT2A, NOTCH1, NSD1, TAB2, ANKRD11, ADNP, DYRK1A, RBFOX2, KANSL1, ELN, MED13L and GATA6) are well-established CHD genes, and our data confirm this association. For the other seven genes (KDM5B, WHSC1, WAC, NALCN, ARID1B, FEZ1 and MYO16), to the best of our knowledge, an association with CHD had either not been established or had been reported only in small case studies or single individuals. KDM5B is not an established CHD gene thus far, although one patient with compound heterozygous frameshift variants had an ASD [14]. While some have argued against haploinsufficiency of this gene [22], our analysis suggests KDM5B as a plausible haploinsufficient CHD gene. Additional functional studies are warranted to confirm its role in CHD. A recent CNV meta-analysis [23] of non-syndromic CHD patients found that duplication of WHSC1 (also known as NSD2) is a possible cause of CHD. However, haploinsufficiency of WHSC1 has not previously been associated with CHD. In support of a role in CHD, loss of Whsc1 has been reported to cause heart malformations in mouse models [24]. In addition, WHSC1 is known to interact with NKX2.5 [24]. Nevertheless, the low incidence of CHD in individuals with Wolf-Hirschhorn syndrome suggests that haploinsufficiency of WHSC1 alone does not cause CHD. Heterozygous truncating variations in WAC, as well as CNV deletions involving this gene, have recently been associated with DeSanto-Shinawi syndrome. This rare neurodevelopmental disorder is characterized by global developmental delay [25,26]. Furthermore, microdeletions at 10p11.23-p12.1 (overlapping ARMC4, MPP7, BAMBI and WAC) were identified in two unrelated, non-consanguineous individuals with heart malformations, among other anomalies [27]. Despite these isolated reports, no definite association between WAC and CHD has been established.
DNVs in NALCN have been reported to cause a dominant condition characterized by multiple features, including developmental delay, congenital contractures of the limbs and face, and hypotonia [28,29]. However, CHD has not been described among the observed phenotypes thus far. Heterozygous variation in ARID1B is a frequent cause of intellectual disability [30,31]. A recent analysis of 143 patients with ARID1B variations showed that affected individuals display a spectrum of clinical characteristics, with congenital heart defects observed in 19.5% of the patients [32]. FEZ1 is a neurodevelopmental gene that has been associated with schizophrenia [33]. Fez1 has been reported to be regulated by Nkx2-5 in heart progenitors in mice, suggesting a possible role in heart development [34]. MYO16 (NYAP3) encodes an unconventional myosin protein involved in the regulation of neuronal morphogenesis [35]. We have not found an association between MYO16 and heart development in the literature. Several genes have been shown to be altered in syndromic and non-syndromic cases with CHD and TAA (e.g. HEY2 [36], MYH11 and NOTCH1 [37]). However, 10 genes were significant in our analysis for TAA, CHD and the combined scenario, and none of them has previously been reported to be associated with either CHD or TAA. Our TAA data were limited in size and consisted only of CNV calls. Future studies examining CNVs and DNVs in both phenotypes are therefore required to establish stronger genotype-phenotype correlations and to better understand a possibly shared genetic architecture of the two disease entities. In addition, our study did not identify strong signals associated with specific CHD subtypes. We identified four genes with significant enrichment (adjusted metaP < 0.05) within the two evaluated CHD subtypes (LVOTO and conotruncal defects). Three were associated with LVOTO, and their signal came mainly from DNVs.
KMT2D was enriched in LVOTO, which is consistent with the reported spectrum of CHD in patients with Kabuki syndrome, a large proportion of whom have LVOTO-type CHD [38,39]. PDA and septal defects predominate in Wiedemann-Steiner syndrome (KMT2A). However, aortic insufficiency and BAV have also been reported [40], suggesting that LVOTO might form part of the phenotypic spectrum. TAB2 was also enriched in LVOTO. Mutations in TAB2 are associated with a wide range of cardiac phenotypes [41-43]. Deletions at 6q24 causing haploinsufficiency of TAB2 have been associated with outflow tract abnormalities, including LVOTO [44,45], which might point to a role of TAB2 in LVOTO pathogenesis. NSD1 was the only gene identified as significant among conotruncal defects. The significance of this finding is unclear, as septal defects predominate in Sotos syndrome. However, patients in previous studies were ascertained on the basis of Sotos syndrome (OMIM 117550) rather than CHD. Given the current size of each CHD subgroup and the low number of CNV and DNV events in these genes, these results cannot be considered conclusive. Two things are needed to improve genotype-phenotype correlations for individual genes and the identification of genetic subnetworks: more precise phenotypic descriptions for each individual, and larger sample sizes. In addition to the gene-centered analysis, we applied a systems-level analysis to identify potential novel pathophysiological mechanisms affected by haploinsufficiency. In this approach, we took advantage of GeNets [20], a computational framework for the analysis of protein-protein interactions developed for the interpretation of genomic data. This analysis allowed us to identify PPI clusters enriched for genes affected by CNVs and/or DNVs in patients. Furthermore, GO enrichment analysis suggested distinct biological functions for four of these clusters. Cluster 1 () contains proteins involved in the Notch signaling pathway.
Our data corroborate previous studies establishing the central role of the Notch pathway in the pathophysiology of CHD [46]. They also highlight the shared contribution of CNVs and DNVs within the cluster. Cluster 3 () contains proteins driving essential processes in heart development, such as atrial septum and cardiac right ventricle morphogenesis. It also contains proteins playing a significant role in the positive regulation of gene expression. These mechanisms have been well studied elsewhere [47]. Interestingly, three of the seven candidate novel CHD genes (WAC, ARID1B and KDM5B) were found to contribute to these two clusters. Cluster 8 () showed enrichment for processes related to chromosome organization and DNA repair. An association between DNA repair and CHD is not well established thus far. Cluster 9 () was found to be associated with microtubule organizing function. This biological process has not been described in the context of CHD. However, an earlier report [48] describes complex CHD among the phenotypes of individuals with 15q11.2 deletion syndrome, a deletion that involves the gene encoding tubulin gamma complex protein 5.

Given the heterogeneous data sources and the complex inheritance patterns often observed in patients with CHD, our study has limitations. First, the patient data were collected from almost 200 different published sources. In many cases, it was only possible to obtain CNV calls that had already been suggested to be pathogenic. Our patient data are therefore incomplete, because genome-wide CNV data are missing for a large part of the patient cohort, and they are biased toward already established associations. This is not the case for controls, for which genome-wide data were used. As a direct consequence, a slight difference in the rates of CNV deletions remained between cases and controls, even though the difference decreased dramatically after the quality control filtering step.
The lack of collected CNV data spanning the sex chromosomes limited the analysis to autosomes. In addition, CNVs that overlap known microdeletion syndromes such as DiGeorge syndrome and Williams syndrome are overrepresented in the dataset. Similarly, the degree of phenotyping varied across the different studies, and often only basic phenotypic terms relating to CHD were available. This made it impossible to assign many individuals to a precise phenotypic class of CHD. Previous research has shown that the chances of finding a genetic cause of CHD are higher in syndromic than in non-syndromic CHD [6,7]. Therefore, it is not surprising that known CHD genes were enriched in this cohort. To identify non-syndromic causes of CHD, it is important to take into account previous findings showing a significant excess of apparently deleterious inherited PTVs in unaffected parents [6]. To help address the challenges of identifying non-syndromic CHD genes, we have used an integrative approach, which has allowed us to look for novel CHD associations in a larger sample size, in a binary fashion. This will help to facilitate future studies. Variable expression and reduced penetrance are common features in CHD, even in well-established conditions such as Noonan syndrome [49]. Most cases of CHD occur as one-off events within a family, and when recurrences do happen, CHD subtypes are more likely to be discordant than concordant [50]. This suggests that other modifying factors may be at play, including genetic factors, environmental factors, or both. This presents a further challenge in identifying genetic causes of CHD. Moving forward, detailed phenotypic descriptions using a standardized system, and better data sharing strategies (including primary data), will facilitate further gene discovery and improved genotype-phenotype correlation in CHD subgroups. 
In summary, we have performed an integrative analysis of CNVs, DNVs, o/e LOF ratio and expression during heart development amongst more than 10,000 CHD patients. Our analyses identify seven potential disease genes and mechanisms newly associated with CHD, and strengthen previously reported associations.

Materials and methods

Ethics statement

This project was based on unidentifiable data and did not require approval by science ethics committees in Denmark or Christian-Albrechts-Universität in Kiel.

Cohort description

Our cohort contains 7,958 CHD cases and 14,082 controls (see summary at ). Data from both affected and unaffected individuals were collected from 190 different CNV studies (). Most of the CNV data included in the present study were assembled from public repositories, the literature and clinical data (see for a more detailed description). We sampled all available and accessible studies indexed in PubMed as of February 2018. We focused on studies incorporating Caucasian samples (the largest sampled population) to decrease population effects. Because no primary data were available for most of the studies, we cannot exclude the possibility that non-Caucasian samples were included. Both non-syndromic and syndromic CHD cases were included, and CNVs were mapped to the human genome build NCBI37/hg19. Phenotype information was reviewed and, where possible, standardised across studies to ensure consistency and accuracy. As controls, we used re-analysed samples from the Wellcome Trust Case Control Consortium 2, the Genetic Association Information Network (GAIN) and the Ottawa Heart Institute (). In addition, we built a dataset from the two largest DNV studies in CHD published thus far, which include a total of 2,489 parent-offspring trios [6,7].

Determining the gene sets

The compiled high-confidence CHD gene list consisted of genes with plausible disease-causing mutations in an interpretable functional region, reported in association with CHD in three or more unrelated individuals. Genes with fewer than three reports of CHD were also considered if solid functional evidence was available, such as a mouse model that displays CHD, or in silico evidence. CHD genes were then coded as either non-syndromic (isolated CHD) or syndromic based on published phenotypes. The list of genes associated with developmental disorders was derived from the Developmental Disorder Genotype to Phenotype (DDG2P) list maintained by Decipher and the European Bioinformatics Institute [51]. All genes were annotated as either monoallelic or biallelic, as appropriate, based on the published literature ().

CNV analysis

Only autosomal CNVs were included in the analysis. All CNV boundaries were determined using genome build NCBI37/hg19. For CNVs provided in hg18, we used the Assembly Converter (https://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter), built on CrossMap (http://crossmap.sourceforge.net/), to convert coordinates to NCBI37/hg19. An additional validation step of all CNV boundaries was performed using the R package BSgenome.Hsapiens.UCSC.hg19. Very short and very long CNVs were filtered out by applying size cut-offs of 5 kb (lower limit) and 20 Mb (upper limit), because very small and very large CNV calls have been shown to have a high rate of false positives [52,53]. We removed CNVs with more than 50% overlap with telomeres, centromeres and segmental duplication regions. In addition, we computed internal CNV frequencies by counting the control CNVs with >50% reciprocal overlap and dividing by the total number of controls. This internal minor allele frequency (MAF) was computed separately for deletions and duplications. Only CNVs with a MAF < 0.01 in controls that overlapped ten or more probes of the CNV calling platforms (Affymetrix Array 6.0 and Illumina Human660W-Quad) were considered for downstream analysis. Our analysis focused only on CNV deletions. The distributions of the number of CNV deletions per individual in the case and control groups were compared (two-sided Wilcoxon rank sum test) to evaluate the impact of the quality control filtering step (). After filtering, 6,746 cases (3,929 harbouring CNV deletions) and 14,024 controls (12,585 harbouring CNV deletions) remained for further analysis. A region-based permutation test (PLINK version 1.07, options ‘--cnv-test-region --mperm 10000’) was used on the filtered set to perform a case-control association analysis. 
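The size, probe and frequency filters described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `CNV` record, its 1-based inclusive coordinates and all helper names are hypothetical; the internal MAF is taken as the fraction of distinct control samples carrying a same-type CNV with >50% reciprocal overlap; and the telomere/centromere/segmental-duplication filter is omitted.

```python
from dataclasses import dataclass

@dataclass
class CNV:
    chrom: str
    start: int      # hg19, 1-based inclusive (assumed convention)
    end: int
    kind: str       # "DEL" or "DUP"
    n_probes: int   # number of array probes covered by the call

def length(c):
    return c.end - c.start + 1

def reciprocal_overlap(a, b, min_frac=0.5):
    """True if the shared segment covers more than min_frac of *both* CNVs."""
    if a.chrom != b.chrom:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return False
    return shared / length(a) > min_frac and shared / length(b) > min_frac

def internal_maf(cnv, control_cnvs, n_controls):
    """Fraction of controls carrying a same-type CNV with >50% reciprocal overlap.

    control_cnvs is a list of (sample_id, CNV) pairs; deletions and
    duplications are counted separately by matching on `kind`.
    """
    carriers = {sample for sample, c in control_cnvs
                if c.kind == cnv.kind and reciprocal_overlap(cnv, c)}
    return len(carriers) / n_controls

def passes_qc(cnv, control_cnvs, n_controls):
    """Size window 5 kb to 20 Mb, at least 10 probes, internal MAF < 0.01."""
    if not 5_000 <= length(cnv) <= 20_000_000:
        return False
    if cnv.n_probes < 10:
        return False
    return internal_maf(cnv, control_cnvs, n_controls) < 0.01
```

For example, a case deletion that overlaps one control deletion by about 90% is filtered out when there are 50 controls (internal MAF 0.02) but kept when there are 200 controls (internal MAF 0.005).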
For the gene-based permutation analysis, we reported both the point-wise empirical p-value (EMP1) and the empirical adjusted p-value (EMP2), which controls the family-wise error rate (FWER) (http://zzz.bwh.harvard.edu/plink/perm.shtml). In addition to the gene-centered permutation testing, a similar region-based permutation analysis was performed to assess enrichment in known CNV deletion syndromes. All CNV deletions passing QC filtering that overlapped these regions were considered in the analysis. The genomic coordinates of the regions and the syndrome descriptions were downloaded from the Database of genomic variation and phenotype in humans using Ensembl resources (Decipher, https://decipher.sanger.ac.uk/disorders#syndromes/overview).
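To make the two empirical p-values concrete, the sketch below permutes case/control labels and computes, for each gene, EMP1 = (R + 1)/(N + 1), where R counts the N permutations whose statistic for that gene reaches the observed value, and EMP2 with the max(T) approach, where R instead counts permutations in which the maximum statistic over all genes reaches it. It is a hypothetical illustration, not PLINK's code: the one-sided statistic (case carrier proportion minus control carrier proportion) and all function names are assumptions.

```python
import random

def carrier_stat(labels, carriers):
    """Case minus control carrier proportion (hypothetical one-sided statistic)."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    case_hits = sum(1 for i in carriers if labels[i])
    ctrl_hits = len(carriers) - case_hits
    return case_hits / n_case - ctrl_hits / n_ctrl

def permutation_test(labels, gene_carriers, n_perm=10_000, seed=1):
    """Return {gene: (EMP1, EMP2)} using label permutation and max(T).

    labels: list of 1 (case) / 0 (control), one entry per individual.
    gene_carriers: {gene: set of indices of individuals with a deletion}.
    """
    rng = random.Random(seed)
    observed = {g: carrier_stat(labels, c) for g, c in gene_carriers.items()}
    r1 = dict.fromkeys(gene_carriers, 0)   # point-wise exceedances
    r2 = dict.fromkeys(gene_carriers, 0)   # max(T) exceedances
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        stats = {g: carrier_stat(perm, c) for g, c in gene_carriers.items()}
        max_t = max(stats.values())
        for g in gene_carriers:
            if stats[g] >= observed[g]:
                r1[g] += 1
            if max_t >= observed[g]:
                r2[g] += 1
    return {g: ((r1[g] + 1) / (n_perm + 1), (r2[g] + 1) / (n_perm + 1))
            for g in gene_carriers}
```

Because the maximum over genes is always at least the gene's own statistic, EMP2 is never smaller than EMP1 for the same gene, which is what makes it the more conservative, FWER-controlling value.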

CNV burden test on gene sets

A logistic regression-based burden test (‘--cnv-enrichment-test’ in PLINK v1.07) [11] was performed on different gene sets (known CHD genes (non-syndromic/syndromic/biallelic/monoallelic), developmental disorder genes, known haploinsufficient disease genes [54], autosomal recessive disease genes [55,56] and genes with a low observed/expected LOF ratio, ) using the rare CNV deletions passing the quality control and filtering stage. For every gene set examined, the binary phenotype (CHD case or control) was regressed on the number of genes disrupted by one or more CNVs. The average CNV size and the number of segments per individual were included as covariates in the model to control for potential differences between cases and controls, as suggested by Raychaudhuri et al [11]. In addition, the PLINK implementation of this test was slightly modified by including a third (categorical) covariate, the sample study ID, since we assembled the CNV data from different sources.
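The burden model above can be illustrated with a small, self-contained logistic regression fitted by Newton-Raphson (IRLS). This is a sketch under assumptions, not the PLINK implementation: the design row (intercept, number of genes in the set hit by a deletion, mean CNV size in Mb, number of segments, and one-hot study ID with the first study as reference) and all helper names are hypothetical. The coefficient on the gene count is the burden effect; a positive value means that deletions hitting more genes in the set are associated with case status after adjusting for the covariates.

```python
import math

def design_row(n_genes, mean_size_mb, n_segments, study, studies):
    """Intercept, burden count, covariates, then one-hot study ID (first study = reference)."""
    dummies = [1.0 if study == s else 0.0 for s in studies[1:]]
    return [1.0, float(n_genes), mean_size_mb, float(n_segments)] + dummies

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # X^T W X, W = mu (1 - mu)
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

Encoding the study ID as dummy variables is what turns it into a categorical covariate; with one reference study, a dataset drawn from S studies adds S - 1 columns to the model.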

DNV analysis

The assembled DNV dataset (Sheet A in ) was re-annotated using the Variant Effect Predictor (VEP version 90). All DNVs included in this study were validated with the VariantValidator tool [57] (Sheet B in ). Based on the VEP annotation, we classified each variant into three major functional groups as follows: a) protein-truncating variants (PTVs: stop_gained, splice_acceptor, splice_donor, frameshift, initiator_codon, start_lost, conserved_exon_terminus), b) missense variants (stop_lost, missense, inframe_deletion, inframe_insertion, coding_sequence, protein_altering) and c) silent variants (synonymous). Variants with a minor allele frequency (MAF) > 0.01 in the gnomAD database were excluded from the analysis. The rates of rare DNVs (MAF < 0.01) in the two DNV studies [6,7] were compared (Poisson test) for the different variant consequence groups (PTV, missense and synonymous). No significant differences were found between the DNV rates for any of the evaluated groups (). De novo variation recurrence significance testing was performed to evaluate the impact of DNVs at the gene level using the Mupit tool [12]. By default, Mupit uses the sequence-specific variation rate published by Samocha et al [58]. A second test, DeNovoWEST [13], was used to assess gene-wise de novo variation enrichment. DeNovoWEST assigns a variation severity score (based on the variant consequence and the CADD score) to all classes of variants as a proxy for their deleteriousness. For each tested gene, the minimal p-value obtained from Mupit and DeNovoWEST was reported (P). The corrected p-value was computed using the Bonferroni method with n = 18,272.
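The variant classification, the rate comparison between the two trio cohorts and the final gene-level correction can be sketched as below. This is a hedged illustration, not the authors' code: the consequence labels follow the text, while VEP itself emits Sequence Ontology terms such as `splice_donor_variant`, so a trailing `_variant` is stripped (an assumption about the input). The two-sample Poisson comparison is implemented as the exact conditional binomial test (given the total count, the count in cohort 1 is binomial with probability equal to cohort 1's share of trios), one standard way to compare two Poisson rates that may differ in detail from the test actually run. Function names are hypothetical.

```python
import math

PTV = {"stop_gained", "splice_acceptor", "splice_donor", "frameshift",
       "initiator_codon", "start_lost", "conserved_exon_terminus"}
MISSENSE = {"stop_lost", "missense", "inframe_deletion", "inframe_insertion",
            "coding_sequence", "protein_altering"}
SILENT = {"synonymous"}

def classify(consequence):
    """Map a consequence term to 'PTV', 'missense', 'synonymous' or None."""
    term = consequence.removesuffix("_variant")   # Python 3.9+
    if term in PTV:
        return "PTV"
    if term in MISSENSE:
        return "missense"
    if term in SILENT:
        return "synonymous"
    return None

def _log_binom_pmf(k, n, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def compare_rates(k1, trios1, k2, trios2):
    """Two-sided exact test of equal DNV rates per trio in two cohorts."""
    n = k1 + k2
    p0 = trios1 / (trios1 + trios2)      # expected share of cohort 1 under H0
    logs = [_log_binom_pmf(k, n, p0) for k in range(n + 1)]
    cutoff = logs[k1] + 1e-7             # small tolerance for ties, in log space
    return min(1.0, sum(math.exp(v) for v in logs if v <= cutoff))

def gene_pvalue(p_mupit, p_denovowest, n_tests=18_272):
    """Minimal p-value of the two tests and its Bonferroni-corrected value."""
    p = min(p_mupit, p_denovowest)
    return p, min(1.0, p * n_tests)
```

Working with log-probabilities via `math.lgamma` keeps the exact test usable for the thousands of variants seen per consequence group, where direct powers of p would underflow.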

Inferring differentially and highly expressed genes

Differentially expressed genes (DEGs) were identified by comparing the gene expression profile in heart to those in kidney and liver at matched time points. We used the maSigPro R package [59] to infer genes with dynamic temporal profiles from time-course transcriptomic data, as previously described by Cardoso-Moreira et al [17]. As input for maSigPro, we used the counts per million matrix (CPM, output from the edgeR package) hosted in ArrayExpress (E-MTAB-6814). Genes that did not reach CPM > 0.5 in at least five samples were excluded from the analysis. We ran maSigPro on the time scale measured in days post-conception using default parameters and included only time points with at least two biological replicates. A gene was selected as a DEG if the R (goodness-of-fit) parameter was higher than 0.50 and the Bonferroni-corrected P < 0.01. The R parameter distinguishes genes with clear expression trends from genes with a ‘flat’ expression profile. lists the final DEGs identified in the heart (R > 0.50). To assess gene expression levels in the heart, the RPKM matrix was used. Gene expression levels were averaged across samples within each heart development stage as follows: early development (4wpc-8wpc), maturation (9wpc-20wpc), infant/adult (newborn-adulthood). Genes were ranked based on the computed mean expression.
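maSigPro is an R package and is not reproduced here, but the CPM pre-filter and the stage-wise ranking of heart expression can be sketched in a few lines of Python. The data layout (one RPKM list per gene, aligned with a list of per-sample stage labels), the helper names and the percentile-rank definition (fraction of genes whose stage mean is less than or equal to the gene's own, so the top gene scores 1.0) are assumptions for illustration; the 0.75 cut-off for highly expressed genes is the threshold given in the supplementary figure legend on heart expression.

```python
from bisect import bisect_right
from statistics import mean

def passes_cpm_filter(cpm_values, threshold=0.5, min_samples=5):
    """Keep a gene only if CPM > threshold in at least `min_samples` samples."""
    return sum(v > threshold for v in cpm_values) >= min_samples

def stage_mean(rpkm, sample_stage, stage):
    """Mean RPKM of one gene across the samples assigned to `stage`."""
    return mean(v for v, s in zip(rpkm, sample_stage) if s == stage)

def percentile_ranks(expr_by_gene):
    """Map gene -> fraction of genes with mean expression <= its own."""
    values = sorted(expr_by_gene.values())
    n = len(values)
    return {g: bisect_right(values, v) / n for g, v in expr_by_gene.items()}

def highly_expressed(ranks, cutoff=0.75):
    """Genes at or above the expression-rank cut-off."""
    return {g for g, r in ranks.items() if r >= cutoff}
```

Ranking within each stage separately lets a gene count as highly expressed during early development even if its absolute expression drops later.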

Identification of CNV/DNV enriched PPI sub-clusters

A protein-protein interaction (PPI) network was constructed using the GeNets framework [20] and the information from the InWeb [60] and ConsensusPathDB [61] protein-protein interaction databases. Nodes in the network correspond to proteins, whereas edges represent their physical interactions. The network was seeded strictly with 492 candidate genes, namely those with a significant adjusted metaP < 0.05 (Benjamini-Hochberg false discovery rate, FDR). The PPI network was partitioned into overlapping sub-clusters using the built-in clustering method described in GeNets [20]. Only statistically significant sub-clusters (p-value < 0.05, permutation test) with at least 5 proteins were considered for further analysis. Finally, Gene Ontology enrichment analysis (Biological Process database 2018) of each identified sub-cluster was performed using the Enrichr tool (https://maayanlab.cloud/Enrichr/).
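The seeding step relies on Benjamini-Hochberg adjusted meta-analysis p-values. A minimal, hypothetical sketch of the adjustment and of the seed selection (adjusted metaP < 0.05) is shown below; the function names are illustrative, and the GeNets clustering itself is a separate framework that is not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def seed_genes(meta_p, alpha=0.05):
    """Genes whose BH-adjusted metaP is below alpha, sorted by name."""
    genes = list(meta_p)
    adj = benjamini_hochberg([meta_p[g] for g in genes])
    return sorted(g for g, q in zip(genes, adj) if q < alpha)
```

For p-values 0.01, 0.04, 0.03 and 0.005, the adjusted values are 0.02, 0.04, 0.04 and 0.02: each p-value is scaled by m/rank, and the running minimum keeps the adjusted values monotone in the original ranking.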

Distribution of overlapping CNVs, size and genes in 63 CHD loci.

A) CNVs per gene. Overlapping CNVs for each of the 528 significant candidate genes are shown as box-and-whisker plots. A statistically significant difference was observed between the two distributions (Mann-Whitney test, ***: P<0.001). B) Size of loci in kilobase pairs (kbp). C) Number of genes per locus. Median values are shown above each box. (TIF)

Statistical framework to discover novel candidate CHD genes by integrating DNV and CNV deletions.

The workflow follows four major steps: 1) data aggregation and quality control of both DNV and CNV data; 2) DNV rate-based enrichment testing and CNV deletion case/control association analysis at the gene level, performed independently; 3) combination of the results using the Fisher method; and 4) Bonferroni correction of the p-values using the Independent Hypothesis Weighting (IHW) method. The o/e LOF ratio upper bound fraction (LOEUF) was used as the independent covariate for IHW. (TIF)
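Steps 3 and 4 can be sketched as follows. Fisher's method combines k independent p-values through X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null; for even degrees of freedom the upper tail has the closed form exp(-X/2) * sum over i < k of (X/2)^i / i!. IHW (the Bioconductor package) learns per-hypothesis weights from the covariate, here LOEUF, using cross-fitting; that learning step is not reproduced. The sketch assumes the weights are already given (non-negative, averaging 1) and applies the weighted Bonferroni rule p_adj = min(1, p * m / w). Function names are hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square(2k) under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)   # X / 2; requires p > 0
    # Chi-square survival function for even df = 2k (closed form).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def weighted_bonferroni(pvalues, weights):
    """Adjusted p = min(1, p * m / w); weights must be non-negative and average 1."""
    m = len(pvalues)
    if abs(sum(weights) - m) > 1e-9:
        raise ValueError("weights must average to 1")
    return [min(1.0, p * m / w) if w > 0 else 1.0
            for p, w in zip(pvalues, weights)]
```

With a single p-value Fisher's method returns that p-value unchanged, and with all weights equal to 1 the weighted rule reduces to the ordinary Bonferroni correction.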

Heart expression pattern of the 21 significant genes at different heart development stages.

Panels show three different heart development stages: early development (red), maturation (green) and infant/adult (blue). The x-axis denotes the percentile rank of expression in the heart. The y-axis denotes the o/e LOF ratio upper bound fraction (LOEUF) from gnomAD. Dashed lines denote the thresholds for highly expressed genes (expression rank ≥ 0.75) and highly LOF-constrained genes (LOEUF ≤ 0.30). (TIF)

The functional network enriched for proteins encoded by genes affected by CNVs and/or DNVs associated with CHD.

Ten sub-clusters were identified using GeNets. Proteins are shown as nodes, interactions as edges. Enrichment for CNVs (blue), DNVs (green) or both independently (purple) is highlighted. Proteins with no specific enrichment for CNVs and/or DNVs but with B-H adjusted metaP < 0.05 are highlighted in red. The size of the circles denotes whether the gene was found to be significantly highly and/or differentially expressed in the heart (large circles: significant expression; small circles: non-significant). (TIF)

Distribution of the number of CNV deletions per individual in both control and CHD case cohorts before (A) and after (B) applying the quality control filtering approach. Differences between the distributions were tested using a two-sided Wilcoxon rank sum test. ****: P<0.0001. (TIF)

Number of probands in the CNV cohort.

Stratified by CHD cases/controls and CNV type (deletion/duplication). (XLSX)

Sources of the CNV-case cohorts used in this study.

(XLSX)

Sources of the CNV-control cohorts used in this study.

(XLSX)

Gene set-based logistic regression CNV enrichment analysis.

(XLSX)

Gene set-based logistic regression CNV enrichment analysis stratified by observed/expected LOF ratio (from gnomAD).

(XLSX)

Sheet A) Gene-based case/control CNV deletion permutation testing (PLINK results). Sheet B) Significant loci (locus ID and contributing genes). (XLSX)

Case/control CNV permutation testing on known deletion syndrome (PLINK results).

(XLSX)

CNV case/control permutation testing output from PLINK.

The table shows the 528 significant genes combining both CHD and TAA cases (CHD+TAA), the contribution of only CHD cases (only CHD) and the contribution of only TAA cases (only TAA). (XLSX)

Comparison of the DNV rates, stratified by variant consequence, between two independent cohorts.

(XLSX)

Gene-based DNV analysis.

(XLSX)

Meta-analysis of CNV/DNV stratified by CHD sub-types (Conotruncal and LVOTO).

The table shows the four genes with Bonferroni-corrected metaP < 0.05. Cases/Controls: number of cases and controls carrying CNV deletions overlapping the gene in the CNV analysis. p_cnv: p-value from the CNV permutation test. nsDNV: number of constrained non-synonymous variants in the de novo analysis. p_dnv: p-value from the DNV analysis. metaP: combined p-value (p_cnv and p_dnv) using the Fisher method. adj metaP: Bonferroni-corrected p-value using independent hypothesis weighting (IHW) with the LOEUF metric from gnomAD as covariate. (XLSX)

Differentially expressed genes in the heart compared to kidney and liver.

(XLSX)

List of gene sets used in the CNV enrichment analysis.

(XLSX)

Sheet A) List of the de novo variants analyzed in this study. Sheet B) Validation results of the DNVs from the VariantValidator tool. (XLSX)

Rare CNV deletions (MAF < 0.01) used in this study.

CNV boundaries are defined in genome build hg19. (XLSX)

4 Feb 2021

Dear Dr Hitz,

Thank you very much for submitting your Research Article entitled 'Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines. 
Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions. Yours sincerely, Anthony B Firulli Associate Editor PLOS Genetics Gregory Barsh Editor-in-Chief PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: This paper by Audain et al is a massive study collecting CNVs/DNVs data across multiple sources in an attempt to increase the power to determine involvement in CHD, and then explore these CNVs and DNVs to find individual genes and networks. 
They focused only on solidly pathogenic changes due to the nature of the case data, a trade off to using all CNVs that could identify new CNVs worth consideration as novel pathogenic variants. A large control group was selected across several datasets. The authors have used orthogonal analyses for several parts of the study (such as using Mupit and DeNovoWEST) when no one gold standard exists, and then considered the genes in the CNVs in different ways (straight association, as part of heart tissue expression, and then as differential expression, and as a network analysis). Overall, the study is quite well executed, and the paper provides nice confirmation of known information (specific CNVs, known genes and involvement of specific pathways). In addition, they identify several novel genes that warrant investigation. The methods are scant in the description of the study design. The authors should state their search strategy for the literature and the public datasets chosen. Of the studies identified, how were they selected for inclusion or exclusion? Were the CNV boundaries determined using the same build of the genome? How did the authors determine the controls and cases were of similar ancestral groups? While this is likely less of a problem for CNVs, it would be helpful to note any differences between groups. Did the authors include all genes within a CNV, or was there some selection? A definition of the different gene sets used in the analysis is needed. For example, it is not clear what difference there is between monoallelic and non-syndromic sets. Was the phenotype information clean enough and were there enough individuals to explore the CHDs by subgroups? The genetic architecture is quite different for instance between conotruncal defects and left sided defects. The authors should also acknowledge an additional bias of the case data collected. 
The CNVs are highly selected from only the cases published due to the presence of a CNV, so they are not representative of all individuals with CHD. Reviewer #2: In this manuscript, the authors present their study on an integrative analysis to identify copy number variants (CNVs) and de novo intragenic variants (DNVs) in patients with congenital heart disease (CHD) with sporadic thoracic aortic aneurysm. The authors used published and publicly available datasets of CNVs and DNVs identified in probands with CHD (10,447) to perform a genome-wide integrative meta-analysis. To achieve their goal the authors implemented a statistical approach which allows the integration of different types of genomic variants. Based on their study they identified 21 significant haploinsufficient CHD genes. Among these genes 14 are already known as CHD genes and 7 have not previously been associated with CHD. The authors evaluated expression pattern of these genes in the heart using available data from the recent human atlas study. In summary, this study used astutely available datasets to identify novel CHD genes. Although the basic insight regarding the identification of novel genes involved in CHD is very interesting, there are some weaknesses in the data reported in this study. Major concerns: - My main concern is related to diversity of CHD used in this study. The authors used CHD as a generic denomination. There is no detail regarding the type and class of CHD used in this study. For instance, conotruncal and non-conotruncal groups would be a good indicator to better classify these CHDs in order to improve the analysis. - I wonder if there is a bias in using these publicly available datasets since their integrative analysis revealed significant enrichment for the known CHD genes, which is mainly explained by the contribution of monoallelic CHD genes and syndromic CHD gene set. 
This should be better discussed and the authors should propose alternative analysis to identify non-syndromic CHD genes. - It is very surprising to see that one of the most frequent syndromes (22q11/DiGeorge syndrome) is not among the CNV deletion syndrome regions identified in this study. Particularly, the authors mentioned that the distribution of CNVs that overlap known microdeletion syndrome such as DiGeorge syndrome is overrepresented in the dataset. I wonder about the quality of the analysis and the filters applied to identify CHD genes. - The authors used bulk-RNAseq dataset to determine if the 21 significant genes are expressed in the heart at different developmental time points. However, there is no information regarding the region of the heart where these genes are expressed. This information could be useful in relation to the type of CHD they are associated with. Minor points: - S6 Table: It is not easy to find the 528 genes. The authors should highlight them better. - The authors should better explain why it is important to know that genes remain highly expressed in adulthood. - It is not clear why the authors have conducted a systems-level analysis to identify global mechanisms if there is no mechanism proposed in this study. Reviewer #3: The manuscript by Audain et al. performs a meta-analysis of copy number variants and de novo variants identified by exome sequencing from previously published studies. The case and control cohort sizes are large and the authors have established an extensive cohort. Using burden analyses the authors identified 21 genes associated with CHD including 14 previously known genes and 7 “novel” genes. In many ways this manuscript provides proof of principle that meta-analyses from a large number of studies, including single case studies, can be performed and platform results can be harmonized for both DNV and CNVs and this is an important contribution. 
However, the novel findings are limited and the goal of using integrative analyses to identify CHD disease genes in multifactorial inheritance is not accomplished here but it does provide a path forward. It would improve the impact of the manuscript to apply the expression and/or network analyses to a somewhat broader group of candidate genes. Major comments: 1. The study focuses on CNV deletions but it would be useful to perform analyses for duplications as well particularly for overlaps with the deletions as this would further support dosage sensitivity and mechanisms. 2. The control population is not described well. How carefully phenotyped are these controls? 3. Please clarify how genes were assigned to syndromic, non-syndromic, monoallelic, biallelic gene sets and other gene sets. What proportion of the genes are shared between lists and how is this accounted for in analyses? 4. Previous studies from some of the authors of this study have demonstrated enrichment of CNVs in specific subphenotypes of CHD. Additional information and analyses performed on subphenotypes would be useful to include as well as whether any previous findings can be expanded. 5. For TAA, previous studies from some of the authors indicate a higher burden of CNVs in familial TAA. What is the representation of familial vs sporadic in this meta-analysis and can any additional comments be made about CNV burden? 6. In Table 7 the method for identifying CNVs in controls needs explanation or better descriptors. Why are the CNV counts so high in well described genomic disorders such as 22q11.2/VCFS/DiGeorge (986 controls), 1p36 deletion syndrome (886 controls) or Cri du Chat (1488 controls). These values should also have established a MAF > .01 prompting filtering. More broadly, it is surprising that TBX1 was not identified in the burden analyses. 
Could the authors comment particularly as it was noted in the discussion that 22q11.2 cases were overrepresented and so TBX1 seemingly should have shown significant association by burden testing. 7. Some of the “novel” genes cause syndromes that do have well described CHD such as Wolf-Hirschhorn or ARID1B. The manuscript could be improved by a better discussion of modifiers or oligogenic mechanisms and reduced penetrance of CHD as the rule even in genetic syndromic conditions. 8. The data on TAA do not contribute significantly to the manuscript. 9. The manuscript is difficult to follow from an analysis standpoint at times. It would benefit from a flow diagram of analyses and/or including more of the methods within the manuscript. Minor: 1. The size cutoff of 5 kb and 20 Mb is relevant and should be moved to the methods of the main manuscript. 2. Fig. S4 – what does purple denote? 3. Fig. 5 - the x axis (Score) needs an explanation in the legend 4. Heading p. 12 CNV/DNVs hinder the function of specific protein complexes should be reworded since this has not been proven by this study. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: Yes: Kim L McBride Reviewer #2: No Reviewer #3: No

23 Apr 2021 Submitted filename: Response_to_Reviewers_PGEN.docx

24 May 2021

Dear Dr Hitz,

Thank you very much for submitting your Research Article entitled 'Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some concerns that we ask you address in a revised manuscript. We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer. In addition we ask that you: 1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. 2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. 
You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.
To resubmit, you will need to go to the link below and click 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK]
Please let us know if you have any questions while making these revisions.
Yours sincerely,
Anthony B Firulli, Associate Editor, PLOS Genetics
Gregory Barsh, Editor-in-Chief, PLOS Genetics
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #2: The manuscript by Audain et al. has been well revised. The authors have completed significant revisions based on my comments, and I find the current version of the manuscript considerably improved. The authors have replied to my main concern regarding the diagnosis of specific CHD types: they have performed a sub-analysis of the data based on specific groups of heart defects. They have also replied to my question concerning the possible bias of their analysis, since they used previously published datasets; the current version includes a discussion of this specific point. The authors have satisfactorily answered my question regarding the 22q11 locus. However, in the absence of detailed information about the spatiotemporal expression of the identified genes, the link between these genes, the different parts of the heart, and the types of CHD in which they are involved is not confirmed. The authors should have performed a better analysis to reinforce this specific issue.
Reviewer #3: The authors have been responsive to the reviewers’ comments within the limitations of available data. The addition of the CHD phenotype subclassifications is useful and the discussion/interpretation of the findings is appropriate. The requested points for clarification have mostly been addressed. I have only minor comments. I did not note in the original submission that CNVs were compiled for autosomes only. I would suggest that the discussion/limitations make more explicit the fact that X-linked candidate genes will not be identified by the integrative approach. The incorporation of the detailed methods is helpful. For the older platforms in the study, was the minimum CNV interval or the maximum CNV interval (< 20 Mb) used? Are platform differences the explanation for the persistent differences between cases and controls even after filtering (Fig. S5)?
**********
Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #2: Yes; Reviewer #3: Yes
**********
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: Yes: Stéphane ZAFFRAN; Reviewer #3: No
18 Jun 2021
Submitted filename: Response_to_Reviewers_PGEN.docx
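The CNV filtering criteria raised in the reviews (autosomal CNVs only, with 5 kb and 20 Mb size cutoffs) and the gene-wise comparison of deletion carriers in cases versus controls can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `CNV` record, the "del"/"dup" labels, inclusive size bounds, 1-based inclusive coordinates, and a one-sided label-permutation test with 10,000 permutations are all assumptions made for the example, and every function name is hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Size cutoffs as quoted in the reviews; treating both bounds as inclusive is an assumption.
MIN_SIZE_BP = 5_000
MAX_SIZE_BP = 20_000_000
AUTOSOMES = {str(n) for n in range(1, 23)}


def _norm_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so '7' and 'chr7' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


@dataclass(frozen=True)
class CNV:
    """Hypothetical CNV call record; coordinates are 1-based and inclusive."""
    sample: str
    chrom: str
    start: int
    end: int
    kind: str  # hypothetical labels: "del" or "dup"


def keep_deletion(cnv: CNV) -> bool:
    """Keep autosomal deletions whose length lies within the size cutoffs."""
    size = cnv.end - cnv.start + 1
    return (
        cnv.kind == "del"
        and _norm_chrom(cnv.chrom) in AUTOSOMES
        and MIN_SIZE_BP <= size <= MAX_SIZE_BP
    )


def carriers(cnvs: list[CNV], chrom: str, gene_start: int, gene_end: int) -> set[str]:
    """Samples with at least one retained deletion overlapping [gene_start, gene_end]."""
    target = _norm_chrom(chrom)
    return {
        c.sample
        for c in cnvs
        if keep_deletion(c)
        and _norm_chrom(c.chrom) == target
        and c.start <= gene_end
        and c.end >= gene_start
    }


def permutation_p(case_carriers: int, control_carriers: int, n_cases: int,
                  n_controls: int, n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value for an excess of deletion carriers among cases.

    Only carriers contribute to the statistic, so shuffling case/control labels
    is equivalent to drawing the carriers' labels without replacement from the
    pooled cohort (indices below n_cases are treated as cases).
    """
    rng = random.Random(seed)
    k = case_carriers + control_carriers
    pooled = range(n_cases + n_controls)
    hits = 0
    for _ in range(n_perm):
        permuted_cases = sum(1 for i in rng.sample(pooled, k) if i < n_cases)
        if permuted_cases >= case_carriers:
            hits += 1
    # The +1 terms keep the estimate away from zero (an assumed convention).
    return (hits + 1) / (n_perm + 1)
```

As an illustration, `permutation_p(12, 0, 7958, 14082)` combines the TAB2 carrier counts from Table 1 with the cohort sizes given in the abstract and returns a value on the order of 1e-4, the floor implied by 10,000 permutations. The sketch is not expected to reproduce the published P_cnv values exactly, because details such as any stratification by study or platform are not described here.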
23 Jun 2021 Dear Dr Hitz, We are pleased to inform you that your manuscript entitled "Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease" has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. 
If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.
Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!
Yours sincerely,
Anthony B Firulli, Associate Editor, PLOS Genetics
Gregory Barsh, Editor-in-Chief, PLOS Genetics
www.plosgenetics.org Twitter: @PLOSGenetics
----------------------------------------------------
Comments from the reviewers (if applicable):
----------------------------------------------------
Data Deposition
If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-20-01841R2 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.
Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.
----------------------------------------------------
Press Queries
If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.
26 Jul 2021
PGENETICS-D-20-01841R2
Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease
Dear Dr Hitz,
We are pleased to inform you that your manuscript entitled "Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Agota Szep PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics
Table 1

The 21 significant genes obtained by combining the CNV permutation-based test and the DNV rate-based test.

Cases/Controls: number of cases and controls carrying CNV deletions that overlap the gene in the CNV analysis. P_cnv: p-value from the CNV permutation test. nsDNV: number of constrained non-synonymous de novo variants in the DNV analysis. P_dnv: p-value from the DNV analysis. Significant: the analysis in which the gene was significant (dnv: DNV analysis; cnv: CNV analysis; both: both analyses; none: significant in neither the DNV nor the CNV analysis alone). metaP: combined p-value (P_cnv and P_dnv) using Fisher's method. P_ihw: Bonferroni-corrected p-value using independent hypothesis weighting (IHW) with the LOEUF metric as covariate. LOEUF: loss-of-function observed/expected upper bound fraction from gnomAD. *All 21 genes were significant after combining their p-values and applying Bonferroni correction. ¹Evidence is from mouse models [24,62].

Gene     Cases  Controls  P_cnv     nsDNV  P_dnv     Significant*  metaP     P_ihw     LOEUF  Known CHD
CHD7     6      1         6.80E-03  18     2.84E-26  dnv           1.25E-26  8.05E-23  0.076  Yes
KMT2D    0      0         1.00E+00  18     1.32E-25  dnv           7.67E-24  4.93E-20  0.103  Yes
NSD1     1      1         5.63E-01  12     1.00E-14  dnv           1.90E-13  2.14E-09  0.095  Yes
KMT2A    0      0         1.00E+00  7      1.00E-14  dnv           3.32E-13  1.86E-09  0.065  Yes
NOTCH1   10     24        1.00E+00  7      1.00E-14  dnv           3.32E-13  2.14E-09  0.097  Yes
TAB2     12     0         1.00E-04  5      3.46E-09  both          1.03E-11  5.75E-08  0.098  Yes
ANKRD11  13     0         1.00E-04  3      2.32E-05  cnv           4.85E-08  2.72E-04  0.107  Yes
WHSC1    11     0         1.00E-04  3      8.73E-05  cnv           1.71E-07  9.96E-04  0.119  No¹
ADNP     0      0         1.00E+00  4      9.94E-09  dnv           1.93E-07  1.13E-03  0.123  Yes
DYRK1A   4      0         1.43E-02  4      9.46E-07  dnv           2.59E-07  1.64E-03  0.214  Yes
NALCN    10     1         1.00E-04  3      1.76E-04  cnv           3.32E-07  6.83E-03  0.522  No
ELN      30     0         1.00E-04  2      1.77E-04  cnv           3.34E-07  7.50E-03  0.871  Yes
WAC      7      0         4.00E-04  3      1.31E-04  none          9.33E-07  5.44E-03  0.084  No
RBFOX2   1      0         3.45E-01  4      1.59E-07  dnv           9.72E-07  6.25E-03  0.194  Yes
KANSL1   94     110       2.00E-04  2      3.38E-04  none          1.19E-06  6.92E-03  0.238  Yes
MYO16    13     2         1.00E-04  2      8.38E-04  cnv           1.45E-06  1.74E-02  0.272  No
MED13L   2      1         2.70E-01  4      4.58E-07  dnv           2.09E-06  1.22E-02  0.064  Yes
KDM5B    0      0         1.00E+00  4      1.45E-07  dnv           2.43E-06  4.97E-02  0.572  No
GATA6    0      0         1.00E+00  5      1.80E-07  dnv           2.98E-06  1.92E-02  0.174  Yes
ARID1B   4      0         1.31E-02  4      1.48E-05  none          3.20E-06  1.87E-02  0.102  No
FEZ1     10     0         1.00E-04  2      2.22E-03  cnv           3.62E-06  4.30E-02  0.414  No
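The p-value columns of Table 1 can be illustrated with a short, self-contained sketch. metaP is described as Fisher's combination of P_cnv and P_dnv; with exactly two p-values the chi-square statistic has 4 degrees of freedom and its tail probability has a closed form, so no statistics library is needed. P_ihw is described as a Bonferroni correction with independent hypothesis weighting (IHW) and LOEUF as covariate; only the weighted Bonferroni step is shown, with the weights supplied by the caller, because how IHW learns them from LOEUF is not described here. For P_dnv the table only says "DNV rate-based test", so the Poisson upper-tail test below is an assumed, commonly used form of such a test, not necessarily the one the authors applied. All function names are hypothetical.

```python
from __future__ import annotations

import math


def fisher_combine(p1: float, p2: float) -> float:
    """Combine two independent p-values with Fisher's method.

    X = -2 * (ln p1 + ln p2) is chi-square with 4 degrees of freedom under the
    null. For 4 df the survival function is exp(-x/2) * (1 + x/2); substituting
    q = p1 * p2 gives the closed form q * (1 - ln q).
    """
    q = p1 * p2
    if q <= 0.0:
        return 0.0
    return q * (1.0 - math.log(q))


def weighted_bonferroni(pvalues: list[float], weights: list[float]) -> list[float]:
    """Weighted Bonferroni adjustment: min(1, m * p_i / w_i).

    The weights must be non-negative and average to 1 for family-wise error
    control. IHW would learn them from a covariate such as LOEUF; here they
    are simply supplied by the caller (an assumption of this sketch).
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("need exactly one weight per p-value")
    if any(w < 0.0 for w in weights) or not math.isclose(sum(weights), m):
        raise ValueError("weights must be non-negative and average to 1")
    # A zero weight means the hypothesis can never be rejected.
    return [1.0 if w == 0.0 else min(1.0, m * p / w) for p, w in zip(pvalues, weights)]


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam).

    Summed upward from k rather than computed as 1 - CDF, so that very small
    tail probabilities keep their precision.
    """
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    # pmf(k) on the log scale avoids overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
    return total
```

Applied to the WAC row (P_cnv = 4.00E-04, P_dnv = 1.31E-04), `fisher_combine` returns about 9.31E-07, in line with the tabulated metaP of 9.33E-07 once the rounding of the inputs to three significant figures is taken into account. For the DNV test, `lam` would be the expected number of qualifying de novo variants in the gene, for example 2 × (number of trios) × (per-haploid-genome mutation rate for those sites); that rate model is an assumption of this sketch.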
References (59 in total; 10 shown)

1. De novo mutations in NALCN cause a syndrome characterized by congenital contractures of the limbs and face, hypotonia, and developmental delay.

Authors: Jessica X Chong; Margaret J McMillin; Kathryn M Shively; Anita E Beck; Colby T Marvin; Jose R Armenteros; Kati J Buckingham; Naomi T Nkinsi; Evan A Boyle; Margaret N Berry; Maureen Bocian; Nicola Foulds; Maria Luisa Giovannucci Uzielli; Chad Haldeman-Englert; Raoul C M Hennekam; Paige Kaplan; Antonie D Kline; Catherine L Mercer; Malgorzata J M Nowaczyk; Jolien S Klein Wassink-Ruiter; Elizabeth W McPherson; Regina A Moreno; Angela E Scheuerle; Vandana Shashi; Cathy A Stevens; John C Carey; Arnaud Monteil; Philippe Lory; Holly K Tabor; Joshua D Smith; Jay Shendure; Deborah A Nickerson; Michael J Bamshad
Journal: Am J Hum Genet; Date: 2015-02-12

2. Wiedemann-Steiner syndrome as a major cause of syndromic intellectual disability: A study of 33 French cases.

Authors: S Baer; A Afenjar; T Smol; A Piton; B Gérard; Y Alembik; T Bienvenu; G Boursier; O Boute; C Colson; M-P Cordier; V Cormier-Daire; B Delobel; M Doco-Fenzy; B Duban-Bedu; M Fradin; D Geneviève; A Goldenberg; M Grelet; D Haye; D Heron; B Isidor; B Keren; D Lacombe; A-S Lèbre; G Lesca; A Masurel; M Mathieu-Dramard; C Nava; L Pasquier; A Petit; N Philip; J Piard; S Rondeau; P Saugier-Veber; S Sukno; J Thevenon; J Van-Gils; C Vincent-Delorme; M Willems; E Schaefer; G Morin
Journal: Clin Genet; Date: 2018-05-17

3. Jarid2 is among a set of genes differentially regulated by Nkx2.5 during outflow tract morphogenesis.

Authors: Jeremy L Barth; Christopher D Clark; Victor M Fresco; Ellen P Knoll; Benjamin Lee; W Scott Argraves; Kyu-Ho Lee
Journal: Dev Dyn; Date: 2010-07

4. Natural selection on genes that underlie human disease susceptibility.

Authors: Ran Blekhman; Orna Man; Leslie Herrmann; Adam R Boyko; Amit Indap; Carolin Kosiol; Carlos D Bustamante; Kosuke M Teshima; Molly Przeworski
Journal: Curr Biol; Date: 2008-06-24

5. A histone H3 lysine 36 trimethyltransferase links Nkx2-5 to Wolf-Hirschhorn syndrome.

Authors: Keisuke Nimura; Kiyoe Ura; Hidetaka Shiratori; Masato Ikawa; Masaru Okabe; Robert J Schwartz; Yasufumi Kaneda
Journal: Nature; Date: 2009-05-31

6. Congenital heart defects in Kabuki syndrome.

Authors: Shi-Min Yuan
Journal: Cardiol J; Date: 2013

7. The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability.

Authors: Tarjinder Singh; James T R Walters; Mandy Johnstone; David Curtis; Jaana Suvisaari; Minna Torniainen; Elliott Rees; Conrad Iyegbe; Douglas Blackwood; Andrew M McIntosh; Georg Kirov; Daniel Geschwind; Robin M Murray; Marta Di Forti; Elvira Bramon; Michael Gandal; Christina M Hultman; Pamela Sklar; Aarno Palotie; Patrick F Sullivan; Michael C O'Donovan; Michael J Owen; Jeffrey C Barrett
Journal: Nat Genet; Date: 2017-06-26

8. An informatics approach to analyzing the incidentalome.

Authors: Jonathan S Berg; Michael Adams; Nassib Nassar; Chris Bizon; Kristy Lee; Charles P Schmitt; Kirk C Wilhelmsen; James P Evans
Journal: Genet Med; Date: 2012-09-20

9. HIRA Is Required for Heart Development and Directly Regulates Tnni2 and Tnnt3.

Authors: Daniel Dilg; Rasha Noureldin M Saleh; Sarah Elizabeth Lee Phelps; Yoann Rose; Laurent Dupays; Cian Murphy; Timothy Mohun; Robert H Anderson; Peter J Scambler; Ariane L A Chapgier
Journal: PLoS One; Date: 2016-08-12

10. VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions.

Authors: Peter J Freeman; Reece K Hart; Liam J Gretton; Anthony J Brookes; Raymond Dalgleish
Journal: Hum Mutat; Date: 2017-10-17
