
Genome-Wide Association Study to Find Modifiers for Tetralogy of Fallot in the 22q11.2 Deletion Syndrome Identifies Variants in the GPR98 Locus on 5q14.3.

Tingwei Guo1, Gabriela M Repetto1, Donna M McDonald McGinn1, Jonathan H Chung1, Hiroko Nomaru1, Christopher L Campbell1, Anna Blonska1, Anne S Bassett1, Eva W C Chow1, Elisabeth E Mlynarski1, Ann Swillen1, Joris Vermeesch1, Koen Devriendt1, Doron Gothelf1, Miri Carmel1, Elena Michaelovsky1, Maude Schneider1, Stephan Eliez1, Stylianos E Antonarakis1, Karlene Coleman1, Aoy Tomita-Mitchell1, Michael E Mitchell1, M Cristina Digilio1, Bruno Dallapiccola1, Bruno Marino1, Nicole Philip1, Tiffany Busa1, Leila Kushan-Wells1, Carrie E Bearden1, Małgorzata Piotrowicz1, Wanda Hawuła1, Amy E Roberts1, Flora Tassone1, Tony J Simon1, Esther D A van Duin1, Thérèse A van Amelsvoort1, Wendy R Kates1, Elaine Zackai1, H Richard Johnston1, David J Cutler1, A J Agopian1, Elizabeth Goldmuntz1, Laura E Mitchell1, Tao Wang1, Beverly S Emanuel1, Bernice E Morrow2.   

Abstract

BACKGROUND: The 22q11.2 deletion syndrome (22q11.2DS; DiGeorge syndrome/velocardiofacial syndrome) occurs in 1 of 4000 live births, and 60% to 70% of affected individuals have congenital heart disease, ranging from mild to severe. In our cohort of 1472 subjects with 22q11.2DS, a total of 62% (n=906) have congenital heart disease and 36% (n=326) of these have tetralogy of Fallot (TOF), comprising the largest subset of severe congenital heart disease in the cohort.
METHODS AND RESULTS: To identify common genetic variants associated with TOF in individuals with 22q11.2DS, we performed a genome-wide association study using Affymetrix 6.0 array and imputed genotype data. In our cohort, TOF was significantly associated with a genotyped single-nucleotide polymorphism (rs12519770, P=2.98×10−8) in an intron of the adhesion GPR98 (G-protein-coupled receptor V1) gene on chromosome 5q14.3. There was also suggestive evidence of association between TOF and several additional single-nucleotide polymorphisms in this region. Some genome-wide significant loci in introns or noncoding regions could affect regulation of genes nearby or at a distance. On the basis of this possibility, we examined existing Hi-C chromatin conformation data to identify genes that might be under shared transcriptional regulation within the region on 5q14.3. There are 6 genes in a topologically associated domain of chromatin with GPR98, including MEF2C (Myocyte-specific enhancer factor 2C). MEF2C is the only gene that is known to affect heart development in mammals and might be of interest with respect to 22q11.2DS.
CONCLUSIONS: In conclusion, common variants may contribute to TOF in 22q11.2DS and may function in cardiac outflow tract development.
© 2017 The Authors.

Keywords:  DiGeorge syndrome; chromosomes; genotype; velo-cardio-facial syndrome; tetralogy of Fallot

Year:  2017        PMID: 29025761      PMCID: PMC5647121          DOI: 10.1161/CIRCGENETICS.116.001690

Source DB:  PubMed          Journal:  Circ Cardiovasc Genet        ISSN: 1942-3268


One of the greatest challenges in the area of human genetics is to understand the basis of phenotypic heterogeneity in known diseases. The 22q11.2 deletion syndrome (22q11.2DS; velocardiofacial syndrome/DiGeorge syndrome; Mendelian Inheritance in Man No. 192430, 188400) is one of the most common genomic disorders, occurring in 1 of 4000 live births.[1] Over 90% of affected individuals have a de novo, hemizygous 3 million base pair (Mb) deletion on chromosome 22q11.2.[2-4] All subjects with the deletion have features of the syndrome, but the clinical presentation is quite variable. For example, 60% to 70% of patients have congenital heart disease (CHD) involving the cardiac outflow tract (OFT) and aortic arch, whereas the rest have apparently normal cardiac structures.[5] Among the most serious defects observed in individuals with 22q11.2DS is tetralogy of Fallot (TOF), which is defined by the presence of a ventricular septal defect, pulmonary stenosis, overriding aorta, and right ventricular hypertrophy. TOF is caused in part by failed migration or differentiation of second heart field mesodermal cells from the pharyngeal apparatus in embryos, needed to form or remodel the cardiac OFT.[6] TOF occurs in 1 of 2500 live births in the general population (Centers for Disease Control and Prevention). Among individuals with 22q11.2DS and CHD in our cohort, 36% have TOF, and among individuals with TOF, ≈15% have a 22q11.2 deletion.[7,8]
Among the genes in the deleted region on 22q11.2, TBX1, which encodes a T-box transcription factor, is the major candidate for CHD.[9-11] Tbx1 is expressed in the second heart field mesoderm, which is disrupted in 22q11.2DS.[9-11] Global inactivation[9-11] or second heart field–specific inactivation[12] of Tbx1 results in neonatal lethality with severe cardiac OFT defects. 
One hypothesis to explain variable phenotypic expression in the 22q11.2DS is the presence of pathogenic variants in TBX1 on the haploid allele of 22q11.2. Previously, we tested whether common or rare single-nucleotide variants (SNVs) in the coding region of TBX1 on the remaining allele of 22q11.2 were associated with CHD in 22q11.2DS subjects, but we did not find an association.[13] Another hypothesis is that there are copy number variations elsewhere in the genome that could explain differences in phenotypes. We previously found that a commonly occurring genomic duplication encompassing the glucose transporter gene, SLC2A3, was associated with CHD (P=2.68×10−4).[14] This copy number variation occurred in 5.8% of individuals with 22q11.2DS and CHD and 1.1% of those with 22q11.2DS and no CHD. Recently, a partial duplication of a chromatin modifier, KANSL1, was associated with CHD in a Chilean 22q11.2DS cohort.[15] However, these copy number variations occurred in only some deleted subjects with CHD and thus do not explain the basis of phenotypic variability in the majority of patients. Our goal was to identify common single-nucleotide polymorphisms (SNPs) that are associated with TOF in individuals with 22q11.2DS. We restricted our analyses to TOF because it is the largest single phenotypic category of severe CHD in our cohort. Restricting our analyses in this way may reduce heterogeneity in the genes that contribute to CHD in individuals with 22q11.2DS and thus may increase the power of a GWAS.

Methods

Human Subjects and Phenotype Data

We assembled a cohort of subjects with 22q11.2DS (Tables I and II in the Data Supplement). Subjects were previously recruited by the International Chromosome 22q11.2DS Consortium, the International 22q11.2 Brain Behavior Consortium (http://22q11-ibbc.org), and clinical groups that specialize in the treatment of individuals with 22q11.2DS. All subjects within the cohort had a clinical diagnosis of 22q11.2DS that was confirmed by the presence of a 22q11.2 deletion using fluorescence in situ hybridization or multiplex ligation-dependent probe amplification (SALSA MLPA kit P250 DiGeorge; MRC Holland, The Netherlands). Informed consent was obtained for all participants, and this study was conducted under an Internal Review Board-approved protocol at the Albert Einstein College of Medicine (CCI 1999-201). For this study, we used previously collected genomic DNA and phenotypic and demographic information. We obtained echocardiogram and cardiology reports to confirm the specific CHD diagnosis (eg, TOF).

SNP Array Genotype and Data Quality Control

Genomic DNA from 1244 study subjects was array genotyped using Affymetrix GeneChip Genome-Wide SNP 6.0 array. The majority of samples were genotyped at the Genomics Facility core laboratory of Albert Einstein College of Medicine. However, 37 samples were genotyped in the Advanced Genomics laboratory core at the Children’s Research Institute (Milwaukee, WI) for clinical purposes,[16] and 191 Chilean samples were genotyped in the Center for Human Genetics, Clínica Alemana Universidad del Desarrollo, Santiago, Chile.[15] The raw data from all arrays were processed through the same pipeline using the same criteria. Genotype data from arrays with contrast quality control scores ≤0.4 per sample, contrast quality control <1.7 per batch, and Median Absolute Pairwise Difference metric >0.35 were excluded. Genotypes were called using the Birdseed V2 Genotyping Algorithm (call rate: 99.02±0.02%). To account for batch effects, BEAGLECALL Version 1.0.1 software was used to rescore genotypes.[17] SNPs with call rates <95%, minor allele frequency <1%, or Hardy–Weinberg equilibrium P value <10−5 were excluded. In addition, samples that showed second-degree relatedness or closer, based on identity by state, were removed. For each subject, deletion size was determined using the log2 intensity ratio as estimated by the Copy Number Analysis Module of Golden Helix Powerseat Package. The CEL files and genotype data are being deposited to National Center for Biotechnology Information database of Genotypes and Phenotypes phs001339.v1.p1. We performed imputation to increase the number of SNPs available for analysis. Only genotyped SNPs with minor allele frequency >1% were used for imputation. Haplotypes were prephased using SHAPEIT software,[18,19] and imputation was performed using IMPUTE2 with the 1000 Genomes Phase I data set as the reference panel.[20] Imputed SNPs with minor allele frequency ≤1% or imputation quality (INFO) scores ≤0.8 were excluded from the GWAS.
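The SNP-level exclusion criteria described above (call rate <95%, minor allele frequency <1%, Hardy–Weinberg equilibrium P value <10−5) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names and the genotype-count inputs are hypothetical, and the Hardy–Weinberg test here is a simple 1-degree-of-freedom chi-square approximation chosen for brevity, whereas the text does not state which test was applied.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Approximate Hardy-Weinberg P value (chi-square, 1 df) from genotype counts.

    Assumption: a chi-square goodness-of-fit test; the study may have used
    a different (e.g. exact) test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  min_call_rate: float = 0.95,
                  min_maf: float = 0.01,
                  min_hwe_p: float = 1e-5) -> bool:
    """Return True if a SNP survives the three exclusion thresholds in the text."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0 or total == 0:
        return False
    call_rate = called / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For example, `passes_snp_qc(250, 500, 250, 0)` keeps a common SNP in perfect Hardy–Weinberg proportions, whereas a SNP with no heterozygotes at all, such as `passes_snp_qc(500, 0, 500, 0)`, is rejected by the equilibrium filter.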

Statistical Methods

We used a case–control approach, in which individuals with 22q11.2DS and TOF were considered cases and individuals with 22q11.2DS without CHD were considered controls. We conducted principal component analyses to identify the principal components (PCs) of race/ethnicity. Potential associations between TOF, sex, and deletion size were assessed using logistic regression adjusted for the first 4 PCs. A P value <0.05 was considered significant. The association between TOF and each SNP was assessed by logistic regression analysis under an additive genetic model using data from all study subjects. These analyses were performed using SNPTEST v2.5.2 (https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html) and accounted for the genotyping accuracy and first 4 PCs of race/ethnicity.[21,22] A P value of 5×10−8 was used as the genome-wide significance cutoff for single association tests. For a meta-analysis, the cohort was split into groups determined by principal component analyses. Each group was analyzed separately using logistic regression, and the results were meta-analyzed using the inverse-variance method. Power for these analyses was assessed using QUANTO (http://biostats.usc.edu/Quanto.html). Manhattan plots and quantile–quantile (Q–Q) plots were generated using Golden Helix Powerseat. For regions of interest identified in the GWAS, regional association plots were generated using LocusZoom software (http://locuszoom.sph.umich.edu/locuszoom/).[23] Conditional logistic regression analyses were performed to determine whether multiple variants within a region are independently associated with TOF in individuals with 22q11.2DS. Specifically, within a region, we conditioned on the genotyped SNP with the lowest P value and the first 4 principal components of race/ethnicity and individually evaluated the association of TOF with each additional SNP in the region. Conditional analyses were conducted using SNPTEST v2.5.2.
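The inverse-variance method mentioned above combines per-group log odds ratios by weighting each estimate by the reciprocal of its squared standard error. The following is a minimal fixed-effect sketch of that calculation; the function name and the example inputs are hypothetical and are not values from this study.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-group log odds ratios.

    betas: per-group effect estimates (log odds ratios).
    ses:   matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical per-group estimates, for illustration only.
beta, se, z, p = inverse_variance_meta([0.50, 0.55, 0.40], [0.12, 0.30, 0.45])
pooled_or = math.exp(beta)  # back-transform to an odds ratio
```

Groups with small standard errors (typically the largest ancestry group) dominate the pooled estimate, which is why a signal driven by a single small stratum tends to be attenuated rather than amplified by this approach.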

Linkage Disequilibrium Analysis of Whole-Genome Sequence to Identify Variants in Linkage Disequilibrium With GWAS Findings

Whole-genome sequencing of a subset (n=397) of our 22q11.2DS samples was performed using the Illumina HiSeq2000 and HiSeq X Ten platforms at Hudson Alpha Institute for Biotechnology (Huntsville, Alabama) as part of the International 22q11.2 Brain and Behavior Consortium to find genes for schizophrenia. Variant calling was performed using PEMapper software for read mapping to the hg38 (GRCh38) reference genome and PECaller software for variant calling.[24] CrossMap (http://crossmap.sourceforge.net/) was used to convert genome coordinates between hg38 (GRCh38) and hg19 (GRCh37).[25] To follow up on the top result from the GWAS (Results), the genomic region (chromosome 5 [chr5]:88 703 723–91 409 593) from MEF2C through ARRDC3 was extracted from the whole-genome sequencing data. Functional annotation of SNVs was performed using the Variant Classification and the Annotate and Filter tools in the Golden Helix software. Nonexonic SNVs (intronic, intergenic) were removed, and predicted functional SNVs were used to generate a linkage disequilibrium (LD) matrix using Haploview. LD measurements of r2 >0.8 were used to define LD haplotype blocks.
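For readers less familiar with the two LD measures used in this article (r2 and D′), the sketch below computes both from haplotype frequencies for a pair of biallelic variants. It assumes that phased haplotype frequencies are already known; Haploview estimates them from genotype data, and that estimation step is not reproduced here. The function name and the example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (r2, d_prime) for two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A at variant 1
          and allele B at variant 2.
    p_a:  frequency of allele A at variant 1.
    p_b:  frequency of allele B at variant 2.
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both variants must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical example: a rare allele (3%) that always occurs on the
# background of a common allele (50%).
r2, d_prime = ld_stats(p_ab=0.03, p_a=0.50, p_b=0.03)
```

In this hypothetical case D′ is 1 while r2 is only about 0.03, which illustrates why a rare variant can show high D′ but very low r2 with a common variant: r2 is bounded by the difference in allele frequencies, whereas D′ is not. The r2 >0.8 threshold used for haplotype blocks therefore groups only variants of similar frequency.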

Mouse Embryo Analysis and Whole-Mount RNA In Situ Hybridization

Gene expression profiling was previously performed to identify differentially expressed genes in the second heart field mesoderm of wild-type mouse embryos.[12] Data from wild-type embryos were extracted for evaluation of specific expression levels in this tissue. For in situ hybridization, RNA probes were generated from mouse embryo cDNA using digoxigenin-uridine triphosphate (Roche Diagnostic Corp, Indianapolis, IN; Table III in the Data Supplement). Swiss Webster strain wild-type mouse embryos were isolated at day (E)9.5 or E10.5 and used for experiments.

Results

Description of the 22q11.2DS Population

A total of 1472 unrelated subjects with 22q11.2DS were ascertained through multiple sources (Tables I and II in the Data Supplement) and were genotyped on Affymetrix 6.0 arrays. Genotypes (n≈6.6 million SNPs with minor allele frequency >0.01 and INFO>0.8) were also imputed. The characteristics of the study subjects are shown in Table 1. These subjects predominantly self-reported as white and non-Hispanic (Table 1). This was confirmed by principal component analyses using 4 PCs (we did not observe a difference between usage of 4 or 10 PCs) to estimate genetic ancestry (Figure I in the Data Supplement). The majority (191/217) of Hispanic subjects were recruited at the collection site in Santiago, Chile. Approximately 62% of the subjects (n=906) had CHD at birth. There are 4 sets of low copy repeats, termed LCR22A, B, C, and D, that span the 22q11.2 region. All of the subjects had a deletion of 1 allele of TBX1, which is located in the LCR22A-B interval, and the majority of subjects had the typical 3 Mb deletion flanked by LCR22A-D (Table 1; Table IV in the Data Supplement). In this cohort, neither sex nor deletion size was significantly associated with CHD in general or with TOF specifically (P>0.05).
Table 1.

Characteristics of Study Subjects


GWAS to Identify Genetic Loci for TOF

The TOF phenotype comprised the largest individual group of subjects with severe intracardiac anomalies in our 22q11.2DS population (36%; Table 1; Figure 1). We conducted a GWAS to identify genetic variants associated with TOF. This analysis was based on data from 326 subjects with 22q11.2DS and TOF and 566 subjects with 22q11.2DS and normal cardiac anatomy. This study had a power of 80% to detect an odds ratio of >1.9 for a common SNP with an allele frequency >0.3 under a log-additive model at P<5×10−8.
Figure 1.

Distribution of cardiovascular phenotypes in 1472 subjects with 22q11.2 deletion syndrome (22q11.2DS). The number of subjects (y axis) sorted into phenotypes (x axis) is shown in the bar graph. All individuals have a hemizygous 22q11.2 deletion. The most serious cardiovascular diagnosis with the largest number of subjects is tetralogy of Fallot (TOF; n=326; black bar) among a total with congenital heart disease (CHD; n=906; gray bar) when compared with those with no intracardiac or aortic arch anomalies as detected by echocardiogram summary and cardiology report (white bar).

Subjects of all races and ethnicities were included, and associations were assessed using logistic regression adjusted with the first 4 PCs of race/ethnicity. As neither deletion size nor sex was significantly associated with TOF in this cohort, these variables were not included in the logistic models. The genomic inflation factor (λ=1.02) and the Q–Q plot (Figure II in the Data Supplement) provided little evidence of a systematic deviation from the expected distribution of the test statistic. Three SNPs mapping to intron 61 of GPR98 (G-protein–coupled receptor 98) were significantly associated with TOF. The genotyped SNP, rs12519770 (P=2.98×10−8), and imputed SNPs, rs7720206 (P=2.22×10−8) and chr5: 90 067 043:D (P=2.10×10−8), showed the strongest association (Figure 2A, Table 2). These three SNPs seem to be in complete LD (Figure 2B and 2C). For rs12519770, the A allele was the risk allele with a frequency of 0.58 in TOF cases and 0.45 in controls, conferring an odds ratio of 1.69 (P=3.2×10−8) per copy of the A allele in the 22q11.2DS cohort (Table 2; Figure 2). There was also suggestive evidence of association between TOF and 2 additional groups of SNPs in GPR98. The top genotyped SNP in each cluster (rs6889138, rs6893710) is listed in Table 2 and illustrated in Figure 2B and 2C.
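As a quick plausibility check on the reported per-allele effect, a crude allelic odds ratio can be computed directly from the risk allele frequencies in cases and controls. The sketch below is only that: an unadjusted calculation with a hypothetical function name. The odds ratios in this article come from logistic regression adjusted for the first 4 PCs, so the crude value need not match them exactly.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude (unadjusted) allelic odds ratio from risk allele frequencies."""
    if not (0.0 < freq_cases < 1.0 and 0.0 < freq_controls < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Risk allele (A) frequencies reported in the text for rs12519770.
crude_or = allelic_odds_ratio(0.58, 0.45)
```

With the frequencies given for rs12519770 (0.58 in cases, 0.45 in controls), the crude value rounds to 1.69, in line with the adjusted estimate reported in the text.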
Figure 2.

Genome-wide association results for tetralogy of Fallot (TOF) in 22q11.2 deletion syndrome (22q11.2DS). A, Values in the Manhattan plot for TOF vs controls were plotted against their respective positions on the autosomal chromosomes. The red line represents the genome-wide significance threshold (P=5×10−8). The blue line represents the threshold for suggestive association (P=1×10−5). A single locus marked by the GPR98 (G-protein–coupled receptor V1) gene reached genome-wide significance. B, LD matrix of selected, predicted functional SNPs with top P values on the 5q14.3 region from WGS (Methods). The LD with respect to associated single-nucleotide polymorphisms (SNPs) with highest P values in the region is shown. The LD plot is based on r2 values. Key: r2=0 is given in white […]. C, LocusZoom plot of region of association at rs12519770 on 5q14.3 indicating −log10 P values (y axis) against the chromosomal positions of SNPs (x axis). The genotyped SNP with the strongest association signal in each locus is represented as a purple diamond; the other SNPs are colored according to the extent of LD (correlation r2 is based on CEU HapMap haplotypes) with this SNP. Estimated recombination rates (GRCh37/hg19, CEU; 1000 Genomes Project 2012) are shown as light blue lines. Genes are indicated below the LocusZoom plot. D, Fine mapping of whole-genome sequencing (WGS; MAF>0.05) based on LD and imputed SNPs from the arrays was used to narrow the TOF signal to a 104.7 kb region (chromosome 5 [chr5]: 90 057 563–90 162 285) as shown. Most of the common SNPs from this region have P values <10−5. ARRDC3 indicates Arrestin domain containing 3; CETN3, Centrin 3; CEU, Northern Europeans who are Utah residents part of the CEPH collection; GPR98, G-protein–coupled receptor V1; LYSMD3, LysM, putative peptidoglycan-binding, domain containing 3; MBLAC2, Metallo-β-lactamase domain containing 2; MEF2C, Myocyte-specific enhancer factor 2C; and POLR3G, RNA polymerase III subunit G.

Table 2.

Top SNPs in GPR98 Associated With TOF

[…] rs12519770. The top genotyped SNP, rs6889138, is located in intron 74 of GPR98; its G allele had a minor allele frequency (MAF) of 0.30 in TOF cases and 0.21 in controls, giving an odds ratio (OR) of 1.68 (P=1.72×10−7) per copy in the 22q11.2DS cohort. There were 14 SNPs in the second group that had suggestive association with TOF and were in modest LD with SNP, rs12519770 (D′=0.84, r2=0.02). The top SNP, rs6893710, is located in intron 47 of GPR98 and had a MAF of 0.058 in TOF cases but 0.015 in controls, giving an OR of 4.05 (P=1.04×10−6) per copy of the C allele in the 22q11.2DS cohort (Table 2).
Although our initial GWAS adjusted for the first 4 PCs of race/ethnicity, the observed associations may still reflect bias because of uncontrolled confounding resulting from population stratification. Consequently, we repeated our analyses for the top SNP, rs12519770, after separating the cohort into 3 groups: white, Admixed, and African, as determined by principal component analyses (Figure I in the Data Supplement). The P value for this SNP was significant in the meta-analysis (P=4.43×10−8; Table V in the Data Supplement), suggesting that the observed association is unlikely to be the result of population stratification. Within the 5q14.3 region, there seemed to be 3 clusters of SNPs that were associated with TOF. We refer to these as clusters 1, 2, and 3, and the clusters are ranked in ascending order based on the P value for the top SNP within the cluster. To determine whether >1 variant was independently associated with TOF, we performed conditional analyses in which we conditioned on the genotyped SNP in GPR98 with the smallest P value in cluster 1 (rs12519770; P=2.98×10−8) and individually evaluated the association of TOF with each of the additional SNPs in the 5q14.3 region (n=1344 SNPs, Table VI in the Data Supplement).
In these conditional analyses, there was suggestive evidence for association with 1 SNP (rs6893710, P=3.92×10−5). This variant was the top SNP in the third cluster of associated genes (Table 2). The association of the top SNP in the second cluster (rs6889138) was attenuated in the conditional analysis (P=0.002).

Definition of the 5q14.3 Locus

To identify nonsynonymous variants that may be in LD with rs12519770 and to narrow the region containing the association signal on 5q14.3 based on LD, we performed an LD analysis using existing whole-genome sequence data from 397 individuals with 22q11.2DS (http://22q11-ibbc.org; unpublished data, International 22q11.2 Brain and Behavior Consortium authors in Supplementary Table 1, 2017). These individuals comprise a subset of the samples genotyped on Affymetrix 6.0 arrays and were selected based on psychiatric but not cardiovascular phenotype. There were 9680 SNVs identified in the 2.7 Mb region around GPR98 (chr5: 8 799 640–90 704 983). There were 161 coding SNVs (102 nonsynonymous, 58 synonymous, and 1 splicing) and 115 SNVs in the 3′ or 5′-untranslated regions for a total of 276 SNVs (Table VII in the Data Supplement). None of these SNVs were in LD with rs12519770. The SNP, rs6893710, was in weak LD with the synonymous variant, rs41304884 (GPR98, NM_032119.3, c.16164G>A; D′=0.859, r2=0.389; Figure 2B). There were no nonsynonymous variants related to our association signal. Most GWAS signals that have been previously discovered are in intergenic regions and may mark transcriptional regulatory regions in the genome rather than genes themselves. To test this for TOF in 22q11.2DS, we examined the LD pattern from available whole-genome sequencing data on the same 397 22q11.2DS subjects to narrow the interval with SNPs showing the strongest association. We narrowed down the association signal to a 104.7 kb region on chromosome 5 (chr5: 90 057 563–90 162 285; Figure 2C and 2D, red block). Most of the common SNPs with P values <10−5 were located in this region (Figure 2C). A similar LD pattern has been observed in the white subset from the 1000 Genomes Project (Figure III in the Data Supplement). Thus, we were able to narrow the region of the association signal.
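The size of the narrowed interval follows directly from its coordinates. The short sketch below parses a region string written in the spaced style used in this article and computes its length, assuming 1-based inclusive coordinates (an assumption; the text does not state the convention). The function names are hypothetical.

```python
import re


def parse_region(text: str):
    """Parse a region such as 'chr5: 90 057 563–90 162 285' into (chrom, start, end)."""
    # Drop all whitespace and thousands separators before matching.
    cleaned = re.sub(r"[\s,]", "", text)
    match = re.fullmatch(r"(chr\w+):(\d+)[–-](\d+)", cleaned)
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("region end precedes start")
    return chrom, start, end


def interval_length_bp(start: int, end: int) -> int:
    """Length in base pairs, assuming 1-based inclusive coordinates."""
    return end - start + 1


chrom, start, end = parse_region("chr5: 90 057 563–90 162 285")
length_kb = interval_length_bp(start, end) / 1000.0
```

For the interval quoted above this gives 104 723 bp, that is, about 104.7 kb, consistent with the size stated in the text.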

Genes Mapping to 5q14.3

Because the SNPs found within the intron of GPR98 might affect its regulation or, instead, the regulation of other genes in the region, we examined local chromosome conformation forming topologically associated domains (TADs).[26-28] To investigate higher-order chromatin-mediated looping, we extracted data for the 5q14.3 interval from the Hi-C browser.[29] There was chromatin interaction data for 28 different cell lines ranging from H1 embryonic stem cells to cancer cell lines.[29-31] We focused on TAD contact domains in a 3.3 Mb region including GPR98 (Figure 3; Figure IV in the Data Supplement). Because the 104.7 kb region with genetic association is within the GPR98 locus, it is possible that variants might affect its expression or that of nearby genes. In addition to GPR98, this region includes 6 additional genes (Figure 3; Figure IV in the Data Supplement), including 5 protein coding genes within the 2.3 Mb TAD: MEF2C (Myocyte enhancer factor 2C), CETN3 (Centrin 3), MBLAC2 (Metallo-β-lactamase domain containing 2), POLR3G (RNA polymerase III subunit G), and LYSMD3 (LysM, putative peptidoglycan-binding, domain containing 3). One gene, ARRDC3 (Arrestin domain containing 3), maps downstream of GPR98, but it is in a different TAD (Figure 3). The same domain structure marked by TAD triangles occurred in most of the cell lines that were examined (Figure IV in the Data Supplement).
Figure 3.

Representative heatmap of chromatin conformation for the MEF2C (Myocyte-specific enhancer factor 2C)-GPR98 (G-protein–coupled receptor V1) interval and in situ hybridization of selected genes in mouse embryos. A, Chromatin conformation map extracted from Hi-C data for the human lymphoblastoid cell line, GM12878, from the HapMap sample set.[29] Similar results were found using other cell types (Figure III in the Data Supplement). The intensity of each pixel represents the normalized number of contacts between a pair of loci. The purple line marking the triangle depicts the contact domain[31] between the GPR98 and the MEF2C locus. B, Lateral views of E9.5 and E10.5 wild-type embryos following in situ hybridization of antisense probes for Gpr98, Cetn3 (Centrin 3), Mef2c, and Lysmd3 (LysM, putative peptidoglycan-binding, domain containing 3). Purple color indicates mRNA expression. C, Bar graph of relative gene expression levels from Affymetrix microarrays for the microdissected distal pharyngeal apparatus from E9.5 stage mouse embryos (y axis). Individual genes are shown for comparison, including representative housekeeping genes with high expression (Actb, Gapdh), genes on chromosome 5q14.3 (Cetn3, Arrdc3, Mef2c, Polr3g, Lysmd3, Mblac2, and Gpr98) and representative low expressing genes (Il6, Olfr299), in descending order according to expression. A indicates anterior; Arrdc3 indicates Arrestin domain containing 3; Cetn3, Centrin 3; FB, forebrain; H, heart; HB, hindbrain; Lysmd3, LysM, putative peptidoglycan-binding, domain containing 3; Mblac2, Metallo-β-lactamase domain containing 2; OFT, cardiac outflow tract; P, posterior; Polr3g, RNA polymerase III subunit G; and S, somites.

We next determined whether any of the genes are expressed in the pharyngeal apparatus or heart in embryos. 
Probes were generated, and in situ hybridization was successfully performed for Gpr98, Cetn3, Lysmd3, and Mef2c in mouse embryos at E9.5 and E10.5, when the cardiac OFT is expanding (Figure 3). Although Gpr98 is weakly expressed at E9.5, it is strongly expressed in the neural tube region, particularly the hindbrain[32] (Figure 3). Cetn3, important in the cilia[33] for centrosome reproduction,[34] and Lysmd3, of unknown function, are ubiquitously expressed, although Cetn3 has lower expression levels in the heart itself (Figure 3). Mef2c encodes a MADS box transcription factor, and it is expressed in the pharyngeal apparatus including the second heart field mesoderm,[35] as indicated in Figure 3. Among the genes in the TAD, MEF2C is the only one specifically expressed in cardiac progenitor cells known to be required for heart development.[36] The second heart field mesodermal progenitor cell populations forming the cardiac OFT lie within the distal pharyngeal apparatus. We then examined expression levels of genes in the 5q14.3 region in existing Affymetrix microarray data from the microdissected distal pharyngeal apparatus.[12] The purpose was to determine whether any of the genes are expressed in this critical tissue. We compared expression levels of genes on 5q14.3 to the highest expressed genes (Actb, Gapdh), Tbx1 expression, and the lowest expressed genes (Il6, Olfr299) in this tissue as shown in Figure 3C. All of the genes are expressed in the pharyngeal apparatus, albeit Gpr98 is expressed at the lowest level, as can be also seen in Figure 3B. The Encyclopedia of DNA Elements functional genomics data were examined in the 104.7 kb region of LD with SNP, rs12519770, to identify possible regulatory regions. We found 3 possible regulatory regions defined by binding of multiple transcription factors, with one close to rs12519770 (Figure V in the Data Supplement). 
None of the SNPs that were genotyped in our study lie within these putative regulatory regions; however, critical embryonic regulatory regions could be different, and they have not yet been defined.
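
The check described above, whether any genotyped SNP falls inside a candidate regulatory region, amounts to an interval lookup. The following is a minimal sketch in Python, not the procedure used in the study. The function name `snps_in_regions`, the region names, and all coordinates are hypothetical placeholders and are not values from this study or from the Encyclopedia of DNA Elements.

```python
from typing import Dict, List, Tuple

# (name, start, end): 1-based inclusive coordinates on a single chromosome.
Region = Tuple[str, int, int]


def snps_in_regions(snps: Dict[str, int], regions: List[Region]) -> Dict[str, List[str]]:
    """Map each region name to the SNP IDs whose position lies inside it."""
    hits: Dict[str, List[str]] = {name: [] for name, _, _ in regions}
    for snp_id, pos in snps.items():
        for name, start, end in regions:
            if start <= pos <= end:
                hits[name].append(snp_id)
    return hits


# Placeholder coordinates for illustration only (assumed, not real data).
example_regions = [("region_1", 1000, 2000), ("region_2", 5000, 6500), ("region_3", 9000, 9800)]
example_snps = {"snp_a": 500, "snp_b": 2500, "snp_c": 7000}
example_hits = snps_in_regions(example_snps, example_regions)
```

With these placeholder inputs every region maps to an empty list, which mirrors the situation in which no genotyped SNP lies within a putative regulatory region.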

Discussion

We identified genome-wide significant associations between TOF and several SNPs in an intron of GPR98 in our 22q11.2DS cohort. We narrowed the associated region to a 104.7 kb interval. This interval may harbor functional variants that are in LD with the associated SNPs or noncoding variants that regulate the expression of genes within a broad TAD on 5q14.3. GPR98 contains 90 exons, spans over 610 kb, and encodes a member of the adhesion G protein-coupled receptor family. The GPR98 protein binds calcium and is weakly expressed early in mouse embryonic development at E9.5, but it becomes more strongly expressed in the future brain and neural tube by E10.5. It is the largest of the 7-transmembrane receptors and has important functions in hearing and vision.[37,38] Recessive mutations cause Usher syndrome type 2C (Mendelian Inheritance in Man No. 605472), which is characterized by congenital hearing loss and progressive retinitis pigmentosa.[39] There are multiple splice variants present in GPR98, and it is not known if all isoforms have similar functions. Thus, it is possible that one or more splice variants could have a function in neural crest cells deriving from the neural tube. Neural crest cells are a migratory population of progenitor cells in the pharyngeal apparatus, which contribute to cardiac OFT septation. There are no reports of a possible function of GPR98 in the cardiovascular system and no known connections to human TOF. CETN3 is another gene of note in the 5q14.3 region because it encodes a centrin protein that functions in the cytoskeleton of centrosomes and cilia.[40] Cilia are critically important for conferring left-right asymmetry during embryonic development, and their disruption is associated with human cardiac anomalies.[22] Because laterality defects are not commonly found in association with 22q11.2DS, more work would need to be done to support a role for CETN3 as a modifier of TOF in these individuals. 
Among the remaining 5 genes (MEF2C, POLR3G, MBLAC2, LYSMD3, and ARRDC3), MEF2C is of particular interest because it encodes a transcription factor required in the second heart field mesoderm of the pharyngeal apparatus during embryogenesis for cardiac OFT development.[41,42] Because haploinsufficiency of TBX1 is important for 22q11.2DS, and both MEF2C and TBX1 are expressed in the second heart field progenitor cells, TBX1 might act in the same genetic pathway as MEF2C. Further, studies in mouse models indicate that Tbx1 may be a negative regulator of Mef2c.[43] This suggests that genetic variants in the 5q14.3 locus that may be associated with MEF2C expression levels could act as genetic modifiers of 22q11.2DS, with the caveat that causation will require direct experimental support. Further, the ISL1 (ISL LIM homeobox 1) transcription factor and GATA transcription factor proteins bind and regulate both the Mef2c and Nkx2-5 (NK2 homeobox 5 transcription factor) cardiac development genes.[44] One hypothesis to test in the future by direct experimentation would be that rare DNA variants in MEF2C, ISL1, or NKX2-5 affect cardiac OFT formation in individuals with 22q11.2DS. In recent years, higher-order chromatin structure technologies have demonstrated that chromatin interactions occur in a nonrandom manner along the chromosome arms, which are separated into regions of highly interacting chromatin.[30,45-47] Relevant to this, available Hi-C chromatin conformation data[29] show a possible regulatory connection between the 104.7 kb region containing the top TOF-associated SNPs and MEF2C, which is located 2 Mb upstream of GPR98. One limitation of using Hi-C data to draw conclusions about gene regulation is that these data indicate that the 2 genes may reside within the same topological region but do not prove that there is a definitive regulatory connection. 
Further chromatin conformation data in cell progenitors relevant to cardiac OFT development, followed by direct experimental approaches, will be required to define this interaction further. In addition, a better understanding of the regulatory landscape in this region will be needed to identify the mechanism(s) responsible for the association with TOF we have observed in the 5q14.3 region. Further support for MEF2C as a possible modifier gene comes from a recent GWAS of circulating VEGF (vascular endothelial growth factor) levels in blood in adults.[48] Although this is a study of adults, factors that regulate VEGF levels may be similar throughout life and could affect fetal development. Previously, it was found that absence of one of the VEGF isoforms causes a phenocopy of 22q11.2DS in mouse models.[49] In this recent GWAS of VEGF levels in adults, MEF2C and JMJD1C (Jumonji domain containing 1C) were found among the 6 loci with significant association with VEGF levels. JMJD1C is relevant because we previously found significant enrichment of rare predicted exonic variants in JMJD1C in whole-exome sequence from 184 22q11.2DS subjects in which the cases were enriched for TOF.[50] This provides a potential biological connection between the present common variant analysis and previous rare variant analyses for 22q11.2DS. However, with regard to the general population, neither haploinsufficiency nor mutation of MEF2C in humans has been associated with any type of CHD thus far.[51-53] Further proof that MEF2C is a cardiac disease gene in the general population will require future genetic studies. The number of samples we obtained, even after 25 years of collection, is quite small for a GWAS of a complex trait. Thus, one of the limitations of the study was the lack of a true replication cohort. Another limitation of our study is possible population stratification. The majority of our cohort was self-reported as white. 
Nevertheless, a subset had different ethnicities, requiring statistical correction in the analysis. Further, we analyzed each ethnic group separately and combined the groups by meta-analysis. The top SNP in GPR98 remained statistically significant. In addition, the odds ratio was in the same direction in all the subcohorts, supporting our findings despite the limitations. In this report, we used available bioinformatic data to interpret our findings. However, this study did not provide proof of causation. Establishing causation will require functional studies, including the development of animal models. Despite the limitations of the study, this is the first GWAS to identify common variants that may modify the cardiac phenotype in a large cohort of individuals with 22q11.2DS.
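
Combining per-group results by meta-analysis, and checking that the odds ratios point in the same direction, can be illustrated with a fixed-effect, inverse-variance-weighted combination of log odds ratios. This is a minimal sketch of that general technique and not the authors' pipeline (the Acknowledgments list the META and SNPTEST programs). The function names `fixed_effect_meta` and `same_direction` are hypothetical, and any inputs passed to them would be each subcohort's own log odds ratio and standard error.

```python
import math
from typing import List, Tuple

# Each estimate is (log_odds_ratio, standard_error) for one subcohort.
Estimate = Tuple[float, float]


def fixed_effect_meta(estimates: List[Estimate]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling.

    Returns (pooled_log_or, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p


def same_direction(estimates: List[Estimate]) -> bool:
    """True if every subcohort's log odds ratio has the same sign."""
    return len({beta > 0 for beta, _ in estimates}) == 1
```

A pooled log odds ratio is converted back to an odds ratio with `math.exp`. A fixed-effect model assumes a single shared effect across groups; a random-effects model would be the usual alternative if the groups were expected to differ.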

Conclusions

A GWAS of TOF in 22q11.2DS has identified a significant locus on 5q14.3 harboring potential genetic risk factors. Several genes reside in this locus, including MEF2C, which is known to be required for cardiac OFT development in animal models. Further work is needed to ascertain whether MEF2C or other genes in this region act as modifiers of TOF in 22q11.2DS.

Acknowledgments

We thank the families with 22q11.2DS who provided DNA and clinical information. We acknowledge the Genomics and Molecular Cytogenetics Cores at Einstein. We also acknowledge Mark Zeffren, Nousin Haque, Antoneta Preldakaj, and Francisco Ujueta for project management and John Bruppacher, Dan Arroyo, Michael Gleeson, Dominique Calandrillo, and Frédérique Bena for technical support at Einstein. We also greatly appreciate the effort of Dr Frédérique Bena who works with S.E. and S.E.A. (Institute of Genetics and Genomics of Geneva, Switzerland). URLs: Golden Helix software: http://goldenhelix.com/; EIGENSOFT: http://genetics.med.harvard.edu/reich/Reich_Lab/Software.html; QUANTO: http://biostats.usc.edu/Quanto.html; IMPUTE: https://mathgen.stats.ox.ac.uk/impute/impute.html; HapMap: http://hapmap.ncbi.nlm.nih.gov/; LocusZoom: http://csg.sph.umich.edu/locuszoom/; PLINK 1.07: http://zzz.bwh.harvard.edu/plink/; R statistical software: http://www.r-project.org/; 1000 Genomes Project: http://www.1000genomes.org/; META: https://mathgen.stats.ox.ac.uk/genetics_software/meta/meta.html; SNPTEST v2.5.2: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html.

Sources of Funding

This work was supported by National Institutes of Health grants R01 HL084410 (Dr Emanuel, Dr Morrow, D.M. McDonald McGinn, Dr Guo, A.S. Bassett), P01 HD070454 (Dr Goldmuntz, Dr Mitchell, Dr Agopian, Dr Emanuel, D.M. McDonald McGinn, Dr Mlynarski, Dr Guo, Dr Wang, Dr Nomaru, Dr Campbell), U01 MH101720 (Dr Emanuel, Dr Morrow, D.M. McDonald McGinn, Dr Repetto, Dr Bassett, Dr Swillen, Dr Gothelf, Dr Eliez, Dr Tassone, Dr Philip, Dr Bearden, Dr Simon, E.D.A. van Duin, Dr van Amelsvoort, Dr Kates, Dr Guo, Dr Wang), R21HL118637 (Drs Wang, Morrow, Guo, Goldmuntz), and T32GM007491-41 (Dr Chung). This work was also supported by the American Heart Association, grant 14PRE199800006 (Dr Chung). Dr Repetto was supported by the Fondo Nacional de Desarrollo Científico y Tecnológico, Chile (grants 1100131 and 1130392). Dr Bassett was supported by the Dalglish Chair in 22q11.2 Deletion Syndrome, the Canada Research Chair in Schizophrenia Genetics and Genomic Disorders, Canadian Institutes of Health Research funding (MOP-97800 and MOP-89066), and the University of Toronto McLaughlin Centre. Dr Bearden was supported by National Institutes of Health grant R01 MH085903. Dr Simon was supported by National Institutes of Health grant R01 HD042974. Dr Mitchell was supported by National Institutes of Health grant R21 HD060309-01. Dr Eliez was supported by the Swiss National Science Foundation (grants 324730_121996 and 324730_144260).

Disclosures

None.
  53 in total

1.  A linear complexity phasing method for thousands of genomes.

Authors:  Olivier Delaneau; Jonathan Marchini; Jean-François Zagury
Journal:  Nat Methods       Date:  2011-12-04       Impact factor: 28.547

Review 2.  The second heart field.

Authors:  Robert G Kelly
Journal:  Curr Top Dev Biol       Date:  2012       Impact factor: 4.897

3.  Mutations in MEF2C from the 5q14.3q15 microdeletion syndrome region are a frequent cause of severe mental retardation and diminish MECP2 and CDKL5 expression.

Authors:  Markus Zweier; Anne Gregor; Christiane Zweier; Hartmut Engels; Heinrich Sticht; Eva Wohlleber; Emilia K Bijlsma; Susan E Holder; Martin Zenker; Eva Rossier; Ute Grasshoff; Diana S Johnson; Lisa Robertson; Helen V Firth; Arif B Ekici; André Reis; Anita Rauch
Journal:  Hum Mutat       Date:  2010-06       Impact factor: 4.878

4.  Human gene copy number spectra analysis in congenital heart malformations.

Authors:  Aoy Tomita-Mitchell; Donna K Mahnke; Craig A Struble; Maureen E Tuffnell; Karl D Stamm; Mats Hidestrand; Susan E Harris; Mary A Goetsch; Pippa M Simpson; David P Bick; Ulrich Broeckel; Andrew N Pelech; James S Tweddell; Michael E Mitchell
Journal:  Physiol Genomics       Date:  2012-02-07       Impact factor: 3.107

5.  Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies.

Authors:  Brian L Browning; Zhaoxia Yu
Journal:  Am J Hum Genet       Date:  2009-12       Impact factor: 11.025

6.  TBX1 is responsible for cardiovascular defects in velo-cardio-facial/DiGeorge syndrome.

Authors:  S Merscher; B Funke; J A Epstein; J Heyer; A Puech; M M Lu; R J Xavier; M B Demay; R G Russell; S Factor; K Tokooya; B S Jore; M Lopez; R K Pandita; M Lia; D Carrion; H Xu; H Schorle; J B Kobler; P Scambler; A Wynshaw-Boris; A I Skoultchi; B E Morrow; R Kucherlapati
Journal:  Cell       Date:  2001-02-23       Impact factor: 41.582

Review 7.  Studies on the very large G protein-coupled receptor: from initial discovery to determining its role in sensorineural deafness in higher animals.

Authors:  D Randy McMillan; Perrin C White
Journal:  Adv Exp Med Biol       Date:  2010       Impact factor: 2.622

8.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

Review 9.  22q11.2 deletion syndrome.

Authors:  Donna M McDonald-McGinn; Kathleen E Sullivan; Bruno Marino; Nicole Philip; Ann Swillen; Jacob A S Vorstman; Elaine H Zackai; Beverly S Emanuel; Joris R Vermeesch; Bernice E Morrow; Peter J Scambler; Anne S Bassett
Journal:  Nat Rev Dis Primers       Date:  2015-11-19       Impact factor: 52.329

10.  Molecular definition of the 22q11 deletions in velo-cardio-facial syndrome.

Authors:  B Morrow; R Goldberg; C Carlson; R Das Gupta; H Sirotkin; J Collins; I Dunham; H O'Donnell; P Scambler; R Shprintzen
Journal:  Am J Hum Genet       Date:  1995-06       Impact factor: 11.025
