Literature DB >> 32960507

Identification of novel candidate risk genes for myelomeningocele within the glucose homeostasis/oxidative stress and folate/one-carbon metabolism networks.

Paul Hillman1, Craig Baker1,2, Luke Hebert1, Michael Brown3, James Hixson3, Allison Ashley-Koch4,5, Alanna C Morrison3, Hope Northrup1, Kit Sing Au1.   

Abstract

BACKGROUND: Neural tube defects (NTDs) are the second most common complex birth defect, yet, our understanding of the genetic contribution to their development remains incomplete. Two environmental factors associated with NTDs are Folate and One Carbon Metabolism (FOCM) and Glucose Homeostasis and Oxidative Stress (GHOS). Utilizing next-generation sequencing of a large patient cohort, we identify novel candidate genes in these two networks to provide insights into NTD mechanisms.
METHODS: Exome sequencing (ES) was performed in 511 patients, born with myelomeningocele, divided between European American and Mexican American ethnicities. Healthy control data from the Genome Aggregation database were ethnically matched and used as controls. Rare, high fidelity, nonsynonymous predicted damaging missense, nonsense, or canonical splice site variants in independently generated candidate gene lists for FOCM and GHOS were identified. We used a gene-based collapsing approach to quantify mutational burden in case and controls, with the control cohort estimated using cumulative allele frequencies assuming Hardy-Weinberg equilibrium.
RESULTS: We identified 45 of 837 genes in the FOCM network and 22 of 568 genes in the GHOS network as possible NTD risk genes with p < 0.05. No nominally significant risk genes were shared between ethnicities. Using a novel approach to mutational burden we identify 55 novel NTD risk associations.
CONCLUSIONS: We provide a means of utilizing large publicly available sequencing datasets as controls for sequencing projects examining rare disease. This approach confirmed existing risk genes for myelomeningocele and identified possible novel risk genes. Lastly, it suggests possible distinct genetic etiologies for this malformation between different ethnicities.
© 2020 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals LLC.

Entities:  

Keywords:  Myelomeningocele; exome sequencing; folate; glucose; mutation burden

Mesh:

Substances:

Year:  2020        PMID: 32960507      PMCID: PMC7667334          DOI: 10.1002/mgg3.1495

Source DB:  PubMed          Journal:  Mol Genet Genomic Med        ISSN: 2324-9269            Impact factor:   2.183


INTRODUCTION

The neural tube is the embryonic precursor of the central nervous system and forms in humans with closure during Carnegie stages 12 and 13, coinciding with the 4–5th weeks of gestation. Neural tube defects (NTDs) are congenital malformations of the central nervous system occurring secondary to incomplete closure of the neural tube in the region of the head (anencephaly), below the head (spina bifida, SB), or along its entire length (craniorachischisis; de Bakker, Driessen, Boukens, van den Hoff, & Oostra, 2017). NTDs are the second most common complex congenital malformation following congenital heart defects with an estimated worldwide birth prevalence of 18.6 per 10,000 live births (Blencowe, Kancherla, Moorthie, Darlison, & Modell, 2018; Blom, 2009). Interestingly, significant regional variation exists, and birth prevalence in the United States (US) is estimated at 5–6 per 10,000 live births (Zaganjor et al., 2016). SB accounts for ~60% of all NTDs, while myelomeningocele (MM), the most severe subtype of NTD compatible with survival, accounts for greater than 90% of open SB cases(Au, Ashley‐Koch, & Northrup, 2010; Boulet et al., 2008). In the US, Mexican Americans (MA) and European Americans (EA) have the highest incidence rates of MM at 4.17 per 10,000 and 3.22 per 10,000 live births, respectively (Boulet et al., 2008). Occurrence of MM represents a significant economic and public health burden in the US. In 2013, the estimated NTD‐related healthcare costs in the US were in excess of 1.6 billion dollars (Arth et al., 2017). Studies to date have shown a multifactorial etiology of NTDs with both genetic and environmental contributions. A genetic component to NTD development risk has been reflected by increased familial recurrence, preponderance in monozygotic twins, and effects of ethnicity on incidence. Environmental components have been supported by variation of incidence rate across time, seasons, geography, and socioeconomic status (Blom, 2009). Two of the best associated environmental factors conferring an increased risk for NTD development are folate status and glucose metabolism, specifically maternal folate deficiency and maternal glucose excess (hyperglycemia). Forty years ago, folate metabolism was implicated in NTDs when low folate levels were seen in mothers of infants affected with NTDs and subsequent studies showed dietary folic acid supplementation reduced recurrence of NTDs up to 72% and first occurrence by 93% (GROUP MRC VITAMIN STUDY RESEARCH, 1991). However, no effect on NTD incidence was observed until the US mandated fortification of cereal grains with folic acid, leading to a decrease in incidence of NTDs by ~35% (Williams et al., 2015). Maternal diabetes, maternal obesity, and elevated glycemic index have all been associated with increased risk for NTDs (Correa et al., 2008; Stothard, Tennant, Bell, & Rankin, 2009; Yazdy, Liu, Mitchell, & Werler, 2010). Pregestational diabetes mellitus has recently been proposed as partly responsible for the recent rise in NTD incidence observed in Canada (Liu et al., 2019). The mechanism by which folate and glucose metabolism confer increased NTD development risk remains to be fully elucidated. Folate (vitamin B9) exists in cells as a family of enzyme cofactors that carry and activate methyl derived one‐carbon units, and is collectively known as the folate and one‐carbon metabolism network (FOCM network; Ebara, 2017). FOCM network metabolism is the primary source for one‐carbon residues important in numerous cellular activities and pathways including homocysteine remethylation, purine biosynthesis, and dTMP biosynthesis. Homocysteine remethylation is directly related to the generation of S‐adenosyl‐methionine (SAM), the universal methyl donor for nearly all one‐carbon metabolism essential to methylation of nucleic acids, proteins, and toxic metabolites (Au, Findley, & Northrup, 2017; Greene & Copp, 2014). This association also supports the growing evidence for the association of other closely related metabolic pathways involving riboflavin, B6, B12, choline, and cobalamin to NTD susceptibility (Au et al., 2017; Ebara, 2017; Nguyen et al., 2017). Furthermore, changes in DNA methylation patterns have been observed in NTD fetuses and correlated with folate levels in mothers while metabolic studies found changes in intermediate metabolites of FOCM in women with NTD‐affected pregnancies (Chang et al., 2011; Chi et al., 2014). The range of these abnormalities illustrate the broad impact of defects in the FOCM network. Considerable epidemiological evidence implicates glucose homeostasis and oxidative stress (GHOS) as closely linked with embryopathy at the cellular level, mostly through the lens of maternal diabetes. Proposed secondary teratogenic mechanisms resulting from maternal diabetes include nutritional deficiencies, mitochondrial disturbances, enhanced cellular stress, hypoxia, increased apoptosis, decreased autophagy, altered metalloproteinases, and epigenetic changes (Ornoy, Koren, & Yanai, 2018). Increased oxidative stress leads to increased proapoptotic signaling and mitochondrial dysfunction, associated with an overall increased risk of birth defects (Gabbay‐Benziv, 2015; Xu, Li, Wang, Weng, & Yang, 2013; Yang, Albert Reece, Wang, & Gabbay‐Benziv, 2015). Previous studies have shown an association between maternal diabetes and abnormalities of yolk sac structures that are inversely related to embryonic malformations (Dong et al., 2016). As mentioned above, genetic factors contribute to NTD development risk in addition to environmental factors. First degree relatives of NTD cases have a 30‐fold increased risk compared to the general population (3% vs. 0.1%) and second degree relatives have a 5‐fold increased risk (0.5% vs 0.1%; Au et al., 2017). Additionally, 2–16% of NTD cases are attributable to chromosome abnormalities, although these cases are considered purely genetic rather than multifactorial (Lynch, 2005). The multifactorial nature of nonsyndromic NTDs is supported by mouse models that require multiple deleterious factors in combination to result in a NTD phenotype (Zohn & Sarkar, 2008). Candidate gene and genome wide association studies have identified genes of the FOCM and GHOS networks that associate with NTDs (Crider, Yang, Berry, & Bailey, 2012; Davidson et al., 2008). Improvements in next generation sequencing (NGS) technology make the study of rare potentially functional variants in rare disease more obtainable. Recently, an alternative approach to the study of rare disease using NGS data for disease gene discovery has proposed using a gene‐based collapsing analysis where the incidence of rare and predicted damaging variants within individual genes is compared between cases and controls (Guo, Plummer, Chan, Hirschhorn, & Lippincott, 2018). The advantage of a gene‐collapsing approach is the potential to improve power, which limits isolated statistical analysis of rare variants, by unifying variants with similar functional effect into single variables. Two independent metabolic pathways, the FOCM and GHOS networks, represent two pathways implicated in embryonic environmental contribution to NTD risk. Genetic factors are also known to contribute to risk and damaging variants in genes playing a role in GHOS or FOCM may represent identifiable risk factors. To broadly explore the role of both of these metabolic pathways, we sequenced 514 exomes of MM patients representing the two most affected populations in the US and assessed the mutational burden with a gene collapsing approach utilizing the Genome Aggregation Database (gnomAD) as an ethnically matched control population (Lek et al., 2016).

MATERIALS AND METHODS

Ethical compliance

Subjects were recruited and affected individuals and/or their parents consented for enrolment in research studies in accordance with an institutional IRB at University of Texas Health Science Center (UTHealth) at Houston as described in Au et al., 2008 (Au et al., 2008).

Study design

The case population data represent exome sequencing of 511 individuals born with myelomeningocele. The case population was divided evenly between 257 EA and 254 MA subjects. The Genome Aggregation Database was used as a control population (Lek et al., 2016). The control for MA cases were the 8,556 control Latin/Admixed American (AMR) exomes and the controls for the EA were the 21,384 control Non‐Finnish European (NFE) exomes.

Folate and one‐carbon metabolism and glucose homeostasis/oxidative stress network candidate gene lists

Two candidate gene lists were generated. Genes of folate and one‐carbon metabolism previously associated with NTD risk were reviewed in Au et al., 2017 (Au et al., 2017). These genes were used as a seed list to retrieve relevant gene ontology (GO) terms from the Ensembl Biomart (Zerbino et al., 2018). GO Terms were manually reviewed for specificity to the FOCM network using key term searches including methylation, folate, methyl, choline, cobalamin, glycine, riboflavin, purine, and pyrimidine. Selected GO terms (Table S3) were returned to Ensembl Biomart to retrieve genes annotated to these terms (Table S4), creating the FOCM Network of 837 genes used in this study. Genes of glucose homeostasis and oxidative stress were retrieved directly from the Gene Ontology consortium using the GO terms glucose homeostasis (GO:0042593) and response to oxidative stress (GO:0006979; Carbon et al., 2019). This resulted in the GHOS Network of 568 genes used in this study (Table S5). Combined, the networks contained 1,363 genes, with 42 genes overlapping between both networks.

Exome sequencing

Genomic DNA samples were used for exome capture with TargetSeq reagents (Life Technologies, Inc.) based on high density oligonucleotide hybridization of GENCODE‐annotated coding exons, NCBI CCDS, exon flanking sequences (including intron splice sites), small non‐coding RNAs (e.g., microRNAs) and a selection of miRNA binding sites. After capture, libraries were constructed with addition of barcodes (AB Library Builder, Life Technologies, Inc.). Multiplexed sequencing used the Ion Proton platform (Life Technologies, Inc.) based on proton assays for polymerase sequencing of individual DNA molecules in wells of modified semiconductor chips. Reads were aligned to the hg19 reference genome.

Data filtering and annotation

Figure 1 describes the steps for quality control filtering that were applied to both the cases and the gnomAD controls. Briefly, variant level filtering excluded variants which did not “PASS” GATK filters, map quality score <20, or inbreeding coefficient <−0.3. To ensure analysis of high fidelity variants, individual sample calls were filtered to exclude those with alternate allele depth <20%, a read depth <10, or genotype quality score <20. Variant level information including allele count (AC), allele number (AN), and allele frequencies (AF) were then recalculated for individual ethnicities after sample level exclusions. Remaining variants were annotated using the database of non‐synonymous single‐nucleotide variant functional predictions (dbNSFP) version 4.0b1a (Liu, Wu, Li, & Boerwinkle, 2016), restricting further analysis to non‐synonymous single nucleotide variants (SNVs) in exons and known canonical splice sites.
FIGURE 1

Variant filtering and annotation workflow with summative case and control variant and gene counts. Top. General workflow and filtering parameters applied to data. Bottom. Table summarizing total qualifying variant and gene counts by ethnicity and in total for cases and controls. Variant counts represent the number of unique qualifying variant loci identified in each ethnicity and network. Gene counts represent the number of genes containing at least one qualifying variant in each network and ethnicity. AMR, American Latino; CADD, combined annotation dependent depletion; dbNSFP, database of non‐synonymous single‐nucleotide variant functional predictions; EA, European Americans; FOCM, Folate and One‐Carbon Metabolism; GATK, Genome Analysis Toolkit; GHOS, Glucose Homeostasis and Oxidative Stress; gnomAD, genome aggregation database; MA, Mexican Americans; NFE, Non‐Finnish Europeans; VCF, variant call format.

Variant filtering and annotation workflow with summative case and control variant and gene counts. Top. General workflow and filtering parameters applied to data. Bottom. Table summarizing total qualifying variant and gene counts by ethnicity and in total for cases and controls. Variant counts represent the number of unique qualifying variant loci identified in each ethnicity and network. Gene counts represent the number of genes containing at least one qualifying variant in each network and ethnicity. AMR, American Latino; CADD, combined annotation dependent depletion; dbNSFP, database of non‐synonymous single‐nucleotide variant functional predictions; EA, European Americans; FOCM, Folate and One‐Carbon Metabolism; GATK, Genome Analysis Toolkit; GHOS, Glucose Homeostasis and Oxidative Stress; gnomAD, genome aggregation database; MA, Mexican Americans; NFE, Non‐Finnish Europeans; VCF, variant call format.

Rare variant analysis

Rare variants were defined as having an allele frequency (AF) of <1% in the respective control population (AMR, NFE). Analysis was further restricted to likely damaging variants defined as a stop gain, stop loss, or splice variants, or variants with a combined annotation dependent depletion (CADD) phred score ≥20 as annotated by dbNSFP. A CADD phred score >20 represents the top 1% of predicted damaging severity in all predicted nonsynonymous exonic variants (Rentzsch, Witten, Cooper, Shendure, & Kircher, 2019; Richards et al., 2015). Following a model similar to that presented in Guo et al, false discovery was further limited by including only variants whose positions were covered at a read depth minimum of 10 in 89.5% of both cases and control samples (Guo et al., 2018; Figure 1). For cases, the number of individuals carrying at least one qualifying variant within a gene were counted. For controls, the number of individuals with a qualifying variant in a gene was estimated based on cumulative gnomAD AF and gnomAD population size, assuming Hardy–Weinberg equilibrium. Using an R package, mutational burden analysis by gene was completed using a Fisher Exact Test, comparing the number of case individuals with and without a qualifying variant to the number of estimated control individuals with and without a qualifying variant ((2019) R.C.T. (2019)). Q‐Q plots were generated for the mutational burden analysis and evaluated for a lambda value greater than 1. As our hypothesis is that genes conferring risk for MM will have a greater mutational burden in the cases compared to the controls, results are reported only for genes with an odds ratio (OR) greater than 1. The Bonferroni correction for the number of genes tested for each network was calculated as p < 0.05/837 (5.97 × 10−5) in FOCM and p < 0.05/568 (8.80 × 10−5) in GHOS. Nominally significant genes are those with a p < 0.05.

Human embryonic expression

Embryonic NTD expression as measured by long Serial Analysis of Gene Expression has been previously reported (Krupp et al., 2012). Expression of genes of interest was provided from unpublished data.

Literature search

Boolean searches were conducted in PubMed.gov for nominally significant genes to evaluate for previous association to NTDs. Searches included the gene name and “NTD,” “Neural Tube Defect,” or “myelomeningocele.”

RESULTS

Data filtering

Results of data filtering are summarized in Figure 1. A greater number of variants were identified in the MA population compared to the EA population. On average, an individual in the MA population had a rare damaging variant in 8.2 genes of FOCM and 5.8 genes of GHOS while the EA population had rare damaging variants in 6.8 and 4.9 genes, respectively. Greater than 95% of Mexican Americans had rare damaging variants in 2 or more genes of GHOS, 4 or more genes of FOCM, and 6 or more genes between the two networks, while greater than 95% of European Americans had rare damaging variants in 2 or more genes of GHOS, 3 or more genes of FOCM, and 6 or more genes between the two networks (Figure 2).
FIGURE 2

Number of qualifying variant containing genes by individual. Solid line graphs represent the number of individual cases having the number of genes containing a qualifying variant in the glucose homeostasis and oxidative stress network (GHOS) [blue], folate and one‐carbon metabolism network (FOCM) [orange], and total between the two networks (GHOS + FOCM) [grey] for both Mexican Americans and European Americans. The x‐axis is discontinuous at its upper bound for both graphs as indicated by the hashmarks. Vertical dashed lines represent the 95% population cutoff for each network. Diamond marker represent the population averages for each network.

Number of qualifying variant containing genes by individual. Solid line graphs represent the number of individual cases having the number of genes containing a qualifying variant in the glucose homeostasis and oxidative stress network (GHOS) [blue], folate and one‐carbon metabolism network (FOCM) [orange], and total between the two networks (GHOS + FOCM) [grey] for both Mexican Americans and European Americans. The x‐axis is discontinuous at its upper bound for both graphs as indicated by the hashmarks. Vertical dashed lines represent the 95% population cutoff for each network. Diamond marker represent the population averages for each network.

Mutational burden

No genes reached significance in either network after applying Bonferroni correction for the number of genes tested. Table 1 contains relevant information for genes reaching nominal significance (p < 0.05) including variant quantity with number of cases harboring a qualifying variant and the estimated number of gnomAD controls carrying similar variants. Also included is the gene transcript length, GC percentage, probability of loss‐of‐function intolerance score (pLI), human embryonic Carnegie stage 12/13 neural tube expression data, and known mouse NTD phenotypes (Bult et al., 2019; Krupp et al., 2012). Variants contributing these nominally significant genes are found in Table S1.
TABLE 1

Results for all genes reaching nominal significance for mutational burden analysis using Fisher Exact test.

GeneNetworkEthnicityCase variant countCase individual countControl variant countControl individual estimation p‐ValueOdds ratioConfidence interval (CI)CI rangeTranscript lengthGC percentagepLI scoreNeural tube expression/stageMGI neural tube phenotype
METTL5 FOCMMA5922673.15E‐044.652.24–9.447.21027 bp39.58%2.64E‐03NDND
SLC43A2 FOCMEA4641941.24E‐035.412.28–12.2910.018413 bp54.66%3.77E‐02NDND
BHMT FOCMMA4840722.06E‐033.831.67–8.116.442500 bp41.67%1.86E‐04NDND
CLN3 FOCMEA66411072.32E‐034.752.01–10.78.691913 bp50.61%2.64E‐06NDND
SLC25A32 FOCMMA5535292.69E‐035.92.16–15.4613.33228 bp36.96%2.65E‐05NDYES
SDC1 GHOSEA4514762.79E‐035.562.12–13.4411.333293 bp57.56%6.44E‐01NDND
GPX2 GHOSEA36371143.13E‐034.461.89–108.111298 bp47.98%1.42E‐02NDND
SLC7A1 FOCMMA3433215.30E‐036.52.03–19.3517.327343 bp47.44%8.12E‐01NDND
WDR82 FOCMEA22885.94E‐0320.943.18–101.5898.44292 bp44.88%8.92E‐01NDND
ARNT GHOSMA515682376.77E‐032.21.23–3.792.564928 bp41.21%9.25E‐01NDYES
MC1R FOCMMA1339678447.60E‐031.661.16–2.361.23099 bp58.16%2.50E‐07NDND
PTPRN2 GHOSEA66591418.33E‐033.61.53–7.996.464949 bp52.12%4.07E‐01NDND
ICMT FOCMMA247259.09E‐035.461.73–15.4813.764815 bp49.59%9.49E‐01Caud12>13 2xND
P4HB GHOSMA6755759.22E‐033.21.44–6.895.442578 bp55.79%9.90E‐01NDND
DUOX1 GHOSMA12301516149.52E‐031.731.17–2.571.45679 bp52.90%2.46E‐14NDND
POMC GHOSEA39212931.04E‐022.611.28–5.183.91486 bp51.16%4.03E‐03NDND
SERINC1 FOCMMA3642591.08E‐023.481.45–8.266.813202 bp35.93%1.98E‐01NDND
HELLS FOCMEA37461961.09E‐023.031.39–6.3953287 bp38.96%9.99E‐01Caud12<13 2xND
EDN1 GHOS / FOCMMA221841.15E‐0216.952.26–91.4589.192100 bp42.83%6.59E‐02Caud12<13 2xND
SLC19A2 FOCMMA4541431.19E‐023.971.48–10.268.773638 bp38.72%1.51E‐02Caud12<13 2xND
NR1H4 GHOS / FOCMEA410373531.19E‐022.411.25–4.553.32452 bp39.74%7.41E‐01NDND
GALR2 FOCMEA36241531.20E‐023.321.41–7.636.221357 bp64.59%3.86E‐02NDND
CAP2 FOCMEA1447711.22E‐024.751.57–13.0211.453027 bp42.32%2.78E‐01Caud12<13 2xND
UCP1 GHOSMA78301001.26E‐022.751.22–5.664.441000 bp39.65%1.99E‐08NDND
CSRP3 GHOSMA2532441.30E‐023.881.45–9.998.541464 bp42.88%1.02E‐02NDND
SELENOS GHOSMA110251461.47E‐022.361.2–4.463.271417 bp45.06%NANDND
RPH3AL GHOSEA58322651.67E‐022.561.16–5.214.042754 bp54.25%1.37E‐10Caud12onlyND
PIWIL2 FOCMEA92110810141.73E‐021.791.13–2.821.74340 bp42.78%1.66E‐02Caud12>13 2xND
NSUN2 FOCMMA6782871.86E‐022.761.25–6.14.853372 bp43.29%9.77E‐01Caud12=13 2xND
GATA4 GHOSMA3329171.88E‐0261.49–20.5119.023467 bp48.32%9.43E‐01Caud12<13 2xYES
SHMT2 FOCMMA48651102.05E‐022.51.11–5.13.992357 bp56.68%1.75E‐02Caud12<13 2xND
GSTO1 FOCMEA2213172.10E‐029.861.62–38.737.091057 bp41.43%1.17E‐02NDND
BRF2 GHOSMA58481112.15E‐022.471.1–5.053.962000 bp49.89%4.58E‐10Caud12<13 2xND
PRDX1 GHOSEA3328482.27E‐025.251.38–16.7715.41235 bp47.46%9.49E‐07Caud12<13 2xND
MRAP FOCMEA229182.32E−029.311.53–36.9835.451000 bp46.36%6.02E‐01NDND
STOML2 FOCMMA2529522.37E−023.281.24–8.257.021431 bp53.10%7.58E‐01NDND
GUCY2D FOCMEA58902862.47E−022.371.08–4.813.733623 bp55.58%3.59E‐02NDND
EHMT2 FOCMMA4794932.52E−022.581.17–5.674.54238 bp57.71%5.97E‐02NDYES
AMT FOCMMA411451802.58E−022.111.06–3.912.852202 bp54.51%3.52E‐04Caud12=13 2xYES
ARL6IP5 GHOS / FOCMMA3426362.74E−023.791.22–10.689.462128 bp42.50%3.02E‐01Caud12onlyND
ASZ1 FOCMMA4645742.76E−022.771.16–6.425.261865 bp35.27%4.46E‐06NDND
AVP FOCMMA11202.88E−021.77–∞619 bp67.96%1.85E‐01NDND
EZH1 FOCMEA35331392.89E−023.031.17–7.396.224689 bp45.44%5.04E‐03NDND
SETDB1 FOCMEA58772952.89E−022.31.05–4.663.614446 bp44.16%1.00E+00Caud12<13 2xND
TCN1 FOCMEA3331543.03E−024.661.23–14.7113.481567 bp39.60%1.27E‐15NDND
PSIP1 GHOSEA1448953.04E−023.541.18–9.518.333391 bp37.88%9.73E‐01NDND
GALR1 FOCMMA4538563.07E−023.051.15–7.596.442787 bp43.70%1.06E‐03NDND
GUCA1A FOCMMA3524563.07E−023.051.15–7.596.442331 bp51.23%7.91E‐03NDND
STXBP5L GHOSMA9111081883.18E−022.011.02–3.732.729365 bp36.35%1.00E+00NDND
SLC32A1 FOCMMA4542573.27E−022.991.13–7.446.312574 bp60.86%NANDND
FOLR2 FOCMEA4321563.31E−024.51.18–14.1312.941131 bp48.88%3.53E‐06NDND
ADCY1 FOCMMA4467393.46E−023.491.13–9.738.5912499 bp47.64%1.00E+00NDND
WDR77 FOCMEA2217233.52E−027.281.22–29.2828.062544 bp44.57%6.14E‐01NDND
BHMT2 FOCMEA55301513.88E−022.791.08–6.785.72651 bp43.35%1.22E‐09Caud12onlyND
SLC7A9 FOCMEA45511513.88E−022.791.08–6.785.71785 bp46.62%1.62E‐01NDND
ALAD GHOSMA6636813.93E−022.531.06–5.824.753349 bp51.45%1.72E‐01NDND
FANCC GHOSMA712492133.95E−021.941.06–3.552.494592 bp41.78%4.52E‐08Caud12>13 2xND
SLC46A1 FOCMMA4418413.99E−023.321.08–9.188.16489 bp52.64%8.40E‐02NDND
UBE3A GHOSMA17261044.11E−022.31.05–5.023.985276 bp36.21%9.99E‐01NDND
ADCY10 FOCMEA561192074.22E−022.451.04–5.564.515469 bp41.76%2.31E‐27NDND
SEPSECS FOCMMA3547624.35E−022.751.04–6.785.745645 bp40.63%6.91E‐06NDND
OCA2 FOCMEA714906654.46E−021.791.03–3.092.053140 bp43.49%7.65E‐12NDND
MMACHC FOCMEA67412674.58E−022.211.02–4.633.612901 bp47.99%1.89E‐13NDND
METTL3 FOCMMA2647864.94E−022.381–5.454.452022 bp42.49%1.27E‐02NDND

Gene is common gene name. Network is gene network that gene originated from in analysis. Ethnicity is case ethnicity in which gene reached significance. Variant Count is the number of unique qualifying variant loci found in the gene for the given ethnicity or its matched control. Case individual count is the exact number of case individuals harboring at least one of the identified variants for the gene in this ethnicity. Control individual estimation is the estimated number of control individuals harboring one of the qualifying variants identified as outlined in the Methods. P‐alue, Odds Ratio, Confidence Interval, and CI Range are all products of the statistical output of the R package for Fisher Exact Test. Transcript length, GC percentage, and pLI score are collected from public databases for the canonical transcript for each gene. Neural Tube Expression/Stage reflects if transcript was detected during Carnegie Stages 12 and/or 13 and the comparative contributions. MGI Mouse Phenotype reports annotation of the gene in the mouse genome informatics database with a neural tube phenotype including: open neural tube, abnormal neural tube closure, or abnormal neural tube morphology. OMIM accession numbers and GenBank versions are found in Tables S4 and S5. Abbreviations: CI, Confidence Interval; EA, European Americans; MA, Mexican Americans; MGI, Mouse Genome Informatics (Bult et al., 2019);NA, not available; ND, not detected/not described.

Results for all genes reaching nominal significance for mutational burden analysis using Fisher Exact test. Gene is common gene name. Network is gene network that gene originated from in analysis. Ethnicity is case ethnicity in which gene reached significance. Variant Count is the number of unique qualifying variant loci found in the gene for the given ethnicity or its matched control. Case individual count is the exact number of case individuals harboring at least one of the identified variants for the gene in this ethnicity. Control individual estimation is the estimated number of control individuals harboring one of the qualifying variants identified as outlined in the Methods. P‐alue, Odds Ratio, Confidence Interval, and CI Range are all products of the statistical output of the R package for Fisher Exact Test. Transcript length, GC percentage, and pLI score are collected from public databases for the canonical transcript for each gene. Neural Tube Expression/Stage reflects if transcript was detected during Carnegie Stages 12 and/or 13 and the comparative contributions. MGI Mouse Phenotype reports annotation of the gene in the mouse genome informatics database with a neural tube phenotype including: open neural tube, abnormal neural tube closure, or abnormal neural tube morphology. OMIM accession numbers and GenBank versions are found in Tables S4 and S5. Abbreviations: CI, Confidence Interval; EA, European Americans; MA, Mexican Americans; MGI, Mouse Genome Informatics (Bult et al., 2019);NA, not available; ND, not detected/not described.

FOCM

Of the 835 genes in the FOCM network, 631 had qualifying variants passing filters. 45 of these 631 genes containing variants in the FOCM network reached nominal significance. All genes reaching nominal significance were unique to an ethnicity, 24 in MA and 21 in EA. Of these 7 in the MA and 4 in the EA population are expressed during Carnegie stages 12 or 13 (Table 1). About 48% of MA and 41% of EA cases had a variant in at least one nominally significant gene in the FOCM network (Figures 3 and 4A). Interestingly, 71% of the nominally significant FOCM genes contained non‐private variants, variants found in more than one individual (Table S1).
FIGURE 3

The number of nominally significant genes containing a qualifying variant per individual. Graphs total the number of individual cases having qualifying variants in variable numbers of nominally significant genes of the glucose homeostasis and oxidative stress network (GHOS) [blue], folate and one‐carbon metabolism network (FOCM) [orange], and total between the two networks (GHOS + FOCM) [grey] for both Mexican Americans and European Americans. n provides the population size for each ethnicity.

FIGURE 4

Proportion of qualifying variants and contributing individuals per gene in Folate and One‐Carbon Metabolism network (a) and Glucose Homeostasis and Oxidative Stress (b). Top Rows A and B. Graphs show the total number of qualifying variants (number following “,”) in each nominally significant gene by ethnicity. The central bar graph shows the total number of variants compared between ethnicities (blue bars). Bottom Rows A and B. Similar to top rows, showing the number of individual cases with at least one qualifying variant in the nominally significant genes by ethnicity. The central bar graph shows the total number of affected individuals compared between ethnicities (red bars). Colors are matched between top row and bottom row pairs by ethnicity but not between ethnicities or parts A or B. n provides the population size for each ethnicity.

The number of nominally significant genes containing a qualifying variant per individual. Graphs total the number of individual cases having qualifying variants in variable numbers of nominally significant genes of the glucose homeostasis and oxidative stress network (GHOS) [blue], folate and one‐carbon metabolism network (FOCM) [orange], and total between the two networks (GHOS + FOCM) [grey] for both Mexican Americans and European Americans. n provides the population size for each ethnicity. Proportion of qualifying variants and contributing individuals per gene in Folate and One‐Carbon Metabolism network (a) and Glucose Homeostasis and Oxidative Stress (b). Top Rows A and B. Graphs show the total number of qualifying variants (number following “,”) in each nominally significant gene by ethnicity. The central bar graph shows the total number of variants compared between ethnicities (blue bars). Bottom Rows A and B. Similar to top rows, showing the number of individual cases with at least one qualifying variant in the nominally significant genes by ethnicity. The central bar graph shows the total number of affected individuals compared between ethnicities (red bars). Colors are matched between top row and bottom row pairs by ethnicity but not between ethnicities or parts A or B. n provides the population size for each ethnicity.

GHOS

Of the 568 genes in the GHOS network, 458 had qualifying variants passing filters. About 22 of these 458 genes containing variants in the GHOS network reached nominal significance. All genes reaching nominal significance were unique to an ethnicity, 14 in MA and 8 in EA. Of these 5 in the MA and 2 in the EA population are expressed during Carnegie stages 12 or 13 (Table 1). 41% of MA and 17.5% of EA cases had a variant in at least one nominally significant gene in the GHOS network (Figures 3 and 4B). Overall, 77% of the GHOS genes reaching nominal significance had non‐private variants (Table S1).

Pathway enrichment

To identify broader categories of interest in regards to these two networks, nominally significant genes were annotated with gene ontology (GO). Within FOCM, 17 of the 93 GO terms used to create the gene list had nominally significant genes in EA cases and 27 of 93 in MA cases. Of these, only 7 in EA and 7 in MA had more than one gene with that ontology. Also, of the 33 GO terms in the original GO list for FOCM generation having nominally significant genes, 11 were found in both populations. The GO terms with most nominally significant genes involved methylation/methyltransferase activity, folic acid transport, and amino acid transport (Table S2).

DISCUSSION

Neural tube defects are the second most common complex congenital birth defect and represent a significant financial burden to the medical system. Interventions to date, including federally mandated fortification of cereal grains with folic acid and recommendations for increased folic acid intake prior to conception and during pregnancy, have reduced but not eliminated this disease. We present here the largest sequenced cohort of NTD patients to date and our data provides new genes of interest for evaluation. Furthermore, we propose a unique approach to the use of large publicly available sequencing databases as controls in similar projects. Overall, we identified 64 genes reaching nominal significance in our data. These genes are of scientific interest, particularly when one considers the complex nature of NTDs with likely environmental and multigenic contributions to disease. Review of the available literature could only provide a previously documented association between these genes and NTD in 6 cases: FOLR2, BHMT, SLC25A32, SLC19A2, AMT, and SLC46A1 (OMIM: 136425, 602888, 610815, 603941, 238310, and 611672). Additionally, EHMT2, ARNT, and GATA4 (OMIM: 604599, 126110, and 600576) have been demonstrated in mouse models to confer an NTD phenotype. Thus, we present 55 new genes for consideration as possible risk genes for NTD. Of these, 16 have predicted pLI >0.5 implying relative intolerance to variation and 16 are shown to be expressed during neural tube closure in human. Lastly, four genes are found in both of these groups. SETDB1 and HELLS (OMIM: 604396 and 603946) were significant in the FOCM network of EA and NSUN2 and ICMT (OMIM: 610916 and 605851) in the FOCM network of MA. Of the nine genes previously associated with NTD, two had a pLI >0.5 and three showed expression during human neural tube closure. Only one gene was found in both of these groups, GATA4, which reached significance in the GHOS network for MA. Furthermore, seven of the nine genes were associated with the FOCM network and eight of the nine genes reached significance in the MA cohort. In evaluation of the FOCM network, three gene ontologies were of interest in both ethnicities; methylation/methyltransferase activity, folic acid transport, and amino acid transport. The predominance of both methylation/methyltransferase activity and folic acid transport provide insight to the possible mechanism of persistence of NTDs in the setting of folic acid supplementation. Our findings in these areas of FOCM suggest that either absorption of folate or the subsequent utilization of the single carbon unit provided may play a significant role in NTD cases. Furthermore, alternate gene ontology acquisition suggested that histone methylation in particular may be of interest and nine genes reaching nominal significance have also been implicated in glucose metabolism although only two were included in the GHOS network (data not shown, see below). Findings in the GHOS network showed a preponderance of affect in the MA population out of proportion to the findings in the FOCM network and overall incidence rate of variants between the two populations. Similarly, population studies of the diabetes epidemic in North American showed that Hispanics have the highest prevalence of diabetes. The incidence of disease in this population was the highest of all evaluated ethnicities and nearly twice that seen in non‐Hispanic whites. Furthermore, Hispanics have one of the highest rates of undiagnosed diabetes at nearly 50% (Menke, Casagrande, Geiss, & Cowie, 2015). Taken together, these findings support a role of GHOS network genes in NTD that may be amenable to greater intervention. Three genes met nominal significance and were independently present in both candidate gene lists: ARL6IP5, NR1H4, and EDN1 (OMIM: 605709, 603826, and 131240). ARL6IP5 encodes for PRA1 family protein 3 that is involved in regulating intracellular concentrations of taurine and glutamate, and also implicated in apoptotic signaling in response to oxidative stress. NR1H4 encodes for a bile acid receptor that functions as a receptor for bile acids with downstream transcriptional regulation affecting bile acid synthesis and transport. It has four known isoforms and has been associated with Notch signaling, glucose homeostasis, and intestinal inflammation. EDN1 encodes for Endothelin‐1, part of the endothelin family that is secreted from the cell and acts as a potent vasoconstrictor. It is interesting that no genes of interest were shared between the two populations. However, overlap in common gene ontologies of nominally significant genes between the two ethnicities does support a shared mechanism, although possibly at different points in the same pathway. A large majority of individuals had more than two rare damaging variant each in the FOCM and GHOS networks supporting a hypothesis of added effects of damaging variants in a pathway. However, less than 50% were found to carry rare damaging variants in genes at a frequency greater than that seen in the general population. These findings do suggest that there are other factors at play in the pathogenesis of NTDs. Our results do support the hypothesis that these pathways represent significant burden in NTD risk and pathology, especially when considering that over 70% of the genes presented here had variants that were found in more than one individual. The authors recognize the limitation of the study in the use of public databases rather than simultaneously processed control samples. GnomAD provides control variant frequency for populations similar to our own and the same filtering parameters were applied to our dataset. While genotypic data was not readily available, the use of available allele frequencies and population size assuming Hardy–Weinberg equilibrium provided a reasonable estimation of qualifying variant carrying individuals, based on comparing estimation within our cases to exact counts. Furthermore, if skewing occurred it is an overestimation of control counts, furthering support of the nominally significant findings focused on here. Overall, we have provided a unique approach in the use of publically available datasets as controls in large sequencing projects. This approach is of particular use when studying rare diseases that typically preclude a case‐control study, but must be limited to disorders readily apparent at birth to avoid inclusion in control exome sets. By matching strict data filtering parameters to those of publicly available datasets, we provide a means of harnessing this information in similar risk evaluation studies. Given the abundance of publicly available data, our outline of data filtering and analysis with independent case sets provides a reasonable approach to further evaluation of rare disease. Furthermore, the data set presented here represents the largest sequenced cohort of NTDs to date. Although the approach may exclude some positively associated genes, the results presented here identified genes previously associated with NTDs and provides new genes of interest in established pathways for further investigation.

CONFLICT OF INTEREST

The authors report no conflicts of interest.

AUTHOR CONTRIBUTIONS

Conceptualization—PH, CB, LH, HN, KSA; Data Curation—PH, CB, LH, MB, KSA; Formal Analysis—PH, CB, LH; Funding Acquisition and Project Administration—JH, ACM, HN, KSA; Investigation—PH, CB, LH, MB, JH; Methodology—PH, CB, LH, JH, HN, KSA; Resources—AA‐K, KSA, HN; Software—PH, CB, LH, KSA; Supervision—HN, KSA; Validation—PH, CB, LH, MB, KSA; Visualization—PH, CB, KSA; Writing—Original Draft Preparation—PH, CB; Writing—Review & Editing—PH, CB, LH, MB, AA‐K, JH, ACM, HN, KSA. Table S1‐S5 Click here for additional data file.
  39 in total

Review 1.  Single-site neural tube closure in human embryos revisited.

Authors:  Bernadette S de Bakker; Stan Driessen; Bastiaan J D Boukens; Maurice J B van den Hoff; Roelof-Jan Oostra
Journal:  Clin Anat       Date:  2017-08-29       Impact factor: 2.414

2.  Transcriptome profiling of genes involved in neural tube closure during human embryonic development using long serial analysis of gene expression (long-SAGE).

Authors:  Deidre R Krupp; Pu-Ting Xu; Sophie Thomas; Andrew Dellinger; Heather C Etchevers; Michel Vekemans; John R Gilbert; Marcy C Speer; Allison E Ashley-Koch; Simon G Gregory
Journal:  Birth Defects Res A Clin Mol Teratol       Date:  2012-07-18

3.  Trehalose prevents neural tube defects by correcting maternal diabetes-suppressed autophagy and neurogenesis.

Authors:  Cheng Xu; Xuezheng Li; Fang Wang; Hongbo Weng; Peixin Yang
Journal:  Am J Physiol Endocrinol Metab       Date:  2013-07-23       Impact factor: 4.310

Review 4.  Folate and DNA methylation: a review of molecular mechanisms and the evidence for folate's role.

Authors:  Krista S Crider; Thomas P Yang; Robert J Berry; Lynn B Bailey
Journal:  Adv Nutr       Date:  2012-01-05       Impact factor: 8.701

Review 5.  Epidemiologic and genetic aspects of spina bifida and other neural tube defects.

Authors:  Kit Sing Au; Allison Ashley-Koch; Hope Northrup
Journal:  Dev Disabil Res Rev       Date:  2010

Review 6.  Modeling neural tube defects in the mouse.

Authors:  Irene E Zohn; Anjali A Sarkar
Journal:  Curr Top Dev Biol       Date:  2008       Impact factor: 4.897

Review 7.  Maternal overweight and obesity and the risk of congenital anomalies: a systematic review and meta-analysis.

Authors:  Katherine J Stothard; Peter W G Tennant; Ruth Bell; Judith Rankin
Journal:  JAMA       Date:  2009-02-11       Impact factor: 56.272

8.  Prevalence of and Trends in Diabetes Among Adults in the United States, 1988-2012.

Authors:  Andy Menke; Sarah Casagrande; Linda Geiss; Catherine C Cowie
Journal:  JAMA       Date:  2015-09-08       Impact factor: 56.272

9.  Analysis of protein-coding genetic variation in 60,706 humans.

Authors:  Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2016-08-18       Impact factor: 49.962

10.  Mouse Genome Database (MGD) 2019.

Authors:  Carol J Bult; Judith A Blake; Cynthia L Smith; James A Kadin; Joel E Richardson
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

View more
  3 in total

Review 1.  Unraveling the complex genetics of neural tube defects: From biological models to human genomics and back.

Authors:  Paul Wolujewicz; John W Steele; Julia A Kaltschmidt; Richard H Finnell; Margaret Elizabeth Ross
Journal:  Genesis       Date:  2021-10-29       Impact factor: 2.487

2.  Human myelomeningocele risk and ultra-rare deleterious variants in genes associated with cilium, WNT-signaling, ECM, cytoskeleton and cell migration.

Authors:  K S Au; L Hebert; P Hillman; C Baker; M R Brown; D-K Kim; K Soldano; M Garrett; A Ashley-Koch; S Lee; J Gleeson; J E Hixson; A C Morrison; H Northrup
Journal:  Sci Rep       Date:  2021-02-11       Impact factor: 4.379

Review 3.  Multifactorial Rare Diseases: Can Uncertainty Analysis Bring Added Value to the Search for Risk Factors and Etiopathogenesis?

Authors:  Domenica Taruscio; Alberto Mantovani
Journal:  Medicina (Kaunas)       Date:  2021-01-28       Impact factor: 2.430

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.