Literature DB >> 30166421

Targeted Gene Sequencing in Children with Crohn's Disease and Their Parents: Implications for Missing Heritability.

Jiun-Sheng Chen^1,2, Fulan Hu^1,3, Subra Kugathasan⁴, Lynn B Jorde⁵, David Nix⁶, Ann Rutherford⁷, Lee Denson⁸, W Scott Watkins⁵, Sampath Prahalad⁴, Chad Huff^9,2, Stephen L Guthery¹⁰.

Abstract

Crohn's disease is a complex genetic trait characterized by chronic relapsing intestinal inflammation. Genome wide association studies (GWAS) have identified more than 170 loci associated with the disease, accounting for ∼14% of the disease variance. We hypothesized that rare genetic variation in GWAS positional candidates also contribute to disease pathogenesis. We performed targeted, massively-parallel sequencing of 101 genes in 205 children with Crohn's disease, including 179 parent-child trios and 200 controls, both of European ancestry. We used the gene burden test implemented in VAAST and estimated effect sizes using logistic regression and meta-analyses. We identified three genes with nominally significant p-values: NOD2, RTKN2, and MGAT3 Only NOD2 was significant after correcting for multiple comparisons. We identified eight novel rare variants in NOD2 that are likely disease-associated. Incorporation of rare variation and compound heterozygosity nominally increased the proportion of variance explained from 0.074 to 0.089. We estimated the population attributable risk and total heritability of variation in NOD2 to be 32.9% and 3.4%, respectively, with 3.7% and 0.25% accounted for by rare putatively functional variants. Sequencing probands (as opposed to genotyping) to identify rare variants and incorporating phase by sequencing parents can recover a portion of the missing heritability of Crohn's disease.

Entities: Disease Gene Mutation Species

Keywords: Crohn’s disease; NOD2; case-control study; de novo mutation; rare variant

Mesh：

Substances：

Year: 2018 PMID： 30166421 PMCID： PMC6118318 DOI： 10.1534/g3.118.200404

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Crohn’s disease is an idiopathic inflammatory bowel disease (IBD) characterized by chronic and relapsing inflammation of the gastrointestinal tract. Crohn’s disease is believed to be caused by an abnormal immune response to intestinal microbiota in genetically susceptible individuals (Sartor 2006; Sartor 2008; Kostic ; Sartor and Wu 2017). Twin concordance and familial aggregation studies overwhelmingly support a significant genetic contribution to Crohn’s disease (Tysk ; Monsén ; Orholm ; Meucci ; Thompson ; Russel ; Orholm ; Halfvarson ; Reynisdottir ; Guthery ), and genome wide association studies (GWAS) have identified susceptibility loci in affected individuals of European (Duerr ; Hampe ; Parkes ; Rioux ; Wellcome Trust Case Control Consortium 2007; Barrett ; Kugathasan ; Imielinski ; Franke ; Liu ; Jostins ), Asian (Yamazaki ; Umeno ), and African (Brant ) ancestry. Pan-ancestry GWAS have thus far identified more than 170 loci associated with Crohn’s disease (Liu ). These loci are enriched for genes involved in important pathways of Crohn’s disease, including innate immunity, autophagy, and the IL-23-TH17 pathway(Ramjeet ; Corridoni ). For most of the >170 common variant associations implicated in Crohn’s disease, the causal variants responsible for disease association are unknown. Moreover, currently identified variants account for only ∼14% of the variance explained for Crohn’s disease (Barrett ; Franke ; Jostins ). Rare variants are potentially more deleterious than common variation due to the effects of natural selection (Eyre-Walker and Keightley 2007). Although they are individually rare, they may in aggregate contribute to Crohn’s disease and other complex genetic diseases (Eichler ; Veltman and Brunner 2012). Indeed, multiple exome sequencing studies have identified rare and highly penetrant mutations in multiple genes that cause chronic intestinal inflammation (Glocker ; Rivas ; Worthey . ; Alangari ; Christodoulou ; Dinwiddie ; Avitzur ; Dhillon ; Hu ; Okou ; Kelsen ; Hong ). Moreover, the contribution of de novo variation to Crohn’s disease susceptibility is currently unknown. To further explore the contribution of rare variants and de novo mutations to Crohn’s disease susceptibility, we conducted targeted sequencing of 101 genes identified by GWAS in 205 childhood onset patient-offspring trios and 200 unaffected controls. To test for rare variant association in each gene, we conducted gene-based burden tests using VAAST and de novo mutation association tests using VARPRISM(Kong ; Hu ). Both tests incorporate functional variant prediction information to improve statistical power. Collectively, this study not only reaffirms the role of common variation in NOD2 as the primary genetic risk for Crohn’s disease but also defines the role of rare variation in NOD2 in the causation of Crohn’s disease.

Materials and Methods

The purpose of this study was to identify highly penetrant, rare deleterious variants, to estimate the effect size of those variants in combination with common variants, and to estimate the impact of phase on effect size. Measures of effect size included relative risk estimates, percent of variance of disease explained, population attributable risk, and total heritability. Our approach included the identification of Crohn’s-affected children, their parents and a cohort of healthy controls. Prior to sequencing, we used genotyping arrays and principal components analysis to assure that cases and controls were of similar genetic ancestry. We then performed rare variant association tests and single marker association tests. We used logistic regression to estimate percent of variance explained, and meta-analyses to estimate population attributable risk.

Subjects

The study included 205 Crohn’s disease cases with disease onset 18 years or younger. For 179 of these cases, trio sets were obtained by collecting samples from both parents. All subjects were phenotyped by one of three pediatric gastroenterologists with expertise in Crohn’s disease (SLK, LD, SLG). 60.1% of the subjects were male, and the median age of diagnosis was 11 years (see supplementary Figure S1). Controls were selected from healthy adult volunteers who had neither personal nor family history of autoimmune disease and no chronic diarrhea. The control group had a median age at enrollment of 23 years and 59.5% were male. The healthy adult controls have a greater time at risk than affected children, and therefore are less likely to carry disease-causing alleles. Additionally, phlebotomy in healthy children—especially under the age of 12 years–is difficult to justify ethically. Therefore, we deemed healthy adult controls as more appropriate controls. DNA was obtained from whole blood using standard procedures. To identify population structure in cases and controls prior to sequencing, we generated 185,337 raw genotypes obtained for each study subject using the Immunochip genotyping array (Cortes and Brown 2011). To examine stratification in the data, the Immunochip data were merged with unrelated HapMap samples (60 CEU, 60 YRI, and 90 CHB/JPT). The overlap between data sets consisted of 17,119 SNPs. To eliminate platform-specific differences, A/T and G/C SNPs were removed. Additionally, SNPs with a minor allele frequency (MAF) less than 0.05 were removed. There were 13,549 SNPs remaining after all filtering. To select a genetically matched subset of samples for sequencing, we selected samples falling within three standard deviations of the centroid for the CEU HapMap cluster in a principal components analysis (PCA), as indicated by the red circle in Figure S2. The 200 control samples, who had no family nor personal history of autoimmunity, were randomly selected from 500 individuals for whom Immunochip data were available. Figure S2 displays the PCA with the final control set of 200 European Immunochip samples and the HapMap samples.

Targeted capture and sequencing

We selected the coding regions of 97 genes for targeted sequencing based on their proximity to well-established GWAS associations. We also selected STAT1 and STAT4 because of our identification of STAT1 as a cause of IPEX-like enteropathy and as positional candidates in a GWAS (Hu ), and similarly, STAT3 and JAK2 because of their apparent involvement in Crohn’s disease pathophysiology(Willson ). The full list of 101 genes is presented in Table S1. We used Agilent SureDesign to design probes for targeted enrichment using the HaloPlex Target Enrichment System. The targeted regions constituted 1,076 targets and 17,912 amplicons for a total of 497.3 kb of coding sequence. Genomic DNA was diluted and digested using restriction enzymes. DNA fragments were hybridized to HaloPlex probes and sample indexes, captured using streptavidin beads, and ligated into circulated fragments per the manufacturer’s protocol. Target libraries were amplified and purified, and indexed samples were pooled for multiplexed sequencing. Sequencing libraries (25 pM) were chemically denatured and applied to an Illumina HiSeq v4 paired end flow cell using an Illumina cBot. Hybridized molecules were clonally amplified and annealed to sequencing primers with reagents from an Illumina HiSeq PE Cluster Kit v4-cBot (PE-401-4001). Following transfer of the flowcell to an Illumina HiSeq 2500 instrument (HCS v2.2.38 and RTA v1.18.61), a 125-cycle paired-end sequence run was performed using HiSeq SBS Kit v4 sequencing reagents (FC-401-4003).

Bioinformatics and statistical analysis

Variant calling and quality control:

We processed targeted sequencing data from both case and control groups using GATK, which is a toolkit for SNV and inertion/deletion calling (McKenna ). In brief, we used GATK HaplotypeCaller 3.2 to first obtain individual genotype calls, generating one gvcf file for each individual. We then used HaplotypeCaller to combine the gvcf files and jointly recall genotypes for all individuals. We filtered the joint-called variants to the targeted region and recalibrated according to the variants’ quality score (VQSR). We used 99.9 as the VQSR score cutoff for SNVs. We did not apply VQSR filtering to insertions and deletions (Indels) due to their limited number. We excluded all sites with missing genotype rates of 10% or greater in either the cases or controls. In total, 6,323 SNVs, 217 insertions, and 335 deletions were carried forward for subsequent analyses. We tested for cryptic relationships using the relationship inference software KING(Manichaikul ) and identified no first- through third-degree relationships among individuals in the study.

Association test and single-marker tests:

To evaluate evidence for rare variant association, we performed gene-based tests using the Variant Annotation, Analysis and Search Tool (VAAST), version 2.1(Hu ). VAAST incorporates phylogenetic conservation and amino acid substitution information together with allele frequency differences between cases and controls to identify the set of variants in a given gene that are most likely to directly influence disease risk. A positive VAAST score indicates that the combined allele frequency and functional prediction evidence for the variant is sufficient to infer causality based on the Akaike Information Criterion. We constrained the population MAF of each variant in the likelihood model to 0.05 or lower (parameter –r 0.05). We restricted the VAAST analysis to the subset of 63 genes with five or more allele copies among cases and controls, which is the minimum number of allele copies needed to potentially achieve a p value of less than 0.05. For rare variants with MAF less than 1% in the controls, we conducted single-marker tests and estimated variant-specific odds ratios using non-Finnish European controls from the Exome Aggregation Consortium (ExAC) database(Lek ).

Estimating effect size:

Using pre-specified covariates, we modeled the association between variants in NOD2 and Crohn’s disease using logistic regression. We used two models to quantitatively investigate the variance of disease explained by common variants only, and common variants with additional rare variants and parental information. In the first model, we included the following covariates: heterozygosity for each of the L1007fs, R702W, and G908R alleles, and homozygosity for any one of these alleles. These data would be similar to the data obtained from a genotyping study without parental information. In the second model, we included the following covariates: heterozygosity for each of the L1007fs, R702W, and G908R alleles, homozygosity for any one of these alleles, any compound heterozygote (common and rare), and heterozygosity for any of the rare variants we deemed putatively disease-associated. This second model takes advantage of more information obtained by sequencing parents and rare variants obtained by sequencing. For both models, we included R702W because of its widely reported association with Crohn’s disease, although the variant was marginally significant in this study. Because of the quasi-separation noted for rare variants in cases, we used the method of Firth to estimate the maximum likelihood (Firth 1993). We approximated the variance explained from each model by calculating R2 using the method of Tjur (Tjur 2009). For the meta-analysis of R702W, L1007fs and G908R, we included data from 89, 90 and 96 published articles(for R702W, G908 and L1007fs, respectively) with a combined total of 23,703 cases and 22,873 controls, which were published before December 31st, 2016(D’Addabbo ; Schoultz ; Yazdanyar ; Csöngei ; Dassopoulos ; Gazouli ; Glas ; Hoffmann ; Lacher ; Sventoraityte ; Yazdanyar ; Naderi ; Adeyanju ; Azzam ; Hama ; Kanaan ; Akyuz ; Meddour ; Boukercha ; Salkic ), and conducted the analysis in R using the ‘meta’ package. We calculated the odds ratios for the three variants in the meta-analysis using DerSimonian-Laird statistics as a random-effects model.

Population Attributed Risk and total heritability estimate:

We calculated the population attribute risk (PAR) for every potential risk variant using its odds ratio (OR) and allele frequency as the proportion of exposed subjects (Pe) from ExAC using the following formulas to account for the contribution from heterozygotes and homozygotes.(Witte ). We converted the odds ratio to the relative risk (RR) using the Crohn’s disease prevalence (P0) of 34.7 cases per 100,000 children(Kappelman ):In this equation, RRBb and RRBB are relative risk of heterozygotes and homozygotes for putative risk variant B. We calculated RRBB under the assumption of a multiplicative model(Witte ). We calculated the locus-wide PAR and its confidence interval using a multiplicative model rather than sum of individual PARs to obtain a conserved PAR(Natarajan ; Goyal 2013). Finally, we also computed total heritability for putative variants using a web application, INDI-V (http://cnsgenomics.com/shiny/INDI-V/)(Witte ).

De novo mutation analysis:

In addition to the variant-calling procedures outlined above, we also applied filtering criteria for de novo mutation calls(Kong ). We also applied two additional de novo mutation filters, excluding all variants reported in ExAC with MAF greater than 0.5% in any population and all variants present in two or more individuals among our cases and controls. We conducted de novo mutation association analysis using VARPRISM, which integrates functional variant prioritization information to identify de novo mutations influencing disease risk (Hu ) with the mutation rates in SNVs and indels of 1.2x10−8 and 4x10−10 per generation, respectively(Lynch 2010; Ségurel ). Default settings were used for all other VARPRISM parameters.

Data Availability

All sequences are available from the NCBI dbGap under BioProject accession phaXXXXX. Supplemental material available at Figshare: https://doi.org/10.25387/g3.6713156.

Results

Replication of established risk variants and identification of putative novel risk variants

Our gene-based rare variant case-control analysis identified three genes with nominally significant p-values: NOD2, RTKN2, and MGAT3 (Table 1 and Supplementary Table S3.). Only NOD2 was significant after Bonferroni correction (P = 1.53x10−5 x 63 = 0.00096). We identified 14 potential risk variants with positive VAAST scores in NOD2, including three well-established risk variants with MAFs of 1% or greater and 11 novel rare variants with MAFs less than 0.5% (Table 2). In addition to the 11 rare variants in NOD2, we also identified two variants with positive VAAST scores in RTKN2, and three in MGAT3 (Table 2.).

Table 1

VAAST results for genes with nominal p-value less than 0.05

Gene	VAAST p-value	VAAST p-value after Bonferroni correction
NOD2	1.53x10⁻⁵	0.0009639
RTKN2	0.00165	0.10395
MGAT3	0.0147	0.9261

Table 2

Rare coding variants identified in NOD2, RTKN2, and MGAT3

	PolyPhen2 annotation	PolyPhen2 value	SIFT annotation	SIFT value	CADD	Allele frequencyin cases	Allele frequency in ExAC and controls	p-value
NOD2 Variants
R702W	benign	0.008	deleterious	0	24.1	10.05%	3.502%	2.99E-09***
G908R	Probably damaging	1	deleterious	0	31	4.90%	1.407%	2.40E-06***
L1007fs	NA	NA	NA	NA	35	9.07%	2.020%	9.27E-14***
S47L	benign	0.98	deleterious	0.1	21.3	0.24%	0.12%	0.0886
E54K	probably damaging	0.995	deleterious	0.01	33	0.24%	0.003%	0.01825*
D291G	probably damaging	0.873	deleterious	0.01	23	0.25%	0.004%	0.02418*
R393C	probably damaging	0.996	deleterious	0	23.7	0.25%	0.004%	0.02396*
R420H	benign	0.082	tolerated	0.21	13.99	0.25%	0.001%	0.01209*
S431L	probably damaging	0.166	deleterious	0.03	21.4	0.24%	0.002%	0.401
P668fs	NA	NA	NA	NA	22.8	0.25%	0.002%	0.01220*
R791Q	benign	0.006	tolerated	0.22	0.074	0.49%	0.004%	0.2586
R830L	benign	0.006	tolerated	0.13	23.3	0.25%	0.001%	0.01212*
C842R	probably damaging	0.646	deleterious	0	23.4	0.25%	0.004%	0.02406*
C973R	benign	0.086	tolerated	0.36	16.96	0.25%	0.000%	0.00616^**a
RTKN2 Variants
A328fs	NA	NA	NA	NA	22.5	12.01%	16.399%	0.01561*
G573S	benign	0.001	tolerated	1	0.002	0.49%	0.106%	0.07243
MGAT3 Variants
D499H	benign	0.409	tolerated	0.15	18.52	0.73%	0.670%	0.7559
R525W	benign	0.116	deleterious	0.01	22.4	0.25%	0.051%	0.194
R162Q	benign	0.004	tolerated	0.54	9.76	0.25%	0.024%	0.1319

P < 0.05, **P < 0.01, ***P < 0.001; aThe p-value was estimated using the total NFE sample number at the nearby variant (chr16:50759433).

P < 0.05, **P < 0.01, ***P < 0.001; aThe p-value was estimated using the total NFE sample number at the nearby variant (chr16:50759433). Given the rarity of these variants, we did not have sufficient power to evaluate each variant individually among our cases and controls. Therefore, we incorporated allele counts from the non-Finnish European sample (NFE) in the ExAC data together with our controls to conduct single marker association tests (Fisher’s Exact Test). The individual variant results should be interpreted with caution given the potential for systematic bias between cases and controls resulting from both population stratification and heterogeneous sequencing technologies. We identified 11 additional rare variants in NOD2 (S47L, E54K, D291G, R393C, R420H, S431L, P668fs, R791Q, R830L, C842R and C973R) with positive VAAST scores suggesting that these rare variants occur in evolutionarily conserved regions of NOD2 and result in more severe amino acid substitutions. These variants are novel potential Crohn’s disease risk variants, although S431L and R791Q were previously reported as likely benign variants in ClinVar with limited evidence of pathogenicity. Eight of the 11 variants were significant when we performed the single variant test using the allele counts in cases, and controls combined with ExAC data (P < 0.05): E54K, D291G, R393C, R420H, P668fs, R830L, C842R and C973R (Table 2). Although S431L was not significant, its odds ratio (OR:1.97; CI: 0.05-11.4) was comparable with a previous report (OR:1.45; CI:1.1-2.0) (Rivas ). Six of the eight variants were identified as potentially damaging or deleterious by one or more functional prediction tools (Table 2)(Ng and Henikoff 2003; Adzhubei ; Kircher ). The scaled CADD score for each of these six variants was greater than 20, which represents the top 1% of predicted deleterious variants in the human genome. Nine of the 12 NOD2 missense variants identified by VAAST were located within the boundaries of three NOD2 domains: caspase recruitment domain (CARD), neuronal apoptosis inhibitor protein (NACHT) and leucine-rich-repeat (LRR) domain (Figure 1).

Figure 1

Location of risk variants in the NOD2 protein. Green dots indicate the well-established Crohn disease variants; Red dots indicate the potential risk rare variants found in the VAAST result.

Location of risk variants in the NOD2 protein. Green dots indicate the well-established Crohn disease variants; Red dots indicate the potential risk rare variants found in the VAAST result. We observed two protein-coding de novo mutations among the 179 affected offspring, in NOD2 and C11orf30. We tested for enrichment of de novo mutations using VARPRISM (Kong ); both NOD2 and C11orf30 were nominally significant, with p-values of 0.0157 and 0.0175, respectively (Table 3).

Table 3

VARPRISM de novo mutation results

Transcript ID	Gene name	p-value	Number mutations	Genome Coordinate	Reference base	Alternative base
NM_022162	NOD2	0.015682657	1	chr16:50744999	C	T
NM_020193	C11orf30	0.017465753	1	chr11:76248871	C	G

Replication of the common disease risk variant ATG16L1 T300A

Our single-marker tests replicated the association with the common susceptibility variant T300A (rs2241880) in ATG16L1 (P = 0.0085). We also identified a rare coding variant in ATG16L1 with an OR of 2.0, R258Q, although the variant was not statistically significant in the single marker test (Table 2). R258Q is a potential rare risk variant in Crohn’s disease because it was predicted to be a probably damaging variant in PolyPhen2 and had a high CADD score of 27. However, given the limited observations in our cases, further studies are required to understand the role of rare variants in ATG16L1 in Crohn’s disease.

Effect size estimated for R702W, G908R and L1007fs in NOD2

Logistic regression models are shown in Table 4. In the first model, heterozygosity for G908R and L1007fs was associated with a 7.2- and 3.1-fold increased risk of Crohn’s disease, respectively. R702W heterozygosity was nominally associated with Crohn’s disease, and homozygosity for this allele was associated with a 6.5-fold increased risk of Crohn’s disease. The second model incorporated haplotype phase and rare variants. Heterozygosity for L1007fs, G908R and homozygosity for common variants remained statistically associated with Crohn’s disease. Compound heterozygosity was associated with a 6.2-fold increased risk of Crohn’s disease. The coefficient of discrimination (an approximation of R2 for rare events) for model one was 0.074 and for model two was 0.089, suggesting the ability to improve explained variance when incorporating rare variants and phase into explanatory models evaluating the association between variants in NOD2 and Crohn’s disease.

Table 4

Logistic regression for three NOD2 established variants

Variable	Case	Control	OR	Adjusted OR¹	Adjusted p
Model 1: R²= 0.074
L1007fs (heterozygous)	29 (14.2%)	10 (5%)	3.1 (1.5-6.6)	3.1 (1.4-6.5)	0.004**
R702W (heterozygous)	29 (14.2%)	19 (9.5%)	1.6 (0.8-2.9)	1.6 (0.9-3.0)	0.14
G908R (heterozygous)	16 (7.8%)	2 (1.0%)	8.4 (1.9-36.9)	7.2 (1.8-28.7)	0.006**
Any homozygote	12 (5.9%)	2 (1.0%)	6.2 (1.4-27.9)	6.5 (1.6-27.1)	0.01*
Model 2: R² = 0.089
L1007fs (heterozygous)	21 (10.2%)	8 (4%)	2.7 (1.2-6.3)	3.4 (1.5-8.0)	0.004**
G908R (heterozygous)	11 (5.4%)	4 (1%)	5.6 (1.2-25.7)	6.3 (1.5-26.4)	0.01*
R702W (heterozygous)	20 (9.8%)	17 (8.5%)	1.2 (0.6-2.3)	1.6 (0.8-3.2)	0.18
Common homozygote	12 (5.9%)	2 (1.0%)	6.2 (1.4-27.9)	6.8 (1.6-28.3)	0.008**
Compound heterozygote	12 (5.9%)	2 (1.0%)	6.2 (1.4-27.9)	6.8 (1.6-28.3)	0.008**
Rare heterozygote	5 (2.4%)	0 (0%)	11 (0.46-264)^a	14.9 (0.6-360.5)	0.095

P < 0.05, **P < 0.01, ***P < 0.001; a unadjusted OR for rare heterozygote and adjusted OR for all covariates are using the method of Firth (Firth 1993)

P < 0.05, **P < 0.01, ***P < 0.001; a unadjusted OR for rare heterozygote and adjusted OR for all covariates are using the method of Firth (Firth 1993) We further refined the effect size estimates of R702W, G908R and L1007fs by conducting a meta-analysis of our results together with prior studies (Yazdanyar )(see Supplemental Table S4-S6 for lists of studies). The per-allele odds ratios relative to healthy controls for R702W, G908R and L1007fs were 2.23, 2.58, and 3.83, respectively. The odds ratio for compound heterozygotes and homozygotes of the three variants was 7.31.

Population attributable risk and total heritability for NOD2

We evaluated the population attributable risk (PAR) for the 14 coding variants in NOD2 with MAF less than 5% (Table 2). We estimated that the three variants with MAF of 1% or greater (R702W, G908R and L1007fs) collectively confer 30.3% PAR (CI:17.0–45.2%) and 3.1% total heritability to Crohn’s disease. This PAR estimate was comparable with previous estimates from prior estimates ranging from 26.7 to 33.2% (Hugot ; Abreu ; Ahmad ; Brant ). For rare variants with MAF less than 0.5%, we restricted our analysis to the eight variants listed in Table 2 with significantly elevated allele frequencies in our cases relative to ExAC based on Fisher’s Exact Test. Together, these rare variants confer approximately 3.7% additional PAR (CI: 1.18–9.8%) and 0.25% total heritability explained. The total estimated PAR and total heritability for NOD2 variants identified in our study were thus 32.9% and 3.4%, respectively.

Discussion

We performed a sequence-based case-control study in 205 European children with Crohn’s disease incorporating 101 genes. Our study has three major findings. First, we again demonstrate that common and rare variation in NOD2 represents the most important genetic risk factor for Crohn’s disease. Second, by sequencing trios, we identified compound heterozygosity and de novo mutations as potentially important sources for missing heritability. Third, we demonstrate the utility of applying gene burden tests such as VAAST to assess both common and rare variation in a complex genetic trait. Collectively, these data support the concept that the use of gene burden tests for complex traits and methods to incorporate phase such as sequencing parents and probands (as opposed to genotyping) can recover a portion of the “missing heritability” for Crohn’s disease. The mechanistic relationship between gene variants in NOD2 and Crohn’s disease pathogenesis is unclear, but could involve defective epithelial defenses, disruption of Paneth cell function, alterations in the gut microbial community, and/or defects in autophagy (reviewed in (Al Nabhani ; Feerick and Mckernan 2017)). VAAST scored 11 variants in three key domains of NOD2: six in the LRR domain, two in the CARD domain, and three in the NACHT domain, including one de novo mutation. The NACHT domain is the same domain that harbors disease-causing mutations found in the majority of patients with Blau syndrome (Sfriso ), a Mendelian disorder characterized by granulomatous arthritis, iritis and dermatitis. We speculate that patients with Crohn’s disease and mutations in the NACHT domain may represent an important, albeit uncommon, subgroup of patients potentially with phenotypic variability, variation in penetrance estimates for mutations in this region, variation in response to therapy, and/or intestinal disease distribution. Functional studies incorporating biological samples from subjects with NACHT domain variants, as well as a large cohort of such patients, are needed to test these hypotheses. Our data suggest that at least some of the “missing heritability” for Crohn’s disease is explained by rare variants and haplotype phase. Unphased genotyping of the canonical variants in NOD2 were assessed in our first logistic regression model, while we incorporated the phased data obtained from both parents and ascertainment of rare variants by sequencing in our second model, increasing the proportion of explained variance from 0.074 to 0.089. Similarly, PAR and total heritability estimates are increased from 30.3 to 32.9% and from 3.1 to 3.4%, respectively, when incorporating rare variants. We acknowledge that our sample size is too small to fully characterize the contribution of extremely rare genetic variation in Crohn’s disease. However, we show the feasibility of using gene burden tests for Crohn’s disease sequencing studies by demonstrating that VAAST has the ability to uncover statistical association in NOD2 among 63 genes and only 205 cases. We demonstrate the utility of using gene-based association tests rather than variant-based association tests as the field moves toward large-scale exome and whole genome approaches.

89 in total

1. A genetic contribution to inflammatory bowel disease in Iceland: a genealogic approach.

Authors: Inga Reynisdottir; Daniel F Gudbjartsson; Johann Heidar Johannsson; Ileana Manolescu; Kristleifur Kristjansson; Kari Stefansson; Jeffrey Gulcher; Sigurdur Bjornsson
Journal: Clin Gastroenterol Hepatol Date: 2004-09 Impact factor: 11.382

2. LPS-responsive beige-like anchor (LRBA) gene mutation in a family with inflammatory bowel disease and combined immunodeficiency.

Authors: Abdullah Alangari; Abdulrahman Alsultan; Nouran Adly; Michel J Massaad; Iram Shakir Kiani; Abdulrahman Aljebreen; Emad Raddaoui; Abdul-Kareem Almomen; Saleh Al-Muhsen; Raif S Geha; Fowzan S Alkuraya
Journal: J Allergy Clin Immunol Date: 2012-06-19 Impact factor: 10.793

3. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene.

Authors: Richard H Duerr; Kent D Taylor; Steven R Brant; John D Rioux; Mark S Silverberg; Mark J Daly; A Hillary Steinhart; Clara Abraham; Miguel Regueiro; Anne Griffiths; Themistocles Dassopoulos; Alain Bitton; Huiying Yang; Stephan Targan; Lisa Wu Datta; Emily O Kistner; L Philip Schumm; Annette T Lee; Peter K Gregersen; M Michael Barmada; Jerome I Rotter; Dan L Nicolae; Judy H Cho
Journal: Science Date: 2006-10-26 Impact factor: 47.728

4. 'Nodophagy': New crossroads in Crohn disease pathogenesis.

Authors: Mahendrasingh Ramjeet; Séamus Hussey; Dana J Philpott; Leonardo H Travassos
Journal: Gut Microbes Date: 2010-08-08

5. Common NOD2 risk variants in African Americans with Crohn's disease are due exclusively to recent Caucasian admixture.

Authors: Oloruntosin Adeyanju; David T Okou; Clifton Huang; Archana Kumar; Cary Sauer; Courtney Galloway; Mahadev Prasad; Jon Waters; David J Cutler; Michael E Zwick; Tanvi Dhere; Subra Kugathasan
Journal: Inflamm Bowel Dis Date: 2012-03-22 Impact factor: 5.325

6. NOD2/CARD15, ATG16L1 and IL23R gene polymorphisms and childhood-onset of Crohn's disease.

Authors: Maria Gazouli; Ioanna Pachoula; Ioanna Panayotou; Gerassimos Mantzaris; George Chrousos; Nicholas P Anagnou; Eleftheria Roma-Giannikou
Journal: World J Gastroenterol Date: 2010-04-14 Impact factor: 5.742

7. Loci on 20q13 and 21q22 are associated with pediatric-onset inflammatory bowel disease.

Authors: Subra Kugathasan; Robert N Baldassano; Jonathan P Bradfield; Patrick M A Sleiman; Marcin Imielinski; Stephen L Guthery; Salvatore Cucchiara; Cecilia E Kim; Edward C Frackelton; Kiran Annaiah; Joseph T Glessner; Erin Santa; Tara Willson; Andrew W Eckert; Erin Bonkowski; Julie L Shaner; Ryan M Smith; F George Otieno; Nicholas Peterson; Debra J Abrams; Rosetta M Chiavacci; Robert Grundmeier; Petar Mamula; Gitit Tomer; David A Piccoli; Dimitri S Monos; Vito Annese; Lee A Denson; Struan F A Grant; Hakon Hakonarson
Journal: Nat Genet Date: 2008-08-31 Impact factor: 38.330

8. VAAST 2.0: improved variant classification and disease-gene identification using a conservation-controlled amino acid substitution matrix.

Authors: Hao Hu; Chad D Huff; Barry Moore; Steven Flygare; Martin G Reese; Mark Yandell
Journal: Genet Epidemiol Date: 2013-07-08 Impact factor: 2.135

9. Common variants at five new loci associated with early-onset inflammatory bowel disease.

Authors: Marcin Imielinski; Robert N Baldassano; Anne Griffiths; Richard K Russell; Vito Annese; Marla Dubinsky; Subra Kugathasan; Jonathan P Bradfield; Thomas D Walters; Patrick Sleiman; Cecilia E Kim; Aleixo Muise; Kai Wang; Joseph T Glessner; Shehzad Saeed; Haitao Zhang; Edward C Frackelton; Cuiping Hou; James H Flory; George Otieno; Rosetta M Chiavacci; Robert Grundmeier; Massimo Castro; Anna Latiano; Bruno Dallapiccola; Joanne Stempak; Debra J Abrams; Kent Taylor; Dermot McGovern; Gary Silber; Iwona Wrobel; Antonio Quiros; Jeffrey C Barrett; Sarah Hansoul; Dan L Nicolae; Judy H Cho; Richard H Duerr; John D Rioux; Steven R Brant; Mark S Silverberg; Kent D Taylor; M Michael Barmuda; Alain Bitton; Themistocles Dassopoulos; Lisa Wu Datta; Todd Green; Anne M Griffiths; Emily O Kistner; Michael T Murtha; Miguel D Regueiro; Jerome I Rotter; L Philip Schumm; A Hillary Steinhart; Stephen R Targan; Ramnik J Xavier; Cécile Libioulle; Cynthia Sandor; Mark Lathrop; Jacques Belaiche; Olivier Dewit; Ivo Gut; Simon Heath; Debby Laukens; Myriam Mni; Paul Rutgeerts; André Van Gossum; Diana Zelenika; Denis Franchimont; J P Hugot; Martine de Vos; Severine Vermeire; Edouard Louis; Lon R Cardon; Carl A Anderson; Hazel Drummond; Elaine Nimmo; Tariq Ahmad; Natalie J Prescott; Clive M Onnie; Sheila A Fisher; Jonathan Marchini; Jilur Ghori; Suzannah Bumpstead; Rhian Gwillam; Mark Tremelling; Panos Delukas; John Mansfield; Derek Jewell; Jack Satsangi; Christopher G Mathew; Miles Parkes; Michel Georges; Mark J Daly; Melvin B Heyman; George D Ferry; Barbara Kirschner; Jessica Lee; Jonah Essers; Richard Grand; Michael Stephens; Arie Levine; David Piccoli; John Van Limbergen; Salvatore Cucchiara; Dimitri S Monos; Stephen L Guthery; Lee Denson; David C Wilson; Straun F A Grant; Mark Daly; Mark S Silverberg; Jack Satsangi; Hakon Hakonarson
Journal: Nat Genet Date: 2009-11-15 Impact factor: 38.330

10. VARPRISM: incorporating variant prioritization in tests of de novo mutation association.

Authors: Hao Hu; Hilary Coon; Man Li; Mark Yandell; Chad D Huff
Journal: Genome Med Date: 2016-08-25 Impact factor: 11.117

1 in total

Review 1. The missing heritability in type 1 diabetes.

Authors: Haipeng Pang; Jian Lin; Shuoming Luo; Gan Huang; Xia Li; Zhiguo Xie; Zhiguang Zhou
Journal: Diabetes Obes Metab Date: 2022-06-13 Impact factor: 6.408

1 in total