
Genetic variants and pathways implicated in a pediatric inflammatory bowel disease cohort.

Subra Kugathasan, Michael E Zwick, Kelly A Shaw, David J Cutler, David Okou, Anne Dodd, Bruce J Aronow, Yael Haberman, Christine Stevens, Thomas D Walters, Anne Griffiths, Robert N Baldassano, Joshua D Noe, Jeffrey S Hyams, Wallace V Crandall, Barbara S Kirschner, Melvin B Heyman, Scott Snapper, Stephen Guthery, Marla C Dubinsky, Jason M Shapiro, Anthony R Otley, Mark Daly, Lee A Denson.

Abstract

In the United States, approximately 5% of individuals with inflammatory bowel disease (IBD) are younger than 20 years old. Studies of pediatric cohorts can provide unique insights into genetic architecture of IBD, which includes Crohn's disease (CD) and ulcerative colitis (UC). Large genome-wide association studies have found more than 200 IBD-associated loci but explain a minority of disease variance for CD and UC. We sought to characterize the contribution of rare variants to disease development, comparing exome sequencing of 368 pediatric IBD patients to publicly available exome sequencing (dbGaP) and aggregate frequency data (ExAC). Using dbGaP data, we performed logistic regression for common variants and optimal unified association tests (SKAT-O) for rare, likely-deleterious variants. We further compared rare variants to ExAC counts with Fisher's exact tests. We did pathway enrichment analysis on the most significant genes from each comparison. Many variants overlapped with known IBD-associated genes (e.g. NOD2). Rare variants were enriched in CD-associated loci (p = 0.009) and showed suggestive enrichment in neutrophil function genes (p = 0.05). Pathway enrichment implicated immune-related pathways, especially cell killing and apoptosis. Variants in extracellular matrix genes also emerged as an important theme in our analysis.

Year:  2018        PMID: 29593342      PMCID: PMC6162182          DOI: 10.1038/s41435-018-0015-2

Source DB:  PubMed          Journal:  Genes Immun        ISSN: 1466-4879            Impact factor:   2.676


Introduction

Crohn’s disease (CD) and ulcerative colitis (UC) are the most common inflammatory bowel diseases (IBD) and are characterized by chronic remitting and relapsing gastrointestinal inflammation. In the United States, the prevalence of IBD for children (<20 years old) was estimated to be 92 cases per 100,000 in 2009, accounting for approximately 5% of prevalent cases [1]. Increasing prevalence [1] and rates of hospitalization [2] for pediatric IBD have been observed in the US, mirroring the trend of increasing IBD incidence in both pediatric [3, 4] and adult [5] populations worldwide. Diagnosed early in life, pediatric patients face years of medication, surveillance colonoscopy, and a high probability of surgery. Better understanding of disease etiology and progression in this group is therefore vital. IBD is thought to have a strong genetic component, since family history of IBD is the greatest risk factor for disease at all ages. IBD patients with a family history of disease often present at a younger age [6-8], are more likely to experience extra-intestinal manifestations [6], have perforating disease, and require longer follow-up compared to patients without family history [6, 7], likely reflecting an increased genetic liability to disease. Genetic analyses of pediatric cohorts are therefore useful in exploring genetic architecture of IBD. Large genome-wide association studies (GWAS) of IBD have found more than 200 common loci associated with disease [9, 10]. Pathway analysis of associated loci has found an enrichment of immune system genes, especially those related to host response to microbes, and a great deal of overlap with other immune diseases [9]. Findings of studies of common variation in pediatric IBD cohorts generally echo findings in adult populations. 
One study of greater than 1000 pediatric-onset IBD cases and 1600 controls found slightly increased odds ratios for risk alleles also found in adult populations (including the well-known NOD2), and greater burden of these common variants was weakly correlated with earlier age of onset in CD [11]. A small proportion of disease liability has been explained by common variants in IBD—13.1% in CD and 8.2% in UC [9]—but the contribution of rare variants has not been assessed. This class of genetic variation is important because explosive growth of the human population in recent history has led to a corresponding excess of rare alleles [12], and most variants in protein-coding sequence are at low frequency [13-15]. The availability of public data sets allows us to compare whole-exome sequencing (WES) of a pediatric IBD cohort to other WES data [16] and to large databases containing population allele frequency information [15, 17]. We can further examine pathways implicated by genes annotated to these rare variants to gain greater understanding of IBD.

Results

Study participant characteristics

Relevant demographic and clinical characteristics are shown in Table 1 for the 368 cases with pediatric-onset IBD (<18 years of age at diagnosis) and 625 publicly available controls from the database of Genotypes and Phenotypes (dbGaP) whose data passed our quality control filters and principal components criteria (see Methods and Supplementary Fig. 1). The characteristics of the initial cohort of 517 pediatric-onset IBD cases (see Methods) are also available in Supplementary Table 1.
Table 1

Clinical and demographic characteristics of samples with exome sequencing data used in analysis

Characteristic | IBD cases of European ancestry | ARRA controls of European ancestry | Epi4k controls of European ancestry
--- | --- | --- | ---
Age at participation: range | 0–17 | 18–84 | Ages not provided, but controls were parents of children with epilepsy
Age at participation: median | 8 | 51 | -
Age at participation: mean | 7.3 | 52 | -
Gender: female | 152 (41%) | 118 (56%) | 223 (53%)
Gender: male | 216 (59%) | 91 (44%) | 199 (47%)
Diagnosis: CD | 281 (76%) | - | -
Diagnosis: UC | 61 (17%) | - | -
Diagnosis: IBD-other | 26 (7%) | - | -

Dashes indicate not applicable


Common variants (MAF>0.05)

Using logistic regression to compare sites with minor allele frequency (MAF) > 0.05 between the 368 pediatric-onset IBD cases and 625 publicly available controls, we found no sites that reached genome-wide significance after genomic control (p < 2E-06, Figure 1 and Table 2). However, 14 out of the top 20 sites were within known CD- or IBD-associated loci (full list of loci from Jostins 2012 [9] and Liu 2015 [10] available as Supplementary Table 2). Nine variants were around the locus containing CARD9, a gene associated with both CD and UC (Supplementary Fig. 2), and three variants were near the locus containing CD-associated NOD2. Two protective variants also appeared at other CD loci in ADAM30 and NOTCH2. Genes annotated to the top 20 sites that also appeared in our list of genes involved in neutrophil function (Supplementary Table 3) included NOD2 and CARD9, which have key roles in anti-bacterial and anti-fungal functions of monocytes and macrophages.
Fig. 1

Manhattan plot of p-values from logistic regression (with significant principal components and sex as covariates) comparing frequency of exome sequencing common variants in pediatric IBD cases to controls from dbGaP

Table 2

Top 20 most significant loci found in our common variant logistic regression

Chrom | Position | ID | Alt | Type | OR | Gene | p-value | Assoc. Diagnosis, Study | Neut. gene list
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
chr17 | 41227070 | chr17_41227070 | A | DEL | 0.1111 | KRTAP9-2 | 6.21E-06 | - | -
chr16 | 50711288 | rs2066843 | T | SNP | 1.57 | NOD2 | 9.60E-06 | CD, Jostins | Yes
chr16 | 50710713 | rs2066842 | T | SNP | 1.569 | NOD2 | 1.02E-05 | CD, Jostins | Yes
chr9 | 136371953 | rs10781499 | A | SNP | 1.524 | CARD9 | 2.32E-05 | IBD, Jostins | Yes
chr19 | 35488794 | rs10410228 | T | SNP | 1.653 | KRTDAP | 2.59E-05 | - | -
chr20 | 62346665 | rs6143036 | A | SNP | 1.607 | LAMA5 | 3.27E-05 | - | -
chr1 | 119895261 | rs2641348 | G | SNP | 0.4588 | ADAM30 | 3.37E-05 | CD, Jostins | -
chr1 | 119915381 | rs6685892 | T | SNP | 0.4627 | NOTCH2 | 4.14E-05 | CD, Jostins | -
chr9 | 136372044 | rs4077515 | T | SNP | 1.499 | CARD9 | 4.49E-05 | IBD, Jostins | Yes
chr16 | 50675812 | rs6596 | A | SNP | 1.613 | SNX20 | 4.97E-05 | CD, Jostins | -
chr9 | 136395373 | rs4266763 | G | SNP | 1.496 | SNAPC4 | 5.02E-05 | IBD, Jostins | -
chr9 | 136380752 | rs3812570 | C | SNP | 1.49 | SNAPC4 | 6.04E-05 | IBD, Jostins | -
chr9 | 136380842 | rs3812571 | C | SNP | 1.486 | SNAPC4 | 6.77E-05 | IBD, Jostins | -
chr9 | 136384721 | rs10781510 | A | SNP | 1.484 | SNAPC4 | 7.74E-05 | IBD, Jostins | -
chr9 | 136404141 | rs1051957 | G | SNP | 1.451 | SDCCAG3 | 0.00016 | IBD, Jostins | -
chr9 | 136477334 | rs6560632 | C | SNP | 1.427 | SEC16A | 0.00025 | IBD, Jostins | -
chr9 | 136432987 | rs10781542 | G | SNP | 1.423 | INPP5E | 0.00029 | IBD, Jostins | -
chr21 | 46246830 | rs17183220 | T | SNP | 0.4371 | MCM3AP-AS1 | 0.00029 | - | -
chr5 | 78885600 | rs1071598 | T | SNP | 1.558 | ARSB | 0.00046 | - | -
chr12 | 128899303 | chr12_128899303 | G | DEL | 0.4497 | GLT1D1 | 0.00047 | - | -

Dashes indicate not applicable or no.


Pathway enrichment

Many of the pathways we found in our ClueGO pathway enrichment analysis that were implicated by the top 200 most significant annotated genes were immune-related (Table 3 and Fig. 2). The largest network of significant gene ontology (GO) terms included regulation of production of molecular mediators of immune response, as well as regulation of cytokine and tumor necrosis factor production. Terms related to regulation of leukocyte-mediated immunity, cytotoxicity, and apoptosis were also significant. Other associated pathways related to the theme of cell killing included positive regulation of apoptotic cell clearance and regulation of complement activation. Regulation of keratinocyte proliferation, Ras signal transduction, and muscle cell and neural crest cell development were also implicated.
Table 3

Significantly enriched pathways in the top 200 most significant genes in our common variant (dbGaP) analysis

GO ID | GO term | % pathway covered | Corrected p-value | Associated genes found
--- | --- | --- | --- | ---
GO:2000427 | Positive regulation of apoptotic cell clearance | 33 | 9E-05 | [C2, C3, CCL2]
GO:0001910 | Regulation of leukocyte-mediated cytotoxicity | 9.4 | 2E-04 | [CCL2, HLA-A, LILRB1, RASGRP1, SERPINB4]
GO:0002699 | Positive regulation of immune effector process | 4.5 | 5E-04 | [C3, CCL2, GPI, HLA-A, IL2, LILRB1, NOD2, RASGRP1]
GO:0002703 | Regulation of leukocyte-mediated immunity | 4.4 | 0.001 | [C3, HLA-A, IL2, LILRB1, NOD2, RASGRP1, SERPINB4]
GO:0055001 | Muscle cell development | 4.3 | 0.002 | [ANK2, FHOD3, GPX1, IGSF22, MYPN, RYR1, XK]
GO:0001578 | Microtubule bundle formation | 5.8 | 0.002 | [CCDC40, DNAH5, MAP1B, RP1L1, SPAG16]
GO:0002705 | Positive regulation of leukocyte-mediated immunity | 5.5 | 0.003 | [C3, HLA-A, IL2, NOD2, RASGRP1]
GO:0048747 | Muscle fiber development | 7.1 | 0.003 | [GPX1, MYPN, RYR1, XK]
GO:0014032 | Neural crest cell development | 6.9 | 0.003 | [ERBB4, JAG1, LAMA5, RET]
GO:0010927 | Cellular component assembly involved in morphogenesis | 5.2 | 0.003 | [ANK2, DAG1, FHOD3, IGSF22, MYPN]
GO:0046487 | Glyoxylate metabolic process | 9.7 | 0.004 | [AMT, LDHD, LIAS]
GO:0006081 | Cellular aldehyde metabolic process | 4.8 | 0.005 | [AMT, GPI, H6PD, LDHD, LIAS]
GO:0032680 | Regulation of tumor necrosis factor production | 4.7 | 0.005 | [CARD9, CCL2, LILRB1, NOD2, RASGRP1]
GO:0010837 | Regulation of keratinocyte proliferation | 8.8 | 0.005 | [EPPK1, STXBP4, TGM1]
GO:0001912 | Positive regulation of leukocyte-mediated cytotoxicity | 8.6 | 0.006 | [CCL2, HLA-A, RASGRP1]
GO:0030449 | Regulation of complement activation | 8.1 | 0.007 | [C2, C3, CFB]
GO:0002700 | Regulation of production of molecular mediator of immune response | 4.1 | 0.009 | [GPI, HLA-A, IL2, LILRB1, NOD2]
GO:0045214 | Sarcomere organization | 6.8 | 0.01 | [FHOD3, IGSF22, MYPN]
GO:0072676 | Lymphocyte migration | 4.8 | 0.01 | [AIRE, CCL2, GCSAML, RET]
GO:2000106 | Regulation of leukocyte apoptotic process | 4.3 | 0.02 | [IL2, LILRB1, NOD2, TP53BP1]
GO:0032649 | Regulation of interferon-gamma production | 4.2 | 0.02 | [HLA-A, IL2, LILRB1, RASGRP1]
GO:0046579 | Positive regulation of Ras protein signal transduction | 5.6 | 0.02 | [ARRB1, NOTCH2, RASGRP1]
GO:0002718 | Regulation of cytokine production involved in immune response | 4.2 | 0.04 | [HLA-A, LILRB1, NOD2]
Fig. 2

Pathway enrichment of the genes annotated to the top 200 most significant common genes tested in our logistic regression
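
To make the "% pathway covered" and enrichment p-value columns of Table 3 concrete, the following is a minimal sketch of a one-sided hypergeometric enrichment test, which is one common choice for GO term enrichment. The statistical settings used in ClueGO are not stated in this section, so the test type, the gene universe size (18,000) and the term size (9 genes) are illustrative assumptions, and the p-value here is uncorrected.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: genes in the background (assumed value in the example below)
    n_term:     genes annotated to the GO term
    n_selected: genes in the query list (e.g. the top 200 genes)
    n_overlap:  query genes that are annotated to the term
    """
    total = comb(n_universe, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def percent_covered(n_overlap, n_term):
    """Share of the term's genes that appear in the query list, in percent."""
    return 100.0 * n_overlap / n_term


# Hypothetical example: 3 query genes hit a 9-gene term (about 33% covered),
# with 200 query genes drawn from an assumed 18,000-gene background.
coverage = percent_covered(3, 9)
p_uncorrected = hypergeom_enrichment_p(18_000, 9, 200, 3)
print(f"covered: {coverage:.0f}%  uncorrected p: {p_uncorrected:.2e}")
```

Small terms can reach high coverage with only a few genes, which is why coverage and the corrected p-value are reported side by side.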


Rare variants (MAF<0.05)

Optimal unified association test (SKAT-O) analysis of rare variants

Using the same IBD and dbGaP cohorts, we tested rare variants with combined annotation dependent depletion (CADD) scores [18] greater than 10 to see if any genes were significantly enriched with these possibly pathogenic variants. The only genome-wide significant gene (p < 2E-05) was the well-known NOD2 (Table 4A). When we tested enrichment of variants in loci associated with IBD, the only significant list was the Crohn’s-disease-associated loci (p = 0.009, Table 4B). We also found a suggestive relationship between case status and rare variants in 144 genes that have been implicated in neutrophil function (p = 0.05, Table 4C).
Table 4A

Top 15 results from SKAT-O analysis of enrichment of rare, likely-pathogenic (CADD > 10) variants in genes with five or more variants

SetID | p-value | Number of variants included in gene
--- | --- | ---
NOD2 | 8.4E-12 | 15
VWA2 | 0.0006 | 7
HAPLN3 | 0.0008 | 5
LMF1 | 0.002 | 5
SOS1 | 0.002 | 5
MAGI2 | 0.002 | 7
SRRM2 | 0.002 | 13
RGS12 | 0.003 | 10
SCAF4 | 0.003 | 5
STARD13 | 0.004 | 8
RHPN2 | 0.005 | 6
D2HGDH | 0.005 | 6
G6PC2 | 0.005 | 6
NR4A1 | 0.005 | 5
EFEMP2 | 0.006 | 5
Table 4B

SKAT-O analysis for enrichment of rare variants with CADD scores >10 in loci associated with Crohn’s disease (CD), inflammatory bowel disease (IBD), or ulcerative colitis (UC)

SetID | p-value | Number of variants included in SetID
--- | --- | ---
CD | 0.009 | 522
IBD | 0.9 | 1849
UC | 0.7 | 445
Table 4C

SKAT-O analysis for enrichment of rare, conserved variants in neutrophil function genes (NEUT)

SetID | p-value | Number of variants included in SetID
--- | --- | ---
NEUT | 0.05 | 413
We re-ran the SKAT-O analysis, adding common variants with CADD scores >10 to our list of rare variants. Including these common variants did not greatly impact the significance of genes associated with case status, likely because there were relatively few variants above the CADD score cutoff at 5% frequency or greater. However, including common variants strengthened the enrichment of variants in CD genes (p = 0.004; Supplementary Table 4A) and neutrophil function genes (p = 0.03; Supplementary Table 4B).

Exome Aggregation Consortium (ExAC) rare variant analysis

Unsurprisingly, there was a great deal of inflation when we performed Fisher’s exact tests comparing rare variant counts between the 368 pediatric IBD patients and aggregate allele frequencies for Caucasian populations in the ExAC database (Supplementary Fig. 3). We therefore limited our analysis to sites that passed the stringent QC in our dbGaP analysis, and further filtered out sites in ExAC that were most significantly different from our dbGaP controls (see Methods). As seen in Fig. 3, genome-wide inflation was no longer apparent after applying these criteria. As shown in Table 5, six variants were genome-wide significant (p < 6E-07), with the most significant annotated to NOD2. Two others among the top 20 most significant variants were annotated to known IBD loci: a second variant in NOD2 and one in D2HGDH. Of our list of neutrophil function genes, only NOD2 was among the top 20 most significant rare variants.
Fig. 3

Manhattan plot of p-values from comparing frequency of exome sequencing rare variants in pediatric IBD cases to ExAC after filtering out sites most significantly different between ExAC and our control data set

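Table 5

Top 20 most significant sites in our rare variant Fisher’s exact tests

Chrom | Position | ID | Alt | Type | OR | Gene | p-value | Assoc. Diagnosis, Study | Neut. gene list
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
chr16 | 50729867 | rs796661546 | GC | INS | 4.42151 | NOD2 | 9.19E-12 | CD, Jostins | Yes
chr8 | 100712766 | chr8_100712766 | CA | INS | 33.764 | PABPC1 | 9.19E-12 | - | -
chr11 | 294540 | chr11_294540 | GC | INS | 122.892 | ATHL1 | 9.19E-12 | - | -
chr13 | 24447181 | chr13_24447181 | T | DEL | 700.496 | PARP4 | 8.90E-09 | - | -
chr9 | 101390469 | chr9_101390469 | GTA | INS | 172.978 | MRPL50 | 9.39E-08 | - | -
chr1 | 248273809 | chr1_248273809 | C | DEL | 279.84 | OR2T33 | 5.58E-07 | - | -
chr21 | 44573789 | rs9977039 | G | SNP | 5.75281 | TSPEAR | 7.43E-07 | - | -
chr10 | 29462394 | chr10_29462394 | AT | INS | Inf | SVIL-AS1 | 1.59E-06 | - | -
chr16 | 50722629 | rs2066845 | C | MULTIALLELIC | 3.4408 | NOD2 | 4.52E-06 | CD, Jostins | Yes
chr4 | 56964497 | rs17087307 | C | SNP | 0.34222 | NOA1 | 5.34E-06 | - | -
chr7 | 72713798 | rs146095374 | A | SNP | 0.25988 | TYW1B | 7.50E-06 | - | -
chr5 | 140822334 | rs61730632 | A | SNP | 2.79249 | PCDHA1 | 1.03E-05 | - | -
chr14 | 73953419 | rs778985097 | AT | INS | 10.2459 | COQ6 | 1.50E-05 | - | -
chr5 | 140875534 | rs114654172 | G | SNP | 2.70029 | PCDHA1 | 2.28E-05 | - | -
chr11 | 5544676 | rs7934354 | G | SNP | 0.17691 | OR52H1 | 2.76E-05 | - | -
chr6 | 31960262 | rs11541400 | G | SNP | 5.1923 | SKIV2L | 4.06E-05 | - | -
chr6 | 31728544 | rs139006870 | A | SNP | 5.17682 | DDAH2 | 4.17E-05 | - | -
chr15 | 49588022 | chr15_49588022 | CT | INS | Inf | FAM227B | 4.47E-05 | - | -
chr3 | 51995472 | rs371570896 | A | SNP | 77.1658 | RPL29 | 6.15E-05 | - | -
chr2 | 241767780 | rs143940595 | A | SNP | 0 | D2HGDH | 8.10E-05 | CD, Liu | -

Dashes indicate not applicable or no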
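
The per-site comparison behind Table 5 can be illustrated with a small, dependency-free sketch of a two-sided Fisher's exact test on allele counts. This is not the authors' code: the function names and the example counts are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption, since the text does not say which variant was applied. The sketch also shows one plausible way odds ratios of Inf or 0 can arise, namely when the alternate allele is absent from the reference panel or from the cases.

```python
from math import exp, inf, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, reference panel. Columns: alternate, reference allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x):
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    log_obs = log_prob(a)
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= log_obs + 1e-7:  # tables at most as probable as the observed one
            p += exp(lp)
    return min(1.0, p)


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); Inf when the denominator is zero."""
    if b * c == 0:
        return inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)


# Hypothetical counts: 368 cases contribute 736 chromosomes; the reference
# panel size of 84,930 chromosomes follows the Methods description.
print(odds_ratio(3, 733, 0, 84_930))              # inf: allele unseen in reference
print(odds_ratio(0, 736, 50, 84_880))             # 0.0: allele unseen in cases
print(fisher_exact_two_sided(3, 733, 0, 84_930))  # small p from only 3 alleles
```

Extreme odds ratios based on a handful of alleles, especially for indels, are the kind of result that calling differences between pipelines could produce.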

According to analysis in ClueGO, the top 200 most significant genes in our list of rare variants were enriched in a few pathways (Table 6 and Fig. 4). Immune-response-related hits included negative regulation of the JAK-STAT cascade, modulation by host of viral transcription, and modification by host of symbiont morphology and physiology. Genes were also enriched in pathways involving ion transmembrane transport and negative regulation of axon extension. ToppFun analysis also highlighted genes involved in response to bacterium, regulation of antigen processing and presentation of peptide antigen, immune system development, and biological adhesion pathways (Supplementary Table 5).
Table 6

Significantly enriched pathways using the list of the top 200 most significant genes in our ExAC rare variant analysis

GO ID | GO term | % pathway covered | Corrected p-value | Associated genes found
--- | --- | --- | --- | ---
GO:0043921 | Modulation by host of viral transcription | 11 | 0.006 | [HMGA2, POU2F3, PSG1]
GO:0030517 | Negative regulation of axon extension | 12 | 0.008 | [BCL11A, RTN4R, SEMA5A]
GO:0051851 | Modification by host of symbiont morphology or physiology | 5.8 | 0.01 | [HMGA2, POU2F3, PSG1, SMC3]
GO:0000288 | Nuclear-transcribed mRNA catabolic process, deadenylation-dependent decay | 5.1 | 0.01 | [EIF4A1, EIF4B, PABPC1, SKIV2L]
GO:0015698 | Inorganic anion transport | 4.1 | 0.01 | [ABCB11, ANKH, CLCN6, CLCNKB, SLC12A6, SLC26A2, SLC5A5]
GO:0046426 | Negative regulation of JAK-STAT cascade | 5.7 | 0.02 | [HMGA2, RTN4R, RTN4RL2]
GO:1902476 | Chloride transmembrane transport | 4.3 | 0.02 | [CLCN6, CLCNKB, SLC12A6, SLC26A2]
GO:1902476Chloride transmembrane transport4.30.02[CLCN6, CLCNKB, SLC12A6, SLC26A2]
Fig. 4

Pathway enrichment of the genes annotated to the top 200 most significant rare variants tested in our rare variant analysis


Discussion

Our findings echo important aspects of previous genetic and pathway enrichment analyses. Crohn’s-disease-associated loci had a strong showing in our results: two variants in NOD2 were the most significant in our dbGaP common variant analysis, and one site was significant in our ExAC rare variant analysis. NOD2 also emerged as significant in our gene-level SKAT-O analysis, and CD-associated genes as a group were also significant. This was not unexpected since the majority of our cohort were Crohn’s patients. Of the top 20 most significant common variants, 9 were within a single 100 kb region around CARD9 (Supplementary Fig. 2), a gene that has long been associated with IBD. This entire region looks equally associated with disease (OR ~1.5) in our cohort, reflecting that deep sequencing still cannot solve problems regarding fine mapping of causative variants without sufficient recombination. We also found intriguing variants in genes not yet associated with IBD. KRTAP9-2 and KRTDAP, two of our top five common variant findings, are involved in keratinocyte differentiation, a theme that also emerged in our common variant pathway analysis. Keratinocytes are the most abundant component of the epidermis, playing an important role in immunomodulation at the interface between the body and environment. Capable of producing cytokines, these cells have been linked to a different inflammatory disease, psoriasis [19, 20]. Additionally, one recent study found that the interplay of hair follicle development, colonization by commensal microbiota, and local chemokine production in skin was necessary to establish immune tolerance to commensal microbes [21]; dysfunction in the skin environment could potentially impact this process and have systemic immune repercussions. These suggestive findings require replication in future, larger studies of pediatric IBD. LAMA5, another top hit in our common variant analysis, encodes a subunit of laminin. 
Laminins are extracellular matrix proteins which are a major component of the basement membrane, a matrix of tissue that separates the epithelium, mesothelium, and endothelium from underlying connective tissue. Because of the important role of laminins in the integrity of this layer, there could be a role for LAMA5 in IBD pathogenesis. One study of transgenic mice overexpressing the LAMA5 mouse homolog found an attenuated response to DSS-induced inflammation [22]. The two most significant genes in our SKAT-O rare variant analysis after NOD2, VWA2 and HAPLN3, are also extracellular matrix components. In addition, the location and functions of the products of these genes are linked to integrins, which have emerged as important in large IBD GWAS [23]. And one recent, prospective study of more than 900 CD patients found that stricturing complications were associated with increased expression of extracellular matrix genes in ileal tissue at diagnosis [24]. Further studies are warranted to investigate the roles of these extracellular matrix proteins in disease etiology. We were additionally interested in testing enrichment of rare variants in neutrophil function genes because children with inherited disorders of these classes of immune cells exhibit chronic intestinal inflammation similar to CD during the first decade of life [25, 26]. Similarly, loss of function in monocyte and/or macrophage antimicrobial pathways could be one mechanism of pediatric CD pathogenesis. Though we did not find a significant association, we did find a suggestive relationship in SKAT-O between rare, likely-deleterious variants in genes involved in neutrophil function and case status (p = 0.05). And when likely-deleterious common variants were also included, this association was significant (p = 0.03). 
Positive regulation of leukocyte-mediated immunity was also one of the most significant pathways in our common variant analysis, supporting further study into the role of phagocyte function and dysfunction in IBD. Another important component of the immune system from our pathway analysis was complement; mutations in C2, C3, and CFB were among the top 200 most significant common variants associated with disease in our cohort. Though research into the role of complement has been somewhat lacking, evidence is growing for its potential relevance in disease pathophysiology (reviewed in [27]). A closely related theme, apoptosis, also appeared in several other significant pathways. Ras signaling was another pathway of interest from our common variant analysis, and SOS1, one of the top hits in our rare variant SKAT-O analysis, is also a guanine nucleotide exchange factor for RAS proteins. In fact, this pathway was previously implicated by a large study drawing from over 30,000 cases and 50,000 controls in contributing to IBD etiology as part of growth factor signaling [28]. Because growth factor deficiencies have been found in patients with IBD, there has been substantial interest in their use as a potential therapeutic agent (reviewed in [29]). Other current targets of therapy that emerged in our analysis include interferon-gamma, a pro-inflammatory cytokine involved in intestinal homeostasis and linked to regulation of IL-23 [30], another cytokine associated not only with IBD but other inflammatory diseases. In our rare variant analysis, we found negative regulation of the JAK-STAT cascade, another important inflammatory pathway targeted by recent therapies [31], which underscores the importance of immune cell response to cytokine signaling in disease. The primary limitation of this study is the lack of in-house controls for comparison to our cases. However, we performed stringent QC of our data to filter differences between data sets. 
We used the same processing pipeline for dbGaP as we used for our case data, and filtered to an ancestrally similar population. However, systematic calling differences between our pipeline and ExAC, such as calling or filtering of indels, could still be leading to inflation of p-values and odds ratios in our rare variant analysis. We combined CD and UC to leverage the maximum sample size possible to gain further insight into the shared genetic architecture of IBD. However, CD-related variants were enriched in our results, likely because of our CD-majority cohort and the large effect size of associated loci including NOD2. We still found variants in HLA genes, which are most strongly linked to UC, in our results, but these sites did not reach genome-wide significance in our cohort. For example, HLA-A and HLA-C were among the top 200 most significant genes in our logistic case/dbGaP control regression, and were therefore used in ClueGO analysis. While large genome-wide association studies have been performed in IBD, our study is the first to specifically investigate the contribution of rare, likely-damaging variants in pediatric-onset disease. Our findings provide further targets for exploring disease etiology—both at the gene and pathway level. Better understanding of the genetic architecture of IBD can hopefully improve disease prediction and treatment.

Subjects and methods

Ethical approval and recruitment of study participants

Subjects for WES were selected from patients enrolled in the Crohn’s and Colitis Foundation (CCFA) sponsored RISK cohort study and the NIH sponsored Emory African-American gene discovery study, for whom DNA had already been collected. RISK is the largest pediatric CD inception cohort in the world, with 1813 subjects younger than 18 years old with suspected IBD enrolled at 28 North American sites, including Emory University, from November 2008 to June 2012 (ClinicalTrials.gov Identifier: NCT00790543). All patients underwent baseline colonoscopy and histological confirmation of chronic active colitis/ileitis prior to diagnosis and treatment. Once standard and published guidelines were met, patients were diagnosed with CD, UC or IBD-undetermined (IBD-U). A consistent diagnosis of IBD was required during the one-year follow-up for inclusion into this study. At enrollment and during ongoing prospective follow-up, clinical and laboratory data were obtained for each enrolled patient and submitted to a centralized data management center. All patients were managed according to the dictates of their physicians, not by standardized protocols. The patient-based studies were approved by the Institutional Review Boards at each of the RISK sites. Consent was obtained from parents and adult subjects and assent from pediatric subjects age 11 and above.

Emory case sample collection, processing and exome sequencing

Genomic DNA was extracted from whole blood for a total of 567 pediatric IBD samples, of which 553 (97.5%) passed DNA QC. Library preparation and sequencing of the samples were performed at Broad Institute’s Genomics Platform, Cambridge, USA. The libraries were prepared according to the manufacturer's instructions using 1 μg of input DNA per sample. DNA was subjected to whole-exome capture with the SureSelect Human All Exon 50-Mb Kit (Agilent Technologies) following the standard protocols. Library validation was done with the KAPA Library Quantification Kit (KAPA Biosystems) and the whole-exome capture libraries were then sequenced on the Illumina HiSeq platform according to standard protocols.

Publicly available data sets

Database of genotypes and phenotypes (dbGaP) [16] data

We identified and downloaded control data from the Epi4K (accession phs000653.v2.p1) and ARRA (accession phs000298.v3.p2) studies. SRA files were converted to fastq format using NCBI’s SRA Toolkit [32].

ExAC (http://exac.broadinstitute.org/) [15, 17] data (version 0.3.1)

For this publicly available data set containing information on 60,706 individuals, we used liftOver to map all sites to hg38 for comparison with our data. We summed minor and total allele counts for the American, Finnish, and non-Finnish European groups and required a site to be typed in >90% of total chromosomes for these groups (at least 76,438 out of 84,930 chromosomes) for inclusion.
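
The pooling and completeness rule described above can be sketched as follows. The population labels (AMR, FIN, NFE), the record layout and the helper names are assumptions made for illustration; only the totals (84,930 chromosomes, at least 76,438 typed) come from the text. Integer arithmetic is used so that "more than 90%" is not affected by floating-point rounding.

```python
# Assumed labels for the American, Finnish and non-Finnish European groups.
POPULATIONS = ("AMR", "FIN", "NFE")
TOTAL_CHROMOSOMES = 84_930  # from the text
MIN_PERCENT = 90            # a site must be typed in more than 90% of chromosomes

# Smallest integer count that is strictly greater than 90% of the total.
MIN_CALLED = TOTAL_CHROMOSOMES * MIN_PERCENT // 100 + 1


def pooled_counts(site):
    """Sum (minor allele count, total allele count) across the three groups.

    `site` maps a population label to a (minor_count, total_count) pair;
    this layout is hypothetical.
    """
    minor = sum(site[pop][0] for pop in POPULATIONS)
    total = sum(site[pop][1] for pop in POPULATIONS)
    return minor, total


def passes_call_rate(site):
    """True when enough chromosomes were typed at this site."""
    _, total = pooled_counts(site)
    return total >= MIN_CALLED


example_site = {"AMR": (3, 11_000), "FIN": (0, 6_500), "NFE": (12, 66_000)}
print(MIN_CALLED)                     # 76438, matching the threshold in the text
print(pooled_counts(example_site))    # (15, 83500)
print(passes_call_rate(example_site)) # True
```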

dbGaP (raw whole-exome sequencing) analysis

We mapped Emory and dbGaP exome sequencing fastq files to hg38 using PEMapper and called variants using PECaller [33]. We then used SeqAnt [34] version 2.0 [35] (Beta 3, https://seqant.genetics.emory.edu/) to get rsID numbers for PLINK and other annotation information for later analysis. All following variant quality control (QC) was performed in PLINK 1.9 [36-38]. Starting with 866,411 variants in 1035 controls and 541 cases diagnosed with IBD before age 18, we filtered samples and variants using increasingly stringent completeness criteria until information for all remaining variants and samples was 99% complete. For each study individually (IBD, ARRA, Epi4k), we removed sites that were Bonferroni significant in a Hardy–Weinberg equilibrium test. We then performed a sex check of samples. Cases were removed if their sex was discordant with record review (N = 9); other mislabeled sexes were corrected. We checked sample relatedness and removed 8 controls and 10 cases who were second degree or more closely related to another study participant. Supplementary Table 1 shows characteristics for the 517 remaining IBD patients who passed this first round of quality control. We combined CD and UC patients because of shared genetic architecture of these diseases and relatively small sample size of either group alone.

To adjust for population stratification in our sample we used 10,913 common (minor allele frequency [MAF] > 0.05) SNPs to calculate principal components (PCs) using EIGENSTRAT [39], anchoring with HapMap controls as described by Anderson et al. [40] (Supplementary Fig. 1A). We removed outliers (samples more than 3 standard deviations from the mean) on any of the top seven principal components (those which appeared meaningful, with eigenvalues >2), recalculated principal components, and repeated outlier filtering with four meaningful PCs, leaving us with a final data set of 625 controls and 368 cases (Supplementary Fig. 1B; Table 1 shows basic characteristics for these participants). PCs were recalculated again without HapMap samples (Supplementary Fig. 1C) and the four principal components significant by Tracy-Widom tests were used as covariates in regressions.

As an additional filter, we removed variants that were most significantly different (top 2.5%) in Fisher’s exact tests comparing our dbGaP controls to ExAC.
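
The principal-component outlier rule described above (dropping samples more than 3 standard deviations from the mean on any retained PC) can be sketched as below. This is an illustrative sketch, not the authors' pipeline: the data layout and function name are hypothetical, and recomputing the PCs between filtering rounds (done with EIGENSTRAT in the study) is not shown.

```python
from statistics import mean, stdev


def pc_outliers(pcs, n_components, n_sd=3.0):
    """Return IDs of samples lying more than n_sd SDs from the mean
    on any of the first n_components principal components.

    `pcs` maps a sample ID to its sequence of PC coordinates
    (hypothetical layout).
    """
    outliers = set()
    for j in range(n_components):
        values = [coords[j] for coords in pcs.values()]
        mu, sd = mean(values), stdev(values)
        for sample, coords in pcs.items():
            if abs(coords[j] - mu) > n_sd * sd:
                outliers.add(sample)
    return outliers


# Toy data: 19 tightly clustered samples and one sample far out on PC1.
toy = {f"s{i}": ((i % 3 - 1) * 0.1, (i % 3 - 1) * 0.1) for i in range(19)}
toy["s19"] = (10.0, 0.0)

flagged = pc_outliers(toy, n_components=2)
kept = {sample: coords for sample, coords in toy.items() if sample not in flagged}
print(sorted(flagged))  # ['s19']
```

In the study this step was applied twice, first on seven PCs and then on four, with the components recalculated after each round of removals.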

Common variant analysis

We performed logistic regression in PLINK for sites with MAF > 0.05, with case/control status as the outcome, genotype as the predictor of interest, and sex and PCs as covariates. p-Values were corrected with genomic control.
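As a rough illustration of genomic control, the Python sketch below rescales 1-degree-of-freedom test statistics by an inflation factor estimated from their median. It is a sketch under stated assumptions, not the PLINK implementation used in the study: the function names are hypothetical, p-values are assumed to lie in (0, 1], and clamping the inflation factor at 1 is a common convention adopted here as an assumption.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # negative z-score; squaring removes the sign
    return z * z


def chi2_to_p(chi2):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return erfc(sqrt(chi2 / 2.0))


def genomic_control(pvalues):
    """Return (lambda, corrected p-values).

    lambda is the observed median chi-square divided by the expected median.
    It is clamped at 1 so that well-behaved statistics are never deflated
    (an assumption of this sketch).
    """
    chi2 = [p_to_chi2(p) for p in pvalues]
    lam = max(median(chi2) / CHI2_1DF_MEDIAN, 1.0)
    return lam, [chi2_to_p(x / lam) for x in chi2]


# Toy example: p-values that are systematically too small.
inflated = [(i / 101) ** 2 for i in range(1, 101)]
lam, corrected = genomic_control(inflated)
```

Because every statistic is divided by the same factor, the ranking of sites is unchanged; only the p-values become more conservative.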

SKAT-O analysis

We used the SKAT-O method within the SKAT package [41] in R [42] to analyze genes annotated to sites with MAF < 0.05 and evidence of pathogenicity (CADD score >10). SKAT-O is an approach that optimizes association tests by unifying burden and sequence kernel association approaches [43]. We tested for association with case/control status for any gene with five or more rare variants. We also lifted over loci associated with IBD from Jostins et al. 2012 [9] and Liu et al. 2015 [10] to hg38, yielding 201 loci, and tested for enrichment of rare variants 250 kb upstream or downstream of CD, UC, or IBD loci as groups (Supplementary Table 2).

We also wanted to test whether variants were enriched in neutrophil function genes, because strong ileal activation of the immune response, including a strong signature for blood CD11b+Ly6-G+ neutrophils (GSM854306, p < 6.5E-50), was found using clinical and RNA-Seq data from the CCFA RISK prospective cohort [44]. We used GSM854306 from the ImmGen atlas (GSE15907) to retrieve all 409 blood CD11b+Ly6-G+ neutrophil genes and combined this with a manually curated, literature-based list of 74 human neutrophil-related genes, including those known to cause CGD and GSD1b. We implemented these two gene lists in ToppCluster [45], cross-validating their association with neutrophil-related genes and pathways based on other annotations of critical neutrophil functions including priming, chemotaxis, adhesion, phagocytosis, oxidative burst, degranulation, microbial killing, and survival (GO, Mouse phenotypes, Diseases). Using this filtering, we reduced the original total of 463 neutrophil genes to 144 genes that are associated with CD and known to regulate key neutrophil functions (Supplementary Table 3).
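The variant selection that precedes the gene-based test can be sketched in Python as follows. This does not implement SKAT-O itself, which was run with the SKAT package in R. It only illustrates the stated thresholds (MAF < 0.05, CADD > 10, five or more variants per gene) and the 250 kb flank around a locus. The record fields (`id`, `gene`, `maf`, `cadd`) and both function names are hypothetical.

```python
from collections import defaultdict


def genes_for_rare_variant_test(variants, max_maf=0.05, min_cadd=10.0, min_variants=5):
    """Group rare, likely deleterious variants by gene.

    variants is an iterable of dicts with the hypothetical keys
    'id', 'gene', 'maf' and 'cadd'. Only genes with at least
    min_variants qualifying variants are returned.
    """
    by_gene = defaultdict(list)
    for v in variants:
        if v["maf"] < max_maf and v["cadd"] > min_cadd:
            by_gene[v["gene"]].append(v["id"])
    return {gene: ids for gene, ids in by_gene.items() if len(ids) >= min_variants}


def in_locus_window(position, locus_start, locus_end, flank=250_000):
    """True if position lies within flank base pairs of the locus (same chromosome assumed)."""
    return locus_start - flank <= position <= locus_end + flank


# Toy input: GENE_A has five qualifying variants, GENE_B only four.
toy_variants = (
    [{"id": f"a{i}", "gene": "GENE_A", "maf": 0.01, "cadd": 15.0} for i in range(5)]
    + [{"id": f"b{i}", "gene": "GENE_B", "maf": 0.01, "cadd": 15.0} for i in range(4)]
    + [
        {"id": "common", "gene": "GENE_B", "maf": 0.20, "cadd": 20.0},  # fails the MAF filter
        {"id": "benign", "gene": "GENE_B", "maf": 0.01, "cadd": 5.0},   # fails the CADD filter
    ]
)
testable = genes_for_rare_variant_test(toy_variants)
```

The returned mapping would then supply the per-gene variant sets passed to a gene-based association test.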

ExAC (aggregate allele count) analysis

Rare variant analysis

Using the same set of variants as in the dbGaP analysis (with sites most significantly different between dbGaP and ExAC filtered out), we used Fisher’s exact tests to compare rare variant sites (MAF < 0.05) between our IBD cases and ExAC. Genomic control was used to correct p-values.
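For a single site, a two-sided Fisher's exact test on a 2x2 table can be sketched in pure Python as below. The table layout (alternate and reference allele counts in cases versus the aggregate reference) is an assumption made for illustration, and the function name is hypothetical. The loop over all possible tables is simple rather than fast, so a statistics library would be preferable for large counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout for this sketch:
        a = alternate alleles in cases,     b = reference alleles in cases
        c = alternate alleles in reference, d = reference alleles in reference
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to floating-point noise.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


p_small_table = fisher_exact_two_sided(3, 1, 1, 3)
```

For the table [[3, 1], [1, 3]] the probabilities of the tables at least as extreme as the observed one sum to 34/70, about 0.486.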

Pathway enrichment analysis

To test for pathway enrichment, we used the ClueGO plugin version 2.3.3 for Cytoscape version 3.4.0. We performed right-sided hypergeometric tests for enrichment of level 3 to 8 biological process GO terms (using the Human GO database from 25 January 2017) with Benjamini–Hochberg p-value correction for multiple tests. GO Term Fusion was used to reduce pathway redundancy. For common and rare variants, the top 200 most significant genes were used to interrogate pathway enrichment in our sample. This threshold was picked so that ClueGO input did not have duplicate genes and was consistent across common and rare variant comparisons. All genes in the common variant analysis had p-values ≤0.01, while those in the rare variant analysis had p ≤ 0.002. We also used ToppFun, from the ToppGene Suite of bioinformatics tools, to perform functional enrichment analysis. While we only used biological process terms with ClueGO, ToppFun pulls annotation information from GO, human and mouse phenotype data, gene expression, protein interaction and pathway databases [46].
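The statistical core of this step, a right-sided hypergeometric test followed by Benjamini–Hochberg correction, can be sketched in Python as follows. This is not ClueGO's implementation, and it omits GO level selection and GO Term Fusion. The function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_right_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N = genes in the background, K = background genes annotated to the term,
    n = genes in the input list, k = input genes annotated to the term.
    """
    denom = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / denom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 2 of 3 input genes hit a term that covers 4 of 10 background genes.
p_term = hypergeom_right_tail(k=2, K=4, n=3, N=10)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In a real analysis one p-value would be computed per GO term and the whole vector adjusted together, so that the correction reflects the number of terms tested.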

Data availability

Raw sequencing data for individuals with inflammatory bowel disease included in this study are publicly available on dbGaP. Study accession: phs001076.v1.p1, URL: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001076.v1.p1.