
Exome Analysis of Rare and Common Variants within the NOD Signaling Pathway.

Gaia Andreoletti1, Valentina Shakhnovich2,3, Kathy Christenson2, Tracy Coelho1,4, Rachel Haggarty5, Nadeem A Afzal4, Akshay Batra4, Britt-Sabina Petersen6, Matthew Mort7, R Mark Beattie4, Sarah Ennis1.   

Abstract

Pediatric inflammatory bowel disease (pIBD) is a chronic heterogeneous disorder. This study examines the burden of common and rare coding mutations within 41 genes comprising the NOD signaling pathway in pIBD patients. 136 pIBD and 106 control samples underwent whole-exome sequencing. We compared the burden of common, rare and private mutations between these two groups using the SKAT-O test. An independent replication cohort of 33 cases and 111 controls was used to validate significant findings. We observed variation in 40 of 41 genes comprising the NOD signaling pathway. Four genes were significantly associated with disease in the discovery cohort (BIRC2 p = 0.004, NFKB1 p = 0.005, NOD2 p = 0.029 and SUGT1 p = 0.047). Statistical significance was replicated for BIRC2 (p = 0.041) and NOD2 (p = 0.045) in an independent validation cohort. A gene based test on the combined discovery and replication cohort confirmed association for BIRC2 (p = 0.030). We successfully applied burden of mutation testing that jointly assesses common and rare variants, identifying two previously implicated genes (NFKB1 and NOD2) and confirming a possible role in disease risk for a previously unreported gene (BIRC2). The identification of this novel gene suggests a wider role for the inhibitor of apoptosis gene family in IBD pathogenesis.

Year:  2017        PMID: 28422189      PMCID: PMC5396125          DOI: 10.1038/srep46454

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Inflammatory bowel disease (IBD) is an umbrella term for a group of complex and multifactorial illnesses: Crohn’s disease (CD), ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU)1. The etiology of IBD is multi-genic and environmentally triggered, and the disease is generally accepted to result from an inappropriate immune response to the normal gut flora in genetically predisposed individuals2. Since the discovery of NOD2 in 2001 as the first susceptibility gene for IBD3, over 200 loci have been associated with IBD risk in humans through genome wide association studies (GWAS)45. GWAS have provided substantial insight into the biology of complex diseases by providing robust and replicated evidence for autophagy2, immune response2 and bacterial recognition2 patterns. However, an intrinsic limitation of these studies is their focus on common variation, typically variants with a minor allele frequency (MAF) ≥5% in the general population. Combined, these common variants account for only 13.6% of CD and 8.2% of UC heritability6. It is hypothesized that low frequency (MAF of 0.05–5%) and rare (MAF ≤ 0.05%) variation may account for some fraction of the missing heritability of IBD678. Recent technological advances in DNA sequencing have made it possible to sequence large tracts of the genome in a cost-effective manner. This has enabled large-scale studies of the impact of rare variants on complex diseases9. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) have improved the understanding of the genetic causes of disease by revealing variants not captured by GWAS10. It is estimated that ~85% of disease-causing mutations reside within the coding regions of the genome11. Therefore, targeting these expressed regions of the genome represents the most cost-effective means to uncover causal disease genes12. 
Pediatric onset IBD (pIBD) presents with unique phenotypic characteristics and pronounced severity compared to adult-onset disease13. pIBD is more often characterized by extensive intestinal involvement, rapid early progression and a high rate of resistance to conventional therapy1. Moreover, early-onset IBD has a stronger familial component than adult disease1. These combined features indicate a stronger genetic component to pIBD compared to IBD diagnosed in adulthood. GWAS are powered to assess common genetic variation in large patient cohorts that are often composed of adults, in order to amass sizeable patient groups. Large cohorts of patients with disease onset in childhood are less easily ascertained and are also likely to be enriched for rare or private variation of large effect14. Approximately 300 genes have been prioritized within the 200 loci determined through adult studies, and fewer than half have been replicated in a small number of pediatric studies1516. To date, 51 genes have been associated with monogenic disease manifesting in an early onset IBD-like phenotype1718. Homozygous mutations in the interleukin 10 (IL10) gene and in the genes encoding its receptor alpha and beta subunits (IL10RA and IL10RB) have been associated with children presenting with very-early-onset IBD (VEO, age of onset <6 years)1920. The discovery of the disease-causing mutations helped to personalize treatments, inducing a sustained remission in the patients1920. The investigation of a child with intractable IBD using whole-exome analysis by Worthey et al.21 found a hemizygous mutation in the gene X-linked inhibitor of apoptosis (XIAP). The same mutation was confirmed in the asymptomatic mother. Based on these findings, this patient underwent hematopoietic stem cell progenitor transplantation, with a resolution of symptoms and sustained remission following this targeted treatment approach21. 
XIAP belongs to the inhibitor of apoptosis protein (IAP) family (comprising XIAP, BIRC2 and BIRC3) that plays a role in regulation of the innate immune response through ubiquitin ligase activity, TNF survival, inflammatory and death signaling pathways2223. IAP proteins mediate the downstream signaling of pattern-recognition receptors such as NOD1 and NOD2 after response to bacterial pathogens24. The NOD signaling pathway (Fig. 1) is involved in gram-negative and gram-positive peptidoglycan recognition. NOD1 and NOD2 proteins are highly conserved cytoplasmic receptors that sense microbial effectors. Activation of NOD receptors leads to downstream activation of multiple molecules including mitogen-activated protein kinases (MAPK) and nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB)25.
Figure 1

Proteins acting within the NOD signaling pathway.

Recognition of bacterial peptidoglycan (PGN) by NOD1 and NOD2 promotes the formation of a multi-protein complex named the inflammasome. The complex recruits the kinase receptor interacting protein 2 (RIP2), which is ubiquitinated by the ubiquitin ligases XIAP, BIRC2 and BIRC3. Polyubiquitinated RIP2 recruits the kinase TAK1 and TAK binding protein 1 (TAB1), TAB2 and TAB3, which leads to the activation of the MAPK kinases p38 and c-Jun N-terminal kinase (JNK) through the activation of mitogen-activated protein kinase kinase (MKK). Polyubiquitinated RIP2 also interacts with the IKK complex (IKKα, IKKβ and NEMO). TAK1 phosphorylates the IKKβ subunit of the IKK complex, which results in the phosphorylation and degradation of the NF-κB inhibitor (IκBα), the cytoplasmic release of the NF-κB dimers p65 and p50, and their translocation to the nucleus to activate the expression of the NF-κB proinflammatory genes.

In this study, we hypothesize that rare and private genetic variation across genes involved in the NOD signaling pathway may contribute to childhood onset IBD. We interrogate WES data to extract all genetic variation across the frequency spectrum in a pIBD cohort and evaluate the joint effect of rare and common variants with a gene-based statistical test (SKAT-O26). We further validate our findings in an independent cohort.

Results

The PCA procedure removed 10 cases and 20 controls, reducing the final number of cases to 136 and controls to 106 within the discovery cohort (Supplementary Figure 1). The analysis revealed no outliers in the replication cohort (Supplementary Figures 2 and 3). Mutations were identified in cases and/or controls in all but one gene (CCL5) from the NOD signaling pathway in the discovery cohort (ncases = 136 and ncontrols = 106). A total of 250 variants (Supplementary Table 2) that occurred in at least one individual (either case or control) across the 41 genes were called and used to create the VCF file for all 242 individuals. We observed 67 novel variants, 94 rare variants with a 1000 Genomes Project (1 KG) MAF <1%, 41 low frequency mutations (1% ≤ MAF1 KG ≤ 5%) and 48 common mutations (MAF1 KG > 5%). The majority of these variants would not have been detected or interrogated using array technology or traditional association studies.
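The frequency classes used above can be written as a small classifier. The following is a minimal sketch, not the authors' script: the function name is hypothetical, and treating a missing 1 KG frequency as "novel" is a simplifying assumption (the study defines novelty against several databases, as described in the Methods).

```python
from typing import Optional


def classify_variant(maf_1kg: Optional[float]) -> str:
    """Bin a variant by its 1000 Genomes MAF, given as a fraction (0.01 = 1%)."""
    if maf_1kg is None:
        # Assumption: no reference frequency available stands in for "novel".
        return "novel"
    if maf_1kg < 0.01:
        return "rare"
    if maf_1kg <= 0.05:
        return "low frequency"
    return "common"


# Class counts reported for the discovery cohort; they sum to the 250 variants.
reported_counts = {"novel": 67, "rare": 94, "low frequency": 41, "common": 48}
total_variants = sum(reported_counts.values())
```

The boundaries follow the text (rare below 1%, low frequency from 1% to 5% inclusive, common above 5%).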

Variants within the NOD2 gene

Across the 136 pIBD cases and 106 controls within the discovery cohort, we observed 31 mutations over 12 exons of the NOD2 gene. Of these, 26 had a MAF <0.05 across the cohort (Table 1). Eight mutations were identified in or proximal to the caspase recruitment (CARD) domain, 16 in the nucleotide-binding oligomerization (NBD) domain and seven in the leucine-rich (LRR) domain (Fig. 2). In addition to the known IBD biomarkers, Arg702Trp, Gly908Arg and Leu1007fsinsC327, we observed two novel variants, 20 rare (MAF1 KG < 0.01), two low frequency (0.01 ≤ MAF1 KG ≤ 0.05) and four common mutations (MAF1 KG > 0.05) (Table 1). Ten of the 26 mutations were annotated as deleterious by SIFT and 13 are described in HGMD as pathogenic28. Twenty-six of the 31 mutations observed would not have been assessed in any GWAS due to their rarity.
Table 1

List of 31 NOD2 variants observed across the discovery cohort.

Bp position (hg19) | Variant | Coding change | Protein change | Protein domain | SIFT prediction | Gerp | MaxEnt score | CADD | dbSNP | Frequency in 1 KG | Frequency in NHLBI ESP | Frequency in ExAC | HGMD | Cases (n = 136): homozygous reference | Cases: heterozygous | Cases: homozygous alternative | Controls (n = 106): homozygous reference | Controls: heterozygous | Controls: homozygous alternative
50733392 | sp | c.74–7T > A |  | Adjacent to CARD |  |  | 1.83 |  | rs104895421 | 0.0014 | 0.001861 | 0.00001 | listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50733423 | ns | c.98C > A | p.A33D | CARD | T | 1.13 |  | 1.402571 |  | 0.000008 |  | 0.00001 | not listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50733661 | sn | c.336C > T | p.A112A | CARD |  |  |  |  |  | 0.00002 |  | 0.00003 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50733785 | ns | c.460G > A | p.D154N | CARD | T | 0.958 |  | 0.410941 | rs146054564 |  | 0.002093 | 0.00064 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50733859 | sn | c.534C > G | p.S178S | CARD |  |  |  |  | rs2067085 | 0.26 | 0.409302 | 0.334 | not listed | 0.39 | 0.54 | 0.07 | 0.47 | 0.38 | 0.15
50741791 | ns | c.566C > T | p.T189M | CARD | T | 3.48 |  | 2.334887 | rs61755182 | 0.0014 | 0.004419 | 0.00259 | listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50741800 | ns | c.575C > T | p.A192V | CARD | D | 0.916 |  | 0.885309 | rs149071116 | 0.00004 |  | 0.00005 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50741858 | sn | c.633C > T | p.A211A | CARD |  |  |  |  | rs5743269 | 0.0009 | 0.001744 | 0.0008 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50744624 | ns | c.802C > T | p.P268S | NBD | T | −9.98 |  | −0.27189 | rs2066842 | 0.12 | 0.26907 | 0.184 | not listed | 0.43 | 0.47 | 0.1 | 0.56 | 0.35 | 0.09
50744688 | ns | c.866A > G | p.N289S | NBD | D | 4.56 |  | 0.444188 | rs5743271 | 0.01 | 0.006279 | 0.00425 | listed | 0.99 | 0.01 | 0 | 0.98 | 0.02 | 0
50744850 | ns | c.1028T > C | p.L343P | NBD | D | 5.4 |  | 0.517926 |  |  | 0.000116 | 0.00001 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50745114 | ns | c.1292C > T | p.S431L | NBD | D | 3.64 |  | 0.851472 | rs104895431 | 0.0005 | 0.001395 | 0.00082 | listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50745199 | sn | c.1377C > T | p.R459R | NBD |  |  |  |  | rs2066843 | 0.13 | 0.270993 | 0.185 | not listed | 0.42 | 0.48 | 0.1 | 0.5 | 0.4 | 0.1
50745316 | sn | c.1494A > G | p.E498E | NBD |  |  |  |  |  |  |  |  | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50745511 | sn | c.1689C > T | p.Y563Y | NBD |  |  |  |  | rs111608429 | 0.0005 |  | 0.00007 | not listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50745583 | sn | c.1761T > G | p.R587R | NBD |  |  |  |  | rs1861759 | 0.25 | 0.402558 | 0.328 | not listed | 0.4 | 0.54 | 0.07 | 0.47 | 0.39 | 0.14
50745655 | sn | c.1833C > T | p.A611A | NBD |  |  |  |  | rs61736932 | 0.0046 | 0.010698 | 0.00983 | not listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50745751 | sn | c.1929C > T | p.L643L | NBD |  |  |  |  |  | 0.000008 |  | 0.00001 | not listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50745926 | ns | c.2104C > T | p.R702W | NBD | D | 2.42 |  | 1.736582 | rs2066844 | 0.02 | 0.043488 | 0.023 | listed | 0.88 | 0.1 | 0.01 | 0.87 | 0.13 | 0
50745929 | ns | c.2107C > T | p.R703C | NBD | D | 2.89 |  | 1.788325 | rs5743277 | 0.0023 | 0.006977 | 0.00002 | listed | 0.98 | 0.02 | 0 | 1 | 0 | 0
50745960 | ns | c.2138G > A | p.R713H | NBD | T | 4.13 |  | 2.225724 | rs104895483 |  | 0.000233 | 0.00034 | listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50746086 | ns | c.2264C > T | p.A755V | NBD | D | 5.12 |  | 1.225314 | rs61747625 | 0.0005 | 0.004651 | 0.00231 | listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50746199 | ns | c.2377G > A | p.V793M | NBD | D | 3.51 |  | 1.544959 | rs104895444 | 0.0005 | 0.001628 | 0.00105 | listed | 0.99 | 0.01 | 0 | 0.98 | 0.02 | 0
50746228 | sn | c.2406G > T | p.V802V | NBD |  |  |  | 1.92838 | rs104895495 |  | 0.00186 | 0.00196 | not listed | 0.99 | 0.01 | 0 | 0.99 | 0.01 | 0
50746291 | sp | c.2462 + 7G > T |  | LRR |  |  | 0.83 |  | rs202111813 | 0.0005 | 0.000581 | 0.00016 | not listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50750842 | ns | c.2587A > G | p.M863V | LRR | T | −9.48 |  |  | rs104895447 |  | 0.00186 | 0.0012 | listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50756540 | ns | c.2722G > C | p.G908R | LRR | D | 5.56 |  |  | rs2066845 | 0.01 | 0.014535 | 0.00992 | listed | 0.96 | 0.04 | 0 | 0.96 | 0.04 | 0
50756571 | ns | c.2753C > A | p.A918D | LRR | D | 5.56 |  |  | rs104895452 | 0.0009 | 0.000814 | 0.0004 | listed | 1 | 0 | 0 | 0.99 | 0.01 | 0
50757276 | ns | c.2863G > A | p.V955I | LRR | T | −9.14 |  |  | rs5743291 | 0.05 | 0.096047 | 0.061 | listed | 0.83 | 0.17 | 0 | 0.81 | 0.19 | 0
50759405 | ns | c.2888A > G | p.E963G | LRR | T | 5.29 |  |  |  |  |  |  | not listed | 0.99 | 0.01 | 0 | 1 | 0 | 0
50763778 | fr | c.3019dupC | p.L1007fs | LRR |  |  |  |  | rs2066847 | 0.006 |  |  | listed | 0.9 | 0.09 | 0.01 | 0.99 | 0.01 | 0

ns, non-synonymous; sn, synonymous; fi, frameshift insertion; fd, frameshift deletion; sp, splicing; nfi, non-frameshift insertion; nfd, non-frameshift deletion. CARD, caspase recruitment domain; NBD, nucleotide-binding oligomerization domain; LRR, leucine-rich domain. B, benign; C, conservative; D, deleterious; MC, moderately conservative; MR, moderately radical; NR, not reported; P, possibly damaging; R, radical; T, tolerated.
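The genotype-frequency columns of Table 1 can be converted to an alternative-allele frequency, which is the quantity compared against the MAF thresholds. The sketch below is illustrative only; the function name is hypothetical, and the renormalization step is an assumption added because the tabulated frequencies are rounded and do not always sum to exactly 1.

```python
def alt_allele_frequency(hom_ref: float, het: float, hom_alt: float) -> float:
    """Alternative-allele frequency from the three genotype frequencies.

    Each heterozygote carries one alternative allele out of two and each
    alternative homozygote carries two out of two.
    """
    total = hom_ref + het + hom_alt  # renormalize rounded table entries
    return (0.5 * het + hom_alt) / total


# p.L1007fs row of Table 1: cases (0.9, 0.09, 0.01), controls (0.99, 0.01, 0)
freq_cases = alt_allele_frequency(0.9, 0.09, 0.01)
freq_controls = alt_allele_frequency(0.99, 0.01, 0.0)
```

For the p.L1007fs row this gives roughly 0.055 in cases against 0.005 in controls.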

Figure 2

NOD2 gene and protein.

NOD2 is a gene composed of 12 exons (black rectangles). The NOD2 protein consists of two N-terminal caspase activation recruitment (CARD) domains, a central nucleotide-binding oligomerization (NBD) domain and a terminal sequence rich in leucine (LRR). The CARD domains interact with RIP2 protein to activate the immune response in the gut and the leucine-rich domain recognizes the bacterial peptidoglycan. Mutations within the NBD have been shown to increase the inflammatory cascade. The 31 mutations observed by interrogating exome data from 136 pIBD and 106 controls are depicted using arrows and the corresponding protein change noted. Known IBD biomarkers are in red.

Gene based burden of mutation testing in the discovery cohort

The gene-based test for assessing the combined association of novel, rare and common mutations with disease status showed significant evidence for association with four genes across the discovery cohort (BIRC2, NFKB1, NOD2 and SUGT1; see Table 2). NFKB1 (p = 0.005) and NOD2 (p = 0.029) are known IBD associated genes. SUGT1 is a previously unreported gene but has borderline significance only (p = 0.047). Combined variation in BIRC2 is more significantly associated (p = 0.004) with IBD in our discovery cohort than variation in any other gene. This gene has not been previously implicated by association studies.
Table 2

Joint variant test (SKAT-O) results for the genes within the NOD signaling pathway in which variation was found across the entire discovery cohort.

Gene | Chromosome | bp position (hg19) | Total number of samples (136 cases; 106 controls) | Fraction of individuals who carry rare variants under the MAF thresholds (MAF < 0.05)* | Number of all variants defined in the group file | Number of variants defined as rare (MAF < 0.05)* | P-value unadjusted
BIRC2 | 11 | 102220918–102248410 | 242 | 0.07851 | 6 | 6 | 0.004
NFKB1 | 4 | 103488139–103537672 | 242 | 0.11983 | 10 | 9 | 0.005
NOD2 | 16 | 50733392–50763778 | 242 | 0.21488 | 31 | 25 | 0.029
SUGT1 | 13 | 53231709–53261936 | 242 | 0.33058 | 6 | 5 | 0.047
MAPK11 | 22 | 50703796–50706381 | 242 | 0.07024 | 7 | 5 | 0.061
CARD6 | 5 | 40841561–40853404 | 242 | 0.0909 | 10 | 8 | 0.074
BIRC3 | 11 | 102195774–102201850 | 242 | 0.05371 | 6 | 6 | 0.075
TAB2 | 6 | 149699333–149730846 | 242 | 0.07024 | 6 | 6 | 0.117
IKBKB | 8 | 42128942–42188489 | 242 | 0.06198 | 6 | 6 | 0.249
ERBB2IP | 5 | 65307924–65372200 | 242 | 0.17355 | 13 | 9 | 0.292
CASP8 | 2 | 202122956–202149864 | 242 | 0.0909 | 8 | 6 | 0.319
MAPK12 | 22 | 50691870–50699668 | 242 | 0.19835 | 16 | 13 | 0.35
TAB3 | X | 30849697–30877801 | 242 | 0.02066 | 5 | 4 | 0.362
CHUK | 10 | 101964267–101980355 | 242 | 0.02892 | 6 | 4 | 0.38
TNFAIP3 | 6 | 138196066–138202378 | 242 | 0.06198 | 7 | 7 | 0.653
NOD1 | 7 | 30487954–30496518 | 242 | 0.08264 | 17 | 13 | 0.657
TRIP6 | 7 | 100465128–100469223 | 242 | 0.07438 | 11 | 9 | 0.783
TAB1 | 22 | 39795831–39832516 | 242 | 0.02479 | 7 | 6 | 0.795
MAPK13 | 6 | 36098410–36107131 | 242 | 0.02892 | 9 | 8 | 0.799
RIPK2 | 8 | 90770315–90802611 | 242 | 0.07438 | 5 | 4 | 0.803
CARD9 | 9 | 139258615–139266519 | 242 | 0.16942 | 12 | 10 | 0.866

Only genes in which at least five variants were entered into the model are shown.

*These variants received different weights in the SKAT-O joint test. Genes are ordered by p-value.

Replication of the gene based burden of mutation test in the validation cohort

We conducted a replication analysis of the four genes identified as significant in the discovery phase using a replication cohort (ncases = 33; ncontrols = 111). A total of 13 variants were identified across the regions sequenced in the NFKB1, BIRC2 and NOD2 genes. No variant was observed in SUGT1 in the replication cohort and therefore the SKAT-O test was not conducted on this gene. The SKAT-O test showed independent statistical association for BIRC2 (p = 0.041) and NOD2 (p = 0.045) but was not powered to detect significant association for NFKB1 (p = 0.223) (Table 3). The gene based test on the combined discovery and replication cohort (ncases = 169 and ncontrols = 217) showed statistical association for NOD2 (p = 0.011), NFKB1 (p = 0.017) and BIRC2 (p = 0.030) (Table 3).
Table 3

SKAT-O test results for the significant genes within the NOD signaling pathway in which variation was found across the replication cohort only and across the combined discovery and replication cohort.

Gene | Chromosome | bp position (hg19) | Dataset | Total number of samples | Fraction of individuals who carry rare variants under the MAF thresholds (MAF < 0.05)* | Number of all variants defined in the group file | Number of variants defined as rare (MAF < 0.05)* | P-value unadjusted
BIRC2 | 11 | 102219940–102249151 | Replication cohort (33 cases; 111 controls) | 144 | 0.11806 | 3 | 2 | 0.041
BIRC2 | 11 | 102219940–102249151 | Combined discovery and replication cohort (169 cases; 217 controls) | 386 | 0.04663 | 1 | 1 | 0.030
NOD2 | 16 | 50733859–50763778 | Replication cohort (33 cases; 111 controls) | 144 | 0.11111 | 8 | 3 | 0.045
NOD2 | 16 | 50733859–50763778 | Combined discovery and replication cohort (169 cases; 217 controls) | 386 | 0.04145 | 14 | 2 | 0.011
NFKB1 | 4 | 103505961–103514658 | Replication cohort (33 cases; 111 controls) | 144 | 0.02777 | 2 | 1 | 0.223
NFKB1 | 4 | 103505961–103514658 | Combined discovery and replication cohort (169 cases; 217 controls) | 386 | 0.05699 | 2 | 1 | 0.017

*These variants received different weights in the SKAT-O joint test. Genes are ordered by p-value.

Discussion

Since 2005, next generation sequencing (NGS) has proven to be an effective technology for the study of rare and low frequency mutations within disease-associated genes29. More than 100 types of Mendelian disorders have been studied using WES with a diagnostic rate of success of 25–30%30. This success represents a substantially higher rate than that afforded by classical clinical genetic testing such as karyotyping (<5%) or array comparative genomic hybridization (~15–20%)30. The combination of traditional genetic testing and WES/WGS technology has rapidly accelerated the discovery of new disease-associated genes underlying Mendelian traits: from an average of 166 per year in 2005–2009 to 236 per year in 2010–2014, with the numbers increasing every year30. WES/WGS has made gene discovery for all phenotypes feasible and cost effective30. The rapid growth and success of next generation sequencing technologies in Mendelian traits has generated great interest in their application to complex traits. WES and WGS have enabled diagnosis and alternative treatment in patients with monogenic IBD18. In our study we applied WES and the SKAT-O statistical test to a discovery cohort of 242 individuals. We conducted the analysis with no assumption with regard to IBD diagnosis (CD, UC or IBDU) because in half of the families recruited to the study we observed mixed diagnoses, reflecting the substantial genetic overlap between IBD subtypes. Although our data were derived from whole exome sequencing, we did not conduct SKAT-O on all genes across the exome due to our modest sample size. Instead, we targeted our analysis to all 41 genes across the NOD signaling pathway, removing the requirement for an exome-wide significance threshold31. We chose to select the most significantly associated genes and to replicate their significance in an independent replication cohort. A limitation of the replication analysis was the use of data gleaned from different sources. 
Although an established method to take into account such differences is not yet available3233, we minimized bias by analyzing only variants that occurred in the regions common to all capture kits. Despite a modest cohort size, we detected significant association in four genes and replicated significant association for two genes (NOD2 and BIRC2). NOD2 is the earliest gene implicated in IBD pathogenesis and the most strongly associated with IBD in association studies34. Polymorphisms within NOD2 are known to increase the risk of developing CD35. Carriers of one of the three NOD2 allelic biomarker variants have an increased risk of developing CD: heterozygous carriers have a 2–4-fold increased risk, while homozygous or compound heterozygous carriers have a 20–40-fold increased risk34. The association for NOD2 was solely driven by the three known biomarkers (Table S3). BIRC2 (Fig. 3) belongs to a gene family (XIAP, BIRC2 (also known as cIAP1) and BIRC3 (also known as cIAP2)) encoding three conserved proteins characterized by the presence of 1–3 baculovirus IAP repeat (BIR) motifs36. XIAP is located on the X chromosome while BIRC2 and BIRC3 are both located on chromosome 11. Several studies have demonstrated the importance of these genes in regulating the expression of proinflammatory cytokines, such as TNFα, through the NF-κB and MAPK pathways, primarily through their ubiquitin-ligase activity. XIAP, BIRC2 and BIRC3 are key players in regulating the NOD1 and NOD2 signaling pathway by directly promoting RIPK2 ubiquitylation, and they facilitate activation of the NF-κB pathway to promote cell survival37. Cellular studies showed that BIRC2-, BIRC3- and XIAP-deficient macrophages were defective for MAPK and NF-κB activation2338. This defect in NOD signaling was also observed in vivo in BIRC2, BIRC3 and XIAP knockout murine IBD models38. BIRC2 and BIRC3 are inhibitors of the Fas signaling cascade in a human intestinal cell line23. 
The expression profile of BIRC3 was further investigated in 14 UC patients, indicating overexpression in colonic specimens during disease flares39. Additional studies on interleukin (IL)-11 expression suggested a possible protective role of IAP, indicating that over-expression of the IAP proteins could promote healing of the gut40. It is therefore feasible that mutations within these genes might impact gut healing and contribute to flares in IBD. Six variants within BIRC2 were observed in the discovery cohort across 15 cases and 4 controls. Three of these were novel (p.112_113del, p.S154A and p.G517E), two were rare (p.K516E and p.S318S) and one was low frequency (p.A506V). Across the 15 cases (four with CD, four with IBDU and seven with UC), four were diagnosed aged <6 years, seven had a positive family history for IBD and nine were diagnosed with a second autoimmune condition other than IBD. While our observed enrichment of variation within BIRC2 directly implicates this gene in pediatric IBD, further functional analyses are necessary for a comprehensive understanding of the role of individual variants in this protein and their wider impact on the signaling pathway. Mutations in XIAP are known to cause up to 4% of male early onset IBD, and it has been postulated that BIRC2 and BIRC3 might contribute to IBD pathogenesis by regulating the inflammatory cascade through their ubiquitin-ligase activity; our findings are the first to directly implicate BIRC2 in pIBD41.
Figure 3

BIRC2 gene and protein.

BIRC2 is a gene composed of 9 exons (black rectangles). BIRC2 encodes an inhibitor of apoptosis protein, which contributes to innate immune responses by acting as an inhibitor of cell death. All the members of the inhibitor of apoptosis (IAP) gene family share three tandem specific motifs: BIR, belonging to the zinc-finger domain, which mediates protein-protein interaction; a CARD domain, which is involved in CARD-CARD mediated interaction; and a C-terminal RING domain conferring E3-ubiquitin ligase activity. The RING domain of BIRC2, BIRC3 and XIAP is required for the ubiquitin activity of the IAPs. Studies have reported that the CARD domain in BIRC2 and BIRC3 acts as an inhibitor of the ubiquitin ligase activity. Mutations within the BIR1 domain in BIRC2 alter molecular interaction with TNF receptor associated factor 1 (TRAF1) and TRAF2. The six mutations found by interrogating exome data from 136 pIBD and 106 controls are depicted using arrows and the corresponding protein change shown.

Novel drugs that mimic the natural endogenous inhibitor of the IAP (the mitochondria-derived activator of caspases, SMAC) have been proposed to suppress the pro-inflammatory immune response in the gastro-intestinal tract for patients with moderate to severe disease activity42. It is possible that patients harboring BIRC2 mutations may benefit from new treatments targeting IAP expression and function. Further studies are required to assess the role of targeted therapy in the clinical management of these patients.

Conclusions

A gene based burden of mutation test for association using sequencing data on a small cohort has supported the involvement of NFKB1 and NOD2 in the pathogenesis of IBD and has confirmed a role for BIRC2 in the pathogenesis of disease. This is the first study highlighting the role of BIRC2 in IBD through targeted exome sequencing.

Methods

Ethics statement

This study was approved by the Southampton and South West Hampshire Research Ethics Committee (REC) (09/H0504/125) and University Hospital Southampton Foundation Trust Research & Development (RHM CHI0497). This study was approved by the Institutional Review Board (IRB) at The Children’s Mercy Hospital (IRB #15050179). All methods were conducted in accordance with the relevant guidelines and regulations. Written informed consent was obtained for every participant.

Cases and samples

For the discovery cohort, patients were recruited through pediatric gastroenterology clinics at University Hospital Southampton (UHS), a regional center providing tertiary pediatric gastroenterology and endoscopy service for the Wessex region in Southern England. Written informed consent was provided by an attending parent or legal guardian for all pediatric recruits. All children aged <18 at the point of diagnosis were eligible for recruitment to the study. The mean age of the cohort was 10.97 years (min 1–max 17 years). Diagnosis was established using the Porto criteria. Clinical data were recorded for each patient including family history of IBD and any history of autoimmune disease. We accessed control samples through our local database of germline exome sequence data for 126 unrelated patients with no inflammatory-related disease. We used an independent replication cohort derived from the Children’s Mercy Kansas City IBD cohort and the Critical Assessment of Genome Interpretation (CAGI, 2013)43 dataset to validate significant results from the discovery cohort. The Children’s Mercy Kansas City cohort consists of 43 whole-exome sequenced individuals, of which 13 are independent IBD patients and 1 is a control; the CAGI dataset is composed of 66 whole-exome datasets (in VCF format), of which 20 are from unrelated adult CD patients and 8 are from unrelated healthy controls. We merged in 102 additional whole-genome control samples of British ethnicity from the 1 KG phase 3 dataset44, resulting in the retention of 33 unrelated cases and 111 independent controls for subsequent analysis in the validation cohort.

Discovery cohort DNA extraction

Genomic DNA for each of the Southampton patients undergoing exome sequencing was extracted from saliva or peripheral venous blood samples collected in EDTA using the salting out method. DNA concentration was estimated using the Qubit 2.0 Fluorometer and the A260:280 ratio was calculated using a NanoDrop spectrophotometer. The average DNA yield obtained was 150 μg/ml and approximately 20 μg of each patient DNA was extracted for next generation sequencing.

Whole-exome sequencing data generation and analysis

For the Southampton discovery and Children’s Mercy Kansas City cohorts, whole-exome capture was performed using Agilent SureSelect Human All Exon 51 Mb (versions 4 and 5) capture kits and TruSeq Expanded Exome and Nextera Expanded Exome capture kits. Capture technology is characterized by rapid progress, including new content and improved probe design, and we applied the optimal capture chemistry available at the time of sample sequencing. All samples were sequenced on the Illumina HiSeq 2000 and HiSeq 2500 platforms. As previously described45, raw FASTQ data generated by the Illumina paired-end sequencing protocol were aligned against the human reference genome (hg19) using Novoalign (2.08.02). The SAMtools mpileup tool (samtools/0.1.19) was used to call SNPs and short indels. Variants called with a read depth <4 were excluded. The Phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base, and is powered to discriminate between correct and incorrect base-calls. To minimize the false positive rate for the called bases, only variants called with high confidence (Phred score >20, corresponding to 99% base call accuracy) were retained for further analysis. ANNOVAR (annovar/2013Feb21) was then applied for variant annotation. Genetic variants were annotated as “novel” if they were not previously reported in the dbSNP137 database, the 1000 Genomes Project (1 KG), the Exome Variant Server (EVS) of European Americans of the NHLBI-ESP project with 6500 exomes, or the Southampton database of reference exomes. Resultant variant call files for each individual were subjected to further in-house quality control tests to detect DNA sample contamination and ensure sex concordance by assessing autosomal and X chromosome heterozygosity. Variant sharing between all pairs of individuals was assessed to confirm that subjects were not related. Sample provenance was confirmed by application of a validated SNP tracking panel developed specifically for exome data46. 
For the CAGI subgroup of the replication cohort, whole-exome sequencing was performed using the TruSeq capture kit and sequenced on Illumina platforms. Alignment against the human genome (hg19) was conducted with BWA. PICARD was used to remove duplicate reads and GATK for genotype calling. The VQSR method was used to identify true polymorphisms in the samples rather than those due to sequencing, alignment, or data processing artefacts43.
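The link between the Phred threshold of 20 and the quoted 99% base call accuracy follows from the standard definition of the Phred scale, Q = −10 log10(P), where P is the probability that the call is wrong. A minimal sketch (function names are illustrative and not part of any tool used in the study):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality corresponding to error probability p."""
    return -10 * math.log10(p)


# Accuracy implied by the Phred 20 threshold applied to the variant calls
accuracy_at_q20 = 1 - phred_to_error_prob(20)
```

A Phred score of 20 therefore corresponds to a 1 in 100 chance of an incorrect call, and a score of 30 to 1 in 1000.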

Gene selection

Genes involved in the NOD receptor pathway were extracted by interrogating the KEGG Pathway database47. The pathway (KEGG ID: hsa04621) is composed of 56 genes, of which 41 are intrinsic to NOD signaling. Gene names were cross-referenced with the HUGO webserver to confirm the approved gene symbol (Supplementary Table 1). All good quality (Depth ≥ 4 and Phred ≥ 20) variants within these genes were extracted using local scripts and retained for analyses. SKAT-O statistical test was then performed on the 41 genes directly involved in the NOD1 and NOD2 signaling cascade.
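The extraction step can be pictured as a filter over variant calls. The sketch below is hypothetical: the record layout, function name and example depth and quality values are assumptions made for illustration and do not reproduce the local scripts used in the study; only the thresholds (Depth ≥ 4 and Phred ≥ 20) come from the text.

```python
from typing import Iterable, List, NamedTuple, Set


class VariantCall(NamedTuple):
    gene: str       # approved HUGO gene symbol
    position: int   # hg19 coordinate
    depth: int      # read depth at the site
    phred: float    # Phred-scaled quality of the call


def select_pathway_variants(calls: Iterable[VariantCall],
                            pathway_genes: Set[str],
                            min_depth: int = 4,
                            min_phred: float = 20.0) -> List[VariantCall]:
    """Keep good-quality calls that fall in one of the pathway genes."""
    return [c for c in calls
            if c.gene in pathway_genes
            and c.depth >= min_depth
            and c.phred >= min_phred]


# Made-up depth and quality values, for illustration only
calls = [
    VariantCall("NOD2", 50745926, 35, 99.0),    # kept
    VariantCall("NOD2", 50756540, 3, 60.0),     # dropped: depth below 4
    VariantCall("BIRC2", 102220918, 12, 15.0),  # dropped: Phred below 20
    VariantCall("TP53", 7577120, 40, 99.0),     # dropped: not a pathway gene
]
kept = select_pathway_variants(calls, {"NOD2", "BIRC2", "NFKB1", "SUGT1"})
```

In practice the gene set would hold all 41 approved symbols listed in Supplementary Table 1.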

Principal component analysis

Whole-exome sequencing data were available for 146 independent children diagnosed with IBD within the discovery cohort. Demographic data for the IBD cohort are shown in Table 4.
Table 4

Patient demographics for 146 pediatric IBD patients that underwent whole-exome sequencing.

 | CD | UC | IBDU
n | 90 | 32 | 24
Male (%) | 57 (63.3) | 18 (56.25) | 8 (33.3)
Mean age in years (range) | 11.25 (2–17) | 9.97 (1–15) | 11.17 (2–16)

CD, Crohn's disease; UC, ulcerative colitis; IBDU, inflammatory bowel disease unclassified.

In order to minimize bias in the association analysis, we conducted a principal component analysis (PCA) using the SNPRelate48 package on the discovery and validation cohorts to exclude non-Caucasian samples. PCA was conducted on the whole discovery dataset merged with the 1,092 subjects from the 1000 Genomes phase 1 dataset (20101123) in order to discriminate ethnic clusters. PCA was applied to 1363 samples with 305,950 biallelic SNPs. The same PCA procedure was conducted on the validation cohort using a combined set of CAGI and 1000 Genomes phase 1 data (209,029 biallelic SNPs across 1158 samples) and on the combined Kansas and 1000 Genomes phase 1 data (224,786 biallelic SNPs across 1134 samples) to discriminate ethnic clusters.

Variant calling and quality control

Next generation sequencing pipelines typically identify genomic locations at which any given sample differs from the human genome reference sequence on a case-by-case basis. After compiling the list of all variants identified in all cases and controls, it was necessary to positively re-call the genotypic state (for the full set of all variants from all samples) in order to distinguish genotype status from missing data for each individual. The resultant genotypes were used for further analysis. Variants were excluded using vcftools if they deviated significantly from Hardy-Weinberg equilibrium (p < 0.001) in the control group. Samples with a genotype missing call rate >95% were also excluded. VCF files containing genotypic information for all cases and controls were merged and annotated. To detect association between genetic variants and disease status, a gene-based test (the sequence kernel association optimal unified test26, SKAT-O) was performed in the discovery cohort using the EPACTS software package49. The SKAT-O test was then conducted on the replication cohort to validate significant results from the discovery cohort.
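The Hardy-Weinberg filter can be sketched as follows. The study applied the filter with vcftools, which to our knowledge uses an exact test; the example below substitutes the simpler one-degree-of-freedom chi-square approximation, and the function names are hypothetical. Only the p < 0.001 threshold and the restriction to control genotypes come from the text.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p-value for deviation from Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes at a biallelic site.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(control_counts, alpha=0.001):
    """True if the variant is retained, judged on control genotype counts only."""
    return hwe_chi2_pvalue(*control_counts) >= alpha
```

For example, control counts of (25, 50, 25) match the Hardy-Weinberg expectation exactly and are retained, whereas (50, 0, 50), with no heterozygotes at all, is rejected.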

Burden of mutation testing in the discovery cohort

The SKAT-O statistical test was applied to further investigate the joint effect of rare and low frequency variants. SKAT-O combines a burden test and a SKAT test, offering a powerful means of conducting association analyses on combined rare and common variation; single-variant tests are often underpowered because of the large sample size needed to detect a significant association. To conduct the test, a group file with the mutations of interest (synonymous, non-synonymous, splicing, frameshift and non-frameshift, stop gain and stop loss) was created for each of the 41 genes. SKAT-O was executed with the small-sample adjustment, using a MAF threshold of 0.05 to define rare variants within the sample and default weights26.
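To make the combination of burden and SKAT components concrete, the sketch below computes the unified statistic Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for one gene, as we understand the published SKAT-O definition. It is not the EPACTS implementation: it assumes no covariates (the null mean is the case fraction), assumes the Beta(1, 25) density as the default weight, and omits the p-value calculation, the search over rho and the small-sample adjustment. All function names are hypothetical.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the given MAF; assumed here to be the default weight."""
    return 25.0 * (1.0 - maf) ** 24


def skat_o_q(genotypes, phenotypes, rho):
    """Unified statistic Q_rho for one gene.

    genotypes:  samples x variants matrix of minor-allele counts (0/1/2)
    phenotypes: 1 for cases, 0 for controls
    rho:        0 gives the SKAT statistic, 1 gives the burden statistic
    """
    n = len(phenotypes)
    m = len(genotypes[0])
    mu = sum(phenotypes) / n  # null-model mean without covariates
    resid = [y - mu for y in phenotypes]
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    weights = [beta_1_25_weight(f) for f in mafs]
    # Per-variant score: genotype counts weighted by phenotype residuals.
    scores = [sum(genotypes[i][j] * resid[i] for i in range(n)) for j in range(m)]
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return (1.0 - rho) * q_skat + rho * q_burden
```

When two variants in a gene act in opposite directions, their scores cancel in the burden term while the SKAT term remains positive, which is one motivation for combining the two.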

Burden of mutation testing in the validation cohort

As the validation cohort comprises both whole-exome and whole-genome subjects, only variants falling within the consensus target region were considered. Restricting the variants assessed to those found in genomic regions captured by both technologies limited the potential for bias when using data from two different capture technologies. Variant sites across the four genes requiring replication were used to generate a subset of the VCF file for each dataset. VCF files for all individuals were then merged and annotated. SKAT-O testing was conducted using the same settings applied in the discovery cohort, and was further conducted using the same approach on the combined discovery and replication cohorts (169 cases and 217 controls).
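Restricting variants to a consensus target region amounts to intersecting the interval sets covered by each technology. The sketch below shows one way this could be done for a single chromosome; the function names and the half-open (start, end) interval convention are assumptions for the example, not details taken from the study.

```python
def intersect_intervals(a, b):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def in_regions(pos, regions):
    """True if the position falls inside any of the half-open intervals."""
    return any(start <= pos < end for start, end in regions)
```

For instance, intersecting [(100, 200), (300, 400)] with [(150, 350)] leaves (150, 200) and (300, 350), so a variant at position 175 would be retained and one at position 250 would be dropped.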

Additional Information

How to cite this article: Andreoletti, G. et al. Exome Analysis of Rare and Common Variants within the NOD Signaling Pathway. Sci. Rep. 7, 46454; doi: 10.1038/srep46454 (2017). Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.